Ingesting

Ingesting files from external storage

A particularly powerful feature of Ngenea HSM is the ability to ingest existing data into a PixStor file system by "reverse stubbing". This process creates a migrated file stub on the file system which points to any file on a defined external storage target. The file is then immediately accessible via the PixStor file system as if it had been natively created and then migrated via Ngenea HSM.

In this way, it is possible to rapidly and efficiently migrate any existing data into a PixStor file system, without requiring a wholesale copy or move of data. Only metadata records need to be created prior to beginning use of the data via the PixStor file system. Once this initial metadata creation is complete, data will automatically migrate to the file system on access, and can also be brought across as a background process.

Note

If a storage endpoint is not a PixStor file system, ngrecall may change the access time of the files it reads on that endpoint while creating reverse stubs for them.

The process for ingesting existing data holdings varies with requirements, but typically consists of:

  1. define the Ngenea HSM configuration for the external storage

  2. generate a list of file/object paths to be ingested

  3. create any required directories

  4. create reverse stubs inside the directories with the command ngrecall --stub
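
Steps 2 and 3 above can be sketched in plain shell. This is a minimal illustration, not part of Ngenea HSM: the function names and the SRC_ROOT and DST_ROOT variables are hypothetical, and file names containing newlines are not handled. Step 4 is left to ngrecall --stub and is not shown here.

```shell
# Sketch of steps 2 and 3 only: list files on the external storage and
# pre-create the matching directory tree on the PixStor side.
# SRC_ROOT and DST_ROOT are illustrative defaults, not required names.
SRC_ROOT="${SRC_ROOT:-/mnt/legacy}"
DST_ROOT="${DST_ROOT:-/mmfs1/legacy}"

# Step 2: print every regular file under $1 as a path relative to $1.
list_relative_files() {
    ( cd "$1" && find . -type f | sed 's|^\./||' | sort )
}

# Step 3: create each directory found under $1 below $2.
mirror_directories() {
    ( cd "$1" && find . -type d ) | while IFS= read -r dir; do
        mkdir -p "$2/$dir"
    done
}
```

A typical use would be list_relative_files "$SRC_ROOT" > ingest.list followed by mirror_directories "$SRC_ROOT" "$DST_ROOT".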

Example - ingesting existing NFS storage

In this example, an existing storage system is mounted via NFS at /mnt/legacy on the Ngenea HSM node(s).

All data will be ingested into the legacy/ folder on the PixStor file system at /mmfs1/.

The goal is to eventually move all data from the legacy system into the /mmfs1 file system.

Simultaneously, the /mmfs1 file system will be enabled with Ngenea HSM to migrate data to an S3 storage target, as per a standard Ngenea HSM deployment.

Master configuration files

Here, the external storage is coupled to the /mmfs1/legacy path. Because a different target is used for subsequent migrations, this coupling is defined in a dedicated configuration file for the ingest (/opt/arcapix/etc/ngenea-ingest.conf). The default configuration file (/opt/arcapix/etc/ngenea.conf) defines how to recall data from the legacy target and sets the default migration (and recall) target(s) for subsequently migrated data.

/opt/arcapix/etc/ngenea-ingest.conf

This specifies the location of the configuration file for the legacy storage (legacy_nfs.conf), and assigns it as the target to be used for files under the /mmfs1/legacy path, with the relative path set underneath /mmfs1/legacy. For example, a file with path /mmfs1/legacy/folder1/file1 would be mapped to a file on the target storage at /folder1/file1, relative to its root - i.e. /mnt/legacy/folder1/file1 in this scenario.

[Storage legacy_nfs]
StorageType=FS
ConfigFile=/opt/arcapix/etc/ngeneahsm/legacy_nfs.conf
RemoteLocationXAttrRegex=legacy_nfs:(.+)
LocalFileRegex=/mmfs1/legacy/(.+)
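
The path mapping described above can be mimicked with a one-line substitution. This is only an illustration of the capture group in LocalFileRegex combined with the /mnt/legacy mount point from this scenario. It is not how Ngenea HSM evaluates its configuration, the function name is hypothetical, and anchoring the expression at both ends is an assumption.

```shell
# Illustration only: mimic the LocalFileRegex capture from ngenea-ingest.conf
# and re-root the captured part at the legacy mount point.
local_to_legacy() {
    # Capture everything after /mmfs1/legacy/ and prepend /mnt/legacy/.
    # Prints nothing when the path is outside /mmfs1/legacy.
    printf '%s\n' "$1" | sed -E -n 's|^/mmfs1/legacy/(.+)$|/mnt/legacy/\1|p'
}

local_to_legacy /mmfs1/legacy/folder1/file1
```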

/opt/arcapix/etc/ngenea.conf

This defines two storage targets: legacy_nfs, which is used to recall files whose data blocks are still held on the legacy storage, and an AWS object storage target (aws_bucket1), which is the default for subsequent migrations across the whole /mmfs1 file system.

The use of READONLY as the LocalFileRegex effectively disables any use of the legacy NFS storage for standard data migrations.

[Storage aws_bucket1]
StorageType=AmazonS3
ConfigFile=/opt/arcapix/etc/ngeneahsm/aws_bucket1.conf
RemoteLocationXAttrRegex=aws_bucket1:(.+)
LocalFileRegex=/mmfs1/(.+)

[Storage legacy_nfs]
StorageType=FS
ConfigFile=/opt/arcapix/etc/ngeneahsm/legacy_nfs.conf
RemoteLocationXAttrRegex=legacy_nfs:(.+)
LocalFileRegex=/READONLY(.+)

Storage Target configuration files

/opt/arcapix/etc/ngeneahsm/legacy_nfs.conf

This specifies that the legacy storage is mounted at /mnt/legacy. It also enables DeleteOnRecall, as the goal is to move all data off the legacy storage over time.

If the legacy storage were instead to remain in use as an ongoing migration target, DeleteOnRecall would typically be set to False.

[General]
RemoteLocationXAttr=legacy_nfs:$1
RetrieveObjectBasePath=/mnt/legacy
RetrieveObjectName=$1
StoreObjectBasePath=/mnt/legacy
StoreObjectName=$1
EnsureMountPoint=/mnt/legacy
DeleteOnRecall=True
ObjectXAttrManipulationMode=auto

/opt/arcapix/etc/ngeneahsm/aws_bucket1.conf

Note that we use DeleteOnRecall=False here, as this will be the general purpose migration target, and we wish to make use of premigration functionality.

[General]
AccessKeyId=ACCESSKEYID
SecretAccessKey=SECRETACCESSKEY
Bucket=my_ngenea_bucket
Region=eu-west-2
Scheme=HTTPS
SSLVerify=True
RemoteLocationXAttr=aws_bucket1:$1
RetrieveObjectName=$1
StoreObjectName=$1
DeleteOnRecall=False

Ingest command

The following ngrecall command will:

  1. scan the legacy storage;

  2. create any required directories;

  3. create reverse stubs for all files contained;

  4. print the names of created stub files.

It is safe to re-run it multiple times - it will skip past any files which already exist.

Note

Storage that existed before Ngenea HSM was introduced does not contain files with the UUID suffixes created by ngmigrate. In this case, ingestion can be sped up substantially by also passing the option --all-obj-instances to ngrecall. That option disables the scan for files that differ only in their UUID suffix, which is otherwise performed so that only the most recently migrated files are ingested.

The command can be executed on a sub-folder basis to perform a selective ingest. For example, to process only the folder /mnt/legacy/folder1:

ngrecall -v --stub --all-obj-instances --config-file=/opt/arcapix/etc/ngenea-ingest.conf -E legacy_nfs:folder1

Alternatively, to ingest the entire legacy storage:

ngrecall -v --stub --all-obj-instances --config-file=/opt/arcapix/etc/ngenea-ingest.conf -E legacy_nfs
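
When several sub-folders are to be ingested selectively, the per-folder command can be generated in a loop. The sketch below uses only the options shown above and prints the commands for review instead of running them. The function name and the INGEST_CONF variable are hypothetical, and folder names containing spaces or shell metacharacters would need additional quoting.

```shell
# Sketch: build one ngrecall invocation per sub-folder, using only the options
# shown above. Commands are printed, not executed.
INGEST_CONF="${INGEST_CONF:-/opt/arcapix/etc/ngenea-ingest.conf}"

print_ingest_commands() {
    for folder in "$@"; do
        printf 'ngrecall -v --stub --all-obj-instances --config-file=%s -E legacy_nfs:%s\n' \
            "$INGEST_CONF" "$folder"
    done
}

print_ingest_commands folder1 folder2
```

Once the printed commands have been checked, they can be run by hand or piped to a shell.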

Creating Stub Files with Supplied Parameters

Use the ngmakestub tool to create stub files using parameters supplied on the command line without accessing remote storage. Later, the ngrecall tool can use the stub files to recall actual data from remote storage.

ngmakestub

Synopsis

ngmakestub --size=LENGTH --uuid=STRING
           [ --fmode=MODE ] [ --uid=USER ] [ --gid=GROUP ]
           [ --sha512=STRING ] [ --acl=STRING ]
           [ --atime=TIME ] [ --mtime=TIME ] [ --ctime=TIME ]
           ( --remote-loc=KEY/LOCATION )+
           ( --xattr=NAME=VALUE )* FILE1 ... FILEn

Description

Creates stub (migrated) files.

Options

--acl=STRING    ACL in ngenea format to set for a stub file.
--atime=TIME    last access time in RFC 2822 or 3339 format to set for a stub
                file. Examples:
                "Wed, 15 Jan 2020 14:56:57 +0300"     - RFC 2822;
                "2020-01-15 14:56:57.234567890+03:00" - RFC 3339.
--ctime=TIME    last status change time in RFC 2822 or 3339 format to set for a
                stub file. See the description of `--atime=TIME'
                option for examples.
--fmode=MODE    file mode bits in octal format to set for a stub file.
--gid=GROUP     file group (in NUMBER, STRING, /NUMBER, STRING/, or
                STRING/NUMBER format) to set for a stub file.
--help          display this help and exit.
--hex-xattr-name
                treat an extended attribute NAME in the option
                `--xattr=NAME=VALUE' as a byte sequence in hexadecimal format.
--hex-xattr-value
                treat an extended attribute VALUE in the option
                `--xattr=NAME=VALUE' as a byte sequence in hexadecimal format.
--lock-level=partial|implicit
                DMAPI locking level:
                "partial"  - explicitly request an exclusive DMAPI access
                             right when creating a stub file;
                "implicit" - instruct DMAPI to self-manage access rights when
                             creating a stub file.
                Default: "partial".
--log-format=json
                log messages in JSON format.
                Conflicts with: --log-target=syslog.
--log-target=syslog
                redirect all logging to the syslog.
                Conflicts with: --log-format=json.
--mtime=TIME    last modification time in RFC 2822 or 3339 format to set for a
                stub file. See the description of `--atime=TIME'
                option for examples.
--no-flock      disable using lock files.
                Sets lock level to "implicit" if it is not set.
                Conflicts with: --lock-level=partial.
--overwrite-local
                overwrite local files if they already exist.
--remote-loc=KEY/LOCATION
                remote location to write to a "APXrmtXX" DMAPI xattr of a
                stub file, where XX is a two-character KEY (must not be equal
                to "sz"). The DMAPI xattr takes a value LOCATION. This option
                can be present multiple times on the command line.
--sha512=STRING hexadecimal SHA-512 string to set in the "APXsh512" DMAPI
                xattr of a stub file. The string must contain 128 lowercase
                hexadecimal digits.
--size=LENGTH   stub file size in bytes.
--uid=USER      file owner (in NUMBER, STRING, /NUMBER, STRING/, or
                STRING/NUMBER format) to set for a stub file.
--uuid=STRING   UUID to write to the "APXguuid" DMAPI xattr of a stub file in
                the format XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX where X is a
                lowercase hexadecimal digit.
-v, --verbose   print the names of created stub files.
-V, --version   display version information and exit.
--xattr=NAME=VALUE
                set an extended attribute NAME with a VALUE for a stub file.
                This option can be present multiple times on the command line.
                Compatible with: --hex-xattr-name; --hex-xattr-value.
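
Values for --atime, --mtime and --ctime can be produced with the date utility. The sketch below assumes GNU coreutils date (other implementations use different options), and the function names are hypothetical.

```shell
# Sketch assuming GNU coreutils date; other date implementations differ.
# Convert seconds since the epoch to the two timestamp styles, in UTC.
epoch_to_rfc2822() { date -u -R -d "@$1"; }
epoch_to_rfc3339() { date -u --rfc-3339=ns -d "@$1"; }

# Print the modification time of an existing file in RFC 3339 style.
file_mtime_rfc3339() { date --rfc-3339=ns -r "$1"; }

epoch_to_rfc2822 1579089417
epoch_to_rfc3339 1579089417
```

The result can then be passed on the command line, for example --mtime="$(file_mtime_rfc3339 reference.file)", where reference.file is a placeholder name.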

Examples

Create the stub file "mylocalstub" with size 4096 bytes, file mode 644, the DMAPI extended attribute APXguuid containing a randomly generated UUID, and the DMAPI extended attribute APXrmtlc containing the remote location "myendpoint:myobject":

ngmakestub --uuid="$(/bin/uuidgen)" --remote-loc=lc/myendpoint:myobject --size=4096 --fmode=644 mylocalstub
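
The format constraints listed under Options can be checked before ngmakestub is called. The helper functions below are a hypothetical sketch and not part of Ngenea HSM. They encode only what the option descriptions state: a lowercase hexadecimal UUID, a SHA-512 string of 128 lowercase hexadecimal digits, and a two-character remote location KEY other than "sz".

```shell
# Sketch: check arguments against the formats listed under Options.
# Function names are illustrative.
is_valid_uuid() {
    printf '%s\n' "$1" |
        grep -Eq '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
}

is_valid_sha512() {
    printf '%s\n' "$1" | grep -Eq '^[0-9a-f]{128}$'
}

# KEY in --remote-loc=KEY/LOCATION: two characters, not "sz".
is_valid_remote_key() {
    [ "${#1}" -eq 2 ] && [ "$1" != "sz" ]
}

# Some uuidgen implementations print uppercase; fold to lowercase first.
new_lowercase_uuid() {
    uuidgen | tr 'A-F' 'a-f'
}
```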

Create the stub file "tst.1.reconstructed" with size 4 GiB, the DMAPI extended attribute APXguuid containing the UUID value, the DMAPI extended attribute APXsh512 containing the hash value, and the DMAPI extended attribute APXrmt01 containing the remote location "awss3:tst.1":

ngmakestub --size=4294967296 --uuid=9e1a9c22-1f6a-4026-872b-e97cd8071dc2 --remote-loc=01/awss3:tst.1 \
           --sha512=43b5c6f434f71daae80a502212dc8c0e9e52d8b075d589afa430092eaf2d7f960cb097cb5ec656cdeaf87d5a9e61fa8e81665b07f40665fd8b09b6aeccb7f02f  \
           tst.1.reconstructed
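
When many stubs need to be created, the commands can be generated from a manifest. The sketch below assumes a whitespace-separated manifest with the columns SIZE, UUID, KEY/LOCATION and FILE. That layout and the function name are assumptions made for illustration. The commands are printed for review rather than executed, and file names containing spaces are not quoted.

```shell
# Sketch: turn a whitespace-separated manifest into ngmakestub commands.
# Manifest columns (an assumed layout): SIZE UUID KEY/LOCATION FILE
print_makestub_commands() {
    while read -r size uuid remote file; do
        # Skip blank lines and comment lines.
        case "$size" in ''|'#'*) continue ;; esac
        printf 'ngmakestub --size=%s --uuid=%s --remote-loc=%s %s\n' \
            "$size" "$uuid" "$remote" "$file"
    done
}

print_makestub_commands <<'EOF'
# size uuid key/location file
4294967296 9e1a9c22-1f6a-4026-872b-e97cd8071dc2 01/awss3:tst.1 tst.1.reconstructed
EOF
```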

See Also