Ingesting
Ingesting files from external storage
A particularly powerful feature of Ngenea HSM is the ability to ingest existing data into a PixStor file system by "reverse stubbing". This process creates a migrated file stub on the file system that points to a file on a defined external storage target. The file is then immediately accessible via the PixStor file system, as if it had been created natively and then migrated via Ngenea HSM.
In this way, it is possible to rapidly and efficiently migrate any existing data into a PixStor file system, without requiring a wholesale copy or move of data. Only metadata records need to be created prior to beginning use of the data via the PixStor file system. Once this initial metadata creation is complete, data will automatically migrate to the file system on access, and can also be brought across as a background process.
Note
If a storage endpoint is not a PixStor file system, ngrecall may change the access times of the files it reads on that endpoint while creating reverse stubs for them.
The process for ingesting existing data holdings varies with requirements, but typically consists of:
define the Ngenea HSM configuration for the external storage
generate a list of file/object paths to be ingested
create any required directories
create reverse stubs inside the directories with the command ngrecall --stub
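For a POSIX-mounted source such as the NFS example below, the second and third steps can be previewed with a short script. This is a minimal sketch and is not part of Ngenea HSM. plan_ingest is a hypothetical helper that only reads the source tree and reports the relative file paths and the destination directories they imply. Because ngrecall --stub creates required directories itself, a listing like this is mainly useful for reviewing or batching an ingest.

```python
import os
from pathlib import Path


def plan_ingest(source_root, dest_root):
    """Return (files, dirs) describing a reverse-stub ingest.

    files: regular files under source_root, as paths relative to it.
    dirs:  directories that would be needed under dest_root.
    Hypothetical helper for illustration; it never writes anything.
    """
    source_root = Path(source_root)
    dest_root = Path(dest_root)
    files = []
    dirs = set()
    for dirpath, dirnames, filenames in os.walk(source_root):
        dirnames.sort()  # make the traversal order deterministic
        rel_dir = Path(dirpath).relative_to(source_root)
        dirs.add(dest_root / rel_dir)  # "." collapses to dest_root itself
        for name in sorted(filenames):
            if (Path(dirpath) / name).is_file():
                files.append(str(rel_dir / name))
    return files, sorted(dirs)
```

For the NFS example, plan_ingest("/mnt/legacy", "/mmfs1/legacy") would list every file to be stubbed and every directory that would be created under /mmfs1/legacy.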
Example - ingesting existing NFS storage
In this example, an existing storage system is mounted via NFS at /mnt/legacy on the Ngenea HSM node(s).
All data will be ingested into the legacy/ folder on the PixStor file system at /mmfs1/.
The goal is to eventually move all data from the legacy system into the /mmfs1 file system.
Simultaneously, the /mmfs1 file system will be enabled with Ngenea HSM to migrate data to an S3
storage target, as per a standard Ngenea HSM deployment.
Master configuration files
Here, the external storage is coupled to the /mmfs1/legacy path. Since a different target
is used for subsequent migrations, this coupling is defined in a dedicated configuration
file used only for the ingest (/opt/arcapix/etc/ngenea-ingest.conf).
The default configuration file (/opt/arcapix/etc/ngenea.conf) defines how to recall
data from the target as well as setting the default migration (and recall) target(s) for
subsequently migrated data.
/opt/arcapix/etc/ngenea-ingest.conf
This specifies the location of the configuration file for the legacy storage (legacy_nfs.conf) and
assigns it as the target for files under the /mmfs1/legacy path, with each file's
remote path taken relative to /mmfs1/legacy. For example, a file
with path /mmfs1/legacy/folder1/file1 is mapped to the file /folder1/file1
relative to the root of the target storage, i.e. /mnt/legacy/folder1/file1
in this scenario.
[Storage legacy_nfs]
StorageType=FS
ConfigFile=/opt/arcapix/etc/ngeneahsm/legacy_nfs.conf
RemoteLocationXAttrRegex=legacy_nfs:(.+)
LocalFileRegex=/mmfs1/legacy/(.+)
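The path mapping described above can be illustrated with Python's re module. This is a sketch under stated assumptions: it assumes LocalFileRegex must match the whole local path and that capture group 1 is the path relative to the target root, which is then joined onto /mnt/legacy (the mount point configured in legacy_nfs.conf below). legacy_path_for is a hypothetical helper, and the authoritative matching rules are Ngenea HSM's own.

```python
import re
from posixpath import join

# LocalFileRegex from the [Storage legacy_nfs] section of ngenea-ingest.conf
LOCAL_FILE_REGEX = r"/mmfs1/legacy/(.+)"
# Root of the legacy storage mount used in this example
REMOTE_BASE = "/mnt/legacy"


def legacy_path_for(local_path):
    """Map a PixStor path to the corresponding file on the legacy mount.

    Illustrative only: assumes a whole-path match and that group 1 is
    relative to the target root. Returns None for paths outside the target.
    """
    m = re.fullmatch(LOCAL_FILE_REGEX, local_path)
    if m is None:
        return None
    return join(REMOTE_BASE, m.group(1))
```

With these values, legacy_path_for("/mmfs1/legacy/folder1/file1") yields "/mnt/legacy/folder1/file1", matching the example in the text.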
/opt/arcapix/etc/ngenea.conf
This specifies the configuration file to be used where file data blocks are
stored in the legacy_nfs target, and also an AWS object storage target which
will be the default used for subsequent migrations for the whole /mmfs1 file system.
The use of READONLY as the LocalFileRegex effectively disables any use of the
legacy NFS storage for standard data migrations.
[Storage aws_bucket1]
StorageType=AmazonS3
ConfigFile=/opt/arcapix/etc/ngeneahsm/aws_bucket1.conf
RemoteLocationXAttrRegex=aws_bucket1:(.+)
LocalFileRegex=/mmfs1/(.+)
[Storage legacy_nfs]
StorageType=FS
ConfigFile=/opt/arcapix/etc/ngeneahsm/legacy_nfs.conf
RemoteLocationXAttrRegex=legacy_nfs:(.+)
LocalFileRegex=/READONLY(.+)
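To see why the READONLY pattern keeps legacy_nfs out of standard migrations, the sketch below parses the configuration above with Python's configparser and lists every target whose LocalFileRegex matches a given path. It assumes the file is INI-compatible and that a pattern must match the whole path. It deliberately reports all matches rather than modelling how Ngenea HSM chooses between targets, which is not described here. load_targets and matching_targets are hypothetical helpers.

```python
import configparser
import re

# Copy of the ngenea.conf shown above
NGENEA_CONF = """\
[Storage aws_bucket1]
StorageType=AmazonS3
ConfigFile=/opt/arcapix/etc/ngeneahsm/aws_bucket1.conf
RemoteLocationXAttrRegex=aws_bucket1:(.+)
LocalFileRegex=/mmfs1/(.+)

[Storage legacy_nfs]
StorageType=FS
ConfigFile=/opt/arcapix/etc/ngeneahsm/legacy_nfs.conf
RemoteLocationXAttrRegex=legacy_nfs:(.+)
LocalFileRegex=/READONLY(.+)
"""


def load_targets(text):
    """Return {target_name: LocalFileRegex} from ngenea.conf-style text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case, e.g. "LocalFileRegex"
    parser.read_string(text)
    targets = {}
    for section in parser.sections():
        if section.startswith("Storage "):
            name = section[len("Storage "):]
            targets[name] = parser[section]["LocalFileRegex"]
    return targets


def matching_targets(path, targets):
    """List every target whose LocalFileRegex matches the whole path."""
    return [name for name, rx in targets.items() if re.fullmatch(rx, path)]
```

Any path under /mmfs1, including /mmfs1/legacy/folder1/file1, matches only aws_bucket1; /READONLY(.+) can only match paths under a /READONLY directory, which this file system does not use.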
Storage Target configuration files
/opt/arcapix/etc/ngeneahsm/legacy_nfs.conf
This specifies that the legacy storage is mounted at /mnt/legacy. It also
enables DeleteOnRecall, as the goal is to move all data off the legacy
storage over time.
If the legacy storage were to remain in use as a migration target, DeleteOnRecall would
typically be set to False.
[General]
RemoteLocationXAttr=legacy_nfs:$1
RetrieveObjectBasePath=/mnt/legacy
RetrieveObjectName=$1
StoreObjectBasePath=/mnt/legacy
StoreObjectName=$1
EnsureMountPoint=/mnt/legacy
DeleteOnRecall=True
ObjectXAttrManipulationMode=auto
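Read together with RemoteLocationXAttrRegex in ngenea.conf, the [General] keys above describe how a stub's remote-location value resolves to a file under /mnt/legacy. The sketch below assumes that $1 in RetrieveObjectName is replaced by capture group 1 of RemoteLocationXAttrRegex and that the result is joined onto RetrieveObjectBasePath. retrieve_path is a hypothetical helper, not an Ngenea HSM API.

```python
import re
from posixpath import join

# RemoteLocationXAttrRegex from the [Storage legacy_nfs] section of ngenea.conf
XATTR_REGEX = r"legacy_nfs:(.+)"
# RetrieveObjectBasePath and RetrieveObjectName from legacy_nfs.conf
RETRIEVE_BASE = "/mnt/legacy"
RETRIEVE_NAME = "$1"


def retrieve_path(xattr_value):
    """Resolve a stub's remote-location value to a file on the NFS mount.

    Sketch only: substitutes group 1 for "$1" and joins the result onto
    the base path. Raises ValueError for locations of other targets.
    """
    m = re.fullmatch(XATTR_REGEX, xattr_value)
    if m is None:
        raise ValueError(f"not a legacy_nfs location: {xattr_value!r}")
    name = RETRIEVE_NAME.replace("$1", m.group(1))
    return join(RETRIEVE_BASE, name)
```

Under these assumptions, a stub for /mmfs1/legacy/folder1/file1 carries the value legacy_nfs:folder1/file1 (per RemoteLocationXAttr=legacy_nfs:$1), which resolves back to /mnt/legacy/folder1/file1.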
/opt/arcapix/etc/ngeneahsm/aws_bucket1.conf
Note that we use DeleteOnRecall=False here, as this will be the general
purpose migration target, and we wish to make use of premigration
functionality.
[General]
AccessKeyId=ACCESSKEYID
SecretAccessKey=SECRETACCESSKEY
Bucket=my_ngenea_bucket
Region=eu-west-2
Scheme=HTTPS
SSLVerify=True
RemoteLocationXAttr=aws_bucket1:$1
RetrieveObjectName=$1
StoreObjectName=$1
DeleteOnRecall=False
Ingest command
The following ngrecall command will:
scan the legacy storage;
create any required directories;
create reverse stubs for all files contained;
print the names of created stub files.
It is safe to re-run the command multiple times: it skips any files that already exist.
Note
Storage that existed before any Ngenea HSM operations does not contain the files with UUID suffixes created by ngmigrate.
In this case, ingestion can be substantially faster if the option --all-obj-instances is also passed to ngrecall.
This option disables the scan for files that differ only in their UUID suffix, which is otherwise performed so that only the most recently migrated instance of each file is ingested.
The command can be executed on a sub-folder basis to perform a selective ingest.
For example, to process only the folder /mnt/legacy/folder1:
ngrecall -v --stub --all-obj-instances --config-file=/opt/arcapix/etc/ngenea-ingest.conf -E legacy_nfs:folder1
Alternatively, to ingest the whole file system:
ngrecall -v --stub --all-obj-instances --config-file=/opt/arcapix/etc/ngenea-ingest.conf -E legacy_nfs
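A large ingest can be split into one selective run per sub-folder. The sketch below builds the argument lists for the commands shown above, with the flags copied verbatim, one per top-level directory of the legacy mount. commands_per_top_level_folder is a hypothetical batching helper, and per-top-level-folder batching is just one possible way to divide the work.

```python
import os

CONFIG_FILE = "/opt/arcapix/etc/ngenea-ingest.conf"


def ngrecall_stub_cmd(subfolder=None):
    """Build the ngrecall --stub argv shown above.

    subfolder is relative to the legacy storage root; None ingests it all.
    """
    target = "legacy_nfs" if subfolder is None else f"legacy_nfs:{subfolder}"
    return ["ngrecall", "-v", "--stub", "--all-obj-instances",
            f"--config-file={CONFIG_FILE}", "-E", target]


def commands_per_top_level_folder(legacy_root):
    """One ngrecall argv per top-level directory of legacy_root (hypothetical batching)."""
    names = sorted(e.name for e in os.scandir(legacy_root) if e.is_dir())
    return [ngrecall_stub_cmd(name) for name in names]
```

On an Ngenea HSM node, each list from commands_per_top_level_folder("/mnt/legacy") could be passed to subprocess.run(cmd, check=True). Files sitting directly in the legacy root are not covered by per-folder batches. Because re-runs skip existing files, a final run of the whole-storage command picks them up.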
Creating Stub Files with Supplied Parameters
Use the ngmakestub tool to create stub files using parameters supplied on the command line without accessing remote storage. Later, the ngrecall tool can use the stub files to recall actual data from remote storage.
ngmakestub
Synopsis
ngmakestub --size=LENGTH --uuid=STRING
[ --fmode=MODE ] [ --uid=USER ] [ --gid=GROUP ]
[ --sha512=STRING ] [ --acl=STRING ]
[ --atime=TIME ] [ --mtime=TIME ] [ --ctime=TIME ]
( --remote-loc=KEY/LOCATION )+
( --xattr=NAME=VALUE )* FILE1 ... FILEn
Description
Creates stub (migrated) files.
Options
--acl=STRING ACL in ngenea format to set for a stub file.
--atime=TIME last access time in RFC 2822 or 3339 format to set for a stub
file. Examples:
"Wed, 15 Jan 2020 14:56:57 +0300" - RFC 2822;
"2020-01-15 14:56:57.234567890+03:00" - RFC 3339.
--ctime=TIME last status change time in RFC 2822 or 3339 format to set for a
stub file. See the description of `--atime=TIME'
option for examples.
--fmode=MODE file mode bits in octal format to set for a stub file.
--gid=GROUP file group (in NUMBER, STRING, /NUMBER, STRING/, or
STRING/NUMBER format) to set for a stub file.
--help display this help and exit.
--hex-xattr-name
treat an extended attribute NAME in the option
`--xattr=NAME=VALUE' as a byte sequence in hexadecimal format.
--hex-xattr-value
treat an extended attribute VALUE in the option
`--xattr=NAME=VALUE' as a byte sequence in hexadecimal format.
--lock-level=partial|implicit
DMAPI locking level:
"partial" - explicitly request an exclusive DMAPI access
right when creating a stub file;
"implicit" - instruct DMAPI to self-manage access rights when
creating a stub file.
Default: "partial".
--log-format=json
log messages in JSON format.
Conflicts with: --log-target=syslog.
--log-target=syslog
redirect all logging to the syslog.
Conflicts with: --log-format=json.
--mtime=TIME last modification time in RFC 2822 or 3339 format to set for a
stub file. See the description of `--atime=TIME'
option for examples.
--no-flock disable using lock files.
Sets lock level to "implicit" if it is not set.
Conflicts with: --lock-level=partial.
--overwrite-local
overwrite local files if they already exist.
--remote-loc=KEY/LOCATION
remote location to write to an "APXrmtXX" DMAPI xattr of a
stub file, where XX is a two-character KEY (must not be equal
to "sz"). The DMAPI xattr takes a value LOCATION. This option
can be present multiple times on the command line.
--sha512=STRING hexadecimal SHA-512 string to set in the "APXsh512" DMAPI
xattr of a stub file. The string must contain 128 lowercase
hexadecimal digits.
--size=LENGTH stub file size in bytes.
--uid=USER file owner (in NUMBER, STRING, /NUMBER, STRING/, or
STRING/NUMBER format) to set for a stub file.
--uuid=STRING UUID to write to the "APXguuid" DMAPI xattr of a stub file in
the format XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX where X is a
lowercase hexadecimal digit.
-v, --verbose print the names of created stub files.
-V, --version display version information and exit.
--xattr=NAME=VALUE
set an extended attribute NAME with a VALUE for a stub file.
This option can be present multiple times on the command line.
Compatible with: --hex-xattr-name; --hex-xattr-value.
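When ngmakestub is driven from a script, values can be checked against the format rules stated above before the tool is called. The sketch checks only those stated rules: a lowercase 8-4-4-4-12 hexadecimal UUID, exactly 128 lowercase hexadecimal digits for --sha512, and a two-character remote-location KEY other than "sz". check_stub_args is a hypothetical helper. Note that Python's str(uuid.uuid4()) already produces the lowercase UUID form.

```python
import re

UUID_RE = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")
SHA512_RE = re.compile(r"[0-9a-f]{128}")


def check_stub_args(uuid, sha512=None, remote_locs=()):
    """Return a list of problems with values destined for ngmakestub.

    Only the option rules stated in the documentation are checked;
    an empty list means no problems were found.
    """
    problems = []
    if not UUID_RE.fullmatch(uuid):
        problems.append(f"--uuid: bad format {uuid!r}")
    if sha512 is not None and not SHA512_RE.fullmatch(sha512):
        problems.append("--sha512: need 128 lowercase hex digits")
    for loc in remote_locs:
        # KEY is everything before the first "/", LOCATION the rest
        key, sep, _location = loc.partition("/")
        if not sep or len(key) != 2 or key == "sz":
            problems.append(f"--remote-loc: bad KEY/LOCATION {loc!r}")
    return problems
```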
Examples
Create the stub file "mylocalstub" with size 4096 bytes, file mode 644, the DMAPI extended attribute APXguuid containing a randomly generated UUID, and the DMAPI extended attribute APXrmtlc containing the remote location "myendpoint:myobject":
ngmakestub --uuid="$(/bin/uuidgen)" --remote-loc=lc/myendpoint:myobject --size=4096 --fmode=644 mylocalstub
Create the stub file "tst.1.reconstructed" with size 4 GiB, the DMAPI extended attribute APXguuid containing the UUID value, the DMAPI extended attribute APXsh512 containing the hash value, and the DMAPI extended attribute APXrmt01 containing the remote location "awss3:tst.1":
ngmakestub --size=4294967296 --uuid=9e1a9c22-1f6a-4026-872b-e97cd8071dc2 --remote-loc=01/awss3:tst.1 \
--sha512=43b5c6f434f71daae80a502212dc8c0e9e52d8b075d589afa430092eaf2d7f960cb097cb5ec656cdeaf87d5a9e61fa8e81665b07f40665fd8b09b6aeccb7f02f \
tst.1.reconstructed
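When a copy of the file is available locally and only the stub needs recreating, most option values can be taken from that copy's metadata. The sketch below assumes UTC timestamps are acceptable and uses the RFC 3339 form with nanoseconds from the --atime description. ngmakestub_argv and rfc3339_ns are hypothetical helpers. --sha512 is left out because the document does not say how that value is derived.

```python
import os
import stat
import uuid
from datetime import datetime, timezone


def rfc3339_ns(ns):
    """Format nanoseconds since the epoch as "YYYY-MM-DD HH:MM:SS.nnnnnnnnn+00:00" (UTC)."""
    sec, frac = divmod(ns, 1_000_000_000)
    dt = datetime.fromtimestamp(sec, tz=timezone.utc)
    return dt.strftime("%Y-%m-%d %H:%M:%S") + f".{frac:09d}+00:00"


def ngmakestub_argv(src, stub_path, remote_loc, stub_uuid=None):
    """Build an ngmakestub command line mirroring src's metadata.

    Copies size, permission bits (octal), numeric owner/group and the
    three timestamps from src. remote_loc is "KEY/LOCATION" as for
    --remote-loc; a random lowercase UUID is used if none is given.
    """
    st = os.stat(src)
    return [
        "ngmakestub",
        f"--size={st.st_size}",
        f"--uuid={stub_uuid or uuid.uuid4()}",
        f"--fmode={stat.S_IMODE(st.st_mode):o}",
        f"--uid={st.st_uid}",
        f"--gid={st.st_gid}",
        f"--atime={rfc3339_ns(st.st_atime_ns)}",
        f"--mtime={rfc3339_ns(st.st_mtime_ns)}",
        f"--ctime={rfc3339_ns(st.st_ctime_ns)}",
        f"--remote-loc={remote_loc}",
        stub_path,
    ]
```

For instance, rfc3339_ns(1579089417234567890) gives "2020-01-15 11:56:57.234567890+00:00", the same instant as the RFC 3339 example in the option description, expressed in UTC.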