Migrating¶
Policy Based Migration¶
A PixStor policy is typically used to migrate files. A policy will select candidates for migration based on various criteria, and then call ngmigrate in the execution phase to migrate files to external storage.
Example Migration Policy¶
This simple example policy frees up space when the filesystem is more than 85% full by migrating files larger than 1 MB which have not been accessed in the last 180 days.
define(
    exclude_list,
    (
        PATH_NAME LIKE '%/.ctdb/%'
        OR NAME LIKE 'user.quota%'
        OR NAME LIKE 'fileset.quota%'
        OR NAME LIKE 'group.quota%'
    )
)
define(is_migrated, (MISC_ATTRIBUTES LIKE '%V%'))

/* All files use the ngenea.conf configuration file: */
RULE EXTERNAL POOL 'NGENEA_DEFAULT'
    EXEC '/var/mmfs/etc/mmpolicyExec-ngenea-hsm'
    OPTS '-v1 --log-target=syslog --config-file=/opt/arcapix/etc/ngenea.conf'
    ESCAPE '%'

RULE 'ngenea_migrate' MIGRATE TO POOL 'NGENEA_DEFAULT'
    /* If the filesystem is running out of space (more than 85% full),
       reduce usage to 70% */
    THRESHOLD(85,70,70)
    /* Choose files least recently accessed */
    WEIGHT(DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME))
    /* but only migrate files > 1MB in size */
    WHERE KB_ALLOCATED > 1024
    /* Don't migrate anything which has been accessed in the last 180 days */
    AND (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME) > 180)
    AND NOT (is_migrated)
    AND NOT (exclude_list)
A more comprehensive example can be found here.
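Before applying a policy like the example above, it can help to preview roughly which files meet its age and size criteria. The following is a minimal sketch using the standard find command. The function name preview_candidates is hypothetical, and the match is only approximate: find tests the file size rather than KB_ALLOCATED, measures age in 24-hour periods, and knows nothing about the migration state or the exclude list.

```shell
# List regular files under the directory "$1" that were last accessed more
# than 180 days ago and are larger than 1024 KiB. This only approximates the
# WHERE clause of the example policy; it does not consult GPFS attributes.
preview_candidates() {
    find "$1" -type f -atime +180 -size +1024k -print
}
```

For example, preview_candidates /mmfs1/data prints one candidate path per line.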
ngmigrate¶
Synopsis¶
ngmigrate [-p|--sync-metadata] NAME1 ... NAMEn
ngmigrate [-p|--sync-metadata]
[--filelist-format=NUL|quoted] -f FILELIST
ngmigrate --default-stub-size=LENGTH NAME1 ... NAMEn
ngmigrate --default-stub-size=LENGTH
[--filelist-format=NUL|quoted] -f FILELIST
ngmigrate --force-stub-size=LENGTH NAME1 ... NAMEn
ngmigrate --force-stub-size=LENGTH
[--filelist-format=NUL|quoted] -f FILELIST
where NAMEi
is a file, directory, or symbolic link name.
Common options for all use cases:
[-vLEVEL] [--config-file=FILE] [-r] [--no-recursion-remote]
[ --overwrite-remote | --fail-on-mismatch[=all] ]
[--ignore-rmtlc] [--remote-path=PATH] [--update-atime]
[--update-mtime] [--skip-metadata-update] [--no-stamp-live]
[ --lock-level=partial | [--lock-level=implicit] [--no-flock] ]
[ --log-target=syslog | --log-format=json ]
( -E RESTRICTION_ALIASES[:RESTRICTION_PATHS] |
--endpoint-exclude=EXCLUSION_ALIASES[:EXCLUSION_PATHS] )*
( -P ALIASES:PARAMETER=VALUE )*
Description¶
Migrates files from the local GPFS file system to a Storage Target.
Options¶
--config-file=FILE
path to a master configuration file.
Default: /opt/arcapix/etc/ngenea.conf
--default-stub-size=LENGTH
default approximate length of a beginning file segment that
should be retained during file migration. This setting can be
overridden in a configuration file.
Default: 0 (free up the entire file content).
-E, --endpoint=ALIASES[:PATHS]
restrict the set of storage endpoints for interaction to
endpoints with aliases specified by extended glob
pattern ALIASES.
Optionally, restrict remote object pathnames at those endpoints
to pathnames matching extended glob pattern PATHS.
By default, restrict remote object pathnames to the root path.
Compatible with: --no-recursion-remote
--endpoint-exclude=ALIASES[:PATHS]
exclude remote object pathnames specified by extended glob
pattern PATHS at storage endpoints with aliases specified by
extended glob pattern ALIASES from processing.
By default, exclude remote object pathnames at the root path.
Compatible with: --no-recursion-remote
-f FILELIST process files and directories from a filelist file.
--fail-on-mismatch[=content|all]
fail migrating a file, directory, or symbolic link if a remote
object (with a matching UUID) or folder exists but:
"content" - has a different hash (default);
"all" - has a different hash or different metadata.
Conflicts with: --overwrite-remote
--filelist-format=LF|NUL|quoted
format of a filelist file:
"LF" - filenames delimited by newlines; a filename cannot
contain newline characters;
"NUL" - filenames delimited by the NUL (0) byte;
"quoted" - filenames possibly enclosed in single or double
quotes and delimited by newlines.
Default: "LF".
Compatible with: -f FILELIST
--force-stub-size=LENGTH
retain a segment of every migrated file starting from its
beginning and having a specified approximate length in bytes.
Conflicts with: --sync-metadata
--help display this help and exit.
--ignore-rmtlc always use local file names to deduce remote object names.
Default: read remote location xattrs to determine the names of
remote objects for migrated, premigrated, or metadata
synced files.
--lock-level=partial|implicit
DMAPI locking level:
"partial" - explicitly request a DMAPI shared access right
for the duration of migrating a file and
explicitly request exclusive DMAPI access rights
when updating file xattrs or stubbing a file;
"implicit" - instruct DMAPI to self-manage access rights per
file block when migrating and also self-manage
DMAPI access rights when updating file xattrs or
stubbing a file.
Default: "partial"; can be specified in the configuration file
or overridden by command line.
--log-format=json
log messages in JSON format.
Conflicts with: --log-target=syslog
--log-target=syslog
redirect all logging to the syslog.
Conflicts with: --log-format=json
--no-flock disable using lock files.
Sets lock level to "implicit" if it is not set.
Conflicts with "partial" lock level.
--no-recursion-remote
disable recursive interpretation of restriction and exclusion
extended glob patterns for remote object pathnames.
The recursive interpretation means matching sub-directories at
all nesting levels, whereas non-recursive interpretation means
matching a single directory.
Compatible with: -E, --endpoint; --endpoint-exclude
--no-stamp-live when premigrating a snapshot file, disable stamping the xattrs
of a live file and changing its status;
when metadata syncing a snapshot file, do not use information
from a live file to identify a remote object.
--overwrite-remote
overwrite remote objects if they already exist--do not create
remote object instances with various UUID suffixes.
Overwrite the metadata of already existing remote folders for
local directories migrated explicitly.
Conflicts with: --sync-metadata; --fail-on-mismatch
-P, --param-endpoint=ALIASES:PARAMETER=VALUE
add a parameter with name PARAMETER and value VALUE to
parameters read from configuration files for storage endpoints
with aliases specified by extended glob pattern ALIASES.
If PARAMETER already exists in a configuration file, it takes
a new VALUE.
--path-remote=PATH
--remote-path=PATH
migrate a single file, directory, or symbolic link to remote
object or folder PATH or migrate multiple files, directories,
or symbolic links to remote folder PATH (ending with `/').
Default: migrate files, directories, or symbolic links to
remote locations deduced from the local paths of the
files, directories, or symbolic links.
--perf-dstat=INTERVAL,FILE
record the following information about the running program to a
FILE every INTERVAL seconds: virtual memory size (in
megabytes), resident set size (in megabytes), thread count, and
the number of open file descriptors.
--perf-profile[=all]
dump cumulative times of executing various operations:
"all" - dump cumulative times for all operations executed at
least once (default: hide operations with
insignificant times).
-p, --premigrate
retain the content of every migrated file and do not set the
OFFLINE flag for the file.
Conflicts with: --sync-metadata
-r, --recursion-local
if program arguments specify directory names, process files in
those directories and their sub-directories recursively.
Default: process specified directories but not their content.
--skip-metadata-update
disable updating the metadata of a remote object if its UUID
and hash are equal to the UUID and hash of a local file.
Disable updating the metadata of folders and symbolic links.
Conflicts with: --sync-metadata
--sync-metadata update remote object or folder metadata based on the status of
local files, directories, or symbolic links.
Conflicts with: -p, --premigrate; --overwrite-remote;
--skip-metadata-update; --force-stub-size
--update-atime update the access time and status change time of local files to
"now" after successful migration.
--update-mtime update the modification time and status change time of local
files to "now" after successful migration.
-v, --verbose[=LEVEL]
verbosity level:
0 = error and warning messages (also used when this option
is absent);
1 = print the names of successfully migrated files (default);
2 = debug messages, excluding those related to file locking;
3 = enable core dump and debug messages related to file locking;
print PID and current time with microsecond precision.
-V, --version display version information and exit.
Exit Status¶
On successful completion, ngmigrate returns exit status 0. If ngmigrate was called to migrate files, successful completion means that all files were migrated successfully, and there were no warning messages printed.
On unsuccessful completion, ngmigrate returns exit status 1. This means that none of the files were migrated successfully.
On partially successful completion, ngmigrate returns exit status 2. This means either that some files were migrated successfully and some were not, or that all files were migrated successfully but ngmigrate printed one or more warning messages.
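A calling script can branch on these three exit statuses. The following is a minimal sketch; the function names describe_ngmigrate_status and run_ngmigrate are hypothetical helpers, not part of Ngenea.

```shell
# Map an ngmigrate exit status to a short label, following the rules above:
# 0 = all files migrated without warnings, 1 = no file migrated,
# 2 = some files failed, or all succeeded but warnings were printed.
describe_ngmigrate_status() {
    case "$1" in
        0) echo "success" ;;
        1) echo "failure" ;;
        2) echo "partial" ;;
        *) echo "unknown" ;;
    esac
}

# Hypothetical wrapper: run ngmigrate with the given arguments, report the
# outcome on stderr, and return the original exit status to the caller.
run_ngmigrate() {
    status=0
    ngmigrate "$@" || status=$?
    echo "ngmigrate finished: $(describe_ngmigrate_status "$status")" >&2
    return "$status"
}
```

For example, run_ngmigrate /mmfs1/data/file1 /mmfs1/data/file2 reports "partial" if only one of the two files could be migrated.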
Examples¶
To migrate a file to the associated Storage Target:
ngmigrate /mmfs1/data/file1
To migrate a file to a storage target, using a custom configuration file (which may redefine the storage target) to replace the default configuration file:
ngmigrate --config-file=/path/to/custom.conf /mmfs1/data/file1
To migrate multiple files to their associated Storage Targets:
ngmigrate /mmfs1/data/file1 /mmfs1/data/file2
To migrate all files in the directory /mmfs1/data/ whose names start with file to the associated Storage Target:
ngmigrate /mmfs1/data/file*
To migrate all files in the directory /mmfs1/data/ whose names start with file or newfile to the associated Storage Target:
ngmigrate /mmfs1/data/file* /mmfs1/data/newfile*
To migrate all files, except hidden ("dot") files, within a directory to the associated Storage Target:
ngmigrate /mmfs1/data/*
To migrate all files, including hidden ("dot") files, within a directory to the associated Storage Target:
ngmigrate /mmfs1/data/{.??,}*
To migrate all files, except hidden ("dot") files, within two different directories to their associated Storage Targets:
ngmigrate /mmfs1/data/dir1/* /mmfs1/data/dir2/*
Handling a Too Long List of Arguments¶
If there are too many files in a directory, invoking ngmigrate to process files in the directory using a glob pattern may fail with the "Argument list too long" error.
For example, if the directory /mmfs1/data contains too many files, the following command fails:
$ ngmigrate /mmfs1/data/*
bash: /opt/arcapix/bin/ngmigrate: Argument list too long
In this situation, a user can invoke ngmigrate to process all files in the directory recursively by passing the option -r and a directory name instead of passing a glob pattern, for example:
ngmigrate -r /mmfs1/data
In this case, ngmigrate will process all files in the directory /mmfs1/data and all its descending subdirectories.
If ngmigrate encounters files or directories with duplicate dev/ino pairs, it processes only the instance of each file or directory that it finds first.
To process files matching a glob pattern in a directory that is too large, a user can use the standard find and xargs commands, for example:
find /mmfs1/data -name '*.bin' -print0 | xargs -0 -n100 ngmigrate
This command scans the directory /mmfs1/data and all its descending subdirectories, finds files with names matching the glob pattern *.bin, and executes ngmigrate, passing those file names as its arguments.
For every ngmigrate invocation, xargs passes no more than 100 arguments, so the "Argument list too long" error does not occur.
Alternatively, a user can pass a long list of files via a filelist, for example:
find /mmfs1/data -name '*.bin' -print0 | ngmigrate --filelist-format=NUL -f-
This command makes ngmigrate read a NUL-separated list of files to migrate, piped from the find command.
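When the list of files should be inspected before migration, it can be written to a file first and passed with -f afterwards. The following is a minimal sketch; the function names build_filelist and count_filelist_entries and the filelist location are hypothetical.

```shell
# Write a NUL-delimited list of regular files under the directory "$1"
# whose names match the glob pattern "$2" to the filelist file "$3".
build_filelist() {
    find "$1" -type f -name "$2" -print0 > "$3"
}

# Print the number of entries in a NUL-delimited filelist file "$1"
# by keeping only the NUL bytes and counting them.
count_filelist_entries() {
    tr -cd '\000' < "$1" | wc -c | tr -d ' '
}
```

For example, after build_filelist /mmfs1/data '*.bin' /tmp/bin-files.list, the list can be checked with count_filelist_entries /tmp/bin-files.list and then migrated with ngmigrate --filelist-format=NUL -f /tmp/bin-files.list.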
Matching Storage Endpoints¶
The options --endpoint and --endpoint-exclude include or exclude storage endpoints.
For example, to migrate files to those Storage Targets which match an extended glob pattern:
ngmigrate --endpoint='awss3-*|fs-*' /mmfs1/data/dir1/*
Only storage endpoints which match the pattern will be taken into consideration.
To migrate files to those Storage Targets which do not match an extended glob pattern:
ngmigrate --endpoint-exclude='awss3-*|fs-*' /mmfs1/data/dir1/*
Only storage endpoints which don't match the pattern will be taken into consideration.
Premigrating from Snapshots¶
When directories and symbolic links located in snapshots are passed to ngmigrate, it uploads them to remote folders and objects with names deduced from the names of the corresponding "live" (i.e. located outside of any snapshots) directories and symbolic links.
When files located in snapshots are passed to ngmigrate and those files do not have remote location extended attributes for a storage endpoint, ngmigrate uploads them to remote objects at the storage endpoint with names deduced from the names of the corresponding "live" files.
Example:
ngmigrate -p /mmfs1/path/to/fileset1/.snapshots/snap9/subdir1/subdir2/file2.txt
The above command uploads the specified file to the remote object "path/to/fileset1/subdir1/subdir2/file2.txt" (possibly with a UUID extension). In the above example ".snapshots" is a snapshot directory within the fileset "fileset1", and "snap9" is a snapshot name. Ngenea removes the snapshot directory and name path components upon uploading.
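The path mapping in this example can be illustrated with a short sketch. It assumes that the snapshot directory is named .snapshots, as in the example, and the function name live_path_of is hypothetical. It only shows the removal of the snapshot directory and snapshot name components; how the rest of the remote object name is formed depends on the storage endpoint configuration and is not modelled here.

```shell
# Print the "live" path that corresponds to the snapshot path "$1" by
# removing the first "/.snapshots/<snapshot name>/" component pair.
# Paths without such a component are printed unchanged.
live_path_of() {
    printf '%s\n' "$1" | sed -E 's#/\.snapshots/[^/]+/#/#'
}
```

For the file in the example above, live_path_of prints /mmfs1/path/to/fileset1/subdir1/subdir2/file2.txt.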
When files located in snapshots have remote location extended attributes (APXrmtXX) for a storage endpoint, ngmigrate uploads them to remote objects at that storage endpoint with names deduced from the remote location extended attributes.
Pass the option --ignore-rmtlc to disable using the remote location extended attributes in this case and upload the files to remote objects with names deduced from the names of corresponding "live" files.
Determining the UUID of a Premigrated Snapshot File¶
The ngmigrate tool determines a UUID for premigrating a snapshot file by the following rules:
If a "live" file exists, is "normal" (online) or "premigrated", has the same fsid/ino/igen triple as the premigrated snapshot file, and also has an Ngenea APXguuid extended attribute, ngmigrate uses a UUID from the extended attribute.
Otherwise, if the premigrated snapshot file has an Ngenea APXguuid extended attribute, ngmigrate uses a UUID from the extended attribute.
Otherwise, ngmigrate randomly generates a UUID.
If a determined UUID is equal to the UUID metadata of a corresponding remote object, ngmigrate will replace the remote object with the premigrated file.
If a determined UUID is different from the UUID metadata of a corresponding remote object, ngmigrate will upload the premigrated file with a UUID extension in order to avert overwriting the remote object.
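The precedence of these rules can be summarised in a short sketch. It only models the decisions described above, using hypothetical function names and plain string arguments; it does not read extended attributes and is not how ngmigrate is implemented.

```shell
# Print where the UUID for a premigrated snapshot file comes from.
#   $1 - "yes" if a "live" file exists, is normal or premigrated, and has
#        the same fsid/ino/igen triple as the snapshot file; otherwise "no"
#   $2 - APXguuid value of the "live" file (empty if absent)
#   $3 - APXguuid value of the snapshot file (empty if absent)
uuid_source() {
    if [ "$1" = "yes" ] && [ -n "$2" ]; then
        echo "live"
    elif [ -n "$3" ]; then
        echo "snapshot"
    else
        echo "random"
    fi
}

# Print what happens to an existing remote object by default.
#   $1 - determined UUID, $2 - UUID metadata of the remote object
upload_action() {
    if [ "$1" = "$2" ]; then
        echo "replace"
    else
        echo "upload-with-uuid-extension"
    fi
}
```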
If local and remote paths must match exactly after premigrating files from a snapshot, pass the --overwrite-remote option so that ngmigrate overwrites the object in the storage endpoint when it encounters a collision.
Users of this option should understand their data workflow, and it is advisable to enable any versioning support in the storage endpoint in case data is overwritten inadvertently.
Stamping a "Live" File¶
If a "live" file exists, is "normal" (online) or "premigrated", and has the same fsid/ino/igen triple as a premigrated snapshot file, then on successful premigration of the snapshot file, ngmigrate sets Ngenea extended attributes for the "live" file and changes its status from "normal" to "premigrated".
Setting Ngenea extended attributes for the "live" file and changing its status from "normal" to "premigrated" can be disabled by passing the option --no-stamp-live to ngmigrate.
Limitations¶
Migration failures may be observed when using the ngmigrate tool due to violation of one or more of the following limitations:
Maximum DMAPI Xattr Value Length¶
Ngenea stores information about remote objects corresponding to a local migrated (i.e. stub or premigrated) file in its DMAPI extended attributes. DMAPI extended attributes must be accessed by DMAPI-specific functions, and the standard commands getfattr and setfattr for manipulating extended attributes cannot access them.
PixStor limits the value of a DMAPI extended attribute of a file to 1022 bytes. The remote location string of a migrated file, which is stored in a DMAPI extended attribute, therefore cannot exceed 1022 bytes. This limitation restricts the length of the name part of a migrated file that is stored in its DMAPI extended attribute to record a remote location.
On an attempt to migrate a file with a remote location string longer than 1022 bytes, ngmigrate will report an error and will not migrate the file to that particular remote location.
On migrating files to any storage target, except for the filesystem storage target, ngmigrate percent-encodes special characters in the names of remote objects corresponding to local files (unless this mode is disabled by the configuration parameter EscapeNames=false, if that parameter is available).
Therefore, if the name of a local file contains special characters, the limit of 1022 bytes may be violated even for a short file name.
Where files with very long names have several common path prefixes, additional storage endpoints can be configured for such path prefixes to make name parts of those files stored in the DMAPI extended attributes shorter. This approach also shortens the names of remote objects corresponding to local files, which helps to avoid violating a restriction on maximum object name length for a particular storage target.
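The effect of percent-encoding on name length can be estimated with a short sketch. Which characters ngmigrate treats as special is not stated here, so the sketch assumes that every byte outside A-Z, a-z, 0-9 and the characters . _ ~ / - becomes a three-byte %XX sequence. The function names are hypothetical, and the result covers only the name passed in, not any other content of the remote location string.

```shell
# Print the estimated length in bytes of the name "$1" after percent-encoding.
# ASSUMPTION: bytes outside A-Za-z0-9._~/- are each encoded as %XX (3 bytes).
encoded_length() {
    total=$(printf '%s' "$1" | wc -c | tr -d ' ')
    safe=$(printf '%s' "$1" | LC_ALL=C tr -cd 'A-Za-z0-9._~/-' | wc -c | tr -d ' ')
    echo $(( $total + 2 * ($total - $safe) ))
}

# Succeed if the estimated encoded length of "$1" is within 1022 bytes.
fits_dmapi_limit() {
    [ "$(encoded_length "$1")" -le 1022 ]
}
```

A name made entirely of such special bytes triples in length under this assumption, so a 341-byte name already exceeds the limit.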
Maximum Object Name Length¶
Storage targets impose limits on the length of remote object and folder names. A long local file name or directory path when converted to a remote object name may exceed the allowed character limit of the storage target. Therefore, ngmigrate may fail to create the remote object or folder.
Additionally, ngmigrate may store the metadata of remote objects and folders in separate shadow metadata objects with names formed by prepending . and appending .xattr to an object or folder name.
Where the length of a remote object or folder name is close to the maximum allowed by the storage target, the shadow metadata object name, which includes the .xattr suffix, may exceed that maximum, and therefore the migration of the local file or directory fails.
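The extra length of a shadow metadata object name can be checked with a short sketch. It assumes that the . prefix and .xattr suffix are applied to a single name component, which adds seven bytes; the function names and the limit value passed by the caller are hypothetical.

```shell
# Print the shadow metadata object name for the object or folder name "$1".
# ASSUMPTION: "$1" is a single name component without a directory part.
shadow_name() {
    printf '.%s.xattr\n' "$1"
}

# Succeed if the shadow metadata object name for "$1" is no longer than
# "$2" bytes, where "$2" is the name length limit of the storage target.
shadow_fits() {
    len=$(printf '%s' "$(shadow_name "$1")" | wc -c | tr -d ' ')
    [ "$len" -le "$2" ]
}
```

For example, shadow_name file2.txt prints .file2.txt.xattr, which is 16 bytes long.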
Modifying Migrated Files¶
Modifying files containing DMAPI extended attributes set by Ngenea may remove the extended attributes from those files. Removing DMAPI extended attributes from a file deletes information about remote objects corresponding to the file. This deletion results in the inability to correctly perform operations that require this information, for example, recall the file.
Third-party tools that modify a file by creating a new file with updated content and renaming the new file using the same name as the original file may remove DMAPI extended attributes from the original file. If possible, such third-party tools should be configured to modify a file by writing to it directly, without creating a new file with updated content.
For example, Vim (Vi IMproved, a programmer's text editor) may require setting the option bkc to yes by issuing the command :set bkc=yes.
The Vim documentation describes that option as
'backupcopy' 'bkc' make backup as a copy, don't rename the file
The command set bkc=yes can be specified in the file .vimrc to set the default behavior.
ACL Support¶
Ngenea supports saving and restoring the ACLs of files and directories. This behaviour is particularly applicable to 'follow-the-sun' type workflows where security of data is enforced across global sites.
To enable saving ACLs of local files and directories in a storage endpoint via ngmigrate, specify the parameter ACLSave=true in the configuration file for the storage endpoint.
To disable restoration of remote ACLs to local files and directories on reverse stubbing/premigration, pass the option --no-restore-acl to ngrecall.
As with other metadata of local directories on reverse stubbing/premigration, ngrecall does not restore (i.e. overwrite) the ACLs of pre-existing local directories.
ACL Behaviour¶
Ngenea ACL behaviour differs depending upon the ACL support of the target endpoint type.
Object Store¶
Local file ACLs are saved to the metadata of remote objects
Local directory ACLs are saved as shadow metadata remote objects
GPFS Filesystem¶
Local ACLs are copied to the ACLs of files and directories in the storage endpoint
A GPFS filesystem supports either POSIX or NFS4 ACLs.
The option -k of mmchfs configures the ACL semantics of a GPFS filesystem.
Ngenea requires the ACL type configuration of a GPFS filesystem to be identical to that of the source GPFS filesystem in order to successfully restore the ACLs of local files and directories on reverse stubbing/premigration.
Where the ACL type of the destination filesystem differs from that stored in the endpoint, --no-restore-acl can be passed to ngrecall to skip restoration of incompatible ACL types and successfully recall the files and directories.
POSIX-compliant Filesystem¶
Local ACLs are saved to the metadata of remote objects and folders in the storage endpoint