Scanning
Use the ngscan tool to list (scan) remote objects in storage endpoints.
ngscan
Synopsis
ngscan ( -E RESTRICTION_ALIASES[:RESTRICTION_PATHS] |
--endpoint-exclude=EXCLUSION_ALIASES[:EXCLUSION_PATHS] )+
ngscan [-r] NAME1 ... NAMEn
ngscan [-r] [--filelist-format=LF|NUL|quoted] -f FILELIST
Description
Lists remote objects, folders, and symbolic links in storage endpoints.
Options
--all-obj-instances
list all object instances.
Default: list only latest object instances.
--base-path-type=retrieve|store
type of base path used at storage endpoints to list
remote objects:
"retrieve" - use the retrieve base path;
"store" - use the store base path.
Default: "retrieve".
--config-file=FILE
path to a master configuration file.
Default: /opt/arcapix/etc/ngenea.conf
--depth-remote=objects|objects-ext|immediates|infinity
recursion depth for matching restriction and exclusion extended
glob patterns for remote object pathnames:
"objects" - match objects at nesting level 0;
"objects-ext" - match objects and folder-like objects (that
have names ending with `/') or files and
directories at nesting level 0;
is equivalent to: --no-recursion-remote;
"immediates" - match objects at nesting level 0 and folders
at nesting levels 0 and 1;
"infinity" - match objects and folders at all
nesting levels.
Default: "infinity".
Compatible with: -E, --endpoint; --endpoint-exclude
-E, --endpoint=ALIASES[:PATHS]
restrict the set of storage endpoints for listing remote
objects to endpoints with aliases specified by extended glob
pattern ALIASES.
Optionally, restrict listed remote object pathnames at those
endpoints to pathnames matching extended glob pattern PATHS.
By default, restrict listed remote object pathnames to the
root path.
Compatible with: --depth-remote
--endpoint-exclude=ALIASES[:PATHS]
exclude remote object pathnames specified by extended glob
pattern PATHS at storage endpoints with aliases specified by
extended glob pattern ALIASES from listing.
By default, exclude remote object pathnames at the root path.
Compatible with: --depth-remote
--ent-type=STRING
list remote entities with specified types:
`f' - regular files;
`d' - directories;
`l' - symbolic links.
Separate these letters by `,' to specify multiple types.
Default: "f,d,l".
-f FILELIST process files and directories from a filelist file.
--filelist-format=LF|NUL|quoted
format of a filelist file:
"LF" - filenames delimited by newlines; a filename cannot
contain newline characters;
"NUL" - filenames delimited by the NUL (0) byte;
"quoted" - filenames possibly enclosed in single or double
quotes and delimited by newlines.
Default: "LF".
Compatible with: -f FILELIST
--format=STRING
output every line containing information about a remote object,
folder, or symbolic link according to a specified format
string; such string can contain the following
format specifiers:
%{FIELD_NAME} - print a specified field aligned to the
right using default width;
%-{FIELD_NAME} - print a specified field aligned to the
left using default width;
%WIDTH{FIELD_NAME} - print a specified field aligned to the
right in a column of specified width;
%-WIDTH{FIELD_NAME} - print a specified field aligned to the
left in a column of specified width.
File and object name fields:
"fln" - normalized local file name;
"fln_raw" or "fr" - local file name biuniquely
corresponding to an object;
"name" - decoded object name without a
base path prefix;
"name_raw" or "nr" - raw object name without a base
path prefix;
"name_full" or "nf" - full decoded object name;
"name_full_raw" or "nfr" - full raw object name;
"name_no_uuid" or "nnu" - decoded object name without
a base path prefix and
UUID suffix;
"name_no_uuid_raw" or "nnur" - raw object name without
a base path prefix and
UUID suffix.
Standard file information fields:
"type" - remote entity type
(`f' - object, `d' - folder,
`l' - symbolic link);
"mode" - octal file mode;
"owner" - owner (user) name;
"group" - group name;
"size" - size in bytes.
Time fields:
"atime" - last access time;
"migtime" - migration time;
"mtime" - last modification time;
"ctime" - last status change time.
Other fields:
"row_idx" - index of a list row
before sorting;
"hash_sha512" - object content SHA-512 hash;
"storage_alias" or "sa" - alias of a storage endpoint;
"symlink_value" or "sv" - symbolic link value;
"uuid" - object UUID;
"marker" - marker to continue listing on
next invocation of the program.
Metadata elements:
"metadata.all.KEY" - native or shadow metadata
element with a specified key;
"metadata.native.KEY" - native metadata element with a
specified key;
"metadata.shadow.KEY" - shadow metadata element with a
specified key.
Default: "%-{name} %{size}".
--help display this help and exit.
--ignore-rmtlc never read remote location xattrs to determine the names of
remote objects for local files specified on the command line.
Default: read remote location xattrs when the program runs
with superuser privileges.
--json[=pretty] output log messages and information about remote objects,
folders, and symlinks in JSON format.
If the option argument "pretty" is present, produce indented
multiline output for JSON objects; otherwise, produce
single-line output for JSON objects.
Default: output log messages in ordinary text form and output
information about remote objects, folders, and
symlinks in text table form.
--list-shadow additionally list shadow metadata objects that have names
beginning with `.' and ending with `.xattr'.
--marker=STRING
continue listing from a marker returned in the `marker' field
by a previous program invocation; the `-E, --endpoint' and
`--endpoint-exclude' parameters should be the same as in that
invocation.
Requires: --all-obj-instances
Conflicts with: --depth-remote=infinity (default mode)
--marker-rows-hint=N
if possible, print a marker every N rows.
--max-rows=INT maximum number of rows to list.
Default: list all rows.
--no-header do not print column headers.
--no-recursion-remote
disable recursive interpretation of restriction and exclusion
extended glob patterns for remote object pathnames.
The recursive interpretation means matching sub-directories at
all nesting levels, whereas non-recursive interpretation means
matching a single directory.
Equivalent to: --depth-remote=objects-ext
-o FILE output a remote object list to a specified file.
Default: output to stdout.
-P, --param-endpoint=ALIASES:PARAMETER=VALUE
add a parameter with name PARAMETER and value VALUE to
parameters read from configuration files for storage endpoints
with aliases specified by extended glob pattern ALIASES.
If PARAMETER already exists in a configuration file, it takes
a new VALUE.
--perf-dstat=INTERVAL,FILE
record the following information about the running program to a
FILE every INTERVAL seconds: virtual memory size (in
megabytes), resident set size (in megabytes), thread count, and
the number of open file descriptors.
--perf-profile[=all]
dump cumulative times of executing various operations:
"all" - dump cumulative times for all operations executed at
least once (default: hide operations with
insignificant times).
--print-metadata
output all available native and shadow metadata of listed
remote objects, folders, and symlinks.
Implies the options: --print-native-metadata +
--print-shadow-metadata
--print-native-metadata
output all available metadata of listed remote objects,
folders, and symlinks except for metadata stored as
object content.
--print-shadow-metadata
output all available metadata of listed remote objects and
folders stored in shadow metadata objects.
--print-vendor-metadata
output vendor (storage-specific) metadata of listed remote
objects, folders, and symlinks.
-q, --quiet suppress normal output (to stdout).
Return exit status 0 if normal output would contain at least
one line describing a remote entity on condition that no
warnings were printed.
Return exit status 1 if normal output would be empty.
-r, --recursion-local
if program arguments specify directory names, process files in
those directories and their sub-directories recursively.
Default: process specified directories but not their content.
--skip-check-uuid
disable verifying that UUID-like suffixes in remote object
names are equal to UUID metadata of those remote objects.
If this verification is disabled, guessed file names
corresponding to remote object names may be incorrect (to
obtain a file name corresponding to a remote object name, its
UUID suffix has to be removed).
When determining remote object names for local file names
specified on the command line by reading their remote location
xattrs, disable verifying that the UUID xattr of a local file
and the UUID of a remote object fetched from its metadata
are equal.
--sort[=FIELD1,...,FIELDn]
sort an output remote object list by specified fields.
FIELDi is a field name (see the description of
`--format=STRING' option) optionally followed by `-' for
sorting in reverse order.
Default: "storage_alias,name_full".
--time-style=rfc3339
print time fields in RFC 3339 format with nanoseconds.
Example: "2020-01-15 14:56:57.234567890+03:00".
-u, --unique remove duplicate lines from an output remote object list.
-v, --verbose=LEVEL
verbosity level:
0 = remote object list entries and error and warning messages
(also used when this option is absent);
2 = debug messages;
3 = enable core dump;
print PID and current time with microsecond precision.
-V, --version display version information and exit.
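The exit status documented for `-q, --quiet' makes ngscan usable as a condition in shell scripts. The following is a minimal sketch, assuming a storage endpoint with the alias "fs" and a folder "path/to/dir" (both placeholders); the function name remote_has_entries and the NGSCAN override variable are hypothetical helpers, not part of ngscan:

```shell
# NGSCAN is a hypothetical override so the wrapper can be dry-run with a stub.
NGSCAN="${NGSCAN:-ngscan}"

# Succeeds if at least one remote entity matches ALIAS:PATH.
# Relies on the documented -q exit status: 0 = entries would be listed
# (and no warnings were printed), 1 = normal output would be empty.
remote_has_entries() {
    "$NGSCAN" --quiet -E "$1:$2"
}

# Usage:
#   if remote_has_entries fs path/to/dir; then
#       echo "remote entries found"
#   else
#       echo "nothing listed (or ngscan reported a problem)"
#   fi
```

Any non-zero status is treated as "nothing found" here, so a script that needs to tell an empty listing apart from a failure should inspect the status value itself.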
Examples
List all instances of remote objects in all storage endpoints described in the default master configuration file:
ngscan --all-obj-instances -E '*'
List only the latest instances of remote objects in all storage endpoints described in the default master configuration file:
ngscan -E '*'
Use a master configuration file "templates/master-fs.conf":
ngscan --config-file=templates/master-fs.conf -E '*'
List all remote objects in storage endpoints with the aliases "fs" and "awss3":
ngscan -E 'fs|awss3'
List all remote objects in storage endpoints with aliases containing the substring "s3" but not with the alias "remote_blackpearl_ds3":
ngscan -E '*s3*' --endpoint-exclude=remote_blackpearl_ds3
List remote objects recursively (i.e. including all sub-folders) in the folder "path/to/dir" in a storage endpoint with the alias "fs":
ngscan -E fs:path/to/dir
List remote objects recursively in the folders "path/to/dir" and "some/other/path" in a storage endpoint with the alias "fs":
ngscan -E 'fs:path/to/dir|some/other/path'
List remote objects recursively in the folder "dir" in all storage endpoints but exclude remote objects (recursively) in the folder "dir/subdir" in a storage endpoint with the alias "remote_blackpearl_ds3":
ngscan -E '*:dir' --endpoint-exclude=remote_blackpearl_ds3:dir/subdir
List remote objects non-recursively (i.e. not including remote objects in sub-folders) in the folder "path/to/dir" in storage endpoints with aliases containing the substring "s3":
ngscan -E '*s3*:path/to/dir/' --no-recursion-remote
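The -f FILELIST form from the synopsis takes local file names from a file instead of the command line. The following is a minimal sketch of preparing a NUL-delimited filelist, which tolerates file names containing newlines. The helper name make_filelist and the paths in the usage comment are placeholders, and `find -print0' is assumed to be available (GNU and BSD find support it):

```shell
# Hypothetical helper: write the regular files under a local directory
# to a NUL-delimited filelist.
#   $1 = local directory to walk
#   $2 = filelist to create (keep it outside $1 so it does not list itself)
make_filelist() {
    find "$1" -type f -print0 > "$2"
}

# Usage (paths are placeholders):
#   make_filelist /path/to/local/dir /tmp/files.list
#   ngscan --filelist-format=NUL -f /tmp/files.list
```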
Selecting Information Fields to Print
Print the fields: full remote object name ("nf") aligned to the left, file mode ("mode"), size ("size"), and last modification time ("mtime"):
ngscan --format='%-{nf} %{mode} %{size} %{mtime}' -E '*'
Print the fields: remote object name without a base path ("name") aligned to the left in a column of width 80 and remote object size ("size") in a column of width 16:
ngscan --format='%-80{name} %16{size}' -E '*'
Print remote object names without column header lines:
ngscan --format='%-{name}' --no-header -E '*'
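The width and alignment specifiers read much like printf field widths. The following sketch uses the shell's printf as a stand-in to preview a column layout before choosing widths; it does not invoke ngscan, the sample values are invented, and it assumes ngscan pads columns with spaces the way printf does:

```shell
# Stand-in for --format='%-20{name} %8{size}':
# name left-aligned in 20 columns, size right-aligned in 8 columns.
# preview_row is a hypothetical helper; the arguments are sample data.
preview_row() {
    printf '%-20s %8s\n' "$1" "$2"
}

preview_row "dir/report.txt" 1024
preview_row "a.bin" 7
```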
Sorting
Sort by remote object size in ascending order:
ngscan --sort=size -E '*'
Sort by remote object size in descending order:
ngscan --sort=size- -E '*'
Sort by file owner in ascending order, then by size in descending order, then by name in ascending order:
ngscan --sort=owner,size-,name --format='%-{owner} %-{name} %{size}' -E '*'
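Descending sort combines naturally with standard tools to show only the largest objects. The following is a sketch under stated assumptions: largest_objects and the NGSCAN override variable are hypothetical helpers, and the list is trimmed with head after ngscan has sorted it:

```shell
# NGSCAN is a hypothetical override so the wrapper can be dry-run with a stub.
NGSCAN="${NGSCAN:-ngscan}"

# Print the N largest remote objects at endpoints matching ALIASES.
#   $1 = extended glob pattern of endpoint aliases
#   $2 = number of rows to keep (default 10)
largest_objects() {
    "$NGSCAN" --sort=size- --no-header --format='%{size} %-{nf}' -E "$1" |
        head -n "${2:-10}"
}

# Usage:
#   largest_objects '*' 5
```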
Removing Duplicate Lines
Print all distinct file owners:
ngscan -u --format='%-{owner}' -E '*'
Print all distinct file owner / file mode pairs:
ngscan -u --format='%-{owner} %{mode}' -E '*'
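Where -u yields the distinct values, counting how many entries each owner has requires post-processing outside ngscan. The following is a minimal sketch, assuming owner names contain no whitespace; count_by_owner is a hypothetical helper that reads one owner per line on standard input:

```shell
# Count lines per owner and print "COUNT OWNER", most frequent first.
# Padding added by the format specifier is ignored because awk splits
# on whitespace and only the first field is used.
count_by_owner() {
    awk 'NF { count[$1]++ } END { for (o in count) print count[o], o }' |
        sort -k1,1nr -k2,2
}

# Usage:
#   ngscan --no-header --format='%-{owner}' -E '*' | count_by_owner
```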