Utilities
The utils module provides convenience methods built on the GPFS C API.
utils provides more complex functionality than CLib’s other classes, which are thin wrappers over the functions available in the GPFS C API.
Note
Most methods require root permission
Description
Miscellaneous Functions
Filesystem Snapshot Identifier Convenience Functions
- arcapix.fs.gpfs.clib.utils.get_fsname_by_path(path)
Get the name of the filesystem a path belongs to.
e.g.
>>> get_fsname_by_path('/mmfs1/data')
'mmfs1'
- arcapix.fs.gpfs.clib.utils.get_snapname_by_path(path)
Get the name of the snapshot a path belongs to.
e.g.
>>> get_snapname_by_path('/mmfs1/.snapshots/snap1/data')
'snap1'
- arcapix.fs.gpfs.clib.utils.get_path_in_snapshot(path, snap, fileset=None)
Get the equivalent of a path within a given snapshot.
e.g.
>>> get_path_in_snapshot('/mmfs1/data', 'snapshot1')
'/mmfs1/.snapshots/snapshot1/data'
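The mapping performed by these helpers can be pictured with plain path arithmetic. The sketch below is not the library's implementation. It assumes snapshots are exposed under a `.snapshots` directory at the filesystem mount point (as in the examples above), takes the mount point as an explicit argument, and ignores the `fileset` parameter. The names `path_in_snapshot` and `snapname_from_path` are hypothetical.

```python
import posixpath

SNAPDIR = ".snapshots"  # assumed name of the snapshot directory


def path_in_snapshot(path, snap, mount_point):
    """Map a live path to its assumed location inside a snapshot."""
    rel = posixpath.relpath(path, mount_point)
    if rel == ".." or rel.startswith("../"):
        raise ValueError("%r is not under %r" % (path, mount_point))
    base = posixpath.join(mount_point, SNAPDIR, snap)
    return base if rel == "." else posixpath.join(base, rel)


def snapname_from_path(path, mount_point):
    """Return the snapshot name embedded in a path, or None if there is none."""
    parts = posixpath.relpath(path, mount_point).split("/")
    if len(parts) >= 2 and parts[0] == SNAPDIR:
        return parts[1]
    return None
```

The real functions need no mount point argument, since they can ask GPFS which filesystem a path belongs to. The explicit argument here only keeps the sketch free of any GPFS dependency.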
Directory Scan Convenience Functions
- class arcapix.fs.gpfs.clib.utils.scandir(path, snapName=None)
scandir is a directory iterator, similar to the one in the Python 3.5 stdlib, implemented using GPFS C lib.
scandir() is a generator version of os.listdir() that returns an iterator over files in a directory, and also exposes extra information (such as type and stat information).
When snapName is specified, the returned paths will be children of the specified snapshot’s directory, e.g.
>>> for i in scandir('/mmfs1/data', 'snap1'):
...     print(i.path)
/mmfs1/.snapshots/snap1/data
- Parameters
- Returns
iterator of
GpfsDirEntry
objects for given path
- class arcapix.fs.gpfs.clib.utils.GpfsDirEntry
Object representing a directory entry, as returned by scandir.
- inode(self) gpfs_ino64_t
Returns the inode number of the entry.
- name
Returns the name of the entry.
- path
Returns the full path of the entry.
- stat(self)
Returns stat_result for the entry. Result comes from arcapix.fs.gpfs.clib.file.stat().
- arcapix.fs.gpfs.clib.utils.listdir(path)
List the contents of a directory.
Like Python os.listdir(), implemented using GPFS C Lib. As with Python, this method follows symlinks.
The list is in arbitrary order. It does not include the special entries ‘.’ and ‘..’ even if they are present in the directory.
- Parameters
path (str) – path to a directory in a GPFS filesystem
- arcapix.fs.gpfs.clib.utils.walk(top, bool topdown=True, bool followlinks=False)
Walk a filesystem directory tree.
Like Python os.walk(), implemented using GPFS C Lib.
Note: unlike os.walk, clib walk doesn’t ‘see’ the .snapshots directory.
- arcapix.fs.gpfs.clib.utils.parallel_walk(root, mapfn, reducefn=<built-in function iadd>, workers=None)
Perform a parallel walk of a GPFS directory tree.
- Parameters
mapfn – function to call for each directory entry. Receives a GpfsDirEntry object.
reducefn – function to combine results from mapfn. Default = addition
workers – number of worker processes to spawn. Default = CPU count/2, up to a maximum of 8
Note
Requires root permission
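To make the roles of mapfn and reducefn concrete, here is a serial stand-in built on the standard library's `os.scandir`. It is a sketch of the map/reduce contract described above, not the library's parallel implementation. It assumes mapfn is called for every entry (files and directories alike) and that results are folded pairwise with reducefn. The name `serial_walk` is hypothetical.

```python
import operator
import os


def serial_walk(root, mapfn, reducefn=operator.iadd):
    """Serial stand-in: apply mapfn to every entry, fold results with reducefn."""
    result = None
    pending = [root]
    while pending:
        current = pending.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                value = mapfn(entry)
                # The first value seeds the result; later ones are combined.
                result = value if result is None else reducefn(result, value)
                if entry.is_dir(follow_symlinks=False):
                    pending.append(entry.path)
    return result


def count_files(entry):
    """Map function: 1 for regular files, 0 for everything else."""
    return 1 if entry.is_file(follow_symlinks=False) else 0
```

With the default in-place addition, a mapfn returning numbers produces a total, while in this sketch a mapfn returning lists produces one concatenated list. An empty tree yields None here; how the library handles that case is not stated in this section.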
Inode Scan Convenience Functions
- class arcapix.fs.gpfs.clib.utils.inode_iterator(fsName, snapName=None, prevSnap=None, fromInode=0, toInode=0)
inode_iterator is an iterator object that allows users to perform inode scans.
>>> for i in inode_iterator(...):
...     # do something
>>> iscan = inode_iterator(...)
>>> i = iscan.next()
>>> j = next(iscan)
It acts as a convenience for the various inodescan methods.
- Parameters
fsName (str) – Name of the Filesystem to be scanned
snapName (str) – Name of a snapshot within the named filesystem to scan
prevSnap – Name of a previous snapshot, older than snapName. If provided, only files that have changed since this snapshot will be returned. Pass None to return all inodes from fsName/snapName
fromInode (int) – The minimum inode number to scan from
toInode (int) – The maximum inode number to scan to. If not specified or 0, all inodes will be returned.
The fromInode and toInode parameters can be used to perform multi-threaded scans.
- Returns
iattr
namedtuples
- close(self)
Close the inode scan.
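One way to use fromInode and toInode for a multi-threaded scan is to cut the inode number space into contiguous chunks and give each worker its own inode_iterator. The helper below only does the arithmetic. It assumes toInode is an exclusive upper bound (suggested by the `max(inodes)+1` call in the Examples, but not stated outright) and that the caller already knows the highest inode number of interest. The name `split_inode_range` is hypothetical.

```python
def split_inode_range(start, stop, parts):
    """Split [start, stop) into up to `parts` contiguous (from, to) pairs."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    if stop <= start:
        raise ValueError("stop must be greater than start")
    total = stop - start
    parts = min(parts, total)  # never create empty chunks
    base, extra = divmod(total, parts)
    ranges = []
    lo = start
    for i in range(parts):
        # The first `extra` chunks take one additional inode each.
        hi = lo + base + (1 if i < extra else 0)
        ranges.append((lo, hi))
        lo = hi
    return ranges
```

Each resulting pair would then be passed as `fromInode` and `toInode` to a separate inode_iterator. Because `stop` must exceed `start`, no chunk ends at 0, which the parameter description treats as "no upper limit".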
Reset Times
- class arcapix.fs.gpfs.clib.utils.SetTimesError(message)
Exception raised by reset_times() when precheck is True and times cannot be changed.
- arcapix.fs.gpfs.clib.utils.reset_times(path, follow=True, precheck=True)
Reset the timestamps on a file on context exit
>>> with reset_times('/mmfs1/file'):
...     # do stuff with file
- Parameters
path (str) – path of the file whose times should be reset
follow (bool) – whether to follow symlinks
precheck (bool) – Pre-check if we will be able to reset times.
If this option is True and times can’t be changed, a SetTimesError will be thrown before any code is run inside the context. This ensures that the existing times are preserved.
If this option is False, then resetting times may fail silently.
(Re)setting times may fail, for example, if you aren’t the file owner or root.
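The save-and-restore pattern, including the precheck behaviour, can be sketched with the standard library alone. This is an illustration built on `os.stat` and `os.utime`, not the GPFS-based implementation. The names `restore_times` and `RestoreTimesError` are hypothetical stand-ins, and the precheck shown (re-applying the current times before the body runs) is an assumption about one reasonable way to test permissions.

```python
import contextlib
import os


class RestoreTimesError(Exception):
    """Hypothetical counterpart of SetTimesError for this sketch."""


@contextlib.contextmanager
def restore_times(path, precheck=True):
    """Restore a file's atime and mtime when the with-block exits."""
    st = os.stat(path)
    saved = (st.st_atime_ns, st.st_mtime_ns)
    if precheck:
        # Re-apply the current times up front; if this fails, the final
        # restore would fail too, so stop before the body runs.
        try:
            os.utime(path, ns=saved)
        except OSError as exc:
            raise RestoreTimesError("cannot set times on %s" % path) from exc
    try:
        yield
    finally:
        try:
            os.utime(path, ns=saved)
        except OSError:
            if precheck:
                raise
            # Without precheck, a failed restore is silently ignored.
```

Failing before the body runs matters because once the body has read or written the file, the original timestamps are already lost if they cannot be put back.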
ACL Convenience Functions
- arcapix.fs.gpfs.clib.utils.acl.get_ace_name(ace)
Get the user or group name associated with an ACE.
ACE is an entry returned by
arcapix.fs.gpfs.clib.acl.get_nfs4_acl()
Note
User and group name lookup is performed with pwd.getpwuid and grp.getgrgid. These may not work for identifying users and groups in an AD environment.
- Returns
tuple of (type, name) where type is one of (special, group, user)
- Raises
KeyError if the ACE id can’t be translated to a name
- arcapix.fs.gpfs.clib.utils.acl.append_nfs4_aces(pathname, aces)
Append one or more entries to the NFSv4 ACL for a path.
This is slightly more efficient than using arcapix.fs.gpfs.clib.acl.get_nfsv4_acl() and arcapix.fs.gpfs.clib.acl.put_nfsv4_acl(), since both steps are performed at the C level.
- Parameters
pathname (str) – path of file or directory whose ACL the entries are appended to.
aces – an
ace_v4
NFSv4 entry or list of entries to append
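Since get_ace_name raises KeyError when an id cannot be translated, callers that list ACLs may want to fall back to the numeric id instead of failing. The sketch below shows that fallback directly on the `pwd` and `grp` calls named in the Note above. `resolve_name` is a hypothetical helper, not part of the library, and it takes a plain numeric id rather than an ACE object.

```python
import grp
import pwd


def resolve_name(kind, ident):
    """Return (name, resolved) for a numeric user or group id."""
    try:
        if kind == "user":
            return pwd.getpwuid(ident).pw_name, True
        if kind == "group":
            return grp.getgrgid(ident).gr_name, True
    except KeyError:
        # No local passwd/group entry, e.g. an id served by a directory
        # service that these lookups cannot see.
        return str(ident), False
    raise ValueError("kind must be 'user' or 'group', got %r" % (kind,))
```

The boolean in the result lets a caller tell a real name apart from a numeric fallback, which can be useful when reporting unresolved ids in an AD environment.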
Examples
Walk the filesystem
>>> import os
>>> from arcapix.fs.gpfs.clib.utils import walk
>>>
>>> for root, dirs, files in walk("/mmfs1"):
...     for name in files:
...         print(os.path.join(root, name))
...     for name in dirs:
...         print(os.path.join(root, name))
/mmfs1/test
/mmfs1/data
/mmfs1/.policytmp
Get the filesystem a given path belongs to
>>> from arcapix.fs.gpfs import Filesystem
>>> from arcapix.fs.gpfs.clib.utils import get_fsname_by_path
>>>
>>> fs = Filesystem(get_fsname_by_path('/mmfs1/data'))
>>>
>>> print(fs.name)
mmfs1
Walk the filesystem for a given snapshot
>>> import os
>>> from arcapix.fs.gpfs.clib.utils import scandir
>>>
>>> def walk(root, snap):
...     for i in scandir(root, snap):
...         yield i.path
...         # recurse into the directory
...         if i.is_dir():
...             for d in walk(i.path, snap):
...                 yield d
...
>>> for i in walk('/mmfs1', 'snap1'):
... print(i)
...
/mmfs1/.snapshots/snap1/data
/mmfs1/.snapshots/snap1/.policytmp
Calculate the total size of temporary files on the filesystem
>>> import os
>>> from arcapix.fs.gpfs.clib.utils import scandir, inode_iterator
>>>
>>> # iterator of inode numbers for files that end '.tmp'
>>> def find_inodes(root):
...     for i in scandir(root):
...         if i.name.endswith('.tmp'):
...             yield i.inode()
...         # recurse into the directory
...         if i.is_dir():
...             for d in find_inodes(i.path):
...                 yield d
...
>>> # list of inode number of '.tmp' files
>>> inodes = list(find_inodes('/mmfs1'))
>>>
>>> # create iterator - use max and min to limit scope of scan
>>> itr = inode_iterator('mmfs1', fromInode=min(inodes), toInode=max(inodes)+1)
>>>
>>> # add up sizes of inodes in the inode list
>>> print(sum(x.ia_size for x in itr if x.ia_inode in inodes))
100421
Count files in a directory tree in parallel
>>> from arcapix.fs.gpfs.clib.utils import parallel_walk
>>>
>>> # define a map function to count files only
>>> def count_files(dirent):
...     if dirent.is_file():
...         return 1
...     return 0
...
>>> # perform a parallel directory tree walk
>>> count = parallel_walk('/mmfs1/data', count_files, workers=4)
>>>
>>> print(count)
1358926
Read a file without updating its atime
>>> import os
>>> from arcapix.fs.gpfs.clib.utils import reset_times
>>>
>>> print(os.stat('/mmfs1/hello.txt').st_atime)
1558002472
>>>
>>> with reset_times('/mmfs1/hello.txt'):
...     with open('/mmfs1/hello.txt', 'r') as f:
...         print(f.read())
...
hello world
>>>
>>> print(os.stat('/mmfs1/hello.txt').st_atime)
1558002472
Add a new entry to a file ACL
Grant read/write permission for the ‘admin’ group
Hint
This may be combined with arcapix.fs.gpfs.clib.utils.walk()
to add the new ACE to a directory tree, recursively.
>>> import grp
>>> from arcapix.fs.gpfs.clib.utils.acl import append_nfs4_aces
>>> from arcapix.fs.gpfs.clib.acl import ace_v4, AM_READ, AM_WRITE, AF_GROUP_ID
>>>
>>> # define the new entry
>>> gid = grp.getgrnam('admin').gr_gid
>>> ace = ace_v4(aceWho=gid, aceFlags=AF_GROUP_ID, aceMask=AM_READ|AM_WRITE)
>>>
>>> # append the new entry to the target file
>>> append_nfs4_aces('/mmfs1/test', ace)
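The Hint above suggests combining this with walk() to cover a whole directory tree. A minimal sketch of that combination follows. To stay runnable without GPFS, it takes the walk function and the per-path action as parameters and defaults to `os.walk`. It assumes the clib walk yields the same `(root, dirs, files)` tuples shown in the walk example. The name `apply_to_tree` is hypothetical.

```python
import os


def apply_to_tree(top, action, walk_fn=os.walk):
    """Call action(path) on top and on everything below it; return the count."""
    action(top)
    count = 1
    for root, dirs, files in walk_fn(top):
        for name in list(dirs) + list(files):
            action(os.path.join(root, name))
            count += 1
    return count


# On a GPFS filesystem this would presumably be used as:
#   apply_to_tree('/mmfs1/test', lambda p: append_nfs4_aces(p, ace), walk_fn=walk)
```

Passing the action as a callable keeps the traversal separate from the ACL change, so the same helper can be dry-run with `print` before any ACLs are modified.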