########
Cookbook
########

.. contents::
   :local:

========
Overview
========

This page presents a number of "recipes" showing how to achieve simple tasks using the REST API.
These are not supported as such, and are provided to give hints and ideas rather than to be used
as is.

================
Batch operations
================

Using a non-system user
-----------------------

Whilst many environments have an accepted "robot" user to perform batch operations, typically with
an obfuscated password, in some circumstances this is either not possible or not desirable. If you
are using the sample authentication server, you can achieve passwordless batch operations by
getting a token using a special user which is shared between your script and the authentication
server.

Firstly, add a user to the authentication server - you will need to restart the authentication
server after doing this. In this example we use a username of "specialuser", but it should be
chosen on a site-specific basis.

.. code-block:: Python

    from arcapix.config import config

    passwords = config.get("arcapix.search.server.authserver.passwords", {})
    passwords.update({'specialuser': 'specialpassword'})
    config['arcapix.search.server.authserver.passwords'] = passwords

.. warning:: You may not be able to read or write that configuration property on a properly
   configured system without amending the group memberships/filesystem permissions/ACLs to grant
   your script user access to it.

Then, utilise that user's credentials in your scripts:

.. code-block:: Python

    import requests
    from arcapix.config import config

    pwd = config.get("arcapix.search.server.authserver.passwords")['specialuser']
    authserverurl = config.get("arcapix.search.server.authserver.url")

    resp = requests.post(
        authserverurl + "oauth2/token",
        data={"grant_type": "password", "username": "specialuser", "password": pwd})

    token = resp.json()['access_token']

.. note:: This approach will only work if the user running the batch operation has access to the
   ``arcapix.search.server.authserver.passwords`` configuration key. In a properly configured
   system, such access will be limited to a specific group, quite possibly the ``apsearch``
   user only.

This example is available in the ``samples/cookbook`` directory, as ``batch_token.py`` and
``update_authserver.py``.

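Once obtained, the token is used as the username for HTTP basic authentication in subsequent
requests, as the later examples on this page do. The following is a minimal sketch, continuing
from the script above (which already imports ``requests`` and defines ``token``); the server URL
is illustrative and the query is borrowed from the command line example further down:

.. code-block:: Python

    # Use the token obtained above as the basic-auth username; the password is left empty.
    # The server URL is illustrative - substitute your own PixStor Search server.
    server = "https://mypixsearchserver/api"

    resp = requests.get(
        server + '/files/?where={"_all":"jpg"}&max_results=10',
        auth=(token, ''))
    resp.raise_for_status()

    # Print the raw collection+json items
    for item in resp.json()['collection'].get('items', []):
        print(item)
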
==============
Locating files
==============

Finding all the files which have changed recently
--------------------------------------------------

This is most easily done by searching for all files which have a last modification date since the
time you last checked. In essence:

.. code-block:: Python

    import requests
    from time import time, sleep

    def emitter(pathname):
        # Send pathname to your third party watch process.
        # If you are very concerned at never emitting the same pathname twice,
        # you can add logic to do that here.
        print(pathname)

    token = ...   # Follow method above or use system standardised robot user.
    server = ...  # e.g. https://mypixsearchserver/api

    since = time()

    while True:
        # Get all files which have been modified since 600 seconds before the last check
        data = requests.get(
            '%s/files/?where={"core.modificationtime": {"gte": %r}}' % (server, since - 600),
            auth=(token, '')
        ).json()

        # parse collection+json
        if 'items' in data['collection']:
            for file_ in data['collection']['items']:
                # locate the pathname of the file in the returned results
                pathname = [c['value'] for c in file_['data'] if c['name'] == 'core.pathname'][0]

                # Pass over to third party wrapper function
                emitter(pathname)

        since = time()

        sleep(300)  # Wait for 5 minutes and repeat

.. note:: There is the potential for some lag between files being modified and their making it
   into the database, hence the ``since - 600``, which allows for a small overlap.

This example is available in the ``samples/cookbook`` directory, as ``files_since.py``.

An alternative approach would be to write a custom plugin which will be called at ingest time.
This is more complex to achieve, but has the advantage of being more "realtime". However, please
note the following warning:

.. warning:: This approach is considered an abuse of the supported interface for a plugin.
   The code may, at some future time, be modified in ways that break such plugin use.

.. code-block:: Python

    from arcapix.search.metadata.plugins.base import Plugin

    def emitter(filename):
        # Send filename to your third party watch process
        print(filename)

    class ThirdPartyEmitterPlugin(Plugin):

        def namespace(self):
            return "_product_by_third_party"

        def schema(self):
            return []

        def handles(self, mimetype, extension):
            return True

        def is_async(self):
            return True

        def process(self, filename, fileinfo):
            emitter(filename)
            # This could also be done asynchronously by calling
            # self._submit(emitter, args=[filename])
            #
            # However, this is a fairly heavyweight process, so should not be used unless
            # the emitter wrapped function is long running (>1 minute or so)

By having ``handles`` and ``is_async`` both return ``True``, the plugin has maximum flexibility.
However, for performance reasons, you may wish to declare the plugin as only handling certain
filetypes, or, if possible, have ``is_async`` return ``False``.

This example is available in the ``samples/cookbook`` directory, as ``files_since_plugin.py``.

===========================
Access via the command line
===========================

PixStor Search can be queried on the command line using cURL, e.g.

.. code-block:: console

    $ curl 'https://mypixsearchserver/api/files/?where={"_all":"jpg"}&max_results=10&pretty' -H 'authorization: Basic ...'

This will return search results in C+J (Collection+JSON) format. See :doc:`/rest_api` for more
information on the REST interface.

Getting paths that match a query
--------------------------------

To get a list of paths for files which match a certain query, the results from cURL can be
processed as follows

.. code-block:: console

    $ curl ... | awk -F': ' '/core\.pathname/ {getline; gsub(/\"/, ""); print $2}'

Or using ``jq``

.. code-block:: console

    $ curl ... | jq -r '.collection.items[].data[] | select(.name=="core.pathname").value'

Or using the ``pxs_file_list`` tool from the ``arcapix-search-client-utils`` package.

The paths can then be piped to some other utility, e.g.

.. code-block:: console

    $ pxs_file_list --filter-by-field core.extension .tmp | xargs -I {} rm {}

In this example we are finding and deleting temporary (``.tmp``) files.

.. note:: This particular example only returns the first 10 results. If there are more than 10
   files matching a search, you will need to increase ``max_results``, or else iterate over pages
   of results.

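The same restriction applies when querying from Python. The following is a minimal sketch of
iterating over pages of results with ``requests``; it assumes the server accepts a ``page`` query
parameter alongside ``max_results``, and that an empty ``items`` list signals the final page -
check the REST API documentation for the pagination parameters your server actually supports:

.. code-block:: Python

    import requests

    token = ...   # obtained as described in the batch operations section above
    server = "https://mypixsearchserver/api"  # illustrative

    pathnames = []
    page = 1

    while True:
        # NOTE: the 'page' parameter is an assumption - consult the REST API
        # documentation for the exact pagination mechanism.
        data = requests.get(
            '%s/files/?where={"_all":"jpg"}&max_results=100&page=%d' % (server, page),
            auth=(token, '')
        ).json()

        items = data['collection'].get('items', [])
        if not items:
            break  # an empty page means we have seen all the results

        for item in items:
            # locate the pathname field, as in the earlier examples
            pathnames.append(
                [c['value'] for c in item['data'] if c['name'] == 'core.pathname'][0])

        page += 1

    print("%d matching files" % len(pathnames))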