4.2. Cloud Functions

Cloud to Hub is a function intended to run in the cloud. It runs in response to cloud storage events and triggers a job in Ngenea Hub to reflect the change onto another system (e.g. PixStor). This keeps multiple systems in sync, using cloud storage as the source of truth.

The code is designed to work with any of the supported platforms (see above), and any event (create, delete, …) with only changes to the config.json file, as described below.

Currently supports:

  • AWS

  • GCP

Coming soon:

  • Azure

4.2.3. Configuration

The configuration file is in JSON format.

4.2.3.1. version

Indicates the config format version. Currently, only 1.0 is supported.

4.2.3.2. hub_access

Defines settings for interacting with Ngenea Hub.

  • hub_ip: IP address of the Ngenea Hub REST API to use

  • hub_port: port for the Ngenea Hub REST API, typically 8000

  • hub_protocol: either http or https

  • api_key: API ‘client key’ used to authenticate with Ngenea Hub

  • workflow: name of the workflow to submit for the event file, e.g. reverse_stub

  • workflow_flags: mapping of settings to pass to the workflow, e.g. {"hydrate": true}
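
As an illustration of how the connection settings fit together, the sketch below combines hub_protocol, hub_ip and hub_port into a base URL. The function name hub_base_url is hypothetical and not part of Cloud to Hub. The sketch stops short of any REST endpoint or authentication header, since those are not described in this section.

```python
def hub_base_url(hub_access: dict) -> str:
    """Build the base URL of the Ngenea Hub REST API from a hub_access mapping."""
    protocol = hub_access["hub_protocol"]
    if protocol not in ("http", "https"):
        raise ValueError(f"hub_protocol must be http or https, got {protocol!r}")
    # hub_port is typically 8000, but it is always read from the config here
    return f"{protocol}://{hub_access['hub_ip']}:{hub_access['hub_port']}"
```

With the values from the complete example at the end of this section, this would yield http://192.168.0.1:8000.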

For new files, one would typically use the reverse_stub or recall workflow, both of which ship with Ngenea Hub by default. There is no default ‘delete’ workflow in Ngenea Hub, so one must be created, for example:

{
  "name": "delete_file",
  "label": "delete_file",
  "icon_classes": [
    "fa fa-cloud fa-stack-2x text-success",
    "fa fa-angle-up fa-stack-2x text-light"
  ],
  "discovery": null,
  "enabled": true,
  "visible": true,
  "fields": [],
  "filter_rules": [
    {
      "type": "all",
      "state": "all",
      "action": [
        {
          "name": "dynamo.tasks.delete_paths_from_gpfs",
          "recursive": false
        }
      ],
      "description": "Delete the file at a given path"
    }
  ]
}

4.2.3.3. sites

List of sites to reflect the events to.

  • site: name of the site, as registered in Ngenea Hub

  • default: (optional) default mode for ‘recall’ type workflows. One of stub, premigrate

  • skip_from_ngenea: (optional) if true, any file that was created via ngenea is skipped for this site

default is used if the event path doesn’t match any action (see below) and hydrate isn’t explicitly set in workflow_flags (see above).

skip_from_ngenea is useful when e.g. Ngenea Hub is being used to sync files between sites. Files which are being transmitted via the cloud bucket don’t need to be automatically recalled onto either site. On the other hand, files uploaded directly to the cloud still need to be recalled.

For GCP, we may not be able to determine the source of a delete event. In that case, deletes will always be reflected to all sites.
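
The site selection described above could be sketched as follows. This is a minimal illustration, not the actual implementation: the function name sites_for_event and the three-valued from_ngenea argument are hypothetical, and it assumes skip_from_ngenea is treated as false when omitted.

```python
from typing import Optional


def sites_for_event(sites: list, from_ngenea: Optional[bool]) -> list:
    """Return the names of the sites an event should be reflected to.

    from_ngenea is True if the file is known to come from ngenea, False if
    it is known not to, and None if the source could not be determined
    (e.g. a delete event on GCP).
    """
    selected = []
    for entry in sites:
        # Skip a site only when we positively know the file came from ngenea
        if entry.get("skip_from_ngenea", False) and from_ngenea is True:
            continue
        selected.append(entry["site"])
    return selected
```

Passing None for an event of unknown origin reflects it to every site, which matches the behaviour described for GCP deletes.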

4.2.3.4. actions

Mapping of actions – stub or premigrate – to path prefixes.

This is used to determine whether a file should be hydrated by a ‘recall’ type workflow. If a path matches multiple actions, the longest match wins. For example, using

{
    "stub": ["data"],
    "premigrate": ["data/cats"]
}

the path data/cats/cat-01.jpg would be premigrated, while data/cats-02.jpg would be stubbed.
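
A longest-match lookup of this kind could be sketched as below. Note that data/cats-02.jpg also begins with the characters data/cats, so the example above only holds if prefixes are compared on whole path components. The sketch makes that assumption explicit; the function name match_action is hypothetical.

```python
from typing import Optional


def match_action(path: str, actions: dict) -> Optional[str]:
    """Return the action whose prefix is the longest match for path, or None."""
    best_action, best_len = None, -1
    for action, prefixes in actions.items():
        for prefix in prefixes:
            stripped = prefix.rstrip("/")
            # Assumption: a prefix matches whole path components only,
            # so "data/cats" matches "data/cats/x" but not "data/cats-02.jpg"
            if path == stripped or path.startswith(stripped + "/"):
                if len(stripped) > best_len:
                    best_action, best_len = action, len(stripped)
    return best_action
```

A return value of None means no prefix matched, leaving the decision to the fallback settings.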

If no action is specified for a path, the site-specific default (see above) is used, unless that default is missing or hydrate is explicitly set in workflow_flags. If none of these settings are configured, the default behaviour is to stub.
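
One way to read this fallback chain is sketched below. The function name resolve_mode is hypothetical, and two points are assumptions rather than documented behaviour: that a matched action takes priority over an explicit hydrate flag, and that hydrate set to true corresponds to premigrate while false corresponds to stub.

```python
from typing import Optional


def resolve_mode(matched_action: Optional[str], workflow_flags: dict, site: dict) -> str:
    """Pick 'stub' or 'premigrate' for one file at one site."""
    if matched_action is not None:
        # Assumption: an explicit path match wins over everything else
        return matched_action
    if "hydrate" in workflow_flags:
        # Assumption: hydrate=True means premigrate, hydrate=False means stub
        return "premigrate" if workflow_flags["hydrate"] else "stub"
    # Fall back to the site default, and finally to stubbing
    return site.get("default", "stub")
```

For instance, resolve_mode(None, {}, {"site": "uk"}) falls through every check and returns "stub".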

4.2.3.5. optional

Optional settings

  • ngenea_prefix: prefix which maps a cloud path to a local path, e.g. with prefix /mmfs1 the cloud path data/cats-01.jpg is mapped to /mmfs1/data/cats-01.jpg. Default: ''

  • excludes: list of strings used to exclude paths. The strings are treated as sub-strings which can match anywhere in the (cloud) path string. Default: []

  • append_jobs: if true, tasks will be grouped under the same job id (per hour). If false, each task will get its own job. Default: false

  • verbose: set logging output to info level. Default: false

  • debug: set logging output to debug level. Takes precedence over verbose. Default: false
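
The sketch below illustrates how three of these settings might be applied. The helper names are hypothetical, and the WARNING level used when neither verbose nor debug is set is an assumption, as the default logging level is not stated here.

```python
import logging


def to_local_path(cloud_path: str, ngenea_prefix: str = "") -> str:
    """Map a cloud path to a local path by prepending ngenea_prefix."""
    if not ngenea_prefix:
        return cloud_path
    return ngenea_prefix.rstrip("/") + "/" + cloud_path.lstrip("/")


def is_excluded(cloud_path: str, excludes: list) -> bool:
    """True if any exclude string occurs anywhere in the cloud path."""
    return any(pattern in cloud_path for pattern in excludes)


def log_level(optional: dict) -> int:
    """Pick a logging level; debug takes precedence over verbose."""
    if optional.get("debug", False):
        return logging.DEBUG
    if optional.get("verbose", False):
        return logging.INFO
    return logging.WARNING  # assumption: level used when neither is set
```

Because excludes are plain sub-strings, an entry such as tmp would exclude data/tmp/cat.jpg but also data/tmpfile.jpg.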

4.2.3.6. vendor

The vendor that the function is being run on.

Currently supported values: AWS, GCP

4.2.3.7. vendors

Vendor-specific settings. Currently only used by AWS.

AWS

  • ngeneabackupuser: name of the user ngenea uses for AWS. Used to identify whether a file came from ngenea for skip_from_ngenea

4.2.3.8. Complete Example

{
    "version": 1.0,
    "hub_access": {
        "hub_ip": "192.168.0.1",
        "hub_port": 8000,
        "hub_protocol": "http",
        "api_key": "pixitmedia.123456",
        "workflow": "reverse_stub",
        "workflow_flags": {
            "hydrate": false,
            "overwrite": true
        }
    },
    "sites": [
        {
            "site": "uk",
            "default": "stub"
        }
    ],
    "actions": {
        "stub": [],
        "premigrate": []
    },
    "optional": {
        "ngenea_prefix": "",
        "excludes": [],
        "append_jobs": true,
        "verbose": true,
        "debug": true
    },
    "vendor": "AWS",
    "vendors": {
        "AWS": {
            "ngeneabackupuser": ""
        },
        "GCP": {},
        "Azure": {}
    }
}
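
Before deploying, it can help to check a config file against the constraints stated in this section. The following is a minimal sketch of such a check, not part of Cloud to Hub itself: the function name load_config and the choice of which keys to treat as required are assumptions.

```python
import json

SUPPORTED_VERSIONS = {1.0}          # only 1.0 is currently supported
SUPPORTED_VENDORS = {"AWS", "GCP"}  # currently supported vendor values
# Assumption: these top-level keys are treated as mandatory in this sketch
REQUIRED_KEYS = ("version", "hub_access", "sites", "vendor")


def load_config(text: str) -> dict:
    """Parse a config.json document and run basic sanity checks on it."""
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"missing config keys: {', '.join(missing)}")
    if config["version"] not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported config version: {config['version']!r}")
    if config["vendor"] not in SUPPORTED_VENDORS:
        raise ValueError(f"unsupported vendor: {config['vendor']!r}")
    return config
```

Calling load_config on the contents of config.json raises ValueError for an unsupported version or vendor and otherwise returns the parsed mapping.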