4.2. Cloud Functions

Introduction

“Cloud to Hub” is a function that runs in the cloud and reacts to changes in cloud storage, such as files being added or deleted. When a change occurs, it notifies Ngenea Hub, which then syncs the change to other systems (such as PixStor). This keeps all systems up to date and consistent with the contents of cloud storage.

It works with any supported platform and can handle any type of event (like creating or deleting files). To set it up for a new platform or event, you only need to change the settings in the config.json file.

Note: A function is a specific task or action that a system or program can perform. In this case, the Cloud to Hub function reacts to changes in cloud storage and triggers actions to keep other systems in sync.

Why Use Cloud Functions?

Automation: Cloud Functions ensure that once something happens in the cloud (like a file upload), the correct action is taken automatically, without manual intervention.

Synchronization: Cloud Functions help keep different systems in sync with each other, making sure all systems know when a file has changed.

Scalability: The cloud-based nature of the function means it can handle large amounts of data and automatically scale as needed.

Supported Platforms:

4.2.3. Configuration

The configuration file config.json is the key to setting up Cloud Functions and defining how they should behave. This file is written in JSON, and it tells the system what actions to take when certain events occur. You will configure the settings to ensure everything works correctly.

Here are the most important sections of the configuration file:

4.2.3.1. version

The version field specifies the version of the configuration format. Currently, the only supported version is 1.0.

4.2.3.2. hub_access

To integrate and interact with Ngenea Hub, you need to configure a set of settings to establish communication with its REST API. These settings ensure that your system or application can properly authenticate, submit files, and trigger workflows for processing. Below is an explanation of each configuration setting, along with example values.

  • hub_ip : The IP address of the Ngenea Hub REST API. This identifies the location of the Ngenea Hub server within your network or on the internet. Example: hub_ip = "192.168.1.100"

  • hub_port : The port number used by the Ngenea Hub REST API. By default, the Ngenea Hub API may be set to use port 8000, but this can vary depending on your specific setup. Example: hub_port = 8000

  • hub_protocol : The protocol used for communicating with the Ngenea Hub API. You can choose between http (insecure) and https (secure). Example: hub_protocol = "https"

  • api_key: An API key used for authentication with the Ngenea Hub. This key ensures that your system has the necessary permissions to interact with the Hub. Example: api_key = "your-api-key-here"

  • workflow: The name of the workflow you wish to submit for processing an event file. Workflows define a set of tasks or steps that Ngenea Hub follows to process data.

    • reverse_stub: Typically used for new files, reversing or processing them in a specific way. Example: workflow = "reverse_stub"

    • recall: Another default workflow for recalling or processing files.

  • workflow_flags: A mapping of additional settings passed to the selected workflow to customize its behavior. Example: workflow_flags = {"hydrate": true}

  • Custom Workflows: By default, Ngenea Hub does not have a delete workflow. If you need to delete files, you must create a custom workflow to handle file deletion. As shown below, you might create a workflow named delete_file for this purpose:

{
  "name": "delete_file",
  "label": "delete_file",
  "icon_classes": [
    "fa fa-cloud fa-stack-2x text-success",
    "fa fa-angle-up fa-stack-2x text-light"
  ],
  "discovery": null,
  "enabled": true,
  "visible": true,
  "fields": [],
  "filter_rules": [
    {
      "type": "all",
      "state": "all",
      "action": [
        {
          "name": "dynamo.tasks.delete_paths_from_gpfs",
          "recursive": false
        }
      ],
      "description": "Delete the file at a given path"
    }
  ]
}
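
The connection settings in this section (hub_ip, hub_port, hub_protocol) together describe where the Ngenea Hub REST API lives. The following minimal Python sketch shows one way they could be combined into a base URL. The helper name hub_base_url is hypothetical and is not part of Ngenea Hub or Cloud Functions; the sketch deliberately does not send any request, because the API paths and the authentication header format are not described here.

```python
def hub_base_url(hub_access: dict) -> str:
    """Build a base URL from the hub_access settings (hypothetical helper)."""
    protocol = hub_access["hub_protocol"]
    # Only the two protocols named in this section are accepted.
    if protocol not in ("http", "https"):
        raise ValueError(f"hub_protocol must be 'http' or 'https', got {protocol!r}")
    return f"{protocol}://{hub_access['hub_ip']}:{hub_access['hub_port']}"


# Example values taken from the setting descriptions in this section.
hub_access = {
    "hub_ip": "192.168.1.100",
    "hub_port": 8000,
    "hub_protocol": "https",
    "api_key": "your-api-key-here",
    "workflow": "delete_file",  # the custom workflow defined above
    "workflow_flags": {},
}

print(hub_base_url(hub_access))  # https://192.168.1.100:8000
```

Rejecting unknown protocols early is a design choice of this sketch; it surfaces a typo in config.json before any event is submitted.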

4.2.3.3. Sites

In systems like Ngenea Hub, events (such as file updates or deletions) need to be reflected across multiple sites to maintain consistency.

  • Site Name (site): Each site in the Ngenea Hub system is identified by a name. This name is used to track and manage events on that particular site.

  • Default Mode for ‘Recall’ Workflows (default): When an event occurs, the system needs to decide how to recall or reflect that event across sites. If the event path does not match a specific action, the default mode is used.

  • The two available default modes are:

    • Stub: A placeholder or temporary record used when the system doesn’t have detailed information.

    • Premigrate: This mode prepares the system for future migration or change before the event is fully applied.

  • Skip from Ngenea:

    • If the system is syncing files between multiple sites, there may be situations where certain files should not be reflected or recalled on sites. This is where the skip_from_ngenea setting comes into play.

    • When skip_from_ngenea is set to true, files created or transferred via Ngenea Hub are skipped for recall. Files transferred through Ngenea Hub are already in sync between sites, so recalling them again is unnecessary.

    • However, if a file is uploaded directly to the cloud (outside of Ngenea Hub), it still needs to be reflected or recalled across the sites to ensure consistency.

  • Handling Delete Events:

    • Deleting a file or event can sometimes be tricky, especially when using cloud platforms like GCP (Google Cloud Platform). In these cases, it may be unclear where the file was deleted from, which can complicate the process of reflecting that delete action.

    • As a result, when a delete event occurs and the system cannot determine its origin, the system will reflect the delete on all sites to ensure that no inconsistent or outdated data remains across the sites.

By configuring these settings carefully, you can ensure that the right events are reflected accurately across all your sites, while avoiding unnecessary updates or recalls for already-synced data.
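
The site selection rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual implementation: the function name sites_for_event and the from_ngenea argument are hypothetical, skip_from_ngenea is assumed to be a per-site key that defaults to false, and from_ngenea is None when the origin of an event cannot be determined (as described for delete events).

```python
from typing import Optional


def sites_for_event(sites: list, from_ngenea: Optional[bool]) -> list:
    """Return the names of the sites on which an event should be reflected.

    from_ngenea is True or False when the origin of the event is known,
    and None when it cannot be determined.
    """
    if from_ngenea is None:
        # Unknown origin: reflect on every site so nothing is left stale.
        return [site["site"] for site in sites]
    selected = []
    for site in sites:
        # Assumption: skip_from_ngenea is set per site and defaults to False.
        if from_ngenea and site.get("skip_from_ngenea", False):
            continue  # already in sync through Ngenea Hub
        selected.append(site["site"])
    return selected


sites = [
    {"site": "uk", "default": "stub", "skip_from_ngenea": True},
    {"site": "us", "default": "premigrate"},
]

print(sites_for_event(sites, from_ngenea=True))   # ['us']
print(sites_for_event(sites, from_ngenea=False))  # ['uk', 'us']
print(sites_for_event(sites, from_ngenea=None))   # ['uk', 'us']
```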

4.2.3.4. Actions

This section maps actions (“stub” or “premigrate”) to specific path prefixes. These mappings determine whether a file should be included in a “recall” workflow, where the file is downloaded and prepared for use.

If a path matches multiple actions, the longest matching prefix will take priority. For example, consider the following configuration:

{
    "stub": ["data"],
    "premigrate": ["data/cats"]
}

In this case:

  • The path data/cats/cat-01.jpg will be premigrated, because data/cats is the longest matching prefix. This action prepares a file for migration to a different storage location, typically by downloading and preparing the file for a transition or backup.

  • The path data/cats-02.jpg will be stubbed, because the file sits directly under data rather than inside data/cats. It will remain as a placeholder without being fully downloaded.

If no specific action is defined for a given path, the default action for the site (as mentioned earlier) will be applied.

However, if the site has no default action, or if hydrate is explicitly specified in workflow_flags, the hydrate setting takes priority and overrides the default behavior. If no configuration is provided at all, the default action is to stub the file.

This system allows for more flexibility and precise control over how files are handled during the migration and recall processes.
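
The longest-prefix rule described in this section can be sketched in Python as follows. The function name resolve_action is hypothetical. The sketch assumes that prefixes are compared one path component at a time, which is what the example above implies (data/cats-02.jpg is not treated as being under data/cats). It leaves out the hydrate flag, since its exact mapping to an action is not specified here.

```python
from pathlib import PurePosixPath
from typing import Optional


def resolve_action(path: str, actions: dict, site_default: Optional[str] = None) -> str:
    """Pick the action whose prefix matches the most leading path components."""
    parts = PurePosixPath(path).parts
    best_action, best_len = None, -1
    for action, prefixes in actions.items():
        for prefix in prefixes:
            prefix_parts = PurePosixPath(prefix).parts
            # A prefix matches only on whole components, not on raw characters.
            if parts[:len(prefix_parts)] == prefix_parts and len(prefix_parts) > best_len:
                best_action, best_len = action, len(prefix_parts)
    if best_action is not None:
        return best_action
    # No prefix matched: use the site default, else fall back to stub.
    return site_default or "stub"


actions = {"stub": ["data"], "premigrate": ["data/cats"]}

print(resolve_action("data/cats/cat-01.jpg", actions))           # premigrate
print(resolve_action("data/cats-02.jpg", actions))               # stub
print(resolve_action("other/file.txt", actions, "premigrate"))   # premigrate
```

Comparing whole components rather than raw strings is what keeps data/cats-02.jpg out of the data/cats rule in this sketch.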

4.2.3.5. optional

Optional settings are additional settings that can be configured in the config.json file to customize the behavior of the system. While their use is not mandatory, they offer increased flexibility for those seeking to refine and optimize the system’s functionality. Below is a detailed overview:

  • ngenea_prefix : Allows you to map a cloud path to a local storage path. For example, if you have a file in the cloud at data/cats-01.jpg, but want it to appear in a specific folder on your local system, like /mmfs1/data/cats-01.jpg, this setting adds a prefix to the cloud file path. This ensures it matches your local storage path. By default, the setting is empty (''), meaning no mapping occurs unless you specify one.

  • excludes : Allows you to exclude specific paths or files from being processed. For example, if you want to ignore files in the logs/ folder, you can add logs to the excludes list. By default, this setting is an empty list ([]), meaning no exclusions are applied unless explicitly specified.

  • append_jobs : The append_jobs setting controls whether multiple tasks should be grouped under the same job ID. If set to true, it groups tasks that occur within the same hour under one job ID. If set to false, each task will be assigned its own unique job ID. By default, this setting is false, meaning tasks are handled separately unless modified.

  • verbose : The verbose setting controls the level of detail included in logs and messages. If set to true, info-level logs are displayed, which describe the system’s general activity. If set to false, only minimal log information is shown. The default value is false.

  • debug : The debug setting provides detailed logs for troubleshooting. When set to true, it displays debug-level logs, which contain in-depth technical information for resolving issues. If set to false, it does not show detailed debug information. The default value is false, meaning fewer details are shown unless enabled.

    • Note : If both verbose and debug are enabled, the debug logs will take priority and provide more detailed information.
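
The optional settings above can be illustrated with a short Python sketch. All three helper names are hypothetical, and several details are assumptions rather than documented behavior: the prefix is joined to the cloud path as a directory, excludes entries are matched as whole leading path components, and "minimal" logging is mapped to the WARNING level.

```python
import logging
import posixpath
from pathlib import PurePosixPath


def local_path(cloud_path: str, ngenea_prefix: str = "") -> str:
    """Map a cloud path to a local path by prepending ngenea_prefix."""
    # An empty prefix leaves the cloud path unchanged.
    return posixpath.join(ngenea_prefix, cloud_path.lstrip("/"))


def is_excluded(cloud_path: str, excludes: list) -> bool:
    """Return True if the path falls under any excluded prefix (assumed semantics)."""
    parts = PurePosixPath(cloud_path).parts
    for entry in excludes:
        entry_parts = PurePosixPath(entry).parts
        if entry_parts and parts[:len(entry_parts)] == entry_parts:
            return True
    return False


def log_level(optional: dict) -> int:
    """Choose a logging level; debug takes priority over verbose."""
    if optional.get("debug", False):
        return logging.DEBUG
    if optional.get("verbose", False):
        return logging.INFO
    return logging.WARNING  # assumption: "minimal" output means warnings and above


print(local_path("data/cats-01.jpg", "/mmfs1"))   # /mmfs1/data/cats-01.jpg
print(is_excluded("logs/app.log", ["logs"]))      # True
print(log_level({"verbose": True, "debug": True}) == logging.DEBUG)  # True
```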

4.2.3.6. vendor

This setting tells the system which cloud service is being used. Right now, there are two options:

  • AWS (Amazon Web Services)

  • GCP (Google Cloud Platform)

So, if you’re using AWS or GCP for your system, you would set this option to either “AWS” or “GCP” to let the system know which cloud platform it should work with.

4.2.3.7. vendors

This section contains settings that are specific to a particular cloud platform (vendor), such as AWS. Currently, these settings are only used for AWS.

  • AWS-specific setting - ngeneabackupuser:

    • The ngeneabackupuser is the name of the user account that Ngenea uses within AWS. It’s used to identify whether a file came from the Ngenea system when processing files.

    • This identification is helpful for situations where you might want to skip files that were already uploaded or created by Ngenea, especially when the skip_from_ngenea setting is enabled.

    • Essentially, this ensures that files associated with Ngenea are handled appropriately, preventing redundant actions or unnecessary processing.
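
As a rough illustration of how ngeneabackupuser might be used, the Python sketch below compares the user associated with an event against the configured account name. The function name is_from_ngenea and the event_user argument are hypothetical; how the user name is obtained from an AWS event is not covered here, and an empty ngeneabackupuser is assumed to mean that no file is treated as coming from Ngenea.

```python
def is_from_ngenea(event_user: str, config: dict) -> bool:
    """Return True if the event was produced by the configured Ngenea account."""
    backup_user = config.get("vendors", {}).get("AWS", {}).get("ngeneabackupuser", "")
    # Assumption: an empty setting disables the check entirely.
    return bool(backup_user) and event_user == backup_user


# "ngenea-backup" is a made-up account name used only for this example.
config = {"vendor": "AWS", "vendors": {"AWS": {"ngeneabackupuser": "ngenea-backup"}}}

print(is_from_ngenea("ngenea-backup", config))  # True
print(is_from_ngenea("alice", config))          # False
```

The result of such a check is what would feed the skip_from_ngenea decision for each site.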

4.2.3.8. Complete Example

{
    "version": 1.0,
    "hub_access": {
        "hub_ip": "192.168.0.1",
        "hub_port": 8000,
        "hub_protocol": "http",
        "api_key": "pixitmedia.123456",
        "workflow": "reverse_stub",
        "workflow_flags": {
            "hydrate": false,
            "overwrite": true
        }
    },
    "sites": [
        {
            "site": "uk",
            "default": "stub"
        }
    ],
    "actions": {
        "stub": [],
        "premigrate": []
    },
    "optional": {
        "ngenea_prefix": "",
        "excludes": [],
        "append_jobs": true,
        "verbose": true,
        "debug": true
    },
    "vendor": "AWS",
    "vendors": {
        "AWS": {
            "ngeneabackupuser": ""
        },
        "GCP": {},
        "Azure": {}
    }
}
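
Before deploying a configuration like the one above, it can help to check it against the constraints stated in this section. The Python sketch below is one possible pre-flight check, not a tool shipped with Ngenea Hub; the function name validate_config is hypothetical, and the checks cover only the values this section names explicitly (version 1.0, http or https, AWS or GCP, stub or premigrate). It assumes version is written as a JSON number, as in the example.

```python
import json

SUPPORTED_VERSION = 1.0
PROTOCOLS = {"http", "https"}
VENDORS = {"AWS", "GCP"}
DEFAULT_MODES = {"stub", "premigrate"}


def validate_config(config: dict) -> list:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    if config.get("version") != SUPPORTED_VERSION:
        problems.append("version must be 1.0")
    if config.get("hub_access", {}).get("hub_protocol") not in PROTOCOLS:
        problems.append("hub_access.hub_protocol must be http or https")
    if config.get("vendor") not in VENDORS:
        problems.append("vendor must be AWS or GCP")
    for index, site in enumerate(config.get("sites", [])):
        if "site" not in site:
            problems.append(f"sites[{index}] is missing a site name")
        # A missing default is allowed; files are then stubbed.
        if site.get("default", "stub") not in DEFAULT_MODES:
            problems.append(f"sites[{index}].default must be stub or premigrate")
    return problems


# A trimmed-down configuration, parsed from JSON text as config.json would be.
text = """
{
    "version": 1.0,
    "hub_access": {"hub_ip": "192.168.0.1", "hub_port": 8000, "hub_protocol": "http"},
    "sites": [{"site": "uk", "default": "stub"}],
    "vendor": "AWS"
}
"""
config = json.loads(text)
print(validate_config(config))  # []
```

Returning a list of problems instead of raising on the first one lets all mistakes in config.json be reported in a single pass.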