5.4. Custom Workflows

5.4.1. Defining workflows

Custom workflows can be defined using pre-defined rules as building blocks.

Note

Custom workflows are not currently exposed via the UI. Use the /api/workflows/ API endpoint to create custom workflows.

A workflow definition requires the following parameters:

Name

Description

name

The unique name for this workflow. For ease of submission against the API, this should not contain spaces.

label

The human-readable name for this workflow. This can contain spaces.

icon_classes

List of icon classes to represent the workflow in the UI. Font Awesome is useful here.

filter_rules

A list of rules to apply to provided files that match defined states. Described in more detail below.

fields

A list of runtime fields. Described in more detail below.

Additionally, you can optionally provide:

Name

Description

discovery

The discovery task the workflow should use by default. This can be either recursive or snapdiff.

discovery_options

A JSON object containing any additional options to pass to the workflow's default discovery. Described in more detail below.
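As a rough illustration of the parameters listed above, the following Python sketch checks a workflow definition before it is submitted. The helper name check_workflow_definition is hypothetical and not part of Ngenea Hub. It only mirrors the tables above, and the API may apply further validation of its own.

```python
# Keys the tables above list as required for a workflow definition.
REQUIRED_KEYS = ("name", "label", "icon_classes", "filter_rules", "fields")
# Discovery tasks named in this section.
KNOWN_DISCOVERY = ("recursive", "snapdiff")


def check_workflow_definition(definition):
    """Return a list of problems found in a definition (empty when none).

    Hypothetical helper: it mirrors the documented parameters only and
    does not call the API.
    """
    problems = [
        f"missing required key: {key}"
        for key in REQUIRED_KEYS
        if key not in definition
    ]
    # The name is recommended to be free of spaces for ease of API use.
    name = definition.get("name", "")
    if any(ch.isspace() for ch in name):
        problems.append("name should not contain spaces")
    # discovery is optional; null (None) is accepted as "no discovery".
    discovery = definition.get("discovery")
    if discovery is not None and discovery not in KNOWN_DISCOVERY:
        problems.append(f"unknown discovery task: {discovery}")
    return problems


if __name__ == "__main__":
    draft = {"name": "send file", "label": "Send files", "filter_rules": []}
    for problem in check_workflow_definition(draft):
        print(problem)
```

Returning a list of problems rather than raising on the first one lets a caller report everything that needs fixing in a single pass.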

5.4.1.1. Filter Rules

Filter rules are defined in JSON as a list of individual rules, each in a mapping format. When a discovery task is complete, each rule is applied to every matching file result. If the workflow is called through the API with no discovery task, rules are applied to the states provided in the workflow input.

Each rule is a mapping with the following keys:

Name

Description

Required

state

The state of a result provided by the discovery task for a given path, for example "processed" or "modified". More details are in the discovery section.

Yes

type

The type of result the rule applies to. The only valid types are: file|directory|symlink|all

Yes

action

A list of tasks to perform on files that match the state and type

Yes

include

A list of globs to apply to provided files, limiting actions to the files that match. Described in more detail below.

No

exclude

A list of globs to apply to provided files. Described in more detail below.

No

ignore_site_includes

Whether to ignore any global includes defined on the site the workflow will run on

No

ignore_site_excludes

Whether to ignore any global excludes defined on the site the workflow will run on

No

These rules control which actions are performed on files based on their state. A state is assigned either by a discovery task such as snapdiff, or in the initial input of a workflow. States give direct control over how the provided files are processed: a single job can follow multiple workflow paths by using multiple rules, each bound to a specific state, with further control through include and exclude path rules.

Alongside rules bound to a specific state, there are two special states that rules can use: default and all. A rule set cannot contain both default and all rules, but it can contain several rules of one of these types, each with different include and exclude rules, for more granular control.

Rules with default as their state and type perform their action on paths that have not been captured by any other rule in the rule set. This means that when specific file states need to be actioned differently, paths that match none of those rules are still actioned rather than ignored.

Rules with all as their state and type perform their action on all paths, regardless of type and state. This is an additional operation: if a path also matches another, explicit rule, the actions of every matching rule in the rule set are performed on that path. Simple workflows are typically composed of a single rule with the state and type of all, as this simply processes every path provided to it.
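One way to read the default and all semantics described above is sketched below in Python. This is an interpretation of the prose, not Ngenea Hub's implementation. The names rule_matches and select_rules are hypothetical, include and exclude filtering is left out, and the sketch assumes that a rule type of all matches any path type.

```python
def rule_matches(rule, state, path_type):
    """True when a rule's state and type both accept this path.

    Assumption: "all" acts as a wildcard for either key.
    """
    state_ok = rule["state"] in ("all", state)
    type_ok = rule["type"] in ("all", path_type)
    return state_ok and type_ok


def select_rules(rules, state, path_type):
    """Return the rules whose actions would apply to one path."""
    # Explicit rules and "all" rules apply whenever they match, so a path
    # can be actioned by more than one rule.
    matched = [
        rule for rule in rules
        if rule["state"] != "default" and rule_matches(rule, state, path_type)
    ]
    if matched:
        return matched
    # "default" rules only capture paths that no other rule captured.
    return [rule for rule in rules if rule["state"] == "default"]


if __name__ == "__main__":
    rules = [
        {"state": "created", "type": "file",
         "action": [{"name": "dynamo.tasks.migrate"}]},
        {"state": "default", "type": "default",
         "action": [{"name": "dynamo.tasks.reverse_stub"}]},
    ]
    for state, path_type in [("created", "file"), ("moved", "file")]:
        chosen = select_rules(rules, state, path_type)
        print(state, path_type, [rule["state"] for rule in chosen])
```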

Within each rule, the action key must contain a list of actions to perform on the matching file. These actions are performed serially. Each action must be a mapping containing the following:

Name

Description

Required

name

The name of the task to run, e.g. dynamo.tasks.migrate

Yes

site

The name of the site to run against. If not provided, the site provided in the workflow call is used.

No

queue

The name of the queue the task should run on. The queue must exist on the site the task will run against. If not provided, the queue provided in the workflow call is used, or the default queue if one is not provided.

No

If a task has optional arguments, these can be passed as additional key:value pairs in the action mapping.

As an example, we can define a generic rule that captures every file type and state and sends matching paths to a second site. This would be useful for a bulk move, using the recursive discovery task to cover all types of files in the directories provided to the task:

Example 1 - Send to london
{
    "state": "all",
    "type": "all",
    "action": [
        {
            "name": "dynamo.tasks.migrate",
            "queue": "highpriority"
        },
        {
            "name": "dynamo.tasks.reverse_stub",
            "site": "london"
        }
    ]
}

5.4.1.2. Runtime fields

A workflow needs to be able to accept parameters when it is submitted. Taking example #1 above, "london" should not be hardcoded as the destination site, as that would mean a new workflow would need to be defined for each possible destination.

Instead, fields can be defined, and their values are then provided at workflow submission time. Fields are defined as a mapping with the following keys:

Name

Description

Required

name

The name of the field.

Yes

label

The friendly name for this field, used for presenting in the UI

Yes

type

The type of the field, valid options are:

  • string - a free text field

  • int - a free text field that will be validated as an integer

  • bool - a checkbox

  • choices - A dropdown box representing a list of choices, populated from choices list of objects.

  • enum[enum_type] - A dropdown box representing a choice of option, populated from enum_type. enum_type can be one of the following

    • site - A list of all the sites Ngenea Hub has defined

    • queue - A list of all queues available on the selected site

  • list - a list of values of any scalar type

Yes

default

The default value for the runtime field

No

The following is an example of a custom field definition for providing a site to an action step:

Example 2 - Custom field definition
[
    {
        "name": "target_site",
        "label": "Site to migrate to",
        "type": "enum[site]"
    }
]
Custom field definition with default value
[
    {
        "name": "target_site",
        "label": "site to migrate to",
        "type": "enum[site]",
        "default": "london"
    }
]

If a default value is specified for a runtime field, that default is used when the workflow runs without user input for the field. If user input is given, the user input is always used.

Back in the definition of an action step, any value that is prefixed with a * will be used as a field name and the value replaced instead of a literal string.

The following example modifies example #1 to use the custom field defined in example #2:

Example 3 - Updated rule now using custom fields
{
    "state": "all",
    "type": "all",
    "action": [
        {
            "name": "dynamo.tasks.migrate"
        },
        {
            "name": "dynamo.tasks.reverse_stub",
            "site": "*target_site"
        }
    ]
}

So, a complete request to create a workflow that will process all file and state types with a dynamic "site" field will look like:

Example 4 - Full workflow request
{
    "name": "send_file",
    "label": "Send files from one site to another",
    "icon_classes": ["fa fa-cloud fa-stack-2x text-primary", "fa fa-refresh fa-stack-1x text-light"],
    "filter_rules": [
        {
            "state": "all",
            "type": "all",
            "action": [
                {
                    "name": "dynamo.tasks.migrate"
                },
                {
                    "name": "dynamo.tasks.reverse_stub",
                    "site": "*target_site"
                }
            ]
        }
    ],
    "fields": [
        {
            "name": "target_site",
            "label": "Site to migrate to",
            "type": "enum[site]"
        }
    ]
}
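The interaction between field defaults, submitted values and *-prefixed references can be illustrated with a short Python sketch. The function names resolve_fields and substitute are hypothetical, and the sketch assumes that a field with neither a submitted value nor a default is treated as an error. It illustrates the behaviour described above rather than the Hub's own code.

```python
def resolve_fields(field_defs, submitted):
    """Pick a value for each defined field: user input first, then default."""
    values = {}
    for field in field_defs:
        name = field["name"]
        if name in submitted:
            # User input always takes precedence over a default.
            values[name] = submitted[name]
        elif "default" in field:
            values[name] = field["default"]
        else:
            # Assumption: a field with no value at all is rejected.
            raise ValueError(f"no value provided for field {name!r}")
    return values


def substitute(action, values):
    """Replace any "*field_name" string value with that field's value."""
    resolved = {}
    for key, value in action.items():
        if isinstance(value, str) and value.startswith("*"):
            resolved[key] = values[value[1:]]
        else:
            resolved[key] = value
    return resolved


if __name__ == "__main__":
    fields = [{"name": "target_site", "label": "Site to migrate to",
               "type": "enum[site]", "default": "london"}]
    action = {"name": "dynamo.tasks.reverse_stub", "site": "*target_site"}
    print(substitute(action, resolve_fields(fields, {})))
    print(substitute(action, resolve_fields(fields, {"target_site": "dublin"})))
```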

The following is an example of a custom field definition for providing a list of choices to an action step:

Example 5 - Custom field definition
[
    {
        "name": "sync_policy",
        "label": "sync_policy",
        "type": "choices",
        "choices": [
            {
               "label": "Newest",
               "value": "newest"
            },
            {
               "label": "Sourcesite",
               "value": "sourcesite"
            }
        ]
    }
]

Choices support both string and integer values.

As with the site field above, any value in an action step that is prefixed with a * is treated as a field name and replaced with that field's value.

The following example uses the choices field in an action:

Example 6 - Updated rule now using custom fields
{
    "state": "all",
    "type": "all",
    "action": [
        {
            "name": "dynamo.tasks.migrate"
        },
        {
            "name": "dynamo.tasks.reverse_stub",
            "sync_policy": "*sync_policy"
        }
    ]
}

So, a complete request to create a workflow that will process all file and state types with a static choices field will look like:

Example 7 - Full workflow request
{
    "name": "send_file",
    "label": "Send files from one site to another",
    "icon": "<span class='fa-stack'><i class='fa fa-cloud fa-stack-2x text-primary'></i><i class='fa fa-angle-right fa-stack-2x text-light'></i></span>"
    "filter_rules": [
        {
            "state": "all",
            "type": "all",
            "action": [
                {
                    "name": "dynamo.tasks.migrate"
                },
                {
                    "name": "dynamo.tasks.reverse_stub",
                    "sync_policy": "*sync_policy"
                }
            ]
        }
    ],
    "fields": [
        {
            "name": "sync_policy",
            "label": "sync_policy",
            "type": "choices",
            "choices": [
               {
                  "label": "Newest",
                  "value": "newest"
               },
               {
                  "label": "Sourcesite",
                  "value": "sourcesite"
               }
            ]
        }
    ]
}

5.4.2. Running Workflows

Once a workflow has been defined, it can be run through the file browser by selecting files and directories and clicking the actions button, then selecting the workflow you wish to call. This workflow call will not use a discovery task unless a directory is selected, in which case it will use the recursive discovery step.

A workflow can also be run via a POST request to /api/file/workflow. When called through the API, you have the option to provide a discovery step. Discovery steps can expand the initial paths provided to them, either to perform actions recursively or to perform something like a file difference scan. The request accepts the following parameters:

Name

Description

Type

Required

paths

A list of paths to perform the workflow against. Each entry can be a string containing an absolute file path, or a JSON object with the keys "path" and "state". See example 10 for a detailed example.

JSON List

Yes

site

The site to perform the workflow against

String

Yes

queue

The queue to run workflow tasks on. The default queue will be used if not provided.

String

No

fields

The runtime fields for a workflow

JSON Object

Yes

discovery

The discovery phase to use for this workflow run, this will override any defaults

String

No

job

The ID of a job that this workflow should be run within

Integer

No

Following the example workflow defined above, you can call the workflow to recursively send all files within any paths provided using the following POST to /api/file/workflow:

Example 8 - Calling example workflow
{
    "paths": [
        "/mmfs1/data/project_one",
        "/mmfs1/data/project_two"
    ],
    "site": "london",
    "queue": "highpriority",
    "workflow": "send_file",
    "discovery": "recursive",
    "fields": {
        "target_site": "dublin",
    }
}

This will now migrate all files within /mmfs1/data/project_one and /mmfs1/data/project_two and then recall them at the site defined as dublin.
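For scripted submissions, the request body above can be assembled in Python. The sketch below only builds the request and does not send it. The base URL https://hub.example.com is a placeholder, build_workflow_request is a hypothetical helper, and authentication is deliberately left out because it depends on how the Hub is deployed.

```python
import json
import urllib.request


def build_workflow_request(base_url, payload):
    """Build (but do not send) a POST request for /api/file/workflow."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/file/workflow",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    payload = {
        "paths": ["/mmfs1/data/project_one", "/mmfs1/data/project_two"],
        "site": "london",
        "queue": "highpriority",
        "workflow": "send_file",
        "discovery": "recursive",
        "fields": {"target_site": "dublin"},
    }
    # Placeholder host; replace with the address of your own Hub.
    request = build_workflow_request("https://hub.example.com", payload)
    print(request.get_method(), request.full_url)
    # Sending would need whatever authentication headers your deployment
    # requires, added before calling urllib.request.urlopen(request).
```

Serialising with json.dumps also avoids hand-written JSON mistakes such as trailing commas, which the API would reject.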

If a more complex workflow has been defined that includes rules for specific states, the input paths can include this state information. This behaviour can only be used when no discovery step is provided. An example of such a custom rule set could be:

Example 9 - Workflow with state-based rules
{
    "name": "migrate_state",
    "label": "Stateful file migration",
    "filter_rules": [
        {
            "type": "all",
            "state": "modified",
            "action": [
                {
                    "name": "dynamo.tasks.migrate"
                }
            ]
        },
        {
            "type": "all",
            "state": "moved",
            "action": [
                {
                    "name": "dynamo.tasks.delete_paths_from_gpfs"
                }
            ]
        }
    ],
    "discovery": null,
    "fields": []
}

Here is a simple rule set that will migrate all paths provided with the state modified and will delete all paths provided with the state moved. With this example workflow provided you can perform a POST to /api/file/workflow with the following JSON:

Example 10 - Calling workflow with state data
{
    "paths": [
        {
            "path": "/mmfs1/data/project_one",
            "state": "modified"
        },
        {
            "path": "/mmfs1/data/project_two",
            "state": "moved"
        }
    ],
    "site": "london",
    "workflow": "migrate_state",
    "discovery": null,
    "fields": {}
}
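Because each entry in paths can be either a plain string or a mapping with "path" and "state", a client may find it useful to group entries by state before submitting them, for example to check that every state has a matching rule. The Python sketch below is one way to do that. The function name group_paths_by_state is hypothetical, and using None for entries without a state is an assumption made for this illustration.

```python
from collections import defaultdict


def group_paths_by_state(paths):
    """Group workflow input paths by their declared state.

    Plain string entries carry no state and are grouped under None.
    """
    grouped = defaultdict(list)
    for entry in paths:
        if isinstance(entry, str):
            grouped[None].append(entry)
        else:
            grouped[entry.get("state")].append(entry["path"])
    return dict(grouped)


if __name__ == "__main__":
    paths = [
        {"path": "/mmfs1/data/project_one", "state": "modified"},
        {"path": "/mmfs1/data/project_two", "state": "moved"},
    ]
    for state, members in group_paths_by_state(paths).items():
        print(state, members)
```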

Using multiple state-based rules with different include and exclude path filters, you can achieve more complex behaviour in workflow calls, with finer control.

5.4.2.1. Discovery Steps

Discovery steps make large, complex bulk operations much more manageable to call. They allow you to provide a single path that expands to cover all of its contents, or to find time-based differences for a given path.

Note

If a workflow is submitted without a discovery task explicitly provided, it will default to the discovery task defined as the default when the workflow was created, visible via the workflow's "discovery" attribute. To avoid this, explicitly pass null as the discovery task via the API. This skips the discovery phase and any additional processing of the provided paths, and the actions specified by the rules are processed without any additional checks.

Name

Description

Supported states

recursive

Performs a recursive expansion of the initially provided paths. Paths are expanded to cover all files and subdirectories beneath them, and the defined action for all the generic rules in a workflow is then performed against all resulting files.

all

snapdiff

Performs a time-based file scan on an independent fileset, covering the period since the last scan was performed. It retrieves all file differences between those two points in time, along with the state of each file.

created|updated|moved|deleted|all

For more complex discovery steps such as snapdiff, there are defined states that files, directories and links can be in once the scan has completed. This allows more explicit control over files and states within a single call to a workflow. For example, if you want all results with the type file and the state created to be sent to another site, without any temporary files, a rule to cover that could be:

Example 11 - Custom rule for filtering snapdiff discovery results
{
    "state": "created",
    "type": "file",
    "exclude": ["*.temp"],
    "action": [
        {
            "name": "dynamo.tasks.migrate"
        },
        {
            "name": "dynamo.tasks.reverse_stub",
            "site": "*target_site"
        }
    ]
}

5.4.2.1.1. Discovery Options

Additional options can be passed to the discovery task by setting discovery_options on the workflow. The supported options are those described in Discovery Steps.

Example 12 - Passing options to the discovery
{
    "discovery": "recursive",
    "discovery_options": {
        "skip_missing": false
    }
}

Discovery options are used with the workflow's default discovery only. If a different discovery is set at runtime, the discovery options are ignored. Discovery options cannot currently be overridden at runtime.

5.4.3. Includes and Excludes

Includes and excludes can be used to select paths that individual filter rules should apply to, or generally limit which paths should be handled during a workflow run.

Include/exclude patterns behave like unix shell pattern matching ('globbing'). The 'wildcard' asterisk character * will match any characters within a string. Patterns must match whole paths; partial matches are not supported, except through the use of wildcards.

Includes and excludes are combined as "a path matching any includes and not any excludes". For example {"include": ["/mmfs1/data/*"], "exclude": ["*.tmp"]} would match only files in /mmfs1/data, but not files in that directory with the .tmp extension. If no includes are defined, then all files are considered included (unless explicitly excluded).
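The "any includes and not any excludes" rule can be approximated with Python's standard fnmatch module, which implements unix shell-style matching against the whole string. This is a sketch of the documented behaviour, not the matcher the Hub uses, and path_selected is a hypothetical name. Note that fnmatch's * also matches the / separator, so treat this as an approximation when patterns are meant to stop at directory boundaries.

```python
from fnmatch import fnmatchcase


def path_selected(path, include=(), exclude=()):
    """True when path matches any include (or none are set) and no exclude."""
    # With no includes defined, every path counts as included.
    included = not include or any(fnmatchcase(path, p) for p in include)
    excluded = any(fnmatchcase(path, p) for p in exclude)
    return included and not excluded


if __name__ == "__main__":
    include = ["/mmfs1/data/*"]
    exclude = ["*.tmp"]
    for path in ["/mmfs1/data/report.txt",
                 "/mmfs1/data/scratch.tmp",
                 "/mmfs1/other/report.txt"]:
        print(path, path_selected(path, include, exclude))
```

Because a pattern has to match the whole path, an include of /mmfs1/data without a wildcard would not select /mmfs1/data/report.txt in this sketch.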

There are three places where path include and exclude patterns can be defined:

  • on a site

  • within a filter rule

  • at runtime, when a workflow is submitted

Site patterns can be used to apply includes and excludes globally, to all workflows and workflow steps. If defined, these will be appended to any patterns defined within a filter rule or at runtime.

For example, if a rule defines {"exclude": ["*.tmp"]} and the site defines {"exclude": ["*.cache"]}, then the combined excludes for that rule would be {"exclude": ["*.tmp", "*.cache"]}

If not desired, this behaviour can be overridden by specifying ignore_site_includes / ignore_site_excludes either on a per-rule basis, or for all rules by passing those parameters when submitting a workflow run.

For workflows involving multiple sites, such as send and sync, only the primary (source) site patterns will be considered.

If includes and excludes are passed when submitting a workflow, they will be applied to all filter rules within the workflow, replacing any patterns already defined within the workflow rules. Any site patterns will be appended to the runtime patterns, unless 'ignore' is specified.
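The precedence between rule, runtime and site patterns described in this section can be summarised in a small Python sketch. The function effective_patterns is hypothetical and models only the ordering stated above: runtime patterns replace rule patterns, and site patterns are appended unless ignored. It assumes that "not passed at runtime" is represented by None.

```python
def effective_patterns(rule_patterns, site_patterns,
                       runtime_patterns=None, ignore_site=False):
    """Combine include (or exclude) patterns for a single filter rule."""
    # Runtime patterns, when passed, replace those defined on the rule.
    chosen = rule_patterns if runtime_patterns is None else runtime_patterns
    combined = list(chosen)
    # Site patterns are appended unless the relevant ignore flag is set.
    if not ignore_site:
        combined.extend(site_patterns)
    return combined


if __name__ == "__main__":
    rule_excludes = ["*.tmp"]
    site_excludes = ["*.cache"]
    print(effective_patterns(rule_excludes, site_excludes))
    print(effective_patterns(rule_excludes, site_excludes,
                             runtime_patterns=["*.bak"]))
    print(effective_patterns(rule_excludes, site_excludes, ignore_site=True))
```

The same function would be applied once to includes and once to excludes, with ignore_site taken from ignore_site_includes or ignore_site_excludes respectively.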