5.4. Custom Workflows
5.4.1. Defining workflows
It's possible to define custom workflows that use pre-defined rules as building blocks.
Note
Custom workflows are not currently exposed via the UI. Use the /api/workflows/ API endpoint to create custom workflows.
A workflow definition requires the following parameters:

Name | Description
---|---
name | The unique name for this workflow. For ease of submission against the API, this should not contain spaces.
label | The human-readable name for this workflow; it can contain spaces.
icon_classes | A list of icon classes used to represent the workflow in the UI. Font Awesome is useful here.
filter_rules | A list of rules to apply to provided files that match defined states. Described in more detail below.
fields | A list of runtime fields. Described in more detail below.
Additionally, you can optionally provide:

Name | Description
---|---
discovery | The discovery task the workflow should use by default; this can be either recursive or snapdiff.
discovery_options | A JSON object containing any additional options to pass to the workflow's default discovery. Described in more detail below.
5.4.1.1. Filter Rules
Filter rules are defined in JSON as a list of mappings, one per rule, applied to each matching file result once a discovery task is complete. If the workflow is called through the API with no discovery task provided, the rules are applied to whatever states are provided in the workflow input. Rules are evaluated serially. Each rule must contain the following:
Name | Description | Required
---|---|---
state | The state of a result provided by the discovery task for a given path, for example "processed" or "modified". More details can be found in the discovery section. | Yes
type | The type of result the rule will apply to. | Yes
action | A list of tasks to perform on files that match the state and type. | Yes
include | A list of globs applied to the provided files to limit actions to matching paths only. | No
exclude | A list of globs applied to the provided files. Described in more detail below. | No
ignore_site_includes | Whether to ignore any global includes defined on the site the workflow will run on. | No
ignore_site_excludes | Whether to ignore any global excludes defined on the site the workflow will run on. | No
These rules control which actions are performed on files based on the state they have been given, either by a specific discovery task such as snapdiff or in the initial input of a workflow. States allow direct control over the workflow applied to each file, enabling multiple workflow paths within the same job by using multiple rules bound to specific states, with additional control via include and exclude path rules.
Alongside rules bound to a state, there are two special states that rules can use: default and all.
A rule set cannot contain both default and all rules, but it is possible to have multiple rules of one type with different sets of include and exclude patterns to allow for more granular control.
Rules with default as their state and type perform their action on paths that have not been captured by any other rule within the rule set. This means that specific file states can be actioned differently, while paths that match no other rule are still actioned rather than ignored.
The other special rule type is a rule with the state and type of all. This rule performs its action on all paths, regardless of their provided type and state. This is an additional operation: if another rule also matches a path explicitly, multiple actions will be performed on that path, one for each matching rule in the rule set. Simple workflows are typically composed of a single rule with the state and type of all, as this simply processes all paths provided to it.
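The matching semantics described above can be sketched as follows. This is an illustrative model only, not the product's actual implementation:

```python
def matching_rules(rules, result_state, result_type):
    """Return the rules whose actions apply to one discovery result."""
    explicit = [
        r for r in rules
        if r["state"] == result_state and r["type"] == result_type
    ]
    # "all" rules always apply, in addition to any explicit matches.
    all_rules = [
        r for r in rules if r["state"] == "all" and r["type"] == "all"
    ]
    if explicit:
        return explicit + all_rules
    # "default" rules catch paths no explicit rule has captured.
    default_rules = [
        r for r in rules
        if r["state"] == "default" and r["type"] == "default"
    ]
    return default_rules + all_rules
```

A rule set with one explicit rule and one default rule would route a "modified" file to the explicit rule, while any other state falls through to the default rule.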
Within each rule, the action key must contain a list of actions to perform on the resulting files. These actions are performed serially. Each action is a mapping containing the following entries:
Name | Description | Required
---|---|---
name | The name of the task to run, e.g. dynamo.tasks.migrate. | Yes
site | The name of the site to run against. If this is not provided, the site given in the workflow call is used. | No
If a task has optional arguments, these can be passed as additional key:value pairs in the action's mapping.
As an example, we can define a generic rule that captures every type of file and state and sends it to a second site. This would be useful for a bulk move using the recursive discovery task to cover all files in the directories provided to the task:
{
"state": "all",
"type": "all",
"action": [
{
"name": "dynamo.tasks.migrate"
},
{
"name": "dynamo.tasks.reverse_stub",
"site": "london"
}
]
}
5.4.1.2. Runtime fields
A workflow needs to be able to accept parameters as it is submitted. Taking example #1 above, "london" shouldn't be hardcoded as the destination site, as that would mean a new workflow would need to be defined for each possible destination.
Instead, fields can be defined that must then be provided at workflow submission time. Fields are defined as a mapping with the following keys:

Name | Description | Required
---|---|---
name | The name of the field. | Yes
label | The friendly name for this field, used for presentation in the UI. | Yes
type | The type of the field, for example enum[site] or choices. | Yes
default | The default value for a runtime field. | No
The following is an example of a custom field definition for providing a site to an action step:
[
{
"name": "target_site",
"label": "Site to migrate to",
"type": "enum[site]"
}
]
[
{
"name": "target_site",
"label": "Site to migrate to",
"type": "enum[site]",
"default": "london"
}
]
If a default value is specified for a runtime field, it is used when no user input is given at submission time; otherwise the user input always takes precedence.
Back in the definition of an action step, any value prefixed with a * is treated as a field name, and the field's value is substituted in place of the literal string.
The following example modifies example #1 to use the custom field defined in example #3:
{
"state": "all",
"type": "all",
"action": [
{
"name": "dynamo.tasks.migrate"
},
{
"name": "dynamo.tasks.reverse_stub",
"site": "*target_site"
}
]
}
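The * substitution and default fallback described above might be modelled like this. This is an illustrative sketch, not the product's implementation; resolve_action is a hypothetical helper:

```python
def resolve_action(action, fields, user_input):
    """Resolve "*field" references in an action against runtime fields."""
    resolved = {}
    for key, value in action.items():
        if isinstance(value, str) and value.startswith("*"):
            field_name = value[1:]
            field = next(f for f in fields if f["name"] == field_name)
            # User input wins; otherwise fall back to the field default.
            resolved[key] = user_input.get(field_name, field.get("default"))
        else:
            resolved[key] = value
    return resolved
```

With the target_site field from example #3, submitting "dublin" substitutes it into the action's site value, while an empty submission falls back to the field's default of "london".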
So, a complete request to create a workflow that will process all file and state types with a dynamic "site" field will look like:
{
"name": "send_file",
"label": "Send files from one site to another",
"icon_classes": ["fa fa-cloud fa-stack-2x text-primary", "fa fa-refresh fa-stack-1x text-light"],
"filter_rules": [
{
"state": "all",
"type": "all",
"action": [
{
"name": "dynamo.tasks.migrate"
},
{
"name": "dynamo.tasks.reverse_stub",
"site": "*target_site"
}
]
}
],
"fields": [
{
"name": "target_site",
"label": "Site to migrate to",
"type": "enum[site]"
}
]
}
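A definition like the one above can be submitted to the /api/workflows/ endpoint from Python. This is a minimal sketch; the host is a hypothetical placeholder and any authentication your deployment requires is omitted:

```python
import json
import urllib.request

# Hypothetical host; adapt to your deployment and add any auth headers.
API_URL = "https://dynamo.example.com/api/workflows/"

workflow = {
    "name": "send_file",
    "label": "Send files from one site to another",
    "icon_classes": ["fa fa-cloud fa-stack-2x text-primary",
                     "fa fa-refresh fa-stack-1x text-light"],
    "filter_rules": [
        {
            "state": "all",
            "type": "all",
            "action": [
                {"name": "dynamo.tasks.migrate"},
                {"name": "dynamo.tasks.reverse_stub",
                 "site": "*target_site"},
            ],
        }
    ],
    "fields": [
        {"name": "target_site", "label": "Site to migrate to",
         "type": "enum[site]"}
    ],
}

request = urllib.request.Request(
    API_URL,
    data=json.dumps(workflow).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(request)  # uncomment to actually send
```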
The following is an example of a custom field definition for providing choices to an action step:
[
{
"name": "sync_policy",
"label": "sync_policy",
"type": "choices",
"choices": [
{
"label": "Newest",
"value": "newest"
},
{
"label": "Sourcesite",
"value": "sourcesite"
}
]
}
]
Choices support both string and integer values.
The following example uses the choices field in an action step:
{
"state": "all",
"type": "all",
"action": [
{
"name": "dynamo.tasks.migrate"
},
{
"name": "dynamo.tasks.reverse_stub",
"sync_policy": "*sync_policy"
}
]
}
So, a complete request to create a workflow that will process all file and state types with a static choices field will look like:
{
"name": "send_file",
"label": "Send files from one site to another",
"icon": "<span class='fa-stack'><i class='fa fa-cloud fa-stack-2x text-primary'></i><i class='fa fa-angle-right fa-stack-2x text-light'></i></span>",
"filter_rules": [
{
"state": "all",
"type": "all",
"action": [
{
"name": "dynamo.tasks.migrate"
},
{
"name": "dynamo.tasks.reverse_stub",
"sync_policy": "*sync_policy"
}
]
}
],
"fields": [
{
"name": "sync_policy",
"label": "sync_policy",
"type": "choices",
"choices": [
{
"label": "Newest",
"value": "newest"
},
{
"label": "Sourcesite",
"value": "sourcesite"
}
]
}
]
}
5.4.2. Running Workflows
Once a workflow has been defined, it can be run through the file browser by selecting files and directories and clicking the actions button. It is then possible to select the workflow you wish to call. This workflow call will not use a discovery task unless a directory is selected, in which case it will use the recursive discovery step.
This can also be performed via a POST request to /api/file/workflow. When called through the API, you have the option to provide a discovery step; discovery steps can expand the initial paths provided to them, either to recursively perform actions or to perform something like a file difference scan.
Name | Description | Type | Required
---|---|---|---
paths | A list of paths to perform the workflow against. These can be plain strings of absolute file paths, or JSON objects with the keys "path" and "state"; see the detailed example in example 7. | JSON List | Yes
site | The site to perform the workflow against. | String | Yes
fields | The runtime fields for the workflow. | String | Yes
discovery | The discovery phase to use for this workflow run; this will override any defaults. | String | No
job | The ID of a job that this workflow should be run within. | Integer | No
Following the example workflow defined above, you can call the workflow to recursively send all files within any paths
provided using the following POST to /api/file/workflow
:
{
"paths": [
"/mmfs1/data/project_one",
"/mmfs1/data/project_two"
],
"site": "london",
"workflow": "send_file",
"discovery": "recursive",
"fields": {
"target_site": "dublin"
}
}
This will now migrate all files within /mmfs1/data/project_one
and /mmfs1/data/project_two
and then recall them
at the site defined as dublin
.
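A run like the one above can be submitted from Python. This is a minimal sketch; the host is a hypothetical placeholder and any authentication your deployment requires is omitted:

```python
import json
import urllib.request

# Hypothetical host; adapt to your deployment and add any auth headers.
API_URL = "https://dynamo.example.com/api/file/workflow"

payload = {
    "paths": ["/mmfs1/data/project_one", "/mmfs1/data/project_two"],
    "site": "london",
    "workflow": "send_file",
    "discovery": "recursive",
    "fields": {"target_site": "dublin"},
}

request = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(request)  # uncomment to actually send
```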
If a more complex workflow has been defined that includes rules for specific states, the input paths can include this state information. This behaviour can only be used when no discovery step is provided. An example of such a rule set could be:
{
"name": "migrate_state",
"label": "Stateful file migration",
"filter_rules": [
{
"type": "all",
"state": "modified",
"action": [
{
"name": "dynamo.tasks.migrate"
}
]
},
{
"type": "all",
"state": "moved",
"action": [
{
"name": "dynamo.tasks.delete_paths_from_gpfs"
}
]
}
],
"discovery": null,
"fields": []
}
Here is a simple rule set that will migrate all paths provided with the state modified and delete all paths provided with the state moved. With this example workflow, you can perform a POST to /api/file/workflow with the following JSON:
{
"paths": [
{
"path": "/mmfs1/data/project_one",
"state": "modified"
},
{
"path": "/mmfs1/data/project_two",
"state": "moved"
}
],
"site": "london",
"workflow": "migrate_state",
"discovery": null,
"fields": {}
}
Using multiple state-based rules with different include and exclude path filters, you can achieve more complex behaviour in workflow calls with more fine-grained control.
5.4.2.1. Discovery Steps
Discovery steps can make large, complex bulk operations much more manageable, allowing you to provide a single path that expands to cover all of its contents, or to see time-based differences for a given path.
Note
If a workflow is submitted without a discovery task explicitly provided, it will default to the discovery task defined as the default during the workflow's creation, visible via the workflow's "discovery" attribute. To avoid this, it is possible to explicitly pass null as the discovery task via the API, skipping any discovery phase and additional processing on the provided paths; the actions specified by the rules are then performed without any additional checks.
Name | Description | Supported states
---|---|---
recursive | Performs a recursive expansion of the initially provided paths. Paths are expanded to cover all sub-files and directories, and the actions defined in the workflow's generic rules are performed against all resulting files. |
snapdiff | Performs a time-based file scan of an independent fileset since the last time a scan was performed. It retrieves all file differences between those moments in time, along with the state of each file. |
For more complex discovery steps such as snapdiff, there are defined states that files, directories and links can be in once the scan has completed. This allows more explicit control of files and states within a single call to a workflow.
If, for example, you want all results with the type file and the state created to be sent to another site, excluding any temporary files, a rule to cover that could be:
{
"state": "created",
"type": "file",
"exclude": ["*.temp"],
"action": [
{
"name": "dynamo.tasks.migrate"
},
{
"name": "dynamo.tasks.reverse_stub",
"site": "*target_site"
}
]
}
5.4.2.1.1. Discovery Options
Additional options can be passed to the discovery task by setting discovery_options
on the workflow.
The supported options are those described in Discovery Steps.
{
"discovery": "recursive",
"discovery_options": {
"skip_missing": false
}
}
Discovery options are used with the workflow's default discovery only. If a different discovery is set at runtime, the discovery options will be ignored. Discovery options cannot currently be overridden at runtime.
5.4.3. Includes and Excludes
Includes and excludes can be used to select paths that individual filter rules should apply to, or generally limit which paths should be handled during a workflow run.
Include/exclude patterns behave like Unix shell pattern matching ('globbing'). The 'wildcard' asterisk character * will match any characters within a string. Patterns must match whole paths; partial matches are not supported, except through the use of wildcards.
Includes and excludes are combined as "a path matching any includes and not any excludes". For example {"include": ["/mmfs1/data/*"], "exclude": ["*.tmp"]}
would match only files in /mmfs1/data
, but not files in that directory with the .tmp
extension. If no includes are defined, then all files are considered included (unless explicitly excluded).
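The "matching any includes and not any excludes" combination can be sketched with Python's fnmatch, which implements shell-style globbing. This is an illustrative model, not the product's implementation; path_selected is a hypothetical helper:

```python
from fnmatch import fnmatch

def path_selected(path, includes, excludes):
    """A path is selected when it matches any include (or no includes
    are defined) and does not match any exclude."""
    included = not includes or any(fnmatch(path, p) for p in includes)
    excluded = any(fnmatch(path, p) for p in excludes)
    return included and not excluded
```

With {"include": ["/mmfs1/data/*"], "exclude": ["*.tmp"]}, a file under /mmfs1/data is selected unless it ends in .tmp, and paths outside /mmfs1/data are never selected.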
There are three places where path include and exclude patterns can be defined:
on a site
within a filter rule
at runtime, when a workflow is submitted
Site patterns can be used to apply includes and excludes globally, to all workflows and workflow steps. If defined, these will be appended to any patterns defined within a filter rule or at runtime.
For example, if a rule defines {"exclude": ["*.tmp"]}
and the site defines {"exclude": ["*.cache"]}
, then the combined excludes for that rule would be {"exclude": ["*.tmp", "*.cache"]}
If not desired, this behaviour can be overridden by specifying ignore_site_includes
/ ignore_site_excludes
either on a per-rule basis, or for all rules by passing those parameters when submitting a workflow run.
For workflows involving multiple sites, such as send and sync, only the primary (source) site patterns will be considered.
If includes and excludes are passed when submitting a workflow, they will be applied to all filter rules within the workflow, replacing any patterns already defined within the workflow rules. Any site patterns will be appended to the runtime patterns, unless 'ignore' is specified.
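The precedence described in this section, with rule patterns replaced by runtime patterns and site patterns appended unless ignored, might be modelled as follows. This is an illustrative sketch, not the product's implementation; effective_excludes is a hypothetical helper (includes combine the same way):

```python
def effective_excludes(rule, site, runtime=None, ignore_site=False):
    """Combine exclude patterns for one rule.

    Runtime patterns replace the rule's own patterns; site patterns are
    appended unless the ignore flag is set.
    """
    base = runtime if runtime is not None else rule.get("exclude", [])
    return base if ignore_site else base + site.get("exclude", [])
```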