5.6. Queues

When a system needs to handle many tasks (called jobs), it organizes them into different queues. A queue is like a waiting line where jobs sit until a worker is ready to run them.

By using custom job queues, you can control how and when certain jobs run. Each queue can have its own settings, like how many jobs it can run at the same time.

Here are some examples:

High Priority Queue: This queue is for important jobs that need to be finished quickly. It uses more threads (like more workers), so these jobs are handled faster.
Low Priority Queue: This queue is for jobs that are not urgent. It uses fewer threads, which means jobs here will take longer, but that’s okay because they’re not time-sensitive.
Transparent Recall Queue: This special queue is only for transparent recall jobs. It ensures these jobs aren’t delayed by other tasks, so they always get the attention they need.

In addition to these custom queues, there is always a default queue. If you don’t choose a specific queue when submitting a job, it will automatically go to the default queue.

5.6.1. Configuration

Queues are set up in a file called the worker configuration file, which is located at: /etc/ngenea/ngenea-worker.conf

To create a new queue, you simply add a new section to this file using the format: [Queue queue_name]

For example, to create a queue named highpriority that can run 20 jobs at the same time, you would add this:

[Queue highpriority]
threads = 20

Rules for Naming Queues:

You can use letters, numbers, underscores (_), and hyphens (-).
Spaces are not allowed in queue names.
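
The naming rules above can be checked before editing the configuration file. The following Python sketch is an illustration only: the function name is hypothetical, and the pattern is derived from the rules listed here rather than from the worker's own validation, which may be stricter.

```python
import re

# Pattern derived from the rules above: letters, numbers, underscores, hyphens.
# Assumption: ASCII characters only, and at least one character.
QUEUE_NAME_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_queue_name(name: str) -> bool:
    """Return True if name uses only the characters allowed in queue names."""
    return QUEUE_NAME_PATTERN.fullmatch(name) is not None


print(is_valid_queue_name("highpriority"))   # True
print(is_valid_queue_name("high priority"))  # False (contains a space)
```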

Each queue can have settings to control how it works. The most important one is:

- **threads**: This tells the system how many jobs it can run at the same time in that queue.

Inheriting Settings

If you don’t add any settings to a queue, it will inherit the default (global) settings from a special section in the same file called [settings].

Here’s an example:

[settings]
...
threads = 4

[Queue transparent_recall]

[Queue highpriority]
threads = 10

In this case:

The highpriority queue will run up to 10 jobs at once because it’s explicitly set.
The transparent_recall queue doesn’t specify a number of threads, so it uses the global setting, which is 4.
If there is no global threads setting at all, the system uses 10 threads by default.
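
The inheritance order described above (queue value, then the global [settings] value, then the built-in default of 10) can be sketched in Python. This is not how the worker itself reads the file; it only models the lookup order using the standard configparser module, and the function name is hypothetical.

```python
import configparser

# Sample configuration mirroring the example above (without the elided lines).
SAMPLE_CONFIG = """
[settings]
threads = 4

[Queue transparent_recall]

[Queue highpriority]
threads = 10
"""

BUILT_IN_THREADS = 10  # fallback stated above when no global value is set


def effective_threads(config_text: str) -> dict:
    """Map each custom queue name to the thread count it ends up with."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # Global value from [settings], or the built-in default if it is absent.
    global_threads = parser.getint("settings", "threads", fallback=BUILT_IN_THREADS)
    result = {}
    for section in parser.sections():
        if section.startswith("Queue "):
            name = section[len("Queue "):].strip()
            # A queue's own value wins; otherwise it inherits the global one.
            result[name] = parser.getint(section, "threads", fallback=global_threads)
    return result


print(effective_threads(SAMPLE_CONFIG))
# {'transparent_recall': 4, 'highpriority': 10}
```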

5.6.2. Functions

Job queues can be used for different types of tasks, called functions. Each queue supports a specific set of these functions.

Functions in Custom Queues

Custom queues (the ones you define yourself) can run the following functions:

worker – runs the main workflow tasks
discovery – runs discovery tasks like recursive scans or snapdiff
custom – runs tasks from custom plugins

Functions in the Default Queue

The default queue, which always exists, supports all the functions listed above plus two more:

interactive – handles internal tasks like browsing files
settings – handles configuration tasks like creating spaces or setting policies

Note: Custom queues do not support interactive or settings functions. Only the default queue can run these.

By default, every function in a queue gets the same number of threads. For example, if a queue is configured with 20 threads, each function can use up to 20 threads of its own.

You can also customize how many threads each function gets.

Here’s an example:

[Queue highpriority]
threads = 20
custom = {"threads": 10}

In this example:

The worker and discovery functions will each use up to 20 threads.
The custom function is limited to 10 threads, even though the queue’s threads value is 20.
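
The per-function limits in this example can be worked out with a short Python sketch. It assumes that a per-function value is a JSON object, as the example suggests, and that the queue sets threads explicitly; the function name is hypothetical, and the worker's real parsing may differ.

```python
import configparser
import json

SAMPLE_CONFIG = """
[Queue highpriority]
threads = 20
custom = {"threads": 10}
"""

# Functions available in custom queues, as listed above.
CUSTOM_QUEUE_FUNCTIONS = ("worker", "discovery", "custom")


def function_threads(config_text: str, queue: str) -> dict:
    """Return the thread limit of each function in the given custom queue."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    section = parser[f"Queue {queue}"]
    queue_threads = section.getint("threads")  # assumed to be set on the queue
    limits = {}
    for function in CUSTOM_QUEUE_FUNCTIONS:
        raw = section.get(function)
        # Assumption: an override, when present, is a JSON object.
        override = json.loads(raw) if raw is not None else {}
        limits[function] = override.get("threads", queue_threads)
    return limits


print(function_threads(SAMPLE_CONFIG, "highpriority"))
# {'worker': 20, 'discovery': 20, 'custom': 10}
```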

Note: If you set per-function settings in the [settings] section (the global configuration), they only apply to the default queue. These settings are not inherited by your custom queues.

5.6.2.1. Disabling Functions

You can turn off (disable) specific functions for a queue if you don’t want certain types of tasks to run there.

For example, if you don’t want a queue to run custom plugin tasks, you can disable the custom function by setting it to false.

Example:

[Queue no-custom]
custom = false

In this example:

The custom function is disabled.
That means custom plugin jobs will not run in the no-custom queue.
This is useful when you want a queue to handle only specific types of jobs and avoid others.

In the same way, default queue functions can be disabled.

For example, in a cluster, you may set the management node to only run settings tasks:

# management node
[settings]
discovery = false
worker = false
custom = false

You can then set the ngenea nodes to run workflow tasks but not settings tasks:

# ngenea node
[settings]
settings = false

Warning

Each default queue function must be enabled on at least one node within a cluster, or else Hub will not function correctly.
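
One way to check a planned layout against this warning is to list, for each node, the functions disabled in its [settings] section and look for any function that is disabled everywhere. The Python sketch below does that; the node names and the function name are hypothetical, and it works from hand-written data, not from the real configuration files.

```python
# Functions supported by the default queue, as listed earlier in this section.
DEFAULT_QUEUE_FUNCTIONS = ("worker", "discovery", "custom", "interactive", "settings")


def uncovered_functions(disabled_by_node: dict) -> list:
    """Return the functions that are disabled on every node in the cluster."""
    return [
        function
        for function in DEFAULT_QUEUE_FUNCTIONS
        if all(function in disabled for disabled in disabled_by_node.values())
    ]


# Hypothetical cluster matching the two examples above.
cluster = {
    "management": {"discovery", "worker", "custom"},
    "ngenea-1": {"settings"},
}
print(uncovered_functions(cluster))  # [] means every function runs somewhere
```

An empty result means each default queue function is enabled on at least one node. Any name in the result would need to be re-enabled somewhere in the cluster.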

5.6.3. Queue discovery

When you add a new queue to the worker config file, the queue starts up automatically. When you remove a queue from the config, it shuts down automatically.

If this doesn’t happen right away, you can force the worker to reload the config with this command:

systemctl reload ngenea-worker

This reloads the worker config without affecting any queues that haven’t changed.

Queue Visibility in Ngenea Hub

Ngenea Hub automatically picks up new queues shortly after they’re started, using worker heartbeats. New queues will show up in the Hub and can be used in workflows almost immediately.

However, removing a queue from the config doesn’t automatically remove it from Ngenea Hub.

Removed queues must be manually de-registered using:

ngeneahubctl manage remove_queue <site> <queue>

Re-using a Queue Name

If you want to re-use a queue name that was previously removed:

First, restart the worker completely: systemctl restart ngenea-worker

Then remove the queue using this command:

ngeneahubctl manage remove_queue --offline-only <site> <queue>

--offline-only will only remove the queue if it’s not running. If the queue is still active, it will be recreated.
To ensure a queue is completely removed and doesn’t come back, shut it down first and then use --offline-only.

Note: If you submit a job to a queue that has been shut down but not de-registered, the job will not be processed unless the queue is configured and started up again.

5.6.4. Queue removal

When a queue is removed from the worker configuration file and the worker is reloaded, the queue will be shut down.

However, the removed queue will not be automatically de-registered from Ngenea Hub.

Warning

If you submit a job to a queue that has been shut down but not de-registered, the job will not be processed unless the queue is configured and started up again.

Removed queues must be manually de-registered using the following command:

ngeneahubctl manage remove_queue <site> <queue>

If the queue is brought back online, or if a new queue is created with the same name, it will not be recreated in Ngenea Hub until the queue has expired.

Removed queues expire when no heartbeats have been received for some interval after being removed. The expiry interval is configured using the REMOVED_QUEUE_CLEANUP_INTERVAL setting, which defaults to 1 day.
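
The expiry rule can be pictured as a simple time comparison. The Python sketch below is an illustration under assumptions: the function name is hypothetical, the interval is modelled as a timedelta, and whether Hub treats the boundary as inclusive is not stated in this document.

```python
from datetime import datetime, timedelta

# Default stated above for REMOVED_QUEUE_CLEANUP_INTERVAL.
DEFAULT_CLEANUP_INTERVAL = timedelta(days=1)


def has_expired(last_heartbeat: datetime, now: datetime,
                interval: timedelta = DEFAULT_CLEANUP_INTERVAL) -> bool:
    """Return True once no heartbeat has been seen for the whole interval."""
    # Assumption: a queue expires as soon as the interval has fully elapsed.
    return now - last_heartbeat >= interval


last_seen = datetime(2024, 1, 1, 12, 0)
print(has_expired(last_seen, datetime(2024, 1, 2, 11, 0)))  # False (23 hours)
print(has_expired(last_seen, datetime(2024, 1, 2, 13, 0)))  # True (25 hours)
```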

To re-use the name of a previously removed queue immediately, use the --offline-only flag. This must be preceded by a complete worker restart:

systemctl restart ngenea-worker

--offline-only will remove the queue only if the queue is offline.
If the queue is still online, or comes back online, it will be automatically recreated.


5.6.5. Memory considerations

Queues use about 40MB of RAM per thread when no tasks are running.

Each custom queue has three functions (worker, discovery and custom). If all three are set up with the same number of threads, the idle memory needed is roughly threads * 125MB, where threads is the per-function thread count.

For example:

A high-priority queue with 20 threads per function (60 threads in total) will use about 2.5GB of memory when no tasks are running.
A low-priority queue with 5 threads per function (15 threads in total) will use about 625MB of memory when no tasks are running.

Keep in mind that task execution will require additional memory on top of this baseline. The more threads there are, the more tasks can run simultaneously, which means a queue with many threads can use a lot of memory at peak times.
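
The baseline figures above can be reproduced with a small Python calculation. Both helper names are hypothetical. The second helper, for queues whose functions have different thread counts, extrapolates from the rough 40MB-per-thread figure and is only an estimate.

```python
MB_PER_THREAD = 40                 # approximate idle cost of one thread
MB_PER_THREAD_ALL_FUNCTIONS = 125  # rough figure for three equal functions


def idle_memory_mb(threads_per_function: int) -> int:
    """Idle memory when all three functions use the same thread count."""
    return threads_per_function * MB_PER_THREAD_ALL_FUNCTIONS


def idle_memory_mb_mixed(threads_by_function: dict) -> int:
    """Rough idle memory when functions have different thread counts."""
    return sum(threads_by_function.values()) * MB_PER_THREAD


print(idle_memory_mb(20))  # 2500 (about 2.5GB)
print(idle_memory_mb(5))   # 625
print(idle_memory_mb_mixed({"worker": 20, "discovery": 20, "custom": 10}))  # 2000
```

Neither figure includes the memory that running tasks need on top of the baseline.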

5.6.6. Workflows

When submitting a workflow, you can specify a queue to use. If no queue is provided, the default queue will be used.

For workflows that involve tasks running on multiple sites (like send or sync workflows), the same queue will be used across all sites. If the specified queue doesn’t exist on one of the sites, the default queue will be used instead.
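
The fallback rule for multi-site workflows can be sketched as follows. The site and queue names are examples, the function name is hypothetical, and the literal name "default" for the default queue is an assumption made for illustration.

```python
DEFAULT_QUEUE = "default"  # assumed name of the default queue


def queue_for_site(requested, available_queues) -> str:
    """Pick the queue a task would use on one site."""
    # Use the requested queue only if that site actually has it.
    if requested and requested in available_queues:
        return requested
    return DEFAULT_QUEUE


# Hypothetical sites: only london defines the highpriority queue.
queues_by_site = {
    "london": {"default", "highpriority"},
    "dublin": {"default"},
}
for site, queues in queues_by_site.items():
    print(site, queue_for_site("highpriority", queues))
# london highpriority
# dublin default
```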

You can also assign a different queue to each task within a workflow. This is done by specifying the queue in the task parameters, like so:

{
    "name": "dynamo.tasks.migrate",
    "queue": "highpriority"
}

Selecting Queues at Runtime

You can also choose the queue for each task when the workflow is submitted (at runtime), so the choice can depend on the inputs you provide for that run.

You can use workflow fields to define which queue to use for tasks within that workflow. These fields allow you to pass values into the workflow at runtime.

For example, in the built-in send workflow, there’s a field called destinationqueue. This field can be used to specify the queue that should be used for the task, and it can be passed when the workflow runs, just like how you pass the destinationsite field to specify the site the task should go to.

Here’s an example to show how it works:

{
    "paths": [
        "/mmfs1/data/project_one"
    ],
    "site": "london",
    "queue": "highpriority",
    "workflow": "send_to_site",
    "fields": {
        "destinationsite": "dublin",
        "destinationqueue": "lowpriority"
    }
}
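
If requests like this are generated by a script, building the payload as a Python dictionary and serializing it with json.dumps guarantees syntactically valid JSON. The sketch below only constructs the request body shown above; the function name is hypothetical, and how the body is submitted (endpoint, authentication) is outside its scope.

```python
import json


def build_send_request(paths, site, queue, destination_site, destination_queue) -> str:
    """Build the JSON body for a send_to_site workflow with runtime queue fields."""
    payload = {
        "paths": list(paths),
        "site": site,
        "queue": queue,  # queue used on the source site
        "workflow": "send_to_site",
        "fields": {
            "destinationsite": destination_site,
            "destinationqueue": destination_queue,  # queue used on the destination
        },
    }
    return json.dumps(payload, indent=4)


print(build_send_request(
    ["/mmfs1/data/project_one"], "london", "highpriority", "dublin", "lowpriority"
))
```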

For more information on constructing and running workflows, see Custom Workflows.