
Track polling tasks across cluster in the openMIC database#46

Open
StephenCWills wants to merge 1 commit into master from polling-task

Conversation

@StephenCWills
Member

Adds the PollingTask table to the openMIC database to track tasks across the entire cluster. This allows one node to clear all matching tasks for a given downloader when the task executes, reducing churn on the cluster when processing times are high.

Adds the DownloaderGroup table to prevent parallel processing, even between separate nodes in the cluster. Many devices don't support parallel access, and even those that do would experience performance degradation due to the additional resources required to handle simultaneous connections. The cluster likely has enough to do in parallel already, so this helps spread those resources out across different downloaders.

PollingTask records are added when a task is queued on a logical thread. They are removed at the start of execution, when the task runs on that logical thread. They may also be removed earlier by another node, if that node happens to execute its matching task first.
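The clear-on-execution behavior can be sketched as follows. This is a minimal illustration using SQLite and assumed column names (DeviceID, NodeID); the actual openMIC schema and data layer may differ.

```python
import sqlite3

# Hypothetical PollingTask table: one row per queued task.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PollingTask (ID INTEGER PRIMARY KEY, DeviceID INTEGER, NodeID TEXT)")

def queue_task(device_id, node_id):
    # Record the task when it is queued on a logical thread.
    conn.execute(
        "INSERT INTO PollingTask (DeviceID, NodeID) VALUES (?, ?)",
        (device_id, node_id))

def begin_execution(device_id):
    # At the start of execution, clear ALL matching tasks for the
    # downloader, including tasks queued by other nodes, so the same
    # poll is not performed twice across the cluster.
    conn.execute("DELETE FROM PollingTask WHERE DeviceID = ?", (device_id,))

# Two nodes queue a poll for the same device; whichever executes
# first clears both records, so the other node's task is a no-op.
queue_task(1, "node-A")
queue_task(1, "node-B")
begin_execution(1)
remaining = conn.execute("SELECT COUNT(*) FROM PollingTask").fetchone()[0]
```

Clearing by device rather than by task ID is what lets one node absorb the whole backlog for a downloader, which is where the churn reduction comes from.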

DownloaderGroup records are created at the start of execution, when the first task for a downloader group is executed somewhere on the cluster. These records never get deleted. Also at the start of execution, the downloader will attempt to obtain a cluster-wide lock on the downloader group by entering its node identifier and the current timestamp into the DownloaderGroup table. When the task has finished executing, the node enters NULL into those fields to release the lock. If it fails to obtain the lock, the downloader will simply requeue the task in the hope that the lock will have been released by the time the task comes up again.
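The acquire/release cycle described above can be sketched as a conditional UPDATE, where the affected row count decides which node won the race. Table and column names (GroupName, NodeID, LockTime) are illustrative assumptions, not the actual openMIC schema.

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DownloaderGroup "
    "(GroupName TEXT PRIMARY KEY, NodeID TEXT, LockTime TEXT)")
# Group record created by the first task for this group; never deleted.
conn.execute("INSERT INTO DownloaderGroup VALUES ('substation-1', NULL, NULL)")

def try_acquire(group, node_id):
    # The UPDATE only matches when no node currently holds the lock;
    # a row count of 1 means this node entered its identifier and
    # timestamp first, i.e. it owns the lock.
    cursor = conn.execute(
        "UPDATE DownloaderGroup SET NodeID = ?, LockTime = ? "
        "WHERE GroupName = ? AND NodeID IS NULL",
        (node_id, datetime.utcnow().isoformat(), group))
    return cursor.rowcount == 1

def release(group, node_id):
    # Entering NULL into both fields releases the lock.
    conn.execute(
        "UPDATE DownloaderGroup SET NodeID = NULL, LockTime = NULL "
        "WHERE GroupName = ? AND NodeID = ?", (group, node_id))

got_a = try_acquire("substation-1", "node-A")      # node-A wins the lock
got_b = try_acquire("substation-1", "node-B")      # node-B fails; it would requeue
release("substation-1", "node-A")
got_b_retry = try_acquire("substation-1", "node-B")  # lock is free again
```

Because the check and the write happen in a single statement, two nodes racing for the same group cannot both see the lock as free.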

As a safeguard, any lock lasts at most one hour, at which point other nodes are allowed to take it. If a node crashes and is able to restart, it can expire the lock early during initialization of the downloader. If a node crashes and fails to restart, other nodes in the cluster will have to wait out the full hour before they can obtain the lock. If a node somehow takes more than an hour to poll a device, the lock will still expire while it is polling and may end up getting taken by another node.

All downloaders in a downloader group will now share a logical thread. This prevents some churn when the node is already processing a downloader in that group.

