Fix a task fingerprinting bug #2740

Open
mkeeler wants to merge 1 commit into go-task:main from mkeeler:watch-rebuild-single


@mkeeler mkeeler commented Mar 12, 2026

Task Fingerprinting Bug

The first commit in this PR fixes a bug where two invocations of the same task (such as in a for loop) were inadvertently writing the task's checksum or timestamp files to the same location, even though the invocations were executed with different arguments and therefore had different sources.

Reproduction

version: 3

tasks:
  copy:
    sources:
      - '**/*.in'
    generates:
      - '**/*.out'
    cmds:
      - for: sources
        task: copy:single
        vars:
          SOURCE: "{{.ITEM}}"
          TARGET: '{{.ITEM | replace ".in" ".out"}}'

  copy:single:
    sources:
      - '{{.SOURCE}}'
    generates:
      - '{{.TARGET}}'
    cmds: 
      - cp "{{.SOURCE}}" "{{.TARGET}}"
  1. Run: echo 1 >1.in && echo 2 >2.in
  2. Run: task copy
    • This will run the copy:single task once for each *.in file.
  3. Run: echo 2.2 > 2.in, then run task copy again
    • This will run the copy:single task twice again, with neither invocation showing as up to date.

Because only 2.in was changed, I expected one copy:single invocation to show as up to date and only 2.in to be re-copied to 2.out.

Fix

Instead of writing the checksum/timestamp to a single file within the respective directory, the task is first fingerprinted. So rather than the copy:single task here recording its checksum/timestamp in a single copy-single file, it takes a hash of the normalized task name, the working directory of the task, and the declared sources/generates, and stores the checksum in copy-single-<hash>. This allows each distinct invocation of the sub-task with different arguments to independently track whether it is up to date.
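
To illustrate the naming scheme, here is a minimal Go sketch. It is not the code in this PR: stateFileName is a hypothetical helper, and the choice of SHA-256, the 16-character truncation, the field separators, and the ":" to "-" normalization are all assumptions made for the example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// stateFileName is a hypothetical helper (not Task's actual API). It derives
// a per-invocation state file name from the task name, its working directory,
// and its resolved sources/generates.
func stateFileName(name, dir string, sources, generates []string) string {
	h := sha256.New()
	// Write each field followed by a NUL byte so that adjacent fields
	// cannot run together and produce the same hash input.
	write := func(s string) {
		h.Write([]byte(s))
		h.Write([]byte{0})
	}
	write(name)
	write(dir)
	for _, s := range sources {
		write("source:" + s)
	}
	for _, g := range generates {
		write("generate:" + g)
	}
	// Assumption: a 16-hex-character prefix is enough for illustration.
	sum := hex.EncodeToString(h.Sum(nil))[:16]
	// Assumption: normalization replaces ":" with "-", which yields the
	// copy-single prefix mentioned in the description.
	return strings.ReplaceAll(name, ":", "-") + "-" + sum
}

func main() {
	a := stateFileName("copy:single", "/work", []string{"1.in"}, []string{"1.out"})
	b := stateFileName("copy:single", "/work", []string{"2.in"}, []string{"2.out"})
	fmt.Println(a)
	fmt.Println(b)
	fmt.Println("distinct:", a != b)
}
```

With the reproduction above, the two copy:single invocations resolve to different sources and generates, so they get different state files and no longer overwrite each other's checksum.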

Previously this PR also contained another fix, but that has since been rolled into #2743.

Task Watch Cancellation Bug

The pre-existing task watching code had a bug where once an event occurred it would
spawn goroutines to process all tasks in the background and continue the loop. When a
subsequent event occurred, it would cancel the context used to run those previous
goroutines and restart everything. In some scenarios this works fine, such as when the
generated files do not reside within the directory being watched. When the generated
files do reside in the same directory, the first task generating its output causes an
fsnotify event to be triggered, which then cancels the context. This is racy, but if
the tasks are longer running it can eventually cancel the task, resulting in other
sub-tasks not being executed. This doesn't result in an infinite loop because prior
to executing the task the fingerprint is checked and updated to prevent subsequent
runs.

The root cause of the tasks not running to completion is that the context is
cancelled when it shouldn't be (an fsnotify event comes in for something that is
not one of the sources).

Reproduction

version: 3

tasks:
  copy:
    sources:
      - '**/*.in'
    generates:
      - '**/*.out'
    cmds:
      - for: sources
        task: copy:single
        vars:
          SOURCE: "{{.ITEM}}"
          TARGET: '{{.ITEM | replace ".in" ".out"}}'

  copy:single:
    sources:
      - '{{.SOURCE}}'
    generates:
      - '{{.TARGET}}'
    cmds: 
      - cp "{{.SOURCE}}" "{{.TARGET}}"
      # This is the main difference from the first bug's reproduction YAML.
      # The sleep here ensures that tasks are "long running", allowing
      # time for the context cancellation to happen and prevent running
      # all the tasks.
      - sleep 3
  1. Run: echo 1 >1.in && echo 2 >2.in
  2. Run: task -w copy
    • This will run the copy:single only once. It never gets around to executing
      the copy for the 2.in file
  3. In another terminal, run: echo 2.2 > 2.in
    • This will run the copy:single only once again.

I would have expected step 2 to run copy:single twice, but it doesn't because the
context is cancelled while the first copy:single invocation's sleep command
is executing.

I would also have expected step 3 to cause copying to take place again. With the fix
for the fingerprinting bug included, the first invocation should show as up to date
and the second one would then run.

Fix

The fix was to move the logic that checks the event against the sources out of the
goroutines spawned to execute the tasks and into the place where event handling first
starts. Because we check the event's file against the list of sources before the
context is cancelled, we can discard irrelevant events and keep the in-flight tasks
running.
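
As a rough illustration of that ordering, here is a minimal Go sketch. It is not the watcher code from Task: relevant, watchLoop, and noop are hypothetical names, events are plain file paths standing in for fsnotify events, and the sources are assumed to be an already-resolved set of file paths rather than glob patterns.

```go
package main

import (
	"context"
	"fmt"
	"path/filepath"
)

// relevant reports whether the changed path is one of the resolved sources.
// Hypothetical helper; glob matching is omitted for brevity.
func relevant(path string, sources map[string]struct{}) bool {
	_, ok := sources[filepath.Clean(path)]
	return ok
}

// noop stands in for the function that would run the watched tasks.
func noop(ctx context.Context) {}

// watchLoop starts one run, then restarts it only for relevant events.
// It returns how many restarts were triggered.
func watchLoop(events <-chan string, sources map[string]struct{}, run func(context.Context)) int {
	ctx, cancel := context.WithCancel(context.Background())
	go run(ctx)
	restarts := 0
	for path := range events {
		// Filter first: an event for a generated file must not cancel
		// the run that is still in flight.
		if !relevant(path, sources) {
			continue
		}
		cancel()
		ctx, cancel = context.WithCancel(context.Background())
		go run(ctx)
		restarts++
	}
	cancel()
	return restarts
}

func main() {
	sources := map[string]struct{}{"1.in": {}, "2.in": {}}
	events := make(chan string, 3)
	events <- "1.out" // generated file: ignored
	events <- "2.in"  // source file: triggers a restart
	events <- "2.out" // generated file: ignored
	close(events)
	fmt.Println("restarts:", watchLoop(events, sources, noop))
}
```

The point of the sketch is only the order of operations: the relevance check happens before cancel() is called, so writes to generated files in the watched directory never interrupt a running task.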

@trulede trulede mentioned this pull request Mar 15, 2026
@butuzov butuzov mentioned this pull request Mar 15, 2026
Contributor

butuzov commented Mar 15, 2026

This looks way better than my simple solution, plus test coverage was added.

@mkeeler mkeeler force-pushed the watch-rebuild-single branch from c5d3920 to d19a73f Compare March 16, 2026 18:14
@mkeeler mkeeler changed the title Fix a task watching cancellation bug and a task fingerprinting bug Fix a task fingerprinting bug Mar 16, 2026
Author

mkeeler commented Mar 16, 2026

@trulede I have refactored this PR to only have the task fingerprinting fix.
