Skip to content

Conversation

@ahal
Copy link
Collaborator

@ahal ahal commented Jan 15, 2025

We currently hard code some environment variables that should be turned into absolute paths. But it would be great if we had the ability to do this for any environment variable.

My current use case is setting PIP_CACHE_DIR to an absolute path under the task's workdir. I could add it to the list of hard coded envs, but it would be better to do something like PIP_CACHE_DIR=$TASK_WORKDIR/.cache/pip and have run-task do the interpolation automatically.

@ahal ahal requested review from a team and gabrielbusta January 15, 2025 21:14
@ahal ahal marked this pull request as draft January 15, 2025 21:17
@ahal ahal force-pushed the ahal/push-xzlwzlwtypxp branch from 32c7d8b to ad1bfef Compare January 15, 2025 21:18
@ahal ahal self-assigned this Jan 15, 2025
@ahal ahal added the feature A feature request label Jan 15, 2025
@ahal ahal marked this pull request as ready for review January 16, 2025 15:23
@ahal ahal force-pushed the ahal/push-xzlwzlwtypxp branch from ad1bfef to 512b52c Compare January 17, 2025 19:01
@ahal ahal requested a review from Eijebong January 17, 2025 19:02
@ahal ahal force-pushed the ahal/push-xzlwzlwtypxp branch from 512b52c to 3dd4b3f Compare January 17, 2025 19:06
Copy link
Contributor

@bhearsum bhearsum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My current use case is setting PIP_CACHE_DIR to an absolute path under the task's workdir. I could add it to the list of hard coded envs, but it would be better to do something like PIP_CACHE_DIR=$TASK_WORKDIR/.cache/pip and have run-task do the interpolation automatically.

Just to make sure I understand fully: this is necessary when running payloads that don't execute through bash? (We already have places in translations where we use $TASK_WORKDIR directly, but that runs through bash, which I suppose is why that works.

os.environ[k] = os.path.abspath(os.environ[k])
# Interpolate environment variables with defined substitution patterns. For
# example, the variable `CACHE_DIR={task_workdir}/.cache` will be
# interpolated with the task's working directory.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great to get some minimal mentions of this in the docs.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, but run-task and the run transforms are entirely undocumented atm. Documenting this should be pretty high priority, but I don't want to scope bloat just to have a place to add this.

# example, the variable `CACHE_DIR={task_workdir}/.cache` will be
# interpolated with the task's working directory.
env_subs = {
"task_workdir": task_workdir,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume there's a reason that we can't use the upper-cased $TASK_WORKDIR here? It's a bit weird to have both versions of it. If we can't use the upper cased version, I guess this is the best alternative. (Having a completely differently named variable that is actually the same value would be worse...)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(see my explanation in the main thread, I think it'll make more sense then)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I'll expand a little bit.. My initial patch did use $TASK_WORKDIR, but then Bastien pointed out that it was strange that ${TASK_WORKDIR} doesn't work. He suggested using os.path.expandvars as a substitute.

I think this would be a good idea, but I also worry that this could break tasks in subtle ways if it accidentally starts substituting things we didn't intend it to. There's also questions about the order environment variables are substituted and what happens if you have variables that depend on each other in a cycle or with a higher depth.

Ultimately I decided that what we are doing here isn't really environment variable substitution, so it would be cleaner not to disguise it as such. I don't know if it's the right call, but I think this is the safest option. Maybe we can change it again later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I missed this earlier. Thanks for the explanation - sorry I didn't notice it in the other thread!

@ahal
Copy link
Collaborator Author

ahal commented Jan 20, 2025

Just to make sure I understand fully: this is necessary when running payloads that don't execute through bash? (We already have places in translations where we use $TASK_WORKDIR directly, but that runs through bash, which I suppose is why that works.

No, this is necessary to be able to specify absolute paths in the task definition with generic-worker. Normally we just use relative paths like ./artifacts or ./checkouts because the task user isn't known. But that breaks if the task ever changes the working directory.

Specifically for my use case, I want to be able to set:

PIP_CACHE_DIR=/home/task_123456/.task-cache/pip

directly in the task definition. Since task_123456 can't be known, I'll set it to {work_dir}/.task-cache/pip instead and run-task will interpolate it.

@ahal ahal merged commit eb4d396 into taskcluster:main Jan 20, 2025
17 checks passed
@ahal ahal deleted the ahal/push-xzlwzlwtypxp branch January 20, 2025 16:18
bhearsum added a commit to bhearsum/translations that referenced this pull request Feb 7, 2025
bhearsum added a commit to bhearsum/translations that referenced this pull request Feb 11, 2025
bhearsum added a commit to bhearsum/translations that referenced this pull request Feb 11, 2025
bhearsum added a commit to mozilla/translations that referenced this pull request Feb 11, 2025
* chore: upgrade to taskgraph 13.0

This should be fairly boring; the biggest thing it gets us is some under the hood things that will allow us to access better cost tracking.

* fix: get rid of unnecessary $TASK_WORKDIR prefix before $VCS_PATH

taskgraph now does this automatically as of taskcluster/taskgraph#630

* drop taskcluster/requirements.*

These files cause us to reinstall taskgraph at the start of each decision task, which is both unnecessary and potentially changes the version being used, which can be confusing.

We do currently use these files as part of a couple of taskgraph tests; I've replaced this usage with installing it through Poetry instead, since we already have taskgraph as a dependency there, and the tests run through Poetry anyways.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature A feature request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants