
Add file_extension fields to BlobType #3406

Draft
ddl-rliu wants to merge 5 commits into flyteorg:master from ddl-rliu:rliu.DOM-75010.file-ext2


@ddl-rliu (Contributor) commented Mar 12, 2026

See flyteorg/flyte#7009

Tracking issue

Closes flyteorg/flyte#7024 [BUG] [copilot] File extensions are missing when copilot downloads Blob/FlyteFile inputs

Why are the changes needed?

(Keeping this PR in draft until flyteorg/flyte#7009 is merged)

Once flyteorg/flyte#7009 merges and adds the new file_extension field to BlobType, this flytekit PR will let users configure the file extension on FlyteFile inputs. This fixes the issue where copilot drops file extensions when it downloads Blob/FlyteFile inputs.

What changes were proposed in this pull request?

Add the file_extension field to BlobType, and add a "FileExtension" annotation for FlyteFile inputs that should be downloaded with a specific extension. For example,

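```python
from typing import Annotated

from flytekit.types.file import FlyteFile

# FileExtension is the new annotation added by this PR


# ContainerTask
def t1(file: Annotated[FlyteFile, FileExtension("csv")]):
    ...  # copilot downloads the file to e.g. /inputs/file.csv
```

versus...

```python
def t1(file: FlyteFile["csv"]):
    ...  # copilot downloads the file to e.g. /inputs/file
```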
Under the hood, this sets file_extension in BlobType.
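A minimal sketch of what "sets file_extension in BlobType" could look like. It assumes a hypothetical `FileExtension` stand-in (a frozen dataclass), a hypothetical `BlobTypeSketch` dataclass in place of the flyteidl BlobType message (only `format` and `file_extension` shown), and `str` in place of FlyteFile so it runs without flytekit. The helper name `blob_type_for` is invented; this is not the PR's implementation.

```python
from dataclasses import dataclass
from typing import Annotated, Any, get_args, get_origin


@dataclass(frozen=True)
class FileExtension:
    """Hypothetical stand-in for the PR's annotation; holds one extension string."""
    val: str


@dataclass
class BlobTypeSketch:
    """Hypothetical stand-in for flyteidl's BlobType (subset of fields)."""
    format: str = ""
    file_extension: str = ""


def blob_type_for(hint: Any, blob_format: str = "") -> BlobTypeSketch:
    """Copy a FileExtension annotation (if present) into file_extension.

    The format is passed in separately, so a bare format never sets file_extension.
    """
    extension = ""
    if get_origin(hint) is Annotated:
        # get_args(Annotated[T, m1, ...]) == (T, m1, ...); index 0 is the underlying type
        for meta in get_args(hint)[1:]:
            if isinstance(meta, FileExtension):
                extension = meta.val
    return BlobTypeSketch(format=blob_format, file_extension=extension)
```

In this sketch only the annotation populates file_extension; a format on its own (as with FlyteFile["csv"]) leaves it empty, which lines up with the /inputs/file versus /inputs/file.csv contrast in the description.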

How was this patch tested?

Setup process

Ran a workflow locally to test the changes

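```python
# Container task python code (FileExtension is the annotation added by this PR)
from typing import Annotated, TypeVar
from flytekit.types.file import FlyteFile
def t1(datasetA: Annotated[FlyteFile[TypeVar('csv')], FileExtension("csv")], datasetB: FlyteFile[TypeVar('csv')]): ...
```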

```
# flyte-copilot logs
2026-04-09T22:15:08.506Z: {"json":{},"level":"info","msg":"Successfully copied [1703936] bytes remote data from [s3://ls-engc-flyte-data//25d4f8d8-7eb4-492b-97c4-ab3ce8bdcf74/ae] to local [/execution-vol/flows/workflow/inputs/datasetA.csv]","ts":"2026-04-09T22:15:08Z"}
2026-04-09T22:15:08.540Z: {"json":{},"level":"info","msg":"Successfully copied [196608] bytes remote data from [s3://ls-engc-flyte-data//858fd656-11ec-41ec-b235-b3b2984f230e/ex] to local [/execution-vol/flows/workflow/inputs/datasetB]","ts":"2026-04-09T22:15:08Z"}
```
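The two log lines differ only in whether the local filename carries an extension. A hypothetical helper that reproduces that observed naming is sketched below; `local_input_path` is an invented name, and this illustrates the behavior seen in the logs rather than copilot's actual implementation.

```python
import posixpath
from typing import Optional


def local_input_path(input_dir: str, var_name: str, file_extension: Optional[str] = None) -> str:
    """Hypothetical: local download path for an input, with an optional extension."""
    # With an extension, append ".<ext>"; without one, the file is named after the variable alone
    filename = f"{var_name}.{file_extension}" if file_extension else var_name
    return posixpath.join(input_dir, filename)
```

With the input directory from the logs, `datasetA` (annotated with "csv") maps to `.../inputs/datasetA.csv` and `datasetB` (no annotation) maps to `.../inputs/datasetB`.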

Screenshots

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Docs link

pyproject.toml Outdated
```toml
"docker>=4.0.0",
"docstring-parser>=0.9.0",
"flyteidl>=1.16.1,<2.0.0a0",
"flyteidl @ git+https://github.com/ddl-rliu/flyte.git@1ba7c1545198a2820348323e64c23a41a19e7a7d#subdirectory=flyteidl",
```
Contributor Author


Bump this after flyteorg/flyte#7009 merges

```python
return None


class FileExtension:
```
Contributor Author


Follows same pattern as BatchSize:

```python
class BatchSize:
```
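The pattern referred to here is an annotation object carried in `typing.Annotated` metadata and found by an `isinstance` check. A minimal sketch of that pattern, assuming hypothetical stand-ins `BatchSizeSketch` and `FileExtensionSketch` and a hypothetical generic lookup helper `get_annotation_val` (none of these are the PR's or flytekit's actual code):

```python
from typing import Annotated, Any, Optional, get_args, get_origin


class BatchSizeSketch:
    """Hypothetical BatchSize-style annotation: wraps a single int value."""

    def __init__(self, val: int):
        self._val = val

    @property
    def val(self) -> int:
        return self._val


class FileExtensionSketch:
    """Hypothetical FileExtension annotation with the same shape as above."""

    def __init__(self, val: str):
        self._val = val

    @property
    def val(self) -> str:
        return self._val


def get_annotation_val(t: Any, annotation_cls: type) -> Optional[Any]:
    """Return .val of the first annotation_cls instance in t's Annotated metadata, else None."""
    if get_origin(t) is Annotated:
        # Metadata starts at index 1; index 0 is the annotated type itself
        for meta in get_args(t)[1:]:
            if isinstance(meta, annotation_cls):
                return meta.val
    return None
```

Because each annotation is looked up by its own class, one type hint can carry several of them (for example a batch size and a file extension) without the lookups interfering.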

@ddl-rliu marked this pull request as draft April 9, 2026 23:04
@ddl-rliu force-pushed the rliu.DOM-75010.file-ext2 branch 2 times, most recently from 52b0c79 to ce58c01, April 9, 2026 23:53
ddl-rliu added 5 commits April 9, 2026 16:53
Port new BlobType fields file_extension and enable_legacy_filename to flytekit.
FlyteFile inputs can be annotated with the FileDownloadConfig annotation to
configure the file extension to use during the copilot download phase.

e.g.

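```python
from typing import Annotated

from flytekit.types.file import FlyteFile

# FileDownloadConfig is the annotation added by this PR


def t1(file: Annotated[FlyteFile, FileDownloadConfig(file_extension="csv")]):
    ...  # copilot downloads the file to e.g. /inputs/file.csv

# versus...

def t2(file: FlyteFile["csv"]):
    ...  # copilot downloads the file to e.g. /inputs/file
```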

Signed-off-by: ddl-rliu <richard.liu@dominodatalab.com>
@ddl-rliu force-pushed the rliu.DOM-75010.file-ext2 branch from ce58c01 to 3edf6e4, April 9, 2026 23:53
codecov bot commented Apr 10, 2026

Codecov Report

❌ Patch coverage is 45.71429% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 39.44%. Comparing base (c164f35) to head (3edf6e4).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| flytekit/core/type_engine.py | 37.50% | 10 Missing ⚠️ |
| flytekit/types/file/file.py | 25.00% | 9 Missing ⚠️ |

❗ A different number of reports was uploaded for BASE (c164f35) and HEAD (3edf6e4).

HEAD has 24 fewer uploads than BASE:

| Flag | BASE (c164f35) | HEAD (3edf6e4) |
|---|---|---|
| | 25 | 1 |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##           master    #3406      +/-   ##
==========================================
- Coverage   45.78%   39.44%   -6.34%     
==========================================
  Files         317      216     -101     
  Lines       28359    22873    -5486     
  Branches     3015     3021       +6     
==========================================
- Hits        12983     9022    -3961     
+ Misses      15278    13755    -1523     
+ Partials       98       96       -2     
```
