attach/upload: enhance file type detection with content sniffing and filename keywords #498

@aftersnow

Description

Follow-up to #497. Currently getProcessor() relies solely on file extension matching. When that fails, we plan to add a file-size heuristic as a fallback (reusing the existing logic from modelfile generate).

For future improvement, two additional heuristic strategies could make file type detection more accurate:

1. Content sniffing

Read the first few bytes of the file to determine if it's text or binary:

  • If the content looks like valid JSON/YAML/TOML → classify as config
  • If the content is binary → classify based on file size (weight vs code)
  • Could use net/http.DetectContentType (which considers at most the first 512 bytes) or a simple UTF-8 validity check
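
A minimal sketch of the content-sniffing idea, using the UTF-8 validity variant. The function name `classifyBySniff`, the returned labels, and both constants are hypothetical and not taken from the codebase. The `{`/`[` prefix check is only a crude stand-in for actually parsing JSON or TOML, and YAML is not covered, since recognizing it reliably would need a real parser.

```go
package main

import (
	"bytes"
	"fmt"
	"unicode/utf8"
)

// Hypothetical values for illustration; neither is taken from the project.
const (
	sniffLen        = 512       // how many leading bytes to inspect
	weightThreshold = 128 << 20 // assumed size cutoff between "code" and "weight"
)

// trimPartialRune drops up to three trailing bytes that do not end in a
// complete rune, so a multi-byte character cut at the sniff boundary does
// not make otherwise valid UTF-8 look binary.
func trimPartialRune(b []byte) []byte {
	for i := 0; i < utf8.UTFMax-1 && len(b) > 0; i++ {
		if r, size := utf8.DecodeLastRune(b); r != utf8.RuneError || size != 1 {
			break
		}
		b = b[:len(b)-1]
	}
	return b
}

// classifyBySniff returns "config", "weight", "code", or "unknown".
// data holds the leading bytes of the file; size is the full file size.
func classifyBySniff(data []byte, size int64) string {
	head := data
	if len(head) > sniffLen {
		head = trimPartialRune(head[:sniffLen])
	}

	// A NUL byte or invalid UTF-8 is treated as binary content.
	if bytes.IndexByte(head, 0) >= 0 || !utf8.Valid(head) {
		if size >= weightThreshold {
			return "weight"
		}
		return "code"
	}

	// Text that opens like a JSON value or a TOML table header.
	trimmed := bytes.TrimSpace(head)
	if len(trimmed) > 0 && (trimmed[0] == '{' || trimmed[0] == '[') {
		return "config"
	}
	return "unknown" // leave the decision to other heuristics
}

func main() {
	fmt.Println(classifyBySniff([]byte(`{"hidden_size": 4096}`), 21))
	fmt.Println(classifyBySniff([]byte{0x00, 0x01, 0x02}, 3))
}
```

Returning "unknown" for unrecognized text keeps this step composable: the caller can fall through to filename keywords or the file-size heuristic instead of guessing.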

2. Filename keyword matching

Infer file type from keywords in the filename:

  • Names containing config, param, hparam, setting → config
  • Names containing weight, model, checkpoint, ckpt → weight
  • Names containing readme, license, doc → doc
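
The keyword lists above could be sketched as an ordered rule table. The names `keywordRule`, `keywordRules`, and `classifyByName` are hypothetical, and the rule order (config before weight before doc) is an assumption made here so that a name such as `model_config.json` resolves to config; the issue itself does not specify precedence.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// keywordRule maps a set of filename keywords to a file type label.
type keywordRule struct {
	kind     string
	keywords []string
}

// keywordRules uses the keywords listed in this issue. Earlier rules win
// when a name matches several; that ordering is an assumption.
var keywordRules = []keywordRule{
	{"config", []string{"config", "param", "hparam", "setting"}},
	{"weight", []string{"weight", "model", "checkpoint", "ckpt"}},
	{"doc", []string{"readme", "license", "doc"}},
}

// classifyByName returns the label of the first rule whose keyword occurs
// in the lowercased base name, or "" when nothing matches.
func classifyByName(path string) string {
	name := strings.ToLower(filepath.Base(path))
	for _, rule := range keywordRules {
		for _, kw := range rule.keywords {
			if strings.Contains(name, kw) {
				return rule.kind
			}
		}
	}
	return ""
}

func main() {
	for _, p := range []string{"model_config.json", "ckpt-0001", "README", "data.bin"} {
		fmt.Printf("%q -> %q\n", p, classifyByName(p))
	}
}
```

Plain substring matching is permissive: `doc` also matches names like `dockerfile`, so a real implementation might match on word boundaries or run this check only after extension matching has failed.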

Motivation

These heuristics would reduce how often users need to specify --media-type manually, and would give more accurate auto-detection than file size alone.

Related
