Add CNCF ModelPack support and improve model format detection#813
Open
ilopezluna wants to merge 6 commits intomainfrom
Open
Add CNCF ModelPack support and improve model format detection#813ilopezluna wants to merge 6 commits intomainfrom
ilopezluna wants to merge 6 commits intomainfrom
Conversation
Contributor
There was a problem hiding this comment.
Code Review
This pull request adds support for the CNCF ModelPack format and includes tests for the new media types. A critical security vulnerability was identified in pkg/distribution/internal/bundle/unpack.go:684, where the filepath.Base fallback fails to prevent directory traversal; the sanitized path must be re-validated. SCORE: 3/10 | VERDICT: Not ready | ISSUES: path traversal vulnerability in pkg/distribution/internal/bundle/unpack.go:684
Contributor
Author
|
/gemini review |
Contributor
There was a problem hiding this comment.
Code Review
Review: Support for CNCF ModelPack format. ### Summary: This pull request introduces support for the CNCF ModelPack format, allowing the system to unpack models using layer-per-file approaches with specific media types. It also includes a security improvement to handle and sanitize potentially unsafe file paths from external packaging tools. ### Warnings: - pkg/distribution/internal/bundle/unpack.go:689: The error message for failed path sanitization is misleading as it wraps the original validation error instead of the sanitization error. ### What looks good: Comprehensive test coverage for new media types and path validation edge cases. SCORE: 8/10 | VERDICT: Nearly ready | ISSUES: misleading error message in path sanitization
Address review comment: filepath.Base can return unsafe values like '..' or '/' for certain inputs. After sanitizing a path that failed validation, re-validate the result to prevent directory traversal attacks in UnpackFromLayers.
… layer annotations
Contributor
There was a problem hiding this comment.
Hey - I've found 2 issues, and left some high level feedback:
- The format-inference logic in
partial.getModelFormat/inferFormatFromAnnotationsandbundle.inferFormatFromLayerAnnotationsis nearly identical; consider extracting a shared helper to avoid drift in behavior between unpacking and partial path discovery. - The
matchesMediaTypeswitch now explicitly enumerates many media types just to fall through with no special handling; consider simplifying this with a default case or a small allowlist to reduce noise and the chance of missing new types in future edits.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The format-inference logic in `partial.getModelFormat`/`inferFormatFromAnnotations` and `bundle.inferFormatFromLayerAnnotations` is nearly identical; consider extracting a shared helper to avoid drift in behavior between unpacking and partial path discovery.
- The `matchesMediaType` switch now explicitly enumerates many media types just to fall through with no special handling; consider simplifying this with a default case or a small allowlist to reduce noise and the chance of missing new types in future edits.
## Individual Comments
### Comment 1
<location path="pkg/distribution/internal/bundle/unpack.go" line_range="535-544" />
<code_context>
+//
+// This is safer than filepath.Base which would flatten the entire path,
+// losing nested directory structure and causing silent collisions.
+func sanitizeRelativePath(p string) string {
+ // Normalize to forward slashes and clean
+ cleaned := filepath.ToSlash(filepath.Clean(p))
+
+ // Strip leading "../" segments
+ for strings.HasPrefix(cleaned, "../") {
+ cleaned = cleaned[len("../"):]
+ }
+ // Handle the case where the entire path is ".."
+ if cleaned == ".." {
+ return ""
+ }
+
+ return cleaned
+}
+
</code_context>
<issue_to_address>
**issue (bug_risk):** Handling of paths that collapse entirely to ".." or similar may lead to writing directly into modelDir
For a path like "../../..", `filepath.Clean` and the loop strip all `"../"` segments so `cleaned` becomes `""`, and the `cleaned == ".."` check never fires. `sanitizeRelativePath` then returns `""`, causing `relPath == ""` and `destPath` to be `modelDir`, so we unpack directly into the bundle root. This is exactly the sort of pathological input we want to guard against. Consider treating a fully-collapsed path (empty string) as invalid and returning an error instead of mapping it to the root directory.
</issue_to_address>
### Comment 2
<location path="pkg/distribution/internal/bundle/unpack.go" line_range="721-726" />
<code_context>
+ // ".." components when the model was packaged using an absolute path.
+ // In that case, strip leading ".." segments while preserving the
+ // remaining directory structure to avoid flattening nested layouts.
if err := validatePathWithinDirectory(modelDir, relPath); err != nil {
- return nil, fmt.Errorf("invalid filepath annotation %q: %w", relPath, err)
+ sanitizedPath := sanitizeRelativePath(relPath)
+ // Re-validate the sanitized path to ensure it's safe.
+ if err2 := validatePathWithinDirectory(modelDir, sanitizedPath); err2 != nil {
+ return nil, fmt.Errorf("invalid filepath annotation %q could not be sanitized: %w", relPath, err)
+ }
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Error wrapping uses the original validation error instead of the one from the sanitized path
In the sanitization branch, if both validations fail we still wrap `err` rather than `err2`, so the final error message refers to the original path instead of `sanitizedPath`. Please wrap `err2` (or include it) so the error reflects why the sanitized path was rejected.
```suggestion
if err := validatePathWithinDirectory(modelDir, relPath); err != nil {
sanitizedPath := sanitizeRelativePath(relPath)
// Re-validate the sanitized path to ensure it's safe.
if err2 := validatePathWithinDirectory(modelDir, sanitizedPath); err2 != nil {
return nil, fmt.Errorf(
"invalid filepath annotation %q (sanitized as %q) could not be sanitized: original error: %v, sanitized error: %w",
relPath, sanitizedPath, err, err2,
)
}
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Add CNCF ModelPack support and improve model format detection
Problem
CNCF ModelPack models (packaged with tools like
modctl) were not fully supported. Several issues existed:Format detection gaps for models without
config.format: CNCF ModelPack models may use the format-agnostic media typeapplication/vnd.cncf.model.weight.v1.rawand omit the optionalconfig.formatfield. This causedGGUFPaths(),SafetensorsPaths(), andDDUFPaths()to return empty results, and the scheduler to fall back to the default backend instead of selecting the correct one (e.g., vLLM for safetensors, diffusers for DDUF).Unsafe path sanitization during unpack: When
modctlpackages a model from an absolute path, the filepath annotations in layers may contain..components. The previous fallback usedfilepath.Base()which flattened the entire directory structure, silently losing files in multi-directory models (e.g., two differentmodel.safetensorsin different subdirectories would collide to the same filename).DDUF format not fully supported:
DDUFPaths()did not resolve format-agnostic weight layers.matchesMediaType()had no case forMediaTypeDDUFin the generic weight section.parse.godid not look for.dduffiles, causing DDUF-only bundles to be unnecessarily recreated on every load.Solution
Format inference fallback (
partial.go,unpack.go,scheduler.go):config.formatis empty, scan weight layer filepath annotations for file extensions (.gguf,.safetensors,.dduf) to infer the format. This is implemented ingetModelFormat()→inferFormatFromAnnotations()inpartial.goandinferFormatFromLayerAnnotations()inunpack.go.inferFormatFromModel()checksmodel.DDUFPaths()as a final fallback.Path sanitization (
unpack.go):filepath.Base()fallback withsanitizeRelativePath()which strips only leading../segments while preserving the remaining directory structure (e.g.,../../a/b/model.safetensors→a/b/model.safetensors).DDUF support (
partial.go,parse.go):DDUFPaths()now passesgetModelFormat()to resolveMediaTypeWeightRawlayers as DDUF.matchesMediaType()now includes aMediaTypeDDUFcase in the generic weight type resolution, supporting bothFormatDDUFand legacyFormatDiffusers.parse.gonow searches for.dduffiles and includes them in bundle validation, preventing unnecessary bundle recreation.