Skip to content

Add CNCF ModelPack support and improve model format detection#813

Open
ilopezluna wants to merge 6 commits intomainfrom
unpack-cncf
Open

Add CNCF ModelPack support and improve model format detection#813
ilopezluna wants to merge 6 commits intomainfrom
unpack-cncf

Conversation

@ilopezluna
Copy link
Copy Markdown
Contributor

@ilopezluna ilopezluna commented Mar 30, 2026

Add CNCF ModelPack support and improve model format detection

Problem

CNCF ModelPack models (packaged with tools like modctl) were not fully supported. Several issues existed:

  1. Format detection gaps for models without config.format: CNCF ModelPack models may use the format-agnostic media type application/vnd.cncf.model.weight.v1.raw and omit the optional config.format field. This caused GGUFPaths(), SafetensorsPaths(), and DDUFPaths() to return empty results, and the scheduler to fall back to the default backend instead of selecting the correct one (e.g., vLLM for safetensors, diffusers for DDUF).

  2. Unsafe path sanitization during unpack: When modctl packages a model from an absolute path, the filepath annotations in layers may contain .. components. The previous fallback used filepath.Base() which flattened the entire directory structure, silently losing files in multi-directory models (e.g., two different model.safetensors in different subdirectories would collide to the same filename).

  3. DDUF format not fully supported: DDUFPaths() did not resolve format-agnostic weight layers. matchesMediaType() had no case for MediaTypeDDUF in the generic weight section. parse.go did not look for .dduf files, causing DDUF-only bundles to be unnecessarily recreated on every load.

Solution

Format inference fallback (partial.go, unpack.go, scheduler.go):

  • When config.format is empty, scan weight layer filepath annotations for file extensions (.gguf, .safetensors, .dduf) to infer the format. This is implemented in getModelFormat()inferFormatFromAnnotations() in partial.go and inferFormatFromLayerAnnotations() in unpack.go.
  • The scheduler's inferFormatFromModel() checks model.DDUFPaths() as a final fallback.

Path sanitization (unpack.go):

  • Replaced filepath.Base() fallback with sanitizeRelativePath() which strips only leading ../ segments while preserving the remaining directory structure (e.g., ../../a/b/model.safetensorsa/b/model.safetensors).
  • Added collision detection: if two different annotations resolve to the same destination path after sanitization, the unpack fails with an explicit error instead of silently dropping files.

DDUF support (partial.go, parse.go):

  • DDUFPaths() now passes getModelFormat() to resolve MediaTypeWeightRaw layers as DDUF.
  • matchesMediaType() now includes a MediaTypeDDUF case in the generic weight type resolution, supporting both FormatDDUF and legacy FormatDiffusers.
  • parse.go now searches for .dduf files and includes them in bundle validation, preventing unnecessary bundle recreation.
MODEL_RUNNER_HOST=http://localhost:13434 docker model run ghcr.io/modelpack/qwen-qwen2.5-0.5b-instruct:latest
Unable to find model 'ghcr.io/modelpack/qwen-qwen2.5-0.5b-instruct:latest' locally. Pulling from the server.
599bab540750: Pull complete [==================================================>]  1.672MB/1.672MB
e558847a8b44: Pull complete [==================================================>]     242B/242B
c0382117ea32: Pull complete [==================================================>]  7.032MB/7.032MB
18e18afcacca: Pull complete [==================================================>]     659B/659B
5b5d4f65d0ac: Pull complete [==================================================>]  7.305kB/7.305kB
fdf756fa7fcb: Pull complete [==================================================>]  988.1MB/988.1MB
b19c806a904d: Pull complete [==================================================>]  4.917kB/4.917kB
Model pulled successfully
> hello!
Hello! How can I assist you today?

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds support for the CNCF ModelPack format and includes tests for the new media types. A critical security vulnerability was identified in pkg/distribution/internal/bundle/unpack.go:684, where the filepath.Base fallback fails to prevent directory traversal; the sanitized path must be re-validated. SCORE: 3/10 | VERDICT: Not ready | ISSUES: path traversal vulnerability in pkg/distribution/internal/bundle/unpack.go:684

@ilopezluna
Copy link
Copy Markdown
Contributor Author

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Review: Support for CNCF ModelPack format. ### Summary: This pull request introduces support for the CNCF ModelPack format, allowing the system to unpack models using layer-per-file approaches with specific media types. It also includes a security improvement to handle and sanitize potentially unsafe file paths from external packaging tools. ### Warnings: - pkg/distribution/internal/bundle/unpack.go:689: The error message for failed path sanitization is misleading as it wraps the original validation error instead of the sanitization error. ### What looks good: Comprehensive test coverage for new media types and path validation edge cases. SCORE: 8/10 | VERDICT: Nearly ready | ISSUES: misleading error message in path sanitization

Address review comment: filepath.Base can return unsafe values like
'..' or '/' for certain inputs. After sanitizing a path that failed
validation, re-validate the result to prevent directory traversal
attacks in UnpackFromLayers.
@ilopezluna ilopezluna changed the title [WIP] feat: add support for CNCF ModelPack format in unpacking logic Add CNCF ModelPack support and improve model format detection Apr 1, 2026
@ilopezluna ilopezluna marked this pull request as ready for review April 1, 2026 13:29
Copy link
Copy Markdown
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 2 issues, and left some high level feedback:

  • The format-inference logic in partial.getModelFormat/inferFormatFromAnnotations and bundle.inferFormatFromLayerAnnotations is nearly identical; consider extracting a shared helper to avoid drift in behavior between unpacking and partial path discovery.
  • The matchesMediaType switch now explicitly enumerates many media types just to fall through with no special handling; consider simplifying this with a default case or a small allowlist to reduce noise and the chance of missing new types in future edits.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The format-inference logic in `partial.getModelFormat`/`inferFormatFromAnnotations` and `bundle.inferFormatFromLayerAnnotations` is nearly identical; consider extracting a shared helper to avoid drift in behavior between unpacking and partial path discovery.
- The `matchesMediaType` switch now explicitly enumerates many media types just to fall through with no special handling; consider simplifying this with a default case or a small allowlist to reduce noise and the chance of missing new types in future edits.

## Individual Comments

### Comment 1
<location path="pkg/distribution/internal/bundle/unpack.go" line_range="535-544" />
<code_context>
+//
+// This is safer than filepath.Base which would flatten the entire path,
+// losing nested directory structure and causing silent collisions.
+func sanitizeRelativePath(p string) string {
+	// Normalize to forward slashes and clean
+	cleaned := filepath.ToSlash(filepath.Clean(p))
+
+	// Strip leading "../" segments
+	for strings.HasPrefix(cleaned, "../") {
+		cleaned = cleaned[len("../"):]
+	}
+	// Handle the case where the entire path is ".."
+	if cleaned == ".." {
+		return ""
+	}
+
+	return cleaned
+}
+
</code_context>
<issue_to_address>
**issue (bug_risk):** Handling of paths that collapse entirely to ".." or similar may lead to writing directly into modelDir

For a path like "../../..", `filepath.Clean` and the loop strip all `"../"` segments so `cleaned` becomes `""`, and the `cleaned == ".."` check never fires. `sanitizeRelativePath` then returns `""`, causing `relPath == ""` and `destPath` to be `modelDir`, so we unpack directly into the bundle root. This is exactly the sort of pathological input we want to guard against. Consider treating a fully-collapsed path (empty string) as invalid and returning an error instead of mapping it to the root directory.
</issue_to_address>

### Comment 2
<location path="pkg/distribution/internal/bundle/unpack.go" line_range="721-726" />
<code_context>
+		// ".." components when the model was packaged using an absolute path.
+		// In that case, strip leading ".." segments while preserving the
+		// remaining directory structure to avoid flattening nested layouts.
 		if err := validatePathWithinDirectory(modelDir, relPath); err != nil {
-			return nil, fmt.Errorf("invalid filepath annotation %q: %w", relPath, err)
+			sanitizedPath := sanitizeRelativePath(relPath)
+			// Re-validate the sanitized path to ensure it's safe.
+			if err2 := validatePathWithinDirectory(modelDir, sanitizedPath); err2 != nil {
+				return nil, fmt.Errorf("invalid filepath annotation %q could not be sanitized: %w", relPath, err)
+			}
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Error wrapping uses the original validation error instead of the one from the sanitized path

In the sanitization branch, if both validations fail we still wrap `err` rather than `err2`, so the final error message refers to the original path instead of `sanitizedPath`. Please wrap `err2` (or include it) so the error reflects why the sanitized path was rejected.

```suggestion
		if err := validatePathWithinDirectory(modelDir, relPath); err != nil {
			sanitizedPath := sanitizeRelativePath(relPath)
			// Re-validate the sanitized path to ensure it's safe.
			if err2 := validatePathWithinDirectory(modelDir, sanitizedPath); err2 != nil {
				return nil, fmt.Errorf(
					"invalid filepath annotation %q (sanitized as %q) could not be sanitized: original error: %v, sanitized error: %w",
					relPath, sanitizedPath, err, err2,
				)
			}
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@ilopezluna ilopezluna requested a review from a team April 1, 2026 13:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant