[Web][COS] Persist URL→hash mapping across page loads #19569
Open
tomayac wants to merge 1 commit into
Conversation
The CrossOriginStorage class was storing the URL→hash map only in the module-level GLOBAL_HASH_CACHE. After a page reload that cache is empty, and getFileHash() can only recover hashes for HuggingFace LFS files (URLs containing /resolve/). This left several resource categories uncacheable across sessions:

- JSON files not stored in LFS (mlc-chat-config.json, tokenizer.json, tensor-cache.json): getFileHash returns null for their /resolve/ URLs because fetching them yields the actual file content, not an LFS pointer.
- .wasm files from GitHub raw URLs: no /resolve/ pattern at all.
- Any file whose hash was computed from blob content via getBlobHash.

Additionally, even for genuine LFS model shards, each page load re-fetched every shard's LFS pointer file over the network just to re-derive the SHA-256 hash.

Fix: persist the URL→hash mapping to a dedicated Cache API store (tvmjs-cos-hash-meta). There are two write sites, sketched below:

1. put(): after a file is stored in COS, persist its blob-derived hash. This covers all non-LFS files and non-HuggingFace URLs.
2. resolveHashDescriptor(): after getFileHash() resolves a hash from the LFS pointer, persist it immediately. This eliminates repeated pointer-file network requests for model shards on subsequent visits.

Both write sites use a best-effort try/catch so storage quota errors are silently ignored. loadPersistedHashEntry() similarly swallows errors. The typeof caches === "undefined" guard keeps the code safe in Node.js test environments.
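For reference, a minimal sketch of what the two persistence helpers could look like. The names persistHashEntry and loadPersistedHashEntry, the tvmjs-cos-hash-meta store, the environment guard, and the swallow-all-errors behavior come from the description above; sketching them as standalone functions, keying entries by URL, and storing the hash as a Response body are illustrative assumptions, not the PR's actual code.

```typescript
// Hypothetical sketch; tvmjs's actual implementation may differ.
const HASH_META_CACHE = "tvmjs-cos-hash-meta";

// Best-effort: persist one URL→hash entry; any failure (e.g. quota) is swallowed.
async function persistHashEntry(url: string, hash: string): Promise<void> {
  // The guard keeps this a no-op in Node.js test environments without the Cache API.
  if (typeof caches === "undefined") {
    return;
  }
  try {
    const cache = await caches.open(HASH_META_CACHE);
    // Key by the resource URL; store the hex hash as the response body.
    await cache.put(url, new Response(hash));
  } catch {
    // Storage quota (or any other) error is silently ignored.
  }
}

// Best-effort: return the persisted hash for a URL, or null if absent.
async function loadPersistedHashEntry(url: string): Promise<string | null> {
  if (typeof caches === "undefined") {
    return null;
  }
  try {
    const cache = await caches.open(HASH_META_CACHE);
    const response = await cache.match(url);
    return response ? await response.text() : null;
  } catch {
    return null;
  }
}
```

Keying by URL and storing the hash as the response body avoids a separate index, and, unlike the module-level GLOBAL_HASH_CACHE, Cache API entries survive page reloads.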
Contributor
Code Review
This pull request implements persistent caching for hash descriptors in CrossOriginStorage using the browser's Cache API. It introduces persistHashEntry and loadPersistedHashEntry methods to store and retrieve hashes, reducing redundant network requests for both non-LFS files and LFS model shards on subsequent visits. I have no feedback to provide as there were no review comments.
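To make the flow concrete, here is an illustrative sketch of the blob-derived hash and the consult-then-persist read path. blobSha256 uses the standard Web Crypto API and merely stands in for the PR's getBlobHash; resolveHash approximates the behavior the description attributes to resolveHashDescriptor(). All names and signatures here are assumptions.

```typescript
// SHA-256 of a Blob's contents, hex-encoded (illustrative stand-in for getBlobHash).
async function blobSha256(blob: Blob): Promise<string> {
  const digest = await crypto.subtle.digest("SHA-256", await blob.arrayBuffer());
  return Array.from(new Uint8Array(digest))
    .map((byte) => byte.toString(16).padStart(2, "0"))
    .join("");
}

// Consult the persisted mapping first; only fetch and parse the LFS pointer
// file when no entry exists, then persist the result so the next page load
// skips the pointer-file network request entirely.
async function resolveHash(
  url: string,
  load: (url: string) => Promise<string | null>,
  persist: (url: string, hash: string) => Promise<void>,
  hashFromLfsPointer: (url: string) => Promise<string | null>,
): Promise<string | null> {
  const persisted = await load(url);
  if (persisted !== null) {
    return persisted; // cache hit: no pointer-file request needed
  }
  const hash = await hashFromLfsPointer(url);
  if (hash !== null) {
    await persist(url, hash); // best-effort write-through
  }
  return hash;
}
```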
Author
Most likely @CharlieFRuan, @guan404ming, and @akaashrp would be good reviewers for this refinement.
Contributor
Thanks for the ping @tomayac. Will take a look soon.