Conversation
…ransformers.js into v4-pnpm-workspaces
Very nice! I think we should add some docs (either a new CONTRIBUTING.md or in README.md), which explains the new repo structure and how things tie together. When we add react-transformers, we can document how the multi-dev-server works.
Some other comments:
I think we should define per-package github workflow tests. Kind of like https://github.com/huggingface/huggingface.js/tree/main/.github/workflows. WDYT?
You mean publish workflows per package? Definitely makes sense.
Yep exactly. This can be a follow-up PR.
PR's looking pretty good imo! 🙌 I made a couple of fixes for documentation generation, and I've now tested it with hf-doc-builder, and everything seems to be correct. In case you want to run it locally, you can:
- install https://github.com/huggingface/doc-builder from source (pip install --upgrade git+https://github.com/huggingface/doc-builder)
- in packages/transformers, run pnpm docs-api, then doc-builder build transformers.js ./docs/source/ --not_python_module --build_dir ./docs/build/, then doc-builder preview transformers.js ./docs/source/ --not_python_module.
Some things you may be able to check out:
- Our root directory is filled with lots of config files. Can we find a way to reduce this count? e.g., are both tsconfig files necessary (do we need one in the root if we already have one at the per-project level)?
- We should add a solid CONTRIBUTING.md, probably following from your work in #1433, including how the project is structured, the build process, opening PRs, etc.
- We dynamically build the Transformers.js README.md (pnpm readme), but we don't want to commit duplicates of this file. On the other hand, when publishing, we would need the README.md inside the packages/transformers folder. On that note, we would probably eventually need to make the monorepo README different from the transformers.js README (where the monorepo README explains the project structure).
- Some pages (e.g., api/tokenizers) contain many unnecessary sections (because these are exported by the module). Perhaps we can find a way with https://www.npmjs.com/package/jsdoc-to-markdown to filter definitions so that they are only visible when they are global exports of transformers.js.
- Documentation improvements, as a whole, would be most welcome (in a separate PR), and the doc builder would then be able to include code snippets inside the generated docs.
@xenova, I added a CONTRIBUTING.md and made some changes to the config files. Unfortunately, I don't think we can remove more files.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
* switched to pnpm workspaces
* updated github actions
* added comments
* Update tensor.js
* Formatting
* Update tsconfig.json
* Update tsconfig.json
* fixed circular reference error in pipelines/zero-shot-audio-classification.js
* Post-tsconfig updates
* Move transformers.js docs to package folder
* Move additional tests
* JSDoc update
* Version bumps
* Update incorrect test
* Update test_modeling_musicgen.js
* Update test_modeling_musicgen.js
* Update test_modeling_musicgen.js
* fixed broken symlink
* fixes after review
* Remove old conversion scripts
  Users should use onnxruntime-genai or optimum directly
* Update .prettierrc
* Formatting
* Update readme/docs
* Move build scripts to parent folder
* Remove unused tests
* Remove old compare function
* Fix JSDoc
* Update generate.js
* Update inline descriptions
* Bump versions
* Update node imports
* Add module header to FileCache.js
* JSDoc updates
* Update tensor.js
* Move prettier config to package.json key
* Update FileCache.js
* Remove unused import
* Remove non-existent file include
* Prefer non-default exports
* Update doc module exports
* Update docs generation script
* merged tsconfigs and added contributing.md
* Update path_to_docs
* Formatting
* Formatting
* Formatting
* Update prettier usage
* Remove <code> tags from headers
* Swap docs-preview and docs-build commands
* ONNXRUNTIME_NODE_INSTALL=skip for doc-builder
* Update buildAll.mjs
* Update index
---------
Co-authored-by: Joshua Lochner <26504141+xenova@users.noreply.github.com>

This PR is a proposal to move to pnpm workspaces.
why workspaces?
Right now we have only one package: @huggingface/transformers. Everything we ship can be used from this one package. In the future, we also want to be able to ship small, domain-specific packages: for example, a framework-specific package such as @huggingface/transformers-react, or abstractions like a text-generation package that is compatible with the Responses API (@huggingface/transformers-responses).
Using workspaces will allow us to ship those small packages that depend on @huggingface/transformers without the overhead of individual repos.
why pnpm?
pnpm is a package manager for Node.js that brings a lot of benefits, especially in how it handles disk space.
pnpm stores each package version once in a global store and hard-links it to all workspaces that need it, which avoids the disk space waste of npm duplicating packages everywhere. The symlink-based node_modules structure means packages can only import what they explicitly declare as dependencies, preventing the "works on my machine" issues where code accidentally relies on hoisted packages.
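To make the disk-space point concrete, here is a minimal Node.js sketch of the hard-linking idea only. It is not pnpm's actual implementation: the directory layout and the demo-package name are hypothetical, and the real store is content-addressable and more involved. It shows that a file hard-linked into two projects shares a single inode with the store copy, so the content exists on disk once.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical layout: one "store" directory and two projects in a temp folder.
const root = fs.mkdtempSync(path.join(os.tmpdir(), "hardlink-demo-"));
const storeFile = path.join(root, "store", "demo-package", "index.js");
fs.mkdirSync(path.dirname(storeFile), { recursive: true });
fs.writeFileSync(storeFile, "module.exports = 'hello';\n");

// Hard-link the single stored file into each project's node_modules.
const linkedCopies = ["project-a", "project-b"].map((project) => {
  const target = path.join(root, project, "node_modules", "demo-package", "index.js");
  fs.mkdirSync(path.dirname(target), { recursive: true });
  fs.linkSync(storeFile, target); // creates a hard link, not a copy
  return target;
});

// Every path points at the same inode, so no content is duplicated.
const storeStat = fs.statSync(storeFile);
const sharesInode = linkedCopies.every(
  (file) => fs.statSync(file).ino === storeStat.ino,
);
console.log({ sharesInode, linkCount: storeStat.nlink });

// Clean up the temporary demo directory.
fs.rmSync(root, { recursive: true, force: true });
```

The link count reported by the filesystem counts the store file plus one entry per project that links to it.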
The workspace:* protocol links local packages during development and converts to real versions when you publish, and installs are faster because pnpm fetches packages in parallel and reuses everything from the global store. You just need a pnpm-workspace.yaml file to set it up.
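As a rough illustration of the publish-time conversion, the sketch below mimics how a workspace: specifier could be turned into a regular version range. It is an approximation written for this discussion, not pnpm's code: the function names and the 4.0.0 version are hypothetical, and alias forms of the protocol are not handled.

```javascript
// Sketch: rewrite a `workspace:` specifier into a publishable range.
function resolveWorkspaceSpec(spec, localVersion) {
  const prefix = "workspace:";
  if (!spec.startsWith(prefix)) return spec; // ordinary registry range, left untouched
  const range = spec.slice(prefix.length);
  if (range === "*") return localVersion; // workspace:* becomes the exact local version
  if (range === "^" || range === "~") return range + localVersion; // workspace:^ becomes ^<version>
  return range; // workspace:^1.2.0 just drops the prefix
}

// Apply the rewrite to a whole dependencies map.
function rewriteDependencies(dependencies, workspaceVersions) {
  const rewritten = {};
  for (const [name, spec] of Object.entries(dependencies)) {
    rewritten[name] = resolveWorkspaceSpec(spec, workspaceVersions[name]);
  }
  return rewritten;
}

// Hypothetical dependencies of a package such as @huggingface/transformers-react.
const published = rewriteDependencies(
  { "@huggingface/transformers": "workspace:*", react: "^18.0.0" },
  { "@huggingface/transformers": "4.0.0" }, // assumed local version, for illustration
);
console.log(published);
```

During development the workspace: entry resolves to the local package folder; only the packed or published manifest carries the rewritten range.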
Also, pnpm seems to be the standard for workspaces in the npm ecosystem: big repos like the Vercel AI SDK, langchainjs and others are using it.