This repo is the source of https://www.tidymodels.org, and this readme tells you how it all works.
- If you spot any small problems with the website, please feel empowered to fix them directly with a PR.
- If you see any larger problems, an issue is probably better: that way we can discuss the problem before you commit any time to it.

This repo (and resulting website) is licensed as CC BY-SA.
- Fork and clone the repo, then install R packages by running `R/installs.R` and restart R.
- Edit the relevant `.qmd` file(s) in `start/`, `learn/`, `find/`, etc.
- Render your changes locally with `quarto render <path/to/file.qmd>`, or skip local rendering by opening a PR and commenting `/render`: CI will render the changed pages and commit the outputs back to your branch.
- Commit the source and the rendered outputs (`_freeze/` entries and `.md` files) to your branch.
- Open a pull request.

If you're adding a new article to `learn/`, create a folder with an `index.qmd` inside it. If it needs packages beyond the core tidymodels meta-package, add them to `R/installs.R` and to the `r-packages:` field in the file's YAML front matter.
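The two steps for a new `learn/` article can be sketched in shell. `new_learn_article` is a hypothetical helper, not part of the repo, and the `title:` line is only a placeholder: adapt the rest of the YAML header from an existing post. Any extra packages still need to be added to `R/installs.R` by hand.

```bash
# Hypothetical helper (not part of this repo): create a learn/ article
# folder with an index.qmd whose front matter declares r-packages.
# Extra arguments are packages beyond the tidymodels meta-package.
new_learn_article() {
  local dir="$1"
  shift
  local qmd="$dir/index.qmd" pkg
  if [ -e "$qmd" ]; then
    echo "refusing to overwrite $qmd" >&2
    return 1
  fi
  mkdir -p "$dir"
  {
    echo "---"
    echo 'title: "TODO"'   # placeholder; copy the rest of the header from an existing post
    echo "r-packages:"
    echo "  - tidymodels"
    for pkg in "$@"; do
      echo "  - $pkg"
    done
    echo "---"
  } > "$qmd"
  echo "$qmd"              # print the path of the new file
}
```

For example, `new_learn_article learn/models/my-article ranger` creates `learn/models/my-article/index.qmd` declaring `tidymodels` and `ranger`, and refuses to overwrite an existing article.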
If you're adding a new package to the tidymodels ecosystem, add it to `make_function_lists/all_packages.csv` (one package name per row), then regenerate the function lists with `Rscript make_function_lists/run_all.R`.
When updating the site, the goal is to use the most recent CRAN versions of the modeling/data analysis packages.
- Get a local copy of the website source.
  - Users of devtools/usethis can do:

    ```r
    usethis::create_from_github("tidymodels/tidymodels.org")
    ```

    Note that `usethis::create_from_github()` works best when it can find a GitHub personal access token and usethis (git2r, really) is configured correctly for your preferred transport protocol (SSH vs HTTPS). See the usethis setup advice.
  - Otherwise, use your favorite method to fork and clone, or download the repo as a ZIP file and unpack it.
- Start R in your new `tidymodels.org/` directory.
- To install the required packages, run the code within `R/installs.R`. This file will also install the `keras` Python libraries and environments.
- Restart R.
- You should now be able to render the site in all the usual ways for Quarto by calling `quarto render`.
We use the latest release version of Quarto. You can install and manage different versions with qvm.
The website is deployed to GitHub Pages via the `publish.yml` workflow.

The source of the website is a collection of `.qmd` files stored in the folders in this repository. The site is then rendered as a Quarto HTML website.
- `packages/`: this is a top-level page on the site rendered from a single `.qmd` file.
- `start/`: these files make up a 5-part tutorial series to help users get started with tidymodels. Each article is a `.qmd` file in a page bundle, meaning that each article is in its own folder along with accompanying images, data, and rendered figures.
- `learn/`: these files make up the articles presented in the learn section. This section is nested: inside it there are 4 subsections, `models`, `statistics`, `work`, and `develop`. Each article is a `.qmd` file.
- `help/`: this is a top-level page on the site rendered from a single `.qmd` file.
- `contribute/`: this is a top-level page on the site rendered from a single `.qmd` file.
- `books/`: these files make up the books page, linked from resource stickies. To add a new book, create a new folder with a new `.qmd` file inside named `index.qmd`. An image file of the cover should be added in the same folder, named `cover.*`.
- `find/`: these files make up the find page, linked from the top navbar and resource stickies. Each of these pages is a `.qmd` file. The CSV data files in this directory are generated by scripts in `make_function_lists/`.
- `make_function_lists/`: scripts that generate the CSV reference lists for the find pages (see the section on generating function lists below). `all_packages.csv` in this folder lists all packages in the tidymodels ecosystem and is used by `tidymodels.R` to build the function reference.
- `about/`: author listing page rendered from `index.qmd` using `authors.yaml` and `template.ejs`.
- `cheatsheets/`: cheatsheet page rendered from a single `index.qmd` file.
This repo uses two Quarto profiles to split behavior between local and CI rendering:

- `_quarto-local.yml` (default): used when rendering locally. Defines post-render scripts: `learn/models/parsnip-predictions/normalize-h2o-output.R`, `R/post-render.R`, and `R/post-render-downlit.R`.
- `_quarto-production.yml`: used in CI via `QUARTO_PROFILE: production` in `publish.yml`. Also runs `R/post-render-downlit.R` so code linking applies to all HTML files, including frozen pages.

When adding a script that should only run locally, add it to `_quarto-local.yml`. If it should run in CI, add it to `_quarto-production.yml` and make sure the workflow installs the needed dependencies.
R functions in code blocks are hyperlinked to their documentation via the downlit package, enabled with `code-link: true` in `_quarto.yml`.
Because `library(tidymodels)` is not automatically expanded by downlit (unlike `library(tidyverse)`), `R/post-render-downlit.R` explicitly seeds the package list via `tidymodels::tidymodels_packages()` so functions like `step_*`, `tune()`, etc. are linked correctly.
Every `.qmd` file that contains R code declares its package dependencies in the YAML front matter using the `r-packages` field:

```yaml
r-packages:
  - tidymodels
  - ranger
  - kableExtra
```

Convention: list only packages that are not already members of the tidymodels meta-package. The full list of tidymodels members can be checked with `tidymodels::tidymodels_packages()`. For example, dplyr, ggplot2, modeldata, tune, and rlang are all covered by listing `tidymodels` and should not be listed separately.
This metadata is the foundation for tooling that can:
- install exactly the packages needed for a given page
- selectively re-render only pages affected by a package release
Pure prose pages (no R code chunks) do not need this field.
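A page's declared packages can be read back out of its front matter with a short shell function, which makes the field's structure concrete. `qmd_r_packages` is a hypothetical helper, not one of the repo's scripts (the repo's own tooling is written in R). It assumes the block-list style shown above and does not handle flow style such as `r-packages: [a, b]`.

```bash
# Hypothetical helper (not part of this repo): print the r-packages
# declared in a .qmd file's YAML front matter, one per line.
# Handles only the block-list style (one "- pkg" per line).
qmd_r_packages() {
  awk '
    NR == 1 && $0 == "---" { in_fm = 1; next }    # front matter opens on line 1
    in_fm && $0 == "---"   { exit }                # front matter closes
    !in_fm                 { exit }                # file has no front matter
    /^r-packages:[ \t]*$/  { in_list = 1; next }
    in_list && /^[ \t]*-[ \t]+/ {
      sub(/^[ \t]*-[ \t]+/, "")                    # strip the list marker
      sub(/[ \t]+$/, "")                           # strip trailing whitespace
      print
      next
    }
    in_list { in_list = 0 }                        # any other line ends the list
  ' "$1"
}
```

For example, `qmd_r_packages learn/models/parsnip-nnet/index.qmd` would print each declared package on its own line, and nothing for a pure prose page.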
- To add a new post to `learn/`, add a new folder with an `index.qmd` file in it and adapt the YAML header from an existing post. If new packages are required to run this post, add them to the `packages` object in `R/installs.R` and to the `r-packages` field in the new post's YAML front matter.
- To preview the site, render it locally with the latest Quarto release version.
- Rendered outputs are committed to the repo: the freeze cache (`_freeze/`) and the `.md` files kept via `keep-md: true`. The `publish.yml` workflow renders and publishes the site on Ubuntu via `quarto-actions/publish`; frozen pages are served from the committed cache rather than re-rendered. Always include rendered outputs in your PR.
- Rendering in CI via a PR comment: if you'd prefer not to render locally, comment `/render` on your open PR. A GitHub Actions workflow (`render-pr.yml`) will detect which `.qmd` files changed, install the needed packages, render those pages, and commit the output back to your branch. It posts a comment when done (or links to the failed run on error). Only repo owners, org members, and collaborators can trigger this.
- Note on platform differences: as the automated nightly re-render (`check-cran-releases.yml`) matures, pages will increasingly be rendered on Linux (Ubuntu) rather than macOS. The first time a page is re-rendered in CI, you may see numerical differences in the output: floating-point results can vary slightly between platforms due to differences in BLAS/LAPACK libraries and other system-level factors. These differences are expected and not a sign of a bug, but they should be reviewed before merging the automated PR.
- `keep-md: true` is set in `_quarto.yml` so that rendered `.md` files are committed alongside the source. This makes it possible to review in a PR whether code produced different results than before.
- To do a complete re-render, run the `R/re-render.R` script.
Some pages use engines that require large external downloads (Spark, torch). These are cached in CI to avoid re-downloading on every run.
Spark is pre-installed in CI via `.github/actions/setup-render/action.yml`. When upgrading Spark, update all four of these in lockstep:

- Cache key in `setup-render/action.yml`: `spark-3.5.8-java17-${{ runner.os }}`
- Cache path and download URL in the Install Spark step of `setup-render/action.yml`: `3.5.8`
- Version check in `R/install_packages.R`: `grepl("^3\\.5", ...)`
- `version` argument in all `spark_connect()` calls in `learn/models/parsnip-predictions/index.qmd`: `"3.5"`

Changing the cache key forces a fresh download on the next CI run. The `R/install_packages.R` check prevents a redundant download when Spark is already present (either from cache or a prior install).
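Because the four references live in different files, a quick consistency check before pushing can catch a missed one. The sketch below is hypothetical (no such script exists in the repo) and uses plain-text matching, so it assumes each file spells the version exactly as in the list above; the `spark-X.Y.Z` pattern used for the cache path and download URL is also an assumption about how those strings look.

```bash
# Hypothetical consistency check (not a script in this repo) for the four
# Spark version references. Plain-text matching: assumes each file spells
# the version the way the upgrade list shows.
check_spark_version() {
  local full="$1" root="${2:-.}"   # e.g. 3.5.8 and the repo root
  local minor="${full%.*}"         # 3.5.8 -> 3.5
  local action="$root/.github/actions/setup-render/action.yml"
  local installer="$root/R/install_packages.R"
  local page="$root/learn/models/parsnip-predictions/index.qmd"
  local status=0 r_pat others

  # 1. Cache key, e.g. spark-3.5.8-java17-${{ runner.os }}
  grep -qF "spark-$full-java17-" "$action" ||
    { echo "cache key in $action is not $full" >&2; status=1; }

  # 2. Cache path and download URL: every spark-X.Y.Z string must be $full
  others=$(grep -oE 'spark-[0-9]+\.[0-9]+\.[0-9]+' "$action" | grep -vxF "spark-$full" || true)
  [ -z "$others" ] ||
    { echo "stale Spark versions in $action: $others" >&2; status=1; }

  # 3. R check, written in the R source as "^3\\.5" (two literal backslashes)
  r_pat=$(printf '"^%s\\\\.%s"' "${minor%%.*}" "${minor#*.}")
  grep -qF "$r_pat" "$installer" ||
    { echo "version check in $installer is not for $minor" >&2; status=1; }

  # 4. version argument of spark_connect(), e.g. version = "3.5"
  grep -qF "\"$minor\"" "$page" ||
    { echo "no \"$minor\" in $page" >&2; status=1; }

  return "$status"
}
```

Run from the repo root, `check_spark_version 3.5.8` returns 0 when all four places agree and prints one message per mismatch otherwise.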
We try to do a re-render after a release of a main package:

- Make sure that `make_function_lists/all_packages.csv` is up to date.
- Run the `R/installs.R` script. Make sure to check that dev versions aren't present.
- Run the `R/re-render.R` script.
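For the "check that dev versions aren't present" step, one heuristic is to look for the tidyverse-style development suffix: a fourth version component of 9000 or more, as in `1.2.0.9000`. The function below is a hypothetical sketch, not part of `R/installs.R`; it reads `package version` lines, which the commented `Rscript` call produces from `installed.packages()`.

```bash
# Heuristic sketch (not part of this repo): read "package version" lines
# on stdin and print those whose fourth version component is 9000 or
# more, the usual marker of a development version (e.g. 1.2.0.9000).
flag_dev_versions() {
  awk '{
    n = split($2, v, /[.-]/)
    if (n >= 4 && v[4] + 0 >= 9000) print $1, $2
  }'
}

# Usage: list installed packages with R, then filter.
#   Rscript -e 'ip <- installed.packages()[, c("Package", "Version")]
#     write.table(ip, quote = FALSE, row.names = FALSE, col.names = FALSE)' |
#     flag_dev_versions
```

Empty output means no development versions were detected; this is only a naming heuristic, so a package installed from GitHub without a `.9000` suffix would not be caught.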
To re-render only the pages affected by one or more package updates, use `R/re-render-package.R`:

```bash
Rscript R/re-render-package.R ranger
Rscript R/re-render-package.R ranger glmnet   # union of affected pages, deduped
Rscript R/re-render-package.R tidymodels      # all pages that use tidymodels
Rscript R/re-render-package.R --all           # every page on the site
```

This reads `data/package_map.json` to find affected pages, clears their freeze cache, and re-renders them.
To re-render specific pages directly, use `R/re-render-pages.R`:

```bash
Rscript R/re-render-pages.R learn/models/parsnip-nnet/index.qmd
Rscript R/re-render-pages.R learn/models/parsnip-nnet/index.qmd start/resampling/index.qmd
```

This clears the freeze cache for each page and re-renders it.
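To make the mechanics concrete, here is a rough shell equivalent of "clear the freeze cache and re-render" for a single page. It is an illustration, not `R/re-render-pages.R` itself, and it assumes Quarto's default freeze layout, in which `path/to/page.qmd` is cached under `_freeze/path/to/page/`. Use the R scripts for real work.

```bash
# Illustrative shell equivalent (not the repo's script). Assumes Quarto's
# default freeze layout: path/to/page.qmd is cached in _freeze/path/to/page/.
freeze_dir_for() {
  local page="${1#./}"            # tolerate a leading ./
  printf '_freeze/%s\n' "${page%.qmd}"
}

rerender_page() {
  local page="$1" dir
  dir=$(freeze_dir_for "$page")
  rm -rf -- "$dir"                # drop the cached execution results
  quarto render "$page"           # re-execute the code and refreeze
}

# Usage, from the repo root:
#   rerender_page learn/models/parsnip-nnet/index.qmd
```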
- `data/package_map.json`: maps each package to the pages that depend on it. Regenerate it after changing any `r-packages:` field with `Rscript R/make_package_map.R`.
- `data/_versions.json`: records the installed package versions at the time of the last render. Update it after any re-render with `Rscript R/make_versions.R`.
The `check-cran-releases.yml` workflow runs on weekdays at 4am Pacific time. It compares current CRAN versions against `data/_versions.json` and, if any packages have updated, automatically:

- Installs only the packages needed for the affected pages (via `R/install_for_packages.R`, which uses the shared `R/install_packages.R` helper)
- Re-renders the affected pages
- Updates `data/_versions.json` and `data/package_map.json`
- Opens a pull request for review, including the old and new versions of each updated package

If any page fails to render, an issue is opened instead of a PR, with a link to the failed workflow run. The `data/_versions.json` and `data/package_map.json` files are not updated on failure, so the workflow will retry on the next run.
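The version comparison at the heart of this workflow boils down to a per-package diff. The sketch below is illustrative only: it assumes both version sets have been flattened into `package version` lines (the real `data/_versions.json` is JSON, and the workflow's own implementation may differ) and prints `package old new` for each package whose version changed, mirroring the old/new pairs reported in the PR.

```bash
# Illustrative sketch (not the workflow's code): given a file of recorded
# "package version" lines and a file of current ones, print
# "package old new" for each recorded package whose version differs.
changed_packages() {
  awk '
    FILENAME == ARGV[1] { old[$1] = $2; next }    # recorded versions
    ($1 in old) && old[$1] != $2 { print $1, old[$1], $2 }
  ' "$1" "$2"
}
```

An empty result corresponds to "nothing has updated"; packages that appear only in the current list are ignored, since they have no recorded version to compare against.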
You can also trigger it manually from the GitHub Actions UI, or with the `gh` CLI:

```bash
# Normal version check
gh workflow run check-cran-releases.yml
# Force re-render for specific packages
gh workflow run check-cran-releases.yml -f packages="ranger glmnet"
# Re-render every page
gh workflow run check-cran-releases.yml -f packages="--all"
```

To re-render specific pages by path (rather than by package), use the `render-pages.yml` workflow:

```bash
gh workflow run render-pages.yml -f pages="learn/models/parsnip-nnet/index.qmd"
# Multiple pages:
gh workflow run render-pages.yml -f pages="learn/models/parsnip-nnet/index.qmd start/resampling/index.qmd"
```

This installs the packages declared in each page's `r-packages:` front matter, clears the freeze cache, renders the pages, and opens a pull request.
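To build the `pages=` value from the `.qmd` files your branch changed, something like the following works. `qmd_pages_arg` is a hypothetical helper; the usage lines assume your branch was cut from `main`, and because the workflow input is space-separated, paths containing spaces are not supported.

```bash
# Hypothetical helper (not part of this repo): read changed file paths on
# stdin, one per line, and print the .qmd files that still exist as one
# space-separated string suitable for -f pages="...".
qmd_pages_arg() {
  local f out=""
  while IFS= read -r f || [ -n "$f" ]; do
    case "$f" in
      *.qmd) [ -f "$f" ] && out="${out:+$out }$f" ;;
    esac
  done
  printf '%s' "$out"
}

# Usage, from the repo root (assumes your branch was cut from main):
#   pages=$(git diff --name-only main...HEAD | qmd_pages_arg)
#   [ -n "$pages" ] && gh workflow run render-pages.yml -f pages="$pages"
```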
The `find/` pages display searchable tables of functions, models, and recipe steps. The data for these tables comes from CSV files generated by scripts in `make_function_lists/`.
Regenerate the function lists when:
- New packages are added to the tidymodels ecosystem
- After major CRAN releases of tidymodels packages
- When new models or recipe steps are added
When adding a new package to the tidymodels ecosystem, first add it to `make_function_lists/all_packages.csv` (one package name per row), then regenerate the function lists.
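Adding a package by hand is a one-line edit, but a small guard avoids duplicate rows. The function below is a hypothetical helper (not in `make_function_lists/`) that assumes names are stored bare, one per line, without quotes, and appends the name only if it is missing.

```bash
# Hypothetical helper (not in make_function_lists/): append a package to
# all_packages.csv, one bare name per row, unless it is already listed.
add_tidymodels_package() {
  local pkg="$1" csv="${2:-make_function_lists/all_packages.csv}"
  if grep -qxF -- "$pkg" "$csv" 2>/dev/null; then
    echo "$pkg is already listed in $csv" >&2
    return 0
  fi
  # If the file lacks a trailing newline, add one so the new row stays separate.
  if [ -s "$csv" ] && [ -n "$(tail -c 1 "$csv")" ]; then
    printf '\n' >> "$csv"
  fi
  printf '%s\n' "$pkg" >> "$csv"
}
```

For example: `add_tidymodels_package tidyclust && Rscript make_function_lists/run_all.R`.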
To regenerate all function lists:

```bash
Rscript make_function_lists/run_all.R
```

To force a fresh run (ignoring cache):

```bash
Rscript make_function_lists/run_all.R --fresh
```

To run individual generators:

```bash
Rscript make_function_lists/broom.R
Rscript make_function_lists/recipes.R
Rscript make_function_lists/tidymodels.R
Rscript make_function_lists/parsnip.R
Rscript make_function_lists/sparse.R
Rscript make_function_lists/tidyclust.R
```

The styling of this website is handled in several places. Some high-level settings are in the `format` section of `_quarto.yml`, and the rest of the main styles are in `styles.scss`.
The front page has a number of detailed styles, all located in `styles-frontpage.scss`. They are all wrapped in the `#FrontPage` ID, so they shouldn't affect anything outside the front page.
The sidebar for the Get Started section has a unique style, specified in the `start/styles.css` file, which is loaded into each of these pages with either `css: styles.css` or `css: ../styles.css`.