tidymodels.org

This repo is the source of https://www.tidymodels.org, and this readme tells you how it all works.

  • If you spot any small problems with the website, please feel empowered to fix them directly with a PR.

  • If you see any larger problems, an issue is probably better: that way we can discuss the problem before you commit any time to it.

This repo (and resulting website) is licensed as CC BY-SA.

Contributing

  1. Fork and clone the repo, install the R packages by running R/installs.R, and restart R.
  2. Edit the relevant .qmd file(s) in start/, learn/, find/, etc.
  3. Render your changes locally with quarto render <path/to/file.qmd>, or skip local rendering by opening a PR and commenting:
    /render
    
    CI will render the changed pages and commit the outputs back to your branch.
  4. Commit the source and the rendered outputs (_freeze/ entries and .md files) to your branch.
  5. Open a pull request.

If you're adding a new article to learn/, create a folder with an index.qmd inside it. If it needs packages beyond the core tidymodels meta-package, add them to R/installs.R and to the r-packages: field in the file's YAML front matter.

Adding a new package to the ecosystem

If you're adding a new package to the tidymodels ecosystem, add it to make_function_lists/all_packages.csv (one package name per row), then regenerate the function lists with Rscript make_function_lists/run_all.R.

Requirements to preview the site locally

R packages

When updating the site, the goal is to use the most recent CRAN versions of the modeling/data analysis packages.

  1. Get a local copy of the website source.

    • Users of devtools/usethis can do:
      usethis::create_from_github("tidymodels/tidymodels.org")
      Note that usethis::create_from_github() works best when it can find a GitHub personal access token and usethis (git2r, really) is configured correctly for your preferred transport protocol (SSH vs HTTPS). See the usethis setup advice for details.
    • Otherwise, use your favorite method to fork and clone or download the repo as a ZIP file and unpack.
  2. Start R in your new tidymodels.org/ directory.

  3. To install the required packages, run the code within

    R/installs.R
    

    This file also installs the keras Python libraries and environments.

  4. Restart R.

  5. You should now be able to render the site in any of the usual Quarto ways, for example by calling quarto render.

Quarto

We use the latest release version of Quarto. You can install and manage different versions with qvm.

The website is deployed to GitHub Pages via the publish.yml workflow.

Structure

The source of the website is a collection of .qmd files stored in the folders of this repository. These files are rendered as a Quarto HTML website.

  • packages/: this is a top-level page on the site rendered from a single .qmd file.

  • start/: these files make up a 5-part tutorial series to help users get started with tidymodels. Each article is a .qmd file stored as a page bundle, meaning that each article is in its own folder along with accompanying images, data, and rendered figures.

  • learn/: these files make up the articles presented in the learn section. This section is nested: it contains four subsections (models, statistics, work, develop). Each article is a .qmd file.

  • help/: this is a top-level page on the site rendered from a single .qmd file.

  • contribute/: this is a top-level page on the site rendered from a single .qmd file.

  • books/: these files make up the books page, linked from resource stickies. To add a new book, create a new folder with a new .qmd file inside named index.qmd. An image file of the cover should be added in the same folder, named cover.*.

  • find/: these files make up the find page, linked from the top navbar and resource stickies. Each of these pages is a .qmd file. The CSV data files in this directory are generated by scripts in make_function_lists/.

  • make_function_lists/: scripts that generate the CSV reference lists for the find pages. See Generating function lists below. all_packages.csv in this folder lists all packages in the tidymodels ecosystem and is used by tidymodels.R to build the function reference.

  • about/: author listing page rendered from index.qmd using authors.yaml and template.ejs.

  • cheatsheets/: cheatsheet page rendered from a single index.qmd file.

Quarto profiles

This repo uses two Quarto profiles to split behavior between local and CI rendering:

  • _quarto-local.yml (default): used when rendering locally. Defines post-render scripts: learn/models/parsnip-predictions/normalize-h2o-output.R, R/post-render.R, and R/post-render-downlit.R.
  • _quarto-production.yml: used in CI via QUARTO_PROFILE: production in publish.yml. Also runs R/post-render-downlit.R so code linking applies to all HTML files including frozen pages.

When adding a script that should only run locally, add it to _quarto-local.yml. If it should run in CI, add it to _quarto-production.yml and ensure the workflow installs the needed dependencies.

Code linking

R functions in code blocks are hyperlinked to their documentation via the downlit package, enabled with code-link: true in _quarto.yml.

Because library(tidymodels) is not automatically expanded by downlit (unlike library(tidyverse)), R/post-render-downlit.R explicitly seeds the package list via tidymodels::tidymodels_packages() so functions like step_*, tune(), etc. are linked correctly.

Package metadata

Every .qmd file that contains R code declares its package dependencies in the YAML front matter using the r-packages field:

r-packages:
  - tidymodels
  - ranger
  - kableExtra

Convention: list only packages that are not already members of the tidymodels meta-package. The full list of tidymodels members can be checked with tidymodels::tidymodels_packages(). For example, dplyr, ggplot2, modeldata, tune, and rlang are all covered by listing tidymodels and should not be listed separately.

This metadata is the foundation for tooling that can:

  • install exactly the packages needed for a given page
  • selectively re-render only pages affected by a package release

Pure prose pages (no R code chunks) do not need this field.
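
As a rough illustration of how tooling might read this field, the sketch below pulls the r-packages entries out of a page's YAML front matter with awk. The helper name list_r_packages and the sample page are hypothetical, and the sketch only handles the simple block-list layout shown above; the repo's own scripts may parse the metadata differently.

```shell
# Hypothetical helper: print the r-packages entries from a .qmd file's front matter.
# Assumes the block-list layout ("r-packages:" followed by "  - name" lines).
list_r_packages() {
  awk '
    /^---[[:space:]]*$/ { fences++; if (fences == 2) exit; next }  # front matter delimiters
    fences != 1 { next }                                           # ignore anything outside it
    /^r-packages:/ { in_list = 1; next }                           # start of the list
    in_list && /^[[:space:]]*-[[:space:]]*/ {
      sub(/^[[:space:]]*-[[:space:]]*/, "")                        # strip the "- " marker
      print
      next
    }
    { in_list = 0 }                                                # any other key ends the list
  ' "$1"
}

# Demo on a throwaway file with made-up content.
page=$(mktemp)
cat > "$page" <<'EOF'
---
title: "Example page"
r-packages:
  - tidymodels
  - ranger
  - kableExtra
---

Body text.
EOF

list_r_packages "$page"
rm -f "$page"
```

A real implementation would more likely use a YAML parser; the point here is only that the field is plain, machine-readable metadata.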

Workflow

  • To add a new post to learn/, add a new folder with an index.qmd file in it and adapt the YAML header from an existing post. If new packages are required to run this post, add them to the packages object in R/installs.R and to the r-packages field in the new post's YAML front matter.

  • To preview the site, render it locally with the latest quarto release version.

  • Rendered outputs are committed to the repo: the freeze cache (_freeze/) and the .md files kept via keep-md: true. The publish.yml workflow renders and publishes the site on Ubuntu via quarto-actions/publish; frozen pages are served from the committed cache rather than re-rendered. Always include rendered outputs in your PR.

  • Rendering in CI via a PR comment: If you'd prefer not to render locally, comment /render on your open PR. A GitHub Actions workflow (render-pr.yml) will detect which .qmd files changed, install the needed packages, render those pages, and commit the output back to your branch. It posts a comment when done (or links to the failed run on error). Only repo owners, org members, and collaborators can trigger this.

  • Note on platform differences: As the scheduled automated re-render (check-cran-releases.yml) matures, pages will increasingly be rendered on Linux (Ubuntu) rather than macOS. The first time a page is re-rendered in CI you may see numerical differences in the output, because floating-point results can vary slightly between platforms due to differences in BLAS/LAPACK libraries and other system-level factors. These differences are expected and not a sign of a bug, but they should be reviewed before merging the automated PR.

  • keep-md: true is set in _quarto.yml so that rendered .md files are committed alongside the source. This makes it possible to review in a PR whether code produced different results than before.

  • To do a complete rerender, run the R/re-render.R script.

Heavy engine dependencies

Some pages use engines that require large external downloads (Spark, torch). These are cached in CI to avoid re-downloading on every run.

Apache Spark

Spark is pre-installed in CI via .github/actions/setup-render/action.yml. When upgrading Spark, update all four of these in lockstep:

  • Cache key in setup-render/action.yml: spark-3.5.8-java17-${{ runner.os }}
  • Cache path and download URL in the Install Spark step of setup-render/action.yml: 3.5.8
  • Version check in R/install_packages.R: grepl("^3\\.5", ...)
  • version argument in all spark_connect() calls in learn/models/parsnip-predictions/index.qmd: "3.5"

Changing the cache key forces a fresh download on the next CI run. The R/install_packages.R check prevents a redundant download when Spark is already present (either from cache or a prior install).
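
To see why the version check has to move in lockstep with the other three places, here is a shell analogue of the grepl("^3\\.5", ...) pattern. The function name spark_is_35 is hypothetical and the version strings other than 3.5.8 are made up for illustration; the actual check lives in R/install_packages.R.

```shell
# Hypothetical shell analogue of the R check grepl("^3\\.5", version).
# Succeeds when the version string starts with "3.5".
spark_is_35() {
  printf '%s\n' "$1" | grep -Eq '^3\.5'
}

for v in 3.5.8 3.4.1 13.5.0; do
  if spark_is_35 "$v"; then
    echo "$v: treated as already installed"
  else
    echo "$v: would trigger a download"
  fi
done
```

Because the pattern is anchored at the start of the string, any 3.5.x patch release matches, while a different minor or major version does not. If the pattern is left at 3.5 after the cache key and download URL move to a newer series, the check would no longer recognize the cached install.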

Rerender

We try to do a rerender after a release of a main package.

  • Make sure that make_function_lists/all_packages.csv is up to date.

  • Run the R/installs.R script. Check that no dev versions of packages are installed.

  • Run the R/re-render.R script.

Selective re-render

By package

To re-render only the pages affected by one or more package updates, use R/re-render-package.R:

Rscript R/re-render-package.R ranger
Rscript R/re-render-package.R ranger glmnet   # union of affected pages, deduped
Rscript R/re-render-package.R tidymodels      # all pages that use tidymodels
Rscript R/re-render-package.R --all           # every page on the site

This reads data/package_map.json to find affected pages, clears their freeze cache, and re-renders them.
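
The "union of affected pages, deduped" behavior can be sketched in a few lines of shell. The sketch uses a plain-text stand-in for the map (one "package page" pair per line) with made-up page paths, since the real data/package_map.json is JSON and its schema is not shown here; affected_pages is a hypothetical name, not a function from this repo.

```shell
# Hypothetical stand-in for data/package_map.json: "package page" pairs, one per line.
# The page paths are invented for illustration.
package_map='ranger learn/a/index.qmd
glmnet learn/a/index.qmd
glmnet learn/b/index.qmd
ranger start/c/index.qmd'

# Print every page that depends on any of the given packages, each page once.
affected_pages() {
  for pkg in "$@"; do
    printf '%s\n' "$package_map" | awk -v p="$pkg" '$1 == p { print $2 }'
  done | sort -u   # sort -u removes pages reached through more than one package
}

affected_pages ranger glmnet
```

Here learn/a/index.qmd depends on both packages but is listed, and would be rendered, only once.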

By page path

To re-render specific pages directly, use R/re-render-pages.R:

Rscript R/re-render-pages.R learn/models/parsnip-nnet/index.qmd
Rscript R/re-render-pages.R learn/models/parsnip-nnet/index.qmd start/resampling/index.qmd

This clears the freeze cache for each page and re-renders it.
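
For orientation, the sketch below shows which directory "the freeze cache for a page" would correspond to, assuming Quarto's usual layout in which the frozen results for path/to/page.qmd are stored under _freeze/path/to/page/. The helper freeze_dir is hypothetical, and the sketch only prints paths; what R/re-render-pages.R actually deletes may differ.

```shell
# Hypothetical helper: map a .qmd path to its assumed freeze directory.
freeze_dir() {
  # Drop the .qmd extension and prefix the result with _freeze/
  printf '_freeze/%s\n' "${1%.qmd}"
}

for page in learn/models/parsnip-nnet/index.qmd start/resampling/index.qmd; do
  echo "freeze entry for $page: $(freeze_dir "$page")"
done
```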

Supporting files

  • data/package_map.json: maps each package to the pages that depend on it. Regenerate after changing any r-packages: field:

    Rscript R/make_package_map.R
  • data/_versions.json: records the installed package versions at the time of the last render. Update after any re-render:

    Rscript R/make_versions.R

Automated re-renders via GitHub Actions

The check-cran-releases.yml workflow runs on weekdays at 4am Pacific time. It compares current CRAN versions against data/_versions.json and, if any packages have updated, automatically:

  1. Installs only the packages needed for the affected pages (via R/install_for_packages.R, which uses the shared R/install_packages.R helper)
  2. Re-renders the affected pages
  3. Updates data/_versions.json and data/package_map.json
  4. Opens a pull request for review, including the old and new versions of each updated package

If any page fails to render, an issue is opened instead of a PR, with a link to the failed workflow run. The data/_versions.json and data/package_map.json files are not updated on failure, so the workflow will retry on the next run.
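
The version comparison at the heart of this workflow can be sketched as follows. The two lists are plain-text stand-ins with made-up version numbers (the real data/_versions.json is JSON, and the current versions come from CRAN), and updated_packages is a hypothetical name rather than anything defined in this repo.

```shell
# Hypothetical stand-ins: "package version" pairs, one per line, with invented versions.
recorded='glmnet 2.0.0
ranger 1.0.0
tidymodels 3.0.0'
current='glmnet 2.0.0
ranger 1.1.0
tidymodels 3.0.0'

# $1 = recorded versions, $2 = current versions.
# Prints "package old -> new" for every package whose version string changed.
updated_packages() {
  { printf '%s\n' "$1"; printf '%s\n' '---'; printf '%s\n' "$2"; } |
    awk '
      $0 == "---" { second = 1; next }       # separator between the two lists
      !second     { old[$1] = $2; next }     # remember recorded versions
      # Appending "" forces a string comparison, so 1.1 and 1.10 count as different.
      ($1 in old) && (old[$1] "") != ($2 "") { print $1, old[$1], "->", $2 }
    '
}

updated_packages "$recorded" "$current"
```

Only packages that appear in the output would feed the selective re-render. Because the recorded versions are rewritten only after a successful render, a failed run leaves the same differences in place for the next attempt.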

You can also trigger the check-cran-releases.yml workflow manually from the GitHub Actions UI, or with the gh CLI:

# Normal version check
gh workflow run check-cran-releases.yml

# Force re-render for specific packages
gh workflow run check-cran-releases.yml -f packages="ranger glmnet"

# Re-render every page
gh workflow run check-cran-releases.yml -f packages="--all"

To re-render specific pages by path (rather than by package), use the render-pages.yml workflow:

gh workflow run render-pages.yml -f pages="learn/models/parsnip-nnet/index.qmd"

# Multiple pages:
gh workflow run render-pages.yml -f pages="learn/models/parsnip-nnet/index.qmd start/resampling/index.qmd"

This installs the packages declared in each page's r-packages: front matter, clears the freeze cache, renders the pages, and opens a pull request.

Generating function lists

The find/ pages display searchable tables of functions, models, and recipe steps. The data for these tables comes from CSV files generated by scripts in make_function_lists/.

When to regenerate

Regenerate the function lists:

  • When new packages are added to the tidymodels ecosystem
  • After major CRAN releases of tidymodels packages
  • When new models or recipe steps are added

When adding a new package to the tidymodels ecosystem, first add it to make_function_lists/all_packages.csv (one package name per row), then regenerate the function lists.

How to run

To regenerate all function lists:

Rscript make_function_lists/run_all.R

To force a fresh run (ignoring cache):

Rscript make_function_lists/run_all.R --fresh

To run individual generators:

Rscript make_function_lists/broom.R
Rscript make_function_lists/recipes.R
Rscript make_function_lists/tidymodels.R
Rscript make_function_lists/parsnip.R
Rscript make_function_lists/sparse.R
Rscript make_function_lists/tidyclust.R

Styling

The styling of this website is defined in several places. Some of the high-level settings are in the format section of _quarto.yml, with the rest of the main styles in styles.scss.

The front page has a number of detailed styles of its own, all located in styles-frontpage.scss. They are wrapped in the #FrontPage ID, so they shouldn't affect anything outside the front page.

The sidebar for the Get Started section has a unique style, specified in start/styles.css. That file is loaded into each of these pages with either css: styles.css or css: ../styles.css.
