Merged
26 changes: 24 additions & 2 deletions .github/workflows/checks.yml
@@ -35,7 +35,7 @@ jobs:
- name: Build with Jekyll
run: bundle exec jekyll build
working-directory: ./docs
- name: Run Lychee
- name: Check online links (lychee)
uses: lycheeverse/lychee-action@v2
with:
args: >-
@@ -45,4 +45,26 @@ jobs:
--root-dir ${{ github.workspace }}/docs/_site
./_site
workingDirectory: ./docs
fail: true
fail: true
- name: Set up Python for offline link check
uses: actions/setup-python@v5
with:
python-version: '3.14'
- name: Install Python deps
run: pip install -r requirements.txt
- name: Check offline links (check_links.py)
run: >-
python scripts/check_links.py
--offline --include-fragments
--index-files index.html
--root-dir docs/_site-offline
docs/_site-offline
- name: Check for surviving live-site links in offline tree
# Flags any https://docs.twinbasic.com/<path> reference left in
# _site-offline/ HTML outside <code>/<pre> blocks. After offlinify
# strips the jekyll-seo-tag block, anything surviving is a source
# link that points at the live site instead of using a relative or
# /tB/... permalink that resolves locally. The bare root URL
# (https://docs.twinbasic.com[/]) is exempt -- intentional "go to
# the live site" links are allowed.
run: python scripts/check_offline_live_links.py
49 changes: 33 additions & 16 deletions .github/workflows/jekyll-gh-pages.yml
@@ -57,7 +57,7 @@ jobs:
env:
JEKYLL_ENV: production
PAGES_REPO_NWO: "${{ github.repository }}"
- name: Run Lychee against the online tree
- name: Check online links (lychee)
uses: lycheeverse/lychee-action@v2
with:
# --remap matches the fully-resolved file URI (not the raw href), so the pattern
@@ -68,6 +68,11 @@ jobs:
# `--fallback-extensions html` mirrors what GitHub Pages does at request time:
# an extensionless URL like `/FAQ` is served as `/FAQ.html`. Without the flag
# lychee would flag every pretty permalink on the site.
#
# Lychee, not the Python checker, handles the online tree here because the
# `--remap` flag isn't implemented by scripts/check_links.py; the offline tree
# below has all baseurl prefixes already stripped by the offlinify plugin and
# so doesn't need it.
args: >-
--offline --include-fragments
--fallback-extensions html
@@ -77,22 +82,34 @@ jobs:
./_site
workingDirectory: ./docs
fail: true
- name: Run Lychee against the offline tree
uses: lycheeverse/lychee-action@v2
- name: Set up Python for offline link check
uses: actions/setup-python@v5
with:
# Strict check on `_site-offline/`: every link must resolve to an actual file
# under `file://`, with no extension fallback. Catches relative links in
# markdown sources that point at a permalink that doesn't match the rendered
# filename (e.g. `[Foo](Foo/)` when Jekyll wrote `Foo.html`, not
# `Foo/index.html`) -- the kind of breakage the online check above hides
# behind `--fallback-extensions html`.
args: >-
--offline --include-fragments
--index-files 'index.html'
--root-dir ${{ github.workspace }}/docs/_site-offline
./_site-offline
workingDirectory: ./docs
fail: true
python-version: '3.14'
- name: Install Python deps
run: pip install -r requirements.txt
- name: Check offline links (check_links.py)
# Strict check on `_site-offline/`: every link must resolve to an actual file
# under `file://`, with no extension fallback. Catches relative links in
# markdown sources that point at a permalink that doesn't match the rendered
# filename (e.g. `[Foo](Foo/)` when Jekyll wrote `Foo.html`, not
# `Foo/index.html`) -- the kind of breakage the online check above hides
# behind `--fallback-extensions html`.
run: >-
python scripts/check_links.py
--offline --include-fragments
--index-files index.html
--root-dir docs/_site-offline
docs/_site-offline
- name: Check for surviving live-site links in offline tree
# Flags any https://docs.twinbasic.com/<path> reference left in
# _site-offline/ HTML outside <code>/<pre> blocks. After offlinify
# strips the jekyll-seo-tag block, anything surviving is a source
# link that points at the live site instead of using a relative or
# /tB/... permalink that resolves locally. The bare root URL
# (https://docs.twinbasic.com[/]) is exempt -- intentional "go to
# the live site" links are allowed.
run: python scripts/check_offline_live_links.py
- name: Upload Pages artifact
uses: actions/upload-pages-artifact@v5
with:
2 changes: 1 addition & 1 deletion docs/Miscellaneous/Documentation Development.md
@@ -201,7 +201,7 @@ To check that none of the internal links in the most recent documentation build

check.bat

This runs three checks: [Lychee](https://github.com/lycheeverse/lychee) in offline mode against `_site/` (the live tree), the same against `_site-offline/` (the file://-browsable mirror), and a small Python pass over `_site-offline/` that flags any surviving `https://docs.twinbasic.com/<path>` link --- the offline mirror should not navigate back to the live docs site.
This runs three checks: `scripts/check_links.py` against `_site/` (the live tree, in offline mode), the same script against `_site-offline/` (the file://-browsable mirror), and `scripts/check_offline_live_links.py` over `_site-offline/`, which flags any surviving `https://docs.twinbasic.com/<path>` link, since the offline mirror should not navigate back to the live docs site. Equivalent checks run in CI on every pull request and on every push to `staging`; there, lychee rather than `scripts/check_links.py` handles the `_site/` pass.

### Building and Local Serving

19 changes: 11 additions & 8 deletions docs/_plugins/offlinify.md
@@ -300,22 +300,25 @@ The offline build touches the following files:
| `docs/_config.yml` | `also_build_offline: true` (default-on) and `exclude: [_site-offline]` (keeps Jekyll's watcher from rebuilding on the plugin's own output). |
| `docs/build.bat` | Plain `bundle exec jekyll build` — produces `_site/`, `_site-offline/`, and (via `pdfify.rb`) `_site-pdf/` in one run. |
| `docs/serve.bat` | `bundle exec jekyll serve` — watcher-friendly thanks to the exclude. |
| `docs/check.bat` | Local link check (dev-side only; CI runs the two lychee passes directly). Three steps: lychee permissive on `_site/`, lychee strict on `_site-offline/`, and `scripts/check_offline_live_links.py` against `_site-offline/`. Exits non-zero on any failure. |
| `scripts/check_offline_live_links.py` | Flags any `https://docs.twinbasic.com/<path>` reference that survived offlinify in `_site-offline/` HTML, outside `<code>` / `<pre>` blocks. Skips the bare root (`https://docs.twinbasic.com[/]`) since intentional "go to the live site" links are allowed. Caught locally by `check.bat`; not wired into CI. |
| `docs/check.bat` | Local link check (CI runs equivalent passes via the workflows, with lychee handling `_site/`). Three steps: `scripts/check_links.py` permissive on `_site/`, `scripts/check_links.py` strict on `_site-offline/`, and `scripts/check_offline_live_links.py` against `_site-offline/`. Exits non-zero on any failure. |
| `scripts/check_offline_live_links.py` | Flags any `https://docs.twinbasic.com/<path>` reference that survived offlinify in `_site-offline/` HTML, outside `<code>` / `<pre>` blocks. Skips the bare root (`https://docs.twinbasic.com[/]`) since intentional "go to the live site" links are allowed. Run by `check.bat` locally and by both CI workflows after the offline link check. |
| `docs/.gitignore` | `_site`, `_site-offline`, and `_site-pdf` all excluded from git. |
| `.github/workflows/jekyll-gh-pages.yml` | CI workflow. Builds, runs lychee against both trees, deploys to Pages, and (on manual dispatch) packages `_site-offline/` as a release artifact. |
| `.github/workflows/jekyll-gh-pages.yml` | Deploy workflow (push to `staging`, manual dispatch). Builds, runs lychee against `_site/`, runs `scripts/check_links.py` against `_site-offline/`, runs `scripts/check_offline_live_links.py` against `_site-offline/`, deploys to Pages, and (on manual dispatch) packages `_site-offline/` as a release artifact. |
| `.github/workflows/checks.yml` | PR-gating workflow (pull-request to `main`, manual dispatch). Same three link-check steps as the deploy workflow; no deploy or release. |

## CI integration

`bundle exec jekyll build` in CI passes `--baseurl "${{ steps.pages.outputs.base_path }}"` from `actions/configure-pages`. For a Pages site with a custom domain (CNAME), base_path is empty. For a project page without a custom domain, it's `/repo-name`. Offlinify handles both cases — `normalize_baseurl` in `setup` produces the right prefix to strip.

The workflow has two lychee steps after the build:
The workflow has three link-check steps after the build:

1. **Against `_site/`**, with `--fallback-extensions html` and a `--remap` that strips the base_path prefix. This mirrors what GitHub Pages does at request time — extensionless URLs like `/FAQ` get served as `/FAQ.html`. Without `--fallback-extensions html`, every pretty permalink would appear broken in this check.
1. **Lychee against `_site/`**, with `--fallback-extensions html` and a `--remap` that strips the base_path prefix. This mirrors what GitHub Pages does at request time — extensionless URLs like `/FAQ` get served as `/FAQ.html`. Without `--fallback-extensions html`, every pretty permalink would appear broken in this check. Lychee (not `scripts/check_links.py`) handles the online tree because `--remap` isn't implemented in the Python checker; the offline tree below has all baseurl prefixes already stripped by offlinify and doesn't need it.

2. **Against `_site-offline/`**, strict — no extension fallback (`--index-files 'index.html'` only; the online check also accepts the bare directory via `,.`). Every link must resolve to a real file as written. This catches relative links in markdown sources whose permalink shape doesn't match the rendered filename (e.g. `[Foo](Foo/)` when Jekyll wrote `Foo.html`, not `Foo/index.html`) — the kind of breakage the online check above hides behind both the fallback and the bare-directory acceptance.
2. **`scripts/check_links.py` against `_site-offline/`**, strict — no extension fallback (`--index-files index.html` only; the online check also accepts the bare directory via `,.`). Every link must resolve to a real file as written. This catches relative links in markdown sources whose permalink shape doesn't match the rendered filename (e.g. `[Foo](Foo/)` when Jekyll wrote `Foo.html`, not `Foo/index.html`) — the kind of breakage the online check above hides behind both the fallback and the bare-directory acceptance. The Python checker is roughly 25× faster than lychee on this workload and a bit stricter (catches missing `<script src>` targets and trailing slashes on file-shaped URLs).

Both checks set `fail: true`. Any unresolved link fails the build, blocks the Pages deploy, and blocks the release upload. After both lychee runs succeed and Pages is deployed, the release job (gated to manual dispatch only) downloads the offline-site workflow artifact, computes a tag like `docs-YYYY-MM-DD-HHMM` (UTC), and creates a GitHub release with `twinbasic-docs-offline.zip` attached via `softprops/action-gh-release@v2`.
3. **`scripts/check_offline_live_links.py` against `_site-offline/`**, flagging any surviving `https://docs.twinbasic.com/<path>` reference outside `<code>` / `<pre>` blocks (the bare root is exempt — see [Failure modes: Surviving live-site links](#failure-modes)).

All three steps fail the build on the first non-zero exit, blocking the Pages deploy and the release upload. After they succeed and Pages is deployed, the release job (gated to manual dispatch only) downloads the offline-site workflow artifact, computes a tag like `docs-YYYY-MM-DD-HHMM` (UTC), and creates a GitHub release with `twinbasic-docs-offline.zip` attached via `softprops/action-gh-release@v2`.
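
The difference between the permissive online check and the strict offline check can be illustrated with a minimal sketch. This is not the code in `scripts/check_links.py` (which this page does not show); `resolve_link` and its parameters are hypothetical names that loosely mirror the `--index-files` and `--fallback-extensions` flags, and the handling of trailing slashes is an assumption based on the description above.

```python
from pathlib import Path


def resolve_link(root, href, index_files=("index.html",), fallback_extensions=()):
    """Return the file a root-relative href resolves to, or None if it is broken.

    Hypothetical helper for illustration only; fragments are dropped, not verified.
    """
    path = href.split("#", 1)[0].lstrip("/")
    target = Path(root) / path

    if target.is_file():
        # Assumed strictness: a trailing slash on a file-shaped URL
        # (e.g. `foo.html/`) is treated as broken.
        return None if path.endswith("/") else target

    if target.is_dir():
        for name in index_files:
            if name == ".":
                # "." accepts the bare directory, as the online check does.
                return target
            if (target / name).is_file():
                return target / name
        return None

    # Permissive mode only: try `/FAQ` as `/FAQ.html`, the way a web
    # server with extension fallback would.
    for ext in fallback_extensions:
        candidate = Path(f"{target}.{ext}")
        if candidate.is_file():
            return candidate
    return None
```

Calling it with `fallback_extensions=("html",)` and `index_files=("index.html", ".")` approximates the online pass; the defaults approximate the strict offline pass, where a link such as `Foo/` fails unless `Foo/index.html` actually exists.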

## Failure modes

@@ -331,7 +334,7 @@ The plugin surfaces several conditions in its summary log lines:

- **`_site-offline/` triggering `jekyll serve` rebuilds.** Was a problem; now handled by two things in combination: `exclude: [_site-offline]` in `_config.yml`, and the "clean contents but keep the directory" trick in the wipe step (which keeps all watcher events under `_site-offline/...` where the exclude matches).

- **Surviving live-site links.** The [SEO block stripping](#seo-block-stripping) pass removes the bulk of `https://docs.twinbasic.com` references each page contains (canonical link, OpenGraph URL, JSON-LD `url`). Anything left in `_site-offline/` is a source link that points at the live docs site -- usually a markdown author writing `https://docs.twinbasic.com/<path>` instead of a relative link or `/tB/...` permalink, which would silently navigate the offline reader back online. `scripts/check_offline_live_links.py` (run by `check.bat` after the offline lychee pass) flags these locally; the bare root `https://docs.twinbasic.com[/]` is exempt since intentional "go to the live site" links are allowed. CI does not run this check.
- **Surviving live-site links.** The [SEO block stripping](#seo-block-stripping) pass removes the bulk of `https://docs.twinbasic.com` references each page contains (canonical link, OpenGraph URL, JSON-LD `url`). Anything left in `_site-offline/` is a source link that points at the live docs site -- usually a markdown author writing `https://docs.twinbasic.com/<path>` instead of a relative link or `/tB/...` permalink, which would silently navigate the offline reader back online. `scripts/check_offline_live_links.py` flags these; the bare root `https://docs.twinbasic.com[/]` is exempt since intentional "go to the live site" links are allowed. Run locally by `check.bat` and in CI by both workflows after the offline link check.
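
As a rough illustration of the surviving-link check described in the last bullet, the sketch below scans one HTML document for `href` / `src` values that point at a path on the live site, skipping anything inside `<code>` or `<pre>`. It is not the contents of `scripts/check_offline_live_links.py`; the class and function names are hypothetical, and limiting the scan to `href` / `src` attributes is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

LIVE_HOST = "docs.twinbasic.com"  # host named in the text above


def is_live_path(url):
    """True for https://docs.twinbasic.com/<path>; the bare root is exempt."""
    parts = urlsplit(url)
    if parts.scheme != "https" or parts.netloc != LIVE_HOST:
        return False
    return parts.path not in ("", "/")


class LiveLinkFinder(HTMLParser):
    """Collect live-site links found outside <code> / <pre> blocks."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside <code> or <pre>
        self.hits = []

    def handle_starttag(self, tag, attrs):
        if tag in ("code", "pre"):
            self.skip_depth += 1
            return
        if self.skip_depth:
            return
        for name, value in attrs:
            if name in ("href", "src") and value and is_live_path(value):
                self.hits.append(value)

    def handle_endtag(self, tag):
        if tag in ("code", "pre") and self.skip_depth:
            self.skip_depth -= 1


def find_live_links(html):
    """Return every flagged URL in one HTML document, in source order."""
    finder = LiveLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.hits
```

A real checker would walk every `.html` file under `_site-offline/` and exit non-zero when any file produces hits; that outer loop is omitted here.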

## Performance

14 changes: 10 additions & 4 deletions docs/check.bat
@@ -1,6 +1,12 @@
@rem Use lychee to check the links in both build outputs, then scan
@rem Run the Python-based link checker on both build outputs, then scan
@rem _site-offline/ for live-site links that survived offlinify.
@rem
@rem Same arguments as lychee.bat -- only the executable differs. The Python
@rem script is faster on this workload (~25x on Windows) and a bit stricter:
@rem it flags <script src> targets that don't exist and rejects trailing
@rem slashes on file-shaped URLs (e.g. `foo.html/`), both of which lychee
@rem silently accepts. lychee.bat remains available as a second opinion.
@rem
@rem _site/ Online tree. `--fallback-extensions html` mirrors what
@rem GitHub Pages does at request time: an extensionless
@rem URL like /FAQ is served as /FAQ.html. Without the flag
@@ -23,9 +29,9 @@
@rem script exits non-zero if any fails (earlier failures take precedence
@rem in the reported code).
@setlocal
@set LYCHEE="%~dp0..\.claude\lychee.exe"
@set CHECK=python "%~dp0..\scripts\check_links.py"
@echo Checking _site/ (online) ...
@%LYCHEE% --offline --include-fragments --fallback-extensions html --index-files "index.html,." --root-dir ".\_site" ".\_site" %*
@%CHECK% --offline --include-fragments --fallback-extensions html --index-files "index.html,." --root-dir ".\_site" ".\_site" %*
@set EXIT1=%ERRORLEVEL%
@echo.
@echo Checking _site-offline/ (offline) ...
@@ -34,7 +40,7 @@
@rem above accepts `.` because GitHub Pages can serve an unstyled
@rem directory listing or a 404 in that case; offline, there's no
@rem such fallback, and the link is just broken.
@%LYCHEE% --offline --include-fragments --index-files "index.html" --root-dir ".\_site-offline" ".\_site-offline" %*
@%CHECK% --offline --include-fragments --index-files "index.html" --root-dir ".\_site-offline" ".\_site-offline" %*
@set EXIT2=%ERRORLEVEL%
@echo.
@echo Checking _site-offline/ for live-site links ...
45 changes: 45 additions & 0 deletions docs/lychee.bat
@@ -0,0 +1,45 @@
@rem Use lychee to check the links in both build outputs, then scan
@rem _site-offline/ for live-site links that survived offlinify.
@rem
@rem _site/ Online tree. `--fallback-extensions html` mirrors what
@rem GitHub Pages does at request time: an extensionless
@rem URL like /FAQ is served as /FAQ.html. Without the flag
@rem every pretty permalink would appear broken.
@rem _site-offline/ Offline tree. No extension fallback -- every link must
@rem resolve to an actual file under file://, since the
@rem browser does no rewriting. Catches relative links in
@rem markdown sources whose permalink shape doesn't match
@rem the rendered filename (e.g. `[Foo](Foo/)` when Jekyll
@rem wrote `Foo.html`, not `Foo/index.html`).
@rem live-links Greps _site-offline/ HTML for any surviving
@rem https://docs.twinbasic.com reference outside <code> /
@rem <pre> blocks. After _plugins/offlinify.rb strips the
@rem jekyll-seo-tag block from each page, none should
@rem remain -- a hit means a source link goes to the live
@rem site instead of the canonical /tB/... permalink.
@rem See ../scripts/check_offline_live_links.py.
@rem
@rem All three checks always run so you see all errors in one pass; the
@rem script exits non-zero if any fails (earlier failures take precedence
@rem in the reported code).
@setlocal
@set LYCHEE="%~dp0..\.claude\lychee.exe"
@echo Checking _site/ (online) ...
@%LYCHEE% --offline --include-fragments --fallback-extensions html --index-files "index.html,." --root-dir ".\_site" ".\_site" %*
@set EXIT1=%ERRORLEVEL%
@echo.
@echo Checking _site-offline/ (offline) ...
@rem No `.` in --index-files: under file://, a bare directory URL
@rem (`Foo/`) requires an actual index.html inside. The online check
@rem above accepts `.` because GitHub Pages can serve an unstyled
@rem directory listing or a 404 in that case; offline, there's no
@rem such fallback, and the link is just broken.
@%LYCHEE% --offline --include-fragments --index-files "index.html" --root-dir ".\_site-offline" ".\_site-offline" %*
@set EXIT2=%ERRORLEVEL%
@echo.
@echo Checking _site-offline/ for live-site links ...
@python "%~dp0..\scripts\check_offline_live_links.py"
@set EXIT3=%ERRORLEVEL%
@if %EXIT1% NEQ 0 exit /b %EXIT1%
@if %EXIT2% NEQ 0 exit /b %EXIT2%
@exit /b %EXIT3%
1 change: 1 addition & 0 deletions experiments/lcheck-fixture/site/assets/script.js
@@ -0,0 +1 @@
// noop
1 change: 1 addition & 0 deletions experiments/lcheck-fixture/site/assets/style.css
@@ -0,0 +1 @@
body { color: black; }
2 changes: 2 additions & 0 deletions experiments/lcheck-fixture/site/good-dir-noindex/other.html
@@ -0,0 +1,2 @@
<!DOCTYPE html>
<title>no index here</title>
3 changes: 3 additions & 0 deletions experiments/lcheck-fixture/site/good-dir/index.html
@@ -0,0 +1,3 @@
<!DOCTYPE html>
<title>dir index</title>
<h2 id="dir-anchor">Dir Anchor</h2>
2 changes: 2 additions & 0 deletions experiments/lcheck-fixture/site/good-fallback.html
@@ -0,0 +1,2 @@
<!DOCTYPE html>
<title>fallback</title>
4 changes: 4 additions & 0 deletions experiments/lcheck-fixture/site/good-fragments.html
@@ -0,0 +1,4 @@
<!DOCTYPE html>
<title>frag</title>
<h2 id="known">Known</h2>
<a name="legacy-anchor">legacy</a>
3 changes: 3 additions & 0 deletions experiments/lcheck-fixture/site/good.html
@@ -0,0 +1,3 @@
<!DOCTYPE html>
<title>good</title>
<h1 id="top">good</h1>