From deb4000e85d3f68ea6848a0d3b97a3211edba720 Mon Sep 17 00:00:00 2001 From: Kuba Sunderland-Ober Date: Sun, 17 May 2026 20:48:55 +0200 Subject: [PATCH 01/15] Cache pip dependencies, just as we do for ruby. --- .github/workflows/checks.yml | 1 + .github/workflows/jekyll-gh-pages.yml | 1 + requirements.txt | 2 +- 3 files changed, 3 insertions(+), 1 deletion(-) diff --git a/.github/workflows/checks.yml b/.github/workflows/checks.yml index 30f83ff..dfa84e6 100644 --- a/.github/workflows/checks.yml +++ b/.github/workflows/checks.yml @@ -50,6 +50,7 @@ jobs: uses: actions/setup-python@v5 with: python-version: '3.14' + cache: 'pip' - name: Install Python deps run: pip install -r requirements.txt - name: Check offline links (check_links.py) diff --git a/.github/workflows/jekyll-gh-pages.yml b/.github/workflows/jekyll-gh-pages.yml index 346d02a..351ee1e 100644 --- a/.github/workflows/jekyll-gh-pages.yml +++ b/.github/workflows/jekyll-gh-pages.yml @@ -86,6 +86,7 @@ jobs: uses: actions/setup-python@v5 with: python-version: '3.14' + cache: 'pip' - name: Install Python deps run: pip install -r requirements.txt - name: Check offline links (check_links.py) diff --git a/requirements.txt b/requirements.txt index 4eb0354..9761951 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1 +1 @@ -selectolax>=0.4 +selectolax==0.4.9 From 2c0104a99088554a8bec5d61c4e62fc2c360c0e7 Mon Sep 17 00:00:00 2001 From: Kuba Sunderland-Ober Date: Sun, 17 May 2026 20:59:42 +0200 Subject: [PATCH 02/15] Add profiling to offlinify. --- docs/_plugins/offlinify.md | 46 +++++++++- docs/_plugins/offlinify.rb | 175 ++++++++++++++++++++++++++++--------- 2 files changed, 178 insertions(+), 43 deletions(-) diff --git a/docs/_plugins/offlinify.md b/docs/_plugins/offlinify.md index 549a8db..7145d94 100644 --- a/docs/_plugins/offlinify.md +++ b/docs/_plugins/offlinify.md @@ -344,11 +344,51 @@ The optimization story is captured in the commit history. 
Briefly: - **+ `site_paths` Set** (O(1) lookup): down to ~10 s, before further work. - **+ `result_cache`, `seg_cache`, manual LCP** (replaced `Pathname.relative_path_from` per match with a string-segment comparison): down to ~7 s as the site grew past 800 pages. - **+ combined HTML regex** (single gsub matching both absolute and page-relative URLs in one pass — eliminating the second full file scan and the interim re-scan of code-block ranges that used to sit between two separate passes): down to ~4 s. Roughly 40% off the HTML walk. -- **+ per-page hook architecture** (`:pages, :post_render` consumes `page.output` in memory rather than re-reading the rendered HTML from `_site/` at `:site, :post_write`): the per-file `File.binread` is eliminated. Cumulative self-time across hooks is ~5-6 s on the current ~830-page site, dominated by per-page Jekyll hook dispatch overhead and the per-page `File.binwrite`. The ~290 jekyll-redirect-from stubs go through a much cheaper code path than the main HTML pass (a single regex over a few hundred bytes, no code-block scan, no search-setup injection) so they're a small slice of the total. +- **+ per-page hook architecture** (`:pages, :post_render` consumes `page.output` in memory rather than re-reading the rendered HTML from `_site/` at `:site, :post_write`): the per-file `File.binread` is eliminated. Cumulative self-time across hooks is ~5-6 s on the current ~830-page site. The ~290 jekyll-redirect-from stubs go through a much cheaper code path than the main HTML pass (a single regex over a few hundred bytes, no code-block scan, no search-setup injection) so they're a small slice of the total. -The remaining cumulative time is mostly `File.binwrite` across ~830 HTML files (Windows file I/O on NTFS is the dominant cost) plus the regex pass over the SCSS-compiled `just-the-docs-combined.css`. +### Profiling -The static-file copy in `finish` adds an additional ~200 ms of `FileUtils.cp` for the binary assets (images, fonts, etc.) 
that don't need rewriting. +The plugin carries an opt-in per-operation timing breakdown, gated on Jekyll's existing `--profile` build flag. Running `bundle exec jekyll build --profile` adds 16 rows under the `Offlinify:` topic prefix after the existing summary lines, e.g.: + +``` +Offlinify: Offlinifier ran in 6236ms. +Offlinify: setup 291.2ms +Offlinify: dest_path 5.1ms +Offlinify: page.output.dup 1.2ms +Offlinify: strip_seo 30.4ms +Offlinify: code_ranges 688.5ms +Offlinify: rewrite_html 4597.7ms +Offlinify: inject_search 21.8ms +Offlinify: write_html 382.8ms +Offlinify: rewrite_css 0.7ms +Offlinify: write_css 3.4ms +Offlinify: rewrite_redirect 5.5ms +Offlinify: write_redirect 74.0ms +Offlinify: write_other 3.7ms +Offlinify: copy_static 109.5ms +Offlinify: patch_jtd 0.5ms +Offlinify: search_data 2.4ms +``` + +The row labels match the `tick(:time_*)` keys spread across `setup`, `process_page`, and `finish`. The sum of the rows is within ~20 ms of `cumulative_ms` — the gap is hook plumbing not wrapped in a `tick` (the extension dispatch, the `file_dir` computation, the `case` itself). + +The instrumentation lives behind a `tick(key) { ... }` helper. With `--profile` off, the helper is a single boolean check plus a block yield — negligible at the per-page rate. With `--profile` on, each callsite reads the monotonic clock twice and accumulates the elapsed ms into `@state[key]`. The breakdown is emitted by `log_profile_breakdown` at the end of `finish`, which walks the `BREAKDOWN_KEYS` constant table. + +### Hot spots (current site shape) + +The breakdown above puts the cost of each phase in concrete numbers. The picture: + +- **`rewrite_html`** dominates at ~70% of all Offlinify time. The combined HTML regex runs ~50 matches per page × ~830 pages ≈ 42k callbacks. Each callback does a code-range linear scan, a cache-key string build, a hash lookup, and a result-string build. Per-match Ruby work, not the regex match itself, is the bottleneck. 
+
+- **`code_ranges`** is second at ~12%. The `<(code|pre)\b[^>]*>(.*?)<\/\1>` regex uses a backreference (`\1`) that defeats fast literal-string scanning for the closing tag: the engine has to remember which group matched.
+
+- **`write_html`** at ~7%: 837 `File.binwrite` calls. Windows NTFS file-creation cost dominates over the byte write. Lowering the file count is not an option (each page is a separate file).
+
+- **`setup`** at ~6%: the `Pathname.relative_path_from` walk in `build_site_paths` over the in-memory page set.
+
+- **`copy_static`** at ~2%: 221 `FileUtils.cp` calls in `finish` for binary assets that didn't need rewriting.
+
+The two regex passes (`rewrite_html` + `code_ranges`) together account for ~82% of all Offlinify time and both traverse the same ~130 KB-per-page HTML. Fusing them into a single regex pass, where the engine consumes `<code>`/`<pre>` blocks atomically and never tries the href/src alternative inside, is the largest remaining optimization opportunity.
 
 ## Known limitations
 
diff --git a/docs/_plugins/offlinify.rb b/docs/_plugins/offlinify.rb
index 36c07b1..e439ef1 100644
--- a/docs/_plugins/offlinify.rb
+++ b/docs/_plugins/offlinify.rb
@@ -368,6 +368,24 @@ def self.offline_excluded?(rel, patterns)
     patterns.any? { |pat| File.fnmatch(pat, rel, File::FNM_PATHNAME) }
   end
 
+  # Time the given block and accumulate its elapsed ms into
+  # `@state[key]`, then return the block's value. When the build
+  # was not started with `--profile`, this is a pass-through that
+  # just calls the block -- no clock reads, no map writes. Callsites
+  # therefore pay only one boolean check + one block yield when
+  # profiling is off (negligible at the per-page rate).
+  #
+  # Usage:
+  #   content = tick(:time_dispatch_dup) { page.output.dup }
+  #   _changed, misses = tick(:time_rewrite_html) { rewrite_html!(...) }
+  def self.tick(key)
+    return yield unless @state[:profile]
+    t = Process.clock_gettime(Process::CLOCK_MONOTONIC)
+    result = yield
+    @state[key] += (Process.clock_gettime(Process::CLOCK_MONOTONIC) - t) * 1000
+    result
+  end
+
   # Matches `<code>...</code>` and `<pre>...</pre>
` blocks, capturing # the BODY between the tags (group 2). Used to identify regions of # rendered HTML the URL rewrite passes should leave alone -- the @@ -474,10 +492,34 @@ def self.setup(site) # pre_render and post_write (which includes Jekyll's render # and write phases between our hooks). cumulative_ms: 0.0, + # When true (the `--profile` build flag is set), the `tick` + # helper measures each instrumented operation and the finish + # hook emits a per-operation breakdown alongside Jekyll's own + # render-stats table. When false, `tick` is a no-op pass-through + # and the per-operation accumulators stay at zero. + profile: !!site.config["profile"], + time_setup: 0.0, + time_strip_seo: 0.0, + time_code_ranges: 0.0, + time_rewrite_html: 0.0, + time_inject_search: 0.0, + time_write_html: 0.0, + time_rewrite_css: 0.0, + time_write_css: 0.0, + time_rewrite_redirect: 0.0, + time_write_redirect: 0.0, + time_write_other: 0.0, + time_dispatch_dup: 0.0, + time_dest_path: 0.0, + time_copy_static: 0.0, + time_patch_jtd: 0.0, + time_search_data: 0.0, } wipe_out_dest_contents(out_dest) - @state[:cumulative_ms] += (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000 + elapsed = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000 + @state[:cumulative_ms] += elapsed + @state[:time_setup] += elapsed if @state[:profile] end # Normalise the configured baseurl to either empty string or @@ -549,8 +591,10 @@ def self.process_page(page) # Pathname round-trip -- Pathname#relative_path_from is roughly # 2ms per call on Windows and would dominate per-page cost on a # 1000+ page build. - dest_path = page.destination(@state[:dest]).tr("\\", "/") - rel = dest_path[(@state[:dest_root_fs].length + 1)..] 
+ dest_path, rel = tick(:time_dest_path) do + dp = page.destination(@state[:dest]).tr("\\", "/") + [dp, dp[(@state[:dest_root_fs].length + 1)..]] + end if offline_excluded?(rel, @state[:exclude_patterns]) @state[:excluded_files] += 1 @@ -580,23 +624,29 @@ def self.process_page(page) # still loads if jekyll-redirect-from is removed. out_path = File.join(@state[:out_dest], rel) file_dir = File.dirname(out_path) - content = page.output.dup + content = tick(:time_dispatch_dup) { page.output.dup } site_url = @state[:site_url] - unless site_url.empty? - file_segs = file_dir_segs_from_rel(rel) - prefix_re = /#{Regexp.escape(site_url)}(\/[^"' >]*)/ - content = content.gsub(prefix_re) do - raw = Regexp.last_match(1) - cache_key = "#{file_dir}\x00#{raw}" - rel_url = @state[:result_cache].fetch(cache_key) do - @state[:result_cache][cache_key] = - compute_relative(raw, file_segs, @state[:site_paths], @state[:seg_cache], @state[:baseurl]) + content = tick(:time_rewrite_redirect) do + if site_url.empty? 
+ content + else + file_segs = file_dir_segs_from_rel(rel) + prefix_re = /#{Regexp.escape(site_url)}(\/[^"' >]*)/ + content.gsub(prefix_re) do + raw = Regexp.last_match(1) + cache_key = "#{file_dir}\x00#{raw}" + rel_url = @state[:result_cache].fetch(cache_key) do + @state[:result_cache][cache_key] = + compute_relative(raw, file_segs, @state[:site_paths], @state[:seg_cache], @state[:baseurl]) + end + rel_url || "#{site_url}#{raw}" end - rel_url || "#{site_url}#{raw}" end end - FileUtils.mkdir_p(file_dir) - File.binwrite(out_path, content) + tick(:time_write_redirect) do + FileUtils.mkdir_p(file_dir) + File.binwrite(out_path, content) + end @state[:rewritten_redirects] += 1 else out_path = File.join(@state[:out_dest], rel) @@ -605,25 +655,35 @@ def self.process_page(page) case File.extname(dest_path).downcase when ".html" - content = page.output.dup - @state[:seo_stripped] += 1 if strip_seo!(content) - code_ranges = code_block_ranges(content) - _changed, misses = rewrite_html!(content, file_dir, file_segs, @state[:site_paths], @state[:seg_cache], @state[:result_cache], @state[:baseurl], code_ranges) + content = tick(:time_dispatch_dup) { page.output.dup } + tick(:time_strip_seo) { @state[:seo_stripped] += 1 if strip_seo!(content) } + code_ranges = tick(:time_code_ranges) { code_block_ranges(content) } + _changed, misses = tick(:time_rewrite_html) do + rewrite_html!(content, file_dir, file_segs, @state[:site_paths], @state[:seg_cache], @state[:result_cache], @state[:baseurl], code_ranges) + end @state[:unresolved] += misses - inject_search_setup!(content, file_segs) - FileUtils.mkdir_p(file_dir) - File.binwrite(out_path, content) + tick(:time_inject_search) { inject_search_setup!(content, file_segs) } + tick(:time_write_html) do + FileUtils.mkdir_p(file_dir) + File.binwrite(out_path, content) + end @state[:rewritten_html] += 1 when ".css" - content = page.output.dup - _changed, misses = rewrite_css!(content, file_dir, file_segs, @state[:site_paths], 
@state[:seg_cache], @state[:result_cache], @state[:baseurl]) + content = tick(:time_dispatch_dup) { page.output.dup } + _changed, misses = tick(:time_rewrite_css) do + rewrite_css!(content, file_dir, file_segs, @state[:site_paths], @state[:seg_cache], @state[:result_cache], @state[:baseurl]) + end @state[:unresolved] += misses - FileUtils.mkdir_p(file_dir) - File.binwrite(out_path, content) + tick(:time_write_css) do + FileUtils.mkdir_p(file_dir) + File.binwrite(out_path, content) + end @state[:rewritten_css] += 1 else - FileUtils.mkdir_p(file_dir) - File.binwrite(out_path, page.output) + tick(:time_write_other) do + FileUtils.mkdir_p(file_dir) + File.binwrite(out_path, page.output) + end @state[:copied_files] += 1 end end @@ -641,20 +701,22 @@ def self.finish(site) return unless @state start = Process.clock_gettime(Process::CLOCK_MONOTONIC) - site.static_files.each do |sf| - dest_path = sf.destination(@state[:dest]).tr("\\", "/") - rel = dest_path[(@state[:dest_root_fs].length + 1)..] - if offline_excluded?(rel, @state[:exclude_patterns]) - @state[:excluded_files] += 1 - next + tick(:time_copy_static) do + site.static_files.each do |sf| + dest_path = sf.destination(@state[:dest]).tr("\\", "/") + rel = dest_path[(@state[:dest_root_fs].length + 1)..] 
+ if offline_excluded?(rel, @state[:exclude_patterns]) + @state[:excluded_files] += 1 + next + end + out_path = File.join(@state[:out_dest], rel) + copy_asset!(dest_path, out_path) + @state[:copied_files] += 1 end - out_path = File.join(@state[:out_dest], rel) - copy_asset!(dest_path, out_path) - @state[:copied_files] += 1 end - js_patches = patch_jtd_js!(@state[:out_dest]) - search_data_built = build_search_data_js!(@state[:out_dest]) + js_patches = tick(:time_patch_jtd) { patch_jtd_js!(@state[:out_dest]) } + search_data_built = tick(:time_search_data) { build_search_data_js!(@state[:out_dest]) } @state[:cumulative_ms] += (Process.clock_gettime(Process::CLOCK_MONOTONIC) - start) * 1000 @@ -667,10 +729,43 @@ def self.finish(site) Jekyll.logger.info "Offlinify:", "patched just-the-docs.js (#{js_patches.join(", ")})" unless js_patches.empty? Jekyll.logger.info "Offlinify:", "wrote #{SEARCH_DATA_JS_REL} (#{search_data_built} bytes)" if search_data_built Jekyll.logger.info "Offlinify:", "Offlinifier ran in #{@state[:cumulative_ms].round(0)}ms." + log_profile_breakdown if @state[:profile] @state = nil end + # Per-operation timing breakdown emitted at the end of `finish` + # when the build was started with `--profile`. The labels match + # the `tick(:time_*)` keys spread across setup, process_page, and + # finish. The sum of the listed rows is close to (but not exactly + # equal to) `cumulative_ms` -- the difference is the per-hook + # plumbing (state lookups, dispatch, `tick` overhead, file_dir + # computation) that isn't itself wrapped in a `tick`. 
+ BREAKDOWN_KEYS = [ + [:time_setup, "setup"], + [:time_dest_path, "dest_path"], + [:time_dispatch_dup, "page.output.dup"], + [:time_strip_seo, "strip_seo"], + [:time_code_ranges, "code_ranges"], + [:time_rewrite_html, "rewrite_html"], + [:time_inject_search, "inject_search"], + [:time_write_html, "write_html"], + [:time_rewrite_css, "rewrite_css"], + [:time_write_css, "write_css"], + [:time_rewrite_redirect, "rewrite_redirect"], + [:time_write_redirect, "write_redirect"], + [:time_write_other, "write_other"], + [:time_copy_static, "copy_static"], + [:time_patch_jtd, "patch_jtd"], + [:time_search_data, "search_data"], + ].freeze + + def self.log_profile_breakdown + BREAKDOWN_KEYS.each do |key, label| + Jekyll.logger.info "Offlinify:", format(" %-18s %7.1fms", label, @state[key]) + end + end + # Copy a file from src to out, creating intermediate directories. # Used for everything in `_site/` that didn't need URL rewriting. def self.copy_asset!(src_path, out_path) From b871d85bd5bed363e946c0ec63ae3aa450e4b593 Mon Sep 17 00:00:00 2001 From: Kuba Sunderland-Ober Date: Sun, 17 May 2026 21:06:49 +0200 Subject: [PATCH 03/15] Speed up offlinify by ~1s. --- docs/_plugins/offlinify.md | 57 +++++++++---------- docs/_plugins/offlinify.rb | 112 ++++++++++++++++--------------------- 2 files changed, 76 insertions(+), 93 deletions(-) diff --git a/docs/_plugins/offlinify.md b/docs/_plugins/offlinify.md index 7145d94..a311226 100644 --- a/docs/_plugins/offlinify.md +++ b/docs/_plugins/offlinify.md @@ -175,11 +175,13 @@ If the original is already correct (e.g. `href="foo.html"` where `foo.html` exis #### Code-block skip -Before the rewrite regex runs, the file's content is scanned once for `` and `
` blocks. The byte ranges of their bodies are passed to the regex callback, which returns the match verbatim when the match offset falls inside any range. The skip has two consequences:
+The rewrite regex carries two leading alternatives, `<code\b[^>]*>.*?</code>` and `<pre\b[^>]*>.*?</pre>`, placed before the href/src alternative. The regex engine consumes any `<code>` or `<pre>` block atomically and never tries the href/src alternative inside it. The gsub callback distinguishes the two outcomes by checking whether the href/src capture group is nil; when it is, the match was a code block, and the callback returns it verbatim. The skip has two consequences:
 
-- Example URLs in tutorial code samples (e.g. `