workflow: check TiDB code PRs weekly to update docs#22801
hfxsd wants to merge 438 commits into pingcap:master
Code Review
This pull request introduces a Python script to automate the weekly identification of merged TiDB PRs that may require documentation updates based on labels, keywords, and file paths. Feedback includes addressing a logic error in the time window calculation to avoid timezone mismatches and overlapping reports, using word boundaries for keyword matching to prevent false positives, removing redundant keywords, and implementing error handling for GitHub API requests.
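The two matching and windowing points in the feedback above can be sketched as follows. This is a minimal illustration, not the script from this PR: the keyword list, the function names, and the midnight-aligned half-open window are all assumptions made for the example.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical keyword list; the PR's real keywords are not shown on this page.
DOC_KEYWORDS = ["config", "system variable", "deprecate"]


def compile_keyword_pattern(keywords):
    """Build one case-insensitive regex that matches keywords only at word boundaries."""
    # De-duplicate and put longer keywords first so they win over shorter prefixes.
    escaped = sorted({re.escape(k.lower()) for k in keywords}, key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def matched_keywords(text, pattern):
    """Return the sorted, lowercased set of keywords found in text."""
    return sorted({m.group(0).lower() for m in pattern.finditer(text)})


def weekly_window(now=None, days=7):
    """Return a half-open [start, end) window in UTC, aligned to UTC midnight."""
    now = now or datetime.now(timezone.utc)
    # Convert to UTC first so a runner in another timezone gets the same window.
    end = now.astimezone(timezone.utc).replace(hour=0, minute=0, second=0, microsecond=0)
    start = end - timedelta(days=days)
    return start, end
```

With word boundaries, "config" matches "new config item" but not "reconfigure". Because the window is half-open and aligned to midnight UTC, two runs exactly one week apart cover adjacent ranges without overlap, under the assumption that merged timestamps are also compared in UTC.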
[APPROVALNOTIFIER] This PR is NOT APPROVED. It needs approval from an approver for each of the affected files.
Update the weekly docs-check workflow to scan merged code PRs across the pingcap organization instead of only pingcap/tidb. Introduce SOURCE_ORG and EXCLUDED_REPOS env vars (defaulting to pingcap/docs and pingcap/docs-cn), update job/name strings, commit/title templates, and the PR body/link to reflect the org-wide scan and excluded repos.
Add automation to apply heuristic weekly doc updates to docs-cn.
- Add scripts/apply_weekly_docs_cn_updates.py: reads the scan JSON, maps PRs/paths to target docs, and appends a "weekly code sync" section to matched docs; writes a summary to docs-cn/weekly-doc-sync/applied-doc-updates.json.
- Update workflow .github/workflows/tidb-pr-weekly-doc-check.yml to copy the scan JSON, run the new script when updates are needed, and include the update summary in the created PR (also adjust PR copy text).
- Update scripts/check_tidb_prs_and_create_docs_cn_pr.py to include changed_files in the PR payload so the apply script can resolve target docs.
This automates applying simple, heuristic doc notes from the weekly PR scan and generates a summary for maintainers to review.
Refactor the weekly doc-check workflow into a scan job that discovers candidate PRs and a downstream create-pr-per-source job that processes each candidate in parallel. The scan job now emits outputs (needs_update, candidates_count, candidates_matrix, paths, window dates) and uploads a JSON/report artifact; a TARGET_BRANCH_MAP env var is introduced to map source base branches to docs target branches. apply_weekly_docs_cn_updates.py was rewritten to apply updates for a single source PR, add a per-PR marker/note, and write an applied-<repo>-<pr>.json summary. check_tidb_prs_and_create_docs_cn_pr.py now accepts TARGET_BRANCH_MAP, records source_base_branch for PRs, builds a matrix of candidates, and writes candidates_count/candidates_matrix outputs for the workflow.
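As a rough sketch of how the scan job could turn candidates into matrix outputs: the `src=dst,src=dst` format for TARGET_BRANCH_MAP, the candidate dictionary keys, the matrix field names, and the `master` fallback are assumptions for illustration and are not confirmed by this PR. Only the output names candidates_count and candidates_matrix come from the commit message.

```python
import json


def parse_branch_map(raw):
    """Parse 'src=dst,src=dst' into a dict. This string format is an assumption."""
    mapping = {}
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue
        src, sep, dst = item.partition("=")
        if not sep or not src.strip() or not dst.strip():
            raise ValueError(f"invalid TARGET_BRANCH_MAP entry: {item!r}")
        mapping[src.strip()] = dst.strip()
    return mapping


def build_matrix(candidates, branch_map, default_target="master"):
    """Build a job matrix with one entry per source PR candidate."""
    include = []
    for pr in candidates:
        include.append({
            "source_repo": pr["repo"],                # hypothetical key names
            "source_pr": pr["number"],
            "source_base_branch": pr["base_branch"],
            # Unmapped base branches fall back to the default target (assumed).
            "target_branch": branch_map.get(pr["base_branch"], default_target),
        })
    return {"include": include}


def write_outputs(matrix, output_path):
    """Append step outputs as name=value lines, e.g. to the file named by GITHUB_OUTPUT."""
    with open(output_path, "a", encoding="utf-8") as fh:
        fh.write(f"candidates_count={len(matrix['include'])}\n")
        # Compact single-line JSON so the value fits on one output line.
        fh.write(f"candidates_matrix={json.dumps(matrix, separators=(',', ':'))}\n")
```

A downstream job can then read candidates_matrix with fromJSON in its strategy.matrix, which is what lets each candidate be processed in parallel.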
Add an EXTRA_REPOS env var to the workflow and job so additional repos (default tikv/tikv,tikv/pd) can be passed into the doc-check job. Parse EXTRA_REPOS in the script and merge it into the list of source repositories, then remove any EXCLUDED_REPOS before returning the final sorted repo list. This allows including extra repositories in the weekly TiDB PR documentation checks.
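The merge-then-exclude order described above can be sketched like this. The function names and the case-insensitive comparison are assumptions for the example; only the env var names and the "merge, remove excluded, sort" sequence come from the commit message.

```python
def split_csv(raw):
    """Split a comma-separated env value into trimmed, non-empty items."""
    return [part.strip() for part in (raw or "").split(",") if part.strip()]


def resolve_source_repos(org_repos, extra_repos_raw, excluded_repos_raw):
    """Merge org repos with EXTRA_REPOS, drop EXCLUDED_REPOS, return a sorted list."""
    # Compare case-insensitively (an assumption) so "PingCAP/docs" is still excluded.
    excluded = {repo.lower() for repo in split_csv(excluded_repos_raw)}
    merged = set(org_repos) | set(split_csv(extra_repos_raw))
    return sorted(repo for repo in merged if repo.lower() not in excluded)
```

For example, with org repos pingcap/tidb and pingcap/docs, EXTRA_REPOS set to "tikv/tikv,tikv/pd", and EXCLUDED_REPOS set to "pingcap/docs,pingcap/docs-cn", the result is pingcap/tidb, tikv/pd, tikv/tikv. Applying exclusions last means an excluded repo stays out even if it is also listed in EXTRA_REPOS.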
Move collect_source_pr_doc_candidates.py and apply_source_pr_docs_cn_updates.py into scripts/source-pr-doc-sync/, relocate the source-pr-doc-sync doc to scripts/source-pr-doc-sync/, and update .github/workflows/source-pr-doc-sync.yml to reference the new script paths. This reorganizes the repository to group the source PR doc sync automation files for clearer structure.
Add configurable in-page insertion for Source PR sync notes. A new section-preferences.json provides per-path and default heading preference lists. The script now loads these preferences, finds matching markdown headings (via regex), and inserts the note block under the highest-priority matched section; if no match is found it falls back to appending at the file end. Refactor includes build_note_block, load/choose preference helpers, find_section_insert_line, and updating append_pr_note/main to use the new behavior. Also update docs to document the new configuration file and change the note header to "Source PR sync note".
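A minimal sketch of the preference-driven insertion described above. The function name find_section_insert_line is borrowed from the commit message, but the body is an illustration: inserting at the end of the matched section, treating preferences as regexes over heading titles, and skipping fenced code blocks are all assumptions, and insert_note is a hypothetical helper.

```python
import re

HEADING_RE = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


def find_section_insert_line(lines, preferred_headings):
    """Return the line index to insert at, or None if no preferred heading matches."""
    headings = []  # (line_index, level, title)
    in_fence = False
    for i, line in enumerate(lines):
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # ignore '#' lines inside fenced code
            continue
        if in_fence:
            continue
        m = HEADING_RE.match(line)
        if m:
            headings.append((i, len(m.group(1)), m.group(2)))
    # Preferences are ordered: the first pattern that matches any heading wins.
    for pref in preferred_headings:
        pat = re.compile(pref, re.IGNORECASE)
        for pos, (idx, level, title) in enumerate(headings):
            if pat.search(title):
                # The section ends at the next heading of the same or a higher level.
                for nxt_idx, nxt_level, _ in headings[pos + 1:]:
                    if nxt_level <= level:
                        return nxt_idx
                return len(lines)
    return None


def insert_note(text, note_block, preferred_headings):
    """Insert note_block under the best-matching section, else append at file end."""
    lines = text.splitlines()
    at = find_section_insert_line(lines, preferred_headings)
    if at is None:
        at = len(lines)  # fallback: append at the end of the file
    block = note_block.splitlines() + [""]
    if at > 0 and lines[at - 1].strip():
        block = [""] + block  # keep a blank line before the note
    lines[at:at] = block
    return "\n".join(lines).rstrip("\n") + "\n"
```

Returning None instead of a default index keeps the fallback decision in the caller, which matches the described behavior of appending at the file end only when nothing matches.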
First-time contributors' checklist
What is changed, added or deleted? (Required)
This PR redesigns the docs sync automation from a weekly aggregated docs-cn PR into a per-source-PR workflow, and adds better operability for event-driven and manual runs.
Key changes:
- .github/workflows/tidb-pr-weekly-doc-check.yml is split into two stages:
  - scan: discover merged source PR candidates that likely require docs updates.
  - create-pr-per-source: create one docs-cn PR per source PR candidate.
- Source base branches are mapped to docs target branches via TARGET_BRANCH_MAP.
- The workflow supports repository_dispatch (docs-sync-source-pr) and enhanced workflow_dispatch inputs.
- Run controls: MAX_CANDIDATES_PER_RUN, plus FORCE_SOURCE_REPO + FORCE_SOURCE_PR_NUMBER to target a specific source PR.
- tikv/tikv and tikv/pd are included via EXTRA_REPOS.
- scripts/check_tidb_prs_and_create_docs_cn_pr.py
- scripts/apply_weekly_docs_cn_updates.py
- docs/automation/weekly-doc-sync.md

Which TiDB version(s) do your changes apply to? (Required)
Tips for choosing the affected version(s):
By default, CHOOSE MASTER ONLY so your changes will be applied to the next TiDB major or minor releases. If your PR involves a product feature behavior change or a compatibility change, CHOOSE THE AFFECTED RELEASE BRANCH(ES) AND MASTER.
For details, see tips for choosing the affected versions.
What is the related PR or file link(s)?
Do your changes match any of the following descriptions?