
chore: import scorecards next to deps.dev data from openssf dataset (CM-1227)#4196

Merged: themarolt merged 9 commits into main from feat/openssf-scorecard-ingest-CM-1227 on Jun 12, 2026

Conversation

@themarolt commented Jun 11, 2026


Summary

Ingests OpenSSF Scorecard data from BigQuery into packages-db, populating repos.scorecard_score, repos.scorecard_last_run_at, and repo_scorecard_checks (per-check detail). Sourced from openssf.scorecardcron.scorecard-v2_latest (~1.3M repos, ~18M check rows, full rescan weekly). Runs as the final child workflow in bootstrapOsspckgs.
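The per-check rows come from flattening each repo's checks array with UNNEST, so one aggregate row carrying N checks becomes N check rows. The TypeScript sketch below mirrors that reshaping in memory. The field names (repoName, date, checks[].name and so on) are assumptions modeled on the Scorecard result shape, not the actual columns selected in scorecardSql.ts.

```typescript
// Hypothetical in-memory shapes; the real BigQuery/staging column names may differ.
interface ScorecardCheck {
  name: string;
  score: number;
  reason: string;
}

interface ScorecardRow {
  repoName: string; // e.g. "github.com/owner/repo"
  date: string; // scan date, yyyy-mm-dd
  score: number; // aggregate score, 0-10
  checks: ScorecardCheck[];
}

interface CheckRow {
  repoName: string;
  checkName: string;
  score: number;
  reason: string;
  runAt: string;
}

// Mirrors `CROSS JOIN UNNEST(checks)`: one output row per array element, with the
// parent's repo and date repeated. A repo with an empty checks array yields no rows.
function flattenChecks(rows: ScorecardRow[]): CheckRow[] {
  const out: CheckRow[] = [];
  for (const r of rows) {
    for (const c of r.checks) {
      out.push({ repoName: r.repoName, checkName: c.name, score: c.score, reason: c.reason, runAt: r.date });
    }
  }
  return out;
}

// Illustrative input only; not real Scorecard data.
const sample: ScorecardRow[] = [
  {
    repoName: "github.com/example/lib",
    date: "2026-06-07",
    score: 7.4,
    checks: [
      { name: "Maintained", score: 10, reason: "active in the last 90 days" },
      { name: "Pinned-Dependencies", score: 3, reason: "dependencies not pinned" },
    ],
  },
];

console.log(flattenChecks(sample).length); // 2
```

The aggregate query, by contrast, keeps one row per repo, which is what feeds the repos UPDATE.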

Also renames the deps-dev-ingest task queue / binary / scripts to bq-dataset-ingest since the worker now handles data from multiple BQ sources, not just deps.dev.

Changes

  • New src/scorecard/ module: the ingestScorecard Temporal workflow exports two BQ queries (aggregate scores, and per-check detail via UNNEST), loads them via GCS parquet chunks into staging tables, and merges into repos (UPDATE) and repo_scorecard_checks (INSERT ... ON CONFLICT DO UPDATE). Follows the same chunked pattern as ingestAdvisories.
  • bootstrapOsspckgs: ingestScorecard is wired in as the final executeChild call. It runs when kinds is unset or includes 'scorecard', and is intentionally last because it UPDATEs repos rows that must already exist from the deps-dev ingest.
  • Rename deps-dev-ingest → bq-dataset-ingest: binary, yaml, task queue strings, package.json scripts. Mechanical rename only, no logic changes.
  • Migration V1781074345: extends the osspckgs_ingest_jobs.job_kind CHECK constraint to include scorecard_repos and scorecard_checks (without it, every scorecard ingest job creation throws a constraint violation).
  • Bootstrap schedule 0 2 * * 0 → 0 2 * * 1: deps.dev and scorecard both publish on Sunday around 21:00 UTC, so running on Monday at 02:00 UTC picks up the fresh data the same week.
  • triggerBootstrap.ts: 'scorecard' added to VALID_KINDS so --kinds scorecard works from the CLI.
  • exportToBucket.ts: scorecard_repos and scorecard_checks parts added; supports --dry-run, resume, and parts=all.
  • updated_at = NOW() added to both merge SQL statements so the Tinybird sync watermark advances on re-ingest for repos and repo_scorecard_checks.
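The bootstrap gating and the CLI allow-list reduce to two small checks. This is a minimal sketch assuming `--kinds` takes a comma-separated value. parseKinds and shouldRunScorecard are hypothetical helpers, and 'scorecard' is the only confirmed entry of VALID_KINDS (the real list in triggerBootstrap.ts has others).

```typescript
// Only 'scorecard' is confirmed by this PR; the real VALID_KINDS has more entries.
const VALID_KINDS: readonly string[] = ["scorecard"];

// Parse a comma-separated `--kinds` value against an allow-list.
// Returns undefined when the flag is unset, meaning "run every kind".
function parseKinds(raw: string | undefined, valid: readonly string[]): string[] | undefined {
  if (raw === undefined || raw.trim() === "") return undefined;
  const kinds = raw
    .split(",")
    .map((k) => k.trim())
    .filter((k) => k.length > 0);
  const rejected = kinds.filter((k) => valid.indexOf(k) === -1);
  if (rejected.length > 0) {
    throw new Error(`unknown kinds: ${rejected.join(", ")}`);
  }
  return kinds;
}

// Scorecard runs when kinds is unset or explicitly includes it.
function shouldRunScorecard(kinds: readonly string[] | undefined): boolean {
  return kinds === undefined || kinds.indexOf("scorecard") !== -1;
}

console.log(shouldRunScorecard(parseKinds(undefined, VALID_KINDS))); // true
console.log(shouldRunScorecard(parseKinds("scorecard", VALID_KINDS))); // true
```

Running scorecard last matters because its repos merge is an UPDATE: a repo row the deps-dev ingest has not created yet simply matches nothing, so that repo's score is skipped rather than inserted.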

Type of change

  • Bug fix
  • New feature
  • Refactor / cleanup
  • Performance improvement
  • Chore / dependency update
  • Documentation

JIRA ticket

https://linuxfoundation.atlassian.net/browse/CM-1227


Note

Medium Risk
Touches production ingest orchestration (Temporal schedules, task queue rename, DB CHECK migration) and bulk-updates repos and repo_scorecard_checks; a mis-deployed queue name or a skipped migration would break ingest job creation or leave scorecard data stale.

Overview
Adds OpenSSF Scorecard ingestion from BigQuery (openssf.scorecardcron) into packages-db: a new ingestScorecard Temporal workflow exports aggregate repo scores and per-check rows (UNNEST), loads them through the existing BQ→GCS→staging→merge pipeline, and updates repos.scorecard_score / scorecard_last_run_at plus upserts repo_scorecard_checks. It runs as the last child of bootstrapOsspckgs when kinds is unset or includes scorecard, after deps.dev has populated repos.
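The two merge steps can be sketched as SQL builders. This is an illustration only: the staging table names, the repos.url join key, the (repo_id, check_name) conflict target and the reason/run_at columns are assumptions, not the actual packages-db schema. What it does show is where updated_at = NOW() has to appear so the Tinybird watermark moves on both the insert and the update path.

```typescript
// Hypothetical schema: column names, join key and conflict target are assumptions.
function buildChecksMergeSql(stagingTable: string): string {
  return `
INSERT INTO repo_scorecard_checks (repo_id, check_name, score, reason, run_at, updated_at)
SELECT r.id, s.check_name, s.score, s.reason, s.run_at, NOW()
FROM ${stagingTable} s
JOIN repos r ON r.url = s.repo_url
ON CONFLICT (repo_id, check_name) DO UPDATE
SET score = EXCLUDED.score,
    reason = EXCLUDED.reason,
    run_at = EXCLUDED.run_at,
    updated_at = NOW()`.trim();
}

// Only touches repos that already exist; unmatched staging rows are ignored.
function buildReposUpdateSql(stagingTable: string): string {
  return `
UPDATE repos r
SET scorecard_score = s.score,
    scorecard_last_run_at = s.run_at,
    updated_at = NOW()
FROM ${stagingTable} s
WHERE r.url = s.repo_url`.trim();
}

console.log(buildChecksMergeSql("staging_scorecard_checks"));
```

One PostgreSQL detail to keep in mind: if a single staging chunk holds two rows with the same conflict key, INSERT ... ON CONFLICT DO UPDATE fails with "command cannot affect row a second time", so staging rows should be deduplicated per key before the merge.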

Renames the multi-source BQ worker from deps-dev-ingest to bq-dataset-ingest (compose, task queues, scripts, export-to-bucket service). Drops the monolithic packages-worker entrypoint and its compose file; npm/osv/maven schedules are no longer registered from that stub. Weekly bootstrap moves from Sunday 02:00 to Monday 02:00 UTC to align with Sunday BQ publishes.
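The schedule change is easiest to see as lag arithmetic. In standard cron syntax, 0 2 * * 1 fires at minute 0, hour 2, on day-of-week 1 (Monday), while 0 2 * * 0 fires on Sunday. The sketch below uses hypothetical helpers (nextWeeklyRun, lagHours) to measure the hours between an assumed Sunday 21:00 UTC publish and the next run under each schedule; the publish time is approximate, as in the PR.

```typescript
// Next instant strictly after `after` that falls on `dayOfWeek` (0 = Sunday)
// at `hourUtc`:00 UTC. Loops at most 8 times.
function nextWeeklyRun(after: Date, dayOfWeek: number, hourUtc: number): Date {
  const candidate = new Date(
    Date.UTC(after.getUTCFullYear(), after.getUTCMonth(), after.getUTCDate(), hourUtc, 0, 0, 0),
  );
  while (candidate.getUTCDay() !== dayOfWeek || candidate.getTime() <= after.getTime()) {
    candidate.setUTCDate(candidate.getUTCDate() + 1);
  }
  return candidate;
}

// Hours from a publish instant until the next scheduled run.
function lagHours(publish: Date, dayOfWeek: number, hourUtc: number): number {
  return (nextWeeklyRun(publish, dayOfWeek, hourUtc).getTime() - publish.getTime()) / 3600000;
}

// Sunday 2026-06-07 21:00 UTC (month index 5 = June).
const publish = new Date(Date.UTC(2026, 5, 7, 21, 0, 0));
console.log(lagHours(publish, 0, 2)); // 149 (old: 0 2 * * 0)
console.log(lagHours(publish, 1, 2)); // 5   (new: 0 2 * * 1)
```

Under the old Sunday schedule the run fired about 19 hours before that week's publish, so it ingested data that was already over six days old; the Monday run picks up the same publish about 5 hours later.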

A Flyway migration extends osspckgs_ingest_jobs.job_kind with scorecard_repos and scorecard_checks; TypeScript job kinds and CLI/export tooling (exportToBucket, trigger-bootstrap --kinds scorecard) are updated to match.
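The migration and the TypeScript OsspckgsJobKind union have to agree, or job creation fails at the CHECK constraint. One hedged way to keep them from drifting is to derive both the type and the expected constraint text from a single list. The list below holds only the two kinds this PR adds, and SCORECARD_JOB_KINDS, isScorecardJobKind and jobKindCheckClause are hypothetical names. Since Flyway migrations are plain versioned SQL, the builder would more likely back a unit test against the migration file than generate the migration itself.

```typescript
// Only the two kinds added by this PR; the real union in ingestJobs.ts has more.
const SCORECARD_JOB_KINDS = ["scorecard_repos", "scorecard_checks"] as const;
type ScorecardJobKind = (typeof SCORECARD_JOB_KINDS)[number];

// Runtime guard matching the compile-time union.
function isScorecardJobKind(value: string): value is ScorecardJobKind {
  return (SCORECARD_JOB_KINDS as readonly string[]).indexOf(value) !== -1;
}

// Render the CHECK clause the migration is expected to contain,
// escaping single quotes as SQL string literals require.
function jobKindCheckClause(kinds: readonly string[]): string {
  const quoted = kinds.map((k) => `'${k.replace(/'/g, "''")}'`).join(", ");
  return `CHECK (job_kind IN (${quoted}))`;
}

console.log(jobKindCheckClause(SCORECARD_JOB_KINDS));
// CHECK (job_kind IN ('scorecard_repos', 'scorecard_checks'))
```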

Reviewed by Cursor Bugbot for commit 42f4871.

Signed-off-by: Uroš Marolt <uros@marolt.me>
Copilot AI review requested due to automatic review settings June 11, 2026 06:38
Comment thread services/apps/packages_worker/src/deps-dev/schedules/bootstrap.ts

Copilot AI left a comment


Pull request overview

This PR adds a new OpenSSF Scorecard ingestion path to the packages_worker BigQuery→GCS→Postgres bootstrap pipeline, persisting both aggregate repo-level Scorecard results and per-check details into packages-db. It also renames the existing deps.dev ingest worker/task-queue to a more general bq-dataset-ingest to reflect ingestion from multiple BigQuery datasets.

Changes:

  • Added a new scorecard/ module with BigQuery export SQL and a Temporal workflow that loads/merges Scorecard repo aggregates and per-check rows into Postgres via chunked parquet staging.
  • Wired Scorecard ingestion into bootstrapOsspckgs (runs last) and extended CLI/export tooling to support scorecard/scorecard_* kinds/parts.
  • Renamed the deps.dev ingest worker/task queue from deps-dev-ingest to bq-dataset-ingest, adjusted schedules/scripts, and added a DB migration to allow new ingest job kinds.
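Chunked parquet staging boils down to batching the exported objects so each load-and-merge step stays small and can be retried on its own. The sketch below shows only that batching. The bucket path, file naming and batch size are hypothetical, and the real ingestScorecard workflow may chunk differently (for example by row ranges rather than by files).

```typescript
// Split items into consecutive batches of at most `size` elements.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (size < 1 || Math.floor(size) !== size) {
    throw new RangeError("size must be a positive integer");
  }
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Hypothetical exported object names, zero-padded to 5 digits.
const files: string[] = [];
for (let i = 0; i < 7; i++) {
  files.push(`gs://example-bucket/scorecard_checks/part-${("00000" + i).slice(-5)}.parquet`);
}

// Each batch would be loaded into a staging table and merged by one activity call.
const batches = chunk(files, 3);
console.log(batches.map((b) => b.length)); // [ 3, 3, 1 ]
```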

Reviewed changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated no comments.

Summary per file:

  • services/libs/data-access-layer/src/osspckgs/ingestJobs.ts: Extends the OsspckgsJobKind union to include the Scorecard job kinds used by the ingest pipeline.
  • services/apps/packages_worker/src/workflows/index.ts: Exposes ingestScorecard from the workflows barrel for worker registration/usage.
  • services/apps/packages_worker/src/scripts/triggerBootstrap.ts: Allows --kinds scorecard and routes bootstrap to the renamed bq-dataset-ingest task queue.
  • services/apps/packages_worker/src/scripts/exportToBucket.ts: Adds Scorecard export "parts" and SQL wiring for pre-exporting Scorecard data to GCS.
  • services/apps/packages_worker/src/scorecard/workflows/ingestScorecard.ts: Implements the chunked Scorecard ingest workflow (export → parquet staging → merge/update).
  • services/apps/packages_worker/src/scorecard/workflows/index.ts: Barrel export for the new Scorecard workflow module.
  • services/apps/packages_worker/src/scorecard/queries/scorecardSql.ts: Defines BigQuery SQL for Scorecard aggregate rows and per-check UNNEST rows.
  • services/apps/packages_worker/src/schedules/cleanup.ts: Updates the cleanup schedule to use the renamed bq-dataset-ingest task queue.
  • services/apps/packages_worker/src/deps-dev/workflows/bootstrapOsspckgs.ts: Runs Scorecard ingest as the final child workflow when kinds is unset or includes scorecard.
  • services/apps/packages_worker/src/deps-dev/schedules/bootstrap.ts: Moves the weekly bootstrap cron to Monday 02:00 UTC and updates the task queue name.
  • services/apps/packages_worker/src/deps-dev/config.ts: Adds the SCORECARD_DATASET constant for Scorecard BigQuery sourcing.
  • services/apps/packages_worker/src/bin/bq-dataset-ingest.ts: New ingest worker entrypoint that registers bootstrap/cleanup schedules and starts the service worker.
  • services/apps/packages_worker/package.json: Renames scripts from deps-dev-ingest to bq-dataset-ingest and updates export tooling SERVICE wiring.
  • scripts/services/bq-dataset-ingest.yaml: Renames the docker-compose service/hostname/command/env to match bq-dataset-ingest.
  • backend/src/osspckgs/migrations/V1781074345__add-scorecard-job-kinds.sql: Extends the osspckgs_ingest_jobs.job_kind CHECK constraint to include the new Scorecard kinds.


@themarolt themarolt requested a review from mbani01 June 11, 2026 07:30
Copilot AI review requested due to automatic review settings June 11, 2026 09:32

@cursor (bot) left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit 75bedbc.

Comment thread services/apps/packages_worker/src/scorecard/queries/scorecardSql.ts

Copilot AI left a comment


Pull request overview

Copilot reviewed 18 out of 21 changed files in this pull request and generated 4 comments.

Comment thread services/libs/data-access-layer/src/osspckgs/ingestJobs.ts
Comment thread services/apps/packages_worker/package.json Outdated
Comment thread services/apps/packages_worker/src/scorecard/queries/scorecardSql.ts Outdated
Comment thread services/apps/packages_worker/src/scorecard/queries/scorecardSql.ts Outdated
Signed-off-by: Uroš Marolt <uros@marolt.me>
mbani01 previously approved these changes Jun 11, 2026
Copilot AI review requested due to automatic review settings June 11, 2026 12:43

Copilot AI left a comment


Pull request overview

Copilot reviewed 18 out of 21 changed files in this pull request and generated 2 comments.

Comment thread services/apps/packages_worker/src/scorecard/queries/scorecardSql.ts Outdated
Comment thread services/apps/packages_worker/src/scorecard/queries/scorecardSql.ts Outdated
Signed-off-by: Uroš Marolt <uros@marolt.me>
Copilot AI review requested due to automatic review settings June 12, 2026 06:20

Copilot AI left a comment


Pull request overview

Copilot reviewed 18 out of 21 changed files in this pull request and generated 2 comments.

Comment thread services/apps/packages_worker/src/deps-dev/schedules/bootstrap.ts
Comment thread services/apps/packages_worker/src/schedules/cleanup.ts
@themarolt themarolt merged commit 90c86b2 into main Jun 12, 2026
16 checks passed
@themarolt themarolt deleted the feat/openssf-scorecard-ingest-CM-1227 branch June 12, 2026 12:18

Labels

None yet

Projects

None yet


3 participants