Skip to content

feat: A reconciler to actually keep our records in check#4991

Merged
another-rex merged 19 commits intogoogle:masterfrom
another-rex:reconciler
Mar 19, 2026
Merged

feat: A reconciler to actually keep our records in check#4991
another-rex merged 19 commits intogoogle:masterfrom
another-rex:reconciler

Conversation

@another-rex
Copy link
Contributor

@another-rex another-rex commented Mar 10, 2026

Wrote a reconciler so that we catch any missing records within 30 hours of it happening.

This acts like a last resort to help us identify leaks in our ingestion pipeline, and ideally should not be actually doing any reimporting.

TODO: We might need to keep a list in the datastore of known bad entries so we avoid erroring on them constantly.

Copy link
Member

@michaelkedar michaelkedar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we s/datastore/database (or similar) throughout (since the underlying database is abstracted away)

if sourceRepo.Type == models.SourceRepositoryTypeGit && sourceRepo.Git != nil {
if existingBranch, ok := gitBranches[sourceRepo.Git.URL]; ok {
if existingBranch != sourceRepo.Git.Branch {
return fmt.Errorf("multiple sources with same git url %q but different branches (%q and %q) is not supported",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this stop the entire reconciler?
If not, we could just combine this loop with the one afterwards, and not store the sourceRepos

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this should stop the entire reconciler, the concurrency model only holds if all ecosystems on the same repo only does read operations. Otherwise one ecosystem would change the branch while the other ecosystem is still walking the fs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I meant we could still try reconcile the other source repositories that don't use this same git URL (and also try reconcile the first one encountered).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I just want it to completely break if that assumption is broken so we can go and fix it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have this in base, but no patch in the production environment, meaning it'll get deployed to prod without the PersistentVolume?
Either add the prod one (if you want it), or remove this from base, and add it as a resource (instead of a patch) to the test instance

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now fixed.

// - parsed: A pointer to a pre-parsed gjson.Result if already available. If nil, sourceRecord is opened and parsed dynamically.
// - sourceRecord: The interface used to lazily retrieve file contents (from a bucket, git, or REST).
// - format: RecordFormat representing the format we expect to decode (JSON, YAML, etc.).
func checkReconcile(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This blocks the SourceRepository walking to download the record from GCS/REST (or open the file when in git), then sends it to the importerWorker which downloads the file again to do validation/send to worker.

IMO most of this function should be in the pool of workers for the reconciler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, reworked now to actually use workers effectively, also kinda refactored importer a little bit, turns out changes are smaller than I expected.

michaelkedar
michaelkedar previously approved these changes Mar 17, 2026
Copy link
Member

@michaelkedar michaelkedar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM

if sourceRepo.Type == models.SourceRepositoryTypeGit && sourceRepo.Git != nil {
if existingBranch, ok := gitBranches[sourceRepo.Git.URL]; ok {
if existingBranch != sourceRepo.Git.Branch {
return fmt.Errorf("multiple sources with same git url %q but different branches (%q and %q) is not supported",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I meant we could still try reconcile the other source repositories that don't use this same git URL (and also try reconcile the first one encountered).

Comment on lines +383 to +385
Import Action = iota
Reconcile
Withdraw
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should these be named ActionImport, etc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

Comment on lines +411 to +412
// We keep reading from the channel (even if context is closed)
// as the channel will be closed by the producer side.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm the explanation here is a bit spurious.
Closing a channel does not discard unpulled messages.
IIRC, what's actually happening is the senders block on the channel being read, and presumably the senders will exit early when the context is cancelled.


dbMod, exists := dbRecords[path]
if !exists {
logger.ErrorContext(ctx, "Found missing vulnerability in database during reconcile",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit ambivalent on using Error logs here (and for the Found outdated vulnerability logs).
On one hand, this is not an error - this is what we want the reconciler to do.
On the other, it's probably good to alert us when our database is wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, for now I made this warning, we can switch to error once upstream errors are lower.

michaelkedar
michaelkedar previously approved these changes Mar 18, 2026
@another-rex another-rex merged commit c3fc471 into google:master Mar 19, 2026
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants