feat: A reconciler to actually keep our records in check#4991
feat: A reconciler to actually keep our records in check#4991another-rex merged 19 commits intogoogle:masterfrom
Conversation
michaelkedar
left a comment
There was a problem hiding this comment.
can we s/datastore/database (or similar) throughout (since the underlying database is abstracted away)
| if sourceRepo.Type == models.SourceRepositoryTypeGit && sourceRepo.Git != nil { | ||
| if existingBranch, ok := gitBranches[sourceRepo.Git.URL]; ok { | ||
| if existingBranch != sourceRepo.Git.Branch { | ||
| return fmt.Errorf("multiple sources with same git url %q but different branches (%q and %q) is not supported", |
There was a problem hiding this comment.
Should this stop the entire reconciler?
If not, we could just combine this loop with the one afterwards, and not store the sourceRepos
There was a problem hiding this comment.
Yes, this should stop the entire reconciler, the concurrency model only holds if all ecosystems on the same repo only does read operations. Otherwise one ecosystem would change the branch while the other ecosystem is still walking the fs.
There was a problem hiding this comment.
I guess I meant we could still try reconcile the other source repositories that don't use this same git URL (and also try reconcile the first one encountered).
There was a problem hiding this comment.
I think I just want it to completely break if that assumption is broken so we can go and fix it.
There was a problem hiding this comment.
You have this in base, but no patch in the production environment, meaning it'll get deployed to prod without the PersistentVolume?
Either add the prod one (if you want it), or remove this from base, and add it as a resource (instead of a patch) to the test instance
| // - parsed: A pointer to a pre-parsed gjson.Result if already available. If nil, sourceRecord is opened and parsed dynamically. | ||
| // - sourceRecord: The interface used to lazily retrieve file contents (from a bucket, git, or REST). | ||
| // - format: RecordFormat representing the format we expect to decode (JSON, YAML, etc.). | ||
| func checkReconcile( |
There was a problem hiding this comment.
This blocks the SourceRepository walking to download the record from GCS/REST (or open the file when in git), then sends it to the importerWorker which downloads the file again to do validation/send to worker.
IMO most of this function should be in the pool of workers for the reconciler.
There was a problem hiding this comment.
Ok, reworked now to actually use workers effectively, also kinda refactored importer a little bit, turns out changes are smaller than I expected.
| if sourceRepo.Type == models.SourceRepositoryTypeGit && sourceRepo.Git != nil { | ||
| if existingBranch, ok := gitBranches[sourceRepo.Git.URL]; ok { | ||
| if existingBranch != sourceRepo.Git.Branch { | ||
| return fmt.Errorf("multiple sources with same git url %q but different branches (%q and %q) is not supported", |
There was a problem hiding this comment.
I guess I meant we could still try reconcile the other source repositories that don't use this same git URL (and also try reconcile the first one encountered).
go/internal/importer/importer.go
Outdated
| Import Action = iota | ||
| Reconcile | ||
| Withdraw |
There was a problem hiding this comment.
should these be named ActionImport, etc?
go/internal/importer/importer.go
Outdated
| // We keep reading from the channel (even if context is closed) | ||
| // as the channel will be closed by the producer side. |
There was a problem hiding this comment.
Hmm the explanation here is a bit spurious.
Closing a channel does not discard unpulled messages.
IIRC, what's actually happening is the senders block on the channel being read, and presumably the senders will exit early when the context is cancelled.
go/internal/importer/importer.go
Outdated
|
|
||
| dbMod, exists := dbRecords[path] | ||
| if !exists { | ||
| logger.ErrorContext(ctx, "Found missing vulnerability in database during reconcile", |
There was a problem hiding this comment.
I'm a bit ambivalent on using Error logs here (and for the Found outdated vulnerability logs).
On one hand, this is not an error - this is what we want the reconciler to do.
On the other, it's probably good to alert us when our database is wrong.
There was a problem hiding this comment.
Yeah, for now I made this warning, we can switch to error once upstream errors are lower.
Wrote a reconciler so that we catch any missing records within 30 hours of it happening.
This acts like a last resort to help us identify leaks in our ingestion pipeline, and ideally should not be actually doing any reimporting.
TODO: We might need to keep a list in the datastore of known bad entries so we avoid erroring on them constantly.