perf(importers): batch Vulnerability_Id inserts by valentijnscholten · Pull Request #14966 · DefectDojo/django-DefectDojo

valentijnscholten · 2026-06-07T09:33:41Z

Summary

Fix sanitize_vulnerability_ids in dojo/finding/helper.py — was a silent no-op (reassigned local variable, return value never used by caller); now returns the filtered list
Replace per-row Vulnerability_Id.save() loop in save_vulnerability_ids with bulk_create, reducing N INSERT queries to 1 per finding for all callers (importer and reimporter)
Add cross-finding batch accumulation in DefaultImporter: store_vulnerability_ids appends Vulnerability_Id objects to self.pending_vulnerability_ids; flush_vulnerability_ids() bulk-inserts the whole batch at each batch boundary alongside location_handler.persist()
Add cross-finding batch accumulation in DefaultReImporter: reconcile_vulnerability_ids appends changed findings to self.pending_vuln_id_deletes and new IDs to self.pending_vulnerability_ids; flush_vulnerability_ids() issues one DELETE WHERE finding_id IN (...) followed by one bulk_create per batch; findings with unchanged IDs hit the early-exit path and never touch either buffer

For a scan with 1000 findings × 5 CVEs each: ~5000 INSERT queries reduced to O(number of batches) bulk_create calls. Same reduction applies on reimport for findings whose vulnerability IDs changed.

Replace per-row Vulnerability_Id saves with bulk_create in two layers: - fix sanitize_vulnerability_ids to return filtered list (was a no-op bug — reassigned local variable, caller never saw the result) - save_vulnerability_ids now uses bulk_create per finding instead of one INSERT per ID; fixes all callers including the reimporter path - DefaultImporter.store_vulnerability_ids accumulates Vulnerability_Id objects across all findings in a batch; flush_vulnerability_ids() does a single bulk_create at each batch boundary (alongside location_handler.persist()) For a scan with 1000 findings × 5 CVEs each: 5000 INSERT queries reduced to O(batches) bulk_create calls.

Extend the cross-finding accumulation pattern to DefaultReImporter: - reconcile_vulnerability_ids now accumulates changed findings into pending_vuln_id_deletes / pending_vulnerability_ids instead of issuing per-finding DELETE + INSERT immediately - flush_vulnerability_ids (BaseImporter) runs one bulk DELETE WHERE finding_id IN (...) followed by one bulk_create for all new IDs - flush called at both dedupe batch boundaries (alongside location_handler.persist()) and after the mitigation loop Early-exit path (unchanged IDs) never touches either buffer, so the common case pays zero extra cost. Add two unit tests: cross-finding batch (3 findings, 2 changed + 1 unchanged, verify buffer contents before flush and DB state after) and unchanged-IDs early-exit (verify buffers stay empty).

Remove pending-rebaseline skips from TestDojoImporterPerformanceSmall and TestDojoImporterPerformanceSmallLocations. Update all expected query counts to reflect the batch Vulnerability_Id insert optimisation (counts decrease by 1-20 queries per step depending on the scan size and code path).

The test mocked Vulnerability_Id.save (individual saves) but save_vulnerability_ids now uses bulk_create. Django's bulk_create validates FK references before issuing SQL, raising ValueError when the finding has no pk. Mock bulk_create instead and assert on the deduplicated object list passed to it.

github-actions Bot added the unittests label Jun 7, 2026

valentijnscholten added 2 commits June 7, 2026 11:39

valentijnscholten added this to the 2.60.0 milestone Jun 7, 2026

valentijnscholten closed this Jun 7, 2026

valentijnscholten reopened this Jun 7, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(importers): batch Vulnerability_Id inserts#14966

perf(importers): batch Vulnerability_Id inserts#14966
valentijnscholten wants to merge 4 commits into
DefectDojo:devfrom
valentijnscholten:batch-vulnerability-id-inserts

valentijnscholten commented Jun 7, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

valentijnscholten commented Jun 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

valentijnscholten commented Jun 7, 2026 •

edited

Loading