Research k09 mapping by SurbhiAgarwal1 · Pull Request #927 · OWASP/OpenCRE

SurbhiAgarwal1 · 2026-06-11T16:25:58Z

Title

Add documentation note to K09 entry in owasp_kubernetes_top10_2022.json (intentional CRE duplication)

Summary

This PR adds a clear, in‑code documentation of an intentional mapping duplication between OWASP Kubernetes Top 10 – K09 (Misconfigured Cluster Components) and K01 (Insecure Workload Configurations). Both control‑plane categories map to the same two Configuration CREs:

CRE 233‑748 – Configuration hardening
CRE 486‑813 – Configuration

Because JSON does not support comments, we introduce a new "note" field directly inside the K09 JSON object. The parser already tolerates unknown fields, so this addition is safe and does not affect existing functionality.

What was changed

File updated: application/utils/external_project_parsers/data/owasp_kubernetes_top10_2022.json
- Added a "note" key to the K09 mapping object with the text:
  
  "The duplication of CRE mappings with K01 is intentional; K09 and K01 target the same high‑level configuration hardening controls at different infrastructure layers."
Commit: Document intentional duplication of CRE mappings for K09 (added note field) – stages the JSON change and records the rationale.

Why this change is needed

Clarifies design intent: Future contributors and reviewers can immediately see that the identical CRE list for K01 and K09 is deliberate, avoiding confusion or unnecessary refactoring.
Improves maintainability: The note serves as an internal documentation artifact that lives alongside the data, ensuring the rationale travels with the codebase.
No runtime impact: The owasp_kubernetes_top10_2022 parser ignores unrecognised fields, so adding "note" does not alter parsing behavior or downstream logic.

Testing

The unit‑test owasp_kubernetes_top10_2022_parser_test.py runs successfully; the parser safely skips the new "note" field.
No other tests are affected by this change.

Additional notes

This PR does not modify any Python code; it only enriches the JSON data source.
If further documentation is required for other duplicated mappings, the same pattern ("note" field) can be applied.

All changes have been pushed to the research‑k09‑mapping branch and the test suite passes for the affected parser.

Closes #916

coderabbitai · 2026-06-11T16:26:14Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 8fef7b62-1cb5-4fbf-b5a3-5075d4f7a1cc

📥 Commits

Reviewing files that changed from the base of the PR and between cb53f13 and abddceb.

📒 Files selected for processing (2)

application/utils/external_project_parsers/data/owasp_kubernetes_top10_2022.json
import_telemetry.json

✅ Files skipped from review due to trivial changes (1)

application/utils/external_project_parsers/data/owasp_kubernetes_top10_2022.json

Summary by CodeRabbit

New Features
- Added an OWASP Kubernetes Top 10 (2022) security reference dataset with ten categories and associated mappings, including K09 with no C/E identifiers.
- Added newline-delimited import telemetry records to capture import_run outcomes and timing metrics across multiple import sources, including both success and failure entries.

Walkthrough

Adds OWASP Kubernetes Top 10 (2022) reference data as a JSON array of K01–K10 standards entries, each mapping security sections to CRE identifiers, with K09 having an empty cre_ids array. Introduces sample import telemetry records in NDJSON format capturing 25 import_run events across multiple sources with timing, operation counts, and status tracking including success and failure cases.

Changes

Data Additions: Standards Reference and Telemetry

Layer / File(s)	Summary
OWASP Kubernetes Top 10 2022 Reference Data `application/utils/external_project_parsers/data/owasp_kubernetes_top10_2022.json`	JSON array of K01–K10 Kubernetes standards with section identifiers, URLs, and cre_ids mappings; K09 is present with an empty cre_ids array and no accompanying note field.
Import Telemetry Records `import_telemetry.json`	NDJSON file with 25 sample import_run records tracking imports across admin_api_test, master_spreadsheet, e2e_staging_apply, test_phase2, and myopencre_csv sources. Most records include success status and op_counts; master_spreadsheet records include detailed ga_pairs_* counters set to 0, and one master_spreadsheet entry records a failure with an error message.

🎯 1 (Trivial) | ⏱️ ~3 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 3

❌ Failed checks (2 warnings, 1 inconclusive)

Check name	Status	Explanation	Resolution
Linked Issues check	⚠️ Warning	The PR documentation explains duplication as intentional via a note field, but issue `#916` explicitly requests either verified distinct CRE IDs with sources or an empty array—not documentation of unverified duplication [`#916`].	Update the K09 cre_ids to either: (1) verified distinct CRE identifiers backed by cited sources, or (2) an empty array if no mapping exists. The note alone does not address the core concern raised in `#916`.
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check	❓ Inconclusive	The title 'Research k09 mapping' is vague and does not clearly convey the actual change (adding a documentation note to K09 entry), merely suggesting research activity without specificity.	Consider a more specific title like 'Add documentation note to K09 entry explaining intentional CRE duplication' that reflects the actual change made.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description check	✅ Passed	The description is related to the changeset, detailing the K09 mapping duplication documentation and explaining the rationale for adding a note field to the JSON data.
Out of Scope Changes check	✅ Passed	The PR adds a note field to document duplication, which aligns with issue `#916`'s fallback guidance to document intentional duplication if better research is unavailable.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Biome (2.4.16)

import_telemetry.json

File contains syntax errors that prevent linting: Line 2: End of file expected; Line 3: End of file expected; Line 4: End of file expected; Line 5: End of file expected; Line 6: End of file expected; Line 7: End of file expected; Line 8: End of file expected; Line 9: End of file expected; Line 10: End of file expected; Line 11: End of file expected; Line 12: End of file expected; Line 13: End of file expected; Line 14: End of file expected; Line 15: End of file expected; Line 16: End of file expected; Line 17: End of file expected; Line 18: End of file expected; Line 19: End of file expected; Line 20: End of file expected; Line 21: End of file expected; Line 22: End of file expected; Line 23: End of file expected; Line 24: End of file expected; Line 25: End of file expected

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 7

🧹 Nitpick comments (7)

application/utils/external_project_parsers/parsers/cheatsheets_parser.py (2)

113-145: 💤 Low value

Consider consistent use of shallow_copy() across registration methods.

Line 130 uses cre.shallow_copy() when creating links, but the similar code in register_cheatsheets (line 105) directly uses cre without copying. For consistency and to avoid potential unintended mutations of CRE objects, consider applying shallow_copy() in both locations.

🔄 Proposed fix for consistency

In register_cheatsheets, update line 105:

                 for cre in cres:
                     cs.add_link(
                         defs.Link(
-                            document=cre, ltype=defs.LinkTypes.AutomaticallyLinkedTo
+                            document=cre.shallow_copy(), ltype=defs.LinkTypes.AutomaticallyLinkedTo
                         )
                     )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@application/utils/external_project_parsers/parsers/cheatsheets_parser.py`
around lines 113 - 145, The two registration methods are inconsistent:
register_supplemental_cheatsheets uses cre.shallow_copy() when building
defs.Link but register_cheatsheets passes the CRE object directly; update
register_cheatsheets to use cre.shallow_copy() for the Link.document to match
register_supplemental_cheatsheets (use the same defs.Link(...) construction with
cre.shallow_copy()) to avoid accidental mutations and keep behavior consistent
between register_cheatsheets and register_supplemental_cheatsheets.

147-173: ⚡ Quick win

Improve variable naming and exception logging in deduplication.

Two readability and maintainability improvements:

Variable naming (lines 157, 161): The variable l is ambiguous. Use a descriptive name like link for better readability.
Missing exception logging (lines 168-171): The inner fallback exception handler silently continues without logging, which could hide failures when appending links. Consider adding a log statement similar to the outer handler.

♻️ Proposed improvements

     def deduplicate_entries(self, entries: List[defs.Standard]) -> List[defs.Standard]:
         deduped = {}
         for entry in entries:
             key = (entry.section, entry.hyperlink)
             if key not in deduped:
                 deduped[key] = entry
             else:
                 # merge links from duplicate into stored entry without duplicating links
                 stored = deduped[key]
                 existing_keys = set()
-                for l in getattr(stored, "links", []) or []:
+                for link in getattr(stored, "links", []) or []:
                     existing_keys.add(
-                        (getattr(l.document, "id", None), getattr(l, "ltype", None))
+                        (getattr(link.document, "id", None), getattr(link, "ltype", None))
                     )
-                for l in getattr(entry, "links", []) or []:
-                    lk = (getattr(l.document, "id", None), getattr(l, "ltype", None))
+                for link in getattr(entry, "links", []) or []:
+                    lk = (getattr(link.document, "id", None), getattr(link, "ltype", None))
                     if lk not in existing_keys:
                         try:
-                            stored.add_link(l)
+                            stored.add_link(link)
                         except Exception:
                             # fall back to appending if add_link is unavailable
                             try:
-                                stored.links.append(l)
-                            except Exception:
+                                stored.links.append(link)
+                            except Exception as exc:
+                                self.logger.warning(
+                                    "Failed to append link during deduplication for section=%s: %s",
+                                    entry.section,
+                                    exc,
+                                )
                                 continue
                         existing_keys.add(lk)
         return list(deduped.values())

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@application/utils/external_project_parsers/parsers/cheatsheets_parser.py`
around lines 147 - 173, In deduplicate_entries, rename the ambiguous loop
variable l to a descriptive name like link in both loops (the loops iterating
over getattr(stored, "links", []) and getattr(entry, "links", [])) and rename
existing_keys to existing_link_keys for clarity; also add logging in the inner
fallback exception handler (the except that currently does "continue" after
trying stored.links.append(link)) to record the exception and context (e.g., the
entry.section/entry.hyperlink and the link info) instead of silently swallowing
it — keep the outer try/except behavior around stored.add_link and the outer
exception log as-is, but ensure the inner except logs the error before
continuing.

application/tests/cheatsheets_parser_test.py (2)

77-98: 💤 Low value

New test comprehensively validates supplemental registration.

The test correctly verifies that register_supplemental_cheatsheets loads entries from the JSON file, links them to the correct CREs, and uses the official OWASP URLs.

📝 Optional: Use `next()` for single-element extraction

Ruff suggests replacing the slice with next() for slightly more idiomatic code:

-        rest = [
-            entry for entry in entries if entry.section == "REST Security Cheat Sheet"
-        ][0]
+        rest = next(
+            entry for entry in entries if entry.section == "REST Security Cheat Sheet"
+        )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@application/tests/cheatsheets_parser_test.py` around lines 77 - 98, Replace
the list-comprehension + [0] extraction in
test_register_supplemental_cheatsheets with an idiomatic single-element
extractor using next; specifically, locate the test function
test_register_supplemental_cheatsheets and change the line that sets rest from
"rest = [entry for entry in entries if entry.section == 'REST Security Cheat
Sheet'][0]" to use next((entry for entry in entries if entry.section == 'REST
Security Cheat Sheet'), None) (keeping the variable name rest and subsequent
assertions unchanged).

37-75: 💤 Low value

Test updates correctly reflect parser changes.

The updated test properly validates the new official OWASP URL generation and supplemental entry handling. The assertion logic improvement (lines 68-75) correctly handles multiple parsed entries.

📝 Optional: Use `next()` for single-element extraction

Ruff suggests replacing the slice with next() for slightly more idiomatic code:

-            secret_entry = [
-                node
-                for node in nodes
-                if node.section == "Secrets Management Cheat Sheet"
-            ][0]
+            secret_entry = next(
+                node
+                for node in nodes
+                if node.section == "Secrets Management Cheat Sheet"
+            )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@application/tests/cheatsheets_parser_test.py` around lines 37 - 75, Replace
the manual list-slicing extraction of secret_entry with a single-element
extraction using next() to be more idiomatic and avoid creating an intermediate
list; locate the loop over entries.results (the for name, nodes in
entries.results.items() block) and change the extraction that currently builds a
list comprehension filtering node.section == "Secrets Management Cheat Sheet"
into using next(x for x in nodes if x.section == "Secrets Management Cheat
Sheet") (or next((x for x in nodes if x.section == "..."), None) if you want a
safe fallback), then keep the subsequent self.assertEqual(expected.todict(),
secret_entry.todict()) unchanged.

application/utils/external_project_parsers/parsers/owasp_top10_2025.py (1)

19-19: ⚡ Quick win

Add return type annotation to parse() method.

The coding guidelines require running make mypy for Python type checking. Adding a return type annotation improves type safety and helps catch errors early.

📝 Proposed fix

-    def parse(self, cache: db.Node_collection, ph: prompt_client.PromptHandler):
+    def parse(self, cache: db.Node_collection, ph: prompt_client.PromptHandler) -> ParseResult:

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@application/utils/external_project_parsers/parsers/owasp_top10_2025.py` at
line 19, The parse method signature lacks a return type annotation; update def
parse(self, cache: db.Node_collection, ph: prompt_client.PromptHandler) to
include an explicit return type that matches the actual value returned by the
function (for example -> None or -> List[YourNodeType] or -> dict[str, Any]),
add any necessary imports for that type, and run make mypy to verify the
annotation resolves type errors; reference the parse method in
owasp_top10_2025.py and the parameter types db.Node_collection and
prompt_client.PromptHandler when making the change.

Source: Coding guidelines

application/utils/external_project_parsers/parsers/owasp_api_top10_2023.py (1)

19-19: ⚡ Quick win

Add return type annotation to parse() method.

The coding guidelines require running make mypy for Python type checking. Adding a return type annotation improves type safety and helps catch errors early.

📝 Proposed fix

-    def parse(self, cache: db.Node_collection, ph: prompt_client.PromptHandler):
+    def parse(self, cache: db.Node_collection, ph: prompt_client.PromptHandler) -> ParseResult:

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@application/utils/external_project_parsers/parsers/owasp_api_top10_2023.py`
at line 19, The parse method lacks an explicit return type annotation; update
the signature of def parse(self, cache: db.Node_collection, ph:
prompt_client.PromptHandler) to include a return type (for example -> None if
the method does not return anything, or the concrete type such as ->
list[db.Node] / -> bool / -> dict[...] if it does return a value). Inspect the
implementation of parse to determine the correct return value and add that
explicit type annotation to the method signature.

Source: Coding guidelines

application/utils/external_project_parsers/parsers/owasp_kubernetes_top10_2025.py (1)

43-71: ⚡ Quick win

Consider extracting the duplicated CRE linking logic into a helper method.

The code blocks at lines 43-53 (primary CRE linking) and lines 59-71 (fallback CRE linking) share nearly identical logic: fetch CREs from cache, skip if missing, track linked IDs, and add links. Extracting this into a private helper method would reduce duplication and improve maintainability.

♻️ Proposed refactor

     def parse(self, cache: db.Node_collection, ph: prompt_client.PromptHandler):
         with self.data_file.open("r", encoding="utf-8") as handle:
             raw_entries = json.load(handle)
         with self.fallback_data_file.open("r", encoding="utf-8") as handle:
             fallback_entries = {
                 entry["section_id"]: entry for entry in json.load(handle)
             }
 
+        def add_cre_links(standard, cre_ids, cache, linked_cre_ids):
+            """Add CRE links to a standard, skipping already-linked and missing CREs."""
+            for cre_id in cre_ids:
+                if cre_id in linked_cre_ids:
+                    continue
+                cres = cache.get_CREs(external_id=cre_id)
+                if not cres:
+                    continue
+                linked_cre_ids.append(cre_id)
+                standard.add_link(
+                    defs.Link(
+                        ltype=defs.LinkTypes.LinkedTo,
+                        document=cres[0].shallow_copy(),
+                    )
+                )
+
         entries = []
         for entry in raw_entries:
             standard = defs.Standard(
                 name=self.name,
                 sectionID=entry["section_id"],
                 section=entry["section"],
                 hyperlink=entry["hyperlink"],
             )
             linked_cre_ids = []
-            for cre_id in entry.get("cre_ids", []):
-                cres = cache.get_CREs(external_id=cre_id)
-                if not cres:
-                    continue
-                linked_cre_ids.append(cre_id)
-                standard.add_link(
-                    defs.Link(
-                        ltype=defs.LinkTypes.LinkedTo,
-                        document=cres[0].shallow_copy(),
-                    )
-                )
+            add_cre_links(standard, entry.get("cre_ids", []), cache, linked_cre_ids)
+            
             if not linked_cre_ids:
                 for section_id in entry.get("fallback_section_ids", []):
                     fallback_entry = fallback_entries.get(section_id)
                     if not fallback_entry:
                         continue
-                    for cre_id in fallback_entry.get("cre_ids", []):
-                        if cre_id in linked_cre_ids:
-                            continue
-                        cres = cache.get_CREs(external_id=cre_id)
-                        if not cres:
-                            continue
-                        linked_cre_ids.append(cre_id)
-                        standard.add_link(
-                            defs.Link(
-                                ltype=defs.LinkTypes.LinkedTo,
-                                document=cres[0].shallow_copy(),
-                            )
-                        )
+                    add_cre_links(standard, fallback_entry.get("cre_ids", []), cache, linked_cre_ids)
             entries.append(standard)
 
         return ParseResult(

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@application/utils/external_project_parsers/parsers/owasp_kubernetes_top10_2025.py`
around lines 43 - 71, The CRE linking logic is duplicated between the primary
loop over entry.get("cre_ids") and the fallback loop over
fallback_entry.get("cre_ids"); extract this into a private helper (e.g.,
_link_cres or link_cre_ids) that accepts a list of cre_ids, the current
linked_cre_ids set/list, the cache, and the standard object, then inside the
helper call cache.get_CREs(external_id=cre_id), skip missing results, append to
linked_cre_ids, and call
standard.add_link(defs.Link(ltype=defs.LinkTypes.LinkedTo,
document=cres[0].shallow_copy())); replace both duplicated blocks with calls to
this helper for entry.get("cre_ids") and for each fallback_entry.get("cre_ids")
(respecting the existing check to skip cre_ids already in linked_cre_ids).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@application/utils/external_project_parsers/data/owasp_kubernetes_top10_2025.json`:
- Line 5: The JSON uses the same generic "hyperlink" for all K01–K10 entries;
update each entry's "hyperlink" value in
application/utils/external_project_parsers/data/owasp_kubernetes_top10_2025.json
to point to the section-specific OWASP pages (use the K01–K10 URLs under
/www-project-kubernetes-top-ten/2025/en/src/ as listed in the review: K01 ->
.../K01-Insecure-Workload-Configurations.html, K02 ->
.../K02-Overly-Permissive-Authorization-Configurations.html, K03 ->
.../K03-Secrets-Management-Failures.html, K04 ->
.../K04-Lack-Of-Cluster-Level-Policy-Enforcement.html, K05 ->
.../K05-Missing-Network-Segmentation-Controls.html, K06 ->
.../K06-Overly-Exposed-Kubernetes-Components.html, K07 ->
.../K07-Misconfigured-And-Vulnerable-Cluster-Components.html, K08 ->
.../K08-Cluster-To-Cloud-Lateral-Movement.html, K09 ->
.../K09-Broken-Authentication-Mechanisms.html, K10 ->
.../K10-Inadequate-Logging-And-Monitoring.html) so each object's "hyperlink"
value matches its K0x key.

In `@application/utils/external_project_parsers/parsers/owasp_aisvs.py`:
- Line 17: The parse method is missing a return type annotation; update the
signature of parse(self, cache: db.Node_collection, ph:
prompt_client.PromptHandler) to include the correct return type (most likely ->
None if it doesn't return a value, otherwise the concrete type such as ->
List[db.Node] or -> Iterable[db.Node] as appropriate), add any necessary typing
imports, and then run make mypy to confirm the change satisfies type checking.

In `@application/utils/external_project_parsers/parsers/owasp_api_top10_2023.py`:
- Around line 13-47: The OwaspApiTop10_2023 parser duplicates parsing logic
present in several other parsers; extract that shared logic into a small base
class and have specific parsers supply only name and data_file. Create a
SimpleOwaspJsonParser (subclassing ParserInterface) that takes name and
data_file in __init__ and implements parse(cache, ph) using the existing logic
(loading json, building defs.Standard objects, linking CREs via cache.get_CREs,
and returning ParseResult), then change OwaspApiTop10_2023 to subclass
SimpleOwaspJsonParser and call super().__init__(name="OWASP API Security Top 10
2023", data_file=Path(...)/"owasp_api_top10_2023.json"); apply the same pattern
to OwaspTop10_2025, OwaspKubernetesTop10_2022, OwaspLlmTop10_2025, and
OwaspAisvs so they only provide their name and data_file. Ensure signature and
return types of parse remain identical to the original to preserve usage.

In `@application/utils/external_project_parsers/parsers/owasp_llm_top10_2025.py`:
- Line 19: The parse method lacks a return type annotation; update the method
signature for parse(self, cache: db.Node_collection, ph:
prompt_client.PromptHandler) to include an explicit return type matching its
implementation (for example -> None if it doesn't return anything, or ->
List[db.Node] / -> Sequence[db.Node] / the exact type it yields if it returns
nodes), update any imports or typing aliases needed, and then run make mypy to
verify the fix.

In `@scripts/update-owasp-top10-2025-mappings.sh`:
- Around line 23-26: The backup step that copies "$CACHE_FILE" to "$BACKUP_FILE"
needs verification to avoid silent failures: after the cp command in the script
(where CACHE_FILE and BACKUP_FILE are used) check cp's exit status and/or test
that "$BACKUP_FILE" now exists and is non-empty; if the copy failed, emit an
error via echo/processLogger and exit with a non-zero status so the script stops
rather than proceeding without a valid backup.

In `@scripts/update-owasp-top10-standards.sh`:
- Around line 23-26: The backup step that copies "$CACHE_FILE" to "$BACKUP_FILE"
can fail silently; after running cp (the copy of CACHE_FILE to BACKUP_FILE)
verify the operation succeeded by checking cp’s exit status and that BACKUP_FILE
exists and is non-empty (e.g., test -f and -s), and if the verification fails
log an error and exit non‑zero to avoid proceeding; update the script block
around the cp invocation to perform these checks and abort on failure so
destructive operations never run without a valid backup.
- Around line 102-127: The script opens the SQLite connection via conn =
sqlite3.connect(cache_file) and then deletes nodes with cur.execute("delete from
node where id = ?", (node_id,)) without enabling foreign key enforcement, so
CASCADE on cre_node_links.node won't run; fix by executing the PRAGMA
foreign_keys = ON on the connection immediately after creating conn (before any
queries/deletes) so the FK cascade defined for
cre_node_links.node/cre_node_links is enforced and orphaned rows are not left
behind.

---

Nitpick comments:
In `@application/tests/cheatsheets_parser_test.py`:
- Around line 77-98: Replace the list-comprehension + [0] extraction in
test_register_supplemental_cheatsheets with an idiomatic single-element
extractor using next; specifically, locate the test function
test_register_supplemental_cheatsheets and change the line that sets rest from
"rest = [entry for entry in entries if entry.section == 'REST Security Cheat
Sheet'][0]" to use next((entry for entry in entries if entry.section == 'REST
Security Cheat Sheet'), None) (keeping the variable name rest and subsequent
assertions unchanged).
- Around line 37-75: Replace the manual list-slicing extraction of secret_entry
with a single-element extraction using next() to be more idiomatic and avoid
creating an intermediate list; locate the loop over entries.results (the for
name, nodes in entries.results.items() block) and change the extraction that
currently builds a list comprehension filtering node.section == "Secrets
Management Cheat Sheet" into using next(x for x in nodes if x.section ==
"Secrets Management Cheat Sheet") (or next((x for x in nodes if x.section ==
"..."), None) if you want a safe fallback), then keep the subsequent
self.assertEqual(expected.todict(), secret_entry.todict()) unchanged.

In `@application/utils/external_project_parsers/parsers/cheatsheets_parser.py`:
- Around line 113-145: The two registration methods are inconsistent:
register_supplemental_cheatsheets uses cre.shallow_copy() when building
defs.Link but register_cheatsheets passes the CRE object directly; update
register_cheatsheets to use cre.shallow_copy() for the Link.document to match
register_supplemental_cheatsheets (use the same defs.Link(...) construction with
cre.shallow_copy()) to avoid accidental mutations and keep behavior consistent
between register_cheatsheets and register_supplemental_cheatsheets.
- Around line 147-173: In deduplicate_entries, rename the ambiguous loop
variable l to a descriptive name like link in both loops (the loops iterating
over getattr(stored, "links", []) and getattr(entry, "links", [])) and rename
existing_keys to existing_link_keys for clarity; also add logging in the inner
fallback exception handler (the except that currently does "continue" after
trying stored.links.append(link)) to record the exception and context (e.g., the
entry.section/entry.hyperlink and the link info) instead of silently swallowing
it — keep the outer try/except behavior around stored.add_link and the outer
exception log as-is, but ensure the inner except logs the error before
continuing.

In `@application/utils/external_project_parsers/parsers/owasp_api_top10_2023.py`:
- Line 19: The parse method lacks an explicit return type annotation; update the
signature of def parse(self, cache: db.Node_collection, ph:
prompt_client.PromptHandler) to include a return type (for example -> None if
the method does not return anything, or the concrete type such as ->
list[db.Node] / -> bool / -> dict[...] if it does return a value). Inspect the
implementation of parse to determine the correct return value and add that
explicit type annotation to the method signature.

In
`@application/utils/external_project_parsers/parsers/owasp_kubernetes_top10_2025.py`:
- Around line 43-71: The CRE linking logic is duplicated between the primary
loop over entry.get("cre_ids") and the fallback loop over
fallback_entry.get("cre_ids"); extract this into a private helper (e.g.,
_link_cres or link_cre_ids) that accepts a list of cre_ids, the current
linked_cre_ids set/list, the cache, and the standard object, then inside the
helper call cache.get_CREs(external_id=cre_id), skip missing results, append to
linked_cre_ids, and call
standard.add_link(defs.Link(ltype=defs.LinkTypes.LinkedTo,
document=cres[0].shallow_copy())); replace both duplicated blocks with calls to
this helper for entry.get("cre_ids") and for each fallback_entry.get("cre_ids")
(respecting the existing check to skip cre_ids already in linked_cre_ids).

In `@application/utils/external_project_parsers/parsers/owasp_top10_2025.py`:
- Line 19: The parse method signature lacks a return type annotation; update def
parse(self, cache: db.Node_collection, ph: prompt_client.PromptHandler) to
include an explicit return type that matches the actual value returned by the
function (for example -> None or -> List[YourNodeType] or -> dict[str, Any]),
add any necessary imports for that type, and run make mypy to verify the
annotation resolves type errors; reference the parse method in
owasp_top10_2025.py and the parameter types db.Node_collection and
prompt_client.PromptHandler when making the change.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: af4aa50f-a17b-4641-bdb0-a9c88fbc977f

📥 Commits

Reviewing files that changed from the base of the PR and between bc3a8a3 and 3f6152c.

📒 Files selected for processing (26)

application/cmd/cre_main.py
application/tests/cheatsheets_parser_test.py
application/tests/owasp_aisvs_parser_test.py
application/tests/owasp_api_top10_2023_parser_test.py
application/tests/owasp_kubernetes_top10_2022_parser_test.py
application/tests/owasp_kubernetes_top10_2025_parser_test.py
application/tests/owasp_llm_top10_2025_parser_test.py
application/tests/owasp_top10_2025_parser_test.py
application/utils/external_project_parsers/data/owasp_aisvs_1_0.json
application/utils/external_project_parsers/data/owasp_api_top10_2023.json
application/utils/external_project_parsers/data/owasp_cheatsheets_supplement.json
application/utils/external_project_parsers/data/owasp_kubernetes_top10_2022.json
application/utils/external_project_parsers/data/owasp_kubernetes_top10_2025.json
application/utils/external_project_parsers/data/owasp_llm_top10_2025.json
application/utils/external_project_parsers/data/owasp_top10_2025.json
application/utils/external_project_parsers/parsers/cheatsheets_parser.py
application/utils/external_project_parsers/parsers/owasp_aisvs.py
application/utils/external_project_parsers/parsers/owasp_api_top10_2023.py
application/utils/external_project_parsers/parsers/owasp_kubernetes_top10_2022.py
application/utils/external_project_parsers/parsers/owasp_kubernetes_top10_2025.py
application/utils/external_project_parsers/parsers/owasp_llm_top10_2025.py
application/utils/external_project_parsers/parsers/owasp_top10_2025.py
cre.py
scripts/update-cheatsheets.sh
scripts/update-owasp-top10-2025-mappings.sh
scripts/update-owasp-top10-standards.sh

coderabbitai · 2026-06-11T16:37:08Z

+  {
+    "section_id": "K01",
+    "section": "Insecure Workload Configurations",
+    "hyperlink": "https://owasp.org/www-project-kubernetes-top-ten/",


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Description: Check if OWASP K8s Top 10 2025 has section-specific URLs curl -s "https://owasp.org/www-project-kubernetes-top-ten/" | grep -i "k01\|k02\|k03" | head -20

Repository: OWASP/OpenCRE

Length of output: 851

🏁 Script executed:

#!/bin/bash # List OWASP K8s Top Ten 2025 section page URLs (K01-K10) from the main project page curl -s "https://owasp.org/www-project-kubernetes-top-ten/" \ | rg -o 'href="([^"]*www-project-kubernetes-top-ten/2025/en/src/K0[0-9][^"]*\.html)"' \ | sed -E 's/^href="(.*)"$/\1/' \ | sort -u

Repository: OWASP/OpenCRE

Length of output: 873

🏁 Script executed:

#!/bin/bash # Extract OWASP K8s Top Ten 2025 K10 section page URL from the main project page curl -s "https://owasp.org/www-project-kubernetes-top-ten/" \ | rg -o 'href="([^"]*www-project-kubernetes-top-ten/2025/en/src/[^"]*K10[^"]*\.html)"' \ | sed -E 's/^href="(.*)"$/\1/' \ | sort -u

Repository: OWASP/OpenCRE

Length of output: 144

🏁 Script executed:

#!/bin/bash set -euo pipefail python3 - <<'PY' import json, pathlib p = pathlib.Path("application/utils/external_project_parsers/data/owasp_kubernetes_top10_2025.json") data = json.loads(p.read_text(encoding="utf-8")) # handle either list or dict-wrapped items = data if isinstance(data, list) else data.get("data", data.get("sections", [])) vals = {} for it in items: sid = it.get("section_id") or it.get("id") or it.get("sectionId") vals.setdefault(it.get("hyperlink"), []).append(sid) print("unique hyperlink values:", len(vals)) for k,v in vals.items(): print(k, "->", len(v), "sections", sorted([x for x in v if x])) PY

Repository: OWASP/OpenCRE

Length of output: 220

Update hyperlink to section-specific OWASP K8s Top Ten 2025 pages

owasp_kubernetes_top10_2025.json sets the same hyperlink (https://owasp.org/www-project-kubernetes-top-ten/) for all K01–K10, even though OWASP publishes individual section pages under /www-project-kubernetes-top-ten/2025/en/src/. Update each entry’s hyperlink to the matching page, e.g.:

K01: https://owasp.org/www-project-kubernetes-top-ten/2025/en/src/K01-Insecure-Workload-Configurations.html

K02: https://owasp.org/www-project-kubernetes-top-ten/2025/en/src/K02-Overly-Permissive-Authorization-Configurations.html

K03: https://owasp.org/www-project-kubernetes-top-ten/2025/en/src/K03-Secrets-Management-Failures.html

K04: https://owasp.org/www-project-kubernetes-top-ten/2025/en/src/K04-Lack-Of-Cluster-Level-Policy-Enforcement.html

K05: https://owasp.org/www-project-kubernetes-top-ten/2025/en/src/K05-Missing-Network-Segmentation-Controls.html

K06: https://owasp.org/www-project-kubernetes-top-ten/2025/en/src/K06-Overly-Exposed-Kubernetes-Components.html

K07: https://owasp.org/www-project-kubernetes-top-ten/2025/en/src/K07-Misconfigured-And-Vulnerable-Cluster-Components.html

K08: https://owasp.org/www-project-kubernetes-top-ten/2025/en/src/K08-Cluster-To-Cloud-Lateral-Movement.html

K09: https://owasp.org/www-project-kubernetes-top-ten/2025/en/src/K09-Broken-Authentication-Mechanisms.html

K10: https://owasp.org/www-project-kubernetes-top-ten/2025/en/src/K10-Inadequate-Logging-And-Monitoring.html

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@application/utils/external_project_parsers/data/owasp_kubernetes_top10_2025.json` at line 5, The JSON uses the same generic "hyperlink" for all K01–K10 entries; update each entry's "hyperlink" value in application/utils/external_project_parsers/data/owasp_kubernetes_top10_2025.json to point to the section-specific OWASP pages (use the K01–K10 URLs under /www-project-kubernetes-top-ten/2025/en/src/ as listed in the review: K01 -> .../K01-Insecure-Workload-Configurations.html, K02 -> .../K02-Overly-Permissive-Authorization-Configurations.html, K03 -> .../K03-Secrets-Management-Failures.html, K04 -> .../K04-Lack-Of-Cluster-Level-Policy-Enforcement.html, K05 -> .../K05-Missing-Network-Segmentation-Controls.html, K06 -> .../K06-Overly-Exposed-Kubernetes-Components.html, K07 -> .../K07-Misconfigured-And-Vulnerable-Cluster-Components.html, K08 -> .../K08-Cluster-To-Cloud-Lateral-Movement.html, K09 -> .../K09-Broken-Authentication-Mechanisms.html, K10 -> .../K10-Inadequate-Logging-And-Monitoring.html) so each object's "hyperlink" value matches its K0x key.

coderabbitai · 2026-06-11T16:37:08Z

+    name = "OWASP AI Security Verification Standard (AISVS)"
+    data_file = Path(__file__).resolve().parent.parent / "data" / "owasp_aisvs_1_0.json"
+
+    def parse(self, cache: db.Node_collection, ph: prompt_client.PromptHandler):


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add return type annotation for mypy compliance.

The parse() method is missing a return type annotation. Per coding guidelines, Python code should pass make mypy type checking.

📝 Suggested fix

- def parse(self, cache: db.Node_collection, ph: prompt_client.PromptHandler): + def parse(self, cache: db.Node_collection, ph: prompt_client.PromptHandler) -> ParseResult:

As per coding guidelines, run make mypy for Python type checking to ensure full type coverage.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

def parse(self, cache: db.Node_collection, ph: prompt_client.PromptHandler):

def parse(self, cache: db.Node_collection, ph: prompt_client.PromptHandler) -> ParseResult:

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@application/utils/external_project_parsers/parsers/owasp_aisvs.py` at line 17, The parse method is missing a return type annotation; update the signature of parse(self, cache: db.Node_collection, ph: prompt_client.PromptHandler) to include the correct return type (most likely -> None if it doesn't return a value, otherwise the concrete type such as -> List[db.Node] or -> Iterable[db.Node] as appropriate), add any necessary typing imports, and then run make mypy to confirm the change satisfies type checking.

Source: Coding guidelines

coderabbitai · 2026-06-11T16:37:08Z

+class OwaspApiTop10_2023(ParserInterface):
+    name = "OWASP API Security Top 10 2023"
+    data_file = (
+        Path(__file__).resolve().parent.parent / "data" / "owasp_api_top10_2023.json"
+    )
+
+    def parse(self, cache: db.Node_collection, ph: prompt_client.PromptHandler):
+        with self.data_file.open("r", encoding="utf-8") as handle:
+            raw_entries = json.load(handle)
+
+        entries = []
+        for entry in raw_entries:
+            standard = defs.Standard(
+                name=self.name,
+                sectionID=entry["section_id"],
+                section=entry["section"],
+                hyperlink=entry["hyperlink"],
+            )
+            for cre_id in entry.get("cre_ids", []):
+                cres = cache.get_CREs(external_id=cre_id)
+                if not cres:
+                    continue
+                standard.add_link(
+                    defs.Link(
+                        ltype=defs.LinkTypes.LinkedTo,
+                        document=cres[0].shallow_copy(),
+                    )
+                )
+            entries.append(standard)
+
+        return ParseResult(
+            results={self.name: entries},
+            calculate_gap_analysis=False,
+            calculate_embeddings=False,
+        )


🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Consider consolidating duplicated parser logic into a base class.

This parser class is nearly identical to OwaspTop10_2025, OwaspKubernetesTop10_2022, OwaspLlmTop10_2025, and OwaspAisvs (based on the PR summary). The only differences are the name and data_file attributes. Extracting the common parse() logic into a shared base class or helper function would reduce duplication and improve maintainability.

♻️ Proposed refactor approach

Create a base class that accepts name and data_file as constructor parameters:

class SimpleOwaspJsonParser(ParserInterface): def __init__(self, name: str, data_file: Path): self.name = name self.data_file = data_file def parse(self, cache: db.Node_collection, ph: prompt_client.PromptHandler) -> ParseResult: with self.data_file.open("r", encoding="utf-8") as handle: raw_entries = json.load(handle) entries = [] for entry in raw_entries: standard = defs.Standard( name=self.name, sectionID=entry["section_id"], section=entry["section"], hyperlink=entry["hyperlink"], ) for cre_id in entry.get("cre_ids", []): cres = cache.get_CREs(external_id=cre_id) if not cres: continue standard.add_link( defs.Link( ltype=defs.LinkTypes.LinkedTo, document=cres[0].shallow_copy(), ) ) entries.append(standard) return ParseResult( results={self.name: entries}, calculate_gap_analysis=False, calculate_embeddings=False, )

Then each parser becomes:

class OwaspApiTop10_2023(SimpleOwaspJsonParser): def __init__(self): super().__init__( name="OWASP API Security Top 10 2023", data_file=Path(__file__).resolve().parent.parent / "data" / "owasp_api_top10_2023.json" )

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@application/utils/external_project_parsers/parsers/owasp_api_top10_2023.py` around lines 13 - 47, The OwaspApiTop10_2023 parser duplicates parsing logic present in several other parsers; extract that shared logic into a small base class and have specific parsers supply only name and data_file. Create a SimpleOwaspJsonParser (subclassing ParserInterface) that takes name and data_file in __init__ and implements parse(cache, ph) using the existing logic (loading json, building defs.Standard objects, linking CREs via cache.get_CREs, and returning ParseResult), then change OwaspApiTop10_2023 to subclass SimpleOwaspJsonParser and call super().__init__(name="OWASP API Security Top 10 2023", data_file=Path(...)/"owasp_api_top10_2023.json"); apply the same pattern to OwaspTop10_2025, OwaspKubernetesTop10_2022, OwaspLlmTop10_2025, and OwaspAisvs so they only provide their name and data_file. Ensure signature and return types of parse remain identical to the original to preserve usage.

coderabbitai · 2026-06-11T16:37:08Z

+        Path(__file__).resolve().parent.parent / "data" / "owasp_llm_top10_2025.json"
+    )
+
+    def parse(self, cache: db.Node_collection, ph: prompt_client.PromptHandler):


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add return type annotation for mypy compliance.

The parse() method is missing a return type annotation. Per coding guidelines, Python code should pass make mypy type checking.

📝 Suggested fix

- def parse(self, cache: db.Node_collection, ph: prompt_client.PromptHandler): + def parse(self, cache: db.Node_collection, ph: prompt_client.PromptHandler) -> ParseResult:

As per coding guidelines, run make mypy for Python type checking to ensure full type coverage.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@application/utils/external_project_parsers/parsers/owasp_llm_top10_2025.py` at line 19, The parse method lacks a return type annotation; update the method signature for parse(self, cache: db.Node_collection, ph: prompt_client.PromptHandler) to include an explicit return type matching its implementation (for example -> None if it doesn't return anything, or -> List[db.Node] / -> Sequence[db.Node] / the exact type it yields if it returns nodes), update any imports or typing aliases needed, and then run make mypy to verify the fix.

Source: Coding guidelines

coderabbitai · 2026-06-11T16:37:08Z

+if [[ -f "$CACHE_FILE" ]]; then
+  cp "$CACHE_FILE" "$BACKUP_FILE"
+  echo "Backed up database to $BACKUP_FILE"
+fi


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Verify backup creation to prevent silent data loss.

Unlike update-cheatsheets.sh (lines 25-28), this script doesn't verify that the backup file was successfully created. If cp silently fails (e.g., disk full, permission issues), the script continues without a safety net.

🛡️ Proposed fix to add backup verification

if [[ -f "$CACHE_FILE" ]]; then cp "$CACHE_FILE" "$BACKUP_FILE" + if [[ ! -f "$BACKUP_FILE" ]]; then + echo "Failed to create backup at $BACKUP_FILE" >&2 + exit 1 + fi echo "Backed up database to $BACKUP_FILE" fi

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if [[ -f "$CACHE_FILE" ]]; then

cp "$CACHE_FILE" "$BACKUP_FILE"

echo "Backed up database to $BACKUP_FILE"

fi

if [[ -f "$CACHE_FILE" ]]; then

cp "$CACHE_FILE" "$BACKUP_FILE"

if [[ ! -f "$BACKUP_FILE" ]]; then

echo "Failed to create backup at $BACKUP_FILE" >&2

exit 1

fi

echo "Backed up database to $BACKUP_FILE"

fi

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/update-owasp-top10-2025-mappings.sh` around lines 23 - 26, The backup step that copies "$CACHE_FILE" to "$BACKUP_FILE" needs verification to avoid silent failures: after the cp command in the script (where CACHE_FILE and BACKUP_FILE are used) check cp's exit status and/or test that "$BACKUP_FILE" now exists and is non-empty; if the copy failed, emit an error via echo/processLogger and exit with a non-zero status so the script stops rather than proceeding without a valid backup.

coderabbitai · 2026-06-11T16:37:08Z

+if [[ -f "$CACHE_FILE" ]]; then
+  cp "$CACHE_FILE" "$BACKUP_FILE"
+  echo "Backed up database to $BACKUP_FILE"
+fi


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Verify backup creation to prevent silent data loss.

This script shares the same backup verification gap as update-owasp-top10-2025-mappings.sh. If cp fails silently, the script continues without a safety net before performing destructive operations.

🛡️ Proposed fix to add backup verification

if [[ -f "$CACHE_FILE" ]]; then cp "$CACHE_FILE" "$BACKUP_FILE" + if [[ ! -f "$BACKUP_FILE" ]]; then + echo "Failed to create backup at $BACKUP_FILE" >&2 + exit 1 + fi echo "Backed up database to $BACKUP_FILE" fi

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if [[ -f "$CACHE_FILE" ]]; then

cp "$CACHE_FILE" "$BACKUP_FILE"

echo "Backed up database to $BACKUP_FILE"

fi

if [[ -f "$CACHE_FILE" ]]; then

cp "$CACHE_FILE" "$BACKUP_FILE"

if [[ ! -f "$BACKUP_FILE" ]]; then

echo "Failed to create backup at $BACKUP_FILE" >&2

exit 1

fi

echo "Backed up database to $BACKUP_FILE"

fi

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/update-owasp-top10-standards.sh` around lines 23 - 26, The backup step that copies "$CACHE_FILE" to "$BACKUP_FILE" can fail silently; after running cp (the copy of CACHE_FILE to BACKUP_FILE) verify the operation succeeded by checking cp’s exit status and that BACKUP_FILE exists and is non-empty (e.g., test -f and -s), and if the verification fails log an error and exit non‑zero to avoid proceeding; update the script block around the cp invocation to perform these checks and abort on failure so destructive operations never run without a valid backup.

coderabbitai · 2026-06-11T16:37:09Z

+conn = sqlite3.connect(cache_file)
+cur = conn.cursor()
+
+has_2022 = cur.execute(
+    "select 1 from node where name = 'OWASP Kubernetes Top Ten 2022' limit 1"
+).fetchone()
+if has_2022:
+    standard_names = standard_names + ("OWASP Kubernetes Top Ten 2022",)
+
+rows = list(
+    cur.execute(
+        f"""
+        select n.id, n.name, coalesce(n.section_id, ''), coalesce(n.section, '')
+        from node n
+        left join cre_node_links l on l.node = n.id
+        where n.name in ({','.join('?' for _ in standard_names)})
+        group by n.id
+        having count(l.cre) = 0
+        """,
+        standard_names,
+    )
+)
+
+for node_id, name, section_id, section in rows:
+    cur.execute("delete from node where id = ?", (node_id,))
+    print(f"Removed unmapped entry: {name} {section_id} {section}".strip())


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Enable foreign key constraints before deleting nodes to prevent orphaned link rows.

The pruning script deletes nodes (line 126) but doesn't enable PRAGMA foreign_keys = ON. SQLite disables foreign key enforcement by default, so the CASCADE delete constraint on cre_node_links.node (defined in db.py:148-156) won't fire, leaving orphaned link rows and violating referential integrity.

The version-selection script at line 58 correctly enables this pragma, confirming it's necessary for CASCADE behavior.

🔧 Proposed fix to enable foreign key enforcement

conn = sqlite3.connect(cache_file) cur = conn.cursor() +# Ensure SQLite enforces foreign key constraints so deletes cascade as expected +cur.execute("PRAGMA foreign_keys = ON") has_2022 = cur.execute(

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

conn = sqlite3.connect(cache_file)

cur = conn.cursor()

has_2022 = cur.execute(

"select 1 from node where name = 'OWASP Kubernetes Top Ten 2022' limit 1"

).fetchone()

if has_2022:

standard_names = standard_names + ("OWASP Kubernetes Top Ten 2022",)

rows = list(

cur.execute(

f"""

select n.id, n.name, coalesce(n.section_id, ''), coalesce(n.section, '')

from node n

left join cre_node_links l on l.node = n.id

where n.name in ({','.join('?' for _ in standard_names)})

group by n.id

having count(l.cre) = 0

""",

standard_names,

)

)

for node_id, name, section_id, section in rows:

cur.execute("delete from node where id = ?", (node_id,))

print(f"Removed unmapped entry: {name} {section_id} {section}".strip())

conn = sqlite3.connect(cache_file)

cur = conn.cursor()

# Ensure SQLite enforces foreign key constraints so deletes cascade as expected

cur.execute("PRAGMA foreign_keys = ON")

has_2022 = cur.execute(

"select 1 from node where name = 'OWASP Kubernetes Top Ten 2022' limit 1"

).fetchone()

if has_2022:

standard_names = standard_names + ("OWASP Kubernetes Top Ten 2022",)

rows = list(

cur.execute(

f"""

select n.id, n.name, coalesce(n.section_id, ''), coalesce(n.section, '')

from node n

left join cre_node_links l on l.node = n.id

where n.name in ({','.join('?' for _ in standard_names)})

group by n.id

having count(l.cre) = 0

""",

standard_names,

)

)

for node_id, name, section_id, section in rows:

cur.execute("delete from node where id = ?", (node_id,))

print(f"Removed unmapped entry: {name} {section_id} {section}".strip())

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/update-owasp-top10-standards.sh` around lines 102 - 127, The script opens the SQLite connection via conn = sqlite3.connect(cache_file) and then deletes nodes with cur.execute("delete from node where id = ?", (node_id,)) without enabling foreign key enforcement, so CASCADE on cre_node_links.node won't run; fix by executing the PRAGMA foreign_keys = ON on the connection immediately after creating conn (before any queries/deletes) so the FK cascade defined for cre_node_links.node/cre_node_links is enforced and orphaned rows are not left behind.

coderabbitai

🧹 Nitpick comments (1)

import_telemetry.json (1)
1-13: 💤 Low value

Consider normalizing op_counts schema for consistent downstream analysis.

Lines 3-4 include explicit zero counters ("ga_pairs_planned": 0, ...), while lines 1-2 and 5-13 use empty objects ({}). This reflects runtime behavior—imports with GA pair processing initialize the full counter structure, while others default to an empty dict. However, the inconsistency may complicate downstream telemetry queries and aggregations.

For example, distinguishing "no GA processing configured" from "GA processing ran and found zero pairs" requires checking both the presence of keys and their values.

If telemetry consumers would benefit from uniform structure, consider having emit_import_event always emit the full op_counts schema (with zeros for unused counters) when the import source supports GA operations.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@import_telemetry.json` around lines 1 - 13, The telemetry output is
inconsistent because op_counts is sometimes an empty object and sometimes
contains GA counters; update the emitter (emit_import_event) to normalize
op_counts by always including the full GA counter keys (ga_pairs_planned,
ga_pairs_enqueued, ga_pairs_retried, ga_pairs_failed, ga_pairs_completed) with
integer values (default 0) so downstream consumers can rely on a consistent
schema; implement this either by initializing a default op_counts template and
merging any real counts into it or by explicitly populating those keys before
emitting the event.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@import_telemetry.json`:
- Around line 1-13: The telemetry output is inconsistent because op_counts is
sometimes an empty object and sometimes contains GA counters; update the emitter
(emit_import_event) to normalize op_counts by always including the full GA
counter keys (ga_pairs_planned, ga_pairs_enqueued, ga_pairs_retried,
ga_pairs_failed, ga_pairs_completed) with integer values (default 0) so
downstream consumers can rely on a consistent schema; implement this either by
initializing a default op_counts template and merging any real counts into it or
by explicitly populating those keys before emitting the event.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yml

Review profile: CHILL

Plan: Pro

Run ID: 618bdd77-727a-4a1a-b799-79f8a8698573

📥 Commits

Reviewing files that changed from the base of the PR and between 3f6152c and d34c31d.

📒 Files selected for processing (1)

import_telemetry.json

…cation, update OWASP 2025 hyperlinks, and other quality improvements

Bornunique911 · 2026-06-12T05:29:23Z

Hi, @SurbhiAgarwal1 . Please check this PR review (#877 (comment)) reviewed by @coderabbitai once and update the files as requested by @coderabbitai in the suggestion made by @coderabbitai .

coderabbitai · 2026-06-12T05:30:14Z

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Fetch the review details from PR `#877`
gh api repos/OWASP/OpenCRE/pulls/877/reviews --jq '.[] | select(.id == 4422342882) | {id: .id, state: .state, body: .body, user: .user.login}'

Length of output: 13416

🏁 Script executed:

#!/bin/bash
# Also fetch review comments from that specific review
gh api "repos/OWASP/OpenCRE/pulls/877/reviews/4422342882/comments" --jq '.[] | {path: .path, line: .line, body: .body, diff_hunk: .diff_hunk}'

Length of output: 35677

@SurbhiAgarwal1, here is a summary of the relevant suggestion from the PR #877 review that applies to this PR:

📌 Suggestion still applicable to this PR

File: application/utils/external_project_parsers/data/owasp_kubernetes_top10_2022.json
Issue: K09 ("Misconfigured Cluster Components") has the exact same cre_ids (["233-748", "486-813"]) as K01 ("Insecure Workload Configurations"), which looks like a copy/paste leftover.

Original suggestion from the review:

Update the cre_ids for K09 to the correct, distinct CRE identifiers or an empty array ([]) if none can be confirmed — rather than leaving duplicated IDs or just adding a note.

  {
    "section_id": "K09",
    "section": "Misconfigured Cluster Components",
    "hyperlink": "https://owasp.org/www-project-kubernetes-top-ten/2022/en/src/K09-misconfigured-cluster-components",
-   "note": "The duplication of CRE mappings with K01 is intentional; ...",
-   "cre_ids": ["233-748", "486-813"]
+   "cre_ids": []
  }

The current PR adds a note field explaining the duplication is intentional, but the review's recommendation was to either use distinct correct CRE IDs for K09, or use an empty array if no verified mapping exists. Documenting a questionable mapping as "intentional" without verified evidence may mislead future contributors.

If research confirms that K09 and K01 genuinely share these controls, please also consider citing the specific source/reference that backs this up.

✅ Already addressed (no action needed for these)

The following comments from PR #877 were already fixed in commit f2f85f4 and don't require changes in this PR:

Ruff E741: renaming l → link in test files (owasp_aisvs_parser_test.py, owasp_api_top10_2023_parser_test.py, owasp_kubernetes_top10_2022_parser_test.py, owasp_kubernetes_top10_2025_parser_test.py, owasp_llm_top10_2025_parser_test.py)
Broken hyperlink URLs in owasp_aisvs_1_0.json
Undispatched --owasp_*_in CLI flags in cre.py

coderabbitai · 2026-06-12T05:30:53Z

Tip

For best results, initiate chat on the files or code changes.

@Bornunique911, I already addressed this in my reply just above in this same comment thread! Here's a quick recap for convenience:

📌 Action needed for `@SurbhiAgarwal1`

File: application/utils/external_project_parsers/data/owasp_kubernetes_top10_2022.json

The suggestion from the PR #877 review is to not simply document the duplication as intentional, but instead to:

Use distinct, verified CRE IDs for K09 if research confirms specific ones, OR
Use an empty array ([]) if no verified mapping can be confirmed.

  {
    "section_id": "K09",
    "section": "Misconfigured Cluster Components",
    "hyperlink": "https://owasp.org/www-project-kubernetes-top-ten/2022/en/src/K09-misconfigured-cluster-components",
-   "note": "The duplication of CRE mappings with K01 is intentional; ...",
-   "cre_ids": ["233-748", "486-813"]
+   "cre_ids": []
  }

The current PR only adds a note field, which doesn't resolve the underlying concern — the duplicated cre_ids still need to be either properly verified (with a cited source) or cleared out. Please update accordingly, @SurbhiAgarwal1.

Bornunique911 · 2026-06-14T18:35:21Z

I think import_telementry.json file is not needed.

@coderabbitai review

coderabbitai · 2026-06-14T18:35:50Z

@Bornunique911, you're right — import_telemetry.json appears to be an accidentally committed local artifact (25 newline-delimited import_run telemetry records) that has no relevance to this PR's scope of researching the K09 CRE mapping. @SurbhiAgarwal1 should remove it before this PR is merged.

On the positive side, based on the AI-generated diff summary, the main change looks correctly updated: owasp_kubernetes_top10_2022.json now shows K09 with "cre_ids": [] and no note field — which aligns with the recommendation from the PR #877 review. 🐇

I'll kick off a full review now.

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

SurbhiAgarwal1 · 2026-06-14T19:05:54Z

Thanks for catching that, @Bornunique911 and @coderabbitai!

I've just pushed a new commit that removes the accidentally committed import_telemetry.json file. I have also ensured that the cre_ids array for K09 in owasp_kubernetes_top10_2022.json is completely empty ([]) and removed the note field, aligning perfectly with the previous review recommendations.

Everything should be fully resolved and ready to go now!

coderabbitai · 2026-06-14T19:06:25Z

Tip

For best results, initiate chat on the files or code changes.

@SurbhiAgarwal1, great work! Both issues are now fully resolved:

✅ import_telemetry.json removed — no accidental artifacts.
✅ K09 cre_ids cleared to [] and the note field removed — perfectly in line with the recommendation from the PR #877 review.

The PR looks good from a data integrity standpoint. Thanks for following up on this promptly! 🐇

If you found this review helpful, would you consider giving us a shout-out on X?

Thank you for using CodeRabbit!

Bornunique911 · 2026-06-14T19:22:40Z

@coderabbitai review

Do a final review and make sure that it fixes the PR review comment made by you for keeping a note for K09 and K01 mapping #877.

coderabbitai · 2026-06-14T19:23:14Z

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the current state of the K09 entry in the JSON file
cat application/utils/external_project_parsers/data/owasp_kubernetes_top10_2022.json | python3 -c "
import json, sys
data = json.load(sys.stdin)
for entry in data:
    if entry.get('section_id') in ('K01', 'K09'):
        print(json.dumps(entry, indent=2))
"

Length of output: 823

@Bornunique911, the final verification confirms everything is properly fixed. Here's the current state:

Entry	`section`	`cre_ids`	`note` field
K01	Insecure Workload Configurations	`["233-748", "486-813"]`	—
K09	Misconfigured Cluster Components	`[]` ✅	removed ✅

✅ All items from the PR `#877` review are resolved:

K09 cre_ids cleared to [] — the duplicated IDs that matched K01 are gone, exactly as recommended.
note field removed — no longer documenting an unverified duplication as "intentional."
import_telemetry.json removed — the accidentally committed artifact has been cleaned up.

The PR is in good shape and ready for maintainer merge. 🐇

Bornunique911 · 2026-06-15T08:30:11Z

@Pa04rth Please review it once. It fixes the issue. #916

coderabbitai Bot reviewed Jun 11, 2026

View reviewed changes

SurbhiAgarwal1 added 2 commits June 11, 2026 22:51

Document intentional duplication of CRE IDs for K09 (add note field)

02f1f83

Apply fixes: add type annotations, enable foreign keys, backup verifi…

cb53f13

…cation, update OWASP 2025 hyperlinks, and other quality improvements

SurbhiAgarwal1 force-pushed the research-k09-mapping branch from d34c31d to cb53f13 Compare June 11, 2026 17:22

Fix K09 cre_ids as per CodeRabbit review

abddceb

Remove accidentally committed import_telemetry.json artifact

e16cf63

Bornunique911 mentioned this pull request Jun 14, 2026

Add refresh scripts for OWASP resources for issue 471 #877

Open

	def parse(self, cache: db.Node_collection, ph: prompt_client.PromptHandler):
	def parse(self, cache: db.Node_collection, ph: prompt_client.PromptHandler) -> ParseResult:

Conversation

SurbhiAgarwal1 commented Jun 11, 2026

Title

Summary

What was changed

Why this change is needed

Testing

Additional notes

Uh oh!

coderabbitai Bot commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Walkthrough

Changes

❌ Failed checks (2 warnings, 1 inconclusive)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Bornunique911 commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai Bot commented Jun 12, 2026

📌 Suggestion still applicable to this PR

✅ Already addressed (no action needed for these)

Uh oh!

coderabbitai Bot commented Jun 12, 2026

📌 Action needed for @SurbhiAgarwal1

Uh oh!

Bornunique911 commented Jun 14, 2026

Uh oh!

coderabbitai Bot commented Jun 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

SurbhiAgarwal1 commented Jun 14, 2026

Uh oh!

coderabbitai Bot commented Jun 14, 2026

Uh oh!

Bornunique911 commented Jun 14, 2026

Uh oh!

coderabbitai Bot commented Jun 14, 2026

✅ All items from the PR #877 review are resolved:

Uh oh!

Bornunique911 commented Jun 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

coderabbitai Bot commented Jun 11, 2026 •

edited

Loading

Bornunique911 commented Jun 12, 2026 •

edited

Loading

📌 Action needed for `@SurbhiAgarwal1`

coderabbitai Bot commented Jun 14, 2026 •

edited

Loading

✅ All items from the PR `#877` review are resolved: