387 changes: 387 additions & 0 deletions rfds/RFD_Redifining_a_Reference.md
@@ -0,0 +1,387 @@
# RFD: Redefining a Reference

- **Status:** Draft
- **Authors:** Jay Jacobs
- **Created:** 2026-02-24
- **Updated:** 2026-02-24
- **Target:** CVE Record Format v6.x (or later)
- **Related:** Discuss Forum RFD (PR \#462)
- **Affected Section:** References

## Summary

This RFD proposes improving the CVE Record `references` section (in both CNA and ADP containers) so that references are not merely unlabeled URLs but machine-usable, typed, and contextualized external anchors. The current structure requires only a `url`, with optional `name` and optional `tags`; in practice this often results in lists of raw pointers whose purpose, authority, and content type are unclear to consumers. This proposal explores a replacement/extension of the current reference object to include structured fields such as `type`, `title`, `publisher`, retrieval/publication timestamps, `media_type`, `archive_urls`, and `status`, while maintaining compatibility paths from the current `url` / `name` / `tags` model.

## Problem Statement

The current `references` object in the CVE Record Format is minimally constrained and not sufficiently expressive for either human understanding or machine use. It currently supports three fields:

- `url` (required)
- `name` (optional free text)
- `tags` (optional array; includes defined enums and custom values via `x_*`)

In practice, references in CVEs are often reduced to a list of unlabeled URLs. When only `url` is present, the reference is a pointer to unknown content with no structured indication of what the consumer should expect (vendor advisory, technical analysis, patch, issue tracker, etc.), whether the link is authoritative, whether it is still reachable, or whether archival alternatives exist.

This creates problems in at least three categories:

1. **Data quality deficiency**: lack of contextualized reference metadata weakens the usefulness and interpretability of references.
2. **Schema expressiveness deficiency**: the schema cannot capture common, practical distinctions that consumers care about.
3. **Consumer interoperability deficiency**: consumers cannot reliably automate workflows based on references because meaning is hidden in free text, tags, or external page content.

### Evidence of current limitations (over the past 12 months as of 2026-02-23)

- 71% of URLs (92,736 / 130,906) did not include a **name** value.
- 70% of CVEs (34,857 / 49,722) did not include name values for any references.
- 50% of URLs (65,475 / 130,906) did not include a **tag** value.
- 48% of CVEs (23,914 / 49,722) did not include tag values for any references.
- 55% of tags used (786,038 / 1,428,322) used x\_\* custom tags, indicating high variance and weak standardization.
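
The kind of measurement behind the figures above can be sketched as follows. This is a minimal illustration, not the script used to produce those numbers: it assumes each record has been flattened into a dict with a `references` list (in real CVE records the list sits inside the CNA and ADP containers), and the function name `reference_coverage` is hypothetical.

```python
from typing import Any


def _share(part: int, whole: int) -> float:
    """Return part/whole, or 0.0 when there is nothing to divide by."""
    return part / whole if whole else 0.0


def reference_coverage(records: list[dict[str, Any]]) -> dict[str, float]:
    """Share of URLs, and of records, whose references lack `name` or `tags`.

    Assumption: each record is a dict with a flattened `references` list.
    """
    urls = urls_no_name = urls_no_tags = 0
    recs_no_name = recs_no_tags = 0
    for record in records:
        refs = record.get("references", [])
        urls += len(refs)
        urls_no_name += sum(1 for ref in refs if not ref.get("name"))
        urls_no_tags += sum(1 for ref in refs if not ref.get("tags"))
        # A record is counted only when none of its references has the field.
        if refs and all(not ref.get("name") for ref in refs):
            recs_no_name += 1
        if refs and all(not ref.get("tags") for ref in refs):
            recs_no_tags += 1
    return {
        "urls_without_name": _share(urls_no_name, urls),
        "urls_without_tags": _share(urls_no_tags, urls),
        "records_without_any_name": _share(recs_no_name, len(records)),
        "records_without_any_tags": _share(recs_no_tags, len(records)),
    }
```

Counting per URL and per record separately matters because a record with one named reference among many unnamed ones affects the two ratios differently.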

### Who is affected

- **CNA / ADP producers**: can insert references without context, which pushes the burden onto consumers. CNAs that want to identify and differentiate the authoritative material they produce struggle to separate the important references from merely adjacent ones (e.g., a reference to issue-level details vs. a link to a product page vs. a link to a general article that mentions the CVE as an example).
- **CVE consumers** (tool vendors, VM teams, researchers, data platforms): must manually inspect URLs or rely on heuristics to infer meaning and importance.
- **UI builders and automation workflows**: cannot reliably prioritize or label references (e.g., “show vendor advisory first”, “collect patch links”, “attempt CSAF ingestion”).

### What happens if we do nothing?

If no changes are made, `references` will continue to function primarily as a weak pointer list with inconsistent tagging. Consumers will continue to spend manual effort following links to discover their purpose, and CVE records will continue to under-deliver on a key coordination function: structured external anchors that bolster confidence and support downstream automation.

## Proposed Solution

This RFD proposes a structured replacement/extension for items in the `references` array in the **CNA** and **ADP** containers. The intent is to preserve the core role of references while making them more machine-usable and more informative for humans.

### Design goals

1. **Contextualize references** so consumers know what kind of content a URL points to, when it was last active, how to retrieve it (media type), etc.
2. **Improve machine usability** for reference selection, retrieval, filtering, prioritization, and general information retrieval automation.
3. **Improve data quality checks** by making reference purpose and status explicit.
4. **Fit within the existing CVE record model**, with everything contained in a single references section as an array of reference objects.
5. **Support operational reality** where some metadata is unknown or difficult to determine.

## Proposed reference object fields

The following fields are proposed for discussion as a new structured reference object shape.

#### Core identity and content location

- `url` (string \- required)
  - Single URL field
- `type` (enum \- required, draft list below)
  - Structured classification of the referenced resource (replacing overreliance on free-text names and tags).
- `title` (string \- optional, free text)
  - Human-readable label/title for the referenced content.

#### Publisher metadata

- `publisher` (object)
  - `name` (string) \- needs a clear definition: is it the host/domain, the org that authored the content, the org that published the page, or the author's name?
  - `role` (enum; draft values are `authoritative`, `non_authoritative`)
  - `domain` (string)

#### Timing metadata

- `published_at` (datetime, optional) **OR** `first_retrieved_at` (datetime, optional)
  - At least one of these should be supported and available when possible.
- `last_retrieved_at` (datetime)
  - Intended to be required in the proposed model (subject to QWG discussion).
  - May be set and refreshed by the CVE Program / Secretariat over time.

#### Content and availability metadata

- `media_type` (string)
  - Ideally inferred and populated automatically where feasible, with support for human correction/override.
- `archive_urls` (array of string URLs)
  - Intended to be populated by the CVE Program archival process (details out of scope for this RFD).
- `status` (enum; initial draft: `reachable`, `moved`, `archived_only`)
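
To make the field list above concrete, here is one possible typed sketch of the proposed reference object. It is illustrative only: the required set is still unresolved, so this sketch marks just `url` and `type` as required, and it assumes datetimes travel as ISO 8601 strings as in the JSON examples. The class names are hypothetical.

```python
from typing import Literal, TypedDict


class Publisher(TypedDict, total=False):
    name: str  # exact meaning (host, author, publishing org) is still open
    role: Literal["authoritative", "non_authoritative"]
    domain: str


class _RequiredReferenceFields(TypedDict):
    url: str
    type: str  # one value from the draft `type` vocabulary


class Reference(_RequiredReferenceFields, total=False):
    title: str
    publisher: Publisher
    # Assumption: datetimes are ISO 8601 strings, e.g. "2026-02-24T18:00:00Z".
    published_at: str
    first_retrieved_at: str
    # "Intended to be required" in the draft; kept optional here pending QWG
    # discussion. The JSON examples in this RFD spell it `retrieved_at`.
    last_retrieved_at: str
    media_type: str
    archive_urls: list[str]
    status: Literal["reachable", "moved", "archived_only"]


example: Reference = {
    "url": "https://vendor.example/advisories/abc-2026",
    "type": "vendor-advisory",
    "publisher": {"role": "authoritative", "domain": "vendor.example"},
    "status": "reachable",
}
```

Splitting required and optional keys across two classes keeps the sketch usable on Python 3.9+, where `typing.NotRequired` is not yet available.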

## Required fields (intentionally unresolved for discussion)

This initial draft does **not** fix the final required field set. The RFD should invite QWG discussion on the value and burden of each field before finalizing minimum requirements.

## `type` vocabulary (draft for discussion)

This RFD proposes introducing a single explicit `type` field for references. The starting point for discussion includes current tag semantics and possible consolidation. The submitter specifically recommends:

- no `x_*` extensions for `type` until presentation can be separated from storage and there is a version of the data where all `x_*` values are stripped.
- include a single `other` value instead, to denote that the reference was evaluated and deemed not to match any of the defined types.

Draft values for discussion (mix of current and possible future simplifications):

- broken-link
- customer-entitlement
- exploit
- government-resource
- issue-tracking
- mailing-list
- mitigation
- not-applicable
- patch
- permissions-required
- media-coverage
- product
- related
- release-notes
- signature
- technical-description
- third-party-advisory
- vendor-advisory
- vdb-entry
- other
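
A closed vocabulary without `x_*` extensions could be enforced along these lines. This is a sketch under the assumption that the draft list above is taken as-is; the final vocabulary is still open for discussion, and `validate_type` is a hypothetical helper name.

```python
# Draft `type` values copied from the list above; subject to change.
DRAFT_TYPES = frozenset({
    "broken-link", "customer-entitlement", "exploit", "government-resource",
    "issue-tracking", "mailing-list", "mitigation", "not-applicable", "patch",
    "permissions-required", "media-coverage", "product", "related",
    "release-notes", "signature", "technical-description",
    "third-party-advisory", "vendor-advisory", "vdb-entry", "other",
})


def validate_type(value: str) -> str:
    """Return `value` if it is a draft type; raise ValueError otherwise."""
    if value.startswith("x_"):
        # The submitter recommends no `x_*` extensions for `type`.
        raise ValueError(f"extension values are not allowed for type: {value!r}")
    if value not in DRAFT_TYPES:
        raise ValueError(f"unknown reference type: {value!r}")
    return value
```

Under this sketch a producer who evaluated a reference and found no fitting value would submit `other` rather than inventing a custom value.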

Alternative consolidation candidates to discuss:

- advisory \- (note: removing the “third-party” vs “vendor” distinction)
- technical\_analysis
- commit
- issue\_tracker
- bulletin
- vendor\_notice
- blog\_post
- exploit\_writeup
- forum\_post (discussion?)

And specific doc types to consider:

- csaf
- cvrf
- vulnerability\_record \- (e.g., osv/ghsa/gcve \- should we have a way to explicitly reference a structured data resource from another registry?)
- *(there are definitely more)*

### Relationship to existing `tags`

This RFD does not yet finalize the fate of `tags`. It should explicitly discuss:

- whether `tags` are retained for legacy purposes, slowly deprecated, or immediately replaced
- how the existing tags map to `type`, `status`, or other structured fields
- whether some existing tags reflect semantics better represented elsewhere
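
One possible mapping from the current `url` / `name` / `tags` shape to the proposed shape is sketched below. The mapping rules are assumptions made for illustration, since this RFD has not decided them: `name` becomes `title`, the first tag that is also a known `type` value becomes `type`, and the original `tags` are carried along unchanged. `migrate_reference` and `known_types` are hypothetical names.

```python
from typing import Any


def migrate_reference(
    legacy: dict[str, Any], known_types: frozenset[str]
) -> dict[str, Any]:
    """Map a current-style reference onto the proposed shape (illustrative)."""
    migrated: dict[str, Any] = {"url": legacy["url"]}
    if legacy.get("name"):
        migrated["title"] = legacy["name"]
    tags = list(legacy.get("tags", []))
    # Use the first tag that is also a known type; `x_*` tags never match.
    matched = next((tag for tag in tags if tag in known_types), None)
    if matched is not None:
        migrated["type"] = matched
    # `type` is left unset rather than defaulted to `other`, because the draft
    # reserves `other` for references that were actually evaluated.
    if tags:
        migrated["tags"] = tags  # retained so no information is lost
    return migrated
```

Leaving `type` unset for untagged references keeps "not yet classified" distinguishable from "classified as `other`", which a migration would otherwise blur.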

## Examples

The examples below are illustrative and intended to support QWG discussion of structure and semantics.

### Example 1: Current vs proposed (minimal contextualization)

#### Current style (today)

```json
{
  "references": [
    {
      "url": "https://vendor.example/advisories/abc-2026"
    }
  ]
}
```

#### Proposed style (draft minimal)

```json
{
  "references": [
    {
      "url": "https://vendor.example/advisories/abc-2026",
      "type": "advisory",
      "title": "Acme Advisory ABC-2026",
      "publisher": {
        "name": "Example Vendor",
        "role": "authoritative",
        "domain": "vendor.example"
      },
      "retrieved_at": "2026-02-24T18:00:00Z",
      "status": "reachable"
    }
  ]
}
```

### Example 2: Third-party technical analysis blog

```json
{
  "references": [
    {
      "url": "https://research.example/blog/deep-analysis-of-cve-2026-9999",
      "type": "technical-description",
      "title": "Deep Analysis of CVE-2026-9999",
      "publisher": {
        "name": "Research Example",
        "role": "non_authoritative",
        "domain": "research.example"
      },
      "published_at": "2026-01-12T09:30:00Z",
      "retrieved_at": "2026-02-24T18:00:00Z",
      "media_type": "text/html",
      "status": "reachable"
    }
  ]
}
```

### Example 3: Git commit / patch

```json
{
  "references": [
    {
      "url": "https://github.com/example/project/commit/abcdef123456",
      "type": "patch",
      "title": "Fix bounds check in parser",
      "publisher": {
        "name": "example/project",
        "role": "authoritative",
        "domain": "github.com"
      },
      "retrieved_at": "2026-02-24T18:00:00Z",
      "media_type": "text/html",
      "status": "reachable"
    }
  ]
}
```

### Example 4: Removed URL with archive URL

```json
{
  "references": [
    {
      "url": "https://nonexistant.example/security/advisory-123",
      "type": "vendor-advisory",
      "title": "Security Advisory 123",
      "publisher": {
        "name": "Old Example Vendor",
        "role": "authoritative",
        "domain": "old.example"
      },
      "first_retrieved_at": "2024-03-01T00:00:00Z",
      "retrieved_at": "2026-02-24T18:00:00Z",
      "status": "archived_only",
      "archive_urls": [
        "https://archive.example/snapshots/old-example-advisory-123"
      ]
    }
  ]
}
```
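
A consumer could use `status` and `archive_urls` from an entry like Example 4 to decide which URLs to try. The sketch below is one hedged interpretation, not defined behavior: it assumes that `archived_only` means the original URL should be skipped, and that an absent `status` means "unspecified" rather than "unreachable". `fetch_candidates` is a hypothetical helper.

```python
from typing import Any


def fetch_candidates(ref: dict[str, Any]) -> list[str]:
    """Return URLs to try, in order, for retrieving a reference's content."""
    archives = list(ref.get("archive_urls", []))
    if ref.get("status") == "archived_only":
        # Assumption: the original URL is known to be gone, so skip it.
        return archives
    # Absent or other status: try the original first, then any archives.
    return [ref["url"], *archives]
```

Applied to Example 4, this returns only the archive snapshot URL; applied to a reference that has nothing but a `url`, it returns that URL alone.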

## Impact Assessment

### Expected benefits

- **Improved consumer usability**: explicit `type` and `status` support filtering, prioritization, and automation.
- **Better UI/UX**: references can be labeled consistently in tools and portals.
- **Improved data quality checks**: records with only low-value or non-authoritative references can be detected more easily.
- **Better archival resilience**: structured `status` and `archive_urls` support long-term utility of records.
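
As an illustration of the filtering and prioritization benefit, the sketch below orders references so that vendor advisories come first and collects patch links. The priority order in `TYPE_PRIORITY` is an assumption chosen for the example, not something this RFD prescribes, and both function names are hypothetical.

```python
from typing import Any

# Hypothetical display order; unlisted or missing types sort last.
TYPE_PRIORITY = (
    "vendor-advisory",
    "patch",
    "third-party-advisory",
    "technical-description",
)


def prioritize(refs: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Sort references by type priority, authoritative publishers first."""
    def sort_key(ref: dict[str, Any]) -> tuple[int, bool]:
        ref_type = ref.get("type")
        if ref_type in TYPE_PRIORITY:
            rank = TYPE_PRIORITY.index(ref_type)
        else:
            rank = len(TYPE_PRIORITY)
        role = ref.get("publisher", {}).get("role")
        # False sorts before True, so authoritative entries lead within a type.
        return (rank, role != "authoritative")

    # sorted() is stable, so ties keep their original record order.
    return sorted(refs, key=sort_key)


def patch_links(refs: list[dict[str, Any]]) -> list[str]:
    """Collect the URLs of all references typed as `patch`."""
    return [ref["url"] for ref in refs if ref.get("type") == "patch"]
```

With today's `url`-only references, neither function could do anything useful without fetching and inspecting each page.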

### Risks / costs (to be refined during QWG discussion)

- **Increased producer burden** if too many fields are required at publication time.
- **Inconsistent classification** for `type` and `publisher.role` without clear guidance. Validation might be enhanced with LLMs and autoclassification approaches.
- **False precision** in dates (`published_at`) or `media_type` if populated heuristically (e.g., “I think this was published last May” \== “2024-05-01T00:00:00Z”).
- **Implementation complexity** if CVE Services automates status checks, archive handling, or media type inference.

## Compatibility and Migration

This section is intentionally incomplete in this initial draft and should be completed based on QWG discussion.

## Success Metrics

To be finalized in a later revision of this RFD.

### Proposed evaluation timeline

To be finalized in a later revision of this RFD.

### Candidate success criteria

To be finalized in a later revision of this RFD.

## Supporting Data or Research

Current `references` usage is substantially incomplete and inconsistent (missing `name`, missing `tags`, heavy `x_*` tag usage) across the 49,722 CVE records and 130,906 URLs sampled over the past 12 months; see "Evidence of current limitations" above.

### What the CNA Rules say about references

[**CNA Operational Rules v4.1.0**](https://www.cve.org/Resources/Roles/Cnas/CNA_Rules_v4.1.0.pdf)

**A CVE Record must include at least one public reference**

* CNA Rules §5.1.10 says a CVE Record MUST contain at least one public reference, and that reference MUST NOT be the CVE Record itself.

**The public reference must exist before or at publication time**

* CNA Rules §5.3.1 says CNAs MUST ensure a public reference exists on the internet before or concurrently with publication of the CVE Record. It also states a CVE Record MUST NOT be the first public disclosure of the vulnerability.

**If there are multiple public references, include the most freely available one**

* CNA Rules §5.3.1.2 says if multiple public references exist, CNAs MUST include the most freely available public reference in the CVE Record.

**CNAs should think about long-term availability / archival**

* CNA Rules §5.3.2 says CNAs SHOULD consider long-term availability of public references, including archival services (e.g., Internet Archive) or other mechanisms determined by the CVE Program.

**At least one reference must meet quality/access criteria**

* CNA Rules §5.3.3 sets the following criteria for at least one public reference, which:
  * SHOULD NOT require registration/login (§5.3.3.1)
  * SHOULD NOT impose restrictive terms that conflict with CVE Program use (§5.3.3.2)
  * MUST contain info about the specific vulnerability (§5.3.3.3)
  * MUST provide minimum information supporting the CVE Record content (§5.3.3.4)
  * MUST NOT be the CVE Record itself (§5.3.3.5)

### What the schema says about references (format / fields / limits)

**“references” is required in the CNA container**

* In the CVE schema docs, the CNA container’s references field is marked Required and is an array. It must contain at least 1 item and at most 512 items, and items must be unique.

**Each reference item is a constrained object**

* Each reference entry is an object with No Additional Properties (i.e., schema-constrained shape).

**“url” is the only required field in each reference item**

* Within a reference item, url is Required; name and tags are optional. The schema describes url as the URL used to retrieve the referenced resource.

**“name” is optional and user-created**

* The schema defines optional name as a user-created name for the reference, often the page title.

**“tags” are optional, but if present they are structured**

* If tags is present, it is an array with min 1 item, unique values, and items must be either:
  * a standard enum from reference-tags.json, or
  * an extension tag (tagExtension, i.e., x\_...)

The schema docs also enumerate the standard reference tags (e.g., vendor-advisory, patch, issue-tracking, technical-description, vdb-entry, etc.).
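
The constraints summarized in this section can be restated as a small checker. This is a sketch of the rules as described above, not a substitute for validating against the published JSON Schema; `STANDARD_TAGS` is deliberately partial (only the tags named in this section), and `validate_references` is a hypothetical helper.

```python
from typing import Any

ALLOWED_KEYS = {"url", "name", "tags"}
# Partial list: only the standard tags named in this section.
STANDARD_TAGS = frozenset({
    "vendor-advisory", "patch", "issue-tracking",
    "technical-description", "vdb-entry",
})


def validate_references(refs: list[dict[str, Any]]) -> list[str]:
    """Return a list of violations of the current reference constraints."""
    errors: list[str] = []
    if not 1 <= len(refs) <= 512:
        errors.append("references: must contain between 1 and 512 items")
    seen: list[dict[str, Any]] = []
    for i, ref in enumerate(refs):
        if ref in seen:
            errors.append(f"references[{i}]: duplicate item")
        seen.append(ref)
        extra = sorted(set(ref) - ALLOWED_KEYS)
        if extra:
            errors.append(f"references[{i}]: additional properties {extra}")
        if "url" not in ref:
            errors.append(f"references[{i}]: url is required")
        if "tags" in ref:
            tags = ref["tags"]
            if len(tags) < 1:
                errors.append(f"references[{i}]: tags must not be empty")
            if len(set(tags)) != len(tags):
                errors.append(f"references[{i}]: tags must be unique")
            for tag in tags:
                if tag not in STANDARD_TAGS and not tag.startswith("x_"):
                    errors.append(f"references[{i}]: unknown tag {tag!r}")
    return errors
```

Note that every check here is structural: nothing in the current constraints says what a reference is for, which is the gap this RFD targets.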

## Related Issues or Proposals

- Pull Request \#462 (add a field to the reference type):
[https://github.com/CVEProject/cve-schema/pull/462](https://github.com/CVEProject/cve-schema/pull/462)

## Unresolved Questions

1. What should consumers infer when metadata fields are absent (e.g., absent `status` \= unspecified, not unreachable)?
2. Does `retrieved_at` reflect CNA/ADP retrieval, registry retrieval, or either (with provenance implied by container/source)?
3. What is the final required field set for a structured reference object?
4. What should the final `type` vocabulary be, and should it consolidate current tag semantics?
5. Should existing `tags` be retained, deprecated, or replaced?
6. What exactly does `publisher` represent (host vs author vs publisher org)?
7. How much automated enrichment is feasible without creating operational burden?
   - Who sets and maintains `retrieved_at`, `status`, `archive_urls`, and `media_type` (CNA/ADP vs CVE Program)?
8. How should backward compatibility be handled in schema and CVE Services?
## Future Possibilities

This RFD is intentionally scoped as a targeted improvement to the CVE Record `references` section. Future work may explore:

- Stronger normalization of external resource types and publisher roles
- More explicit relationships between references and specific CVE record assertions (analyzed\_by, described\_by, announced\_by, fixed\_by (for commits/patch refs), mentions, disputes, etc.)
- Richer provenance for reference metadata changes
- Cross-field consistency checks between reference types and other record content (e.g., affectedness, remediation, vendor metadata)
  - There is a reference for a patch, but the record does not have patch information or claims there is no patch.
  - Reference claims their company is “authoritative” but the record has no mention of the company (vendor).
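
The first cross-field check above might look like the sketch below. It rests on two assumptions that this RFD does not settle: that references carry the proposed `type` field, and that the record's patch information can be approximated by a non-empty `solutions` entry in the same container. `patch_reference_without_remediation` is a hypothetical name.

```python
from typing import Any


def patch_reference_without_remediation(container: dict[str, Any]) -> bool:
    """Flag a container that cites a patch but records no remediation.

    Assumption: remediation lives in a `solutions` list in the container.
    """
    refs = container.get("references", [])
    has_patch_ref = any(ref.get("type") == "patch" for ref in refs)
    has_remediation = bool(container.get("solutions"))
    return has_patch_ref and not has_remediation
```

A check like this would only flag records for review; deciding whether the reference or the remediation data is wrong still needs a human or a richer rule.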