Skip to content

Commit 04ee03f

Browse files
committed
docs: add publisher enrichment prioritization plan
1 parent e2f9dec commit 04ee03f

1 file changed

Lines changed: 394 additions & 0 deletions

File tree

Lines changed: 394 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,394 @@
1+
# Publisher Full-Scope Enrichment Prioritization Plan
2+
3+
**Date:** 2026-03-12
4+
**Author:** Codex (GPT-5)
5+
**Purpose:** define the recommended one-by-one execution order for full-scope publisher enrichment across the current `OSHConnect-Python` public-data fleet.
6+
**Scope:** `nws`, `ndbc`, `coops`, `aviation_wx`, `opensky`, `iss`, `usgs_water`, `usgs_nims`, `usgs_eq`
7+
**Relationship to prior analysis:** this plan operationalizes the findings in `All_Bootstraps_Full_Scope_Gap_Analysis_2026-03-12.md` and its appendices.
8+
9+
---
10+
11+
## 1. Executive Decision
12+
13+
Do not enrich the publishers in alphabetical order, by repo age, or only by which one looks weakest.
14+
15+
The recommended sequence is:
16+
17+
1. `ISS`
18+
2. `USGS Water`
19+
3. `NWS`
20+
4. `USGS EQ`
21+
5. `OpenSky`
22+
6. `NDBC`
23+
7. `USGS NIMS`
24+
8. `CO-OPS`
25+
9. `Aviation WX`
26+
27+
This is not a pure maturity ranking. It is a `rework-minimizing, semantics-first execution order` designed to:
28+
29+
- close the biggest current-fleet contradiction first;
30+
- resolve dependency roots before dependent publishers;
31+
- establish canonical templates before catch-up work;
32+
- convert the strongest current ideas into reusable patterns early;
33+
- delay the lowest-leverage catch-up work until the family conventions are already stable.
34+
35+
---
36+
37+
## 2. Prioritization Logic
38+
39+
The order above was chosen using six criteria.
40+
41+
### 2.1 Canonical completeness
42+
43+
If a publisher slot is supposed to exist in the active fleet but is materially incomplete, that gap should be closed early because it corrupts the credibility of the whole corpus.
44+
45+
This is why `ISS` is first.
46+
47+
### 2.2 Dependency roots
48+
49+
If one publisher depends on another publisher's systems, semantics, or sidecars, the upstream publisher should usually be enriched first.
50+
51+
This is why `USGS Water` must precede `USGS NIMS`.
52+
53+
### 2.3 Pattern-template leverage
54+
55+
The first few enrichments should define reusable family standards:
56+
57+
- canonical station-family enrichment expectations;
58+
- canonical Pattern C feed-adapter expectations;
59+
- canonical Pattern A companion-datastream expectations.
60+
61+
This is why `NWS`, `USGS EQ`, and `OpenSky` are early, and why `CO-OPS` and `Aviation WX` are late.
62+
63+
### 2.4 Artifact-state risk
64+
65+
Publishers whose current artifact state is contradictory, misleading, or absent should be addressed before publishers that are already materially present and inspectable.
66+
67+
This is why `ISS` and `USGS Water` are ahead of already-packaged publishers like `USGS NIMS` and `USGS EQ`.
68+
69+
### 2.5 Marginal fleet value
70+
71+
An enrichment should be prioritized earlier if finishing it gives the rest of the fleet a better template, stronger test discipline, or clearer vocabulary.
72+
73+
This is why `NWS` is ahead of `CO-OPS` and `Aviation WX`, and why `USGS EQ` is ahead of a second-pass refinement on already-strong `USGS NIMS`.
74+
75+
### 2.6 Catch-up should come after template stabilization
76+
77+
The thinnest publishers should not be first if their enrichment would otherwise be designed in a vacuum.
78+
79+
This is why `CO-OPS` and especially `Aviation WX` are late-phase work rather than first-phase work.
80+
81+
---
82+
83+
## 3. Phase 0 Guardrails
84+
85+
Before the first publisher enrichment begins, define these fleet-level guardrails once and then reuse them for every publisher:
86+
87+
1. a standard `full-scope package contract`
88+
- required folders
89+
- required manifests
90+
- required worked examples
91+
- required patch candidates
92+
- required source-corpus notes
93+
2. a standard `semantic acceptance checklist`
94+
- procedure
95+
- system
96+
- datastream
97+
- deployment
98+
- feature-of-interest
99+
- provenance
100+
- units and null semantics
101+
3. a standard `round-trip verification checklist`
102+
- POST
103+
- GET back
104+
- SensorML inspection
105+
- result-schema inspection
106+
- observation write-path verification
107+
4. a standard `runtime hardening checklist`
108+
- TLS verification
109+
- auth handling
110+
- retry and dedupe policy
111+
- logging and observability
112+
5. a standard `artifact state label`
113+
- `metadata pack`
114+
- `total pack`
115+
- `source basis`
116+
- `historical artifact`
117+
- `migration artifact`
118+
119+
This preflight is important because otherwise the first three enrichments will each reinvent package shape and acceptance criteria.
120+
121+
---
122+
123+
## 4. Ordered Publisher Plan
124+
125+
| Rank | Publisher | Why This Position | Primary Outcome | Main Dependency / Leverage |
126+
|---|---|---|---|---|
127+
| 1 | ISS | The active fleet is currently incomplete; the bootstrap slot is missing even though the runtime and README imply it exists. | Migrate ISS into a canonical current bootstrap and package. | Removes the single clearest architecture contradiction in the fleet. |
128+
| 2 | USGS Water | It is a dependency root for NIMS and currently has an artifact-state mismatch. | Materialize the missing total package and make water the canonical USGS station reference. | Unblocks clean NIMS follow-on work and repairs repo-state trust. |
129+
| 3 | NWS | It is the best place to formalize station-family round-trip SensorML acceptance criteria. | Convert strong pack work into a canonical live-plus-package station reference. | Sets the first real station-family enrichment template. |
130+
| 4 | USGS EQ | It is already one of the strongest full packages and can define the canonical Pattern C total-pack standard. | Harden Pattern C lifecycle, provenance, and event semantics. | Establishes a reusable feed-adapter template. |
131+
| 5 | OpenSky | It is the other major Pattern C publisher and already has strong metadata foundations. | Graduate OpenSky from metadata-pack maturity to total-pack maturity. | Reuses the Pattern C template established in USGS EQ. |
132+
| 6 | NDBC | It is the richest station-family multi-stream case and should become the canonical station-plus-imagery reference. | Convert NDBC into the canonical multi-stream station publisher package. | Reuses station-family conventions from NWS and informs imagery semantics for later work. |
133+
| 7 | USGS NIMS | It is already strong, but its dependency semantics should be refined only after water and family templates are stable. | Formalize companion-datastream dependency semantics and policy constraints. | Depends on clean USGS Water semantics and benefits from earlier imagery-pattern work. |
134+
| 8 | CO-OPS | It lacks a pack, but its enrichment should borrow already-proven station-family structure rather than invent its own. | Create a full package and sharpen coastal product semantics. | Reuses station-family scaffolding from NWS and NDBC. |
135+
| 9 | Aviation WX | It is the thinnest current publisher, but also the one with the least leverage on the rest of the fleet. | Bring Aviation WX up to full-package parity without using it as the template setter. | Best done after station-family conventions are already frozen. |
136+
137+
---
138+
139+
## 5. Recommended Waves
140+
141+
### Wave 1. Canonical blockers and dependency roots
142+
143+
1. `ISS`
144+
2. `USGS Water`
145+
3. `NWS`
146+
147+
This wave fixes the biggest fleet contradiction, repairs the most important current artifact mismatch, and establishes the first canonical station-family enrichment method.
148+
149+
### Wave 2. Canonical family references
150+
151+
4. `USGS EQ`
152+
5. `OpenSky`
153+
6. `NDBC`
154+
7. `USGS NIMS`
155+
156+
This wave completes the strongest reusable family references:
157+
158+
- Pattern C total-package reference;
159+
- second Pattern C implementation;
160+
- multi-stream station-family reference;
161+
- Pattern A companion-datastream reference.
162+
163+
### Wave 3. Catch-up and parity
164+
165+
8. `CO-OPS`
166+
9. `Aviation WX`
167+
168+
This wave uses already-proven conventions to bring the remaining station-family publishers to full-package parity with much lower design risk.
169+
170+
---
171+
172+
## 6. Publisher-by-Publisher Intent
173+
174+
### 6.1 ISS
175+
176+
**Why first**
177+
178+
- The current fleet is semantically incomplete while ISS remains a runtime without a current bootstrap.
179+
- The migration path is unusually well defined because `scripts/bootstrap_iss.py` already exists as a strong precedent.
180+
- The project should not start a long enrichment program while one advertised active publisher is still missing its bootstrap artifact.
181+
182+
**What "full-scope enrichment" means here**
183+
184+
- migrate `bootstrap_iss.py` into `publishers/iss/`
185+
- modernize it to current helper-layer and env conventions
186+
- produce a current ISS package, not just a migrated script
187+
- preserve the dual-product model and rich SensorML
188+
- remove every legacy credential, TLS, and hardcoded-endpoint anti-pattern
189+
190+
**Exit condition**
191+
192+
ISS is no longer a special-case hole in the active fleet.
193+
194+
### 6.2 USGS Water
195+
196+
**Why second**
197+
198+
- `USGS NIMS` depends on it structurally
199+
- the current repo has a claimed total pack that is not actually present on disk
200+
- the water publisher is already semantically stronger than its artifact state suggests
201+
202+
**What "full-scope enrichment" means here**
203+
204+
- materialize the missing total package
205+
- reconcile research-note claims with actual on-disk artifacts
206+
- preserve and extend the strong parameter/statistic semantics
207+
- formalize datum, QC, null, and feature-of-interest semantics more explicitly
208+
209+
**Exit condition**
210+
211+
USGS Water becomes the canonical USGS station-family base that NIMS can depend on without ambiguity.
212+
213+
### 6.3 NWS
214+
215+
**Why third**
216+
217+
- NWS is the best place to institutionalize the lessons from the historical SensorML field-shape failure
218+
- it already has substantial adjacent metadata-pack work
219+
- it can become the station-family proof point for round-trip validation discipline
220+
221+
**What "full-scope enrichment" means here**
222+
223+
- converge the best metadata-pack content into a live canonical package
224+
- codify station-family semantic-contract conventions
225+
- add explicit round-trip verification expectations
226+
- deepen QC, null, and feature-of-interest semantics
227+
228+
**Exit condition**
229+
230+
NWS becomes the reference station-family enrichment against which later station publishers are judged.
231+
232+
### 6.4 USGS EQ
233+
234+
**Why fourth**
235+
236+
- it is already one of the strongest current total packages
237+
- it can define the canonical Pattern C total-package template with the least uncertainty
238+
- its lifecycle, provenance, and crosswalk semantics are unusually rich and reusable
239+
240+
**What "full-scope enrichment" means here**
241+
242+
- turn strong current package materials into the authoritative Pattern C template
243+
- deepen summary/detail/FDSN crosswalk semantics
244+
- formalize lifecycle, supersession, and quality semantics
245+
- define the feed-adapter artifact and acceptance pattern other Pattern C publishers should follow
246+
247+
**Exit condition**
248+
249+
USGS EQ becomes the canonical Pattern C reference package.
250+
251+
### 6.5 OpenSky
252+
253+
**Why fifth**
254+
255+
- OpenSky is already a strong Pattern C implementation, but it still trails USGS EQ in package maturity
256+
- once USGS EQ defines the full Pattern C standard, OpenSky can be upgraded without inventing a second incompatible model
257+
258+
**What "full-scope enrichment" means here**
259+
260+
- graduate from metadata pack to total package
261+
- formalize auth-aware and budget-aware operational semantics
262+
- deepen data quality, provenance, and coverage semantics
263+
- align its package shape to the Pattern C standard frozen in the previous step
264+
265+
**Exit condition**
266+
267+
The fleet has two aligned Pattern C references rather than one strong feed-adapter and one separate event-feed model.
268+
269+
### 6.6 NDBC
270+
271+
**Why sixth**
272+
273+
- NDBC is the richest current station-family multi-stream case
274+
- it benefits from earlier station-family and Pattern C decisions
275+
- it is the best place to formalize how imagery-related semantics coexist with a fixed-station observation model
276+
277+
**What "full-scope enrichment" means here**
278+
279+
- convert the existing metadata pack into a total-package-grade artifact
280+
- deepen the buoy-plus-imagery relationship model
281+
- sharpen QC, null, and provenance semantics
282+
- decide whether NDBC becomes the canonical "multi-stream station" reference
283+
284+
**Exit condition**
285+
286+
The station family now has a strong simple reference and a strong multi-stream reference.
287+
288+
### 6.7 USGS NIMS
289+
290+
**Why seventh**
291+
292+
- it is already strong and materially present
293+
- its next gains depend more on dependency clarification than on raw metadata creation
294+
- it benefits from earlier USGS Water and imagery-related lessons
295+
296+
**What "full-scope enrichment" means here**
297+
298+
- formalize the dependency contract on USGS Water systems
299+
- decide whether selected-camera-per-station is permanent policy or transitional implementation
300+
- deepen coverage, product, and asset provenance semantics
301+
- align Pattern A packaging to the conventions proven earlier
302+
303+
**Exit condition**
304+
305+
USGS NIMS becomes the canonical Pattern A reference with explicit dependency semantics.
306+
307+
### 6.8 CO-OPS
308+
309+
**Why eighth**
310+
311+
- it is a good publisher, but not a template setter
312+
- it currently lacks a pack
313+
- its enrichment can be done faster and more cleanly after station-family conventions are settled
314+
315+
**What "full-scope enrichment" means here**
316+
317+
- create a full package from scratch
318+
- sharpen product-family, datum, and coastal semantics
319+
- apply already-proven station-family packaging and validation discipline
320+
321+
**Exit condition**
322+
323+
CO-OPS reaches full-package parity without forcing the project to learn station-family lessons the hard way.
324+
325+
### 6.9 Aviation WX
326+
327+
**Why ninth**
328+
329+
- it is the thinnest current publisher
330+
- it has the least leverage on the rest of the fleet
331+
- it will benefit most from arriving after the station-family conventions and package contract are already stable
332+
333+
**What "full-scope enrichment" means here**
334+
335+
- create its first full package
336+
- deepen airport/system/procedure semantics
337+
- formalize aviation vocabulary and null/missing-value handling
338+
- bring it to parity without turning it into the family template
339+
340+
**Exit condition**
341+
342+
Aviation WX is no longer the semantic and artifact outlier of the current fleet.
343+
344+
---
345+
346+
## 7. Program-Level Checkpoints
347+
348+
Do not run the whole program as nine disconnected tasks. Use checkpoints.
349+
350+
### Checkpoint A: after ISS and USGS Water
351+
352+
Lock down:
353+
354+
- the standard package contract
355+
- the migration-artifact policy
356+
- the dependency-root policy
357+
358+
### Checkpoint B: after NWS and USGS EQ
359+
360+
Lock down:
361+
362+
- canonical station-family acceptance criteria
363+
- canonical Pattern C acceptance criteria
364+
- round-trip verification expectations
365+
366+
### Checkpoint C: after OpenSky, NDBC, and USGS NIMS
367+
368+
Lock down:
369+
370+
- final Pattern C conventions
371+
- final multi-stream station conventions
372+
- final Pattern A dependency conventions
373+
374+
### Checkpoint D: after CO-OPS and Aviation WX
375+
376+
Run a parity review to confirm the lagging station-family publishers now meet the same artifact and semantic baseline as the rest of the fleet.
377+
378+
---
379+
380+
## 8. Recommended Operating Rule
381+
382+
For this program, `done` should mean more than "a richer bootstrap file exists."
383+
384+
For each publisher, completion should require:
385+
386+
- a current bootstrap aligned to the target model
387+
- a materially present package on disk
388+
- corrected and verified provenance corpus
389+
- explicit semantic-contract notes
390+
- round-trip verification evidence
391+
- a runtime follow-on list if runtime work remains outside the package scope
392+
393+
If the project follows that rule, the enrichment program will produce a canonical fleet rather than a collection of better comments.
394+

0 commit comments

Comments
 (0)