# Best-of-Breed Publisher Template Report

Date: 2026-05-26

## Purpose

This report records the initial research recommendation for selecting existing OSHConnect-Python publisher implementations as exemplars for four upcoming data-source publishers. The goal is to identify the strongest current patterns for metadata richness, completeness, and accuracy before designing new publisher work.

## Executive Recommendation

Use a small family of templates rather than a single universal publisher template.

The strongest primary exemplar for any new event-feed or feed-adapter source is `publishers/usgs_eq`. It offers the best combination of data-model rigor, authoritative metadata, clear CSAPI modeling, explicit runtime semantics, and enrichment planning.

For station networks, imagery feeds, or strict server-compatibility work, use the complementary exemplars in the table below:

| New source shape | Best existing example | Primary reason |
| --- | --- | --- |
| Event feed, alert feed, or one API stream | `publishers/usgs_eq` | Best Pattern C feed-adapter model, rich metadata, official source documentation, explicit event revision dedupe. |
| Fixed station network or physical sensor fleet | `publishers/usgs_water` | Richest station-level model, sidecar station metadata, multiple datastreams per system, strong official provenance. |
| Imagery, media, or camera feed | `publishers/usgs_nims` | Best media-feed pattern, image URL modeling, duplicate suppression, and companion datastream behavior. |
| Strict CSAPI/SensorML compatibility | `publishers/aviation_wx` plus `publishers/bootstrap_helpers.py` | Best reference for strict parser constraints, GeoJSON stub separation, and SensorML PUT behavior. |

## Evaluation Criteria

The recommendation is based on these qualities:

- Metadata richness: official documentation links, SensorML bodies, identifiers, classifiers, contacts, documents, deployments, and result schemas.
- Completeness: bootstrap script, runtime publisher, config or sidecar data, clean/bootstrap modes, dry-run behavior, operational notes, and enrichment plan.
- Accuracy: source semantics grounded in authoritative upstream documentation, explicit field meanings, correct observation model, and avoidance of misleading CSAPI resource modeling.
- Runtime robustness: duplicate suppression, rate-limit handling, reconnect behavior, server compatibility workarounds, and stable datastream discovery.
- Extensibility: clear boundaries between baseline runtime, optional enrichment, and future UI/client use.

## Findings

### 1. USGS Earthquake Is the Best Event-Feed Exemplar

`publishers/usgs_eq` should be the default starting point for event feeds, alert feeds, and API streams where the source is not a fleet of physical stations.

Key strengths:

- Correct Pattern C model: one procedure, one feed-adapter system, one datastream, and deployment grouping.
- Avoids the common modeling mistake of creating one CSAPI system per event.
- Uses authoritative USGS earthquake source documentation and records optional enrichment surfaces.
- Publishes one CSAPI observation per earthquake event.
- Deduplicates by `(eventId, updatedTime)`, so revised events are republished while unchanged feed entries are skipped.
- Includes a complete bootstrap and data-model enrichment pack that documents source verification, omitted upstream fields, and the boundaries for future detail/FDSN enrichment.
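
A minimal sketch of the `(eventId, updatedTime)` dedupe rule, assuming each feed entry has already been flattened into a dict with hypothetical `event_id` and `updated` keys and that published pairs are kept in an in-memory set. The function name and entry shape are illustrative, not taken from `usgs_eq`.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical flattened entry: {"event_id": str, "updated": int (epoch ms), ...}.
DedupeKey = Tuple[str, int]


def select_new_or_revised(entries: Iterable[dict], seen: Set[DedupeKey]) -> List[dict]:
    """Return entries whose (event_id, updated) pair has not been published yet.

    `seen` is updated in place. A revised event carries a new `updated` value,
    so it forms a new key and is returned; an unchanged entry is skipped.
    Keys are marked at selection time here; a real publisher might defer this
    until the observation POST succeeds.
    """
    fresh = []
    for entry in entries:
        key = (entry["event_id"], entry["updated"])
        if key in seen:
            continue  # this exact revision was already published
        seen.add(key)
        fresh.append(entry)
    return fresh


if __name__ == "__main__":
    seen: Set[DedupeKey] = set()
    poll_1 = [{"event_id": "ev1", "updated": 100}, {"event_id": "ev2", "updated": 200}]
    poll_2 = [{"event_id": "ev1", "updated": 100}, {"event_id": "ev2", "updated": 250}]
    print([e["event_id"] for e in select_new_or_revised(poll_1, seen)])  # ['ev1', 'ev2']
    print([e["event_id"] for e in select_new_or_revised(poll_2, seen)])  # ['ev2']
```

Because the key includes the update timestamp, a revision produces a new pair and is republished. A long-running publisher would likely persist or prune this set rather than let it grow without bound.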

Use this pattern when a new data source is conceptually a live feed rather than a set of deployed sensors.

### 2. USGS Water Is the Best Station-Network Exemplar

`publishers/usgs_water` should be the primary model for fixed stations, physical assets, and parameterized sensor networks.

Key strengths:

- One CSAPI system per monitoring location.
- Multiple datastreams per station, with explicit parameter semantics.
- Rich station sidecar data in `stations.json`.
- Strong references to the official USGS Water Data OGC API documentation.
- SensorML captures station identifiers, classifiers, contacts, documents, characteristics, capabilities, and position.
- Runtime handles API keys, request delay, rate-limit backoff, station filtering, duplicate suppression, and datastream discovery quirks.
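
The station-network pattern can be sketched as a pure planning step that turns a sidecar into one system per station and one datastream per parameter. The `stations.json` shape, the `urn:example:usgs_water` UID prefix, and the sample station are hypothetical; only the USGS parameter codes `00060` (discharge) and `00065` (gage height) are real.

```python
import json
from dataclasses import dataclass, field
from typing import List

# Hypothetical UID prefix; the real usgs_water UID scheme may differ.
UID_PREFIX = "urn:example:usgs_water"


@dataclass
class DatastreamPlan:
    uid: str
    name: str
    unit: str


@dataclass
class SystemPlan:
    uid: str
    name: str
    lat: float
    lon: float
    datastreams: List[DatastreamPlan] = field(default_factory=list)


def plan_from_sidecar(sidecar_text: str) -> List[SystemPlan]:
    """Plan one system per station and one datastream per station parameter."""
    data = json.loads(sidecar_text)
    systems = []
    for station in data["stations"]:
        # UIDs come only from source-native IDs, never from server-assigned IDs.
        system_uid = f"{UID_PREFIX}:{station['id']}"
        system = SystemPlan(system_uid, station["name"], station["lat"], station["lon"])
        for param in station["parameters"]:
            system.datastreams.append(
                DatastreamPlan(f"{system_uid}:{param['code']}", param["name"], param["unit"])
            )
        systems.append(system)
    return systems


# Hypothetical sidecar content for illustration only.
SAMPLE_SIDECAR = """
{"stations": [
  {"id": "USGS-EXAMPLE-01", "name": "Example gauge", "lat": 40.0, "lon": -105.0,
   "parameters": [
     {"code": "00060", "name": "Discharge", "unit": "ft3/s"},
     {"code": "00065", "name": "Gage height", "unit": "ft"}
   ]}
]}
"""

if __name__ == "__main__":
    for plan in plan_from_sidecar(SAMPLE_SIDECAR):
        print(plan.uid, [ds.uid for ds in plan.datastreams])
```

Deriving every UID from source-native identifiers keeps the bootstrap idempotent: re-running it yields the same UIDs regardless of which IDs the server assigned.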

Use this pattern when the new source has named locations, sites, platforms, gauges, monitors, or other physical assets that should appear as systems.

### 3. USGS NIMS Is the Best Media/Imagery Exemplar

`publishers/usgs_nims` should be the reference for image-producing sources, camera feeds, and companion media datastreams.

Key strengths:

- Models imagery as a companion datastream on existing USGS Water systems.
- Captures the image URL, thumbnail and full-size image variants, media type, camera identity, and latest-file semantics.
- Handles upstream rate limits with cooldown/backoff and suppresses duplicates by filename.
- Uses a curated `cameras.json` sidecar.
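
A sketch of filename-based duplicate suppression for a latest-image feed, assuming each poll yields one image URL per camera and that the filename changes whenever a new image is captured. `LatestImageTracker`, the camera IDs, and the example URL are hypothetical, not the `usgs_nims` code.

```python
import posixpath
from typing import Dict
from urllib.parse import urlparse


def filename_from_url(image_url: str) -> str:
    """Return the last path segment of an image URL, ignoring any query string."""
    return posixpath.basename(urlparse(image_url).path)


class LatestImageTracker:
    """Remember the last published image filename per (hypothetical) camera ID."""

    def __init__(self) -> None:
        self._last: Dict[str, str] = {}

    def is_new(self, camera_id: str, image_url: str) -> bool:
        name = filename_from_url(image_url)
        # An empty filename cannot be compared safely, so treat it as not new.
        return bool(name) and self._last.get(camera_id) != name

    def mark_published(self, camera_id: str, image_url: str) -> None:
        # Call only after the observation was accepted by the server.
        self._last[camera_id] = filename_from_url(image_url)


if __name__ == "__main__":
    tracker = LatestImageTracker()
    url = "https://example.com/cams/cam1/img_0001.jpg"
    print(tracker.is_new("cam1", url))  # True
    tracker.mark_published("cam1", url)
    print(tracker.is_new("cam1", url))  # False
```

Recording the filename only after a successful publish means a failed POST is retried on the next poll instead of being silently skipped.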

Use this pattern when the new source produces media artifacts rather than conventional scalar observations.

### 4. Aviation WX Is the Best Strict-Compatibility Reference

`publishers/aviation_wx` may not have the richest domain model overall, but it is the most useful example of strict CSAPI and SensorML compatibility constraints.

Key strengths:

- Documents strict parser behavior directly in the bootstrap.
- Separates small GeoJSON create stubs from rich SensorML update bodies.
- Records csapi-go-v2 compatibility quirks.
- Uses server-specific result normalization where required.
- Demonstrates multi-station runtime behavior with duplicate suppression.
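
The stub-then-PUT split can be illustrated by building, but not sending, the two requests with the standard library. This assumes a CSAPI server exposing a `/systems` collection that accepts `application/geo+json` on create and `application/sml+json` on update. The base URL, the trimmed SensorML fields, and the helper names are assumptions, and strict servers such as csapi-go-v2 may require or reject additional properties.

```python
import json
import urllib.request

# Hypothetical server base URL; replace with the target CSAPI endpoint.
BASE_URL = "https://csapi.example.com/api"


def geojson_create_stub(uid: str, name: str) -> dict:
    """Minimal GeoJSON Feature used only to create the system resource."""
    return {"type": "Feature", "geometry": None, "properties": {"uid": uid, "name": name}}


def sensorml_body(uid: str, name: str, description: str) -> dict:
    """Trimmed SensorML JSON body for the follow-up PUT.

    A real publisher would add identifiers, classifiers, contacts, documents,
    and position here.
    """
    return {"type": "PhysicalSystem", "uniqueId": uid, "label": name, "description": description}


def build_create_request(uid: str, name: str) -> urllib.request.Request:
    data = json.dumps(geojson_create_stub(uid, name)).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/systems", data=data, method="POST",
        headers={"Content-Type": "application/geo+json"},
    )


def build_update_request(resource_id: str, uid: str, name: str,
                         description: str) -> urllib.request.Request:
    # resource_id is the server-assigned ID resolved at runtime (for example by
    # looking the system up by UID); it is never stored in code or config.
    data = json.dumps(sensorml_body(uid, name, description)).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/systems/{resource_id}", data=data, method="PUT",
        headers={"Content-Type": "application/sml+json"},
    )


if __name__ == "__main__":
    req = build_create_request("urn:example:system:demo", "Demo system")
    print(req.get_method(), req.full_url)  # POST https://csapi.example.com/api/systems
```

Passing either request to `urllib.request.urlopen` would perform the call; the sketch stops at request construction so the payloads can be checked without a server.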

Use this pattern as a guardrail for all new publishers, especially when targeting both OSH SensorHub and stricter CSAPI servers.

## Baseline Standard for New Publishers

Every new publisher should follow these conventions unless the data source clearly requires a different model:

- Use `publishers/bootstrap_helpers.py` for idempotent create/update/delete behavior.
- Create resources with minimal GeoJSON stubs, then PUT rich SensorML using `application/sml+json`.
- Use stable UIDs; never depend on server-assigned IDs in source code or config.
- Include authoritative source documentation links in procedure, system, datastream, and deployment metadata where appropriate.
- Define an explicit result schema with units and field definitions, plus notes on omitted fields when the upstream source is richer than the baseline result body.
- Add config or sidecar files for curated station, camera, or source lists.
- Implement duplicate suppression using source-native identifiers and update timestamps where possible.
- Handle HTTP 429 or source throttling with cooldown/backoff behavior.
- Support `--dry-run`, `--once`, and interval control for safe validation.
- Keep baseline polling separate from optional enrichment or expensive detail fetches.
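
For the HTTP 429 convention, one common approach is to honor a `Retry-After` header when the server sends one and otherwise fall back to capped exponential backoff. The sketch below assumes that approach; the 5-second base and 300-second cap are hypothetical defaults, not values taken from the existing publishers.

```python
import email.utils
from datetime import datetime, timezone
from typing import Optional


def retry_after_seconds(header_value: Optional[str],
                        now: Optional[datetime] = None) -> Optional[float]:
    """Parse a Retry-After value (delay-seconds or HTTP-date) into seconds."""
    if not header_value:
        return None
    value = header_value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = email.utils.parsedate_to_datetime(value)
    except (TypeError, ValueError):  # older Pythons raise TypeError on bad input
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 5.0, cap: float = 300.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based) after a 429."""
    hinted = retry_after_seconds(retry_after)
    if hinted is not None:
        return hinted  # the server's hint takes precedence over local policy
    return min(base * (2 ** attempt), cap)


if __name__ == "__main__":
    print(backoff_delay(0), backoff_delay(3), backoff_delay(0, "120"))  # 5.0 40.0 120.0
```

Adding random jitter to the fallback delay would help when several publishers share one upstream quota.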

## Pattern Selection Rules

When the four candidate sources are provided, classify each one first:

1. If it is a stream of events, alerts, tracks, reports, incidents, detections, or records from one API feed, start from `usgs_eq`.
2. If it is a list of physical stations or monitoring locations, start from `usgs_water`.
3. If it is a camera, image, or media source, start from `usgs_nims`.
4. If it is a moving-object feed with many transient assets, start from `usgs_eq` and compare against `opensky` for runtime-specific field handling.
5. If it must run against csapi-go-v2 or another strict server, review `aviation_wx` and `bootstrap_helpers.py` before finalizing the bootstrap payloads.
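
The selection rules can be encoded as a small lookup so that every new publisher records the same decision. Deciding which shape a source has remains a human judgment; `SourceShape`, `TemplateChoice`, and `choose_template` are hypothetical names for this sketch.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class SourceShape(Enum):
    EVENT_FEED = "event_feed"            # rule 1: events, alerts, reports, detections, ...
    STATION_NETWORK = "station_network"  # rule 2
    MEDIA = "media"                      # rule 3
    MOVING_OBJECTS = "moving_objects"    # rule 4


@dataclass
class TemplateChoice:
    start_from: str
    compare_with: List[str] = field(default_factory=list)
    review_before_bootstrap: List[str] = field(default_factory=list)


_START = {
    SourceShape.EVENT_FEED: "usgs_eq",
    SourceShape.STATION_NETWORK: "usgs_water",
    SourceShape.MEDIA: "usgs_nims",
    SourceShape.MOVING_OBJECTS: "usgs_eq",
}


def choose_template(shape: SourceShape, strict_server: bool = False) -> TemplateChoice:
    choice = TemplateChoice(start_from=_START[shape])
    if shape is SourceShape.MOVING_OBJECTS:
        choice.compare_with.append("opensky")  # rule 4: runtime field handling
    if strict_server:
        # Rule 5 applies on top of whichever exemplar was chosen.
        choice.review_before_bootstrap += ["aviation_wx", "bootstrap_helpers.py"]
    return choice


if __name__ == "__main__":
    print(choose_template(SourceShape.MOVING_OBJECTS, strict_server=True))
    # TemplateChoice(start_from='usgs_eq', compare_with=['opensky'], review_before_bootstrap=['aviation_wx', 'bootstrap_helpers.py'])
```

Treating rule 5 as a flag rather than a fifth shape reflects that strict-server review applies in addition to, not instead of, the shape-based exemplar.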

## Non-Preferred Starting Points

The following publishers remain useful references but should not be the primary template for new work:

- `publishers/iss`: useful for a simple moving-object demo, but too specialized and thin to serve as a general template.
- Earlier NWS/NDBC/CO-OPS patterns: operationally valuable, but the repository research notes flag them as candidates for further metadata enrichment.
- `publishers/opensky`: useful for moving-object feed semantics and bounding-box configuration, but less complete than USGS EQ as a general best-of-breed exemplar.

## Recommended Next Step

When the four new data sources are available, produce a per-source classification table with:

- source type and recommended exemplar,
- expected CSAPI model,
- proposed procedures/systems/datastreams/deployments,
- required sidecar/config files,
- dedupe key and revision strategy,
- rate-limit/backoff strategy,
- authoritative source documentation links,
- optional enrichment surfaces.
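
To keep the four classifications uniform, the table could be generated from one record per source. This is a hypothetical helper, not existing repository code; the field names simply mirror the list of columns above.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class SourceClassification:
    source: str
    source_type_and_exemplar: str
    csapi_model: str
    proposed_resources: str
    sidecar_files: str
    dedupe_and_revision: str
    rate_limit_backoff: str
    source_docs: str
    enrichment_surfaces: str


def _cell(text: str) -> str:
    # Escape pipes so free text cannot break the Markdown table layout.
    return text.replace("|", "\\|")


def to_markdown(rows: List[SourceClassification]) -> str:
    """Render one Markdown table row per classified source."""
    names = [f.name for f in fields(SourceClassification)]
    lines = [
        "| " + " | ".join(names) + " |",
        "| " + " | ".join("---" for _ in names) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(_cell(getattr(row, n)) for n in names) + " |")
    return "\n".join(lines)
```

Generating the table from typed records makes a missing column a construction error rather than a silently blank cell.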

This should happen before implementation so the four publishers share a coherent design language rather than diverging into one-off scripts.
