3 changes: 3 additions & 0 deletions .gitignore
@@ -1,3 +1,6 @@
# OpenSAMPL data paths
archive/
ntp-snapshots/
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
19 changes: 17 additions & 2 deletions CHANGELOG.md
@@ -4,7 +4,6 @@ All notable changes to this project will be documented in this file in [Keep a C
This project adheres to [Semantic Versioning](https://semver.org/).

---

<!--

## [Unreleased] - YYYY-MM-DD
@@ -37,13 +36,29 @@ This project adheres to [Semantic Versioning](https://semver.org/).
*Unreleased* versions radiate potential, and dread. Once you merge an infernal PR, move its bullet under a new version heading with the actual release date.*

-->
## [Unreleased] - YYYY-MM-DD
## [1.2.0] - Unreleased
### Added
- 🔥 Moved Alembic migration code into openSAMPL along with Docker image information
- 🔥 Moved backend API code into openSAMPL along with Docker image information
- 🔥 Docker Compose setup for developers that installs openSAMPL as editable on the backend image
- 🔥 NTP vendor probe family (`NtpProbe`) with JSON snapshot format, filename convention, and `ntp_metadata` ORM table
- 🔥 `opensampl-collect ntp` entry point: local chrony/ntpq/timedatectl-style collection and remote UDP queries via `ntplib`
- 🔥 NTP-focused metrics in `METRICS` (phase offset, delay, jitter, stratum, reachability, dispersion, root delay/dispersion, poll interval, sync health)
- 🔥 Idempotent database bootstrap after schema creation: seed `reference_type`, `metric_type`, default `reference` and `defaults` rows from `REF_TYPES` / `METRICS`; `public.get_default_uuid_for()` for `ProbeData` defaults; `castdb.campus_locations` view for geospatial dashboards backed by `locations.geom`
- 🔥 Grafana: NTP probes dashboard (`ntp-opensampl`), public geospatial timing dashboard updates, datasource/dashboard provisioning alignment
- 🔥 Grafana table panels joining stored `probe_metadata`, `ntp_metadata`, `locations`, and `reference` / `reference_type` for probe reference & source context (no runtime geolocation in panels)
- 🔥 Remote NTP snapshot identity overrides (`probe_id`, `probe_ip`, `probe_name`, optional lab `geolocation` hints) for stable ingest keys

### Changed
- ⚡ Grafana timing panel titles and dashboard copy to **reference-safe** wording (NTP / configured reference vs implying GNSS truth where not applicable); extensible for future GNSS-backed probes
- ⚡ `METRICS.NTP_JITTER` description to distinguish measured jitter (local parsers) from conservative remote estimates
- ⚡ Remote `query_ntp_server`: emit `jitter_s` for time series using a documented delay/root-dispersion bound when RFC peer jitter is unavailable from a single packet
- ⚡ `load_probe_metadata`: NTP path attaches stored `locations` rows for dashboard geospatial joins (one-time at metadata load; not repeated in Grafana queries)

### Fixed
- 🩹 `opensampl init` / `create_new_tables` leaving lookup tables empty (load path now seeds baseline rows and defaults)
- 🩹 Grafana PostgreSQL variables and panel filters: text-safe UUID handling for `varchar` `probe_metadata.uuid` (avoid `varchar = uuid` / empty `IN ()` failures)
- 🩹 Public geospatial dashboard map layer using the provisioned `castdb-datasource` UID consistently
- 🩹 Bug which caused random data duration to always be 1 hour

## [1.1.5] - 2025-09-22
18 changes: 13 additions & 5 deletions README.md
@@ -26,10 +26,15 @@ The name OpenSAMPL stands for **O**pen **S**ynchronization **A**nalytics and **M
with the goal of this project being to provide a comprehensive and open-source solution for clock data management and analysis.
Visualizations are provided via [grafana](https://grafana.com/), and the data is stored in a [TimescaleDB](https://www.timescale.com/) database, which is a time-series database built on PostgreSQL.

**NTP clock probes** are supported end-to-end: YAML-driven vendor scaffolding (`opensampl create`), JSON snapshots, `opensampl load ntp`, and `opensampl-collect ntp` (local chrony/ntpq-style chain or remote UDP via optional `ntplib`). NTP observations use the same metric/reference tables as other vendors; **“Reference”** in Grafana means the OpenSAMPL reference dimension for SQL joins, **not** GNSS ground truth for NTP-only demos.

**Database bootstrap**: first-time setup should run **`opensampl init`**, which creates schema and seeds lookup tables (`reference_type`, `metric_type`, `reference`, `defaults`) plus PostgreSQL helpers expected by the load path (see `opensampl/db/bootstrap.py`). Skipping init leads to obscure failures on first load.

**Documentation**: [published docs](https://ornl.github.io/OpenSAMPL/) — start with *Guides*. For NTP specifically, see *NTP vendor design* and *NTP extension (walkthrough)* (generator, geolocation at ingest, bootstrap, Grafana query notes).

### (**O**pen **S**ynchronization **A**nalytics and **M**onitoring **PL**atform)

python tools for adding clock data to a timescale db.
Python tools for adding clock data to a TimescaleDB-backed database.

## CLI TOOL

@@ -39,6 +44,7 @@ python tools for adding clock data to a timescale db.
2. Pip install the latest version of opensampl:
```bash
pip install opensampl
# Remote NTP collection (via ntplib) also needs the collect extra: pip install 'opensampl[collect]'
```

### Development Setup
@@ -130,21 +136,23 @@ Display current environment configuration:

```bash
# Show all variables
poetry run opensampl config show
uv run opensampl config show

# Show with descriptions
poetry run opensampl config show --explain
uv run opensampl config show --explain

# Show specific variable
poetry run opensampl config show --var DATABASE_URL
uv run opensampl config show --var DATABASE_URL
```

Configuration resolution: explicit `--env-file`, then `OPENSAMPL_ENV_FILE`, then `python-dotenv`’s search for `.env`. **Process environment variables override values from the env file** (pydantic-settings precedence).
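
The resolution order above can be sketched in plain Python. This is not OpenSAMPL's implementation (which relies on pydantic-settings and python-dotenv); the helper names `find_env_file`, `parse_env_file`, and `resolve_setting` are hypothetical, and the upward directory walk only approximates python-dotenv's `.env` search.

```python
import os
from pathlib import Path


def find_env_file(explicit=None, start=None):
    """Pick the env file: explicit --env-file, then OPENSAMPL_ENV_FILE, then a .env search.

    The upward walk from `start` approximates python-dotenv's search; it is not
    that library's exact algorithm.
    """
    if explicit:
        return Path(explicit)
    from_env = os.environ.get("OPENSAMPL_ENV_FILE")
    if from_env:
        return Path(from_env)
    here = Path(start or Path.cwd()).resolve()
    for directory in (here, *here.parents):
        candidate = directory / ".env"
        if candidate.is_file():
            return candidate
    return None


def parse_env_file(path):
    """Minimal KEY=VALUE parser (ignores comments; no quoting or escaping rules)."""
    values = {}
    if path is None:
        return values
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def resolve_setting(name, explicit=None, start=None):
    """Process environment wins over the env file (pydantic-settings-style precedence)."""
    if name in os.environ:
        return os.environ[name]
    return parse_env_file(find_env_file(explicit, start)).get(name)
```

With a `.env` containing `DATABASE_URL=...` in a parent directory, `resolve_setting("DATABASE_URL")` returns the file value until the same variable is exported in the shell, at which point the process value takes over.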

### Set Configuration

Update environment variables:

```bash
poetry run opensampl config set VARIABLE_NAME value
uv run opensampl config set VARIABLE_NAME value
```

## File Format Support
1 change: 1 addition & 0 deletions docs/guides/collection.md
@@ -8,6 +8,7 @@ The collect API enables automated collection of measurement data from network-co

- **Microchip TWST Modems** (ATS6502 series): Collect offset and EBNO tracking values along with contextual information
- **Microchip TimeProvider® 4100** (TP4100): Collect timing performance metrics from various input channels via web interface
- **NTP** (`opensampl-collect ntp`): Write JSON snapshots for the `NTP` vendor—local host client state (chrony/ntpq/timedatectl chain) or remote UDP queries (install with `pip install 'opensampl[collect]'` for `ntplib`). See [NTP vendor design](ntp_vendor_design.md) and [NTP extension walkthrough](ntp_extension.md).

## CLI Usage

8 changes: 5 additions & 3 deletions docs/guides/index.md
@@ -1,6 +1,8 @@
# Guides

* [Configuration](configuration.md)
* [Using the `opensampl` CLI](opensampl-cli.md)
* [Using the `opensampl-server` CLI](opensampl-server.md)
* [Using the `opensampl-collect` CLI](collection.md)
* [NTP vendor design](ntp_vendor_design.md)
* [NTP extension walkthrough](ntp_extension.md) — generator, bootstrap, geolocation, Grafana vs demo repo
125 changes: 125 additions & 0 deletions docs/guides/ntp_extension.md
@@ -0,0 +1,125 @@
# NTP vendor extension (implementation walkthrough)

This document describes how the **NTP** clock-probe family was added to OpenSAMPL, how it fits the **vendor generator** model, and which pieces remain **manual integration** work. It also separates **upstream OpenSAMPL** behavior from the **syncscope-at-home** demo appliance.

For field-level design (modes, metadata vs series, local tool chain), see [NTP vendor design](ntp_vendor_design.md).

---

## 1. Why NTP is modeled as a vendor / probe family

OpenSAMPL organizes ingest around **vendors** (`VendorType`), **probe identity** (`ProbeKey`), **vendor-specific metadata tables**, and **normalized time series** in `probe_data`. NTP sources are not GNSS truth references; they are **network clock observations** (local client state or remote server responses). Modeling NTP as its own vendor keeps:

- A dedicated **`ntp_metadata`** table for sync/leap/stratum/targets and parser provenance.
- Stable **`ProbeKey`** derivation from snapshot filenames and JSON payloads.
- Metrics and references aligned with the rest of the platform (`metric_type`, `reference`) without pretending NTP stratum implies a calibrated physical reference.

---

## 2. Role of the vendor generator (`opensampl create`)

The CLI command `opensampl create <config.yaml>` uses `opensampl.create.create_vendor.VendorConfig` to:

- Generate or refresh **probe parser scaffolding** and **SQLAlchemy metadata ORM** from YAML.
- Update **`opensampl.vendors.constants`** so the new vendor is registered for routing and CLI.

The generator is **scaffolding**: it does not implement protocol logic, collectors, or Grafana panels by itself.

---

## 3. `ntp_vendor.yaml` and generated artifacts

The canonical input is `opensampl/ntp_vendor.yaml` in the package source tree. It declares:

- `name`, `parser_class` / `parser_module`, `metadata_orm`, `metadata_table`
- `metadata_fields`: typed columns on `ntp_metadata` (mode, targets, sync fields, dispersion, etc.)

Running `opensampl create` against this file produces/updates generated modules under `opensampl/vendors/` and wires the vendor into constants. Treat **`ntp_vendor.yaml` as the contract** between schema and hand-written Python (`ntp.py`, collectors).

---

## 4. Manual steps after generation

Typical follow-through (as done for NTP):

1. **Implement the probe class** (`NtpProbe`): parse snapshot JSON, normalize metadata, emit series rows with correct metric keys.
2. **Implement collectors** (`opensampl-collect ntp`):
   - **Local**: shell out to chrony/ntpq/timedatectl with a defined fallback order.
   - **Remote**: UDP client via `ntplib` (`opensampl/vendors/ntp_remote.py`), optional probe/geo overrides.
3. **Load path hooks**: `write_to_table` / probe load pipeline must attach NTP-specific behavior (e.g. **geolocation** before `probe_metadata` insert—see below).
4. **Metrics**: register NTP metrics in `opensampl/metrics` so bootstrap seeds `metric_type`.
5. **References**: NTP demos use **`REF_TYPES.UNKNOWN`**; dashboards label **“Reference”** for SQL joins, **not** as GNSS ground truth.
6. **ORM / migrations**: ensure `opensampl init` (or Alembic, if used) creates `ntp_metadata` and related objects consistently with generated code.

---

## 5. Metrics and reference choices

- **Offset, delay, stratum, poll, root delay/dispersion**, etc. are stored as first-class metrics where applicable.
- **Jitter**: a **single** remote NTP client response does not expose RFC5905 peer jitter; `ntp_remote` may emit a **positive bound** derived from delay and root dispersion so dashboards have a value—this is an **estimate**, not a sampled Allan deviation. Local chrony/ntpq paths may still expose **measured** jitter when available.
- **Reference**: use **`UNKNOWN`** unless a future model maps NTP reference IDs to calibrated references. Do **not** describe NTP-only demos as validating against **GNSS truth**.
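
A minimal sketch of the arithmetic behind a single remote sample, assuming the standard RFC 5905 on-wire formulas for offset and round-trip delay. The `jitter_upper_bound` helper is hypothetical: half the delay plus root dispersion is one plausible conservative stand-in, and the bound actually documented in `ntp_remote` may use a different formula.

```python
def offset_and_delay(t1, t2, t3, t4):
    """RFC 5905 on-wire calculation for one request/response exchange.

    t1: client transmit, t2: server receive, t3: server transmit, t4: client receive
    (seconds; t1/t4 on the client clock, t2/t3 on the server clock).
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def jitter_upper_bound(delay, root_dispersion):
    """HYPOTHETICAL jitter stand-in for a single response (not ntp_remote's formula).

    Assumption: half the round-trip delay plus the server's root dispersion,
    clamped so the result is never negative.
    """
    return max(delay, 0.0) / 2.0 + max(root_dispersion, 0.0)
```

For `offset_and_delay(0.0, 5.0, 6.0, 3.0)` the result is an offset of 4.0 s (the server is 4 s ahead, so the local clock is behind) and a delay of 2.0 s. Any bound like this describes uncertainty in one packet, not a measured dispersion of repeated samples.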

---

## 6. Local vs remote collection

| Path | Mechanism | Notes |
|------|-----------|--------|
| **Local** | Subprocess chain (chronyc, ntpq, timedatectl, …) | Best-effort; records `observation_source` and partial state when tools are missing. |
| **Remote** | `ntplib` UDP request | High ports supported for lab mocks; production often uses UDP **123**. Timeouts produce degraded metadata, not process crashes. |

---

## 7. Metadata and geolocation

**Geolocation is applied at metadata ingest** (when building rows for `locations` / `probe_metadata`), not inside Grafana:

- **`attach_ntp_location`** (`opensampl/load/ntp_geolocation.py`) resolves coordinates from YAML `geo_override`, lab defaults, or **public IP → HTTP** lookup (e.g. ip-api.com) when enabled.
- Grafana maps read **`castdb.locations`** / **`castdb.campus_locations`**; panels do **not** call external geo APIs at query time.

Disable enrichment with env **`NTP_GEO_ENABLED=false`** if you want probes without new location rows.
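
The precedence described above can be sketched as follows. This is a hypothetical illustration, not `attach_ntp_location`: the function name, the `lat`/`lon` keys, the `lookup` callable (standing in for an HTTP service such as ip-api.com), and the assumption that `NTP_GEO_ENABLED=false` suppresses overrides as well as lookups are all choices made for the example.

```python
import ipaddress
import os


def resolve_probe_location(probe_ip, geo_override=None, lab_default=None, lookup=None):
    """Return (lat, lon, source) or None.

    Precedence: explicit override, then lab default, then a public-IP lookup.
    `lookup` is a hypothetical callable(ip) -> (lat, lon) that is only used for
    globally routable addresses.
    """
    if os.environ.get("NTP_GEO_ENABLED", "true").strip().lower() == "false":
        return None  # assumption: disabled means no new location rows at all
    if geo_override:
        return (geo_override["lat"], geo_override["lon"], "geo_override")
    if lab_default:
        return (lab_default["lat"], lab_default["lon"], "lab_default")
    try:
        addr = ipaddress.ip_address(probe_ip)
    except ValueError:
        return None  # hostnames/placeholders are not geolocated in this sketch
    if lookup is None or not addr.is_global:
        return None  # private, loopback, or link-local addresses are never sent out
    try:
        lat, lon = lookup(str(addr))
    except Exception:
        return None  # a failed lookup must not block metadata ingest
    return (lat, lon, "ip_lookup")
```

Because this runs once while building `locations` / `probe_metadata` rows, dashboards only ever read the stored coordinates.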

---

## 8. Bootstrap and seed requirements

`opensampl init` and/or load bootstrap (`opensampl/db/bootstrap.py` → `seed_lookup_tables`) must ensure:

- **`reference_type`** / **`metric_type`** rows exist (including **UNKNOWN**).
- A **`reference`** row and **`defaults`** entries so ORM defaults and `ProbeData` triggers resolve UUIDs.
- **`public.get_default_uuid_for(text)`** exists on PostgreSQL (used by probe data insertion).
- **`castdb.campus_locations`** view (PostGIS lat/lon from `locations.geom`) for **reference-safe** geospatial dashboards when PostGIS is present.

Skipping bootstrap causes obscure failures during first load; always run **`opensampl init`** against a fresh database before loading probes.
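
Idempotent seeding comes down to create-if-missing plus insert-if-missing. The sketch below uses the standard-library `sqlite3` module so it runs anywhere; `seed_lookup_tables_sketch`, the single-column table shapes, and the seed values are simplified placeholders (the real rows come from `REF_TYPES` and `METRICS`, and PostgreSQL also accepts `INSERT ... ON CONFLICT ... DO NOTHING`).

```python
import sqlite3

# Placeholder seed values; the real lists come from REF_TYPES and METRICS.
REFERENCE_TYPES = ["UNKNOWN", "GNSS"]
METRIC_TYPES = ["phase_offset", "delay", "jitter", "stratum"]


def seed_lookup_tables_sketch(conn):
    """Safe to run any number of times: existing tables and rows are left untouched."""
    conn.execute("CREATE TABLE IF NOT EXISTS reference_type (name TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE IF NOT EXISTS metric_type (name TEXT PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO reference_type (name) VALUES (?) ON CONFLICT (name) DO NOTHING",
        [(name,) for name in REFERENCE_TYPES],
    )
    conn.executemany(
        "INSERT INTO metric_type (name) VALUES (?) ON CONFLICT (name) DO NOTHING",
        [(name,) for name in METRIC_TYPES],
    )
    conn.commit()
```

Running the function twice leaves exactly one row per seed value, which is the property that lets `opensampl init` be re-run against an existing database.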

---

## 9. Grafana and SQL hardening

- Dashboards use **text** template variables aligned with `probe_metadata.uuid` (varchar UUID strings)—avoid numeric formatting that strips leading zeroes.
- Prefer queries that tolerate **empty** or **single-probe** deployments (e.g. NTP-only stacks without legacy GNSS rows).
- **“Reference”** in titles means **OpenSAMPL reference dimension** for joins/filters, not a claim of absolute timing truth.
- **Metadata panels** may **collapse** JSON into compact rows for readability; that is presentation-only.
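
For scripts that query the same tables outside Grafana, the rules above can be applied in a small query builder. `probe_uuid_filter` is hypothetical (the dashboards implement these rules directly in panel SQL and template variables); it keeps UUIDs as strings, casts the column to text, and replaces an empty selection with an always-false predicate instead of emitting `IN ()`.

```python
def probe_uuid_filter(selected_uuids, column="probe_metadata.uuid", placeholder="%s"):
    """Return (where_fragment, params) comparing UUIDs as text.

    - Values stay strings, so nothing strips leading zeroes.
    - The column is cast to TEXT, so varchar is never compared with uuid.
    - An empty selection becomes "1 = 0" rather than the invalid "IN ()".
    `column` is interpolated directly: pass only trusted identifiers.
    """
    uuids = [str(u).strip() for u in selected_uuids if str(u).strip()]
    if not uuids:
        return "1 = 0", []
    marks = ", ".join([placeholder] * len(uuids))
    return f"CAST({column} AS TEXT) IN ({marks})", uuids
```

The default `%s` placeholder matches psycopg-style drivers; pass `placeholder="?"` for drivers such as `sqlite3`.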

---

## 10. OpenSAMPL vs syncscope-at-home

| Concern | OpenSAMPL (library) | syncscope-at-home (demo) |
|--------|---------------------|---------------------------|
| Vendor YAML, parsers, collectors, load hooks, bootstrap | Yes | Consumes as submodule |
| Docker Compose, custom **PostGIS + Timescale** DB image, **ntp-ingest** loop | No | Yes (`docker-compose.yaml`, `demo/db`, `demo/ntp-ingest`) |
| Default **NTP targets**, **interval**, spool paths | No | `config/ntp-ingest.yaml`, env `NTP_INGEST_CONFIG` |
| Lab **mock NTP** UDP services | No | Compose services `mock-ntp-*` |
| Opinionated Grafana **dashboards** shipped in repo | Optional / examples | `demo/` Grafana image and provisioning |

Treat **syncscope-at-home** as an **appliance-style** illustration: it shows how to run continuous collect+load with sane defaults, not a mandatory deployment topology for upstream OpenSAMPL.

---

## See also

- [NTP vendor design](ntp_vendor_design.md) — probe identity, modes, failure semantics
- [Collection](collection.md) — `opensampl-collect` overview
- [Configuration](configuration.md) — env files and CLI config
- API: [`create_vendor`](../api/helpers/create_vendor.md)
59 changes: 59 additions & 0 deletions docs/guides/ntp_vendor_design.md
@@ -0,0 +1,59 @@
# NTP vendor design (OpenSAMPL)

This note defines the NTP clock-probe family: identity, storage, local vs remote collection, and lab-demo caveats.

## Vendor identity

| Item | Value |
|------|--------|
| Vendor name | `NTP` |
| Probe class | `NtpProbe` |
| Module | `opensampl.vendors.ntp` |
| Metadata ORM / table | `NtpMetadata` / `ntp_metadata` |

## Probe identity (`ProbeKey`)

- **`ip_address`**: For `remote_server`, the target server IP or a placeholder derived from the hostname. For `local_host`, typically `127.0.0.1` or the host’s primary IPv4 used for labeling.
- **`probe_id`**: Stable slug per logical probe (e.g. `local-chrony`, `mock-a`, `remote-pool-1`).

Snapshot files use a strict filename pattern so the loader can derive `ProbeKey` without opening the file (see `NtpProbe` docstring).
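
Deriving the key from the filename alone can be sketched with a regular expression. The pattern below is hypothetical (the authoritative convention is in the `NtpProbe` docstring), and `ProbeKeySketch` is a stand-in dataclass rather than OpenSAMPL's `ProbeKey`.

```python
import re
from dataclasses import dataclass

# HYPOTHETICAL convention for illustration: ntp__<ip_address>__<probe_id>__<UTC timestamp>.json
SNAPSHOT_RE = re.compile(
    r"^ntp__(?P<ip_address>[0-9A-Za-z.:-]+)"
    r"__(?P<probe_id>[a-z0-9][a-z0-9-]*)"
    r"__(?P<timestamp>\d{8}T\d{6}Z)\.json$"
)


@dataclass(frozen=True)
class ProbeKeySketch:
    """Stand-in for ProbeKey: the (ip_address, probe_id) pair that identifies a probe."""

    ip_address: str
    probe_id: str


def probe_key_from_filename(name):
    """Parse identity from the name only; the JSON body is never opened."""
    match = SNAPSHOT_RE.match(name)
    if match is None:
        raise ValueError(f"not an NTP snapshot filename: {name!r}")
    return ProbeKeySketch(match["ip_address"], match["probe_id"])
```

Keeping identity in the filename lets a loader group or skip files cheaply before parsing any payloads.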

## Modes

| Mode | Meaning |
|------|--------|
| `local_host` | Collector runs on the machine whose NTP client state is observed (Raspberry Pi friendly). |
| `remote_server` | Collector issues NTP client requests to `target_host`:`target_port` (default UDP **123**; high ports supported for demos). |

## Metadata vs time series

- **`ntp_metadata`**: Latest normalized fields from the most recent observation (sync/leap/stratum/reach/reference/poll/root metrics, mode, targets, `observation_source`, etc.) plus `additional_metadata` JSONB for raw command output snippets and parser notes.
- **`probe_data`**: One OpenSAMPL row per `(time × metric_type × reference)` as elsewhere. NTP uses dedicated metrics (offset, delay, jitter, stratum, etc.) with `REF_TYPES.UNKNOWN` unless a future reference model is introduced.

Offset is stored in seconds; Grafana panels may scale to nanoseconds for consistency with existing timing dashboards.

## Local fallback chain

Tools are tried in order until one yields usable structured data:

1. `chronyc tracking`
2. `chronyc -m 'sources -v'` or `chronyc sources -v`
3. `ntpq -p`
4. `timedatectl show-timesync --all` / `timedatectl status`
5. `systemctl show systemd-timesyncd` / `systemctl status systemd-timesyncd`

Missing binaries are skipped without failing the snapshot; `sync_status` and `observation_source` record partial or unavailable state.
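
A minimal sketch of the fallback chain, assuming that "usable" simply means a zero exit status with non-empty stdout (the real collector parses each tool's output). `collect_local`, the `sync_status` values, and the injectable `runner` are hypothetical; injecting the runner makes the chain testable on machines without chrony or ntpq.

```python
import shutil
import subprocess

# The commands listed above, tried in order.
LOCAL_CHAIN = [
    ["chronyc", "tracking"],
    ["chronyc", "-m", "sources -v"],
    ["chronyc", "sources", "-v"],
    ["ntpq", "-p"],
    ["timedatectl", "show-timesync", "--all"],
    ["timedatectl", "status"],
    ["systemctl", "show", "systemd-timesyncd"],
    ["systemctl", "status", "systemd-timesyncd"],
]


def run_command(argv, timeout=5.0):
    """Return stdout, or None if the binary is missing, fails, or times out."""
    if shutil.which(argv[0]) is None:
        return None  # missing tool: skip without failing the snapshot
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout, check=False)
    except (OSError, subprocess.TimeoutExpired):
        return None
    return proc.stdout if proc.returncode == 0 and proc.stdout.strip() else None


def collect_local(chain=LOCAL_CHAIN, runner=run_command):
    """Try each command until one answers; record which one did."""
    attempts = []
    for argv in chain:
        output = runner(argv)
        attempts.append({"command": " ".join(argv), "ok": output is not None})
        if output is not None:
            return {"observation_source": " ".join(argv), "sync_status": "observed",
                    "raw_output": output, "attempts": attempts}
    return {"observation_source": None, "sync_status": "unavailable",
            "raw_output": None, "attempts": attempts}
```

Recording every attempt gives the snapshot's metadata enough context to explain a partial or unavailable state.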

## Remote collection

Standard NTP client requests over UDP (default port **123**, configurable). Timeouts and non-responses produce degraded samples and metadata rather than crashing the loader.

## Failure semantics

- Loaders and collectors catch per-step failures; snapshots are still written when possible.
- Missing numeric fields are omitted: no row is written for that metric at that timestamp. Stored values never contain `NaN`, which is not valid JSON.
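
The failure rule above can be sketched as a row builder that drops missing and non-finite values before anything is serialized. `series_rows`, the metric names, and the row dictionary shape are illustrative, not OpenSAMPL's `probe_data` schema; serializing with `json.dumps(..., allow_nan=False)` then fails loudly if a `NaN` ever slips through.

```python
import json
import math
from datetime import datetime, timezone


def series_rows(observed_at, metrics, reference="UNKNOWN"):
    """Turn one observation into (time, metric, reference, value) rows.

    None, non-numeric, NaN, and infinite values produce no row at all.
    """
    rows = []
    for metric, value in metrics.items():
        if value is None:
            continue
        try:
            number = float(value)
        except (TypeError, ValueError):
            continue
        if not math.isfinite(number):
            continue
        rows.append({"time": observed_at.isoformat(), "metric": metric,
                     "reference": reference, "value": number})
    return rows


example = series_rows(
    datetime(2025, 1, 1, tzinfo=timezone.utc),
    {"phase_offset": 0.0012, "jitter": float("nan"), "delay": None, "stratum": 2},
)
payload = json.dumps(example, allow_nan=False)  # raises ValueError if NaN/inf remained
```

Here only `phase_offset` and `stratum` become rows; the missing `delay` and the `NaN` jitter simply have no data point at that timestamp.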

## Demo vs production NTP

- **Lab mock servers** often listen on **high UDP ports** so containers do not require `CAP_NET_BIND_SERVICE`. Real deployments typically use **UDP/123**.
- **Simulated drift / unhealthy behavior** in containers is implemented by manipulating **NTP response fields** (stratum, delay, dispersion, etc.), not by destabilizing a real physical clock, so there is no true Allan deviation to observe. Comparison panels show **protocol-level** differences between mock instances.
2 changes: 2 additions & 0 deletions mkdocs.yaml
@@ -14,6 +14,8 @@ nav:
- Create: guides/create_probe_type.md
- Server: guides/opensampl-server.md
- Collect: guides/collection.md
- NTP vendor: guides/ntp_vendor_design.md
- NTP extension (walkthrough): guides/ntp_extension.md
- Random Data: guides/random-data-generation.md
- API:
- index: api/index.md
3 changes: 2 additions & 1 deletion opensampl/cli.py
@@ -89,7 +89,8 @@ def init():
"""
Initialize the database.
Creates all tables as defined in the opensampl.db.orm file.
Creates all tables as defined in the opensampl.db.orm file, then idempotently seeds lookup tables
(reference_type, reference, metric_type, defaults) from REF_TYPES and METRICS.
This is not required if you are using `opensampl-server`, which performs this step as part of its own database initialization.
"""
logger.debug("Initializing database")