Skip to content

Feature/matchy ntp#21

Draft
midgemacf wants to merge 22 commits intorelease/1.2.0from
feature/matchy-ntp
Draft

Feature/matchy ntp#21
midgemacf wants to merge 22 commits intorelease/1.2.0from
feature/matchy-ntp

Conversation

@midgemacf
Copy link
Copy Markdown
Contributor

@midgemacf midgemacf commented Apr 15, 2026

PR Progress

TODO:

  • Test/ finish geo locator stuff (may not be able to do on work laptop due to netskope)
    • tests in general
  • Update changelog
  • add documentation
  • Make sure the tests workflow can handle pytest-postgresql with postgis

DONE:

  • add migrations
  • collection (to file & direct)
  • random
  • load (from file & direct from collect)
  • initial swing at geolocator stuff
  • Run ruff again
  • update grafana dashboards

Summary

This PR adds first-class NTP support to OpenSAMPL and hardens the local/demo experience around initialization, loading, and visualization.

At a high level, it introduces:

  • an NTP vendor/probe family
  • local and remote NTP collection support
  • Additional metrics and metadata handling
  • reference-safe dashboard/query updates for NTP-backed timing views
  • documentation for the NTP extension path

What’s included

NTP vendor / probe support

  • Adds an NTP vendor family using the existing OpenSAMPL extension model
  • Supports remote NTP querying and local NTP inspection paths
  • Adds ntp_metadata handling and NTP-specific probe loading behavior
  • Adds metrics common for NTP clocks such as jitter, delay, stratum, and sync health

Dashboard and query hardening

  • Fixes Grafana variable/query issues around text UUID handling
  • Removes brittle empty-filter and varchar = uuid failure modes
  • Updates timing dashboards to use reference-safe wording for NTP-backed demo paths
  • Adds compact reference/source metadata tables to aid interpretation
  • Keeps GNSS extensibility intact for future true GNSS-backed probe families

NTP jitter handling

  • Fixes the empty jitter path for remote NTP collection
  • When a direct measured jitter value is unavailable from a single remote response, stores a documented estimate/bound instead
  • Clarifies this distinction in dashboard wording and docs

Docs

  • Updates README/docs to reflect current init/bootstrap behavior
  • Adds a detailed guide explaining how the NTP extension was added and integrated into OpenSAMPL
  • Adds an unreleased 1.2.0 changelog entry

Why

Before this change:

  • NTP support was not available as a first-class probe family
  • Grafana dashboards were brittle around variable typing and empty selections
  • NTP-backed demo dashboards could imply GNSS truth where only configured reference semantics existed

After this change:

  • NTP-backed timing data can be collected, loaded, and visualized cleanly
  • dashboard behavior is safer and more honest
  • the extension path is documented for future maintainers

Notes

  • The dashboard wording uses Reference rather than GNSS where the current backing data is NTP-referenced rather than GNSS-grounded.
  • GNSS-related backend concepts are preserved; this PR does not remove future support for true GNSS-backed probe families.
  • Demo/appliance orchestration (custom DB image, continuous ingest service, mock NTP targets, etc.) remains outside this PR in the syncscope-at-home repo.

Validation

Verified locally with:

  • uv run pytest
  • uv run ruff check .
  • NTP collect/load path
  • Grafana dashboard rendering with populated probe metadata and time-series rows

Comment thread opensampl/helpers/geolocator.py Outdated
Comment thread opensampl/db/orm.py

probe_uuid = Column(String, ForeignKey("probe_metadata.uuid"), primary_key=True)
mode = Column(Text)
reference = Column(Boolean, comment="Is used as a reference for other probes")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to update all clock metadata to be used as a reference?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can get the information pretty easily from the tables, since any probe used as a reference will appear in the reference table.

I think i'll add a migration to add a view that joins our probe metadata against the reference table to make an easily accessible (and query-able) spot to get that info

@midgemacf midgemacf marked this pull request as draft April 20, 2026 19:54
Copy link
Copy Markdown
Contributor

@sempervent sempervent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • High: gen_api_key is unauthenticated, so enabling USE_API_KEY=true does not actually protect the backend. Any caller can hit /gen_api_key and mint a valid key without already having one, which defeats the access-control model entirely.
    See opensampl/server/backend/main.py:309.
  • High: opensampl load ntp ... will fail through the shared loader path. BaseProbe.process_single_file() now instantiates probes as cls(input_file=..., chunk_size=..., **kwargs), but NtpProbe.init only accepts input_file and does not
    accept chunk_size or **kwargs. That makes the new NTP parser incompatible with the standard load pipeline introduced in the same branch. See opensampl/vendors/base_probe.py:227 and opensampl/vendors/ntp.py:543.
  • Medium: collected probe files cannot be reloaded from a directory. _collect_and_save() writes files ending in .txt, but CollectMixin.filter_files() checks f.stem == ".txt". Path.stem never equals ".txt" for those files, so directory loads
    will always filter them out. See opensampl/mixins/collect.py:141 and opensampl/mixins/collect.py:145.
  • Medium: local chrony collection stores phase offset as a string, not a float. _parse_chronyc_tracking() assigns m.group(1) directly to self.offset_s, and export_data() then serializes that value into the time-series artifact unchanged.
    The downstream load path only keeps time/value, so this will persist string JSON for a metric declared as float, which is likely to break numeric queries/dashboards. See opensampl/vendors/ntp.py:190, opensampl/vendors/ntp.py:83, and
    opensampl/load_data.py:137.
  • Medium: remote NTP collection has no usable default port. CollectConfig.port defaults to None, and remote mode passes that straight into NTPRemoteCollector(target_port=collect_config.port). A normal collect ntp --mode remote --host ...
    run will therefore fail unless the caller knows to supply --port, instead of defaulting to standard NTP/123. See opensampl/vendors/ntp.py:511 and opensampl/vendors/ntp.py:653.

@sempervent
Copy link
Copy Markdown
Contributor

  1. High: opensampl load ntp ... is wired incompatibly with the shared file loader. The loader instantiates parsers as cls(input_file=filepath, chunk_size=chunk_size, **kwargs), but the new NtpProbe.init only accepts input_file and
    forwards it positionally. Any normal load path that passes chunk_size will fail before parsing starts. See opensampl/vendors/base_probe.py:227 and opensampl/vendors/ntp.py:543.
  2. High: the new geolocation path can break metadata ingestion when only a location_name or name-only geolocation override is provided. load_probe_metadata() now calls create_location() whenever geolocation is non-empty; create_location()
    then writes a locations row with lat=None and lon=None, and Locations.init blindly turns those keys into POINT(None None). That makes a simple name-based location association path invalid. See opensampl/load_data.py:203, opensampl/
    helpers/geolocator.py:67, and opensampl/db/orm.py:124.
  3. Medium: local chrony collection records phase offset as a string instead of a float. _parse_chronyc_tracking() assigns m.group(1) directly to self.offset_s, and that value is later exported and loaded as the metric payload. This creates
    a type mismatch for a metric declared as numeric and is likely to break aggregations/visualizations expecting numeric JSON values. See opensampl/vendors/ntp.py:190.
  4. Medium: remote NTP collection has no default port, so the common case fails unless the caller knows to pass one manually. CollectConfig.port defaults to None, and remote mode passes that straight into
    NTPRemoteCollector(target_port=collect_config.port) instead of defaulting to UDP 123. See opensampl/vendors/ntp.py:510 and opensampl/vendors/ntp.py:652.
  5. Low: interval-based collection always sleeps once after the last sample. With --interval > 0, the loop unconditionally calls time.sleep() at the end of every iteration, including the final one, so runtime overshoots by one full interval.
    See opensampl/vendors/ntp.py:669.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants