Ground Truth Data: Verification and Confidence Scores

In Vangrid, ground truth is not a marketing term — it is a precisely defined data quality guarantee. Every observation labeled as ground truth was captured by physical sensors in the real world, corroborated by multiple independent nodes, and signed with a cryptographic hash that proves it hasn’t been altered. This distinguishes Vangrid’s output from inferred, modeled, or synthetic spatial data, which may be statistically plausible but cannot be independently verified.

What ground truth means in Vangrid’s context

A ground truth observation in Vangrid meets three criteria:

Physical capture — The observation originates from a real sensor in the real world at a specific location and time. No portion of the observation is generated by a model or interpolated from adjacent data.
Multi-node corroboration — At least one additional independent node has captured a consistent observation of the same feature within the same spatiotemporal window. Single-node observations are returned with a low ground_truth_score and clearly marked as unconfirmed.
Cryptographic provenance — The observation carries a provenance_hash generated on the originating node. The hash ties the observation to a specific device, timestamp, and content, giving you tamper-evident proof of its origin.

If any of these three criteria is not met, the observation is returned with a lower score or flagged in the response. Vangrid does not suppress low-confidence observations; it surfaces them with the metadata you need to decide whether to use them.

How `ground_truth_score` is calculated

The ground_truth_score is a confidence metric between 0.0 and 1.0 that reflects the degree of multi-node corroboration for a given observation. The score increases as:

More independent nodes report consistent observations of the same feature
The contributing nodes use different sensor modalities (cross-modal agreement is weighted more heavily than same-sensor agreement)
The temporal spread between contributing observations is smaller (nodes that agree within milliseconds provide stronger corroboration than nodes that agree across minutes)

The score decreases when:

Only one node contributed to the observation
Contributing nodes disagree on position, classification, or velocity beyond the configured tolerance thresholds
One or more contributing nodes have a degraded hardware quality rating

A ground_truth_score of 1.0 does not mean certainty — it means the maximum corroboration achievable given the nodes that responded to your query. Always consider node_count alongside the score: a 1.0 from two nodes in a sparse area carries less weight than a 0.92 from fourteen nodes in a dense urban deployment.

The score is not a probability estimate — it is a normalized measure of corroboration strength. How you threshold it depends on your application’s tolerance for uncertainty. Common patterns:

Use case	Recommended minimum `ground_truth_score`
Archival or audit records	`0.5` — include with metadata for human review
Autonomous system inputs	`0.85` — high corroboration required for safety-critical decisions
Real-time situational awareness	`0.7` — balance between coverage and confidence
Historical analysis	`0.6` — broader inclusion for statistical work
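
As a rough illustration of the thresholds in the table, the following Python sketch filters a list of features by use case and also applies a minimum node_count, since the score alone can mislead in sparse areas. The use-case keys, the `filter_features` helper, and the sample data are hypothetical; only the field names and the threshold values come from this page.

```python
from typing import Iterable

# Minimum scores from the table above. The dictionary keys are hypothetical labels.
MIN_SCORE = {
    "archival": 0.5,
    "autonomous": 0.85,
    "realtime_awareness": 0.7,
    "historical_analysis": 0.6,
}


def filter_features(features: Iterable[dict], use_case: str, min_nodes: int = 1) -> list[dict]:
    """Keep features whose score and node_count both meet the bar for a use case."""
    threshold = MIN_SCORE[use_case]
    kept = []
    for feature in features:
        props = feature["properties"]
        if props["ground_truth_score"] >= threshold and props["node_count"] >= min_nodes:
            kept.append(feature)
    return kept


# Made-up sample data for illustration only.
sample = [
    {"feature_id": "a", "properties": {"ground_truth_score": 0.91, "node_count": 4}},
    {"feature_id": "b", "properties": {"ground_truth_score": 0.72, "node_count": 2}},
    {"feature_id": "c", "properties": {"ground_truth_score": 0.40, "node_count": 1}},
]

autonomous_inputs = filter_features(sample, "autonomous", min_nodes=3)
print([f["feature_id"] for f in autonomous_inputs])
```

Treat the thresholds as starting points: a stricter `min_nodes` value trades coverage for confidence in the same way a higher score threshold does.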

Ground truth vs. inferred and synthetic data

It is important to understand what Vangrid’s ground truth is not.

Inferred data is produced by a model that estimates the state of a scene based on partial observations or prior patterns. For example, a system that predicts vehicle positions between sensor captures using a motion model is producing inferred data, not ground truth. Inferred data can be highly accurate, but it cannot be verified against a physical observation.

Synthetic data is generated entirely by simulation or generative models. It is useful for training and testing, but it describes a virtual world, not the physical one.

Vangrid does not mix inferred or synthetic observations into its ground truth responses. Every feature in a Vangrid API response either originated from a physical sensor or is explicitly flagged with a data_source value of interpolated (for gap-filled historical records) or estimated (for low-confidence single-node observations pending corroboration).
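
One way to act on the data_source flag is to split a response by its value before the features reach downstream logic. The sketch below assumes data_source lives under `properties`, as in the example response on this page; the `partition_by_source` helper and the sample features are hypothetical.

```python
from collections import defaultdict

# The three data_source values named in this section.
KNOWN_SOURCES = {"ground_truth", "interpolated", "estimated"}


def partition_by_source(features):
    """Group features by properties.data_source; anything else goes under 'unknown'."""
    groups = defaultdict(list)
    for feature in features:
        source = feature.get("properties", {}).get("data_source")
        key = source if source in KNOWN_SOURCES else "unknown"
        groups[key].append(feature)
    return dict(groups)


# Made-up sample data for illustration only.
sample_features = [
    {"feature_id": "g1", "properties": {"data_source": "ground_truth"}},
    {"feature_id": "i1", "properties": {"data_source": "interpolated"}},
    {"feature_id": "e1", "properties": {"data_source": "estimated"}},
    {"feature_id": "x1", "properties": {}},
]

groups = partition_by_source(sample_features)
verified_only = groups.get("ground_truth", [])
print([f["feature_id"] for f in verified_only])
```

Routing unrecognized or missing values to a separate bucket, rather than dropping them, keeps the decision about how to treat them explicit.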

If you need synthetic or inferred data — for simulation environments, training datasets, or counterfactual analysis — Vangrid does not provide it. Vangrid’s value proposition is physical ground truth, not modeled approximations.

How cryptographic provenance proves authenticity

Every observation in a Vangrid response includes a provenance_hash field. The value is generated and signed on the originating edge node using the node’s hardware-backed private key. It encodes:

The node’s unique identifier
The capture timestamp (GPS-synchronized, sub-second precision)
A SHA-256 hash of the observation’s feature payload

To verify a provenance_hash, you submit it to the Vangrid Provenance API along with the feature payload. The API checks the signature against the node’s public key and returns a verification result. If the payload was modified at any point after capture — in transit, in Vangrid’s infrastructure, or in your own systems — the verification will fail.
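
The tamper-evidence property rests on the fact that any change to the payload changes its SHA-256 digest. The Python sketch below illustrates that principle locally. It is not Vangrid's verification procedure: the JSON serialization used here is an assumption, the `payload_digest` helper is hypothetical, and real verification goes through the Provenance API, which also checks the node's signature.

```python
import hashlib
import json


def payload_digest(payload: dict) -> str:
    """SHA-256 over a deterministic JSON encoding (an assumed canonical form)."""
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


# Made-up payload for illustration only.
original = {"classification": "person.pedestrian", "velocity_mps": 1.3, "heading_deg": 45.0}
digest_at_ingest = payload_digest(original)

# Simulate a modification somewhere after capture.
tampered = dict(original, velocity_mps=1.4)

print(payload_digest(original) == digest_at_ingest)  # unchanged payload still matches
print(payload_digest(tampered) == digest_at_ingest)  # modified payload no longer matches
```

A bare digest only shows that content changed. Tying the digest to a specific device and timestamp is what the node's signature adds.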

For defense, compliance, or auditing workflows, archive the provenance_hash alongside your application data at ingest time. You can verify the chain of custody months or years later without depending on Vangrid retaining the original observation.
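
A minimal sketch of archiving at ingest time, using Python's built-in sqlite3 module. The table layout and the `open_archive` and `archive_feature` helpers are assumptions for illustration, not part of any Vangrid SDK. The feature is stored exactly as received, because later verification needs the unmodified payload alongside the hash.

```python
import json
import sqlite3


def open_archive(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a local archive. The schema is a hypothetical layout."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS observations (
            feature_id      TEXT PRIMARY KEY,
            captured_at     TEXT NOT NULL,
            provenance_hash TEXT NOT NULL,
            raw_feature     TEXT NOT NULL
        )
        """
    )
    return conn


def archive_feature(conn: sqlite3.Connection, raw_feature: str) -> None:
    """Store the feature text unchanged, indexed by fields parsed from it."""
    feature = json.loads(raw_feature)
    props = feature["properties"]
    # INSERT OR IGNORE keeps the first archived copy if the same feature arrives again.
    conn.execute(
        "INSERT OR IGNORE INTO observations VALUES (?, ?, ?, ?)",
        (feature["feature_id"], props["captured_at"], props["provenance_hash"], raw_feature),
    )
    conn.commit()


# Sample text built from values in the example response on this page.
raw = json.dumps({
    "feature_id": "feat_9b1e3c7d",
    "properties": {
        "captured_at": "2026-05-22T14:32:05.198Z",
        "provenance_hash": "sha256:c7f2a49d1e8b3056a7f2c49d1e8b3056c7f2a49d1e8b3056a7f2c49d1e8b30",
    },
})

conn = open_archive()
archive_feature(conn, raw)
archive_feature(conn, raw)  # a repeat ingest does not overwrite the first record
row = conn.execute(
    "SELECT feature_id, provenance_hash, raw_feature FROM observations"
).fetchone()
print(row[0], row[1])
```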

Real-time vs. historical ground truth

Vangrid supports two query modes, and the choice affects how you should interpret ground_truth_score and provenance_hash.

Real-time ground truth

Real-time queries return observations captured within a short rolling window (typically the last few seconds to minutes). Because the aggregation window is short, some nodes may not have contributed yet when your response is assembled. This means:

node_count may be lower than what a historical query for the same time and place would show
ground_truth_score may be lower than it will be once all nodes have reported
The response is assembled quickly, but it reflects the network state at query time, not the fully-corroborated state

Use real-time ground truth when your application requires low latency — autonomous navigation, real-time fleet tracking, live situational awareness.

Historical ground truth

Historical queries retrieve observations from Vangrid’s provenance archive, where corroboration has had time to complete. For any given point in time, a historical query will typically return higher ground_truth_score values and higher node_count values than a real-time query for the same moment would have returned.

Historical responses also include the full set of contributing node identifiers and their individual signed payloads, which is useful for audits or forensic reconstruction of a scene.

Use historical ground truth for compliance records, incident investigation, training data generation, and any workflow where completeness matters more than latency.

Example API response with ground truth fields

The following shows a single feature from a real-time spatial query. Note the ground_truth_score, provenance_hash, and supporting fields that together constitute a verified ground truth observation.

{
  "feature_id": "feat_9b1e3c7d",
  "type": "Feature",
  "geometry": {
    "type": "Point",
    "coordinates": [-73.9857, 40.7484, 8.1]
  },
  "properties": {
    "classification": "person.pedestrian",
    "velocity_mps": 1.3,
    "heading_deg": 45.0,
    "captured_at": "2026-05-22T14:32:05.198Z",
    "ground_truth_score": 0.91,
    "provenance_hash": "sha256:c7f2a49d1e8b3056a7f2c49d1e8b3056c7f2a49d1e8b3056a7f2c49d1e8b30",
    "node_count": 4,
    "sensor_modalities": ["camera", "depth"],
    "data_source": "ground_truth",
    "corroboration_window_ms": 312
  }
}

Key fields in this response:

Field	Value	What it tells you
`ground_truth_score`	`0.91`	High corroboration: 4 independent nodes agreed within 312 ms.
`provenance_hash`	`sha256:c7f2...`	Cryptographic proof of origin. Submit to the Provenance API to verify.
`node_count`	`4`	Four independent nodes contributed to this observation.
`sensor_modalities`	`["camera", "depth"]`	Both camera and depth sensors agreed — cross-modal corroboration.
`data_source`	`ground_truth`	This is a physical observation, not interpolated or estimated.
`corroboration_window_ms`	`312`	All four nodes reported within 312 milliseconds of each other.
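
To show how these fields might be read in client code, the Python sketch below parses a trimmed copy of the example feature into a small summary object. The `GroundTruthSummary` class and the `summarize` helper are hypothetical names; the field names and values come from the example response.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class GroundTruthSummary:
    feature_id: str
    score: float
    node_count: int
    modalities: tuple
    data_source: str
    window_ms: int

    @property
    def cross_modal(self) -> bool:
        """True when more than one distinct sensor modality contributed."""
        return len(set(self.modalities)) > 1


def summarize(feature: dict) -> GroundTruthSummary:
    """Pull the ground truth fields out of a single feature."""
    props = feature["properties"]
    return GroundTruthSummary(
        feature_id=feature["feature_id"],
        score=props["ground_truth_score"],
        node_count=props["node_count"],
        modalities=tuple(props["sensor_modalities"]),
        data_source=props["data_source"],
        window_ms=props["corroboration_window_ms"],
    )


# Trimmed copy of the example response (geometry and motion fields omitted).
raw_example = """
{
  "feature_id": "feat_9b1e3c7d",
  "properties": {
    "ground_truth_score": 0.91,
    "node_count": 4,
    "sensor_modalities": ["camera", "depth"],
    "data_source": "ground_truth",
    "corroboration_window_ms": 312
  }
}
"""

summary = summarize(json.loads(raw_example))
print(summary)
print("cross-modal:", summary.cross_modal)
```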
