diff --git a/docs/architecture/overview.md b/docs/architecture/overview.md
index 4637ce62..1e7066dc 100644
--- a/docs/architecture/overview.md
+++ b/docs/architecture/overview.md
@@ -4,9 +4,11 @@ Hedgehog Open Network Fabric leverages the Kubernetes API to manage its resource
To make network switches Kubernetes-aware, the Fabric employs an **Agent** running on each switch. This agent acts as an interface between the Kubernetes control plane and the switch's internal network configuration mechanisms. It continuously syncs desired state from Kubernetes via the Fabric Controller and applies configurations using **gNMI** (gRPC Network Management Interface).

+Gateway nodes follow the same Kubernetes-native model. The Fabric Controller manages gateway configuration through a dedicated Kubernetes CRD, which the gateway's Dataplane watches directly: it continuously reconciles its running state with the desired configuration and reports observed status back through the Kubernetes API. This keeps gateway management fully consistent with the rest of the Fabric: operators interact exclusively through Kubernetes resources, and operational state is always visible via standard Kubernetes tooling.
+
## Components

-Hedgehog Fabric consists of several key components, distributed between the Control Node and the Network devices. The following diagram breaks down the components of a [mesh topology](fabric.md#mesh). Hedgehog components have been highlighted in brown color:
+Hedgehog Fabric consists of several key components, distributed between the Control Node and the network devices. The following diagram illustrates these components and their relationships.
Hedgehog components have been highlighted in brown color:

``` mermaid
graph TD;
@@ -19,35 +21,38 @@ graph TD;
    K -->|Interacts via K8s API| A
    L[Fabricator]:::ourComponent -->|Installs & Configures| A
-   A -->|Kubernetes API| B1
-   B1 -->|Syncs State| A;
-   A -->|Kubernetes API| B2
-   B2 -->|Syncs State| A;
-
-   %% Mesh - Two Switches
-   subgraph SONiC Leaf 2
-       B1[Fabric Agent]:::ourComponent -->|Scraped by| C1[Alloy]:::thirdParty
-       C1 -->|Pushes Logs/Metrics| P
-       D1[gNMI]:::thirdParty
-       E1[Config DB]:::thirdParty
-       I1[ASIC]:::thirdParty
+   A -->|Kubernetes API| SW_AGENT
+   SW_AGENT -->|Syncs State| A
+   GWD -->|Syncs State| A
+
+   %% Switch
+   subgraph Switch
+       SW_AGENT[Fabric Agent]:::ourComponent
+       SW_ALLOY[Alloy]:::thirdParty
+       SW_GNMI[gNMI]:::thirdParty
+       SW_CDB[Config DB]:::thirdParty
+       SW_ASIC[ASIC]:::thirdParty
+       SW_ALLOY -->|Scrapes| SW_AGENT
+       SW_ALLOY -->|Pushes Logs/Metrics| P
    end

-   subgraph SONiC Leaf 1
-       B2[Fabric Agent]:::ourComponent -->|Scraped by| C2[Alloy]:::thirdParty
-       C2 -->|Pushes Logs/Metrics| P
-       D2[gNMI]:::thirdParty
-       E2[Config DB]:::thirdParty
-       I2[ASIC]:::thirdParty
+   %% Gateway
+   subgraph Gateway
+       GWD[Dataplane]:::ourComponent
+       GWFA[FRR Agent]:::ourComponent
+       GWFRR[FRR]:::thirdParty
+       GWA[Alloy]:::thirdParty
+       GWD -->|Routing Config| GWFA
+       GWFA -->|Config Reload| GWFRR
+       GWFRR -->|Routes & BGP State| GWD
+       GWA -->|Scrapes /metrics| GWD
+       GWA -->|Pushes Logs/Metrics| P
    end

    %% Switch Configuration Flow
-   B1 -->|Applies Config| D1
-   B2 -->|Applies Config| D2
-   D1 -->|Writes/Reads| E1
-   D2 -->|Writes/Reads| E2
-   E1 -->|Controls| I1
-   E2 -->|Controls| I2
+   SW_AGENT -->|Applies Config| SW_GNMI
+   SW_GNMI -->|Writes/Reads| SW_CDB
+   SW_CDB -->|Controls| SW_ASIC

    %% Logs and Metrics Flow
    P -->|Forwards Logs/Metrics| M
@@ -70,10 +75,10 @@ The key components essential for understanding the Fabric architecture are:

### Control Node Components
- **Fabric Controller**: The central control plane component that manages Fabric resources and configurations.
- **Fabric CLI (kubectl plugin)**: A `kubectl` plugin that provides an easy way to manage Fabric resources.
-- **Fabric Proxy**: A pod responsible for collecting logs and metrics from switches (via Alloy) and forwarding them to an external system.
+- **Fabric Proxy**: A pod responsible for collecting logs and metrics from switches and gateways (via Alloy) and forwarding them to an external system.
- **Fabricator**: A tool for installing and configuring Fabric, including virtual lab environments.

-### SONiC Switch Components
+### Switch Components
- **Fabric Agent**: Runs on each switch and applies configurations received from the control plane.
- **Alloy**: Collects logs and telemetry data from the switch.
- **gNMI Interface**: The main configuration API used by the Fabric Agent to interact with the switch.
@@ -82,6 +87,14 @@ The key components essential for understanding the Fabric architecture are:

The SONiC architecture presented here is a simplified, high-level abstraction.

+### Gateway Components
+- **Dataplane**: A packet-processing pipeline that handles NAT, flow tracking, and VXLAN encapsulation/decapsulation. It reads the desired peering and NAT configuration from Kubernetes and generates FRR configuration, which it delivers to the FRR Agent.
+- **FRR Agent**: A Hedgehog-developed component that receives FRR configuration from the Dataplane and applies it to FRR via dynamic reload.
+- **FRR (Free Range Routing)**: A suite of routing daemons that provides BGP peering with the fabric switches. FRR advertises VPC peering routes to attract traffic to the gateway, and pushes routes received from the fabric back into the Dataplane's forwarding table via the Control Plane Interface (CPI).
+- **Alloy**: Collects logs and metrics from the gateway and forwards them to the Fabric Proxy.
+
+Gateway nodes run Flatcar Linux and join the Kubernetes cluster as worker nodes.
The Fabric Controller schedules all gateway components onto gateway nodes and delivers configuration through the `GatewayAgent` Kubernetes CRD. The Dataplane watches this CRD directly, keeping its own state synchronized and reporting back observed status. FRR and the FRR Agent are responsible for all routing interactions with the fabric: FRR advertises and receives routes via BGP, while the FRR Agent keeps FRR's configuration in sync with the Dataplane's desired state.
+
## Architecture Flow

### 1. **Fabric Installation & Configuration**
@@ -99,7 +112,13 @@ The SONiC architecture presented here is a high-level abstraction, for simplicit
- The **Fabric Agent** applies configurations using the **gNMI** interface, updating the **Config DB**.
- The **Config DB** ensures that all settings are applied to the **ASIC** for packet forwarding.

-### 4. **Telemetry & Monitoring**
-- The **Alloy** agent on the switch collects logs and metrics.
+### 4. **Gateway Configuration & Management**
+- The **Fabric Controller** publishes a `GatewayAgent` object containing the desired gateway configuration: BGP settings, VPC peerings, NAT rules, and gateway group membership.
+- The **Dataplane** watches the `GatewayAgent` object via the Kubernetes API, applies the configuration, and writes its observed state (including the last-applied FRR generation and per-VPC traffic statistics) back to the object's status.
+- The **Dataplane** generates FRR configuration from the desired state and delivers it to the **FRR Agent**, which applies it to FRR via dynamic reload.
+- **FRR** establishes BGP sessions with the fabric switches to advertise VPC peering routes. It pushes received routes and BGP state back to the **Dataplane** via the Control Plane Interface (CPI) and the BGP Monitoring Protocol (BMP), respectively.
+
+### 5. **Telemetry & Monitoring**
+- The **Alloy** agent on switches and gateways collects logs and metrics.
- Logs and metrics are sent to the **Fabric Proxy** running in Kubernetes.
- The **Fabric Proxy** forwards this data to **LGTM**, an external logging and monitoring system.
diff --git a/docs/troubleshooting/gateway.md b/docs/troubleshooting/gateway.md
new file mode 100644
index 00000000..5ea84a3f
--- /dev/null
+++ b/docs/troubleshooting/gateway.md
@@ -0,0 +1,196 @@
+# Gateway
+
+This page covers diagnosing common issues with the Hedgehog Gateway, including
+connectivity problems and NAT issues.
+
+## Health Checks
+
+Start by verifying the gateway has picked up its current configuration:
+
+```console
+$ kubectl get gatewayagents
+NAME        APPLIED          APPLIEDG   CURRENTG   VERSION   PROTOCOLIP   VTEPIP   AGE
+gateway-1   10 minutes ago   10         10         v1.2.0    ...          ...      2d
+```
+
+`AppliedG` should equal `CurrentG`. If they differ, the gateway has not yet
+applied the latest configuration.
+
+If the gateway is not reporting in at all, check that both pods are running:
+
+```console
+$ kubectl get pods -n fab -l app.kubernetes.io/component=gateway
+NAME                             READY   STATUS    RESTARTS   AGE
+gw--gateway-1--dataplane-7v9ss   1/1     Running   0          12h
+gw--gateway-1--frr-c9kwc         2/2     Running   0          12h
+```
+
+## Common Issues
+
+### Traffic not flowing through gateway
+
+1. **Check peering is configured**: Verify the GatewayPeering object exists
+   and is not rejected:
+   ```console
+   $ kubectl get gatewaypeerings
+   ```
+
+2. **Check routes on the leaf**: Verify gateway routes are installed on the
+   leaf switches:
+   ```console
+   $ kubectl fabric inspect vpc
+   ```
+   Look for routes pointing to the gateway's VTEP IP.
+
+3. **Check BGP neighbors**: Verify all BGP sessions are established (see
+   [Inspecting Gateway State](#inspecting-gateway-state)).
+
+### NAT not working as expected
+
+1. **Check traffic is reaching the gateway**: Use the per-VPC and per-peering
+   packet counters in the gateway state (see
+   [Inspecting Gateway State](#inspecting-gateway-state)) to verify packets
+   are being processed. Zero counters while traffic is expected indicate
+   that packets are not reaching the gateway.
+
+2. **Idle timeout**: If connections work briefly then stop, the flow may be
+   expiring. Check the `idleTimeout` setting in the GatewayPeering spec.
+   Use TCP or application-layer keepalives for long-lived connections.
+
+### Gateway failover
+
+1. **Check both gateways are running**: Verify both gateway pods are healthy.
+
+2. **Check gateway group membership**:
+   ```console
+   $ kubectl get gateways -o yaml
+   ```
+   Verify both gateways are members of the expected group with correct
+   priorities.
+
+3. **Check BGP on leaves**: After a failover, the leaf switches should
+   withdraw routes from the failed gateway and install routes from the
+   backup. Use `kubectl fabric inspect bgp` to check. Also verify BGP
+   neighbor state on the backup gateway (see
+   [Inspecting Gateway State](#inspecting-gateway-state)).
+
+## Inspecting Gateway State
+
+The `GatewayAgent` status exposes the full operational state of the gateway,
+including BGP neighbor sessions and per-VPC traffic counters.
+
+```console
+$ kubectl get gatewayagents -o yaml
+```
+
+### Configuration status
+
+```yaml
+status:
+  lastAppliedGen: 10
+  lastAppliedTime: "2026-04-17T16:29:04Z"
+  lastHeartbeat: "2026-04-17T17:25:25Z"
+  state:
+    frr:
+      lastAppliedGen: 10
+```
+
+- `lastAppliedGen` should match the object's `metadata.generation`. If it
+  lags, the dataplane has not yet applied the current configuration.
+- `state.frr.lastAppliedGen` should match `lastAppliedGen`. If it lags, FRR
+  has not yet picked up the latest routing configuration.
+- `lastHeartbeat` is updated periodically by the dataplane. A stale value
+  indicates the dataplane is not running or not reachable.
+
+### BGP neighbor state
+
+```yaml
+  state:
+    bgp:
+      vrfs:
+        default:
+          neighbors:
+            172.30.128.12:
+              sessionState: established
+              localAS: 65534
+              peerAS: 65100
+              remoteRouterID: 172.30.8.0
+              connectionsDropped: 0
+              establishedTransitions: 1
+              ipv4UnicastPrefixes:
+                received: 6
+                sent: 1
+            172.30.128.26:
+              sessionState: established
+              ...
+```
+
+All neighbors should be in `established` state. If a neighbor is in `active`
+or `idle`, the BGP session is not up; check physical connectivity and IP
+configuration on both the gateway and the connected leaf switch.
+
+A non-zero `connectionsDropped` or a high `establishedTransitions` count
+indicates the session has been flapping.
+
+### Traffic counters
+
+Per-VPC totals:
+
+```yaml
+  state:
+    vpcs:
+      vpc-01:
+        p: 3555     # packets
+        b: 5835616  # bytes
+        d: 0        # drops
+```
+
+Per-peering directional counters (present when peerings are active):
+
+```yaml
+  state:
+    peerings:
+      vpc-01->vpc-02:
+        p: 3555
+        b: 5835616
+        d: 0
+        bps: 254024.3
+        pps: 70.2
+      vpc-02->vpc-01:
+        p: 2711
+        b: 1000519
+        d: 0
+        bps: 128.9
+        pps: 9.5
+```
+
+Drops (`d`) greater than zero indicate packets are being discarded, which may
+point to a misconfigured peering or an exhausted NAT pool. Zero packet counts
+on a peering that is expected to be active indicate traffic is not reaching
+the gateway.
+
+## Metrics
+
+The dataplane exposes Prometheus metrics scraped by the Alloy agent on the
+gateway node and forwarded to the Fabric Proxy.
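Conceptually, each `*_rate` gauge listed below is the discrete derivative of the matching `*_count` gauge. A minimal sketch of that relationship (illustrative only — this is not the dataplane's actual sampling code):

```python
def rate(prev_count: int, curr_count: int, interval_s: float) -> float:
    """Approximate a per-second rate from two successive count samples."""
    return (curr_count - prev_count) / interval_s

# Two hypothetical vpc_byte_count samples taken 10 s apart:
print(rate(1_000_000, 1_500_000, 10.0))  # 50000.0 bytes/s
```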
+
+Each metric is emitted with three label variants:
+
+- `{total=""}`: all traffic in or out of the VPC
+- `{drops=""}`: traffic dropped for the VPC
+- `{from="",to=""}`: directional traffic between two VPCs
+
+Available metrics:
+
+| Metric | Type | Description |
+|--------|------|-------------|
+| `vpc_packet_count` | Gauge | Packet count |
+| `vpc_packet_rate` | Gauge | Packet rate |
+| `vpc_byte_count` | Gauge | Byte count |
+| `vpc_byte_rate` | Gauge | Byte rate |
+
+To inspect metrics directly, run on the gateway node itself (the dataplane uses
+host networking, so the endpoint is accessible on the node at port 9442):
+
+```console
+$ curl -s http://localhost:9442/metrics
+```
diff --git a/docs/user-guide/gateway.md b/docs/user-guide/gateway.md
index 523e7aed..6e0bc8f1 100644
--- a/docs/user-guide/gateway.md
+++ b/docs/user-guide/gateway.md
@@ -106,6 +106,33 @@ style Leaves fill:none,stroke:none
style Servers fill:none,stroke:none
```
+## Flow Table and Stateful Processing
+
+When stateful NAT (masquerade or port-forwarding) is configured on a gateway peering,
+the gateway maintains a **flow table** to track active connections. Each unique connection
+(identified by its source/destination IPs, ports, and protocol) creates an entry in the
+flow table. This entry records the NAT translation applied and the connection's idle timer.
+
+Key characteristics of the flow table:
+
+- **Timeout-based eviction**: Flow entries expire after a configurable period of inactivity.
+  The idle timeout is set per peering via the `idleTimeout` field in the NAT configuration
+  (default: 2 minutes for masquerade; see [Masquerade](#masquerade-stateful-source-nat) and
+  [Port-Forwarding](#port-forwarding-stateful-destination-nat) for details). When a flow expires,
+  its entry is removed and subsequent packets for that connection are treated as a new flow.
+- **Capacity**: The flow table can handle millions of concurrent entries depending on the gateway
+  node's available memory.
The maximum number of flow entries can be configured via the
+  `flowTableCapacity` field in the Gateway spec. In most deployments, the default is sufficient.
+- **Per-gateway state**: Each gateway maintains its own flow table independently. Flow state
+  is not shared between gateways. If a gateway fails and traffic is redirected to a backup
+  gateway (see [Gateway fail-over](gateway-failover.md)), existing stateful connections must
+  be re-established, as the backup gateway has no knowledge of the failed gateway's flow table.
+
+!!! tip
+    Use TCP keepalives or application-layer keepalives for long-lived connections through
+    stateful NAT. This prevents the flow entry from expiring due to inactivity during
+    idle periods.
+
## Gateway Peering

Just as [VPC Peerings](vpcs.md#vpcpeering) provide VPC-to-VPC connectivity by way of the switches in the fabric, gateway peerings provide connectivity via the gateway nodes.
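To make the shape of such an object concrete, here is a minimal sketch of a gateway peering with stateful masquerade NAT. It is illustrative only: `kind: GatewayPeering` and the `idleTimeout` field follow the terminology used in this guide, but the `apiVersion` and the surrounding `spec` layout are assumptions, not a verified schema.

```yaml
# Illustrative sketch -- apiVersion and spec layout are assumptions,
# not a verified schema; only kind and idleTimeout come from this guide.
apiVersion: gateway.githedgehog.com/v1alpha1   # hypothetical group/version
kind: GatewayPeering
metadata:
  name: vpc-01--vpc-02
spec:
  vpcs:                # the two VPCs being peered through the gateway
    - vpc-01
    - vpc-02
  nat:
    masquerade:
      idleTimeout: 2m  # flow-table idle timeout (default discussed above)
```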