Proposal: Add AI-driven Network Anomaly Detection Plugin and OpenTelemetry Export for Pixie #2271

@vatankh

Background

Pixie provides deep, eBPF-based visibility into Kubernetes clusters, automatically capturing
network and application telemetry without manual instrumentation. However, while Pixie
offers powerful query and visualization capabilities (via PxL and Vizier), it currently lacks
a built-in mechanism for automated anomaly detection or OpenTelemetry-native export
of detected network irregularities.

As a result, operators cannot easily detect and correlate real-time operational anomalies
(such as unexpected service-to-service communication, latency spikes, or throughput drops)
from within Pixie’s observability workflow or in external telemetry pipelines.

Problem Statement

Existing open-source tools like Zeek or Suricata perform deep packet inspection but are not
optimized for the dynamic, container-based nature of cloud-native microservices. Pixie already
solves visibility at scale but does not yet provide AI-assisted detection or direct
integration with the OpenTelemetry ecosystem.

Proposed Solution

Introduce a lightweight, optional plugin for Pixie that performs operational anomaly detection
on network traffic metrics and exports the results through OpenTelemetry.

  1. AI-driven Anomaly Detection Layer

    • Implement a Pixie plugin or PxL script extension that computes simple
      streaming anomaly scores on traffic metrics (latency, request rate, error rate, byte count).
    • Techniques: EWMA, robust z-scores, Isolation Forest, or simple autoencoders
      (depending on available library support and compute limits).
    • Tag anomalies with metadata such as src_service, dst_service, namespace, and anomaly.score.
  2. OpenTelemetry Export Integration

    • Extend Pixie’s existing OpenTelemetry export capabilities to include these anomaly events
      as metrics or logs.
    • Allow configuration of anomaly thresholds and export frequency via Pixie’s plugin interface.
  3. Example Output

    - name: px.anomaly.network.latency_spike
      attributes:
        src_service: checkout
        dst_service: payment
        namespace: production
        anomaly.score: 0.94
      timestamp: 2025-11-11T12:00:00
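
To make the detection layer in item 1 more concrete, here is a minimal sketch of two of the listed techniques, an EWMA-based z-score and a robust (median/MAD) z-score, in plain Python. It is an illustration only, not existing Pixie code: the names EwmaDetector and robust_z, the smoothing factor, the warm-up length, and the sample latencies are all assumptions.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class EwmaDetector:
    """Streaming z-score against an exponentially weighted mean and variance."""
    alpha: float = 0.2     # smoothing factor (assumed value)
    warmup: int = 5        # samples to observe before scoring (assumed value)
    min_std: float = 1e-9  # floor to avoid dividing by zero on flat series
    mean: float = 0.0
    var: float = 0.0
    seen: int = 0

    def score(self, x: float) -> float:
        """Score x against the current baseline, then fold x into the baseline."""
        if self.seen == 0:
            self.mean, self.seen = x, 1
            return 0.0
        diff = x - self.mean
        z = 0.0
        if self.seen >= self.warmup:
            z = abs(diff) / max(math.sqrt(self.var), self.min_std)
        # Incremental exponentially weighted mean and variance updates.
        incr = self.alpha * diff
        self.mean += incr
        self.var = (1 - self.alpha) * (self.var + diff * incr)
        self.seen += 1
        return z


def robust_z(window, x):
    """Modified z-score of x using the median and MAD of a reference window."""
    med = statistics.median(window)
    mad = statistics.median(abs(v - med) for v in window)
    if mad == 0:
        return 0.0
    # 0.6745 rescales MAD so the score is comparable to a standard z-score.
    return 0.6745 * abs(x - med) / mad


# Hypothetical per-interval latencies (ms) for one service pair.
latencies = [100, 102, 98, 101, 99, 100, 101, 99, 100, 500]
detector = EwmaDetector()
ewma_scores = [detector.score(v) for v in latencies]
mad_score = robust_z(latencies[:-1], latencies[-1])
```

Both scores are unbounded, so a 0-to-1 value like the anomaly.score in the example output would need an additional normalization step that this sketch does not cover.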
    
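The threshold and export-frequency settings from item 2 could be applied before export roughly as follows. This sketch only shapes records like the example output above and does not call Pixie's plugin interface or any OpenTelemetry SDK; ExportConfig, to_event, select_for_export, due, and the default values are hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ExportConfig:
    """Hypothetical plugin settings; not an existing Pixie configuration."""
    score_threshold: float = 0.9  # minimum anomaly.score to export
    export_interval_s: int = 30   # minimum seconds between exports


def to_event(name, src_service, dst_service, namespace, score, ts):
    """Build a record in the shape of the example output."""
    return {
        "name": name,
        "attributes": {
            "src_service": src_service,
            "dst_service": dst_service,
            "namespace": namespace,
            "anomaly.score": round(score, 2),
        },
        "timestamp": ts.isoformat(),
    }


def select_for_export(scored, cfg):
    """Keep only observations whose score meets the configured threshold."""
    return [to_event(**obs) for obs in scored if obs["score"] >= cfg.score_threshold]


def due(now, last_export, cfg):
    """True when no export has happened yet or the interval has elapsed."""
    if last_export is None:
        return True
    return (now - last_export).total_seconds() >= cfg.export_interval_s


# Hypothetical scored observations for two service pairs.
now = datetime(2025, 11, 11, 12, 0, 0, tzinfo=timezone.utc)
scored = [
    {"name": "px.anomaly.network.latency_spike", "src_service": "checkout",
     "dst_service": "payment", "namespace": "production", "score": 0.94, "ts": now},
    {"name": "px.anomaly.network.latency_spike", "src_service": "cart",
     "dst_service": "catalog", "namespace": "production", "score": 0.31, "ts": now},
]
events = select_for_export(scored, ExportConfig())
```

In a real plugin the selected events would be handed to whatever OpenTelemetry exporter Pixie is configured with; the filtering and rate-limiting logic is the only part this sketch tries to show.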

Benefits

  • Enables real-time operational anomaly detection without additional instrumentation.

  • Bridges Pixie’s in-cluster visibility with the broader OpenTelemetry and AIOps ecosystem.

  • Provides actionable alerts and insights directly in the Pixie UI and external dashboards (Grafana, Datadog, etc.).

Scope & Alignment

  • Keeps focus on observability and performance analysis, not security or intrusion detection.

  • Aligns with the goal of improving AI-driven insights in Pixie’s roadmap.

  • Can be developed as an independent plugin, avoiding changes to Pixie’s core.
