Problem
Network workflows that place multiple demands repeatedly perform redundant SPF (Shortest Path First) computations. Each SPF computes paths to ALL nodes, but we only use the path to one destination, and many demands share the same source.
Affected workflows:
- MaximumSupportedDemand (MSD) - Binary search evaluates ~25 alpha values, placing all demands at each step
- TrafficMatrixPlacement - Monte Carlo runs N iterations, each placing all demands under different failure scenarios
- demand_placement_analysis - Core function used by the above and available for direct use
Production scenario example (2,733 nodes, 720 demands from 77 sources):
| Workflow | Iterations | SPF Calls (current) | SPF Calls (with caching) |
|---|---|---|---|
| MSD | 25 alphas × 1 seed | 18,000 | ~2,000 |
| TrafficMatrixPlacement | 10 MC iterations | 7,200 | ~800 |
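The counts follow directly from iterations × demands today versus iterations × unique sources with caching; a quick sanity check using the scenario numbers above:
```python
# Quick sanity check of the table above, using the production scenario numbers.
demands, sources = 720, 77  # total demands and unique source nodes

for workflow, iterations in [("MSD", 25), ("TrafficMatrixPlacement", 10)]:
    current = iterations * demands   # one SPF per demand per iteration
    cached = iterations * sources    # one SPF per unique source per iteration
    print(f"{workflow}: {current} -> {cached} SPF calls")

# Output:
# MSD: 18000 -> 1925 SPF calls
# TrafficMatrixPlacement: 7200 -> 770 SPF calls
```
The 1,925 and 770 line up with the ~2,000 and ~800 figures in the table; the per-iteration reduction is the 720/77 ≈ 9.3x reuse factor discussed under Design Considerations below.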
Root Cause
```python
for demand in demands:
    # Each call computes SPF internally
    policy.place_demand(flow_graph, src_id, dst_id, priority, volume)
```
The place_demand() API encapsulates the SPF computation, which is convenient but prevents reuse. With 720 demands from 77 unique sources, we compute 720 SPFs when 77 would suffice.
Proposed Solution
Cache SPF results by source node within each placement batch:
```python
spf_cache: dict[int, PredDAG] = {}

for demand in demands:
    if src_id not in spf_cache:
        spf_cache[src_id] = algorithms.spf(src=src_id, dst=None, ...)  # Full DAG
    dag = spf_cache[src_id]
    placed = flow_graph.place(flow_idx, src_id, dst_id, dag, volume, ...)

    # Fallback if shortest paths saturated
    if placed < volume:
        spf_cache[src_id] = algorithms.spf(src=src_id, dst=None, ...)
        # retry...
```
Key insight: we already have the necessary primitives:
- algorithms.spf(dst=None) returns the full predecessor DAG to all nodes
- flow_graph.place() accepts a pre-computed DAG
Implementation Locations
Option A: Per-workflow (minimal change)
Apply caching in each workflow step individually:
- maximum_supported_demand_step.py → _evaluate_alpha_cached()
- exec/analysis/flow.py → demand_placement_analysis()
Pros: Isolated changes, easy to test independently
Cons: Duplicated caching logic
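For concreteness, a minimal sketch of what the per-workflow caching could look like inside a single step; the helper name comes from the list above, but its signature, the demand attributes, and the algorithms/flow_graph calls are assumptions based on the snippets in this issue:
```python
# Hypothetical helper for maximum_supported_demand_step.py; signatures are assumed.
def _evaluate_alpha_cached(algorithms, flow_graph, demands, alpha):
    """Place all demands scaled by alpha, computing at most one SPF per source."""
    spf_cache = {}  # source node id -> predecessor DAG, valid only for this batch
    total_placed = 0.0
    for d in demands:
        if d.src_id not in spf_cache:
            spf_cache[d.src_id] = algorithms.spf(src=d.src_id, dst=None)
        placed = flow_graph.place(d.flow_idx, d.src_id, d.dst_id,
                                  spf_cache[d.src_id], d.volume * alpha)
        # Fallback recomputation on saturation omitted; see the Proposed Solution loop.
        total_placed += placed
    return total_placed
```
The cache is created fresh for every alpha evaluation, so no state is carried across evaluations (consistent with the design considerations below).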
Option B: Shared utility (cleaner)
Create a CachedDemandPlacer class in exec/analysis/:
```python
class CachedDemandPlacer:
    def __init__(self, algorithms, handle, flow_graph, node_mask, edge_mask):
        self.spf_cache: dict[int, PredDAG] = {}
        ...

    def place(self, src_id, dst_id, volume, priority) -> float:
        # Caching logic here
        ...

    def clear_cache(self):
        self.spf_cache.clear()
```
Pros: Single implementation, consistent behavior
Cons: New abstraction to maintain
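A slightly fuller sketch of how Option B's place() could combine the cache lookup with the fallback recomputation. The parameter list is simplified (handle, masks, and priority dropped for brevity) and the body just restates the Proposed Solution loop; it is not the final API:
```python
# Sketch of the Option B internals; parameter list simplified, behavior assumed.
class CachedDemandPlacer:
    def __init__(self, algorithms, flow_graph):
        self.algorithms = algorithms
        self.flow_graph = flow_graph
        self.spf_cache = {}  # source node id -> predecessor DAG

    def place(self, flow_idx, src_id, dst_id, volume) -> float:
        if src_id not in self.spf_cache:
            self.spf_cache[src_id] = self.algorithms.spf(src=src_id, dst=None)
        placed = self.flow_graph.place(flow_idx, src_id, dst_id,
                                       self.spf_cache[src_id], volume)
        if placed < volume:
            # Shortest paths saturated: refresh the DAG and place the remainder once more.
            self.spf_cache[src_id] = self.algorithms.spf(src=src_id, dst=None)
            placed += self.flow_graph.place(flow_idx, src_id, dst_id,
                                            self.spf_cache[src_id], volume - placed)
        return placed

    def clear_cache(self) -> None:
        self.spf_cache.clear()
```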
Option C: Core-level caching
Push caching into NetGraph-Core's FlowPolicy class.
Pros: Automatic for all callers
Cons: More complex C++ changes, cache lifetime management, harder to tune
Recommendation: Start with Option A for MSD (highest impact), then refactor to Option B if TrafficMatrixPlacement also shows significant benefit.
Design Considerations
Why not cache across alpha evaluations (MSD) or MC iterations?
Each evaluation starts with fresh residual capacities, and the optimal paths depend on what has already been placed. Cross-evaluation caching would require tracking edge usage, and that complexity is not worth the benefit since within-evaluation reuse is already high (720 demands / 77 sources ≈ 9.3x).
Why not cache across failure scenarios (TrafficMatrixPlacement)?
Each failure scenario has different excluded nodes/links. The network topology changes, so cached paths may be invalid. This would need a cache keyed by (source, exclusion_set), which is possible but adds complexity.
Future enhancement: For failure scenarios with small exclusion sets, we could potentially reuse cached DAGs and just filter out excluded edges at placement time (similar to how stale DAGs work with
residuals). This needs investigation.
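If that enhancement is pursued, the filtering itself could be cheap. The sketch below assumes, purely for illustration, a predecessor DAG represented as a dict from node id to (predecessor, edge id) pairs; NetGraph-Core's actual PredDAG layout may differ, and the correctness caveat in the docstring is exactly what needs investigation:
```python
# Illustration only: assumes dag = {node_id: [(pred_node_id, edge_id), ...]}.
def filter_dag(dag: dict[int, list[tuple[int, int]]],
               excluded_nodes: set[int],
               excluded_edges: set[int]) -> dict[int, list[tuple[int, int]]]:
    """Drop excluded nodes/edges from a cached DAG instead of recomputing SPF.

    Only safe when the surviving entries still describe valid shortest paths
    (e.g. small exclusion sets that do not change path costs); otherwise the
    failure scenario needs a fresh SPF.
    """
    filtered = {}
    for node, preds in dag.items():
        if node in excluded_nodes:
            continue
        kept = [(p, e) for p, e in preds
                if p not in excluded_nodes and e not in excluded_edges]
        if kept:
            filtered[node] = kept
    return filtered
```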
Why the fallback recomputation is essential
When ALL shortest paths from a source to a destination become saturated, placement fails. The fallback discovers alternative (longer) paths that have become optimal due to shortest-path saturation. Without
it, demands would fail even when capacity exists via non-shortest routes.
Thread safety for TrafficMatrixPlacement
Monte Carlo runs parallel iterations via ThreadPoolExecutor. Each iteration operates on independent FlowGraph instances with independent residuals. The SPF cache should be per-iteration, not shared, to avoid
cross-contamination between failure scenarios.
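A sketch of that structure, with hypothetical build_flow_graph / place_all_demands helpers standing in for the real iteration body; the point is only that the SPF cache is a local created inside each iteration, never shared across workers:
```python
from concurrent.futures import ThreadPoolExecutor

def run_iteration(scenario, demands, algorithms):
    flow_graph = build_flow_graph(scenario)  # independent FlowGraph and residuals
    spf_cache = {}                           # per-iteration cache, keyed by source
    return place_all_demands(algorithms, flow_graph, demands, spf_cache)

def run_monte_carlo(scenarios, demands, algorithms, workers=8):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_iteration, s, demands, algorithms)
                   for s in scenarios]
        return [f.result() for f in futures]
```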
Expected Impact
| Workflow | Before | After | Speedup |
|---|---|---|---|
| MSD (25 alphas) | 106s | ~36s | ~3x |
| TrafficMatrixPlacement (10 iters) | ~10s | ~4s | ~2.5x (estimated) |
| Combined scenario | 118s | ~42s | ~2.8x |
Alternatives Considered
Bidirectional Dijkstra: Explored for multipath mode. For dense graphs (degree ~133), both search frontiers explore most nodes before meeting, making it 58% slower in practice. It only helps for sparse graphs or single-destination queries.
Destination-aware early termination: Dijkstra can stop once the destination is settled, but for ECMP/multipath we need ALL equal-cost paths, which requires fuller exploration. Caching the full DAG also provides more value than early termination toward a single destination.