MoE Inference

Stop MoE inference from quietly inflating your GPU bill.

The hidden cost driver

Not all tokens are created equal.

Token counts tell you how much AI you're using. They don't tell you how expensive that usage is to run.

Tokens trigger real computation inside the model — routing, communication, and GPU coordination. That behavior, not token count alone, is what drives inference cost.

Dystrio turns token behavior into a clear cost decision.

Run one warmup and get the decision you're currently guessing at: change expert placement — or prove your baseline is safe.

  • See where MoE is silently costing you
  • Get a safety-checked APPLY vs KEEP verdict
  • Share one report to justify the decision with infra and product

Upload your Dystrio ZIP

Read-only. Generated locally. Deleted immediately after render.

Drop ZIP here or click to select
Accepts .zip files up to 1GB
1
Dystrio Collector (local CLI)

A lightweight, read-only collector that runs during a short warmup. It records MoE routing and expert co-activation patterns and outputs a ZIP.

Does not
  • Intercept traffic
  • Modify inference
  • Apply changes
  • Retain data
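To make "expert co-activation patterns" concrete: a minimal sketch of how such patterns can be derived from routing metadata alone (expert IDs per token, no prompts or weights). The record shape and function name here are illustrative assumptions, not Dystrio's actual format.

```python
# Hypothetical sketch: counting expert co-activation from per-token
# routing metadata. Each record is the list of expert IDs a token was
# routed to; no prompts, weights, or outputs are involved.
from collections import Counter
from itertools import combinations

def coactivation_counts(routing_records):
    """Count how often each pair of experts fires for the same token."""
    pairs = Counter()
    for experts in routing_records:
        # Deduplicate and sort so each unordered pair is counted once.
        for a, b in combinations(sorted(set(experts)), 2):
            pairs[(a, b)] += 1
    return pairs

# Example: four tokens, each routed to its top-2 experts.
records = [[0, 3], [3, 7], [0, 3], [1, 7]]
print(coactivation_counts(records)[(0, 3)])  # experts 0 and 3 co-fire twice
```

Experts that co-activate often are candidates for placement on the same node, which is what makes this metadata useful for a placement decision.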
2
Generate the ZIP

Package existing artifacts (no GPU):

dystrio analyze --run-dir artifacts/<run_id> --out dystrio.zip

Or run a full warmup (GPU required):

dystrio collect --model <model> --prompts <prompts.jsonl> \
  --run-id <run_id> --num-nodes <N> --gpus-per-node <G> \
  --out dystrio.zip

No prompts, weights, or model outputs — routing metadata only.

3
Get the decision report

One verdict (APPLY or KEEP), safety gates, and estimated cross-node traffic impact.

Does

  • Analyze MoE routing patterns
  • Return a safety-checked verdict
  • Default to KEEP unless proven better

Does not

  • Host models or serve tokens
  • Benchmark public APIs
  • Apply changes automatically
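The "estimated cross-node traffic impact" above can be sketched in a few lines: given a placement mapping experts to nodes and the recorded per-token routes, count the fraction of expert activations that land off the token's home node. The names and the single-home-node simplification are assumptions for illustration, not Dystrio's model.

```python
# Hypothetical sketch: estimating cross-node traffic from routing
# metadata and an expert placement. An activation is "cross-node" when
# the chosen expert lives on a different node than the token's home.
def cross_node_fraction(routes, placement, home_node=0):
    """routes: list of expert-ID lists per token; placement: expert -> node."""
    total = cross = 0
    for experts in routes:
        for e in experts:
            total += 1
            if placement[e] != home_node:
                cross += 1
    return cross / total if total else 0.0

# Experts 0-1 on node 0, experts 2-3 on node 1.
placement = {0: 0, 1: 0, 2: 1, 3: 1}
routes = [[0, 2], [1, 3], [0, 1]]
print(cross_node_fraction(routes, placement))  # 2 of 6 activations cross nodes
```

Comparing this fraction under the current placement versus a proposed one is the shape of the APPLY-vs-KEEP comparison: if the proposed placement does not provably lower it, the safe default is KEEP.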