Stop MoE inference from quietly inflating your GPU bill.
Not all tokens are created equal.
Token counts tell you how much AI you're using. They don't tell you how expensive that usage is to run.
Tokens trigger real computation inside the model — routing, communication, and GPU coordination. That behavior, not token count alone, is what drives inference cost.
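To make the point concrete, here is a toy model (all sizes and the routing layout are made up, not measured numbers) of why two batches with identical token counts can have very different network cost: a token whose routed experts live on another node ships its hidden state across the interconnect, while a locally routed token ships nothing.

```python
def crossnode_bytes(routes, expert_node, hidden_bytes=14336):
    """Estimate cross-node activation traffic for a batch of tokens.

    routes: list of (token_node, routed_expert_ids) per token.
    expert_node: mapping expert id -> node id hosting that expert.
    hidden_bytes: bytes per dispatched hidden state (made-up size).
    Toy illustration only, not a real profiler.
    """
    total = 0
    for token_node, experts in routes:
        for e in experts:
            if expert_node[e] != token_node:
                # Hidden state crosses the network to reach this expert.
                total += hidden_bytes
    return total

# Experts 0-1 on node 0, experts 2-3 on node 1.
expert_node = {0: 0, 1: 0, 2: 1, 3: 1}
# Same token count, different routing:
local = [(0, (0, 1)), (0, (0, 1))]   # all experts local -> 0 bytes
remote = [(0, (2, 3)), (0, (2, 3))]  # all experts remote
print(crossnode_bytes(local, expert_node))
print(crossnode_bytes(remote, expert_node))
```

Two tokens either way, but the second batch pays for four cross-node transfers. That gap is what expert placement tries to close.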
Run one warmup and get the decision you're currently guessing at: change expert placement, or prove your baseline is already safe.

- See where MoE is silently costing you
- Get a safety-checked APPLY vs KEEP verdict
- Share one report to justify the decision with infra and product
Upload your Dystrio ZIP
Read-only. Generated locally. Deleted immediately after render.
A lightweight, read-only collector that runs during a short warmup. It records MoE routing and expert co-activation patterns and outputs a ZIP.
The collector does not:
- Intercept traffic
- Modify inference
- Apply changes
- Retain data
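The co-activation idea above can be sketched in a few lines: count how often pairs of experts fire together on the same token. This is an illustration of the concept only; the input shape and Dystrio's actual artifact format are assumptions here.

```python
from collections import Counter
from itertools import combinations

def coactivation_counts(topk_experts):
    """Count how often expert pairs fire together on the same token.

    topk_experts: per-token tuples of routed expert IDs, e.g. the
    top-2 experts a router selected for each token. (Hypothetical
    input shape for illustration.)
    """
    pairs = Counter()
    for experts in topk_experts:
        # Normalize so (3, 0) and (0, 3) count as the same pair.
        for a, b in combinations(sorted(set(experts)), 2):
            pairs[(a, b)] += 1
    return pairs

# Four tokens, each routed to its top-2 experts:
routes = [(0, 3), (3, 0), (1, 3), (0, 3)]
print(coactivation_counts(routes))
```

Pairs that co-fire frequently are natural candidates to co-locate on the same node, which is the signal a placement decision is built on.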
Package existing artifacts (no GPU):
dystrio analyze --run-dir artifacts/<run_id> --out dystrio.zip
Or run a full warmup (GPU required):
dystrio collect --model <model> --prompts <prompts.jsonl> \
--run-id <run_id> --num-nodes <N> --gpus-per-node <G> \
--out dystrio.zip
No prompts, weights, or model outputs — routing metadata only.
One verdict (APPLY or KEEP), safety gates, and estimated cross-node traffic impact.
What Dystrio does
- Analyze MoE routing patterns
- Return a safety-checked verdict
- Default to KEEP unless proven better
What Dystrio does not do
- Host models or serve tokens
- Benchmark public APIs
- Apply changes automatically
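The "default to KEEP unless proven better" behavior is a safety-gated decision rule. A minimal sketch, assuming a single traffic metric, a pass/fail gate flag, and a 10% improvement threshold (all hypothetical values, not Dystrio's actual internals):

```python
def verdict(baseline_traffic_gb, candidate_traffic_gb,
            gates_passed, min_improvement=0.10):
    """Return 'APPLY' or 'KEEP', defaulting to KEEP.

    Hypothetical rule: apply the new expert placement only if every
    safety gate passed AND it cuts estimated cross-node traffic by
    at least min_improvement (a fraction of baseline).
    """
    if not gates_passed:
        return "KEEP"  # any failed gate vetoes the change
    improvement = (baseline_traffic_gb - candidate_traffic_gb) / baseline_traffic_gb
    return "APPLY" if improvement >= min_improvement else "KEEP"

print(verdict(120.0, 96.0, gates_passed=True))   # 20% less traffic
print(verdict(120.0, 115.0, gates_passed=True))  # ~4%: below threshold
print(verdict(120.0, 60.0, gates_passed=False))  # gate failure vetoes
```

The asymmetry is deliberate: a wrong KEEP costs some unrealized savings, while a wrong APPLY degrades production inference, so the burden of proof sits on the change.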