Structural Compiler for Open-Weight LLMs

Smaller models. Same API. Less compute.

Dystrio structurally compiles open-weight LLMs into smaller, faster, dense checkpoints. Standard Transformers format — loads in vLLM, TGI, llama.cpp, and any HuggingFace-compatible stack.

No Custom Kernels · No Runtime Changes · No Retraining
NVIDIA Inception Program Member

Available now on Hugging Face

Drop-in replacements for popular open-weight models. Every tier is a standard dense checkpoint — swap the model ID and go.

Mistral 7B Instruct v0.3
Tier                  Size     PPL ratio  Prefill TPS     TTFT p95
sculpt-default        12.0 GB  0.923      11,594 (+10%)   123 ms (−8%)
sculpt-production     11.3 GB  1.134      12,094 (+15%)   121 ms (−9%)
sculpt-throughput     10.4 GB  1.297      12,667 (+20%)   113 ms (−15%)
sculpt-experimental    9.6 GB  1.996      13,596 (+29%)   110 ms (−17%)
Baseline: 13.5 GB · PPL 12.60 · 10,557 prefill TPS · 133 ms TTFT p95 · A100 80GB, bf16
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "dystrio/Mistral-7B-Instruct-v0.3-sculpt-production",
    torch_dtype="bfloat16",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(
    "dystrio/Mistral-7B-Instruct-v0.3-sculpt-production"
)
```
Works with vLLM · TGI · SGLang · llama.cpp · AWQ · GPTQ · GGUF — no code changes.

Automated structural compilation

1. Fingerprint: Detect architecture family, layer geometry, and MLP type. Automatic support for Llama, Mistral, Qwen, Phi, and Gemma.
2. Compile: Analyze layer-level structure, rewrite model dimensions for target efficiency, and stabilize with calibration and quality guardrails.
3. Benchmark: Measure perplexity, prefill TPS, decode TPS, and TTFT across workloads. Every tier is benchmarked automatically.
4. Publish: Emit a standard HuggingFace checkpoint with a model card and benchmark table, ready to deploy.
The factory runs end-to-end on a single GPU. New models are compiled within days of release.
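The four stages above can be sketched as a plain pipeline. This is an illustrative skeleton only: every name here (`fingerprint`, `compile_tier`, the stand-in metrics) is hypothetical and not Dystrio's actual API.

```python
# Illustrative sketch of the four-stage factory loop; all names are hypothetical.
from dataclasses import dataclass

SUPPORTED_FAMILIES = {"llama", "mistral", "qwen", "phi", "gemma"}

@dataclass
class Fingerprint:
    family: str
    num_layers: int

def fingerprint(config: dict) -> Fingerprint:
    # Stage 1: detect architecture family and layer geometry from the config.
    family = config["model_type"]
    if family not in SUPPORTED_FAMILIES:
        raise ValueError(f"unsupported family: {family}")
    return Fingerprint(family, config["num_hidden_layers"])

def compile_tier(fp: Fingerprint, size_ratio: float) -> dict:
    # Stage 2: rewrite model dimensions toward a target size ratio.
    return {"family": fp.family, "layers": fp.num_layers, "size_ratio": size_ratio}

def benchmark(artifact: dict) -> dict:
    # Stage 3: stand-in metric; the real factory measures PPL, TPS, and TTFT.
    return {"ppl_ratio": round(0.9 / artifact["size_ratio"], 3)}

def publish(artifact: dict, metrics: dict) -> dict:
    # Stage 4: emit the checkpoint plus a model card carrying the benchmarks.
    return {**artifact, "metrics": metrics}

fp = fingerprint({"model_type": "mistral", "num_hidden_layers": 32})
artifact = compile_tier(fp, 0.85)
card = publish(artifact, benchmark(artifact))
```

Each stage consumes only the previous stage's output, which is what lets the factory run unattended end to end.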

Mistral 7B — sculpt-default tier

Smaller. Faster. Dense. Drop-in. Full downstream benchmarks →

−11% model size: 13.5 GB → 12.0 GB
0.923 PPL ratio vs. baseline perplexity
+10% prefill throughput: 10,557 → 11,594 TPS
−8% TTFT p95: 133 ms → 123 ms
Benchmarked on A100-SXM4-80GB, bf16, deterministic mode. Single-GPU, standard HuggingFace Transformers. No custom kernels. Full benchmark tables on each model card.
Sculpt
Structural Inference Recompilation

Models allocate uniform width across every layer regardless of actual activation demand. Sculpt measures that demand, physically rewrites MLP dimensions, stabilizes the result, and emits a standard dense model. Not masking. Not sparse pruning. Tensor shape recompilation.
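To make "tensor shape recompilation" concrete, here is a minimal toy of the idea on a single gated MLP block: score each intermediate channel by measured activation demand, then physically slice the weight tensors so the result is a smaller dense MLP. This is an illustration of the general technique under made-up dimensions and a crude gating function, not Sculpt's actual method.

```python
# Toy structural width rewrite on one gate/up/down MLP block (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff = 64, 256

W_gate = rng.normal(size=(d_ff, d_model))
W_up = rng.normal(size=(d_ff, d_model))
W_down = rng.normal(size=(d_model, d_ff))

# Measure activation demand: mean |activation| per intermediate channel
# over a small calibration batch.
X = rng.normal(size=(512, d_model))
act = (X @ W_gate.T) * np.maximum(X @ W_up.T, 0)  # crude gated activation
demand = np.abs(act).mean(axis=0)                  # one score per channel

# Keep the top 75% of channels and physically slice the tensors.
# The output is a narrower *dense* MLP — no masks, no sparse formats.
keep = np.sort(np.argsort(demand)[-int(d_ff * 0.75):])
W_gate2, W_up2, W_down2 = W_gate[keep], W_up[keep], W_down[:, keep]
```

After slicing, the block's config simply records the new intermediate size, which is why the artifact stays a standard dense checkpoint.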

Quantization and structural recompilation are complementary — Sculpt composes with AWQ, GPTQ, GGUF, and existing optimization pipelines for compound savings.

Mistral-7B · Production tier
17% smaller. 15% faster prefill. Drop-in replacement.
+15% prefill throughput · ~17% weight reduction · dense, standard HF artifact
Mistral-7B · Throughput tier
Maximum usable compression for speed and edge deployment.
+20% prefill throughput · ~23% weight reduction · dense, standard HF artifact
Full-model recompilation. No runtime modifications. No sparse kernels. No serving stack changes.
Forge
Expert Placement for MoE Deployments

For teams running Mixture-of-Experts models across multi-node GPU clusters. Forge observes expert routing patterns, builds a co-activation graph, and co-locates frequently activated experts to reduce cross-node communication. Same model. Same stack. Fewer hops.

Forge includes a prescriptive decision gate — it tells you whether placement will help for your specific workload, and recommends against it when it won't. 5/5 correct predictions across adversarial workloads.
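The co-activation idea can be shown in miniature: count how often pairs of experts fire together, then greedily place heavy pairs on the same node. The routing traces, node counts, and greedy policy below are all invented for illustration; Forge itself observes real routing in production and emits a placement artifact.

```python
# Toy Forge-style placement from made-up routing traces (illustrative only).
from collections import Counter
from itertools import combinations

# Each trace: the set of experts a token's router activated together.
traces = [{0, 3}, {0, 3}, {0, 3}, {1, 2}, {1, 2}, {0, 2}, {1, 3}]

# Co-activation graph: edge weight = how often two experts fire together.
coact = Counter()
for t in traces:
    for a, b in combinations(sorted(t), 2):
        coact[(a, b)] += 1

# Greedy co-location: walk edges heaviest-first, preferring the node that
# already hosts one endpoint, while respecting per-node expert capacity.
num_nodes, capacity = 2, 2
placement, load = {}, Counter()
for (a, b), _ in coact.most_common():
    target = placement.get(a)
    if target is None:
        target = placement.get(b)
    if target is None:
        target = min(range(num_nodes), key=lambda n: load[n])
    for e in (a, b):
        if e not in placement:
            if load[target] >= capacity:  # spill to the least-loaded node
                target = min(range(num_nodes), key=lambda n: load[n])
            placement[e] = target
            load[target] += 1
```

Experts that frequently co-fire end up on the same node, so their joint activations stop crossing the interconnect.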

Multi-node · 8× A100 · 2 nodes
Validated on A100-SXM4 and H100 multi-node clusters with vLLM.
+3.5% throughput · −4.1% P95 tail latency · −86% throughput variance
Single node · 4× A100 NVLink
Even on NVLink, structure matters.
−30% P95/P99 tail latency · 17× throughput stability · +2.7% throughput under skew
Read-only observation. Output is a placement artifact you apply at deploy time. No runtime changes. Contact us about MoE optimization for your cluster.

Reshape the model. Not the stack.

Platform agnostic

The output is a model, not a runtime. Works wherever models work.

Composable

Structural optimization before quantization, fine-tuning, and serving. Compounds with every downstream step.

Prescriptive

Quantifies expected gain and recommends whether to apply. When optimization won't help, Dystrio tells you.

Portable

Standard safetensors, standard config. Zero pipeline modification.

PyTorch · vLLM · SGLang · TGI · llama.cpp · TensorRT-LLM · AWQ · GPTQ · GGUF · LoRA · Kubernetes

Try a smaller model today.

Browse compressed checkpoints on Hugging Face — or talk to us about structural compilation for your model fleet.