moco automatically converts existing classifiers into routed cascaded systems that avoid unnecessary computation while preserving model behavior.
Most production ML systems send every input through the full model — even when many predictions are trivial to classify.
moco analyzes model behavior and automatically routes easy predictions through cheaper decision paths while reserving full inference for difficult cases.
- Reduce unnecessary inference compute.
- Reduce energy consumed per prediction.
- Process more requests on existing hardware.
I will analyze historical inference logs, model outputs, or representative datasets to identify unnecessary model computation and estimate potential savings.
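moco's actual audit logic is not shown here, but the core estimate can be sketched under a simple assumption: if the logs include a per-prediction confidence score, the fraction of high-confidence ("easy") predictions bounds how much compute a cheap path could absorb. The function name, threshold, and cost ratio below are illustrative, not part of the moco API.

```python
import numpy as np

def estimate_savings(confidences: np.ndarray, threshold: float = 0.95,
                     cheap_cost_ratio: float = 0.05) -> float:
    """Estimate the fraction of total inference compute that could be
    avoided if every prediction with confidence >= threshold were served
    by a cheap decision path instead of the full model.

    cheap_cost_ratio: cost of the cheap path relative to full inference.
    """
    easy_fraction = float(np.mean(confidences >= threshold))
    # Easy inputs cost cheap_cost_ratio instead of 1.0 full-model unit,
    # so each one saves (1 - cheap_cost_ratio) of a full inference.
    return easy_fraction * (1.0 - cheap_cost_ratio)

# Synthetic example: two of four predictions clear the threshold.
confidences = np.array([0.99, 0.50, 0.97, 0.60])
print(estimate_savings(confidences))  # 0.475
```

Real audits would also weigh per-class accuracy of the cheap path; this sketch only prices the routing itself.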
The audit evaluates:
- License to Self-Serve API: you pay only based on the value it provides you (50% of first-year cost savings).
- Self-Serve API + Support: you pay only based on the value it provides you (50% of first-year cost savings).
- Enterprise: I will quantize and prune your models and apply my algorithms to build cascaded systems (50% of first-year cost savings).
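To make the pricing concrete, here is the arithmetic with hypothetical numbers (the bill and reduction below are made up for illustration, not a quote):

```python
# Hypothetical: a $200,000/year inference bill and a 30% compute reduction.
annual_inference_cost = 200_000
compute_reduction = 0.30

first_year_savings = annual_inference_cost * compute_reduction  # $60,000
fee = 0.5 * first_year_savings                                  # $30,000
net_benefit_year_one = first_year_savings - fee                 # $30,000
```

From year two onward, under this model the full savings accrue to you.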
The Product API docs are here.
```python
import numpy as np
import moco

# Mine routing rules from a representative dataset and the model's predictions.
rules = moco.analyze(dataset, predictions)  # both np.ndarray

# Keep only rules that route inputs predicted as classes 0, 1, or 2.
# (`class` is a Python keyword, so the attribute uses a trailing underscore.)
selected_rules = [rule for rule in rules if rule.class_ in (0, 1, 2)]

# Assemble the routed cascade around the original model.
optimized_model = moco.build_cascaded_system(original_model, selected_rules)
```
moco can be applied in combination with quantized or pruned models. Quantization and pruning alike risk accuracy degradation, and accepting lower accuracy to satisfy cost or energy constraints is a tough place to be in. Both also typically require re-training, and thus significant development cost.
Cascaded systems are great! Why evaluate a complex ML model if a simple rule will suffice? moco scales this idea: it automatically generates these rules and builds the routed system for you.
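The routing idea can be sketched in a few lines. Everything below is an illustrative stand-in (the rule, the threshold, and the placeholder model are not moco internals): a cheap rule answers when it is confident, and the expensive model runs only otherwise.

```python
import numpy as np

def cascaded_predict(x, cheap_rule, full_model, confidence_threshold=0.9):
    """Route: try the cheap rule first; fall back to the full model
    only when the rule is not confident enough."""
    label, confidence = cheap_rule(x)
    if confidence >= confidence_threshold:
        return label          # easy case: the full model never runs
    return full_model(x)      # hard case: pay for full inference

# Illustrative stand-ins:
def brightness_rule(x):
    # Hypothetical rule: very dark images almost always belong to class 0.
    return (0, 0.99) if float(np.mean(x)) < 0.1 else (None, 0.0)

def expensive_model(x):
    return 1  # placeholder for full-model inference

print(cascaded_predict(np.zeros((8, 8)), brightness_rule, expensive_model))  # 0
print(cascaded_predict(np.ones((8, 8)), brightness_rule, expensive_model))   # 1
```

The savings come from how often the first branch fires, which is exactly what the audit measures.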
Adding more CPUs/GPUs, or migrating to faster ones, to meet compute needs is expensive and disruptive. Delaying that until next quarter is a win.
| Task | Dataset | Base Model | Fine-Tuned Model | Compute Reduction | Accuracy Change |
|---|---|---|---|---|---|
| Image Classification | CIFAR-10 | ResNet-18 | HF Model finetuned on CIFAR-10 | 34.6% | -0.3% |
| Text Classification | IMDB Reviews | BERT | Fine-Tuned TinyBERT | 21.5% | +0.1% |
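The compute-reduction figures translate directly into throughput headroom on fixed hardware: if each request needs only (1 - reduction) of the original compute, the same machines serve 1 / (1 - reduction) times the traffic. Using the CIFAR-10 row above:

```python
compute_reduction = 0.346  # CIFAR-10 / ResNet-18 result from the table

# Each request now consumes (1 - 0.346) = 0.654 of the original compute,
# so existing hardware can absorb proportionally more requests.
throughput_multiplier = 1.0 / (1.0 - compute_reduction)
print(f"{throughput_multiplier:.2f}x")  # 1.53x
```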
Evaluation Steps