Mathematical optimization to improve computational efficiency of classifiers
Problem
Serving high-throughput machine learning models requires a large number of GPUs, costing organizations upwards of $100,000 per year.
Solution
moco is a mathematical optimization library whose algorithms analyze input data, model embeddings, and weight matrices to optimize machine learning models, shrinking computational load by reducing floating-point operations (FLOPs). The benefits include higher throughput, fewer required GPUs, fewer SLA violations by meeting strict latency requirements more consistently, the ability to shift workloads from the cloud to the device, a smaller energy footprint, longer battery life, and increased GPU headroom.
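moco's actual algorithms are not described here, but one classic illustration of how analyzing a weight matrix can reduce FLOPs is a truncated-SVD (low-rank) factorization of a dense layer: replacing an m-by-n matrix W with factors of rank r changes the per-input cost from m*n to r*(m + n) multiply-adds. A minimal sketch, with all shapes and names chosen purely for illustration:

```python
# Illustrative only: not moco's algorithm. A dense layer's weight matrix W
# (m x n) is factored as A @ B with A (m x r) and B (r x n), so computing
# A @ (B @ x) costs r*(m + n) multiply-adds instead of m*n.
import numpy as np

def low_rank_factorize(W, rank):
    """Return (A, B) with A @ B approximating W at the given rank."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # m x r, singular values folded in
    B = Vt[:rank, :]             # r x n
    return A, B

rng = np.random.default_rng(0)
# A weight matrix that happens to be exactly rank 16, so the
# factorization is lossless; real trained layers are only nearly low-rank.
W = rng.standard_normal((256, 16)) @ rng.standard_normal((16, 512))
A, B = low_rank_factorize(W, rank=16)

x = rng.standard_normal(512)
dense_flops = W.shape[0] * W.shape[1]            # cost of W @ x
factored_flops = 16 * (W.shape[0] + W.shape[1])  # cost of A @ (B @ x)
print(f"FLOPs per input: {dense_flops} -> {factored_flops} "
      f"({1 - factored_flops / dense_flops:.1%} reduction)")
print("max abs error:", np.abs(W @ x - A @ (B @ x)).max())
```

For a genuinely low-rank layer like this one the output is unchanged up to floating-point noise; on real models the rank (and therefore the FLOPs/accuracy trade-off) has to be chosen per layer.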
Consulting
I work directly with teams to analyze their ML inference pipelines and optimize models to reduce compute usage and latency.
Pricing is performance-based: you pay 50% of the verified infrastructure savings.
Example: If optimization reduces compute costs by $100k per year, the consulting fee is $50k.
Product
A software product based on the same optimization framework is currently in development. If you are interested in early access or piloting the system, please reach out.
Use Cases by Industry
Benchmarked Results
| Dataset | Latency Improvement (Sequential) | Latency Improvement (Raced) | Latency Improvement (Best) | FLOPs Reduction | Accuracy Change (absolute) |
|---|---|---|---|---|---|
| MNIST 8x8 Optical Character Recognition | 50.8% | 14.3% | 50.8% | 84.3% | -0.0017 |
| Iris Flower Classification | 26.6% | 22.5% | 26.6% | 66.9% | 0 |
| Credit Card Fraud Detection | 11.7% | 58.5% | 58.5% | 31.9% | -3.5e-06 |
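The table separates sequential and raced execution. Assuming "Raced" refers to the common pattern of running the baseline and optimized models concurrently and returning whichever prediction finishes first (an interpretation, not something the table states), a minimal sketch:

```python
# Hypothetical sketch of "raced" inference: run candidate models
# concurrently and keep the first result to arrive. model_slow and
# model_fast are stand-ins for a baseline and an optimized model;
# neither is part of moco.
import time
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

def model_slow(x):
    time.sleep(0.2)    # stand-in for the unoptimized model's latency
    return x * 2

def model_fast(x):
    time.sleep(0.01)   # stand-in for the optimized model's latency
    return x * 2

def raced_predict(x, models):
    """Submit x to every model and return the first completed prediction."""
    pool = ThreadPoolExecutor(max_workers=len(models))
    futures = [pool.submit(m, x) for m in models]
    done, _ = wait(futures, return_when=FIRST_COMPLETED)
    pool.shutdown(wait=False, cancel_futures=True)  # don't block on losers
    return done.pop().result()

start = time.perf_counter()
y = raced_predict(21, [model_slow, model_fast])
elapsed = time.perf_counter() - start
print(y, f"answered in {elapsed * 1000:.0f} ms")
```

Racing trades extra compute for lower tail latency, which would explain why the raced column can beat the sequential one on some workloads and not others.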