Mathematical Optimization to Improve Computational Efficiency of Classifiers
Problem
Machine learning systems that sustain high request throughput at low latency require a large number of GPUs.
GPU inference costs are sky-high: they eat into runway for start-ups and into revenue margins for more established companies, and for ML systems serving users, especially at analytics companies, they often exceed a million dollars. Compounding the problem, GPUs commonly run at only 30-40% utilization, which inflates those costs further.
Solution
moco is a mathematical optimization library that optimizes machine learning models for computational efficiency. By reducing the floating-point operations (FLOPs) a model performs, it shrinks the model's computational load, so companies can serve higher throughput on fewer GPUs. A representative FLOPs-reduction technique is sketched below.
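moco's internal method isn't detailed in this section, so the sketch below uses magnitude pruning as a stand-in: one well-known way to cut a classifier's FLOPs while monitoring the accuracy cost. The scikit-learn model, the Iris data, and the 70% pruning ratio are illustrative assumptions, not moco's actual algorithm.

```python
# Illustrative sketch: magnitude pruning stands in for moco's (unspecified)
# optimization method. Zeroed weights perform no useful work, so a sparse
# or compiled runtime can skip those FLOPs.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X_tr, X_te, y_tr, y_te = train_test_split(*load_iris(return_X_y=True), random_state=0)

model = MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000, random_state=0)
model.fit(X_tr, y_tr)
baseline_acc = model.score(X_te, y_te)

# Zero out the smallest-magnitude 70% of each weight matrix (assumed ratio).
for W in model.coefs_:
    cutoff = np.quantile(np.abs(W), 0.7)
    W[np.abs(W) < cutoff] = 0.0

kept = sum(int((W != 0).sum()) for W in model.coefs_)
total = sum(W.size for W in model.coefs_)
print(f"accuracy: {baseline_acc:.4f} -> {model.score(X_te, y_te):.4f}")
print(f"nonzero weights kept: {kept}/{total} ({kept / total:.0%})")
```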
moco also reduces the average latency of queries running through optimized models, which in turn reduces latency-based penalties (e.g., SLA violations).
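A back-of-envelope sketch of how a latency reduction translates into fewer SLA breaches. The lognormal latency distribution, the 100 ms SLA threshold, and the reuse of the 50.8% improvement figure from the benchmarks below are all illustrative assumptions.

```python
# Toy model: scale a simulated latency distribution by a speedup factor
# and count the fraction of queries that breach the SLA. All numbers
# here are illustrative, not measured.
import numpy as np

rng = np.random.default_rng(0)
latencies_ms = rng.lognormal(mean=np.log(40), sigma=0.5, size=100_000)
sla_ms = 100.0  # assumed SLA threshold

for label, factor in [("baseline", 1.0), ("optimized (-50.8% latency)", 0.492)]:
    scaled = latencies_ms * factor
    print(f"{label}: p50={np.percentile(scaled, 50):.1f} ms, "
          f"p99={np.percentile(scaled, 99):.1f} ms, "
          f"SLA breaches={np.mean(scaled > sla_ms):.2%}")
```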
Use Cases by Industry
Benchmarked Results
| Dataset | Latency Improvement (Sequential) | Latency Improvement (Raced) | Latency Improvement (Best) | FLOPs Reduction | Accuracy Change (absolute) |
|---|---|---|---|---|---|
| MNIST 8x8 Optical Character Recognition | 50.8% | 14.3% | 50.8% | 84.3% | -0.0017 |
| Iris Flower Classification | 26.6% | 22.5% | 26.6% | 66.9% | 0 |
| Credit Card Fraud Detection | 11.7% | 58.5% | 58.5% | 31.9% | -3.5e-06 |
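For readers who want to sanity-check latency figures on their own hardware, below is a minimal timing harness on the MNIST 8x8 dataset (scikit-learn's digits). This is our own sketch, not moco's benchmark code; it measures single-query predict() latency only and does not reproduce the sequential vs. raced execution modes from the table.

```python
# Minimal per-query latency benchmark on MNIST 8x8 (scikit-learn "digits").
import time
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X_tr, X_te, y_tr, y_te = train_test_split(*load_digits(return_X_y=True), random_state=0)
model = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=0).fit(X_tr, y_tr)

def mean_latency_us(clf, X, repeats=200, batch=25):
    """Average single-query predict() latency in microseconds."""
    queries = X[:batch]
    start = time.perf_counter()
    for _ in range(repeats):
        for x in queries:
            clf.predict(x.reshape(1, -1))
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(queries)) * 1e6

print(f"accuracy: {model.score(X_te, y_te):.4f}")
print(f"mean latency: {mean_latency_us(model, X_te):.1f} us/query")
```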