Computationally Efficient ML: Inference Optimization

~30% fewer FLOPs at inference — reduce energy and latency with no accuracy loss, via a plug-and-play library.
Optimization library

Mathematical optimization to improve computational efficiency of classifiers

~30% Fewer FLOPs • Higher Throughput • Smaller GPU Fleets

Problem

Machine learning systems that sustain high request throughput at low latency require large numbers of GPUs.

GPU inference costs are sky-high: they cut into runway for start-ups and revenue margins for more established companies. For ML systems serving users, especially at analytics companies, they often exceed a million dollars. Compounding the problem, GPUs typically run at only 30-40% utilization, which further inflates the effective cost.
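To make the utilization point concrete, here is a minimal back-of-envelope sketch (the hourly rate is an illustrative assumption, not a quoted price): at 35% utilization, each productive GPU-hour effectively costs almost three times the billed rate.

```python
def effective_cost_per_useful_hour(hourly_rate: float, utilization: float) -> float:
    """Cost attributable to useful work when the GPU idles the rest of the time."""
    return hourly_rate / utilization

# A GPU billed at $2.50/hr but busy only 35% of the time effectively
# costs over $7 per productive hour.
cost = effective_cost_per_useful_hour(2.50, 0.35)
```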

Solution

moco is a mathematical optimization library that optimizes machine learning models for computational efficiency. By reducing floating-point operations (FLOPs), it shrinks the computational load per query, so companies can serve higher throughput on fewer GPUs.

moco also reduces the average latency of queries running through optimized models, which reduces latency-based penalties (e.g., SLA violations).

Use Cases by Industry

Finance
Cybersecurity
  • Increase throughput for high-volume attacks (e.g., DDoS).
  • Reduce latency for fast-propagating attacks (e.g., malware).
Energy
  • Predictive maintenance for smart grid sensors.
  • Battery charge/discharge forecasting.
  • Fault detection.
Security
  • Real-time threat detection in images and live video streams.
Defense
  • Low-latency, edge-based threat detection for constrained systems (e.g., naval/subsurface platforms).

Benchmarked Results

| Dataset                                  | Latency Improvement (Seq) | Latency Improvement (Raced) | Latency Improvement | FLOPs Reduction | Accuracy Change |
|------------------------------------------|---------------------------|-----------------------------|---------------------|-----------------|-----------------|
| MNIST 8x8 Optical Character Recognition  | 50.8%                     | 14.3%                       | 50.8%               | 84.3%           | -0.0017         |
| Iris Flower Classification               | 26.6%                     | 22.5%                       | 26.6%               | 66.9%           | 0               |
| Credit Card Fraud Detection              | 11.7%                     | 58.5%                       | 58.5%               | 31.9%           | -3.5e-06        |

Notebooks and Reports

Natural Language Processing
Tabular Models
Computer Vision
Audio
  • Client Deployment — Audio Classification Optimization

    Reduced GPU hours by 13% for an audio event detection pipeline (fine-tuned ResNet50, PyTorch), with no measurable accuracy degradation.

    Ideal for: call-center analytics, voice moderation, edge audio monitoring.