ML Inference Optimization for Low-Latency, Low-Energy Models

~30% fewer FLOPs at inference: reduce energy and latency with no accuracy loss, as a plug-and-play library.
A mathematical optimization library that reduces inference latency

~30% fewer FLOPs • Lower energy • Lower latency

Problem

Classification models (models that make decisions, categorize data, recognize intent, etc.) must meet inference latency, throughput, and energy requirements and SLAs before they can be deployed in real-time embedded systems.

Engineers address these requirements through iterative tuning and must ultimately make informed trade-offs between accuracy, cost, and latency.

Solution

moco is a mathematical optimization library that analyzes a model’s input data and predictions and derives decision rules. Those rules replace model inference over portions of the input space, so computation is avoided for part of the data and the resulting models use less energy and run at lower latency, as sketched below.
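A minimal sketch of the idea in Python with scikit-learn; the threshold rule and the predict_optimized helper are hypothetical illustrations of the technique, not moco’s actual API:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Hypothetical illustration, not moco's actual API: derive a simple
# threshold rule from the data, then skip model inference wherever
# the rule applies.

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

# "Derived rule": in this dataset, petal length (feature 2) below 2.5 cm
# always corresponds to class 0 (setosa), so inference can be skipped there.
RULE_FEATURE, RULE_THRESHOLD, RULE_CLASS = 2, 2.5, 0

def predict_optimized(X):
    """Apply the rule where it covers the input; fall back to the model."""
    preds = np.empty(len(X), dtype=int)
    covered = X[:, RULE_FEATURE] < RULE_THRESHOLD
    preds[covered] = RULE_CLASS                    # no model FLOPs spent here
    preds[~covered] = model.predict(X[~covered])   # model runs only on the rest
    return preds

# Predictions match the unoptimized model while ~1/3 of inputs skip inference.
assert (predict_optimized(X) == model.predict(X)).all()
```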

Use Cases by Industry

Finance
  • Real-time fraud detection (e.g., credit card transactions).
Cybersecurity
  • Increase throughput for high-volume attacks (e.g., DDoS).
  • Reduce latency for fast-propagating attacks (e.g., malware).
Energy
  • Predictive maintenance for smart grid sensors.
  • Battery charge/discharge forecasting.
  • Fault detection.
Security
  • Real-time threat detection in images and live video streams.
Defense
  • Low-latency, edge-based threat detection for constrained systems (e.g., naval/subsurface platforms).

Benchmarked Results

Dataset                                    Latency Improvement (Seq)  Latency Improvement (Raced)  Latency Improvement (Best)  FLOPs Reduction  Accuracy Change
MNIST 8x8 Optical Character Recognition    50.8%                      14.3%                        50.8%                       84.3%            -0.0017
Iris Flower Classification                 26.6%                      22.5%                        26.6%                       66.9%            0
Credit Card Fraud Detection                11.7%                      58.5%                        58.5%                       31.9%            -3.5e-06
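The Seq and Raced columns appear to correspond to two execution modes; the reading below is an assumption, as is all of the code. In sequential mode the derived rule is checked first and the model runs only when the rule does not cover the input; in raced mode the rule and the model are launched concurrently, so the rule check adds no latency on the fallback path. A sketch of the raced mode with hypothetical rule_fn/model_fn stand-ins:

```python
import concurrent.futures as cf
import time

# Hypothetical sketch (an assumed reading of the benchmark columns, not
# moco's documented API). "Raced" mode: the cheap rule and the full model
# start concurrently; if the rule covers the input, its answer is used at
# once; otherwise the model result, already in flight, is awaited.

def rule_fn(x):
    """Derived rule: a class label if x is covered, else None."""
    return 0 if x[2] < 2.5 else None

def model_fn(x):
    """Stand-in for full model inference (expensive)."""
    time.sleep(0.01)  # simulate compute
    return 0 if x[2] < 2.5 else 1

def predict_raced(x, executor):
    rule_future = executor.submit(rule_fn, x)
    model_future = executor.submit(model_fn, x)
    label = rule_future.result()   # rule is cheap; resolves almost instantly
    if label is not None:
        model_future.cancel()      # best effort; the model may already be running
        return label
    return model_future.result()   # fall back to the concurrent model result

with cf.ThreadPoolExecutor(max_workers=2) as pool:
    print(predict_raced([5.1, 3.5, 1.4, 0.2], pool))  # covered: rule answers -> 0
    print(predict_raced([6.3, 3.3, 6.0, 2.5], pool))  # uncovered: model answers -> 1
```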

Notebooks and Reports

• Natural Language Processing (NLP)
• Tabular
• Computer Vision (CV)