Low-Latency, Low-Energy Models

Bookmark this to keep an eye on my project updates!


moco

Machine learning models are commonplace for tabular prediction tasks such as forecasting stock market prices and estimating click-through rates in advertising.

On the edge, or in situations where data volume is high, these models must run fast. In critical applications the cost of a slow prediction can be huge: a crash for an autonomous vehicle, revenue lost to fraud, or a fleeting opportunity missed in algorithmic trading because the decision did not come in time.

I am building practical methods that reduce the latency and energy consumption of these prediction tasks at inference time.

The inputs into the system are a model that has already been trained and a sample of the training data, which is used to calibrate the exit rules.

The output is a wrapped model that produces predictions faster by exiting early on easy inputs.

The success of the algorithm, meaning the resulting time savings, depends on the data distribution: the more inputs a cheap rule can settle before the full forward pass, the larger the speed-up.
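As a rough back-of-the-envelope illustration of why the distribution matters (the cost numbers below are assumptions I picked for the example, not measurements from moco): checking a rule costs t_rule per input, and inputs the rule cannot settle still pay the full forward-pass cost t_full on top of it.

# Illustrative only: t_rule and t_full are assumed relative costs, not moco measurements.
def expected_latency(p_easy, t_rule=0.05, t_full=1.0):
    # Average per-input cost when a fraction p_easy of inputs exits early.
    return p_easy * t_rule + (1.0 - p_easy) * (t_rule + t_full)

for p_easy in (0.0, 0.5, 0.9):
    speedup = 1.0 / expected_latency(p_easy)   # baseline cost is t_full = 1.0
    print(f"{p_easy:.0%} easy inputs -> {speedup:.2f}x speed-up")

With no easy inputs the rule is pure overhead (a speed-up below 1x); with 90% easy inputs the same rule gives roughly a 6-7x speed-up.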

I have demonstrated the software on multi-layer perceptrons (MLPs), but the approach is general enough to be applied to other model architectures.

The software package exposes two functions. Both construct early-exit rules that let the model return a prediction immediately on easy inputs instead of running the full forward pass; a conceptual sketch of such a rule follows below.
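To make the idea concrete, here is a minimal sketch of what a linear "definite class" rule can look like for a binary classifier. This is an illustration of the concept written against plain scikit-learn, not moco's implementation, and the helper names fit_definite_class_rule and predict_with_rule are invented for this example.

# Conceptual sketch only: NOT moco's implementation. Assumes a fitted binary
# classifier full_model whose classes are {0, 1}.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_definite_class_rule(full_model, X_train):
    # Labels the full model would assign; the cheap rule must never contradict them.
    y_full = full_model.predict(X_train)
    linear = LogisticRegression().fit(X_train, y_full)   # cheap surrogate score
    scores = linear.decision_function(X_train)
    # Smallest threshold such that every training point scoring above it is
    # predicted as class 1 by the full model ("definite" class 1).
    not_class_1 = scores[y_full != 1]
    threshold = not_class_1.max() + 1e-9 if len(not_class_1) else scores.min()
    return linear, threshold

def predict_with_rule(full_model, linear, threshold, X):
    scores = linear.decision_function(X)
    easy = scores >= threshold
    y_pred = np.empty(len(X), dtype=int)
    y_pred[easy] = 1                                  # early exit: class is certain
    if (~easy).any():
        y_pred[~easy] = full_model.predict(X[~easy])  # hard inputs: full forward pass
    return y_pred

The key property is that the cheap linear score is only trusted in the region where, on the training data, it never disagrees with the full model.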

Software Usage:

import time

from sklearn.datasets import make_classification      # synthetic example data; any tabular dataset works
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

from moco import EarlyExitModel

# Example data so the snippet runs end to end.
X, y = make_classification(n_samples=20_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Train the full model as usual.
model = MLPClassifier()
model.fit(X_train, y_train)

# Wrap it and fit an early-exit rule on the training inputs.
eem = EarlyExitModel(model)
eem.add_linear_definite_class_rule(X_train)

# Time prediction with the early-exit rule in place ...
start = time.time()
eem.predict(X_test)
end = time.time()
experimental_time = end - start

# ... and with the unmodified model.
start = time.time()
eem.baseline_predict(X_test)
end = time.time()
baseline_time = end - start

# The speed-up only shows up when the MLP is wide or deep enough that
# full forward passes dominate the cost of checking the exit rule.
assert baseline_time > experimental_time
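If the exit rule is doing its job, the speed-up should not come at the cost of different answers. A quick sanity check you can append, assuming (my assumption, not documented behaviour) that both methods return arrays of class labels:

import numpy as np

# Fraction of test inputs where the early-exit path agrees with the full model.
agreement = np.mean(eem.predict(X_test) == eem.baseline_predict(X_test))
print(f"Prediction agreement with the full model: {agreement:.1%}")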

Posts:

I invite you to explore the detailed findings in the blog posts and to consider integrating moco into your projects. Feel free to contact me at quickmlmodels@gmail.com if you have a model you would like to try this out on.