Bookmark this to keep an eye on my project updates!
Efficient ML
Problem: Machine learning models can be too slow and too energy-hungry, which makes them expensive to run at inference time.
Root Problem: Machine learning models are far from computationally efficient on their expected data distribution.
Solution: I am building algorithms that take trained machine learning models and make them more computationally efficient at inference time. This yields faster, more energy-efficient models that need less infrastructure, making the systems built on them cheaper to run.
Results: I’ve achieved 20% reductions in inference time on image, text, and tabular classification models run in parallel, as well as >1.5× cost savings across model sizes in the real-time setting.
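For context, here is a minimal sketch of how per-call inference latency can be measured; the toy model, input shape, and run counts are illustrative assumptions, not the actual benchmark setup.

```python
import time

import torch
import torch.nn as nn


def mean_latency_ms(model: nn.Module, x: torch.Tensor, n: int = 100) -> float:
    """Average per-call latency in milliseconds over n timed runs."""
    model.eval()
    with torch.no_grad():
        for _ in range(10):  # warm-up runs, excluded from timing
            model(x)
        start = time.perf_counter()
        for _ in range(n):
            model(x)
    return (time.perf_counter() - start) / n * 1000


# Stand-in classifier and batch, purely for illustration.
model = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 10))
x = torch.randn(32, 512)
print(f"{mean_latency_ms(model, x):.3f} ms per batch")
```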
Alternatives: Quantization, pruning, and knowledge distillation all achieve acceleration at the cost of accuracy (a quantization sketch follows below).
Hardware acceleration with GPUs, faster CPUs, or parallelizing ML models across many machines incurs additional hardware, infrastructure, and energy costs.
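As an example of the accuracy-for-speed trade-off above, here is a minimal sketch of post-training dynamic quantization in PyTorch; the toy model is an assumption for illustration and is unrelated to moco.

```python
import torch
import torch.nn as nn

# Stand-in for a trained classifier.
model = nn.Sequential(
    nn.Linear(512, 2048),
    nn.ReLU(),
    nn.Linear(2048, 10),
)
model.eval()

# Store Linear weights in int8 and quantize activations on the fly.
# This shrinks the model and speeds up CPU inference, but the lower
# precision can cost accuracy.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
print(quantized(x).shape)  # torch.Size([1, 10])
```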
My approach: moco finds subsets of data that are easy to classify early in the model, so the model does not need to execute fully for every single data point.
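To make the idea concrete, here is a minimal early-exit sketch, assuming a toy two-stage network with an auxiliary head and a hand-picked confidence threshold; these names and choices are hypothetical and do not reflect moco's actual API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class EarlyExitNet(nn.Module):
    """Toy network with one early exit; architecture is illustrative."""

    def __init__(self, threshold: float = 0.9):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Linear(512, 256), nn.ReLU())
        self.exit1 = nn.Linear(256, 10)   # cheap auxiliary head
        self.stage2 = nn.Sequential(nn.Linear(256, 256), nn.ReLU())
        self.final = nn.Linear(256, 10)   # full-depth head
        self.threshold = threshold

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.stage1(x)
        early = F.softmax(self.exit1(h), dim=-1)
        # Exit early only if every sample in the batch is confident;
        # easy inputs skip the rest of the network entirely.
        if early.max(dim=-1).values.min() >= self.threshold:
            return early
        return F.softmax(self.final(self.stage2(h)), dim=-1)


net = EarlyExitNet().eval()
with torch.no_grad():
    print(net(torch.randn(4, 512)).shape)  # torch.Size([4, 10])
```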
📢 Stay Tuned: I am building a software package. Indicate interest here.
🚀 Try it on your model — Contact me
View the project on GitHub at moco-client. A PyPI package is coming soon for you to try out.
I use my background in graph theory and topological data analysis to develop algorithms built into software that data scientists and ML performance engineers can use to optimize latency-critical models.