Credit Card Transaction Demo¶
In this notebook, we demonstrate how the moco optimization library can be used to accelerate a pre-trained binary classifier that flags credit card transactions as fraudulent.
It does this by reducing the average number of FLOPs (floating point operations) the model needs to run inference over the entire dataset. Critically, moco finds subspaces of the input space where the prediction is a simple linear or constant function. At runtime, the derived model determines which subspace a transaction belongs to, and then executes the map associated with that subspace.
This results in lower energy use, lower latency, higher throughput, and less hardware needed.
| experiment | Latency per transaction, raced (s) | FLOPs per transaction | Precision | Recall |
|---|---|---|---|---|
| baseline | 5.20875e-05 | 3104 | 0.630952 | 1 |
| optimized | 2.30249e-05 | 1627.6 | 0.630952 | 1 |
Imports¶
import pandas as pd
from typing import Callable
from moco.profiling import profile_method_real_time
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_fscore_support
import time
from moco.binary_logit_classifier import BinaryLogitClassifier
import matplotlib.pyplot as plt
import numpy as np
Step 1: Train the initial model¶
def load_dataset(path: str):
    # Use the anonymized features V1..V28; 'Class' is the fraud label (1 = fraudulent)
    df = pd.read_csv(path)
    X = df[[col for col in df.columns if col.startswith('V')]].to_numpy()
    y = df['Class'].to_numpy()
    return X, y
Load the dataset. It is heavily imbalanced -- only 0.17% of the 280k+ transactions are fraudulent.
# Load Dataset
X, y = load_dataset('/Users/samrandall/Downloads/creditcard.csv')
# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 2)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size = 0.2, random_state = 2)
X.shape, pd.Series(y).value_counts().to_dict()
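As a quick sanity check on the imbalance figure quoted above:
# Fraud rate -- should be roughly 0.17% of the 280k+ transactions
print(f"{y.mean():.2%} of {len(y):,} transactions are fraudulent")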
# Train initial model
mlp = MLPClassifier(random_state = 4)
mlp.fit(X_train, y_train)
from sklearn.metrics import PrecisionRecallDisplay
display = PrecisionRecallDisplay.from_estimator(
mlp, X_val, y_val, name="MLP", plot_chance_level=True, despine=True
)
_ = display.ax_.set_title("2-class Precision-Recall curve")
Choose a threshold based on the precision-recall curve on the validation set.¶
- I chose `threshold = 0.9`, achieving `precision = 63%` and `recall = 100%` (a programmatic alternative is sketched below).
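If you prefer not to eyeball the curve, a threshold can also be read off programmatically. This is only a sketch using sklearn's precision_recall_curve (not part of the original workflow): pick the largest threshold that still keeps validation recall at 100%.
from sklearn.metrics import precision_recall_curve

val_scores = mlp.predict_proba(X_val)[:, 1]
precisions, recalls, thresholds = precision_recall_curve(y_val, val_scores)
# precision_recall_curve returns one more precision/recall entry than thresholds,
# so drop the last entry to align them with the threshold array
best_threshold = thresholds[recalls[:-1] == 1.0].max()
best_threshold
Whichever value this returns, the rest of the notebook proceeds with the hand-picked threshold of 0.9 below.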
threshold = 0.9
blc = BinaryLogitClassifier(mlp, threshold)
y_hat_t_train = blc.predict(X_train)
y_hat_t_train.shape
Benchmark its latency and accuracy.¶
y_hat_t_test = blc.predict(X_test)
p, r, _, _ = precision_recall_fscore_support(y_test, y_hat_t_test, average = 'binary')  # sklearn expects (y_true, y_pred)
p, r
On the test set, we get a precision of 70% and a recall of 100%.
original_timings = profile_method_real_time(mlp.predict, X_test)
On average, it takes 6.41E-6 seconds, or roughly 6 microseconds (us), to score a single transaction.
np.mean(original_timings)
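profile_method_real_time comes from moco; if you want a dependency-free cross-check, a rough per-transaction timing can be obtained with time.perf_counter. This is a simplified sketch (it times one row at a time, so the absolute numbers will differ from the profiler's):
def time_per_sample(predict_fn, X, n_samples=1000):
    # Time single-row predictions to approximate per-transaction latency (seconds)
    timings = []
    for i in range(min(n_samples, X.shape[0])):
        start = time.perf_counter()
        predict_fn(X[i:i + 1])
        timings.append(time.perf_counter() - start)
    return np.array(timings)

np.mean(time_per_sample(mlp.predict, X_test))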
FLOPs (Baseline)¶
Next we baseline the FLOPs. The model is a two-layer MLP with a ReLU activation followed by a sigmoid activation. Roughly, the number of FLOPs for the first layer to process one data point is (28 + 1) * 100, and for the second layer it is (100 + 1) * 1; in both cases the extra +1 accounts for the bias term. The ReLU contributes 100 FLOPs and the sigmoid contributes 3. In total, (28 + 1) * 100 + 100 + (100 + 1) * 1 + 3 = 3104 FLOPs.
for i, (W, b) in enumerate(zip(mlp.coefs_, mlp.intercepts_)):
    print(f"Layer {i}: {W.shape} {b.shape}")
Accelerating the model with moco¶
from moco.partition import Partition, RoutedModel
p_train = blc.predict(X_train)
partition = Partition()
C = 2
partition.find_sufficient_groups(X_train, y_train, min_group_size = X_train.shape[0] / (C * 10))
partition.summary_table
From the summary table above, we will build a gated model from only the first group (the one with is_active == True; the second group is not used). moco found a group of 182,011 training transactions, all of them non-fraudulent. The subset predictor fit to that group identifies 89,757 of the training transactions, and it becomes the early-exit route in our new routed model.
mask = partition.subset_predictors[0].predict(X_train)
pd.Series(y_train[mask]).value_counts()
# Build the routed model: cheap early exit where possible, otherwise fall back to the thresholded MLP (blc.predict)
rm = RoutedModel.from_partition(partition, blc.predict)
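RoutedModel is moco's own wrapper, so the following is only a rough sketch of the routing idea, not its implementation. It assumes (as in the cell above) that partition.subset_predictors[0].predict returns a boolean membership mask, and it uses the fact that the first group contains only non-fraudulent transactions:
def routed_predict_sketch(X):
    # Gate: which rows fall in the cheap, all-non-fraudulent subspace
    in_group = partition.subset_predictors[0].predict(X).astype(bool)
    out = np.zeros(X.shape[0], dtype=np.int64)   # group members => predict not fraudulent
    # Everyone else falls back to the full thresholded MLP
    if (~in_group).any():
        out[~in_group] = blc.predict(X[~in_group])
    return out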
Evaluation of the RoutedModel¶
On the test set, we do not see a drop in precision or recall: the precision is 83.3% for both models and the recall is 87.5% for both.
p_train_new = rm.predict(X_train)
mlp_train_new = mlp.predict_proba(X_train)[:, 1] > threshold
print("train")
p, r, _, _ = precision_recall_fscore_support(y_train, mlp_train_new, average = 'binary')
print("MLP", p, r)
p, r, _, _ = precision_recall_fscore_support(y_train, p_train_new, average = 'binary')
print("Routed Model", p, r)
p_test_new = rm.predict(X_test)
mlp_test = (mlp.predict_proba(X_test)[:, 1] > threshold).astype(np.int64)
print("test")
print(mlp_test.dtype, y_test.dtype)
print(mlp_test.shape, y_test.shape)
p, r, _, _ = precision_recall_fscore_support(y_test, mlp_test, average = 'binary')
print("MLP", p, r)
mlp_precision = p
mlp_recall = r
p, r, _, _ = precision_recall_fscore_support(y_test, p_test_new, average = 'binary')
print("Routed Model", p, r)
routed_precision = p
routed_recall = r
For FLOPs, we can do a theoretical analysis first; the real-time latency analysis follows.
# Fraction of training transactions where the early exit does not apply (NaN => fall back to the full model)
np.isnan(partition.transform(X_train)).mean()
FLOPs in the New Model¶
As computed above, the original path costs 3104 FLOPs per transaction.
On the test set, that full path is taken whenever partition.transform returns NaN, which happens 51.5% of the time.
In both cases, whether or not the early-exit path is taken, we must evaluate the early-exit predictor, so:
total_flops = Flops(early_exit) + (g / N) * Flops(full_model)
where g is the number of transactions routed to the full model and N is the total number of transactions.
The early exit is a logistic classifier over the 28 features, so its cost is (28 + 1) * 1 = 29 FLOPs.
29 + 0.515 * 3104 = 1627.6 FLOPs per transaction on average.
This is a 48% reduction in FLOPs.
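The same arithmetic can be reproduced from the measured fallback rate, reusing the NaN convention of partition.transform:
early_exit_flops = (28 + 1) * 1          # linear gate over the 28 features
full_model_flops = 3104                  # computed above for the MLP
# Fraction of test transactions that fall back to the full model (NaN => fallback)
fallback_rate = np.isnan(partition.transform(X_test)).mean()
early_exit_flops + fallback_rate * full_model_flops   # roughly 29 + 0.515 * 3104 = 1627.6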
Real Time Latency Analysis¶
indices = np.flatnonzero(~np.isnan(partition.transform(X_test)))
vcs_test = pd.Series(y_test[indices]).value_counts()
indices = np.flatnonzero(~np.isnan(partition.transform(X_train)))
vcs_train = pd.Series(y_train[indices]).value_counts()
print(vcs_train, vcs_test)
blc = BinaryLogitClassifier(mlp, threshold)
rm = RoutedModel.from_partition(partition, blc.predict)
rm_times_raced = profile_method_real_time(rm.predict_race, X_test)
rm_seq_times = profile_method_real_time(rm.predict, X_test)
df = pd.DataFrame({"original": original_timings, "optimized_best_of": rm_times_raced, "optimized_sequential": rm_seq_times})
df.mean(axis = 0).to_dict()
# Mean latency per transaction for each variant (seconds)
speedup = df.mean(axis = 0).to_dict()
# Optimized latency as a fraction of the original (lower is better)
ratio = speedup['optimized_best_of'] / speedup['original']
ratio
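To see where the time goes, the matplotlib import from the top of the notebook can be used to compare the per-transaction latency distributions (optional):
# Histogram of per-transaction latencies for the original vs. raced routed model
fig, ax = plt.subplots(figsize=(8, 4))
ax.hist(df["original"], bins=50, alpha=0.5, label="original")
ax.hist(df["optimized_best_of"], bins=50, alpha=0.5, label="optimized (raced)")
ax.set_xlabel("seconds per transaction")
ax.set_ylabel("count")
ax.legend()
plt.show()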
Latency Result¶
We now see an average of 2.82E-5 seconds per transaction, compared to 4.92E-5 seconds for the original model.
benchmark_analysis = {
    "experiment": ["baseline", "optimized"],
    "Latency per transaction, raced (s)": [speedup['original'], speedup['optimized_best_of']],
    "FLOPs per transaction": [3104, 1627.6],
    "Precision": [mlp_precision, routed_precision],
    "Recall": [mlp_recall, routed_recall]
}
benchmark_df = pd.DataFrame(benchmark_analysis)
# Markdown version of the table, reproduced as the summary at the top of this notebook
s = benchmark_df.to_markdown()
s
benchmark_df