Building a Real-Time Credit Card Fraud Detection System with FastAPI and Machine Learning

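Introduction

Credit card fraud poses a major threat to the financial industry, causing billions of dollars in losses every year. To counter it, machine learning models are used to detect and block fraudulent transactions in real time. This article walks through building a real-time credit card fraud detection system with FastAPI, a modern Python web framework, and a random forest classifier trained on the popular Credit Card Fraud Detection dataset from Kaggle.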

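Project Overview

The goal of the project is a web service that predicts how likely a credit card transaction is to be fraudulent. The service accepts transaction data, preprocesses it, and returns a prediction together with the fraud probability. The system is designed to be fast, scalable, and easy to integrate into existing financial systems.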

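Key Components

Machine learning model: a random forest classifier trained to distinguish fraudulent from legitimate transactions.
Data preprocessing: standardization of the transaction features so that model inputs are on a consistent scale.
API: a RESTful API built with FastAPI that serves prediction requests in real time.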
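
To make the preprocessing component concrete, here is a minimal pure-Python sketch of what standardization computes for a single feature: subtract the mean, then divide by the population standard deviation. The sample amounts are made up for illustration, and `standardize` is a hypothetical helper rather than part of scikit-learn.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Return z-scores: (value - mean) / population standard deviation."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        # A constant feature has no spread; map it to zeros instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

# Made-up transaction amounts, for illustration only
amounts = [149.62, 2.69, 378.66, 123.50, 69.99]
scaled = standardize(amounts)
print([round(z, 3) for z in scaled])
```

After this transformation the feature has mean 0 and standard deviation 1, which is the same per-column result a standard scaler aims for.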

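Step 1: Preparing the Dataset

This project uses the Credit Card Fraud Detection dataset from Kaggle, which contains 284,807 transactions, only 492 of which are fraudulent. This severe class imbalance makes training harder, but it can be mitigated by oversampling the minority class.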
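
The scale of the imbalance is easier to appreciate as a number. The short calculation below uses only the two counts quoted above and also shows the accuracy of a trivial model that labels every transaction as legitimate; the variable names are illustrative.

```python
TOTAL_TRANSACTIONS = 284_807
FRAUD_TRANSACTIONS = 492

# Share of transactions that are fraudulent
fraud_rate = FRAUD_TRANSACTIONS / TOTAL_TRANSACTIONS

# Accuracy of a trivial model that predicts "legitimate" for every transaction
baseline_accuracy = 1 - fraud_rate

print(f"Fraud rate: {fraud_rate:.4%}")
print(f"Always-legitimate baseline accuracy: {baseline_accuracy:.4%}")
```

Fraud makes up roughly 0.17% of the data, so a model can score very high accuracy while catching no fraud at all.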

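Data Preprocessing

The features are standardized with scikit-learn's StandardScaler, and the dataset is split into training and test sets. Because of the imbalance, RandomOverSampler is applied to the training data to balance the classes before the model is trained.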

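from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from imblearn.over_sampling import RandomOverSampler

# x holds the feature columns and y the fraud labels from the Kaggle dataset.
# Split first so the test set is never scaled or oversampled with training statistics
# (the stratified split settings are an assumption; the article does not state them).
x_train, x_test, y_train, y_test = train_test_split(x, y, stratify=y, random_state=42)

# Standardize features: fit on the training set only, then apply to both sets
scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)
x_test_scaled = scaler.transform(x_test)

# Balance the training set by oversampling the minority (fraud) class
ros = RandomOverSampler(random_state=42)
x_resampled, y_resampled = ros.fit_resample(x_train_scaled, y_train)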
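
As a conceptual illustration of what random oversampling does, the pure-Python sketch below duplicates randomly chosen minority-class rows until every class is as large as the biggest one. It is a simplified stand-in under that assumption, not imbalanced-learn's implementation, and `random_oversample` and the toy data are hypothetical.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=42):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        # Indices of the original rows that belong to this class
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        # Sample with replacement to fill the gap up to the majority count
        for i in rng.choices(idx, k=target - count):
            out_rows.append(rows[i])
            out_labels.append(cls)
    return out_rows, out_labels

# Toy data: 8 legitimate (0) rows and 2 fraud (1) rows
rows = [[float(i)] for i in range(10)]
labels = [0] * 8 + [1] * 2
new_rows, new_labels = random_oversample(rows, labels)
print(Counter(new_labels))
```

Because the added rows are exact copies, oversampling should touch only the training set; copies leaking into the test set would inflate the evaluation scores.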

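Step 2: Training the Machine Learning Model

We train a random forest classifier, which copes well with tabular data and gives reliable predictions. The model is trained on the oversampled data and evaluated with accuracy, precision, recall, and the AUC-ROC score. With fraud making up less than 0.2% of transactions, precision and recall on the fraud class say far more than accuracy does.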

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, roc_auc_score

# Train the model on the oversampled training data
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(x_resampled, y_resampled)

# Evaluate the model on the untouched, scaled test set
y_pred = model.predict(x_test_scaled)
print(classification_report(y_test, y_pred))
print("AUC-ROC:", roc_auc_score(y_test, model.predict_proba(x_test_scaled)[:, 1]))
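
To show how the reported metrics relate to each other, the sketch below computes accuracy, precision, and recall from confusion-matrix counts. The counts are hypothetical and chosen only for illustration; they are not results from this model.

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall for the positive (fraud) class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical counts: true positives, false positives, false negatives, true negatives
tp, fp, fn, tn = 80, 10, 20, 9890

precision, recall = precision_recall(tp, fp, fn)
accuracy = (tp + tn) / (tp + fp + fn + tn)
print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f}")
```

In this made-up case accuracy is above 99% even though one in five fraudulent transactions is missed, which is why recall on the fraud class deserves attention.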

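Step 3: Building the FastAPI Application

After saving the trained model and the scaler with joblib, we move on to the FastAPI application. FastAPI was chosen for its speed and ease of use, which make it a good fit for real-time applications.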
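
The saving step is not shown in the article, so here is a minimal sketch of how it might look with `joblib.dump`. It assumes the file names used later by the API ("random_forest_model.pkl" and "scaler.pkl"), and it trains a tiny model on synthetic data and writes to a temporary directory purely so the example runs on its own; in the project you would dump the fitted `model` and `scaler` from the previous steps.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; in the project these objects come from Steps 1 and 2
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
scaler = StandardScaler().fit(X)
model = RandomForestClassifier(n_estimators=10, random_state=42).fit(scaler.transform(X), y)

# Write both artifacts; a temp directory keeps this sketch out of the project folder
out_dir = tempfile.mkdtemp()
model_path = os.path.join(out_dir, "random_forest_model.pkl")
scaler_path = os.path.join(out_dir, "scaler.pkl")
joblib.dump(model, model_path)
joblib.dump(scaler, scaler_path)

# Reload and confirm the round trip preserves predictions
loaded_model = joblib.load(model_path)
loaded_scaler = joblib.load(scaler_path)
same = (loaded_model.predict(loaded_scaler.transform(X)) == model.predict(scaler.transform(X))).all()
print("round trip ok:", bool(same))
```

Saving the scaler alongside the model matters because the API has to apply exactly the same transformation that was fitted on the training data.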

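Creating the API

The FastAPI application defines a single POST endpoint, /predict/, which accepts transaction data, processes it, and returns the model's prediction and probability.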

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import joblib
import pandas as pd

# Load the trained model and scaler
model = joblib.load("random_forest_model.pkl")
scaler = joblib.load("scaler.pkl")

app = FastAPI()

class Transaction(BaseModel):
    V1: float
    V2: float
    # Include all other features used in your model
    Amount: float

@app.post("/predict/")
def predict(transaction: Transaction):
    try:
        # On Pydantic v2, transaction.model_dump() replaces the deprecated .dict()
        data = pd.DataFrame([transaction.dict()])
        scaled_data = scaler.transform(data)
        prediction = model.predict(scaled_data)
        prediction_proba = model.predict_proba(scaled_data)
        return {"fraud_prediction": int(prediction[0]), "probability": float(prediction_proba[0][1])}
    except Exception as e:
        raise HTTPException(status_code=400, detail=str(e))

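Step 4: Deploying the Application

To test the application locally, run the FastAPI server with uvicorn and send POST requests to the /predict/ endpoint. The service processes each incoming request, scales the data, and returns whether the transaction is predicted to be fraudulent.

Running the API Locally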

uvicorn main:app --reload

You can then test the API with a tool such as curl or Postman:

curl -X POST http://127.0.0.1:8000/predict/ \
-H "Content-Type: application/json" \
-d '{"V1": -1.359807134, "V2": -0.072781173, ..., "Amount": 149.62}'

The API returns a JSON object containing the fraud prediction and the associated probability.
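
The same request can be sent from Python. The sketch below uses only the standard library and assumes the server is running at the local address from the curl example; `build_request` and `predict` are hypothetical helper names, and the sample shows only the fields that appear in the article, so a real call must include every feature the model was trained on.

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:8000/predict/"  # local uvicorn address from the curl example

def build_request(features):
    """Build a JSON POST request for the /predict/ endpoint."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def predict(features):
    """Send the transaction and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(features), timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Only a subset of fields is shown; the remaining features are elided as in the curl example
sample = {"V1": -1.359807134, "V2": -0.072781173, "Amount": 149.62}
request = build_request(sample)
print(request.get_method(), request.full_url)

# With the server running:
# result = predict(sample)
# print(result["fraud_prediction"], result["probability"])
```

The response keys `fraud_prediction` and `probability` match the dictionary returned by the /predict/ endpoint.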