From Notebook to Production
Training a model in a Jupyter notebook is one thing. Deploying it reliably in production is another. The gap between experimentation and production is where many AI projects stumble.
Common Challenges
Data Drift
Models trained on historical data can degrade as real-world data changes over time. A fraud detection model trained on last year's transactions may miss new patterns that emerge today.
import numpy as np
from scipy import stats

def detect_data_drift(reference, current, threshold=0.05):
    """Kolmogorov-Smirnov test for distribution shift."""
    stat, p_value = stats.ks_2samp(reference, current)
    if p_value < threshold:
        print(f"⚠️ Data drift detected (p={p_value:.4f})")
        return True
    print("✅ No significant drift detected")
    return False

# Example: comparing feature distributions over time
reference_data = np.random.normal(0, 1, 1000)
current_data = np.random.normal(0.5, 1.2, 1000)
detect_data_drift(reference_data, current_data)
Model Latency and Throughput
Real-time inference has to fit within a tight latency budget, often well under a second. A model that takes 5 seconds to predict is useless for a live recommendation system.
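One way to check whether a model fits a real-time budget is to time repeated calls and look at tail latency rather than the mean. The sketch below is a minimal illustration using only the standard library; `measure_latency`, `dummy_predict`, and the 100 ms budget are hypothetical names and values, not part of any particular serving stack.

```python
import statistics
import time


def measure_latency(predict_fn, sample_input, n_runs=200, warmup=10):
    """Time repeated calls to predict_fn and summarize latency in milliseconds."""
    # Warm-up calls are excluded so one-time costs (lazy loading, caches) don't skew results.
    for _ in range(warmup):
        predict_fn(sample_input)

    timings_ms = []
    for _ in range(n_runs):
        start = time.perf_counter()
        predict_fn(sample_input)
        timings_ms.append((time.perf_counter() - start) * 1000.0)

    # With n=100, quantiles() returns 99 cut points: index 49 is p50, 94 is p95, 98 is p99.
    cuts = statistics.quantiles(timings_ms, n=100)
    return {"p50_ms": cuts[49], "p95_ms": cuts[94], "p99_ms": cuts[98]}


def dummy_predict(features):
    """Stand-in for a real model call (hypothetical)."""
    return sum(features) / len(features)


if __name__ == "__main__":
    report = measure_latency(dummy_predict, [0.1, 0.2, 0.3])
    budget_ms = 100.0  # assumed latency budget, for illustration only
    print(report, "within budget:", report["p95_ms"] <= budget_ms)
```

Comparing p95 or p99 against the budget, instead of the average, surfaces the slow requests that users actually notice.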
Scalability
Serving one model is manageable. Serving dozens — each with different requirements, update schedules, and SLAs — demands robust infrastructure.
Best Practices
1. Containerize Your Models
Use Docker to package your model with all dependencies:
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY model.pkl ./
COPY app.py ./
EXPOSE 8080
CMD ["gunicorn", "--bind", "0.0.0.0:8080", "app:app"]
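The Dockerfile above assumes an `app.py` that exposes a WSGI callable named `app`. The article does not show that file, so the following is only a sketch of what it might look like, written against the standard library rather than a web framework. The `/predict` route, the JSON request shape, and the assumption that the unpickled model has a scikit-learn-style `predict` method are all hypothetical.

```python
import json
import pickle


def load_model(path):
    """Load a pickled model from disk (only unpickle files you trust)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def make_app(model):
    """Build a WSGI callable that serves predictions from `model`."""

    def app(environ, start_response):
        json_header = [("Content-Type", "application/json")]
        if environ.get("REQUEST_METHOD") != "POST" or environ.get("PATH_INFO") != "/predict":
            start_response("404 Not Found", json_header)
            return [b'{"error": "not found"}']
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
            payload = json.loads(environ["wsgi.input"].read(length))
            features = payload["features"]
        except (ValueError, KeyError, TypeError):
            start_response("400 Bad Request", json_header)
            return [b'{"error": "expected JSON body with a features list"}']

        # Assumes a scikit-learn-style interface: predict takes a list of rows.
        prediction = model.predict([features])[0]
        body = json.dumps({"prediction": prediction}).encode("utf-8")
        start_response("200 OK", json_header)
        return [body]

    return app


# In app.py, the object gunicorn loads via "app:app" would be created at import time:
# app = make_app(load_model("model.pkl"))
```

Building the app through a factory keeps the model injectable, so the request handling can be tested with a stub model and no `model.pkl` on disk. If the model returns NumPy types, they would need converting to plain Python values before `json.dumps`.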
2. Set Up Monitoring
Track metrics that matter:
- Performance: Latency, throughput, error rates
- Data quality: Missing values, outliers, drift
- Business metrics: Conversion rate, user engagement
import prometheus_client

REQUEST_COUNT = prometheus_client.Counter(
    'model_requests_total', 'Total model requests', ['model', 'status']
)
LATENCY = prometheus_client.Histogram(
    'model_latency_seconds', 'Model inference latency'
)
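The Prometheus metrics above cover the performance side. For the data-quality items in the list, a small check on each incoming batch can produce the numbers to export. The sketch below uses only the standard library; `data_quality_report` and the z-score cutoff of 3 are illustrative assumptions, and a z-score rule is just one of several ways to flag outliers.

```python
import math
import statistics


def data_quality_report(values, z_cutoff=3.0):
    """Summarize missing values and z-score outliers for one numeric feature."""

    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    present = [v for v in values if not is_missing(v)]
    missing_rate = (1.0 - len(present) / len(values)) if values else 0.0

    outliers = 0
    if len(present) >= 2:
        mean = statistics.fmean(present)
        std = statistics.stdev(present)
        if std > 0:
            # Count values more than z_cutoff standard deviations from the mean.
            outliers = sum(1 for v in present if abs(v - mean) / std > z_cutoff)

    return {
        "missing_rate": missing_rate,
        "outlier_rate": outliers / len(present) if present else 0.0,
    }


if __name__ == "__main__":
    batch = [10.0] * 20 + [None, float("nan"), 500.0]
    print(data_quality_report(batch))
```

The two rates could then be published as gauges next to the request counter and latency histogram, so data problems show up on the same dashboard as performance problems.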
3. Implement CI/CD for ML
Automate testing, validation, and deployment:
# GitHub Actions example
name: Deploy Model
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install -r requirements.txt
      - run: pytest tests/
      # Tag the image with the registry path so the push below can find it
      - run: docker build -t registry/my-model:latest .
      - run: docker push registry/my-model:latest
4. Plan for Model Retraining
Schedule regular retraining pipelines and automate them. Use tools like Apache Airflow or Kubeflow to orchestrate data collection, training, validation, and deployment.
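Whatever orchestrator runs it, a retraining pipeline usually benefits from a validation gate so that a worse model is never deployed automatically. The sketch below shows that control flow in plain Python. It is not Airflow or Kubeflow code, and every name in it (`run_retraining`, the step callables, `min_improvement`) is hypothetical.

```python
def run_retraining(collect_data, train, evaluate, deploy, current_score, min_improvement=0.0):
    """Run collect -> train -> validate -> deploy, skipping deploy if validation fails."""
    train_set, holdout_set = collect_data()
    candidate = train(train_set)
    # Score on held-out data so the candidate is not judged on what it was trained on.
    candidate_score = evaluate(candidate, holdout_set)

    # Validation gate: only promote a candidate that beats the model in production.
    if candidate_score < current_score + min_improvement:
        return {"deployed": False, "score": candidate_score}

    deploy(candidate)
    return {"deployed": True, "score": candidate_score}


if __name__ == "__main__":
    deployed = []
    result = run_retraining(
        collect_data=lambda: ([1, 2, 3, 4], [5, 6]),
        train=lambda data: {"mean": sum(data) / len(data)},  # toy "model"
        evaluate=lambda model, data: 0.91,  # placeholder metric, higher is better
        deploy=deployed.append,
        current_score=0.88,
    )
    print(result, deployed)
```

In an orchestrator, each callable would typically become its own task, with the gate deciding whether the deployment task runs at all.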
Conclusion
Deploying AI to production is less about algorithms and more about engineering discipline. Focus on monitoring, automation, and continuous improvement. The best model in a notebook is worthless if it can't reach users reliably. Build for the long haul.