🏗️ System Architecture
Overview
Smart Mobility Predictor follows a modular, layered architecture designed for scalability, maintainability, and performance.
Architecture Layers
1. Data Ingestion Layer
Responsibility: Collect data from multiple sources
Components:
- Traffic API Client - Real-time sensor feeds, historical data
- Weather Service - Current & forecast weather data
- Event Calendar - Scheduled events, holidays
- Web Scraper - Public transportation schedules
- User Feedback System - Ratings & actual vs. predicted data
Technologies:
- Python requests library
- Async processing (asyncio, aiohttp)
- Message queue (RabbitMQ) for event streaming
Data Flow:
[APIs] → [Rate Limiter] → [Validation] → [Message Queue] → [Data Pipeline]
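The ingestion flow can be sketched with stdlib asyncio alone; the `fetch` stub and a semaphore stand in for the real API clients and rate limiter (aiohttp and RabbitMQ in the actual stack), so treat this as a shape sketch, not the production implementation:

```python
import asyncio
from dataclasses import dataclass

@dataclass
class Reading:
    sensor_id: str
    speed_kmh: float

async def fetch(sensor_id: str) -> Reading:
    # Stub for a real HTTP call (e.g. an aiohttp GET against a traffic API).
    await asyncio.sleep(0.01)
    return Reading(sensor_id, 42.0)

def validate(reading: Reading) -> bool:
    # Drop physically implausible readings before they reach the queue.
    return 0.0 <= reading.speed_kmh <= 200.0

async def ingest(sensor_ids, queue: asyncio.Queue, max_concurrent: int = 5):
    # The semaphore plays the role of the rate limiter in the flow above.
    sem = asyncio.Semaphore(max_concurrent)

    async def one(sensor_id):
        async with sem:
            reading = await fetch(sensor_id)
        if validate(reading):
            await queue.put(reading)  # hand-off to the message queue

    await asyncio.gather(*(one(s) for s in sensor_ids))

async def main() -> int:
    queue: asyncio.Queue = asyncio.Queue()
    await ingest([f"sensor-{i}" for i in range(10)], queue)
    return queue.qsize()

queued = asyncio.run(main())
```

Swapping the stub for a real aiohttp client and the `asyncio.Queue` for a RabbitMQ publisher preserves the same structure.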
2. Data Processing Pipeline
Responsibility: Transform raw data into ML-ready features
Steps:
1. Data Loading
   - Fetch from databases & message queues
   - Concurrent processing for performance
   - Memory-efficient streaming for large datasets
2. Data Cleaning
   - Remove duplicates & outliers
   - Handle missing values (KNN imputation for temporal data)
   - Validate data ranges
3. Feature Engineering
   - Temporal features (hour, day, holiday flag)
   - Lag features (traffic 5, 15, 60 min ago)
   - Rolling aggregates (hourly, daily, weekly)
   - Distance decay & proximity features
   - Weather interaction terms
   - Network features (degree centrality)
4. Data Normalization
   - Min-Max scaling for bounded features (0-100%)
   - Standard scaling (Z-score) for unbounded features
   - Separate scalers fit on the training split only, to prevent data leakage
5. Output
   - Training data (60%)
   - Validation data (20%)
   - Test data (20%)
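The normalization and split steps above can be sketched with toy data: the split is chronological (time series must not be shuffled) and the scaling statistics come from the training split only, which is what prevents leakage. Names and data here are illustrative; the real pipeline uses scikit-learn scalers:

```python
def fit_minmax(values):
    # Compute scaling statistics; call this on the TRAINING split only.
    return min(values), max(values)

def apply_minmax(values, lo, hi):
    # Transform any split with statistics learned from the training split.
    span = (hi - lo) or 1.0
    return [(v - lo) / span for v in values]

def split(rows, train_pct=60, val_pct=20):
    # Chronological 60/20/20 split; shuffling would leak future
    # observations into the training set.
    n = len(rows)
    i = n * train_pct // 100
    j = n * (train_pct + val_pct) // 100
    return rows[:i], rows[i:j], rows[j:]

congestion = list(range(100))              # toy feature, already time-ordered
train_rows, val_rows, test_rows = split(congestion)
lo, hi = fit_minmax(train_rows)            # statistics from train only
train_scaled = apply_minmax(train_rows, lo, hi)
val_scaled = apply_minmax(val_rows, lo, hi)  # reuse the same statistics
```

Note that validation values can land outside [0, 1] under this scheme; that is expected and preferable to refitting the scaler on unseen data.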
Technologies:
- Pandas, NumPy
- Scikit-learn preprocessing
- Apache Spark (for distributed processing at scale)
3. Machine Learning Layer
Responsibility: Generate predictions & recommendations
Models Deployed:
| Model | Purpose | Input | Output | Latency |
|---|---|---|---|---|
| XGBoost Ensemble | Travel time | Distance, congestion, weather, time | Minutes ± CI | 50ms |
| Random Forest | Congestion class | Traffic, weather, events | Low/Med/High | 30ms |
| K-Means | Zone clusters | Aggregated patterns | Cluster ID | 10ms |
| Recommendation Engine | Route ranking | All above + user prefs | Ranked routes | 100ms |
Model Serving:
- Format: ONNX, joblib, pickle
- Storage: Model registry (MLflow), S3/GCS
- Caching: Redis for fast inference
- Versioning: Semantic versioning (v1.0, v1.1, etc.)
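The versioning scheme can be illustrated with a hypothetical in-process registry (class and method names are invented for this sketch; a real deployment would resolve versions through MLflow and pull artifacts from S3/GCS):

```python
class ModelRegistry:
    """Toy registry mapping (model name, semantic version) to an artifact."""

    def __init__(self):
        self._models = {}

    def register(self, name, version, model):
        self._models[(name, version)] = model

    def latest(self, name):
        versions = [v for (n, v) in self._models if n == name]
        # Compare numeric components so "v1.10" sorts after "v1.2".
        key = lambda v: tuple(int(p) for p in v.lstrip("v").split("."))
        return max(versions, key=key)

    def get(self, name, version=None):
        # Default to the newest version, but allow pinning for rollbacks.
        return self._models[(name, version or self.latest(name))]

reg = ModelRegistry()
reg.register("travel_time", "v1.0", "xgb-old")
reg.register("travel_time", "v1.1", "xgb-new")
model = reg.get("travel_time")  # resolves to v1.1
```

Pinning a version in `get` is what makes instant rollback possible when a new model misbehaves.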
4. API Layer
Responsibility: Expose ML predictions via HTTP/REST
Endpoints:
POST /api/v1/predict/travel-time
Input: {origin, destination, departure_time, weather}
Output: {time_min, confidence, factors}
POST /api/v1/predict/congestion
Input: {location, time, weather}
Output: {class, probability}
POST /api/v1/recommendations
Input: {origin, destination, preferences}
Output: {routes: [{rank, time, risk, score}]}
GET /api/v1/traffic/real-time
Input: {bbox: [lat_min, lon_min, lat_max, lon_max]}
Output: {segments: [{congestion, speed, timestamp}]}
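The travel-time endpoint's request/response contract might look like the sketch below, built with stdlib dataclasses for portability (the deployed service would use Pydantic models with FastAPI; the coordinate check and placeholder model call are illustrative assumptions):

```python
from dataclasses import dataclass

@dataclass
class TravelTimeRequest:
    origin: tuple          # (lat, lon)
    destination: tuple     # (lat, lon)
    departure_time: str    # ISO 8601
    weather: str

    def __post_init__(self):
        # Reject out-of-range coordinates before they reach the model.
        for lat, lon in (self.origin, self.destination):
            if not (-90 <= lat <= 90 and -180 <= lon <= 180):
                raise ValueError(f"invalid coordinate: {(lat, lon)}")

@dataclass
class TravelTimeResponse:
    time_min: float
    confidence: float      # e.g. width of the prediction interval
    factors: dict          # per-feature contribution to the estimate

def predict_travel_time(req: TravelTimeRequest) -> TravelTimeResponse:
    # Placeholder: a deployed handler would invoke the XGBoost ensemble here.
    return TravelTimeResponse(time_min=12.5, confidence=0.9,
                              factors={"weather": req.weather})

resp = predict_travel_time(TravelTimeRequest(
    origin=(48.86, 2.35), destination=(48.85, 2.29),
    departure_time="2025-01-15T08:30:00", weather="rain"))
```

Validating in the request model keeps the handler itself free of input-checking noise, which is the same division of labor Pydantic provides in FastAPI.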
Technologies:
- FastAPI or Flask
- Request validation (Pydantic)
- Rate limiting (Redis)
- Error handling & logging
- OpenAPI documentation
5. Application Layer
Responsibility: Interactive user interface
Components:
Web Dashboard (Streamlit/React):
- Input form (origin, destination, time, preferences)
- Interactive map with route visualization
- Recommendation cards with details
- Real-time traffic heatmap
- Historical analytics
Features:
- Responsive design (mobile, tablet, desktop)
- Dark/light mode
- Accessibility (WCAG 2.1 AA)
- Progressive loading
- Offline map support (PWA)
6. Data Storage Layer
Databases:
| Database | Use Case | Data | Capacity |
|---|---|---|---|
| PostgreSQL | Transactional data | Users, trips, feedback | 100 GB |
| TimescaleDB | Time-series traffic | Sensor readings, predictions | 5 TB |
| Redis | Caching & sessions | Model cache, user sessions | 50 GB |
| Elasticsearch | Search & logging | API logs, errors | 500 GB |
Schema Example (PostgreSQL):
```sql
CREATE TABLE trips (
    id UUID PRIMARY KEY,
    user_id UUID REFERENCES users(id),
    origin POINT,
    destination POINT,
    departure_time TIMESTAMP,
    transport_mode VARCHAR,
    predicted_time INT,
    actual_time INT,
    feedback FLOAT,
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE INDEX ON trips(created_at, user_id);
```
Deployment Architecture
Containerization
Docker Images:
1. API Service
   - Base: python:3.10-slim
   - Runtime: FastAPI
   - Size: ~200 MB
2. Web Dashboard
   - Base: python:3.10-slim
   - Runtime: Streamlit
   - Size: ~300 MB
3. Worker Service
   - Scheduled jobs (model retraining, data aggregation)
   - Celery + RabbitMQ
   - Size: ~200 MB
Kubernetes Orchestration
Deployment Structure:
```yaml
# Namespace for isolation
apiVersion: v1
kind: Namespace
metadata:
  name: smart-mobility
---
# API Service Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
  namespace: smart-mobility
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: gcr.io/project/api:v1.0
          ports:
            - containerPort: 5000
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
            limits:
              cpu: 1000m
              memory: 1Gi
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: db-secrets
                  key: url
          livenessProbe:
            httpGet:
              path: /health
              port: 5000
            initialDelaySeconds: 30
            periodSeconds: 10
---
# Service exposure
apiVersion: v1
kind: Service
metadata:
  name: api-service
  namespace: smart-mobility
spec:
  type: LoadBalancer
  selector:
    app: api
  ports:
    - protocol: TCP
      port: 80
      targetPort: 5000
```
Load Balancing & Scaling
Horizontal Pod Autoscaler:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```
Data Flow Diagram
```
┌─────────────────────────────────────────────────────┐
│          DATA SOURCES (Real-time & Batch)           │
├─────────────────────────────────────────────────────┤
│ Traffic APIs │ Weather APIs │ Events │ GPS Traces   │
└────────────────┬────────────────────────────────────┘
                 ▼
        ┌─────────────────┐
        │ Data Ingestion  │
        │  - Validation   │
        │  - Dedup        │
        └────────┬────────┘
                 ▼
       ┌───────────────────┐
       │   Message Queue   │
       │ (RabbitMQ/Kafka)  │
       └─────────┬─────────┘
                 ▼
      ┌──────────────────────┐
      │ Feature Engineering  │
      │  - Cleaning          │
      │  - Scaling           │
      │  - Feature Creation  │
      └──────────┬───────────┘
                 ▼
        ┌─────────────────┐
        │  Feature Store  │
        │     (Redis)     │
        └────────┬────────┘
                 ▼
    ┌────────────────────────────┐
    │     ML Model Inference     │
    ├────────────────────────────┤
    │ - Travel time (XGBoost)    │
    │ - Congestion (RF)          │
    │ - Clustering (K-Means)     │
    │ - Recommendations (Hybrid) │
    └────────────┬───────────────┘
                 ▼
        ┌─────────────────┐
        │  API Responses  │
        │    (FastAPI)    │
        └────────┬────────┘
                 ▼
      ┌────────────────────┐
      │  Frontend Display  │
      │ (Streamlit/React)  │
      └────────────────────┘
```
Performance Optimization
Caching Strategy
Multi-Level Cache:
1. L1: In-Memory (Application)
   - TTL: 5 minutes
   - Capacity: 1 GB
   - Content: Recent predictions, model parameters
2. L2: Redis (Distributed)
   - TTL: 15 minutes
   - Capacity: 50 GB
   - Content: Route recommendations, traffic snapshots
3. L3: Database Cache (PostgreSQL)
   - Permanent storage
   - Content: Historical predictions for analytics
Cache Invalidation:
- Time-based: Refresh every 15 min
- Event-based: Invalidate on accident/incident
- Manual: Force refresh for testing
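Combining time-based and event-based invalidation for the in-memory L1 tier might look like the following sketch (key names are hypothetical; the shared L2 tier would rely on Redis TTLs and key-pattern deletes instead):

```python
import time

class PredictionCache:
    """Toy L1 cache: per-entry TTL plus event-driven invalidation."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expires_at)

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:  # time-based invalidation
            del self._store[key]
            return None
        return value

    def invalidate_prefix(self, prefix: str):
        # Event-based invalidation: drop every cached route through a
        # segment when an accident or incident is reported on it.
        for key in [k for k in self._store if k.startswith(prefix)]:
            del self._store[key]

cache = PredictionCache(ttl_seconds=0.05)   # short TTL for demonstration
cache.set("route:A12:0830", {"time_min": 14})
hit = cache.get("route:A12:0830")
cache.invalidate_prefix("route:A12")        # e.g. accident reported on A12
miss = cache.get("route:A12:0830")
```

Prefix-structured keys (`route:<segment>:<time>`) are what make event-based invalidation a single sweep rather than a full flush.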
Query Optimization
Database Indexing:
```sql
-- Time-series queries
CREATE INDEX idx_traffic_time ON traffic_readings(timestamp DESC);

-- Geospatial queries
CREATE INDEX idx_location ON traffic_readings USING GIST(location);

-- User queries
CREATE INDEX idx_user_trips ON trips(user_id, created_at DESC);
```
Connection Pooling:
- PgBouncer for PostgreSQL
- Pool size: 20-50 connections
- Timeout: 30 seconds
API Performance
Response Times (Target):
- Travel time prediction: < 50ms
- Congestion classification: < 30ms
- Route recommendations: < 100ms
- All requests: P95 < 500ms
Optimization Techniques:
- Request batching (multiple predictions in one call)
- Async processing for I/O operations
- Compression (gzip) for large responses
- CDN caching for static assets
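Request batching, the first technique above, amortizes per-call overhead by grouping several predictions into one vectorized model call. A toy sketch, with an illustrative linear stand-in for the model:

```python
def predict_batch(feature_rows):
    # Stand-in for one vectorized model invocation; a real service would
    # build a feature matrix and call the XGBoost booster once per batch.
    return [10.0 + 0.5 * row["distance_km"] for row in feature_rows]

def handle_requests(requests, batch_size=32):
    # Chunk incoming requests so each model call sees up to batch_size rows.
    results = []
    for i in range(0, len(requests), batch_size):
        results.extend(predict_batch(requests[i:i + batch_size]))
    return results

times = handle_requests([{"distance_km": d} for d in (2, 4, 8)])
```

In an async API the same idea is usually implemented by buffering requests for a few milliseconds and flushing the buffer as one batch.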
Security Architecture
Authentication & Authorization
API Security:
- API keys for service-to-service calls
- JWT tokens for user sessions
- Role-based access control (RBAC)
- Rate limiting: 100 req/min per user, 10K/min per key
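Per-user rate limiting (100 req/min) is commonly implemented as a token bucket; here is a single-process sketch under that assumption (in production the bucket state would live in Redis so all API replicas share it):

```python
import time

class TokenBucket:
    """Toy token bucket: tokens refill continuously at `rate_per_sec`,
    and each allowed request consumes one token."""

    def __init__(self, rate_per_sec: float, capacity: float):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# 100 req/min sustained, with a burst allowance of 5 requests.
bucket = TokenBucket(rate_per_sec=100 / 60, capacity=5)
decisions = [bucket.allow() for _ in range(7)]  # burst of 7 requests
```

The capacity parameter decouples burst tolerance from the sustained rate, which is why token buckets are preferred over fixed per-minute counters.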
Data Security:
- Encryption at rest (AES-256)
- Encryption in transit (TLS 1.3)
- Database: Column-level encryption for PII
- Secrets management: HashiCorp Vault
Network Security
```
┌──────────────────────────────────────┐
│           CDN (Cloudflare)           │
└──────────────────┬───────────────────┘
                   │
┌──────────────────▼───────────────────┐
│   WAF (Web Application Firewall)     │
│   - DDoS protection                  │
│   - SQL injection blocking           │
│   - Rate limiting                    │
└──────────────────┬───────────────────┘
                   │
┌──────────────────▼───────────────────┐
│   Load Balancer (TLS Termination)    │
└──────────────────┬───────────────────┘
                   │
┌──────────────────▼───────────────────┐
│  Kubernetes Pod Security Policies    │
└──────────────────┬───────────────────┘
                   │
┌──────────────────▼───────────────────┐
│  Network Policies (Egress/Ingress)   │
└──────────────────────────────────────┘
```
Monitoring & Logging
Observability Stack
| Tool | Purpose | Metrics |
|---|---|---|
| Prometheus | Metrics collection | CPU, memory, requests |
| Grafana | Visualization | Dashboards & alerts |
| ELK Stack | Log aggregation | Errors, performance |
| Jaeger | Distributed tracing | Request flow, latency |
Key Metrics to Monitor
System Health:
- API latency (P50, P95, P99)
- Error rate (4xx, 5xx)
- Pod restart count
- Database connection pool usage
Model Performance:
- Prediction accuracy (MAE, RMSE)
- Prediction drift detection
- Inference latency distribution
- Cache hit ratio
Business Metrics:
- User requests per second
- Route recommendation adoption rate
- User satisfaction (ratings)
- Cost per prediction
Disaster Recovery
Backup Strategy:
- Database: Daily full backups, hourly incremental
- Models: Version-controlled in S3, Git
- Code: GitHub with branch protection
- Configuration: Terraform state stored in GCS
Recovery Time Objectives (RTO):
- Complete system failure: 1 hour
- Database failure: 15 minutes
- Model failure: 5 minutes (fallback model)
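The 5-minute model-failure RTO depends on a fallback path. A sketch of the pattern, with an illustrative historical-average baseline standing in for the real fallback model:

```python
def predict_with_fallback(primary, fallback, features):
    # Serve a degraded but valid answer rather than failing the request.
    try:
        return primary(features), "primary"
    except Exception:
        return fallback(features), "fallback"

def historical_avg(features):
    # Baseline: e.g. the mean travel time for this route and hour.
    return 15.0

def broken_model(features):
    # Simulates the primary model being unavailable.
    raise RuntimeError("model unavailable")

value, source = predict_with_fallback(broken_model, historical_avg, {})
```

Tagging the response with its source (`primary` vs `fallback`) lets monitoring count fallback rates and trigger the recovery runbook.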
Scalability
Horizontal Scaling:
- Stateless API services (unlimited)
- Database replicas for read scaling
- Caching layer for hot data
Vertical Scaling:
- Increase instance size for memory-intensive jobs
- GPU nodes for model training
Expected Scale:
- 10M users
- 1M requests/hour
- 100 predictions/second sustained
- 500 predictions/second burst
Cost Optimization
Resource Management:
- Spot instances for batch jobs (70% savings)
- Reserved instances for baseline (40% savings)
- Autoscaling to match demand (30% savings)
Estimated Monthly Costs:
- Compute: $5,000
- Storage: $1,000
- Data transfer: $2,000
- Total: $8,000
Cost Reduction Path:
- Implement caching (15% reduction)
- Use spot instances (20% reduction)
- Optimize queries (10% reduction)
- Total potential savings: up to 45% (additive upper bound; ~39% if the three reductions compound, since 0.85 × 0.80 × 0.90 ≈ 0.61 of baseline)