Traffic Flow Forecasting
Summary
Forecasts short-term roadway demand via a lightweight ML pipeline that can run on commodity hardware.
Accurate short-term forecasts let dispatch planners pre-stage crews without overreacting to noisy counts.
Problem
Operators need reliable traffic estimates ahead of dispatch windows, but raw counting data is noisy and released with lag.
Approach
Engineered a pipeline that blends historical counts, calendar context, and weather features, then trains gradient-boosted trees with early stopping and cross-validation.
Highlights
- Feature pipeline merges historical counts with weather and calendar context windows.
- Training includes early stopping, cross-validation, and serialization for CLI inference.
- Inference script ingests CSV data and writes per-timestep predictions with confidence flags.
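The merge step above can be sketched with the standard library alone; the field names (`temp_c`, `is_holiday`) are illustrative, not the pipeline's real column names:

```python
# Sketch of the feature merge: join hourly counts with weather readings
# and calendar flags keyed on timestamp. Field names are illustrative.
from datetime import datetime

def build_features(counts, weather, holidays):
    """counts:   {iso_timestamp: vehicle_count}
       weather:  {iso_timestamp: temp_c}
       holidays: set of ISO dates ("YYYY-MM-DD")"""
    rows = []
    for ts, count in sorted(counts.items()):
        dt = datetime.fromisoformat(ts)
        rows.append({
            "timestamp": ts,
            "count": count,
            "temp_c": weather.get(ts),   # None when the sensor lagged
            "hour": dt.hour,
            "weekday": dt.weekday(),
            "is_holiday": dt.date().isoformat() in holidays,
        })
    return rows

counts = {"2024-07-04T08:00:00": 120, "2024-07-04T09:00:00": 180}
weather = {"2024-07-04T08:00:00": 21.5}
rows = build_features(counts, weather, {"2024-07-04"})
print(rows[0]["is_holiday"], rows[1]["temp_c"])  # True None
```

Missing weather readings surface as `None` rather than being imputed silently, which keeps the downstream confidence flag honest.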
Results and limitations
- Peak-hour surges need richer error analysis; the current feature set misses unusual congestion spikes.
- Confidence drops when weather sensors report errors; the fallback path still requires human oversight.
Quantitative benchmarks and visuals are in progress.
Trade-offs
- Gradient boosting is easier to explain than deep learning but still requires careful feature engineering.
- Batch retraining keeps the model current, but inference is still tied to the CSV ingestion cadence.
What it is
Traffic forecasting for operations, blending historical counts, weather, and calendar context to deliver dependable short-term demand estimates.
What was built
- Feature pipeline ingests CSV counts + weather and writes normalized datasets with time-of-day segments.
- Gradient-boosted trees trained with early stopping, hyperparameter sweeps, and per-corridor weighting.
- CLI inference script rounds predictions, flags confidence, and emits timestamped CSV for dashboards.
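The CLI's output step can be sketched as below. The confidence threshold and column names are assumptions, not the project's real contract:

```python
# Sketch of the inference CLI's output step: round predictions, attach a
# confidence flag, and emit timestamped CSV. Threshold is a placeholder.
import csv, io

CONF_THRESHOLD = 0.15  # flag rows where relative spread exceeds this

def write_predictions(rows, out):
    """rows: iterable of (timestamp, prediction, spread) tuples."""
    writer = csv.writer(out)
    writer.writerow(["timestamp", "predicted_count", "confidence"])
    for ts, pred, spread in rows:
        flag = "low" if pred and spread / pred > CONF_THRESHOLD else "ok"
        writer.writerow([ts, round(pred), flag])

buf = io.StringIO()
write_predictions([("2024-07-04T08:00:00", 118.6, 10.0),
                   ("2024-07-04T09:00:00", 180.2, 40.0)], buf)
print(buf.getvalue())
```

Keeping the flag as a plain column means dashboards can filter low-confidence rows without any model-side changes.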
Architecture / flow
Client → Python worker queue → model + feature store → PostgreSQL archive. Request flows go through the shared API, and responses feed a caching layer before dispatch.
Key trade-offs
- GBDTs sacrifice end-to-end deep learning for faster iteration and explainability.
- Batch retraining keeps calibration in sync but still depends on human review of sensor drift.
Proof
- Monitoring plots captured in `ci/traffic-latency.log` (linked in README).
- Inference script includes unit tests that exercise edge-window logic (see `/tests/traffic`).
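A hedged sketch of what the edge-window tests might exercise: a rolling mean that shrinks its window at series boundaries instead of emitting gaps. The function name and behavior are illustrative assumptions, not the actual code in `/tests/traffic`:

```python
# Illustrative edge-window logic: the window shrinks at the start of the
# series so the first timesteps still get a defined value.
def rolling_mean(values, window=3):
    out = []
    for i in range(len(values)):
        lo = max(0, i - window + 1)
        chunk = values[lo:i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def test_edge_windows():
    # First element averages only itself; second averages two points.
    assert rolling_mean([3, 6, 9, 12]) == [3.0, 4.5, 6.0, 9.0]

test_edge_windows()
```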
Next improvements
- Automate periodic evaluation so we can detect dataset drift before it shows up in production.
- Expand the model card with fairness notes around how weather/holiday data is sourced.
If revisiting today, I’d automate drift detection and log feature importances per corridor.
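A minimal sketch of what that drift check could look like, comparing recent feature means against a training-time baseline with a z-score; the threshold is a placeholder and the real check would run in a scheduled job:

```python
# Placeholder drift check: flag a feature when the recent sample mean
# sits more than z_threshold standard errors from the baseline mean.
import statistics

def drifted(baseline, recent, z_threshold=3.0):
    """Return True if recent values drifted from the baseline sample."""
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline) or 1e-9
    z = abs(statistics.mean(recent) - mu) / (sigma / len(recent) ** 0.5)
    return z > z_threshold

baseline = [100, 102, 98, 101, 99, 100, 103, 97]
assert not drifted(baseline, [101, 99, 100, 102])   # stable counts
assert drifted(baseline, [140, 150, 145, 155])      # sustained surge
```

A mean-shift test is deliberately simple; per-corridor feature importances logged at each retrain would catch subtler drift this misses.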
Links / data needed
- Repo link
- One screenshot
- One metric/benchmark
- One short demo artifact