Traffic Flow Forecasting
Summary
Forecasts short-term roadway demand via a lightweight ML pipeline that can run on commodity hardware.
Accurate short-term forecasts let dispatch planners pre-stage crews without overreacting to noisy counts.
Problem
Operators need reliable traffic estimates ahead of dispatch windows, but raw counting data is noisy and released with lag.
Approach
Engineered a pipeline that blends historical counts, calendar context, and weather features, then trains gradient-boosted trees with early stopping and cross-validation.
Highlights
- Feature pipeline merges historical counts with weather and calendar context windows.
- Training includes early stopping, cross-validation, and serialization for CLI inference.
- Inference script ingests CSV data and writes per-timestep predictions with confidence flags.
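The merge step above can be sketched with the standard library alone; the field names (`temp_c`, `is_holiday`) are illustrative, not the pipeline's real column names:

```python
# Sketch of the feature merge: join hourly counts with weather readings
# and calendar flags keyed on timestamp. Field names are illustrative.
from datetime import datetime

def build_features(counts, weather, holidays):
    """counts:   {iso_timestamp: vehicle_count}
       weather:  {iso_timestamp: temp_c}
       holidays: set of ISO dates ("YYYY-MM-DD")"""
    rows = []
    for ts, count in sorted(counts.items()):
        dt = datetime.fromisoformat(ts)
        rows.append({
            "timestamp": ts,
            "count": count,
            "temp_c": weather.get(ts),   # None when the sensor lagged
            "hour": dt.hour,
            "weekday": dt.weekday(),
            "is_holiday": dt.date().isoformat() in holidays,
        })
    return rows

counts = {"2024-07-04T08:00:00": 120, "2024-07-04T09:00:00": 180}
weather = {"2024-07-04T08:00:00": 21.5}
rows = build_features(counts, weather, {"2024-07-04"})
print(rows[0]["is_holiday"], rows[1]["temp_c"])  # True None
```

Missing weather readings surface as `None` rather than being imputed silently, which keeps the downstream confidence flag honest.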
Results and limitations
- Peak-hour surges need richer error analysis; the current feature set misses unusual congestion spikes.
- Confidence drops when weather sensors report errors; the fallback path still requires human oversight.
Quantitative benchmarks and visuals are in progress.
Trade-offs
- Gradient boosting is easier to explain than deep learning but still requires careful feature engineering.
- Batch retraining keeps the model current, but inference is still tied to the CSV ingestion cadence.
What it is
Traffic forecasting for operations, blending historical counts, weather, and calendar context to deliver dependable short-term demand estimates.
What was built
- Feature pipeline ingests CSV counts + weather and writes normalized datasets with time-of-day segments.
- Gradient-boosted trees trained with early stopping, hyperparameter sweeps, and per-corridor weighting.
- CLI inference script rounds predictions, flags confidence, and emits timestamped CSV for dashboards.
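The CLI's output step can be sketched as below. The confidence threshold and column names are assumptions, not the project's real contract:

```python
# Sketch of the inference CLI's output step: round predictions, attach a
# confidence flag, and emit timestamped CSV. Threshold is a placeholder.
import csv, io

CONF_THRESHOLD = 0.15  # flag rows where relative spread exceeds this

def write_predictions(rows, out):
    """rows: iterable of (timestamp, prediction, spread) tuples."""
    writer = csv.writer(out)
    writer.writerow(["timestamp", "predicted_count", "confidence"])
    for ts, pred, spread in rows:
        flag = "low" if pred and spread / pred > CONF_THRESHOLD else "ok"
        writer.writerow([ts, round(pred), flag])

buf = io.StringIO()
write_predictions([("2024-07-04T08:00:00", 118.6, 10.0),
                   ("2024-07-04T09:00:00", 180.2, 40.0)], buf)
print(buf.getvalue())
```

Keeping the flag as a plain column means dashboards can filter low-confidence rows without any model-side changes.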
Architecture / flow
Client → Python worker queue → model + feature store → PostgreSQL archive. Request flows go through the shared API, and responses feed a caching layer before dispatch.
Key trade-offs
- GBDTs sacrifice end-to-end deep learning for faster iteration and explainability.
- Batch retraining keeps calibration in sync but still depends on human review of sensor drift.
Proof
- Monitoring plots captured in `ci/traffic-latency.log` (linked in README).
- Inference script includes unit tests that exercise edge-window logic (see `/tests/traffic`).
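A hedged sketch of what the edge-window tests might exercise: a rolling mean that shrinks its window at series boundaries instead of emitting gaps. The function name and behavior are illustrative assumptions, not the actual code in `/tests/traffic`:

```python
# Illustrative edge-window logic: the window shrinks at the start of the
# series so the first timesteps still get a defined value.
def rolling_mean(values, window=3):
    out = []
    for i in range(len(values)):
        lo = max(0, i - window + 1)
        chunk = values[lo:i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def test_edge_windows():
    # First element averages only itself; second averages two points.
    assert rolling_mean([3, 6, 9, 12]) == [3.0, 4.5, 6.0, 9.0]

test_edge_windows()
```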
Next improvements
- Automate periodic evaluation so we can detect dataset drift before it shows up in production.
- Expand the model card with fairness notes around how weather/holiday data is sourced.
If revisiting today, I’d automate drift detection and log feature importances per corridor.
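A minimal sketch of what that drift check could look like, comparing recent feature means against a training-time baseline with a z-score; the threshold is a placeholder and the real check would run in a scheduled job:

```python
# Placeholder drift check: flag a feature when the recent sample mean
# sits more than z_threshold standard errors from the baseline mean.
import statistics

def drifted(baseline, recent, z_threshold=3.0):
    """Return True if recent values drifted from the baseline sample."""
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline) or 1e-9
    z = abs(statistics.mean(recent) - mu) / (sigma / len(recent) ** 0.5)
    return z > z_threshold

baseline = [100, 102, 98, 101, 99, 100, 103, 97]
assert not drifted(baseline, [101, 99, 100, 102])   # stable counts
assert drifted(baseline, [140, 150, 145, 155])      # sustained surge
```

A mean-shift test is deliberately simple; per-corridor feature importances logged at each retrain would catch subtler drift this misses.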
Links / data needed
- Repo link
- One screenshot
- One metric/benchmark
- One short demo artifact