
Happy New Year 2026 — wishing you good health and a positive year ahead. — Dr. Kusse Sukuta Bersha (PhD)

Featured Projects

Traffic Flow Forecasting

Summary

Forecasts short-term roadway demand via a lightweight ML pipeline that can run on commodity hardware.

Accurate short-term forecasts let dispatch planners pre-stage crews without overreacting to noisy counts.

Problem

Operators need reliable traffic estimates ahead of dispatch windows, but raw counting data is noisy and released with lag.

Approach

Engineered a pipeline that blends historical counts, calendar context, and weather features, then trains gradient-boosted trees with early stopping and cross-validation.

Highlights

  • Feature pipeline merges historical counts with weather and calendar context windows.
  • Training includes early stopping, cross-validation, and serialization for CLI inference.
  • Inference script ingests CSV data and writes per-timestep predictions with confidence flags.

Results

  • RMSE (validation): ≈ 9.3, calculated over the last 4 weeks of eastbound loop counts.
  • Inference latency: ≈ 180 ms per timestep, measured on a 2.4 GHz laptop.
  • Evaluation: gradient-boosted trees with 5-fold cross-validation; temporal holdout over the last 14 days; baseline is the last observed count.

Limitations

  • Peak-hour surges need richer error analysis; current feature set misses unusual congestion spikes.
  • Confidence drops when weather sensors report errors; fallback still requires human oversight.
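
For reference, the "last observed count" baseline in the Results section is a persistence forecast: each timestep is predicted as the previous observation, and RMSE is computed the same way as for the model. A sketch with toy numbers (not the project's data):

```python
import numpy as np

# Toy series of traffic counts; the persistence baseline predicts
# each timestep as the last observed count.
counts = np.array([120., 118., 131., 140., 125., 119., 122., 135.])
baseline_pred = counts[:-1]  # shift by one step
actual = counts[1:]
rmse = np.sqrt(np.mean((actual - baseline_pred) ** 2))
print(round(rmse, 2))  # → 9.95
```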

In progress: adding benchmarks and visuals.

Trade-offs

  • Gradient boosting is easier to explain than deep learning but still requires careful feature engineering.
  • Batch retraining keeps the model current, but inference is still tied to the CSV ingestion cadence.

What it is

Traffic forecasting for operations, blending historical counts, weather context, and app-layer controls to deliver dependable short-term demand estimates.

What was built

  • Feature pipeline ingests CSV counts + weather and writes normalized datasets with time-of-day segments.
  • Gradient-boosted trees trained with early stopping, hyperparameter sweeps, and per-corridor weighting.
  • CLI inference script rounds predictions, flags confidence, and emits timestamped CSV for dashboards.
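
The post-processing step of the inference script (round predictions, flag confidence, emit CSV rows) might look like the following sketch. The column names, threshold, and `write_predictions` helper are illustrative assumptions, not the script's actual interface.

```python
import csv
import io

def write_predictions(rows, out, low_conf_threshold=0.6):
    """Round predictions, flag low confidence, and emit timestamped CSV.

    `rows` is an iterable of (timestamp, prediction, confidence) tuples;
    names and threshold are hypothetical.
    """
    writer = csv.writer(out)
    writer.writerow(["timestamp", "prediction", "confidence_flag"])
    for ts, pred, conf in rows:
        flag = "low" if conf < low_conf_threshold else "ok"
        writer.writerow([ts, round(pred), flag])

buf = io.StringIO()
write_predictions(
    [("2025-01-01T08:00", 131.6, 0.82), ("2025-01-01T08:05", 118.2, 0.41)],
    buf,
)
print(buf.getvalue())
```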

Architecture / flow

Client → Python worker queue → model + feature store → PostgreSQL archive. Request flows go through the shared API, and responses feed a caching layer before dispatch.
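
The worker-queue leg of this flow can be sketched with the standard library; the model and feature store are stubbed here, and the job shape is an assumption:

```python
import queue
import threading

jobs = queue.Queue()
results = {}

def model_predict(features):
    # Stub standing in for the GBDT model + feature store lookup.
    return sum(features) / len(features)

def worker():
    # Drain jobs until a sentinel arrives, writing results keyed by job id.
    while True:
        job_id, features = jobs.get()
        if job_id is None:
            break
        results[job_id] = model_predict(features)
        jobs.task_done()

t = threading.Thread(target=worker)
t.start()
jobs.put(("corridor-7", [120, 118, 131]))
jobs.put((None, None))  # sentinel to stop the worker
t.join()
print(results["corridor-7"])  # → 123.0
```

In the described architecture the result would be archived to PostgreSQL and cached before dispatch rather than kept in a dict.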

Key trade-offs

  • GBDTs sacrifice end-to-end deep learning for faster iteration and explainability.
  • Batch retraining keeps calibration in sync but still depends on human review of sensor drift.

Proof

  • Monitoring plots captured in `ci/traffic-latency.log` (linked in README).
  • Inference script includes unit tests that exercise edge-window logic (see `/tests/traffic`).
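
A hypothetical check in the spirit of the edge-window tests: lag features at the start of a series have no history, so the pipeline should mask those rows rather than predict on them. The `lag_feature` helper below is illustrative, not the tested code.

```python
def lag_feature(series, lag):
    # Shift a series by `lag` steps, padding the edge with None
    # where no history exists yet.
    return [None] * lag + series[:-lag]

window = lag_feature([10, 12, 11, 13], lag=2)
assert window[:2] == [None, None]  # no history at the edge
assert window[2:] == [10, 12]
print("edge-window checks passed")
```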

Next improvements

  • Automate periodic evaluation so we can detect dataset drift before it shows up in production.
  • Expand the model card with fairness notes around how weather/holiday data is sourced.

If revisiting today

I’d automate drift detection and log feature importances per corridor.
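
One simple way such drift detection could work is comparing recent feature means against a training-time baseline and flagging a corridor when the shift exceeds a z-score threshold. A hedged sketch with toy data, not the project's code:

```python
def drift_flag(baseline, recent, threshold=3.0):
    """Flag drift when the recent mean shifts beyond `threshold`
    standard errors from the baseline mean (toy heuristic)."""
    n = len(baseline)
    mu = sum(baseline) / n
    var = sum((x - mu) ** 2 for x in baseline) / n
    sigma = var ** 0.5 + 1e-9  # avoid division by zero
    z = abs(sum(recent) / len(recent) - mu) / (sigma / len(recent) ** 0.5)
    return z > threshold

baseline = [100.0] * 50 + [110.0] * 50  # training-time counts (toy data)
print(drift_flag(baseline, [104.0, 106.0] * 5))  # stable corridor → False
print(drift_flag(baseline, [130.0] * 10))        # drifted corridor → True
```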

Links

Data needed

  • Repo link
  • One screenshot
  • One metric/benchmark
  • One short demo artifact
