This project predicts prices for active Melbourne property listings using transaction history. It scrapes Domain.com.au weekly, enriches the data with ABS demographics, crime statistics, and proximity-to-transport features, then trains an XGBoost model to predict prices for For Sale listings.
The data snapshot is refreshed every Sunday at 9:00 AM Melbourne time. Predictions reflect
the current market level because today's Year and Month are injected as features at inference.
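The Year/Month injection can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `inject_current_period` and the example fields `Suburb` and `Bedrooms` are hypothetical, and only the `Year` and `Month` feature names come from the description above.

```python
from datetime import datetime
from typing import Optional
from zoneinfo import ZoneInfo

MELBOURNE_TZ = "Australia/Melbourne"


def inject_current_period(listing: dict, now: Optional[datetime] = None) -> dict:
    """Return a copy of `listing` with Year and Month set to the current date.

    `now` defaults to the current time in Melbourne; passing it explicitly
    makes the function easy to test. (Hypothetical helper, for illustration.)
    """
    if now is None:
        now = datetime.now(ZoneInfo(MELBOURNE_TZ))
    features = dict(listing)      # copy so the caller's row is left untouched
    features["Year"] = now.year   # overwrite any stale value from the scrape
    features["Month"] = now.month
    return features


# Example: a listing scraped earlier is scored as if it were listed today.
row = {"Suburb": "Carlton", "Bedrooms": 3, "Year": 2021, "Month": 6}
scored_row = inject_current_period(row, now=datetime(2025, 3, 9, 9, 0))
```

Overwriting the date features at inference means a model trained on historical sales prices each For Sale listing at today's market level rather than at the level of the scrape date.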
The dashboard runs the XGBoost model directly in the browser via ONNX Runtime Web, so on-demand price estimates are instant and entirely client-side.
Interactive map with 38k For Sale listings, suburb choropleth, filters, charts, and an in-browser ONNX prediction tab.
Exploratory analysis: data quality, outliers, time-series price inflation, Sold vs For Sale drift.
Model development: Linear vs Random Forest vs XGBoost, SHAP interpretability, final predictions.
GitHub repository with full pipeline, notebooks, and reproducible scripts.
etl/extract_house_price.py: pulls Sold and For Sale listings from Domain.com.au.
etl/enrich_property_data.py: joins ABS demographics, crime rate, and train-station distance per suburb.
production/clean.py: outlier handling, per-type LandSize caps, NaN policies.
production/train_pipeline.py: time-based 70/15/15 split, XGBoost with locked hyperparameters, quantile models for prediction intervals. Exports .pkl and .onnx for browser inference.
production/predict.py: For Sale inference with current Year/Month injection, deal-signal classification.
production/weekly_update.py: GitHub Actions runs all stages every Sunday.
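The time-based 70/15/15 split in the training stage can be sketched as below. This is an illustrative stdlib-only version under stated assumptions, not the code in production/train_pipeline.py: the function name `time_based_split` and the `SoldDate` column are hypothetical, and the three parts are assumed to be train, validation, and test in that order.

```python
from datetime import date
from typing import Dict, List, Tuple

Row = Dict[str, object]


def time_based_split(
    rows: List[Row],
    date_key: str = "SoldDate",  # hypothetical column name
    train_pct: int = 70,
    val_pct: int = 15,
) -> Tuple[List[Row], List[Row], List[Row]]:
    """Split rows chronologically: oldest for training, newest for testing."""
    ordered = sorted(rows, key=lambda r: r[date_key])
    n = len(ordered)
    # Integer arithmetic avoids floating-point surprises at the boundaries.
    train_end = n * train_pct // 100
    val_end = n * (train_pct + val_pct) // 100
    return ordered[:train_end], ordered[train_end:val_end], ordered[val_end:]


# Example: 20 sales on distinct January days, supplied out of order.
sales = [{"SoldDate": date(2024, 1, 1 + (i * 7) % 20), "Price": 500_000 + i}
         for i in range(20)]
train, val, test = time_based_split(sales)
```

Unlike a random split, every validation and test sale is later than every training sale, so evaluation mimics predicting prices for periods the model has not seen.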