What it covers

Two-notebook project on the Ames Housing dataset:

  • Part 1 — Data Preparation & Exploration: feature type classification, ordinal encoding, missing-value imputation, correlation analysis
  • Part 2 — Modeling & Analysis: Ridge with hyperparameter tuning, Decision Tree regression, Random Forest with feature importance, KNN with standardization, log-transform feature engineering, outlier detection, SHAP force plots + global SHAP summary, Partial Dependence Plots, PCA, t-SNE, Gaussian Mixture clustering
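As an illustration of the ordinal-encoding step in Part 1, here is a minimal sketch in plain Python. It assumes the Ames quality scale (Po < Fa < TA < Gd < Ex). The names `QUALITY_ORDER` and `encode_quality`, and the choice to map missing values to rank 0 (reasonable for columns such as BsmtQual, where NA means the feature is absent), are illustrative assumptions and not necessarily what the notebooks do.

```python
# Ames-style quality codes, ordered worst to best (Po < Fa < TA < Gd < Ex).
QUALITY_ORDER = ["Po", "Fa", "TA", "Gd", "Ex"]
QUALITY_TO_RANK = {code: rank for rank, code in enumerate(QUALITY_ORDER, start=1)}


def encode_quality(values, missing_rank=0):
    """Map quality codes to integer ranks.

    None or unrecognized codes get `missing_rank` (an assumption: here a
    missing value is treated as "feature absent" and ranked below "Po").
    """
    return [QUALITY_TO_RANK.get(v, missing_rank) for v in values]


# Toy basement-quality column (made up for illustration).
basement_qual = ["TA", "Gd", None, "Ex", "Fa"]
encoded = encode_quality(basement_qual)
print(encoded)  # [3, 4, 0, 5, 2]
```

Because the mapping is fixed by the data dictionary and not learned from the data, it is a training-independent step and can be applied before the train/test split.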

Why it matters

This is the methodological-depth piece of the portfolio — the project where every step is done by the book:

  • Separate training-dependent preprocessing (scalers, imputers, fit on the training split only) from training-independent steps
  • Ridge vs. RF vs. KNN with the same preprocessing pipeline — a clean comparison
  • SHAP + PDP as complementary explainability techniques (local force plots vs. global SHAP summary and partial dependence)
  • Feature engineering (log transforms) visibly lowers RMSE
  • Unsupervised analysis (PCA, t-SNE, GMM) for outlier detection and regime analysis
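The separation of training-dependent preprocessing mentioned in the list above can be sketched as follows. This is a stdlib-only toy, not the notebooks' code: the helper names `fit_preprocessor` and `transform`, the median imputation strategy, and the sample values are all assumptions made for illustration. The point it shows is that the fill value, mean, and standard deviation are learned from the training split alone and then reused unchanged on the test split.

```python
from statistics import mean, median, pstdev


def fit_preprocessor(train_column):
    """Learn imputation and scaling parameters from the training split only."""
    observed = [x for x in train_column if x is not None]
    fill = median(observed)
    filled = [fill if x is None else x for x in train_column]
    mu = mean(filled)
    sigma = pstdev(filled) or 1.0  # guard against a constant column
    return {"fill": fill, "mean": mu, "std": sigma}


def transform(column, params):
    """Apply the stored training parameters to any split."""
    filled = [params["fill"] if x is None else x for x in column]
    return [(x - params["mean"]) / params["std"] for x in filled]


# Toy living-area values (made up for illustration), each with a missing entry.
train_area = [1000, 1500, None, 2000, 2500]
test_area = [1200, None, 4000]

params = fit_preprocessor(train_area)        # statistics come from train only
train_scaled = transform(train_area, params)
test_scaled = transform(test_area, params)   # test reuses train statistics
```

Fitting on the full dataset instead would let test-set statistics leak into the model inputs. In scikit-learn the same separation is typically obtained by placing the imputer and scaler inside a `Pipeline`, so that they are refit on each training fold during cross-validation.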