Nobel Prize Data Lake — Medallion Architecture on AWS
A medallion-architecture data lake on AWS S3 with Prefect-orchestrated ETL Lambdas — raw API responses → bronze joins → silver analytics table.
An end-to-end ETL pipeline that ingests movie metadata from IMDb bulk files and a REST API, stages it in MongoDB, and lands it in a normalized relational schema with foreign keys.
A full BI stack analyzing 8,847 Airbnb listings — from raw CSV load through SQL modeling on Supabase to stakeholder dashboards in Preset.
An exploratory analysis of 4,000 movies that identifies the factors driving box-office success, framed around a low-budget-production investment scenario.
A distributed ML pipeline processing 17M Amazon reviews with PySpark MLlib on AWS Glue — including S3 medallion storage, feature engineering, and model serialization for batch inference.
A continuous ingestion pipeline for Spanish grid demand with InfluxDB storage and Prophet-based day-ahead forecasting. Includes a dashboard comparing actual and forecast demand.
A pandas-driven analysis across 13 relational CSVs (75 years of F1 history) with multi-way joins, filtering, and map-based visualization.