Analytics Engineering

Nobel Prize Data Lake — Medallion Architecture on AWS

A medallion-architecture data lake on AWS S3 with Prefect-orchestrated ETL Lambdas — raw API responses → bronze joins → silver analytics table.

Movie Database ETL Pipeline — Multi-Source Ingestion to SQLite

An end-to-end ETL pipeline that ingests movie metadata from IMDb bulk files and a REST API, stages it in MongoDB, and lands it in a normalized SQLite schema with foreign keys.

Airbnb Valencia — Cloud BI with Supabase + Preset

A full BI stack analyzing 8,847 Airbnb listings — from raw CSV load through SQL modeling on Supabase to stakeholder dashboards in Preset.

Movie Analytics — Deep EDA for Investment Decisions

A deep exploratory analysis across 4,000 movies to identify the factors driving box-office success — framed around a low-budget-production investment scenario.

Sentiment Analysis at Scale — PySpark on AWS

A distributed ML pipeline processing 17M Amazon reviews with PySpark MLlib on AWS Glue — including S3 medallion storage, feature engineering, and model serialization for batch inference.

Spanish Electricity Demand — Time-Series Pipeline with InfluxDB + Forecasting

A continuous ingestion pipeline for Spanish grid demand with InfluxDB storage and Prophet-based day-ahead forecasting. Includes a dashboard comparing actual and forecast demand.

Formula 1 Data Analysis — Multi-Table Pandas Pipeline

A pandas-driven analysis across 13 relational CSVs (75 years of F1 history) with multi-way joins, filtering, and map-based visualization.
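As a rough illustration of the multi-way join and filter pattern, here is a minimal sketch using three tiny in-memory tables in place of the CSVs. The table names, column names (`raceId`, `driverId`, `position`, `year`, `surname`), the sample rows, and the `wins_per_driver` helper are all assumptions made for this example, not the project's actual schema or code.

```python
import pandas as pd

# Hypothetical stand-ins for three of the relational CSVs.
races = pd.DataFrame({"raceId": [1, 2], "year": [2008, 2009]})
drivers = pd.DataFrame({"driverId": [100, 101], "surname": ["Driver A", "Driver B"]})
results = pd.DataFrame({
    "raceId": [1, 1, 2, 2],
    "driverId": [100, 101, 100, 101],
    "position": [1, 2, 2, 1],
})


def wins_per_driver(results, races, drivers, since_year):
    """Join results to races and drivers, then count wins from since_year onward."""
    joined = (
        results
        # validate= raises if a key is unexpectedly duplicated on the lookup side
        .merge(races, on="raceId", how="inner", validate="many_to_one")
        .merge(drivers, on="driverId", how="inner", validate="many_to_one")
    )
    winners = joined[(joined["position"] == 1) & (joined["year"] >= since_year)]
    return winners.groupby("surname").size().sort_values(ascending=False)


wins = wins_per_driver(results, races, drivers, since_year=2008)
print(wins)
```

Passing `validate="many_to_one"` is a cheap guard here: a duplicated key in a lookup table would otherwise silently multiply rows and inflate the counts.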