Airbnb Valencia — Cloud BI with Supabase + Preset | Luis Núñez

The problem

Analyze the Valencia Airbnb market using production-style BI infrastructure: cloud PostgreSQL as the warehouse, a modern BI tool as the front-end. Two stakeholders — a prospective investor and an established property manager — each need a tailored dashboard answering their specific questions.

Stack

Warehouse: Supabase-hosted PostgreSQL (Session Pooler connection)
BI tool: Preset (managed Apache Superset)
Source: Inside Airbnb — 8,847 listings + 416,389 reviews for Valencia
Orchestration/ingest: Python + psycopg2 + pandas

Architecture

┌──────────────────┐
│ Inside Airbnb    │
│ CSV.gz downloads │
└──────────┬───────┘
           ▼
┌──────────────────┐    COPY / psycopg2
│  Supabase Postgres│◄─────────────────
│  apartamentos    │
│  reviews         │
└──────────┬───────┘
           ▼      CREATE TABLE AS SELECT
   ┌──────────────────┐
   │ apartamentos_clean│    (proper types, selected cols)
   │ reviews_clean     │
   └──────────┬───────┘
              ▼
      ┌──────────────────┐
      │  Preset workspace │
      │  SQL Lab          │
      │  2 dashboards     │
      └──────────────────┘

Key decisions and trade-offs

Raw-to-clean modeled with `CREATE TABLE AS`

Instead of storing raw Airbnb data and wrangling types at query time in Preset, I materialized clean tables in Postgres via CREATE TABLE AS SELECT. Typed columns, scrubbed sentinels (\N, nan, empty strings), currency stripping from price, and host_is_superhost binarization all happen once at load rather than repeatedly per query.

CREATE TABLE public.apartamentos_clean AS
SELECT
    id::BIGINT AS id,
    neighbourhood_cleansed::VARCHAR(100) AS neighbourhood_cleansed,
    CAST(NULLIF(NULLIF(NULLIF(latitude, '\N'), ''), 'nan') AS FLOAT) AS latitude,
    CASE
        WHEN price IN ('\N', '', 'nan') THEN NULL
        ELSE CAST(REPLACE(REPLACE(price, '$', ''), ',', '') AS NUMERIC(10,2))
    END AS price,
    CASE
        WHEN host_is_superhost IN ('t', 'true', 'True', 'TRUE') THEN TRUE
        ELSE FALSE
    END AS host_is_superhost,
    -- ... 20+ other casts
FROM public.apartamentos;

This is the standard analytics-engineering pattern: push data-quality logic as close to the source as possible, so downstream queries (and BI tool users) get clean data by default.

Business-question framing, not just “SQL practice”

Every analytical query answers a concrete business question:

Where are the most expensive listings by property type?
How has review volume evolved over time?
Which neighborhoods offer the best value-for-money?
Where are large-portfolio hosts concentrated?
Which neighborhoods grew fastest in the last 3 years?

Each query is paired with a chart-type recommendation and a written analysis — because “what chart to build” is half the AE job.

Two dashboard personas, two different lens

Investor dashboard answers “where to invest?”:

Revenue per listing by neighborhood (choropleth map)
Review volume and average price over time (time series)
Rating vs. review count by neighborhood (scatter)
Year-over-year growth in supply (stacked bar)

Property manager dashboard answers “how is my portfolio performing?”:

Listing-level detail table with host filters
Per-listing control panel (reviews, rating, price)
Price vs. rating scatter to surface outliers

Same underlying dataset. Different aggregations. Different questions. Different storytelling.

Notable findings

2020 COVID crater: Review volume cratered mid-2020, recovered through 2021, and now exceeds pre-pandemic levels
Superhost paradox: Superhosts charge less on average than non-Superhosts — possibly optimizing for occupancy over premium pricing
Fallas effect: March is a consistent review-volume peak across all years — a measurable tourism signal for the local event

Dashboard outputs

The final PDF exports of both dashboards are in the Resultados/ folder alongside the notebook.

The problem#

Stack#

Architecture#

Key decisions and trade-offs#

Raw-to-clean modeled with CREATE TABLE AS#

Business-question framing, not just “SQL practice”#

Two dashboard personas, two different lens#

Notable findings#

Dashboard outputs#

Links#