Data Model Reference

This document covers the core Python types in src/models/__init__.py, the EventData TypedDict that extractors use as their output contract, and the type-safety guarantees enforced at pre-commit time.

For how events reach Cloudflare D1, see D1_DEPLOYMENT.md. For how extractors populate these fields, see EXTRACTORS.md.


Table of Contents

  1. VegasEvent
  2. Event Identity: composite_key vs external_id
  3. Supporting Models
  4. EventData TypedDict
  5. Type Safety Guarantees
  6. Data Flow

Entity Relationships

erDiagram
    VegasEvent ||--o| TablePricing : "has pricing"
    VegasEvent ||--o{ ImageMetadata : "has images"
    VegasEvent ||--o| StreamingLinks : "has links"
    VegasEvent ||--o| ArtistStats : "has stats"
    VegasEvent ||--o| EnrichmentStatus : "tracks enrichment"
    VegasEvent ||--o| SocialLinks : "has socials"
    TablePricing ||--|{ TablePricingTier : "has tiers"

    VegasEvent {
        string event_date
        string performer
        string venue
        string url
        string composite_key
        string venue_id
        string external_id
    }
    TablePricing {
        list tiers
    }
    TablePricingTier {
        string name
        float min_spend
        float pay_now
        int guests
        bool available
    }
    ImageMetadata {
        string source_url
        string category
        dict sizes
        string status
    }
    EnrichmentStatus {
        bool spotify
        bool tracklists
        bool resident_advisor
        dict errors
    }

VegasEvent

File: src/models/__init__.py, class VegasEvent(BaseModel)

The canonical representation of a single nightlife event. All extractors produce a VegasEvent; all exporters (D1, SQLite, CSV) consume one.

Fields

| Field | Type | Required | Description |
|---|---|---|---|
| event_date | str | Yes | YYYY-MM-DD, validated by field_validator |
| performer | str | Yes | Main headliner name (display form, e.g. "DOM DOLLA") |
| venue | str | Yes | Venue name (e.g. "LIV Las Vegas") |
| url | str | Yes | Source URL the event was scraped from |
| scraped_at | str | Yes | ISO-8601 timestamp with timezone |
| title | str \| None | No | Page title, e.g. "DOM DOLLA at LIV Las Vegas" |
| event_time | str \| None | No | HH:MM local time |
| event_datetime | str \| None | No | ISO-8601 datetime if available |
| venue_address | str \| None | No | Street address |
| age_requirement | str | No | Defaults to "21+ only" |
| description | str \| None | No | Full event description |
| description_short | str \| None | No | Short teaser copy |
| artist_bio | str \| None | No | Artist biography (set by enrichment, not scraping) |
| artist_image_url | str \| None | No | R2 _main preset URL after upload; None until then |
| artist_image_url_full | str \| None | No | R2 _hd preset URL after upload; None until then |
| streaming_links | StreamingLinks \| None | No | Spotify / Apple Music / YouTube / SoundCloud |
| social_links | SocialLinks \| None | No | Instagram / TikTok / Beatport etc. |
| artist_stats | ArtistStats \| None | No | Spotify stats (set by enrichment) |
| enrichment_status | EnrichmentStatus \| None | No | Tracks which enrichment passes have run |
| performers | list[str] \| None | No | All performers, including headliner and supporting acts |
| venue_description | str \| None | No | Venue blurb |
| venue_id | str \| None | No | Venue code for the table-pricing API (e.g. "VEN1121561") |
| table_pricing | TablePricing \| None | No | VIP table sections and min-spend tiers |
| ticket_url | str \| None | No | Direct ticket purchase link |
| external_id | str \| None | No | Scraper-assigned ID (e.g. EVE111500020260531); not the D1 PK |
| images | list[Any] | No | ImageMetadata objects (populated by image plugins) |
| history | EventHistory \| None | No | Field-level change log; excluded from serialization |
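
The table says event_date is validated by a field_validator, but the validator body is not shown in this document. The following is only a sketch of the kind of check it might perform, written as a plain function instead of a Pydantic validator. The name validate_event_date and the exact strictness (zero-padded parts, real calendar dates only) are assumptions.

```python
import re
from datetime import date

# Exactly four digits, two digits, two digits, separated by hyphens.
_DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def validate_event_date(value: str) -> str:
    """Return value unchanged if it is a real calendar date in YYYY-MM-DD form.

    Hypothetical helper: the actual validator in src/models/__init__.py may differ.
    """
    if not _DATE_RE.fullmatch(value):
        raise ValueError(f"event_date must be YYYY-MM-DD, got {value!r}")
    year, month, day = (int(part) for part in value.split("-"))
    date(year, month, day)  # raises ValueError for impossible dates such as 2026-02-30
    return value
```

A strict check of this kind matters because event_date feeds directly into composite_key: "2026-5-2" and "2026-05-02" would otherwise produce two different keys for the same night.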

Key computed property

@property
def composite_key(self) -> str:
    perf = self.performer.lower().replace(" ", "-")
    ven  = self.venue.lower().replace(" ", "-")
    return f"{self.event_date}-{perf}-{ven}"
    # → "2026-05-02-dom-dolla-liv-las-vegas"
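
To try the property in isolation, the sketch below copies its logic onto a small dataclass. EventStub is a hypothetical stand-in that carries only the three fields the property reads; it is not the real Pydantic model.

```python
from dataclasses import dataclass


@dataclass
class EventStub:
    """Stand-in for VegasEvent with only the fields composite_key reads."""

    event_date: str
    performer: str
    venue: str

    @property
    def composite_key(self) -> str:
        # Same steps as the property above: lowercase, then spaces to hyphens.
        perf = self.performer.lower().replace(" ", "-")
        ven = self.venue.lower().replace(" ", "-")
        return f"{self.event_date}-{perf}-{ven}"


event = EventStub("2026-05-02", "DOM DOLLA", "LIV Las Vegas")
print(event.composite_key)  # 2026-05-02-dom-dolla-liv-las-vegas
```

Note that the slug step only lowercases and replaces spaces. Differences in casing collapse to the same key, but punctuation and extra whitespace pass through unchanged, so extractors need to emit performer and venue names consistently.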

Event Identity: composite_key vs external_id

This distinction is the most common source of confusion. Read this section carefully before touching any exporter or migration.

external_id — unstable, informational

VegasEvent.external_id stores whatever ID the scraper found on the page, typically the EVE... segment of a Wynn Social or LIV URL:

https://www.wynnsocial.com/event/EVE111500020260531/dom-dolla/
                                 ^^^^^^^^^^^^^^^^^^
                                 external_id = "EVE111500020260531"

This value is not stable across scrapes: the same real-world event may carry a different EVE... string each time it is re-scraped. It is stored for debugging and written to the D1 events table as a non-PK column, but it is never used as a join key.

composite_key — stable, the D1 primary key

composite_key = f"{event_date}-{performer_slug}-{venue_slug}"
# "2026-05-02-dom-dolla-liv-las-vegas"

This is derived entirely from human-meaningful fields that don't change between scrapes of the same event. It is used as event_id (the PK) in every D1 table, so all child rows (event_artists, table_tiers, images) join on it.

The D1 exporter enforces this with an assertion:

primary_event_id = event.composite_key
assert primary_event_id, f"composite_key is empty for event: {event.model_dump()}"

Why not just use external_id as the PK?

| Concern | composite_key | external_id |
|---|---|---|
| Stable across re-scrapes | ✅ Yes | ❌ No |
| Human-readable in dashboard | ✅ Yes | ⚠️ Opaque |
| Works without scraper-side IDs | ✅ Yes | ❌ Breaks on HTML-only pages |
| Derivable from site data alone | ✅ Yes | ❌ Only on JS-heavy pages |
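
The stability argument can be illustrated with two scrapes of the same event. This is a sketch with plain dicts, not the exporter's code; the second external_id value is invented for the example, and the two lookup tables stand in for a database keyed one way or the other.

```python
def composite_key(event_date: str, performer: str, venue: str) -> str:
    """Same derivation as VegasEvent.composite_key, as a free function."""
    perf = performer.lower().replace(" ", "-")
    ven = venue.lower().replace(" ", "-")
    return f"{event_date}-{perf}-{ven}"


# Two scrapes of one real-world event; only the scraper-side ID differs.
base = {"event_date": "2026-05-02", "performer": "DOM DOLLA", "venue": "LIV Las Vegas"}
scrapes = [
    {**base, "external_id": "EVE111500020260531"},
    {**base, "external_id": "EVE222200020260531"},  # hypothetical second ID
]

rows_by_composite: dict[str, dict] = {}
rows_by_external: dict[str, dict] = {}
for scrape in scrapes:
    key = composite_key(scrape["event_date"], scrape["performer"], scrape["venue"])
    rows_by_composite[key] = scrape                    # second scrape updates the row
    rows_by_external[scrape["external_id"]] = scrape   # second scrape adds a duplicate

print(len(rows_by_composite), len(rows_by_external))  # 1 2
```

Keyed on composite_key, the re-scrape behaves as an update. Keyed on external_id, the same event would appear twice and any child rows would be split between the two parents.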

Supporting Models

StreamingLinks

class StreamingLinks(BaseModel):
    spotify:     str | None = None
    apple_music: str | None = None
    youtube:     str | None = None
    soundcloud:  str | None = None

Stored as streaming_links_json TEXT in D1 (JSON-encoded). Populated by the LIV extractor (from description HTML) and by the Spotify enrichment plugin.
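
As a rough illustration of the JSON-encoded column, the sketch below serialises a stand-in object with the same four fields. StreamingLinksStub and to_streaming_links_json are hypothetical names, and whether the real exporter keeps or drops None-valued keys is an assumption (they are kept here). With the actual Pydantic v2 model, model_dump_json() would be the natural counterpart.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class StreamingLinksStub:
    """Stand-in mirroring the four StreamingLinks fields."""

    spotify: Optional[str] = None
    apple_music: Optional[str] = None
    youtube: Optional[str] = None
    soundcloud: Optional[str] = None


def to_streaming_links_json(links: Optional[StreamingLinksStub]) -> Optional[str]:
    """Encode links for a TEXT column; a missing object becomes NULL (None)."""
    if links is None:
        return None
    return json.dumps(asdict(links), sort_keys=True)


links = StreamingLinksStub(spotify="https://open.spotify.com/artist/example")
print(to_streaming_links_json(links))
```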

TablePricing / TablePricingTier

class TablePricingTier(BaseModel):
    name:      str            # "Dance Floor", "VIP Booth", etc.
    min_spend: float | None
    pay_now:   float | None   # deposit amount
    guests:    int | None     # table capacity
    available: bool = True

class TablePricing(BaseModel):
    tiers: list[TablePricingTier]

Tiers are fetched from the urvenue API (wp-admin/admin-ajax.php?action=uvpx&uvaction=uwspx_map) using event.venue_id. See TABLE_PRICING.md.
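
Because min_spend, pay_now, and guests can all be None, code that consumes tiers has to filter before comparing. The sketch below shows one way to do that with a dataclass stand-in; TierStub, cheapest_available, and the sample prices are all invented for illustration and are not part of the codebase.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TierStub:
    """Stand-in with the same fields as TablePricingTier."""

    name: str
    min_spend: Optional[float]
    pay_now: Optional[float]
    guests: Optional[int]
    available: bool = True


def cheapest_available(tiers: list[TierStub]) -> Optional[TierStub]:
    """Return the bookable tier with the lowest known min_spend, if any."""
    candidates = [t for t in tiers if t.available and t.min_spend is not None]
    return min(candidates, key=lambda t: t.min_spend, default=None)


tiers = [
    TierStub("Dance Floor", min_spend=5000.0, pay_now=500.0, guests=10),
    TierStub("VIP Booth", min_spend=2000.0, pay_now=200.0, guests=6, available=False),
    TierStub("Upper Level", min_spend=None, pay_now=None, guests=None),
]
print(cheapest_available(tiers).name)  # Dance Floor
```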

ArtistStats, SocialLinks, and EnrichmentStatus are populated by enrichment plugins (Spotify, Resident Advisor), not by venue extractors. See PLUGIN_DEVELOPMENT.md.


EventData TypedDict

File: src/models/__init__.py, class EventData(_EventDataRequired, total=False)

EventData is the typed output contract for every venue extractor. Instead of building an untyped dict and passing it to VegasEvent(**dict) with a # type: ignore, extractors annotate their local dict as EventData and call the factory:

# In any extractor:
event_data: EventData = {
    "url":        url,
    "scraped_at": scraped_at,
    "performer":  performer,
    "venue":      venue,
    "event_date": event_date,
}

# ... conditionally add optional fields ...
if ticket_url:
    event_data["ticket_url"] = ticket_url

return VegasEvent.from_extractor_data(event_data)

Required keys (must always be present)

Defined on _EventDataRequired(TypedDict):

| Key | Type |
|---|---|
| url | str |
| scraped_at | str |
| performer | str |
| venue | str |
| event_date | str |

Optional keys (total=False)

Defined on EventData(_EventDataRequired, total=False):

| Key | Type |
|---|---|
| title | str |
| event_time | str |
| event_datetime | str |
| venue_address | str |
| description | str |
| description_short | str |
| artist_image_url | str |
| artist_image_url_full | str |
| streaming_links | StreamingLinks |
| ticket_url | str |
| age_requirement | str |
| performers | list[str] |
| images | list[Any] |
| external_id | str |
| venue_id | str |
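
The split into a required base class and a total=False subclass is the standard way to mix required and optional keys in a class-based TypedDict. The sketch below is reconstructed from the two key lists, not copied from src/models/__init__.py, and it abbreviates the optional keys (streaming_links is left out so the block needs no other model).

```python
from typing import Any, TypedDict


class _EventDataRequired(TypedDict):
    # Keys every extractor must supply.
    url: str
    scraped_at: str
    performer: str
    venue: str
    event_date: str


class EventData(_EventDataRequired, total=False):
    # Keys an extractor may supply; abbreviated from the optional-key list.
    title: str
    event_time: str
    ticket_url: str
    age_requirement: str
    performers: list[str]
    images: list[Any]
    external_id: str
    venue_id: str


# The split is visible at runtime through the TypedDict introspection attributes.
print(sorted(EventData.__required_keys__))
print("ticket_url" in EventData.__optional_keys__)  # True
```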

from_extractor_data factory

@classmethod
def from_extractor_data(cls, data: EventData) -> VegasEvent:
    return cls(**data)

The single TypedDict → BaseModel boundary is here, not scattered across every extractor. ty can fully validate the TypedDict at the call sites so no # type: ignore is needed anywhere.
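
The following end-to-end sketch shows the same boundary using only the standard library. EventStub is a dataclass stand-in for the Pydantic model and EventDataSketch is a trimmed copy of EventData; both names are hypothetical. The real factory also gets Pydantic validation, which a dataclass does not provide.

```python
from dataclasses import dataclass
from typing import Optional, TypedDict


class _RequiredKeys(TypedDict):
    url: str
    scraped_at: str
    performer: str
    venue: str
    event_date: str


class EventDataSketch(_RequiredKeys, total=False):
    ticket_url: str
    age_requirement: str


@dataclass
class EventStub:
    url: str
    scraped_at: str
    performer: str
    venue: str
    event_date: str
    ticket_url: Optional[str] = None
    age_requirement: str = "21+ only"

    @classmethod
    def from_extractor_data(cls, data: EventDataSketch) -> "EventStub":
        # The only place where the typed dict is unpacked into the model.
        return cls(**data)


event_data: EventDataSketch = {
    "url": "https://example.com/event",
    "scraped_at": "2026-03-04T12:00:00+00:00",
    "performer": "DOM DOLLA",
    "venue": "LIV Las Vegas",
    "event_date": "2026-05-02",
}
event = EventStub.from_extractor_data(event_data)
print(event.age_requirement, event.ticket_url)  # 21+ only None
```

Optional keys that the extractor never sets simply fall back to the model's defaults, which is why every non-required field on the model needs one.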


Type Safety Guarantees

Vinny runs ty (Astral's type checker) as a pre-commit hook, so the errors described in this section are caught before any code reaches main.

What ty catches at extractor call sites

| Mistake | ty error |
|---|---|
| Misspelled field: "perfromer" | unknown-key on EventData |
| Old field name: "event_id" (renamed to external_id) | unknown-key on EventData |
| Wrong value type: event_data["performers"] = "DOM DOLLA" | invalid-assignment, expected list[str] |
| Missing required key: omitting "event_date" | missing-key at VegasEvent.from_extractor_data(event_data) |
| Passing str \| None for a required str field | invalid-argument-type |
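
In extractor code, those mistakes look roughly like the commented-out lines in the sketch below. EventDataSketch is a trimmed, hypothetical copy of EventData. The offending lines are commented out so the snippet runs; the last part shows why the static check matters, since a TypedDict is an ordinary dict at runtime and none of these mistakes would raise by themselves.

```python
from typing import TypedDict


class _Required(TypedDict):
    performer: str
    event_date: str


class EventDataSketch(_Required, total=False):
    performers: list[str]
    external_id: str


good: EventDataSketch = {"performer": "DOM DOLLA", "event_date": "2026-05-02"}
good["performers"] = ["DOM DOLLA"]

# Uncommenting any line below should make a static type checker reject the file:
# good["perfromer"] = "DOM DOLLA"                    # misspelled key
# good["event_id"] = "EVE111500020260531"            # old field name
# good["performers"] = "DOM DOLLA"                   # str where list[str] is expected
# bad: EventDataSketch = {"performer": "DOM DOLLA"}  # required key missing

# At runtime there is no such protection: the typo is silently accepted.
unchecked: dict = dict(good)
unchecked["perfromer"] = "DOM DOLLA"
print(sorted(unchecked))  # ['event_date', 'performer', 'performers', 'perfromer']
```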

Pre-commit hook order

ruff (lint) → ruff-format → ty (type-check) → pytest

All four must pass for a commit to succeed. ty runs on the full src/ tree, not just changed files — so a rename in models/__init__.py will immediately surface stale references in all extractors.

Adding a new field to VegasEvent

  1. Add the field to VegasEvent (with a default of None or a Field(default_factory=...) so existing events don't break)
  2. If extractors need to supply it, add the key to EventData (in _EventDataRequired if always required, or in EventData with total=False if optional)
  3. Run just check — ty will surface any missed call sites
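
Steps 1 and 2 can be sketched as follows for an invented optional field, door_time. The field name, EventStub, and EventDataSketch are all hypothetical, and a dataclass stands in for the Pydantic model. The point is that a payload written before the field existed still constructs cleanly because the new field has a default.

```python
from dataclasses import dataclass
from typing import Optional, TypedDict


class _Required(TypedDict):
    performer: str
    venue: str
    event_date: str


class EventDataSketch(_Required, total=False):
    ticket_url: str
    door_time: str  # step 2: new optional key (hypothetical field)


@dataclass
class EventStub:
    performer: str
    venue: str
    event_date: str
    ticket_url: Optional[str] = None
    door_time: Optional[str] = None  # step 1: new field with a None default


old_payload: EventDataSketch = {
    "performer": "DOM DOLLA",
    "venue": "LIV Las Vegas",
    "event_date": "2026-05-02",
}
new_payload: EventDataSketch = {**old_payload, "door_time": "22:30"}

print(EventStub(**old_payload).door_time)  # None
print(EventStub(**new_payload).door_time)  # 22:30
```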

Data Flow

flowchart TD
    A["Venue Page\n(HTML / JSON-LD)"] --> B["VenueExtractor.extract()\nbuilds EventData dict"]
    B --> C["VegasEvent.from_extractor_data()\nsingle typed boundary"]
    C --> D["VegasEvent"]
    E["Enrichment Plugins\nartist_stats, streaming_links,\nsocial_links, artist_bio"] --> D
    D --> F["MasterDatabase.add_or_update_event()\nmerge + track field history"]
    F --> G["D1Exporter"]
    F --> H["SQLiteExporter"]
    G --> I["Cloudflare D1\n(remote production)"]
    H --> J["Local events.db\n(dev / archive)"]

    style A fill:#7c3aed,color:#fff
    style D fill:#059669,color:#fff
    style I fill:#d97706,color:#fff
    style J fill:#d97706,color:#fff

Key points:

  - Extractors never construct a VegasEvent directly; they go through EventData and the from_extractor_data factory.
  - Enrichment plugins receive a VegasEvent and return a new one (immutable pattern).
  - Exporters are read-only consumers; they never modify events.
  - composite_key is computed on the fly from VegasEvent fields; it is never stored in the Python object, only materialised when writing to D1/SQLite.
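
The "receive one, return a new one" pattern for enrichment plugins can be sketched with a frozen dataclass. EventStub and enrich_bio are hypothetical names, and the plugin interface itself is documented in PLUGIN_DEVELOPMENT.md, not here. On the real Pydantic v2 model, the corresponding call would presumably be model_copy(update=...).

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class EventStub:
    """Immutable stand-in for VegasEvent with one enrichable field."""

    performer: str
    artist_bio: Optional[str] = None


def enrich_bio(event: EventStub, bio: str) -> EventStub:
    """Hypothetical plugin step: return a copy with artist_bio set."""
    return replace(event, artist_bio=bio)


original = EventStub(performer="DOM DOLLA")
enriched = enrich_bio(original, "Australian house producer.")

print(original.artist_bio)  # None
print(enriched.artist_bio)  # Australian house producer.
```

Because the input is left untouched, a failed or partial enrichment pass cannot corrupt the event that the extractor produced.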


Last updated: 2026-03-04