Venue Extractors Reference¶

This document covers all venue extractors, their current status, URL patterns, and implementation details. For general plugin development patterns, see PLUGIN_DEVELOPMENT.md.

Table of Contents¶

Extractor Architecture
Registered Venues
LIV Nightclub / LIV Beach
XS Nightclub
Encore Beach Club (EBC)
TAO Group (Omnia, Hakkasan, Marquee, Jewel)
Adding a New Extractor

Extractor Architecture¶

flowchart TD
    A["Venue Sitemap URL"] --> B["Crawlee Sitemap Handler"]
    B --> C{"SitemapIndex.diff()"}
    C -->|new/updated| D["Enqueue event URLs"]
    C -->|unchanged| E["Skip"]
    D --> F{"ExtractorRegistry\nfirst match wins"}
    F -->|livnightclub.com| G["LIVExtractor"]
    F -->|wynnsocial.com| H{"Venue detection"}
    H -->|EBC page text| I["EBCExtractor"]
    H -->|default| J["XSExtractor"]
    F -->|taogroup.com| K["TaoGroupExtractor"]
    G --> L["VegasEvent"]
    I --> L
    J --> L
    K --> L
    L --> M["StorageManager\n+ MasterDatabase"]

    style A fill:#7c3aed,color:#fff
    style L fill:#059669,color:#fff
    style E fill:#6b7280,color:#fff

All extractors live in src/extractors/ and inherit from VenueExtractor (base in src/extractors/__init__.py).

Type safety: EventData¶

Every extractor builds its output as an EventData TypedDict and constructs the final object through the VegasEvent.from_extractor_data() factory method, never via VegasEvent(**dict) directly:

from src.models import EventData, VegasEvent

event_data: EventData = {
    "url": url, "scraped_at": scraped_at,
    "performer": performer, "venue": venue, "event_date": event_date,
}
# ... add optional fields conditionally ...
return VegasEvent.from_extractor_data(event_data)

ty (Astral's type checker, run in pre-commit) validates every key and value type at the extractor call site — misspelled field names or wrong types are caught before the commit lands. See DATA_MODEL.md § EventData TypedDict and § Type Safety Guarantees for the full list of what ty catches.
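As a rough illustration of the "add optional fields conditionally" step, here is a minimal sketch. It does not reproduce the real EventData definition (see DATA_MODEL.md for that): `SketchEventData`, `build_event_data`, and the optional fields `ticket_url` and `image_urls` are hypothetical names chosen for this example.

```python
from __future__ import annotations

from typing import TypedDict


class _RequiredFields(TypedDict):
    # Keys that must always be present (a subset chosen for illustration).
    url: str
    performer: str
    venue: str


class SketchEventData(_RequiredFields, total=False):
    # Hypothetical optional keys; total=False lets them be omitted.
    ticket_url: str
    image_urls: list[str]


def build_event_data(
    url: str,
    performer: str,
    venue: str,
    ticket_url: str | None = None,
    image_urls: list[str] | None = None,
) -> SketchEventData:
    """Build the dict with required keys, adding optional keys only when set."""
    data: SketchEventData = {"url": url, "performer": performer, "venue": venue}
    if ticket_url:
        data["ticket_url"] = ticket_url
    if image_urls:
        data["image_urls"] = image_urls
    return data
```

With a definition of this shape, a static checker can flag a misspelled key such as `data["tiket_url"]` at the assignment, while the code itself behaves like an ordinary dict at runtime.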

src/extractors/
├── __init__.py        # VenueExtractor base + ExtractorRegistry
├── liv.py             # LIV Nightclub + LIV Beach (Dayclub)
├── wynn.py            # WynnSocialBase (shared base for XS + EBC)
├── xs.py              # XS Nightclub
├── ebc.py             # Encore Beach Club + EBC at Night
└── tao.py             # TAO Group (Omnia, Hakkasan, Marquee, Jewel, etc.)

WynnSocial Shared Base

XS and EBC both inherit from WynnSocialBase in wynn.py, which provides shared logic for Schema.org JSON-LD parsing, table pricing extraction (uv_tablesitems), and URL pattern handling.

Registration order in create_default_registry() matters: the first matching extractor wins. XS and EBC both use wynnsocial.com, and EBCExtractor handles disambiguation via _detect_venue().
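The first-match-wins behaviour can be sketched as follows. This is an illustration only, not the project's ExtractorRegistry: `SketchRegistry`, `DomainExtractor`, the `can_handle` method, and the `marker` field are all hypothetical, and registering the more specific extractor first is an assumption about how a shared domain would be disambiguated.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class DomainExtractor:
    """Toy extractor that claims a page by domain and an optional text marker."""

    name: str
    domain: str
    marker: str | None = None  # hypothetical: text that must appear on the page

    def can_handle(self, url: str, page_text: str) -> bool:
        host = urlparse(url).netloc.lower()
        if not host.endswith(self.domain):
            return False
        return self.marker is None or self.marker in page_text.lower()


@dataclass
class SketchRegistry:
    extractors: list[DomainExtractor] = field(default_factory=list)

    def register(self, extractor: DomainExtractor) -> None:
        self.extractors.append(extractor)

    def find(self, url: str, page_text: str = "") -> DomainExtractor | None:
        # Iterate in registration order and stop at the first match.
        for extractor in self.extractors:
            if extractor.can_handle(url, page_text):
                return extractor
        return None


registry = SketchRegistry()
# The more specific extractor goes first, the catch-all for the domain second.
registry.register(DomainExtractor("ebc", "wynnsocial.com", marker="encore beach club"))
registry.register(DomainExtractor("xs", "wynnsocial.com"))
```

If the two registrations were swapped in this sketch, the catch-all would claim every wynnsocial.com page and the EBC extractor would never run.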

Registered Venues¶

Extractor	Venue(s)	Domain	Status
`LIVExtractor`	LIV Nightclub, LIV Beach	`livnightclub.com`	✅ Production
`XSExtractor`	XS Nightclub	`wynnsocial.com`	✅ Production
`EBCExtractor`	Encore Beach Club, EBC at Night	`wynnsocial.com`	🚧 Validation in progress
`TaoGroupExtractor`	Omnia, Hakkasan, Marquee, Jewel, Tao + 5 day/pool venues	`taogroup.com`	✅ Phase 1+2 complete

LIV Nightclub / LIV Beach¶

File: src/extractors/liv.py

Domain: livnightclub.com
Data format: Schema.org Event JSON-LD embedded in page
Venues handled: LIV Nightclub (nightclub) and LIV Beach (dayclub)
Table pricing: Separate urvenue API call — see TABLE_PRICING.md

URL Pattern¶

https://livnightclub.com/event/{slug}/

Key Extraction Logic¶

Find <script type="application/ld+json"> with "@type": "Event"
Extract performer, date, images from JSON-LD
Fall back to og:image for images if JSON-LD image is missing
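The steps above can be sketched with the standard library alone. This is a simplified illustration, not the code in liv.py: the function names are hypothetical, and the regex-based tag matching assumes well-formed markup with `property` before `content` in the meta tag (a real implementation would more likely use an HTML parser).

```python
from __future__ import annotations

import json
import re

_LD_JSON_RE = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)
_OG_IMAGE_RE = re.compile(
    r'<meta[^>]*property=["\']og:image["\'][^>]*content=["\']([^"\']+)["\']',
    re.IGNORECASE,
)


def find_event_json_ld(html: str) -> dict | None:
    """Return the first JSON-LD object whose @type is "Event", if any."""
    for match in _LD_JSON_RE.finditer(html):
        try:
            data = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the page
        candidates = data if isinstance(data, list) else [data]
        for item in candidates:
            if isinstance(item, dict) and item.get("@type") == "Event":
                return item
    return None


def extract_images(html: str) -> list[str]:
    """Prefer the JSON-LD image; fall back to og:image when it is missing."""
    event = find_event_json_ld(html) or {}
    image = event.get("image")
    if isinstance(image, str):
        images = [image]
    elif isinstance(image, list):
        images = [i for i in image if isinstance(i, str)]
    else:
        images = []
    if images:
        return images
    og = _OG_IMAGE_RE.search(html)
    return [og.group(1)] if og else []
```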

XS Nightclub¶

File: src/extractors/xs.py
Base class: WynnSocialBase

Domain: wynnsocial.com
Data format: Schema.org Event JSON-LD (same site as EBC)
Table pricing: uv_tablesitems JS variable embedded in page (no API call needed)

URL Pattern¶

https://www.wynnsocial.com/event/EVE{id}{YYYYMMDD}/{slug}/

The path segment that starts with EVE encodes the event date: its last 8 characters are YYYYMMDD.
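A minimal sketch of reading the date out of that segment. The helper name `date_from_wynn_url` is hypothetical and the event ID in the usage example is made up; only the "last 8 characters are YYYYMMDD" rule comes from the pattern above.

```python
from __future__ import annotations

import re
from datetime import date, datetime

# Capture the whole EVE... path segment that follows /event/.
_EVE_SEGMENT_RE = re.compile(r"/event/(EVE[^/]+)/")


def date_from_wynn_url(url: str) -> date | None:
    """Parse the trailing YYYYMMDD of the EVE segment, or return None."""
    match = _EVE_SEGMENT_RE.search(url)
    if not match:
        return None
    try:
        return datetime.strptime(match.group(1)[-8:], "%Y%m%d").date()
    except ValueError:
        return None  # the last 8 characters were not a valid date


# Made-up event ID, for illustration only.
example = date_from_wynn_url(
    "https://www.wynnsocial.com/event/EVE11100020260320/some-artist/"
)
```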

Key Extraction Logic¶

Find <script type="application/ld+json"> with "@type": "Event"
Extract performer from performer[0].name, date from startDate
Extract table pricing from inline uv_tablesitems JS var (WynnSocialBase.extract_table_pricing())
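For the table pricing step, one way to pull an inline JS variable out of a page is sketched below. This is not WynnSocialBase.extract_table_pricing(): it assumes the value assigned to uv_tablesitems is a JSON-compatible literal, and the function name and the field names in the usage example are invented for illustration.

```python
import json
import re

# Match the assignment and consume whitespace up to the start of the value.
_TABLES_VAR_RE = re.compile(r"\buv_tablesitems\s*=\s*")


def extract_uv_tablesitems(html: str):
    """Return the decoded value assigned to uv_tablesitems, or None."""
    match = _TABLES_VAR_RE.search(html)
    if not match:
        return None
    try:
        # raw_decode parses one JSON value starting at the given index and
        # ignores whatever follows it (the trailing ";" and the rest of the page).
        value, _end = json.JSONDecoder().raw_decode(html, match.end())
    except json.JSONDecodeError:
        return None  # assumption violated: the literal is not valid JSON
    return value


# Invented payload shape, for illustration only.
page = '<script>var uv_tablesitems = [{"name": "Dance Floor", "price": 5000}];</script>'
tables = extract_uv_tablesitems(page)
```

Decoding from the match position avoids having to guess where the literal ends with a regex, which tends to break on nested brackets.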

Encore Beach Club (EBC)¶

File: src/extractors/ebc.py
Base class: WynnSocialBase

Domain: wynnsocial.com (same as XS)
Data format: HTML-only — no Schema.org JSON-LD on EBC pages
Venues: Encore Beach Club (Dayclub) and Encore Beach Club at Night
Operator: Wynn Nightlife (same as XS)

URL Pattern¶

Same Wynn Social pattern as XS:

https://www.wynnsocial.com/event/EVE{id}{YYYYMMDD}/{slug}/

Venue Detection¶

Since EBC and XS share the same domain, EBCExtractor._detect_venue() inspects page text:

# Check the longer name first: "encore beach club" is a substring of it.
if "encore beach club at night" in page_text:
    return "Encore Beach Club at Night"
if "encore beach club" in page_text:
    return "Encore Beach Club"
return None  # Not an EBC page, so let XS handle it

Table Pricing¶

EBC pages use the same uv_tablesitems JS variable as XS. WynnSocialBase.extract_table_pricing() handles both — no EBC-specific code needed.

EBC Rollout Phases¶

EBC is being rolled out in phases to avoid wasted effort. Test 5 events after each phase before proceeding.

Phase	Issue	Description	Status
1a	#85	EBC at Night extractor validation (5-event test)	🔲 Open
1b	#86	EBC Dayclub extractor validation (5-event test)	🔲 Open
—	#87	Artist info enrichment (description, streaming links)	🔲 Open
—	#88	Venue info enrichment (hours, capacity, dress code)	🔲 Open
2	#89	Table pricing extraction (5-event test)	🔲 Open
3	#90	Image extraction — full gallery (5-event test)	🔲 Open

Dependency chain: Phase 1 → Phase 2 → Phase 3 → Full calendar scrape

Testing EBC Events¶

# Test a single EBC event URL
just scrape -u "https://www.wynnsocial.com/event/EVE.../slug/" --max-requests 5

# Inspect extracted data
just list-runs
jq '.[0]' runs/latest/events.json

# Check table pricing
jq '.[0].table_pricing' runs/latest/events.json

# Check images
just images-download latest
just images-status latest

TAO Group (Omnia, Hakkasan, Marquee, Jewel)¶

File: src/extractors/tao.py

Domain: taogroup.com
Data format: Schema.org Event JSON-LD + og:title for performer name
Venues handled: 10 Las Vegas venues (5 nightclubs + 5 day/pool venues)
Table pricing: urvenue API via booketing.com proxy (same protocol as LIV, different base URL)
Sitemaps: events-sitemap4.xml and events-sitemap5.xml (2026 events only)

URL Pattern¶

https://taogroup.com/event/{M}-{D}-{YYYY}-{slug}/

Example: https://taogroup.com/event/3-20-2026-tyga-hakkasan-nightclub/
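A small sketch of splitting such a URL into its date and slug. The helper name `parse_tao_event_url` is hypothetical; the pattern is taken directly from the URL format above.

```python
from __future__ import annotations

import re
from datetime import date

# /event/{M}-{D}-{YYYY}-{slug}/ with non-zero-padded month and day.
_TAO_EVENT_RE = re.compile(r"/event/(\d{1,2})-(\d{1,2})-(\d{4})-([^/]+)/?$")


def parse_tao_event_url(url: str) -> tuple[date, str] | None:
    """Return (event_date, slug) for a TAO event URL, or None if it does not match."""
    match = _TAO_EVENT_RE.search(url)
    if not match:
        return None
    month, day, year, slug = match.groups()
    try:
        return date(int(year), int(month), int(day)), slug
    except ValueError:
        return None  # e.g. month 13 or day 32


parsed = parse_tao_event_url(
    "https://taogroup.com/event/3-20-2026-tyga-hakkasan-nightclub/"
)
```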

Las Vegas Venues¶

Venue	Type	Hotel	Venue Tag
Omnia Nightclub	Night	Caesars Palace	`omn`
Hakkasan Nightclub	Night	MGM Grand	`hak`
Marquee Nightclub	Night	Cosmopolitan	`marq`
Jewel Nightclub	Night	Aria	`jwl`
Marquee Dayclub	Day	Cosmopolitan	`marqd`
Tao Beach Dayclub	Day	Venetian	`taob`
Tao Nightclub	Night	Venetian	`tao`
Wet Republic Ultra Pool	Day	MGM Grand	`wet`
Palm Tree Beach Club	Day	Mandalay Bay	`palm`
Liquid Pool Lounge	Day	Aria	`liq`

Key Extraction Logic¶

Find <script type="application/ld+json"> with "@type": "Event"
Extract date from startDate, time from startDate/endDate (ISO format)
Performer: Parse from og:title ("M/D/YYYY - PERFORMER - VENUE") — JSON-LD performer.name is bugged (returns true)
Venue: From JSON-LD location.name, strip - Las Vegas suffix, normalize casing
Images: Prefer JSON-LD image (artist photo) over og:image (may be venue default)
Non-LV filtering: Skip events where venue is not in _LAS_VEGAS_VENUES set (taogroup.com is global)
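The performer and venue steps above can be sketched as below. The function names are hypothetical, the og:title string in the usage example is constructed for illustration, and the sketch assumes the venue part of og:title contains no " - " of its own. Casing normalization is left out because the source does not specify the rule.

```python
from __future__ import annotations


def performer_from_og_title(og_title: str) -> str | None:
    """Parse PERFORMER out of "M/D/YYYY - PERFORMER - VENUE"."""
    parts = [part.strip() for part in og_title.split(" - ")]
    if len(parts) < 3:
        return None
    # First part is the date, last part is the venue; rejoin the middle in
    # case the performer name itself contains " - ".
    performer = " - ".join(parts[1:-1])
    return performer or None


def choose_performer(json_ld_name: object, og_title: str) -> str | None:
    """Use the JSON-LD name only if it is a real string (it may be the boolean true)."""
    if isinstance(json_ld_name, str) and json_ld_name.strip():
        return json_ld_name.strip()
    return performer_from_og_title(og_title)


def strip_las_vegas_suffix(location_name: str) -> str:
    """Drop a trailing " - Las Vegas" from the JSON-LD location name."""
    suffix = " - las vegas"
    name = location_name.strip()
    if name.lower().endswith(suffix):
        name = name[: -len(suffix)]
    return name.strip()


title = "3/20/2026 - TYGA - Hakkasan Nightclub"  # illustrative value
performer = choose_performer(True, title)
```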

Quirks¶

Performer name bug: JSON-LD performer.name returns boolean true instead of the actual name. Must parse from og:title split on " - ".
Global domain: taogroup.com covers NYC, LA, Singapore venues too. URL-level pre-filter in crawlee_main.py uses _TAO_LV_VENUE_SLUGS allowlist.
Sitemap filtering: Only sitemaps 4+5 have 2026 events. Date and venue slug filters applied at the URL level before crawling.
Per-venue sitemap filtering: vinny scrape omnia filters sitemap URLs by slug before crawling. See CLI Aliases section below.
Booketing.com is urvenue: TAO pricing uses the same urvenue protocol as LIV, routed through booketing.com/uws/house/proxy with an extra manageentid=61 param. Venue codes: VEN1085 (Hakkasan), VEN1089 (Omnia), VEN1108 (Marquee).

TAO Sitemap Filtering

TAO Group sitemaps cover venues globally (NYC, LA, Singapore). Vinny applies a two-level filter: only sitemaps 4+5 are fetched (2026 events), and per-venue aliases like vinny scrape omnia apply URL slug filters before crawling. The slug mapping lives in _TAO_ALIAS_SLUGS in src/cli_scrape.py.

Image CDN: WordPress wp-content URLs served with Fastly IO resizing (?width=N). Originals are at most 1080px.
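A sketch of setting the width parameter on such an image URL. The helper name `with_width` and the example URL are hypothetical, and clamping to 1080 assumes that figure refers to the image width.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def with_width(url: str, width: int, max_width: int = 1080) -> str:
    """Return the URL with its width query parameter set (replacing any existing one)."""
    width = min(width, max_width)  # assumption: no point asking for more than the original
    parts = urlparse(url)
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "width"]
    query.append(("width", str(width)))
    return urlunparse(parts._replace(query=urlencode(query)))


# Hypothetical image path, for illustration only.
thumb = with_width("https://taogroup.com/wp-content/uploads/artist.jpg", 640)
```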

CLI Aliases & Per-Venue Filtering¶

All TAO aliases point to the same two sitemaps, but per-venue aliases apply sitemap-level URL filtering so only matching events are crawled:

vinny scrape tao          # All TAO LV venues (no filter)
vinny scrape tao-group    # Same as tao
vinny scrape omnia        # Only URLs containing "omnia"
vinny scrape hakkasan     # Only "hakkasan-nightclub" URLs
vinny scrape marquee      # Both "marquee-nightclub" and "marquee-dayclub" URLs
vinny scrape jewel        # Only "jewel-nightclub" URLs
vinny scrape omnia hakkasan  # Both omnia + hakkasan URLs

Filtering happens in crawlee_main.py at the sitemap handler level — event pages for other venues are never downloaded. The slug mapping lives in _TAO_ALIAS_SLUGS in src/cli_scrape.py.
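The slug filter amounts to a substring check against the sitemap URLs. The sketch below is not the code in crawlee_main.py: `ALIAS_SLUGS` is a stand-in whose entries are inferred from the alias comments above (the real mapping is _TAO_ALIAS_SLUGS), and `filter_sitemap_urls` is a hypothetical name.

```python
from __future__ import annotations

# Stand-in for _TAO_ALIAS_SLUGS; entries inferred from the alias list above.
ALIAS_SLUGS: dict[str, list[str]] = {
    "omnia": ["omnia"],
    "hakkasan": ["hakkasan-nightclub"],
    "marquee": ["marquee-nightclub", "marquee-dayclub"],
    "jewel": ["jewel-nightclub"],
}


def filter_sitemap_urls(urls: list[str], aliases: list[str]) -> list[str]:
    """Keep only URLs containing a slug for one of the requested aliases."""
    slugs = [slug for alias in aliases for slug in ALIAS_SLUGS.get(alias, [])]
    if not slugs:
        return list(urls)  # aliases without slugs (e.g. "tao") apply no filter
    return [url for url in urls if any(slug in url for slug in slugs)]


urls = [
    "https://taogroup.com/event/3-20-2026-tyga-hakkasan-nightclub/",
    "https://taogroup.com/event/3-21-2026-some-artist-omnia-nightclub/",
    "https://taogroup.com/event/3-21-2026-some-artist-marquee-dayclub/",
]
omnia_and_hakkasan = filter_sitemap_urls(urls, ["omnia", "hakkasan"])
```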

Incremental Sitemap Scraping¶

Venues that use sitemaps (LIV, TAO Group) support incremental scraping — only new or updated event URLs are crawled on subsequent runs.

How It Works¶

Sitemap XML → Parse <loc> + <lastmod> pairs
                    ↓
            SitemapIndex.diff()
                    ↓
        ┌───────────┼───────────┐
       new       updated    unchanged
        ↓           ↓           ↓
     enqueue     enqueue      skip

SitemapIndex (src/sitemap_index.py) stores URL + lastmod + scraped_at per event in data/sitemaps/{source_key}.json
On each run, the sitemap handler parses <url>/<loc>/<lastmod> pairs and diffs against the stored index
Only new and updated (lastmod changed) URLs are enqueued for scraping
Past events (those whose date, parsed from the URL, is before today) are skipped automatically
All visited URLs are marked in the index after the crawl — even if extraction returns nothing (prevents infinite re-visits of unparseable pages)
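The parse-and-diff step can be sketched as follows. This is a simplified stand-in, not the SitemapIndex or DiffResult from src/sitemap_index.py: `parse_sitemap`, `diff_sitemap`, and `SketchDiff` are hypothetical names, and the stored index is reduced to a plain URL-to-lastmod dict. The XML namespace is the standard sitemaps.org one.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text: str) -> dict[str, str]:
    """Map each <loc> to its <lastmod> (empty string when absent)."""
    root = ET.fromstring(xml_text)
    entries: dict[str, str] = {}
    for url in root.findall("sm:url", _NS):
        loc = url.findtext("sm:loc", default="", namespaces=_NS).strip()
        lastmod = url.findtext("sm:lastmod", default="", namespaces=_NS).strip()
        if loc:
            entries[loc] = lastmod
    return entries


@dataclass
class SketchDiff:
    new: list[str] = field(default_factory=list)
    updated: list[str] = field(default_factory=list)
    unchanged: list[str] = field(default_factory=list)


def diff_sitemap(stored: dict[str, str], current: dict[str, str]) -> SketchDiff:
    """Classify current sitemap URLs against the previously stored lastmod values."""
    result = SketchDiff()
    for loc, lastmod in current.items():
        if loc not in stored:
            result.new.append(loc)
        elif stored[loc] != lastmod:
            result.updated.append(loc)
        else:
            result.unchanged.append(loc)
    return result


def urls_to_enqueue(result: SketchDiff) -> list[str]:
    # Only new and updated URLs are crawled; unchanged ones are skipped.
    return result.new + result.updated
```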

Source Keys¶

Pattern in URL	Source Key	Index File
`taogroup.com/events-sitemap`	`tao-group`	`data/sitemaps/tao-group.json`
`livnightclub.com/events-sitemap`	`liv`	`data/sitemaps/liv.json`

CLI¶

# Normal run — only scrapes new/changed URLs
vinny scrape tao
vinny sync omnia

# Force full re-scrape (bypass diff)
vinny scrape tao --force
vinny sync omnia --force

# Check index status
vinny sitemap-status

Master DB Fallback in `vinny sync`¶

When vinny sync <venue> finds 0 new events (everything already indexed), it loads matching events from the master database so the rest of the pipeline (images → R2 → D1) still runs. This handles the common case of pushing already-scraped events through the full pipeline for the first time.

Key Files¶

src/sitemap_index.py — SitemapIndex, SitemapEntry, DiffResult models
src/crawlee_main.py — _SITEMAP_SOURCE_KEYS, sitemap handler with diff logic
src/cli_sitemap.py — vinny sitemap-status command
data/sitemaps/ — persisted index JSON files

Adding a New Extractor¶

See PLUGIN_DEVELOPMENT.md for the full step-by-step guide and DATA_MODEL.md for the complete field reference.

Quick checklist:

Create src/extractors/{venue}.py inheriting from VenueExtractor (or WynnSocialBase for Wynn properties)
Implement name, domain, and extract()
Register in create_default_registry() in src/extractors/__init__.py
Test with 5 events before full calendar scrape
Document in this file

Last updated: 2026-03-05
