Vinny Scraper

v2.3.1 · project recap · 2026-03-23 · 2-week window

Project Identity

Vegas nightlife event scraper covering 13+ venues across LIV, XS, Encore Beach Club, and the full TAO Group portfolio. Scrapes event listings, artist data, table pricing, and images — then exports to JSON, CSV, D1 SQL, and SQLite to power vinny.vegas.

v2.3.1 · Python 3.10+ · Crawlee · Pydantic v2 · Cyclopts · 86 commits

Target users: promoter friends in the Vegas EDM/nightlife scene. Stage: actively shipping features — scraper core is stable, site is live, now adding AI agent layer.

Source Code

10,738 lines across 43 Python source files
7,324 lines across 21 test files
5 extractors · 7 CLI modules · 11 plugin files

Infrastructure

Cloudflare D1 (events DB) · R2 (img.vinny.vegas)
Astro site on Pages · docs.vinny.vegas (CLI + diagrams)
GitLab CI for site deploys · GitHub for scraper

Architecture Snapshot

graph TD
  subgraph CLI["CLI Layer — Cyclopts"]
    SCRAPE["vinny scrape\n(per-venue or all)"]
    SYNC["vinny sync\n(full pipeline)"]
    EXPORT["vinny export-d1\nexport-csv / sqlite"]
    TABLES["vinny tables\ndeals / heatmap"]
    ENRICH["vinny enrich\n(spotify/RA/tracklists)"]
    IMAGES["vinny images\n(download/upload/sync)"]
  end

  subgraph Extractors["Venue Extractors"]
    LIV["LIV / LIV Beach\n(livnightclub.com)"]
    WYNN["XS / EBC\n(wynnsocial.com)"]
    TAO["TAO Group × 10\n(taogroup.com sitemaps)"]
  end

  subgraph Core["Data Pipeline"]
    CRAWLEE["Crawlee\n(request queue)"]
    SITEMAP["SitemapIndex\n(incremental tracking)"]
    MODELS["Pydantic Models\n(VegasEvent, MasterDB)"]
    PRICING["Table Pricing\n(urvenue API)"]
    MASTER["MasterDatabaseManager\n(master_events.json)"]
  end

  subgraph Plugins["Plugin Layer"]
    IMG_DL["Image Downloader\n(VEA CDN)"]
    R2["R2 Storage\n(img.vinny.vegas)"]
    SPOTIFY["Spotify Enricher"]
    RA["Resident Advisor"]
    TL["1001Tracklists"]
  end

  subgraph Cloud["Cloudflare"]
    D1["D1 Database\n(events + artists + tiers)"]
    R2B["R2 Bucket\n(artist images)"]
    WORKER["Agent API Worker\n(3 endpoints for ElevenLabs)"]
    PAGES["Astro Site\n(vinny.vegas)"]
    EL["ElevenLabs ConvAI\n(voice/chat widget)"]
  end

  SCRAPE --> CRAWLEE
  SYNC --> CRAWLEE
  CRAWLEE --> SITEMAP
  SITEMAP --> LIV & WYNN & TAO
  LIV & WYNN & TAO --> MODELS
  MODELS --> PRICING
  PRICING --> MASTER
  MASTER --> EXPORT
  EXPORT --> D1

  ENRICH --> SPOTIFY & RA & TL
  SPOTIFY & RA & TL --> MASTER

  IMAGES --> IMG_DL
  IMG_DL --> R2
  R2 --> R2B

  D1 --> PAGES
  D1 --> WORKER
  R2B --> PAGES
  WORKER --> EL

  classDef cli fill:#3a7d5e22,stroke:#3a7d5e
  classDef extract fill:#2e7d8c22,stroke:#2e7d8c
  classDef core fill:#6b5b8a22,stroke:#6b5b8a
  classDef plugin fill:#b5761a22,stroke:#b5761a
  classDef cloud fill:#2e7d8c22,stroke:#2e7d8c

  class SCRAPE,SYNC,EXPORT,TABLES,ENRICH,IMAGES cli
  class LIV,WYNN,TAO extract
  class CRAWLEE,SITEMAP,MODELS,PRICING,MASTER core
  class IMG_DL,R2,SPOTIFY,RA,TL plugin
  class D1,R2B,WORKER,PAGES,EL cloud
      
Legend: CLI · Extractors / Cloud · Core Pipeline · Plugins

Recent Activity

10 commits over 2 weeks (Mar 9 – Mar 23). One dominant theme: the ElevenLabs ConvAI hackathon — building a voice/chat AI agent for the site. Plus CI automation and docs polish.

Grouped by Theme

Mar 21 · Feature
ElevenLabs ConvAI Agent (Phase 1)
End-to-end voice/chat concierge for vinny.vegas. Generated a 19-file knowledge base (4,147 lines) from 218 upcoming events across 14 venues. Wrote the system prompt (Vegas nightlife persona). Deployed a standalone Cloudflare Worker (vinny-vegas-events) with 3 API endpoints querying live D1 data. Automated tool registration via ElevenLabs API. Embedded the widget in the Astro site footer.

Mar 21 · Documentation
Concierge Widget Prompt + E2E Validation
Two-agent architecture documented (outbound call agent + website concierge). 9/9 test scenarios passing. Separate ELEVENLABS_WIDGET_PROMPT.md for the cold-start concierge persona (no dynamic variables, unlike the outbound agent).

Mar 21 · Infrastructure
Changelog Auto-Generation Workflow
GitHub Action runs changelogen on merge to main and commits the updated CHANGELOG.md.

Mar 23 · Documentation
SEO Audit Diagram + Project Cleanup
Added SEO audit diagram. Updated PLAN.md (consolidated, trimmed). Refreshed README, CLAUDE.md, and AGENTS.md.

Decision Log

Two-Agent Architecture for ElevenLabs

Decided: Separate agents for outbound calls (with dynamic variables like caller name, event details) vs. website widget (cold-start concierge, no context). Different system prompts, same backend tools.

Why: The widget has no prior context about the user — it needs a broader conversational opening. The outbound agent knows who it's calling and why. Merging them would dilute both experiences.

Standalone Worker for Agent API

Decided: Deploy vinny-vegas-events as its own Worker rather than adding routes to the Astro site.

Why: Astro Pages Functions have limitations for API routes. A standalone Worker is simpler to deploy, test, and iterate on independently. The ElevenLabs webhook calls need clean JSON responses with CORS — mixing that into the SSR site adds unnecessary coupling.

Changelogen via GitHub Action

Decided: Auto-generate CHANGELOG.md on merge to main.

Why: Manual changelog updates were frequently forgotten. Conventional commits already provide the structured data; changelogen just needs to read them.

State of Things

8 Working · 3 In Progress · 2 Blocked · 1 Known Issue
Working & Shipped
  • Scraper core: 13 venues, incremental sitemap
  • Table pricing: LIV, XS, TAO Group via urvenue API
  • Image pipeline: download → R2 → D1 sync
  • Artist enrichment: Spotify + RA + Tracklists
  • CLI: Cyclopts with 7 command groups
  • vinny.vegas live with Astro SSR + D1
  • ElevenLabs ConvAI widget embedded
  • CI: changelog auto-generation
In Progress
  • ElevenLabs hackathon phases 2-4 (#120, #121)
  • SEO audit findings (diagram created, fixes pending)
  • Table pricing routing diagram (modified, uncommitted)
Blocked / Waiting
  • Wet Republic: no events in TAO sitemaps yet (#10)
  • Stale venue_id for Jewel, Tao Night, Palm Tree (#98)
Known Issues
  • Events page loads 1,200+ events at once (no pagination, #113)
Uncommitted Changes

justfile (modified) · docs/diagrams/table-pricing-routing.html (modified)
Untracked: 3 hackathon playbooks (phases 2-4), diagrams index page, SEO audit HTML, maestro initiation playbooks

Mental Model Essentials

  1. Never hardcode venue IDs. The map page JS hardcodes VEN1121562 (wrong). Always use event.venue_id. The pricing URL builder _build_pricing_url() auto-routes TAO venues through the booketing.com proxy.
  2. Immutability everywhere. Never assign to Pydantic fields — always event.model_copy(update={...}). This applies to all model mutations throughout the codebase.
  3. Astro SSR fails silently. When template code throws during SSR, <main> is empty but HTTP is 200. Dev server shows the stack trace; production does not. XS Nightclub events are the best canary (sparsest data).
  4. Two separate repos. Scraper is on GitHub (prime-optimal/vinny). Astro site is on GitLab (optimalprime/vinny-vegas-app) in site/ (gitignored here). They share D1 + R2 but are deployed independently.
  5. TAO pricing goes through booketing.com. Same urvenue protocol as LIV/XS but routed through booketing.com/uws/house/proxy with extra manageentid=61 param.
  6. Incremental scraping. SitemapIndex tracks lastmod per URL. Only new/updated events are re-scraped. Past events auto-skipped. --force for full re-scrape.
  7. model_dump(mode="json") is required when storing values that will be serialized later. Path/datetime objects in FieldChange history caused json.dump crashes without it.
  8. User-Agent required for pricing API. The urvenue API returns 403 without a User-Agent header. No auth or cookies needed otherwise.
  9. R2 images at img.vinny.vegas. Old domain pub-a209680121414327917920199a3f8c63.r2.dev is deprecated. R2_PUBLIC_URL in .env must point to the custom domain.
  10. D1 exports: no ALTER TABLE. Wrangler sends the SQL as one transaction, so all columns must be in the CREATE TABLE statement. An ALTER TABLE that adds a column which already exists fails the whole import.
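
The incremental scraping rule in item 6 can be sketched roughly as follows. This is a minimal illustration, not the project's SitemapIndex: the names LastmodIndex and SitemapEntry, their fields, and the reading of --force as "ignore the lastmod check but still skip past events" are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class SitemapEntry:
    """One sitemap URL (hypothetical shape, not the project's model)."""
    url: str
    lastmod: str      # lastmod value exactly as published in the sitemap
    event_date: date  # date of the event the URL describes


@dataclass
class LastmodIndex:
    """Hypothetical stand-in for SitemapIndex: remembers lastmod per URL."""
    seen: dict[str, str] = field(default_factory=dict)

    def select(self, entries: list[SitemapEntry], today: date,
               force: bool = False) -> list[SitemapEntry]:
        """Return the entries that should be scraped on this run."""
        to_scrape = []
        for entry in entries:
            if entry.event_date < today:
                continue  # past events are skipped (assumed even with force)
            # New URL, changed lastmod, or a forced run: scrape it.
            if force or self.seen.get(entry.url) != entry.lastmod:
                to_scrape.append(entry)
        return to_scrape

    def mark_scraped(self, entry: SitemapEntry) -> None:
        """Record the lastmod that was just scraped for this URL."""
        self.seen[entry.url] = entry.lastmod
```

Comparing lastmod values for inequality rather than ordering keeps the check independent of the timestamp format a venue happens to publish.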

Cognitive Debt Hotspots

High

src/worker.ts — Agent API Worker in a Python project

A TypeScript Worker file living in a Python project root. It has its own tsconfig.json and wrangler.jsonc but no README or docs explaining its relationship to the Python codebase. 318 lines with 3 D1 query endpoints. Suggestion: Move to a workers/agent-api/ subdirectory with its own README, or document the two-language setup in CLAUDE.md.

High

src/export/d1.py — 847 lines, largest single file

Handles SQL generation for events, artists, event_artists, and table_tiers tables. Includes both schema creation and data export. No inline documentation on the SQL generation strategy. Suggestion: Split into schema definitions vs. data export. Add comments explaining the "no ALTER TABLE" constraint and the full-replace-on-export pattern.
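
To make the "no ALTER TABLE" constraint and the full-replace pattern concrete, here is a small sketch of what such an export could look like. It is an assumption-laden illustration, not the contents of src/export/d1.py: the column set, the build_export_sql helper, and the use of DROP TABLE IF EXISTS to implement full replace are all hypothetical.

```python
import sqlite3

# Hypothetical schema: every column is declared up front, so the export
# never needs ALTER TABLE.
EVENT_COLUMNS = {
    "id": "TEXT PRIMARY KEY",
    "title": "TEXT NOT NULL",
    "venue_id": "TEXT NOT NULL",
    "event_date": "TEXT NOT NULL",
    "min_spend": "INTEGER",
}


def sql_literal(value) -> str:
    """Render a Python value as a SQL literal, escaping single quotes."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        return "1" if value else "0"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def build_export_sql(table: str, columns: dict[str, str], rows: list[dict]) -> str:
    """Build a drop + create + insert script for one table.

    Table and column names are trusted constants here; only row values
    are escaped.
    """
    col_defs = ",\n  ".join(f"{name} {decl}" for name, decl in columns.items())
    statements = [
        f"DROP TABLE IF EXISTS {table};",
        f"CREATE TABLE {table} (\n  {col_defs}\n);",
    ]
    names = ", ".join(columns)
    for row in rows:
        values = ", ".join(sql_literal(row.get(name)) for name in columns)
        statements.append(f"INSERT INTO {table} ({names}) VALUES ({values});")
    return "\n".join(statements)


if __name__ == "__main__":
    rows = [{"id": "e1", "title": "Ladies' Night", "venue_id": "v1",
             "event_date": "2026-04-03", "min_spend": None}]
    script = build_export_sql("events", EVENT_COLUMNS, rows)
    # Local sanity check against SQLite before handing the file to Wrangler.
    sqlite3.connect(":memory:").executescript(script)
    print(script)
```

Because the script recreates the table each time, running it twice yields the same result, which is what makes a single-transaction import safe to retry.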

Medium

ElevenLabs knowledge base is static

The 19 knowledge-base files in data/elevenlabs-kb/ were generated once from a snapshot of 218 events. No automation to regenerate when new events are scraped or pricing changes. The live Worker API endpoints help, but the KB files will go stale. Suggestion: Add a vinny generate-kb CLI command to refresh the knowledge base.
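
As a rough idea of what the core of a generate-kb command might do, the sketch below renders one Markdown document per venue from a list of events. Everything here is hypothetical: the render_kb function, the event keys (venue, date, title), and the one-file-per-venue layout are assumptions, since the actual KB format is not described in this recap.

```python
from collections import defaultdict


def render_kb(events: list[dict]) -> dict[str, str]:
    """Group events by venue and render one Markdown document per venue.

    Returns a mapping of file name to file content; writing the files
    to disk is left to the caller.
    """
    by_venue: dict[str, list[dict]] = defaultdict(list)
    for event in events:
        by_venue[event["venue"]].append(event)

    files: dict[str, str] = {}
    for venue, items in sorted(by_venue.items()):
        slug = "-".join(venue.lower().split())  # "XS Nightclub" -> "xs-nightclub"
        lines = [f"# {venue}", ""]
        # ISO dates sort correctly as plain strings.
        for event in sorted(items, key=lambda e: e["date"]):
            lines.append(f"- {event['date']}: {event['title']}")
        files[f"{slug}.md"] = "\n".join(lines) + "\n"
    return files
```

A CLI wrapper could call this after each scrape and write the results under data/elevenlabs-kb/, so the static files track the same data the Worker endpoints serve.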

Medium

src/main.py — 472 lines, legacy entry point

The original monolithic scraper entry point. Most functionality has been extracted to extractors, CLI modules, and plugins, but this file still exists at 472 lines. Unclear what still depends on it vs. the Cyclopts CLI. Suggestion: Audit imports to determine if this is dead code or still used by Apify/Crawlee runners.
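
One way to start the suggested import audit is a coarse static scan with the standard-library ast module. The helper names below (imported_names, imports_module, find_importers) are hypothetical, and matching on the final dotted component is deliberately loose: it will also flag something like `from foo import main`, so hits need a manual look.

```python
import ast
from pathlib import Path
from typing import Iterator


def imported_names(tree: ast.AST) -> Iterator[str]:
    """Yield every dotted name referenced by import statements in a module."""
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                yield alias.name
        elif isinstance(node, ast.ImportFrom):
            base = node.module or ""  # None for "from . import x"
            yield base
            for alias in node.names:
                yield f"{base}.{alias.name}" if base else alias.name


def imports_module(source: str, tail: str = "main") -> bool:
    """True if the source imports anything whose last component is `tail`."""
    tree = ast.parse(source)
    return any(name.split(".")[-1] == tail for name in imported_names(tree))


def find_importers(root: Path, tail: str = "main") -> list[Path]:
    """List .py files under `root` that appear to import the `tail` module."""
    hits = []
    for path in sorted(root.rglob("*.py")):
        try:
            if imports_module(path.read_text(encoding="utf-8"), tail):
                hits.append(path)
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be parsed
    return hits
```

An empty result from `find_importers(Path("src"))` would suggest nothing imports the module statically; dynamic imports and external runner configs would still need checking by hand.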

Low

Maestro playbooks accumulating untracked

Three untracked hackathon playbooks (elevenlabs-hackathon-2/3/4.md) plus the Initiation/ directory of maestro playbooks. These contain valuable context for the ElevenLabs integration work but aren't version-controlled. Suggestion: Commit or gitignore them, since untracked files in the working tree create noise.

Next Steps

Momentum is pointing toward completing the ElevenLabs hackathon and then circling back to site UX.

Immediate (from active work)
  • Commit untracked hackathon playbooks (phases 2-4)
  • ElevenLabs hackathon phases 2-4 — outbound calls, Twilio, WhatsApp (#121)
  • SEO audit fixes from the diagram findings
  • Commit modified justfile + table pricing diagram
Roadmap (from PLAN.md)
  • Events page pagination (#113) — stop loading 1,200+ events
  • Pricing freshness — scheduled re-scraping with escalating frequency (#114)
  • Firecrawl integration for unstructured venues (#122)
  • Zouk (Resorts World) extractor (#8)
  • Full-text search + filters on the site (#60, #61)
Generated 2026-03-23 · Vinny Scraper v2.3.1 · 86 commits · 10,738 loc
vinny.vegas · docs.vinny.vegas · img.vinny.vegas · GitHub