v2.3.1 · project recap · 2026-03-23 · 2-week window
Vegas nightlife event scraper covering 13+ venues across LIV, XS, Encore Beach Club, and the full TAO Group portfolio. Scrapes event listings, artist data, table pricing, and images — then exports to JSON, CSV, D1 SQL, and SQLite to power vinny.vegas.
Target users: promoter friends in the Vegas EDM/nightlife scene. Stage: actively shipping features — scraper core is stable, site is live, now adding AI agent layer.
10,738 lines across 43 Python source files
7,324 lines across 21 test files
5 extractors · 7 CLI modules · 11 plugin files
Cloudflare D1 (events DB) · R2 (img.vinny.vegas)
Astro site on Pages · docs.vinny.vegas (CLI + diagrams)
GitLab CI for site deploys · GitHub for scraper
graph TD
subgraph CLI["CLI Layer — Cyclopts"]
SCRAPE["vinny scrape\n(per-venue or all)"]
SYNC["vinny sync\n(full pipeline)"]
EXPORT["vinny export-d1\nexport-csv / sqlite"]
TABLES["vinny tables\ndeals / heatmap"]
ENRICH["vinny enrich\n(spotify/RA/tracklists)"]
IMAGES["vinny images\n(download/upload/sync)"]
end
subgraph Extractors["Venue Extractors"]
LIV["LIV / LIV Beach\n(livnightclub.com)"]
WYNN["XS / EBC\n(wynnsocial.com)"]
TAO["TAO Group × 10\n(taogroup.com sitemaps)"]
end
subgraph Core["Data Pipeline"]
CRAWLEE["Crawlee\n(request queue)"]
SITEMAP["SitemapIndex\n(incremental tracking)"]
MODELS["Pydantic Models\n(VegasEvent, MasterDB)"]
PRICING["Table Pricing\n(urvenue API)"]
MASTER["MasterDatabaseManager\n(master_events.json)"]
end
subgraph Plugins["Plugin Layer"]
IMG_DL["Image Downloader\n(VEA CDN)"]
R2["R2 Storage\n(img.vinny.vegas)"]
SPOTIFY["Spotify Enricher"]
RA["Resident Advisor"]
TL["1001Tracklists"]
end
subgraph Cloud["Cloudflare"]
D1["D1 Database\n(events + artists + tiers)"]
R2B["R2 Bucket\n(artist images)"]
WORKER["Agent API Worker\n(3 endpoints for ElevenLabs)"]
PAGES["Astro Site\n(vinny.vegas)"]
EL["ElevenLabs ConvAI\n(voice/chat widget)"]
end
SCRAPE --> CRAWLEE
SYNC --> CRAWLEE
CRAWLEE --> SITEMAP
SITEMAP --> LIV & WYNN & TAO
LIV & WYNN & TAO --> MODELS
MODELS --> PRICING
PRICING --> MASTER
MASTER --> EXPORT
EXPORT --> D1
ENRICH --> SPOTIFY & RA & TL
SPOTIFY & RA & TL --> MASTER
IMAGES --> IMG_DL
IMG_DL --> R2
R2 --> R2B
D1 --> PAGES
D1 --> WORKER
R2B --> PAGES
WORKER --> EL
classDef cli fill:#3a7d5e22,stroke:#3a7d5e
classDef extract fill:#2e7d8c22,stroke:#2e7d8c
classDef core fill:#6b5b8a22,stroke:#6b5b8a
classDef plugin fill:#b5761a22,stroke:#b5761a
classDef cloud fill:#2e7d8c22,stroke:#2e7d8c
class SCRAPE,SYNC,EXPORT,TABLES,ENRICH,IMAGES cli
class LIV,WYNN,TAO extract
class CRAWLEE,SITEMAP,MODELS,PRICING,MASTER core
class IMG_DL,R2,SPOTIFY,RA,TL plugin
class D1,R2B,WORKER,PAGES,EL cloud
10 commits over 2 weeks (Mar 9 – Mar 23). One dominant theme: the ElevenLabs ConvAI hackathon — building a voice/chat AI agent for the site. Plus CI automation and docs polish.
Shipped a standalone Cloudflare Worker (vinny-vegas-events) with 3 API endpoints
querying live D1 data. Automated tool registration via the ElevenLabs API. Embedded the widget in the Astro site footer.
Wrote ELEVENLABS_WIDGET_PROMPT.md for the cold-start concierge persona
(no dynamic variables, unlike the outbound agent).
CI now runs changelogen on merge to main and commits the updated CHANGELOG.md.
Decided: Separate agents for outbound calls (with dynamic variables like caller name, event details) vs. website widget (cold-start concierge, no context). Different system prompts, same backend tools.
Why: The widget has no prior context about the user — it needs a broader conversational opening. The outbound agent knows who it's calling and why. Merging them would dilute both experiences.
Decided: Deploy vinny-vegas-events as its own Worker rather than adding routes to the Astro site.
Why: Astro Pages Functions have limitations for API routes. A standalone Worker is simpler to deploy, test, and iterate on independently. The ElevenLabs webhook calls need clean JSON responses with CORS — mixing that into the SSR site adds unnecessary coupling.
Decided: Auto-generate CHANGELOG.md on merge to main. Why: Manual changelog updates were frequently forgotten. Conventional commits already provide the structured data — changelogen just needs to read them.
justfile (modified) · docs/diagrams/table-pricing-routing.html (modified)
Untracked: 3 hackathon playbooks (phases 2-4), diagrams index page, SEO audit HTML, maestro initiation playbooks
Never hardcode a venue ID like VEN1121562 (wrong).
Always use event.venue_id. The pricing URL builder _build_pricing_url()
auto-routes TAO venues through the booketing.com proxy.
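The routing rule can be sketched as follows. This is a hypothetical helper shaped like the described `_build_pricing_url()` behavior, not the real code; the venue-ID set and urvenue URL shape are invented, while the booketing.com proxy path and `manageentid=61` param come from the pricing notes elsewhere in this recap.

```python
# Sketch of venue-aware pricing-URL routing (hypothetical helper mirroring
# the described _build_pricing_url() behavior; IDs and URL shapes are assumed).
TAO_VENUE_IDS = {"tao", "marquee", "lavo"}  # placeholder set, not the real one

def build_pricing_url(venue_id: str, event_id: str) -> str:
    """Route TAO venues through the booketing.com proxy; others hit urvenue."""
    if venue_id.lower() in TAO_VENUE_IDS:
        # TAO properties need the proxy plus the extra manageentid param
        return (
            "https://booketing.com/uws/house/proxy"
            f"?event={event_id}&manageentid=61"
        )
    return f"https://www.urvenue.com/api/events/{event_id}/pricing"

print(build_pricing_url("marquee", "E123"))
print(build_pricing_url("xs", "E456"))
```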
Models are immutable; update fields via event.model_copy(update={...}).
This applies to all model mutations throughout the codebase.
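For readers unfamiliar with the pattern: pydantic's `model_copy(update={...})` returns a new instance rather than mutating in place. The same idea in stdlib terms, using a frozen dataclass as a stand-in for the real `VegasEvent` model:

```python
from dataclasses import dataclass, replace

# Stdlib analogue of frozen pydantic models + model_copy(update={...}):
# a frozen dataclass rejects mutation, so updates produce a new instance.
@dataclass(frozen=True)
class VegasEventSketch:  # hypothetical stand-in, not the real VegasEvent
    name: str
    venue_id: str
    sold_out: bool = False

event = VegasEventSketch(name="EDC Week Kickoff", venue_id="liv")
# ~ event.model_copy(update={"sold_out": True}) in pydantic terms:
updated = replace(event, sold_out=True)

print(updated.sold_out, event.sold_out)  # the copy changed; the original did not
```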
If a page render throws, <main>
is empty but HTTP is 200. The dev server shows the stack trace; production does not.
XS Nightclub events are the best canary (sparsest data).
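A cheap smoke check for this failure mode is to inspect the rendered HTML rather than trust the status code. A sketch (operates on an HTML string; wiring it to a fetch of the live site is left out):

```python
import re

def main_is_empty(html: str) -> bool:
    """Detect the silent-failure signature: HTTP 200 but an empty <main>."""
    match = re.search(r"<main[^>]*>(.*?)</main>", html, re.DOTALL | re.IGNORECASE)
    if match is None:
        return True  # no <main> at all also counts as a failed render
    return match.group(1).strip() == ""

print(main_is_empty("<main></main>"))                       # failed render
print(main_is_empty("<main><h1>XS Nightclub</h1></main>"))  # healthy page
```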
The scraper repo lives on GitHub (prime-optimal/vinny).
Astro site is on GitLab (optimalprime/vinny-vegas-app) in site/ (gitignored here).
They share D1 + R2 but are deployed independently.
TAO venues fetch table pricing via booketing.com/uws/house/proxy with an extra manageentid=61 param.
SitemapIndex tracks lastmod per URL.
Only new/updated events are re-scraped; past events are auto-skipped. Use --force for a full re-scrape.
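The incremental logic can be sketched as below. The function shape is an assumption; the real SitemapIndex API may differ.

```python
from datetime import date

def urls_to_scrape(
    sitemap: dict[str, str],       # url -> lastmod (ISO date) from the sitemap
    seen: dict[str, str],          # url -> lastmod recorded on the previous run
    event_dates: dict[str, date],  # url -> event date, used to skip past events
    today: date,
    force: bool = False,
) -> list[str]:
    """Return only new/updated, non-past URLs unless force is set."""
    out = []
    for url, lastmod in sitemap.items():
        if event_dates.get(url, today) < today:
            continue  # past events are auto-skipped even under --force
        if force or seen.get(url) != lastmod:
            out.append(url)  # new URL, or lastmod changed since the last run
    return out
```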
model_dump(mode="json") is required when storing values that will be serialized later.
Path/datetime objects in FieldChange history caused json.dump crashes without it.
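The crash is easy to reproduce with plain json; pydantic's `model_dump(mode="json")` avoids it by coercing values to JSON-safe types up front:

```python
import json
from datetime import datetime
from pathlib import Path

record = {"path": Path("imgs/liv.jpg"), "changed_at": datetime(2026, 3, 23)}

# Raw Path/datetime values are not JSON-serializable:
try:
    json.dumps(record)
except TypeError as exc:
    print(f"json.dumps failed: {exc}")

# model_dump(mode="json") coerces these first; the manual equivalent is:
safe = {"path": str(record["path"]), "changed_at": record["changed_at"].isoformat()}
print(json.dumps(safe))
```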
Requests only need a browser-like User-Agent header.
No auth or cookies needed otherwise.
Images are served from img.vinny.vegas. The old domain pub-a209680121414327917920199a3f8c63.r2.dev
is deprecated. R2_PUBLIC_URL in .env must point to the custom domain.
src/worker.ts — Agent API Worker in a Python project
A TypeScript Worker file living in a Python project root. It has its own tsconfig.json and wrangler.jsonc
but no README or docs explaining its relationship to the Python codebase. 318 lines with 3 D1 query endpoints.
Suggestion: Move to a workers/agent-api/ subdirectory with its own README,
or document the two-language setup in CLAUDE.md.
src/export/d1.py — 847 lines, largest single file
Handles SQL generation for the events, artists, event_artists, and table_tiers tables. Includes both schema creation and data export. No inline documentation on the SQL generation strategy. Suggestion: Split into schema definitions vs. data export. Add comments explaining the "no ALTER TABLE" constraint and the full-replace-on-export pattern.
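For context, the full-replace-on-export pattern (drop and recreate each table on every export, so schema changes never need ALTER TABLE) can be sketched with an invented toy generator; the real exporter's schema and escaping are certainly more involved:

```python
def full_replace_sql(table: str, columns: list[str], rows: list[tuple]) -> str:
    """Emit DROP + CREATE + INSERT so each export fully replaces the table.
    Toy sketch: a real D1 export needs typed columns and proper escaping."""
    cols = ", ".join(columns)
    stmts = [
        f"DROP TABLE IF EXISTS {table};",
        f"CREATE TABLE {table} ({cols});",
    ]
    for row in rows:
        # naive single-quote escaping, enough for the sketch
        vals = ", ".join("'" + str(v).replace("'", "''") + "'" for v in row)
        stmts.append(f"INSERT INTO {table} ({cols}) VALUES ({vals});")
    return "\n".join(stmts)

print(full_replace_sql("events", ["id", "name"], [("1", "EDC Week")]))
```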
The 19 KB files in data/elevenlabs-kb/ were generated once from a snapshot of 218 events.
No automation to regenerate when new events are scraped or pricing changes.
The live Worker API endpoints help, but the KB files will go stale.
Suggestion: Add a vinny generate-kb CLI command to refresh the knowledge base.
src/main.py — 472 lines, legacy entry point
The original monolithic scraper entry point. Most functionality has been extracted to extractors, CLI modules, and plugins, but this file still exists at 472 lines. Unclear what still depends on it vs. the Cyclopts CLI. Suggestion: Audit imports to determine whether this is dead code or still used by Apify/Crawlee runners.
Four untracked hackathon playbooks (elevenlabs-hackathon-2/3/4.md + Initiation/).
These contain valuable context for the ElevenLabs integration work but aren't version-controlled.
Suggestion: Commit or gitignore them — untracked files in the working tree create noise.
Momentum is pointing toward completing the ElevenLabs hackathon and then circling back to site UX.