Vinny - Vegas Nightlife Scraper Development Plan
Project Overview
Vinny is a Vegas nightlife scraper that extracts event data from Las Vegas nightclubs. It started with LIV Las Vegas and has since expanded to other major venues.
Target Users: Promoter friends in the Vegas EDM/nightlife scene
Current Version: v2.3.1 — 13+ venues, incremental scraping, Astro site live at vinny.vegas
Completed Milestones
v1.0.0 - LIV Perfection ✅ (2026-02-26)
- Performer name, event title, date, time, venue, age requirement, description
- CSV export capability
v1.5.0 - Plugin Architecture + Timestamped Runs ✅ (2026-02-27)
- Plugin architecture for multi-venue support
- LIV Las Vegas + LIV Beach extractors
- XS Nightclub skeleton
- Artist enrichment — biography, streaming links, VEA CDN images with multiple size variants
- Timestamped run folders, field-level diff tracking, master database
- Markdown + D1 SQL export
v1.6.0 - Table Pricing Extraction ✅ (2026-02-27)
Issues: #3, #4
- [x] Direct HTTP to urvenue AJAX API (no Playwright): all sections, min spend, deposits, capacity
- [x] Fixed VEN1121562 bug (map page hardcodes wrong venue code)
v1.7.0 - D1 Schema Expansion + CLI Polish ✅ (2026-03-01)
Issues: #24, #27, #30, #32, #36
- [x] table_tiers, artists, event_artists tables in D1
- [x] vinny export-sqlite — local .db file, same schema as D1
- [x] vinny tables, vinny deals, vinny heatmap — table pricing CLI
- [x] Rich console output, CLI module split, command grouping
- [x] FastAPI app (src/app.py) for REST API access
v1.8.0 - XS Nightclub + Multi-Venue Pricing ✅ (2026-03-01)
Issues: #39, #40
- [x] XS extractor — Schema.org JSON-LD, embedded table pricing, AM/PM time formatting
- [x] vinny scrape --with-pricing — combined scrape + pricing in one command
- [x] Enrichment preservation — master DB merge never overwrites enriched fields with None
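The enrichment-preservation rule above can be sketched as a merge that treats `None` in a fresh scrape as "not found this run" rather than "erase". This is a minimal illustration, not the project's actual merge code; the function name `merge_event` and the field names are hypothetical.

```python
from typing import Any, Dict


def merge_event(existing: Dict[str, Any], scraped: Dict[str, Any]) -> Dict[str, Any]:
    """Merge a fresh scrape into a stored record without losing enrichment.

    Hypothetical helper: the real master DB merge may differ in detail.
    """
    merged = dict(existing)
    for field, value in scraped.items():
        # Skip None values that would overwrite an already-populated field.
        if value is None and existing.get(field) is not None:
            continue
        merged[field] = value
    return merged


stored = {"performer": "Example DJ", "bio": "Enriched biography", "time": "10:30 PM"}
fresh = {"performer": "Example DJ", "bio": None, "time": "10:00 PM"}
result = merge_event(stored, fresh)
```

Here `result` keeps the stored `bio` while still picking up the updated `time`, so a re-scrape that lacks enrichment data does not degrade the master record.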
v1.9.0 - Image Naming Refactor + R2 Upload ✅ (2026-03-01)
Issues: #18, #42, #43, #93
- [x] Venue-aware image storage — data/artists/{artist}/{artist}_{venue}_{size}.jpg
- [x] Cloudflare R2 integration — R2Storage class, HEAD-check dedup, vinny images upload-r2
- [x] Venue alias resolution — vinny scrape xs / vinny scrape liv shorthand
v1.10.0 - Pipeline Automation + D1 REST API ✅ (2026-03-01)
Issues: #12, #34
- [x] vinny sync — full pipeline: scrape → images → R2 → D1 in one command
- [x] Direct Cloudflare D1 REST API (bypasses wrangler OAuth issues)
- [x] Artist data enrichment — Spotify, Resident Advisor, 1001tracklists (#91)
v2.0.0 - Astro Site Foundation ✅ (2026-03-02)
Browsable, SEO-friendly event site on Cloudflare Pages using Astro + D1 + R2.
- [x] Homepage, event listing + detail, venue pages, artist pages, table pricing comparison
- [x] TypeScript types + D1 query helpers, base layout + nav, null-safety hardening
- [x] SEO — meta tags, Open Graph, schema.org JSON-LD, canonical URLs at vinny.vegas
- [x] Deploy to Cloudflare Pages — GitLab CI auto-deploy, custom domain vinny.vegas
See .plans/2026-03-01-phase1-astro-site.md for full implementation details.
v2.1.0 - Tables Page Interactive Filters ✅ (2026-03-03)
Issue: #75
- [x] Tables page sort + range sliders with dynamic bounds (React island)
- [x] Pagination and ppg color coding fixes
v2.2.0 - TAO Group + EBC + Incremental Scraping ✅ (2026-03-05)
Encore Beach Club — EBC extractor (src/extractors/ebc.py) inherits WynnSocialBase, covers day + night venues.
TAO Group — one extractor covers 10 venues via taogroup.com sitemaps. Per-venue sitemap filtering (vinny scrape omnia). Booketing.com proxy for table pricing.
| Venue | Type | Hotel | Tag | Issues |
|---|---|---|---|---|
| Omnia | Night | Caesars Palace | omn | #5 |
| Hakkasan | Night | MGM Grand | hak | #6 |
| Marquee | Night | Cosmopolitan | marq | #7 |
| Jewel | Night | Aria | jwl | ✅ |
| Marquee Dayclub | Day | Cosmopolitan | marqd | ✅ |
| Tao Beach | Day | Venetian | taob | ✅ |
| Palm Tree Beach Club | Day | Venetian/Palazzo | palm | ✅ |
| Liquid Pool Lounge | Day | Aria | liq | ✅ |
| Tao Nightclub | Night | Venetian | tao | #9 |
| Wet Republic | Day | MGM Grand | wet | #10 (no events in sitemap yet) |
| Encore Beach Club | Day | Wynn | ebc | #11 |
| EBC at Night | Night | Wynn | ebcn | #85 |
Incremental sitemap scraping (#96) — SitemapIndex tracks lastmod per URL, only new/updated events enqueued. Past events auto-skipped. --force for full re-scrape. Full TAO backfill: 1,135 events indexed, 1,388 in master DB.
See .plans/2026-03-04-tao-group-extractor.md and .plans/2026-03-05-incremental-sitemap-scraping.md.
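The incremental scraping described above (track `lastmod` per URL, enqueue only new or updated events, skip past events, `--force` to re-scrape) could be sketched roughly as follows. This is an assumption-laden illustration: `SitemapIndexSketch`, `SitemapEntry`, and the `event_date` field are hypothetical names, and the real `SitemapIndex` may persist its state and derive dates differently. Whether `--force` also re-scrapes past events is not stated in the plan; here it only bypasses the `lastmod` check.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List


@dataclass
class SitemapEntry:
    url: str
    lastmod: str       # value published in the sitemap, compared as an opaque string
    event_date: date   # assumed to be derivable from the URL or page


@dataclass
class SitemapIndexSketch:
    # url -> lastmod recorded on the previous run (kept in memory in this sketch)
    seen: Dict[str, str] = field(default_factory=dict)

    def select(self, entries: List[SitemapEntry], today: date, force: bool = False) -> List[str]:
        """Return the URLs that should be enqueued for scraping."""
        queue = []
        for entry in entries:
            if entry.event_date < today:
                continue  # past events are skipped
            changed = self.seen.get(entry.url) != entry.lastmod
            if force or changed:
                queue.append(entry.url)
            self.seen[entry.url] = entry.lastmod
        return queue
```

On a second run with an unchanged sitemap, `select` returns an empty queue; bumping an entry's `lastmod` or passing `force=True` puts it back in the queue.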
v2.3.0 - Typer → Cyclopts Migration ✅ (2026-03-06)
Issue: #99 (sub-issues: #100–#106)
- [x] Full CLI migration from Typer to Cyclopts
- [x] MkDocs CLI reference auto-generated via cyclopts plugin
v2.3.1 - Cleanup ✅ (2026-03-06)
Issues: #109, #110, #111
- [x] Remove FastAPI/uvicorn — unused dead code
- [x] R2 custom domain migration (img.vinny.vegas)
- [x] MkDocs admonitions and mermaid diagrams
- [x] Repo health check — README, CLAUDE.md, AGENTS.md, PLAN.md updated
Roadmap
Scraper: Data Quality & Expansion
- Venue enrichment — description, hours, capacity, images, table map screenshot (#92)
- Stale venue_id fix — Jewel, Tao Nightclub, Palm Tree have wrong venue_id in D1 (#98)
- Zouk (Resorts World) — new venue extractor (#8)
- Wet Republic — waiting for events to appear in TAO sitemaps (#10)
- Firecrawl integration — scrape Drai's, EBC After Hours, and off-strip venues (Area15, Club Ego, Terrace, KWay's) that don't have structured sitemaps/APIs. Also test for rapid day-of pricing checks. 10K credits available. (#122)
- Pricing freshness — scheduled re-scraping with escalating frequency as events approach. Stores pricing snapshots with timestamps, tracks availability changes (Yes→No), preserves original vs current pricing. Week before: once. 3 days: daily. Day-of: hourly. (#114)
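The escalating schedule proposed for pricing freshness (#114) can be expressed as a small function mapping days-until-event to a refresh interval. This is a sketch of one possible reading: the function name `refresh_interval` is hypothetical, "week before: once" is modeled as a 7-day interval for events 4 to 7 days out, and events further out or already past get no scheduled refresh (an assumption, since the plan does not say).

```python
from datetime import date, timedelta
from typing import Optional


def refresh_interval(event_date: date, today: date) -> Optional[timedelta]:
    """Return how often to re-scrape pricing, or None for no scheduled refresh."""
    days_out = (event_date - today).days
    if days_out < 0:
        return None                   # event already happened
    if days_out == 0:
        return timedelta(hours=1)     # day-of: hourly
    if days_out <= 3:
        return timedelta(days=1)      # within 3 days: daily
    if days_out <= 7:
        return timedelta(days=7)      # week before: effectively once
    return None                       # assumption: no refresh further out
```

A scheduler could call this per upcoming event and enqueue a pricing scrape whenever the last snapshot is older than the returned interval.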
Site: UX Fixes
- Events page pagination — load 24-36 events per page instead of 1,200+. Server-side with URL params, works with existing filters. (#113)
- Historical pricing display — original vs current price on event detail, "Sold Out" badges on unavailable tiers, price change indicators. Depends on #114. (#115)
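The pagination arithmetic behind #113 is language-neutral; it is shown here in Python for consistency with the scraper, although the site itself is built with Astro. The sketch assumes a `page` URL parameter and a SQL-style `LIMIT`/`OFFSET` query, neither of which is confirmed by the plan, and `page_window` is a hypothetical name.

```python
import math
from typing import Dict


def page_window(page: int, total: int, per_page: int = 24) -> Dict[str, int]:
    """Clamp a requested page and return LIMIT/OFFSET values for the query."""
    per_page = min(max(per_page, 24), 36)          # keep within the 24-36 range
    pages = max(1, math.ceil(total / per_page))    # always at least one page
    page = min(max(page, 1), pages)                # clamp out-of-range requests
    return {
        "page": page,
        "pages": pages,
        "limit": per_page,
        "offset": (page - 1) * per_page,
    }
```

Clamping the page number on the server means a stale or hand-edited URL such as `?page=999` still resolves to the last valid page instead of an empty result.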
Site: Search & Interactive Features
Parent issue: #46. See .plans/2026-03-01-phase2-search-filters.md.
- [ ] Full-text search endpoint + SearchBox component (#60)
- [ ] Advanced event filters — date range, venue, price (#61)
- [ ] Quick filters — "Tonight", "This Weekend", "Next 2 Weeks" (#62)
- [ ] Event comparison page (#63)
- [ ] Table/list view toggle on events page (#76)
Site: AI Agent (Conversational Event Discovery)
- ElevenLabs voice/chat agent — embed ConvAI widget in site footer. Agent acts as a Vegas nightlife concierge — answers event questions, pricing, recommendations. Knowledge base fed from D1. (#120)
- Proactive outbound notifications — when pricing changes or tables sell out (from #114), trigger outbound calls via ElevenLabs + Twilio to notify large groups. Voicemail fallback, SMS, WhatsApp. (#121)
- Workers AI binding + agent tools (#64)
- Chat API endpoint with streaming (#65)
- ChatWidget React island (#66)
- System prompt + prompt tuning (#67)
Note: #120 (ElevenLabs) may supersede #64-#67 (Workers AI). The ElevenLabs agent handles voice + chat with better quality. Workers AI tools (#64) could still power the backend queries that feed the ElevenLabs agent via webhooks.
Site: Production Polish
Parent issue: #48. See .plans/2026-03-01-phase4-polish.md.
- [ ] Dynamic sitemap + robots.txt (#68)
- [ ] OG image generation — per-event social cards (#69)
- [ ] Caching strategy + security headers (#70)
- [ ] Analytics — Cloudflare Web Analytics (#71)
- [ ] Custom 404 page (#72)
- [ ] MkDocs polish (#108)
Site: Content & Engagement
- Artist page enrichment — deep content for standalone profiles (#78)
- Venue page enrichment — photos, upcoming shows, past events (#80)
- Guest List CTA — button + form modal on event pages (#81)
- FAQ knowledge base — guest list, dress code, bottle service (#82)
- Social sharing — event card image/video generator (#83)
Infrastructure
- Local dev workflow — Wrangler local D1/R2 emulation, make D1 the default export target instead of CSV. Full pipeline works offline. (#116)
- R2 image transformation — auto-resize hi-res artist images into thumb (150px), mobile (400px), main (800px), and og (1200px) variants on upload. Reduces bandwidth, improves load times. (#117)
- Static page generation — pre-render event pages from D1 to reduce DB reads. Regenerate on D1 updates (new events, pricing refresh). Dynamic pages (search, filters) stay SSR. (#118)
- Guestlist Python Worker: deploy the vinny-urvenue CLI as a Cloudflare Worker (guestlist.vinny.vegas). Enables "Add to Guest List" button on event pages + iOS Shortcuts via iMessage. (#119)
- AI enrichment pipeline: auto-fill missing descriptions (#77)
- Run Vinny as a Cloudflare Python Worker (#14)
- Deploy to Apify platform (#13)
- Generate promotional images for upcoming events (#16)
- Create social media widgets from upcoming events (#15)
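For the R2 image transformation item (#117), the target dimensions for each variant can be computed before any resizing library is involved. The sketch below assumes the listed pixel values are widths, that aspect ratio is preserved, and that images are never upscaled; none of these are stated in the plan, and `variant_sizes` is a hypothetical name.

```python
from typing import Dict, Tuple

# Variant names and pixel values come from the plan; treating them as widths is an assumption.
VARIANT_WIDTHS = {"thumb": 150, "mobile": 400, "main": 800, "og": 1200}


def variant_sizes(width: int, height: int) -> Dict[str, Tuple[int, int]]:
    """Return (width, height) per variant, preserving aspect ratio."""
    sizes = {}
    for name, target in VARIANT_WIDTHS.items():
        new_width = min(target, width)  # never upscale a smaller source image
        new_height = max(1, round(height * new_width / width))
        sizes[name] = (new_width, new_height)
    return sizes


sizes = variant_sizes(2400, 1600)
```

The resulting sizes could then be passed to whatever resizing step runs on upload.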
Technical Notes
Scraping Discipline
- ✅ Use sitemaps when available
- ✅ Incremental scraping — only new/updated URLs (SitemapIndex diff)
- ✅ Stay on-site (livnightclub.com, taogroup.com, wynnsocial.com)
- ✅ Respect rate limits (5 concurrent, 30 req/min)
- ✅ Download images from VEA CDN (post-processing)
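The rate limits listed above (5 concurrent, 30 req/min) could be enforced with a small asyncio throttle like the one below. This is a sketch only; the project's crawler may already provide its own limits, and the `Throttle` class is hypothetical. It spaces request starts evenly (one every 2 seconds at 30 req/min), which is stricter than a sliding one-minute window.

```python
import asyncio
import time


class Throttle:
    """Cap in-flight tasks and space out request starts evenly."""

    def __init__(self, max_concurrent: int = 5, per_minute: int = 30):
        self._sem = asyncio.Semaphore(max_concurrent)
        self._interval = 60.0 / per_minute   # seconds between request starts
        self._lock = asyncio.Lock()
        self._next_start = 0.0

    async def run(self, coro_fn, *args):
        async with self._sem:                # at most max_concurrent in flight
            async with self._lock:           # reserve the next start slot
                now = time.monotonic()
                wait = max(0.0, self._next_start - now)
                self._next_start = max(now, self._next_start) + self._interval
            if wait:
                await asyncio.sleep(wait)
            return await coro_fn(*args)
```

Usage would look like `await throttle.run(fetch, url)`, where `fetch` is whatever coroutine performs the HTTP request. Create the `Throttle` inside a running event loop so the primitives bind to that loop on older Python versions.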
Pricing Extraction Strategy (Resolved)
- ✅ LIV/LIV Beach: Direct HTTP GET to wp-admin/admin-ajax.php?action=uvpx&uvaction=uwspx_map
- ✅ TAO Group: Same urvenue protocol via booketing.com/uws/house/proxy
- Always use each event's own venue_id (never hardcode VEN1121562)
- See TABLE_PRICING.md for full documentation
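To illustrate the "use each event's own venue_id" rule, here is a sketch of building the LIV pricing request URL from an event record. Only the path and the `action`/`uvaction` values come from this plan; the `venuecode` and `caldate` parameter names and the example venue ID are hypothetical placeholders, so check TABLE_PRICING.md for the real query parameters.

```python
from urllib.parse import urlencode


def pricing_url(site: str, venue_id: str, event_date: str) -> str:
    """Build a table-pricing URL for one event (parameter names partly assumed)."""
    params = {
        "action": "uvpx",          # from the plan
        "uvaction": "uwspx_map",   # from the plan
        "venuecode": venue_id,     # HYPOTHETICAL parameter name
        "caldate": event_date,     # HYPOTHETICAL parameter name
    }
    return f"https://{site}/wp-admin/admin-ajax.php?{urlencode(params)}"


# The venue ID is read from the event record, never from a constant.
event = {"venue_id": "VEN0000000", "date": "2026-03-07"}  # placeholder values
url = pricing_url("livnightclub.com", event["venue_id"], event["date"])
```

Passing the venue ID in as an argument keeps the function free of any hardcoded venue code, which is the failure mode behind the VEN1121562 bug.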
Image System (VEA CDN + R2)
See IMAGES.md for the full image system reference.
- VEA CDN — LIV, XS, and EBC all use venueeventartist.com; URL-based resizing
- Venue-aware storage — data/artists/{artist}/{artist}_{venue}_{size}.jpg
- R2 custom domain — img.vinny.vegas (bucket: vinny-vegas-images)
- Dedup — by (venue_tag, artist_slug, size) — same artist at different venues = different images
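The storage pattern and dedup key above can be sketched as two small helpers. The function names and the example slug, tag, and size label are hypothetical; only the path template and the `(venue_tag, artist_slug, size)` key come from this plan.

```python
from pathlib import PurePosixPath
from typing import Tuple


def image_path(artist_slug: str, venue_tag: str, size: str) -> PurePosixPath:
    """Build data/artists/{artist}/{artist}_{venue}_{size}.jpg."""
    filename = f"{artist_slug}_{venue_tag}_{size}.jpg"
    return PurePosixPath("data/artists") / artist_slug / filename


def dedup_key(venue_tag: str, artist_slug: str, size: str) -> Tuple[str, str, str]:
    """Same artist at a different venue yields a different key, so both images are kept."""
    return (venue_tag, artist_slug, size)


path = image_path("example-dj", "liv", "main")
```

Because the venue tag is part of both the filename and the key, an artist who plays LIV and XS ends up with two distinct image sets rather than one overwriting the other.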
Data Quality
- Deduplicate events by composite key ({date}-{performer}-{venue})
- Handle venue variations (LIV nightclub vs LIV Beach dayclub)
- Null-safety in Astro templates via src/lib/safe.ts
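A minimal sketch of deduplication by the `{date}-{performer}-{venue}` composite key follows. The slug normalization (lowercasing and collapsing punctuation) is an assumption added so that casing differences do not create duplicates; the actual key format in the codebase may differ, and `event_key` and `dedupe` are hypothetical names.

```python
import re
from typing import Dict, List


def slug(text: str) -> str:
    """Lowercase and collapse non-alphanumeric runs into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def event_key(date: str, performer: str, venue: str) -> str:
    return f"{date}-{slug(performer)}-{slug(venue)}"


def dedupe(events: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first event seen for each composite key."""
    unique: Dict[str, Dict[str, str]] = {}
    for ev in events:
        unique.setdefault(event_key(ev["date"], ev["performer"], ev["venue"]), ev)
    return list(unique.values())


events = [
    {"date": "2026-03-07", "performer": "Example DJ", "venue": "LIV Las Vegas"},
    {"date": "2026-03-07", "performer": "EXAMPLE DJ", "venue": "LIV Las Vegas"},
    {"date": "2026-03-07", "performer": "Example DJ", "venue": "LIV Beach"},
]
kept = dedupe(events)
```

Including the venue in the key is what keeps the LIV nightclub and LIV Beach dayclub shows separate when the same performer plays both on one date.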
Repos
| Repo | Host | Purpose |
|---|---|---|
| prime-optimal/vinny | GitHub | Scraper, CLI, pipeline |
| optimalprime/vinny-vegas-app | GitLab | Astro site at vinny.vegas |
Last Updated: 2026-03-20
Status: v2.3.1 (13+ venues, incremental scraping, Cyclopts CLI, Astro site live). Next: events pagination (#113), local dev workflow (#116), pricing freshness (#114), static pages (#118)