Sync Pipeline
Complete data flow: scrape → images → R2 → D1 → vinny.vegas
1 — Pipeline Overview
```mermaid
graph TD
    A["Scrape\nvinny scrape"] --> B["VegasEvent[]"]
    B --> C["Download Images\nvinny images download"]
    C --> D["Local Files\ndata/artists/{slug}/"]
    D --> E["Upload R2\nvinny images upload-r2"]
    E --> F["R2 Bucket\nimg.vinny.vegas"]
    F --> G["Export D1\nvinny export-d1"]
    G --> H["Cloudflare D1"]
    H --> I["vinny.vegas\nAstro SSR"]
    classDef scrape fill:#818cf811,stroke:#818cf844,stroke-width:2px
    classDef image fill:#34d39911,stroke:#34d39944,stroke-width:1.5px
    classDef upload fill:#fbbf2411,stroke:#fbbf2444,stroke-width:1.5px
    classDef export fill:#22d3ee11,stroke:#22d3ee44,stroke-width:1.5px
    classDef output fill:#fb718511,stroke:#fb718544,stroke-width:2px
    class A,B scrape
    class C,D image
    class E,F upload
    class G,H export
    class I output
```
Pipeline order is critical. Images must be downloaded and uploaded to R2 before D1 export, so that
artist_image_url is populated in the database. Run vinny sync to execute all steps in order, or vinny sync --run latest to reuse the last scrape data.
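The ordering constraint above can be sketched as a strictly sequential runner: each step starts only after the previous one succeeds, so R2 URLs exist before the D1 export is generated. This is an illustration of the invariant, not vinny's actual internals; `Step`, `runSync`, and the no-op `run` functions are hypothetical names.

```typescript
// Illustrative sketch of the sync ordering constraint. A failure in any
// step aborts all later steps, so export-d1 never runs before upload-r2.
type Step = { name: string; run: () => Promise<void> };

async function runSync(steps: Step[]): Promise<string[]> {
  const completed: string[] = [];
  for (const step of steps) {
    await step.run(); // a rejection here stops the pipeline
    completed.push(step.name);
  }
  return completed;
}

// Order matters: images must reach R2 before export-d1 reads their URLs.
const pipeline: Step[] = [
  { name: "scrape", run: async () => {} },
  { name: "images download", run: async () => {} },
  { name: "images upload-r2", run: async () => {} },
  { name: "export-d1", run: async () => {} },
];

runSync(pipeline).then((done) => console.log(done.join(" → ")));
// prints "scrape → images download → images upload-r2 → export-d1"
```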
2 — Step 1: Scrape
Crawl venue sitemaps and extract events
Crawlee crawls LIV, XS, EBC, and TAO Group sitemaps, routing each URL to the appropriate venue extractor.
Extractors
- LIVExtractor (livnightclub.com)
- XSExtractor (wynnsocial.com)
- EBCExtractor (wynnsocial.com)
- TaoGroupExtractor (taogroup.com)
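The hostname-to-extractor routing can be sketched as a simple lookup table. This is a hypothetical illustration (`EXTRACTOR_BY_HOST` and `pickExtractor` are invented names, not the actual Crawlee router code); since XS and EBC share wynnsocial.com, the real router would also need to inspect the URL path.

```typescript
// Hypothetical hostname-based routing, mirroring the extractor list above.
const EXTRACTOR_BY_HOST: Record<string, string> = {
  "livnightclub.com": "LIVExtractor",
  "wynnsocial.com": "XSExtractor", // XS and EBC share this domain; real
                                   // routing would also check the path
  "taogroup.com": "TaoGroupExtractor",
};

function pickExtractor(url: string): string | undefined {
  const host = new URL(url).hostname.replace(/^www\./, "");
  return EXTRACTOR_BY_HOST[host];
}

console.log(pickExtractor("https://www.livnightclub.com/events/deadmau5"));
// → "LIVExtractor"
```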
Output: VegasEvent
- Event date & time
- Performer name(s)
- Venue & venue_id
- Table pricing (if available)
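As a rough illustration, the fields listed above suggest a record shape along these lines; the actual VegasEvent interface may differ in names and types, so treat this as an assumption.

```typescript
// Assumed shape of a VegasEvent record, inferred from the field list above.
interface VegasEvent {
  date: string;          // event date & time (ISO 8601 assumed)
  performers: string[];  // one or more headliners
  venue: string;
  venue_id: string;
  table_pricing?: { tier: string; price: number }[]; // only if available
}

const sample: VegasEvent = {
  date: "2025-01-01T22:00:00-08:00",
  performers: ["deadmau5"],
  venue: "LIV",
  venue_id: "liv",
};
console.log(sample.performers.length); // → 1
```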
Sitemap Diffing
Each extractor maintains a SitemapIndex with lastmod timestamps. Only new or updated URLs are enqueued, avoiding redundant re-crawls. Use vinny scrape --full to ignore timestamps.

3 — Step 2: Images Download
Fetch artist images from VEA CDN to local disk
For each unique performer in the event batch, vinny images download fetches images from the Vegas Events API (VEA) CDN and stores them locally.
VEA CDN
Vegas Events API CDN hosts artist images at:
cdn.vegaseventsapi.com/images/{artist_slug}/{size}.jpg
Local Path Format
data/artists/{slug}/{slug}_{venue}_{size}.jpg — e.g. deadmau5_liv_main.jpg
Default Sizes
main (500px) and hd (1500px)
Dedup Key
(venue_tag, artist_slug, size)
Venue Tags
ebc, ebcn, xs, liv, livb — one per venue/facility
Validator dedup counts unique (venue_tag, artist_slug, size) tuples, not per-event: the same performer at multiple venues produces multiple files.
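The dedup rule can be sketched as set membership on the composite key. `ImageJob` and `dedupeImages` are illustrative names, not vinny's actual code.

```typescript
// Sketch of the validator's dedup rule: uniqueness is per
// (venue_tag, artist_slug, size) tuple, not per event.
type ImageJob = { venue_tag: string; artist_slug: string; size: string };

function dedupeImages(jobs: ImageJob[]): ImageJob[] {
  const seen = new Set<string>();
  return jobs.filter((j) => {
    const key = `${j.venue_tag}|${j.artist_slug}|${j.size}`;
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// A repeat booking at the same venue collapses to one file;
// a second venue yields a second file.
const jobs: ImageJob[] = [
  { venue_tag: "liv", artist_slug: "deadmau5", size: "main" },
  { venue_tag: "liv", artist_slug: "deadmau5", size: "main" }, // repeat event
  { venue_tag: "xs", artist_slug: "deadmau5", size: "main" },  // second venue
];
console.log(dedupeImages(jobs).length); // → 2
```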
4 — Step 3: R2 Upload
Upload local images to Cloudflare R2
Transfer all downloaded images from data/artists/ to the R2 bucket for CDN serving and D1 reference.
R2 Bucket
vinny-vegas-images (Cloudflare account)
Custom Domain
img.vinny.vegas — public CDN URL for image references
R2 Path
Same as local: artists/{slug}/{slug}_{venue}_{size}.jpg
Speed
Concurrent uploads via the Cloudflare Workers API or the wrangler CLI
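Because the R2 path mirrors the local layout, an object key and its public URL can be derived from the same three values. A minimal sketch, assuming the hypothetical helpers below (`r2Key` and `publicUrl` are invented names; in vinny the base URL would come from R2_PUBLIC_URL in .env):

```typescript
// Derive the R2 object key from the shared path convention.
function r2Key(slug: string, venue: string, size: string): string {
  return `artists/${slug}/${slug}_${venue}_${size}.jpg`;
}

// Base defaulted here for illustration; real code would read R2_PUBLIC_URL.
function publicUrl(
  slug: string,
  venue: string,
  size: string,
  base = "https://img.vinny.vegas",
): string {
  return `${base}/${r2Key(slug, venue, size)}`;
}

console.log(publicUrl("deadmau5", "liv", "main"));
// → "https://img.vinny.vegas/artists/deadmau5/deadmau5_liv_main.jpg"
```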
Migration Note
The old domain pub-a209680121414327917920199a3f8c63.r2.dev is deprecated. Use img.vinny.vegas (set R2_PUBLIC_URL in .env). Migration script: scripts/migrate-r2-domain.sql

5 — Step 4: D1 Export
Export events to Cloudflare D1 via REST API
Generate SQL from the event batch and send to the Cloudflare D1 database. Images must be on R2 first so artist_image_url is populated.
5 Core Tables
- events — main event records
- artists — performer profiles (name, slug, bio, image)
- event_artists — junction table (event_id, artist_id, sort_order)
- images — image metadata (size, r2_url, mime_type)
- table_tiers — pricing tiers (venue_id, tier, price)
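As an illustration of how the junction table links events to performers while preserving billing order, here is a hypothetical generator for event_artists rows (`junctionSql` is an invented name; vinny's real export code may differ):

```typescript
// Generate INSERT statements for the event_artists junction table.
// sort_order is the index in the billing order.
function junctionSql(eventId: number, artistIds: number[]): string[] {
  return artistIds.map(
    (artistId, i) =>
      `INSERT INTO event_artists (event_id, artist_id, sort_order) ` +
      `VALUES (${eventId}, ${artistId}, ${i});`,
  );
}

console.log(junctionSql(42, [7, 9])[1]);
// → "INSERT INTO event_artists (event_id, artist_id, sort_order) VALUES (42, 9, 1);"
```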
| Method | CLI Command | Behavior |
|---|---|---|
| REST API (preferred) | vinny export-d1 --execute | Direct HTTP to the Cloudflare D1 API; requires CLOUDFLARE_API_TOKEN + D1_DATABASE_ID |
| Wrangler CLI (fallback) | vinny export-d1 --use-wrangler | Uses wrangler d1 execute; slower but simpler auth |
| SQL File Only (dev/review) | vinny export-d1 | Generates export.sql without executing |
D1 Best Practices
- No ALTER TABLE — all columns must be in CREATE TABLE; wrangler sends entire SQL as one transaction
- Batching — large event sets are split into batches to avoid timeout
- --local dev testing — use vinny sync --local to write to the local wrangler D1 first
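The batching practice above can be sketched generically: split the statement list into fixed-size chunks and send each chunk as its own D1 request. `batch` and the chunk size of 100 are illustrative, not vinny's actual values.

```typescript
// Split a list of SQL statements into fixed-size batches so that no
// single D1 request is large enough to time out.
function batch<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

const statements = Array.from({ length: 250 }, (_, i) => `INSERT ... -- ${i}`);
console.log(batch(statements, 100).map((b) => b.length));
// three batches of sizes 100, 100, 50
```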
6 — CLI Reference
Run all steps with vinny sync, or execute individual steps.
| Command | Steps | Flags |
|---|---|---|
| vinny sync | 1, 2, 3, 4 (all) | --run latest — skip scrape, reuse last data; --local — write to local wrangler D1 (dev) |
| vinny scrape | 1 only | --full — ignore sitemap timestamps, recrawl all; --venues VENUE [...] — scrape specific venues |
| vinny images download | 2 only | --force — re-download existing files |
| vinny images upload-r2 | 3 only | --validate — verify files exist before upload |
| vinny export-d1 --execute | 4 only (REST API) | --use-wrangler — use wrangler CLI instead |
Full sync example:
$ vinny sync --venues liv xs hakkasan

Reuse last scrape (faster):
$ vinny sync --run latest

Dev: local D1 only:
$ vinny sync --local