Artist Enrichment Pipeline

Tracklists → Resident Advisor → Spotify

1 — Pipeline Flow
graph TD
  A["Load master_events.json"] --> B["EnrichmentRegistry"]
  B --> C["TracklistsEnricher\n1001tracklists.com"]
  C --> D["ResidentAdvisorEnricher\nra.co GraphQL"]
  D --> E["SpotifyEnricher\nSpotify Web API"]
  E --> F["Enriched Event\n+ EnrichmentStatus"]
  F --> G["Save master_events.json"]

  classDef registry fill:#60a5fa11,stroke:#60a5fa44,stroke-width:1.5px
  classDef enricher1 fill:#34d39911,stroke:#34d39944,stroke-width:1.5px
  classDef enricher2 fill:#fbbf2411,stroke:#fbbf2444,stroke-width:1.5px
  classDef enricher3 fill:#a78bfa11,stroke:#a78bfa44,stroke-width:1.5px
  classDef output fill:#fb718511,stroke:#fb718544,stroke-width:2px

  class B registry
  class C enricher1
  class D enricher2
  class E enricher3
  class F,G output
      
Order matters. The enrichers run sequentially: Tracklists finds the RA URL that step 2 needs, RA adds bio data that step 3 uses for context, and Spotify adds top tracks. Partial enrichment is valid: an error in one step does not prevent the next step from running.
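The sequential, failure-tolerant flow described above can be sketched roughly as follows. This is a minimal stdlib illustration, not the project's EnrichmentRegistry: the Event dataclass, the run_pipeline helper, and the three stub enrichers are hypothetical names, and a frozen dataclass stands in for the real Pydantic event model.

```python
from dataclasses import dataclass, field, replace
from typing import Callable

@dataclass(frozen=True)
class Event:
    # Hypothetical stand-in for the real event model.
    artist: str
    data: dict = field(default_factory=dict)    # enrichment results
    errors: dict = field(default_factory=dict)  # failure reason per step

# Assumed enricher shape: takes an event, returns a new event.
Enricher = Callable[[Event], Event]

def run_pipeline(event: Event, steps: list[tuple[str, Enricher]]) -> Event:
    """Run the steps in order; record a failure and continue with the next step."""
    for name, enrich in steps:
        try:
            event = enrich(event)
        except Exception as exc:  # broad on purpose: one failure must not stop the chain
            event = replace(event, errors={**event.errors, name: str(exc)})
    return event

# Stub enrichers for illustration only.
def tracklists(event: Event) -> Event:
    return replace(event, data={**event.data, "ra_url": "https://ra.co/dj/example"})

def resident_advisor(event: Event) -> Event:
    raise LookupError("artist not found")

def spotify(event: Event) -> Event:
    return replace(event, data={**event.data, "popularity": 50})

result = run_pipeline(
    Event("Example Artist"),
    [("tracklists", tracklists), ("resident_advisor", resident_advisor), ("spotify", spotify)],
)
```

In this sketch the RA step fails, its reason is stored under errors, and the Spotify step still runs.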
2 — The Three Enrichers
TracklistsEnricher (Step 1)
1001tracklists.com
  • HTML scraping via requests
  • Searches for artist by name
  • Extracts RA URL from bio
  • Fallback: slug generation for step 2
  • Saves artist.json
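A rough sketch of the "extract RA URL from bio" step, assuming the bio HTML contains a plain link of the form https://ra.co/dj/<slug>. The URL shape, the regex, and the find_ra_url name are assumptions for illustration; the real enricher may parse the page differently.

```python
import re
from typing import Optional

# Assumption: RA artist pages look like https://ra.co/dj/<slug>.
RA_URL_RE = re.compile(r"https?://(?:www\.)?ra\.co/dj/[A-Za-z0-9_-]+")

def find_ra_url(bio_html: str) -> Optional[str]:
    """Return the first RA artist URL found in the bio, or None if there is none."""
    match = RA_URL_RE.search(bio_html)
    return match.group(0) if match else None
```

When this returns None, step 2 would fall back to a generated slug.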
ResidentAdvisorEnricher (Step 2)
ra.co GraphQL
  • GraphQL query to ra.co
  • Input: RA URL (from step 1)
  • Or slug fallback if no URL
  • Outputs: bio, genres, links
  • Coverage: ~29/66 performers
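How step 2 might choose its lookup key can be sketched as below: take the slug from the RA URL when step 1 found one, otherwise generate it from the artist name. resolve_ra_slug is a hypothetical helper, the https://ra.co/dj/<slug> URL shape is an assumption, and fallback_slug here is one possible reading of the fallback rule described under Storage.

```python
import re
from typing import Optional
from urllib.parse import urlparse

def fallback_slug(name: str) -> str:
    # Lowercase, then drop everything except ASCII letters and digits.
    return re.sub(r"[^a-z0-9]", "", name.lower())

def resolve_ra_slug(ra_url: Optional[str], artist_name: str) -> str:
    """Prefer the last path segment of the RA URL; fall back to a generated slug."""
    if ra_url:
        segments = [s for s in urlparse(ra_url).path.split("/") if s]
        if segments:
            return segments[-1]
    return fallback_slug(artist_name)
```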
SpotifyEnricher (Step 3)
Spotify Web API
  • OAuth: Client Credentials
  • Artist search by name
  • Extracts: genres, popularity
  • Top 10 tracks with audio features
  • Saves artist.json, top-tracks.json
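For reference, the Client Credentials token request looks roughly like this. The sketch only builds the request with the standard library and does not send it; build_token_request and the placeholder credentials are illustrative, and the actual enricher may well use a different HTTP client.

```python
import base64
import urllib.parse
import urllib.request

TOKEN_URL = "https://accounts.spotify.com/api/token"

def build_token_request(client_id: str, client_secret: str) -> urllib.request.Request:
    """Build the POST request that exchanges app credentials for an access token."""
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    headers = {
        # HTTP Basic auth: base64("client_id:client_secret")
        "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
        "Content-Type": "application/x-www-form-urlencoded",
    }
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode("ascii")
    return urllib.request.Request(TOKEN_URL, data=body, headers=headers, method="POST")

# Placeholder credentials, for illustration only.
request = build_token_request("my-client-id", "my-client-secret")
```

Sending the request (for example with urllib.request.urlopen) returns a JSON body whose access_token is then passed as a Bearer token on Web API calls.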
3 — Key Patterns
Immutability
Always use event.model_copy(update={...}) — never assign to Pydantic fields directly. Each enricher returns a new event copy with updated fields.
EnrichmentStatus Tracking
Each event carries an enrichment_status with boolean flags (spotify, tracklists, resident_advisor) and an errors dict. Together they track partial completion and failure reasons.
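The two patterns above (copy-on-update and status tracking) can be illustrated together. This sketch uses a frozen stdlib dataclass as a stand-in for the Pydantic model, so dataclasses.replace plays the role of model_copy(update={...}); the field layout follows the flags named above, while mark_done and mark_failed are hypothetical helpers.

```python
from dataclasses import dataclass, field, replace

@dataclass(frozen=True)
class EnrichmentStatus:
    # One flag per source, plus a failure reason per source.
    tracklists: bool = False
    resident_advisor: bool = False
    spotify: bool = False
    errors: dict = field(default_factory=dict)

def mark_done(status: EnrichmentStatus, source: str) -> EnrichmentStatus:
    """Return a new status with the given source flag set to True."""
    return replace(status, **{source: True})

def mark_failed(status: EnrichmentStatus, source: str, reason: str) -> EnrichmentStatus:
    """Return a new status with a failure reason recorded for the source."""
    return replace(status, errors={**status.errors, source: reason})

initial = EnrichmentStatus()
status = mark_done(initial, "tracklists")
status = mark_failed(status, "resident_advisor", "404 from ra.co")
status = mark_done(status, "spotify")
```

The initial object is never mutated; each call produces a fresh copy, which mirrors what model_copy(update={...}) does on a Pydantic model.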
Dependency Chain
The order Tracklists → RA → Spotify is mandatory. Tracklists finds the RA URL that step 2 consumes, and RA adds the bio that step 3 uses for context. Running the enrichers in any other order breaks these dependencies.
Error Resilience
Each enricher gracefully handles missing data (404, no results, API failures). Partial enrichment is valid — one enricher's failure doesn't block the next step.
Rate Limiting
Every enricher waits 0.5 s between requests to avoid throttling by 1001tracklists, RA, and Spotify. The delay is configurable via ENRICHMENT_DELAY_SEC.
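A minimal sketch of such a delay, assuming ENRICHMENT_DELAY_SEC is read from the process environment (the source does not say where the setting lives). Throttle and delay_seconds are hypothetical names; the clock and sleep functions are injectable so the behaviour can be checked without real waiting.

```python
import os
import time

def delay_seconds(default: float = 0.5) -> float:
    # Assumption: the setting is an environment variable holding a number of seconds.
    return float(os.environ.get("ENRICHMENT_DELAY_SEC", default))

class Throttle:
    """Ensure at least `delay` seconds pass between consecutive wait() calls."""

    def __init__(self, delay: float, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self.delay - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

An enricher would call throttle.wait() immediately before each outgoing request. The first call never sleeps, and later calls sleep only for the part of the delay that has not already elapsed.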
4 — CLI Reference

Full Enrichment Pipeline

$ vinny enrich artists
# Runs all three enrichers on all events

Spotify Only

$ vinny enrich artists --spotify-only
# Skip Tracklists & RA; fetch Spotify metadata directly

Single Artist

$ vinny enrich artists --artist "Carl Cox"
# Enrich only "Carl Cox" in the database

Dry Run (Preview)

$ vinny enrich artists --dry-run
# Log what would be enriched without saving

Force Re-enrichment

$ vinny enrich artists --force
# Re-enrich all artists, even if enrichment_status already set
Status flags prevent re-runs. By default, enrichers skip artists with enrichment_status.tracklists = true. Use --force to override and re-fetch from all sources.
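The skip rule can be read as the small predicate below. It follows the text literally and checks only the tracklists flag; whether the real enrichers also consult the other flags is not stated here. should_skip and the sample data are hypothetical.

```python
def should_skip(enrichment_status: dict, force: bool = False) -> bool:
    """Skip an artist whose tracklists flag is already set, unless forced."""
    if force:
        return False
    return bool(enrichment_status.get("tracklists", False))

# Illustrative data: one artist already enriched, one not.
artists = {
    "Artist A": {"tracklists": True},
    "Artist B": {"tracklists": False},
}
default_run = [name for name, st in artists.items() if not should_skip(st)]
forced_run = [name for name, st in artists.items() if not should_skip(st, force=True)]
```

Here a default run would process only "Artist B", while a forced run would process both.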
5 — Storage

Directory Structure

data/artists/{artist_slug}/
├── artist.json          # Spotify metadata (name, genres, popularity) or 1001tracklists artist card
├── top-tracks.json      # Spotify top 10 tracks + audio features
└── ra.json              # RA GraphQL response (bio, genres, links)

File Lifecycle

  • artist.json: Created by TracklistsEnricher or SpotifyEnricher. Contains Spotify artist metadata or 1001tracklists artist card.
  • ra.json: Created by ResidentAdvisorEnricher. Raw GraphQL response; used to populate event's artist.resident_advisor object.
  • top-tracks.json: Created by SpotifyEnricher. Array of top 10 tracks with name, artists, audio_features.
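Writing one of these files might look like the sketch below, assuming plain JSON under <root>/artists/{artist_slug}/. save_artist_file is a hypothetical helper and the payload is made up; a temporary directory stands in for data/.

```python
import json
import tempfile
from pathlib import Path

def save_artist_file(root: Path, artist_slug: str, filename: str, payload) -> Path:
    """Write payload as JSON to <root>/artists/<artist_slug>/<filename>."""
    artist_dir = root / "artists" / artist_slug
    artist_dir.mkdir(parents=True, exist_ok=True)  # create the artist folder on first write
    path = artist_dir / filename
    path.write_text(json.dumps(payload, indent=2, ensure_ascii=False), encoding="utf-8")
    return path

# Example: store an RA payload, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = save_artist_file(Path(tmp), "carlcox", "ra.json", {"bio": "example bio", "genres": []})
    loaded = json.loads(path.read_text(encoding="utf-8"))
```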

Slug Generation (Fallback)

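import re

def fallback_slug(name: str) -> str:
  # Strip non-alphanumeric characters (ASCII letters and digits are kept) and lowercase
  # "Carl Cox" → "carlcox"
  # Used if TracklistsEnricher finds no RA URL
  return re.sub(r"[^a-z0-9]", "", name.lower())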
Dedup key: (artist_slug, venue_tag). The same artist at different venues produces separate enrichment records, which allows per-venue enrichment context while avoiding duplicate work for the same artist at the same venue.
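The dedup rule amounts to keeping the first occurrence of each (artist_slug, venue_tag) pair. A minimal sketch, with dedupe as a hypothetical helper and made-up venue tags:

```python
def dedupe(performances):
    """Keep the first occurrence of each (artist_slug, venue_tag) pair, in order."""
    seen = set()
    unique = []
    for artist_slug, venue_tag in performances:
        key = (artist_slug, venue_tag)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique

queue = dedupe([
    ("carlcox", "venue-a"),
    ("carlcox", "venue-a"),  # duplicate pair: dropped
    ("carlcox", "venue-b"),  # same artist, different venue: kept
])
```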
Vinny Scraper — Artist Enrichment Pipeline · Generated 2026-03-06 · src/plugins/enrichment/