Artist Enrichment Pipeline
Tracklists → Resident Advisor → Spotify
1 — Pipeline Flow
graph TD
A["Load master_events.json"] --> B["EnrichmentRegistry"]
B --> C["TracklistsEnricher\n1001tracklists.com"]
C --> D["ResidentAdvisorEnricher\nra.co GraphQL"]
D --> E["SpotifyEnricher\nSpotify Web API"]
E --> F["Enriched Event\n+ EnrichmentStatus"]
F --> G["Save master_events.json"]
classDef registry fill:#60a5fa11,stroke:#60a5fa44,stroke-width:1.5px
classDef enricher1 fill:#34d39911,stroke:#34d39944,stroke-width:1.5px
classDef enricher2 fill:#fbbf2411,stroke:#fbbf2444,stroke-width:1.5px
classDef enricher3 fill:#a78bfa11,stroke:#a78bfa44,stroke-width:1.5px
classDef output fill:#fb718511,stroke:#fb718544,stroke-width:2px
class B registry
class C enricher1
class D enricher2
class E enricher3
class F,G output
Order matters. Enrichers run sequentially: Tracklists finds RA URLs (used by next step), then RA enriches bio data (used by Spotify), then Spotify adds top tracks. Partial enrichment is valid — errors in one step don't prevent the next.
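The ordering constraint above can be sketched as a list of steps applied in sequence, each receiving the output of the one before it. This is a minimal illustration only: the step functions, the dict-based event, and the placeholder RA URL are all hypothetical stand-ins, not the actual enricher classes.

```python
from typing import Callable, Dict, List, Optional

Event = Dict[str, object]

def tracklists_step(event: Event) -> Event:
    # Placeholder result: a real step would scrape 1001tracklists for the RA URL.
    return {**event, "ra_url": "https://ra.co/dj/carlcox"}

def resident_advisor_step(event: Event) -> Event:
    # Consumes the RA URL produced by the previous step, if there is one.
    return {**event, "ra_source": event.get("ra_url", "slug fallback")}

def spotify_step(event: Event) -> Event:
    # Runs last, so it can see everything the earlier steps added.
    return {**event, "spotify_saw_ra": "ra_source" in event}

PIPELINE: List[Callable[[Event], Event]] = [
    tracklists_step,
    resident_advisor_step,
    spotify_step,
]

def run_pipeline(event: Event, steps: Optional[List[Callable[[Event], Event]]] = None) -> Event:
    # Apply each step to the result of the previous one.
    for step in (PIPELINE if steps is None else steps):
        event = step(event)
    return event

in_order = run_pipeline({"artist": "Carl Cox"})
reordered = run_pipeline(
    {"artist": "Carl Cox"},
    [resident_advisor_step, tracklists_step, spotify_step],
)
print(in_order["ra_source"])
print(reordered["ra_source"])
```

In this sketch, running the RA step before the Tracklists step means the RA step never sees the URL and drops to the slug fallback, which is the dependency the fixed order protects.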
2 — The Three Enrichers
TracklistsEnricher (Step 1)
1001tracklists.com
- HTML scraping via requests
- Searches for artist by name
- Extracts RA URL from bio
- Fallback: slug generation for step 2
- Saves artist.json
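The "extract RA URL, else fall back to a slug" behaviour listed above might look roughly like the following. It assumes RA artist pages follow the shape ra.co/dj/<slug>, and the function names are hypothetical; the real scraper may parse the bio differently.

```python
import re
from typing import Optional

# Assumed URL shape for RA artist pages: https://ra.co/dj/<slug>
RA_URL_RE = re.compile(r"https?://(?:www\.)?ra\.co/dj/([A-Za-z0-9_-]+)")

def extract_ra_url(bio_html: str) -> Optional[str]:
    """Return the first RA artist URL found in the bio markup, or None."""
    match = RA_URL_RE.search(bio_html)
    return match.group(0) if match else None

def ra_slug_for(name: str, bio_html: str) -> str:
    """Prefer the slug from a scraped RA URL; otherwise derive one from the name."""
    match = RA_URL_RE.search(bio_html)
    if match:
        return match.group(1)
    # Fallback: lowercase the name and keep only alphanumeric characters.
    return "".join(ch for ch in name.lower() if ch.isalnum())

bio = '<p>Bookings: <a href="https://ra.co/dj/carlcox">RA profile</a></p>'
found_url = extract_ra_url(bio)
missing_url = extract_ra_url("<p>No links here</p>")
fallback = ra_slug_for("Carl Cox", "<p>No links here</p>")
```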
ResidentAdvisorEnricher (Step 2)
ra.co GraphQL
- GraphQL query to ra.co
- Input: RA URL (from step 1)
- Or slug fallback if no URL
- Outputs: bio, genres, links
- Coverage: ~29/66 performers
SpotifyEnricher (Step 3)
Spotify Web API
- OAuth: Client Credentials
- Artist search by name
- Extracts: genres, popularity
- Top 10 tracks with audio features
- Saves artist.json, top-tracks.json
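For reference, the Client Credentials flow named above amounts to one form-encoded POST to Spotify's token endpoint with the client ID and secret in a Basic auth header. The sketch below only builds that request and does not send it; the helper name and the returned dict layout are illustrative, not the enricher's actual code.

```python
import base64
from typing import Dict

TOKEN_URL = "https://accounts.spotify.com/api/token"

def build_token_request(client_id: str, client_secret: str) -> Dict[str, object]:
    """Assemble the pieces of a Client Credentials token request."""
    # Basic auth: base64("client_id:client_secret")
    credentials = f"{client_id}:{client_secret}".encode("utf-8")
    basic = base64.b64encode(credentials).decode("ascii")
    return {
        "url": TOKEN_URL,
        "headers": {
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        "data": {"grant_type": "client_credentials"},
    }

# Placeholder credentials, for illustration only.
request_parts = build_token_request("my-client-id", "my-client-secret")
```

The parts can be passed to an HTTP client, for example requests.post(parts["url"], headers=parts["headers"], data=parts["data"]); the JSON response carries an access_token to send as a Bearer token on later API calls.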
3 — Key Patterns
Immutability
Always use event.model_copy(update={...}); never assign to Pydantic fields directly. Each enricher returns a new event copy with updated fields.
EnrichmentStatus Tracking
Each event carries enrichment_status with boolean flags (spotify, tracklists, resident_advisor) and an errors dict. It tracks partial completion and failure reasons.
Dependency Chain
Tracklists → RA → Spotify is mandatory. Tracklists finds RA URL used by step 2; RA enriches bio used for context in step 3. Breaking the order breaks dependencies.
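The immutability and status-tracking patterns described above can be illustrated with a stdlib-only sketch. It uses frozen dataclasses and dataclasses.replace as a stand-in for Pydantic models; with Pydantic the equivalent call would be event.model_copy(update={...}). The class layout and the helper names mark_done and mark_failed are assumptions made for the example.

```python
from dataclasses import dataclass, field, replace
from typing import Dict

@dataclass(frozen=True)
class EnrichmentStatus:
    tracklists: bool = False
    resident_advisor: bool = False
    spotify: bool = False
    errors: Dict[str, str] = field(default_factory=dict)

@dataclass(frozen=True)
class Event:
    artist: str
    enrichment_status: EnrichmentStatus = field(default_factory=EnrichmentStatus)

def mark_done(event: Event, source: str) -> Event:
    # Build a new status and a new event; the inputs are left untouched.
    status = replace(event.enrichment_status, **{source: True})
    return replace(event, enrichment_status=status)

def mark_failed(event: Event, source: str, reason: str) -> Event:
    # Copy the errors dict rather than mutating the one on the old event.
    errors = {**event.enrichment_status.errors, source: reason}
    status = replace(event.enrichment_status, errors=errors)
    return replace(event, enrichment_status=status)

original = Event(artist="Carl Cox")
updated = mark_failed(mark_done(original, "tracklists"), "resident_advisor", "404")
```

After these calls, original still has every flag unset, while updated records one completed step and one failure reason.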
Error Resilience
Each enricher gracefully handles missing data (404, no results, API failures). Partial enrichment is valid — one enricher's failure doesn't block the next step.
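One way to get this behaviour is a wrapper that converts an enricher's failure into a recorded error and hands the unchanged event to the next step. The sketch below uses plain dicts and hypothetical enricher functions, and the popularity value is a placeholder; the real enrichers may handle errors internally instead.

```python
from typing import Callable, Dict

Event = Dict[str, object]

def safe_enrich(event: Event, name: str, enricher: Callable[[Event], Event]) -> Event:
    """Run one enricher; on failure, record the reason and keep the event as it was."""
    try:
        result = enricher(event)
    except Exception as exc:
        errors = {**event.get("errors", {}), name: str(exc)}
        return {**event, "errors": errors}
    flags = {**result.get("flags", {}), name: True}
    return {**result, "flags": flags}

def failing_ra(event: Event) -> Event:
    raise LookupError("artist not found")

def working_spotify(event: Event) -> Event:
    return {**event, "popularity": 70}  # placeholder value

event: Event = {"artist": "Carl Cox"}
for step_name, step in [("resident_advisor", failing_ra), ("spotify", working_spotify)]:
    event = safe_enrich(event, step_name, step)
```

Here the RA failure ends up in event["errors"] and the Spotify step still runs and sets its flag.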
Rate Limiting
0.5s delay between requests in all enrichers. Prevents throttling by 1001tracklists, RA, and Spotify. Configurable via ENRICHMENT_DELAY_SEC.
4 — CLI Reference
Full Enrichment Pipeline
$ vinny enrich artists # Runs all three enrichers on all events
Spotify Only
$ vinny enrich artists --spotify-only # Skip Tracklists & RA; fetch Spotify metadata directly
Single Artist
$ vinny enrich artists --artist "Carl Cox" # Enrich only "Carl Cox" in the database
Dry Run (Preview)
$ vinny enrich artists --dry-run # Log what would be enriched without saving
Force Re-enrichment
$ vinny enrich artists --force # Re-enrich all artists, even if enrichment_status already set
Status flags prevent re-runs. By default, enrichers skip artists with enrichment_status.tracklists = true. Use --force to override and re-fetch from all sources.
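The skip rule can be expressed as a small predicate. This is a sketch assuming the status is available as a dict of flags; the function name is hypothetical.

```python
from typing import Dict

def should_enrich(status: Dict[str, bool], force: bool = False) -> bool:
    """Decide whether an artist needs (re-)enrichment."""
    if force:
        # --force overrides any existing status flags.
        return True
    # Default: skip artists whose tracklists flag is already set.
    return not status.get("tracklists", False)

fresh = should_enrich({})
already_done = should_enrich({"tracklists": True})
forced = should_enrich({"tracklists": True}, force=True)
```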
5 — Storage
Directory Structure
data/artists/{artist_slug}/
├── artist.json # Spotify metadata (name, genres, popularity) or 1001tracklists artist card
├── top-tracks.json # Spotify top 10 tracks + audio features
└── ra.json # RA GraphQL response (bio, genres, links)
File Lifecycle
- artist.json: Created by TracklistsEnricher or SpotifyEnricher. Contains Spotify artist metadata or 1001tracklists artist card.
- ra.json: Created by ResidentAdvisorEnricher. Raw GraphQL response; used to populate the event's artist.resident_advisor object.
- top-tracks.json: Created by SpotifyEnricher. Array of top 10 tracks with name, artists, audio_features.
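Writing one of these files into the per-artist directory could be done along these lines. The helper name is hypothetical and the payload is a placeholder; the example writes into a temporary directory standing in for data/.

```python
import json
import tempfile
from pathlib import Path

def save_artist_file(data_root: Path, artist_slug: str, filename: str, payload: object) -> Path:
    """Write payload as JSON to <data_root>/artists/<artist_slug>/<filename>."""
    target = data_root / "artists" / artist_slug / filename
    # Create the artist directory on first write.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return target

with tempfile.TemporaryDirectory() as tmp:
    written = save_artist_file(Path(tmp), "carlcox", "ra.json", {"bio": "placeholder"})
    relative = written.relative_to(tmp).as_posix()
    loaded = json.loads(written.read_text(encoding="utf-8"))
```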
Slug Generation (Fallback)
def fallback_slug(name: str) -> str:
    # Strip non-alphanumeric characters and lowercase: "Carl Cox" → "carlcox"
    # Used if TracklistsEnricher finds no RA URL
    return "".join(ch for ch in name.lower() if ch.isalnum())
Dedup key: (artist_slug, venue_tag). The same artist at different venues produces separate enrichment records. This allows per-venue enrichment context while avoiding duplicate work for the same artist name.
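Deduplicating on that key can be sketched with a set of tuples. The record layout and the venue tags below are made up for illustration.

```python
from typing import Dict, List, Set, Tuple

def dedupe(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first record for each (artist_slug, venue_tag) pair."""
    seen: Set[Tuple[str, str]] = set()
    unique: List[Dict[str, str]] = []
    for record in records:
        key = (record["artist_slug"], record["venue_tag"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

records = [
    {"artist_slug": "carlcox", "venue_tag": "venue-a"},
    {"artist_slug": "carlcox", "venue_tag": "venue-b"},  # same artist, different venue: kept
    {"artist_slug": "carlcox", "venue_tag": "venue-a"},  # exact repeat: dropped
]
unique_records = dedupe(records)
```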