music-video-creator-showcase

Architecture

Music Video Creator is organized as a modular pipeline with five stages: input, analysis, scene detection, visualization, and rendering.

Pipeline Flow

┌─────────────────────────────────────────────────────────────────────┐
│                         INPUT STAGE                                  │
├─────────────────────────────────────────────────────────────────────┤
│  WAV File → AudioLoader → Mono Conversion → Normalization           │
└────────────────────────────────┬────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────┐
│                       ANALYSIS STAGE                                 │
├─────────────────────────────────────────────────────────────────────┤
│  AudioAnalyzer                                                       │
│  ├── STFT (window=2048, hop=512)                                    │
│  ├── Feature Extraction per frame:                                   │
│  │   ├── RMS energy (loudness)                                      │
│  │   ├── Spectral centroid (brightness)                             │
│  │   ├── Bass/Mid/High energy (frequency bands)                     │
│  │   └── Onset strength (transients)                                │
│  │                                                                   │
│  └── BeatDetector                                                    │
│      ├── Spectral flux onset envelope                               │
│      ├── Peak picking with threshold                                │
│      └── Tempo estimation (autocorrelation + beat intervals)        │
└────────────────────────────────┬────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────┐
│                        SCENE STAGE                                   │
├─────────────────────────────────────────────────────────────────────┤
│  SceneDetector                                                       │
│  ├── Energy profiling (rolling mean/std)                            │
│  ├── Scene boundary detection                                        │
│  └── Classification: intro, verse, build, drop, breakdown, outro    │
│                                                                      │
│  SceneTimeline                                                       │
│  ├── Scene progression management                                    │
│  └── Style transitions per scene                                     │
└────────────────────────────────┬────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────┐
│                    VISUALIZATION STAGE                               │
├─────────────────────────────────────────────────────────────────────┤
│  MatplotlibVisualizer (9 styles)                                    │
│  └── CPU-based, detailed, slower                                    │
│                                                                      │
│  ModernGLVisualizer (10 styles)                                     │
│  ├── GPU-accelerated, ~100x faster than Matplotlib                  │
│  ├── GLSL fragment shaders                                          │
│  └── Uniforms: u_bass, u_mid, u_high, u_time, u_is_beat, u_spectrum │
│                                                                      │
│  EffectsProcessor                                                    │
│  └── Beat-triggered: zoom, flash, shake                             │
└────────────────────────────────┬────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────────┐
│                       RENDERING STAGE                                │
├─────────────────────────────────────────────────────────────────────┤
│  PipeRenderer (recommended)                                          │
│  ├── FFmpeg subprocess with stdin pipe                              │
│  ├── Memory-efficient (frames streamed, not stored)                 │
│  ├── 1440p upscaling option (forces YouTube VP9)                    │
│  └── Subtitle burn-in (ASS/SRT)                                     │
│                                                                      │
│  GifRenderer                                                         │
│  ├── Two-pass FFmpeg (palettegen + paletteuse)                      │
│  └── Platform presets (Discord, Twitter, etc.)                      │
└────────────────────────────────┬────────────────────────────────────┘
                                 │
                                 ▼
                          MP4 / GIF Output
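
As a rough illustration of the analysis stage above, the sketch below computes per-frame features from an STFT and picks beats from a spectral-flux envelope. It is not the project's AudioAnalyzer or BeatDetector: the function names, band edges, threshold factor `k`, and minimum beat gap are assumptions chosen for the example, and tempo is estimated from beat intervals only (no autocorrelation).

```python
import numpy as np
from scipy.signal import stft, find_peaks


def analyze(samples, sr, window=2048, hop=512):
    """Per-frame features for a mono float signal (hypothetical helper)."""
    freqs, times, spec = stft(samples, fs=sr, nperseg=window,
                              noverlap=window - hop)
    mag = np.abs(spec)                  # shape: (freq_bins, frames)
    power = mag ** 2

    def band(lo, hi):
        # Summed power in [lo, hi) Hz; the band edges below are assumptions.
        mask = (freqs >= lo) & (freqs < hi)
        return power[mask].sum(axis=0)

    # Spectral flux: summed positive magnitude change between frames.
    flux = np.maximum(np.diff(mag, axis=1), 0.0).sum(axis=0)

    return {
        "times": times,
        "rms": np.sqrt(power.mean(axis=0)),       # spectral loudness proxy
        "centroid": (freqs[:, None] * mag).sum(axis=0)
                    / (mag.sum(axis=0) + 1e-12),  # brightness, in Hz
        "bass": band(20, 250),
        "mid": band(250, 4000),
        "high": band(4000, sr / 2 + 1),
        "onset": np.concatenate(([0.0], flux)),   # pad to one value per frame
    }


def pick_beats(onset, times, k=1.5, min_gap_s=0.25):
    """Peaks above mean + k*std, at least min_gap_s apart (assumed rule)."""
    threshold = onset.mean() + k * onset.std()
    distance = max(1, int(round(min_gap_s / (times[1] - times[0]))))
    peaks, _ = find_peaks(onset, height=threshold, distance=distance)
    return times[peaks]


def estimate_bpm(beat_times):
    """Tempo from the median beat interval; None if fewer than two beats."""
    if len(beat_times) < 2:
        return None
    return 60.0 / float(np.median(np.diff(beat_times)))


def click_track(sr, seconds, every_s, burst_s=0.05, freq=1000.0):
    """Synthetic test signal: short sine bursts every `every_s` seconds."""
    t = np.arange(int(sr * seconds)) / sr
    gate = ((t % every_s) < burst_s) & (t >= every_s)
    return np.sin(2 * np.pi * freq * t) * gate
```

On a click track with bursts every 0.5 s, `estimate_bpm(pick_beats(...))` should land close to 120. Accuracy is limited by the hop resolution: 512 samples is about 23 ms at 22,050 Hz.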

Module Structure

src/
├── audio/
│   ├── loader.py        # WAV loading, normalization
│   └── analyzer.py      # STFT, features, BeatDetector
│
├── scenes/
│   ├── detector.py      # SceneDetector
│   ├── timeline.py      # SceneTimeline
│   └── curves.py        # UniformCurve (feature→shader mapping)
│
├── visuals/
│   ├── matplotlib_viz.py  # 9 Matplotlib styles
│   ├── moderngl_viz.py    # 10 GPU styles
│   ├── effects.py         # Zoom, flash, shake
│   ├── hud.py             # Title, BPM, progress overlays
│   └── shaders/           # External GLSL fragment shaders
│       ├── tunnel_3d.frag
│       ├── sphere_3d.frag
│       └── terrain_3d.frag
│
├── render/
│   └── renderer.py      # PipeRenderer, GifRenderer, VideoRenderer
│
├── lyrics/
│   └── transcriber.py   # Whisper integration, ASS generation
│
├── config/
│   ├── schema.py        # Config dataclasses
│   └── loader.py        # YAML loading, validation
│
└── main.py              # CLI entry point
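
The PipeRenderer idea (frames streamed to FFmpeg over stdin rather than stored) can be sketched as follows. This is an illustration, not the code in render/renderer.py: the function names, the rgb24 frame format, and the libx264/yuv420p/aac output settings are assumptions.

```python
import subprocess


def build_ffmpeg_cmd(out_path, width, height, fps, audio_path=None):
    """Build an FFmpeg command that reads raw RGB frames from stdin."""
    cmd = [
        "ffmpeg", "-y",
        "-f", "rawvideo", "-pix_fmt", "rgb24",
        "-s", f"{width}x{height}", "-r", str(fps),
        "-i", "-",                      # video input: raw frames on stdin
    ]
    if audio_path is not None:
        cmd += ["-i", audio_path]       # audio input: the original track
    cmd += ["-c:v", "libx264", "-pix_fmt", "yuv420p"]
    if audio_path is not None:
        cmd += ["-c:a", "aac", "-shortest"]
    cmd.append(out_path)
    return cmd


def render(frames, out_path, width, height, fps, audio_path=None):
    """Stream frames (bytes, width*height*3 each) to FFmpeg one at a time."""
    frame_size = width * height * 3
    proc = subprocess.Popen(
        build_ffmpeg_cmd(out_path, width, height, fps, audio_path),
        stdin=subprocess.PIPE,
    )
    try:
        for frame in frames:            # any iterable or generator
            if len(frame) != frame_size:
                raise ValueError("unexpected frame size")
            proc.stdin.write(frame)
    finally:
        proc.stdin.close()              # signals end of stream to FFmpeg
        returncode = proc.wait()
    if returncode != 0:
        raise RuntimeError(f"ffmpeg exited with code {returncode}")


def solid_frames(width, height, count, rgb=(0, 0, 0)):
    """Generator of solid-colour test frames; only one frame is in memory."""
    frame = bytes(rgb) * (width * height)
    for _ in range(count):
        yield frame
```

With `ffmpeg` on the PATH, `render(solid_frames(1280, 720, 90), "black.mp4", 1280, 720, 30)` would write a three-second black clip. Because `frames` is consumed lazily, memory use stays at roughly one frame regardless of video length.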

Key Design Decisions

Decision	Choice	Rationale
Audio analysis	scipy.signal.stft	Standard, well-documented, no heavy dependencies
GPU rendering	ModernGL	Pythonic OpenGL bindings, easy shader development
Video encoding	FFmpeg pipe	Memory-efficient, professional quality
Configuration	YAML + dataclasses	Type-safe, mergeable, preset-friendly
Beat detection	Spectral flux + peaks	Works well for electronic and rhythmic music
Scene detection	Energy profiling	Genre-agnostic, no ML dependencies
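
Energy profiling for scene boundaries can be illustrated with a rolling mean over per-frame RMS and thresholds at the global mean plus or minus a fraction of the standard deviation. The sketch below is an example under stated assumptions, not the project's SceneDetector: the function names, the window, `k`, and `min_len` are hypothetical, and segments are only labelled as low (-1), mid (0), or high (1) energy. Mapping those levels to intro, verse, build, drop, breakdown, and outro is left out.

```python
import numpy as np


def rolling_mean(x, win):
    """Centered moving average with edge padding (output length == input)."""
    left = win // 2
    padded = np.pad(x, (left, win - 1 - left), mode="edge")
    return np.convolve(padded, np.ones(win) / win, mode="valid")


def energy_levels(rms, win, k=0.5):
    """Label each frame -1/0/1 relative to mean +/- k*std of smoothed RMS."""
    smooth = rolling_mean(np.asarray(rms, dtype=float), win)
    mu, sigma = smooth.mean(), smooth.std()
    levels = np.zeros(len(smooth), dtype=int)
    levels[smooth > mu + k * sigma] = 1
    levels[smooth < mu - k * sigma] = -1
    return levels


def segment(levels, min_len):
    """Return (start, end, level) runs in frames, end exclusive.

    Runs shorter than min_len, or with the same level as the previous
    scene, are absorbed into the previous scene.
    """
    scenes, start = [], 0
    for i in range(1, len(levels) + 1):
        if i < len(levels) and levels[i] == levels[start]:
            continue                    # still inside the current run
        level = int(levels[start])
        if scenes and (i - start < min_len or scenes[-1][2] == level):
            scenes[-1] = (scenes[-1][0], i, scenes[-1][2])
        else:
            scenes.append((start, i, level))
        start = i
    return scenes
```

Boundaries are returned as frame indices. With the STFT settings used in the analysis stage, dividing by `sr / 512` converts them to seconds.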