Music Video Creator uses a modular pipeline architecture for audio analysis and visualization.
┌─────────────────────────────────────────────────────────────────────┐
│ INPUT STAGE │
├─────────────────────────────────────────────────────────────────────┤
│ WAV File → AudioLoader → Mono Conversion → Normalization │
└────────────────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ ANALYSIS STAGE │
├─────────────────────────────────────────────────────────────────────┤
│ AudioAnalyzer │
│ ├── STFT (window=2048, hop=512) │
│ ├── Feature Extraction per frame: │
│ │ ├── RMS energy (loudness) │
│ │ ├── Spectral centroid (brightness) │
│ │ ├── Bass/Mid/High energy (frequency bands) │
│ │ └── Onset strength (transients) │
│ │ │
│ └── BeatDetector │
│ ├── Spectral flux onset envelope │
│ ├── Peak picking with threshold │
│ └── Tempo estimation (autocorrelation + beat intervals) │
└────────────────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ SCENE STAGE │
├─────────────────────────────────────────────────────────────────────┤
│ SceneDetector │
│ ├── Energy profiling (rolling mean/std) │
│ ├── Scene boundary detection │
│ └── Classification: intro, verse, build, drop, breakdown, outro │
│ │
│ SceneTimeline │
│ ├── Scene progression management │
│ └── Style transitions per scene │
└────────────────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ VISUALIZATION STAGE │
├─────────────────────────────────────────────────────────────────────┤
│ MatplotlibVisualizer (9 styles) │
│ └── CPU-based, detailed, slower │
│ │
│ ModernGLVisualizer (10 styles) │
│ ├── GPU-accelerated, ~100x faster │
│ ├── GLSL fragment shaders │
│ └── Uniforms: u_bass, u_mid, u_high, u_time, u_is_beat, u_spectrum │
│ │
│ EffectsProcessor │
│ └── Beat-triggered: zoom, flash, shake │
└────────────────────────────────┬────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ RENDERING STAGE │
├─────────────────────────────────────────────────────────────────────┤
│ PipeRenderer (recommended) │
│ ├── FFmpeg subprocess with stdin pipe │
│ ├── Memory-efficient (frames streamed, not stored) │
│ ├── 1440p upscaling option (forces YouTube VP9) │
│ └── Subtitle burn-in (ASS/SRT) │
│ │
│ GifRenderer │
│ ├── Two-pass FFmpeg (palettegen + paletteuse) │
│ └── Platform presets (Discord, Twitter, etc.) │
└────────────────────────────────┬────────────────────────────────────┘
│
▼
MP4 / GIF Output
src/
├── audio/
│ ├── loader.py # WAV loading, normalization
│ └── analyzer.py # STFT, features, BeatDetector
│
├── scenes/
│ ├── detector.py # SceneDetector
│ ├── timeline.py # SceneTimeline
│ └── curves.py # UniformCurve (feature→shader mapping)
│
├── visuals/
│ ├── matplotlib_viz.py # 9 Matplotlib styles
│ ├── moderngl_viz.py # 10 GPU styles
│ ├── effects.py # Zoom, flash, shake
│ ├── hud.py # Title, BPM, progress overlays
│ └── shaders/ # External GLSL fragment shaders
│ ├── tunnel_3d.frag
│ ├── sphere_3d.frag
│ └── terrain_3d.frag
│
├── render/
│ └── renderer.py # PipeRenderer, GifRenderer, VideoRenderer
│
├── lyrics/
│ └── transcriber.py # Whisper integration, ASS generation
│
├── config/
│ ├── schema.py # Config dataclasses
│ └── loader.py # YAML loading, validation
│
└── main.py # CLI entry point
| Decision | Choice | Rationale |
|---|---|---|
| Audio analysis | scipy.signal.stft | Standard, well-documented, no heavy dependencies |
| GPU rendering | ModernGL | Pure Python bindings, easy shader development |
| Video encoding | FFmpeg pipe | Memory-efficient, professional quality |
| Configuration | YAML + dataclasses | Type-safe, mergeable, preset-friendly |
| Beat detection | Spectral flux + peaks | Works well for electronic and rhythmic music |
| Scene detection | Energy profiling | Genre-agnostic, no ML dependencies |