Audio Analysis

A deep dive into the STFT-based audio analysis pipeline that powers all visualizations, and into how audio features map to visual parameters.

Short-Time Fourier Transform (STFT)

The STFT breaks audio into overlapping windows and computes the frequency spectrum for each. This creates a time-frequency representation that drives all visualizations.
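As a rough illustration of the idea, the sketch below slices a signal into overlapping frames and takes a naive DFT of each one. It is a toy under stated assumptions, not the pipeline's implementation: the names `frame_signal`, `dft_magnitudes`, and `stft` are hypothetical, no window function is applied, and the 16-sample window is chosen only so the O(N^2) DFT stays fast. A real pipeline would use an FFT with a much longer window.

```python
import cmath
import math


def frame_signal(samples, window_size, hop_length):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    frames = []
    for start in range(0, len(samples) - window_size + 1, hop_length):
        frames.append(samples[start:start + window_size])
    return frames


def dft_magnitudes(frame):
    """Naive O(N^2) DFT; returns magnitudes for bins 0..N/2 (non-negative frequencies)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        mags.append(abs(acc))
    return mags


def stft(samples, window_size, hop_length):
    """Time-frequency representation: one magnitude spectrum per frame."""
    return [dft_magnitudes(f) for f in frame_signal(samples, window_size, hop_length)]


# Toy input: a sine that completes exactly 2 cycles per 16-sample window.
N, HOP = 16, 4
signal = [math.sin(2 * math.pi * 2 * i / N) for i in range(64)]
spectrogram = stft(signal, N, HOP)

# Strongest bin in each frame; for this input it is bin 2 in every frame.
peak_bins = [max(range(len(m)), key=m.__getitem__) for m in spectrogram]
```

Each row of `spectrogram` is one time step and each column is one frequency bin, which is the time-frequency grid a spectrogram display draws.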

[Figure: Spectrogram Visualization. The frequency axis runs from Low Freq (Bass) to High Freq (Treble).]
Window Size: 2048 samples

Controls the tradeoff between frequency resolution and time resolution: a longer window resolves frequencies more finely but blurs events in time.

Hop Length: 512 samples

The step between consecutive windows. With a 2048-sample window, a 512-sample hop gives 75% overlap.

Window Function: Hann

Tapers each window toward zero at its edges, which reduces spectral leakage.
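The numbers behind these three parameters can be worked out directly. The snippet below is a small sketch, assuming the 44.1 kHz sample rate of the pipeline's WAV input. The `hann` helper is a hypothetical name, and it uses the periodic form of the Hann window, which may differ from the variant the pipeline actually uses.

```python
import math

SAMPLE_RATE = 44_100   # Hz, assumed from the pipeline's WAV input
WINDOW_SIZE = 2048     # samples
HOP_LENGTH = 512       # samples

# Frequency resolution: spacing between adjacent FFT bins (about 21.5 Hz).
bin_width_hz = SAMPLE_RATE / WINDOW_SIZE
# Time span covered by a single window (about 46.4 ms).
window_ms = 1000 * WINDOW_SIZE / SAMPLE_RATE
# Fraction of each window shared with the next one (0.75).
overlap = 1 - HOP_LENGTH / WINDOW_SIZE
# Analysis frames produced per second of audio (about 86.1).
frames_per_sec = SAMPLE_RATE / HOP_LENGTH


def hann(n):
    """Periodic Hann window: w[i] = 0.5 - 0.5 * cos(2*pi*i / n)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


window = hann(WINDOW_SIZE)
# The window is 0 at its first sample and peaks at 1 in the middle, so
# multiplying a frame by it fades the frame in and out.
```

Doubling the window to 4096 samples would halve the bin width to about 10.8 Hz while doubling the time span of each window, which is the tradeoff noted above.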

Extracted Audio Features

📊
RMS Energy

Root Mean Square - measures loudness/intensity of the signal

Maps to: Size, Scale, Brightness
Range: 0.0 - 1.0
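For reference, RMS over one frame can be sketched in a few lines. The function name `rms` is illustrative. Because the input samples are assumed to lie in -1 to 1, the result always falls in the 0.0 - 1.0 range.

```python
import math


def rms(frame):
    """Root mean square of a frame of samples; 0.0 for an empty frame."""
    if not frame:
        return 0.0
    return math.sqrt(sum(x * x for x in frame) / len(frame))


# One full cycle of a full-scale sine has an RMS of 1/sqrt(2), about 0.707.
sine = [math.sin(2 * math.pi * i / 64) for i in range(64)]
sine_rms = rms(sine)

# A square wave at half amplitude has an RMS equal to its amplitude.
square_rms = rms([0.5, -0.5, 0.5, -0.5])
```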
🎨
Spectral Centroid

"Center of mass" of the spectrum - indicates brightness/timbre

Maps to: Color Hue, Warmth
Range: 0 - 8000 Hz
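The "center of mass" description corresponds to a weighted average of bin frequencies. The sketch below weights each bin by its magnitude. That choice is an assumption (power weighting is also common), and `spectral_centroid` is a hypothetical name.

```python
def spectral_centroid(magnitudes, sample_rate, window_size):
    """Magnitude-weighted mean frequency in Hz; 0.0 for a silent frame."""
    total = sum(magnitudes)
    if total == 0:
        return 0.0
    bin_hz = sample_rate / window_size  # center frequency of bin k is k * bin_hz
    return sum(k * bin_hz * m for k, m in enumerate(magnitudes)) / total


# Two equal peaks at bins 10 and 30 balance at bin 20 (about 430.7 Hz).
mags = [0.0] * 1025          # 2048 // 2 + 1 bins
mags[10] = 1.0
mags[30] = 1.0
centroid_hz = spectral_centroid(mags, 44_100, 2048)
```

A spectrum dominated by high-frequency content pulls the centroid upward, which is why the value works as a proxy for perceived brightness.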
🔊
Bass Energy

Energy in low frequencies (20-250 Hz)

Maps to: Pulse, Shake, Size
Frequency range: 20-250 Hz
🎸
Mid Energy

Energy in mid frequencies (250-2000 Hz) - vocals, instruments

Maps to: Rotation, Movement
Frequency range: 250-2000 Hz
High Energy

Energy in high frequencies (2000-8000 Hz) - cymbals, brilliance

Maps to: Sparkle, Detail
Frequency range: 2000-8000 Hz
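The three band energies can be sketched as sums over the FFT bins that fall inside each band. This is a minimal illustration under assumptions: `band_energies` is a hypothetical name, energy is taken as the mean squared magnitude, and bins are assigned by center frequency with a half-open interval. How the pipeline scales these values into 0-1 is not specified here, so no scaling is shown.

```python
BANDS_HZ = {"bass": (20, 250), "mid": (250, 2000), "high": (2000, 8000)}


def band_energies(magnitudes, sample_rate, window_size, bands=BANDS_HZ):
    """Mean squared magnitude of the bins whose center frequency lies in [lo, hi)."""
    bin_hz = sample_rate / window_size
    out = {}
    for name, (lo, hi) in bands.items():
        power = [m * m for k, m in enumerate(magnitudes) if lo <= k * bin_hz < hi]
        out[name] = sum(power) / len(power) if power else 0.0
    return out


# A single peak at bin 5 (about 107.7 Hz) lands entirely in the bass band.
mags = [0.0] * 1025
mags[5] = 1.0
energies = band_energies(mags, 44_100, 2048)
```

With a 2048-sample window at 44.1 kHz, the bass band covers only 11 bins (bins 1 through 11), so low-frequency detail is coarse compared with the mid and high bands.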
🥁
Beat Detection

Onset detection using the frame-to-frame energy derivative plus thresholding

Maps to: Zoom, Flash, Shake
Min interval: 0.15 s
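One way to read "energy derivative + thresholding" is sketched below. The details are assumptions rather than the pipeline's actual method: the derivative is taken as the positive first difference of per-frame energy, the threshold is a fraction of the largest rise in the track, and `detect_beats` is a hypothetical name.

```python
def detect_beats(energy, frames_per_sec, threshold_ratio=0.3, min_interval_sec=0.15):
    """Return frame indices where the energy rise exceeds a relative threshold."""
    # Positive first difference of energy: large where loudness jumps.
    flux = [max(energy[i] - energy[i - 1], 0.0) for i in range(1, len(energy))]
    if not flux or max(flux) == 0.0:
        return []
    threshold = threshold_ratio * max(flux)       # assumed meaning of the ratio
    min_gap = min_interval_sec * frames_per_sec   # minimum spacing in frames
    beats = []
    for i, rise in enumerate(flux, start=1):
        if rise > threshold and (not beats or i - beats[-1] >= min_gap):
            beats.append(i)
    return beats


# Three energy spikes; the second is only 5 frames after the first, which is
# closer than 0.15 s (about 12.9 frames at 86.13 frames/s), so it is skipped.
energy = [0.0] * 100
energy[10] = 1.0
energy[15] = 1.0
energy[60] = 0.8
beats = detect_beats(energy, frames_per_sec=44_100 / 512)
```

The minimum interval keeps a single drum hit from firing several triggers in a row; 0.15 s caps the detector at 400 beats per minute.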

Analysis Pipeline

🎵 WAV Input: 44.1 kHz stereo
🔄 Normalize: Mono, -1 to 1
📊 STFT: 2048 window
🔬 Extract: Features/frame
🥁 Beat Detect: Onsets
🎬 Visualize: 30 FPS frames
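Two of these stages involve simple conversions that are easy to get wrong, so a sketch may help. It assumes 16-bit PCM input, downmixing by averaging the two channels, and picking the nearest analysis frame for each video frame. None of these choices is confirmed by this page, and all function names are hypothetical.

```python
def pcm16_to_float(samples):
    """Map signed 16-bit integers to floats in [-1.0, 1.0)."""
    return [s / 32768.0 for s in samples]


def to_mono(left, right):
    """Downmix stereo to mono by averaging the two channels."""
    return [(l + r) / 2 for l, r in zip(left, right)]


def analysis_frame_for(video_frame, fps=30, sample_rate=44_100, hop_length=512):
    """Index of the STFT frame that starts closest to a video frame's timestamp."""
    seconds = video_frame / fps
    return round(seconds * sample_rate / hop_length)


left = pcm16_to_float([32767, -32768, 0])
right = pcm16_to_float([32767, 0, 0])
mono = to_mono(left, right)

# Video frame 30 is at t = 1.0 s, which falls near analysis frame 86.
frame_index = analysis_frame_for(30)
```

Since a 512-sample hop yields about 86 analysis frames per second against 30 video frames, each rendered frame spans nearly three analysis frames. Averaging or taking the maximum over that span, instead of picking the nearest one, is another reasonable design.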

GPU Shader Uniforms

Audio features are passed to GPU shaders as uniform variables every frame.

// Available in all GPU styles
uniform float u_time; // Elapsed time in seconds
uniform float u_rms; // RMS energy (0-1)
uniform float u_bass; // Bass energy (0-1)
uniform float u_mid; // Mid energy (0-1)
uniform float u_high; // High energy (0-1)
uniform float u_is_beat; // Beat trigger (0 or 1)
uniform float u_centroid; // Spectral centroid (normalized)
uniform sampler2D u_spectrum; // 128-bin spectrum texture
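On the CPU side, the per-frame values have to be packed into this shape before upload. The sketch below shows one plausible way to prepare them and makes several assumptions: the centroid is normalized by dividing by 8000 Hz, the 128-bin spectrum is produced by averaging neighboring FFT bins, and `build_uniforms` and `downsample_spectrum` are hypothetical names. The actual upload call depends on the graphics API and is not shown.

```python
def clamp01(x):
    """Clamp a value into the 0-1 range expected by the shaders."""
    return max(0.0, min(1.0, x))


def downsample_spectrum(magnitudes, bins=128):
    """Average neighboring FFT bins down to a fixed number of texture bins."""
    n = len(magnitudes)
    if n == 0:
        return [0.0] * bins
    out = []
    for b in range(bins):
        lo = b * n // bins
        hi = max((b + 1) * n // bins, lo + 1)
        chunk = magnitudes[lo:hi]
        out.append(sum(chunk) / len(chunk))
    return out


def build_uniforms(time_sec, rms, bass, mid, high, is_beat, centroid_hz,
                   centroid_max_hz=8000.0):
    """Collect scalar uniform values for one rendered frame."""
    return {
        "u_time": float(time_sec),
        "u_rms": clamp01(rms),
        "u_bass": clamp01(bass),
        "u_mid": clamp01(mid),
        "u_high": clamp01(high),
        "u_is_beat": 1.0 if is_beat else 0.0,
        "u_centroid": clamp01(centroid_hz / centroid_max_hz),  # assumed scaling
    }


uniforms = build_uniforms(1.5, rms=1.2, bass=0.4, mid=0.3, high=0.1,
                          is_beat=True, centroid_hz=4000.0)
spectrum_tex = downsample_spectrum([1.0] * 1025)
```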

Configuration

Audio analysis parameters can be configured via YAML or CLI.

YAML Config:
audio:
  window_size: 2048
  hop_length: 512
  bass_range: [20, 250]
  mid_range: [250, 2000]
  high_range: [2000, 8000]
Beat Detection:
beat:
  threshold_ratio: 0.3
  min_interval_sec: 0.15
  decay_rate: 0.85
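To show how these settings might be consumed, here is a sketch that mirrors the keys above as dataclasses and applies a few sanity checks. It assumes the YAML has already been parsed into a plain dictionary by whatever loader the project uses. The class names, `load_config`, and the validation rules are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AudioConfig:
    window_size: int = 2048
    hop_length: int = 512
    bass_range: tuple = (20, 250)
    mid_range: tuple = (250, 2000)
    high_range: tuple = (2000, 8000)


@dataclass
class BeatConfig:
    threshold_ratio: float = 0.3
    min_interval_sec: float = 0.15
    decay_rate: float = 0.85


def load_config(raw):
    """Build configs from a parsed mapping; missing keys keep their defaults."""
    audio = AudioConfig(**raw.get("audio", {}))
    beat = BeatConfig(**raw.get("beat", {}))
    if not 0 < audio.hop_length <= audio.window_size:
        raise ValueError("hop_length must be in (0, window_size]")
    for name in ("bass_range", "mid_range", "high_range"):
        lo, hi = getattr(audio, name)
        if not 0 <= lo < hi:
            raise ValueError(f"{name} must satisfy 0 <= low < high")
        setattr(audio, name, (lo, hi))  # YAML lists become tuples
    if beat.min_interval_sec < 0:
        raise ValueError("min_interval_sec must not be negative")
    return audio, beat


# Override two audio settings and one beat setting; the rest stay at defaults.
audio, beat = load_config({
    "audio": {"window_size": 4096, "hop_length": 1024},
    "beat": {"threshold_ratio": 0.5},
})
```

Rejecting a hop length larger than the window is a deliberate choice in this sketch: such a setting would leave gaps of unanalyzed audio between frames.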