Audio Analysis - Music Video Creator

Short-Time Fourier Transform (STFT)

The STFT breaks audio into overlapping windows and computes the frequency spectrum for each. This creates a time-frequency representation that drives all visualizations.

[Spectrogram visualization: frequency axis runs from low (bass) to high (treble)]

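Window Size: 2048 samples

Sets the tradeoff between frequency and time resolution: at 44.1 kHz a window spans about 46 ms and adjacent frequency bins are about 21.5 Hz apart.

Hop Length: 512 samples

Step between the starts of consecutive windows (about 11.6 ms at 44.1 kHz); with a 2048-sample window, consecutive windows overlap by 75%.

Window Function: Hann

Tapers each window smoothly to zero at its edges, which reduces spectral leakage.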
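A minimal, dependency-free sketch of the framing described above, assuming a periodic Hann window and dropping any final partial frame instead of padding it. The function names are hypothetical, and the naive O(n²) DFT only stands in for a real FFT such as `numpy.fft.rfft`; this is not the tool's own code.

```python
import cmath
import math

# Hypothetical helpers: a sketch of STFT framing, not the tool's implementation.


def hann_window(n):
    """Periodic Hann window of length n (first sample is 0)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft_magnitude(frame):
    """Magnitudes of DFT bins 0..n/2 of a real-valued frame (naive O(n^2))."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        mags.append(abs(acc))
    return mags


def stft_magnitude(signal, window_size=2048, hop_length=512):
    """Cut `signal` into Hann-windowed frames that start `hop_length` samples
    apart and return one magnitude spectrum per frame. A trailing partial
    frame is dropped (assumption: no padding)."""
    window = hann_window(window_size)
    frames = []
    for start in range(0, len(signal) - window_size + 1, hop_length):
        chunk = signal[start:start + window_size]
        frames.append(dft_magnitude([x * w for x, w in zip(chunk, window)]))
    return frames
```

With the default settings each frame yields 1025 magnitude bins spanning 0 to 22050 Hz at 44.1 kHz. For quick experiments with the naive DFT, a small window (say 64 samples) keeps it fast.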

Extracted Audio Features

📊

RMS Energy

Root Mean Square - measures loudness/intensity of the signal

Maps to: Size, Scale, Brightness

Range: 0.0 - 1.0
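One way to compute the per-frame RMS, as a sketch that assumes the same 2048/512 framing as the STFT and applies no window to the samples (the source does not say whether the Hann window is used here). `frame_rms` is a hypothetical name. For input already normalized to -1 to 1, the result stays within the documented 0.0 - 1.0 range.

```python
import math

# Hypothetical helper; assumes unwindowed frames with the STFT's window/hop sizes.


def frame_rms(signal, window_size=2048, hop_length=512):
    """Root mean square of each frame of `signal`."""
    values = []
    for start in range(0, len(signal) - window_size + 1, hop_length):
        chunk = signal[start:start + window_size]
        values.append(math.sqrt(sum(x * x for x in chunk) / window_size))
    return values
```

A full-scale sine gives an RMS of about 0.707, so a value near 1.0 only occurs for very dense, heavily compressed material.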

🎨

Spectral Centroid

"Center of mass" of the spectrum (the magnitude-weighted mean frequency); higher values indicate a brighter timbre

Maps to: Color Hue, Warmth

Range: 0 - 8000 Hz
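The centroid as a magnitude-weighted mean of bin frequencies, sketched for a single STFT frame. `spectral_centroid` and `normalize_centroid` are hypothetical names, and dividing by 8000 Hz to get the normalized `u_centroid` value is an assumption based on the documented 0 - 8000 Hz range.

```python
# Hypothetical helpers; the 8000 Hz normalization is an assumption.


def spectral_centroid(magnitudes, sample_rate, window_size):
    """Magnitude-weighted mean frequency (Hz) of one spectrum with bins 0..window_size/2."""
    bin_hz = sample_rate / window_size  # frequency spacing between bins
    total = sum(magnitudes)
    if total == 0:
        return 0.0  # silent frame: no meaningful centroid
    return sum(k * bin_hz * m for k, m in enumerate(magnitudes)) / total


def normalize_centroid(centroid_hz, max_hz=8000.0):
    """Map a centroid in Hz onto 0-1, clamping values outside 0..max_hz."""
    return min(max(centroid_hz / max_hz, 0.0), 1.0)
```

The normalized value could then drive the hue mapping, for example with low values toward warm colors and high values toward cool ones.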

🔊

Bass Energy

Energy in low frequencies (20-250 Hz)

Maps to: Pulse, Shake, Size

Band: 20-250 Hz

🎸

Mid Energy

Energy in mid frequencies (250-2000 Hz) - vocals, instruments

Maps to: Rotation, Movement

Band: 250-2000 Hz

✨

High Energy

Energy in high frequencies (2000-8000 Hz) - cymbals, brilliance

Maps to: Sparkle, Detail

Band: 2000-8000 Hz
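A sketch of the three band energies for one STFT frame, assuming "energy" means the sum of squared magnitudes over bins whose center frequency falls inside the band. The names are hypothetical, and scaling each per-frame series by its maximum over the whole track is just one simple way to reach the 0-1 range listed for `u_bass`, `u_mid` and `u_high`; the tool may normalize differently.

```python
# Hypothetical helpers: band energy as summed squared magnitudes (an assumption).

BANDS = {"bass": (20, 250), "mid": (250, 2000), "high": (2000, 8000)}


def band_energy(magnitudes, sample_rate, window_size, low_hz, high_hz):
    """Sum of squared magnitudes for bins with center frequency in [low_hz, high_hz)."""
    bin_hz = sample_rate / window_size
    return sum(m * m for k, m in enumerate(magnitudes) if low_hz <= k * bin_hz < high_hz)


def band_energies(magnitudes, sample_rate=44100, window_size=2048):
    """Energy in each documented band for one STFT frame."""
    return {name: band_energy(magnitudes, sample_rate, window_size, lo, hi)
            for name, (lo, hi) in BANDS.items()}


def normalize_over_track(values):
    """Scale a per-frame series to 0-1 by dividing by its maximum (one simple choice)."""
    peak = max(values, default=0.0)
    return [v / peak for v in values] if peak > 0 else [0.0 for _ in values]
```

At 2048 samples and 44.1 kHz the bass band covers only about ten bins, which is one reason the window is not made much shorter.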

🥁

Beat Detection

Onset detection: flags frames where the energy rises sharply (positive energy derivative above a threshold)

Maps to: Zoom, Flash, Shake

Min interval between beats: 0.15 s
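A sketch of energy-derivative onset detection with a minimum gap between beats. The source does not define `threshold_ratio`; here it is assumed to be a fraction of the strongest onset in the track. `detect_beats` is a hypothetical name, and this simple version takes the first frame that crosses the threshold rather than searching for a local peak.

```python
# Hypothetical helper; the meaning of threshold_ratio is an assumption.


def detect_beats(energy, sample_rate=44100, hop_length=512,
                 threshold_ratio=0.3, min_interval_sec=0.15):
    """Return beat times (seconds) from a per-frame energy series.

    Onset strength is the positive frame-to-frame increase in energy. A frame
    counts as a beat when its strength exceeds threshold_ratio * (largest
    strength) and at least min_interval_sec has passed since the last beat.
    """
    flux = [0.0] + [max(0.0, b - a) for a, b in zip(energy, energy[1:])]
    peak = max(flux)
    if peak == 0:
        return []  # no rising energy anywhere: no beats
    threshold = threshold_ratio * peak
    beats = []
    last_time = None
    for i, strength in enumerate(flux):
        t = i * hop_length / sample_rate  # start time of analysis frame i
        if strength > threshold and (last_time is None or t - last_time >= min_interval_sec):
            beats.append(t)
            last_time = t
    return beats
```

The 0.15 s minimum caps detection at 400 beats per minute, so it suppresses double triggers without dropping beats in fast tempos.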

Analysis Pipeline

WAV Input (44.1 kHz stereo) → Normalize (mono, -1 to 1) → STFT (2048-sample window) → Extract (features per frame) → Beat Detect (onsets) → Visualize (30 FPS frames)
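The first and last links in the pipeline can be sketched with Python's standard library: decoding 16-bit PCM to mono floats, and picking the analysis frame for each 30 FPS video frame. The helper names are hypothetical. Scaling by 32768 (rather than peak-normalizing) and nearest-frame lookup are assumptions; the tool may average or interpolate between STFT frames instead.

```python
import array
import sys
import wave

# Hypothetical helpers; a sketch of the normalize step and video-frame lookup.


def pcm16_to_mono(raw, channels):
    """Convert interleaved little-endian 16-bit PCM bytes to mono floats in [-1, 1)."""
    samples = array.array("h", raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return [sum(samples[i:i + channels]) / channels / 32768.0
            for i in range(0, len(samples), channels)]


def load_mono(path):
    """Read a 16-bit PCM WAV file and return (mono_samples, sample_rate)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        raw = wf.readframes(wf.getnframes())
        return pcm16_to_mono(raw, wf.getnchannels()), wf.getframerate()


def analysis_frame_for(video_frame, fps=30, sample_rate=44100, hop_length=512):
    """Index of the STFT frame whose start is nearest to the video frame's timestamp."""
    return round(video_frame / fps * sample_rate / hop_length)
```

At 44.1 kHz with a 512-sample hop there are about 86 analysis frames per second, so each video frame spans roughly 2.9 STFT frames; a nearest-frame lookup skips the frames in between.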

GPU Shader Uniforms

Audio features are passed to GPU shaders as uniform variables every frame.

// Available in all GPU styles
uniform float u_time;          // Elapsed time in seconds
uniform float u_rms;           // RMS energy (0-1)
uniform float u_bass;          // Bass energy (0-1)
uniform float u_mid;           // Mid energy (0-1)
uniform float u_high;          // High energy (0-1)
uniform float u_is_beat;       // Beat trigger (0 or 1)
uniform float u_centroid;      // Spectral centroid (normalized)
uniform sampler2D u_spectrum;  // 128-bin spectrum texture
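A sketch of how one frame's feature values might be gathered under the uniform names above, including reducing a full magnitude spectrum to the 128 values of `u_spectrum`. Grouping bins into equal-width chunks and scaling by the frame's maximum are assumptions (a log-spaced grouping is another common choice). The function names are hypothetical, and uploading the values (for example with `glUniform1f`) depends on the rendering library and is not shown.

```python
# Hypothetical helpers: packing per-frame features under the shader uniform names.


def spectrum_texture(magnitudes, n_bins=128):
    """Average a magnitude spectrum into n_bins equal-width groups and scale to 0-1."""
    n = len(magnitudes)
    groups = []
    for b in range(n_bins):
        lo = b * n // n_bins
        hi = max((b + 1) * n // n_bins, lo + 1)  # at least one bin per group
        chunk = magnitudes[lo:hi]
        groups.append(sum(chunk) / len(chunk))
    peak = max(groups)
    return [g / peak for g in groups] if peak > 0 else groups


def frame_uniforms(t, rms, bass, mid, high, is_beat, centroid_norm, magnitudes):
    """Collect one video frame's values keyed by the uniform names the shaders declare."""
    return {
        "u_time": float(t),
        "u_rms": rms,
        "u_bass": bass,
        "u_mid": mid,
        "u_high": high,
        "u_is_beat": 1.0 if is_beat else 0.0,  # declared as float in GLSL
        "u_centroid": centroid_norm,
        "u_spectrum": spectrum_texture(magnitudes),  # 128 values for the texture
    }
```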

Configuration

Audio analysis parameters can be configured via YAML or CLI.

YAML Config:

audio:
  window_size: 2048
  hop_length: 512
  bass_range: [20, 250]
  mid_range: [250, 2000]
  high_range: [2000, 8000]
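A sketch of a typed mirror of the `audio:` section with the documented defaults and a few sanity checks. `AudioConfig` and its methods are hypothetical; with PyYAML installed, `AudioConfig(**yaml.safe_load(text)["audio"])` would build one from the YAML above.

```python
from dataclasses import dataclass

# Hypothetical config object; field names follow the YAML keys.


@dataclass
class AudioConfig:
    """Mirror of the `audio:` YAML section, with its documented defaults."""
    window_size: int = 2048
    hop_length: int = 512
    bass_range: tuple = (20, 250)
    mid_range: tuple = (250, 2000)
    high_range: tuple = (2000, 8000)

    def validate(self, sample_rate=44100):
        """Raise ValueError for settings that cannot work."""
        if not 0 < self.hop_length <= self.window_size:
            raise ValueError("hop_length must be in (0, window_size]")
        for name in ("bass_range", "mid_range", "high_range"):
            lo, hi = getattr(self, name)
            if not 0 <= lo < hi <= sample_rate / 2:  # bands must fit below Nyquist
                raise ValueError(f"{name} must satisfy 0 <= low < high <= Nyquist")

    def overlap(self):
        """Fraction of each window shared with the next one."""
        return 1 - self.hop_length / self.window_size

    def bin_hz(self, sample_rate=44100):
        """Spacing between STFT frequency bins in Hz."""
        return sample_rate / self.window_size
```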

Beat Detection:

beat:
  threshold_ratio: 0.3
  min_interval_sec: 0.15
  decay_rate: 0.85
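The source does not say what `decay_rate` controls. One plausible reading, assumed here, is a pulse that jumps to 1.0 on a beat and is multiplied by `decay_rate` on every following video frame, giving a smooth fall-off instead of the hard 0-or-1 `u_is_beat` trigger. `BeatConfig` and `beat_envelope` are hypothetical names.

```python
from dataclasses import dataclass

# Hypothetical sketch; the per-video-frame decay is an assumed meaning of decay_rate.


@dataclass
class BeatConfig:
    """Mirror of the `beat:` YAML section with its documented defaults."""
    threshold_ratio: float = 0.3
    min_interval_sec: float = 0.15
    decay_rate: float = 0.85


def beat_envelope(beat_times, n_frames, fps=30, decay_rate=0.85):
    """Per-video-frame pulse: 1.0 on a frame that contains a beat, otherwise the
    previous value multiplied by decay_rate."""
    beat_frames = {int(t * fps) for t in beat_times}
    level = 0.0
    out = []
    for f in range(n_frames):
        level = 1.0 if f in beat_frames else level * decay_rate
        out.append(level)
    return out
```

Under this reading, 0.85 at 30 FPS halves the pulse roughly every 4.3 frames (about 0.14 s), so a flash fades out before the 0.15 s minimum interval allows the next beat.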