Scene Detection

Automatic song structure analysis that identifies intro, verse, build, drop, breakdown, and outro sections. Each scene gets mapped to an optimal visual style based on genre.
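The genre-based style mapping can be pictured as a lookup table. Below is a hypothetical sketch: only the style name tunnel_3d_gpu appears elsewhere on this page, the other style names and the pick_style helper are invented placeholders.

```python
# Hypothetical sketch: map (genre, scene type) pairs to a visual style.
# Only "tunnel_3d_gpu" appears in these docs; other names are placeholders.
STYLE_MAP = {
    ("electronic", "intro"): "tunnel_3d_gpu",
    ("electronic", "drop"): "particle_burst",   # placeholder style name
    ("rock", "drop"): "strobe_flash",           # placeholder style name
}

def pick_style(genre, scene_type, default="tunnel_3d_gpu"):
    """Return the configured style for a (genre, scene) pair, else a default."""
    return STYLE_MAP.get((genre, scene_type), default)
```

Unmapped combinations fall back to a default style rather than failing.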

How It Works

1. Energy Analysis

Computes the RMS energy envelope and smooths it to reveal the track's broad energy patterns
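Step 1 can be sketched as a frame-wise RMS computation. The helper name compute_rms_envelope matches the pseudocode in the Detection Algorithm section; the frame and hop sizes here are assumptions.

```python
import numpy as np

def compute_rms_envelope(audio, frame_size=2048, hop=512):
    """Frame-wise root-mean-square energy of a mono signal."""
    n_frames = 1 + max(0, len(audio) - frame_size) // hop
    env = np.empty(n_frames)
    for i in range(n_frames):
        frame = audio[i * hop : i * hop + frame_size]
        env[i] = np.sqrt(np.mean(frame ** 2))
    return env
```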

2. Boundary Detection

Finds significant changes in energy to mark scene boundaries
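A self-contained sketch of step 2, assuming SciPy is available; the smoothing width and the relative-jump threshold are illustrative values, not the project's actual defaults.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.signal import find_peaks

def detect_boundaries(energy, sigma=40.0, min_rel_jump=0.5):
    """Indices where the smoothed energy envelope changes fastest."""
    smoothed = gaussian_filter1d(energy, sigma=sigma)
    gradient = np.abs(np.diff(smoothed))
    # Keep only peaks at least min_rel_jump of the largest observed change.
    peaks, _ = find_peaks(gradient, height=min_rel_jump * gradient.max())
    return peaks
```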

3. Scene Classification

Classifies each segment into a scene type based on its position in the track and its energy level

Example Timeline

This example shows a typical electronic track structure from 0:00 to 3:30, moving through six scenes in order: Intro, Verse, Build, Drop, Breakdown, Outro.

Intro

Opening section of the track. Usually lower energy, building anticipation.

Time Range: 0:00 - 0:30
Energy Level: Low
Visual Style: tunnel_3d_gpu

Scene Types

Intro: Opening section; low energy, building anticipation.
Verse: Main body of the track; moderate, steady energy.
Build: Rising energy leading into a drop.
Drop: Peak-energy section of the track.
Breakdown: Energy dips back down, often to a sparser arrangement.
Outro: Closing section; energy winds down.

Detection Algorithm

# Simplified scene detection process
import numpy as np
from scipy.ndimage import gaussian_filter1d
from scipy.signal import find_peaks

energy = compute_rms_envelope(audio)                # RMS energy envelope
smoothed = gaussian_filter1d(energy, sigma=sr * 2)  # smooth over ~2 s (envelope at sample rate)
gradient = np.diff(smoothed)
boundaries, _ = find_peaks(np.abs(gradient), height=0.1)  # peaks in |gradient| mark boundaries
scenes = classify_segments(boundaries, energy)

The algorithm uses energy envelope analysis with Gaussian smoothing to identify structural changes. Scene classification considers position (first 10% = intro, last 10% = outro) and relative energy levels.
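The position-and-energy rule can be sketched as follows. This is a simplified stand-in for classify_segments: it folds build and breakdown into drop/verse, whereas a fuller classifier would also look at rising or falling energy trends.

```python
import numpy as np

def classify_segments(boundaries, energy):
    """Label segments by track position and relative energy (simplified)."""
    edges = [0, *boundaries, len(energy)]
    overall = energy.mean()
    scenes = []
    for start, end in zip(edges[:-1], edges[1:]):
        centre = (start + end) / 2
        if centre < 0.10 * len(energy):          # first 10% of the track
            label = "intro"
        elif centre > 0.90 * len(energy):        # last 10% of the track
            label = "outro"
        elif energy[start:end].mean() > overall:
            label = "drop"                       # high-energy interior segment
        else:
            label = "verse"                      # low-energy interior segment
        scenes.append((label, start, end))
    return scenes
```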

Usage

List detected scenes:
python main.py song.wav --list-scenes
Render with automatic scene detection:
python main.py song.wav -o output.mp4 --scenes --genre electronic --pipe
Manual scene timestamps:
python main.py song.wav -o output.mp4 --scenes --genre rock \
--scene-timestamps "0:intro,30:verse,60:build,90:drop,120:breakdown,150:outro"
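The --scene-timestamps value is a comma-separated list of seconds:label pairs. A hypothetical parser for that format (parse_scene_timestamps is an illustration, not part of the documented CLI):

```python
def parse_scene_timestamps(spec):
    """Parse "0:intro,30:verse,..." into (start_seconds, label) pairs."""
    scenes = []
    for item in spec.split(","):
        start, label = item.split(":", 1)
        scenes.append((float(start), label.strip()))
    return scenes
```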