Feature: sonic analysis provider #3516

Closed
chrisuthe wants to merge 6 commits into music-assistant:dev from chrisuthe:task/sonic-analysis-provider

chrisuthe (Member) commented Mar 31, 2026

Sonic Analysis Provider

This PR adds a sonic analysis provider that extracts semantic audio features from PCM audio during playback using librosa and stores them as standard AudioAnalysisData fields.

What It Does

  • Processes audio in 10-second blocks with overlap to avoid STFT artifacts
  • Derives human-readable semantic descriptors from raw spectral features:
| Field | Derivation |
|---|---|
| bpm | Tempo estimation from onset envelope |
| key / mode | Krumhansl-Kessler profile correlation against chroma |
| energy | Normalized mean RMS |
| danceability | Onset regularity + tempo suitability |
| loudness_integrated / loudness_range | RMS-derived dB approximations |
| brightness | Spectral centroid normalized against Nyquist |
| harmonic_complexity | Shannon entropy of mean chroma vector |
| roughness | Spectral contrast range + flatness |
| rhythmic_regularity | Inter-onset interval coefficient of variation |
| rms_energy_per_second / spectral_centroid_per_second | Per-second time series |
  • Stores results via set_audio_analysis() as a plain AudioAnalysisData — no opaque blobs, no custom subclasses
  • Proposes 4 new upstream fields: brightness, harmonic_complexity, roughness, rhythmic_regularity
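Three of the derivations in the table are simple enough to sketch directly. The numpy-only functions below are illustrative stand-ins for the private `_derive_*` helpers, not their actual code; scaling the chroma entropy by `log2(12)` and mapping the inter-onset coefficient of variation to `1 / (1 + CV)` are assumptions chosen so that each descriptor lands in a 0..1 range.

```python
import numpy as np


def brightness(centroid_hz: np.ndarray, sr: int) -> float:
    """Mean spectral centroid as a fraction of the Nyquist frequency (0..1)."""
    nyquist = sr / 2.0
    return float(np.clip(np.mean(centroid_hz) / nyquist, 0.0, 1.0))


def harmonic_complexity(chroma: np.ndarray) -> float:
    """Shannon entropy of the mean chroma vector, scaled to 0..1 (assumed scaling).

    chroma has shape (12, n_frames). One dominant pitch class gives ~0;
    energy spread evenly over all 12 classes gives 1.
    """
    mean_chroma = np.mean(chroma, axis=1)
    total = mean_chroma.sum()
    if total <= 0:
        return 0.0  # silent input: no harmonic content
    p = mean_chroma / total
    p = p[p > 0]  # treat 0 * log(0) as 0
    entropy_bits = -np.sum(p * np.log2(p))
    return float(entropy_bits / np.log2(12))  # 12 classes -> max entropy log2(12)


def rhythmic_regularity(onset_times_s: np.ndarray) -> float:
    """Map the inter-onset-interval coefficient of variation to 0..1.

    Hypothetical mapping 1 / (1 + CV): perfectly even onsets score 1.0.
    """
    if len(onset_times_s) < 3:
        return 0.0  # too few onsets to measure regularity
    ioi = np.diff(onset_times_s)
    mean_ioi = np.mean(ioi)
    if mean_ioi <= 0:
        return 0.0
    cv = np.std(ioi) / mean_ioi
    return float(1.0 / (1.0 + cv))
```

Normalizing each descriptor to a bounded range keeps them comparable across tracks with different sample rates and loudness.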

Architecture

The provider is a pure feature extraction + distillation layer. It does not store similarity vectors or compute distances — that responsibility belongs to the similarity plugin (separate stacked PR).

Both the provider and the similarity plugin depend on the shared AudioAnalysisData model contract. Any audio analysis provider that populates the same fields can feed the similarity plugin.
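To make the contract idea concrete, here is a hypothetical sketch. `AnalysisFields` is a simplified stand-in for `AudioAnalysisData` carrying only a few of the fields listed above, and `assemble_vector` is an illustrative consumer, not the similarity plugin's real code. The consumer reads only the shared fields, so it works with any producer that fills them, and it reports missing fields by name instead of zero-filling them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnalysisFields:
    """Hypothetical, simplified stand-in for AudioAnalysisData (not the real model)."""

    bpm: Optional[float] = None
    energy: Optional[float] = None
    brightness: Optional[float] = None
    harmonic_complexity: Optional[float] = None
    roughness: Optional[float] = None
    rhythmic_regularity: Optional[float] = None


# Fields this illustrative consumer reads, in vector order (an assumption).
VECTOR_FIELDS = ("energy", "brightness", "harmonic_complexity", "roughness", "rhythmic_regularity")


def assemble_vector(data: AnalysisFields) -> tuple[Optional[list[float]], list[str]]:
    """Return (vector, missing_field_names); the vector is None if anything is missing.

    Only the shared fields are inspected, so it does not matter which
    provider populated them.
    """
    missing = [name for name in VECTOR_FIELDS if getattr(data, name) is None]
    if missing:
        return None, missing
    return [float(getattr(data, name)) for name in VECTOR_FIELDS], []
```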

Code Organization

  • helpers.py — Pure feature extraction (extract_block_features, merge_block_features) and semantic derivation (collapse_to_analysis with private _derive_* helpers)
  • __init__.py — MA integration: PCM streaming, block accumulation, session management, _finalize() stores the result
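The block accumulation in `__init__.py`, combined with the overlapping 10-second blocks described earlier, can be sketched as a buffer that receives arbitrary-sized PCM chunks and emits fixed-size blocks. The class name and the overlap length (one `n_fft` frame of 2048 samples, so each block's first STFT frames see real audio rather than padding) are assumptions; the PR does not state the actual overlap.

```python
import numpy as np


class BlockAccumulator:
    """Hypothetical buffer that turns streamed mono float chunks into overlapping blocks."""

    def __init__(self, sample_rate: int, block_seconds: float = 10.0, overlap_samples: int = 2048) -> None:
        self.block_len = int(sample_rate * block_seconds)
        self.overlap = overlap_samples  # assumed overlap: one n_fft frame
        if not 0 <= self.overlap < self.block_len:
            raise ValueError("overlap must be non-negative and shorter than a block")
        self._buf = np.zeros(0, dtype=np.float32)
        self._emitted_any = False

    def push(self, chunk: np.ndarray) -> list[np.ndarray]:
        """Append a chunk and return every complete block that is now available."""
        self._buf = np.concatenate([self._buf, chunk.astype(np.float32)])
        blocks = []
        while len(self._buf) >= self.block_len:
            blocks.append(self._buf[: self.block_len].copy())
            self._emitted_any = True
            # Keep the last `overlap` samples so the next block starts slightly early.
            self._buf = self._buf[self.block_len - self.overlap :]
        return blocks

    def flush(self) -> list[np.ndarray]:
        """At end of stream, return the partial block if it holds audio not yet emitted."""
        tail, self._buf = self._buf, np.zeros(0, dtype=np.float32)
        already_seen = self.overlap if self._emitted_any else 0
        return [tail] if len(tail) > already_seen else []
```

Because block boundaries depend only on the total sample count, feeding the same audio in different chunk sizes yields the same blocks.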

Testing

  • test_helpers.py — Tests for block extraction, merging, and collapse_to_analysis (scalar ranges, determinism, noise vs sine differentiation)
  • test_provider_units.py — Tests for PCM byte conversion (16/24/32-bit, mono/stereo)
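For reference, converting interleaved signed PCM bytes to a mono float signal for the three bit depths the tests cover might look like the sketch below. The function name, the little-endian byte order, and downmixing by averaging channels are assumptions; the provider's own conversion may differ.

```python
import numpy as np


def pcm_bytes_to_mono_float(data: bytes, bit_depth: int, channels: int) -> np.ndarray:
    """Decode interleaved little-endian signed PCM into mono float32 in [-1, 1).

    Hypothetical helper: byte order and channel averaging are assumptions.
    """
    if bit_depth == 16:
        samples = np.frombuffer(data, dtype="<i2").astype(np.float32) / 32768.0
    elif bit_depth == 32:
        ints = np.frombuffer(data, dtype="<i4").astype(np.float64)
        samples = (ints / 2147483648.0).astype(np.float32)
    elif bit_depth == 24:
        # numpy has no 24-bit integer dtype: assemble each 3-byte group manually.
        raw = np.frombuffer(data, dtype=np.uint8).reshape(-1, 3).astype(np.int32)
        ints = raw[:, 0] | (raw[:, 1] << 8) | (raw[:, 2] << 16)
        ints = np.where(ints >= 1 << 23, ints - (1 << 24), ints)  # sign-extend
        samples = ints.astype(np.float32) / 8388608.0
    else:
        raise ValueError(f"unsupported bit depth: {bit_depth}")
    # Interleaved frames -> (n_frames, channels), then average channels to mono.
    return samples.reshape(-1, channels).mean(axis=1).astype(np.float32)
```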

Dependencies

github-actions bot (Contributor) commented Mar 31, 2026

🔒 Dependency Security Report

✅ No dependency changes detected in this PR.

Comment thread music_assistant/providers/sonic_analysis/helpers.py Outdated
Comment thread music_assistant/providers/sonic_analysis/__init__.py
chrisuthe force-pushed the task/sonic-analysis-provider branch 2 times, most recently from 481723c to 043220e on April 1, 2026 14:44
chrisuthe changed the title from "WIP: Feature: sonic analysis provider" to "Feature: sonic analysis provider" on Apr 2, 2026
chrisuthe force-pushed the task/sonic-analysis-provider branch from 729ed8a to 3d93152 on April 5, 2026 15:37
chrisuthe force-pushed the task/sonic-analysis-provider branch 2 times, most recently from b04676e to fa65f09 on April 21, 2026 16:24
… base class

Provider extracts audio features from PCM streams using librosa and stores
them as semantic AudioAnalysisData fields (BPM, key, mode, energy,
danceability, brightness, harmonic_complexity, roughness, rhythmic_regularity,
loudness, beats, duration, true_peak, wave_form).

Adapted to upstream AudioAnalysisProvider API:
- _start_analysis returns bool (replaces old start_analysis override)
- Uses streamdetails (not stream_details)
- Stores via mass.streams.audio_analysis.set_audio_analysis()

Empty frequency sets and flat chroma profiles produce harmless warnings
during key detection. Now suppressed with targeted warning filters and
NaN handling for zero-std correlations.
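The flat-profile problem is easy to see in a small Krumhansl-Kessler correlator: a chroma profile with zero variance makes the Pearson correlation undefined. The profile values below are the standard Krumhansl-Kessler major and minor probe-tone ratings; the function name, returning `None` for a flat or empty profile, and skipping NaN correlations are assumptions made for this sketch, not necessarily what the provider does.

```python
import warnings
from typing import Optional

import numpy as np

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
KK_MAJOR = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
KK_MINOR = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17])


def detect_key(chroma: np.ndarray) -> Optional[tuple[str, str]]:
    """Correlate the mean chroma vector (12 x n_frames input) with all 24 rotated profiles."""
    profile = np.mean(chroma, axis=1)
    if not np.all(np.isfinite(profile)) or profile.max() == profile.min():
        return None  # flat or empty profile: correlation is undefined
    best, best_r = None, -np.inf
    for mode, template in (("major", KK_MAJOR), ("minor", KK_MINOR)):
        for tonic in range(12):
            # Rotate so the template's tonic weight lands on pitch class `tonic`.
            rotated = np.roll(template, tonic)
            with warnings.catch_warnings():
                # Targeted filter: silence only numpy's RuntimeWarnings in this call.
                warnings.simplefilter("ignore", RuntimeWarning)
                r = np.corrcoef(profile, rotated)[0, 1]
            if np.isnan(r):
                continue  # skip undefined correlations instead of propagating NaN
            if r > best_r:
                best, best_r = (PITCH_CLASSES[tonic], mode), r
    return best
```

Usage: a chroma matrix with energy only on C, E and G returns `("C", "major")`, while an all-zero chroma returns `None` instead of a NaN-driven guess.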
chrisuthe force-pushed the task/sonic-analysis-provider branch from fa65f09 to 7f5d82d on April 21, 2026 16:28
…scan

Override the AudioAnalysisProvider.analyze_file hook so upstream's
AudioAnalysisController._run_background_scan can drive backfill through
the generic provider-agnostic interface. Loads audio via librosa, runs
block feature extraction and collapse, populates duration and true_peak.
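The duration and peak values that `analyze_file` populates reduce to arithmetic on the decoded samples. The sketch below is an assumption about how such values could be computed, not the provider's code. It estimates inter-sample peaks by FFT-based band-limited upsampling, which treats the signal as periodic and only approximates the oversampling filter that ITU-R BS.1770 specifies for true peak; `oversample=1` gives plain sample peak.

```python
import numpy as np


def duration_seconds(samples: np.ndarray, sample_rate: int) -> float:
    """Length of a mono signal in seconds."""
    return len(samples) / float(sample_rate)


def peak_dbfs(samples: np.ndarray, oversample: int = 4) -> float:
    """Estimate peak level in dBFS, optionally checking between samples.

    With oversample > 1 the signal is interpolated by zero-padding its
    spectrum (an approximation of a true-peak meter, not BS.1770 itself).
    """
    x = np.asarray(samples, dtype=np.float64)
    if x.size == 0:
        return float("-inf")
    if oversample > 1:
        spectrum = np.fft.rfft(x)
        # irfft zero-pads the spectrum to the longer length; its 1/n
        # normalization then needs rescaling by the oversampling factor.
        x = np.fft.irfft(spectrum, n=x.size * oversample) * oversample
    peak = np.max(np.abs(x))
    return float(20.0 * np.log10(peak)) if peak > 0 else float("-inf")
```

For a sine whose crest falls between samples, the oversampled estimate recovers the crest while the sample peak reads slightly low.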

Three cleanups in one commit:

1. Stop computing overlap fields in librosa:
     bpm            <- overlaid by smart_fades (beat_this CNN)
     key, mode      <- overlaid by smart_fades (S-KEY)
     danceability   <- overlaid by clap_analysis (zero-shot, calibrated)

   These were lower quality than their overlay sources, and the overlay
   system guaranteed replacement at vector-assembly time. Computing them
   in librosa was wasted work; leaving their AudioAnalysisData fields
   None makes the architecture honest. An installation must have the
   relevant overlay providers enabled or vectors won't assemble; the "no
   valid signatures found" diagnostic added in the previous commit tells
   the user exactly which fields are missing when that happens.

2. Remove dead-code feature extractions:
     librosa.feature.mfcc
     librosa.feature.tonnetz
     librosa.feature.spectral_rolloff
     librosa.feature.zero_crossing_rate

   These were extracted per block and stored on BlockFeatures but never
   read by collapse_to_analysis. Legacy from an earlier vector schema.
   Removing saves roughly 100ms per 10s block of analyzed audio — for
   a typical 3-min track, ~1.8s less CPU per analysis.

3. Fix pre-existing stale field names in test_helpers.py:
     rms_energy_per_second       -> rms_energy
     spectral_centroid_per_second -> spectral_centroid

   These referenced the pre-upstream-alignment field names and had been
   silently failing 2 tests since the AudioAnalysisData model was updated.

Net: -157/+69 lines in helpers.py, test surface shrunk to match.
All 102 sonic_analysis + sonic_similarity tests pass.
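The overlay arrangement in item 1 can be pictured as a field-level merge: librosa-derived values form the base, and designated providers own specific fields. The provider and field names come from the commit message; the `FIELD_OWNERS` mapping, the `merge_overlays` helper, and the rule that an owner's value replaces the base are hypothetical, not the actual overlay system.

```python
from typing import Any

# Hypothetical ownership map: which overlay provider supplies which field.
FIELD_OWNERS = {
    "bpm": "smart_fades",
    "key": "smart_fades",
    "mode": "smart_fades",
    "danceability": "clap_analysis",
}


def merge_overlays(
    base: dict[str, Any], overlays: dict[str, dict[str, Any]]
) -> tuple[dict[str, Any], list[str]]:
    """Apply owned fields from each overlay provider on top of the base analysis.

    Returns the merged fields plus the names of owned fields that are still
    None: the detail a "missing fields" diagnostic would need to report.
    """
    merged = dict(base)
    for field, owner in FIELD_OWNERS.items():
        value = overlays.get(owner, {}).get(field)
        if value is not None:
            merged[field] = value  # the owning provider's value wins
    missing = [f for f in FIELD_OWNERS if merged.get(f) is None]
    return merged, missing
```

With `clap_analysis` absent, `danceability` stays `None` and is reported by name, which is why leaving these fields unset in librosa is safe only when the overlay providers are enabled.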

extract_block_features previously called four librosa feature functions
that each computed their own STFT internally — four redundant spectrograms
per 10s block. All four (chroma_stft, spectral_contrast, spectral_centroid,
spectral_flatness) share the same default n_fft=2048 / hop_length=512, so
a single up-front STFT is the correct input for all of them via librosa's
`S=` kwarg.

Verified byte-identical output to the old per-feature path (max abs diff
= 0 on all four feature matrices). All 10 sonic_analysis tests pass
unchanged.

Measured: 1.56x speedup on a 10s block (25ms -> 16ms), i.e. ~9ms saved per
block. For a 3-min track (18 blocks), that's ~160ms saved per analyzed track.
At a user's 12k-track library scale, ~32 minutes of CPU time per full
background scan. Libraries with millions of tracks benefit proportionally.

RMS and onset_strength are left unchanged: RMS is time-domain, and
onset_strength uses a mel spectrogram with different parameters.
chrisuthe closed this Apr 27, 2026
chrisuthe deleted the task/sonic-analysis-provider branch May 4, 2026 19:49

3 participants