Introducing Sonilo v1.1
Our most capable video-to-music model
- Written by Sonilo Team
- Published
Sonilo v1.0 proved a simple idea: the information needed to score a video already lives inside the video. v1.1 keeps that foundation and carries more of the work for you. It aligns to your footage more tightly, preserves the voices already in your video, and, when you want creative direction, lets you shape the score scene by scene. Instead of generating one track and hoping it fits, you can hand Sonilo a full edit, with dialogue and structure intact, and trust it to deliver a soundtrack that lands on every cut.
In side-by-side evaluation against v1.0, v1.1 wins where real video work is hardest. Across the four dimensions we annotate for video-to-music (rhythm alignment, emotional fit, prompt adherence, and musicality), evaluators preferred v1.1 or rated the two equal in 70 to 78% of comparisons. When they expressed a preference, they chose v1.1 roughly 1.6 times as often as v1.0.
- 70–78% of comparisons preferred v1.1, or rated it equal to v1.0, across all four video-to-music dimensions
- 1.6× more often, evaluators chose v1.1 over v1.0 when they expressed a preference
- 50% vs 30% prompt-adherence preference, v1.1 versus v1.0, the widest margin we measured
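The prompt-adherence figures can be unpacked with a little arithmetic. The sketch below assumes each comparison ends in exactly one of three outcomes (v1.1 preferred, v1.0 preferred, or rated equal). The announcement implies this but does not state it. The function name is illustrative only.

```python
def summarize(pref_new: float, pref_old: float) -> dict[str, float]:
    """Derive tie share, preferred-or-equal share, and win ratio from two
    preference percentages, assuming the three outcomes sum to 100%."""
    tie = 100.0 - pref_new - pref_old
    return {
        "tie": tie,                              # rated equal
        "new_or_equal": pref_new + tie,          # v1.1 preferred or equal
        "ratio": round(pref_new / pref_old, 2),  # v1.1 wins per v1.0 win
    }


# Prompt adherence: 50% preferred v1.1, 30% preferred v1.0.
stats = summarize(50.0, 30.0)
print(stats)  # {'tie': 20.0, 'new_or_equal': 70.0, 'ratio': 1.67}
```

Under that assumption, prompt adherence sits at the 70% end of the 70–78% range. Its per-dimension ratio of about 1.67 is close to the 1.6× headline figure, though how the headline figure is aggregated is not stated.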
Hear the upgrade
The same clip, two versions
Identical footage, scored by v1.0 and by v1.1. Press play to run both in sync, then tap a side to compare the music one version at a time.
Capability 01
Sharper audio alignment
Alignment has always been the core of Sonilo, and in v1.1 it gets noticeably tighter. The model locks beats, builds, and transitions to the pacing of your visuals more precisely than v1.0, following every cut and shift in energy across the entire timeline so the music feels written for that exact edit rather than dropped on top of it. It still resolves with a natural musical ending instead of a hard cut or loop.
The gains show up directly in evaluation. Annotators preferred v1.1, or called it equal to v1.0, on rhythm alignment in 78% of comparisons and on emotional fit in 70%, the two dimensions that most determine whether a score feels locked to the video. The demo above is the same improvement, heard rather than measured.
Capability 02 · New in v1.1
Vocal-preserved generation
New in v1.1, Sonilo can separate the original speech in your video and keep it intact over a freshly generated track. Narration, dialogue, and on-camera voice stay clear and front-and-center while new music fills in underneath, with no manual ducking, re-recording, or audio cleanup.
It is built for the formats where the original voice has to stay: vlogs, interviews, ads, tutorials, and short-form social content.

Capability 03 · New in v1.1
Segment-level control
For creators who want precise direction, v1.1 lets you slice the timeline and assign a separate prompt to each segment, generating music scene by scene. Steer style, mood, and instrumentation per section, and shape musical structure directly with labels like intro, verse, chorus, and bridge. You get automatic scoring when you want speed and granular control when you want intent, and the video stays the starting point either way.
This is where v1.1 improved most. On prompt adherence, annotators preferred v1.1 in 50% of comparisons versus 30% for v1.0, the widest margin of any dimension we measured, and a direct result of the new per-segment control.
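To make the idea of a segmented timeline concrete, here is a minimal sketch of how a per-segment plan might be represented and checked before submission. Everything in it is hypothetical: the `Segment` fields, the `validate_plan` helper, and the rule that segments must tile the video without gaps are assumptions for illustration, not Sonilo's actual schema or API. Only the structure labels and the 600-second limit come from this announcement.

```python
from dataclasses import dataclass

# Structure labels named in the announcement; treating them as a closed set
# is an assumption made for this sketch.
STRUCTURE_LABELS = {"intro", "verse", "chorus", "bridge"}
# The announcement states that the API supports videos up to 600 seconds.
MAX_VIDEO_SECONDS = 600.0


@dataclass(frozen=True)
class Segment:
    """Hypothetical per-segment instruction; not Sonilo's actual schema."""
    start: float  # seconds from the start of the video
    end: float    # seconds, exclusive
    prompt: str   # style, mood, and instrumentation for this section
    label: str    # musical structure label, e.g. "chorus"


def validate_plan(segments: list[Segment], video_seconds: float) -> None:
    """Raise ValueError unless the segments cover the timeline in order."""
    if not 0 < video_seconds <= MAX_VIDEO_SECONDS:
        raise ValueError("video length must be in (0, 600] seconds")
    if not segments:
        raise ValueError("plan needs at least one segment")
    cursor = 0.0
    for seg in segments:
        if seg.label not in STRUCTURE_LABELS:
            raise ValueError(f"unknown structure label: {seg.label!r}")
        if seg.start != cursor:
            raise ValueError(f"gap or overlap at {seg.start}s")
        if seg.end <= seg.start:
            raise ValueError(f"empty segment at {seg.start}s")
        cursor = seg.end
    if cursor != video_seconds:
        raise ValueError("segments must end where the video ends")


plan = [
    Segment(0.0, 8.0, "sparse piano, quiet anticipation", "intro"),
    Segment(8.0, 30.0, "warm acoustic guitar, steady pulse", "verse"),
    Segment(30.0, 45.0, "full band, uplifting and bright", "chorus"),
]
validate_plan(plan, video_seconds=45.0)  # passes silently for a valid plan
```

Checking the plan locally like this would catch gaps, overlaps, and unknown labels before any request is made. Whether the real service requires full coverage of the timeline is not stated here.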

Built right
Built on professionally licensed music
Like v1.0, every soundtrack from v1.1 is original, production-ready, and cleared for commercial use, with no additional licensing required, whether for social content, branded video, games, or broadcast.
Most AI music tools treat licensing as an afterthought, training on copyrighted content without authorization. Sonilo was built differently. Through partnerships like our agreement with Shutterstock, v1.1 is trained on content that artists have consented to and been compensated for, and every generation runs through content-ID and moderation checks before it reaches you. We believe the industry does not have to choose between innovation and integrity, and v1.1 is built to prove it.


Availability
Rolling out now
sonilo.com
Generate soundtracks directly from your videos.
Sonilo API
Video-to-music and text-to-music endpoints for developers and platforms, with videos supported up to 600 seconds.
Partner platforms
Available on Scenario and ComfyUI, with more integrations planned through the rest of the year.