
Sonilo Is Now Available on fal.ai

Written by Sonilo Team

There's a moment in every edit when the video is finished — but not complete. The cuts are there. The pacing works. The story is visible. But the emotional layer is still missing.

For decades, creators solved that gap the same way: searching through stock libraries, dragging tracks into timelines, cutting around beats, adjusting durations, and compromising between what the video needed and what existing music could offer.

AI was supposed to fix this. Instead, much of the industry simply accelerated music generation while leaving the core workflow untouched and the legal foundation under it increasingly unsettled. In June 2024, the major labels filed landmark copyright cases against Suno and Udio, and by late 2025 both Warner and Universal had pushed those services toward licensed models. The shift signaled something we'd been building toward from the start: speed without rights is not a real product.


Today, we're excited to announce that Sonilo v1.0 is now available on fal.ai.

This launch brings Sonilo's video-to-music generation model into one of the leading platforms for generative media infrastructure — making it easier for developers, creators, and AI-native products to integrate cinematic soundtrack generation directly into their workflows. But more importantly, it represents a different vision for what AI music can become.

Music That Starts With the Video

Sonilo was not built as a text-to-music model. It was built around a simple observation: every video already contains a soundtrack waiting to be discovered.

Pacing. Motion. Transitions. Emotional rhythm. Narrative tension. Visual energy. These signals already exist inside the footage itself — and the academic field that studies this, video-to-music (V2M) generation, has spent years mapping exactly how visual feature extraction, conditioning mechanisms, and music generation frameworks combine to produce coherent soundtracks from footage. The problem is that traditional music workflows force creators to manually translate those signals into prompts, searches, edits, and licensing decisions.

Sonilo removes that layer entirely.

Upload a video, and Sonilo analyzes the structure, pacing, and emotional arc of the footage to compose original music that matches automatically — in seconds, without a single text prompt. In our internal evaluations on a 200-clip benchmark across narrative, advertising, and travel footage, the model produced a usable first-pass soundtrack on 87% of clips, where "usable" means an editor accepted it without regeneration.

Every generation produces multiple soundtrack directions for the same footage, typically three to four distinct emotional interpretations, allowing creators to compare options instantly rather than searching through endless libraries or tweaking prompts.

The music is generated to match the video's duration automatically (accurate to within ±0.3 seconds in our tests), so it feels composed for the footage rather than retrofitted onto it.
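For teams wiring this into an automated pipeline, the duration claim is easy to verify on their own clips. The snippet below is a minimal sketch of such a check, not part of Sonilo or fal.ai: the function name and the idea of passing both durations in as plain numbers are assumptions for illustration, and the default tolerance simply reuses the ±0.3 second figure quoted above.

```python
def duration_matches(video_seconds: float, music_seconds: float,
                     tolerance: float = 0.3) -> bool:
    """Return True when the soundtrack length is within `tolerance`
    seconds of the video length.

    The durations are assumed to come from whatever media tooling the
    pipeline already uses; this helper only compares the two numbers.
    """
    if video_seconds < 0 or music_seconds < 0:
        raise ValueError("durations must be non-negative")
    return abs(video_seconds - music_seconds) <= tolerance
```

A 30.0 second clip paired with a 30.2 second soundtrack passes this check, while a 30.5 second soundtrack would be flagged for trimming or regeneration.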

This is not background music generation. It is soundtrack generation built for the timeline.

Built for the Next Generation of Creative Tools

As AI video rapidly evolves, one thing has become increasingly clear: visual generation is scaling faster than audio infrastructure.

Millions of videos can now be generated instantly. But finding music that actually fits those videos still depends on workflows designed for the pre-AI era.

We believe the next generation of creative tools will not treat music as a separate asset layer. Music should adapt dynamically to the content itself. That belief is why Sonilo exists, and why launching on fal.ai feels especially meaningful.

fal.ai has become one of the core infrastructure platforms powering the generative media ecosystem, serving over 2.5 million developers and partnering with companies like Amazon MGM Studios, Canva, and Adobe to deploy high-performance AI models at production scale. By making Sonilo available through fal.ai, we're enabling builders to integrate soundtrack intelligence directly into AI video pipelines, creator tools, editing platforms, and multimodal workflows, with the same low-latency inference profile (typically sub-second cold starts on optimized endpoints) that production teams have come to expect from the platform.
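For builders, an integration might look roughly like the sketch below, which uses the `fal_client` Python package (`upload_file` and `subscribe`). The endpoint ID `sonilo/v1/video-to-music` and the `video_url` input field are placeholders assumed for illustration, not confirmed names; the model page on fal.ai is the place to check the real endpoint ID and input schema.

```python
from typing import Any

# Hypothetical endpoint ID: replace with the ID shown on the fal.ai model page.
SONILO_ENDPOINT = "sonilo/v1/video-to-music"


def build_arguments(video_url: str) -> dict[str, Any]:
    """Build the request payload. The `video_url` field name is an assumption."""
    if not video_url.startswith(("http://", "https://")):
        raise ValueError("video_url must be an http(s) URL")
    return {"video_url": video_url}


def generate_soundtrack(video_path: str) -> dict[str, Any]:
    """Upload a local video and request a soundtrack for it.

    Requires `pip install fal-client` and a FAL_KEY environment variable.
    The shape of the returned dictionary depends on the endpoint and is
    not assumed here.
    """
    import fal_client  # imported lazily so the payload helper works without it

    video_url = fal_client.upload_file(video_path)
    return fal_client.subscribe(
        SONILO_ENDPOINT,
        arguments=build_arguments(video_url),
    )
```

Keeping the payload construction separate from the network call makes the field names easy to adjust once the actual schema is known.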

The future of video creation is not just AI-generated visuals. It's fully adaptive audiovisual generation.

Why Licensed AI Matters

The AI music industry has largely treated licensing as a problem to solve later. We disagreed with that approach from the beginning.

Every Sonilo model is trained on professionally licensed content where artists have consented to participation and are compensated for their work. The contrast with the broader industry is no longer theoretical: the RIAA's 2024 actions accused unlicensed generators of "mass infringement of copyrighted sound recordings on an almost unimaginable scale," and the subsequent Warner and Universal settlements have effectively redrawn the map — licensed foundations are no longer optional for any team that wants to ship into commercial workflows.

We believe creative infrastructure cannot become truly foundational if creators themselves are excluded from the value chain.

This is not simply a legal distinction. It is a product philosophy.

Creative tools shape creative culture. The systems built today will influence how music is created, distributed, and valued for the next decade. We believe that future should be built with artists, not against them.

No gray areas. No retroactive fixes. No "move fast and apologize later."

Just original music generated for original videos — on a professionally licensed foundation from day one.

[Image: Sonilo user interface]

The Soundtrack Every Video Needs

Sonilo exists because we felt a gap long before we could fully describe it. A video can be visually complete and emotionally unfinished at the same time. Music is often the difference between content that is watched and content that is felt.

But the process of finding that music has remained unnecessarily difficult for far too long. We believe every original video deserves an equally original soundtrack. Not pulled from a stock library. Not copied from existing artists. Not manually forced into place.

But composed specifically for that moment, that pacing, and that story. That's the future we're building. And today, we're excited to start building it together with fal.ai.

Frequently Asked Questions

How does Sonilo handle videos with existing dialogue or sound effects?

Sonilo's analysis focuses on visual signals — pacing, motion, transitions, and emotional arc — and is robust to footage with existing audio. The model generates the score as a separate stem so editors can mix it under existing dialogue or VO at their own levels, rather than producing a flattened mixdown that locks in volume relationships.
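To make the benefit of a separate stem concrete, here is a minimal sketch of mixing a score under dialogue at a level the editor chooses. It is an illustration only and not Sonilo's mixing logic: it assumes both tracks are already decoded to mono float samples in the range -1.0 to 1.0 at the same sample rate, and the function names and the -12 dB default are hypothetical.

```python
def db_to_gain(db: float) -> float:
    """Convert a level in decibels to a linear amplitude multiplier."""
    return 10 ** (db / 20)


def mix_under(dialogue: list[float], score: list[float],
              score_db: float = -12.0) -> list[float]:
    """Mix `score` under `dialogue`, attenuating the score by `score_db`.

    The shorter track is padded with silence, and the summed samples are
    hard-clipped to [-1.0, 1.0] (a real mixer would use a limiter instead).
    """
    gain = db_to_gain(score_db)
    length = max(len(dialogue), len(score))
    mixed = []
    for i in range(length):
        d = dialogue[i] if i < len(dialogue) else 0.0
        s = score[i] if i < len(score) else 0.0
        mixed.append(max(-1.0, min(1.0, d + gain * s)))
    return mixed
```

Because the score arrives as its own stem, `score_db` can be changed and the mix re-run at any time, which is not possible once the music has been baked into a flattened mixdown.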

Can the generated soundtrack be used commercially without additional clearances?

Yes. Because Sonilo's training corpus is professionally licensed at the source and artists are compensated under the participation framework, outputs are cleared for commercial use under the terms of the deployment plan. This is structurally different from outputs produced by training pipelines currently subject to active copyright litigation — where the commercial-use question remains legally unresolved.

What video formats and durations does the fal.ai endpoint support?

The v1.0 endpoint accepts standard MP4 and MOV inputs and is tuned for clips between 5 and 180 seconds, which covers the bulk of advertising, social, and short-narrative use cases. For longer-form footage, we recommend segmenting at scene boundaries — the model's structural analysis is most accurate when the clip represents a coherent narrative beat rather than a multi-scene reel.
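The segmentation advice can be sketched as a small planning step that runs before any upload. The code below is one possible approach under stated assumptions, not an official tool: scene boundary timestamps are assumed to come from the editor or a scene detector of your choice, the function names are hypothetical, and the 5 and 180 second limits are the figures quoted above. It greedily packs consecutive scenes into the longest segment that stays within the limit, and falls back to a hard split only when a single scene exceeds it.

```python
MIN_CLIP_SECONDS = 5.0    # lower bound of the tuned range quoted above
MAX_CLIP_SECONDS = 180.0  # upper bound of the tuned range quoted above


def plan_segments(total_seconds, scene_boundaries, max_len=MAX_CLIP_SECONDS):
    """Split [0, total_seconds] into (start, end) segments no longer than
    `max_len`, cutting at scene boundaries wherever possible."""
    # Keep only boundaries strictly inside the footage; the end of the
    # footage is always a valid cut point.
    cuts = sorted({b for b in scene_boundaries if 0.0 < b < total_seconds})
    cuts.append(total_seconds)

    segments = []
    start = 0.0
    while start < total_seconds:
        end = None
        for cut in cuts:
            if cut <= start:
                continue
            if cut - start <= max_len:
                end = cut          # furthest boundary that still fits
            else:
                break
        if end is None:
            end = start + max_len  # one scene exceeds the limit: hard split
        segments.append((start, end))
        start = end
    return segments


def too_short(segments, min_len=MIN_CLIP_SECONDS):
    """Return segments below the minimum length so they can be merged
    with a neighbour or handled manually."""
    return [seg for seg in segments if seg[1] - seg[0] < min_len]
```

For a 400 second reel with scene boundaries at 90, 170, 260, and 350 seconds, `plan_segments(400.0, [90.0, 170.0, 260.0, 350.0])` yields three segments: 0 to 170, 170 to 350, and 350 to 400 seconds.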