A new era in music: Inside Suno, the game-changing start-up bringing ChatGPT to the music industry

Unlock a new music era with Suno, a pioneering startup revolutionizing the industry with ChatGPT technology.

Mar 18, 2024 - 16:48

Mar 19, 2024 - 12:43


The lyric "I'm just a soul trapped in this circuitry" is delivered with raw emotion, accompanied by the plaintive tones of a lone acoustic guitar. However, there's no human performer or physical guitar involved. This blues song, titled "Soul of the Machine," was generated in just 15 seconds by Suno's latest AI model. The model worked with OpenAI's ChatGPT to create the lyrics and title, based on a simple text prompt requesting a solo acoustic Mississippi Delta blues song about a sad AI.

Online, Suno's creations are eliciting reactions like "How the heck is this even real?" As this specific track reverberates through a Sonos speaker in a conference room at Suno's interim headquarters near the Harvard campus in Cambridge, Massachusetts, even some of the minds behind the technology are feeling a slight unease. There's a mix of nervous laughter and murmurs of "Wow" and "Oh my." It's mid-February, and we're experimenting with their latest model, V3, which is still a couple of weeks away from its public debut. Remarkably, it only took three attempts to achieve that surprising outcome. The first two were good, but a simple adjustment to my prompt—co-founder Keenan Freyberg suggested adding the term "Mississippi"—yielded something much more uncanny.

In the past year, artificial intelligence has made significant advances in generating credible content across media such as text, images (as seen with services like Midjourney), and even video, notably with OpenAI's latest tool, Sora. However, progress in AI-generated audio, especially music, has been slower. Suno seems to be making strides in AI music creation, with its founders envisioning a future where music production is democratized on a massive scale. Co-founder Mikey Shulman, a youthfully charming Harvard physics Ph.D., has ambitious goals. He imagines a scenario where billions of people worldwide pay a modest subscription fee to Suno, enabling them to create their own music. Shulman highlights the current imbalance between the many people who listen to music and the few who make it, and argues that Suno can narrow that gap.

Most AI-generated artwork thus far has been characterized, at best, as kitsch, resembling hyperrealistic sci-fi imagery often dominated by form-fitting spacesuits, a trend commonly observed among Midjourney users. However, "Soul of the Machine" stands out as something distinct — it is perhaps the most potent and unsettling AI creation across all mediums I've encountered. Its mere existence feels like a rupture in reality, simultaneously evoking awe and a sense of unease. It brings to mind Arthur C. Clarke's famous quote, perfectly suited for the era of generative AI: "Any sufficiently advanced technology is indistinguishable from magic." Weeks after my return from Cambridge, I share the song with Living Colour guitarist Vernon Reid, known for his candid discussions on the potentials and risks of AI-generated music. Reid expresses a mixture of wonder, shock, and horror at the song's disturbing realism. He reflects on the implications of AI taking over creative expression, highlighting the ethical complexities of an AI mimicking the blues, a musical genre deeply rooted in African American history, trauma, and oppression.

Suno is a young company, founded just under two years ago. Its co-founders, Shulman, Freyberg, Georg Kucsko, and Martin Camacho, are all experts in machine learning. Before founding Suno, they worked together at Kensho Technologies, a Cambridge-based company focused on developing AI solutions for complex business challenges. During their time there, Shulman and Camacho, both musicians, would often play music together. At Kensho, the team developed transcription technology aimed at accurately capturing the content of public companies' earnings calls. The work was difficult because of poor audio quality, technical jargon, and diverse accents.

Throughout their journey, Shulman and his colleagues became fascinated by the untapped potential of AI in audio applications. According to Shulman, the field of AI research has historically lagged behind in audio compared to images and text. He notes, "There’s so much that we learn from the text community and how these models work and how they scale."

Initially, Suno's founders explored a range of interests that could have led them in different directions. Although their ultimate goal was to develop a music-related product, their early brainstorming sessions included ideas for a hearing aid and even using audio analysis to detect malfunctioning machinery. However, their debut release turned out to be Bark, a text-to-speech program. Feedback from early Bark users revealed a strong demand for a music generator, prompting the team to conduct initial experiments in that direction, which showed promising results.

Suno takes an approach similar to that of large language models like ChatGPT, which break human language down into discrete units called tokens and then generate responses based on learned patterns. However, audio, especially music, presents a significantly more complex challenge. Last year, experts in AI-generated music cautioned that a service as sophisticated as Suno's might take years to develop. Shulman explains, "Audio is not a discrete thing like words. It’s a wave. It’s a continuous signal." High-quality audio is typically sampled at 44.1 kHz or 48 kHz, which, if each sample were treated as a token, translates to "48,000 tokens a second." This volume of data is a significant obstacle, one that requires extensive work, heuristics, and a combination of techniques and models to address.
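The scale of that figure can be checked with a little arithmetic. The sketch below assumes, purely for illustration, that every audio sample becomes one token; the function name and the 120-second clip length are hypothetical and are not drawn from Suno's system.

```python
def naive_token_count(sample_rate_hz: int, duration_s: float, channels: int = 1) -> int:
    """Tokens needed if each sample in each channel became one token (an assumption)."""
    return int(sample_rate_hz * duration_s * channels)


# Compare the two common sampling rates for one second and for a
# hypothetical two-minute mono clip.
for rate in (44_100, 48_000):
    per_second = naive_token_count(rate, 1)
    per_clip = naive_token_count(rate, 120)
    print(f"{rate} Hz: {per_second:,} tokens per second, {per_clip:,} for a 120 s clip")
```

Under this naive mapping, a two-minute clip would run to millions of tokens, far more than the text of a typical chat exchange, which is why the raw signal cannot simply be fed to a language-model-style system one sample at a time.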

Looking ahead, Suno aims to move beyond the text-to-music interface by introducing more advanced and intuitive inputs. One idea under consideration is generating songs based on users' own singing, representing a step toward more personalized and interactive music creation experiences.

OpenAI is currently entangled in multiple legal disputes concerning the use of copyrighted material, including books, news articles, and other proprietary content, in the training data for its ChatGPT model. While Suno's founders are tight-lipped about the specifics of their own model's training data, they do disclose that part of its capability to produce convincing human vocals stems from its analysis of speech recordings, in addition to musical content. Shulman explains, "Studying raw speech aids in understanding the nuances of the human voice that are challenging to grasp."

Among Suno's earliest investors is Antonio Rodriguez, a partner at venture capital firm Matrix. Rodriguez had previously invested in The Echo Nest, a music-categorization company acquired by Spotify to enhance its algorithms. Although Suno had not yet settled on a product at the time, Rodriguez backed the founders out of confidence in their abilities. He remarks, "I invested in the team." Rodriguez, known for his track record of successful investments, particularly praises Shulman's creativity and says he would have supported him in almost any endeavor within legal bounds.


Rodriguez is investing in Suno fully aware of the potential legal risks, acknowledging that music labels and publishers could potentially file lawsuits. He views this risk as part of the investment, recognizing that as the primary financial backer, his firm would likely be targeted in any legal actions. Rodriguez explains, "If we had secured deals with labels from the outset, I probably wouldn't have invested. They needed to develop this product without such constraints." (A spokesperson for Universal Music Group, known for its aggressive stance on AI, did not respond to requests for comment.)

Suno maintains that it is in ongoing communication with major labels and emphasizes its respect for artists and intellectual property. Their tool does not allow users to request specific artists' styles in prompts, nor does it utilize real artists' voices. Many Suno employees have backgrounds in music, evident by musical instruments in the office and framed images of classical composers on the walls. The founders demonstrate none of the adversarial attitude towards the music industry seen, for example, with Napster prior to its legal battles. Rodriguez adds, "This doesn't mean we won't face lawsuits. It just means we won't adopt a confrontational stance."

Rodriguez perceives Suno as a highly capable and user-friendly musical instrument, likening its potential impact to the democratization of photography by camera phones and Instagram. He envisions Suno expanding the pool of creators on the internet, shifting the balance away from passive consumption. Rodriguez and the founders even suggest that Suno could attract a user base larger than Spotify's. While this notion may initially seem improbable, Rodriguez welcomes such seemingly "stupid" ideas as an investor, recognizing that many successful companies initially appeared foolish until their value became apparent. He emphasizes the importance of exceptional talent combined with unconventional yet compelling ideas.

Well before Suno’s arrival, musicians, producers, and songwriters were vocally concerned about AI’s business-shaking potential. “Music, as made by humans driven by extraordinary circumstances … those who have suffered and struggled to advance their craft, will have to contend with the wholesale automation of the very dear-bought art they have fought to achieve,” Reid writes. But Suno’s founders claim there’s little to fear, using the metaphor that people still read despite having the ability to write. “The way we think about this is we’re trying to get a billion people much more engaged with music than they are now,” Shulman says. “If people are much more into music, much more focused on creating, developing much more distinct tastes, this is obviously good for artists. The vision that we have of the future of music is one where it’s artist-friendly. We’re not trying to replace artists.”

Though Suno is hyperfocused on reaching music fans who want to create songs for fun, it could still end up causing significant disruption along the way. In the short term, the segment of the market for human creators that seems most directly endangered is a lucrative one: songs created for ads and even TV shows. Lucas Keller, founder of the management firm Milk and Honey, notes that the market for placing well-known songs will remain unaffected. “But in terms of the rest of it, yeah, it could definitely put a dent in their business,” he says. “I think that ultimately, it allows a lot of ad agencies, film studios, networks, etc., to not have to go license stuff.”

In the absence of strict rules against AI-created content, there’s also the prospect of a world where users of models like Suno’s flood streaming services with their robo-creations by the millions. “Spotify may one day say ‘You can’t do that,’” Shulman says, noting that so far Suno users seem more interested in just texting their songs to a few friends.

At present, Suno has a small team of approximately 12 employees, but it plans to expand. Construction is underway on a significantly larger permanent headquarters on the top floor of the same building as its current temporary office. During a tour of the unfinished floor, Shulman points out an area designated to become a full recording studio. Considering Suno's capabilities, one might wonder why the company needs it. Shulman admits, "It's primarily intended as a listening room. We prioritize creating a conducive acoustic environment. However, we also enjoy making music ourselves, without AI."

Suno's most significant potential rival is Google's Dream Track, which has secured licenses allowing users to create their own songs with well-known voices such as Charlie Puth's through a similar prompt-based interface. However, Dream Track has been released only to a small test group of users, and the samples shared so far do not sound as impressive as Suno's, despite featuring renowned voices. Shulman is skeptical that generating new songs by artists like Billy Joel will hold much appeal, stating, "I don't believe that creating new Billy Joel songs is how people envision interacting with music with AI assistance moving forward." He envisions a future where people engage with music by creating entirely new compositions, reflecting the unique ideas and melodies in their minds.