“The Best Thing I Never Said”: Understanding AI Voice Technology and Its Legal Implications on the Music Industry

Oct 24

Move over, Hollywood: there's a new star in town, and it doesn't need to spend weeks in the studio, take hours to memorize a script, or request “just one more take.” As AI voice technology steals the limelight, the entertainment landscape is undergoing a futuristic shift in more ways than one. From narrating documentaries to lending charisma to virtual characters, the uncannily human tones of artificial intelligence are rewriting show business as we know it. AI voices are not just reading the lines; they are stealing the final scene, one “byte” at a time!

How it Works

AI voice cloning technology has ushered in a new era of intuitive, seamless human-to-machine interaction. Whereas traditional text-to-speech technology reads text aloud in a humanized, computer-generated voice, voice cloning renders that text in the voice of a specific human being.[1] Technically, the output of AI voice cloning is still synthetic; however, it is expected to sound more human than traditional text-reading voices[2] because the model is trained on a real human voice rather than generated entirely from scratch.[3]

First, to replicate a voice precisely, the system must collect and store large volumes of recordings of the target voice.[4] The AI software picks up specific vocal idiosyncrasies such as tone, pitch, accent, and range.[5] Because languages contain a vast range of sounds, a diverse dataset is crucial at the start of the replication process,[6] so that the model can later reproduce any sound a given text requires.[7] Next, the software breaks the data into individual sound segments and labels each one with its corresponding phoneme.[8] Once the system has been trained on this dataset, it can produce an AI voice that sounds just like the original speaker reading any text imaginable.[9] The final stage is a post-processing step that enhances the quality and naturalness of the generated speech.[10] This step essentially puts a bow on the output, ensuring that the cloned voice is clean, clear, and indistinguishable from the original.[11]
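The stages described above (collecting recordings, labeling segments by phoneme, training, and post-processing) can be illustrated with a deliberately simplified toy in Python. This is a hypothetical sketch, not how any real product works: actual voice-cloning systems train neural networks on rich audio features, whereas here every name (`LabeledClip`, `train`, `synthesize`, `post_process`), the ARPAbet-style phoneme labels, and the pitch values are invented for illustration. Each recorded phoneme is reduced to a short pitch contour, “training” simply averages the contours for each phoneme, and post-processing smooths the joins with a moving average.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class LabeledClip:
    """One phoneme-labeled segment of the target speaker's recordings.

    Toy assumption: a segment is reduced to a pitch contour (Hz values);
    real systems model far richer acoustic features.
    """
    phoneme: str
    pitch: list[float]


def missing_phonemes(clips, required):
    """Coverage check: phonemes the text needs but the dataset never recorded."""
    covered = {clip.phoneme for clip in clips}
    return sorted(set(required) - covered)


def resample(values, frames):
    """Pick `frames` (>= 2) evenly spaced samples so contours line up."""
    if len(values) == 1:
        return values * frames
    step = (len(values) - 1) / (frames - 1)
    return [values[round(i * step)] for i in range(frames)]


def train(clips, frames=3):
    """Toy 'training': average every recording of a phoneme into one template."""
    grouped = defaultdict(list)
    for clip in clips:
        grouped[clip.phoneme].append(resample(clip.pitch, frames))
    return {ph: [mean(col) for col in zip(*contours)] for ph, contours in grouped.items()}


def synthesize(model, phonemes):
    """Stitch learned templates together for new input (KeyError if uncovered)."""
    contour = []
    for ph in phonemes:
        contour.extend(model[ph])
    return contour


def post_process(contour):
    """Toy post-processing: 3-point moving average to smooth the joins."""
    return [mean(contour[max(0, i - 1): i + 2]) for i in range(len(contour))]


# Hypothetical sample data for illustration only.
example_clips = [
    LabeledClip("HH", [120.0, 125.0, 130.0]),
    LabeledClip("HH", [124.0, 127.0, 130.0]),
    LabeledClip("AY", [140.0, 150.0, 145.0, 135.0, 130.0]),
]

print(missing_phonemes(example_clips, ["HH", "AY", "OW"]))  # ['OW']
model = train(example_clips)
print(model)  # {'HH': [122.0, 126.0, 130.0], 'AY': [140.0, 145.0, 130.0]}
print(post_process(synthesize(model, ["HH", "AY"])))
```

Even at this scale, the sketch shows why a diverse dataset matters: “OW” never appears in the sample recordings, so `missing_phonemes` flags it, and `synthesize` would raise a `KeyError` for any text that needs that sound.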

Application of AI Voice Cloning Technology: Pros and Cons

As with any new technology entering everyday life, there is usually a struggle to balance innovation with responsible use. Beyond the obvious gains in efficiency, AI voice cloning has directly addressed a range of accessibility concerns. According to an NPR report on the use of AI voices in the medical field, some patients who have speech impairments, or who have been told they could permanently lose their voices, have turned to AI technology to preserve them.[12] For example, actor Val Kilmer, who lost his voice to throat cancer a few years ago, has been able to reclaim something approaching his former voice through AI software trained on earlier recordings of his speech.[13] Additionally, in the post-COVID-19 era of education, voice cloning has expanded global access to the classroom. Teachers can connect with students internationally, using synthetic versions of their own voices to speak other languages effortlessly.[14] Furthermore, students who struggle with speaking or classroom communication can take part more actively in discussions, connecting freely with their teachers and peers.[15]

While AI voice technology undeniably introduces groundbreaking capabilities, its proliferation raises significant ethical and practical concerns that merit careful consideration. The most obvious concern across industries is the use of “deepfakes” to spread false or defamatory information.[16] In the context of AI voice cloning, deepfakes are “a synthetic media where a person’s likeness is replaced with someone else’s, creating convincing fake audio or video clips.”[17] One widely reported example was a deepfake of Barack Obama warning viewers about the dangers of fake news, voiced by comedic actor and film director Jordan Peele.[18] This presidential deepfake sparked a broader discussion about being wary of machines that sound a little too much like humans.[19] Cybersecurity experts have also warned the public about scammers using AI to clone children’s voices to defraud their parents.[20] One ABC News report described how just a few seconds of a social media video can give scammers everything they need to recreate someone’s voice with artificial intelligence.[21] A terrified mother recounted a fake ransom call in which she heard her daughter’s panicked voice in the background before a scammer came on the line and demanded $1 million: “This man gets on and he says, ‘Listen here. I have your daughter. If you call anybody, you call the police, I'm gonna pump your daughter so full of drugs, I'm gonna have my way with her, I'm gonna drop her in Mexico, and you're never gonna see your daughter again.’”[22] Consequently, as parents become more familiar with the jarring capabilities of voice cloning, some are adopting “safe words” to confirm they are really speaking with a loved one who has supposedly been kidnapped.[23] As voice cloning technology and its uses progress, it will be important to balance its incredible possibilities against its ethical pitfalls.

AI Voice Cloning v. The Music Industry

As expected, the emergence of AI voice cloning has swiftly reshaped the way people create, experience, and interact with music. Voice cloning has let fans hear Rihanna sing The Little Mermaid’s “Part of Your World,”[24] Drake rap Beyoncé’s “Heated,”[25] and even the Minions from Despicable Me take on Bad Bunny’s famous “Te Boté”[26] remix. Some industry icons are experimenting with the technology and adding AI to their creative toolbox, while others worry that their “moneymakers” are being taken without compensation.[27] From a music management perspective, AI’s capacity to generate music tirelessly, without breaks and at minimal cost, stands in stark contrast to hiring human artists.[28] Industry producers note that “[w]hile a human artist may require years of practice and substantial financial investment to produce an album, AI can churn out a comparable album in minutes without any human-associated costs.”[29] This efficiency could drastically and permanently undercut the value of human-generated music.

In 2022, Capitol Records shocked the industry by signing an entirely virtual AI rapper, “FN Meka.”[30] Although initially praised for this innovative move, Capitol Records quickly faced backlash from all sides and ultimately terminated Meka’s contract.[31] At the center of the controversy was what critics called “digital blackface.”[32] FN Meka was portrayed as a Black male cyborg with braided green hair and tattoos covering his face and body.[33] Furthermore, Meka often used the N-word in his songs and rapped almost exclusively about topics like police brutality and racism.[34] Critics argued that Meka’s persona and output were built on African American stereotypes and amounted to an extreme example of cultural appropriation, making it entirely unethical for a record label to control and profit from it.[35] Additionally, fans discovered that the human performer behind FN Meka’s vocals did not receive any public credit or monetary compensation from Factory New (the creators of FN Meka) or Capitol Records.[36] This incident is just one of many examples of how unregulated use of AI in music can lead to unexpected polarization and confusion over compensation.

Legal Implications in Music

As discussed above, one of the latest uses of voice AI technology is tuning deepfake vocal synthesizers to sound exactly like the famous voices of the industry’s top artists.[38] With a growing number of artists discovering songs on social media platforms that they never actually sang, there has been a massive push for legal regulation of the new technology.[39] One of these disgruntled artists is rap royalty Jay-Z, whose highly recognizable voice was anonymously cloned to rap Shakespeare’s famous “To Be, or Not To Be” soliloquy.[40] As the lyric video started to grab fans’ attention on YouTube, Jay-Z’s record label demanded that it be removed, citing copyright infringement.[41] The label representatives’ argument rested on the idea that “if a voice is cloned without the individual's explicit permission, it violates their right to control the use of their personal identity.”[42] Many artists’ financial success is directly tied to the monetization of their personal identity, and any threat to that could be catastrophic for their careers and the future of their craft. Eventually, finding no valid legal grounds for removal, YouTube reinstated the Jay-Z video.[43] As more cases like this reach the courts, AI’s impact on artists’ rights to their own voices, and on how their music is perceived, will change the industry forever.

Under current regulation (or the lack thereof), AI voice cloning sits in a complex and uncharted legal gray area.[44] Existing copyright law protects tangible creative expression such as lyrics and melodies, but a voice on its own is not universally recognized as intellectual property and is not protected as a “work of authorship.”[45] In effect, copyright law gives a free pass to anyone who uses AI to clone an individual’s voice and craft it into another song, without the original artist’s consent or knowledge.[46] Beyond copyright, voice cloning also puts privacy at risk.[47] Sound “bytes” and other personal biometric data can often be collected and processed legally without consent, raising major questions about who has access to that data and where it can be stored.[48]

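Video: Jay-Z raps the “To Be, Or Not To Be” soliloquy from Hamlet (Speech Synthesis), Vocal Synthesis, YouTube.[49]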

The Future for Music

With AI rapidly entering the music creation process, artists, their agents, entertainment lawyers, and other legal experts are urging lawmakers to amend current copyright law to account for AI-related infringement.[50] One strategy is to push for recognition of an artist’s distinctive voice as a unique, protectable facet of their work, which would give artists direct legal grounds to contest unauthorized voice cloning.[51] Another is to promote more precise guidelines for the United States’ “fair use” doctrine.[52] The fair use doctrine promotes freedom of expression by allowing limited use of copyrighted material without explicit permission from the original rights holders.[53] AI researchers and developers lean heavily on fair use to justify using famous voice bytes and to shield themselves from legal repercussions.[54] In response, the music industry is working quickly and relentlessly either to have the courts, up to and including the Supreme Court, narrow the definition of “fair use” or at least to secure protections and exceptions surrounding AI’s “not-so-fair” use.[55]

[1] Carol Moh, AI Voice Cloning: What It Is and How It Works, Lovo (August 4, 2023), https://lovo.ai/post/ai-voice-cloning-what-it-is-and-how-it-works.

[2] Id.

[3] Id.

[4] Virginie Berger, A Deep Dive Into the World of AI Voice Cloning, Digital Music News (August 31, 2023), https://www.digitalmusicnews.com/2023/08/31/voice-cloning-deep-dive/.

[5] Id.

[6] Moh, supra note 1.

[7] Id.

[8] Id. A phoneme is the basic distinctive unit of speech sound by which morphemes, words, and sentences are represented. Phoneme, Dictionary.com (last visited Oct. 23, 2020).

[9] Moh, supra note 1.

[10] Id.

[11] Id.

[12] Chloe Veltman, Send in the clones: Using artificial intelligence to digitally replicate human voices, NPR (January 17, 2022, 4:18 PM), https://www.npr.org/2022/01/17/1073031858/artificial-intelligence-voice-cloning.

[13] Id.

[14] Benefits of Synthetic Voice and Voice Cloning for Today’s Top Brands, WellSaid Labs (August 18, 2022), https://wellsaidlabs.com/blog/synthetic-voice-and-voice-cloning/.

[15] Id.

[16] Cliff Weitzman, What is a Deepfake? What is Voice Cloning?, Speechify (July 16, 2023), https://speechify.com/blog/audio-deepfake/?landing_url=https%3A%2F%2Fspeechify.com%2Fblog%2Faudio-deepfake%2F.

[17] Id.

[18] Veltman, supra note 12.

[19] Id.

[20] Justin Green & Allie Weintraub, Experts warn of rise in scammers using AI to mimic voices of loved ones in distress, ABC News (July 7, 2023, 1:27 PM), https://abcnews.go.com/Technology/experts-warn-rise-scammers-ai-mimic-voices-loved/story?id=100769857#:~:text=In%20reality%2C%20Briana%20was%20safe,example%20of%20an%20alarming%20trend.

[21] Id.

[22] Id.

[23] Id.

[24] Rihanniaa (@Rihanniaa), TikTok (May 28, 2023), https://www.tiktok.com/@rihanniaa/video/7238339218437704987?_r=1&_t=8gBZVUnQEHO.

[25] Chris Sumlin (@thechrissumlin), TikTok (September 17, 2022), https://www.tiktok.com/@thechrissumlin/video/7144451270558944555?_r=1&_t=8gBZeWCVQ67.

[26] Clipsrandoms (@clipsrandomss16), TikTok (September 17, 2023), https://www.tiktok.com/@clipsrandomss16/video/7279720314051808545?_r=1&_t=8gBZXoSQxml.

[27] Berger, supra note 4.

[28] Id.

[29] Id.

[30] Id.

[31] Id.

[32] Id.

[33] Nick Breene & John Love, Attack of the Clones: AI Soundalike Tools Spin Complex Web of Legal Questions for Music (Guest Column), Billboard (May 19, 2023), https://www.billboard.com/pro/ai-music-tools-copy-artists-voices-legal-questions/.

[34] Id.

[35] Gavia Baker-Whitelaw, ‘This what they think of y’all’: A.I. rapper FN Meka accused of digital blackface, Daily Dot (August 24, 2022), https://www.dailydot.com/unclick/fn-meka-tiktok-rapper-digital-blackface/.

[36] Berger, supra note 4.

[37] FN Meka (photograph), in Virtual Rapper FN Meka powered by Artificial Intelligence Signs to Major Label, XXLmag.com (August 21, 2022), https://www.xxlmag.com/fn-meka-virtual-rapper-signs-major-label/.

[38] Don McCombie, AI-Generated Music and Copyright, Clifford Chance (April 27, 2023), https://www.cliffordchance.com/insights/resources/blogs/talking-tech/en/articles/2023/04/ai-generated-music-and-copyright.html.

[39] Id.

[40] Berger, supra note 4.

[41] Id.

[42] Rachel Rosencrans, Voice Cloning Technology and its Legal Implications: An IP Law Perspective, Schmeiser Olsen & Watts LLP (August 26, 2023), https://iplawusa.com/voice-cloning-technology-and-its-legal-implications-an-ip-law-perspective/.

[43] Berger, supra note 4.

[44] Connor Edney, YouTube collaborating with labels to replicate artist voices with AI, RouteNote (October 23, 2023, 8:00 AM), https://routenote.com/blog/youtube-labels-ai-replication-artist-voices/#:~:text=This%20has%20consequently%20created%20a,the%20masters%20of%20their%20music.

[45] Rosencrans, supra note 42.

[46] Id.

[47] Id.

[48] Id.

[49] Vocal Synthesis, Jay-Z raps the "To Be, Or Not To Be" soliloquy from Hamlet (Speech Synthesis), YouTube (2020), https://www.youtube.com/watch?v=m7u-y9oqUSw.

[50] Berger, supra note 4.

[51] Id.

[52] Id.

[53] Emilia David, Musicians are eyeing a legal shortcut to fight AI voice clones, The Verge (September 21, 2023, 8:00 AM), https://www.theverge.com/2023/9/21/23836337/music-generative-ai-voice-likeness-regulation.

[54] Id.

[55] Id.