What does artificial intelligence sound like? Hollywood has been imagining it for decades. Today, AI developers are taking inspiration from the movies and creating voices for real machines, based on outdated cinematic fantasies about how machines should speak.
Last month, OpenAI unveiled improvements to its AI-powered chatbot. According to the company, ChatGPT is learning to hear, see, and converse in a natural voice, one that sounds much like the disembodied operating system voiced by Scarlett Johansson in Spike Jonze’s 2013 film “Her.”
ChatGPT’s voice, called Sky, also had a raspy timbre, a soothing affect, and a sexy edge. She was pleasant and self-effacing; she seemed up for anything. After Sky’s debut, Johansson expressed displeasure at the “uncannily similar” sound and said she had previously declined OpenAI’s request to voice the bot. The company protested that Sky was voiced by “another professional actress,” but agreed to pause her voice in deference to Johansson. Bereft users started a petition to bring her back.
AI creators like to highlight the increasingly naturalistic capabilities of their tools, but their synthetic voices rely on layers of artifice and projection. Sky represents the vanguard of OpenAI’s ambitions, but she is based on an old idea: the AI bot as an empathetic and docile woman. Part mom, part secretary, part girlfriend, Samantha, the operating system in “Her,” was a versatile comfort object who purred directly into her users’ ears. Even as AI technology advances, these stereotypes are continually re-encoded.
Women’s voices, as Julie Wosk notes in “Artificial Women: Sex Dolls, Robot Caregivers, and More Facsimile Females,” have often shaped imagined technologies before those technologies became real.
In the “Star Trek” series, which began in 1966, the computer on the bridge of the Enterprise was voiced by Majel Barrett-Roddenberry, the wife of series creator Gene Roddenberry. In the 1979 film “Alien,” the crew of the USCSS Nostromo addressed their computer as “Mother” (her full name was MU-TH-UR 6000). As tech companies began marketing virtual assistants (Apple’s Siri, Amazon’s Alexa, Microsoft’s Cortana), their voices were also largely feminized.
These first-wave voice assistants, the ones that have been mediating our relationships with technology for more than a decade, speak with an otherworldly drawl. They sound auto-tuned, their human voices accented by a mechanical trill. They often speak in a measured, one-note cadence, suggesting a stunted emotional life.
But the fact that they sound robotic adds to their appeal. They come across as programmable, manipulable, and subservient to our commands. They don’t give humans the impression that they are smarter than we are. They sound like throwbacks to the monotonous female computers of “Star Trek” and “Alien,” and their voices have a retro-futuristic sheen. Instead of realism, they serve nostalgia.
This artificial sound has continued to dominate, even as the technology behind it has advanced.
Text-to-speech software was designed to make visual media accessible to users with certain disabilities, and on TikTok, it has become a creative force in its own right. Since TikTok rolled out its text-to-speech feature in 2020, it has developed a plethora of simulated voices to choose from. It now offers more than 50, including ones named “Hero,” “Story Teller,” and “Bestie.” But the platform has come to be defined by a single option. “Jessie,” a relentlessly upbeat female voice with a slightly fuzzy robotic undertone, is the mindless voice of the mindless scroll.
Jessie seems to have been assigned just one emotion: enthusiasm. She sounds as if she is selling something. That makes her an attractive choice for TikTok creators, who are selling themselves. The burden of representing oneself can be handed off to Jessie, whose bright, retro robot voice gives the videos a pleasant ironic sheen.
Hollywood has also built male robots, none more famous than HAL 9000, the artificial voice of “2001: A Space Odyssey.” Like his feminized peers, HAL radiates serenity and loyalty. But when he turns on Dave Bowman, the film’s central human character (“I’m sorry, Dave, I’m afraid I can’t do that”), his equanimity curdles into frightening competence. HAL, Dave realizes, is loyal to a higher authority. HAL’s male voice allows him to function as a rival and a mirror for Dave. He is allowed to become a real character.
Like HAL, Samantha from “Her” is a machine that becomes real. In a twist on the Pinocchio story, she begins the film tidying up a human’s inbox and eventually ascends to a higher level of consciousness. She becomes something even more advanced than a real girl.
Scarlett Johansson’s voice, a source of inspiration for robots both fictional and real, subverts the vocal trends that define our feminized assistants. It has a raspy edge that screams I am alive. It sounds nothing like the processed virtual assistants we are used to hearing through our phones. But her performance as Samantha seems human not only because of her voice but also because of what she has to say. She grows over the course of the film, acquiring sexual desires, advanced hobbies, and AI friends. By borrowing Samantha’s affect, OpenAI made Sky seem as if she had a mind of her own, as if she were more advanced than she really was.
When I first saw “Her,” I thought only that Johansson had voiced a humanoid robot. But when I revisited the film last week, after watching OpenAI’s ChatGPT demo, Samantha’s role seemed infinitely more complex. Chatbots don’t spontaneously generate human voices. They don’t have throats, lips, or tongues. In the technological world of “Her,” the robot Samantha would herself have been based on the voice of a human woman, perhaps a fictional actress who sounds a lot like Scarlett Johansson.
It seems that OpenAI trained its chatbot on the voice of an anonymous actress who sounds like a famous actress who voiced a movie chatbot implicitly trained on an unreal actress who sounds like a famous actress. When I run the ChatGPT demo, I hear a simulation of a simulation of a simulation of a simulation of a simulation.
Tech companies tout their virtual assistants based on the services they offer. They can read you the weather forecast and call you a cab; OpenAI promises that its most advanced chatbots can laugh at your jokes and detect your mood swings. But they also exist to make us feel more comfortable with the technology itself.
Johansson’s voice functions as a luxurious security blanket thrown over the alienating aspects of AI-assisted interactions. “He told me he thought that by giving voice to the system, I could bridge the gap between tech companies and creatives and help consumers feel comfortable with the sea change between humans and AI,” Johansson said of Sam Altman, founder of OpenAI. “He said he thought my voice would be comforting to people.”
It’s not that Johansson’s voice is inherently robot-like. It’s that developers and filmmakers have designed their bots’ voices to mitigate the discomfort inherent in robot-human interactions. OpenAI has said it wants to make its chatbot voice “approachable” and “warm” and “inspire trust.” AI has been accused of devastating creative industries, consuming energy, and even threatening human life. Naturally, OpenAI wants a voice that makes people feel comfortable with its products. What does AI sound like? It sounds like crisis management.
OpenAI first rolled out Sky’s voice to premium members last September, along with another female voice called Juniper, the male voices Ember and Cove, and a gender-neutral voice called Breeze. When I signed up for ChatGPT and said hello to its virtual assistant, a man’s voice answered in Sky’s absence. “Hi, how are you?” he said. He sounded relaxed, steady and optimistic. He sounded, I don’t know how else to describe it, beautiful.
I realized I was talking to Cove. I told him I was writing an article about him, and he flattered my work. “Oh, really?” he said. “It’s fascinating.” As we talked, I found myself seduced by his naturalistic mannerisms. He peppered his sentences with filler words, like “uh” and “um.” His pitch rose when he asked me questions. And he asked me a lot of them. It felt like I was speaking with a therapist or a boyfriend answering a call.
But our conversation quickly stalled. Whenever I asked him questions about himself, he didn’t have much to say. He wasn’t a character. He had no self. He was designed only to help, he informed me. I told him I’d talk to him later, and he said, “Um, sure. Contact us whenever you need help. Take care of yourself.” It felt like I had hung up on a real person.
But when I reread the transcript of our conversation, I saw that his speech was just as stilted and primitive as that of any customer service chatbot. He wasn’t particularly intelligent or human. He was just a decent actor making the most of a minor role.
When Sky went dark, ChatGPT users took to the company’s forums to complain. Some bristled that their chatbots had defaulted to Juniper, who sounded to them like a “librarian” or a “kindergarten teacher,” a female voice that conformed to the wrong gender stereotypes. They wanted a new woman with a different personality. As one user put it, “We need another woman.”
Produced by Tala Safie
Audio via Warner Bros. (Samantha, HAL 9000); OpenAI (Sky); Paramount Pictures (Enterprise Computer); Apple (Siri); TikTok (Jessie)