The WSJ tested 5 chatbots: ChatGPT from OpenAI, Copilot from Microsoft, Gemini from Google, Perplexity, and Claude from Anthropic. Do you want to guess the winner and loser?
The great challenge of AI
Please consider The Big AI Challenge: We Test Five Best Bots on Useful Everyday Skills.
It’s a free link, which I use sparingly so as not to abuse privileges. Here are some excerpts.
Human-sounding robots barely existed two years ago. Now they are everywhere. There’s ChatGPT, which started the whole generative AI craze, and the big pushes from Google and Microsoft, plus countless other smaller players, all with their own smooth-talking assistants.
We put five of the leading bots through a series of blind tests to determine their usefulness. While we were hoping to find the Caitlin Clark of chatbots, that’s not exactly what happened. They excel in some areas and fail in others. Plus, they all evolve quickly. During our testing, OpenAI released an upgrade to ChatGPT that improved its speed and news awareness.
We wanted to see the range of answers we would get by asking real questions and organizing daily tasks – not a scientific assessment, but one that reflects how we will use all of these tools. Think of it as the chatbot Olympics.
We have ChatGPT from OpenAI, famous for its versatility and ability to remember user preferences. (News Corp, owner of the Wall Street Journal, has a content licensing partnership with OpenAI.) Anthropic’s Claude, from a socially conscious startup, aims to be innocuous. Microsoft’s Copilot leverages OpenAI technology and integrates with services such as Bing and Microsoft 365. Google’s Gemini accesses the popular search engine for real-time answers. And Perplexity is a research-focused chatbot that cites sources with links and stays up to date.
Although each of these services offers a free version, we used the $20-per-month paid versions for better performance and to evaluate their full capabilities across a wide range of tasks. (We used the latest ChatGPT GPT-4o model and the Gemini 1.5 Pro model in our testing.)
Creative writing
One of the biggest surprises was the difference between work writing and creative writing. Copilot finished dead last in work writing, but was easily the funniest and smartest in creative writing. We asked for a poem about a turd on a log. We requested a wedding toast featuring the Muppets. We requested a fictional street fight between Donald Trump and Joe Biden. With Copilot, the jokes never ended. Claude was in second place, with clever jokes about the two presidential challengers.
Overall results
What have these Olympian challenges taught us? Each chatbot has unique strengths and weaknesses, which are worth exploring. We’ve seen few outright errors and “hallucinations,” where robots take unexpected directions and completely make things up. The bots provided mostly helpful answers and avoided controversy.
The biggest surprise? ChatGPT, despite its big update and massive fame, is not leading the pack. Instead, the lesser-known Perplexity was our champion. “We optimize to be concise,” says Dmitry Shevelenko, business director at Perplexity AI. “We optimized our model to be more concise, which forces it to identify the most essential components.”
Congratulations to Perplexity
Judging by the scores, it seems to be close between Perplexity and ChatGPT.
Copilot was incredibly bad, winning only creative writing while finishing at the bottom in 5 out of 8 categories.
There was no bias in the judging, as the judges simply evaluated the answers without knowing who provided them.
If I had to score this, I would award 4 points to the winner, 3 for second place, 2 for third, and 1 for fourth.
I would use speed only as a tiebreaker, because accuracy is much more important than speed.
Finally, I would subtract 2 points for terrible answers. Copilot had one, on food. When asked to prepare a recipe that met many dietary restrictions, its answer included two sticks of butter and four large eggs.
Perplexity has 3 firsts, 1 second, 4 thirds and no fourths. It was last only in speed, my tiebreaker.
My ranking under this scoring system (I don’t know what the WSJ did) is as follows.
- Perplexity: 23
- ChatGPT: 18 (22 if you give 4 for speed)
- Gemini: 18 (21 if you give 3 for speed)
- Claude: 15 (16 if you award 1 for speed)
- Copilot: 9 (11 if you count 2 for speed) but subtract 2 for terrible responses
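As a rough sketch of the arithmetic above, here is the scoring written out in Python. The function and variable names are mine, for illustration only, and treating fifth place as zero points is an assumption implied by the scheme rather than stated. Only Perplexity's placements are spelled out in the text, so that is the only tally reproduced here.

```python
# Points per finishing place: 4 for the winner down to 1 for fourth.
# Fifth place earning 0 is an assumption (the scheme above does not mention it).
POINTS_BY_PLACE = {1: 4, 2: 3, 3: 2, 4: 1, 5: 0}

# Points subtracted for each terrible answer.
TERRIBLE_ANSWER_PENALTY = 2


def score(place_counts, terrible_answers=0):
    """Total points for a bot.

    place_counts maps a finishing place (1-5) to how many categories
    the bot finished in that place.
    """
    base = sum(POINTS_BY_PLACE[place] * count
               for place, count in place_counts.items())
    return base - TERRIBLE_ANSWER_PENALTY * terrible_answers


# Perplexity: 3 firsts, 1 second, 4 thirds, no fourths (speed excluded).
perplexity_places = {1: 3, 2: 1, 3: 4}
perplexity_score = score(perplexity_places)
print(perplexity_score)  # 3*4 + 1*3 + 4*2 = 23
```

Speed is left out of the total on purpose, since it only serves as the tiebreaker.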
For me personally
For me personally, it’s not even close. Consider the category on summarizing web content.
Even the Claude premium account was not able to handle web links.
Wikipedia pages for very famous people can be wordy, so we asked for a summary of Paul McCartney’s. Some provided short blurbs with obvious facts about the Beatles. Copilot responded in a skimmable format and included some lesser-known fun facts.
The category winner, Perplexity, consistently summarized things well, even scanning the subtitles of a YouTube video.
Scanning the subtitles of a YouTube video is very impressive. Unlike the rest, Perplexity cites sources with links and stays up to date.
I won’t use anything without a link.
Claude replied “I apologize, but I can’t open URLs, links or videos”, rendering it useless for anything.
If you enjoy writing fiction and humor, you may want to try Microsoft’s Copilot.
I haven’t tried any of them, except for a very early version of ChatGPT.
The AI boom (or do I mean bust)
Also consider this piece on tech workers retooling for the artificial intelligence boom.
Tech workers are feverishly retooling their skills at a time when every company suddenly wants to become an artificial intelligence company and every worker feels the need to bring in AI.
“I’ve been leading with an AI-friendly resume for two or three months,” says Asif Dhanani, 31, of Irvine, Calif., who was laid off from his job as a technical product manager at Amazon in March 2017.
Dhanani landed numerous interviews for AI product manager positions, but he did not receive any offers.
The tech job market is in an unbalanced state. There is demand for a specific type of entry-level AI talent, namely those with the technical knowledge or experience working with large language models, or LLMs, that power chatbots with the ability to generate content. Some companies are looking for candidates with these skills, but there are not enough qualified workers to fill the roles.
And then there’s everyone else. Thousands of people have been laid off in recent years, and many of those who remain employed face new management styles, reorganizations, and downsizing as more resources are shifted to AI. These workers are now taking AI courses, adding buzzwords to their resumes, and competing in an increasingly crowded field.
Tony Phillips, co-founder of Deep Atlas boot camp, says he’s noticed a significant increase in the level of urgency tech workers feel about the need to upskill. Deep Atlas recently added five additional locations to its summer AI boot camp.
“People started to realize that their work could really be obsolete,” he says. “You’re probably not going to be replaced by AI. You’re going to be replaced by someone who knows AI and does your job.”
New job postings in the technology sector fell from an average of about 308,000 per month in 2019 to 180,000 per month in April, according to technology trade association CompTIA.
Big tech companies are trying to make their entire workforce more AI-literate. Trailhead, Salesforce’s training platform, currently offers 43 AI-related courses, ranging from fundamentals to the ethical use of AI. More than 60,000 Salesforce employees have taken at least one AI course.
“We believe everyone should upskill themselves and, in one way or another, have the tools they need to succeed in this new world,” said Jayesh Govindarajan, senior vice president of Salesforce AI.
Creative destruction
A great cycle of creative destruction is on the horizon. Many people will lose their jobs.
However, we tend to overestimate how quickly things happen.
I don’t know where we stand, and I doubt anyone else does either. But I know we have serious supply issues and energy concerns around all the data centers we need to support AI.
Data center power consumption
According to the IEA, electricity consumption by data centers, artificial intelligence (AI) and the cryptocurrency sector could double by 2026. The global push towards electric vehicles must be taken into account.
Data centers
Please consider AI and cryptocurrency will double data center energy consumption by 2026
If you think solar and wind can provide the energy needed, think again. Even if they could, is the grid capable of delivering it?