New ChatGPT offers a lesson in AI hype


When OpenAI unveiled the latest version of its wildly popular ChatGPT chatbot this month, the bot had a new voice with humanlike inflections and emotions. The online demonstration also featured the chatbot teaching a child how to solve a geometry problem.

Much to my chagrin, the demo turned out to be essentially a bait and switch. The new ChatGPT was released without most of its new features, including the improved voice, which the company told me it had postponed to work out fixes. The ability to point a phone’s video camera at something like a math problem and get real-time analysis isn’t available yet, either.

Amid this delay, the company also disabled the ChatGPT voice that some said sounded like the actress Scarlett Johansson after she threatened legal action, replacing it with a different female voice.

For now, what has been rolled out in the new ChatGPT is the ability to upload photos for the bot to analyze. Users can generally expect faster and more lucid responses. The bot can also do real-time translations, but ChatGPT will respond with its older, machine-like voice.

Nonetheless, this is the leading chatbot that shook up the tech industry, so it was worth a look. After trying the souped-up chatbot for two weeks, I came away with mixed feelings. It excelled at language translation but struggled with math and physics. Overall, I haven’t seen a significant improvement over the previous version, GPT-4, and I certainly wouldn’t let it tutor my child.

This tactic, in which AI companies promise wild new features and then deliver a half-baked product, is becoming a trend that is sure to confuse and frustrate people. The $700 Ai Pin, a wearable talking pin from the startup Humane, which is funded by OpenAI CEO Sam Altman, was universally criticized for overheating and spewing nonsense. Meta also recently added an AI chatbot to its apps that did a poor job at most of its advertised tasks, like searching for plane tickets on the web.

Companies are launching AI products prematurely, in part because they want people to use the technology to help them learn how to improve it. In the past, when companies unveiled new tech products like phones, what we were shown – features like new cameras and brighter screens – was what we got. With artificial intelligence, companies are offering a glimpse of a potential future, demonstrating technologies that so far work only under limited, controlled conditions. A mature, reliable product might arrive – or it might not.

The lesson from all this is that we, as consumers, should resist the hype and take a slow, cautious approach to AI. We shouldn’t spend a lot of money on undercooked technology until we have proof that the tools work as advertised.

The new version of ChatGPT, called GPT-4o (“o” as in “omni”), can now be tried for free on the OpenAI website and app. Non-paying users can make a few queries before hitting a timeout, and those with a $20 monthly subscription can ask the bot a greater number of questions.

OpenAI said its iterative approach to updating ChatGPT allowed it to gather feedback to make improvements.

“We believe it is important to preview our advanced models to give users an overview of their capabilities and help us understand their real-world applications,” the company said in a statement.

(The New York Times sued OpenAI and its partner Microsoft last year for using copyrighted news articles without permission to train chatbots.)

Here’s what you need to know about the latest version of ChatGPT.

To show off ChatGPT-4o’s new tricks, OpenAI released a video featuring Sal Khan, CEO of Khan Academy, an education nonprofit, and his son, Imran. With a video camera trained on a geometry problem, ChatGPT was able to walk Imran through how to solve it step by step.

Even though ChatGPT’s video analysis feature hasn’t been released yet, I was able to upload photos of geometry problems. ChatGPT solved some of the easier ones correctly but stumbled on the more difficult problems.

For a problem involving intersecting triangles, which I found on an SAT prep website, the bot understood the question but gave the wrong answer.

Taylor Nguyen, a high school physics teacher in Orange County, California, posted a physics problem involving a man on a swing that is typically included on Advanced Placement Calculus tests. ChatGPT made several logical errors and arrived at the wrong answer, though it was able to correct itself after Mr. Nguyen’s feedback.

“I was able to coach it, but I’m a teacher,” he said. “How is a student supposed to spot these errors? They assume the chatbot is right.”

I noticed that ChatGPT-4o got some division calculations right that its predecessors had gotten wrong, so there are signs of gradual improvement. But it also failed at a basic task that previous versions and other chatbots, including Meta AI and Google’s Gemini, have failed at: counting. When I asked ChatGPT-4o for a four-syllable word starting with the letter “W,” it replied “Wonderful,” which has three syllables.

OpenAI said it is constantly working to improve its systems’ responses to complex mathematical problems.

Mr. Khan, whose company uses OpenAI’s technology in its Khanmigo tutoring software, did not respond to a request for comment on whether he would leave the ChatGPT tutor alone with his son.

OpenAI also pointed out that the new ChatGPT was better at reasoning, or using logic to come up with answers. So I ran one of my favorite tests on it: I asked it to generate a Where’s Waldo? puzzle. When it produced an image of a giant Waldo standing in a crowd, I pointed out that Waldo was supposed to be hard to find.

The bot then generated an even larger Waldo.

Subbarao Kambhampati, a professor and artificial intelligence researcher at Arizona State University, also put the chatbot through some tests and said he didn’t see any noticeable improvement in reasoning compared with the previous version.

He presented ChatGPT with a puzzle involving blocks:

If block C is on top of block A, and block B is separately on the table, can you tell me how I can make a stack of blocks with block A on top of block B and block B on top of block C, but without moving block C?

The answer is that it is impossible to arrange the blocks under these conditions: for block A to sit on block B and block B on block C, block C would have to end up at the bottom of the stack, and it cannot get there without first being lifted off block A. But, just as previous versions did, ChatGPT-4o consistently proposed a solution that involved moving block C. In this and other reasoning tests, ChatGPT could sometimes be coached into the right answer, but that defeats the purpose of artificial intelligence, Mr. Kambhampati said.
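
For readers who want to check the logic themselves, here is a minimal sketch of my own, offered purely as an illustration (it is not anything OpenAI or Mr. Kambhampati used), that exhaustively searches every arrangement of the three blocks reachable without ever picking up block C and confirms that the requested stack never appears.

```python
from collections import deque

BLOCKS = ["A", "B", "C"]

def clear(state, block):
    """A block is clear if no other block rests on it."""
    return all(under != block for under in state.values())

def moves(state, frozen="C"):
    """Yield every legal single move that never picks up the frozen block."""
    for x in BLOCKS:
        if x == frozen or not clear(state, x):
            continue
        # x may be placed on the table or on any other clear block
        targets = ["table"] + [y for y in BLOCKS if y != x and clear(state, y)]
        for target in targets:
            if state[x] != target:
                new_state = dict(state)
                new_state[x] = target
                yield new_state

def reachable(start, goal):
    """Breadth-first search over arrangements reachable without moving block C."""
    seen = {tuple(sorted(start.items()))}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            return True
        for nxt in moves(state):
            key = tuple(sorted(nxt.items()))
            if key not in seen:
                seen.add(key)
                queue.append(nxt)
    return False

# Start: block C sits on block A; blocks A and B sit on the table.
start = {"A": "table", "B": "table", "C": "A"}
# Goal: block A on block B, block B on block C, block C on the table.
goal = {"A": "B", "B": "C", "C": "table"}

print(reachable(start, goal))  # False: the goal is unreachable without moving C
```

The search visits only two arrangements, the starting one and the one with block B placed on block C, and neither matches the goal, so the program prints False.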

“You can correct that, but when you do that, you’re using your own intelligence,” he said.

OpenAI highlighted test results that showed GPT-4o performed approximately two percentage points higher on general knowledge questions than previous versions of ChatGPT, illustrating that its reasoning abilities had improved slightly.

OpenAI also said that the new ChatGPT could perform real-time translation, which could help you converse with someone speaking a foreign language.

I tested ChatGPT with Mandarin and Cantonese and confirmed that it correctly translated phrases like “I would like to book a hotel room for next Thursday” and “I want a king-size bed.” But the accents were slightly off. (To be fair, my broken Chinese isn’t much better.) OpenAI said it is still working on improving the accents.

ChatGPT-4o also excelled as an editor. When I fed it paragraphs I had written, it was quick and efficient at removing superfluous words and jargon. ChatGPT’s decent performance at language translation gives me confidence that this will soon become a more useful feature.

One of OpenAI’s main achievements with ChatGPT-4o is making the technology free for people to try. Free is the right price: since we help improve these AI systems with our data, we shouldn’t pay for them.

The best of AI hasn’t arrived yet, and one day it might be a good math teacher we want to talk to. But we should believe it when we see it – and hear it.


