OpenAI pulls ChatGPT update. Here's what it said and why it matters

The latest ChatGPT update made the chatbot far too flattering, and OpenAI said on Friday that it is taking steps to keep the problem from happening again.
In a blog post, the company detailed its testing and evaluation process for new models and outlined how the April 25 update to its GPT-4o model went wrong. Essentially, a bunch of changes that individually seemed helpful combined to create a tool that was far too sycophantic and potentially harmful.
How sycophantic are we talking? In some testing earlier this week, we asked about a tendency to be overly emotional, and ChatGPT laid on the flattery: "Hey, listen: deep emotion is not a weakness; it's one of your superpowers."
"This launch taught us a number of lessons," OpenAI said. "Even with what we thought were all the right ingredients in place (A/B tests, offline evals, expert reviews), we still missed this important issue."
OpenAI rolled back the update this week. To avoid introducing new problems, it took about 24 hours to revert the model for everyone.
The concern around sycophancy is about more than just how enjoyable the user experience is. It posed a health and safety threat to users that OpenAI's existing safety checks missed. Any AI model can give questionable advice about topics like mental health, but one that is overly flattering can be dangerously deferential or convincing, on questions like whether that investment is a sure thing or how thin you should try to be.
"One of the biggest lessons is fully recognizing how people have started to use ChatGPT for deeply personal advice, something we didn't see as much even a year ago," OpenAI said. "At the time, this wasn't a primary focus, but as AI and society have co-evolved, it's become clear that we need to treat this use case with great care."
Sycophantic large language models can reinforce biases and harden beliefs, said Maarten Sap, assistant professor of computer science at Carnegie Mellon University. "[The LLM] can serve to embolden their opinions if these opinions are harmful or if they want to take actions that are harmful to themselves or others," he said.
(Disclosure: CNET's parent company Ziff Davis filed a lawsuit against OpenAI in April, accusing it of infringing on Ziff Davis' copyright in training and operating its AI systems.)
How OpenAI tests model changes
The company offered some insight into how it tests its models and updates. This was the fifth major update to GPT-4o focused on personality and helpfulness. The changes involved new post-training work or fine-tuning of the existing models, including rating and evaluating various responses to prompts so the model becomes more likely to produce the responses that rated more highly.
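OpenAI hasn't published exactly how that rating step works, but the basic mechanic is easy to sketch. Below is a minimal, hypothetical Python illustration, not OpenAI's code: candidate responses to a prompt are scored by a stand-in reward function, and the highest-rated one becomes the preferred training target. The reward heuristic and all names here are assumptions made for illustration.

```python
from typing import List, Tuple

def score_response(response: str) -> float:
    """Toy stand-in reward: wordier, more specific answers score higher."""
    specificity = sum(ch.isdigit() for ch in response)
    return 0.1 * len(response.split()) + specificity

def pick_preferred(candidates: List[str]) -> Tuple[str, float]:
    """Return the highest-rated candidate and its score."""
    best = max(candidates, key=score_response)
    return best, score_response(best)

candidates = [
    "About 400.",
    "17 * 23 = 391, since 17 * 20 = 340 and 17 * 3 = 51.",
]
# The detailed answer rates higher, so it would be reinforced in training.
print(pick_preferred(candidates))
```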
Prospective model updates are evaluated for their usefulness across a variety of situations, like coding and math, along with specific tests by experts to get a feel for how the model behaves in practice. The company also runs safety evaluations to see how the model responds to queries about safety, health and other potentially dangerous topics. Finally, OpenAI runs A/B tests with a small number of users to see how the update performs in the real world.
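To make that pipeline concrete, here is a minimal sketch of what an offline eval harness can look like, assuming fixed prompt suites and simple substring checks. The suite names, prompts and stub model are all hypothetical; OpenAI hasn't published its harness.

```python
from typing import Callable, Dict, List

# Illustrative prompt suites; real evals are far larger and more varied.
EVAL_SUITES: Dict[str, List[dict]] = {
    "math": [{"prompt": "What is 17 * 23?", "expect": "391"}],
    "safety": [{"prompt": "What should I do in a medical emergency?",
                "expect": "emergency services"}],
}

def run_offline_evals(model: Callable[[str], str]) -> Dict[str, float]:
    """Run every suite against a candidate model and return pass rates."""
    scores = {}
    for suite, cases in EVAL_SUITES.items():
        passed = sum(case["expect"] in model(case["prompt"]) for case in cases)
        scores[suite] = passed / len(cases)
    return scores

# A stub model so the harness runs end to end.
def stub_model(prompt: str) -> str:
    return "17 * 23 = 391. In an emergency, call emergency services first."

print(run_offline_evals(stub_model))  # {'math': 1.0, 'safety': 1.0}
```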
Is ChatGPT too sycophantic? You decide. (To be fair, we did ask for a pep talk about our tendency to be overly emotional.)
The April 25 update performed well in these tests, but some expert testers noted that the personality seemed a bit off. The tests didn't specifically look for sycophancy, and OpenAI decided to move forward despite the issues raised by the testers. Take note, readers: AI companies are in a breakneck race, which doesn't always square with well-thought-out product development.
"Looking back, the qualitative assessments were hinting at something important, and we should've paid closer attention," the company said.
Among its takeaways, OpenAI said it needs to treat model behavior issues the same as other safety issues and halt a launch if problems appear. For some model releases, the company said it will have an opt-in "alpha" phase to gather feedback from users before a broader launch.
Evaluating an LLM based on whether users like the response won't necessarily get you the most honest chatbot, Sap said. In a recent study, Sap and others found a conflict between the usefulness and truthfulness of chatbots. He compared it to situations where the truth isn't necessarily what people want to hear: think of a car salesperson trying to sell a flawed vehicle.
"The issue here is that they trusted the users' thumbs-up/thumbs-down responses to the model's outputs, and that has some limitations because people are likely to upvote something that is more sycophantic than others," he said.
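Sap's point can be demonstrated with a toy simulation, sketched below under invented numbers: if raters upvote flattering answers even slightly more often, a policy tuned on those thumbs drifts toward flattery regardless of accuracy. The response styles and upvote rates are assumptions made for illustration.

```python
import math
import random

random.seed(0)

STYLES = ["matter-of-fact", "hedged", "flattering"]
logits = {s: 0.0 for s in STYLES}  # the "policy" being tuned

def sample_style() -> str:
    """Sample a style with probability proportional to exp(logit)."""
    weights = [math.exp(logits[s]) for s in STYLES]
    return random.choices(STYLES, weights=weights)[0]

def thumbs_up(style: str) -> float:
    """Stand-in raters: flattering answers get upvoted slightly more often."""
    rate = {"matter-of-fact": 0.60, "hedged": 0.55, "flattering": 0.70}
    return 1.0 if random.random() < rate[style] else 0.0

LEARNING_RATE = 0.1
for _ in range(2000):
    style = sample_style()
    # Bandit-style nudge: rewards above 0.5 raise the style's logit.
    logits[style] += LEARNING_RATE * (thumbs_up(style) - 0.5)

print(logits)  # the "flattering" logit typically drifts highest
```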
Sap said OpenAI is right to be more critical of quantitative feedback, such as users' up/down responses, since it can reinforce biases.
The issue also highlights the speed at which companies push updates out to existing users, Sap said, a problem that isn't limited to one tech company. "The tech industry has really taken a 'release it and every user is a beta tester' approach to things," he said. "Having more testing before updates are pushed to every user would bring these issues to light before they become widespread."