Using AI to Fix AI: OpenAI Launches CriticGPT

Author: The Frontier of the AI Era

In the current era of rapid AI expansion, one of the main challenges we face is that AI sometimes makes mistakes. Worse, the black-box nature of many AI tools means that catching those errors, and understanding why they happen, can be very difficult.

OpenAI recently discussed the problem, and a potential solution, in a blog post based on one of the company's research papers. There, the company introduced CriticGPT, a model built on the GPT-4 architecture that identifies and highlights inaccuracies in ChatGPT-generated responses, particularly on programming tasks.
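CriticGPT itself is not publicly available, but the general idea of asking a GPT-4-class model to critique code can be sketched with the standard OpenAI Python client. The model name, system prompt, and sample bug below are illustrative assumptions, not OpenAI's actual CriticGPT setup:

```python
# Illustrative sketch only: CriticGPT is not exposed via the public API.
# This shows the general pattern of prompting a GPT-4-class model to
# critique generated code; the model name and prompt are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def critique_code(code: str) -> str:
    """Ask a general-purpose model to point out concrete bugs in a snippet."""
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed stand-in; CriticGPT itself is unavailable
        messages=[
            {"role": "system",
             "content": "You are a code reviewer. List concrete bugs in the "
                        "code below, citing the offending lines. Avoid nitpicks."},
            {"role": "user", "content": code},
        ],
    )
    return response.choices[0].message.content


buggy = '''
def average(xs):
    return sum(xs) / len(xs)  # crashes on an empty list
'''
print(critique_code(buggy))
```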

OpenAI's researchers found that when human reviewers used CriticGPT to evaluate ChatGPT's code output, their reviews beat those of unassisted reviewers 60% of the time. The work goes well beyond mere error detection: it points toward changes in how we train, evaluate, and deploy AI.
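That 60% figure is a pairwise win rate: each comparison pits a CriticGPT-assisted review against an unassisted one, and a judge picks the better of the two. A toy tally of such a statistic, with fabricated outcomes chosen only to illustrate the arithmetic:

```python
# Toy tally of a pairwise win rate, as used for results like "reviewers
# with CriticGPT outperformed those without 60% of the time".
# The outcomes list is fabricated for illustration.
outcomes = ["assisted", "unassisted", "assisted", "assisted", "unassisted",
            "assisted", "unassisted", "assisted", "assisted", "unassisted"]

wins = sum(1 for o in outcomes if o == "assisted")
win_rate = wins / len(outcomes)
print(f"CriticGPT-assisted reviews preferred in {win_rate:.0%} of comparisons")
# -> CriticGPT-assisted reviews preferred in 60% of comparisons
```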

Digging into the details, CriticGPT was trained with reinforcement learning from human feedback (RLHF), the same method used to train ChatGPT itself. Here, AI trainers manually inserted errors into ChatGPT-generated code and then wrote feedback describing those inserted errors. In evaluations, OpenAI found that on 63% of naturally occurring bugs, trainers preferred CriticGPT's critiques over ChatGPT's, both because CriticGPT produced fewer nitpicks (small, unhelpful complaints) and because it hallucinated problems less often.
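A rough sketch of that data-construction step is below. The dataclass names and the example bug are invented for illustration; OpenAI's actual pipeline is not public:

```python
# Minimal sketch of building RLHF comparison data from inserted bugs.
# All names and the example are assumptions; OpenAI's pipeline is not public.
from dataclasses import dataclass


@dataclass
class TamperedExample:
    original_code: str    # ChatGPT-generated code, believed correct
    tampered_code: str    # the same code with a bug inserted by a trainer
    bug_description: str  # the trainer's reference description of the bug


@dataclass
class CritiqueComparison:
    example: TamperedExample
    critique_a: str  # one model-written critique of the tampered code
    critique_b: str  # a competing critique of the same code
    preferred: str   # "a" or "b": which critique the trainer ranked higher


# A trainer inserts a subtle bug and records what they changed:
ex = TamperedExample(
    original_code="def mean(xs): return sum(xs) / len(xs)",
    tampered_code="def mean(xs): return sum(xs) / (len(xs) - 1)",
    bug_description="Denominator changed to len(xs) - 1, biasing the mean.",
)

# Trainers then rank competing critiques; a critique that flags the
# inserted bug (and little else) should win the comparison:
comp = CritiqueComparison(
    example=ex,
    critique_a="The denominator should be len(xs), not len(xs) - 1.",
    critique_b="Consider renaming xs to values for readability.",
    preferred="a",
)
print(f"Preferred critique: {comp.preferred}")
```

Rankings of this kind become the reward signal: critiques that pinpoint the known inserted bug are reinforced over critiques that nitpick or hallucinate.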

The study also found that identifying specific, predefined bugs was more straightforward for raters than evaluating broader attributes of a critique, such as its level of detail or comprehensiveness.

The paper discusses two types of evaluation data: code with human-inserted bugs and code with naturally occurring bugs that humans had previously caught. This dual approach gives a more complete picture of CriticGPT's performance across scenarios. Notably, rater consistency improved markedly on the human-inserted subset, where each bug comes with a reference description written by the trainer who inserted it.
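One simple way to quantify that difference in consistency is a pairwise agreement rate between raters, computed separately for each subset. A toy sketch, assuming each rater marks a critique as catching (1) or missing (0) the bug; all labels here are fabricated:

```python
# Toy pairwise agreement rate between raters, computed separately for
# critiques of inserted bugs (which have reference descriptions) and of
# naturally detected bugs. All labels below are fabricated.
from itertools import combinations


def agreement_rate(ratings_by_rater):
    """Fraction of rater pairs that agree, averaged over all items."""
    n_items = len(ratings_by_rater[0])
    agree = total = 0
    for i in range(n_items):
        for a, b in combinations(ratings_by_rater, 2):
            agree += a[i] == b[i]
            total += 1
    return agree / total


# 1 = "critique catches the bug", 0 = "critique misses it"; 3 raters each
inserted = [[1, 1, 0, 1, 1], [1, 1, 0, 1, 1], [1, 1, 0, 1, 0]]
detected = [[1, 0, 0, 1, 1], [0, 1, 0, 1, 0], [1, 1, 1, 0, 1]]

print(f"inserted-bug agreement: {agreement_rate(inserted):.0%}")  # -> 87%
print(f"detected-bug agreement: {agreement_rate(detected):.0%}")  # -> 33%
```

With a reference description anchoring what counts as "the bug", raters converge; without one, judgments scatter, which is the pattern the paper reports.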

This pattern suggests that a clearly identified error gives raters a concrete anchor, allowing them to make more consistent judgments. It also highlights how difficult it is to assess AI-generated critiques consistently, especially when judging other aspects of code quality.

In addition, OpenAI points out that CriticGPT doesn't do all the work. They observed that human developers often retained or modified the AI-generated critiques, suggesting a synergistic relationship between human expertise and AI assistance.

Obviously, there's more work to be done here, but OpenAI's CriticGPT is a big step toward reducing the error rate of models like ChatGPT.
