
IBM has launched an innovative framework to evaluate the output of large models in a "black box" manner

Author: Not Bald Programmer

Compared with benchmark performance and leaderboard rankings, the accuracy, safety, and interpretability of a large model's output matter more; without them, commercialization is off the table.

IBM researchers have developed a black-box framework that evaluates a large model's outputs, confidence, and more without access to its internal structure, parameters, or training data.

Paper: https://arxiv.org/abs/2406.04370


To elicit variability in the model's outputs, the researchers propose six prompt perturbation strategies (a code sketch follows the list): 1) Stochastic decoding: generate multiple outputs using different decoding techniques, such as greedy search, beam search, and nucleus sampling, so that the spread of responses reflects the model's uncertainty about the prompt.

2) Paraphrasing: reword the prompt's context, for example via back-translation (translating the text into another language and back again), and observe how the output changes. If the output for the paraphrased prompt is semantically consistent with the original output, the model is fairly confident in its answer.

3) Sentence permutation: test the consistency of the model's output by changing the order of named entities in the input. If the model is confident in its output, the output should remain the same even when the entity order changes.


4) Entity frequency amplification: repeat sentences containing named entities to test whether the model changes its output when information is duplicated.

5) Stop-word removal: delete common stop words to see whether these words, usually considered low in information, affect the model's response.

6) Split-response consistency: randomly split the model's output into two parts, and use an NLI (natural language inference) model to measure the semantic consistency between them.
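The Python sketch below illustrates how several of these perturbations could be implemented. It is a minimal sketch, not the paper's code: `generate` is a hypothetical stand-in for any black-box LLM call, `nli_score` for any NLI model, and the decoding parameters and stop-word list are illustrative.

```python
import random

# Illustrative stop-word list for strategy 5
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "to", "and", "is", "are"}

def stochastic_decoding(generate, prompt, n=5):
    """Strategy 1: collect several outputs under varied decoding settings."""
    outputs = [generate(prompt, do_sample=False)]              # greedy search
    for _ in range(n - 1):
        outputs.append(generate(prompt, do_sample=True,
                                top_p=0.9, temperature=0.7))   # nucleus sampling
    return outputs

def permute_entities(prompt, entities):
    """Strategy 3: shuffle the order of named entities in the prompt.
    `entities` would normally come from an NER tagger; here it is given."""
    shuffled = entities[:]
    random.shuffle(shuffled)
    # two-pass replacement so swapped names do not overwrite each other
    for i, old in enumerate(entities):
        prompt = prompt.replace(old, f"\x00{i}\x00")
    for i, new in enumerate(shuffled):
        prompt = prompt.replace(f"\x00{i}\x00", new)
    return prompt

def amplify_entity(prompt, entity_sentence, times=2):
    """Strategy 4: repeat a sentence containing a named entity."""
    return prompt + (" " + entity_sentence) * times

def remove_stop_words(prompt):
    """Strategy 5: drop common stop words from the prompt."""
    return " ".join(w for w in prompt.split() if w.lower() not in STOP_WORDS)

def split_response_consistency(response, nli_score):
    """Strategy 6: cut the output in two and score semantic agreement.
    `nli_score` is a hypothetical callable wrapping any NLI model."""
    words = response.split()
    cut = random.randint(1, max(1, len(words) - 1))
    return nli_score(" ".join(words[:cut]), " ".join(words[cut:]))
```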

Building on these strategies, the researchers construct two families of features, semantic and syntactic, to train the confidence model. Semantic features focus on the number of semantic-equivalence sets among the outputs: if a model's outputs split into many distinct equivalence sets, the model is not confident in its answer.

Syntactic features are computed from the syntactic similarity between outputs: the higher the similarity, the more confident the model is in its output.
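A rough sketch of how these two feature families could be computed, assuming the perturbed outputs are already collected. `is_equivalent` is a hypothetical callable (for example, bidirectional NLI entailment) that decides whether two outputs mean the same thing, and `SequenceMatcher` stands in for whatever syntactic-similarity measure the paper uses.

```python
from difflib import SequenceMatcher
from itertools import combinations

def num_semantic_equivalence_sets(outputs, is_equivalent):
    """Greedily cluster outputs into equivalence sets.
    More clusters -> the model is less confident."""
    clusters = []
    for out in outputs:
        for cluster in clusters:
            if is_equivalent(cluster[0], out):
                cluster.append(out)
                break
        else:
            clusters.append([out])
    return len(clusters)

def mean_syntactic_similarity(outputs):
    """Average pairwise string similarity.
    Higher similarity -> the model is more confident."""
    pairs = list(combinations(outputs, 2))
    if not pairs:
        return 1.0
    return sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs) / len(pairs)
```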


For training, the researchers use a standard supervised learning procedure, pairing features with labels (generated according to how well the output matches the reference answers) to fit the confidence model's parameters.

Label creation follows a concise rule: if the ROUGE-L score of the model's output against the reference answer exceeds a threshold (e.g., 0.3), the answer is considered correct (label 1); otherwise, it is considered wrong (label 0). This rule is simple and efficient, and effectively separates the model's performance across questions.
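In code, the labelling rule and the supervised step could look like the sketch below. The 0.3 threshold follows the rule above; the `rouge-score` and `scikit-learn` packages and the logistic-regression classifier are assumptions, standing in for whatever scorer and classifier the paper actually uses.

```python
from rouge_score import rouge_scorer
from sklearn.linear_model import LogisticRegression

scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)

def make_label(model_output, reference, threshold=0.3):
    """Label 1 if the output matches the reference closely enough."""
    f1 = scorer.score(reference, model_output)["rougeL"].fmeasure
    return int(f1 > threshold)

def train_confidence_model(features, outputs, references):
    """`features` is one row per question, e.g.
    [n_semantic_equivalence_sets, mean_syntactic_similarity, ...]."""
    labels = [make_label(o, r) for o, r in zip(outputs, references)]
    clf = LogisticRegression(max_iter=1000)
    clf.fit(features, labels)
    return clf
```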

To evaluate the framework's performance, the researchers ran experiments on the TriviaQA, SQuAD, CoQA, and Natural Questions datasets, using three well-known open-source large models: Flan-ul2, Llama-13b, and Mistral-7b.

The results show that the framework not only significantly outperforms existing black-box confidence-estimation methods on multiple datasets, but also improves AUROC by more than 10%.
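As a hedged illustration, AUROC for such a confidence model can be computed with scikit-learn as follows; `clf` is the classifier from the previous sketch, and the held-out features and labels are assumed to come from the same labelling rule.

```python
from sklearn.metrics import roc_auc_score

def evaluate_auroc(clf, test_features, test_labels):
    """Confidence = predicted probability that the answer is correct."""
    confidence = clf.predict_proba(test_features)[:, 1]
    return roc_auc_score(test_labels, confidence)
```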


According to the researchers, the framework is highly extensible and broadly applicable: new perturbation strategies can be added at any time to probe and adapt to different types of large models. Moreover, the confidence model only needs to be trained on one large model and can, in most cases, be applied to similar models.
