
Interpretation of the paper | ACL 2024: Self-distillation bridges distribution differences in language model fine-tuning

Yang Zhaorui

Ph.D. student in the National Key Laboratory of CAD&CG, Zhejiang University

Supervised by Professor Chen Wei

Overview

The rise of large language models (LLMs) has revolutionized natural language processing, but fine-tuning them for specific tasks often struggles to balance task performance against preserving general instruction-following ability. In this paper, we identify the distribution gap between the task dataset and the LLM as the main root cause of the problem. To address it, we introduce Self-Distillation Fine-Tuning (SDFT), which guides the model to rewrite the task dataset and then fine-tunes on the resulting distilled dataset, bridging the distribution gap by matching the model's original distribution. Experiments on various benchmark datasets with the Llama-2-chat models demonstrate that SDFT effectively mitigates catastrophic forgetting while achieving comparable or better performance on downstream tasks than vanilla fine-tuning. In addition, SDFT shows the potential to maintain the helpfulness and safety alignment of LLMs.

Paper: https://arxiv.org/abs/2402.13669

Code: https://github.com/sail-sg/sdft

01

The Landscape of LLM Fine-tuning

On the Hugging Face platform, numerous fine-tuned models appear every day, contributed both by community enthusiasts and by large research institutions. For example, a search for Llama-3-based models yields more than 9,000 results.


02

The Challenge of Enhancing Existing Models: Performance

While fine-tuning a model to improve performance on a specific task may sound simple, doing it well is challenging. For example, Meta has not disclosed the training data details of Llama 3, and the model was tuned on more than 10 million annotated examples that are not publicly available. As a result, it is not easy to collect private data that the model has not already seen and fine-tune on it effectively.


03

The Challenge of Enhancing Existing Models: Safety

Fine-tuning a large language model can weaken its safety. According to a paper at ICLR 2024, fine-tuned models can be significantly less safe. Even when a model has been aligned with human values through RLHF, fine-tuning can still undermine this alignment. Experiments show that even fine-tuning on a benign dataset reduces the safety of the model.


04

The Need for a Better Approach

Two challenges of fine-tuning were raised above: performance and safety. This article explores whether there is a better way to fine-tune for downstream task ability while still maintaining the model's original safety. This amounts to balancing customization of the model against preserving its general-purpose abilities.


05

The Root Cause of Challenge

Experiments in this paper show that the main cause of these challenges is the gap between the original model's data distribution and the fine-tuning data distribution. The image on the left shows the Llama-3-Instruct model's broad range of capabilities, such as code generation, storytelling, and text summarization, together with its alignment with human values that keeps the model safe. When fine-tuning on a particular task, however, the data typically comes from a much narrower distribution, which degrades the model's general performance.


06

Introducing Self-Distillation Fine-Tuning

Based on these findings, this study proposes a new fine-tuning strategy called Self-Distillation Fine-Tuning (SDFT). The method aims to align the task dataset with the language model's original distribution, reducing the gap between the two while preserving the supervision signal in the dataset. SDFT has the language model rewrite the target labels itself, so that new knowledge is integrated into the model's existing body of knowledge.

The paper provides a schematic diagram that illustrates the difference between the two fine-tuning approaches. The top half shows vanilla fine-tuning, where the language model is fine-tuned directly on a task dataset to improve performance on that task. However, this can cost the model other capabilities, resulting in what the paper calls a compromised language model. In contrast, the lower half depicts SDFT: a distilled dataset is first generated by self-distillation, and fine-tuning is then performed on that dataset, improving performance on the target task while avoiding damage to the model's original abilities.


07

Method: Self-Distillation Fine-tuning

The SDFT method proposed in this paper starts from a chosen chat model, such as Llama-3-Instruct or Llama-2-Chat. For a downstream task on which the model performs poorly, it rewrites the task dataset to produce a distilled dataset aligned with the original model's distribution, and then fine-tunes on that dataset, gaining the new skill while preserving the model's original capabilities. The diagram shows the shift from the original task-dataset distribution toward the model's own distribution: the distilled dataset (orange area) lies closer to the model's initial distribution.


The following figure shows the template used for self-distillation, along with an example of its application. The template is based on the Alpaca template and contains the original instruction and response, guiding the model to generate a new response based on this information. The right-hand part of the figure shows an example instruction and original response, together with the model's rewritten, distilled response. While the original response gives only a brief answer, the distilled response expands on it and incorporates the model's own knowledge to provide a more comprehensive answer.
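
To make this concrete, below is a minimal sketch of the self-distillation step in Python with Hugging Face transformers. The prompt wording, file names, and generation settings are illustrative assumptions rather than the authors' exact implementation; the official repository linked above contains the real template and pipeline.

```python
# Minimal sketch of the SDFT data-distillation step (prompt wording and
# file names are illustrative, not the authors' exact code).
import json
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-2-7b-chat-hf"  # the seed chat model

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto")

def build_distill_prompt(instruction: str, original_response: str) -> str:
    # Alpaca-style template asking the seed model to rewrite the target label
    # in its own words; the exact wording here is an assumption.
    return (
        "Below is an instruction that describes a task, paired with a reference answer. "
        "Write a response that answers the instruction, using the reference answer as a guide.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        f"### Reference answer:\n{original_response}\n\n"
        "### Response:\n"
    )

def distill_example(instruction: str, original_response: str) -> str:
    prompt = build_distill_prompt(instruction, original_response)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=512, do_sample=False)
    # Decode only the newly generated tokens, not the prompt.
    text = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    # Simple filter: fall back to the original label if the rewrite is empty.
    return text.strip() or original_response

# Hypothetical task dataset file: [{"instruction": ..., "response": ...}, ...]
task_dataset = json.load(open("task_dataset.json"))
distilled = [
    {"instruction": ex["instruction"], "response": distill_example(ex["instruction"], ex["response"])}
    for ex in task_dataset
]
json.dump(distilled, open("distilled_dataset.json", "w"), indent=2)
# The distilled dataset is then used for ordinary supervised fine-tuning,
# exactly as one would fine-tune on the raw task dataset.
```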


08

Experiments: SDFT vs. Vanilla Fine-tuning

In the experimental phase, the performance of vanilla fine-tuning and the proposed method is compared in detail.

The experiments focus on three representative downstream-task datasets: GSM8K (math word problems), HumanEval (code generation), and OpenFunctions (function calling). The table reports each model's performance on the different datasets after fine-tuning.
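
As an aside on how such scores are typically computed, the sketch below shows a common way to measure GSM8K-style exact-match accuracy: take the number after "####" in the reference solution as the gold answer and compare it with the last number in the model's output. The paper's actual evaluation harness may differ.

```python
import re

def extract_number(text: str):
    """Return the last number in the text (commas stripped), or None if absent."""
    matches = re.findall(r"-?\d+(?:,\d{3})*(?:\.\d+)?", text)
    return matches[-1].replace(",", "") if matches else None

def gsm8k_accuracy(predictions: list, references: list) -> float:
    """Exact-match accuracy: compare the model's final number with the gold
    answer that follows '####' in each GSM8K reference solution."""
    correct = 0
    for pred, ref in zip(predictions, references):
        gold = ref.split("####")[-1].strip().replace(",", "")
        if extract_number(pred) == gold:
            correct += 1
    return correct / len(predictions)

# Toy usage example:
preds = ["After adding them up, the answer is 42."]
refs = ["Step-by-step solution ... #### 42"]
print(gsm8k_accuracy(preds, refs))  # 1.0
```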

The data show that vanilla fine-tuning improves performance on the dataset it was fine-tuned on. For example, the accuracy of the original language model on OpenFunctions rises from 19.6 to 34.8. However, this gain is typically accompanied by a drop in performance on other datasets: after fine-tuning on OpenFunctions, accuracy on GSM8K falls from 29.4 to 21.5. This contrast highlights the limitation of vanilla fine-tuning. With the self-distillation fine-tuning method proposed in this study, by contrast, the results differ, indicating that the method can improve performance on a specific task while preserving the model's original abilities.


When fine-tuning on a specific task dataset, the proposed method not only improves performance on the target dataset, reaching an accuracy of 36.6, which matches or exceeds the gain from vanilla fine-tuning (the area marked by the blue box), but also causes much smaller drops on other datasets. Taking GSM8K as an example, accuracy falls to 21.5 after vanilla fine-tuning but only to 29.1 after self-distillation fine-tuning. This shows that while both fine-tuning methods improve performance on the target task, the proposed method is better at preserving the model's original broad capabilities.

This section examines how the models perform in terms of safety and helpfulness, comparing the two fine-tuning techniques: vanilla fine-tuning and self-distillation fine-tuning. In the chart, the results of vanilla fine-tuning are shown on the left and the results of SDFT on the right. The experiments show that after vanilla fine-tuning, the model's safety and helpfulness drop significantly. In contrast, a model fine-tuned with SDFT largely retains its original level of safety and helpfulness, preserving the model's reliability and usefulness alongside task performance.


09

Analysis: Distribution Gap

The analysis phase of the study examines the distribution gap between the fine-tuned model and the original model. By running inference on the dataset, the similarity between the fine-tuned and original models in the embedding space is evaluated as a measure of how much the model's distribution has shifted. In the chart, the red area represents the embedding similarity between the vanilla fine-tuned model and the original model, while the green area shows the embedding similarity after self-distillation fine-tuning. The results show that the self-distilled model has markedly higher embedding similarity, indicating that the method effectively reduces the distribution shift and alleviates forgetting during fine-tuning.
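
In the same spirit, the sketch below illustrates one way to measure such embedding similarity: mean-pool the last-layer hidden states of the seed model and of a fine-tuned model on the same prompts and compare them with cosine similarity. The checkpoint path, pooling choice, and metric are assumptions for illustration; the paper's exact procedure may differ.

```python
import torch
from transformers import AutoModel, AutoTokenizer

SEED = "meta-llama/Llama-2-7b-chat-hf"
TUNED = "path/to/finetuned-checkpoint"  # hypothetical path to the fine-tuned variant

tok = AutoTokenizer.from_pretrained(SEED)

@torch.no_grad()
def sentence_embeddings(model_name: str, prompts: list) -> torch.Tensor:
    """Mean-pool the last-layer hidden states of each prompt into one vector."""
    model = AutoModel.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")
    vectors = []
    for p in prompts:
        inputs = tok(p, return_tensors="pt").to(model.device)
        hidden = model(**inputs).last_hidden_state          # (1, seq_len, dim)
        vectors.append(hidden.mean(dim=1).squeeze(0).float().cpu())
    return torch.stack(vectors)

prompts = ["Write a short poem about the sea.", "Explain what a binary search tree is."]
seed_emb = sentence_embeddings(SEED, prompts)
tuned_emb = sentence_embeddings(TUNED, prompts)

# Cosine similarity per prompt: higher values mean the fine-tuned model's
# representations stayed closer to the seed model's distribution.
sims = torch.nn.functional.cosine_similarity(seed_emb, tuned_emb, dim=-1)
print(sims.mean().item())
```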


10

Analysis: Effective across Models

Further analysis verifies that the proposed method generalizes across models and scales. The earlier tables focus on the Llama-2-7b-chat model with LoRA fine-tuning. To complement this, the study covers three additional settings: first, full-parameter fine-tuning of Llama-2-7b-chat, which yields significant performance improvements; second, LoRA fine-tuning of Llama-2-13b-chat; and third, LoRA fine-tuning of the newly released Llama-3-8B-Instruct. In all three settings, the proposed method outperforms vanilla fine-tuning, showing that it is effective and applicable across model sizes and architectures.
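
For reference, a LoRA fine-tuning setup of this kind can be approximated with the peft library roughly as below. The rank, alpha, dropout, and target modules shown are common choices for Llama-style models and are not necessarily the exact hyperparameters used in the paper.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-chat-hf", device_map="auto"
)

# Typical LoRA configuration for Llama-style attention projections;
# the values here are illustrative, not the paper's exact settings.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the low-rank adapters are trained
# The same supervised fine-tuning loop is then run on either the raw task
# dataset (vanilla fine-tuning) or the distilled dataset (SDFT).
```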


11

Takeaway

The core finding of this study is that distribution shift is a key cause of catastrophic forgetting during fine-tuning. To address this, the proposed method uses self-distillation to narrow the distribution gap and effectively alleviate forgetting. The experimental results confirm that the method not only improves performance on the target task but also preserves the model's original abilities.

