
Large Model Recommender Systems: Progress and Future

This article shares the current status and future trends of applying large language models in recommender systems, and reviews recent related work and development directions in the academic community. Most of the content is based on the tutorial of the same name at WWW'24; interested readers can check out the full tutorial.

The main content of this sharing includes the following six parts:

1. Background on applying large models to recommendation

2. Current progress of large model recommender systems, including LLM4Rec and trustworthy LLM4Rec

3. Open Problems related to large model recommender systems

4. Possible future research directions for large model recommender systems

5. Thanks

6. Q&A session

Guest Speaker|Dr. Wang Wenjie, Research Fellow, National University of Singapore

Edited and organized|Wu Yeguo

Content proofreading|Li Yao

Produced by DataFun

01

Introduction

1. Background of RecSys


The diagram shows a simplified workflow of a recommender system, reflecting how it interacts with users.

The recommender system generates personalized recommendation lists for users through a multi-stage or otherwise complex filtering pipeline, based on their past interaction history and contextual information. When users receive a recommended list, they are influenced by external information and their surroundings as they interact with it, for example by clicking, purchasing, or favoriting. This user feedback is collected and used in the next round of training the recommender, forming a feedback loop.


In the past, generating a recommendation list was divided into multiple stages. The first is recall: collaborative filtering (CF) algorithms such as matrix factorization retrieve candidate items, performing coarse-grained filtering.


The system then takes into account richer context information and fine-grained characteristics of users and items, performing finer-grained operations such as ranking and reranking.
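
To make the recall stage concrete, below is a minimal sketch of matrix-factorization recall, assuming user and item factors have already been trained (random values stand in here). It is an illustration only, not any specific production system.

```python
# Minimal sketch of the recall stage: matrix factorization scores all
# items for a user, and the top-K survive coarse filtering for the
# later ranking/reranking stages. Dimensions and data are toy values.
import numpy as np

rng = np.random.default_rng(0)
num_users, num_items, dim, K = 1000, 50_000, 32, 200

# In practice these factors come from training on the interaction
# matrix (e.g., with ALS or SGD); random values stand in here.
user_factors = rng.normal(size=(num_users, dim))
item_factors = rng.normal(size=(num_items, dim))

def recall_candidates(user_id: int) -> np.ndarray:
    """Coarse-grained recall: score every item by inner product and
    keep the top-K as candidates for ranking/reranking."""
    scores = item_factors @ user_factors[user_id]
    return np.argpartition(-scores, K)[:K]

print(recall_candidates(user_id=42)[:10])
```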

2. Benefit of LMs


There has been a lot of work on applying language models to recommendations:

  • Borrowing the model architecture of language models, using transformers to model sequential patterns for recommendation.
  • Borrowing BERT's style of task formulation to recast recommendation tasks.
  • Using the text-information extraction ability of language models to represent user behavior and exploit knowledge from the relevant context.
  • Borrowing learning paradigms from language models, such as pretrain-finetune and prompt learning.

Before the advent of large language models, language models already provided considerable inspiration for building recommendation models, for example:

  • On architecture, BERT4Rec and SASRec applied transformer and self-attention structures;
  • Using the language model as an item encoder for semantic information extraction;
  • Using a language model to unify multiple recommendation tasks, so the same model handles different tasks; P5 and M6-Rec are representative examples.

Since the advent of ChatGPT, it has been found that once a model is scaled up to a certain extent, it exhibits very strong capabilities, especially in language understanding, language interaction, and rich world knowledge, which sets it apart from previous language models.

Large language models have strong in-context learning capabilities; previous language models could not converse naturally, nor learn an inference strategy to answer questions from few-shot examples. Building on earlier work that used language models to unify multiple recommendation tasks, researchers have also tried using large language models to unify recommendation tasks: performing recall, ranking, and recommendation explanation, and invoking tools to assist users with traditional recommendation tasks.

02

Progress of LLM4Rec

Next, we introduce the main directions for using large language models in recommendation, along with representative work in each direction.

1. LLMs for Recommendation


As mentioned earlier, recommendation models can learn a lot from language models, such as model architecture, learning paradigms, and representation learning. Large language models offer even more to learn from:

  • Strong interaction capability. Large models can provide users with a much better interactive experience. In the past, interaction with users happened mainly through passive feedback; a big reason users are unwilling to interact with the recommender system or give instructions is that they do not want to spend time and energy providing information, preferring the system to simply guess what they like. When the model is smart enough, even very simple information from the user can yield a greatly improved experience, so users may become willing to provide simple instructions that significantly improve recommendation accuracy. The way recommendation models interact with users is therefore likely to change in the future.
  • Strong generalization ability. Large models generalize well across multiple domains; with zero-shot or few-shot in-context learning, they can use their own world knowledge to improve generalization. Many recommendation scenarios require generalization, such as cross-domain scenarios, where the model must reason and generalize based on world knowledge.
  • Generation ability. We hope the recommender system can customize and generate personalized content for users: not only recommending human-generated content, but also generating content through AIGC and explaining the recommended results.

2. Progress of LLM4Rec


The following reviews existing work along three dimensions.

(1) Metrics.

What metrics do LLM4Rec methods optimize? Different works have different focuses, and recommendation pursues many different goals. For example, the accuracy of LLM4Rec is optimized first, and then various trustworthiness metrics are considered, such as fairness, privacy, safety, and out-of-distribution (OOD) generalization.

(2) Information modality.

The most basic is of course the text modality. The earliest exploration for recommendation was how to use item-text and context-text information and have the large language model reason over the text modality. In many scenarios, multimodal information on the user side is relatively fixed; for example, a user may have an avatar and a text description, while items generate many videos and images every day, along with corresponding text descriptions, tags, and other multimodal information. Multimodal information is very useful in many scenarios, so how to make good use of video and image modalities, and how to integrate this multimodal information into the LLM4Rec architecture, is another dimension under exploration.

(3) How LLMs are technically used to make recommendations.

There are three main categories. The first is in-context learning, which requires no tuning: an LLM that has gone through pretraining and instruction tuning is used directly, and given a few few-shot examples, it learns to generate the task's output from its input. The same holds in recommendation: we treat recommendation as a new task and use a few few-shot data samples to instruct the LLM to perform it. The second is tuning, which trains the model's parameters on recommendation data so that it better completes the recommendation task. The third is the agent, a special case that may combine in-context learning, tuning, RAG, and other techniques; it is singled out because agents have genuine distinctiveness in recommendation, the related work differs somewhat from accuracy-only work, and some work specifically uses agents to complete recommendation tasks.

3. In-context learning


Making recommendations with in-context learning is relatively simple. The most straightforward idea is to put a few data samples (user history preference data) into the prompt and instruct the LLM to generate whether the user will like an item, which supports point-wise, pair-wise, list-wise, and other output formats. In addition, some work uses the in-context learning ability of large models for data augmentation, such as supplementing missing user information: generating a text description from the user's own interaction history and profile, then feeding the augmented data into a traditional recommendation model.

To sum up:

  • Since the large model has not been trained on recommendation data, items and user history are generally described in language, and a candidate list must be given for the large model to do point-wise, pair-wise, or list-wise ranking (see the sketch after this list).
  • The other type of task uses in-context learning to extract or generate information that helps a traditional recommendation model perform the recommendation task well.
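
As an illustration of the first usage, here is a minimal sketch of building a point-wise few-shot prompt. The wording, the example data, and the `llm` call are assumptions, not any particular paper's prompt.

```python
# Minimal sketch of point-wise in-context-learning recommendation:
# a few (history, candidate, label) examples go into the prompt, and
# the LLM is asked to judge a new candidate.
FEW_SHOT = [
    ("Titanic, Forrest Gump", "The Notebook", "Yes"),
    ("Alien, The Thing", "Love Actually", "No"),
]

def build_prompt(history: str, candidate: str) -> str:
    lines = []
    for h, c, y in FEW_SHOT:
        lines.append(f"User history: {h}\nCandidate: {c}\nWill the user like it? {y}")
    lines.append(f"User history: {history}\nCandidate: {candidate}\nWill the user like it?")
    return "\n\n".join(lines)

prompt = build_prompt("The Matrix, Inception", "Interstellar")
# answer = llm(prompt)  # hypothetical LLM call; expected output: "Yes" or "No"
print(prompt)
```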

4. Tuning LLM4Rec


The performance of in-context learning for recommendation is not good, because there is a gap between the large model's pretraining and the recommendation task: pretraining does not cover instructions for recommendation tasks, nor has the model seen many different recommendation-task data formats.


To eliminate this gap, the large model needs to be aligned to the recommendation task.

Much existing LLM4Rec work explores how to better train a large language model so that it not only performs recommendation tasks well, but also makes good use of the knowledge it acquired during its pretraining stage, before tuning.

Existing Tuning LLM4Rec work can be divided into two categories at the task level:

  • The first type is the discriminative paradigm: as in traditional recommendation, a candidate set is provided and the large model is asked, via instructions, to choose from it. A prompt might say: here is the user's interaction history, please recommend an item the user may like; then a candidate is provided and the model is asked whether the user will like it, or asked to pick one of two items (pair-wise), or one from a list. In essence, item candidates are given and the model makes a judgment. Although the large model is indeed generating next tokens, at the task level it is judging whether the user likes an item given the history and candidates, much like a CTR task, where a user-item pair is given and the model predicts the probability of a like or a click.
  • The second type is generative: no item candidates are provided, and the large model is asked to directly generate the next item the user will click on or like, given the interaction history. This requires the large language model to be trained on the recommendation data so that it knows what candidate items exist; this is the generative setting.

The first category is further divided into two subcategories: the first trains only some parameters, such as a LoRA adapter or only one or two layers; the second retrains all parameters of the entire LLM, called full tuning. A sketch contrasting the two task paradigms as training samples follows below.
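
Here is a small sketch contrasting the discriminative and generative paradigms as training samples; the field names and prompt wording are illustrative assumptions.

```python
# Two ways to cast the same interaction history as a tuning sample.
history = ["The Matrix", "Inception", "Tenet"]

# Discriminative: candidates are given; the target is a judgment,
# much like CTR prediction.
discriminative_sample = {
    "input": (f"The user watched {', '.join(history)}. "
              "Will the user like 'Interstellar'? Answer Yes or No."),
    "output": "Yes",
}

# Generative: no candidates; the target is the next item itself, so
# the model must learn the item space from the recommendation data.
generative_sample = {
    "input": f"The user watched {', '.join(history)}. The next movie is:",
    "output": "Interstellar",
}
```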

5. Tuning LLM4Rec: TALLRec


Let's start with a solution that trains only a subset of parameters: TALLRec, which we published at RecSys'23. It takes a relatively large LLM, such as a 7B LLaMA, and fine-tunes it with LoRA. LoRA fine-tuning involves relatively few parameters, but we found that tuning only these parameters, with not many samples, even just dozens, can greatly improve the LLM's recommendation performance. Only a small number of samples and a small number of trained parameters are needed to quickly adapt the LLM to a new recommendation task. This also shows that when a large model generalizes to a task, only a few samples are needed to tell it how to follow the task's format and perform inference. Not only does it require few samples; it also makes use of its own pretrained knowledge. Here we use the item title as input, which lies in the same space as the text the large language model was pretrained on.
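
Below is a minimal sketch of this kind of LoRA fine-tuning using the Hugging Face peft library. The base model name, target modules, and hyperparameters are assumptions; TALLRec's exact setup is in the RecSys'23 paper.

```python
# Minimal LoRA setup in the spirit of TALLRec: wrap a 7B LLaMA with a
# small LoRA adapter and train only those parameters on a few
# recommendation samples.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # assumed model
lora = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections only
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # typically well under 1% of weights

# Training then uses a standard causal-LM loss over prompts such as
# "User history: ...; Will the user enjoy <item title>? Answer: Yes/No",
# with only dozens to hundreds of samples.
```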


We found that this few-shot tuning method with item titles also generalizes well to some cross-domain recommendations. For example, the model can be trained on movies and tested on books, or trained on books and tested on movies, and still perform well, once again demonstrating its generalization ability.

6. Tuning LLM4Rec: InstructRec


This section introduces InstructRec, a model that is fully tuned, training all parameters. Compared to TALLRec, there are other differences beyond the trained parameters. InstructRec holds that user instructions are very diverse, some explicit and some oblique, and that different instructions represent different requirements. This work first builds a very diverse set of recommendation instructions, and then trains the model on the constructed instruction data to follow different instructions for different tasks, such as product search, personalized search, and sequential recommendation. Different tasks involve different instructions, and even the same task has varied instructions; for example, some users state their needs explicitly and others more obliquely.


Many templates were designed for this work. First, triples are built covering the user's preference, the user's intention (vague or specific), and the user's task (pairwise, listwise, etc.). Templates can then be used to assemble many different instructions.

With these instructions in hand, instruction-tuning data can be constructed from users' historical interaction data, and the generated instructions are then used to train the model.
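
A minimal sketch of this template-assembly idea follows; all template strings and fields are illustrative assumptions rather than InstructRec's actual templates.

```python
# Assemble diverse instructions from (preference, intention, task)
# triples; each instruction is later paired with targets derived from
# real interaction history to form instruction-tuning data.
import itertools, random

preferences = ["enjoys sci-fi movies", "often buys running gear"]
intentions = ["wants something relaxing tonight",      # vague
              "is looking for a waterproof jacket"]    # specific
task_forms = ["Recommend one item.",                   # pointwise
              "Rank the following candidates."]        # listwise

TEMPLATE = "The user {pref} and {intent}. {task}"

def make_instructions():
    for pref, intent, task in itertools.product(preferences, intentions, task_forms):
        yield TEMPLATE.format(pref=pref, intent=intent, task=task)

print(random.choice(list(make_instructions())))
```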

7. Tuning LLM4Rec: BIGRec


The generative approach gives the history and directly asks what the next item will be. For example, when we describe the user's interaction history in natural language as input, the large model needs to know how to represent an item in order to generate the next one, and the item must be represented as a token sequence.

We can use each item's title as its identifier, organize the user's interaction history into natural language, and pair it with the title of the next item for training. The problem is how to make the large language model know which token sequences are valid titles; instruction tuning helps, but the large language model will inevitably generate some nonexistent titles during inference.

There are currently two approaches:

  • One is free generation followed by grounding: first let the large model freely generate an item title. Because the model has been instruction-tuned, the token sequence it generates is already in, or quite close to, the space of item titles, although some tokens may still be wrong, say 3 out of 10. A grounding stage is then added to match the generation to the most similar real item (see the sketch after this list).
  • The other approach is constrained generation.
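
Here is a minimal sketch of the grounding stage in the first approach: embed the freely generated title and snap it to the nearest real item. The encoder choice and matching rule are assumptions; BIGRec's exact procedure is described in the paper.

```python
# Ground a possibly invalid generated title onto the real catalog by
# cosine similarity over sentence embeddings.
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder
catalog = ["The Matrix", "The Matrix Reloaded", "Toy Story", "Titanic"]
catalog_emb = encoder.encode(catalog, normalize_embeddings=True)

def ground(generated_title: str) -> str:
    """Match a generated title to the most similar real item."""
    q = encoder.encode([generated_title], normalize_embeddings=True)
    return catalog[int((catalog_emb @ q[0]).argmax())]

print(ground("The Matrixx Reloded"))  # -> "The Matrix Reloaded"
```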

In current work with item titles, we found that good recommendation performance can be achieved with very few samples, because it is relatively easy for large language models to understand roughly what item titles mean.


The work also finds that the large model adapts quickly to recommendation with a few samples, but adding more samples brings only modest improvement. Unlike traditional models, whose performance jumps once the data reaches a certain scale, its improvement curve is relatively flat, and each gain is small relative to the previous one. With more data, traditional recommendation models can better exploit collaborative information, such as what kind of person likes what kind of item, which is why traditional models only do better once enough data has accumulated. One reason BIGRec's improvement is less significant is that it does not make good use of the collaborative filtering information in the interaction history.


We can compute item popularity and let BIGRec take the item's statistical information into account at inference time, such as popularity and what kind of people like what kind of item. Injecting statistics significantly improves recommendation compared with not injecting them.
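
A minimal sketch of one way to inject such statistics at inference time: blend the grounding similarity score with a log-scaled popularity prior. The additive blend and the weight `alpha` are assumptions, not BIGRec's exact formulation.

```python
# Rerank grounded candidates with a popularity prior.
import numpy as np

def rerank_with_popularity(sim_scores: np.ndarray,
                           click_counts: np.ndarray,
                           alpha: float = 0.1) -> np.ndarray:
    """Return item indices sorted by similarity plus popularity prior."""
    pop = np.log1p(click_counts)
    pop = pop / pop.max()                     # normalize to [0, 1]
    final = sim_scores + alpha * pop          # simple additive blend
    return np.argsort(-final)

sims = np.array([0.82, 0.80, 0.55])           # grounding similarities
clicks = np.array([120, 98_000, 4])           # item popularity
print(rerank_with_popularity(sims, clicks))   # a popular item can move up
```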

8. Tuning LLM4Rec: TransRec


Next, let's look at how the constrained generation mentioned earlier produces valid item identifiers.

There are two key steps to achieving this with LLM4Rec:

(1) How to index items into the language model so the model knows each item. There are many item-indexing methods: text-based methods can use item titles, attributes, and descriptions; ID-based methods directly generate an ID as in traditional recommendation models, which requires a large amount of training data; and a codebook can also be used to generate item identifiers.

(2) How to generate the next item, that is, the generation and grounding step. One can directly generate the item title or the item identifier, or attach a classification layer for discrimination.


The idea of this work is to represent an item from multiple facets and construct a multi-facet identifier: the ID captures subtle differences between items, the item title covers the item's semantic information and aligns with the large model's pretrained knowledge, and item attributes can be used as well. Intuitively, a tree can restrict the model to generating only valid paths when producing tokens. For example, in beam search, after the first token is generated, there is a restriction on what the second token can be, limited to the set of valid item identifiers. This work uses a technique called FM-index, which allows the item identifier to be generated starting from any position, with the restriction enforced as in a trie.
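
As an illustration of trie-style constrained decoding, here is a minimal sketch using Hugging Face's `prefix_allowed_tokens_fn`; the model, prompt, and toy catalog are assumptions. Note that TransRec's FM-index additionally allows matching to start from any position inside an identifier, which a plain prefix trie does not.

```python
# Constrain beam search so generated tokens always follow a valid
# item-identifier path in a trie.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")  # assumed model
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Build a trie over the token sequences of all valid item identifiers.
trie = {}
for title in ["The Matrix", "The Godfather", "Toy Story"]:  # toy catalog
    node = trie
    for t in tok.encode(title, add_special_tokens=False):
        node = node.setdefault(t, {})
    node[tok.eos_token_id] = {}  # identifier may end here

prompt = "The user watched The Godfather Part II. Next item:"
prompt_len = len(tok(prompt).input_ids)

def allowed_tokens(batch_id, input_ids):
    # Walk the trie along the tokens generated so far; only children
    # of the current node are legal next tokens.
    node = trie
    for t in input_ids[prompt_len:].tolist():
        node = node.get(int(t), {})
    return list(node.keys()) or [tok.eos_token_id]

out = model.generate(
    **tok(prompt, return_tensors="pt"),
    num_beams=5,
    max_new_tokens=16,
    prefix_allowed_tokens_fn=allowed_tokens,
)
print(tok.decode(out[0][prompt_len:], skip_special_tokens=True))
```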

9. Tuning LLM4Rec: LC-Rec


There are three types of item indexes: ID, text, and codebook. A codebook can exploit items' multimodal, text, and semantic information, encoding it into combinations of codes with an autoencoder (RQ-VAE). A codebook pre-stores many vectors, like a learnable embedding matrix. For example, a text description is first converted into a dense vector with a language model and then compressed with the autoencoder; during compression we determine which combination of codebook vectors approximates the dense vector. If vector 1 plus vector 3 plus vector 4 equals the compressed vector, then its code is 134. This codebook form requires a large amount of training data, because the large model does not know the meaning of the codes and can only infer it by being fed enough data through multi-task training.
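
Below is a minimal sketch of the encode half of this idea, residual quantization, with random codebooks standing in for learned ones; a real RQ-VAE learns the codebooks jointly with an encoder and decoder.

```python
# Residual-quantization encoding: each level quantizes the residual
# left by the previous levels, so the sum of the chosen codewords
# approximates the input vector and the code indices become the
# item's identifier tokens.
import numpy as np

rng = np.random.default_rng(0)
num_levels, codebook_size, dim = 3, 256, 64
codebooks = rng.normal(size=(num_levels, codebook_size, dim))  # assumed learned

def rq_encode(z: np.ndarray) -> list[int]:
    codes, residual = [], z.copy()
    for level in range(num_levels):
        # Nearest codeword to the current residual.
        dists = np.linalg.norm(codebooks[level] - residual, axis=1)
        idx = int(dists.argmin())
        codes.append(idx)
        residual = residual - codebooks[level][idx]
    return codes  # e.g. [17, 203, 5]

item_embedding = rng.normal(size=dim)  # e.g. from an LM text encoder
print(rq_encode(item_embedding))
```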

10. Agent for Recommendation


Above we discussed how to use tuning methods to achieve good accuracy, mainly with text-modality information. Next is a brief introduction to agent-related work.

LLM-empowered agents for recommendation fall into two main categories:

  • One is Agent as User Simulator, which simulates user behavior and performs interactive evaluation, acting as an evaluator of recommendation algorithms. There is a lot of work in this direction; two representative works are listed in the figure above.
  • The other is Agent for Recommendation, which directly uses an agent to make recommendations. The agent can, like a master brain, call tools to meet users' various information needs, for example adding natural-language interaction and other means of interaction on top of traditional ones. Other capabilities of LLMs, such as planning, can also be leveraged.

11. Agent: BiLLP


Recommendation has long-term goals: optimizing user retention is a long-term goal across multiple rounds of recommendation, and so is ensuring diversity and fairness over multiple rounds.

BiLLP, a SIGIR'24 work, divides the optimization of long-term metrics into two levels: macro-level planning and micro-level learning.

  • First, at the macro level, plan what strategy should be implemented in each round of a multi-round recommendation process. For example, to ensure long-term diversity, decide what to promote in this round, what to recommend in the next round, and so on.
  • Then, at the micro level, carefully decide what to push in the current round. In each round, the recommendation strategy is optimized at a fine-grained, micro level while following the macro guidance (see the sketch after this list).
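
The sketch below shows the bi-level shape of the idea: a macro planner emits a round-level strategy and a micro policy instantiates it as a concrete item. The prompts, the `llm` callable, and the `simulate_user` stub are all illustrative assumptions; BiLLP's actual planner, actor, memory, and reflection design is in the SIGIR'24 paper.

```python
def simulate_user(state, item):
    # Hypothetical stand-in for real user feedback.
    return {"clicked": True, "watch_time": 12.5}

def run_session(llm, user_state, candidates, num_rounds=5):
    trajectory = []
    for t in range(num_rounds):
        # Macro level: plan a strategy for this round with long-term
        # goals (retention, diversity) in mind.
        plan = llm(
            f"Interactions so far: {trajectory}. Give a one-line strategy "
            f"for round {t} that keeps long-term engagement and diversity high."
        )
        # Micro level: follow the plan and pick a concrete item.
        item = llm(
            f"Strategy: {plan}. User state: {user_state}. "
            f"Pick exactly one item from {candidates}."
        )
        trajectory.append((plan, item, simulate_user(user_state, item)))
    return trajectory

# Usage with a dummy LLM callable:
print(run_session(lambda p: "stub", "likes sci-fi", ["A", "B", "C"], num_rounds=2))
```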

In experiments, multiple rounds of evaluation demonstrated the scheme's effectiveness for long-term recommendation.

12. Trustworthy LLM4Rec


In addition to optimizing accuracy, there is also work on aligning large language models with trustworthiness goals, such as optimizing fairness, handling privacy (using machine unlearning and federated learning), and considering attacks, explanation, and OOD generalization. The relevant work is listed here; if you are interested, you can read further.

03

Open Problems

1. Open Problems


Here is a brief introduction to some open problems, focusing on my own understanding of them.

  • Modeling: how to represent recommendation data so that the LLM can truly combine its pretrained knowledge with the recommendation data for inference. Modeling is not yet particularly good: we have not observed strong generalization across domains or across recommendation tasks, and there is no foundation model for the recommendation domain.
  • Cost: large model recommendation suffers from latency and cost problems.
  • Evaluation: How to evaluate the effectiveness of large model recommendations.

2. Modeling


Tokenizing recommendation data is relatively difficult, because recommendation data is relational, not factual or purely textual. Describing the relationships is complicated; for example, N entities have O(N^2) pairwise relations, and modeling them is very complex, especially when the collaborative information must be captured.

3. Cost: Training


In continuously retraining the recommendation model, how to be data-efficient and reduce cost is also worth attention. How to fine-tune the recommendation large language model with less data while still performing the recommendation task well is a representative line of work.

4. Cost: Inference


Beyond this, inference acceleration is also important. One existing idea is to first train a large model, distill it into a small model, and then use the small model for inference.

5. Evaluation: Data Issues


There are many recommendation datasets, but many are not well suited for LLM4Rec tasks, because the demand for non-anonymized semantic information is relatively high. Many datasets predate large models, and some large models have already seen part of the recommendation data during training; we cannot say they have seen all of it, but they have likely seen some, so ensuring fair testing is a problem. We need new evaluation data containing semantic information that large models have not seen.

Another problem is how to evaluate interactivity. For example, in conversational recommendation, how does the policy change and how does user behavior change? Evaluating such interactive recommendation is difficult, as is evaluating long-term metrics over multiple rounds of recommendation.

04

Future Direction & Conclusions

Beyond these open problems, let's briefly look at other future directions worth pursuing.

1. Generative Recommendation Paradigm


In the past, the content in recommendation was all human-generated, by experts or users, such as expert-generated news and user-uploaded short videos. We believe AI-generated content will further enrich the recommendation ecosystem in the future.

On the one hand, AI can help creators create: there are already tools that help users make short videos, and in the future AI will help creators and experts generate more diverse and personalized content. For example, the same cooking video could be generated in a different version for each person, such as a condensed version for the elderly and a more detailed version for younger viewers. AIGC has greatly lowered the threshold for creation, and personalized content generation will be an important research direction.

2. Rec4Agentverse


The other direction is Rec4Agentverse. On OpenAI's agent platform, GPTs, more and more people are contributing a variety of agents, each with its own strengths. In the future, different companies may develop their own unique agents, forming a very diverse universe of agents: some good at fashion recommendations, some at travel planning. How to perform personalized filtering among so many agents, that is, determining what kind of agent a user needs and which agent can provide more personalized service, requires an Agent Recommender that can recommend agents in a personalized way, interact with them, and have them provide personalized services for the user, much like calling tools.

3. Actions Speak Louder than Words


Foundation models for recommendation are also being explored; this is the Generative Recommender architecture that Meta is working on.


They found that, as the amount of data and the number of parameters increase, generative models for recommendation can achieve better results than traditional models. This effect was observed on some open-source data and Meta's own data, and a certain scaling law was verified.

4. Large Behaviour Model


We already know some facts about the Large Behaviour Model, for example that a scaling law exists in certain scenarios and that it can generalize relatively well.

But there are still some things that are unknown:

  • How to integrate world knowledge into recommendation, and whether the pretrained knowledge is really being used well.
  • How to tokenize user behavior: behavior includes the item, the behavior type, and different contexts, and whether it can be aligned with the pretrained information is also a key issue to explore in the future.
  • How to model short-term and long-term behavior, as well as how to plan for short-term and long-term goals, are also questions future Large Behaviour Models must consider.
  • Understanding the user and accurately predicting the user's next behavior is very difficult and may involve modeling user behavior, but where is the bottleneck? Is the input information insufficient, is user information poorly understood, is the model's reasoning ability weak, or are the patterns in the data too diverse? Locating the bottleneck is a question worth exploring. If we can simulate well and predict the user's next behavior well, this approximates ranking in recommendation, since the two are equivalent to some extent: when I can predict the user's next behavior well, I know what is better for the user. The two are complementary, and this is also a problem worth considering for future recommendation models.

05

Thanks

If you are interested in learning more, you can browse the tutorial shown in the picture above and follow our official account "Zhijinge"; we will continue to introduce our latest work at SIGIR'24.

06

Q&A session

Q1: What is the length or magnitude of user behavior used in the Actions Speak Louder than Words work? What kind of data does it come from?

A1: This work is from Meta, and the paper does not say what the specific scenario is, which may involve Meta's privacy policy. It is probably Meta's main social platform, which has short-video and image information, including posts.

Q2: Are there recommendation scenarios using LLMs in production now?

A2: Today, industry tends to apply LLMs to enhance existing recommendation algorithms or models; it will not completely overturn the existing algorithms, and generally iterates version by version. Using LLMs wholesale is not realistic, given the relatively high cost. For example, Meta's Actions work is not a classic LLM; the architecture and training methods are modified.

Q3: How does LLM help recommender systems incentivize users to generate better content?

A3: That's a great question. Content generation is important in the recommendation ecosystem. However, academia pays relatively little attention to this aspect, assuming the recommended content is simply given. In fact, content is still very important for a platform: since the algorithm performs filtering, there must first be good content, and good content is a very important part of the recommendation ecology. In this context, we believe large language models, including multimodal models, will surely help users generate richer content more easily in the future; for example, the same video could cheaply yield many different versions.

Some existing work addresses personalized content generation; we have a paper published at this year's SIGIR'24. Huawei also has a work called PMG on personalized multimodal generation.

That's all for this sharing, thank you.

