UrbanGPT, the first large model for smart cities, is now fully open source | HKU & Baidu

QbitAI

2024-06-01 15:10 Posted on the official account of Beijing Qubit

Contributed by UrbanGPT team

QbitAI | Official account: QbitAI

Spatiotemporal prediction technology is having its ChatGPT moment.

Spatiotemporal forecasting aims to capture the dynamics of urban life and predict how they will evolve, covering not only traffic and crowd flows but also other dimensions such as crime trends. Current deep spatiotemporal prediction methods depend on large amounts of training data to produce accurate models, which makes them hard to apply when urban data is scarce.

A joint team from HKU and Baidu drew on the idea of large language models to propose a new type of spatiotemporal large language model, UrbanGPT.

The model shows strong versatility across a variety of urban application scenarios. By combining a spatiotemporal dependency encoder with instruction fine-tuning, it improves its understanding of the complex relationships between time and space and delivers more accurate predictions even when data is scarce. In extensive experiments, UrbanGPT outperforms the compared methods on multiple city-related tasks and shows strong potential in zero-shot learning.

UrbanGPT, a spatiotemporal large language model

Challenge 1: Label scarcity and high training costs

Although cutting-edge spatiotemporal networks excel at prediction tasks, their performance depends on large amounts of labeled data. In urban applications such data is often hard to obtain: monitoring traffic and air quality across an entire city, for example, can be quite costly. In addition, these models usually generalize poorly to new regions or new tasks and must be retrained to adapt to different spatiotemporal environments.

Challenge 2: LLMs and existing spatiotemporal prediction models have limitations in zero-shot generalization

As shown in Figure 1, the large language model LLaMA can infer traffic patterns from the input text. However, when dealing with numerical time series data that has complex spatiotemporal dependencies, LLaMA's predictive power is limited, and it sometimes produces predictions that contradict reality. Meanwhile, pre-trained baseline models encode spatiotemporal dependencies effectively, but they may overfit the original training data and perform poorly in new scenarios for which they have seen no data (zero-shot scenarios).

Challenge 3: How to extend the excellent reasoning power of LLMs to the field of spatiotemporal prediction

Spatiotemporal data has its own unique properties, which differ from the information encoded by LLMs. Bridging this gap and constructing a spatiotemporal large language model that can exhibit excellent generalization performance in a variety of urban tasks is a major challenge.

△Figure 1: Compared with LLMs and existing spatiotemporal graph neural networks, UrbanGPT can better predict future spatiotemporal trends in zero-shot scenarios

UrbanGPT, a spatiotemporal large language model

According to the team, this is the first attempt to create a spatiotemporal large language model that can predict multiple urban phenomena on different datasets, especially in scenarios where training samples are limited.

The team proposes a spatiotemporal prediction framework called UrbanGPT, which gives large language models the ability to understand the complex interdependencies between time and space. By combining a spatiotemporal dependency encoder with an instruction fine-tuning strategy, the framework integrates spatiotemporal information with the reasoning ability of large language models.

Extensive experiments based on real-world data have verified UrbanGPT's excellent generalization performance in zero-shot spatiotemporal learning scenarios. These experimental results not only highlight the strong generalization potential of the UrbanGPT model, but also confirm its effectiveness in accurately predicting and understanding spatiotemporal patterns, even in the absence of training samples.

△Figure 2: UrbanGPT overall framework

Spatiotemporal dependency encoder

LLMs excel at language tasks, but they have difficulty modeling the time series and evolution patterns inherent in spatiotemporal data. To overcome this problem, the team integrates a spatiotemporal encoder that improves the ability of large language models to capture temporal dependencies in a spatiotemporal context. Specifically, the encoder consists of two core components: a gated dilated convolution layer and a multi-level correlation injection layer.

The gated dilated convolution layers encode temporal dependencies of different degrees at different levels, capturing temporal evolution at different granularities. To preserve these temporal patterns, the team introduced a multi-level correlation injection layer that incorporates the correlations between the different levels.
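The gating idea can be illustrated with a small NumPy sketch. It assumes a dilated 1-D convolution over the time axis, a sigmoid gate, and a residual connection; the function names, kernel size, and weight shapes are illustrative choices rather than the team's released implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dilated_conv1d(x, w, b, dilation=1):
    """Valid (unpadded) 1-D convolution over time.

    x: (T, C_in) series for one region; w: (K, C_in, C_out); b: (C_out,).
    """
    kernel_size = w.shape[0]
    t_out = x.shape[0] - (kernel_size - 1) * dilation
    out = np.zeros((t_out, w.shape[2]))
    for k in range(kernel_size):
        # Tap k reads the input shifted by k * dilation time steps.
        out += x[k * dilation : k * dilation + t_out] @ w[k]
    return out + b

def gated_temporal_layer(x, w_value, b_value, w_gate, b_gate, dilation=1):
    """Gate a convolved signal and add a residual (requires C_in == C_out)."""
    value = dilated_conv1d(x, w_value, b_value, dilation)
    gate = sigmoid(dilated_conv1d(x, w_gate, b_gate, dilation))
    residual = x[-value.shape[0]:]  # keep the most recent steps to match lengths
    return value * gate + residual

rng = np.random.default_rng(0)
x = rng.normal(size=(12, 4))            # 12 time steps, 4 channels (hypothetical sizes)
w_v = rng.normal(size=(2, 4, 4)) * 0.1  # value kernel, K = 2
w_g = rng.normal(size=(2, 4, 4)) * 0.1  # gate kernel, K = 2
zero_bias = np.zeros(4)

# Stack two levels with growing dilation to cover coarser time scales.
h1 = gated_temporal_layer(x, w_v, zero_bias, w_g, zero_bias, dilation=1)
h2 = gated_temporal_layer(h1, w_v, zero_bias, w_g, zero_bias, dilation=2)
print(h1.shape, h2.shape)
```

Each level shortens the sequence and widens the time span a single output step can see. The per-level outputs (`h1`, `h2` here) are the kind of multi-granularity representations that a correlation injection step would then combine.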

To cope with diverse urban scenarios, the proposed spatiotemporal encoder does not depend on a specific graph structure when modeling spatial correlation. This reflects the fact that in zero-shot prediction the spatial connections between entities may be unknown or hard to define. The design allows UrbanGPT to remain applicable and effective across a wide range of urban environments.

Spatiotemporal instruction fine-tuning framework

Spatiotemporal data-text alignment

For a language model to capture spatiotemporal patterns accurately, the text information must be aligned with the spatiotemporal data. This alignment lets the model integrate multiple types of data into richer representations. By combining contextual features from the textual and spatiotemporal domains, the model can capture complementary information and extract more expressive high-level semantic features.

Spatiotemporal prompt instructions
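The article does not spell out the alignment mechanism. One common way to align a non-text modality with a language model is a learned linear projection into the model's token embedding space, followed by splicing the projected vectors into the text sequence. The sketch below illustrates only that general idea; the linear map, the dimensions, and the splice position are all assumptions.

```python
import numpy as np

def project_to_llm_space(st_embeddings, w_proj, b_proj):
    """Map (T, d_st) spatiotemporal embeddings to (T, d_llm) vectors."""
    return st_embeddings @ w_proj + b_proj

def splice_into_prompt(text_embeddings, st_tokens, position):
    """Insert projected spatiotemporal tokens into a text embedding sequence."""
    return np.concatenate(
        [text_embeddings[:position], st_tokens, text_embeddings[position:]],
        axis=0,
    )

rng = np.random.default_rng(0)
d_st, d_llm = 8, 16                        # hypothetical embedding sizes
st_embeddings = rng.normal(size=(3, d_st))  # encoder output for 3 time steps
text_embeddings = rng.normal(size=(5, d_llm))  # 5 embedded text tokens

w_proj = rng.normal(size=(d_st, d_llm)) * 0.1  # would be learned in practice
b_proj = np.zeros(d_llm)

st_tokens = project_to_llm_space(st_embeddings, w_proj, b_proj)
sequence = splice_into_prompt(text_embeddings, st_tokens, position=2)
print(sequence.shape)
```

After the splice, the language model sees one sequence in a single embedding space, which is what allows textual and spatiotemporal context to be processed jointly.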

In spatiotemporal prediction, both the temporal and the spatial dimension carry rich semantic information that the model needs in order to understand the dynamics of a specific context. For example, traffic flow in the early morning differs significantly from traffic flow during rush hour, and traffic patterns in commercial areas differ from those in residential areas. The UrbanGPT framework therefore feeds temporal and spatial information of different granularities into the large language model as instruction input. Temporal information includes the date and the specific time of day, while spatial information includes the city name, administrative division, and surrounding points of interest (POIs), as shown in Figure 3. Integrating this multi-dimensional information enables UrbanGPT to capture spatiotemporal patterns at different times and places and significantly strengthens its inference on unseen samples.

△Figure 3: Spatiotemporal prompt instructions encoding temporal and spatial information

Fine-tuning of spatiotemporal instructions for large language models
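As a rough illustration of how temporal and spatial context can be written into an instruction, the following sketch assembles a prompt from a time range, a city, a district, and nearby POIs. The template wording, the function name `build_st_prompt`, and the sample values are hypothetical and are not taken from the UrbanGPT repository.

```python
from datetime import datetime, timedelta

def build_st_prompt(city, district, pois, start, interval_minutes,
                    history, target="taxi inflow"):
    """Compose a text instruction from temporal and spatial context."""
    steps = len(history)
    if steps == 0:
        raise ValueError("history must contain at least one value")
    end = start + timedelta(minutes=interval_minutes * (steps - 1))
    fmt = "%Y-%m-%d %H:%M"
    return (
        f"Given the historical {target} over {steps} time steps in a region "
        f"of {city}: {list(history)}. "
        f"The recording time is {start.strftime(fmt)} to {end.strftime(fmt)}, "
        f"with one data point every {interval_minutes} minutes. "
        f"The region is located in {district}; nearby points of interest "
        f"include {', '.join(pois)}. "
        f"Please predict the {target} for the next {steps} time steps."
    )

prompt = build_st_prompt(
    city="New York City",
    district="Manhattan",
    pois=["office buildings", "subway stations", "restaurants"],
    start=datetime(2024, 6, 3, 8, 0),
    interval_minutes=30,
    history=[12, 15, 21, 30, 28, 25, 22, 20, 19, 18, 17, 16],
)
print(prompt)
```

Keeping the date, time of day, and POI description in plain text lets the language model draw on what it already knows about, for instance, weekday mornings in a business district.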

Using instruction fine-tuning to make large language models (LLMs) generate spatiotemporal predictions in text form raises two major challenges. First, the prediction task relies on numerical data, whose structure and regularities differ from the semantics and syntax of natural language that LLMs are good at. Second, LLMs are usually pre-trained with a multi-class classification loss to predict the next token in a text, which differs from a regression problem that requires continuous-valued outputs.

Experimental Results
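The mismatch between a classification loss and a regression target can be seen in a toy example. In the sketch below, a 100-way softmax over integer "value tokens" stands in for an LLM vocabulary (a deliberate simplification), and `regression_head` is a hypothetical linear layer that maps a hidden state directly to continuous values.

```python
import numpy as np

def cross_entropy(logits, target_index):
    """Negative log-probability of the target class under a softmax."""
    shifted = logits - logits.max()
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-log_probs[target_index])

def regression_head(hidden_state, weights, bias):
    """Map a hidden state of size d to `horizon` continuous predictions."""
    return hidden_state @ weights + bias

true_value = 10

# Two wrong predictions: one puts its mass on 11, the other on 99.
logits_near = np.zeros(100)
logits_near[11] = 5.0
logits_far = np.zeros(100)
logits_far[99] = 5.0

ce_near = cross_entropy(logits_near, true_value)
ce_far = cross_entropy(logits_far, true_value)
err_near = abs(11 - true_value)
err_far = abs(99 - true_value)

print(ce_near, ce_far)    # the classification loss cannot tell the two apart
print(err_near, err_far)  # the absolute error can

# A regression head outputs numbers directly, so a distance-aware loss applies.
rng = np.random.default_rng(0)
hidden = rng.normal(size=16)
preds = regression_head(hidden, rng.normal(size=(16, 12)) * 0.1, np.zeros(12))
print(preds.shape)
```

Cross-entropy penalizes "11" and "99" equally when the truth is 10, whereas a regression loss grows with the numerical distance. This is one way to read the second challenge described above.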

Zero-shot prediction performance

Prediction of unseen areas within the same city

In the cross-region scenario, data from some areas of a city is used to predict future conditions in other areas of the same city that the model has never seen. Analyzing performance on these tasks, the team found that UrbanGPT delivers excellent zero-shot predictions. By aligning spatiotemporal and text information and by combining spatiotemporal instruction fine-tuning with the spatiotemporal dependency encoder, UrbanGPT retains general, transferable spatiotemporal knowledge and therefore predicts accurately in zero-shot scenarios.

UrbanGPT also has significant advantages when data is sparse. In crime prediction in particular, traditional baseline models often perform poorly because the data is sparse, and their low recall may indicate overfitting. By integrating the semantic information in the text, UrbanGPT improves its ability to capture spatiotemporal patterns in sparse data and thereby improves prediction accuracy.

△Table 1: Comparison of the performance of cross-region zero-shot prediction scenarios
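A cross-region zero-shot evaluation of this kind needs a split in which the held-out regions never appear in training, plus error metrics computed only on those regions. The sketch below shows one simple way to set that up; the random split, the 30% holdout fraction, and the choice of MAE and RMSE are assumptions for illustration, not the paper's exact protocol.

```python
import numpy as np

def split_regions(num_regions, unseen_fraction, seed=0):
    """Return (seen_ids, unseen_ids) with no overlap between the two sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(num_regions)
    n_unseen = int(round(num_regions * unseen_fraction))
    return np.sort(order[n_unseen:]), np.sort(order[:n_unseen])

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def rmse(y_true, y_pred):
    """Root mean squared error."""
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sqrt(np.mean(diff ** 2)))

seen_ids, unseen_ids = split_regions(num_regions=10, unseen_fraction=0.3)

# Train only on `seen_ids`; report metrics only on `unseen_ids`.
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.0, 2.0, 5.0])
print(seen_ids, unseen_ids)
print(mae(y_true, y_pred), rmse(y_true, y_pred))
```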

Cross-city forecasting tasks

To test the model's performance in cross-city prediction, the team selected the CHI-taxi dataset, which was not used during training. The evaluation results in Figure 4 show that the model outperforms the other methods at every time step, confirming the effectiveness of UrbanGPT in cross-city knowledge transfer. By jointly considering a variety of geographic and temporal information, the model can relate regions with similar functions and the spatiotemporal patterns of the same historical periods, which supports accurate zero-shot prediction in cross-city scenarios.

△Figure 4: Comparison of the performance of zero-shot prediction scenarios across cities

Typical supervised forecasting tasks

The team also explored the performance of UrbanGPT in supervised prediction scenarios, using a test dataset separated from the training data by a large time gap to assess long-term spatiotemporal prediction. For example, the team trained the model on data from 2017 and tested it on data from 2021. The results show that UrbanGPT has clear advantages over the baseline models across long time spans, demonstrating its generalization ability. The model therefore does not need frequent retraining or incremental updates, which makes it better suited to real-world use. The experiments also show that the additional text information neither degrades performance nor introduces noise, which further supports the feasibility of using large language models to enhance spatiotemporal prediction.

△Table 2: Evaluation of predictive performance in a supervised setting

Ablation experiments

(1) Utility of spatiotemporal context: -STC. When spatiotemporal information is removed from the instruction text, the performance of the model decreases. Without the temporal information, the model must rely on the spatiotemporal encoder alone to process time-related features and make predictions. The lack of spatial information likewise weakens the model's ability to capture spatial correlations, making it harder to identify the distinct spatiotemporal patterns of different regions.

(2) The impact of using multiple datasets for instruction fine-tuning: -Multi. In this variant the model is trained only on the NYC-taxi dataset. Without the broader information provided by other urban indicators, the model cannot fully represent the spatiotemporal dynamics of cities, and its predictions are unsatisfactory. By fusing spatiotemporal data from multiple sources, in contrast, the model captures the unique attributes of different locations and the patterns that evolve over time more effectively, leading to a deeper understanding of urban complexity.

(3) The role of the spatiotemporal encoder: -STE. The lack of spatiotemporal encoders significantly limits the performance of large language models in spatiotemporal prediction tasks. This highlights the importance of the designed spatiotemporal encoder in enhancing the prediction accuracy of the model.

(4) The regression layer in instruction fine-tuning: T2P. In this variant UrbanGPT is instructed to output its predictions directly as text. Its weaker performance stems mainly from optimizing with a multi-class loss during training, which creates a mismatch between the model's probability output and the continuous numerical distribution that spatiotemporal prediction requires. To solve this problem, the team integrated a regression prediction module into the model architecture, which significantly improves the accuracy of the numerical predictions in regression tasks.

△Figure 5: UrbanGPT ablation experiment

Model robustness studies

This section evaluates the stability of UrbanGPT across different spatiotemporal pattern scenarios. The team groups regions by how much their values fluctuate over a specific period of time. Regions with smaller variance have more stable temporal patterns, while regions with larger variance have more variable spatiotemporal patterns, as in busy business districts or densely populated areas. The evaluation results in Figure 6 show that most models perform well in low-variance regions with relatively stable patterns. The baseline models, however, perform poorly in high-variance regions, especially in the (0.75, 1.0) range, which may reflect their limitations in inferring complex spatiotemporal patterns in unseen regions. In real city operations, accurate prediction for densely populated or commercially busy areas is critical for city management tasks such as traffic signal control and security scheduling. UrbanGPT shows significant performance improvements in regions with variance in the (0.75, 1.0) range, highlighting its strength in zero-shot prediction.

△Figure 6: Model robustness study
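The grouping used in this kind of robustness analysis can be sketched as follows. The code assumes that ranges such as (0.75, 1.0) refer to quantile ranges of per-region variance, which the article does not state explicitly; the function name and the bucket edges are illustrative.

```python
import numpy as np

def mae_by_variance_bucket(history, y_true, y_pred,
                           edges=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Group regions by variance quantile and report the MAE per group.

    history: (R, T) past values per region, used only to measure variance.
    y_true, y_pred: (R, H) targets and predictions per region.
    Returns {(low, high): mae} for every non-empty quantile bucket.
    """
    variances = history.var(axis=1)
    thresholds = np.quantile(variances, edges[1:-1])
    # Bucket i holds regions whose variance lies in (thresholds[i-1], thresholds[i]].
    bucket = np.searchsorted(thresholds, variances, side="left")
    region_mae = np.abs(y_true - y_pred).mean(axis=1)

    result = {}
    for i in range(len(edges) - 1):
        mask = bucket == i
        if mask.any():
            result[(edges[i], edges[i + 1])] = float(region_mae[mask].mean())
    return result

history = np.array([
    [0.0, 0.0, 0.0, 0.0],  # flat region
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 2.0, 0.0, 2.0],
    [0.0, 4.0, 0.0, 4.0],  # most volatile region
])
y_true = np.zeros((4, 2))
y_pred = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [2.0, 4.0]])

for bucket_range, error in mae_by_variance_bucket(history, y_true, y_pred).items():
    print(bucket_range, error)
```

Comparing the per-bucket errors of several models shows whether a method degrades specifically on volatile regions, which is the comparison Figure 6 is described as making.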

Case Study

This experiment evaluates how different large language models (LLMs) perform in zero-shot spatiotemporal prediction. The results in Table 3 show that the various LLMs are able to generate predictions from the instructions provided, which confirms the effectiveness of the team's prompt design.

Specifically, ChatGPT tends to rely on historical averages in its forecasts rather than explicitly integrating temporal or spatial information. Llama-2-70b can analyze information for specific time periods and regions, but it struggles with the dependencies in numerical time series, which hurts the accuracy of its predictions. In contrast, Claude-2.1 integrates and analyzes historical data efficiently and uses peak-hour patterns and point-of-interest (POI) information to improve its traffic trend forecasts. Through spatiotemporal instruction fine-tuning, the proposed UrbanGPT model combines spatiotemporal context signals with the reasoning ability of the large language model, which significantly improves the accuracy of its predicted values and spatiotemporal trends. These findings highlight the potential of the UrbanGPT framework for capturing universal spatiotemporal patterns and confirm its effectiveness in zero-shot spatiotemporal prediction.

△Table 3: Examples of zero-shot predictions by different LLMs for New York City bicycle traffic

Summary and outlook

This study introduces UrbanGPT, a spatiotemporal large language model with excellent generalization performance in diverse urban contexts. With a spatiotemporal instruction fine-tuning strategy, the team tightly integrates spatiotemporal context information with large language models (LLMs), so that UrbanGPT learns widely applicable and transferable spatiotemporal patterns. The experimental results support the effectiveness of the UrbanGPT architecture and its core components.

While the current results are promising, the team recognizes that challenges remain for future research. As part of future work, the team plans to collect more diverse city data to strengthen UrbanGPT in a wider range of urban computing scenarios. It is also crucial to understand UrbanGPT's decision-making mechanism in depth. Although the model performs well, transparency and explainability in its decision-making process are equally important. Future research will focus on developing UrbanGPT models that can explain their predictions.

Project Links: https://urban-gpt.github.io/

Code Links: https://github.com/HKUDS/UrbanGPT

Paper link: https://arxiv.org/abs/2403.00813

Lab Homepage: https://sites.google.com/view/chaoh/home
