UrbanGPT, the first smart city model, is fully open source | HKU & Baidu

Quantum Position

2024-06-01 15:10 Posted on the official account of Beijing Qubit

Contributed by UrbanGPT team

Quantum Position | Official account: QbitAI

Spatiotemporal prediction technology is having its ChatGPT moment.

Spatiotemporal forecasting aims to capture the dynamics of urban life and predict where they are heading. It covers not only traffic and crowd flows but also other dimensions such as crime trends. Current deep spatiotemporal prediction methods need large amounts of training data to produce accurate models, which is a serious obstacle when urban data is scarce.

A joint team from HKU and Baidu drew on the ideas behind large language models to propose a new spatiotemporal large language model, UrbanGPT.

The model generalizes well across a variety of urban application scenarios. By combining a spatiotemporal dependency encoder with instruction fine-tuning, it gains a better understanding of the complex relationships between time and space and gives more accurate predictions even when data is scarce. In extensive experiments, UrbanGPT outperformed baselines on multiple city-related tasks and showed strong potential for zero-shot learning.

UrbanGPT, a spatiotemporal large language model

Challenge 1: Label scarcity and high training costs

Although cutting-edge spatiotemporal networks excel at prediction tasks, their performance depends on large amounts of labeled data. In urban applications such data is hard to obtain: monitoring traffic and air quality across an entire city, for example, is costly. In addition, these models usually generalize poorly to new regions or new tasks and must be retrained for each new spatiotemporal environment.

Challenge 2: LLMs and existing spatiotemporal prediction models have limitations in zero-shot generalization

As shown in Figure 1, the large language model LLaMA can infer traffic patterns from the input text. However, its predictive power is limited on numerical time series with complex spatiotemporal dependencies, and it sometimes produces predictions that contradict reality. Meanwhile, pre-trained baseline models encode spatiotemporal dependencies effectively, but they may perform poorly in new scenarios they have never seen (zero-shot scenarios) because they overfit the original training data.

Challenge 3: How to extend the reasoning power of LLMs to spatiotemporal prediction

Spatiotemporal data has its own unique properties, which differ from the information encoded by LLMs. Bridging this gap and constructing a spatiotemporal large language model that can exhibit excellent generalization performance in a variety of urban tasks is a major challenge.

△Figure 1: Compared with LLMs and existing spatiotemporal graph neural networks, UrbanGPT can better predict future spatiotemporal trends in zero-shot scenarios

UrbanGPT, a spatiotemporal large language model

According to the team, this is the first attempt to create a spatiotemporal large language model that can predict multiple urban phenomena on different datasets, especially in scenarios where training samples are limited.

The study proposes a spatiotemporal prediction framework called UrbanGPT, which gives large language models the ability to understand the complex interdependencies between time and space. By combining a spatiotemporal dependency encoder with an instruction fine-tuning strategy, the framework integrates spatiotemporal information with the reasoning ability of large language models.

Extensive experiments based on real-world data have verified UrbanGPT's excellent generalization performance in zero-shot spatiotemporal learning scenarios. These experimental results not only highlight the strong generalization potential of the UrbanGPT model, but also confirm its effectiveness in accurately predicting and understanding spatiotemporal patterns, even in the absence of training samples.

△Figure 2: UrbanGPT overall framework

Spatiotemporal dependency encoder

LLMs excel at language tasks, but they struggle to model the time series and evolutionary patterns inherent in spatiotemporal data. To overcome this, the paper integrates a spatiotemporal encoder that improves the LLM's ability to capture temporal dependencies in a spatiotemporal context. Specifically, the encoder consists of two core components: a gated dilated convolution layer and a multi-level correlation injection layer.

The gated dilated convolution layers encode different degrees of temporal dependence at different levels, capturing temporal evolution at several granularities. To preserve these temporal patterns, the team introduced a multi-level correlation injection layer that incorporates the correlations between the different levels.
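
As a rough illustration of the two components, the NumPy sketch below stacks gated dilated temporal convolutions and then sums a projected summary of every level. It is a minimal sketch under stated assumptions, not the team's implementation: the kernel size, channel counts, mean pooling over time, and all function names (`gated_dilated_conv`, `encode`) are hypothetical choices made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_dilated_conv(x, w_k, b_k, w_g, b_g, dilation):
    """One gated dilated temporal convolution with a residual connection.

    x:   (regions, time, channels)
    w_*: (kernel, channels, channels), b_*: (channels,)
    Returns (regions, time - dilation * (kernel - 1), channels).
    """
    kernel = w_k.shape[0]
    t_out = x.shape[1] - dilation * (kernel - 1)
    signal = np.zeros((x.shape[0], t_out, w_k.shape[2]))
    gate = np.zeros_like(signal)
    for k in range(kernel):
        # Taps are spaced `dilation` steps apart along the time axis.
        window = x[:, k * dilation : k * dilation + t_out, :]
        signal += window @ w_k[k]
        gate += window @ w_g[k]
    residual = x[:, -t_out:, :]
    return (signal + b_k) * sigmoid(gate + b_g) + residual

def encode(x, layers, inject):
    """Stack the gated layers and inject every level into one embedding.

    Each level is mean-pooled over time (an assumption of this sketch),
    projected with its own matrix, and the projections are summed.
    """
    h, summary = x, 0.0
    for (w_k, b_k, w_g, b_g, dilation), w_s in zip(layers, inject):
        h = gated_dilated_conv(h, w_k, b_k, w_g, b_g, dilation)
        summary = summary + h.mean(axis=1) @ w_s
    return summary

rng = np.random.default_rng(0)
channels, out_dim = 8, 16

def make_layer(dilation):
    w = lambda: rng.normal(scale=0.1, size=(2, channels, channels))
    return (w(), np.zeros(channels), w(), np.zeros(channels), dilation)

layers = [make_layer(1), make_layer(2)]  # growing dilation widens the time span
inject = [rng.normal(scale=0.1, size=(channels, out_dim)) for _ in layers]
x = rng.normal(size=(4, 12, channels))   # 4 regions, 12 time steps
embedding = encode(x, layers, inject)
print(embedding.shape)  # (4, 16)
```

Each region is encoded on its own here and no adjacency matrix is passed in, so the sketch needs no predefined graph.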

To cope with diverse urban scenarios, the proposed spatiotemporal encoder does not depend on a specific graph structure when modeling spatial correlation. In zero-shot prediction, the spatial connections between entities may be unknown or hard to define clearly, so avoiding a fixed graph keeps UrbanGPT applicable and effective across a wide range of urban environments.

Spatiotemporal instruction fine-tuning framework

Spatiotemporal data-text alignment

In order for language models to accurately capture spatiotemporal patterns, it is key to ensure the consistency of text information with spatiotemporal data. This alignment allows the model to integrate multiple types of data to produce richer representations of information. By combining contextual features in the textual and spatiotemporal domains, the model is not only able to capture complementary information, but also to extract more expressive high-level semantic features.
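
The article does not spell out the alignment mechanism. One common way to align a non-text modality with an LLM is to project its embeddings linearly into the LLM's token-embedding space and splice them into the prompt sequence. The sketch below illustrates that idea under this assumption. The dimensions, the splice position, and the names `project_to_llm_space` and `splice` are all hypothetical.

```python
import numpy as np

def project_to_llm_space(st_embeddings, weight, bias):
    """Linearly map spatiotemporal embeddings to the LLM hidden size."""
    return st_embeddings @ weight + bias

def splice(text_embeddings, st_tokens, position):
    """Insert the projected spatiotemporal tokens into the text sequence."""
    return np.concatenate(
        [text_embeddings[:position], st_tokens, text_embeddings[position:]],
        axis=0,
    )

rng = np.random.default_rng(0)
st_dim, llm_dim = 16, 32                          # assumed sizes
st_embeddings = rng.normal(size=(3, st_dim))      # 3 spatiotemporal tokens
text_embeddings = rng.normal(size=(10, llm_dim))  # 10 text tokens
weight = rng.normal(scale=0.1, size=(st_dim, llm_dim))
bias = np.zeros(llm_dim)

st_tokens = project_to_llm_space(st_embeddings, weight, bias)
sequence = splice(text_embeddings, st_tokens, position=4)
print(sequence.shape)  # (13, 32)
```

After projection, text and spatiotemporal tokens share one embedding space, so the language model can attend over both in a single sequence.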

Spatiotemporal prompt instructions

In spatiotemporal prediction, both the temporal and the spatial dimension carry rich semantic information that the model needs in order to understand the dynamics of a specific context. For example, traffic flow in the early morning differs markedly from rush hour, and traffic patterns in commercial areas differ from those in residential areas. The UrbanGPT framework feeds temporal and spatial features of different granularities to its large language model as instruction inputs. Temporal information includes the date and the specific time of day, while spatial information includes the city name, the administrative division, and surrounding points of interest (POIs), as shown in Figure 3. This multi-dimensional context lets UrbanGPT capture spatiotemporal patterns at different times and places and strengthens its inference on unseen samples.

△Figure 3: Spatiotemporal prompt instructions encoding time- and location-aware information
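
To make the idea concrete, here is a minimal sketch of how such an instruction could be assembled from the temporal and spatial fields listed above. The wording of the prompt, the `RegionContext` fields, and the `build_instruction` helper are hypothetical and only illustrate the structure. They are not the prompt template used by the team.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class RegionContext:
    city: str
    district: str        # administrative division
    pois: tuple = ()     # nearby points of interest

def build_instruction(region, start, steps, step_minutes, task):
    """Compose a text instruction carrying time and location context."""
    end = start + timedelta(minutes=step_minutes * (steps - 1))
    pois = ", ".join(region.pois) if region.pois else "none recorded"
    return "\n".join([
        f"Task: predict {task} for the next {steps} time steps.",
        f"Time: history covers {start.strftime('%Y-%m-%d %H:%M')} to "
        f"{end.strftime('%Y-%m-%d %H:%M')}, one step every {step_minutes} minutes.",
        f"Location: {region.district}, {region.city}.",
        f"Nearby POIs: {pois}.",
    ])

region = RegionContext("New York City", "Manhattan", ("subway station", "office tower"))
prompt = build_instruction(region, datetime(2024, 1, 1, 8, 0), 12, 30, "taxi inflow")
print(prompt)
```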

Spatiotemporal instruction fine-tuning of large language models

Instruction fine-tuning a large language model (LLM) to produce spatiotemporal predictions as text faces two major challenges. First, the task relies on numerical data, whose structure and regularities differ from those of natural language, with its semantics and syntax, that LLMs are good at. Second, LLMs are usually pre-trained with a multi-class classification loss to predict the next token, whereas spatiotemporal prediction is a regression problem that requires continuous values as output. To close this gap, the team adds a regression prediction module to the model instead of having it emit predictions directly as text.
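
The mismatch between a classification-style objective and a regression target can be seen in a toy example. The sketch below is illustrative only: `token_mismatch` is a crude stand-in for a token-level classification loss, and `regression_head` is a hypothetical linear layer with assumed sizes, not the module from the paper.

```python
import numpy as np

def token_mismatch(pred_text, target_text):
    """Fraction of positions whose characters differ (classification view)."""
    assert len(pred_text) == len(target_text)
    wrong = sum(p != t for p, t in zip(pred_text, target_text))
    return wrong / len(target_text)

def mae(pred, target):
    """Mean absolute error (regression view)."""
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(target))))

def regression_head(hidden, weight, bias):
    """Map one hidden vector per region to `horizon` continuous values."""
    return hidden @ weight + bias

# True value 100: "099" is numerically close but wrong in every digit,
# while "900" is far off yet wrong in only one digit.
print(token_mismatch("099", "100"), mae([99.0], [100.0]))   # 1.0 1.0
print(token_mismatch("900", "100"), mae([900.0], [100.0]))  # 0.333... 800.0

rng = np.random.default_rng(0)
hidden = rng.normal(size=(4, 32))                # 4 regions, assumed hidden size 32
weight = rng.normal(scale=0.1, size=(32, 12))    # assumed horizon of 12 steps
preds = regression_head(hidden, weight, np.zeros(12))
print(preds.shape)  # (4, 12)
```

A digit-level objective ranks "900" above "099", while a regression loss ranks them the other way round. A head that outputs continuous values can be trained directly against the second kind of loss.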

Experimental results

Zero-shot prediction performance

Prediction of unseen areas within the same city

Cross-region scenarios use data from some areas of a city to predict future conditions in other areas of the same city that the model has never seen. In these tasks the team found that UrbanGPT delivers excellent zero-shot prediction performance. By aligning spatiotemporal and text information, and by combining spatiotemporal instruction fine-tuning with the spatiotemporal dependency encoder, UrbanGPT retains general, transferable spatiotemporal knowledge and so predicts accurately in zero-shot scenarios.

UrbanGPT also has a clear advantage on sparse data. In crime prediction in particular, traditional baseline models often perform poorly because the data is sparse, and their low recall may point to overfitting. UrbanGPT draws on the semantic information in the text, which improves its ability to capture spatiotemporal patterns in sparse data and thereby its prediction accuracy.

△Table 1: Comparison of the performance of cross-region zero-shot prediction scenarios
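
A cross-region zero-shot evaluation of this kind can be set up by holding out whole regions, so that the test regions never appear in training. The sketch below shows one way to do that, along with a recall computation that illustrates why sparse targets such as crime counts are hard. The held-out fraction, the random split, and the function names are assumptions of this example, not details taken from the paper.

```python
import numpy as np

def split_regions(num_regions, unseen_fraction, seed=0):
    """Randomly hold out a share of regions that training never touches."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(num_regions)
    n_unseen = int(round(num_regions * unseen_fraction))
    return np.sort(order[n_unseen:]), np.sort(order[:n_unseen])

def recall(y_true, y_pred):
    """Share of true positive events that the model actually flagged."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    positives = int((y_true == 1).sum())
    if positives == 0:
        return 0.0
    return float(((y_true == 1) & (y_pred == 1)).sum() / positives)

seen, unseen = split_regions(num_regions=20, unseen_fraction=0.25)
print(len(seen), len(unseen))  # 15 5

# With sparse events, always predicting "no event" looks accurate
# (90% of labels match) but has zero recall.
y_true = np.array([0, 0, 0, 0, 1, 0, 0, 0, 0, 0])
always_zero = np.zeros_like(y_true)
print(recall(y_true, always_zero))  # 0.0
```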

Cross-city forecasting tasks

To test cross-city prediction, the team selected the CHI-taxi dataset, which was not used during training. The evaluation results in Figure 4 show that the model outperforms the other comparison methods at every time step, which confirms the effectiveness of UrbanGPT in cross-city knowledge transfer. By taking a variety of geographic and temporal information into account, the model can relate regions with similar functions to spatiotemporal patterns from the same historical periods, which supports accurate zero-shot prediction in cross-city scenarios.

△Figure 4: Comparison of the performance of zero-shot prediction scenarios across cities

Typical supervised forecasting tasks

The team also evaluated UrbanGPT in supervised prediction scenarios, using a test set separated from the training data by a long time span to probe long-term spatiotemporal prediction. For example, the team trained the model on data from 2017 and tested it on data from 2021. The results show that UrbanGPT has a clear advantage over the baseline models across long time spans, demonstrating strong generalization. This means the model does not need frequent retraining or incremental updates, which makes it better suited to real-world use. The experiments also show that the additional text information neither hurts performance nor introduces noise, which further supports the feasibility of using large language models to enhance spatiotemporal prediction.

△Table 2: Evaluation of predictive performance in a supervised setting

Ablation experiments

(1) Utility of spatiotemporal context: -STC. When spatiotemporal information is removed from the instruction text, the performance of the model decreases. Without temporal information in the text, the model has to rely on the spatiotemporal encoder alone to process time-related features and make predictions. The missing spatial information likewise weakens the model's ability to capture spatial correlations, making it harder to identify the distinct spatiotemporal patterns of different regions.

(2) The impact of using multiple datasets for instruction fine-tuning: -Multi. The model was trained only on the NYC-taxi dataset. Without the broader information provided by other urban indicators, the model cannot fully represent the spatiotemporal dynamics of a city, and its predictions are unsatisfactory. By contrast, fusing spatiotemporal data from multiple sources lets the model capture the distinct attributes of different locations and the patterns that evolve over time, giving it a deeper understanding of urban complexity.

(3) The role of the spatiotemporal encoder: -STE. The lack of spatiotemporal encoders significantly limits the performance of large language models in spatiotemporal prediction tasks. This highlights the importance of the designed spatiotemporal encoder in enhancing the prediction accuracy of the model.

(4) The regression layer in instruction fine-tuning: T2P. In this variant UrbanGPT is instructed to output its predictions directly as text. Its weaker performance stems mainly from the multi-class loss used for optimization during training, which leaves the model's probability output inconsistent with the continuous numerical distribution that spatiotemporal prediction requires. To solve this, the team integrated a regression prediction module into the model architecture, which significantly improves the accuracy of its numerical predictions.

△Figure 5: UrbanGPT ablation experiment

Model robustness studies

This section evaluates how stable UrbanGPT is across different spatiotemporal pattern scenarios. The team groups regions by how much their values fluctuate over a given period. Regions with smaller variance have more constant temporal patterns, while regions with larger variance have more variable spatiotemporal patterns, such as busy business districts or densely populated areas. The evaluation results in Figure 6 show that most models perform well in low-variance regions with relatively stable patterns. However, the baseline models perform poorly in high-variance regions, especially in the (0.75, 1.0) range, which may reflect their limitations in inferring complex spatiotemporal patterns for unseen regions. In real city operations, accurate prediction for densely populated or commercially busy areas is critical for city management, including traffic signal control and security scheduling. UrbanGPT shows significant performance improvements in regions with variance in the (0.75, 1.0) range, highlighting its strength in zero-shot prediction.

△Figure 6: Model robustness study
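
The grouping behind this study can be sketched as follows. The example assumes that ranges such as (0.75, 1.0) are quantile bands of per-region variability, which the article does not state explicitly. The bucket edges, the use of standard deviation, and the function names are likewise assumptions.

```python
import numpy as np

def bucket_by_variability(series, edges=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Assign each region to a quantile band of its temporal variability.

    series: (regions, time). Returns one bucket index per region,
    where 0 is the most stable band and 3 is the (0.75, 1.0) band.
    """
    std = series.std(axis=1)
    ranks = std.argsort().argsort()        # 0 = most stable region
    quantile = (ranks + 0.5) / len(std)    # strictly inside (0, 1)
    return np.digitize(quantile, edges[1:-1])

def mae_per_bucket(pred, target, buckets, num_buckets=4):
    """Mean absolute error of the regions that fall into each band."""
    errors = np.abs(pred - target).mean(axis=1)
    return [float(errors[buckets == b].mean()) for b in range(num_buckets)]

# Toy data: 8 regions whose fluctuation grows with the region index.
t = np.linspace(0.0, 4.0 * np.pi, 48)
scale = np.arange(1, 9, dtype=float)
target = scale[:, None] * np.sin(t)[None, :]
buckets = bucket_by_variability(target)
print(buckets.tolist())  # [0, 0, 1, 1, 2, 2, 3, 3]

# A naive predictor that always outputs zero gets worse as variability grows.
naive = np.zeros_like(target)
print(mae_per_bucket(naive, target, buckets))
```

Reporting the error per band, rather than one overall average, shows whether a model's gains come from the volatile regions or only from the stable ones.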

Case study

This experiment evaluates how different large language models (LLMs) perform in zero-shot spatiotemporal prediction. The results in Table 3 show that all of the LLMs tested can generate predictions from the instructions provided, which confirms the effectiveness of the team's prompt design.

Specifically, ChatGPT tends to rely on historical averages in its forecasts rather than explicitly using temporal or spatial information. Llama-2-70b can analyze information for specific time periods and regions, but it struggles with the dependencies in numerical time series, which hurts the accuracy of its predictions. In contrast, Claude-2.1 integrates and analyzes historical data effectively and uses peak-hour patterns and point-of-interest (POI) information to forecast traffic trends more accurately. UrbanGPT combines spatiotemporal context signals with the reasoning ability of the large language model through spatiotemporal instruction fine-tuning, which significantly improves the accuracy of both its numerical predictions and its predicted trends. These findings highlight the potential of the UrbanGPT framework for capturing universal spatiotemporal patterns and confirm its effectiveness in zero-shot spatiotemporal prediction.

△Table 3: Examples of zero-shot predictions by different LLMs for New York City bicycle traffic

Summary and outlook

This study introduces UrbanGPT, a spatiotemporal large language model that generalizes well across diverse urban contexts. With its spatiotemporal instruction fine-tuning strategy, the team tightly integrates spatiotemporal context information with large language models (LLMs), so that UrbanGPT learns widely applicable and transferable spatiotemporal patterns. The experimental results support the effectiveness of the UrbanGPT architecture and its core components.

While the current results are promising, the team recognizes that challenges remain for future research. As part of future work, the team plans to collect more diverse city data to strengthen UrbanGPT in a wider range of urban computing scenarios. A deeper understanding of UrbanGPT's decision-making mechanism is also crucial. Although the model performs well, transparency and explainability in its decision-making are equally important. Future research will focus on developing UrbanGPT models that can explain their predictions.

Project link: https://urban-gpt.github.io/

Code link: https://github.com/HKUDS/UrbanGPT

Paper link: https://arxiv.org/abs/2403.00813

Lab Homepage: https://sites.google.com/view/chaoh/home
