
Qunar Hotel International AI-generated video practice

This article is compiled from a talk given by Zheng Jimin, Technical Director at Qunar Travel, at the WOT2024 conference.

A few days ago, at the WOT Global Technology Innovation Conference hosted by 51CTO, Zheng Jimin, Technical Director at Qunar Travel, delivered a keynote titled "Hotel International AI Generative Video Practice". Starting from the business architecture of Hotel International, he explained in detail how the team generates video with AI and shared the practices and thinking behind it.

This article excerpts the highlights of the talk and organizes them, in the hope that they prove useful to you.

The article is organized into the following four sections:

  • Video Generation Challenges and Opportunities
  • Applying AI to the professional film and television production process
  • AI-generated video practices
  • Video generation results and reflections

Video Generation Challenges and Opportunities

Let's start by looking at the challenges of video generation for Hotel International.


As AIGC technology has matured, we have been watching how it can be applied to real business scenarios. AIGC can already generate video, and our business has a corresponding need for it.

We therefore first settled on video generation as our AIGC scenario.

Next, we considered how to turn video production into an engineering process and how to guarantee video quality. A video currently rests on two core elements: copy and images. We need to take stock of our existing copy and image resources, organize them into videos, and ultimately use those videos to support our existing businesses.

The first problem we face when making a video is the choice of source material. Our raw inputs are basic copy, image information, and user reviews, all of which need to be carefully selected and used effectively to generate a video.

The difficulty in selecting materials lies in filtering the information and putting it to use. For example, hotel image quality varies: images of high-star hotels are usually sharper and can become the highlight of a video, but not every hotel has images like that. In addition, user reviews come in multiple languages, and the translated text may not be directly usable.

In the face of these challenges, we need to find solutions to ensure the quality and accuracy of video content.

Behind these challenges, we also see some opportunities.


Around the foreign-destination character of the Hotel International business, we have accumulated a highly diverse range of information.

For the viewing experience, a video should not only show the hotel but, more importantly, convey the diversity of the area where the hotel is located, as well as the characteristics and positioning that set it apart from local hotels.

The question we face is whether Hotel International video production can succeed. Here are some data points that support its feasibility:

1. Low video coverage in Hotel International. Before we started, video coverage for Hotel International was about 19.6%, which leaves a great deal of room for improvement.

2. Video significantly increases conversions. Last year we tested videos for high-star hotels, and the results showed that video significantly increased user conversion rates.

3. Domestic hotels already have experience with video generation. The domestic hotel business has generated videos for low-star hotels, which confirms that we have basic video production capabilities.

Based on these foundations, we identified three essential characteristics that Hotel International video production needs to have: stylistic diversity, content diversity, and element diversity.


Applying AI to the professional film and television production process

We simplify the professional film and television production process into four key steps:

First, creative planning.

Second, storyboard creation.

Third, on-site shooting.

Fourth, post-editing.


Let's briefly walk through these four steps using the diagram above.

For example, in the clip above, each storyboard has copy describing the girl's movements, accompanied by her spoken lines or inner monologue to carry the plot.

Through the combination of these elements, we were able to produce a coherent film and television clip. Each storyboard consists of an image or video, copy, and voice, which are edited in post-production to form a complete mini-video. This is a basic process in film and television production.

With that in mind, let's look at how AI is applied to the video production process.


A video is essentially made up of multiple storyboards, each of which contains core elements such as images, copy, and a soundtrack. With the help of AI, the source material is processed to generate the content of each storyboard.

Then, through transitions and special effects, the individual storyboards are smoothly stitched together into a complete video.
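To make the structure concrete, here is a minimal Python sketch of a video as an ordered list of storyboards joined by transitions. The class names, fields, and default values are assumptions made for illustration; the talk does not describe Qunar's actual data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Storyboard:
    """One shot: a visual asset plus its copy and optional narration."""
    image_path: str
    copy_text: str
    duration_s: float
    narration_path: Optional[str] = None
    effect: str = "none"  # hypothetical effect name, e.g. "zoom_in" or "pan"


@dataclass
class Video:
    """An ordered list of storyboards stitched together by transitions."""
    storyboards: List[Storyboard] = field(default_factory=list)
    transition: str = "crossfade"      # assumed default, not from the talk
    transition_s: float = 0.5          # assumed overlap per transition
    soundtrack_path: Optional[str] = None

    def total_duration(self) -> float:
        # Each transition overlaps two adjacent shots, so it shortens the total.
        n = len(self.storyboards)
        if n == 0:
            return 0.0
        shots = sum(s.duration_s for s in self.storyboards)
        return shots - self.transition_s * (n - 1)
```

Treating the storyboard as the unit keeps material processing (per shot) separate from stitching (per video), which matches the two stages described above.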


AI-generated video practices

Turning to the practice of AI-generated video, we face another problem: the standard for judging video quality. How do we generate content that users are willing to watch?


We concluded that a good video depends on the following key factors:

First, value and fun, which come down to story design and storyboarding. We need to think about how to design the storyboards and whether suitable templates exist to connect them well.

Next is clear picture quality. We aim for HD at 1080p or even 4K so that users are not put off by image quality problems while watching.

Most importantly, the subject matter of the video must be strong. Our goal is not only excellent image quality and design, but also to convey the hotel's highlights and features, so that users can feel the hotel's appeal directly through the video.

Ultimately, we want people to be willing to share a video after watching it.

Based on these basic elements of a high-quality video, we planned the business process for video generation.


The first step is material selection. We extract the image and text materials, then deduplicate them and enhance the images to high definition to ensure the quality of the basic materials. For text, especially in minor languages, we translate it and extract highlights so that it fits different language environments and is good enough for practical use.

The second step is preprocessing. The goal of this phase is to bring the images and text up to the basic standard users expect. As needed, we also feed text into a large language model and images into a multimodal large model for further processing.

The third step is storyboarding. We make extensive use of camera movement and special effects to simulate the user's perspective and motion when actually looking around a hotel. For hotel exteriors, we simulate the user approaching the hotel with a zoom effect; for room photos, we simulate the user's gaze moving around the room by panning left and right, which strengthens the sense of being there. We also add effects to suit the scene, such as dissolve and blur effects for island scenes and a starry effect for night scenes, to create a richer atmosphere.

The final step is template-based composition and editing. At this stage, we combine each storyboard with narration, assemble the completed storyboards using multiple sets of templates, and use effects and music to keep the transitions smooth rather than stiff, finally producing a complete video.
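The zoom and pan moves from the storyboarding step can be approximated on a still image by cropping a moving window frame by frame and scaling each crop to the output size. The sketch below only computes the crop boxes; the function names, the `end_scale` and `window_ratio` parameters, and the linear motion are assumptions for illustration, not the effects pipeline used in the talk.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, width, height)


def _progress(i: int, frames: int) -> float:
    """Linear progress from 0.0 on the first frame to 1.0 on the last."""
    return i / (frames - 1) if frames > 1 else 0.0


def zoom_in_crops(width: int, height: int, frames: int,
                  end_scale: float = 0.8) -> List[Box]:
    """Centered crop boxes shrinking from the full frame to end_scale of it.

    Scaling each crop back up to the output size reads as walking toward
    the subject, e.g. a hotel exterior.
    """
    boxes = []
    for i in range(frames):
        scale = 1.0 + (end_scale - 1.0) * _progress(i, frames)
        w, h = round(width * scale), round(height * scale)
        boxes.append(((width - w) // 2, (height - h) // 2, w, h))
    return boxes


def pan_crops(width: int, height: int, frames: int,
              window_ratio: float = 0.75) -> List[Box]:
    """Fixed-size crop window sliding left to right, vertically centered.

    The window keeps the source aspect ratio, so only its position changes;
    this reads as a gaze sweeping across a room photo.
    """
    w, h = round(width * window_ratio), round(height * window_ratio)
    travel = width - w
    top = (height - h) // 2
    return [(round(travel * _progress(i, frames)), top, w, h)
            for i in range(frames)]
```

A real implementation would likely ease the motion in and out instead of moving linearly, and would hand the boxes to an image or video library for the actual cropping and scaling.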

Next, let's look at video generation as a whole from the business level. Overall, we still treat each storyboard as the basic unit and build a business-centric production process around it.


At the bottom of the business model sit the AI capabilities: text preprocessing, image preprocessing, large language models, multimodal models, storyboard production, and template synthesis.

The advantage of this design is that once the upper-level approach is decided, we are free to pick whichever AI capabilities suit the needs of different hotels.

At the top of the business model are the business rules. As mentioned earlier, regional and cultural differences across Hotel International are significant, so we use customized strategies to match each video's positioning and keep the videos from all looking alike.

Next, let's focus on templates. Templates allow us to assemble storyboards in different ways, and the diversity of our business determines the diversity of our templates.


At present, the core templates fall into several categories: business minimalist, luxury, island, Japanese, and so on.

We worked with the company's UI team to design templates that match each hotel's distinctive features and strengthen the overall impact of the video.
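One simple way to express business rules that pick a template is an ordered rule list over hotel attributes. The sketch below is illustrative only: the `Hotel` fields, the tag names, the star threshold, and the rule order are all assumptions, and only the four template categories come from the talk.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Hotel:
    # Hypothetical attributes; the real hotel schema is not described.
    name: str
    country: str            # country code, e.g. "JP"
    star: int
    tags: FrozenSet[str] = frozenset()


def choose_template(hotel: Hotel) -> str:
    """Return a template category; earlier rules take priority."""
    if {"island", "beach"} & hotel.tags:
        return "island"
    if hotel.country == "JP" and "ryokan" in hotel.tags:
        return "japanese"
    if hotel.star >= 5:
        return "luxury"
    return "business_minimalist"
```

Keeping the rules in one function makes the priority order explicit, so adding a new regional style means adding one rule rather than touching the generation pipeline.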

In this way, the AI capabilities have been accumulated and consolidated on our platform in many areas.


At Qunar, the business lines, the algorithm team, and the AI technology architecture are separate and usually work in cooperation. We therefore let each AI capability be extended independently, and the business side selects and reuses the capabilities it needs in the form of plug-ins.


These include plug-ins for various AI capabilities such as copywriting and image processing.
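A plug-in arrangement like this can be sketched as a registry of named capabilities that a business line chains together. Everything below is an assumed illustration: the registry, the capability names, and the two placeholder functions stand in for real AI services and are not Qunar's implementation.

```python
from typing import Callable, Dict, Iterable

Capability = Callable[[dict], dict]
_REGISTRY: Dict[str, Capability] = {}


def capability(name: str) -> Callable[[Capability], Capability]:
    """Register a function as a named, reusable capability."""
    def decorator(fn: Capability) -> Capability:
        if name in _REGISTRY:
            raise ValueError(f"capability already registered: {name}")
        _REGISTRY[name] = fn
        return fn
    return decorator


@capability("image.dedupe")
def dedupe_images(payload: dict) -> dict:
    # Placeholder: drops exact duplicate paths, keeping first-seen order.
    images = list(dict.fromkeys(payload.get("images", [])))
    return {**payload, "images": images}


@capability("text.trim")
def trim_copy(payload: dict) -> dict:
    # Placeholder for real copy processing such as highlight extraction.
    return {**payload, "copy": payload.get("copy", "").strip()}


def run(names: Iterable[str], payload: dict) -> dict:
    """Apply the selected capabilities in the order the caller chose."""
    for name in names:
        payload = _REGISTRY[name](payload)
    return payload
```

Because each capability only depends on the payload it receives, a new capability can be added and deployed without changing the business lines that do not select it.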

Next, let's briefly talk about how AI improves multilingual translation.


Our translation practice shows that, across 27 languages, traditional neural network and deep learning approaches can achieve basic "Xin" (faithfulness: accurately conveying the original meaning) but often fall short on "Da" (fluency) and "Ya" (elegance: emotion and style).

With GPT-3.5, we raised the quality of our translations to 7 to 8 points, on a par with Google Translate; GPT-4 does better still.

Translation of minor languages in particular benefits from large language models, but the cost has to be balanced as well.
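One way to balance quality against cost is to route only the languages that benefit most to the large language model, and only while budget remains. The sketch below is a hypothetical policy: the language set, the backend labels, and the budget check are assumptions, not the routing used in the talk.

```python
# Hypothetical set of languages routed to the LLM; not taken from the talk.
LLM_PREFERRED = frozenset({"th", "vi", "id", "ms"})


def pick_translator(lang: str, remaining_budget_yuan: float,
                    est_llm_cost_yuan: float) -> str:
    """Return "llm" when the language is preferred and budget allows, else "nmt"."""
    if lang in LLM_PREFERRED and est_llm_cost_yuan <= remaining_budget_yuan:
        return "llm"
    return "nmt"
```

The caller would deduct the estimated cost from the budget after each LLM translation, so the policy falls back to conventional machine translation once the budget is spent.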

Next, let's talk about how AI improves multimodal generation.


For video generation, we have mostly tried the Pika and Runway platforms. With Runway's Gen-2 model, we were able to create photorealistic results, such as realistic ocean wave motion, by fine-tuning the parameters and making sure the content was physically plausible. We noticed that without special control, the generated waves may not behave as they do in nature (above). We therefore place special emphasis on physical plausibility in multimodal generation.

At the moment, Runway produces the strongest results, although its API is not yet fully open. Once it opens up, it should greatly facilitate our multimodal generation work.

However, even with powerful tools, parameter adjustments are still crucial.


Video generation results and reflections

Let's take a look at the results of video generation.

The video below is in a typical minimalist business hotel style, using side-to-side movement to simulate what the user sees on entering the room. (For ease of display, the video has been compressed; the original resolution is 1080p.)

Next is another minimalist business hotel, this time featuring the surrounding landmarks.

When making a video of a minimalist business hotel, we also customize elements to bring out the hotel's highlights and address what users care about most on vacation, such as whether a hotel in Phuket has a free infinity pool.

Next up is a Japanese-style hotel.

Next is an island video, produced with AI capabilities plus some light manual processing.

It uses a lot of special effects and motion so that the viewer can feel a relaxed and romantic atmosphere.

Finally, let's look at the data. This is how the video is displayed in our app: it plays by default at this position when the details page opens. After launch, the results showed a relative improvement of 6%.


In the process of generating videos with AI, we also hit pitfalls and accumulated a good deal of experience. For example, at the start we insisted on supporting 4K for a high-definition experience, but given real loading conditions on mobile devices, we ultimately chose 1080p as the standard.

Another example: when we first started, we were set on using narration to read the copy. In actual testing, however, we found that pleasant background music over high-definition footage suits high-end hotels better.

When using animation and generated motion, movement makes the picture more appealing, but it is especially important that the motion obeys physical laws.

Going forward, we plan to offer video generation as a capability while building customized coverage of high-end hotels. We will tailor video content to each hotel, including style, scenes, and highlights, and show different hotel videos to different customer groups. We also want to give the operations team the ability to respond quickly to the market and help them close partnerships with hotels.

At present, generating a video costs about 1.25 yuan and takes about half a minute to a minute, which makes this an efficient and cost-effective solution.
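The per-video figures make batch planning a matter of simple arithmetic. The sketch below uses the 1.25 yuan and 30 to 60 second figures from the talk; the batch size, the `workers` parameter, and the assumption of perfectly parallel generation are hypothetical.

```python
from typing import Tuple


def batch_estimate(n_videos: int,
                   cost_per_video_yuan: float = 1.25,
                   seconds_per_video: Tuple[float, float] = (30.0, 60.0),
                   workers: int = 1) -> Tuple[float, float, float]:
    """Return (total cost in yuan, min hours, max hours) for a batch.

    Assumes every worker generates videos independently at the quoted speed.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    low_s, high_s = seconds_per_video
    total_cost = n_videos * cost_per_video_yuan
    min_hours = n_videos * low_s / workers / 3600
    max_hours = n_videos * high_s / workers / 3600
    return total_cost, min_hours, max_hours
```

For a hypothetical batch of 10,000 videos on 10 parallel workers, `batch_estimate(10_000, workers=10)` gives 12,500 yuan and roughly 8.3 to 16.7 hours of wall-clock time.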

Author: Zheng Jimin

Source: WeChat public account "51CTO technology stack"

