
From commercial complexes to bakery shelves, they have all been "calculated" by the video model

Author: Leifeng.com

When you walk into a mall that has just been digitally upgraded, you may find it easier than ever to find a store you like.

You may not notice that the bakery you just left feels more comfortable than on your last visit because a display case has been turned 90 degrees.

And when it's time to eat, you may follow the flow of the crowd and discover a new cuisine.

Behind these changes are AI cameras that once only counted foot traffic and flagged risky behavior, and have now become intelligent enough to "calculate" an entire commercial complex.

The cameras' intelligence upgrade comes from visual AI entering a new era: that of large video models.

Large video models let AI's capability jump from the level of a primary school student to that of a professor, and scenarios that already use visual AI, such as retail, intelligent manufacturing, urban management, and environmental monitoring, will enter the video-model era.

The Intel Video AI Computing Box, which combines the widely known Intel Core CPUs and Arc GPUs, is the most accessible key to this new era of video AI.

Store layouts and shelves that have been "calculated" by AI

The layout and management of traditional commercial complexes rely on experience: for example, the basement level houses a supermarket and restaurants, the first floor cosmetics and jewelry, the second floor women's and children's clothing, and the third floor menswear.

However, consumer habits are changing and preferences vary by region; the role of experience is shrinking while the value of AI becomes more and more obvious.

Widely deployed AI cameras can count foot traffic and help mall customers find lost items faster, but they do little to attract customers or improve mall operations.

Video models in the era of generative AI have taken the digitalization of the retail industry to the next level.

Chen Tiesheng, vice president of Beijing Fenglan International Shopping Center, knows this well. The 17-year-old mall has undergone two renovations; the second introduced Kaiyu Group's digital system, which counts passenger flow at each elevator and on each floor, characterizes traffic and consumer demand floor by floor, and gives an in-depth view of shoppers' preferences for catering and retail.

With richer data insights, it's easier to optimize store layouts and adjust product assortments and marketing strategies.


With Kaiyu Group's digital system, Fenglan International Shopping Center has shifted from experience-based to refined management, which has also lifted performance: the number of mall activities increased by 20% and sales rose by nearly 30%. In this sense, the mall has been "calculated" by generative AI.

Beyond whole malls, large video models can also "calculate" individual stores and shelves.

The Beijing Yingke store of bakery chain Tous Les Jours also adopted Kaiyu Group's digital system. Customer-path maps generated by the new generation of video AI revealed that about 60% of customers walked straight from the bread cabinet to the adjacent checkout counter, leaving the sandwich counter with relatively few visitors.


With a simple adjustment, the operations team turned the sandwich showcase 90 degrees to follow the customer flow, and data over the following days showed an increase in customers visiting the sandwich counter.

These two cases illustrate that the video model in Kaiyu Group's digital system represents a revolutionary change from the AI used in new retail after 2018, with far greater business value.

A new era from traditional vision AI to large video models

Technical limitations are the main reason traditional vision AI algorithms cannot match large video models in providing valuable data and recommendations for retail and other applications.

Of the CNN and RNN algorithms built into traditional AI cameras, one (the CNN) characterizes video content such as locations and people, while the other (the RNN) captures movement such as a person's direction and trajectory; it is hard for them to remember a person and that person's trajectory at the same time.
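
To make the limitation concrete, here is a minimal sketch (assuming PyTorch; all layer sizes are illustrative, not from any production system) of that traditional two-stage pipeline: a CNN summarizes each frame's content, and an RNN chains those summaries over time, so only a compressed final state survives a long clip.

```python
import torch
import torch.nn as nn

class CnnRnnVideoNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Per-frame CNN: captures static content (people, locations).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),  # -> (batch, 16)
        )
        # RNN over frame features: captures motion and trajectories,
        # but must pass information strictly step by step.
        self.rnn = nn.GRU(input_size=16, hidden_size=32, batch_first=True)
        self.head = nn.Linear(32, num_classes)

    def forward(self, video: torch.Tensor) -> torch.Tensor:
        # video: (batch, time, channels, height, width)
        b, t, c, h, w = video.shape
        frames = self.cnn(video.reshape(b * t, c, h, w)).reshape(b, t, -1)
        _, last_hidden = self.rnn(frames)  # only the final state survives
        return self.head(last_hidden[-1])  # long clips lose early detail

model = CnnRnnVideoNet()
clip = torch.randn(2, 8, 3, 64, 64)  # two dummy 8-frame clips
print(model(clip).shape)  # torch.Size([2, 10])
```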

This makes it difficult for traditional vision AI algorithms to give malls or bakeries the specific customer-behavior insights needed for operational decisions.

The Transformer architecture of the large video model balances content representation with video dynamics: it remembers not only a specific person in the video but also that person's movement trajectory.

This is an algorithmic innovation. Traditional CNN, RNN, and LSTM algorithms are like a primary school student who cannot generalize: the teacher uses many pictures and captions to teach the student concepts such as recognizing a cat, but when the student then tries to recognize one, anything significantly different from what the teacher showed may go unrecognized.

Moreover, information in traditional AI algorithms must be passed along sequentially, and over a long chain it gets distorted or lost.

Traditional AI algorithms therefore generalize poorly: a professional AI team must train and deploy them separately for each scenario, which consumes resources and time and drags out deployment.

Another problem is that traditional video AI solutions are deployed centrally: video streams must be transmitted over the network to a backend for processing, which poses huge challenges for massive data transmission and data security.

Poor generalization and centralized deployment limit the large-scale application of traditional vision AI and the mining of business value.

Generative AI goes a step further than traditional AI: it is like a college student capable of self-supervised learning who can generalize from one case to another.

Unlike the primary school student, the college student learns independently rather than relying on the teacher's experience. By studying a few high-quality materials (representative videos with accurate natural-language descriptions, such as a white cat lying on a living-room sofa with a proper caption) together with a large volume of lower-quality materials (such as a video of a gray cat running that is captioned only "a room"), the college student can eventually tell that it is a gray cat running in a room.

The Transformer not only supports self-supervised learning but also understands information in context, because information no longer has to be passed sequentially. This opens a new world for vision AI, enabling more complex tasks in more scenarios.
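
For contrast, a minimal sketch of the Transformer side (again assuming PyTorch, with illustrative token counts): an encoder attends over all spatio-temporal tokens at once, so distant frames exchange information directly rather than through a long sequential chain.

```python
import torch
import torch.nn as nn

d_model = 64
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True),
    num_layers=2,
)
# 8 frames x 16 patches each -> 128 tokens per clip, embedded to d_model.
tokens = torch.randn(2, 8 * 16, d_model)
out = encoder(tokens)  # every token attends to every other token directly
print(out.shape)  # torch.Size([2, 128, 64])
```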

For example, in a mall, traditional AI video search is restricted to a limited set of keywords, whereas a video-model-based solution can take a natural-language query such as "find the little boy in white" and complete the search and localization quickly.
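
The article does not describe Kaiyu Group's search internals, but one common way such natural-language search works is CLIP-style retrieval: text and video clips are embedded into a shared space and ranked by similarity. In this sketch, encode_text and encode_clip are hypothetical placeholders for a real vision-language model.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_text(query: str) -> np.ndarray:
    return rng.normal(size=512)  # placeholder text embedding

def encode_clip(clip_id: str) -> np.ndarray:
    return rng.normal(size=512)  # placeholder video-clip embedding

def search(query: str, clip_ids: list[str], top_k: int = 3) -> list[str]:
    # Rank clips by cosine similarity to the query embedding.
    q = encode_text(query)
    q = q / np.linalg.norm(q)
    scores = {}
    for cid in clip_ids:
        v = encode_clip(cid)
        scores[cid] = float(q @ (v / np.linalg.norm(v)))
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

print(search("find the little boy in white", [f"cam{i}" for i in range(10)]))
```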


How quickly and how accurately results arrive depends largely on the underlying hardware and software.

The most accessible software and hardware base in the era of large video models

Compared with one-dimensional text and two-dimensional images, three-dimensional video places higher demands on the processor, and large video models are so new that rapidly adapting hardware to the algorithms is a serious test.

The Intel Video AI Computing Box, built with Intel Core CPUs and Arc GPUs, which are widely used around the world, is the best choice for deploying large video models.


The Core CPU meets the needs of video-stream ingestion and data analysis in large-video-model solutions: high-speed data processing, computer vision, and low-latency deterministic computing. For harsh operating environments, Intel has also optimized the processor's stability and reliability to ensure around-the-clock operation.

Intel Arc graphics provides the computing power for the heavy inference workloads of large video models. The Xe cores in its microarchitecture integrate XMX, a high-bandwidth matrix engine that provides hardware acceleration for the matrix multiply-accumulate operations common in AI inference.

Powerful hardware alone is not enough; the OpenVINO toolkit ensures that the Intel Video AI Computing Box can quickly adapt to and deploy large video model algorithms.

OpenVINO's inference engine uses x86 hardware instruction sets to accelerate AI inference. The toolkit can also optimize the structure of the computational graph, improving the inference efficiency of a large-video-model solution by increasing the parallelism of operator computation.
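
As a concrete illustration, here is a minimal OpenVINO (Python API) sketch of the deployment flow just described: read a model, compile it for an Intel GPU (graph optimizations happen at compile time), and run inference. The path "model.xml" is a placeholder for a real model file.

```python
import numpy as np
import openvino as ov

core = ov.Core()
# Prefer an Intel GPU (e.g. Arc) when present; fall back to CPU.
device = "GPU" if "GPU" in core.available_devices else "CPU"
# Graph optimizations such as operator fusion happen at compile time.
compiled = core.compile_model("model.xml", device)  # placeholder IR path

frame = np.random.rand(1, 3, 224, 224).astype(np.float32)  # dummy input
result = compiled([frame])[compiled.output(0)]
print(result.shape)
```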

Drawing on the computing power of the Intel Video AI Computing Box and AI acceleration from the OpenVINO toolkit, Kaiyu Group built its video-model-based digital mall solution and effectively pushed video-model capabilities down to the mall's various terminal products, including visual terminals and digital badges.

Zhao Yudi, CTO of Kaiyu Group, said: "Using Arc GPUs with tool suites such as OpenVINO and the Intel oneAPI toolkits fully unlocks the potential of Intel Arc GPUs in AI model inference, making model migration and deployment easier and faster and greatly improving inference speed."

Beyond the Intel Video AI Computing Box, Kaiyu Group will also adopt more of Intel's powerful software and hardware to exploit the advantages of generative AI, delivering advanced digital solutions for retail, real estate, campuses, and other fields and helping users unlock the code to digital transformation.

The Intel Video AI Computing Box has another significant advantage: compatibility with existing security surveillance systems.


Thanks to the Computing Box's compatibility-oriented design, the new solution connects easily to most existing security monitoring systems and can be deployed and commissioned quickly.

On this basis, the Computing Box's strong generalization enables richer AI functions across a wider range of scenarios.

The huge commercial value of putting video models into practice

Kaiyu Group's solution uses a "cloud-edge-device" architecture; deploying the video model at the edge spares the system massive network data transfers and makes AI responses faster.


Data processed on the Intel Video AI Computing Box stays at the edge rather than being uploaded to the cloud, which also safeguards data security and privacy.
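
A tiny sketch of that edge-side pattern: raw frames are analyzed locally, and only a small aggregate leaves the box. Here count_people is a hypothetical stand-in for the on-box detector.

```python
import json

def count_people(frame) -> int:
    return 0  # hypothetical stand-in for the on-box person detector

def process_batch(frames) -> str:
    """Analyze raw frames locally; return only a tiny aggregate payload."""
    total = sum(count_people(f) for f in frames)  # raw video stays on the box
    return json.dumps({"person_count": total})    # only this goes upstream

print(process_batch([object()] * 8))  # {"person_count": 0}
```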

Coupled with Kaiyu Group's technical accumulation and rich experience in retail digitalization, the combination of self-developed algorithms and large models not only helps merchants optimize store layouts and innovate marketing strategies, but also significantly improves the efficiency of mall and personnel management.

For example, it can reconstruct a person's complete behavior trajectory across cameras, produce accurate visitor and visit counts while safeguarding personal privacy, and seamlessly filter out the impact of non-customers, such as shopping guides and security guards, on traffic data.
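
The article does not disclose Kaiyu Group's algorithm, but cross-camera trajectory stitching is commonly done by matching appearance embeddings, which also makes staff filtering straightforward. A self-contained sketch with synthetic data:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def stitch(detections, staff_gallery, match_thr: float = 0.8):
    """detections: list of (camera_id, timestamp, embedding) tuples."""
    tracks = []
    for det in sorted(detections, key=lambda d: d[1]):
        _, _, emb = det
        # Filter out non-customers such as shopping guides and guards.
        if any(cosine(emb, s) > match_thr for s in staff_gallery):
            continue
        # Link to an existing trajectory if the appearance matches.
        for track in tracks:
            if cosine(emb, track[-1][2]) > match_thr:
                track.append(det)
                break
        else:
            tracks.append([det])
    return tracks

rng = np.random.default_rng(0)
person = rng.normal(size=64)
dets = [("cam1", 0, person + rng.normal(0, 0.01, 64)),
        ("cam2", 5, person + rng.normal(0, 0.01, 64))]
print(len(stitch(dets, staff_gallery=[])))  # 1: one cross-camera track
```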

Beyond common functions such as traffic statistics and in-store navigation, it delivers data insights along more dimensions, such as store attractiveness, customer-flow preference, consumer analysis, trajectories and heat maps, dwell time, and upper-floor reach rate, enabling more refined operations and management.


It also helps mall managers and merchants cut costs and boost efficiency through automated checks for blocked fire escapes, fall detection, after-hours intrusion, staff absence from post, phone use on duty, and traffic statistics.

Large video models' strong generalization and automation reduce the workload and cost of deploying AI in commercial complexes, improve users' ability to handle emergencies, and extend to many industries, such as real estate and retail, production and logistics, park management, and urban management.

A warehousing and logistics park can track vehicle movements in real time through cameras, sensors, and other equipment, optimizing logistics efficiency and eliminating safety hazards.

An intelligent manufacturing line can use a video-model solution to automatically spot early signs of equipment failure and trigger warnings and maintenance.

Video models can also play a larger role in easing urban traffic congestion: by learning from historical traffic video data, they grasp how traffic flow varies and predict congestion over a coming period.
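
As a toy illustration of that idea (synthetic data, not a real traffic model): learn the recurring daily profile from historical counts and use it to flag likely congestion hours.

```python
import numpy as np

# Synthetic history: vehicles per hour over 14 days (24 readings/day).
rng = np.random.default_rng(1)
days, hours = 14, 24
base = 200 + 150 * np.sin(np.linspace(0, 2 * np.pi, hours))  # daily cycle
history = base + rng.normal(0, 20, size=(days, hours))

# "Model": the average daily profile learned from history.
profile = history.mean(axis=0)
forecast = profile  # hour-by-hour prediction for the next day
congested = np.where(forecast > 300)[0]
print("hours likely congested tomorrow:", congested)
```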

Even a frozen pre-trained large model can already deliver such powerful AI functions, and large video models will keep evolving toward understanding longer videos and adapting to richer scenarios.

However the algorithms evolve, and however fast, the Intel Video AI Computing Box, built on powerful Core CPUs and Arc GPUs, together with the OpenVINO and Intel oneAPI toolkit ecosystem, remains the cornerstone for putting video models into practice.

For more detailed information, please visit: https://www.intel.cn/content/www/cn/zh/internet-of-things/unlocking-new-password-for-digital-transformation.html
