China's science and technology "bright sword"! For the first time in the past decade, a project in the field of AI has won the first prize of the National Science and Technology Award

Author: New Zhiyuan

Editor: Editorial Department

The 2023 National Science and Technology Awards have been announced! This year's selection was the most rigorous, difficult, and competitive in history. Among the AI contenders, only iFLYTEK won a first prize of the National Science and Technology Progress Award, the first such national award in the field of AI in the past decade.

The 2023 National Science and Technology Awards were released yesterday and immediately topped the trending-search list.

Academician Li Deren of Wuhan University and Academician Xue Qikun of Tsinghua University won the country's highest honor in science and technology, the highest national science and technology award.

In addition, many scientists with outstanding contributions, and many landmark achievements, received national science and technology awards.

Overall, this year's awards break down as follows:

Major achievements in basic research keep emerging, and the Natural Science Award has granted a first prize for 9 consecutive rounds.

Young and middle-aged scientific and technological talents have become an important force in China's scientific and technological innovation.

Among the general projects of the three awards, about 40% of the winners are under the age of 45.

For the State Natural Science Award, more than half of the winners are under the age of 45.

It is worth mentioning that companies such as Baidu and Alibaba Cloud also competed for the 2023 national awards, but only iFLYTEK won a first prize of the National Science and Technology Progress Award.

This is the first such national award in the field of artificial intelligence in the decade since deep learning set off a new generation of AI.

iFLYTEK, as the lead organization, won the award for the project "Key Technology and Industrialization of Multilingual Intelligent Speech".

It is also iFLYTEK's first national award in 12 years, following second prizes of national awards in 2002 and 2011.

Unlike in the past, this year's selection can be called the most competitive and the most difficult in the history of the National Science and Technology Awards.

The most rigorous, difficult, and competitive session in history

Why?

- The National Science and Technology Progress Award is getting harder and harder

Broadly speaking, the National Science and Technology Progress Award is becoming harder to win, which is closely tied to a series of reforms.

Since 2017, the State Council has repeatedly reformed the award system and trimmed the number of awards.

For example, in 2017 the cap on the total number of awards was lowered from 400 to 300, and the "recommendation system" was explicitly replaced by a "nomination system".

In 2020, new measures were proposed, such as streamlining nomination materials, de-emphasizing SCI (Science Citation Index) papers, and allowing foreigners to take part in the selection.

Introducing the nomination system and cutting the number of awards are thus the two main directions of the reform of the national science and technology award system.

The first prize of the 2020 National Science and Technology Progress Award is vacant.

- After a two-year hiatus, a large number of excellent results have been accumulated

Another reason is that applications for the national awards were suspended in 2021 and 2022, so a large backlog of excellent results had accumulated by 2023.

This year, as many as 1,261 general projects were accepted, and 301 projects in total cleared the final screening and preliminary evaluation (including 243 general projects).

At the same time, only 29 special prizes and first prizes emerged from the preliminary evaluation (excluding special projects).

It is not hard to see that 2023 was the strictest, most difficult, and most competitive session in the history of the national awards.

- The computer and automatic control, electronic and scientific instrument, and network and communication groups are crowded

In addition, a large share of the projects relate to computers, electronic information, and AI.

Even leading experts from major technology companies, such as Huawei's Chen Haibo, Baidu's Wang Haifeng, and Shuguang's Li Jun, led projects into the selection.

Specifically, at this year's formal review stage there were 86 Science and Technology Progress Award candidates in the information field.

Among them, 45 were in the computer and automatic control group, 28 in the electronic and scientific instrument group, and 13 in the network and communication group.

After the preliminary evaluation, only 5 projects remained and won the first prize of the Science and Technology Progress Award (2 in the computer and automatic control group, 2 in the electronic and scientific instrument group, and 1 in the network and communication group).
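
To put the figures quoted above side by side, here is a short arithmetic sketch. The variable names are illustrative, and it assumes the 243 figure counts general projects that cleared the preliminary evaluation; the numbers themselves are the ones reported in this article.

```python
# Figures quoted in the article; the variable names are illustrative.
accepted_general = 1261      # general projects accepted
shortlisted_general = 243    # assumed: general projects that cleared preliminary evaluation

info_groups = {
    "computer and automatic control": 45,
    "electronic and scientific instrument": 28,
    "network and communication": 13,
}
info_first_prizes = 5        # first prizes awarded in the information field

info_total = sum(info_groups.values())
general_pass_rate = shortlisted_general / accepted_general
info_first_prize_rate = info_first_prizes / info_total

print(f"information-field entries: {info_total}")
print(f"general projects clearing preliminary evaluation: {general_pass_rate:.1%}")
print(f"information-field entries winning a first prize: {info_first_prize_rate:.1%}")
```

The three group counts add up to the 86 entries reported for the information field; under the stated assumption, roughly one in five general projects cleared the preliminary evaluation, and about 6% of information-field entries ended with a first prize.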

As in past years, most of the winners are academic institutions and central state-owned enterprises, and private enterprises are very rare.

That iFLYTEK stood out from this field shows that the company has world-leading technical strength and a deep AI foundation.

Ten years of sharpening a sword, China's AI "bright sword"

Looking back at the company's history, iFLYTEK has worked in the field of AI for 25 years since its founding. It has always pursued both technology and application, and has become a leader in the related AI technologies and industries.

It started with Chinese speech synthesis and gradually expanded into speech recognition and semantic understanding. It moved from Chinese and English to many languages. It went from speech technology alone to a fusion of image processing and multi-modal perception that expresses information along multiple dimensions, and then to large-model technology benchmarked against the leading international level.

This path of independent research and development not only demonstrates iFLYTEK's technical strength, but also breaks the long-standing monopoly of foreign companies in this field.

After nearly 10 years of continuous research on multilingual intelligent speech technology, its core technologies have won dozens of world championships:

  • In speech synthesis, iFLYTEK won the Blizzard Challenge international speech synthesis competition for 14 consecutive years, from 2006 to 2019;
  • In speech recognition, it won four consecutive championships in CHiME, the international multi-channel speech separation and recognition competition, between 2016 and 2023, and in 2021 it won all 15 constrained-track languages and all 7 unconstrained-track languages in OpenASR, the international low-resource multilingual speech recognition competition;
  • In multilingual translation, iFLYTEK won the IWSLT (International Conference on Spoken Language Translation) evaluation for three consecutive years, from 2021 to 2023.

iFLYTEK was able to "fight its way out" among a group of tough opponents this time precisely because of the research it has accumulated over the years.

Its award-winning project "Multilingual Intelligent Speech Technology" can be said to be the "bright sword" of China's AI technology.

It was jointly developed by iFLYTEK, top universities such as the University of Science and Technology of China and Tsinghua University, and well-known enterprises such as Huawei and China Mobile.

At present the technology supports 69 languages, and it also covers 24 major Chinese dialects as well as minority languages such as Uyghur, Tibetan, Mongolian, Kazakh, Korean (Chao), Zhuang and Yi.

Specifically, the project proposes significant technological innovations in four areas:

1. Decoupled modeling of complex speech signals

In speech recognition, the most challenging scenario is far-field, noisy audio in which several people talk over one another, known in the industry as the "cocktail party problem".

To tackle it, iFLYTEK proposed a spatiotemporal separation modeling method for multi-channel speech signals. It uses an adaptive speech separation algorithm to estimate a frame-level voiceprint representation for each speaker, and uses feedback from the back-end speech task to iteratively guide the front end, so that multiple speakers and noise are accurately separated in space.

In addition, to decouple speech content from noise in the signal, iFLYTEK proposed a decoupled representation of the multi-dimensional attributes of content, prosody, timbre and language, which brought a major improvement in recognition accuracy in complex scenarios.
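
To make the "cocktail party problem" concrete, here is a toy sketch in Python with NumPy. It is not iFLYTEK's method: it assumes an instantaneous, echo-free mix of two synthetic "speakers" on two microphones and, unrealistically, a known mixing matrix (the matrix values and signal choices are invented for illustration). Real far-field systems must estimate the separation from the recordings alone.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 8000, endpoint=False)

# Two stand-in "speakers": a 220 Hz tone and a 330 Hz square wave.
speaker_a = np.sin(2 * np.pi * 220 * t)
speaker_b = np.sign(np.sin(2 * np.pi * 330 * t))
sources = np.stack([speaker_a, speaker_b])           # shape (2, samples)

# Hypothetical mixing matrix: row i = how loudly microphone i hears each speaker.
mixing = np.array([[1.0, 0.6],
                   [0.4, 1.0]])
noise = 0.01 * rng.standard_normal(sources.shape)
mics = mixing @ sources + noise                      # what the microphones record

# Toy separation: because the mixing matrix is assumed known, solve for the sources.
separated = np.linalg.solve(mixing, mics)

max_error = float(np.max(np.abs(separated - sources)))
print(f"largest per-sample reconstruction error: {max_error:.3f}")
```

Each microphone signal is a blend of both speakers, while the solved signals match the originals up to the small added noise. The hard part in practice is everything this sketch assumes away: unknown mixing, reverberation, and moving speakers.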

2. Multilingual shared modeling

Under foreign technology blockades, training models for many languages, and especially for lesser-used (low-resource) languages, runs into a very thorny problem: linguistic knowledge is lacking and training data is scarce.

iFLYTEK's idea is to group low-resource languages by language family, find the rules that similar languages share, and then analyze, model and train on that basis.

Following this idea, they designed the multilingual general phonemic system RGP and the basic language unit SE from scratch, building a unified phoneme and prosody system across languages.

During training, multiple languages from the same family were modeled jointly and pre-trained together using meta-learning, which significantly improved the performance of speech systems for low-resource languages.
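
The idea of grouping languages by family and giving them one shared unit inventory can be sketched as follows. This is only an illustration: the phoneme sets are invented toy data, not real inventories, and the function names are hypothetical; it does not reproduce RGP or SE, whose details the article does not give.

```python
from collections import defaultdict

# Hypothetical toy data: language -> (language family, phoneme set).
LANGUAGES = {
    "spanish":    ("romance", {"a", "e", "i", "o", "u", "r", "s"}),
    "portuguese": ("romance", {"a", "e", "i", "o", "u", "s", "ʃ"}),
    "uzbek":      ("turkic",  {"a", "e", "i", "o", "q", "ʃ"}),
    "kazakh":     ("turkic",  {"a", "e", "i", "q", "ŋ", "ʃ"}),
}

def build_shared_inventories(languages):
    """Group languages by family and give each family one shared phoneme table."""
    by_family = defaultdict(set)
    for family, phonemes in languages.values():
        by_family[family] |= phonemes
    # Sort so every language in a family maps a phoneme to the same index.
    return {family: {p: i for i, p in enumerate(sorted(phonemes))}
            for family, phonemes in by_family.items()}

def encode(language, phonemes, languages, inventories):
    """Map a phoneme sequence to indices in its family's shared table."""
    family, _ = languages[language]
    table = inventories[family]
    return [table[p] for p in phonemes]

inventories = build_shared_inventories(LANGUAGES)
spanish_ids = encode("spanish", ["s", "i"], LANGUAGES, inventories)
portuguese_ids = encode("portuguese", ["s", "i"], LANGUAGES, inventories)
print(spanish_ids, portuguese_ids)
```

Because both languages index into the same family table, the same sound gets the same id in each, so a model trained on the better-resourced language shares its parameters for that unit with the scarcer one.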

3. Joint speech-semantic modeling

Voice interaction and speech translation in complex application scenarios have long faced one technical problem: deep semantics are hard to understand, especially in professional fields.

If semantic understanding cannot be built into speech technology, accuracy inevitably suffers. To this end, iFLYTEK proposes a robust spoken language understanding technology in which speech and semantics reinforce each other, and a trustworthy text generation technology enhanced by multi-source knowledge.

The former builds a unified encoding network that aligns the speech and semantic spaces, and uses multi-task joint training so that the two tasks strengthen each other. The latter builds an information retrieval module from weakly supervised data and fuses the retrieved results into the model through cross-attention, improving the accuracy of professional vocabulary and knowledge citation.
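
Cross-attention fusion of retrieved knowledge can be sketched in a few lines of NumPy. This is a generic scaled dot-product cross-attention under simplifying assumptions (random stand-in vectors, no learned projection matrices, a plain residual sum as the fusion step); the article does not describe iFLYTEK's actual architecture.

```python
import numpy as np

def cross_attention(queries, knowledge):
    """Scaled dot-product cross-attention.

    queries:   (num_tokens, dim) decoder states
    knowledge: (num_entries, dim) retrieved knowledge embeddings,
               used here as both keys and values (no learned projections)
    """
    dim = queries.shape[-1]
    scores = queries @ knowledge.T / np.sqrt(dim)    # (tokens, entries)
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over entries
    context = weights @ knowledge                    # (tokens, dim)
    return context, weights

rng = np.random.default_rng(0)
decoder_states = rng.standard_normal((4, 8))   # 4 tokens being generated
retrieved = rng.standard_normal((3, 8))        # 3 retrieved knowledge entries

context, weights = cross_attention(decoder_states, retrieved)
fused = decoder_states + context               # residual fusion (an assumption)
print(fused.shape, weights.sum(axis=-1))
```

Each generated token gets its own weighting over the retrieved entries, so a term from a professional field can draw mostly on the entry that defines it.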

4. Training and inference acceleration on domestic heterogeneous hardware platforms

With competition in science and technology growing ever fiercer, independent innovation is an important strategic task.

However, building a domestic computing platform is a major challenge. When training and inference are migrated to domestic hardware, many models suffer from low performance and difficult adaptation, and the available computing power lags behind mainstream international chips.

To resolve this "chokepoint" problem, iFLYTEK proposed two technologies: hardware-friendly operator fusion for variable-length input, and joint, unified quantization-aware training.

The former automatically fuses dynamic tensor operators through software-hardware co-optimization, which suits the variable-length input of speech better and brings performance to the same level as mainstream international chips.

The latter greatly reduces the difficulty of model deployment by simulating quantized computation for combinations of multiple hardware targets. A single training run is enough to achieve "one-click deployment" across hardware platforms.
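
The core trick behind quantization-aware training is "fake quantization": in the forward pass, values are rounded to the integer grid the target hardware would use and then converted back to floating point, so the model sees the rounding error while it trains. Below is a minimal NumPy sketch of that one step, assuming symmetric per-tensor quantization and two hypothetical bit widths standing in for different hardware targets; it is not iFLYTEK's training recipe and omits the backward pass.

```python
import numpy as np

def fake_quantize(x, num_bits=8):
    """Simulate symmetric integer quantization: quantize, then dequantize."""
    qmax = 2 ** (num_bits - 1) - 1                  # 127 for 8 bits, 7 for 4 bits
    scale = np.max(np.abs(x)) / qmax
    if scale == 0:
        return x.copy(), 1.0
    q = np.clip(np.round(x / scale), -qmax, qmax)   # values on the integer grid
    return q * scale, scale

rng = np.random.default_rng(0)
weights = rng.standard_normal((16, 16))

simulated = {}
for bits in (8, 4):                                 # two hypothetical hardware targets
    dequantized, scale = fake_quantize(weights, bits)
    error = float(np.max(np.abs(dequantized - weights)))
    simulated[bits] = (error, float(scale))
    print(f"{bits}-bit: max rounding error {error:.4f}")
```

The rounding error never exceeds half a quantization step, and it grows as the bit width shrinks. Simulating several targets during one training run is one plausible way a single set of weights could be made robust to each of them.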

Building on these breakthroughs, iFLYTEK has built 5 localized clusters that handle 873 million requests a day across speech synthesis, recognition, translation, interaction and other applications.

iFLYTEK also worked with Huawei to tackle the core problems of a localized computing base for large models. The first national computing power platform, "Feixing No. 1", has now been built, filling the gap in domestic training platforms for very large models.

Strikingly, iFLYTEK ranks first in domestic market share in the voice industry and holds 8.1% of the global multilingual market, a share that continues to grow.

This is because the project has built an independent and controllable multilingual industrial ecosystem.

It has created and led the intelligent voice industry. In software and hardware, it has created new categories of smart devices such as intelligent translators, smart office notebooks, and intelligent voice recorders. In meetings, offices and similar scenarios, its services cover more than 50 countries and regions and have supported more than 400,000 meetings, including sessions of the National People's Congress. For ordinary users, the voice remote control service on China Mobile TVs reaches more than 100 million households.

iFLYTEK's multilingual technology also stands behind Chinese manufacturers going overseas. It has been activated on more than 1 billion devices from mainstream mobile phone makers, effectively solving the multilingual "bottleneck" for Huawei and other manufacturers expanding abroad. In smart cars, it has helped Chery, FAW, Changan and other carmakers ship more than 2 million units overseas.

Worldwide, iFLYTEK provides 5.15 billion translation services every year.

Not only that, iFLYTEK also released the "Multilingual Voice Cloud" platform, undertook the construction of a new generation of artificial intelligence open innovation platform for intelligent voice, and carried out a number of public welfare actions such as "Hearing the Voice of AI".

The next step is the intelligent voice + cognitive model

In the era of general artificial intelligence, the innovations in iFLYTEK's key multilingual intelligent voice technologies and its large-model technology complement and reinforce each other.

On January 30 this year, building on breakthroughs such as the decoupling of voice attributes and the spatiotemporal separation of voice signals, iFLYTEK released the "Spark Speech Model" for the first time and achieved internationally leading results.

Across 37 mainstream languages, the Spark (Xinghuo) model significantly outperforms OpenAI Whisper V3. Across 24 major languages, Whisper V3's average recognition rate is 82%, while the Spark voice model reaches 90%.
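
An eight-point gap in recognition rate is larger than it looks when expressed as errors removed. The short calculation below assumes the quoted "recognition rate" is an accuracy, so that the error rate is simply one minus that figure; the article does not define the metric.

```python
whisper_v3_accuracy = 0.82   # average recognition rate quoted in the article
spark_accuracy = 0.90

# Assumption: error rate = 1 - recognition rate.
whisper_error = 1.0 - whisper_v3_accuracy
spark_error = 1.0 - spark_accuracy
relative_error_reduction = (whisper_error - spark_error) / whisper_error
print(f"relative error reduction: {relative_error_reduction:.0%}")
```

Under that assumption, going from 18% errors to 10% errors removes a little under half of the mistakes.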

Then, on April 26, iFLYTEK debuted a "multi-emotional super-anthropomorphic synthesis" function, which can also reproduce a voice from a single sentence.

This makes the AI sound like a real person, with richer emotion and vivid spoken expression, including laughter, tone and mood.

Combining large models with voice technology has become a major trend in the future development of AI.

Speech technology backed by an LLM (large language model) can further improve speech recognition, synthesis and translation through stronger understanding of complex semantics and modeling of long text.

At the same time, the LLM's strong speech understanding, knowledge question answering, multi-round dialogue, and multi-modal modeling capabilities greatly expand the use scenarios and application value of intelligent speech technology.

In areas such as simultaneous voice interpretation, automatic customer service, supplementary Q&A, virtual employees, companion robots, and service robots, this technology will bring huge industrial opportunities and accelerate the arrival of the era of general artificial intelligence.

In addition, iFLYTEK Xinghuo V4.0 will be officially released on June 27, with base capabilities fully benchmarked against GPT-4 Turbo.

At the same time, the Xinghuo voice model will also receive a new upgrade.

Going forward, building on iFLYTEK's leading intelligent voice technology, iFLYTEK Xinghuo will keep advancing toward the larger vision of "liberating productivity, unleashing imagination, and creating exclusive AI assistants for every enterprise and everyone", and build a better world with artificial intelligence!
