
iFLYTEK Spark 4.0 tops eight benchmark lists and shows off speech recognition under deliberate interference

Author: QbitAI

Swimming Fish from Au Fei Temple

QbitAI | WeChat official account: QbitAI

What a scene! iFLYTEK Spark (Xinghuo) showed off its speech recognition on stage, to thunderous applause:

Three people talking at once over background music makes for strong interference, yet the large model heard and understood all of it and converted it to text instantly. The "cocktail party" problem of speech recognition, it seems, is no longer a problem~


Well, all I caught was "Peking duck" at the very end. Who can relate...

I have to admit that iFLYTEK's press conferences, held once every few months, are full of substance every time, and this one brought surprises too.

iFLYTEK Spark 4.0 is here. This time all 7 core capabilities have been improved, the model ranks first on eight benchmark lists, and it is benchmarked against GPT-4 Turbo across the board.


In addition, the iFLYTEK Spark App/Desk and the speech model have also received a number of upgrades.

Come and see what's new in this release~

How strong is iFLYTEK Spark 4.0? No. 1 on eight lists

First, let's look at what is new in the base model, iFLYTEK Spark 4.0. The upgrades fall mainly into these areas:

  • Basic capabilities: text generation, language comprehension, knowledge Q&A, logical reasoning, math, code, and multimodal capabilities have all been upgraded and are benchmarked against GPT-4 Turbo across the board;
  • Image and text recognition keeps improving, especially in understanding complex layouts, text recognition that draws on the semantics of the text, and symbol recognition in professional fields, where it is stronger than GPT-4o in industries such as scientific research, finance, medical care, and justice.
  • Complex instruction following, complex logical reasoning, spatial reasoning, mathematics, and multimodal understanding based on logical relationships have also improved. For example, given several charts, the model can work out the logical relationships among their contents. These improvements can speed up the practical application of large models.
  • Across 12 mainstream Chinese and English test sets from China and abroad, Spark V4.0 took first place on 8, covering Chinese and English tests in comprehension and reasoning, comprehensive evaluation, and mathematics.

However, Liu Qingfeng admitted that there is still a gap in code and multimodal capabilities this time.

It is worth mentioning that Spark's general long-text capability has also been upgraded, and a content traceability feature was released for the first time.

Liu Cong, president of the iFLYTEK Research Institute, gave a live demonstration: he fed the model the Chinese text of Journey to the West and the English text of Harry Potter, then asked:

What is the difference between the Monkey King's golden cudgel and Harry Potter's wand?

Besides the step-by-step answer, small flags are attached to the answer text; click one and you can see where in the source that statement comes from.


This can greatly reduce large-model hallucination. Spark not only answers your question but also tells you why it answered that way and which passage it drew on, so instead of checking the full text you only need to verify the cited source.

Note that this is not limited to Chinese: tracing works for English as well. The Spark model does not translate the English into Chinese first but finds the corresponding passage directly, a tracing ability trained on English itself.

Of course, content tracing is not limited to text; it also covers audio and video.


With the base capabilities covered, and the web version and App both fully upgraded, let's run a quick test.

First, let's see how iFLYTEK Spark 4.0 handles the college entrance examination (gaokao) math that stumped a wave of large models a while ago. We took the first 4 multiple-choice questions directly from Paper I of the exam:

Read the questions and give the answers.

The result: all four answers are correct, and so is the reasoning. You have to admit, it's got something~


Next, let's look at its multimodal comprehension and whether it can find the logical relationships across several images.

Given a cartoon, it can clearly make out the content and correctly answer the question posed: will the child have grown taller after a year?


In addition, speech recognition in strong-interference scenarios has achieved a breakthrough. Accuracy reaches 91% when two speakers overlap and 86% when three speakers overlap. In a high-noise scene at -5 dB, where the noise is already much louder than the human speech, accuracy still exceeds 90%. That is why the opening scene, where "even if you talk nonsense, you can accurately recognize" it, was possible.

Language coverage keeps growing too: the upgraded Spark speech model supports 74 languages and dialects (37 languages and 37 dialects) with no manual switching, so you can communicate freely.

Recognition for the 37 languages is ahead of OpenAI's Whisper V3, and recognition for the 37 dialects has improved by 30% on average.
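To put the -5 dB figure in perspective, here is a minimal sketch of the arithmetic. It assumes the figure is a signal-to-noise ratio (SNR) measured on power, which the article does not state explicitly; the function names are illustrative and not part of any iFLYTEK API.

```python
import math


def noise_to_signal_power_ratio(snr_db: float) -> float:
    """Return noise power divided by speech power for an SNR given in dB.

    Assumes the standard definition SNR_dB = 10 * log10(P_signal / P_noise),
    so P_noise / P_signal = 10 ** (-SNR_dB / 10).
    """
    return 10 ** (-snr_db / 10)


def noise_to_signal_amplitude_ratio(snr_db: float) -> float:
    """Return noise amplitude divided by speech amplitude.

    Amplitude scales with the square root of power, hence the divisor 20.
    """
    return 10 ** (-snr_db / 20)


if __name__ == "__main__":
    snr = -5.0  # the figure quoted in the article, read as an SNR (assumption)
    print(f"noise/speech power ratio:     {noise_to_signal_power_ratio(snr):.2f}")
    print(f"noise/speech amplitude ratio: {noise_to_signal_amplitude_ratio(snr):.2f}")
```

Under that reading, the noise carries roughly three times the power of the speech, which matches the article's description of noise "much louder" than the voice.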


Just a few days ago, iFLYTEK, as the lead organization, won the first prize of the National Science and Technology Progress Award for the project "Key Technology and Industrialization of Multilingual Intelligent Voice".

It is the first first prize of the National Science and Technology Progress Award given in the field of artificial intelligence in the decade since deep learning set off the global AI wave.

On this basis, applications in the speech field are also being rebuilt. The Spark intelligent car cockpit has been upgraded, offering "free interaction" across multiple languages and dialects as well as highly human-like interaction that is multi-emotional and multimodal. At present, iFLYTEK's voice interaction products rank first in domestic market share and are widely exported around the world. The Spark model powers a highly intelligent interactive experience in many models from FAW, Chery, GAC, JAC, Great Wall, and other carmakers.

Focusing on personalized AI assistants

With the upgrade of the base model capability, the application experience of Xinghuo in various industries and scenarios has also been further upgraded.

In iFLYTEK's own words: the AI assistant that understands you.

Compared with the previous positioning as a "general AI assistant", Liu Qingfeng said the change is realized mainly at three capability levels:

  • Personalized expression based on user portraits;
  • Memory learning based on usage history;
  • Profile-based reinforcement learning.

Specifically, when a user's personal portrait is built, the personality style can be chosen by the user or refined dynamically from dialogue and usage history, forming a personalized style of expression. Combined with the user's profile, the AI assistant can then generate personalized, targeted content.


Now everyone can have their own personalized assistant through the iFLYTEK Spark App or the Desk interface.

This time "Personal Space" has been upgraded: it can collect and manage all kinds of material you upload and build a knowledge base of your own. The large model can also do reinforcement learning based on your profile.


On stage, Liu Cong uploaded essays written by his daughter and selected the AI personality tags that matched her; the text generated afterward all carried her personal style.


The iFLYTEK Spark App also has an agent feature that brings together a variety of AI assistants you can call on at any time, including a medical assistant, an English listening and speaking assistant, a math Q&A assistant, a recording assistant, a manuscript writing assistant, and a code assistant.

At present, the first batch of 14 agents has been launched.


Turning to applications in specific industries, Spark, as an "AI assistant that understands you", keeps going deeper and keeps creating value.

Take medical care. The iFLYTEK Spark medical model has been upgraded again, and its core medical capabilities now surpass GPT-4 Turbo across the board, including medical knowledge Q&A, complex semantic understanding, professional document generation, diagnosis and treatment, and multi-turn dialogue.

The iFLYTEK Xiaoyi App, positioned as a personal health assistant, covers 1,600 common diseases, 2,800 common drugs, and 6,000 common examinations and tests, meeting users' health needs in the core scenarios before, during, and after medical treatment. So far it has accumulated 12 million downloads. Its user approval rate is 98.8%, and nearly half of its users come from word-of-mouth recommendations.

You can ask it general questions directly, such as: What should I do about insomnia? Can people with gout drink soy milk?


The iFLYTEK Xiaoyi App has launched a "personal digital health space", which can be linked to health records for you and your family, including electronic medical records, examination reports, and physical examination reports. When a minor ailment comes up, it analyzes the possible causes for you; when you take medication, it gives personalized judgments on drug contraindications; and it can also show how your data has changed compared with previous reports.


Then there is the field of education. AI is becoming a teaching assistant for teachers and a learning assistant for students.

This time, the underlying Spark model has greatly improved its Chinese, mathematics, and English abilities as well as its OCR recognition.

On the teacher's side, iFLYTEK released the Spark intelligent grading machine, which grades papers automatically as it scans them and was operated live on stage.


After grading, it can also analyze how the whole class is learning and help the teacher draw up a learning path plan for each student.


Homework grading that used to take 90 minutes can be cut to 5 minutes, and 60 minutes of learning statistics becomes one minute, greatly freeing up teachers' productivity.


On the student side, the AI learning machine powered by the Spark large model builds on the improved underlying capabilities to offer highly human-like Q&A tutoring.

According to existing pilot data, the completion rate of children's independent learning has risen from 67% to 90%, and the problem-solving rate has reached 93%, up from 72% when they relied on video learning.

In addition, for enterprise applications, an enterprise agent platform was released, along with enterprise intelligent assistants for business opportunities, bid evaluation, code, and more.

At the same time, the influence of the iFLYTEK Spark developer ecosystem is still expanding:

In the five months since iFLYTEK Spark V3.5 was released on January 30 this year, growth of the Spark developer ecosystem has accelerated. The number of developers rose from 5.98 million to 7.02 million, an increase of 1.04 million, including more than 400,000 overseas developers and 570,000 large-model developers.

Make large models easier to use and more practical

Taken as a whole, the press conference sent one strong signal from iFLYTEK:

Make large models easier to use and more practical.

Put more concretely, that means the AI assistant.

It can mean AI watching over the health of the whole family. It can mean one-to-one personalized teaching that helps every child develop the habit of active thinking and the ability to keep learning for life. It can also mean service scenarios deep inside enterprise management, where every worker can easily manage a knowledge base of their own.

Looking across human civilization, behind every advance there has been a great assistant, and every generation of assistants has had its mission.

The mission of iFLYTEK is to liberate and unleash productivity.

Liu Qingfeng said: we hope that through our capabilities we can help every enterprise achieve greatness and help everyone become their best self.

As the "carrier" of AI assistants, the iFLYTEK Spark App keeps empowering users and has long been changing how we work and live.

At the meeting, Liu Qingfeng shared this set of key figures.

On Android, among all large-model-related apps, the iFLYTEK Spark App ranks first in downloads in the tools category, with a total of 131 million downloads.

This means the Spark App's assistants for writing, programming, work, study, daily life, parenting, translation, and more are in everyday use, and some of them have been called millions or even tens of millions of times.

From the perspective of the whole industry, though, this is not a new concept. It has appeared in many science fiction TV series and films, and only with the arrival of the large-model era have those science fiction scenes been brought into reality.

From DAN, the ChatGPT "boyfriend" that went viral earlier, to GPT-4o, which set off a new wave of discussion about human-computer interaction, more general-purpose AI assistants with both functional and emotional qualities have appeared, prompting people to exclaim: "Her" is really here.

But building an AI assistant is not easy.

I believe many of you have noticed that GPT Builder is about to end its service in July. It was highly anticipated because "everyone can create their own GPT", yet it is shutting down less than half a year after its release.

I still remember that when it first came out, many people criticized it: some customized GPTs were no different from ordinary ChatGPT conversations and could not handle complex instructions...

When a large-model product faces users directly, their expectations and requirements are far stricter than ever. When the product's existing capabilities cannot meet users' needs, it is soon abandoned by users and eliminated by the market...

Only by continually polishing product capabilities, addressing users' pain points head-on, and keeping the ecosystem open can a product keep thriving in such a wave.

At least for now, the large-model products that are still alive and still serving users have passed a test. iFLYTEK is one of them.

A recent decision by ChatGPT has once again underscored how important it is for large models to be independent and controllable.

OpenAI's large models will not become the base for China's AI applications, and naturally will not become the base for China's AI assistants. Players like iFLYTEK have focused on independence and controllability from the very beginning:

To date, iFLYTEK Spark 4.0 remains the only officially certified large model that is open to the general public.

What does that mean?

It is a large model trained on the national computing power platform, in which all the algorithms, every line of code, and every piece of data are independent and controllable.

The iFLYTEK Spark large model is built on "Feixing No. 1", the country's first domestic 10,000-card ("Wanka") computing cluster.

Liu Qingfeng said: the capability of the large-model base determines how high development can go, and China needs to establish an independent and controllable general-purpose large-model base.


It is necessary to understand the boundaries of large-model capabilities scientifically. Now, as those capabilities are upgraded, it is becoming possible for everyone to have an AI assistant.

Spark represents a trend and is leading the way in its development.

— END —

QbitAI · Signed Toutiao account

Follow us and be the first to know about cutting-edge science and technology
