Economic Observer reporter Chen Qijie
Jack Ma, Lei Jun, Jay Chou, Trump... The real voices of these domestic and foreign public figures have been cloned by creators using AI audio applications, turned into voice characters, and posted in AI audio sharing communities. In just tens of seconds, users can rely on these voice characters to generate AI audio that comes close to real human speech.
During the National Day holiday, netizens produced a large number of AI audio clips of Xiaomi founder Lei Jun and, after pairing them with images to synthesize audio and video, uploaded the content to internet platforms such as Douyin, Kuaishou and Bilibili. At the end of September, a suspect used the AI audio app Reecho to forge a recording of Lu Wenqing, the founder of Three Sheep, prompting the police to open an investigation.
After the Lei Jun AI audio incident escalated, a large number of the related audio and video clips on internet platforms were deleted, but some remain. On October 13, a reporter from the Economic Observer Network found that users could still use a voice character named "Lei Jun" in the Reecho AI audio sharing community (hereinafter the "Reecho community") to make audio; one "Lei Jun" voice character had been used 603,000 times.
Reecho is an AI audio application and sharing community operated by Shenzhen Yanyu Technology Co., Ltd. (hereinafter "Yanyu Technology"). Xie Weiduo, founder of Yanyu Technology, told the Economic Observer that Xiaomi's legal department had contacted the company, asking it to control audio content involving Lei Jun in the community. Yanyu Technology has notified the original authors to remove the relevant content; if an author does not do so within 7 working days, the company will delete it.
AI audio producers, voice character creators, AI audio applications and sharing communities, and internet platforms together form the chain that produces AI audio of public figures. When such audio triggers negative public opinion, what responsibilities do the parties in the chain bear? And can an infringed party use existing technology to accurately trace the perpetrator?
Producer's Responsibility
Currently, AI audio makers generate audio of public figures in two main ways.
In the first, the producer uses an AI technology provider's service directly to generate the audio. Xie Weiduo said the Hefei police traced the suspect who forged Lu Wenqing's recording through the internet platform and found records of Reecho use on his computer. Yanyu Technology cooperated with the police to retrieve the suspect's generation records as evidence.
Yanyu Technology disclosed that the suspect clipped about 30 seconds of emotionally charged audio of Lu Wenqing from a live broadcast as source material and synthesized it on Reecho's platform with text he had written himself. He then played the synthesized audio in a noisy, empty space, re-recorded it in segments with other equipment, and stitched the segments together. The added ambience made it difficult for many netizens to tell whether the recording was real or fake.
In the second, a creator uploads the real voice of a public figure, clones a voice character that closely resembles it, and shares the character in an AI audio sharing community, where other producers can use it directly to generate AI audio. Of the two, the second approach is the more convenient.
A reporter from the Economic Observer Network selected a voice character named "Lei Jun" in the Reecho community, entered 200 characters of text, and clicked to generate audio. In less than a minute, an AI audio clip mimicking Lei Jun's speech was produced.
As of October 13, one "Lei Jun" voice character in the Reecho community had been used 603,000 times and had generated 33.718 million characters of audio; on the AI audio application Fish Audio, a "Lei Jun" voice character had been used 174,000 times.
Netizens often pay little attention to the need for authorization when making AI audio of public figures. One Bilibili user said he had simply seen many people synthesizing Lei Jun AI audio, which gave him the idea of making a game commentary video with Lei Jun's AI voice. Another Bilibili user who made a similar video said, "There are a lot of (these videos) on the internet, so it should be fine, and I will delete it if I get a violation notice."
Du Shuang, a lawyer at Tahota Law Firm, told the Economic Observer that under existing judicial standards, highly recognizable AI-generated voices are legally protected. Without authorization from Lei Jun or Xiaomi, providers of AI dubbing material and video producers who use his AI-generated voice to produce and disseminate abusive or defamatory audio and video are infringing Lei Jun's portrait rights, reputation rights and other personality rights, and may also infringe Xiaomi's reputation.
Obligations of Technology Providers
In these two production methods, AI audio application companies play different roles. In the first, they act only as AI technology providers; in the second, they are both AI technology providers and platform operators.
Xie Weiduo believes that AI applications are merely tools for transforming content, and that in content control the priority should be the user's own responsibility. As a user-generated content (UGC) community, Yanyu Technology is unwilling to interfere excessively with users' content as long as it remains legally compliant.
Yanyu Technology therefore manages compliance mainly by prompting users and by tracing the source of content.
On its official website, Yanyu reminds users not to use its services to clone or generate any content that infringes copyright, violates morality and ethics, or violates the laws and regulations of the People's Republic of China.
After the forged Three Sheep recording incident, Yanyu Technology said it was deploying multiple security measures, including a strengthened real-name authentication mechanism, enhanced multi-dimensional intelligent detection and early warning of sensitive words, and traceable audio watermarks.
Xie Weiduo said that all content users generate with Reecho can now be traced; the traceable information includes the technology platform used and the user who produced the content.
Content traceability is a regulatory requirement for AI technology providers and operators. According to the "Practice Guide for Cybersecurity Standards - Methods for Identifying Content of Generative AI Services" issued by the Secretariat of the National Information Security Standardization Technical Committee in August last year, explicit and implicit watermarks should be added to AI-generated images, audio and video.
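To illustrate what an implicit watermark can look like in practice, below is a minimal Python sketch that hides a short provider-and-account identifier in the least significant bits of 16-bit audio samples. It is purely illustrative: the identifier format, the function names, and the LSB scheme itself are assumptions for demonstration, not the watermarking method Reecho or any other platform named in this article actually uses.

    # Illustrative sketch only: a simple least-significant-bit (LSB) audio watermark.
    # The identifier format and the scheme are assumptions, not any provider's real method.
    import numpy as np

    def embed_id(samples: np.ndarray, tag: str) -> np.ndarray:
        """Hide `tag` (UTF-8) in the lowest bit of the first int16 audio samples."""
        bits = np.unpackbits(np.frombuffer(tag.encode("utf-8"), dtype=np.uint8))
        if bits.size > samples.size:
            raise ValueError("audio too short to hold the watermark")
        marked = samples.copy()
        marked[:bits.size] = (marked[:bits.size] & ~1) | bits  # overwrite the lowest bit
        return marked

    def extract_id(samples: np.ndarray, n_chars: int) -> str:
        """Read back an identifier of n_chars characters written by embed_id."""
        bits = (samples[:n_chars * 8] & 1).astype(np.uint8)
        return np.packbits(bits).tobytes().decode("utf-8")

    audio = np.zeros(16000, dtype=np.int16)            # one second of 16 kHz silence
    tag = "platform=demo;user=42"                      # hypothetical trace identifier
    print(extract_id(embed_id(audio, tag), len(tag)))  # -> platform=demo;user=42

An implicit watermark of this kind is inaudible to listeners but can be read back by the provider to identify where and by whom a clip was generated; an explicit mark, by contrast, is something the audience can perceive directly, such as an on-screen notice or a spoken disclaimer.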
Du Shuang said that under the "Provisions on the Administration of Deep Synthesis of Internet Information Services" (hereinafter the "Administrative Provisions"), AI technology providers are also obliged to strengthen the management of training data, ensure its security, and protect personal information. Where functions for editing biometric information such as faces and voices are provided, users of the technology must be prompted to notify the individuals being edited in accordance with law and to obtain their separate consent.
The Economic Observer Network noticed that voice characters such as "Jack Ma", "Jay Chou", "Trump" and "Cai Xukun" made by some creators were displayed on the Reecho community's home page, where even unregistered users could see them. The voice characters of these public figures have thus become a tool for AI audio application companies to attract traffic.
Du Shuang said that in such cases the creator of the voice character has already committed infringement. If an AI audio sharing community discovers the infringement, or a rights holder complains and reports it, the community must delete the relevant material promptly; if it fails to do so, it bears a share of the infringement liability.
Xie Weiduo said Reecho is working with some voice actors on launching high-quality official voice characters, and may in the future allow users to sell the rights to their voices after copyright verification.
Platforms: Relying on User Declarations
In the Lei Jun AI audio incident, internet platforms such as Douyin, Kuaishou and Bilibili were the main channels of dissemination. Faced with more and more AI-generated content, what responsibilities do these platforms bear?
Du Shuang said that as channels of dissemination, platforms are also regulated by the Cybersecurity Law of the People's Republic of China and the Administrative Provisions: for deep synthesis content that may confuse or mislead the public, a conspicuous label should be placed at a reasonable position within the generated or edited content to remind the public that it has been deeply synthesized.
At present, mainstream social platforms generally handle AI content by asking users to declare it first, then marking a label such as "this content is AI-generated" next to it.
The Economic Observer learned from Kuaishou that it launched an author-declaration function for AI content in September 2023, requiring users to add a declaration when publishing AI-generated content so as to avoid misunderstandings as the content spreads.
Video platforms such as Bilibili and Douyin have taken similar measures. Douyin's user service agreement, for example, prohibits users from using new technologies and applications based on deep learning, virtual reality and the like to create, publish or disseminate fake news. When users publish or disseminate non-authentic audio and video produced with deep learning, generative artificial intelligence or other new technologies, or other content that may confuse or mislead the public, they must label it conspicuously.
For AI-generated content that is not conspicuously labeled, Bilibili's user agreement further notes that the platform "has the right to take measures including but not limited to adding labels, restrictions, and bans to the relevant content and accounts".
However, when a reporter from the Economic Observer Network uploaded an AI-generated audio and video clip to Douyin and Bilibili without actively choosing to label it, neither platform detected that the content was AI-generated.
An executive at an AI forgery-detection company who has dealt with internet platforms said that, for reasons including their own costs and the strength of regulation, platforms currently have little willingness to manage fake AI content.