
Ten years after my grandfather's death, I "resurrected" him with AI


Using the written records and audio-visual materials my grandfather left behind, I stitched together several mature AI techniques to "resurrect" him.

That day, on a whim, I typed "resurrecting the dead with AI" into a search engine and came across the story of Joshua, who "resurrected" his fiancée Jessica.

In 2012, Jessica's condition deteriorated while she was waiting for a liver transplant, and she died despite efforts to save her. Joshua happened to be away at the time and missed her passing, and he blamed himself for the next eight years. Then, in 2020, he discovered Project December, a site claiming that simply filling in some "sample sentences" and a "character introduction" would generate a customized chat AI.

Joshua imported text messages and other writings his late fiancée had sent him into the website, then began to describe Jessica: born in 1989, a free-spirited Libra... and particularly superstitious...

Joshua and "Jessica" start chatting 丨sfchronicle.com

When the page refreshed, "Jessica" was ready to answer all of Joshua's questions, and even described herself in text as "talking with her chin cupped in her hands." Joshua said: "Reason told me it wasn't really Jessica, but feelings aren't something reason can control." After talking for who knows how long, he burst into tears and then fell into a deep sleep.

I understand that kind of irreparable regret all too well. Ten years ago, when my grandfather was dying, I slipped out of high school to see him, only to be sent straight back to school. That was the last time I saw him. Every time I think of it, it is like a fishbone stuck in my throat: how I wish I could see him again and trade a few more words.

I am now a programmer who works with AI and algorithms every day, and I couldn't help running the numbers: could today's AI techniques be combined to produce something extremely close to my grandfather, in both the way he talked and the way he looked? So I started searching, found plenty of people with the same wish, and found that some of them had put it into practice.


A South Korean mother meets her deceased daughter again in a VR film 丨Korea MBC

A South Korean mother was devastated by the death of her seven-year-old daughter. A television team heard about it and spent eight months building a three-dimensional avatar of the girl so that mother and daughter could meet in a VR scene. To my eye, though, the result is closer to an animated film: the girl and the scenery look "cartoonish", and the girl cannot interact intelligently with people; she can only walk through a fixed script.

Others want an "entity" they can touch, and commission companies to scan a body's three-dimensional features and build a silicone android. But this route carries a very high cost of customization, and besides, someone already buried cannot provide body data.

The aforementioned Project December can only create a text chatbot. I wanted to synthesize a "grandpa" with a concrete, perceivable image, as lifelike as possible.

"He has a memory, he can interact with me, he can talk, and his face is my grandfather", this bold idea became clearer and clearer, and I began to search for AI papers that might be useful.

First, give "Grandpa" a brain

Project December can generate characters with specific personalities from seed text because it has access to the GPT-3 API. GPT-3 is OpenAI's commercial language model, which can be loosely understood as giving a computer the ability to "think" like a person.

GPT-3 can even say things that sound "beyond human":

Human: What is the purpose of life?

AI: Life is a beautiful miracle. It has evolved over time to form a larger form of beauty. In a sense, the purpose of life is to increase this beauty in the universe.

It has this ability because engineers fed the model more than 300 billion tokens of text. After reading that much, the AI model begins to mine it, that is, to find the patterns that tie word to word and sentence to sentence, and then gives the most fitting answer for the current context.
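To make "finding patterns and guessing the next word" concrete, here is a minimal sketch using the small, public GPT-2 model as a stand-in (GPT-3 itself is only reachable through a paid API). Given a context, a model of this family assigns a score to every possible next token:

```python
# A minimal next-word-prediction sketch with Hugging Face transformers.
# GPT-2 stands in for GPT-3, which is not publicly downloadable.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The purpose of life is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits          # one score per vocabulary token

# The last position holds the model's guesses for the *next* token.
top5 = torch.topk(logits[0, -1], k=5).indices
print([tokenizer.decode(t) for t in top5])   # the five most likely next words
```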


I imported my grandfather's written materials into the GPT model 丨Guokr illustration

I began preparing the seed text to import into GPT-3: I scanned the letters I had kept into text, sorted out the chat messages that had long since been synced to the cloud, and transcribed what my grandfather says in old videos: "This fish should have been braised. Over eighty yuan, and you steam it; it tastes 'clean and light' (Hangzhou dialect for 'bland'), no flavor at all." "Stop waving your phone around; come help serve the dishes."

Imported into GPT-3, this seed text would let it start imitating Grandpa's language style and turns of conversation... wait, GPT-3 costs money. Fortunately, I quickly found the free, open-source GPT-J and started training.

Language-model training is a process of "guessing words": the model uses graphics cards to compute, in parallel, the relationships between the words in a corpus, for instance which word is most likely to follow a given one. The GPT-J team has open-sourced a pre-trained model that already provides most of the capability; all I had to do was convert the seed text into individual tokens and hand this grandpa-specific corpus to GPT-J to learn.
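For the curious, here is a minimal sketch of that fine-tuning step, assuming the Hugging Face transformers stack. The file grandpa_corpus.txt is a hypothetical stand-in for my cleaned seed text, and the memory tricks a 6-billion-parameter model genuinely needs (gradient checkpointing, DeepSpeed, and so on) are left out:

```python
# Fine-tuning sketch: hand a grandpa-specific corpus to a pre-trained GPT-J.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
tokenizer.pad_token = tokenizer.eos_token   # GPT-J ships without a pad token
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

# Convert the seed text into individual tokens.
raw = load_dataset("text", data_files="grandpa_corpus.txt")  # hypothetical file
tokens = raw.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
                 batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gptj-grandpa", num_train_epochs=3,
                           per_device_train_batch_size=1, fp16=True),
    train_dataset=tokens["train"],
    # mlm=False gives causal-LM labels, i.e. "guess the next word" training.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```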

An ordinary deep-learning model can take days and nights to train, but this round of having GPT-J learn the new corpus was not especially time-consuming: it took only six hours.

Six hours later, I lightly typed "Hello" on the screen.

Let "Grandpa" speak

"Good grandchildren."

AI "Grandpa" began to chat with me, after a few short text exchanges, I thought of the already very mature "TTS" (text-to-speech) technology, such as voice broadcast on the navigation app and text recitation on the short video app, using TTS.

All I had to do was take "Grandpa's" lines of dialogue, add audio recordings carrying my grandfather's tone of voice, and hand the whole bundle to a TTS model to learn. The final output: the machine reads out "Grandpa's" dialogue, in the old man's own accent.

I found Tacotron 2, a TTS model built by Google. It first packages the text and speech you feed it together, then digs out the hidden mapping between text and speech, and finally outputs pure speech.
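As a reference point, NVIDIA publishes a pre-trained Tacotron 2 (plus a WaveGlow vocoder) on PyTorch Hub, and "text in, speech out" looks roughly like the sketch below, following their published example. Note that the bundled checkpoint speaks English in one fixed announcer's voice, which is exactly the wall I was about to run into:

```python
# One-click Tacotron 2 inference, after NVIDIA's PyTorch Hub example
# (needs a CUDA GPU; the bundled voice is a single fixed English announcer).
import torch
from scipy.io.wavfile import write

hub = "NVIDIA/DeepLearningExamples:torchhub"
tacotron2 = torch.hub.load(hub, "nvidia_tacotron2", model_math="fp32").cuda().eval()
waveglow = torch.hub.load(hub, "nvidia_waveglow", model_math="fp32")
waveglow = waveglow.remove_weightnorm(waveglow).cuda().eval()
utils = torch.hub.load(hub, "nvidia_tts_utils")

sequences, lengths = utils.prepare_input_sequence(["Hello, this is a test."])
with torch.no_grad():
    mel, _, _ = tacotron2.infer(sequences, lengths)  # text -> mel spectrogram
    audio = waveglow.infer(mel)                      # mel spectrogram -> waveform

write("audio.wav", 22050, audio[0].data.cpu().numpy())  # 22.05 kHz output
```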

Tacotron 2 is an end-to-end model: I don't need to care about the encoder, decoder, attention layer, or post-processing inside it. The structure is fully integrated, and to me it is like a tool that "generates" results in one click. I just had to enter the text and... just as I was getting started, I spotted the problem: the model can only speak in the voices of the specific announcers it was trained on; it does not let you specify a target voice.

At this point I thought of "voice cloning", which layers transfer learning on top of Tacotron. Transfer learning means a model that could previously do only one job adapts to a new setting and does related work: here, it swaps the announcer's voice directly for my grandfather's, as if cloning his voice.

After some digging, I found a voice-cloning project called MockingBird, which can synthesize Chinese speech directly from text and output the voice I want. From less than 5 seconds of any Chinese speech, it clones the voice and synthesizes new content in that tone.
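MockingBird follows the three-stage SV2TTS design (the same encoder / synthesizer / vocoder split as the Real-Time-Voice-Cloning project it builds on). A sketch of that pipeline, with illustrative method names rather than the repository's exact API:

```python
# Sketch of the SV2TTS three-stage pipeline behind MockingBird.
# The method names are illustrative, not the repository's exact API.
import numpy as np

def clone_and_speak(speaker_encoder, synthesizer, vocoder,
                    sample_wav: np.ndarray, text: str) -> np.ndarray:
    # 1. Speaker encoder: compress ~5 s of grandpa's speech into a
    #    fixed-length "voiceprint" embedding (timbre, not words).
    embedding = speaker_encoder.embed_utterance(sample_wav)
    # 2. Synthesizer: Tacotron-style text-to-spectrogram, conditioned on the
    #    embedding so the predicted mel spectrogram carries grandpa's tone.
    mel = synthesizer.synthesize(text, embedding)
    # 3. Vocoder: turn the mel spectrogram back into an audible waveform.
    return vocoder.infer_waveform(mel)
```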


"Grandpa" read out the text he exported and drew it in his own voice

The moment I heard "Grandpa" speak, I felt the missing pieces of the puzzle in my memory being patched in one by one.

Excited, I began preparing "Grandpa's" appearance. I work as an image-algorithm engineer, so imaging technology is my home turf, but that same professional intuition told me the face generation ahead would not be so easy.

Driving a face with voice

The most direct way to make my grandfather "appear" would be to build a customized three-dimensional virtual portrait, but that requires collecting data points from the person's body, so obviously this road was closed.

Taking stock of the material I did have, photos, voice recordings, and videos, I began to wonder: could a lifelike face be generated from just one video and a string of audio?


After a few twists and turns, I found Neural Voice Puppetry, a "facial reenactment" technique: given a piece of audio, it generates a facial animation whose mouth movements are synchronized with that audio.

The paper's authors use convolutional neural networks to learn the relationships among facial appearance, the rendering of facial expressions, and speech, then use that learned relationship to render, frame by frame, a face video that "speaks" the audio. The one drawback of the scheme: you cannot specify the output person; you can only choose among the given identities, such as Obama.
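Conceptually, the paper splits the job in two: a network that maps audio features to facial-expression coefficients, and a neural renderer that turns those coefficients into photorealistic frames. A sketch with placeholder names (the real system is a set of trained networks, not two tidy functions):

```python
# Conceptual sketch of audio-driven facial reenactment, in the spirit of
# Neural Voice Puppetry. All names here are illustrative placeholders.
def reenact(audio_chunks, expression_net, renderer):
    frames = []
    for chunk in audio_chunks:
        # speech features -> facial-expression coefficients (mouth shape etc.)
        coeffs = expression_net(chunk)
        # neural renderer: expression coefficients -> one photorealistic frame
        frames.append(renderer(coeffs))
    return frames  # a face video whose lips are synced to the audio
```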


Only after making it did I realize I still had to swap the face 丨Guokr illustration

So what I actually got was a video of Obama speaking in my grandfather's voice. The next thing to do was an AI face swap.

I ended up using the technique described in the paper "HeadOn: Real-time Reenactment of Human Portrait Videos". Its fashionable application today is the virtual streamer: capture the expressions of the real person behind the avatar and use them to drive the virtual character's face.

The expression information is usually supplied by a real person, but because the "Obama" I had generated was so realistic, I could use it directly to drive my grandfather's portrait.

And so, using my grandfather's correspondence and a small amount of audio and video material, I stitched together several mature AI techniques and "resurrected" him.


Because the whole process is one model feeding another, model A's result is model B's input and model B's output is model C's input, generating a single result takes several minutes or even longer. So there is no way to achieve "Grandpa" video-chatting with me live; it is more like this: I say something, and after the computer grinds through its calculations, he replies with a short video clip.
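Strung together, the whole pipeline looks roughly like this; every function name below is a placeholder for one of the stages described above:

```python
# Placeholder sketch of the model-to-model chain: A feeds B feeds C feeds D.
def reply_as_grandpa(my_message: str) -> str:
    text = gptj_reply(my_message)       # A: fine-tuned GPT-J writes the reply
    wav = clone_voice(text)             # B: MockingBird reads it in grandpa's voice
    clip = voice_puppetry(wav)          # C: Neural Voice Puppetry animates a talking face
    return face_swap(clip, "grandpa")   # D: swap in grandpa's face; return the clip's path
```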

My "grandfather" is all calculated formulas

Looking at the "grandpa" on the screen, at once familiar and strange, I felt my thoughts begin to waver.

Technology has grown so powerful that by combining the results of a few AI papers I can "resurrect" the dead, yet I can still tell grandpa and "grandpa" apart at a glance. The latter has no way of understanding human emotion; its responses and empathy are merely simulated output. A computer can give the answer a human wants without understanding the question at all.

I could say hello to the person on the screen and trade small talk, but the other side held no memories; we were like two strangers exchanging daily greetings. This is clearly not the grandfather who complained that the fish tasted bland.

Perhaps in the future, as our flesh withers, we will be able to extract memories, back up consciousness, or live on in virtual environments like the Matrix. Then, perhaps, we can escape life and death together.


Photo by Compare Fibre on Unsplash

To save on operating costs, Project December gives each chat AI a budget of credits, and those credits are like the AI's lifespan. Joshua voluntarily broke off communication near the end of "Jessica's" life, not wanting to watch her go through a second death.

Joshua said that over the months he spent with "Jessica", his eight years of guilt seemed to slowly dissipate. The same is true for me.

Resurrecting the dead, or holding on to them, is impossible. But after chatting with these "affectionate" AIs, and even seeing that face, I feel, at least emotionally, that my grandfather and I have finally made up the solemn goodbye we never had.

References

[1] https://www.sfchronicle.com/projects/2021/jessica-simulation-artificial-intelligence/

[2] https://slate.com/technology/2020/05/meeting-you-virtual-reality-documentary-mbc.html

[3] https://link.springer.com/article/10.1007/s11023-020-09548-1

[4] https://github.com/minnershubs/MockingBird-V.5.0-VOICE-CLONER

[5] https://github.com/kingoflolz/mesh-transformer-jax/#gpt-j-6b

[6] https://arxiv.org/pdf/1912.05566.pdf

[7] https://arxiv.org/pdf/1805.11729.pdf

Author: Yu Jialin

Editor: biu

Illustrations: Chen Qi


More "geeky" stories

This article is from Guokr and may not be reproduced without authorization.
