
Ten years after my grandfather's death, I "resurrected" him with AI


Using the written records and audio-visual materials my grandfather left behind, I stitched together several mature AI techniques to "resurrect" him.

That day, on a whim, I typed "resurrecting the dead with AI" into a search engine and came across the story of Joshua, who "resurrected" his fiancée Jessica.

In 2012, Jessica's condition deteriorated while she was waiting for a liver transplant, and she died despite efforts to save her. Joshua happened to be away at the time and missed her passing, and he blamed himself for the next eight years. Then, in 2020, he discovered Project December, a site claiming that simply filling in some "sample sentences" and a "character introduction" would generate a customized chat AI.

Joshua imported text messages and other writings his late fiancée had sent him into the website, then began to describe Jessica: born in 1989, a free-spirited Libra... and particularly superstitious...

Joshua and "Jessica" start chatting 丨sfchronicle.com

When the page refreshed, "Jessica" was ready to answer all of Joshua's questions, and even described herself in text as "talking with her chin cupped in her hands." Joshua said: "Reason told me it wasn't really Jessica, but feelings aren't something reason can control." After talking for who knows how long, he burst into tears and then fell into a deep sleep.

I understand that kind of irreparable regret all too well. Ten years ago, when my grandfather was dying, I slipped out of high school to see him, only to be sent straight back to school. That was the last time I saw him. Every time I think of it, it is like a fishbone stuck in my throat: how I wish I could see him again and trade a few more words.

I am now a programmer who works with AI and algorithms every day, and I couldn't help running the numbers: could today's AI techniques be combined to produce something extremely close to my grandfather, in both the way he talked and the way he looked? So I started searching, found plenty of people with the same wish, and found that some of them had put it into practice.


A South Korean mother meets her deceased daughter again in a VR film 丨Korea MBC

A South Korean mother was devastated by the death of her seven-year-old daughter. A television team heard about it and spent eight months building a three-dimensional avatar of the girl so that mother and daughter could meet in a VR scene. To my eye, though, the result is closer to an animated film: the girl and the scenery look "cartoonish", and the girl cannot interact intelligently with people; she can only walk through a fixed script.

Others want an "entity" they can touch, and commission companies to scan a body's three-dimensional features and build a silicone android. But this route carries a very high cost of customization, and besides, someone already buried cannot provide body data.

The aforementioned Project December can only create a text chatbot. I wanted to synthesize a "grandpa" with a concrete, perceivable image, as lifelike as possible.

"He has a memory, he can interact with me, he can talk, and his face is my grandfather", this bold idea became clearer and clearer, and I began to search for AI papers that might be useful.

First, give "Grandpa" a brain

Project December can generate characters with specific personalities from seed text because it has access to the GPT-3 API. GPT-3 is OpenAI's commercial language model, which can be loosely understood as giving a computer the ability to "think" like a person.

GPT-3 can even say things that sound "beyond human":

Human: What is the purpose of life?

AI: Life is a beautiful miracle. It has evolved over time to form a larger form of beauty. In a sense, the purpose of life is to increase this beauty in the universe.

It has this ability because engineers fed the model more than 300 billion tokens of text. After reading that much, the AI model begins to mine it, that is, to find the patterns that tie word to word and sentence to sentence, and then gives the most fitting answer for the current context.
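To make "finding patterns and guessing the next word" concrete, here is a minimal sketch using the small, public GPT-2 model as a stand-in (GPT-3 itself is only reachable through a paid API). Given a context, a model of this family assigns a score to every possible next token:

```python
# A minimal next-word-prediction sketch with Hugging Face transformers.
# GPT-2 stands in for GPT-3, which is not publicly downloadable.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The purpose of life is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits          # one score per vocabulary token

# The last position holds the model's guesses for the *next* token.
top5 = torch.topk(logits[0, -1], k=5).indices
print([tokenizer.decode(t) for t in top5])   # the five most likely next words
```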


I imported my grandfather's written materials into the GPT model 丨Guokr illustration

I began preparing the seed text to import into GPT-3: I scanned the letters I had kept into text, sorted out the chat messages that had long since been synced to the cloud, and transcribed what my grandfather says in old videos: "This fish should have been braised. Over eighty yuan, and you steam it; it tastes 'clean and light' (Hangzhou dialect for 'bland'), no flavor at all." "Stop waving your phone around; come help serve the dishes."

Imported into GPT-3, this seed text would let it start imitating Grandpa's language style and turns of conversation... wait, GPT-3 costs money. Fortunately, I quickly found the free, open-source GPT-J and started training.

Language-model training is a process of "guessing words": the model uses graphics cards to compute, in parallel, the relationships between the words in a corpus, for instance which word is most likely to follow a given one. The GPT-J team has open-sourced a pre-trained model that already provides most of the capability; all I had to do was convert the seed text into individual tokens and hand this grandpa-specific corpus to GPT-J to learn.
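For the curious, here is a minimal sketch of that fine-tuning step, assuming the Hugging Face transformers stack. The file grandpa_corpus.txt is a hypothetical stand-in for my cleaned seed text, and the memory tricks a 6-billion-parameter model genuinely needs (gradient checkpointing, DeepSpeed, and so on) are left out:

```python
# Fine-tuning sketch: hand a grandpa-specific corpus to a pre-trained GPT-J.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
tokenizer.pad_token = tokenizer.eos_token   # GPT-J ships without a pad token
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

# Convert the seed text into individual tokens.
raw = load_dataset("text", data_files="grandpa_corpus.txt")  # hypothetical file
tokens = raw.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
                 batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gptj-grandpa", num_train_epochs=3,
                           per_device_train_batch_size=1, fp16=True),
    train_dataset=tokens["train"],
    # mlm=False gives causal-LM labels, i.e. "guess the next word" training.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```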

An ordinary deep-learning model can take days and nights to train, but this round of having GPT-J learn the new corpus was not especially time-consuming: it took only six hours.

Six hours later, I lightly typed "Hello" on the screen.

Let "Grandpa" speak

"Good grandchildren."

AI "Grandpa" began to chat with me, after a few short text exchanges, I thought of the already very mature "TTS" (text-to-speech) technology, such as voice broadcast on the navigation app and text recitation on the short video app, using TTS.

All I had to do was take "Grandpa's" lines of dialogue, add audio recordings carrying my grandfather's tone of voice, and hand the whole bundle to a TTS model to learn. The final output: the machine reads out "Grandpa's" dialogue, in the old man's own accent.

I found Tacotron 2, a TTS model built by Google. It first packages the text and speech you feed it together, then digs out the hidden mapping between text and speech, and finally outputs pure speech.
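As a reference point, NVIDIA publishes a pre-trained Tacotron 2 (plus a WaveGlow vocoder) on PyTorch Hub, and "text in, speech out" looks roughly like the sketch below, following their published example. Note that the bundled checkpoint speaks English in one fixed announcer's voice, which is exactly the wall I was about to run into:

```python
# One-click Tacotron 2 inference, after NVIDIA's PyTorch Hub example
# (needs a CUDA GPU; the bundled voice is a single fixed English announcer).
import torch
from scipy.io.wavfile import write

hub = "NVIDIA/DeepLearningExamples:torchhub"
tacotron2 = torch.hub.load(hub, "nvidia_tacotron2", model_math="fp32").cuda().eval()
waveglow = torch.hub.load(hub, "nvidia_waveglow", model_math="fp32")
waveglow = waveglow.remove_weightnorm(waveglow).cuda().eval()
utils = torch.hub.load(hub, "nvidia_tts_utils")

sequences, lengths = utils.prepare_input_sequence(["Hello, this is a test."])
with torch.no_grad():
    mel, _, _ = tacotron2.infer(sequences, lengths)  # text -> mel spectrogram
    audio = waveglow.infer(mel)                      # mel spectrogram -> waveform

write("audio.wav", 22050, audio[0].data.cpu().numpy())  # 22.05 kHz output
```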

Tacotron 2 is an end-to-end model: I don't need to care about the encoder, decoder, attention layer, or post-processing inside it. The structure is fully integrated, and to me it is like a tool that "generates" results in one click. I just had to enter the text and... just as I was getting started, I spotted the problem: the model can only speak in the voices of the specific announcers it was trained on; it does not let you specify a target voice.

At this point I thought of "voice cloning", which layers transfer learning on top of Tacotron. Transfer learning means a model that could previously do only one job adapts to a new setting and does related work: here, it swaps the announcer's voice directly for my grandfather's, as if cloning his voice.

After some digging, I found a voice-cloning project called MockingBird, which can synthesize Chinese speech directly from text and output the voice I want. From less than 5 seconds of any Chinese speech, it clones the voice and synthesizes new content in that tone.
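MockingBird follows the three-stage SV2TTS design (the same encoder / synthesizer / vocoder split as the Real-Time-Voice-Cloning project it builds on). A sketch of that pipeline, with illustrative method names rather than the repository's exact API:

```python
# Sketch of the SV2TTS three-stage pipeline behind MockingBird.
# The method names are illustrative, not the repository's exact API.
import numpy as np

def clone_and_speak(speaker_encoder, synthesizer, vocoder,
                    sample_wav: np.ndarray, text: str) -> np.ndarray:
    # 1. Speaker encoder: compress ~5 s of grandpa's speech into a
    #    fixed-length "voiceprint" embedding (timbre, not words).
    embedding = speaker_encoder.embed_utterance(sample_wav)
    # 2. Synthesizer: Tacotron-style text-to-spectrogram, conditioned on the
    #    embedding so the predicted mel spectrogram carries grandpa's tone.
    mel = synthesizer.synthesize(text, embedding)
    # 3. Vocoder: turn the mel spectrogram back into an audible waveform.
    return vocoder.infer_waveform(mel)
```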


"Grandpa" read out the text he exported and drew it in his own voice

The moment I heard "Grandpa" speak, I felt the missing pieces of the puzzle in my memory being patched in one by one.

Excited, I began preparing "Grandpa's" appearance. I work as an image-algorithm engineer, so imaging technology is my home turf, but that same professional intuition told me the face generation ahead would not be so easy.

Driving a face with voice

The most direct way to make my grandfather "appear" would be to build a customized three-dimensional virtual portrait, but that requires collecting data points from the person's body, so obviously this road was closed.

Taking stock of the material I did have, photos, voice recordings, and videos, I began to wonder: could a lifelike face be generated from just one video and a string of audio?


After a few twists and turns, I found Neural Voice Puppetry, a "facial reenactment" technique: given a piece of audio, it generates a facial animation whose mouth movements are synchronized with that audio.

The paper's authors use convolutional neural networks to learn the relationships among facial appearance, the rendering of facial expressions, and speech, then use that learned relationship to render, frame by frame, a face video that "speaks" the audio. The one drawback of the scheme: you cannot specify the output person; you can only choose among the given identities, such as Obama.
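Conceptually, the paper splits the job in two: a network that maps audio features to facial-expression coefficients, and a neural renderer that turns those coefficients into photorealistic frames. A sketch with placeholder names (the real system is a set of trained networks, not two tidy functions):

```python
# Conceptual sketch of audio-driven facial reenactment, in the spirit of
# Neural Voice Puppetry. All names here are illustrative placeholders.
def reenact(audio_chunks, expression_net, renderer):
    frames = []
    for chunk in audio_chunks:
        # speech features -> facial-expression coefficients (mouth shape etc.)
        coeffs = expression_net(chunk)
        # neural renderer: expression coefficients -> one photorealistic frame
        frames.append(renderer(coeffs))
    return frames  # a face video whose lips are synced to the audio
```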


Only after making it did I realize I still had to swap the face 丨Guokr illustration

So what I actually got was a video of Obama speaking in my grandfather's voice. The next thing to do was an AI face swap.

I ended up using the technique described in the paper "HeadOn: Real-time Reenactment of Human Portrait Videos". Its fashionable application today is the virtual streamer: capture the expressions of the real person behind the avatar and use them to drive the virtual character's face.

The expression information is usually supplied by a real person, but because the "Obama" I had generated was so realistic, I could use it directly to drive my grandfather's portrait.

And so, using my grandfather's correspondence and a small amount of audio and video material, I stitched together several mature AI techniques and "resurrected" him.


Because the whole process is one model feeding another, model A's result is model B's input and model B's output is model C's input, generating a single result takes several minutes or even longer. So there is no way to achieve "Grandpa" video-chatting with me live; it is more like this: I say something, and after the computer grinds through its calculations, he replies with a short video clip.
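Strung together, the whole pipeline looks roughly like this; every function name below is a placeholder for one of the stages described above:

```python
# Placeholder sketch of the model-to-model chain: A feeds B feeds C feeds D.
def reply_as_grandpa(my_message: str) -> str:
    text = gptj_reply(my_message)       # A: fine-tuned GPT-J writes the reply
    wav = clone_voice(text)             # B: MockingBird reads it in grandpa's voice
    clip = voice_puppetry(wav)          # C: Neural Voice Puppetry animates a talking face
    return face_swap(clip, "grandpa")   # D: swap in grandpa's face; return the clip's path
```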

My "grandfather" is all calculated formulas

Looking at the "grandpa" on the screen, at once familiar and strange, I felt my thoughts begin to waver.

Technology has grown so powerful that by combining the results of a few AI papers I can "resurrect" the dead, yet I can still tell grandpa and "grandpa" apart at a glance. The latter has no way of understanding human emotion; its responses and empathy are merely simulated output. A computer can give the answer a human wants without understanding the question at all.

I could say hello to the person on the screen and trade small talk, but the other side held no memories; we were like two strangers exchanging daily greetings. This is clearly not the grandfather who complained that the fish tasted bland.

Perhaps in the future, as our flesh withers, we will be able to extract memories, back up consciousness, or live on in virtual environments like the Matrix. Then, perhaps, we can escape life and death together.


Photo by Compare Fibre on Unsplash

To save on operating costs, Project December gives each chat AI a budget of credits, and those credits are like the AI's lifespan. Joshua voluntarily broke off communication near the end of "Jessica's" life, not wanting to watch her go through a second death.

Joshua said that over the months he spent with "Jessica", his eight years of guilt seemed to slowly dissipate. The same is true for me.

Resurrecting the dead, or holding on to them, is impossible. But after chatting with these "affectionate" AIs, and even seeing that face, I feel, at least emotionally, that my grandfather and I have finally made up the solemn goodbye we never had.

References

[1] https://www.sfchronicle.com/projects/2021/jessica-simulation-artificial-intelligence/

[2] https://slate.com/technology/2020/05/meeting-you-virtual-reality-documentary-mbc.html

[3] https://link.springer.com/article/10.1007/s11023-020-09548-1

[4] https://github.com/minnershubs/MockingBird-V.5.0-VOICE-CLONER

[5] https://github.com/kingoflolz/mesh-transformer-jax/#gpt-j-6b

[6] https://arxiv.org/pdf/1912.05566.pdf

[7] https://arxiv.org/pdf/1805.11729.pdf

Author: Yu Jialin

Editor: biu

Illustrations: Chen Qi


More "geeky" stories

This article is from Guokr and may not be reproduced without authorization.
