[Image Generation AI] Innovative! Realistic Face Swapping Experience with CG Videos

December 8, 2023

Hello, I'm Ushiyama from the CG team.

In this blog, we have been continuously exploring how to make CG characters more realistic using image-generating AI.
Let's first recap what we've covered so far.

[Image Generation AI] CG Avatar Replacement using Stable Diffusion

Here, we explained how to make only the facial part more realistic by using masks and text prompts.

[Image Generation AI] Ensuring Avatar Consistency from Various Angles!

We aimed for consistency in portraying the same person from various angles, not just relying on the Seed value but also using ControlNet.

[Image Generation AI] Utilizing Deepfake to Create the Same Individual!

We utilized an extension called Roop for face swapping, aiming for consistency in the person’s appearance. The strength was being able to swap faces consistently at both close and long distances.

Our verification results so far suggest that swapping faces offers higher versatility compared to making them more realistic through text or contour detection.

Today's Task

We will continue using deepfake, this time with CG videos for verification. Looking at various cases online, face swapping in live-action footage seems to have reached a high level.
We will focus on swapping faces in CG videos for apparel, with one constraint: the clothing itself must stay CG.
If AI altered the clothing, it might deviate from the actual product.

Today's Goals

Understand the entire process and workload for creating a video
Verify if there are any random elements or flickering between frames
Identify the parts where AI struggles and devise countermeasures

Preparation for Verification

Deepfake Mechanism

Let's first briefly look at the mechanism of deepfake in image-generating AI, as explained by ChatGPT.

Detecting Faces in Images:
- The AI analyzes the image to find human faces.
- It detects facial features (eyes, nose, mouth, etc.), the direction of the face, and expressions, gathering these data.
Replacing Faces in Specified Images:
- At this stage, the face in another image replaces the original face.
- The AI adjusts the size, shadows, and skin tones of the face to ensure the replaced face appears natural.
- The replacement is finely tuned so that eye and mouth movements sync with the original image.

Thus, if a "face can't be detected," the face swapping fails.
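The two stages above can be sketched as a small control-flow example. This is only an illustration of "detect first, swap only if a face was found": `detect_face` and `swap_face` are hypothetical stand-ins supplied by the caller, not functions from any real face-swapping library.

```python
from typing import Callable, Optional, Tuple

# A face box as (left, top, right, bottom) in pixels; None means "no face found".
Box = Tuple[int, int, int, int]


def process_frame(
    frame: dict,
    detect_face: Callable[[dict], Optional[Box]],
    swap_face: Callable[[dict, Box], dict],
) -> Tuple[dict, bool]:
    """Run detection first and swap only when a face was found.

    Returns the (possibly unchanged) frame and a flag telling whether a swap happened.
    """
    box = detect_face(frame)
    if box is None:
        # Stage 1 failed, so stage 2 never runs and the CG face stays as rendered.
        return frame, False
    return swap_face(frame, box), True


# Toy stand-ins so the sketch runs without any model (hypothetical).
def toy_detect(frame: dict) -> Optional[Box]:
    return frame.get("face_box")


def toy_swap(frame: dict, box: Box) -> dict:
    return {**frame, "face": "swapped"}


frames = [
    {"id": 0, "face": "cg", "face_box": (400, 100, 600, 300)},
    {"id": 1, "face": "cg", "face_box": None},  # e.g. the face is out of frame
]
results = [process_frame(f, toy_detect, toy_swap) for f in frames]
```

Keeping the "was it swapped?" flag per frame makes it easy to list afterwards which frames need attention.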
To test this, we will create videos with various camera works.

Preparing Materials

Let's prepare materials that can be used in our upcoming accelerando.AI project.

First, we prepare images for face swapping.
We'll take a Zepeto avatar that acts as an influencer for accelerando.AI and use AI to give it a real-life appearance.

Then, we prepare CG materials.

For this verification, we'll use AI to change only the face of a CG-created character. We won't attempt more complex changes, such as making the hair or clothing realistic with AI; everything except the face stays as rendered CG. This keeps the verification simple.

We use the silver outfit from the first launch of accelerando.AI. ("People and AI Collaboration" brought to life – the first item from the future fashion brand "accelerando.Ai" goes on sale)
We didn't have time to create detailed hair, so we’ll use something similar from a preset.

Procedure and Settings

We'll proceed as follows:

Use a walking cycle motion for CG
Set the camera for each hypothesis item and render CG videos
Replace faces in each rendered video using Stable Diffusion
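Since image generation works on still images, a rendered video has to be split into frames and reassembled afterwards. The post doesn't say which tool was used for this, so the following is a sketch assuming ffmpeg; the file names and the 30 fps value are placeholders. (Rendering directly to an image sequence from Blender would make the first step unnecessary.)

```python
from typing import List


def extract_frames_cmd(video: str, frame_pattern: str) -> List[str]:
    """Build an ffmpeg command that writes each frame of `video` as a numbered image."""
    return ["ffmpeg", "-i", video, frame_pattern]


def assemble_video_cmd(frame_pattern: str, fps: int, output: str) -> List[str]:
    """Build an ffmpeg command that encodes numbered images back into an H.264 video."""
    return [
        "ffmpeg",
        "-framerate", str(fps),   # input frame rate of the image sequence
        "-i", frame_pattern,
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",    # widely compatible pixel format
        output,
    ]


# Placeholder paths; adjust to your own project layout.
split_cmd = extract_frames_cmd("walk_fixed_camera.mp4", "frames/%04d.png")
join_cmd = assemble_video_cmd("swapped/%04d.png", 30, "walk_fixed_camera_swapped.mp4")
```

Each command list can be passed to `subprocess.run(cmd, check=True)` once ffmpeg is installed and the directories exist.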

For AI generation, the denoising strength is set to 0.1, so that practically nothing other than the face is changed.
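For readers who want to script this step, here is a minimal sketch of a per-frame request body. It assumes the AUTOMATIC1111 Stable Diffusion web UI started with the `--api` option and its `/sdapi/v1/img2img` endpoint, which the post does not state explicitly. The face-swap extension's own arguments are deliberately left out because their layout depends on the extension version. The seed, prompt, and square 1024 px size are placeholders.

```python
import base64
import json


def build_img2img_payload(frame_png: bytes, seed: int = 1234, size: int = 1024) -> dict:
    """Build an img2img request body that keeps the rendered frame almost untouched."""
    return {
        # The API expects input images as base64-encoded strings.
        "init_images": [base64.b64encode(frame_png).decode("ascii")],
        "prompt": "",                # placeholder; the face comes from the swap step
        "denoising_strength": 0.1,   # low value: stay close to the CG render
        "seed": seed,                # fixed seed to reduce frame-to-frame randomness
        "width": size,
        "height": size,
    }


# Placeholder bytes stand in for a real PNG file read from disk.
payload = build_img2img_payload(b"placeholder bytes, not a real PNG")
body = json.dumps(payload)
```

The resulting `body` would be sent as a POST request to the web UI, once per frame.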

Face swapping is done using an extension called ReActor.
Previously, we used Roop, but ReActor now seems to be the mainstream choice thanks to its higher performance.

For CG rendering, we prioritize speed and use Blender's Eevee. Towards the end, we also compare against the higher-quality Cycles renderer.

Verification Results

Fixed Camera

Here, we verify if the same person appears in each frame.
Let's first look at the face swapping results.

The swapping worked well! We intended to swap "only the face," but the surrounding hair also became slightly more realistic in the blending process.
Using higher-quality CG source footage, such as Cycles renders, should yield even more realistic results.

Now, let's compare the videos.

They look good!
We were concerned about changes in shadows, brightness, or sudden switches to different faces, but everything appears natural.

However, some points of concern include:

Unstable Eye Contact

If you look closely, the gaze drifts left and right. It's not too noticeable here, but shots where the eyes move slowly or with clear intent might need extra care.

Noise at the Hair Boundary

It might not be evident in the embedded video, but noise appears at the hair boundary whenever the hair moves.
As mentioned earlier, this seems to be a side effect of the blending step that follows the face swap.
It's not a major concern, but there's currently no workaround.
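We judged flicker by eye, but frame-to-frame changes such as brightness jumps or boundary noise could also be measured roughly. The following is a minimal sketch, not what we actually ran: it treats each frame as a grid of grayscale values, scores each transition by mean absolute difference, and flags transitions that are far above the median. The factor of 3 is an arbitrary assumption.

```python
import statistics
from typing import List, Sequence

Frame = Sequence[Sequence[float]]  # rows of grayscale pixel values


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average absolute per-pixel difference between two same-sized frames."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count


def flicker_scores(frames: Sequence[Frame]) -> List[float]:
    """One score per transition between consecutive frames."""
    return [mean_abs_diff(frames[i], frames[i + 1]) for i in range(len(frames) - 1)]


def flag_spikes(scores: Sequence[float], factor: float = 3.0) -> List[int]:
    """Indices of transitions whose score exceeds `factor` times the median score."""
    median = statistics.median(scores)
    return [i for i, s in enumerate(scores) if s > factor * median]


def flat(value: float) -> Frame:
    """Tiny 2x2 frame filled with one brightness value (toy data)."""
    return [[value, value], [value, value]]


frames = [flat(10), flat(11), flat(12), flat(40), flat(41)]
scores = flicker_scores(frames)
spikes = flag_spikes(scores)
```

In practice it would make sense to run this on a crop around the face and hair rather than the whole frame, since the walking motion itself also changes the pixels.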

Camera Movement

What happens when a character appears from the edge of the screen, and the face is partially undetected?

It's too fast to see clearly, so let's look at it frame by frame.

As expected, when less than half of the face is visible, it can't be detected.
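Because this is CG, the position of the face in each frame can in principle be computed from the scene, which makes it possible to predict risky frames before generation. The sketch below assumes a face bounding box in pixel coordinates is available (how it is obtained is not covered here) and uses the "about half visible" observation above as a rough threshold, not a measured limit.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom) in pixels


def visible_fraction(face_box: Box, frame_w: int, frame_h: int) -> float:
    """Fraction of the face box area that lies inside the frame."""
    left, top, right, bottom = face_box
    full_area = (right - left) * (bottom - top)
    # Clip the box to the frame; negative sizes mean no overlap at all.
    vis_w = max(0.0, min(right, frame_w) - max(left, 0.0))
    vis_h = max(0.0, min(bottom, frame_h) - max(top, 0.0))
    return (vis_w * vis_h) / full_area


def likely_undetected(face_box: Box, frame_w: int, frame_h: int,
                      threshold: float = 0.5) -> bool:
    """Rough guess based on our observation; the threshold is an assumption."""
    return visible_fraction(face_box, frame_w, frame_h) < threshold


# A 200 px wide face entering from the left edge of a 1024 px frame.
entering = (-120.0, 300.0, 80.0, 500.0)
fraction = visible_fraction(entering, 1024, 1024)
risky = likely_undetected(entering, 1024, 1024)
```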

Thinking of Countermeasures

For cut-ins, it seems wise to leave some margin for face detection, so rendering a wider frame (more horizontal resolution) might be a good idea.
The downsides are longer rendering times and the hassle of adjusting the aspect ratio for each cut...
Still, it seems like the only option for now.
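The numbers involved in this countermeasure can be sketched as follows. This assumes the wider render is cropped back to the intended frame after face swapping, and that the camera is a standard perspective camera whose horizontal field of view is widened so the central framing stays the same. The 256 px margin and 40 degree field of view are made-up example values.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class PaddedRender:
    render_width: int
    crop_box: Tuple[int, int, int, int]  # (left, top, right, bottom) of the final frame
    horizontal_fov_deg: float            # field of view needed for the wider render
    pixel_cost_ratio: float              # pixel count relative to the unpadded render


def plan_padded_render(width: int, height: int, margin_px: int,
                       horizontal_fov_deg: float) -> PaddedRender:
    """Plan a render with extra horizontal margin on both sides."""
    render_width = width + 2 * margin_px
    scale = render_width / width
    # For a perspective camera, image width is proportional to tan(fov / 2).
    half_angle = math.radians(horizontal_fov_deg) / 2
    new_fov = math.degrees(2 * math.atan(math.tan(half_angle) * scale))
    crop_box = (margin_px, 0, margin_px + width, height)
    return PaddedRender(render_width, crop_box, new_fov, scale)


plan = plan_padded_render(width=1024, height=1024, margin_px=256,
                          horizontal_fov_deg=40.0)
```

With these example values the render grows from 1024 px to 1536 px wide, that is 1.5 times the pixels, which gives a feel for the extra rendering and generation time.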

Camera Panning

Here, we'll see up to what angle the face can be detected.

Let's also examine this closely, frame by frame.

Unlike with camera movement, the face is detected here.
However, there's an error-like behavior where the AI tries to apply a frontal face to the back of the head.

Thinking of Countermeasures

This is very difficult to avoid... A simple solution might be to use the original CG frame where the error occurs.
If the motion is fast, it might go unnoticed, but slow turns might reveal the discrepancy.
It might be best to avoid such shots in AI-based replacements.
Let's hope for future improvements in accuracy.
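The simple workaround of falling back to the original CG frame can be sketched like this. It assumes the problem frames were found by checking the result frame by frame, as we did above; the frame numbers and file names are made up for illustration.

```python
from typing import Iterable, List


def choose_frames(ai_frames: List[str], cg_frames: List[str],
                  bad_indices: Iterable[int]) -> List[str]:
    """Use the AI-swapped frame by default and the CG original for flagged frames."""
    if len(ai_frames) != len(cg_frames):
        raise ValueError("both sequences must have the same number of frames")
    bad = set(bad_indices)
    return [cg if i in bad else ai
            for i, (ai, cg) in enumerate(zip(ai_frames, cg_frames))]


# Hypothetical 6-frame turn where frames 3 and 4 show the back of the head.
ai_frames = [f"swapped/{i:04d}.png" for i in range(6)]
cg_frames = [f"frames/{i:04d}.png" for i in range(6)]
final_frames = choose_frames(ai_frames, cg_frames, bad_indices=[3, 4])
```

Note that this only hides the error; the jump between the realistic face and the CG face remains, which is why slow turns are still risky.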

Camera Zoom

Actually, this was the most unpredictable test.
In prompt-based image generation, it is hard to keep a subject consistent when its size in the frame changes, even with a fixed seed value or ControlNet.

In the previous blog post, we successfully fixed faces using Roop.
How about with ReActor?

It's successful!

Like with a fixed camera, there's a slight tendency for the eyes to wander, but it's at an acceptable level.

As I mentioned at the beginning, this shows why face swapping is the most versatile approach.

If we could also fix hairstyles in the same way, it would greatly enhance practicality.

Other: Comparison of CG Rendering Engines

So far, we've applied AI generation to Eevee renders. Finally, let's compare with Cycles, which is higher quality but takes longer to render.

Nice!
With proper shadows, there's a significant increase in realism!
Because the AI-swapped face inherits its shadows from the CG, proper lighting and rendering are worth the effort.

Cost

Let's look at the cost aspect, including generation time.

Item	Per Frame	Total (90 Frames)
AI Generation Time (Face Only)	10 seconds	15 minutes
Rendering Time (Eevee)	1 second	90 seconds
Rendering Time (Cycles)	10 seconds	15 minutes

The test machine has an RTX 2080 and 32 GB of memory. It was a high-end machine a few years ago, but not by today's standards.
The rendering resolution is 1024px.
Since nothing here needs to run overnight, it seems quite feasible to try.

For this test, we kept rendering settings quite conservative. In a production environment, Cycles would likely take more than 60 seconds per frame.
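The totals in the table, and what the production estimate would mean for the same 90 frames, can be checked with a few lines of arithmetic. This sketch assumes rendering and AI generation run one after the other on the same machine, and uses 60 seconds per frame as a lower bound for production Cycles.

```python
FRAMES = 90

# Per-frame times in seconds; the first three come from the table above.
PER_FRAME_SECONDS = {
    "AI generation (face only)": 10,
    "Eevee render": 1,
    "Cycles render (test settings)": 10,
    "Cycles render (production, lower bound)": 60,
}


def total_seconds(per_frame_s: float, frames: int = FRAMES) -> float:
    """Total time for a whole clip at a constant per-frame cost."""
    return per_frame_s * frames


totals = {name: total_seconds(s) for name, s in PER_FRAME_SECONDS.items()}
ai = totals["AI generation (face only)"]

# Render plus AI generation, in minutes, for each rendering option.
eevee_pipeline_min = (totals["Eevee render"] + ai) / 60
cycles_test_pipeline_min = (totals["Cycles render (test settings)"] + ai) / 60
cycles_prod_pipeline_min = (totals["Cycles render (production, lower bound)"] + ai) / 60

for name, seconds in totals.items():
    print(f"{name}: {seconds / 60:.1f} min")
```

Under these assumptions, a 90-frame clip would take about 16.5 minutes with Eevee, 30 minutes with the test Cycles settings, and at least 105 minutes with production Cycles settings.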

Exploring how to reduce rendering costs, perhaps using Unreal Engine, could be interesting.

Conclusion

How was it?
Just making the face realistic, while the rest of the CG remains unchanged, seems to enhance the quality.

The results of our verification show that current AI technology is sufficient for practical use, especially for tasks involving only face swapping.
However, as tested, there are camera angles that AI struggles with.
For frames where the AI fails to detect the face and no countermeasure is practical, it might be necessary to rethink the camera work or the cut itself.

Moreover, since everything else remains CG, we need to ensure there's no incongruity in textures.
Preparing hairstyles and ethnicities close to the target is a bit of a hassle.
Replacing everything from clothes to hairstyles in videos with consistency seems a bit further down the road.

AI is finally entering the era of videos.
I hope to create rich content by combining CG and AI!

Recommended Bookmark List

Lastly...

For those who haven't yet implemented Stable Diffusion, it's become much easier compared to the beginning of the year, though it can be overwhelming with the deluge of information.
I'll introduce a few useful tutorials.

AI is in wonderland offers clear, up-to-date explanations and is highly recommended!
