I'm Ushiyama from the CG team.
I have been writing a series of blog posts on combining image-generation AI with 3DCG.
In them, I have been exploring how to make CG avatar faces more realistic and consistent using Stable Diffusion.
In our previous work, we achieved the goal of making the avatar appear consistent from various angles.
At first glance, one might assume that this would solve the problem of "fixing the identity."
However, even more than the angle, the "distance" (that is, how much of the image the subject occupies) turns out to be a significant challenge for generative AI.
Today, we'll observe the results produced by different distances using the same prompt, and by the end, I'll explain how we addressed these issues.
This time, the goals are to:
- Make the face of a distant avatar look realistic
- Ensure the avatar looks like the same person, regardless of the distance
- Seamlessly blend CG clothing with AI-generated faces (as the clothing needs to be precise CG based on CAD data)
Experiment 1: What happens when the camera distance changes?
Let's see what problems different distances can cause.
As the camera moves farther away, the avatar occupies a smaller area of the image, so fewer pixels are available to describe the face.
This essentially means that a change in "camera distance" equates to a change in "resolution."
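To make this relationship concrete, here is a minimal sketch that estimates how many pixels tall a face appears at a given camera distance, assuming a simple pinhole camera. The face height (0.24 m), the vertical field of view (40 degrees), and the function name are assumptions chosen for illustration, not values from our scenes.

```python
import math

def face_height_px(image_height_px, distance_m,
                   face_height_m=0.24, vertical_fov_deg=40.0):
    """Approximate on-screen face height under a pinhole camera model.

    face_height_m and vertical_fov_deg are illustrative assumptions.
    """
    # Focal length expressed in pixels for the given vertical field of view.
    focal_px = (image_height_px / 2) / math.tan(math.radians(vertical_fov_deg) / 2)
    # Projected size shrinks in proportion to the distance.
    return focal_px * face_height_m / distance_m

for distance in (1.0, 2.0, 4.0):
    for size in (512, 1024):
        pixels = face_height_px(size, distance)
        print(f"{distance:.0f} m at {size}px: face is about {pixels:.0f}px tall")
```

Under these assumptions, a face at 2 m in a 1024px image covers the same number of pixels as a face at 1 m in a 512px image, which is why doubling the distance behaves like halving the resolution.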
Images We're Using
The avatar images are rendered in Blender using CLO. All rights to the displayed avatars belong to CLO Virtual Fashion Incorporated. https://www.clo3d.com/
Differences in Resolution
We'll compare the outcomes when generating images with Stable Diffusion at two different resolutions (512px / 1024px).
To integrate with the CG render, we're using ControlNet's SoftEdge (edge detection) to stabilize poses and facial features.
Additionally, we've enabled the "Restore Face" setting to prevent distortion of the face.
We're using the following prompts and reusing the same seed value for every generation:
Prompt: Photo, photorealistic
Negative prompt: 3DCG, low quality,
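For readers who want to reproduce a comparison like this in a script, the settings above could be expressed roughly as follows. This is a sketch only: it assumes the img2img endpoint of the Stable Diffusion web UI API and the ControlNet extension's `alwayson_scripts` format, and the seed, CFG Scale, Denoise value, square image size, and ControlNet module and model names are placeholders rather than the values we actually used.

```python
import json

def build_payload(size_px, seed, init_image_b64):
    """Build an illustrative img2img request body for one resolution."""
    return {
        "init_images": [init_image_b64],   # the CG render, base64-encoded
        "prompt": "Photo, photorealistic",
        "negative_prompt": "3DCG, low quality,",
        "seed": seed,                      # same seed for every resolution
        "width": size_px,                  # assumption: square images
        "height": size_px,
        "restore_faces": True,             # the "Restore Face" setting
        "cfg_scale": 7,                    # placeholder value
        "denoising_strength": 0.5,         # placeholder value
        # Assumed ControlNet extension format; names are placeholders.
        "alwayson_scripts": {
            "controlnet": {
                "args": [{
                    "module": "softedge_pidinet",
                    "model": "control_v11p_sd15_softedge",
                }]
            }
        },
    }

SEED = 12345  # hypothetical seed, not the one used in the experiment
payloads = {size: build_payload(size, SEED, "<base64 image>") for size in (512, 1024)}
print(json.dumps(payloads[512], indent=2))
```

Only `width` and `height` differ between the two payloads, so any change in the output comes from the resolution rather than from the prompt or seed.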
Increasing the resolution improved the image quality, but it also changed the apparent age of the face.
Differences in Camera Distance
Next, let's examine the differences between a bust shot and a close-up shot taken from a relatively close distance.
The results show that in the close-up shot, the image appears more CG-like.
We're using negative prompts to counteract the CG appearance.
The fidelity to this instruction is controlled by the CFG Scale, and the degree of modification from the original image is managed by the Denoise parameter.
However, the same parameter values appear to affect these two images differently.
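As a rough illustration of what the CFG Scale does with a negative prompt, the sketch below shows one common formulation of classifier-free guidance: the sampler starts from the prediction for the negative prompt and moves toward the prediction for the positive prompt, scaled by the CFG value. The function name and the toy numbers are hypothetical; in a real sampler these are large noise-prediction tensors, not two-element lists.

```python
def guided_prediction(pos, neg, cfg_scale):
    """Move from the negative-prompt prediction toward the positive one.

    Each element becomes neg + cfg_scale * (pos - neg).
    """
    return [n + cfg_scale * (p - n) for p, n in zip(pos, neg)]

pos = [0.2, 0.5]  # toy prediction for "Photo, photorealistic"
neg = [0.4, 0.5]  # toy prediction for "3DCG, low quality,"

for scale in (1.0, 7.0, 12.0):
    print(scale, guided_prediction(pos, neg, scale))
```

Note that the size of the push depends on how far apart the two predictions are for a given image, so it is at least plausible that one fixed CFG Scale steers a close-up and a bust shot by different amounts.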
Summary of Experiment 1:
As demonstrated, differences in distance, which equate to differences in resolution, can easily break the consistency of a person's identity that we try to maintain through ControlNet, prompts, seed values, and so on.
We've been making various efforts to enhance the realism of individuals on this blog, but is there no other way to achieve this?
Experiment 2: Face Replacement with Roop
What is Roop?
There are various extensions available to prevent facial distortions and make the results resemble a consistent individual.
I tried a range of features such as Reference-only and ADetailer, but combining them with ControlNet often didn't yield satisfactory results.
Using ControlNet is indispensable when integrating with CG. Ultimately, I landed on Roop.
Roop is an extension that performs deepfake-style face swapping (AI-based facial replacement).
The word "deepfake" may raise some concerns.
We've heard of incidents where unauthorized replacements of faces of presidents and celebrities have caused controversies.
When used correctly, however, it's a powerful tool.
Originally, Roop required a dedicated environment setup, but it can now be installed and used easily as an extension of the Stable Diffusion web UI.
For a more comprehensive understanding, you might want to check out this article that demonstrates its capabilities using Mona Lisa:
Stable Diffusion: Easy Deepfakes with Web UI Extensions. (The article is written in Japanese.)
This time, rather than replacing with a celebrity's face, we aim to replace with a face generated by AI.
Roop Test: Changes in Long Shots
Using this realistically rendered image based on CG, we will attempt to replace the face in a long shot.
The result? The replaced face closely resembles the input image!
Such is the power of deepfake. Incredible!
Images We’re Using
For our next tests, serving as demonstrations as well, we will be using different scenes, more reminiscent of e-commerce sites.
Again, all rights to the displayed avatars belong to CLO Virtual Fashion Incorporated. https://www.clo3d.com/
Deploying Roop in Real-World Scenarios!
First off, let's generate various faces for replacement using Roop.
We aim to depict a Japanese individual, so we're utilizing the BRAV model, renowned for producing realistic Asian faces.
We will then swap the face generated above into the CG render.
The usage is quite simple: just launch Roop from the extension and specify the face image for input.
Here's the result, with the face generated by AI and the clothing rendered in CG.
What do you think? Despite varying distances and angles, the faces are relatively consistent. No more distortions or age discrepancies like before.
With Roop, we can now overcome the challenges posed by differences in resolution.
What Roop Can and Can’t Do
Let's delve deeper into Roop's capabilities.
Beyond mere replacement, you can input text into the prompt to make additional modifications, like changing expressions or directing the gaze.
As shown above, while it’s possible to change ethnicity, adjusting skin tones wasn’t achievable.
Moreover, only the "face" can be replaced, meaning hairstyles from the original will be retained.
Summary of Experiment 2:
As we've seen, deepfake technology is not limited to swapping in real people's faces; it can also be used to stabilize faces generated by AI.
By ensuring that the AI consistently renders the same individual, regardless of differences in distance and angle, we enhance the reliability of the final product.
Merging AI's photorealistic faces with CG clothing will undoubtedly enhance the allure of ad images on e-commerce sites.