[Image Generation AI] CG Avatar Replacement using Stable Diffusion

Hello! I'm Ushiyama, the lead designer of the Omnis CG team.

We work daily on CG production for apparel. Today, I'd like to share a challenge we undertook using 3D CG and image generation AI.

Apparel CG and Digital Humans

Creating digital humans, CG avatars realistic enough to be mistaken for real people, still presents significant hurdles in 2023. Human faces are incredibly complex, and reproducing subtle expressions and skin textures in CG is a daunting task.

As a result, this work is usually left to a select group of exceptional artists or large-scale CG studios.

In the apparel industry, the clothes depicted in CG can look stunningly real, but the avatars still retain some unnaturalness. As a result, it's common to use CG mannequins as avatars or to leave the wearer invisible altogether.

If image generation AI could easily create photorealistic avatars, it would further evolve the expression of CG for apparel.

CLO Virtual Fashion, Inc. owns all rights to the avatar displayed.

What We'll Do Today

We often use Midjourney on this blog, but this time we'll use a different image generation AI: Stable Diffusion.

The two are often compared, but the most significant difference is that Stable Diffusion lets you freely specify which region of the image to regenerate and control the result with a wealth of parameters. This allows designers to use it almost like applying a filter in Photoshop.

So today we'll use our digital human, a still-unnamed girl we call "the perm girl". She looks fairly realistic, but you can still tell she is CG. Let's use image generation AI to transform her into a photorealistic appearance!

Test 1: Generation with Default Settings

One feature of Stable Diffusion is the ability to upload a mask image. Here, only the face and hair regions of the image on the left will be transformed by the AI. Masks are easy to produce from 3D CG, so I think the two are a good match.

We'll use the default settings for various parameters. Let's just enter "Photo" in the prompt and generate it!
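
For readers who want to try the same masked generation in code: we used a GUI tool, but the idea can be sketched with the Hugging Face diffusers library. The checkpoint name and the file names (avatar_render.png, face_hair_mask.png) are illustrative assumptions, not our actual assets.

    # Minimal sketch: regenerate only the masked region of a CG render.
    # In the mask image, white = regenerate, black = keep untouched.
    import torch
    from PIL import Image
    from diffusers import StableDiffusionInpaintPipeline

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting",  # illustrative SD 1.5-era checkpoint
        torch_dtype=torch.float16,
    ).to("cuda")

    init_image = Image.open("avatar_render.png").convert("RGB").resize((512, 512))
    mask_image = Image.open("face_hair_mask.png").convert("RGB").resize((512, 512))

    result = pipe(
        prompt="Photo",          # the bare prompt used in Test 1
        image=init_image,
        mask_image=mask_image,
        guidance_scale=7.5,      # library default, comparable to the UI default
        num_inference_steps=50,
    ).images[0]
    result.save("test1_default.png")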

Hmm... it's not as real as we had hoped. It's rather creepy. Even though we want a realistic result, it seems to be pulled toward the original 3D CG look. To steer it away, let's add "3D CG" to the negative prompt.
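
In the sketch above, the negative prompt is just one more argument; "3D CG" is the phrase we actually added, and everything else stays the same:

    result = pipe(
        prompt="Photo",
        negative_prompt="3D CG",   # steer the output away from a CG look
        image=init_image,
        mask_image=mask_image,
        guidance_scale=7.5,
        num_inference_steps=50,
    ).images[0]
    result.save("test1_negative_prompt.png")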

It's become more realistic, but this time the direction of the face has changed, and the face is clipped by the mask.

So, let's take a closer look at the parameters to deepen our understanding.

Test 2: Parameter Adjustment

The two Stable Diffusion parameters we adjust in this test are the following (a short code sketch of how they map onto API arguments follows the list):

  • CFG Scale (hereafter CFG): the higher it is, the more faithfully the output follows the prompt, at the risk of distortion.
    Default: 7 (range 1–30)
  • Denoising Strength (hereafter Denoise): the higher it is, the further the output moves away from the original image.
    Default: 0.75 (range 0–1.0)
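
As a rough idea of how these two parameters appear in code (again using diffusers as a stand-in for the GUI, and the plain img2img pipeline for brevity): CFG corresponds to guidance_scale and Denoise to strength. The checkpoint name is an illustrative assumption.

    import torch
    from PIL import Image
    from diffusers import StableDiffusionImg2ImgPipeline

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",  # illustrative SD 1.5-era checkpoint
        torch_dtype=torch.float16,
    ).to("cuda")

    init_image = Image.open("avatar_render.png").convert("RGB").resize((512, 512))

    result = pipe(
        prompt="Photo",
        negative_prompt="3D CG",
        image=init_image,
        guidance_scale=7,   # CFG: prompt fidelity (UI default 7)
        strength=0.75,      # Denoise: distance from the original (UI default 0.75)
    ).images[0]
    result.save("test2_defaults.png")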

First, ignoring the prompt, let's compare the difference in Denoise. Can you see that it becomes quite soft at 0.3? However, it still has a CG feel. Let's adjust the CFG to make it more faithful to the prompt.
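
To reproduce this kind of side-by-side comparison, a simple grid sweep over the two parameters is enough; continuing the img2img sketch above (the value lists are examples, not our exact grid):

    # Sweep Denoise (strength) and CFG (guidance_scale) to build a comparison grid.
    for strength in (0.3, 0.5, 0.75):
        for cfg in (7, 15, 30):
            img = pipe(
                prompt="Photo",
                negative_prompt="3D CG",
                image=init_image,
                strength=strength,
                guidance_scale=cfg,
            ).images[0]
            img.save(f"sweep_denoise{strength}_cfg{cfg}.png")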

How about that? I think you can see that it has become quite photorealistic. Not only is the skin softer, but subtle wrinkles and uneven skin tone have been added. Yet it matches the direction of the lighting, so it blends into the CG without any sense of incongruity. I also feel the original character's features, such as the cool eyes, are coming through.

This is an image generated with the same parameters, but if you look closely, you'll see the features are slightly different. It's a minor difference, but this one gives me the impression of an older era (probably just my imagination...). Because the output is not always the same person, working with image generation AI is often described as "pulling a gacha".
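
As an aside, that run-to-run variation comes from the random seed. If you ever want the same "gacha pull" twice, diffusers lets you pin it (the seed value here is arbitrary):

    # Pin the random seed so identical parameters give an identical face.
    gen = torch.Generator(device="cuda").manual_seed(1234)
    img = pipe(
        prompt="Photo",
        negative_prompt="3D CG",
        image=init_image,
        strength=0.5,
        guidance_scale=15,
        generator=gen,
    ).images[0]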

Next, let's increase the CFG and Denoise even further. What will happen?

The realism has increased even more. The direction of the gaze varies, and so does the curliness of the perm: this is the deviation from the original image caused by the higher Denoise. It's already quite realistic, but let's push it one step further.


We've gone too far! Too much detail means more wrinkles, and the face has collapsed. I hope this gives you a feel for the relationship between CFG and Denoise. Incidentally, since I only adjusted parameters, reaching this quality took only about an hour.

Conclusion

How was it? Thanks to the power of image generation AI, we were able to create photorealistic characters with astonishing ease while roughly preserving the atmosphere of the original CG avatar.

This means that simple avatars previously used only for internal design reviews at apparel companies can now also be used in rich, consumer-facing advertising on e-commerce sites.


However, there is a dilemma: the more faithfully we reproduce the features of the original character, the more photorealism we have to sacrifice. To overcome this, next time we plan to use Stable Diffusion's ControlNet feature to make bold modifications while retaining the original avatar's features.
