[What are the Challenges?] Points to Note When Using Midjourney / describe for Generating Fashion Images

Hello! I'm Nagashima from the CG team.

In my previous post, "Introducing an Incredible Feature: The Basic Usage of Midjourney's 'Describe' Command", I explained the basic usage of the "/describe" feature in Midjourney.

This time, I'm going a step further: I'll look at generating fashion images with the "/describe" feature and at the points to be careful about along the way.

(The previous blog was based on a trial using Midjourney v5, but as of June 30, 2023, the version has been updated to v5.2. I plan to include the changes brought by v5.2 and current functionality in this discussion.)

The Advantages and Strengths of "/describe"

The "/describe" feature analyses the content of an image and generates a prompt based on that analysis. It assists users in understanding how to verbalize an image, and provides new perspectives and expressions through AI's unique analysis. Moreover, through diverse prompt generation, users have the opportunity to view images from various angles. The verbalization of images by AI is a significant advantage of this feature.

How to Use the "/describe" Feature

The "/describe" feature can be used for verbalizing complex images, and AI's analysis results can be used as references. By comparing multiple prompts, you can understand which features each prompt emphasizes and use this information to create more refined prompts. Furthermore, it allows for the creation of custom prompts that incorporate your own intentions, enabling you to maintain your perspective while utilizing AI.
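The prompt comparison described above can also be done mechanically. The following is only a minimal sketch, assuming that each prompt is a comma-separated list of phrases; the function names and the two sample prompts are hypothetical and are not actual "/describe" output.

```python
from collections import Counter


def split_prompt(prompt: str) -> list[str]:
    """Split a comma-separated prompt into normalized phrases."""
    return [part.strip().lower() for part in prompt.split(",") if part.strip()]


def compare_prompts(prompts: list[str]) -> tuple[list[str], list[list[str]]]:
    """Return phrases shared by every prompt, and phrases unique to each one."""
    phrase_lists = [split_prompt(p) for p in prompts]
    # Count in how many prompts each phrase appears (once per prompt).
    counts = Counter(phrase for phrases in phrase_lists for phrase in set(phrases))
    shared = sorted(p for p, n in counts.items() if n == len(prompts))
    unique = [[p for p in phrases if counts[p] == 1] for phrases in phrase_lists]
    return shared, unique


# Hypothetical candidate prompts, written for illustration only.
candidates = [
    "high fashion editorial photography, cut-out top, lace, studio lighting",
    "high fashion editorial photography, turtleneck, lace, soft focus",
]
shared, unique = compare_prompts(candidates)
print("shared:", shared)
print("unique:", unique)
```

Phrases that appear in every prompt hint at what the AI considers essential to the image, while the unique phrases show what each individual prompt emphasizes.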

In this blog, I would like to examine the imperfections and limitations of the prompts generated using the "/describe" feature.

The Constraints of the "/describe" Feature

While the "/describe" feature may seem all-powerful at first glance, it has weaknesses in certain areas. It is therefore necessary to understand its limitations and to look for ways to work around them.

The most challenging issue I've encountered is as follows:

It is hard to get unfamiliar or unique designs generated the way you expect.

With unique designs in particular, the resulting image often does not meet expectations. Let's look at what "/describe" produces for the following image.

We will use a design image where the shoulders and chest are significantly cut out (cut-out design) and test it with "/describe." We will examine how the cut-out design is reflected in the generated images, in both v5 and v5.2.

First, let's test with v5.

v5 Test Results: The cut-out design was not reflected in the generated image. You can see the AI trying to incorporate aspects like the lace at the chest, and the openness at the shoulders and sides.

Next, let's see what happens with v5.2.

v5.2 Test Results: It seems like the issues from v5 have not been resolved. Although the overall image appears more realistic after the version upgrade, the cut-out elements seemed better reflected in v5.

After testing with both v5 and v5.2, my impression is that the AI struggles to understand unfamiliar or unique designs, probably because there is too little training data for them. As a result, it is difficult to generate such designs the way you expect.

The "/describe" function did not yield the images we were hoping for. In such cases, the problem cannot be solved with "/describe" alone, and it seems necessary to bring in other tools.

So I will try changing the design of the clothes with the AI function of Photoshop (Beta), which was introduced in our recently published blog, "Utilizing Photoshop (Beta) AI for Fashion!"

We load the image generated by Midjourney into Photoshop (Beta), select the part we want to change, enter a prompt, and change the design. First, we enter "Hollow out all the lace on the chest."

The chest is cut out and the bare skin is visible!

Next, I will try changing the fabric of the turtleneck top to knit material. I selected the entire top and input "thick knit, turtleneck sweater."

The turtleneck top has changed to a knit material!

In this way, the design of an image can be adjusted with the AI functions of other tools. Midjourney and other tools offer a variety of functions, and how to combine them is up to the user. Whether we can keep adjusting until we reach our ideal is unclear, but we can certainly get closer to it.


There are also the following issues:

The generated prompts frequently contain proper nouns that may raise licensing concerns.

When generating images with the "/describe" function, the suggested prompts may contain proper nouns that are subject to licensing. Proper nouns have a significant impact on the style of the generated image. They can be omitted, but doing so often moves the result away from the intended image.

At our company, we often generate images using only general terms in the prompt, leaving out proper nouns such as artist and brand names for licensing reasons. So even when we use the "/describe" function, once the proper nouns are excluded it is often difficult to reproduce the original reference image.
Because "/describe" tends to suggest prompts that rely on proper nouns, it is hard to get suitable results, or to adjust from there, when you want to generate images without them.
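Dropping proper nouns from a suggested prompt, as described above, can be sketched in a few lines. This is only an illustration and not our actual workflow: the function name and the blocklist are hypothetical, the blocklist would have to be maintained by hand, and the sample prompt is one of the "/describe" results quoted in this post.

```python
def remove_blocked_terms(prompt: str, blocked: set[str]) -> tuple[str, list[str]]:
    """Drop every comma-separated phrase that contains a blocked term.

    Returns the cleaned prompt and the list of removed phrases.
    """
    kept, removed = [], []
    for part in prompt.split(","):
        phrase = part.strip()
        if not phrase:
            continue
        lowered = phrase.lower()
        if any(term in lowered for term in blocked):
            removed.append(phrase)
        else:
            kept.append(phrase)
    return ", ".join(kept), removed


# Hypothetical, hand-maintained blocklist of brand and person names (lowercase).
blocked = {"konica", "edogawa ranpo"}

sample = (
    "High fashion editorial photography, a woman is standing on the sidewalk "
    "looking at a crosswalk, in the style of oshare kei, red and azure, "
    "konica big mini, grandparentcore, edogawa ranpo, dark gray and green, travel"
)
cleaned, removed = remove_blocked_terms(sample, blocked)
print(cleaned)
print("removed:", removed)
```

A filter like this only removes the terms. As noted above, the cleaned prompt may no longer reproduce the reference image, so the removed phrases are returned as well, to make it easier to look for general-term substitutes.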


I also see the following as a problem:

It suggests proper-noun-like keywords that the user, or people in general, do not recognize, which makes them hard to use.

Prompts suggested by "/describe" often contain unusual terms such as "~~ kei" and "~~ punk."

The following is an example of a suggested keyword that the user (in this case, me) did not recognize.

💡 Example: oshare kei
High fashion editorial photography, a woman is standing on the sidewalk looking at a crosswalk, in the style of oshare kei, red and azure, konica big mini, grandparentcore, edogawa ranpo, dark gray and green, travel

I searched for "oshare kei".
I didn't know that there's a genre like this!

Next is an example of a suggested keyword that is not generally recognized.

💡 Example: china punk
High fashion editorial photography, contemporary art, diamond earrings, pink nails, hyper-realistic sculptural style, pixelated, shot at 70mm, china punk, trompe l'oeil, fashion, rubber

I also searched for "china punk", but no particularly notable sites came up. There are similar terms such as "kawaii punk", so it can probably be read as one variation on punk.

As these examples show, the suggested prompts sometimes contain proper-noun-like keywords that neither the user nor the general public recognizes, and we are unsure how to handle them.

One of our team members said, "I'm not familiar with English as it is actually used, so I can't even tell whether such a word really exists, and I hesitate to use it." As a result, the decision whether to use such a term is left to the individual, and the research itself often feels like a hassle.
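One way to reduce that research effort might be to keep a shared glossary of terms the team has already checked, and to flag only the phrases that are not in it. The sketch below is hypothetical: the function name and the glossary contents are assumptions made for illustration, and the sample prompt is the "china punk" example quoted above.

```python
def flag_unfamiliar(prompt: str, glossary: set[str]) -> list[str]:
    """Return the phrases of a prompt that are missing from the team glossary."""
    phrases = [p.strip().lower() for p in prompt.split(",") if p.strip()]
    return [p for p in phrases if p not in glossary]


# Hypothetical glossary of phrases the team has already researched (lowercase).
glossary = {
    "high fashion editorial photography",
    "contemporary art",
    "diamond earrings",
    "pink nails",
    "hyper-realistic sculptural style",
    "pixelated",
    "shot at 70mm",
    "trompe l'oeil",
    "fashion",
    "rubber",
}

sample_prompt = (
    "High fashion editorial photography, contemporary art, diamond earrings, "
    "pink nails, hyper-realistic sculptural style, pixelated, shot at 70mm, "
    "china punk, trompe l'oeil, fashion, rubber"
)
to_review = flag_unfamiliar(sample_prompt, glossary)
print("needs review:", to_review)
```

Once someone has looked up a flagged term and decided whether it is acceptable, it can be added to the glossary so that the same research is not repeated by each team member.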

The Value of Experimenting and Expectations for the Future

When generating images from text prompts with Midjourney, there are times when we can picture the image in our heads but cannot come up with a prompt for it, or when the prompt we do come up with is not reflected well in the result.

The "/describe" function helps here, but there is only so much we can do within its constraints. We would like to make the most of it by keeping an eye on technological developments and combining it with other tools.

Stay tuned for the next blog post!
