Trying and comparing various Image to 3D services
I've tried out a mix of services I've been curious about and some recent discoveries all at once ☺
Let's use the same image and compare the results generated by each service.
Here is the image we used as input this time, generated by Midjourney.
The prompt was: sneaker from side view --v 5
When preparing an image for verification, Midjourney is very handy!
First, let me introduce each service and the workflow up to generation.
Shap-E
Shap-E was developed by OpenAI, which many of you may know from ChatGPT.
It offers two modes, Text to 3D and Image to 3D, and this time we used Image to 3D.
The generation time ranges from 15 seconds to 40 seconds at most, so it's pretty quick.
There are several options you can change during generation.
Playing around with it, I noticed the output varies significantly depending on the "Seed" value.
Even if your first attempt isn't what you envisioned, leaving the Randomize seed checkbox checked and generating repeatedly should eventually produce something close.
I actually generated it about 30 times during this verification☺
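The seed here behaves like the seed of any pseudorandom generator: fix it and the run is reproducible, randomize it and every run differs. A generic Python illustration of that idea (not Shap-E's internals):

```python
import random

def pseudo_generate(seed):
    """Stand-in for one generation run: the seed fully determines the output."""
    rng = random.Random(seed)
    return [round(rng.random(), 3) for _ in range(3)]

# The same seed always reproduces the same result ...
assert pseudo_generate(368591701) == pseudo_generate(368591701)
# ... while different seeds generally give different results.
assert pseudo_generate(1) != pseudo_generate(2)
```

This is why re-rolling with Randomize seed checked explores new results, while noting down a good seed (like the one below) lets you regenerate the same model later.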
The settings of the model created for verification this time are as follows.
| Setting | Value |
|---|---|
| Seed | 368591701 |
| Guidance scale | 3 |
| Number of inference steps | 64 |
Alpha3D
Alpha3D is a service I happened to find while researching image to 3D.
Here, they offer two services, AI Lab and Designer Studio, and this time we used AI Lab.
What's interesting about their service is that it specializes in sneakers.
Usage is as simple as uploading an image, but be aware that it only accepts shoes photographed from the side.
There is a My Project page, and from there you can easily see a list of what you have generated.
The generation time was about 2 to 3 hours.
CSM
CSM (Common Sense Machines) is a bit of a hot topic right now.
The characteristic of this service is that it generates using Discord.
※For detailed instructions on generation, the official team provides a video, so please check it out.
When you generate on Discord, an image like a three-view drawing is first generated.
Click on Get 3D below it to start the generation.
What's generated can be checked from the link to the Showcase.
After a while, what you have generated will come up.
This time it took about 1.5 hours to generate.
Also, the first generation felt a bit underwhelming, so I generated a second one as a test.
The second attempt made for better comparison material than the first, so that's the model I'll use below!
Comparison
Let's compare what we've generated.
When lined up, they're all quite different, which is fascinating.
Shap-E
The outline somewhat feels like a shoe.
Also, Shap-E is the only one with an opening to put your foot in, which gives it a shoe-like feel.
However, expressing detail seems difficult.
It doesn't come with textures; instead, vertex colors are baked in from the start.
Alpha3D
Since this service specializes in sneakers, I had high expectations, but...
The sole part feels like a shoe as it's differentiated in material feeling from the upper part.
However, laces and such are only expressed as textures, so there's no three-dimensional feel in the detail. Without textures, it's pretty much like a shoe tree.
Also, if you look from directly behind, you can see that the texture is not fully reproduced.
As a side note, here is the texture image.
CSM
I felt the expression of detail was the best here.
Even without the texture, it is still somewhat recognizable as a sneaker.
I thought the balance was good when viewed from all angles, not just the input image view.
Here is the texture image.
Comparison Summary and Findings
The format, polygon count, texture availability, and generation time for each service's model are as follows.
(Note: Shap-E was generated with both feet this time, but to make comparisons easier, one side has been deleted.)
| | Shap-E | Alpha3D | CSM |
|---|---|---|---|
| Format | glb | glb | glb |
| Polygons | 69,512 | 11,708 | 30,334 |
| Texture | None | Yes | Yes |
| Generation time | 15~40 seconds | 2~3 hours | 1.5 hours |
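Since all three services export glb (the binary glTF container), polygon counts like those above can be read straight from the file's embedded JSON chunk with nothing but the standard library. A minimal sketch (`glb_triangle_count` is a hypothetical helper, and the tiny in-memory glb at the end is built purely for demonstration):

```python
import json
import struct

def glb_triangle_count(glb_bytes: bytes) -> int:
    """Count triangles in a triangle-list glb by reading its JSON chunk."""
    # GLB header: magic, version, total length (three little-endian uint32s).
    magic, _version, _length = struct.unpack_from("<III", glb_bytes, 0)
    assert magic == 0x46546C67, "not a glb file (missing 'glTF' magic)"
    # First chunk header: chunk length, chunk type ('JSON').
    chunk_len, chunk_type = struct.unpack_from("<II", glb_bytes, 12)
    assert chunk_type == 0x4E4F534A, "first chunk must be JSON"
    gltf = json.loads(glb_bytes[20:20 + chunk_len])
    triangles = 0
    for mesh in gltf.get("meshes", []):
        for prim in mesh.get("primitives", []):
            # Indexed meshes store an index accessor; otherwise count positions.
            accessor = prim.get("indices", prim["attributes"]["POSITION"])
            triangles += gltf["accessors"][accessor]["count"] // 3
    return triangles

# Demo: a hand-built minimal glb describing one triangle (3 indices).
doc = {
    "asset": {"version": "2.0"},
    "meshes": [{"primitives": [{"attributes": {"POSITION": 0}, "indices": 1}]}],
    "accessors": [{"count": 3}, {"count": 3}],
}
payload = json.dumps(doc).encode()
payload += b" " * (-len(payload) % 4)          # JSON chunk is 4-byte aligned
header = struct.pack("<III", 0x46546C67, 2, 12 + 8 + len(payload))
glb = header + struct.pack("<II", len(payload), 0x4E4F534A) + payload
print(glb_triangle_count(glb))  # → 1
```

Running this on each downloaded model is a quick way to double-check the polygon figures in the table.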
As with the 30 attempts on Shap-E and the two on CSM this time, the results vary with every generation, so I think it's important to try generating several times.
Conclusion
I tried to see what kind of generation results could come from the same input image in each service.
Fine detail still seems out of reach, but it's amazing technology that can generate this much from a single image in such a short time.
Also, all the services I used this time can be tried for free, so it's nice that you can jump in right away if you want to give it a try ☺
I'd like to try out more interesting image to 3D services if I find them!
(Note: Some services have a limit to the number of times they can be tried for free, and changes may occur, so please check before using.)