Midjourney has made small, incremental improvements over the last few versions, but the expectation for version 6 is a bigger leap. In what it has shared so far, the Midjourney team has highlighted several aspects that will improve, such as:
- Prompt interpretation
- Details across the image, with fewer artefacts
- Upscaled quality improvements
- Text Generation
- Upscale Subtle or Creative
I played around at the rating party earlier in the week and saved some really nice gems, but now that the alpha release is out you can try it yourself. You can add the switch --v 6 to a prompt, or use the /settings command to set v6 alpha as your default.
A simple test is to describe several objects next to each other, each with a different colour. My test prompt for this: a red book on a wooden table with a white cup
Version 6 alpha understands this well and renders the desired image consistently: the book is red and the cup is white in all the images produced. The table is also wooden.
The image below is the v5.2 result for the same prompt. As you can see, the cup is white in only 1 of 4 images. The book is red in 3 of 4, but the background has somehow turned red in 2 of 4 images.
So consistency and adherence to the prompt are much stronger in version 6 alpha than in its predecessors.
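To put the counts above side by side, here is a minimal Python sketch that turns them into match rates. The v5.2 numbers are the ones reported above; the v6 alpha numbers assume a standard grid of four images, since the post only says every image matched. The names `GRID_SIZE`, `matches` and `match_rates` are made up for this illustration.

```python
# Images per grid. Assumption: both versions produced the usual 4-image grid.
GRID_SIZE = 4

# How many images in each grid showed the requested colour.
matches = {
    "v5.2": {"white cup": 1, "red book": 3},
    # Assumed 4 of 4: the post says all v6 alpha images matched the prompt.
    "v6 alpha": {"white cup": 4, "red book": 4},
}


def match_rates(counts, grid_size=GRID_SIZE):
    """Convert per-attribute match counts into fractions of the grid."""
    return {attr: n / grid_size for attr, n in counts.items()}


rates = {version: match_rates(counts) for version, counts in matches.items()}

for version, per_attr in rates.items():
    summary = ", ".join(f"{attr}: {rate:.0%}" for attr, rate in per_attr.items())
    print(f"{version}: {summary}")
```

A single grid of four images is a tiny sample, so treat these rates as a rough indication rather than a benchmark.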
To further evaluate prompt coherence, I employed my good friend ChatGPT-3 to describe a more detailed scene, and it produced a prompt that went something like: A visually captivating still life features a vibrant bouquet of flowers, a bowl of assorted fruit, an aged book, a ceramic teapot, and a dynamic abstract painting. The carefully balanced composition showcases a harmonious interplay of colors, textures, and forms, inviting viewers to appreciate the beauty in the ordinary and extraordinary.
There are a lot of subjects described here, and I wondered how the model would interpret them. Check out the comparison below by moving the slider left and right: the left side is the v5.2 image and the right side is the v6.0 alpha image.
Surprisingly, the v6 alpha image is more consistent with the prompt: the book appears in every image and the variety of fruit is there. The v5.2 image struggles to stay consistent with the prompt; the book is missing and there are only a couple of fruit types. As you can see, v5.2 also interprets the whole scene as a painting in grid position 1.
Human beings are much more realistic and the details are vastly improved: skin is no longer imitated by small squiggles and artifacts but looks like actual skin, with pores and texture. There are also two upscalers available: Subtle, which adds subtle details, and Creative, which is more pronounced.
I downsized the images for the web after upscaling them with the Subtle Upscale option. Even so, it’s not hard to notice the details in the eyes, eyebrows, skin pores and imperfections, and lips. If you didn’t know you were looking at an AI-generated human, you wouldn’t have a clue that this is not a real person.
You can also see skin folds and creases that should be there naturally. Look at her right shoulder in the image above: you can see the pores and the tiny hairs on the skin. The only odd thing in that image is that one of the eyelashes starts to merge with the sunglasses, but you only notice that at 1:1 zoom in the full high-res image.
You can get more creative with your images and the details continue to stick.
Here is another example that demonstrates the power of the new model. The hair in the beard and on the head is more pronounced than ever before, and you can see the skin pores and a mole or skin tag on the forehead. Due to the depth of field the skin is softened a touch, but this could also be due to the Subtle upscale. The jacket has a jean texture and weave like the fabric should, and the double stitching is present as it should be.
When I upscaled with Creative, I saw a lot more detail in the resulting image. Here is a 1:1 zoom snippet of some sections:
Notice the pores on the nose and the skin, and the eyebrow hair. The eyelashes are well formed and the iris is also very natural. There are even tiny details in the nose bridge of the eyeglasses.
As I write this post on the eve of Christmas, I thought I would create some images that have the theme of the moment. The first creation I made was using the prompt: a Christmas theme wallpaper for your smartphone, with text “Merry Christmas ” written in white color, beautiful and elegant, vibrant color tones, centered --ar 9:16 --style raw --v 6.0
The results were surprisingly good: the text is written in fancy seasonal fonts that fit the moment, and the lettering is spelled correctly. However, after the initial success, the next few generations fell apart, with the lettering coming out wrong.
Overall the images are very beautiful and very well composed, but the text and lettering are not correct. I also tried generating some cards, including New Year’s cards.
The prompt for the new year: a cityscape night scene with fireworks overhead, with text “2024” written in white, beautiful elegant, vibrant colors, centered --ar 3:2 --style raw --v 6.0
Text rendering is getting better with Midjourney version 6 alpha and is a big improvement over version 5.2. However, my repeated attempts kept failing: the resulting images did not have correctly spelled text. Although the characters are forming better, the new model still struggles to spell the text correctly. It seems to be hit and miss.
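Both text prompts above end with the same kind of parameter suffix: an aspect ratio, a style and a version, each typed with two hyphens (--ar, --style, --v). If you run many variations, a small helper keeps that suffix consistent. This is only a sketch for assembling the prompt text that you would then paste into Midjourney yourself; `build_prompt` is a hypothetical helper written for this post, not part of any Midjourney tool.

```python
from typing import Optional


def build_prompt(description: str,
                 aspect_ratio: Optional[str] = None,
                 style: Optional[str] = None,
                 version: Optional[str] = None) -> str:
    """Join a scene description with optional Midjourney parameters.

    Hypothetical helper: it only builds the text of the prompt.
    """
    parts = [description.strip()]
    if aspect_ratio:
        parts.append(f"--ar {aspect_ratio}")
    if style:
        parts.append(f"--style {style}")
    if version:
        parts.append(f"--v {version}")
    return " ".join(parts)


# Rebuild the New Year prompt used above.
new_year = build_prompt(
    'a cityscape night scene with fireworks overhead, with text "2024" '
    "written in white, beautiful elegant, vibrant colors, centered",
    aspect_ratio="3:2",
    style="raw",
    version="6.0",
)
print(new_year)
```

Keeping the description separate from the parameters makes it easy to rerun the same scene on another version, for example with `version="5.2"`, for a side-by-side comparison.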
Another passion of mine is cars. I like the luxury and engineering of European cars, so I had to see how good Midjourney version 6 alpha’s cars would be.
First we start off with a muscle car from the States. The shapes and lines of the car are rendered correctly, and the logo and 5.0 lettering appear in the correct locations and are easily recognizable.
Then there is my current ride, a Mercedes-Benz C43, which looks nice. The double fin on the front correctly resembles the AMG version and, although very tiny, the AMG lettering is wedged in between the two fins. A normal C200-300 model (non-AMG) would have two separated fins.
Let’s get some Porsches from Midjourney, and man, are these lines and shapes nice or what. The logo, however, is not correctly rendered in the version I upscaled.
As we are in the woods and maybe we’ve been having a bit of fun sliding the car around, you see some dirt on the rear bumper/fender and on the tires. The dried pine sticks are on the ground with some grass and moss. Just gorgeous!!
Even though the Midjourney model can do a lot more, I particularly wanted to focus on the details and quality of the images in the few areas I set out to explore. The version 6 alpha model is certainly much improved over its predecessor and has better prompt coherence when interpreting the context. It is consistent in image quality but still falls short on text rendering, which is not quite there yet; perhaps some improvements will come in the final version or in future versions.
The details have certainly improved when you upscale an image, and this is apparent in the images of people generated above. I’d love to compare this quality in a future post against Magnific AI, which I have already talked about on this blog (Magnific.AI Upscaler) and compared head to head in Upscaler Comparison Midjourney vs Magnific AI.