Qwen Image Dominates Text to Image 700 Tests Reveal Why Its Better Than FLUX Presets Published

Qwen Image Dominates Text-to-Image: 700+ Tests Reveal Why It's Better Than FLUX - Presets Published

Full tutorial link > https://www.youtube.com/watch?v=R6h02YY6gUs

I ran over 700 generations to find the best configuration for generating images with the Qwen Image model. Based on this research, I have published 1-click presets tuned for maximum quality and realism. I also compared Qwen Image to the current king, FLUX Dev, and to FLUX Krea Dev, and concluded that Qwen Image is the new king and the future. This full step-by-step tutorial video shows you the easiest way to start generating amazing images with Qwen Image.

🔗Follow below link to download the zip file that contains SwarmUI installer and AI models downloader Gradio App - the one used in the tutorial for downloading models, presets, prompt generator guide txt ⤵️

▶️ https://www.patreon.com/posts/SwarmUI-Installer-AI-Videos-Downloader-114517862

▶️ How to install SwarmUI main tutorial : https://youtu.be/fTzlQ0tjxj0

🔗Follow below link to download the zip file that contains ComfyUI 1-click installer that has all the Flash Attention, Sage Attention, xFormers, Triton, DeepSpeed, RTX 5000 series support ⤵️

▶️ https://www.patreon.com/posts/Advanced-ComfyUI-1-Click-Installer-105023709

▶️ RunPod SwarmUI & ComfyUI Install Tutorial : https://youtu.be/R02kPf9Y3_w

▶️ Massed Compute SwarmUI & ComfyUI Install Tutorial : https://youtu.be/8cMIwS9qo4M

🔗 Python, Git, CUDA, C++, FFMPEG, MSVC installation tutorial - needed for ComfyUI ⤵️

▶️ https://youtu.be/DrhUHnYfwC0

🔗 SECourses Official Discord 10500+ Members ⤵️

▶️ https://discord.com/servers/software-engineering-courses-secourses-772774097734074388

🔗 Stable Diffusion, FLUX, Generative AI Tutorials and Resources GitHub ⤵️

▶️ https://github.com/FurkanGozukara/Stable-Diffusion

🔗 SECourses Official Reddit - Stay Subscribed To Learn All The News and More ⤵️

▶️ https://www.reddit.com/r/SECourses/

Video Chapters

00:00:00 Introducing Qwen Image: The New King of Text-to-Image

00:00:22 Why Qwen is King: 700+ Generations & Extensive Testing

00:00:45 Qwen vs FLUX Models: A Detailed Comparison of Strengths & Weaknesses

00:01:15 One-Click Setup: Easy Installation & Pre-configured Presets

00:01:31 Secret to Perfect Prompts: A Surprising Automatic Generation Method

00:01:53 Low VRAM? No Problem with SwarmUI & ComfyUI Backend

00:02:21 Real-Time Image Generation Showcase (Dragons & Warriors)

00:03:14 Tutorial: How to Automatically Generate Prompts from Reference Images

00:03:41 Essential Prerequisites: Updating SwarmUI & ComfyUI

00:04:07 Step 1: Download & Extract the Newest Zip File (Version 62)

00:04:27 Step 2: Running the Update Scripts for ComfyUI & SwarmUI

00:05:07 Step 3: Importing the New Qwen Presets into SwarmUI

00:05:35 Exploring the New Presets: High Quality vs Realism Fast

00:06:21 Step 4: Downloading the Required Qwen Core Models

00:07:05 Low VRAM Alternative: Using the Q4 Quantized Model

00:07:34 Important Troubleshooting: How to Fix Potential Black Image Bugs

00:08:07 Qwen Technical Details: Resolution Requirements & Advantages

00:08:55 Analyze My Tests: Accessing & Using the Comparison Grids

00:09:53 Amazing In-Image Text Generation: Creating YouTube Thumbnails with Qwen

00:11:43 In-Depth Comparison Grid: Qwen vs FLUX Dev & FLUX Krea

00:12:25 Visual Comparison: Anime, Dinosaurs, and Complex Scenes

00:12:53 Realism Comparison: Where FLUX Krea Still Wins (For Now)

00:13:13 Mind-Blowing Prompt Following: Qwen's Biggest Strength

00:14:07 The Future of Qwen: Fine-Tuning & LoRA Training with Kohya

00:14:34 Final Thoughts & My Remote Generation Setup (Vast.ai)

00:15:40 Pro Tip: Using Wildcards for Automated Batch Image Generation

00:16:11 Final Image Showcase & Conclusion

Exploring Qwen-Image: Alibaba's Breakthrough in AI Image Generation

In the rapidly evolving field of artificial intelligence, Alibaba's Qwen team has unveiled Qwen-Image, a groundbreaking 20-billion-parameter foundation model for image generation.

At its core, Qwen-Image excels in superior text rendering capabilities. Unlike many predecessors that struggle with legible text, it handles multi-line layouts, paragraph-level semantics, and fine-grained control over fonts, styles, and positioning.

This makes it ideal for creating stunning graphic posters, advertisements, and designs with embedded text in both English and Chinese.

It also extends to multimodal tasks such as view synthesis, image segmentation, and depth estimation, making it a versatile tool for creative and technical applications.

Early benchmarks show it outperforming other open-source models in text-heavy scenarios.

For industries, Qwen-Image democratizes high-quality image creation. Graphic designers can rapidly prototype posters, while marketers generate localized content with accurate bilingual text. In education and entertainment, it enables custom visuals for stories or simulations.

As AI image tools proliferate, Qwen-Image stands out for its precision and accessibility.

Some background music by NoCopyrightSounds : https://gist.github.com/FurkanGozukara/681667e5d7051b073f2e795794c46170

Video Transcription

00:00:00 Greetings everyone. Today I'm going to introduce you to Qwen Image, which is the newest king of
00:00:06 the text-to-image models. When I say king, it is not an exaggeration. This is the newest,
00:00:14 very best model to generate images from text prompts. And how I am sure of it?
00:00:22 I have done over 700 generations of Qwen Image and compared the very best presets
00:00:31 of this model with FLUX Dev and FLUX Krea model. Preparation of the very
00:00:38 best workflow for this model took a lot of time, but the results are mind-blowing.
00:00:45 This model, Qwen Image, produces better images than the FLUX base dev model in
00:00:52 every case. Moreover, when it comes to understanding complex scenes and prompts,
00:00:59 this model is unchallenged. The only weak side of this model is that currently, it is not as
00:01:05 realistic as FLUX Krea model, but other than that, in every case, this model produces amazing images.
00:01:15 To make it easy for you, I have prepared one-click to download these models and one-click to apply
00:01:24 presets and right away use it. You see, currently I am generating some random images real-time and
00:01:31 they are just excellent quality. And I even didn't write the prompts of these images. How
00:01:37 I made these prompts will surprise you. It is just so easy and so elegant, and it is working amazing.
00:01:46 All of these images are real-time being generated and you are seeing them as they are generated.
00:01:53 It is not, of course, too fast because it is a big model, but since we are using SwarmUI with
00:02:00 ComfyUI backend, as long as you have sufficient amount of RAM memory, you will be able to generate
00:02:07 amazing images even on low VRAM GPUs. You see, these are the previews of the images that are
00:02:14 being generated and as they get generated, we will see them. For example, this is another new image.
00:02:21 This is another amazing image. This is another amazing image. I mean, look at this detail.
00:02:27 Look at the anatomy, the accuracy, everything is just perfect. You see, this is the tail and this
00:02:34 is a full dragon. I mean, dragons are not even real. However, this model knows it very well.
00:02:41 And this is a warrior challenging to the dragon. I mean, look at this image. Look at the quality.
00:02:48 This model is unchallenged, and this is just released. With the fine-tuning,
00:02:53 with the new LoRAs, this model will get only better. And look at the contrast. It is able
00:03:00 to generate two completely different images in a single image like this. And I even didn't write
00:03:06 the prompts. They are all automatic generations, and I'm going to show you how I made it right now.
00:03:14 I just uploaded the images from CivitAI. You see, like this, and I gave this prompt: "Write prompt
00:03:22 for each attached image and separate each prompt with this." Whatever you want, you can make it new
00:03:27 line, but the key thing here is that I am using Video Models Prompt Generate guidance, which I
00:03:34 have introduced you in the previous video. So, how you are going to use this model in SwarmUI?
00:03:41 If you have watched our latest tutorial about Wan 2.2, you already know, but if you haven't yet,
00:03:48 I recommend to watch it. For installation of SwarmUI and ComfyUI, we have this tutorial.
00:03:54 Both of them will be in the description of the video, so you will be able to quickly find it.
00:04:00 So, all you need to do is install them and use our newest zip file. Our newest
00:04:07 zip file is located here. The link will be in the description of the video. Download SwarmUI
00:04:12 Model Downloader version 62. As usual, extract it into your previous installation folder or
00:04:20 wherever you want to install. Right-click, I will use this one, extract here, overwrite files. Then,
00:04:27 this is super important. To be able to use this new model, you need to update your ComfyUI and
00:04:34 SwarmUI. For updating ComfyUI, I will use Windows Update ComfyUI, as usual, and it will update it
00:04:40 automatically for me. Don't forget that. Then, you also need to update SwarmUI. For SwarmUI updating,
00:04:48 we have Windows Update SwarmUI, and this will update my SwarmUI as usual. If you are first-time
00:04:55 installing, watch those tutorials, but if you already followed our requirements tutorial,
00:05:01 all you need to do is just Windows install for ComfyUI and Windows install SwarmUI.
00:05:07 Once your SwarmUI starts, you need to import new presets. You see, there is "Import Preset." This
00:05:14 is important. It is not automatically imported. So, choose file, go back to your extraction of
00:05:20 your zip file, and you will see that there is "Amazing SwarmUI Presets Version 9." Select it,
00:05:27 overwrite and import. Then you will get these two presets: Qwen Image Realism Fast and Qwen Image
00:05:35 High Quality. Then all you need to do is just quick tools, reset params to default,
00:05:40 direct apply, and type your prompt. That's it. Or direct apply and type your prompt. You see,
00:05:46 Qwen Image High Quality also has default negative prompts. However, Qwen Image Realism
00:05:53 Fast doesn't have them because it uses CFG scale 1, so negative prompts have no effect.
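Why a negative prompt does nothing at CFG scale 1 can be seen from the usual classifier-free guidance formula. The sketch below is a general illustration, not code from SwarmUI or ComfyUI: the function name `apply_cfg`, the tiny made-up prediction vectors, and the comparison scale of 4.0 are all assumptions chosen for the example.

```python
def apply_cfg(cond, uncond, cfg_scale):
    """Classifier-free guidance: start at the negative-prompt prediction and
    move toward the positive-prompt prediction by a factor of cfg_scale."""
    return [u + cfg_scale * (c - u) for c, u in zip(cond, uncond)]


# Made-up numbers standing in for the model's predictions at one step.
cond = [0.5, -0.25, 1.0]           # positive prompt
uncond_plain = [0.25, 0.5, -0.5]   # empty negative prompt
uncond_other = [-1.0, 0.0, 0.75]   # some other negative prompt

# At CFG scale 1 the negative term cancels and the result is just `cond`.
at_one_plain = apply_cfg(cond, uncond_plain, 1.0)
at_one_other = apply_cfg(cond, uncond_other, 1.0)

# At a higher scale (4.0 is an arbitrary example) the negative prompt matters.
at_four_plain = apply_cfg(cond, uncond_plain, 4.0)
at_four_other = apply_cfg(cond, uncond_other, 4.0)

print(at_one_plain == at_one_other)    # same result for both negative prompts
print(at_four_plain == at_four_other)  # results differ once the scale is above 1
```

Because the negative prediction drops out at scale 1, a backend can skip computing it entirely, which is one plausible reason a CFG 1 preset runs faster.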
00:05:59 And exactly, I am using that preset right now. The selected preset is Qwen Image
00:06:06 High Quality. It is generating, you see, new images are being generated as I am recording
00:06:12 this video and as we are watching it. Amazing quality. Prompt following,
00:06:18 the prompt understanding of this model is just mind-blowing.
00:06:21 So, what about the models that you need to download? To download the models,
00:06:25 you need to double-click "Windows Start Download Models Up.bat" file and start
00:06:30 it. Never run any of my installers as administrator. Always run them with
00:06:37 double-clicking. Do not forget that. It will install necessary libraries and start the newest
00:06:42 version of our downloader, which is SwarmUI Model Downloader version 62. And in here,
00:06:48 all you need to do is go to SwarmUI bundles and Qwen Image Core bundle. Download it. This
00:06:55 will download the Qwen Image GGUF Q8 model, Qwen necessary clip model, and Qwen VAE file.
00:07:05 If your VRAM is low and if you don't want to use block swapping a lot,
00:07:09 what you can do? Go to image generation models, go to Qwen image models and download Qwen Image
00:07:15 Q4 GGUF file. You see these are the sizes. This model quality is also excellent. However,
00:07:22 if you have sufficient amount of RAM memory, you can use Q8 and not lose
00:07:27 any quality. The SwarmUI will just work good with automatic block swapping of the ComfyUI backend.
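To get a feel for how much memory the Q8 and Q4 files need, a rough back-of-the-envelope estimate is parameters times bits per weight. The sketch below assumes the 20 billion parameter count quoted in the description, and it assumes the GGUF `Q8_0` and `Q4_0` layouts (8.5 and 4.5 bits per weight); the video only says "Q8" and "Q4", so the exact variants are a guess. Real files will differ somewhat because some tensors are usually kept at higher precision.

```python
PARAMS = 20e9  # parameter count quoted in the description (assumed exact here)

# Bits per weight. Q8_0 and Q4_0 store one 16-bit scale per block of 32
# weights, which adds 0.5 bits per weight on top of the 8 or 4 data bits.
# Which Q8/Q4 variants the downloader actually ships is an assumption.
BITS_PER_WEIGHT = {
    "BF16": 16.0,
    "Q8_0": 8.5,
    "Q4_0": 4.5,
}


def approx_size_gb(params, bits_per_weight):
    """Approximate weight size in gigabytes (10^9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


for name, bpw in BITS_PER_WEIGHT.items():
    print(f"{name}: ~{approx_size_gb(PARAMS, bpw):.1f} GB")
```

Under these assumptions Q8 lands a little above 21 GB and Q4 a little above 11 GB, which is why Q4 is the gentler option when you want to avoid heavy block swapping.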
00:07:34 Currently, if you use --use-sage-attention, it may fail. So, try with it. If you get black output,
00:07:42 just remove it because it is getting updated. It is not fully working yet. So, this optimization
00:07:48 may cause black output. Moreover, in the server configuration, in the very bottom, if you get
00:07:55 black outputs, disable this "Allow GPU-specific optimizations." The team of ComfyUI is working to
00:08:02 fix these issues. It could be fixed when you are watching this, but I am just letting you know.
00:08:07 Another restriction of this model is that the resolution has to be divisible by 16. Its
00:08:14 default resolution is 1328 by 1328, so it has about 70% more pixels than the FLUX model. So,
00:08:23 when we fine-tune or when we LoRA this model, hopefully, it is coming very soon hopefully,
00:08:30 it will be able to learn much more details than the FLUX itself because it has a better base
00:08:36 resolution. And you see the quality is amazing. The realism is not there yet, but when we
00:08:42 fine-tune or when we LoRA train, it will be there. I am pretty sure. However, this is the new leader
00:08:48 of the image generation models from text. And when I say that, I am not exaggerating or I am
00:08:55 not saying it out of nothing. When you follow the post, the link will be in the description
00:09:00 of the video, you will see that I have shared the grid tests that I have made. You need to put them
00:09:06 into your SwarmUI > output > local > grids. When you put them here, they will be ready to follow.
00:09:14 And then restart your SwarmUI, go to Tools > Grid Generator, and load grid config, and you
00:09:20 will see the grids here. There are lots of grids, not only this one. When you download it, you will
00:09:26 see lots of grids like here. You see Qwen Image and other ones. You see, I did massive number of
00:09:32 grid testing, and this is just one of them. And after analyzing all the results, I came up with
00:09:39 these presets. So, this was a huge work done by me. So, you can also analyze every image, every
00:09:46 configuration that I have tested yourself on your computer with highest quality and see yourself.
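Going back to the resolution rule mentioned at 00:08:07 (width and height divisible by 16, default 1328 by 1328), here is a small sketch for checking a custom size and for verifying the "about 70%" figure. The helper names are hypothetical, and the FLUX base resolution of 1024 by 1024 is an assumption, since the video does not state it.

```python
def snap_to_multiple(value, multiple=16):
    """Round a dimension to the nearest multiple (ties round up),
    never going below one full multiple."""
    snapped = (value + multiple // 2) // multiple * multiple
    return max(multiple, snapped)


def pixel_ratio(width_a, height_a, width_b, height_b):
    """How many times more pixels image A has than image B."""
    return (width_a * height_a) / (width_b * height_b)


QWEN_DEFAULT = (1328, 1328)   # from the video
FLUX_ASSUMED = (1024, 1024)   # assumption: typical FLUX base resolution

ratio = pixel_ratio(*QWEN_DEFAULT, *FLUX_ASSUMED)
print(f"Qwen default has {100 * (ratio - 1):.0f}% more pixels than the assumed FLUX size")
print(f"The assumed FLUX size has {100 * (1 - 1 / ratio):.0f}% fewer pixels than Qwen default")

# Snapping an arbitrary size so it satisfies the divisible-by-16 rule.
print(snap_to_multiple(1333), snap_to_multiple(1080))
```

Note that "70% more" in one direction is not "70% less" in the other: under the 1024 by 1024 assumption, the FLUX image has roughly 40% fewer pixels.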
00:09:53 Furthermore, this model is amazing at writing text. You see, "New King Image
00:10:02 Models Qwen has arrived." This is how I have generated the thumbnail of this video. So,
00:10:10 this is the new thumbnail generation, if you ask my opinion. And what I did was extremely lazy.
00:10:17 I just added this to the random prompts. Let me show you. So, with a better approach, you can get
00:10:25 even better text. The image has the following text with an amazing 3D font: "New King of Image Models
00:10:32 Qwen has arrived." And then I just added the other prompts. So, this is a very lazy way of working.
00:10:40 And if you look at the final prompt, it is like this. Let me show you so you will see what I
00:10:46 mean. The image has the following text with an amazing 3D font, blah blah blah. You see,
00:10:51 this is a very lazy way of writing the prompt and even at this way, it is able to generate
00:10:58 amazing images. I mean, look at the beauty of this text written on the image. This is just amazing.
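The "lazy" approach described above, putting one fixed text instruction in front of whatever scene prompt comes next, can be sketched in a few lines. The function name `with_caption` and the two scene prompts are hypothetical; only the instruction wording and the thumbnail text come from the video.

```python
def with_caption(scene_prompt, caption, font_style="an amazing 3D font"):
    """Prefix a scene prompt with an instruction to render text in the image."""
    return f'The image has the following text with {font_style}: "{caption}". {scene_prompt}'


caption = "New King of Image Models Qwen has arrived"

# Hypothetical scene prompts; in the video these come from the random prompt list.
scene_prompts = [
    "A dragon circling a ruined castle at sunset.",
    "A lone warrior standing on a cliff in a storm.",
]

for scene in scene_prompts:
    print(with_caption(scene, caption))
```

A more deliberate prompt would also describe where the text sits and how it relates to the scene, which is the "better approach" hinted at in the video.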
00:11:04 In some cases, it is failing to write "Qwen" accurately, probably because it is not an
00:11:09 English word. However, as you try more, you will get the perfect text like this. And this
00:11:16 is another one. I mean, look at this. It is also matching the text color and style with the rest of
00:11:23 the image as well. Making something like this would take a lot of time, but now with Qwen,
00:11:30 we can have amazing images with a beautiful text written on them like this. So, with Qwen,
00:11:37 now you can generate your thumbnails with just prompting and not spending any time.
00:11:43 And finally, I have compared my very best preset of FLUX Dev, FLUX Krea Dev,
00:11:51 Qwen Image Realism Fast, and Qwen Image High Quality. So when we analyze the results, FLUX Dev
00:11:57 is inferior to Qwen image at every case, whatever you can think of. The FLUX Krea Dev has better
00:12:04 realism at certain prompts than the Qwen. I can say that. For example, we see that FLUX Krea has
00:12:11 a better realism, as you can see, than the Qwen Realism or Qwen Highest Quality. But you know the
00:12:18 resolution is about 40% lower in pixel count than the Qwen image (Qwen's is about 70% higher). For example, this is not a very realistic scene. We
00:12:25 can say that at this scene, the Qwen is just much better. At anime, again, Qwen image shines. And we
00:12:33 can say that when it comes to not very realistic images like dinosaurs, which we do not have any
00:12:38 realistic image, again, the Qwen shines. You see, this is much better than the FLUX Krea, and it is
00:12:46 much more accurate. Or this is like a 3D scene. Again, the Qwen is shining, much, much better.
00:12:53 But when we come to the human, FLUX Dev is, as we know, it is not very good. FLUX Krea is excellent,
00:12:59 shines at the human, and Qwen is not there yet. I mean, the realism is not there yet for humans,
00:13:06 but the base is so good. So with LoRAs, which I assume they will pop anytime, it will get
00:13:13 much better, or with just fine-tuning. But when it comes to understanding prompts, it shines. I mean,
00:13:19 look at this prompt. Just pause the video and read it. This is FLUX Dev. Look at this. This is FLUX
00:13:25 Krea. I mean, nothing like that. And this is Qwen Realism, and this is Qwen High Quality. Qwen High
00:13:34 Quality is just mind-blowing. It is able to follow the prompt amazingly. It is just so perfect.
00:13:40 I shared these grids, they are in the post, they are public, you don't need to be even subscribers,
00:13:45 so you can just download and look at your computer. Again, this is a realism-related
00:13:50 prompt, and the FLUX Dev, as we know, not very good. FLUX Krea, really good,
00:13:57 realistic. Qwen Realism preset we prepared, it is also pretty decent at this prompt,
00:14:03 and this is Qwen High Quality. The realism is not there yet.
00:14:07 Hopefully, I will make a full tutorial and a very easy-to-use graphical user interface to
00:14:12 train Qwen with Kohya's Musubi Tuner. Kohya is working on that, and we will be able to generate
00:14:19 amazing LoRAs and amazing fine-tunes from Qwen. I am pretty confidently saying to you that this
00:14:26 is our new very best text-to-image model which we will use from this moment. And let's see some of
00:14:34 the more generations that have been completed. Currently, I am on vacation, therefore I am not
00:14:40 on my regular computer. So, I am using Vast.ai compute to generate these images. But you know
00:14:46 I have covered all of them. You can use Vast.ai compute, you can use RunPod, you can use your own
00:14:51 local GPU, and it will work very well because we are using ComfyUI backend and therefore,
00:14:58 it is fully optimized, working amazing. And the image quality is just mind-blowing. We don't see
00:15:05 anatomy errors, we don't see, you know, other things that shouldn't be there. I mean, look at
00:15:10 this. The foot is accurate, you see? Look at this. The fingers are accurate. I mean, everything is
00:15:17 accurate. This is just an amazing model, believe me. You will love this model. So therefore,
00:15:22 I'm recommending you to try this model, and I am pretty sure that it will be your new best model.
00:15:28 Hopefully, see you in future tutorial videos. Ask me any questions from Patreon, from YouTube reply,
00:15:34 join our Discord channel. I am expecting you there. All of them is just available to you
00:15:40 publicly. And the new images are coming. So if you are wondering how I am generating these random
00:15:45 things, I went to the wildcards and generated with wildcard, you see? Every line becomes a new
00:15:52 prompt. This is how it works. So it is generating them like this. So in the prompt, I just typed
00:15:58 it. When you click it, it adds it to there, and it will randomly pick a prompt and generate it.
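The wildcard idea (one prompt per line, one line picked at random per generation) is simple enough to sketch outside SwarmUI. This is only an illustration of the concept, not SwarmUI's implementation: the sample lines, the function names, and treating a seed of -1 as "pick a fresh random seed" are assumptions.

```python
import random

# Hypothetical wildcard file content: every non-empty line is one prompt.
WILDCARD_TEXT = """\
A huge dragon with detailed scales facing a warrior on a mountain ridge.
An anime girl reading a book under cherry blossoms.
A dinosaur walking through a misty prehistoric forest.
"""


def load_wildcard(text):
    """Split wildcard text into a list of prompts, skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def pick_prompt(lines, seed=-1):
    """Pick one prompt. A seed of -1 means 'random each time' (assumed
    convention); any other seed gives a repeatable choice."""
    rng = random.Random(None if seed == -1 else seed)
    return rng.choice(lines)


prompts = load_wildcard(WILDCARD_TEXT)

# Simulate a short batch: each generation draws its own random line.
for _ in range(5):
    print(pick_prompt(prompts))
```

With seed -1 and a large image count, as in the video, each generation draws a new line, so a long list of prompts keeps producing varied images without any manual typing.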
00:16:06 You see? So good. Okay, this is another good image. I mean, really good, really, really.
00:16:11 And this is another one. The composition, the complex prompt following, it is just perfect.
00:16:18 So when we fine-tune this model with ourselves or our art set, style, I think it will be amazing.
00:16:26 And I am hoping that this model also will be able to learn multiple subjects, multiple person at a
00:16:33 time. So we will see. And this is the thing that I did. Seed -1, 999 images and just generate. When
00:16:43 I click just generate, it generates. Okay, thank you so much. Hopefully, see you later.
