
FLUX LoRA Training Simplified From Zero to Hero with Kohya SS GUI 8GB GPU Windows Tutorial Guide


FLUX LoRA Training Simplified: From Zero to Hero with Kohya SS GUI (8GB GPU, Windows) Tutorial Guide



Ultimate Kohya GUI FLUX LoRA training tutorial. This tutorial is the product of 9 days of non-stop research and training. I have trained over 73 FLUX LoRA models and analyzed them all to prepare this tutorial video. The research is still ongoing, and hopefully the results will be significantly improved; the latest configs and findings will be shared. Please watch the tutorial without skipping any part. Whether you are a beginner or an expert user, this tutorial covers everything for you.

🔗 Full Instructions and Links Written Post (the one used in the tutorial) ⤵️

▶️ https://www.patreon.com/posts/click-to-open-post-used-in-tutorial-110879657

00:00:00 Full FLUX LoRA Training Tutorial

00:03:37 Guide on downloading and extracting Kohya GUI

00:04:22 System requirements: Python, FFmpeg, CUDA, C++ tools, and Git

00:05:40 Verifying installations using the command prompt

00:06:20 Kohya GUI installation process and error-checking

00:06:59 Setting the Accelerate option in Kohya GUI, with a discussion of choices

00:07:50 Use of the bat file update to upgrade libraries and scripts

00:08:42 Speed differences between Torch 2.4.0 and 2.5, particularly on Windows and Linux

00:09:54 Starting Kohya GUI via the gui.bat or automatic starter file

00:10:14 Kohya GUI interface and selecting LoRA training mode

00:10:33 LoRA vs. DreamBooth training, with pros and cons

00:11:03 Emphasis on extensive research, with over 72 training sessions

00:11:50 Ongoing research on hyperparameters and future updates

00:12:30 Selecting configurations based on GPU VRAM size

00:13:05 Different configurations and their impact on training quality

00:14:22 "Better colors" configuration for improved image coloring

00:15:58 Setting the pre-trained model path and links for downloading models

00:16:42 Significance of training images and potential errors

00:17:08 Dataset preparation, emphasizing image captioning, cropping, and resizing

00:17:54 Repeating and regularization images for balanced datasets

00:18:25 Impact of regularization images and their optional use in FLUX training

00:19:00 Instance and class prompts and their importance in training

00:19:58 Setting the destination directory for saving training data

00:20:26 Preparing training data in Kohya GUI and generated folder structure

00:21:10 Joy Caption for batch captioning images, with key features

00:21:52 Joy Caption interface for batch captioning

00:22:39 Impact of captioning on likeness, with tips for training styles

00:23:26 Adding an activation token to prompts

00:23:54 Image caption editor for manual caption editing

00:24:53 Batch edit options in the caption editor

00:25:34 Verifying captions for activation token inclusion

00:26:06 Kohya GUI and copying info to respective fields

00:27:01 "train images img" folder path and its relevance

00:28:10 Setting different repeating numbers for multiple concepts

00:28:45 Setting the output name for generated checkpoints

00:29:03 Parameters: epochs, training dataset, and VAE path

00:29:21 Epochs and recommended numbers based on images

00:30:11 Training dataset quality, including diversity

00:31:00 Importance of image focus, sharpness, and lighting

00:31:42 Saving checkpoints at specific intervals

00:32:11 Caption file extension option (default: TXT)

00:33:20 VAE path setting and selecting the appropriate ae.safetensors file

00:33:59 Clip large model setting and selecting the appropriate file

00:34:20 T5 XXL setting and selecting the appropriate file

00:34:51 Saving and reloading configurations in Kohya GUI

00:35:36 Ongoing research on clip large training and VRAM usage

00:36:06 Checking VRAM usage before training and tips to reduce it

00:37:39 Starting training in Kohya GUI and explanation of messages

00:38:48 Messages during training: steps, batch size, and regularization factor

00:39:59 How to set virtual RAM memory to prevent errors

00:40:34 Checkpoint saving process and their location

00:41:11 Output directory setting and changing it for specific locations

00:42:00 Checkpoint size and saving them in FP16 format for smaller files

00:43:21 Swarm UI for using trained models and its features

00:44:02 Moving LoRA files to the Swarm UI folder

00:44:41 Speed up Swarm UI on RTX 4000 series GPUs

00:45:13 Generating images using FLUX in Swarm UI

00:46:12 Generating an image without a LoRA using test prompts

00:46:55 VRAM usage with FLUX and using multiple GPUs for faster generation

00:47:54 Using LoRAs in Swarm UI and selecting a LoRA

00:48:27 Generating an image using a LoRA

00:49:01 Optional in-painting face feature in Swarm UI

00:49:46 Overfitting in FLUX training and training image quality

00:51:59 Finding the best checkpoint using the Grid Generator tool

00:52:55 Grid Generator tool for selecting LoRAs and prompts

00:53:59 Generating the grid and expected results

00:56:57 Analyzing grid results in Swarm UI

00:57:56 Finding the best LoRA checkpoint based on grid results

00:58:56 Generating images with wildcards in Swarm UI

01:00:05 Save models on Hugging Face with a link to a tutorial

01:00:05 Training SDXL and SD1.5 models using Kohya GUI

01:03:20 Using regularization images for SDXL training

01:05:30 Saving checkpoints during SDXL training

01:06:15 Extracting LoRAs from SDXL models

Video Transcription

  • 00:00:00 Hello everyone, today I will be guiding you  step-by-step through the process of training

  • 00:00:06 LoRA on the latest state-of-the-art text-to-image  generative AI model, FLUX. Over the past week,

  • 00:00:14 I have been deeply immersed in research, working  tirelessly to identify the most effective training

  • 00:00:22 workflows and configurations. So far, I have  completed 72 full training sessions and more

  • 00:00:30 are underway. I have developed a range of unique  training configurations that cater to GPUs with

  • 00:00:37 as little as 8GB of VRAM all the way up to 48GB.  These configurations are optimized for VRAM usage

  • 00:00:46 and ranked by training quality. Remarkably,  all of them deliver outstanding results. The

  • 00:00:53 primary difference lies in the training speed.  So yes, even if you are using an 8GB RTX GPU,

  • 00:01:01 you can train an impressive FLUX LoRA at  a respectable speed. For this tutorial,

  • 00:01:07 I will be using Kohya GUI, a user-friendly  interface built on the acclaimed Kohya

  • 00:01:13 training scripts. With this graphical user  interface, you will be able to install,

  • 00:01:18 set up, and start training with just mouse clicks.  Although this tutorial demonstrates how to use

  • 00:01:24 Kohya GUI on a local Windows machine, the process  is identical for cloud-based services. Therefore,

  • 00:01:32 it is essential to watch this tutorial to  understand how to use Kohya GUI on cloud

  • 00:01:37 platforms, even though I will be making separate  tutorials specifically for cloud setups. We will

  • 00:01:43 cover everything from the basics to the expert  settings. So even if you are a complete beginner,

  • 00:01:50 you will be able to fully train and utilize an  amazing FLUX LoRA model. The tutorial is organized

  • 00:01:56 into chapters and includes manually written  English captions, so be sure to check out the

  • 00:02:02 chapters and enable captions if you need them. In  addition to training, I will also show you how to

  • 00:02:08 use the generated LoRAs within the Swarm UI and  how to perform grid generation to identify the

  • 00:02:15 best training checkpoint. Finally, at the end of  the video, I will demonstrate how you can train

  • 00:02:21 Stable Diffusion 1.5 and SDXL models using the  latest Kohya GUI interface. So I have prepared

  • 00:02:30 an amazing written post where you will find all  of the instructions, links, and guides for this

  • 00:02:39 amazing tutorial. The link to this post will be in  the description of the video, and this post will

  • 00:02:45 get updated as I get new information as I complete  my new research with new hyperparameters and new

  • 00:02:53 features. So this post will be your ultimate  guide to follow this tutorial, to do FLUX LoRA

  • 00:03:02 training by using the Kohya GUI. Kohya GUI is  a wrapper developed by the Bmaltais that allows

  • 00:03:09 us to use Kohya SS scripts very easily. This  is its official GitHub repository. Basically,

  • 00:03:16 we are using Kohya SS. However, we have a GUI  and one-click installers and easy setup to use

  • 00:03:23 the Kohya SS scripts. So you will see that I have  a zip file here. When you go to the bottom of the

  • 00:03:31 post, you will also see the attachments, and in  the attachments, you will see the zip file. I

  • 00:03:37 may have forgotten to put the zip file at the  very top of the post, but when you look at the

  • 00:03:42 attachments, you will always find the zip file. So  let's download the zip file. Click here. Move this

  • 00:03:48 zip file to any disk that you want to install. I  am going to install it on my R drive. Do not

  • 00:03:55 install it in your Windows folder or into your users folder. Install it directly into any drive, and do not

  • 00:04:02 install it in your cloud drives. Okay, right-click  and extract it. Enter the extracted folder,

  • 00:04:09 and you will see all of these files. These files  may get updated, and there may be more files when

  • 00:04:16 you're watching this tutorial. Don't get confused.  I will update everything as necessary. Currently,

  • 00:04:22 the main branch of the Kohya GUI doesn't support  the FLUX training. You see, there is a SD3 FLUX

  • 00:04:28 branch. So my installer will automatically switch  to the FLUX branch and install it for you. Once it

  • 00:04:36 is merged into the main branch main repository,  I will update my installers. To install Kohya

  • 00:04:42 GUI we are going to use the Windows_Install_Step_1.bat file. Double-click it. It will clone the

  • 00:04:47 repository, switch to the accurate branch, and it  will start the installation. For this to work, you

  • 00:04:54 need to have the requirements installed. You'll  see that we have a Windows requirements section.

  • 00:05:00 We need Python 3.10.11, FFmpeg, CUDA 11.8, C++  tools, and Git. Now, once you install these,

  • 00:05:09 you will be able to use all of the open-source AI  applications like Stable Diffusion, Automatic1111 Web

  • 00:05:15 UI, Forge Web UI, One Trainer, Swarm UI, ComfyUI,  and whatever you can imagine, like Rope Pearl, Face

  • 00:05:23 Fusion, and such things. So I have a very good  tutorial that shows how to install all of this.

  • 00:05:29 Please watch this tutorial. Do not skip it. Please  make sure that your Python is directly installed

  • 00:05:34 on your C drive. You have installed FFmpeg,  CUDA, and C++ tools. These are all important.
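
If you prefer to confirm all of these prerequisites in one go, a small script along the following lines can run the same version checks (a sketch, not part of the original tutorial; it assumes the tools are on your PATH). The manual command-prompt checks are described next.

```python
# Quick prerequisite check: Python, Git, FFmpeg, and the CUDA toolkit (nvcc).
# Assumes each tool is on PATH; FFmpeg is optional for Kohya itself.
import shutil
import subprocess

CHECKS = {
    "python": ["python", "--version"],   # expect Python 3.10.11
    "git":    ["git", "--version"],
    "ffmpeg": ["ffmpeg", "-version"],
    "nvcc":   ["nvcc", "--version"],     # shows the installed CUDA toolkit
}

for name, cmd in CHECKS.items():
    if shutil.which(cmd[0]) is None:
        print(f"[MISSING] {name} is not on PATH")
        continue
    result = subprocess.run(cmd, capture_output=True, text=True)
    lines = (result.stdout or result.stderr).strip().splitlines()
    print(f"[OK] {name}: {lines[0] if lines else 'no output'}")
```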

  • 00:05:40 How can you check whether you have installed them  or not? Type CMD and open a command prompt like

  • 00:05:45 this. Type Python and you should get 3.10.11. Open  another CMD and type Git and you should get Git

  • 00:05:52 code like this. Open another CMD and type FFmpeg  and you should get FFmpeg. FFmpeg is not necessary

  • 00:05:59 for Kohya, but you should install it because  you may use other AI applications. And type nvcc

  • 00:06:06 --version, and this will let you see whether you  have installed the correct CUDA or not. Everything

  • 00:06:13 is explained in this video. Please watch it to  avoid any errors. So the automatic installation

  • 00:06:20 will ask you these options. We are going to select  option 1 and hit enter. Then this will start

  • 00:06:26 installing the Kohya SS GUI latest version to  our computer. Currently, we are in the accurate

  • 00:06:32 branch. As I said, I will update this branch when  it is merged into the master branch. Just wait

  • 00:06:40 patiently. You see that it is showing me I have  Python 3.10.11. It is installed, and it is going

  • 00:06:46 to download and install everything automatically  for me. So the installation has been completed,

  • 00:06:52 and I have the new options. You should scroll up  and check whether there were any errors or not

  • 00:06:59 because the installation step is extremely  important. So scroll up and check all of the

  • 00:07:05 messages. So as a next step, we are going to set  the Accelerate because on some computers it may

  • 00:07:11 not be set correctly. So type 5 and hit enter.  It is going to ask us some of the options. We

  • 00:07:18 are going to use this machine. Hit enter. We are  not going to do distributed training. Hit enter.

  • 00:07:24 We are not training on Apple or anything.  We are training on CUDA drivers. So no,

  • 00:07:29 we are not using Torch Dynamo. No, we are not  using DeepSpeed. No, we are going to use all of

  • 00:07:36 the GPUs. So type "all" and hit enter. I also say  yes to this, but I didn't see any difference. Hit

  • 00:07:42 enter. And we are going to use BF16. This one. So  select this one and hit enter. And we are ready.
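
As an aside, the same single-machine, BF16 setup can also be written without the interactive prompts by calling Accelerate's helper from Python (a hedged sketch; run it in the Python environment Kohya GUI installed, and treat it as optional, since the menu above does the same thing).

```python
# Optional: write a basic single-machine Accelerate config with bf16 mixed
# precision, instead of answering the interactive "accelerate config" prompts.
# Run this inside the Python environment that Kohya GUI uses (an assumption).
from accelerate.utils import write_basic_config

write_basic_config(mixed_precision="bf16")
```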

  • 00:07:50 You don't need to do option 2, option 3, or option 4. But we are not going

  • 00:07:56 to use option 6 to start yet. So hit 7 and hit  enter. Return back to the folder. And whenever you

  • 00:08:03 are going to start training or after the initial  installation, run this bat file: Update_Kohya_and_Fix_FLUX_Step2.bat

  • 00:08:11 This is going to upgrade to  the latest libraries. It is going to also update

  • 00:08:16 the scripts to the latest version. So we are  going to have the very latest version with this

  • 00:08:22 bat file. I will keep these files updated. Just  wait for it to install everything. The steps are

  • 00:08:29 also written in this post file, but watching  this video is better. And the updates have been

  • 00:08:34 completed. Verify that they are all working. We  are currently using Torch 2.4.0. And it is working

  • 00:08:42 slowly on Windows compared to the Linux systems.  However, with Torch 2.4.1, hopefully this speed

  • 00:08:52 issue on Windows will be fixed. And as soon as  it is released, I am going to update my installer

  • 00:09:00 scripts. There has been a new development after I  have completed the tutorial video, which is Torch

  • 00:09:08 2.5 version. With this version, there is a huge  speed improvement on Windows. It is almost equal

  • 00:09:18 to Linux. However, Torch 2.5 version is currently  in development. So it may be broken. It may give

  • 00:09:26 you errors. I also didn't test its quality yet.  Hopefully, I will test it and I will update the

  • 00:09:32 Patreon post. You see, it is already updated. The  file that you need to use is: Install_Torch_2_5_Dev_Huge_Speed_Up.bat file

  • 00:09:43 after completing  the installation. As I said, this is experimental

  • 00:09:46 and follow the news on the Patreon post. And now  we are ready to start using the Kohya SS GUI. So

  • 00:09:54 for starting the Kohya SS GUI, you can either  enter the Kohya SS folder and start the gui.bat

  • 00:10:01 file. Or you can use my automatic starter file,  which is the Windows start KohyaSS.bat file. This

  • 00:10:08 will automatically start and open the browser.  You see the interface has started. If it doesn't

  • 00:10:14 start, you need to open this manually. All right,  now this is the interface of Kohya. It is very,

  • 00:10:19 very useful and easy to use. A very important  thing that you need to be careful of is that

  • 00:10:25 you need to select LoRA because we are currently training a LoRA. LoRA is an optimization technique. So it is

  • 00:10:33 different from DreamBooth. DreamBooth is basically  fine-tuning, training the entire model. But with

  • 00:10:38 LoRA, we are only training a certain part of the model. Thus, it requires less hardware,

  • 00:10:46 but also results in lower quality. Currently, this tutorial is for LoRA training. Hopefully,

  • 00:10:51 I am going to do full research on DreamBooth  fine-tuning and publish another tutorial for

  • 00:10:57 that. And I am going to publish configurations as  well. So this tutorial is so far the combination

  • 00:11:03 of over 64 different trainings. This  is literal. When you go to this post,

  • 00:11:09 you will see the entire research history. This is  a very, very long, lengthy post. You will see all

  • 00:11:15 of the tests I have made. You will see all of the  results, grids, and comparisons. I am also going

  • 00:11:21 to show some of the parts in this tutorial.  So read this post if you want to learn how I

  • 00:11:27 came up with my workflow and configuration. Also,  when you open this post, you will see all of the

  • 00:11:33 models that I have prepared up to now. And this is  not all. Currently, I am running 8 different

  • 00:11:41 trainings on 8x RTX A6000 GPUs on Massed Compute.  You can see the trainings are running right now.

  • 00:11:50 Currently, I am testing a new hyperparameter  and the clip large text encoder training,

  • 00:11:56 which has arrived just today. So after these tests  have been completed, I am going to analyze them,

  • 00:12:03 post the results on this research topic, and I  am going to update the configuration files. Don't

  • 00:12:09 worry, you will just load the configuration. You  will just download the latest zip file and load

  • 00:12:14 the configuration, and you will get the very best  workflow, very best configuration whenever you are

  • 00:12:20 watching this tutorial. So that is why following  this post is extremely important. The link will

  • 00:12:25 be in the description and also in the comment  section. Don't forget that. So now our interface

  • 00:12:30 started and I am going to load the configuration  for beginning the setup. I have selected the

  • 00:12:36 LoRA in the training tab. When you look at the  folder, you will see best configs and you will

  • 00:12:41 see best configs, better colors. And what do these  configurations mean? When you scroll down, you

  • 00:12:49 will see the description of each configuration. I  have prepared a configuration for every GPU. So if

  • 00:12:56 you have an 8GB GPU, you need to use this one:  Rank_9_7514MB.json file. When you enter inside

  • 00:13:05 the best configs folder, you will see that the  JSON file is there. So according to your VRAM,

  • 00:13:12 pick the configuration file. There are slow and  fast versions. What do they mean? With the slow

  • 00:13:19 version, we are going to get slightly better  training quality. There is not much difference

  • 00:13:25 between rank 1 and rank 6. After rank 6, we are  going to lose some quality because of the reduced

  • 00:13:34 training resolution and also reduced LoRA rank.  If you read the research post, you will see the

  • 00:13:40 difference of each configuration. Training under  1024 pixels seriously reduces the quality. Also,

  • 00:13:49 I find that LoRA rank 128 is the very best spot.  So these three configurations will have slightly

  • 00:13:57 lesser quality than these ones. These ones will  be very, very close. Since I have an RTX 3090 GPU,

  • 00:14:05 I am going to use the very best configuration  rank 3. Currently, this is running slow on my PC,

  • 00:14:13 but with the Torch version 2.4.1, it will be much,  much faster. So go to here and click this icon,

  • 00:14:22 this folder icon, enter inside the folder of best  config and load the rank 3. Now you may wonder

  • 00:14:30 what is the difference with better colors. This  is in experimentation. Currently, I am training

  • 00:14:36 it to decide whether to fully switch to it or  not. But this uses time step shift sampling

  • 00:14:43 and it makes a huge impact on the coloring. When  you open this file, you will see this very big,

  • 00:14:50 huge grid. And when you look at this grid file,  you will see that using the time step shift

  • 00:14:57 brings better colors and a better environment like  this one. However, it also slightly reduces the

  • 00:15:03 likeness. I am still researching it. I am still  training it. When we open this imgsli link,

  • 00:15:09 you can see a one by one comparison. So according  to your needs, decide the configuration you want.

  • 00:15:15 Hopefully, I will complete research on time  step shift sampling and I will decide the

  • 00:15:20 very best configuration. Currently, we have two  different configurations and their difference is

  • 00:15:25 like this. But to be safe, you can use the best  configs folder right now. However, you can also

  • 00:15:30 use best configs better colors too. So look at  the grid file imgsli and decide yourself. OK,

  • 00:15:37 as a next step, you don't need to set anything  here. This is for multi GPU. I also have multi

  • 00:15:42 GPU configs and I will show them hopefully in the  cloud tutorial. I am going to also make a cloud

  • 00:15:47 tutorial for Massed Compute and RunPod. So  what you need to set with this configuration,

  • 00:15:52 you need to set pre-trained model path. This is  super important. The model links are posted here

  • 00:15:58 so you can either download them from here or I  have a one-click downloader here. You see Windows

  • 00:16:03 download training models files. If you already  have some of the models, make sure that they

  • 00:16:08 are accurate models or you will get errors. This  bat file will download everything automatically

  • 00:16:14 for me. You see, it started downloading already.  However, I already have them downloaded here. So

  • 00:16:19 I am going to just pick them. If you already have  them, you can use them. We are going to use FLUX

  • 00:16:26 Dev version 23.8 gigabytes. The FP8 support also  arrived, but I haven't tested it. So to be sure,

  • 00:16:34 use the FP16 version. It doesn't matter because we  are going to use it in FP8 mode and training with

  • 00:16:42 24 gigabyte GPUs and it will cast the precision  automatically. So select this. You see, this is

  • 00:16:48 the base model that I have. Now selecting training  images is super important and so many people are

  • 00:16:55 making mistakes here. If you know how to set up  Kohya folder structure, you can already set it up

  • 00:17:01 yourself, but don't use it if you are not an  expert. So in the dataset preparation section,

  • 00:17:08 we are going to prepare our dataset. The dataset  preparation is extremely important. I have written

  • 00:17:15 some information here on how to caption them, how  to crop them. I already have an auto cropper and

  • 00:17:22 auto resize script, but you can also manually  auto crop and auto resize them. So currently my

  • 00:17:29 training images are auto cropped and auto resized  to 1024 pixels as you are seeing right now. These

  • 00:17:36 scripts that you will find are extremely useful.  Watch this video and check out this script to

  • 00:17:41 automatically crop the subject with focus and  then resize to get your training dataset. Once

  • 00:17:47 you have your dataset like this, copy the path of  this dataset or you can also select the path from

  • 00:17:54 here. You see training images directory. Click  this icon, go to the folder like this and select

  • 00:18:00 the folder like this. Now this is super important:  repeating. What does repeating mean? We are going

  • 00:18:07 to set the repeating to 1 because we are not  going to use classification / regularization images.

  • 00:18:13 I have tested the impact of regularization and  classification images. You can see the results in

  • 00:18:17 this research post. For example, let's open this  post and no matter what I have tried, there is no

  • 00:18:25 way to improve the likeness and the quality with  classification regularization images. In none of

  • 00:18:31 the cases it yielded better results. When I have  used the classification regularization images,

  • 00:18:38 you see, you will get mixed faces like this. And if you still want to know what repeating means,

  • 00:18:45 there is a link here. Open this link. In this  link, I have asked Kohya to explain the logic

  • 00:18:53 of repeating. The logic of repeating is initially  made to balance imbalanced datasets. What does

  • 00:19:00 that mean? Let's say you have 5 concepts that  you want to train at once. And one of the concepts

  • 00:19:06 has 100 training images. The other one has 20  images. The other one has 30 images. So in machine

  • 00:19:12 learning, you would like to keep the datasets balanced so that each concept is trained roughly equally. Therefore, if you have

  • 00:19:19 100 training images, you make the repeating 1.  And for other concepts, if there are 20 images,

  • 00:19:25 you make the repeating 5 of that concept.  However, since I am training a single concept

  • 00:19:31 right now, I am going to set repeating 1 and I  am not going to use regularization images because

  • 00:19:37 for FLUX training, it doesn't improve the results.  However, when you train Stable Diffusion 1.5 or

  • 00:19:44 Stable Diffusion XL (SDXL), it really improves the  quality of training. At the end of this tutorial,

  • 00:19:51 I will show you how to load the very best  configuration for SD1.5 and SDXL and train

  • 00:19:58 them. Don't worry about that. So we did set the  training images directory here. We are going to

  • 00:20:03 set the instance prompt. I am going to use  a rare token ohwx because it contains very

  • 00:20:09 little knowledge. Also, with FLUX training, it is  not as important as before, but still use it. And

  • 00:20:16 as a class prompt, I am going to use "man." Even  if you don't use a class prompt, since the FLUX

  • 00:20:23 model has an internal system that behaves like  a text encoder and automatic captioning, it will

  • 00:20:30 still know what you are training, as if you had fully captioned it. So even if you don't provide

  • 00:20:36 any instance prompt or class prompt, it will work  with FLUX. I have also tested it. In the research

  • 00:20:43 topic you will see the training results with ohwx  man, with only ohwx, and with only man. And you

  • 00:20:52 will see that almost in all cases, it perfectly  generates your concept. Still, I find it a little

  • 00:21:00 bit better to set the class prompt as "man" or if  you are training a woman, it's "woman." If you are

  • 00:21:05 training a car, it's "car." If you are training  a style, it's "style." But training for a man,

  • 00:21:10 I use "man." Even if you don't use it, it will  work. If you are training multiple people in one

  • 00:21:16 training, you don't need to set a class prompt.  Just give each one of them a rare token like ohwx,

  • 00:21:23 like bbuk, and other rare tokens that you can  decide yourself, and train all of them at once.

  • 00:21:30 But currently, this is for single-person training.  So this is the setup. Instance prompt, class

  • 00:21:35 prompt, and destination directory. This is super  important. The Kohya GUI will prepare the correct

  • 00:21:41 folders and save them at that destination. So I am  going to use the installed folder like here. You

  • 00:21:48 can use anywhere, and I will save train images  like this and then click prepare training data

  • 00:21:55 and watch the CMD. You will see that it is done  creating the Kohya SS training folder structure.
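
For reference, the layout this step produces looks roughly like the sketch below; the destination name and the "1_ohwx man" folder match what is described next, while the exact drive path and the extra log/model folders are assumptions on my part.

```python
from pathlib import Path

# Recreate (roughly) what "Prepare training data" builds: an img/ directory
# containing "<repeats>_<instance prompt> <class prompt>". The path below is
# hypothetical; the log/ and model/ folders are an assumption.
dest = Path(r"R:\train images")
(dest / "img" / "1_ohwx man").mkdir(parents=True, exist_ok=True)
for extra in ("log", "model"):
    (dest / extra).mkdir(exist_ok=True)
# Your training images (and optional .txt captions) go inside img/1_ohwx man.
```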

  • 00:22:01 When you enter inside that folder, you see the  train images folder generated here. When I enter

  • 00:22:07 inside it, you see there is an img folder and inside it, there is 1_ohwx man. So what does this

  • 00:22:15 mean? This means that the one is the repeating.  This is super important. If you set this to 100,

  • 00:22:21 at every epoch, it will repeat 100 times. So this  is super important. Kohya will always read the

  • 00:22:27 repeating number from here, and the rest will be  simply the caption of my images. If I wanted to

  • 00:22:34 caption them, I could caption them, and it will  read the captions. If you don't caption your

  • 00:22:39 images, it will read the folder name as a caption.  So let's also make a captioned example. I will

  • 00:22:45 copy and paste this like this. The name of this  folder will not be important because I'm going to

  • 00:22:49 caption it. So copy this path, and for captioning,  I have an amazing application called Joy Caption.

  • 00:22:54 Click here. When you get to this post, you will  see the installer for Joy Captioning. This is a

  • 00:23:00 very advanced application. It supports multi  GPU captioning as well because I am going to

  • 00:23:06 caption my entire regularization images dataset,  and I am going to make a LoRA on that. There will

  • 00:23:12 be a total of 20,000 images that I am going to  caption. So it supports multi GPU and also batch

  • 00:23:19 captioning. It is already installed. Let me open  it and show you how to caption. So Joy Caption 1.

  • 00:23:25 Let's start the Windows application. Let's start  like this. The application started. All I need

  • 00:23:31 to do is give the input folder path here and just  start batch captioning. You can decide on multiple

  • 00:23:39 GPUs, multiple batch sizes. All is working.  This is a very optimized application. There

  • 00:23:43 are options to override the existing caption file, append to the caption file, and set max new tokens. You don't

  • 00:23:49 need to set these parameters. You can just change  the max new tokens. I'm going to make a dedicated

  • 00:23:54 tutorial for this application hopefully later. You  can see the progress on the CMD here. So first it

  • 00:24:01 is loading the checkpoints to start captioning.  The captioning started and it is captioning the

  • 00:24:07 files. When I enter inside this folder, you will see that it is generating captions like this. When

  • 00:24:12 you open the text file, you will see the caption  it generated for that image. However, if you do

  • 00:24:19 captioning, it will reduce the likeness of your  concept. If you are training a person or if you

  • 00:24:26 are training similar stuff. Captioning with styles  may work better. I am hopefully going to test it,

  • 00:24:32 but still, if you want to do captioning,  you can use this application to caption. I

  • 00:24:37 have also compared the captioning effect. So let's  search the caption in this post and let's see.

  • 00:24:45 Yes, here the captioning results are here. Let's  open it. And when we look at the caption results,

  • 00:24:51 there will be a slightly reduced likeness.  However, I didn't see improvement in the

  • 00:24:58 environment or generalization or the overfitting.  This is because it already captioned itself in the

  • 00:25:06 FLUX architecture. So whether I caption or not, it  doesn't matter much. The likeness is still there.

  • 00:25:12 It's a little bit reduced and it doesn't improve  the overall quality. So for person training,

  • 00:25:18 I don't suggest captioning, but for training a  style, it may work better. As I said, hopefully

  • 00:25:23 I am going to research it and make a tutorial  for style captioning. After you did the caption,

  • 00:25:28 you should still add an activation token. What was the activation token? The instance prompt, ohwx. So you

  • 00:25:34 can either edit them manually or I have a caption  editor. The caption editor is posted here. When

  • 00:25:41 you go to this link, you will see the instructions  and downloader zip file. It's a very, very

  • 00:25:47 lightweight Gradio application. Let's open the  image caption editor, start the application. It

  • 00:25:53 doesn't use any GPU. It is just Python code that allows you to edit the captions of your images. It is web-based so

  • 00:26:00 it can be used anywhere. So you need to enter the  input folder path. This was our folder path. Let's

  • 00:26:06 enter it and it will automatically refresh and  scan the images. You see, these are the images.
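
If you would rather script this step, a small loop like the one below (a sketch, not the author's tool) prepends the activation token to every .txt caption in a folder; the caption editor's batch-edit feature shown a bit later achieves the same result through the UI.

```python
from pathlib import Path

# Prepend the activation token to every caption file in a dataset folder.
# The folder path is hypothetical; the token matches this tutorial's setup.
caption_dir = Path(r"R:\train images\img\1_ohwx man")
token = "ohwx man"

for txt in caption_dir.glob("*.txt"):
    caption = txt.read_text(encoding="utf-8").strip()
    if token not in caption:                       # avoid adding it twice
        txt.write_text(f"{token}, {caption}", encoding="utf-8")
```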

  • 00:26:10 When I click here, it will show me the caption.  I can either manually edit it here like ohwx.

  • 00:26:18 Then I can click save caption and it will save it.  Then I can click the next image. Then I can filter

  • 00:26:24 images by processed or unprocessed. This is an  extremely useful application. So with processed,

  • 00:26:30 you see this was the saved processed and these  are the still waiting ones. This is a very,

  • 00:26:35 very lightweight application. Let's make this  again as "man" because I am going to show you

  • 00:26:40 another thing. Let's save the caption. Then in  the batch edit options, you can enter the folder

  • 00:26:46 and you can replace words like replace "man" with  "ohwx man" and replace all occurrences and it is

  • 00:26:54 going to override all the files. Apply batch edit  and all captions are edited. Then you can check

  • 00:27:01 all the captions to see whether they contain my  activation token or not like this check word and

  • 00:27:07 you see all captions contain the specific word  or phrase. Then when I refresh here and open

  • 00:27:14 an image, you see this photograph features a  "ohwx man standing." So all the captions are

  • 00:27:20 now ready to use. Since this folder exists in  my images folder, it will train both of them,

  • 00:27:27 but we don't need captioned images now. So  I'm just going to delete this folder. OK,

  • 00:27:31 let's return back to the Gradio interface where we  set up. After we clicked, prepare training data,

  • 00:27:37 we also need to click "copy info to respective  fields." Otherwise, you will see that the image

  • 00:27:42 folder is not accurate. You see it is inaccurate.  So click "copy info to respective fields" and you

  • 00:27:48 will see that it did set the folder like this  "train images img." So what does this mean? It

  • 00:27:55 didn't give the path of this folder. It gives the  parent folder path here so you can have multiple

  • 00:28:02 concepts, multiple persons or items, anything  inside this folder, and Kohya will read each one

  • 00:28:10 of the folders, and whether you have captioned them or not, it will

  • 00:28:15 read the folder name or the captions and it will  train based on all of the images. You can also set

  • 00:28:21 different repeating numbers for each concept, as I  have explained. Let me show you. Let's say I have

  • 00:28:27 3 concepts. This can have 3 repeating.  This can have 5 repeating. So the images inside

  • 00:28:33 this one will be repeated 5 times. Images  inside this one will be repeated 3 times.
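
To make the repeat-balancing idea concrete, here is a sketch of how folder names encode the repeat count per concept; the image counts and the second token are illustrative, following the 100-image and 20-image example from earlier.

```python
from pathlib import Path

# Kohya reads the repeat count from the folder name: "<repeats>_<token> <class>".
# Balance concepts so each one is seen roughly equally per epoch.
image_counts = {"ohwx man": 100, "bbuk woman": 20}   # illustrative counts
largest = max(image_counts.values())

img_root = Path(r"R:\train images\img")              # hypothetical parent folder
for concept, count in image_counts.items():
    repeats = max(1, round(largest / count))         # 100 -> 1, 20 -> 5
    (img_root / f"{repeats}_{concept}").mkdir(parents=True, exist_ok=True)
    print(f"{repeats}_{concept}")                    # 1_ohwx man, 5_bbuk woman
```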

  • 00:28:37 This is the logic of setting up folders. Let's  delete them. Now we have set the model path,

  • 00:28:45 trained model output name, whatever the name you  give, it will generate checkpoints with that name.

  • 00:28:50 Let's say "Best_v2." OK, this will be  the output model name for LoRAs that are going

  • 00:28:58 to get generated. FLUX1 selected. Everything is  selected. You don't need to set anything and you

  • 00:29:03 don't need to use dataset preparation again, but  you can prepare multiple datasets by using here.

  • 00:29:09 With Stable Diffusion XL and with Stable Diffusion  1.5, we use regularization images. At the end

  • 00:29:15 of this tutorial, I will show that too. So I'm just skipping that part. Now, the parameters. In the parameters, what you

  • 00:29:21 need to set are a few things. First of all, how  many epochs you are going to train? We are going

  • 00:29:28 to train based on epochs and one epoch means  that all of the images are trained one time.

  • 00:29:35 When we use regularization images, we use a  different strategy. We train 200 repeating and

  • 00:29:41 1 epoch. But since we don't use regularization  images and since we use 1 repeating, we are

  • 00:29:46 going to use epoch strategy. So if you have 100  images, you can reduce this epoch number. However,

  • 00:29:53 you can still train up to 200 epochs and compare  checkpoints to get the very best checkpoint. So as

  • 00:29:59 you increase the number of images for a single  concept, you may want to reduce the number of

  • 00:30:04 epochs. Currently, most people will collect like  15 to 25 training images, maybe 50. So training

  • 00:30:11 up to 200 epochs and comparing checkpoints is  the best strategy. About the training dataset:

  • 00:30:17 this training dataset is not great. Why? Because  it contains the same clothing, same background

  • 00:30:25 environment, and it is lacking expressions. Many  people are asking me how to generate expressions.

  • 00:30:31 If you want to generate expressions, you need  to have expressions, emotions in your training

  • 00:30:37 dataset. I am preparing a much better training  dataset. It is not ready yet. I am still using

  • 00:30:43 the same dataset so I can compare it with my older  trainings. However, hopefully I will make another

  • 00:30:49 video for an amazing training dataset. You will  see that. So when you are preparing your training

  • 00:30:54 dataset, have different poses, have different  distances like full body shots, close shots, have

  • 00:31:00 different expressions that you want to generate  like laughing or crying, whatever you want, have

  • 00:31:06 different clothing and have different backgrounds.  The quality of the dataset is very important. With

  • 00:31:12 FLUX, it is still very flexible. It is better than  the SDXL or SD 1.5. However, as you improve your

  • 00:31:19 dataset, you will get better results. Another thing is to make sure that your dataset has

  • 00:31:24 excellent focus, sharpness, and lighting. This  is also very important. Do not take pictures at

  • 00:31:31 nighttime. So make sure that your images have  very good lighting, very good focus, and very

  • 00:31:36 good sharpness. OK, let's return back to Kohya  GUI. So I will leave this at 200. You can reduce

  • 00:31:42 this based on the number of images you have, how  many checkpoints you want to get. You can make

  • 00:31:47 this like 10, and after every 10 epochs, it will  generate a checkpoint. And at the end of training,

  • 00:31:53 you can compare all of them. I am going to show  you how to use and compare them. There is no issue

  • 00:31:58 with it. So you can make this 10 and compare all  the checkpoints and find the very best checkpoint

  • 00:32:04 you liked. So this is the most optimal way of  obtaining the very best model and checkpoint.

  • 00:32:11 But for this one, let's leave this as 25. Caption  file extension. If you are going to use captions

  • 00:32:18 instead of the folder names, you need to select  the correct caption extension. By default,

  • 00:32:23 it is TXT and TXT is the most used one. You can  also use caption and cap. I never used them.

  • 00:32:29 TXT is just working fine for me. Then you don't  need to change anything else here. By default,

  • 00:32:35 we are using 128 to 128 network rank. OK, this  is another thing that you need to set. This is

  • 00:32:42 super important: VAE path. So for VAE path, we set  this ae.safetensors file. It is already downloaded.

  • 00:32:51 Let's go to the folder, which was inside here. I had downloaded the ae.safetensors file here. And we

  • 00:32:58 are going to use clip large model. Click here.  Let's go to the clip large. OK, it is also set

  • 00:33:04 and we are going to use T5 XXL. Make sure to use  T5 XXL FP16. It will be auto cast to the correct

  • 00:33:12 precision. So let's also pick that file from our downloaded folder, which is here. And we are all

  • 00:33:20 set. You don't need to do anything else. Just  save the configuration wherever you want. So I

  • 00:33:26 will save the downloaded folder by clicking this  save. If you want to reload, you click this or

  • 00:33:31 click refresh to reload. You see the refresh will  refresh. You can also re-pick, but sometimes when

  • 00:33:36 you re-pick, it may not refresh. So click this  to refresh. Let's select the saved config again,

  • 00:33:42 which was this one. And let's refresh. Yes, I can  see the configuration reloaded. Currently, none of

  • 00:33:48 the configurations are training text encoder clip  large. As I said, I am researching it right now,

  • 00:33:54 and hopefully I will update the configuration  according to it. The training of clip large will

  • 00:33:59 increase the VRAM usage slightly. It increased  by like 800 megabytes on 16-bit precision.

  • 00:34:06 It will also slow down slightly. You see from  8.82 seconds IT to 8.56 seconds IT on a 16-bit

  • 00:34:16 precision training on a 6000 GPU. OK. As I said,  we are ready. And before clicking start training,

  • 00:34:24 you need to check your VRAM usage. Try to make  your VRAM usage under 500 megabytes. Currently,

  • 00:34:31 you see my computer is using 3.5 gigabytes. Why?  Because I am running OBS studio. I am running

  • 00:34:38 NVIDIA broadcast and some other applications. So  try to reduce your VRAM usage to 500 megabytes if

  • 00:34:45 you are on limited VRAM. How can you reduce it? Go to startup settings, disable all of the startup applications,

  • 00:34:51 restart your computer, and check the performance  and GPU and see how much VRAM you are

  • 00:34:58 using. Alternatively, you can also use another  application, which is nvitop. So to install

  • 00:35:05 nvitop, pip install nvitop. I already have  it done. Type nvitop and it will show you

  • 00:35:12 the VRAM usage, GPU utilization, how many GPUs  you have. You see I have two GPUs. So this is

  • 00:35:18 the way of also seeing exact VRAM usage.  You see this one shows 3.8 gigabytes I am

  • 00:35:25 using. This one is showing 3.5 dedicated GPU  I am using. So you can use either way. Then

  • 00:35:30 click start training and watch the progress  on the CMD window of the Kohya. You see it is

  • 00:35:37 starting everything. You should read the messages  appearing here. If you see this error, xFormers

  • 00:35:44 can't load. It is not an issue. Currently, we are not using xFormers. Actually, I had fixed this,

  • 00:35:49 but I am going to fix it again. But it is not an  issue because currently xFormers is not working

  • 00:35:54 better than the default cross-attention, which is  sdpa. I will also fix xFormers later, but you can

  • 00:36:01 ignore this xFormers message. It doesn't make any  difference. We are not using xFormers. So what are

  • 00:36:07 the messages we see here? This is important.  You see it has found 1 repeat. It has found

  • 00:36:13 15 images. So one epoch will be 15 steps because  the batch size is 1. Increasing the batch size,

  • 00:36:21 I have tested it too. It doesn't bring almost any  speed gain. And as you increase the batch size,

  • 00:36:27 you will get lesser quality training. Batch sizes  should be used only for speed. And in this case,

  • 00:36:35 it doesn't increase much. But you can use multiple  GPUs to get almost linear speed increase. I have

  • 00:36:41 explained the batch size in the Patreon post. So  read that section and you see the regularization

  • 00:36:47 factor is 1. Total steps 15. Train batch size  is 1. Gradient accumulation steps 1 and 200

  • 00:36:52 epochs. So it calculates the number of total  steps, which is 15 divided by 1 divided by

  • 00:36:58 1. Why? Because this is the batch size. This is  the gradient accumulation steps multiplied by 200.

  • 00:37:04 This is the number of epochs and multiplied  by 1. This is the regularization images,

  • 00:37:09 which we don't use. So it is 1. When you use  it, it is 2. And total, it will be 3000 steps.
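
As a sanity check, the same total can be computed directly from the numbers in this run (a small sketch of the arithmetic the script prints):

```python
# Step-count arithmetic for this run, using the values read from the log.
num_images = 15    # training images found
repeats    = 1     # from the "1_ohwx man" folder name
reg_factor = 1     # 1 = no regularization images, 2 = with them
batch_size = 1
grad_accum = 1
epochs     = 200

steps_per_epoch = (num_images * repeats * reg_factor) // (batch_size * grad_accum)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)   # 15 3000
```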

  • 00:37:16 I hit enter because when you touch the CMD, it  will pause the CMD window. And you will also

  • 00:37:22 see that it didn't find any captions. It didn't  find regularization images. So it is going to

  • 00:37:27 use class tokens or ohwx man, which it reads from  the folder name. So everything is looking good.

  • 00:37:34 And it will load and start training. The speed is  not very great. And you will get better speed as

  • 00:37:39 it progresses. In the beginning, it will not show an accurate speed. Once Torch 2.4.1 is published,

  • 00:37:48 we should get speed like 4 to 5 seconds IT on  Windows with an RTX 3090 GPU. But as I said,

  • 00:37:55 don't worry, I'm going to make cloud tutorials.  And it is going to be super fast on cloud GPUs.

  • 00:38:01 You will be able to rent very powerful GPUs.  And it is perfectly trainable. So this is how

  • 00:38:06 we do training on Windows. It will be the same on  Massed Compute and RunPod as well. The procedure

  • 00:38:12 will be the same. Just the starting and installation  will not be the same. So just wait until these

  • 00:38:18 checkpoints are saved. And how are we going to use  them afterward? Since I already have trained them,

  • 00:38:25 I am not going to wait for training. In this  repository, I have the training files, which is

  • 00:38:31 trained with exactly the same configuration.  You see the Best_v2. So I'm going

  • 00:38:37 to download several checkpoints to test on my  computer. When you are training, there is one

  • 00:38:42 thing that is extremely important that you have  to be careful of. Open the task manager, go to

  • 00:38:48 the performance tab, and go to your GPU. You see  that there is dedicated GPU and shared GPU memory.
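
If you also want to watch dedicated VRAM from a terminal (alongside Task Manager or nvitop), a short pynvml loop like this works as a sketch; note it cannot see the Windows shared GPU memory that Task Manager reports.

```python
# Print dedicated VRAM usage per GPU. Requires the nvidia-ml-py package
# ("pip install nvidia-ml-py"), which provides the pynvml module.
import pynvml

pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print(f"GPU {i}: {mem.used / 1024**3:.1f} / {mem.total / 1024**3:.1f} GB used")
pynvml.nvmlShutdown()
```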

  • 00:38:56 When you are doing training, if it uses shared  memory, it will become slower like 10 times,

  • 00:39:04 20 times. So this is the criterion that you need  to be careful of. Compare the shared GPU memory

  • 00:39:10 usage before starting the training and during the  training and make sure that it doesn't use more

  • 00:39:17 shared GPU during the training. If it does, then  you will have extremely slow training speeds. This

  • 00:39:24 is really important. This is also valid for all of  the AI applications you use, like Swarm UI, like

  • 00:39:32 Forge Web UI, like Automatic1111 Web UI, whatever comes to your mind. When it starts using shared

  • 00:39:38 VRAM, that will make your application slower like  20 times because it will start using the memory of

  • 00:39:46 your computer, and system RAM is way slower than the GPU memory. Moreover,

  • 00:39:52 if you have low RAM in some applications, you may  encounter a problem, but it is not an issue. You

  • 00:39:59 can set virtual RAM memory. So you see currently I  am training and I am not using any shared VRAM. My

  • 00:40:06 dedicated GPU memory still has space to go like  3 gigabytes. This is super important. And

  • 00:40:12 about RAM memory, you can set virtual RAM to have  more RAM memory. Memory is usually important when

  • 00:40:20 it is first time loading. So if you have limited  RAM, you can increase the virtual RAM memory to

  • 00:40:25 avoid any errors. To increase the virtual RAM  memory, click this PC. In here, click this PC

  • 00:40:32 properties. It will open these properties. In  here, go to the advanced system settings. And

  • 00:40:37 in this screen, you will see the settings in the  performance. And in here, go to the advanced and

  • 00:40:43 you see I have virtual memory. Change it. Set one  of your fast drives and you can set a custom size.

  • 00:40:50 Currently, my virtual RAM memory is set here.  You see system-managed size and it will allocate

  • 00:40:57 virtual RAM memory as necessary. OK, that is how it works. So during the training,

  • 00:41:03 you will see that it will save checkpoints like  this. And you will see their saved locations and

  • 00:41:11 where they will be saved. They will be saved in  the outputs folder. You see the output directory

  • 00:41:19 for trained model. This was automatically set  by the GUI when we used the prepared training data

  • 00:41:28 and copied info to the respective fields. After  you clicked "copy info to respective fields," you

  • 00:41:34 can change the output directory for training the  model. You can directly give the folder path of

  • 00:41:40 your Swarm UI, your Forge web UI, wherever you  are using your Comfy UI. So my models will be

  • 00:41:46 saved automatically here. When we open that  folder, we will see the saved checkpoints like

  • 00:41:53 this. So these checkpoints are 2.4 gigabytes.  Why? Because the FLUX model is very big. It

  • 00:42:00 has 12 billion parameters. Moreover, we are  saving it as float. So we are saving it with

  • 00:42:08 the maximum precision without any quality  loss. If you want to reduce the file size,

  • 00:42:14 you can also save it as FP16. I haven't compared  it. Some of the followers say that BF16 is working

  • 00:42:22 badly. So if you need a lower size, save it as  FP16. If you want to have the maximum accuracy,

  • 00:42:29 save it as float. The saved precision will not change the VRAM usage

  • 00:42:37 during training or during inference. So now it is  time to use the saved models with the FLUX model

  • 00:42:44 itself. So how are we going to use them after  training has been completed? You can use them with

  • 00:42:50 Comfy UI, with Swarm UI, with Forge web UI at the moment. I prefer to use it in Swarm UI. I already have

  • 00:42:58 a main Swarm UI tutorial. It is amazing. And I  also have a Swarm UI tutorial for FLUX. Moreover,

  • 00:43:07 I also have a Forge web UI models downloader  and installer for Runpod and also for Massed

  • 00:43:13 Compute. I will show the cloud service providers  in the cloud tutorial, but I will show how to use

  • 00:43:21 it on my computer at the moment. So I will move  these files into my Swarm UI models folder. You

  • 00:43:28 see it also saves the TOML file and the JSON file  wherever you are saving them. So let's select the

  • 00:43:35 generated models and move them into our Swarm UI  model folder. My Swarm UI is installed here. When

  • 00:43:43 you watch the tutorial, you will see it. I will put the generated LoRA files into the models folder, into

  • 00:43:49 the LoRA folder here. This is important. This  is where you need to put them. Before starting

  • 00:43:54 training as I said, you can give this folder  path. So after you have set the copy info to

  • 00:44:02 respective fields, go here and change it like this  and save your configuration. Then it will save the

  • 00:44:09 generated LoRA files into this folder for you. OK,  so I'm going to start my Swarm UI. First of all,

  • 00:44:16 I am going to update it. When you are using Swarm  UI, you should update it always. You can also use

  • 00:44:22 Forge web UI or ComfyUI. I prefer Swarm UI because  it is amazing. It has so many amazing features.

  • 00:44:29 It also uses the back end of ComfyUI. So I will  use the launch windows.bat file to start it.

  • 00:44:35 And my ComfyUI is starting. There is also a trick  that I'm going to show you. This will give you

  • 00:44:41 a lot of performance boost. If you have an RTX 4000 series GPU, click this pencil icon and add

  • 00:44:49 extra arguments --fast here and save it. This will  hugely speed up your generations with Swarm UI

  • 00:44:57 when you are using the FLUX model. Backends  are still loading. Go to the server,

  • 00:45:01 go to the logs. Let's see in the debug. So we can  see that it is starting the server. It is loading

  • 00:45:07 everything and the data is getting refreshed. OK,  so now I am ready to generate images. For the FLUX

  • 00:45:13 model. I am going to use CFG scale 1. When you use CFG scale 1, you see the negative prompt

  • 00:45:18 is disabled because FLUX doesn't support it. I prefer 30 steps. Based on your GPU,

  • 00:45:24 you can set it. I'm going to use resolution 1.  I'm going to use sampler Uni PC. I find this the

  • 00:45:30 best. The scheduler will be normal and you can  set the preferred D type here. Since this is a

  • 00:45:35 24-gigabyte GPU, I am going to use this one.  However, you can also use 16-bit on a bigger

  • 00:45:43 VRAM-having GPU. This is a good one. There are  also quantized models, but I haven't tested LoRAs

  • 00:45:50 on them, whether they are working or not. So I  don't know. And I can't say they will work. And

  • 00:45:56 in the models, I have FLUX 1 version dev FP8.  You can also use the FP16 model. They should work

  • 00:46:05 exactly the same actually. So to be sure, I am  going to use the FP16 model. I already have it

  • 00:46:12 here. So let's copy it. And let's copy it into our  models folder. It goes into the Stable Diffusion.

  • 00:46:22 Let's see. Yeah, it goes into the UNet because  this is not a quantized model. It goes into the

  • 00:46:27 UNet. The quantized models go into the Stable  Diffusion folder in the Swarm UI. I could also

  • 00:46:33 give the path of this file from this folder  before doing the training. So you don't need

  • 00:46:38 to have duplicate files. What would be the case?  The case would be I click this icon. Then I go to

  • 00:46:46 my Swarm UI installation, which is here. Swarm UI  inside models, inside UNet. And I can select this.

  • 00:46:55 So you see, you can give any file path in Kohya GUI.  It will just work. Then refresh the models and the

  • 00:47:03 FLUX development arrived. This is an FP16, 16-bit  precision model. But since I'm going to use 8-bit,

  • 00:47:10 it will auto-cast it. Before starting using the  LoRA, let's generate an image. And a very good

  • 00:47:17 part is that in the zip file, I have shared some  test prompts. So let's open the test prompts from

  • 00:47:23 here. And for example, let's generate this image  without our LoRA. And let's hit generate. You can

  • 00:47:29 watch the progress in the server, in the logs, in  the debug. So it is going to load the model. And

  • 00:47:36 you see model weight dtype, manual cast. So it is  going to do everything automatically for me. I can

  • 00:47:42 watch the VRAM usage here. It is using ComfyUI,  thus it is very, very optimized. I have selected

  • 00:47:49 the model. You see the FLUX 1 development model  is selected. As I said, I am not sure whether it

  • 00:47:54 is working with the quantized models or not. I  haven't tested yet, but you can test. For now

  • 00:48:00 make sure to test with the development model. Then  you can test on quantized models as well. And this

  • 00:48:05 is my VRAM usage currently. Don't worry, you don't need that much VRAM. It works even on GPUs with as little as 6 GB of

  • 00:48:12 VRAM. It is extremely optimized. Since I have more VRAM, it is using more VRAM. When you have lower

  • 00:48:19 VRAM, it will use lower VRAM. Moreover, I can also  use my second GPU. It is amazing. So all I need to

  • 00:48:27 do is add a new ComfyUI self-starting backend. Click here. OK. I will make this like this. Extra

  • 00:48:35 arguments, starting script. And I will make this  GPU ID 1 and save. So when I generate multiple

  • 00:48:41 images, it will use my second GPU as well. OK. The  image is almost ready. I am also using segment to

  • 00:48:48 in-paint face. This is not mandatory. In the main  tutorial, I explain everything. It is a little bit

  • 00:48:54 slow on my GPU, but you can use the cloud always.  We can see the IT per second. OK. Where is it?

  • 00:49:01 Let's generate another image to see it. Let's also  see. Yeah, it is fine. OK. Now it is starting to

  • 00:49:07 in-paint the face. OK. Face in-painting speed  is 1.5 seconds per IT. And it is doing 18 steps.

  • 00:49:13 Why? Because I set it at 0.7 denoise strength.  It is called different here. It is called image

  • 00:49:22 creativity. So it is doing 70%. And since I use 30  steps, it is doing 21. Actually, it should do 21,

  • 00:49:31 but it did 18. OK. This is the image we got. Now I prefer to use FLUX guidance scale 4. I had forgotten

  • 00:49:39 that. OK. Then how am I going to use my LoRAs?  Go to the LoRAs tab here, refresh. And once you

  • 00:49:46 refresh, you will see your FLUX LoRAs. You see  type FLUX 1 LoRAs. For example, let's use the

  • 00:49:52 150 epoch and hit generate. So now it is going  to use my LoRA. We can see in the back end. It

  • 00:49:59 will load. You can see everything here. Swarm UI works much better than the Forge web UI.

  • 00:50:06 When you are using LoRAs in Forge web UI, it first processes the LoRAs, which takes extra

  • 00:50:12 VRAM. However, with Swarm UI, it doesn't do that.  Also, when you have more VRAM, it will be faster.

  • 00:50:18 It is using some serious VRAM right now. I am also recording a video,

  • 00:50:25 which takes VRAM too. Let's also close these two. Try to reduce your VRAM usage as much as possible.

  • 00:50:32 And I am already using a lot of VRAM when I am  recording. We can see the preview here. First,

  • 00:50:37 it is generating the image. Then it will  in-paint the face to my face with the prompt

  • 00:50:42 photo of ohwx man. It is in-painting the face now.  This is not mandatory but in-painting the face,

  • 00:50:49 especially in distance shots, will improve the  face quality. Moreover, in my training images,

  • 00:50:56 I have eyeglasses. You see? But since FLUX  has an internal text encoder, it is able

  • 00:51:03 to separate my eyeglasses from my face. Thus, in  this image, I don't have eyeglasses. This is also

  • 00:51:10 a little bit overfit image. But I am working on  a better workflow. Hopefully, I will update it.

  • 00:51:15 and it will become much better. So you can add "with eyeglasses" to the prompt here, and you will get

  • 00:51:23 a closer likeness. Let's see. Let's hit Generate twice. Once I click it two times,

  • 00:51:28 it will also start using my second GPU. It should  start. Let's see. I think first it will load into

  • 00:51:35 the second GPU. Then it will start. OK. Let's  see the server back-ends. OK. OK. It shows here

  • 00:51:41 running back-end. OK. For some reason, it didn't  start. And if preview disappears here, you can go

  • 00:51:47 to the image history, refresh. The last generated  images will always appear here. You can see them,

  • 00:51:52 their features. FLUX guidance scale, the LoRA. You  can also set different LoRA weights from here. OK.

  • 00:51:59 The image is generated. And you see it now resembles me much more. Comparing with the training

  • 00:52:06 images, it has a perfect resemblance. It is, as I said, more overfit. With fine-tuning (full

  • 00:52:13 model training), I think this overfitting problem will be solved. I am also doing more research

  • 00:52:18 right now, as I said. So I will update the config as I find better settings. Every day something new is

  • 00:52:24 arriving. Also, you may end up with overfit, over-trained model checkpoints. So how are we going to test

  • 00:52:33 and find the very best checkpoint? This is a super important part. Let's also see the other generated

  • 00:52:38 image. If you have an RTX 4090, it will be way  faster than this. It will be many times faster

  • 00:52:44 than this. You can always see the step speed here. It is around 1.5 seconds per iteration for me. And the

  • 00:52:49 second image also generated a perfect resemblance.  OK. So how am I going to find the very best

  • 00:52:55 checkpoint? To do that, we go to the Tools. And  in here, we select the Grid Generator. In here,

  • 00:53:03 in the first tab, I use LoRAs. So find the LoRAs  from here. Let's find it. You can also type to

  • 00:53:09 filter like this: LoRAs. When you click Fill,  it will fill all the LoRAs. Then you can delete

  • 00:53:15 the ones that you don't want to test. Let's start  testing from the 50 epochs. So these are epochs,

  • 00:53:20 epoch 50, epoch 75. If you save based on the  step count, it will have a different naming. So

  • 00:53:26 the last one will be the 200 epochs. Then you can  test multiple prompts or you can use this prompt

  • 00:53:32 for testing. If you want to test multiple prompts,  I have already prepared prompts. So these prompts

  • 00:53:39 are shared here. You see test prompts, no segment  in-painting. This doesn't have any segmentation

  • 00:53:45 in-painting. And I have test prompts here, test  prompts with eyeglasses. And test prompts, let's

  • 00:53:52 see. One of them doesn't have eyeglasses. OK, this  doesn't have. So I am going to change this name.

  • 00:53:59 I will fix this. Test prompts without eyeglasses,  grid formatted. So you just copy this. By the way,

  • 00:54:05 in the grid, this is the prompt separator. You see, like this. Then copy-paste it here. You see it has

  • 00:54:12 split each prompt (a small sketch of this conversion is below).
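
The grid-formatted file is just the same prompts joined with a separator so the Grid Generator can split them into axis values. A minimal sketch of that conversion, with a hypothetical input file name and an assumed separator (use whatever separator actually appears in the shared grid-formatted file):

```python
# Join one-prompt-per-line test prompts into a single grid-formatted string.
SEPARATOR = " || "  # assumed axis-value separator -- check the shared grid-formatted file

with open("test_prompts_without_eyeglasses.txt", encoding="utf-8") as f:  # placeholder filename
    prompts = [line.strip() for line in f if line.strip()]

grid_formatted = SEPARATOR.join(prompts)
print(grid_formatted)  # paste this into the prompt axis of the Grid Generator
```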

  • 00:54:21 And now when I generate a grid, it will test all of the models for me. Let's hit generate. This time it should use the second GPU as well, I believe. Let's see what will happen.

  • 00:54:27 OK, now it started loading onto the second GPU as  well. So when generating this grid, it will use

  • 00:54:34 both of my GPUs. However, with my GPUs, this will take a huge amount of time because there are 274 generations.

  • 00:54:43 This is just an estimate and it will improve as the grid runs. But this will make a huge grid. You should wait for

  • 00:54:49 it to update. You can rent A6000 GPUs on Massed Compute and use them all at once. It will be

  • 00:54:56 much faster. Hopefully, I will show this in the Massed Compute tutorial. OK, the first

  • 00:55:01 image is generated. The first image is for verifying the model sanity. You see it is perfectly able to

  • 00:55:08 generate a supercar image. Nothing like me. So the  model sanity is perfect. It is still keeping its

  • 00:55:15 sanity. And what kind of results are we going to  get after these grids are generated? For now, I

  • 00:55:22 will interrupt this with "interrupt all sessions." And I will show you on my cloud machine. So in

  • 00:55:29 the Swarm UI Massed Compute tutorial, there are  these new instructions. Copy this. First of all,

  • 00:55:35 you need to install this. OK, let's copy this and  open a new terminal here. New window and paste it

  • 00:55:43 and hit enter. This will start Swarm UI on Massed  Compute. But it will give me a cloud URL that I

  • 00:55:51 can use on my computer. It will be here. It is a Cloudflare URL. Let's copy and paste it and open it.

  • 00:55:56 And this is where I do my testing, my  experimentation. When I go to the tools and

  • 00:56:02 grid generator, I have so many previous tests like  this. You see, I have 8 GPUs running there.

  • 00:56:09 For example, let's open one of them. OK, let's open "best new 150 epochs." And when I click here,

  • 00:56:16 it will show me the grid results. I explain all  of this in the main tutorial. In the advanced

  • 00:56:22 settings, you can select which prompts to show and which models to show. And we will get

  • 00:56:29 a grid like this where we will be able to compare different LoRA checkpoints. So by analyzing this grid,

  • 00:56:37 you need to find the checkpoint you like best. Usually 150 epochs is good, but it depends on your

  • 00:56:45 training data set. So generate a grid, analyze  all of the generated images. You will see at

  • 00:56:52 the top the model used. Also, when you click  the image in the bottom, you will see which

  • 00:56:57 LoRA is used. For example, for this one, the LoRA used is, let's see, Best v1_5e_05,

  • 00:57:05 150. It also shows the name of the LoRA used here. So this is the way of finding the very best

  • 00:57:12 LoRA checkpoint that you have. For example, on  this Massed Compute, I have 8 running GPUs

  • 00:57:18 and I am able to generate images ultra fast. This is how I do my experimentation. There

  • 00:57:23 is no way to do this many trainings on a single GPU. So you need to spend a considerable

  • 00:57:29 amount of money. Thankfully, Massed Compute is  supporting me. So if you are wondering how did

  • 00:57:35 I generate those amazing pictures that I have  shown in the intro, I have generated amazing

  • 00:57:42 new prompts, and I used the wildcard feature of Swarm UI. I put all these

  • 00:57:49 prompts into this wildcard, and I generated nine thousand nine hundred ninety-nine images until I

  • 00:57:56 stopped it. The new prompts are shared in the test prompts, in the 340 prompts used as wildcards. OK,

  • 00:58:05 we have covered everything, and there is also extra information here on how to train and

  • 00:58:11 use on Runpod and Massed Compute. The instructions  are already included in the file. You will see

  • 00:58:17 the Massed Compute, Kohya FLUX instructions. You  will see RunPod install instructions. Hopefully,

  • 00:58:23 I'm going to make separate tutorials for them.  Also, I suggest you save your models on Hugging

  • 00:58:29 Face if you want to keep them and download them fast. I have an amazing notebook and tutorial for that.

  • 00:58:34 Watch it. I am also often asked how to do SDXL and SD 1.5 training with the newest Kohya

  • 00:58:42 interface. Everything is almost the same, so I will quickly show you. For

  • 00:58:48 example, let's begin with SDXL. Let's open the  best configuration. In the very top we will see

  • 00:58:54 the configuration. Here. Let's download the Tier 1  24 gigabytes Slower V2. There is also a Tier 2

  • 00:59:01 low VRAM version and the configuration loaded.  Let's close the Swarm UI and let's open our latest

  • 00:59:10 Kohya installation, which is here. Let's start it  again. The configuration is downloaded. Kohya is

  • 00:59:18 starting. OK, then these are full fine-tuning  configurations, not for LoRA. So select the

  • 00:59:27 DreamBooth app here. Go to the configuration.  Click this icon. By the way, this is SD3 FLUX

  • 00:59:33 branch, so there may be some errors; you may need to install it the normal way instead. If it doesn't work,

  • 00:59:40 I don't know. I didn't test it. So let's go to  the downloads. Select the file and it is loaded.

  • 00:59:46 You see this is DreamBooth. SDXL is selected.  We do FP16 training. Now you set everything

  • 00:59:52 exactly the same: the trained model output name, the image folder path, the pre-trained model

  • 00:59:57 path. But this time what differs is you should use  regularization images. And regularization images

  • 01:00:04 are posted here. I have an amazing set of 5,200 images for both women and men. So what

  • 01:00:14 you do is, when you go to the dataset preparation, you also add the regularization images and you set

  • 01:00:21 200 repeats; the number of repeats may depend on your number of training images.

  • 01:00:26 Let's say you have 50 training images. Since  we have 5000 regularization images,

  • 01:00:32 you can make the repeats 100 and train 2 epochs. But let's say I have 15 images. So I

  • 01:00:38 make the repeats 200 (a small sketch of this balancing logic is below).
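
Here is a small sketch of that balancing logic as I read it; this is just the arithmetic behind the two examples above, not an official Kohya rule: keep training images times repeats at or below the number of regularization images (which are used with repeat 1), and raise the epoch count if you want more total steps.

```python
# Arithmetic behind the repeat-count examples above (my reading, not an official formula).
reg_images = 5000  # the downloaded regularization set (the video quotes roughly 5,000-5,200)

for train_images, repeats, epochs in [(50, 100, 2), (15, 200, 1)]:
    steps_per_epoch = train_images * repeats  # training-image steps in one epoch
    balanced = steps_per_epoch <= reg_images  # each reg image used at most once per epoch
    print(f"{train_images} images x {repeats} repeats = {steps_per_epoch} steps/epoch, "
          f"{epochs} epoch(s), balanced: {balanced}")
# 50 x 100 = 5000 <= 5000 reg images -> train 2 epochs for more total steps
# 15 x 200 = 3000 <= 5000 reg images -> the video trains 1 epoch
```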

  • 01:00:46 Put my training images path here. Then I also go to the downloaded regularization images. Let's see. For example, these are 768. I also

  • 01:00:51 have 1024. Yeah,  here. This is an amazing dataset. Super

  • 01:00:56 high quality. I tested and compared the effect of  using regularization images and it's mind-blowing,

  • 01:01:02 especially with OneTrainer. And you put it here. This time the repeat count will be 1. You

  • 01:01:07 just type ohwx and man, and it will auto-generate the folders for us with accurate naming. Let me

  • 01:01:14 show you. So let's make the destination an example  place like music folder D. Usually you shouldn't

  • 01:01:20 use the users folder. But this is just for an  example. Prepare training data. Watch the CMD

  • 01:01:26 window because copying may take time. Wait until  you see done. Currently it is copying everything.

  • 01:01:32 We can see image 200 ohwx man is copied. And  it is also going to copy man datasets. OK,

  • 01:01:40 this one copied. And in the reg folder. OK, this  one copied. You see the regularization images are

  • 01:01:46 put into the reg folder and the training images  are put into the image folder. And inside the reg

  • 01:01:52 folder, the man images are named 1_man because of the one repeat and the man class token, and the training

  • 01:02:01 images are named 200_ohwx man. So what else changes? Now we train only 1 epoch. So how

  • 01:02:08 are we going to save checkpoints in this case? To do that, let's copy the info to

  • 01:02:13 the respective folders, and let's also select the pre-trained model path. I already have models

  • 01:02:19 here, like, let's see, the SDXL base model, for example. By the way, as for which models I suggest: I suggest

  • 01:02:31 RealVis XL 4 for training SDXL, and for training SD 1.5 I suggest, what was the name, Hyperrealism

  • 01:02:40 version 3. So you should select those models.  And when you click the print training command,

  • 01:02:46 it will show you the total number of steps.  And it is now 6000 because I have 15 training

  • 01:02:53 images. I use regularization images and I have 200 repeats. Therefore, let's see if it does

  • 01:02:59 show the calculation. Yes, 200 repeats, 15  images. It makes 3000 steps per epoch. Since

  • 01:03:06 regularization images are used, it is multiplied  by 2. So total 6000 steps. And let's say I want

  • 01:03:12 to save every 20 epochs. So normally it would be  save every 20 epochs. But now I don't have that

  • 01:03:20 option since I train 1 epoch. So I will save based on the number of steps. When you divide 6000 by 10,

  • 01:03:28 because we want to get 10 checkpoints, you  will change the save every N steps. Let's

  • 01:03:35 see here. So if I make this 601, it will be equal to saving every 20 epochs in the

  • 01:03:45 repeat-1 setup. That is the logic of it. This will save 10 checkpoints during the entire training (a short recap of this arithmetic is sketched below).
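
A short recap of that arithmetic with the exact numbers from this example (6000 divided by the 10 checkpoints we want gives 600; the 601 typed in the video is effectively the same):

```python
# Step math from the walkthrough above (numbers taken from the video).
train_images = 15
repeats = 200
reg_multiplier = 2        # using regularization images doubles the step count
epochs = 1
checkpoints_wanted = 10

steps_per_epoch = train_images * repeats                  # 3000
total_steps = steps_per_epoch * reg_multiplier * epochs   # 6000
save_every_n_steps = total_steps // checkpoints_wanted    # 600 (the video uses 601)

print(total_steps, save_every_n_steps)
```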

  • 01:03:50 And the save-every-N-epochs setting is 1 and the epoch count is 1, so this will save full checkpoints. For SDXL,

  • 01:03:58 it will be over 6 GB. For SD1.5, it will be like,  I don't remember actually, but it will be big. So

  • 01:04:05 if you need LoRAs, train this way with DreamBooth fine-tuning, then extract a LoRA. It

  • 01:04:12 will work much better than directly training a LoRA for SDXL or SD1.5. I have tested it. For extracting a LoRA,

  • 01:04:20 I have an amazing post here. It shows how you  can extract. Let me also show you quickly. Go

  • 01:04:25 to the Utilities. Go to the LoRA. And in here, you  will see Extract LoRA. Whether it is SDXL or not,

  • 01:04:32 you pick it here. If you don't pick SDXL, it  will extract as SD1.5. If you pick SDXL, it will

  • 01:04:40 extract as SDXL. Currently, it doesn't support FLUX. Actually, I don't see it here... oh, here, click here:

  • 01:04:47 FLUX LoRA extraction has also arrived. I haven't tested it yet. I will hopefully test it after I train full

  • 01:04:53 fine-tuning. So select your fine-tuned model. This  is the generated model. Select the base model.

  • 01:04:58 This is like RealVis XL 4. Set the path where  you want to save. Save precision. You can save

  • 01:05:04 as float. It will double the size. Load precision.  You can load as float. Set this minimum difference

  • 01:05:09 to 0.0001. So it will also save the text encoder because I train the text encoder as well. And you can

  • 01:05:15 set the network dimension to 128. This is it. For SDXL, you just select SDXL. And that is

  • 01:05:23 the way of extracting LoRAs and using them when you do training with SD1.5 and SDXL (a command-line sketch of the extraction is below).
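
Under the hood, the GUI's Extract LoRA utility maps to the LoRA extraction script in kohya sd-scripts. A hedged command-line sketch of the same settings: all paths are placeholders, the 0.0001 minimum difference is my reading of the value mentioned above, and flag names can differ between sd-scripts versions, so verify against your install.

```python
# Hedged sketch: extracting a LoRA from a DreamBooth fine-tuned SDXL checkpoint
# via kohya sd-scripts' networks/extract_lora_from_models.py.
import subprocess

subprocess.run([
    "python", "networks/extract_lora_from_models.py",
    "--sdxl",                                                    # omit this flag for SD 1.5
    "--model_org", "models/RealVisXL_V4.0.safetensors",          # base model (placeholder path)
    "--model_tuned", "output/ohwx_man_dreambooth.safetensors",   # your fine-tuned checkpoint (placeholder)
    "--save_to", "output/ohwx_man_extracted_lora.safetensors",   # where to write the LoRA (placeholder)
    "--dim", "128",                                              # network dimension used above
    "--save_precision", "float",                                 # float roughly doubles the file size
    "--min_diff", "0.0001",                                      # small threshold so text encoder changes are kept
], check=True)
```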

  • 01:05:30 In my post, you will see very detailed instructions. So always read the posts here. Moreover, you should join

  • 01:05:38 our Discord channel. I am always on the Discord  channel. You can message me there. At the very top

  • 01:05:44 of the post, you will see SECourses Discord.  When you click it, you will see our Discord

  • 01:05:49 channel. We have over 8,000 members, over 1,000  online members. Just click Join Server to join.

  • 01:05:56 Moreover, we have a Patreon exclusive post index.  In here, you will see all of our amazing sharings.

  • 01:06:02 You can just do ctrl-F to see them. Also,  we have Patreon Scripts Updates History. You

  • 01:06:08 will see which scripts are updated last and what  changes are made. Sometimes I don't write the full

  • 01:06:15 changes. And we have Patreon Special Generative  Script List. This shows the useful scripts that

  • 01:06:21 you can use for other tasks, other jobs you have.  And you can also go to our GitHub repository here,

  • 01:06:29 Stable Diffusion Generative AI. When you go there, please Star it,

  • 01:06:36 Fork it, and Watch it. Also, if you sponsor, I  appreciate that. You see, we have 2,000 stars.

  • 01:06:42 We have 200 forks and 82 watching. Moreover, we  now have a subreddit. Go to subreddit, SECourses.

  • 01:06:50 I follow every comment and post made here. I  will reply to all of them. I am also sharing

  • 01:06:56 a lot of announcements here. And I have a real  LinkedIn account. I am not an anonymous person,

  • 01:07:02 obviously. You can go to my LinkedIn account.  You can follow me. You can connect with me. It is

  • 01:07:08 fine. I also reply to every message here. This is  it. I hope I have covered everything that you have

  • 01:07:14 been wondering. Hopefully, see you in the future  tutorials for FLUX because a lot of tutorials

  • 01:07:20 are coming like training on Runpod, training  on Massed Compute, and fine-tuning. I think

  • 01:07:26 fine-tuning will be way better. And I am going  to update this post and write the very newest

  • 01:07:33 findings that I have. Also, you should check out  this lengthy research post because you will find

  • 01:07:40 a lot of information here. Let me show you one  thing. For example, FLUX training discussions

  • 01:07:45 with lots of information on this post. You can  open them. And there is new information that

  • 01:07:52 shows why Windows training is currently slower  than Linux training. With Torch 2.4.1 hopefully,

  • 01:08:00 we are going to get an amazing speed boost  on Windows without doing anything and without

  • 01:08:05 losing any quality. And it is 75% completed. So  it is almost there. I will update my installer

  • 01:08:12 scripts. Don't worry about that. Hopefully,  see you in another amazing tutorial video.
