FLUX LoRA Training Simplified: From Zero to Hero with Kohya SS GUI (8GB GPU, Windows) Tutorial Guide
Full tutorial link > https://www.youtube.com/watch?v=nySGu12Y05k
Ultimate Kohya GUI FLUX LoRA training tutorial. This tutorial is the product of 9 days of non-stop research and training. I have trained over 73 FLUX LoRA models and analyzed them all to prepare this tutorial video. The research is still ongoing, and hopefully the results will improve significantly; the latest configs and findings will be shared. Please watch the tutorial without skipping any part. Whether you are a beginner or an expert, this tutorial covers everything for you.
🔗 Full Instructions and Links Written Post (the one used in the tutorial)
00:00:00 Full FLUX LoRA Training Tutorial
00:03:37 Guide on downloading and extracting Kohya GUI
00:04:22 System requirements: Python, FFmpeg, CUDA, C++ tools, and Git
00:05:40 Verifying installations using the command prompt
00:06:20 Kohya GUI installation process and error-checking
00:06:59 Setting the Accelerate option in Kohya GUI, with a discussion of choices
00:07:50 Use of the bat file update to upgrade libraries and scripts
00:08:42 Speed differences between Torch 2.4.0 and 2.5, particularly on Windows and Linux
00:09:54 Starting Kohya GUI via the gui.bat or automatic starter file
00:10:14 Kohya GUI interface and selecting LoRA training mode
00:10:33 LoRA vs. DreamBooth training, with pros and cons
00:11:03 Emphasis on extensive research, with over 72 training sessions
00:11:50 Ongoing research on hyperparameters and future updates
00:12:30 Selecting configurations based on GPU VRAM size
00:13:05 Different configurations and their impact on training quality
00:14:22 "Better colors" configuration for improved image coloring
00:15:58 Setting the pre-trained model path and links for downloading models
00:16:42 Significance of training images and potential errors
00:17:08 Dataset preparation, emphasizing image captioning, cropping, and resizing
00:17:54 Repeating and regularization images for balanced datasets
00:18:25 Impact of regularization images and their optional use in FLUX training
00:19:00 Instance and class prompts and their importance in training
00:19:58 Setting the destination directory for saving training data
00:20:26 Preparing training data in Kohya GUI and generated folder structure
00:21:10 Joy Caption for batch captioning images, with key features
00:21:52 Joy Caption interface for batch captioning
00:22:39 Impact of captioning on likeness, with tips for training styles
00:23:26 Adding an activation token to prompts
00:23:54 Image caption editor for manual caption editing
00:24:53 Batch edit options in the caption editor
00:25:34 Verifying captions for activation token inclusion
00:26:06 Kohya GUI and copying info to respective fields
00:27:01 "Train images img" folder path and its relevance
00:28:10 Setting different repeating numbers for multiple concepts
00:28:45 Setting the output name for generated checkpoints
00:29:03 Parameters: epochs, training dataset, and VAE path
00:29:21 Epochs and recommended numbers based on images
00:30:11 Training dataset quality, including diversity
00:31:00 Importance of image focus, sharpness, and lighting
00:31:42 Saving checkpoints at specific intervals
00:32:11 Caption file extension option (default: TXT)
00:33:20 VAE path setting and selecting the appropriate ae.safetensors file
00:33:59 Clip large model setting and selecting the appropriate file
00:34:20 T5 XXL setting and selecting the appropriate file
00:34:51 Saving and reloading configurations in Kohya GUI
00:35:36 Ongoing research on clip large training and VRAM usage
00:36:06 Checking VRAM usage before training and tips to reduce it
00:37:39 Starting training in Kohya GUI and explanation of messages
00:38:48 Messages during training: steps, batch size, and regularization factor
00:39:59 How to set virtual RAM memory to prevent errors
00:40:34 Checkpoint saving process and their location
00:41:11 Output directory setting and changing it for specific locations
00:42:00 Checkpoint size and saving them in FP16 format for smaller files
00:43:21 Swarm UI for using trained models and its features
00:44:02 Moving LoRA files to the Swarm UI folder
00:44:41 Speed up Swarm UI on RTX 4000 series GPUs
00:45:13 Generating images using FLUX in Swarm UI
00:46:12 Generating an image without a LoRA using test prompts
00:46:55 VRAM usage with FLUX and using multiple GPUs for faster generation
00:47:54 Using LoRAs in Swarm UI and selecting a LoRA
00:48:27 Generating an image using a LoRA
00:49:01 Optional in-painting face feature in Swarm UI
00:49:46 Overfitting in FLUX training and training image quality
00:51:59 Finding the best checkpoint using the Grid Generator tool
00:52:55 Grid Generator tool for selecting LoRAs and prompts
00:53:59 Generating the grid and expected results
00:56:57 Analyzing grid results in Swarm UI
00:57:56 Finding the best LoRA checkpoint based on grid results
00:58:56 Generating images with wildcards in Swarm UI
01:00:05 Save models on Hugging Face with a link to a tutorial
01:00:05 Training SDXL and SD1.5 models using Kohya GUI
01:03:20 Using regularization images for SDXL training
01:05:30 Saving checkpoints during SDXL training
01:06:15 Extracting LoRAs from SDXL models
-
00:00:00 Hello everyone, today I will be guiding you step-by-step through the process of training
-
00:00:06 LoRA on the latest state-of-the-art text-to-image generative AI model, FLUX. Over the past week,
-
00:00:14 I have been deeply immersed in research, working tirelessly to identify the most effective training
-
00:00:22 workflows and configurations. So far, I have completed 72 full training sessions and more
-
00:00:30 are underway. I have developed a range of unique training configurations that cater to GPUs with
-
00:00:37 as little as 8GB of VRAM all the way up to 48GB. These configurations are optimized for VRAM usage
-
00:00:46 and ranked by training quality. Remarkably, all of them deliver outstanding results. The
-
00:00:53 primary difference lies in the training speed. So yes, even if you are using an 8GB RTX GPU,
-
00:01:01 you can train an impressive FLUX LoRA at a respectable speed. For this tutorial,
-
00:01:07 I will be using Kohya GUI, a user-friendly interface built on the acclaimed Kohya
-
00:01:13 training scripts. With this graphical user interface, you will be able to install,
-
00:01:18 set up, and start training with just mouse clicks. Although this tutorial demonstrates how to use
-
00:01:24 Kohya GUI on a local Windows machine, the process is identical for cloud-based services. Therefore,
-
00:01:32 it is essential to watch this tutorial to understand how to use Kohya GUI on cloud
-
00:01:37 platforms, even though I will be making separate tutorials specifically for cloud setups. We will
-
00:01:43 cover everything from the basics to the expert settings. So even if you are a complete beginner,
-
00:01:50 you will be able to fully train and utilize an amazing FLUX LoRA model. The tutorial is organized
-
00:01:56 into chapters and includes manually written English captions, so be sure to check out the
-
00:02:02 chapters and enable captions if you need them. In addition to training, I will also show you how to
-
00:02:08 use the generated LoRAs within the Swarm UI and how to perform grid generation to identify the
-
00:02:15 best training checkpoint. Finally, at the end of the video, I will demonstrate how you can train
-
00:02:21 Stable Diffusion 1.5 and SDXL models using the latest Kohya GUI interface. So I have prepared
-
00:02:30 an amazing written post where you will find all of the instructions, links, and guides for this
-
00:02:39 amazing tutorial. The link to this post will be in the description of the video, and this post will
-
00:02:45 get updated as I get new information as I complete my new research with new hyperparameters and new
-
00:02:53 features. So this post will be your ultimate guide to follow this tutorial, to do FLUX LoRA
-
00:03:02 training by using the Kohya GUI. Kohya GUI is a wrapper developed by the Bmaltais that allows
-
00:03:09 us to use Kohya SS scripts very easily. This is its official GitHub repository. Basically,
-
00:03:16 we are using Kohya SS. However, we have a GUI and one-click installers and easy setup to use
-
00:03:23 the Kohya SS scripts. So you will see that I have a zip file here. When you go to the bottom of the
-
00:03:31 post, you will also see the attachments, and in the attachments, you will see the zip file. I
-
00:03:37 may have forgotten to put the zip file at the very top of the post, but when you look at the
-
00:03:42 attachments, you will always find the zip file. So let's download the zip file. Click here. Move this
-
00:03:48 zip file to any disk that you want to install. I am going to install it on my R drive. Do not
-
00:03:55 install it in your Windows folder or your Users folder. Install it directly into the root of any drive, and do not
-
00:04:02 install it in your cloud drives. Okay, right-click and extract it. Enter the extracted folder,
-
00:04:09 and you will see all of these files. These files may get updated, and there may be more files when
-
00:04:16 you're watching this tutorial. Don't get confused. I will update everything as necessary. Currently,
-
00:04:22 the main branch of the Kohya GUI doesn't support the FLUX training. You see, there is a SD3 FLUX
-
00:04:28 branch. So my installer will automatically switch to the FLUX branch and install it for you. Once it
-
00:04:36 is merged into the main branch of the main repository, I will update my installers. To install Kohya
-
00:04:42 GUI we are going to use the Windows_Install_Step_1.bat file. Double-click it. It will clone the
-
00:04:47 repository, switch to the correct branch, and it will start the installation. For this to work, you
-
00:04:54 need to have the requirements installed. You'll see that we have a Windows requirements section.
-
00:05:00 We need Python 3.10.11, FFmpeg, CUDA 11.8, C++ tools, and Git. Now, once you install these,
-
00:05:09 you will be able to use all of the open-source AI applications like Stable Diffusion, Automatic1111 Web
-
00:05:15 UI, Forge Web UI, One Trainer, Swarm UI, ComfyUI, and whatever you can imagine, like Rope Pearl, Face
-
00:05:23 Fusion, and such things. So I have a very good tutorial that shows how to install all of this.
-
00:05:29 Please watch this tutorial. Do not skip it. Please make sure that your Python is directly installed
-
00:05:34 on your C drive. You have installed FFmpeg, CUDA, and C++ tools. These are all important.
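As a quick reference, the same checks described below can also be run with explicit version flags from any command prompt (these are each tool's standard flags, not anything specific to Kohya):

    :: should print Python 3.10.11
    python --version
    :: should print a Git version string
    git --version
    :: optional for Kohya itself, but needed by many other AI apps
    ffmpeg -version
    :: should report CUDA 11.8
    nvcc --version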
-
00:05:40 How can you check whether you have installed them or not? Type CMD and open a command prompt like
-
00:05:45 this. Type Python and you should get 3.10.11. Open another CMD and type Git and you should get Git
-
00:05:52 usage output like this. Open another CMD and type FFmpeg and you should get FFmpeg's output. FFmpeg is not necessary
-
00:05:59 for Kohya, but you should install it because you may use other AI applications. And type nvcc
-
00:06:06 --version, and this will let you see whether you have installed the correct CUDA or not. Everything
-
00:06:13 is explained in this video. Please watch it to avoid any errors. So the automatic installation
-
00:06:20 will ask you these options. We are going to select option 1 and hit enter. Then this will start
-
00:06:26 installing the latest version of the Kohya SS GUI to our computer. Currently, we are in the correct
-
00:06:32 branch. As I said, I will update this branch when it is merged into the master branch. Just wait
-
00:06:40 patiently. You see that it is showing me I have Python 3.10.11. It is installed, and it is going
-
00:06:46 to download and install everything automatically for me. So the installation has been completed,
-
00:06:52 and I have the new options. You should scroll up and check whether there were any errors or not
-
00:06:59 because the installation step is extremely important. So scroll up and check all of the
-
00:07:05 messages. So as a next step, we are going to set the Accelerate because on some computers it may
-
00:07:11 not be set correctly. So type 5 and hit enter. It is going to ask us some of the options. We
-
00:07:18 are going to use this machine. Hit enter. We are not going to do distributed training. Hit enter.
-
00:07:24 We are not training on Apple or anything. We are training on CUDA drivers. So no,
-
00:07:29 we are not using Torch Dynamo. No, we are not using DeepSpeed. No, we are going to use all of
-
00:07:36 the GPUs. So type "all" and hit enter. I also say yes to this, but I didn't see any difference. Hit
-
00:07:42 enter. And we are going to use BF16. This one. So select this one and hit enter. And we are ready.
-
00:07:50 You don't need to do option 2, option 3, or option 4. But we are not going
-
00:07:56 to use option 6 to start yet. So hit 7 and hit enter. Return back to the folder. And whenever you
-
00:08:03 are going to start training or after the initial installation, run this bat file: Update_Kohya_and_Fix_FLUX_Step2.bat
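For a rough idea of what an update step like this typically amounts to, here is a minimal sketch; the actual contents of Update_Kohya_and_Fix_FLUX_Step2.bat may differ, and the branch and requirements file names below are assumptions:

    :: illustrative only - run the provided bat file rather than these commands
    cd kohya_ss
    :: fetch and pull the FLUX-capable branch (branch name assumed)
    git fetch
    git checkout sd3-flux.1
    git pull
    git submodule update --init --recursive
    :: upgrade Python dependencies inside the repo's virtual environment
    call .\venv\Scripts\activate.bat
    pip install --upgrade -r requirements.txt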
-
00:08:11 This is going to upgrade to the latest libraries. It is going to also update
-
00:08:16 the scripts to the latest version. So we are going to have the very latest version with this
-
00:08:22 bat file. I will keep these files updated. Just wait for it to install everything. The steps are
-
00:08:29 also written in this post file, but watching this video is better. And the updates have been
-
00:08:34 completed. Verify that they are all working. We are currently using Torch 2.4.0. And it is working
-
00:08:42 slowly on Windows compared to the Linux systems. However, with Torch 2.4.1, hopefully this speed
-
00:08:52 issue on Windows will be fixed. And as soon as it is released, I am going to update my installer
-
00:09:00 scripts. There has been a new development after I have completed the tutorial video, which is Torch
-
00:09:08 2.5 version. With this version, there is a huge speed improvement on Windows. It is almost equal
-
00:09:18 to Linux. However, Torch 2.5 version is currently in development. So it may be broken. It may give
-
00:09:26 you errors. I also didn't test its quality yet. Hopefully, I will test it and I will update the
-
00:09:32 Patreon post. You see, it is already updated. The file that you need to use is the Install_Torch_2_5_Dev_Huge_Speed_Up.bat file
-
00:09:43 after completing the installation. As I said, this is experimental
-
00:09:46 and follow the news on the Patreon post. And now we are ready to start using the Kohya SS GUI. So
-
00:09:54 for starting the Kohya SS GUI, you can either enter the Kohya SS folder and start the gui.bat
-
00:10:01 file. Or you can use my automatic starter file, which is the Windows start KohyaSS.bat file. This
-
00:10:08 will automatically start and open the browser. You see the interface has started. If it doesn't
-
00:10:14 start, you need to open this manually. All right, now this is the interface of Kohya. It is very,
-
00:10:19 very useful and easy to use. A very important thing that you need to be careful of is that
-
00:10:25 you need to select LoRA because we are currently training a LoRA. LoRA is an optimization technique. So it is
-
00:10:33 different from DreamBooth. DreamBooth is basically fine-tuning, training the entire model. But with
-
00:10:38 LoRA, we are only training a certain part of the model. Thus, it requires less hardware,
-
00:10:46 but also results in lower quality. Currently, this tutorial is for LoRA training. Hopefully,
-
00:10:51 I am going to do full research on DreamBooth fine-tuning and publish another tutorial for
-
00:10:57 that. And I am going to publish configurations as well. So this tutorial is so far the combination
-
00:11:03 of over 64 different trainings. This is literal. When you go to this post,
-
00:11:09 you will see the entire research history. This is a very, very long, lengthy post. You will see all
-
00:11:15 of the tests I have made. You will see all of the results, grids, and comparisons. I am also going
-
00:11:21 to show some of the parts in this tutorial. So read this post if you want to learn how I
-
00:11:27 came up with my workflow and configuration. Also, when you open this post, you will see all of the
-
00:11:33 models that I have prepared up to now. And this is not all. Currently, I am running 8 different
-
00:11:41 trainings on 8x RTX A6000 GPUs on Massed Compute. You can see the trainings are running right now.
-
00:11:50 Currently, I am testing a new hyperparameter and the clip large text encoder training,
-
00:11:56 which has arrived just today. So after these tests have been completed, I am going to analyze them,
-
00:12:03 post the results on this research topic, and I am going to update the configuration files. Don't
-
00:12:09 worry, you will just load the configuration. You will just download the latest zip file and load
-
00:12:14 the configuration, and you will get the very best workflow, very best configuration whenever you are
-
00:12:20 watching this tutorial. So that is why following this post is extremely important. The link will
-
00:12:25 be in the description and also in the comment section. Don't forget that. So now our interface
-
00:12:30 started and I am going to load the configuration for beginning the setup. I have selected the
-
00:12:36 LoRA in the training tab. When you look at the folder, you will see best configs and you will
-
00:12:41 see best configs, better colors. And what do these configurations mean? When you scroll down, you
-
00:12:49 will see the description of each configuration. I have prepared a configuration for every GPU. So if
-
00:12:56 you have an 8GB GPU, you need to use this one: Rank_9_7514MB.json file. When you enter inside
-
00:13:05 the best configs folder, you will see that the JSON file is there. So according to your VRAM,
-
00:13:12 pick the configuration file. There are slow and fast versions. What do they mean? With the slow
-
00:13:19 version, we are going to get slightly better training quality. There is not much difference
-
00:13:25 between rank 1 and rank 6. After rank 6, we are going to lose some quality because of the reduced
-
00:13:34 training resolution and also reduced LoRA rank. If you read the research post, you will see the
-
00:13:40 difference of each configuration. Training under 1024 pixels seriously reduces the quality. Also,
-
00:13:49 I find that LoRA rank 128 is the very best spot. So these three configurations will have slightly
-
00:13:57 lesser quality than these ones. These ones will be very, very close. Since I have an RTX 3090 GPU,
-
00:14:05 I am going to use the very best configuration rank 3. Currently, this is running slow on my PC,
-
00:14:13 but with the Torch version 2.4.1, it will be much, much faster. So go to here and click this icon,
-
00:14:22 this folder icon, enter inside the folder of best config and load the rank 3. Now you may wonder
-
00:14:30 what is the difference with better colors. This is in experimentation. Currently, I am training
-
00:14:36 it to decide whether to fully switch to it or not. But this uses time step shift sampling
-
00:14:43 and it makes a huge impact on the coloring. When you open this file, you will see this very big,
-
00:14:50 huge grid. And when you look at this grid file, you will see that using the time step shift
-
00:14:57 brings better colors and a better environment like this one. However, it also slightly reduces the
-
00:15:03 likeness. I am still researching it. I am still training it. When we open this imgsli link,
-
00:15:09 you can see a one by one comparison. So according to your needs, decide the configuration you want.
-
00:15:15 Hopefully, I will complete research on time step shift sampling and I will decide the
-
00:15:20 very best configuration. Currently, we have two different configurations and their difference is
-
00:15:25 like this. But to be safe, you can use the best configs folder right now. However, you can also
-
00:15:30 use best configs better colors too. So look at the grid file imgsli and decide yourself. OK,
-
00:15:37 as a next step, you don't need to set anything here. This is for multi GPU. I also have multi
-
00:15:42 GPU configs and I will show them hopefully in the cloud tutorial. I am going to also make a cloud
-
00:15:47 tutorial for Massed Compute and RunPod. So what you need to set with this configuration,
-
00:15:52 you need to set pre-trained model path. This is super important. The model links are posted here
-
00:15:58 so you can either download them from here or I have a one-click downloader here. You see Windows
-
00:16:03 download training models files. If you already have some of the models, make sure that they
-
00:16:08 are the correct models or you will get errors. This bat file will download everything automatically
-
00:16:14 for me. You see, it started downloading already. However, I already have them downloaded here. So
-
00:16:19 I am going to just pick them. If you already have them, you can use them. We are going to use FLUX
-
00:16:26 Dev version 23.8 gigabytes. The FP8 support also arrived, but I haven't tested it. So to be sure,
-
00:16:34 use the FP16 version. It doesn't matter because we are going to use it in FP8 mode and training with
-
00:16:42 24 gigabyte GPUs and it will cast the precision automatically. So select this. You see, this is
-
00:16:48 the base model that I have. Now selecting training images is super important and so many people are
-
00:16:55 making mistakes here. If you know how to set up Kohya folder structure, you can already set it up
-
00:17:01 yourself, but don't use it if you are not an expert. So in the dataset preparation section,
-
00:17:08 we are going to prepare our dataset. The dataset preparation is extremely important. I have written
-
00:17:15 some information here on how to caption them, how to crop them. I already have an auto cropper and
-
00:17:22 auto resize script, but you can also manually auto crop and auto resize them. So currently my
-
00:17:29 training images are auto cropped and auto resized to 1024 pixels as you are seeing right now. These
-
00:17:36 scripts that you will find are extremely useful. Watch this video and check out this script to
-
00:17:41 automatically crop the subject with focus and then resize to get your training dataset. Once
-
00:17:47 you have your dataset like this, copy the path of this dataset or you can also select the path from
-
00:17:54 here. You see training images directory. Click this icon, go to the folder like this and select
-
00:18:00 the folder like this. Now this is super important: repeating. What does repeating mean? We are going
-
00:18:07 to set the repeating to 1 because we are not going to use classification / regularization images.
-
00:18:13 I have tested the impact of regularization and classification images. You can see the results in
-
00:18:17 this research post. For example, let's open this post and no matter what I have tried, there is no
-
00:18:25 way to improve the likeness and the quality with classification regularization images. In none of
-
00:18:31 the cases did it yield better results. When I used the classification regularization images,
-
00:18:38 you see, you will get mixed faces like this. And if you still want to know what repeat means,
-
00:18:45 there is a link here. Open this link. In this link, I have asked Kohya to explain the logic
-
00:18:53 of repeating. The logic of repeating is initially made to balance imbalanced datasets. What does
-
00:19:00 that mean? Let's say you have 5 concepts that you want to train at once. And one of the concepts
-
00:19:06 has 100 training images. The other one has 20 images. The other one has 30 images. So in machine
-
00:19:12 learning, you want your datasets to be balanced so that each concept is trained roughly equally. Therefore, if you have
-
00:19:19 100 training images, you make the repeating 1. And for other concepts, if there are 20 images,
-
00:19:25 you make the repeating 5 of that concept. However, since I am training a single concept
-
00:19:31 right now, I am going to set repeating 1 and I am not going to use regularization images because
-
00:19:37 for FLUX training, it doesn't improve the results. However, when you train Stable Diffusion 1.5 or
-
00:19:44 Stable Diffusion XL (SDXL), it really improves the quality of training. At the end of this tutorial,
-
00:19:51 I will show you how to load the very best configuration for SD1.5 and SDXL and train
-
00:19:58 them. Don't worry about that. So we did set the training images directory here. We are going to
-
00:20:03 set the instance prompt. I am going to use a rare token ohwx because it contains very
-
00:20:09 little knowledge. Also, with FLUX training, it is not as important as before, but still use it. And
-
00:20:16 as a class prompt, I am going to use "man." Even if you don't use a class prompt, since the FLUX
-
00:20:23 model has an internal system that behaves like a text encoder and automatic captioning, it will
-
00:20:30 still know what you are training as if you had fully captioned it. So even if you don't provide
-
00:20:36 any instance prompt or class prompt, it will work with FLUX. I have also tested it. In the research
-
00:20:43 topic you will see the training results with ohwx man, with only ohwx, and with only man. And you
-
00:20:52 will see that almost in all cases, it perfectly generates your concept. Still, I find it a little
-
00:21:00 bit better to set the class prompt as "man" or if you are training a woman, it's "woman." If you are
-
00:21:05 training a car, it's "car." If you are training a style, it's "style." But training for a man,
-
00:21:10 I use "man." Even if you don't use it, it will work. If you are training multiple people in one
-
00:21:16 training, you don't need to set a class prompt. Just give each one of them a rare token like ohwx,
-
00:21:23 like bbuk, and other rare tokens that you can decide yourself, and train all of them at once.
-
00:21:30 But currently, this is for single-person training. So this is the setup. Instance prompt, class
-
00:21:35 prompt, and destination directory. This is super important. The Kohya GUI will prepare the correct
-
00:21:41 folders and save them at that destination. So I am going to use the installed folder like here. You
-
00:21:48 can use anywhere, and I will save train images like this and then click prepare training data
-
00:21:55 and watch the CMD. You will see that it is done creating the Kohya SS training folder structure.
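For reference, the layout that the Prepare training data button generates looks roughly like what the commands below would create; the base path is illustrative, so use whatever destination you chose:

    :: illustrative recreation of the generated structure
    mkdir "R:\train_images\img\1_ohwx man"
    mkdir "R:\train_images\model"
    mkdir "R:\train_images\log"
    :: your photos go inside "1_ohwx man"; the leading 1 is the repeat count and
    :: "ohwx man" serves as the caption when no .txt caption files exist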
-
00:22:01 When you enter inside that folder, you see the train images folder generated here. When I enter
-
00:22:07 inside it, you see there is an img folder, and inside it, there is 1_ohwx man. So what does this
-
00:22:15 mean? This means that the 1 is the repeat count. This is super important. If you set this to 100,
-
00:22:21 at every epoch, it will repeat 100 times. So this is super important. Kohya will always read the
-
00:22:27 repeating number from here, and the rest will be simply the caption of my images. If I wanted to
-
00:22:34 caption them, I could caption them, and it will read the captions. If you don't caption your
-
00:22:39 images, it will read the folder name as a caption. So let's also make a captioned example. I will
-
00:22:45 copy and paste this like this. The name of this folder will not be important because I'm going to
-
00:22:49 caption it. So copy this path, and for captioning, I have an amazing application called Joy Caption.
-
00:22:54 Click here. When you get to this post, you will see the installer for Joy Captioning. This is a
-
00:23:00 very advanced application. It supports multi GPU captioning as well because I am going to
-
00:23:06 caption my entire regularization images dataset, and I am going to make a LoRA on that. There will
-
00:23:12 be a total of 20,000 images that I am going to caption. So it supports multi GPU and also batch
-
00:23:19 captioning. It is already installed. Let me open it and show you how to caption. So Joy Caption 1.
-
00:23:25 Let's start the Windows application. Let's start like this. The application started. All I need
-
00:23:31 to do is give the input folder path here and just start batch captioning. You can decide on multiple
-
00:23:39 GPUs, multiple batch sizes. All is working. This is a very optimized application. There
-
00:23:43 is an override existing caption file, append new caption file, and max new tokens. You don't
-
00:23:49 need to set these parameters. You can just change the max new tokens. I'm going to make a dedicated
-
00:23:54 tutorial for this application hopefully later. You can see the progress on the CMD here. So first it
-
00:24:01 is loading the checkpoints to start captioning. The captioning started and it is captioning the
-
00:24:07 files. When I enter inside this file, you will see that it is generating captions like this. When
-
00:24:12 you open the text file, you will see the caption it generated for that image. However, if you do
-
00:24:19 captioning, it will reduce the likeness of your concept if you are training a person or
-
00:24:26 similar subjects. Captioning may work better for styles. I am hopefully going to test it,
-
00:24:32 but still, if you want to do captioning, you can use this application to caption. I
-
00:24:37 have also compared the captioning effect. So let's search the caption in this post and let's see.
-
00:24:45 Yes, here the captioning results are here. Let's open it. And when we look at the caption results,
-
00:24:51 there will be a slightly reduced likeness. However, I didn't see improvement in the
-
00:24:58 environment, generalization, or the overfitting. This is because images are effectively captioned internally by the
-
00:25:06 FLUX architecture. So whether I caption or not, it doesn't matter much. The likeness is still there.
-
00:25:12 It's a little bit reduced and it doesn't improve the overall quality. So for person training,
-
00:25:18 I don't suggest captioning, but for training a style, it may work better. As I said, hopefully
-
00:25:23 I am going to research it and make a tutorial for style captioning. After you did the caption,
-
00:25:28 you should still add an activation token. What was the activation token? The instance prompt, ohwx. So you
-
00:25:34 can either edit them manually or I have a caption editor. The caption editor is posted here. When
-
00:25:41 you go to this link, you will see the instructions and downloader zip file. It's a very, very
-
00:25:47 lightweight Gradio application. Let's open the image caption editor, start the application. It
-
00:25:53 doesn't use any GPU. It is just Python code that allows you to edit the image captions. It is web-based so
-
00:26:00 it can be used anywhere. So you need to enter the input folder path. This was our folder path. Let's
-
00:26:06 enter it and it will automatically refresh and scan the images. You see, these are the images.
-
00:26:10 When I click here, it will show me the caption. I can either manually edit it here like ohwx.
-
00:26:18 Then I can click save caption and it will save it. Then I can click the next image. Then I can filter
-
00:26:24 images by processed or unprocessed. This is an extremely useful application. So with processed,
-
00:26:30 you see this was the saved processed and these are the still waiting ones. This is a very,
-
00:26:35 very lightweight application. Let's make this again as "man" because I am going to show you
-
00:26:40 another thing. Let's save the caption. Then in the batch edit options, you can enter the folder
-
00:26:46 and you can replace words like replace "man" with "ohwx man" and replace all occurrences and it is
-
00:26:54 going to override all the files. Apply batch edit and all captions are edited. Then you can check
-
00:27:01 all the captions to see whether they contain my activation token or not like this check word and
-
00:27:07 you see all captions contain the specific word or phrase. Then when I refresh here and open
-
00:27:14 an image, you see this photograph features a "ohwx man standing." So all the captions are
-
00:27:20 now ready to use. Since this folder exists in my images folder, it will train both of them,
-
00:27:27 but we don't need captioned images now. So I'm just going to delete this folder. OK,
-
00:27:31 let's return back to the Gradio interface where we set up. After we clicked, prepare training data,
-
00:27:37 we also need to click "copy info to respective fields." Otherwise, you will see that the image
-
00:27:42 folder is not correct. You see it is incorrect. So click "copy info to respective fields" and you
-
00:27:48 will see that it did set the folder like this "train images img." So what does this mean? It
-
00:27:55 didn't give the path of this folder. It gives the parent folder path here so you can have multiple
-
00:28:02 concepts, multiple persons or items, anything inside this folder, and Kohya will read each one
-
00:28:10 of the folders, and whether or not you have captioned them, it will
-
00:28:15 read the folder name or the captions and it will train based on all of the images. You can also set
-
00:28:21 different repeating numbers for each concept, as I have explained. Let me show you. Let's say I have
-
00:28:27 3 concepts. This can have 3 repeating. This can have 5 repeating. So the images inside
-
00:28:33 this one will be repeated 5 times. Images inside this one will be repeated 3 times.
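Concretely, a multi-concept layout with different repeat counts could look like this; the tokens and class names are just examples:

    :: each subfolder is named <repeats>_<rare token> <class>
    mkdir "R:\train_images\img\5_ohwx man"
    mkdir "R:\train_images\img\3_bbuk woman"
    :: images under 5_ohwx man are seen 5 times per epoch, those under 3_bbuk woman 3 times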
-
00:28:37 This is the logic of setting up folders. Let's delete them. Now we have set the model path,
-
00:28:45 trained model output name, whatever the name you give, it will generate checkpoints with that name.
-
00:28:50 Let's say "Best_v2." OK, this will be the output model name for LoRAs that are going
-
00:28:58 to get generated. FLUX1 selected. Everything is selected. You don't need to set anything and you
-
00:29:03 don't need to use dataset preparation again, but you can prepare multiple datasets by using here.
-
00:29:09 With Stable Diffusion XL and with Stable Diffusion 1.5, we use regularization images. At the end
-
00:29:15 of this tutorial, I will show that too. So I'm just skipping that part. And the parameters. In the parameters what you
-
00:29:21 need to set are a few things. First of all, how many epochs you are going to train? We are going
-
00:29:28 to train based on epochs and one epoch means that all of the images are trained one time.
-
00:29:35 When we use regularization images, we use a different strategy. We train 200 repeating and
-
00:29:41 1 epoch. But since we don't use regularization images and since we use 1 repeating, we are
-
00:29:46 going to use epoch strategy. So if you have 100 images, you can reduce this epoch number. However,
-
00:29:53 you can still train up to 200 epochs and compare checkpoints to get the very best checkpoint. So as
-
00:29:59 you increase the number of images for a single concept, you may want to reduce the number of
-
00:30:04 epochs. Currently, most people will collect like 15 to 25 training images, maybe 50. So training
-
00:30:11 up to 200 epochs and comparing checkpoints is the best strategy. About the training dataset:
-
00:30:17 this training dataset is not great. Why? Because it contains the same clothing, same background
-
00:30:25 environment, and it is lacking expressions. Many people are asking me how to generate expressions.
-
00:30:31 If you want to generate expressions, you need to have expressions, emotions in your training
-
00:30:37 dataset. I am preparing a much better training dataset. It is not ready yet. I am still using
-
00:30:43 the same dataset so I can compare it with my older trainings. However, hopefully I will make another
-
00:30:49 video for an amazing training dataset. You will see that. So when you are preparing your training
-
00:30:54 dataset, have different poses, have different distances like full body shots, close shots, have
-
00:31:00 different expressions that you want to generate like laughing or crying, whatever you want, have
-
00:31:06 different clothing and have different backgrounds. The quality of the dataset is very important. With
-
00:31:12 FLUX, it is still very flexible. It is better than the SDXL or SD 1.5. However, as you improve your
-
00:31:19 dataset, you will get better results. Another thing: make sure that your dataset has
-
00:31:24 excellent focus, sharpness, and lighting. This is also very important. Do not take pictures at
-
00:31:31 nighttime. So make sure that your images have very good lighting, very good focus, and very
-
00:31:36 good sharpness. OK, let's return back to Kohya GUI. So I will leave this at 200. You can reduce
-
00:31:42 this based on the number of images you have, how many checkpoints you want to get. You can make
-
00:31:47 this like 10, and after every 10 epochs, it will generate a checkpoint. And at the end of training,
-
00:31:53 you can compare all of them. I am going to show you how to use and compare them. There is no issue
-
00:31:58 with it. So you can make this 10 and compare all the checkpoints and find the very best checkpoint
-
00:32:04 you liked. So this is the most optimal way of obtaining the very best model and checkpoint.
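As a quick sanity check on how many checkpoints you will end up with, assuming 200 total epochs:

    :: saving every N epochs yields 200/N checkpoints to compare
    set /a CKPTS_EVERY_10=200 / 10
    set /a CKPTS_EVERY_25=200 / 25
    echo %CKPTS_EVERY_10% %CKPTS_EVERY_25%
    :: prints 20 8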
-
00:32:11 But for this one, let's leave this as 25. Caption file extension. If you are going to use captions
-
00:32:18 instead of the folder names, you need to select the correct caption extension. By default,
-
00:32:23 it is TXT and TXT is the most used one. You can also use caption and cap. I never used them.
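To make the caption format concrete, a caption is just a plain TXT file sharing its image's base name; a minimal example (file name and text purely illustrative):

    :: creates a one-line caption for photo_001.jpg in the same folder
    echo ohwx man standing in a park, wearing a blue shirt> "R:\train_images\img\1_ohwx man\photo_001.txt"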
-
00:32:29 TXT is just working fine for me. Then you don't need to change anything else here. By default,
-
00:32:35 we are using 128 to 128 network rank. OK, this is another thing that you need to set. This is
-
00:32:42 super important: VAE path. So for VAE path, we set this ae.safetensors file. It is already downloaded.
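For orientation, the files this section points at are usually named as below; these are the common Hugging Face file names, so treat them as assumptions and match whatever your downloader actually saved:

    :: typical contents of the downloaded models folder (names may vary)
    :: flux1-dev.safetensors     - FLUX Dev base model, FP16, ~23.8 GB
    :: ae.safetensors            - FLUX VAE (the VAE path set here)
    :: clip_l.safetensors        - CLIP Large text encoder
    :: t5xxl_fp16.safetensors    - T5-XXL text encoder, FP16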
-
00:32:51 Let's go to the folder, which was inside here. I had downloaded the ae.safetensors file here. And we
-
00:32:58 are going to use clip large model. Click here. Let's go to the clip large. OK, it is also set
-
00:33:04 and we are going to use T5 XXL. Make sure to use T5 XXL FP16. It will be auto cast to the correct
-
00:33:12 precision. So let's also pick that file from our downloaded folder, which is here. And we are all
-
00:33:20 set. You don't need to do anything else. Just save the configuration wherever you want. So I
-
00:33:26 will save the downloaded folder by clicking this save. If you want to reload, you click this or
-
00:33:31 click refresh to reload. You see the refresh will refresh. You can also re-pick, but sometimes when
-
00:33:36 you re-pick, it may not refresh. So click this to refresh. Let's select the saved config again,
-
00:33:42 which was this one. And let's refresh. Yes, I can see the configuration reloaded. Currently, none of
-
00:33:48 the configurations are training text encoder clip large. As I said, I am researching it right now,
-
00:33:54 and hopefully I will update the configuration according to it. The training of clip large will
-
00:33:59 increase the VRAM usage slightly. It increased by like 800 megabytes on 16-bit precision.
-
00:34:06 It will also slow down slightly. You see from 8.82 seconds per iteration to 8.56 seconds per iteration on a 16-bit
-
00:34:16 precision training on an RTX A6000 GPU. OK. As I said, we are ready. And before clicking start training,
-
00:34:24 you need to check your VRAM usage. Try to make your VRAM usage under 500 megabytes. Currently,
-
00:34:31 you see my computer is using 3.5 gigabytes. Why? Because I am running OBS studio. I am running
-
00:34:38 NVIDIA broadcast and some other applications. So try to reduce your VRAM usage to 500 megabytes if
-
00:34:45 you are on limited VRAM. How can you reduce it? Go to the startup, disable all of the startups,
-
00:34:51 restart your computer, and check the performance and GPU and see how much VRAM you are
-
00:34:58 using. Alternatively, you can also use another application, which is nvitop. So to install
-
00:35:05 nvitop, pip install nvitop. I already have it done. Type nvitop and it will show you
-
00:35:12 the VRAM usage, GPU utilization, how many GPUs you have. You see I have two GPUs. So this is
-
00:35:18 the way of also seeing exact VRAM usage. You see this one shows 3.8 gigabytes I am
-
00:35:25 using. This one is showing 3.5 dedicated GPU I am using. So you can use either way. Then
-
00:35:30 click start training and watch the progress on the CMD window of the Kohya. You see it is
-
00:35:37 starting everything. You should read the messages appearing here. If you see this error, xFormers
-
00:35:44 can't load. It is not an issue. Currently, we are not using xFormers. Actually, I had fixed this,
-
00:35:49 but I am going to fix it again. But it is not an issue because currently xFormers is not working
-
00:35:54 better than the default cross-attention, which is sdpa. I will also fix xFormers later, but you can
-
00:36:01 ignore this xFormers message. It doesn't make any difference. We are not using xFormers. So what are
-
00:36:07 the messages we see here? This is important. You see it has found 1 repeat. It has found
-
00:36:13 15 images. So one epoch will be 15 steps because the batch size is 1. Increasing the batch size,
-
00:36:21 I have tested it too. It doesn't bring almost any speed gain. And as you increase the batch size,
-
00:36:27 you will get lesser quality training. Batch sizes should be used only for speed. And in this case,
-
00:36:35 it doesn't increase much. But you can use multiple GPUs to get almost linear speed increase. I have
-
00:36:41 explained the batch size in the Patreon post. So read that section and you see the regularization
-
00:36:47 factor is 1. Total steps 15. Train batch size is 1. Gradient accumulation steps 1 and 200
-
00:36:52 epochs. So it calculates the number of total steps, which is 15 divided by 1 divided by
-
00:36:58 1. Why? Because this is the batch size. This is the gradient accumulation steps multiplied by 200.
-
00:37:04 This is the number of epochs and multiplied by 1. This is the regularization images,
-
00:37:09 which we don't use. So it is 1. When you use it, it is 2. And total, it will be 3000 steps.
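To double-check that arithmetic yourself, a quick command-prompt calculation with the numbers from this particular run:

    :: 15 images x 1 repeat / batch size 1 / gradient accumulation 1, then x 200 epochs x reg factor 1
    set /a STEPS_PER_EPOCH=15 * 1 / 1 / 1
    set /a TOTAL_STEPS=STEPS_PER_EPOCH * 200 * 1
    echo %TOTAL_STEPS%
    :: prints 3000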
-
00:37:16 I hit enter because when you touch the CMD, it will pause the CMD window. And you will also
-
00:37:22 see that it didn't find any captions. It didn't find regularization images. So it is going to
-
00:37:27 use class tokens or ohwx man, which it reads from the folder name. So everything is looking good.
-
00:37:34 And it will load and start training. The speed is not very great. And you will get better speed as
-
00:37:39 it progresses. In the beginning, it will not show an accurate speed. Once Torch 2.4.1 is published,
-
00:37:48 we should get speeds like 4 to 5 seconds per iteration on Windows with an RTX 3090 GPU. But as I said,
-
00:37:55 don't worry, I'm going to make cloud tutorials. And it is going to be super fast on cloud GPUs.
-
00:38:01 You will be able to rent very powerful GPUs. And it is perfectly trainable. So this is how
-
00:38:06 we do training on Windows. It will be the same on Massed Compute and RunPod as well. The procedure
-
00:38:12 will be the same. Just the starting and installation will not be the same. So just wait until these
-
00:38:18 checkpoints are saved. And how are we going to use them afterward? Since I already have trained them,
-
00:38:25 I am not going to wait for training. In this repository, I have the training files, which is
-
00:38:31 trained with exactly the same configuration. You see the Best_v2. So I'm going
-
00:38:37 to download several checkpoints to test on my computer. When you are training, there is one
-
00:38:42 thing that is extremely important that you have to be careful of. Open the task manager, go to
-
00:38:48 the performance tab, and go to your GPU. You see that there is dedicated GPU and shared GPU memory.
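If you prefer watching VRAM from a command prompt instead of Task Manager, the nvitop tool installed earlier (pip install nvitop) and the driver's own nvidia-smi both work; a minimal sketch:

    :: live view of VRAM usage and GPU utilization (press q to quit)
    pip install nvitop
    nvitop
    :: or the driver's built-in tool, refreshing every 2 seconds
    nvidia-smi -l 2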
-
00:38:56 When you are doing training, if it uses shared memory, it will become something like 10 to
-
00:39:04 20 times slower. So this is the criterion that you need to be careful of. Compare the shared GPU memory
-
00:39:10 usage before starting the training and during the training and make sure that it doesn't use more
-
00:39:17 shared GPU during the training. If it does, then you will have extremely slow training speeds. This
-
00:39:24 is really important. This is also valid for all of the AI applications you use, like Swarm UI, like
-
00:39:32 Forge Web UI, like Automatic1111 Web UI, whatever that comes to your mind. When it starts using shared
-
00:39:38 VRAM, that will make your application roughly 20 times slower because it will start using the memory of
-
00:39:46 my computer, and my computer's system memory is way slower than the GPU memory. Moreover,
-
00:39:52 if you have low RAM in some applications, you may encounter a problem, but it is not an issue. You
-
00:39:59 can set virtual RAM memory. So you see currently I am training and I am not using any shared VRAM. My
-
00:40:06 dedicated GPU memory still has space to go like 3 gigabytes. This is super important. And
-
00:40:12 about RAM memory, you can set virtual RAM to have more RAM memory. Memory is usually important when
-
00:40:20 it is first time loading. So if you have limited RAM, you can increase the virtual RAM memory to
-
00:40:25 avoid any errors. To increase the virtual RAM memory, right-click This PC and open its
-
00:40:32 properties. It will open these properties. In here, go to the advanced system settings. And
-
00:40:37 in this screen, you will see the settings in the performance. And in here, go to the advanced and
-
00:40:43 you see I have virtual memory. Change it. Set one of your fast drives and you can set a custom size.
-
00:40:50 Currently, my virtual RAM memory is set here. You see system-managed size and it will allocate
-
00:40:57 as much virtual RAM as necessary. OK, this is the way of it. So during the training,
-
00:41:03 you will see that it will save checkpoints like this. And you will see their saved locations and
-
00:41:11 where they will be saved. They will be saved in the outputs folder. You see the output directory
-
00:41:19 for trained model. This was automatically set by the GUI when we used the Prepare training data button
-
00:41:28 and copied info to the respective fields. After you clicked "copy info to respective fields," you
-
00:41:34 can change the output directory for training the model. You can directly give the folder path of
-
00:41:40 your Swarm UI, your Forge web UI, wherever you are using your Comfy UI. So my models will be
-
00:41:46 saved automatically here. When we open that folder, we will see the saved checkpoints like
-
00:41:53 this. So these checkpoints are 2.4 gigabytes. Why? Because the FLUX model is very big. It
-
00:42:00 has 12 billion parameters. Moreover, we are saving it as float. So we are saving it with
-
00:42:08 the maximum precision without any quality loss. If you want to reduce the file size,
-
00:42:14 you can also save it as FP16. I haven't compared it. Some of the followers say that BF16 is working
-
00:42:22 badly. So if you need a lower size, save it as FP16. If you want to have the maximum accuracy,
-
00:42:29 save it as float. Saved precision will not change the VRAM usage
-
00:42:37 during training or during inference. So now it is time to use the saved models with the FLUX model
-
00:42:44 itself. So how are we going to use them after training has been completed? You can use them with
-
00:42:50 Comfy UI, with Swarm UI, with Forge web UI at the moment. I prefer to use them in Swarm UI. I already have
-
00:42:58 a main Swarm UI tutorial. It is amazing. And I also have a Swarm UI tutorial for FLUX. Moreover,
-
00:43:07 I also have a Forge web UI models downloader and installer for Runpod and also for Massed
-
00:43:13 Compute. I will show the cloud service providers in the cloud tutorial, but I will show how to use
-
00:43:21 it on my computer at the moment. So I will move these files into my Swarm UI models folder. You
-
00:43:28 see it also saves the TOML file and the JSON file wherever you are saving them. So let's select the
-
00:43:35 generated models and move them into our Swarm UI model folder. My Swarm UI is installed here. When
-
00:43:43 you watch the tutorial, you will see it. I will put the generated LoRA files into the models into
-
00:43:49 the LoRA folder here. This is important. This is where you need to put them. Before starting
-
00:43:54 training as I said, you can give this folder path. So after you have set the copy info to
-
00:44:02 respective fields, go here and change it like this and save your configuration. Then it will save the
-
00:44:09 generated LoRA files into this folder for you. OK, so I'm going to start my Swarm UI. First of all,
-
00:44:16 I am going to update it. When you are using Swarm UI, you should update it always. You can also use
-
00:44:22 Forge web UI or ComfyUI. I prefer Swarm UI because it is amazing. It has so many amazing features.
-
00:44:29 It also uses the back end of ComfyUI. So I will use the launch windows.bat file to start it.
-
00:44:35 And my ComfyUI is starting. There is also a trick that I'm going to show you. This will give you
-
00:44:41 a lot of performance boost. If you have an RTX 4000 series GPUs, click this pencil icon and add
-
00:44:49 extra arguments --fast here and save it. This will hugely speed up your generations with Swarm UI
-
00:44:57 when you are using the FLUX model. Backends are still loading. Go to the server,
-
00:45:01 go to the logs. Let's see in the debug. So we can see that it is starting the server. It is loading
-
00:45:07 everything and the data is getting refreshed. OK, so now I am ready to generate images. For the FLUX
-
00:45:13 model. I am going to use CFG scale 1. When you use CFG scale 1, you see the negative prompt
-
00:45:18 is disabled because FLUX doesn't support it. I prefer 30 steps. Based on your GPU,
-
00:45:24 you can set it. I'm going to use resolution 1. I'm going to use sampler Uni PC. I find this the
-
00:45:30 best. The scheduler will be normal and you can set the preferred D type here. Since this is a
-
00:45:35 24-gigabyte GPU, I am going to use this one. However, you can also use 16-bit on a bigger
-
00:45:43 VRAM-having GPU. This is a good one. There are also quantized models, but I haven't tested LoRAs
-
00:45:50 on them, whether they are working or not. So I don't know. And I can't say they will work. And
-
00:45:56 in the models, I have FLUX 1 version dev FP8. You can also use the FP16 model. They should work
-
00:46:05 exactly the same actually. So to be sure, I am going to use the FP16 model. I already have it
-
00:46:12 here. So let's copy it. And let's copy it into our models folder. It goes into the Stable Diffusion.
-
00:46:22 Let's see. Yeah, it goes into the UNet because this is not a quantized model. It goes into the
-
00:46:27 UNet. The quantized models go into the Stable Diffusion folder in the Swarm UI. I could also
-
00:46:33 give the path of this file from this folder before doing the training. So you don't need
-
00:46:38 to have duplicate files. What would be the case? The case would be I click this icon. Then I go to
-
00:46:46 my Swarm UI installation, which is here. Swarm UI inside models, inside UNet. And I can select this.
-
00:46:55 So you see, you can give any file path in Kohya GUI. It will just work. Then refresh the models and the
-
00:47:03 FLUX development arrived. This is an FP16, 16-bit precision model. But since I'm going to use 8-bit,
-
00:47:10 it will auto-cast it. Before starting using the LoRA, let's generate an image. And a very good
-
00:47:17 part is that in the zip file, I have shared some test prompts. So let's open the test prompts from
-
00:47:23 here. And for example, let's generate this image without our LoRA. And let's hit generate. You can
-
00:47:29 watch the progress in the server, in the logs, in the debug. So it is going to load the model. And
-
00:47:36 you see model weight dtype, manual cast. So it is going to do everything automatically for me. I can
-
00:47:42 watch the VRAM usage here. It is using ComfyUI, thus it is very, very optimized. I have selected
-
00:47:49 the model. You see the FLUX 1 development model is selected. As I said, I am not sure whether it
-
00:47:54 is working with the quantized models or not. I haven't tested yet, but you can test. For now
-
00:48:00 make sure to test with the development model. Then you can test on quantized models as well. And this
-
00:48:05 is my VRAM usage currently. Don't worry, you don't need this much VRAM. It works on GPUs with as little as 6 GB of
-
00:48:12 VRAM. It is extremely optimized. Since I have more VRAM, it is using more VRAM. When you have lower
-
00:48:19 VRAM, it will use lower VRAM. Moreover, I can also use my second GPU. It is amazing. So all I need to
-
00:48:27 do is add a new ComfyUI self-starting backend. Click here. OK. I will make this like this. Extra
-
00:48:35 arguments, starting script. And I will make this GPU ID 1 and save. So when I generate multiple
-
00:48:41 images, it will use my second GPU as well. OK. The image is almost ready. I am also using segment to
-
00:48:48 in-paint face. This is not mandatory. In the main tutorial, I explain everything. It is a little bit
-
00:48:54 slow on my GPU, but you can use the cloud always. We can see the IT per second. OK. Where is it?
-
00:49:01 Let's generate another image to see it. Let's also see. Yeah, it is fine. OK. Now it is starting to
-
00:49:07 in-paint the face. OK. Face in-painting speed is 1.5 seconds per IT. And it is doing 18 steps.
-
00:49:13 Why? Because I set it at 0.7 denoise strength. It is called differently here. It is called image
-
00:49:22 creativity. So it is doing 70%. And since I use 30 steps, it is doing 21. Actually, it should do 21,
-
00:49:31 but it did 18. OK. This is the image we got. Now I prefer to use FLUX guidance scale 4. I forgot
-
00:49:39 that. OK. Then how am I going to use my LoRAs? Go to the LoRAs tab here, refresh. And once you
-
00:49:46 refresh, you will see your FLUX LoRAs. You see type FLUX 1 LoRAs. For example, let's use the
-
00:49:52 150 epoch and hit generate. So now it is going to use my LoRA. We can see in the back end. It
-
00:49:59 will load. You can see everything here. Swarm UI works much better than the Forge web UI.
-
00:50:06 When you are using LoRAs, when you are using Forge web UI, it first processes LoRAs. It takes extra
-
00:50:12 VRAM. However, with Swarm UI, it doesn't do that. Also, when you have more VRAM, it will be faster.
-
00:50:18 Since I am using some serious VRAM right now, it is using a lot of VRAM. I am recording a video,
-
00:50:25 which also consumes VRAM. And let's also close these two. Try to reduce your VRAM usage as much as possible.
-
00:50:32 And I am already using a lot of VRAM when I am recording. We can see the preview here. First,
-
00:50:37 it is generating the image. Then it will in-paint the face to my face with the prompt
-
00:50:42 photo of ohwx man. It is in-painting the face now. This is not mandatory but in-painting the face,
-
00:50:49 especially in distance shots, will improve the face quality. Moreover, in my training images,
-
00:50:56 I have eyeglasses. You see? But since FLUX has an internal text encoder, it is able
-
00:51:03 to separate my eyeglasses from my face. Thus, in this image, I don't have eyeglasses. This is also
-
00:51:10 a little bit overfit image. But I am working on a better workflow. Hopefully, I will update it.
-
00:51:15 and it will become much better. So you can add "with eyeglasses" here. And you will get a more
-
00:51:23 likely image. Let's see. Let's hit Generate twice. So once I click it two times,
-
00:51:28 it will also start using my second GPU. It should start. Let's see. I think first it will load into
-
00:51:35 the second GPU. Then it will start. OK. Let's see the server back-ends. OK. OK. It shows here
-
00:51:41 running back-end. OK. For some reason, it didn't start. And if preview disappears here, you can go
-
00:51:47 to the image history, refresh. The last generated images will always appear here. You can see them,
-
00:51:52 their features. FLUX guidance scale, the LoRA. You can also set different LoRA weights from here. OK.
-
00:51:59 Image is generated. And you see it is now much more resembling. We can see from the training
-
00:52:06 images, it has a perfect resemblance. It is, as I said, more overfit. With the fine-tuning, full
-
00:52:13 model training I think this overfitting problem will be solved. I am also doing more research
-
00:52:18 right now, as I said. So I will update the config as I find better configurations. Every day something new is
-
00:52:24 arriving. Also, you may end up with overfit, over-trained model checkpoints. So how are we going to test
-
00:52:33 and find the very best checkpoint? This is a super important part. Let's also look at the other generated
-
00:52:38 image. If you have an RTX 4090, it will be way faster than this. It will be many times faster
-
00:52:44 than this. You can always see the step speed here. It is around 1.5 seconds per iteration for me. And the
-
00:52:49 second image was also generated with a perfect resemblance. OK. So how am I going to find the very best
-
00:52:55 checkpoint? To do that, we go to the Tools. And in here, we select the Grid Generator. In here,
-
00:53:03 in the first tab, I use LoRAs. So find the LoRAs from here. Let's find it. You can also type to
-
00:53:09 filter like this: LoRAs. When you click Fill, it will fill all the LoRAs. Then you can delete
-
00:53:15 the ones that you don't want to test. Let's start testing from the 50-epoch checkpoint. So these are epochs:
-
00:53:20 epoch 50, epoch 75. If you save based on the step count, it will have a different naming. So
-
00:53:26 the last one will be the 200-epoch checkpoint. Then you can test multiple prompts or you can use this prompt
-
00:53:32 for testing. If you want to test multiple prompts, I have already prepared prompts. So these prompts
-
00:53:39 are shared here. You see test prompts, no segment in-painting. This doesn't have any segmentation
-
00:53:45 in-painting. And I have test prompts here, test prompts with eyeglasses. And test prompts, let's
-
00:53:52 see. One of them doesn't have eyeglasses. OK, this one doesn't. So I am going to change this name.
-
00:53:59 I will fix this. Test prompts without eyeglasses, grid formatted. So you just copy this. By the way,
-
00:54:05 in the grid, this is the prompt separator. You see like this. Then copy-paste it here. You see it has
-
00:54:12 split each prompt. And now when I generate a grid, it will test all of the models for me. Let's see,
-
00:54:21 generate. This time it should use the second GPU as well, I believe. Let's see what will happen.
-
00:54:27 OK, now it started loading onto the second GPU as well. So when generating this grid, it will use
-
00:54:34 both of my GPUs. However, with my GPUs, this will take a huge amount of time because there are 274 generations.
-
00:54:43 This estimated time will get better. But this will make a huge grid. You should wait for
-
00:54:49 it to update. You can rent, for example, A6000 GPUs on Massed Compute and use all of them at once. It will be
-
00:54:56 much faster. Hopefully, I will show you in the Massed Compute tutorial. OK, the first
-
00:55:01 image is generated. The first image is for verifying the model sanity. You see, it is perfectly able to
-
00:55:08 generate a supercar image. Nothing like me. So the model sanity is perfect. It is still keeping its
-
00:55:15 sanity. And what kind of results are we going to get after these grids are generated? For now, I
-
00:55:22 will interrupt this with "interrupt all sessions." And I will show you from my cloud machine. So in
-
00:55:29 the Swarm UI Massed Compute tutorial, there are these new instructions. Copy this. First of all,
-
00:55:35 you need to install this. OK, let's copy this, open a new terminal window here, paste it,
-
00:55:43 and hit enter. This will start Swarm UI on Massed Compute. But it will give me a cloud URL that I
-
00:55:51 can use on my computer. It will be here. It is a Cloudflare URL. Let's copy it, paste it, and open it.
-
00:55:56 And this is where I do my testing, my experimentation. When I go to the tools and
-
00:56:02 grid generator, I have so many previous tests like this. You see, I have 8 GPUs running there.
-
00:56:09 For example, let's open one of them. OK, let's open "best new 150 epochs". And when I click here,
-
00:56:16 it will show me the grid results. I explain all of this in the main tutorial. In the advanced
-
00:56:22 settings, you can select which prompts to show and which models to show. And we will get
-
00:56:29 a grid like this, where we will be able to compare different LoRA checkpoints. So by analyzing this grid,
-
00:56:37 you need to find the checkpoint you like best. Usually 150 epochs is good, but it depends on your
-
00:56:45 training dataset. So generate a grid and analyze all of the generated images. You will see at
-
00:56:52 the top the model used. Also, when you click the image in the bottom, you will see which
-
00:56:57 LoRA is used. For example, for this one, the LoRA used is, let's see, Best v1_5e_05,
-
00:57:05 150. It also shows the name of the LoRA used here. So this is the way of finding the very best
-
00:57:12 LoRA checkpoint that you have. For example, on this Massed Compute instance, I have 8 GPUs running
-
00:57:18 and I am able to generate images ultra-fast. This is how I do my experimentation. There
-
00:57:23 is no other way to do this many trainings on a single GPU. So you need to spend a considerable
-
00:57:29 amount of money. Thankfully, Massed Compute is supporting me. So if you are wondering how
-
00:57:35 I generated those amazing pictures that I showed in the intro: I generated amazing
-
00:57:42 new prompts, used the wildcard feature of Swarm UI, and wrote all these
-
00:57:49 prompts into this wildcard, and I generated 9,999 images until I
-
00:57:56 stopped it. The new prompts are shared in the test prompts file, in the 340 prompts used as wildcards. OK,
-
00:58:05 we have covered everything, and there is also extra information here on how to train and
-
00:58:11 use them on RunPod and Massed Compute. The instructions are already included in the file. You will see
-
00:58:17 the Massed Compute, Kohya FLUX instructions. You will see RunPod install instructions. Hopefully,
-
00:58:23 I'm going to make separate tutorials for them. Also, I suggest you save your models on Hugging
-
00:58:29 Face if you want to keep them and download them fast later. I have an amazing notebook and tutorial for that; watch it.
-
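If you want a starting point for uploading checkpoints, here is a minimal sketch using the huggingface_hub library. It is my own example rather than the notebook mentioned above, and the token, folder path, and repository name are placeholders you would replace:

```python
from huggingface_hub import HfApi

api = HfApi(token="hf_...")  # your Hugging Face access token (placeholder)

# Hypothetical private repository for your trained LoRA checkpoints.
api.create_repo(repo_id="your-username/flux-lora-ohwx", private=True, exist_ok=True)

# Upload the whole output folder produced by Kohya (path is hypothetical).
api.upload_folder(
    folder_path=r"D:\example_training\model",
    repo_id="your-username/flux-lora-ohwx",
    repo_type="model",
)
```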
00:58:34 I am also often asked how to do SDXL and SD 1.5 training with the newest Kohya
-
00:58:42 interface. Everything is almost the same. Therefore, I will quickly show you. For
-
00:58:48 example, let's begin with SDXL. Let's open the best configuration. In the very top we will see
-
00:58:54 the configuration, here. Let's download the Tier 1, 24 GB, Slower V2 one. There is also a Tier 2
-
00:59:01 low-VRAM version, and the configuration is downloading. Let's close the Swarm UI and let's open our latest
-
00:59:10 Kohya installation, which is here. Let's start it again. The configuration is downloaded. Kohya is
-
00:59:18 starting. OK, then these are full fine-tuning configurations, not for LoRA. So select the
-
00:59:27 DreamBooth tab here. Go to the configuration. Click this icon. By the way, this is the SD3 FLUX
-
00:59:33 branch. So there may be some errors. You need to install it the normal way. If it doesn't work,
-
00:59:40 I don't know; I didn't test it. So let's go to the Downloads folder. Select the file, and it is loaded.
-
00:59:46 You see this is DreamBooth. SDXL is selected. We do FP16 training. Now you set everything
-
00:59:52 exactly the same: the trained model output name, the image folder path, the pre-trained model
-
00:59:57 path. But this time, what differs is that you should use regularization images. And regularization images
-
01:00:04 are posted here. I have an amazing 5,200 images for both women and men. So what
-
01:00:14 you do is, when you go to the dataset preparation, you also set the regularization images and you put
-
01:00:21 200 repeats; the number of repeats depends on your number of training images.
-
01:00:26 Let's say you have 50 training images. Since we have 5000 regularization images,
-
01:00:32 you can make this 100 and train 2 epochs. But let's say I have 15 images. So I
-
01:00:38 make the repeat count 200. I put my training images path here. Then I also go to the downloaded
-
01:00:46 regularization images. Let's see. For example, these are 768 pixels. I also
-
01:00:51 have 1024 pixels. Yeah, here. This is an amazing dataset. Super
-
01:00:56 high quality. I tested and compared the effect of using regularization images and it's mind-blowing,
-
01:01:02 especially with OneTrainer. And you put it here. This time the repeat count will be 1. You
-
01:01:07 just type ohwx and man, and it will auto-generate the folders for us with accurate naming. Let me
-
01:01:14 show you. So let's make the destination an example place like music folder D. Usually you shouldn't
-
01:01:20 use the users folder. But this is just for an example. Prepare training data. Watch the CMD
-
01:01:26 window because copying may take time. Wait until you see done. Currently it is copying everything.
-
01:01:32 We can see the image folder 200_ohwx man is copied. And it is also going to copy the man dataset. OK,
-
01:01:40 this one is copied. And in the reg folder, OK, this one is copied too. You see, the regularization images are
-
01:01:46 put into the reg folder and the training images are put into the image folder. And inside the reg
-
01:01:52 folder, the man images are named 1_man, because of one repeat and the man class token, and the training
-
01:02:01 images are named 200_ohwx man.
-
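To make that naming convention concrete, here is a minimal sketch of the folder layout that the "Prepare training data" step generates, written with pathlib. The destination path is hypothetical, and the model/log subfolders are the usual Kohya output layout (my assumption); only the two dataset subfolder names are taken directly from the walkthrough:

```python
from pathlib import Path

dest = Path(r"D:\example_training")  # hypothetical destination directory

# Training images: <repeats>_<instance token> <class token>
(dest / "img" / "200_ohwx man").mkdir(parents=True, exist_ok=True)
# Regularization images: 1 repeat, class token only
(dest / "reg" / "1_man").mkdir(parents=True, exist_ok=True)
# Checkpoints and logs also live under the destination (assumed layout)
(dest / "model").mkdir(exist_ok=True)
(dest / "log").mkdir(exist_ok=True)
```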
01:02:08 So what else changes? Now we train only 1 epoch. So how are we going to save checkpoints in this case? Let's copy the info to
-
01:02:13 the respective fields, and let's also select the pre-trained model path. I already have models
-
01:02:19 here; let's see, SDXL base, for example. By the way, which models do I suggest? I suggest you use
-
01:02:31 Realistic Vision version... RealVis XL 4 for training SDXL; and for training SD 1.5, I suggest, what was the name, Hyperrealism
-
01:02:40 version 3. So you should select those models. And when you click the print training command,
-
01:02:46 it will show you the total number of steps. And it is now 6000 because I have 15 training
-
01:02:53 images, I use regularization images, and I have 200 repeats. Therefore, let's see if it does
-
01:02:59 show the calculation. Yes, 200 repeats, 15 images. It makes 3000 steps per epoch. Since
-
01:03:06 regularization images are used, it is multiplied by 2, so 6,000 steps in total. And let's say I want
-
01:03:12 to save every 20 epochs. So normally it would be save every 20 epochs. But now I don't have that
-
01:03:20 option since I train 1 epoch. So I will save by number of steps. When you divide 6000 by 10,
-
01:03:28 because we want to get 10 checkpoints, you will change the save every N steps. Let's
-
01:03:35 see here. So if I make this 601, saving will be equivalent to saving every 20 epochs compared to the
-
01:03:45 repeat-1 case. This is the logic of it. This will save 10 checkpoints during the entire training.
-
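Here is the same step math as a quick sanity check in Python. This is my own sketch of the calculation described above; it assumes a batch size of 1, which is what the reasoning above implies:

```python
train_images = 15
repeats = 200
epochs = 1
uses_reg_images = True   # regularization images double the step count
batch_size = 1           # assumed; larger batches divide the step count

steps_per_epoch = train_images * repeats // batch_size                   # 3000
total_steps = steps_per_epoch * epochs * (2 if uses_reg_images else 1)   # 6000

desired_checkpoints = 10
save_every_n_steps = total_steps // desired_checkpoints                  # 600
print(total_steps, save_every_n_steps)  # 6000 600 (the video rounds this to 601)
```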
01:03:50 And this "Save every N epochs" is 1 and the epoch count is 1. This will save full checkpoints. For SDXL,
-
01:03:58 it will be over 6 GB. For SD1.5, it will be like, I don't remember actually, but it will be big. So
-
01:04:05 if you need LoRAs, do DreamBooth fine-tuning training this way, then extract a LoRA. It
-
01:04:12 will work much better than training a LoRA directly with SDXL or SD 1.5. I have tested it. For extracting a LoRA,
-
01:04:20 I have an amazing post here. It shows how you can extract. Let me also show you quickly. Go
-
01:04:25 to Utilities, then to LoRA, and in here you will see Extract LoRA. Whether it is SDXL or not,
-
01:04:32 you pick it here. If you don't pick SDXL, it will extract as SD1.5. If you pick SDXL, it will
-
01:04:40 extract as SDXL. Currently, it doesn't support FLUX. I don't see it here. Oh, here, click here.
-
01:04:47 FLUX Extract LoRA has also arrived. I haven't tested it yet. I will hopefully test it after I train a full
-
01:04:53 fine-tuning. So select your fine-tuned model. This is the generated model. Select the base model.
-
01:04:58 This is, for example, RealVis XL 4. Set the path where you want to save. Save precision: you can save
-
01:05:04 as float, but it will double the size. Load precision: you can load as float. Set this minimum difference to
-
01:05:09 00001. So it will also save the text encoder because I train the text encoder as well. And you can
-
01:05:15 set the network dimension to 128. This is it. For SDXL, you just select SDXL. And that is
-
01:05:23 the way of extracting LoRAs and using them when you do training with SD 1.5 and SDXL.
-
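For intuition about what the Extract LoRA utility does with the network dimension and minimum difference settings, here is a conceptual sketch, not kohya's actual code: per layer, it takes the difference between the fine-tuned and base weights and approximates it with a low-rank factorization via SVD. The function name, and restricting it to 2D (linear-layer) weights, are my own simplifications:

```python
import torch

def extract_lora_pair(w_tuned: torch.Tensor, w_base: torch.Tensor,
                      rank: int = 128, min_diff: float = 1e-4):
    """Approximate (w_tuned - w_base) with rank-limited down/up matrices."""
    diff = (w_tuned - w_base).float()
    if diff.abs().max() < min_diff:   # layer barely changed: skip it
        return None
    u, s, vh = torch.linalg.svd(diff, full_matrices=False)
    rank = min(rank, s.numel())
    up = u[:, :rank] * s[:rank]       # roughly the "lora_up" weight
    down = vh[:rank, :]               # roughly the "lora_down" weight
    return down, up                   # up @ down approximately equals diff

# A higher rank (network dimension) preserves more of the weight difference;
# a smaller min_diff keeps more layers, including lightly-trained ones.
```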
01:05:30 In my post, you will see very detailed instructions, so always read the posts here. Moreover, you should join
-
01:05:38 our Discord channel. I am always on the Discord channel. You can message me there. At the very top
-
01:05:44 of the post, you will see SECourses Discord. When you click it, you will see our Discord
-
01:05:49 channel. We have over 8,000 members, over 1,000 online members. Just click Join Server to join.
-
01:05:56 Moreover, we have a Patreon exclusive post index. In here, you will see all of our amazing sharings.
-
01:06:02 You can just press Ctrl+F to search them. Also, we have Patreon Scripts Updates History. You
-
01:06:08 will see which scripts were updated last and what changes were made. Sometimes I don't write the full
-
01:06:15 changes. And we have Patreon Special Generative Script List. This shows the useful scripts that
-
01:06:21 you can use for other tasks, other jobs you have. And you can also go to our GitHub repository here,
-
01:06:29 Stable Diffusion Generative AI. When you go there, please Star it,
-
01:06:36 Fork it, and Watch it. Also, if you sponsor, I appreciate that. You see, we have 2,000 stars.
-
01:06:42 We have 200 forks and 82 watching. Moreover, we now have a subreddit: go to the SECourses subreddit.
-
01:06:50 I follow every comment and post made here. I will reply to all of them. I am also sharing
-
01:06:56 a lot of announcements here. And I have a real LinkedIn account. I am not an anonymous person,
-
01:07:02 obviously. You can go to my LinkedIn account. You can follow me. You can connect with me. It is
-
01:07:08 fine. I also reply to every message here. This is it. I hope I have covered everything that you have
-
01:07:14 been wondering about. Hopefully, I will see you in future FLUX tutorials, because a lot of tutorials
-
01:07:20 are coming, like training on RunPod, training on Massed Compute, and fine-tuning. I think
-
01:07:26 fine-tuning will be way better. And I am going to update this post and write the very newest
-
01:07:33 findings that I have. Also, you should check out this lengthy research post because you will find
-
01:07:40 a lot of information here. Let me show you one thing. For example, FLUX training discussions
-
01:07:45 with lots of information in this post. You can open them. And there is new information that
-
01:07:52 shows why Windows training is currently slower than Linux training. With Torch 2.4.1 hopefully,
-
01:08:00 we are going to get an amazing speed boost on Windows without doing anything and without
-
01:08:05 losing any quality. And it is 75% completed. So it is almost there. I will update my installer
-
01:08:12 scripts. Don't worry about that. Hopefully, see you in another amazing tutorial video.
