
FLUX LoRA Training Simplified From Zero to Hero with Kohya SS GUI 8GB GPU Windows Tutorial Guide


FLUX LoRA Training Simplified: From Zero to Hero with Kohya SS GUI (8GB GPU, Windows) Tutorial Guide



Ultimate Kohya GUI FLUX LoRA training tutorial. This tutorial is the product of 9 days of non-stop research and training. I have trained over 73 FLUX LoRA models and analyzed them all to prepare this tutorial video. The research is still ongoing, and hopefully the results will be significantly improved; the latest configs and findings will be shared. Please watch the tutorial without skipping any part. Whether you are a beginner or an expert user, this tutorial covers everything for you.

🔗 Full Instructions and Links Written Post (the one used in the tutorial) ⤵️

▶️ https://www.patreon.com/posts/click-to-open-post-used-in-tutorial-110879657

00:00:00 Full FLUX LoRA Training Tutorial

00:03:37 Guide on downloading and extracting Kohya GUI

00:04:22 System requirements: Python, FFmpeg, CUDA, C++ tools, and Git

00:05:40 Verifying installations using the command prompt

00:06:20 Kohya GUI installation process and error-checking

00:06:59 Setting the Accelerate option in Kohya GUI, with a discussion of choices

00:07:50 Use of the bat file update to upgrade libraries and scripts

00:08:42 Speed differences between Torch 2.4.0 and 2.5, particularly on Windows and Linux

00:09:54 Starting Kohya GUI via the gui.bat or automatic starter file

00:10:14 Kohya GUI interface and selecting LoRA training mode

00:10:33 LoRA vs. DreamBooth training, with pros and cons

00:11:03 Emphasis on extensive research, with over 72 training sessions

00:11:50 Ongoing research on hyperparameters and future updates

00:12:30 Selecting configurations based on GPU VRAM size

00:13:05 Different configurations and their impact on training quality

00:14:22 "Better colors" configuration for improved image coloring

00:15:58 Setting the pre-trained model path and links for downloading models

00:16:42 Significance of training images and potential errors

00:17:08 Dataset preparation, emphasizing image captioning, cropping, and resizing

00:17:54 Repeating and regularization images for balanced datasets

00:18:25 Impact of regularization images and their optional use in FLUX training

00:19:00 Instance and class prompts and their importance in training

00:19:58 Setting the destination directory for saving training data

00:20:26 Preparing training data in Kohya GUI and generated folder structure

00:21:10 Joy Caption for batch captioning images, with key features

00:21:52 Joy Caption interface for batch captioning

00:22:39 Impact of captioning on likeness, with tips for training styles

00:23:26 Adding an activation token to prompts

00:23:54 Image caption editor for manual caption editing

00:24:53 Batch edit options in the caption editor

00:25:34 Verifying captions for activation token inclusion

00:26:06 Kohya GUI and copying info to respective fields

00:27:01 "train images img" folder path and its relevance

00:28:10 Setting different repeating numbers for multiple concepts

00:28:45 Setting the output name for generated checkpoints

00:29:03 Parameters: epochs, training dataset, and VAE path

00:29:21 Epochs and recommended numbers based on images

00:30:11 Training dataset quality, including diversity

00:31:00 Importance of image focus, sharpness, and lighting

00:31:42 Saving checkpoints at specific intervals

00:32:11 Caption file extension option (default: TXT)

00:33:20 VAE path setting and selecting the appropriate ae.safetensors file

00:33:59 Clip large model setting and selecting the appropriate file

00:34:20 T5 XXL setting and selecting the appropriate file

00:34:51 Saving and reloading configurations in Kohya GUI

00:35:36 Ongoing research on clip large training and VRAM usage

00:36:06 Checking VRAM usage before training and tips to reduce it

00:37:39 Starting training in Kohya GUI and explanation of messages

00:38:48 Messages during training: steps, batch size, and regularization factor

00:39:59 How to set virtual RAM memory to prevent errors

00:40:34 Checkpoint saving process and their location

00:41:11 Output directory setting and changing it for specific locations

00:42:00 Checkpoint size and saving them in FP16 format for smaller files

00:43:21 Swarm UI for using trained models and its features

00:44:02 Moving LoRA files to the Swarm UI folder

00:44:41 Speed up Swarm UI on RTX 4000 series GPUs

00:45:13 Generating images using FLUX in Swarm UI

00:46:12 Generating an image without a LoRA using test prompts

00:46:55 VRAM usage with FLUX and using multiple GPUs for faster generation

00:47:54 Using LoRAs in Swarm UI and selecting a LoRA

00:48:27 Generating an image using a LoRA

00:49:01 Optional in-painting face feature in Swarm UI

00:49:46 Overfitting in FLUX training and training image quality

00:51:59 Finding the best checkpoint using the Grid Generator tool

00:52:55 Grid Generator tool for selecting LoRAs and prompts

00:53:59 Generating the grid and expected results

00:56:57 Analyzing grid results in Swarm UI

00:57:56 Finding the best LoRA checkpoint based on grid results

00:58:56 Generating images with wildcards in Swarm UI

01:00:05 Save models on Hugging Face with a link to a tutorial

01:00:05 Training SDXL and SD1.5 models using Kohya GUI

01:03:20 Using regularization images for SDXL training

01:05:30 Saving checkpoints during SDXL training

01:06:15 Extracting LoRAs from SDXL models

Video Transcription

  • 00:00:00 Hello everyone, today I will be guiding you  step-by-step through the process of training

  • 00:00:06 LoRA on the latest state-of-the-art text-to-image  generative AI model, FLUX. Over the past week,

  • 00:00:14 I have been deeply immersed in research, working  tirelessly to identify the most effective training

  • 00:00:22 workflows and configurations. So far, I have  completed 72 full training sessions and more

  • 00:00:30 are underway. I have developed a range of unique  training configurations that cater to GPUs with

  • 00:00:37 as little as 8GB of VRAM all the way up to 48GB.  These configurations are optimized for VRAM usage

  • 00:00:46 and ranked by training quality. Remarkably,  all of them deliver outstanding results. The

  • 00:00:53 primary difference lies in the training speed.  So yes, even if you are using an 8GB RTX GPU,

  • 00:01:01 you can train an impressive FLUX LoRA at  a respectable speed. For this tutorial,

  • 00:01:07 I will be using Kohya GUI, a user-friendly  interface built on the acclaimed Kohya

  • 00:01:13 training scripts. With this graphical user  interface, you will be able to install,

  • 00:01:18 set up, and start training with just mouse clicks.  Although this tutorial demonstrates how to use

  • 00:01:24 Kohya GUI on a local Windows machine, the process  is identical for cloud-based services. Therefore,

  • 00:01:32 it is essential to watch this tutorial to  understand how to use Kohya GUI on cloud

  • 00:01:37 platforms, even though I will be making separate  tutorials specifically for cloud setups. We will

  • 00:01:43 cover everything from the basics to the expert  settings. So even if you are a complete beginner,

  • 00:01:50 you will be able to fully train and utilize an  amazing FLUX LoRA model. The tutorial is organized

  • 00:01:56 into chapters and includes manually written  English captions, so be sure to check out the

  • 00:02:02 chapters and enable captions if you need them. In  addition to training, I will also show you how to

  • 00:02:08 use the generated LoRAs within the Swarm UI and  how to perform grid generation to identify the

  • 00:02:15 best training checkpoint. Finally, at the end of  the video, I will demonstrate how you can train

  • 00:02:21 Stable Diffusion 1.5 and SDXL models using the  latest Kohya GUI interface. So I have prepared

  • 00:02:30 an amazing written post where you will find all  of the instructions, links, and guides for this

  • 00:02:39 amazing tutorial. The link to this post will be in  the description of the video, and this post will

  • 00:02:45 get updated as I get new information as I complete  my new research with new hyperparameters and new

  • 00:02:53 features. So this post will be your ultimate  guide to follow this tutorial, to do FLUX LoRA

  • 00:03:02 training by using the Kohya GUI. Kohya GUI is  a wrapper developed by the Bmaltais that allows

  • 00:03:09 us to use Kohya SS scripts very easily. This  is its official GitHub repository. Basically,

  • 00:03:16 we are using Kohya SS. However, we have a GUI  and one-click installers and easy setup to use

  • 00:03:23 the Kohya SS scripts. So you will see that I have  a zip file here. When you go to the bottom of the

  • 00:03:31 post, you will also see the attachments, and in  the attachments, you will see the zip file. I

  • 00:03:37 may have forgotten to put the zip file at the  very top of the post, but when you look at the

  • 00:03:42 attachments, you will always find the zip file. So  let's download the zip file. Click here. Move this

  • 00:03:48 zip file to any disk that you want to install. I  am going to install it on my R drive. Do not

  • 00:03:55 install it in your Windows folder or into your users folder. Install it directly into any drive, and do not

  • 00:04:02 install it in your cloud drives. Okay, right-click  and extract it. Enter the extracted folder,

  • 00:04:09 and you will see all of these files. These files  may get updated, and there may be more files when

  • 00:04:16 you're watching this tutorial. Don't get confused.  I will update everything as necessary. Currently,

  • 00:04:22 the main branch of the Kohya GUI doesn't support  the FLUX training. You see, there is a SD3 FLUX

  • 00:04:28 branch. So my installer will automatically switch  to the FLUX branch and install it for you. Once it

  • 00:04:36 is merged into the main branch main repository,  I will update my installers. To install Kohya

  • 00:04:42 GUI we are going to use the Windows_Install_Step_1.bat file. Double-click it. It will clone the

  • 00:04:47 repository, switch to the accurate branch, and it  will start the installation. For this to work, you

  • 00:04:54 need to have the requirements installed. You'll  see that we have a Windows requirements section.

  • 00:05:00 We need Python 3.10.11, FFmpeg, CUDA 11.8, C++  tools, and Git. Now, once you install these,

  • 00:05:09 you will be able to use all of the open-source AI  applications like Stable Diffusion, Automatic1111 Web

  • 00:05:15 UI, Forge Web UI, One Trainer, Swarm UI, ComfyUI,  and whatever you can imagine, like Rope Pearl, Face

  • 00:05:23 Fusion, and such things. So I have a very good  tutorial that shows how to install all of this.

  • 00:05:29 Please watch this tutorial. Do not skip it. Please  make sure that your Python is directly installed

  • 00:05:34 on your C drive. You have installed FFmpeg,  CUDA, and C++ tools. These are all important.
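
If you prefer to confirm all of these prerequisites in one go, a small script along the following lines can run the same version checks (a sketch, not part of the original tutorial; it assumes the tools are on your PATH). The manual command-prompt checks are described next.

```python
# Quick prerequisite check: Python, Git, FFmpeg, and the CUDA toolkit (nvcc).
# Assumes each tool is on PATH; FFmpeg is optional for Kohya itself.
import shutil
import subprocess

CHECKS = {
    "python": ["python", "--version"],   # expect Python 3.10.11
    "git":    ["git", "--version"],
    "ffmpeg": ["ffmpeg", "-version"],
    "nvcc":   ["nvcc", "--version"],     # shows the installed CUDA toolkit
}

for name, cmd in CHECKS.items():
    if shutil.which(cmd[0]) is None:
        print(f"[MISSING] {name} is not on PATH")
        continue
    result = subprocess.run(cmd, capture_output=True, text=True)
    lines = (result.stdout or result.stderr).strip().splitlines()
    print(f"[OK] {name}: {lines[0] if lines else 'no output'}")
```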

  • 00:05:40 How can you check whether you have installed them  or not? Type CMD and open a command prompt like

  • 00:05:45 this. Type Python and you should get 3.10.11. Open  another CMD and type Git and you should get Git

  • 00:05:52 code like this. Open another CMD and type FFmpeg  and you should get FFmpeg. FFmpeg is not necessary

  • 00:05:59 for Kohya, but you should install it because  you may use other AI applications. And type nvcc

  • 00:06:06 --version, and this will let you see whether you  have installed the correct CUDA or not. Everything

  • 00:06:13 is explained in this video. Please watch it to  avoid any errors. So the automatic installation

  • 00:06:20 will ask you these options. We are going to select  option 1 and hit enter. Then this will start

  • 00:06:26 installing the Kohya SS GUI latest version to  our computer. Currently, we are in the accurate

  • 00:06:32 branch. As I said, I will update this branch when  it is merged into the master branch. Just wait

  • 00:06:40 patiently. You see that it is showing me I have  Python 3.10.11. It is installed, and it is going

  • 00:06:46 to download and install everything automatically  for me. So the installation has been completed,

  • 00:06:52 and I have the new options. You should scroll up  and check whether there were any errors or not

  • 00:06:59 because the installation step is extremely  important. So scroll up and check all of the

  • 00:07:05 messages. So as a next step, we are going to set  the Accelerate because on some computers it may

  • 00:07:11 not be set correctly. So type 5 and hit enter.  It is going to ask us some of the options. We

  • 00:07:18 are going to use this machine. Hit enter. We are  not going to do distributed training. Hit enter.

  • 00:07:24 We are not training on Apple or anything.  We are training on CUDA drivers. So no,

  • 00:07:29 we are not using Torch Dynamo. No, we are not  using DeepSpeed. No, we are going to use all of

  • 00:07:36 the GPUs. So type "all" and hit enter. I also say  yes to this, but I didn't see any difference. Hit

  • 00:07:42 enter. And we are going to use BF16. This one. So  select this one and hit enter. And we are ready.
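
As an aside, the same single-machine, BF16 setup can also be written without the interactive prompts by calling Accelerate's helper from Python (a hedged sketch; run it in the Python environment Kohya GUI installed, and treat it as optional, since the menu above does the same thing).

```python
# Optional: write a basic single-machine Accelerate config with bf16 mixed
# precision, instead of answering the interactive "accelerate config" prompts.
# Run this inside the Python environment that Kohya GUI uses (an assumption).
from accelerate.utils import write_basic_config

write_basic_config(mixed_precision="bf16")
```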

  • 00:07:50 You don't need to do option 2, option 3, or option 4. But we are not going

  • 00:07:56 to use option 6 to start yet. So hit 7 and hit  enter. Return back to the folder. And whenever you

  • 00:08:03 are going to start training or after the initial  installation, run this bat file: Update_Kohya_and_Fix_FLUX_Step2.bat

  • 00:08:11 This is going to upgrade to  the latest libraries. It is going to also update

  • 00:08:16 the scripts to the latest version. So we are  going to have the very latest version with this

  • 00:08:22 bat file. I will keep these files updated. Just  wait for it to install everything. The steps are

  • 00:08:29 also written in this post file, but watching  this video is better. And the updates have been

  • 00:08:34 completed. Verify that they are all working. We  are currently using Torch 2.4.0. And it is working

  • 00:08:42 slowly on Windows compared to the Linux systems.  However, with Torch 2.4.1, hopefully this speed

  • 00:08:52 issue on Windows will be fixed. And as soon as  it is released, I am going to update my installer

  • 00:09:00 scripts. There has been a new development after I  have completed the tutorial video, which is Torch

  • 00:09:08 2.5 version. With this version, there is a huge  speed improvement on Windows. It is almost equal

  • 00:09:18 to Linux. However, Torch 2.5 version is currently  in development. So it may be broken. It may give

  • 00:09:26 you errors. I also didn't test its quality yet.  Hopefully, I will test it and I will update the

  • 00:09:32 Patreon post. You see, it is already updated. The  file that you need to use is: Install_Torch_2_5_Dev_Huge_Speed_Up.bat file

  • 00:09:43 after completing  the installation. As I said, this is experimental

  • 00:09:46 and follow the news on the Patreon post. And now  we are ready to start using the Kohya SS GUI. So

  • 00:09:54 for starting the Kohya SS GUI, you can either  enter the Kohya SS folder and start the gui.bat

  • 00:10:01 file. Or you can use my automatic starter file,  which is the Windows start KohyaSS.bat file. This

  • 00:10:08 will automatically start and open the browser.  You see the interface has started. If it doesn't

  • 00:10:14 start, you need to open this manually. All right,  now this is the interface of Kohya. It is very,

  • 00:10:19 very useful and easy to use. A very important  thing that you need to be careful of is that

  • 00:10:25 you need to select LoRA because we are currently training a LoRA. LoRA is an optimization technique. So it is

  • 00:10:33 different from DreamBooth. DreamBooth is basically  fine-tuning, training the entire model. But with

  • 00:10:38 LoRA, we are only training a certain part of the model. Thus, it requires less hardware,

  • 00:10:46 but also results in lower quality. Currently, this tutorial is for LoRA training. Hopefully,

  • 00:10:51 I am going to do full research on DreamBooth  fine-tuning and publish another tutorial for

  • 00:10:57 that. And I am going to publish configurations as  well. So this tutorial is so far the combination

  • 00:11:03 of over 64 different trainings. This  is literal. When you go to this post,

  • 00:11:09 you will see the entire research history. This is  a very, very long, lengthy post. You will see all

  • 00:11:15 of the tests I have made. You will see all of the  results, grids, and comparisons. I am also going

  • 00:11:21 to show some of the parts in this tutorial.  So read this post if you want to learn how I

  • 00:11:27 came up with my workflow and configuration. Also,  when you open this post, you will see all of the

  • 00:11:33 models that I have prepared up to now. And this is  not all. Currently, I am running 8 different

  • 00:11:41 trainings on 8x RTX A6000 GPUs on Massed Compute.  You can see the trainings are running right now.

  • 00:11:50 Currently, I am testing a new hyperparameter  and the clip large text encoder training,

  • 00:11:56 which has arrived just today. So after these tests  have been completed, I am going to analyze them,

  • 00:12:03 post the results on this research topic, and I  am going to update the configuration files. Don't

  • 00:12:09 worry, you will just load the configuration. You  will just download the latest zip file and load

  • 00:12:14 the configuration, and you will get the very best  workflow, very best configuration whenever you are

  • 00:12:20 watching this tutorial. So that is why following  this post is extremely important. The link will

  • 00:12:25 be in the description and also in the comment  section. Don't forget that. So now our interface

  • 00:12:30 started and I am going to load the configuration  for beginning the setup. I have selected the

  • 00:12:36 LoRA in the training tab. When you look at the  folder, you will see best configs and you will

  • 00:12:41 see best configs, better colors. And what do these  configurations mean? When you scroll down, you

  • 00:12:49 will see the description of each configuration. I  have prepared a configuration for every GPU. So if

  • 00:12:56 you have an 8GB GPU, you need to use this one:  Rank_9_7514MB.json file. When you enter inside

  • 00:13:05 the best configs folder, you will see that the  JSON file is there. So according to your VRAM,

  • 00:13:12 pick the configuration file. There are slow and  fast versions. What do they mean? With the slow

  • 00:13:19 version, we are going to get slightly better  training quality. There is not much difference

  • 00:13:25 between rank 1 and rank 6. After rank 6, we are  going to lose some quality because of the reduced

  • 00:13:34 training resolution and also reduced LoRA rank.  If you read the research post, you will see the

  • 00:13:40 difference of each configuration. Training under  1024 pixels seriously reduces the quality. Also,

  • 00:13:49 I find that LoRA rank 128 is the very best spot.  So these three configurations will have slightly

  • 00:13:57 lesser quality than these ones. These ones will  be very, very close. Since I have an RTX 3090 GPU,

  • 00:14:05 I am going to use the very best configuration  rank 3. Currently, this is running slow on my PC,

  • 00:14:13 but with the Torch version 2.4.1, it will be much,  much faster. So go to here and click this icon,

  • 00:14:22 this folder icon, enter inside the folder of best  config and load the rank 3. Now you may wonder

  • 00:14:30 what is the difference with better colors. This  is in experimentation. Currently, I am training

  • 00:14:36 it to decide whether to fully switch to it or  not. But this uses time step shift sampling

  • 00:14:43 and it makes a huge impact on the coloring. When  you open this file, you will see this very big,

  • 00:14:50 huge grid. And when you look at this grid file,  you will see that using the time step shift

  • 00:14:57 brings better colors and a better environment like  this one. However, it also slightly reduces the

  • 00:15:03 likeness. I am still researching it. I am still  training it. When we open this imgsli link,

  • 00:15:09 you can see a one by one comparison. So according  to your needs, decide the configuration you want.

  • 00:15:15 Hopefully, I will complete research on time  step shift sampling and I will decide the

  • 00:15:20 very best configuration. Currently, we have two  different configurations and their difference is

  • 00:15:25 like this. But to be safe, you can use the best  configs folder right now. However, you can also

  • 00:15:30 use best configs better colors too. So look at  the grid file imgsli and decide yourself. OK,

  • 00:15:37 as a next step, you don't need to set anything  here. This is for multi GPU. I also have multi

  • 00:15:42 GPU configs and I will show them hopefully in the  cloud tutorial. I am going to also make a cloud

  • 00:15:47 tutorial for Massed Compute and RunPod. So  what you need to set with this configuration,

  • 00:15:52 you need to set pre-trained model path. This is  super important. The model links are posted here

  • 00:15:58 so you can either download them from here or I  have a one-click downloader here. You see Windows

  • 00:16:03 download training models files. If you already  have some of the models, make sure that they

  • 00:16:08 are accurate models or you will get errors. This  bat file will download everything automatically

  • 00:16:14 for me. You see, it started downloading already.  However, I already have them downloaded here. So

  • 00:16:19 I am going to just pick them. If you already have  them, you can use them. We are going to use FLUX

  • 00:16:26 Dev version 23.8 gigabytes. The FP8 support also  arrived, but I haven't tested it. So to be sure,

  • 00:16:34 use the FP16 version. It doesn't matter because we  are going to use it in FP8 mode and training with

  • 00:16:42 24 gigabyte GPUs and it will cast the precision  automatically. So select this. You see, this is

  • 00:16:48 the base model that I have. Now selecting training  images is super important and so many people are

  • 00:16:55 making mistakes here. If you know how to set up  Kohya folder structure, you can already set it up

  • 00:17:01 yourself, but don't use it if you are not an  expert. So in the dataset preparation section,

  • 00:17:08 we are going to prepare our dataset. The dataset  preparation is extremely important. I have written

  • 00:17:15 some information here on how to caption them, how  to crop them. I already have an auto cropper and

  • 00:17:22 auto resize script, but you can also manually  auto crop and auto resize them. So currently my

  • 00:17:29 training images are auto cropped and auto resized  to 1024 pixels as you are seeing right now. These

  • 00:17:36 scripts that you will find are extremely useful.  Watch this video and check out this script to

  • 00:17:41 automatically crop the subject with focus and  then resize to get your training dataset. Once

  • 00:17:47 you have your dataset like this, copy the path of  this dataset or you can also select the path from

  • 00:17:54 here. You see training images directory. Click  this icon, go to the folder like this and select

  • 00:18:00 the folder like this. Now this is super important:  repeating. What does repeating mean? We are going

  • 00:18:07 to set the repeating to 1 because we are not  going to use classification / regularization images.

  • 00:18:13 I have tested the impact of regularization and  classification images. You can see the results in

  • 00:18:17 this research post. For example, let's open this  post and no matter what I have tried, there is no

  • 00:18:25 way to improve the likeness and the quality with  classification regularization images. In none of

  • 00:18:31 the cases it yielded better results. When I have  used the classification regularization images,

  • 00:18:38 you see, you will get mixed faces like this. And if you still want to know what repeating means,

  • 00:18:45 there is a link here. Open this link. In this  link, I have asked Kohya to explain the logic

  • 00:18:53 of repeating. The logic of repeating is initially  made to balance imbalanced datasets. What does

  • 00:19:00 that mean? Let's say you have 5 concepts that  you want to train at once. And one of the concepts

  • 00:19:06 has 100 training images. The other one has 20  images. The other one has 30 images. So in machine

  • 00:19:12 learning, you would like to keep the datasets balanced so that each concept is trained roughly equally. Therefore, if you have

  • 00:19:19 100 training images, you make the repeating 1.  And for other concepts, if there are 20 images,

  • 00:19:25 you make the repeating 5 of that concept.  However, since I am training a single concept

  • 00:19:31 right now, I am going to set repeating 1 and I  am not going to use regularization images because

  • 00:19:37 for FLUX training, it doesn't improve the results.  However, when you train Stable Diffusion 1.5 or

  • 00:19:44 Stable Diffusion XL (SDXL), it really improves the  quality of training. At the end of this tutorial,

  • 00:19:51 I will show you how to load the very best  configuration for SD1.5 and SDXL and train

  • 00:19:58 them. Don't worry about that. So we did set the  training images directory here. We are going to

  • 00:20:03 set the instance prompt. I am going to use  a rare token ohwx because it contains very

  • 00:20:09 little knowledge. Also, with FLUX training, it is  not as important as before, but still use it. And

  • 00:20:16 as a class prompt, I am going to use "man." Even  if you don't use a class prompt, since the FLUX

  • 00:20:23 model has an internal system that behaves like  a text encoder and automatic captioning, it will

  • 00:20:30 still know what you are training, as if you had fully captioned it. So even if you don't provide

  • 00:20:36 any instance prompt or class prompt, it will work  with FLUX. I have also tested it. In the research

  • 00:20:43 topic you will see the training results with ohwx  man, with only ohwx, and with only man. And you

  • 00:20:52 will see that almost in all cases, it perfectly  generates your concept. Still, I find it a little

  • 00:21:00 bit better to set the class prompt as "man" or if  you are training a woman, it's "woman." If you are

  • 00:21:05 training a car, it's "car." If you are training  a style, it's "style." But training for a man,

  • 00:21:10 I use "man." Even if you don't use it, it will  work. If you are training multiple people in one

  • 00:21:16 training, you don't need to set a class prompt.  Just give each one of them a rare token like ohwx,

  • 00:21:23 like bbuk, and other rare tokens that you can  decide yourself, and train all of them at once.

  • 00:21:30 But currently, this is for single-person training.  So this is the setup. Instance prompt, class

  • 00:21:35 prompt, and destination directory. This is super  important. The Kohya GUI will prepare the correct

  • 00:21:41 folders and save them at that destination. So I am  going to use the installed folder like here. You

  • 00:21:48 can use anywhere, and I will save train images  like this and then click prepare training data

  • 00:21:55 and watch the CMD. You will see that it is done  creating the Kohya SS training folder structure.
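
For reference, the layout this step produces looks roughly like the sketch below; the destination name and the "1_ohwx man" folder match what is described next, while the exact drive path and the extra log/model folders are assumptions on my part.

```python
from pathlib import Path

# Recreate (roughly) what "Prepare training data" builds: an img/ directory
# containing "<repeats>_<instance prompt> <class prompt>". The path below is
# hypothetical; the log/ and model/ folders are an assumption.
dest = Path(r"R:\train images")
(dest / "img" / "1_ohwx man").mkdir(parents=True, exist_ok=True)
for extra in ("log", "model"):
    (dest / extra).mkdir(exist_ok=True)
# Your training images (and optional .txt captions) go inside img/1_ohwx man.
```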

  • 00:22:01 When you enter inside that folder, you see the  train images folder generated here. When I enter

  • 00:22:07 inside it, you see there is an img folder and inside it, there is 1_ohwx man. So what does this

  • 00:22:15 mean? This means that the one is the repeating.  This is super important. If you set this to 100,

  • 00:22:21 at every epoch, it will repeat 100 times. So this  is super important. Kohya will always read the

  • 00:22:27 repeating number from here, and the rest will be  simply the caption of my images. If I wanted to

  • 00:22:34 caption them, I could caption them, and it will  read the captions. If you don't caption your

  • 00:22:39 images, it will read the folder name as a caption.  So let's also make a captioned example. I will

  • 00:22:45 copy and paste this like this. The name of this  folder will not be important because I'm going to

  • 00:22:49 caption it. So copy this path, and for captioning,  I have an amazing application called Joy Caption.

  • 00:22:54 Click here. When you get to this post, you will  see the installer for Joy Captioning. This is a

  • 00:23:00 very advanced application. It supports multi  GPU captioning as well because I am going to

  • 00:23:06 caption my entire regularization images dataset,  and I am going to make a LoRA on that. There will

  • 00:23:12 be a total of 20,000 images that I am going to  caption. So it supports multi GPU and also batch

  • 00:23:19 captioning. It is already installed. Let me open  it and show you how to caption. So Joy Caption 1.

  • 00:23:25 Let's start the Windows application. Let's start  like this. The application started. All I need

  • 00:23:31 to do is give the input folder path here and just  start batch captioning. You can decide on multiple

  • 00:23:39 GPUs, multiple batch sizes. All is working.  This is a very optimized application. There

  • 00:23:43 are options to override the existing caption file, append to the caption file, and set max new tokens. You don't

  • 00:23:49 need to set these parameters. You can just change  the max new tokens. I'm going to make a dedicated

  • 00:23:54 tutorial for this application hopefully later. You  can see the progress on the CMD here. So first it

  • 00:24:01 is loading the checkpoints to start captioning.  The captioning started and it is captioning the

  • 00:24:07 files. When I enter inside this folder, you will see that it is generating captions like this. When

  • 00:24:12 you open the text file, you will see the caption  it generated for that image. However, if you do

  • 00:24:19 captioning, it will reduce the likeness of your  concept. If you are training a person or if you

  • 00:24:26 are training similar stuff. Captioning with styles  may work better. I am hopefully going to test it,

  • 00:24:32 but still, if you want to do captioning,  you can use this application to caption. I

  • 00:24:37 have also compared the captioning effect. So let's  search the caption in this post and let's see.

  • 00:24:45 Yes, here the captioning results are here. Let's  open it. And when we look at the caption results,

  • 00:24:51 there will be a slightly reduced likeness.  However, I didn't see improvement in the

  • 00:24:58 environment or generalization or the overfitting.  This is because it already captioned itself in the

  • 00:25:06 FLUX architecture. So whether I caption or not, it  doesn't matter much. The likeness is still there.

  • 00:25:12 It's a little bit reduced and it doesn't improve  the overall quality. So for person training,

  • 00:25:18 I don't suggest captioning, but for training a  style, it may work better. As I said, hopefully

  • 00:25:23 I am going to research it and make a tutorial  for style captioning. After you did the caption,

  • 00:25:28 you should still add an activation token. What was the activation token? The instance prompt, ohwx. So you

  • 00:25:34 can either edit them manually or I have a caption  editor. The caption editor is posted here. When

  • 00:25:41 you go to this link, you will see the instructions  and downloader zip file. It's a very, very

  • 00:25:47 lightweight Gradio application. Let's open the  image caption editor, start the application. It

  • 00:25:53 doesn't use any GPU. It is just Python code that allows you to edit the captions of your images. It is web-based so

  • 00:26:00 it can be used anywhere. So you need to enter the  input folder path. This was our folder path. Let's

  • 00:26:06 enter it and it will automatically refresh and  scan the images. You see, these are the images.
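
If you would rather script this step, a small loop like the one below (a sketch, not the author's tool) prepends the activation token to every .txt caption in a folder; the caption editor's batch-edit feature shown a bit later achieves the same result through the UI.

```python
from pathlib import Path

# Prepend the activation token to every caption file in a dataset folder.
# The folder path is hypothetical; the token matches this tutorial's setup.
caption_dir = Path(r"R:\train images\img\1_ohwx man")
token = "ohwx man"

for txt in caption_dir.glob("*.txt"):
    caption = txt.read_text(encoding="utf-8").strip()
    if token not in caption:                       # avoid adding it twice
        txt.write_text(f"{token}, {caption}", encoding="utf-8")
```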

  • 00:26:10 When I click here, it will show me the caption.  I can either manually edit it here like ohwx.

  • 00:26:18 Then I can click save caption and it will save it.  Then I can click the next image. Then I can filter

  • 00:26:24 images by processed or unprocessed. This is an  extremely useful application. So with processed,

  • 00:26:30 you see this was the saved processed and these  are the still waiting ones. This is a very,

  • 00:26:35 very lightweight application. Let's make this  again as "man" because I am going to show you

  • 00:26:40 another thing. Let's save the caption. Then in  the batch edit options, you can enter the folder

  • 00:26:46 and you can replace words like replace "man" with  "ohwx man" and replace all occurrences and it is

  • 00:26:54 going to override all the files. Apply batch edit  and all captions are edited. Then you can check

  • 00:27:01 all the captions to see whether they contain my  activation token or not like this check word and

  • 00:27:07 you see all captions contain the specific word  or phrase. Then when I refresh here and open

  • 00:27:14 an image, you see this photograph features a  "ohwx man standing." So all the captions are

  • 00:27:20 now ready to use. Since this folder exists in  my images folder, it will train both of them,

  • 00:27:27 but we don't need captioned images now. So  I'm just going to delete this folder. OK,

  • 00:27:31 let's return back to the Gradio interface where we  set up. After we clicked, prepare training data,

  • 00:27:37 we also need to click "copy info to respective  fields." Otherwise, you will see that the image

  • 00:27:42 folder is not accurate. You see it is inaccurate.  So click "copy info to respective fields" and you

  • 00:27:48 will see that it did set the folder like this  "train images img." So what does this mean? It

  • 00:27:55 didn't give the path of this folder. It gives the  parent folder path here so you can have multiple

  • 00:28:02 concepts, multiple persons or items, anything  inside this folder, and Kohya will read each one

  • 00:28:10 of the folders, and whether you have captioned them or not, it will

  • 00:28:15 read the folder name or the captions and it will  train based on all of the images. You can also set

  • 00:28:21 different repeating numbers for each concept, as I  have explained. Let me show you. Let's say I have

  • 00:28:27 3 concepts. This can have 3 repeating.  This can have 5 repeating. So the images inside

  • 00:28:33 this one will be repeated 5 times. Images  inside this one will be repeated 3 times.
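
To make the repeat-balancing idea concrete, here is a sketch of how folder names encode the repeat count per concept; the image counts and the second token are illustrative, following the 100-image and 20-image example from earlier.

```python
from pathlib import Path

# Kohya reads the repeat count from the folder name: "<repeats>_<token> <class>".
# Balance concepts so each one is seen roughly equally per epoch.
image_counts = {"ohwx man": 100, "bbuk woman": 20}   # illustrative counts
largest = max(image_counts.values())

img_root = Path(r"R:\train images\img")              # hypothetical parent folder
for concept, count in image_counts.items():
    repeats = max(1, round(largest / count))         # 100 -> 1, 20 -> 5
    (img_root / f"{repeats}_{concept}").mkdir(parents=True, exist_ok=True)
    print(f"{repeats}_{concept}")                    # 1_ohwx man, 5_bbuk woman
```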

  • 00:28:37 This is the logic of setting up folders. Let's  delete them. Now we have set the model path,

  • 00:28:45 trained model output name, whatever the name you  give, it will generate checkpoints with that name.

  • 00:28:50 Let's say "Best_v2." OK, this will be  the output model name for LoRAs that are going

  • 00:28:58 to get generated. FLUX1 selected. Everything is  selected. You don't need to set anything and you

  • 00:29:03 don't need to use dataset preparation again, but  you can prepare multiple datasets by using here.

  • 00:29:09 With Stable Diffusion XL and with Stable Diffusion  1.5, we use regularization images. At the end

  • 00:29:15 of this tutorial, I will show that too. So I'm just skipping that part. Now, the parameters. In the parameters, what you

  • 00:29:21 need to set are a few things. First of all, how  many epochs you are going to train? We are going

  • 00:29:28 to train based on epochs and one epoch means  that all of the images are trained one time.

  • 00:29:35 When we use regularization images, we use a  different strategy. We train 200 repeating and

  • 00:29:41 1 epoch. But since we don't use regularization  images and since we use 1 repeating, we are

  • 00:29:46 going to use epoch strategy. So if you have 100  images, you can reduce this epoch number. However,

  • 00:29:53 you can still train up to 200 epochs and compare  checkpoints to get the very best checkpoint. So as

  • 00:29:59 you increase the number of images for a single  concept, you may want to reduce the number of

  • 00:30:04 epochs. Currently, most people will collect like  15 to 25 training images, maybe 50. So training

  • 00:30:11 up to 200 epochs and comparing checkpoints is  the best strategy. About the training dataset:

  • 00:30:17 this training dataset is not great. Why? Because  it contains the same clothing, same background

  • 00:30:25 environment, and it is lacking expressions. Many  people are asking me how to generate expressions.

  • 00:30:31 If you want to generate expressions, you need  to have expressions, emotions in your training

  • 00:30:37 dataset. I am preparing a much better training  dataset. It is not ready yet. I am still using

  • 00:30:43 the same dataset so I can compare it with my older  trainings. However, hopefully I will make another

  • 00:30:49 video for an amazing training dataset. You will  see that. So when you are preparing your training

  • 00:30:54 dataset, have different poses, have different  distances like full body shots, close shots, have

  • 00:31:00 different expressions that you want to generate  like laughing or crying, whatever you want, have

  • 00:31:06 different clothing and have different backgrounds.  The quality of the dataset is very important. With

  • 00:31:12 FLUX, it is still very flexible. It is better than  the SDXL or SD 1.5. However, as you improve your

  • 00:31:19 dataset, you will get better results. Another thing is to make sure that your dataset has

  • 00:31:24 excellent focus, sharpness, and lighting. This  is also very important. Do not take pictures at

  • 00:31:31 nighttime. So make sure that your images have  very good lighting, very good focus, and very

  • 00:31:36 good sharpness. OK, let's return back to Kohya  GUI. So I will leave this at 200. You can reduce

  • 00:31:42 this based on the number of images you have, how  many checkpoints you want to get. You can make

  • 00:31:47 this like 10, and after every 10 epochs, it will  generate a checkpoint. And at the end of training,

  • 00:31:53 you can compare all of them. I am going to show  you how to use and compare them. There is no issue

  • 00:31:58 with it. So you can make this 10 and compare all  the checkpoints and find the very best checkpoint

  • 00:32:04 you liked. So this is the most optimal way of  obtaining the very best model and checkpoint.

  • 00:32:11 But for this one, let's leave this as 25. Caption  file extension. If you are going to use captions

  • 00:32:18 instead of the folder names, you need to select  the correct caption extension. By default,

  • 00:32:23 it is TXT and TXT is the most used one. You can  also use caption and cap. I never used them.

  • 00:32:29 TXT is just working fine for me. Then you don't  need to change anything else here. By default,

  • 00:32:35 we are using 128 to 128 network rank. OK, this  is another thing that you need to set. This is

  • 00:32:42 super important: VAE path. So for VAE path, we set  this ae.safetensors file. It is already downloaded.

  • 00:32:51 Let's go to the folder, which was inside here. I had downloaded the ae.safetensors file here. And we

  • 00:32:58 are going to use clip large model. Click here.  Let's go to the clip large. OK, it is also set

  • 00:33:04 and we are going to use T5 XXL. Make sure to use  T5 XXL FP16. It will be auto cast to the correct

  • 00:33:12 precision. So let's also pick that file from our downloaded folder, which is here. And we are all

  • 00:33:20 set. You don't need to do anything else. Just  save the configuration wherever you want. So I

  • 00:33:26 will save the downloaded folder by clicking this  save. If you want to reload, you click this or

  • 00:33:31 click refresh to reload. You see the refresh will  refresh. You can also re-pick, but sometimes when

  • 00:33:36 you re-pick, it may not refresh. So click this  to refresh. Let's select the saved config again,

  • 00:33:42 which was this one. And let's refresh. Yes, I can  see the configuration reloaded. Currently, none of

  • 00:33:48 the configurations are training text encoder clip  large. As I said, I am researching it right now,

  • 00:33:54 and hopefully I will update the configuration  according to it. The training of clip large will

  • 00:33:59 increase the VRAM usage slightly. It increased  by like 800 megabytes on 16-bit precision.

  • 00:34:06 It will also slow down slightly. You see from  8.82 seconds IT to 8.56 seconds IT on a 16-bit

  • 00:34:16 precision training on a 6000 GPU. OK. As I said,  we are ready. And before clicking start training,

  • 00:34:24 you need to check your VRAM usage. Try to make  your VRAM usage under 500 megabytes. Currently,

  • 00:34:31 you see my computer is using 3.5 gigabytes. Why?  Because I am running OBS studio. I am running

  • 00:34:38 NVIDIA broadcast and some other applications. So  try to reduce your VRAM usage to 500 megabytes if

  • 00:34:45 you are on limited VRAM. How can you reduce it? Go to startup settings, disable all of the startup applications,

  • 00:34:51 restart your computer, and check the performance  and GPU and see how much VRAM you are

  • 00:34:58 using. Alternatively, you can also use another  application, which is nvitop. So to install

  • 00:35:05 nvitop, pip install nvitop. I already have  it done. Type nvitop and it will show you

  • 00:35:12 the VRAM usage, GPU utilization, how many GPUs  you have. You see I have two GPUs. So this is

  • 00:35:18 the way of also seeing exact VRAM usage.  You see this one shows 3.8 gigabytes I am

  • 00:35:25 using. This one is showing 3.5 dedicated GPU  I am using. So you can use either way. Then

  • 00:35:30 click start training and watch the progress  on the CMD window of the Kohya. You see it is

  • 00:35:37 starting everything. You should read the messages  appearing here. If you see this error, xFormers

  • 00:35:44 can't load. It is not an issue. Currently, we are not using xFormers. Actually, I had fixed this,

  • 00:35:49 but I am going to fix it again. But it is not an  issue because currently xFormers is not working

  • 00:35:54 better than the default cross-attention, which is  sdpa. I will also fix xFormers later, but you can

  • 00:36:01 ignore this xFormers message. It doesn't make any  difference. We are not using xFormers. So what are

  • 00:36:07 the messages we see here? This is important.  You see it has found 1 repeat. It has found

  • 00:36:13 15 images. So one epoch will be 15 steps because  the batch size is 1. Increasing the batch size,

  • 00:36:21 I have tested it too. It doesn't bring almost any  speed gain. And as you increase the batch size,

  • 00:36:27 you will get lesser quality training. Batch sizes  should be used only for speed. And in this case,

  • 00:36:35 it doesn't increase much. But you can use multiple  GPUs to get almost linear speed increase. I have

  • 00:36:41 explained the batch size in the Patreon post. So  read that section and you see the regularization

  • 00:36:47 factor is 1. Total steps 15. Train batch size  is 1. Gradient accumulation steps 1 and 200

  • 00:36:52 epochs. So it calculates the number of total  steps, which is 15 divided by 1 divided by

  • 00:36:58 1. Why? Because this is the batch size. This is  the gradient accumulation steps multiplied by 200.

  • 00:37:04 This is the number of epochs and multiplied  by 1. This is the regularization images,

  • 00:37:09 which we don't use. So it is 1. When you use  it, it is 2. And total, it will be 3000 steps.
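
As a sanity check, the same total can be computed directly from the numbers in this run (a small sketch of the arithmetic the script prints):

```python
# Step-count arithmetic for this run, using the values read from the log.
num_images = 15    # training images found
repeats    = 1     # from the "1_ohwx man" folder name
reg_factor = 1     # 1 = no regularization images, 2 = with them
batch_size = 1
grad_accum = 1
epochs     = 200

steps_per_epoch = (num_images * repeats * reg_factor) // (batch_size * grad_accum)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)   # 15 3000
```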

  • 00:37:16 I hit enter because when you touch the CMD, it  will pause the CMD window. And you will also

  • 00:37:22 see that it didn't find any captions. It didn't  find regularization images. So it is going to

  • 00:37:27 use class tokens or ohwx man, which it reads from  the folder name. So everything is looking good.

  • 00:37:34 And it will load and start training. The speed is  not very great. And you will get better speed as

  • 00:37:39 it progresses. In the beginning, it will not show an accurate speed. Once Torch 2.4.1 is published,

  • 00:37:48 we should get speed like 4 to 5 seconds IT on  Windows with an RTX 3090 GPU. But as I said,

  • 00:37:55 don't worry, I'm going to make cloud tutorials.  And it is going to be super fast on cloud GPUs.

  • 00:38:01 You will be able to rent very powerful GPUs.  And it is perfectly trainable. So this is how

  • 00:38:06 we do training on Windows. It will be the same on  Massed Compute and RunPod as well. The procedure

  • 00:38:12 will be the same. Just the starting and installation  will not be the same. So just wait until these

  • 00:38:18 checkpoints are saved. And how are we going to use  them afterward? Since I already have trained them,

  • 00:38:25 I am not going to wait for training. In this  repository, I have the training files, which is

  • 00:38:31 trained with exactly the same configuration.  You see the Best_v2. So I'm going

  • 00:38:37 to download several checkpoints to test on my  computer. When you are training, there is one

  • 00:38:42 thing that is extremely important that you have  to be careful of. Open the task manager, go to

  • 00:38:48 the performance tab, and go to your GPU. You see  that there is dedicated GPU and shared GPU memory.
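
If you also want to watch dedicated VRAM from a terminal (alongside Task Manager or nvitop), a short pynvml loop like this works as a sketch; note it cannot see the Windows shared GPU memory that Task Manager reports.

```python
# Print dedicated VRAM usage per GPU. Requires the nvidia-ml-py package
# ("pip install nvidia-ml-py"), which provides the pynvml module.
import pynvml

pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print(f"GPU {i}: {mem.used / 1024**3:.1f} / {mem.total / 1024**3:.1f} GB used")
pynvml.nvmlShutdown()
```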

  • 00:38:56 When you are doing training, if it uses shared  memory, it will become slower like 10 times,

  • 00:39:04 20 times. So this is the criterion that you need  to be careful of. Compare the shared GPU memory

  • 00:39:10 usage before starting the training and during the  training and make sure that it doesn't use more

  • 00:39:17 shared GPU during the training. If it does, then  you will have extremely slow training speeds. This

  • 00:39:24 is really important. This is also valid for all of  the AI applications you use, like Swarm UI, like

  • 00:39:32 Forge Web UI, like Automatic1111 Web UI, whatever comes to your mind. When it starts using shared

  • 00:39:38 VRAM, that will make your application slower like  20 times because it will start using the memory of

  • 00:39:46 your computer, and system RAM is way slower than the GPU memory. Moreover,

  • 00:39:52 if you have low RAM in some applications, you may  encounter a problem, but it is not an issue. You

  • 00:39:59 can set virtual RAM memory. So you see currently I  am training and I am not using any shared VRAM. My

  • 00:40:06 dedicated GPU memory still has space to go like  3 gigabytes. This is super important. And

  • 00:40:12 about RAM memory, you can set virtual RAM to have  more RAM memory. Memory is usually important when

  • 00:40:20 it is first time loading. So if you have limited  RAM, you can increase the virtual RAM memory to

  • 00:40:25 avoid any errors. To increase the virtual RAM  memory, click this PC. In here, click this PC

  • 00:40:32 properties. It will open these properties. In  here, go to the advanced system settings. And

  • 00:40:37 in this screen, you will see the settings in the  performance. And in here, go to the advanced and

  • 00:40:43 you see I have virtual memory. Change it. Set one  of your fast drives and you can set a custom size.

  • 00:40:50 Currently, my virtual RAM memory is set here.  You see system-managed size and it will allocate

  • 00:40:57 virtual RAM memory as necessary. OK, that is how it works. So during the training,

  • 00:41:03 you will see that it will save checkpoints like  this. And you will see their saved locations and

  • 00:41:11 where they will be saved. They will be saved in  the outputs folder. You see the output directory

  • 00:41:19 for trained model. This was automatically set  by the GUI when we used the prepared training data

  • 00:41:28 and copied info to the respective fields. After  you clicked "copy info to respective fields," you

  • 00:41:34 can change the output directory for training the  model. You can directly give the folder path of

  • 00:41:40 your Swarm UI, your Forge web UI, wherever you  are using your Comfy UI. So my models will be

  • 00:41:46 saved automatically here. When we open that  folder, we will see the saved checkpoints like

  • 00:41:53 this. So these checkpoints are 2.4 gigabytes.  Why? Because the FLUX model is very big. It

  • 00:42:00 has 12 billion parameters. Moreover, we are  saving it as float. So we are saving it with

  • 00:42:08 the maximum precision without any quality  loss. If you want to reduce the file size,

  • 00:42:14 you can also save it as FP16. I haven't compared  it. Some of the followers say that BF16 is working

  • 00:42:22 badly. So if you need a lower size, save it as  FP16. If you want to have the maximum accuracy,

  • 00:42:29 save it as float. The saved precision will not change the VRAM usage

  • 00:42:37 during training or during inference. So now it is  time to use the saved models with the FLUX model

  • 00:42:44 itself. So how are we going to use them after  training has been completed? You can use them with

  • 00:42:50 Comfy UI, with Swarm UI, with Forge web UI at the moment. I prefer to use it in Swarm UI. I already have

  • 00:42:58 a main Swarm UI tutorial. It is amazing. And I  also have a Swarm UI tutorial for FLUX. Moreover,

  • 00:43:07 I also have a Forge web UI models downloader  and installer for Runpod and also for Massed

  • 00:43:13 Compute. I will show the cloud service providers  in the cloud tutorial, but I will show how to use

  • 00:43:21 it on my computer at the moment. So I will move  these files into my Swarm UI models folder. You

  • 00:43:28 see it also saves the TOML file and the JSON file  wherever you are saving them. So let's select the

  • 00:43:35 generated models and move them into our Swarm UI  model folder. My Swarm UI is installed here. When

  • 00:43:43 you watch the tutorial, you will see it. I will put the generated LoRA files into the models folder, into

  • 00:43:49 the LoRA folder here. This is important. This  is where you need to put them. Before starting

  • 00:43:54 training as I said, you can give this folder  path. So after you have set the copy info to

  • 00:44:02 respective fields, go here and change it like this  and save your configuration. Then it will save the

  • 00:44:09 generated LoRA files into this folder for you. OK,  so I'm going to start my Swarm UI. First of all,

  • 00:44:16 I am going to update it. When you are using Swarm  UI, you should update it always. You can also use

  • 00:44:22 Forge web UI or ComfyUI. I prefer Swarm UI because  it is amazing. It has so many amazing features.

  • 00:44:29 It also uses the back end of ComfyUI. So I will  use the launch windows.bat file to start it.

  • 00:44:35 And my ComfyUI is starting. There is also a trick  that I'm going to show you. This will give you

  • 00:44:41 a lot of performance boost. If you have an RTX 4000 series GPU, click this pencil icon and add

  • 00:44:49 extra arguments --fast here and save it. This will  hugely speed up your generations with Swarm UI

  • 00:44:57 when you are using the FLUX model. Backends  are still loading. Go to the server,

  • 00:45:01 go to the logs. Let's see in the debug. So we can  see that it is starting the server. It is loading

  • 00:45:07 everything and the data is getting refreshed. OK,  so now I am ready to generate images. For the FLUX

  • 00:45:13 model. I am going to use CFG scale 1. When you use CFG scale 1, you see the negative prompt

  • 00:45:18 is disabled because FLUX doesn't support it. I prefer 30 steps. Based on your GPU,

  • 00:45:24 you can set it. I'm going to use resolution 1.  I'm going to use sampler Uni PC. I find this the

  • 00:45:30 best. The scheduler will be normal and you can  set the preferred D type here. Since this is a

  • 00:45:35 24-gigabyte GPU, I am going to use this one.  However, you can also use 16-bit on a bigger

  • 00:45:43 VRAM-having GPU. This is a good one. There are  also quantized models, but I haven't tested LoRAs

  • 00:45:50 on them, whether they are working or not. So I  don't know. And I can't say they will work. And

  • 00:45:56 in the models, I have FLUX 1 version dev FP8.  You can also use the FP16 model. They should work

  • 00:46:05 exactly the same actually. So to be sure, I am  going to use the FP16 model. I already have it

  • 00:46:12 here. So let's copy it. And let's copy it into our  models folder. It goes into the Stable Diffusion.

  • 00:46:22 Let's see. Yeah, it goes into the UNet because  this is not a quantized model. It goes into the

  • 00:46:27 UNet. The quantized models go into the Stable  Diffusion folder in the Swarm UI. I could also

  • 00:46:33 give the path of this file from this folder  before doing the training. So you don't need

  • 00:46:38 to have duplicate files. What would be the case?  The case would be I click this icon. Then I go to

  • 00:46:46 my Swarm UI installation, which is here. Swarm UI  inside models, inside UNet. And I can select this.

  • 00:46:55 So you see, you can give any file path in Kohya GUI.  It will just work. Then refresh the models and the

  • 00:47:03 FLUX development arrived. This is an FP16, 16-bit  precision model. But since I'm going to use 8-bit,

  • 00:47:10 it will auto-cast it. Before starting using the  LoRA, let's generate an image. And a very good

  • 00:47:17 part is that in the zip file, I have shared some  test prompts. So let's open the test prompts from

  • 00:47:23 here. And for example, let's generate this image  without our LoRA. And let's hit generate. You can

  • 00:47:29 watch the progress in the server, in the logs, in  the debug. So it is going to load the model. And

  • 00:47:36 you see model weight dtype, manual cast. So it is  going to do everything automatically for me. I can

  • 00:47:42 watch the VRAM usage here. It is using ComfyUI,  thus it is very, very optimized. I have selected

  • 00:47:49 the model. You see the FLUX 1 development model  is selected. As I said, I am not sure whether it

  • 00:47:54 is working with the quantized models or not. I  haven't tested yet, but you can test. For now

  • 00:48:00 make sure to test with the development model. Then  you can test on quantized models as well. And this

  • 00:48:05 is my VRAM usage currently. Don't worry, you don't need that much VRAM. It works even on GPUs with as little as 6 GB of

  • 00:48:12 VRAM. It is extremely optimized. Since I have more VRAM, it is using more VRAM. When you have lower

  • 00:48:19 VRAM, it will use lower VRAM. Moreover, I can also  use my second GPU. It is amazing. So all I need to

  • 00:48:27 do is add a new ComfyUI self-starting backend. Click here. OK. I will make this like this. Extra

  • 00:48:35 arguments, starting script. And I will make this  GPU ID 1 and save. So when I generate multiple

  • 00:48:41 images, it will use my second GPU as well. OK. The  image is almost ready. I am also using segment to

  • 00:48:48 in-paint face. This is not mandatory. In the main  tutorial, I explain everything. It is a little bit

  • 00:48:54 slow on my GPU, but you can use the cloud always.  We can see the IT per second. OK. Where is it?

  • 00:49:01 Let's generate another image to see it. Let's also  see. Yeah, it is fine. OK. Now it is starting to

  • 00:49:07 in-paint the face. OK. Face in-painting speed  is 1.5 seconds per IT. And it is doing 18 steps.

  • 00:49:13 Why? Because I set it at 0.7 denoise strength.  It is called different here. It is called image

  • 00:49:22 creativity. So it is doing 70%. And since I use 30  steps, it is doing 21. Actually, it should do 21,

  • 00:49:31 but it did 18. OK. This is the image we got. Now I prefer to use FLUX guidance scale 4. I had forgotten

  • 00:49:39 that. OK. Then how am I going to use my LoRAs?  Go to the LoRAs tab here, refresh. And once you

  • 00:49:46 refresh, you will see your FLUX LoRAs. You see  type FLUX 1 LoRAs. For example, let's use the

  • 00:49:52 150 epoch and hit generate. So now it is going  to use my LoRA. We can see in the back end. It

  • 00:49:59 will load. You can see everything here. Swarm UI works much better than the Forge web UI.

  • 00:50:06 When you are using LoRAs in Forge web UI, it first processes the LoRAs, which takes extra

  • 00:50:12 VRAM. However, with Swarm UI, it doesn't do that.  Also, when you have more VRAM, it will be faster.

  • 00:50:18 It is using some serious VRAM right now. I am also recording a video,

  • 00:50:25 which takes VRAM too. Let's also close these two. Try to reduce your VRAM usage as much as possible.

  • 00:50:32 And I am already using a lot of VRAM when I am  recording. We can see the preview here. First,

  • 00:50:37 it is generating the image. Then it will  in-paint the face to my face with the prompt

  • 00:50:42 photo of ohwx man. It is in-painting the face now.  This is not mandatory but in-painting the face,

  • 00:50:49 especially in distance shots, will improve the  face quality. Moreover, in my training images,

  • 00:50:56 I have eyeglasses. You see? But since FLUX  has an internal text encoder, it is able

  • 00:51:03 to separate my eyeglasses from my face. Thus, in  this image, I don't have eyeglasses. This is also

  • 00:51:10 a little bit overfit image. But I am working on  a better workflow. Hopefully, I will update it.

  • 00:51:15 and it will become much better. So you can add "with eyeglasses" to the prompt here, and you will get

  • 00:51:23 a closer likeness. Let's see. Let's hit Generate twice. Once I click it two times,

  • 00:51:28 it will also start using my second GPU. It should  start. Let's see. I think first it will load into

  • 00:51:35 the second GPU. Then it will start. OK. Let's  see the server back-ends. OK. OK. It shows here

  • 00:51:41 running back-end. OK. For some reason, it didn't  start. And if preview disappears here, you can go

  • 00:51:47 to the image history, refresh. The last generated  images will always appear here. You can see them,

  • 00:51:52 their features. FLUX guidance scale, the LoRA. You  can also set different LoRA weights from here. OK.

  • 00:51:59 The image is generated. And you see it now resembles me much more. Comparing with the training

  • 00:52:06 images, it has a perfect resemblance. It is, as I said, more overfit. With fine-tuning (full

  • 00:52:13 model training), I think this overfitting problem will be solved. I am also doing more research

  • 00:52:18 right now, as I said. So I will update the config as I find better settings. Every day something new is

  • 00:52:24 arriving. Also, you may end up with overfit, over-trained model checkpoints. So how are we going to test

  • 00:52:33 and find the very best checkpoint? This is a super important part. Let's also see the other generated

  • 00:52:38 image. If you have an RTX 4090, it will be way  faster than this. It will be many times faster

  • 00:52:44 than this. You can always see the step speed here. It is around 1.5 seconds per iteration for me. And the

  • 00:52:49 second image also generated a perfect resemblance.  OK. So how am I going to find the very best

  • 00:52:55 checkpoint? To do that, we go to the Tools. And  in here, we select the Grid Generator. In here,

  • 00:53:03 in the first tab, I use LoRAs. So find the LoRAs  from here. Let's find it. You can also type to

  • 00:53:09 filter like this: LoRAs. When you click Fill,  it will fill all the LoRAs. Then you can delete

  • 00:53:15 the ones that you don't want to test. Let's start  testing from the 50 epochs. So these are epochs,

  • 00:53:20 epoch 50, epoch 75. If you save based on the  step count, it will have a different naming. So

  • 00:53:26 the last one will be the 200 epochs. Then you can  test multiple prompts or you can use this prompt

  • 00:53:32 for testing. If you want to test multiple prompts,  I have already prepared prompts. So these prompts

  • 00:53:39 are shared here. You see test prompts, no segment  in-painting. This doesn't have any segmentation

  • 00:53:45 in-painting. And I have test prompts here, test  prompts with eyeglasses. And test prompts, let's

  • 00:53:52 see. One of them doesn't have eyeglasses. OK, this  doesn't have. So I am going to change this name.

  • 00:53:59 I will fix this. Test prompts without eyeglasses,  grid formatted. So you just copy this. By the way,

  • 00:54:05 in the grid, this is the prompt separator. You see, like this. Then copy-paste it here. You see it has

  • 00:54:12 split each prompt (a small sketch of this conversion is below).
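
The grid-formatted file is just the same prompts joined with a separator so the Grid Generator can split them into axis values. A minimal sketch of that conversion, with a hypothetical input file name and an assumed separator (use whatever separator actually appears in the shared grid-formatted file):

```python
# Join one-prompt-per-line test prompts into a single grid-formatted string.
SEPARATOR = " || "  # assumed axis-value separator -- check the shared grid-formatted file

with open("test_prompts_without_eyeglasses.txt", encoding="utf-8") as f:  # placeholder filename
    prompts = [line.strip() for line in f if line.strip()]

grid_formatted = SEPARATOR.join(prompts)
print(grid_formatted)  # paste this into the prompt axis of the Grid Generator
```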

  • 00:54:21 And now when I generate a grid, it will test all of the models for me. Let's hit generate. This time it should use the second GPU as well, I believe. Let's see what will happen.

  • 00:54:27 OK, now it started loading onto the second GPU as  well. So when generating this grid, it will use

  • 00:54:34 both of my GPUs. However, with my GPUs, this will take a huge amount of time because there are 274 generations.

  • 00:54:43 This is just an estimate and it will improve as the grid runs. But this will make a huge grid. You should wait for

  • 00:54:49 it to update. You can rent A6000 GPUs on Massed Compute and use them all at once. It will be

  • 00:54:56 much faster. Hopefully, I will show this in the Massed Compute tutorial. OK, the first

  • 00:55:01 image is generated. The first image is for verifying the model sanity. You see it is perfectly able to

  • 00:55:08 generate a supercar image. Nothing like me. So the  model sanity is perfect. It is still keeping its

  • 00:55:15 sanity. And what kind of results are we going to  get after these grids are generated? For now, I

  • 00:55:22 will interrupt this with "interrupt all sessions." And I will show you on my cloud machine. So in

  • 00:55:29 the Swarm UI Massed Compute tutorial, there are  these new instructions. Copy this. First of all,

  • 00:55:35 you need to install this. OK, let's copy this and  open a new terminal here. New window and paste it

  • 00:55:43 and hit enter. This will start Swarm UI on Massed  Compute. But it will give me a cloud URL that I

  • 00:55:51 can use on my computer. It will be here. It is a Cloudflare URL. Let's copy and paste it and open it.

  • 00:55:56 And this is where I do my testing, my  experimentation. When I go to the tools and

  • 00:56:02 grid generator, I have so many previous tests like  this. You see, I have 8 GPUs running there.

  • 00:56:09 For example, let's open one of them. OK, let's open "best new 150 epochs." And when I click here,

  • 00:56:16 it will show me the grid results. I explain all  of this in the main tutorial. In the advanced

  • 00:56:22 settings, you can select which prompts to show and which models to show. And we will get

  • 00:56:29 a grid like this where we will be able to compare different LoRA checkpoints. So by analyzing this grid,

  • 00:56:37 you need to find the checkpoint you like best. Usually 150 epochs is good, but it depends on your

  • 00:56:45 training data set. So generate a grid, analyze  all of the generated images. You will see at

  • 00:56:52 the top the model used. Also, when you click  the image in the bottom, you will see which

  • 00:56:57 LoRA is used. For example, for this one, the LoRA used is, let's see, Best v1_5e_05,

  • 00:57:05 150. It also shows the name of the LoRA used here. So this is the way of finding the very best

  • 00:57:12 LoRA checkpoint that you have. For example, on  this Massed Compute, I have 8 running GPUs

  • 00:57:18 and I am able to generate images ultra fast. This is how I do my experimentation. There

  • 00:57:23 is no way to do this many trainings on a single GPU. So you need to spend a considerable

  • 00:57:29 amount of money. Thankfully, Massed Compute is  supporting me. So if you are wondering how did

  • 00:57:35 I generate those amazing pictures that I have  shown in the intro, I have generated amazing

  • 00:57:42 new prompts, and I used the wildcard feature of Swarm UI. I put all these

  • 00:57:49 prompts into this wildcard, and I generated nine thousand nine hundred ninety-nine images until I

  • 00:57:56 stopped it. The new prompts are shared in the test prompts, in the 340 prompts used as wildcards. OK,

  • 00:58:05 we have covered everything, and there is also extra information here on how to train and

  • 00:58:11 use on Runpod and Massed Compute. The instructions  are already included in the file. You will see

  • 00:58:17 the Massed Compute, Kohya FLUX instructions. You  will see RunPod install instructions. Hopefully,

  • 00:58:23 I'm going to make separate tutorials for them.  Also, I suggest you save your models on Hugging

  • 00:58:29 Face if you want to keep them and download them fast. I have an amazing notebook and tutorial for that.

  • 00:58:34 Watch it. I am also often asked how to do SDXL and SD 1.5 training with the newest Kohya

  • 00:58:42 interface. Everything is almost the same, so I will quickly show you. For

  • 00:58:48 example, let's begin with SDXL. Let's open the  best configuration. In the very top we will see

  • 00:58:54 the configuration. Here. Let's download the Tier 1  24 gigabytes Slower V2. There is also a Tier 2

  • 00:59:01 low VRAM version and the configuration loaded.  Let's close the Swarm UI and let's open our latest

  • 00:59:10 Kohya installation, which is here. Let's start it  again. The configuration is downloaded. Kohya is

  • 00:59:18 starting. OK, then these are full fine-tuning  configurations, not for LoRA. So select the

  • 00:59:27 DreamBooth app here. Go to the configuration.  Click this icon. By the way, this is SD3 FLUX

  • 00:59:33 branch, so there may be some errors; you may need to install it the normal way instead. If it doesn't work,

  • 00:59:40 I don't know. I didn't test it. So let's go to  the downloads. Select the file and it is loaded.

  • 00:59:46 You see this is DreamBooth. SDXL is selected.  We do FP16 training. Now you set everything

  • 00:59:52 exactly the same: the trained model output name, the image folder path, the pre-trained model

  • 00:59:57 path. But this time what differs is you should use  regularization images. And regularization images

  • 01:00:04 are posted here. I have an amazing set of 5,200 images for both women and men. So what

  • 01:00:14 you do is, when you go to the dataset preparation, you also add the regularization images and you set

  • 01:00:21 200 repeats; the number of repeats may depend on your number of training images.

  • 01:00:26 Let's say you have 50 training images. Since  we have 5000 regularization images,

  • 01:00:32 you can make the repeats 100 and train 2 epochs. But let's say I have 15 images. So I

  • 01:00:38 make the repeats 200 (a small sketch of this balancing logic is below).
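
Here is a small sketch of that balancing logic as I read it; this is just the arithmetic behind the two examples above, not an official Kohya rule: keep training images times repeats at or below the number of regularization images (which are used with repeat 1), and raise the epoch count if you want more total steps.

```python
# Arithmetic behind the repeat-count examples above (my reading, not an official formula).
reg_images = 5000  # the downloaded regularization set (the video quotes roughly 5,000-5,200)

for train_images, repeats, epochs in [(50, 100, 2), (15, 200, 1)]:
    steps_per_epoch = train_images * repeats  # training-image steps in one epoch
    balanced = steps_per_epoch <= reg_images  # each reg image used at most once per epoch
    print(f"{train_images} images x {repeats} repeats = {steps_per_epoch} steps/epoch, "
          f"{epochs} epoch(s), balanced: {balanced}")
# 50 x 100 = 5000 <= 5000 reg images -> train 2 epochs for more total steps
# 15 x 200 = 3000 <= 5000 reg images -> the video trains 1 epoch
```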

  • 01:00:46 Put my training images path here. Then I also go to the downloaded regularization images. Let's see. For example, these are 768. I also

  • 01:00:51 have 1024. Yeah,  here. This is an amazing dataset. Super

  • 01:00:56 high quality. I tested and compared the effect of  using regularization images and it's mind-blowing,

  • 01:01:02 especially with OneTrainer. And you put it here. This time the repeat count will be 1. You

  • 01:01:07 just type ohwx and man, and it will auto-generate the folders for us with accurate naming. Let me

  • 01:01:14 show you. So let's make the destination an example  place like music folder D. Usually you shouldn't

  • 01:01:20 use the users folder. But this is just for an  example. Prepare training data. Watch the CMD

  • 01:01:26 window because copying may take time. Wait until  you see done. Currently it is copying everything.

  • 01:01:32 We can see image 200 ohwx man is copied. And  it is also going to copy man datasets. OK,

  • 01:01:40 this one copied. And in the reg folder. OK, this  one copied. You see the regularization images are

  • 01:01:46 put into the reg folder and the training images  are put into the image folder. And inside the reg

  • 01:01:52 folder, the man images are named 1_man because of the one repeat and the man class token, and the training

  • 01:02:01 images are named 200_ohwx man. So what else changes? Now we train only 1 epoch. So how

  • 01:02:08 are we going to save checkpoints in this case? To do that, let's copy the info to

  • 01:02:13 the respective folders, and let's also select the pre-trained model path. I already have models

  • 01:02:19 here, like, let's see, the SDXL base model, for example. By the way, as for which models I suggest: I suggest

  • 01:02:31 RealVis XL 4 for training SDXL, and for training SD 1.5 I suggest, what was the name, Hyperrealism

  • 01:02:40 version 3. So you should select those models.  And when you click the print training command,

  • 01:02:46 it will show you the total number of steps.  And it is now 6000 because I have 15 training

  • 01:02:53 images. I use regularization images and I have 200 repeats. Therefore, let's see if it does

  • 01:02:59 show the calculation. Yes, 200 repeats, 15  images. It makes 3000 steps per epoch. Since

  • 01:03:06 regularization images are used, it is multiplied  by 2. So total 6000 steps. And let's say I want

  • 01:03:12 to save every 20 epochs. So normally it would be  save every 20 epochs. But now I don't have that

  • 01:03:20 option since I train 1 epoch. So I will save based on the number of steps. When you divide 6000 by 10,

  • 01:03:28 because we want to get 10 checkpoints, you  will change the save every N steps. Let's

  • 01:03:35 see here. So if I make this 601, it will be equal to saving every 20 epochs in the

  • 01:03:45 repeat-1 setup. That is the logic of it. This will save 10 checkpoints during the entire training (a short recap of this arithmetic is sketched below).
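
A short recap of that arithmetic with the exact numbers from this example (6000 divided by the 10 checkpoints we want gives 600; the 601 typed in the video is effectively the same):

```python
# Step math from the walkthrough above (numbers taken from the video).
train_images = 15
repeats = 200
reg_multiplier = 2        # using regularization images doubles the step count
epochs = 1
checkpoints_wanted = 10

steps_per_epoch = train_images * repeats                  # 3000
total_steps = steps_per_epoch * reg_multiplier * epochs   # 6000
save_every_n_steps = total_steps // checkpoints_wanted    # 600 (the video uses 601)

print(total_steps, save_every_n_steps)
```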

  • 01:03:50 And the save-every-N-epochs setting is 1 and the epoch count is 1, so this will save full checkpoints. For SDXL,

  • 01:03:58 it will be over 6 GB. For SD1.5, it will be like,  I don't remember actually, but it will be big. So

  • 01:04:05 if you need LoRAs, train this way with DreamBooth fine-tuning, then extract a LoRA. It

  • 01:04:12 will work much better than directly training a LoRA for SDXL or SD1.5. I have tested it. For extracting a LoRA,

  • 01:04:20 I have an amazing post here. It shows how you  can extract. Let me also show you quickly. Go

  • 01:04:25 to the Utilities. Go to the LoRA. And in here, you  will see Extract LoRA. Whether it is SDXL or not,

  • 01:04:32 you pick it here. If you don't pick SDXL, it  will extract as SD1.5. If you pick SDXL, it will

  • 01:04:40 extract as SDXL. Currently, it doesn't support FLUX. Actually, I don't see it here... oh, here, click here:

  • 01:04:47 FLUX LoRA extraction has also arrived. I haven't tested it yet. I will hopefully test it after I train full

  • 01:04:53 fine-tuning. So select your fine-tuned model. This  is the generated model. Select the base model.

  • 01:04:58 This is like RealVis XL 4. Set the path where  you want to save. Save precision. You can save

  • 01:05:04 as float. It will double the size. Load precision.  You can load as float. Set this minimum difference

  • 01:05:09 to 0.0001. So it will also save the text encoder because I train the text encoder as well. And you can

  • 01:05:15 set the network dimension to 128. This is it. For SDXL, you just select SDXL. And that is

  • 01:05:23 the way of extracting LoRAs and using them when you do training with SD1.5 and SDXL (a command-line sketch of the extraction is below).
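
Under the hood, the GUI's Extract LoRA utility maps to the LoRA extraction script in kohya sd-scripts. A hedged command-line sketch of the same settings: all paths are placeholders, the 0.0001 minimum difference is my reading of the value mentioned above, and flag names can differ between sd-scripts versions, so verify against your install.

```python
# Hedged sketch: extracting a LoRA from a DreamBooth fine-tuned SDXL checkpoint
# via kohya sd-scripts' networks/extract_lora_from_models.py.
import subprocess

subprocess.run([
    "python", "networks/extract_lora_from_models.py",
    "--sdxl",                                                    # omit this flag for SD 1.5
    "--model_org", "models/RealVisXL_V4.0.safetensors",          # base model (placeholder path)
    "--model_tuned", "output/ohwx_man_dreambooth.safetensors",   # your fine-tuned checkpoint (placeholder)
    "--save_to", "output/ohwx_man_extracted_lora.safetensors",   # where to write the LoRA (placeholder)
    "--dim", "128",                                              # network dimension used above
    "--save_precision", "float",                                 # float roughly doubles the file size
    "--min_diff", "0.0001",                                      # small threshold so text encoder changes are kept
], check=True)
```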

  • 01:05:30 In my post, you will see very detailed instructions. So always read the posts here. Moreover, you should join

  • 01:05:38 our Discord channel. I am always on the Discord  channel. You can message me there. At the very top

  • 01:05:44 of the post, you will see SECourses Discord.  When you click it, you will see our Discord

  • 01:05:49 channel. We have over 8,000 members, over 1,000  online members. Just click Join Server to join.

  • 01:05:56 Moreover, we have a Patreon exclusive post index.  In here, you will see all of our amazing sharings.

  • 01:06:02 You can just do ctrl-F to see them. Also,  we have Patreon Scripts Updates History. You

  • 01:06:08 will see which scripts are updated last and what  changes are made. Sometimes I don't write the full

  • 01:06:15 changes. And we have Patreon Special Generative  Script List. This shows the useful scripts that

  • 01:06:21 you can use for other tasks, other jobs you have.  And you can also go to our GitHub repository here,

  • 01:06:29 Stable Diffusion Generative AI. When you go there, please Star it,

  • 01:06:36 Fork it, and Watch it. Also, if you sponsor, I  appreciate that. You see, we have 2,000 stars.

  • 01:06:42 We have 200 forks and 82 watching. Moreover, we  now have a subreddit. Go to subreddit, SECourses.

  • 01:06:50 I follow every comment and post made here. I  will reply to all of them. I am also sharing

  • 01:06:56 a lot of announcements here. And I have a real  LinkedIn account. I am not an anonymous person,

  • 01:07:02 obviously. You can go to my LinkedIn account.  You can follow me. You can connect with me. It is

  • 01:07:08 fine. I also reply to every message here. This is  it. I hope I have covered everything that you have

  • 01:07:14 been wondering. Hopefully, see you in the future  tutorials for FLUX because a lot of tutorials

  • 01:07:20 are coming like training on Runpod, training  on Massed Compute, and fine-tuning. I think

  • 01:07:26 fine-tuning will be way better. And I am going  to update this post and write the very newest

  • 01:07:33 findings that I have. Also, you should check out  this lengthy research post because you will find

  • 01:07:40 a lot of information here. Let me show you one  thing. For example, FLUX training discussions

  • 01:07:45 with lots of information on this post. You can  open them. And there is new information that

  • 01:07:52 shows why Windows training is currently slower  than Linux training. With Torch 2.4.1 hopefully,

  • 01:08:00 we are going to get an amazing speed boost  on Windows without doing anything and without

  • 01:08:05 losing any quality. And it is 75% completed. So  it is almost there. I will update my installer

  • 01:08:12 scripts. Don't worry about that. Hopefully,  see you in another amazing tutorial video.
