How To Do Stable Diffusion Textual Inversion (TI) / Text Embeddings By Automatic1111 Web UI Tutorial
Full tutorial link > https://www.youtube.com/watch?v=dNOpWt-epdQ
Our Discord: https://discord.gg/HbqgGaZVmr
Grand Master tutorial for Textual Inversion / Text Embeddings. If I have been of assistance to you and you would like to show your support for my work, please consider becoming a patron on 🥰 https://www.patreon.com/SECourses
Playlist of Stable Diffusion Tutorials, Automatic1111 and Google Colab Guides, DreamBooth, Textual Inversion / Embedding, LoRA, AI Upscaling, Pix2Pix, Img2Img:
https://www.youtube.com/playlist?list=PL_pbwdIyffsmclLl0O144nQRnezKlNdx3
In this video, I am explaining almost every aspect of Stable Diffusion Textual Inversion (TI) / Text Embeddings. I am demonstrating a live example of how to train a person's face with all of the best settings, including technical details.
TI Academic Paper: https://arxiv.org/pdf/2208.01618.pdf
Automatic1111 Repo: https://github.com/AUTOMATIC1111/stable-diffusion-webui
Easiest Way to Install & Run Stable Diffusion Web UI on PC
How to use Stable Diffusion V2.1 and Different Models in the Web UI
Automatic1111 Used Commit : d8f8bcb821fa62e943eb95ee05b8a949317326fe
Git Bash : https://git-scm.com/downloads
Automatic1111 Command Line Arguments List: https://bit.ly/StartArguments
S.D. 1.5 CKPT: https://huggingface.co/runwayml/stable-diffusion-v1-5/tree/main
Latest Best S.D. VAE File: https://huggingface.co/stabilityai/sd-vae-ft-mse-original/tree/main
VAE File Explanation: https://bit.ly/WhatIsVAE
Cross attention optimizations bug: https://bit.ly/CrosOptBug
Vector Pull Request: https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/6667
All of the tokens list in Stable Diffusion: https://huggingface.co/openai/clip-vit-large-patch14/tree/main
Example training dataset used in the video:
https://drive.google.com/file/d/1Hom2XbILub0hQc-zmLizRcwFrKwHYGcc/view?usp=sharing
Inspect-Embedding-Training Script repo:
https://github.com/Zyin055/Inspect-Embedding-Training
How to Inject Your Trained Subject: https://youtu.be/s25hcW4zq4M
Comparison of training techniques: https://bit.ly/TechnicComparison
Embedding file name list generator script:
https://jsfiddle.net/MonsterMMORPG/Lg0swc1b/10/
00:00:00 Introduction to #StableDiffusion #TextualInversion Embeddings
00:01:00 Which commit of the #Automatic1111 Web UI we are using and how to check out / switch to a specific commit of any Git project
00:04:07 Used command line arguments of Automatic1111 webui-user.bat file
00:04:35 Automatic1111 command line arguments
00:05:31 How to and where to put Stable Diffusion models and VAE files in Automatic1111 installation
00:06:05 Why do we use latest VAE file and what does VAE file do
00:08:24 Training settings of Automatic1111
00:10:38 All about names of text embeddings
00:11:00 What is initialization text of textual inversion training
00:11:32 Embedding inspector extension of Automatic1111
00:11:52 Technical and detailed explanation of tokens and their numerical weight vectors in Stable Diffusion
00:14:25 How to set the number of vectors per token when doing Textual Inversion training
00:16:00 How prompts get tokenized (turned into tokens) by using the tokenizer extension
00:18:58 Setting number of training vectors
00:20:24 Where embedding files are saved in automatic1111 installation
00:20:38 All about preprocessing images before TI training
00:23:06 Training tab of textual inversion
00:23:18 What to and how to set embedding learning rate
00:23:40 What are the Batch size and Gradient accumulation steps and how to set them
00:24:40 How to set training learning rate according to Batch size and Gradient accumulation steps
00:26:21 What are prompt templates, what are they used for, how to set and use them in textual inversion training
00:29:06 What are filewords and how they are used in training in automatic1111 web ui
00:29:35 How to edit image captions when doing textual inversion training
00:31:07 How and why I chose some images from the training pool and not all of them
00:31:54 Why I added noise to the backgrounds of some training dataset images
00:32:07 What your training dataset should look like and what makes a good training dataset
00:34:48 Save TI training checkpoints
00:36:31 Which latent sampling method is best
00:36:57 Training started
00:38:08 Overclock GPU to get 10% training speed up
00:38:32 Where to find TI training preview images
00:39:15 Where to see used final prompts during training
00:41:34 How to use inspect_embedding_training script to determine overtraining of textual inversion
00:42:31 What is training loss
00:48:23 Technical difference of Textual Inversion, DreamBooth, LoRA and HyperNetworks training
00:52:17 Over 200 epochs and already got very good sample preview images
00:54:28 How to set newest VAE file as default in the settings of automatic1111 web ui
00:55:06 How to use generated embeddings checkpoint files
00:58:31 How to test different checkpoints via X/Y plot and embedding files name generator script
01:07:27 How to upscale image by using AI
01:08:42 How to use multiple embeddings in a prompt
-
00:00:01 Greetings everyone. Welcome to the most comprehensive, technical, detailed and yet
-
00:00:07 still beginner-friendly Stable Diffusion Text Embeddings, also known as Textual Inversion
-
00:00:12 training tutorial. In this video I am going to cover all of the topics that you see here and
-
00:00:18 more. Currently I am hovering my mouse over there. You can pause the video and check them out if you
-
00:00:25 wish. Also, you see here the training dataset we used and here the textual embedding used results.
-
00:00:32 Let's start by quickly introducing what is textual inversion and its officially released academic
-
00:00:39 paper. If you are interested in reading this article, so you can open the link and read it,
-
00:00:47 I am also going to show some of the important parts of this article when we are going to use
-
00:00:56 them. I will explain through the article. So to do training, we are going to use
-
00:01:02 Automatic1111 web UI. If you don't know how to install and set up the Automatic1111 web UI,
-
00:01:10 I already have a video for that on my channel: Easiest Way to Install & Run Stable Diffusion
-
00:01:16 Web UI. Also, I have another video How to use Stable Diffusion V2.1 and Different Models.
-
00:01:24 So I am going to use specific version of the Automatic1111 web UI. It is constantly
-
00:01:30 getting updated and therefore it is constantly getting broken and you are asking me that:
-
00:01:37 which version did you use? I am going to use this specific version, this commit, because after the bump of
-
00:01:44 Gradio to 3.16, it has given me a lot of errors. So how am I going to use this specific version?
-
00:01:55 To use a specific version, I am going to clone it with Git Bash. If you still haven't installed
-
00:02:00 Git Bash, you can find it by using Google. Just type Git Bash into Google.
-
00:02:06 You can download from this website and install it. It is so easy to install.
-
00:02:10 First I am going to select the folder where I want to clone my Automatic1111 web UI. I am entering my
-
00:02:18 F drive and in here I am generating a new folder with right click new folder. Let's give it a name
-
00:02:25 as tutorial web UI. OK, then we will move inside this folder in our Git Bash window to do that.
-
00:02:36 Type cd F: and now we are in the F drive. Then type cd, put the folder name in quotation marks like this and hit enter.
-
00:02:45 Now we are inside this folder. Now we can clone Automatic1111 with git clone and copy the URL from
-
00:02:55 here like this and paste it into here. Right click, paste and it will clone it. OK, it is
-
00:03:03 cloned inside this folder. So I will enter there: type cd s, press Tab and it will be automatically completed
-
00:03:11 like this, and hit enter. Now we will check out a certain version from here. Let me show you again.
-
00:03:21 Click the commits from here, and here I am moving to the commit that I want: enable progress bar
-
00:03:27 without gallery. This is the commit ID. I will also put this into the description of the video.
-
00:03:32 Then we are going to do Git checkout like this. And right click paste. Now we are into that commit
-
00:03:44 and we are using that specific version inside our folder. So before starting the setup, I am copy-
-
00:03:53 pasting this webui-user.bat file, because I am going to add my command line arguments to it.
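To recap, the whole clone-and-checkout sequence described above looks roughly like this in Git Bash (the drive and folder name are just the ones used in the video; the commit hash is the one listed in the description):

```bash
# move to the folder created for this install (example path)
cd /f/"tutorial web ui"

# clone the repository and enter it
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
cd stable-diffusion-webui

# switch to the specific commit used in this tutorial
git checkout d8f8bcb821fa62e943eb95ee05b8a949317326fe
```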
-
00:04:01 OK, right click the copy and edit and let me zoom in copy paste. So I am going to use xformers,
-
00:04:09 no-half and disable-safe-unpickle. So how did I come up with these command line arguments?
-
00:04:15 xformers is going to increase your speed significantly and reduce the VRAM usage
-
00:04:21 of your graphic card. No-half is necessary for xformers to work correctly when you are using
-
00:04:29 SD 1.5 or 2.1 versions and disable-safe-unpickle. According to the web UI documentation you see the
-
00:04:38 URL here. Command line arguments and settings: disable checking pytorch models for malicious
-
00:04:44 code. Why I am using this? Because if you train your model on a Google Colab, sometimes it is not
-
00:04:50 working. It is not necessary, but I am just using it and I am not downloading any model
-
00:04:54 without knowing it. OK, then we save and run, then we are going to get our fresh installation.
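For reference, the only line that changes in webui-user.bat is the arguments line; with the three flags above it looks roughly like this (the rest of the file stays as shipped):

```bat
rem webui-user.bat - command line arguments used in this tutorial
set COMMANDLINE_ARGS=--xformers --no-half --disable-safe-unpickle

rem on a GPU with little VRAM you could additionally add --medvram (discussed near the end of the video)
```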
-
00:05:04 OK, like this, it will install all necessary things. And you see, let me zoom in, it is
-
00:05:12 using Python 3.10.8 version. By the way, you have to have installed Python correctly for
-
00:05:23 this to install. It is also showing the commit hash that I am using like this.
-
00:05:30 I also need to put my Stable Diffusion models into the models folder. So let's open it. Open
-
00:05:36 the Stable Diffusion folder, copy paste from my previous download. And another thing is that I
-
00:05:43 am going to use the latest VAE file that I have downloaded from the Internet, which I am going to
-
00:05:49 show you right now. So where do we put this VAE file? Go to the stable diffusion web UI and in
-
00:05:54 here you will see VAE files folder. It is not generated. It is inside the models folder and
-
00:06:02 inside here VAE, and this is the VAE file. Why we are using this VAE file? Because it is improving
-
00:06:10 generation of person images. And now let me show the link. OK, this is the link of the VAE file.
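To keep the folder locations straight, a default installation ends up looking roughly like this (the file names are the ones used in the video; yours may differ):

```
stable-diffusion-webui/
├── models/
│   ├── Stable-diffusion/   <- v1-5-pruned.ckpt (and any other models) go here
│   └── VAE/                <- vae-ft-mse-840000-ema-pruned.ckpt goes here
└── embeddings/             <- the embedding .pt files used in prompts live here
```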
-
00:06:17 This is the latest version of VAE file. Just click the CKPT file from here and click the download
-
00:06:24 button. I will also put the link of this into the description. So if you are wondering the technical
-
00:06:30 description, technical details of the VAE files, there is a long explanation in here in this
-
00:06:37 thread. I will also put the link of this thread into this description of the video and there is a
-
00:06:42 shorter description which I liked. Each generation is done in a compressed representation. The VAE
-
00:06:48 takes the compressed results and turns them into full sized images. SD comes with a VAE already,
-
00:06:54 but certain models may supply a custom VAE that works better for that model and SD 1.5 version
-
00:07:03 model is not using the latest VAE file. Therefore, we are downloading this and putting that into our
-
00:07:10 folder. SD 2.1 version is using the latest VAE file. And which SD 1.5 version model I am using?
-
00:07:20 I am using the 1.5 pruned ckpt. And where did I download it? I have downloaded it from this
-
00:07:29 URL and we are using the pruned ckpt because it is better for training than emaonly file which,
-
00:07:36 you see, is lesser in size. By the way, the things I am going to show in this video can
-
00:07:43 be applied to any model, such as Protogen or SD 2.1 version. Actually, I have made
-
00:07:51 experiments on Protogen training as well, and I will show the results of that too to you.
-
00:07:59 Okay, the fresh installation has been completed. No errors, and these are the messages displayed,
-
00:08:04 and it has started on this URL and I have already opened it. You can copy and paste
-
00:08:11 this URL in my browser. So currently it has selected by default the Protogen and I am going
-
00:08:18 to make this tutorial on version 1.5 pruned, the official version. Okay, before starting training,
-
00:08:26 I am going to first settings. First going to show you the settings that we need. Go to the
-
00:08:32 training tab in here and check this checkbox, Move VAE and CLIP to RAM when training. This
-
00:08:40 requires a lot of RAM actually. I have 64 GB, and if you check this, it will reduce the VRAM usage,
-
00:08:48 which is the GPU RAM, our more limited RAM. Then you can also check this option:
-
00:08:56 Turn on pin_memory for DataLoader. This makes training slightly faster, but increases memory
-
00:09:01 usage. I think this is increasing the RAM usage, not VRAM usage. So you can test this.
-
00:09:07 In other videos you will see that they check this checkbox: Use cross attention optimizations while
-
00:09:13 training. This will significantly increase your training speed and reduce the VRAM usage. However,
-
00:09:19 it is also significantly reducing your training success. So, if your graphic card can do training
-
00:09:27 without checking this out, do not check this, because it will reduce your training success and
-
00:09:33 it will reduce your learning rate. How do I know this? According to the vladmandic from Github,
-
00:09:41 this is causing a lot of problems. He has opened a bug topic on the Stable
-
00:09:50 Diffusion web UI issues and he says that this is causing a lot of problems. Let me show you.
-
00:09:59 He says that when he disabled cross attention for training and rerun exactly the same settings,
-
00:10:04 the results are perfect and I can verify this. So do not check this if your graphic card can
-
00:10:11 run it. There is also one more setting that we are going to set: Save a CSV containing the loss
-
00:10:18 to log directory every N steps. So I am going to make this 1. Why? Because I will show you how we
-
00:10:24 are going to use this during the training. Then go to the apply settings. Okay. Then reload UI.
-
00:10:31 Okay, settings were saved and UI is reloaded. Now go to the train tab. Okay. First of all,
-
00:10:39 we are going to give a name to our embedding. The name is not important at all,
-
00:10:46 so you can give it any name. This will be used to activate our embedding. Okay,
-
00:10:54 so I am going to give it a name as training example. It can be any character length,
-
00:11:00 It won't affect your results or token count. Initialization text. Now what does this mean?
-
00:11:07 For example, you are teaching a face and you want it to be similar to Brad Pitt. Then you
-
00:11:14 can type Brad Pitt. So what does this mean? Actually, to show you that first we are going
-
00:11:20 to install an extension, go to the available load from and in here, type embed into your search bar
-
00:11:29 and you will see embedding inspector. This is an extremely useful extension and let's install it.
-
00:11:39 Okay, the extension has been installed, so let's just restart with this. Okay, now we can see the
-
00:11:50 embedding inspector. So everything in the Stable Diffusion is composed by tokens. What does that
-
00:12:00 mean? You can think tokens as keywords, but not exactly like that. For example, when we type cat
-
00:12:07 and click inspect, the cat is a single token and it has an embedding ID and it has weights.
-
00:12:16 So every token has numerical weights, like this. And when we do training with embeddings,
-
00:12:27 actually we are going to generate a new vector that doesn't exist in the stable diffusion. We
-
00:12:34 are going to do training on that. So when you set initialization text like this, by the way,
-
00:12:42 it is going to generate a vector with the weights of this. However, this is two tokens. How do I
-
00:12:51 know? Go to the embedding inspector and type Brad. So you see, Brad is a single token.
-
00:12:57 It has weights. And let's type Pitt; Pitt is also another token and it also has a vector.
-
00:13:05 So these weights will be assigned initially to our new vector. However, we would have to use at least
-
00:13:14 two vectors, otherwise we wouldn't be able to fit both of these tokens. So if we start our training with Brad
-
00:13:24 Pitt, our first initial weights will be according to the Brad Pitt and our model will learn upon
-
00:13:32 that. Is this good? If your face is very similar to Brad Pitt, yes, but if it is not, no. So
-
00:13:42 Shondoit from the Automatic1111 community has done extensive experimentation and he found that
-
00:13:54 starting with the initialization text left empty, so starting with zeroed vectors, is performing
-
00:14:04 better than starting with, for example, *. Because * is also another token and you can see it from
-
00:14:13 here. Just type * here. It is just some vectors like this. So starting with empty vectors is
-
00:14:21 better. And now the number of vectors per token. So everything, every token has a vector in the
-
00:14:30 stable diffusion, and you may wonder how many tokens there are. To find that out, we are going to
-
00:14:38 check out the clip vit large patch. So in here you will see the tokenizer json. Yes, inside this json
-
00:14:45 file all of the tokens are listed. So you see, let me show, there is word IDs and words themselves,
-
00:14:56 like here: you see yes. So the list is starting from here. So each one of these are tokens and
-
00:15:04 it goes to the bottom like this: For example, sickle, whos, lamo, etour, finity. So these are
-
00:15:12 all of the tokens, all of the embeddings that the stable diffusion contains. If you wonder how many
-
00:15:19 there are exactly: there are exactly 49,408 tokens and each contains one vector. For SD 1.x versions,
-
00:15:33 the vector size is 768 and for the SD 2 version, it is 1024. So when we do embedding inspector,
-
00:15:45 you see it is showing the vector. So everything is composed by numerical weights
-
00:15:50 and they are being used by the machine learning algorithms to do inference and so also every
-
00:16:00 prompt we type is getting tokenized, and I will also show that tokenization right now.
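To summarize the numbers mentioned above:

```
CLIP text encoder vocabulary  : 49,408 tokens (listed in tokenizer.json)
vector size per token, SD 1.x : 768 values
vector size per token, SD 2.x : 1024 values
```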
-
00:16:06 Before we start to do that, go to the available tab load here and search for token and you will
-
00:16:13 see there is tokenizer, like tokenizer extension. Just install it, restart the UI and now you will
-
00:16:22 see tokenizer. So type your prompt here and see how it is getting tokenized. So let's say I am
-
00:16:29 going to use this kind of prompt. It is showing in the web UI that fifty eight tokens are being
-
00:16:37 used and we are limited to seventy five tokens. But we are not using fifty eight words here.
-
00:16:44 If you count the number of words it is not fifty eight. So let's copy this and go to the tokenizer,
-
00:16:50 paste it and tokenize, and now it is showing all of the tokenization. So the face is a single token
-
00:16:57 with ID of 1810. Photo is a single token, and let's see: OK, so artstation is two
-
00:17:06 tokens. It is art and station. Commas are also one token each, as you can see, and let's see if there is
-
00:17:13 any other being tokenized into some tokens. Or photorealistic. Photorealistic is also two
-
00:17:21 tokens and artstation is two tokens. So this is how tokenization works. Each of these tokens have
-
00:17:29 their own vectors and you can see their weights from embedding inspector. However, it is not very
-
00:17:35 useful because these numbers don't mean anything individually, but in the bigger scheme they
-
00:17:42 are working very well with the machine learning algorithms. Machine learning is all about weights.
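To summarize the tokenizer examples shown above:

```
face            -> 1 token (ID 1810)
photo           -> 1 token
artstation      -> 2 tokens (art + station)
photorealistic  -> 2 tokens
,  (comma)      -> 1 token each
```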
-
00:17:50 Also, in the official paper of textual inversion, on the page four, you see they are showing a photo
-
00:17:58 of S*, which is the embedding placeholder name. So you see there is a tokenizer and token IDs and they have
-
00:18:07 vectors like this. So it is all about vectors and their weights. OK. Now we can return to
-
00:18:14 train tab. Now we have idea of our tokenization. So let's give a name as tutorial training. You
-
00:18:23 can give this any name. This will be activation. Initialization text. I am just leaving it empty to
-
00:18:30 obtain best results. So our vectors will start with zero. Let's say you are training
-
00:18:37 a bulldog image, so you can start with bulldog weights. It may make your
-
00:18:46 training better. However, for faces, since we are training a new face that the model has no idea about,
-
00:18:54 I think leaving it as empty is better. So, number of vectors: now you know that
-
00:19:01 each token has one vector, which means that when we type Brad Pitt, only two vectors are used for
-
00:19:10 that. So all of the Brad Pitt images are saved in the stable diffusion model with just two vectors,
-
00:19:18 which means that two vectors is a good number for our face training or
-
00:19:29 for our subject training. I also have made a lot of experiments with one vector, two vectors,
-
00:19:35 three vectors, four vectors, and I have found that two vectors work best. However, this is
-
00:19:42 based on my training data set. You can also try one, two, three, four, five and you will see that
-
00:19:49 the quality is decreasing as you increase the number of vectors. Also, in the official papers
-
00:19:55 the researchers have used up to three vectors. You see extended latent spaces. This is the number of
-
00:20:01 vector count that is derived from the official paper and they have used up to three. You see
-
00:20:07 denoted two words and three words, but it is up to you to do experimentation and I am going to
-
00:20:14 use two. If you check Overwrite Old Embedding, it will overwrite an existing embedding with the same name. So
-
00:20:20 let's click Create Embedding and it is created. So where is it saved? Go to your installation
-
00:20:30 folder and in here you will see embeddings, and in here we can see our embedding has already been created.
-
00:20:37 Then let's go to the preprocess images tab. This is a generic tab of the web UI. It lets you crop
-
00:20:48 images, create flipped copies, split oversized images, auto focal point crop, use BLIP for
-
00:20:53 caption, use deepbooru for captioning. There is a source directory and a destination directory.
-
00:21:00 So I have a folder like this for experimentation and showing I am copying its address like this
-
00:21:07 and pasting it in here as source and I am going to give it a destination directory like a1. They
-
00:21:15 are going to be auto-resized and cropped. So let's check. Let's check this checkbox. Create flipped
-
00:21:22 copies. By the way, for faces, I am not suggesting to use this. It is not improving quality. You can
-
00:21:28 also split oversized images, but this doesn't make sense for faces. Autofocal point: yes,
-
00:21:34 let's just also click that. Use BLIP for caption. So it will use BLIP algorithm for
-
00:21:40 captioning. This is better for real images and deepbooru is better for i think anime images. OK,
-
00:21:48 and then just let's click preprocess. By the way, why we are doing 512 and 512? Because version 1.5,
-
00:21:58 Stable Diffusion version 1.5 is based on 512 and 512 pixels. If you use version 2.1 Stable
-
00:22:07 Diffusion. Then it has both 512 pixels and 768 pixels. So you need to process images based on
-
00:22:19 the model native resolution. Based on the model that you are going to do training. In the training
-
00:22:26 tab it will use the selected model here. So be careful with that. And when the first time
-
00:22:32 when you do preprocessing, it is downloading the necessary files as usual. OK, the processing has
-
00:22:38 been finished. Let's open the processed folder from pictures and a2 folder, a1 folder. And now
-
00:22:45 you see there are flipped copies and they were automatically cropped to 512 and 512 pixels. And
-
00:22:52 there are also descriptions generated by the BLIP. When you open the descriptions, you will see like
-
00:22:58 this: a man standing in front of a metal door in a building with a blue shirt on and black pants.
-
00:23:05 So now we are ready with the preprocess images, we can go to the training tab. In here we are
-
00:23:13 selecting the embedding that we are going to train, embedding learning rate. There are various,
-
00:23:20 let's say, discussions on this learning rate, but in the official paper, 0.005 is used. Therefore,
-
00:23:29 I believe that this is the best learning rate. The gradient clipping is related to hyper network
-
00:23:36 training, so just don't touch it. So the batch size and gradient
-
00:23:42 accumulation steps: this is also explained in the official paper. The batch size and gradient
-
00:23:49 accumulation steps will just increase your training speed if you have sufficient amount
-
00:23:54 of RAM / VRAM memory. However, make sure that the number of training images can be divided by
-
00:24:00 the multiplication of these two numbers. So let's say you have 10 training images, then you can set
-
00:24:10 these as 2 batch size and 5 gradient accumulation, which is two multiplied by five, is equal to 10,
-
00:24:20 or they can be. Or let's say you have 40 training images, then you can set it as 20 or 10 or 5,
-
00:24:28 it is up to you. However, this will increase, significantly, increase your VRAM usage. And
-
00:24:35 let's say the multiplication of these two numbers is equal to 10. Then you should also multiply
-
00:24:42 learning rate with 10. Why? Because this requires learning rate to be increased. How do I know that?
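In other words, the rule of thumb described here, and the paper's own example that follows, is:

```
effective batch size = batch size x gradient accumulation steps (x number of GPUs)
scaled learning rate = 0.005 x effective batch size
example from the paper: 0.005 x (4 x 2) = 0.04
```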
-
00:24:51 In the official paper, in the implementation details,
-
00:24:55 they say that they are using two graphic cards with batch size of four. Then they are changing
-
00:25:03 the base learning rate by multiplying it by eight. Why? Because two graphic cards with a
-
00:25:08 batch size of four: four multiplied by two is eight, and when you multiply 0.005 with 8, then we obtain
-
00:25:17 0.04. So be careful with that. If you increase this batch size and gradient accumulation steps,
-
00:25:25 just also make sure that you are increasing the learning rate as well. However, for this tutorial
-
00:25:31 I am going to use batch size one and gradient accumulation steps as one. Actually, until you
-
00:25:37 obtain good initial results, don't change them, I suggest you. Then you can change them. Then you
-
00:25:44 need to set your training data set directory. So let's say I am going to use these images,
-
00:25:51 then I am going to set them. Also, there is log directory, so the training logs will be logged
-
00:25:59 in this directory where it is. When we open our installation folder, we will see that there is a
-
00:26:11 textual inversion. However, since we still didn't start yet, it is not generated. So when the first
-
00:26:18 time we start, it will be generated. I suggest you to not change this. Okay,
-
00:26:23 prompt template. So what are prompts templates? Why are they used? Actually, there is not a clear
-
00:26:31 explanation of this in the official paper. When you go to the very bottom, you will see
-
00:26:38 training prompt templates. So these templates are actually derived from here. From my experience,
-
00:26:46 I have a theory that these prompts are used like this. So let's say you are teaching a photo of a
-
00:26:55 person, then this, the vectors of these tokens are also used to obtain your target image. So
-
00:27:04 they are helping to reach your target image. This is my theory. So it is using the vector of photo,
-
00:27:14 it is using the vectors of 'a' and 'of', or if you are teaching a style, then it is using that. So these templates
-
00:27:23 are actually these ones. When you open the prompt template folder which is in here, let's go to the
-
00:27:32 textual inversion templates folder and you will see the template files like this. So let's say, when
-
00:27:38 you open subject_filewords.txt, you will get a list like this: a photo of a [name], [filewords],
-
00:27:44 and so on. The [name] is the activation name that we have given. It will be treated specially. It will
-
00:27:52 not get turned into a regular token. For example, tutorial training would be tokenized like this if
-
00:28:00 it was not an embedding: tutorial training. Let's click tokenize. You see, tutorial training is
-
00:28:07 actually three tokens. Tutorial is tokenized into tutor and ial, plus training. However, since
-
00:28:16 it will be our special embedding name, it will be treated specially, with a number of special
-
00:28:25 tokens based on the number of vectors per token we decided. If we set this
-
00:28:33 to 10, then it will use 10 tokens of space in our prompt, so it will take 10 spaces in here.
-
00:28:42 However, it will now take only two instead of three, because it will be specially treated. Okay,
-
00:28:51 let's go back to the train tab. So, sorry about that, this name is the name of
-
00:29:01 our embedding, and the filewords: the filewords are the descriptions generated here.
-
00:29:08 So, basically, the prompt for training will become, tutorial training, and the file words,
-
00:29:16 let's say it is training for this particular image. It will just get this and append it
-
00:29:22 here and this will become the final prompt for that image when doing training. So, what
-
00:29:30 should we? How should we edit this description? You should define the parts that you don't want
-
00:29:39 model to learn. Which parts i don't want model to learn? I don't want model to learn this clothing,
-
00:29:45 these walls, for example, or this item here. So I have to describe them as much as possible.
-
00:29:53 So if i want model to learn glasses, then i need to remove glasses, okay,
-
00:29:58 and for example, if i want model to learn my smile, i should just remove it. Okay, i want my,
-
00:30:06 i want model to learn my face. Therefore, i can, i can just remove it, and this is so on. However,
-
00:30:15 i am not going to use file words in this training, because i have found that if you
-
00:30:21 pick your training data set carefully, you don't need to use filewords. So, how am i going to do
-
00:30:29 training in this case? I am just going to create a new text file here and I will name it my special,
-
00:30:41 okay, let's just open it. And here I am just going to type [name]. You have to use at least [name],
-
00:30:47 otherwise it won't work and it will throw an error. I am not going to use [filewords].
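So the custom prompt template file used here (a my special.txt file in the textual_inversion_templates folder) contains nothing but this single line:

```
[name]
```

By contrast, a caption-based template like the stock subject_filewords one contains lines such as "a photo of a [name], [filewords]".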
-
00:30:55 i am not going to use myself in this training. I am going to use one of my followers. He had
-
00:31:03 he had sent me his pictures. Let me show you the original pictures he had sent me. Okay, this is
-
00:31:10 the images he had sent me. However, i didn't use all of them. You see the images right now. There
-
00:31:17 are different angles and, different backgrounds. When you are doing textual inversion, you should
-
00:31:25 only teach one subject at a time, but if you want to combine multiple subjects, then you can train
-
00:31:32 multiple embeddings and you can combine all of them when generating images.
-
00:31:38 So which ones i did pick, let me show you. I have picked these ones, okay, and now you will notice
-
00:31:46 something here. You see, the background is here, is like this. You see green and some noise. Why?
-
00:31:55 Because i don't want model to learn background. So if multiple images containing same background,
-
00:32:01 i am just noising out those backgrounds. And why I did not noise out to other backgrounds? Because
-
00:32:07 other backgrounds are different. So you see, in your training data set, only the subject should
-
00:32:14 be same and all other things need to be different, like backgrounds, like clothing and other things.
-
00:32:21 So the training will learn only your subject, in this case the face. It will not learn the
-
00:32:28 background or the clothing. Okay, so let me show the original one. So in the original one you see
-
00:32:34 this image, this image, this image and these two images have same backgrounds. So i have edited
-
00:32:40 those same backgrounds with Paint.NET, which is free editing software. You can also edit with
-
00:32:47 Paint. How did I edit it? It is actually so simple and amateurish, you may say. So let's set a brush
-
00:32:55 size here and just, for example, change the color like this. Then I added some noise: select it
-
00:33:04 with a selection tool, set the tolerance from here and go to the effects, adjust effects and
-
00:33:11 in here you will see distort and frosted glass and when you click it it will change the appearance.
-
00:33:20 You can also try other distortion. By the way, i am providing these images to you for
-
00:33:27 testing. Let me show you the link. So i have uploaded images into a google drive folder
-
00:33:33 and i am going to put the link of this into the description so you can download this data
-
00:33:38 set and do training and see how it performs. Are you able to obtain good results, as me?
-
00:33:45 Okay, so i am going to change my training data set folder from pictures and i am going to use
-
00:33:55 example training set folder. I am going to set it in my training here. Okay, and i am going to use
-
00:34:05 my prompt template. Just refresh it and go to the my special. So what was my special? My special was
-
00:34:12 just only containing [name]. It is not containing any file descriptions. I have found that this is
-
00:34:18 working great if you optimize your training data set, as, like me, you can try both of them. You
-
00:34:26 can try with [filewords] and you can try without [filewords], and you can see how it
-
00:34:33 is working. Okay, do not resize images, because our images are already 512 pixels. Max steps: now,
-
00:34:40 this can be set to anything. I will show you a way to understand whether you have started overtraining
-
00:34:48 or not, so this can stay like this. In how many steps do we want to save? Okay, this is rather
-
00:34:56 different than epochs in the DreamBooth, if you have watched my DreamBooth videos. So each image
-
00:35:03 is one step and there is no epoch saving here. It is step saving. How many training images i
-
00:35:10 have? I have 10 images in total. Okay, so for 10 epochs we need to set this to 100, actually.
-
00:35:18 Okay. So the formula is like this: one epoch equals the number of training images. 10 epochs
-
00:35:24 for 10 training images is 10 multiplied by 10, which is 100, so it will be saved every 10 epochs.
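As a formula (with batch size 1), the saving interval works out like this:

```
1 epoch              = number of training images, in steps   -> 10 images = 10 steps per epoch
save every 100 steps = 100 / 10 epochs                        -> a checkpoint every 10 epochs
```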
-
00:35:31 Save images with embedding in PNG chunks. This will save the embedding info
-
00:35:41 in the generated preview images. I will show you. Read parameters from text to image tab
-
00:35:48 when generating preview images. I don't want that, so it will just use the regular prompts, that is,
-
00:35:57 that we will see in here. Shuffle tags by comma, which means that if you use file words,
-
00:36:05 the words in there will be shuffled when doing training. This can be useful. You can test it
-
00:36:11 out and drop out tags when creating prompts. This means that it will randomly drop the
-
00:36:18 file descriptions, file captions, that you have used. This is, i think,
-
00:36:23 percentage based. So if you set it to 0.1, it will randomly drop out 10 percent, and I am not
-
00:36:30 going to use file words. Therefore, this will have zero effect. Okay, choose latent sampling methods.
-
00:36:36 I also have researched this: in the official paper, random is used. However, one of the
-
00:36:43 community developers proposed deterministic and he found that deterministic is working best. So
-
00:36:49 choose deterministic. And now we are ready so we can start training. So i am going to click train
-
00:36:57 embedding. Okay, training has started, as you can see, and it is displaying the number of
-
00:37:05 epochs and the number of steps. It is displaying the iteration speed; currently it is 1.30 seconds
-
00:37:14 per iteration. Why? Because i am recording and it is taking already a lot of gpu power. I have
-
00:37:21 RTX 3060. It has 12 gigabyte of memory. Let me also show you what is taking the memory usage.
-
00:37:32 You see, OBS studio is already using a lot of gpu memory and also training uses. But since they are
-
00:37:39 using different parts of the gpu, i think it is working fine. When we open the performance,
-
00:37:43 we can see that the training is using the 3d part of the gpu and obs is using the video encode part
-
00:37:52 of the gpu. That is how I am still able to record, but sometimes it is dropping out my voice. I
-
00:37:59 hope that it is currently recording very well. Okay, and i also did some overclocking to my gpu
-
00:38:10 by using MSI Afterburner. I have increased the core clock by 175 and i have increased
-
00:38:18 memory clock by 900, so this boosted my training speed like 10%. You can also do that if you want.
-
00:38:26 I didn't do any core voltage increasing. So it has already generated two preview images where
-
00:38:34 we are going to find them. Let me show you. Now it will be inside textual inversion folder. You
-
00:38:42 see it has just arrived and when you open it you will see the date, the date of the training, and
-
00:38:51 you will see the name of the embedding we are training, and in here you will see embeddings. This
-
00:38:56 is the checkpoint. So you can use any checkpoint to generate images, and these are the images
-
00:39:04 that it has generated. So this is the first image and also in image embeddings. This image
-
00:39:11 embedding contains the embedding info. Why this is generated? Because we did check this checkbox.
-
00:39:19 You will see the used prompts here. Since i didn't use any file words and i just used name,
-
00:39:25 it is only using this name as a prompt. And what does that mean? That means that it is only using
-
00:39:34 the vectors we have generated in the beginning to learn our subject, to learn the details of
-
00:39:40 our subject, which is the face, and we have two vectors to learn, and also Brad Pitt is
-
00:39:48 based on the two vectors, so why not? We can be also taught to the model with two vectors.
-
00:39:56 Okay, just in the 20th epoch, we are already getting some similarity.
-
00:40:03 Actually, i already did the same training, so i already have the trained data.
-
00:40:12 But I am doing the recording while training again, to explain it to you better.
-
00:40:21 It also shows here the estimated time, for training to be completed. This time is based on
-
00:40:28 100 000 steps, but we are not going to train that much. Actually, i have found that around three
-
00:40:35 thousand steps we are getting very good results with the training data set I have. It will totally
-
00:40:42 depend on the training data set you have with how many number of steps you can teach your subject.
-
00:40:48 I will show you the way how to determine which one is best, which checkpoint is best, which number of
-
00:40:56 steps is best. Okay, with 30 epoch we already got a very much similar image. You see, with just 30
-
00:41:06 epoch we are starting to get very similar images. It is starting to learn our subject very well
-
00:41:12 with just 30 epoch and when we get over 100 epoch, we will get much better quality images.
-
00:41:20 Okay, it has been over 600 steps and over 60 epochs,
-
00:41:26 and we got six preview images. Since we are generating preview images and checkpoints,
-
00:41:32 for every 10 epoch. Now i am going to show you how you can determine whether you are overtraining or
-
00:41:40 not with a community developed script. So the script name is: inspect embedding training.
-
00:41:50 It is hosted on github. It's a public project. I will put the link of this project to the
-
00:41:55 description as well. Everything, every link, will be put to the description. So check out
-
00:41:59 the video description and in here, just click code and download as zip. Okay, it is downloaded.
-
00:42:06 When you open it you will see inspect embedding training part. Extract this file into your textual
-
00:42:14 inversion and tutorial training, as i have shown. So you will see these files there. To extract it,
-
00:42:21 just drag and drop. Why we are extracting it in here? Because we are going to analyze the loss.
-
00:42:28 And so the loss, what is loss? You are always seeing the loss here. The number value is here:
-
00:42:37 loss is the penalty for a bad prediction. That is, loss is a number indicating how bad
-
00:42:43 the model's prediction was on a single example. If the model's prediction is perfect, the loss is
-
00:42:49 zero. Otherwise the loss is greater. In our case we can think of it as how close the image the model
-
00:42:56 generated is to our training subject's training images. So if you get a zero loss,
-
00:43:05 that means that the model is learning very well, okay. If your loss is too high, that means that
-
00:43:11 your model is not learning. Now, with this script we have extracted here, we are going to see the
-
00:43:20 loss. And how are we going to use this script? This script requires torch installation and
-
00:43:28 the torch is already installed in our web ui folder, inside venv folder, virtual environment,
-
00:43:36 and inside here scripts. So we are going to use the python exe here to do that. First copy the
-
00:43:43 path of this. Open a notepad file like this: okay, put quotation marks and just type python exe like
-
00:43:54 this: okay, then we are going to get the path of the file. Let me show you. The script file
-
00:44:02 is in this folder. So, with quotation marks, just copy and paste it in here and type the
-
00:44:12 script file name like this: then open a new cmd window by typing like this:
-
00:44:19 okay, let me some zoom in, copy and paste the path like this, the code, and just hit enter
-
00:44:28 and you will see it has generated some info for us: learning rate at step at,
-
00:44:34 loss jpg, vector, vector jpg and the average vector strength. So let's open our folder in
-
00:44:42 here and we will see the files. When we open the loss file we are going to see a graph like this:
-
00:44:50 the average loss is below 0.2, which means it is learning very well. The closer it is to 0,
-
00:44:58 the better, and the closer it is to 1, the worse. So currently we are able to
-
00:45:04 learn very well. Now I will show you how to determine whether you are overtraining or not.
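Put together in a cmd window, the command assembled above looks roughly like this (every path is an example from this setup, the exact script file name may differ slightly, and <date> stands for the dated subfolder the web UI created — adjust all of it to your own installation):

```bat
rem run the script with the web UI's own Python (inside the venv folder)
"F:\tutorial web ui\stable-diffusion-webui\venv\Scripts\python.exe" ^
  "F:\tutorial web ui\stable-diffusion-webui\textual_inversion\<date>\tutorial training\inspect_embedding_training.py"

rem the second run described below appends the embeddings folder, roughly:
rem ...python.exe" "...inspect_embedding_training.py" --folder "...\tutorial training\embeddings"
```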
-
00:45:13 To do that, we are going to add a parameter here, --folder, and just give the folder of
-
00:45:21 embedding files here. Just copy paste it again do not forget quotation marks and open a new
-
00:45:28 cmd window. Just copy and paste it, hit enter. It will calculate the average strength of the
-
00:45:37 vectors and when this strength is over 0.2, that usually means that you started over training. How
-
00:45:46 do we know? According to the developer of this script, if the average
-
00:45:54 strength of all the vectors is greater than 0.2, the embedding starts to become inflexible. That
-
00:46:02 means over training. So you will not be able to stylize your trained subject. So you won't
-
00:46:14 be able to get good images like this if you do overtraining, that is, if the strength
-
00:46:20 of the vectors becomes too high. And what was the vector strength? It was so easy. When we opened,
-
00:46:29 the embedding tab, the embedding inspector tab, we were able to see the values of vectors. So this
-
00:46:37 strength means the average of these values; when the average of these values is
-
00:46:42 over 0.2 that means that you are starting to do over training. You need to check this out
-
00:46:50 to determine that. By the way, it is said that the DreamBooth is best to teach faces,
-
00:46:58 and in the official paper of embedding, the textual inversion, the authors, the researchers,
-
00:47:04 have used all you see objects like this, or they have used training on style, let me show you like
-
00:47:13 here. However, as i have just shown, as i have just demonstrated you this textual embeddings are
-
00:47:22 also very good, very successful, for teaching faces as well, and for objects, of course,
-
00:47:29 it is working very well as well. And for styles. I think the textual inversion, the text embeddings,
-
00:47:36 is much better than DreamBooth. So if you want to teach objects or styles, then i suggest you
-
00:47:44 to use textual inversion. Actually for faces, i think textual inversion of the automatic1111 is
-
00:47:51 working very well as well. And for DreamBooth to obtain very good results, you need to merge your
-
00:47:59 learned subject into a new model, which i have shown in my video. So if you use dream boot,
-
00:48:05 you should inject your trained subject into a good custom model to obtain very good images.
-
00:48:10 But on textual inversion, you can already obtain very good images. Okay, we are over 170 epoch and
-
00:48:20 meanwhile training is going on. I will show you the difference of DreamBooth, textual inversion,
-
00:48:27 LoRA and hypernetworks. One of the community members on Reddit, use_excalidraw, prepared
-
00:48:36 an infographic like this and this is very useful. So in DreamBooth, we are modifying the weights of
-
00:48:43 the model itself. So for all of the prompt words we use, you already know by now that
-
00:48:52 they each have their own vectors, and these vectors are getting modified in
-
00:48:59 DreamBooth, all of them. The token we selected for DreamBooth is also getting modified, and in
-
00:49:06 DreamBooth we are not able to add a new vector. We already have to use one of the existing vectors
-
00:49:15 of the model. Therefore, we are selecting one of the existing tokens in the models,
-
00:49:20 such as sks or ohwx. So in DreamBooth we are basically modifying, altering the model itself.
-
00:49:32 Okay, in Textual Inversion we are adding a new token. Actually, this is displayed incorrectly
-
00:49:39 because it is generating a unique new vector which does not exist in the model, and we are modifying
-
00:49:50 the weights of these new vectors. So when we set the vector count as two, it is actually using two
-
00:49:58 unique new tokens. So it is modifying two vectors. If we set the vector count 10, it is using 10
-
00:50:06 unique tokens. It is being specially treated, it is adding new 10 vectors and it is not modifying
-
00:50:15 any of the existing vector of the model. So if we set the vector count to 10, actually when we do,
-
00:50:25 when we generate an image in here, it will use 10 vectors. It will use 10 tokens out
-
00:50:31 of 75 tokens we have. We have 75 tokens limit. So this is how it works. Also, if you use 10 vector,
-
00:50:41 you will see that you are getting very bad results for faces. I have made tests. Okay,
-
00:50:47 in LoRA, it is very similar to DreamBooth. It is modifying the existing vectors
-
00:50:55 of the model. I have found that LoRA is inferior to DreamBooth, but it is just using
-
00:51:05 less VRAM and it is faster. Therefore, people are choosing that. However, for quality, DreamBooth is
-
00:51:12 better, as shown here. And the hypernetworks: hypernetworks don't have an official academic
-
00:51:21 paper. I think it is made upon a leaked code and this is the least successful method. It
-
00:51:29 is the worst quality, so just don't waste your time with it; I don't suggest using it.
-
00:51:38 So in hypernetworks, the original weights, the original vectors of the model, are not modified,
-
00:51:44 but at the inference time. Inference means that when you generate an image from text to image,
-
00:51:50 it is inference. They are just getting swapped in. So you see there are some images which are
-
00:51:58 training sample, apply noise compare and there is loss. So this is how the model
-
00:52:06 is learning. Basically, of course, there are a lot of details if you are interested in them,
-
00:52:11 you can just read the official paper, but it is very hard to understand and complex thing things.
-
00:52:19 Okay, we are over 200 epochs, so we have 20 example images and the last one is extremely
-
00:52:27 similar to our official training set, as you can see. So let's also check out our strength
-
00:52:34 of the training vector. So i am just typing, hitting the up arrow in my keyboard, and it is
-
00:52:43 retyping the last executed command and hit enter. Okay, so our strength, average strength, is 0.13,
-
00:52:53 actually almost 0.14. We are getting close to 0.2. After 0.2, we can assume we started over training.
-
00:53:02 Of course, this would depend on your training data set, but it is an indication according
-
00:53:08 to the experience of this developer. It also makes sense because as the strength of the vector
-
00:53:16 increases, it will override the other vectors. You see, since they are all floating point numbers,
-
00:53:27 a bigger numeric value usually makes the smaller numeric values
-
00:53:36 ineffective. This is how machine learning usually works, depending on the chosen algorithms. They
-
00:53:41 are extremely complex, but this is one of the, let's say, common principles in many
-
00:53:49 of the numerical-weight-based machine learning algorithms. Therefore, it also makes sense.
-
00:53:57 Okay, we are over 500 epochs at the moment. So let me show you the generated sample images. These
-
00:54:05 are the sample images. They are already very similar and the latest one, you see, looks like it is getting
-
00:54:11 overtrained. So let's check with the script we have. Just hit the up arrow and hit enter
-
00:54:20 and you see, we are now over 0.2 strength. Therefore, I am going to cancel the training and
-
00:54:28 now I will show you how to use these embeddings. But before doing that, first let's set the newest
-
00:54:36 VAE file to generate better quality images. To do that, let's go to the let me find it
-
00:54:49 okay go to the Stable Diffusion tab in the settings and in here, you see,
-
00:54:54 SD VAE is automatic. I am going to select the one we did put. Let's apply settings, okay,
-
00:55:02 and then we will reload to UI. Okay, settings applied and UI is reloaded. So how are we going to
-
00:55:11 use these generated embeddings? It is easy. First let's enter to our textual inversion directory and
-
00:55:21 inside here let's go to the embeddings folder. Let me show you what kind of path it is. I know
-
00:55:29 that it is looking small, so this is where I have installed my automatic1111. This is the
-
00:55:40 main folder: textual inversion. This is the date of the training, when it was started. This is the
-
00:55:46 embedding name that I have given and this is the folder where the embedding checkpoints are saved.
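As an aside, instead of copying the checkpoints by hand as shown next, the same move could be done from Git Bash roughly like this (run from the stable-diffusion-webui root; the paths and the embedding name are just this run's, so adjust them):

```bash
# copy every saved embedding checkpoint into the folder the web UI loads embeddings from
cp "textual_inversion/<date>/tutorial training/embeddings/"*.pt embeddings/
```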
-
00:55:53 When we analyze the weights, we see the change they have. So I am going to pick about 20
-
00:56:01 of them to compare. How am I gonna do that? I will pick one every 200 steps, like this:
-
00:56:10 okay, I have selected 24. Right click copy. By the way, for selecting each one of them, I have
-
00:56:18 used control button. You can select all of them. It is just fine. Then move to the main folder,
-
00:56:23 installation, and in here you will see embeddings folder. Go there, I'm just going to delete
-
00:56:28 the original one and I am pasting the ones as checkpoints. So how we are going to use them, just
-
00:56:37 type their name like this: this is equal to OHWX in the, in the DreamBooth tutorials that we have
-
00:56:45 and let's see. Currently it says that it is using seven tokens, but this is not correct
-
00:56:53 actually. It should be using just two. Okay, maybe it didn't refresh. Let's do a generation.
-
00:57:03 Okay, we got our picture. I think this is taking seven length because it is also using the okay,
-
00:57:15 yeah, so okay, now fixed. Now you see it is using only two tokens. Why? Because now it has loaded
-
00:57:25 the embedding file name and our embedding was composed with two vectors. Therefore,
-
00:57:34 it is using two vectors. However, if this was not our embedding name and it was just a
-
00:57:42 regular prompt, if we go to the tokenizer we can see what it was going to take. Let me show you: one,
-
00:57:50 two, three, four, five, six, seven, eight tokens. You see each number is a token. This is a token.
-
00:57:58 So it was going to use eight tokens, but since it is an embedding name and the embedding is only two
-
00:58:05 vectors, it is using only two tokens because in the background, in the technical details,
-
00:58:13 it has composed of two unique tokens, since we did set the vector count 2. So for each vector a
-
00:58:20 token is generated and with a textual embedding we are able to insert, we are able to generate
-
00:58:28 new tokens, unlike DreamBooth. DreamBooth can only use the existing tokens. Okay, so now we are going
-
00:58:35 to generate a test case with using X/Y plot. I have tested CFG values and the prompt strength.
-
00:58:45 By prompt strength I mean the prompt attention / emphasis, and it is explained in the wiki
-
00:58:51 of the Stable Diffusion of automatic1111 web ui. So you see, when you use parentheses like this,
-
00:58:57 it increases the attention by a factor of 1.1. You can also set the attention directly like
-
00:59:03 this. So I have tested the prompt attention with embeddings, and it always resulted in bad quality
-
00:59:12 for me, but still you can test with them. I also played with the CFG higher values. They were also
-
00:59:18 not very good, but now i will show you how to test each one of the embeddings. So instead of
-
00:59:26 typing each one of the names manually, I have prepared a public JSFiddle script. I will
-
00:59:34 also share the link of this script so you will be also able to use it. So the starting checkpoint:
-
00:59:41 the starting checkpoint is 400, so let's set it as 400. Our increment is 200. We have selected
-
00:59:48 and our embedding name is tutorial training. Okay, so let's just type it in here. Then just click run
-
00:59:58 and you see it has generated me all of the names. I have names up to 5000. I copy them with a ctrl
-
01:00:07 c or copy. Then we are going to paste it in here in the X/Y plot and in here select the prompt sr.
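If you prefer the command line over the linked JSFiddle, the same comma-separated list for the Prompt S/R field can be produced in Git Bash with a small loop like this (the embedding name, start, step and end values are just this run's; change them to match yours):

```bash
# print: tutorial training-400, tutorial training-600, ..., tutorial training-5000
for s in $(seq 400 200 5000); do printf 'tutorial training-%s, ' "$s"; done; echo
```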
-
01:00:15 Then we need to set a keyword. Okay, let's set a keyword as kw. Okay, test, it is not important,
-
01:00:26 you can simply set anything here. Now I will copy and paste some good prompts. To do
-
01:00:33 that i will use png info, drag and drop. Okay, i have lots of experimentation. As you can see,
-
01:00:42 these experimentation are from protogen training with textual embeddings. It was
-
01:00:47 also extremely successful for my face. Okay, let's pick from my today, experimentation which is under
-
01:00:59 okay, under here. Let's just pick one of them. Okay, now i am going to copy paste
-
01:01:06 this into text to image tab. You see, when you use png info, it shows all of
-
01:01:11 the parameters of the selected picture, if they were generated by the web
-
01:01:19 ui by default. Okay, so you see. Face photo of. Let me zoom in like this:
-
01:01:27 testp2400. This is from my previous embedding, so it is currently 60 tokens. Now i am going to
-
01:01:34 replace these with my test keyword. They will be replaced with these all of the tutorial
-
01:01:43 training text which are my embedding names, and prompt sr. Okay, as a second parameter. You see,
-
01:01:51 now it is reduced to 55. You can try CFG values actually, if you want. Or you can try the prompt
-
01:02:01 strength, prompt emphasis. To do that, just add another keyword here as another kw. Okay, and
-
01:02:11 let's put it in as a prompt strength, actually not prompt strength but prompt sr, okay, and
-
01:02:19 replace it with 1.0 and 1.1, for example. So you will see the results of different prompt emphasis,
-
01:02:30 attention emphasis, as explained here. You can test them. You can also test the CFG values.
-
01:02:35 It is totally up to you. Do not check this box, because you want to see the
-
01:02:42 same seed images. Actually, since these are different checkpoints, you are not going to
-
01:02:47 get the same image. By the way, when we use the command argument in here, let me show you. When
-
01:02:58 we use xformers, even if you use the same seed, you, you will not get the same image because,
-
01:03:07 since this is doing a lot of optimization, it will not allow you to get the exactly same image,
-
01:03:14 even if you use the same model and the same seed.. And also, there is one more thing:
-
01:03:21 actually there are two more options if you have low VRAM. Let me show you. So in the command
-
01:03:29 line arguments list of the wiki, if we search for VRAM, let's see, like this, you will see there
-
01:03:37 are medvram (medium VRAM) and lowvram (low VRAM). So if you also add these parameters to your command line
-
01:03:45 arguments like this, let me show you, okay, --medvram and --lowvram, they will allow you to run the web
-
01:03:55 UI on a GPU with less VRAM, and with --lowvram and --medvram you can still generate images with a very
-
01:04:04 low amount of GPU memory. However, when you use --lowvram, it will not allow you to do training. So you can
-
01:04:13 add --medvram to your command line arguments instead, and that will still allow you to do textual embedding /
-
01:04:20 textual inversion training on a GPU with less VRAM (a sample launch sketch is below). Okay, now we are ready.
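As a rough sketch, these are the same flags that would normally go on the COMMANDLINE_ARGS line of webui-user.bat; launching launch.py directly from the stable-diffusion-webui folder is an equivalent way to pass them:

```python
# Rough sketch only: pass the low-VRAM and xformers flags when starting the web UI.
import subprocess

subprocess.run([
    "python", "launch.py",
    "--xformers",    # faster attention, but results are not reproducible even with a fixed seed
    "--medvram",     # lowers VRAM usage and still allows textual inversion training
    # "--lowvram",   # lowers VRAM usage even further, but training is not possible with it
])
```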
-
01:04:30 I'm not going to test the strength, so i'm only going to test the different embedding checkpoints. Okay: draw
-
01:04:39 legend, include separate images, keep these as they are. Keep the seed at minus one. Okay, we are ready. I'm not going to apply
-
01:04:45 restore faces or tiling or the high resolution fix, okay, so let's just click generate and see the results.
-
01:04:56 Oh, by the way, to get a better idea, i am setting the batch size to eight. So in each
-
01:05:03 generation it will generate eight images for each one of the embedding checkpoints.
-
01:05:14 Okay, let me also show you the speed. So it is going to generate 25 grids, because we have
-
01:05:20 selected 25 checkpoints, and each one will contain eight images. Therefore, it will generate 200 images in total.
-
01:05:27 Currently it shows the speed as 5.73 seconds per iteration. Actually, each iteration is
-
01:05:37 currently eight images, one step for each, because we are generating eight images in parallel as a batch.
-
01:05:46 Therefore, it is actually about eight times faster than regular single image generation.
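A quick back-of-the-envelope check of those numbers:

```python
# Rough arithmetic for the batch run above, using the numbers mentioned in the video.
checkpoints = 25             # embedding checkpoints selected for the X/Y plot
batch_size = 8               # images generated in parallel for each checkpoint
total_images = checkpoints * batch_size                        # 200 images in total

seconds_per_iteration = 5.73                                   # one iteration = one step for the whole batch
seconds_per_image_step = seconds_per_iteration / batch_size    # ~0.72 s per image per step
print(total_images, round(seconds_per_image_step, 2))
```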
-
01:05:54 Okay, since it was going to take one hour and it's already 3 am and i want to finish this
-
01:06:00 video today, i am going to show the results of my previous training with exactly the same data set
-
01:06:07 and exactly the same settings, and you are going to get this kind of output after generating the grid
-
01:06:14 images. It is actually, let me see, 90 megabytes. So you see, these are the different checkpoints,
-
01:06:24 as you can see, and from these images you need to decide which one looks best. For example, i have
-
01:06:33 picked the testp-2400 steps checkpoint, which, from 10 training images, means 240 epochs (a quick check of this arithmetic is below).
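The arithmetic behind that, assuming batch size 1 and gradient accumulation 1 so that one step corresponds to one training image seen:

```python
# Quick check of the step-to-epoch relationship for this training run.
training_images = 10
steps = 2400
epochs = steps // training_images   # 2400 / 10 = 240 epochs
print(epochs)
```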
-
01:06:47 I have generated a lot of images from this checkpoint, and actually they are the ones that i have
-
01:06:54 shown at the beginning of the video, these ones. So these ones were generated from the testp-2400
-
01:07:07 steps checkpoint, as you can see. Also, the name is written in each image's description. Let me
-
01:07:12 show you one of the examples and see how good it is. It is a 3d rendering of the person we
-
01:07:22 trained, and you see the quality. This is the raw quality. I didn't upscale it or do anything, and
-
01:07:28 it is just amazing. Let's just upscale it and see how it looks at a bigger resolution.
-
01:07:34 Okay, to do that, let's go to the extras tab and in here i will drag and drop it, one moment.
-
01:07:45 Okay, this image, okay, and then i am going to use R-ESRGAN 4x+; I find this the best one. Actually,
-
01:07:57 you can also try the anime version for this one, and let's just upscale it four times.
-
01:08:05 Okay, the upscale is done. And look at the quality. It is just amazingly stylized, and these
-
01:08:13 are the original images. You see how good it is. It is exactly the same person and
-
01:08:19 hundred percent stylized, as we wanted. If you asked an artist to draw this,
-
01:08:25 i think the artist could only draw it about this well. I also didn't generate too many images
-
01:08:32 because i had little time. I have been doing a lot of research and experimentation to explain
-
01:08:37 everything to you in this video with as much detail as possible. Now, let me show you how you can
-
01:08:45 combine multiple embeddings in a single query. Let's say you have trained multiple people or
-
01:08:51 multiple objects and you want to use them together, or you have trained multiple styles and you want to apply
-
01:08:57 them in the same query. It is just so easy: all you need to do is type their names in the prompt. So
-
01:09:08 if you add this one here, and then you also add this other one, they will both be used.
-
01:09:16 Since these two use the same tokens, both of their strengths,
-
01:09:25 both of their weights and vectors, will be applied on top of each other. And if they were different
-
01:09:32 embedding files, both of them would still be applied. So this is how you use embeddings in the text to image tab (a tiny illustration is below).
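For example, combining embeddings is nothing more than writing both names in the prompt text; the embedding names below are placeholders, not files from the video:

```python
# Illustrative only: "tutorial-2400" and "my-style-3000" are placeholder embedding names.
# Any prompt word that matches a file in the embeddings folder is swapped for that
# embedding's learned vectors, so listing two names simply applies both embeddings.
prompt = "photo of tutorial-2400 wearing a suit, in the style of my-style-3000, highly detailed"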
-
01:09:41 Hopefully, i plan to work on an experiment on teaching a style and an
-
01:09:50 object, and make another video about them, but the principles are the same. It may just require
-
01:09:57 preparing a good training data set. You see, this training data set is not even good. The
-
01:10:04 images are blurry and not high quality. The lighting is not very good. As you can see,
-
01:10:10 this is a blurry image actually, and this is also a blurry image, and you will get the link of this
-
01:10:16 data set to look at on your computer as well. However, even though these are not very good, the results
-
01:10:24 are just amazing. As you can see, textual embeddings are very strong for teaching faces as well,
-
01:10:31 and you can do training on the official pruned model, or you can do training
-
01:10:38 on Protogen, a custom, very good model, or on SD 2.1. And one good side of
-
01:10:47 textual inversion compared to DreamBooth is that, for example, i did DreamBooth training on Protogen and
-
01:10:53 it was a failure. However, it was a great success for textual inversion. By the way,
-
01:11:00 the grid images will be saved under the outputs folder, inside the text-to-image grids folder, like this, when
-
01:11:07 you do an X/Y plot generation, and the regular outputs are saved in the text-to-image images folder, like this.
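For reference, in a stock installation these are the default folder names (relative to the stable-diffusion-webui folder; they can be changed in the settings):

```python
# Default output locations for a stock Automatic1111 installation (an assumption if
# you have changed the output directories in the settings).
grid_output  = "outputs/txt2img-grids"     # X/Y plot grid images land here
image_output = "outputs/txt2img-images"    # regular text-to-image results land here
```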
-
01:11:14 And this is all for today. I hope you have enjoyed it. I have worked a lot on preparing
-
01:11:24 this tutorial. I have read a lot of technical documents. I have done a lot of research
-
01:11:30 and experimentation, so please subscribe. If you join and support us, i appreciate it. Like the
-
01:11:38 video, share it, and if you have any questions, just join our Discord channel. To do that, go to
-
01:11:44 our about tab and in here you will see the official Discord channel. Just click it. And if you support
-
01:11:49 us on Patreon, i would appreciate that very much. So far, we have 10 patrons and i thank them a
-
01:11:57 lot. They keep me going to prepare more and better videos, and hopefully see you in another video.
