8 GB LoRA Training - Fix CUDA & xformers For DreamBooth and Textual Inversion in Automatic1111 SD UI
Full tutorial link > https://www.youtube.com/watch?v=O01BrQwOd-Q
Updated tutorial: https://youtu.be/pom3nQejaTs - Our Discord: https://discord.gg/HbqgGaZVmr. In this video I show how to downgrade the CUDA and xformers versions for proper training, and how to do LoRA training with an 8 GB GPU. If I have been of assistance to you and you would like to show your support for my work, please consider becoming a patron on 🥰 https://www.patreon.com/SECourses
Playlist of Stable Diffusion Tutorials, #Automatic1111 and Google Colab Guides, DreamBooth, Textual Inversion / Embedding, #LoRA, AI Upscaling, Pix2Pix, Img2Img:
https://www.youtube.com/playlist?list=PL_pbwdIyffsmclLl0O144nQRnezKlNdx3
This CUDA downgrade will probably not be necessary after the extensions get updated. However, it is not certain when they will get updated. Meanwhile, you can downgrade and use CUDA 11.6.
The commands you need to execute, in order, to downgrade CUDA:
https://gist.github.com/FurkanGozukara/e2db853d2016a4a9ae2cc32dc41d730a
Run CMD as administrator if you get an error.
1: activate
2: pip uninstall torch torchvision
3: pip uninstall torchaudio
4: pip uninstall xformers
5: pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu116
6: pip install -U -I --no-deps https://github.com/C43H66N12O12S2/stable-diffusion-webui/releases/download/torch13/xformers-0.0.14.dev0-cp310-cp310-win_amd64.whl
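For convenience, steps 2-6 above can be collected into one script. This is only a sketch: it assumes you have already run activate (step 1) from inside the venv's Scripts folder, it adds -y to the uninstalls so you are not prompted for each one, and by default it only prints the commands; set RUN=1 to actually execute them.

```shell
# Dry-run sketch of steps 2-6 above; set RUN=1 to actually execute them.
# Assumes the venv is already activated (step 1 above).
set -e
cmds=(
  "pip uninstall -y torch torchvision"
  "pip uninstall -y torchaudio"
  "pip uninstall -y xformers"
  "pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu116"
  "pip install -U -I --no-deps https://github.com/C43H66N12O12S2/stable-diffusion-webui/releases/download/torch13/xformers-0.0.14.dev0-cp310-cp310-win_amd64.whl"
)
for c in "${cmds[@]}"; do
  echo "+ $c"                                  # show each command first
  if [ "${RUN:-0}" = "1" ]; then eval "$c"; fi # only run when RUN=1
done
```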
The hashes below are the specific commits used in the video, but you do not have to use them. You can install the newest versions of both the DreamBooth extension and Automatic1111 and just downgrade CUDA with the commands above.
Automatic 1111 commit : dc8d1f4f8beb546089abd107db3432e03339c9c0
Dreambooth commit : c544ee11aee0085a7fbb7fdda65898dea2145f0c
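If you do want to pin these exact commits, a small sketch follows. The pin() helper is hypothetical and only prints the checkout commands (drop the echo to run them), and the repository paths, including the sd_dreambooth_extension folder name, are assumptions.

```shell
# Hypothetical helper: prints (dry run) the checkout command that would pin a commit.
pin() {
  echo "git -C $1 checkout $2"
}
pin stable-diffusion-webui dc8d1f4f8beb546089abd107db3432e03339c9c0
pin stable-diffusion-webui/extensions/sd_dreambooth_extension c544ee11aee0085a7fbb7fdda65898dea2145f0c
```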
Watch this video to learn how to use FileWords:
#xformers
OUTLINE
00:00:00 Introduction: how to downgrade the CUDA version
00:01:46 Automatic1111 will ask you to upgrade CUDA. Don't yet.
00:02:03 How to downgrade your CUDA version in your Automatic1111 installation folder
00:04:30 How to install DreamBooth extension
00:05:07 How to install and use dev branch of DreamBooth extension
00:06:42 How to stash local changes to checkout different git branch
00:07:13 How to start LoRA training for 8 GB VRAM GPUs
00:08:22 Settings and setup for LoRA training
00:13:36 How to generate ckpt file from LoRA training checkpoint
00:00:00 Greetings everyone. This will be a short video explaining how to use the CUDA 11.6 version after
00:00:07 the latest Automatic1111 update, to be able to train correctly with either DreamBooth or
00:00:15 Textual Inversion. Moreover, I will show how to use the dev branch of the DreamBooth extension to
00:00:20 be able to use LoRA if you have a GPU with 8 GB of VRAM. If you are interested in learning more,
00:00:26 I have several very detailed videos. This is the playlist of my Stable Diffusion related videos
00:00:32 on my channel. If you want to learn more, I suggest you watch them in
00:00:39 this order: first Zero to Hero Stable Diffusion, then How to Do Stable Diffusion Textual Inversion,
00:00:45 then How to Inject Your Training Subject, then DreamBooth Got Buffed, 22 January Update. These
00:00:52 will teach you a lot about Stable Diffusion, and finally you can watch my
00:00:58 older How to Do Stable Diffusion LoRA Training video. But that one is not very up to date at the
00:01:03 moment, and hopefully I will make a much more updated one. So, Automatic1111 recently updated its Torch
00:01:11 and xformers versions to the latest, or at least more up-to-date, ones. You see the Torch version is
00:01:17 now 1.13 and the CUDA version is 11.7. However, this is currently not well supported by DreamBooth
00:01:25 or Textual Inversion training. How do I know? There are several issue topics on the Automatic1111
00:01:31 GitHub, and you see: don't use Torch 1.13, it breaks the functionality; or for CUDA, use
00:01:39 11.6. So in this video I will show you how to revert back to an older version of CUDA after
00:01:45 you have upgraded. By the way, it will ask you to upgrade your Torch version with this command line
00:01:52 argument. If you have already updated, watch this video to learn how to downgrade. Or if
00:01:58 you are doing a fresh installation, watch this video to learn how to downgrade.
00:02:03 So for downgrading our Torch and CUDA version, we enter our installation folder,
00:02:10 as you can see, stable-diffusion-webui-master, then enter the venv folder, and inside that,
00:02:16 enter the Scripts folder. Let me show you by zooming in. This is the path you
00:02:21 need to follow: first the installation folder, inside it the venv folder, and inside that the Scripts
00:02:27 folder. Then, in here, type CMD. It will open a CMD window at that path, as you can see right now.
00:02:36 Then, in the following order, we are going to execute each of these commands in
00:02:41 here. I will put all of these commands into the description of the video, so don't worry about
00:02:46 that. I am just copying and pasting them like this and hitting enter, one by one.
00:02:52 It will ask you to proceed; press the y key and hit enter. By the way,
00:03:00 I got an error because both of my CMD windows running Stable Diffusion were open. Make sure
00:03:05 that you have closed them first. Once you have closed them, you will see "successfully
00:03:12 uninstalled" for torch and torchvision. Then execute the next command like this, and it's done. Then execute
00:03:21 the next one like this. OK, it is asking again; press the y key and hit enter. Then we are going
00:03:28 to execute this command. This will take some time because it downloads the CUDA 11.6 build.
00:03:36 If you get a warning like this, it is fine; just ignore it. OK, at the end you will get a
00:03:42 message like this. Ignore the error message and focus on this one: you see "successfully
00:03:48 installed" Torch 1.13 with CUDA 11.6. Then the next command is the xformers installation.
00:03:56 Just copy and paste it here and hit enter. OK, let me copy and paste again. OK, now it is installing
00:04:06 that one as well. OK, now we are all ready. We can start our application as usual.
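Before starting the UI you can also confirm the downgrade took effect from the same CMD window. A small sketch, assuming python resolves to the venv interpreter; the python -c line is the real check, and the fallback only keeps the script from crashing where torch is not importable:

```shell
# Report the CUDA build torch was compiled against; we want 11.6 after the downgrade.
target="11.6"
cuda=$(python -c "import torch; print(torch.version.cuda)" 2>/dev/null || echo "unavailable")
if [ "$cuda" = "$target" ]; then
  echo "torch reports CUDA $cuda - downgrade OK"
else
  echo "torch reports CUDA: $cuda (want $target)"
fi
```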
00:04:13 I'm starting with xformers enabled, and I am using the latest commit of Automatic
00:04:19 1111. Let me show you which one it is: it was updated 12 minutes ago and its hash is this.
00:04:27 I will put this into the description of the video as well. OK, now we have started
00:04:32 with the newest installation. Let's go to the Extensions tab and click Available, then Load from. In
00:04:41 here, let's install the DreamBooth extension, like this. You see, while installing, I can see
00:04:49 it checking the DreamBooth requirements, and it shows me what is installed: the Torch version
00:04:54 is 1.13 with CUDA 11.6, and torchvision is this one. So we are currently on the correct versions, and after
00:05:03 the installation is completed we need to restart the CMD window. But before starting again, I
00:05:09 will show you how to move to the developer branch of the DreamBooth extension. From here, go to the
00:05:16 installer and click here. It will open the GitHub repository of the extension, and here you are
00:05:21 currently seeing the main branch. By default, it installs the main branch. However, there is also a
00:05:27 development branch, which is the most up-to-date branch. Actually, I think he just merged it into
00:05:34 main, but I will still show you the developer branch because you may need it in the future. Yes,
00:05:43 he just updated it while I was recording. So how are we going to switch to the development
00:05:49 branch? We are going to enter our extensions folder. By the way, to do a fresh installation,
00:05:55 just delete this folder; you can then freshly install your extension. Enter inside here,
00:06:02 and here we run CMD. By the way, for git commands to work you need to have installed
00:06:08 Git Bash or another git client. For example, if you type Git Bash, you can see its
00:06:16 link here; you can download it and install it. Then the git commands will work, and then we
00:06:23 will pull the development branch: git pull origin dev. It will pull the development branch. For me,
00:06:32 it says it's already up to date. Then you need to do git checkout dev. OK, now we are in the
00:06:41 development branch. This is how you check out. If you encounter an error, you can just do git
00:06:47 stash and it will stash the local changes. Then you will be able to check out the development
00:06:53 branch. Now I will check out main again so we can use it, and after doing that,
00:07:00 you see it tells me: switched to branch main. I will just restart the CMD window. Which one?
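The stash-and-checkout flow just described can be sketched as a runnable demo. To keep it from touching anything real, it builds a throwaway git repo in a temp folder; in the actual extension folder the relevant commands are simply git stash, git checkout dev, and git pull origin dev.

```shell
# Demo of why `git stash` is needed before `git checkout dev`, in a throwaway repo.
set -e
tmp=$(mktemp -d); cd "$tmp"
git init -q
git config user.email demo@example.com
git config user.name demo
echo v1 > file.txt; git add .; git commit -qm init
git branch -m main
git checkout -qb dev
echo v2 > file.txt; git commit -qam dev-edit        # dev diverges from main
git checkout -q main
echo local-edit > file.txt                           # an uncommitted local change
git checkout -q dev 2>/dev/null || echo "checkout blocked by local changes"
git stash -q                                         # stash the local change...
git checkout -q dev                                  # ...and now the switch succeeds
echo "on branch: $(git rev-parse --abbrev-ref HEAD)"
```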
00:07:09 Oh, we still didn't start CMD yet. OK, sorry about that. Let's just click start, and now I
00:07:14 will show you how to do LoRA training and generate a ckpt from a saved checkpoint. OK,
00:07:22 we have finally started with the correct torch, torchvision, and xformers. For now, we will use this CUDA
00:07:29 version. However, I am pretty sure that the developer will fix the problem with CUDA 11.7
00:07:35 in the future; then you won't need to downgrade your CUDA version. Let's
00:07:42 refresh our Automatic1111 web UI and go to the Dreambooth tab. Now, for LoRA to appear,
00:07:50 we first need to pick LoRA, and then the LoRA drop-downs will appear. Of course, we will first generate a
00:07:57 LoRA model as a test, and you will see a new experimental option: unfreeze model. Currently
00:08:05 I am working on figuring out the best settings for LoRA training. However, it is taking time. I am
00:08:11 making this video to show you the latest changes, and when I have more information on training a better
00:08:19 LoRA model, hopefully I will make another video. So I will use just the default settings for
00:08:25 now and just create the model. However, you can still play with the unfreeze model option. You see,
00:08:33 it says that it unfreezes model layers and allows for potentially better training, but makes increased
00:08:38 VRAM usage more likely. Okay, once the model is generated, you will see this model selected
00:08:45 here. We still didn't start the training, therefore it is not appearing here. Then in
00:08:51 here, I think they fixed this: generate classification images using text2image.
00:08:57 Let's also try that. Let's say 500 epochs, zero, and let's save a model preview and model weights
00:09:06 every five epochs. You see that these are the default learning rates. Actually, these are
00:09:15 not very optimal right now; when I figure out the optimal ones, hopefully I will make another video.
00:09:20 Let's type our usual sanity prompt: photo of ohwx man by Tomer Hanuka. If you watch my more detailed
00:09:29 videos, you will learn more about why we use a sanity sample prompt and the other settings.
00:09:36 Okay, here I am selecting FP16, because FP16 has better precision than BF16. Actually,
00:09:44 I had this wrong in my previous videos: FP16 is supposed to have better
00:09:50 precision and better performance. If you check cache latents, it will use more VRAM. So
00:09:56 if you have an 8 GB VRAM GPU, you may not want to check this, but I suggest you first
00:10:04 try it; if you get an out-of-memory error, then uncheck it. And we are going to train the UNET.
00:10:10 I think without cache latents it uses about 7 GB of VRAM, so you can still train the UNET with this.
00:10:18 And there is also another experimental option: freeze CLIP normalization layers.
00:10:24 It keeps the normalization layers of CLIP frozen during training. Advanced usage; it may increase
00:10:29 model performance and editability. However, again, this is very experimental and I am yet to figure
00:10:35 out the best working settings. I have been working on them for over two days and still have not
00:10:42 figured out the best settings. And, as usual, let's set up our training directory and other
00:10:48 things. So I am going to use this training data set. It is 9 images, all with
00:10:53 different backgrounds and different clothes. Okay, for classification let's say "example", okay,
00:11:01 instance token, and FileWords. In my previous video I explained how to use FileWords; I am not
00:11:08 going to repeat it here. Let's just say ohwx man, photo of man, and photo of ohwx man. Okay,
00:11:19 these are the classic settings. And let's say 100 class images per instance image. Okay, and in Saving,
00:11:28 you can generate a ckpt when training completes. But 500 epochs is very likely to overtrain.
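The epoch arithmetic in this setup is simply one epoch = one pass over all instance images, so total steps = epochs x images. A quick check of the numbers used here (9 images, 500 epochs), plus the reverse direction used later in the video (187 steps over 9 images):

```shell
# epochs x images = total training steps; steps / images = epochs
images=9
epochs=500
echo "total steps: $((epochs * images))"     # 4500
echo "epochs at step 187: $((187 / images))" # integer division: 20
```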
00:11:35 Actually, it becomes overtrained very quickly with the default settings. LoRA rank: this is another
00:11:42 new option, and as you increase the LoRA rank, it is supposed to hold more data. But I tested that,
00:11:48 and when I increased it to the maximum, the results were much worse than the default of four. So
00:11:57 I am still yet to figure out the best settings, and hopefully when I do I will make
00:12:01 another video. Generate LoRA weights when saving during training: this way it will generate
00:12:07 checkpoints for us, and later we will be able to pick a checkpoint and generate a ckpt from it.
00:12:12 Okay, click save settings and click train. Let's see if it will generate the classification images.
00:12:26 Okay, here it is showing the number of steps correctly, so I am thinking that it
00:12:34 will generate them now. Yes, it generated them. This is why there is a text2img tab: so you can customize the image
00:12:44 generation from here. Alternatively, you can batch generate from here and give the folder path. I
00:12:51 will show you my previous training results, because I did a training just before starting this video.
00:12:57 Okay, this is from my previous training with LoRA, with the same settings as I just showed,
00:13:03 and you see, I lost stylization even after 187 steps, which, divided by 9,
00:13:13 is just over 20 epochs. As you see, the results are not very good, and it takes a lot of
00:13:21 tries to get stylized results. I have used this specific checkpoint, 1356, to generate a ckpt and
00:13:34 generate images from it. So how do we generate a ckpt? Go to the DreamBooth tab and select the
00:13:41 model here. Okay, then make sure that you have selected use LoRA, and then you will see the generated
00:13:49 checkpoints. In here, select the checkpoint from which you want to generate a ckpt, select the
00:13:57 LoRA model checkpoint, then make sure that you first click load settings and then click save
00:14:06 settings; otherwise it does not work. Actually, the last time I tried, it was not working.
00:14:12 Once you see "config saved", click generate ckpt, and here you will see the messages.
00:14:21 Okay, you see it has loaded the test one 1356 checkpoint first. However, it generated a ckpt file
00:14:31 name with the latest step of the training. This is incorrect; I have reported it to
00:14:37 the developer and I am hoping that it will get fixed soon. After a ckpt is generated,
00:14:42 you can just click refresh and you can now start using your LoRA-trained model.
00:14:50 And now let me show you the results I got from my previous tries. I have used these
00:14:57 as the prompts: this is the positive prompt and this is the negative prompt. Now let me
00:15:03 show you the output. So you see, the outputs are all my face, the subject I taught it, but
00:15:10 the stylization is very poor and the quality is also poor. I think it is already pretty overtrained,
00:15:20 so we need better settings. It certainly learns your subject, your face; however, it loses its
00:15:28 ability to stylize your face, unlike DreamBooth, because the last video I made for
00:15:35 DreamBooth was extremely successful, which you can actually watch in this video. So LoRA
00:15:44 is currently very inferior to DreamBooth with the default settings, but with these
00:15:50 new experimental settings, I am hoping that it will become much better once we figure
00:15:57 out the optimal settings. Still, you can stylize it, but it is much harder than with DreamBooth. You
00:16:03 need to generate a lot of images and test different CFG values and perhaps different checkpoints.
00:16:13 I hope you have enjoyed it. Please subscribe, like, comment, and share, and hopefully I will let
00:16:19 you know the news. If you also support us on Patreon, I would appreciate it very much.
00:16:24 Currently we have 12 patrons, and I appreciate them very much for supporting us. They motivate me to
00:16:34 continue producing more quality content. You can also join our Discord channel from here; I
00:16:40 will also put the Discord link into the description. Hopefully see you in another video.
