
8 GB LoRA Training - Fix CUDA & xformers For DreamBooth and Textual Inversion in Automatic1111 SD UI



Updated tutorial: https://youtu.be/pom3nQejaTs - Our Discord: https://discord.gg/HbqgGaZVmr. In this video I show how to downgrade the CUDA and xformers versions for proper training, and how to do LoRA training on an 8 GB GPU. If I have been of assistance to you and you would like to show your support for my work, please consider becoming a patron on 🥰 https://www.patreon.com/SECourses

Playlist of Stable Diffusion Tutorials, #Automatic1111 and Google Colab Guides, DreamBooth, Textual Inversion / Embedding, #LoRA, AI Upscaling, Pix2Pix, Img2Img:

https://www.youtube.com/playlist?list=PL_pbwdIyffsmclLl0O144nQRnezKlNdx3

This CUDA downgrade will probably not be necessary after the extensions get updated. However, it is not certain when they will be updated. In the meantime, you can downgrade and use CUDA 11.6.


The commands you need to execute, in order, to downgrade CUDA:

https://gist.github.com/FurkanGozukara/e2db853d2016a4a9ae2cc32dc41d730a

Run CMD as administrator if you get an error.
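
All of the commands below are run from a CMD window opened inside the venv\Scripts folder of your Automatic1111 installation, as shown in the video. A minimal sketch, assuming a hypothetical install path of C:\stable-diffusion-webui (adjust it to your own folder):

cd C:\stable-diffusion-webui\venv\Scripts

The activate command in step 1 then activates that virtual environment, so the pip commands below change the Web UI's own Python packages rather than your system-wide Python.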

1:

activate

2:

pip uninstall torch torchvision

3:

pip uninstall torchaudio

4:

pip uninstall xformers

5:

pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu116

6:

pip install -U -I --no-deps https://github.com/C43H66N12O12S2/stable-diffusion-webui/releases/download/torch13/xformers-0.0.14.dev0-cp310-cp310-win_amd64.whl
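
After step 6 you can optionally verify that the downgrade worked. This check is not from the video, just a quick sanity test run in the same activated venv:

python -c "import torch; print(torch.__version__, torch.version.cuda)"

It should report a 1.13.x Torch build with CUDA 11.6; torch.version.cuda is the CUDA version PyTorch was compiled against.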

The hashes below are the specific commits used in the video, but you do not have to use them. You can install the newest versions of both DreamBooth and Automatic1111 and just downgrade CUDA with the commands above.

Automatic1111 commit: dc8d1f4f8beb546089abd107db3432e03339c9c0

DreamBooth commit: c544ee11aee0085a7fbb7fdda65898dea2145f0c
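
If you do want to reproduce the exact setup from the video, you can check out those commits with git. A sketch, assuming you run the first command in the Web UI folder and the second inside the DreamBooth extension's folder under extensions (folder names can differ between installs):

git checkout dc8d1f4f8beb546089abd107db3432e03339c9c0

git checkout c544ee11aee0085a7fbb7fdda65898dea2145f0c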

Watch this video to learn how to use FileWords:

https://youtu.be/KwxNcGhHuLY


OUTLINE

00:00:00 Introduction to How to downgrade CUDA version

00:01:46 Automatic1111 will ask you to upgrade CUDA. Don't yet.

00:02:03 How to downgrade your CUDA version in your Automatic1111 installation folder

00:04:30 How to install DreamBooth extension

00:05:07 How to install and use dev branch of DreamBooth extension

00:06:42 How to stash local changes to checkout different git branch

00:07:13 How to start LoRA training for 8 GB VRAM GPUs

00:08:22 Settings and setup for LoRA training

00:13:36 How to generate ckpt file from LoRA training checkpoint
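
The dev-branch steps at 00:05:07 and 00:06:42 come down to a few git commands, run from a CMD window inside the DreamBooth extension's folder under extensions (Git must be installed). A short sketch of the commands used in the video:

git pull origin dev

git checkout dev

git stash

git checkout main

git stash is only needed if local changes block the checkout, and git checkout main switches you back to the main branch afterwards.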

Some additional details on how transformers can be used with CUDA-enabled NVIDIA hardware:

Transfer learning: Transfer learning is a technique that can be used to leverage pre-trained transformer models, such as BERT or GPT-2, to improve the performance of NLP tasks with limited training data. NVIDIA's hardware and software can be used to fine-tune these pre-trained models on specific NLP tasks, allowing for faster convergence and higher accuracy.

Customization and optimization: The flexibility of transformers allows for a wide range of customization options and optimization techniques. NVIDIA's software libraries can be used to implement custom activation functions, weight initialization schemes, and other architectural modifications to improve model performance. In addition, CUDA enables developers to optimize the transformer models for specific hardware configurations, such as different numbers of GPUs, to achieve the best performance.

Real-time applications: Transformers can be used for real-time NLP applications, such as chatbots and speech recognition, which require low latency and high throughput. NVIDIA's hardware and software can be used to optimize transformer models for real-time applications by reducing inference time and increasing throughput.

Natural language generation: Transformers can be used for natural language generation (NLG) tasks, such as text summarization and language translation. NVIDIA's hardware and software can be used to optimize transformer models for NLG tasks, by improving the generation speed and quality of the output.

Deployment: NVIDIA's software libraries, such as TensorRT, can be used to optimize and deploy transformer models to various production environments, such as cloud-based services and edge devices. This allows for the efficient deployment of transformer models in a variety of real-world applications.

Overall, transformers and CUDA-enabled NVIDIA hardware provide a powerful combination for accelerating NLP tasks, including training and inference of transformer models, transfer learning, customization and optimization, real-time applications, natural language generation, and deployment to production environments.

Video Transcription

  • 00:00:00 Greetings everyone. This will be a short video  to explain how to use CUDA 11.6 version after  

  • 00:00:07 the latest Automatic1111 update to be able to do  training correctly by using either DreamBooth or  

  • 00:00:15 Textual Inversion. Moreover, I will show how  to use dev branch of DreamBooth extension to  

  • 00:00:20 be able to use LoRA if you have a GPU with 8 GB of VRAM. If you are interested in learning more,

  • 00:00:26 I have very detailed several videos. So this is  the playlist of my Stable Diffusion related videos  

  • 00:00:32 on my channel. If you are interested in more information, I suggest you watch in

  • 00:00:39 this order: first Zero to Hero Stable Diffusion.  Then how to do Stable Diffusion Textual Inversion.  

  • 00:00:45 Then how to inject your training subject, then  DreamBooth Got Buffed 22 January Update. This  

  • 00:00:52 will teach you a lot of information related to  Stable Diffusion and finally, you can watch my  

  • 00:00:58 older how to do Stable Diffusion LoRA training  video. But this is not very up to date at the  

  • 00:01:03 moment and hopefully I will make much more updated  one. So Automatic1111 recently updated its Torch  

  • 00:01:11 version and xformers version to latest ones or the  more updated ones. You see the Torch version is  

  • 00:01:17 now 1.13 and the CUDA version is 11.7. However, this is currently not very well supported by DreamBooth

  • 00:01:25 or Textual Inversion training. How do I know? There are several issue topics on the GitHub of

  • 00:01:31 Automatic1111 and you see, don't use Torch 13. It  is breaking the functionality. Or CUDA, use CUDA  

  • 00:01:39 11.6. So in this video I will show you how you  can revert back to older version of CUDA after  

  • 00:01:45 you have upgraded. By the way, it will ask you to  upgrade your Torch version with this command line  

  • 00:01:52 argument. If you have already updated, watch  this video to learn how to downgrade. Or if  

  • 00:01:58 you are doing a fresh installation, watch this video to learn how to do the downgrade.

  • 00:02:03 So for downgrading our Torch and CUDA version,  we are entering our installation folder,  

  • 00:02:10 as you can see, Stable Diffusion Web UI master  and enter inside the venv folder and inside here,  

  • 00:02:16 enter the scripts folder. Let me show you  with zooming in. This is the folder where  

  • 00:02:21 you need to enter first the installation folder  inside that venv folder and inside the scripts  

  • 00:02:27 folder. Then in here, type CMD. It will open CMD  window with that path, as you can see right now.  

  • 00:02:36 Then, with the following order, we are going  to execute each one of these commands inside  

  • 00:02:41 here. I will put all of these commands into the  description of the video. So don't worry about  

  • 00:02:46 that. I am just copying and pasting them  like this and hitting enter, one by one.  

  • 00:02:52 It will ask you to proceed; hit the y key and hit enter. By the way,

  • 00:03:00 I got an error because both of my CMD windows for running Stable Diffusion were open. Make sure

  • 00:03:05 that you have closed them first. Once you have closed your CMD windows, you will see that it successfully

  • 00:03:12 uninstalled Torch Vision. Then execute the next  command like this: and it's executed. Then execute  

  • 00:03:21 the next one like this: OK, it is asking. Hit the y key and hit enter. Then we are going

  • 00:03:28 to execute this command. This will take some  time because it will download the CUDA version.  

  • 00:03:36 If you get a warning like this, it is just fine.  Just ignore it. OK, in the end you will get a  

  • 00:03:42 message like this: Ignore this error message  and focus on this message. You see successfully  

  • 00:03:48 installed Torch 1.13 and CUDA 11.6. Then the  next command is the xformers installation.  

  • 00:03:56 Just copy and paste it in here and hit enter. OK,  let me copy paste again. OK, now it is installing  

  • 00:04:06 that one as well. OK, now we are all ready.  We can now start our application as usual.  

  • 00:04:13 I'm just starting with xformers and I am using the latest commit of Automatic

  • 00:04:19 1111. Let me show you which one it is. It was updated 12 minutes ago and its hash is this:

  • 00:04:27 I will put this into the description of the video  as well. OK, now we have started, like this,  

  • 00:04:32 with the newest installation and let's go to the  extensions and click available, load from. And in  

  • 00:04:41 here let's install the DreamBooth extension, like  this. You see, while installing, I am seeing now  

  • 00:04:49 the checking DreamBooth requirements and it is  showing me the installed things. Torch version  

  • 00:04:54 is 1.13, CUDA is 11.6 and Torch Vision is this one. So currently we are on the correct versions, and after

  • 00:05:03 installation is completed we need to restart  CMD window. But before starting again now, I  

  • 00:05:09 will show you how to move to the developer branch  of the DreamBooth extension. From here. Go to the  

  • 00:05:16 installer, click here. It will open the GitHub  repository of the extension And in here currently  

  • 00:05:21 you are seeing the main branch. By default, it is  installing the main branch. However, there is also  

  • 00:05:27 a development branch, which is the most up-to-date one. Actually, I think he just merged it with

  • 00:05:34 the main, but I will still show you the developer  branch because in future you may need it. Yes,  

  • 00:05:43 he just updated it while I started the video. So how are we going to switch to the development

  • 00:05:49 branch. We are going to enter our extensions  folder. By the way, to do a fresh installation,  

  • 00:05:55 just delete this folder and you can then fresh  install your extension and enter inside here  

  • 00:06:02 And in here we are running CMD. By the way, for  git commands to work you need to have installed  

  • 00:06:08 Git Bash or any git repository handler. For example, if you type Git Bash, you can see its

  • 00:06:16 link in here and you can download it and install  it. Then the git commands will work and then we  

  • 00:06:23 will pull the development branch. Git pull origin  dev. It will pull the development branch. For me.  

  • 00:06:32 It says it's already up to date. Then you need to  do git checkout dev. OK, now we are already in the  

  • 00:06:41 development branch. This is how you check out.  If you encounter error, you can just do git  

  • 00:06:47 stash and it will stash the local changes. Then  you will be able to check out the development  

  • 00:06:53 branch. Now I will check out main again, so we can use the main branch, and after doing that,

  • 00:07:00 you see it is telling me: switched to branch main. I will just restart the CMD window. Which one?

  • 00:07:09 Oh, we still didn't start the CMD yet. OK, sorry  about that. Let's just click the start and now I  

  • 00:07:14 will show you how to do LoRA training and  generate ckpt from saved Checkpoint. OK,  

  • 00:07:22 we have finally started with the correct Torch Vision and xformers. Currently, we will use this CUDA

  • 00:07:29 version. However, I am pretty sure that the  developer will fix the problem with CUDA 11.7  

  • 00:07:35 in the future. Then you won't need to downgrade your CUDA version. Let's refresh our stable,

  • 00:07:42 refresh our Automatic1111 web UI. Go to the  Dreambooth tab And now for LoRA to appear,  

  • 00:07:50 first we need to pick LoRA and now LoRA drop downs  will appear. Of course, we will first generate a  

  • 00:07:57 LoRA model as test one and you will see a new  experimental thing: unfreeze model. Currently  

  • 00:08:05 I am working on figuring out the best settings for LoRA training. However, it is taking time. I am

  • 00:08:11 making this video to show you the latest changes.  And when I get more information to train a better  

  • 00:08:19 LoRA model, hopefully I will make another video. So I will just use the default settings for

  • 00:08:25 now and just create the model. However, you can still play with the unfreeze model option. You see,

  • 00:08:33 it says that it unfreezes model layers and allows for potentially better training, but makes increased

  • 00:08:38 VRAM usage more likely. Okay, once the model is  generated, you will see this model is selected  

  • 00:08:45 here, and we still didn't start the training, therefore it is not appearing here. Then in

  • 00:08:51 here, I think they fixed this: generate classification images using text2image.

  • 00:08:57 Let's also try that. Let's say 500 epochs, zero; let's save a model preview and model weights

  • 00:09:06 every five epochs. And you see that these are the default learning rates. Actually, these are

  • 00:09:15 not very optimal right now. When I figure out the  optimal ones, hopefully I will make another video.  

  • 00:09:20 Let's type our usual sanity prompt: photo of ohwx man by Tomer Hanuka. If you watch my more detailed

  • 00:09:29 videos, you will learn more about why we are using a sanity sample prompt and other settings.

  • 00:09:36 Okay, in here I am now selecting FP16 because FP16 has better precision than BF16. Actually,

  • 00:09:44 I had it wrong in my previous videos. So FP16 is supposed to have better

  • 00:09:50 precision and better performance. If you check  this cache latents, it will use more VRAM. So  

  • 00:09:56 if you have 8GB of VRAM GPU, then you may not  want to check this, but I suggest you to first  

  • 00:10:04 try it. If you get out of memory error, then  uncheck this and we are going to train UNET.  

  • 00:10:10 I think without this it is using about 7 GB of  VRAM. So you can still train UNET with this.  

  • 00:10:18 And there is also another experimental thing  which is freeze clip normalization layers.  

  • 00:10:24 Keep the normalization layers of clip frozen  during training. Advanced usage, may increase  

  • 00:10:29 model performance and editability. However, again,  this is very experimental and I am yet to figure  

  • 00:10:35 out the best working settings. I have been working on them for over two days and still I have not

  • 00:10:42 figured out the best settings. And, as usual,  let's set up our training directory and other  

  • 00:10:48 things. So I am going to use this training dataset. It is 9 images, all with

  • 00:10:53 different backgrounds and different clothes. Okay, for classification let's say example; okay,

  • 00:11:01 instance token. And FileWords. So in my previous  video I explained how to use FileWords. I am not  

  • 00:11:08 going to repeat it here. Let's just say, ohwx  man and photo of man and photo of ohwx man. Okay,  

  • 00:11:19 these are the classical things. And let's say 100  images per instance image. Okay, and in saving  

  • 00:11:28 you can generate a ckpt when training completes. But 500 epochs is very likely to overtrain.

  • 00:11:35 Actually it becomes overtrained too quickly with the default settings. LoRA rank is also another

  • 00:11:42 new thing. And as you increase LoRA rank, it is  supposed to hold more data. But I tested that  

  • 00:11:48 and when I increased it to the maximum, the results were much worse than with the default of 4. So still,

  • 00:11:57 I am yet to figure out the best settings  and hopefully when I figure out I will make  

  • 00:12:01 another video. Generate LoRA weights when saving during training: this way it will generate

  • 00:12:07 checkpoints for us. Then later we will be able to  pick the checkpoint and generate a ckpt from that.  

  • 00:12:12 Okay, click save settings and click train. Let's  see if it will generate the classification images.  

  • 00:12:26 [Inaudible] Okay, in here it is showing correctly  the number of steps. So I am thinking that it  

  • 00:12:34 will generate now. Yes, it generated. There is a text2img tab here, so you can customize the image

  • 00:12:44 generation from here. Alternatively, you can batch  generate from here and give the folder path. I  

  • 00:12:51 will show you my previous training results because  I just did a training before starting this video.  

  • 00:12:57 Okay, this is from my previous training with LoRA with the same settings as I just showed,

  • 00:13:03 and you see, I lost the stylizing even after 187 steps, and when you divide that by 9,

  • 00:13:13 it was just over 20 epochs and, as you see, the  results are not very good and it takes a lot of  

  • 00:13:21 tries to get your results stylized. I have used this specific checkpoint, 1356, to generate a ckpt and

  • 00:13:34 generate images from that. So how do we generate  ckpt? Go to the DreamBooth and in here select the  

  • 00:13:41 model. Okay, then make sure that you have selected use LoRA, and then you will see the generated

  • 00:13:49 checkpoints. And in here, select the checkpoint that you want to generate a ckpt from, select the

  • 00:13:57 LoRA model checkpoint, then make sure that you first click load settings, then click save

  • 00:14:06 settings, otherwise it does not work. Actually, the last time I tried it, it was not working.

  • 00:14:12 Once you see config saved, click generate  ckpt and in here you will see the messages:  

  • 00:14:21 Okay, you see it has loaded first the test one 1356 checkpoint. However, it generated a ckpt file

  • 00:14:31 name with the latest step of the training.  This is incorrect. I have reported this to  

  • 00:14:37 the developer and I am hoping that it will  get fixed soon. After a ckpt is generated,  

  • 00:14:42 you can just click refresh and you can  now start using your LoRA trained model.  

  • 00:14:50 And now let me show you the results I got from my previous tries. I have used these

  • 00:14:57 as the prompts. This is the positive prompt and this is the negative prompt. And now let me

  • 00:15:03 show you the output. So you see, the outputs are all my face, the subject I taught, but

  • 00:15:10 the stylizing is very poor and the quality is also  poor. I think it is already pretty overtrained and  

  • 00:15:20 so we need better settings. It certainly learns  your subject, your face. However, it loses its  

  • 00:15:28 ability to stylize your face as in DreamBooth, because the last video I made for

  • 00:15:35 DreamBooth was extremely successful, which you can watch in this video actually. So the LoRA

  • 00:15:44 is currently very inferior to DreamBooth with the default settings, but

  • 00:15:50 with these new experimental settings, I am hoping  that it will become much better once we figure  

  • 00:15:57 out the optimal settings. Still, you can stylize  it, but it is much harder than DreamBooth. You  

  • 00:16:03 need to generate a lot of images and you need  to test different cfg and perhaps checkpoints.  

  • 00:16:13 I hope you have enjoyed it. Please subscribe,  like, comment, share and hopefully I will let  

  • 00:16:19 you know the news. If you also support us  on patreon, I would appreciate it very much.  

  • 00:16:24 Currently we have 12 patrons and I appreciate them very much for supporting us. They are motivating me to

  • 00:16:34 continue producing more quality content. You can also join our Discord channel from here and I

  • 00:16:40 will also put the Discord channel link into the description. Hopefully see you in another video.
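
As a quick recap of the arithmetic mentioned in the video: with 9 instance images and a batch size of 1, one epoch is 9 training steps, so the 187 steps mentioned above are 187 / 9 ≈ 20.8 epochs, and 100 class images per instance image means 9 × 100 = 900 classification images get generated.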
