Running Stable Diffusion XL on Cloud GPUs: A Complete Guide
AI Art · 2026-03-08 · 12 min read


Why Use Cloud GPUs for Stable Diffusion?

Stable Diffusion XL needs at least 8 GB VRAM for basic generation and 12+ GB for comfortable batch workflows. Instead of buying an expensive GPU, rent one by the hour.

Choosing the Right GPU Tier

Tier     | GPU      | VRAM  | SDXL performance | Price
---------|----------|-------|------------------|---------
Starter  | RTX 3060 | 12 GB | ~15 sec/image    | $0.40/hr
Standard | RTX 3090 | 24 GB | ~8 sec/image     | $0.60/hr
Pro      | RTX 4090 | 24 GB | ~4 sec/image     | $0.90/hr
Power    | A6000    | 48 GB | ~6 sec/image     | $1.20/hr

For most AI art workflows, the Standard tier (RTX 3090) is the sweet spot — fast enough for real-time iteration at a great price.
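To sanity-check the table's economics, cost per image is just (hourly rate ÷ 3600) × seconds per image. A quick sketch using the table's own numbers:

```python
# Cost per image = (hourly rate / 3600 s) * seconds per image.
# Rates and timings taken from the tier table above.
def cost_per_image(rate_per_hour, sec_per_image):
    return rate_per_hour / 3600 * sec_per_image

tiers = {
    "RTX 3060": (0.40, 15),
    "RTX 3090": (0.60, 8),
    "RTX 4090": (0.90, 4),
    "A6000":    (1.20, 6),
}
for gpu, (rate, sec) in tiers.items():
    print(f"{gpu}: ${cost_per_image(rate, sec):.4f}/image")
```

By this measure the Pro tier is actually the cheapest per image; the Standard tier wins on hourly rate, which is what matters for long interactive sessions.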

Setting Up ComfyUI

After deploying your machine and connecting via RDP:

# Clone ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI

# Install dependencies
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt

# Download SDXL model
cd models/checkpoints
wget https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors

# Start ComfyUI
cd ../..
python main.py

Open a browser on the remote machine to http://localhost:8188 and you're ready to generate. (If you'd rather use your local browser, start ComfyUI with "python main.py --listen" and connect to port 8188 on the machine's IP instead.)
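If ComfyUI starts but generation fails with CUDA errors, the usual culprit is a torch install that can't see the GPU. A quick diagnostic, run in the same environment where you installed torch:

```python
def gpu_status():
    """Return a one-line status of the PyTorch/CUDA setup."""
    try:
        import torch
    except ImportError:
        return "torch is not installed in this environment"
    if torch.cuda.is_available():
        # On the cloud machine this should name your rented GPU.
        return f"CUDA OK: {torch.cuda.get_device_name(0)}"
    return "torch is installed, but no CUDA device is visible"

print(gpu_status())
```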

Workflow Tips

  • Use the SDXL refiner for higher quality outputs — the two-pass workflow gives sharper details
  • Batch generate — queue 20-50 images at different prompts and cherry-pick the best
  • ControlNet works great on 24 GB VRAM — load OpenPose, Canny, and Depth simultaneously
  • Save your models to a persistent directory so you don't re-download on restart
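The batch-generation tip above can be scripted against ComfyUI's HTTP API, which accepts a workflow JSON POSTed to /prompt on port 8188. The node id "6" and file name below are assumptions — export your own workflow with "Save (API Format)" and check the id of its CLIPTextEncode node:

```python
import json

def build_batch_payloads(workflow, prompts, text_node_id="6"):
    """Build one POST body per prompt for ComfyUI's /prompt endpoint.

    text_node_id is the id of the positive-prompt CLIPTextEncode node
    in your exported workflow -- "6" is only a placeholder.
    """
    payloads = []
    for text in prompts:
        wf = json.loads(json.dumps(workflow))  # cheap deep copy
        wf[text_node_id]["inputs"]["text"] = text
        payloads.append(json.dumps({"prompt": wf}))
    return payloads

# Each payload can then be POSTed to http://localhost:8188/prompt
# with Content-Type: application/json.
```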
LoRA Training on TurboGPU

The Power tier (A6000, 48 GB VRAM) is perfect for training custom LoRA models:

# Install kohya_ss training toolkit
git clone https://github.com/kohya-ss/sd-scripts.git
cd sd-scripts
pip install -r requirements.txt

# Train an SDXL LoRA with 20-30 images in ~20 minutes
accelerate launch sdxl_train_network.py \
  --pretrained_model_name_or_path="sd_xl_base_1.0.safetensors" \
  --train_data_dir="./training_images" \
  --output_dir="./output" \
  --network_module=networks.lora \
  --max_train_epochs=10
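One detail the command above glosses over: kohya's scripts read images from subfolders of --train_data_dir named "&lt;repeats&gt;_&lt;concept&gt;". A minimal sketch, where the concept name "mystyle" and the 10-repeat count are placeholders for your own:

```python
import os

# kohya's sd-scripts expect subfolders of --train_data_dir named
# "<repeats>_<concept>"; "10_mystyle" here is a placeholder.
base = "training_images"
concept_dir = os.path.join(base, "10_mystyle")
os.makedirs(concept_dir, exist_ok=True)
# Copy your 20-30 images (and optional matching .txt caption files)
# into training_images/10_mystyle/ before launching the trainer.
print(concept_dir)
```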

Cost for a Typical Session

A 2-hour AI art session on the Standard tier costs $1.20 (2 × $0.60/hr). You can generate hundreds of images in that time. Compare that to buying an RTX 3090 outright for $800+.


Ready to Try TurboGPU?

Deploy a cloud GPU in under 60 seconds. No commitments.
