Running Stable Diffusion XL on Cloud GPUs: A Complete Guide
AI Art · 2026-03-08 · 12 min read


Why Use Cloud GPUs for Stable Diffusion?

Stable Diffusion XL needs at least 8 GB VRAM for basic generation and 12+ GB for comfortable batch workflows. Instead of buying an expensive GPU, rent one by the hour.

Choosing the Right GPU Tier

Tier     | GPU      | VRAM  | SDXL performance | Price
---------|----------|-------|------------------|---------
Starter  | RTX 3060 | 12 GB | ~15 sec/image    | $0.40/hr
Standard | RTX 3090 | 24 GB | ~8 sec/image     | $0.60/hr
Pro      | RTX 4090 | 24 GB | ~4 sec/image     | $0.90/hr
Power    | A6000    | 48 GB | ~6 sec/image     | $1.20/hr

For most AI art workflows, the Standard tier (RTX 3090) is the sweet spot — fast enough for real-time iteration at a great price.
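To sanity-check the table's economics, cost per image is just (hourly rate ÷ 3600) × seconds per image. A quick sketch using the table's own numbers:

```python
# Cost per image = (hourly rate / 3600 s) * seconds per image.
# Rates and timings taken from the tier table above.
def cost_per_image(rate_per_hour, sec_per_image):
    return rate_per_hour / 3600 * sec_per_image

tiers = {
    "RTX 3060": (0.40, 15),
    "RTX 3090": (0.60, 8),
    "RTX 4090": (0.90, 4),
    "A6000":    (1.20, 6),
}
for gpu, (rate, sec) in tiers.items():
    print(f"{gpu}: ${cost_per_image(rate, sec):.4f}/image")
```

By this measure the Pro tier is actually the cheapest per image; the Standard tier wins on hourly rate, which is what matters for long interactive sessions.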

Setting Up ComfyUI

After deploying your machine and connecting via RDP:

# Clone ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI

# Install dependencies
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt

# Download SDXL model
cd models/checkpoints
wget https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors

# Start ComfyUI
cd ../..
python main.py

Open a browser on the remote machine to http://localhost:8188 and you're ready to generate. (If you'd rather use your local browser, start ComfyUI with "python main.py --listen" and connect to port 8188 on the machine's IP instead.)
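If ComfyUI starts but generation fails with CUDA errors, the usual culprit is a torch install that can't see the GPU. A quick diagnostic, run in the same environment where you installed torch:

```python
def gpu_status():
    """Return a one-line status of the PyTorch/CUDA setup."""
    try:
        import torch
    except ImportError:
        return "torch is not installed in this environment"
    if torch.cuda.is_available():
        # On the cloud machine this should name your rented GPU.
        return f"CUDA OK: {torch.cuda.get_device_name(0)}"
    return "torch is installed, but no CUDA device is visible"

print(gpu_status())
```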

Workflow Tips

  • Use the SDXL refiner for higher quality outputs — the two-pass workflow gives sharper details
  • Batch generate — queue 20-50 images at different prompts and cherry-pick the best
  • ControlNet works great on 24 GB VRAM — load OpenPose, Canny, and Depth simultaneously
  • Save your models to a persistent directory so you don't re-download on restart
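The batch-generation tip above can be scripted against ComfyUI's HTTP API, which accepts a workflow JSON POSTed to /prompt on port 8188. The node id "6" and file name below are assumptions — export your own workflow with "Save (API Format)" and check the id of its CLIPTextEncode node:

```python
import json

def build_batch_payloads(workflow, prompts, text_node_id="6"):
    """Build one POST body per prompt for ComfyUI's /prompt endpoint.

    text_node_id is the id of the positive-prompt CLIPTextEncode node
    in your exported workflow -- "6" is only a placeholder.
    """
    payloads = []
    for text in prompts:
        wf = json.loads(json.dumps(workflow))  # cheap deep copy
        wf[text_node_id]["inputs"]["text"] = text
        payloads.append(json.dumps({"prompt": wf}))
    return payloads

# Each payload can then be POSTed to http://localhost:8188/prompt
# with Content-Type: application/json.
```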
LoRA Training on TurboGPU

The Power tier (A6000, 48 GB VRAM) is perfect for training custom LoRA models:

# Install kohya_ss training toolkit
git clone https://github.com/kohya-ss/sd-scripts.git
cd sd-scripts
pip install -r requirements.txt

# Train an SDXL LoRA with 20-30 images in ~20 minutes
accelerate launch sdxl_train_network.py \
  --pretrained_model_name_or_path="sd_xl_base_1.0.safetensors" \
  --train_data_dir="./training_images" \
  --output_dir="./output" \
  --network_module=networks.lora \
  --max_train_epochs=10
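One detail the command above glosses over: kohya's scripts read images from subfolders of --train_data_dir named "&lt;repeats&gt;_&lt;concept&gt;". A minimal sketch, where the concept name "mystyle" and the 10-repeat count are placeholders for your own:

```python
import os

# kohya's sd-scripts expect subfolders of --train_data_dir named
# "<repeats>_<concept>"; "10_mystyle" here is a placeholder.
base = "training_images"
concept_dir = os.path.join(base, "10_mystyle")
os.makedirs(concept_dir, exist_ok=True)
# Copy your 20-30 images (and optional matching .txt caption files)
# into training_images/10_mystyle/ before launching the trainer.
print(concept_dir)
```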

Cost for a Typical Session

A 2-hour AI art session on the Standard tier costs $1.20 (2 × $0.60/hr). You can generate hundreds of images in that time. Compare that to buying an RTX 3090 outright for $800+.


Ready to Try TurboGPU?

Deploy a cloud GPU in under 60 seconds. No commitments.
