## Why Fine-Tune?
Pre-trained models are powerful but generic. Fine-tuning on your own data creates a model that understands your domain — your writing style, your product catalog, your medical terminology.
## Choosing a GPU for Training
| Task | Min VRAM | Recommended Tier | Est. Time |
|---|---|---|---|
| LoRA on 7B model | 12 GB | Starter ($0.40/hr) | 30 min |
| LoRA on 13B model | 16 GB | Standard ($0.60/hr) | 1 hour |
| QLoRA on 70B model | 24 GB | Standard ($0.60/hr) | 3 hours |
| Full fine-tune 7B | 40 GB | Power ($1.20/hr) | 4 hours |
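The gap between the LoRA rows and the full fine-tune row comes down to how few parameters LoRA actually trains. A back-of-the-envelope sketch, using hypothetical Llama-like shapes (32 layers, hidden size 4096) and assuming square projection matrices — real models like Llama 3 use grouped-query attention, so `v_proj` is smaller in practice:

```python
def lora_trainable_params(n_layers, d_model, n_target_modules, r):
    """Trainable parameters added by LoRA: each adapted d×d weight matrix
    gets two low-rank factors A (r×d) and B (d×r), i.e. r*(d + d) new
    parameters per module. The base model's weights stay frozen."""
    return n_layers * n_target_modules * r * (d_model + d_model)

# Hypothetical shapes: 32 layers, hidden size 4096, adapting
# q_proj and v_proj with rank r=16
n = lora_trainable_params(n_layers=32, d_model=4096, n_target_modules=2, r=16)
print(f"{n:,} trainable params")  # 8,388,608 — roughly 0.1% of a 7B base model
```

Training ~0.1% of the weights is why the optimizer state and gradients fit in a fraction of the VRAM a full fine-tune needs.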
## Step 1: Deploy a TurboGPU Instance
Choose the Power tier (an A6000 with 48 GB of VRAM) for maximum flexibility. Connect via SSH:

```bash
ssh -p <port> user@<your-ip>
```
## Step 2: Set Up the Environment
```bash
# Create a virtual environment
python -m venv llm-train
source llm-train/bin/activate

# Install dependencies (trl provides the SFTTrainer used below)
pip install torch transformers datasets peft accelerate bitsandbytes trl
```
## Step 3: Prepare Your Dataset
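The loader below expects JSON Lines: one JSON object per line. A minimal sketch of producing such a file from your own records (the examples here are made up):

```python
import json

# Hypothetical records in the instruction/input/output schema used below
records = [
    {"instruction": "Classify the sentiment.", "input": "The GPU arrived on time.", "output": "positive"},
    {"instruction": "Classify the sentiment.", "input": "Training crashed twice.", "output": "negative"},
]

# JSON Lines format: serialize each record onto its own line
with open("training_data.jsonl", "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")
```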
```python
from datasets import load_dataset

# Load your custom dataset (JSON Lines format)
# Each record: {"instruction": "...", "input": "...", "output": "..."}
dataset = load_dataset("json", data_files="training_data.jsonl")
```

## Step 4: LoRA Training
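The trainer needs each record rendered into a single training text. One common approach is a small formatting function that joins the three fields with a prompt template — the template below is an assumption for illustration, not a Llama-specific format, and the exact hook for passing it in (e.g. a `formatting_func` argument) varies across `trl` versions:

```python
def format_example(example):
    """Render one instruction/input/output record as a single prompt string.

    The '### Instruction/Input/Response' template is a common convention,
    not a requirement -- any consistent template works, as long as you use
    the same one at inference time.
    """
    if example["input"]:
        return (
            f"### Instruction:\n{example['instruction']}\n\n"
            f"### Input:\n{example['input']}\n\n"
            f"### Response:\n{example['output']}"
        )
    # Records with an empty "input" field skip that section entirely
    return (
        f"### Instruction:\n{example['instruction']}\n\n"
        f"### Response:\n{example['output']}"
    )
```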
```python
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from peft import LoraConfig, get_peft_model
from trl import SFTTrainer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    # QLoRA: quantize the frozen base model to 4-bit so it fits in 12 GB VRAM
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset["train"],
    args=TrainingArguments(
        output_dir="./output",
        num_train_epochs=3,
        per_device_train_batch_size=4,
        learning_rate=2e-4,
    ),
)
trainer.train()
trainer.save_model("./my-custom-model")
```

## Cost Breakdown
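Cloud training cost is simply the hourly rate times wall-clock hours. A sketch using the tier prices and estimated times from the table above — your actual run time will vary with dataset size and sequence length:

```python
def training_cost(hourly_rate_usd, hours):
    """Instance cost for a training run: rate x time, rounded to cents."""
    return round(hourly_rate_usd * hours, 2)

# LoRA on a 7B model: Starter tier at $0.40/hr for ~30 min
print(training_cost(0.40, 0.5))  # 0.2
# QLoRA on a 70B model: Standard tier at $0.60/hr for ~3 hours
print(training_cost(0.60, 3.0))  # 1.8
```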
Training a LoRA adapter on Llama 3 8B with 10K examples:
Compare that to fine-tuning via OpenAI's API: $25+ for a similar data volume.
