Self-Host Open-Source LLMs
Running your own LLM gives you full control, privacy, and no per-token costs. TurboGPU makes it easy — spin up a GPU, install Ollama, and start chatting.
VRAM Requirements
| Model | Parameters | Min VRAM | Recommended Tier |
|---|---|---|---|
| Llama 3 8B | 8B | 8 GB | Starter (RTX 3060) |
| Llama 3 70B (Q4) | 70B | 40 GB | Power (A6000) |
| Mixtral 8x7B | 46.7B | 24 GB | Standard (RTX 3090) |
| Code Llama 34B | 34B | 24 GB | Standard (RTX 3090) |
| Phi-3 Medium | 14B | 12 GB | Starter (RTX 3060) |
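The figures above follow a simple rule of thumb: weight memory is parameter count times bytes per parameter, plus headroom for activations and the KV cache. A minimal sketch of that estimate (the 20% overhead factor is an illustrative assumption, not a TurboGPU spec):

```python
def estimate_vram_gb(params_billion: float, bits_per_param: int = 16,
                     overhead: float = 1.2) -> float:
    """Rough VRAM estimate: weights at the given precision, plus margin.

    overhead=1.2 is an assumed ~20% allowance for activations and KV cache.
    """
    weight_gb = params_billion * bits_per_param / 8  # GB for weights alone
    return weight_gb * overhead

print(round(estimate_vram_gb(8), 1))                      # ~19.2 GB at FP16
print(round(estimate_vram_gb(70, bits_per_param=4), 1))   # ~42.0 GB at Q4
```

This is why the table lists 40 GB for the 70B model only at Q4: at 4 bits per weight the model shrinks to roughly a quarter of its FP16 footprint.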
Quick Start with Ollama
```bash
# SSH into your TurboGPU instance
ssh -p <port> user@<ip>

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull and run Llama 3
ollama run llama3

# Or run Mixtral for code generation
ollama run mixtral
```
That's it. You're running a local LLM with full GPU acceleration in under 2 minutes.
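Beyond the interactive prompt, Ollama also serves a local REST API on port 11434, so you can script against the model. A minimal sketch calling the `/api/generate` endpoint with only the Python standard library (the prompt and host are placeholders):

```python
import json
import urllib.request

def build_generate_request(prompt: str, model: str = "llama3") -> dict:
    """Payload for Ollama's /api/generate; stream=False returns one JSON object."""
    return {"model": model, "prompt": prompt, "stream": False}

def ollama_generate(prompt: str, model: str = "llama3",
                    host: str = "http://localhost:11434") -> str:
    """POST to a local Ollama server and return the generated text."""
    data = json.dumps(build_generate_request(prompt, model)).encode()
    req = urllib.request.Request(f"{host}/api/generate", data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Requires a running Ollama instance:
# print(ollama_generate("Why is the sky blue?"))
```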
For Production: vLLM
If you need an OpenAI-compatible API server:
```bash
pip install vllm

# Start an API server
python -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Meta-Llama-3-8B-Instruct \
  --port 8000

# Now you can call it like OpenAI
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Meta-Llama-3-8B-Instruct",
       "messages": [{"role": "user", "content": "Hello!"}]}'
```
Benchmark Results
| Model | Tier | Tokens/sec | Latency (first token) |
|---|---|---|---|
| Llama 3 8B | Starter | 45 tok/s | 0.3s |
| Llama 3 8B | Standard | 78 tok/s | 0.2s |
| Mixtral 8x7B | Standard | 32 tok/s | 0.5s |
| Llama 3 70B Q4 | Power | 22 tok/s | 0.8s |
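Because vLLM speaks the OpenAI wire format, any OpenAI-compatible client can target your instance. A dependency-free sketch using only the Python standard library (the localhost URL assumes the vLLM server from the section above is running):

```python
import json
import urllib.request

MODEL = "meta-llama/Meta-Llama-3-8B-Instruct"

def build_chat_request(messages: list, model: str = MODEL) -> dict:
    """OpenAI-style chat.completions payload."""
    return {"model": model, "messages": messages}

def chat(messages: list, base_url: str = "http://localhost:8000/v1") -> str:
    """POST to the vLLM server and return the assistant's reply text."""
    data = json.dumps(build_chat_request(messages)).encode()
    req = urllib.request.Request(f"{base_url}/chat/completions", data=data,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read())
    return body["choices"][0]["message"]["content"]

# With the server running:
# print(chat([{"role": "user", "content": "Hello!"}]))
```

Pointing the official `openai` Python package at `base_url="http://localhost:8000/v1"` works the same way, which means existing OpenAI integrations can switch to your self-hosted model with a one-line config change.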
Cost Comparison vs API Providers
Running Llama 3 8B for 8 hours on the Starter tier costs $3.20. At ~45 tokens/sec, that's roughly 1.3 million tokens generated. The same volume at OpenAI's GPT-4o pricing would cost $6.50 or more, and self-hosting keeps your data fully private.
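The arithmetic behind that comparison generalizes: tokens generated = throughput × runtime, which gives a per-million-token cost you can compare directly against any API rate. A quick sketch using the Starter-tier numbers from this page:

```python
def tokens_generated(tok_per_sec: float, hours: float) -> int:
    """Total tokens produced at a sustained throughput over a run."""
    return int(tok_per_sec * 3600 * hours)

def cost_per_million(total_cost: float, tokens: int) -> float:
    """Effective $/1M tokens for a fixed-price GPU rental."""
    return total_cost / (tokens / 1_000_000)

tokens = tokens_generated(45, 8)                  # Starter tier, 8 hours
print(tokens)                                     # 1296000 (~1.3M tokens)
print(round(cost_per_million(3.20, tokens), 2))   # ~$2.47 per million tokens
```

The effective rate drops further on faster tiers as long as throughput grows more than the hourly price does.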
