
Local Models

Run AI models on your own hardware: no per-token API fees, but you buy and maintain the hardware yourself.

Cost Comparison

| Approach | Hardware cost | Running cost | Quality |
| --- | --- | --- | --- |
| API (GPT-4o) | $0 | $2.50 / 1M input tokens | Excellent |
| API (Llama 3.1 70B) | $0 | $0.65 / 1M input tokens | Great |
| Local (RTX 4090) | $1,600 | Electricity only | Good |
| Local (A100) | $15,000 | Electricity only | Great |
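One way to read the table above is as a break-even problem: the hardware pays for itself once the API bill it replaces exceeds its purchase price. A rough sketch using the figures from the table (it ignores electricity, depreciation, and the quality gap between rows):

```python
def breakeven_tokens_millions(hardware_cost: float, api_price_per_m: float) -> float:
    """Millions of input tokens after which owned hardware beats paying API rates."""
    return hardware_cost / api_price_per_m

# RTX 4090 ($1,600) vs GPT-4o at $2.50 per 1M input tokens
breakeven_tokens_millions(1600, 2.50)   # 640.0 million tokens

# A100 ($15,000) vs hosted Llama 3.1 70B at $0.65 per 1M input tokens
breakeven_tokens_millions(15000, 0.65)  # ~23,077 million tokens
```

Below these volumes the API is cheaper in pure dollars; local only wins earlier if you also value privacy or offline use.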

Hardware Requirements

Consumer GPUs

| GPU | VRAM | Models | $/hour |
| --- | --- | --- | --- |
| RTX 4090 | 24 GB | Llama 3.1 8B, Mistral 7B | $0.15 |
| RTX 3090 | 24 GB | Llama 3.1 8B, Mistral 7B | $0.12 |

Data Center GPUs

| GPU | VRAM | Models | $/hour |
| --- | --- | --- | --- |
| A10G | 24 GB | Llama 3.1 8B, Mistral 7B | $1.01 |
| A100 (40 GB) | 40 GB | Llama 70B (4-bit) | $2.93 |
| A100 (80 GB) | 80 GB | Llama 70B (8-bit) | $3.50 |
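A common rule of thumb behind these pairings: weight memory is parameter count × bits per weight ÷ 8, plus roughly 20% headroom for the KV cache and activations. A sketch of that rule (the 1.2 overhead factor is an assumption, not a fixed constant; long contexts need more):

```python
def vram_needed_gb(params_billions: float, bits_per_weight: int,
                   overhead: float = 1.2) -> float:
    """Estimated VRAM in GB: weights plus ~20% headroom for KV cache/activations."""
    return params_billions * bits_per_weight / 8 * overhead

def fits(params_billions: float, bits_per_weight: int, vram_gb: float) -> bool:
    """True if the estimated footprint fits in the given VRAM."""
    return vram_needed_gb(params_billions, bits_per_weight) <= vram_gb

fits(8, 16, 24)   # True: Llama 3.1 8B in FP16 fits a 24 GB card
fits(70, 4, 24)   # False: 70B at 4-bit needs ~42 GB with headroom
fits(70, 4, 80)   # True: comfortable on an 80 GB A100
```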

Local Inference Tools

| Tool | License | Best for |
| --- | --- | --- |
| Ollama | MIT | Easy setup on macOS/Linux |
| LM Studio | Proprietary | Windows-friendly GUI |
| llama.cpp | MIT | CPU + GPU inference with quantized models |
| vLLM | Apache 2.0 | Production serving, high throughput |
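As an example of the easy-setup path, Ollama serves a local REST API on port 11434 by default. A minimal sketch of calling it from Python, assuming `ollama serve` is running and the `llama3.1` model has already been pulled:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_request(model: str, prompt: str) -> dict:
    """JSON body for Ollama's /api/generate endpoint (stream=False for a single reply)."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """Send a prompt to the local Ollama server and return the generated text."""
    body = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(OLLAMA_URL, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Everything stays on your machine: no API key, no data leaving the host.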

Model Quantization

| Format | Size (vs FP16) | Quality loss |
| --- | --- | --- |
| FP16 | 100% (baseline) | None |
| INT8 | 50% | Minimal |
| INT4 | 25% | Small |
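The size column follows directly from bits per weight: weight storage is parameters × bits ÷ 8. For a 70B-parameter model (a sketch; real quantized files add a little per-block metadata on top):

```python
def weights_size_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (decimal), ignoring quantization metadata."""
    return params_billions * bits_per_weight / 8

weights_size_gb(70, 16)  # 140.0 GB at FP16
weights_size_gb(70, 8)   # 70.0 GB at INT8 (50%)
weights_size_gb(70, 4)   # 35.0 GB at INT4 (25%)
```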

Operating Costs

Home User

| Component | Cost |
| --- | --- |
| Electricity rate | $0.05–0.15 / kWh |
| RTX 4090 (~450 W) at 8 hours/day | ~108 kWh → $5.40–16.20 / month |
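Electricity cost is just power × hours × rate. A sketch assuming an RTX 4090 drawing roughly 450 W under sustained load (actual draw varies with workload; intermittent use scales the result down linearly):

```python
def monthly_electricity_cost(watts: float, hours_per_day: float,
                             rate_per_kwh: float, days: int = 30) -> float:
    """Monthly electricity cost in dollars for a given sustained power draw."""
    kwh_per_month = watts / 1000 * hours_per_day * days
    return kwh_per_month * rate_per_kwh

monthly_electricity_cost(450, 8, 0.05)  # ~$5.40/month at cheap power
monthly_electricity_cost(450, 8, 0.15)  # ~$16.20/month at expensive power
```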

Server (A100)

| Component | Cost |
| --- | --- |
| A100 (1 hour) | $3.50 |
| 100 hours/month | $350 |
| Electricity | $50 |
| **Monthly total** | **$400** |

Comparison: API vs Local

| Factor | API | Local |
| --- | --- | --- |
| Upfront cost | $0 | $1,600–20,000 |
| Monthly cost | $50–500 | $15–400 |
| Setup time | ~5 minutes | 1–10 hours |
| Privacy | Data sent to vendor | 100% local |

Best For

Local models make the most sense for:

  1. Privacy-sensitive data
  2. High-volume applications
  3. Offline operation
  4. Model experimentation
  5. Cost savings at scale