Hugging Face

Open-source AI models, free inference, and the largest ML community.

Inference API (Pay-per-use)

Language Models

Model	Input /1M tokens	Output /1M tokens
Llama 3.1 70B	$0.65	$2.20
Llama 3.1 8B	$0.22	$0.88
Mistral 7B	$0.24	$0.24
Mixtral 8x7B	$0.24	$0.24

Embeddings

Model	Per 1M Tokens
Instructor	Free
BGE	Free

Inference Endpoints (Dedicated)

Instance	Price/hr	GPU
small	$0.06	T4
medium	$0.35	A10G
large	$1.01	A100
xlarge	$2.93	A100 80GB
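
Dedicated endpoints are billed by the hour, so a rough cost estimate is the hourly price times the hours the endpoint stays up. The following is a minimal sketch using the prices from the table above; the helper name `endpoint_cost` and the hour counts are illustrative assumptions, and it ignores any finer billing granularity or scale-to-zero behavior.

```python
# Hourly prices in USD, copied from the Inference Endpoints table above.
HOURLY_PRICE = {
    "small": 0.06,    # T4
    "medium": 0.35,   # A10G
    "large": 1.01,    # A100
    "xlarge": 2.93,   # A100 80GB
}

def endpoint_cost(instance: str, hours: float) -> float:
    """Estimated cost in USD of running one instance for `hours` hours.

    Hypothetical helper: assumes simple hourly billing with no minimums.
    """
    return HOURLY_PRICE[instance] * hours

if __name__ == "__main__":
    # Illustrative run times: 100 h, 50 h, and an always-on 30-day month.
    print(f"large, 100 h:  ${endpoint_cost('large', 100):.2f}")
    print(f"xlarge, 50 h:  ${endpoint_cost('xlarge', 50):.2f}")
    print(f"medium, 720 h: ${endpoint_cost('medium', 24 * 30):.2f}")
```

An endpoint left running all month costs far more than one started only for a batch job, so the hours figure usually matters more than the choice of instance.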

Free Tier

Benefit	Limit
Requests/day	1,000
Rate limit	30 RPM
Models	Popular ones

Pro Tier

Benefit	Value
Monthly price	$9
Requests/day	50,000
Rate limit	200 RPM
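
To see which limit actually constrains a workload, it helps to compare the daily request quota with what the per-minute rate allows. The sketch below uses the Free and Pro figures listed above; the function names are hypothetical, and it assumes that quotas reset once per day and that both limits apply at the same time.

```python
import math

# Limits copied from the Free Tier and Pro Tier tables above.
TIERS = {
    "free": {"requests_per_day": 1_000, "rpm": 30},
    "pro": {"requests_per_day": 50_000, "rpm": 200},
}

def minutes_to_exhaust_quota(tier: str) -> float:
    """Minutes of sending at the full per-minute rate before the daily quota runs out."""
    limits = TIERS[tier]
    return limits["requests_per_day"] / limits["rpm"]

def days_to_complete(tier: str, total_requests: int) -> int:
    """Whole days needed for `total_requests`, assuming a daily quota reset."""
    limits = TIERS[tier]
    # The effective daily ceiling is the smaller of the daily quota and
    # what the per-minute limit would allow over 24 hours.
    per_day = min(limits["requests_per_day"], limits["rpm"] * 60 * 24)
    return math.ceil(total_requests / per_day)

if __name__ == "__main__":
    for tier in TIERS:
        print(tier,
              f"{minutes_to_exhaust_quota(tier):.1f} min to exhaust quota,",
              days_to_complete(tier, 1_000_000), "days for 1M requests")
```

On both tiers the daily quota, not the per-minute rate, is the binding limit, so a 1M-request job would take weeks on Pro and is impractical on the free tier.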

Cost Examples

API-Based (1M requests)

Model	Tokens/req	Cost (all tokens at input rate)
Llama 3.1 8B	500	$110
GPT-4o-mini	500	$75
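
The Llama 3.1 8B figure can be reproduced from the per-token prices: 1M requests at 500 tokens each is 500M tokens, and 500 × $0.22 = $110 when every token is billed at the input rate. The sketch below is a minimal calculator under that reading; the function name `api_cost` and the 400/100 input/output split are illustrative assumptions, not figures from the pricing page.

```python
# (input, output) prices in USD per 1M tokens, from the Language Models table.
PRICES = {
    "Llama 3.1 70B": (0.65, 2.20),
    "Llama 3.1 8B": (0.22, 0.88),
    "Mistral 7B": (0.24, 0.24),
    "Mixtral 8x7B": (0.24, 0.24),
}

def api_cost(model: str, requests: int, input_tokens: int, output_tokens: int) -> float:
    """Estimated pay-per-use cost in USD for `requests` calls.

    Hypothetical helper: token counts are per request.
    """
    in_price, out_price = PRICES[model]
    total_in = requests * input_tokens
    total_out = requests * output_tokens
    return (total_in * in_price + total_out * out_price) / 1_000_000

if __name__ == "__main__":
    # All 500 tokens billed as input: matches the $110 in the table.
    print(f"${api_cost('Llama 3.1 8B', 1_000_000, 500, 0):.2f}")
    # Assumed split of 400 input + 100 output tokens per request.
    print(f"${api_cost('Llama 3.1 8B', 1_000_000, 400, 100):.2f}")
```

Because output tokens cost four times as much as input tokens for this model, even a small share of generated text raises the total noticeably (about $176 for the assumed split).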

Self-Hosted (1M requests)

Instance	Hours	Cost
large (A100)	100	$101
xlarge (A100 80GB)	50	$146.50

Open Source Models (Free)

Model	Type	License
Llama 3.1 8B	Chat	Llama 3.1
Mistral 7B	Chat	Apache 2.0
Gemma 2B	Chat	Gemma
Whisper Medium	Audio	Apache 2.0

Comparison

Feature	Free Tier	Pro	Enterprise
Models	Popular	All	All + custom
Rate limit	30 RPM	200 RPM	Custom
Support	Community	Email	Dedicated

Best For

Experimentation and prototyping
Open-source model evaluation
Budget deployments
Model fine-tuning
Community models
