Open-source AI models, free inference, and the largest ML community.
Inference API (Pay-per-use)
Language Models
| Model | Input ($/1M tokens) | Output ($/1M tokens) |
|---|---|---|
| Llama 3.1 70B | $0.65 | $2.20 |
| Llama 3.1 8B | $0.22 | $0.88 |
| Mistral 7B | $0.24 | $0.24 |
| Mixtral 8x7B | $0.24 | $0.24 |
Embeddings
| Model | Price per 1M tokens |
|---|---|
| Instructor | Free |
| BGE | Free |
Inference Endpoints (Dedicated)
| Instance | Price/hr | GPU |
|---|---|---|
| small | $0.06 | T4 |
| medium | $0.35 | A10G |
| large | $1.01 | A100 |
| xlarge | $2.93 | A100 80GB |
Free Tier
| Benefit | Limit |
|---|---|
| Requests/day | 1,000 |
| Rate limit | 30 RPM |
| Models | Popular ones |
Pro Tier
| Benefit | Value |
|---|---|
| Monthly price | $9 |
| Requests/day | 50,000 |
| Rate limit | 200 RPM |
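
The two tier tables above combine a per-minute rate limit with a daily request quota, and it can help to see what those numbers imply for a client. The following is a minimal sketch, not an official client: the names `TIERS`, `min_interval_seconds`, `minutes_until_quota_exhausted`, and `throttled` are hypothetical, and it assumes the limits are enforced as simple per-minute and per-day counts, which the tables do not specify.

```python
import time

# Limits copied from the Free Tier and Pro Tier tables above.
TIERS = {
    "free": {"requests_per_day": 1_000, "rpm": 30},
    "pro": {"requests_per_day": 50_000, "rpm": 200},
}


def min_interval_seconds(tier):
    """Smallest gap between requests that stays within the tier's RPM."""
    return 60 / TIERS[tier]["rpm"]


def minutes_until_quota_exhausted(tier):
    """Minutes of sending at the full RPM before the daily quota runs out."""
    limits = TIERS[tier]
    return limits["requests_per_day"] / limits["rpm"]


def throttled(calls, tier, sleep=time.sleep):
    """Run each zero-argument callable in `calls`, pausing between them."""
    interval = min_interval_seconds(tier)
    results = []
    for i, call in enumerate(calls):
        if i > 0:
            sleep(interval)  # fixed spacing; no burst handling or retries
        results.append(call())
    return results
```

Under these assumptions the free tier allows one request every 2 seconds, and sending at that full rate would use up the 1,000 daily requests in roughly 33 minutes. The Pro tier allows one request every 0.3 seconds and lasts 250 minutes at full rate.
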
Cost Examples
API-Based (1M requests)
| Model | Tokens/req | Cost (all tokens at input rate) |
|---|---|---|
| Llama 3.1 8B | 500 | $110 |
| GPT-4o-mini | 500 | $75 |
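
The Llama 3.1 8B figure can be reproduced from the Language Models table if every token is billed at the input rate: 1M requests × 500 tokens is 500M tokens, and 500 × $0.22 = $110. The sketch below shows that arithmetic and how the total shifts once part of each request is billed at the output rate. The function name `api_cost` and the 400/100 token split are illustrative assumptions, not figures from the source.

```python
# Per-1M-token prices (USD, input and output) from the Language Models table.
PRICES = {
    "Llama 3.1 70B": (0.65, 2.20),
    "Llama 3.1 8B": (0.22, 0.88),
    "Mistral 7B": (0.24, 0.24),
    "Mixtral 8x7B": (0.24, 0.24),
}


def api_cost(model, requests, input_tokens, output_tokens=0):
    """Estimated USD cost for `requests` calls with the given tokens per call."""
    input_price, output_price = PRICES[model]
    input_cost = requests * input_tokens / 1_000_000 * input_price
    output_cost = requests * output_tokens / 1_000_000 * output_price
    return round(input_cost + output_cost, 2)


# All 500 tokens billed as input: reproduces the $110 figure.
all_input = api_cost("Llama 3.1 8B", 1_000_000, 500)

# Hypothetical split: 400 input + 100 output tokens per request.
split = api_cost("Llama 3.1 8B", 1_000_000, 400, 100)

print(all_input, split)
```

With the assumed 400/100 split the same workload comes to $176 rather than $110, so the input/output ratio matters as much as the total token count.
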
Self-Hosted (1M requests)
| Instance | Hours | Cost |
|---|---|---|
| large (A100) | 100 | $101 |
| xlarge (A100 80GB) | 50 | $146.50 |
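
Dedicated cost is instance hours multiplied by the hourly rate from the Inference Endpoints table, and the hours figure implies a sustained throughput the instance has to reach. The following is a rough sketch of both calculations. The helper names `endpoint_cost` and `required_throughput` are hypothetical, and whether a given instance can actually sustain the resulting request rate depends on the model and is not something the tables state.

```python
# Hourly prices (USD) copied from the Inference Endpoints table above.
HOURLY_RATE = {"small": 0.06, "medium": 0.35, "large": 1.01, "xlarge": 2.93}


def endpoint_cost(instance, hours):
    """USD cost of keeping one dedicated instance up for `hours`."""
    return round(HOURLY_RATE[instance] * hours, 2)


def required_throughput(requests, hours):
    """Sustained requests/second needed to finish `requests` within `hours`."""
    return requests / (hours * 3600)


print(endpoint_cost("large", 100))   # 100 h at $1.01/hr
print(endpoint_cost("xlarge", 50))   # 50 h at $2.93/hr
print(round(required_throughput(1_000_000, 100), 2))
```

Serving 1M requests in 100 hours means sustaining about 2.78 requests per second, and halving the hours doubles that requirement, so the cheaper total only holds if the instance can keep up.
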
Open Source Models (Free)
| Model | Type | License |
|---|---|---|
| Llama 3.1 8B | Chat | Llama 3.1 Community License |
| Mistral 7B | Chat | Apache 2.0 |
| Gemma 2B | Chat | Gemma Terms of Use |
| Whisper Medium | Audio | Apache 2.0 |
Comparison
| Feature | Free Tier | Pro | Enterprise |
|---|---|---|---|
| Models | Popular | All | All + custom |
| Rate limit | 30 RPM | 200 RPM | Custom |
| Support | Community | Email | Dedicated |
Best For
- Experimentation and prototyping
- Open-source model evaluation
- Budget deployments
- Model fine-tuning
- Community models