Model directory

Browse models across providers with detailed specs, pricing, and performance signals.
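
All prices in this directory are quoted per million tokens. As a quick illustration of how those rates translate into per-request cost, here is a small sketch; the helper function and token counts are hypothetical, and the rates used are the listed input/output prices for a $3.00/$15.00 model such as claude-sonnet-4-5.

```python
# Hypothetical helper showing how per-million-token pricing is applied.
# Rates below are the listed $3.00 input / $15.00 output prices.

def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the USD cost of one request at the given per-M-token rates."""
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m

# Example: a request with 12,000 input tokens and 1,500 output tokens.
cost = request_cost(12_000, 1_500, 3.00, 15.00)
print(f"${cost:.4f}")  # $0.0585
```

Note that output tokens usually dominate cost at these rate ratios, so long generations are priced several times higher than equally long prompts.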

Filters

Features

Tool Calling
Reasoning
JSON Mode
Vision
Audio Input
Audio Output
PDF Input
Image Edit
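
The feature checkboxes above act as an intersection filter: a model matches only if it supports every selected feature. A minimal sketch of that logic, with illustrative records rather than an actual API:

```python
# Sketch of applying the feature filters to a model list.
# The records and feature sets here are illustrative examples only.

models = [
    {"id": "claude-sonnet-4-5-20250929",
     "features": {"Tool Calling", "Reasoning", "Vision", "PDF Input"}},
    {"id": "gpt-4o-mini",
     "features": {"Tool Calling", "JSON Mode", "Vision"}},
    {"id": "eleven_multilingual_v2", "features": set()},
]

def filter_by_features(models, required):
    """Keep only models that support every requested feature."""
    required = set(required)
    return [m for m in models if required <= m["features"]]

matches = filter_by_features(models, ["Tool Calling", "Vision"])
print([m["id"] for m in matches])
# ['claude-sonnet-4-5-20250929', 'gpt-4o-mini']
```

Selecting no features leaves the list unfiltered, since the empty set is a subset of every feature set.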

Models

anthropic

chat

claude-3-5-haiku-20241022

Anthropic's fastest model, delivering advanced coding, tool use, and reasoning at an accessible price.

Input

$0.80

/M tokens

Output

$4.00

/M tokens

anthropic

chat

claude-3-7-sonnet-20250219

Anthropic's most intelligent model to date and the first hybrid reasoning model on the market.

Input

$3.00

/M tokens

Output

$15.00

/M tokens

anthropic

chat

claude-3-7-sonnet-latest

Anthropic's most intelligent model to date and the first hybrid reasoning model on the market.

Input

$3.00

/M tokens

Output

$15.00

/M tokens

anthropic

chat

claude-3.5-sonnet

First release in the Claude 3.5 model family. Claude 3.5 Sonnet raises the industry bar for intelligence, outperforming competitor models and Claude 3 Opus on a wide range of evaluations, with the speed and cost of our mid-tier model, Claude 3 Sonnet.

Input

$0.00

/M tokens

Output

$0.00

/M tokens

anthropic

chat

claude-haiku-4-5-20251001

Claude Haiku 4.5 is the fastest, most cost-efficient model in the Claude family, delivering enhanced performance for coding, tool use, and reasoning tasks.

Input

$1.00

/M tokens

Output

$5.00

/M tokens

anthropic

chat

claude-opus-4-20250514

Claude Opus 4 is the world's best coding model, with sustained performance on complex, long-running tasks and agent workflows.

Input

$15.00

/M tokens

Output

$75.00

/M tokens

anthropic

chat

claude-sonnet-4-5-20250929

Claude Sonnet 4.5 is a significant upgrade to Claude Sonnet 4, delivering superior coding and reasoning while responding more precisely to your instructions.

Input

$3.00

/M tokens

Output

$15.00

/M tokens

anthropic

chat

claude-3-haiku

Fastest and most compact model in the Claude 3 family, designed for near-instant responsiveness.

Input

$0.25

/M tokens

Output

$1.25

/M tokens

anthropic

chat

claude-3-opus

Most powerful model for highly complex tasks

Input

$0.00

/M tokens

Output

$0.00

/M tokens

anthropic

chat

claude-opus-4-5-20251101

Claude Opus 4.5 is Anthropic's most intelligent model, offering the highest level of performance on complex tasks requiring deep reasoning, nuanced understanding, and sophisticated analysis.

Input

$5.00

/M tokens

Output

$25.00

/M tokens

anthropic

chat

claude-opus-4.1-20250805

Claude Opus 4.1 advances coding performance to 74.5% on SWE-bench Verified with improved in-depth research and data analysis skills. Notable performance gains in multi-file code refactoring.

Input

$15.00

/M tokens

Output

$75.00

/M tokens

anthropic

chat

claude-sonnet-4-20250514

Claude Sonnet 4 is a significant upgrade to Claude Sonnet 3.7, delivering superior coding and reasoning while responding more precisely to your instructions.

Input

$3.00

/M tokens

Output

$15.00

/M tokens

aws

chat

anthropic/claude-3-5-sonnet-20241022

Anthropic's most powerful AI model. Claude 3.5 Sonnet raises the industry bar for intelligence, outperforming competitor models and Claude 3 Opus on a wide range of evaluations, with the speed and cost of our mid-tier model, Claude 3 Sonnet. Claude 3.5 Sonnet shows us the frontier of what's possible with generative AI.

Input

$3.00

/M tokens

Output

$15.00

/M tokens

aws

chat

anthropic/claude-3-7-sonnet-20250219

Anthropic's most intelligent model to date and the first hybrid reasoning model on the market.

Input

$3.00

/M tokens

Output

$15.00

/M tokens

aws

chat

anthropic/claude-3.5-sonnet

Anthropic's most powerful AI model. Claude 3.5 Sonnet raises the industry bar for intelligence, outperforming competitor models and Claude 3 Opus on a wide range of evaluations, with the speed and cost of our mid-tier model, Claude 3 Sonnet. Claude 3.5 Sonnet shows us the frontier of what's possible with generative AI.

Input

$3.00

/M tokens

Output

$15.00

/M tokens

aws

completion

amazon/titan-text-express

Has a context length of up to 8,000 tokens, making it well-suited for a wide range of advanced general language tasks, as well as for use within retrieval-augmented generation (RAG) workflows.

Input

$0.00

/M tokens

Output

$0.00

/M tokens

aws

chat

anthropic/claude-opus-4-5-20251101

Claude Opus 4.5 is Anthropic's most intelligent model, offering the highest level of performance on complex tasks requiring deep reasoning, nuanced understanding, and sophisticated analysis.

Input

$5.00

/M tokens

Output

$25.00

/M tokens

aws

chat

anthropic/claude-sonnet-4-20250514

Claude Sonnet 4 is Anthropic's next-generation AI model with enhanced reasoning capabilities, improved performance on complex tasks, and more nuanced understanding. This model represents a significant advancement in conversational AI with superior context handling and response quality.

Input

$3.00

/M tokens

Output

$15.00

/M tokens

aws

chat

anthropic/claude-sonnet-4-5

Claude Sonnet 4.5 is a significant upgrade to Claude Sonnet 4, delivering superior coding and reasoning while responding more precisely to your instructions.

Input

$3.30

/M tokens

Output

$16.50

/M tokens

azure

chat

azure/o3-mini

o3-mini is our newest small reasoning model, providing high intelligence at the same cost and latency targets as o1-mini. o3-mini supports key developer features like Structured Outputs, function calling, and the Batch API.

Input

$1.10

/M tokens

Output

$4.40

/M tokens

azure

chat

gpt-4.1

Outperforms GPT-4o and GPT-4o mini across the board, with major gains in coding and instruction following. It also has a larger context window, supporting up to 1 million tokens, and makes better use of that context with improved long-context comprehension.

Input

$2.00

/M tokens

Output

$8.00

/M tokens

azure

chat

gpt-4o

A groundbreaking multimodal model that integrates text, vision, and audio capabilities, setting a new standard for generative and conversational AI experiences.

Input

$5.00

/M tokens

Output

$20.00

/M tokens

azure

chat

gpt-4o-mini

GPT-4o mini surpasses GPT-3.5 Turbo and other small models on academic benchmarks across both textual intelligence and multimodal reasoning, and supports the same range of languages as GPT-4o.

Input

$0.15

/M tokens

Output

$0.60

/M tokens

azure

chat

gpt-5-chat

GPT-5-chat is designed for advanced, natural, multimodal, and context-aware conversations for enterprise applications. It excels at maintaining long-form dialogue context and understanding nuanced communication.

Input

$1.25

/M tokens

Output

$10.00

/M tokens

azure

chat

gpt-5-mini

GPT-5-mini is a lightweight version of GPT-5 optimized for cost-sensitive applications. It maintains strong reasoning capabilities while being more economical for high-volume use cases.

Input

$0.25

/M tokens

Output

$2.00

/M tokens

azure

chat

gpt-5-nano

GPT-5-nano is optimized for speed and ideal for applications requiring low latency. It delivers rapid responses while maintaining quality for real-time applications.

Input

$0.05

/M tokens

Output

$0.40

/M tokens

azure

chat

o1

The o1 reasoning model is designed to solve hard problems across domains.

Input

$15.00

/M tokens

Output

$60.00

/M tokens

azure

chat

o1-mini

o1-mini is a faster and more affordable reasoning model; we recommend the newer o3-mini, which offers higher intelligence at the same latency and price as o1-mini.

Input

$1.10

/M tokens

Output

$4.40

/M tokens

azure

embedding

text-embedding-ada-002

Most capable second-generation embedding model, replacing 16 first-generation models.

Input

$0.10

/M tokens

Output

$0.00

/M tokens

azure

completion

azure/llama-2-13b

Meta's Llama 2 13B pretrained base model, part of the Llama 2 family of pretrained and fine-tuned generative text models ranging from 7B to 70B parameters.

Input

$0.00

/M tokens

Output

$0.00

/M tokens

azure

chat

azure/llama-2-13b-chat

Meta's Llama-2-Chat 13B, fine-tuned and optimized for dialogue use cases. Llama-2-Chat models outperform open-source chat models on most benchmarks Meta tested and, in human evaluations for helpfulness and safety, are on par with popular closed-source models like ChatGPT and PaLM.

Input

$0.00

/M tokens

Output

$0.00

/M tokens

azure

completion

azure/llama-2-70b

Meta's Llama 2 70B pretrained base model, part of the Llama 2 family of pretrained and fine-tuned generative text models ranging from 7B to 70B parameters.

Input

$0.00

/M tokens

Output

$0.00

/M tokens

azure

chat

azure/llama-2-70b-chat

Meta's Llama-2-Chat 70B, fine-tuned and optimized for dialogue use cases. Llama-2-Chat models outperform open-source chat models on most benchmarks Meta tested and, in human evaluations for helpfulness and safety, are on par with popular closed-source models like ChatGPT and PaLM.

Input

$0.00

/M tokens

Output

$0.00

/M tokens

azure

completion

azure/llama-2-7b

Meta's Llama 2 7B pretrained base model, part of the Llama 2 family of pretrained and fine-tuned generative text models ranging from 7B to 70B parameters.

Input

$0.00

/M tokens

Output

$0.00

/M tokens

azure

chat

azure/llama-2-7b-chat

Meta's Llama-2-Chat 7B, fine-tuned and optimized for dialogue use cases. Llama-2-Chat models outperform open-source chat models on most benchmarks Meta tested and, in human evaluations for helpfulness and safety, are on par with popular closed-source models like ChatGPT and PaLM.

Input

$0.00

/M tokens

Output

$0.00

/M tokens

azure

chat

azure/llama-3-70B-Instruct

Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B and 70B sizes. The Llama 3 instruction-tuned models are optimized for dialogue use cases and outperform many of the available open-source chat models on common industry benchmarks. Further, in developing these models, we took great care to optimize helpfulness and safety.

Input

$0.00

/M tokens

Output

$0.00

/M tokens

azure

chat

azure/llama-3-8B-Instruct

Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B and 70B sizes. The Llama 3 instruction-tuned models are optimized for dialogue use cases and outperform many of the available open-source chat models on common industry benchmarks. Further, in developing these models, we took great care to optimize helpfulness and safety.

Input

$0.00

/M tokens

Output

$0.00

/M tokens

azure

chat

azure/llama-3.1-405B-Instruct

The Meta Llama 3.1 collection of multilingual large language models (LLMs) comprises pretrained and instruction-tuned generative models in 8B, 70B, and 405B sizes (text in/text out). The Llama 3.1 instruction-tuned, text-only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open-source and closed chat models on common industry benchmarks.

Input

$5.33

/M tokens

Output

$16.00

/M tokens

azure

chat

azure/llama-3.1-70B-Instruct

The Meta Llama 3.1 collection of multilingual large language models (LLMs) comprises pretrained and instruction-tuned generative models in 8B, 70B, and 405B sizes (text in/text out). The Llama 3.1 instruction-tuned, text-only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open-source and closed chat models on common industry benchmarks.

Input

$0.00

/M tokens

Output

$0.00

/M tokens

azure

chat

azure/llama-3.1-8B

The Meta Llama 3.1 collection of multilingual large language models (LLMs) comprises pretrained and instruction-tuned generative models in 8B, 70B, and 405B sizes (text in/text out). The Llama 3.1 instruction-tuned, text-only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open-source and closed chat models on common industry benchmarks.

Input

$0.30

/M tokens

Output

$0.61

/M tokens

azure

chat

gpt-4-1106-preview

More capable than any GPT-3.5 model, able to handle more complex tasks, and optimized for chat.

Input

$0.00

/M tokens

Output

$0.00

/M tokens

azure

vision

gpt-4-turbo-vision

With 128k context, fresher knowledge and the broadest set of capabilities, GPT-4 Turbo is more powerful than GPT-4 and offered at a lower price.

Input

$0.00

/M tokens

Output

$0.00

/M tokens

azure

chat

gpt-4.1-mini

Outperforms GPT-4o and GPT-4o mini across the board, with major gains in coding and instruction following. It also has a larger context window, supporting up to 1 million tokens, and makes better use of that context with improved long-context comprehension.

Input

$0.40

/M tokens

Output

$1.60

/M tokens

azure

chat

gpt-4.1-nano

Outperforms GPT-4o and GPT-4o mini across the board, with major gains in coding and instruction following. It also has a larger context window, supporting up to 1 million tokens, and makes better use of that context with improved long-context comprehension.

Input

$0.10

/M tokens

Output

$0.40

/M tokens

azure

embedding

text-embedding-3-small

Next-generation small embedding model, creating embeddings with up to 1536 dimensions.

Input

$0.02

/M tokens

Output

$0.00

/M tokens

bytedance

image

SeedEdit-3.0-I2I-250628

SeedEdit-3.0-I2I-250628 is an advanced image-to-image editing model from ByteDance that enables precise image modifications based on input images and text prompts.

Input

$30.00

/M tokens

Output

$0.00

/M tokens

bytedance

image

Seedream-3.0-T2I-250415

Seedream-3.0-T2I-250415 is a state-of-the-art text-to-image generation model from ByteDance that creates high-resolution, photorealistic images from text prompts with bilingual support.

Input

$30.00

/M tokens

Output

$0.00

/M tokens

bytedance

image

Seedream-4-0-250828

Seedream-4.0-250828 is a unified image generation and editing model from ByteDance that delivers up to 4K resolution images with faster inference. It handles complex multimodal tasks including knowledge-based generation, reasoning, and reference consistency.

Input

$30.00

/M tokens

Output

$0.00

/M tokens

bytedance

image

Seedream-4-5-251128

Seedream-4-5-251128 is a unified image generation and editing model from ByteDance that delivers up to 4K resolution images with faster inference. It handles complex multimodal tasks including knowledge-based generation, reasoning, and reference consistency.

Input

$0.00

/M tokens

Output

$0.00

/M tokens

cerebras

chat

cerebras/deepseek-r1-distill-llama-70b

DeepSeek R1 Distill Llama 70B model optimized for fast inference on Cerebras hardware. Supports up to 65,536 tokens context length.

Input

$0.00

/M tokens

Output

$0.00

/M tokens

cerebras

chat

cerebras/gpt-oss-120b

OpenAI GPT OSS 120B model optimized for fast inference on Cerebras hardware. Supports up to 8,192 tokens context length.

Input

$0.35

/M tokens

Output

$0.75

/M tokens

cerebras

chat

cerebras/llama-3.3-70b

Llama 3.3 70B model optimized for fast inference on Cerebras hardware. Supports up to 128,000 tokens context length.

Input

$0.85

/M tokens

Output

$1.20

/M tokens

cerebras

chat

cerebras/llama3.1-8b

Llama 3.1 8B model optimized for fast inference on Cerebras hardware. Supports up to 8,192 tokens context length.

Input

$0.10

/M tokens

Output

$0.10

/M tokens

cerebras

chat

cerebras/qwen-3-235b-instruct

Qwen 3 235B Instruct 2507 model optimized for fast inference on Cerebras hardware. Non-thinking mode only.

Input

$0.60

/M tokens

Output

$1.20

/M tokens

cerebras

chat

cerebras/qwen-3-32b

Qwen 3 32B model optimized for fast inference on Cerebras hardware. Supports up to 16,382 tokens context length.

Input

$0.60

/M tokens

Output

$0.60

/M tokens

cohere

embedding

embed-english-light-v3.0

Light English embedding model, version 3.0, with 384-dimensional embeddings

Input

$0.10

/M tokens

Output

$0.00

/M tokens

cohere

embedding

embed-english-v3.0

English embedding model, version 3.0, with 1024-dimensional embeddings

Input

$0.10

/M tokens

Output

$0.00

/M tokens

cohere

embedding

embed-multilingual-light-v3.0

Light multilingual embedding model, version 3.0, with 384-dimensional embeddings

Input

$0.10

/M tokens

Output

$0.00

/M tokens

cohere

embedding

embed-multilingual-v3.0

Multilingual embedding model, version 3.0, with 1024-dimensional embeddings

Input

$0.10

/M tokens

Output

$0.00

/M tokens

cohere

rerank

rerank-english-v3.0

English reranking model, version 3.0

Input

$2.00

/M tokens

Output

$0.00

/M tokens

cohere

rerank

rerank-multilingual-v3.0

Multilingual reranking model, version 3.0

Input

$2.00

/M tokens

Output

$0.00

/M tokens

cohere

rerank

rerank-v3.5

A model for documents and semi-structured data (JSON). State-of-the-art performance in English and non-English languages; supports the same languages as embed-multilingual-v3.0. This model has a context length of 4,096 tokens.

Input

$2.00

/M tokens

Output

$0.00

/M tokens

cohere

chat

command-a-03-2025

Command A is our most performant model to date, excelling at tool use, agents, retrieval augmented generation (RAG), and multilingual use cases. Command A has a context length of 256K, only requires two GPUs to run, and has 150% higher throughput compared to Command R+ 08-2024.

Input

$1.00

/M tokens

Output

$5.00

/M tokens

cohere

chat

command-a-reasoning-08-2025

Command A Reasoning is Cohere's first reasoning model, able to "think" before generating an output in a way that allows it to perform well in certain kinds of nuanced problem-solving and agent-based tasks in 23 languages.

Input

$3.00

/M tokens

Output

$15.00

/M tokens

cohere

chat

command-a-translate-08-2025

Command A Translate is Cohere's state-of-the-art machine translation model, excelling at a variety of translation tasks across 23 languages: English, French, Spanish, Italian, German, Portuguese, Japanese, Korean, Chinese, Arabic, Russian, Polish, Turkish, Vietnamese, Dutch, Czech, Indonesian, Ukrainian, Romanian, Greek, Hindi, Hebrew, and Persian.

Input

$1.50

/M tokens

Output

$6.00

/M tokens

cohere

chat

command-a-vision-07-2025

Command A Vision is our first model capable of processing images, excelling in enterprise use cases such as analyzing charts, graphs, and diagrams, table understanding, OCR, document Q&A, and object detection. It officially supports English, Portuguese, Italian, French, German, and Spanish.

Input

$2.00

/M tokens

Output

$8.00

/M tokens

cohere

chat

command-r-08-2024

command-r-08-2024 is an update of the Command R model, delivered in August 2024. An instruction-following conversational model that performs language tasks at a higher quality, more reliably, and with a longer context than previous models.

Input

$0.15

/M tokens

Output

$0.60

/M tokens

cohere

chat

command-r-plus-08-2024

command-r-plus-08-2024 is an update of the Command R+ model, delivered in August 2024. An instruction-following conversational model that performs language tasks at a higher quality, more reliably, and with a longer context. Best suited for complex RAG workflows and multi-step tool use.

Input

$2.50

/M tokens

Output

$10.00

/M tokens

cohere

chat

command-r7b-12-2024

command-r7b-12-2024 is a small, fast update delivered in December 2024. It excels at RAG, tool use, agents, and similar tasks requiring complex reasoning and multiple steps.

Input

$0.15

/M tokens

Output

$0.60

/M tokens

cohere

embedding

embed-v4.0

Latest-generation Cohere embedding model, version 4.0.

Input

$0.12

/M tokens

Output

$0.00

/M tokens

cohere

rerank

rerank-v4.0-fast

High performance, lowest latency reranker. AI search foundation model for enhancing the relevance of information surfaced within search and RAG systems. Supports 100+ languages with a context length of 32,768 tokens.

Input

$2.00

/M tokens

Output

$0.00

/M tokens

cohere

rerank

rerank-v4.0-pro

State of the art performance with low latency reranker. AI search foundation model for enhancing the relevance of information surfaced within search and RAG systems. Supports 100+ languages with a context length of 32,768 tokens.

Input

$2.50

/M tokens

Output

$0.00

/M tokens

contextualai

rerank

ctxl-rerank-v1-instruct

Contextual AI Rerank v1 instruction-following reranker. First generation instruction-following reranking model.

Input

$0.00

/M tokens

Output

$0.00

/M tokens

contextualai

rerank

ctxl-rerank-v2-instruct-multilingual

Contextual AI Rerank v2 multilingual instruction-following reranker. Best-in-class performance with support for custom instructions.

Input

$0.00

/M tokens

Output

$0.00

/M tokens

contextualai

rerank

ctxl-rerank-v2-instruct-multilingual-mini

Contextual AI Rerank v2 multilingual mini instruction-following reranker. Efficient smaller model with excellent performance.

Input

$0.00

/M tokens

Output

$0.00

/M tokens

deepseek

chat

deepseek-chat

DeepSeek-V3, a powerful Mixture-of-Experts language model with 671B parameters, excelling at general chat, coding, and complex reasoning tasks.

Input

$0.28

/M tokens

Output

$0.42

/M tokens

deepseek

chat

deepseek-reasoner

DeepSeek-R1, a reasoning model with chain-of-thought capabilities, excelling at complex reasoning, mathematics, and coding tasks with step-by-step problem solving.

Input

$0.28

/M tokens

Output

$0.42

/M tokens

elevenlabs

tts

eleven_flash_v2

Ultra-fast model optimized for real-time use (~75 ms); English only.

Input

$15.00

/M tokens

Output

$0.00

/M tokens

elevenlabs

tts

eleven_flash_v2_5

Ultra-fast model optimized for real-time use (~75 ms).

Input

$15.00

/M tokens

Output

$0.00

/M tokens

elevenlabs

tts

eleven_multilingual_v2

Our most lifelike model with rich emotional expression

Input

$15.00

/M tokens

Output

$0.00

/M tokens

elevenlabs

tts

eleven_turbo_v2_5

Our high quality, low latency model in 32 languages. Best for developer use cases where speed matters and you need non-English languages.

Input

$15.00

/M tokens

Output

$0.00

/M tokens

elevenlabs

stt

scribe_v1

Scribe can transcribe speech into text and supports a wide range of languages for multilingual transcription.

Input

$15.00

/M tokens

Output

$0.00

/M tokens

fal

image

flux-1

FLUX.1, a 12B-parameter text-to-image model with outstanding aesthetics.

Input

$0.00

/M tokens

Output

$25.00

/M tokens

fal

image

flux-pro/new

FLUX.1 [pro] new is an accelerated version of FLUX.1 [pro], maintaining professional-grade image quality while delivering significantly faster generation speeds.

Input

$0.00

/M tokens

Output

$50.00

/M tokens

fal

image

flux/schnell

FLUX.1 [schnell] is a 12 billion parameter flow transformer that generates high-quality images from text in 1 to 4 steps, suitable for personal and commercial use.

Input

$0.00

/M tokens

Output

$3.00

/M tokens

fal

image

Gemini 2.5 Flash Image

State-of-the-art image generation and editing model

Input

$0.00

/M tokens

Output

$40.00

/M tokens

google

chat

anthropic/claude-3-5-sonnet-v2@20241022

Anthropic's most powerful AI model. Claude 3.5 Sonnet raises the industry bar for intelligence, outperforming competitor models and Claude 3 Opus on a wide range of evaluations, with the speed and cost of our mid-tier model, Claude 3 Sonnet. Claude 3.5 Sonnet shows us the frontier of what's possible with generative AI.

Input

$3.00

/M tokens

Output

$15.00

/M tokens

google

chat

anthropic/claude-3-7-sonnet@20250219

Claude 3.7 Sonnet is Anthropic's most intelligent model to date and the first Claude model to offer extended thinking—the ability to solve complex problems with careful, step-by-step reasoning. Anthropic is the first AI lab to introduce a single model where users can balance speed and quality by choosing between standard thinking for near-instant responses or extended thinking for advanced reasoning.

Input

$3.00

/M tokens

Output

$15.00

/M tokens

google

chat

anthropic/claude-3.5-haiku

Claude 3.5 Haiku, the next generation of Anthropic's fastest and most cost-effective model, is optimal for use cases where speed and affordability matter. It improves on its predecessor across every skill set.

Input

$1.00

/M tokens

Output

$5.00

/M tokens

google

chat

anthropic/claude-haiku-4-5

Claude Haiku 4.5 is the fastest, most cost-efficient model in the Claude family, delivering enhanced performance for coding, tool use, and reasoning tasks.

Input

$1.00

/M tokens

Output

$5.00

/M tokens

google

chat

anthropic/claude-sonnet-4-5@20250929

Claude Sonnet 4.5 is a significant upgrade to Claude Sonnet 4, delivering superior coding and reasoning while responding more precisely to your instructions.

Input

$3.00

/M tokens

Output

$15.00

/M tokens

google

vision

gemini-2.5-pro-preview-03-25

Gemini 2.5 Pro Preview (03/25) - Advanced reasoning and multimodal capabilities for complex tasks. Preview version.

Input

$0.00

/M tokens

Output

$0.00

/M tokens

google

chat

gemini-3-flash-preview

Our most intelligent model built for speed, combining frontier intelligence with superior search and grounding.

Input

$0.50

/M tokens

Output

$3.00

/M tokens

google

chat

gemini-3-pro-preview

Gemini 3 Pro Preview - Advanced reasoning and capabilities for complex tasks. Preview version.

Input

$2.00

/M tokens

Output

$12.00

/M tokens

google

embedding

gemini-embedding-001

Latest Gemini embedding model with enhanced capabilities. Recommended replacement for text-embedding-004.

Input

$0.15

/M tokens

Output

$0.00

/M tokens

google

image

imagen 3.0

Imagen is a text-to-image AI model developed by Google AI that can generate high-quality, realistic images from text descriptions. It is the first model of its kind to achieve state-of-the-art performance on a variety of image generation tasks.

Input

$40.00

/M tokens

Output

$0.00

/M tokens

google

image

imagen 3.0-fast

Imagen is a text-to-image AI model developed by Google AI that can generate high-quality, realistic images from text descriptions. It is the first model of its kind to achieve state-of-the-art performance on a variety of image generation tasks.

Input

$20.00

/M tokens

Output

$0.00

/M tokens

google

image

imagen-2.0

Imagen is a text-to-image AI model developed by Google AI that can generate high-quality, realistic images from text descriptions. It is the first model of its kind to achieve state-of-the-art performance on a variety of image generation tasks.

Input

$0.00

/M tokens

Output

$0.00

/M tokens

google

chat

anthropic/claude-opus-4-1@20250805

Claude Opus 4.1 advances coding performance to 74.5% on SWE-bench Verified with improved in-depth research and data analysis skills. Notable performance gains in multi-file code refactoring.

Input

$15.00

/M tokens

Output

$75.00

/M tokens

google

chat

anthropic/claude-opus-4-5@20251101

Claude Opus 4.5 is Anthropic's most intelligent model, offering the highest level of performance on complex tasks requiring deep reasoning, nuanced understanding, and sophisticated analysis.

Input

$5.00

/M tokens

Output

$25.00

/M tokens

google

chat

anthropic/claude-opus-4@20250514

Claude Opus 4 is the world's best coding model, with sustained performance on complex, long-running tasks and agent workflows.

Input

$15.00

/M tokens

Output

$75.00

/M tokens

google

chat

anthropic/claude-sonnet-4@20250514

Claude Sonnet 4 is a significant upgrade to Claude Sonnet 3.7, delivering superior coding and reasoning while responding more precisely to your instructions.

Input

$3.00

/M tokens

Output

$15.00

/M tokens

google

chat

gemini-2.0-flash

Gemini 2.0 Flash - Fast multimodal model with enhanced reasoning capabilities and broad input/output support.

Input

$0.10

/M tokens

Output

$0.40

/M tokens

google

chat

gemini-2.0-flash-001

Gemini 2.0 Flash delivers enhanced performance, doubling the speed of 1.5 Pro while supporting multimodal inputs and outputs, including text, images, and multilingual text-to-speech. It also enables seamless tool integration for advanced functionality like code execution and third-party functions.

Input

$0.15

/M tokens

Output

$0.60

/M tokens

google

chat

gemini-2.0-flash-lite-001

Gemini 2.0 Flash Lite - Ultra-fast, lightweight model optimized for high-volume, low-latency tasks. Ideal for simple operations at scale.

Input

$0.02

/M tokens

Output

$0.08

/M tokens

google

chat

gemini-2.5-flash

Gemini 2.5 Flash - Fast and efficient model for high-volume tasks with enhanced capabilities. Generally available version.

Input

$0.30

/M tokens

Output

$2.50

/M tokens

google

chat

gemini-2.5-flash-lite

Gemini 2.5 Flash-Lite - Our most cost-efficient and fastest 2.5 model yet. Optimized for high-volume, low-latency tasks.

Input

$0.10

/M tokens

Output

$0.40

/M tokens

google

chat

gemini-2.5-flash-lite-preview-09-2025

Gemini 2.5 Flash Lite Preview (09/2025) - Ultra-fast lightweight model optimized for speed with comprehensive multimodal support.

Input

$0.10

/M tokens

Output

$0.40

/M tokens

google

chat

gemini-2.5-flash-preview-09-2025

Gemini 2.5 Flash Preview (09/2025) - High-speed multimodal model with enhanced performance for fast inference and reasoning.

Input

$0.30

/M tokens

Output

$2.50

/M tokens

google

chat

gemini-2.5-pro

Gemini 2.5 Pro - Advanced reasoning and capabilities for complex tasks. Generally available version.

Input

$1.25

/M tokens

Output

$10.00

/M tokens

google

image

imagen-4.0-fast-generate-001

Imagen 4.0 Fast - Optimized for speed with lower latency image generation. Ideal for real-time applications and high-volume tasks while maintaining quality output.

Input

$20.00

/M tokens

Output

$0.00

/M tokens

google

image

imagen-4.0-generate-001

Imagen 4.0 Standard - Text-to-image generation with unprecedented photorealism, advanced prompt understanding, and fine-grained control. Supports various aspect ratios and multilingual prompts.

Input

$40.00

/M tokens

Output

$0.00

/M tokens

google

image

imagen-4.0-ultra-generate-001

Imagen 4.0 Ultra - Highest quality image generation with maximum detail and fidelity. Optimized for professional and artistic use cases requiring exceptional output.

Input

$60.00

/M tokens

Output

$0.00

/M tokens

google

chat

llama-3.3-70b-instruct-maas

The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction-tuned generative model with 70B parameters (text in/text out).

Input

$0.72

/M tokens

Output

$0.72

/M tokens

google

chat

llama-4-maverick-17b-128e-instruct-maas

The Meta Llama 4 Maverick 17B-128E is a high-performance multilingual large language model with 128 experts for enhanced reasoning capabilities.

Input

$0.30

/M tokens

Output

$1.11

/M tokens

google

chat

llama-4-scout-17b-16e-instruct-maas

The Meta Llama 4 Scout 17B-16E is a high-performance multilingual large language model with 16 experts optimized for efficient inference.

Input

$0.24

/M tokens

Output

$0.24

/M tokens

google

chat

mistralai/mistral-small-2503@001

Mistral Small 3.1 (25.03) is the enhanced version of Mistral Small 3, featuring multimodal capabilities and an extended context length of up to 128k. It can process and understand visual inputs as well as long documents. Designed with low-latency applications in mind and delivers best-in-class efficiency for tasks such as programming, mathematical reasoning, document understanding, dialogue, visual understanding, and summarization.

Input

$1.00

/M tokens

Output

$3.00

/M tokens

google

embedding

text-embedding-005

Latest text embedding model with improved performance and accuracy for semantic search and similarity tasks.

Input

$0.00

/M tokens

Output

$0.00

/M tokens

google

embedding

text-multilingual-embedding-002

Multilingual text embedding model supporting 100+ languages for cross-lingual semantic search and similarity.

Input

$0.01

/M tokens

Output

$0.00

/M tokens

google-ai

chat

gemini-2.0-flash

Multimodal understanding. Realtime streaming. Native tool use.

Input

$0.10

/M tokens

Output

$0.40

/M tokens

google-ai

chat

gemini-2.5-flash

Gemini 2.5 Flash - Fast and efficient model for high-volume tasks with enhanced capabilities. Generally available version.

Input

$0.30

/M tokens

Output

$2.50

/M tokens

google-ai

chat

gemini-2.5-flash-lite

Gemini 2.5 Flash-Lite - Our most cost-efficient and fastest 2.5 model yet. Optimized for high-volume, low-latency tasks.

Input

$0.10

/M tokens

Output

$0.40

/M tokens

google-ai

chat

gemini-2.5-pro

Gemini 2.5 Pro - Advanced reasoning and capabilities for complex tasks. Generally available version.

Input

$1.25

/M tokens

Output

$10.00

/M tokens

google-ai

vision

gemini-1.5-pro-exp-0801

Gemini 1.5 Pro Experiment 0801 delivers enhanced performance with experimental features.

Input

$0.00

/M tokens

Output

$0.00

/M tokens

google-ai

vision

gemini-1.5-pro-exp-0827

Gemini 1.5 Pro Experiment 0827 delivers enhanced performance with experimental features.

Input

$0.00

/M tokens

Output

$0.00

/M tokens

google-ai

chat

gemini-2.0-flash-001

Multimodal understanding. Realtime streaming. Native tool use.

Input

$0.10

/M tokens

Output

$0.40

/M tokens

google-ai

chat

gemini-2.0-flash-lite-001

Gemini 2.0 Flash Lite - Ultra-fast, lightweight model optimized for high-volume, low-latency tasks. Ideal for simple operations at scale.

Input

$0.02

/M tokens

Output

$0.08

/M tokens

google-ai

chat

gemini-2.0-flash-lite-001

Google's most cost-efficient Gemini model yet.

Input

$0.08

/M tokens

Output

$0.30

/M tokens

google-ai

chat

gemini-2.0-flash-thinking-exp-01-21

Multimodal understanding. Reasoning. Coding

Input

$0.00

/M tokens

Output

$0.00

/M tokens

google-ai

chat

gemini-3-flash-preview

Our most intelligent model built for speed, combining frontier intelligence with superior search and grounding.

Input

$0.50

/M tokens

Output

$3.00

/M tokens

google-ai

chat

gemini-3-pro-preview

Gemini 3 Pro Preview - Advanced reasoning and capabilities for complex tasks. Preview version.

Input

$2.00

/M tokens

Output

$12.00

/M tokens

groq

chat

llama-3.3-70b-versatile

Meta Llama 3.3 70B versatile model with extended 128k context window, offering state-of-the-art performance for a wide range of tasks.

Input

$0.59

/M tokens

Output

$0.79

/M tokens

groq

chat

llama-prompt-guard-2-86m

Llama Prompt Guard 2 is Meta's specialized classifier model designed to detect and prevent prompt attacks in LLM applications. Part of Meta's Purple Llama initiative, this 86M parameter model identifies malicious inputs like prompt injections and jailbreaks across multiple languages. The model provides efficient, real-time protection while maintaining low latency and compute costs.

Input

$0.04

/M tokens

Output

$0.04

/M tokens

groq

chat

meta-llama/llama-4-scout-17b-16e-instruct

The Llama 4 collection of models are natively multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding.

Input

$0.11

/M tokens

Output

$0.34

/M tokens

groq

chat

MoonshotAI Kimi K2 instruct 0905

Kimi K2 0905 is Moonshot AI's latest improved version of the Kimi K2 model, featuring enhanced coding capabilities with superior frontend development and tool calling performance.

Input

$1.00

/M tokens

Output

$3.00

/M tokens

groq

chat

openai/gpt-oss-120b

OpenAI GPT-OSS 120B model with 131k context window and ~500 tokens/sec speed. Supports reasoning, browser search, and code execution. Preview model not recommended for production.

Input

$0.02

/M tokens

Output

$0.58

/M tokens

groq

chat

openai/gpt-oss-20b

OpenAI GPT-OSS 20B model with 131k context window and ~1000 tokens/sec speed. Supports reasoning, browser search, and code execution. Preview model not recommended for production.

Input

$0.01

/M tokens

Output

$0.25

/M tokens

groq

chat

gemma2-9b-it

Gemma 2 9B is a lightweight, state-of-the-art model from Google, offering strong performance for a variety of text generation tasks with an 8k context window.

Input

$0.00

/M tokens

Output

$0.00

/M tokens

groq

chat

llama-3.1-70b-versatile

Meta Llama 3.1 70B versatile model with extended context window

Input

$0.00

/M tokens

Output

$0.00

/M tokens

groq

chat

meta-llama/llama-4-maverick-17b-128e-instruct

Llama 4 Maverick is a multimodal, multilingual, mixture of experts model with 17Bx128E parameters. Optimized for agentic use cases with enhanced reasoning capabilities.

Input

$0.20

/M tokens

Output

$0.60

/M tokens
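Several entries above describe mixture-of-experts models (Llama 4 Maverick's "17Bx128E", Scout's "16 experts"): a router activates only a few experts per token instead of the full network, which is why a model can have large total parameters but a much smaller active count. A toy sketch of top-k routing, with made-up router scores (illustrative only, not Meta's implementation):

```python
import math

def top_k_route(logits: list[float], k: int = 2) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    exps = [math.exp(logits[i]) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

# Router scores for 4 hypothetical experts; only 2 are activated per token,
# and their mixing weights sum to 1.
routes = top_k_route([0.1, 2.0, -1.0, 1.5], k=2)
print(routes)
```

The token's output is then the weighted sum of the selected experts' outputs, so compute scales with k rather than with the total expert count.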

groq

chat

meta-llama/llama-guard-4-12b

Llama Guard 4 12B is Meta's latest safety classifier model designed to detect harmful content in both user inputs and model outputs. Supports 128k context window.

Input

$0.20

/M tokens

Output

$0.20

/M tokens

groq

chat

mistral-saba-24b

Mistral Saba 24B is a specialized language model for the Middle East & South Asia, offering strong performance with a 32k context window.

Input

$0.00

/M tokens

Output

$0.00

/M tokens

groq

chat

qwen3-32b-131k

Qwen3 32B is a large language model with 131k context window, offering strong multilingual capabilities and reasoning performance.

Input

$0.00

/M tokens

Output

$0.00

/M tokens

groq

stt

whisper-large-v3

OpenAI Whisper Large v3 model for speech-to-text transcription. State-of-the-art performance on multilingual transcription tasks.

Input

$1.11

/M tokens

Output

$0.00

/M tokens

groq

stt

whisper-large-v3-turbo

OpenAI Whisper Large v3 Turbo model. Optimized for faster transcription with minimal quality loss compared to the standard v3 model.

Input

$0.04

/M tokens

Output

$0.00

/M tokens

jina

embedding

jina-clip-v1

Multimodal embedding models for images and English text

Input

$0.00

/M tokens

Output

$0.02

/M tokens

jina

embedding

jina-clip-v2

Multilingual multimodal embeddings for text and images

Input

$0.00

/M tokens

Output

$0.02

/M tokens

jina

embedding

jina-embeddings-v2-base-code

Optimized for code and docstring search

Input

$0.00

/M tokens

Output

$0.02

/M tokens

jina

embedding

jina-embeddings-v3

Frontier multilingual embedding model with SOTA performance

Input

$0.00

/M tokens

Output

$0.02

/M tokens

jina

rerank

jina-reranker-v1-base-en

Our first reranker model maximizing search and RAG relevance

Input

$0.00

/M tokens

Output

$0.02

/M tokens

jina

embedding

jina-embeddings-v2-base-de

German-English bilingual embeddings with SOTA performance

Input

$0.00

/M tokens

Output

$0.02

/M tokens

jina

embedding

jina-embeddings-v2-base-en

On par with OpenAI's text-embedding-ada-002

Input

$0.00

/M tokens

Output

$0.02

/M tokens

jina

embedding

jina-embeddings-v2-base-es

Spanish-English bilingual embeddings with SOTA performance

Input

$0.00

/M tokens

Output

$0.02

/M tokens

jina

embedding

jina-embeddings-v2-base-zh

Chinese-English bilingual embeddings with SOTA performance

Input

$0.00

/M tokens

Output

$0.02

/M tokens

jina

rerank

jina-reranker-v1-tiny-en

The fastest reranker model, best suited for ranking a large number of documents reliably

Input

$0.00

/M tokens

Output

$0.02

/M tokens

jina

rerank

jina-reranker-v1-turbo-en

The best combination of fast inference speed and accurate relevance scores

Input

$0.00

/M tokens

Output

$0.02

/M tokens

jina

rerank

jina-reranker-v2-base-multilingual

Our latest and best reranker model, with multilingual, function-calling, and code-search support.

Input

$0.00

/M tokens

Output

$0.02

/M tokens
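The rerankers above share one interface shape: given a query and candidate documents, return the candidates ordered by relevance score. A toy offline stand-in using word-overlap scoring (the hosted models use learned cross-encoders; this only illustrates the input/output contract):

```python
def rerank(query: str, documents: list[str]) -> list[tuple[int, float]]:
    """Score each document by word overlap with the query; best first.

    Returns (document_index, score) pairs sorted by descending score.
    """
    q = set(query.lower().split())
    scores = []
    for idx, doc in enumerate(documents):
        words = set(doc.lower().split())
        scores.append((idx, len(q & words) / max(len(q), 1)))
    return sorted(scores, key=lambda s: s[1], reverse=True)

docs = ["jina reranker models", "unrelated cooking recipe", "reranker for search"]
print(rerank("search reranker", docs))
```

In a RAG pipeline this step typically sits between retrieval (embedding search over many documents) and generation (passing only the top-ranked few to the LLM).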

leonardoai

image

leonardo-diffusion-xl

The next phase of the core Leonardo model. Stunning outputs, even with short prompts.

Input

$80.00

/M tokens

Output

$80.00

/M tokens

leonardoai

image

leonardo-kino-xl

A model with a strong focus on cinematic outputs. Excels at wider aspect ratios

Input

$80.00

/M tokens

Output

$80.00

/M tokens

leonardoai

image

leonardo-lightning-xl

High-speed generalist image gen model. Great at everything from photorealism to painterly styles.

Input

$80.00

/M tokens

Output

$80.00

/M tokens

leonardoai

image

leonardo-vision-xl

A versatile model that excels at realism and photography. Better results with longer prompts.

Input

$80.00

/M tokens

Output

$80.00

/M tokens

mistral

chat

Magistral Medium 1.2

Our frontier-class multimodal reasoning model, updated September 2025.

Input

$0.00

/M tokens

Output

$0.00

/M tokens

mistral

chat

magistral-medium-2509

Our frontier-class reasoning model release candidate September 2025.

Input

$2.00

/M tokens

Output

$5.00

/M tokens

mistral

chat

magistral-small-2509

Our efficient reasoning model released September 2025.

Input

$0.50

/M tokens

Output

$1.50

/M tokens

mistral

chat

ministral-14b-2512

The largest model in the Ministral 3 family with state-of-the-art capabilities comparable to Mistral Small 3.2 24B.

Input

$0.20

/M tokens

Output

$0.20

/M tokens

mistral

chat

ministral-3b-2410

World's best edge model.

Input

$0.15

/M tokens

Output

$0.15

/M tokens

mistral

chat

ministral-3b-2512

The smallest and most efficient model in the Ministral 3 family, optimized for edge deployment.

Input

$0.10

/M tokens

Output

$0.10

/M tokens

mistral

chat

ministral-8b-2410

Powerful edge model with extremely high performance/price ratio.

Input

$0.15

/M tokens

Output

$0.15

/M tokens

mistral

embedding

mistral-embed

Our state-of-the-art semantic model for extracting representations of text extracts.

Input

$0.10

/M tokens

Output

$0.10

/M tokens

mistral

chat

mistral-large-2411

Our top-tier reasoning model for high-complexity tasks, with the latest version released November 2024.

Input

$2.00

/M tokens

Output

$6.00

/M tokens

mistral

chat

mistral-large-2512

Flagship open-weight multimodal model with 41B active parameters and 675B total parameters. Our top-tier reasoning model for high-complexity tasks.

Input

$0.50

/M tokens

Output

$1.50

/M tokens

mistral

chat

mistral-medium-2505

Our frontier-class multimodal model released May 2025.

Input

$0.40

/M tokens

Output

$2.00

/M tokens

mistral

chat

mistral-medium-2508

Update on Mistral Medium 3 with improved capabilities.

Input

$0.40

/M tokens

Output

$2.00

/M tokens

mistral

chat

mistral-medium-latest

Mistral Medium balances state-of-the-art performance with 8X lower cost and simpler deployability. Designed for professional use cases, especially coding and multimodal understanding. Performs at or above 90% of Claude Sonnet 3.7 on benchmarks.

Input

$0.00

/M tokens

Output

$0.00

/M tokens

mistral

chat

mistral-small-2506

Our enterprise-grade small model, with the latest version released June 2025.

Input

$0.10

/M tokens

Output

$0.30

/M tokens

mistral

chat

mistral-small-latest

Mistral Small is a leader in the small models category, offering excellent performance for general-purpose text tasks. Latest version with 128k context window and multimodal capabilities including image understanding.

Input

$0.00

/M tokens

Output

$0.00

/M tokens

mistral

chat

pixtral-12b-2409

A 12B model with image understanding capabilities in addition to text.

Input

$0.15

/M tokens

Output

$0.15

/M tokens

mistral

chat

pixtral-large-2411

Official pixtral-large-2411 Mistral AI model

Input

$2.00

/M tokens

Output

$6.00

/M tokens

moonshotai

chat

kimi-k2-0711-preview

128k context length. MoE-architecture base model with 1T total parameters and 32B activated parameters, featuring powerful coding and agent capabilities.

Input

$0.60

/M tokens

Output

$2.50

/M tokens

moonshotai

chat

kimi-k2-0905-preview

256k context length. Builds on the 0711 version with enhanced agentic coding, front-end code aesthetics and practicality, and improved context understanding.

Input

$0.60

/M tokens

Output

$2.50

/M tokens

moonshotai

chat

kimi-k2-thinking

K2 long-horizon thinking model. Supports a 256k context window and multi-step tool use and reasoning; excels at solving more complex problems.

Input

$0.60

/M tokens

Output

$2.50

/M tokens

moonshotai

chat

kimi-k2-thinking-turbo

High-speed version of the K2 long-horizon thinking model. Supports a 256k context window, excels at deep reasoning, and streams output at 60-100 tokens per second.

Input

$1.15

/M tokens

Output

$8.00

/M tokens

moonshotai

chat

kimi-k2-turbo-preview

High-speed version of K2, matching the capabilities of the latest (0905) release. Output speed increased to 60-100 tokens per second; context length 256k.

Input

$1.15

/M tokens

Output

$8.00

/M tokens
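The "60-100 tokens per second" figures quoted for the turbo variants translate directly into wall-clock streaming time. A quick estimate (simple arithmetic, not a provider API):

```python
def stream_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Rough time to stream a response at a given decode speed."""
    return output_tokens / tokens_per_second

# A 3,000-token answer at the two ends of the quoted 60-100 tok/s range.
print(stream_seconds(3000, 60), stream_seconds(3000, 100))  # 50.0 30.0
```

This ignores time-to-first-token, so treat it as a lower bound on perceived latency.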

openai

chat

gpt-4.1

Outperforms GPT-4o and GPT-4o mini across the board, with major gains in coding and instruction following. It also has a larger context window, supporting up to 1 million tokens, and makes better use of that context with improved long-context comprehension.

Input

$2.00

/M tokens

Output

$8.00

/M tokens

openai

chat

gpt-4.1-mini

Outperforms GPT-4o and GPT-4o mini across the board, with major gains in coding and instruction following. It also has a larger context window, supporting up to 1 million tokens, and makes better use of that context with improved long-context comprehension.

Input

$0.40

/M tokens

Output

$1.60

/M tokens

openai

chat

gpt-4.1-nano

Outperforms GPT-4o and GPT-4o mini across the board, with major gains in coding and instruction following. It also has a larger context window, supporting up to 1 million tokens, and makes better use of that context with improved long-context comprehension.

Input

$0.10

/M tokens

Output

$0.40

/M tokens

openai

chat

gpt-4o

GPT-4o ("o" for "omni") is our most advanced model. It is multimodal (accepting text or image inputs and outputting text), and it has the same high intelligence as GPT-4 Turbo but is much more efficient—it generates text 2x faster and is 50% cheaper. Additionally, GPT-4o has the best vision and performance across non-English languages.

Input

$2.50

/M tokens

Output

$10.00

/M tokens

openai

chat

gpt-4o-mini

GPT-4o mini is our most cost-efficient small model that's smarter and cheaper than GPT-3.5 Turbo, and has vision capabilities. The model has 128K context and an October 2023 knowledge cutoff.

Input

$0.15

/M tokens

Output

$0.60

/M tokens

openai

chat

gpt-5

GPT-5 is designed for advanced logic and multi-step tasks. It excels at complex reasoning, planning, and solving problems that require multiple interconnected steps.

Input

$1.25

/M tokens

Output

$10.00

/M tokens

openai

chat

gpt-5-chat-latest

GPT-5 Chat points to the GPT-5 snapshot currently used in ChatGPT. We recommend GPT-5 for most API usage, but feel free to use this GPT-5 Chat model to test our latest improvements for chat use cases.

Input

$1.25

/M tokens

Output

$10.00

/M tokens

openai

chat

gpt-5-mini

GPT-5-mini is a lightweight version of GPT-5 optimized for cost-sensitive applications. It maintains strong reasoning capabilities while being more economical for high-volume use cases.

Input

$0.00

/M tokens

Output

$0.00

/M tokens

openai

chat

gpt-5-nano

GPT-5-nano is optimized for speed and ideal for applications requiring low latency. It delivers rapid responses while maintaining quality for real-time applications.

Input

$0.05

/M tokens

Output

$0.40

/M tokens

openai

chat

gpt-5.1

The best model for coding and agentic tasks with configurable reasoning effort.

Input

$1.25

/M tokens

Output

$10.00

/M tokens

openai

chat

gpt-5.1-chat-latest

GPT-5.1 Chat points to the GPT-5.1 snapshot currently used in ChatGPT.

Input

$1.25

/M tokens

Output

$10.00

/M tokens

openai

embedding

text-embedding-3-large

Our next-generation large embedding model, creating embeddings with up to 3072 dimensions.

Input

$0.13

/M tokens

Output

$0.00

/M tokens

openai

embedding

text-embedding-3-small

Our next-generation small embedding model, creating embeddings with up to 1536 dimensions.

Input

$0.02

/M tokens

Output

$0.00

/M tokens
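The text-embedding-3 models support shortened vectors (the embeddings API exposes a `dimensions` parameter for this). If you instead truncate an already-stored full-length vector yourself, renormalize it before computing cosine similarity. An offline sketch with a dummy vector (no API call; purely illustrative):

```python
import math

def truncate_and_normalize(vec: list[float], dims: int) -> list[float]:
    """Keep the first `dims` components and rescale to unit length."""
    short = vec[:dims]
    norm = math.sqrt(sum(x * x for x in short))
    return [x / norm for x in short]

# Dummy 4-d "embedding" shortened to 2 dimensions; the result is unit-length,
# so dot products between shortened vectors remain valid cosine similarities.
v = truncate_and_normalize([0.6, 0.8, 0.1, 0.05], dims=2)
print(v)
```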

openai

embedding

text-embedding-ada-002

Most capable 2nd generation embedding model, replacing 16 first generation models

Input

$0.10

/M tokens

Output

$0.05

/M tokens

openai

chat

chatgpt-4o-latest

GPT-4o ("o" for "omni") is our most advanced model. It is multimodal (accepting text or image inputs and outputting text), and it has the same high intelligence as GPT-4 Turbo but is much more efficient—it generates text 2x faster and is 50% cheaper. Additionally, GPT-4o has the best vision and performance across non-English languages.

Input

$0.00

/M tokens

Output

$0.00

/M tokens

openai

image

dall-e-2

DALL·E 2 can create original, realistic images and art from a text description. It can combine concepts, attributes, and styles.

Input

$0.00

/M tokens

Output

$0.00

/M tokens

openai

chat

gpt-3.5-turbo

Most capable GPT-3.5 model and optimized for chat at 1/10th the cost of text-davinci-003

Input

$0.50

/M tokens

Output

$1.50

/M tokens

openai

chat

gpt-3.5-turbo-0125

This model includes various improvements, including higher accuracy when responding in requested formats and a fix for a bug that caused a text-encoding issue for non-English-language function calls.

Input

$0.50

/M tokens

Output

$1.50

/M tokens

openai

chat

gpt-3.5-turbo-16k

Same capabilities as the standard gpt-3.5-turbo model but with 4 times the context.

Input

$3.00

/M tokens

Output

$4.00

/M tokens

openai

completion

gpt-3.5-turbo-instruct

Similar capabilities as text-davinci-003 but compatible with legacy Completions endpoint and not Chat Completions.

Input

$1.50

/M tokens

Output

$2.00

/M tokens

openai

chat

gpt-4-0125-preview

This model completes tasks like code generation more thoroughly than the previous preview model and is intended to reduce cases of "laziness" where the model doesn't complete a task. The new model also includes the fix for the bug impacting non-English UTF-8 generations.

Input

$10.00

/M tokens

Output

$30.00

/M tokens

openai

chat

gpt-4-turbo

With 128k context, fresher knowledge and the broadest set of capabilities, GPT-4 Turbo is more powerful than GPT-4 and offered at a lower price. Will always point to GPT-4 Turbo preview model

Input

$10.00

/M tokens

Output

$30.00

/M tokens

openai

chat

gpt-4-turbo-2024-04-09

With 128k context, fresher knowledge and the broadest set of capabilities, GPT-4 Turbo is more powerful than GPT-4 and offered at a lower price. This is the pinned 2024-04-09 snapshot.

Input

$10.00

/M tokens

Output

$30.00

/M tokens

openai

chat

gpt-4-vision-preview

GPT-4 with Vision, sometimes referred to as GPT-4V or gpt-4-vision-preview in the API, allows the model to take in images and answer questions about them

Input

$0.00

/M tokens

Output

$0.00

/M tokens

openai

chat

gpt-4.1-2025-04-14

Outperforms GPT-4o and GPT-4o mini across the board, with major gains in coding and instruction following. It also has a larger context window, supporting up to 1 million tokens, and makes better use of that context with improved long-context comprehension.

Input

$2.00

/M tokens

Output

$8.00

/M tokens

openai

chat

gpt-4.1-mini-2025-04-14

Outperforms GPT-4o and GPT-4o mini across the board, with major gains in coding and instruction following. It also has a larger context window, supporting up to 1 million tokens, and makes better use of that context with improved long-context comprehension.

Input

$0.40

/M tokens

Output

$1.60

/M tokens

openai

chat

gpt-4.1-nano-2025-04-14

Outperforms GPT-4o and GPT-4o mini across the board, with major gains in coding and instruction following. It also has a larger context window, supporting up to 1 million tokens, and makes better use of that context with improved long-context comprehension.

Input

$0.10

/M tokens

Output

$0.40

/M tokens

openai

chat

gpt-4o-2024-05-13

GPT-4o ("o" for "omni") is our most advanced model. It is multimodal (accepting text or image inputs and outputting text), and it has the same high intelligence as GPT-4 Turbo but is much more efficient—it generates text 2x faster and is 50% cheaper. Additionally, GPT-4o has the best vision and performance across non-English languages.

Input

$5.00

/M tokens

Output

$15.00

/M tokens

openai

chat

gpt-4o-2024-08-06

GPT-4o ("o" for "omni") is our most advanced model. It is multimodal (accepting text or image inputs and outputting text), and it has the same high intelligence as GPT-4 Turbo but is much more efficient—it generates text 2x faster and is 50% cheaper. Additionally, GPT-4o has the best vision and performance across non-English languages.

Input

$2.50

/M tokens

Output

$10.00

/M tokens

openai

chat

gpt-4o-mini-2024-07-18

GPT-4o mini is our most cost-efficient small model that's smarter and cheaper than GPT-3.5 Turbo, and has vision capabilities. The model has 128K context and an October 2023 knowledge cutoff.

Input

$0.15

/M tokens

Output

$0.60

/M tokens

openai

stt

gpt-4o-mini-transcribe

Fast and affordable GPT-4o mini speech-to-text transcription model. Optimized for speed and cost efficiency while maintaining high transcription quality.

Input

$1.25

/M tokens

Output

$5.00

/M tokens

openai

tts

gpt-4o-mini-tts

Fast and affordable GPT-4o mini text-to-speech model. Optimized for speed and cost efficiency while delivering high-quality audio output.

Input

$0.60

/M tokens

Output

$12.00

/M tokens

openai

stt

gpt-4o-transcribe

GPT-4o speech-to-text transcription model with support for audio and text inputs. Provides high-quality transcription with optional prompts for context.

Input

$2.50

/M tokens

Output

$10.00

/M tokens

openai

chat

gpt-5-pro

GPT-5 Pro is our most advanced reasoning model, offering the highest level of intelligence. Optimized for complex multi-step reasoning tasks requiring deep analysis and planning.

Input

$15.00

/M tokens

Output

$120.00

/M tokens

openai

chat

gpt-5.2

The best model for coding and agentic tasks across industries.

Input

$1.75

/M tokens

Output

$14.00

/M tokens

openai

image

gpt-image-1

New state-of-the-art image generation model. It is a natively multimodal language model that accepts both text and image inputs, and produces image outputs.

Input

$5.00

/M tokens

Output

$40.00

/M tokens

openai

image

gpt-image-1.5

Our latest image generation model, with better instruction following and adherence to prompts.

Input

$5.00

/M tokens

Output

$10.00

/M tokens

openai

chat

o1

Reasoning model designed to solve hard problems across domains.

Input

$15.00

/M tokens

Output

$60.00

/M tokens

openai

chat

o1_2024-12-17

Reasoning model designed to solve hard problems across domains.

Input

$15.00

/M tokens

Output

$60.00

/M tokens

openai

chat

o1-preview

o1-preview is the new reasoning model for complex tasks that require broad general knowledge. The model has 128K context and an October 2023 knowledge cutoff.

Input

$0.00

/M tokens

Output

$0.00

/M tokens

openai

chat

o1-preview-2024-09-12

o1-preview is the new reasoning model for complex tasks that require broad general knowledge. The model has 128K context and an October 2023 knowledge cutoff.

Input

$0.00

/M tokens

Output

$0.00

/M tokens

openai

chat

o3

o3 is a well-rounded and powerful model across domains. It sets a new standard for math, science, coding, and visual reasoning tasks. It also excels at technical writing and instruction-following. Use it to think through multi-step problems that involve analysis across text, code, and images.

Input

$2.00

/M tokens

Output

$8.00

/M tokens

openai

chat

o3-2025-04-16

o3 is a well-rounded and powerful model across domains. It sets a new standard for math, science, coding, and visual reasoning tasks. It also excels at technical writing and instruction-following. Use it to think through multi-step problems that involve analysis across text, code, and images.

Input

$2.00

/M tokens

Output

$8.00

/M tokens

openai

chat

o3-mini

Faster and cheaper reasoning model particularly good at coding, math, and science.

Input

$1.10

/M tokens

Output

$4.40

/M tokens

openai

chat

o3-mini-2025-01-31

Faster and cheaper reasoning model particularly good at coding, math, and science.

Input

$1.10

/M tokens

Output

$4.40

/M tokens

openai

chat

o4-mini

o4-mini is our latest small o-series model. It's optimized for fast, effective reasoning with exceptionally efficient performance in coding and visual tasks.

Input

$1.10

/M tokens

Output

$4.40

/M tokens

openai

chat

o4-mini-2025-04-16

o4-mini is our latest small o-series model. It's optimized for fast, effective reasoning with exceptionally efficient performance in coding and visual tasks.

Input

$1.10

/M tokens

Output

$4.40

/M tokens

openai

tts

tts-1

The latest text-to-speech model, optimized for speed.

Input

$15.00

/M tokens

Output

$0.00

/M tokens

openai

tts

tts-1-hd

The latest text-to-speech model, optimized for quality.

Input

$15.00

/M tokens

Output

$0.00

/M tokens

openai

stt

whisper

Whisper can transcribe speech into text and translate many languages into English

Input

$6.00

/M tokens

Output

$0.00

/M tokens

perplexity

chat

sonar

Standard Sonar model with 128k context window, optimized for general-purpose chat completions with web search capabilities.

Input

$1.00

/M tokens

Output

$1.00

/M tokens

perplexity

chat

sonar-deep-research

Specialized Sonar model optimized for in-depth research tasks, providing comprehensive information retrieval and analysis with 128k context window.

Input

$2.00

/M tokens

Output

$8.00

/M tokens

perplexity

chat

sonar-pro

Enhanced Sonar model with 200k context window, providing superior performance for complex tasks with web search capabilities.

Input

$3.00

/M tokens

Output

$15.00

/M tokens

perplexity

chat

sonar-reasoning

Sonar model with enhanced reasoning capabilities, providing chain-of-thought responses with 128k context window.

Input

$1.00

/M tokens

Output

$5.00

/M tokens

perplexity

chat

sonar-reasoning-pro

Advanced Sonar model with superior reasoning capabilities, providing detailed chain-of-thought responses with 128k context window.

Input

$2.00

/M tokens

Output

$8.00

/M tokens

togetherai

chat

DeepSeek R1

Performance on par with OpenAI o1

Input

$7.00

/M tokens

Output

$7.00

/M tokens

togetherai

chat

DeepSeek V3

DeepSeek-V3 0324 achieves a significant breakthrough in inference speed over previous models. It tops the leaderboard among open-source models and rivals the most advanced closed-source models globally.

Input

$1.25

/M tokens

Output

$1.25

/M tokens

togetherai

chat

DeepSeek V3.1

DeepSeek-V3.1 is the latest version of the hybrid (thinking and non-thinking) model from DeepSeek. It tops the leaderboard among open-source models and rivals the most advanced closed-source models globally.

Input

$0.60

/M tokens

Output

$1.70

/M tokens

togetherai

chat

meta-llama/Llama-3.3-70B-Instruct-Turbo

Llama 3.3 is a text-only 70B instruction-tuned model that provides enhanced performance relative to Llama 3.1 70B, and to Llama 3.2 90B when used for text-only applications. Moreover, for some applications, Llama 3.3 70B approaches the performance of Llama 3.1 405B. Llama 3.3 70B is provided only as an instruction-tuned model; a pretrained version is not available.

Input

$0.88

/M tokens

Output

$0.88

/M tokens

togetherai

chat

meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8

The Llama 4 collection of models are natively multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding.

Input

$0.27

/M tokens

Output

$0.85

/M tokens

togetherai

chat

meta-llama/Llama-4-Scout-17B-16E-Instruct

The Llama 4 collection of models are natively multimodal AI models that enable text and multimodal experiences. These models leverage a mixture-of-experts architecture to offer industry-leading performance in text and image understanding.

Input

$0.18

/M tokens

Output

$0.59

/M tokens

togetherai

chat

meta-llama/Meta-Llama-Guard-4-12B

Meta's latest Multimodal safety model, classifying text and images for safe LLM prompts and responses.

Input

$0.20

/M tokens

Output

$0.20

/M tokens

zai

image

cogView-4-250304

CogView-4 is a bilingual (Chinese/English) text-to-image generation model. Supports any-length text prompts, variable resolutions, and text-within-image generation. Ranked first on the DPG-Bench text-to-image benchmark.

Input

$0.00

/M tokens

Output

$0.00

/M tokens

zai

chat

glm-4.5

GLM-4.5 is a 355B parameter Mixture-of-Experts model with 32B active parameters. Features 128K context window, hybrid reasoning modes (Thinking/Non-Thinking), and generation speeds over 100 tokens per second. First among open-source models globally.

Input

$0.60

/M tokens

Output

$2.20

/M tokens

zai

chat

glm-4.5v

GLM-4.5V is a 106B parameter multimodal model with 12B activated parameters. Supports vision, image, video, text, and file inputs with SOTA performance across 42 visual multimodal benchmarks. Includes "Thinking Mode" for balancing speed and reasoning depth.

Input

$0.60

/M tokens

Output

$1.80

/M tokens

zai

chat

glm-4.6

GLM-4.6 is an advanced AI model with a 200K context window and 128K maximum output tokens. Features enhanced reasoning, superior coding capabilities, and improved multilingual translation. Performance comparable to Claude Sonnet 4/4.5 with over 30% more token efficiency.

Input

$0.60

/M tokens

Output

$2.20

/M tokens

Get your API key and start routing in minutes.
