# Herma AI — Intelligent LLM Router

> Herma is an AI gateway that routes queries to the optimal model for each task. Same quality as frontier models, 60-90% cheaper.

## What Herma Does

- Routes AI queries to the best model based on task complexity
- OpenAI-compatible API — change two lines of code to switch
- Maintains frontier-level quality while reducing costs by 60-90%
- Supports all major models: Claude, GPT-4, Gemini, DeepSeek, Mistral

## Pricing

- $2 per million input tokens
- $8 per million output tokens
- No subscriptions, no minimums
- Pay-as-you-go with free $1.00 starting credit

## How Routing Works

- Classifies queries into categories: coding, analysis, creative, math, factual, chat
- Estimates difficulty: easy, medium, hard
- Routes to the cheapest model that maintains quality for that specific task
- Hard tasks (system design, formal verification) always use frontier models
- Simple tasks (factual lookups, basic coding) use cost-effective models

## Quality Validation

- Benchmarked against Claude Opus 4.6 on 8 established benchmarks
- MMLU: 98.2% of Opus quality
- ARC-Challenge: 100.7% of Opus quality
- GSM8K: 100.0% of Opus quality
- HumanEval+: 102.1% of Opus quality
- MBPP+: 105.8% of Opus quality

## API Compatibility

- Drop-in replacement for the OpenAI SDK
- Works with LangChain, Vercel AI SDK, LlamaIndex
- Streaming support
- Tool/function calling support

## Environment Variables

- HERMA_API_KEY — your API key (starts with hk-)
- HERMA_BASE_URL — base URL (default: https://api.hermaai.com/v1)

## Quick Start

Python:

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.hermaai.com/v1",
    api_key=os.environ["HERMA_API_KEY"],
)

response = client.chat.completions.create(
    model="herma-auto",
    messages=[{"role": "user", "content": "Hello!"}],
)
```

Node.js:

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.HERMA_API_KEY,
  baseURL: "https://api.hermaai.com/v1",
});

const response = await client.chat.completions.create({
  model: "herma-auto",
  messages: [{ role: "user", content: "Hello!" }],
});
```

LangChain (Python):

```python
import os

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="herma-auto",
    openai_api_key=os.environ["HERMA_API_KEY"],
    openai_api_base="https://api.hermaai.com/v1",
)
```

Vercel AI SDK:

```javascript
import { createOpenAI } from "@ai-sdk/openai";
import { generateText } from "ai";

const herma = createOpenAI({
  apiKey: process.env.HERMA_API_KEY,
  baseURL: "https://api.hermaai.com/v1",
});

const { text } = await generateText({
  model: herma("herma-auto"),
  prompt: "Hello!",
});
```

LlamaIndex (Python):

```python
import os

from llama_index.llms.openai import OpenAI as LlamaOpenAI

llm = LlamaOpenAI(
    model="herma-auto",
    api_key=os.environ["HERMA_API_KEY"],
    api_base="https://api.hermaai.com/v1",
)
```

## API Endpoints

- POST /v1/chat/completions — Chat completion requests (OpenAI-compatible)
- GET /v1/models — List available models (herma-auto + upstream)
- POST /v1/classify — Test the router: see classification, model selection, savings (free, no auth)

## Open-Source

- herma-eval: Benchmark toolkit for LLM routers (pip install herma-eval)
- 7 benchmarks: GSM8K, HumanEval+, MMLU, MBPP+, ARC-Challenge, LiveCodeBench, Aider Polyglot
- https://github.com/Nikobar5/herma-eval

## AI Coding Tool Integration

Drop a configuration file into your project root so your AI coding assistant automatically uses Herma for all LLM calls:

- Claude Code (CLAUDE.md): https://hermaai.com/integration/CLAUDE.md
- Cursor (.cursorrules): https://hermaai.com/integration/cursor-rules.txt
- Windsurf (.windsurfrules): https://hermaai.com/integration/windsurf-rules.txt
- Codex / Devin / other agents (AGENTS.md): https://hermaai.com/integration/AGENTS.md
- Environment variables (.env.example): https://hermaai.com/integration/.env.example

## Links

- Website: https://hermaai.com
- Documentation: https://hermaai.com/docs
- Blog: https://hermaai.com/blog
- Benchmarking Methodology: https://hermaai.com/blog/how-we-benchmark
- Try it free: https://hermaai.com/demo
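
## Example: Estimating Request Cost

The flat per-token pricing above ($2 per million input tokens, $8 per million output tokens) makes per-request cost easy to compute. The sketch below is illustrative: the helper name and token counts are made up for the example, not part of the Herma API.

```python
# Herma list prices from the Pricing section above.
INPUT_PRICE_PER_M = 2.00   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 8.00  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a single request at list prices."""
    return (
        (input_tokens / 1_000_000) * INPUT_PRICE_PER_M
        + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M
    )

# A request with 100k input tokens and 20k output tokens:
# 0.1 * $2 + 0.02 * $8 = $0.36
print(f"${estimate_cost(100_000, 20_000):.2f}")
```

At these rates the free $1.00 starting credit covers roughly 500k input tokens or 125k output tokens.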
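
## Example: The Routing Policy in Miniature

The flow described in "How Routing Works" — classify the query, estimate difficulty, then pick the cheapest model that holds quality — can be sketched as a small decision table. The category and difficulty labels come from the documentation above; the tier names returned here are placeholders, since Herma's actual classifier and model pool are internal.

```python
# Illustrative sketch of the routing policy: classify -> estimate
# difficulty -> route to the cheapest tier that maintains quality.
# The returned tier names are made-up placeholders, not real models.
CATEGORIES = {"coding", "analysis", "creative", "math", "factual", "chat"}
DIFFICULTIES = ("easy", "medium", "hard")

def route(category: str, difficulty: str) -> str:
    """Map a classified query to a model tier."""
    if category not in CATEGORIES or difficulty not in DIFFICULTIES:
        raise ValueError("unknown category or difficulty")
    # Hard tasks (system design, formal verification) always go frontier.
    if difficulty == "hard":
        return "frontier-model"
    # Simple tasks (factual lookups, basic coding) use cost-effective models.
    if difficulty == "easy":
        return "cost-effective-model"
    return "mid-tier-model"

print(route("factual", "easy"))  # cost-effective-model
print(route("coding", "hard"))   # frontier-model
```

To see what the real router decides for a given query, POST it to the free, unauthenticated /v1/classify endpoint listed under API Endpoints.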