Question 1

What is Herma?

Accepted Answer

Herma is an intelligent AI gateway that gives you unified access to all major AI models — GPT-4o, Claude, Gemini, Mistral, DeepSeek, and more — through a single API and chat interface. Herma routes your requests to the best model for the job, tracks your usage and costs, and remembers context across conversations.

Question 2

How much does Herma cost?

Accepted Answer

Herma charges $2 per million input tokens and $8 per million output tokens. There are no subscriptions, no minimums, and no hidden fees. New accounts start with $1.00 in free credits.

Question 3

How much can I save with intelligent routing?

Accepted Answer

On average, Herma's router saves 60-90% compared to always using the most expensive model. For simple questions and routine tasks, the router selects models that cost a fraction of frontier pricing while maintaining the same quality. For complex tasks, it automatically routes to the best available model.

Question 4

How does smart model routing work?

Accepted Answer

When you send a message, Herma's routing system classifies your query by category (coding, analysis, creative, math, factual) and difficulty (easy, medium, hard), then selects the optimal model. Hard tasks like system design always use frontier models. Simple tasks use cost-effective models that match frontier quality.

Question 5

Can I use Herma with my existing code?

Accepted Answer

Yes. Herma provides an OpenAI-compatible API, so you can switch by changing just two lines of code — the base URL and your API key. Any application that works with the OpenAI API works with Herma out of the box.