Programmatic AI infrastructure.
Your agent provisions a private AI model, gets an endpoint and an API key, and starts making inference calls — in minutes, with no human involvement after key generation. OpenAI-compatible so any framework drops in without changes.
From zero to first inference
Five steps. Pick your language. Replace YOUR_KEY with the agent management key from the dashboard.
- 1. Provision a private model
Pick any model from the catalog. Returns immediately with status: provisioning. Save the instance_id.
curl -X POST https://api.auxen.ai/v1/provision \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"llama3.1-8b","capacity":1}'
- 2. Wait for ready status
Poll the instance until status flips to 'running' (~3 minutes). Save the endpoint and api_key — those are the inference creds.
INST=inst_xxxxxxxxxxxx
while :; do
  RESP=$(curl -s -H "Authorization: Bearer YOUR_KEY" \
    https://api.auxen.ai/v1/instances/$INST)
  STATUS=$(echo "$RESP" | jq -r '.data.instance.status')
  [ "$STATUS" = "running" ] && break
  sleep 15
done
INSTKEY=$(echo "$RESP" | jq -r '.data.instance.api_key')
- 3. Make an inference call
Use the per-instance auxk_ key (NOT your management key) against /v1/{instanceId}/chat or the OpenAI-compatible route at /v1/{instanceId}/v1/chat/completions.
curl -X POST https://api.auxen.ai/v1/$INST/chat \
  -H "Authorization: Bearer $INSTKEY" \
  -H "Content-Type: application/json" \
  -d '{"message":"Summarize the moon landing in one sentence."}'
- 4. Handle the response
Auxen-native shape returns {message, usage, response_time_ms}. OpenAI shape returns the standard chat.completion envelope.
# {"success":true,"data":{"message":"...","model":"llama3.1:8b","usage":{...},"response_time_ms":1247}}
- 5. Destroy when done
Stops billing immediately. Per-minute charges halt the moment the instance status flips to destroyed.
curl -X DELETE -H "Authorization: Bearer YOUR_KEY" \
  https://api.auxen.ai/v1/instances/$INST
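For agents written in Python, the five steps above might be sketched with only the standard library. This is a sketch under assumptions, not an official client: the paths `data.instance.status`, `data.instance.api_key`, and `data.message` come from the jq filters and sample response above, but the location of the instance id in the provision response (`data.instance_id` here) is a guess, and the helper names `call`, `wait_until_running`, and `run_once` are hypothetical.

```python
import json
import time
import urllib.request

BASE = "https://api.auxen.ai/v1"


def call(method, path, key, body=None):
    """Send one JSON request and return the decoded JSON response ({} if empty)."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(
        f"{BASE}{path}",
        data=data,
        method=method,
        headers={"Authorization": f"Bearer {key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        raw = resp.read()
    return json.loads(raw) if raw else {}


def wait_until_running(instance_id, mgmt_key, send=call, sleep=time.sleep,
                       interval=15, max_polls=40):
    """Poll until status is 'running' and return the instance object.

    interval and max_polls are arbitrary illustration values.
    """
    for _ in range(max_polls):
        inst = send("GET", f"/instances/{instance_id}", mgmt_key)["data"]["instance"]
        if inst["status"] == "running":
            return inst
        sleep(interval)
    raise TimeoutError(f"{instance_id} not running after {max_polls} polls")


def run_once(mgmt_key, prompt, send=call, sleep=time.sleep):
    """Provision, wait, ask one question, and always destroy the instance."""
    created = send("POST", "/provision", mgmt_key,
                   {"model": "llama3.1-8b", "capacity": 1})
    # ASSUMPTION: the exact field path of the instance id is not shown above.
    instance_id = created["data"]["instance_id"]
    try:
        inst = wait_until_running(instance_id, mgmt_key, send, sleep)
        # Inference uses the per-instance key, not the management key.
        reply = send("POST", f"/{instance_id}/chat", inst["api_key"],
                     {"message": prompt})
        return reply["data"]["message"]
    finally:
        # Destroy even if polling or inference failed, so billing stops.
        send("DELETE", f"/instances/{instance_id}", mgmt_key)
```

The `send` and `sleep` parameters exist so the flow can be exercised against a fake transport without provisioning anything; in real use, `run_once("YOUR_KEY", "Summarize the moon landing in one sentence.")` would make the same calls as the curl steps.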
Pay-As-You-Go, by the minute
One billing model. Add USD credits, deploy an instance, and we deduct per-minute against actual GPU runtime. Idle time costs nothing. Destroy when done. No billing parameter on /v1/provision — just pick a model and (optionally) a capacity multiplier.
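To budget a run before provisioning, the per-minute deduction can be estimated from the Dedicated hourly rates and capacity cost multipliers listed in this section. The sketch below assumes the per-minute charge is simply the hourly rate divided by 60, with no rounding, minimum charge, or other fees; the function names are hypothetical.

```python
# Dedicated base rates (USD per hour at 1x) and capacity cost multipliers,
# copied from the figures in this section.
DEDICATED_RATE = {"small": 0.15, "medium": 0.20, "large": 0.65, "xl": 2.85}
CAPACITY_COST = {1: 1.0, 2: 1.8, 4: 3.3, 8: 6.0}


def hourly_rate(size, capacity=1):
    """Hourly USD rate for a Dedicated instance at the given capacity."""
    return DEDICATED_RATE[size] * CAPACITY_COST[capacity]


def estimate_cost(size, capacity, minutes):
    """Estimated USD charge for `minutes` of runtime.

    ASSUMPTION: per-minute charge = hourly rate / 60, no rounding or minimum.
    """
    return hourly_rate(size, capacity) / 60 * minutes
```

Under these assumptions, a Large instance at 4× capacity costs 0.65 × 3.3 = $2.145 per hour, so `estimate_cost("large", 4, 90)` comes to roughly $3.22 for a 90-minute job.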
Shared: $0.05/hour. Small models only.
- Multi-tenant GPU pool
- No capacity scaling
- Cheapest entry point; breakeven vs Dedicated Small at ~17 hrs/month
- Set "tier":"shared" on provision (Small models only)
Dedicated: Private GPU. Capacity scales with you.
- Small $0.15/hr · Medium $0.20/hr · Large $0.65/hr · XL $2.85/hr (1×)
- Capacity 1× / 2× / 4× / 8× (1.0× / 1.8× / 3.3× / 6.0× cost)
- Same model, multiple GPUs behind a load balancer
- Pass "capacity": 4 on provision or call /scale later
MCP and agent frameworks
Auxen exposes a Model Context Protocol server at https://api.auxen.ai/mcp so Claude Desktop, Cursor, and any other MCP-compatible client can use Auxen as a native tool. The MCP server wraps the same eight management capabilities as the REST API.
{
  "mcpServers": {
    "auxen": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-remote",
        "https://api.auxen.ai/mcp",
        "--header",
        "Authorization: Bearer auxen_live_xxx"
      ]
    }
  }
}
Uses the mcp-remote stdio-bridge proxy (auto-installed on first launch) to connect Claude Desktop to Auxen's remote HTTP MCP server. Fully quit and relaunch Claude Desktop to load the change.
For LangChain, AutoGen, or any framework that consumes OpenAI-shaped APIs, point the SDK's base_url at your instance endpoint and use the auxk_ instance key as the API key. See the OpenAI compatibility section below.
Already using OpenAI? Switch in one line.
Auxen serves a strict OpenAI-compatible /v1/chat/completions route on every instance. Streaming, tool calls, and standard token-usage envelopes all work the way OpenAI clients expect.
# Before: OpenAI
from openai import OpenAI

client = OpenAI(
    base_url="https://api.openai.com/v1",
    api_key="sk-..."
)
resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role":"user","content":"Hi"}]
)

# After: Auxen
from openai import OpenAI

client = OpenAI(
    base_url="https://api.auxen.ai/v1/inst_xxx/v1",
    api_key="auxk_..."
)
resp = client.chat.completions.create(
    model="auxen",  # ignored
    messages=[{"role":"user","content":"Hi"}]
)
The model parameter is ignored: the model is determined by which instance you're hitting. Pass anything; we keep it in the response for clients that introspect.
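An agent that talks to both the Auxen-native chat route and the OpenAI-compatible route may want one place that builds the SDK base URL and pulls the reply text out of either response shape. The following is a minimal sketch: the base URL pattern is taken from the example above, the native shape from the sample response in step 4, and the helper names are hypothetical.

```python
def openai_base_url(instance_id):
    """Base URL to hand to an OpenAI-style SDK for one instance."""
    return f"https://api.auxen.ai/v1/{instance_id}/v1"


def extract_reply(payload):
    """Return the assistant text from either response shape."""
    if "choices" in payload:
        # Standard OpenAI chat.completion envelope.
        return payload["choices"][0]["message"]["content"]
    # Auxen-native shape: {"success": ..., "data": {"message": ...}}
    return payload["data"]["message"]
```

Keeping the shape check in one helper means the rest of the agent does not need to know which route produced a response.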
Errors and rate limits
Every error returns a stable code you can branch on, plus a human message and a docs link. Rate limit: 1,000 requests/minute per key. On 429s, read the Retry-After header.
| Code | HTTP | Meaning |
|---|---|---|
| API_KEY_REQUIRED | 401 | No `Authorization: Bearer …` header. |
| INVALID_API_KEY | 401 | Key revoked, deleted, or wrong environment. |
| INSUFFICIENT_CREDITS | 402 | USD credit balance is too low to start the instance. |
| RATE_LIMIT_EXCEEDED | 429 | Over 1,000 requests/min. Read `Retry-After`. |
| MODEL_NOT_FOUND | 404 | Model id doesn't exist or is inactive. |
| INSTANCE_NOT_FOUND | 404 | Instance id not yours, or already destroyed. |
| INSTANCE_NOT_RUNNING | 409 | Instance is not in `running` state — poll status first. |
| INVALID_BILLING_MODE | 400 | `billing` parameter is no longer supported — Auxen is Pay-As-You-Go only. |
| INVALID_MODEL | 400 | Unknown or inactive model id. |
| INVALID_CAPACITY | 400 | `capacity` must be 1, 2, 4, or 8. |
| PROVISIONING_FAILED | 502 | Upstream GPU provisioning error. Retry with backoff. |
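One way an agent might branch on these codes is to decide, per error, whether to retry and how long to wait. The sketch below is illustrative only: it assumes `Retry-After` carries a whole number of seconds (HTTP also permits a date, which falls through to backoff here), treats only the two codes the table marks as retryable as such, and uses arbitrary backoff parameters. The function name is hypothetical.

```python
RETRYABLE = {"RATE_LIMIT_EXCEEDED", "PROVISIONING_FAILED"}


def retry_delay(code, headers, attempt, base=1.0, cap=60.0):
    """Return seconds to wait before retrying, or None if not retryable.

    `attempt` is the zero-based count of retries already made.
    """
    if code not in RETRYABLE:
        return None
    if code == "RATE_LIMIT_EXCEEDED":
        # Header names are case-insensitive, so normalize before lookup.
        lowered = {k.lower(): v for k, v in headers.items()}
        retry_after = lowered.get("retry-after")
        # ASSUMPTION: the value is an integer number of seconds.
        if retry_after is not None and retry_after.strip().isdigit():
            return float(retry_after)
    # Capped exponential backoff for everything else that is retryable.
    return min(cap, base * 2 ** attempt)
```

Codes such as `INVALID_API_KEY` or `INSUFFICIENT_CREDITS` return `None` because repeating the same request cannot succeed, and `INSTANCE_NOT_RUNNING` is better handled by polling the instance status than by blind retries.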
Build agents on infrastructure you own.
Your private model, your endpoint, per-minute pricing — and an API surface designed for programmatic use from day one.