Auxiliary Intelligence · For Agents

Programmatic AI infrastructure.

Your agent provisions a private AI model, receives an endpoint and an API key, and starts making inference calls within minutes, with no human involvement after key generation. The API is OpenAI-compatible, so any framework that speaks the OpenAI API works once you point it at your instance's base URL and key.

Get your API key · Full API reference →

🔒 Per-instance API keys · ⚡ ~3 min to a live endpoint · ↔ OpenAI-compatible · 📊 1,000 req/min/key

Quickstart

From zero to first inference

Five steps, shown here with curl. Replace YOUR_KEY with the agent management key from the dashboard.

1. Provision a private model

Pick any model from the catalog. The call returns immediately with status: provisioning. Save the instance_id from the response; every later step needs it.

```
curl -X POST https://api.auxen.ai/v1/provision \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"llama3.1-8b","capacity":1}'
```
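The same call can be made from Python. The sketch below uses only the standard library and separates building the request from sending it, so the payload can be inspected without touching the network. The helper names (`build_provision_request`, `provision`) are illustrative and not part of any Auxen SDK. The response is returned as parsed JSON because the exact location of `instance_id` in the body is not shown in this guide.

```python
import json
import urllib.request

API_BASE = "https://api.auxen.ai/v1"


def build_provision_request(
    management_key: str, model: str, capacity: int = 1
) -> urllib.request.Request:
    """Build (but do not send) the POST /v1/provision request."""
    # The documented capacity values are 1, 2, 4, and 8.
    if capacity not in (1, 2, 4, 8):
        raise ValueError("capacity must be 1, 2, 4, or 8")
    body = json.dumps({"model": model, "capacity": capacity}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/provision",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {management_key}",
            "Content-Type": "application/json",
        },
    )


def provision(
    management_key: str, model: str, capacity: int = 1, timeout: float = 30.0
) -> dict:
    """Send the provision request and return the parsed JSON body as-is."""
    req = build_provision_request(management_key, model, capacity)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Calling `provision("YOUR_KEY", "llama3.1-8b")` performs the real request; validating `capacity` locally avoids a round trip that would end in `INVALID_CAPACITY`.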

2. Wait for ready status

Poll the instance until status flips to 'running' (about 3 minutes). Save the endpoint and api_key from the response; these are your inference credentials.

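```
# instance_id returned by /v1/provision in step 1
INST=inst_xxxxxxxxxxxx
# Poll every 15 seconds until the instance reports "running"
while :; do
  RESP=$(curl -s -H "Authorization: Bearer YOUR_KEY" \
    "https://api.auxen.ai/v1/instances/$INST")
  STATUS=$(echo "$RESP" | jq -r '.data.instance.status')
  [ "$STATUS" = "running" ] && break
  sleep 15
done
# Per-instance inference key, used in step 3
INSTKEY=$(echo "$RESP" | jq -r '.data.instance.api_key')
```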
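The shell loop above polls forever if the instance never reaches `running`. A Python sketch of the same polling with a deadline follows. The `data.instance` path comes from the jq filters above; the function names, the 10-minute default timeout, and the injectable `sleep`/`clock` parameters are assumptions made for illustration and testability.

```python
import json
import time
import urllib.request
from typing import Callable

API_BASE = "https://api.auxen.ai/v1"


def fetch_instance(management_key: str, instance_id: str, timeout: float = 30.0) -> dict:
    """GET /v1/instances/{id} and return the object under data.instance."""
    req = urllib.request.Request(
        f"{API_BASE}/instances/{instance_id}",
        headers={"Authorization": f"Bearer {management_key}"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["data"]["instance"]


def wait_for_running(
    fetch: Callable[[], dict],
    timeout_s: float = 600.0,
    interval_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch() until status is 'running'; raise TimeoutError at the deadline."""
    deadline = clock() + timeout_s
    while True:
        instance = fetch()
        if instance.get("status") == "running":
            return instance  # carries the endpoint and api_key fields
        if clock() >= deadline:
            raise TimeoutError(
                f"instance still {instance.get('status')!r} after {timeout_s:.0f}s"
            )
        sleep(interval_s)
```

A typical call would be `instance = wait_for_running(lambda: fetch_instance("YOUR_KEY", "inst_xxxxxxxxxxxx"))`, after which `instance["api_key"]` holds the inference key.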

3. Make an inference call

Use the per-instance auxk_ key (not your management key) against the native /v1/{instanceId}/chat route or the OpenAI-compatible /v1/{instanceId}/v1/chat/completions route.

```
curl -X POST https://api.auxen.ai/v1/$INST/chat \
  -H "Authorization: Bearer $INSTKEY" \
  -H "Content-Type: application/json" \
  -d '{"message":"Summarize the moon landing in one sentence."}'
```
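A Python sketch of the native chat call is shown below. It builds the request without sending it and rejects keys that do not start with `auxk_`, which catches the common mistake of passing the management key. The prefix check is an assumption based on the key naming in this guide, and `build_chat_request` and `chat` are hypothetical helper names.

```python
import json
import urllib.request

API_BASE = "https://api.auxen.ai/v1"


def build_chat_request(
    instance_id: str, instance_key: str, message: str
) -> urllib.request.Request:
    """Build (but do not send) POST /v1/{instance_id}/chat."""
    # Assumption: every per-instance inference key carries the auxk_ prefix.
    if not instance_key.startswith("auxk_"):
        raise ValueError("expected a per-instance auxk_ key, not the management key")
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/{instance_id}/chat",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {instance_key}",
            "Content-Type": "application/json",
        },
    )


def chat(instance_id: str, instance_key: str, message: str, timeout: float = 60.0) -> dict:
    """Send the chat request and return the parsed JSON body."""
    req = build_chat_request(instance_id, instance_key, message)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```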

4. Handle the response

The Auxen-native route returns {message, model, usage, response_time_ms} wrapped in a {success, data} envelope. The OpenAI-compatible route returns the standard chat.completion envelope.

```
# {"success":true,"data":{"message":"...","model":"llama3.1:8b","usage":{...},"response_time_ms":1247}}
```
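One way to unpack the native envelope in Python is sketched below. The field names come from the sample response above; the `ChatResult` type and `parse_chat_response` function are illustrative, and the handling of `success: false` is an assumption because the failure body is not shown in this step.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Union


@dataclass
class ChatResult:
    message: str
    model: str
    usage: Dict[str, Any]  # treated as opaque; its inner fields are not documented here
    response_time_ms: int


def parse_chat_response(raw: Union[str, bytes]) -> ChatResult:
    """Parse the Auxen-native {success, data} envelope into a ChatResult."""
    body = json.loads(raw)
    if not body.get("success"):
        # Assumption: failures set success to false; surface the whole body.
        raise RuntimeError(f"Auxen call failed: {body!r}")
    data = body["data"]
    return ChatResult(
        message=data["message"],
        model=data["model"],
        usage=data.get("usage", {}),
        response_time_ms=data["response_time_ms"],
    )
```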

5. Destroy when done

Destroying the instance stops billing immediately: per-minute charges halt the moment the instance status flips to destroyed.

```
curl -X DELETE -H "Authorization: Bearer YOUR_KEY" \
  https://api.auxen.ai/v1/instances/$INST
```
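Because billing runs until the instance is destroyed, an agent may want teardown to happen even when its own work raises. The sketch below wraps the lifecycle in a context manager. `ephemeral_instance` and `build_destroy_request` are hypothetical names, and the `provision` and `destroy` callables are supplied by the caller.

```python
import contextlib
import urllib.request
from typing import Callable, Iterator

API_BASE = "https://api.auxen.ai/v1"


def build_destroy_request(management_key: str, instance_id: str) -> urllib.request.Request:
    """Build (but do not send) DELETE /v1/instances/{instance_id}."""
    return urllib.request.Request(
        f"{API_BASE}/instances/{instance_id}",
        method="DELETE",
        headers={"Authorization": f"Bearer {management_key}"},
    )


@contextlib.contextmanager
def ephemeral_instance(
    provision: Callable[[], str], destroy: Callable[[str], None]
) -> Iterator[str]:
    """Yield an instance id and always call destroy(), even if the body raises."""
    instance_id = provision()
    try:
        yield instance_id
    finally:
        destroy(instance_id)
```

In practice `destroy` would send the request built by `build_destroy_request` with `urllib.request.urlopen`; passing it in as a callable keeps the teardown logic testable without network access.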

Pay-As-You-Go, by the minute

One billing model. Add USD credits, deploy an instance, and we deduct per minute against actual GPU runtime. Idle time costs nothing. Destroy when done. There is no billing parameter on /v1/provision: pick a model and, optionally, a capacity multiplier.

Shared

$0.05/hour. Small models only.

· Multi-tenant GPU pool
· No capacity scaling
· Cheapest entry point — breakeven vs Dedicated Small at ~17 hrs/month
· Set "tier":"shared" on provision (Small models only)

Best for: low-volume internal agents, experiments, anything where you're hours/month rather than hours/day.

Dedicated

Private GPU. Capacity scales with you.

· Small $0.15/hr · Medium $0.20/hr · Large $0.65/hr · XL $2.85/hr (1×)
· Capacity 1× / 2× / 4× / 8× (1.0× / 1.8× / 3.3× / 6.0× cost)
· Same model, multiple GPUs behind a load balancer
· Pass "capacity": 4 on provision or call /scale later

Best for: production endpoints, persistent agents, anything that needs throughput, isolation, or fine-tuning headroom.
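As a worked example of the Dedicated price list, the sketch below estimates the cost of a run. It assumes the per-minute rate is the hourly rate divided by 60 and that the capacity cost multiplier applies to the 1× hourly price; how charges are rounded is not stated here, so the result is an estimate. The function and table names are illustrative.

```python
# Hourly prices at 1x capacity and cost multipliers, copied from the list above.
DEDICATED_HOURLY_USD = {"small": 0.15, "medium": 0.20, "large": 0.65, "xl": 2.85}
CAPACITY_COST_MULTIPLIER = {1: 1.0, 2: 1.8, 4: 3.3, 8: 6.0}


def dedicated_cost_usd(size: str, capacity: int, minutes: float) -> float:
    """Estimate the cost of running a Dedicated instance for `minutes` minutes."""
    hourly = DEDICATED_HOURLY_USD[size] * CAPACITY_COST_MULTIPLIER[capacity]
    return hourly / 60 * minutes


# Medium at 4x for 90 minutes: 0.20 * 3.3 = 0.66 USD/hr, times 1.5 hr
estimate = dedicated_cost_usd("medium", 4, 90)
print(f"{estimate:.2f} USD")
```

Under these assumptions, quadrupling capacity on a Medium instance raises the hourly price from $0.20 to $0.66 rather than to $0.80.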

MCP and agent frameworks

Auxen exposes a Model Context Protocol server at https://api.auxen.ai/mcp so Claude Desktop, Cursor, and any other MCP-compatible client can use Auxen as a native tool. The MCP server wraps the same eight management capabilities as the REST API.

Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json)

```
{
  "mcpServers": {
    "auxen": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-remote",
        "https://api.auxen.ai/mcp",
        "--header",
        "Authorization: Bearer auxen_live_xxx"
      ]
    }
  }
}
```

Uses the mcp-remote stdio-bridge proxy (auto-installed on first launch) to connect Claude Desktop to Auxen's remote HTTP MCP server. Fully quit and relaunch Claude Desktop to load the change.

For LangChain, AutoGen, or any framework that consumes OpenAI-shaped APIs, point the SDK's base_url at your instance endpoint and use the auxk_ instance key as the API key. See the OpenAI compatibility section below.

Already using OpenAI? Switch in one line.

Auxen serves a strict OpenAI-compatible /v1/chat/completions route on every instance. Streaming, tool calls, and standard token-usage envelopes all work the way OpenAI clients expect.

Before — OpenAI

```
from openai import OpenAI

client = OpenAI(
  base_url="https://api.openai.com/v1",
  api_key="sk-..."
)

resp = client.chat.completions.create(
  model="gpt-4o",
  messages=[{"role":"user","content":"Hi"}]
)
```

After — Auxen

```
from openai import OpenAI

client = OpenAI(
  base_url="https://api.auxen.ai/v1/inst_xxx/v1",
  api_key="auxk_..."
)

resp = client.chat.completions.create(
  model="auxen",  # ignored
  messages=[{"role":"user","content":"Hi"}]
)
```

The model parameter is ignored — the model is determined by which instance you're hitting. Pass anything; we keep it in the response for clients that introspect.
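For clients that do not use the OpenAI SDK, streamed responses can be read directly. The sketch below assumes Auxen follows the OpenAI streaming format, where each server-sent event line is `data: <json chunk>` and the stream ends with `data: [DONE]`; that framing is OpenAI's convention and has not been verified against Auxen here. `iter_stream_text` is a hypothetical helper.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style chat.completion.chunk SSE lines."""
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

An HTTP response object from `urllib.request.urlopen` iterates line by line as bytes, so it can be passed straight to `iter_stream_text` after posting a request body that includes `"stream": true`.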

Errors and rate limits

Every error returns a stable code you can branch on, plus a human message and a docs link. Rate limit: 1,000 requests/minute per key. On 429s, read the Retry-After header.

| Code | HTTP | Meaning |
| --- | --- | --- |
| API_KEY_REQUIRED | 401 | No `Authorization: Bearer …` header. |
| INVALID_API_KEY | 401 | Key revoked, deleted, or wrong environment. |
| INSUFFICIENT_CREDITS | 402 | USD credit balance is too low to start the instance. |
| RATE_LIMIT_EXCEEDED | 429 | Over 1,000 requests/min. Read `Retry-After`. |
| MODEL_NOT_FOUND | 404 | Model id doesn't exist or is inactive. |
| INSTANCE_NOT_FOUND | 404 | Instance id not yours, or already destroyed. |
| INSTANCE_NOT_RUNNING | 409 | Instance is not in `running` state; poll status first. |
| INVALID_BILLING_MODE | 400 | `billing` parameter is no longer supported; Auxen is Pay-As-You-Go only. |
| INVALID_MODEL | 400 | Unknown or inactive model id. |
| INVALID_CAPACITY | 400 | `capacity` must be 1, 2, 4, or 8. |
| PROVISIONING_FAILED | 502 | Upstream GPU provisioning error. Retry with backoff. |
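A minimal sketch of branching on these codes to decide whether and when to retry is shown below. It honors `Retry-After` when the header holds a number of seconds and otherwise falls back to capped exponential backoff. The backoff constants, the choice of which codes count as retryable, and the function name are assumptions, not documented Auxen behavior.

```python
from typing import Mapping, Optional

# Codes whose table entries suggest waiting and trying again.
RETRYABLE_CODES = {"RATE_LIMIT_EXCEEDED", "PROVISIONING_FAILED"}


def retry_delay_s(
    code: str,
    headers: Mapping[str, str],
    attempt: int,
    base_s: float = 1.0,
    cap_s: float = 60.0,
) -> Optional[float]:
    """Return seconds to wait before retrying, or None if retrying will not help."""
    if code not in RETRYABLE_CODES:
        # Auth, credit, and validation errors need a fix, not a retry.
        # INSTANCE_NOT_RUNNING calls for polling the instance status instead.
        return None
    if code == "RATE_LIMIT_EXCEEDED":
        lowered = {k.lower(): v for k, v in headers.items()}
        value = lowered.get("retry-after", "").strip()
        if value.isdigit():
            return float(value)  # server-provided delay in seconds
    # Capped exponential backoff: base, 2*base, 4*base, ... up to cap_s.
    return min(cap_s, base_s * 2 ** attempt)
```

`attempt` is zero-based, so the first backoff wait is `base_s` seconds; a caller would typically stop after a fixed number of attempts.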

Build agents on infrastructure you own.

Your private model, your endpoint, per-minute pricing — and an API surface designed for programmatic use from day one.
