Auxiliary Intelligence · For Agents

Programmatic AI infrastructure.

Your agent provisions a private AI model, gets an endpoint and an API key, and starts making inference calls in minutes, with no human involvement after key generation. The API is OpenAI-compatible, so existing frameworks work once you point them at your instance endpoint and key.

🔒 Per-instance API keys · ⚡ ~3 min to a live endpoint · ↔ OpenAI-compatible · 📊 1,000 req/min/key

Quickstart

From zero to first inference

Five steps, shown here with curl. Replace YOUR_KEY with the agent management key from the dashboard.

  1. Provision a private model

    Pick any model from the catalog. Returns immediately with status: provisioning. Save the instance_id.

    curl -X POST https://api.auxen.ai/v1/provision \
      -H "Authorization: Bearer YOUR_KEY" \
      -H "Content-Type: application/json" \
      -d '{"model":"llama3.1-8b","capacity":1}'
  2. Wait for ready status

    Poll the instance until status flips to 'running' (~3 minutes). Save the endpoint and api_key — those are the inference creds.

    INST=inst_xxxxxxxxxxxx
    while :; do
      RESP=$(curl -s -H "Authorization: Bearer YOUR_KEY" \
        https://api.auxen.ai/v1/instances/$INST)
      STATUS=$(echo "$RESP" | jq -r '.data.instance.status')
      [ "$STATUS" = "running" ] && break
      sleep 15
    done
    INSTKEY=$(echo "$RESP" | jq -r '.data.instance.api_key')
  3. Make an inference call

    Use the per-instance auxk_ key (NOT your management key) against /v1/{instanceId}/chat, or against the OpenAI-compatible /v1/chat/completions route under your instance endpoint.

    curl -X POST https://api.auxen.ai/v1/$INST/chat \
      -H "Authorization: Bearer $INSTKEY" \
      -H "Content-Type: application/json" \
      -d '{"message":"Summarize the moon landing in one sentence."}'
  4. Handle the response

    The Auxen-native route returns {message, model, usage, response_time_ms} inside a {success, data} wrapper. The OpenAI-compatible route returns the standard chat.completion envelope.

    # {"success":true,"data":{"message":"...","model":"llama3.1:8b","usage":{...},"response_time_ms":1247}}
  5. Destroy when done

    Stops billing immediately. Per-minute charges halt the moment the instance status flips to destroyed.

    curl -X DELETE -H "Authorization: Bearer YOUR_KEY" \
      https://api.auxen.ai/v1/instances/$INST
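
The polling loop in step 2 runs forever if the instance never becomes ready. The sketch below shows the same loop in Python with a deadline. It assumes the response shape used by the curl example (`data.instance.status`, `data.instance.api_key`). The helper names `make_fetcher` and `wait_until_running`, the 10-minute default timeout, and the choice to treat `destroyed` as a terminal state are illustrative assumptions, not part of the Auxen API.

```python
import json
import time
import urllib.request
from typing import Callable


def make_fetcher(instance_id: str, management_key: str) -> Callable[[], dict]:
    """Build a function that GETs the instance and returns data.instance."""
    url = f"https://api.auxen.ai/v1/instances/{instance_id}"

    def fetch() -> dict:
        req = urllib.request.Request(
            url, headers={"Authorization": f"Bearer {management_key}"}
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)["data"]["instance"]

    return fetch


def wait_until_running(
    fetch_instance: Callable[[], dict],
    timeout_s: float = 600.0,   # assumed ceiling; the page quotes ~3 minutes
    interval_s: float = 15.0,   # same interval as the curl loop
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll until status is 'running' and return the instance object."""
    deadline = clock() + timeout_s
    while True:
        instance = fetch_instance()
        status = instance.get("status")
        if status == "running":
            return instance
        if status == "destroyed":
            # Assumption: a destroyed instance never becomes ready.
            raise RuntimeError("instance was destroyed before it became ready")
        if clock() >= deadline:
            raise TimeoutError(f"instance still {status!r} after {timeout_s}s")
        sleep(interval_s)
```

A call such as `wait_until_running(make_fetcher("inst_xxxxxxxxxxxx", "YOUR_KEY"))` returns the instance object, from which `api_key` can be read for step 3. Passing `sleep` and `clock` as parameters keeps the loop testable without real delays.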

Pay-As-You-Go, by the minute

One billing model. Add USD credits, deploy an instance, and we deduct per minute against actual GPU runtime. Idle time costs nothing. Destroy when done. /v1/provision takes no billing parameter: just pick a model and, optionally, a capacity multiplier.

Shared

$0.05/hour. Small models only.

  • Multi-tenant GPU pool
  • No capacity scaling
  • Cheapest entry point; breakeven vs Dedicated Small at ~17 hrs/month
  • Set "tier":"shared" on provision (Small models only)
Best for: low-volume internal agents, experiments, anything that runs hours per month rather than hours per day.

Dedicated

Private GPU. Capacity scales with you.

  • Small $0.15/hr · Medium $0.20/hr · Large $0.65/hr · XL $2.85/hr (1×)
  • Capacity 1× / 2× / 4× / 8× (1.0× / 1.8× / 3.3× / 6.0× cost)
  • Same model, multiple GPUs behind a load balancer
  • Pass "capacity": 4 on provision or call /scale later
Best for: production endpoints, persistent agents, anything that needs throughput, isolation, or fine-tuning headroom.
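
To make the Dedicated pricing concrete, the sketch below estimates the cost of a run from the rates and capacity multipliers listed above. It assumes the charge is simply hourly rate × cost multiplier, prorated linearly by the minute. The actual rounding applied by the billing system is not stated here, so treat the result as an estimate. The function name `dedicated_cost_usd` is hypothetical.

```python
from decimal import Decimal

# Hourly rates at 1x capacity, taken from the Dedicated tier list.
HOURLY_RATE_USD = {
    "small": Decimal("0.15"),
    "medium": Decimal("0.20"),
    "large": Decimal("0.65"),
    "xl": Decimal("2.85"),
}

# Capacity -> cost multiplier, taken from the same list.
COST_MULTIPLIER = {
    1: Decimal("1.0"),
    2: Decimal("1.8"),
    4: Decimal("3.3"),
    8: Decimal("6.0"),
}


def dedicated_cost_usd(size: str, capacity: int, minutes: int) -> Decimal:
    """Estimate the USD cost of `minutes` of runtime (assumed linear proration)."""
    if size not in HOURLY_RATE_USD:
        raise ValueError(f"unknown size: {size!r}")
    if capacity not in COST_MULTIPLIER:
        raise ValueError("capacity must be 1, 2, 4, or 8")
    hourly = HOURLY_RATE_USD[size] * COST_MULTIPLIER[capacity]
    return hourly * minutes / 60
```

For example, a Medium instance at 4× capacity costs $0.20 × 3.3 = $0.66 per hour, so `dedicated_cost_usd("medium", 4, 90)` comes to $0.99 for a 90-minute run under these assumptions.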

MCP and agent frameworks

Auxen exposes a Model Context Protocol server at https://api.auxen.ai/mcp so Claude Desktop, Cursor, and any other MCP-compatible client can use Auxen as a native tool. The MCP server wraps the same eight management capabilities as the REST API.

Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json)
{
  "mcpServers": {
    "auxen": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-remote",
        "https://api.auxen.ai/mcp",
        "--header",
        "Authorization: Bearer auxen_live_xxx"
      ]
    }
  }
}

This config uses the mcp-remote stdio-bridge proxy (auto-installed on first launch) to connect Claude Desktop to Auxen's remote HTTP MCP server. Fully quit and relaunch Claude Desktop to load the change.

For LangChain, AutoGen, or any framework that consumes OpenAI-shaped APIs, point the SDK's base_url at your instance endpoint and use the auxk_ instance key as the API key. See the OpenAI compatibility section below.

Already using OpenAI? Switch in one line.

Auxen serves a strict OpenAI-compatible /v1/chat/completions route on every instance. Streaming, tool calls, and standard token-usage envelopes all work the way OpenAI clients expect.

Before — OpenAI
from openai import OpenAI

client = OpenAI(
  base_url="https://api.openai.com/v1",
  api_key="sk-..."
)

resp = client.chat.completions.create(
  model="gpt-4o",
  messages=[{"role":"user","content":"Hi"}]
)
After — Auxen
from openai import OpenAI

client = OpenAI(
  base_url="https://api.auxen.ai/v1/inst_xxx/v1",
  api_key="auxk_..."
)

resp = client.chat.completions.create(
  model="auxen",  # ignored
  messages=[{"role":"user","content":"Hi"}]
)

The model parameter is ignored — the model is determined by which instance you're hitting. Pass anything; we keep it in the response for clients that introspect.
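
An agent that calls both the Auxen-native chat route and the OpenAI-compatible route sees two response shapes. The sketch below normalizes them to plain text. It assumes the native body matches the quickstart sample (`{"success": true, "data": {"message": ...}}`) and that the OpenAI-shaped body carries the text at `choices[0].message.content`, as in the standard chat.completion envelope. The helper name `reply_text` is hypothetical.

```python
def reply_text(body: dict) -> str:
    """Return the assistant text from either response shape."""
    if "choices" in body:
        # OpenAI chat.completion envelope.
        return body["choices"][0]["message"]["content"]
    if body.get("success") and "data" in body:
        # Auxen-native shape, as shown in the quickstart sample.
        return body["data"]["message"]
    raise ValueError("unrecognized response shape")
```

Raising on an unknown shape, rather than returning an empty string, keeps error bodies from being mistaken for model output.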

Errors and rate limits

Every error returns a stable code you can branch on, plus a human-readable message and a docs link. Rate limit: 1,000 requests/minute per key. On 429s, read the Retry-After header.

Code                   HTTP  Meaning
API_KEY_REQUIRED       401   No `Authorization: Bearer …` header.
INVALID_API_KEY        401   Key revoked, deleted, or wrong environment.
INSUFFICIENT_CREDITS   402   USD credit balance is too low to start the instance.
RATE_LIMIT_EXCEEDED    429   Over 1,000 requests/min. Read `Retry-After`.
MODEL_NOT_FOUND        404   Model id doesn't exist or is inactive.
INSTANCE_NOT_FOUND     404   Instance id not yours, or already destroyed.
INSTANCE_NOT_RUNNING   409   Instance is not in `running` state; poll status first.
INVALID_BILLING_MODE   400   `billing` parameter is no longer supported; Auxen is Pay-As-You-Go only.
INVALID_MODEL          400   Unknown or inactive model id.
INVALID_CAPACITY       400   `capacity` must be 1, 2, 4, or 8.
PROVISIONING_FAILED    502   Upstream GPU provisioning error. Retry with backoff.
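
Branching on these codes can be sketched as a small retry policy: honor Retry-After on RATE_LIMIT_EXCEEDED, back off exponentially on PROVISIONING_FAILED, and fail fast on everything else. The sketch takes the error code as an already-extracted string, because the field names of the error body are not specified here. The helper name `retry_delay` and the base and cap values for the backoff are illustrative assumptions.

```python
from typing import Optional

# Codes the table marks as worth retrying; all others are treated as fatal.
RETRYABLE = {"RATE_LIMIT_EXCEEDED", "PROVISIONING_FAILED"}


def retry_delay(
    code: str,
    attempt: int,
    retry_after: Optional[str] = None,
    base_s: float = 1.0,   # assumed starting delay
    cap_s: float = 60.0,   # assumed upper bound
) -> Optional[float]:
    """Return seconds to wait before retrying, or None if not retryable.

    `attempt` is zero-based; `retry_after` is the raw Retry-After header value.
    """
    if code not in RETRYABLE:
        return None
    if code == "RATE_LIMIT_EXCEEDED" and retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # header was not a number of seconds; fall back to backoff
    return min(cap_s, base_s * 2 ** attempt)
```

A caller would loop: on an error, compute `retry_delay(code, attempt, headers.get("Retry-After"))`, sleep for that long if it is not None, and otherwise surface the error. INSTANCE_NOT_RUNNING is left out of the retryable set on purpose, since the table's advice there is to poll instance status rather than resend the request.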

Build agents on infrastructure you own.

Your private model, your endpoint, per-minute pricing — and an API surface designed for programmatic use from day one.