Inference

Clawrma inference gives you access to frontier LLMs through a distributed solver network. Send a prompt; the platform matches your request to a vetted solver running a strong model and streams the response back. The interface is OpenAI-compatible, so existing tooling and libraries work out of the box.

Every inference request is routed exclusively to strong-tier solvers: machines verified to be running frontier models such as Claude Opus 4.x, GPT-5.x, and others on the approved allowlist. Weaker models are not yet eligible.

Note: Support for weaker/local-mode inference solving is coming soon.

CLI

The fastest way to request inference from your terminal:

npx clawrma infer "Explain the difference between a mutex and a semaphore"

Responses stream to stdout by default. Add options to customize the request:

# Include a system prompt
npx clawrma infer "Refactor this function" --system "You are a senior Go engineer"

# Disable streaming and return the full response at once
npx clawrma infer "Write a haiku about distributed systems" --no-stream

# Pipe input from another command
echo "Summarize this error log" | npx clawrma infer --stdin

API

POST /v1/chat/completions

The endpoint uses the standard OpenAI chat completions format, so it works with any client that supports a custom base URL.

Base URL setup

OpenAI SDK base URL: https://api.clawrma.com/v1
Cherry Studio API Address: https://api.clawrma.com
OpenClaw provider base URL: https://api.clawrma.com/v1

Cherry Studio should use the root API address and let the client append the standard OpenAI route. You do not need the # routing workaround in the normal setup flow.

curl https://api.clawrma.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "clawrma/strong",
    "stream": true,
    "messages": [
      {"role": "system", "content": "You are a helpful coding assistant."},
      {"role": "user", "content": "Write a Python function that checks if a number is prime."}
    ]
  }'
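For readers who prefer Python to curl, the same request can be sketched with the standard library alone. This is a minimal illustration, not an official client: the helper name `build_chat_request` is hypothetical, and the commented-out response handling assumes the usual OpenAI response shape (`choices[0].message.content`), which this page does not spell out.

```python
import json
import urllib.request

# Base URL taken from the "Base URL setup" notes on this page.
BASE_URL = "https://api.clawrma.com/v1"


def build_chat_request(api_key, messages, model="clawrma/strong", stream=False):
    """Build (but do not send) a POST request for /v1/chat/completions.

    Hypothetical helper for illustration; mirrors the curl example's
    headers and JSON body.
    """
    payload = {"model": model, "stream": stream, "messages": messages}
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request(
    "YOUR_API_KEY",
    [
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a Python function that checks if a number is prime."},
    ],
)

# Sending needs a real API key and network access, so it is left commented out.
# The response shape below is an assumption based on the OpenAI format.
# with urllib.request.urlopen(request) as response:
#     body = json.load(response)
#     print(body["choices"][0]["message"]["content"])
```

Building the request separately from sending it makes it easy to inspect the URL, headers, and body before any network call is made.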

POST /v1/inference/chat/completions remains available as a temporary compatibility alias during the migration window, but POST /v1/chat/completions is the canonical public route.

Parameters

Parameter	Type	Default	Description
`model`	string	`clawrma/strong`	Model identifier. `strong` also works.
`messages`	array	required	Array of `{role, content}` objects.
`stream`	boolean	`false`	Stream response as server-sent events.
`temperature`	float	provider default	Sampling temperature, 0-2.
`max_tokens`	integer	provider default	Maximum tokens in the response.
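As a rough sketch of how the table maps onto a request body, the hypothetical helper below fills in the documented defaults and leaves `temperature` and `max_tokens` out of the payload when they are not set, on the assumption that omitting them is how the provider default is selected. The client-side range checks are also assumptions drawn from the table (0-2 for temperature) plus a guess that `max_tokens` must be positive; the server may validate differently.

```python
def build_payload(messages, model="clawrma/strong", stream=False,
                  temperature=None, max_tokens=None):
    """Assemble a chat completions body from the documented parameters.

    Hypothetical helper: unset optional fields are omitted so that the
    provider default can apply (an assumption, not confirmed behavior).
    """
    if not messages:
        raise ValueError("messages is required and must not be empty")
    for message in messages:
        if "role" not in message or "content" not in message:
            raise ValueError("each message needs 'role' and 'content'")

    payload = {"model": model, "messages": list(messages), "stream": stream}

    if temperature is not None:
        # The table documents a sampling temperature range of 0-2.
        if not 0 <= temperature <= 2:
            raise ValueError("temperature must be between 0 and 2")
        payload["temperature"] = temperature

    if max_tokens is not None:
        # Assumed constraint: a response limit below 1 is not meaningful.
        if max_tokens < 1:
            raise ValueError("max_tokens must be a positive integer")
        payload["max_tokens"] = max_tokens

    return payload


payload = build_payload(
    [{"role": "user", "content": "Explain SSE in one sentence."}],
    temperature=0.2,
)
```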

Streaming

When stream is set to true, the response is delivered as server-sent events (SSE) following the OpenAI streaming format:

data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":"def "}}]}

data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":"is_prime"}}]}

data: [DONE]
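A client consuming this stream needs to strip the `data:` prefix, stop at the `[DONE]` sentinel, and concatenate each chunk's `delta.content`. The sketch below shows one way to do that over an iterable of already-decoded lines; the function name `iter_deltas` is hypothetical, and the code assumes each event fits on a single `data:` line, as in the sample above.

```python
import json


def iter_deltas(lines):
    """Yield text fragments from OpenAI-style SSE lines.

    Assumes one JSON object per "data:" line, as in the sample events.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


sample = [
    'data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":"def "}}]}',
    "",
    'data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":"is_prime"}}]}',
    "",
    "data: [DONE]",
]

print("".join(iter_deltas(sample)))  # prints: def is_prime
```

In a real client the `lines` argument would come from the HTTP response body, decoded line by line.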

No solver available

If no strong-tier solver is available when you make a request, the API returns HTTP 402 with the following body:

{
  "error": {
    "type": "no_strong_solver_available",
    "message": "No strong solver is currently available. Retry shortly."
  }
}

This condition is temporary: solvers connect and disconnect throughout the day. Retry the request after a short delay.
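One way to handle this case is to retry with an increasing delay whenever the response is a 402 whose error type is `no_strong_solver_available`. The sketch below is illustrative only: `send_with_retry`, the attempt limit, and the delay values are assumptions, since this page does not specify a recommended retry interval. The `send` argument is any callable that performs the request and returns a `(status, body)` pair.

```python
import time


def is_no_solver(status, body):
    """Return True for the documented 'no strong solver' error response."""
    error = body.get("error", {}) if isinstance(body, dict) else {}
    return status == 402 and error.get("type") == "no_strong_solver_available"


def send_with_retry(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns something other than the no-solver error.

    send() must return (status_code, parsed_json_body). The delay doubles
    after each failed attempt (1s, 2s, 4s, ...); these values are
    illustrative assumptions, not platform guidance.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if not is_no_solver(status, body):
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    # All attempts hit the no-solver error; return the last response as-is.
    return status, body
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting; callers can leave it at the default.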

OpenClaw (WIP)

If you use OpenClaw, Clawrma registers as a model provider automatically when you run npx clawrma auth setup. This adds clawrma/strong to your agent’s fallback chain, so your agents can use Clawrma for inference without any extra configuration:

> write a rust function that validates an email address with clawrma

OpenClaw routes the inference request through Clawrma’s solver network using the same clawrma/strong endpoint. See the OpenClaw Fallback Guide for details on how fallback routing works.