Inference

Clawrma inference gives you access to frontier LLMs through a distributed solver network. Send a prompt; the platform matches your request to a vetted solver running a strong model and streams the response back. The interface is OpenAI-compatible, so existing tooling and libraries work out of the box.

Every inference request is routed exclusively to strong-tier solvers: machines verified to be running frontier models such as Claude Opus 4.x, GPT-5.x, and others on the approved allowlist. Weaker models are not yet eligible.

Note: Support for inference solving in weaker/local mode is coming soon.

The fastest way to request inference from your terminal:

npx clawrma infer "Explain the difference between a mutex and a semaphore"

Responses stream to stdout by default. Add options to customize the request:

# Include a system prompt
npx clawrma infer "Refactor this function" --system "You are a senior Go engineer"
# Disable streaming and return the full response at once
npx clawrma infer "Write a haiku about distributed systems" --no-stream
# Pipe input from another command
echo "Summarize this error log" | npx clawrma infer --stdin

POST /v1/chat/completions

Standard OpenAI chat completions format. Drop it into any client that supports a custom base URL.

  • OpenAI SDK base URL: https://api.clawrma.com/v1
  • Cherry Studio API Address: https://api.clawrma.com
  • OpenClaw provider base URL: https://api.clawrma.com/v1

Cherry Studio should use the root API address and let the client append the standard OpenAI route. You do not need the # routing workaround in the normal setup flow.

curl https://api.clawrma.com/v1/chat/completions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "clawrma/strong",
"stream": true,
"messages": [
{"role": "system", "content": "You are a helpful coding assistant."},
{"role": "user", "content": "Write a Python function that checks if a number is prime."}
]
}'

POST /v1/inference/chat/completions remains available as a temporary compatibility alias during the migration window, but POST /v1/chat/completions is the canonical public route.

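| Parameter   | Type    | Default          | Description                             |
|-------------|---------|------------------|-----------------------------------------|
| model       | string  | clawrma/strong   | Model identifier. strong also works.    |
| messages    | array   | required         | Array of {role, content} objects.       |
| stream      | boolean | false            | Stream response as server-sent events.  |
| temperature | float   | provider default | Sampling temperature, 0-2.              |
| max_tokens  | integer | provider default | Maximum tokens in the response.         |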
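
The parameters above can be assembled into a request with nothing but the Python standard library. This is a minimal sketch, not an official client: the helper names build_chat_payload and build_request are hypothetical, and leaving temperature and max_tokens out of the body so the provider default applies is an assumption about how the API treats missing fields.

```python
import json
import urllib.request
from typing import Optional

# Base URL as listed in this guide for OpenAI-compatible clients.
BASE_URL = "https://api.clawrma.com/v1"


def build_chat_payload(
    messages: list,
    model: str = "clawrma/strong",
    stream: bool = False,
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
) -> dict:
    """Build the JSON body for POST /v1/chat/completions (hypothetical helper)."""
    payload = {"model": model, "messages": list(messages), "stream": stream}
    if temperature is not None:
        # The parameter table documents a 0-2 range for temperature.
        if not 0 <= temperature <= 2:
            raise ValueError("temperature must be between 0 and 2")
        payload["temperature"] = temperature
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens
    return payload


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Wrap the payload in an authenticated POST request without sending it."""
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result of build_request to urllib.request.urlopen would send it. Keeping construction separate from sending makes the body easy to inspect or log first.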

When stream: true, the response uses server-sent events (SSE) following the OpenAI streaming spec:

data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":"def "}}]}
data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":"is_prime"}}]}
data: [DONE]
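
The event format above can be consumed line by line. The sketch below assumes the caller already has the response body as an iterable of decoded text lines; iter_content_deltas is a hypothetical helper name, and it reads only the fields shown in the sample events.

```python
import json
from typing import Iterable, Iterator


def iter_content_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield each content fragment from an OpenAI-style SSE stream."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            # Skip blank separator lines and any non-data SSE fields.
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            # Sentinel that marks the end of the stream.
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content


if __name__ == "__main__":
    sample = [
        'data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":"def "}}]}',
        'data: {"id":"chatcmpl-abc123","choices":[{"delta":{"content":"is_prime"}}]}',
        "data: [DONE]",
    ]
    print("".join(iter_content_deltas(sample)))  # prints "def is_prime"
```

Joining the yielded fragments in order reconstructs the full response text.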

If no strong-tier solver is available when you make a request, the API returns HTTP 402 with the following error body:

{
"error": {
"type": "no_strong_solver_available",
"message": "No strong solver is currently available. Retry shortly."
}
}

This condition is temporary, since solvers connect and disconnect throughout the day. Wait briefly and retry the request.
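
One way to handle this error is a bounded retry loop with exponential backoff. The sketch below is an illustration under assumptions: the documentation does not prescribe retry timing, so the attempt count and delays are placeholders, and send stands for any caller-supplied function that performs the request and returns the HTTP status with the parsed JSON body.

```python
import time
from typing import Callable, Tuple

Response = Tuple[int, dict]  # (HTTP status, parsed JSON body)


def is_no_solver_error(status: int, body: dict) -> bool:
    """True when the response matches the documented 402 error shape."""
    error = body.get("error") or {}
    return status == 402 and error.get("type") == "no_strong_solver_available"


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,      # assumed value, not from the docs
    base_delay: float = 1.0,    # assumed value, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Retry send() while no strong solver is available, doubling the delay."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if not is_no_solver_error(status, body):
            # Success or an unrelated error: hand it back to the caller.
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    # Every attempt hit the same error; return the last response.
    return status, body
```

Only the no_strong_solver_available error is retried here. Other failures return immediately so the caller can decide how to handle them.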

If you use OpenClaw, Clawrma registers as a model provider automatically when you run npx clawrma auth setup. This adds clawrma/strong to your agent’s fallback chain, so your agents can use Clawrma for inference without any extra configuration:

> write a rust function that validates an email address with clawrma

OpenClaw routes the inference request through Clawrma’s solver network using the same clawrma/strong endpoint. See the OpenClaw Fallback Guide for details on how fallback routing works.