

The inference endpoints support cost accounting and model discovery within the Bitrecs V2 evaluation pipeline. Validators use POST /inference/report-cost to record each LLM call made during an evaluation run; this data feeds the per-agent cost reports accessible via GET /inference/cost. The POST /inference/estimate-cost endpoint is open to all callers and returns a cost estimate without writing any data. All four endpoints are rate-limited to 120 requests per minute.
POST /inference/report-cost requires a valid validator session obtained through the validator authentication flow. Calls without a valid validator credential will be rejected.

POST /inference/estimate-cost

Returns a cost estimate for a given provider, model, and token counts. No data is persisted.

POST https://v2.api.bitrecs.ai/inference/estimate-cost

Body

provider
string
required
LLM provider identifier (e.g. "CHUTES"). Must match a provider known to the InferenceCoster.
model_name
string
required
Model identifier as used by the provider (e.g. "qwen/qwen3-32b").
input_tokens
number
required
Number of input (prompt) tokens to estimate cost for.
output_tokens
number
required
Number of output (completion) tokens to estimate cost for.

Response

input_cost
number
required
Estimated cost for the input tokens in USD.
output_cost
number
required
Estimated cost for the output tokens in USD.
total_cost
number
required
Sum of input_cost and output_cost in USD.
currency
string
required
Currency of the cost values. Always "USD".

Error responses

Status  Meaning
503     Cost estimation is not available for the specified provider/model combination.

Example

curl --request POST \
  --url https://v2.api.bitrecs.ai/inference/estimate-cost \
  --header 'Content-Type: application/json' \
  --data '{
    "provider": "CHUTES",
    "model_name": "qwen/qwen3-32b",
    "input_tokens": 2000,
    "output_tokens": 500
  }'
Success response (200)
{
  "input_cost": 0.000600,
  "output_cost": 0.000375,
  "total_cost": 0.000975,
  "currency": "USD"
}
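The arithmetic behind this example can be sketched in Python. The per-million-token prices used here ($0.30 input, $0.75 output for "qwen/qwen3-32b") are illustrative values taken from the GET /inference/models example further down; the server's InferenceCoster is authoritative for actual rates.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_per_million: float, output_per_million: float) -> dict:
    """Compute the same fields the estimate-cost endpoint returns.

    Sketch only: the real rates come from the server-side InferenceCoster.
    """
    input_cost = input_tokens / 1_000_000 * input_per_million
    output_cost = output_tokens / 1_000_000 * output_per_million
    return {
        "input_cost": input_cost,
        "output_cost": output_cost,
        "total_cost": input_cost + output_cost,
        "currency": "USD",
    }

# 2000 input and 500 output tokens at $0.30 / $0.75 per million
estimate = estimate_cost(2000, 500, 0.30, 0.75)
# total_cost ≈ 0.000975, matching the success response above
```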

POST /inference/report-cost

Records a completed inference run for a given evaluation run. Requires a valid validator session.

POST https://v2.api.bitrecs.ai/inference/report-cost
This endpoint is called by validators automatically during evaluation. You do not need to call it manually unless you are building validator software.

Body

evaluation_run_id
string
required
UUID of the evaluation run this inference call belongs to.
provider
string
required
LLM provider name.
model
string
required
Model identifier used for this inference call.
temperature
number
required
Sampling temperature used for this call.
messages
array
required
The messages array sent to the LLM (list of {"role": string, "content": string} objects).
status_code
number
HTTP status code returned by the LLM provider. Optional.
response
string
Raw response text returned by the LLM. Optional.
num_input_tokens
number
Actual input token count as reported by the provider. Optional.
num_output_tokens
number
Actual output token count as reported by the provider. Optional.
cost_usd
number
Actual cost of this inference call in USD. Optional.
response_sent_at
string
ISO 8601 timestamp of when the LLM response was received. Optional.

Response

inference_id
number
required
Auto-incremented integer ID of the newly inserted inference record.
status
string
required
"reported" on success.

Error responses

Status  Meaning
500     Failed to insert the inference record.

Example

curl --request POST \
  --url https://v2.api.bitrecs.ai/inference/report-cost \
  --header 'Content-Type: application/json' \
  --data '{
    "evaluation_run_id": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
    "provider": "CHUTES",
    "model": "qwen/qwen3-32b",
    "temperature": 0.7,
    "messages": [
      { "role": "system", "content": "You are a coding assistant." },
      { "role": "user", "content": "Fix the following bug..." }
    ],
    "status_code": 200,
    "num_input_tokens": 512,
    "num_output_tokens": 128,
    "cost_usd": 0.00021
  }'
Success response (200)
{
  "inference_id": 8421,
  "status": "reported"
}
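A validator client might assemble the request body like this. The helper name `build_report_payload` is hypothetical; only the field names come from the schema above. Omitting unset optional fields keeps the payload minimal.

```python
from typing import Optional

def build_report_payload(evaluation_run_id: str, provider: str, model: str,
                         temperature: float, messages: list,
                         status_code: Optional[int] = None,
                         response: Optional[str] = None,
                         num_input_tokens: Optional[int] = None,
                         num_output_tokens: Optional[int] = None,
                         cost_usd: Optional[float] = None,
                         response_sent_at: Optional[str] = None) -> dict:
    """Build a POST /inference/report-cost body; optional fields are
    included only when a value is supplied. Hypothetical helper, not
    part of any official client library."""
    payload = {
        "evaluation_run_id": evaluation_run_id,
        "provider": provider,
        "model": model,
        "temperature": temperature,
        "messages": messages,
    }
    optional = {
        "status_code": status_code,
        "response": response,
        "num_input_tokens": num_input_tokens,
        "num_output_tokens": num_output_tokens,
        "cost_usd": cost_usd,
        "response_sent_at": response_sent_at,
    }
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

The resulting dict can be serialized with `json.dumps` and sent with the validator session credential attached, mirroring the curl example above.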

GET /inference/cost

Returns the aggregated inference cost report for a given agent.

GET https://v2.api.bitrecs.ai/inference/cost

Query parameters

agent_id
string
required
UUID of the agent whose inference cost history you want to retrieve.

Response

agent_id
string
required
The agent_id that was requested.
inference_cost_report
object
Aggregated cost data for all inference calls recorded against this agent. The exact shape is determined by the underlying database query and may include fields such as total_cost_usd, total_input_tokens, total_output_tokens, and call_count.

Error responses

Status  Meaning
500     Failed to retrieve the cost report.

Example

curl --request GET \
  --url 'https://v2.api.bitrecs.ai/inference/cost?agent_id=9b1deb4d-3b7d-4bad-9bdd-2b0d7b3dcb6d'
Success response (200)
{
  "agent_id": "9b1deb4d-3b7d-4bad-9bdd-2b0d7b3dcb6d",
  "inference_cost_report": {
    "total_cost_usd": 0.04231,
    "total_input_tokens": 98000,
    "total_output_tokens": 24500,
    "call_count": 47
  }
}
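Conceptually, the aggregate fields are simple sums over the inference records reported for the agent. A sketch of that aggregation, using the field names from the example above (the server's actual query may differ):

```python
def aggregate_cost_report(inferences: list) -> dict:
    """Roll recorded inference calls up into a per-agent cost report.
    Field names mirror the example response; illustrative only."""
    return {
        "total_cost_usd": sum(r.get("cost_usd", 0.0) for r in inferences),
        "total_input_tokens": sum(r.get("num_input_tokens", 0) for r in inferences),
        "total_output_tokens": sum(r.get("num_output_tokens", 0) for r in inferences),
        "call_count": len(inferences),
    }
```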

GET /inference/models

Returns the list of models currently available for use in evaluations, filtered to public, hot models only.

GET https://v2.api.bitrecs.ai/inference/models

Response

models
object
required
Container object with an items array.
Only models where both public and hot are true are included in the response. Models that are offline or not publicly accessible are filtered out automatically.

Error responses

Status  Meaning
503     The upstream model list could not be retrieved.
500     Unexpected error while fetching models.

Example

curl --request GET \
  --url https://v2.api.bitrecs.ai/inference/models
Success response (200)
{
  "models": {
    "items": [
      {
        "chute_id": "chute_abc123",
        "name": "Qwen3 32B",
        "tagline": "High-performance open model for coding tasks",
        "public": true,
        "slug": "qwen-qwen3-32b",
        "version": "1.0.0",
        "created_at": "2025-08-01T00:00:00+00:00",
        "updated_at": "2025-09-15T12:00:00+00:00",
        "current_estimated_price": {
          "input_per_million_tokens": 0.30,
          "output_per_million_tokens": 0.75
        },
        "hot": true
      }
    ]
  }
}
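The public/hot filter described above amounts to a simple predicate over the upstream model list. A sketch of the client-observable behavior (the server-side implementation is not shown in this documentation):

```python
def filter_models(items: list) -> list:
    """Keep only models where both `public` and `hot` are true,
    matching the filtering the endpoint applies."""
    return [m for m in items if m.get("public") and m.get("hot")]
```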