The inference endpoints support cost accounting and model discovery within the Bitrecs V2 evaluation pipeline. Validators use
POST /inference/report-cost to record each LLM call made during an evaluation run; this data feeds the per-agent cost reports accessible via GET /inference/cost. The POST /inference/estimate-cost endpoint is open to all callers and returns a cost estimate without writing any data. All four endpoints are rate-limited to 120 requests per minute.
POST /inference/estimate-cost
Returns a cost estimate for a given provider, model, and token counts. No data is persisted.

`POST https://v2.api.bitrecs.ai/inference/estimate-cost`
Body
- LLM provider identifier (e.g. `"CHUTES"`). Must match a provider known to the InferenceCoster.
- Model identifier as used by the provider (e.g. `"qwen/qwen3-32b"`).
- Number of input (prompt) tokens to estimate cost for.
- Number of output (completion) tokens to estimate cost for.
Response
- Estimated cost for the input tokens in USD.
- Estimated cost for the output tokens in USD.
- Sum of `input_cost` and `output_cost` in USD.
- Currency of the cost values. Always `"USD"`.

Error responses
| Status | Meaning |
|---|---|
| 503 | Cost estimation is not available for the specified provider/model combination. |
Example
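A minimal sketch of an estimate request in Python. The request field names (`provider`, `model`, `input_tokens`, `output_tokens`) are assumptions inferred from the field descriptions above; of the response fields, only `input_cost` and `output_cost` are named in this section.

```python
import requests

# Hypothetical request body: the field names are inferred from the
# descriptions above and may not match the actual API schema.
payload = {
    "provider": "CHUTES",        # LLM provider identifier
    "model": "qwen/qwen3-32b",   # model identifier as used by the provider
    "input_tokens": 1200,        # prompt tokens to estimate cost for
    "output_tokens": 400,        # completion tokens to estimate cost for
}

resp = requests.post(
    "https://v2.api.bitrecs.ai/inference/estimate-cost",
    json=payload,
    timeout=10,
)
resp.raise_for_status()  # a 503 means estimation is unavailable for this provider/model

estimate = resp.json()
# input_cost and output_cost are documented response fields (in USD).
print(estimate["input_cost"], estimate["output_cost"])
```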
POST /inference/report-cost
Records a completed inference call for a given evaluation run. Requires a valid validator session.

`POST https://v2.api.bitrecs.ai/inference/report-cost`
This endpoint is called by validators automatically during evaluation. You do not need to call it manually unless you are building validator software.
Body
- UUID of the evaluation run this inference call belongs to.
- LLM provider name.
- Model identifier used for this inference call.
- Sampling temperature used for this call.
- The messages array sent to the LLM (a list of `{"role": string, "content": string}` objects).
- HTTP status code returned by the LLM provider. Optional.
- Raw response text returned by the LLM. Optional.
- Actual input token count as reported by the provider. Optional.
- Actual output token count as reported by the provider. Optional.
- Actual cost of this inference call in USD. Optional.
- ISO 8601 timestamp of when the LLM response was received. Optional.
Response
- Auto-incremented integer ID of the newly inserted inference record.
- `"reported"` on success.

Error responses
| Status | Meaning |
|---|---|
| 500 | Failed to insert the inference record. |
Example
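A sketch of a report call in Python. Every request field name below is an assumption inferred from the descriptions above, and the validator-session auth mechanism is not specified in this section, so the Authorization header is likewise assumed.

```python
import requests
from datetime import datetime, timezone

# All field names here are assumptions based on the field descriptions
# above, not a confirmed schema. Replace the placeholder UUID and token.
record = {
    "run_id": "00000000-0000-0000-0000-000000000000",  # evaluation run UUID
    "provider": "CHUTES",
    "model": "qwen/qwen3-32b",
    "temperature": 0.2,
    "messages": [
        {"role": "system", "content": "You are a product recommender."},
        {"role": "user", "content": "Suggest accessories for this cart."},
    ],
    # Optional actuals as reported by the provider:
    "status_code": 200,
    "response_text": "1. Phone case 2. Screen protector 3. Charging cable",
    "input_tokens": 1187,
    "output_tokens": 402,
    "cost_usd": 0.0031,
    "completed_at": datetime.now(timezone.utc).isoformat(),
}

resp = requests.post(
    "https://v2.api.bitrecs.ai/inference/report-cost",
    json=record,
    headers={"Authorization": "Bearer <validator-session-token>"},  # assumed scheme
    timeout=10,
)
resp.raise_for_status()  # a 500 means the insert failed
# On success the response carries the new record's integer ID and
# "reported"; the exact response field names are not given in this section.
print(resp.json())
```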
GET /inference/cost
Returns the aggregated inference cost report for a given agent.

`GET https://v2.api.bitrecs.ai/inference/cost`
Query parameters
- `agent_id`: UUID of the agent whose inference cost history you want to retrieve.
Response
- The `agent_id` that was requested.
- Aggregated cost data for all inference calls recorded against this agent. The exact shape is determined by the database query and may include fields such as `total_cost_usd`, `total_input_tokens`, `total_output_tokens`, and `call_count`.

Error responses
| Status | Meaning |
|---|---|
| 500 | Failed to retrieve the cost report. |
Example
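A minimal sketch of fetching the report in Python. `agent_id` is the documented query parameter and response field; the aggregate field names are only described above as fields the report "may include", so they are read defensively here.

```python
import requests

resp = requests.get(
    "https://v2.api.bitrecs.ai/inference/cost",
    params={"agent_id": "00000000-0000-0000-0000-000000000000"},  # placeholder UUID
    timeout=10,
)
resp.raise_for_status()  # a 500 means the report query failed

report = resp.json()
print(report["agent_id"])  # documented response field

# The aggregate shape comes straight from the database query, so check
# for the possible fields rather than assuming they exist.
for key in ("total_cost_usd", "total_input_tokens", "total_output_tokens", "call_count"):
    print(key, report.get(key))
```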
GET /inference/models
Returns the list of models currently available for use in evaluations, filtered to public, hot models only.

`GET https://v2.api.bitrecs.ai/inference/models`
Response
Container object with an `items` array. Only models where both `public` and `hot` are true are included in the response; models that are offline or not publicly accessible are filtered out automatically.

Error responses
| Status | Meaning |
|---|---|
| 503 | The upstream model list could not be retrieved. |
| 500 | Unexpected error while fetching models. |
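Example

A minimal sketch of listing the available models in Python. The `items` array is documented above; the per-item fields are not specified in this section, so each entry is printed as-is.

```python
import requests

resp = requests.get("https://v2.api.bitrecs.ai/inference/models", timeout=10)
resp.raise_for_status()  # 503: upstream list unavailable; 500: unexpected error

# The server has already filtered the list to public, hot models.
for model in resp.json()["items"]:
    print(model)
```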