The inference endpoints support cost accounting and model discovery within the Bitrecs V2 evaluation pipeline. Validators use
POST /inference/report-cost to record each LLM call made during an evaluation run; this data feeds the per-agent cost reports accessible via GET /inference/cost. The POST /inference/estimate-cost endpoint is open to all callers and returns a cost estimate without writing any data. All four endpoints are rate-limited to 120 requests per minute.
POST /inference/estimate-cost
Returns a cost estimate for a given provider, model, and token counts. No data is persisted.

`POST https://v2.api.bitrecs.ai/inference/estimate-cost`
Body
- LLM provider identifier (e.g. `"CHUTES"`). Must match a provider known to the InferenceCoster.
- Model identifier as used by the provider (e.g. `"qwen/qwen3-32b"`).
- Number of input (prompt) tokens to estimate cost for.
- Number of output (completion) tokens to estimate cost for.
Response
- Estimated cost for the input tokens in USD.
- Estimated cost for the output tokens in USD.
- Sum of `input_cost` and `output_cost` in USD.
- Currency of the cost values. Always `"USD"`.

Error responses
| Status | Meaning |
|---|---|
| 503 | Cost estimation is not available for the specified provider/model combination. |
Example
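A minimal sketch of an estimate request in Python. The request field names (`provider`, `model`, `input_tokens`, `output_tokens`) are assumptions inferred from the field descriptions above; of the response fields, only `input_cost` and `output_cost` are named in this section.

```python
import requests

# Hypothetical request body: the field names are inferred from the
# descriptions above and may not match the actual API schema.
payload = {
    "provider": "CHUTES",        # LLM provider identifier
    "model": "qwen/qwen3-32b",   # model identifier as used by the provider
    "input_tokens": 1200,        # prompt tokens to estimate cost for
    "output_tokens": 400,        # completion tokens to estimate cost for
}

resp = requests.post(
    "https://v2.api.bitrecs.ai/inference/estimate-cost",
    json=payload,
    timeout=10,
)
resp.raise_for_status()  # a 503 means estimation is unavailable for this provider/model

estimate = resp.json()
# input_cost and output_cost are documented response fields (in USD).
print(estimate["input_cost"], estimate["output_cost"])
```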
POST /inference/report-cost
Records a completed inference call for a given evaluation run. Requires a valid validator session.

`POST https://v2.api.bitrecs.ai/inference/report-cost`
This endpoint is called by validators automatically during evaluation. You do not need to call it manually unless you are building validator software.
Body
- UUID of the evaluation run this inference call belongs to.
- LLM provider name.
- Model identifier used for this inference call.
- Sampling temperature used for this call.
- The messages array sent to the LLM (a list of `{"role": string, "content": string}` objects).
- HTTP status code returned by the LLM provider. Optional.
- Raw response text returned by the LLM. Optional.
- Actual input token count as reported by the provider. Optional.
- Actual output token count as reported by the provider. Optional.
- Actual cost of this inference call in USD. Optional.
- ISO 8601 timestamp of when the LLM response was received. Optional.
Response
- Auto-incremented integer ID of the newly inserted inference record.
- `"reported"` on success.

Error responses
| Status | Meaning |
|---|---|
| 500 | Failed to insert the inference record. |
Example
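A sketch of a report call in Python. Every request field name below is an assumption inferred from the descriptions above, and the validator-session auth mechanism is not specified in this section, so the Authorization header is likewise assumed.

```python
import requests
from datetime import datetime, timezone

# All field names here are assumptions based on the field descriptions
# above, not a confirmed schema. Replace the placeholder UUID and token.
record = {
    "run_id": "00000000-0000-0000-0000-000000000000",  # evaluation run UUID
    "provider": "CHUTES",
    "model": "qwen/qwen3-32b",
    "temperature": 0.2,
    "messages": [
        {"role": "system", "content": "You are a product recommender."},
        {"role": "user", "content": "Suggest accessories for this cart."},
    ],
    # Optional actuals as reported by the provider:
    "status_code": 200,
    "response_text": "1. Phone case 2. Screen protector 3. Charging cable",
    "input_tokens": 1187,
    "output_tokens": 402,
    "cost_usd": 0.0031,
    "completed_at": datetime.now(timezone.utc).isoformat(),
}

resp = requests.post(
    "https://v2.api.bitrecs.ai/inference/report-cost",
    json=record,
    headers={"Authorization": "Bearer <validator-session-token>"},  # assumed scheme
    timeout=10,
)
resp.raise_for_status()  # a 500 means the insert failed
# On success the response carries the new record's integer ID and
# "reported"; the exact response field names are not given in this section.
print(resp.json())
```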
GET /inference/cost
Returns the aggregated inference cost report for a given agent.

`GET https://v2.api.bitrecs.ai/inference/cost`
Query parameters
- `agent_id`: UUID of the agent whose inference cost history you want to retrieve.
Response
- The `agent_id` that was requested.
- Aggregated cost data for all inference calls recorded against this agent. The exact shape is determined by the database query and may include fields such as `total_cost_usd`, `total_input_tokens`, `total_output_tokens`, and `call_count`.

Error responses
| Status | Meaning |
|---|---|
| 500 | Failed to retrieve the cost report. |
Example
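A minimal sketch of fetching the report in Python. `agent_id` is the documented query parameter and response field; the aggregate field names are only described above as fields the report "may include", so they are read defensively here.

```python
import requests

resp = requests.get(
    "https://v2.api.bitrecs.ai/inference/cost",
    params={"agent_id": "00000000-0000-0000-0000-000000000000"},  # placeholder UUID
    timeout=10,
)
resp.raise_for_status()  # a 500 means the report query failed

report = resp.json()
print(report["agent_id"])  # documented response field

# The aggregate shape comes straight from the database query, so check
# for the possible fields rather than assuming they exist.
for key in ("total_cost_usd", "total_input_tokens", "total_output_tokens", "call_count"):
    print(key, report.get(key))
```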
GET /inference/models
Returns the list of models currently available for use in evaluations, filtered to public, hot models only.

`GET https://v2.api.bitrecs.ai/inference/models`
Response
Container object with an `items` array. Only models where both `public` and `hot` are true are included in the response; models that are offline or not publicly accessible are filtered out automatically.

Error responses
| Status | Meaning |
|---|---|
| 503 | The upstream model list could not be retrieved. |
| 500 | Unexpected error while fetching models. |
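Example

A minimal sketch of listing the available models in Python. The `items` array is documented above; the per-item fields are not specified in this section, so each entry is printed as-is.

```python
import requests

resp = requests.get("https://v2.api.bitrecs.ai/inference/models", timeout=10)
resp.raise_for_status()  # 503: upstream list unavailable; 500: unexpected error

# The server has already filtered the list to public, hot models.
for model in resp.json()["items"]:
    print(model)
```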