POST /chat/completions

Example request
curl --request POST \
  --url https://http.llm.model-cluster.on-prem.clusters.yotta-uat.cluster.s9t.link/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --header 'id: <id>' \
  --data '
{
  "model": "qwen-2.5-72b",
  "messages": [
    {
      "role": "user",
      "content": "write a sentence on usa"
    }
  ],
  "stream": false,
  "max_tokens": 512
}
'
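The same request can be issued from Python. Below is a minimal sketch using only the standard library's `urllib`; the base URL, the `id` model-UUID header, and the payload fields are taken from the cURL example above, while the helper name `build_request` is introduced here for illustration.

```python
import json
import urllib.request

BASE_URL = "https://http.llm.model-cluster.on-prem.clusters.yotta-uat.cluster.s9t.link"

def build_request(token: str, model_uuid: str, prompt: str, max_tokens: int = 512):
    """Build the POST request for /chat/completions, mirroring the cURL example."""
    payload = {
        "model": "qwen-2.5-72b",
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        BASE_URL + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "id": model_uuid,  # required model-UUID header for this endpoint
        },
        method="POST",
    )

# Sending it requires a valid token and network access to the cluster:
# with urllib.request.urlopen(build_request("<token>", "<id>", "write a sentence on usa")) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```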
Example response

{
  "id": "chatcmpl-qwen-abc123",
  "object": "chat.completion",
  "created": 1699014493,
  "model": "qwen-2.5-72b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The United States of America is a federal republic consisting of 50 states, known for its diverse culture, economic influence, and democratic principles."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 8,
    "completion_tokens": 32,
    "total_tokens": 40
  },
  "system_fingerprint": "qwen-2.5-72b-v1.0"
}
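When `stream` is `true`, the response arrives incrementally rather than as a single object. This page does not document the chunk format, so the sketch below assumes the common OpenAI-style server-sent-events layout (`data: {...}` lines carrying token deltas in `choices[0].delta.content`, terminated by `data: [DONE]`); verify against actual streamed output before relying on it.

```python
import json

def collect_stream(lines):
    """Reassemble assistant text from OpenAI-style SSE lines.

    NOTE: the exact chunk schema is an assumption, not confirmed by
    this reference page."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)
```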

Authorizations

Authorization
string
header
required

JWT for authentication. Pass your API token as the Bearer token.

Headers

id
string
default:f1e02109-e3e3-4faa-99cd-2b400f71e7d4
required

Model UUID for the request (Qwen 2.5 72B model identifier)

Body

application/json
messages
object[]
required

Array of messages in the conversation

model
enum<string>
default:qwen-2.5-72b
required

Model identifier for Qwen 2.5 72B

Available options:
qwen-2.5-72b
stream
boolean
default:false

Whether to stream the response

temperature
number
default:0.7

Sampling temperature (0.0 to 2.0). Lower values produce more focused outputs; higher values produce more creative ones.

Required range: 0 <= x <= 2
max_tokens
integer
default:512

Maximum number of tokens to generate

Required range: 1 <= x <= 32768
top_p
number
default:0.9

Nucleus sampling parameter. Use lower values for more focused outputs

Required range: 0 <= x <= 1
top_k
integer
default:50

Top-k sampling parameter for controlling vocabulary selection

Required range: 1 <= x <= 200
stop
string[] | null

Sequences where the API will stop generating further tokens

Maximum array length: 4
frequency_penalty
number
default:0

Penalty for frequent tokens to reduce repetition

Required range: -2 <= x <= 2
presence_penalty
number
default:0

Penalty for new tokens to encourage topic diversity

Required range: -2 <= x <= 2
repetition_penalty
number
default:1

Penalty for repeating tokens (Qwen-specific parameter)

Required range: 0.1 <= x <= 2
do_sample
boolean
default:true

Whether to use sampling for generation

seed
integer | null

Random seed for reproducible outputs
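The documented ranges above can be enforced client-side before sending a request. The helper below is a hypothetical convenience (not part of the API): it fills in the documented defaults and rejects values outside the documented ranges.

```python
# (name: (default, min, max)) for each numeric parameter, as documented above.
PARAM_SPECS = {
    "temperature": (0.7, 0.0, 2.0),
    "max_tokens": (512, 1, 32768),
    "top_p": (0.9, 0.0, 1.0),
    "top_k": (50, 1, 200),
    "frequency_penalty": (0.0, -2.0, 2.0),
    "presence_penalty": (0.0, -2.0, 2.0),
    "repetition_penalty": (1.0, 0.1, 2.0),
}

def build_payload(messages, **overrides):
    """Build a /chat/completions payload with documented defaults,
    validating each numeric parameter against its documented range.
    Client-side convenience sketch, not part of the API itself."""
    payload = {"model": "qwen-2.5-72b", "messages": messages, "stream": False}
    for name, (default, lo, hi) in PARAM_SPECS.items():
        value = overrides.pop(name, default)
        if not (lo <= value <= hi):
            raise ValueError(f"{name}={value} outside documented range [{lo}, {hi}]")
        payload[name] = value
    if "stop" in overrides:
        stop = overrides.pop("stop")
        if stop is not None and len(stop) > 4:
            raise ValueError("stop accepts at most 4 sequences")
        payload["stop"] = stop
    payload.update(overrides)  # remaining fields, e.g. stream, seed, do_sample
    return payload
```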

Response

Successful chat completion

The response takes one of two shapes:

  • Option 1: non-streaming chat completion (documented below)
  • Option 2: streamed chunks (returned when stream is true)

Response for non-streaming chat completion

id
string

Unique identifier for the completion

Example:

"chatcmpl-qwen-abc123"

object
enum<string>

Object type

Available options:
chat.completion
Example:

"chat.completion"

created
integer

Unix timestamp of when the completion was created

Example:

1699014493

model
string

The model used for completion

Example:

"qwen-2.5-72b"

choices
object[]
usage
object
system_fingerprint
string | null

System fingerprint for the model version

Example:

"qwen-2.5-72b-v1.0"
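Pulling the useful fields out of the non-streaming response object described above can be done in a few lines; `summarize_completion` is a name introduced here for illustration.

```python
def summarize_completion(resp: dict) -> dict:
    """Extract assistant text, finish reason, and token usage from a
    non-streaming chat.completion response object."""
    choice = resp["choices"][0]
    usage = resp.get("usage", {})
    return {
        "content": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": usage.get("total_tokens"),
    }
```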