POST /chat/completions
```shell
curl --request POST \
  --url https://http.llm.model-cluster.on-prem.clusters.yotta-uat.cluster.s9t.link/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --header 'id: <id>' \
  --data '{
    "model": "deepseek-r1",
    "messages": [
      {
        "role": "user",
        "content": "How do I delete a team member from an org given user and org tables?"
      }
    ],
    "stream": true,
    "max_tokens": 1024,
    "temperature": 0.7,
    "top_p": 1
  }'
```
```json
{
  "id": "chatcmpl-deepseek-r1-abc123",
  "object": "chat.completion",
  "created": 1699014493,
  "model": "deepseek-r1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "To delete a team member from an organization, you'll need to perform a DELETE operation. Here's how you can approach this..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 18,
    "completion_tokens": 156,
    "total_tokens": 174
  },
  "system_fingerprint": "deepseek-r1-v1.0"
}
```
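The body above is the non-streaming response shape. When `stream` is `true` (as in the curl example), the reply typically arrives incrementally instead. A minimal Python sketch for consuming such a stream, assuming the common OpenAI-style server-sent-event framing (`data: {...}` chunks carrying `delta` fragments, terminated by `data: [DONE]`); this framing is an assumption, not confirmed by this reference:

```python
import json

def parse_sse_chunks(raw: str):
    """Parse server-sent-event lines of the form 'data: {...}' into dicts.

    Assumes OpenAI-style streaming: each chunk is prefixed with 'data: '
    and the stream ends with 'data: [DONE]'.
    """
    chunks = []
    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunks.append(json.loads(payload))
    return chunks

# Hypothetical two-chunk stream, for illustration only:
sample = (
    'data: {"choices": [{"delta": {"content": "Hel"}}]}\n'
    'data: {"choices": [{"delta": {"content": "lo"}}]}\n'
    'data: [DONE]\n'
)
text = "".join(c["choices"][0]["delta"].get("content", "")
               for c in parse_sse_chunks(sample))
print(text)  # -> Hello
```

In practice the raw text would come from iterating over the HTTP response body line by line rather than from a string.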

Authorizations

Authorization
string
header
required

JWT used for authentication. Pass your API token as the Bearer token.

Headers

id
string
default:018037b8-90fe-47f6-b131-33de9fbe79ab
required

Model UUID for the request (DeepSeek R1 model identifier)

Body

application/json
messages
object[]
required

Array of messages in the conversation

model
enum<string>
default:deepseek-r1
required

Model identifier for DeepSeek R1

Available options:
deepseek-r1
stream
boolean
default:false

Whether to stream the response

temperature
number
default:0.7

Sampling temperature (0.0 to 2.0). Controls randomness in generation

Required range: 0 <= x <= 2
max_tokens
integer
default:1024

Maximum number of tokens to generate

Required range: 1 <= x <= 4096
top_p
number
default:1

Nucleus (top-p) sampling parameter: only tokens within the top cumulative probability mass p are considered

Required range: 0 <= x <= 1
top_k
integer
default:50

Top-k sampling parameter: restricts selection to the k most likely tokens at each step

Required range: 1 <= x <= 100
stop
string[] | null

Sequences where the API will stop generating further tokens

Maximum array length: 4
frequency_penalty
number
default:0

Penalty for frequent tokens to reduce repetition

Required range: -2 <= x <= 2
presence_penalty
number
default:0

Penalty applied to tokens that have already appeared, encouraging the model to move to new topics

Required range: -2 <= x <= 2
repetition_penalty
number
default:1

Penalty for repeating tokens (DeepSeek-specific parameter)

Required range: 0.1 <= x <= 2
do_sample
boolean
default:true

Whether to use sampling for generation

seed
integer | null

Random seed for reproducible outputs

system_prompt
string | null

System prompt to set model behavior (alternative to system message)
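Taken together, the body parameters above can be validated client-side before sending. A minimal Python sketch that assembles a request body and checks values against the documented ranges; `build_payload` and `RANGES` are hypothetical helper names, not part of any official client:

```python
import json

# Documented ranges from this reference.
RANGES = {
    "temperature": (0.0, 2.0),
    "max_tokens": (1, 4096),
    "top_p": (0.0, 1.0),
    "top_k": (1, 100),
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
    "repetition_penalty": (0.1, 2.0),
}

def build_payload(messages, **params):
    """Assemble a /chat/completions body, rejecting out-of-range values."""
    for name, value in params.items():
        if name in RANGES:
            lo, hi = RANGES[name]
            if not lo <= value <= hi:
                raise ValueError(f"{name}={value} outside [{lo}, {hi}]")
    payload = {"model": "deepseek-r1", "messages": messages}
    payload.update(params)
    return payload

body = build_payload(
    [{"role": "user", "content": "Hello"}],
    temperature=0.7, max_tokens=1024, top_p=1, seed=42,
)
print(json.dumps(body, indent=2))
```

Parameters not listed in `RANGES` (such as `seed` or `stop`) pass through unchecked, matching the fact that the reference gives no range for them.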

Response

Successful chat completion

Response for non-streaming chat completion

id
string

Unique identifier for the completion

Example:

"chatcmpl-deepseek-r1-abc123"

object
enum<string>

Object type

Available options:
chat.completion
Example:

"chat.completion"

created
integer

Unix timestamp of when the completion was created

Example:

1699014493

model
string

The model used for completion

Example:

"deepseek-r1"

choices
object[]
usage
object
system_fingerprint
string | null

System fingerprint for the model version

Example:

"deepseek-r1-v1.0"
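The response fields above can be read straight out of the decoded JSON. A short Python sketch using the example body from this page (the assistant text is truncated, as in the example):

```python
import json

# Example non-streaming response body from this reference.
raw = """{
  "id": "chatcmpl-deepseek-r1-abc123",
  "object": "chat.completion",
  "created": 1699014493,
  "model": "deepseek-r1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "To delete a team member from an organization, you'll need to perform a DELETE operation. Here's how you can approach this..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 18, "completion_tokens": 156, "total_tokens": 174},
  "system_fingerprint": "deepseek-r1-v1.0"
}"""

resp = json.loads(raw)
content = resp["choices"][0]["message"]["content"]  # assistant reply text
finish = resp["choices"][0]["finish_reason"]        # "stop" when generation completed
usage = resp["usage"]                               # token accounting
print(finish, usage["total_tokens"])  # -> stop 174
```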