POST /chat/completions
curl --request POST \
  --url https://http.llm.model-cluster.on-prem.clusters.yotta-uat.cluster.s9t.link/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --header 'id: <id>' \
  --data '
{
  "model": "llama-3.1-405b",
  "messages": [
    {
      "role": "user",
      "content": "write a sentence on usa"
    }
  ],
  "stream": true,
  "max_tokens": 512
}
'
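The same request can be issued from Python using only the standard library. A minimal sketch of the curl call above; the `build_payload` and `chat_completion` names are illustrative, and the placeholder `<token>` and `<id>` values must be replaced with a real JWT and model UUID:

```python
import json
import urllib.request

BASE_URL = "https://http.llm.model-cluster.on-prem.clusters.yotta-uat.cluster.s9t.link"

def build_payload(prompt: str, stream: bool = True, max_tokens: int = 512) -> dict:
    """Assemble the chat/completions request body shown in the curl example."""
    return {
        "model": "llama-3.1-405b",
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
        "max_tokens": max_tokens,
    }

def chat_completion(token: str, model_id: str, prompt: str) -> bytes:
    """POST the payload with the required Authorization and id headers."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "id": model_id,
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```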
Example response (non-streaming, i.e. when "stream" is false):

{
  "id": "<string>",
  "object": "chat.completion",
  "created": 123,
  "model": "<string>",
  "choices": [
    {
      "index": 123,
      "message": {
        "role": "assistant",
        "content": "<string>"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 123,
    "completion_tokens": 123,
    "total_tokens": 123
  }
}
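The non-streaming response can be unpacked using the documented field names. A sketch with made-up sample values standing in for real output:

```python
import json

# Sample response in the documented shape; the values are illustrative.
sample = json.loads("""
{
  "id": "cmpl-example",
  "object": "chat.completion",
  "created": 123,
  "model": "llama-3.1-405b",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant", "content": "The USA is a large country."},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 12, "completion_tokens": 8, "total_tokens": 20}
}
""")

# The assistant reply lives in the first choice's message.
reply = sample["choices"][0]["message"]["content"]

# finish_reason "stop" means the model ended naturally rather than
# being cut off by the max_tokens limit.
finished = sample["choices"][0]["finish_reason"] == "stop"
```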

Authorizations

Authorization
string
header
required

JWT token for authentication (use the full JWT token from your code)

Headers

id
string
default:3c281955-ad3b-4845-97a3-9a352c7d9f78
required

Model UUID for the request (Llama 3.1 405B model identifier)

Body

application/json
messages
object[]
required

Array of messages in the conversation

model
enum<string>
default:llama-3.1-405b
required

Model identifier for Llama 3.1 405B

Available options:
llama-3.1-405b
stream
boolean
default:false

Whether to stream the response

temperature
number
default:0.7

Sampling temperature (0.0 to 2.0)

Required range: 0 <= x <= 2
max_tokens
integer
default:512

Maximum number of tokens to generate

Required range: 1 <= x <= 4096
top_p
number
default:0.95

Nucleus sampling parameter

Required range: 0 <= x <= 1
stop
string[] | null

Sequences where the API will stop generating

frequency_penalty
number
default:0

Frequency penalty to reduce repetition

Required range: -2 <= x <= 2
presence_penalty
number
default:0

Presence penalty to encourage topic diversity

Required range: -2 <= x <= 2
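The documented ranges can be enforced client-side before sending a request. A minimal validator sketch; the ranges come from the parameter list above, while the function itself is illustrative:

```python
# Documented (min, max) ranges for the optional sampling parameters.
RANGES = {
    "temperature": (0.0, 2.0),
    "max_tokens": (1, 4096),
    "top_p": (0.0, 1.0),
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
}

def validate_params(params: dict) -> list:
    """Return a list of error strings for any out-of-range parameter."""
    errors = []
    for name, value in params.items():
        if name in RANGES:
            lo, hi = RANGES[name]
            if not (lo <= value <= hi):
                errors.append(f"{name}={value} outside [{lo}, {hi}]")
    return errors
```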

Response

Successful chat completion

The response takes one of two forms, depending on the `stream` flag:

  • Option 1: non-streaming chat completion (documented below)
  • Option 2: streaming response

Response for non-streaming chat completion

id
string

Unique identifier for the completion

object
enum<string>

Object type

Available options:
chat.completion
created
integer

Unix timestamp of when the completion was created

model
string

The model used for completion

choices
object[]

Array of completion choices; each choice carries an index, the assistant message, and a finish_reason

usage
object

Token usage for the request (prompt_tokens, completion_tokens, total_tokens)
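The `usage` object follows the usual token-accounting convention in which `total_tokens` is the sum of prompt and completion tokens. A small consistency check, assuming that convention holds for this endpoint:

```python
def usage_is_consistent(usage: dict) -> bool:
    """Check that total_tokens equals prompt_tokens + completion_tokens."""
    return usage["total_tokens"] == usage["prompt_tokens"] + usage["completion_tokens"]
```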