Create a chat completion for the given messages, with optional streaming, using the most powerful Llama model.
Authorization (header): JWT token for authentication (use the full JWT token from your code).
Model UUID: model identifier for the request (the Llama 3.1 405B model).
Body parameters:
- messages: Array of messages in the conversation.
- model: Model identifier for Llama 3.1 405B, e.g. llama-3.1-405b.
- stream: Whether to stream the response.
- temperature: Sampling temperature. Range: 0 <= x <= 2.
- max_tokens: Maximum number of tokens to generate. Range: 1 <= x <= 4096.
- top_p: Nucleus sampling parameter. Range: 0 <= x <= 1.
- stop: Sequences where the API will stop generating.
- frequency_penalty: Frequency penalty to reduce repetition. Range: -2 <= x <= 2.
- presence_penalty: Presence penalty to encourage topic diversity. Range: -2 <= x <= 2.

Response:
- 200: Successful chat completion (the documented response object applies to non-streaming chat completions).
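As a usage illustration, here is a minimal sketch of a non-streaming request in Python. The base URL, endpoint path, Bearer scheme, and model UUID are assumptions, not taken from this reference; substitute the values from your deployment. The body fields and ranges follow the parameter list above.

```python
import requests

# Placeholder values (assumptions): substitute your real endpoint, JWT, and model UUID.
BASE_URL = "https://api.example.com"  # assumed base URL
JWT_TOKEN = "YOUR_JWT_TOKEN"          # full JWT token from your code
MODEL_UUID = "YOUR_MODEL_UUID"        # Llama 3.1 405B model identifier

response = requests.post(
    f"{BASE_URL}/v1/chat/completions/{MODEL_UUID}",  # assumed path layout
    headers={
        "Authorization": f"Bearer {JWT_TOKEN}",  # Bearer scheme assumed
        "Content-Type": "application/json",
    },
    json={
        "messages": [
            {"role": "user", "content": "Summarize the benefits of streaming APIs."}
        ],
        "model": "llama-3.1-405b",
        "stream": False,     # non-streaming: full completion in one response body
        "temperature": 0.7,  # within the documented 0..2 range
        "max_tokens": 512,   # within the documented 1..4096 range
        "top_p": 0.9,        # within the documented 0..1 range
    },
    timeout=60,
)
response.raise_for_status()
print(response.json())
```

For the streaming case, set stream to true and read the body incrementally. The exact wire format of the streamed chunks (for example, SSE "data:" lines) is an assumption here; check the provider's streaming documentation.

```python
# Streaming variant: request streamed output and consume it line by line.
with requests.post(
    f"{BASE_URL}/v1/chat/completions/{MODEL_UUID}",
    headers={"Authorization": f"Bearer {JWT_TOKEN}"},
    json={
        "messages": [{"role": "user", "content": "Hello"}],
        "model": "llama-3.1-405b",
        "stream": True,
    },
    stream=True,  # tell requests not to buffer the whole response
    timeout=60,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if line:  # skip keep-alive blank lines
            print(line.decode("utf-8"))
```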