Create a chat completion for the given messages, with optional streaming, using Google's Gemma-IT model, which is optimized for instruction following and conversational AI tasks
JWT for authentication. Use your API token as the Bearer token
Model UUID for the request (Gemma-IT model identifier)
Array of messages in the conversation
Model identifier for Gemma-IT. Default: "gemma-it"
Whether to stream the response
Sampling temperature; controls randomness in generation. Range: 0 <= x <= 2
Maximum number of tokens to generate. Range: 1 <= x <= 4096
Nucleus sampling parameter for controlling response diversity. Range: 0 <= x <= 1
Top-k sampling parameter for vocabulary selection. Range: 1 <= x <= 100
Sequences where the API will stop generating further tokens. Maximum: 4 sequences
Penalty for frequent tokens to reduce repetition. Range: -2 <= x <= 2
Penalty for new tokens to encourage topic diversity. Range: -2 <= x <= 2
Penalty for repeating tokens (Gemma-specific parameter). Range: 0.1 <= x <= 2
Whether to use sampling for generation
Random seed for reproducible outputs
System prompt to set model behavior (alternative to system message)
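The request parameters above can be combined into a JSON body like the sketch below. The field names (`model`, `messages`, `stream`, `temperature`, and so on) follow the parameters documented here, but the exact wire format and endpoint URL are assumptions modeled on OpenAI-style chat completion APIs; adjust to match your deployment.

```python
import json

# Hypothetical request body for the Gemma-IT chat completion endpoint.
# Values stay within the documented ranges; the Authorization header
# carries the JWT as a Bearer token.
headers = {
    "Authorization": "Bearer YOUR_API_TOKEN",  # placeholder token
    "Content-Type": "application/json",
}
payload = {
    "model": "gemma-it",
    "messages": [
        {"role": "user", "content": "Explain top-k sampling in one sentence."}
    ],
    "stream": False,
    "temperature": 0.7,   # 0 <= x <= 2
    "max_tokens": 256,    # 1 <= x <= 4096
    "top_p": 0.9,         # 0 <= x <= 1
    "top_k": 40,          # 1 <= x <= 100
    "stop": ["\n\n"],     # at most 4 stop sequences
    "seed": 42,           # for reproducible outputs
}
body = json.dumps(payload)
print(body[:20])
```

With `stream` set to `true`, the same body would instead yield incremental chunks rather than a single completion object.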
Successful chat completion
Response for non-streaming chat completion
Unique identifier for the completion
"chatcmpl-gemma-it-abc123"
Object type
"chat.completion"
Unix timestamp of when the completion was created
1699014493
The model used for completion
"gemma-it"
System fingerprint for the model version
"gemma-2-instruct-v1.0"
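A non-streaming response carrying the fields documented above can be parsed as in this sketch. The JSON layout is an assumption based on OpenAI-style completion responses; only the field names and example values shown in this reference are taken as given.

```python
import json

# Hypothetical non-streaming response body using the documented
# field names and example values.
raw = """
{
  "id": "chatcmpl-gemma-it-abc123",
  "object": "chat.completion",
  "created": 1699014493,
  "model": "gemma-it",
  "system_fingerprint": "gemma-2-instruct-v1.0"
}
"""
resp = json.loads(raw)
print(resp["object"])   # chat.completion
print(resp["created"])  # Unix timestamp of creation
```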