
kimi-k2.6

Text Generation · Moonshot AI · Hosted

Kimi K2.6 is a frontier-scale open-source 1T parameter model with a 262.1k context window, multi-turn tool calling, vision inputs, and structured outputs for agentic workloads.

Model Info

Context Window: 262,144 tokens
Terms and License: link
Function Calling: Yes
Reasoning: Yes
Vision: Yes
Unit Pricing: $0.95 per M input tokens, $0.16 per M cached input tokens, $4.00 per M output tokens
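The unit prices above make per-request cost easy to estimate. A minimal sketch (the helper name and the token-count fields are illustrative, not part of any API):

```typescript
// Unit prices in USD per million tokens, from the Model Info table above.
const PRICE_PER_M = { input: 0.95, cachedInput: 0.16, output: 4.0 };

interface UsageCounts {
  inputTokens: number;
  cachedInputTokens: number;
  outputTokens: number;
}

// Estimate the USD cost of one request from its token counts.
function estimateCost(u: UsageCounts): number {
  return (
    (u.inputTokens * PRICE_PER_M.input +
      u.cachedInputTokens * PRICE_PER_M.cachedInput +
      u.outputTokens * PRICE_PER_M.output) /
    1_000_000
  );
}

// Example: a 200k-token prompt, half served from cache, with a 2k-token reply.
const cost = estimateCost({
  inputTokens: 100_000,
  cachedInputTokens: 100_000,
  outputTokens: 2_000,
}); // (95,000 + 16,000 + 8,000) / 1,000,000 ≈ $0.119
```

Note how cached input tokens cost roughly a sixth of fresh input tokens, so repeated system prompts and shared prefixes are markedly cheaper.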

Playground

Try out this model with the Workers AI LLM Playground. It requires no setup or authentication and is an instant way to preview and test a model directly in the browser.

Launch the LLM Playground

Usage

TypeScript

export interface Env {
  AI: Ai;
}

export default {
  async fetch(request, env): Promise<Response> {
    const messages = [
      { role: "system", content: "You are a friendly assistant" },
      {
        role: "user",
        content: "What is the origin of the phrase Hello, World",
      },
    ];

    const stream = await env.AI.run("@cf/moonshotai/kimi-k2.6", {
      messages,
      stream: true,
    });

    return new Response(stream, {
      headers: { "content-type": "text/event-stream" },
    });
  },
} satisfies ExportedHandler<Env>;

Parameters

Synchronous — Send a request and receive a complete response
Input format
prompt
string, required, minLength: 1. The input text prompt for the model to generate a response.
model
string. ID of the model to use (e.g. '@cf/zai-org/glm-4.7-flash').
frequency_penalty
number | null. Penalizes new tokens based on their existing frequency in the text so far.
logit_bias
object | null. Modify the likelihood of specified tokens appearing in the completion. Maps token IDs to bias values from -100 to 100.
logprobs
boolean | null. Whether to return log probabilities of the output tokens.
top_logprobs
integer | null. How many top log probabilities to return at each token position (0-20). Requires logprobs=true.
max_tokens
integer | null. Deprecated in favor of max_completion_tokens. The maximum number of tokens to generate.
max_completion_tokens
integer | null. An upper bound for the number of tokens that can be generated for a completion.
metadata
object | null. Set of up to 16 key-value pairs that can be attached to the object.
modalities
array | null. Output types requested from the model (e.g. ['text'] or ['text', 'audio']).
n
integer | null. How many chat completion choices to generate for each input message.
parallel_tool_calls
boolean, default: true. Whether to enable parallel function calling during tool use.
presence_penalty
number | null. Penalizes new tokens based on whether they appear in the text so far.
reasoning_effort
string | null. Constrains effort on reasoning for reasoning models (o1, o3-mini, etc.).
seed
integer | null. If specified, the system will make a best effort to sample deterministically.
service_tier
string | null. Specifies the processing type used for serving the request.
store
boolean | null. Whether to store the output for model distillation or evals.
stream
boolean | null. If true, partial message deltas will be sent as server-sent events.
temperature
number | null. Sampling temperature between 0 and 2.
top_p
number | null. Nucleus sampling: considers only the tokens comprising the top_p probability mass.
user
string. A unique identifier representing your end-user, for abuse monitoring.
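Several of the parameters above carry constraints (prompt must be non-empty, temperature is bounded, top_logprobs depends on logprobs). A minimal sketch of validating a synchronous input object before sending it; the interface lists only a subset of the fields, and the helper name is illustrative:

```typescript
// Subset of the synchronous input parameters documented above.
interface SyncInput {
  prompt: string;                 // required, minLength: 1
  max_completion_tokens?: number;
  temperature?: number;           // 0 to 2
  top_p?: number;
  seed?: number;
  logprobs?: boolean;
  top_logprobs?: number;          // 0 to 20, requires logprobs=true
}

// Enforce the documented constraints before making the request.
function buildSyncInput(input: SyncInput): SyncInput {
  if (input.prompt.length < 1)
    throw new Error("prompt must be non-empty");
  if (input.temperature !== undefined && (input.temperature < 0 || input.temperature > 2))
    throw new Error("temperature must be between 0 and 2");
  if (input.top_logprobs !== undefined && !input.logprobs)
    throw new Error("top_logprobs requires logprobs=true");
  if (input.top_logprobs !== undefined && (input.top_logprobs < 0 || input.top_logprobs > 20))
    throw new Error("top_logprobs must be between 0 and 20");
  return input;
}
```

Validating client-side surfaces mistakes immediately instead of as a rejected API call.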
Streaming — Send a request with `stream: true` and receive server-sent events
Input format
prompt
string, required, minLength: 1. The input text prompt for the model to generate a response.
model
string. ID of the model to use (e.g. '@cf/zai-org/glm-4.7-flash').
frequency_penalty
number | null. Penalizes new tokens based on their existing frequency in the text so far.
logit_bias
object | null. Modify the likelihood of specified tokens appearing in the completion. Maps token IDs to bias values from -100 to 100.
logprobs
boolean | null. Whether to return log probabilities of the output tokens.
top_logprobs
integer | null. How many top log probabilities to return at each token position (0-20). Requires logprobs=true.
max_tokens
integer | null. Deprecated in favor of max_completion_tokens. The maximum number of tokens to generate.
max_completion_tokens
integer | null. An upper bound for the number of tokens that can be generated for a completion.
metadata
object | null. Set of up to 16 key-value pairs that can be attached to the object.
modalities
array | null. Output types requested from the model (e.g. ['text'] or ['text', 'audio']).
n
integer | null. How many chat completion choices to generate for each input message.
parallel_tool_calls
boolean, default: true. Whether to enable parallel function calling during tool use.
presence_penalty
number | null. Penalizes new tokens based on whether they appear in the text so far.
reasoning_effort
string | null. Constrains effort on reasoning for reasoning models (o1, o3-mini, etc.).
seed
integer | null. If specified, the system will make a best effort to sample deterministically.
service_tier
string | null. Specifies the processing type used for serving the request.
store
boolean | null. Whether to store the output for model distillation or evals.
stream
boolean | null. If true, partial message deltas will be sent as server-sent events.
temperature
number | null. Sampling temperature between 0 and 2.
top_p
number | null. Nucleus sampling: considers only the tokens comprising the top_p probability mass.
user
string. A unique identifier representing your end-user, for abuse monitoring.
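With stream: true, the response arrives as server-sent events whose data: lines carry incremental deltas. A minimal sketch of extracting the text deltas from a buffered SSE payload; the assumption here is the common convention that each event is a data: {...} JSON line with a response field, terminated by a data: [DONE] sentinel:

```typescript
// Extract text deltas from a buffered SSE payload. Assumes each event is a
// line of the form `data: {...}` and the stream ends with `data: [DONE]`.
function parseSseDeltas(buffer: string): string[] {
  const deltas: string[] = [];
  for (const line of buffer.split("\n")) {
    if (!line.startsWith("data: ")) continue;  // skip blank lines and comments
    const payload = line.slice("data: ".length).trim();
    if (payload === "[DONE]") break;           // end-of-stream sentinel
    const event = JSON.parse(payload);
    if (typeof event.response === "string") deltas.push(event.response);
  }
  return deltas;
}

// Joining the deltas in order reconstructs the full completion text.
const text = parseSseDeltas(
  'data: {"response":"Hel"}\n\ndata: {"response":"lo"}\n\ndata: [DONE]\n'
).join("");
```

In a real client the stream arrives in arbitrary chunks, so production code should buffer until each event's terminating blank line before parsing.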
Batch — Send multiple requests in a single API call

API Schemas (Raw)

Synchronous Input
Synchronous Output
Streaming Input
Streaming Output
Batch Input
Batch Output