
gpt-oss-20b

Text Generation · OpenAI · Hosted

OpenAI's open-weight models are designed for powerful reasoning, agentic tasks, and versatile developer use cases; gpt-oss-20b targets lower latency and local or specialized use cases.

Model Info
Context Window: 128,000 tokens
Function Calling: Yes
Reasoning: Yes
Unit Pricing: $0.20 per M input tokens, $0.30 per M output tokens
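
Given the unit pricing above, estimating per-request cost is simple arithmetic. A minimal sketch (the function name is illustrative, not part of any API):

```typescript
// Cost estimate from the listed unit pricing:
// $0.20 per million input tokens, $0.30 per million output tokens.
function estimateCostUSD(inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1_000_000) * 0.2 + (outputTokens / 1_000_000) * 0.3;
}

// For example, a request with 10,000 input tokens and 2,000 output tokens
// costs 0.002 + 0.0006 = $0.0026.
const cost = estimateCostUSD(10_000, 2_000);
```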

Usage

export default {
  async fetch(request, env): Promise<Response> {
    const response = await env.AI.run('@cf/openai/gpt-oss-20b', {
      instructions: 'You are a concise assistant.',
      input: 'What is the origin of the phrase Hello, World?',
    });
    return Response.json(response);
  },
} satisfies ExportedHandler<Env>;

Parameters

Synchronous — Send a request and receive a complete response
Input format
prompt (string, required, minLength: 1)
The input text prompt for the model to generate a response.
lora (string, optional)
Name of the LoRA (Low-Rank Adaptation) adapter to apply to the base model.
raw (boolean, default: false)
If true, no chat template is applied and you must adhere to the model's expected prompt formatting.
stream (boolean, default: false)
If true, the response is streamed back incrementally using server-sent events (SSE).
max_tokens (integer, default: 256)
The maximum number of tokens to generate in the response.
temperature (number, default: 0.6, minimum: 0, maximum: 5)
Controls the randomness of the output; higher values produce more random results.
top_p (number, minimum: 0.001, maximum: 1)
Controls how much of the probability mass the model samples from. Lower values make outputs more predictable; higher values allow for more varied and creative responses.
top_k (integer, minimum: 1, maximum: 50)
Limits sampling to the 'k' most probable tokens. Lower values make responses more focused; higher values introduce more variety.
seed (integer, minimum: 1, maximum: 9999999999)
Random seed for reproducible generation.
repetition_penalty (number, minimum: 0, maximum: 2)
Penalty for repeated tokens; higher values discourage repetition.
frequency_penalty (number, minimum: -2, maximum: 2)
Decreases the likelihood of the model repeating the same lines verbatim.
presence_penalty (number, minimum: -2, maximum: 2)
Increases the likelihood of the model introducing new topics.
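
The input format above can be sketched as a typed request payload. This interface is illustrative only (the name `GptOss20bInput` is not part of the API); it simply mirrors the fields, defaults, and ranges listed:

```typescript
// Illustrative typing of the synchronous input format documented above.
interface GptOss20bInput {
  prompt: string;              // required, minLength: 1
  lora?: string;               // optional LoRA adapter name
  raw?: boolean;               // default: false
  stream?: boolean;            // default: false
  max_tokens?: number;         // default: 256
  temperature?: number;        // 0–5, default: 0.6
  top_p?: number;              // 0.001–1
  top_k?: number;              // 1–50
  seed?: number;               // 1–9999999999
  repetition_penalty?: number; // 0–2
  frequency_penalty?: number;  // -2–2
  presence_penalty?: number;   // -2–2
}

// A minimal valid payload: only `prompt` is required.
const input: GptOss20bInput = {
  prompt: "What is the origin of the phrase Hello, World?",
  max_tokens: 256,
  temperature: 0.6,
};
```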
Streaming — Send a request with `stream: true` and receive server-sent events
Input format
Identical to the synchronous input format above, with `stream: true` set.
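
With `stream: true`, the response arrives as server-sent events: lines of the form `data: <JSON>`, terminated by `data: [DONE]`. A minimal sketch of collecting the generated text from such a buffer on the client side (the `response` field name in each event is an assumption and may differ for this model):

```typescript
// Concatenate generated text out of an SSE buffer.
// Assumes each event is `data: <JSON>` carrying a `response` text field,
// and the stream ends with `data: [DONE]`.
function collectSseText(buffer: string): string {
  let text = "";
  for (const line of buffer.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data:")) continue; // skip blank/comment lines
    const payload = trimmed.slice("data:".length).trim();
    if (payload === "[DONE]") break; // end-of-stream sentinel
    const event = JSON.parse(payload) as { response?: string };
    text += event.response ?? "";
  }
  return text;
}
```

In a real client you would apply this incrementally as chunks arrive, buffering until each event's trailing blank line is seen.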
Batch — Send multiple requests in a single API call

API Schemas (Raw)

Synchronous Input
Synchronous Output
Streaming Input
Streaming Output
Batch Input
Batch Output