gpt-oss-20b
Text Generation • OpenAI • Hosted

OpenAI's open-weight models are designed for powerful reasoning, agentic tasks, and versatile developer use cases. gpt-oss-20b is the lower-latency option, suited to local or specialized use cases.
| Model Info | |
|---|---|
| Context Window | 128,000 tokens |
| Function calling | Yes |
| Reasoning | Yes |
| Unit Pricing | $0.20 per M input tokens, $0.30 per M output tokens |
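The unit pricing above makes per-request cost easy to estimate. A minimal sketch, assuming the listed rates (actual charges follow Cloudflare's billing, and token counts come from the `usage` object in the response):

```python
# Estimate request cost from the unit pricing above:
# $0.20 per million input tokens, $0.30 per million output tokens.
INPUT_PRICE_PER_M = 0.20
OUTPUT_PRICE_PER_M = 0.30

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M + (
        output_tokens / 1_000_000
    ) * OUTPUT_PRICE_PER_M

# Example: a 2,000-token prompt producing a 500-token answer.
print(f"${estimate_cost(2000, 500):.6f}")  # → $0.000550
```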
Usage
```typescript
export default {
  async fetch(request, env): Promise<Response> {
    const response = await env.AI.run('@cf/openai/gpt-oss-20b', {
      instructions: 'You are a concise assistant.',
      input: 'What is the origin of the phrase Hello, World?',
    });

    return Response.json(response);
  },
} satisfies ExportedHandler<Env>;
```

```python
import os
import requests

ACCOUNT_ID = os.environ.get("CLOUDFLARE_ACCOUNT_ID")
AUTH_TOKEN = os.environ.get("CLOUDFLARE_AUTH_TOKEN")

response = requests.post(
    f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/v1/responses",
    headers={"Authorization": f"Bearer {AUTH_TOKEN}"},
    json={
        "model": "@cf/openai/gpt-oss-20b",
        "input": "Tell me all about PEP-8",
    },
)
result = response.json()
print(result)
```

```shell
curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $CLOUDFLARE_AUTH_TOKEN" \
  -d '{
    "model": "@cf/openai/gpt-oss-20b",
    "input": "What are the benefits of open-source models?"
  }'
```

Parameters
Synchronous — Send a request and receive a complete response
Input:

| Type | Constraints | Description |
|---|---|---|
| string | required, minLength: 1 | The input text prompt for the model to generate a response. |
| string | | Name of the LoRA (Low-Rank Adaptation) model to fine-tune the base model. |
| object | | |
| boolean | default: false | If true, a chat template is not applied and you must adhere to the specific model's expected formatting. |
| boolean | default: false | If true, the response will be streamed back incrementally using SSE (Server-Sent Events). |
| integer | default: 256 | The maximum number of tokens to generate in the response. |
| number | default: 0.6, minimum: 0, maximum: 5 | Controls the randomness of the output; higher values produce more random results. |
| number | minimum: 0.001, maximum: 1 | Adjusts the creativity of the AI's responses by controlling how many possible words it considers. Lower values make outputs more predictable; higher values allow for more varied and creative responses. |
| integer | minimum: 1, maximum: 50 | Limits the AI to choose from the top 'k' most probable words. Lower values make responses more focused; higher values introduce more variety and potential surprises. |
| integer | minimum: 1, maximum: 9999999999 | Random seed for reproducibility of the generation. |
| number | minimum: 0, maximum: 2 | Penalty for repeated tokens; higher values discourage repetition. |
| number | minimum: -2, maximum: 2 | Decreases the likelihood of the model repeating the same lines verbatim. |
| number | minimum: -2, maximum: 2 | Increases the likelihood of the model introducing new topics. |

Output:

| Type | Description |
|---|---|
| string | The generated text response from the model |
| object | Usage statistics for the inference request |
| array | An array of tool call requests made during the response generation |

Streaming — Send a request with `stream: true` and receive server-sent events
The input parameters are identical to those of the synchronous request, with `stream` set to true.

Output:

| Type | Description |
|---|---|
| string (Stream_Output) | Server-Sent Events stream when streaming is enabled (`text/event-stream`, binary) |