gemma-4-26b-a4b-it
Text Generation • Google • Hosted

Gemma 4 is Google's most intelligent family of open models, built from Gemini 3 research to maximize intelligence-per-parameter.
| Model Info | |
|---|---|
| Context Window ↗ | 256,000 tokens |
| Terms and License | link ↗ |
| Function calling ↗ | Yes |
| Reasoning | Yes |
| Vision | Yes |
| Unit Pricing | $0.10 per M input tokens, $0.30 per M output tokens |
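At these rates, request cost scales linearly with token counts. A quick back-of-the-envelope estimator, using the unit prices from the table above:

```python
# Unit prices from the table above, in dollars per million tokens.
INPUT_PRICE_PER_M = 0.10
OUTPUT_PRICE_PER_M = 0.30


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of a single request."""
    return (
        input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    ) / 1_000_000


# e.g. a 2,000-token prompt with a 500-token reply:
print(f"${estimate_cost(2_000, 500):.6f}")  # → $0.000350
```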
Playground
Try out this model with the Workers AI LLM Playground. It requires no setup or authentication and offers an instant way to preview and test a model directly in the browser.
Launch the LLM Playground

Usage
Worker (streaming):

```ts
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request, env): Promise<Response> {
    const messages = [
      { role: "system", content: "You are a friendly assistant" },
      {
        role: "user",
        content: "What is the origin of the phrase Hello, World",
      },
    ];

    const stream = await env.AI.run("@cf/google/gemma-4-26b-a4b-it", {
      messages,
      stream: true,
    });

    return new Response(stream, {
      headers: { "content-type": "text/event-stream" },
    });
  },
} satisfies ExportedHandler<Env>;
```

Worker:

```ts
export interface Env {
  AI: Ai;
}

export default {
  async fetch(request, env): Promise<Response> {
    const messages = [
      { role: "system", content: "You are a friendly assistant" },
      {
        role: "user",
        content: "What is the origin of the phrase Hello, World",
      },
    ];
    const response = await env.AI.run("@cf/google/gemma-4-26b-a4b-it", {
      messages,
    });

    return Response.json(response);
  },
} satisfies ExportedHandler<Env>;
```

Python:

```py
import os

import requests

ACCOUNT_ID = "your-account-id"
AUTH_TOKEN = os.environ.get("CLOUDFLARE_AUTH_TOKEN")

prompt = "Tell me all about PEP-8"
response = requests.post(
    f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/@cf/google/gemma-4-26b-a4b-it",
    headers={"Authorization": f"Bearer {AUTH_TOKEN}"},
    json={
        "messages": [
            {"role": "system", "content": "You are a friendly assistant"},
            {"role": "user", "content": prompt},
        ]
    },
)
result = response.json()
print(result)
```

curl:

```sh
curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run/@cf/google/gemma-4-26b-a4b-it \
  -X POST \
  -H "Authorization: Bearer $CLOUDFLARE_AUTH_TOKEN" \
  -d '{ "messages": [{ "role": "system", "content": "You are a friendly assistant" }, { "role": "user", "content": "Why is pizza so good" }]}'
```

Parameters
Synchronous — Send a request and receive a complete response
Input

| Parameter | Type | Description |
|---|---|---|
| `prompt` | string, required (minLength: 1) | The input text prompt for the model to generate a response. |
| — | boolean, default: false | |
| `model` | string | ID of the model to use (e.g. `@cf/zai-org/glm-4.7-flash`). |
| `audio` | object | Parameters for audio output. Required when `modalities` includes `audio`. |
| `frequency_penalty` | number \| null | Penalizes new tokens based on their existing frequency in the text so far. |
| `logit_bias` | object \| null | Modify the likelihood of specified tokens appearing in the completion. Maps token IDs to bias values from -100 to 100. |
| `logprobs` | boolean \| null | Whether to return log probabilities of the output tokens. |
| `top_logprobs` | integer \| null | How many top log probabilities to return at each token position (0-20). Requires `logprobs: true`. |
| `max_tokens` | integer \| null | Deprecated in favor of `max_completion_tokens`. The maximum number of tokens to generate. |
| `max_completion_tokens` | integer \| null | An upper bound for the number of tokens that can be generated for a completion. |
| `metadata` | object \| null | Set of 16 key-value pairs that can be attached to the object. |
| `modalities` | array \| null | Output types requested from the model (e.g. `["text"]` or `["text", "audio"]`). |
| `n` | integer \| null | How many chat completion choices to generate for each input message. |
| `parallel_tool_calls` | boolean, default: true | Whether to enable parallel function calling during tool use. |
| — | object | |
| `presence_penalty` | number \| null | Penalizes new tokens based on whether they appear in the text so far. |
| `reasoning_effort` | string \| null | Constrains effort on reasoning for reasoning models (o1, o3-mini, etc.). |
| `response_format` | object, one of | Specifies the format the model must output. |
| `seed` | integer \| null | If specified, the system will make a best effort to sample deterministically. |
| `service_tier` | string \| null | Specifies the processing type used for serving the request. |
| — | one of | |
| `store` | boolean \| null | Whether to store the output for model distillation / evals. |
| `stream` | boolean \| null | If true, partial message deltas will be sent as server-sent events. |
| — | object | |
| `temperature` | number \| null | Sampling temperature between 0 and 2. |
| `tool_choice` | one of | Controls which (if any) tool is called by the model: `none` = no tools, `auto` = model decides, `required` = must call a tool. |
| `tools` | array | A list of tools the model may call. |
| `top_p` | number \| null | Nucleus sampling: considers the results of the tokens with top_p probability mass. |
| `user` | string | A unique identifier representing your end-user, for abuse monitoring. |
| `web_search_options` | object | Options for the web search tool (when using built-in web search). |
| `messages` | array (minItems: 1, maxItems: 128) | Chat messages; may be supplied in place of `prompt`. |

Output

| Field | Type | Description |
|---|---|---|
| `id` | string | A unique identifier for the chat completion. |
| — | string | |
| `created` | integer | Unix timestamp (seconds) of when the completion was created. |
| `model` | string | The model used for the chat completion. |
| `choices` | array (minItems: 1) | |
| — | object | |
| — | string \| null | |
| — | string \| null | |

Streaming — Send a request with `stream: true` and receive server-sent events
The request body is identical to the Synchronous request above, with `stream: true` set. The response is a binary `text/event-stream` of server-sent events.

Batch — Send multiple requests in a single API call
The request body is an array of requests, each using the Synchronous request schema. Each response carries the same fields as the Synchronous response.
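The streaming variants above respond with `text/event-stream`. A minimal consumer sketch for such a stream (it assumes standard SSE framing, with each `data:` line carrying a JSON chunk under a `response` key and the stream terminated by `data: [DONE]`; verify these details against an actual response before relying on them):

```python
import json
from typing import Iterator


def parse_sse(raw: str) -> Iterator[dict]:
    """Yield the JSON payload of each SSE `data:` line, stopping at [DONE]."""
    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and comment/event fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # sentinel marking the end of the stream
        yield json.loads(payload)


# Example with a hand-written stream in the assumed chunk format:
sample = 'data: {"response": "Hel"}\n\ndata: {"response": "lo"}\n\ndata: [DONE]\n'
text = "".join(chunk["response"] for chunk in parse_sse(sample))
print(text)  # → Hello
```

In a real client you would accumulate the body incrementally rather than buffering the whole response, but the per-line framing logic is the same.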