Changelog

New updates and improvements at Cloudflare.


Aug 26, 2025

List all vectors in a Vectorize index with the new list-vectors operation
Vectorize
You can now list all vector identifiers in a Vectorize index using the new list-vectors operation. This enables bulk operations, auditing, and data migration workflows through paginated requests that maintain snapshot consistency.

The operation is available via Wrangler CLI and REST API. Refer to the list-vectors best practices guide for detailed usage guidance.
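
A minimal sketch of the paginated pattern, assuming each page returns a batch of identifiers plus an opaque cursor for the next page. The `VectorPage` shape, the `nextCursor` field name, and the `fetchPage` callback are illustrative assumptions, not the actual list-vectors response format; check the best practices guide for the real fields.

```typescript
// Hypothetical page shape: a batch of vector IDs plus a cursor for the next page.
interface VectorPage {
  ids: string[];
  nextCursor?: string; // assumed field name; undefined means no more pages
}

// Hypothetical callback that performs one list request (for example, via the REST API).
type FetchPage = (cursor?: string) => Promise<VectorPage>;

// Walk every page in order and collect all vector IDs.
async function listAllVectorIds(fetchPage: FetchPage): Promise<string[]> {
  const all: string[] = [];
  let cursor: string | undefined = undefined;
  do {
    const page: VectorPage = await fetchPage(cursor);
    all.push(...page.ids);
    cursor = page.nextCursor;
  } while (cursor !== undefined);
  return all;
}

// In-memory stand-in for a real index, used only to exercise the loop.
const fakePages: Record<string, VectorPage> = {
  start: { ids: ["a", "b"], nextCursor: "p2" },
  p2: { ids: ["c"] },
};
const fakeFetch: FetchPage = async (cursor) =>
  fakePages[cursor ?? "start"] ?? { ids: [] };
```

Passing the request logic in as a callback keeps the loop independent of whether pages come from Wrangler output, the REST API, or a test double.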

Aug 25, 2025

Manage and deploy your AI provider keys through Bring Your Own Key (BYOK) with AI Gateway, now powered by Cloudflare Secrets Store
Secrets Store AI Gateway SSL/TLS
Cloudflare Secrets Store is now integrated with AI Gateway, allowing you to store, manage, and deploy your AI provider keys in a secure and seamless configuration through Bring Your Own Key ↗. Instead of passing your AI provider keys directly in every request header, you can centrally manage each key with Secrets Store and deploy in your gateway configuration using only a reference, rather than passing the value in plain text.

You can now create a secret directly from your AI Gateway in the dashboard ↗ by navigating into your gateway -> Provider Keys -> Add.

You can also create your secret with the newly available ai_gateway scope via wrangler ↗, the Secrets Store dashboard ↗, or the API ↗.

Then, pass the key in the request header using its Secrets Store reference:
```
curl -X POST https://gateway.ai.cloudflare.com/v1/<ACCOUNT_ID>/my-gateway/anthropic/v1/messages \
 --header 'cf-aig-authorization: ANTHROPIC_KEY_1' \
 --header 'anthropic-version: 2023-06-01' \
 --header 'Content-Type: application/json' \
 --data '{"model": "claude-3-opus-20240229", "max_tokens": 1024, "messages": [{"role": "user", "content": "What is Cloudflare?"}]}'
```
Or, using JavaScript:
```
import Anthropic from '@anthropic-ai/sdk';


const anthropic = new Anthropic({
 apiKey: "ANTHROPIC_KEY_1",
 baseURL: "https://gateway.ai.cloudflare.com/v1/<ACCOUNT_ID>/my-gateway/anthropic",
});


const message = await anthropic.messages.create({
 model: 'claude-3-opus-20240229',
 messages: [{role: "user", content: "What is Cloudflare?"}],
 max_tokens: 1024
});
```
For more information, check out the blog ↗!

Aug 05, 2025

Agents SDK adds MCP Elicitation support, http-streamable support, task queues, email integration and more
Agents Workers
The latest releases of @cloudflare/agents ↗ bring major improvements to MCP transport protocol support and agent connectivity. Key updates include:

MCP elicitation support

MCP servers can now request user input during tool execution, enabling interactive workflows like confirmations, forms, and multi-step processes. This feature uses durable storage to preserve elicitation state even during agent hibernation, ensuring seamless user interactions across agent lifecycle events.
TypeScript
```
// Request user confirmation via elicitation
const confirmation = await this.elicitInput({
  message: `Are you sure you want to increment the counter by ${amount}?`,
  requestedSchema: {
    type: "object",
    properties: {
      confirmed: {
        type: "boolean",
        title: "Confirm increment",
        description: "Check to confirm the increment",
      },
    },
    required: ["confirmed"],
  },
});
```
Check out our demo ↗ to see elicitation in action.
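
Once the user responds, the tool has to decide whether to proceed. The sketch below assumes the result follows the MCP elicitation result shape (an `action` of accept, decline, or cancel, with optional `content`); the `ElicitResult` interface and `isConfirmed` helper are illustrative names, not part of the SDK.

```typescript
// Assumed result shape, modeled on the MCP elicitation result:
// the user can accept (with content), decline, or cancel.
interface ElicitResult {
  action: "accept" | "decline" | "cancel";
  content?: Record<string, unknown>;
}

// Only proceed when the user accepted AND ticked the "confirmed" box.
function isConfirmed(result: ElicitResult): boolean {
  return result.action === "accept" && result.content?.confirmed === true;
}

// Example: a declined prompt must never trigger the increment.
const declined: ElicitResult = { action: "decline" };
console.log(isConfirmed(declined)); // false
```

Checking both the action and the field matters because a user can accept the form while leaving the checkbox unticked.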

HTTP streamable transport for MCP

MCP now supports HTTP streamable transport, which is recommended over SSE. This transport type offers:
- Better performance: More efficient data streaming and reduced overhead
- Improved reliability: Enhanced connection stability and error recovery
- Automatic fallback: If streamable transport is not available, it gracefully falls back to SSE
TypeScript
```
export default MyMCP.serve("/mcp", {
  binding: "MyMCP",
});
```
The SDK automatically selects the best available transport method, gracefully falling back from streamable-http to SSE when needed.

Enhanced MCP connectivity

Significant improvements to MCP server connections and transport reliability:
- Auto transport selection: Automatically determines the best transport method, falling back from streamable-http to SSE as needed
- Improved error handling: Better connection state management and error reporting for MCP servers
- Reliable prop updates: Centralized agent property updates ensure consistency across different contexts
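
The auto transport selection described above can be pictured as a try-then-fall-back loop. This is an illustrative sketch only, not the SDK's implementation; `connectWithFallback` and the `Connect` callback are hypothetical names.

```typescript
type TransportName = "streamable-http" | "sse";

// Hypothetical connector: resolves on success, rejects if the transport is unavailable.
type Connect = (transport: TransportName) => Promise<void>;

// Try streamable-http first, then fall back to SSE; report which one succeeded.
async function connectWithFallback(connect: Connect): Promise<TransportName> {
  const order: TransportName[] = ["streamable-http", "sse"];
  let lastError: unknown;
  for (const transport of order) {
    try {
      await connect(transport);
      return transport;
    } catch (err) {
      lastError = err; // remember the failure and try the next transport
    }
  }
  throw lastError;
}

// Stand-in server that only speaks SSE, to exercise the fallback path.
const sseOnly: Connect = async (t) => {
  if (t !== "sse") throw new Error(`${t} not supported`);
};
```

Returning the transport name lets the caller log which path was taken, which helps when diagnosing servers that silently lack streamable-http support.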
Lightweight .queue for fast task deferral

You can use .queue() to enqueue background work, which is ideal for tasks like processing user messages or sending notifications.
TypeScript
```
class MyAgent extends Agent {
  doSomethingExpensive(payload) {
    // a long running process that you want to run in the background
  }

  async queueSomething() {
    await this.queue("doSomethingExpensive", somePayload); // this will NOT block further execution, and runs in the background
    await this.queue("doSomethingExpensive", someOtherPayload); // the callback will NOT run until the previous callback is complete
    // ... call as many times as you want
  }
}
```
Want to try it yourself? Just define a method like processMessage in your agent, and you’re ready to scale.
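
The ordering guarantee in the comments above (a callback does not start until the previous one completes) can be illustrated with a small in-memory FIFO. This is a conceptual sketch only; `SequentialQueue` is a hypothetical class, and the SDK's own queue may store and schedule tasks differently.

```typescript
// Minimal in-memory FIFO: each task starts only after the previous one settles.
class SequentialQueue {
  private tail: Promise<void> = Promise.resolve();

  // Returns immediately; the task runs later, in enqueue order.
  enqueue(task: () => Promise<void>): void {
    this.tail = this.tail.then(task).catch((err) => {
      console.error("task failed:", err); // keep the chain alive after a failure
    });
  }

  // Resolves once everything enqueued so far has finished.
  drained(): Promise<void> {
    return this.tail;
  }
}

const order: string[] = [];
const q = new SequentialQueue();
q.enqueue(async () => {
  await new Promise((resolve) => setTimeout(resolve, 20)); // slow first task
  order.push("first");
});
q.enqueue(async () => {
  order.push("second"); // fast, but still waits for "first"
});
```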

New email adapter

Want to build an AI agent that can receive and respond to emails automatically? With the new email adapter and onEmail lifecycle method, now you can.
TypeScript
```
import PostalMime from "postal-mime"; // third-party MIME parser used below

export class EmailAgent extends Agent {
  async onEmail(email: AgentEmail) {
    const raw = await email.getRaw();
    const parsed = await PostalMime.parse(raw);

    // create a response based on the email contents
    // and then send a reply

    await this.replyToEmail(email, {
      fromName: "Email Agent",
      body: `Thanks for your email! You've sent us "${parsed.subject}". We'll process it shortly.`,
    });
  }
}
```
You route incoming mail like this:
TypeScript
```
export default {
  async email(email, env) {
    await routeAgentEmail(email, env, {
      resolver: createAddressBasedEmailResolver("EmailAgent"),
    });
  },
};
```
You can find a full example here ↗.

Automatic context wrapping for custom methods

Custom methods are now automatically wrapped with the agent's context, so calling getCurrentAgent() should work regardless of where in an agent's lifecycle it's called. Previously this would not work on RPC calls, but now just works out of the box.
TypeScript
```
export class MyAgent extends Agent {
  async suggestReply(message) {
    // getCurrentAgent() now correctly works, even when called inside an RPC method
    const { agent } = getCurrentAgent()!;
    return generateText({
      prompt: `Suggest a reply to: "${message}" from "${agent.name}"`,
      tools: [replyWithEmoji],
    });
  }
}
```
Try it out and tell us what you build!

Aug 05, 2025

Cloudflare Sandbox SDK adds streaming, code interpreter, Git support, process control and more
Agents Workers
We’ve shipped a major release for the @cloudflare/sandbox ↗ SDK, turning it into a full-featured, container-based execution platform that runs securely on Cloudflare Workers.

This update adds live streaming of output, persistent Python and JavaScript code interpreters with rich output support (charts, tables, HTML, JSON), file system access, Git operations, full background process control, and the ability to expose running services via public URLs.

This makes it ideal for building AI agents, CI runners, cloud REPLs, data analysis pipelines, or full developer tools — all without managing infrastructure.

Code interpreter (Python, JS, TS)

Create persistent code contexts with support for rich visual + structured outputs.

createCodeContext(options)

Creates a new code execution context with persistent state.
TypeScript
```
// Create a Python context
const pythonCtx = await sandbox.createCodeContext({ language: "python" });

// Create a JavaScript context
const jsCtx = await sandbox.createCodeContext({ language: "javascript" });
```
Options:
- language: Programming language ('python' | 'javascript' | 'typescript')
- cwd: Working directory (default: /workspace)
- envVars: Environment variables for the context
runCode(code, options)

Executes code with optional streaming callbacks.
TypeScript
```
// Simple execution
const execution = await sandbox.runCode('print("Hello World")', {
  context: pythonCtx,
});

// With streaming callbacks
await sandbox.runCode(
  `
import time

for i in range(5):
    print(f"Step {i}")
    time.sleep(1)
`,
  {
    context: pythonCtx,
    onStdout: (output) => console.log("Real-time:", output.text),
    onResult: (result) => console.log("Result:", result),
  },
);
```
Options:
- language: Programming language ('python' | 'javascript' | 'typescript')
- cwd: Working directory (default: /workspace)
- envVars: Environment variables for the context
Real-time streaming output

runCodeStream(code) returns a streaming response for real-time processing.
TypeScript
```
const stream = await sandbox.runCodeStream(
  "import time; [print(i) for i in range(10)]",
);
// Process the stream as needed
```
Rich output handling

Interpreter outputs are auto-formatted and returned in multiple formats:
- text
- html (e.g., Pandas tables)
- png, svg (e.g., Matplotlib charts)
- json (structured data)
- chart (parsed visualizations)
TypeScript
```
const result = await sandbox.runCode(
  `
import seaborn as sns
import matplotlib.pyplot as plt

data = sns.load_dataset("flights")
pivot = data.pivot(index="month", columns="year", values="passengers")
sns.heatmap(pivot, annot=True, fmt="d")
plt.title("Flight Passengers")
plt.show()

pivot.to_dict()
`,
  { context: pythonCtx },
);

if (result.png) {
  console.log("Chart output:", result.png);
}
```
Preview URLs from Exposed Ports

Start background processes and expose them with live URLs.
TypeScript
```
await sandbox.startProcess("python -m http.server 8000");
const preview = await sandbox.exposePort(8000);

console.log("Live preview at:", preview.url);
```
Full process lifecycle control

Start, inspect, and terminate long-running background processes.
TypeScript
```
const process = await sandbox.startProcess("node server.js");
console.log(`Started process ${process.id} with PID ${process.pid}`);

// Monitor the process
const logStream = await sandbox.streamProcessLogs(process.id);
for await (const log of parseSSEStream<LogEvent>(logStream)) {
  console.log(`Server: ${log.data}`);
}
```
- listProcesses() - List all running processes
- getProcess(id) - Get detailed process status
- killProcess(id, signal) - Terminate specific processes
- killAllProcesses() - Kill all processes
- streamProcessLogs(id, options) - Stream logs from running processes
- getProcessLogs(id) - Get accumulated process output
Git integration

Clone Git repositories directly into the sandbox.
TypeScript
```
await sandbox.gitCheckout("https://github.com/user/repo", {
  branch: "main",
  targetDir: "my-project",
});
```
Sandboxes are still experimental. We're using them to explore how isolated, container-like workloads might scale on Cloudflare — and to help define the developer experience around them.

Aug 05, 2025

OpenAI open models now available on Workers AI
Agents Workers AI
We're thrilled to be a Day 0 partner with OpenAI ↗ to bring their latest open models ↗ to Workers AI, including support for Responses API, Code Interpreter, and Web Search (coming soon).

Get started with the new models at @cf/openai/gpt-oss-120b and @cf/openai/gpt-oss-20b. Check out the blog ↗ for more details about the new models, and the gpt-oss-120b and gpt-oss-20b model pages for more information about pricing and context windows.

Responses API

If you call the model through:
- Workers Binding, it will accept/return Responses API – env.AI.run("@cf/openai/gpt-oss-120b")
- REST API on /run endpoint, it will accept/return Responses API – https://api.cloudflare.com/client/v4/accounts/<account_id>/ai/run/@cf/openai/gpt-oss-120b
- REST API on new /responses endpoint, it will accept/return Responses API – https://api.cloudflare.com/client/v4/accounts/<account_id>/ai/v1/responses
- REST API for OpenAI Compatible endpoint, it will return Chat Completions (coming soon) – https://api.cloudflare.com/client/v4/accounts/<account_id>/ai/v1/chat/completions
```
curl https://api.cloudflare.com/client/v4/accounts/<account_id>/ai/v1/responses \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $CLOUDFLARE_API_KEY" \
  -d '{
    "model": "@cf/openai/gpt-oss-120b",
    "reasoning": {"effort": "medium"},
    "input": [
      {
        "role": "user",
        "content": "What are the benefits of open-source models?"
      }
    ]
  }'
```
Code Interpreter

The model is natively trained to support stateful code execution, and we've implemented support for this feature using our Sandbox SDK ↗ and Containers ↗. Cloudflare's Developer Platform is uniquely positioned to support this feature, so we're very excited to bring our products together to support this new use case.

Web Search (coming soon)

We are working to implement Web Search for the model, where users can bring their own Exa API Key so the model can browse the Internet.

Jul 28, 2025

Introducing pricing for the Browser Rendering API — $0.09 per browser hour

Browser Rendering

We’ve launched pricing for Browser Rendering, including a free tier and a pay-as-you-go model that scales with your needs. Starting August 20, 2025, Cloudflare will begin billing for Browser Rendering.

There are two ways to use Browser Rendering. Depending on the method you use, here’s how billing will work:

- REST API: Charged for Duration only ($/browser hour)
- Workers Bindings: Charged for both Duration and Concurrency ($/browser hour and # of concurrent browsers)

Included usage and pricing by plan

Plan	Included duration	Included concurrency	Price (beyond included)
Workers Free	10 minutes per day	3 concurrent browsers	N/A
Workers Paid	10 hours per month	10 concurrent browsers (averaged monthly)	REST API: $0.09 per additional browser hour. Workers Bindings: $0.09 per additional browser hour, plus $2.00 per additional concurrent browser.

What you need to know:

- Workers Free Plan: 10 minutes of browser usage per day with 3 concurrent browsers at no charge.
- Workers Paid Plan: 10 hours of browser usage per month with 10 concurrent browsers (averaged monthly) at no charge. Additional usage is charged as shown above.

You can monitor usage via the Cloudflare dashboard ↗. Go to Compute (Workers) > Browser Rendering.

If you've been using Browser Rendering and do not wish to incur charges, ensure your usage stays within your plan's included usage. To estimate costs, take a look at these example pricing scenarios.
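
As a rough worked example of the Workers Paid arithmetic, the sketch below computes the monthly overage from the figures in the pricing table. The function name and parameters are illustrative, and it simplifies how concurrency is averaged and how partial hours are rounded, so treat the result as an estimate rather than a bill.

```typescript
type Method = "rest" | "bindings";

// Workers Paid figures from the pricing table, in cents to avoid float rounding.
const INCLUDED_HOURS = 10;
const INCLUDED_CONCURRENCY = 10;
const CENTS_PER_EXTRA_HOUR = 9; // $0.09 per additional browser hour
const CENTS_PER_EXTRA_BROWSER = 200; // $2.00 per additional concurrent browser

// Estimate the monthly overage in cents for the Workers Paid plan.
function estimateOverageCents(
  method: Method,
  browserHours: number,
  avgConcurrentBrowsers = 0,
): number {
  const extraHours = Math.max(0, browserHours - INCLUDED_HOURS);
  let cents = extraHours * CENTS_PER_EXTRA_HOUR;
  if (method === "bindings") {
    // Concurrency is only billed for Workers Bindings.
    const extraBrowsers = Math.max(0, avgConcurrentBrowsers - INCLUDED_CONCURRENCY);
    cents += extraBrowsers * CENTS_PER_EXTRA_BROWSER;
  }
  return Math.round(cents);
}

console.log(estimateOverageCents("rest", 50)); // 40 extra hours * 9 = 360 cents ($3.60)
console.log(estimateOverageCents("bindings", 50, 12)); // 360 + 2 * 200 = 760 cents ($7.60)
```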

Jul 22, 2025

Browser Rendering now supports local development
Browser Rendering
You can now run Browser Rendering locally using npx wrangler dev, which spins up a browser directly on your machine before you deploy to Cloudflare's global network. By running tests locally, you can quickly develop, debug, and test changes without needing to deploy or worry about usage costs.

Get started with this example guide that shows how to use Cloudflare's fork of Puppeteer (you can also use Playwright) to take screenshots of webpages and store the results in Workers KV.

Jul 08, 2025

Faster indexing and new Jobs view in AutoRAG
AI Search
You can now expect 3-5× faster indexing in AutoRAG, and with it, a brand new Jobs view to help you monitor indexing progress.

With each AutoRAG, indexing jobs are automatically triggered to sync your data source (i.e. R2 bucket) with your Vectorize index, ensuring new or updated files are reflected in your query results. You can also trigger jobs manually via the Sync API or by clicking “Sync index” in the dashboard.

With the new jobs observability, you can now:
- View the status, job ID, source, start time, duration and last sync time for each indexing job
- Inspect real-time logs of job events (e.g. Starting indexing data source...)
- See a history of past indexing jobs under the Jobs tab of your AutoRAG
This makes it easier to understand what’s happening behind the scenes.

Coming soon: We’re adding APIs to programmatically check indexing status, making it even easier to integrate AutoRAG into your workflows.

Try it out today on the Cloudflare dashboard ↗.

Jul 01, 2025

Introducing Pay Per Crawl (private beta)
AI Crawl Control
We are introducing a new feature of AI Crawl Control — Pay Per Crawl. Pay Per Crawl enables site owners to require payment from AI crawlers every time the crawlers access their content, thereby fostering a fairer Internet by enabling site owners to control and monetize how their content gets used by AI.

For Site Owners:
- Set pricing and select which crawlers to charge for content access
- Manage payments via Stripe
- Monitor analytics on successful content deliveries
For AI Crawler Owners:
- Use HTTP headers to request and accept pricing
- Receive clear confirmations on charges for accessed content
Learn more in the Pay Per Crawl documentation.

Jul 01, 2025

AI Crawl Control refresh
AI Crawl Control
We redesigned the AI Crawl Control dashboard to provide more intuitive and granular control over AI crawlers.
- From the new AI Crawlers tab: block specific AI crawlers.
- From the new Metrics tab: view AI Crawl Control metrics.
To get started, explore:
- Manage AI crawlers.
- Analyze AI traffic.

Jun 25, 2025

Run AI-generated code on-demand with Code Sandboxes (new)
Agents Workers Workflows
AI is supercharging app development for everyone, but we need a safe way to run untrusted, LLM-written code. We’re introducing Sandboxes ↗, which let your Worker run actual processes in a secure, container-based environment.
TypeScript
```
import { getSandbox } from "@cloudflare/sandbox";
export { Sandbox } from "@cloudflare/sandbox";

export default {
  async fetch(request: Request, env: Env) {
    const sandbox = getSandbox(env.Sandbox, "my-sandbox");
    return sandbox.exec("ls", ["-la"]);
  },
};
```
Methods
- exec(command: string, args: string[], options?: { stream?: boolean }): Execute a command in the sandbox.
- gitCheckout(repoUrl: string, options: { branch?: string; targetDir?: string; stream?: boolean }): Checkout a git repository in the sandbox.
- mkdir(path: string, options: { recursive?: boolean; stream?: boolean }): Create a directory in the sandbox.
- writeFile(path: string, content: string, options: { encoding?: string; stream?: boolean }): Write content to a file in the sandbox.
- readFile(path: string, options: { encoding?: string; stream?: boolean }): Read content from a file in the sandbox.
- deleteFile(path: string, options?: { stream?: boolean }): Delete a file from the sandbox.
- renameFile(oldPath: string, newPath: string, options?: { stream?: boolean }): Rename a file in the sandbox.
- moveFile(sourcePath: string, destinationPath: string, options?: { stream?: boolean }): Move a file from one location to another in the sandbox.
- ping(): Ping the sandbox.
Sandboxes are still experimental. We're using them to explore how isolated, container-like workloads might scale on Cloudflare — and to help define the developer experience around them.

You can try it today from your Worker, with just a few lines of code. Let us know what you build.

Jun 19, 2025

View custom metadata in responses and guide AI-search with context in AutoRAG
AI Search
In AutoRAG, you can now view your object's custom metadata in the response from /search and /ai-search, and optionally add a context field in the custom metadata of an object to provide additional guidance for AI-generated answers.

You can add custom metadata to an object when uploading it to your R2 bucket.

Object's custom metadata in search responses

When you run a search, AutoRAG now returns any custom metadata associated with the object. This metadata appears in the response inside attributes, under file, and can be used for downstream processing.

For example, the attributes section of your search response may look like:
```
{
  "attributes": {
    "timestamp": 1750001460000,
    "folder": "docs/",
    "filename": "launch-checklist.md",
    "file": {
      "url": "https://wiki.company.com/docs/launch-checklist",
      "context": "A checklist for internal launch readiness, including legal, engineering, and marketing steps."
    }
  }
}
```
Add a context field to guide LLM answers

When you include a custom metadata field named context, AutoRAG attaches that value to each chunk of the file. When you run an /ai-search query, this context is passed to the LLM and can be used as additional input when generating an answer.

We recommend using the context field to describe supplemental information you want the LLM to consider, such as a summary of the document or a source URL. If you have several different metadata attributes, you can join them together however you choose within the context string.

For example:
```
{
  "context": "summary: 'Checklist for internal product launch readiness, including legal, engineering, and marketing steps.'; url: 'https://wiki.company.com/docs/launch-checklist'"
}
```
This gives you more control over how your content is interpreted, without requiring you to modify the original contents of the file.
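
One way to assemble such a string from several attributes is a small helper like the sketch below. `buildContext` is a hypothetical function, and the "key: 'value'" layout simply mirrors the example above; the format is free-form, so any layout the LLM can read would work.

```typescript
// Join arbitrary metadata fields into one context string, mirroring the
// "key: 'value'; key: 'value'" layout of the example above.
function buildContext(fields: Record<string, string>): string {
  return Object.entries(fields)
    .map(([key, value]) => `${key}: '${value}'`)
    .join("; ");
}

// The resulting object is what you would attach as custom metadata on upload.
const customMetadata = {
  context: buildContext({
    summary: "Checklist for internal product launch readiness.",
    url: "https://wiki.company.com/docs/launch-checklist",
  }),
};
```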

Learn more in AutoRAG's metadata filtering documentation.

Jun 19, 2025

Filter your AutoRAG search by file name
AI Search
In AutoRAG, you can now filter by an object's file name using the filename attribute, giving you more control over which files are searched for a given query.

This is useful when your application has already determined which files should be searched. For example, you might query a PostgreSQL database to get a list of files a user has access to based on their permissions, and then use that list to limit what AutoRAG retrieves.

For example, your search query may look like:
JavaScript
```
const response = await env.AI.autorag("my-autorag").search({
  query: "what is the project deadline?",
  filters: {
    type: "eq",
    key: "filename",
    value: "project-alpha-roadmap.md",
  },
});
```
This allows you to connect your application logic with AutoRAG's retrieval process, making it easy to control what gets searched without needing to reindex or modify your data.
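
When the permission lookup returns several files, the list has to be turned into a single filter. The sketch below assumes AutoRAG accepts a compound filter of the form `{ type: "or", filters: [...] }`; that wrapper is an assumption to verify against the metadata filtering documentation, and `filenameFilter` is a hypothetical helper.

```typescript
// Filter shapes: the "eq" comparison comes from the example above; the
// compound "or" wrapper is an assumption to verify against the filtering docs.
interface ComparisonFilter {
  type: "eq";
  key: string;
  value: string;
}
interface CompoundFilter {
  type: "or";
  filters: ComparisonFilter[];
}

// Turn a list of permitted file names into a single filter object.
function filenameFilter(allowed: string[]): ComparisonFilter | CompoundFilter {
  if (allowed.length === 0) {
    throw new Error("at least one file name is required");
  }
  const comparisons = allowed.map(
    (name): ComparisonFilter => ({ type: "eq", key: "filename", value: name }),
  );
  // A single file needs no wrapper.
  return comparisons.length === 1
    ? comparisons[0]
    : { type: "or", filters: comparisons };
}
```

Throwing on an empty list is deliberate: silently sending no filter would search every file, which is the opposite of what a permission check intends.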

Learn more in AutoRAG's metadata filtering documentation.

Jun 03, 2025

AI Gateway adds OpenAI compatible endpoint
AI Gateway
Users can now use an OpenAI Compatible endpoint in AI Gateway to easily switch between providers, while keeping the exact same request and response formats. We're launching now with the chat completions endpoint, with the embeddings endpoint coming up next.

To get started, use the OpenAI compatible chat completions endpoint URL with your own account id and gateway id and switch between providers by changing the model and apiKey parameters.
OpenAI SDK Example
```
import OpenAI from "openai";
const client = new OpenAI({
  apiKey: "YOUR_PROVIDER_API_KEY", // Provider API key
  baseURL:
    "https://gateway.ai.cloudflare.com/v1/{account_id}/{gateway_id}/compat",
});

const response = await client.chat.completions.create({
  model: "google-ai-studio/gemini-2.0-flash",
  messages: [{ role: "user", content: "What is Cloudflare?" }],
});

console.log(response.choices[0].message.content);
```
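
In the example above, the model parameter carries the provider as a prefix. If your application needs the two parts separately (for instance, to pick the matching API key), a helper along these lines could split them. `splitModelId` is a hypothetical function, not part of any SDK.

```typescript
// Split a compat-style model identifier into provider and model name.
// Splits on the first "/" only, since model names may themselves contain slashes.
function splitModelId(id: string): { provider: string; model: string } {
  const slash = id.indexOf("/");
  if (slash <= 0 || slash === id.length - 1) {
    throw new Error(`expected "provider/model", got "${id}"`);
  }
  return { provider: id.slice(0, slash), model: id.slice(slash + 1) };
}

const parts = splitModelId("google-ai-studio/gemini-2.0-flash");
// parts.provider is "google-ai-studio" and parts.model is "gemini-2.0-flash"
```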
Additionally, the OpenAI Compatible endpoint can be combined with our Universal Endpoint to add fallbacks across multiple providers. That means AI Gateway will return every response in the same standardized format, no extra parsing logic required!

Learn more in the OpenAI Compatibility documentation.

May 28, 2025

Playwright MCP server is now compatible with Browser Rendering
Browser Rendering
We're excited to share that you can now use the Playwright MCP ↗ server with Browser Rendering.

Once you deploy the server, you can use any MCP client with it to interact with Browser Rendering. This allows you to run AI models that can automate browser tasks, such as taking screenshots, filling out forms, or scraping data.

Playwright MCP is available as an npm package at @cloudflare/playwright-mcp ↗. To install it, run the command for your package manager:
- npm: npm i -D @cloudflare/playwright-mcp
- yarn: yarn add -D @cloudflare/playwright-mcp
- pnpm: pnpm add -D @cloudflare/playwright-mcp
Deploying the server is then as easy as:
TypeScript
```
import { env } from "cloudflare:workers";
import { createMcpAgent } from "@cloudflare/playwright-mcp";

export const PlaywrightMCP = createMcpAgent(env.BROWSER);
export default PlaywrightMCP.mount("/sse");
```
Check out the full code at GitHub ↗.

Learn more about Playwright MCP in our documentation.

Apr 23, 2025

Metadata filtering and multitenancy support in AutoRAG
AI Search
You can now filter AutoRAG search results by folder and timestamp using metadata filtering to narrow down the scope of your query.

This makes it easy to build multitenant experiences where each user can only access their own data. By organizing your content into per-tenant folders and applying a folder filter at query time, you ensure that each tenant retrieves only their own documents.

Example folder structure:
Terminal window
```
customer-a/logs/
customer-a/contracts/
customer-b/contracts/
```
Example query:
JavaScript
```
const response = await env.AI.autorag("my-autorag").search({
  query: "When did I sign my agreement contract?",
  filters: {
    type: "eq",
    key: "folder",
    value: "customer-a/contracts/",
  },
});
```
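
In a multitenant application the folder value is usually derived from the signed-in tenant rather than hard-coded. A minimal sketch of such a helper follows; `tenantFolderFilter` is a hypothetical function, and the `tenant/subfolder/` layout matches the example folder structure above.

```typescript
interface FolderFilter {
  type: "eq";
  key: "folder";
  value: string;
}

// Build the folder filter for one tenant. Rejecting "/" and ".." in the inputs
// keeps a caller-supplied ID from pointing at another tenant's folder.
function tenantFolderFilter(tenantId: string, subfolder: string): FolderFilter {
  for (const part of [tenantId, subfolder]) {
    if (part === "" || part.includes("/") || part.includes("..")) {
      throw new Error(`invalid path segment: "${part}"`);
    }
  }
  return { type: "eq", key: "folder", value: `${tenantId}/${subfolder}/` };
}

const filters = tenantFolderFilter("customer-a", "contracts");
// filters.value is "customer-a/contracts/"
```

Validating the segments before building the path is the important design choice here: the filter is the only thing separating tenants, so the tenant ID should come from your authentication layer, never from user input directly.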
You can use metadata filtering by creating a new AutoRAG or reindexing existing data. To reindex all content in an existing AutoRAG, update any chunking setting and select Sync index. Metadata filtering is available for all data indexed on or after April 21, 2025.

If you are new to AutoRAG, start with the AutoRAG Get started guide.

Apr 11, 2025

Workers AI for Developer Week - faster inference, new models, async batch API, expanded LoRA support
Workers AI
Happy Developer Week 2025! Workers AI is excited to announce a couple of new features and improvements available today. Check out our blog ↗ for all the announcement details.

Faster inference + New models

We’re rolling out some in-place improvements to our models that can help speed up inference by 2-4x! Users of the models below will enjoy an automatic speed boost starting today:
- @cf/meta/llama-3.3-70b-instruct-fp8-fast gets a speed boost of 2-4x, leveraging techniques like speculative decoding, prefix caching, and an updated inference backend.
- @cf/baai/bge-small-en-v1.5, @cf/baai/bge-base-en-v1.5, @cf/baai/bge-large-en-v1.5 get an updated back end, which should improve inference times by 2x.
  - With the bge models, we’re also announcing a new parameter called pooling which can take cls or mean as options. We highly recommend using pooling: cls which will help generate more accurate embeddings. However, embeddings generated with cls pooling are not backwards compatible with mean pooling. For this to not be a breaking change, the default remains as mean pooling. Please specify pooling: cls to enjoy more accurate embeddings going forward.
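
To see why the two pooling modes are not interchangeable, the toy sketch below shows what each strategy computes from per-token vectors. It is a conceptual illustration with made-up numbers, not Workers AI's internal code: cls pooling takes the first ([CLS]) token's vector, while mean pooling averages every token's vector.

```typescript
// Token embeddings for one input: one row per token, first row is the [CLS] token.
type TokenEmbeddings = number[][];

// cls pooling: use the first token's vector as the sentence embedding.
function clsPool(tokens: TokenEmbeddings): number[] {
  return [...tokens[0]];
}

// mean pooling: average each dimension across all tokens.
function meanPool(tokens: TokenEmbeddings): number[] {
  const dims = tokens[0].length;
  const sums = new Array<number>(dims).fill(0);
  for (const row of tokens) {
    row.forEach((v, i) => (sums[i] += v));
  }
  return sums.map((s) => s / tokens.length);
}

const toy: TokenEmbeddings = [
  [1, 0],
  [3, 2],
  [5, 4],
];
console.log(clsPool(toy)); // [1, 0]
console.log(meanPool(toy)); // [3, 2]
```

Because the same input yields different vectors under each mode, embeddings stored with one mode should not be compared against queries embedded with the other.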
We’re also excited to launch a few new models in our catalog to help round out your experience with Workers AI. We’ll be deprecating some older models in the future, so stay tuned for a deprecation announcement. Today’s new models include:
- @cf/mistralai/mistral-small-3.1-24b-instruct: a 24B parameter model achieving state-of-the-art capabilities comparable to larger models, with support for vision and tool calling.
- @cf/google/gemma-3-12b-it: well-suited for a variety of text generation and image understanding tasks, including question answering, summarization and reasoning, with a 128K context window, and multilingual support in over 140 languages.
- @cf/qwen/qwq-32b: a medium-sized reasoning model, which is capable of achieving competitive performance against state-of-the-art reasoning models, e.g., DeepSeek-R1, o1-mini.
- @cf/qwen/qwen2.5-coder-32b-instruct: the current state-of-the-art open-source code LLM, with its coding abilities matching those of GPT-4o.
Batch Inference

Introducing a new batch inference feature that allows you to send us an array of requests, which we will fulfill as fast as possible and send them back as an array. This is really helpful for large workloads such as summarization, embeddings, etc. where you don’t have a human-in-the-loop. Using the batch API will guarantee that your requests are fulfilled eventually, rather than erroring out if we don’t have enough capacity at a given time.

Check out the tutorial to get started! Models that support batch inference today include:
Expanded LoRA support

We’ve upgraded our LoRA experience to include 8 newer models, and can support ranks of up to 32 with a 300MB safetensors file limit (previously limited to rank of 8 and 100MB safetensors). Check out our LoRAs page to get started. Models that support LoRAs now include:

Apr 07, 2025

Build MCP servers with the Agents SDK
Agents Workers
The Agents SDK now includes built-in support for building remote MCP (Model Context Protocol) servers directly as part of your Agent. This allows you to easily create and manage MCP servers, without the need for additional infrastructure or configuration.

The SDK includes a new McpAgent class that extends the Agent class and allows you to expose resources and tools over the MCP protocol. It also supports authorization and authentication to enable remote MCP servers.
JavaScript

    export class MyMCP extends McpAgent {
      server = new McpServer({
        name: "Demo",
        version: "1.0.0",
      });

      async init() {
        this.server.resource(`counter`, `mcp://resource/counter`, (uri) => {
          // ...
        });

        this.server.tool(
          "add",
          "Add two numbers together",
          { a: z.number(), b: z.number() },
          async ({ a, b }) => {
            // ...
          },
        );
      }
    }

TypeScript

    export class MyMCP extends McpAgent<Env> {
      server = new McpServer({
        name: "Demo",
        version: "1.0.0",
      });

      async init() {
        this.server.resource(`counter`, `mcp://resource/counter`, (uri) => {
          // ...
        });

        this.server.tool(
          "add",
          "Add two numbers together",
          { a: z.number(), b: z.number() },
          async ({ a, b }) => {
            // ...
          },
        );
      }
    }

See the example ↗ for the full code and as the basis for building your own MCP servers, and the client example ↗ for how to build an Agent that acts as an MCP client.

To learn more, review the announcement blog ↗ as part of Developer Week 2025.

Agents SDK updates

We've made a number of improvements to the Agents SDK, including:
- Support for building MCP servers with the new McpAgent class.
- The ability to export the current agent, request and WebSocket connection context using import { context } from "agents", allowing you to minimize or avoid direct dependency injection when calling tools.
- Fixed a bug that prevented query parameters from being sent to the Agent server from the useAgent React hook.
- Automatically converting the agent name in useAgent or useAgentChat to kebab-case to ensure it matches the naming convention expected by routeAgentRequest.
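
For reference, a kebab-case conversion of the kind mentioned in the last item can be sketched as follows. This is an illustration of the naming convention, not the SDK's exact implementation, and `toKebabCase` is a hypothetical name.

```typescript
// Convert names like "MyAgent" or "my_agent" to "my-agent".
function toKebabCase(name: string): string {
  return name
    .replace(/([a-z0-9])([A-Z])/g, "$1-$2") // split lower-to-upper boundaries
    .replace(/[_\s]+/g, "-") // underscores and whitespace become hyphens
    .toLowerCase();
}

console.log(toKebabCase("MyAgent")); // my-agent
console.log(toKebabCase("EmailAgent")); // email-agent
```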
To install or update the Agents SDK, run npm i agents@latest in an existing project, or explore the agents-starter project:
Terminal window
```
npm create cloudflare@latest -- --template cloudflare/agents-starter
```
See the full release notes and changelog on the Agents SDK repository ↗.

Apr 07, 2025

Create fully-managed RAG pipelines for your AI applications with AutoRAG
AI Search Vectorize
AutoRAG is now in open beta, making it easy for you to build fully-managed retrieval-augmented generation (RAG) pipelines without managing infrastructure. Just upload your docs to R2, and AutoRAG handles the rest: embeddings, indexing, retrieval, and response generation via API.

With AutoRAG, you can:
- Customize your pipeline: Choose from Workers AI models, configure chunking strategies, edit system prompts, and more.
- Instant setup: AutoRAG provisions everything you need, from Vectorize and AI Gateway to pipeline logic, so you can go from zero to a working RAG pipeline in seconds.
- Keep your index fresh: AutoRAG continuously syncs your index with your data source to ensure responses stay accurate and up to date.
- Ask questions: Query your data and receive grounded responses via a Workers binding or API.
Whether you're building internal tools, AI-powered search, or a support assistant, AutoRAG gets you from idea to deployment in minutes.
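A minimal sketch of the "Ask questions" step through a Workers binding. It assumes the Workers AI binding exposes `autorag(name).aiSearch({ query })` and that the result carries a `response` string; check the AutoRAG documentation for the exact names and fields. The interfaces and the `askAutoRag` helper are hypothetical stand-ins so the example can run outside the Workers runtime.

```typescript
// Assumed, simplified shapes standing in for the real binding types.
interface AutoRagAnswer {
  response: string;
}
interface AutoRagInstance {
  aiSearch(input: { query: string }): Promise<AutoRagAnswer>;
}
interface AiBinding {
  autorag(name: string): AutoRagInstance;
}

// Hypothetical helper: validates the question, queries the named AutoRAG
// instance, and returns only the generated answer text.
async function askAutoRag(
  ai: AiBinding,
  instanceName: string,
  question: string,
): Promise<string> {
  const query = question.trim();
  if (query.length === 0) {
    throw new Error("question must not be empty");
  }
  const answer = await ai.autorag(instanceName).aiSearch({ query });
  return answer.response;
}
```

Inside a Worker, this would be called as `await askAutoRag(env.AI, "my-autorag", "How do I rotate keys?")`, where the binding name `AI` and the instance name are placeholders.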

Get started in the Cloudflare dashboard ↗ or check out the guide for instructions on how to build your RAG pipeline today.

Apr 07, 2025

Browser Rendering REST API is Generally Available, with new endpoints and a free tier
Browser Rendering
We’re excited to announce Browser Rendering is now available on the Workers Free plan ↗, making it even easier to prototype and experiment with web search and headless browser use-cases when building applications on Workers.

The Browser Rendering REST API is now Generally Available, allowing you to control browser instances from outside of Workers applications. We've added three new endpoints to help automate more browser tasks:
- Extract structured data – Use /json to retrieve structured data from a webpage.
- Retrieve links – Use /links to pull all links from a webpage.
- Convert to Markdown – Use /markdown to convert webpage content into Markdown format.
For example, to fetch the Markdown representation of a webpage:
Markdown example
```
curl -X 'POST' 'https://api.cloudflare.com/client/v4/accounts/<accountId>/browser-rendering/markdown' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <apiToken>' \
  -d '{
    "url": "https://example.com"
  }'
```
For the full list of endpoints, check out our REST API documentation. You can also interact with Browser Rendering via the Cloudflare TypeScript SDK ↗.
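The same call can be prepared from TypeScript. The sketch below mirrors the curl example above and assumes `/links` accepts the same `{ "url": ... }` body as `/markdown`; the `/json` endpoint needs additional fields describing what to extract, so it is left out. The helper name and types are hypothetical.

```typescript
// Endpoints that, by assumption, take a plain { url } body.
type SimpleEndpoint = "links" | "markdown";

interface PreparedRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Hypothetical helper: builds the URL and fetch options without sending anything.
function prepareBrowserRenderingRequest(
  accountId: string,
  apiToken: string,
  endpoint: SimpleEndpoint,
  targetUrl: string,
): PreparedRequest {
  const base = "https://api.cloudflare.com/client/v4/accounts";
  return {
    url: `${base}/${encodeURIComponent(accountId)}/browser-rendering/${endpoint}`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiToken}`,
      },
      body: JSON.stringify({ url: targetUrl }),
    },
  };
}
```

To send it, pass the pieces to `fetch`: `const { url, init } = prepareBrowserRenderingRequest(accountId, apiToken, "links", "https://example.com"); const res = await fetch(url, init);`.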

We also recently landed support for Playwright in Browser Rendering for browser automation from Cloudflare Workers, in addition to Puppeteer, giving you more flexibility to test across different browser environments.

Visit the Browser Rendering docs to learn more about how to use headless browsers in your applications.

Apr 04, 2025

Playwright for Browser Rendering now available

Browser Rendering

We're excited to share that you can now use Playwright's browser automation capabilities ↗ from Cloudflare Workers.

Playwright ↗ is an open-source package developed by Microsoft for browser automation; it's commonly used to write software tests, debug applications, create screenshots, and crawl pages. As we did with Puppeteer, we forked ↗ Playwright and modified it to be compatible with Cloudflare Workers and Browser Rendering ↗.

Below is an example of how to use Playwright with Browser Rendering to test a TODO application using assertions:

```
import { launch, type BrowserWorker } from "@cloudflare/playwright";
import { expect } from "@cloudflare/playwright/test";

interface Env {
  MYBROWSER: BrowserWorker;
}

export default {
  async fetch(request: Request, env: Env) {
    const browser = await launch(env.MYBROWSER);
    const page = await browser.newPage();

    await page.goto("https://demo.playwright.dev/todomvc");

    const TODO_ITEMS = [
      "buy some cheese",
      "feed the cat",
      "book a doctors appointment",
    ];

    // Add each item through the input field.
    const newTodo = page.getByPlaceholder("What needs to be done?");
    for (const item of TODO_ITEMS) {
      await newTodo.fill(item);
      await newTodo.press("Enter");
    }

    // Assert that every item was added, in order.
    await expect(page.getByTestId("todo-title")).toHaveCount(TODO_ITEMS.length);

    await Promise.all(
      TODO_ITEMS.map((value, index) =>
        expect(page.getByTestId("todo-title").nth(index)).toHaveText(value),
      ),
    );

    // Release the browser session and return a response, as a fetch handler must.
    await browser.close();
    return new Response("All assertions passed");
  },
};
```

Playwright is available as an npm package at @cloudflare/playwright ↗ and the code is at GitHub ↗.

Learn more in our documentation.

Mar 21, 2025

AI Gateway launches Realtime WebSockets API
AI Gateway
We are excited to announce that AI Gateway now supports real-time AI interactions with the new Realtime WebSockets API.

This new capability allows developers to establish persistent, low-latency connections between their applications and AI models, enabling natural, real-time conversational AI experiences, including speech-to-speech interactions.

The Realtime WebSockets API works with the OpenAI Realtime API ↗ and the Google Gemini Live API ↗, and supports real-time text and speech interactions with models from Cartesia ↗ and ElevenLabs ↗.

Here's how you can connect AI Gateway to OpenAI's Realtime API ↗ using WebSockets:
OpenAI Realtime API example
```
import WebSocket from "ws";

const url =
  "wss://gateway.ai.cloudflare.com/v1/<account_id>/<gateway>/openai?model=gpt-4o-realtime-preview-2024-12-17";
const ws = new WebSocket(url, {
  headers: {
    "cf-aig-authorization": process.env.CLOUDFLARE_API_KEY,
    Authorization: "Bearer " + process.env.OPENAI_API_KEY,
    "OpenAI-Beta": "realtime=v1",
  },
});

ws.on("open", () => {
  console.log("Connected to server.");
  // Send only once the socket is open; calling ws.send() while still connecting throws.
  ws.send(
    JSON.stringify({
      type: "response.create",
      response: { modalities: ["text"], instructions: "Tell me a joke" },
    }),
  );
});
ws.on("message", (message) => console.log(JSON.parse(message.toString())));
```
Get started by checking out the Realtime WebSockets API documentation.

Mar 20, 2025

Markdown conversion in Workers AI

Workers AI

Document conversion plays an important role when designing and developing AI applications and agents. Workers AI now provides the toMarkdown utility method, which developers can use for quick and convenient conversion and summarization of documents in multiple formats to Markdown.

You can call this new tool through a binding by calling env.AI.toMarkdown(), or by using the REST API endpoint.

In this example, we fetch a PDF document and an image from R2 and feed them both to env.AI.toMarkdown(). The result is a list of converted documents. Workers AI models are used automatically to detect and summarize the image.

```
import { Env } from "./env";

export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext) {
    // https://pub-979cb28270cc461d94bc8a169d8f389d.r2.dev/somatosensory.pdf
    const pdf = await env.R2.get("somatosensory.pdf");

    // https://pub-979cb28270cc461d94bc8a169d8f389d.r2.dev/cat.jpeg
    const cat = await env.R2.get("cat.jpeg");

    // R2 returns null when an object does not exist.
    if (!pdf || !cat) {
      return new Response("Source object not found in R2", { status: 404 });
    }

    return Response.json(
      await env.AI.toMarkdown([
        {
          name: "somatosensory.pdf",
          blob: new Blob([await pdf.arrayBuffer()], {
            type: "application/octet-stream",
          }),
        },
        {
          name: "cat.jpeg",
          blob: new Blob([await cat.arrayBuffer()], {
            type: "application/octet-stream",
          }),
        },
      ]),
    );
  },
};
```

This is the result:

```
[
  {
    "name": "somatosensory.pdf",
    "mimeType": "application/pdf",
    "format": "markdown",
    "tokens": 0,
    "data": "# somatosensory.pdf\n## Metadata\n- PDFFormatVersion=1.4\n- IsLinearized=false\n- IsAcroFormPresent=false\n- IsXFAPresent=false\n- IsCollectionPresent=false\n- IsSignaturesPresent=false\n- Producer=Prince 20150210 (www.princexml.com)\n- Title=Anatomy of the Somatosensory System\n\n## Contents\n### Page 1\nThis is a sample document to showcase..."
  },
  {
    "name": "cat.jpeg",
    "mimeType": "image/jpeg",
    "format": "markdown",
    "tokens": 0,
    "data": "The image is a close-up photograph of Grumpy Cat, a cat with a distinctive grumpy expression and piercing blue eyes. The cat has a brown face with a white stripe down its nose, and its ears are pointed upright. Its fur is light brown and darker around the face, with a pink nose and mouth. The cat's eyes are blue and slanted downward, giving it a perpetually grumpy appearance. The background is blurred, but it appears to be a dark brown color. Overall, the image is a humorous and iconic representation of the popular internet meme character, Grumpy Cat. The cat's facial expression and posture convey a sense of displeasure or annoyance, making it a relatable and entertaining image for many people."
  }
]
```
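One way to use a result like the one above is to merge the converted documents into a single Markdown string, for example as context for a later model call. This is only a sketch: the `ConvertedDocument` type is inferred from the sample output, and `buildMarkdownContext` is a hypothetical helper, not part of Workers AI.

```typescript
// Shape inferred from the sample result; the real type may carry more fields.
interface ConvertedDocument {
  name: string;
  mimeType: string;
  format: string;
  tokens: number;
  data: string;
}

// Hypothetical helper: joins converted documents under one heading each,
// skipping entries whose conversion produced no text.
function buildMarkdownContext(docs: ConvertedDocument[]): string {
  return docs
    .filter((doc) => doc.data.trim().length > 0)
    .map((doc) => `## ${doc.name} (${doc.mimeType})\n\n${doc.data}`)
    .join("\n\n");
}
```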

See Markdown Conversion for more information on supported formats, REST API and pricing.

Mar 18, 2025

npm i agents
Agents Workers

agents-sdk -> agents Updated

📝 We've renamed the Agents package to agents!

If you've already been building with the Agents SDK, you can update your dependencies to use the new package name, and replace references to agents-sdk with agents:
```
# Install the new package
npm i agents
```
```
# Remove the old (deprecated) package
npm uninstall agents-sdk

# Find instances of the old package name in your codebase
grep -r 'agents-sdk' .
# Replace instances of the old package name with the new one
# (or use find-replace in your editor)
sed -i 's/agents-sdk/agents/g' $(grep -rl 'agents-sdk' .)
```
All future updates will be pushed to the new agents package, and the older package has been marked as deprecated.

Agents SDK updates New

We've added a number of big new features to the Agents SDK over the past few weeks, including:
- You can now set cors: true when using routeAgentRequest to add permissive default CORS headers to Agent responses.
- The regular client now syncs state on the agent (just like the React version).
- useAgentChat bug fixes for passing headers/credentials, including properly clearing cache on unmount.
- Experimental /schedule module with a prompt/schema for adding scheduling to your app (with evals!).
- Changed the internal zod schema to be compatible with the limitations of Google's Gemini models by removing the discriminated union, allowing you to use Gemini models with the scheduling API.
We've also fixed a number of bugs with state synchronization and the React hooks.
JavaScript
```
// via https://github.com/cloudflare/agents/tree/main/examples/cross-domain
import { routeAgentRequest } from "agents";

export default {
  async fetch(request, env) {
    return (
      // Set { cors: true } to enable CORS headers.
      (await routeAgentRequest(request, env, { cors: true })) ||
      new Response("Not found", { status: 404 })
    );
  },
};
```
TypeScript
```
// via https://github.com/cloudflare/agents/tree/main/examples/cross-domain
import { routeAgentRequest } from "agents";

export default {
  async fetch(request: Request, env: Env) {
    return (
      // Set { cors: true } to enable CORS headers.
      (await routeAgentRequest(request, env, { cors: true })) ||
      new Response("Not found", { status: 404 })
    );
  },
} satisfies ExportedHandler<Env>;
```
Call Agent methods from your client code New

We've added a new @unstable_callable() decorator for defining methods that can be called directly from clients. This allows you to call Agent methods (with arguments) from within your client code and get native JavaScript objects back.
JavaScript
```
// server.ts
import { unstable_callable, Agent } from "agents";

export class Rpc extends Agent {
  // Use the decorator to define a callable method
  @unstable_callable({
    description: "rpc test",
  })
  async getHistory() {
    return this.sql`SELECT * FROM history ORDER BY created_at DESC LIMIT 10`;
  }
}
```
TypeScript
```
// server.ts
import { unstable_callable, Agent, type StreamingResponse } from "agents";
import type { Env } from "../server";

export class Rpc extends Agent<Env> {
  // Use the decorator to define a callable method
  @unstable_callable({
    description: "rpc test",
  })
  async getHistory() {
    return this.sql`SELECT * FROM history ORDER BY created_at DESC LIMIT 10`;
  }
}
```
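On the client side, a call to the method above might look like the following sketch. It assumes the client object returned by the SDK exposes a `call(method, args)` function that resolves to the method's return value; that signature, the `CallableClient` interface, and the `HistoryRow` shape are assumptions made for illustration, so confirm them against the Client API documentation.

```typescript
// Assumed minimal client surface; the real SDK client has more members.
interface CallableClient {
  call<T = unknown>(method: string, args?: unknown[]): Promise<T>;
}

// Hypothetical row shape for the `history` table queried by getHistory().
interface HistoryRow {
  id: number;
  created_at: string;
}

// Calls the server-side getHistory() method by name and returns plain objects.
async function loadHistory(client: CallableClient): Promise<HistoryRow[]> {
  return client.call<HistoryRow[]>("getHistory");
}
```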
agents-starter Updated

We've fixed a number of small bugs in the agents-starter ↗ project — a real-time, chat-based example application with tool-calling & human-in-the-loop built using the Agents SDK. The starter has also been upgraded to use the latest wrangler v4 release.

If you're new to Agents, you can install and run the agents-starter project in two commands:
```
# Install it
$ npm create cloudflare@latest agents-starter -- --template="cloudflare/agents-starter"
# Run it
$ npm run start
```
You can use the starter as a template for your own Agents projects: open up src/server.ts and src/client.tsx to see how the Agents SDK is used.

More documentation Updated

We've heard your feedback on the Agents SDK documentation, and we're shipping more API reference material and usage examples, including:
- Expanded API reference documentation, covering the methods and properties exposed by the Agents SDK, as well as more usage examples.
- More Client API documentation that documents useAgent, useAgentChat and the new @unstable_callable RPC decorator exposed by the SDK.
- New documentation on how to route requests to agents and (optionally) authenticate clients before they connect to your Agents.
Note that the Agents SDK is continually growing: the type definitions included in the SDK will always include the latest APIs exposed by the agents package.

If you're still wondering what Agents are, read our blog on building AI Agents on Cloudflare ↗ and/or visit the Agents documentation to learn more.

Mar 17, 2025

New models in Workers AI
Workers AI
Workers AI is adding 4 new models to the catalog, including 2 brand-new classes of models: text-to-speech and reranker. Introducing:
- @cf/baai/bge-m3 - a multi-lingual embeddings model that supports over 100 languages. It can also simultaneously perform dense retrieval, multi-vector retrieval, and sparse retrieval, with the ability to process inputs of different granularities.
- @cf/baai/bge-reranker-base - our first reranker model! Rerankers are a type of text classification model that takes a query and context, and outputs a similarity score between the two. When used in RAG systems, you can use a reranker after the initial vector search to find the most relevant documents to return to a user by reranking the outputs.
- @cf/openai/whisper-large-v3-turbo - a faster, more accurate speech-to-text model. This model was added earlier but is graduating out of beta with pricing included today.
- @cf/myshell-ai/melotts - our first text-to-speech model that allows users to generate an MP3 with voice audio from inputted text.
Pricing is available for each of these models on the Workers AI pricing page.
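To illustrate the reranking step described above, the sketch below reorders vector-search candidates by reranker score and keeps the top results. It assumes the reranker returns one `{ id, score }` entry per candidate, where `id` is the candidate's index in the input list; that shape and the `applyRerank` helper are assumptions for illustration, so check the model's schema before relying on them.

```typescript
// Assumed reranker output: the candidate's index and its relevance score.
interface RerankScore {
  id: number;
  score: number;
}

// Hypothetical helper: returns the topK candidates ordered by descending score,
// ignoring scores that point outside the candidate list.
function applyRerank<T>(candidates: T[], scores: RerankScore[], topK: number): T[] {
  const ordered = [...scores].sort((a, b) => b.score - a.score);
  const result: T[] = [];
  for (const entry of ordered) {
    if (result.length >= topK) break;
    const candidate = candidates[entry.id];
    if (candidate !== undefined) {
      result.push(candidate);
    }
  }
  return result;
}
```

In a RAG pipeline, `candidates` would be the documents returned by the initial vector search, and only the reranked top results would be passed on to the model.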

This docs update also includes a few minor bug fixes to the model schemas for llama-guard and llama-3.2-1b, which you can review on the product changelog.

Try it out and let us know what you think! Stay tuned for more models in the coming days.
