uform-gen2-qwen-500m Beta

Image-to-Text • Unum

UForm-Gen is a small generative vision-language model designed primarily for image captioning and visual question answering (VQA). It was pre-trained on an internal image captioning dataset and fine-tuned on public instruction datasets: SVIT, LVIS, and several VQA datasets.

Model Info
Deprecated	5/30/2026
Beta	Yes

Usage

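export interface Env {
  // Workers AI binding, configured in the Worker's Wrangler config
  AI: Ai;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Download a sample image (a random cat picture)
    const res = await fetch("https://cataas.com/cat");
    if (!res.ok) {
      return new Response("Failed to fetch image", { status: 502 });
    }
    const imageBuffer = await res.arrayBuffer();

    const input = {
      // Image bytes passed as a plain array of numbers (0-255)
      image: [...new Uint8Array(imageBuffer)],
      prompt: "Generate a caption for this image",
      max_tokens: 512,
    };

    const response = await env.AI.run("@cf/unum/uform-gen2-qwen-500m", input);

    // Return the model output as JSON
    return new Response(JSON.stringify(response));
  },
} satisfies ExportedHandler<Env>;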
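Because the model handles both captioning and visual question answering, the same Worker can switch tasks by changing only the prompt. The sketch below factors the input construction into a helper; the name `buildUformInput` and the sample question are hypothetical, while the field names (`image`, `prompt`, `max_tokens`) and the default of 512 tokens are taken from the example above.

```typescript
// Hypothetical helper: builds the input object used in the Usage example.
// Field names mirror that example: image, prompt, max_tokens.
interface UformInput {
  image: number[]; // image bytes as plain numbers (0-255)
  prompt: string;
  max_tokens: number;
}

function buildUformInput(
  imageBytes: ArrayBuffer | Uint8Array,
  prompt: string,
  maxTokens = 512,
): UformInput {
  // Normalize to a Uint8Array view, then copy into a plain number array
  const bytes =
    imageBytes instanceof Uint8Array ? imageBytes : new Uint8Array(imageBytes);
  return { image: Array.from(bytes), prompt, max_tokens: maxTokens };
}

// Captioning and VQA differ only in the prompt text (sample prompts are illustrative)
const sampleBytes = new Uint8Array([137, 80, 78, 71]);
const caption = buildUformInput(sampleBytes, "Generate a caption for this image");
const question = buildUformInput(sampleBytes, "What color is the cat?", 128);
```

Inside the Worker, `buildUformInput(imageBuffer, "What color is the cat?")` would replace the inline `input` literal before calling `env.AI.run`.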

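Input

Option 1: string (format: binary)
Binary string representing the image contents.

Option 2: object
An object containing the image and generation options, such as the image, prompt, and max_tokens fields used in the example above.

Output

object
description: string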
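Since the Worker receives the model output as an untyped value, a caller may want to check its shape before using the generated text. A minimal sketch, assuming the output is an object with a string `description` field; the names `isUformOutput` and `readDescription` are hypothetical.

```typescript
// Output shape assumed from the schema: an object with a string description
interface UformOutput {
  description: string;
}

// Type guard: true only for non-null objects whose description is a string
function isUformOutput(value: unknown): value is UformOutput {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { description?: unknown }).description === "string"
  );
}

// Hypothetical helper: returns the generated text or throws on an unexpected shape
function readDescription(value: unknown): string {
  if (!isUformOutput(value)) {
    throw new Error("Unexpected model output: missing string 'description'");
  }
  return value.description;
}
```

In the Usage example, `return new Response(readDescription(response));` would return the caption as plain text instead of the raw JSON.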
