uform-gen2-qwen-500m Beta

Image-to-Text | Unum | Hosted

UForm-Gen is a small generative vision-language model designed primarily for image captioning and visual question answering (VQA). The model was pre-trained on an internal image captioning dataset and fine-tuned on public instruction datasets: SVIT, LVIS, and VQA datasets.

Model Info
More information: link
Beta: Yes

Usage

TypeScript
export interface Env {
  // Workers AI binding configured for this Worker
  AI: Ai;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Fetch a sample image and read it as raw bytes
    const res = await fetch("https://cataas.com/cat");
    const blob = await res.arrayBuffer();

    // The model takes the image as an array of byte values
    const input = {
      image: [...new Uint8Array(blob)],
      prompt: "Generate a caption for this image",
      max_tokens: 512,
    };

    const response = await env.AI.run(
      "@cf/unum/uform-gen2-qwen-500m",
      input
    );

    return new Response(JSON.stringify(response));
  },
} satisfies ExportedHandler<Env>;
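
Since the model handles both captioning and visual question answering, it can be convenient to build the input object in one place and vary only the prompt. The following is a minimal sketch: `buildUformInput` and the `UformInput` interface are hypothetical names introduced here for illustration, not part of the Workers AI API. They only reproduce the `image`, `prompt`, and `max_tokens` fields used in the example above.

```typescript
// Hypothetical helper (not part of Workers AI): assembles the input object
// in the same shape as the usage example above.
interface UformInput {
  image: number[];
  prompt: string;
  max_tokens: number;
}

function buildUformInput(
  bytes: ArrayBuffer,
  prompt: string = "Generate a caption for this image",
  maxTokens: number = 512
): UformInput {
  // Reject empty payloads before they reach the model
  if (bytes.byteLength === 0) {
    throw new Error("image is empty");
  }
  return {
    // Convert the raw bytes into a plain array of values in the range 0-255
    image: Array.from(new Uint8Array(bytes)),
    prompt,
    max_tokens: maxTokens,
  };
}
```

Inside the handler this could be used as `env.AI.run("@cf/unum/uform-gen2-qwen-500m", buildUformInput(blob, "What color is the cat?"))`, with the question prompt being an illustrative assumption.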

Parameters

Option 1
string (format: binary)
Binary string representing the image contents.
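
The binary option suggests that the image bytes can be sent directly, without wrapping them in a JSON object. The sketch below prepares such a request against the Workers AI REST endpoint. It rests on an assumption: that the endpoint accepts the raw bytes as the request body for this model. `buildBinaryRunRequest` and `RunRequest` are hypothetical names used only for illustration, and the account ID and API token are placeholders you would supply.

```typescript
// Hypothetical helper: prepares (but does not send) a REST request whose
// body is the raw image bytes. Assumes the endpoint accepts a binary body
// for this model, per "Option 1" above.
interface RunRequest {
  url: string;
  init: {
    method: string;
    headers: Record<string, string>;
    body: Uint8Array;
  };
}

function buildBinaryRunRequest(
  accountId: string,
  apiToken: string,
  image: Uint8Array
): RunRequest {
  const model = "@cf/unum/uform-gen2-qwen-500m";
  return {
    url: `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/${model}`,
    init: {
      method: "POST",
      headers: { Authorization: `Bearer ${apiToken}` },
      body: image,
    },
  };
}

// Usage (not executed here):
//   const { url, init } = buildBinaryRunRequest(accountId, apiToken, bytes);
//   const res = await fetch(url, init);
```

Separating request construction from the `fetch` call keeps the sketch easy to inspect and test without network access.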
