uform-gen2-qwen-500m Beta
Image-to-Text • Unum • Hosted

UForm-Gen is a small generative vision-language model designed primarily for image captioning and visual question answering. The model was pre-trained on an internal image captioning dataset and fine-tuned on public instruction datasets: SVIT, LVIS, and VQA datasets.
| Model Info | |
|---|---|
| More information | link ↗ |
| Beta | Yes |
Usage
    export interface Env {
      AI: Ai;
    }

    export default {
      async fetch(request: Request, env: Env): Promise<Response> {
        // Fetch a sample image and read it as raw bytes.
        const res = await fetch("https://cataas.com/cat");
        const blob = await res.arrayBuffer();

        // The model takes the image as an array of byte values.
        const input = {
          image: [...new Uint8Array(blob)],
          prompt: "Generate a caption for this image",
          max_tokens: 512,
        };

        const response = await env.AI.run(
          "@cf/unum/uform-gen2-qwen-500m",
          input
        );

        return new Response(JSON.stringify(response));
      },
    } satisfies ExportedHandler<Env>;

Parameters
string
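The input object in the usage example (`image`, `prompt`, `max_tokens`) can be assembled by a small helper, which also makes it easy to reject an empty download before calling the model. The sketch below is only an illustration: `CaptionInput` and `buildCaptionInput` are hypothetical names that are not part of the Workers AI API, and the empty-image check is an assumption about what is worth validating, not a documented requirement.

```typescript
// Hypothetical shape mirroring the fields used in the usage example.
interface CaptionInput {
  image: number[];
  prompt: string;
  max_tokens: number;
}

// Hypothetical helper: converts raw image bytes into the input object.
function buildCaptionInput(
  bytes: ArrayBuffer,
  prompt: string = "Generate a caption for this image",
  maxTokens: number = 512
): CaptionInput {
  // Assumption: an empty body is never a valid image, so fail early.
  if (bytes.byteLength === 0) {
    throw new Error("image is empty");
  }
  return {
    // Same conversion as the usage example: bytes to a plain number array.
    image: Array.from(new Uint8Array(bytes)),
    prompt,
    max_tokens: maxTokens,
  };
}
```

Inside the Worker, the result of `buildCaptionInput(await res.arrayBuffer())` would take the place of the inline `input` object passed to `env.AI.run`.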