llava-1.5-7b-hf Beta

Image-to-Text | llava-hf | Hosted

LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. It is an auto-regressive language model, based on the transformer architecture.

Model Info
Beta: Yes

Usage

TypeScript
export interface Env {
  // Workers AI binding
  AI: Ai;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Fetch a sample image and read it as raw bytes
    const res = await fetch("https://cataas.com/cat");
    const blob = await res.arrayBuffer();

    // The image is passed as an array of byte values (0-255)
    const input = {
      image: [...new Uint8Array(blob)],
      prompt: "Generate a caption for this image",
      max_tokens: 512,
    };

    // Run the model and return its output as JSON
    const response = await env.AI.run(
      "@cf/llava-hf/llava-1.5-7b-hf",
      input
    );
    return new Response(JSON.stringify(response));
  },
} satisfies ExportedHandler<Env>;
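
The input object in the example above could be built by a small helper, which keeps the byte conversion in one place and rejects an empty image before the model is called. This is a minimal sketch: `LlavaInput` and `buildLlavaInput` are hypothetical names chosen for illustration and are not part of the Workers AI API. The field names (`image`, `prompt`, `max_tokens`) and the default of 512 are taken from the usage example.

```typescript
// Hypothetical shape mirroring the input object in the usage example.
interface LlavaInput {
  image: number[]; // image bytes as an array of values 0-255
  prompt: string;
  max_tokens: number;
}

// Hypothetical helper (not a Workers AI function): converts raw image
// bytes into the input object used in the usage example.
function buildLlavaInput(
  bytes: ArrayBuffer,
  prompt: string,
  maxTokens: number = 512
): LlavaInput {
  // Fail early instead of sending an empty image to the model.
  if (bytes.byteLength === 0) {
    throw new Error("image is empty");
  }
  return {
    image: Array.from(new Uint8Array(bytes)),
    prompt,
    max_tokens: maxTokens,
  };
}
```

Inside the `fetch` handler, this would replace the inline object: `const input = buildLlavaInput(await res.arrayBuffer(), "Generate a caption for this image");`.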

Parameters

Option 1
string (format: binary)
Binary string representing the image contents.
