
Building AI-Powered Image Generation with OpenAI-Compatible Responses API

Karolis Gudiškis · 10 min read

Introduction

The Responses API represents a powerful evolution in how we interact with large language models. Unlike traditional chat completion APIs that return simple text responses, the Responses API enables structured, multi-step workflows that can orchestrate multiple tools and produce rich, multi-modal outputs.

In this article, we'll explore how to build an AI-powered application that combines web search and image generation capabilities.

Source Code: The complete example is available on GitHub.

Documentation: For comprehensive Responses API documentation, see the Responses API guide and Image Generation guide.

Understanding the Responses API

The Responses API is a more powerful alternative to the traditional Completions API. Instead of returning a single block of text, it returns a structured list of output items and supports multiple built-in tools, such as web search and image generation. That makes multi-step workflows (for example: search the web, then generate an image from the results) straightforward to express in a single request and easy to process programmatically.

Prerequisites and Setup

Before we dive into the code, let's ensure we have everything we need.

Required Dependencies

Our example requires the following Rust crates:

  • vllora_llm - The Vllora LLM client library
  • async-openai-compat - OpenAI-compatible type definitions (version 0.30.1), re-exported by vllora_llm as vllora_llm::async_openai, so it doesn't need its own entry in Cargo.toml
  • base64 - For decoding base64-encoded images (version 0.22)
  • tokio - Async runtime (version 1.x with full features)
  • serde_json - JSON serialization support

Cargo.toml Configuration

Here's the complete Cargo.toml for our example:

[package]
name = "responses_image_generation_example"
version = "0.1.0"
edition = "2021"

[workspace]

[dependencies]
vllora_llm = "0.1.17"

tokio = { version = "1", features = ["full"] }
serde_json = "1.0"
base64 = "0.22"

Environment Setup

You'll need to set your API key as an environment variable:

export VLLORA_OPENAI_API_KEY="your-api-key-here"

Note: Make sure to keep your API key secure. Never commit it to version control or expose it in client-side code.

Building the Request

Now let's construct our Responses API request. We'll create a request that uses both web search and image generation tools.

Creating the CreateResponse Structure

use vllora_llm::async_openai::types::responses::CreateResponse;
use vllora_llm::async_openai::types::responses::ImageGenTool;
use vllora_llm::async_openai::types::responses::InputParam;
use vllora_llm::async_openai::types::responses::Tool;
use vllora_llm::async_openai::types::responses::WebSearchTool;

let responses_req = CreateResponse {
    model: Some("gpt-4.1".to_string()),
    input: InputParam::Text(
        "Search for the latest news from today and generate an image about it".to_string(),
    ),
    tools: Some(vec![
        Tool::WebSearch(WebSearchTool::default()),
        Tool::ImageGeneration(ImageGenTool::default()),
    ]),
    ..Default::default()
};

Understanding the Components

Model Selection - We're using "gpt-4.1", which supports the Responses API and tool calling. Make sure to use a model that supports these features.

Input Parameter - We use InputParam::Text to provide a simple text prompt. The model will:

  1. First use the web search tool to find current news
  2. Then use the image generation tool to create an image related to that news

Tool Configuration - We specify two tools:

  • WebSearchTool::default() - Uses default web search configuration
  • ImageGenTool::default() - Uses default image generation settings

The ..Default::default() fills in every remaining field with its default value. This is Rust's struct update syntax, a common pattern for initializing structs with many optional fields, as illustrated below.
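
If the struct update syntax is new to you, here's a minimal, self-contained illustration using a hypothetical RequestConfig struct (not part of the library):

#[derive(Default)]
struct RequestConfig {
    model: Option<String>,
    temperature: Option<f32>,
    max_tokens: Option<u32>,
}

fn main() {
    let config = RequestConfig {
        model: Some("gpt-4.1".to_string()),
        // temperature and max_tokens fall back to their defaults (None)
        ..Default::default()
    };
    assert!(config.temperature.is_none() && config.max_tokens.is_none());
}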

Initializing the Client

Next, we need to set up the Vllora LLM client with our credentials.

Client Configuration

use vllora_llm::client::VlloraLLMClient;
use vllora_llm::types::credentials::ApiKeyCredentials;
use vllora_llm::types::credentials::Credentials;

let client = VlloraLLMClient::default()
    .with_credentials(Credentials::ApiKey(ApiKeyCredentials {
        api_key: std::env::var("VLLORA_OPENAI_API_KEY")
            .expect("VLLORA_OPENAI_API_KEY must be set"),
    }));

Credential Management

The client uses a builder pattern for configuration. Here we:

  1. Start with VlloraLLMClient::default() for default settings
  2. Chain .with_credentials() to provide authentication
  3. Use Credentials::ApiKey() with ApiKeyCredentials for API key authentication
  4. Read the API key from the environment variable

Tip: In production, consider using a more robust error handling approach instead of .expect(), such as returning a Result or using a configuration management library.
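
For example, a minimal sketch of fallible key loading (load_api_key is a hypothetical helper, not part of the library):

fn load_api_key() -> Result<String, Box<dyn std::error::Error>> {
    // Convert the missing-variable error into a readable message instead of panicking
    std::env::var("VLLORA_OPENAI_API_KEY")
        .map_err(|_| "VLLORA_OPENAI_API_KEY must be set".into())
}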

Sending the Request and Handling Responses

Now let's send our request and see what we get back.

Making the API Call

use vllora_llm::error::LLMResult;

println!("Sending request with tools: web_search_preview and image_generation");
let response = client.responses().create(responses_req).await?;

The client.responses().create() method:

  • Returns a Result<Response, LLMError>
  • Is async, so we use .await
  • The ? operator propagates errors up the call stack
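
If you prefer explicit handling over the ? operator, the same call can be matched directly; a minimal sketch:

match client.responses().create(responses_req).await {
    Ok(response) => {
        // Process response.output here (see the sections below)
    }
    Err(e) => eprintln!("Request failed: {}", e),
}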

Understanding the Response Structure

The Response struct contains an output field, which is a vector of OutputItem variants. Each item represents a different type of output from the API:

  • Text messages from the model
  • Image generation results
  • Web search results
  • Other tool outputs
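
Before we look at each handler in detail, here's a skeleton of the dispatch loop we'll build up over the next sections (arms are sketched as comments only):

for output in &response.output {
    match output {
        OutputItem::Message(_) => { /* text content, handled next */ }
        OutputItem::ImageGenerationCall(_) => { /* base64 image, handled later */ }
        _ => { /* web search calls and other tool outputs */ }
    }
}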

Processing Text Messages

Let's see how to extract and display text content from the response.

Matching Message Outputs

use vllora_llm::async_openai::types::responses::OutputItem;
use vllora_llm::async_openai::types::responses::OutputMessageContent;

for (index, output) in response.output.iter().enumerate() {
    match output {
        OutputItem::Message(message) => {
            println!("\n[Message {}]", index);
            println!("{}", "-".repeat(80));

            for content in &message.content {
                match content {
                    OutputMessageContent::OutputText(text_output) => {
                        // Print the text content
                        println!("\n{}", text_output.text);

                        // Print sources/annotations if available
                        if !text_output.annotations.is_empty() {
                            println!("Annotations: {:#?}", text_output.annotations);
                        }
                    }
                    _ => {
                        println!("Other content type: {:?}", content);
                    }
                }
            }
            println!("\n{}", "=".repeat(80));
        }
        _ => {
            // ... handle other output types (see the next section)
        }
    }
}

Understanding Message Content

Message Structure - Each Message contains a content vector that can hold different content types:

  • OutputText - The actual text response
  • Other content types for different media

Annotations - Text outputs can include annotations which provide:

  • Citations and sources (especially useful with web search)
  • References to tool calls
  • Additional metadata

These annotations are particularly valuable when using web search tools, as they show where the information came from.

Handling Image Generation Results

This is the core focus of our example - extracting and saving generated images.

Understanding ImageGenToolCall

When the model uses the image generation tool, the response includes OutputItem::ImageGenerationCall variants. Each call contains:

  • A result field with the base64-encoded image data
  • Metadata about the generation

Decoding and Saving Images

Here's our complete image handling function:

use vllora_llm::async_openai::types::responses::ImageGenToolCall;
use base64::{engine::general_purpose::STANDARD, Engine as _};
use std::fs;

/// Decodes a base64-encoded image from an ImageGenerationCall and saves it to a file.
///
/// # Arguments
/// * `image_generation_call` - The image generation call containing the base64-encoded image
/// * `index` - The index to use in the filename
///
/// # Returns
/// * `Ok(filename)` - The filename where the image was saved
/// * `Err(e)` - An error if the call has no result, decoding fails, or file writing fails
fn decode_and_save_image(
    image_generation_call: &ImageGenToolCall,
    index: usize,
) -> Result<String, Box<dyn std::error::Error>> {
    // Extract base64 image from the call
    let base64_image = image_generation_call
        .result
        .as_ref()
        .ok_or("Image generation call has no result")?;

    // Decode base64 image
    let image_data = STANDARD.decode(base64_image)?;

    // Save to file
    let filename = format!("generated_image_{}.png", index);
    fs::write(&filename, image_data)?;

    Ok(filename)
}

Step-by-Step Breakdown

  1. Extract Base64 Data - We access the result field, which is an Option<String>. We use .ok_or() to convert None into an error if the result is missing.

  2. Decode Base64 - The base64 crate's STANDARD engine decodes the base64 string into raw bytes. This can fail if the string is malformed, so we use ? to propagate errors.

  3. Save to File - We use Rust's standard library fs::write() to save the decoded bytes to a file. We name it generated_image_{index}.png to avoid conflicts when multiple images are generated.

  4. Return Filename - We return the filename so the caller knows where the image was saved.

Using the Function

Here's how we integrate this into our response processing:

OutputItem::ImageGenerationCall(image_generation_call) => {
    println!("\n[Image Generation Call {}]", index);
    match decode_and_save_image(image_generation_call, index) {
        Ok(filename) => {
            println!("✓ Successfully saved image to: {}", filename);
        }
        Err(e) => {
            eprintln!("✗ Failed to decode/save image: {}", e);
        }
    }
}

We match on OutputItem::ImageGenerationCall, extract the call, and pass it to our decoding function. We handle both success and error cases gracefully.

Complete Example Walkthrough

Let's put it all together and see the complete flow:

Complete Source Code

use vllora_llm::async_openai::types::responses::CreateResponse;
use vllora_llm::async_openai::types::responses::ImageGenTool;
use vllora_llm::async_openai::types::responses::ImageGenToolCall;
use vllora_llm::async_openai::types::responses::InputParam;
use vllora_llm::async_openai::types::responses::OutputItem;
use vllora_llm::async_openai::types::responses::OutputMessageContent;
use vllora_llm::async_openai::types::responses::Tool;
use vllora_llm::async_openai::types::responses::WebSearchTool;

use base64::{engine::general_purpose::STANDARD, Engine as _};
use std::fs;

use vllora_llm::client::VlloraLLMClient;
use vllora_llm::error::LLMResult;
use vllora_llm::types::credentials::ApiKeyCredentials;
use vllora_llm::types::credentials::Credentials;

fn decode_and_save_image(
    image_generation_call: &ImageGenToolCall,
    index: usize,
) -> Result<String, Box<dyn std::error::Error>> {
    let base64_image = image_generation_call
        .result
        .as_ref()
        .ok_or("Image generation call has no result")?;

    let image_data = STANDARD.decode(base64_image)?;
    let filename = format!("generated_image_{}.png", index);
    fs::write(&filename, image_data)?;

    Ok(filename)
}

#[tokio::main]
async fn main() -> LLMResult<()> {
    // 1) Build a Responses-style request using async-openai-compat types
    //    with tools for web_search_preview and image_generation
    let responses_req = CreateResponse {
        model: Some("gpt-4.1".to_string()),
        input: InputParam::Text(
            "Search for the latest news from today and generate an image about it".to_string(),
        ),
        tools: Some(vec![
            Tool::WebSearch(WebSearchTool::default()),
            Tool::ImageGeneration(ImageGenTool::default()),
        ]),
        ..Default::default()
    };

    // 2) Construct a VlloraLLMClient
    let client =
        VlloraLLMClient::default().with_credentials(Credentials::ApiKey(ApiKeyCredentials {
            api_key: std::env::var("VLLORA_OPENAI_API_KEY")
                .expect("VLLORA_OPENAI_API_KEY must be set"),
        }));

    // 3) Non-streaming: send the request and print the final reply
    println!("Sending request with tools: web_search_preview and image_generation");
    let response = client.responses().create(responses_req).await?;

    println!("\nNon-streaming reply:");
    println!("{}", "=".repeat(80));

    for (index, output) in response.output.iter().enumerate() {
        match output {
            OutputItem::ImageGenerationCall(image_generation_call) => {
                println!("\n[Image Generation Call {}]", index);
                match decode_and_save_image(image_generation_call, index) {
                    Ok(filename) => {
                        println!("✓ Successfully saved image to: {}", filename);
                    }
                    Err(e) => {
                        eprintln!("✗ Failed to decode/save image: {}", e);
                    }
                }
            }
            OutputItem::Message(message) => {
                println!("\n[Message {}]", index);
                println!("{}", "-".repeat(80));

                for content in &message.content {
                    match content {
                        OutputMessageContent::OutputText(text_output) => {
                            println!("\n{}", text_output.text);

                            if !text_output.annotations.is_empty() {
                                println!("Annotations: {:#?}", text_output.annotations);
                            }
                        }
                        _ => {
                            println!("Other content type: {:?}", content);
                        }
                    }
                }
                println!("\n{}", "=".repeat(80));
            }
            _ => {
                println!("\n[Other Output {}]", index);
                println!("{:?}", output);
            }
        }
    }

    Ok(())
}

Execution Flow

  1. Request Construction - We build a CreateResponse with our prompt and tools
  2. Client Initialization - We create and configure the Vllora LLM client
  3. API Call - We send the request and await the response
  4. Response Processing - We iterate through output items:
    • Handle image generation calls by decoding and saving
    • Display text messages with annotations
    • Handle any other output types
  5. File Output - Generated images are saved to disk as PNG files
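
To try it yourself, set your API key and run the example from its crate directory with Cargo:

export VLLORA_OPENAI_API_KEY="your-api-key-here"
cargo run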

Expected Output

When you run this example, you'll see output like:

Sending request with tools: web_search_preview and image_generation

Non-streaming reply:
================================================================================

[Message 0]
--------------------------------------------------------------------------------

Here's the latest news from today: [summary of current news]

Annotations: [citations and sources from web search]

================================================================================

[Image Generation Call 1]
✓ Successfully saved image to: generated_image_1.png

[Example generated image: AI-Powered Image Generation with Responses API]

The actual news content and image will vary based on what's happening when you run it!

Summary

This example demonstrates how to use the Responses API to create multi-tool workflows that combine web search and image generation. The key steps are:

  1. Build a CreateResponse request with the desired tools (WebSearchTool and ImageGenTool)
  2. Initialize the VlloraLLMClient with your API credentials
  3. Send the request and receive structured outputs
  4. Process different output types: extract text from OutputItem::Message and decode base64 images from OutputItem::ImageGenerationCall
  5. Save decoded images to disk using standard Rust file I/O

The Responses API enables powerful, structured workflows that go beyond simple text completions, making it ideal for building applications that need to orchestrate multiple AI capabilities.