Building AI-Powered Image Generation with OpenAI-Compatible Responses API
Introduction
The Responses API represents a powerful evolution in how we interact with large language models. Unlike traditional chat completion APIs that return simple text responses, the Responses API enables structured, multi-step workflows that can orchestrate multiple tools and produce rich, multi-modal outputs.
In this article, we'll explore how to build an AI-powered application that combines web search and image generation capabilities.
Source Code: The complete example is available on GitHub.
Documentation: For comprehensive Responses API documentation, see the Responses API guide and Image Generation guide.
Understanding the Responses API
Compared with the traditional Completions API, the Responses API returns a structured list of output items rather than a single text string. A request can enable built-in tools such as web search and image generation, and the response interleaves text messages, tool calls, and tool results that can be processed programmatically.
Prerequisites and Setup
Before we dive into the code, let's ensure we have everything we need.
Required Dependencies
Our example requires the following Rust crates:
- vllora_llm - The Vllora LLM client library
- async-openai-compat - OpenAI-compatible type definitions (version 0.30.1), re-exported by vllora_llm as vllora_llm::async_openai
- base64 - For decoding base64-encoded images (version 0.22)
- tokio - Async runtime (version 1.x with full features)
- serde_json - JSON serialization support
Cargo.toml Configuration
Here's the complete Cargo.toml for our example:
[package]
name = "responses_image_generation_example"
version = "0.1.0"
edition = "2021"
[workspace]
[dependencies]
vllora_llm = "0.1.17"
tokio = { version = "1", features = ["full"] }
serde_json = "1.0"
base64 = "0.22"
Environment Setup
You'll need to set your API key as an environment variable:
export VLLORA_OPENAI_API_KEY="your-api-key-here"
Note: Make sure to keep your API key secure. Never commit it to version control or expose it in client-side code.
Building the Request
Now let's construct our Responses API request. We'll create a request that uses both web search and image generation tools.
Creating the CreateResponse Structure
use vllora_llm::async_openai::types::responses::CreateResponse;
use vllora_llm::async_openai::types::responses::ImageGenTool;
use vllora_llm::async_openai::types::responses::InputParam;
use vllora_llm::async_openai::types::responses::Tool;
use vllora_llm::async_openai::types::responses::WebSearchTool;
let responses_req = CreateResponse {
model: Some("gpt-4.1".to_string()),
input: InputParam::Text(
"Search for the latest news from today and generate an image about it".to_string(),
),
tools: Some(vec![
Tool::WebSearch(WebSearchTool::default()),
Tool::ImageGeneration(ImageGenTool::default()),
]),
..Default::default()
};
Understanding the Components
Model Selection - We're using "gpt-4.1", which supports the Responses API and tool calling. Make sure to use a model that supports these features.
Input Parameter - We use InputParam::Text to provide a simple text prompt. The model will:
- First use the web search tool to find current news
- Then use the image generation tool to create an image related to that news
Tool Configuration - We specify two tools:
- WebSearchTool::default() - Uses the default web search configuration
- ImageGenTool::default() - Uses the default image generation settings
The ..Default::default() ensures all other fields use their default values, which is a common Rust pattern for struct initialization.
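If struct update syntax is unfamiliar, here is a minimal standalone sketch of the same pattern; the RequestConfig type is hypothetical and only illustrates how ..Default::default() fills in the fields you don't set:

#[derive(Debug, Default)]
struct RequestConfig {
    model: Option<String>,
    temperature: Option<f32>,
    max_output_tokens: Option<u32>,
}

fn main() {
    // Set only the fields we care about; `..Default::default()` supplies the rest.
    let config = RequestConfig {
        model: Some("gpt-4.1".to_string()),
        ..Default::default()
    };
    println!("{:?}", config);
}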
Initializing the Client
Next, we need to set up the Vllora LLM client with our credentials.
Client Configuration
use vllora_llm::client::VlloraLLMClient;
use vllora_llm::types::credentials::ApiKeyCredentials;
use vllora_llm::types::credentials::Credentials;
let client = VlloraLLMClient::default()
.with_credentials(Credentials::ApiKey(ApiKeyCredentials {
api_key: std::env::var("VLLORA_OPENAI_API_KEY")
.expect("VLLORA_OPENAI_API_KEY must be set"),
}));
Credential Management
The client uses a builder pattern for configuration. Here we:
- Start with VlloraLLMClient::default() for default settings
- Chain .with_credentials() to provide authentication
- Use Credentials::ApiKey() with ApiKeyCredentials for API key authentication
- Read the API key from the environment variable
Tip: In production, consider using a more robust error-handling approach instead of .expect(), such as returning a Result or using a configuration management library.
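As a rough sketch of that tip (the load_api_key helper is hypothetical, not part of vllora_llm), the key can be read as a Result so a missing variable yields a readable error instead of a panic:

use std::env;

// Hypothetical helper: returns an error instead of panicking when the key is missing.
fn load_api_key() -> Result<String, String> {
    env::var("VLLORA_OPENAI_API_KEY")
        .map_err(|_| "VLLORA_OPENAI_API_KEY is not set; export it before running".to_string())
}

fn main() -> Result<(), String> {
    let api_key = load_api_key()?;
    println!("loaded an API key of {} characters", api_key.len());
    Ok(())
}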
Sending the Request and Handling Responses
Now let's send our request and see what we get back.
Making the API Call
use vllora_llm::error::LLMResult;
println!("Sending request with tools: web_search_preview and image_generation");
let response = client.responses().create(responses_req).await?;
The client.responses().create() method:
- Returns a Result<Response, LLMError>
- Is async, so we use .await
- The ? operator propagates errors up the call stack
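If the ? operator is new to you, here is a tiny standalone illustration of the same propagation pattern, unrelated to the API client:

use std::num::ParseIntError;

// `?` returns early with the Err value, so the caller decides how to report it.
fn parse_and_double(input: &str) -> Result<i32, ParseIntError> {
    let n: i32 = input.trim().parse()?;
    Ok(n * 2)
}

fn main() {
    match parse_and_double("21") {
        Ok(v) => println!("doubled: {}", v),
        Err(e) => eprintln!("parse failed: {}", e),
    }
}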
Understanding the Response Structure
The Response struct contains an output field, which is a vector of OutputItem variants. Each item represents a different type of output from the API:
- Text messages from the model
- Image generation results
- Web search results
- Other tool outputs
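As a quick orientation before the detailed handling below, a loop like this (a sketch built from the same OutputItem variants used later in this article, assuming response is the value returned above) can tally what kinds of items came back:

// Sketch: count the output item kinds before processing them in detail.
let mut messages = 0;
let mut image_calls = 0;
let mut other = 0;
for output in &response.output {
    match output {
        OutputItem::Message(_) => messages += 1,
        OutputItem::ImageGenerationCall(_) => image_calls += 1,
        _ => other += 1,
    }
}
println!("messages: {messages}, image calls: {image_calls}, other: {other}");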
Processing Text Messages
Let's see how to extract and display text content from the response.
Matching Message Outputs
use vllora_llm::async_openai::types::responses::OutputItem;
use vllora_llm::async_openai::types::responses::OutputMessageContent;
for (index, output) in response.output.iter().enumerate() {
match output {
OutputItem::Message(message) => {
println!("\n[Message {}]", index);
println!("{}", "-".repeat(80));
for content in &message.content {
match content {
OutputMessageContent::OutputText(text_output) => {
// Print the text content
println!("\n{}", text_output.text);
// Print sources/annotations if available
if !text_output.annotations.is_empty() {
println!("Annotations: {:#?}", text_output.annotations);
}
}
_ => {
println!("Other content type: {:?}", content);
}
}
}
println!("\n{}", "=".repeat(80));
}
// ... handle other output types
}
}
Understanding Message Content
Message Structure - Each Message contains a content vector that can hold different content types:
- OutputText - The actual text response
- Other content types for different media
Annotations - Text outputs can include annotations which provide:
- Citations and sources (especially useful with web search)
- References to tool calls
- Additional metadata
These annotations are particularly valuable when using web search tools, as they show where the information came from.
Handling Image Generation Results
This is the core focus of our example - extracting and saving generated images.
Understanding ImageGenToolCall
When the model uses the image generation tool, the response includes OutputItem::ImageGenerationCall variants. Each call contains:
- A result field with the base64-encoded image data
- Metadata about the generation
Decoding and Saving Images
Here's our complete image handling function:
use vllora_llm::async_openai::types::responses::ImageGenToolCall;
use base64::{engine::general_purpose::STANDARD, Engine as _};
use std::fs;
/// Decodes a base64-encoded image from an ImageGenerationCall and saves it to a file.
///
/// # Arguments
/// * `image_generation_call` - The image generation call containing the base64-encoded image
/// * `index` - The index to use in the filename
///
/// # Returns
/// * `Ok(filename)` - The filename where the image was saved
/// * `Err(e)` - An error if the call has no result, decoding fails, or file writing fails
fn decode_and_save_image(
image_generation_call: &ImageGenToolCall,
index: usize,
) -> Result<String, Box<dyn std::error::Error>> {
// Extract base64 image from the call
let base64_image = image_generation_call
.result
.as_ref()
.ok_or("Image generation call has no result")?;
// Decode base64 image
let image_data = STANDARD.decode(base64_image)?;
// Save to file
let filename = format!("generated_image_{}.png", index);
fs::write(&filename, image_data)?;
Ok(filename)
}
Step-by-Step Breakdown
- Extract Base64 Data - We access the result field, which is an Option<String>. We use .ok_or() to convert None into an error if the result is missing.
- Decode Base64 - The base64 crate's STANDARD engine decodes the base64 string into raw bytes. This can fail if the string is malformed, so we use ? to propagate errors.
- Save to File - We use Rust's standard library fs::write() to save the decoded bytes to a file. We name it generated_image_{index}.png to avoid conflicts when multiple images are generated.
- Return Filename - We return the filename so the caller knows where the image was saved.
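To see the decode step in isolation, here is a minimal sketch against the base64 0.22 API; the input string is simply base64 for "hello, world", not real image data:

use base64::{engine::general_purpose::STANDARD, Engine as _};

fn main() {
    // A real ImageGenerationCall result would carry base64-encoded PNG bytes instead.
    match STANDARD.decode("aGVsbG8sIHdvcmxk") {
        Ok(bytes) => println!("decoded {} bytes", bytes.len()),
        Err(e) => eprintln!("malformed base64: {}", e),
    }
}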
Using the Function
Here's how we integrate this into our response processing:
OutputItem::ImageGenerationCall(image_generation_call) => {
println!("\n[Image Generation Call {}]", index);
match decode_and_save_image(image_generation_call, index) {
Ok(filename) => {
println!("✓ Successfully saved image to: {}", filename);
}
Err(e) => {
eprintln!("✗ Failed to decode/save image: {}", e);
}
}
}
We match on OutputItem::ImageGenerationCall, extract the call, and pass it to our decoding function. We handle both success and error cases gracefully.
Complete Example Walkthrough
Let's put it all together and see the complete flow:
Complete Source Code
use vllora_llm::async_openai::types::responses::CreateResponse;
use vllora_llm::async_openai::types::responses::ImageGenTool;
use vllora_llm::async_openai::types::responses::ImageGenToolCall;
use vllora_llm::async_openai::types::responses::InputParam;
use vllora_llm::async_openai::types::responses::OutputItem;
use vllora_llm::async_openai::types::responses::OutputMessageContent;
use vllora_llm::async_openai::types::responses::Tool;
use vllora_llm::async_openai::types::responses::WebSearchTool;
use base64::{engine::general_purpose::STANDARD, Engine as _};
use std::fs;
use vllora_llm::client::VlloraLLMClient;
use vllora_llm::error::LLMResult;
use vllora_llm::types::credentials::ApiKeyCredentials;
use vllora_llm::types::credentials::Credentials;
fn decode_and_save_image(
image_generation_call: &ImageGenToolCall,
index: usize,
) -> Result<String, Box<dyn std::error::Error>> {
let base64_image = image_generation_call
.result
.as_ref()
.ok_or("Image generation call has no result")?;
let image_data = STANDARD.decode(base64_image)?;
let filename = format!("generated_image_{}.png", index);
fs::write(&filename, image_data)?;
Ok(filename)
}
#[tokio::main]
async fn main() -> LLMResult<()> {
// 1) Build a Responses-style request using async-openai-compat types
// with tools for web_search_preview and image_generation
let responses_req = CreateResponse {
model: Some("gpt-4.1".to_string()),
input: InputParam::Text(
"Search for the latest news from today and generate an image about it".to_string(),
),
tools: Some(vec![
Tool::WebSearch(WebSearchTool::default()),
Tool::ImageGeneration(ImageGenTool::default()),
]),
..Default::default()
};
// 2) Construct a VlloraLLMClient
let client =
VlloraLLMClient::default().with_credentials(Credentials::ApiKey(ApiKeyCredentials {
api_key: std::env::var("VLLORA_OPENAI_API_KEY")
.expect("VLLORA_OPENAI_API_KEY must be set"),
}));
// 3) Non-streaming: send the request and print the final reply
println!("Sending request with tools: web_search_preview and image_generation");
let response = client.responses().create(responses_req).await?;
println!("\nNon-streaming reply:");
println!("{}", "=".repeat(80));
for (index, output) in response.output.iter().enumerate() {
match output {
OutputItem::ImageGenerationCall(image_generation_call) => {
println!("\n[Image Generation Call {}]", index);
match decode_and_save_image(image_generation_call, index) {
Ok(filename) => {
println!("✓ Successfully saved image to: {}", filename);
}
Err(e) => {
eprintln!("✗ Failed to decode/save image: {}", e);
}
}
}
OutputItem::Message(message) => {
println!("\n[Message {}]", index);
println!("{}", "-".repeat(80));
for content in &message.content {
match content {
OutputMessageContent::OutputText(text_output) => {
println!("\n{}", text_output.text);
if !text_output.annotations.is_empty() {
println!("Annotations: {:#?}", text_output.annotations);
}
}
_ => {
println!("Other content type: {:?}", content);
}
}
}
println!("\n{}", "=".repeat(80));
}
_ => {
println!("\n[Other Output {}]", index);
println!("{:?}", output);
}
}
}
Ok(())
}
Execution Flow
- Request Construction - We build a CreateResponse with our prompt and tools
- Client Initialization - We create and configure the Vllora LLM client
- API Call - We send the request and await the response
- Response Processing - We iterate through output items:
  - Handle image generation calls by decoding and saving
  - Display text messages with annotations
  - Handle any other output types
- File Output - Generated images are saved to disk as PNG files
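Assuming the example is laid out as a standard Cargo binary crate, running it looks roughly like this:

export VLLORA_OPENAI_API_KEY="your-api-key-here"
cargo run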
Expected Output
When you run this example, you'll see output like:
Sending request with tools: web_search_preview and image_generation
Non-streaming reply:
================================================================================
[Message 0]
--------------------------------------------------------------------------------
Here's the latest news from today: [summary of current news]
Annotations: [citations and sources from web search]
================================================================================
[Image Generation Call 1]
✓ Successfully saved image to: generated_image_1.png
The actual news content and image will vary based on what's happening when you run it!
Summary
This example demonstrates how to use the Responses API to create multi-tool workflows that combine web search and image generation. The key steps are:
- Build a CreateResponse request with the desired tools (WebSearchTool and ImageGenTool)
- Initialize the VlloraLLMClient with your API credentials
- Send the request and receive structured outputs
- Process different output types: extract text from OutputItem::Message and decode base64 images from OutputItem::ImageGenerationCall
- Save decoded images to disk using standard Rust file I/O
The Responses API enables powerful, structured workflows that go beyond simple text completions, making it ideal for building applications that need to orchestrate multiple AI capabilities.