Responses API

The vllora Responses API provides a unified interface for building advanced AI agents capable of executing complex tasks autonomously. This API is compatible with OpenAI's Responses API format and supports multimodal inputs, reasoning capabilities, and seamless tool integration.

Overview

The Responses API is a more powerful alternative to the traditional Chat Completions API. It enables:

  • Structured, multi-step workflows with support for multiple built-in tools
  • Rich, multimodal outputs that can be easily processed programmatically
  • Tool orchestration including web search, image generation, and more
  • Streaming support for real-time response processing

Basic Usage

Non-Streaming Example

Here's a simple example that sends a text prompt and receives a structured response:

use vllora_llm::async_openai::types::responses::{
    CreateResponse, InputParam, OutputItem, OutputMessageContent,
};
use vllora_llm::client::VlloraLLMClient;
use vllora_llm::error::LLMResult;

#[tokio::main]
async fn main() -> LLMResult<()> {
    // 1) Build a Responses-style request using async-openai-compat types
    let responses_req = CreateResponse {
        model: Some("gpt-4o".to_string()),
        input: InputParam::Text("Stream numbers 1 to 20 in separate lines.".to_string()),
        max_output_tokens: Some(100),
        ..Default::default()
    };

    // 2) Construct a VlloraLLMClient
    let client = VlloraLLMClient::default();

    // 3) Non-streaming: send the request and print the final reply
    let response = client.responses().create(responses_req).await?;

    println!("Non-streaming reply:");
    for output in &response.output {
        if let OutputItem::Message(message) = output {
            for message_content in &message.content {
                if let OutputMessageContent::OutputText(text) = message_content {
                    println!("{}", text.text);
                }
            }
        }
    }

    Ok(())
}

Streaming Example

The Responses API also supports streaming for real-time processing:

use tokio_stream::StreamExt;
use vllora_llm::async_openai::types::responses::{CreateResponse, InputParam};
use vllora_llm::client::VlloraLLMClient;
use vllora_llm::error::LLMResult;

#[tokio::main]
async fn main() -> LLMResult<()> {
    let responses_req = CreateResponse {
        model: Some("gpt-4o".to_string()),
        input: InputParam::Text("Stream numbers 1 to 20 in separate lines.".to_string()),
        max_output_tokens: Some(100),
        ..Default::default()
    };

    let client = VlloraLLMClient::default();

    // Streaming: send the request and print chunks as they arrive
    // Note: streaming for responses is not yet fully implemented in all providers
    println!("\nStreaming response...");
    let mut stream = client.responses().create_stream(responses_req).await?;

    while let Some(chunk) = stream.next().await {
        let chunk = chunk?;
        // The ResponseEvent structure may vary - print the chunk for debugging
        println!("{:?}", chunk);
    }

    Ok(())
}
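In practice you usually want to assemble the streamed chunks into a final string rather than just print them. The sketch below shows one way to fold delta events into accumulated text. It uses a locally defined `ResponseEvent` enum as a hypothetical stand-in; the crate's actual event variants may differ, so treat this as an illustration of the accumulation pattern, not the real API.

```rust
// Hypothetical stand-in for streamed response events; the crate's
// actual ResponseEvent variants may be named differently.
enum ResponseEvent {
    OutputTextDelta(String),
    Completed,
}

/// Fold a sequence of delta events into the final text.
fn accumulate(events: impl IntoIterator<Item = ResponseEvent>) -> String {
    let mut text = String::new();
    for event in events {
        match event {
            // Append each text delta as it arrives
            ResponseEvent::OutputTextDelta(delta) => text.push_str(&delta),
            // Stop once the response signals completion
            ResponseEvent::Completed => break,
        }
    }
    text
}

fn main() {
    let events = vec![
        ResponseEvent::OutputTextDelta("1\n".into()),
        ResponseEvent::OutputTextDelta("2\n".into()),
        ResponseEvent::Completed,
    ];
    assert_eq!(accumulate(events), "1\n2\n");
    println!("accumulation ok");
}
```

The same loop structure applies inside the `while let Some(chunk) = stream.next().await` body above: match on the chunk's variant and append text deltas to a buffer.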

Understanding the Response Structure

The Response struct contains an output field, which is a vector of OutputItem variants. Each item represents a different type of output from the API:

  • OutputItem::Message - Text messages from the model
  • OutputItem::ImageGenerationCall - Image generation results
  • OutputItem::WebSearchCall - Web search results
  • Other tool outputs

Each output type can be pattern-matched to extract the relevant data.
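The filtering shown in the basic example generalizes to any mix of variants. As an illustrative sketch, the function below collects only the text messages from a heterogeneous output list. It uses a locally defined enum that mirrors the documented `OutputItem` shape rather than the crate's actual types, so field names here are assumptions for demonstration only.

```rust
// Illustrative stand-in for the documented OutputItem shape; the real
// types live in vllora_llm::async_openai::types::responses and carry
// richer payloads.
enum OutputItem {
    Message { text: String },
    ImageGenerationCall { image_url: String },
    WebSearchCall { query: String },
}

/// Collect only the text content from a mixed list of outputs,
/// skipping tool-call results.
fn collect_text(outputs: &[OutputItem]) -> Vec<String> {
    outputs
        .iter()
        .filter_map(|item| match item {
            OutputItem::Message { text } => Some(text.clone()),
            _ => None,
        })
        .collect()
}

fn main() {
    let outputs = vec![
        OutputItem::Message { text: "Hello".to_string() },
        OutputItem::WebSearchCall { query: "weather".to_string() },
        OutputItem::Message { text: "World".to_string() },
    ];
    assert_eq!(collect_text(&outputs), vec!["Hello", "World"]);
    println!("{:?}", collect_text(&outputs));
}
```

Against the real crate, the match arms would destructure the actual variant payloads (e.g. iterating a message's `content` as in the non-streaming example) instead of a bare `text` field.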

Working with Tools

The Responses API supports multiple built-in tools that enable powerful workflows:

  • Web Search - Search the web for current information
  • Image Generation - Generate images from text prompts
  • Custom Tools - Define your own tools for specific tasks

For a comprehensive guide on using tools, especially image generation, see the Image Generation Guide.

Next Steps