
vllora LLM crate (vllora_llm)


This crate powers the Vllora AI Gateway's LLM layer. It provides:

  • Unified chat-completions client over multiple providers (OpenAI-compatible, Anthropic, Gemini, Bedrock, …)
  • Gateway-native types (ChatCompletionRequest, ChatCompletionMessage, routing & tools support)
  • Streaming responses and telemetry hooks via a common ModelInstance trait
  • Tracing integration: built-in tracing support, with a console example in llm/examples/tracing (spans/events to stdout) and an OTLP example in llm/examples/tracing_otlp (spans sent to external collectors such as New Relic); a minimal console setup is sketched below
  • Supported parameters: see the usage guide for a detailed table of which parameters are honored by each provider

Use it when you want to talk to the gateway's LLM engine from Rust code, without worrying about provider-specific SDKs.
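For the tracing bullet above, a minimal console setup might look like the sketch below. This is an illustration only, assuming the standard tracing and tracing_subscriber crates; consult llm/examples/tracing for the crate's actual example code.

use tracing::info;

fn init_console_tracing() {
    // Install a stdout subscriber with the default formatter so spans/events
    // emitted while handling LLM requests become visible on the console.
    tracing_subscriber::fmt::init();
    info!("console tracing initialized");
}

Call this once at startup, before constructing the client, so that spans from the request lifecycle are captured.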

Quick start

use vllora_llm::client::VlloraLLMClient;
use vllora_llm::types::gateway::{ChatCompletionRequest, ChatCompletionMessage};
use vllora_llm::error::LLMResult;

#[tokio::main]
async fn main() -> LLMResult<()> {
    // 1) Build a chat completion request using gateway-native types
    let request = ChatCompletionRequest {
        model: "gpt-4.1-mini".to_string(),
        messages: vec![
            ChatCompletionMessage::new_text(
                "system".to_string(),
                "You are a helpful assistant.".to_string(),
            ),
            ChatCompletionMessage::new_text(
                "user".to_string(),
                "Stream numbers 1 to 20 in separate lines.".to_string(),
            ),
        ],
        ..Default::default()
    };

    // 2) Construct a VlloraLLMClient
    let client = VlloraLLMClient::new();

    // 3) Non-streaming: send the request and print the final reply
    let response = client
        .completions()
        .create(request.clone())
        .await?;

    // ... handle response
    Ok(())
}

Note: By default, VlloraLLMClient::new() fetches API keys from environment variables following the pattern VLLORA_{PROVIDER_NAME}_API_KEY. For example, for OpenAI, it will look for VLLORA_OPENAI_API_KEY.
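As a quick illustration of that lookup, you can check that the relevant key is visible to your process before constructing the client. The variable name below assumes the OpenAI provider; substitute the provider you use.

fn check_openai_key() {
    // VlloraLLMClient::new() performs this lookup itself; this check only
    // makes a missing key obvious before any request is sent.
    if std::env::var("VLLORA_OPENAI_API_KEY").is_err() {
        eprintln!("VLLORA_OPENAI_API_KEY is not set; OpenAI-backed requests will fail");
    }
}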

Next steps