# API Reference

This section provides detailed documentation for the vLLM Client API.

## Design Philosophy

The vLLM Client API follows these design principles:

### Builder Pattern

All requests are constructed with the builder pattern, which keeps API calls ergonomic and flexible:

```rust
let response = client.chat.completions().create()
    .model("model-name")
    .messages(json!([{"role": "user", "content": "Hello"}]))
    .temperature(0.7)
    .max_tokens(1024)
    .send()
    .await?;
```
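A builder like the one above is typically just a struct whose setter methods take `self` by value and return it, so calls chain naturally. Below is a minimal std-only sketch of the pattern; the names and fields are illustrative, not the crate's actual internals:

```rust
// Illustrative builder; the real request builder in vllm_client is
// assumed to look broadly similar, but also performs the HTTP call.
#[derive(Default)]
struct ChatRequestBuilder {
    model: Option<String>,
    temperature: Option<f32>,
    max_tokens: Option<u32>,
}

impl ChatRequestBuilder {
    // Each setter consumes the builder and returns it, enabling chaining.
    fn model(mut self, m: &str) -> Self {
        self.model = Some(m.to_string());
        self
    }
    fn temperature(mut self, t: f32) -> Self {
        self.temperature = Some(t);
        self
    }
    fn max_tokens(mut self, n: u32) -> Self {
        self.max_tokens = Some(n);
        self
    }
    // Stand-in for send(): renders the accumulated state.
    fn summary(&self) -> String {
        format!(
            "model={} temp={} max_tokens={}",
            self.model.as_deref().unwrap_or("<unset>"),
            self.temperature.unwrap_or(1.0),
            self.max_tokens.unwrap_or(0),
        )
    }
}

fn main() {
    let req = ChatRequestBuilder::default()
        .model("model-name")
        .temperature(0.7)
        .max_tokens(1024);
    println!("{}", req.summary()); // prints "model=model-name temp=0.7 max_tokens=1024"
}
```

Consuming `self` (rather than `&mut self`) is what lets the whole chain end in a single expression such as `.send().await?`.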

### Async-First

All API operations are async, built on Tokio. Use `#[tokio::main]` or integrate with your existing runtime:

```rust
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Your async code here
    Ok(())
}
```

### Type Safety

Strong types are used throughout the library, with Serde handling serialization:

- `ChatCompletionResponse` - Response from chat completions
- `StreamEvent` - Events from streaming responses
- `ToolCall` - Tool/function call data
- `VllmError` - Comprehensive error types
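The payoff of typed responses is compile-time checking instead of stringly-typed JSON lookups. A std-only sketch using a stand-in for the `Usage` type (field names follow the standard OpenAI usage object; the crate's actual struct is assumed to derive Serde's traits):

```rust
// Stand-in for the documented `Usage` type. Field names are an
// assumption based on the OpenAI-compatible usage object.
#[derive(Debug, Clone, Copy)]
struct Usage {
    prompt_tokens: u32,
    completion_tokens: u32,
    total_tokens: u32,
}

fn main() {
    let usage = Usage {
        prompt_tokens: 12,
        completion_tokens: 48,
        total_tokens: 60,
    };
    // A typo in a field name fails to compile, rather than returning
    // None at runtime as a raw-JSON lookup would.
    assert_eq!(
        usage.total_tokens,
        usage.prompt_tokens + usage.completion_tokens
    );
    println!("{usage:?}");
}
```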

### OpenAI Compatibility

The API mirrors the OpenAI API structure, making it easy to migrate existing code:

| OpenAI (Python) | vLLM Client (Rust) |
|---|---|
| `client.chat.completions.create(...)` | `client.chat.completions().create()...send().await` |
| `stream=True` | `.stream(true).send_stream().await` |
| `tools=[...]` | `.tools(json!([...]))` |

## Module Structure

```text
VllmClient
├── chat
│   └── completions()          # Chat completions API
│       └── create()           # Create request builder
│           ├── send()         # Execute request
│           └── send_stream()  # Execute with streaming
├── completions                # Legacy completions API
└── builder()                  # Client builder
```

## Core Types

### Request Types

| Type | Description |
|---|---|
| `ChatCompletionsRequest` | Builder for chat completion requests |
| `VllmClientBuilder` | Builder for client configuration |

### Response Types

| Type | Description |
|---|---|
| `ChatCompletionResponse` | Response from chat completions |
| `CompletionResponse` | Response from legacy completions |
| `MessageStream` | Streaming response iterator |
| `StreamEvent` | Individual stream events |
| `ToolCall` | Tool/function call data |
| `Usage` | Token usage statistics |

### Error Types

| Type | Description |
|---|---|
| `VllmError::Http` | HTTP request failed |
| `VllmError::Json` | JSON serialization error |
| `VllmError::ApiError` | API returned an error |
| `VllmError::Stream` | Streaming error |
| `VllmError::Timeout` | Connection timeout |
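Exhaustively matching on the error enum lets the compiler flag any variant you forgot to handle. A std-only sketch with a stand-in enum mirroring the variants in the table (the payload types here are assumptions, not the crate's actual definitions):

```rust
// Stand-in mirroring the documented VllmError variants; payload shapes
// are illustrative assumptions.
#[derive(Debug)]
enum VllmError {
    Http(String),
    Json(String),
    ApiError { status: u16, message: String },
    Stream(String),
    Timeout,
}

// Exhaustive match: adding a new variant to the enum makes this
// function fail to compile until the new case is handled.
fn describe(err: &VllmError) -> String {
    match err {
        VllmError::Http(e) => format!("HTTP request failed: {e}"),
        VllmError::Json(e) => format!("JSON error: {e}"),
        VllmError::ApiError { status, message } => {
            format!("API error {status}: {message}")
        }
        VllmError::Stream(e) => format!("stream error: {e}"),
        VllmError::Timeout => "connection timed out".to_string(),
    }
}

fn main() {
    let err = VllmError::ApiError {
        status: 404,
        message: "model not found".into(),
    };
    println!("{}", describe(&err)); // prints "API error 404: model not found"
}
```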

## Quick Reference

### Creating a Client

```rust
use vllm_client::VllmClient;

// Simple
let client = VllmClient::new("http://localhost:8000/v1");

// With API key
let client = VllmClient::new("http://localhost:8000/v1")
    .with_api_key("sk-xxx");

// With builder
let client = VllmClient::builder()
    .base_url("http://localhost:8000/v1")
    .api_key("sk-xxx")
    .timeout_secs(120)
    .build();
```

### Chat Completion

```rust
use vllm_client::{VllmClient, json};

let response = client.chat.completions().create()
    .model("Qwen/Qwen2.5-7B-Instruct")
    .messages(json!([
        {"role": "user", "content": "Hello!"}
    ]))
    .temperature(0.7)
    .max_tokens(1024)
    .send()
    .await?;

println!("{}", response.content.unwrap());
```

### Streaming

```rust
use vllm_client::{VllmClient, json, StreamEvent};
use futures::StreamExt;

let mut stream = client.chat.completions().create()
    .model("Qwen/Qwen2.5-7B-Instruct")
    .messages(json!([{"role": "user", "content": "Hello!"}]))
    .stream(true)
    .send_stream()
    .await?;

while let Some(event) = stream.next().await {
    match event {
        StreamEvent::Content(delta) => print!("{}", delta),
        StreamEvent::Reasoning(delta) => eprintln!("[thinking] {}", delta),
        StreamEvent::Done => break,
        _ => {}
    }
}
```
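The match arms above reduce to straightforward delta accumulation: concatenate `Content` deltas, divert `Reasoning` deltas, stop on `Done`. A std-only sketch with a stand-in `StreamEvent` enum and a simulated event sequence (the real type lives in `vllm_client`, and a real stream yields events from the server):

```rust
// Stand-in for vllm_client::StreamEvent, mirroring the variants used
// in the streaming example above.
enum StreamEvent {
    Content(String),
    Reasoning(String),
    Done,
}

fn main() {
    // Simulated event sequence standing in for server-sent chunks.
    let events = vec![
        StreamEvent::Content("Hel".into()),
        StreamEvent::Content("lo!".into()),
        StreamEvent::Done,
    ];

    let mut text = String::new();
    for event in events {
        match event {
            StreamEvent::Content(delta) => text.push_str(&delta),
            StreamEvent::Reasoning(_) => {} // would be logged separately
            StreamEvent::Done => break,
        }
    }
    println!("{text}"); // prints "Hello!"
}
```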

## Sections