# API Reference

This section provides detailed documentation for the vLLM Client API.

## Design Philosophy

The vLLM Client API follows these design principles:

### Builder Pattern

All requests are constructed with the builder pattern, which keeps API calls ergonomic and flexible:

```rust
let response = client.chat.completions().create()
    .model("model-name")
    .messages(json!([{"role": "user", "content": "Hello"}]))
    .temperature(0.7)
    .max_tokens(1024)
    .send()
    .await?;
```
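A builder like the one above is typically just a struct whose setter methods take `self` by value and return it, so calls chain naturally. Below is a minimal std-only sketch of the pattern; the names and fields are illustrative, not the crate's actual internals:

```rust
// Illustrative builder; the real request builder in vllm_client is
// assumed to look broadly similar, but also performs the HTTP call.
#[derive(Default)]
struct ChatRequestBuilder {
    model: Option<String>,
    temperature: Option<f32>,
    max_tokens: Option<u32>,
}

impl ChatRequestBuilder {
    // Each setter consumes the builder and returns it, enabling chaining.
    fn model(mut self, m: &str) -> Self {
        self.model = Some(m.to_string());
        self
    }
    fn temperature(mut self, t: f32) -> Self {
        self.temperature = Some(t);
        self
    }
    fn max_tokens(mut self, n: u32) -> Self {
        self.max_tokens = Some(n);
        self
    }
    // Stand-in for send(): renders the accumulated state.
    fn summary(&self) -> String {
        format!(
            "model={} temp={} max_tokens={}",
            self.model.as_deref().unwrap_or("<unset>"),
            self.temperature.unwrap_or(1.0),
            self.max_tokens.unwrap_or(0),
        )
    }
}

fn main() {
    let req = ChatRequestBuilder::default()
        .model("model-name")
        .temperature(0.7)
        .max_tokens(1024);
    println!("{}", req.summary()); // prints "model=model-name temp=0.7 max_tokens=1024"
}
```

Consuming `self` (rather than `&mut self`) is what lets the whole chain end in a single expression such as `.send().await?`.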

### Async-First

All API operations are async, built on Tokio. Use `#[tokio::main]` or integrate with your existing runtime:

```rust
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Your async code here
    Ok(())
}
```

### Type Safety

Strong types are used throughout the library, with Serde handling serialization:

- `ChatCompletionResponse` - Response from chat completions
- `StreamEvent` - Events from streaming responses
- `ToolCall` - Tool/function call data
- `VllmError` - Comprehensive error types
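The payoff of typed responses is compile-time checking instead of stringly-typed JSON lookups. A std-only sketch using a stand-in for the `Usage` type (field names follow the standard OpenAI usage object; the crate's actual struct is assumed to derive Serde's traits):

```rust
// Stand-in for the documented `Usage` type. Field names are an
// assumption based on the OpenAI-compatible usage object.
#[derive(Debug, Clone, Copy)]
struct Usage {
    prompt_tokens: u32,
    completion_tokens: u32,
    total_tokens: u32,
}

fn main() {
    let usage = Usage {
        prompt_tokens: 12,
        completion_tokens: 48,
        total_tokens: 60,
    };
    // A typo in a field name fails to compile, rather than returning
    // None at runtime as a raw-JSON lookup would.
    assert_eq!(
        usage.total_tokens,
        usage.prompt_tokens + usage.completion_tokens
    );
    println!("{usage:?}");
}
```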

### OpenAI Compatibility

The API mirrors the OpenAI API structure, making it easy to migrate existing code:

| OpenAI (Python) | vLLM Client (Rust) |
|---|---|
| `client.chat.completions.create(...)` | `client.chat.completions().create()...send().await` |
| `stream=True` | `.stream(true).send_stream().await` |
| `tools=[...]` | `.tools(json!([...]))` |

## Module Structure

```text
VllmClient
├── chat
│   └── completions()          # Chat completions API
│       └── create()           # Create request builder
│           ├── send()         # Execute request
│           └── send_stream()  # Execute with streaming
├── completions                # Legacy completions API
└── builder()                  # Client builder
```

## Core Types

### Request Types

| Type | Description |
|---|---|
| `ChatCompletionsRequest` | Builder for chat completion requests |
| `VllmClientBuilder` | Builder for client configuration |

### Response Types

| Type | Description |
|---|---|
| `ChatCompletionResponse` | Response from chat completions |
| `CompletionResponse` | Response from legacy completions |
| `MessageStream` | Streaming response iterator |
| `StreamEvent` | Individual stream events |
| `ToolCall` | Tool/function call data |
| `Usage` | Token usage statistics |

### Error Types

| Type | Description |
|---|---|
| `VllmError::Http` | HTTP request failed |
| `VllmError::Json` | JSON serialization error |
| `VllmError::ApiError` | API returned an error |
| `VllmError::Stream` | Streaming error |
| `VllmError::Timeout` | Connection timeout |
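Exhaustively matching on the error enum lets the compiler flag any variant you forgot to handle. A std-only sketch with a stand-in enum mirroring the variants in the table (the payload types here are assumptions, not the crate's actual definitions):

```rust
// Stand-in mirroring the documented VllmError variants; payload shapes
// are illustrative assumptions.
#[derive(Debug)]
enum VllmError {
    Http(String),
    Json(String),
    ApiError { status: u16, message: String },
    Stream(String),
    Timeout,
}

// Exhaustive match: adding a new variant to the enum makes this
// function fail to compile until the new case is handled.
fn describe(err: &VllmError) -> String {
    match err {
        VllmError::Http(e) => format!("HTTP request failed: {e}"),
        VllmError::Json(e) => format!("JSON error: {e}"),
        VllmError::ApiError { status, message } => {
            format!("API error {status}: {message}")
        }
        VllmError::Stream(e) => format!("stream error: {e}"),
        VllmError::Timeout => "connection timed out".to_string(),
    }
}

fn main() {
    let err = VllmError::ApiError {
        status: 404,
        message: "model not found".into(),
    };
    println!("{}", describe(&err)); // prints "API error 404: model not found"
}
```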

## Quick Reference

### Creating a Client

```rust
use vllm_client::VllmClient;

// Simple
let client = VllmClient::new("http://localhost:8000/v1");

// With API key
let client = VllmClient::new("http://localhost:8000/v1")
    .with_api_key("sk-xxx");

// With builder
let client = VllmClient::builder()
    .base_url("http://localhost:8000/v1")
    .api_key("sk-xxx")
    .timeout_secs(120)
    .build();
```

### Chat Completion

```rust
use vllm_client::{VllmClient, json};

let response = client.chat.completions().create()
    .model("Qwen/Qwen2.5-7B-Instruct")
    .messages(json!([
        {"role": "user", "content": "Hello!"}
    ]))
    .temperature(0.7)
    .max_tokens(1024)
    .send()
    .await?;

println!("{}", response.content.unwrap());
```

### Streaming

```rust
use vllm_client::{VllmClient, json, StreamEvent};
use futures::StreamExt;

let mut stream = client.chat.completions().create()
    .model("Qwen/Qwen2.5-7B-Instruct")
    .messages(json!([{"role": "user", "content": "Hello!"}]))
    .stream(true)
    .send_stream()
    .await?;

while let Some(event) = stream.next().await {
    match event {
        StreamEvent::Content(delta) => print!("{}", delta),
        StreamEvent::Reasoning(delta) => eprintln!("[thinking] {}", delta),
        StreamEvent::Done => break,
        _ => {}
    }
}
```
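The match arms above reduce to straightforward delta accumulation: concatenate `Content` deltas, divert `Reasoning` deltas, stop on `Done`. A std-only sketch with a stand-in `StreamEvent` enum and a simulated event sequence (the real type lives in `vllm_client`, and a real stream yields events from the server):

```rust
// Stand-in for vllm_client::StreamEvent, mirroring the variants used
// in the streaming example above.
enum StreamEvent {
    Content(String),
    Reasoning(String),
    Done,
}

fn main() {
    // Simulated event sequence standing in for server-sent chunks.
    let events = vec![
        StreamEvent::Content("Hel".into()),
        StreamEvent::Content("lo!".into()),
        StreamEvent::Done,
    ];

    let mut text = String::new();
    for event in events {
        match event {
            StreamEvent::Content(delta) => text.push_str(&delta),
            StreamEvent::Reasoning(_) => {} // would be logged separately
            StreamEvent::Done => break,
        }
    }
    println!("{text}"); // prints "Hello!"
}
```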

## Sections