Client API
The VllmClient is the main entry point for interacting with the vLLM API.
Creating a Client
Simple Construction
#![allow(unused)] fn main() { use vllm_client::VllmClient; let client = VllmClient::new("http://localhost:8000/v1"); }
With API Key
#![allow(unused)] fn main() { use vllm_client::VllmClient; let client = VllmClient::new("http://localhost:8000/v1") .with_api_key("sk-your-api-key"); }
With Timeout
#![allow(unused)] fn main() { use vllm_client::VllmClient; let client = VllmClient::new("http://localhost:8000/v1") .timeout_secs(120); // 2 minutes }
Using the Builder Pattern
For more complex configurations, use the builder:
#![allow(unused)] fn main() { use vllm_client::VllmClient; let client = VllmClient::builder() .base_url("http://localhost:8000/v1") .api_key("sk-your-api-key") .timeout_secs(300) .build(); }
Methods Reference
new(base_url: impl Into<String>) -> Self
Create a new client with the given base URL.
#![allow(unused)] fn main() { let client = VllmClient::new("http://localhost:8000/v1"); }
Parameters:
base_url- The base URL of the vLLM server (should include/v1path)
Notes:
- Trailing slashes are automatically removed
- The client is cheap to create but should be reused when possible
with_api_key(self, api_key: impl Into<String>) -> Self
Set the API key for authentication (builder pattern).
#![allow(unused)] fn main() { let client = VllmClient::new("http://localhost:8000/v1") .with_api_key("sk-xxx"); }
Parameters:
api_key- The API key to use for Bearer authentication
Notes:
- The API key is sent as a Bearer token in the
Authorizationheader - This method returns a new client instance
timeout_secs(self, secs: u64) -> Self
Set the request timeout in seconds (builder pattern).
#![allow(unused)] fn main() { let client = VllmClient::new("http://localhost:8000/v1") .timeout_secs(300); }
Parameters:
secs- Timeout duration in seconds
Notes:
- Applies to all requests made by this client
- For long-running generation tasks, consider setting a higher timeout
base_url(&self) -> &str
Get the base URL of the client.
#![allow(unused)] fn main() { let client = VllmClient::new("http://localhost:8000/v1"); assert_eq!(client.base_url(), "http://localhost:8000/v1"); }
api_key(&self) -> Option<&str>
Get the API key, if configured.
#![allow(unused)] fn main() { let client = VllmClient::new("http://localhost:8000/v1") .with_api_key("sk-xxx"); assert_eq!(client.api_key(), Some("sk-xxx")); }
builder() -> VllmClientBuilder
Create a new client builder for more configuration options.
#![allow(unused)] fn main() { let client = VllmClient::builder() .base_url("http://localhost:8000/v1") .api_key("sk-xxx") .timeout_secs(120) .build(); }
API Modules
The client provides access to different API modules:
chat - Chat Completions API
Access the chat completions API for conversational interactions:
#![allow(unused)] fn main() { let response = client.chat.completions().create() .model("Qwen/Qwen2.5-72B-Instruct") .messages(json!([{"role": "user", "content": "Hello!"}])) .send() .await?; }
completions - Legacy Completions API
Access the legacy completions API for text completion:
#![allow(unused)] fn main() { let response = client.completions.create() .model("Qwen/Qwen2.5-72B-Instruct") .prompt("Once upon a time") .send() .await?; }
VllmClientBuilder
The builder provides a flexible way to configure the client.
Methods
| Method | Type | Description |
|---|---|---|
base_url(url) | impl Into<String> | Set the base URL |
api_key(key) | impl Into<String> | Set the API key |
timeout_secs(secs) | u64 | Set timeout in seconds |
build() | - | Build the client |
Default Values
| Option | Default |
|---|---|
base_url | http://localhost:8000/v1 |
api_key | None |
timeout_secs | HTTP client default (30s) |
Usage Examples
Basic Usage
use vllm_client::{VllmClient, json}; #[tokio::main] async fn main() -> Result<(), Box<dyn std::error::Error>> { let client = VllmClient::new("http://localhost:8000/v1"); let response = client.chat.completions().create() .model("Qwen/Qwen2.5-7B-Instruct") .messages(json!([ {"role": "user", "content": "Hello!"} ])) .send() .await?; println!("{}", response.content.unwrap_or_default()); Ok(()) }
With Environment Variables
#![allow(unused)] fn main() { use std::env; use vllm_client::VllmClient; fn create_client() -> VllmClient { let base_url = env::var("VLLM_BASE_URL") .unwrap_or_else(|_| "http://localhost:8000/v1".to_string()); let api_key = env::var("VLLM_API_KEY").ok(); let mut builder = VllmClient::builder().base_url(&base_url); if let Some(key) = api_key { builder = builder.api_key(&key); } builder.build() } }
Multiple Requests
Reuse the client for multiple requests:
#![allow(unused)] fn main() { use vllm_client::{VllmClient, json}; async fn process_prompts(client: &VllmClient, prompts: &[&str]) -> Vec<String> { let mut results = Vec::new(); for prompt in prompts { let response = client.chat.completions().create() .model("Qwen/Qwen2.5-7B-Instruct") .messages(json!([{"role": "user", "content": prompt}])) .send() .await; match response { Ok(r) => results.push(r.content.unwrap_or_default()), Err(e) => eprintln!("Error: {}", e), } } results } }
Thread Safety
The VllmClient is thread-safe and can be shared across threads:
#![allow(unused)] fn main() { use std::sync::Arc; use vllm_client::VllmClient; let client = Arc::new(VllmClient::new("http://localhost:8000/v1")); // Can be cloned and shared across threads let client_clone = Arc::clone(&client); }
See Also
- Chat Completions - Chat completions API
- Streaming - Streaming response handling
- Configuration - Configuration options