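Quick Start
This section walks you through your first API call.
Prerequisites
- Rust 1.70 or later
- A running vLLM server
- The `tokio` runtime (with the `macros` and `rt-multi-thread` features, which `#[tokio::main]` needs) and, for streaming, the `futures` crate
Basic Chat Completion
The simplest usage looks like this: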
```rust
use vllm_client::{VllmClient, json};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a client pointing at the vLLM server's address
    let client = VllmClient::new("http://localhost:8000/v1");

    // Send a chat completion request; the model name must match
    // the model the vLLM server is serving
    let response = client
        .chat
        .completions()
        .create()
        .model("Qwen/Qwen2.5-7B-Instruct")
        .messages(json!([
            {"role": "user", "content": "Hello, how are you?"}
        ]))
        .send()
        .await?;

    // Print the response content
    println!("Reply: {}", response.content.unwrap_or_default());
    Ok(())
}
```
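vLLM serves an OpenAI-compatible HTTP API, so a request like the one above presumably ends up as a JSON body POSTed to the `/chat/completions` route under the base URL. The sketch below assembles such a minimal body by hand using only the standard library, to show what the `model` and `messages` arguments correspond to on the wire. The helper names `json_escape` and `chat_request_body` are hypothetical, and the real client may serialize additional fields or use serde instead.

```rust
/// Escape a string so it can be embedded inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Hypothetical helper: build a minimal chat-completions request body
/// from (role, content) pairs.
fn chat_request_body(model: &str, messages: &[(&str, &str)]) -> String {
    let msgs: Vec<String> = messages
        .iter()
        .map(|(role, content)| {
            format!(
                r#"{{"role":"{}","content":"{}"}}"#,
                json_escape(role),
                json_escape(content)
            )
        })
        .collect();
    format!(
        r#"{{"model":"{}","messages":[{}]}}"#,
        json_escape(model),
        msgs.join(",")
    )
}
```

Hand-rolled escaping keeps the sketch dependency-free; in real code a JSON library is the safer choice.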
Streaming Responses
To print output in real time as it is generated, use streaming mode:
```rust
use std::io::Write;

use futures::StreamExt;
use vllm_client::{VllmClient, json, StreamEvent};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = VllmClient::new("http://localhost:8000/v1");

    // Create a streaming request
    let mut stream = client
        .chat
        .completions()
        .create()
        .model("Qwen/Qwen2.5-7B-Instruct")
        .messages(json!([
            {"role": "user", "content": "Write a short poem about spring"}
        ]))
        .stream(true)
        .send_stream()
        .await?;

    // Handle stream events as they arrive
    while let Some(event) = stream.next().await {
        match event {
            StreamEvent::Content(delta) => {
                print!("{}", delta);
                // stdout is line-buffered; flush so each delta shows up immediately
                std::io::stdout().flush()?;
            }
            // Reasoning deltas go to stderr, kept apart from the answer text
            StreamEvent::Reasoning(delta) => eprint!("[thinking: {}]", delta),
            StreamEvent::Done => println!("\n[done]"),
            StreamEvent::Error(e) => eprintln!("\nError: {}", e),
            // Ignore any other event types
            _ => {}
        }
    }
    Ok(())
}
```
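OpenAI-compatible servers such as vLLM deliver streamed output as server-sent events: each chunk arrives on a `data:` line, and the stream ends with `data: [DONE]`. This section does not document how `vllm_client` turns that wire format into `StreamEvent` values, so the following is only a standard-library sketch of the framing step, using a hypothetical `parse_sse` function. It assumes one `data:` line per event; the SSE format also allows multi-line events, which this sketch does not join.

```rust
/// One item extracted from an OpenAI-style server-sent-event stream.
#[derive(Debug, PartialEq)]
enum SseItem {
    /// A JSON payload carrying one chunk of the streamed completion.
    Data(String),
    /// The `data: [DONE]` sentinel that ends the stream.
    Done,
}

/// Hypothetical helper: pull `data:` payloads out of raw SSE text,
/// stopping at `[DONE]`. Blank lines, `:` comments and other fields are skipped.
fn parse_sse(raw: &str) -> Vec<SseItem> {
    let mut items = Vec::new();
    for line in raw.lines() {
        // Only `data:` lines carry payloads
        let Some(rest) = line.strip_prefix("data:") else { continue };
        // A single space after the colon is not part of the value
        let payload = rest.strip_prefix(' ').unwrap_or(rest);
        if payload == "[DONE]" {
            items.push(SseItem::Done);
            break;
        }
        items.push(SseItem::Data(payload.to_string()));
    }
    items
}
```

Each `Data` payload would then be parsed as JSON and mapped to content or reasoning deltas.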
Using the Builder Pattern
When you need more configuration, use the builder:
```rust
use vllm_client::VllmClient;

fn main() {
    let client = VllmClient::builder()
        .base_url("http://localhost:8000/v1")
        .api_key("your-api-key") // optional
        .timeout_secs(120)       // optional, timeout in seconds
        .build();
}
```
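To illustrate how optional settings such as `api_key` and `timeout_secs` can be left unset, here is a minimal, self-contained builder in plain Rust. `ClientConfig`, `ClientConfigBuilder`, and the fallback values in `build` (the local URL and a 60-second timeout) are all hypothetical; this section does not state the crate's real defaults.

```rust
use std::time::Duration;

/// Hypothetical configuration produced by the builder.
#[derive(Debug, Clone, PartialEq)]
struct ClientConfig {
    base_url: String,
    api_key: Option<String>,
    timeout: Duration,
}

/// Hypothetical builder whose setters mirror the calls in the example.
#[derive(Debug, Default)]
struct ClientConfigBuilder {
    base_url: Option<String>,
    api_key: Option<String>,
    timeout_secs: Option<u64>,
}

impl ClientConfigBuilder {
    fn base_url(mut self, url: impl Into<String>) -> Self {
        self.base_url = Some(url.into());
        self
    }

    fn api_key(mut self, key: impl Into<String>) -> Self {
        self.api_key = Some(key.into());
        self
    }

    fn timeout_secs(mut self, secs: u64) -> Self {
        self.timeout_secs = Some(secs);
        self
    }

    /// Fill unset options with defaults (both default values are assumptions).
    fn build(self) -> ClientConfig {
        ClientConfig {
            base_url: self
                .base_url
                .unwrap_or_else(|| "http://localhost:8000/v1".to_string()),
            api_key: self.api_key,
            timeout: Duration::from_secs(self.timeout_secs.unwrap_or(60)),
        }
    }
}
```

Taking `self` by value in each setter is what allows the calls to chain, and keeping every field as an `Option` until `build` keeps all defaulting logic in one place.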
Complete Example
This example adds a system prompt, sampling parameters, and token usage reporting:
```rust
use vllm_client::{VllmClient, json};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = VllmClient::new("http://localhost:8000/v1");

    let response = client
        .chat
        .completions()
        .create()
        .model("Qwen/Qwen2.5-7B-Instruct")
        .messages(json!([
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is the capital of France?"}
        ]))
        .temperature(0.7) // sampling randomness
        .max_tokens(1024) // upper bound on generated tokens
        .top_p(0.9)       // nucleus sampling threshold
        .send()
        .await?;

    println!("Reply: {}", response.content.unwrap_or_default());

    // Print token usage statistics, if the server returned them
    if let Some(usage) = response.usage {
        println!(
            "Token usage: prompt={}, completion={}, total={}",
            usage.prompt_tokens, usage.completion_tokens, usage.total_tokens
        );
    }
    Ok(())
}
```
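`temperature` and `top_p` are sampling controls. As commonly defined, temperature divides the model's logits before the softmax (values below 1 sharpen the distribution, values above 1 flatten it), and top-p keeps only the smallest set of most likely tokens whose probabilities sum to at least `p`. The sketch below illustrates those two textbook definitions on a toy logit vector; it is not vLLM's sampler code, and vLLM additionally treats `temperature = 0` as greedy decoding, which this sketch does not handle.

```rust
/// Softmax over `logits` after dividing each by `temperature` (must be > 0).
fn softmax_with_temperature(logits: &[f64], temperature: f64) -> Vec<f64> {
    assert!(temperature > 0.0, "this sketch only handles positive temperatures");
    let scaled: Vec<f64> = logits.iter().map(|l| l / temperature).collect();
    // Subtract the maximum before exponentiating for numerical stability
    let max = scaled.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = scaled.iter().map(|s| (s - max).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Nucleus (top-p) filtering: indices of the smallest set of most likely
/// tokens whose cumulative probability reaches `top_p`.
fn top_p_indices(probs: &[f64], top_p: f64) -> Vec<usize> {
    let mut order: Vec<usize> = (0..probs.len()).collect();
    // Sort token indices by descending probability (panics on NaN)
    order.sort_by(|&a, &b| probs[b].partial_cmp(&probs[a]).unwrap());
    let mut kept = Vec::new();
    let mut cumulative = 0.0;
    for i in order {
        kept.push(i);
        cumulative += probs[i];
        if cumulative >= top_p {
            break;
        }
    }
    kept
}
```

With logits `[2, 1, 0]` and temperature 1, the probabilities are roughly 0.665, 0.245 and 0.090, so `top_p = 0.9` keeps the first two tokens; the next token is then sampled from those two after renormalizing.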
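Error Handling
Handle errors explicitly. Matching on `VllmError` lets you tell API errors and timeouts apart from other failures: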
```rust
use vllm_client::{VllmClient, json, VllmError};

async fn chat() -> Result<String, VllmError> {
    let client = VllmClient::new("http://localhost:8000/v1");
    let response = client
        .chat
        .completions()
        .create()
        .model("Qwen/Qwen2.5-7B-Instruct")
        .messages(json!([
            {"role": "user", "content": "Hello!"}
        ]))
        .send()
        .await?;
    Ok(response.content.unwrap_or_default())
}

#[tokio::main]
async fn main() {
    match chat().await {
        Ok(text) => println!("Reply: {}", text),
        // The server responded with an error status
        Err(VllmError::ApiError { status_code, message, .. }) => {
            eprintln!("API error ({}): {}", status_code, message);
        }
        // The request did not complete within the timeout
        Err(VllmError::Timeout) => {
            eprintln!("Request timed out");
        }
        // Any other failure
        Err(e) => {
            eprintln!("Error: {}", e);
        }
    }
}
```
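Timeouts and some server-side errors are often transient, so a common pattern is to retry them with exponential backoff. The sketch below is synchronous and uses only the standard library so it can run anywhere. `retry_with_backoff`, `ApiFailure`, and `is_transient` are hypothetical names, and `ApiFailure` is a stand-in rather than the crate's `VllmError`. Treating HTTP 429 and 5xx responses as retryable is an assumption to adjust for your deployment.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical stand-in for a client error type (not `VllmError` itself).
#[derive(Debug, PartialEq)]
enum ApiFailure {
    Status(u16),
    Timeout,
}

/// Assumed policy: timeouts, 429 (rate limited) and 5xx responses are transient.
fn is_transient(e: &ApiFailure) -> bool {
    match e {
        ApiFailure::Timeout => true,
        ApiFailure::Status(code) => *code == 429 || (500..600).contains(code),
    }
}

/// Call `op` at most `max_attempts` times (at least once). After the n-th
/// failed attempt, sleep `base_delay * 2^(n-1)` if the error is retryable.
fn retry_with_backoff<T, E>(
    max_attempts: u32,
    base_delay: Duration,
    is_retryable: impl Fn(&E) -> bool,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) if attempt < max_attempts && is_retryable(&err) => {
                sleep(base_delay * 2u32.pow(attempt - 1));
                attempt += 1;
            }
            // Non-retryable error, or attempts exhausted
            Err(err) => return Err(err),
        }
    }
}
```

In async code like the example above, you would await `tokio::time::sleep` instead of calling `std::thread::sleep`, so the runtime thread is not blocked.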