Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
0.1.0 - 2024-01-XX
Added
- Initial release of vLLM Client
- `VllmClient` for connecting to vLLM servers
- Chat completions API (`client.chat.completions()`)
- Streaming response support with `MessageStream`
- Tool/function calling support
- Reasoning/thinking mode support for compatible models
- Error handling with `VllmError` enum
- Builder pattern for client configuration
- Request builder pattern for chat completions
- Support for vLLM-specific parameters via `extra()`
- Token usage tracking in responses
- Timeout configuration
- API key authentication
Features
Client
- `VllmClient::new(base_url)` - Create a new client
- `VllmClient::builder()` - Create a client with builder pattern
- `with_api_key()` - Set API key for authentication
- `timeout_secs()` - Set request timeout
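The builder-style configuration listed above can be illustrated with a small, std-only sketch. It is not the crate's actual implementation: `ClientConfig`, `ClientConfigBuilder`, the `base_url()` and `build()` methods, and the 30-second default timeout are all hypothetical names and values chosen for illustration. Only `with_api_key()` and `timeout_secs()` are taken from the list above, and their real signatures may differ.

```rust
use std::time::Duration;

/// Hypothetical stand-in for a resolved client configuration
/// (not a type exported by the crate).
#[derive(Debug, Clone, PartialEq)]
pub struct ClientConfig {
    pub base_url: String,
    pub api_key: Option<String>,
    pub timeout: Duration,
}

/// Collects optional settings, then validates them in `build()`.
#[derive(Debug, Default)]
pub struct ClientConfigBuilder {
    base_url: Option<String>,
    api_key: Option<String>,
    timeout_secs: Option<u64>,
}

impl ClientConfigBuilder {
    /// Set the server base URL (hypothetical method name).
    pub fn base_url(mut self, url: impl Into<String>) -> Self {
        self.base_url = Some(url.into());
        self
    }

    /// Set the API key used for authentication.
    pub fn with_api_key(mut self, key: impl Into<String>) -> Self {
        self.api_key = Some(key.into());
        self
    }

    /// Set the request timeout in seconds.
    pub fn timeout_secs(mut self, secs: u64) -> Self {
        self.timeout_secs = Some(secs);
        self
    }

    /// Produce the final configuration; the base URL is the only required field.
    pub fn build(self) -> Result<ClientConfig, String> {
        let base_url = self
            .base_url
            .ok_or_else(|| "base_url is required".to_string())?;
        Ok(ClientConfig {
            base_url,
            api_key: self.api_key,
            // Assumed default of 30 seconds when no timeout was set.
            timeout: Duration::from_secs(self.timeout_secs.unwrap_or(30)),
        })
    }
}
```

Each setter takes `self` by value and returns it, so calls chain without intermediate bindings, and a missing required field surfaces once, at `build()`.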
Chat Completions
- `model()` - Set model name
- `messages()` - Set conversation messages
- `temperature()` - Set sampling temperature
- `max_tokens()` - Set maximum output tokens
- `top_p()` - Set nucleus sampling parameter
- `top_k()` - Set top-k sampling (vLLM extension)
- `stop()` - Set stop sequences
- `stream()` - Enable streaming mode
- `tools()` - Define available tools
- `tool_choice()` - Control tool selection
- `extra()` - Pass vLLM-specific parameters
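As a rough illustration of the request builder pattern above, the following std-only sketch models a chat request whose optional parameters stay unset until a setter is called, with `extra()` acting as an escape hatch for server-specific keys. The `ChatRequest` type, the `new()`, `message()` and `set_parameters()` helpers, and the string-typed `extra` values are assumptions made for this example; the crate's real builder and its signatures may look different.

```rust
use std::collections::BTreeMap;

/// Hypothetical request model; field layout is an assumption for illustration.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct ChatRequest {
    pub model: String,
    /// (role, content) pairs, e.g. ("user", "Hi").
    pub messages: Vec<(String, String)>,
    pub temperature: Option<f32>,
    pub max_tokens: Option<u32>,
    pub top_k: Option<u32>,
    pub stream: bool,
    /// Server-specific parameters, stored here as raw string values.
    pub extra: BTreeMap<String, String>,
}

impl ChatRequest {
    /// Start a request for the given model (hypothetical constructor).
    pub fn new(model: impl Into<String>) -> Self {
        ChatRequest { model: model.into(), ..Default::default() }
    }

    /// Append one message (hypothetical helper).
    pub fn message(mut self, role: &str, content: &str) -> Self {
        self.messages.push((role.to_string(), content.to_string()));
        self
    }

    pub fn temperature(mut self, value: f32) -> Self {
        self.temperature = Some(value);
        self
    }

    pub fn max_tokens(mut self, value: u32) -> Self {
        self.max_tokens = Some(value);
        self
    }

    pub fn top_k(mut self, value: u32) -> Self {
        self.top_k = Some(value);
        self
    }

    pub fn stream(mut self, enabled: bool) -> Self {
        self.stream = enabled;
        self
    }

    /// Record a parameter the typed setters do not cover.
    pub fn extra(mut self, key: &str, raw_value: &str) -> Self {
        self.extra.insert(key.to_string(), raw_value.to_string());
        self
    }

    /// Names of the optional parameters that were explicitly set:
    /// typed ones first, then `extra` keys in sorted order.
    pub fn set_parameters(&self) -> Vec<String> {
        let mut names = Vec::new();
        if self.temperature.is_some() { names.push("temperature".to_string()); }
        if self.max_tokens.is_some() { names.push("max_tokens".to_string()); }
        if self.top_k.is_some() { names.push("top_k".to_string()); }
        names.extend(self.extra.keys().cloned());
        names
    }
}
```

Keeping unset parameters as `None` lets a serializer omit them, so the server's own defaults apply instead of values the caller never chose.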
Streaming
- `StreamEvent::Content` - Content tokens
- `StreamEvent::Reasoning` - Reasoning content (thinking models)
- `StreamEvent::ToolCallDelta` - Streaming tool call updates
- `StreamEvent::ToolCallComplete` - Complete tool call
- `StreamEvent::Usage` - Token usage statistics
- `StreamEvent::Done` - Stream completion
- `StreamEvent::Error` - Error events
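One way to consume the event variants above is to fold them into a single result. The sketch below re-declares a stand-in `StreamEvent` enum so it compiles on its own: the variant names come from the list, but every payload shape, the `Collected` struct, and the `collect_events` helper are assumptions. It also uses a plain synchronous iterator, whereas a real `MessageStream` would presumably be polled asynchronously.

```rust
/// Stand-in for the crate's event type; payloads are assumed, not documented.
#[derive(Debug, Clone, PartialEq)]
pub enum StreamEvent {
    Content(String),
    Reasoning(String),
    ToolCallDelta { index: usize, arguments_fragment: String },
    ToolCallComplete { name: String, arguments: String },
    Usage { prompt_tokens: u32, completion_tokens: u32 },
    Done,
    Error(String),
}

/// Everything gathered from one stream (hypothetical helper type).
#[derive(Debug, Default, PartialEq)]
pub struct Collected {
    pub content: String,
    pub reasoning: String,
    /// (tool name, raw JSON arguments) for each completed call.
    pub tool_calls: Vec<(String, String)>,
    pub total_tokens: Option<u32>,
}

/// Fold events into a `Collected`, stopping at `Done` or the first `Error`.
pub fn collect_events<I>(events: I) -> Result<Collected, String>
where
    I: IntoIterator<Item = StreamEvent>,
{
    let mut out = Collected::default();
    for event in events {
        match event {
            StreamEvent::Content(text) => out.content.push_str(&text),
            StreamEvent::Reasoning(text) => out.reasoning.push_str(&text),
            // Partial tool-call data is skipped; the complete event carries the result.
            StreamEvent::ToolCallDelta { .. } => {}
            StreamEvent::ToolCallComplete { name, arguments } => {
                out.tool_calls.push((name, arguments));
            }
            StreamEvent::Usage { prompt_tokens, completion_tokens } => {
                out.total_tokens = Some(prompt_tokens + completion_tokens);
            }
            StreamEvent::Done => break,
            StreamEvent::Error(message) => return Err(message),
        }
    }
    Ok(out)
}
```

Matching exhaustively, with no wildcard arm, means the compiler flags this function if a variant is later added to the enum.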
Response Types
- `ChatCompletionResponse` - Chat completion response
- `ToolCall` - Tool call data with parsing methods
- `Usage` - Token usage statistics
Dependencies
- `reqwest` - HTTP client
- `serde` / `serde_json` - JSON serialization
- `tokio` - Async runtime
- `thiserror` - Error handling
[Unreleased]
Planned
- Custom HTTP headers support
- Connection pooling configuration
- Request/response logging
- Retry middleware
- Multi-modal input helpers
- Async iterator for batch processing
- OpenTelemetry integration
- WebSocket transport
Version History
| Version | Date | Highlights |
|---|---|---|
| 0.1.0 | 2024-01 | Initial release |