Streaming Completions Example

This example demonstrates streaming completions using the legacy /v1/completions API.

Basic Streaming Completions

use vllm_client::{VllmClient, CompletionStreamEvent};
use futures::StreamExt;
use std::io::Write; // needed for stdout().flush()

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = VllmClient::new("http://localhost:8000/v1");

    println!("=== Streaming Completions ===\n");
    println!("Model: Qwen/Qwen2.5-7B-Instruct\n");
    println!("Prompt: What is machine learning?");
    println!("\nGenerated text: ");

    let mut stream = client
        .completions
        .create()
        .model("Qwen/Qwen2.5-7B-Instruct")
        .prompt("What is machine learning?")
        .max_tokens(500)
        .temperature(0.7)
        .stream(true)
        .send_stream()
        .await?;

    // Process streaming events
    while let Some(event) = stream.next().await {
        match event {
            CompletionStreamEvent::Text(delta) => {
                // Print text delta (real-time output)
                print!("{}", delta);
                // Flush buffer for real-time display
                std::io::stdout().flush().ok();
            }
            CompletionStreamEvent::FinishReason(reason) => {
                println!("\n\n--- Finish reason: {} ---", reason);
            }
            CompletionStreamEvent::Usage(usage) => {
                // Output token usage statistics at the end
                println!("\n\n--- Token Usage ---");
                println!("Prompt tokens: {}", usage.prompt_tokens);
                println!("Completion tokens: {}", usage.completion_tokens);
                println!("Total tokens: {}", usage.total_tokens);
            }
            CompletionStreamEvent::Done => {
                println!("\n\n=== Generation Complete ===");
                break;
            }
            CompletionStreamEvent::Error(e) => {
                eprintln!("\nError: {}", e);
                return Err(e.into());
            }
        }
    }

    Ok(())
}
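Under the hood, each `CompletionStreamEvent::Text` delta corresponds to one OpenAI-style server-sent-events chunk of the form `data: {"choices":[{"text":"...", ...}]}`, with a final `data: [DONE]` sentinel. As a rough sketch of what the client does for you, here is a minimal, standard-library-only parser for a single raw SSE line (a real client would use a proper JSON parser; this does not handle `\uXXXX` escapes):

```rust
/// Extract the "text" delta from one raw SSE data line of a
/// /v1/completions stream, e.g.
///   data: {"choices":[{"index":0,"text":"Hello","finish_reason":null}]}
/// Returns None for non-data lines and for the final `data: [DONE]` sentinel.
fn extract_text_delta(line: &str) -> Option<String> {
    let payload = line.strip_prefix("data: ")?;
    if payload.trim() == "[DONE]" {
        return None;
    }
    // Locate the value of the "text" field by string scanning.
    let start = payload.find("\"text\":\"")? + "\"text\":\"".len();
    let rest = &payload[start..];
    // Walk to the closing quote, honoring backslash escapes.
    let mut out = String::new();
    let mut chars = rest.chars();
    while let Some(c) = chars.next() {
        match c {
            '"' => return Some(out),
            '\\' => match chars.next()? {
                'n' => out.push('\n'),
                't' => out.push('\t'),
                other => out.push(other),
            },
            _ => out.push(c),
        }
    }
    None
}

fn main() {
    let chunk = r#"data: {"choices":[{"index":0,"text":"Machine learning","finish_reason":null}]}"#;
    println!("{:?}", extract_text_delta(chunk)); // Some("Machine learning")
    println!("{:?}", extract_text_delta("data: [DONE]")); // None
}
```

This is only illustrative; `send_stream()` handles the HTTP connection, chunk reassembly, and JSON decoding for you.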

Key Differences from Chat Streaming

Aspect              Chat Completions         Completions
Event type          StreamEvent              CompletionStreamEvent
Content variant     Content(String)          Text(String)
Additional events   Reasoning, ToolCall      FinishReason, Usage
Use case            Conversation-based       Single prompt
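
The table above can be read as two parallel enums. The definitions below are a hypothetical sketch of their shapes (the actual declarations in vllm_client may differ), showing how rendering logic stays simple once you match on the completion variants:

```rust
// Hypothetical event enum shapes implied by the comparison table;
// the real vllm_client definitions may carry more data per variant.
#[allow(dead_code)]
enum StreamEvent {
    Content(String),
    Reasoning(String),
    ToolCall(String),
    Done,
}

#[allow(dead_code)]
enum CompletionStreamEvent {
    Text(String),
    FinishReason(String),
    Done,
}

// Turn a completion event into printable output; None signals end of stream.
fn render_completion(event: &CompletionStreamEvent) -> Option<String> {
    match event {
        CompletionStreamEvent::Text(t) => Some(t.clone()),
        CompletionStreamEvent::FinishReason(r) => Some(format!("[finish: {}]", r)),
        CompletionStreamEvent::Done => None,
    }
}

fn main() {
    let e = CompletionStreamEvent::Text("hello".to_string());
    println!("{:?}", render_completion(&e)); // Some("hello")
}
```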

When to Use Completions API

  • Simple text generation with a single prompt
  • Legacy compatibility with OpenAI API
  • Situations where the chat message format is not needed

For new projects, we recommend the Chat Completions API (client.chat.completions()), which provides more flexibility and better message formatting.