Streaming Completions Example

This example demonstrates streaming completions using the legacy /v1/completions API.

Basic Streaming Completions

use vllm_client::{VllmClient, CompletionStreamEvent};
use futures::StreamExt;
use std::io::Write; // needed for stdout().flush()

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = VllmClient::new("http://localhost:8000/v1");

    println!("=== Streaming Completions ===\n");
    println!("Model: Qwen/Qwen2.5-7B-Instruct\n");
    println!("Prompt: What is machine learning?");
    println!("\nGenerated text: ");

    let mut stream = client
        .completions
        .create()
        .model("Qwen/Qwen2.5-7B-Instruct")
        .prompt("What is machine learning?")
        .max_tokens(500)
        .temperature(0.7)
        .stream(true)
        .send_stream()
        .await?;

    // Process streaming events
    while let Some(event) = stream.next().await {
        match event {
            CompletionStreamEvent::Text(delta) => {
                // Print text delta (real-time output)
                print!("{}", delta);
                // Flush buffer for real-time display
                std::io::stdout().flush().ok();
            }
            CompletionStreamEvent::FinishReason(reason) => {
                println!("\n\n--- Finish reason: {} ---", reason);
            }
            CompletionStreamEvent::Usage(usage) => {
                // Output token usage statistics at the end
                println!("\n\n--- Token Usage ---");
                println!("Prompt tokens: {}", usage.prompt_tokens);
                println!("Completion tokens: {}", usage.completion_tokens);
                println!("Total tokens: {}", usage.total_tokens);
            }
            CompletionStreamEvent::Done => {
                println!("\n\n=== Generation Complete ===");
                break;
            }
            CompletionStreamEvent::Error(e) => {
                eprintln!("\nError: {}", e);
                return Err(e.into());
            }
        }
    }

    Ok(())
}
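Under the hood, each `CompletionStreamEvent::Text` delta corresponds to one OpenAI-style server-sent-events chunk of the form `data: {"choices":[{"text":"...", ...}]}`, with a final `data: [DONE]` sentinel. As a rough sketch of what the client does for you, here is a minimal, standard-library-only parser for a single raw SSE line (a real client would use a proper JSON parser; this does not handle `\uXXXX` escapes):

```rust
/// Extract the "text" delta from one raw SSE data line of a
/// /v1/completions stream, e.g.
///   data: {"choices":[{"index":0,"text":"Hello","finish_reason":null}]}
/// Returns None for non-data lines and for the final `data: [DONE]` sentinel.
fn extract_text_delta(line: &str) -> Option<String> {
    let payload = line.strip_prefix("data: ")?;
    if payload.trim() == "[DONE]" {
        return None;
    }
    // Locate the value of the "text" field by string scanning.
    let start = payload.find("\"text\":\"")? + "\"text\":\"".len();
    let rest = &payload[start..];
    // Walk to the closing quote, honoring backslash escapes.
    let mut out = String::new();
    let mut chars = rest.chars();
    while let Some(c) = chars.next() {
        match c {
            '"' => return Some(out),
            '\\' => match chars.next()? {
                'n' => out.push('\n'),
                't' => out.push('\t'),
                other => out.push(other),
            },
            _ => out.push(c),
        }
    }
    None
}

fn main() {
    let chunk = r#"data: {"choices":[{"index":0,"text":"Machine learning","finish_reason":null}]}"#;
    println!("{:?}", extract_text_delta(chunk)); // Some("Machine learning")
    println!("{:?}", extract_text_delta("data: [DONE]")); // None
}
```

This is only illustrative; `send_stream()` handles the HTTP connection, chunk reassembly, and JSON decoding for you.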

Key Differences from Chat Streaming

Aspect              Chat Completions         Completions
Event type          StreamEvent              CompletionStreamEvent
Content variant     Content(String)          Text(String)
Additional events   Reasoning, ToolCall      FinishReason, Usage
Use case            Conversation-based       Single prompt
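
The table above can be read as two parallel enums. The definitions below are a hypothetical sketch of their shapes (the actual declarations in vllm_client may differ), showing how rendering logic stays simple once you match on the completion variants:

```rust
// Hypothetical event enum shapes implied by the comparison table;
// the real vllm_client definitions may carry more data per variant.
#[allow(dead_code)]
enum StreamEvent {
    Content(String),
    Reasoning(String),
    ToolCall(String),
    Done,
}

#[allow(dead_code)]
enum CompletionStreamEvent {
    Text(String),
    FinishReason(String),
    Done,
}

// Turn a completion event into printable output; None signals end of stream.
fn render_completion(event: &CompletionStreamEvent) -> Option<String> {
    match event {
        CompletionStreamEvent::Text(t) => Some(t.clone()),
        CompletionStreamEvent::FinishReason(r) => Some(format!("[finish: {}]", r)),
        CompletionStreamEvent::Done => None,
    }
}

fn main() {
    let e = CompletionStreamEvent::Text("hello".to_string());
    println!("{:?}", render_completion(&e)); // Some("hello")
}
```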

When to Use Completions API

  • Simple text generation with a single prompt
  • Legacy compatibility with OpenAI API
  • Situations where the chat message format is not needed

For new projects, we recommend the Chat Completions API (client.chat.completions()), which provides more flexibility and better message formatting.