vllm complete

The vllm complete command provides an interactive text completion interface that connects to a running vLLM API server.

Basic usage

vllm complete [OPTIONS]

Prerequisites

You need a running vLLM server:

# In terminal 1
vllm serve facebook/opt-125m

# In terminal 2
vllm complete

Examples

Basic interactive completion

vllm complete

Starts an interactive session:

Using model: facebook/opt-125m
Please enter prompt to complete:
> Once upon a time
 there was a young girl who lived in a small village...
> The weather today is
 sunny and warm with a light breeze...

Quick single completion

vllm complete --quick "The capital of France is"

Generates a single completion and exits:

Using model: facebook/opt-125m
 Paris, which is located in the northern part of the country.

Connect to custom server

vllm complete --url http://192.168.1.100:8080/v1

Specify model name

vllm complete --model-name gpt-3.5-turbo

Control output length

vllm complete --max-tokens 200

With API key

vllm complete --api-key your-secret-key

Options

--url

string

default:"http://localhost:8000/v1"

URL of the running OpenAI-compatible API server.

--model-name

string

The model name to use. If not specified, uses the first available model from the server.

--max-tokens

integer

Maximum number of tokens to generate per completion.

--quick

string

Send a single prompt and exit. Alias: -q.

--api-key

string

API key for authentication. Can also use OPENAI_API_KEY environment variable.

Interactive controls

During an interactive session:

Enter: Submit prompt for completion
Ctrl+C or Ctrl+Z: Exit
Ctrl+D (EOF): Exit

Use cases

Code completion

vllm serve codellama/CodeLlama-7b-hf

Then:

vllm complete

> def fibonacci(n):
    """Calculate the nth Fibonacci number."""
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)

Story generation

vllm complete --max-tokens 500

> In a world where magic was real,
 the young wizard apprentice discovered an ancient spellbook hidden in the 
 library's forbidden section. As he opened the dusty tome, glowing runes 
 appeared on the pages, revealing secrets that had been lost for centuries...

Text continuation

vllm complete -q "The three laws of robotics are:" --max-tokens 150

Creative writing prompts

vllm complete

> Write a haiku about programming:
Code flows like water,
Bugs emerge from the shadows,
Debugger saves all.

Advanced usage

Batch completions

Use a script to process multiple prompts:

#!/bin/bash
while IFS= read -r prompt; do
  vllm complete -q "$prompt" --max-tokens 100
done < prompts.txt

With custom parameters via API

For more control, use the REST API directly:

curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "facebook/opt-125m",
    "prompt": "Once upon a time",
    "max_tokens": 100,
    "temperature": 0.8,
    "top_p": 0.95
  }'

Comparison with `vllm chat`

Use vllm complete for:

Raw text completion
Base (non-chat) models
Single-turn generation
Code completion
Creative writing

Use vllm chat for:

Conversational interactions
Chat-tuned models
Multi-turn dialogues
Question answering

Example: Documentation generation

vllm serve codellama/CodeLlama-13b-hf

Then:

vllm complete

> def process_data(df, columns):
    """Process DataFrame columns.
    
    Args:
        df: Input DataFrame
        columns: List of column names to process
    
    Returns:
        Processed DataFrame with transformed columns
    """

Environment variables

OPENAI_API_KEY

string

API key for authentication. Used if --api-key is not provided.

vllm chat - Interactive chat interface
vllm serve - Start the API server
Completions API - REST API equivalent

Documentation Index

​Basic usage

​Prerequisites

​Examples

​Basic interactive completion

​Quick single completion

​Connect to custom server

​Specify model name

​Control output length

​With API key

​Options

​Interactive controls

​Use cases

​Code completion

​Story generation

​Text continuation

​Creative writing prompts

​Advanced usage

​Batch completions

​With custom parameters via API

​Comparison with vllm chat

​Example: Documentation generation

​Environment variables

​Related

Basic usage

Prerequisites

Examples

Basic interactive completion

Quick single completion

Connect to custom server

Specify model name

Control output length

With API key

Options

Interactive controls

Use cases

Code completion

Story generation

Text continuation

Creative writing prompts

Advanced usage

Batch completions

With custom parameters via API

Comparison with `vllm chat`

Example: Documentation generation

Environment variables

Related