TheDocumentation Index
Fetch the complete documentation index at: https://mintlify.com/vllm-project/vllm/llms.txt
Use this file to discover all available pages before exploring further.
vllm complete command provides an interactive text completion interface that connects to a running vLLM API server.
Basic usage
Prerequisites
You need a running vLLM server:Examples
Basic interactive completion
Quick single completion
Connect to custom server
Specify model name
Control output length
With API key
Options
URL of the running OpenAI-compatible API server.
The model name to use. If not specified, uses the first available model from the server.
Maximum number of tokens to generate per completion.
Send a single prompt and exit. Alias:
-q.API key for authentication. Can also use
OPENAI_API_KEY environment variable.Interactive controls
During an interactive session:- Enter: Submit prompt for completion
- Ctrl+C or Ctrl+Z: Exit
- Ctrl+D (EOF): Exit
Use cases
Code completion
Story generation
Text continuation
Creative writing prompts
Advanced usage
Batch completions
Use a script to process multiple prompts:With custom parameters via API
For more control, use the REST API directly:Comparison with vllm chat
Use vllm complete for:
- Raw text completion
- Base (non-chat) models
- Single-turn generation
- Code completion
- Creative writing
vllm chat for:
- Conversational interactions
- Chat-tuned models
- Multi-turn dialogues
- Question answering
Example: Documentation generation
Environment variables
API key for authentication. Used if
--api-key is not provided.Related
- vllm chat - Interactive chat interface
- vllm serve - Start the API server
- Completions API - REST API equivalent