# prompt-suggestions
Shows prompt suggestions as dim ghost text directly in the editor input. When the editor is empty, a full suggested prompt appears — hit Enter to accept and submit, or start typing to dismiss it.
## Ghost text interaction

| Key | Behavior |
|---|---|
| Enter (empty input) | Accept ghost text and submit |
| Tab | Accept ghost text into buffer (without submitting) |
| Escape | Dismiss ghost text |
| Any character | Dismiss ghost text, type normally |
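The table above can be sketched as a small state reducer. This is illustrative only — the state shape, key identifiers, and function name are assumptions, not the extension's actual API:

```typescript
// Hypothetical sketch of the ghost-text key handling described above.
type GhostState = { ghost: string | null; buffer: string };
type KeyResult = { buffer: string; ghost: string | null; submit: boolean };

function handleKey(state: GhostState, key: string): KeyResult {
  const { ghost, buffer } = state;
  if (ghost === null) {
    // No suggestion visible: type normally.
    return { buffer: buffer + key, ghost: null, submit: false };
  }
  switch (key) {
    case "enter":
      // Empty input: accept the ghost text and submit it in one step.
      if (buffer === "") return { buffer: ghost, ghost: null, submit: true };
      return { buffer, ghost: null, submit: true };
    case "tab":
      // Accept the suggestion into the buffer without submitting.
      return { buffer: buffer + ghost, ghost: null, submit: false };
    case "escape":
      // Dismiss the suggestion, keep the buffer as-is.
      return { buffer, ghost: null, submit: false };
    default:
      // Any printable character dismisses the ghost and is typed normally.
      return { buffer: buffer + key, ghost: null, submit: false };
  }
}
```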
## Idle suggestions

When the editor is empty and the agent is idle, a random suggestion from a curated template list appears as ghost text. Templates include common tasks like “Explain the project structure”, “Run the tests and fix any failures”, and “Review the recent changes for issues”.
Ghost text is suppressed while the agent is processing a turn and reappears after the turn ends.
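The selection and suppression logic above can be sketched as follows; the template list is quoted from the text, but the helper name and its signature are assumptions:

```typescript
// Illustrative sketch of idle-suggestion selection.
const TEMPLATES = [
  "Explain the project structure",
  "Run the tests and fix any failures",
  "Review the recent changes for issues",
];

function idleGhostText(bufferEmpty: boolean, agentBusy: boolean): string | null {
  // Ghost text only appears when the editor is empty and the agent is idle;
  // it is suppressed for the duration of a turn.
  if (!bufferEmpty || agentBusy) return null;
  return TEMPLATES[Math.floor(Math.random() * TEMPLATES.length)];
}
```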
## Inline autocomplete

When you start typing (4+ characters), the extension calls a fast, cheap model to suggest a completion. The suggestion appears as ghost text after your cursor — hit Tab to accept.
### Model selection

The default model is Groq Llama 3.1 8B ($0.05 input / $0.08 output per million tokens). At ~50 tokens per suggestion, 100 suggestions cost ~$0.004.
If the configured model isn’t available, the extension walks a fallback chain until it finds one with a valid API key:
1. `groq/llama-3.1-8b-instant` (default)
2. `anthropic/claude-haiku-4-5`
3. `anthropic/claude-3-5-haiku-latest`
4. `openai/gpt-4o-mini`
5. Cheapest available model in your registry (sorted by input cost)
The model is resolved once per session on the first autocomplete trigger. If no model in the chain has auth configured, inline autocomplete silently disables itself.
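The resolution walk can be sketched like this; `hasApiKey` and `cheapestAvailable` are hypothetical stand-ins for however the extension actually checks provider auth and queries its registry:

```typescript
// Sketch of the fallback-chain resolution described above.
const FALLBACK_CHAIN = [
  "groq/llama-3.1-8b-instant",
  "anthropic/claude-haiku-4-5",
  "anthropic/claude-3-5-haiku-latest",
  "openai/gpt-4o-mini",
];

function resolveModel(
  configured: string,
  hasApiKey: (model: string) => boolean,
  cheapestAvailable: () => string | null,
): string | null {
  // Try the configured model first, then walk the chain in order.
  for (const model of [configured, ...FALLBACK_CHAIN]) {
    if (hasApiKey(model)) return model;
  }
  // Last resort: cheapest authed model in the registry. A null result means
  // inline autocomplete silently disables itself for the session.
  return cheapestAvailable();
}
```

Resolving once per session (rather than per keystroke) keeps the auth check off the hot path.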
### What the model receives

Each autocomplete call sends:
- System prompt — instructs the model to act as an inline autocomplete engine predicting how a developer will finish their sentence. The model is told to reply with only the completion text, not repeat the input or add formatting.
- Conversation context — the last 6 user/assistant text exchanges from the current session (up to 2,000 characters total). Individual messages are capped at 500 characters. Tool calls and tool results are excluded — only human-readable text is included. This gives the model awareness of what you’ve been working on so suggestions stay relevant.
- Partial input — what you’ve typed so far, sent as the user message.
- Parameters — `maxTokens: 60`, `temperature: 0.3`. The response is trimmed to its first line only.
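The context-assembly step can be sketched as below. The limits (last 6 exchanges, 2,000 characters total, 500 per message) come from the text; the message type and function name are illustrative:

```typescript
// Sketch of conversation-context assembly for an autocomplete call.
// Tool calls/results are assumed to be filtered out before this point.
type Msg = { role: "user" | "assistant"; text: string };

function buildContext(history: Msg[]): string {
  const recent = history.slice(-6); // last 6 user/assistant exchanges
  const parts: string[] = [];
  let total = 0;
  for (const m of recent) {
    const text = m.text.slice(0, 500); // per-message cap
    if (total + text.length > 2000) break; // overall cap
    parts.push(`${m.role}: ${text}`);
    total += text.length;
  }
  return parts.join("\n");
}
```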
### Guardrails

- Debounces at 600ms after you stop typing
- Cancels in-flight requests on new keystrokes (AbortController)
- Skips slash commands (handled by structural autocomplete)
- Capped at 200 calls per session as a cost guardrail
- Suppressed while the agent is processing a turn
## Settings

| Setting | Default | Description |
|---|---|---|
| `prompt-suggestions.enabled` | `true` | Enable/disable the extension entirely |
| `prompt-suggestions.autocomplete` | `true` | Enable/disable LLM autocomplete (idle suggestions still work) |
| `prompt-suggestions.model` | `groq/llama-3.1-8b-instant` | Model for autocomplete (`provider/model-id`) |
| `prompt-suggestions.debounceMs` | `600` | Debounce delay before calling the model |
Add settings to `~/.tallow/settings.json`:

```json
{
  "prompt-suggestions.enabled": true,
  "prompt-suggestions.autocomplete": true,
  "prompt-suggestions.model": "groq/llama-3.1-8b-instant",
  "prompt-suggestions.debounceMs": 600
}
```

## TUI integration

The extension adds ghost text rendering to the Editor component (`packages/tallow-tui`). Ghost text is rendered as dim ANSI text (`\x1b[2m`) after the cursor position, truncated to fit the available content width.
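That rendering step amounts to truncating and wrapping the suggestion in SGR codes. A minimal sketch — the `\x1b[2m` dim code comes from the text, while `\x1b[22m` (reset intensity) and the helper name are assumptions:

```typescript
// Sketch of dim ghost-text rendering with width truncation.
function renderGhost(ghost: string, availableWidth: number): string {
  const DIM = "\x1b[2m"; // SGR: faint/dim intensity
  const RESET = "\x1b[22m"; // SGR: normal intensity
  return DIM + ghost.slice(0, Math.max(0, availableWidth)) + RESET;
}
```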
The `EditorComponent` interface exposes `setGhostText()` and `getGhostText()` for custom editor implementations.