
prompt-suggestions

Shows prompt suggestions as dim ghost text directly in the editor input. When the editor is empty, a full suggested prompt appears — hit Enter to accept and submit, or start typing to dismiss it.

| Key | Behavior |
| --- | --- |
| Enter (empty input) | Accept ghost text and submit |
| Tab | Accept ghost text into buffer (without submitting) |
| Escape | Dismiss ghost text |
| Any character | Dismiss ghost text, type normally |
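The key behavior above can be sketched as a small dispatch function (the names and action type here are hypothetical, not the extension's actual API):

```typescript
// Sketch of the ghost-text key dispatch (names are illustrative assumptions).
type GhostAction = "submit" | "accept" | "dismiss" | "passthrough";

function handleKey(key: string, bufferEmpty: boolean, ghostVisible: boolean): GhostAction {
  if (!ghostVisible) return "passthrough";
  if (key === "enter" && bufferEmpty) return "submit"; // accept ghost text and submit
  if (key === "tab") return "accept";                  // accept into buffer, don't submit
  if (key === "escape") return "dismiss";
  return "dismiss";                                    // any other key dismisses, then types normally
}
```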

When the editor is empty and the agent is idle, a random suggestion from a curated template list appears as ghost text. Templates include common tasks like “Explain the project structure”, “Run the tests and fix any failures”, and “Review the recent changes for issues”.

Ghost text is suppressed while the agent is processing a turn and reappears after the turn ends.
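The idle-suggestion rules above amount to a small picker: show nothing unless the buffer is empty and the agent is idle, otherwise pick a random template. A minimal sketch (function name and injectable randomness are assumptions for testability):

```typescript
// Sketch of the idle-suggestion picker (templates taken from the examples above).
const TEMPLATES = [
  "Explain the project structure",
  "Run the tests and fix any failures",
  "Review the recent changes for issues",
];

function pickIdleSuggestion(
  bufferEmpty: boolean,
  agentBusy: boolean,
  rand: () => number = Math.random,
): string | null {
  // Suppressed while the agent is processing a turn, or once the user has typed
  if (!bufferEmpty || agentBusy) return null;
  return TEMPLATES[Math.floor(rand() * TEMPLATES.length)];
}
```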

When you start typing (4+ characters), the extension calls a fast, cheap model to suggest a completion. The suggestion appears as ghost text after your cursor — hit Tab to accept.

The default model is Groq Llama 3.1 8B ($0.05/$0.08 per million tokens). At ~50 tokens per suggestion, 100 suggestions cost ~$0.004.
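The quoted cost can be sanity-checked with quick arithmetic; the per-call input size below is an assumption (system prompt plus trimmed context plus partial input), not a measured figure:

```typescript
// Back-of-envelope check of the ~$0.004 / 100 suggestions figure.
const inputPricePerTok = 0.05 / 1e6;  // $0.05 per million input tokens
const outputPricePerTok = 0.08 / 1e6; // $0.08 per million output tokens
const calls = 100;
const inputToksPerCall = 600;  // assumed: system prompt + context + partial input
const outputToksPerCall = 50;  // ~50 tokens per suggestion
const cost =
  calls * (inputToksPerCall * inputPricePerTok + outputToksPerCall * outputPricePerTok);
// cost ≈ $0.0034, the same order as the ~$0.004 quoted above
```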

If the configured model isn’t available, the extension walks a fallback chain until it finds one with a valid API key:

  1. groq/llama-3.1-8b-instant (default)
  2. anthropic/claude-haiku-4-5
  3. anthropic/claude-3-5-haiku-latest
  4. openai/gpt-4o-mini
  5. Cheapest available model in your registry (sorted by input cost)

The model is resolved once per session on the first autocomplete trigger. If no model in the chain has auth configured, inline autocomplete silently disables itself.
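The fallback walk can be sketched as follows; `hasApiKey` and the registry shape are assumptions for illustration, not the extension's real interfaces:

```typescript
// Sketch of fallback-chain resolution (helper names are assumptions).
const FALLBACK_CHAIN = [
  "groq/llama-3.1-8b-instant",
  "anthropic/claude-haiku-4-5",
  "anthropic/claude-3-5-haiku-latest",
  "openai/gpt-4o-mini",
];

interface RegistryModel {
  id: string;
  inputCostPerMTok: number;
}

function resolveModel(
  configured: string,
  hasApiKey: (modelId: string) => boolean,
  registry: RegistryModel[],
): string | null {
  // Try the configured model first, then the static chain
  for (const id of [configured, ...FALLBACK_CHAIN]) {
    if (hasApiKey(id)) return id;
  }
  // Last resort: cheapest registry model with auth, sorted by input cost
  const usable = registry
    .filter((m) => hasApiKey(m.id))
    .sort((a, b) => a.inputCostPerMTok - b.inputCostPerMTok);
  return usable[0]?.id ?? null; // null → autocomplete silently disables itself
}
```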

Each autocomplete call sends:

  • System prompt — instructs the model to act as an inline autocomplete engine predicting how a developer will finish their sentence. The model is told to reply with only the completion text, not repeat the input or add formatting.
  • Conversation context — the last 6 user/assistant text exchanges from the current session (up to 2,000 characters total). Individual messages are capped at 500 characters. Tool calls and tool results are excluded — only human-readable text is included. This gives the model awareness of what you’ve been working on so suggestions stay relevant.
  • Partial input — what you’ve typed so far, sent as the user message.
  • Parameters — `maxTokens: 60`, `temperature: 0.3`. The response is cleaned to the first line only.
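The context-trimming rules above can be sketched as follows; the message shape is an assumption (tool calls and tool results are assumed to have been filtered out already):

```typescript
// Sketch of conversation-context trimming (message shape is an assumption).
interface Msg {
  role: "user" | "assistant";
  text: string;
}

function buildContext(history: Msg[], maxMsgs = 6, perMsgCap = 500, totalCap = 2000): Msg[] {
  // Take the most recent messages and cap each one individually
  const recent = history.slice(-maxMsgs).map((m) => ({
    role: m.role,
    text: m.text.slice(0, perMsgCap),
  }));
  // Drop the oldest until the combined length fits the total budget
  let total = recent.reduce((n, m) => n + m.text.length, 0);
  while (recent.length > 0 && total > totalCap) {
    total -= recent[0].text.length;
    recent.shift();
  }
  return recent;
}
```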
The extension also:

  • Debounces 600ms after you stop typing
  • Cancels in-flight requests on new keystrokes (AbortController)
  • Skips slash commands (handled by structural autocomplete)
  • Caps usage at 200 calls per session as a cost guardrail
  • Suppresses suggestions while the agent is processing a turn
| Setting | Default | Description |
| --- | --- | --- |
| `prompt-suggestions.enabled` | `true` | Enable/disable the extension entirely |
| `prompt-suggestions.autocomplete` | `true` | Enable/disable LLM autocomplete (idle suggestions still work) |
| `prompt-suggestions.model` | `groq/llama-3.1-8b-instant` | Model for autocomplete (`provider/model-id`) |
| `prompt-suggestions.debounceMs` | `600` | Debounce delay in milliseconds before calling the model |

Add settings to ~/.tallow/settings.json:

{
  "prompt-suggestions.enabled": true,
  "prompt-suggestions.autocomplete": true,
  "prompt-suggestions.model": "groq/llama-3.1-8b-instant",
  "prompt-suggestions.debounceMs": 600
}
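A settings reader matching the defaults above might look like this; the function name and raw-object input are assumptions, only the keys and defaults come from the table:

```typescript
// Sketch of reading settings with defaults (keys and defaults from the table above).
interface PromptSuggestionSettings {
  enabled: boolean;
  autocomplete: boolean;
  model: string;
  debounceMs: number;
}

function loadSettings(raw: Record<string, unknown>): PromptSuggestionSettings {
  return {
    enabled: (raw["prompt-suggestions.enabled"] as boolean) ?? true,
    autocomplete: (raw["prompt-suggestions.autocomplete"] as boolean) ?? true,
    model: (raw["prompt-suggestions.model"] as string) ?? "groq/llama-3.1-8b-instant",
    debounceMs: (raw["prompt-suggestions.debounceMs"] as number) ?? 600,
  };
}
```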

The extension adds ghost text rendering to the Editor component (packages/tallow-tui). Ghost text is rendered as dim ANSI text (\x1b[2m) after the cursor position, truncated to fit the available content width.

The EditorComponent interface exposes setGhostText() and getGhostText() for custom editor implementations.
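The dim rendering and width truncation described above can be sketched as a pure function; `setGhostText()`/`getGhostText()` come from the text, but this rendering helper and its truncation logic are illustrative assumptions:

```typescript
// Sketch of dim ghost-text rendering (truncation logic is an assumption).
const DIM = "\x1b[2m";    // SGR 2: dim/faint intensity
const RESET = "\x1b[22m"; // SGR 22: normal intensity

function renderGhostText(bufferText: string, ghost: string, contentWidth: number): string {
  // Ghost text only gets the space left after the user's own input
  const available = Math.max(0, contentWidth - bufferText.length);
  const visible = ghost.slice(0, available); // truncate to fit the content width
  return visible.length > 0 ? bufferText + DIM + visible + RESET : bufferText;
}
```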