Skip to main content

Debugging Kilocode with vLLora

· 2 min read
Mrunmay
AI Engineer

Developers building coding agents need visiblity into how context is flowing through the agent, how much context is used, what tools are being called. vLLora enables you to debug all of this in real time.

Peeking Inside Your Coding Agent

Setup

Run and configure vLLora locally. Follow the Quickstart guide to get started.

brew tap vllora/vllora
brew install vllora
vllora

In KiloCode, during setup select OpenAI Compatible and set the base URL to vLLora's endpoint. For API key, use no_key as vLLora does not validate the API key, since you set API key in the vLLora UI.

Now open your code editor with KiloCode and start prompting your agent.

The Prompt

Add a customer leaderboard or loyalty points tracker component, 
and embed a mini gallery section for user engagement.

When this prompt runs in KiloCode, the agent edits several files, creates new components, updates imports, and adjusts the layout to match the request.

Kilocode Trace

With vLLora running, we could see run involved 10 model calls and a sequence of tool executions including read_file, write_to_file, execute_command, apply_diff, and update_todo_list.

Across the session, we could see the context size steadily grow as it started with about 9,000 input tokens and reached nearly 90,000 tokens by the end as the agent read, wrote, and reloaded files.
This illustrates how coding agents like KiloCode repeatedly expand their working context as the project state evolves.

Beyond the visible tools in this trace, the underlying agent also defines a larger toolset, such as:

  • new_task, list_code_definition_names, and search_files for project understanding
  • insert_content, search_and_replace, and apply_diff for precise code edits
  • browser_action and execute_command for testing and validation
  • update_todo_list and attempt_completion for managing the reasoning cycle

vLLora captures every call in sequence, showing which tools and how they were used, how much context each request consumed, and how the model responded. This experience makes debugging easier by exposing where the agent slows down, repeats steps, or mismanages context. It helps you identify issues faster, optimize performance, and build more reliable coding agents.