Pause, Inspect, Edit: Debug Mode for LLM Requests in vLLora | vLLora

LLMs behave like black boxes. You send them a request, hope the prompt is right, hope your agent didn't mutate it, hope the framework packaged it correctly — and then hope the response makes sense. In simple one-shot queries this usually works fine. But when you're building agents, tools, multi-step workflows, or RAG pipelines, it becomes very hard to see what the model is actually receiving. A single unexpected message, parameter, or system prompt change can shift the entire run.

Today we're introducing Debug Mode for LLM requests in vLLora that makes this visible — and editable.

Here’s what debugging looks like in practice:

Debugging LLM Request using Debug Mode

vLLora now supports Debug Mode for LLM requests. When Debug Mode is enabled, every request pauses before it reaches the model. Debug Mode works by inserting breakpoints on every outgoing LLM request, allowing you to inspect, edit, or continue execution.

You can:

Inspect the exact request
Edit anything
Continue execution normally

This brings a familiar software-engineering workflow ("pause -> inspect -> edit -> continue") to LLM development.

Why We Built This

If you've built anything beyond a simple chat interface, you've likely hit one of these:

Silent tool-call failures (wrong name / bad params / malformed JSON)
Overloaded or corrupted context / RAG input leading to hallucination or truncation
Error accumulation and state drift in long or multi-step workflows
Lack of visibility: standard logs rarely show the actual request sent to the model

It is difficult to fix these issues without proper observability. Debug Mode changes that.

What Happens When a Request Pauses

Here's what it looks like when vLLora intercepts a request right before it's sent:

Paused request example

You get a real-time snapshot of:

The selected model
Full message array (system, user, assistant)
Parameters like temperature or max tokens
Any tool definitions
Any extra fields and headers your framework injected

This is the full request payload your application is about to send — not what you assume it's sending.

Edit Anything

Click Edit and the payload becomes modifiable:

Edit Request modal with JSON editor

You can adjust:

Message content
System prompts
Model name
Parameters
Tool definitions
Metadata

Temporary Changes

This affects only the current request. Your application code stays untouched.

It's a fast way to validate fixes, test ideas, and confirm what the agent should have sent.

Continue the Workflow

Continue Workflow

When you click Continue, vLLora:

Sends your edited request to the model
Receives the real response
Passes it back to your application
Resumes the workflow as if nothing unusual happened

After you click Continue, the workflow proceeds using the response from your edited request. The agent treats it the same way it would treat any normal response from the model.

Why This Matters for Agents

Agents are long-running chains of decisions. Each step can depend on the previous one, and each step can affect the next. Once you're 15 steps deep, you might not know whether:

The prompt changed
A system message was overwritten
A parameter was set differently than expected
The context blew up
A tool schema got mutated

With Debug Mode:

You catch drift early
You see exactly what the model receives
You fix issues in seconds
You avoid rerunning long multi-step workflows
You test prompt or parameter changes instantly

For deep agents, debugging becomes 10x easier.

Closing Thoughts

Debugging LLM systems has been mostly tedious. Debug Mode gives you a clear view into what’s happening and a way to correct issues as they occur.

If you need to understand or fix what an agent is sending, this is the most direct way to do it.

Read the docs: Debug Mode

Try it locally: Quickstart

Why We Built This​

What Happens When a Request Pauses​

Edit Anything​

Continue the Workflow​

Why This Matters for Agents​

Closing Thoughts​