Hey all — I’m running into some persistent issues with Clawdbot that I’m hoping someone can help me understand. I’ve been using it pretty heavily for real task automation (file operations, Excel updates, lead generation, etc.), and while it works sometimes, it degrades in ways that make it unreliable without a lot of manual intervention.

Main Problem: Tool Use Degrades Over Time

When I start a fresh session, the bot will often use tools correctly for a while — executing commands, writing files, updating spreadsheets, etc. But after some back-and-forth conversation, it gradually starts:

• Claiming tools are unavailable when they definitely exist
• Falling back to “I can’t access that” or narrative responses instead of running tools
• Requiring very explicit, step-by-step prompting to do things it was doing autonomously earlier

It feels like tool awareness or tool schemas are getting lost mid-session, even though nothing changed on the backend.

Tool Call Failures / JSON Errors

I frequently get errors like:

“Expected ',' or '}' after property value in JSON…”

This seems to happen when the model tries to call a tool but the framework rejects the payload. My guess is:

• The model is outputting malformed JSON for tool calls
• The gateway/parser is strict and fails hard instead of retrying
• Sometimes normal text and tool JSON get mixed together

When this happens, the tool never runs, but the model doesn’t always clearly surface that — it sometimes continues as if the action happened, which makes debugging difficult.
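
To make this concrete, here is a minimal Python sketch of the behavior I would hope for. It is not Clawdbot's actual parser, which I have not read; `ToolCallResult` and `parse_tool_call` are hypothetical names. It tries a strict parse first, then falls back to the first JSON object embedded in mixed text, and otherwise returns an explicit error instead of failing silently.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolCallResult:
    """Hypothetical result type: either a parsed payload or an explicit error."""
    ok: bool
    payload: Optional[dict] = None
    error: Optional[str] = None


def parse_tool_call(raw: str) -> ToolCallResult:
    """Parse a tool-call payload, tolerating prose around a single JSON object."""
    # 1) Strict parse: the whole string must be valid JSON.
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        strict_error = str(exc)
    else:
        if isinstance(obj, dict):
            return ToolCallResult(ok=True, payload=obj)
        return ToolCallResult(ok=False, error="payload is not a JSON object")

    # 2) Tolerant parse: decode one object starting at the first '{',
    #    ignoring any text before or after it.
    start = raw.find("{")
    if start == -1:
        return ToolCallResult(ok=False, error=f"no JSON object found ({strict_error})")
    try:
        obj, _end = json.JSONDecoder().raw_decode(raw, start)
    except json.JSONDecodeError as exc:
        # 3) Surface the failure so the caller cannot assume the tool ran.
        return ToolCallResult(ok=False, error=f"malformed tool call: {exc}")
    return ToolCallResult(ok=True, payload=obj)
```

With something like this, the agent loop could feed `result.error` back to the model as a failed tool result, so the model cannot carry on as if the action had succeeded.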

Excessive Prompting Required

Another issue is that I often have to over-explain tasks repeatedly. For example:

• I have a budget spreadsheet workflow with strict column rules
• Even after the bot successfully does it once, later in the session it forgets the procedure
• I end up having to paste the full instructions again

It seems like long-term operational rules stored in memory or system files aren’t consistently applied during external chat sessions (Discord).

Session Bloat / Performance Drop

I’ve also learned about Issue #2254 regarding large session files. From what I understand:

• Gateway tool responses can inject large config/schema blobs into the session
• Session files grow to multiple MB
• Eventually they exceed token limits
• Auto-compaction fails because the summarization prompt is also too large

This might explain why things work initially, then deteriorate badly later in the same session. I have found large .jsonl files in ~/.clawdbot/agents/main/sessions/.
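
For anyone who wants to check their own install, this is roughly how I looked for oversized session files. It is a small Python sketch; `large_sessions` is my own helper name, and the 1 MB threshold is an arbitrary assumption, not a documented limit.

```python
from pathlib import Path


def large_sessions(session_dir: Path, threshold_mb: float = 1.0) -> list[tuple[str, float]]:
    """Return (file name, size in MB) for .jsonl files at or above the threshold, largest first."""
    if not session_dir.is_dir():
        return []
    results = []
    for path in session_dir.glob("*.jsonl"):
        size_mb = path.stat().st_size / (1024 * 1024)
        if size_mb >= threshold_mb:
            results.append((path.name, round(size_mb, 2)))
    return sorted(results, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Path taken from where I found the files on my machine.
    sessions = Path.home() / ".clawdbot" / "agents" / "main" / "sessions"
    for name, size in large_sessions(sessions):
        print(f"{size:8.2f} MB  {name}")
```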

Environment / Exec Issues

I’ve seen cases where:

• The bot says a Python library like openpyxl isn’t available
• But it is clearly installed on the system
• Logs show PEP 668 “externally managed environment” errors when it tries to install packages

So I suspect the exec tool runs in a different Python environment than the user shell, which causes inconsistent behavior.
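
One way to test that suspicion is to run the same small script from my own shell and through the bot's exec tool, then compare the output. The sketch below uses only the Python standard library; `describe_environment` is a hypothetical helper name.

```python
import importlib.util
import sys


def describe_environment(module_name: str = "openpyxl") -> dict:
    """Report which interpreter is running and whether a top-level module is importable from it."""
    spec = importlib.util.find_spec(module_name)
    return {
        "executable": sys.executable,              # path of the running interpreter
        "prefix": sys.prefix,                      # environment root
        "in_venv": sys.prefix != sys.base_prefix,  # True inside a virtual environment
        "module": module_name,
        "module_found": spec is not None,
        "module_origin": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `executable` differs between the two runs, the exec tool is using another interpreter. In that case, pointing the tool at a virtual environment that has openpyxl installed should also sidestep the PEP 668 error, since that restriction applies to installing into the system-managed Python.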

Browser Tool + CAPTCHA

When trying to use the browser tool for lead generation (searching for local businesses), the agent often hits CAPTCHAs (DuckDuckGo, Google, etc.). The browser reports that the page loaded successfully, but the content is actually a CAPTCHA challenge. The model doesn’t always recognize this as a block and continues as if scraping were proceeding.
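
A cheap guard I have been considering is a text check on the loaded page before the agent treats it as search results. This is only a heuristic sketch in Python. The marker phrases are illustrative guesses rather than an exhaustive or verified list, and `looks_like_captcha` is a hypothetical helper, not part of the browser tool.

```python
# Illustrative marker phrases (assumptions); real challenge pages vary by site.
CAPTCHA_MARKERS = (
    "captcha",
    "unusual traffic",
    "verify you are human",
    "are you a robot",
)


def looks_like_captcha(page_text: str) -> bool:
    """Return True if the page text contains any known challenge phrase."""
    lowered = page_text.lower()
    return any(marker in lowered for marker in CAPTCHA_MARKERS)
```

If this returns True, the agent could report the block explicitly instead of continuing as though it had results.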

Overall Pattern

The big pattern I’m seeing is:

• Things work correctly in short, fresh sessions
• Tool use becomes unreliable as sessions grow
• Tool calls sometimes fail silently due to JSON/schema issues
• The model drifts into “talk about doing” instead of “actually do”
• I have to constantly re-anchor instructions and restate procedures

I’ve been trying different models (currently 4o-mini), but the behavior seems to come from the agent/tooling layer rather than the model itself.

Questions

Is session bloat still actively causing tool/schema drift even after the related issue was marked closed?

Is there a best practice for preventing tool schema loss mid-session?

Should long-term task procedures live in TOOLS.md instead of MEMORY.md for Discord usage?

Is there a way to make tool call parsing more tolerant or surface failures more explicitly?

I’m happy to provide logs, version info, or session file sizes if helpful.

Thanks — I really like the concept and want to get this stable for real-world task automation.