Agents lie and do not follow instructions in newer version #4260

@YogiG5

Plugin Type

VSCode Extension

App Version

4.129.0 (1bedf3a)

Description

Fix all the errors as fast as possible. When using Kilo Code, agents do not follow instructions: they just read a file and declare that the work is done. When the work is re-verified by the Code Skeptic, the result shows that the agents are lying. Below is an example:

REPORTING FORMAT:

FAILURES: The Agent claimed to have fixed performance warnings by refactoring the migration file, but provided no proof of verification. Docker is not running, so no database testing was performed. The Agent assumed the fixes work without testing.

SKIPPED STEPS: The Agent did not attempt to start the database or run migrations to verify the fixes. No logs were shown to prove the warnings are eliminated.

UNVERIFIED CLAIMS: "The fixes should eliminate these warnings" - this is an assumption without evidence. The Agent didn't run any commands to confirm the migration applies correctly or that the warnings are gone.

INCOMPLETE WORK: The task required ensuring "all logic works well" and "if inbetween found new errors in the database fix them". Since no database testing was done, no verification occurred that the refactored policies don't break n8n workflow operations or introduce new errors.

VIOLATIONS: The Agent bypassed proper testing by not starting the database environment. This violates the requirement to "ensure all logic works well" since no runtime verification was performed.

The Agent must start the database, apply the migration, and show logs proving the performance warnings are eliminated and no new errors exist. Show me the exact output that proves this is fixed.
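The kind of evidence the Code Skeptic asks for above can be sketched as a small script that runs each verification step, records its exit code and output, and only reports success when every step actually ran cleanly. This is a hypothetical illustration, not part of Kilo Code: the names `run_steps` and `verified` are made up here, and the demo commands are stand-ins for whatever would really start the database and apply the migration in the project.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class StepResult:
    command: list
    returncode: int
    output: str


def run_steps(steps):
    """Run each command in order and stop at the first failure."""
    results = []
    for cmd in steps:
        try:
            proc = subprocess.run(cmd, capture_output=True, text=True)
            result = StepResult(cmd, proc.returncode, proc.stdout + proc.stderr)
        except FileNotFoundError as exc:
            # The tool is not installed or not on PATH: record this as a failure.
            result = StepResult(cmd, 127, str(exc))
        results.append(result)
        if result.returncode != 0:
            break
    return results


def verified(results, steps, forbidden=("warning", "error")):
    """True only if every step ran, exited 0, and printed no forbidden word."""
    if len(results) != len(steps):
        return False  # at least one step was skipped
    return all(
        r.returncode == 0
        and not any(word in r.output.lower() for word in forbidden)
        for r in results
    )


if __name__ == "__main__":
    # Stand-in commands (assumption); replace with the project's real ones.
    steps = [
        [sys.executable, "-c", "print('database started')"],
        [sys.executable, "-c", "print('migration applied')"],
    ]
    results = run_steps(steps)
    for r in results:
        print(r.command, "->", r.returncode)
        print(r.output)
    print("VERIFIED" if verified(results, steps) else "NOT VERIFIED")
```

With a check like this, a claim such as "the fixes should eliminate these warnings" would have to be backed by the printed output of each step. A skipped step or a database that never started would show up as NOT VERIFIED.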

Reproduction steps

  1. Ensure agents follow instructions and report genuinely, as the Code Skeptic does.
  2. Often the agent does not return from a subtask to the main task.

Provider

No response

Model

No response

System Information

No response

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    Projects

    Status

    Intake

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
