ekaesmem 5 days ago

The concept is promising, but I tested it and found the performance quite poor. I used one of my Python projects for the test, which consists of about 10k lines of Python code. The model I utilized was Claude 3.7 Sonnet with thinking.

in the first conversation round, I asked Claude to grasp the overall project and initialize its memory. Unfortunately, Claude experienced a hallucination and generated an episode that included a full name entirely unrelated to my project's actual full name, as my project name is an abbreviation.

In the second conversation round, I provided Claude with the full name of my project and requested it to correct its memory. In response, Claude apologized and claimed that it now understood the full name of my project, but it did not utilize any MCP command.

In the third conversation round, I specifically asked it to use the MCP command to update its memory. Claude successfully added a new episode but failed to remove the incorrect old episode.

It wasn't until the fourth conversation round that I directly pointed out that it should eliminate the incorrect old episode, and Claude finally completed the memory initialization that should have been accomplished at the end of the first round.

I have set up the correct Cursor Rules according to the README.

At this point, it appears this project is challenging to use with natural language. I need to explicitly instruct Claude on which specific tools to call for various operations to achieve the intended outcome.

Am I doing something wrong?

1
roseway4 4 days ago

I used Claude 3.7 Sonnet for the Cursor Agent when building the demo. Happy to hop on a call to walk through your experience, as I'm surprised the agent performed so poorly. daniel AT getzep.com