I've been experimenting with Claude pretty heavily for the last week or so, and the most frustrating thing is that no matter how many times you tell it to produce 'a full and complete' <whatever you're working on>, it just can't resist chopping out sections of code and replacing them with a placeholder comment.
It also hits its 'maximum limit allowed at this time' quite often due to its inability to understand its own environment and just modify an existing software artifact rather than regenerating the entire thing. I know it can do it, and it knows it can do it, because it will tell you why it failed, but it just won't do it for some reason. Of course, this comes after you manage to convince it to quit removing sections of code, so its context window is already polluted by past failures.
I spent a solid three days trying to get it to do a somewhat simple thing (something that would've taken me an afternoon if I'd dedicated myself to the task), just to see if it could manage it, and it failed horribly, multiple times, even after it was able to say why it was failing, no matter how many tries I gave it. And it's not like it's some impossible task either: it was able to accomplish the exact same task successfully (might I even say impressively) with no problems at a smaller scale.
It's also pretty horrible at debugging, even if you tell it exactly what the problem is, but that's just me testing the limits of what it can do.
So, like TFA states, it is pretty amazing at smaller tasks but utterly and completely fails at anything larger than a couple hundred lines of code.
VertanaNinjai 33 days ago [-]
Claude’s web UI has some strange limitations. You may be better off using another provider that just uses the Claude 3.5 Sonnet API. One example of many is Kagi’s interface. You can set a system prompt and Claude will actually follow it instead of trying to be too clever. If your prompt says “give full and complete code files at all times unless specified otherwise by the user”, it tends to follow that far more accurately than Claude.ai does. It may be worth checking out some other providers/interfaces.
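Roughly what that looks like if you go straight to the API, as a minimal sketch with the Anthropic Python SDK (the model name, token limit, and user message here are just placeholders; the system prompt is the one quoted above):

    import anthropic

    # Reads ANTHROPIC_API_KEY from the environment.
    client = anthropic.Anthropic()

    message = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=4096,
        # A system prompt set through the API tends to be followed more
        # consistently than instructions buried in a Claude.ai chat.
        system="Give full and complete code files at all times unless specified otherwise by the user.",
        messages=[
            {"role": "user", "content": "Your coding request here."},
        ],
    )

    print(message.content[0].text)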
UncleEntity 33 days ago [-]
Perhaps...
Seems more like throwing good money after bad, as they appear to make it so you have to get an API key to do anything useful, and then there's the possibility of spending significantly more than the $20/month 'pro' plan. Not something I'm really all that interested in, considering this has nothing more than entertainment value and, from a video on the youtubes that broke the costs down, they have the highest per-token rate.
I have enough yaks to shave to justify $20/month but anything over that...
techbrigades 33 days ago [-]
I’ve had really good luck building micro-SaaS up to a certain point of complexity, beyond what a coding assistant alone will do.
A lot of the problems you describe are due to the need to provide adequate context, project details, and rules in your prompt.
Claude will generate files it doesn’t think already exist, and in a SaaS project, you have many files.
Claude will struggle with syntax and proper usage of library/module/package versions beyond its training data. You have to provide your project with knowledge to work around this.
Lastly, you will hit usage limits while working on projects because it’s a fixed-cost offering. You can track and generate a project status to pass on to the next agent. This works “ok”, and when I hit my Sonnet limit I use Haiku for bug and type fixes.
Bottom line, an out-of-the-box chatbot is a great playground to flesh out your techniques, but most software projects have complexities which must be managed in a separate system designed to track all the details and break the project down into hundreds or thousands of individual tasks.
I want this magic wand too, but you have to build that yourself (or buy it when it becomes available). It’s been a fascinating learning process.
UncleEntity 32 days ago [-]
The problem is I'm not doing anything as complicated as what you're describing.
The task was/is to take a grammar for APL from some long-forgotten paper and turn it into a Lemon parser. Easy peasy, well within its wheelhouse, and it had spectacular initial results with the help of DeepSeek-R1 analyzing its work.
"Oh, good job, robot," me types, "let's work on a lexer. Hmm... you seem to have clipped out some important rules at some point, we need to add those back." Then, boom, Claude is completely worthless.
I want Claude to succeed. It was doing so well, then it hit a self-reinforcing wall of failure that it just can't get over, even though it can analyze its behavior and say exactly why it keeps failing.
I mean, exactly zero people think the world needs an APL interpreter written by the robots but the point of the project is to see how far they can get without having a human write a single line of code. I know they have limitations and have no problem helping them work around them.
But, alas, this project is shelved until the next big hype cycle.
lemming 33 days ago [-]
This seems to be a problem with the most recent model (claude-3-5-sonnet-20241022). There was a lot of complaining about it on their discord when it was released and I had a lot of problems with it too. If you use the previous Claude model, it still works very well and doesn't have these problems.
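If you're going through the API rather than the web UI, you can pin the earlier snapshot explicitly. A minimal sketch, assuming the June 2024 model ID is still the previous 3.5 Sonnet release (check Anthropic's model list if it has since been retired):

    import anthropic

    client = anthropic.Anthropic()

    # Pin the earlier 3.5 Sonnet snapshot instead of the October 2024 one.
    message = client.messages.create(
        model="claude-3-5-sonnet-20240620",  # assumed previous-release ID
        max_tokens=2048,
        messages=[{"role": "user", "content": "Your prompt here."}],
    )
    print(message.content[0].text)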
sunaookami 33 days ago [-]
Their "Claude 3.5 Sonnet (New)" model from the end of October 2024 is very disappointing and a major step backwards. It has a lot of bugs and this pseudo chain-of-thought that completely ruins any output. The guardrails have gotten so ridiculous that it refuses to translate text due to "copyright" or when the original text contains a "bad" word. Claude 3.5 Sonnet was one of the best models, but now it gets beaten by DeepSeek-R1.
ilrwbwrkhv 33 days ago [-]
I think Claude and all of these models can translate code really well from one language to another.
The thing is it starts to fail when you are just prompting it without any example code.
And prompting it thoroughly enough to produce exactly what you wanted to write is basically writing the code yourself.
So for novel code I have had far less success.
But for any kind of translation or matching an API shape, it works pretty much effortlessly.
Also, DeepSeek-R1 can do this much, much cheaper and at virtually the same level of quality.
svaha1728 33 days ago [-]
I'm loving Claude. Is there a good VS Code integration tool that isn't GitHub Copilot?
ottah 33 days ago [-]
My only experience is with IntelliJ IDEs, but this plugin works in both: https://www.continue.dev/. We use it internally at my company, and I think it's one of the better open source ones.
At home I use it with OpenRouter and an ollama/vllm/llama.cpp server.
ics 32 days ago [-]
I have been using Cody with VSCodium, and it uses Claude by default. I appreciate that it’s not in my face too much, and it’s easy to use either the chat interface or the in-editor commands/completion. I paid the $15 for one month when I was trying it out more heavily and switched back to free for just the occasional chat.
tluyben2 32 days ago [-]
Cline works well for me.
contractorwolf 33 days ago [-]
You might want to think about using Cursor instead of attempting to use a web interface to create a full app. Cursor can make better-targeted changes and lets you tell it individual files to look at for changes, or the whole codebase.