Over the past few years, the evolution of AI-driven tools like GitHub’s Copilot and other large language models (LLMs) has promised to revolutionise programming. By leveraging deep learning, these tools can generate code, suggest solutions, and even troubleshoot issues in real time, saving developers hours of work. While these tools have obvious productivity benefits, there’s a growing concern that they may also have unintended consequences for the quality and skill set of programmers.

  • Daemon Silverstein
    2 months ago

    I’m a dev with 10+ years of (cumulative) experience. While I’ve never used GitHub Copilot specifically, I’ve been using LLMs (as well as AI image generators) on a daily basis, mostly for non-dev things, such as analyzing my human-written poetry to get insights for my own writing. I’ve done the same with code I wrote, asking LLMs to “Analyze and comment” on it for the sake of insights. At times I asked for code snippets, and almost every snippet they generated either worked as-is or needed only a few fixes.

    They’ve been getting good at this, but not good enough to really replace my own coding and analysis. Instead, they’re getting much better at poetry (maybe because their training data is mostly books and poetry works) and sentiment analysis. I use many LLMs simultaneously in order to compare them:

    • The free version of Google Gemini is becoming lazy (short answers, superficial analysis, problems keeping context, drafts that aren’t as diverse as they used to be, among other problems)
    • The free version of ChatGPT is a bit better (it can keep context and give detailed answers) but not good enough (it hallucinates sometimes: good for surrealist poetry but bad for code and other technical matters where precision and coherence matter)
    • Claude is laughably hypersensitive and self-censoring about certain words regardless of context (got code or text that remotely mentions the word “explode”, as in PHP’s explode function? “Sorry, can’t comment on texts alluding to dangerous practices such as involving explosives”. I mean, WHAT?!?!)
    • Bing Copilot has web search, but it has a context limit of 5 messages, so it’s only usable for quick and short things.
    • The same goes for Perplexity
    • Mixtral is very hallucination-prone (i.e. its output doesn’t properly cohere)
    • Llama has been the best of all (via DDG’s “AI Chat” feature), although it sometimes glitches (i.e. starts to output repeated strings ad æternum)

    As you can see, I’ve tried almost all of them. In summary, while it’s good to have such tools, they should never replace human intelligence… Or, at least, they shouldn’t…

    Problem is, dev companies generally focus on “efficiency” over “efficacy”, wanting the shortest deadlines while still expecting some perfection. Very understandable demands, but humans are humans, not robots. We need our time to deliver, and we need to walk cautiously through all the steps needed to finally deploy something (especially big things), or it’ll become XGH programming (Extreme Go Horse). And machines can’t do that so perfectly, yet. For now, using LLMs for development is XGH: really fast, but far from coherent about the big picture (be it a platform, a module, a website, etc).

    • @[email protected]
      2 months ago

      Claude is laughably hypersensitive and self-censoring about certain words regardless of context (…)

      That’s not really a problem, and it’s certainly not Claude’s main one.

      Claude’s main problem is that it is frequently down, unreliable, and extremely buggy. Overall I think it might be better than ChatGPT and Copilot, but it’s so unstable that it becomes unusable.