So what do we train GPT on when Stack Overflow degrades?
Will library docs be enough? Maybe.
SO is already degraded because they don't allow new answers, even though the old answers are based on deprecated versions and are no longer relevant.
Probably public GitHub projects, which may or may not be written using GPT
Absolutely terrifies me.
I asked AI to create an encryption method and it pulled code from 2015.
It smelled funny, so I asked some experts. They told me the AI's solution had been vulnerable since 2020 and recommended another method.
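For what it's worth, the thread never says which method the AI spat out, so this is purely a hypothetical illustration of the kind of thing the experts usually recommend instead: don't assemble crypto primitives from an old snippet, call a vetted authenticated-encryption API. A minimal sketch using Python's `cryptography` package:

```python
# Hypothetical illustration only -- the thread never names the actual method.
# Instead of hand-rolling a cipher from an old AI-generated snippet, lean on a
# vetted authenticated-encryption API (AES-GCM from the `cryptography` package).
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt(plaintext: bytes, key: bytes) -> bytes:
    # AES-GCM gives confidentiality and integrity in one call.
    nonce = os.urandom(12)                       # must be unique per message
    ciphertext = AESGCM(key).encrypt(nonce, plaintext, None)
    return nonce + ciphertext                    # ship the nonce with the ciphertext

def decrypt(blob: bytes, key: bytes) -> bytes:
    nonce, ciphertext = blob[:12], blob[12:]
    return AESGCM(key).decrypt(nonce, ciphertext, None)

key = AESGCM.generate_key(bit_length=256)
assert decrypt(encrypt(b"hello", key), key) == b"hello"
```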
What happened in 2020 that suddenly made that solution vulnerable?
I feel like the thing that terrifies you is really just idiots with powerful tools. Those have always been around; this is just a new, albeit scarier than usual, tool. The idiot implementing an encryption method wholesale, directly from an AI, was always going to break shit. They can just do it faster, more easily, and with more devastation. But the idiots were always going to idiot regardless. So it's up to the non-idiots to figure out how to use the same powerful tools to protect everyone (including the idiots themselves) from breaking absolutely everything.
In the weeds here, but just trying to say: AI doesn't kill people, people kill people. But AI is gonna make it a fuck load easier, so we should absolutely put regulation and safeguards in place.
Yeah that makes sense. I know people are concerned about recycling AI output into training inputs, but I don’t know that I’m entirely convinced that’s damning.
The theory behind this is that no ML model is perfect. They will always make some errors. So if the errors they make end up in the training data, future models will learn to repeat the old models' errors and add new ones of their own on top.
Over time, ML models will get worse and worse because the quality of the training data will get worse. It’s like a game of Chinese whispers.
No matter how good your photocopier is, a copy of a copy is worse, and it gets worse every time you do it.
GIGO.
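To make the copy-of-a-copy intuition concrete, here's a toy sketch (my own, not from the thread, and the 2% figure is an arbitrary assumption): each generation trains on the previous generation's output, errors included, and adds a few new mistakes of its own, so accuracy decays geometrically.

```python
# Toy model-collapse sketch. Assumption (illustrative only): each generation
# inherits the previous generation's mistakes and adds 2% new errors, with no
# fresh human-written data to correct the drift.
error_injection_rate = 0.02   # new mistakes introduced per generation
accuracy = 1.0                # generation 0 trains on clean human data

for generation in range(1, 11):
    # Training data is now the previous model's output, so errors compound.
    accuracy *= (1 - error_injection_rate)
    print(f"generation {generation:2d}: accuracy ~ {accuracy:.3f}")
```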
Yeah, I agree garbage in, garbage out, but I don't know that that's what will happen. If I create a library and then use GPT to generate documentation for it, I'm going to review, edit, and enrich that output as the owner of the library. A great many people are painting this cycle in black and white, implying that any involvement from AI is automatically garbage, and that's fallacious and inaccurate.
There’s a serious argument that StackOverflow was, itself, a patch job in a technical environment that lacked good documentation and debug support.
I’d argue the mistake was training on StackExchange to begin with and not using an actual stack of manuals on proper coding written by professionals.
The problem was never having the correct answer; it was sifting it out of the overall pool of information. When ChatGPT isn't hallucinating, it does that much better than Stack Exchange.
This has been a concern of mine for a long time. People act like docs and code bases are enough, but it’s obvious when looking up something niche that it isn’t. These models need a lot of input data, and we’re effectively killing the source(s) of new data.