OpenAI declares AI race “over” if training on copyrighted works isn’t fair use

cyrano@lemmy.dbzer0.com · 1 day ago

Grimy@lemmy.world · 1 day ago

If copyright is used to add a huge price tag to any AI development, then it will just hamper innovation and technological development.

And sadly, what most are clamoring for will disproportionately affect open source development.

FarceOfWill@infosec.pub · 1 day ago

If open source apps can’t be copyrighted, then the GPL is worthless, and that will harm open source development much more.

Grimy@lemmy.world · 1 day ago

I’m not sure how that applies in the current context, where it would be used as training data.

FarceOfWill@infosec.pub · 23 hours ago

Because once you can generate the GPL code from the lossy AI database trained on it, the GPL protection is meaningless.

Grimy@lemmy.world · 18 hours ago

In such a scenario, it will be worth it. LLMs aren’t databases that just hold copy-pasted information. If we get to a point where one can spit out whole functional GitHub repos replicating complex software, it will be able to do so with most software, regardless of whether it was trained on similar data or not.

All software will be a prompt away, including the closed source ones. I don’t think you can get more open source than that. But that’s only if stringent laws aren’t put in place to ban open source AI models, since Google will put that one prompt behind a paycheck’s worth of money if they can.

FarceOfWill@infosec.pub · 7 hours ago

I don’t see how you can write the law such that it allows training AI on copyrighted data without making it possible to train a special LLM on a single GitHub repo instead of the entire universe, and essentially treat it as a full compression of the source.

Grimy@lemmy.world · 43 minutes ago

The outputs are still bound by copyright law. Tracing an artwork pixel by pixel doesn’t make the copy immune to copyright law, and maliciously overtraining gen AI to act like a database and outright copy shouldn’t either.

If you have a carbon copy of someone’s GitHub repo, it doesn’t matter if you generated it, it’s still a copy. Although code is a difficult example, since I’m not entirely sure where the line is for one repo to be different than another when they are accomplishing the same task.

I always imagined businesses just grabbed the GPL software and told their employees to rewrite it, but different. Most things I dive into seem to stem from one or two algorithms from a paper, and the rest is fluff.