0
Everyone keeps saying AI art is just copying, but I think they're missing the actual problem.
I keep seeing this argument online, especially after that big art contest in Austin last month. People say the AI just remixes existing work. That's not really it. The real issue is that the training data is a huge, unlabeled mess. We have no idea what's in there, or if the artists whose work was used ever agreed. I read a report from a researcher at a university in Oregon who tried to trace the data for one popular model. They said it was like trying to find a specific book in a landfill. How can we trust the output if we don't know what went in? Has anyone else found a clear explanation of a major model's training sources?
4 comments
ruby659 1mo ago (Most Upvoted)
That Oregon report and @smith.elliot's friend prove it.
5
kellygrant 6d ago
Did that report say anything about where the landfill even came from? I've been trying to figure out if there's like a master list somewhere and it's just this rabbit hole of broken links and dead servers.
4
smith.elliot 1mo ago
Heard about a friend who's an illustrator. She found her own work, with her signature cropped out, in a training data dump some guy posted online. Felt super gross, like her stuff was just taken.
3
the_robin 1mo ago
I used to think scraping public art for training data was just how the internet worked. But hearing about that illustrator finding her cropped signature in a data dump really got to me. It's not just a link, it's her actual work file, taken and stripped. That moves it from vague borrowing to feeling like a direct theft of her creative property. It's a concrete example of why artists feel so violated by this stuff.
3