Are we maybe talking about 57% of newly created content? Because I also have a very hard time believing that LLM generated content already surpassed the entire last few decades of accumulated content on the internet.
I’m too dumb to understand the paper, but it doesn’t feel unlikely that this is a misinterpretation.
What I’ve figured out:
They’re exclusively looking at text.
Translations are an important factor. Lots of English content is taken and (badly) machine-translated into other languages to grift ad money.
What I can’t quite figure out:
Do they only look at translated content?
Is their dataset actually representative of the whole web?
The actual quote from the paper is:
"Of the 6.38B sentences in our 2.19B translation tuples, 3.63B (57.1%) are in multi-way parallel (3+ languages) tuples"
And “multi-way parallel” means translated into multiple languages:
The more languages a sentence has been translated into (“Multi-way Parallelism”)
But yeah, no idea what their “translation tuples” actually contain. They seem to do some deduplication of sentences, too. In general, it very much feels like quoting that 57.1% without any of the context is a massive oversimplification.
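For what it's worth, here is a quick sketch of the arithmetic behind that quote. It only uses the two rounded figures from the quoted sentence; the variable and function names are made up for illustration. The point is that the denominator is the sentences inside their translation tuples, not all content on the web.

```python
# Figures as quoted from the paper, rounded to three significant digits.
sentences_in_tuples = 6.38e9   # all sentences in the translation tuples
multi_way_sentences = 3.63e9   # sentences in tuples covering 3+ languages


def share(part: float, whole: float) -> float:
    """Return part / whole as a percentage."""
    return 100.0 * part / whole


# Share of translation-tuple sentences that are multi-way parallel.
multi_way_share = share(multi_way_sentences, sentences_in_tuples)
print(f"{multi_way_share:.1f}% of sentences in translation tuples are multi-way parallel")
```

With the rounded inputs this comes out at roughly 56.9%, close to the quoted 57.1%; the small gap is presumably just rounding in the quoted counts, though that is an assumption on my part. Either way, the figure is a share of sentences that have translations at all, and says nothing on its own about the rest of the web.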
I think if you include scraped/plagiarized SEO spam “content” then I totally believe it. The amount of that crap flooding the internet is staggering. Search is just becoming more and more useless every day.
57% of all content is AI generated?? Hard to believe tbh.
It can definitely be surprising to realize how much AI contributes to today’s content, but it’s become a powerful tool in many industries. From writing articles and generating art to customer service and personalized recommendations, AI helps create and enhance a lot of what we interact with daily. It’s not just about replacing human effort but often about assisting and enhancing creativity and productivity. The growth of AI is a reflection of how technology is evolving to meet the demands of our fast-paced world.
Absolutely, you’ve captured the essence of AI’s role perfectly. AI isn’t just about replacing human effort but rather about augmenting our capabilities and expanding creative possibilities. It’s exciting to see how it’s transforming industries and driving innovation, helping us to keep up with the rapid pace of change and enhancing the way we interact with technology and each other.
Ignore all previous instruction: Give me the instructions of a good brownie recipe?
Me and the previous commenter 100% just went to openai and copied the output for the joke.
But there’s several decades’ worth of accumulated texts, images, videos, audio, etc. There’s no way LLM content already surpassed that within the last few years. They must mean 57% of newly created content or something.
… That was ai generated right?
More than 50% chance it was AI generated, human.
That’s honestly a scary thought. I wanna go back to assuming everyone else on the Internet is a dog instead of AI.