
This is a long article, so I'm breaking it up into a series of posts which will be released over the next few days. You can also read the full work as a PDF or EPUB; these files will be updated as each section is released.

Machine learning shifts the cost balance for writing, distributing, and reading text, as well as other forms of media. Aggressive ML crawlers place high load on open web services, degrading the experience for humans. As inference costs fall, we’ll see ML embedded into consumer electronics and everyday software. As models introduce subtle falsehoods, interpreting media will become more challenging. LLMs enable new scales of targeted, sophisticated spam, as well as propaganda campaigns. The web is now polluted by LLM slop, which makes it harder to find quality information—a problem which now threatens journals, books, and other traditional media. I think ML will exacerbate the collapse of social consensus, and create justifiable distrust in all kinds of evidence. In reaction, readers may reject ML, or move to more rhizomatic or institutionalized models of trust for information. The economic balance of publishing facts and fiction will shift.

Creepy Crawlers

ML systems are thirsty for content, both during training and inference. This has led to an explosion of aggressive web crawlers. Traditional crawlers generally respected robots.txt, or were small enough to pose no serious hazard; the last three years have been different. ML scrapers are making it harder to run an open web service.

As Drew DeVault put it last year, ML companies are externalizing their costs directly into his face. This year Weird Gloop confirmed that scrapers pose a serious challenge. Today’s scrapers ignore robots.txt and sitemaps, request pages with unprecedented frequency, and masquerade as real users. They fake their user agents, carefully submit valid-looking headers, and spread their requests across vast numbers of residential proxies. An entire industry has sprung up to support crawlers. This traffic is highly spiky, which forces web sites to overprovision—or to simply go down. A forum I help run suffers frequent brown-outs as we’re flooded with expensive requests for obscure tag pages. The ML industry is, in essence, DDoSing the web.
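For context, robots.txt is nothing more than a plain-text file of voluntary directives at a site's root; polite crawlers fetch it first and obey. A minimal example (the /tags/ path is hypothetical, echoing the tag pages above; GPTBot is OpenAI's crawler token):

```
# robots.txt -- compliance is entirely voluntary.

# Ask OpenAI's crawler to stay out entirely.
User-agent: GPTBot
Disallow: /

# Ask everyone else to skip the expensive tag pages and slow down.
# Crawl-delay is nonstandard; only some crawlers honor it.
User-agent: *
Disallow: /tags/
Crawl-delay: 10
```

When scrapers ignore these directives, fake their user agents, and rotate through residential IPs, there is nothing left for this file to do.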

Site operators are fighting back with aggressive filters. Many use Cloudflare or Anubis challenges. Newspapers are putting up more aggressive paywalls. Others require a logged-in account to view what used to be public content. These make it harder for regular humans to access the web.

CAPTCHAs are proliferating, but I don’t think this will last. ML systems are already quite good at them, and we can’t make CAPTCHAs harder without breaking access for humans. I routinely fail today’s CAPTCHAs: the computer didn’t believe me about which squares contained buses, my mouse hand was too steady, the image was unreadably garbled, or the weird JavaScript broke.

ML Everywhere

Today, interactions with ML models are generally constrained to computers and phones. As inference costs fall, I think it’s likely we’ll see LLMs shoved into everything. Companies are already pushing support chatbots on their web sites; the last time I went to Home Depot and tried to use their web site to find the aisles for various tools and parts, it urged me to ask their “AI” assistant—which was, of course, wrong every time. In a few years, I expect LLMs to crop up in all kinds of gimmicky consumer electronics (ask your fridge what to make for dinner!).1

Today you need a fairly powerful chip and lots of memory to do local inference with a high-quality model. In a decade or so that hardware will be available on phones, and then dishwashers. At the same time, I imagine manufacturers will start shipping stripped-down, task-specific models for embedded applications, so you can, I don’t know, ask your oven to set itself for a roast, or park near a smart meter and let it figure out your plate number and how long you were there.
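To give a sense of how low the bar already is, here's a sketch of local inference with a small open-weights model, assuming Hugging Face's transformers library; the model name is just an illustrative choice, not an endorsement:

```python
# A minimal sketch of local LLM inference on consumer hardware.
# Assumes `pip install transformers torch`; the model is an example only.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="Qwen/Qwen2.5-0.5B-Instruct",  # ~0.5B parameters: phone-class, not datacenter-class
)

# The sort of query your hypothetical talking fridge might field someday.
result = generator(
    "I have eggs, spinach, and leftover rice. What should I make for dinner?",
    max_new_tokens=100,
)
print(result[0]["generated_text"])
```

Shrink the model further and strip it to a single task, and the oven example stops being a joke.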

If the IoT craze is any guide, a lot of this technology will be stupid, infuriating, and a source of enormous security and privacy risks. Some of it will also be genuinely useful. Maybe we get baby monitors that use a camera and a local model to alert parents if an infant has stopped breathing. Better voice interaction could make more devices accessible to blind people. Machine translation (even with its errors) is already immensely helpful for travelers and immigrants, and will only get better.

On the flip side, ML systems everywhere means we’re going to have to deal with their shortcomings everywhere. I can’t wait to argue with an LLM elevator in order to visit the doctor’s office, or try to convince an LLM parking gate that the vehicle I’m driving is definitely inside the garage. I also expect that corporations will slap ML systems on less-common access paths and call it a day. Sighted people might get a streamlined app experience while blind people have to fight with an incomprehensible, poorly-tested ML system. “Oh, we don’t need to hire a Spanish-speaking person to record our phone tree—we’ll have AI do it.”

Careful Reading

LLMs generally produce well-formed, plausible text. They use proper spelling, punctuation, and grammar. They deploy a broad vocabulary with a more-or-less appropriate sense of diction, along with sophisticated technical language, mathematics, and citations. These are the hallmarks of a reasonably-intelligent writer who has considered their position carefully and done their homework.

For human readers prior to 2023, these formal markers connoted a certain degree of trustworthiness. Not always, but they were broadly useful when sifting through the vast sea of text in the world. Unfortunately, these markers are no longer useful signals of a text’s quality. LLMs will produce polished landing pages for imaginary products, legal briefs which cite bullshit cases, newspaper articles divorced from reality, and complex, thoroughly-tested software programs which utterly fail to accomplish their stated goals. Humans generally do not do these things because it would be profoundly antisocial, not to mention ruinous to one’s reputation. But LLMs have no such motivation or compunctions—again, a computer can never be held accountable.

Perhaps worse, LLM outputs can appear cogent to an expert in the field, but contain subtle, easily-overlooked distortions or outright errors. This problem bites experts over and over again, like Peter Vandermeersch, a professional journalist who warned others to beware LLM hallucinations—and was then suspended for publishing articles containing fake LLM quotes. I frequently find myself scanning through LLM-generated text, thinking “Ah, yes, that’s reasonable”, and only after three or four passes realize I’d skipped right over complete bullshit. Catching LLM errors is cognitively exhausting.

The same goes for images and video. I’d say at least half of the viral “adorable animal” videos I’ve seen on social media in the last month are ML-generated. Folks on Bluesky seem to be decent about spotting this sort of thing, but I still have people tell me face-to-face about ML videos they saw, insisting that they’re real.

This burdens writers who use LLMs, of course, but mostly it burdens readers, who must work far harder to avoid accidentally ingesting bullshit. I recently watched a nurse in my doctor’s office search Google for an item on a blood test, read the AI-generated summary to me, rephrase that same answer when I asked questions, and only after several minutes realize it was obviously nonsense. Not only do LLMs destroy trust in online text, but they destroy trust in other human beings.

Spam

Prior to the 2020s, generating coherent text was relatively expensive—you usually had to find a fluent human to write it. This limited spam in a few ways. Humans and machines could reasonably identify most generated text. High-quality spam existed, but it was usually repeated verbatim or with form-letter variations—these too were easily detected by ML systems, or rejected by humans (“I don’t even have a Netflix account!”). Since passing as a real person was difficult, moderators could keep spammers at bay based on vibes—especially on niche forums. “Tell us your favorite thing about owning a Miata” was an easy way for an enthusiast site to filter out potential spammers.
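To make “easily detected” concrete, here's a toy sketch of the core trick for catching form-letter spam: normalize the text, fingerprint it, and flag repeats. Real filters were far more sophisticated; the threshold here is arbitrary.

```python
# Toy sketch: verbatim and form-letter spam collapses to a handful of
# fingerprints, so counting repeats catches it. LLM spam, unique every
# time, produces a fresh fingerprint per message and sails through.
import hashlib
import re
from collections import Counter

seen = Counter()

def fingerprint(text: str) -> str:
    # Lowercase and collapse whitespace so trivial variations still collide.
    canonical = re.sub(r"\s+", " ", text.lower()).strip()
    return hashlib.sha256(canonical.encode()).hexdigest()

def looks_like_spam(text: str, threshold: int = 3) -> bool:
    fp = fingerprint(text)
    seen[fp] += 1
    return seen[fp] >= threshold
```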

LLMs changed that. Generating high-quality, highly-targeted spam is cheap. Humans and ML systems can no longer reliably distinguish organic from machine-generated text, and I suspect that problem is now intractable, short of some kind of Butlerian Jihad. This shifts the economic balance of spam. The dream of a useful product or business review has been dead for a while, but LLMs are nailing that coffin shut. Hacker News and Reddit comments appear to be increasingly machine-generated. Mastodon instances are seeing LLMs generate plausible signup requests. Just last week, Digg gave up entirely:

The internet is now populated, in meaningful part, by sophisticated AI agents and automated accounts. We knew bots were part of the landscape, but we didn’t appreciate the scale, sophistication, or speed at which they’d find us. We banned tens of thousands of accounts. We deployed internal tooling and industry-standard external vendors. None of it was enough. When you can’t trust that the votes, the comments, and the engagement you’re seeing are real, you’ve lost the foundation a community platform is built on.

I now get LLM emails almost every day. One approach is to pose as a potential client or collaborator who shows specific understanding of the work I do. Only after a few rounds of conversation or a video call does the ruse become apparent: the person at the other end is in fact seeking investors for their “AI video chatbot” service, wants a money mule, or has been bamboozled by their LLM into thinking it has built something interesting that I should work on. I’ve started charging for initial consultations.

I expect we have only a few years before e-mail, social media, etc. are full of high-quality, targeted spam. I’m shocked it hasn’t happened already—perhaps inference costs are still too high. I also expect phone spam to become even more insufferable as every company with my phone number uses an LLM to start making personalized calls. It’s only a matter of time before political action committees start using LLMs to send even more obnoxious texts.

Hyperscale Propaganda

Around 2014 my friend Zach Tellman introduced me to InkWell: a software system for poetry generation. It was written (because this is how one gets funding for poetry) as a part of a DARPA project called Social Media in Strategic Communications. DARPA was not interested in poetry per se; they wanted to counter persuasion campaigns on social media, like phishing attacks or pro-terrorist messaging. The idea was that you would use machine learning techniques to tailor a counter-message to specific audiences.

Around the same time, stories started to come out about state operations to influence online opinion. Russia’s Internet Research Agency hired thousands of people to post on fake social media accounts in service of Russian interests. China’s wumao dang, a mixture of employees and freelancers, were paid to post pro-government messages online. These efforts required considerable personnel: a district of 460,000 people employed nearly three hundred propagandists. I started to worry that machine learning might be used to amplify large-scale influence and disinformation campaigns.

In 2022, researchers at Stanford revealed they’d identified networks of Twitter and Meta accounts propagating pro-US narratives in the Middle East and Central Asia. These propaganda networks were already using ML-generated profile photos. However, these images could be identified as synthetic, and the accounts showed clear signs of what social media companies call “coordinated inauthentic behavior”: identical images, recycled content across accounts, simultaneous posting, etc.
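As a toy illustration of that last signal, simultaneous posting is easy to surface with a little bookkeeping. The accounts and timestamps below are invented, and real detection pipelines combine many such features:

```python
# Toy sketch: flag one "coordinated inauthentic behavior" signal by
# finding bursts of distinct accounts posting in the same narrow window.
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical (account, timestamp) pairs.
posts = [
    ("acct_01", "2022-06-01 14:00:03"),
    ("acct_02", "2022-06-01 14:00:04"),
    ("acct_03", "2022-06-01 14:00:05"),
    ("acct_04", "2022-06-03 09:12:47"),
]

# Bucket posts into 60-second windows; many distinct accounts in one
# window is a crude coordination signal.
windows = defaultdict(set)
for account, ts in posts:
    t = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    windows[int(t.timestamp()) // 60].add(account)

for window, accounts in sorted(windows.items()):
    if len(accounts) >= 3:
        print("Suspicious burst:", sorted(accounts))
```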

These signals cannot be relied on going forward. Modern image and text models have advanced, enabling the fabrication of distinct, plausible identities and posts. Posting at the same time is an unforced error. As machine-generated content becomes more difficult for platforms and individuals to distinguish from human activity, propaganda will become harder to identify and limit.

At the same time, ML models reduce the cost of IRA-style influence campaigns. Instead of employing thousands of humans to write posts by hand, language models can spit out cheap, highly-tailored political content at scale. Combined with the pseudonymous architecture of the public web, it seems inevitable that the future internet will be flooded by disinformation, propaganda, and synthetic dissent.

This haunts me. The people who built LLMs have enabled a propaganda engine of unprecedented scale. Voicing a political opinion on social media or a blog has always invited drop-in comments, but until the 2020s, these comments were comparatively expensive, and you had a chance to evaluate the profile of the commenter to ascertain whether they seemed like a real person. As ML advances, I expect it will be common to develop an acquaintanceship with someone who posts selfies with her adorable cats, shares your love of board games and knitting, and every so often, in a vulnerable moment, expresses her concern for how the war is affecting her mother. Some of these people will be real; others will be entirely fictitious.

The obvious response is distrust and disengagement. It will be both necessary and convenient to dismiss political discussion online: anyone you don’t know in person could be a propaganda machine. It will also be more difficult to have political discussions in person, as anyone who has tried to gently steer their uncle away from Facebook memes at Thanksgiving knows. I think this lays the epistemic groundwork for authoritarian regimes. When people cannot trust one another and give up on political discussion, we lose the capability for informed, collective democratic action.

When I wrote the outline for this section about a year ago, I concluded:

I would not be surprised if there are entire teams of people working on building state-sponsored “AI influencers”.

Then this story dropped about Jessica Foster, a right-wing US soldier with a million Instagram followers who posts a stream of selfies with MAGA figures, international leaders, and celebrities. She is in fact a (mostly) photorealistic ML construct; her Instagram funnels traffic to an OnlyFans where you can pay for pictures of her feet. I anticipated weird pornography and generative propaganda separately, but I didn’t see them coming together quite like this. I expect the ML era will be full of weird surprises.

Web Pollution

Back in 2022, I wrote:

God, search results are about to become absolute hot GARBAGE in 6 months when everyone and their mom start hooking up large language models to popular search queries and creating SEO-optimized landing pages with plausible-sounding results.

Searching for “replace air filter on a Samsung SG-3560lgh” is gonna return fifty Quora/WikiHow style sites named “How to replace the air filter on a Samsung SG3560lgh” with paragraphs of plausible, grammatical GPT-generated explanation which may or may not have any connection to reality. Site owners pocket the ad revenue. AI arms race as search engines try to detect and derank LLM content.

Wikipedia starts getting large chunks of LLM text submitted with plausible but nonsensical references.

I am sorry to say this one panned out. I routinely abandon searches that would have yielded useful information three years ago because most—if not all—results seem to be LLM slop. Air conditioner reviews, masonry techniques, JVM APIs, woodworking joinery, finding a beekeeper, health questions, historical chair designs, looking up exercises—the web is clogged with garbage. Kagi has released a feature to report LLM slop, though that effort is moving slowly. Wikipedia is awash in LLM contributions and trying to identify and remove them; the site just announced a formal policy against LLM use.

This feels like an environmental pollution problem. There is a small-but-viable financial incentive to publish slop online, and small marginal impacts accumulate into real effects on the information ecosystem as a whole. There is essentially no social penalty for publishing slop—“AI emissions” aren’t regulated like methane, and attempts to make AI use uncouth seem unlikely to shame the anonymous publishers of Frontier Dad’s Best Adirondack Chairs of 2027.

I don’t know what to do about this. Academic papers, books, and institutional web pages have remained higher quality, but fake LLM-generated papers are proliferating, and I find myself abandoning “long tail” questions. Thus far I have not been willing to file an inter-library loan request and wait three days to get a book that might discuss the questions I have about (e.g.) maintaining concrete wax finishes. Sometimes I’ll bike to the store and ask someone who has actually done the job what they think, or try to find a friend of a friend to ask.

Consensus Collapse

I think a lot of our current cultural and political hellscape comes from the balkanization of media. Twenty years ago, the divergence between Fox News and CNN’s reporting was alarming. In the 2010s, social media made it possible for normal people to get their news from Facebook and led to the rise of fake news stories manufactured by overseas content mills for ad revenue. Now slop farmers use LLMs to churn out nonsense recipes and surreal videos of cops giving bicycles to crying children. People seek out and believe slop. When Maduro was kidnapped, ML-generated images of his arrest proliferated on social platforms. An acquaintance, convinced by synthetic video, recently tried to tell me that the viral “adoption center where dogs choose people” was real.2

The problem seems worst on social media, where the barrier to publication is low and viral dynamics allow for rapid spread. But slop is creeping into the margins of more traditional information channels. Last year Fox News published an article about SNAP recipients behaving poorly based on ML-fabricated video. The Chicago Sun-Times published a sixty-four-page slop insert full of imaginary quotes and fictitious books. I fear future journalism, books, and ads will be full of ML confabulations.

LLMs can also be trained to distort information. Elon Musk argues that existing chatbots are too liberal, and has begun training one which is more conservative. Last year Musk’s LLM, Grok, started referring to itself as MechaHitler and “recommending a second Holocaust”. Musk has also embarked—presumably to the delight of Garry Tan—upon a project to create a parallel LLM-generated Wikipedia, because of “woke”.

As people consume LLM-generated content, and as they ask LLMs to explain current events, economics, ecology, race, gender, and more, I worry that our understanding of the world will further diverge. I envision a world of alternative facts, endlessly generated on-demand. This will, I think, make it more difficult to effect the coordinated policy changes we need to protect each other and the environment.

The End of Evidence

Audio, photographs, and video have long been forgeable, but sophisticated, plausible forgery was until recently a skilled, expensive, and time-consuming process. Now every person with a phone can, in a few seconds, erase someone from a photograph.

Last fall, I wrote about the effect of immigration enforcement on my city. During that time, social media was flooded with video: protestors beaten, residential neighborhoods gassed, families dragged screaming from cars. These videos galvanized public opinion while the government lied relentlessly. A recurring phrase from speakers at vigils the last few months has been “Thank God for video”.

I think that world is coming to an end.

Video synthesis has advanced rapidly; you can generally spot it, but the best fakes are now very convincing. Even aware of the cues, and with videos I know are fake, I’ve failed to spot the tells until they were pointed out. I already doubt whether videos I see on the news or internet are real. In five years I think many people will assume the same. Did the US kill 175 people by firing a Tomahawk at an elementary school in Minab? “Oh, that’s AI” is easy to say, and hard to disprove.

I see a future in which anyone can find images and narratives to confirm their favorite priors, and yet we simultaneously distrust most forms of visual evidence; an apathetic cornucopia. I am reminded of Hannah Arendt’s remarks in The Origins of Totalitarianism:

In an ever-changing, incomprehensible world the masses had reached the point where they would, at the same time, believe everything and nothing, think that everything was possible and that nothing was true…. Mass propaganda discovered that its audience was ready at all times to believe the worst, no matter how absurd, and did not particularly object to being deceived because it held every statement to be a lie anyhow. The totalitarian mass leaders based their propaganda on the correct psychological assumption that, under such conditions, one could make people believe the most fantastic statements one day, and trust that if the next day they were given irrefutable proof of their falsehood, they would take refuge in cynicism; instead of deserting the leaders who had lied to them, they would protest that they had known all along that the statement was a lie and would admire the leaders for their superior tactical cleverness.

I worry that the advent of image synthesis will make it harder to mobilize the public for things which did happen, easier to stir up anger over things which did not, and create the epistemic climate in which totalitarian regimes thrive. Or perhaps future political structures will be something weirder, something unpredictable. LLMs are broadly accessible, not limited to governments, and the shape of media has changed.

Epistemic Reaction

Every societal shift produces reaction. I expect countercultural movements to reject machine learning. I don’t know how successful they will be.

The Internet says kids are using “that’s AI” to describe anything fake or unbelievable, and consumer sentiment seems to be shifting against “AI”. Anxiety over white-collar job displacement seems to be growing. Speaking personally, I’ve started to view people who use LLMs in their writing, or paste LLM output into conversations, as having delivered the informational equivalent of a dead fish to my doorstep. If that attitude becomes widespread, perhaps we’ll see continued interest in human media.

On the other hand, chatbots have jaw-dropping usage figures, and those numbers are still rising. A Butlerian Jihad doesn’t seem imminent.

I do suspect we’ll see more skepticism towards evidence of any kind—photos, video, books, scientific papers. Experts in a field may still be able to evaluate quality, but it will be difficult for a lay person to catch errors. While information will be broadly accessible thanks to ML, evaluating the quality of that information will be increasingly challenging.

One reaction could be rhizomatic: people could withdraw into trusting only those they meet in person, or more formally via cryptographically authenticated webs of trust. The latter seems unlikely: we have been trying to do web-of-trust systems for over thirty years. Speaking glibly as a user of these systems… normal people just don’t care that much.
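For the curious, the cryptography is the easy part. A minimal sketch of signing and verifying a post with Ed25519, using the pyca/cryptography library:

```python
# Minimal sketch of the primitive under a web of trust: a signature
# proving that a particular keyholder produced this exact text.
# Assumes `pip install cryptography`.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()  # exchanged out-of-band, e.g. in person

post = b"I actually wrote this; no LLM involved."
signature = private_key.sign(post)

try:
    public_key.verify(signature, post)  # raises InvalidSignature on tampering
    print("Valid: this keyholder signed this exact text.")
except InvalidSignature:
    print("Invalid: altered or forged.")
```

Note what this does and doesn't prove: a valid signature ties text to a key, not to a human, and exchanging and trusting keys is precisely the part we've failed at for thirty years.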

Another reaction might be to re-centralize trust in a small number of publishers with a strong reputation for vetting. Maybe NPR and the Associated Press become well-known for rigorous ML controls and are commensurately trusted.3 Perhaps most journals are understood to be a “slop wild west”, but high-profile venues like Physical Review Letters remain of high quality. They could demand an ethics pledge from submitters that their work was produced without LLM assistance, and somehow publishers, academic institutions, and researchers collectively find the budget and time for thorough peer review.4

It used to be that families would pay for news and encyclopedias. It is tempting to imagine that World Book and the New York Times might pay humans to research and write high-quality factual articles, and that regular people would pay money to access that information. This seems unlikely given current market dynamics, but if slop becomes sufficiently obnoxious, perhaps that world could return.

Fiction seems a different story. You could imagine a prestige publishing house or film production company committing to works written by human authors, and some kind of elaborate verification system. On the other hand, slop might be “good enough” for people’s fiction desires, and can be tailored to the precise interest of the reader. This could cannibalize the low end of the market and render human-only works economically unviable. We’re watching this play out now in recorded music: “AI artists” on Spotify are racking up streams, and some people are content to listen entirely to Suno slop.5 It doesn’t have to be entirely ML-generated either. Centaurs (humans working in concert with ML) may be able to churn out music, books, and film so quickly that it is no longer economically possible to work “by hand”, except for niche audiences.

Adam Neely has a thought-provoking video on this question, and predicts a bifurcation of the arts: recorded music will become dominated by generative AI, while live orchestras and rap shows continue to flourish. VFX artists and film colorists might find themselves out of work, while audiences continue to patronize plays and musicals. I don’t know what happens to books.

Creative work as an avocation seems likely to continue; I expect to be reading queer zines and watching videos of people playing their favorite instruments in 2050. Human-generated work could also command a premium on aesthetic or ethical grounds, like organic produce. The question is whether those preferences can sustain artistic, journalistic, and scientific industries.


  1. Washing machines already claim to be “AI” but they (thank goodness) don’t talk yet. Don’t worry, I’m sure it’s coming.

  2. Since then a real shelter has tried this idea, but at the time, it was fake.

  3. “But Kyle, we’ve had strong journalistic institutions for decades and people still choose Fox News!” You’re right. This is hopelessly optimistic.

  4. [Sobbing intensifies]

  5. Suno CEO Mikey Shulman calls these “meaningful consumption experiences”, which sounds like a wry Dickensian euphemism.
