
The Future of Everything is Lies, I Guess: Information Ecology

This is a long article, so I'm breaking it up into a series of posts which will be released over the next few days. You can also read the full work as a PDF or EPUB; these files will be updated as each section is released.

Machine learning shifts the cost balance for writing, distributing, and reading text, as well as other forms of media. Aggressive ML crawlers place high load on open web services, degrading the experience for humans. As inference costs fall, we’ll see ML embedded into consumer electronics and everyday software. As models introduce subtle falsehoods, interpreting media will become more challenging. LLMs enable new scales of targeted, sophisticated spam, as well as propaganda campaigns. The web is now polluted by LLM slop, which makes it harder to find quality information—a problem which now threatens journals, books, and other traditional media. I think ML will exacerbate the collapse of social consensus, and create justifiable distrust in all kinds of evidence. In reaction, readers may reject ML, or move to more rhizomatic or institutionalized models of trust for information. The economic balance of publishing facts and fiction will shift.

Creepy Crawlers

ML systems are thirsty for content, both during training and inference. This has led to an explosion of aggressive web crawlers. While crawlers have historically respected robots.txt, or been small enough to pose no serious hazard, the last three years have been different. ML scrapers are making it harder to run an open web service.
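
For contrast, honoring robots.txt is trivial. Here's a minimal sketch, using Python's standard-library urllib.robotparser, of the check a polite crawler makes before fetching a page; the domain, path, and bot name are illustrative placeholders:

```python
# Minimal sketch of a polite crawler's robots.txt check.
# The domain, path, and bot name are illustrative placeholders.
from urllib.robotparser import RobotFileParser

robots = RobotFileParser()
robots.set_url("https://example.com/robots.txt")
robots.read()  # fetch and parse the site's crawling rules

bot = "ExampleBot/1.0"
url = "https://example.com/tags/some-obscure-tag"

if robots.can_fetch(bot, url):
    delay = robots.crawl_delay(bot) or 1  # honor Crawl-delay, default to 1s
    print(f"fetching {url}, pausing {delay}s between requests")
else:
    print(f"robots.txt disallows {url}; skipping")
```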

As Drew DeVault put it last year, ML companies are externalizing their costs directly into his face. This year Weird Gloop confirmed that scrapers pose a serious challenge. Today's scrapers ignore robots.txt and sitemaps, request pages with unprecedented frequency, and masquerade as real users: they fake their user agents, carefully submit valid-looking headers, and spread their requests across vast numbers of residential proxies. An entire industry has sprung up to support these crawlers. Their traffic is highly spiky, which forces web sites to overprovision, or simply go down. A forum I help run suffers frequent brown-outs as we're flooded with expensive requests for obscure tag pages. The ML industry is, in essence, DDoSing the web.
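
To make the masquerading concrete, here's an illustrative sketch, using Python's requests library, of how little it takes for a scraper to present itself as an ordinary browser; the URL and header values are placeholders of the sort a desktop browser sends:

```python
# Illustrative only: a scraper presenting browser-like headers, so that
# server logs can't easily tell it apart from a human visitor.
# All values below are placeholders.
import requests

browser_like_headers = {
    "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/124.0 Safari/537.36"),
    "Accept": "text/html,application/xhtml+xml",
    "Accept-Language": "en-US,en;q=0.9",
}

resp = requests.get("https://example.com/tags/some-obscure-tag",
                    headers=browser_like_headers, timeout=10)
print(resp.status_code, len(resp.text))
```

Spread across thousands of residential IP addresses, requests like this are effectively indistinguishable from human readers.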