AI tools can unmask anonymous accounts


Do you have a Reddit alt, a secret X account, a finsta, or a Glassdoor account you use to badmouth your boss? AI may have made it much easier to unmask you. That’s the conclusion of a recently published study suggesting some uncomfortable consequences for staying private online, even if it’s not yet time to hold a funeral for anonymity.

This finding, which has not been peer-reviewed, comes from researchers at ETH Zurich, Anthropic, and the Machine Learning Alignment and Theory Scholars program. They built an automated system of LLM-based AI agents, capable of searching the web and working through information like a human investigator, to test how effective large language models are at re-identifying anonymized material. The system significantly outperforms traditional computational techniques at de-anonymizing accounts and combing text for personal details at scale.

The system works by treating posts or other texts as a body of evidence. It analyzes the text for patterns, such as writing quirks, stray biographical details, and the frequency and timing of posts, that might hint at someone’s identity. It then scans other accounts, perhaps millions of them, looking for the same combination of traits. Potential matches are flagged, compared in more detail, and sifted into a shortlist of possible identities.
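The matching idea described above can be sketched in miniature. This is an illustrative toy, not the study’s actual method: the feature set (a few telltale function words plus average post length) and the scoring rule are assumptions made up for the example.

```python
from collections import Counter
import re

def style_features(posts):
    """Extract crude stylometric signals from a list of posts (illustrative only)."""
    text = " ".join(posts)
    words = re.findall(r"[a-z']+", text.lower())
    total = max(len(words), 1)
    counts = Counter(words)
    return {
        "avg_post_len": sum(len(p) for p in posts) / max(len(posts), 1),
        # Relative frequency of a few telltale words (dialect markers, quirks)
        **{w: counts[w] / total for w in ("whilst", "colour", "gotten", "y'all")},
    }

def similarity(a, b):
    """Negative summed absolute difference over shared features (higher = more alike)."""
    return -sum(abs(a[k] - b[k]) for k in set(a) & set(b))

def rank_candidates(target_posts, candidate_accounts):
    """Rank candidate accounts by stylistic similarity to the anonymous posts."""
    target = style_features(target_posts)
    scored = [(name, similarity(target, style_features(posts)))
              for name, posts in candidate_accounts.items()]
    return sorted(scored, key=lambda t: t[1], reverse=True)

anon = ["I whilst waiting noticed the colour of the film poster."]
candidates = {
    "uk_cinephile": ["Whilst queuing I admired the colour grading."],
    "tx_gamer": ["Y'all, I've gotten so into this game."],
}
print(rank_candidates(anon, candidates)[0][0])  # prints "uk_cinephile"
```

A real pipeline would use far richer features and an LLM to weigh ambiguous evidence, but the shortlist-by-shared-traits structure is the same.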

Instead of targeting unsuspecting users, the team evaluated the system using datasets generated from publicly available posts, including content from Hacker News and LinkedIn, transcripts of Anthropic interviews with scientists about how they use AI, and Reddit accounts that were intentionally split into anonymous halves for testing. The paper reports that in each setting, the LLM-based approach correctly identified up to 68 percent of the matching accounts with 90 percent accuracy. In contrast, comparable non-LLM approaches, such as linking data points scattered across large datasets, identified almost nothing.

The results were not uniform across datasets, and as expected, the model performed better when it had more information to work with. In one experiment examining Reddit users who posted about movies in the main r/movies forum and in smaller movie communities, the system was able to link accounts that mentioned just one movie about 3 percent of the time at 90 percent accuracy. When users mentioned 10 or more movies, the success rate roughly doubled.

Meanwhile, an experiment using Anthropic’s interviews with scientists identified nine out of 125 participants, a recall rate of about 7 percent. In this test, the system built a profile of each respondent from the clues in their answers, then searched publicly available information on the web for potential matches. In one illustrative example, the researchers highlighted how a reference to a “supervisor” could indicate a doctoral student, and how the use of British English could indicate a UK affiliation. Combined with references to a background in the physical sciences and current work in biology research, the system was able to narrow the field to a specific candidate.
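The narrowing step in that example amounts to intersecting soft clues until only one candidate remains. A minimal sketch, with entirely invented names and attributes:

```python
# Hypothetical candidate pool; every name and attribute here is made up.
candidates = [
    {"name": "A", "role": "phd_student", "dialect": "en-GB",
     "background": "physics", "field": "biology"},
    {"name": "B", "role": "professor", "dialect": "en-US",
     "background": "physics", "field": "biology"},
    {"name": "C", "role": "phd_student", "dialect": "en-GB",
     "background": "history", "field": "biology"},
]

# Clues inferred from the text: "supervisor" -> PhD student,
# British spelling -> UK, plus stated background and current field.
clues = {"role": "phd_student", "dialect": "en-GB",
         "background": "physics", "field": "biology"}

matches = [c["name"] for c in candidates
           if all(c[k] == v for k, v in clues.items())]
print(matches)  # prints ['A']
```

Each individual clue is weak, but their intersection over a finite candidate pool can be uniquely identifying, which is why scattered biographical details add up.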

However, the researchers argue that the ability to identify any respondent from unstructured text is noteworthy, replicating in minutes what would take a human investigator hours. Moreover, they told The Verge, this performance is likely to improve as AI systems become more capable and gain access to larger datasets. More broadly, they warn that it may no longer be safe to assume that posting under a pseudonym will protect your identity online, for past and future posts alike.

“Everything the LLM found could, in principle, be found by a human investigator.”

“Information on the internet sticks around forever,” said Daniel Paleka, a researcher at ETH Zurich and one of the study’s authors. The researchers warn that this persistence could translate into tangible real-world risks for pseudonymous journalists, dissidents, and activists, while also enabling “hyper-targeted advertising” and “hyper-personalized” scams.

The risks of de-anonymizing accounts are not new, nor are they unique to AI. “Everything the LLM found could, in principle, be found by a human investigator,” Paleka told The Verge.

What’s new, Paleka says, is the end-to-end automation. Work that once required a diligent investigator willing to patiently sift through posts for small nuggets of information can now be done more easily, and across a far greater number of targets.

It’s cheap, too. The researchers said their entire experiment cost less than $2,000, working out to $1 to $4 per profile they ran the AI agent on. “The economics are very different now,” co-author Simon Lermen told The Verge, warning that the lower barrier to entry could expand who has the ability, and the incentive, to try to break online anonymity. He added that groups that have historically stayed “under the radar” may find it harder to continue doing so.

“People might misunderstand this important research and conclude that privacy is dead.” That isn’t the case.

It is important not to overstate the results. “While these algorithms are improving, they are still a far cry from what humans can do,” Luc Rocher, an associate professor at the Oxford Internet Institute, told The Verge. The work does not map precisely onto the real world: the experiments were conducted under laboratory conditions, using datasets that were carefully curated and anonymized for testing purposes. Rocher said they were concerned that people “might misunderstand this important research and conclude that privacy is dead.” That, they argued, is not the case.

Despite years of gradual progress in techniques designed to unmask anonymous users, “the identity of Satoshi Nakamoto, the inventor of Bitcoin, remains a mystery more than a decade later,” Rocher said. They added that whistleblowers can still talk to journalists without being detected, and that tools like Signal have “so far been successful in protecting our collective privacy.”

In the paper, the researchers said they avoided testing their system on users with actual pseudonyms due to ethical concerns. For similar reasons, they have not published the full technical details of their approach and declined to provide a demo when asked. The team also did not say whether they tested the system outside the confines of the study, again citing ethical concerns, leaving open the question of how reliably it performs against real-world accounts.

For people who are strongly committed to anonymity, the practical impact may be limited. Basic precautions — keeping accounts separate, limiting personal details, and avoiding identifiable patterns like only posting during waking hours in your time zone — are still essential.

For those who treat pseudonymity more casually, Paleka and Lermen advised thinking carefully about what gets posted to public forums, even from accounts that appear anonymous, and keeping in mind that what is already out there can be pieced together more easily than many assume.

The researchers say the responsibility should not fall entirely on users. Lermen said AI labs should monitor how their tools are used and put safeguards in place to prevent them from being used to unmask people. He added that social media platforms could clamp down on the large-scale scraping and data extraction that make such efforts possible.

In other words, Satoshi may be safe from AI investigators. Your AITA post on Reddit? That might be another story.


