Pentagon Exposed 1.8 Billion Social Media Posts From Facebook and Twitter for Anyone to See
It’s hardly a secret that the Defense Department collects social media data. More surprising is just how much data it collects—and how sloppy the Pentagon can be about storing that information.
The latest example comes via the security firm Upguard, which discovered open Amazon Web Services servers that contain more than 1.8 billion social media records stored by the U.S. Central Command (Centcom) and U.S. Pacific Command (Pacom).
The records, which date back to 2009, include everything from Facebook posts to online news comments to web discussion forums about sports and politics. The data consists of publicly available information, but its scope is remarkable:
Massive in scale, it is difficult to state exactly how or why these particular posts were collected over the course of almost a decade. Given the enormous size of these data stores, a cursory search reveals a number of foreign-sourced posts that either appear entirely benign, with no apparent ties to areas of concern for U.S. intelligence agencies, or ones that originate from American citizens, including a vast quantity of Facebook and Twitter posts, some stating political opinions. Among the details collected are the web addresses of targeted posts, as well as other background details on the authors which provide further confirmation of their origins from American citizens.
As Upguard notes, the collection is also notable because it includes ordinary social media posts of American citizens, raising questions about the Pentagon’s surveillance practices.
The security firm was able to gain access to the data because a contractor used by the Defense Department stored it in a way that was accessible by anyone with an AWS account. Specifically, as CNN reports, the contractor put the social media trove onto three unsecured AWS servers, which could be discovered and accessed simply by doing keyword searches:
Amazon servers where data is stored, called S3 buckets, are private by default. Private means only authorized users can access them. For one to be made more widely accessible, someone would have to configure it to be available to all Amazon Web Services users, but users would need to know or find the name of the bucket in order to access it.
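To make the configuration issue concrete, here is a minimal Python sketch of how one might flag such a setting. It assumes an access-control list shaped like the response of boto3's `get_bucket_acl` call. The function name `broad_grants` and the sample ACL are hypothetical illustrations, not details from Upguard's or CNN's reporting.

```python
# URIs Amazon S3 uses for its predefined grantee groups.
AUTHENTICATED_USERS = "http://acs.amazonaws.com/groups/global/AuthenticatedUsers"
ALL_USERS = "http://acs.amazonaws.com/groups/global/AllUsers"
BROAD_GROUPS = (AUTHENTICATED_USERS, ALL_USERS)


def broad_grants(acl):
    """Return (group name, permission) pairs for grants made to broad groups.

    `acl` is assumed to be a dict with a "Grants" list, as in the
    response of boto3's get_bucket_acl.
    """
    found = []
    for grant in acl.get("Grants", []):
        grantee = grant.get("Grantee", {})
        if grantee.get("Type") == "Group" and grantee.get("URI") in BROAD_GROUPS:
            group_name = grantee["URI"].rsplit("/", 1)[-1]
            found.append((group_name, grant.get("Permission")))
    return found


# Hypothetical ACL: the owner has full control, and any AWS account can read.
example_acl = {
    "Owner": {"ID": "owner-id-placeholder"},
    "Grants": [
        {
            "Grantee": {"Type": "CanonicalUser", "ID": "owner-id-placeholder"},
            "Permission": "FULL_CONTROL",
        },
        {
            "Grantee": {"Type": "Group", "URI": AUTHENTICATED_USERS},
            "Permission": "READ",
        },
    ],
}

print(broad_grants(example_acl))
```

A bucket left private by default would carry only the owner's grant, so the function would return an empty list. In practice the ACL would come from a live API call, and bucket policies would need a separate check.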
The discovery of the Defense Department’s social media repository is just the latest example of big institutions using cloud-based storage services to park massive amounts of data, and then failing to secure it. Other examples include the Republican Party doing this with the voter data of 200 million people, and Verizon posting data about 6 million subscribers on the open Internet.
In response to Upguard’s discovery, Centcom has complained that the company used “unauthorized access” to obtain the data and accused it of “employing methods to circumvent security protocols.” It’s unclear whether such claims are accurate, however, if the data was available to anyone on the web.
Meanwhile, some claim the issue is not Upguard’s discovery of the data but why the Pentagon is collecting so much of it in the first place:
“I know that the U.S. government still has this ‘collect it all’ mentality,” observed Mike Masnick, the editor of TechDirt, in an essay on Friday, “but as we’ve discussed over and over again, adding more hay to the haystack doesn’t make it easier to find the needles.”