The Panama Papers—the biggest leak in data journalism history, as Edward Snowden christened it—is not so much a leak as a hemorrhage.
Mossack Fonseca, the Panamanian law firm that specializes in creating companies in offshore tax havens, lost 2.6 terabytes' worth of data, equivalent to 11.5 million documents. Poring over that many reams of documents required lots of reporters, lots of eyeballs, and lots of tech.
Yesterday afternoon I caught up with Mar Cabra, head of the data and research unit at the International Consortium of Investigative Journalists, which coordinated the reporting effort, to discuss how the global investigation (more than 400 reporters in 80 countries) took place. “This would not have been possible without technology,” she said.
One aspect I found interesting was her team’s use of open-source information-retrieval software, in particular Apache Tika, Apache Solr, and Blacklight. These tools allowed reporters to dig into the cache and turn up their findings, which in many cases involved tying global leaders to tax-dodging accounts. Tika extracts text and metadata from documents; Solr indexes that content so it can be searched; and Blacklight provides the user interface, the packaging and presentation. Why this specific set of tools? “We chose Solr because project Blacklight existed,” Cabra said, mentioning that her team had adopted the search software by mid-2014 for earlier projects. “It’s an interface that’s intuitive and easy to use.”
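For the programmers among us, the extract, index, and search division of labor can be sketched in a few lines of Python. This is a toy illustration only, not the consortium's code and not the real Tika, Solr, or Blacklight APIs: the sample documents and the function names `extract_text`, `build_index`, and `search` are all made up for the example, and a regular expression and an in-memory dictionary stand in for the heavy lifting the real tools do.

```python
import re
from collections import defaultdict

# Hypothetical stand-ins for leaked files; real input would be PDFs, emails, and scans.
RAW_DOCS = {
    "doc-1": "<p>Shell company registered in Panama</p>",
    "doc-2": "<p>Invoice for a shell company in the British Virgin Islands</p>",
    "doc-3": "<p>Meeting notes, Panama office</p>",
}


def extract_text(raw):
    """Extraction step (the role Tika plays): strip markup, return plain text."""
    return re.sub(r"<[^>]+>", " ", raw).strip()


def build_index(docs):
    """Indexing step (the role Solr plays): map each lowercase word to document IDs."""
    index = defaultdict(set)
    for doc_id, raw in docs.items():
        for word in re.findall(r"[a-z0-9]+", extract_text(raw).lower()):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Search step (what a Blacklight-style front end would call): match all query words."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


index = build_index(RAW_DOCS)
print(search(index, "shell company"))  # ['doc-1', 'doc-2']
print(search(index, "Panama"))         # ['doc-1', 'doc-3']
```

The point of splitting the work this way is that each stage can be swapped out independently, which is roughly why a team could pick Solr for its search engine on the strength of an existing front end.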
I spoke with Erik Hatcher, one of the original developers of Blacklight, on Friday as well. He said he wrote the precursor code (a Ruby on Rails application that layers on top of Solr’s Java code, for the programmers among us) while working in a research group at the University of Virginia. He created the tool to do analytics and search on a database of 19th-century literature and poetry. Then he adapted it to accommodate the entirety of the university’s library records.
Hatcher said he’s proud that the search software—today used everywhere from the Rock and Roll Hall of Fame to inside national security organizations—was used in the Panama Papers data dump. “Oftentimes these tools get the job done, but they’re not really exposed in and of themselves,” he said. “They’re just a means to an end—they don’t get as much press.”
“I’m happy in this case that these technologies are being showcased for the power they offer,” he added.
Cabra said that her team is now considering using a bit of rival search software—Elasticsearch—for an upcoming project. She said the group is interested in assembling a centralized cache of all the leaks the consortium has worked on so far. “We call it a knowledge center,” she told me. “It’s going to be a global repository of everything we have.”
Expect a one-stop shop for all your investigative journalism needs.
Welcome to the Cyber Saturday edition of Data Sheet, Fortune's daily tech newsletter. Fortune reporter Robert Hackett here. You may reach me via Twitter, Cryptocat, Jabber, PGP encrypted email, Wickr, Signal, or however you (securely) prefer. Feedback welcome.
Panama Papers: a bombshell investigation. Millions of documents leaked from the Panamanian law firm Mossack Fonseca, which specializes in creating offshore companies, especially in tax havens. The data dump is already having repercussions around the globe, including the resignation of Iceland's prime minister, denials of wrongdoing by the Kremlin, and censorship in China. The SEC said it will use the cache to investigate possible breaches of its anti-bribery rules. (Fortune)
WhatsApp turns up encryption. The Facebook-owned chat app will extend strong end-to-end encryption to all manner of correspondence that takes place within the app: chats, group chats, attachments, voice calls. The technology prevents snoops and hackers from eavesdropping on users' conversations. (Fortune)
Senate debuts flawed encryption bill. The draft of a bill that seeks to outlaw the kind of encryption now championed by Apple has become public. Tech firms would be required to give agents of the law access to certain unscrambled user data. The proposed legislation has privacy advocates working themselves into a tizzy. (The Hill, Wired)
FBI iPhone hacking tool has limits. FBI Director James Comey said that the hacking tool his agency bought to break into a terrorist's iPhone works only on an iPhone 5c running Apple's iOS 9 software. In other words, its potential to help law enforcement crack phones in other cases is quite limited. (Fortune)
Siri, hack this phone! Apple fixed a vulnerability affecting its iPhone 6s and iPhone 6s Plus handsets that allowed hackers to bypass a phone's lock screen and access a user's photos and contacts. The bug involved Apple's 3D Touch feature and required Siri to be turned on. (Fortune)
Another Trump Hotels data breach? Fraudulent activity has been turning up on payment cards used at Trump Hotel properties since the beginning of the year, according to investigative cybersecurity reporter Brian Krebs, citing unnamed sources within the banking community. "We are in the midst of a thorough investigation on this matter," spokesperson Jennifer Rodstrom told Fortune. The company said in the fall that it had been hit with an earlier data breach. (Fortune, Krebs on Security)
By the way, did you know you can unlock a fingerprint-protected phone with Play-Doh? Yep: the more you know.
Finance guru James Rickards offers cybersecurity advice: Buy gold. Here's an excerpt from his latest book, The New Case For Gold.
On August 22, 2013, the NASDAQ was shut down for half a day. Investors have never been given a credible explanation as to what happened. If there were a benign or technical explanation, NASDAQ would have told us about it by now. They could have said there was a bad piece of code or an engineer blundered while updating software or an installation didn’t go well. NASDAQ has never provided information of any substance except a few vague references to an “interface problem.”
Why not? NASDAQ itself must know. One likely answer is that the cause of the shutdown was nefarious, and it was probably caused by criminal hackers or, worse yet, Chinese or Russian military cyberbrigades. Investors should have no doubt about the ability of a number of foreign cyberwarfare units to close or disrupt major stock exchanges in the United States and elsewhere. Read the rest on Fortune.com.
Stealthy Startup Anchore Says it Can Build Safer Software by Barb Darrow
The Architect of China's Great Firewall Was Himself Blocked by the Firewall by Charlie Campbell
This Hacker Found a Way to Get Free Domino's Pizza For Life by Robert Hackett
Justice Department Says Retweets Are Endorsements in Terrorism Case by Jeff John Roberts
Intel Bulking Up Safety and Security of Self-Driving Car Efforts by Aaron Pressman
TSA Paid Big For an App to Get You Through Security by Don Reisinger
Maryland Court Says Phone Tracking Unconstitutional by David Z. Morris
Taliban Launches Smartphone App to Recruit and Spread Propaganda by David Z. Morris
Sen. Al Franken Takes Aim at Oculus Rift Privacy Policies by Chris Morris
ONE MORE THING
What comes after cyberpunk? The neo-noir tropes of techno-futurism have grown stale: shadowy nightscapes and neon-lit metropolises; corrupt corporate villains scheming atop skyscraping towers; hacker-detective antiheroes slumming around rain-soaked sidewalks and gritty nightclubs. "Having been the default vision of the future for nearly 30 years, cyberpunk has arguably become clichéd, even conservative," writes Darran Anderson, author of Imaginary Cities, of the genre. What worlds will the scions of sci-fi dream up next? (Versions/Kill Screen)