
How Tech Made the Pulitzer Prize-Winning Panama Papers Coverage Possible

By Barb Darrow
May 30, 2017, 1:32 PM ET

Reporters are relying more and more on troves of public—or leaked—data to do their jobs. And they also increasingly depend on technology tools to help organize and sift through all of that information.

Case in point: The Panama Papers, a massive leak of financial records from Panamanian law firm Mossack Fonseca obtained by the German newspaper Süddeutsche Zeitung and shared with the International Consortium of Investigative Journalists (ICIJ). That data led to huge scoops last year by journalists around the world that exposed a network of tax havens used by the rich and powerful in government and private industry. The stories led to the resignation of at least one head of state and embarrassed dozens of others including former U.K. prime minister David Cameron and Russian president Vladimir Putin.

But none of those stories would have appeared without a lot of work preparing the data. This was a mother lode: The Panama Papers comprised some 2.6TB of data and 11.5 million documents about Mossack Fonseca clients, many of whom, it turned out, used the law firm and its affiliates to dodge taxes. NSA whistleblower Edward Snowden, who knows a bit about these things, called it the biggest leak in data journalism history. For context, 2.6TB of data would equal the capacity of about 390 DVDs, a stack of which would be nearly 21 feet high.

The data came into the ICIJ’s possession in dribs and drabs, and in many formats. Much of it was email and PDF files of the sort that are created to be printed out and viewed. That document data is called unstructured because it does not come in the neat rows and columns of traditional databases. For this type of data, the ICIJ team used three free, open-source tools: Tesseract to recognize the text in scanned documents, Apache Tika to extract data from the files, and Apache Solr to index that data and make it searchable.
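To give a sense of what "indexing" means here, the following is a minimal sketch of an inverted index, the general idea behind full-text search engines. It is not Solr's implementation and it does not call Tesseract or Tika. The document ids and text are hypothetical stand-ins for what text recognition and extraction might produce.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents that contain every query word."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


# Hypothetical extracted text; the ids and contents are invented.
documents = {
    "email-001": "Please set up a new shell company in Panama.",
    "scan-002": "Certificate of incorporation for Example Holdings Ltd.",
    "pdf-003": "The shell company Example Holdings Ltd. is owned by J. Doe.",
}

index = build_index(documents)
print(search(index, "shell company"))     # ['email-001', 'pdf-003']
print(search(index, "Example Holdings"))  # ['pdf-003', 'scan-002']
```

Because the index is built once, each search is a handful of set lookups rather than a fresh read of every document, which is what makes millions of files practical to query.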

Biggest leak in the history of data journalism just went live, and it's about corruption. https://t.co/dYNjD6eIeZ pic.twitter.com/638aIu8oSU

— Edward Snowden (@Snowden) April 3, 2016

Much of the Mossack Fonseca data also came from a traditional structured row-and-column database, but it arrived in very raw form, not as the full database files that would normally be shared. It is a bit like receiving a list of letters and words instead of a fully formatted Word document: the information is there but is not all that useful.

Because of that, the ICIJ tech staff had to take the leaked information and rebuild the original SQL database structure it came from. The term SQL, or Structured Query Language, describes how these databases are set up and how users can request information from them.
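As a rough illustration of that rebuilding step, here is a small sketch that takes raw delimited text and recreates tables that can be queried with SQL again. It uses Python's built-in sqlite3 module purely for convenience. The table names, column names, sample rows, and the rule that columns ending in "_id" hold integers are all assumptions made for this example, not details of the ICIJ's actual work.

```python
import csv
import io
import sqlite3

# Hypothetical raw exports: a header line plus rows, with no schema attached.
RAW_COMPANIES = (
    "company_id,name,jurisdiction\n"
    "1,Example Holdings Ltd.,BVI\n"
    "2,Sample Trading S.A.,Panama\n"
)
RAW_OFFICERS = (
    "officer_id,name,company_id,role\n"
    "10,J. Doe,1,shareholder\n"
    "11,A. Smith,1,director\n"
    "12,J. Doe,2,director\n"
)


def load_table(conn, table, raw_text):
    """Recreate one table from raw delimited text and load its rows."""
    reader = csv.reader(io.StringIO(raw_text))
    header = next(reader)
    # Assumed convention for this sketch: "*_id" columns are integers.
    columns = ", ".join(
        f"{name} INTEGER" if name.endswith("_id") else f"{name} TEXT"
        for name in header
    )
    conn.execute(f"CREATE TABLE {table} ({columns})")
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", reader)


conn = sqlite3.connect(":memory:")
load_table(conn, "companies", RAW_COMPANIES)
load_table(conn, "officers", RAW_OFFICERS)

# With the structure restored, an ordinary SQL query works again.
rows = conn.execute(
    """
    SELECT o.name, o.role, c.name
    FROM officers AS o
    JOIN companies AS c ON c.company_id = o.company_id
    WHERE c.name = ?
    ORDER BY o.name
    """,
    ("Example Holdings Ltd.",),
).fetchall()
print(rows)
```

The query at the end is the payoff: once the tables and their shared keys exist again, a question such as "who is attached to this company?" becomes a single statement.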

From there, the ICIJ relied on an open-source version of Talend (TLND) software, a type of tool known to techies as “ETL,” for extract, transform, and load. Talend’s technology let the journalists take the row-and-column data structures they had painstakingly rebuilt and pump them into an open-source Neo4j graph database, which let reporters see onscreen icons representing the people and organizations in the original data.

Talend enabled the team to take structured data from different sources and automate the process of putting it all together. “It’s like a recipe. You create a job and get three columns of data from this source, and two from this source, intermix them in SQL form,” Mar Cabra, editor of the ICIJ’s Data & Research Unit, told Fortune. Without a tool like Talend, the team would have had to write a ton of software code to do that.
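The "recipe" Cabra describes can be sketched in a few lines of plain Python. This is only an illustration of the extract, transform, and load pattern, not Talend's interface. The source rows, the column choices, the "OFFICER_OF" relationship label, and the shortcut of treating identical names as the same person are all hypothetical.

```python
# Hypothetical source tables, as lists of row dictionaries.
companies = [
    {"company_id": 1, "name": "Example Holdings Ltd.",
     "jurisdiction": "BVI", "status": "active"},
    {"company_id": 2, "name": "Sample Trading S.A.",
     "jurisdiction": "Panama", "status": "dissolved"},
]
officers = [
    {"officer_id": 10, "name": "J. Doe", "company_id": 1, "role": "shareholder"},
    {"officer_id": 11, "name": "A. Smith", "company_id": 1, "role": "director"},
    {"officer_id": 12, "name": "J. Doe", "company_id": 2, "role": "director"},
]


def extract(rows, columns):
    """Extract: keep only the named columns from each row."""
    return [{col: row[col] for col in columns} for row in rows]


def transform(company_rows, officer_rows):
    """Transform: turn rows into graph nodes and relationships."""
    names = {row["company_id"]: row["name"] for row in company_rows}
    nodes = {("Company", row["name"]) for row in company_rows}
    edges = []
    for row in officer_rows:
        # Naive assumption: identical names refer to the same person.
        nodes.add(("Person", row["name"]))
        edges.append((row["name"], "OFFICER_OF", names[row["company_id"]]))
    return nodes, edges


def load(nodes, edges):
    """Load: a real job would write to a graph database; this builds a dict."""
    return {"nodes": sorted(nodes), "edges": edges}


# Three columns from one source and two from the other, then intermixed.
company_rows = extract(companies, ["company_id", "name", "jurisdiction"])
officer_rows = extract(officers, ["name", "company_id"])
graph = load(*transform(company_rows, officer_rows))

print(len(graph["nodes"]), "nodes,", len(graph["edges"]), "relationships")
```

An ETL tool lets a team define jobs like this once and rerun them as new data arrives, instead of maintaining hand-written scripts for every source.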

The moments you win a #pulitzer prize – in @ICIJorg office in Washington D.C. What a day. What a year. Long live collaborative journalism!! pic.twitter.com/iE0TFgvP3o

— Bastian Obermayer (@b_obermayer) April 10, 2017

The next step was to use a commercial product called Linkurious, which works with Neo4j to visualize the relationships between the people and organizations mentioned in the data. It creates a sort of interactive flow chart that lets users click on one party to see who that party is connected to, based on the Mossack Fonseca data.

If that data were left in SQL form, finding relationships between people and organizations would require writing long and complicated database queries, Cabra said. “In a graph database, if a company is connected to you, and you are connected to other companies, reporters can follow that thread,” she added.
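A small sketch shows why "following the thread" is natural in a graph. The code below stores relationships as a plain adjacency map and walks outward from one party, hop by hop. It is a generic breadth-first search, not Neo4j's query language or Linkurious, and the names and links are invented for illustration.

```python
from collections import defaultdict, deque

# Hypothetical relationships between people and companies.
edges = [
    ("J. Doe", "Example Holdings Ltd."),
    ("A. Smith", "Example Holdings Ltd."),
    ("J. Doe", "Sample Trading S.A."),
    ("B. Jones", "Sample Trading S.A."),
    ("C. Brown", "Unrelated Corp."),
]


def build_graph(edge_list):
    """Store each relationship in both directions."""
    graph = defaultdict(set)
    for a, b in edge_list:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def connections(graph, start, max_hops):
    """Return {party: hops} for every party within max_hops of start."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in sorted(graph[node]):
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    del seen[start]
    return seen


graph = build_graph(edges)
print(connections(graph, "A. Smith", 3))
# {'Example Holdings Ltd.': 1, 'J. Doe': 2, 'Sample Trading S.A.': 3}
```

In a row-and-column database, each extra hop typically means another join in the query, so the query grows with the length of the chain. In a graph, going one hop further only means raising the hop limit.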

At that point the Mossack Fonseca data trove could be shared among authorized reporters in Germany, the U.S., Spain, and elsewhere. Each reporting team could run its own queries, track down its own leads, and do its own reporting. All of the preparation work described above ensured that everyone was working from a single source of Mossack Fonseca data.

In April of this year, after more than a year of work and a slew of articles, ICIJ members including Süddeutsche Zeitung and the Miami Herald were awarded the Pulitzer Prize for Explanatory Reporting by Columbia University.
