How A.I. could help solve science's 'reproducibility' crisis

Researchers often have trouble reproducing, or verifying, supposedly groundbreaking work described in scientific papers, raising questions about whether the findings in studies are genuine.

Over the past few years, researchers have increasingly sounded the alarm about this so-called reproducibility (or replication) crisis, concerned that scholars routinely overstate their findings. The problem is particularly acute in artificial intelligence, where researchers have published a number of non-peer-reviewed papers on topics such as speech recognition and medical diagnosis whose results others have been unable to replicate.

Now, a team from Northwestern University’s Kellogg School of Management and its Institute on Complex Systems has published a paper describing a deep-learning system that, it claims, can predict whether a given paper’s findings are likely to replicate. The paper, published Monday in the Proceedings of the National Academy of Sciences of the United States of America, is noteworthy because the system could help companies, universities, and other groups screen thousands of research papers and surface the studies most likely to be reliable.

Brian Uzzi, a Kellogg School of Management professor and one of the paper’s authors, said his team wants to test the technology on the thousands of published coronavirus-related studies that scientists hope will lead to a greater understanding of the virus. He’s concerned about a possible influx of low-quality coronavirus studies being rushed to publication.

“We want to begin to apply this to the COVID issue—an issue right now where a lot of things are becoming lax, and we need to build on a very strong foundation of prior work,” Uzzi said. “It’s unclear what prior work is going to be replicated or not and we don’t have time for replications.”

What’s unusual is that the Northwestern A.I. system doesn’t analyze the empirical or mathematical evidence that researchers typically present in their papers to support their theses. Instead, it uses advances in natural-language processing to analyze the text of a paper, picking up on subtle cues in the writing that lead the software to flag some papers as more likely to replicate than others.

Some of the paper’s ideas draw on psychology research from the 1960s that “found that people often reveal in the words they choose the level of confidence they have in what they’re saying,” Uzzi said. The same may apply to authors of scientific papers, who may unintentionally reveal their confidence in their findings through the language they use.

For the paper, the researchers trained the software in two stages, using two different datasets of academic papers. They first trained the system on two million scientific abstracts so that it could discover patterns and relationships among the words used in academic papers. They then trained it on papers taken from the “Reproducibility Project: Psychology,” an initiative in which other researchers manually repeated the studies described in published psychology papers to determine whether their findings could be confirmed.

When the researchers then tested their newly trained A.I. system on hundreds of other papers from fields such as economics and the social sciences, they found that the machine-learning model outperformed more traditional statistical techniques for predicting whether a paper’s findings could be reproduced.

Despite the technology’s promising results, however, Uzzi said his team is still unsure exactly what their A.I. system learned about the language that makes certain academic papers more likely to replicate. This reflects a broader problem in A.I.: researchers often struggle to explain exactly how neural networks reach their conclusions.

As a result, other researchers may be apprehensive about using the technology as a screening tool. Still, Uzzi is hopeful that the technology can eventually be used to help other researchers.

For future work, Uzzi said he would like to apply some of the natural-language processing techniques his team developed to the analysis of corporate earnings calls. His team has already compiled transcripts of 30,000 earnings calls as a dataset for the project, which he hopes can help researchers understand “what it is about earnings calls that help us get more accurate predictions on what the company will do in the future.”

If the research is successful, investors and analysts could use it as a financial forecasting tool.

“What we want to do is use the machine to help us predict whether the earnings call is going to lead to analysts giving a thumbs up or thumbs down to a company, like a buy or sell recommendation,” Uzzi said.
