
Why teaching A.I. to read is a lifelong endeavor

By Jonathan Vanian
October 27, 2020, 12:34 PM ET

It’s not just tech giants that are using artificial intelligence to understand human language, so that products like digital assistants can respond to basic questions.

More conventional businesses are also increasingly using a subset of A.I. called natural language processing (NLP) to create more powerful software to help answer basic customer call center queries or create summaries of long, complicated documents. 

LexisNexis, for instance, has been using NLP to improve the legal research software that lawyers, journalists, and analysts use to find relevant court documents. It’s light years ahead of the user-unfriendly Boolean search system that I regularly used over a decade ago as a cub reporter.

With A.I., LexisNexis’ search interface is more intuitive. That’s partly because the company used Google’s free, open-source language model BERT as the foundation. The BERT model, trained on a vast amount of web data including Wikipedia pages, helps software better understand how some words mean different things depending on the context in which they appear.  

But LexisNexis can’t use BERT for all of its language needs because the company deals with information that is specific to the legal industry. This particular data can’t be found on the open web, which means the information doesn’t come baked into BERT.

Min Chen, vice president and chief technology officer for the LexisNexis Asia-Pacific and global search team, said that BERT “provides a good base model to start with.” But the company must fine-tune the technology with additional legal data so that it understands the language of the law.

This fine-tuning is increasingly common for many companies operating in areas like finance or healthcare. Every industry has its own lingo that makes no sense in another context.

Chen said it took LexisNexis 12 months to train a version of BERT that understands case citations and even Latin. If someone wants to find a document showing that a case has been adjudicated, or closed, the technology knows to look for documents with the Latin term res judicata (claim preclusion, or a matter decided). 

As Amanda Stent, an NLP expert for financial news and information service Bloomberg, explained, technologies like BERT are important because they remove a lot of the grunt work required to train a language model from scratch. For a 10-word sentence, Stent said, “the combinations [of words] are astronomical,” and having a powerful language model like BERT as a starting point is very helpful.
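To put Stent’s “astronomical” in rough numbers: if each word in a sentence can be any of V words, there are V to the 10th power possible 10-word sequences. The short Python sketch below assumes a hypothetical vocabulary of 30,000 words (roughly the size of the word-piece vocabulary used by the original English BERT models) and counts every ordered sequence, grammatical or not. It is back-of-the-envelope arithmetic, not how Bloomberg or LexisNexis measure anything.

```python
# Illustrative arithmetic only: VOCAB_SIZE is a hypothetical assumption,
# not a figure from Bloomberg or LexisNexis.
VOCAB_SIZE = 30_000
SENTENCE_LENGTH = 10


def sequence_count(vocab_size: int, length: int) -> int:
    """Count ordered word sequences of a given length, repeats allowed."""
    return vocab_size ** length


if __name__ == "__main__":
    total = sequence_count(VOCAB_SIZE, SENTENCE_LENGTH)
    # Python ints are exact; the .2e format only rounds for display.
    print(f"{total:.2e} possible 10-word sequences")
```

Under that assumption the count is about 5.9 × 10^44, which helps explain why Stent sees a pretrained model as such a useful starting point.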

But as other A.I. researchers have pointed out, because language models are typically trained on Internet data, they sometimes parrot back the offensive text they’ve scanned. You’ll be happy to know that companies can take precautions to make this less likely.

Stent and her colleagues recently published a set of best practices that companies can follow when training A.I.-powered language models and other machine learning systems. They recommended using human subject-matter experts to help annotate and label the text used for training (to ensure the data is labeled accurately) and having product managers and engineers coordinate on big projects (so that problems don’t slip through the cracks).

The goal is to eliminate any problems before companies introduce new products. After all, no user wants to be bombarded with vile language.  

One thing companies should be prepared for is that data training projects are never done. There’s always room for improvement. 

Said Stent, “It never stops.”

Jonathan Vanian 
@JonathanVanian
jonathan.vanian@fortune.com

A.I. IN THE NEWS

Speaking of NLP. Eye on A.I.’s Jeremy Kahn takes a look at AI21 Labs, an NLP-focused startup founded by prominent machine learning researchers that aims “to fundamentally transform how we read and write.” Kahn writes that, unlike other language models such as OpenAI’s GPT-3, the startup’s “system is a fusion between neural network-based language models and an older form of artificial intelligence that seeks to represent human knowledge, like vocabulary and the meaning of words, in a graph structure.”

Enter the A.I. Threat Matrix. The nonprofit, security-focused MITRE Corporation, Microsoft, IBM, Nvidia, Bosch, and a host of other companies teamed up to release the Adversarial ML Threat Matrix, which VentureBeat described as “an industry-focused open framework designed to help security analysts to detect, respond to, and remediate threats against machine learning systems.” The goal is to help companies better secure their machine learning systems by understanding the many ways hackers can attack modern A.I. software. The authors of the threat matrix said via GitHub, “Data can be weaponized in new ways which requires an extension of how we model cyber adversary behavior, to reflect emerging threat vectors and the rapidly evolving adversarial machine learning attack lifecycle.”

How to bring “dead languages” back to life. MIT researchers are using machine learning to “automatically decipher lost languages that can no longer be understood,” technology publication CNET reported. The researchers created an algorithm that analyzes patterns in how languages evolve over time to help decipher forgotten languages. From the report: “Going forward, the team hopes to expand its work to identify the semantic meaning of words, even if they're not readable yet. It ultimately hopes to be able to resurrect lost languages using just a few thousand words.”

The FDA sounds the A.I. bias alarms. Bakul Patel, the director of the U.S. Food and Drug Administration’s new Digital Health Center of Excellence, explained during an online meeting how biased and unclean data could cause machine learning software to misfire and “negatively impact patient care,” industry publication MedTech Dive reported. “We don't want to set up a system and we would not want to figure out after the product is out in the market that it is missing a certain type of population or demographic or other aspects that we would have accidentally not realized," Patel said.

EYE ON A.I. TALENT

Censia has picked Deborah Leff to join the enterprise software startup’s board. Leff was previously the global leader and industry chief technology officer for data science and A.I. at IBM.

Nautilus hired Garry Wiseman to be the fitness company’s senior vice president and chief digital officer. Wiseman was previously the senior vice president of digital customer experience for Dell Technologies.

EYE ON A.I. RESEARCH

When auditing A.I. research, look at the conferences. Technology analysis website TechTalks looks into a recent research paper describing the review process that researchers face when submitting their papers to the International Conference on Learning Representations (ICLR). The paper’s authors, who remain anonymous, claim to have found problems with the submission process, including “evidence for a gender gap, with female authors receiving lower scores, lower acceptance rates, and fewer citations per paper than their male counterparts.”

As TechTalks notes, the paper identifies several other instances of bias, including conference organizers showing “significant preference for Carnegie Mellon, MIT, and Cornell universities.” Researchers who posted their papers on the popular arXiv preprint server before submission also fared better, especially if they came from those top-tier universities.

From TechTalks:

Interestingly, their research did not find a significant bias toward large tech companies such as Google, Facebook, and Microsoft, which house reputable AI researchers. At first glance, this is a positive finding, because big tech already has a vast influence over commercial AI and, by extension, on AI research.

But as other authors have pointed out, the same academic institutions that are very well represented at AI conferences serve as talent pools for big tech companies and receive much of their funding from those same organizations. So this just creates a feedback loop of a narrow group of people promoting each other’s work and hiring each other at the expense of others.

FORTUNE ON A.I.

Former Facebook employee’s new book exposes Big Tech’s dirty secrets—By Danielle Abril

Startup cofounded by A.I. heavy hitters debuts editing tool it hopes will ‘transform writing’—By Jeremy Kahn

Is it time for a new agency to oversee Big Tech? Many say yes—By Jeff John Roberts

Here’s what Amazon’s new Echo speakers are like—By Jonathan Vanian

How Lyft became the company with nine lives—By Beth Kowitt

A possible semiconductor shortage looms over Huawei’s new smartphone launch—By Naomi Xu Elegant

BRAIN FOOD

A.I. takes to space. Researchers have found machine learning to be an excellent tool for analyzing space data. In 2017, for instance, NASA and Google used neural networks to comb through data captured by the Kepler space telescope and uncovered a couple of planets far outside our solar system. More recently, researchers from NASA’s Jet Propulsion Laboratory have used machine learning to identify recently formed craters on the surface of Mars. Space.com reports:

Scientists have fed the algorithm more than 112,000 images taken by the Context Camera on NASA's Mars Reconnaissance Orbiter (MRO). The program is designed to scan the photos for changes to Martian surface features that are indicative of new craters. In the case of the algorithm's first batch of finds, scientists think these craters formed from a meteor impact between March 2010 and May 2012. 

About the Author
By Jonathan Vanian

Jonathan Vanian is a former Fortune reporter. He covered business technology, cybersecurity, artificial intelligence, data privacy, and other topics.


© 2026 Fortune Media IP Limited. All Rights Reserved.
FORTUNE is a trademark of Fortune Media IP Limited, registered in the U.S. and other countries.