
Can A.I. bring back the three-martini lunch?

February 2, 2021, 5:05 PM UTC

This is the web version of Eye on A.I., Fortune’s weekly newsletter covering artificial intelligence and business. To get it delivered weekly to your in-box, sign up here.

Imagine the life of an advertising executive, and a scene from Mad Men is likely to come to mind: Don Draper snake-charming a pair of Kodak marketing executives with a perfectly crafted pitch about the emotional pull of nostalgia (“It’s delicate, but potent…”) in order to win the account for their new slide projector. “This device isn’t a spaceship,” Draper tells the entranced Kodak men of their slide carousel in one famous pitch from the television show. “It’s a time machine.”

Well, it turns out, those days have mostly gone the way of three-martini lunches, skinny ties, smoking in the office, and widely tolerated workplace sexual harassment. In the digital era, instead of a high-stakes, high-wire act focused on high concepts, advertising has largely been reduced to a volume game. Marketing departments or creative agencies have to churn out dozens or hundreds of variations of digital ads for Facebook, Instagram, or web banners, each with slightly different imagery, display copy, and calls to action, and then conduct a series of A/B experiments to figure out what works for a particular target audience. It’s a slog.

A few weeks ago, I wrote about one company trying to use machine learning to take a bit of the drudgery out of this work, helping to automate the testing of different ads. Today, I want to talk about another: Pencil, a startup that is actually using A.I. to create the ads themselves. Based in Singapore, but with employees working remotely across the globe, Pencil automatically generates dozens of six-, 10-, or 15-second Facebook video advertisements in minutes.

“The ad industry has been moving from big ideas to small ideas,” Will Hanschell, Pencil’s co-founder and chief executive officer, tells me. “Instead of a Super Bowl ad, a multi-million dollar blowout once a year, it is increasingly about very small, online ads. And in that environment, you have to run 10 ads and throw out the nine that don’t work and start again with another 10. That has made the job unfun for a lot of creative people.”

Pencil hopes it can free up these creative folks to work on the big picture while A.I. does the rest. “It cuts videos into scenes, generates copy, applies animations and then uses a predictive system that looks at variety and tries to determine what feels most on-brand and looks similar to things that have worked in the past for the brand,” Hanschell says.

A company gives Pencil’s software the URL of its website, and that software automatically grabs the logos, fonts, colors and other “brand image information” found there to use in a business’s ads. It can use images from the website or a business can choose to provide the system additional images or video. It uses sophisticated computer vision to understand what is happening in an image or a video so that it can match that to ad copy. To write the copy itself, Pencil uses GPT-3, the ultra-large natural language processing A.I. built by OpenAI, the San Francisco A.I. research firm.

Hanschell says that when Pencil started out, using GPT-3’s predecessor, GPT-2, the ad copy it generated was usable only 60% of the time. Now, with GPT-3 and better understanding of how to use the existing web copy to prompt the system, Hanschell says the system generates usable copy 95% of the time. What’s more, the system can actually generate novel ideas, he says. For instance, for a company that sells protein powder, the system can come up with ideas around energy, but it can also come up with ideas about the morning ritual or fitness, he says.
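Pencil hasn’t published the details of its prompting approach, but the general pattern Hanschell describes—priming a language model with a brand’s existing copy so its completions stay in the same voice—can be sketched roughly as follows. The function, brand, and taglines below are illustrative inventions, not Pencil’s actual code or clients:

```python
# Illustrative sketch of few-shot prompting for ad copy.
# The prompt format and brand data are hypothetical, not Pencil's.

def build_ad_prompt(brand_name, product, example_taglines):
    """Prime a language model with existing brand copy so that its
    completion continues the list in the same voice."""
    lines = [f"Short ad taglines for {brand_name}'s {product}:"]
    for tagline in example_taglines:
        lines.append(f"- {tagline}")
    lines.append("-")  # the model is left to complete the next tagline
    return "\n".join(lines)

prompt = build_ad_prompt(
    "Acme Eyewear",
    "designer frames",
    ["Your frames, your way", "See the world in style"],
)

# This string would then be sent to a text-completion model such as
# GPT-3; the returned completion becomes a candidate tagline.
print(prompt)
```

Seeding the prompt with copy scraped from the brand’s own website is one plausible way a system like this could lift the share of usable generations, since the model is imitating an existing voice rather than inventing one from scratch.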

I watched a demo of Pencil’s software in which it created a series of Facebook ads for an eyeglasses company. It came up with the tagline, “Your frames, your way,” as well as, “Your wildest looks, perfectly crafted,” each paired with appropriate still images. Not exactly Don Draper. But not bad. And as Hanschell points out, in the volume game of today’s digital advertising jungle, plenty good enough to start acquiring customers.

What’s more, the system can predict how well a particular ad will perform compared to what the company has run in the past. For instance, it forecast that the “Your wildest looks, perfectly crafted” ad would do 55% better than previous ads the same company had run. That’s something most human ad executives can’t do.

Pencil is already being used by about 100 companies, including some big multinationals such as Unilever. It is a good example of a new generation of products—and even whole businesses—that are being made possible by rapid advances in natural language processing, or NLP. (For more on this, check out the latest episode of Fortune’s Brainstorm podcast. Also, last year, my Fortune colleague David Z. Morris wrote about several other companies using A.I. to automatically craft or refine digital ads.)

But at the same time, a growing number of ethical concerns are being raised about these underlying NLP systems. For instance, GPT-3, despite all of its seeming power, still fails simple tests of common-sense reasoning. It also has a problem with bias: Because it was trained on the entirety of the Internet, there’s a good chance it may have picked up a tendency to write sexist or racist prose.

One area where OpenAI itself has already acknowledged a problem: The system can exhibit a clear anti-Islamic bias, with a tendency to depict Muslims as violent. A recent paper by two researchers at Stanford found that in more than 60% of cases, GPT-3 associated Muslims with violence—and that the system was more likely to write about Black people in a negative context.

This led the tech journalist Dave Gershgorn, who covers A.I. for tech site OneZero, to question why OpenAI would allow it to be used in a commercial setting and why OpenAI’s investor and partner, Microsoft, would be incorporating GPT-3’s capabilities into its own products. How broken does an A.I. system have to be, Gershgorn asked, before a tech company decides not to release it?

I asked Hanschell about the problem of potential bias. He noted that OpenAI had developed filters that screened out some of the worst examples. And he said that in Pencil’s case, no ads are ever run without a human approving them first. “One of the principles of this is that we wanted a human to be in control at all times,” he says.

So I guess maybe we can’t get back to those three-martini lunches quite yet. There’s still work for us to do.

With that, here’s the rest of this week’s A.I. news.

Jeremy Kahn


Baidu granted permission to test driverless cars in California. The state has given permission for the Chinese search giant to begin testing a fully autonomous version of its driverless cars in Sunnyvale, California, The Verge reports. The company is the sixth to be granted a license for testing without a safety driver. The state has licensed some 60 companies.

Hartford Financial partners with A.I. startup for instant auto repair quotes. The insurance company is working with London-based A.I. startup Tractable to offer customers a feature that will allow them to upload photos of damage their car has sustained in an accident and receive an instant A.I.-generated appraisal for how much it will cost to repair, The Wall Street Journal reported. Rival insurer USAA has been working on a similar system with Google's Cloud unit.

Orders for factory robots soar. Orders for manufacturing robots were up 64% in the fourth quarter of 2020 compared to a year earlier, Bloomberg News said, as the COVID-19 pandemic has accelerated demand for automation. And for the first time, the Association for Advancing Automation told the news service, car companies were not the top buyers of industrial robots. Instead, orders jumped across a variety of industries, including life sciences, plastics and rubber, and consumer goods and food service companies.

Nvidia accused of cheating on benchmark test for big data computing. The Transaction Processing Performance Council (TPC) has accused chip giant Nvidia of cheating on a key benchmark test that the Council uses to assess high-performance computing clusters, The Register reported. Nvidia had claimed at its GPU Technology Conference last year that its Nvidia DGX A100 system, marketed for many A.I. use cases, scored 19.5 times better than its nearest competitor on the key TPCx-BB benchmark. But now TPC says Nvidia has tweaked computing workloads to bypass constraints in the test. "In effect, they weren’t running the same benchmark, so all corresponding claims are invalid," Michael Majdalany, administrator of the TPC, told The Register. 

Alphabet CEO Pichai says the real value of A.I. may still be decades off. Alphabet and Google CEO Sundar Pichai told the World Economic Forum's virtual Davos Agenda conference that A.I.'s true impact was at least a decade—and maybe more—away. Asked about how artificial intelligence could help with COVID-19 vaccine distribution, Pichai replied, "Today we have tools to help, cloud computing, machine learning and algorithms. But these are still the early days of AI, and the real potential will come into play in 10-20 years."

U.S. military has "a moral imperative" to develop A.I.-enabled weapons, key panel tells Congress. The National Security Commission on Artificial Intelligence issued a report to Congress saying the U.S. should not support international efforts to outlaw lethal autonomous weapons systems, The Guardian reports. Instead, Robert Work, a former deputy secretary of defense and the panel's vice chairman, said that because such A.I.-enabled weapons might make fewer mistakes than human soldiers, and thus reduce civilian casualties, the U.S. military had "a moral imperative" to develop them.

A.I. can spot signs of Alzheimer's and maybe other cognitive conditions years before symptoms become obvious. Researchers at IBM were able to use a machine learning algorithm to spot those who would develop Alzheimer's with 75% accuracy based on subtle changes in the way these people used language, according to a study published in the medical journal The Lancet. According to The New York Times story on the study, the IBM A.I. "identified one group of subjects who were more repetitive in their word usage at that earlier time when all of them were cognitively normal. These subjects also made errors, such as spelling words wrongly or inappropriately capitalizing them, and they used telegraphic language, meaning language that has a simple grammatical structure and is missing subjects and words like 'the,' 'is' and 'are.' "


McLaren, the luxury car maker, technology consulting firm and motor racing group, has promoted Chris Hicks to be its chief information officer, according to Computer Weekly. Hicks, who joined McLaren in January 2020 from market research firm GfK, had been serving as the company's director of technology services.

WorldQuant Predictive LLC, an artificial-intelligence platform company affiliated with the investment firm WorldQuant LLC, hired Dan Wilson as its head of sales, The Wall Street Journal reported. He had previously been head of North American sales for Canadian A.I. firm Element AI, which was recently bought by ServiceNow.


Getting A.I. inside your head. It's called a theory of mind: the ability to know not just what we ourselves are thinking, but also to make reasonable inferences about what other people are thinking. If we want A.I. systems to interact with us—and with one another—effectively and safely, they will need to be able to infer the likely intentions and desires of others too.

Computer scientists have started to investigate this through games that are played in teams where explicit communication between partners is limited, such as bridge. Another game that is becoming a popular research arena is the cooperative card game Hanabi, in which a player can see the other players' hands, but not his or her own.

Now a group of scholars from the U.S. Naval Information Warfare Center Pacific have made some progress toward getting multiple A.I. agents to play Hanabi and use their gameplay to communicate information to another A.I. player on their team, according to a paper published on a research repository. The researchers found that by using a dual-reward system, in which each A.I. agent was given the most points for winning the game, but also given some incentive for helping another player better understand its hidden hand, they were able to achieve results better than prior research. But they found that if the incentive for providing useful hints to other players was weighted too heavily, the A.I. agents' performance dropped, probably because the A.I. agents spent too much time trying to convey information to other players.
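The dual-reward idea is simple enough to sketch in a few lines. The function below is a toy illustration of the trade-off the researchers describe; the weighting constant and score values are invented for the example, not taken from the paper:

```python
# Toy sketch of a dual-reward scheme: each agent's reward blends the
# team's game score with a smaller bonus for informative hints.
# The weight and scores below are illustrative, not the paper's values.

def dual_reward(game_points, hint_information, hint_weight=0.1):
    """Combine game score with an incentive for hints that help
    teammates infer their hidden cards."""
    return game_points + hint_weight * hint_information

# With a modest weight, winning the game still dominates the reward.
balanced = dual_reward(game_points=25.0, hint_information=4.0, hint_weight=0.1)

# If hints are weighted too heavily, an agent can earn a high reward
# while the game itself goes badly, the failure mode the researchers
# observed when communication crowded out play.
hint_obsessed = dual_reward(game_points=10.0, hint_information=4.0, hint_weight=5.0)

print(balanced, hint_obsessed)
```

The second call scores higher than the first despite a much worse game, which is why the hint bonus has to stay small relative to the reward for actually winning.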


Why companies are thinking twice about using artificial intelligence—by Jonathan Vanian

Facebook’s Oversight Board finds that the social network isn’t great at making tough decisions—by Danielle Abril

How digital sommelier Vivino is becoming a Netflix for wine—by Stephanie Cain

Roche to use quantum computing for drug discovery—by Jeremy Kahn

Using human languages to make computers think more like us—by Fortune editors


One problem with many A.I. systems: the assumption that the future can be predicted by looking at the past.

Here's one vast area where this assumption often breaks down: when an A.I. system is being used to classify human behavior for any kind of screening. In these cases, the very deployment of that system will almost certainly change human behavior in ways that will make the system less effective. What's more, the more transparent and explainable that system is, the faster the system's performance will degrade, because the faster people will figure out how to game it. In other words, a lot of classification algorithms for human behavior are inherently adversarial, even if we don't normally think of them that way.

For instance, we might all agree that an algorithm designed to ferret out credit card fraud is adversarial: The criminals want to figure out how to trick the system into letting them use a stolen credit card, for example. But what about sentiment analysis algorithms, the kind of natural language processing system that tries to figure out the sentiment of a given piece of text? That doesn't sound very adversarial, does it?

Well, it turns out that because this kind of A.I. is being used by a lot of investment firms to screen financial filings and the transcripts of companies' earnings calls to see if the news is positive or negative, these systems have actually had the effect of changing the language that company executives use in those contexts, making it much harder to glean a clear signal about a company's performance.

That was the conclusion of a recent working paper from researchers at the business schools of Columbia University and Georgia State University, published by the National Bureau of Economic Research. They found that instances of words that were rated as having a negative sentiment in a finance-specific dictionary have fallen markedly in financial filings in recent years, most likely because companies deliberately avoid using them. What's more, the authors found that executives were even changing their tone of voice and expression on earnings calls in order to be "read" as more positive by A.I.-driven sentiment classifiers.
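The dictionary-based scoring the paper describes is crude enough that it is easy to see why it can be gamed. Here is a toy version; the word lists and sentences are made up for illustration, not drawn from the actual finance dictionary the researchers used:

```python
# Toy dictionary-based sentiment scorer, in the spirit of the
# finance-specific word lists described above. Word lists are
# illustrative only.

NEGATIVE = {"loss", "decline", "impairment", "weak"}
POSITIVE = {"growth", "record", "strong", "improved"}

def sentiment_score(text):
    """Positive-minus-negative word count, a crude proxy for tone."""
    words = text.lower().replace(",", " ").replace(".", " ").split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

blunt = "We reported a loss and a decline in our weak segment."
massaged = "Results were below plan as we reposition for growth."

# The second sentence carries the same bad news but avoids every
# flagged word, so it scores higher. Filings drift toward this kind
# of phrasing once executives know a word counter is reading them.
print(sentiment_score(blunt), sentiment_score(massaged))
```

A classifier this transparent degrades quickly precisely because its word list is, in effect, a published list of terms to avoid.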

The result is ultimately a world where it becomes harder for the sentiment analysis software to work properly. My guess is that this will start to happen in many other walks of life as A.I. classification systems become more ubiquitous. What do you think?