
Your data is a weapon that can help change corporate behavior

February 23, 2021, 5:36 PM UTC
[Photo illustration of revolutionary fists in the air holding smartphones. Photo illustration by Getty Images]

This is the web version of Eye on A.I., Fortune’s weekly newsletter covering artificial intelligence and business. To get it delivered weekly to your in-box, sign up here.

When workers get upset with their management, they can go on strike to force change. It turns out the same idea can apply to frustrated consumers, in the form of a so-called data strike against companies they believe abuse their personal data.

A team of Northwestern University researchers recently published a paper about ways the public can take action against companies that they believe misuse their data or act unethically. (The researchers will present their paper at the upcoming Association for Computing Machinery conference on Fairness, Accountability & Transparency in March.)

The paper focuses on the idea of “data leverage,” which refers to the power consumers hold over companies that rely heavily on machine learning. Because companies need consumer data to fuel their machine-learning software, people can exert influence on businesses by altering their online behavior, such as by ceasing to use the software altogether. If people stop using a certain A.I.-powered app, for instance, the app will lose the data it needs to learn properly, explained Nicholas Vincent, a Northwestern University graduate student who was one of the paper’s authors.

At the more extreme end of data leverage, consumers can also coordinate with one another to influence how an A.I.-powered app learns, a technique the researchers refer to as “data poisoning.” As the paper explains: Someone who dislikes pop music might use an online music platform to play a playlist of pop music when they step away from their device, with the intention of “tricking” a recommender system into using their data to recommend pop music to similar pop-hating users.

While this idea may “sound silly,” Vincent said, it could be a way for people to voice their concerns about recommendation systems, like YouTube’s, that point people toward extremist content. When a machine-learning system is fed such unusual behavior, its overall performance weakens because it’s trained on data that doesn’t reflect people’s true behavior.
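To see why a coordinated data-poisoning campaign works, consider a minimal sketch. The toy co-occurrence recommender, user names, and listening data below are all invented for illustration; they are not from the Northwestern paper, which does not prescribe a specific algorithm.

```python
# Toy sketch of "data poisoning" against a simple co-occurrence recommender.
# All names and data here are hypothetical, for illustration only.
from collections import Counter

def recommend(history_by_user, target_user):
    """Recommend the item most often played by users who share the target's taste."""
    target_items = history_by_user[target_user]
    scores = Counter()
    for user, items in history_by_user.items():
        if user == target_user:
            continue
        if target_items & items:            # overlapping taste with the target
            for item in items - target_items:
                scores[item] += 1           # each similar user "votes" for an item
    return scores.most_common(1)[0][0] if scores else None

# Honest listening histories: these indie fans never play pop.
histories = {
    "alice": {"indie_1", "indie_2"},
    "bob":   {"indie_1", "indie_3"},
}
print(recommend(histories, "alice"))  # indie_3, learned from bob's honest history

# A coordinated campaign: accounts sharing alice's taste deliberately
# stream a pop track on a loop while away from the device.
for i in range(3):
    histories[f"poisoner_{i}"] = {"indie_1", "pop_hit"}
print(recommend(histories, "alice"))  # pop_hit now outvotes indie_3
```

The point of the sketch is that the recommender has no way to distinguish sincere plays from performative ones: a handful of coordinated accounts is enough to flip what gets recommended to similar users.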

Vincent acknowledged that there’s a fine line between this form of data activism and more nefarious acts, such as when Internet trolls caused Microsoft’s infamous Tay chatbot to spew offensive phrases based on their interactions.

But one of the researchers’ main points is to highlight to the public that their online behaviors heavily influence the A.I. systems of powerful tech companies.

“We are not trying to lead [people] to complete data anarchy,” Vincent said.

There’s a prevailing assumption that companies like Google and Facebook are solely responsible for developing the A.I. systems that power their respective search engine or social media service, he said. People should realize, however, that these A.I. technologies derive their capabilities from the labor of users whose online behaviors help improve the software each day.

“We should say that we made all these things, because we did,” Vincent said.

Jonathan Vanian
jonathan.vanian@fortune.com
@jonathanvanian

A.I. IN THE NEWS

Google fires another prominent A.I. ethics researcher, reorganizes its ethical A.I. efforts. The company fired A.I. ethics researcher Margaret Mitchell, saying that she had breached cybersecurity policies by removing confidential data from its corporate network. But her dismissal will raise further questions about the company's heavy-handed tactics in trying to quash dissent and activism within its ranks. Mitchell, who co-led the A.I. ethics research team within Google Brain, the company's A.I. research division, had been a strong supporter of her fellow research team co-lead Timnit Gebru, who left in late November. Gebru, who was one of the few Black A.I. researchers at the company, had objected to Google's decision to withdraw from publication a paper she and members of her team had co-written that raised questions about the ethics of large A.I. language models, including those Google had developed. Gebru had also criticized the company's commitment to racial and gender diversity on an internal message board. Thousands of people, both inside and outside of Google, had signed an open letter protesting Google's handling of the situation. Sundar Pichai, CEO of Google and its parent company Alphabet, had promised an investigation into the incident and acknowledged it had left employees questioning the company's record on diversity and inclusion. Just days before Mitchell was fired, the company announced a reorganization of all of its ethical A.I. initiatives, placing them under the supervision of Marian Croak, one of Google's most senior Black engineers but not an expert on A.I. ethics. It also announced that senior managers would now be assessed on how well they are doing in hitting the company's diversity and inclusion targets.

IBM is looking to sell its Watson Health A.I. unit. That's according to a report in The Wall Street Journal, which cited sources familiar with the sale process, saying the company was considering private equity buyers or possibly spinning off the unit as a separate public entity through a merger with a special purpose acquisition company (SPAC). Watson Health had been the jewel in IBM's Watson cognitive computing empire, but the company's technology had struggled to live up to expectations and highly touted claims—for instance, its supposed ability to help select the best cancer treatments for individual patients or to discover new drugs—during real-world deployments. The Journal said IBM Watson Health has about $1 billion in annual revenue, but is not profitable. The possible sale of the unit is part of new IBM CEO Arvind Krishna's efforts to trim the company down and return it to growth. Krishna has said he is counting on A.I., along with IBM's cloud computing business, to deliver on that strategy. But the retrenchment from A.I. in healthcare is a worrying sign, both for Big Blue and for A.I.'s big business aspirations more generally.

Postmates delivery people are falling prey to scams in a trend that shows the dangers of management by algorithm. The Markup has a big report out about delivery people for Postmates falling victim to sophisticated scams in which they are tricked into revealing their Postmates passwords, after which their accounts are drained of their pay. The fact that Postmates treats its delivery people as freelance gig workers, and that its delivery system is highly automated with no face-to-face contact with managers, makes the scams easier to pull off in some ways. Postmates delivery people say the company has done little to address the issue.

Waymo extends its fully autonomous taxi service to its own employees in San Francisco. The Alphabet-owned company said it was starting to allow volunteers from its own workforce to book rides in its fully autonomous vehicles in the City by the Bay. In a blog post, it detailed the ways it has had to improve its self-driving technology, including upgrades to its lidar and the A.I.-based software used to reason about road conditions, to handle the rigors of city driving. Previously, the company had only offered fully self-driving taxis (with no safety drivers) in a limited area of Phoenix, Arizona.

EYE ON A.I. TALENT

Twitter has hired Rumman Chowdhury to be its director of machine learning ethics, transparency & accountability, according to tweets from both Chowdhury and Twitter exec Ariadna Font Llitjós.

Optibrium, a Cambridge, U.K.-based company that makes A.I.-powered software for drug discovery, said it has appointed Rae Lawrence as director of software development. She had previously been head of informatics and modeling at Cancer Research UK's Manchester Institute.

Google has named Marian Croak, a vice president at the company, to oversee its research on responsible A.I., supervising researchers and engineers working across 10 different teams, Google said in a blog post announcing her new role. Croak, who is an expert in voice over Internet protocol (VoIP) technology, has most recently spent several years helping Google install public wi-fi throughout India's vast railroad network.

EYE ON A.I. RESEARCH

An A.I. coach for the operating room. Much of modern life and business is conducted in teams. And within any team, not all team members may have exactly the same idea about what the team's goals are, or should be, and what needs to be done at any given moment to achieve them. This, of course, can lead to friction among the team and less than optimal performance. In some safety-critical situations—such as an emergency on an airplane or a delicate heart operation—this "misalignment of mental models," as cognitive scientists term it, can cost lives.

A team of researchers from Rice University, MIT, Harvard Medical School and the U.S. Department of Veterans Affairs recently proposed using an A.I. "coach" to assess how well aligned the mental models of team members are and to prompt the team when there is a potentially dangerous misalignment. In a paper published on the non-peer-reviewed research repository arxiv.org, the scientists described their experiments with such a system in two simulations of important tasks during an open-heart operation, including one in which an anesthesiologist must administer protamine, a drug that restores normal blood clotting. This should be done incrementally over a period of time to avoid a potentially lethal allergic reaction that some patients have if the drug is given all at once. But anesthesiologists, due to miscommunication with the rest of the surgical team, often fail to administer it properly.

The other simulation had to do with whether a scrub nurse delivered the correct surgical tools to the operating room before the operation. In both cases, the researchers found, it was possible to train an A.I. system that could correctly identify dangerous "mental misalignments" about 76% of the time. (In the protamine task, the system produced some false positives; its overall accuracy in determining whether the team members were aligned or misaligned was 66%. But as the researchers point out, the consequences of a false positive are mild compared with the potentially life-threatening ones of a false negative.)
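The researchers' point about false positives versus false negatives is the standard precision/recall tradeoff: two detectors with identical accuracy can differ sharply in how many dangerous misses they allow. A minimal sketch of that arithmetic follows; the confusion-matrix counts are invented for illustration and are not the paper's results.

```python
# Hypothetical confusion-matrix counts for a "misalignment detector."
# tp = real misalignments flagged, fp = false alarms,
# fn = real misalignments missed, tn = correctly ignored.
def metrics(tp, fp, fn, tn):
    total = tp + fp + fn + tn
    return {
        "accuracy":  (tp + tn) / total,
        "recall":    tp / (tp + fn),   # share of real misalignments caught
        "precision": tp / (tp + fp),   # share of alerts that were real
    }

# A cautious detector: many false alarms, but almost no misses.
cautious = metrics(tp=38, fp=24, fn=2, tn=36)

# A lax detector: the same overall accuracy, but many dangerous misses.
lax = metrics(tp=25, fp=11, fn=15, tn=49)

print(cautious)  # accuracy 0.74, recall 0.95
print(lax)       # accuracy 0.74, recall 0.625
```

Both detectors score 74% accuracy, yet the cautious one misses only 2 of 40 real misalignments while the lax one misses 15. In an operating room, where a miss can be fatal, the cautious detector's extra false alarms are the safer tradeoff—which is why the paper treats 66% overall accuracy with few false negatives as acceptable.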

FORTUNE ON A.I.

Google fires another co-lead of its A.I. ethics research group—by Jeremy Kahn

The Black women making tech more equal—by Fortune Editors

As mutant COVID variants multiply, the hunt is on for a ‘universal’ kill-all vaccine—by Jeremy Kahn

Virtual interior design services go mainstream as the boundaries between work and home continue to blur—by Stephanie Cain

BRAIN FOOD

Is there a huge ethical morass lurking behind some of Google's biggest A.I. breakthroughs? Some believe it was the fact that Google's A.I. ethics research team was starting to question the ethics and fairness of Google's own systems that got the co-leads of that team, Timnit Gebru and Margaret Mitchell, in hot water with company executives. But now some outside researchers are also raising critical questions about the proprietary datasets that underpin a number of research breakthroughs Google scientists have made in recent years.

Vinay Prabhu, the chief scientist at UnifyID, an authentication software company, and Abeba Birhane, a graduate student in cognitive science at University College Dublin, recently tried to investigate JFT-300M, a mysterious Google dataset that has been used for a number of seminal computer vision and neural network research papers since 2015. The dataset is not well explained in most of the papers, and "Google has never published an audit of this dataset," Prabhu says in a YouTube video he posted explaining his investigation of JFT-300M.

He says that even within the company, discussion of the dataset seems to be somewhat "hush hush." Prabhu said it took him quite a lot of digging to even discover what JFT stands for—Joint Photo Tree is the apparent answer. In 2015, the dataset contained at least 100 million images in 50,000 categories. It has since been expanded to include at least 300 million images. A companion extended image database called EFT (for either Extended Foto Tree or Entity Foto Tree, according to Prabhu's various sources) brings the total up to at least 400 million images in over 100,000 classes. Google has since used a portion of these images to create its very large Open Images Dataset, which it has made available to other A.I. researchers around the world.

Prabhu and Birhane found that this massive, opaque dataset almost certainly contains what is called "nonconsensual imagery," i.e. photos in which those pictured have not given their consent for their image to be used in A.I. research. JFT-300M contained, for instance, photos of people's bridal and baby showers, apparently scraped from the photo sharing site Flickr. Troublingly, the dataset's swimwear category contained many images of children in bathing suits. When Prabhu and Birhane contacted some of the people shown in the pictures—and the grandparents of some of the children depicted—they were unhappy to discover their photos had been used in this manner. The photo subjects, or those who took the photos, said they never knowingly gave their consent to have the photos used, although Flickr did in some cases default users to a "creative commons" license that would allow certain uses of the images. This was buried in small-print legalese. You can learn more in Prabhu's Twitter thread on his investigation.

Even if Google hasn't violated any law here, Prabhu says, Google may be using ethically dubious methods. And because Google has open-sourced some of the models it has created with this dataset or sells cloud-based APIs for systems trained on this data, it is possible other corporations could become culpable too.