A few years ago, after a hideously bad auto repair experience, I posted a negative rating on Yelp. Soon, though, the rating disappeared from the first page as positive ratings poured in. That experience made me suspicious about what has come to be called the reputation, or ratings, economy.
Over the ensuing years, ratings and ratings websites have proliferated. Everyone and everything now gets rated, from mental health providers to, as Times columnist Maureen Dowd humorously noted when she had trouble getting a ride, Uber passengers.
Curious about whether the ratings game was a good thing, I did a deep dive into this world and quickly discovered many problems with the reputation economy. Here is what I learned.
Michael Fertik, the founder of Reputation.com (originally called ReputationDefender), has built a huge business on the fact that ratings and reputations matter and that most people and companies understand that. His company, started in 2006, “has curated the online reputation of 1.6 million customers who pay … to have their most flattering activities showcased to the world via search engines,” The Guardian reported. A person’s reputation—whether accurate, manufactured, or some combination of the two—can have an impact on job prospects and the ability to raise capital for startups. And people’s social status affects their marriage prospects and partners.
Ratings profoundly affect consumer choice. One survey of more than 1,000 people reported that two-thirds of respondents read online reviews, that 90% of customers who accessed reviews said that their buying decisions were influenced by positive reviews, and that 86% said that negative reviews influenced their choices. The scholarly literature concurs on the importance of consumer ratings. One article noted that “consumer reviews have been shown to predict purchasing decisions … to drive further consumer ratings … and to have more influence than expert reviews.” Moreover, that same piece stated that “sales figures increase as a function of product ratings rather than the quality of the product.”
But are they accurate?
The potent influence of consumer ratings raises the question: how accurate are these ratings that so powerfully affect judgment and decision-making? The answer to this question depends on what you mean by accurate.
Consider three examples that vary both in the importance of selecting the right provider and also in the extent to which there are objective criteria of performance.
There’s probably nothing more important than getting the best possible medical treatment. Medical outcomes, ranging from the degree of improvement in a person’s illness to the frequency of iatrogenic (medical-treatment caused) illness, are observable. You’d expect consumers to be fairly accurate in assessing the quality of the care they receive. But they aren’t.
Consumer’s Checkbook, a membership-subscription organization that operates in several metropolitan areas, including San Francisco, asks consumers to rate primary care physicians. Checkbook also surveys practicing physicians for their nominations of the best doctors in various specialties, including primary care. The organization, which accepts no advertising, also performs its own physician quality ratings.
Of the 104 top-rated primary care doctors as assessed by patients in 2014, just 17 were nominated as the best by their medical peers. And barely 60% of the doctors rated highest by patients were top-rated by Checkbook.
Then there are those ubiquitous teacher ratings, particularly of college professors. For decades, higher education institutions have used student surveys as part of the faculty evaluation process, and now most places mandate end-of-course student evaluations. If, like me, you believe that the fundamental job of a teacher is to teach—to impart knowledge that students learn and retain—as contrasted, for instance, with providing entertainment or becoming students’ best friends, then it seems reasonable to measure accuracy by examining the relationship between teacher ratings and what students learn through an objective measurement.
The good news is that student evaluations of teachers have been collected for a long time, so there are numerous studies of the relationship between those evaluations and learning. The bad news is that student course evaluations bear no relationship to objective measures of what students have learned—a fact that has been known for more than four decades. For instance, one paper, published in 1972, studied 293 undergraduates in a calculus course and found that “Instructors with the lowest subjective ratings received the highest objective scores.” The fact that student ratings offer no valuable insight into how well students learn has not affected the prevalence and use of those ratings.
Restaurant quality and the dining experience are both more subjective and also have fewer consequences than choosing the right doctor or getting a good teacher. Michelin has, since 1926, employed anonymous, knowledgeable, experienced experts to go to cities all over the world and find the very best places to eat. We can compare how Michelin rates restaurants with the same restaurants’ ratings made by the general public on sites such as TripAdvisor.
I selected two cities, San Francisco, near where I live, and Barcelona, a place my wife and I recently visited. I looked at the 2015 Michelin lists of the places that earned stars (in San Francisco, I considered only establishments located in the city itself) and also ratings on TripAdvisor. Here’s what I found.
Barcelona has 21 one- or two-star Michelin restaurants. Of these Michelin-rated establishments, presumably the very best in the city, only one is in TripAdvisor’s top 10, only two are in the top 50, and only seven of the 21 rank in TripAdvisor’s top 100. Nectari, with one Michelin star, ranks 2,262 on TripAdvisor, and Enoteca ranks 1,333.
Diners/raters in San Francisco agree with Michelin only slightly more. Of San Francisco’s 24 Michelin-starred restaurants, one, Gary Danko, is in TripAdvisor’s top 10, and six are in the top 50. However, Coi, one of four places in the entire Bay Area that earned two Michelin stars, ranks just 562 on TripAdvisor.
At least in these three domains, and quite possibly many others, ratings by consumers—of restaurants, academic instruction, or medical services—are largely uncorrelated with expert opinion and with objective measures of performance. This fact, of course, is precisely why companies in the reputation management space can be successful: reputations can be “managed,” in the best and worst senses of that term, regardless of actual quality.
Why ratings encourage the wrong behaviors
Because ratings, and the reputations those ratings create, have economic consequences, there are, unsurprisingly, substantial incentives to game the system. One increasingly common way of gaming the system entails hiring people (or developing software, which is fortunately easier to detect and prevent) to post inauthentic reviews. One study estimated that 16% of the restaurant reviews on Yelp were fraudulent, that fraudulent reviews were more extreme, and that restaurants with weak reputations were more likely to commit review fraud. A 2012 study by IT research firm Gartner estimated that 15% of online reviews were fake. In 2013, New York State’s attorney general “announced a deal with 19 businesses that agreed to stop writing fake reviews.”
Numerous websites pop up (and then disappear) offering to hire people to write positive reviews about you and negative reviews about your competitors. Online purchasing is supposed to give customers access to informative reviews before they make a purchase decision. Maintaining the integrity of these reviews is economically important. Not surprisingly, then, both Amazon.com and Yelp have been increasingly aggressive in their attempts to build algorithms that weed out fake reviews and also to initiate legal action against their perpetrators.
Adi Bittan, a former Stanford MBA student and co-founder of OwnerListens, told me that there were two types of strategies that companies used: “white hat” and “black hat” approaches. “White-hat” strategies entail moves such as figuring out who your most satisfied customers are and then encouraging them—and even making it easier for them—to write reviews on popular websites. “Black-hat” strategies involve disparaging competitors, or maybe even future competitors. In one particularly notorious example, Chicago celebrity chef Graham Elliot’s “highly anticipated and oft-delayed gourmet sandwich/soft serve shop” got a 1-star review on Yelp from a prospective patron who said his “otherwise pleasant walk” was ruined by going to the establishment and finding that it was closed. The café had not even opened its doors for business at that point. Elliot, whose opinions of Yelp are essentially unprintable, took this as an example of just how unreliable such reviews can be.
There are more problems with the reputation economy beyond just manipulated and inaccurate ratings. The prospect of customer reviews can induce behaviors designed to increase customer ratings in ways that are not useful and are sometimes harmful.
Returning to teacher ratings, there is a common belief, supported by at least some evidence, that one way for instructors to achieve higher ratings is to give the students doing the rating higher grades. This belief has contributed to the now-endemic grade inflation in higher education and has made grades less meaningful as indicators of student achievement or ability. Whether higher grades actually produce higher teacher ratings remains unclear, but the belief that they do nonetheless affects instructor behavior.
This behavior is all about reciprocity—I help you out (for instance, by giving you a good grade) and you help me out (for instance, by giving me a high rating)—and the natural human tendency to be nice, along with the associated desire not to be perceived as negative or difficult. These dynamics call into question what happens when, as with teachers or Uber drivers, the counterparties in a transaction rate each other.
An article in TechCrunch noted that eBay dispensed with reciprocal reviews in 2008 and also reported on a study that found that the identical property was rated 14% higher on Airbnb (that uses reciprocal ratings) than on TripAdvisor, which does not. That same piece noted: “People want to look good in social settings in which people’s identities are not anonymous, people tend to shy away from saying bad things because they don’t want to be the one who seems like a constant complainer or never-ending nagger.” The average Uber driver score is too high, according to Bittan, who believes that reciprocal reviews create incentives for being overly positive to get a positive review in return.
And there are more serious problems than just giving higher grades or higher ratings to encourage others to help you out in return. Doctors seeking higher patient ratings are more willing to order (unnecessary) diagnostic tests or to prescribe antibiotics or potent painkillers even when not needed or helpful, particularly if patients request them. In other words, reviews or the prospect of being reviewed changes treatment: “In a 2012 survey by the South Carolina Medical Association, half of the physicians surveyed said that pressure to improve patient satisfaction led them to inappropriately prescribe antibiotics or narcotics.” It would be interesting to see if there is a relationship, both over time and across settings, between the prevalence of patient reviews and the growing problem of opiate abuse.
Is there any way out of this problem?
Cheating, particularly in its extreme or least sophisticated forms, can be detected statistically, albeit imperfectly. Economists Brian Jacob and Steven Levitt, in a famous paper, showed that “unexpected test score fluctuations and suspicious patterns of answers” could be used to detect teachers cheating to artificially raise their students’ scores. As I noted above, Yelp, Amazon, and Google, among others, are all working to eliminate fake reviews, including by building algorithms to flag suspicious activity.
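The flavor of that statistical detection can be sketched in a few lines. This is a toy illustration, not Jacob and Levitt’s actual method: it flags a classroom whose one-year score gain is an outlier and then reverts the next year. All scores, class names, and the 1.5 z-score cutoff are hypothetical.

```python
# Toy sketch of one idea behind statistical cheating detection:
# a suspiciously large one-year test-score gain that evaporates
# the following year. Data and the 1.5 cutoff are hypothetical.
from statistics import mean, stdev

def flag_suspicious(scores_by_class, z_cutoff=1.5):
    """scores_by_class: {class_id: [avg score per year]}."""
    gains = {c: [b - a for a, b in zip(s, s[1:])]
             for c, s in scores_by_class.items()}
    all_gains = [g for gs in gains.values() for g in gs]
    mu, sigma = mean(all_gains), stdev(all_gains)
    flagged = []
    for c, gs in gains.items():
        for i in range(len(gs) - 1):
            z = (gs[i] - mu) / sigma
            # a big jump followed by a decline suggests the gain wasn't real
            if z > z_cutoff and gs[i + 1] < 0:
                flagged.append(c)
                break
    return flagged

classes = {
    "A": [50, 52, 53],   # steady improvement
    "B": [48, 70, 49],   # large spike, then reversion: suspicious
    "C": [55, 54, 56],
    "D": [51, 53, 52],
}
print(flag_suspicious(classes))  # ['B'] under these made-up numbers
```

Real implementations use far richer data (student-level answer strings, not class averages), but the underlying logic is the same: genuine learning gains tend to persist, while manufactured ones do not.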
Amazon’s verified purchaser identification of reviews and related strategies help to raise the cost and difficulty of flooding sites with bogus information.
The world of assessing job candidates and conducting performance appraisals, both forms of rating, offers another useful solution: provide standardized product or service dimensions for evaluation. One reason Michelin’s and diners’ ratings differ is that Michelin employees have a more standardized set of criteria for evaluating restaurants and a process to ensure that those standards are applied.
Bittan, whose company was established to help provide businesses of all sizes with real-time customer feedback, preemptively solve service issues, and head off negative reviews, made two other suggestions. She noted that people are less likely to engage in deception if they can’t do so anonymously, so requiring people to identify who they are might help. And she noted that, for many obvious reasons, your friends and even acquaintances are more likely to provide useful and honest information than are others. However, in this regard, “some data show that a good majority of people in North America believe and trust online reviews more than they trust their friends’ opinions.” Bad decision.
Certainly don’t rely solely on the most recent reviews or the most prominent online search results. Most people are cognitively lazy, looking only at summaries and a few recent reviews, and that is precisely the behavior that reputation management of any form counts on. Drowning negative reviews in an ocean of positive ones is the simplest reputation-management tactic and, ironically, also one of the easiest to spot. You can also read (or program a computer to read) the most positive and negative reviews to see whether many of them use similar language, a possible though imperfect indicator that they are fake or managed.
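The similar-language check in the last sentence can be sketched crudely. This is an illustrative toy, not any site’s actual fraud filter: it flags pairs of reviews whose word sets overlap heavily, measured by Jaccard similarity with an arbitrary 0.6 cutoff (the sample reviews are invented).

```python
# Crude near-duplicate detector: flag review pairs whose word sets
# overlap heavily (Jaccard similarity). The reviews and the 0.6
# threshold are illustrative only.
from itertools import combinations

def jaccard(a, b):
    """Word-set overlap between two texts, in [0, 1]."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def suspicious_pairs(reviews, threshold=0.6):
    """Return index pairs of reviews with unusually similar wording."""
    return [(i, j)
            for (i, a), (j, b) in combinations(enumerate(reviews), 2)
            if jaccard(a, b) >= threshold]

reviews = [
    "Absolutely amazing service and the best food in town",
    "The pasta was decent but the service felt slow",
    "Absolutely amazing service and the best pizza in town",
]
print(suspicious_pairs(reviews))  # [(0, 2)]
```

Production systems layer on much more (reviewer history, timing, IP addresses), but even this simple overlap test would catch a lazy copy-paste campaign.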
In the end, if social capital is truly like money, preventing counterfeiting is going to become increasingly important. Just as in the case of money, there is an arms race between those seeking to prevent counterfeit ratings and those who seek to profit from the fact that reputations can be “manufactured,” or at least managed. And as the economic implications of ratings grow, the temptations to cheat will increase proportionately.
The wonderful world of the reputation economy is far from completely wonderful—or even honest. Therefore, to the extent possible, you might be better off relying on unbiased expert opinions if you can find them. And you can in many domains, although sometimes you might have to pay. Many newspapers publish best restaurant lists, and numerous organizations, including Consumer Reports and Checkbook, seek to provide unbiased reviews of all types of products and service providers.
Expert opinion can be bought and sold, too, but experts have more to lose and have more of their social identity tied up in their unbiased expertise than the people selling their ratings on some website, or even than the companies that manage reputations for a profit. And don’t let the ready availability of summary scores tempt you to skimp on the effort of discerning the best from the rest. In the reputation economy, too, “let the buyer beware” is a useful guideline.
Jeffrey Pfeffer is the Thomas D. Dee II Professor of Organizational Behavior at the Graduate School of Business, Stanford University. His latest book, Leadership B.S.: Fixing Workplaces and Careers One Truth at a Time, will be published in September 2015 by HarperCollins.