Sorry, Uber. Customer Service Ratings Cannot Replace Managers

by Jeffrey Pfeffer @FortuneMagazine March 31, 2016, 8:40 AM EDT

The so-called “gig” economy is mostly filled with companies that have few to no employees who actually provide the companies’ primary services. Full-time employees at companies such as Uber, Airbnb, Postmates, Taskrabbit, and Doordash provide public relations and legal services, marketing, and of course the technical development and maintenance of the software platforms that make the in-home chefs, rides, or rented accommodations possible.

These platform-as-business-model enterprises raise an interesting question: If the people who provide the core services are independent contractors, and if these independent contractors have no supervisors or bosses, how are they managed so the companies can deliver the high quality service necessary to build a good reputation and strong customer retention?

The answer: these people are managed by customer ratings. People who are highly rated are kept on board and those who are not highly rated are dropped. Moreover, customer ratings are increasingly used to evaluate workers even in companies with employees, as the use of customer experience surveys grows. For instance, it is almost impossible to call any business these days and not, as part of the phone tree experience, be asked if you are willing to do a brief survey at the end of the call.

The argument: data on individual performance, derived from surveys and ratings, can replace management and supervision. It’s as simple as that.

But, of course, it isn’t that simple. Here’s why.

First of all, ratings provided by people who may use widely varying criteria and are not trained in how to do assessments are almost certainly unreliable and invalid.
I showed that there was little correspondence between restaurant ratings on TripAdvisor and lists of Michelin-starred restaurants and reviewed the extensive research demonstrating zero correlation between student ratings of teachers and objective measures of what students learned. Recently, three University of Colorado marketing professors published a study using the Amazon ratings of 1,272 products across some 120 different product categories. They found that the Amazon ratings did not converge either with Consumer Reports ratings or with resale values for products where there was a resale market. Moreover, the consumer ratings showed high dispersion, meaning that there was so much variation across raters that the reliability of the ratings was questionable.

Second, using ratings to drop workers assumes that the people who have been “dismissed” can be readily replaced, and presumably by better performers. That Uber and Lyft have offered various bonuses to sign up drivers suggests that, even in the gig economy, workers may be scarce. And by firing a poor performer and then getting a replacement from essentially the same labor pool, you are relying on luck to find someone who is going to do a better job.

Intuitively believing that ratings could not do the job of real, human supervisors, I talked to Adi Bittan to further explore my misgivings. Bittan, a former Stanford MBA student, is the co-founder of Owner Listens, a company whose mission is to provide real-time, detailed feedback so that customer issues can be addressed before customers post negative reviews or leave unhappy, never to return. In addition to the two problems already mentioned, here are some more that emerged in our conversation.

Few ratings provide actionable information. Positive or negative ratings arise from numerous behaviors, and just seeing a score doesn’t tell someone either what to keep doing or what to change.
Because of the wide variations among individual ratings, people face conflicting messages: there is simply too much noise in the data to know how to respond. For instance, people using a car service may not have liked the radio volume, the choice of music, or the cleanliness of the car. Who knows what? Receiving comments on specific behaviors is essential for understanding what and how to change.

Next, ratings are not normalized (although they could be) for the person doing the rating. Just as some teachers are hard graders and others are easier, some people give high ratings and others rate more stringently. A “4” on a five-point scale means something completely different coming from someone who normally gives 5’s than from someone who mostly gives 3’s.

Here’s the biggest problem: ratings, unlike supervisors, can’t provide coaching or training about how to improve. As we come to the end of “March Madness,” picture replacing successful basketball coaches with a rating system for players. Yes, such ratings could (and do) distinguish player ability. But those ratings would not provide inspiration and motivation at difficult moments during games and, more importantly, the instruction that helps even talented individuals develop to their full potential.

Bittan believes that ratings are helpful in the case of the simplest, most basic services, when there is less complexity and less learning and training inherent to the task. For more complex and complicated tasks and services, though, there is no substitute for effective supervision.

So, before you think of replacing your managers with rating systems, remember this: according to The Southwest Airlines Way, a book by Brandeis University’s Jody Hoffer Gittell, Southwest, which was for many years a leader in customer service, had more supervisors with fewer direct reports than its competitors. That’s because Southwest’s managers were expected to provide coaching, which requires more, rather than fewer, real human beings.

Jeffrey Pfeffer is the Thomas D. Dee II Professor of Organizational Behavior at the Graduate School of Business, Stanford University. His most recent book is Leadership BS: Fixing Workplaces and Careers One Truth at a Time.