Why LinkedIn is important to the future of the Internet of things
LinkedIn is not often thought of as a technological giant on par with Google or Facebook, but the professional networking pioneer has made some significant contributions to the way that companies capture and analyze the mountains of data they’re generating. Undoubtedly the biggest is an open source technology called Kafka, which was originally developed to pass messages between the pieces of LinkedIn’s sprawling web application but is now becoming a fundamental component of the Internet of things: the movement to connect everything from toothbrushes to jet engines.
At a high level, Kafka works by receiving “messages” from one system (that Derrick Harris has edited his profile, for example) and sending them to other systems that need that information, often in real time. Those other systems include a standard database, a big data processing engine (e.g., Hadoop) and the system that powers LinkedIn’s People You May Know feature. It’s a relatively simple function that is very difficult to pull off when you’re a major web company dealing with huge volumes of fast-moving data.
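The message flow described above can be sketched in a few lines of Python. This is only a toy, in-memory illustration of the publish-and-fan-out idea, not Kafka's actual API: the `ToyBroker` class, the `profile-edits` topic name, and the three downstream lists are all hypothetical names invented for the example.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a message broker (hypothetical, not Kafka's API)."""

    def __init__(self):
        # topic name -> append-only list of every message published to it
        self.log = defaultdict(list)
        # topic name -> callbacks for the systems that want those messages
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register a downstream system's handler for a topic."""
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Record the message, then hand a reference to every subscriber."""
        self.log[topic].append(message)
        for callback in self.subscribers[topic]:
            callback(message)


# Three hypothetical downstream systems, modeled as plain lists.
database, hadoop_batch, people_you_may_know = [], [], []

broker = ToyBroker()
for sink in (database, hadoop_batch, people_you_may_know):
    broker.subscribe("profile-edits", sink.append)

# One upstream system reports a single event; all three sinks receive it.
event = {"member": "Derrick Harris", "event": "profile_edited"}
broker.publish("profile-edits", event)
```

The toy pushes each message to subscribers through callbacks for brevity. In real Kafka, consumers pull messages from a durable, partitioned log at their own pace, which is part of what lets it cope with the volumes described here.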
On Wednesday, LinkedIn offered more detail on its unique data challenges via a blog post, in which the company says Kafka handles 1 trillion messages per day. That’s the equivalent of 1.34 petabytes (or 1,340,000 gigabytes, or the storage capacity of about 10,468 base-model MacBook Airs) passing through the system each week. It’s also roughly a 1,000-fold increase from the 1 billion messages that Kafka processed each day just five years ago.
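For readers who want to check the scale, here is a back-of-the-envelope calculation using only the figures quoted in this paragraph. It assumes decimal units (1 petabyte = 1,000,000 gigabytes), and the per-message size is just an implied average, not a number LinkedIn reported.

```python
# Figures quoted in the article.
messages_per_day_now = 10**12       # 1 trillion
messages_per_day_before = 10**9     # 1 billion, five years earlier
gigabytes_per_week = 1_340_000      # 1.34 petabytes, decimal units assumed
macbook_airs = 10_468

# Growth in daily message volume, as a multiple and as a percentage.
fold_increase = messages_per_day_now // messages_per_day_before
percent_increase = (fold_increase - 1) * 100

# Storage per laptop implied by the MacBook Air comparison.
gb_per_laptop = gigabytes_per_week / macbook_airs

# Implied average message size, in bytes (an estimate, not a reported figure).
bytes_per_week = gigabytes_per_week * 10**9
avg_bytes_per_message = bytes_per_week / (7 * messages_per_day_now)

print(f"{fold_increase:,}x growth ({percent_increase:,}% increase)")
print(f"about {gb_per_laptop:.0f} GB per laptop")
print(f"about {avg_bytes_per_message:.0f} bytes per message on average")
```

On these numbers, the laptop comparison works out to about 128 GB per machine, and the average message is small, on the order of a couple of hundred bytes.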
Its ability to handle so much data, along with the fact that it’s open source, has already made Kafka a big hit among other web companies, such as Netflix, that have their own challenges brokering data among millions of users and dozens (or more) of backend systems.
Conventional wisdom suggests Kafka is now poised to become a major factor in the Internet of things. Both CEOs and consumers will eventually start expecting more information faster from their connected things — from smart jewelry to sensor-heavy delivery trucks — and Kafka has already proven its ability to get the relevant data (and lots of it) from Point A to Point B in a flash. According to Wednesday’s blog post, enhanced security features, including encryption, are on their way, too.
A trio of former LinkedIn engineers who created Kafka while at the company have even formed a startup called Confluent, which has raised more than $30 million in venture capital less than a year into its existence. Its mission, as one might expect, is to commercialize Kafka and support its deployment at large, mainstream companies. Confluent, as well as other vendors in the big data space, is banking that the Internet of things could be a major driver for their technologies.
LinkedIn, however, is hardly the only web company responsible for developing the data-processing technologies that will eventually power our connected lives and businesses. Companies like Google and Facebook have developed fundamental tools for storing, processing, and serving huge amounts of data fast. And Twitter, a company that knows a thing or two about fast-flowing data, has driven the development of Storm, a real-time data-analysis technology already in use by early Internet of things adopters, as well as a newer, better version called Heron.