Why Hortonworks acquired a data startup with ties to the internet of things

August 27, 2015, 12:09 AM UTC

Hortonworks, a company that sells a version of open source Hadoop software for processing huge amounts of data, is using its latest acquisition to push into small data.

The company said Tuesday that it will buy Onyara, a startup that is more concerned with understanding the kinds of data its customers have and where it came from than crunching information for hours.

The rationale behind the deal is twofold. Buying Onyara helps Hortonworks (HDP) pick up critical technology in what is among the most hyped tech sectors—the internet of things. The idea is for companies to connect machinery, trucks, and trains online so that they can collect data about them and improve their business.

The second reason is that this deal will introduce Hortonworks’ software to new customers that might be using other software, like SAP’s HANA or other versions of Hadoop, to crunch data. It’s a tried and true business tactic.

To understand how all of this will happen, you have to understand how Onyara fits into the internet of things and Hortonworks’ strategy to enter the data processing market outside the data center. While there are plenty of people who claim that the internet of things is a bunch of malarkey, there is a real trend of connecting equipment, especially industrial equipment, to the Internet.

Connected sensors let companies gather data from their machines and quickly send it to computers. Cheaper and faster processing power and inexpensive storage mean that managers can now analyze that data quickly and over long periods of time to help make manufacturing, mining, and other industries more efficient. With the data, companies may be able to make higher quality products more cheaply. Manufacturing assembly lines may also be monitored more closely so there are fewer costly breakdowns, because computers can predict them before they happen.

Companies are still coming up with ways to use this extra data analysis to improve their businesses. And there are very real technical challenges in placing sensors everywhere, trying to collect data from them, and then using that data to predict the future. Onyara’s open source software, called Project NiFi, which was derived from work at the National Security Agency, helps solve some of those problems.

For example, not every sensor is placed where there is enough bandwidth. A sensor on an oil drilling platform may connect to the Internet only a few times a day through an expensive satellite link.

As a result, some of that data is likely processed on the platform first. Waiting to send it elsewhere isn’t always an option.

A subtler problem is that the data sent back to the mainland comes in a package that covers several hours of activity, and not all of that data is created equal. What if a machine on the platform overheated during that period? If so, that information is probably more important than whatever else the sensors happened to measure that day.

Onyara’s software helps manage the flow of data on a remote computer (it can run on something as small as a laptop). But it also can prioritize what’s most critical. So the sensor data detailing conditions of the machine just before it overheated is prioritized higher than other data. The reason? It is presumably relevant to include in future predictive algorithms monitoring for machines likely to overheat.
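The prioritization idea can be made concrete with a small sketch. This is not NiFi's code or API; it is a hypothetical illustration in Python, and the names `PriorityUplink`, `enqueue`, `drain`, and the 90-degree threshold are all assumptions made for the example. It queues sensor readings on the remote machine and, when the satellite link comes up, sends the most critical ones first. For simplicity it flags only the hot readings themselves; a fuller version would also promote the readings leading up to an overheating event.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


@dataclass(order=True)
class QueuedReading:
    # Lower number = more urgent. seq keeps arrival order within a priority.
    priority: int
    seq: int
    payload: dict = field(compare=False)


class PriorityUplink:
    """Hypothetical edge-side queue that sends critical readings first."""

    def __init__(self, overheat_threshold_c=90.0):
        # The threshold is an assumed value chosen for illustration.
        self.overheat_threshold_c = overheat_threshold_c
        self._heap = []
        self._seq = count()

    def enqueue(self, reading):
        # Readings at or above the threshold jump ahead of routine data.
        urgent = reading["temp_c"] >= self.overheat_threshold_c
        priority = 0 if urgent else 1
        heapq.heappush(
            self._heap, QueuedReading(priority, next(self._seq), reading)
        )

    def drain(self, budget):
        # Send at most `budget` readings while the link is available.
        sent = []
        while self._heap and len(sent) < budget:
            sent.append(heapq.heappop(self._heap).payload)
        return sent


uplink = PriorityUplink(overheat_threshold_c=90.0)
uplink.enqueue({"sensor": "pump-1", "temp_c": 60.0})
uplink.enqueue({"sensor": "pump-1", "temp_c": 95.0})
uplink.enqueue({"sensor": "pump-1", "temp_c": 61.0})

# With room for only two readings, the overheating one goes out first.
first_batch = uplink.drain(2)
```

Here the limited `budget` stands in for the scarce satellite bandwidth: routine readings wait for a later connection, while the overheating reading is sent at the first opportunity.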

This is complicated stuff, and Onyara is not the only company working to build software to manage data at what IT specialists view as the “edge” of the internet of things. Technologists call this the edge because they view the data center as the central computing power and all of the sensors and computers outside the data center as being on the edge of that network. National Instruments is another company trying to establish a business in this arena while GE is trying to make a play covering the entire ecosystem including at the edge.

However, there’s no reason that a company like Hortonworks, which has made its name in big data, can’t also get involved in smaller data—before that data is sliced and diced using Hortonworks’ Hadoop software.

Following this deal, Hortonworks plans to add a new product called Hortonworks Data Flow that includes the elements from the Onyara acquisition. Because it will be based on open source code, it will also work with other Hadoop versions run by Hortonworks’ rivals like Cloudera and other database software such as SAP’s HANA.

Tim Hall, VP of product management at Hortonworks, says Hortonworks Data Flow might offer fewer capabilities if the customer isn’t running Hortonworks’ Hadoop software. As a result, it might convince more customers to switch to Hortonworks’ software.

It might, but my bet is that instead we’ll see rivals to Hortonworks take the open source NiFi code and lure developers from Onyara to their own companies to support their efforts. Or perhaps we’ll see SAP and Cloudera acquire other startups and technologies that do similar things to Onyara.

After all, there are a lot of data startups out there, and with the internet of things as hot as it is, there are plenty of companies bragging that they can handle both the real-time nature of sensor data processing and some of the challenges associated with poor bandwidth and data prioritization.