
Pivotal open sources tech for SQL and machine learning on Hadoop

Former Pivotal chief executive Paul Maritz speaks at the 2014 Fortune Brainstorm Tech conference in Aspen, Colorado. Photo: Kevin Moloney/Fortune Brainstorm TECH

Pivotal, the software company that spun out of EMC (EMC) and VMware (VMW) in 2013, is open sourcing technologies for performing advanced analytics on data stored in the Hadoop big data platform.

The two technologies are HAWQ, a scale-out SQL database on Hadoop, and MADlib, a library of machine learning algorithms for databases like HAWQ. Both will be released as open source projects under the Apache Software Foundation. (MADlib was technically open source already, but was not an Apache project.) The move continues what Pivotal started in February, when it open sourced the code for its Greenplum database software and its proprietary distribution of the Hadoop software.

As part of that February announcement, Pivotal created an organization called the Open Data Platform, along with former competitor Hortonworks (HDP) and a handful of other companies. All of them vowed to build Hadoop technologies around a standard core that’s essentially the core of the Hortonworks software platform, but reserved the right to add proprietary technology around the edges.

A generous take on all this activity would be to say that Pivotal is making a noble gesture by open sourcing these technologies in the name of the greater big data community.

A more realistic view is probably that Pivotal knows it cannot sell a suite of proprietary software products into a space dominated by companies such as Cloudera, Hortonworks and MapR, which exist solely to sell big data software and are already pushing largely open source technologies. Viewed in this light, Pivotal hopes that open source versions of its software will spur more companies to start using them, and maybe pay for support down the line, and that the open source community will help pick up the pace of development on these technologies.

When Pivotal spun out from EMC and VMware, it brought along a slew of technologies, old and new, for cloud computing and big data processing. It has yet to really merge them into a holistic platform and, in fact, the cloud business (powered by another open source technology called Cloud Foundry) appears to have taken off while the big data business and tech look a little staid. Releasing its big data technologies into the open is a good way to minimize investment while holding onto the possibility of generating some revenue should they finally take off.
