Companies are dealing with much more data than they were twenty years ago, so why are they still storing it the same way?
By Charlie Silver
By some estimates, data is growing at a pace that puts humanity on course to produce nearly 10 trillion gigabytes (10 zettabytes) of data by 2015. That is enough to store over one quintillion pages of text or almost 70 trillion hours of Flash video. According to IDC's forecast of worldwide information growth, one third of this data comes from the enterprise, and the majority of enterprise data is generated by individual workers.
As a result, business intelligence, the technology that helps enterprises make sense of this data, has become a fast-growing multibillion-dollar market. Businesses like seeing their data in charts and graphs, but the foundation of analytics (processing and storing data) remains archaic.
Storing data in what amounts to a large collection of primitive Excel spreadsheets has survived every challenge to remain the ubiquitous format for data in IT. This archaic approach now ties up thousands of full-time IT staff who could otherwise be working on meaningful projects.
Every large company has an entire team dedicated to managing these spreadsheets of data, called "relational data tables." Its members spend their careers loading data into tables, restructuring the tables for performance and trying to map them to user needs.
Because of the explosive growth of data, teams like this can capture, format and optimize an ever-smaller share of the company's data for analytics, even as the teams themselves grow.
I experienced this first-hand when I co-founded RealAge, a website that collects lifestyle information on individuals to determine how biologically young they are compared with others. The anonymous data we collected on individuals' health, interests and lifestyles was valuable to the pharmaceutical companies that backed us. However, it was surprisingly difficult to perform advanced analytics on such a large pool of data using traditional technology.
Even with web developers, programmers and data analysts at the heart of our company, we were overwhelmed by the data. The "cutting-edge" database technology we were using had not advanced significantly since E.F. Codd introduced today's relational data model in his landmark 1970 paper.
While the status-quo relational database slid from cutting-edge to obsolete, advanced mathematics reinvented enterprise software. Flight schedules and prices, global supply chains, daily stock trades, optimal product designs and global parts sourcing are just a handful of the large, complex problems that computers now handle, automating thousands of decisions and tasks with mathematical precision.
By using math as a reasoning engine, we have enabled computers to think for themselves: to automate and improve mundane tasks that don't require human intuition to "calculate" decisions. Yet for some reason, conventional database administration has remained painfully manual, even though it lends itself to the precision and automation of mathematical calculation.
The data used for business analytics is now so large, so diverse and so increasingly unstructured that the relational model should join the other technologies of the 1970s and 1980s as an antique.
Charlie Silver is CEO of Algebraix Data, a startup that uses advanced mathematics to dramatically improve the performance of extremely large databases.