From the Inside January/February 2012
Big Data has been a hot topic lately. Since it really is not clear what it is, or how Big Data applies to us, I figured I'd add my two cents on the topic.
First of all, Big Data is not new. This problem has existed and been handled for much longer than the last year or so.
Big Data refers to the complexity, amount, and management of large quantities of data. If you look at the definition on Wikipedia, you will see it talks about "terabytes and petabytes" of data, but it also states "beyond the ability of commonly used software tools to manage and process within a tolerable time."
Big data is a term applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time. Big data sizes are a constantly moving target currently ranging from a few dozen terabytes to many petabytes of data in a single data set.
Reference: http://en.wikipedia.org/wiki/Big_data
Let's look at the concept of "tolerable time" and "software tools." At first glance you may say, well, I don't fall into that category. I can manage all my data in my programs within a "tolerable time."
"Tolerable time" is extremely relative due to the "who, when, and why" an enterprise has to deal with on a daily basis. Are you generating reports? Are you generating lookups and presentation tools for immediate action or batch processing? When does the user expect this information to be presented to them? Now? Or in 30 minutes?
The "Big" in big data is relative as well. I've seen articles on the Internet arguing about what constitutes "Big data sizes." Some define it by the amount of data being stored and the size of the database.
Let's look at the classic example of sensor data. This data is usually highly structured and not very big per transaction (50 bytes). But there can be a large number of transactions. Even so, one million transactions would only be about 48 MB. That is not "terabytes and petabytes." It would take 21,990,232,555 transactions to reach one terabyte (60,247,212 transactions a day for one year). While this may happen in your environment, in a normal business environment this is not likely.
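If you want to check the sensor arithmetic yourself, here is a minimal sketch in Python. It assumes binary units (1 MB = 1024^2 bytes, 1 TB = 1024^4 bytes), which is the convention that reproduces the figures above, and a 365-day year. The function names are purely illustrative.

```python
# Back-of-the-envelope check of the sensor-data figures.
# Assumption: binary units, i.e. 1 MB = 1024**2 bytes and 1 TB = 1024**4 bytes.
BYTES_PER_TRANSACTION = 50
MB = 1024 ** 2
TB = 1024 ** 4
DAYS_PER_YEAR = 365


def size_in_mb(transactions, bytes_each=BYTES_PER_TRANSACTION):
    """Storage needed for a number of fixed-size transactions, in MB."""
    return transactions * bytes_each / MB


def transactions_to_fill(total_bytes, bytes_each=BYTES_PER_TRANSACTION):
    """Whole transactions that fit in the given number of bytes."""
    return total_bytes // bytes_each


million_mb = size_in_mb(1_000_000)
per_terabyte = transactions_to_fill(TB)
per_day = per_terabyte // DAYS_PER_YEAR

print(f"One million transactions: {million_mb:.1f} MB")
print(f"Transactions per terabyte: {per_terabyte:,}")
print(f"Transactions per day for one year: {per_day:,}")
```

Change BYTES_PER_TRANSACTION to match your own record size and you can see quickly how far your data is from the terabyte range.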
Working with Big Data is not about the size of your database, but the complexity of the data you need to work with. The idea of "Big Data" was to address the problem of working with complex data: finding trends and solving problems across a large volume and variety of data.
So how does "Big Data" apply to us? For the most part, "Big Data" is more hype and marketing than anything real. Our businesses already deal with "Big Data" on a regular basis, simply because of the sheer volume of our data and the complex, interrelated nature of our business systems.
If you want to hear more about "Big Data," join us at the 2012 International Spectrum Conference to see how people are handling their own "Big Data" and the different techniques you can use to handle your Big Data issues.