Introduction to Big Data
As a software tester, you need a clear definition of ‘Big Data’. Big Data is a popular term describing the exponential growth and availability of data, both structured and unstructured, and it may prove as important to business – and to society – as the Internet has become. It is more than simply a matter of size: it is an opportunity to find insights in new and emerging types of data and content, to make your business more agile, and to answer questions that were previously considered beyond your reach.
In most enterprise scenarios, the data is too big, moves too fast, or exceeds current processing capacity. Big Data has the potential to help companies improve operations and make faster, more intelligent decisions. Until recently, there was no practical way to harvest this opportunity. Today, IBM’s platform for Big Data uses state-of-the-art technologies, including patented advanced analytics, to open the door to a world of possibilities.
More precisely, Big Data is a set of approaches, tools, and methods for processing high volumes of structured and – most importantly – unstructured data. The key difference between Big Data and ordinary high-load systems is the ability to create flexible queries.
Big data spans three dimensions: Volume, Velocity and Variety.
Volume: Many factors contribute to the increase in data volume: transaction-based data stored through the years, unstructured data streaming in from social media, and increasing amounts of sensor and machine-to-machine data being collected. In the past, excessive data volume was a storage issue. With storage costs decreasing, other issues emerge, including how to determine relevance within large data volumes and how to use analytics to create value from relevant data.
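The relevance problem above can be illustrated with a minimal sketch: rather than loading an entire data set into memory, records are parsed lazily and filtered down to the relevant subset. The comma-separated format, field names, and the threshold rule are hypothetical, chosen purely for illustration.

```python
# Minimal sketch: filter relevant records from a large volume without
# loading everything into memory at once. Field names and the relevance
# rule (amount above a threshold) are hypothetical.

def records(lines):
    """Lazily parse comma-separated lines into (user_id, amount) records."""
    for line in lines:
        user_id, amount = line.strip().split(",")
        yield user_id, float(amount)

def relevant(recs, threshold=100.0):
    """Keep only records whose amount exceeds the threshold."""
    return [(u, a) for u, a in recs if a > threshold]

# In practice `lines` would stream from disk or a distributed store;
# here a small in-memory sample stands in for it.
sample = ["alice,250.0", "bob,40.0", "carol,120.5"]
print(relevant(records(sample)))  # → [('alice', 250.0), ('carol', 120.5)]
```

Because `records` is a generator, only one line is held in memory at a time, which is the essential trick when volumes exceed what a single machine can hold.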
Velocity: Data is streaming in at unprecedented speed and must be dealt with in a timely manner. RFID (radio-frequency identification) tags, sensors and smart metering are driving the need to deal with torrents of data in near-real time. Reacting quickly enough to deal with data velocity is a challenge for most organizations; sometimes two minutes is too late. For time-sensitive processes such as catching fraud, Big Data must be used as it streams into your enterprise in order to maximize its value.
- Scrutinize 5 million trade events created each day to identify potential fraud
- Analyze 500 million daily call detail records in real-time to predict customer churn faster
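The first bullet above can be sketched in miniature: score each trade event as it arrives rather than in a later batch. The fraud rule here (an amount far above the running mean) is a hypothetical stand-in for real analytics, but it shows the shape of per-event, near-real-time processing.

```python
# Minimal sketch of velocity: flag each suspicious event the moment it
# arrives, using only running state. The spike-vs-running-mean rule is
# a hypothetical placeholder for real fraud analytics.

def flag_suspicious(events, spike_factor=3.0):
    """Yield IDs of events whose amount far exceeds the running mean."""
    total, count = 0.0, 0
    for event_id, amount in events:
        if count and amount > spike_factor * (total / count):
            yield event_id  # flagged immediately, not in a later batch
        total += amount
        count += 1

stream = [("t1", 100.0), ("t2", 110.0), ("t3", 900.0), ("t4", 105.0)]
print(list(flag_suspicious(stream)))  # → ['t3']
```

Because the function keeps only a running total and count, it could in principle consume an unbounded stream, which is the point of processing at velocity.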
Variety: Data today comes in all types of formats – structured, numeric data in traditional databases as well as unstructured data such as text, sensor data, audio, video, click streams, log files and more. Big Data encompasses all of these, and new insights are found when these data types are analyzed together.
- Monitor hundreds of live video feeds from surveillance cameras to target points of interest
- Exploit the 80% data growth in images, video and documents to improve customer satisfaction
Managing, merging and governing different varieties of data is something many organizations still grapple with.
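A tiny sketch of what merging varieties can look like: joining structured purchase records with unstructured free-text support logs to spot at-risk customers. The field names, log format, and keyword rule are hypothetical, for illustration only.

```python
# Minimal sketch of variety: combine a structured table (orders per
# customer) with unstructured text (support logs). All names, formats,
# and the negative-keyword rule are hypothetical.

purchases = {"alice": 3, "bob": 1}   # structured: orders per customer
support_logs = [                     # unstructured: free text
    "alice: package arrived damaged, very unhappy",
    "bob: thanks, quick delivery!",
]

def at_risk(purchases, logs, keywords=("damaged", "unhappy", "refund")):
    """Return customers with purchases whose logs contain a negative keyword."""
    flagged = set()
    for entry in logs:
        name, _, text = entry.partition(":")
        if name in purchases and any(k in text.lower() for k in keywords):
            flagged.add(name)
    return sorted(flagged)

print(at_risk(purchases, support_logs))  # → ['alice']
```

Neither data source alone reveals the at-risk customer; only the merge does, which is the argument for analyzing varieties of data together.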
Benefits of Big Data include:
- More accurate data
- Improved business decisions
- Improved marketing strategy and targeting
- Increased revenue due to an increased customer base and decreased costs
Relevance of Big Data
The hopeful vision is that organizations will be able to take data from any source, harness the relevant data and analyze it to find the desired answers. By combining Big Data with high-powered analytics, it is possible to:
- Determine root causes of failures, issues and defects in near-real time, potentially saving billions of dollars annually.
- Optimize routes for many thousands of package delivery vehicles while they are on the road.
- Analyze millions of SKUs to determine prices that maximize profit and clear inventory.
- Generate retail coupons at the point of sale based on the customer’s current and past purchases.
- Send tailored recommendations to mobile devices while customers are in the right area to take advantage of offers.
- Recalculate entire risk portfolios in minutes.
- Quickly identify customers who matter the most.
- Use click stream analysis and data mining to detect fraudulent behavior.
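The last bullet, click-stream analysis for fraud detection, can be sketched very simply: flag sessions whose click rate is implausibly high, a crude signal of bot or fraudulent behavior. The event format, window, and threshold are all hypothetical.

```python
# Minimal sketch of click-stream analysis: flag sessions with more than
# `max_clicks` clicks inside a short time window. Thresholds and the
# (session_id, timestamp) event format are hypothetical.
from collections import defaultdict

def rapid_sessions(clicks, max_clicks=3, window=1.0):
    """Return IDs of sessions exceeding max_clicks within `window` seconds."""
    by_session = defaultdict(list)
    for session_id, timestamp in clicks:
        by_session[session_id].append(timestamp)
    flagged = []
    for sid, times in by_session.items():
        times.sort()
        for i in range(len(times)):
            # count clicks landing within `window` seconds of click i
            j = i
            while j < len(times) and times[j] - times[i] <= window:
                j += 1
            if j - i > max_clicks:
                flagged.append(sid)
                break
    return flagged

clicks = [("s1", 0.0), ("s1", 0.2), ("s1", 0.4), ("s1", 0.6), ("s1", 0.8),
          ("s2", 0.0), ("s2", 5.0)]
print(rapid_sessions(clicks))  # → ['s1']
```

A real system would combine many such signals and run them against the stream as it arrives, but the core idea – aggregate per session, compare against a behavioral threshold – is the same.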
An Example of Big Data
An example of Big Data might be petabytes (1,024 terabytes) or exabytes (1,024 petabytes) of data consisting of billions to trillions of records about millions of people, all from different sources (e.g. the Web, sales, customer contact centers, social media, mobile data and so on). The data is typically loosely structured and is often incomplete and inaccessible.