Different Approaches to Big Data Analysis
By Judith S. Hurwitz, Alan Nugent, Fern Halper, Marcia Kaufman

In many cases, big data analysis is presented to the end user through reports and visualizations. Because the raw data can be incomprehensibly varied, you have to rely on analysis tools and techniques to present the data in meaningful ways.

New applications are becoming available, and they fall broadly into two categories: custom or semi-custom.

Custom applications for big data analysis

In general, a custom application is created for a specific purpose or a related set of purposes. For big data analysis, the purpose of custom application development is to speed up the time to decision or action.

R environment

The "R" environment is based on the "S" statistics and analysis language, which was developed at Bell Laboratories beginning in the 1970s. R is part of the GNU project and is available under the GNU General Public License.

Although R is challenging to fully comprehend, its depth and flexibility make it a compelling choice for analytics application developers and "power users." In addition, the Comprehensive R Archive Network (CRAN) maintains a worldwide set of File Transfer Protocol (FTP) and web servers with the most up-to-date versions of the R environment. A commercially supported, enterprise version of R is also available from Revolution Analytics.

More specifically, R is an integrated suite of software tools and technologies designed to create custom applications used to facilitate data manipulation, calculation, analysis, and visual display. Among other advanced capabilities, it supports:

Effective data-handling and manipulation components.
Operators for calculations on arrays and other types of ordered data.
Tools specific to a wide variety of data analyses.
Advanced visualization capabilities.

A programming language (S) designed by programmers, for programmers, with many familiar constructs, including conditionals, loops, user-defined recursive functions, and a broad range of input and output facilities.

R is well suited to single-use, custom applications for analysis of big data sources.
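The constructs in the list above are not unique to R. As a rough illustration, written in Python rather than R syntax, the sketch below shows a calculation applied across ordered data, a conditional inside a loop, and a user-defined recursive function. The sample data and the function name `cumulative_total` are hypothetical and chosen only for this example.

```python
from statistics import mean

# Hypothetical sample: daily order counts (illustrative data only).
orders = [12, 15, 9, 20, 18]

# Calculation on ordered data: deviation of each value from the mean.
avg = mean(orders)
deviations = [x - avg for x in orders]

# Conditional inside a loop: label each day relative to the average.
labels = []
for x in orders:
    if x > avg:
        labels.append("above")
    else:
        labels.append("at or below")


def cumulative_total(values):
    """User-defined recursive function: sum a list one element at a time."""
    if not values:
        return 0
    return values[0] + cumulative_total(values[1:])


print(avg, labels, cumulative_total(orders))
```

A single-use analysis script of this shape is the kind of small custom application the section describes; in R the same steps would typically be expressed with vectors and built-in statistical functions.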

Google Prediction API

The Google Prediction API is an example of an emerging class of big data analysis application tools. It is available on the Google developers website, is well documented, and provides several mechanisms for access from different programming languages. To help you get started, it is freely available for six months.

The Prediction API is fairly simple. It looks for patterns and matches them to proscriptive, prescriptive, or other existing patterns. While performing its pattern matching, it also "learns." The more you use it, the smarter it gets.
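To make the idea of matching patterns and "learning" from use more concrete, here is a minimal Python sketch of a toy learner. It is not the Prediction API's algorithm and does not call the API; the class `TinyPatternLearner`, its methods, and the sample phrases are all hypothetical. It simply records which words appeared with which label and predicts the label with the most overlap.

```python
from collections import Counter, defaultdict


class TinyPatternLearner:
    """Hypothetical toy learner; not the Prediction API's algorithm."""

    def __init__(self):
        # Maps each label to counts of the words seen with that label.
        self.word_counts = defaultdict(Counter)

    def train(self, text, label):
        """Record which words appeared together with the given label."""
        self.word_counts[label].update(text.lower().split())

    def predict(self, text):
        """Return the label whose recorded words best overlap the text, or None."""
        words = text.lower().split()
        scores = {
            label: sum(counts[w] for w in words)
            for label, counts in self.word_counts.items()
        }
        if not scores or max(scores.values()) == 0:
            return None
        return max(scores, key=scores.get)


learner = TinyPatternLearner()
first_guess = learner.predict("refund my order")  # nothing learned yet
learner.train("please refund my payment", "billing")
learner.train("app crashes on login", "technical")
second_guess = learner.predict("refund my order")
print(first_guess, second_guess)
```

Before any training the learner has no basis for a prediction; after two examples it can match the new phrase to a known pattern. Each additional example gives it more patterns to match against, which is the sense in which such a tool improves with use.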

Prediction is implemented as a RESTful API with language support for .NET, Java, PHP, JavaScript, Python, Ruby, and many others. Google also provides scripts for accessing the API as well as a client library for R.

Predictive analysis is one of the most powerful potential capabilities of big data, and the Google Prediction API is a very useful tool for creating custom applications.

Semi-custom applications for big data analysis

In truth, what many people perceive as custom applications are actually created using "packaged" or third-party components like libraries. It is not always necessary to completely code a new application. Using packaged applications or components requires developers or analysts to write code to "knit together" these components into a working custom application. The following are reasons why this is a sound approach:

Speed to deployment: Because you don't have to write every part of the application, the development time can be greatly reduced.

Stability: Using well-constructed, reliable, third-party components can help to make the custom application more resilient.

Better quality: Packaged components are often subject to higher quality standards because they are deployed into a wide variety of environments and domains.

More flexibility: If a better component comes along, it can be swapped into the application, extending the lifetime, adaptability, and usefulness of the custom application.
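The "knit together" approach and the flexibility benefit can be sketched in a few lines of Python. In this assumed example, the packaged components are the `mean` and `median` functions from Python's standard `statistics` module, and the only custom code is a small glue function; `build_report`, `Summarizer`, and the sample readings are hypothetical names and data.

```python
from statistics import mean, median
from typing import Callable, Sequence

# A "component" here is any function that summarizes a sequence of numbers.
Summarizer = Callable[[Sequence[float]], float]


def build_report(readings: Sequence[float], summarize: Summarizer) -> str:
    """Glue code: the custom part that knits a packaged component into a report."""
    return f"{len(readings)} readings, typical value {summarize(readings):.1f}"


readings = [10.0, 12.0, 11.0, 95.0]  # hypothetical data with one outlier

report_v1 = build_report(readings, mean)    # first packaged component
report_v2 = build_report(readings, median)  # swapped in later, glue unchanged
print(report_v1)
print(report_v2)
```

Because the glue code depends only on the shape of the component (a function from numbers to a number), replacing one component with a better one does not require rewriting the application.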

Another type of semi-custom application is one where the source code is available and is modified for a particular purpose. This can be an efficient approach because there are quite a few examples of application building blocks available to incorporate into your semi-custom application:

TA-Lib: The Technical Analysis library is used extensively by software developers who need to perform technical analysis of financial market data. It is available as open source under the BSD license, allowing it to be integrated into semi-custom applications.

JUNG: The Java Universal Network Graph framework is a library that provides a common framework for analysis and visualization of data that can be represented by a graph or network. It is useful for social network analysis, importance measures, and data mining. It is available as open source under the BSD license.

GeoTools: An open source geospatial toolkit for manipulating GIS data in many forms, analyzing spatial and non-spatial attributes of GIS data, and creating graphs and networks of the data. It is available under the LGPL license, allowing for integration into semi-custom applications.
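As a rough illustration of the kind of calculation a technical analysis building block packages, the Python sketch below computes a simple moving average over closing prices. It is written in plain Python and does not use TA-Lib's own API; the function name `simple_moving_average` and the price data are hypothetical.

```python
def simple_moving_average(prices, window):
    """Return the average of each run of `window` consecutive prices."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [
        sum(prices[i - window:i]) / window
        for i in range(window, len(prices) + 1)
    ]


closes = [10.0, 11.0, 12.0, 13.0, 14.0]  # hypothetical closing prices
sma3 = simple_moving_average(closes, 3)
print(sma3)
```

In a semi-custom application, a hand-written helper like this would normally be replaced by the tested equivalent from the library, leaving the developer to write only the code that connects it to the data source and the report.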

About the Book Author

Judith Hurwitz is an expert in cloud computing, information management, and business strategy. Alan Nugent has extensive experience in cloud-based big data solutions. Dr. Fern Halper specializes in big data and analytics. Marcia Kaufman specializes in cloud infrastructure, information management, and analytics.
