Demystifying Big Data technologies

You hear a lot about Big Data these days. Industries from pharmaceuticals to automakers are using Big Data solutions to improve operational efficiency and to transform their business models for competitive advantage. But navigating the Big Data technology landscape is not easy. There are many tools and frameworks, and one needs to be familiar with them to apply the right tool to the right Big Data problem.

Accion Labs has the advantage of having developed Big Data solutions for several customers that have improved the performance of business applications by orders of magnitude, be it in search, in real-time analytics or in large-volume data processing. Along the way, Accion has developed a collection of frameworks, tools, accelerators and reusable components to help with the delivery of Big Data solutions. But before we dive into it, let's try to understand why Big Data problems are different from traditional software problems.

Big data challenges and unique requirements

Often the volume of data processed in Big Data solutions is very large, and legacy architectures are simply not built to handle it. They often suffer performance degradation because they cannot scale with the volume or complexity of the data. Big Data problems also often involve aggregating data from multiple new sources: internal data (e.g. emails, documents, media files, data files, log files, sensors) and public data (e.g. web sites, social media, blogs, microblogs, photos, videos). Big Data solutions must also deliver actionable inferences. And finally, Big Data solutions need to keep costs low. Fortunately, there are several mature open source alternatives that can be deployed in the cloud or on clustered commodity servers.

The following visual tries to capture the complexities involved in implementing a Big Data solution:

 
bigdata_funnel.png
 

The variety, volume, velocity and complexity of the data are often too much for many traditional software architectures to handle. In addition, Big Data requires analytics that

  • Is real-time, i.e. has a short turnaround from capture to presentation
  • Provides statistical insights, e.g. linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering
  • Performs text analysis for insights, e.g. natural language processing, entity extraction, annotation, complex search, sentiment analysis
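
To make the "statistical insights" item above a little more concrete, here is a minimal Python sketch of linear modeling: it fits a straight line to a handful of hourly event counts using ordinary least squares. The sample data and the function name fit_line are made up for illustration; a real solution would more likely use a statistics package such as R on far larger volumes.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two or more paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of the co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical sample: events captured per hour over five hours.
hours = [0, 1, 2, 3, 4]
events = [10, 12, 14, 16, 18]

slope, intercept = fit_line(hours, events)
# Extrapolate the fitted trend one hour past the observed data.
forecast_hour_5 = slope * 5 + intercept
```

The fitted slope is the estimated growth in events per hour, which is the kind of trend a dashboard might surface once the raw data has been captured and aggregated.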

Big data technology options

There are several technologies and tools available to tackle Big Data problems. Here are some of the tools we have used in various projects:

Technology Type              Examples
---------------------------  ---------------------------------------------------
Key/Value                    Berkeley DB, MemcacheDB, Redis, Voldemort
Document-oriented            MongoDB, CouchDB, Riak
Graph-oriented               Neo4j (+ Gremlin, Groovy)
Relational                   Oracle, MySQL, PostgreSQL, Greenplum, Teradata
Search                       Lucene/Solr, Elasticsearch, Google Search Appliance
Text processing              Tika, Stanbol, Mahout
Statistics & Visualization   R, Gnuplot, VizQL (Tableau), Leaflet (maps)
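
The practical difference between the first two rows can be sketched in a few lines of Python. This is only an in-memory illustration using plain dictionaries and lists, not the API of any product named above; the names kv_get and doc_find and the sample records are hypothetical.

```python
import json
from typing import Any, Dict, List

# Key/value style: values are opaque to the store and are fetched by exact key.
kv_store: Dict[str, str] = {
    "customer:1": json.dumps({"name": "Asha", "city": "Pune"}),
    "customer:2": json.dumps({"name": "Ben", "city": "Boston"}),
}


def kv_get(key: str) -> Dict[str, Any]:
    """Look up one record by key; the caller decodes the value."""
    return json.loads(kv_store[key])


# Document style: the store sees the fields, so it can filter on any of them.
doc_store: List[Dict[str, Any]] = [
    {"_id": 1, "name": "Asha", "city": "Pune"},
    {"_id": 2, "name": "Ben", "city": "Boston"},
]


def doc_find(**criteria: Any) -> List[Dict[str, Any]]:
    """Return every document whose fields match all the given criteria."""
    return [
        doc for doc in doc_store
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


by_key = kv_get("customer:2")
by_field = doc_find(city="Pune")
```

A key/value store suits fast lookups when the key is already known, while a document store suits queries on the content of the records. That trade-off is one reason the choice of tool depends on the problem.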

Big data solutions

Different verticals use Big Data in different ways. For example, a pharmaceuticals company can use Big Data to maintain and research a patent repository. A Big Data repository can integrate patents, articles, tests, expert comments and annotations into a unified patent research repository. For these companies, which spend billions of dollars on research, the ability to gather insights from existing patents is a unique value that Big Data solutions can bring.

A telecom company that deals with a large number of consumers can use Big Data for CRM analytics, aggregating call center records, IVR logs, email and social feeds and then analyzing them for patterns and trends. Not only can it identify opportunities for operational efficiency, it can also improve its customer service. In a similar fashion, an aviation company can use Big Data for a customer loyalty application: integrating flight schedules, ancillary services, invoicing and customer data gives customers a single point of interface for their needs.

Companies in the media space can deliver an online TV offering by integrating broadcast schedules, ratings, social media feeds and user recordings into a TV-anywhere platform that is customized for each user. Even companies in the semiconductor industry can use Big Data for parametric search, integrating the product catalog, application notes and web content to offer advanced search. The banking and finance sectors are not left behind either: they can improve their investment advisory services by integrating social media posts, analyst opinions, web content and trading data with search and sentiment analysis to offer a customized online investment advisor. Publishing companies benefit from Big Data too, by developing a knowledge platform that integrates aggregated and original content with text mining, automatic and semi-automatic classification and annotation, and full-text search.

Now, the common thread among all of the above examples is that they are not hypothetical scenarios. They are actual solutions that we have delivered for our customers, and they are in use today. Based on this experience, we have built a framework for identifying Big Data solution approaches, outlined below.

 
bigdata_approach.png
 

As you may have realized, the type of Big Data solution you need depends on the business outcome you are trying to achieve, and identifying how Big Data can help your business is often half the challenge. Accion Labs offers Big Data consulting services that include road map development, architecture design and application development. With Accion Labs as your partner, your Big Data problem is not likely to remain big at all, and you can be confident that you are putting all the data in and around your enterprise to very good use.