Hadoop for Fraud Analytics

Ashutosh Bijoor

The flourishing technology industry has another side to its coin. The exponential rise in secured internet usage brings with it loopholes for getting past those security doors. Fraud applies to all industries and affects businesses of all sizes. Online fraud is an unfortunate consequence of today’s globalized, closely connected world.

Big Data predictive analysis offers better scope for detecting and capturing fraud.

Many success stories have already come out of Big Data analytics. Products such as Hadoop, LexisNexis HPCC, Teradata Aster Data, Kognitio and Microsoft Dryad are available in the market, and all of them have been quite successful in providing valuable insights in various industries. In this era of technical explosion, where all of us are flooded with a deluge of data, now is the time for P&C insurers to embrace Big Data-driven analytics to fight claims fraud more effectively.

The reference architecture for fraud detection in this new world needs to support business-user-focused Big Data analytics applications on top of a Hadoop-based architecture.

Fraud analysis has been one of the most often quoted use cases for Hadoop. Here we look at the topic further to explore how Hadoop ecosystem products can be used.

Fraud analytics can be divided into three further use cases:

Fraud detection: determining whether a fraud is taking place or has occurred in the past, and generating an appropriate alert for it.
Fraud prevention: implementing controls and access restrictions to prevent fraud.
Fraud reduction: monitoring and predicting patterns to minimize the chances of fraud occurring.

Hadoop can be used to implement some of the methods listed below to address any of the three cases above.

Deduplication

Entity matching - This could include exact or similar matching of entity attributes such as name, father's name or contact information (phone, e-mail ID, street, city), or phonetic matches, using deduplication methods. Since this is a data-intensive exercise that requires matching against a previously built index, Hadoop is a strong technology fit.
Social network identity matching - Not yet very common, but emerging of late, is a tendency to match social network profiles with customer identity. While this technique can be quite effective provided you have the right social network data feeds, be aware of any privacy laws that may apply.
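The entity matching idea above can be sketched in plain Python, laid out in a map and reduce style so that it could later be ported to a Hadoop job. This is a minimal illustration under stated assumptions, not a reference implementation: the record fields (id, name, phone), the function names, the 0.8 similarity threshold and the choice of Soundex as the phonetic blocking key are all hypothetical choices made for the example.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Standard Soundex letter groups.
_SOUNDEX_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _SOUNDEX_CODES[_ch] = _digit


def soundex(name):
    """Return the 4-character Soundex code of a name (phonetic key)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # vowels reset prev, so repeated codes can reappear
    return (result + "000")[:4]


def map_record(record):
    """Map step: emit (blocking key, record) so similar names meet in one group."""
    yield soundex(record["name"]), record


def reduce_block(records, threshold=0.8):
    """Reduce step: compare records pairwise inside one phonetic block."""
    for a, b in combinations(records, 2):
        score = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
        if a["phone"] == b["phone"] or score >= threshold:
            yield a["id"], b["id"], round(score, 2)


def find_candidate_duplicates(records):
    """Local stand-in for the shuffle phase: group by key, then reduce."""
    blocks = defaultdict(list)
    for record in records:
        for key, value in map_record(record):
            blocks[key].append(value)
    pairs = []
    for group in blocks.values():
        pairs.extend(reduce_block(group))
    return pairs


# Hypothetical sample data for illustration only.
sample_records = [
    {"id": 1, "name": "Jon Smith", "phone": "555-0101"},
    {"id": 2, "name": "John Smith", "phone": "555-0101"},
    {"id": 3, "name": "Maria Lopez", "phone": "555-0199"},
]
candidate_pairs = find_candidate_duplicates(sample_records)
```

Blocking on a phonetic key keeps the pairwise comparison inside small groups, which is what makes this kind of matching feasible on large data sets. The trade-off is that two records whose names produce different keys are never compared.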

Outlier detection

An outlier is usually a deviation from the common usage pattern of a customer or transaction set. Using custom machine learning algorithms or available libraries, we combine data to surface any outlier points. Clustering and probabilistic distributions, along with visualization techniques, are the more common methods for deriving outliers. These may be used in conjunction with techniques like path analysis, sessionization, tokenization and attribution. Regression, correlation, averages and graph analysis may also be employed, depending on the functional requirement.
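As a simple stand-in for the heavier methods named above, the sketch below flags transactions that deviate from a customer's own usage pattern using a modified z-score based on the median absolute deviation (MAD). It is an illustration only: the (customer_id, amount) input shape, the function name, the cutoff of 3.5 and the minimum history of 5 transactions are assumptions made for this example, and a production system would more likely use clustering or a fitted distribution.

```python
from collections import defaultdict
from statistics import median


def flag_outliers(transactions, cutoff=3.5, min_history=5):
    """Flag amounts that deviate strongly from a customer's usual pattern.

    transactions: iterable of (customer_id, amount) pairs (assumed shape).
    Uses the modified z-score 0.6745 * (x - median) / MAD.
    """
    # Group amounts per customer (the role a shuffle phase would play in Hadoop).
    by_customer = defaultdict(list)
    for customer_id, amount in transactions:
        by_customer[customer_id].append(amount)

    flagged = []
    for customer_id, amounts in by_customer.items():
        if len(amounts) < min_history:
            continue  # not enough history to define a usual pattern
        center = median(amounts)
        mad = median(abs(a - center) for a in amounts)
        if mad == 0:
            continue  # no spread in the bulk of the data; score is undefined
        for amount in amounts:
            score = 0.6745 * (amount - center) / mad
            if abs(score) > cutoff:
                flagged.append((customer_id, amount))
    return flagged


# Hypothetical sample data for illustration only.
sample_transactions = (
    [("A", amt) for amt in (20, 22, 19, 21, 23, 20, 500)]
    + [("B", amt) for amt in (100, 105, 98, 102, 101)]
)
suspicious = flag_outliers(sample_transactions)
```

The median and MAD are used here instead of the mean and standard deviation because a single very large fraudulent amount inflates the standard deviation and can hide itself, especially when a customer has only a short history.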

Workflow

Transaction streaming, monitoring, alert forwarding, alert disposal and transaction blocking are among the steps that a custom workflow may implement in a fraud management system. Considering the massive volume of transactions, a custom DSL workflow may be implemented on top of Hadoop.
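To make the workflow steps above more concrete, here is a small sketch of what a rule-driven workflow could look like when declared as data, in the spirit of a DSL. Everything in it is hypothetical: the Rule and Workflow classes, the "alert" and "block" actions, the rule names and the transaction fields are invented for illustration, and the sketch runs in a single process rather than on Hadoop or a streaming engine.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]  # returns True when the rule fires
    action: str                        # "alert" or "block" (assumed actions)


@dataclass
class Workflow:
    rules: List[Rule]
    alerts: List[Tuple[int, str]] = field(default_factory=list)
    blocked: List[Tuple[int, str]] = field(default_factory=list)

    def process(self, txn):
        """Run one transaction through the rules, in declaration order."""
        status = "passed"
        for rule in self.rules:
            if not rule.condition(txn):
                continue
            if rule.action == "block":
                # Blocking ends processing for this transaction.
                self.blocked.append((txn["id"], rule.name))
                return "blocked"
            # Any other action is treated as an alert to be forwarded.
            self.alerts.append((txn["id"], rule.name))
            status = "alerted"
        return status


# Hypothetical rules and transactions for illustration only.
workflow = Workflow(rules=[
    Rule("high_amount", lambda t: t["amount"] > 10_000, "alert"),
    Rule("blocked_country", lambda t: t["country"] in {"XX"}, "block"),
])
stream = [
    {"id": 1, "amount": 50, "country": "IN"},
    {"id": 2, "amount": 20_000, "country": "IN"},
    {"id": 3, "amount": 20_000, "country": "XX"},
]
statuses = [workflow.process(txn) for txn in stream]
```

Keeping the rules as plain data separate from the processing loop is the design point: analysts could add or reorder rules without touching the engine that streams transactions through them.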

Further implementation evidence is needed to see whether a rule engine can also be built on top of a DSL framework. Overall, we expect a comprehensive fraud management system to have a hybrid architecture involving a rule engine, streams, workflow, dashboard, portal and Hadoop-based analytics. Implementations will vary based on the organization's current architecture and tool set preferences.