Embracing Open Source Databases

Abstract

In the past few years, Accion Labs has helped clients modernize their enterprise software. Accion has also helped clients migrate their data out of existing databases into new database platforms. The new databases, both relational and NoSQL, have been largely open source.
Opinion about these database migrations across the internet has been mixed. This is largely due to a lack of attention when choosing the right database platform and the approach to migration. This whitepaper presents success stories from Accion Labs' recent data migration projects.

Introduction

Relational Database Management Systems (RDBMS) have been a common choice since the 1980s for storing information in new databases used for financial records, manufacturing and logistical information, personnel data, and other applications. Relational databases have often replaced legacy hierarchical databases and network databases because they are easier to understand and use.

According to Gartner, the five leading commercial relational database vendors by revenue in 2011 were Oracle (48.8%), IBM (20.2%), Microsoft (17.0%), SAP including Sybase (4.6%), and Teradata (3.7%). The high cost and vendor restrictions that come with traditional relational database vendors like Oracle are driving end users away. Meanwhile, open-source and open-source-based relational databases have matured. They support mission-critical workloads at global enterprises, and their use is expanding rapidly. “Open source RDBMS have matured and today can be considered as a standard infrastructure choice for most new enterprise applications,” says Gartner in its report on open-source relational database management systems (OSRDBMS). Open-source databases like EnterpriseDB’s PostgreSQL solutions and MariaDB’s MySQL RDBMS are charting growth of 46% and 55% respectively, and it is estimated that half of commercial RDBMS users will convert to open source by 2018.

PostgreSQL, for example, is the fourth most popular relational database in the world, and its popularity has risen steadily as recent releases have enhanced usability. However, relational databases have been challenged by XML databases and by object databases, which were introduced in an attempt to address the object-relational impedance mismatch in relational databases. Given the number of NoSQL and relational databases to choose from, it is important to identify the right fit for the situation at hand.

Case Study 1: Ever-Growing Licensing Costs for SQL Server

The client in this case study is a leading provider of Infrastructure-as-a-Service products with one-of-its-kind support for its customers. Its application portfolio is diverse, ranging from a home-grown CRM to a global billing platform. The client's data warehouse comprises a large collection of operational data stores (ODS) and a Kimball-style data warehouse hosted in Microsoft SQL Server.
Given the rate at which the company was growing, both organically and inorganically, the number of applications continued to grow, and so did the number of ODS.
At that point, the client realized that moving from Microsoft SQL Server to an open source RDBMS would reduce its license costs considerably. Another issue was the pace at which the data was growing: the client needed a metadata-aware mechanism for scaling the database up.
Accion Labs helped the client choose the right data platform for its needs, set up its data warehouse on the new platform, and move all of its ODS, historic data, and existing code from SQL Server into PostgreSQL.
Accion Labs also helped the client devise an effective information lifecycle management strategy that uses Cassandra and Hadoop for staging and archival. Accion Labs successfully migrated 70 ODS and a 50 TB data warehouse in a record time of 12 months.
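
One recurring step when moving existing schemas and code from SQL Server to PostgreSQL is translating column types. The following is a minimal sketch of that step, not the tooling used in the project described above. The mapping table and the helper name translate_type are illustrative assumptions, and a real migration would need to cover many more types and edge cases.

```python
import re

# Illustrative SQL Server -> PostgreSQL type mapping. This table is an
# assumption made for the sketch; it is not the project's actual mapping.
TYPE_MAP = {
    "BIT": "BOOLEAN",
    "TINYINT": "SMALLINT",
    "DATETIME": "TIMESTAMP",
    "DATETIME2": "TIMESTAMP",
    "SMALLDATETIME": "TIMESTAMP",
    "UNIQUEIDENTIFIER": "UUID",
    "MONEY": "NUMERIC(19,4)",
    "NVARCHAR": "VARCHAR",
    "NCHAR": "CHAR",
    "VARBINARY": "BYTEA",
    "IMAGE": "BYTEA",
    "NTEXT": "TEXT",
}

# Target types for which this sketch drops the source length or precision.
NO_LENGTH = {"BYTEA", "TEXT", "BOOLEAN", "UUID", "SMALLINT", "TIMESTAMP"}

# Matches a type name with an optional parenthesised argument list,
# e.g. "NVARCHAR(50)" or "DECIMAL(10,2)".
_TYPE_RE = re.compile(r"^\s*([A-Za-z0-9_]+)\s*(?:\(([^)]*)\))?\s*$")


def translate_type(sqlserver_type: str) -> str:
    """Translate one SQL Server column type to a PostgreSQL type (hypothetical helper)."""
    match = _TYPE_RE.match(sqlserver_type)
    if match is None:
        raise ValueError(f"unrecognised type: {sqlserver_type!r}")
    base = match.group(1).upper()
    args = match.group(2)

    # PostgreSQL has no (MAX) length; use an unbounded type instead.
    if args is not None and args.strip().upper() == "MAX":
        return "BYTEA" if base == "VARBINARY" else "TEXT"

    # Types not in the map are passed through unchanged.
    target = TYPE_MAP.get(base, base)
    if args is None or "(" in target or target in NO_LENGTH:
        return target
    return f"{target}({args.strip()})"
```

For example, translate_type("NVARCHAR(50)") yields "VARCHAR(50)", while translate_type("NVARCHAR(MAX)") yields "TEXT". Types that are not in the map, such as INT or DECIMAL(10,2), pass through unchanged.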

Choosing the Right Database

The big data technology landscape offers so many choices that the most critical step in successfully implementing a solution is often choosing the right platform: one that addresses the requirements of the problem at hand and is sustainable in the long term.
With so many choices, though, a feature-by-feature comparison of all individual platforms is too complex and time consuming. However, it is possible to group these options by the data models they support.
The following technology comparison matrix compares the predominant data models, their capabilities, typical applications, and limitations:

Different Types of Databases

| Type | Examples | Capabilities | Applications | Limitations |
| Open Source Relational | MySQL, PostgreSQL, MariaDB | TBD | TBD | TBD |
| Proprietary RDBMS | Oracle, SQL Server | TBD | TBD | TBD |
| Key-Value | Berkeley DB, MemcacheDB, Redis, DynamoDB | The simplest model, where each object is retrieved with a unique key and values have no inherent model; utilize in-memory storage to provide fast access with optional persistence; other data models are built on top of this model to provide more complex objects | Applications requiring fast access to a large number of objects, such as caches or queues; applications that require fast-changing data environments like mobile, gaming, online ads | Cannot update a subset of a value; does not provide querying; as the number of objects becomes large, generating unique keys could become complex |
| Document-Oriented | MongoDB, CouchDB, Apache Solr, Elasticsearch | Extension of the key-value model, where the value is a structured document; documents can be highly complex, hierarchical data structures without requiring a pre-defined “schema”; supports queries on structured documents; search platforms are also document-oriented | Applications that need to manage a large variety of objects that differ in structure; large product catalogs in e-commerce, customer profiles, content management applications | No standard query syntax; query performance not linearly scalable; join queries across collections not efficient |
| Column-Oriented | Cassandra, BigTable, HBase, Apache Accumulo | Extension of the key-value model, where the value is a set of columns; a column can have multiple time-stamped versions; columns can be generated at run-time and not all rows need to have all columns | Storing large volumes of time-stamped data like event logs, sensor data; analytics that involve querying entire columns of data, such as trends or time series analytics | No join queries or sub-queries; limited support for aggregation; ordering is done per partition, specified at table creation time |
| Graph-Oriented | Neo4j, OrientDB, Apache Giraph, AllegroGraph | Models graphs consisting of nodes and edges with properties (meta-data) describing them; implement very fast graph traversal operations; also support indexing of meta-data to enable graph traversal combined with search queries | Applications that deal with objects with a large number of inter-relations; applications like social networking friends-networks, hierarchical role-based permissions, complex decision trees, maps, network topologies | Difficult to scale for large data sets for generic graphs; Giraph uses the Bulk Synchronous Parallel model to overcome some of the scalability limitations |
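
To make the differences between the four NoSQL data models in the matrix more concrete, the sketch below mimics each one with plain in-memory Python structures. It is only an illustration under simplifying assumptions: none of the names (kv_store, find_documents, put_cell, latest, reachable) correspond to the API of any product listed above, and real systems add persistence, distribution, and indexing.

```python
from collections import deque

# Key-value: the store sees each value as an opaque blob, so changing one
# field means rewriting the whole value.
kv_store = {}
kv_store["session:42"] = b'{"user": "alice", "cart": 3}'
kv_store["session:42"] = b'{"user": "alice", "cart": 4}'  # full overwrite

# Document-oriented: values are structured documents with no fixed schema,
# and the store can query on their fields.
documents = {
    "p1": {"name": "laptop", "specs": {"ram_gb": 16}},
    "p2": {"name": "mug", "color": "blue"},
}


def find_documents(field, value):
    """Return the ids of documents whose top-level field equals value."""
    return [doc_id for doc_id, doc in documents.items() if doc.get(field) == value]


# Column-oriented: row key -> column name -> time-stamped versions.
# Rows need not share the same columns.
column_store = {}


def put_cell(row, column, timestamp, value):
    """Add one time-stamped version of a cell, keeping versions ordered by time."""
    versions = column_store.setdefault(row, {}).setdefault(column, [])
    versions.append((timestamp, value))
    versions.sort(key=lambda cell: cell[0])


def latest(row, column):
    """Return the most recent value stored for a cell."""
    return column_store[row][column][-1][1]


# Graph-oriented: nodes and edges, queried by traversal.
edges = {"alice": ["bob"], "bob": ["carol"], "carol": [], "dave": []}


def reachable(start):
    """Breadth-first traversal: every node reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in edges.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen - {start}
```

In this sketch, find_documents("color", "blue") finds the mug even though the laptop document has no color field, and reachable("alice") follows edges through bob to carol. These are the kinds of queries that a plain key-value store cannot answer without reading and decoding every value.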

About Accion Labs

Accion Labs is a Pittsburgh-headquartered global technology firm that specializes in working with technology firms and IT organizations on emerging technologies such as Web 2.0, SaaS, Cloud, Open Source, BI/DW, Mobility, Automation, DevOps and Big Data. Our clients include software product firms, SaaS firms, e-commerce organizations and e-business organizations.

With an engineering headcount of over 600 spread across our 10 global offices, Accion engages with its clients in a range of collaborative, white-box engagement models that include extended teams, turn-key projects and professional staffing.

We recognize that your needs differ depending on your stage of growth and your business model. For start-up and early-stage firms, both B2B and B2C, we assist with pre- and post-funding product development and deployment. This helps our clients lower time-to-market and cost of delivery and benefit from our experience with product life-cycle best practices, while building a foundation for a world-class engineering organization. For mid-stage and late-stage firms, we offer core product life-cycle services such as new product development, maintenance/support, QA/testing, managed services, re-engineering and others. Besides the product engineering life cycle, we also offer a range of services for Professional Services and Product Support organizations.

We offer a range of engagement models such as strategic consulting, value-added staffing, turn-key projects, offshore-leveraged extended-delivery models and a number of outcome-oriented collaborative development models.

Accion Labs is led by an entrepreneurial management team that believes in execution, outcome, continuous learning and work-life balance. Accion Labs is venture-funded and privately held, with offices in multiple locations in the US, UK, India, Singapore and Malaysia.

References:

[1] M. A. Donald Feinberg, "The State of Open-Source RDBMSs," 2015.
[2] "RDBMS," [Online]. Available: https://en.wikipedia.org/wiki/Relational_database_management_system