Information Lifecycle Management for large data warehouses
Jim Barksdale, former Netscape CEO once said, “If we have data, let’s look at data. If all we have are opinions, let’s go with mine.”
In this modern era of data, Data Warehouses are key to decision making in most of the businesses. Maintaining these systems in good health is very critical to the success of an enterprise. Every business makes decisions based on historical data that is specific to a period of time. Determining the rate at which data is refreshed in these systems, the expiry date of data in the data warehouse & strategies on handling expired data in the data warehouse are collectively known as Information Life Cycle Management strategy (ILM) for Data Warehouse. A standard definition of ILM refers to – a process for managing information through its lifecycle, from conception until disposal, in a manner that optimizes storage and access at the lowest cost.
Every Business unit in each enterprise has varied ILM needs. Therefore, It is one of the fundamental duties of a Data warehouse architect to set up an ILM strategy. Let us now examine the definitions and importance of the three critical aspects of ILM to data warehousing environment –
- Refresh Rate or Frequency – This will determine how updated the data is for the decision making process. This rate is determined based on the frequency of the business processes that generate the data in the ecosystem and the duration is taken to perform the refresh of the data warehouse. It is a general practice to refresh data warehouse on a daily basis but it varies from case to case basis.
- Data Retention Policy – This will determine how much back in time will data need to be retained in its most granular format and at other levels of per-aggregation as well.
- Handling expired data – This will determine if data needs to be retained in any specific format for any audit/compliance requirements and the approach to be compliant.
Below mentioned are a few standard practices followed in order to perform such maintenance activities.
- Persisting the tables of the Staging area in the Data Warehouse
- Horizontal Partitioning of Fact Tables
- Usage of Mini Dimensions
- Designing Aggregate Fact Tables
- Archiving the dimension records that have no Fact coverage.
- Regular maintenance of Indexes
These activities help keep the data warehouse system healthy and to provide relevant and current data to enable effective decision making.