In recent years, there has been significant growth in the amount of data businesses create and use. In fact, IDC’s 2017 “Data Age 2025” study estimates that data is now growing at a rate of 40% per year, a pace at which the total volume of data roughly doubles every two years (1.4 × 1.4 ≈ 1.96). This growth has been driven in large part by the insight and product innovation organizations can achieve by harvesting and properly consuming information.
While data holds tremendous potential, such as new insights that enable new pharmaceuticals or more focused consumer products, that potential comes with clear challenges and possible liabilities. This has prompted many organizations to develop and adopt data governance programs designed to make their data clearer and more dependable. While this is a positive step, many basic approaches depend heavily on IT and storage without adequately addressing business needs. For example, to cope with data growth, an organization may focus on adding storage capacity to a single siloed application. In doing so, it may overlook the benefits of combining that application’s data in a more capable data mart with data from other processes or areas of the firm, which could give its product development teams richer information.
In previous posts in this blog series, we discussed the necessary elements of a successful data governance program. It’s also helpful to explore issues specific to particular types of data, which can complicate, or complement, a successful data governance program.
Redundant, Obsolete and Trivial Data
Many believe the majority of the data an organization captures is redundant, obsolete or trivial, including duplicate “copy” data. With tremendous growth in data sources, applications, repositories, and silos, data now resides in many places. When that data isn’t tagged appropriately or updated consistently, it adds to the already significant cost of data storage and stymies efforts to efficiently identify, retrieve, and process the data that is actually needed.
Data can also become stale: after a certain period, it may lose much of its effectiveness or become entirely irrelevant. Without the capability to tag and purge obsolete data, an organization will incur unnecessary storage and processing costs. Obsolete data can also hamper efforts to search for and retrieve information when needed. Both redundant and obsolete data also carry the risk of inappropriate disclosure or mismanagement of sensitive information.
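To make “tagging” redundant and obsolete data more concrete, here is a minimal sketch of what an automated first pass over a file share might look like. It assumes files live on a local or mounted file system, treats files with byte-for-byte identical contents (same SHA-256 hash) as redundant copies, and uses last-modified time as a rough proxy for staleness. The function names and the 365-day threshold are hypothetical choices for illustration, not a method described by Norwell Technology Group or Congruity360, and the output is a list of candidates for human review rather than an automatic purge.

```python
import hashlib
import time
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Optional

# Hypothetical retention threshold: files not modified for longer than this
# are flagged for review, not deleted automatically.
STALE_AFTER_DAYS = 365
SECONDS_PER_DAY = 86_400


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tag_rot_candidates(
    root: str,
    stale_after_days: int = STALE_AFTER_DAYS,
    now: Optional[float] = None,
) -> Dict[str, List[Path]]:
    """Tag exact duplicate copies and possibly obsolete files under `root`."""
    now = time.time() if now is None else now
    cutoff = now - stale_after_days * SECONDS_PER_DAY

    by_digest: Dict[str, List[Path]] = defaultdict(list)
    obsolete: List[Path] = []
    # Sorted traversal makes the "original" in each duplicate group deterministic.
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        by_digest[file_digest(path)].append(path)
        # Last-modified time is used here as a rough proxy for staleness.
        if path.stat().st_mtime < cutoff:
            obsolete.append(path)

    # Treat the first path in each identical-content group as the original;
    # every other path in the group is a redundant copy.
    redundant = [p for paths in by_digest.values() for p in paths[1:]]
    return {"redundant": redundant, "obsolete": obsolete}
```

For example, calling `tag_rot_candidates("/mnt/shared")` (a hypothetical path) returns two lists of file paths that a data steward could review before anything is archived or purged. Exact hashing only catches identical copies; near-duplicates, trivial data, and data whose value depends on business context still require classification by people who understand that context.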
Trivial data, or information with no significant inherent value, adds storage challenges and overhead expense, making effective searching and processing more difficult and costly. Information in this category may include social media data, blogs, texts, and other non-business-related data collected and stored by the organization. While some companies hold onto trivial data in the hope that it may be useful in the future, we believe effective data governance programs should specify how trivial information will be handled based on a current assessment of its value to the organization.
Risk Data
Data that presents some level of sensitivity and risk to your organization falls under the category of “risk data.” The amount of risk data organizations capture is growing, in part due to regulations such as those governing HIPAA-protected health information, social security numbers, sensitive financial information, criminal history records, and similar data.
Some data, such as email communications, might not normally be considered risk data. However, legal actions or regulatory “holds” and inquiries can cause it to be classified as such. Organizations must have the capabilities and protocols to identify, analyze, protect, and manage risk data in order to shield the organization from expense and potentially irreparable reputational damage.
Dark Data
An organization’s dark data is information it collects, processes, and stores as a byproduct of business processing but does not use for any other purpose. Examples include daily production volumes on a given machine, or spreadsheets that are created and used once but retained. Dark data is estimated to be growing at a rate comparable to the overall growth rate of data, and it is outpacing organizations’ ability to manage it and exploit whatever value it may hold, particularly because much dark data is unstructured.
Organizations store vast amounts of data for perceived compliance and recordkeeping requirements (or simply out of habit), yet much dark data is siloed and unstructured and receives very little analysis. In a January 3, 2017, IBM blog post on data usage, IBM estimated that only 1% of manufacturing data is analyzed. The real IT costs, lost productivity, and potential penalties associated with inadvertent disclosure of sensitive information may exceed any potential benefit of retaining dark data in the hope that it might be useful at some future point.
Effective Data Governance Programs Should Address All Types and Sources of Data
Your data governance plan should preserve data that has a reasonable probability of providing future value to your organization, while providing for the appropriate handling of data with lesser perceived value, including redundant, obsolete, trivial, risk and dark data. At the same time, it should make it easier for your company to mine the data with the greatest perceived business value.
If you have questions or wish to learn more about how to develop an effective data governance approach for your specific business and situation, Norwell Technology Group and Congruity360 can help. We’ll work with you to identify your top priorities and risks and develop cost-effective solutions designed to protect and better manage your crucial data assets, and your business. To learn more, contact Norwell Technology Group today online or call us at (877) 277-9648.