Every organization has large quantities of aging unstructured file system data
Remember the classic sci-fi TV show The Twilight Zone? One recurring theme of that series was to go back in time and imagine what would have happened if some event or person had never existed. Imagine what the world would be like today if cheap storage did not exist. In other words, what if disk drives hadn't experienced the enormous increase in capacity and decline in price of the past 20 years?
For one thing, data management practices would have evolved very differently. If you adhere to the truism that necessity is the mother of invention, data retention and purging practices and capabilities would likely be far more advanced than they are today. One downside might have been that the online mass-media revolution of iPods, YouTube, et al., would never have occurred. On the other hand, for corporate IT, given that only truly valuable data would be retained, perhaps supporting e-discovery would not be the challenge it is today.
The reality is that, thanks to abundant cheap disk space, we've become storage gluttons and are desperately playing catch-up in areas such as indexing and classification. Driven by the dual needs to address e-discovery liabilities and to control data run-rate costs, serious efforts are now under way within organizations to gain better control over data. While the starting point was e-mail, analysts have long been trumpeting both the risks and opportunities associated with unstructured data.
Every organization has large quantities of unstructured file system data, much of it sitting aging and untouched, with only a relatively small percentage of any significant value. Anxious to follow up on e-mail archiving successes, vendors have introduced improved versions of products that relocate file data to lower-cost storage while still providing access if or when it's ever needed. However, much of this capability is driven by metadata attributes (file type, owner, last access, etc.), not by intrinsic value, and therefore solves only part of the problem.
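To make the metadata-driven approach concrete, here is a minimal Python sketch of how such a selection might work. It is not any vendor's implementation; the function name find_archive_candidates, the one-year idle threshold and the extension filter are all assumptions chosen for illustration. It also assumes last-access times are being recorded, which is not the case on file systems mounted with access-time updates disabled.

```python
import time
from pathlib import Path


def find_archive_candidates(root, min_idle_days=365, extensions=None, now=None):
    """Return (path, size_bytes, owner_uid) for files not accessed recently.

    Hypothetical helper for illustration only: selection relies purely on
    file system metadata (type by extension, owner, last access), never on
    what the file actually contains.
    """
    now = time.time() if now is None else now
    cutoff = now - min_idle_days * 86400  # seconds per day

    candidates = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        # Optional file-type filter, e.g. {".doc", ".xls"}
        if extensions is not None and path.suffix.lower() not in extensions:
            continue
        info = path.stat()
        if info.st_atime <= cutoff:
            candidates.append((path, info.st_size, info.st_uid))
    return candidates
```

A call such as find_archive_candidates("/shares/projects", extensions={".doc"}) (the path is a placeholder) would list old documents that could be moved to cheaper storage. Note what the sketch cannot do: it has no way to tell a forgotten draft from a contract under legal hold, which is exactly the limitation described above.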
Classification based on actual content has always been problematic, but is essential to meet data liability concerns. Some organizations use dedicated document management applications, a largely user-driven manual effort that can be effective, but is often costly and complex. Over the past few years, products have been introduced to index and classify unstructured data based on content. Through the natural maturation process, these technologies have evolved to a point where they are now worth consideration, particularly in environments with significant e-discovery issues.
By: Jim Damoulakis