Sunday, September 16, 2007

Taming the World of Unstructured Data

Today's information comes in many types, shapes and sizes. It can be created, stored, shared, consumed and destroyed in myriad ways. Arguably the greatest benefit of the Internet revolution has been its ability to support the near-instantaneous dissemination of information, which often results in the creation of even more information. A recent example can be found in the phenomenon that is YouTube. By allowing users to quickly and easily create content (in YouTube's case, video) incorporating limitless contributions from others, the sheer amount of content being produced is exploding. Just how rapidly are these changes occurring? The most popular form of communication for many of today's workers - email - simply did not exist in a commercial sense 20 years ago.

Video and email have something else in common besides the rate at which they are being created: both are forms of unstructured or semistructured data. As opposed to structured data, which typically resides in a tightly controlled application, unstructured data consists of masses of (usually) computerized information that either does not have a data structure or has a data structure that is not easily readable by a machine. This latter factor has traditionally made unstructured data highly challenging to deal with in large quantities. Without the ability to automate the indexing, storage and handling of unstructured data, there is simply no effective way for computers to keep track of what is in each piece of unstructured data. For the average consumer, YouTube user or even rank-and-file employee, this does not present much of a problem because they can simply view a video or read an email and know what it is about. But today's businesses are experiencing significant heartache - and expending a great deal of money - in order to address the problem.

At roughly the same time information volumes (especially those of unstructured information) were exploding, two seemingly unrelated trends also gained critical momentum: compliance and e-discovery. Stringent statutory oversight of business was the byproduct of the morally bereft excesses of the Internet boom; names like Enron, WorldCom, HealthSouth, Adelphia and Tyco led to Sarbanes-Oxley, numerous Securities & Exchange Commission rules and many other compliance requirements. Corporations large and small are now required to institute extensive controls to ensure that (among other things) the data within their networks is known, tracked and accounted for at all times. Failure to implement and maintain such controls could lead to increased scrutiny, fines and, probably most harmful, bad publicity. Corporations now have to prove they are playing by the rules - but in order to do so, they need to be able to get a handle on the exploding volumes of data appearing on their networks every day.

----------------------------------------
Source: DMReview.com
By Craig Carpenter
