Thursday, October 23, 2008

Talking About De-Duplication

In Nursing Home Pension Fund v. Oracle, No. C 01-00988 SI, 2008 U.S. Dist. LEXIS 66740 (N.D. Calif. 2008), purchasers of Oracle Corp. stock brought a class action against Oracle and related defendants under the Securities Exchange Act of 1934. They alleged that the defendants made false and misleading statements regarding Oracle's earnings, the effects of the slowing economy on Oracle, and the functionality of a new Oracle suite of applications.

The federal district court for the Northern District of California worked through the plaintiffs' many motions for spoliation sanctions and granted two. Of particular interest to e-discovery practitioners is the sanctions motion based on Oracle's production of no more than 15 e-mails from the e-mail box of named defendant Larry Ellison, Oracle's chief executive officer. On this motion, the court granted the plaintiffs' requested "adverse-inference" instructions. What makes the ruling particularly interesting is that the motion appears to have been granted because the defendants did something that every producer of e-discovery does: de-duplicate. The opinion is therefore of great interest, and concern, to courts and practitioners everywhere.

WHAT IS DE-DUPLICATION?

At this point, some discussion of de-duplication may be helpful. De-duplication is the removal of exact copies of electronically stored information (ESI). Generally, e-discovery involves gathering ESI, such as e-mails, e-documents and databases, and processing it into a "review database." There, the text, metadata and other fields of each file can be reviewed, searched, sorted, redacted for privilege and produced. To build a review database, every e-file must first be uniquely identified. The best way to do this is to generate a "hash value" for each e-file. A hash value is created by feeding the ESI (here, e-mails and e-docs) into a complex "hash" algorithm as its input. That computation produces an alphanumeric value that is, for all practical purposes, unique to the content that was hashed. Thus, if three e-files have the same hash value, they are exact copies of the same file, regardless of how they are named.
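The idea that a hash value follows a file's content rather than its name can be illustrated with a short sketch. It assumes SHA-256 from Python's standard `hashlib` module. Which algorithm a particular EDD platform actually uses is not stated in the article. The file names and contents below are hypothetical.

```python
import hashlib

def hash_value(content: bytes) -> str:
    """Return the SHA-256 hex digest of a file's content (its name is not hashed)."""
    return hashlib.sha256(content).hexdigest()

# Hypothetical e-files: the names differ, but two of them share identical content.
files = {
    "memo_final.doc": b"Q3 earnings outlook remains strong.",
    "copy of memo.doc": b"Q3 earnings outlook remains strong.",
    "memo_draft.doc": b"Q3 earnings outlook remains strong!",  # one character differs
}

for name, content in files.items():
    # Print a shortened digest next to each file name for comparison.
    print(f"{hash_value(content)[:16]}  {name}")
```

The first two files print the same digest despite their different names. The draft, which differs by a single character, produces an entirely different value.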

After hash values have been generated for all e-files, an EDD processing application can "de-duplicate," i.e., compare e-files against a "pivot" file with the same hash value and remove the additional copies. A single user might have identical e-files stored in several places, and multiple users may have stored the same e-file. De-duplication within a single user, or "custodian," is often referred to as "vertical," while de-duplication across all custodians is often referred to as "horizontal" or "global."
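The difference between vertical and global de-duplication can be sketched in a few lines. This sketch makes several assumptions. It treats the first copy seen as the "pivot," it hashes raw content with SHA-256, and the `EFile` type, the `deduplicate` function, and the custodian names are all hypothetical. Real processing tools may choose pivots and hash e-mails differently.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class EFile:
    custodian: str   # whose files this copy was collected from
    path: str        # where the copy was stored
    content: bytes

    @property
    def hash_value(self) -> str:
        return hashlib.sha256(self.content).hexdigest()

def deduplicate(files, scope="global"):
    """Keep the first-seen copy (the "pivot") of each hash value and drop later copies.

    scope="vertical": duplicates are removed only within the same custodian.
    scope="global":   duplicates are removed across all custodians.
    """
    if scope not in ("vertical", "global"):
        raise ValueError("scope must be 'vertical' or 'global'")
    seen = set()
    kept = []
    for f in files:
        # Vertical de-duplication keys on (custodian, hash); global keys on hash alone.
        key = (f.custodian, f.hash_value) if scope == "vertical" else f.hash_value
        if key not in seen:
            seen.add(key)
            kept.append(f)
    return kept

collection = [
    EFile("custodian_a", "Inbox/forecast.msg", b"Forecast attached."),
    EFile("custodian_a", "Archive/forecast.msg", b"Forecast attached."),
    EFile("custodian_b", "Inbox/forecast.msg", b"Forecast attached."),
    EFile("custodian_b", "Sent/reply.msg", b"Thanks, reviewed."),
]

for scope in ("vertical", "global"):
    kept = deduplicate(collection, scope)
    print(scope, [(f.custodian, f.path) for f in kept])
```

Vertical de-duplication keeps three files, because custodian_b's copy of the forecast survives. Global de-duplication keeps only two. Under global de-duplication, custodian_b's copy is never produced from that custodian's files, and this sketch keeps no record of where the dropped copies came from. A production workflow would likely want to preserve that provenance.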

-------------------------------------------
Source: law.com
By: Leonard Deutchman
