Electronically stored information (ESI) is increasing exponentially, and the bills that law firms and discovery vendors charge to deal with this vast sea of data escalate significantly each year. Jason Baron, director of litigation at the National Archives and Records Administration, believes that ESI is growing so fast that, even with unlimited funds and human resources, it will soon be impossible for humans to review these large document populations.[FOOTNOTE 1] Still, lawyers facing potential malpractice claims and sanctions are loath to try new methods for handling the problem. It is time for change.
The traditional means litigators use to address ESI is the application of keywords and Boolean search terms to identify relevant and non-privileged materials.[FOOTNOTE 2] A recent article in this publication, while acknowledging that this method is unquestionably deficient, concluded that "the available evidence suggests that keyword and Boolean searches remain the state of the art and the most appropriate search technology for most cases."[FOOTNOTE 3] We agree that in a perfect world, where the parties meet and confer and agree upon keywords that reduce the document population to manageable proportions, the traditional judgmental method can be made to work. However, this is an imperfect world, in which plaintiffs and defendants do not always agree and are not always equally motivated to reduce costs. In fact, it is often quite the opposite. Moreover, even where the sides use judgmental sampling to agree upon keywords, costs usually remain too high.
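For readers unfamiliar with the mechanics, a keyword-and-Boolean cull simply keeps every document that matches an agreed combination of terms (OR, AND, NOT) and discards the rest. The Python sketch below is a hypothetical illustration only: the documents, the terms and the helper names `contains` and `boolean_filter` are invented for this example, and real review platforms offer far richer query syntax.

```python
import re


def contains(doc: str, term: str) -> bool:
    """True if `term` appears in `doc` as a whole word (case-insensitive)."""
    return re.search(rf"\b{re.escape(term)}\b", doc, flags=re.IGNORECASE) is not None


def boolean_filter(docs, any_of, none_of):
    """Keep documents that match at least one term in `any_of` (OR)
    AND match none of the terms in `none_of` (NOT)."""
    return [
        d for d in docs
        if any(contains(d, t) for t in any_of)
        and not any(contains(d, t) for t in none_of)
    ]


# Hypothetical mini-collection and hypothetical agreed search terms:
# ("contract" OR "agreement") AND NOT "newsletter"
docs = [
    "Draft supply agreement attached for review.",
    "Quarterly newsletter: new contract templates available.",
    "Lunch on Friday?",
    "We need to renegotiate the contract terms.",
]
hits = boolean_filter(docs, any_of=["contract", "agreement"], none_of=["newsletter"])
print(hits)
```

The literalness that makes such searches easy to negotiate also limits them. With whole-word matching, "contracts" does not match "contract", and a relevant email that never uses any agreed term is silently excluded. Those misses are exactly what the recall problem described in the next section measures.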
THE JUDGMENTAL APPROACH
The judgmental approach to keywords ultimately fails on two measures: "recall" and "precision." "Recall" measures how completely a process captures the target data, that is, the share of all relevant documents that the search actually retrieves. "Precision" measures efficiency: the share of retrieved documents that are actually relevant, as opposed to irrelevant data swept in along with the target data. Keywords, as lawyers judgmentally use them, recall too little while capturing much that is irrelevant. An early landmark empirical study by David Blair and M.E. Maron[FOOTNOTE 4] showed that while lawyers believed they were retrieving about 75 percent of the relevant data, the actual figure was closer to 20 percent. A subsequent study, conducted by the Text REtrieval Conference,[FOOTNOTE 5] confirmed this result, finding that keyword search techniques recalled only 22 percent of relevant documents, as opposed to approximately 78 percent found by other search techniques.[FOOTNOTE 6] Many lawyers will also tell you that reviewers commonly find only 10 to 40 percent of the retrieved documents to be relevant. In other words, precision is low, and lawyers are reading mostly junk.
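Both measures reduce to simple ratios, which the short Python sketch below makes concrete. The document counts are hypothetical. They were chosen only to fall within the ranges discussed above and are not data from the Blair and Maron study or the Text REtrieval Conference.

```python
def recall(relevant_retrieved: int, relevant_total: int) -> float:
    """Share of all relevant documents in the collection that the search found."""
    return relevant_retrieved / relevant_total


def precision(relevant_retrieved: int, retrieved_total: int) -> float:
    """Share of the documents the search returned that are actually relevant."""
    return relevant_retrieved / retrieved_total


# Hypothetical collection: 1,000 relevant documents exist somewhere in the data.
# A keyword search returns 800 documents, of which only 200 are relevant.
r = recall(200, 1_000)   # 0.20: 800 relevant documents never surface for review
p = precision(200, 800)  # 0.25: reviewers read 600 irrelevant documents, 3 per relevant hit
print(f"recall={r:.0%} precision={p:.0%}")
```

In this illustration the search fails on both measures at once. Four out of five relevant documents are missed, and three out of four documents the lawyers do read are irrelevant. Paying for more review time cannot recover the first loss, because the missed documents were never retrieved in the first place.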