© 2012 Winston & Strawn LLP
eDiscovery and Technology Assisted Review: What You Need to Know Now Brought to you by Winston & Strawn’s eDiscovery & Information Management Practice Group
Today’s eLunch Presenters
John Rosenthal
Chair, eDiscovery & Information Management, Washington, DC
[email protected]

Chris Costello
eDiscovery & Information Management, New York
[email protected]
Welcome!
Overview
Technology Assisted Review (“TAR”)
• Definitions
• Types of technology assisted review
• Predictive coding

Proposed Changes to the Federal Rules
• Why the need for further changes in the rules
• Overview of the rules process
• Two-track approach of the Advisory Committee: Rule 37; Duke Conference (Rules 16 and 26)
Let’s Look at the Numbers
FTI survey of 31 in-house general counsel:
• 94% found the cost of e-discovery “frustrating”
• 87% used early case assessment to try to resolve matters earlier
• 81% brought software in-house, which helps cut law firm or service provider fees
• 52% brought staff in-house to help reduce fees spent on law firms or service providers
• 32% used clustering or visualization tools to speed review along (down from 34% in 2010)
• 71% used contract attorneys for legal review (down from 77% in 2010)
• 61% were able to quantify how much money they spent on e-discovery; many companies are still unaware of their spending habits
• 42% have a tool to collect and preserve data from the cloud or from social media
E-Discovery Spend
Fulbright Annual Litigation Trends Survey
Electronic Document Review
Excessive and unpredictable costs:
• 58% to 70% of total litigation costs
• Document review costs are rising due to the increasing amount of electronic information

Traditional document review is not accurate:
• Evidence suggests that there are high error rates in linear manual review
• Error rates lead to likelihood of inadvertent production of privileged or sensitive information

Inability to defend the review process:
• Judges are increasingly focusing on the need for validation of review processes
Traditional Electronic Document Review = Linear Review
• Over-collection
• Little or no culling
• Ad hoc use of Boolean searches
• Linear review of the data set
• Use of traditional associate work force to perform review

Traditional approach: manually acquire broad amounts of data → process data → first level review → second level review → produced documents
Goals of ESI Review
• Recall: identification and prioritization of relevant material
• Precision: elimination of irrelevant/non-responsive material
• Identification of privileged material
[Venn diagram: retrieved documents vs. relevant documents. Regions: relevant and retrieved (the overlap); relevant and not retrieved; non-relevant and retrieved.]
Accuracy of Human Review
Recall = (number of responsive documents retrieved) ÷ (total number of responsive documents in the collection)

Precision = (number of responsive documents identified) ÷ (total number of documents retrieved)
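The two ratios above can be computed directly from a coded sample; a minimal sketch, with hypothetical document IDs:

```python
def recall_precision(retrieved, responsive):
    """Compute recall and precision given two sets of document IDs.

    retrieved  -- documents the review process flagged as responsive
    responsive -- documents that are actually responsive (ground truth)
    """
    true_positives = len(retrieved & responsive)
    recall = true_positives / len(responsive)    # found / all responsive
    precision = true_positives / len(retrieved)  # found / all retrieved
    return recall, precision

# Hypothetical example: 10 responsive docs exist in the collection;
# the review retrieved 8 docs, 6 of which are actually responsive.
responsive = set(range(1, 11))
retrieved = {1, 2, 3, 4, 5, 6, 101, 102}
r, p = recall_precision(retrieved, responsive)
print(r, p)  # 0.6 recall, 0.75 precision
```

The two measures trade off: retrieving everything gives perfect recall but terrible precision, which is why both are reported.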
Accuracy of Human Review vs. Perfection
[Scatter plot: precision (0% to 100%) vs. recall (0% to 100%) for human-review studies, including Blair & Maron (1985), Voorhees, Roitblat, and several TREC results, all falling well short of perfect recall and precision.]
The Sedona Conference Commentary on the Use of Search and Information Retrieval
“[T]here appears to be a myth that manual review by humans of large amounts of information is as accurate and complete as possible – perhaps even perfect – and constitutes the gold standard by which all searches should be measured. Even assuming that the profession had the time and resources to continue to conduct manual review of massive sets of electronic data sets (which it does not), the relative efficacy of that approach versus utilizing newly developed automated methods of review remains very much open to debate.”
2011 RAND Study re E-Discovery
“Taken together, this body of research shows that groups of human reviewers exhibit significant inconsistency when examining the same set of documents for responsiveness under conditions similar to those in large-scale reviews. . . . Human error in applying the criteria for inclusion appears to be the primary culprit [regarding the lack of accuracy], not a lack of clarity in the document’s meaning or ambiguity in how the scope of the production demand should be interpreted. In other words, people make mistakes, and based on the evidence, they make them regularly when it comes to judging relevancy or responsiveness.”
Ralph Losey Revised EDRM Model
Document Review Models

Outsourced Manual Review
• Most prominent model used today
• Limited culling and analysis
• Heavy reliance on attorney review
• Use of sampling to ensure quality control

Technology Assisted Reviews
• Process approach to review to increase efficiency, recall, and precision, using legally accepted tool sets: threading, near-duping, advanced search, clustering
Predictive Coding • Great deal of confusion regarding what it means • Uses attorneys to develop a seed set of data that can be fed into a black box to find similar documents • Emphasizes sampling of inclusion set and exclusion set • Only a handful of courts have addressed its use
Technology Assisted Review
Meta-Data Context
Boolean Queries
Wildcard expansions
Proximity Specification
Misspellings/Fuzzy Search
Synonyms
Dupe and Near Dupe
Threading
Concept/clustering engines
LSI, LSA, PLSA
Predictive coding
Technology Assisted Reviews: Analytical Stages

Collection
• Working with the client and data to develop a set of defensible “relevance criteria” to select data subject to review

Processing, Filtering and Culling
• Use of search and retrieval at the front end can dramatically reduce the volume and cost
• Risk consideration
• Employ more sophisticated processing tools to further reduce the volume set
• Unilaterally vs. negotiated

Non-Linear Review
• Clustering/concepting
• Threading
• Near-duping
• Predictive coding
E-Mail Threads
• 70% of production is e-mail, and of that nearly 65% or more are part of e-mail threads

The Problem:
• No clear method to identify e-mail threads
• E-mails are reviewed multiple times and inconsistently
• Extremely difficult to identify where missing e-mails exist

Step 1: Group into e-mail sets; build tree structure; identify missing links
Step 2: Suppress duplicates; focus on inclusives

Result: less time, fewer errors, less cost
Source: Equivio
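Equivio's engine is proprietary, but the grouping step can be illustrated with the standard e-mail reply headers; a toy sketch, assuming each message carries an `id` (its Message-ID) and optionally `in_reply_to`:

```python
from collections import defaultdict

def group_threads(messages):
    """Group e-mails into thread sets by walking In-Reply-To links.

    messages -- list of dicts with an 'id' key and optional 'in_reply_to'
    Returns {thread_root_id: [message ids]}. Replies to an id we never
    collected cluster under that unknown id -- which is also how missing
    links in a thread are surfaced.
    """
    parent = {m["id"]: m.get("in_reply_to") for m in messages}

    def root(mid):
        while parent.get(mid) is not None:
            nxt = parent[mid]
            if nxt not in parent:  # missing e-mail: group under its id anyway
                return nxt
            mid = nxt
        return mid

    threads = defaultdict(list)
    for m in messages:
        threads[root(m["id"])].append(m["id"])
    return dict(threads)
```

Once grouped, a reviewer need only read each thread's "inclusive" messages (those whose text is not contained in a later reply), which is where the time savings come from.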
Duplication and Near Duplication
• 15% to 40% of the document population are duplicates or near duplicates

The Problem:
• No clear method to organize and allocate documents across reviewers
• Documents are reviewed multiple times by different reviewers
• High risk of different coding among similar documents

Step 1: Group the near-duplicates; identify the differences among the near duplicates
Step 2: Assign near-dupe sets to reviewers for coherent review; reviewers prioritize and review only the differences; apply coding to entire near-dupe sets where appropriate

Result: less time, fewer errors, less cost
Source: Equivio
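Again the commercial engines are proprietary; a common textbook approach to the grouping step is Jaccard similarity over word shingles. A minimal sketch (the 0.6 threshold is an illustrative assumption, not any product's setting):

```python
def shingles(text, k=3):
    """Set of overlapping k-word shingles from a document's text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two shingle sets (0.0 to 1.0)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def near_dupe_groups(docs, threshold=0.6):
    """Greedy single-pass grouping: each document joins the first group
    whose representative it resembles, else starts a new group."""
    groups = []  # list of (representative_shingles, [doc indices])
    for i, text in enumerate(docs):
        s = shingles(text)
        for rep, members in groups:
            if jaccard(s, rep) >= threshold:
                members.append(i)
                break
        else:
            groups.append((s, [i]))
    return [members for _, members in groups]
```

Production tools additionally report *which* passages differ within a group, which is what lets reviewers review only the differences.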
Clustering or Concepting

Concept search places a document (or part of a document) in a semantic space; results are returned in order of relevance, with a higher score indicating a closer document:
• Document 1: 98
• Document 3: 92
• Document 4: 91

Source: K-Cura Corp.
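The vendor's scoring is proprietary; the "higher score = closer document" idea can be illustrated with plain cosine similarity over bag-of-words vectors (the scores this toy produces will not match the slide's):

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def rank(query, docs):
    """Return (doc_index, score) pairs, best match first."""
    q = Counter(query.lower().split())
    scored = [(i, cosine(q, Counter(d.lower().split())))
              for i, d in enumerate(docs)]
    return sorted(scored, key=lambda x: -x[1])
```

Real concept engines (LSI, PLSA, and kin) first project documents into a reduced concept space so that synonyms score as close even with no shared words; the ranking step is the same idea.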
“Predictive Coding”
What is Predictive Coding?
[Diagram: document set for review. Source: Servient Inc., http://www.servient.com/]
Use Cases for Predictive Coding
• Early case assessment
• Relevance inclusion
• Relevance exclusion
• Pre-review tagging
• Pre-review batching
• Privileged review
• Review of incoming productions
• Internal investigations
Limitations on Predictive Coding
As with any statistical model, caution should be exercised (“Torture numbers, and they’ll confess to anything”). Garbage in = garbage out.

Limitations:
• Not right for all types of cases
• Size matters
• Unable to address: images, graphics, Excel files, video, voice
• Confidentiality
Do you Need to Understand the Technology?
“Muddy water is best cleared by leaving it alone.” Alan Wilson Watts
• Very few individuals in the industry will ever understand the technology
• Even fewer people would know how to attack the technology
• Does the technology matter?
• Not all TAR software is created equal: the same seed set put into different TAR software will yield vastly different results
Defending the Technology?
What is the basic underlying technology?
• Support Vector Machines (SVM): patterns are determined and categorized from positive examples (relevant documents) and negative examples (irrelevant documents), and new examples are classified in one category or the other based on whether these patterns appear in them
• Probabilistic Latent Semantic Analysis (PLSA): documents are categorized by detecting concepts through a statistical analysis of word contexts; documents are grouped based on probabilities of the number of times words occur together
• Other potential algorithms that generate correlations and categorizations

What has the vendor done to explain the technology? To defend it?
How can the technology be abused or misused? What are its limitations?

© 2012 Technology Concepts & Design, Inc.
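The common thread in these algorithms is learning a decision function from positive and negative seed examples. A real SVM needs a numerical library, but the flavor can be sketched with a perceptron, a simpler relative that also learns a linear separator (a stand-in for illustration, not any vendor's engine; the documents and labels are hypothetical):

```python
from collections import Counter

def featurize(text):
    """Bag-of-words feature vector for a document."""
    return Counter(text.lower().split())

def train_linear(seed_set, epochs=20):
    """Perceptron: learn one weight per word from coded seed documents.

    seed_set -- list of (text, label), label +1 (relevant) / -1 (irrelevant)
    An SVM learns a similar linear function but picks the maximum-margin
    separator rather than any separator that fits the seeds.
    """
    w = Counter()
    for _ in range(epochs):
        for text, label in seed_set:
            f = featurize(text)
            score = sum(w[t] * c for t, c in f.items())
            if score * label <= 0:           # misclassified: nudge weights
                for t, c in f.items():
                    w[t] += label * c
    return w

def classify(w, text):
    """+1 if the learned function scores the document as relevant."""
    f = featurize(text)
    return 1 if sum(w[t] * c for t, c in f.items()) > 0 else -1
```

The "black box" the slides mention is, at bottom, a trained weight function like `w`; the litigation questions are about how the seeds were chosen and how the function was validated.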
Stages to a Predictive Coding Process?
• Team selection
• Culling
• Selection of control set
• Iterative training of the system
• Selection of sensitivity
• Quality control of corpus
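The middle stages form a loop: train on a batch of attorney-coded documents, measure against the held-out control set, and stop when the measure stabilizes. A schematic sketch, where the stopping window and 1% tolerance are illustrative assumptions, not any vendor's actual stabilization criteria:

```python
def iterative_training(train_batches, control_set, train, evaluate,
                       stability_window=2, tolerance=0.01):
    """Feed coded batches to a model until control-set quality stabilizes.

    train_batches -- iterable of attorney-coded document batches
    control_set   -- held-out coded documents, never used for training
    train(batch)      -- callback: update the model with one coded batch
    evaluate(control) -- callback: quality score (e.g. F1) on the control set
    Stops once the score has moved less than `tolerance` over
    `stability_window` consecutive rounds.
    """
    history = []
    for rounds, batch in enumerate(train_batches, start=1):
        train(batch)
        history.append(evaluate(control_set))
        recent = history[-(stability_window + 1):]
        if len(recent) > stability_window and \
           max(recent) - min(recent) < tolerance:
            return rounds, history
    return len(history), history
```

Keeping the control set out of training is the point of the "Selection of Control Set" stage: quality measured on documents the model trained on would be meaninglessly optimistic.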
The Process
• Who designed, implemented, and supervised the process?
• What should your team look like? Senior partner? Contract attorney?
• How many people should be on the team?
The Process
Selection of the control set?
• Size?
• Random or targeted?
• Entire corpus or issue driven?
• Entire documents or selected portions?
• Richness of the data?

Training the system?
• Iterations?
• How are conflicts resolved?
• Is it more important to focus on inclusive or exclusive documents?
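On the "Size?" question, random control sets are typically sized with the standard sample-size formula for estimating a proportion; a sketch where the 95% confidence, ±2% margin, and worst-case 50% richness are illustrative inputs, not a protocol requirement:

```python
import math

def sample_size(confidence_z=1.96, margin=0.02, richness=0.5):
    """Required random sample size, n = z^2 * p(1-p) / e^2
    (infinite-population approximation).

    confidence_z -- z-score for the confidence level (1.96 ~ 95%)
    margin       -- acceptable margin of error (as a proportion)
    richness     -- expected proportion of relevant documents
                    (0.5 is the conservative worst case)
    """
    return math.ceil(confidence_z ** 2 * richness * (1 - richness)
                     / margin ** 2)

print(sample_size())             # 2401 docs at 95% confidence, ±2%
print(sample_size(margin=0.05))  # 385 docs at 95% confidence, ±5%
```

Note the formula does not depend on corpus size (for large corpora), which is why even very large matters can be sampled with a few thousand documents.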
Stabilization criteria
Source: Equivio
The Process
Sensitivity
Source: Equivio
The Process
Quality control of remaining corpus
Written sampling protocol? How much do you look at? When do you need to retrain?
Predictive Coding Decisions
• Da Silva Moore v. Publicis Groupe, No. 11 Civ. 1279 (S.D.N.Y. Feb. 24, 2012) (Peck): “[t]he Court determined that the use of predictive coding was appropriate considering … the superiority of computer-assisted review to the available alternatives (i.e., linear manual review or keyword searches).”
• Global Aerospace v. Landow Aviation, No. CL 61040 (Va. Cir. Ct. Loudoun Co. Apr. 23, 2012): Virginia state judge approved use of predictive coding where defendant stated that it would achieve recall of 76.7%.
• Kleen Products v. Packaging Corp. (N.D. Ill.) (Nolan): refused plaintiffs' request to force defendants to use predictive coding over search terms.
• In re Actos (Pioglitazone) Products Liability Litigation (W.D. La. 2012) (Doherty): order setting forth a detailed protocol for the use of predictive coding.
Moore v. Publicis & MSL
Judge Peck – “This judicial opinion now recognizes that computer-assisted review is an acceptable way to search for relevant ESI in appropriate cases.” “The technology exists and should be used where appropriate, but it is not a case of machine replacing humans: it is the process used and the interaction of man and machine that the court needs to examine.”
Judge Peck’s Key Takeaways
Process Transparency Proportionality Cooperation Competence
Kleen Products, LLC v. Packaging Corp. of America, et al. (J. Nolan), N.D. Ill., Case No. 1:10-cv-05711 (Sept. 9, 2010)

• Class action antitrust suit filed in 2010
• Plaintiffs requested that Defendants use predictive coding (Feb. 2012)
• Defendants (7 paper companies) had already produced over 1 million documents using traditional keyword-based search terms on key custodians; thousands of hours of review (99% complete); no glaring omissions
• 3 days of hearings
Global Aerospace v. Landow Aviation (J. Chamblin), Va. Cir. Ct. (Loudoun), No. CL-61040 (Apr. 23, 2012)

• Protective order: defendants may use predictive coding to process and produce documents.
• Explains that predictive coding meets the duty under Virginia law to use reasonable inquiry and care in discovery.
• Contrasts predictive coding with linear human review and keyword searches.
• Takeaway: such opinions will dominate.
In re Actos (Pioglitazone) Products Liability Litigation, MDL No. 2299 (W.D. La. 2012) (Doherty)

• Court issued ESI protocol utilizing predictive coding (Equivio Relevance)
• Select 4 custodians for creation of a sample collection population; parties to select three “experts” to work collaboratively to train the system
• Parties to meet and confer on the relevance scores generated using the sample collection and to decide on a “cutoff” score
• Iterative training phase until the system reaches stability
• Post-predictive-coding “meet and confer” to finalize the method for searching for documents
• Results still a long way off
Defense of Process
• Legal standard: does not exist
• Documentation vs. transparency
• Transparency: Is it required? How much is too much?
Lessons Learned
• The Sedona Conference Cooperation Proclamation is gaining traction among the judiciary, especially as it applies to TAR/predictive coding.
• Discussions concerning use of predictive coding should occur early and often (e.g., disclosure of seed sets and the process involved; acceptable rates of recall and precision; number of iterations).
• Counsel needs to be cognizant of the strengths and weaknesses of the various TAR/predictive coding software and prepared to discuss how best to implement it.
• Clients should inquire as to the use of predictive coding, its appropriateness in the case at hand, and its cost-saving potential.
• Although predictive coding is not appropriate in all circumstances, courts are beginning to accept its use as a means to handle high-volume, complex litigation, where it can serve to reduce overall costs and increase recall and precision.
Moving Forward
• Expect to see more instances where predictive coding gets a judicial stamp of approval.
• Use of predictive coding continues moving into investigations and review of documents produced by opposing parties, to speed reviews.
• Expect to see more instances where clients push for the cost savings and benefits of using predictive coding.
• More in-depth discussions of predictive coding methodologies, proportionality, and sharing of data between counsel prior to Rule 26(f) conferences, and longer, multi-day conferences as parties try to agree on protocols implementing TAR/predictive coding strategies.
• Focus on the process and the transparency of the software/predictive coding protocol.
• Increased importance of developing highly trained and experienced “experts” to develop sample/seed sets.
• Loss in revenue from linear review, and a shifting law firm approach to embrace new technologies and roles for lawyers and paralegals.
Update on Federal Rule Process
Overview of Rules Process
• Discovery Subcommittee (preservation: triggers, scope, sanctions)
• Duke Subcommittee (proportionality, cooperation, early case management)

Scope of potential amendments
• Rule 1
• Rule 26
• Rule 16
• Rule 37
• Rule 45 (already in progress)
Federal Rules Process
2010
• American College of Trial Lawyers study
• Sedona Conference on the Future of Civil Litigation
• Duke Conference on the Future of Civil Litigation

2011
• Call for a “mini-conference” in September
• Mini-conference occurs in Dallas (Sept. 9)
• Submissions by law firms, corporations, and academia

2012
• FJC Early Stages of Litigation Report
• RAND Report
• Sedona Conference
• Proposed Draft Rules
Microsoft’s Comments
• Preserved: 48,431,250 pages
• Collected and processed: 12,915,000 pages
• Reviewed: 645,750 pages
• Produced: 141,450 pages
• Used: 142 pages
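The funnel shrinks by orders of magnitude at each stage, and the ratios are easy to check from the figures above:

```python
# Page counts from Microsoft's comments to the Rules Committee.
preserved, collected, reviewed, produced, used = (
    48_431_250, 12_915_000, 645_750, 141_450, 142)

# Each stage as a share of what was originally preserved.
for label, n in [("collected", collected), ("reviewed", reviewed),
                 ("produced", produced), ("used", used)]:
    print(f"{label}: {n / preserved:.4%} of preserved pages")

# Exactly 5% of what was collected was ever reviewed.
print(f"review rate: {reviewed / collected:.0%}")
```

In other words, pages actually used in the litigation amount to roughly three ten-thousandths of one percent of the pages preserved, which is the point of the slide.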
Rule 37(e) Proposal

(e) FAILURE TO PRESERVE DISCOVERABLE INFORMATION. If a party fails to preserve discoverable information that reasonably should be preserved in the anticipation or conduct of litigation,
(1) The court may permit additional discovery, order the party to undertake curative measures, or require the party to pay the reasonable expenses, including attorney’s fees, caused by the failure.
(2) The court may impose any of the sanctions listed in Rule 37(b)(2)(A) or give an adverse-inference jury instruction only if the court finds:
(A) that the failure was willful or in bad faith and caused substantial prejudice in the litigation; or
(B) that the failure irreparably deprived a party of any meaningful opportunity to present a claim or defense.
Rule 37(e) Proposal (cont’d)

(3) In determining whether a party failed to preserve discoverable information that reasonably should have been preserved, and whether the failure was willful or in bad faith, the court should consider all relevant factors, including:
(A) the extent to which the party was on notice that litigation was likely and that the information would be discoverable;
(B) the reasonableness of the party’s efforts to preserve the information, including the use of a litigation hold and the scope of the preservation efforts;
(C) whether the party received a request that information be preserved, the clarity and reasonableness of the request, and whether the person who made the request and the party engaged in good-faith consultation regarding the scope of preservation;
(D) the party’s resources and sophistication in litigation;
(E) the proportionality of the preservation efforts to any anticipated or ongoing litigation; and
(F) whether the party sought timely guidance from the court regarding any unresolved disputes concerning the preservation of discoverable information.
What Will Happen?
Questions?
Thank You.