How AI and Machine Learning Are Impacting the Litigation Landscape


Mike DeCesaris and Sachin Sancheti detail how expert witnesses are incorporating artificial intelligence and machine learning into their testimony in a variety of civil cases.

Artificial intelligence has long been present in our everyday activities, from a simple Google search to keeping your car centered in its lane on the highway. The public unveiling of ChatGPT in late 2022, however, brought the power of AI closer to home, making it accessible to anyone with a web browser. And in the legal industry, we are seeing the use of AI and machine learning ramp up in litigation, especially when it comes to expert witness preparation and testimony.

Supporting expert witnesses has always required leading-edge analytical tools and data science techniques, and AI and machine learning are increasingly important tools in experts’ arsenals. The idea of technology that can “think” and make decisions, accomplishing tasks more quickly and with better results than humans, conjures thoughts of a “Jetsons-like” world run by robots. But unlike the old Jetsons cartoons of the 1960s, where flying cars were the default mode of transport and robot attendants addressed every need, the “futuristic” ideas about the impact of AI are not far from a rapidly approaching reality. In fact, as older, rules-based AI has evolved into machine learning (ML), in which computers learn to predict outcomes from patterns found in massive data sets, the legal industry has found that AI can do far more than many imagined.

In the world of litigation, law firms and economic and financial consulting firms have understood the power of AI and ML for years. AI is ideally suited to support, qualify, and substantiate expert work in litigation matters, which formerly relied on heavily manual processes, and it can improve both the efficiency of that work and the quality of the data presented in testimony. Moreover, over the last several years, both plaintiff-side and defense-side experts have used AI and ML directly in their testimony.

Somewhat ironically, humans are at least partially responsible for driving the increased use of AI and ML in expert work as we produce ever-growing volumes of user-generated content. Consumer reviews and social media posts, for example, are becoming increasingly relevant in regulatory and litigation matters, including consumer fraud and product liability cases. The volume of this content can be overwhelming, so one familiar approach involves leveraging keywords to identify a more manageable subset of data for review. This is limiting, however, as it often produces results that are irrelevant to the case while omitting relevant results containing novel language. By contrast, ML-based approaches can consider the entire text, using context and syntax to identify the linguistic elements that most accurately indicate relevance.
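As a rough illustration of the contrast drawn above, the sketch below compares a keyword filter with a small learned classifier. Everything in it is an assumption made for illustration: the posts are invented, `keyword_match` and `TinyNaiveBayes` are hypothetical names, and a bag-of-words naive Bayes model is a much simpler stand-in for the context-aware models described here (it looks at the whole text but ignores word order and syntax).

```python
import math
import re
from collections import Counter

def words(text):
    """Lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def keyword_match(text, keywords):
    """Keyword approach: flag a post if it contains any listed keyword."""
    return any(w in keywords for w in words(text))

class TinyNaiveBayes:
    """Bag-of-words naive Bayes with add-one smoothing (illustrative stand-in)."""

    def fit(self, labeled_posts):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = Counter()
        for text, relevant in labeled_posts:
            self.doc_counts[relevant] += 1
            self.word_counts[relevant].update(words(text))
        self.vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        return self

    def log_score(self, text, label):
        total = sum(self.word_counts[label].values())
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for w in words(text):
            if w in self.vocab:  # words never seen in training are skipped
                count = self.word_counts[label][w]
                score += math.log((count + 1) / (total + len(self.vocab)))
        return score

    def is_relevant(self, text):
        return self.log_score(text, True) > self.log_score(text, False)

# Invented training posts for a hypothetical overheating-product matter.
training = [
    ("the heater caught fire and scorched my wall", True),
    ("heater melted the outlet and started smoking", True),
    ("smoke came out of the heater and the plug melted", True),
    ("this new album is fire", False),
    ("that concert was fire, loved the show", False),
    ("fire sale on shoes this weekend", False),
]
model = TinyNaiveBayes().fit(training)

slang_post = "this mixtape is fire"                         # irrelevant, has keyword
novel_post = "my outlet melted and the wall is scorched"    # relevant, no keyword

for post in (slang_post, novel_post):
    print(post, "| keyword:", keyword_match(post, {"fire"}),
          "| model:", model.is_relevant(post))
```

On these invented posts, the keyword filter flags the slang post and misses the post that describes the problem in different words, while the learned model does the reverse. With real data, the outcome would depend on the quality and size of the labeled sample.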

To see this approach in action, consider litigation involving alleged marketing misrepresentations or defamatory statements, which requires an examination of the at-issue content. The most robust analyses are systematic and objective, which makes them well suited to the noncontroversial training data and impartial models that are hallmarks of state-of-the-art AI and ML approaches.

AI and ML have also proven to be valuable tools for experts across a broad spectrum of consumer fraud and product liability matters. While some applications may be obvious, experts have the creativity to adapt a solution to other use cases. These novel uses include:

Domain-specific sentiment analysis – Publicly available sentiment models perform well on many problems but often fail on tasks that feature domain-specific linguistic structures. Such failures can arise when a model is tasked with measuring the sentiment surrounding an entity in an industry whose discussion features novel or counterintuitive language. Consider a defamation suit filed by a fitness influencer. Terms like “confusion,” “resistance,” and “to failure” generally have negative connotations, but in the fitness space they are often used to describe a successful workout. Likewise, slang terms like “guns” and “shredded” mean something entirely different in the fitness context than in conventional use. In these cases, a general-purpose sentiment model may mischaracterize or overlook such language, while a domain-specific sentiment model trained for the purpose will provide a more accurate assessment of the sentiment contained in allegedly defamatory statements. This training process could involve gathering hundreds of thousands of user-generated reviews for industry products, and then directing a context-aware language model to predict the review score from the text. This custom model will quantify the polarity of the discussion surrounding the influencer, which can then be tracked through time and around certain critical events.
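The training process described above, predicting a review score from review text, can be sketched in miniature. This is only an illustration under stated assumptions: the six reviews are invented, the function names are hypothetical, and the per-token averaging used here is a deliberately simple substitute for the context-aware language model the approach calls for.

```python
import re
from collections import defaultdict
from statistics import mean

def tokens(text):
    """Lowercase unigrams plus adjacent-word bigrams (so 'to failure' is one token)."""
    ws = re.findall(r"[a-z']+", text.lower())
    return ws + [f"{a} {b}" for a, b in zip(ws, ws[1:])]

def train(reviews):
    """For each token, learn how far reviews containing it sit from the mean score."""
    baseline = mean(score for _, score in reviews)
    offsets = defaultdict(list)
    for text, score in reviews:
        for tok in set(tokens(text)):  # count each token once per review
            offsets[tok].append(score - baseline)
    weights = {tok: mean(vals) for tok, vals in offsets.items()}
    return baseline, weights

def predict(text, baseline, weights):
    """Predicted score = baseline + mean learned offset of the tokens seen in training."""
    known = [weights[t] for t in tokens(text) if t in weights]
    return baseline + (mean(known) if known else 0.0)

# Invented fitness-product reviews with 1-5 star scores (illustration only).
reviews = [
    ("Trained to failure and my arms are shredded, love it", 5),
    ("Great resistance, took every set to failure", 5),
    ("Muscle confusion plan works, totally shredded now", 4),
    ("Strap broke after a week, total failure of quality", 1),
    ("Cheap plastic, broke quickly, waste of money", 1),
    ("Broke on day two, poor quality", 2),
]

baseline, weights = train(reviews)
high = predict("shredded after going to failure", baseline, weights)
low = predict("poor quality, it broke", baseline, weights)
print(f"baseline={baseline}, fitness slang={high:.2f}, complaint={low:.2f}")
```

Because the weights are learned from in-domain reviews, the phrase “to failure” ends up with a more positive weight than the bare word “failure,” which also appears in a complaint. A general-purpose lexicon would have no way to make that distinction.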

Assessing marketing influence on social media – To assess allegations that a company steered an online discussion through social media marketing, AI and ML can compare the company’s posts to those generated by unaffiliated users (earned media). This can be done using language models and text similarity metrics that quantitatively and objectively assess whether earned media immediately following the company’s posts were more like those posts than earned media that either preceded the posts or were selected at random.
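One way to picture the before-and-after comparison described above is with a basic similarity metric. The sketch below is an assumption-laden toy: the posts are invented, the helper names are hypothetical, and cosine similarity over raw word counts stands in for the language-model-based similarity measures an actual analysis would more likely use.

```python
import math
import re
from collections import Counter
from statistics import mean

def bag_of_words(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def mean_similarity(company_post, user_posts):
    """Average similarity of a set of user posts to one company post."""
    target = bag_of_words(company_post)
    return mean(cosine_similarity(target, bag_of_words(p)) for p in user_posts)

# Invented posts for illustration only.
company_post = "Our new protein bar is clinically proven to boost energy"
earned_before = [
    "Anyone tried a decent snack for long hikes?",
    "Looking for lunch ideas this week",
]
earned_after = [
    "This protein bar is clinically proven to boost energy apparently",
    "Heard the new bar can boost energy, might try it",
]

before_score = mean_similarity(company_post, earned_before)
after_score = mean_similarity(company_post, earned_after)
print(f"before: {before_score:.3f}  after: {after_score:.3f}")
```

A full analysis would also compare against a randomly selected baseline of earned media, as the text notes, and would test whether any difference is statistically meaningful rather than relying on a raw gap between averages.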

Image object detection – To assess the incidence of client logos and products appearing across images posted to social media, a custom object detection model can be trained and applied to a random sample of millions of social media images.

Public press topic modeling – To quantify the extent and timing of the public awareness of a marketing claim at issue, AI and ML can be applied to articles published in media outlets. This approach helps isolate the at-issue topic from other closely related but distinct topics. Such distinctions can then facilitate an analysis that is more narrowly focused on the claim at hand.

Multimedia characterization – Where there are allegations of product misrepresentation or improper marketing, AI and ML can characterize the nature of a company’s social media presence. A model trained on text and image content from unaffiliated but topically relevant brands can learn to distinguish content along the lines of broad brand identities (e.g., healthy vs. unhealthy, eco-friendly vs. climate-damaging). Applying such a model to at-issue social media content can quantify whether it conveys each of these brand features.

The nature of allegedly defamatory statements – Even in the presence of clearly negative statements, defamation is notoriously difficult to prove. Defendants may claim that statements were expressed not as fact but as opinion, possibility, entertainment, or satire. By leveraging datasets and models that identify the degree of certainty present in natural language examples, experts can objectively measure the degree to which reasonable consumers may interpret the information as fact.

Product liability – One growing area of research concerns the quantification and isolation of specific entities referenced in a broader text. Product liability cases, for instance, may examine user-generated product reviews to identify the importance of, and sentiment surrounding, at-issue product features. Rather than assess the review as a whole, aspect-based sentiment analysis focuses on at-issue features only, allowing for the extraction of strong indicators from nuanced or mixed reviews.
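To make the idea of scoring features separately more concrete, here is a minimal rule-based sketch. It is not how a production aspect-based sentiment model works (those are typically learned from labeled data); the word lists, aspect keywords, function names, and sample review are all hypothetical and chosen only to show why a mixed review benefits from per-feature treatment.

```python
import re

# Hypothetical hand-picked lexicons; a real analysis would learn these from data.
POSITIVE = {"great", "love", "comfortable", "sturdy", "fast"}
NEGATIVE = {"overheats", "broke", "flimsy", "slow", "terrible"}
ASPECTS = {
    "battery": {"battery", "charge", "charging"},
    "strap": {"strap", "band"},
}

def split_clauses(review):
    """Split on punctuation and contrastive conjunctions such as 'but'."""
    parts = re.split(r"[.;,!?]|\bbut\b|\bhowever\b|\balthough\b", review.lower())
    return [p.strip() for p in parts if p.strip()]

def clause_polarity(clause):
    """Positive-word count minus negative-word count within one clause."""
    ws = re.findall(r"[a-z']+", clause)
    return sum(w in POSITIVE for w in ws) - sum(w in NEGATIVE for w in ws)

def aspect_sentiment(review):
    """Sum clause polarity separately for each aspect the review mentions."""
    result = {}
    for clause in split_clauses(review):
        ws = set(re.findall(r"[a-z']+", clause))
        for aspect, keywords in ASPECTS.items():
            if ws & keywords:
                result[aspect] = result.get(aspect, 0) + clause_polarity(clause)
    return result

review = "Love the comfortable strap, but the battery overheats while charging."
print("whole review:", clause_polarity(review.lower()))
print("by aspect:", aspect_sentiment(review))
```

Scored as a whole, this review looks mildly positive. Scored by aspect, the negative signal about the battery, which would be the at-issue feature in this hypothetical, is isolated from the praise for the strap.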

Class certification – A successful class certification challenge will demonstrate that the circumstances of putative class members were sufficiently varied to require individual treatment. The methods discussed above can be combined to quantify the heterogeneity of the at-issue materials. For example, an expert in a case concerning marketing misrepresentations may train a classifier to distinguish at-issue marketing content from content not at issue, model the topics targeted throughout multiple distinct marketing campaigns, and summarize images to demonstrate differing appeal to different consumers.

For centuries, the ability of humans to mold available resources to serve their needs has separated them from less-evolved species. We see it in all walks of life, and the above examples demonstrate it in our small corner of the world. And we will continue to see it as the voluminous social media and other user-generated data available to us continue to expand and become more complex. In the simplest terms, AI and ML are critical in helping us efficiently search through the “haystack” to find the “needle.” Those who try to find the needle by hand will inevitably be left behind.

This article was originally published in March 2023.

The views expressed herein are solely those of the authors and do not necessarily represent the views of Cornerstone Research.


Mike DeCesaris
Vice President, Data Science Center
San Francisco

Sachin Sancheti
Vice President
New York