Web Data Collection
Cornerstone Research is experienced in collecting and distilling large amounts of online information. We perform targeted searches and data collection using custom scraping techniques along with proprietary and third-party resources.
- Deployed parallelized, cloud-based web data collection infrastructure that scaled work across dozens of worker nodes and reduced run time from several days to under an hour.
- Downloaded millions of SEC filings spanning dozens of form types, converted them to PDF format, and performed robust text searches across them.
With user bases that eclipse the audiences of traditional media, social media platforms offer rich sources of data that multiply at dizzying speed. In litigation, knowing how to effectively navigate, collect, and characterize such large volumes of data is crucial.
In addition to our deep familiarity with social media data sources, our experience with machine learning and AI tools equips us to assess the relevance and relative prominence of content and contributors.
- Built web data pipelines and automated approaches for large-scale analysis of Reddit sub-forums (subreddits), using machine learning to score each post's textual and contextual relevance to topics of interest and to characterize its prominence relative to other posts in the sub-forum.
- Applied existing sentiment models to efficiently generate methodologically consistent sentiment scores for large volumes of online reviews.
- Beef Products, Inc. et al. v. American Broadcasting Companies, Inc. et al.
- Facebook Inc. IPO Securities and Derivative Litigation
- Employed advanced language models to distinguish between homographs in tweets and to generate features for a machine learning classifier. This framework enabled reliable, scalable detection of public awareness of alleged material omissions before the required disclosure, even though the underlying search terms were broad and often carried irrelevant meanings.
Online Consumer Reviews
Online consumer reviews of products and services are among the most intriguing, and most demanding, forms of social media data. These reviews can themselves be the subject of litigation; used appropriately, however, they can also provide a valuable source of real-world data. Cornerstone Research is skilled in evaluating these distinctive data, including assessing the relative importance of product features, changes in customer sentiment over time, and fraudulent reviews.