Overcoming Sentiment Analysis Challenges with Machine Learning

When highly skilled data scientists engineer a right-sized solution for a given application, sentiment analysis can be a highly effective tool in litigation.

What is sentiment analysis?

Sentiment analysis produces estimates of the attitude or tone present in a natural language excerpt. Often these estimates will fall along a single dimension, a continuum from purely negative to purely positive. A more complex form of sentiment analysis, known as emotional analysis, may generate estimates along multiple, more nuanced emotional dimensions like anger, joy, or embarrassment.

Sentiment analysis applications in litigation

Sentiment analysis provides valuable insights when applied in litigation. For example, the impact of alleged marketing misrepresentations may be measured by the change in public sentiment toward a product prior to and following the allegedly misleading marketing campaign. Similarly, analyses in defamation matters may measure public sentiment toward an entity before and after the allegedly defamatory statements appeared. Sentiment analysis can also provide an objective measure of the sentiment contained in the allegedly defamatory statements themselves. For matters concerning the quality or defectiveness of specific product features, sentiment analysis can provide an assessment of the average consumer’s perception of the quality of at-issue features relative to other product features or similar features of competing products.

Sentiment analysis data sources

Sentiment analysis can be applied to any text content, but user-generated content is the most abundant and commonly used. This content originates on social media sites, retail sites that host consumer reviews, and other forums for public discussion. Additional content, like marketing materials or internal communications, may also be evaluated using sentiment analysis. The smaller volume generally associated with these data types may limit the value of efficient programmatic sentiment analysis, but the objectivity provided by such an analysis can be beneficial.

While sentiment analysis is most commonly applied to text, methods also exist for detecting sentiment in audio and video. These methods supplement the content of subject statements with additional features, like tone of voice and facial movements. This extratextual information can increase the accuracy of the analysis, especially in the presence of linguistic complexities like sarcasm.

Sentiment analysis approaches

Several approaches to sentiment analysis exist, and the discipline remains an area of active research.

Lexicon

The simplest and oldest form of sentiment analysis is lexicon-based. In this approach, the researcher obtains or creates a list of terms associated with negative and positive sentiments. The researcher then identifies the number of positive and negative terms in each text. These counts may be aggregated into a single comprehensive score normalized by the length of the text or the number of relevant terms.
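
The counting and normalization described above can be sketched in a few lines of Python. This is a minimal illustration, not a production method: the word lists and the function names (tokenize, lexicon_score) are hypothetical, and the score here is normalized by the number of relevant terms, which is one of the two options mentioned above.

```python
import re

# Hypothetical toy lexicon; a real analysis would use a curated word list.
POSITIVE = {"good", "great", "love", "excellent", "reliable"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "broken"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexicon_score(text):
    """Return (positive count, negative count, normalized score).

    The score is (pos - neg) / (pos + neg), i.e. normalized by the number
    of relevant terms, and is 0.0 when no lexicon term appears.
    """
    tokens = tokenize(text)
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    relevant = pos + neg
    score = (pos - neg) / relevant if relevant else 0.0
    return pos, neg, score
```

Under these assumptions, the score ranges from -1.0 (only negative terms) to 1.0 (only positive terms), and a text with no lexicon terms scores 0.0.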

Lexicon-based, with rules

Sentiment analyses based on a lexicon alone are simple to explain, but they have significant limitations. Specifically, purely lexicon-based approaches ignore context that could significantly change the interpretation of relevant terms. In response, approaches evolved to combine carefully curated lexica with additional rules that modify the sentiment according to the surrounding terms and style. These rules adjust for punctuation, capitalization, intensifiers (e.g., “very”), and negation.
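
As a rough sketch of how such rules can modify lexicon scores, the Python below applies one simple rule for each of the four adjustments named above. The lexicon, the multipliers, the two-word lookback window, and the function name rule_based_score are all assumptions made for illustration; established rule-based tools use more carefully tuned rules.

```python
import re

# Hypothetical toy lexicon and rule weights, chosen only for illustration.
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}
INTENSIFIERS = {"very": 1.5, "extremely": 2.0}
NEGATIONS = {"not", "never", "no"}


def rule_based_score(text):
    """Sum term valences, adjusted by the preceding words and punctuation."""
    tokens = re.findall(r"[A-Za-z']+", text)
    total = 0.0
    for i, token in enumerate(tokens):
        valence = LEXICON.get(token.lower())
        if valence is None:
            continue
        # Capitalization rule: an all-caps sentiment term counts for more.
        if token.isupper() and len(token) > 1:
            valence *= 1.5
        # Look back over the two preceding words for intensifiers and negation.
        for prev in tokens[max(0, i - 2):i]:
            prev = prev.lower()
            if prev in INTENSIFIERS:
                valence *= INTENSIFIERS[prev]
            elif prev in NEGATIONS:
                valence *= -1.0
        total += valence
    # Punctuation rule: up to three exclamation marks amplify the score.
    total *= 1.0 + 0.1 * min(text.count("!"), 3)
    return total
```

With these toy rules, “good” scores 1.0, “very good” scores 1.5, and “not very good” scores -1.5, whereas a pure lexicon count would treat all three the same.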

Custom classifier models

General-purpose lexica may perform adequately in many different environments but fail in the presence of highly domain-specific language. Such language may be completely absent from general-purpose lexica, and certain terms may carry connotations opposite to their more typical uses. In these situations, a custom sentiment classifier model trained on language from a similar domain can learn these distinctions.

In the example below, slang terms like “swole” are missing from the general-purpose lexicon, and terms with positive connotations in the fitness space, like “ripped,” “shredded,” and “failure,” express negative sentiment more generally. By training a model on fitness-specific language, the custom classifier is able to properly identify the relevance and sentiment associated with these terms.
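
To make the idea of a custom classifier concrete, the sketch below trains a small multinomial naive Bayes model, written in plain Python, on a handful of fitness-domain sentences. Naive Bayes is used here only because it is compact; the source does not say which model type is used in practice. The class name NaiveBayesSentiment and the training sentences are invented for illustration, and a real model would need far more labeled data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSentiment:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.class_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        # Words never seen in training carry no information and are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            logp = math.log(doc_count / total_docs)
            for t in tokens:
                logp += math.log((counts[t] + 1) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Hypothetical fitness-domain training examples, invented for illustration.
TRAIN = [
    ("he looks ripped and shredded after the program", "positive"),
    ("trained to failure and got swole", "positive"),
    ("totally swole and shredded great results", "positive"),
    ("the bench was broken and the gym was dirty", "negative"),
    ("terrible program wasted my money", "negative"),
    ("dirty equipment and terrible staff", "negative"),
]

model = NaiveBayesSentiment().fit(*zip(*TRAIN))
```

Because the model learns word associations from the domain's own language, a call such as model.predict("got ripped and swole") returns "positive" on this toy data, even though a general-purpose lexicon would miss “swole” and might score “ripped” as negative.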

Aspect-based sentiment with dependency parsing

Basic custom sentiment classifiers perform well on short, consistent text. In many real-world use cases, however, the text under analysis contains multiple individual sentiments directed toward different aspects. For example, a product review of a cell phone might express a strong positive sentiment toward the camera and screen and a strong negative sentiment toward the battery and responsiveness. Analyzing such a review in its entirety might inappropriately yield an overall neutral sentiment, as the sentiment toward one feature cancels out the opposite sentiment toward another.

Some aspect-based sentiment models resolve this issue by parsing and diagramming each text to identify the language most closely related to each aspect, then analyzing that language separately. Aspects may be tagged manually, or complementary models can extract aspects automatically.

In assessing only the most relevant language, these models can identify multiple potentially conflicting sentiments toward different aspects that an analysis of overall sentiment might combine and incorrectly consider neutral.
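
The sketch below illustrates the effect on the cell phone example. Note that it substitutes a much cruder technique for dependency parsing: it splits the review into clauses at punctuation and contrastive conjunctions, then scores each manually tagged aspect using only the sentiment terms in its own clause. The lexicon, the aspect list, and the function name aspect_sentiments are assumptions for illustration; a real system would use a syntactic parser to link sentiment terms to aspects.

```python
import re

# Hypothetical toy lexicon and manually tagged aspects, for illustration only.
LEXICON = {"great": 1, "sharp": 1, "love": 1, "terrible": -1, "slow": -1, "poor": -1}
ASPECTS = {"camera", "screen", "battery", "responsiveness"}


def overall_sentiment(review):
    """Sum the valence of every lexicon term in the whole review."""
    tokens = re.findall(r"[a-z']+", review.lower())
    return sum(LEXICON.get(token, 0) for token in tokens)


def aspect_sentiments(review):
    """Score each aspect using only the sentiment terms in its own clause.

    This clause-splitting heuristic stands in for a real dependency parse.
    """
    scores = {}
    # Split on punctuation and contrastive conjunctions.
    clauses = re.split(r"[.;,!?]|\bbut\b|\bhowever\b|\balthough\b", review.lower())
    for clause in clauses:
        tokens = re.findall(r"[a-z']+", clause)
        clause_score = sum(LEXICON.get(token, 0) for token in tokens)
        for token in tokens:
            if token in ASPECTS:
                scores[token] = scores.get(token, 0) + clause_score
    return scores


review = "The camera and screen are great, but the battery and responsiveness are terrible."
```

On this review, overall_sentiment(review) is 0, which reads as neutral, while aspect_sentiments(review) assigns 1 to the camera and screen and -1 to the battery and responsiveness.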

Aspect-based sentiment with attention

Rather than parse formal syntactic dependencies, recent state-of-the-art language models use attention to learn the strength of the relationships between terms. This attention component is fundamental to transformer-based large language models like Bidirectional Encoder Representations from Transformers (BERT) and Generative Pre-trained Transformer (GPT) derivatives. The capacity of this structure to efficiently capture complex dependencies over long sequences contributes to the unprecedented accuracy of these models.

Applying this attention concept to aspect-based sentiment analysis further enhances model accuracy. The matrix below depicts an example of attention between each term in a review.

As shown, by using attention, the model tracks relationships through several levels of noun-adjective and pronoun-antecedent pairs.
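
For readers who want to see how an attention matrix is computed, the sketch below implements scaled dot-product attention weights, the core operation in transformer models, in plain Python. It is heavily simplified: the two-dimensional term vectors are invented, the same vectors serve as both queries and keys, and there are no learned projections or multiple heads as there would be in a trained model. The function names are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, keys):
    """Return the matrix of scaled dot-product attention weights.

    Row i holds how strongly term i attends to every term j; each row sums to 1.
    """
    scale = math.sqrt(len(keys[0]))
    weights = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights.append(softmax(scores))
    return weights


# Hypothetical 2-dimensional term vectors, invented for illustration;
# a trained model learns its own query and key projections.
terms = ["battery", "it", "dies", "screen"]
vectors = [[2.0, 0.0], [1.8, 0.2], [1.5, 0.5], [0.0, 2.0]]
matrix = attention_weights(vectors, vectors)
```

In this toy setup, the row for “it” places more weight on “battery” than on “screen” because their vectors point in similar directions, which mirrors how attention can link a pronoun to its antecedent.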

Sentiment analysis in practice

With several sentiment analysis approaches available, each with its own advantages and disadvantages, selecting the best method requires careful consideration of the task at hand. The simplest models may not perform adequately on lengthy, complex text with specialized vocabulary. For simple texts, state-of-the-art models may require too much time to implement, be too costly to customize, and be too complex for lay audiences to comprehend and trust. An efficient and effective sentiment analysis requires a deep understanding of these different model mechanisms and their corresponding trade-offs. When utilized by highly skilled data scientists to engineer a right-sized solution for a given application, sentiment analysis can be a highly effective tool in litigation.
