ESG Scores from Textual Disclosures with Zero-Shot Classifiers

Category: Data Sourcing & Alternative Data • Article #6 • Reading time: 5 minutes

Traditional ESG scores from rating agencies like MSCI, Sustainalytics, and Refinitiv often disagree significantly with each other, creating confusion for investors seeking to integrate sustainability into their portfolios. An alternative approach uses zero-shot classification—a natural language processing technique that requires no task-specific training data—to derive ESG scores directly from corporate textual disclosures, offering transparency, customizability, and rapid coverage expansion.

The Problem with Traditional ESG Scores

Research has documented low pairwise correlations (roughly 0.40-0.60) between major ESG rating providers, far lower than the near-perfect agreement among credit rating agencies. This divergence stems from differences in scope (which ESG factors are measured), measurement (how each factor is quantified), and weighting (how factors are aggregated into a composite score). For quantitative investors, the inconsistency means that the choice of provider can materially change which companies look attractive, which makes ESG integration difficult.

Zero-shot NLP approaches address several of these issues by going directly to the source: the company's own sustainability reports, 10-K filings, proxy statements, and press releases. Rather than relying on a third party's interpretation, investors can define their own ESG taxonomy and apply it consistently across thousands of companies.

Zero-Shot Classification Explained

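Zero-shot classification uses pre-trained language models to categorize text into predefined labels without any labeled training examples. Models like BART-MNLI or DeBERTa-v3, fine-tuned on natural language inference (NLI) tasks, can evaluate whether a given text passage entails, contradicts, or is neutral toward a hypothesis. To classify a passage, each candidate label is inserted into a hypothesis sentence (for example, "This text is about carbon emissions reduction"), and the model's entailment probability for that hypothesis serves as the score for the label.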

For ESG scoring, candidate labels correspond to specific ESG themes: "carbon emissions reduction," "workplace diversity initiatives," "supply chain human rights," "board independence," and so on. The model assigns a probability to each label for every text passage, enabling automated classification at scale.
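
The step from NLI outputs to label probabilities can be sketched as follows. This is a minimal illustration of one common convention, not a definitive implementation: the hypothesis template and the logits are made-up values, and the function names are hypothetical.

```python
import math

# Hypothetical template that turns a candidate label into an NLI hypothesis.
HYPOTHESIS_TEMPLATE = "This text is about {}."


def build_hypotheses(labels, template=HYPOTHESIS_TEMPLATE):
    """Pair each candidate label with the hypothesis the NLI model will judge."""
    return {label: template.format(label) for label in labels}


def multi_label_probs(nli_logits):
    """Independent per-label probabilities.

    nli_logits maps label -> (contradiction_logit, entailment_logit).
    A two-way softmax over contradiction vs. entailment gives P(entailment);
    the neutral logit is left out in this convention.
    """
    probs = {}
    for label, (contra, entail) in nli_logits.items():
        m = max(contra, entail)  # subtract the max for numerical stability
        e_c, e_e = math.exp(contra - m), math.exp(entail - m)
        probs[label] = e_e / (e_c + e_e)
    return probs


def single_label_probs(nli_logits):
    """Mutually exclusive labels: softmax of entailment logits across labels."""
    m = max(entail for _, entail in nli_logits.values())
    exps = {label: math.exp(entail - m) for label, (_, entail) in nli_logits.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


# Made-up logits for a single passage, for illustration only.
example_logits = {
    "carbon emissions reduction": (-1.0, 3.0),
    "board independence": (2.0, -2.0),
}
```

For ESG work the multi-label variant is usually the better fit, since one paragraph can legitimately touch several themes. In practice, the `zero-shot-classification` pipeline in Hugging Face `transformers` accepts `candidate_labels`, `hypothesis_template`, and `multi_label` arguments and handles these steps internally.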

Pipeline Architecture

A typical implementation involves five stages:

Document ingestion: Collects and parses corporate disclosures (PDF extraction, HTML scraping, or the SEC EDGAR API).
Text chunking: Splits documents into paragraph-level segments suitable for classification.
Zero-shot inference: Applies the NLI model to each chunk against the ESG label taxonomy.
Score aggregation: Computes company-level scores by averaging or weighting chunk-level probabilities.
Normalization and benchmarking: Calibrates scores relative to sector peers.
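
The chunking, aggregation, and normalization stages can be sketched in plain Python. This is an illustrative outline under stated assumptions: paragraphs are separated by blank lines, chunk-level probabilities arrive as one dictionary per chunk, and sector benchmarking is done with a simple z-score. All function names are hypothetical.

```python
import re
from collections import defaultdict
from statistics import mean, pstdev


def chunk_paragraphs(text, min_words=5):
    """Split on blank lines and drop fragments too short to classify."""
    paragraphs = [" ".join(p.split()) for p in re.split(r"\n\s*\n", text)]
    return [p for p in paragraphs if len(p.split()) >= min_words]


def aggregate_company(chunk_probs, weights=None):
    """Weighted mean of chunk-level probabilities for each label.

    chunk_probs: list of dicts (label -> probability), one per chunk.
    weights: optional per-chunk weights, e.g. chunk word counts.
    """
    if weights is None:
        weights = [1.0] * len(chunk_probs)
    total = sum(weights)
    scores = defaultdict(float)
    for probs, weight in zip(chunk_probs, weights):
        for label, p in probs.items():
            scores[label] += weight * p / total
    return dict(scores)


def sector_zscores(company_scores, sectors):
    """Standardize each company's score against its own sector peers."""
    by_sector = defaultdict(list)
    for company, score in company_scores.items():
        by_sector[sectors[company]].append(score)
    z = {}
    for company, score in company_scores.items():
        peers = by_sector[sectors[company]]
        mu, sigma = mean(peers), pstdev(peers)
        # A sector with one company (or identical scores) has no dispersion.
        z[company] = (score - mu) / sigma if sigma > 0 else 0.0
    return z
```

Weighting chunks by length is one design choice among several; an equal-weighted mean treats a one-sentence mention the same as a full page of discussion, which may or may not be desirable.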

Building an ESG Taxonomy

The quality of zero-shot ESG scores depends heavily on the label taxonomy. Effective labels are specific enough to capture meaningful ESG dimensions but general enough that the language model can match them to diverse corporate language. A two-level taxonomy works well: broad pillars (Environmental, Social, Governance) containing 5-10 specific themes each.

For the Environmental pillar, labels might include: greenhouse gas emissions management, renewable energy adoption, water resource conservation, waste reduction and circular economy, and biodiversity protection. The Social pillar could cover: employee health and safety, diversity and inclusion, community engagement, data privacy and security, and supply chain labor standards. Governance themes might address: board composition and independence, executive compensation alignment, anti-corruption policies, shareholder rights, and audit committee effectiveness.
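
A two-level taxonomy of this kind maps naturally onto a dictionary of pillars to themes. The sketch below uses the example themes listed above; the helper names and the optional per-theme weights are assumptions for illustration.

```python
ESG_TAXONOMY = {
    "Environmental": [
        "greenhouse gas emissions management",
        "renewable energy adoption",
        "water resource conservation",
        "waste reduction and circular economy",
        "biodiversity protection",
    ],
    "Social": [
        "employee health and safety",
        "diversity and inclusion",
        "community engagement",
        "data privacy and security",
        "supply chain labor standards",
    ],
    "Governance": [
        "board composition and independence",
        "executive compensation alignment",
        "anti-corruption policies",
        "shareholder rights",
        "audit committee effectiveness",
    ],
}


def candidate_labels(taxonomy):
    """Flatten the taxonomy into the label list passed to the classifier."""
    return [theme for themes in taxonomy.values() for theme in themes]


def pillar_scores(theme_scores, taxonomy, theme_weights=None):
    """Weighted average of theme scores within each pillar.

    Themes without a score are skipped; pillars with no scored theme are omitted.
    """
    theme_weights = theme_weights or {}
    result = {}
    for pillar, themes in taxonomy.items():
        scored = [
            (theme_scores[t], theme_weights.get(t, 1.0))
            for t in themes
            if t in theme_scores
        ]
        total = sum(w for _, w in scored)
        if total > 0:
            result[pillar] = sum(s * w for s, w in scored) / total
    return result
```

Keeping the taxonomy as data rather than hard-coding it in the scoring logic makes it straightforward to version, audit, and adjust per investor mandate.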

Advantages Over Traditional Scores

Transparency: Every score can be traced back to specific text passages, making the methodology fully auditable.
Customizability: Investors can define their own ESG priorities by adjusting the label taxonomy and weighting scheme.
Coverage: Zero-shot models can score any company that publishes textual disclosures, including small-caps and emerging-market firms typically excluded from commercial ESG ratings.
Timeliness: Scores can be updated immediately when new disclosures are published, rather than waiting for annual rating agency updates.
Consistency: The same model and taxonomy are applied uniformly across all companies, eliminating inter-rater inconsistency.
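
The transparency property can be made concrete by storing chunk-level probabilities alongside the text and retrieving the strongest passages for any theme. The following is a minimal sketch; the function name, the default threshold of 0.5, and the data layout are illustrative assumptions.

```python
import heapq


def top_evidence(chunks, chunk_probs, label, k=3, threshold=0.5):
    """Return up to k (probability, passage) pairs for a label, highest first.

    chunks: list of passage strings.
    chunk_probs: list of dicts (label -> probability), aligned with chunks.
    Passages scoring below the threshold are not treated as evidence.
    """
    scored = [(probs.get(label, 0.0), text) for text, probs in zip(chunks, chunk_probs)]
    hits = [pair for pair in scored if pair[0] >= threshold]
    return heapq.nlargest(k, hits, key=lambda pair: pair[0])
```

An analyst reviewing a company's score on a given theme can then read the handful of passages that drove it, rather than taking the number on trust.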

Limitations and Mitigations

Zero-shot classification measures what companies say, not necessarily what they do. Any text-based approach is therefore exposed to greenwashing (sustainability claims made without substantive action), and this is a fundamental limitation. Mitigation strategies include cross-referencing textual claims with quantitative data (emissions inventories, workforce statistics), analyzing temporal consistency (do claims change without corresponding actions?), and comparing corporate claims against third-party investigative reporting.
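
One simple way to combine the first two mitigations is to compare year-over-year changes in a text-derived score with changes in a reported metric. The sketch below is a hypothetical heuristic, not an established test: it assumes an emissions-related text score on a 0-1 scale, a reported emissions intensity where lower is better, and an arbitrary threshold of 0.10 for a meaningful rise in emphasis.

```python
def say_do_flags(text_scores, emissions_intensity, min_text_rise=0.10):
    """Flag years where emissions talk rises while reported intensity does not fall.

    text_scores: year -> text-derived score for an emissions theme (0-1).
    emissions_intensity: year -> reported emissions intensity (lower is better).
    Only consecutive years present in both series are compared.
    """
    flags = []
    years = sorted(set(text_scores) & set(emissions_intensity))
    for prev, curr in zip(years, years[1:]):
        text_change = text_scores[curr] - text_scores[prev]
        intensity_change = emissions_intensity[curr] - emissions_intensity[prev]
        if text_change >= min_text_rise and intensity_change >= 0:
            flags.append(curr)
    return flags
```

A flag here is a prompt for further review, not evidence of greenwashing on its own; emissions can rise for reasons unrelated to disclosure quality, such as acquisitions or changes in reporting boundaries.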

Model calibration is another challenge. Zero-shot probabilities are not inherently comparable across different labels or document types: a raw score of 0.7 on one theme may correspond to a very different hit rate than 0.7 on another. Calibration techniques such as Platt scaling or isotonic regression, fitted on a small hand-labeled validation set, improve the reliability of cross-company comparisons.
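
As an illustration of the isotonic option, the sketch below fits a non-decreasing step function from raw scores to observed positive rates using the pool-adjacent-violators algorithm. It is a simplified, dependency-free version: it assumes binary labels from a hand-labeled validation set for one theme, and it returns a step function rather than interpolating between points as some library implementations do. The function names are hypothetical.

```python
from bisect import bisect_left


def fit_isotonic(scores, labels):
    """Pool-adjacent-violators fit of raw score -> calibrated probability.

    scores: raw zero-shot probabilities for one label.
    labels: 1 if a human judged the passage relevant to the label, else 0.
    Returns (thresholds, values): values[i] applies up to thresholds[i].
    """
    # Pool identical raw scores so ties share one calibrated value.
    grouped = {}
    for s, y in zip(scores, labels):
        total, count = grouped.get(s, (0.0, 0))
        grouped[s] = (total + y, count + 1)

    # Each block holds [sum of labels, count, largest raw score in block].
    blocks = []
    for score in sorted(grouped):
        total, count = grouped[score]
        blocks.append([total, count, score])
        # Merge backwards while block means violate monotonicity.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper

    thresholds = [b[2] for b in blocks]
    values = [b[0] / b[1] for b in blocks]
    return thresholds, values


def calibrate(raw, thresholds, values):
    """Map a raw probability to the calibrated value of its block."""
    idx = bisect_left(thresholds, raw)
    return values[min(idx, len(values) - 1)]
```

Fitting one such map per label puts the themes on a comparable footing. For production use, scikit-learn's `IsotonicRegression` (or `LogisticRegression` for Platt scaling) is a more robust alternative to this hand-rolled version.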

Integration with Investment Processes

Zero-shot ESG scores can serve as standalone signals or as complements to traditional ESG ratings. In multi-factor equity models, text-derived ESG scores provide an independent information source that can be combined with commercial ratings for improved signal quality. In engagement-focused strategies, the ability to trace scores to specific text passages supports targeted dialogue with corporate management.
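
A simple way to combine the two sources in a factor model is to standardize each across the shared universe and take a weighted average. The sketch below assumes cross-sectional z-scoring and an equal default weight; both are illustrative choices, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def zscore(values):
    """Cross-sectional z-scores for a dict of ticker -> raw value."""
    vals = list(values.values())
    mu, sigma = mean(vals), pstdev(vals)
    return {k: (v - mu) / sigma if sigma > 0 else 0.0 for k, v in values.items()}


def blend_signals(text_scores, vendor_ratings, text_weight=0.5):
    """Blend standardized text-derived and commercial ESG signals.

    Only tickers present in both inputs are scored, so the two z-scores
    are computed over the same universe.
    """
    common = sorted(set(text_scores) & set(vendor_ratings))
    if not common:
        return {}
    z_text = zscore({t: text_scores[t] for t in common})
    z_vendor = zscore({t: vendor_ratings[t] for t in common})
    return {
        t: text_weight * z_text[t] + (1 - text_weight) * z_vendor[t]
        for t in common
    }
```

Standardizing first matters because the two sources are typically on different scales (probabilities versus vendor-specific rating points). Companies covered only by the text-derived score would need separate handling, for example by using that score alone.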

Conclusion

Zero-shot classification offers a transparent, scalable, and customizable approach to ESG scoring that addresses many shortcomings of traditional ratings. While text-based analysis cannot fully replace quantitative sustainability data, it provides a valuable complementary signal that can be rapidly deployed across large investment universes. As language models continue to improve, the accuracy and nuance of text-derived ESG scores are likely to improve as well.