Introduction

Company earnings calls and investor presentations contain forward-looking statements. Fact-checking these claims against historical data and public sources reduces the risk of being misled. Retrieval-Augmented Generation (RAG) retrieves relevant documents, then uses LLMs to evaluate claim truthfulness.

Retrieval-Augmented Generation Architecture

RAG combines three components: 1. Retrieval: given a claim, search knowledge base for relevant documents. 2. Ranking: rank retrieved documents by relevance. 3. Generation: feed claim + top documents to LLM, asking it to evaluate truthfulness based on evidence.

This is superior to naive LLM fact-checking: LLMs alone hallucinate; RAG constrains LLM outputs to documented evidence.

Knowledge Bases for Financial Fact-Checking

Build knowledge base from: SEC filings (10-K, 10-Q), historical financial statements, news archives, analyst reports. Index documents with embeddings for fast similarity search.

Key claims to check: revenue growth, cost reduction, margin expansion, market share gains, competitive positioning. Cross-reference claims against financial statements and external sources.

Extracting Claims from Documents

Use NER + NLP to extract factual claims from transcripts. Examples: "we doubled quarterly revenue", "market share increased from 15% to 22%", "we reduced operating costs by 30%". Extract claim subject, predicate, and object.

Evaluation Metrics

For each claim, evaluate: 1. Verifiability: is there sufficient data to check? 2. Truthfulness: does evidence support claim? 3. Materiality: is claim important for investment decisions?

Flag claims with low verifiability (no public data) or low truthfulness (contradicted by evidence) as risks.

Empirical Testing

Fact-check 200 earnings call claims from 10 major tech companies using RAG with SEC filing knowledge base. Accuracy (correct truthfulness assessment): 82%. False positives (claim marked false incorrectly): 12%. False negatives: 6%.

Application: Risk Adjustment

Score companies by claim quality: high-quality, well-supported claims → lower risk premium. Low-quality, unsupported claims → higher risk premium. Backtesting: adding claim-quality scores to valuation models improved prediction accuracy by 4-6%.

Implementation

Use LangChain for RAG pipelines. Embed documents (SEC filings) with Sentence-BERT. For each claim, retrieve top-5 similar documents, feed to GPT with few-shot examples, get truthfulness assessment with confidence score.

Limitations

RAG depends on knowledge base quality: missing data leads to wrong assessments. LLM hallucinations still occur; human review essential for high-stakes decisions. Some claims require domain expertise beyond public documents to evaluate fully.