Fact-Checking Corporate Claims with Retrieval-Augmented Generation
Introduction
Company earnings calls and investor presentations contain forward-looking statements. Fact-checking these claims against historical data and public sources reduces the risk of being misled. Retrieval-Augmented Generation (RAG) retrieves relevant documents, then uses LLMs to evaluate claim truthfulness.
Retrieval-Augmented Generation Architecture
RAG combines three components: 1. Retrieval: given a claim, search knowledge base for relevant documents. 2. Ranking: rank retrieved documents by relevance. 3. Generation: feed claim + top documents to LLM, asking it to evaluate truthfulness based on evidence.
This is superior to naive LLM fact-checking: LLMs alone hallucinate; RAG constrains LLM outputs to documented evidence.
Knowledge Bases for Financial Fact-Checking
Build knowledge base from: SEC filings (10-K, 10-Q), historical financial statements, news archives, analyst reports. Index documents with embeddings for fast similarity search.
Key claims to check: revenue growth, cost reduction, margin expansion, market share gains, competitive positioning. Cross-reference claims against financial statements and external sources.
Extracting Claims from Documents
Use NER + NLP to extract factual claims from transcripts. Examples: "we doubled quarterly revenue", "market share increased from 15% to 22%", "we reduced operating costs by 30%". Extract claim subject, predicate, and object.
Evaluation Metrics
For each claim, evaluate: 1. Verifiability: is there sufficient data to check? 2. Truthfulness: does evidence support claim? 3. Materiality: is claim important for investment decisions?
Flag claims with low verifiability (no public data) or low truthfulness (contradicted by evidence) as risks.
Empirical Testing
Fact-check 200 earnings call claims from 10 major tech companies using RAG with SEC filing knowledge base. Accuracy (correct truthfulness assessment): 82%. False positives (claim marked false incorrectly): 12%. False negatives: 6%.
Application: Risk Adjustment
Score companies by claim quality: high-quality, well-supported claims → lower risk premium. Low-quality, unsupported claims → higher risk premium. Backtesting: adding claim-quality scores to valuation models improved prediction accuracy by 4-6%.
Implementation
Use LangChain for RAG pipelines. Embed documents (SEC filings) with Sentence-BERT. For each claim, retrieve top-5 similar documents, feed to GPT with few-shot examples, get truthfulness assessment with confidence score.
Limitations
RAG depends on knowledge base quality: missing data leads to wrong assessments. LLM hallucinations still occur; human review essential for high-stakes decisions. Some claims require domain expertise beyond public documents to evaluate fully.