Multilingual NLP for ADR Trading: Handling Dual-Language News
Introduction
American Depositary Receipts (ADRs) allow US investors to trade foreign stocks. News in the home-country language often precedes English translations, creating an information advantage. Multilingual NLP processes news in the original language, extracting signals before the translation appears.
Multilingual Sentiment Analysis
Standard sentiment models are trained on English text. For ADRs, you need models that support the target language: Spanish (Mexico/Latin America), Mandarin (China), Japanese, Korean, etc. Use multilingual BERT (mBERT), pretrained on 100+ languages, and fine-tune it on financial news in the target language.
Empirically, mBERT achieves 80-85% accuracy on financial sentiment tasks in non-English languages, competitive with English performance.
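Accuracy claims like the one above are easy to verify with a small labeled set. A minimal evaluation harness is sketched below; the keyword-rule classifier is a hypothetical stand-in for the fine-tuned mBERT pipeline, which would be slotted in as `predict`:

```python
from typing import Callable

def sentiment_accuracy(predict: Callable[[str], str],
                       labeled: list[tuple[str, str]]) -> float:
    """Fraction of labeled headlines the classifier gets right."""
    correct = sum(1 for text, gold in labeled if predict(text) == gold)
    return correct / len(labeled)

# Stub standing in for a fine-tuned mBERT classifier (illustrative rules).
def stub_predict(text: str) -> str:
    if "下跌" in text or "罚款" in text:   # "fall", "fine/penalty"
        return "negative"
    if "增长" in text:                     # "growth"
        return "positive"
    return "neutral"

labeled = [
    ("公司利润增长20%", "positive"),        # "company profit grew 20%"
    ("监管机构对公司处以罚款", "negative"),  # "regulator fined the company"
    ("股价今日下跌", "negative"),            # "share price fell today"
    ("公司召开年度股东大会", "neutral"),     # "company held its annual meeting"
]
acc = sentiment_accuracy(stub_predict, labeled)
```

The same harness works per-language, so English and non-English accuracy can be compared on matched test sets.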
Translation vs Direct Analysis
Two approaches: (1) translate to English, then analyze; (2) analyze directly in the original language. Direct analysis is faster (no translation latency) and often more accurate (translation loses nuance). For real-time trading, direct analysis is preferred.
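The routing decision can be made explicit per language. A sketch, where the model registry and latency figures are illustrative assumptions, not measurements:

```python
from dataclasses import dataclass

@dataclass
class AnalysisPlan:
    language: str
    mode: str            # "direct" or "translate"
    est_latency_ms: int

# Hypothetical registry: languages with fine-tuned in-language models.
DIRECT_MODELS = {"zh", "es", "ja", "ko", "en"}
BASE_INFERENCE_MS = 50        # assumed model inference cost
TRANSLATE_OVERHEAD_MS = 400   # assumed MT round-trip cost

def plan(language: str) -> AnalysisPlan:
    """Prefer direct analysis; fall back to translate-then-analyze."""
    if language in DIRECT_MODELS:
        return AnalysisPlan(language, "direct", BASE_INFERENCE_MS)
    return AnalysisPlan(language, "translate",
                        BASE_INFERENCE_MS + TRANSLATE_OVERHEAD_MS)
```

Keeping the fallback path means coverage degrades gracefully for languages without a fine-tuned model, at the cost of added latency.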
Named Entity Recognition in Multiple Languages
Companies, executives, and economic indicators must be recognized in the original language. For example, Mexican news mentioning "Femsa" must map to the US-listed ADR (ticker FMX). mBERT supports NER across languages; fine-tuned models achieve 85-90% accuracy on company name extraction.
Topic Modeling for Multilingual News Streams
Process news in parallel from English, Mandarin, Spanish, and Japanese sources. Use mBERT embeddings to align topics across languages: "贸易战" (Chinese for "trade war") aligns with English "trade tension". This reveals global themes before English-language sources report them.
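Alignment reduces to nearest-neighbor search in embedding space. The sketch below uses toy 3-d vectors standing in for mBERT sentence embeddings; the threshold is illustrative:

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy vectors standing in for mBERT embeddings of topic phrases.
TOPIC_EMBEDDINGS = {
    "贸易战": [0.90, 0.10, 0.00],        # "trade war"
    "trade tension": [0.85, 0.20, 0.05],
    "earnings beat": [0.00, 0.10, 0.95],
}

def align(term: str, candidates: list[str], threshold: float = 0.9) -> list[str]:
    """Return candidate topics whose embedding is close to `term`'s."""
    u = TOPIC_EMBEDDINGS[term]
    return [c for c in candidates
            if cosine(u, TOPIC_EMBEDDINGS[c]) >= threshold]
```

With real embeddings the same logic clusters a multilingual news stream into shared themes regardless of source language.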
Case Study: Alibaba (BABA) ADR
Alibaba news often breaks in Chinese-language media first, with English coverage lagging 12-24 hours. Using Mandarin NLP:
- Detected regulatory announcement 18 hours before English translation
- Extracted sentiment: negative (regulatory risk)
- ADR fell 3.2% the next day; the early signal enabled a short position
- Similar patterns observed for Tencent, Pinduoduo ADRs
Handling Code-Switching
Financial discussions often mix languages. Chinese traders discussing US companies might write: "This company很好, but valuation太高" ("This company is very good, but the valuation is too high"). Models must handle code-switching; mBERT copes reasonably well but struggles with short code-switched phrases.
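Before routing text to a model, it helps to flag code-switched input so it can be sent to a multilingual model rather than a monolingual one. A simple script-counting heuristic (thresholds are illustrative):

```python
def script_mix(text: str) -> dict[str, int]:
    """Count CJK ideographs vs Latin letters in a string."""
    cjk = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    latin = sum(1 for ch in text if ch.isascii() and ch.isalpha())
    return {"cjk": cjk, "latin": latin}

def is_code_switched(text: str, min_each: int = 2) -> bool:
    """Flag text containing a meaningful amount of both scripts."""
    counts = script_mix(text)
    return counts["cjk"] >= min_each and counts["latin"] >= min_each
```

This only detects mixing between CJK and Latin scripts; Spanish/English code-switching would need a token-level language identifier instead.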
Building Multilingual Financial Vocabulary
Financial terminology differs across languages: English "hedge" corresponds to "套保" (short for 套期保值) in Chinese financial writing, a mapping general-purpose models often miss. Build multilingual term dictionaries and ensure models understand the domain-specific vocabulary of each language.
Implementation Strategy
1. Subscribe to news feeds in multiple languages (especially ADR home countries).
2. Process in parallel with mBERT sentiment + NER models.
3. Normalize signals across languages using embedding similarity.
4. Aggregate signals: if Chinese, Spanish, and English news all positive, confidence is high.
5. Monitor news lag: if there is a consistent delay between languages, exploit it.
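Step 4 above can be sketched as a small aggregation function: combine per-language sentiment scores and derive confidence from cross-language agreement. The scoring convention (scores in [-1, 1]) and the agreement rule are assumptions, not a calibrated method:

```python
from statistics import mean

def aggregate(signals: dict[str, float]) -> tuple[float, float]:
    """Combine per-language sentiment scores in [-1, 1].

    Returns (signal, confidence). Confidence is the fraction of
    languages whose score has the same sign as the combined signal,
    so unanimous agreement across languages yields confidence 1.0.
    """
    values = list(signals.values())
    signal = mean(values)
    agree = sum(1 for v in values if v * signal > 0)
    confidence = agree / len(values)
    return signal, confidence
```

For example, if Chinese, Spanish, and English sources all score positive, confidence is 1.0; a dissenting language pulls it down, which can gate position sizing.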
Regulatory Considerations
Using foreign-language news that is non-public, or that has not yet been translated and disclosed in English, may constitute insider trading in some jurisdictions; consult legal counsel. If the news is publicly available in its original language, analyzing it with NLP is generally defensible, since the information is public.