Introduction

American Depositary Receipts (ADRs) allow US investors to trade foreign stocks. News in the home-country language often precedes English translations, creating an information advantage. Multilingual NLP processes news in its original language, extracting signals before a translation appears.

Multilingual Sentiment Analysis

Standard sentiment models are trained on English text. For ADRs, you need models that support the target language: Spanish (Mexico/Latin America), Mandarin (China), Japanese, Korean, etc. Use multilingual BERT (mBERT), pretrained on 100+ languages, and fine-tune it on financial news in the target language.

Empirically, mBERT achieves 80-85% accuracy on financial sentiment tasks in non-English languages, competitive with English performance.
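As a minimal sketch of how such a classifier feeds a trading signal, the helper below converts (negative, neutral, positive) logits, assumed to come from a hypothetical fine-tuned mBERT classification head, into a signed score in [-1, 1]:

```python
import math

LABELS = ("negative", "neutral", "positive")  # assumed fine-tuning label order


def softmax(logits):
    """Convert raw classifier logits to probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sentiment_signal(logits):
    """Map (negative, neutral, positive) logits to a score in [-1, 1]."""
    p_neg, _, p_pos = softmax(logits)
    return p_pos - p_neg


# Example: logits from a hypothetical fine-tuned mBERT head.
score = sentiment_signal([0.2, 0.5, 2.1])
```

The neutral class is deliberately ignored: only the probability mass that leans positive or negative moves the score.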

Translation vs Direct Analysis

Two approaches exist:

1. Translate to English, then analyze.
2. Analyze directly in the original language.

Direct analysis is faster (no translation latency) and often more accurate, since translation can lose sentiment-bearing nuance. For real-time trading, direct analysis is preferred.
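The choice between the two pipelines reduces to a per-language dispatch; the set of languages with direct (fine-tuned) analyzers below is hypothetical:

```python
# Hypothetical set of languages for which fine-tuned direct analyzers exist.
DIRECT_ANALYZERS = {"zh", "es", "ja", "ko", "en"}


def choose_pipeline(lang_code):
    """Return which pipeline to use for an article in `lang_code`."""
    if lang_code in DIRECT_ANALYZERS:
        return "direct"  # no translation latency, no lost nuance
    return "translate-then-analyze"  # slower fallback for uncovered languages
```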

Named Entity Recognition in Multiple Languages

Companies, executives, and economic indicators must be recognized in the original language, and entity mentions must then be linked to the corresponding US-listed ADR. For example, Mexican news mentioning "Femsa" must be mapped to FEMSA's US-listed ADR. mBERT supports NER across languages; fine-tuned models achieve 85-90% accuracy on company name extraction.
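The entity-linking step after NER can be sketched with a hand-built alias table from home-language mentions to ADR tickers; the table below is purely illustrative and would be far larger in practice:

```python
# Illustrative alias table: home-language entity mentions -> US-listed tickers.
ALIAS_TO_TICKER = {
    "阿里巴巴": "BABA",  # Alibaba, as written in Chinese media
    "腾讯": "TCEHY",     # Tencent
    "femsa": "FMX",      # FEMSA, lowercase for case-insensitive matching
}


def link_entities(entities):
    """Map NER entity mentions to tickers; unknown names are dropped."""
    tickers = []
    for mention in entities:
        key = mention.strip()
        ticker = ALIAS_TO_TICKER.get(key) or ALIAS_TO_TICKER.get(key.lower())
        if ticker:
            tickers.append(ticker)
    return tickers
```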

Topic Modeling for Multilingual News Streams

Process news in parallel from English, Mandarin, Spanish, and Japanese sources. Use mBERT embeddings to align topics across languages: the Chinese term "贸易战" ("trade war") aligns with English "trade war" coverage. This reveals global themes before English sources report them.
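Cross-language alignment can be done by nearest-neighbor search in embedding space. The sketch below uses cosine similarity over toy 3-dimensional vectors standing in for real mBERT sentence embeddings (which would be 768-dimensional); the topic labels and threshold are assumptions:

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def align_topic(query_vec, topic_vecs, threshold=0.8):
    """Return the label of the closest known topic, or None if nothing is near."""
    best_label, best_sim = None, threshold
    for label, vec in topic_vecs.items():
        sim = cosine(query_vec, vec)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label


# Toy centroids standing in for mBERT embeddings of English topic clusters.
topics = {"trade war": [0.9, 0.1, 0.0], "earnings": [0.0, 0.2, 0.9]}
```

A Chinese headline about "贸易战" would be embedded and matched against these centroids, landing in the "trade war" cluster without any translation step.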

Case Study: Alibaba (BABA) ADR

Alibaba news often breaks in Chinese-language media first, with English coverage lagging by 12-24 hours. Using Mandarin NLP:

  • Detected a regulatory announcement 18 hours before the English translation appeared
  • Extracted sentiment: negative (regulatory risk)
  • The ADR fell 3.2% the next day; the early signal enabled a short position
  • Similar patterns were observed for Tencent and Pinduoduo ADRs

Handling Code-Switching

Financial discussions often mix languages. Chinese traders discussing US companies might write: "This company很好, but valuation太高" ("This company is good, but the valuation is too high"). Models must handle code-switching; mBERT copes reasonably well but struggles with short code-switched phrases.
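One simple preprocessing step is to segment a code-switched string into script runs before analysis. The sketch below splits on the CJK Unified Ideographs block only, which is a deliberate simplification (it ignores punctuation scripts, kana, hangul, etc.):

```python
def split_by_script(text):
    """Segment a code-switched string into runs of CJK vs. non-CJK characters."""
    runs, current, current_is_cjk = [], "", None
    for ch in text:
        is_cjk = "\u4e00" <= ch <= "\u9fff"  # CJK Unified Ideographs block
        if current and is_cjk != current_is_cjk:
            runs.append(current)  # script changed: close the current run
            current = ""
        current += ch
        current_is_cjk = is_cjk
    if current:
        runs.append(current)
    return runs
```

Each run can then be routed to the appropriate language-specific model, or the whole string can be fed to mBERT as-is when the phrase is long enough.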

Building Multilingual Financial Vocabulary

Financial terminology differs across languages, and literal dictionary translations often miss the domain term: English "hedge" corresponds to the Chinese market term "套保" (short for 套期保值, hedging), not to a word-for-word translation. Build multilingual term dictionaries and ensure models understand domain-specific vocabulary in each language.
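A multilingual glossary can be kept as a simple nested mapping from an English concept to its per-language domain terms; the entries below are a small illustrative sample:

```python
# Illustrative multilingual financial glossary: concept -> {language: term}.
FIN_TERMS = {
    "hedge": {"zh": "套保", "ja": "ヘッジ", "es": "cobertura"},
    "short selling": {"zh": "卖空", "ja": "空売り", "es": "venta en corto"},
}


def term_in(lang, concept):
    """Look up the domain-specific term for `concept` in language `lang`."""
    return FIN_TERMS.get(concept, {}).get(lang)
```

Such a glossary also doubles as fine-tuning vocabulary: sentences containing these terms can be oversampled when adapting mBERT to financial text.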

Implementation Strategy

1. Subscribe to news feeds in multiple languages (especially ADR home countries).
2. Process in parallel with mBERT sentiment + NER models.
3. Normalize signals across languages using embedding similarity.
4. Aggregate signals: if Chinese, Spanish, and English sources are all positive, confidence is high.
5. Monitor the news lag: if there is a consistent delay between languages, exploit it.
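The aggregation rule in step 4 can be sketched as follows, assuming each language pipeline emits a sentiment score in [-1, 1]; the input mapping and the three-source minimum are assumptions:

```python
def aggregate(signals):
    """Combine per-language sentiment scores (each in [-1, 1]) into one signal.

    `signals` is a hypothetical mapping like {"zh": 0.7, "es": 0.5, "en": 0.6}.
    Confidence is "high" only when three or more sources agree in sign;
    the combined score is a simple mean.
    """
    scores = list(signals.values())
    mean = sum(scores) / len(scores)
    agree = all(s > 0 for s in scores) or all(s < 0 for s in scores)
    confidence = "high" if agree and len(scores) >= 3 else "low"
    return mean, confidence
```

Weighting sources by historical predictive accuracy, rather than a plain mean, is a natural refinement.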

Regulatory Considerations

Using foreign-language news that has not yet been translated or disclosed in English may constitute insider trading in some jurisdictions; consult legal counsel. If the news is publicly available in its original language, analyzing it with NLP is generally defensible, since the underlying information is public.