LOBSTER vs TickData: Choosing the Right HFT Dataset

High-frequency trading research and algorithm development require high-quality market microstructure data. Two widely used sources are LOBSTER (Limit Order Book System: The Efficient Reconstructor) and TickData. Choosing between them, or using both, affects both research quality and infrastructure costs.

LOBSTER: The Standard for Academics

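LOBSTER, developed by academic researchers, reconstructs limit order books from NASDAQ's TotalView-ITCH message feed. For each requested stock and trading day it delivers two aligned files: a message file listing every order-book event (submissions, cancellations, deletions, executions) and an orderbook file holding the state of the book, up to a chosen number of price levels, after each event. Fixed-interval snapshots (e.g., every 100 milliseconds) are derived by sampling these event-level files.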

Key advantages:

  • Clean, well-documented data with extensive quality checking
  • Reconstructed from exchange message feeds, ensuring accuracy
  • Widely used in academic research, making results easily comparable
  • Historical coverage of NASDAQ stocks going back to 2007
  • Reasonable pricing for academic institutions
  • Transparent methodology allowing verification of data quality

Limitations:

  • Limited to NASDAQ-traded stocks (no futures, crypto, or international markets)
  • Pre-processed and reconstructed rather than raw feed data
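  • Book states show displayed liquidity only; hidden orders surface only when they execute, and off-exchange (dark pool) activity is absent
  • Delivered event by event, so fixed-interval snapshots must be derived by the user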
  • Monthly data files can be large and require significant storage/compute
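To make the file layout concrete, here is a minimal Python sketch that reads one LOBSTER message/orderbook file pair and samples the mid price onto a fixed 100 ms grid. It assumes the column layout described in LOBSTER's documentation: message rows hold time, event type, order ID, size, price, and direction; orderbook rows hold ask price, ask size, bid price, and bid size for each level; prices are stored as dollars times 10,000. The helper names and the three inline sample rows are hypothetical stand-ins for real files.

```python
import csv
import io

PRICE_SCALE = 10_000          # assumed: prices stored as dollars * 10,000
NANOS_PER_SECOND = 1_000_000_000


def to_nanos(timestamp: str) -> int:
    """Convert 'seconds.fraction' after midnight to integer nanoseconds."""
    seconds, _, fraction = timestamp.partition(".")
    return int(seconds) * NANOS_PER_SECOND + int((fraction + "0" * 9)[:9])


def read_top_of_book(message_file, orderbook_file):
    """Yield (time_ns, event_type, best_bid, best_ask) for each event.

    Row k of the orderbook file is the book state after row k of the
    message file, so the two files are read in lockstep.
    """
    for msg, book in zip(csv.reader(message_file), csv.reader(orderbook_file)):
        yield (
            to_nanos(msg[0]),
            int(msg[1]),
            int(book[2]) / PRICE_SCALE,   # level-1 bid price
            int(book[0]) / PRICE_SCALE,   # level-1 ask price
        )


def sample_mid(events, interval_ns=100_000_000):
    """Return {bucket_start_ns: mid}, using the last event in each bucket."""
    samples = {}
    for time_ns, _event_type, bid, ask in events:
        bucket = time_ns - time_ns % interval_ns
        samples[bucket] = (bid + ask) / 2   # later events overwrite earlier ones
    return samples


# Hypothetical sample rows standing in for real files on disk.
MESSAGES = io.StringIO(
    "34200.010000000,1,11,100,5853300,1\n"
    "34200.060000000,4,11,50,5853300,1\n"
    "34200.150000000,1,12,300,5859000,-1\n"
)
ORDERBOOK = io.StringIO(
    "5859400,200,5853300,100\n"
    "5859400,200,5853300,50\n"
    "5859000,300,5853300,50\n"
)

mid_by_bucket = sample_mid(read_top_of_book(MESSAGES, ORDERBOOK))
for bucket_ns, mid in sorted(mid_by_bucket.items()):
    print(f"{bucket_ns / NANOS_PER_SECOND:.1f}s  mid={mid:.4f}")
```

Timestamps are parsed from the text into integer nanoseconds so that bucketing does not depend on floating-point rounding. Buckets with no events are simply absent from the result; forward-filling them is left to the caller.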

TickData: Commercial Comprehensiveness

TickData, a commercial vendor acquired by OneMarketData in 2015, provides historical tick-by-tick data across multiple asset classes and exchanges. It includes raw trade data, quote data, and reconstructed order books.

Key advantages:

  • Covers multiple asset classes: equities, futures, FX, fixed income
  • Multiple exchange options: NYSE, NASDAQ, international exchanges
  • Includes both quote and trade data
  • Raw data available for researchers wanting to do their own processing
  • Professional-grade support and documentation
  • Integrated with institutional trading platforms

Limitations:

  • Expensive, especially for historical tick-by-tick coverage
  • Requires significant technical infrastructure to process efficiently
  • Data quality varies by market/asset class
  • Less standardized than LOBSTER, requiring custom processing
  • Corporate ownership introduces dependency on vendor relationship

Alternative Options and Combinations

Beyond LOBSTER and TickData, other sources include:

  • Exchange-native feeds: Direct feeds from NYSE, NASDAQ, CME. Most expensive but lowest latency and highest quality for those specific venues
  • Consolidated feeds: SIP (Securities Information Processor) feeds provide consolidated top-of-book quotes and trades across exchanges at low cost, but with less depth and higher latency than venue-specific feeds
  • Alternative vendors: Kibot, Pinnacle Data, and others offer various coverage and quality levels
  • Free/open data: Some crypto exchanges provide free tick-by-tick data; traditional markets rarely offer this

Many researchers and firms use a combination: LOBSTER for well-studied NASDAQ stocks and initial research, and TickData or exchange feeds for production systems or specialized assets.

Data Quality and Survivorship Bias

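Regardless of vendor, data quality issues must be considered. Survivorship bias arises when delisted stocks are missing from the data, which biases backtest performance upward because the worst outcomes are excluded. LOBSTER partially addresses this by maintaining historical records, TickData less so, but coverage of delisted symbols should be verified with either vendor for the sample period in question.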
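The size of the effect is easy to see with a toy calculation. The sketch below compares the average return of a full universe with the average over survivors only; the symbols, returns, and the `mean_return` helper are all hypothetical and chosen purely for illustration.

```python
from statistics import fmean


def mean_return(returns_by_symbol, exclude=frozenset()):
    """Average return over all symbols not listed in `exclude`."""
    kept = [ret for symbol, ret in returns_by_symbol.items() if symbol not in exclude]
    return fmean(kept)


# Hypothetical one-year returns; "CCC" was delisted during the period.
RETURNS = {"AAA": 0.10, "BBB": 0.05, "CCC": -0.60}
DELISTED = {"CCC"}

full_universe = mean_return(RETURNS)
survivors_only = mean_return(RETURNS, exclude=DELISTED)
print(f"full universe: {full_universe:+.3f}, survivors only: {survivors_only:+.3f}")
```

With these made-up numbers, dropping the delisted symbol turns a negative average return into a positive one, which is the kind of distortion a backtest inherits when the dataset silently omits delisted stocks.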

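Corporate actions (stock splits, dividends) must be handled correctly. Some data vendors adjust historical prices; others do not. An unadjusted split shows up as a spurious price jump on the ex-date, which corrupts return-based features and backtest results, so it is important to know which convention a dataset follows before combining it with other sources.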
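As an illustration of what adjusting historical prices means, here is a minimal sketch of back-adjusting a price series for stock splits. The function name, the tuple layout, and the sample bars are assumptions made for this example; real vendor files will use their own schemas, and dividend adjustment is not covered.

```python
from datetime import date


def back_adjust_for_splits(bars, splits):
    """Rescale bars dated before each split so the series is continuous.

    bars:   list of (day, price, volume) tuples
    splits: list of (ex_date, ratio) tuples; ratio 4.0 means a 4-for-1 split
    """
    adjusted = []
    for day, price, volume in bars:
        factor = 1.0
        for ex_date, ratio in splits:
            if day < ex_date:          # only history before the ex-date is rescaled
                factor *= ratio
        adjusted.append((day, price / factor, volume * factor))
    return adjusted


# Hypothetical bars around a 4-for-1 split effective 2020-08-31.
BARS = [
    (date(2020, 8, 28), 500.0, 1_000),
    (date(2020, 8, 31), 125.0, 4_000),
]
SPLITS = [(date(2020, 8, 31), 4.0)]

for day, price, volume in back_adjust_for_splits(BARS, SPLITS):
    print(day, price, volume)
```

After adjustment the pre-split bar is expressed in post-split terms, so the apparent 75% price drop between the two days disappears.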

Computational Considerations

LOBSTER data, while comprehensive, requires substantial storage and computational resources. A month of order-book data across many NASDAQ stocks can run to many gigabytes or more, and the size grows with the number of price levels requested and with how actively each stock trades. Processing millions of book states to extract features is computationally intensive.
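A rough back-of-envelope estimate helps when sizing storage. The sketch below assumes an uncompressed orderbook file with four fields per price level (ask price, ask size, bid price, bid size) and one row per event; the event count, bytes per field, and number of trading days are hypothetical inputs, not measured values.

```python
def estimate_orderbook_bytes(events_per_day, levels, trading_days=21, bytes_per_field=8):
    """Approximate uncompressed size of one symbol's orderbook files.

    Each row stores 4 fields per price level, and there is one row per event.
    """
    fields_per_row = 4 * levels
    return events_per_day * fields_per_row * bytes_per_field * trading_days


# Hypothetical inputs: 500,000 events per day, 10 price levels, one month.
size_bytes = estimate_orderbook_bytes(events_per_day=500_000, levels=10)
print(f"{size_bytes / 1e9:.2f} GB per symbol-month")
```

Under these assumptions a single active symbol already needs a few gigabytes per month, so the estimate should be multiplied by the number of symbols and months before choosing between local and hosted storage.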

TickData has similar or worse storage requirements but can be downloaded selectively (only the stocks and time periods needed), reducing storage burden.

For exploratory research, cloud-based or hosted access, where the vendor offers it, avoids expensive local infrastructure.

Research Reproducibility

Using LOBSTER facilitates reproducibility: other researchers can obtain the same data, rely on the same reconstruction methodology, and verify results. Using proprietary exchange feeds or commercial data makes reproduction more difficult.

In academia, this reproducibility advantage has made LOBSTER dominant despite its limitations. In industry, where proprietary data and methods are themselves a source of value, reproducibility by outsiders matters less.

Choosing for Your Project

Decision criteria:

  • Asset class: LOBSTER if focused on NASDAQ equities, TickData for multi-asset
  • Budget: LOBSTER cheaper for academic institutions; TickData expensive
  • Reproducibility: LOBSTER better for published research
  • Completeness: TickData is more comprehensive (equities, futures, FX, fixed income, across multiple exchanges)
  • Timeliness: Both update frequently; both suitable for research

Conclusion

LOBSTER and TickData represent different approaches to microstructure data. LOBSTER's cleanliness, academic acceptance, and reasonable cost make it ideal for equity research. TickData's comprehensiveness and commercial support serve institutional needs. Understanding the tradeoffs allows researchers to make informed choices aligned with project requirements.