Licensing Pitfalls: Legal Aspects of Scraping and Reselling Data
Introduction
Scraping websites for alternative data and reselling it to other traders seems straightforward: extract publicly visible data, sell it to market participants who value it. However, the legal landscape is complex. Website terms of service, copyright law, data protection regulations, and proprietary data claims create significant risks. This article explores the legal challenges of data scraping and licensing.
Website Terms of Service and Contract Law
Many websites explicitly restrict scraping in their terms of service. Amazon's Conditions of Use prohibit automated data gathering and extraction tools. LinkedIn's User Agreement forbids scraping and reselling member data. Twitter (now X) restricts API usage and commercial redistribution. These terms are not merely guidelines: where they are enforceable, violating them can constitute breach of contract.
Legal Status of ToS Violations
Courts have been inconsistent about whether terms-of-service restrictions on scraping are enforceable. In hiQ Labs v. LinkedIn, the Ninth Circuit held that scraping publicly accessible profiles likely did not violate the federal Computer Fraud and Abuse Act, yet the district court later found that hiQ had breached LinkedIn's User Agreement. In other cases (OpenTable class-action), courts have upheld ToS restrictions. The ambiguity is dangerous: even if scraping a site such as Twitter turns out to be lawful, establishing that right may require expensive litigation.
Corporate Websites and IR Data
Company investor relations websites present a gray area. Scraping press releases, earnings transcripts, and financial statements from them may technically breach the site's terms of service, but it is widely tolerated in practice because companies benefit when analysts read this material. The risk is low but not zero.
Copyright and Fair Use Complications
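Scraping content such as text and images can infringe copyright. Earnings call transcripts, news articles, and commercial satellite imagery are generally protected by copyright, and scraping them does not give you a license to resell them. Bare facts, such as a stock's closing price, are not copyrightable in the U.S., but the selection, arrangement, and expression of those facts can be.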
Fair Use Doctrine
U.S. copyright law includes a "fair use" exception: limited copying for purposes such as criticism, comment, news reporting, teaching, scholarship, or research may be permitted. However, commercial scraping and resale to third parties rarely qualifies as fair use. Courts weigh four factors:
- Purpose: commercial use weighs against fair use; research/educational use weighs in favor
- Nature of work: factual data (stock prices) is less protected than creative work (artwork)
- Amount used: copying entire works weighs against fair use; using small excerpts for quotes supports it
- Market effect: does scraping harm the original creator's commercial interests?
For financial data scraping, a commercial purpose (trading or selling data) weighs against fair use even when the underlying data is factual, and reselling copies that substitute for the original also weighs against it on market effect.
Data Protection Regulations (GDPR, CCPA, etc.)
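If scraped data contains personal information, even in pseudonymized form, data protection regulations apply. The GDPR (EU) and the CCPA (California) restrict how personal data may be collected, processed, and shared: the GDPR requires a lawful basis for any processing, and the CCPA requires notice to consumers and lets them opt out of the sale of their data.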
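A short Python sketch shows why pseudonymization alone does not take data outside these regimes. It assumes a hypothetical scraped record with an `email` field and replaces the address with a keyed HMAC-SHA256 token. Because anyone holding the key can recompute the token for a known address and re-link the record, the output is still attributable to a person, and the GDPR treats pseudonymised data that can be re-attributed this way as personal data.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; a real key would be secret and stored apart from the data.
SECRET_KEY = b"example-key-not-for-production"


def pseudonymize_email(email: str, key: bytes = SECRET_KEY) -> str:
    """Replace an email address with a keyed SHA-256 HMAC of its normalized form."""
    normalized = email.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


def pseudonymize_record(record: dict) -> dict:
    """Return a copy of a scraped record with 'email' swapped for 'email_token'."""
    out = dict(record)
    out["email_token"] = pseudonymize_email(out.pop("email"))
    return out


def relink(token: str, candidate_email: str, key: bytes = SECRET_KEY) -> bool:
    """Anyone holding the key can check whether a token belongs to a known person."""
    return hmac.compare_digest(token, pseudonymize_email(candidate_email, key))


scraped = {"email": "Jane.Doe@example.com", "city": "Berlin"}
token_record = pseudonymize_record(scraped)
print(relink(token_record["email_token"], "jane.doe@example.com"))  # True: still re-identifiable
```

The email has disappeared from the record, but the person has not: the token is a stable identifier that links every record about them.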
GDPR Implications
If you scrape personal data (names, emails, addresses) and decide how it is used, you act as a "data controller" under the GDPR. Among other obligations, you must establish a lawful basis for processing (such as consent, legitimate interests, or a legal obligation), maintain records of processing activities, honor individuals' requests to access or erase their data, and report personal data breaches to the supervisory authority.
Scraping personal data without consent is hard to justify in GDPR jurisdictions. The main alternative, legitimate interests, requires balancing your interests against the individuals' rights, and "selling data to traders" is unlikely to pass that test.
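Two of the GDPR obligations described above, honoring access and erasure requests, can be sketched in Python as operations over records indexed by data subject. The class and method names are hypothetical, and the sketch deliberately omits identity verification, backups, and notifying downstream recipients, all of which a real request-handling process would need.

```python
from collections import defaultdict
from datetime import datetime, timezone


class PersonalDataStore:
    """Hypothetical in-memory store of scraped personal data, keyed by data subject."""

    def __init__(self) -> None:
        self._records = defaultdict(list)
        self.request_log = []  # audit trail of rights requests

    def add(self, subject_id: str, record: dict) -> None:
        self._records[subject_id].append(dict(record))

    def _log(self, kind: str, subject_id: str) -> None:
        self.request_log.append(
            {"type": kind, "subject": subject_id,
             "at": datetime.now(timezone.utc).isoformat()}
        )

    def access(self, subject_id: str) -> list:
        """Return copies of every record held about one individual."""
        self._log("access", subject_id)
        return [dict(r) for r in self._records.get(subject_id, [])]

    def erase(self, subject_id: str) -> int:
        """Delete every record about one individual and return how many were removed."""
        self._log("erasure", subject_id)
        return len(self._records.pop(subject_id, []))


store = PersonalDataStore()
store.add("jane.doe@example.com", {"name": "Jane Doe", "source": "https://example.com/profile/1"})
print(store.access("jane.doe@example.com"))
print(store.erase("jane.doe@example.com"))  # 1
```

Keeping the subject identifier in the audit log is itself a design choice that needs justification; a real system might log an opaque request ID instead.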
CCPA and Comparable U.S. Regulations
California's CCPA, as amended by the CPRA, gives consumers rights similar to those under the GDPR. A business covered by the CCPA that scrapes and sells personal information of California residents without the required notices and opt-out mechanisms risks violating the law. Other states (Virginia, Colorado, Connecticut, Utah) have passed similar laws.
Proprietary Data and Conversion Doctrine
Plaintiffs can also frame unauthorized scraping and resale as "conversion," the wrongful taking of another's property. Companies have pursued conversion claims against scrapers of proprietary data (customer lists, pricing data), although courts differ on whether conversion extends to intangible data. If you scrape competitor data (prices, product lists) and resell it to others, you may face conversion suits.
Safe Approaches to Data Sourcing
Licensed Data from Providers
The safest approach is to pay for data from authorized providers (Bloomberg, Reuters, FactSet) or to use official sources such as the SEC's EDGAR APIs. Licensed providers hold the rights to distribute their data, and your license spells out what you may do with it (for example, internal use only versus resale to clients).
Public Domain and Open Data
Some data is explicitly in the public domain: works of the U.S. federal government, such as NOAA, USDA, and Census Bureau data, are generally free for any use. Other data is published under open licenses, but not all of them allow commercial use: Creative Commons CC0 and CC BY do, while CC BY-NC does not. Confirm the specific license before use.
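Where a dataset declares its license with an SPDX identifier, a first pass of that check can be automated. The Python sketch below is illustrative: the allowlist and blocklist are small hand-picked samples, attribution and share-alike conditions still apply to several permitted licenses, and anything unrecognized, including data that declares no license at all, falls through to manual review.

```python
from typing import Optional

# Illustrative sample of SPDX identifiers whose terms permit commercial use
# (several still require attribution or share-alike).
COMMERCIAL_OK = {"CC0-1.0", "PDDL-1.0", "CC-BY-4.0", "CC-BY-SA-4.0", "ODbL-1.0"}

# Illustrative sample of SPDX identifiers that prohibit commercial use.
NON_COMMERCIAL = {"CC-BY-NC-4.0", "CC-BY-NC-SA-4.0", "CC-BY-NC-ND-4.0"}


def commercial_use_status(spdx_id: Optional[str]) -> str:
    """Classify a declared license as 'allowed', 'prohibited', or 'manual_review'."""
    if spdx_id is None:
        return "manual_review"  # no declared license: never assume permission
    license_id = spdx_id.strip()
    if license_id in COMMERCIAL_OK:
        return "allowed"
    if license_id in NON_COMMERCIAL:
        return "prohibited"
    return "manual_review"  # unknown or custom terms need a human read


for declared in ["CC-BY-4.0", "CC-BY-NC-4.0", None, "LicenseRef-custom"]:
    print(declared, "->", commercial_use_status(declared))
```

Defaulting to manual review rather than "allowed" keeps the failure mode conservative: an unfamiliar license costs a reviewer's time, not a lawsuit.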
Explicit Consent and Commercial Agreements
If you want to scrape a website, get permission first. Contact the site owner, explain your use case, and negotiate a licensing agreement. Some companies will grant scraping rights to a valuable customer or in exchange for a revenue share.
Anonymization and Aggregation
Scraping aggregate data (website traffic statistics, sector-level sales trends, anonymized survey results) faces lower legal risk than scraping personal data or proprietary secrets. Focus on aggregation to reduce legal exposure.
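A minimal Python sketch of that idea collapses record-level rows into sector-level totals and suppresses any group smaller than a threshold, so no published figure describes only a handful of entities. The field names and the threshold of 5 are assumptions for illustration. Small-cell suppression reduces, but does not eliminate, re-identification risk, and aggregating your output does nothing about terms-of-service or copyright issues at the source.

```python
from collections import defaultdict

MIN_GROUP_SIZE = 5  # hypothetical suppression threshold


def aggregate_by_sector(rows, min_group_size=MIN_GROUP_SIZE):
    """Sum 'sales' per 'sector', dropping sectors with fewer than min_group_size rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row["sector"]] += row["sales"]
        counts[row["sector"]] += 1
    return {
        sector: {"n": counts[sector], "total_sales": totals[sector]}
        for sector in totals
        if counts[sector] >= min_group_size
    }


rows = [{"sector": "retail", "sales": 100.0}] * 6 + [{"sector": "energy", "sales": 900.0}] * 2
print(aggregate_by_sector(rows))  # {'retail': {'n': 6, 'total_sales': 600.0}}
```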
Risk Assessment Framework
Evaluate legal risk for any scraping project:
- Type of data: factual public data (low risk), personal data (high risk), proprietary secrets (very high risk)
- Source: government sites (low risk), company sites (medium risk), competitors (high risk)
- Usage: internal analysis (low risk), reselling to clients (high risk)
- Scale: small volume (low risk), massive volume replacing original service (high risk)
- Competition: no overlap with source (low risk), direct competition (high risk)
High-risk combinations warrant legal review before proceeding.
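The framework can be encoded as a lookup table so every project is rated the same way. In the Python sketch below the ratings come from the list above, but the category labels are hypothetical, and so is the decision rule: it flags a project for legal review when two or more dimensions rate high, or when any single dimension rates very high. That threshold is an assumption to be tuned with counsel.

```python
from enum import IntEnum


class Risk(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    VERY_HIGH = 4


# Ratings mirror the framework above; the labels on the left are illustrative.
RATINGS = {
    "data_type": {"factual_public": Risk.LOW, "personal": Risk.HIGH, "proprietary_secret": Risk.VERY_HIGH},
    "source": {"government": Risk.LOW, "company": Risk.MEDIUM, "competitor": Risk.HIGH},
    "usage": {"internal": Risk.LOW, "resale": Risk.HIGH},
    "scale": {"small": Risk.LOW, "replaces_source": Risk.HIGH},
    "competition": {"none": Risk.LOW, "direct": Risk.HIGH},
}


def assess(project: dict) -> tuple:
    """Return (needs_legal_review, list of dimensions rated HIGH or worse)."""
    scores = {dim: RATINGS[dim][project[dim]] for dim in RATINGS}
    high = [dim for dim, risk in scores.items() if risk >= Risk.HIGH]
    # Hypothetical rule: two or more high dimensions, or any very-high one.
    needs_review = len(high) >= 2 or Risk.VERY_HIGH in scores.values()
    return needs_review, high


print(assess({"data_type": "personal", "source": "company", "usage": "resale",
              "scale": "small", "competition": "none"}))  # (True, ['data_type', 'usage'])
```

A lookup table also makes the policy auditable: changing a rating is a one-line, reviewable diff rather than a judgment call buried in someone's head.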
Compliance Documentation
Even legal scraping should be documented. Maintain records of: data sources, licenses obtained, terms of service reviewed, consent obtained, dates of collection. If challenged, documentation demonstrates reasonable care in compliance.
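One lightweight way to keep such records is an append-only log with one structured entry per source. The Python sketch below mirrors the fields listed above; the class name, field names, and example values are hypothetical, and a real log would also need access controls and retention rules.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date
from typing import Optional


@dataclass
class SourceRecord:
    """One hypothetical provenance entry for a scraped or licensed data source."""
    source_url: str
    license_ref: str                  # license name or agreement reference
    tos_reviewed_on: Optional[date]   # when the site's terms were last reviewed
    consent_reference: Optional[str]  # e.g. ID of a signed permission letter
    collected_from: date
    collected_to: date
    notes: str = ""

    def to_json(self) -> str:
        # asdict() keeps date objects; convert them to ISO 8601 strings for JSON.
        return json.dumps(asdict(self), default=lambda o: o.isoformat(), sort_keys=True)


entry = SourceRecord(
    source_url="https://example.com/data",
    license_ref="Hypothetical data license, ref. AGR-001",
    tos_reviewed_on=date(2024, 1, 15),
    consent_reference=None,
    collected_from=date(2024, 1, 16),
    collected_to=date(2024, 3, 31),
)
print(entry.to_json())
```

In practice each `to_json()` line would be appended to a JSON Lines file, so the history of what was collected, under which terms, and when, is never overwritten.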
Conclusion
Scraping and reselling data is legally perilous. Website terms of service, copyright law, data protection regulations, and conversion doctrine all create exposure. The safest approach is licensing data from authorized providers or using public-domain or openly licensed data whose terms permit your use. If scraping directly from websites, consult legal counsel, obtain written permission, and limit resale. The temptation to scrape-and-sell cheaply sourced data must be weighed against significant legal risk.