Satellite Imagery for Foot-Traffic Proxies: A Guide to Pre-Processing

Category: Data Sourcing & Alternative Data • Article #1 • Reading time: 5 minutes

Introduction

Satellite imagery has become a valuable alternative data source for traders and investors. High-resolution satellites can photograph parking lots, revealing how many cars are present (a proxy for retail foot traffic). They can monitor port facilities to gauge cargo volumes and track oil storage tanks to estimate inventory levels. The challenge is converting raw satellite images into tradeable signals, which requires understanding imagery resolution, temporal frequency, cloud cover, atmospheric effects, and image processing techniques. This guide walks through the practical aspects of working with satellite imagery for financial trading.

Satellite Imagery Fundamentals

Satellite imagery comes in many forms. Multispectral imagery captures data in multiple wavelength bands (visible light, near-infrared, thermal). Panchromatic imagery is single-band and typically sharper than the multispectral bands from the same sensor. SAR (Synthetic Aperture Radar) images through clouds, which matters for tropical regions or monsoon seasons. Hyperspectral imagery has many narrow bands, enabling detailed material identification.

Resolution varies. Freely available data (USGS Landsat, Copernicus Sentinel) has roughly 10-30 meter resolution. Commercial providers (Planet Labs, Maxar) offer roughly 0.3-3 meter resolution. Sub-meter resolution enables counting individual vehicles; 10-meter resolution shows general activity levels but not individual objects.
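
To make these resolution thresholds concrete, the sketch below estimates how many pixels a single car covers at a given ground sample distance. The 4.5 m by 1.8 m car footprint is an illustrative assumption, not a figure from any imagery provider.

```python
def pixels_per_vehicle(gsd_m, car_length_m=4.5, car_width_m=1.8):
    """Approximate number of pixels covered by one car.

    gsd_m is the ground sample distance (meters per pixel side).
    The default footprint is an assumed, typical passenger car.
    """
    if gsd_m <= 0:
        raise ValueError("gsd_m must be positive")
    return (car_length_m * car_width_m) / (gsd_m ** 2)


for gsd in (0.3, 3.0, 10.0):
    print(f"{gsd:>5} m/pixel -> {pixels_per_vehicle(gsd):.2f} pixels per car")
```

Under these assumptions a car spans about 90 pixels at 0.3 m but less than one pixel at 3 m, which is why individual-vehicle counting depends on sub-meter imagery.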

Temporal frequency matters just as much. Free data revisits a location roughly every 5-16 days (about 5 days for the two-satellite Sentinel-2 constellation, 8-16 days for Landsat). Commercial daily imagery allows tracking changes between consecutive days. Weekly revisits are sufficient for seasonal patterns; daily imagery is needed for short-term tactical trading.

Preprocessing: From Raw Images to Usable Data

Raw satellite images require substantial preprocessing before analysis. First, atmospheric correction: satellite sensors measure reflected light after the atmosphere has scattered and absorbed part of it. Algorithms like FLAASH or iCOR estimate and remove these effects, converting raw digital numbers to reflectance values that are comparable across time and location.
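
Full atmospheric correction with a tool like FLAASH or iCOR is beyond a short example, but the step that usually precedes it can be sketched: converting a digital number to top-of-atmosphere reflectance with a linear rescale followed by a sun-angle correction, as documented for Landsat Level-1 products. The gain and offset values below are illustrative assumptions; real values come from each scene's metadata.

```python
import math


def dn_to_toa_reflectance(dn, gain, offset, sun_elevation_deg):
    """Convert a raw digital number to top-of-atmosphere reflectance.

    Applies the linear rescaling gain * dn + offset, then divides by
    sin(sun elevation) to correct for the solar angle. This is NOT a full
    atmospheric correction: aerosol and water-vapor effects remain.
    """
    if not 0 < sun_elevation_deg <= 90:
        raise ValueError("sun elevation must be in (0, 90] degrees")
    uncorrected = gain * dn + offset
    return uncorrected / math.sin(math.radians(sun_elevation_deg))


# Assumed metadata values for illustration only.
GAIN, OFFSET = 2.0e-5, -0.1
print(dn_to_toa_reflectance(dn=15000, gain=GAIN, offset=OFFSET, sun_elevation_deg=30.0))
```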

Second, radiometric and geometric correction: radiometric correction standardizes images acquired at different sun angles or sensor angles, while geometric correction ensures pixels align properly across different images. Georeferencing maps pixels to geographic coordinates.

Third, cloud and shadow removal: clouds obscure the ground, making vehicle counts impossible in cloud-covered parking lots. Algorithms detect cloudy pixels and either mask them or use image inpainting to estimate the hidden regions. Inpainted pixels are estimates rather than observations, so masking is the safer choice when the output is a count. Shadow removal corrects for non-uniform illumination.
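
The masking idea can be sketched as follows. A single brightness threshold is a crude stand-in for real cloud detection (production pipelines typically rely on multi-band tests or provider-supplied masks), and the threshold and minimum clear fraction here are assumed values.

```python
def cloud_mask(reflectance, threshold=0.6):
    """Flag pixels brighter than `threshold` as cloud (True = cloudy).

    `reflectance` is a 2D list of values in [0, 1].
    """
    return [[value > threshold for value in row] for row in reflectance]


def clear_fraction(mask):
    """Share of pixels not flagged as cloud."""
    total = sum(len(row) for row in mask)
    cloudy = sum(sum(row) for row in mask)
    return (total - cloudy) / total if total else 0.0


def usable(reflectance, min_clear=0.8, threshold=0.6):
    """Keep an image only if enough of the scene is cloud-free."""
    return clear_fraction(cloud_mask(reflectance, threshold)) >= min_clear


# Toy 3x3 tile with a bright (cloud-like) patch in the right-hand column.
tile = [
    [0.12, 0.15, 0.90],
    [0.10, 0.14, 0.85],
    [0.11, 0.13, 0.16],
]
print(clear_fraction(cloud_mask(tile)), usable(tile))
```

Rejecting an image outright when too little of it is clear avoids recording a falsely low vehicle count for a partly hidden lot.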

Feature Extraction: Finding Signals in Images

Raw pixel values aren't directly tradeable. You need to extract meaningful features: how many vehicles in the parking lot? What's the water level in a reservoir? How much snow covers a ski resort?

Manual feature extraction: count cars by examining parking lot images by eye and recording the totals. This works for validation but doesn't scale to thousands of locations.

Classical computer vision: segment images into objects (vehicles, trees, water) using techniques such as edge detection, morphological operations, and color thresholding. This is more scalable but requires tuning for each use case and is brittle to variations in lighting, season, and image quality.
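
A minimal sketch of the classical approach is thresholding followed by connected-component counting. It assumes vehicles appear brighter than the pavement, which is not always true, and the threshold and minimum blob size are exactly the kind of per-site tuning parameters that make this method brittle.

```python
from collections import deque


def count_blobs(image, threshold=0.5, min_pixels=2):
    """Count 4-connected groups of above-threshold pixels.

    `image` is a 2D list of brightness values. Blobs smaller than
    `min_pixels` are treated as noise and ignored.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    blobs = 0
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] <= threshold:
                continue
            # Breadth-first flood fill from this unvisited bright pixel.
            size = 0
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if size >= min_pixels:
                blobs += 1
    return blobs


# Toy lot: two 2-pixel blobs plus two isolated bright pixels (noise).
lot = [
    [0.1, 0.9, 0.9, 0.1, 0.1],
    [0.1, 0.1, 0.1, 0.1, 0.8],
    [0.9, 0.1, 0.1, 0.1, 0.8],
    [0.1, 0.1, 0.7, 0.1, 0.1],
]
print(count_blobs(lot))
```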

Deep learning: train object detection networks (YOLO, Faster R-CNN) to identify vehicles, measure cargo containers, or detect other objects of interest. This requires labeled training data (images with annotated objects), but a trained model generalizes across diverse conditions better than hand-tuned rules, provided the training set covers those conditions. Most modern applications use deep learning approaches.

Vehicle Counting Example: From Pixels to Trading Signal

Counting vehicles in retail parking lots exemplifies the full pipeline. First, obtain multispectral satellite imagery of target parking lots at consistent times (e.g., Friday afternoons). Multiple images per location improve reliability.

Preprocess images: apply atmospheric correction, align to a geographic grid, and mask clouds. For parking lots, you're typically working with commercial high-resolution imagery (around 1 meter or finer for counting individual cars), since free data is too coarse for vehicle counting.

Train object detection model: manually label training images (draw bounding boxes around vehicles). Train a deep neural network to identify vehicles. Validate on held-out test images, assessing counting accuracy.
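
Counting accuracy on the held-out images can be summarized with a few simple error measures. The sketch below is one reasonable choice rather than a prescribed method, and the counts are made-up illustration data.

```python
def counting_errors(predicted, actual):
    """Return mean absolute error, mean signed bias, and mean absolute
    percentage error for paired per-image vehicle counts."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    diffs = [p - a for p, a in zip(predicted, actual)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    bias = sum(diffs) / len(diffs)
    # Skip images with zero true vehicles to avoid dividing by zero.
    pct = [abs(d) / a for d, a in zip(diffs, actual) if a > 0]
    mape = sum(pct) / len(pct) if pct else float("nan")
    return {"mae": mae, "bias": bias, "mape": mape}


# Hypothetical held-out images: model counts vs. hand-labeled counts.
model_counts = [48, 100, 77, 20]
hand_counts = [50, 100, 80, 25]
print(counting_errors(model_counts, hand_counts))
```

A negative bias, as in this toy data, would indicate systematic undercounting, which matters more for a trading signal than random error because it does not average out.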

Extract signal: run trained model on all target parking lots. Generate time series of vehicle counts for each location. Aggregate across locations. Compare counts to historical patterns to detect anomalies (unusually high or low traffic relative to seasonal norms).

Convert to trading signal: unusually low traffic may predict weaker near-term retail sales. Construct a trading rule: when satellite-derived foot traffic is below its rolling average, underweight retail equities. Backtest the rule on the historical period for which satellite imagery was available.
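
The rolling-average rule might look like the sketch below. The 4-period window and the -1/0 position encoding are assumptions for illustration; each value is compared only with observations that precede it.

```python
def underweight_signal(counts, window=4):
    """Return one position per observation: -1 (underweight) when the count
    is below the average of the preceding `window` observations, else 0.

    The first `window` entries are None because no full history exists yet.
    """
    signal = []
    for i, value in enumerate(counts):
        if i < window:
            signal.append(None)
            continue
        trailing_mean = sum(counts[i - window:i]) / window
        signal.append(-1 if value < trailing_mean else 0)
    return signal


# Hypothetical weekly aggregate vehicle counts.
weekly_counts = [100, 102, 98, 101, 90, 103, 99]
print(underweight_signal(weekly_counts))
# -> [None, None, None, None, -1, 0, 0]
```

In a backtest, the position computed from a given week's count should be applied to the following period, once the imagery has actually been delivered.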

Temporal Aggregation and Seasonality

Individual images are noisy (weather, time-of-day effects, single-day randomness). Aggregate over time to reduce noise. Averaging daily images over 5-day business weeks keeps weekend effects out of the comparison. Weekly aggregation smooths daily variation. Monthly aggregation is necessary for longer-horizon signals.
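
Weekly aggregation with gaps for unusable days can be sketched as follows. Grouping by ISO week and requiring a minimum number of usable days per week are design assumptions, not requirements.

```python
from collections import defaultdict
from datetime import date


def weekly_means(observations, min_obs=2):
    """Average daily counts within each ISO week.

    `observations` is a list of (date, count) pairs; count is None when the
    image was unusable (for example, cloud-covered). Weeks with fewer than
    `min_obs` usable days are dropped rather than trusted.
    """
    buckets = defaultdict(list)
    for day, count in observations:
        if count is not None:
            iso = day.isocalendar()
            buckets[(iso[0], iso[1])].append(count)
    return {
        week: sum(values) / len(values)
        for week, values in sorted(buckets.items())
        if len(values) >= min_obs
    }


# Hypothetical daily counts; None marks a cloud-covered day.
obs = [
    (date(2024, 3, 4), 120), (date(2024, 3, 5), None), (date(2024, 3, 6), 130),
    (date(2024, 3, 7), 125), (date(2024, 3, 11), 90), (date(2024, 3, 12), None),
]
print(weekly_means(obs))
```

Here the second week has only one usable day, so it is dropped instead of being reported as a sharp decline.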

Seasonality is critical. Retail traffic is higher during the holiday season than in summer. Ski resort occupancy varies seasonally. Compare year-over-year changes, or month-to-month changes after seasonal adjustment, rather than absolute levels. Calculate z-scores relative to seasonal norms, or use seasonal decomposition to separate the trend from seasonal patterns.
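
A seasonal z-score can be sketched by scoring each observation against prior years' values for the same calendar month. The data below is hypothetical, and using the calendar month as the seasonal bucket is an assumption; weekly or holiday-aligned buckets may fit some retailers better.

```python
from statistics import mean, stdev


def seasonal_zscore(history, month, value):
    """z-score of `value` against past observations from the same calendar month.

    `history` is a list of (year, month, count) tuples from earlier years only,
    so the norm never includes the observation being scored.
    """
    same_month = [count for _, m, count in history if m == month]
    if len(same_month) < 2:
        raise ValueError("need at least two prior observations for that month")
    spread = stdev(same_month)
    if spread == 0:
        raise ValueError("no variation in history; z-score undefined")
    return (value - mean(same_month)) / spread


# Hypothetical counts: December is seasonally high, July seasonally low.
history = [
    (2020, 12, 190), (2021, 12, 200), (2022, 12, 210),
    (2020, 7, 95), (2021, 7, 100), (2022, 7, 105),
]
print(seasonal_zscore(history, month=12, value=180))  # weak for December
print(seasonal_zscore(history, month=7, value=110))   # strong for July
```

The December count of 180 is higher in absolute terms than the July count of 110, yet it scores -2 against its seasonal norm while July scores +2, which is the reason to avoid comparing raw levels.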

Data Quality and Validation

Cloud cover is the primary quality issue. Images obscured by clouds provide no usable information. Prioritize data sources and acquisition windows with low cloud probability for the target regions, keeping in mind that cloud cover varies by season.

Validate results: spot-check image-derived counts against ground truth when possible. Count the vehicles in a parking lot in person, then compare to the satellite-derived count for the same time. Consistent discrepancies reveal systematic biases in your model.

Compare alternative sources: compare satellite foot-traffic signals to credit card spending data, Google mobility trends, or company-reported store traffic. A consistent correlation supports the claim that satellite signals capture real activity.
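
The comparison can be quantified with a Pearson correlation between the satellite series and the alternative series. The sketch below implements the coefficient directly; both series are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical monthly changes (%): satellite traffic index vs. card spending.
satellite_change = [2.0, -1.5, 0.5, 3.0, -2.0, 1.0]
card_spend_change = [1.8, -1.0, 0.2, 2.5, -2.2, 0.9]
print(round(pearson(satellite_change, card_spend_change), 3))
```

Correlating changes rather than levels reduces the risk that a shared trend or shared seasonality inflates the apparent agreement.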

Common Pitfalls and How to Avoid Them

Pitfall 1: Cloud Bias. Cloudy regions have less available imagery, biasing analysis toward clear-weather locations. Track how many usable images each location contributes and weight or filter accordingly.

Pitfall 2: Sun Angle Effects. Shadows change with sun angle (seasonal and latitude dependent), affecting vehicle visibility. Normalize for seasonal and latitudinal sun angle variations.
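
The size of the sun-angle effect can be approximated with a standard solar declination formula. The sketch assumes local solar noon, whereas actual satellite overpasses usually occur at other times, and the 1.5 m vehicle height is an assumed value.

```python
import math


def solar_declination_deg(day_of_year):
    """Approximate solar declination in degrees for a day of the year (1-365)."""
    return -23.44 * math.cos(math.radians(360.0 / 365.0 * (day_of_year + 10)))


def noon_elevation_deg(latitude_deg, day_of_year):
    """Sun elevation above the horizon at local solar noon."""
    return 90.0 - abs(latitude_deg - solar_declination_deg(day_of_year))


def shadow_length_m(object_height_m, elevation_deg):
    """Length of the shadow cast on flat ground by an object of given height."""
    if elevation_deg <= 0:
        raise ValueError("sun is at or below the horizon")
    return object_height_m / math.tan(math.radians(elevation_deg))


# Compare a 1.5 m tall vehicle at 40 degrees north near each solstice.
for label, day in (("late June", 172), ("late December", 355)):
    elevation = noon_elevation_deg(40.0, day)
    print(f"{label}: elevation {elevation:.1f} deg, "
          f"shadow {shadow_length_m(1.5, elevation):.2f} m")
```

Under these assumptions the winter shadow is several times longer than the summer one, enough to merge adjacent vehicles in a detector's view unless training data or normalization accounts for it.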

Pitfall 3: Data Leakage. If using satellite imagery to detect anomalies, ensure each image was captured and delivered before the trade it informs, not merely dated the same day. Don't use Friday afternoon imagery to trade on Friday morning.
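
One way to enforce this in a backtest is to filter observations by when they would have been available rather than when they were captured. The 24-hour delivery lag below is an assumption; the real figure depends on the provider and on processing time.

```python
from datetime import datetime, timedelta


def known_at(observations, decision_time, delivery_lag=timedelta(hours=24)):
    """Return only the observations deliverable by `decision_time`.

    `observations` is a list of (captured_at, count) pairs. An image counts
    as known only once captured_at + delivery_lag has passed.
    """
    return [
        (captured_at, count)
        for captured_at, count in observations
        if captured_at + delivery_lag <= decision_time
    ]


observations = [
    (datetime(2024, 3, 6, 14, 0), 118),  # Wednesday afternoon
    (datetime(2024, 3, 7, 14, 0), 121),  # Thursday afternoon
    (datetime(2024, 3, 8, 14, 0), 97),   # Friday afternoon
]
friday_open = datetime(2024, 3, 8, 9, 30)
print(known_at(observations, friday_open))
```

Only the Wednesday image survives: the Friday image has not been captured yet, and the Thursday image has been captured but not delivered.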

Pitfall 4: Overfitting to Training Locations. A model trained on 100 parking lots might overfit to those specific locations' peculiarities. Validate on held-out parking lots that contributed no training images.
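
Holding out whole locations, rather than random images, can be sketched as below. The lot identifiers, the 20% test fraction, and the fixed seed are illustrative assumptions.

```python
import random


def split_by_location(location_ids, test_fraction=0.2, seed=0):
    """Split distinct location IDs into disjoint train and test sets.

    Every image from a given lot then lands entirely in one set, so the test
    score reflects performance on lots the model has never seen.
    """
    unique_ids = sorted(set(location_ids))
    if len(unique_ids) < 2:
        raise ValueError("need at least two distinct locations")
    rng = random.Random(seed)
    rng.shuffle(unique_ids)
    n_test = min(len(unique_ids) - 1, max(1, round(len(unique_ids) * test_fraction)))
    return set(unique_ids[n_test:]), set(unique_ids[:n_test])


lots = [f"lot_{i:03d}" for i in range(100)]
train_lots, test_lots = split_by_location(lots)
print(len(train_lots), len(test_lots))
```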

Cost-Benefit Analysis

Satellite imagery costs vary. Free imagery (Sentinel, Landsat) costs nothing but has 10+ meter resolution and revisit intervals of roughly 5-16 days. Commercial daily imagery costs $1-10 per image. Curated datasets cost $1,000-100,000 monthly depending on coverage and granularity.

For a trading signal to justify its cost, it must generate alpha exceeding the imagery cost. A daily satellite feed at $100,000/month breaks even only when excess return multiplied by capital deployed exceeds $100,000 per month. At 10 basis points per month of excess return, that requires $100 million of AUM; a smaller book needs proportionally more alpha.
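
The break-even arithmetic can be written out directly. The function name is hypothetical, and the calculation ignores transaction costs and signal capacity limits.

```python
def breakeven_aum(monthly_cost_usd, monthly_alpha_bps):
    """Capital at which alpha (basis points per month) exactly covers data cost."""
    if monthly_alpha_bps <= 0:
        raise ValueError("alpha must be positive")
    # 1 basis point = 1/10,000 of capital.
    return monthly_cost_usd * 10_000 / monthly_alpha_bps


for alpha_bps in (1, 10, 50):
    print(f"{alpha_bps:>3} bp/month -> ${breakeven_aum(100_000, alpha_bps):,.0f} AUM")
```

At 10 basis points per month the $100,000 feed needs $100 million of capital to break even, and at 1 basis point it needs $1 billion.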

Conclusion

Satellite imagery can generate valuable trading signals by providing direct observations of economic activity. Converting raw images to signals requires preprocessing (atmospheric correction, cloud removal, geometric alignment), feature extraction (object detection via classical or deep learning methods), and temporal aggregation (smoothing noise, accounting for seasonality). Successful applications require domain knowledge about what patterns matter, validation against ground truth and alternative data sources, and a realistic assessment of costs relative to expected alpha generation. The frontier of satellite-derived trading signals is likely not in counting vehicles but in combining multiple satellite-derived signals (foot traffic, cargo volume, construction progress) into composite economic activity measures that predict market movements.