How ~3,000 stocks become a network, and how that network becomes a trading signal.
The system watches the Russell 3000 — roughly 3,000 US stocks. Every trading day, it pulls data from several sources:
The filings get an extra pass: an LLM reads each one and extracts structured relationships between companies — who's a supplier, customer, competitor. These are cached so the LLM only runs once per filing.
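A run-once-per-filing cache can be sketched in a few lines. This is a minimal illustration, not the system's actual code: the `Relation` tuple shape, the `extract_fn` callable standing in for the LLM call, and the in-memory `cache` dict (a real pipeline would presumably persist to disk or a database) are all assumptions.

```python
import hashlib
from typing import Callable

# Hypothetical record shape: (source_ticker, relation, target_ticker, confidence)
Relation = tuple[str, str, str, float]


def cached_extract(
    filing_text: str,
    extract_fn: Callable[[str], list[Relation]],
    cache: dict[str, list[Relation]],
) -> list[Relation]:
    """Return extracted relations, calling extract_fn only on a cache miss."""
    # Key on a hash of the filing text so the same filing always maps to the same entry.
    key = hashlib.sha256(filing_text.encode("utf-8")).hexdigest()
    if key not in cache:
        cache[key] = extract_fn(filing_text)  # the expensive LLM call (assumed)
    return cache[key]
```

Keying on the content hash rather than a filename means a re-downloaded copy of the same filing still hits the cache.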
Every day, the system builds a directed graph over a rolling window. Three types of edges connect companies:
If stock A has a big move on day D, and stock B has a big move shortly after, that's a co-movement event. "Big move" is adaptive per stock. If this lead-lag pattern repeats enough times in the window, a directed edge is created from A to B. The direction matters: A leading B is independent of B leading A.
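As a rough sketch of the lead-lag counting, assume a "big move" means an absolute daily return above `k` times the stock's own volatility over the window, and that "shortly after" means within `max_lag` trading days. Both definitions, and every parameter value below, are illustrative assumptions rather than the system's actual settings.

```python
from statistics import pstdev


def big_move_days(returns: list[float], k: float = 2.0) -> set[int]:
    """Day indices where |return| exceeds k times this stock's own volatility."""
    sigma = pstdev(returns)
    if sigma == 0:
        return set()
    return {i for i, r in enumerate(returns) if abs(r) > k * sigma}


def lead_lag_edges(
    returns_by_ticker: dict[str, list[float]],
    max_lag: int = 2,
    min_events: int = 3,
    k: float = 2.0,
) -> dict[tuple[str, str], int]:
    """Map (leader, follower) to the number of co-movement events in the window."""
    moves = {t: big_move_days(r, k) for t, r in returns_by_ticker.items()}
    edges: dict[tuple[str, str], int] = {}
    for a, days_a in moves.items():
        for b, days_b in moves.items():
            if a == b:
                continue
            # Count A's big moves that B follows within 1..max_lag days.
            count = sum(
                1
                for d in days_a
                if any((d + lag) in days_b for lag in range(1, max_lag + 1))
            )
            if count >= min_events:
                edges[(a, b)] = count  # directed: A leads B
    return edges
```

Because `(a, b)` and `(b, a)` are counted separately, an edge in one direction says nothing about the other. The all-pairs loop is fine for illustration; at 3,000 tickers a real implementation would need something faster.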
When two tickers appear in the same news headline repeatedly, they get a news edge. The news cutoff is timezone-aware and anchored at market close — so a graph built on Monday doesn't accidentally include Tuesday's pre-market news.
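The cutoff logic might look like the following sketch. It assumes a regular 16:00 New York close (early-close days are ignored here) and that every headline carries a timezone-aware timestamp; the function names and the `(timestamp, tickers)` tuple shape are hypothetical.

```python
from datetime import date, datetime, time
from zoneinfo import ZoneInfo

NY = ZoneInfo("America/New_York")
MARKET_CLOSE = time(16, 0)  # assumption: regular session close, no early closes


def news_cutoff(graph_date: date) -> datetime:
    """Timezone-aware market close on the day the graph is built."""
    return datetime.combine(graph_date, MARKET_CLOSE, tzinfo=NY)


def headlines_for_graph(
    headlines: list[tuple[datetime, tuple[str, ...]]],
    graph_date: date,
) -> list[tuple[datetime, tuple[str, ...]]]:
    """Keep only headlines published at or before the close on graph_date."""
    cutoff = news_cutoff(graph_date)
    # Aware datetimes compare correctly across zones, so UTC timestamps work as-is.
    return [h for h in headlines if h[0] <= cutoff]
```

Anchoring the cutoff in exchange time rather than UTC or server-local time keeps it correct across daylight-saving changes.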
Filing edges come from the LLM extraction step. "AAPL is a customer of TSM" becomes a directed edge with a confidence score. Only relationships above a confidence threshold make it in.
The graph is rebuilt from scratch each day, so it always reflects the most recent relationships. The key design choice: price co-movement is the backbone. News and filing edges enrich the graph with context — sentiment, supply-chain structure — but a pair of stocks only stays in the graph if they also have a co-movement edge. If two companies are mentioned in the same headline but their prices never lead or lag each other, that edge is dropped. Stocks with no co-movement connections at all are removed entirely and aren't even sent to model training. The model only sees stocks the graph says are part of an active lead-lag network.
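The backbone rule can be sketched as a filter over edge sets. One assumption to flag: the sketch treats a "pair" as unordered, so a news or filing edge survives if a co-movement edge exists between the two tickers in either direction. The function name and the returned dict layout are illustrative.

```python
Edge = tuple[str, str]


def prune_to_backbone(
    comovement: set[Edge], news: set[Edge], filing: set[Edge]
) -> dict[str, set]:
    """Keep news/filing edges only where the pair also has a co-movement edge."""
    # Assumption: direction is ignored when checking whether a pair is on the backbone.
    backbone_pairs = {frozenset(edge) for edge in comovement}

    def on_backbone(edge: Edge) -> bool:
        return frozenset(edge) in backbone_pairs

    return {
        # Only tickers with at least one co-movement edge remain as nodes.
        "nodes": {ticker for edge in comovement for ticker in edge},
        "comovement": set(comovement),
        "news": {e for e in news if on_backbone(e)},
        "filing": {e for e in filing if on_backbone(e)},
    }
```

For example, with a single co-movement edge `("A", "B")`, a news edge `("B", "A")` is kept, while `("A", "C")` is dropped and `C` never reaches training.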
For each stock on each day, the system extracts features from the graph. The idea: a stock's neighborhood tells you things that the stock's own price history doesn't.
Features fall into a few broad groups:
Recent observations are weighted more heavily than older ones, so the graph stays responsive to the current market regime without losing longer-term structure.
The model is a small set of gradient-boosted classifiers, each answering a slightly different question: will this stock have a significant move up (or down) over the next few days? There are separate models for different horizons and directions, and their outputs are combined into a single buy score via a simple formula. This is not an ensemble in the stacking sense, just a weighted combination of the individual model outputs.
The models are intentionally simple — the complexity budget goes into the features, not the model architecture. Early stopping on a held-out validation set, calibrated probabilities, and that's about it.
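The actual formula isn't spelled out here, but a weighted combination of per-model probabilities could look like this sketch. The `(horizon_days, direction)` keys, and the idea of giving "down" models negative weights, are assumptions made for illustration.

```python
ModelKey = tuple[int, str]  # hypothetical: (horizon in days, "up" or "down")


def buy_score(probs: dict[ModelKey, float], weights: dict[ModelKey, float]) -> float:
    """Weighted sum of calibrated model probabilities for one stock on one day."""
    return sum(weights[key] * probs[key] for key in weights)


# Illustrative weights only: up-move models add to the score, down-move models subtract.
EXAMPLE_WEIGHTS: dict[ModelKey, float] = {
    (3, "up"): 0.5,
    (3, "down"): -0.5,
    (5, "up"): 0.5,
    (5, "down"): -0.5,
}
```

A fixed formula like this has no parameters fitted on top of the base models, which is what separates it from stacking.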
The system never trains on a single fixed split. Instead, it tiles short test windows backward from the most recent data:
The backward tiling is deliberate — it anchors the last iteration to the freshest data, so the ensemble's recency weighting puts the most weight where it matters most.
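A minimal sketch of backward tiling over day indices follows. The parameter names, the `min_train` guard, and the choice to train on everything before each test window are assumptions; the validation split used for early stopping is left out for brevity.

```python
def backward_tiles(
    n_days: int, test_len: int, n_iters: int, min_train: int
) -> list[tuple[range, range]]:
    """Tile test windows backward from the last day; return (train, test) index ranges."""
    tiles: list[tuple[range, range]] = []
    test_end = n_days  # the first tile placed ends on the most recent day
    for _ in range(n_iters):
        test_start = test_end - test_len
        if test_start < min_train:
            break  # not enough history left to train on
        # Assumption: each iteration trains on all data before its test window.
        tiles.append((range(0, test_start), range(test_start, test_end)))
        test_end = test_start
    tiles.reverse()  # oldest first, so the last iteration holds the freshest data
    return tiles
```

With `n_days=100`, `test_len=10`, and `n_iters=3`, the test windows cover days 70 to 79, 80 to 89, and 90 to 99. Tiling forward from the start would instead leave any leftover days stranded at the recent end.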
At inference time, only the last few iterations contribute — keeping too many stale models in the ensemble adds noise. Recent iterations count more via a recency half-life, so the latest model has significantly more weight than one from two weeks ago. Each iteration's individual model outputs are combined via the same formula, then the iteration-level scores are blended with the recency weighting.
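The recency blend can be sketched with an exponential half-life: an iteration's weight halves every `half_life_days` of age. The exponential form, the parameter names, and the `(age, score)` input shape are assumptions for illustration.

```python
def blend_iterations(
    scores_by_age: list[tuple[float, float]],
    half_life_days: float,
    max_iterations: int,
) -> float:
    """Blend per-iteration scores for one stock, weighting newer iterations more.

    scores_by_age holds (age_in_days, combined_score) pairs, one per iteration.
    """
    # Keep only the most recent iterations; older ones are dropped entirely.
    recent = sorted(scores_by_age, key=lambda pair: pair[0])[:max_iterations]
    weights = [0.5 ** (age / half_life_days) for age, _ in recent]
    total = sum(weights)
    return sum(w * score for w, (_, score) in zip(weights, recent)) / total
```

With a 7-day half-life, a model from two weeks ago carries a quarter of the newest model's weight, assuming it makes the cut at all.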
The combined score produces a ranked list, but rank quality at the very top matters more than overall ranking accuracy. A second-stage learning-to-rank model re-orders the top of the list using actual forward returns as relevance labels, with heavy weighting toward the best-performing stocks.
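One common way to turn forward returns into relevance labels is to bucket each day's returns into integer grades and apply an exponential gain, so the top bucket dominates. The sketch below assumes that approach; the number of grades and the `2^grade - 1` gain are illustrative choices, not necessarily the ones used here.

```python
def relevance_grades(fwd_returns: list[float], n_grades: int = 5) -> list[int]:
    """Assign each stock an integer grade 0..n_grades-1 by forward-return rank."""
    n = len(fwd_returns)
    order = sorted(range(n), key=lambda i: fwd_returns[i])  # worst to best
    grades = [0] * n
    for rank, i in enumerate(order):
        grades[i] = rank * n_grades // n  # equal-sized buckets by rank
    return grades


def gains(grades: list[int]) -> list[int]:
    """Exponential gain: each grade is worth roughly double the one below it."""
    return [2**g - 1 for g in grades]
```

With five grades, a top-bucket stock carries a gain of 15 against 1 for a grade-1 stock, so the ranking loss is driven mostly by whether the best performers land near the top.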
If the reranker can't beat a simple "sort by combined score" on the test set, it's discarded and inference falls back to the base ranking. No harm done.
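The fallback gate amounts to one comparison on the test set. In this sketch the metric is the mean forward return of the top `k` names, which is an assumed stand-in; the real comparison may use a different ranking metric.

```python
def top_k_mean_return(scores: list[float], fwd_returns: list[float], k: int) -> float:
    """Mean forward return of the k highest-scored stocks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sum(fwd_returns[i] for i in order) / len(order)


def choose_ranking(
    base_scores: list[float],
    reranked_scores: list[float],
    fwd_returns: list[float],
    k: int = 10,
) -> str:
    """Return "reranker" only if it strictly beats the base ranking on the test set."""
    base = top_k_mean_return(base_scores, fwd_returns, k)
    reranked = top_k_mean_return(reranked_scores, fwd_returns, k)
    return "reranker" if reranked > base else "base"
```

Ties go to the base ranking, so the extra stage has to earn its place.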
Every Saturday, the full pipeline retrains: feature extraction, hyperparameter tuning, model assembly, and reranker training. The new model runs a backtest against the same period the current production model was tested on. If the new model wins, it gets promoted. If it loses, nothing changes. Monday's automation picks up whichever model is current.
How well all of this actually works — and the many ways it can go wrong — is in Part 3.