In this week’s edition of Research Insights, I break down three recent papers. The first examines look-ahead bias in large language models. The second introduces a new approach to enhancing momentum strategies. The third explores how sensitive many cross-country return anomalies are to investors’ design choices.

In This Post:

Look-ahead Bias in Large Language Models

Improving Momentum Strategies

Rethinking Alpha in Country Anomalies

Look-ahead Bias in Large Language Models

Several recent papers have found success using large language models (LLMs), such as ChatGPT, to analyze financial text, extract sentiment scores, and predict stock returns. However, a major concern is look-ahead bias: the risk that these models may rely on future information since they are trained on data that could include events occurring after the time of the analyzed text.

To address this issue, researchers have explored several strategies: restricting predictions to dates after the model’s training cutoff, masking identifiable information such as company names, or, as in a recent paper by He et al. (2025), training LLMs in a walk-forward fashion. A recent paper by Engelberg et al. (2025), titled “Entity Neutering,” introduces a technique that prompts the LLM itself to anonymize financial text so that even advanced models cannot link it to a specific company or time period, thereby reducing the risk of look-ahead bias.

Key Findings

Dataset & Methodology

The authors use over 900,000 news articles from the Dow Jones Archive, covering 2000–2009.

They create two versions of each article: the original and a “neutered” version, in which GPT-4o-mini is instructed to remove all identifying details (names, dates, products, locations).

They then test whether various LLMs (including GPT, LLaMA, and DeepSeek) can still identify the firm or year from the neutered version.

They compare the sentiment extracted from the original and neutered versions and test how well each predicts future stock returns.
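To make the identification test above more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' code: the prompt wording, the `guess_fn` callable (a stand-in for whatever LLM call is used), the toy articles, and the fictional firm names are all hypothetical.

```python
from typing import Callable, Iterable

# Hypothetical instruction text; the paper's actual prompt is not reproduced here.
NEUTER_INSTRUCTION = (
    "Rewrite the article so that no company, person, product, location, "
    "or date can be identified, while keeping the economic content."
)


def build_neuter_prompt(article: str) -> str:
    """Combine the (assumed) instruction with the article text."""
    return f"{NEUTER_INSTRUCTION}\n\nArticle:\n{article}"


def identification_rate(
    neutered_texts: Iterable[str],
    true_labels: Iterable[str],
    guess_fn: Callable[[str], str],
) -> float:
    """Share of neutered texts for which guess_fn recovers the true label."""
    pairs = list(zip(neutered_texts, true_labels))
    if not pairs:
        raise ValueError("no articles supplied")
    hits = sum(
        guess_fn(text).strip().lower() == label.strip().lower()
        for text, label in pairs
    )
    return hits / len(pairs)


def toy_guesser(text: str) -> str:
    """Toy stand-in for an LLM: guesses a firm only if a product name leaked."""
    return "WidgetCo" if "widgetphone" in text.lower() else "unknown"


# Fictional neutered articles; the second one still contains a product name.
texts = [
    "A technology company reported higher smartphone sales.",
    "The firm said WidgetPhone demand beat expectations.",
    "A retailer cut its full-year outlook.",
    "A bank raised its dividend.",
]
firms = ["WidgetCo", "WidgetCo", "ShopMart", "FirstBank"]

rate = identification_rate(texts, firms, toy_guesser)
print(f"Firm identified in {rate:.0%} of neutered articles")
```

The same function could be reused for the year test by passing publication years as labels. A low identification rate is what suggests the neutering step removed the information a model would need to look ahead.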

Main Results

GPT correctly guesses the firm from neutered text only about 9% of the time, and the year only about 2% of the time, indicating that the neutering process works well.

Sentiment extracted from neutered and original texts matches 90% of the time, with a rank correlation of about 70% in sentiment magnitude, suggesting most of the core message is preserved.

Return predictability remains similar. For positive sentiment, next-day abnormal returns are about 10 bps (raw) vs. about 9 bps (neutered). For strong sentiment, the gap grows slightly (e.g., about 13 bps for raw vs. 10 bps for neutered).

The authors estimate the upper bound of look-ahead bias in LLM-based return prediction to be around 17–38%, with stronger effects for articles with strong sentiment.
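The two comparison statistics mentioned above, the match rate and the rank correlation, can be sketched in plain Python. This is a rough illustration only: the sentiment scores are made up, and treating "matches" as agreement in sign is my assumption rather than the paper's stated definition.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def sign(v):
    return (v > 0) - (v < 0)


def sign_agreement(x, y):
    """Share of pairs whose sentiment has the same sign (assumed definition)."""
    return sum(sign(a) == sign(b) for a, b in zip(x, y)) / len(x)


# Made-up sentiment scores for five articles (raw vs. neutered text).
raw_scores = [0.8, -0.5, 0.3, -0.9, 0.1]
neutered_scores = [0.4, -0.4, 0.5, -0.7, -0.1]

agreement = sign_agreement(raw_scores, neutered_scores)
rho = spearman(raw_scores, neutered_scores)
print(f"sign agreement: {agreement:.2f}, rank correlation: {rho:.2f}")
```

A high match rate with a lower rank correlation, as in the paper, would mean neutering rarely flips the direction of the sentiment but does shuffle its intensity somewhat.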

Implications for Investors

Overall, the results suggest that look-ahead bias in LLMs is generally modest when predicting returns from news sentiment, except in cases involving strongly positive or negative articles, where a meaningful bias is likely present. Importantly, the study shows that LLMs remain strong predictors of returns even when working with neutered, anonymized text. As such, entity neutering appears to be a practical, scalable, and effective solution for mitigating look-ahead bias. It enables cleaner and more credible backtesting while preserving much of the predictive power that makes LLMs valuable for sentiment analysis and return forecasting.

Improving Momentum Strategies

Momentum is arguably one of the most studied and exploited anomalies in finance. A large body of research has explored ways to enhance standard momentum strategies, for example, by scaling positions based on volatility to reduce downside risk, conditioning momentum signals on variables like trading volume, removing industry effects, or combining price momentum with news-based signals.

A recently published paper by Zeng et al., “Improving Momentum Returns Using Generalized Linear Models,” takes a different approach. On each rebalancing date, the authors estimate the probability that a stock currently classified as a momentum winner or loser will remain in that category over the next six months, using a well-established survival model.

Key Findings

Dataset & Methodology

The authors study all common stocks from NYSE, AMEX, and Nasdaq between 1980 and 2018.

Each month, they rank stocks based on their past 6-month returns and label the top and bottom deciles as “winners” and “losers.”

They then measure the “enduring time” over the next six months, which is how long these stocks remain in their winner or loser status.

Using 37 company characteristics (such as trading volume, volatility, and leverage), they fit the Cox proportional hazards model, alongside other models, to estimate the likelihood that a stock remains a winner or loser.

They construct a long-short portfolio from the 10 winners and the 10 losers with the highest estimated probabilities of keeping their status, holding positions for 6 months and rebalancing monthly.
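The labeling and selection steps above can be sketched as follows. This is a simplified illustration, not the authors' implementation: the function names and toy data are hypothetical, "enduring time" is assumed to mean consecutive months in the same category, and the stay probabilities are plain placeholders where the paper would use fitted values from the Cox model.

```python
def decile_labels(past_returns: dict[str, float]) -> dict[str, str]:
    """Label the top 10% of stocks 'winner', the bottom 10% 'loser', rest 'neutral'."""
    ranked = sorted(past_returns, key=past_returns.get)  # lowest return first
    n_tail = max(1, len(ranked) // 10)
    labels = {ticker: "neutral" for ticker in ranked}
    for ticker in ranked[:n_tail]:
        labels[ticker] = "loser"
    for ticker in ranked[-n_tail:]:
        labels[ticker] = "winner"
    return labels


def enduring_time(initial_label: str, later_labels: list[str]) -> int:
    """Count consecutive later months in which the stock keeps its initial label."""
    months = 0
    for label in later_labels:
        if label != initial_label:
            break
        months += 1
    return months


def pick_top_n(stay_prob: dict[str, float], n: int = 10) -> list[str]:
    """Select the n tickers with the highest estimated probability of staying."""
    return sorted(stay_prob, key=stay_prob.get, reverse=True)[:n]


# Toy universe of 20 stocks with past 6-month returns from -10% to +9%.
past = {f"S{i:02d}": (i - 10) / 100 for i in range(20)}
labels = decile_labels(past)
winners = sorted(t for t, lab in labels.items() if lab == "winner")
losers = sorted(t for t, lab in labels.items() if lab == "loser")

# A winner that drops out of the top decile in month 3 has an enduring time of 2.
months_as_winner = enduring_time(
    "winner", ["winner", "winner", "neutral", "winner", "winner", "winner"]
)

# Placeholder probabilities; in the paper these come from a survival model.
long_leg = pick_top_n({"A": 0.2, "B": 0.9, "C": 0.5}, n=2)
print(winners, losers, months_as_winner, long_leg)
```

The design point is that the final sort is on the probability of staying in the decile, not on the size of the past return, which is what separates this approach from a standard momentum ranking.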

Main Results

The enhanced strategy earns an average monthly return of about 2.2%, nearly double that of the traditional momentum strategy (about 1.1%).

The performance boost is largely due to better identification of losers to short, which adds significant value.

The strategy remains strong even after excluding tiny, volatile, or illiquid stocks, and holds up well across various market conditions, including downturns.

The Cox model outperforms five other commonly used models, delivering stronger and more consistent returns.

Interestingly, the authors find that only 7% of winners and 6% of losers stay in the same category for the entire 6-month holding period, helping explain why momentum strategies tend to have high turnover.

Implications for Investors

To my knowledge, the approach of combining momentum rankings with estimates of the probability that a stock will remain a winner or loser, using a survival model, is novel. Focusing on rank persistence marks a shift from traditional momentum modeling or return prediction and appears to deliver significantly stronger and more stable returns. By targeting stocks with a higher likelihood of sustained performance, the strategy filters out short-lived signals that often lead to unnecessary turnover and transaction costs.

Rethinking Alpha in Country Anomalies

A large body of research suggests that certain country traits, such as political risk, valuation, and momentum, predict stock market returns. Typically, researchers sort countries into long-short portfolios based on these characteristics to assess their predictive power. However, these studies often differ in how they build their datasets and implement strategies, including choices about which countries to include, which data sources to rely on, and how portfolios are constructed. These methodological decisions are rarely scrutinized, yet they can have a major impact on the results.

A recent paper by Cakici et al., “Lost in the Multiverse: Methodological Uncertainty in Studying Global Equity Returns,” addresses this issue. The authors test how sensitive country-level return anomalies are to a wide range of design choices. By running more than 69,000 different strategy variations, they show that many well-known global return predictors may not be as reliable or robust as previously thought.

Key Findings

Dataset & Methodology

The authors examine 15 country-level return predictors using data from six major sources: CRSP/Compustat, MSCI, Datastream, GFD, ETFs, and futures.

They test more than 69,000 different ways of designing return strategies by varying 10 research decisions, such as whether to include developed or emerging markets, how many portfolios to sort countries into, whether to use equal or value weighting, and how often to rebalance.

This approach creates over a million unique portfolios, and each strategy’s performance is tracked using monthly data from 1973 to 2022.
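The "multiverse" idea is easy to picture in code: every combination of design choices is one specification. Below is a minimal sketch using only four of the decisions named above. The option values are my own placeholders, not the grid used in the paper, which varies 10 decisions and ends up with more than 69,000 paths.

```python
from itertools import product

# Hypothetical option grid for four of the research decisions.
design_choices = {
    "markets": ["developed", "emerging", "all"],
    "n_portfolios": [3, 5, 10],
    "weighting": ["equal", "value"],
    "rebalance": ["monthly", "quarterly", "annual"],
}


def enumerate_specs(choices: dict[str, list]) -> list[dict]:
    """Return one dict per combination of design choices (Cartesian product)."""
    keys = list(choices)
    return [
        dict(zip(keys, combo))
        for combo in product(*(choices[k] for k in keys))
    ]


specs = enumerate_specs(design_choices)
print(len(specs), "specifications; first one:", specs[0])
```

Even this small grid yields 3 x 3 x 2 x 3 = 54 specifications, which shows how quickly the count grows as more decisions are added.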

Main Results

Most well-known predictors (like momentum and dividend yield) produce strong returns in only a small fraction of cases. For example, momentum works in only about 10% of the tested scenarios.

Only a few factors consistently show strong performance: market size (small countries beat large ones), political risk (riskier countries give higher returns), issuance (countries with low equity issuance outperform), and, marginally, sovereign credit risk.

Data source, sample selection, and portfolio construction matter a lot. Results look strongest when including smaller, emerging markets and using broader datasets like CRSP/Compustat. When focusing on more tradable assets like ETFs or futures, most patterns weaken. Equal weighting and more granular portfolio groupings tend to boost returns, while value-weighted portfolios and a focus on developed markets generally reduce performance.

The variation caused by these research choices is almost as large as standard statistical uncertainty. This means many findings may not be as reliable as they appear.
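One way to picture the comparison above is to set the spread of estimates across specifications next to the typical statistical standard error of a single specification. The sketch below does this on made-up numbers. The function name, the 1.96 t-statistic cutoff, and the use of a simple standard deviation across specifications are my assumptions, not necessarily the paper's exact measures.

```python
from statistics import mean, stdev


def summarize_multiverse(estimates, std_errors, t_cutoff=1.96):
    """Compare across-specification dispersion with average sampling error."""
    t_stats = [est / se for est, se in zip(estimates, std_errors)]
    share_significant = sum(t > t_cutoff for t in t_stats) / len(t_stats)
    return {
        # Variation caused by design choices alone.
        "across_spec_dispersion": stdev(estimates),
        # Typical statistical uncertainty within one specification.
        "mean_standard_error": mean(std_errors),
        # Share of specifications where the predictor "works".
        "share_significant": share_significant,
    }


# Made-up monthly long-short returns (%) and standard errors for five specifications.
estimates = [0.60, 0.10, 0.25, -0.05, 0.40]
std_errors = [0.20, 0.20, 0.25, 0.15, 0.25]

summary = summarize_multiverse(estimates, std_errors)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

When the first number is of the same order as the second, as the paper reports, the choice of specification matters about as much as sampling noise, so a single backtest says little on its own.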

Implications for Investors

This paper joins a growing body of research exploring how methodological choices impact the performance of investment strategies. The results offer a clear warning: many cross-country strategies may not be as robust as they seem. Investors should note that the success of these backtested strategies often depends heavily on the specific design choices made. The authors show that small tweaks, such as switching data providers or altering how countries are grouped, can flip results from profitable to unprofitable.

The most reliable and stable performance tends to come from strategies that include smaller and riskier markets. However, these markets are often harder and costlier to trade, which limits their practical use. For investors, the key takeaway is that relying on a single factor or dataset is risky. What appears to be alpha may simply reflect specific methodological assumptions. A more robust approach would involve combining multiple return predictors across different datasets and methodologies, essentially building an ensemble model, to reduce exposure to unreliable or sensitive implementation choices.
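As a rough illustration of the ensemble idea, the sketch below averages each country's rank across several predictors, so that no single signal drives the final ordering. The function name, the signal names, and the numbers are hypothetical, and rank averaging is just one simple way to combine signals, not a method taken from the paper.

```python
def composite_ranks(signals: dict[str, dict[str, float]]) -> dict[str, float]:
    """Average each country's 0-1 rank across signals (higher = more attractive).

    Only countries covered by every signal are scored; at least two are required.
    Ties within a signal are broken alphabetically by country name.
    """
    countries = sorted(set.intersection(*(set(s) for s in signals.values())))
    if len(countries) < 2:
        raise ValueError("need at least two countries covered by all signals")
    totals = {country: 0.0 for country in countries}
    for scores in signals.values():
        ordered = sorted(countries, key=lambda c: scores[c])
        for position, country in enumerate(ordered):
            totals[country] += position / (len(countries) - 1)
    return {c: totals[c] / len(signals) for c in countries}


# Made-up signal values, oriented so that higher means higher expected return.
signals = {
    "small_size": {"A": 1.0, "B": 2.0, "C": 3.0},
    "political_risk": {"A": 3.0, "B": 1.0, "C": 2.0},
}

composite = composite_ranks(signals)
print(composite)
```

The same averaging could be applied across datasets or portfolio constructions rather than across predictors, which is closer to the robustness concern raised in the paper.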

The discussion above is based on the following research papers. For full details, please refer to the original sources:

References

Cakici, Nusret, Christian Fieberg, Gabor Neszveda, Vanja Piljak, and Adam Zaremba, 2025, Lost in the multiverse: Methodological uncertainty in studying global equity returns, SSRN Working Paper 5181455.

Engelberg, Joseph, Asaf Manela, William Mullins, and Luka Vulicevic, 2025, Entity neutering, SSRN Working Paper 5182756.

He, Songrun, Linying Lv, Asaf Manela, and Jimmy Wu, 2025, Chronologically consistent large language models, arXiv Working Paper 2502.21206.

Zeng, Hui, Ben R. Marshall, Nhut H. Nguyen, and Nuttawat Visaltanachoti, 2025, Improving momentum returns using generalized linear models, International Review of Finance 25 (2), e70014.

Disclaimer: This newsletter is for informational and educational purposes only and should not be construed as investment advice. The author does not endorse or recommend any specific securities or investments. While information is gathered from sources believed to be reliable, there is no guarantee of its accuracy, completeness, or correctness.

This content does not constitute personalized financial, legal, or investment advice and may not be suitable for your individual circumstances. Investing carries risks, and past performance does not guarantee future results. The author and affiliates may hold positions in securities discussed, and these holdings may change at any time without prior notification.

The author is not affiliated with, sponsored by, or endorsed by any of the companies, organizations, or entities mentioned in this newsletter. Any references to specific companies or entities are for informational purposes only.

The brief summaries and descriptions of research papers and articles provided in this newsletter should not be considered definitive or comprehensive representations of the original works. Readers are encouraged to refer to the original sources for complete and authoritative information.

This newsletter may contain links to external websites and resources. The inclusion of these links does not imply endorsement of the content, products, services, or views expressed on these third-party sites. The author is not responsible for the accuracy, legality, or content of external sites or for that of any subsequent links. Users access these links at their own risk.

The author assumes no liability for losses or damages arising from the use of this content. By accessing, reading, or using this newsletter, you acknowledge and agree to the terms outlined in this disclaimer.