Some follow-up thoughts on Florian's paper link that I forgot to add earlier:
- Look-ahead bias: I believe the authors do a good job of using multiple techniques to address this LLM-inherent risk. They anonymize the financial statements by removing company identifiers and dates and by using relative time labels (e.g., t, t-1), and they show via reverse-engineering tests that the residual look-ahead bias risk is minimal. (A minimal illustrative sketch of this kind of anonymization step follows the list below.)
- Earnings focus: As far as I can tell, the paper focuses only on estimating earnings. Statistical techniques have already been applied to this task and have outperformed human analysts, so LLMs are not a big deal in this respect. It would be great if the authors followed up with a paper that also uses guidance to predict profit warnings, or changes in guidance that can predict PEAD (Post-Earnings Announcement Drift); that would be a genuinely value-adding upgrade.
- Scarce OOS (Out-of-Sample): this is my biggest concern with the paper. It uses GPT-4 as its main model but tests it out-of-sample on only a single year of data (2023). I am aware the authors make a good case for having minimized look-ahead bias, as highlighted earlier, yet such a short out-of-sample window dramatically diminishes the robustness of the results presented.
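On the anonymization point above, here is a minimal sketch of what such a pre-processing step could look like. This is my own illustration under assumed inputs (the company_names and fiscal_years arguments are hypothetical), not the authors' code:

import re

def anonymize_statement(text: str, company_names: list[str], fiscal_years: list[int]) -> str:
    """Strip identifiers and absolute dates from a financial statement,
    replacing fiscal years with relative period labels (t, t-1, ...)."""
    # Replace any known company identifiers with a neutral placeholder
    for name in company_names:
        text = re.sub(re.escape(name), "the company", text, flags=re.IGNORECASE)
    # Map the most recent fiscal year to t, the prior one to t-1, and so on
    for offset, year in enumerate(sorted(fiscal_years, reverse=True)):
        label = "t" if offset == 0 else f"t-{offset}"
        text = re.sub(rf"\b{year}\b", label, text)
    return text

# Example: the model sees only relative time, so it cannot anchor on a known period
raw = "Acme Corp revenue was 120m in 2023 versus 100m in 2022."
print(anonymize_statement(raw, ["Acme Corp"], [2023, 2022]))
# -> "the company revenue was 120m in t versus 100m in t-1."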
Overall, we are all aware that financial markets are Complex Adaptive Systems (CAS) whose defining trait is that they are made up of agents (investors, traders, etc.) who learn and evolve their strategies over time based on new information and past experience. There are plenty of reasons to be optimistic about LLMs, but it is also prudent to remember that CAS are prone to exhibit reflexivity and feedback: market participants may change their behaviour in response to an LLM's output and, as a result, the system itself changes. If a model identifies a profitable strategy and others copy it, the strategy self-destructs, a phenomenon captured by Goodhart's Law.
Common sense tells us that if everybody uses the same type of LLM with similar training and data, the system undergoes reflexive degeneration, a form of self-defeating prophecy and systemic fragility. In other words, if everyone uses the same kind of LLM, the market does not become more predictable; it becomes more unstable, as we have seen many times in the past with simpler models.
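To make the crowding argument concrete, here is a back-of-the-envelope toy simulation (my own illustration with made-up parameters, not something from the paper) in which a strategy's expected excess return decays as more participants chase the same signal:

import numpy as np

# Toy model: expected alpha shrinks as the fraction of capital chasing
# the same signal grows (purely illustrative parameters).
base_alpha = 0.04                     # annual excess return with no crowding
decay = 5.0                           # sensitivity of alpha to crowding
adoption = np.linspace(0.0, 1.0, 6)   # share of the market using the same LLM signal

alpha = base_alpha * np.exp(-decay * adoption)
for share, a in zip(adoption, alpha):
    print(f"adoption {share:0.0%}: expected alpha {a:0.2%}")
# As adoption approaches 100%, the edge is largely arbitraged away.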
It is on this last point that humans can still play an important role and add value to LLMs: each asset manager can create an in-house version of an industry-accepted model via domain-specific fine-tuning, proprietary reinforcement signals, and adversarial training.
By integrating the firm's analyst insights, firm-specific investment philosophy, and bespoke workflows, human experts could steer the model's behaviour away from generic outputs and toward alpha-generating perspectives. This process has the potential to create an institutional LLM that reflects a unique research culture, risk preferences, and decision frameworks rather than just the public consensus. This differentiated reasoning becomes a competitive edge that is both human-guided and machine-scaled.
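Purely as an illustration (none of these libraries, models, or hyper-parameters are implied by the thread), an in-house adaptation along these lines might start with parameter-efficient fine-tuning on the firm's proprietary research notes, for example with Hugging Face PEFT and LoRA adapters:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Placeholder open-weights base model; any industry-accepted model could stand in here
base = "meta-llama/Llama-3.1-8B"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Low-rank adapters keep the base model frozen and train only small extra matrices,
# which is cheap enough for an asset manager to run on proprietary data in-house.
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()
# Training on the firm's analyst notes and past investment memos would follow here,
# e.g. with transformers.Trainer; proprietary reinforcement signals and adversarial
# examples could then be layered on top of this supervised step.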
------------------------------
Carlos Salas
Portfolio Manager & Freelance Investment Research Consultant
------------------------------
Original Message:
Sent: 07-01-2025 14:19
From: Florian U. Esterer
Subject: AI in investment Research
Does investment research make sense in the age of AI?
Might be behind paywall, but sounds about right.
Many of the discussions I have about AI come back to the fact that there is still a fairly high error rate in LLM responses. I always wonder about that answer. As investors we accept that we are not always correct; most of us are happy to have a hit rate above 55% at the end! The question is rather: what is the error rate of a human, or what is the error rate of your current process? Many processes within investment management have already been automated, e.g. via screens. The question is whether we can improve on these.
PS: if you have not looked at the paper mentioned in the article, you should.
------------------------------
Florian U. Esterer
IASB Board Member
------------------------------