Latest post from Andrej Karpathy discussing the nanochat miniseries on GitHub (also available on the Discord channel).
A few notes from Karpathy below:
- Why a miniseries - The correct way to think about LLMs is that you are not optimizing a single specific model but a family of models controlled by a single dial (the compute you wish to spend) to achieve monotonically better results. This allows you to do careful science of scaling laws, and ultimately this is what gives you the confidence that when you pay for "the big run", the extrapolation will hold and your money will be well spent. (A minimal sketch of such an extrapolation follows these notes.)
- Top-level comparison to the GPT-2/GPT-3 miniseries
- Details included on scaling laws, hyperparameter sweeps, GPT-2/GPT-3 CORE scores, and Miniseries v1 CORE scores
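To make the "single dial" idea concrete, here is a minimal sketch of fitting a power-law scaling curve to a miniseries of small runs and extrapolating to a larger compute budget. This is not Karpathy's actual code; the numbers are invented and plain NumPy is assumed.

```python
import numpy as np

# Hypothetical (compute, validation loss) pairs from small runs in a
# miniseries. The numbers are invented for illustration only.
compute = np.array([1e17, 3e17, 1e18, 3e18, 1e19])  # e.g. training FLOPs
loss = np.array([3.90, 3.55, 3.25, 3.02, 2.84])

# A power law loss = a * C^(-b) is a straight line in log-log space,
# so an ordinary least-squares fit on the logs recovers the exponent.
slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
a, b = np.exp(intercept), -slope

# Extrapolate along the fitted line to the budget of "the big run".
big_c = 1e21
predicted = a * big_c ** (-b)
print(f"fit: loss ~ {a:.3g} * C^(-{b:.3g})")
print(f"predicted loss at C={big_c:.0e}: {predicted:.3f}")
```

If the small runs sit on a straight line in log-log space, the extrapolated point is exactly the kind of prediction that justifies spending on the big run.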
- Todor
------------------------------
Todor Kostov
Director
------------------------------