Technology and Innovation Community


LLMS - DeepSeek V3 Breakthrough

  • 1.  LLMS - DeepSeek V3 Breakthrough

    Posted 28-01-2025 08:20

    Hi everyone

    I have just posted in "Library" the original DeepSeek V3 paper, along with a summary and definitions of key concepts to fully understand its key takeaways. You can also find the paper at the original GitHub source.

    For those new to LLMs or just starting to explore them, this video provides a straightforward and accessible explanation.

    Feel free to share your thoughts on DeepSeek and what you think are the implications in the short-to-long term for investments. 



    ------------------------------
    Carlos Salas
    CIO, Co-Founder
    ------------------------------


  • 2.  RE: LLMS - DeepSeek V3 Breakthrough

    Posted 28-01-2025 08:47

    Amazing, I was just coming here to look for content on this! Thank you for sharing. Keen to hear thoughts from this Community!



    ------------------------------
    Tilly Baker
    ------------------------------



  • 3.  RE: LLMS - DeepSeek V3 Breakthrough

    Posted 28-01-2025 22:25

    That's great.

    Also, just to point out that those who want to use DeepSeek without downloading it could do so by using Perplexity AI.

    Aravind Srinivas, the CEO of Perplexity AI, pointed out yesterday that DeepSeek R1 is now available on Perplexity to support deep web research, hosted exclusively in US & EU data centers. Users just need to toggle on Pro Search to try it out.

    DeepSeek on Perplexity

    Below is also a quick snapshot of current benchmark input/output token costs for different AI models, from Statista.

    Hope this helps.

    Best,

    Todor



    ------------------------------
    Todor Kostov
    Director
    ------------------------------



  • 4.  RE: LLMS - DeepSeek V3 Breakthrough

    Posted 29-01-2025 22:42

    Further update on this from Jonathan Ross, co-founder and CEO of Groq and one of the designers of the original TPU chip during his time at Google/Alphabet.

    video



    ------------------------------
    Todor Kostov
    Director
    ------------------------------



  • 5.  RE: LLMS - DeepSeek V3 Breakthrough

    Posted 30-01-2025 08:23

    Great video

    Some takeaways from the video for those with no time to watch it:

    DeepSeek's Breakthrough and Market Challenges

    • High-quality data distillation has been central to DeepSeek's success.
    • Data privacy concerns (storage in Chinese servers) hinder adoption by global corporations.
    • Pricing pressure on OpenAI: DeepSeek's lower-cost models challenge OpenAI, which may need to open access to retain retail users.
    • Limited impact on basic tasks: DeepSeek's model may not significantly outperform existing solutions in simple applications like text summarisation.
    • China vs. Western AI firms: Western AI companies face stricter regulations compared to China-based teams like DeepSeek.

    GenAI Development and Regulation

    • Blurred line between R&D and corporate theft when advancing GenAI models.
    • Europe lags in AI entrepreneurship and should foster a risk-taking culture to stay competitive.
    • Reliable AI for critical applications (e.g., medical diagnostics) requires low hallucination and perplexity to meet regulatory standards.

    Jevons Paradox and the AI Growth Cycle

    • Jevons Paradox: As AI becomes cheaper, demand increases, creating a feedback loop of growth.
    • AI development cycle:
      • Better models → More developer demand → More inference demand → More model training → Cycle repeats.
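
    The feedback loop above can be sketched as a toy simulation. All numbers here (cost decline rate, demand elasticity) are made-up illustrative assumptions, not forecasts; the point is only that when demand grows faster than unit cost falls, total spend rises even as AI gets cheaper:

```python
# Toy model of the Jevons Paradox loop: cheaper AI -> more demand -> more total spend.
# The cost-decline and elasticity numbers are illustrative assumptions, not estimates.

def simulate(years=5, cost_per_token=1.0, demand=100.0,
             cost_decline=0.5, demand_elasticity=1.5):
    """Each year unit cost halves; demand grows faster than cost falls,
    so total spend (cost * demand) rises even as unit cost drops."""
    history = []
    for _ in range(years):
        history.append((cost_per_token, demand, cost_per_token * demand))
        cost_per_token *= cost_decline                      # models get cheaper
        demand *= (1 / cost_decline) ** demand_elasticity   # usage grows faster
    return history

for cost, demand, spend in simulate():
    print(f"cost={cost:.3f}  demand={demand:,.0f}  total_spend={spend:,.0f}")
```

    Under these assumptions total spend grows about 1.4x per year even though unit cost halves, which is the paradox in miniature.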

    Hardware & Infrastructure Implications

    • Nvidia and AI hardware players will benefit long-term, as AI growth is still in early stages.
    • Lower OPEX in AI development → Higher AI adoption → Increased demand for Nvidia's hardware.
    • Inference (using models) will become a bigger business than training models.
    • Stargate Project: A major bet on GenAI infrastructure to achieve dominance through scale.
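
    The "inference will become a bigger business than training" point can be sketched with a toy cost model. All dollar figures below are made-up assumptions for illustration: training is roughly a fixed cost per run, while inference cost grows with every query, so at sufficient scale inference dominates.

```python
# Toy cost model: training is a (roughly) fixed cost per run,
# inference grows with every user query. All dollar figures are
# illustrative assumptions, not real provider pricing.

def monthly_cost(users, queries_per_user, train_runs_per_month=1,
                 train_cost=1_000_000, inference_cost_per_query=0.002):
    training = train_runs_per_month * train_cost
    inference = users * queries_per_user * inference_cost_per_query
    return training, inference

for users in (10_000, 1_000_000, 100_000_000):
    training, inference = monthly_cost(users, queries_per_user=30)
    print(f"{users:>11,} users: training ${training:,.0f}  inference ${inference:,.0f}")
```

    At 100 million users the inference bill is several times the training bill under these assumptions, which is the crossover the video points to.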

    MoE (Mixture of Experts) & Model Development

    • MoE models will dominate due to:
      • Higher efficiency (fewer active parameters per task).
      • Scalability (flexible parameter usage).
      • Lower training costs and higher performance for specialised tasks.
    • Analogy: A traditional LLM consults everyone in a room; an MoE LLM consults only the experts (e.g., CFA members for investment questions).
    • Smaller, retrainable models: MoE and DeepSeek's new architecture allow smaller, easily updated models.
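
    The "consult only the experts in the room" analogy maps directly onto the gating step of an MoE layer. Here is a minimal sketch of top-k routing (a simplified illustration, not DeepSeek's actual architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

def moe_forward(x, experts_w, gate_w, top_k=2):
    """Minimal Mixture-of-Experts layer: a gate scores all experts,
    but only the top-k highest-scoring experts actually run, and their
    outputs are combined weighted by softmaxed gate scores."""
    scores = x @ gate_w                          # one score per expert
    top = np.argsort(scores)[-top_k:]            # indices of the chosen experts
    weights = np.exp(scores[top])
    weights /= weights.sum()                     # softmax over the chosen few
    # Only top_k experts do any work -> fewer active parameters per token.
    return sum(w * (x @ experts_w[i]) for w, i in zip(weights, top))

d_model, n_experts = 8, 4
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
gate = rng.standard_normal((d_model, n_experts))
y = moe_forward(rng.standard_normal(d_model), experts, gate)
print(y.shape)  # (8,)
```

    With top_k=2 of 4 experts, only half the expert parameters are active for this token, which is where the efficiency and training-cost claims above come from.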

    AI Adoption and the Future of Work

    • More data ≠ better models: Quality over quantity is key in AI training.
    • Training scales with user base, inference scales with usage.
    • Human error tolerance varies by task:
      • Lower tolerance (e.g., car accidents, trading losses).
      • Higher tolerance (e.g., office tasks, writing errors).
      • Explains why GenAI adoption in office work is faster than in fields like autonomous driving.
    • Cybersecurity & AI fuel a new Cold War as nations leverage GenAI in cyber warfare.
    • Prompt engineering will become a key skill for the workforce to stay relevant.


    ------------------------------
    Carlos Salas
    Portfolio Manager & Freelance Investment Research Consultant
    ------------------------------



  • 6.  RE: LLMS - DeepSeek V3 Breakthrough

    Posted 03-02-2025 17:54

    To add to this -

    Dylan Patel from SemiAnalysis has done some great deep-dive work on the potential true end-to-end training costs.

    • total server CapEx ~$1.6B
    • operating costs for clusters ~$0.9B

    Todor



    ------------------------------
    Todor Kostov
    Director
    ------------------------------



  • 7.  RE: LLMS - DeepSeek V3 Breakthrough

    Posted 04-02-2025 07:39

    This is a great article. It's clear that there's more than meets the eye with regard to the players involved, and that it's no mere coincidence the model was released before the new Trump administration announced new tariffs.

    Some good quotes from the article:

    • "MLA is a key innovation responsible for a significant reduction in the inference price for DeepSeek. The reason is MLA reduces the amount of KV Cache required per query by about 93.3% versus standard attention. KV Cache is a memory mechanism in transformer models that stores data representing the context of the conversation, reducing unnecessary computation."
    • "Estimates put algorithmic progress at 4x per year, meaning that for every passing year, 4x less compute is needed to achieve the same capability. Dario, CEO of Anthropic argues that algorithmic advancements are even faster and can yield a 10x improvement. As far as inference pricing goes for GPT-3 quality, costs have fallen 1200x."
    • "Despite concerns that Mixture-of-Experts (MoE) efficiency gains might reduce investment, Dario points out that the economic benefits of more capable AI models are so substantial that any cost savings are quickly reinvested into building even larger models."
    • "And speaking of distillation, perhaps the most interesting part of the R1 paper was being able to turn non-reasoning smaller models into reasoning ones via fine tuning them with outputs from a reasoning model. The dataset curation contained a total of 800k samples, and now anyone can use R1's CoT outputs to make a dataset of their own and make reasoning models with the help of those outputs."
    • "MLA is a key innovation responsible for a significant reduction in the inference price for DeepSeek. The reason is MLA reduces the amount of KV Cache required per query by about 93.3% versus standard attention. KV Cache is a memory mechanism in transformer models that stores data representing the context of the conversation, reducing unnecessary computation."


    ------------------------------
    Carlos Salas
    Portfolio Manager & Freelance Investment Research Consultant
    ------------------------------



  • 8.  RE: LLMS - DeepSeek V3 Breakthrough

    Posted 17-02-2025 21:02

    Another follow up analysis from Jonathan Ross on the future of Training vs. Inference.

    video



    ------------------------------
    Todor Kostov
    Director
    ------------------------------



  • 9.  RE: LLMS - DeepSeek V3 Breakthrough

    Posted 11-06-2025 09:43

    More new developments on LLMs coming from Europe:

    • Mistral on Tuesday launched Europe's first AI reasoning model, which uses logical thinking to create a response, as it tries to keep pace with American and Chinese rivals at the forefront of AI development.
    • The French startup has attempted to differentiate itself by championing its European roots, winning the support of French President Emmanuel Macron, as well as making some of its models open source in contrast to the proprietary offerings of OpenAI or Alphabet's Google.
    • Mistral is considered Europe's best shot at having a home-grown AI competitor, but has lagged behind in terms of market share and revenue.
    • Reasoning models use chain-of-thought techniques - a process that generates answers with intermediate reasoning abilities when solving complex problems.
    • They could also be a promising path forward in advancing AI's capabilities as the traditional approach of building ever-bigger large language models by adding more data and computing power begins to hit limitations.
    • For Mistral, which was valued by venture capitalists at $6.2 billion, an industry shift away from "scaling up" could give it a window to catch up against better capitalized rivals.
    • China's DeepSeek broke through as a viable competitor in January through its low-cost, open-sourced AI models, including one for reasoning.
    • OpenAI was the first to launch its reasoning models last year, followed by Google a few months later.
    • Meta, which also offers its models open-sourced, has not yet released a standalone reasoning model, though it said its latest top-shelf model has reasoning capabilities.
    • Mistral is launching an open-sourced Magistral Small model and a more powerful version called Magistral Medium for business customers.
    • "The best human thinking isn't linear - it weaves through logic, insight, uncertainty, and discovery. Reasoning language models have enabled us to augment and delegate complex thinking and deep understanding to AI," Mistral said. 
    • American companies have mostly kept their most advanced models proprietary, though a handful, such as Meta, has released open-source models. In contrast, Chinese firms ranging from DeepSeek to Alibaba have taken the open-source path to demonstrate their technological capabilities.
    • Mistral Small is available for download on Hugging Face's platform and can reason in languages including English, French, Spanish, Arabic and simplified Chinese.

    sources: 

    • https://www.axios.com/2025/06/10/mistral-ai-reasoning-models-open-source
    • https://www.reuters.com/business/frances-mistral-launches-europes-first-ai-reasoning-model-2025-06-10/


    Feel free to add information or comment on this new LLM development



    ------------------------------
    Carlos Salas
    Portfolio Manager & Freelance Investment Research Consultant
    ------------------------------



  • 10.  RE: LLMS - DeepSeek V3 Breakthrough

    Posted 12-06-2025 14:02

    Thanks Carlos. 

    Below is a quick update on Meta's recent investment in Scale AI ($14.8bn for a 49% stake) in the race for 'superintelligence'.

    https://www.semafor.com/article/06/10/2025/metas-15-billion-investment-in-scale-ai-comes-with-a-hidden-perk



    ------------------------------
    Todor Kostov
    Director
    ------------------------------



  • 11.  RE: LLMS - DeepSeek V3 Breakthrough

    Posted 08-08-2025 17:14

    Meet GPT-5

    GPT-5 is OpenAI's newest flagship model. It's a big step up in reasoning, speed, and "do-the-task" reliability, and it's built to work across text, images, audio, and video.

    What it's good at

    • Stronger reasoning & planning: tackles multi-step problems with fewer mistakes and better uncertainty handling. 

    • Agentic skills: pairs with ChatGPT's new agent to operate software for you (think: "book this flight and file the expense") while staying within clear permissions. 

    • Multimodal I/O: understands and generates across text, images, and speech, and reasons over mixed inputs. 

    • Long context: far longer working memory than prior models, enabling large docs, datasets, and end-to-end workflows.

    • Developer-friendly: benefits from recent API advances like structured outputs and the groundwork laid by the newer 4.x family. 

    Where you can use it

    • ChatGPT (personal and work plans), with GPT-5 as the default "just works" model.

    • OpenAI API for apps/automation, alongside mini/nano variants for cost/latency trade-offs.

    Why it matters

    Compared to earlier GPT-4.x models, GPT-5 is more reliable at complex tasks, better at following instructions, and faster, so it handles end-to-end jobs (analysis → decision → action) with less hand-holding. 

    https://openai.com/index/introducing-gpt-5/



    ------------------------------
    Carlos Salas
    Portfolio Manager & Freelance Investment Research Consultant
    ------------------------------



  • 12.  RE: LLMS - DeepSeek V3 Breakthrough

    Posted 10-08-2025 23:00

    Thanks Carlos, great update!

    Below are a few snapshots from a recent update from Artificial Analysis on the State of AI.

    Back in February, when o3 was introduced, it was well ahead of the rest of the frontier models.

    o3
    Currently, the competition is far more intense, and o5 (according to them) is behind Grok 4 on some of the benchmarks, even though xAI introduced Grok 4 almost a month ago.
    o5
    Some food for thought here ...
    A lite edition of the Artificial Analysis presentation is available through the link below:


    ------------------------------
    Todor Kostov
    Director
    ------------------------------