By Haicheng | March 20, 2025, 2:56 p.m.
With AI models like GPT-4, Claude 3, and Gemini 1.5 dominating discussions today, it’s easy to forget that Large Language Models (LLMs) had a much humbler beginning. But where did it all start? How did we go from basic text generators to the advanced AI systems we have now?
Before deep learning took over, natural language processing (NLP) relied on statistical models like n-grams, Hidden Markov Models (HMMs), and early neural networks. These systems could generate text by modeling the probability of the next word, but they captured little beyond short-range patterns and had no real understanding of meaning.
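To see just how little those models were doing, here's a minimal bigram sketch in Python (the toy corpus and helper names are mine, purely for illustration): it only learns which word tends to follow which, and samples from those counts.

```python
import random
from collections import defaultdict

def train_bigram(corpus):
    """Count how often each word follows the previous one."""
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def generate(counts, start, length=8):
    """Sample each next word proportionally to its observed frequency."""
    word, output = start, [start]
    for _ in range(length):
        followers = counts.get(word)
        if not followers:
            break
        choices, freqs = zip(*followers.items())
        word = random.choices(choices, weights=freqs)[0]
        output.append(word)
    return " ".join(output)

corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
]
model = train_bigram(corpus)
print(generate(model, "the"))
```

Run it a few times and you get locally plausible but globally meaningless sentences, which is exactly the wall early statistical NLP kept hitting.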
One of the first big shifts came with word embeddings like Word2Vec (2013), which mapped words into a vector space so that models could capture semantic relationships between words, not just how often they appear together.
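As a quick illustration (not the original Word2Vec setup, just a tiny sketch using the gensim library on a made-up corpus), training embeddings and comparing words looks roughly like this:

```python
# Requires gensim (pip install gensim); the corpus here is a toy,
# so the numbers won't be meaningful -- on real text, words used in
# similar contexts end up close together in the vector space.
from gensim.models import Word2Vec

sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["the", "dog", "chases", "the", "ball"],
    ["the", "cat", "chases", "the", "mouse"],
]

# Each word is mapped to a dense 32-dimensional vector.
model = Word2Vec(sentences, vector_size=32, window=2, min_count=1, seed=42)

# Cosine similarity between two learned word vectors.
print(model.wv.similarity("king", "queen"))
print(model.wv.similarity("king", "ball"))
```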
The real game-changer came in 2017 with the famous paper "Attention Is All You Need", which introduced the Transformer architecture. This was the foundation for BERT (2018), GPT-1 (2018), and every modern LLM since.
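For anyone curious what "attention" actually computes, the core operation from that paper, scaled dot-product attention, fits in a few lines of NumPy. This is a bare-bones sketch with random toy inputs; a real Transformer adds learned projections, multiple heads, masking, and stacked layers.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each query produces a weighted mix of the values,
    with weights set by how well it matches each key."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # query/key similarity
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                              # weighted sum of values

# Toy example: 3 tokens, each represented by a 4-dimensional vector.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(3, 4)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (3, 4)
```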
After OpenAI showed what transformers could do at scale, the race began. GPT-3 (2020) pushed things further with 175 billion parameters, producing text that was often hard to distinguish from human writing. Soon after, companies like Google, Meta, and Anthropic jumped in, leading to today's AI boom.
Now we’re seeing models like Claude 3, Gemini 1.5, Llama 3, and DeepSeek, all building on this early research and expanding into multimodal AI, longer context windows, and better efficiency.
The first LLMs were far from perfect, but they laid the foundation for everything we see today. Looking back, it’s crazy how fast things have evolved in just a few years.
What was the first AI model you remember using? And where do you think LLMs will go next? Let’s discuss.