Generative AI and Large Language Models
AI is a broad field whose conceptual foundations go back to Alan Turing's 1950 paper "Computing Machinery and Intelligence". As you can imagine, it has evolved quite a lot since then, and the pace has only accelerated in recent years. My goal for this article is not to go through its entire history nor to cover every AI model and architecture, but to be laser-focused on Generative AI and Large Language Models. There is a lot of history, and other model families such as Convolutional Neural Networks (CNN) or Recurrent Neural Networks (RNN), that we are not going to cover in this post, but you can read this great blog post from Maria’s Substack with a lot of resources to go deeper.
The reason for focusing on Generative AI is the seminal 2017 paper from Google titled “Attention Is All You Need”, in which the authors proposed a new architecture called the Transformer. Thanks to this novel architecture, models could be trained in significantly less time, parallelized far more easily, and still come out superior in quality.
Large Language Models in a Nutshell
These advances, along with major improvements across the entire compute stack, led many companies in recent years to train on very large sets of data (called datasets) and produce a new class of models called Large Language Models (LLMs for short). LLMs unlocked an AI model’s ability to generate human-like content (text, audio, video, code, 3D shapes, …), a capability known as Generative AI. Generative AI and LLMs have been transformational: companies everywhere are integrating these new capabilities into their products or rethinking their roadmaps around them.
The amount and quality of the training data have become a key way for companies to differentiate themselves: that data can be considered their gold mine and part of their AI defensive moat. When thinking about new AI use cases, it’s very important to know which kind of data you gather and hold that can make your offering unique or more advanced.
Related to this training data, you will often read, when a new Large Language Model is released, that it has been trained on X billion or trillion tokens and has X billion parameters. So let’s dive into those two concepts, tokens and parameters, as a first stepping stone.
What are Tokens?
Before an LLM can do anything with data, that data has to be converted into tokens. Let’s demystify this a little bit, as tokens are probably the first concept you will encounter when interacting with an AI model for AI use cases at your company. As an example, the OpenAI API pricing model, at the time of writing, is based on the number of tokens given as input or generated as output.
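To make that concrete, here is a quick sketch of how token-based pricing works. The per-token rates below are hypothetical placeholders for illustration only, not actual OpenAI prices; always check the current pricing page.

```python
# Rough cost estimate for a token-priced API call.
# These rates are HYPOTHETICAL placeholders, not real prices.
HYPOTHETICAL_INPUT_RATE = 0.01 / 1000   # dollars per input token
HYPOTHETICAL_OUTPUT_RATE = 0.03 / 1000  # dollars per output token

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return an estimated cost in dollars for one API call."""
    return (input_tokens * HYPOTHETICAL_INPUT_RATE
            + output_tokens * HYPOTHETICAL_OUTPUT_RATE)

# A 1,500-token prompt producing a 500-token answer:
print(f"${estimate_cost(1500, 500):.4f}")
```

The takeaway: both what you send (input) and what the model generates (output) count toward your bill, usually at different rates.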
Tokens can be characters, words, subwords, or other segments of text or code. They are the basic units of text or code that an LLM uses to process and generate language. Any text given to an AI model, whether to train it initially on a dataset (RedPajama-Data-v2, released by Together.ai, is an example of such a huge dataset with 30 trillion tokens) or to ask it questions (called prompting) once it has been deployed, needs to be converted to tokens by a component called the tokenizer, which performs tokenization. In turn, the AI model generates tokens in response to a prompt, and those tokens eventually become a full sentence or paragraph(s): the LLM uses an output layer to convert the tokens back into human-readable text.
One thing to note is that the tokenizer is a separate algorithm from the AI model. OpenAI has a nice Tokenizer tool to visualize how a piece of text gets converted into tokens. Tokenization affects the amount of data and the number of calculations the model needs to process: the more tokens the model has to deal with, the more memory and computational resources it consumes. So choose your tokenizer carefully when you need to. For prompting-only use cases, the tokenizer is often hidden from the end user for a better experience when interacting with AI models.
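To see the idea in action, here is a deliberately simple, word-level tokenizer sketch. Real LLM tokenizers (such as the BPE-based ones used by GPT models) learn subword units from data, so treat this as an illustration of the concept, not a production tokenizer.

```python
import re

def tokenize(text: str) -> list[str]:
    """Split text into word and punctuation tokens (illustrative only)."""
    return re.findall(r"\w+|[^\w\s]", text)

def build_vocab(tokens):
    """Map each distinct token to an integer id, since models consume ids, not strings."""
    vocab = {}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab

text = "Attention is all you need."
tokens = tokenize(text)
vocab = build_vocab(tokens)
ids = [vocab[t] for t in tokens]
print(tokens)  # ['Attention', 'is', 'all', 'you', 'need', '.']
print(ids)     # [0, 1, 2, 3, 4, 5]
```

Notice that the punctuation becomes its own token: every character of the input has to map to some token before the model can see it.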
The capabilities, performance, accuracy, and size of an AI model usually depend on the number of parameters it has. So let’s understand what those parameters are.
What are Parameters?
Parameters are coefficients inside the model that are adjusted during training. To simplify, you can think of parameters as neurons inside a brain, each with an associated weight. Each weight is adjusted during AI training based on the data provided. Collectively, once trained, and much like a brain, they define the behavior of the model: how it makes predictions and generates content when something is asked of it. OpenAI has been making headlines by creating ever bigger models, from GPT-1 (~117 million parameters) to GPT-4 (rumored to have over 1 trillion parameters). The issue is that the more parameters a model has, the more costly it is to run. Recently, thanks to collaborative work in the open-source community, companies have been competing to find techniques that keep models at reasonable sizes (between 7 billion and 70 billion parameters) while preserving accuracy and performance. We will set those techniques aside for now and cover them in future posts, once we have built a deeper understanding of it all.
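If you are curious where those parameter counts come from, here is a back-of-the-envelope estimate for a GPT-2-style Transformer. The formula is a simplification: it counts the four attention projection matrices (~4·d²) and the two feed-forward matrices (~8·d²) per layer, plus token and position embeddings, and ignores biases and LayerNorm weights.

```python
# Back-of-the-envelope parameter count for a GPT-2-style Transformer.
# Simplified: ignores biases, LayerNorm weights, and the final head
# (which often shares weights with the token embedding).
def approx_params(d_model: int, n_layers: int, vocab_size: int, context: int) -> int:
    attention = 4 * d_model**2        # Q, K, V, and output projections
    feed_forward = 8 * d_model**2     # two matrices of shape d x 4d
    embeddings = vocab_size * d_model + context * d_model
    return n_layers * (attention + feed_forward) + embeddings

# GPT-2 "small" configuration: d=768, 12 layers, 50,257-token vocab,
# 1,024-token context. The real model has ~124 million parameters.
print(f"{approx_params(768, 12, 50257, 1024):,}")
```

Even this rough estimate lands close to the published ~124M figure for GPT-2 small, which shows that most of a Transformer's parameters live in a handful of large matrices repeated per layer.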
A picture is worth a thousand words
Please note that this picture is intentionally high-level; the goal is to dive deeper in future articles so you can build up knowledge gradually.
What’s Up Next?
In the next article, we will continue to stay high level and understand the various considerations that go into choosing the right LLM for a given company and product, whether by using a hosted model, training your own LLM, using a pre-trained LLM, or fine-tuning a pre-trained LLM.
Please subscribe below to avoid missing the next article unless you’re already subscribed ;)