AI Jargons
LLM (Large Language Model)
An LLM is a sophisticated mathematical function that predicts the next word in a sequence. If you ask an LLM to complete the sentence "The capital of France is ___", it predicts the most likely next word based on patterns in the data it was trained on. For a more comprehensive overview of LLMs, please watch this amazing video from 3Blue1Brown.
There are two main types of language models:
Autoregressive Language Model
An autoregressive language model is trained to predict the next token in a sequence, using only the preceding tokens.
Think of it like writing a sentence word by word, where you can only see what you've already written - never what comes next. Each prediction depends solely on the past.
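To make the "word by word" idea concrete, here is a minimal sketch of autoregressive generation. It is not how a real LLM works internally: the "model" is just a table of word-pair counts built from a tiny made-up corpus, and it only looks at the single previous word, whereas a real LLM conditions on all preceding tokens. The names CORPUS, train_bigram, and generate are illustrative assumptions.

```python
from collections import Counter, defaultdict

# Tiny made-up training corpus (illustrative only).
CORPUS = "the cat sat on the mat . the cat sat on the sofa .".split()

def train_bigram(tokens):
    """Count which token follows which: a stand-in for a trained model."""
    follows = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        follows[prev][nxt] += 1
    return follows

def generate(follows, start, max_new_tokens=5):
    """Greedy autoregressive decoding: each step sees only earlier tokens."""
    output = [start]
    for _ in range(max_new_tokens):
        candidates = follows.get(output[-1])
        if not candidates:
            break
        # Append the most frequent continuation of the last token.
        output.append(candidates.most_common(1)[0][0])
    return output

model = train_bigram(CORPUS)
print(" ".join(generate(model, "the", max_new_tokens=3)))  # the cat sat on
```

Each new word is appended to the output and then becomes part of the input for the next prediction, which is the defining loop of autoregressive models.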
Masked Language Model
A masked language model is trained to predict missing tokens anywhere in a sequence, using the context from both before and after the missing tokens.
Think of it like a fill-in-the-blank test where you can read the entire sentence before answering:
Input: "The [MASK] sat on the mat"
Model sees: words before AND after the blank
Prediction: "cat" (using context from both sides)
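The fill-in-the-blank idea can be sketched in a few lines. This toy version is an assumption-heavy illustration, not a real masked language model: it simply counts which word appeared between a given left neighbor and right neighbor in a tiny made-up corpus. The names CORPUS, pad, train, and fill_mask are hypothetical.

```python
from collections import Counter

# Tiny made-up training corpus (illustrative only).
CORPUS = [
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "the dog sat on the rug",
]

def pad(sentence):
    """Lowercase, split, and add boundary markers so edge words have two neighbors."""
    return ["<s>"] + sentence.lower().split() + ["</s>"]

def train(sentences):
    """Count (left word, right word) -> middle word."""
    table = {}
    for sentence in sentences:
        words = pad(sentence)
        for left, middle, right in zip(words, words[1:], words[2:]):
            table.setdefault((left, right), Counter())[middle] += 1
    return table

def fill_mask(table, sentence):
    """Predict the masked word from the words on BOTH sides of it."""
    words = pad(sentence)
    i = words.index("[mask]")  # pad() lowercases "[MASK]"
    candidates = table.get((words[i - 1], words[i + 1]))
    return candidates.most_common(1)[0][0] if candidates else None

table = train(CORPUS)
print(fill_mask(table, "The [MASK] sat on the mat"))  # cat
```

Unlike the autoregressive case, the prediction here uses the word after the blank ("sat") as well as the word before it ("the").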
Tokens
A token is a fundamental unit of text: often a word, or even part of a word. For example, the sentence "I love cats" has three tokens: "I", "love", and "cats". Sometimes tokens are smaller than words. A rare word like "Flibbertigibbet" might get split into six tokens, such as "Fl", "ib", "bert", "ig", "ib", and "bet".
Whether a word splits into multiple tokens comes down to the vocabulary of the respective LLM. The process of breaking the original text into tokens is called tokenization. You can experiment with how OpenAI models tokenize text on their website. The GPT-4o vocabulary contains roughly 200,000 tokens.
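Here is a rough sketch of how a vocabulary decides where a word gets split. It uses greedy longest-match against a small hypothetical vocabulary, which is simpler than the byte-pair encoding that GPT models actually use, so treat the exact splits as illustrative rather than what any real tokenizer produces.

```python
# Hypothetical sub-word vocabulary; real vocabularies hold many thousands of entries.
VOCAB = {"I", "love", "cats", "Fl", "ib", "bert", "ig", "bet"}

def tokenize_word(word, vocab=VOCAB):
    """Greedy longest-match split; characters not in the vocabulary become single tokens."""
    tokens, i = [], 0
    while i < len(word):
        # Try the longest remaining piece first, then shrink it.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])
            i += 1
    return tokens

def tokenize(text, vocab=VOCAB):
    """Split on whitespace, then split each word into vocabulary pieces."""
    return [token for word in text.split() for token in tokenize_word(word, vocab)]

print(tokenize("I love cats"))      # ['I', 'love', 'cats']
print(tokenize("Flibbertigibbet"))  # ['Fl', 'ib', 'bert', 'ig', 'ib', 'bet']
```

Common words sit in the vocabulary as a whole, so they stay intact. A rare word is not in the vocabulary, so it is assembled from smaller pieces that are.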
Prompt
A prompt is just a fancy way of saying "what you tell the computer to do." For example, when you ask ChatGPT, "Write an email to my manager that I am quitting", you are prompting ChatGPT.
AI researchers have come up with many techniques for getting high-quality responses from LLMs. Check out this website to learn more about prompt engineering.
LMM (Large Multimodal Model)
A Large Multimodal Model is an advanced AI model that can process and understand multiple types of data, such as text, images, audio, and video. Unlike traditional Large Language Models (LLMs) that focus primarily on text, LMMs handle diverse data forms for a holistic interpretation. Examples include GPT-4o and Google Gemini, which can respond to both text and images.
Inference
Inference is the process of an LLM generating a response based on what it learned during training. In other words, inference is an AI model in action.
Think of it like studying vs taking the exam. During training, the model reads billions of documents and learns patterns. During inference, it applies that knowledge to answer your specific question.
Fine tuning
Fine tuning is a technique where a pre-trained AI model is further trained on a smaller, specialized dataset to adapt it to a specific task, like coding. It builds on the model's existing knowledge, improving performance without training from scratch, which is useful when data is limited.
Think of it like hiring a smart generalist and then training them on your company's specific processes. They already know how to think and communicate - you're just teaching them your particular way of doing things.
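The core mechanic, starting from already-trained weights and nudging them with a small dataset, can be sketched with a model that has a single weight. This is a deliberately tiny assumption: real fine tuning updates millions or billions of weights with a deep learning framework, and the values of pretrained_weight and specialized_data below are made up.

```python
def fine_tune(weight, data, learning_rate=0.05, epochs=100):
    """Continue training an existing weight on a small dataset with gradient descent."""
    for _ in range(epochs):
        for x, y in data:
            error = weight * x - y
            # Gradient of 0.5 * error**2 with respect to the weight is error * x.
            weight -= learning_rate * error * x
    return weight

# Pretend this weight came out of large-scale pre-training (model: y = weight * x).
pretrained_weight = 2.0

# Small, specialized dataset whose pattern is y = 3 * x.
specialized_data = [(1.0, 3.0), (2.0, 6.0)]

tuned_weight = fine_tune(pretrained_weight, specialized_data)
print(round(tuned_weight, 4))  # 3.0
```

The training loop is the same kind used in pre-training. What makes it fine tuning is the starting point (an existing weight rather than a random one) and the small, task-specific dataset.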
RAG
Retrieval Augmented Generation (RAG) is a technique that helps an LLM provide more accurate and relevant answers by combining two key steps: searching for information and then generating a response. Think of RAG like a super-smart librarian who doesn't just recall information from memory, but actively searches through a vast library to find the most relevant sources before crafting an answer.
The RAG Process:
Retrieval: Imagine you ask a question. The system first searches a large database to find the information most relevant to your query. This is similar to a librarian pulling the most relevant books from the shelves.
Augmentation: The retrieved passages are then added to your question as extra context in the prompt sent to the LLM. It's like the librarian marking the most important passages in those books and handing them over along with your question.
Generation: Finally, the LLM uses this augmented prompt to craft a precise, contextually accurate response. This means the answer is not just pulled from the LLM's original training, but grounded in up-to-date and specific information.
The big advantage of RAG is that it helps reduce a common problem with LLMs called hallucination - where an LLM confidently provides incorrect information. By retrieving and using real, current information, RAG makes AI responses more reliable and accurate.
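The three steps can be sketched end to end in a few lines. Several things here are assumptions for illustration: retrieval is plain word overlap rather than the vector search most real systems use, augmentation is taken to mean pasting the retrieved passages into the prompt, and generate is a placeholder where a real LLM call would go. DOCUMENTS and all function names are hypothetical.

```python
import re

# Stand-in knowledge base (illustrative only).
DOCUMENTS = [
    "Paris is the capital of France.",
    "The Eiffel Tower was completed in 1889.",
    "Python is a popular programming language.",
]

def words(text):
    """Lowercased set of words, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, documents, k=2):
    """Retrieval: rank documents by how many words they share with the query."""
    ranked = sorted(documents, key=lambda d: len(words(query) & words(d)), reverse=True)
    return ranked[:k]

def augment(query, passages):
    """Augmentation: put the retrieved passages into the prompt as context."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

def generate(prompt):
    """Generation: placeholder for a real LLM call."""
    return f"[LLM would answer from a {len(prompt)}-character prompt]"

query = "What is the capital of France?"
prompt = augment(query, retrieve(query, DOCUMENTS, k=1))
print(prompt)
print(generate(prompt))
```

Because the answer is generated from the passages in the prompt, updating the document store changes what the model can say without retraining it.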
Tools
Tools are functions or APIs that agents use to interact with the environment, access information, or perform tasks, like web search or database queries, extending their capabilities beyond pre-trained knowledge.
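In practice, a tool is often just a regular function plus a name the model can refer to. The sketch below assumes the model emits a JSON request naming the tool and its arguments, and the surrounding program looks the function up and runs it. The request shape, the TOOLS registry, and both example tools are hypothetical, and web_search returns canned text instead of calling a real search API.

```python
import json

def web_search(query):
    """Pretend search tool; a real one would call a search API."""
    return f"(pretend results for {query!r})"

def add(a, b):
    """Simple calculator tool."""
    return a + b

# Registry mapping tool names to the functions that implement them.
TOOLS = {"web_search": web_search, "add": add}

def call_tool(request_json):
    """Run the tool named in a JSON request of the form {"name": ..., "arguments": {...}}."""
    request = json.loads(request_json)
    tool = TOOLS.get(request["name"])
    if tool is None:
        return {"error": f"unknown tool: {request['name']}"}
    return {"result": tool(**request["arguments"])}

# Imagine the model produced this request while answering "What is 2 + 3?"
print(call_tool('{"name": "add", "arguments": {"a": 2, "b": 3}}'))  # {'result': 5}
```

The model never executes anything itself. It only produces the request, and the host program decides whether and how to run it.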
Parameters/Weights
Parameters (also called weights) are the numbers inside an AI model that determine how it processes information. When you hear "GPT-4 has 1.8 trillion parameters" (a widely repeated but unconfirmed estimate) or "Llama 3 is a 70B model," these numbers refer to how many adjustable values the model contains.
Think of parameters like the settings on a massive mixing board in a recording studio. Each knob affects how the final sound comes out. During training, the AI adjusts millions or billions of these "knobs" until it produces good outputs. More parameters generally means the model can capture more nuanced patterns, but also requires more computing power to run.
The relationship isn't linear though. A well-trained 8B model can outperform a poorly-trained 70B model. Quality of training data and techniques matter as much as raw size.
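To get a feel for where these counts come from, here is a small sketch that counts the parameters of a simple fully connected network and estimates the memory needed to hold a model's weights. The layer sizes are arbitrary, and the 2 bytes per parameter figure assumes 16-bit weights, which is common but not universal.

```python
def count_parameters(layer_sizes):
    """Weights plus biases for a fully connected network with the given layer sizes."""
    return sum(
        n_in * n_out + n_out  # one weight per connection, one bias per output unit
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )

def weight_memory_gb(n_params, bytes_per_param=2):
    """Memory needed just to store the weights, assuming 16-bit values by default."""
    return n_params * bytes_per_param / 1e9

# A small network: 784 inputs -> 128 hidden units -> 10 outputs.
print(count_parameters([784, 128, 10]))  # 101770

# Weights of a 70B-parameter model at 2 bytes each.
print(weight_memory_gb(70e9))  # 140.0
```

This is why parameter count maps so directly to hardware requirements: under the 16-bit assumption, a 70B model needs about 140 GB just to hold its weights.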
Context Window
The context window is the maximum amount of text an AI model can process at once. It's measured in tokens. If a model has a 128K context window, it can "see" roughly 96,000 English words at once (a token averages about three-quarters of a word).
Imagine reading a book, but you can only remember the last 50 pages. That's essentially what a limited context window does to an AI. Anything beyond that window effectively doesn't exist for the model.
Context windows have grown dramatically:
GPT-3 (2020): 2K tokens
GPT-4 Turbo (2023): 128K tokens
Claude 3 (2024): 200K tokens
Gemini 1.5 (2024): 1M+ tokens
Larger context windows enable use cases like analyzing entire codebases, summarizing long documents, or maintaining coherent conversations over extended sessions. The tradeoff is that longer contexts require more compute and can increase latency.
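One common way applications cope with a limited window is to keep only the most recent tokens that fit. The sketch below shows that idea along with the tokens-to-words estimate. It is an illustration under assumptions: the 0.75 words-per-token ratio is a rough rule of thumb for English, and real systems often use smarter strategies such as summarizing older content.

```python
def fit_to_context(tokens, context_window, reserve_for_output=0):
    """Keep only the most recent tokens that fit, leaving room for the model's reply."""
    budget = context_window - reserve_for_output
    if budget <= 0:
        raise ValueError("context window too small for the reserved output")
    return tokens[-budget:]

def estimate_words(n_tokens, words_per_token=0.75):
    """Rough English word count for a token budget (rule-of-thumb ratio)."""
    return round(n_tokens * words_per_token)

conversation = list(range(10))  # stand-in for 10 token ids, oldest first
print(fit_to_context(conversation, context_window=6, reserve_for_output=2))  # [6, 7, 8, 9]
print(estimate_words(128_000))  # 96000
```

Everything dropped by fit_to_context is invisible to the model, which is the "can only remember the last 50 pages" effect in code form.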
Agentic AI
Agentic AI refers to AI systems that can autonomously plan, reason, and take actions to accomplish goals, rather than just responding to single prompts. Instead of asking "write me an email," you might tell an agent "research competitors, draft a market analysis, and schedule a meeting with the team to discuss findings."
The key capabilities that make AI "agentic":
• Planning: Breaking down complex tasks into steps
• Tool Use: Calling APIs, searching the web, executing code
• Memory: Remembering context across interactions
• Reasoning: Deciding what to do next based on results
• Autonomy: Operating with minimal human intervention
A simple chatbot answers questions. An agent books your flight, checks your calendar for conflicts, sends confirmation to your team, and adds the trip to your expense tracker.
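The capabilities listed above come together in a loop: decide on a step, call a tool, remember the result, decide again. Here is a minimal sketch of that loop. The function decide_next_step is a hard-coded stand-in for the LLM's planning and reasoning, and the tools, calendar data, and goal format are all made up for illustration.

```python
def check_calendar(date):
    """Hypothetical tool: report whether the date clashes with an existing event."""
    busy_dates = {"2025-03-01"}  # made-up calendar data
    return {"conflict": date in busy_dates}

def book_flight(destination, date):
    """Hypothetical tool: pretend to book a flight."""
    return {"booked": True, "destination": destination, "date": date}

TOOLS = {"check_calendar": check_calendar, "book_flight": book_flight}

def decide_next_step(goal, memory):
    """Stand-in for LLM reasoning: pick the next tool based on results so far."""
    if not memory:
        return "check_calendar", {"date": goal["date"]}
    last_tool, last_result = memory[-1]
    if last_tool == "check_calendar" and not last_result["conflict"]:
        return "book_flight", {"destination": goal["destination"], "date": goal["date"]}
    return None  # either finished or blocked by a conflict

def run_agent(goal, max_steps=5):
    """Agent loop: decide, act with a tool, remember the result, repeat."""
    memory = []
    for _ in range(max_steps):
        step = decide_next_step(goal, memory)
        if step is None:
            break
        tool_name, arguments = step
        memory.append((tool_name, TOOLS[tool_name](**arguments)))
    return memory

for tool_name, result in run_agent({"destination": "Paris", "date": "2025-03-02"}):
    print(tool_name, result)
```

The max_steps limit is a small safety net: an agent that decides its own next action needs some bound on how long it can keep going.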
MCP (Model Context Protocol)
MCP is an open standard created by Anthropic that defines how AI models connect to external tools, data sources, and services. Think of it as USB-C for AI. Before USB-C, every device had its own charger. MCP aims to solve the same fragmentation problem for AI integrations.
Without MCP, connecting an AI to your calendar, database, or code editor requires custom integration work for each combination. With MCP, a tool built once works with any MCP-compatible AI system.
The protocol defines three core primitives:
• Tools: Functions the AI can call (e.g., send_email, search_database)
• Resources: Data the AI can access (e.g., files, documents)
• Prompts: Reusable templates for common tasks
Example: An MCP server for Notion might expose tools like "create_page" and "search_notes." Any MCP-compatible AI application (Claude, GPT, local models) can then interact with Notion without needing Notion-specific code.
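Under the hood, MCP messages are JSON-RPC 2.0 requests and responses, with methods such as "tools/list" and "tools/call". The sketch below imitates that exchange for a single tool. It is a simplified illustration and not the official SDK: it skips the initialization handshake and the transport layer, treats every unsupported request as "method not found", and the search_notes tool and NOTES data are hypothetical.

```python
import json

NOTES = ["Q3 roadmap", "Meeting notes: roadmap review", "Grocery list"]  # made-up data

def search_notes(query):
    """Hypothetical tool: return notes containing the query text."""
    return [note for note in NOTES if query.lower() in note.lower()]

# Tool description a client would receive from "tools/list".
TOOL_LIST = [{
    "name": "search_notes",
    "description": "Find notes containing a query string.",
    "inputSchema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}]

def handle(request_json):
    """Answer one JSON-RPC request (simplified: no handshake, one tool)."""
    request = json.loads(request_json)
    method, params = request["method"], request.get("params", {})
    if method == "tools/list":
        result = {"tools": TOOL_LIST}
    elif method == "tools/call" and params.get("name") == "search_notes":
        hits = search_notes(**params["arguments"])
        result = {"content": [{"type": "text", "text": "\n".join(hits)}]}
    else:
        error = {"code": -32601, "message": "Method not found"}
        return json.dumps({"jsonrpc": "2.0", "id": request.get("id"), "error": error})
    return json.dumps({"jsonrpc": "2.0", "id": request["id"], "result": result})

call = {"jsonrpc": "2.0", "id": 1, "method": "tools/call",
        "params": {"name": "search_notes", "arguments": {"query": "roadmap"}}}
print(handle(json.dumps(call)))
```

Because the client only needs to speak this message format, it can discover and call the tool without knowing anything about how the notes are stored.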
Skills
Skills are packaged capabilities that extend what an AI agent can do. A skill typically includes instructions, tools, and templates that enable the agent to perform a specific task well.
Think of skills like apps on your phone. Your phone's base OS can do a lot, but you install apps (skills) for specific tasks: photography, banking, navigation. Similarly, a base AI model can chat, but skills let it do specialized work like "analyze SEO," "generate images," or "manage your calendar."
A skill usually contains:
• Instructions: How to approach the task (prompts, best practices)
• Tools: APIs or functions needed (e.g., image generation API)
• Templates: Reusable formats for outputs
• Context: Domain knowledge relevant to the task
Skills enable modularity. Instead of one massive AI that tries to do everything, you compose smaller, focused skills as needed. This makes AI systems more maintainable and allows specialized skills to be shared across different agents.
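One way to picture a skill is as a small bundle of data and functions that an agent can pick up when needed. The sketch below models that bundle and shows two skills being composed into one agent configuration. The Skill class, its fields, and build_agent are hypothetical names chosen to mirror the list above, not any particular product's skill format.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Skill:
    """A packaged capability: instructions, tools, an output template, and context."""
    name: str
    instructions: str
    tools: Dict[str, Callable] = field(default_factory=dict)
    template: str = "{result}"
    context: str = ""

def word_count(text):
    """Tiny example tool used by the SEO skill."""
    return len(text.split())

seo_skill = Skill(
    name="analyze_seo",
    instructions="Check the page length and suggest improvements.",
    tools={"word_count": word_count},
    template="SEO report: {result}",
    context="Longer pages usually benefit from clear headings.",
)

calendar_skill = Skill(
    name="manage_calendar",
    instructions="Propose meeting times that avoid conflicts.",
)

def build_agent(skills: List[Skill]):
    """Compose several skills into one agent configuration."""
    return {
        "system_prompt": "\n".join(f"[{s.name}] {s.instructions}" for s in skills),
        "tools": {name: fn for s in skills for name, fn in s.tools.items()},
    }

agent = build_agent([seo_skill, calendar_skill])
print(agent["system_prompt"])
print(seo_skill.template.format(result=agent["tools"]["word_count"]("hello big world")))
```

Adding or removing a capability is then a matter of changing the list passed to build_agent, which is the modularity the paragraph above describes.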