<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Mohit's Personal Blog]]></title><description><![CDATA[Mohit's Personal Blog]]></description><link>https://mohittilwani.xyz</link><generator>RSS for Node</generator><lastBuildDate>Fri, 17 Apr 2026 13:44:40 GMT</lastBuildDate><atom:link href="https://mohittilwani.xyz/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[AI Jargons]]></title><description><![CDATA[LLM (Large Language Model)
An LLM is a sophisticated mathematical function that predicts the next word in a sequence. So if you ask an LLM to complete the sentence “The capital of France is ___”, it will ]]></description><link>https://mohittilwani.xyz/ai-jargons</link><guid isPermaLink="true">https://mohittilwani.xyz/ai-jargons</guid><category><![CDATA[AI]]></category><dc:creator><![CDATA[Mohit Tilwani]]></dc:creator><pubDate>Wed, 18 Mar 2026 08:00:00 GMT</pubDate><content:encoded><![CDATA[<h2>LLM (Large Language Model)</h2>
<p>An LLM is a sophisticated mathematical function that <strong>predicts the next word</strong> in a sequence. So if you ask an LLM to complete the sentence “The capital of France is ___”, it will try to predict the next word based on the knowledge it was trained on. For a more comprehensible overview of LLMs, please watch this amazing <a href="https://www.youtube.com/watch?v=LPZh9BOjkQs">video</a> from 3Blue1Brown.</p>
<p>There are two types of LLMs:</p>
<h3>Autoregressive Language Model</h3>
<p>An autoregressive language model is trained to predict the next token in a sequence, <strong>using only the preceding tokens.</strong></p>
<p>Think of it like writing a sentence word by word, where you can only see what you've already written - never what comes next. Each prediction depends solely on the past.</p>
<h3>Masked Language Model</h3>
<p>A masked language model is trained to predict missing tokens anywhere in a sequence, <strong>using the context from both before and after the missing tokens.</strong></p>
<p>Think of it like a fill-in-the-blank test where you can read the entire sentence before answering:</p>
<pre><code class="language-plaintext">Input: "The [MASK] sat on the mat"
Model sees: words before AND after the blank
Prediction: "cat" (using context from both sides)
</code></pre>
<h2>Tokens</h2>
<p>A token is a fundamental unit of text. A token is often a word or even part of a word. For example, the sentence <em>"I love cats"</em> has three tokens: <strong>"I"</strong>, <strong>"love"</strong>, and <strong>"cats"</strong>. Sometimes, it can go even smaller. A complicated word like <em>"Flibbertigibbet"</em> gets split into six tokens: <strong>"Fl"</strong>, <strong>"ib"</strong>, <strong>"bert"</strong>, <strong>"ig"</strong>, <strong>"ib"</strong> and <strong>"bet"</strong>.</p>
<p>Whether a word splits into multiple tokens comes down to the vocabulary of the respective LLM. The process of <strong>breaking</strong> the original text into tokens is called tokenization. You can play around with how OpenAI tokenizes text on their <a href="https://platform.openai.com/tokenizer">website</a>. GPT-4o’s vocabulary size is 199,997 tokens.</p>
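<p>To make this concrete, here is a toy greedy tokenizer in Python. The vocabulary below is made up for illustration; real tokenizers like OpenAI's use byte-pair encoding over vocabularies of roughly 200,000 entries learned from data.</p>

```python
# Toy longest-match tokenizer. The vocabulary is hand-picked for this demo;
# real LLM tokenizers learn theirs from data via byte-pair encoding (BPE).
VOCAB = {"I", "love", "cats", "Fl", "ib", "bert", "ig", "bet", " "}

def tokenize(text: str) -> list[str]:
    tokens = []
    i = 0
    while i < len(text):
        # Take the longest vocabulary entry matching at position i;
        # fall back to a single character if nothing matches.
        match = next(
            (text[i:i + n] for n in range(len(text) - i, 0, -1)
             if text[i:i + n] in VOCAB),
            text[i],
        )
        tokens.append(match)
        i += len(match)
    return tokens

print(tokenize("I love cats"))       # whole words stay intact; spaces are tokens too
print(tokenize("Flibbertigibbet"))   # a rare word splits into sub-word pieces
```

Notice that common words survive as single tokens while the rare word shatters into fragments, which is exactly the behaviour you see in OpenAI's tokenizer playground.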
<h2>Prompt</h2>
<p>It’s just a fancy way of saying <em>“what you tell the computer to do.”</em> For example, when you ask ChatGPT, “Write an email to my manager that I am quitting“, you’re basically prompting ChatGPT.</p>
<p>To receive high-quality responses from LLMs, there are many techniques that AI researchers have come up with. Check out this <a href="https://www.promptingguide.ai/">website</a> to learn more about prompt engineering.</p>
<h2>LMM (Large Multimodal Model)</h2>
<p>A Large Multimodal Model is an advanced AI model that can process and understand multiple types of data, such as text, images, audio, and video. Unlike traditional Large Language Models (LLMs) that focus primarily on text, LMMs handle diverse data forms for a holistic interpretation. Examples include GPT-4o and Google Gemini, which can respond to both text and images.</p>
<h2>Inference</h2>
<p>Inference is the process of an LLM drawing conclusions (generating responses) based on the data it was trained on. So basically, inference is an AI model in action.</p>
<p>Think of it like studying vs taking the exam. During training, the model reads billions of documents and learns patterns. During inference, it applies that knowledge to answer your specific question.</p>
<h2>Fine tuning</h2>
<p>Fine tuning is a technique where a pre-trained AI model is further trained on a smaller, specialized dataset to adapt for a specific task, like coding. It leverages existing knowledge, enhancing performance without extensive retraining, which is useful when data is limited.</p>
<p>Think of it like hiring a smart generalist and then training them on your company's specific processes. They already know how to think and communicate - you're just teaching them your particular way of doing things.</p>
<h2>RAG</h2>
<p>Retrieval Augmented Generation (RAG) is a smart technique that helps LLMs provide more accurate and relevant answers by combining two key steps: searching for information and then generating a response. Think of RAG like a super-smart librarian who doesn't just recall information from memory, but actively searches through a vast library to find the most relevant sources before crafting an answer.</p>
<p><strong>The RAG Process:</strong></p>
<p><strong>Retrieval</strong>: Imagine you ask a question. The system first searches through a large database to find the most relevant information related to your query. This is similar to a librarian pulling out the most relevant books from the shelves.</p>
<p><strong>Augmentation</strong>: The LLM then takes those retrieved documents and summarizes or enhances the key information, making it more digestible. It’s like the librarian highlighting the most important passages in those books.</p>
<p><strong>Generation</strong>: Finally, the LLM uses this retrieved and augmented information to craft a precise, contextually accurate response. This means the answer is not just pulled from the LLM’s original training, but grounded in up-to-date and specific information.</p>
<p>The big advantage of RAG is that it helps solve a common problem with LLMs called <strong>hallucination</strong> - where an LLM might confidently provide incorrect information. By retrieving and using real, current information, RAG makes AI responses more reliable and accurate.</p>
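<p>The three RAG steps above can be sketched in a few lines of Python. The retriever here is naive word-overlap scoring standing in for a real vector search, the documents are made up, and the final LLM call is left as a stub:</p>

```python
# Minimal RAG sketch: retrieve -> augment -> (generate).
DOCUMENTS = [
    "The Eiffel Tower is 330 metres tall and located in Paris.",
    "Bitcoin reached a new all-time high in 2024.",
    "Paris is the capital of France.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Retrieval: score each document by how many query words it shares."""
    words = set(query.lower().split())
    scored = sorted(DOCUMENTS,
                    key=lambda d: len(words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query: str) -> str:
    """Augmentation: stuff the retrieved documents into the prompt."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Generation: the augmented prompt would now be sent to the LLM.
print(build_prompt("what is the capital of France"))
```

A production system would replace `retrieve` with an embedding lookup in a vector database, but the shape of the pipeline stays the same.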
<h2>Tools</h2>
<p>Tools are functions or APIs that agents use to interact with the environment, access information, or perform tasks, like web search or database queries, extending their capabilities beyond pre-trained knowledge.</p>
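<p>As a sketch of how a tool gets wired up: the schema below loosely mimics the OpenAI function-calling format, and both the model's decision and the Bitcoin price are stand-ins, not real API calls:</p>

```python
# Sketch of the tool-use loop: the model is described the available tools,
# decides to call one, and the agent code executes it and returns the result.
TOOLS = [{
    "name": "get_bitcoin_price",
    "description": "Return the current Bitcoin price in USD",
    "parameters": {"type": "object", "properties": {}},
}]

def get_bitcoin_price() -> float:
    return 97_000.0  # a real tool would call an exchange API here

def run_agent(model_reply: dict) -> str:
    """If the model asked for a tool that we expose, execute it."""
    call = model_reply.get("tool_call")
    if call and any(t["name"] == call for t in TOOLS):
        return f"Bitcoin is currently ${get_bitcoin_price():,.0f}"
    return model_reply.get("content", "")

# Pretend the model decided to call the tool:
print(run_agent({"tool_call": "get_bitcoin_price"}))
```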
<h2><strong>Parameters/Weights</strong></h2>
<p>Parameters (also called weights) are the numbers inside an AI model that determine how it processes information. When you hear "GPT-4 has 1.8 trillion parameters" or "Llama 3 is a 70B model," these numbers refer to how many adjustable values the model contains.</p>
<p>Think of parameters like the settings on a massive mixing board in a recording studio. Each knob affects how the final sound comes out. During training, the AI adjusts millions or billions of these "knobs" until it produces good outputs. More parameters generally means the model can capture more nuanced patterns, but also requires more computing power to run.</p>
<img src="https://cdn.hashnode.com/uploads/covers/679511833322eabaf4b6d4d7/92891f5d-f454-40f1-b555-b90d9485b777.png" alt="" style="display:block;margin:0 auto" />

<p>The relationship isn't linear though. A well-trained 8B model can outperform a poorly-trained 70B model. Quality of training data and techniques matter as much as raw size.</p>
<h2>Context Window</h2>
<p>The context window is the maximum amount of text an AI model can process at once. It's measured in tokens. If a model has a 128K context window, it can "see" roughly 100,000 words simultaneously.</p>
<p>Imagine reading a book, but you can only remember the last 50 pages. That's essentially what a limited context window does to an AI. Anything beyond that window effectively doesn't exist for the model.</p>
<p>Context windows have grown dramatically:</p>
<p>GPT-3 (2020): 2K tokens<br />GPT-4 Turbo (2023): 128K tokens<br />Claude 3 (2024): 200K tokens<br />Gemini 1.5 (2024): 1M+ tokens</p>
<p>Larger context windows enable use cases like analyzing entire codebases, summarizing long documents, or maintaining coherent conversations over extended sessions. The tradeoff is that longer contexts require more compute and can increase latency.</p>
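<p>A minimal sketch of working within a context budget: drop the oldest turns until the conversation fits. Token counts are approximated as word counts here; a real system would use the model's tokenizer:</p>

```python
# Keep a chat history inside a fixed token budget by dropping the oldest
# turns first. Word count stands in for a real token count.
def fit_to_window(messages: list[str], budget: int) -> list[str]:
    kept, used = [], 0
    for msg in reversed(messages):       # newest messages matter most
        cost = len(msg.split())
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))

history = ["turn one is quite long indeed", "turn two", "turn three"]
print(fit_to_window(history, budget=5))  # the oldest turn no longer fits
```

This is the crude version of what chat apps do silently: anything evicted from the window effectively stops existing for the model.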
<h2>Agentic AI</h2>
<p>Agentic AI refers to AI systems that can autonomously plan, reason, and take actions to accomplish goals, rather than just responding to single prompts. Instead of asking "write me an email," you might tell an agent "research competitors, draft a market analysis, and schedule a meeting with the team to discuss findings."</p>
<p>The key capabilities that make AI "agentic":</p>
<p>• <strong>Planning</strong>: Breaking down complex tasks into steps<br />• <strong>Tool Use</strong>: Calling APIs, searching the web, executing code<br />• <strong>Memory</strong>: Remembering context across interactions<br />• <strong>Reasoning</strong>: Deciding what to do next based on results<br />• <strong>Autonomy</strong>: Operating with minimal human intervention</p>
<p>A simple chatbot answers questions. An agent books your flight, checks your calendar for conflicts, sends confirmation to your team, and adds the trip to your expense tracker.</p>
<h2>MCP (Model Context Protocol)</h2>
<p>MCP is an open standard created by Anthropic that defines how AI models connect to external tools, data sources, and services. Think of it as USB-C for AI. Before USB-C, every device had its own charger. MCP aims to solve the same fragmentation problem for AI integrations.</p>
<p>Without MCP, connecting an AI to your calendar, database, or code editor requires custom integration work for each combination. With MCP, a tool built once works with any MCP-compatible AI system.</p>
<p>The protocol defines three core primitives:</p>
<p>• Tools: Functions the AI can call (e.g., send_email, search_database)<br />• Resources: Data the AI can access (e.g., files, documents)<br />• Prompts: Reusable templates for common tasks</p>
<p>Example: An MCP server for Notion exposes tools like "create_page" and "search_notes." Any MCP-compatible AI (Claude, GPT, local models) can then interact with Notion without needing Notion-specific code.</p>
<h2>Skills</h2>
<p>Skills are packaged capabilities that extend what an AI agent can do. A skill typically includes instructions, tools, and templates that enable the agent to perform a specific task well.</p>
<p>Think of skills like apps on your phone. Your phone's base OS can do a lot, but you install apps (skills) for specific tasks: photography, banking, navigation. Similarly, a base AI model can chat, but skills let it do specialized work like "analyze SEO," "generate images," or "manage your calendar."</p>
<p>A skill usually contains:</p>
<p>• <strong>Instructions</strong>: How to approach the task (prompts, best practices)<br />• <strong>Tools</strong>: APIs or functions needed (e.g., image generation API)<br />• <strong>Templates</strong>: Reusable formats for outputs<br />• <strong>Context</strong>: Domain knowledge relevant to the task</p>
<p>Skills enable modularity. Instead of one massive AI that tries to do everything, you compose smaller, focused skills as needed. This makes AI systems more maintainable and allows specialized skills to be shared across different agents.</p>
]]></content:encoded></item><item><title><![CDATA[Was ist Vector Database]]></title><description><![CDATA[Imagine you are at a library with hundreds of thousands of books. Instead of searching by exact title or keyword, you want to find all the books that are about the same idea as your question.Traditional databases can answer: “Find me books where the ...]]></description><link>https://mohittilwani.xyz/was-ist-vector-database</link><guid isPermaLink="true">https://mohittilwani.xyz/was-ist-vector-database</guid><category><![CDATA[vector database]]></category><category><![CDATA[Databases]]></category><category><![CDATA[AI]]></category><category><![CDATA[agentic AI]]></category><dc:creator><![CDATA[Mohit Tilwani]]></dc:creator><pubDate>Thu, 28 Aug 2025 20:36:15 GMT</pubDate><content:encoded><![CDATA[<p>Imagine you are at a library with hundreds of thousands of books. Instead of searching by exact title or keyword, you want to find all the books that are about the same idea as your question.<br />Traditional databases can answer: <em>“Find me books where the title contains ‘blockchain’.”</em><br />But what if you ask: <em>“Show me books that explain how money moves without banks”</em>? A keyword match might miss it.</p>
<p>This is where vectors come in.</p>
<p>AI models can convert text, images, or audio into vectors, which are basically long lists of numbers that capture meaning. Two pieces of content with similar meaning will have vectors that are close together in this high-dimensional number space.</p>
<p>A vector database is simply a database built to store these vectors and quickly find the ones most similar to your query.</p>
<p>The best way to think about it is as <strong>Google Maps for ideas → instead of distance between cafés, it’s distance between meanings.</strong></p>
<h2 id="heading-embedding-models">Embedding Models</h2>
<p>At the heart of every vector database is an embedding model. An embedding model takes an input like text, an image, or audio and converts it into a vector: a long list of numbers that captures its meaning.</p>
<p>Think of it as translation: just as Google Translate converts English into German, an embedding model converts human language into the “language of vectors.”</p>
<p>There are different kinds of embedding models depending on the particular app's use case. To name a few:</p>
<ul>
<li><p>Text Embeddings —&gt; semantic search, chatbots, classification, RAG.</p>
</li>
<li><p>Image Embeddings —&gt; “find similar images,” cross-modal search</p>
</li>
<li><p>Multimodal Embeddings —&gt; “find images that match a text description,” “align video with captions.”</p>
</li>
<li><p>Domain-Specific Embeddings —&gt; Specialized for legal, medical, financial, or code data</p>
</li>
</ul>
<p>There are many open-source models, and choosing one is among the most important decisions to be made early in the product development lifecycle.</p>
<h2 id="heading-embedding-dimensions">Embedding Dimensions</h2>
<p>The length of the vector depends on the model you choose. For example, if a model produces a 1536-dimensional vector, it means:</p>
<ul>
<li><p>Every “piece of text” you pass in is turned into an array of 1536 numbers.</p>
</li>
<li><p>Each number captures a tiny piece of information about the meaning of that text (kind of like “semantic ingredients”).</p>
</li>
<li><p>Together, these 1536 numbers form a position in a 1536-dimensional space.</p>
</li>
</ul>
<p>Imagine describing a fruit:</p>
<ul>
<li><p>In real life, you might use 3 dimensions (color, size, sweetness).</p>
</li>
<li><p>So “banana” might be (yellow=0.9, size=0.6, sweetness=0.8).</p>
</li>
</ul>
<p>Now, instead of 3 traits, an embedding model uses 1536 traits. They are not as intuitive as “color” or “size”; they are abstract semantic features learned from data. But the idea is the same: each text becomes a long numeric fingerprint.</p>
<p>Isn’t 1536 a very long array for a piece of text? Yes it is, but more dimensions also mean a richer representation. With 1536 numbers, the model can capture subtle differences in meaning (e.g., “bank” as a financial institution vs. “bank” of a river).</p>
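<p>The fruit analogy translates directly to code. Each fruit below is a point in a tiny 3-dimensional space (the trait values are invented for the demo), and "similar" just means "nearby"; real embeddings behave the same way with ~1536 learned dimensions:</p>

```python
import math

# Each fruit is a 3-dimensional "embedding": (colour, size, sweetness).
fruits = {
    "banana": (0.9, 0.6, 0.8),
    "lemon":  (0.9, 0.3, 0.1),
    "mango":  (0.8, 0.5, 0.9),
}

def nearest(query: tuple, items: dict) -> str:
    # The closest point in the space is the most similar item.
    return min(items, key=lambda name: math.dist(query, items[name]))

# A yellow, medium-sized, sweet fruit lands closest to banana:
print(nearest((0.88, 0.58, 0.82), fruits))
```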
<h2 id="heading-distance-metrics">Distance Metrics</h2>
<p>Okay, so we have our embeddings, but how does the database actually know which ones are closer to your question? Think of a vector database like Google Maps for ideas: to find the ‘nearest’ concepts, it needs a way to measure distance, not in kilometers but in terms of meaning. These measurement rules are called distance metrics.</p>
<p>There are some common distance metrics that are used by vector DBs to find the relevant results.</p>
<ul>
<li><p><strong>Cosine Similarity</strong></p>
<p>  Cosine similarity cares about the direction of meaning, not the length of the vector.</p>
</li>
<li><p><strong>Euclidean Distance</strong><br />  Euclidean distance measures how far two points are from each other in the vector space.</p>
</li>
</ul>
<p>Let’s say there are three users. The 1st user has watched 2 comedy and 2 action movies, the 2nd user has watched 20 comedy and 20 action movies, and the 3rd user has watched 10 comedy and 2 action movies.</p>
<p>If you use cosine similarity, the 1st and 2nd users are more similar compared to the 1st and 3rd. If you use Euclidean distance, the 1st and 3rd users are more similar. Cosine similarity cares about taste proportions; Euclidean distance cares about how many movies were watched.</p>
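<p>Running the movie example through both metrics confirms this. Each user is a vector of (comedy count, action count):</p>

```python
import math

# User vectors: (comedy_count, action_count)
u1, u2, u3 = (2, 2), (20, 20), (10, 2)

def cosine(a, b):
    # Angle-based similarity: 1.0 means identical direction (same taste mix).
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def euclidean(a, b):
    # Straight-line distance between the two points.
    return math.dist(a, b)

# Same taste proportions -> cosine says user 1 and user 2 are identical:
print(cosine(u1, u2), cosine(u1, u3))
# Similar watch counts -> Euclidean says user 1 and user 3 are closer:
print(euclidean(u1, u2), euclidean(u1, u3))
```

Which metric is "right" depends on the application: recommendation systems usually care about proportions (cosine), while some clustering tasks care about magnitudes (Euclidean).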
<h2 id="heading-metadata-filtering">Metadata Filtering</h2>
<p>When you store embeddings in a vector database, you don’t just store the vector. You usually also attach metadata: extra information about where the text came from, who owns it, when it was added, etc.</p>
<p>Metadata filtering means when searching for similar vectors, you can also apply conditions on this extra information. It’s like saying: <em>“Find me the most relevant results, but only from project X, written in English, and after 2023.”</em></p>
<p>Without metadata filters, you would risk retrieving semantically similar but irrelevant results.</p>
<p><strong>Why It Matters</strong></p>
<ul>
<li><p>Keeps results contextual and secure (e.g., tenant-based access).</p>
</li>
<li><p>Reduces noise (e.g., don’t show outdated info).</p>
</li>
<li><p>Saves latency (search smaller candidate pool).</p>
</li>
</ul>
<p>Let’s consider an e-commerce example where the data stored in vector DB looks like below</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"id"</span>: <span class="hljs-string">"p_902"</span>,
  <span class="hljs-attr">"vector"</span>: [...],
  <span class="hljs-attr">"text"</span>: <span class="hljs-string">"Leather sneakers with memory foam soles"</span>,
  <span class="hljs-attr">"metadata"</span>: {
    <span class="hljs-attr">"category"</span>: <span class="hljs-string">"shoes"</span>,
    <span class="hljs-attr">"price"</span>: <span class="hljs-number">120</span>,
    <span class="hljs-attr">"brand"</span>: <span class="hljs-string">"Nike"</span>,
    <span class="hljs-attr">"gender"</span>: <span class="hljs-string">"men"</span>
  }
}
</code></pre>
<p>A user queries “comfortable men’s shoes”. The vector search finds sneakers as semantically relevant, and the filter gender = “men” is applied on top. The user sees only men’s shoes, not irrelevant women’s high heels.</p>
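<p>A minimal sketch of filtered search: apply the metadata conditions first, then rank the survivors by similarity. The similarity scores below are pretend values rather than computed embeddings:</p>

```python
# Filtered vector search: metadata conditions narrow the candidate pool,
# then the survivors are ranked by (pretend) similarity score.
products = [
    {"text": "Leather sneakers with memory foam soles",
     "metadata": {"category": "shoes", "gender": "men"},   "score": 0.91},
    {"text": "Red stiletto high heels",
     "metadata": {"category": "shoes", "gender": "women"}, "score": 0.88},
    {"text": "Men's running shorts",
     "metadata": {"category": "apparel", "gender": "men"}, "score": 0.52},
]

def search(items, top_k=5, **filters):
    # Keep only items whose metadata satisfies every filter condition.
    matches = [p for p in items
               if all(p["metadata"].get(k) == v for k, v in filters.items())]
    return sorted(matches, key=lambda p: p["score"], reverse=True)[:top_k]

# With the gender and category filters applied, heels never surface:
for hit in search(products, gender="men", category="shoes"):
    print(hit["text"])
```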
<h2 id="heading-hybrid-search">Hybrid Search</h2>
<p>When you search in a vector database, you can do it in two ways:</p>
<ol>
<li><p><strong>Keyword search (sparse search)</strong></p>
<ul>
<li><p>Looks for exact words that match.</p>
</li>
<li><p>Great for things like names, IDs, code, or very specific terms.</p>
</li>
<li><p>Example: If you search for <em>“iPhone 16 Pro”</em>, keyword search will catch documents with the exact phrase.</p>
</li>
</ul>
</li>
<li><p><strong>Vector search (dense search)</strong></p>
<ul>
<li><p>Looks for <em>meaning</em>, not exact words.</p>
</li>
<li><p>Great for natural language queries.</p>
</li>
<li><p>Example: If you search for <em>“latest Apple smartphone”</em>, vector search can still match documents about <em>“iPhone 16 Pro”</em> even if those words aren’t used.</p>
</li>
</ul>
</li>
</ol>
<p>The problem is that keyword search alone misses documents if the wording is different, and vector search alone sometimes misses exact matches (like product codes or rare keywords). Hybrid search combines both methods: it looks at exact keywords and semantic meaning, then merges the results.</p>
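<p>One common way to merge the two is a weighted combination of the scores (another is reciprocal rank fusion). In this sketch the semantic scores are pretend values standing in for embedding similarities:</p>

```python
# Hybrid search sketch: blend a keyword score with a semantic score.
def keyword_score(query: str, doc: str) -> float:
    # Fraction of query words that appear verbatim in the document.
    q = set(query.lower().split())
    return len(q & set(doc.lower().split())) / len(q)

def hybrid_rank(query, docs, semantic_scores, alpha=0.5):
    """alpha=1.0 -> pure keyword search, alpha=0.0 -> pure vector search."""
    combined = {
        doc: alpha * keyword_score(query, doc) + (1 - alpha) * semantic_scores[doc]
        for doc in docs
    }
    return sorted(combined, key=combined.get, reverse=True)

docs = ["iPhone 16 Pro launch", "latest Apple smartphone review"]
semantic = {docs[0]: 0.9, docs[1]: 0.95}   # pretend embedding similarities
print(hybrid_rank("iPhone 16 Pro", docs, semantic))
```

Tuning `alpha` lets you decide how much exact matching matters versus meaning, which is exactly the knob hybrid search gives you.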
<h2 id="heading-how-vector-databases-stay-fast">How Vector Databases Stay Fast</h2>
<p>If you had 10 documents, you could brute-force compare your query embedding to each one which is not efficient but still would be pretty fast. But what if you had 100 million vectors?</p>
<p>Doing “compare with every vector” would be too slow and too expensive. That’s why vector databases use Approximate Nearest Neighbor (ANN) search. Instead of scanning everything, they use clever shortcuts to jump directly to the “neighborhood” where the answer probably is. It’s like finding the nearest Starbucks by checking your neighborhood first instead of the whole city. You sacrifice a tiny bit of accuracy for huge speed gains.</p>
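<p>Here's a toy version of that shortcut: a crude locality-sensitive hash that buckets vectors by the sign pattern of a few random projections, so a query only scans its own bucket. Real ANN indexes (HNSW, IVF, etc.) are far more sophisticated than this:</p>

```python
import math
import random

# Toy ANN index: hash each vector into a bucket via random projections,
# then search only the query's bucket instead of the whole collection.
random.seed(0)
DIM, N_PLANES = 4, 3
planes = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N_PLANES)]

def bucket(v):
    # Sign pattern of the dot products with each random hyperplane.
    return tuple(sum(p * x for p, x in zip(plane, v)) >= 0 for plane in planes)

def build_index(vectors):
    index = {}
    for v in vectors:
        index.setdefault(bucket(v), []).append(v)
    return index

def ann_search(index, query):
    candidates = index.get(bucket(query), [])   # only one bucket is scanned
    return min(candidates, key=lambda v: math.dist(v, query), default=None)
```

Nearby vectors tend to fall on the same side of the random hyperplanes, so they land in the same bucket, which is where the "check your neighborhood first" speedup comes from.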
]]></content:encoded></item><item><title><![CDATA[Juicebox 🧃]]></title><description><![CDATA[While exploring options for securely storing encryption keys in a decentralized and trust-minimized way, I came across Juicebox — and it was exactly what I was looking for. I wanted a solution that wouldn't rely on centralized storage or risky backup...]]></description><link>https://mohittilwani.xyz/juicebox</link><guid isPermaLink="true">https://mohittilwani.xyz/juicebox</guid><category><![CDATA[Cryptography]]></category><dc:creator><![CDATA[Mohit Tilwani]]></dc:creator><pubDate>Sun, 06 Jul 2025 22:00:00 GMT</pubDate><content:encoded><![CDATA[<p>While exploring options for securely storing encryption keys in a decentralized and trust-minimized way, I came across <a target="_blank" href="https://juicebox.xyz/"><strong>Juicebox</strong></a> — and it was exactly what I was looking for. I wanted a solution that wouldn't rely on centralized storage or risky backups, but also wouldn't require users to remember anything more than a simple PIN.</p>
<h3 id="heading-the-problem-juicebox-solves">The Problem Juicebox Solves</h3>
<p>Backups are hard and risky. If a user loses their device, they lose access to their secret. If someone steals their backup, they can take everything. What's needed is a solution that:</p>
<ul>
<li><p>Doesn't rely on centralized trust</p>
</li>
<li><p>Protects against brute-force attacks</p>
</li>
<li><p>Doesn't require remembering long passwords or writing down recovery phrases</p>
</li>
<li><p>Works across multiple servers, devices, and threat models</p>
</li>
</ul>
<p>Juicebox delivers on all of this by combining cutting-edge cryptography with a developer-friendly design.</p>
<h3 id="heading-how-juicebox-works-in-simple-terms">How Juicebox Works (in Simple Terms)</h3>
<p>Juicebox breaks your secret into multiple pieces and distributes them to different servers, called <strong>realms</strong>. To recover the secret, a user must interact with a threshold number of realms (e.g., 2 out of 3). Each realm contributes a piece of the puzzle, but no single realm can access the full secret.</p>
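<p>The simplest possible illustration of splitting a secret into pieces is XOR sharing, where all shares are needed to reconstruct and any incomplete subset reveals nothing. Note this is an n-of-n toy, not the threshold (e.g., 2-of-3) scheme Juicebox actually uses, and it omits the PIN and T-OPRF machinery entirely:</p>

```python
import secrets

# Toy secret splitting via XOR shares. This is n-of-n (ALL shares are
# required), whereas Juicebox uses a threshold scheme combined with a
# T-OPRF; none of that machinery is modelled here.
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def split(secret: bytes, n: int) -> list[bytes]:
    # n-1 shares are pure randomness; the last one "absorbs" the secret.
    shares = [secrets.token_bytes(len(secret)) for _ in range(n - 1)]
    last = secret
    for s in shares:
        last = xor_bytes(last, s)
    return shares + [last]

def reconstruct(shares: list[bytes]) -> bytes:
    out = shares[0]
    for s in shares[1:]:
        out = xor_bytes(out, s)
    return out

shares = split(b"my-encryption-key", 3)
print(reconstruct(shares))  # any subset smaller than all 3 reveals nothing
```

Each share on its own is indistinguishable from random bytes, which is the property that lets the realms hold pieces without any single realm learning the secret.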
<p>To recover a secret, the user just enters their PIN. Juicebox performs a threshold-based <strong>Oblivious Pseudorandom Function</strong> (T-OPRF), allowing the realms to validate the PIN <strong>without ever seeing it</strong>. Once the correct PIN is validated, the user gets the pieces needed to reconstruct the secret.</p>
<h3 id="heading-realms-the-core-of-juiceboxs-security">Realms: The Core of Juicebox's Security</h3>
<p>There are two types of realms:</p>
<ul>
<li><p><strong>Software Realms</strong>: Easy to deploy, run on commodity cloud infrastructure.</p>
</li>
<li><p><strong>Hardware Realms</strong>: Backed by physical HSMs (like Entrust nShield), offering tamper resistance and brute-force protection.</p>
</li>
</ul>
<p>You can mix and match realms. For example, a 2-of-3 setup could involve 2 software realms and 1 hardware realm. This flexibility allows you to design a trust model that fits your app's security profile.</p>
<h3 id="heading-built-in-brute-force-protection">Built-in Brute-force Protection</h3>
<p>Juicebox doesn’t just validate PINs; it defends them. Each secret is protected by a maximum guess count. If a user enters the wrong PIN too many times, the share at that realm becomes unrecoverable. This means even if an attacker gets access to all realms, they can't brute-force their way into a user's secret.</p>
<h3 id="heading-use-cases">Use Cases</h3>
<ul>
<li><p><strong>Crypto wallets</strong>: Replace 12-word seed phrases with a simple PIN.</p>
</li>
<li><p><strong>Secure messaging</strong>: Recover encryption keys securely even after device loss.</p>
</li>
</ul>
<h3 id="heading-resources"><strong>Resources:</strong></h3>
<ul>
<li><p><a target="_blank" href="https://github.com/juicebox-systems">Juicebox Protocol on GitHub</a></p>
</li>
<li><p><a target="_blank" href="https://juicebox.xyz/blog/unlock-magic-blending-juicebox-with-your-app">Running a Software Realm</a></p>
</li>
<li><p><a target="_blank" href="https://juicebox.xyz/blog/running-a-hsm-realm">Running a Hardware Realm</a></p>
</li>
<li><p><a target="_blank" href="https://juicebox.xyz/assets/whitepapers/juiceboxprotocol_revision7_20230807.pdf">Juicebox Whitepaper</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Model Context Protocol]]></title><description><![CDATA[There is a new kid on the block called MCP, which stands for Model Context Protocol. Before we deep dive into what MCP is, we should understand how LLMs get their knowledge in the first place to be able to spit out responses to the user.
How LLMs get their kno...]]></description><link>https://mohittilwani.xyz/model-context-protocol</link><guid isPermaLink="true">https://mohittilwani.xyz/model-context-protocol</guid><category><![CDATA[AI]]></category><category><![CDATA[mcp]]></category><dc:creator><![CDATA[Mohit Tilwani]]></dc:creator><pubDate>Sat, 08 Mar 2025 17:18:33 GMT</pubDate><content:encoded><![CDATA[<p>There is a new kid on the block called MCP, which stands for Model Context Protocol. Before we deep dive into what MCP is, we should understand how LLMs get their knowledge in the first place to be able to spit out responses to the user.</p>
<h2 id="heading-how-llms-get-their-knowledge">How LLMs get their knowledge</h2>
<p>LLMs are basically trained on internet data. So you can imagine all the websites, images, videos, etc. being used by the companies creating the models to train their LLMs.</p>
<p>LLMs have two training phases: pre-training and post-training. In the pre-training phase, all the internet data is fed into the model to train it, and in the post-training phase, the model is trained to be helpful to humans by feeding it conversations, which in turn helps the model develop a persona.</p>
<p>Now, the pre-training phase is very expensive, like expensive expensive. Hence, AI engineers came up with a novel approach called tools. Using tools, you can fetch real-time information, like the price of Bitcoin, and provide it to the LLM, and this way we don’t need to redo the pre-training phase every week.</p>
<p>Now that you understand how real-time knowledge can be fed into LLMs, let’s discuss the problem that MCP is trying to solve. Please read my previous <a target="_blank" href="https://mohittilwani.xyz/simple-af-ai-agent">article</a>, where I explained tools in a bit more detail, before proceeding.</p>
<h2 id="heading-problem-to-be-solved">Problem to be solved</h2>
<p>Now that we have tools that can feed real-time information to LLMs, the next problem to solve is creating those tools. Imagine you want to create a web application and NO libraries exist out there. It’s just you and JavaScript. Yikes, right!</p>
<p>Now imagine there are open-source tools for pretty much anything you can think of, like Notion, AWS, Jira, Postgres, etc., which you can integrate into your AI agents. Boom! Your agents are now capable of fetching all the information from the MCP server you just added. Maybe the diagram below will help you understand how mind-boggling it is.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1741117545537/8023dea9-d156-462a-95a8-fcabda5ab7ff.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-official-introduction-of-mcp">Official Introduction of MCP</h2>
<p>MCP is an open protocol that standardizes how applications provide context to LLMs. Anthropic provides the best analogy to think of MCP <em>like a USB-C port for AI applications. Just as USB-C provides a standardized way to connect your devices to various peripherals and accessories, MCP provides a standardized way to connect AI models to different data sources and tools.</em></p>
<h2 id="heading-mcp-components">MCP Components</h2>
<ul>
<li><p><strong>MCP Hosts</strong>: The <strong>MCP Host</strong> is at the core of the system. It can be an application like a chat assistant, an IDE-integrated code assistant, or any AI-powered tool requiring access to external data. The host includes one or more MCP clients.</p>
</li>
<li><p><strong>MCP Clients</strong>: The <strong>MCP Client</strong> operates within the MCP Host, acting as an intermediary between the host application and the MCP Server. It facilitates communication and ensures that relevant tools and data sources are requested correctly.</p>
</li>
<li><p><strong>MCP Servers</strong>: The <strong>MCP Server</strong> serves as a bridge between the MCP Client and external data sources.</p>
</li>
<li><p><strong>Local Data Sources</strong>: Your computer’s files, databases, and services that MCP servers can securely access</p>
</li>
<li><p><strong>Remote Services</strong>: External systems available over the internet (e.g., through APIs) that MCP servers can connect to</p>
</li>
</ul>
<h2 id="heading-current-limitations"><strong>Current Limitations</strong></h2>
<p>At the time of writing this, MCP doesn’t officially support remote MCP servers, though there is a product called <a target="_blank" href="https://composio.dev/mcp/">Composio</a> which offers remote MCP servers. However, it is not an official implementation of remotely hosted MCP servers by Anthropic (the company behind the Claude LLM), which originally developed MCP.</p>
<p>So you can only run MCP servers locally to be able to take advantage of them. Tools like Claude chat, Cursor etc. act as the MCP Host &amp; Client, and you can run an MCP server locally for them to connect to.</p>
]]></content:encoded></item><item><title><![CDATA[ABC of Multi Agent]]></title><description><![CDATA[So in the last article we created a very simple agent which takes location as an input from the user and uses the necessary tools to provide a response to the user. But let’s be honest—the response was pretty bland. Large Language Models (LLMs) are k...]]></description><link>https://mohittilwani.xyz/abc-of-multi-agent</link><guid isPermaLink="true">https://mohittilwani.xyz/abc-of-multi-agent</guid><category><![CDATA[AI]]></category><category><![CDATA[multi-agent]]></category><dc:creator><![CDATA[Mohit Tilwani]]></dc:creator><pubDate>Fri, 31 Jan 2025 18:42:50 GMT</pubDate><content:encoded><![CDATA[<p>So in the last <a target="_blank" href="https://mohittilwani.xyz/simple-af-ai-agent">article</a> we created a very simple agent which takes location as an input from the user and uses the necessary tools to provide a response to the user. But let’s be honest—the response was pretty bland. Large Language Models (LLMs) are known for their engaging, conversational nature, so let’s take things up a notch. This time, we’ll create our first multi-agent system, allowing us to generate richer, more dynamic responses for our users.</p>
<h2 id="heading-why-multi-agent">Why Multi Agent?</h2>
<p>It's very important to answer "why" before making any system design decisions (or even life decisions 🤷‍♂️) to ensure we're not using a sword to cut a carrot.</p>
<p>First, let's review what an <strong>Agent</strong> is. An agent is a highly skilled helper that knows how to perform certain tasks very well to satisfy its master. If you teach this agent many tasks, as you add more capabilities over time, it might become difficult to manage and could lower the quality of its work due to task confusion. This is similar to the teams you work with now—some engineers are great at coding but not as good at presenting, while managers might excel at planning but not at marketing, etc.</p>
<p>The leaner your agent is, the better it will perform the task at hand, and it will be easier to manage. However, there are drawbacks. The more agents you create, the more latency there will be in providing a response to the end user, as you need to identify the right agent for the task, and that agent might rely on another agent to complete it. So, there's no right or wrong answer; it comes down to understanding the pros and cons and making a decision based on that. Now that you know the why, let's start cooking 👨‍🍳</p>
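<p>To make the trade-off concrete, here is a minimal, framework-free sketch of the routing idea. The agent names and keyword rules below are hypothetical, just to show the shape of "identify the right agent, then delegate":</p>

```python
# Hypothetical sketch: each agent is lean and single-purpose,
# and a router picks the right one for the incoming task.

def weather_agent(query: str) -> str:
    return f"weather report for: {query}"

def travel_agent(query: str) -> str:
    return f"itinerary for: {query}"

AGENTS = {
    "weather": weather_agent,
    "travel": travel_agent,
}

def route(query: str) -> str:
    """Identify the right agent for the task, then delegate to it."""
    for keyword, agent in AGENTS.items():
        if keyword in query.lower():
            return agent(query)
    raise ValueError("no agent can handle this task")
```

<p>Every hop like this (router, then agent, possibly another agent) adds latency, which is exactly the drawback mentioned above.</p>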
<h2 id="heading-objective">Objective</h2>
<p>We will create a response agent that will take JSON input from the weather agent to generate a response for the user.</p>
<h2 id="heading-high-level-flow">High Level Flow</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1738340399896/5f148f93-2706-45e9-a7e0-56b691de6c9b.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-response-agent">Response Agent</h2>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> pydantic_ai <span class="hljs-keyword">import</span> Agent

response_agent = Agent(
    <span class="hljs-string">'openai:gpt-4o'</span>,
    system_prompt=(
        <span class="hljs-string">'Provide a detailed weather description based on the response from the weather agent. '</span>
        <span class="hljs-string">'Include temperature, humidity, wind speed, and any notable conditions like rain, snow, or storms. '</span>
        <span class="hljs-string">'Describe how the weather might feel to a person, such as whether it is comfortable, chilly, or humid.'</span>
    ),
    retries=<span class="hljs-number">2</span>,
    deps_type=str,
)
</code></pre>
<h2 id="heading-weather-agent">Weather Agent</h2>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> __future__ <span class="hljs-keyword">import</span> annotations <span class="hljs-keyword">as</span> _annotations
<span class="hljs-keyword">from</span> dataclasses <span class="hljs-keyword">import</span> dataclass
<span class="hljs-keyword">import</span> logfire
<span class="hljs-keyword">from</span> httpx <span class="hljs-keyword">import</span> AsyncClient
<span class="hljs-keyword">from</span> pydantic_ai <span class="hljs-keyword">import</span> Agent, ModelRetry, RunContext

<span class="hljs-keyword">from</span> response_generator_agent <span class="hljs-keyword">import</span> response_agent

<span class="hljs-meta">@dataclass</span>
<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Deps</span>:</span>
    client: AsyncClient
    weather_api_key: str | <span class="hljs-literal">None</span>
    geo_api_key: str | <span class="hljs-literal">None</span>


weather_agent = Agent(
    <span class="hljs-string">'openai:gpt-4o'</span>,
    system_prompt=(
        <span class="hljs-string">'Use the `get_lat_lng` tool to get the latitude and longitude of the locations, '</span>
        <span class="hljs-string">'then use the `get_weather` tool to get the weather, '</span>
        <span class="hljs-string">'then send the response from the `get_weather` tool as JSON'</span>
    ),
    deps_type=Deps,
    retries=<span class="hljs-number">2</span>,
)


<span class="hljs-meta">@weather_agent.tool</span>
<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_lat_lng</span>(<span class="hljs-params">
    ctx: RunContext[Deps], location_description: str
</span>) -&gt; dict[str, float]:</span>
    <span class="hljs-string">"""Get the latitude and longitude of a location.

    Args:
        ctx: The context.
        location_description: A description of a location.
    """</span>
    <span class="hljs-keyword">if</span> ctx.deps.geo_api_key <span class="hljs-keyword">is</span> <span class="hljs-literal">None</span>:
        <span class="hljs-comment"># if no API key is provided, return a dummy response (London)</span>
        <span class="hljs-keyword">return</span> {<span class="hljs-string">'lat'</span>: <span class="hljs-number">51.1</span>, <span class="hljs-string">'lng'</span>: <span class="hljs-number">-0.1</span>}

    params = {
        <span class="hljs-string">'q'</span>: location_description,
        <span class="hljs-string">'api_key'</span>: ctx.deps.geo_api_key,
    }
    <span class="hljs-keyword">with</span> logfire.span(<span class="hljs-string">'calling geocode API'</span>, params=params) <span class="hljs-keyword">as</span> span:
        r = <span class="hljs-keyword">await</span> ctx.deps.client.get(<span class="hljs-string">'https://geocode.maps.co/search'</span>, params=params)
        r.raise_for_status()
        data = r.json()
        span.set_attribute(<span class="hljs-string">'response'</span>, data)

    <span class="hljs-keyword">if</span> data:
        <span class="hljs-keyword">return</span> {<span class="hljs-string">'lat'</span>: data[<span class="hljs-number">0</span>][<span class="hljs-string">'lat'</span>], <span class="hljs-string">'lng'</span>: data[<span class="hljs-number">0</span>][<span class="hljs-string">'lon'</span>]}
    <span class="hljs-keyword">else</span>:
        <span class="hljs-keyword">raise</span> ModelRetry(<span class="hljs-string">'Could not find the location'</span>)


<span class="hljs-meta">@weather_agent.tool</span>
<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_weather</span>(<span class="hljs-params">ctx: RunContext[Deps], lat: float, lng: float</span>) -&gt; dict[str, str]:</span>
    <span class="hljs-string">"""Get the weather at a location.

    Args:
        ctx: The context.
        lat: Latitude of the location.
        lng: Longitude of the location.
    """</span>
    <span class="hljs-keyword">if</span> ctx.deps.weather_api_key <span class="hljs-keyword">is</span> <span class="hljs-literal">None</span>:
        <span class="hljs-comment"># if no API key is provided, return a dummy response</span>
        <span class="hljs-keyword">return</span> {<span class="hljs-string">'temperature'</span>: <span class="hljs-string">'21 °C'</span>, <span class="hljs-string">'description'</span>: <span class="hljs-string">'Sunny'</span>}

    params = {
        <span class="hljs-string">'apikey'</span>: ctx.deps.weather_api_key,
        <span class="hljs-string">'location'</span>: <span class="hljs-string">f'<span class="hljs-subst">{lat}</span>,<span class="hljs-subst">{lng}</span>'</span>,
        <span class="hljs-string">'units'</span>: <span class="hljs-string">'metric'</span>,
    }
    <span class="hljs-keyword">with</span> logfire.span(<span class="hljs-string">'calling weather API'</span>, params=params) <span class="hljs-keyword">as</span> span:
        r = <span class="hljs-keyword">await</span> ctx.deps.client.get(
            <span class="hljs-string">'https://api.tomorrow.io/v4/weather/realtime'</span>, params=params
        )
        r.raise_for_status()
        data = r.json()
        span.set_attribute(<span class="hljs-string">'response'</span>, data)

    values = data[<span class="hljs-string">'data'</span>][<span class="hljs-string">'values'</span>]
    <span class="hljs-comment"># https://docs.tomorrow.io/reference/data-layers-weather-codes</span>
    code_lookup = {
        <span class="hljs-number">1000</span>: <span class="hljs-string">'Clear, Sunny'</span>,
        <span class="hljs-number">1100</span>: <span class="hljs-string">'Mostly Clear'</span>,
        <span class="hljs-number">1101</span>: <span class="hljs-string">'Partly Cloudy'</span>,
        <span class="hljs-number">1102</span>: <span class="hljs-string">'Mostly Cloudy'</span>,
        <span class="hljs-number">1001</span>: <span class="hljs-string">'Cloudy'</span>,
        <span class="hljs-number">2000</span>: <span class="hljs-string">'Fog'</span>,
        <span class="hljs-number">2100</span>: <span class="hljs-string">'Light Fog'</span>,
        <span class="hljs-number">4000</span>: <span class="hljs-string">'Drizzle'</span>,
        <span class="hljs-number">4001</span>: <span class="hljs-string">'Rain'</span>,
        <span class="hljs-number">4200</span>: <span class="hljs-string">'Light Rain'</span>,
        <span class="hljs-number">4201</span>: <span class="hljs-string">'Heavy Rain'</span>,
        <span class="hljs-number">5000</span>: <span class="hljs-string">'Snow'</span>,
        <span class="hljs-number">5001</span>: <span class="hljs-string">'Flurries'</span>,
        <span class="hljs-number">5100</span>: <span class="hljs-string">'Light Snow'</span>,
        <span class="hljs-number">5101</span>: <span class="hljs-string">'Heavy Snow'</span>,
        <span class="hljs-number">6000</span>: <span class="hljs-string">'Freezing Drizzle'</span>,
        <span class="hljs-number">6001</span>: <span class="hljs-string">'Freezing Rain'</span>,
        <span class="hljs-number">6200</span>: <span class="hljs-string">'Light Freezing Rain'</span>,
        <span class="hljs-number">6201</span>: <span class="hljs-string">'Heavy Freezing Rain'</span>,
        <span class="hljs-number">7000</span>: <span class="hljs-string">'Ice Pellets'</span>,
        <span class="hljs-number">7101</span>: <span class="hljs-string">'Heavy Ice Pellets'</span>,
        <span class="hljs-number">7102</span>: <span class="hljs-string">'Light Ice Pellets'</span>,
        <span class="hljs-number">8000</span>: <span class="hljs-string">'Thunderstorm'</span>,
    }
    <span class="hljs-keyword">return</span> {
        **values,
        <span class="hljs-string">'description'</span>: code_lookup.get(values[<span class="hljs-string">'weatherCode'</span>], <span class="hljs-string">'Unknown'</span>),
    }
</code></pre>
<h2 id="heading-main-file">Main File</h2>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> asyncio
<span class="hljs-keyword">import</span> os
<span class="hljs-keyword">from</span> httpx <span class="hljs-keyword">import</span> AsyncClient
<span class="hljs-keyword">import</span> logfire
<span class="hljs-keyword">from</span> dotenv <span class="hljs-keyword">import</span> load_dotenv

<span class="hljs-comment"># Load environment variables</span>
load_dotenv()

<span class="hljs-keyword">from</span> weather_agent <span class="hljs-keyword">import</span> weather_agent, Deps
<span class="hljs-keyword">from</span> response_generator_agent <span class="hljs-keyword">import</span> response_agent

<span class="hljs-comment"># 'if-token-present' means nothing will be sent (and the example will work) if you don't have logfire configured</span>
logfire.configure(send_to_logfire=<span class="hljs-string">'if-token-present'</span>)


<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">main</span>():</span>
    <span class="hljs-keyword">async</span> <span class="hljs-keyword">with</span> AsyncClient() <span class="hljs-keyword">as</span> client:
        <span class="hljs-comment"># Get location from user input</span>
        location = input(<span class="hljs-string">"Enter a location: "</span>)
        <span class="hljs-comment"># create a free API key at https://www.tomorrow.io/weather-api/</span>
        weather_api_key = os.getenv(<span class="hljs-string">'WEATHER_API_KEY'</span>)
        <span class="hljs-comment"># create a free API key at https://geocode.maps.co/</span>
        geo_api_key = os.getenv(<span class="hljs-string">'GEO_API_KEY'</span>)
        deps = Deps(
            client=client, weather_api_key=weather_api_key, geo_api_key=geo_api_key
        )
        result = <span class="hljs-keyword">await</span> weather_agent.run(
            <span class="hljs-string">f'What is the weather like in <span class="hljs-subst">{location}</span>?'</span>, deps=deps
        )
        response = <span class="hljs-keyword">await</span> response_agent.run(
            <span class="hljs-string">f'Please generate a response based on the weather data for <span class="hljs-subst">{location}</span>: <span class="hljs-subst">{result.data}</span>'</span>,
            deps=result.data,
        )
        print(<span class="hljs-string">'Response:'</span>, response.data)


<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    asyncio.run(main())
</code></pre>
<h2 id="heading-output">Output</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1738346799727/be8cfac4-c64e-4f2a-abf4-f27664fdca93.png" alt class="image--center mx-auto" /></p>
<p>The code for the ABC of multi agent is available <a target="_blank" href="https://github.com/MohitTilwani15/AI-Agent-Series/tree/main/abc-multi-agent"><strong>here</strong></a>.</p>
]]></content:encoded></item><item><title><![CDATA[Simple AF AI Agent]]></title><description><![CDATA[The best way to learn something is by doing it. Instead of diving deep into concepts from get go, let's jump straight into building an agent and work backward to understand each component.
Objective
We will build a weather agent. Sure, it may not be ...]]></description><link>https://mohittilwani.xyz/simple-af-ai-agent</link><guid isPermaLink="true">https://mohittilwani.xyz/simple-af-ai-agent</guid><category><![CDATA[crypto]]></category><category><![CDATA[AI]]></category><category><![CDATA[#ai-tools]]></category><dc:creator><![CDATA[Mohit Tilwani]]></dc:creator><pubDate>Sun, 26 Jan 2025 21:06:53 GMT</pubDate><content:encoded><![CDATA[<p>The best way to learn something is by doing it. Instead of diving deep into concepts from the get-go, let's jump straight into building an agent and work backward to understand each component.</p>
<h2 id="heading-objective">Objective</h2>
<p>We will build a weather agent. Sure, it may not be the most creative project (and it might feel a bit painful for those enduring gloomy European weather), but it is a great way to learn how to build an AI agent. And, who knows? You might reuse it for summer forecasts ☀️</p>
<p>The weather agent will take a location as input and provide the current weather details.</p>
<h2 id="heading-show-time">Show Time</h2>
<p>Let’s start by creating an agent using <a target="_blank" href="https://ai.pydantic.dev">Pydantic AI</a></p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> pydantic_ai <span class="hljs-keyword">import</span> Agent
<span class="hljs-keyword">from</span> httpx <span class="hljs-keyword">import</span> AsyncClient
<span class="hljs-keyword">from</span> dataclasses <span class="hljs-keyword">import</span> dataclass

<span class="hljs-meta">@dataclass</span>
<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Deps</span>:</span>
    client: AsyncClient
    weather_api_key: str | <span class="hljs-literal">None</span>
    geo_api_key: str | <span class="hljs-literal">None</span>

weather_agent = Agent(
    <span class="hljs-string">'openai:gpt-4o'</span>,
    system_prompt=(
        <span class="hljs-string">'Be concise, reply with one sentence. '</span>
        <span class="hljs-string">'Use the `get_lat_lng` tool to get the latitude and longitude of the locations, '</span>
        <span class="hljs-string">'then use the `get_weather` tool to get the weather.'</span>
    ),
    retries=<span class="hljs-number">2</span>,
    deps_type=Deps,
)
</code></pre>
<p>In the above code, we instantiated the <code>Agent</code> with several parameters</p>
<ul>
<li><p><code>openai:gpt-4o</code> This is the name of the LLM model that you want to use for the agent. Pydantic AI <a target="_blank" href="https://ai.pydantic.dev/models/">supports</a> many other models.</p>
</li>
<li><p><code>system_prompt</code> Prompt set by the developer to instruct the LLM what it needs to do. As you can see in the above prompt, I am instructing the LLM to reply concisely and use some tools. We will dive deeper into tools shortly.</p>
</li>
<li><p><code>retries</code> A feature offered by Pydantic AI to retry if an error occurs while generating the response, similar to the retry mechanisms many HTTP libraries provide.</p>
</li>
<li><p><code>deps_type</code> Specifies the dependency type used by the agent. We will discuss this later in the post.</p>
</li>
</ul>
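<p>The <code>retries</code> behaviour above can be sketched in plain Python. This is not Pydantic AI's actual implementation, just the general pattern: re-run the call when it raises a retryable error, up to the configured number of retries:</p>

```python
# Plain-Python sketch of the retry idea (the names here are
# illustrative, not pydantic_ai internals).

class RetryableError(Exception):
    """Signals that the failed call is worth retrying."""

def run_with_retries(fn, retries: int = 2):
    last_error = None
    for _ in range(retries + 1):  # first attempt + `retries` retries
        try:
            return fn()
        except RetryableError as exc:
            last_error = exc
    raise last_error
```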
<h3 id="heading-tools-our-keys-to-the-outside-world">Tools: Our keys to the outside world</h3>
<p>LLMs are trained on the data available on the internet. This has two limitations:</p>
<ol>
<li><p><strong>Data Staleness</strong>: When we use LLMs in the present, they were trained on data from the past. The world doesn’t stop; it generates millions of terabytes of information every hour that the LLM isn’t aware of.</p>
</li>
<li><p><strong>Limited Scope</strong>: LLMs can only use data that was accessible during their training. For instance, OpenAI models no longer have access to X.com data.</p>
</li>
</ol>
<p>Oh no, what to do now? Don’t you worry fam, we got some <strong>Tools!</strong> Tools create a bridge between the LLM and the outside world, which allows agents to be much more accurate and reliable.</p>
<p>In coding terms, tools are essentially functions that the LLM can call to retrieve additional context or data, helping it generate more accurate and reliable responses.</p>
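<p>Stripped of any framework, a tool call is just the model naming a function and supplying its arguments as JSON, which our code then executes. A hypothetical sketch (the tool and the tool-call message below are made up):</p>

```python
import json

# Hypothetical tool the LLM can ask for
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def handle_tool_call(raw: str) -> str:
    """Execute the tool the model asked for, with its JSON arguments."""
    call = json.loads(raw)  # e.g. the model's tool-call message
    tool = TOOLS[call["name"]]
    return tool(**call["arguments"])
```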
<p>Let’s create some tools:</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> typing <span class="hljs-keyword">import</span> Any

<span class="hljs-keyword">import</span> logfire
<span class="hljs-keyword">from</span> pydantic_ai <span class="hljs-keyword">import</span> ModelRetry, RunContext

<span class="hljs-meta">@weather_agent.tool</span>
<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_lat_lng</span>(<span class="hljs-params">
    ctx: RunContext[Deps], location_description: str
</span>) -&gt; dict[str, float]:</span>
    params = {
        <span class="hljs-string">'q'</span>: location_description,
        <span class="hljs-string">'api_key'</span>: ctx.deps.geo_api_key,
    }
    <span class="hljs-keyword">with</span> logfire.span(<span class="hljs-string">'calling geocode API'</span>, params=params) <span class="hljs-keyword">as</span> span:
        r = <span class="hljs-keyword">await</span> ctx.deps.client.get(<span class="hljs-string">'https://geocode.maps.co/search'</span>, params=params)
        r.raise_for_status()
        data = r.json()
        span.set_attribute(<span class="hljs-string">'response'</span>, data)

    <span class="hljs-keyword">if</span> data:
        <span class="hljs-keyword">return</span> {<span class="hljs-string">'lat'</span>: data[<span class="hljs-number">0</span>][<span class="hljs-string">'lat'</span>], <span class="hljs-string">'lng'</span>: data[<span class="hljs-number">0</span>][<span class="hljs-string">'lon'</span>]}
    <span class="hljs-keyword">else</span>:
        <span class="hljs-keyword">raise</span> ModelRetry(<span class="hljs-string">'Could not find the location'</span>)

<span class="hljs-meta">@weather_agent.tool</span>
<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_weather</span>(<span class="hljs-params">ctx: RunContext[Deps], lat: float, lng: float</span>) -&gt; dict[str, Any]:</span>
    params = {
        <span class="hljs-string">'apikey'</span>: ctx.deps.weather_api_key,
        <span class="hljs-string">'location'</span>: <span class="hljs-string">f'<span class="hljs-subst">{lat}</span>,<span class="hljs-subst">{lng}</span>'</span>,
        <span class="hljs-string">'units'</span>: <span class="hljs-string">'metric'</span>,
    }
    <span class="hljs-keyword">with</span> logfire.span(<span class="hljs-string">'calling weather API'</span>, params=params) <span class="hljs-keyword">as</span> span:
        r = <span class="hljs-keyword">await</span> ctx.deps.client.get(
            <span class="hljs-string">'https://api.tomorrow.io/v4/weather/realtime'</span>, params=params
        )
        r.raise_for_status()
        data = r.json()
        span.set_attribute(<span class="hljs-string">'response'</span>, data)

    values = data[<span class="hljs-string">'data'</span>][<span class="hljs-string">'values'</span>]
    <span class="hljs-comment"># https://docs.tomorrow.io/reference/data-layers-weather-codes</span>
    code_lookup = {
        <span class="hljs-number">1000</span>: <span class="hljs-string">'Clear, Sunny'</span>,
        <span class="hljs-number">1100</span>: <span class="hljs-string">'Mostly Clear'</span>,
        <span class="hljs-number">1101</span>: <span class="hljs-string">'Partly Cloudy'</span>,
        <span class="hljs-number">1102</span>: <span class="hljs-string">'Mostly Cloudy'</span>,
        <span class="hljs-number">1001</span>: <span class="hljs-string">'Cloudy'</span>,
        <span class="hljs-number">2000</span>: <span class="hljs-string">'Fog'</span>,
        <span class="hljs-number">2100</span>: <span class="hljs-string">'Light Fog'</span>,
        <span class="hljs-number">4000</span>: <span class="hljs-string">'Drizzle'</span>,
        <span class="hljs-number">4001</span>: <span class="hljs-string">'Rain'</span>,
        <span class="hljs-number">4200</span>: <span class="hljs-string">'Light Rain'</span>,
        <span class="hljs-number">4201</span>: <span class="hljs-string">'Heavy Rain'</span>,
        <span class="hljs-number">5000</span>: <span class="hljs-string">'Snow'</span>,
        <span class="hljs-number">5001</span>: <span class="hljs-string">'Flurries'</span>,
        <span class="hljs-number">5100</span>: <span class="hljs-string">'Light Snow'</span>,
        <span class="hljs-number">5101</span>: <span class="hljs-string">'Heavy Snow'</span>,
        <span class="hljs-number">6000</span>: <span class="hljs-string">'Freezing Drizzle'</span>,
        <span class="hljs-number">6001</span>: <span class="hljs-string">'Freezing Rain'</span>,
        <span class="hljs-number">6200</span>: <span class="hljs-string">'Light Freezing Rain'</span>,
        <span class="hljs-number">6201</span>: <span class="hljs-string">'Heavy Freezing Rain'</span>,
        <span class="hljs-number">7000</span>: <span class="hljs-string">'Ice Pellets'</span>,
        <span class="hljs-number">7101</span>: <span class="hljs-string">'Heavy Ice Pellets'</span>,
        <span class="hljs-number">7102</span>: <span class="hljs-string">'Light Ice Pellets'</span>,
        <span class="hljs-number">8000</span>: <span class="hljs-string">'Thunderstorm'</span>,
    }
    <span class="hljs-keyword">return</span> {
        <span class="hljs-string">'temperature'</span>: <span class="hljs-string">f'<span class="hljs-subst">{values[<span class="hljs-string">"temperatureApparent"</span>]:<span class="hljs-number">0.0</span>f}</span>°C'</span>,
        <span class="hljs-string">'description'</span>: code_lookup.get(values[<span class="hljs-string">'weatherCode'</span>], <span class="hljs-string">'Unknown'</span>),
    }
</code></pre>
<p>In the code above, we defined two tools:</p>
<ol>
<li><p><code>get_lat_lng</code>: Retrieves the latitude and longitude of a given location using a geocoding API.</p>
</li>
<li><p><code>get_weather</code>: Fetches real-time weather data for a given latitude and longitude using a weather API.</p>
</li>
</ol>
<p>We applied the decorator <code>@weather_agent.tool</code> to both functions, which makes the <code>weather_agent</code> aware of the tools it can call. Unlike some frameworks that require passing tools as a list during agent creation, Pydantic AI simplifies this with decorators.</p>
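<p>The decorator pattern itself is easy to re-create. Below is a toy version (not pydantic_ai's actual internals) showing how a method decorator can register functions on an agent object:</p>

```python
# Toy re-creation of decorator-based tool registration.

class ToyAgent:
    def __init__(self):
        self.tools = {}

    def tool(self, fn):
        """Register `fn` under its own name and return it unchanged."""
        self.tools[fn.__name__] = fn
        return fn

agent = ToyAgent()

@agent.tool
def get_lat_lng(location: str) -> dict:
    return {"lat": 51.1, "lng": -0.1}
```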
<h3 id="heading-dependencies-in-pydantic-ai">Dependencies in Pydantic AI</h3>
<p>Pydantic AI uses a dependency injection system to provide data and services to your agent's tools. As you can see in the code snippet above, the first parameter of both functions is <code>ctx</code>, whose <code>deps</code> property is Pydantic AI's way of injecting dependencies that are available to tools at run time.</p>
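<p>The idea can be sketched without the framework: dependencies are bundled once at the call site and surfaced to every tool through a context object. The names below mirror the real ones, but this is a deliberate simplification, not the library's code:</p>

```python
from dataclasses import dataclass

@dataclass
class Deps:
    weather_api_key: str

@dataclass
class RunContext:
    deps: Deps  # injected dependencies, visible to every tool

def some_tool(ctx: RunContext) -> str:
    # the tool reads its dependencies off the context, not from globals
    return f"calling API with key={ctx.deps.weather_api_key}"
```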
<h2 id="heading-lets-run-the-agent">Let’s Run the Agent</h2>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> asyncio
<span class="hljs-keyword">import</span> os

<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">main</span>():</span>
    <span class="hljs-keyword">async</span> <span class="hljs-keyword">with</span> AsyncClient() <span class="hljs-keyword">as</span> client:
        <span class="hljs-comment"># Get location from user input</span>
        location = input(<span class="hljs-string">"Enter a location: "</span>)
        <span class="hljs-comment"># create a free API key at https://www.tomorrow.io/weather-api/</span>
        weather_api_key = os.getenv(<span class="hljs-string">'WEATHER_API_KEY'</span>)
        <span class="hljs-comment"># create a free API key at https://geocode.maps.co/</span>
        geo_api_key = os.getenv(<span class="hljs-string">'GEO_API_KEY'</span>)
        deps = Deps(
            client=client, weather_api_key=weather_api_key, geo_api_key=geo_api_key
        )
        result = <span class="hljs-keyword">await</span> weather_agent.run(
            <span class="hljs-string">f'What is the weather like in <span class="hljs-subst">{location}</span>?'</span>, deps=deps
        )
        print(<span class="hljs-string">'Response:'</span>, result.data)


<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    asyncio.run(main())
</code></pre>
<p>This script:</p>
<ol>
<li><p>Prompts the user to enter a location.</p>
</li>
<li><p>Fetches the weather and geocoding API keys from environment variables.</p>
</li>
<li><p>Passes dependencies to the agent.</p>
</li>
<li><p>Invokes the agent to get the weather data for the specified location.</p>
</li>
</ol>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1737924904257/cbc3fad3-a7ed-4f8c-bb68-751aab31b9d0.png" alt class="image--center mx-auto" /></p>
<p>The code for the simple agent is available <a target="_blank" href="https://github.com/MohitTilwani15/AI-Agent-Series">here</a>.</p>
<h2 id="heading-motivation-links"><strong>Motivation links</strong></h2>
<ul>
<li><a target="_blank" href="https://www.kaggle.com/whitepaper-agents">Whitepaper on Agents from Google</a></li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Fusion of Crypto & AI]]></title><description><![CDATA[Background
With the breakthrough of the transformer architecture and GPUs, LLMs emerged. Think of the transformer architecture as the prefrontal cortex of an LLM—it unlocked the processing of natural language (much much faster).
The transformer's cor...]]></description><link>https://mohittilwani.xyz/fusion-of-crypto-ai</link><guid isPermaLink="true">https://mohittilwani.xyz/fusion-of-crypto-ai</guid><category><![CDATA[crypto]]></category><category><![CDATA[AI]]></category><dc:creator><![CDATA[Mohit Tilwani]]></dc:creator><pubDate>Sat, 25 Jan 2025 21:06:39 GMT</pubDate><content:encoded><![CDATA[<h2 id="heading-background">Background</h2>
<p>With the breakthrough of the <a target="_blank" href="https://arxiv.org/pdf/1706.03762">transformer architecture</a> and GPUs, LLMs emerged. Think of the transformer architecture as the prefrontal cortex of an LLM—it unlocked the processing of natural language (<a target="_blank" href="https://youtu.be/LPZh9BOjkQs?feature=shared&amp;t=199">much much faster</a>).</p>
<p>The transformer's core innovation is the "attention" mechanism, which allows the model to dynamically focus on different parts of the input when generating each part of the output, mimicking how humans selectively concentrate on relevant information.</p>
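<p>A heavily simplified sketch of that idea in plain Python. Real transformers use learned projections and large matrices; this only shows the softmax-weighted mixing at the heart of attention:</p>

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Weight each value by how well its key matches the query."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(dim)]
```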
<p>GPUs (Graphics Processing Units) provided the computational muscle, offering massive parallel processing power that could handle the complex matrix calculations required by transformer models.</p>
<h2 id="heading-english-is-the-new-programming-language">English is the new programming language</h2>
<p>Soon, many engineers realized that the LLM was performing tasks at a surface level, i.e. it was using system 1 thinking<sup>5</sup>. How can we improve AI models to use system 2 thinking<sup>5</sup>? This led to the development of different techniques to interact with AI. Approaches<sup>3</sup> like zero-shot prompting, few-shot prompting, chain of thought, and tree of thoughts emerged. These are technical terms (which I won’t go into depth about here) for techniques we can use to interact with AI agents to produce higher-quality responses.</p>
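<p>As a tiny illustration of one of these techniques, a few-shot prompt simply prepends worked examples so the model can infer the pattern before seeing the real input. The task and examples below are hypothetical:</p>

```python
# Hypothetical few-shot prompt for sentiment classification.
EXAMPLES = [
    ("I loved this movie!", "positive"),
    ("Terrible, a waste of time.", "negative"),
]

def few_shot_prompt(text: str) -> str:
    """Build a prompt with worked examples followed by the real input."""
    shots = "\n".join(f"Review: {r}\nSentiment: {s}" for r, s in EXAMPLES)
    return f"{shots}\nReview: {text}\nSentiment:"
```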
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1737823255634/0ce44718-86eb-4013-bdc1-e40dee9bece2.png?width=400&amp;height=400" alt class="image--center mx-auto" /></p>
<p>All of this worked remarkably well—so well that we now have dedicated platforms and tools built around these techniques. Then some smart folks thought: how can we utilize LLMs to also take <em>action</em> on our behalf? In engineering terms, this is called automation—but automation powered by natural language computation.</p>
<p>Thus, AI agents were "born". But AI agents aren’t something new—we have been using agents for decades without even realizing it. One of the most famous agent is Google search. We type something in the language we are proficient at, and Google search agent does some magic to provide us with relevant links for further research.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1737823280875/b3968380-d27b-4178-bae7-d1892f92ff4f.jpeg?width=400&amp;height=400" alt class="image--center mx-auto" /></p>
<h2 id="heading-coming-to-the-point">Coming to the point</h2>
<p>But why the heck are we involving AI agents in crypto? There are many compelling answers<sup>2</sup> to this, but the one that motivates me the most is: <em>how can I help people to grow their wealth while they are asleep?</em> I am a big fan of Naval Ravikant’s philosophy, and as he has said multiple times: “Seek wealth, not money or status.”<sup>1</sup> This is exactly what AI agents can help us achieve—giving “brains” to our money.</p>
<p>I am going to devote the next decade of my life to learning and building valuable products using AI. My goal is to create something that has a deep, positive impact on people’s daily lives.</p>
<h2 id="heading-what-can-you-expect-in-the-future-posts"><strong>What can you expect in the future posts</strong></h2>
<ol>
<li><p>Deep dive into AI agents and different types of architecture</p>
</li>
<li><p>Rolling up our sleeves and getting hands-on (as Linus Torvalds said, "Talk is cheap. Show me the code.")</p>
</li>
<li><p>"Explain it like I’m five" takes on new and trendy topics</p>
</li>
<li><p>Few surprises here and there</p>
</li>
</ol>
<h2 id="heading-motivation-links">Motivation links</h2>
<ol>
<li><p><a target="_blank" href="https://www.youtube.com/watch?v=jemPACNo1_I&amp;feature=youtu.be">Seek wealth not money or status from Naval Ravikant</a></p>
</li>
<li><p><a target="_blank" href="https://x.com/robbiepetersen_/status/1864351850134216966">Role of crypto in an Agentic Economy from Robbie Peterson</a></p>
</li>
<li><p><a target="_blank" href="https://www.promptingguide.ai/">Prompt Engineering Guide</a></p>
</li>
<li><p><a target="_blank" href="https://x.com/wayne_hamadi/status/1868742755402621103">You don't understand AI agents</a></p>
</li>
<li><p><a target="_blank" href="https://en.wikipedia.org/wiki/Thinking,_Fast_and_Slow">Thinking Fast and Slow from Daniel Kahneman</a></p>
</li>
</ol>
]]></content:encoded></item></channel></rss>