
Getting Started with the Vercel AI SDK


Introduction

Vercel's open-source AI SDK is arguably one of the most developer-friendly AI integration libraries available today. When building AI-powered products, we often need different large language models (LLMs) for different scenarios, which usually means writing glue code against multiple LLM APIs. The AI SDK addresses this pain point directly by providing a unified API that works the same way across supported model providers.

Here's a practical example: imagine you're building a jsExpert AI assistant and want to support both Google's Gemini and Anthropic's Claude. With the AI SDK, you can write a single jsExpert function that accepts any model and prompt, without platform-specific conditional logic, because the SDK handles the underlying provider differences for you.

import { google } from "@ai-sdk/google";
import { anthropic } from "@ai-sdk/anthropic";
import { generateText } from "ai";
const gemini = google("gemini-2.0-flash-001");
const claude = anthropic("claude-sonnet-4-20250514");
const SYSTEM_PROMPT =
  "You are a JavaScript expert. Please answer the user's question concisely.";
const jsExpert = async ({ prompt, model }) => {
  const { text } = await generateText({
    model,
    prompt,
    system: SYSTEM_PROMPT,
  });
  console.log(text);
};
await jsExpert({ prompt: "What's JavaScript?", model: gemini });
await jsExpert({ prompt: "What's JavaScript?", model: claude });

In this article, we'll build a simple CLI chat interface to explore the APIs provided by the AI SDK.

Environment Setup

Installation

Start by navigating to your development directory and running the following setup commands:

> git init
> npx gitignore node
> pnpm init
> pnpm add -D @types/dotenv-safe @types/node @typescript-eslint/eslint-plugin @typescript-eslint/parser eslint tsx typescript
> pnpm add @ai-sdk/google ai dotenv dotenv-safe zod
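
dotenv-safe loads variables from .env and fails fast unless every key listed in .env.example is present, so create both files. By default, the @ai-sdk/google provider reads its API key from the GOOGLE_GENERATIVE_AI_API_KEY environment variable, so the files can be as small as this:

# .env.example (committed, values left empty)
GOOGLE_GENERATIVE_AI_API_KEY=
# .env (git-ignored, holds your real key)
GOOGLE_GENERATIVE_AI_API_KEY=your-api-key-here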

Add the following scripts to your package.json:

{
  ...
  "scripts": {
    "build": "tsc",
    "start": "node dist/main.js",
    "dev": "tsx src/main.ts",
    "lint": "eslint src --ext .ts",
    "clean": "rm -rf dist"
  },
  ...
}
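
The examples use ES module imports and top-level await, and the build and start scripts expect compiled output in dist. A minimal tsconfig.json along these lines should work; treat it as a sketch rather than the exact configuration used for this article (it also assumes "type": "module" in package.json):

// tsconfig.json
{
  "compilerOptions": {
    "target": "ES2022",
    "module": "NodeNext",
    "moduleResolution": "NodeNext",
    "outDir": "dist",
    "rootDir": "src",
    "strict": true,
    "esModuleInterop": true,
    "skipLibCheck": true
  },
  "include": ["src"]
}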

generateText

generateText is straightforward to use—simply provide the model and prompt, and you'll receive the LLM's response.

// src/main.ts
import * as dotenvSafe from "dotenv-safe";
import { google } from "@ai-sdk/google";
import { generateText } from "ai";
dotenvSafe.config();
const model = google("gemini-2.0-flash-001");
const { text } = await generateText({
  model,
  prompt: "What's ECMAScript? Please answer the question concisely.",
});
console.log(text);

streamText

While generateText returns the complete response in one go, the computational and transmission latency of LLMs means users still wait between submitting a query and seeing the full answer, which can hurt the experience. streamText addresses this by streaming the response token by token in real time.

import * as dotenvSafe from "dotenv-safe";
import { google } from "@ai-sdk/google";
import { streamText } from "ai";
dotenvSafe.config();
const model = google("gemini-2.0-flash-001");
const { textStream } = streamText({
  model,
  prompt: "What is ECMAScript?",
  // Sometimes the model must follow a specific behavior no matter what prompt it
  // receives. Use `system` for that; it works with both generateText and streamText.
  system:
    "You are a JavaScript expert. Please answer the user's question concisely.",
});
for await (const text of textStream) {
  process.stdout.write(text);
}

streamText returns the result in chunks, so the frontend can render partial text immediately, which makes the response feel far more fluid.
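
If you also need the complete response once streaming finishes (for example, to store it in a chat history, as we will do below), the streamText result also exposes the full text as a promise. A minimal sketch, reusing the model from the example above:

const result = streamText({
  model,
  prompt: "What is ECMAScript?",
});
// Print chunks as they arrive.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk);
}
// `result.text` resolves with the complete response after the stream ends.
const fullText = await result.text;
console.log(`\n\nReceived ${fullText.length} characters in total.`);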

System Prompts

When you need the AI to follow specific behaviors, system prompts let you predefine those instructions. We've already seen how to pass them via the system field, but you can also place them in the messages array. Note that prompt and messages cannot be combined in a single call, so the user's question moves into messages as well.

const { textStream } = streamText({
  model,
  messages: [
    {
      role: "system",
      content:
        "You are a JavaScript expert. Please answer the user's question concisely.",
    },
    { role: "user", content: "What is ECMAScript?" },
  ],
});

CLI Chat Interface

Let's outline the main commands for our CLI chat interface:

  • history: Display your conversation history with the system
  • help: Show available commands
  • exit: Exit the chat

The start method in the Chat class continuously reads user input. If the input starts with /, it calls handleCommand; otherwise, it calls streamResponse. This implementation uses the AI SDK as described earlier, with the addition of maintaining user chat history in this.messages.

// src/main.ts
import * as dotenvSafe from "dotenv-safe";
import { google } from "@ai-sdk/google";
import { streamText, type CoreMessage } from "ai";
import * as readline from "node:readline";
dotenvSafe.config();
const gemini = google("gemini-2.0-flash-001");
class Chat {
  rl: readline.Interface;
  messages: CoreMessage[] = [
    /** The system prompt can also be placed as the first message in the history. */
    {
      role: "system",
      content:
        "You are a helpful assistant. Please answer the user's questions concisely and helpfully.",
    },
  ];
  constructor() {
    this.rl = readline.createInterface({
      input: process.stdin,
      output: process.stdout,
    });
  }
  async streamResponse(prompt: string) {
    this.messages.push({ role: "user", content: prompt });
    try {
      const { textStream } = streamText({
        model: gemini,
        messages: this.messages,
      });
      let assistantResponse = "";
      process.stdout.write(`\nAssistant: `);
      for await (const text of textStream) {
        process.stdout.write(text);
        assistantResponse += text;
      }
      this.messages.push({ role: "assistant", content: assistantResponse });
      console.log("\n");
    } catch (err) {
      console.error(err);
    }
  }
  async handleCommand(input: string): Promise<boolean> {
    const command = input.slice(1).toLowerCase();
    switch (command) {
      case "help":
        console.log(`
Commands:
  /help    - Show this help
  /history - Show conversation history
  /exit    - Exit chat
Just type your message to chat!
`);
        return false;
      case "history":
        console.log(JSON.stringify(this.messages, null, 2));
        return false;
      case "exit":
        console.log("See ya!");
        return true;
      default:
        console.log(
          `Unknown command: ${command}\nType /help for available commands\n`,
        );
        return false;
    }
  }
  async start() {
    while (true) {
      try {
        const input = await new Promise<string>((res) =>
          this.rl.question("You: ", res),
        );
        if (!input.trim()) continue;
        if (input.startsWith("/")) {
          const shouldExit = await this.handleCommand(input);
          if (shouldExit) break;
          continue;
        }
        await this.streamResponse(input);
      } catch (error) {
        console.error(`Error: ${error}`);
        break;
      }
    }
    this.rl.close();
  }
}
async function main() {
  console.log("Type /help for commands\n");
  const chat = new Chat();
  await chat.start();
}
process.on("SIGINT", () => {
  console.log("\nGoodbye!");
  process.exit(0);
});
main().catch(console.error);

You can now interact with the CLI chat interface using pnpm dev:

> pnpm dev
> [email protected] dev /Users/jinghuangsu/workspace/ai/ai-sdk
> tsx src/main.ts
Type /help for commands
You: hi
Assistant: Hi there! How can I help you today?
You: /history
[
  {
    "role": "user",
    "content": "hi"
  },
  {
    "role": "assistant",
    "content": "Hi there! How can I help you today?\n"
  }
]

Advanced Features

Let's explore additional APIs. Feel free to extend your Chat CLI with these features as you read along.

generateObject

Using Zod with generateObject ensures the AI returns data in a predefined object structure. For example, when asking an AI for a recipe, receiving a continuous block of text is much harder to parse and utilize compared to structured data.

This is where generateObject shines—it transforms unstructured AI output into structured data.

import { google } from "@ai-sdk/google";
import { generateObject } from "ai";
import { z } from "zod";
const model = google("gemini-2.0-flash-001");
const { object } = await generateObject({
  model,
  schema: z.object({
    recipe: z.object({
      name: z.string(),
      ingredients: z.array(z.string()),
      steps: z.array(z.string()),
    }),
  }),
  prompt: "Generate a Mapo Tofu recipe.",
});
console.log(JSON.stringify(object, null, 2));

You can also use this API to generate specific enum values. For instance, in an era of information overload, if you want an AI to classify an article's sentiment as positive, negative, or neutral, you can predefine those options in generateObject.

import { google } from "@ai-sdk/google";
import { generateObject } from "ai";
const model = google("gemini-2.0-flash-001");
const article = "Today is a good day";
const { object } = await generateObject({
  model,
  output: "enum",
  enum: ["positive", "negative", "neutral"],
  system: `You are a professional sentiment analyst. Your task is to determine the overall emotional tone of the article content provided by the user. You must select ONLY the most fitting result from 'positive', 'negative', or 'neutral' as your output, and must NOT include any explanation or additional text.`,
  prompt: article,
});
// "positive"
console.log(object);

streamObject

streamObject addresses the same problem as streamText but for structured data. Here's how you would rewrite the generateObject example using streamObject:

import { google } from "@ai-sdk/google";
import { streamObject } from "ai";
import { z } from "zod";
const model = google("gemini-2.0-flash-001");
const result = streamObject({
  model,
  schema: z.object({
    recipe: z.object({
      name: z.string(),
      ingredients: z.array(z.string()),
      steps: z.array(z.string()),
    }),
  }),
  prompt: "Generate a Mapo Tofu recipe.",
});
// Each chunk is a progressively more complete partial object.
for await (const chunk of result.partialObjectStream) {
  console.clear();
  console.dir(chunk, { depth: null });
}
// `result.object` resolves to the final, schema-validated object.
const finalObject = await result.object;
console.log(finalObject);
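
streamObject also supports an array output mode, where each element is emitted only once it is fully formed and validated against the schema. A small sketch of that mode (the prompt here is just an illustration):

import { google } from "@ai-sdk/google";
import { streamObject } from "ai";
import { z } from "zod";
const model = google("gemini-2.0-flash-001");
const { elementStream } = streamObject({
  model,
  output: "array",
  // With output: "array", the schema describes a single element of the array.
  schema: z.object({
    name: z.string(),
    reason: z.string(),
  }),
  prompt: "Suggest three JavaScript libraries worth learning and why.",
});
// Each iteration yields one complete element.
for await (const element of elementStream) {
  console.log(element);
}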

Non-Text Inputs

Modern LLMs are multimodal, and the AI SDK supports non-text inputs. For example, to ask an AI to describe an image, you can use any of the previously mentioned APIs with slight modifications to the input parameters.

import { google } from "@ai-sdk/google";
import { generateText } from "ai";
const gemini = google("gemini-2.0-flash-001");
const imgURL =
  "https://imgs.parseweb.dev/images/fe-profermance/throttle-and-debounce/og.png";
const { text } = await generateText({
  model: gemini,
  system: `You will receive an image. Please describe it concisely, ensuring the output length is within 100 words.`,
  messages: [
    {
      role: "user",
      content: [
        {
          type: "image",
          image: new URL(imgURL),
        },
      ],
    },
  ],
});
console.log(text);
// You: https://imgs.parseweb.dev/images/fe-profermance/throttle-and-debounce/og.png
// Assistant: "The image illustrates the concepts of \"Debounce\" and \"Throttle\" using simple diagrams. \"Debounce\" is represented by a series of close arrows merging into a single arrow pointing towards a stick figure. \"Throttle\" shows three vertical arrows stemming from a horizontal line and then flowing into a single arrow directed at a stick figure. The stick figures are essentially the same, each pointing a finger toward the arrows. The background is a plain, light color, ensuring the focus remains on the diagrams and text.\n"
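
The image does not have to be a remote URL. The same content part also accepts raw bytes, so you can pass a local file directly. A minimal sketch, assuming a local ./og.png sits next to the script:

import { readFileSync } from "node:fs";
import { google } from "@ai-sdk/google";
import { generateText } from "ai";
const gemini = google("gemini-2.0-flash-001");
const { text } = await generateText({
  model: gemini,
  system: `You will receive an image. Please describe it concisely, ensuring the output length is within 100 words.`,
  messages: [
    {
      role: "user",
      content: [
        {
          type: "image",
          // A Buffer / Uint8Array works here as well as a URL.
          image: readFileSync("./og.png"),
        },
      ],
    },
  ],
});
console.log(text);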

Conclusion

This article provided a basic introduction to the AI SDK's core APIs. I highly recommend checking out the AI SDK official website for more detailed and comprehensive documentation!

If you enjoyed this article, please click the buttons below to share it with more people. Your support means a lot to me as a writer.