An Introduction to the Vercel AI SDK
Introduction
Vercel's open-source AI SDK is arguably the most convenient AI integration library available today. When building AI products, we want to apply the strengths of different large language models to different scenarios, which normally forces developers to write glue code to integrate each vendor's API. The AI SDK solves this pain point directly: it exposes a single, unified API through which you can use any supported model painlessly.
Here is an example. Suppose you are building a `jsExpert` AI assistant and want to integrate both Google's Gemini and Anthropic's Claude. With the AI SDK you can abstract `jsExpert` into a single function that simply takes the model to use and a prompt; no per-provider branching is needed, because the Vercel AI SDK already handles those differences under the hood.
```typescript
import { google } from "@ai-sdk/google";
import { anthropic } from "@ai-sdk/anthropic";
import { generateText } from "ai";

const gemini = google("gemini-2.0-flash-001");
const claude = anthropic("claude-sonnet-4-20250514");

const SYSTEM_PROMPT =
  "You are a JavaScript expert. Please answer the user's question concisely.";

const jsExpert = async ({ prompt, model }) => {
  const { text } = await generateText({
    model,
    prompt,
    system: SYSTEM_PROMPT,
  });
  console.log(text);
};

await jsExpert({ prompt: "What's JavaScript?", model: gemini });
await jsExpert({ prompt: "What's JavaScript?", model: claude });
```
In this article we will build a simple CLI chat interface to walk through the APIs the AI SDK provides.
Environment setup
Installation
First, pick a folder you normally develop in and run the following setup:
```shell
git init
npx gitignore node
pnpm init
pnpm add -D @types/dotenv-safe @types/node @typescript-eslint/eslint-plugin @typescript-eslint/parser eslint tsx typescript
pnpm add @ai-sdk/google ai dotenv dotenv-safe zod
```
Then add the following scripts to `package.json`:
```json
{
  ...
  "scripts": {
    "build": "tsc",
    "start": "node dist/main.js",
    "dev": "tsx src/main.ts",
    "lint": "eslint src --ext .ts",
    "clean": "rm -rf dist"
  },
  ...
}
```
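One more setup detail: `dotenv-safe` refuses to start unless a `.env.example` file lists every required variable. Assuming you use the Google provider, which reads the `GOOGLE_GENERATIVE_AI_API_KEY` environment variable, a minimal pair of files might look like this (the key value shown is a placeholder):

```shell
# .env.example (committed; documents which variables are required)
GOOGLE_GENERATIVE_AI_API_KEY=

# .env (NOT committed; holds your real key)
GOOGLE_GENERATIVE_AI_API_KEY=your-api-key-here
```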
generateText
`generateText` is very simple to use: give it the `model` to use and a `prompt`, and it returns the large language model's reply.
```typescript
// main.ts
import * as dotenvSafe from "dotenv-safe";
import { google } from "@ai-sdk/google";
import { generateText } from "ai";

dotenvSafe.config();

const model = google("gemini-2.0-flash-001");

const { text } = await generateText({
  model,
  prompt: "What's the ECMAScript? Please answer the question concisely.",
});

console.log(text);
```
streamText
Although `generateText` returns the complete result in one shot, the inference and transport latency of large language models (LLMs) means users still wait noticeably between submitting a question and seeing the full reply, and that delay hurts the user experience. The token-by-token streaming of `streamText` solves exactly this problem.
```typescript
import * as dotenvSafe from "dotenv-safe";
import { google } from "@ai-sdk/google";
import { streamText } from "ai";

dotenvSafe.config();

const model = google("gemini-2.0-flash-001");

const { textStream } = streamText({
  model,
  prompt: "What is ECMAScript?",
  // When the AI must follow a specific behavior no matter what prompt it
  // receives, use `system`; it works with both generateText and streamText.
  system:
    "You are a JavaScript expert. Please answer the user's question concisely.",
});

for await (const text of textStream) {
  process.stdout.write(text);
}
```
`streamText` returns the result to the caller chunk by chunk, so the front end can render each partial piece of text as soon as it arrives, making the content feel much more fluid.
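To make the chunk model concrete, here is a minimal, SDK-free sketch of the same consumption pattern that `textStream` exposes: an async iterable yields one piece of text at a time, and the consumer paints each piece immediately. The whitespace tokenizer here is a stand-in for illustration, not how a model actually tokenizes.

```typescript
// Simulate a token stream: yields one "token" at a time, like textStream does.
async function* fakeTextStream(answer: string): AsyncGenerator<string> {
  for (const token of answer.split(/(?<=\s)/)) {
    // In a real stream, each chunk arrives over the network with some delay.
    await new Promise((res) => setTimeout(res, 10));
    yield token;
  }
}

// Consume it exactly the way streamText's textStream is consumed.
async function render(answer: string): Promise<string> {
  let rendered = "";
  for await (const text of fakeTextStream(answer)) {
    process.stdout.write(text); // the UI can paint each chunk immediately
    rendered += text;
  }
  return rendered;
}
```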
System prompt
When we need the AI to follow a specific behavior, a system prompt lets us define that behavior up front. We have already seen that the prompt can go in the `system` field; alternatively, it can be placed in the `messages` array.
```typescript
// Note: `prompt` and `messages` are mutually exclusive in the AI SDK,
// so the user's question moves into the messages array as well.
const { textStream } = streamText({
  model,
  messages: [
    {
      role: "system",
      content:
        "You are a JavaScript expert. Please answer the user's question concisely.",
    },
    { role: "user", content: "What is ECMAScript?" },
  ],
});
```
CLI chat interface
First, let's list the main commands the CLI chat interface supports:
- `/history`: show your chat history with the system
- `/help`: list the supported commands
- `/exit`: leave the chat
Starting from the `start` method of the `Chat` class, we continuously read user input in a loop. If the input begins with `/`, we call `handleCommand`; otherwise we call `streamResponse`. The AI SDK usage here is the same as what we covered above; the main addition is that we now maintain the user's chat history in `this.messages`.
```typescript
// main.ts
import * as dotenvSafe from "dotenv-safe";
import { google } from "@ai-sdk/google";
import { streamText } from "ai";
import * as readline from "readline";

dotenvSafe.config();

const gemini = google("gemini-2.0-flash-001");

class Chat {
  rl;
  messages = [
    /** The system prompt can also go in as the first message in the history. */
    {
      role: "system",
      content:
        "You are a helpful assistant. Please answer the user's questions concisely and helpfully.",
    },
  ];

  constructor() {
    this.rl = readline.createInterface({
      input: process.stdin,
      output: process.stdout,
    });
  }

  async streamResponse(prompt) {
    this.messages.push({ role: "user", content: prompt });
    try {
      const { textStream } = streamText({
        model: gemini,
        messages: this.messages,
      });
      let assistantResponse = "";
      process.stdout.write(`\nAssistant: `);
      for await (const text of textStream) {
        process.stdout.write(text);
        assistantResponse += text;
      }
      this.messages.push({ role: "assistant", content: assistantResponse });
      console.log("\n");
    } catch (err) {
      console.error(err);
    }
  }

  async handleCommand(input) {
    const command = input.slice(1).toLowerCase();
    switch (command) {
      case "help":
        console.log(`Commands:
/help - Show this help
/history - Show conversation history
/exit - Exit chat
Just type your message to chat!`);
        return false;
      case "history":
        console.log(JSON.stringify(this.messages, null, 2));
        return false;
      case "exit":
        console.log("See ya!");
        return true;
      default:
        console.log(
          `Unknown command: ${command}\nType /help for available commands\n`,
        );
        return false;
    }
  }

  async start() {
    while (true) {
      try {
        const input: string = await new Promise<string>((res) =>
          this.rl.question("You: ", res),
        );
        if (!input.trim()) continue;
        if (input.startsWith("/")) {
          const shouldExit = await this.handleCommand(input);
          if (shouldExit) break;
          continue;
        }
        await this.streamResponse(input);
      } catch (error) {
        console.error(`Error: ${error}`);
        break;
      }
    }
    this.rl.close();
  }
}

async function main() {
  console.log("Type /help for commands\n");
  const chat = new Chat();
  await chat.start();
}

process.on("SIGINT", () => {
  console.log("\nGoodbye!");
  process.exit(0);
});

main().catch(console.error);
```
You can now talk to the CLI chat interface via `pnpm dev`:
```shell
> pnpm dev

> [email protected] dev /Users/jinghuangsu/workspace/ai/ai-sdk
> tsx src/main.ts

Type /help for commands

You: hi

Assistant: Hi there! How can I help you today?

You: /history
[
  {
    "role": "user",
    "content": "hi"
  },
  {
    "role": "assistant",
    "content": "Hi there! How can I help you today?\n"
  }
]
```
Advanced
Next we will cover a few more APIs; as you read along, feel free to add these capabilities to the Chat CLI yourself.
generateObject
Using Zod together with `generateObject` guarantees that the AI returns an object in a format you define in advance. For example, when you ask the AI for a recipe, a long wall of continuous text is much harder to digest and reuse than a structured response. This is exactly where `generateObject` shines: it turns the AI's unstructured output into structured data.
```typescript
import { google } from "@ai-sdk/google";
import { generateObject } from "ai";
import { z } from "zod";

const model = google("gemini-2.0-flash-001");

const { object } = await generateObject({
  model,
  schema: z.object({
    recipe: z.object({
      name: z.string(),
      ingredients: z.array(z.string()),
      steps: z.array(z.string()),
    }),
  }),
  prompt: "Generate a Mapo Tofu recipe.",
});

console.log(JSON.stringify(object, null, 2));
```
You can also use this API when you want to generate a value from a specific enum. For example, in this age of overwhelming negative news, if you want the AI to judge whether an article's sentiment is positive or negative, you can define the allowed values up front in `generateObject`.
```typescript
import { google } from "@ai-sdk/google";
import { generateObject } from "ai";

const model = google("gemini-2.0-flash-001");

const article = "Today is a good day";

const { object } = await generateObject({
  model,
  output: "enum",
  enum: ["positive", "negative", "neutral"],
  system: `You are a professional sentiment analyst. Your task is to determine the overall emotional tone of the article content provided by the user. You must select ONLY the most fitting result from 'positive', 'negative', or 'neutral' as your output, and must NOT include any explanation or additional text.`,
  prompt: article,
});

// "positive"
console.log(object);
```
streamObject
`streamObject` addresses the same problem as `streamText`. Rewriting the `generateObject` example with `streamObject` looks like this:
```typescript
import { google } from "@ai-sdk/google";
import { streamObject } from "ai";
import { z } from "zod";

const model = google("gemini-2.0-flash-001");

// streamObject returns immediately (no await); the streams and promises on
// the result resolve as the model generates output.
const result = streamObject({
  model,
  schema: z.object({
    recipe: z.object({
      name: z.string(),
      ingredients: z.array(z.string()),
      steps: z.array(z.string()),
    }),
  }),
  prompt: "Generate a Mapo Tofu recipe.",
});

for await (const chunk of result.partialObjectStream) {
  console.clear();
  console.dir(chunk, { depth: null });
}

const finalObject = await result.object;
console.log(finalObject);
```
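Each value yielded by `partialObjectStream` is a progressively more complete snapshot of the final object. A minimal simulation of that shape, with no SDK involved and hypothetical hard-coded snapshots, looks like this:

```typescript
type Recipe = { name?: string; ingredients?: string[]; steps?: string[] };

// Simulate partialObjectStream: each yielded value is a fuller snapshot.
async function* fakePartialObjectStream(): AsyncGenerator<Recipe> {
  yield {};
  yield { name: "Mapo Tofu" };
  yield { name: "Mapo Tofu", ingredients: ["tofu"] };
  yield {
    name: "Mapo Tofu",
    ingredients: ["tofu", "doubanjiang"],
    steps: ["Stir-fry, then simmer."],
  };
}

async function collect(): Promise<Recipe> {
  let latest: Recipe = {};
  for await (const partial of fakePartialObjectStream()) {
    latest = partial; // re-render the UI with the newest snapshot
  }
  return latest;
}
```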
Non-text input
Today's large language models are multimodal, and the AI SDK supports non-text input as well. For example, to have the AI describe an image, you can use any of the APIs covered earlier; you only need to tweak the input parameters slightly.
```typescript
const imgURL =
  "https://imgs.parseweb.dev/images/fe-profermance/throttle-and-debounce/og.png";

const { text } = await generateText({
  model: gemini,
  system: `You will receive an image. Please describe it concisely, ensuring the output length is within 100 words.`,
  messages: [
    {
      role: "user",
      content: [
        {
          type: "image",
          image: new URL(imgURL),
        },
      ],
    },
  ],
});

console.log(text);

// You: https://imgs.parseweb.dev/images/fe-profermance/throttle-and-debounce/og.png
// Assistant: "The image illustrates the concepts of \"Debounce\" and
// \"Throttle\" using simple diagrams. \"Debounce\" is represented by a series
// of close arrows merging into a single arrow pointing towards a stick figure.
// \"Throttle\" shows three vertical arrows stemming from a horizontal line and
// then flowing into a single arrow directed at a stick figure. The stick
// figures are essentially the same, each pointing a finger toward the arrows.
// The background is a plain, light color, ensuring the focus remains on the
// diagrams and text.\n"
```
Conclusion
This article gave a quick tour of the AI SDK's basic APIs. I also highly recommend visiting the official AI SDK website; its documentation is detailed and complete!