An Introduction to the Vercel AI SDK
Introduction
Vercel's open-source AI SDK is arguably the most convenient AI integration toolkit available today. When building AI products, we want to play to the strengths of different large language models in different scenarios, which traditionally means writing glue code to integrate each vendor's API. The AI SDK solves this pain point directly: it exposes a single, unified API through which you can use any supported model seamlessly.
Here's an example. Suppose we want to build a jsExpert AI assistant and integrate both Google's Gemini and Anthropic's Claude. With the AI SDK, we can abstract jsExpert into a single function that takes the model and the prompt as arguments. There is no need for per-provider branching logic, because the Vercel AI SDK already handles those differences under the hood.
import { google } from "@ai-sdk/google";
import { anthropic } from "@ai-sdk/anthropic";
import { generateText } from "ai";

const gemini = google("gemini-2.0-flash-001");
const claude = anthropic("claude-sonnet-4-20250514");

const SYSTEM_PROMPT =
  "You are a JavaScript expert. Please answer the user's question concisely.";

const jsExpert = async ({ prompt, model }) => {
  const { text } = await generateText({
    model,
    prompt,
    system: SYSTEM_PROMPT,
  });
  console.log(text);
};

await jsExpert({ prompt: "What's JavaScript?", model: gemini });
await jsExpert({ prompt: "What's JavaScript?", model: claude });
In this article we'll build a simple CLI chat interface to walk through the APIs the AI SDK provides.
Environment Setup
Installation
First, inside your usual development folder, set up the environment as follows:
> git init
> npx gitignore node
> pnpm init
> pnpm add -D @types/dotenv-safe @types/node @typescript-eslint/eslint-plugin @typescript-eslint/parser eslint tsx typescript
> pnpm add @ai-sdk/google ai dotenv dotenv-safe zod
Then add the following to package.json:
{
  ...
  "scripts": {
    "build": "tsc",
    "start": "node dist/main.js",
    "dev": "tsx src/main.ts",
    "lint": "eslint src --ext .ts",
    "clean": "rm -rf dist"
  },
  ...
}
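The google() provider needs an API key before any of the examples below will run. dotenv-safe differs from plain dotenv in that it also reads a committed .env.example and fails fast at startup if any variable listed there is missing from your environment. A minimal sketch of the two files might look like this (assuming the @ai-sdk/google provider's default GOOGLE_GENERATIVE_AI_API_KEY variable):

```shell
# .env.example -- committed to git, values left empty
GOOGLE_GENERATIVE_AI_API_KEY=

# .env -- git-ignored, holds the real key
GOOGLE_GENERATIVE_AI_API_KEY=your-api-key-here
```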
generateText
generateText is straightforward to use: pass the model you want to use and a prompt, and you get back the large language model's reply.
// src/main.ts
import * as dotenvSafe from "dotenv-safe";
import { google } from "@ai-sdk/google";
import { generateText } from "ai";

dotenvSafe.config();

const model = google("gemini-2.0-flash-001");

const { text } = await generateText({
  model,
  prompt: "What's ECMAScript? Please answer the question concisely.",
});
console.log(text);
streamText
Although generateText returns the complete result in one shot, the model's computation and network latency mean the user still waits between submitting a question and seeing the full reply, and that delay hurts the user experience. streamText solves this with real-time, token-by-token streaming.
import * as dotenvSafe from "dotenv-safe";
import { google } from "@ai-sdk/google";
import { streamText } from "ai";

dotenvSafe.config();

const model = google("gemini-2.0-flash-001");

const { textStream } = streamText({
  model,
  prompt: "What is ECMAScript?",
  // When the AI must follow a specific behavior no matter what prompt it
  // receives, use `system` -- it works with both generateText and streamText.
  system:
    "You are a JavaScript expert. Please answer the user's question concisely.",
});

for await (const text of textStream) {
  process.stdout.write(text);
}
streamText returns the result in chunks, so the frontend can render partial text as it arrives, making the output feel far more fluid.
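To make the chunk flow concrete, here is a minimal sketch with no API calls: a hypothetical async generator (fakeTextStream) stands in for textStream, and the consumer both renders each chunk immediately and accumulates the full reply, which is the same consumption pattern used throughout this article.

```typescript
// A stand-in for `textStream`: a hypothetical async generator that yields
// a few chunks, the way a model emits tokens over time.
async function* fakeTextStream(): AsyncGenerator<string> {
  const chunks = ["ECMAScript ", "is the standard ", "behind JavaScript."];
  for (const chunk of chunks) {
    yield chunk;
  }
}

// Render each chunk as soon as it arrives, while also accumulating the
// complete reply (useful for storing it in a chat history afterwards).
async function collectStream(stream: AsyncIterable<string>): Promise<string> {
  let full = "";
  for await (const chunk of stream) {
    process.stdout.write(chunk); // the user sees partial output immediately
    full += chunk;
  }
  return full;
}

async function main() {
  const full = await collectStream(fakeTextStream());
  console.log(`\nAccumulated: ${full}`);
}

main();
```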
System Prompt
When we need the AI to follow a specific behavior, a system prompt lets us define it up front. We have already seen that the prompt can go in the system field; alternatively, it can be placed in the messages array.
const { textStream } = streamText({
  model,
  messages: [
    {
      role: "system",
      content:
        "You are a JavaScript expert. Please answer the user's question concisely.",
    },
    { role: "user", content: "What is ECMAScript?" },
  ],
});
CLI Chat Interface
First, here are the main commands the CLI chat interface will support:

- /history: show your chat history with the system
- /help: list the supported commands
- /exit: quit the chat
The Chat class's start method keeps reading user input in a loop. If the input starts with /, it calls handleCommand; otherwise it calls streamResponse. The AI SDK usage here is the same as what we covered above; the main addition is that we now maintain the user's chat history in this.messages.
// src/main.ts
import * as dotenvSafe from "dotenv-safe";
import { google } from "@ai-sdk/google";
import { streamText, type CoreMessage } from "ai";
import * as readline from "readline";

dotenvSafe.config();

const gemini = google("gemini-2.0-flash-001");

class Chat {
  rl: readline.Interface;
  messages: CoreMessage[] = [
    /** The system prompt can also be placed as the first entry in the history */
    {
      role: "system",
      content:
        "You are a helpful assistant. Please answer the user's questions concisely and helpfully.",
    },
  ];

  constructor() {
    this.rl = readline.createInterface({
      input: process.stdin,
      output: process.stdout,
    });
  }

  async streamResponse(prompt: string) {
    this.messages.push({ role: "user", content: prompt });
    try {
      const { textStream } = streamText({
        model: gemini,
        messages: this.messages,
      });
      let assistantResponse = "";
      process.stdout.write(`\nAssistant: `);
      for await (const text of textStream) {
        process.stdout.write(text);
        assistantResponse += text;
      }
      this.messages.push({ role: "assistant", content: assistantResponse });
      console.log("\n");
    } catch (err) {
      console.error(err);
    }
  }

  async handleCommand(input: string) {
    const command = input.slice(1).toLowerCase();
    switch (command) {
      case "help":
        console.log(`Commands:
/help - Show this help
/history - Show conversation history
/exit - Exit chat
Just type your message to chat!`);
        return false;
      case "history":
        console.log(JSON.stringify(this.messages, null, 2));
        return false;
      case "exit":
        console.log("See ya!");
        return true;
      default:
        console.log(
          `Unknown command: ${command}\nType /help for available commands\n`,
        );
        return false;
    }
  }

  async start() {
    while (true) {
      try {
        const input: string = await new Promise((res) =>
          this.rl.question("You: ", res),
        );
        if (!input.trim()) continue;
        if (input.startsWith("/")) {
          const shouldExit = await this.handleCommand(input);
          if (shouldExit) break;
          continue;
        }
        await this.streamResponse(input);
      } catch (error) {
        console.error(`Error: ${error}`);
        break;
      }
    }
    this.rl.close();
  }
}

async function main() {
  console.log("Type /help for commands\n");
  const chat = new Chat();
  await chat.start();
}

process.on("SIGINT", () => {
  console.log("\nGoodbye!");
  process.exit(0);
});

main().catch(console.error);
You can now talk to the CLI chat interface with pnpm dev:
> pnpm dev

> [email protected] dev /Users/jinghuangsu/workspace/ai/ai-sdk
> tsx src/main.ts

Type /help for commands

You: hi

Assistant: Hi there! How can I help you today?

You: /history
[
  {
    "role": "user",
    "content": "hi"
  },
  {
    "role": "assistant",
    "content": "Hi there! How can I help you today?\n"
  }
]
Going Further
Next we'll cover a few more APIs. As you read along, feel free to extend the Chat CLI with new features.
generateObject
Using Zod together with generateObject guarantees that the AI returns an object in a format you define up front. For example, when you ask the AI for a recipe, a long wall of continuous prose is much harder to digest and reuse than structured output. This is where generateObject shines: it turns the AI's unstructured output into structured data.
import { google } from "@ai-sdk/google";
import { generateObject } from "ai";
import { z } from "zod";

const model = google("gemini-2.0-flash-001");

const { object } = await generateObject({
  model,
  schema: z.object({
    recipe: z.object({
      name: z.string(),
      ingredients: z.array(z.string()),
      steps: z.array(z.string()),
    }),
  }),
  prompt: "Generate a Mapo Tofu recipe.",
});

console.log(JSON.stringify(object, null, 2));
You can also use this API when you want the output to be one value from a fixed set of options (an enum). For example, in this age of overwhelming negativity, if we want the AI to classify an article as positive or negative, we can define the allowed values in generateObject ahead of time.
import { google } from "@ai-sdk/google";
import { generateObject } from "ai";

const model = google("gemini-2.0-flash-001");

const article = "Today is a good day";

const { object } = await generateObject({
  model,
  output: "enum",
  enum: ["positive", "negative", "neutral"],
  system: `You are a professional sentiment analyst. Your task is to determine the overall emotional tone of the article content provided by the user. You must select ONLY the most fitting result from 'positive', 'negative', or 'neutral' as your output, and must NOT include any explanation or additional text.`,
  prompt: article,
});

// "positive"
console.log(object);
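Part of what makes enum output valuable in TypeScript is that a fixed set of labels can be narrowed to a union type. As a minimal illustrative sketch (no API call; asSentiment is a hypothetical helper, not part of the AI SDK), the same guarantee can be enforced on any raw string:

```typescript
const SENTIMENTS = ["positive", "negative", "neutral"] as const;
type Sentiment = (typeof SENTIMENTS)[number]; // "positive" | "negative" | "neutral"

// Narrow an arbitrary string to the Sentiment union, throwing on anything
// outside the allowed set -- the same guarantee generateObject's enum
// output gives you at the API level.
function asSentiment(value: string): Sentiment {
  if ((SENTIMENTS as readonly string[]).includes(value)) {
    return value as Sentiment;
  }
  throw new Error(`Unexpected sentiment: ${value}`);
}

console.log(asSentiment("positive")); // prints "positive"
```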
streamObject
streamObject addresses the same problem as streamText. Rewriting the generateObject example with streamObject looks like this:
import { google } from "@ai-sdk/google";
import { streamObject } from "ai";
import { z } from "zod";

const model = google("gemini-2.0-flash-001");

const result = streamObject({
  model,
  schema: z.object({
    recipe: z.object({
      name: z.string(),
      ingredients: z.array(z.string()),
      steps: z.array(z.string()),
    }),
  }),
  prompt: "Generate a Mapo Tofu recipe.",
});

for await (const chunk of result.partialObjectStream) {
  console.clear();
  console.dir(chunk, { depth: null });
}

const finalObject = await result.object;
console.log(finalObject);
Non-Text Input
Today's large language models are multimodal, and the AI SDK supports non-text input as well. For example, to have the AI describe an image, you can use any of the APIs above; the input just needs a small tweak:
const imgURL =
  "https://imgs.parseweb.dev/images/fe-profermance/throttle-and-debounce/og.png";

const { text } = await generateText({
  model: gemini,
  system: `You will receive an image. Please describe it concisely, ensuring the output length is within 100 words.`,
  messages: [
    {
      role: "user",
      content: [
        {
          type: "image",
          image: new URL(imgURL),
        },
      ],
    },
  ],
});

console.log(text);

// You: https://imgs.parseweb.dev/images/fe-profermance/throttle-and-debounce/og.png
// Assistant: "The image illustrates the concepts of \"Debounce\" and \"Throttle\" using simple
// diagrams. \"Debounce\" is represented by a series of close arrows merging into a single arrow
// pointing towards a stick figure. \"Throttle\" shows three vertical arrows stemming from a
// horizontal line and then flowing into a single arrow directed at a stick figure. The stick
// figures are essentially the same, each pointing a finger toward the arrows. The background is
// a plain, light color, ensuring the focus remains on the diagrams and text.\n"
Conclusion
This article has been a quick tour of the AI SDK's basic APIs. I also highly recommend the official AI SDK documentation; it is detailed and thorough!