Category: Content & Media (no API key required)

huggingface-js

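Run machine learning models in the browser and Node.js using Transformers.js and the Hugging Face Inference API. Use it when you need to add local inference, embeddings, or calls to hosted models without a GPU server.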

Author: jakexiaohub (GitHub)

Hugging Face JavaScript

Run ML models locally with Transformers.js or via the Inference API. Supports text generation, embeddings, image classification, speech recognition, and more.

Transformers.js (Local Inference)

Run models directly in the browser or Node.js using ONNX Runtime.

npm install @huggingface/transformers

Text Generation

import { pipeline } from '@huggingface/transformers';

const generator = await pipeline('text-generation', 'Xenova/gpt2');

const result = await generator('The quick brown fox', {
  max_new_tokens: 50,
});

console.log(result[0].generated_text);

Text Classification (Sentiment)

import { pipeline } from '@huggingface/transformers';

const classifier = await pipeline(
  'text-classification',
  'Xenova/distilbert-base-uncased-finetuned-sst-2-english'
);

const result = await classifier('I love this product!');
// [{ label: 'POSITIVE', score: 0.9998 }]

Embeddings

import { pipeline } from '@huggingface/transformers';

const embedder = await pipeline(
  'feature-extraction',
  'Xenova/all-MiniLM-L6-v2'
);

const result = await embedder('Hello, world!', {
  pooling: 'mean',
  normalize: true,
});

const embedding = Array.from(result.data);
// [0.123, -0.456, ...] - 384 dimensions
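
Embeddings like the one above are usually compared with cosine similarity. The following is a minimal sketch: `cosineSimilarity` is a hypothetical helper, not part of Transformers.js, and the short vectors stand in for real 384-dimensional embeddings.

```javascript
// Hypothetical helper (not part of Transformers.js): cosine similarity
// between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors standing in for real embeddings.
const same = cosineSimilarity([1, 0, 0], [1, 0, 0]);       // identical direction
const orthogonal = cosineSimilarity([1, 0, 0], [0, 1, 0]); // unrelated direction
console.log(same, orthogonal);
```

When embeddings are created with `normalize: true`, they are already unit length, so the dot product alone gives the same value.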

Question Answering

import { pipeline } from '@huggingface/transformers';

const qa = await pipeline(
  'question-answering',
  'Xenova/distilbert-base-cased-distilled-squad'
);

// Transformers.js takes the question and the context as separate arguments
const result = await qa(
  'What is the capital of France?',
  'France is a country in Europe. Paris is the capital of France.'
);

console.log(result);
// { answer: 'Paris', score: 0.98 }

Translation

import { pipeline } from '@huggingface/transformers';

const translator = await pipeline(
  'translation',
  'Xenova/nllb-200-distilled-600M'
);

const result = await translator('Hello, how are you?', {
  src_lang: 'eng_Latn',
  tgt_lang: 'fra_Latn',
});

console.log(result[0].translation_text);

Speech Recognition (Whisper)

import { pipeline } from '@huggingface/transformers';

const transcriber = await pipeline(
  'automatic-speech-recognition',
  'Xenova/whisper-tiny.en'
);

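// A URL or path string works in the browser. In Node.js, decode the audio
// to a Float32Array (16 kHz mono) first and pass that instead.
const result = await transcriber('./audio.mp3');
console.log(result.text);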

Image Classification

import { pipeline } from '@huggingface/transformers';

const classifier = await pipeline(
  'image-classification',
  'Xenova/vit-base-patch16-224'
);

const result = await classifier('https://example.com/cat.jpg');
// [{ label: 'tabby cat', score: 0.95 }, ...]

Object Detection

import { pipeline } from '@huggingface/transformers';

const detector = await pipeline(
  'object-detection',
  'Xenova/detr-resnet-50'
);

const result = await detector('https://example.com/image.jpg');
// [{ label: 'cat', score: 0.98, box: { xmin, ymin, xmax, ymax } }, ...]
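
Detection results often need a little post-processing before display. A minimal sketch, assuming the result shape shown in the comment above (`label`, `score`, and a `box` with `xmin`, `ymin`, `xmax`, `ymax`); `filterDetections` and the sample data are hypothetical.

```javascript
// Hypothetical helper: keep detections at or above a score threshold and
// attach the area of each bounding box.
function filterDetections(detections, minScore = 0.9) {
  return detections
    .filter((d) => d.score >= minScore)
    .map((d) => ({
      ...d,
      area: (d.box.xmax - d.box.xmin) * (d.box.ymax - d.box.ymin),
    }));
}

// Made-up sample data in the shape shown above.
const sample = [
  { label: 'cat', score: 0.98, box: { xmin: 10, ymin: 20, xmax: 110, ymax: 220 } },
  { label: 'dog', score: 0.42, box: { xmin: 0, ymin: 0, xmax: 50, ymax: 50 } },
];

console.log(filterDetections(sample));
```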

Zero-Shot Classification

import { pipeline } from '@huggingface/transformers';

const classifier = await pipeline(
  'zero-shot-classification',
  'Xenova/bart-large-mnli'
);

const result = await classifier(
  'This is a tutorial about machine learning',
  ['education', 'politics', 'sports']
);

console.log(result);
// { labels: ['education', ...], scores: [0.95, ...] }
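
To act on a zero-shot result, you typically pick the best label and ignore weak matches. The sketch below assumes the `{ labels, scores }` shape shown above; `topLabel` and the 0.5 default threshold are hypothetical choices for illustration.

```javascript
// Hypothetical helper: pair each label with its score and return the best one,
// or null when no score reaches the threshold.
function topLabel(result, minScore = 0.5) {
  let best = null;
  result.labels.forEach((label, i) => {
    const score = result.scores[i];
    if (score >= minScore && (best === null || score > best.score)) {
      best = { label, score };
    }
  });
  return best;
}

// Made-up result in the shape shown above.
const example = {
  labels: ['education', 'politics', 'sports'],
  scores: [0.95, 0.03, 0.02],
};
console.log(topLabel(example));       // picks the 'education' entry
console.log(topLabel(example, 0.99)); // null, nothing clears the threshold
```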

Hugging Face Inference API

Call hosted models without local computation.

npm install @huggingface/inference

Setup

import { HfInference } from '@huggingface/inference';

const hf = new HfInference(process.env.HF_ACCESS_TOKEN);

Text Generation

const result = await hf.textGeneration({
  model: 'meta-llama/Llama-2-7b-chat-hf',
  inputs: 'What is the meaning of life?',
  parameters: {
    max_new_tokens: 100,
    temperature: 0.7,
  },
});

console.log(result.generated_text);

Streaming Text Generation

const stream = hf.textGenerationStream({
  model: 'meta-llama/Llama-2-7b-chat-hf',
  inputs: 'Tell me a story',
  parameters: {
    max_new_tokens: 200,
  },
});

for await (const chunk of stream) {
  process.stdout.write(chunk.token.text);
}
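
If you also need the complete text once streaming finishes, you can accumulate tokens while forwarding them. This is a sketch under the assumption that chunks look like `{ token: { text } }`, as in the loop above; `collectStream` and `fakeStream` are hypothetical, and the fake generator stands in for `hf.textGenerationStream` so the example runs offline.

```javascript
// Hypothetical helper: forward each token to a callback while also
// accumulating the full text. Works with any async iterable whose chunks
// look like { token: { text } }.
async function collectStream(stream, onToken = () => {}) {
  let text = '';
  for await (const chunk of stream) {
    const piece = chunk.token.text;
    onToken(piece);
    text += piece;
  }
  return text;
}

// Stand-in for hf.textGenerationStream(...), so the sketch runs offline.
async function* fakeStream() {
  for (const text of ['Once', ' upon', ' a', ' time']) {
    yield { token: { text } };
  }
}

collectStream(fakeStream(), (t) => process.stdout.write(t)).then((full) => {
  console.log(`\n${full.length} characters received`);
});
```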

Chat Completion

const result = await hf.chatCompletion({
  model: 'meta-llama/Llama-2-7b-chat-hf',
  messages: [
    { role: 'user', content: 'Hello!' },
  ],
  max_tokens: 100,
});

console.log(result.choices[0].message.content);

Embeddings

const result = await hf.featureExtraction({
  model: 'sentence-transformers/all-MiniLM-L6-v2',
  inputs: 'Hello, world!',
});

console.log(result); // embedding vector

Image Generation

import fs from 'node:fs';

const result = await hf.textToImage({
  model: 'stabilityai/stable-diffusion-2',
  inputs: 'A futuristic city at sunset',
  parameters: {
    negative_prompt: 'blurry, low quality',
  },
});

// result is a Blob; convert it to a Buffer before writing to disk
const buffer = Buffer.from(await result.arrayBuffer());
fs.writeFileSync('output.png', buffer);

Image Classification

import fs from 'node:fs';

// fs.openAsBlob requires Node.js 19.8 or later
const result = await hf.imageClassification({
  model: 'google/vit-base-patch16-224',
  data: await fs.openAsBlob('cat.jpg'),
});

console.log(result);
// [{ label: 'tabby cat', score: 0.95 }, ...]

Speech Recognition

import fs from 'node:fs';

// fs.openAsBlob requires Node.js 19.8 or later
const result = await hf.automaticSpeechRecognition({
  model: 'openai/whisper-large-v3',
  data: await fs.openAsBlob('audio.mp3'),
});

console.log(result.text);

Inference Endpoints

For models deployed on dedicated Inference Endpoints, point the client at your endpoint URL.

import { InferenceClient } from '@huggingface/inference';

const client = new InferenceClient(process.env.HF_ACCESS_TOKEN);
const endpoint = client.endpoint('https://your-endpoint.endpoints.huggingface.cloud');

const result = await endpoint.textGeneration({
  inputs: 'Hello, world!',
});

Next.js Integration

// app/api/generate/route.ts
import { HfInference } from '@huggingface/inference';
import { NextResponse } from 'next/server';

const hf = new HfInference(process.env.HF_ACCESS_TOKEN);

export async function POST(request: Request) {
  const { prompt } = await request.json();

  const result = await hf.textGeneration({
    model: 'meta-llama/Llama-2-7b-chat-hf',
    inputs: prompt,
    parameters: {
      max_new_tokens: 200,
    },
  });

  return NextResponse.json({ text: result.generated_text });
}

Streaming Response

// app/api/stream/route.ts
import { HfInference } from '@huggingface/inference';

const hf = new HfInference(process.env.HF_ACCESS_TOKEN);

export async function POST(request: Request) {
  const { prompt } = await request.json();

  const stream = hf.textGenerationStream({
    model: 'meta-llama/Llama-2-7b-chat-hf',
    inputs: prompt,
    parameters: { max_new_tokens: 200 },
  });

  const encoder = new TextEncoder();
  const readable = new ReadableStream({
    async start(controller) {
      for await (const chunk of stream) {
        controller.enqueue(encoder.encode(chunk.token.text));
      }
      controller.close();
    },
  });

  return new Response(readable, {
    headers: { 'Content-Type': 'text/plain' },
  });
}
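
On the client, a streamed response like the one above can be read incrementally with the Fetch API's stream reader. The sketch below is illustrative: `readTextStream` and `demoResponse` are hypothetical names, and the demo builds a local `Response` instead of calling the route, so it runs in any runtime with Fetch API globals (modern browsers, Node.js 18+).

```javascript
// Hypothetical helper: read a streamed text response chunk by chunk.
async function readTextStream(response, onText) {
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let full = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // stream: true keeps multi-byte characters intact across chunk boundaries
    const text = decoder.decode(value, { stream: true });
    full += text;
    onText(text);
  }
  const tail = decoder.decode(); // flush any buffered bytes
  if (tail) {
    full += tail;
    onText(tail);
  }
  return full;
}

// Offline demo: a locally built Response stands in for
// fetch('/api/stream', { method: 'POST', body: JSON.stringify({ prompt }) }).
function demoResponse() {
  const encoder = new TextEncoder();
  const body = new ReadableStream({
    start(controller) {
      controller.enqueue(encoder.encode('Hello, '));
      controller.enqueue(encoder.encode('world'));
      controller.close();
    },
  });
  return new Response(body);
}

readTextStream(demoResponse(), (t) => console.log('chunk:', t));
```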

Browser Usage

Transformers.js also runs in the browser. It uses WebAssembly by default and can use WebGPU acceleration where the browser supports it.

<script type="module">
  import { pipeline } from 'https://cdn.jsdelivr.net/npm/@huggingface/transformers';

  // With no model specified, the task's default model is used
  const classifier = await pipeline('text-classification');
  const result = await classifier('I love this!');
  console.log(result);
</script>

With WebGPU

import { pipeline, env } from '@huggingface/transformers';

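// Optional: run the WASM backend in a worker so it does not block the UI.
// This setting does not enable WebGPU by itself.
env.backends.onnx.wasm.proxy = true;

// WebGPU is selected with the device option.
// 'model-name' is a placeholder for a real model ID.
const classifier = await pipeline('text-classification', 'model-name', {
  device: 'webgpu',
});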

Configuration

import { env } from '@huggingface/transformers';

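// Where downloaded models are cached
env.cacheDir = './models';
// Where to look for local model files
env.localModelPath = './local-models';

// Offline mode: load only from local files, never from the Hub
env.allowRemoteModels = false;

// Alternatively, skip local files and always load from the Hub.
// Do not set both flags to false, or no model source remains.
// env.allowLocalModels = false;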

Available Tasks

| Task | Pipeline | Example Model |
|------|----------|---------------|
| Text Classification | text-classification | distilbert-base-uncased-finetuned-sst-2-english |
| Text Generation | text-generation | gpt2, llama |
| Question Answering | question-answering | distilbert-base-cased-distilled-squad |
| Summarization | summarization | t5-small |
| Translation | translation | nllb-200-distilled-600M |
| Feature Extraction | feature-extraction | all-MiniLM-L6-v2 |
| Image Classification | image-classification | vit-base-patch16-224 |
| Object Detection | object-detection | detr-resnet-50 |
| Speech Recognition | automatic-speech-recognition | whisper-tiny |
| Zero-Shot Classification | zero-shot-classification | bart-large-mnli |

Environment Variables

HF_ACCESS_TOKEN=hf_xxxxxxxx

Best Practices

  1. Cache models - Download each model once and reuse the loaded pipeline
  2. Use WebGPU - Faster inference in browsers that support it
  3. Choose small models - Keeps download size and memory use low on the client
  4. Stream responses - Better UX for text generation
  5. Use the Inference API - For models too large to run locally
  6. Consider Inference Endpoints - For production workloads
  7. Prefer quantized models - Smaller and faster (look for ONNX builds)
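
The first practice, loading a model once and reusing it, can be sketched as a small memoizing wrapper. This is one possible approach, not an API provided by Transformers.js: `createPipelineCache` is a hypothetical helper, `loader` stands in for `pipeline` from `@huggingface/transformers`, and the demo uses a fake loader so it runs without downloading anything.

```javascript
// Hypothetical helper: memoize pipeline creation per (task, model) pair.
// `loader` is expected to behave like pipeline(task, model) and return a promise.
function createPipelineCache(loader) {
  const cache = new Map();
  return function getPipeline(task, model) {
    const key = `${task}::${model}`;
    if (!cache.has(key)) {
      // Store the promise itself so concurrent callers share one load.
      const pending = loader(task, model).catch((err) => {
        cache.delete(key); // allow a retry after a failed load
        throw err;
      });
      cache.set(key, pending);
    }
    return cache.get(key);
  };
}

// Fake loader so the sketch runs without downloading a model.
let loads = 0;
const fakeLoader = async (task, model) => {
  loads += 1;
  return (input) => `${task}/${model}: ${input}`;
};

const getPipeline = createPipelineCache(fakeLoader);
Promise.all([
  getPipeline('text-classification', 'demo-model'),
  getPipeline('text-classification', 'demo-model'),
]).then(([a, b]) => {
  console.log('same instance:', a === b, 'loads:', loads);
});
```

In real use you would pass the library's `pipeline` function as the loader, for example `createPipelineCache(pipeline)`, and call the returned function wherever a pipeline is needed.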