Back to skills
extension
Category: Content & MediaNo API key required

huggingface-js

Runs ML models in the browser and Node.js with Transformers.js and Hugging Face Inference API. Use when adding local inference, embeddings, or calling hosted models without GPU servers.

personAuthor: jakexiaohubgithub

Hugging Face JavaScript

Run ML models locally with Transformers.js or via the Inference API. Supports text generation, embeddings, image classification, speech recognition, and more.

Transformers.js (Local Inference)

Run models directly in browser or Node.js using ONNX Runtime.

npm install @huggingface/transformers

Text Generation

import { pipeline } from '@huggingface/transformers';

const generator = await pipeline('text-generation', 'Xenova/gpt2');

const result = await generator('The quick brown fox', {
  max_new_tokens: 50,
});

console.log(result[0].generated_text);

Text Classification (Sentiment)

import { pipeline } from '@huggingface/transformers';

const classifier = await pipeline(
  'text-classification',
  'Xenova/distilbert-base-uncased-finetuned-sst-2-english'
);

const result = await classifier('I love this product!');
// [{ label: 'POSITIVE', score: 0.9998 }]

Embeddings

import { pipeline } from '@huggingface/transformers';

const embedder = await pipeline(
  'feature-extraction',
  'Xenova/all-MiniLM-L6-v2'
);

const result = await embedder('Hello, world!', {
  pooling: 'mean',
  normalize: true,
});

const embedding = Array.from(result.data);
// [0.123, -0.456, ...] - 384 dimensions

Question Answering

import { pipeline } from '@huggingface/transformers';

const qa = await pipeline(
  'question-answering',
  'Xenova/distilbert-base-cased-distilled-squad'
);

const result = await qa({
  question: 'What is the capital of France?',
  context: 'France is a country in Europe. Paris is the capital of France.',
});

console.log(result);
// { answer: 'Paris', score: 0.98, start: 42, end: 47 }

Translation

import { pipeline } from '@huggingface/transformers';

const translator = await pipeline(
  'translation',
  'Xenova/nllb-200-distilled-600M'
);

const result = await translator('Hello, how are you?', {
  src_lang: 'eng_Latn',
  tgt_lang: 'fra_Latn',
});

console.log(result[0].translation_text);

Speech Recognition (Whisper)

import { pipeline } from '@huggingface/transformers';

const transcriber = await pipeline(
  'automatic-speech-recognition',
  'Xenova/whisper-tiny.en'
);

const result = await transcriber('./audio.mp3');
console.log(result.text);

Image Classification

import { pipeline } from '@huggingface/transformers';

const classifier = await pipeline(
  'image-classification',
  'Xenova/vit-base-patch16-224'
);

const result = await classifier('https://example.com/cat.jpg');
// [{ label: 'tabby cat', score: 0.95 }, ...]

Object Detection

import { pipeline } from '@huggingface/transformers';

const detector = await pipeline(
  'object-detection',
  'Xenova/detr-resnet-50'
);

const result = await detector('https://example.com/image.jpg');
// [{ label: 'cat', score: 0.98, box: { xmin, ymin, xmax, ymax } }, ...]

Zero-Shot Classification

import { pipeline } from '@huggingface/transformers';

const classifier = await pipeline(
  'zero-shot-classification',
  'Xenova/bart-large-mnli'
);

const result = await classifier(
  'This is a tutorial about machine learning',
  ['education', 'politics', 'sports']
);

console.log(result);
// { labels: ['education', ...], scores: [0.95, ...] }

Hugging Face Inference API

Call hosted models without local computation.

npm install @huggingface/inference

Setup

import { HfInference } from '@huggingface/inference';

const hf = new HfInference(process.env.HF_ACCESS_TOKEN);

Text Generation

const result = await hf.textGeneration({
  model: 'meta-llama/Llama-2-7b-chat-hf',
  inputs: 'What is the meaning of life?',
  parameters: {
    max_new_tokens: 100,
    temperature: 0.7,
  },
});

console.log(result.generated_text);

Streaming Text Generation

const stream = hf.textGenerationStream({
  model: 'meta-llama/Llama-2-7b-chat-hf',
  inputs: 'Tell me a story',
  parameters: {
    max_new_tokens: 200,
  },
});

for await (const chunk of stream) {
  process.stdout.write(chunk.token.text);
}

Chat Completion

const result = await hf.chatCompletion({
  model: 'meta-llama/Llama-2-7b-chat-hf',
  messages: [
    { role: 'user', content: 'Hello!' },
  ],
  max_tokens: 100,
});

console.log(result.choices[0].message.content);

Embeddings

const result = await hf.featureExtraction({
  model: 'sentence-transformers/all-MiniLM-L6-v2',
  inputs: 'Hello, world!',
});

console.log(result); // embedding vector

Image Generation

const result = await hf.textToImage({
  model: 'stabilityai/stable-diffusion-2',
  inputs: 'A futuristic city at sunset',
  parameters: {
    negative_prompt: 'blurry, low quality',
  },
});

// result is a Blob
const buffer = Buffer.from(await result.arrayBuffer());
fs.writeFileSync('output.png', buffer);

Image Classification

const result = await hf.imageClassification({
  model: 'google/vit-base-patch16-224',
  data: await fs.openAsBlob('cat.jpg'),
});

console.log(result);
// [{ label: 'tabby cat', score: 0.95 }, ...]

Speech Recognition

const result = await hf.automaticSpeechRecognition({
  model: 'openai/whisper-large-v3',
  data: await fs.openAsBlob('audio.mp3'),
});

console.log(result.text);

Inference Endpoints

For dedicated hosted models.

import { InferenceClient } from '@huggingface/inference';

const client = new InferenceClient(process.env.HF_ACCESS_TOKEN);
const endpoint = client.endpoint('https://your-endpoint.endpoints.huggingface.cloud');

const result = await endpoint.textGeneration({
  inputs: 'Hello, world!',
});

Next.js Integration

// app/api/generate/route.ts
import { HfInference } from '@huggingface/inference';
import { NextResponse } from 'next/server';

const hf = new HfInference(process.env.HF_ACCESS_TOKEN);

export async function POST(request: Request) {
  const { prompt } = await request.json();

  const result = await hf.textGeneration({
    model: 'meta-llama/Llama-2-7b-chat-hf',
    inputs: prompt,
    parameters: {
      max_new_tokens: 200,
    },
  });

  return NextResponse.json({ text: result.generated_text });
}

Streaming Response

// app/api/stream/route.ts
import { HfInference } from '@huggingface/inference';

const hf = new HfInference(process.env.HF_ACCESS_TOKEN);

export async function POST(request: Request) {
  const { prompt } = await request.json();

  const stream = hf.textGenerationStream({
    model: 'meta-llama/Llama-2-7b-chat-hf',
    inputs: prompt,
    parameters: { max_new_tokens: 200 },
  });

  const encoder = new TextEncoder();
  const readable = new ReadableStream({
    async start(controller) {
      for await (const chunk of stream) {
        controller.enqueue(encoder.encode(chunk.token.text));
      }
      controller.close();
    },
  });

  return new Response(readable, {
    headers: { 'Content-Type': 'text/plain' },
  });
}

Browser Usage

Transformers.js works in the browser with WebGPU acceleration.

<script type="module">
  import { pipeline } from 'https://cdn.jsdelivr.net/npm/@huggingface/transformers';

  const classifier = await pipeline('text-classification');
  const result = await classifier('I love this!');
  console.log(result);
</script>

With WebGPU

import { pipeline, env } from '@huggingface/transformers';

// Enable WebGPU
env.backends.onnx.wasm.proxy = true;

const classifier = await pipeline('text-classification', 'model-name', {
  device: 'webgpu',
});

Configuration

import { env } from '@huggingface/transformers';

// Cache settings
env.cacheDir = './models';
env.localModelPath = './local-models';

// Disable remote models (offline mode)
env.allowRemoteModels = false;

// Disable local models
env.allowLocalModels = false;

Available Tasks

| Task | Pipeline | Example Model | |------|----------|---------------| | Text Classification | text-classification | distilbert-base-uncased-finetuned-sst-2-english | | Text Generation | text-generation | gpt2, llama | | Question Answering | question-answering | distilbert-base-cased-distilled-squad | | Summarization | summarization | t5-small | | Translation | translation | nllb-200-distilled-600M | | Feature Extraction | feature-extraction | all-MiniLM-L6-v2 | | Image Classification | image-classification | vit-base-patch16-224 | | Object Detection | object-detection | detr-resnet-50 | | Speech Recognition | automatic-speech-recognition | whisper-tiny | | Zero-Shot Classification | zero-shot-classification | bart-large-mnli |

Environment Variables

HF_ACCESS_TOKEN=hf_xxxxxxxx

Best Practices

  1. Cache models - Download once, reuse
  2. Use WebGPU - Faster inference in browsers
  3. Choose small models - For client-side use
  4. Stream responses - Better UX for generation
  5. Use Inference API - For large models
  6. Consider endpoints - For production workloads
  7. Quantized models - Smaller, faster (look for ONNX models)