Browserless Cloud Browser Automation

Complete guide for Browserless - headless Chrome as a service.

Quick Reference

Key Features

| Feature | Description | |---------|-------------| | REST API | HTTP endpoints for screenshots, PDFs, scraping | | WebSocket | Puppeteer/Playwright connection | | Screenshots | Page captures at scale | | PDF | Document generation | | Scraping | Data extraction | | Functions | Custom script execution |

Endpoints

/screenshot - Capture screenshots
/pdf - Generate PDFs
/content - Get page HTML
/scrape - Extract data
/function - Run custom code

Setup

Self-Hosted (Docker)

docker run -d \
  -p 3000:3000 \
  -e "TOKEN=your-token" \
  -e "CONCURRENT=10" \
  -e "QUEUED=10" \
  --name browserless \
  ghcr.io/browserless/chromium

Docker Compose

services:
  browserless:
    image: ghcr.io/browserless/chromium
    ports:
      - "3000:3000"
    environment:
      - TOKEN=your-secure-token
      - CONCURRENT=10
      - QUEUED=50
      - TIMEOUT=60000
      - MAX_PAYLOAD_SIZE=5mb
      - HEALTH_CHECK=true
    restart: unless-stopped
    deploy:
      resources:
        limits:
          memory: 4G

Cloud Service

Sign up at browserless.io
Get API token
Use: wss://chrome.browserless.io?token=YOUR_TOKEN

REST API

Screenshots

# Basic screenshot
curl -X POST "http://localhost:3000/screenshot?token=your-token" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}' \
  -o screenshot.png

# With options
curl -X POST "http://localhost:3000/screenshot?token=your-token" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "options": {
      "fullPage": true,
      "type": "png",
      "quality": 80
    },
    "viewport": {
      "width": 1920,
      "height": 1080
    },
    "waitForSelector": "#content"
  }' \
  -o screenshot.png

PDF Generation

curl -X POST "http://localhost:3000/pdf?token=your-token" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "options": {
      "format": "A4",
      "printBackground": true,
      "margin": {
        "top": "1cm",
        "right": "1cm",
        "bottom": "1cm",
        "left": "1cm"
      }
    }
  }' \
  -o document.pdf

Scrape Data

curl -X POST "http://localhost:3000/scrape?token=your-token" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "elements": [
      {"selector": "h1", "name": "title"},
      {"selector": ".price", "name": "price"},
      {"selector": "img", "name": "images", "attr": "src"}
    ]
  }'

JavaScript Integration

Using Puppeteer

const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.connect({
    browserWSEndpoint: "ws://localhost:3000?token=your-token",
  });

  const page = await browser.newPage();
  await page.goto("https://example.com");
  const title = await page.title();
  console.log("Title:", title);

  await browser.close();
})();

Using Playwright

const { chromium } = require("playwright");

(async () => {
  const browser = await chromium.connectOverCDP(
    "ws://localhost:3000?token=your-token"
  );

  const context = await browser.newContext();
  const page = await context.newPage();

  await page.goto("https://example.com");
  const title = await page.title();
  console.log("Title:", title);

  await browser.close();
})();

REST API with Fetch

async function takeScreenshot(url) {
  const response = await fetch(
    "http://localhost:3000/screenshot?token=your-token",
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        url,
        options: { fullPage: true },
      }),
    }
  );
  return Buffer.from(await response.arrayBuffer());
}

async function scrapeData(url, elements) {
  const response = await fetch(
    "http://localhost:3000/scrape?token=your-token",
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ url, elements }),
    }
  );
  return await response.json();
}

Python Integration

Using Playwright

from playwright.async_api import async_playwright
import asyncio

async def main():
    async with async_playwright() as p:
        browser = await p.chromium.connect_over_cdp(
            'ws://localhost:3000?token=your-token'
        )
        page = await browser.new_page()
        await page.goto('https://example.com')
        title = await page.title()
        print(f'Title: {title}')
        await browser.close()

asyncio.run(main())

REST API with Requests

import requests

def screenshot(url, options=None):
    response = requests.post(
        'http://localhost:3000/screenshot',
        params={'token': 'your-token'},
        json={
            'url': url,
            'options': options or {'fullPage': True}
        }
    )
    return response.content

def scrape(url, elements):
    response = requests.post(
        'http://localhost:3000/scrape',
        params={'token': 'your-token'},
        json={'url': url, 'elements': elements}
    )
    return response.json()

# Usage
screenshot_data = screenshot('https://example.com')
with open('screenshot.png', 'wb') as f:
    f.write(screenshot_data)

Function API

Custom Functions

// POST /function
{
  "code": `
    export default async ({ page }) => {
      await page.goto('https://example.com');
      await page.waitForSelector('.loaded');

      const data = await page.evaluate(() => {
        return {
          title: document.title,
          items: Array.from(document.querySelectorAll('.item'))
            .map(el => ({
              name: el.querySelector('.name').textContent,
              price: el.querySelector('.price').textContent
            }))
        };
      });

      return { data, type: 'application/json' };
    }
  `
}

Advanced Options

Viewport and Device

{
  "url": "https://example.com",
  "viewport": {
    "width": 375,
    "height": 812,
    "deviceScaleFactor": 2,
    "isMobile": true,
    "hasTouch": true
  }
}

Wait Conditions

{
  "url": "https://example.com",
  "waitForSelector": "#content",
  "waitForTimeout": 2000,
  "waitForEvent": "networkidle0",
  "waitForFunction": "window.loaded === true"
}

Authentication

{
  "url": "https://example.com",
  "authenticate": {
    "username": "user",
    "password": "pass"
  },
  "cookies": [
    {
      "name": "session",
      "value": "abc123",
      "domain": "example.com"
    }
  ]
}

Docker Configuration

Full Configuration

services:
  browserless:
    image: ghcr.io/browserless/chromium
    ports:
      - "3000:3000"
    environment:
      - TOKEN=your-secure-token
      - CONCURRENT=10
      - QUEUED=50
      - TIMEOUT=60000
      - CONNECTION_TIMEOUT=30000
      - MAX_PAYLOAD_SIZE=10mb
      - ENABLE_CORS=true
      - HEALTH_CHECK=true
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/"]
      interval: 30s
      timeout: 10s
      retries: 3
    deploy:
      resources:
        limits:
          cpus: "2"
          memory: 4G

Monitoring

# Health check
curl http://localhost:3000/

# Queue stats
curl http://localhost:3000/stats

# Prometheus metrics
curl http://localhost:3000/metrics

Best Practices

Use tokens - Secure your instance
Set timeouts - Prevent hanging sessions
Limit concurrency - Based on resources
Close sessions - Prevent memory leaks
Use health checks - Monitor availability
Queue management - Handle request bursts
Retry logic - Handle transient failures
Resource limits - Set Docker constraints
Use functions - For complex workflows
Monitor metrics - Track performance

When to Use This Skill

Cloud-based browser automation
Scalable screenshot services
PDF generation at scale
Web scraping infrastructure
Headless Chrome as a service
Parallel browser operations