Back to skills
extension
Category: Data & AnalyticsAPI key required

CrawlHub

CrawlHub is a professional web data extraction platform that provides structured data from social media and messaging platforms (X/Twitter, Instagram, Telegr...

personAuthor: wolflabs88hubclawhub

CrawlHub Integration Skill

CrawlHub is a professional web data extraction platform that provides structured, normalized data from major social media and messaging platforms — via a clean REST API.

What CrawlHub Does

CrawlHub handles all the hard parts of web scraping:

  • Proxies & rate limit handling — avoiding IP blocks
  • Anti-bot circumvention — making requests look like real browsers
  • Parsing & normalization — turning raw HTML/JSON into clean structured records
  • Data delivery — via API (JSON), webhook, or push to S3/Postgres/warehouse

Supported platforms include: X/Twitter, Instagram, Telegram, LinkedIn, YouTube, TikTok, Facebook, Threads — and more.

Platform Overview

| Platform | Data Types Available | |---|---| | X / Twitter | User profiles, tweets, timelines, search, trending topics | | Instagram | User profiles, posts, comments, hashtags, followers | | Telegram | Channels, messages, groups, public content | | LinkedIn | Company profiles, posts, job listings, people data | | YouTube | Video metadata, channels, comments, search | | TikTok | User profiles, videos, trending content | | Facebook | Pages, posts, groups, public content | | Threads | Posts, user profiles, threads search | | + more | CrawlHub adds new platforms regularly |

API Reference

Base URL: https://api.thecrawlhub.com/api/v1

Authentication:

  • Login: POST /auth/login with {"email": "...", "password": "..."} → returns access_token and refresh_token
  • Use: Authorization: Bearer {access_token} header on all requests
  • Refresh: POST /auth/refresh with {"refresh_token": "..."}

Key Endpoints:

Platform Discovery

GET /scraper/platforms                          → List all available platforms
GET /scraper/platforms/{platform_id}             → List modules & endpoints of a platform
GET /scraper/endpoints/{endpoint_id}           → Get detailed info for a specific endpoint

Data Execution

GET  /execution/endpoints/{endpoint_id}/execute     → Execute with query params
POST /execution/endpoints/{endpoint_id}/execute     → Execute with JSON body
PATCH /execution/endpoints/{endpoint_id}/execute    → Partial update style execution
PUT  /execution/endpoints/{endpoint_id}/execute     → Full replacement style execution
DELETE /execution/endpoints/{endpoint_id}/execute    → Delete style execution

Authentication & Users

POST /auth/register       → Register new account
POST /auth/login          → Login (email + password)
POST /auth/refresh        → Refresh access token
POST /auth/logout         → Revoke tokens
POST /auth/password-reset → Request password reset email
GET  /auth/token-validate  → Validate current JWT

Team Management

GET  /teams                        → List user's teams
POST /teams                        → Create a new team
GET  /teams/{team_id}              → List team members
POST /teams/{team_id}/invite       → Invite member to team
DELETE /teams/{team_id}/{member_id} → Remove member
GET  /teams/{team_id}/permissions  → Get current user's permissions
PUT  /teams/{team_id}/{member_id}/role → Change member role
GET  /teams/roles                  → List available team roles
GET  /teams/invite/validate        → Validate invite token
POST /teams/invite/accept          → Accept team invite

API Keys (Team)

GET  /teams/{team_id}/api-keys              → List team's API keys
POST /teams/{team_id}/api-keys              → Create new API key
PATCH /teams/{team_id}/api-keys/{api_key_id} → Enable/disable key
GET  /teams/{team_id}/api-keys/{api_key_id}/permissions → Get permission tree for a key
PUT  /teams/{team_id}/api-keys/{api_key_id}/permissions → Sync/set permissions

Billing & Subscription

GET /teams/{team_id}/billing/cycle          → Current billing cycle
GET /teams/{team_id}/billing/transactions   → Transaction history (paginated)
GET /teams/{team_id}/billing/wallet          → Wallet balance
GET /teams/{team_id}/subscription           → Current subscription plan
POST /teams/{team_id}/subscription          → Switch to different plan
PATCH /teams/{team_id}/subscription/policy  → Update subscription policy
GET /plans                                  → List all available plans

Request Logs

GET /teams/{team_id}/scraper/endpoints/{endpoint_id}/logs  → Request logs for an endpoint
     Query params: page, per_page, from, to, status_code, sort_key, sort_order

User Profile

GET    /user/info    → Get current user info
PATCH  /user/update  → Update profile (name, address, phone, company)

Pricing Model

CrawlHub uses a per-record pricing model:

| Plan | Price | Rate Limit | Best For | |---|---|---|---| | Pay as you go | $1.79 / 1,000 records | 50 req/15min/endpoint | Testing, prototyping | | Scaler | $299/month | 150 req/15min/endpoint | Teams in production | | Business | $999/month | 600 req/15min/endpoint | High-scale data pipelines | | Enterprise | Custom | Custom | Unique requirements, SLAs |

Rate limits are per endpoint. Records are counted in the response (not requests).

Execution Response Format

Successful execution returns:

{
  "data": {
    "records": [
      { "title": "...", "url": "...", "created_at": "...", ... }
    ]
  },
  "http_status": 200
}

Error responses include kind (e.g., BAD_INPUT, ABORT_ERROR, HTTP_ERROR, REGISTRY_ERROR) and details.

Use Cases

  • Brand Intelligence — Monitor brand mentions, sentiment, emerging narratives
  • Competitive Intelligence — Track competitor content, launches, audience movements
  • Threat Intelligence — Surface threats, leaks, coordinated inauthentic activity
  • Crypto & Web3 Intelligence — Monitor tokens, projects, communities across X + Telegram
  • News & Media Monitoring — Breaking event coverage across platforms
  • Lead Generation — Build targeted outreach lists from public platform data
  • Academic Research — Collect public social data for research projects

Authentication Flow (Step by Step)

  1. Register or Login to get tokens:

    POST /auth/login
    Body: {"email": "user@example.com", "password": "password"}
    
    Response: {"data": {"access_token": "...", "refresh_token": "..."}}
    
  2. Use the access token in all subsequent requests:

    Authorization: Bearer eyJhbGc...
    
  3. When token expires, refresh:

    POST /auth/refresh
    Body: {"refresh_token": "eyJhbGc..."}
    
  4. Discover platforms and endpoints:

    GET /scraper/platforms
    GET /scraper/platforms/{platform_id}
    GET /scraper/endpoints/{endpoint_id}
    
  5. Execute an endpoint to get data:

    GET /execution/endpoints/{endpoint_id}/execute?param1=value1&param2=value2
    POST /execution/endpoints/{endpoint_id}/execute
    Body (JSON): {"param1": "value1", "param2": "value2"}
    

Error Handling

| HTTP Status | Kind | Cause | |---|---|---| | 400 | BAD_INPUT | Invalid request parameters | | 401 | AUTH_HEADER_FORMAT | Missing or malformed Authorization header | | 401 | INVALID_CREDENTIALS | Wrong email/password | | 403 | ABORT_ERROR | Permission denied (endpoint-level) | | 404 | REGISTRY_ERROR | Endpoint not found | | 405 | METHOD_NOT_ALLOWED | Wrong HTTP method for endpoint | | 502 | HTTP_ERROR | Upstream platform returned error | | 503 | ABORT_ERROR | Server busy, retry later |

Best Practices

  • Use idempotent retries — pass X-Request-ID header when retrying to avoid duplicate billing
  • Check /plans — before executing to understand your current plan's rate limits
  • Monitor usage — via /teams/{team_id}/billing/transactions and request logs
  • Handle 503s gracefully — implement exponential backoff when server is busy
  • Store access tokens securely — never log them; refresh before expiry

Notes

  • All timestamps are ISO 8601 / date-time format
  • Pagination uses page + per_page (max 100 per page)
  • All list endpoints return paged results
  • API keys (team-level) can have custom permission trees — useful for granular access control
  • CrawlHub adds new platforms and endpoints regularly — check /scraper/platforms periodically