JinDaGe - linkedin-spider MCP Details

🚀 LinkedIn Spider

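Scrape LinkedIn data with ease, with built-in anti-detection to reduce the risk of being flagged. Extract, export, and automate the processing of your LinkedIn data.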

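🚀 Quick Start

Installation

Choose your preferred installation method:

Option 1: Using pip (recommended for general use)

# Install the Python library only
pip install linkedin-spider

# Install with the command-line interface (CLI)
# (the quotes keep shells such as zsh from expanding the brackets)
pip install "linkedin-spider[cli]"

# Install with the MCP server
pip install "linkedin-spider[mcp]"

# Install with all features (CLI + MCP + library)
pip install "linkedin-spider[all]"

Option 2: Development setup with uv

# Clone the repository
git clone https://github.com/vertexcover-io/linkedin-spider
cd linkedin-spider
# Install dependencies with uv
uv sync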

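⚠️ Important Note

Authentication update: LinkedIn has strengthened its anti-scraping measures, which temporarily affects cookie-based authentication. We recommend email/password authentication for reliable access. We are actively working to restore full cookie authentication support.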

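✨ Key Features

Search LinkedIn profiles with advanced filters (location, connection type, current company, job title).
Search and extract LinkedIn posts by keyword, with comprehensive metadata.
Extract complete profile information (work experience, education, skills, contact details).
Retrieve detailed company information.
Retrieve incoming and outgoing connection requests.
Send connection requests to profiles.
Retrieve the conversation list and detailed conversation history.
Built-in anti-detection and session management.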

💻 Usage Examples

1. Python Library

Well suited to integration into your existing Python applications:

from linkedin_spider import LinkedinSpider, ScraperConfig

config = ScraperConfig(headless=True, page_load_timeout=30)

# Authenticate (with email/password or a cookie).
# Authentication is usually needed only once; the session is saved in the Chrome profile
scraper = LinkedinSpider(
    email="your_email@example.com",
    password="your_password",
    config=config
)

# Search profiles
results = scraper.search_profiles("software engineer", max_results=10)

Sample output:

[
  {
    "name": "John Doe",
    "title": "Senior Software Engineer at Google",
    "location": "San Francisco, CA",
    "profile_url": "https://linkedin.com/in/johndoe",
    "connections": "500+"
  },
  {
    "name": "Jane Smith",
    "title": "Software Engineer at Microsoft",
    "location": "Seattle, WA",
    "profile_url": "https://linkedin.com/in/janesmith",
    "connections": "200+"
  }
]
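
Because the search result is a list of plain dictionaries, exporting it needs only the standard library. The following is a minimal sketch, assuming the fields shown in the sample output; `profiles_to_csv` and `FIELDS` are hypothetical names that are not part of linkedin-spider, and the rows are copied from the sample rather than fetched live.

```python
import csv
import io

# Rows copied from the sample output; in practice this would be the list
# returned by scraper.search_profiles(...).
results = [
    {
        "name": "John Doe",
        "title": "Senior Software Engineer at Google",
        "location": "San Francisco, CA",
        "profile_url": "https://linkedin.com/in/johndoe",
        "connections": "500+",
    },
    {
        "name": "Jane Smith",
        "title": "Software Engineer at Microsoft",
        "location": "Seattle, WA",
        "profile_url": "https://linkedin.com/in/janesmith",
        "connections": "200+",
    },
]

# Column order for the export (assumed from the sample output).
FIELDS = ["name", "title", "location", "profile_url", "connections"]


def profiles_to_csv(rows, fields=FIELDS):
    """Serialize profile dicts to CSV text, ignoring unexpected keys."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        # Keys missing from a row are written as empty cells.
        writer.writerow(row)
    return buffer.getvalue()


csv_text = profiles_to_csv(results)
print(csv_text)
```

To write a file instead, the same text can be saved with `open("results.csv", "w", newline="")`.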

# Search posts by keyword
posts = scraper.search_posts("artificial intelligence", max_results=10, scroll_pause=2.0)

Sample output:

[
  {
    "author_name": "John Doe",
    "author_headline": "AI Research Scientist at OpenAI",
    "author_profile_url": "https://linkedin.com/in/johndoe",
    "connection_degree": "2nd",
    "post_time": "2024-01-15T14:30:00+00:00",
    "post_text": "Excited to share our latest research on [large language models](https://example.com/paper)...",
    "hashtags": ["#AI", "#MachineLearning", "#Research"],
    "links": ["https://example.com/paper"],
    "post_url": "https://linkedin.com/feed/update/urn:li:activity:123456789",
    "media_urls": ["https://media.licdn.com/dms/image/..."],
    "likes_count": 1247,
    "comments_count": 89,
    "reposts_count": 234,
    "comments": [
      {
        "author_name": "Jane Smith",
        "author_profile_url": "https://linkedin.com/in/janesmith",
        "comment_text": "Great insights! Looking forward to reading the full paper.",
        "comment_time": "2024-01-15T15:45:00+00:00",
        "reactions_count": 12
      }
    ]
  }
]
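
The count fields in each post make it straightforward to rank results after scraping. A minimal sketch, assuming `likes_count`, `comments_count`, and `reposts_count` are integers as in the sample output; `engagement` and `top_posts` are hypothetical helpers, and the second post is made up for illustration.

```python
def engagement(post):
    """Sum of likes, comments, and reposts; missing counts are treated as 0."""
    return (
        post.get("likes_count", 0)
        + post.get("comments_count", 0)
        + post.get("reposts_count", 0)
    )


def top_posts(posts, limit=5):
    """Return up to `limit` posts, most engaged first."""
    return sorted(posts, key=engagement, reverse=True)[:limit]


posts = [
    # Hypothetical post with no reposts_count field.
    {"author_name": "Jane Smith", "likes_count": 40, "comments_count": 3},
    # Counts taken from the sample output.
    {
        "author_name": "John Doe",
        "likes_count": 1247,
        "comments_count": 89,
        "reposts_count": 234,
    },
]

for post in top_posts(posts):
    print(post["author_name"], engagement(post))
```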

# Scrape a single profile
profile = scraper.scrape_profile("https://linkedin.com/in/someone")

Sample output:

{
  "name": "John Doe",
  "title": "Senior Software Engineer",
  "location": "San Francisco, CA",
  "about": "Passionate software engineer with 8+ years of experience...",
  "experience": [
    {
      "title": "Senior Software Engineer",
      "company": "Google",
      "duration": "2021 - Present",
      "description": "Leading backend development for search infrastructure..."
    }
  ],
  "education": [
    {
      "school": "Stanford University",
      "degree": "BS Computer Science",
      "years": "2013 - 2017"
    }
  ],
  "skills": ["Python", "Java", "Kubernetes", "AWS"]
}

# Scrape company information
company = scraper.scrape_company("https://linkedin.com/company/tech-corp")

Sample output:

{
  "name": "TechCorp Inc",
  "industry": "Software Development",
  "company_size": "1,001-5,000 employees",
  "headquarters": "San Francisco, CA",
  "founded": "2010",
  "specialties": ["Cloud Computing", "AI/ML", "Data Analytics"],
  "description": "Leading technology company focused on enterprise solutions...",
  "website": "https://techcorp.com",
  "follower_count": "45,230"
}

# Don't forget to clean up resources
scraper.close()
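
A bare `close()` call is skipped if an earlier call raises. One way to guarantee cleanup is `contextlib.closing`, which calls `close()` when the `with` block exits, whether normally or through an exception. The sketch below uses a stand-in class, `FakeSpider` (hypothetical), so it runs without a browser or credentials; it assumes only that the real scraper exposes a `close()` method.

```python
from contextlib import closing


class FakeSpider:
    """Stand-in for LinkedinSpider so the sketch runs without a browser."""

    def __init__(self):
        self.closed = False

    def search_profiles(self, query, max_results=10):
        # Returns canned data instead of touching the network.
        return [{"name": "John Doe", "query": query}][:max_results]

    def close(self):
        self.closed = True


spider = FakeSpider()
with closing(spider) as scraper:
    results = scraper.search_profiles("software engineer", max_results=10)

# close() has been called by the time the with block is left.
print(results, spider.closed)
```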

For more examples, see: examples

2. Command-Line Interface (CLI)

Well suited to quick data extraction and scripting:

# If installed via pip
# Search profiles
linkedin-spider-cli search -q "product manager" -n 10 -o results.json --email your@email.com --password yourpassword

# Search posts
linkedin-spider-cli search-posts -k "artificial intelligence" -n 10 -s 2.0 -o posts.json --email your@email.com --password yourpassword

# Scrape a single profile
linkedin-spider-cli profile -u "https://linkedin.com/in/johndoe" -o profile.json --email your@email.com --password yourpassword

# Scrape company information
linkedin-spider-cli company -u "https://linkedin.com/company/openai" -o company.json --email your@email.com --password yourpassword

# Get connection requests
linkedin-spider-cli connections -n 20 -o connections.json --email your@email.com --password yourpassword

# If using the development setup
# Search profiles
uv run linkedin-spider-cli search -q "product manager" -n 10 -o results.json --email your@email.com --password yourpassword

# Search posts
uv run linkedin-spider-cli search-posts -k "artificial intelligence" -n 10 -s 2.0 -o posts.json --email your@email.com --password yourpassword

# Scrape a single profile
uv run linkedin-spider-cli profile -u "https://linkedin.com/in/johndoe" -o profile.json --email your@email.com --password yourpassword

# Scrape company information
uv run linkedin-spider-cli company -u "https://linkedin.com/company/openai" -o company.json --email your@email.com --password yourpassword

# Get connection requests
uv run linkedin-spider-cli connections -n 20 -o connections.json --email your@email.com --password yourpassword

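💡 Usage Tip

You usually need to provide --email and --password only once. The CLI saves your authenticated session and reuses it for subsequent commands until the session expires (typically after a few hours or days). You can also set the LINKEDIN_EMAIL and LINKEDIN_PASSWORD environment variables to avoid retyping them.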
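
When driving the CLI from a script, relying on the environment variables keeps credentials out of the argument list. The following is a rough sketch, not part of linkedin-spider: `build_search_command` and `missing_credentials` are hypothetical helpers, and only the `search` subcommand with its `-q`, `-n`, and `-o` flags is taken from the commands above.

```python
import os

# The two variable names mentioned in the tip above.
CREDENTIAL_VARS = ("LINKEDIN_EMAIL", "LINKEDIN_PASSWORD")


def build_search_command(query, limit, output_path):
    """Return the argv list for a profile search, without credentials."""
    return [
        "linkedin-spider-cli", "search",
        "-q", query,
        "-n", str(limit),
        "-o", output_path,
    ]


def missing_credentials(environ=None):
    """List the credential variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in CREDENTIAL_VARS if not environ.get(name)]


command = build_search_command("product manager", 10, "results.json")
print(command)
print("missing:", missing_credentials({"LINKEDIN_EMAIL": "your@email.com"}))
```

The resulting list can be handed to `subprocess.run(command, check=True)`. A non-empty result from `missing_credentials()` is not necessarily an error, since a saved session may still be valid.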

3. MCP Server

Set the environment variables in a .env file:

# Authentication (choose one method)
LINKEDIN_EMAIL=your_email@example.com
LINKEDIN_PASSWORD=your_password
# or
LINKEDIN_COOKIE=your_li_at_cookie_value

# Configuration
HEADLESS=true

# Transport (optional, defaults to stdio)
TRANSPORT=sse
HOST=127.0.0.1
PORT=8000
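
A quick pre-flight check of the .env file can catch a missing credential or a mistyped transport before the server starts. This is a minimal standard-library sketch, not how linkedin-spider itself loads configuration: `parse_env` and `resolve_settings` are hypothetical helpers, and the HOST, PORT, and HEADLESS fallbacks are assumptions taken from the sample values rather than documented defaults.

```python
VALID_TRANSPORTS = {"stdio", "sse", "http"}  # the transports named in this README

ENV_TEXT = """\
# Authentication (choose one method)
LINKEDIN_EMAIL=your_email@example.com
LINKEDIN_PASSWORD=your_password

HEADLESS=true
TRANSPORT=sse
HOST=127.0.0.1
PORT=8000
"""


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def resolve_settings(values):
    """Validate parsed values and return a normalized settings dict."""
    if values.get("LINKEDIN_EMAIL") and values.get("LINKEDIN_PASSWORD"):
        auth = "password"
    elif values.get("LINKEDIN_COOKIE"):
        auth = "cookie"
    else:
        raise ValueError("set LINKEDIN_EMAIL and LINKEDIN_PASSWORD, or LINKEDIN_COOKIE")

    transport = values.get("TRANSPORT", "stdio")  # stdio is the stated default
    if transport not in VALID_TRANSPORTS:
        raise ValueError(f"unknown transport: {transport!r}")

    return {
        "auth": auth,
        "transport": transport,
        # Assumed fallbacks, taken from the sample values.
        "host": values.get("HOST", "127.0.0.1"),
        "port": int(values.get("PORT", "8000")),
        "headless": values.get("HEADLESS", "true").lower() == "true",
    }


settings = resolve_settings(parse_env(ENV_TEXT))
print(settings)
```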

Start the MCP server:

# If installed via pip
# Show the available transport options
linkedin-spider-mcp

# Start with a specific transport
linkedin-spider-mcp serve sse --email your@email.com --password yourpassword
linkedin-spider-mcp serve http --host 0.0.0.0 --port 9000 --email your@email.com --password yourpassword
linkedin-spider-mcp serve stdio --email your@email.com --password yourpassword

# Or use environment variables
TRANSPORT=sse linkedin-spider-mcp serve

# If using the development setup
# Show the available transport options
uv run linkedin-spider-mcp

# Start with a specific transport
uv run linkedin-spider-mcp serve sse --email your@email.com --password yourpassword
uv run linkedin-spider-mcp serve http --host 0.0.0.0 --port 9000 --email your@email.com --password yourpassword
uv run linkedin-spider-mcp serve stdio --email your@email.com --password yourpassword

# Or use environment variables
TRANSPORT=sse uv run linkedin-spider-mcp serve

Integrating with Claude Code

# Add to Claude Code
claude mcp add linkedin-spider --transport sse <server-url>
# Example server URL format: http://localhost:8080/sse

Integrating with Claude Desktop

Add the following to your Claude Desktop configuration file:

Windows: %APPDATA%\Claude\claude_desktop_config.json
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json

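Option 1: Using Docker (recommended)

Docker provides a reliable, isolated execution environment that includes all dependencies. First, build the Docker image:

# Build the stdio server image
docker build -f Dockerfile.stdio -t linkedin-mcp-stdio .

Then add the following to your Claude Desktop configuration file: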

{
  "mcpServers": {
    "linkedin-spider": {
      "command": "docker",
      "args": [
        "run",
        "--rm",
        "-i",
        "-e",
        "LINKEDIN_EMAIL=your_email@example.com",
        "-e",
        "LINKEDIN_PASSWORD=your_password",
        "-e",
        "HEADLESS=true",
        "-e",
        "TRANSPORT=stdio",
        "linkedin-mcp-stdio"
      ]
    }
  }
}
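
If the configuration file already lists other MCP servers, the new entry has to be merged in rather than pasted over the file. The following is a small sketch under that assumption: `with_linkedin_spider` is a hypothetical helper, the `other-server` entry is made up for illustration, and the Docker arguments mirror the configuration shown above.

```python
import copy
import json


def with_linkedin_spider(config, email, password, image="linkedin-mcp-stdio"):
    """Return a copy of `config` with the linkedin-spider server entry added."""
    merged = copy.deepcopy(config)  # leave the caller's dict untouched
    servers = merged.setdefault("mcpServers", {})
    servers["linkedin-spider"] = {
        "command": "docker",
        "args": [
            "run", "--rm", "-i",
            "-e", f"LINKEDIN_EMAIL={email}",
            "-e", f"LINKEDIN_PASSWORD={password}",
            "-e", "HEADLESS=true",
            "-e", "TRANSPORT=stdio",
            image,
        ],
    }
    return merged


# Hypothetical existing configuration with one unrelated server.
existing = {"mcpServers": {"other-server": {"command": "other-command"}}}
updated = with_linkedin_spider(existing, "your_email@example.com", "your_password")
print(json.dumps(updated, indent=2))
```

In practice `existing` would come from `json.load` on the configuration file for your platform, and `updated` would be written back with `json.dump`.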

Docker Development and Testing

For development and testing with Docker, you can use a single image and configure it for different transports:

Build the Docker image

# Build one image that works for all transport types
docker build -t linkedin-mcp .

Run with different transports

SSE server

docker run -p 8000:8000 -e TRANSPORT=sse --env-file .env linkedin-mcp

HTTP server

docker run -p 8000:8000 -e TRANSPORT=http --env-file .env linkedin-mcp

STDIO server

docker run --rm -i -e TRANSPORT=stdio --env-file .env linkedin-mcp

Authentication Methods

Method 1: Using a LinkedIn cookie

Log in to LinkedIn in your browser.
Open the developer tools (F12).
Go to "Application/Storage" → "Cookies" → "linkedin.com".
Copy the value of the li_at cookie.
Use it in your code:

scraper = LinkedinSpider(li_at_cookie="your_cookie_value")

Method 2: Using email and password (recommended)

scraper = LinkedinSpider(
    email="your_email@example.com",
    password="your_password"
)

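🤝 Contributing

Contributions are welcome! See CONTRIBUTING.md for the contribution guidelines.

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.

⚠️ Disclaimer

This tool is intended for personal use only. Please follow these rules:

Comply with LinkedIn's Terms of Service.
Use reasonable request rate limits.
Do not spam or harass users.
Be responsible with the data you collect.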
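
One simple way to keep request rates reasonable is to enforce a minimum gap, plus a little random jitter, between consecutive calls. The sketch below is an illustration only: `Throttle` is a hypothetical class that is not part of linkedin-spider, and the short intervals are chosen so the demo finishes quickly. Real scraping would call for gaps of several seconds.

```python
import random
import time


class Throttle:
    """Block so that successive wait() calls are at least min_interval apart."""

    def __init__(self, min_interval, jitter=0.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.jitter = jitter
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous wait() return, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            # Target gap is the minimum interval plus up to `jitter` extra seconds.
            target = self.min_interval + random.uniform(0.0, self.jitter)
            remaining = target - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()


throttle = Throttle(min_interval=0.05, jitter=0.05)
for url in ["https://linkedin.com/in/johndoe", "https://linkedin.com/in/janesmith"]:
    throttle.wait()
    # A real script would call scraper.scrape_profile(url) here.
    print("would fetch", url)
```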
