返回 Skill 列表
extension
分类: 效率与办公需要 API Key

Dify Knowledge Base Search

在 Dify 知识库(数据集)中搜索,以获取用于增强 RAG 回答的准确上下文。

person作者: xiaowenzhouhubclawhub

Dify Knowledge Base Search Skill

🔍 Search your Dify Knowledge Base to get accurate, contextual answers

This skill enables AI agents to query Dify datasets for RAG (Retrieval-Augmented Generation) context retrieval. Perfect for knowledge base Q&A, documentation search, and contextual AI responses.

Dify Knowledge Base

✨ Features

  • List Knowledge Bases - Discover all available Dify datasets
  • Smart Search - Query datasets with hybrid, semantic, or keyword search
  • Auto-Discovery - Automatically find available datasets if ID not provided
  • Configurable Results - Adjust top-k, search method, and reranking
  • Error Handling - Graceful error messages for debugging
  • Zero Hardcoding - All configuration via environment variables

🚀 Quick Start

1. Configure Environment Variables

Set up in openclaw.json:

{
  "env": {
    "vars": {
      "DIFY_API_KEY": "${DIFY_API_KEY}",
      "DIFY_BASE_URL": "https://dify.example.com/v1"
    }
  }
}

Environment Variables:

| Variable | Required | Default | Description | |----------|----------|---------|-------------| | DIFY_API_KEY | ✅ Yes | - | Your Dify API Key (from Settings → API) | | DIFY_BASE_URL | ❌ No | http://localhost/v1 | Your Dify instance base URL |

2. Install Dependencies

pip3 install requests

🛠️ Tools

dify_list

Lists all available knowledge bases (datasets) in your Dify instance.

Invocation: dify_list tool

Example Response:

{
  "status": "success",
  "count": 2,
  "datasets": [
    {
      "id": "dataset-abc123",
      "name": "Product Documentation",
      "doc_count": 42,
      "description": "All product guides and tutorials"
    },
    {
      "id": "dataset-xyz789",
      "name": "API Reference",
      "doc_count": 156,
      "description": "REST API documentation"
    }
  ]
}

Usage:

{}

dify_search

Searches a Dify Dataset for relevant context chunks.

Invocation: dify_search tool (mapped to python3 scripts/search.py)

Parameters:

| Parameter | Type | Required | Default | Description | |-----------|------|----------|---------|-------------| | query | string | ✅ Yes | - | Search query or question | | dataset_id | string | ❌ No | Auto-discover | Specific dataset ID to search | | top_k | integer | ❌ No | 3 | Number of results to return | | search_method | string | ❌ No | hybrid_search | Search strategy | | reranking_enable | boolean | ❌ No | false | Enable reranking for better results |

Search Methods:

  • hybrid_search - Combine semantic + keyword search (recommended)
  • semantic_search - Meaning-based similarity search
  • keyword_search - Exact keyword matching

Example Usage:

{
  "query": "How do I configure OpenClaw?",
  "top_k": 5
}
{
  "query": "API authentication methods",
  "dataset_id": "dataset-xyz789",
  "search_method": "semantic_search",
  "reranking_enable": true
}

Example Response:

{
  "status": "success",
  "query": "How do I configure OpenClaw?",
  "dataset_id": "dataset-abc123",
  "count": 3,
  "results": [
    {
      "content": "To configure OpenClaw, edit the openclaw.json file...",
      "score": 0.8923,
      "title": "Installation Guide",
      "document_id": "doc-001"
    },
    {
      "content": "OpenClaw supports environment variables via...",
      "score": 0.8451,
      "title": "Configuration Options",
      "document_id": "doc-002"
    }
  ]
}

📋 Complete Workflow Example

[
  {
    "tool": "dify_list",
    "parameters": {}
  },
  {
    "tool": "dify_search",
    "parameters": {
      "query": "What are the system requirements?",
      "top_k": 5,
      "search_method": "hybrid_search"
    }
  }
]

🔧 Troubleshooting

Common Errors

| Error | Solution | |-------|----------| | Missing DIFY_API_KEY | Set DIFY_API_KEY in environment variables | | Connection refused | Check DIFY_BASE_URL is correct and accessible | | No datasets found | Verify dataset exists in your Dify workspace | | API request failed | Check network connectivity and API key permissions |

Debug Mode

Run manually to see detailed errors:

DIFY_API_KEY=your-key python3 scripts/search.py <<< '{"query":"test"}'

📚 Integration Tips

RAG Pipeline Integration

# Example: Use search results in AI response
results = dify_search(query, top_k=5)
context = "\n".join([r["content"] for r in results["results"]])
final_prompt = f"Answer based on context:\n\n{context}\n\nQuestion: {query}"

Multiple Datasets

For searching across multiple datasets, loop through them:

{
  "query": "Find information about authentication",
  "dataset_id": "dataset-api-docs"
}

Then query another dataset separately.

🔒 Security

  • Never commit API keys - Use environment variables or .env files
  • Rotate keys regularly - Generate new keys in Dify Settings
  • Restrict access - Limit API key permissions where possible

📖 Implementation Details

This skill uses the Dify Dataset API:

  • List Datasets: GET /v1/datasets
  • Search: POST /v1/datasets/{id}/retrieve

For API documentation, see: https://docs.dify.ai/reference/api-reference

📝 Changelog

v1.1.0 (2026-02-08):

  • ✅ Added search method selection (hybrid/semantic/keyword)
  • ✅ Added reranking support
  • ✅ Auto-discovery of datasets
  • ✅ Improved error handling
  • ✅ Removed hardcoded URLs (fully configurable)
  • ✅ Added detailed logging

v1.0.0 (2026-02-06):

  • Initial release
  • Basic list and search functionality