Category: Data & Analytics · No API key required

spotify-news-digest

Fetches and aggregates Spotify-related news from multiple sources (the official Spotify blog, Engineering/Research/Newsroom, TechCrunch, The Verge, Music Business, and others).

Author: ibillxia · clawhub

Spotify News Digest Skill

Overview

This skill aggregates Spotify-related news from 9+ sources (official Spotify blogs, media outlets, and community discussion), deduplicates articles, ranks them by relevance, and delivers a formatted Chinese digest with a one-sentence summary and the original link for each article.

Key Features:

  • 🎵 Covers Spotify's full content ecosystem (Engineering, Newsroom, Research, Design)
  • 📰 Pulls media coverage from TechCrunch, The Verge, Music Business Worldwide, Forbes
  • 💬 Includes Hacker News community discussions via Algolia API
  • 🔍 DDG News search fallback ensures coverage even when RSS is restricted
  • 🧹 Title-similarity deduplication (threshold: 0.65)
  • 📊 Multi-factor ranking: source authority + recency + community score
  • 🤖 One-sentence Chinese summaries generated by LLM at render time
  • ⏰ Configurable time range (hours or days)

⚠️ Security Notes

Read this before scheduling or running in any environment with internal network access.

  • TLS verification is enabled. The fetcher uses verify=True on all HTTP requests. Do not add any ssl._create_default_https_context overrides.
  • Domain allowlist enforced for search results. DDG News results are filtered through ALLOWED_DDG_DOMAINS (defined at the top of scripts/fetch_spotify_news.py). Only well-known public news domains pass; internal hostnames are rejected. Review and edit that list before running in sensitive environments.
  • RSS sources are explicit and fixed. The base sources in config/sources.json are official Spotify feeds and a few trusted media outlets. Do not add intranet or metadata-service URLs to sources.json.
  • Run in isolation when scheduling. If you plan to schedule this skill, run it in a container or VM that has no access to internal services or secrets. Otherwise, a DDG result that slips past a misconfigured allowlist could cause the fetcher to contact hosts on your internal network.
  • Audit pip dependencies before installing in production: feedparser, beautifulsoup4, requests, python-dateutil, ddgs.
  • Cron delivery scope. When you ask OpenClaw to schedule this skill, it creates an isolated agentTurn cron job. Confirm the target channel is a group/recipient you intend before confirming the job.

Quick Start

One-Time Digest (Last 24 Hours)

cd /projects/.openclaw/skills/spotify-news-digest
python3 scripts/generate_digest.py

Last 7 Days (Broader Coverage)

python3 scripts/generate_digest.py --hours 168

Save to Markdown File

python3 scripts/generate_digest.py --hours 24 --output /tmp/spotify_digest.md

News Sources

| Source | Type | URL / Method | Category |
|--------|------|-------------|----------|
| Spotify Engineering Blog | RSS | engineering.atspotify.com/feed/ | official |
| Spotify Newsroom | RSS | newsroom.spotify.com/feed/ | official |
| Spotify Research | RSS | research.atspotify.com/feed/ | research |
| Spotify Design | RSS | spotify.design/feed | design |
| TechCrunch Spotify | RSS | techcrunch.com/tag/spotify/feed/ | media |
| The Verge (filtered) | RSS | Verge full feed + keyword filter | media |
| Hacker News Spotify | Algolia API | hn.algolia.com query=spotify | community |
| DDG News Search | DDGS API | 5 queries × 8 results | media/official |

Note: Music Business Worldwide and Billboard RSS feeds have high latency and are disabled by default (_disabled: true in sources.json). Their articles are captured via DDG News Search instead.
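The Hacker News row refers to Algolia's public HN Search API. The following is a minimal sketch of how such a query URL could be built, assuming the `search_by_date` endpoint and a `created_at_i` numeric filter for recency. The function name `build_hn_query` and the parameter choices are illustrative and are not taken from the skill's code.

```python
import time
from typing import Optional
from urllib.parse import urlencode

# Public HN Search endpoint that returns results sorted by date.
HN_SEARCH_URL = "https://hn.algolia.com/api/v1/search_by_date"


def build_hn_query(hours: int = 24, now: Optional[float] = None, per_page: int = 20) -> str:
    """Return a search URL for Spotify stories newer than `hours` hours.

    `build_hn_query` is a hypothetical helper; the skill may build its
    request differently.
    """
    now = time.time() if now is None else now
    cutoff = int(now - hours * 3600)  # Unix timestamp of the oldest accepted story
    params = {
        "query": "spotify",
        "tags": "story",
        "numericFilters": f"created_at_i>{cutoff}",
        "hitsPerPage": per_page,
    }
    return f"{HN_SEARCH_URL}?{urlencode(params)}"
```

The resulting URL can be fetched with `requests.get(url, timeout=12, verify=True)`; the JSON response carries the matching stories under the `hits` key.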


Output Format

Each digest is grouped by category with one-sentence Chinese summaries:

🎵 Spotify 新闻日报 · YYYY-MM-DD
共 N 条(去重后)
─────────────────────────────

🎵 官方动态(N 条)
· [一句话中文总结](Source Name)
  🔗 https://...

📰 媒体报道(N 条)
· [一句话中文总结](Source Name)
  🔗 https://...

🔬 技术研究(N 条)
· ...

─────────────────────────────
🤖 由 OpenClaw · spotify-news-digest 自动生成

Category Labels

| Category Key | Display Label |
|-------------|--------------|
| official | 🎵 官方动态 |
| research | 🔬 技术研究 |
| design | 🎨 产品设计 |
| media | 📰 媒体报道 |
| community | 💬 社区讨论 |
| industry | 🏭 行业资讯 |
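To make the mapping from category keys to section headers concrete, here is a rough sketch of a renderer for the per-category sections. It assumes the processed result is a dict of category key to a list of article dicts, and that each article carries `title`, `source`, and `url` fields; those field names and the helper `render_sections` are assumptions for illustration, and the skill's own `format_digest()` may be implemented differently.

```python
# Display labels copied from the Category Labels table.
CATEGORY_LABELS = {
    "official": "🎵 官方动态",
    "research": "🔬 技术研究",
    "design": "🎨 产品设计",
    "media": "📰 媒体报道",
    "community": "💬 社区讨论",
    "industry": "🏭 行业资讯",
}


def render_sections(result: dict) -> str:
    """Render each non-empty category as a labelled block (hypothetical helper)."""
    lines = []
    for key, label in CATEGORY_LABELS.items():
        items = result.get(key, [])
        if not items:
            continue  # skip categories with no articles
        lines.append(f"{label}({len(items)} 条)")
        for item in items:
            # Fall back to the bracketed original title when no zh_summary is set.
            text = item.get("zh_summary") or f"[{item['title']}]"
            lines.append(f"· {text}({item.get('source', '')})")
            lines.append(f"  🔗 {item.get('url', '')}")
        lines.append("")  # blank line between sections
    return "\n".join(lines)
```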


Usage Examples

1. As Python Module

import sys
sys.path.insert(0, '/projects/.openclaw/skills/spotify-news-digest/scripts')

from fetch_spotify_news import SpotifyNewsFetcher
from process_spotify_news import SpotifyNewsProcessor, format_digest

# Fetch articles from the last 48 hours
articles = SpotifyNewsFetcher().fetch_all(hours=48)
print(f"Fetched {len(articles)} articles")

# Deduplicate, score, and group by category
result = SpotifyNewsProcessor().process(articles, max_output=20)

# Render Markdown digest
md = format_digest(result, date_str='2026-03-17')
print(md)

2. LLM-Enhanced Summaries (Recommended)

The format_digest() function outputs [English Title] placeholders when no zh_summary is set on an article. The calling LLM should:

  1. Run generate_digest.py to get the raw structured result
  2. For each article, read title + summary and generate a one-sentence Chinese summary
  3. Set item['zh_summary'] before calling format_digest()

This two-step flow keeps the scraping fast and the summarization accurate.

# Example: LLM fills zh_summary before formatting.
# llm_summarize_zh is a placeholder for the caller's own LLM call; the skill does not define it.
for cat_items in result.values():
    for item in cat_items:
        item['zh_summary'] = llm_summarize_zh(item['title'], item['summary'])

digest = format_digest(result)

3. Scheduled Daily Digest (Cron via OpenClaw)

Ask OpenClaw to set up a daily cron job:

"Send a Spotify news digest to [group/channel] every day at 10 a.m." (original prompt: "每天上午 10 点发一份 Spotify 新闻日报到 [群/频道]")

OpenClaw will create an isolated agentTurn cron job that:

  1. Runs this skill
  2. Generates Chinese summaries with LLM
  3. Posts the digest to the specified channel

Configuration

Add / Remove Sources

Edit config/sources.json:

{
  "sources": [
    {
      "name": "My Custom Source",
      "type": "rss",
      "url": "https://example.com/feed/",
      "language": "en",
      "category": "media",
      "keyword_filter": "spotify"
    }
  ],
  "settings": {
    "max_news_per_source": 15,
    "final_output_count": 20,
    "similarity_threshold": 0.65,
    "timeout": 12,
    "keyword_filter_default": "spotify"
  }
}

keyword_filter: When set, only articles containing this keyword (case-insensitive) in title or summary are included. Leave empty for official Spotify feeds that are already Spotify-only.

_disabled: true: Mark a source to skip it at runtime without deleting it.
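The behaviour of these two fields can be sketched as follows. This is an illustration under the assumption that articles are dicts with `title` and `summary` keys; the helper names `active_sources` and `matches_keyword` are hypothetical and do not come from the skill's scripts.

```python
def active_sources(config: dict) -> list:
    """Return the sources that are not flagged with `_disabled: true`."""
    return [s for s in config.get("sources", []) if not s.get("_disabled", False)]


def matches_keyword(article: dict, keyword: str) -> bool:
    """Case-insensitive match on title or summary.

    An empty keyword matches everything, which mirrors leaving
    `keyword_filter` empty for feeds that are already Spotify-only.
    """
    if not keyword:
        return True
    haystack = f"{article.get('title', '')} {article.get('summary', '')}".lower()
    return keyword.lower() in haystack
```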

Security: Only add public news domains to sources.json. Do not add intranet, VPN, or cloud metadata URLs — they could be fetched directly via RSS without domain-allowlist filtering.

Extend the DDG Domain Allowlist

If you add sources that are reachable via DDG News search (not RSS), also add their domain to ALLOWED_DDG_DOMAINS near the top of scripts/fetch_spotify_news.py:

ALLOWED_DDG_DOMAINS: tuple = (
    'atspotify.com',
    'techcrunch.com',
    ...
    'your-new-domain.com',  # ← add here
)
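A sketch of how a result URL might be checked against such an allowlist is shown below. It assumes suffix matching on the hostname, so that subdomains of a listed domain pass while look-alike hosts do not; the two-entry tuple and the `is_allowed` helper are illustrative and may not match the logic in `fetch_spotify_news.py`.

```python
from urllib.parse import urlparse

# Shortened, illustrative allowlist; the real list lives in fetch_spotify_news.py.
ALLOWED_DDG_DOMAINS = ("atspotify.com", "techcrunch.com")


def is_allowed(url: str, allowed=ALLOWED_DDG_DOMAINS) -> bool:
    """Return True only for http(s) URLs whose host is a listed domain or a subdomain of one."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    host = (parsed.hostname or "").lower().rstrip(".")
    # Exact match, or a subdomain separated by a dot; a bare suffix match
    # would wrongly accept hosts such as "notatspotify.com".
    return any(host == d or host.endswith("." + d) for d in allowed)
```

Matching on the dot boundary rather than on a plain substring is what keeps hosts such as `techcrunch.com.evil.example` and IP literals out.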

Tune Deduplication

In scripts/process_spotify_news.py:

processor = SpotifyNewsProcessor(similarity_threshold=0.65)
# 0.5 = aggressive dedup | 0.8 = loose dedup
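To see why a lower threshold removes more articles, consider this sketch of title-similarity deduplication. It uses `difflib.SequenceMatcher` from the standard library as the similarity measure, which is an assumption: the metric the skill actually uses is not documented here, and the `dedupe` helper is hypothetical.

```python
from difflib import SequenceMatcher


def title_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalised titles."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def dedupe(articles: list, threshold: float = 0.65) -> list:
    """Keep an article only if its title is below `threshold` against every kept title."""
    kept = []
    for art in articles:
        if all(title_similarity(art["title"], k["title"]) < threshold for k in kept):
            kept.append(art)
    return kept
```

With this scheme, a pair of titles is treated as duplicate once their ratio reaches the threshold, so lowering the value makes more pairs count as duplicates.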

Tune Source Authority Weights

In process_spotify_news.py, edit source_weight:

source_weight = {
    'Spotify Engineering Blog': 90,
    'Spotify Newsroom': 80,
    'TechCrunch': 60,
    # Add your custom sources here
}
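One way these weights could feed into the multi-factor ranking (source authority + recency + community score) is sketched below. The default weight, the 24-hour exponential decay, and the capped `points` field are all assumptions made for illustration; the actual formula in `process_spotify_news.py` is not shown in this document.

```python
import math
from datetime import datetime, timedelta, timezone

SOURCE_WEIGHT = {
    "Spotify Engineering Blog": 90,
    "Spotify Newsroom": 80,
    "TechCrunch": 60,
}
DEFAULT_WEIGHT = 40  # assumption: fallback for sources not listed above


def score(article: dict, now: datetime) -> float:
    """Combine authority, recency, and community signals into one number (illustrative)."""
    authority = SOURCE_WEIGHT.get(article["source"], DEFAULT_WEIGHT)
    age_hours = max((now - article["published"]).total_seconds() / 3600, 0)
    recency = 50 * math.exp(-age_hours / 24)  # assumption: decays by a factor of e per day
    community = min(article.get("points", 0), 500) / 10  # assumption: capped HN points
    return authority + recency + community


now = datetime(2026, 3, 17, 12, tzinfo=timezone.utc)
fresh_official = {"source": "Spotify Engineering Blog", "published": now - timedelta(hours=2)}
old_media = {"source": "TechCrunch", "published": now - timedelta(hours=72), "points": 120}
ranked = sorted([old_media, fresh_official], key=lambda a: score(a, now), reverse=True)
```

In this example the recent official post ranks ahead of the older media article even though the latter has community points.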

Command-Line Reference

python3 scripts/generate_digest.py [OPTIONS]

Options:
  --hours N     Time range in hours (default: 24)
  --max N       Max articles to output after dedup (default: 20)
  --output PATH Save Markdown digest to file

python3 scripts/fetch_spotify_news.py [OPTIONS]

Options:
  --hours N     Fetch articles published within last N hours
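For readers wrapping the CLI, the documented `generate_digest.py` options correspond to an `argparse` definition along these lines. This is a sketch reconstructed from the option list above, not the script's actual source; the `build_parser` helper and the `max_output` destination name are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented generate_digest.py options (illustrative)."""
    parser = argparse.ArgumentParser(prog="generate_digest.py")
    parser.add_argument("--hours", type=int, default=24,
                        help="Time range in hours")
    parser.add_argument("--max", type=int, default=20, dest="max_output",
                        help="Max articles to output after dedup")
    parser.add_argument("--output", metavar="PATH", default=None,
                        help="Save Markdown digest to file")
    return parser
```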

Troubleshooting

No articles fetched (0 条)

Most likely a network restriction on RSS endpoints.

# Test direct RSS access
curl -I https://engineering.atspotify.com/feed/

# Test DDG search fallback
python3 -c "
from ddgs import DDGS
with DDGS() as d:
    r = list(d.news('spotify new feature', max_results=3, timelimit='w'))
    print(r)
"

If RSS is blocked but DDG works, the skill will still return results via the search fallback. This is normal in restricted network environments.

ddgs not installed

pip3 install ddgs

duckduckgo-search (older package name) is also supported as a fallback.
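A package-name fallback of this kind is commonly written as a guarded import. The sketch below shows the pattern, assuming both packages expose a `DDGS` class; the helper name `load_ddgs_class` is hypothetical and the skill's own import logic may differ.

```python
def load_ddgs_class():
    """Return the DDGS class from whichever package is installed, or None."""
    try:
        from ddgs import DDGS  # current package name
        return DDGS
    except ImportError:
        pass
    try:
        from duckduckgo_search import DDGS  # older package name
        return DDGS
    except ImportError:
        return None  # neither package is available; caller can skip the search fallback
```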

Slow fetch / timeout

Sources with slow RSS feeds are marked _disabled: true in sources.json. To disable an additional slow source:

{ "name": "...", "_disabled": true, ... }

Too many duplicate articles

Lower the similarity threshold:

SpotifyNewsProcessor(similarity_threshold=0.5)

Articles missing Chinese summary

The format_digest() function wraps untranslated titles in [brackets]. This is intentional — the LLM caller should fill zh_summary before rendering. See Usage Examples → LLM-Enhanced Summaries above.


File Structure

spotify-news-digest/
├── SKILL.md                    ← You are here
├── config/
│   └── sources.json            ← News source definitions & settings
├── scripts/
│   ├── fetch_spotify_news.py   ← Multi-source fetcher (RSS + DDG)
│   ├── process_spotify_news.py ← Dedup, scoring, formatting
│   └── generate_digest.py      ← CLI entry point (fetch → process → print)
└── references/
    └── (reserved for future API reference docs)

Dependencies

pip3 install feedparser beautifulsoup4 requests python-dateutil ddgs

| Package | Purpose |
|---------|---------|
| feedparser | RSS feed parsing |
| beautifulsoup4 | HTML summary cleanup |
| requests | HTTP requests for RSS |
| python-dateutil | Robust date parsing |
| ddgs | DuckDuckGo News search fallback |

Python: 3.8+


Design Notes

  • Why DDG News over site-specific scrapers? RSS feeds from Spotify blogs occasionally return 0 results due to network restrictions. DDG News search provides a reliable fallback that requires no authentication and respects timelimit for recency filtering.
  • Why one-sentence Chinese summaries? The target audience (WeChat Work groups) prefers dense, scannable content. English titles are preserved in the source data so users can verify accuracy.
  • Why not store summaries? Summaries are generated at render time by the calling LLM, keeping the skill stateless and reproducible.

Changelog

| Version | Date | Changes |
|---------|------|---------|
| 1.1.0 | 2026-07-14 | Security: remove SSL bypass, add domain allowlist for DDG results, explicit verify=True on all requests, updated security guidance |
| 1.0.0 | 2026-03-17 | Initial release: RSS + DDG, 9 sources, category grouping, Chinese summary support |