Website Email Scraper Apify Skill
Overview
This skill helps an AI agent run the Apify Website Email Scraper & Phone Finder actor for public website contact extraction from domains and URLs.
Default actor:
- Actor ID: kWfD7C0WpHtIt8VAh
- Actor name: x_guru/website-email-phone-finder
- Store page: https://apify.com/x_guru/website-email-phone-finder
- Console source: https://console.apify.com/actors/kWfD7C0WpHtIt8VAh/source
Use this skill when a user asks to:
- scrape public business emails from website domains
- find emails from company websites, landing pages, contact pages, or domain lists
- enrich lead lists with emails, phones, social profile links, source URLs, and crawl diagnostics
- process domains from Google Maps, CRMs, spreadsheets, directories, search results, Apollo-style lists, or agency prospecting workflows
- return only websites with emails, only websites with any contact, or all scanned websites
- control Apify spend with maxTotalChargeUsd
- export contact rows for Sheets, Airtable, n8n, CRM, BI, CSV, JSON, or agent workflows
Quick Workflow
- Clarify the submitted domains or website URLs and the desired saved result count.
- Use resultMode: "emailsOnly" by default for email lead extraction.
- Use contactsOnly when phone numbers or social profiles are useful even without emails.
- Use allWebsites only when the user needs diagnostics for every submitted website.
- Keep maxPagesPerWebsite at 3 for fast runs; use 5-10 when contacts are likely on staff, team, legal, imprint, or contact pages.
- Set includePersonalData=false when person-like emails or personal LinkedIn profile URLs should be excluded.
- Set a budget guard with Apify maxTotalChargeUsd when spend matters.
- Run scripts/website_email_scraper_actor.py or call the Apify API directly.
- Return compact metrics and website contact rows. Check RUN_SUMMARY for diagnostics when counts are lower than requested.
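The workflow choices above can be sketched as a small helper that turns a user's goal into actor input. This is a minimal illustration, not part of the bundled script: the function name build_actor_input and its parameters (want, deep_crawl, exclude_personal) are hypothetical, while the field names and values come from the steps above.

```python
def build_actor_input(domains, want="emails", deep_crawl=False,
                      exclude_personal=False, max_results=100):
    """Translate the workflow choices into an actor input dict (hypothetical helper)."""
    mode_by_goal = {
        "emails": "emailsOnly",        # default for email lead extraction
        "contacts": "contactsOnly",    # phones/socials are useful even without emails
        "diagnostics": "allWebsites",  # one row for every submitted website
    }
    if want not in mode_by_goal:
        raise ValueError(f"unknown goal: {want!r}")
    return {
        "domains": list(domains),
        "maxResults": max_results,
        "resultMode": mode_by_goal[want],
        # 3 pages keeps runs fast; 5 is the low end of the 5-10 range
        # suggested for staff, team, legal, imprint, or contact pages.
        "maxPagesPerWebsite": 5 if deep_crawl else 3,
        "includePersonalData": not exclude_personal,
    }


payload = build_actor_input(["example.com"], want="contacts", deep_crawl=True)
print(payload["resultMode"], payload["maxPagesPerWebsite"])  # contactsOnly 5
```

The budget guard is deliberately absent from this dict, because maxTotalChargeUsd is an Apify run option rather than an actor input field.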
Payload Rules
- Use domains for bare domains and full website URLs.
- urls and startUrls can be normalized into domains by the runner for agent convenience.
- maxResults is the maximum number of saved dataset rows.
- resultMode must be emailsOnly, contactsOnly, or allWebsites.
- maxPagesPerWebsite must be 1-25; default is 3.
- concurrency must be 1-500; default is 100.
- requestTimeoutSecs must be 2-30; default is 5.
- extractPhones, extractSocials, includePersonalData, and sameDomainOnly are booleans.
- Do not send Google Maps search fields such as searchStringsArray, placeIds, locationQuery, or review fields to this website-only actor.
- Pass maxTotalChargeUsd as an Apify run option, not inside actor input. The included script exposes it as --budget-usd.
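A pre-flight check for these rules might look like the sketch below. It is an assumption about how such validation could be written, not the bundled runner's actual code. The name normalize_payload is hypothetical, startUrls entries are assumed to be either plain strings or {"url": ...} objects, and the unnamed "review fields" are not checked because the rules do not list them.

```python
RESULT_MODES = {"emailsOnly", "contactsOnly", "allWebsites"}
INT_RANGES = {  # field: (min, max, default), taken from the rules above
    "maxPagesPerWebsite": (1, 25, 3),
    "concurrency": (1, 500, 100),
    "requestTimeoutSecs": (2, 30, 5),
}
BOOL_FIELDS = ("extractPhones", "extractSocials", "includePersonalData", "sameDomainOnly")
MAPS_FIELDS = ("searchStringsArray", "placeIds", "locationQuery")


def normalize_payload(raw):
    """Return a cleaned copy of raw, or raise ValueError on a rule violation."""
    payload = dict(raw)
    if "maxTotalChargeUsd" in payload:
        raise ValueError("maxTotalChargeUsd is a run option, not actor input")
    unsupported = [f for f in MAPS_FIELDS if f in payload]
    if unsupported:
        raise ValueError(f"Google Maps fields are not supported: {unsupported}")

    # Fold urls/startUrls into domains, keeping order and dropping duplicates.
    merged = []
    for key in ("domains", "urls", "startUrls"):
        for item in payload.pop(key, None) or []:
            value = item["url"] if isinstance(item, dict) else item
            if value not in merged:
                merged.append(value)
    if not merged:
        raise ValueError("at least one domain or URL is required")
    payload["domains"] = merged

    mode = payload.setdefault("resultMode", "emailsOnly")
    if mode not in RESULT_MODES:
        raise ValueError(f"resultMode must be one of {sorted(RESULT_MODES)}")
    for field, (low, high, default) in INT_RANGES.items():
        value = payload.setdefault(field, default)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or not low <= value <= high:
            raise ValueError(f"{field} must be an integer in {low}-{high}")
    for field in BOOL_FIELDS:
        if field in payload and not isinstance(payload[field], bool):
            raise ValueError(f"{field} must be a boolean")
    return payload
```

For example, normalize_payload({"urls": ["https://example.com/contact"]}) returns a dict whose domains list holds that URL and whose numeric fields carry the documented defaults.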
Authentication
Use the Apify API token from the environment:
export APIFY_TOKEN='apify_api_xxx'
Never hardcode or print the full token in user-facing output.
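One way to follow this rule in Python is to read the token from the environment and mask it before it reaches any log or reply. The helper names load_token and mask_token are illustrative and are not taken from the bundled script.

```python
import os


def load_token(env=None):
    """Read APIFY_TOKEN from the environment, failing early if it is missing."""
    env = os.environ if env is None else env
    token = env.get("APIFY_TOKEN", "").strip()
    if not token:
        raise RuntimeError("APIFY_TOKEN is not set")
    return token


def mask_token(token, visible=4):
    """Keep only the last few characters so output never shows the full token."""
    if len(token) <= visible:
        return "*" * len(token)
    return "*" * (len(token) - visible) + token[-visible:]


print(mask_token("apify_api_abcd1234"))  # **************1234
```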
Script Usage
The bundled script uses only the Python standard library.
Run a quick domain email scrape:
APIFY_TOKEN='apify_api_xxx' \
python3 scripts/website_email_scraper_actor.py quick-domains \
--domains example.com apify.com \
--max-results 50 \
--budget-usd 1
Run with deeper contact-page discovery:
APIFY_TOKEN='apify_api_xxx' \
python3 scripts/website_email_scraper_actor.py quick-domains \
--domains centralrestaurante.com alchemist.dk caitlinmcweeney.com \
--max-results 100 \
--max-pages 5 \
--result-mode emailsOnly \
--budget-usd 1
Run custom JSON:
APIFY_TOKEN='apify_api_xxx' \
python3 scripts/website_email_scraper_actor.py run \
--input-file references/sample_input.json \
--budget-usd 1
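For agents that call the Apify API directly instead of using the script, the request could be assembled roughly as follows. This sketch assumes the Apify API v2 synchronous endpoint run-sync-get-dataset-items, Bearer-token authentication, and maxTotalChargeUsd as a query parameter; confirm all three against the current Apify API reference before relying on them. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

ACTOR_ID = "kWfD7C0WpHtIt8VAh"
API_BASE = "https://api.apify.com/v2"  # assumed base URL of the Apify API


def build_run_request(actor_input, token, budget_usd=None):
    """Build (but do not send) a POST request that runs the actor synchronously."""
    url = f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items"
    if budget_usd is not None:
        # The budget is a run option, so it goes in the query string,
        # never inside the JSON actor input.
        url += "?" + urllib.parse.urlencode({"maxTotalChargeUsd": budget_usd})
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_actor(actor_input, token, budget_usd=None, timeout=300):
    """Send the request and return the parsed dataset items (network call)."""
    request = build_run_request(actor_input, token, budget_usd)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Usage (requires a real token and network access):
# rows = run_actor({"domains": ["example.com"]}, token, budget_usd=1)
```

Synchronous endpoints typically have a time limit, so long runs over large domain lists may be better served by starting the run asynchronously and polling for its dataset.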
Recommended Inputs
Public email leads only
{
"domains": ["centralrestaurante.com", "alchemist.dk", "caitlinmcweeney.com"],
"maxResults": 1000,
"resultMode": "emailsOnly",
"maxPagesPerWebsite": 3,
"concurrency": 100,
"requestTimeoutSecs": 5,
"extractPhones": true,
"extractSocials": true,
"includePersonalData": true,
"sameDomainOnly": true
}
Company inboxes only
{
"domains": ["example.com", "https://example.com/contact"],
"maxResults": 500,
"resultMode": "emailsOnly",
"includePersonalData": false,
"extractPhones": true,
"extractSocials": true
}
Contact records for every website with any public contact
{
"domains": ["example.com", "apify.com"],
"maxResults": 100,
"resultMode": "contactsOnly",
"maxPagesPerWebsite": 5
}
Output Contract
The runner returns JSON:
- ok
- actorId
- fetchedAt
- inputUsed
- itemCount
- rows[]
Rows are actor dataset items. Important groups:
- Website identity: input, url, domain, status
- Emails: emails, emailDetails.email, emailDetails.type, emailDetails.sourceUrl, emailDetails.domainMatch
- Contacts: phones, socialLinks, facebooks, instagrams, linkedIns, twitters, youtubes, tiktoks
- Crawl diagnostics: contactSignals, pagesFetched, fetchedUrls, httpStatusCodes, errors, durationMs
For the full contract, read references/input-output-contract.md.
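To hand rows to Sheets, a CRM, or a CSV consumer, the dataset items can be flattened into one line per website. The sketch below assumes that emails and phones are lists of strings and that missing fields should become empty cells; the sample row, including the status value "ok", is made up for illustration.

```python
import csv
import io


def rows_to_csv(rows):
    """Flatten actor dataset rows into CSV text with one line per website."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["domain", "status", "emails", "phones", "pagesFetched"])
    for row in rows:
        writer.writerow([
            row.get("domain", ""),
            row.get("status", ""),
            "; ".join(row.get("emails") or []),  # assumed: list of strings
            "; ".join(row.get("phones") or []),  # assumed: list of strings
            row.get("pagesFetched", ""),
        ])
    return buffer.getvalue()


sample = [{
    "domain": "example.com",
    "status": "ok",  # hypothetical status value
    "emails": ["info@example.com", "sales@example.com"],
    "phones": [],
    "pagesFetched": 3,
}]
print(rows_to_csv(sample), end="")
```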
Agent Response Rules
- If rows are empty, say the run succeeded but no website contact rows matched the selected mode, then suggest checking RUN_SUMMARY.
- If fewer rows than requested are returned, explain that the submitted websites had fewer public contacts, the result mode filtered rows out, or the budget limit stopped further rows from being saved.
- If emails is empty in contactsOnly or allWebsites, explain that the row was saved due to phone/social/diagnostic data.
- Explain website email extraction as best-effort because each website controls what it publishes.
- Use maxTotalChargeUsd for any user concerned about spend.
- Do not promise Google Maps place discovery from this actor. Use the Google Maps Email Extractor actor when the user needs search-by-keyword/location first.
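The first four rules above can be sketched as a function that derives caveats from a finished run. The name response_notes and the exact wording of each note are illustrative; only the conditions come from the rules.

```python
def response_notes(rows, requested, result_mode):
    """Return the caveats an agent should mention for a finished run."""
    if not rows:
        return [
            "The run succeeded, but no website contact rows matched "
            f"resultMode={result_mode}. Check RUN_SUMMARY for diagnostics."
        ]
    notes = []
    if len(rows) < requested:
        notes.append(
            f"Returned {len(rows)} of {requested} requested rows: the websites "
            "had fewer public contacts, the result mode filtered rows out, or "
            "the budget limit stopped further rows from being saved."
        )
    if result_mode in ("contactsOnly", "allWebsites"):
        without_email = sum(1 for row in rows if not row.get("emails"))
        if without_email:
            notes.append(
                f"{without_email} row(s) have no email and were saved for "
                "their phone, social, or diagnostic data."
            )
    notes.append(
        "Website email extraction is best-effort because each website "
        "controls what it publishes."
    )
    return notes


for note in response_notes([{"domain": "example.com", "emails": []}], 5, "contactsOnly"):
    print("-", note)
```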
References
- references/input-output-contract.md
- references/sample_input.json
- references/troubleshooting.md