pynex-dev/author-watch

author-watch

Monitor specific authors across multiple publication venues and get notified whenever they publish something new.

Most RSS readers let you follow a publication — but not a specific author within it. author-watch solves that: define the authors you care about, point it at their publication pages or RSS feeds, and it tracks new articles across all of them. When something new appears, it can notify you via Telegram and/or automatically save it to Readwise Reader.


Features

  • Author-specific tracking — follow individual writers, not whole publications
  • Multiple venues per author — a single author can publish on a think tank site, a Substack, and a magazine; watch all of them
  • RSS + HTML scraping — RSS where available, CSS-selector HTML scraping as fallback
  • Deduplication — same article found on two venues in one run is only reported once
  • Telegram notifications — get a message with title + link when something new appears
  • Readwise Reader integration — articles are automatically saved and tagged by author name
  • Stateful — tracks what's been seen so you only hear about genuinely new articles
  • Dry-run mode — test your config without sending any notifications
  • Venue discovery — search the web to find new publication venues for any author, then interactively add them to your watch list
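A minimal sketch of how in-run deduplication could work, assuming articles are plain dicts with a `url` key and that two links are "the same article" when they agree after light normalization. The helper names `canonical_url` and `dedupe` are hypothetical and not taken from watcher.py.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_url(url):
    """Normalize a URL so trivially different links compare equal."""
    parts = urlsplit(url.strip())
    # Assumption: utm_* tracking parameters never identify a distinct article.
    query = urlencode(
        [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
         if not k.lower().startswith("utm_")]
    )
    path = parts.path.rstrip("/") or "/"
    # The fragment is dropped; scheme and host are lowercased.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, query, ""))


def dedupe(articles):
    """Keep the first occurrence of each article, preserving order."""
    seen = set()
    unique = []
    for article in articles:
        key = canonical_url(article["url"])
        if key not in seen:
            seen.add(key)
            unique.append(article)
    return unique
```

With this approach an article that shows up in both an RSS feed and an HTML author page is reported once, as long as both venues link to the same normalized URL.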

Quick start

# 1. Install dependencies
pip install requests pyyaml python-dotenv beautifulsoup4

# 2. Copy and edit the config
cp authors.yaml.example authors.yaml
$EDITOR authors.yaml

# 3. Copy and edit credentials
cp .env.example .env
$EDITOR .env

# 4. Dry-run to verify article detection without sending anything
python watcher.py --dry-run --verbose

# 5. Run for real
python watcher.py

Discovering new venues

Not sure where an author publishes? Run the discovery tool:

# Show up to 12 new venues where "Michael Doran" has published in the last year
python discover.py "Michael Doran"

# Show 20 results, search back 2 years
python discover.py "Michael Doran" --limit 20 --year 2

# Search then interactively choose which venues to add to authors.yaml
python discover.py "Michael Doran" --add

# Machine-readable JSON output
python discover.py "Michael Doran" --json

The tool:

  1. Runs multiple targeted web searches for the author's recent work
  2. Deduplicates results by domain
  3. Probes each new domain for RSS autodiscovery
  4. Shows already-watched venues (marked ✓) and new ones (numbered)
  5. With --add: lets you type numbers to select venues, then appends them to authors.yaml automatically — using RSS if found, HTML otherwise
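Step 3, RSS autodiscovery, can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in discover.py: `FeedLinkParser` and `find_feeds` are hypothetical names, and the sketch only looks at `<link rel="alternate">` tags that advertise an RSS or Atom feed.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collect href values of <link rel="alternate"> tags that advertise a feed."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rel = attr.get("rel", "").lower().split()
        is_feed = attr.get("type", "").lower() in FEED_TYPES
        if "alternate" in rel and is_feed and attr.get("href"):
            self.feeds.append(attr["href"])


def find_feeds(html, base_url):
    """Return absolute feed URLs advertised by an HTML page."""
    parser = FeedLinkParser()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.feeds]
```

An empty result would correspond to the "HTML otherwise" branch in step 5.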

For best results, add a free search API key to .env:

# Serper.dev (2,500 free searches/month) — https://serper.dev
SERPER_API_KEY=your_key_here

# OR Brave Search API — https://api.search.brave.com
BRAVE_API_KEY=your_key_here

Without an API key the tool falls back to DuckDuckGo HTML scraping, which is rate-limited after a few queries. A single Serper free account is more than sufficient for occasional discovery runs.

Note on false positives: web search results may include different people who share the author's name. Always review the sample titles/URLs before adding a venue.


Configuration: authors.yaml

authors:
  - name: "Jane Author"           # Display name in notifications
    match_names:                  # Name variants to match in RSS text
      - "Jane Author"
      - "J. Author"
    venues:
      # Option 1 — RSS feed (simplest, most reliable)
      - type: rss
        url: "https://example.com/rss.xml"

      # Option 2 — HTML author page with CSS selectors
      - type: html
        url: "https://example.com/experts/jane-author"
        item_selector: ".article-card"       # Container for each article
        date_selector: ".article-date"       # Date within container (optional)
        # Only include cards that link to this author's profile page
        # (useful on multi-author pages where articles from other authors
        # appear in the same list)
        author_link_contains: "/jane-author"

See authors.yaml.example for more patterns.
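Before a first run it can help to sanity-check the config. The sketch below assumes the file has already been parsed into a dict (for example with `yaml.safe_load` from pyyaml) and checks only the keys shown above. `validate_config` is a hypothetical helper, not part of author-watch.

```python
VENUE_TYPES = ("rss", "html")


def validate_config(config):
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    authors = config.get("authors") or []
    if not authors:
        problems.append("no authors defined")
    for i, author in enumerate(authors):
        label = author.get("name") or f"authors[{i}]"
        if not author.get("name"):
            problems.append(f"{label}: missing name")
        venues = author.get("venues") or []
        if not venues:
            problems.append(f"{label}: no venues listed")
        for j, venue in enumerate(venues):
            if venue.get("type") not in VENUE_TYPES:
                problems.append(f"{label}, venue {j}: type must be rss or html")
            if not venue.get("url"):
                problems.append(f"{label}, venue {j}: missing url")
    return problems
```

Running `python watcher.py --dry-run --verbose` remains the authoritative check, since it exercises the real parser and the venues themselves.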

Field reference

| Field | Type | Description |
| --- | --- | --- |
| name | str | Author display name used in notifications |
| match_names | list[str] | Substrings to match in RSS author/title/content fields |
| venues[].type | rss or html | Venue type |
| venues[].url | str | RSS feed URL or HTML page URL |
| venues[].item_selector | str | CSS selector for article cards (HTML only) |
| venues[].title_selector | str | CSS selector for title within a card (HTML only, optional) |
| venues[].link_selector | str | CSS selector for the link within a card (HTML only, optional) |
| venues[].date_selector | str | CSS selector for date within a card (HTML only, optional) |
| venues[].author_link_contains | str | Filter cards by author link substring (HTML only, optional) |
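To illustrate how `match_names` might be applied to a feed entry, here is a small sketch. It assumes each entry is a dict with optional `author`, `title`, and `content` strings and that matching is a case-insensitive substring test; the real matching rules in watcher.py may differ, and `matches_author` is a hypothetical name.

```python
def matches_author(entry, match_names):
    """True if any name variant appears in the entry's author, title, or content."""
    fields = [str(entry.get(key, "")).casefold() for key in ("author", "title", "content")]
    return any(
        name.casefold() in field
        for name in match_names
        for field in fields
    )
```

Because this is substring matching, a short variant such as "J. Author" can match more entries than intended, so specific variants are the safer choice on multi-author feeds.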

Environment variables

Copy .env.example to .env and fill in at least one output:

| Variable | Required | Description |
| --- | --- | --- |
| TELEGRAM_BOT_TOKEN | One of these | Bot token from @BotFather |
| TELEGRAM_CHAT_ID | If using Telegram | Your chat/user ID (get it from @userinfobot) |
| READWISE_TOKEN | One of these | Readwise API token from readwise.io/access_token |
| SERPER_API_KEY | Optional | Serper.dev key for richer venue discovery |
| BRAVE_API_KEY | Optional | Brave Search API key for venue discovery |

At least one of TELEGRAM_BOT_TOKEN or READWISE_TOKEN must be set. Both can be active simultaneously.
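The rule above can be expressed as a short startup check. This is a sketch only: `enabled_outputs` is a hypothetical helper, and treating Telegram as enabled only when both the token and the chat ID are present is an assumption based on the table.

```python
import os


def enabled_outputs(env=None):
    """Return the outputs that are configured, or raise if none are."""
    env = os.environ if env is None else env
    outputs = []
    # Assumption: Telegram needs both the bot token and the chat ID.
    if env.get("TELEGRAM_BOT_TOKEN") and env.get("TELEGRAM_CHAT_ID"):
        outputs.append("telegram")
    if env.get("READWISE_TOKEN"):
        outputs.append("readwise")
    if not outputs:
        raise RuntimeError(
            "Set TELEGRAM_BOT_TOKEN (with TELEGRAM_CHAT_ID) or READWISE_TOKEN"
        )
    return outputs
```

Passing a plain dict as `env` makes the check easy to test without touching the real environment.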


Outputs

Telegram

When a new article is found, sends a message like:

✍️ Jane Author published on example.com

Article Title Here
📅 2026-05-30
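A message in this shape could be built and sent roughly as follows. The sketch uses the Telegram Bot API `sendMessage` method; rendering the title as a link via `parse_mode` HTML is an assumption, as are the helper names `build_message` and `send_telegram`. It uses `urllib` to stay dependency-free, whereas the project itself lists `requests`.

```python
import html
import json
import urllib.request
from urllib.parse import urlsplit


def build_message(author, title, url, date=None):
    """Format the notification text, with the title linking to the article."""
    venue = urlsplit(url).netloc
    if venue.startswith("www."):
        venue = venue[4:]
    lines = [
        f"✍️ {html.escape(author)} published on {html.escape(venue)}",
        "",
        f'<a href="{html.escape(url, quote=True)}">{html.escape(title)}</a>',
    ]
    if date:
        lines.append(f"📅 {date}")
    return "\n".join(lines)


def send_telegram(token, chat_id, text):
    """POST the message to the Bot API and return the decoded JSON response."""
    payload = json.dumps(
        {"chat_id": chat_id, "text": text, "parse_mode": "HTML"}
    ).encode("utf-8")
    request = urllib.request.Request(
        f"https://api.telegram.org/bot{token}/sendMessage",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=15) as response:
        return json.load(response)
```

Escaping the author, title, and URL matters here because Telegram rejects messages whose HTML does not parse.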

Readwise Reader

Articles are saved to your Reader inbox via the Reader API, automatically tagged with the author's name (e.g. jane-author). Readwise fetches and parses the full article content.
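As a rough illustration, saving an article means POSTing its URL and tags to the Reader API's `https://readwise.io/api/v3/save/` endpoint with a `Token` authorization header. The helper names and the slug rule that turns "Jane Author" into `jane-author` are assumptions inferred from the example tag above, not the project's actual code.

```python
import json
import re
import urllib.request


def author_tag(name):
    """Turn a display name into a lowercase, hyphen-separated tag."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def reader_payload(url, author, title=None):
    """Build the JSON body for a Reader save request."""
    payload = {"url": url, "tags": [author_tag(author)]}
    if title:
        payload["title"] = title
    return payload


def save_to_reader(token, payload):
    """POST the payload to Readwise Reader and return the decoded JSON response."""
    request = urllib.request.Request(
        "https://readwise.io/api/v3/save/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request, timeout=15) as response:
        return json.load(response)
```

Note that the regex keeps only ASCII letters and digits, so a name with accented characters would need a different slug rule.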


Scheduling

Run watcher.py on a schedule using any method you prefer:

cron (e.g. twice daily at 6am and 6pm):

0 6,18 * * * cd /path/to/author-watch && python watcher.py >> watcher.log 2>&1

A systemd timer, launchd (macOS), or any other task scheduler also works.


State file

state.json tracks all article IDs that have been notified. It is created automatically on first run. To re-trigger notifications for all currently visible articles (useful for testing), delete it:

rm state.json
python watcher.py --dry-run  # preview what would fire
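For readers curious how such a state file can work, here is a minimal sketch. The on-disk layout (a JSON object with a `seen` list) and the ID scheme (a truncated SHA-256 of the article URL) are assumptions for illustration; the actual format of state.json may differ.

```python
import hashlib
import json
from pathlib import Path


def article_id(url):
    """Derive a short, stable ID from an article URL."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]


def load_seen(path):
    """Return the set of known article IDs, or an empty set on first run."""
    state_file = Path(path)
    if not state_file.exists():
        return set()
    data = json.loads(state_file.read_text(encoding="utf-8"))
    return set(data.get("seen", []))


def save_seen(path, seen):
    """Write the ID set via a temporary file, then rename it into place."""
    tmp = Path(str(path) + ".tmp")
    tmp.write_text(json.dumps({"seen": sorted(seen)}, indent=2), encoding="utf-8")
    tmp.replace(path)
```

Writing to a temporary file and renaming it means an interrupted run should not leave a half-written state file behind.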

CLI reference

python watcher.py [options]

Options:
  --config PATH    Path to authors.yaml  (default: ./authors.yaml)
  --state PATH     Path to state.json    (default: ./state.json)
  --dry-run        Check for new articles without sending any notifications
  --verbose / -v   Enable DEBUG logging

python discover.py AUTHOR [options]

Options:
  --limit N        Max new venues to return (default: 12)
  --year N         How many years back to search (default: 1)
  --no-rss-probe   Skip RSS feed probing (faster)
  --add            Interactive: choose venues to add to authors.yaml
  --json           Output results as JSON
  --config PATH    Path to authors.yaml (default: ./authors.yaml)
  --verbose / -v   Enable DEBUG logging

Running tests

pip install pytest
python -m pytest test_watcher.py -v

Tests use mocked HTTP — no real network calls or notifications are made.


Common venue patterns

| Publication type | Recommended approach |
| --- | --- |
| Substack | RSS: https://AUTHOR.substack.com/feed |
| Medium | RSS: https://medium.com/feed/@USERNAME |
| WordPress blog | RSS: usually /feed or /rss.xml |
| Hudson Institute | HTML + item_selector: ".research-card" + author_link_contains: "/EXPERT-ID" |
| The Free Press | HTML author page: https://www.thefp.com/w/AUTHOR-SLUG |
| Generic think tank / magazine | HTML author page with appropriate CSS selectors |

RSS is always preferred when available — it's faster and more reliable than HTML scraping.
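The two fixed RSS patterns in the table can be turned into venue entries with a couple of small helpers. These function names are hypothetical; the returned dicts simply mirror the `type` and `url` keys used in authors.yaml.

```python
def substack_venue(subdomain):
    """Venue entry for a Substack publication, given its subdomain."""
    return {"type": "rss", "url": f"https://{subdomain}.substack.com/feed"}


def medium_venue(username):
    """Venue entry for a Medium author; a leading @ is accepted but not required."""
    handle = username.lstrip("@")
    return {"type": "rss", "url": f"https://medium.com/feed/@{handle}"}
```

Publications on custom domains will not fit these patterns, in which case the feed URL has to be found on the site itself.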


Dependencies

  • requests — HTTP
  • pyyaml — config parsing
  • beautifulsoup4 — HTML scraping (falls back to regex if not installed)
  • python-dotenv — .env file loading (optional)

About

Monitors author publications and saves new articles to Readwise Reader
