reddit-scraper

Extracts Reddit threads with full comment hierarchy using the public JSON API.

Requirements

Python 3.8+
Internet connection

Installation

git clone https://github.com/auroraflux/reddit-scraper.git
cd reddit-scraper
pip install -e .

Usage

Start the API server:

python -m reddit_scraper.server

Server runs at http://localhost:8001

Scrape from command line:

python -m reddit_scraper.scraper https://reddit.com/r/python/comments/abc123/example

Use as Python module:

from reddit_scraper import scrape_reddit_post

result = scrape_reddit_post('https://reddit.com/r/python/comments/abc123/example')
print(result['stats']['total_comments'])
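
To work with the nested comments, a small recursive walk is usually enough. The sketch below assumes the module result has the same shape as the POST /scrape response documented under API (a "comments" list whose items carry a "replies" list). The helper name walk_comments and the sample data are hypothetical and not part of the package.

```python
from typing import Any, Dict, Iterator, List, Tuple

Comment = Dict[str, Any]


def walk_comments(comments: List[Comment], depth: int = 0) -> Iterator[Tuple[int, Comment]]:
    """Yield (depth, comment) pairs in depth-first order."""
    for comment in comments:
        yield depth, comment
        # Assumption: "replies" holds child comments in the same shape.
        yield from walk_comments(comment.get("replies", []), depth + 1)


# Hypothetical sample standing in for result['comments'].
sample = [
    {"id": "a", "author": "user1", "text": "Top", "replies": [
        {"id": "b", "author": "user2", "text": "Reply", "replies": []},
    ]},
    {"id": "c", "author": "user3", "text": "Another", "replies": []},
]

# Print the thread as an indented outline.
for depth, comment in walk_comments(sample):
    print("  " * depth + f"{comment['author']}: {comment['text']}")
```

With a real result, pass result['comments'] in place of sample.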

API

POST /scrape

Request:

{
  "url": "https://reddit.com/r/python/comments/abc123/example"
}

Response:

{
  "success": true,
  "url": "https://old.reddit.com/r/python/comments/abc123/example",
  "post": {
    "title": "Example Post",
    "author": "username",
    "score": 123,
    "timestamp": 1234567890,
    "subreddit": "python",
    "num_comments": 37
  },
  "comments": [
    {
      "id": "abc",
      "author": "user1",
      "score": 45,
      "text": "Comment text",
      "timestamp": 1234567890,
      "replies": []
    }
  ],
  "stats": {
    "total_comments": 37,
    "top_level_comments": 13,
    "max_depth": 5,
    "comments_by_depth": {"0": 13, "1": 18, "2": 6}
  }
}
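
A client for this endpoint can be sketched with only the standard library. The function names build_scrape_request and scrape_via_api are hypothetical, the base address is the one given in the Usage section, and the check on "success" assumes that failed scrapes still return a JSON body with that field, which the README does not confirm.

```python
import json
import urllib.request
from typing import Any, Dict

API_BASE = "http://localhost:8001"  # server address from the Usage section


def build_scrape_request(thread_url: str, base: str = API_BASE) -> urllib.request.Request:
    """Build the POST /scrape request without sending it."""
    body = json.dumps({"url": thread_url}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/scrape",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def scrape_via_api(thread_url: str, timeout: float = 30.0) -> Dict[str, Any]:
    """Send the request and return the decoded JSON response."""
    request = build_scrape_request(thread_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    # Assumption: a failed scrape is reported through the "success" field.
    if not payload.get("success"):
        raise RuntimeError(f"scrape failed: {payload}")
    return payload
```

Calling scrape_via_api requires the server to be running; build_scrape_request can be inspected on its own.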

GET /health

Returns server status.

GET /

Web interface for testing.

GET /docs

OpenAPI documentation.

Configuration

The scraper automatically converts reddit.com URLs to old.reddit.com and appends .json?limit=500.
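
The conversion described above can be illustrated roughly as follows. This is a sketch of the idea, not the package's actual code: the function name to_json_url, the host check, and the trailing-slash handling are assumptions.

```python
from urllib.parse import urlsplit, urlunsplit

JSON_SUFFIX = ".json"
COMMENT_LIMIT = 500  # limit value stated in this README


def to_json_url(url: str) -> str:
    """Rewrite a Reddit thread URL to its old.reddit.com JSON form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Assumption: anything that is not a reddit.com host is rejected.
    if host != "reddit.com" and not host.endswith(".reddit.com"):
        raise ValueError(f"not a Reddit URL: {url}")
    path = parts.path.rstrip("/")
    if not path.endswith(JSON_SUFFIX):
        path += JSON_SUFFIX
    return urlunsplit(("https", "old.reddit.com", path, f"limit={COMMENT_LIMIT}", ""))


print(to_json_url("https://reddit.com/r/python/comments/abc123/example"))
# → https://old.reddit.com/r/python/comments/abc123/example.json?limit=500
```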

No authentication required. No API keys needed.

Performance

Typical thread: 5-8 seconds (network-bound)

Extracts every comment present in the JSON response, at any nesting depth. Comments hidden behind "load more comments" links are not fetched.

Limitations

Reddit only
Public posts only
No "load more comments" expansion
No rate limiting protection
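
Since the scraper has no rate limiting protection, callers that scrape many threads may want to space out their requests themselves. The following is a caller-side sketch only: throttled_map is a hypothetical helper, and the 2-second default interval is an arbitrary illustration, not a documented Reddit limit.

```python
import time
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")

DEFAULT_INTERVAL_SECONDS = 2.0  # arbitrary spacing chosen for illustration


def throttled_map(
    func: Callable[[T], R],
    items: Iterable[T],
    min_interval: float = DEFAULT_INTERVAL_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[R]:
    """Call func on each item, waiting at least min_interval between call starts."""
    results: List[R] = []
    last_call = None
    for item in items:
        if last_call is not None:
            # Sleep only for the part of the interval that has not yet elapsed.
            wait = min_interval - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        results.append(func(item))
    return results


# Example (not run here): results = throttled_map(scrape_reddit_post, list_of_urls)
```

The sleep and clock parameters exist so the helper can be tested without real delays.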

Development

Install the development dependencies and run the tests:

pip install -e ".[dev]"
pytest

Code must follow docs/guidelines.md:

Maximum 20 lines per function
Type hints required
No magic numbers

Troubleshooting

Port 8001 already in use:

# Change port in reddit_scraper/server.py
# Find: port=8001
# Change to: port=8002

Module import errors (reinstall the package from the repository root):

pip install -e .

Documentation

docs/api-guide.md - Complete API documentation
docs/scraping-guide.md - Implementation details
docs/comparison.md - Comparison with alternatives
docs/guidelines.md - Code standards

License

MIT
