loading…
Your own weekly Python-news archive.
Fetch HN's Python tag feed, parse the front-page entries, dedupe by URL, and save the result to a JSON file. Run it weekly to build an archive of Python conversations you can search.
You'll never learn `requests`, `BeautifulSoup`, or file I/O from a lecture — you learn them by scraping something you actually want. This one takes about an hour and gives you a script you'll re-run.
The AI reviewer grades your submission against this exact list.
Copy this into a new file. The ... placeholders are yours to fill.
"""
HN Python scraper.
Usage:
python scraper.py
Writes hn-python.json with a list of {title, url, points, discuss_url}.
"""
import json
from pathlib import Path
import requests
from bs4 import BeautifulSoup
URL = "https://news.ycombinator.com/from?site=python.org"
OUT = Path("hn-python.json")
def fetch() -> str:
# TODO: fetch URL with a real User-Agent header and return response.text.
# Retry once on failure. Raise for status codes 4xx/5xx.
...
def parse(html: str) -> list[dict]:
# TODO: BeautifulSoup(html, "html.parser") — extract each story row.
# Return list of {"title", "url", "points", "discuss_url"}.
...
def dedupe(items: list[dict]) -> list[dict]:
# TODO: keep first occurrence per URL.
...
def save(items: list[dict]) -> None:
OUT.write_text(json.dumps(items, indent=2, ensure_ascii=False))
def main() -> None:
html = fetch()
items = parse(html)
items = dedupe(items)
save(items)
print(f"Saved {len(items)} entries to {OUT}")
if __name__ == "__main__":
main()
pip.Push your solution to a public GitHub gist, repo, or Replit — paste the URL below. Optional: paste your `scraper.py` inline for the AI reviewer to give line-level feedback.