Beginner+1–2 hours

Scrape every Python question from Hacker News

Your own weekly Python-news archive.

Goal

Fetch HN's Python tag feed, parse the front-page entries, dedupe by URL, and save the result to a JSON file. Run it weekly to build an archive of Python conversations you can search.

Why it matters

You'll never learn `requests`, `BeautifulSoup`, or file I/O from a lecture — you learn them by scraping something you actually want. This one takes about an hour and gives you a script you'll re-run.

Requirements · rubric

The AI reviewer grades your submission against this exact list.

01Fetch https://news.ycombinator.com/from?site=python.org or the Python tag page
02Parse the list of story links + titles using BeautifulSoup
03Deduplicate entries by URL
04Persist the result as a valid JSON array to `hn-python.json`
05Handle network errors gracefully (retry once, then log and exit non-zero)
06Include a runnable example — `python scraper.py` produces the file

Starter code

Copy this into a new file. The ... placeholders are yours to fill.

"""
HN Python scraper.

Usage:
    python scraper.py

Writes hn-python.json with a list of {title, url, points, discuss_url}.
"""

import json
from pathlib import Path

import requests
from bs4 import BeautifulSoup

URL = "https://news.ycombinator.com/from?site=python.org"
OUT = Path("hn-python.json")


def fetch() -> str:
    # TODO: fetch URL with a real User-Agent header and return response.text.
    # Retry once on failure. Raise for status codes 4xx/5xx.
    ...


def parse(html: str) -> list[dict]:
    # TODO: BeautifulSoup(html, "html.parser") — extract each story row.
    # Return list of {"title", "url", "points", "discuss_url"}.
    ...


def dedupe(items: list[dict]) -> list[dict]:
    # TODO: keep first occurrence per URL.
    ...


def save(items: list[dict]) -> None:
    OUT.write_text(json.dumps(items, indent=2, ensure_ascii=False))


def main() -> None:
    html = fetch()
    items = parse(html)
    items = dedupe(items)
    save(items)
    print(f"Saved {len(items)} entries to {OUT}")


if __name__ == "__main__":
    main()

Skills used

requestsBeautifulSoupfile I/Odict / list handling

Builds on

dictionaries json

Runtime

Runs on your machine · Python 3.11+ · install deps with pip.

submitship + review

Ship your solution.

Push your solution to a public GitHub gist, repo, or Replit — paste the URL below. Optional: paste your `scraper.py` inline for the AI reviewer to give line-level feedback.

loading…

Requirements · rubric

The AI reviewer grades your submission against this exact list.

01Fetch https://news.ycombinator.com/from?site=python.org or the Python tag page

02Parse the list of story links + titles using BeautifulSoup

03Deduplicate entries by URL

04Persist the result as a valid JSON array to `hn-python.json`

05Handle network errors gracefully (retry once, then log and exit non-zero)

06Include a runnable example — `python scraper.py` produces the file

Starter code

Copy this into a new file. The ... placeholders are yours to fill.

""" HN Python scraper. Usage: python scraper.py Writes hn-python.json with a list of {title, url, points, discuss_url}. """ import json from pathlib import Path import requests from bs4 import BeautifulSoup URL = "https://news.ycombinator.com/from?site=python.org" OUT = Path("hn-python.json") def fetch() -> str: # TODO: fetch URL with a real User-Agent header and return response.text. # Retry once on failure. Raise for status codes 4xx/5xx. ... def parse(html: str) -> list[dict]: # TODO: BeautifulSoup(html, "html.parser") — extract each story row. # Return list of {"title", "url", "points", "discuss_url"}. ... def dedupe(items: list[dict]) -> list[dict]: # TODO: keep first occurrence per URL. ... def save(items: list[dict]) -> None: OUT.write_text(json.dumps(items, indent=2, ensure_ascii=False)) def main() -> None: html = fetch() items = parse(html) items = dedupe(items) save(items) print(f"Saved {len(items)} entries to {OUT}") if __name__ == "__main__": main()

Ship your solution.

Push your solution to a public GitHub gist, repo, or Replit — paste the URL below. Optional: paste your `scraper.py` inline for the AI reviewer to give line-level feedback.