Architecture

wayparam is intentionally modular. Each module has a single responsibility, which makes the tool easier to test, audit, and package.

High-level data flow

  1. cli.py
     • Parses args
     • Builds option objects
     • Orchestrates concurrency
  2. wayback.py
     • Builds CDX query parameters
     • Handles pagination/resumeKey
  3. http.py
     • Makes resilient HTTP requests (retries, backoff)
  4. filters.py
     • Drops “boring” URLs (static assets) early
  5. normalize.py
     • Canonicalizes and normalizes URLs (stable output)
  6. output.py
     • Writes records to files and/or stdout (txt/jsonl)
  7. ratelimit.py
     • Global RPS limiter (optional)
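
To make the filter, normalize, and output stages concrete, here is a minimal sketch using only the standard library. It is not wayparam's actual code: the names is_boring, normalize_url, and run_pipeline, the extension list, and the specific normalization rules (lowercased host, sorted query parameters, dropped fragment) are all assumptions chosen for illustration.

```python
import io
import json
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical static-asset extensions; the real filters.py may use a different list.
BORING_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".gif", ".svg", ".woff", ".ico")


def is_boring(url: str) -> bool:
    """Return True when the URL path ends in a static-asset extension."""
    return urlsplit(url).path.lower().endswith(BORING_EXTENSIONS)


def normalize_url(url: str) -> str:
    """Lowercase scheme and host, sort query parameters, drop the fragment."""
    parts = urlsplit(url)
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path, query, ""))


def run_pipeline(urls, out, fmt="txt"):
    """Filter early, normalize, de-duplicate, then write in sorted order."""
    kept = sorted({normalize_url(u) for u in urls if not is_boring(u)})
    for url in kept:
        line = json.dumps({"url": url}) if fmt == "jsonl" else url
        out.write(line + "\n")
    return kept


# Two spellings of the same URL plus one static asset collapse to one record.
buffer = io.StringIO()
run_pipeline(
    [
        "HTTP://Example.com/search?q=1&a=2#top",
        "http://example.com/search?a=2&q=1",
        "http://example.com/static/app.JS?v=3",
    ],
    buffer,
)
print(buffer.getvalue(), end="")  # prints: http://example.com/search?a=2&q=1
```

Because each stage is a pure function over strings, the stages can be tested in isolation and the output stays stable across runs, which is the property the normalize.py entry describes.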

Why this structure matters

  • Unit tests focus on pure logic (normalize.py, filters.py, parsing).
  • Integration tests mock HTTP at the transport layer (httpx.MockTransport).
  • The CLI stays pipeline-friendly: stdout is clean and predictable.