Internal API reference (for contributors)

This is not a public library API. It is provided to help contributors and maintainers.

wayparam.http

HttpConfig

Fields:
- timeout_s: float (default: 30.0)
- retries: int (default: 4)
- backoff_base_s: float
- max_backoff_s: float
- user_agent: Optional[str]
- proxy: Optional[str]

get_text(client, url, params, config) -> str

Performs a GET request and returns response text, with retry/backoff behavior.

Raises RuntimeError once all retries are exhausted, with a message like:
- HTTP request failed after retries (status=503): ...
- HTTP request failed after retries (no-status): ...
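The retry/backoff behavior can be pictured with the sketch below. It is not the actual implementation: `get_text_sketch`, `backoff_delay`, the injected `fetch` callable, the exponential doubling, the choice to retry on any status >= 400, and the defaults for `backoff_base_s` and `max_backoff_s` are all assumptions made for illustration. Only the `timeout_s` and `retries` defaults and the error message format come from this reference.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class HttpConfig:
    timeout_s: float = 30.0          # documented default
    retries: int = 4                 # documented default
    backoff_base_s: float = 0.5      # assumed default, for this sketch only
    max_backoff_s: float = 8.0       # assumed default, for this sketch only
    user_agent: Optional[str] = None
    proxy: Optional[str] = None


def backoff_delay(attempt: int, config: HttpConfig) -> float:
    """Exponential backoff (assumed policy), capped at max_backoff_s."""
    return min(config.max_backoff_s, config.backoff_base_s * (2 ** attempt))


def get_text_sketch(
    fetch: Callable[[str], Tuple[int, str]],   # hypothetical: returns (status, body)
    url: str,
    config: HttpConfig,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try once plus config.retries retries, sleeping between attempts."""
    last_status: Optional[int] = None
    last_detail = ""
    for attempt in range(config.retries + 1):
        try:
            status, body = fetch(url)
        except OSError as exc:
            # Transport-level failure: there is no HTTP status to report.
            last_status, last_detail = None, str(exc)
        else:
            if status < 400:
                return body
            last_status, last_detail = status, body[:200]
        if attempt < config.retries:
            sleep(backoff_delay(attempt, config))
    label = f"status={last_status}" if last_status is not None else "no-status"
    raise RuntimeError(f"HTTP request failed after retries ({label}): {last_detail}")
```

Injecting `fetch` and `sleep` keeps the sketch testable without network access or real delays; the real function takes a client object instead.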

wayparam.wayback

CdxOptions

Fields:
- include_subdomains: bool
- collapse: str | None (default: urlkey)
- from_ts: str | None
- to_ts: str | None
- limit: int
- filters: list[str] | None

iter_original_urls(domain, client, http_config, rate_limiter, opt) -> AsyncIterator[str]

Yields “original” URLs from the CDX API, handling paging/resumeKey.
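A rough sketch of the paging loop follows. It assumes the CDX response is plain text with one original URL per line and that, when more results exist, the page ends with a blank line followed by the resume key. The names `split_page`, `iter_original_urls_sketch`, `collect_urls`, and the `fetch_page` callable are hypothetical stand-ins; the real function takes a domain, client, HTTP config, rate limiter, and options.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, List, Optional, Tuple


def split_page(body: str) -> Tuple[List[str], Optional[str]]:
    """Split one page into (urls, resume_key).

    Assumed layout: URL lines, then optionally a blank line and a resume key.
    """
    lines = body.splitlines()
    while lines and not lines[-1].strip():
        lines.pop()  # ignore trailing blank lines
    resume_key: Optional[str] = None
    if len(lines) >= 2 and not lines[-2].strip():
        resume_key = lines[-1].strip()
        lines = lines[:-2]
    urls = [line.strip() for line in lines if line.strip()]
    return urls, resume_key


async def iter_original_urls_sketch(
    fetch_page: Callable[[Optional[str]], Awaitable[str]],  # hypothetical fetcher
) -> AsyncIterator[str]:
    """Yield URLs page by page until no resume key is returned."""
    resume_key: Optional[str] = None
    while True:
        body = await fetch_page(resume_key)
        urls, resume_key = split_page(body)
        for url in urls:
            yield url
        if resume_key is None:
            break


def collect_urls(fetch_page: Callable[[Optional[str]], Awaitable[str]]) -> List[str]:
    """Convenience wrapper that drains the async iterator into a list."""
    async def _run() -> List[str]:
        return [url async for url in iter_original_urls_sketch(fetch_page)]
    return asyncio.run(_run())
```

Because the function is an async generator, callers consume it with `async for` and can stop early without fetching the remaining pages.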

wayparam.normalize

NormalizeOptions

Fields:
- placeholder: str
- keep_values: bool
- only_params: bool
- drop_tracking: bool
- drop_empty: bool
- sort_params: bool

canonicalize_url(url, opt) -> str | None

Returns a canonicalized URL or None if filtered out or invalid.
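To make the options concrete, here is one possible reading of them, written against the standard library only. The field semantics, every default value, the `TRACKING_PARAMS` set, and the function name `canonicalize_url_sketch` are assumptions; the real behavior may differ in the details.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed set of tracking parameters; the real list is not given in this reference.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "gclid", "fbclid"}


@dataclass
class NormalizeOptions:
    # All defaults below are assumptions made for this sketch.
    placeholder: str = "FUZZ"
    keep_values: bool = False
    only_params: bool = True
    drop_tracking: bool = True
    drop_empty: bool = False
    sort_params: bool = True


def canonicalize_url_sketch(url: str, opt: NormalizeOptions) -> Optional[str]:
    """Return a canonical form of url, or None if invalid or filtered out."""
    try:
        parts = urlsplit(url.strip())
    except ValueError:
        return None
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return None

    pairs = parse_qsl(parts.query, keep_blank_values=True)
    if opt.drop_tracking:
        pairs = [(k, v) for k, v in pairs if k.lower() not in TRACKING_PARAMS]
    if opt.drop_empty:
        pairs = [(k, v) for k, v in pairs if v != ""]
    if opt.only_params and not pairs:
        return None  # assumed meaning: keep only URLs that carry parameters
    if not opt.keep_values:
        pairs = [(k, opt.placeholder) for k, _ in pairs]
    if opt.sort_params:
        pairs.sort()

    # Drop the fragment and lowercase the host so equivalent URLs compare equal.
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path or "/", urlencode(pairs), "")
    )
```

Replacing values with a placeholder and sorting the keys means URLs that differ only in parameter values or order collapse to the same string, which makes deduplication a simple set lookup.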

wayparam.filters

FilterOptions

Fields:
- ext_blacklist: set[str]
- ext_whitelist: set[str] | None
- path_exclude_regex: list[re.Pattern] | None

is_boring(url, opt) -> bool

Returns True if the URL should be filtered out as “boring”.
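The sketch below shows one way the three filter fields could interact. The precedence (whitelist check, then blacklist, then path regexes), the handling of extensionless paths, the default blacklist contents, and the name `is_boring_sketch` are assumptions, not a description of the actual code.

```python
import re
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import List, Optional, Set
from urllib.parse import urlsplit


@dataclass
class FilterOptions:
    # Assumed default blacklist; the real defaults are not listed in this reference.
    ext_blacklist: Set[str] = field(
        default_factory=lambda: {".png", ".jpg", ".gif", ".css", ".js"}
    )
    ext_whitelist: Optional[Set[str]] = None
    path_exclude_regex: Optional[List[re.Pattern]] = None


def is_boring_sketch(url: str, opt: FilterOptions) -> bool:
    """Return True if the URL should be dropped."""
    path = urlsplit(url).path
    ext = PurePosixPath(path).suffix.lower()  # "" when the path has no extension

    # Assumption: a whitelist only constrains paths that have an extension.
    if ext and opt.ext_whitelist is not None and ext not in opt.ext_whitelist:
        return True
    if ext in opt.ext_blacklist:
        return True
    if opt.path_exclude_regex:
        return any(pattern.search(path) for pattern in opt.path_exclude_regex)
    return False
```

Matching on the parsed path rather than the raw URL keeps query strings such as `?file=logo.png` from triggering the extension checks.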

wayparam.output

UrlRecord

Fields:
- domain: str
- url: str
- source: str (default: wayback)
- fetched_at: str | None

write_record(fh, rec, fmt)

Writes one record to a file handle.
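As an illustration only, a record writer along these lines might look like the sketch below. The accepted `fmt` values are not listed in this reference, so `"txt"` and `"jsonl"` are assumptions, as are the names `write_record_sketch` and `render_record`. The `UrlRecord` fields and the `wayback` default come from the field list above.

```python
import io
import json
from dataclasses import asdict, dataclass
from typing import Optional, TextIO


@dataclass
class UrlRecord:
    domain: str
    url: str
    source: str = "wayback"          # documented default
    fetched_at: Optional[str] = None


def write_record_sketch(fh: TextIO, rec: UrlRecord, fmt: str) -> None:
    """Write one record as a single line; 'txt' and 'jsonl' are assumed formats."""
    if fmt == "txt":
        fh.write(rec.url + "\n")
    elif fmt == "jsonl":
        fh.write(json.dumps(asdict(rec), ensure_ascii=False) + "\n")
    else:
        raise ValueError(f"unsupported format: {fmt!r}")


def render_record(rec: UrlRecord, fmt: str) -> str:
    """Helper for tests: capture the written line in memory."""
    buf = io.StringIO()
    write_record_sketch(buf, rec, fmt)
    return buf.getvalue()
```

Writing exactly one line per record keeps the output streamable, so downstream tools can start processing before the run finishes.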

Prints one record to stdout.

Prints diagnostics to stderr.