A simple web crawler written in C++ using WinINet for HTTP requests.
- Start: Begins crawling from a given URL
- Download: Fetches the HTML content of the page (see the WinINet fetch sketch after this list)
- Parse: Extracts all links from `<a href="...">` tags (see the link-extraction sketch after this list)
- Normalize: Converts relative URLs (e.g., /about) to absolute URLs
- Filter: Only follows links within the same domain
- Recurse: Repeats the process for each discovered link, up to the maximum depth
- Track: Keeps a set of visited URLs to avoid crawling the same page twice
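
The download step maps to a handful of WinINet calls. The sketch below shows one way it could look, assuming synchronous (blocking) requests; the `FetchPage` name and the user-agent string are illustrative, not taken from the crawler's actual source.

```cpp
#include <windows.h>
#include <wininet.h>
#include <string>

#pragma comment(lib, "wininet")

// Fetch the body of a URL into a std::string using WinINet.
// Returns an empty string on failure. (Illustrative sketch, not the actual source.)
std::string FetchPage(const std::string& url) {
    std::string body;
    HINTERNET hSession = InternetOpenA("SimpleCrawler/1.0",
                                       INTERNET_OPEN_TYPE_PRECONFIG,
                                       NULL, NULL, 0);
    if (!hSession) return body;

    HINTERNET hUrl = InternetOpenUrlA(hSession, url.c_str(), NULL, 0,
                                      INTERNET_FLAG_RELOAD, 0);
    if (hUrl) {
        char buffer[4096];
        DWORD bytesRead = 0;
        // InternetReadFile reports end of stream as success with bytesRead == 0.
        while (InternetReadFile(hUrl, buffer, sizeof(buffer), &bytesRead) && bytesRead > 0) {
            body.append(buffer, bytesRead);
        }
        InternetCloseHandle(hUrl);
    }
    InternetCloseHandle(hSession);
    return body;
}
```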
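
Parsing, normalizing, and same-domain filtering can be combined in one pass over the HTML. The sketch below uses `std::regex` as a rough stand-in for real HTML parsing; the `ExtractLinks` name, the `baseUrl` parameter (assumed to be of the form `https://host` with no trailing slash), and the decision to resolve only root-relative links are assumptions for illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Pull href values out of <a> tags, resolve root-relative URLs against baseUrl,
// and keep only links on the same domain. (Illustrative sketch.)
std::vector<std::string> ExtractLinks(const std::string& html, const std::string& baseUrl) {
    std::vector<std::string> links;
    // A simple regex is enough for a toy crawler; real HTML needs a proper parser.
    std::regex hrefRe("<a[^>]+href=[\"']([^\"']+)[\"']", std::regex::icase);
    for (auto it = std::sregex_iterator(html.begin(), html.end(), hrefRe);
         it != std::sregex_iterator(); ++it) {
        std::string href = (*it)[1].str();
        if (href.rfind("http://", 0) == 0 || href.rfind("https://", 0) == 0) {
            // Already absolute: keep only same-domain links.
            if (href.rfind(baseUrl, 0) == 0) links.push_back(href);
        } else if (!href.empty() && href[0] == '/') {
            // Root-relative link like /about: prepend the base URL.
            links.push_back(baseUrl + href);
        }
        // Other forms (mailto:, #fragment, path-relative links) are skipped in this sketch.
    }
    return links;
}
```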
The max depth parameter controls how deep the crawler goes:
- Depth 0: Only the starting URL
- Depth 1: Starting URL + all links found on it
- Depth 2: Starting URL + its links + links found on those pages
- And so on...
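
Putting the pieces together, a depth-limited recursion with a visited set might look like the sketch below; it reuses the hypothetical `FetchPage` and `ExtractLinks` helpers from the earlier sketches.

```cpp
#include <set>
#include <string>
#include <iostream>

// Depth-limited recursive crawl. Assumes the FetchPage and ExtractLinks sketches
// above; baseUrl is the crawl's domain root. (Illustrative sketch.)
void Crawl(const std::string& url, const std::string& baseUrl,
           int depth, int maxDepth, std::set<std::string>& visited) {
    // Skip URLs we've already seen so the same page is never fetched twice.
    if (visited.count(url)) return;
    visited.insert(url);

    std::cout << std::string(depth * 2, ' ') << "[" << depth << "] " << url << "\n";

    std::string html = FetchPage(url);
    if (html.empty()) return;

    // Depth 0 crawls only the starting URL; each extra level adds one layer of links.
    if (depth >= maxDepth) return;

    for (const std::string& link : ExtractLinks(html, baseUrl)) {
        Crawl(link, baseUrl, depth + 1, maxDepth, visited);
    }
}
```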