A compilation of the most common issues and troubleshooting solutions when using WinHTTrack to mirror and clone websites.
This usually has one of two causes:
1. JavaScript dynamic rendering: HTTrack is a traditional web crawler that finds links by statically parsing HTML code and CSS stylesheets. If the target website is an SPA (Single Page Application), or loads its layout and images dynamically via JavaScript (for example through Ajax calls or lazy-loading scripts), those URLs never appear in the downloaded markup, so HTTrack cannot find them.
2. External domain resources: Many websites host their CSS or image files on a CDN (Content Delivery Network) or on third-party domains. By default, WinHTTrack restricts the mirror to the starting site and does not download assets from other domains, so those files are missing from the local copy and the pages render without them.
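To make the first cause concrete, here is a minimal sketch of what a purely static link scan can and cannot see. It is not HTTrack's actual parser (which is more elaborate); the class name, the sample page, and its URLs are all made up for illustration.

```python
from html.parser import HTMLParser


class StaticLinkCollector(HTMLParser):
    """Collect URLs that appear literally in href/src attributes."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.urls.append(value)


def static_urls(html):
    """Return the URLs a static scan of the markup would discover."""
    collector = StaticLinkCollector()
    collector.feed(html)
    collector.close()
    return collector.urls


# Hypothetical page: two assets are in the markup, one image is lazy-loaded
# through a data-src attribute, and the gallery is fetched by a script.
SAMPLE_PAGE = """
<html><body>
  <link rel="stylesheet" href="/css/site.css">
  <img src="/img/logo.png">
  <img class="lazy" data-src="/img/photo.jpg">
  <script>
    fetch("/api/gallery").then(r => r.json()).then(render);
  </script>
</body></html>
"""

print(static_urls(SAMPLE_PAGE))
```

Only /css/site.css and /img/logo.png are reported. The lazy-loaded photo and the gallery endpoint exist only as data for JavaScript, so a static scan of this kind never requests them.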
💡 Solutions:
Allow the external domain with a scan rule such as +*cdn.example.com* or +*.example.com/*.
Many modern websites deploy a WAF (Web Application Firewall) or an anti-scraping system. If you crawl a site with many concurrent requests, the server may flag your IP as malicious and block it (usually returning 403 Forbidden or 503 Service Unavailable errors).
💡 Solutions:
Configure speed limit parameters under Set options:
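For readers who script their mirrors, the same kind of throttling can be sketched with the command-line httrack tool. The helper below only builds an argument list and does not run anything. The function name and the default values are illustrative assumptions, not recommended settings, and the flags (-c for simultaneous connections, -A for maximum transfer rate in bytes per second, -%c for connections per second) should be checked against httrack --help for your version.

```python
def polite_httrack_args(url, output_dir, connections=2,
                        max_rate_bytes=25000, per_second=1):
    """Build an httrack argument list with conservative limits.

    The default numbers are placeholders chosen for illustration only.
    """
    return [
        "httrack", url,
        "-O", output_dir,          # output/project directory
        f"-c{connections}",        # number of simultaneous connections
        f"-A{max_rate_bytes}",     # maximum transfer rate, bytes per second
        f"-%c{per_second}",        # maximum new connections per second
    ]


# Hypothetical target and directory; pass the list to subprocess.run to execute it.
print(" ".join(polite_httrack_args("https://example.com/", "example-mirror")))
```

Lower values make the crawl slower but less likely to look like an attack to the server.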
WinHTTrack's Scan Rules filter system lets you download only the file formats you want while skipping HTML pages you do not need.
💡 Step-by-Step:
Go to Set options -> Scan Rules:
1. If you only want to download PDF files: first exclude all files by writing -*, and then add +*.pdf specifically, like so:
-* +*.pdf
2. If you only want to download JPG and PNG images:
-* +*.jpg +*.png
Note: Separate multiple rules using spaces.
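The way an exclude-all rule followed by specific include rules behaves can be approximated in a few lines. This is a rough model that assumes rules are checked in order and the last matching rule decides; it uses Python's fnmatch wildcards as a stand-in for HTTrack's own matcher, which supports more pattern forms. The function name and the sample URLs are hypothetical.

```python
from fnmatch import fnmatchcase


def is_allowed(url, rules, default=True):
    """Approximate scan-rule evaluation: the last matching rule wins."""
    allowed = default
    for rule in rules.split():          # rules are separated by spaces
        sign, pattern = rule[0], rule[1:]
        if sign not in ("+", "-"):
            raise ValueError(f"rule must start with + or -: {rule!r}")
        if fnmatchcase(url, pattern):
            allowed = (sign == "+")
    return allowed


pdf_only = "-* +*.pdf"
print(is_allowed("www.example.com/docs/manual.pdf", pdf_only))
print(is_allowed("www.example.com/index.html", pdf_only))
```

The PDF is accepted because +*.pdf comes after -* and overrides it; the HTML page matches only -* and is rejected. Reversing the order to +*.pdf -* would reject everything in this model, which is why the exclude-all rule goes first.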
No. HTTrack has a built-in Incremental Update mechanism, which is one of its most powerful features as a website copier.
💡 Step-by-Step:
1. Open WinHTTrack and select the exact same project name on the first wizard screen (this loads the saved settings).
2. On the second screen, change the default Download website(s) action to Update existing mirror in the Action dropdown menu.
3. Click Next, then Finish. The program checks its cache and downloads only new, missing, or changed resources.
This means the target page of that hyperlink lies outside the boundaries of your mirror.
In this case, HTTrack's link rebuilder keeps the original absolute URL instead of generating a dead local file path, so the link still works whenever you are connected to the internet.
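The decision described above can be sketched as follows. This is a simplified model, not HTTrack's real rebuilder: it assumes a hypothetical host/path layout on disk, and the function names and URLs are invented for the example.

```python
import posixpath
from urllib.parse import urlsplit


def local_path(url):
    """Map a URL to a file path inside the mirror (assumed host/path layout)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html"            # directory URLs get an index file
    return parts.netloc + path


def rewrite_link(link_url, page_url, mirrored_urls):
    """Return a relative local path for mirrored targets, else the absolute URL."""
    if link_url not in mirrored_urls:
        return link_url                 # outside the mirror: keep the web address
    page_dir = posixpath.dirname(local_path(page_url))
    return posixpath.relpath(local_path(link_url), start=page_dir)


mirrored = {"https://example.com/", "https://example.com/docs/guide.html"}
page = "https://example.com/"
print(rewrite_link("https://example.com/docs/guide.html", page, mirrored))
print(rewrite_link("https://other.example.org/page.html", page, mirrored))
```

The first link becomes a relative path that works offline; the second was never downloaded, so it is left pointing at the live site.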