HTTrack FAQ | Troubleshooting Website Copier & Web Crawler Issues

Why is the mirrored page layout broken, or why are some images missing?

This usually has one of two causes:

1. JavaScript Dynamic Rendering: HTTrack is a traditional web crawler that extracts links statically from HTML code and CSS stylesheets; it does not execute JavaScript. If the target website is a Single Page Application (SPA), or loads its layout and images dynamically via JavaScript (for example through Ajax requests or lazy-loading scripts), HTTrack cannot find URLs that only exist once those scripts run.

2. External Domain Resources: Many websites serve their CSS and image files from a CDN (Content Delivery Network) or other third-party domains. By default, WinHTTrack stays on the starting site and does not download assets hosted on other domains, so the local copy is missing those files and the pages that rely on them break.
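To see why static parsing misses script-built URLs, here is a toy link extractor built on Python's standard html.parser module. It is not HTTrack's parser: it only collects literal href and src attributes, which is enough to illustrate point 1. The page markup and the /photos/ path are made-up examples.

```python
from html.parser import HTMLParser

class StaticLinkExtractor(HTMLParser):
    """Collects URLs from href/src attributes, the way a static crawler sees them."""

    LINK_ATTRS = {"href", "src"}

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in self.LINK_ATTRS and value:
                self.links.append(value)

def extract_links(html):
    parser = StaticLinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links

PAGE = """
<link rel="stylesheet" href="style.css">
<img src="logo.png">
<div id="gallery"></div>
<script>
  // The real image URL only exists after this code runs in a browser.
  const id = 42;
  document.getElementById("gallery").innerHTML =
      '<img src="/photos/' + id + '.jpg">';
</script>
"""

print(extract_links(PAGE))  # ['style.css', 'logo.png']
```

The script body is treated as plain text, and the full address /photos/42.jpg never appears literally in the markup, so no static parser can recover it without running the JavaScript.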

💡 Solutions:

Go to Set options -> Scan Rules and add rules that include the required external domains, for example +*cdn.example.com* or +*.example.com/*.
For lazy-loaded images, you can try changing the User-Agent string (Browser ID tab) to mimic a popular search engine: some sites serve pre-rendered HTML containing the real image URLs to search engine crawlers.
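One way to decide which domains to add is to collect the resource URLs the page actually loads (for example, copied from the Network panel of your browser's developer tools, which also records resources requested by scripts) and list the hosts that differ from the site's own. The helper names below are hypothetical, and the generated rules simply follow the +*domain* form shown above.

```python
from urllib.parse import urlsplit

def external_hosts(site_url, resource_urls):
    """Return the sorted hostnames in resource_urls that differ from site_url's host."""
    site_host = urlsplit(site_url).hostname
    hosts = {urlsplit(url).hostname for url in resource_urls}
    # Relative URLs have no hostname (None) and belong to the site itself.
    return sorted(host for host in hosts if host and host != site_host)

def suggest_scan_rules(hosts):
    """Turn each external host into an include rule of the form +*host*."""
    return [f"+*{host}*" for host in hosts]

resources = [
    "https://www.example.com/index.html",
    "https://cdn.example.com/css/site.css",
    "https://cdn.example.com/img/logo.png",
    "https://fonts.example.net/font.woff2",
    "/js/app.js",
]

hosts = external_hosts("https://www.example.com/", resources)
print(" ".join(suggest_scan_rules(hosts)))  # +*cdn.example.com* +*fonts.example.net*
```

Subdomains of the site (such as img.example.com) also count as external in this sketch, so review the list before pasting it into Scan Rules.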

The download speed is very slow, or my IP gets banned by the target website. What should I do?

Many modern websites deploy Web Application Firewalls (WAFs) or anti-scraping systems. If you crawl a site with many concurrent requests, the server may treat the traffic as an attack and ban your IP, usually by returning 403 Forbidden or 503 Service Unavailable errors.

💡 Solutions:

Configure speed limit parameters under Set options:

Limits tab: Lower the maximum number of concurrent connections (Max connections) to a small value, e.g. 2 to 4.
Flow Control tab: Set a delay between requests.
Browser ID tab: Replace the default User-Agent string with one used by a standard web browser (such as Chrome or Edge) so the crawler is less likely to be flagged as a bot.
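To pick values for these settings, a back-of-envelope estimate helps. Assuming each connection waits a fixed delay after every response (a simplification, not HTTrack's exact scheduling), the server sees at most about connections / (delay + response time) requests per second. The function names and sample timings below are illustrative.

```python
def requests_per_second(connections, delay_s, response_s):
    """Approximate peak request rate under the fixed-delay model above."""
    if connections < 1:
        raise ValueError("connections must be at least 1")
    cycle_s = delay_s + response_s  # time one connection spends per request
    if cycle_s <= 0:
        raise ValueError("delay_s + response_s must be positive")
    return connections / cycle_s

def crawl_minutes(pages, connections, delay_s, response_s):
    """Approximate time to fetch `pages` URLs at that rate."""
    return pages / requests_per_second(connections, delay_s, response_s) / 60

# Aggressive: 8 connections, no delay, 0.25 s per response.
print(requests_per_second(8, 0.0, 0.25))            # 32.0
# Polite: 2 connections, 1 s delay, 0.25 s per response.
print(requests_per_second(2, 1.0, 0.25))            # 1.6
print(round(crawl_minutes(1000, 2, 1.0, 0.25), 1))  # 10.4
```

In this example, fewer connections plus a one-second delay cut the peak load on the server by a factor of 20, at the cost of a longer crawl.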

I only want to download specific file types (such as PDF, MP4, or JPG). How can I set this up?

WinHTTrack's Scan Rules filter system lets you download only the file formats you need and skip HTML pages you do not want to keep.

💡 Step-by-Step:

Go to Set options -> Scan Rules:

1. To download only PDF files, first exclude everything with -*, then add +*.pdf to re-include PDFs:

-* +*.pdf

2. To download only JPG and PNG images:

-* +*.jpg +*.png

Note: Separate multiple rules using spaces.
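Rule order matters. The sketch below models how such a rule list is evaluated, assuming that when several rules match a URL the last matching one decides (which is why -* comes first), and treating * as "any sequence of characters". It ignores HTTrack's bracketed wildcard forms and its defaults for URLs that no rule matches; is_allowed and its default parameter are hypothetical.

```python
import re

def _pattern_to_regex(pattern):
    # Treat '*' as "any characters" and everything else literally.
    # Matching is case-sensitive here; that is a simplification.
    return re.compile(".*".join(re.escape(part) for part in pattern.split("*")))

def is_allowed(url, rules, default=True):
    """Return True if the space-separated +/- rules accept url (last match wins)."""
    decision = default  # used only when no rule matches
    for rule in rules.split():
        sign, pattern = rule[0], rule[1:]
        if sign not in "+-":
            raise ValueError(f"rule must start with + or -: {rule!r}")
        if _pattern_to_regex(pattern).fullmatch(url):
            decision = sign == "+"
    return decision

pdf_only = "-* +*.pdf"
print(is_allowed("https://www.example.com/docs/guide.pdf", pdf_only))  # True
print(is_allowed("https://www.example.com/index.html", pdf_only))      # False

images = "-* +*.jpg +*.png"
print(is_allowed("https://www.example.com/img/photo.png", images))     # True
print(is_allowed("https://www.example.com/img/anim.gif", images))      # False

# Reversing the order lets the final -* override everything.
print(is_allowed("https://www.example.com/docs/guide.pdf", "+*.pdf -*"))  # False
```

This only models the accept/reject decision for a single URL; it does not model how HTTrack discovers links by parsing the pages it fetches.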

The website content has changed. Do I need to download everything from scratch again?

No. HTTrack has a built-in Incremental Update mechanism, which is one of its most powerful features as a website copier.

💡 Step-by-Step:

1. Open WinHTTrack and select the same project name on the first wizard screen (this loads the project's saved settings).
2. On the second screen, change the Action dropdown menu from the default Download website(s) to Update existing mirror.
3. Click Next, then Finish. HTTrack checks its cache and downloads only new, missing, or changed resources.
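A cache-based update typically avoids re-downloading unchanged files by asking the server whether each cached resource has changed. The sketch below shows the standard HTTP mechanism such a check can use: conditional requests built from the ETag and Last-Modified values saved during the previous download. It is a generic illustration and does not read HTTrack's own cache files; CachedEntry, conditional_headers, and update_action are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CachedEntry:
    """Validators remembered from the previous download of one URL."""
    etag: Optional[str] = None
    last_modified: Optional[str] = None

def conditional_headers(entry):
    """Build request headers that ask the server to send the file only if it changed."""
    if entry is None:
        return {}  # not in the cache yet: plain GET, the file is new or missing
    headers = {}
    if entry.etag:
        headers["If-None-Match"] = entry.etag
    if entry.last_modified:
        headers["If-Modified-Since"] = entry.last_modified
    return headers

def update_action(status):
    """Map the server's reply to what this hypothetical updater does locally."""
    if status == 304:
        return "keep"     # Not Modified: the local copy is still current
    if status == 200:
        return "replace"  # full response body: save the new version
    return "error"        # anything else: report it (policy chosen for this sketch)

cached = CachedEntry(etag='"v2"', last_modified="Tue, 01 Oct 2024 08:00:00 GMT")
print(conditional_headers(cached))
print(update_action(304), update_action(200))  # keep replace
```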

Why do some links on my cloned local page redirect to the online live website?

This means the link's target lies outside the boundaries of your mirror, for one of these reasons:

The link belongs to an external domain, and cross-domain downloads were not allowed in your settings.
The link depth exceeds the limit configured under Limits -> Max depth.

In this case, HTTrack's link rebuilder keeps the absolute web address instead of generating a local file path that would lead nowhere, so the link still works whenever you are connected to the internet.
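The rebuilding rule described above can be sketched as: if the link's target was mirrored, point at the local file with a relative path; otherwise keep the absolute URL. This is a simplified model, not HTTrack's actual code; the host/path layout of local files (with index.html for directory URLs) and the function names are assumptions for illustration.

```python
import posixpath
from urllib.parse import urljoin, urlsplit

def local_path(url):
    """Map a URL to a file path in the mirror, using a host/path layout (assumed)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/"):
        path += "index.html"
    return parts.hostname + path

def rewrite_link(page_url, href, mirrored):
    """Return the href to write into the local copy of page_url."""
    target = urljoin(page_url, href)
    if target not in mirrored:
        return target  # outside the mirror: keep the live, absolute address
    page_dir = posixpath.dirname(local_path(page_url))
    return posixpath.relpath(local_path(target), page_dir)

page = "https://www.example.com/docs/a.html"
mirrored = {page, "https://www.example.com/img/logo.png"}

print(rewrite_link(page, "../img/logo.png", mirrored))                 # ../img/logo.png
print(rewrite_link(page, "/deep/page.html", mirrored))                 # https://www.example.com/deep/page.html
print(rewrite_link(page, "https://other.example.org/page", mirrored))  # https://other.example.org/page
```

The second call models a same-site page beyond the depth limit, and the third an external domain; both keep their live addresses.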

Frequently Asked Questions