Reliable crawling
Crawlee won't fix broken selectors for you (yet), but it helps you build and maintain your crawlers faster. When a website adds JavaScript rendering, you don't have to rewrite everything; you only need to switch to one of the browser crawlers. When you later find a great API to speed up your crawls, flip the switch back. Crawlee keeps your proxies healthy by rotating them smartly and pairing them with fingerprints that make your crawlers look human-like. It's not unblockable, but it will save you money in the long run. Crawlee is built by people who scrape for a living and use it every day to scrape millions of pages. Meet our community on Discord.
JavaScript & TypeScript
We believe websites are best scraped in the language they're written in. Crawlee runs on Node.js and is built in TypeScript, which improves code completion in your IDE even if you don't use TypeScript yourself. You can write your crawlers in either TypeScript or JavaScript.
HTTP scraping
Crawlee makes HTTP requests that mimic browser headers and TLS fingerprints. It also rotates them automatically based on data about real-world traffic. Popular HTML parsers Cheerio and JSDOM are included.
Headless browsers
Switch your crawlers from HTTP to headless browsers in 3 lines of code. Crawlee builds on top of Puppeteer and Playwright and adds its own anti-blocking features and human-like fingerprints. It works with Chrome, Firefox and more.
Automatic scaling and proxy management
Crawlee automatically manages concurrency based on available system resources and smartly rotates proxies. Proxies that often time out, return network errors, or return bad HTTP status codes like 401 or 403 are discarded.
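To make the proxy part of this concrete, here is a minimal, self-contained sketch of error-based proxy rotation. It is not Crawlee's internal implementation: the ProxyPool class, its report() method, the maxErrorScore threshold and the example proxy URLs are all hypothetical names chosen for illustration. It only shows the general idea of rotating through proxies and dropping the ones that keep timing out or returning 401/403.

```typescript
// Illustrative sketch only; this is NOT Crawlee's internal implementation.
// ProxyPool, report() and maxErrorScore are hypothetical names.

interface RequestOutcome {
  statusCode?: number;
  timedOut?: boolean;
  networkError?: boolean;
}

interface ProxyState {
  url: string;
  errorScore: number;
}

class ProxyPool {
  private proxies: ProxyState[];
  private cursor = 0;
  private maxErrorScore: number;

  constructor(urls: string[], maxErrorScore = 3) {
    this.proxies = urls.map((url) => ({ url, errorScore: 0 }));
    this.maxErrorScore = maxErrorScore;
  }

  /** Number of proxies that have not been discarded. */
  get size(): number {
    return this.proxies.length;
  }

  /** Returns the next proxy in round-robin order, or undefined if none remain. */
  next(): string | undefined {
    if (this.proxies.length === 0) return undefined;
    const proxy = this.proxies[this.cursor % this.proxies.length];
    this.cursor = (this.cursor + 1) % this.proxies.length;
    return proxy ? proxy.url : undefined;
  }

  /** Records the outcome of a request and discards proxies that fail too often. */
  report(url: string, outcome: RequestOutcome): void {
    const proxy = this.proxies.find((p) => p.url === url);
    if (!proxy) return;

    const failed =
      outcome.timedOut === true ||
      outcome.networkError === true ||
      outcome.statusCode === 401 ||
      outcome.statusCode === 403;

    if (!failed) {
      // A successful request slowly restores the proxy's reputation.
      proxy.errorScore = Math.max(0, proxy.errorScore - 1);
      return;
    }

    proxy.errorScore += 1;
    if (proxy.errorScore >= this.maxErrorScore) {
      this.proxies = this.proxies.filter((p) => p !== proxy);
      this.cursor = 0;
    }
  }
}

// Usage: two failures in a row push the first proxy over a threshold of 2.
const pool = new ProxyPool(
  ["http://proxy-a.example:8000", "http://proxy-b.example:8000"],
  2,
);
const firstProxy = pool.next();
if (firstProxy) {
  pool.report(firstProxy, { statusCode: 403 });
  pool.report(firstProxy, { timedOut: true });
}
console.log(pool.size); // 1
```

Letting successes lower the error score is one possible design choice: a proxy that fails occasionally stays in rotation, while one that fails repeatedly is removed.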
Queue and storage
You can save files, screenshots and JSON results to disk with one line of code, or plug in an adapter for your database. Your URLs are kept in a queue that ensures each one is unique and that you don't lose progress when something fails.
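The two queue guarantees, uniqueness and not losing progress, can be illustrated with a small in-memory sketch. This is a simplification written for this page, not Crawlee's actual queue: UniqueUrlQueue, toUniqueKey(), snapshot() and restore() are hypothetical names, and the deduplication rule (ignoring the URL fragment) is an assumption made to keep the example short.

```typescript
// Illustrative sketch only; this is NOT Crawlee's queue implementation.
// All names below are hypothetical.

interface QueueSnapshot {
  pending: string[];
  handled: string[];
}

/** Assumed deduplication rule: two URLs differing only by #fragment are the same. */
function toUniqueKey(url: string): string {
  const hashIndex = url.indexOf("#");
  return hashIndex === -1 ? url : url.slice(0, hashIndex);
}

class UniqueUrlQueue {
  private pending: string[] = [];
  private inProgress = new Set<string>();
  private handled = new Set<string>();
  private seen = new Set<string>();

  /** Adds a URL; returns false if an equivalent URL was already added. */
  add(url: string): boolean {
    const key = toUniqueKey(url);
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    this.pending.push(key);
    return true;
  }

  /** Takes the next URL to process and marks it as in progress. */
  fetchNext(): string | undefined {
    const key = this.pending.shift();
    if (key !== undefined) this.inProgress.add(key);
    return key;
  }

  /** Marks an in-progress URL as successfully processed. */
  markHandled(url: string): void {
    const key = toUniqueKey(url);
    if (this.inProgress.delete(key)) this.handled.add(key);
  }

  /** Returns a failed in-progress URL to the queue so it can be retried. */
  reclaim(url: string): void {
    const key = toUniqueKey(url);
    if (this.inProgress.delete(key)) this.pending.push(key);
  }

  /** Serializable state; in-progress URLs count as unfinished. */
  snapshot(): QueueSnapshot {
    return {
      pending: Array.from(this.inProgress).concat(this.pending),
      handled: Array.from(this.handled),
    };
  }

  /** Rebuilds a queue from a snapshot, e.g. after a crash. */
  static restore(snapshot: QueueSnapshot): UniqueUrlQueue {
    const queue = new UniqueUrlQueue();
    snapshot.handled.forEach((key) => {
      queue.handled.add(key);
      queue.seen.add(key);
    });
    snapshot.pending.forEach((key) => {
      if (!queue.seen.has(key)) {
        queue.seen.add(key);
        queue.pending.push(key);
      }
    });
    return queue;
  }
}

// Usage: a URL that was in progress when the state was saved is retried after restore.
const queue = new UniqueUrlQueue();
queue.add("https://example.com/products");
queue.add("https://example.com/products#reviews"); // duplicate, ignored
const current = queue.fetchNext();
const saved = JSON.stringify(queue.snapshot());
const resumed = UniqueUrlQueue.restore(JSON.parse(saved));
console.log(resumed.fetchNext() === current); // true
```

Treating in-progress URLs as pending in the snapshot means a crash can cause a page to be fetched twice, but never skipped.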
Helpful utils and configurability
Crawlee includes tools for extracting social handles and phone numbers, infinite scrolling, blocking unwanted assets, and much more. It works great out of the box, but also provides rich configuration options.