Firecrawl
This guide shows how to use Firecrawl with LangChain to load web data in an LLM-ready format.
Overview
Firecrawl crawls any website and converts it into LLM-ready data. It crawls all accessible subpages and gives you clean markdown and metadata for each. No sitemap required.
Firecrawl handles complex tasks such as reverse proxies, caching, rate limits, and content blocked by JavaScript. It is built by the mendable.ai team.
This guide shows how to scrape and crawl entire websites and load them using the FireCrawlLoader in LangChain.
Setup
Sign up and get your free Firecrawl API key to start. Firecrawl offers 300 free credits to get you started, and it is open source in case you want to self-host.
Usage
Here's an example of how to use the FireCrawlLoader to load web pages:
Firecrawl offers 2 modes: scrape and crawl. In scrape mode, Firecrawl will only scrape the page you provide. In crawl mode, Firecrawl will crawl the entire website.
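The choice between the two modes can be sketched as a small options builder. This is only an illustration: the helper `buildFirecrawlOptions` and the `FirecrawlOptions` type are hypothetical names, not part of LangChain or Firecrawl. The field names `url`, `mode`, and `params` mirror the options passed to the FireCrawlLoader in this guide.

```typescript
// Hypothetical helper for illustration; not part of LangChain or Firecrawl.
type FirecrawlMode = "scrape" | "crawl";

interface FirecrawlOptions {
  url: string; // The page to scrape, or the starting point of a crawl
  mode: FirecrawlMode; // "scrape" = this page only, "crawl" = all accessible subpages
  params: Record<string, unknown>; // Extra Firecrawl API parameters, left empty here
}

function buildFirecrawlOptions(url: string, wholeSite: boolean): FirecrawlOptions {
  return {
    url,
    mode: wholeSite ? "crawl" : "scrape",
    params: {},
  };
}

// One page only:
const singlePage = buildFirecrawlOptions("https://firecrawl.dev", false);
// Every accessible subpage:
const fullSite = buildFirecrawlOptions("https://firecrawl.dev", true);

console.log(singlePage.mode, fullSite.mode);
```

The resulting object could then be passed to the loader constructor. Crawl mode visits many pages, so it will likely consume more credits than a single scrape.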
- npm: npm install @mendable/firecrawl-js
- Yarn: yarn add @mendable/firecrawl-js
- pnpm: pnpm add @mendable/firecrawl-js
import { FireCrawlLoader } from "@langchain/community/document_loaders/web/firecrawl";

const loader = new FireCrawlLoader({
  url: "https://firecrawl.dev", // The URL to scrape
  apiKey: process.env.FIRECRAWL_API_KEY, // Optional, defaults to `FIRECRAWL_API_KEY` in your env.
  mode: "scrape", // The mode to run the crawler in. Can be "scrape" for single URLs or "crawl" for all accessible subpages
  params: {
    // Optional parameters based on the Firecrawl API docs
    // For API documentation, visit https://docs.firecrawl.dev
  },
});

const docs = await loader.load();
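Once the documents are loaded, a quick sanity check can show whether the scrape or crawl returned usable content. The sketch below assumes each loaded document exposes `pageContent` and `metadata`, as LangChain documents do. The `LoadedDoc` interface, the `summarizeDocs` helper, and the sample data are hypothetical stand-ins so the example runs without an API key.

```typescript
// Local stand-in for the shape of a LangChain document (assumption for illustration).
interface LoadedDoc {
  pageContent: string; // Markdown text of the page
  metadata: Record<string, unknown>; // Per-page metadata returned with the content
}

interface DocSummary {
  count: number; // Number of documents returned
  emptyCount: number; // Documents whose content is empty or whitespace only
  totalChars: number; // Total characters of trimmed content across all documents
}

// Hypothetical helper: summarizes what a load returned.
function summarizeDocs(docs: LoadedDoc[]): DocSummary {
  let emptyCount = 0;
  let totalChars = 0;
  for (const doc of docs) {
    const text = doc.pageContent.trim();
    if (text.length === 0) {
      emptyCount += 1;
    }
    totalChars += text.length;
  }
  return { count: docs.length, emptyCount, totalChars };
}

// Sample data standing in for the result of `await loader.load()`.
const sampleDocs: LoadedDoc[] = [
  { pageContent: "# Home\nWelcome.", metadata: {} },
  { pageContent: "   ", metadata: {} },
];

console.log(summarizeDocs(sampleDocs));
```

A high `emptyCount` after a crawl may indicate pages that were blocked or that returned no extractable content.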
API Reference:
- FireCrawlLoader from @langchain/community/document_loaders/web/firecrawl
Additional Parameters
For params, you can pass any of the parameters described in the Firecrawl documentation.