
npm install convex-firecrawl-scrape

Scrape any URL and get clean markdown, HTML, screenshots, or structured JSON, with durable caching and reactive queries.
const { jobId } = await scrape({ url: "https://example.com" });
// Status updates reactively as the scrape completes
const status = useQuery(api.firecrawl.getStatus, { id: jobId });

Play with the example:
git clone https://github.com/gitmaxd/convex-firecrawl-scrape.git
cd convex-firecrawl-scrape
npm install
npm run dev

You'll need an existing Convex project. Convex is a hosted backend platform with a database, serverless functions, and more.
To set one up, run npm create convex or follow any of the Convex quickstarts.
npm install convex-firecrawl-scrape

Install the component in your convex/convex.config.ts:
// convex/convex.config.ts
import { defineApp } from "convex/server";
import firecrawlScrape from "convex-firecrawl-scrape/convex.config.js";
const app = defineApp();
app.use(firecrawlScrape);
export default app;

Set your Firecrawl API key:
npx convex env set FIRECRAWL_API_KEY your_api_key_here

Get your API key at firecrawl.dev.
Always use exposeApi() to expose component functionality. This wrapper
enforces authentication and controls API key access.
// convex/firecrawl.ts
import { exposeApi } from "convex-firecrawl-scrape";
import { components } from "./_generated/api";
export const { scrape, getCached, getStatus, getContent, invalidate } =
  exposeApi(components.firecrawlScrape, {
    auth: async (ctx, operation) => {
      const identity = await ctx.auth.getUserIdentity();
      if (!identity) throw new Error("Unauthorized");
      return process.env.FIRECRAWL_API_KEY!;
    },
  });

import { useMutation, useQuery } from "convex/react";
import { api } from "../convex/_generated/api";
import { useState } from "react";
function ScrapeButton({ url }: { url: string }) {
  const [jobId, setJobId] = useState<string | null>(null);
  const scrape = useMutation(api.firecrawl.scrape);
  const status = useQuery(
    api.firecrawl.getStatus,
    jobId ? { id: jobId } : "skip",
  );
  const content = useQuery(
    api.firecrawl.getContent,
    jobId && status?.status === "completed" ? { id: jobId } : "skip",
  );
  return (
    <div>
      <button
        onClick={async () => setJobId((await scrape({ url })).jobId)}
        disabled={status?.status === "scraping"}
      >
        {status?.status === "scraping" ? "Scraping..." : "Scrape"}
      </button>
      {status?.status === "completed" && <pre>{content?.markdown}</pre>}
      {status?.status === "failed" && <p>Error: {status.error}</p>}
    </div>
  );
}

const { jobId } = await scrape({
  url: "https://example.com",
  options: {
    formats: ["markdown", "html", "links", "images", "screenshot"],
    storeScreenshot: true,
  },
});

| Format | Description |
|---|---|
| markdown | Clean markdown content (default) |
| html | Cleaned HTML |
| rawHtml | Original HTML source |
| links | URLs found on the page |
| images | Image URLs found on the page |
| summary | AI-generated page summary |
| screenshot | Screenshot URL (use storeScreenshot: true to persist) |
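
The format names in the table can be captured as a TypeScript union so that typos are caught before a scrape is started. This is a minimal sketch: `SCRAPE_FORMATS`, `ScrapeFormat`, `isScrapeFormat`, and `parseFormats` are hypothetical helpers written for illustration, not exports of convex-firecrawl-scrape, and the component may already ship its own types.

```typescript
// Format names copied from the table above.
// Everything declared here is a hypothetical helper, not part of the package.
const SCRAPE_FORMATS = [
  "markdown",
  "html",
  "rawHtml",
  "links",
  "images",
  "summary",
  "screenshot",
] as const;

type ScrapeFormat = (typeof SCRAPE_FORMATS)[number];

// Type guard: narrows an arbitrary string to a known format name.
function isScrapeFormat(value: string): value is ScrapeFormat {
  return (SCRAPE_FORMATS as readonly string[]).includes(value);
}

// Parses a comma-separated list (for example from a form field) and
// throws if any entry is not a known format.
function parseFormats(input: string): ScrapeFormat[] {
  const names = input
    .split(",")
    .map((name) => name.trim())
    .filter((name) => name.length > 0);
  const unknown = names.filter((name) => !isScrapeFormat(name));
  if (unknown.length > 0) {
    throw new Error(`Unknown format(s): ${unknown.join(", ")}`);
  }
  return names.filter(isScrapeFormat);
}
```

The result of `parseFormats("markdown, links")` could then be passed as `options.formats` in the scrape call shown earlier.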
Extract structured data using a JSON schema:
const { jobId } = await scrape({
  url: "https://example.com/product",
  options: {
    extractionSchema: {
      type: "object",
      properties: {
        name: { type: "string" },
        price: { type: "number" },
      },
      required: ["name", "price"],
    },
  },
});
// Read the extracted data once the job has completed
const content = await getContent({ id: jobId });
console.log(content.extractedJson); // { name: "Widget", price: 99.99 }

Cached results use superset matching: a cache entry with
["markdown", "screenshot"] satisfies a request for ["markdown"].
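
The superset rule described above can be sketched as a small predicate. This is an illustration of the matching idea only, assuming formats are compared as plain strings; `cacheSatisfies` is a hypothetical name, and the component's internal lookup may take other factors (such as entry age or other options) into account.

```typescript
// Hypothetical helper: not part of convex-firecrawl-scrape's public API.
type Format = string;

// Returns true when every requested format is already present
// in the cached entry, i.e. the cached set is a superset.
function cacheSatisfies(
  cachedFormats: Format[],
  requestedFormats: Format[],
): boolean {
  const cached = new Set(cachedFormats);
  return requestedFormats.every((format) => cached.has(format));
}

// A cached ["markdown", "screenshot"] entry covers a ["markdown"] request...
console.log(cacheSatisfies(["markdown", "screenshot"], ["markdown"])); // true
// ...but a cached ["markdown"] entry does not cover a request that adds "html".
console.log(cacheSatisfies(["markdown"], ["markdown", "html"])); // false
```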
// Check cache
const cached = await getCached({ url: "https://example.com" });
// Force refresh
const { jobId } = await scrape({ url, options: { force: true } });
// Invalidate cache
await invalidate({ url: "https://example.com" });

For anti-bot protected sites:
const { jobId } = await scrape({
  url: "https://protected-site.com",
  options: {
    proxy: "stealth", // Residential proxy
    waitFor: 3000, // Wait for dynamic content
  },
});

Always use exposeApi(); never expose component functions directly to clients.
The wrapper enforces authentication and controls API key access. Server-side
code can call component internals directly, but doing so bypasses authentication:
// ❌ DANGEROUS - bypasses auth
export const scrape = components.firecrawlScrape.lib.startScrape;
// ✅ SAFE - auth enforced
export const { scrape } = exposeApi(components.firecrawlScrape, { auth: ... });

SSRF Protection: Built-in validation blocks localhost, private IPs, and non-HTTP schemes.
For domain allowlists, rate limiting, and detailed security guidance, see docs/SECURITY.md.
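
To make the SSRF checks concrete, here is a rough sketch of the kind of validation described: rejecting non-HTTP schemes, localhost, and private IPv4 ranges, with an optional domain allowlist. It is illustrative only and is not the component's actual implementation; `isUrlAllowed`, `PRIVATE_IPV4_PATTERNS`, and the `allowedDomains` parameter are hypothetical names, and the sketch conservatively rejects every IPv6 literal.

```typescript
// Illustrative only: the component's real validation may differ.
const PRIVATE_IPV4_PATTERNS: RegExp[] = [
  /^127\./, // loopback
  /^10\./, // 10.0.0.0/8
  /^192\.168\./, // 192.168.0.0/16
  /^172\.(1[6-9]|2\d|3[01])\./, // 172.16.0.0/12
  /^169\.254\./, // link-local
  /^0\./, // "this" network
];

function isUrlAllowed(rawUrl: string, allowedDomains?: string[]): boolean {
  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // not a parseable absolute URL
  }

  // Only plain web schemes are accepted.
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;

  const host = url.hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost")) return false;
  // IPv6 literals keep their brackets in `hostname`; reject them all here.
  if (host.startsWith("[")) return false;
  if (PRIVATE_IPV4_PATTERNS.some((pattern) => pattern.test(host))) return false;

  // Optional allowlist: the host must equal a listed domain or be a subdomain.
  if (allowedDomains) {
    return allowedDomains.some(
      (domain) => host === domain || host.endsWith("." + domain),
    );
  }
  return true;
}
```

A check like this only inspects the URL string. A public hostname that resolves to a private address through DNS would pass it, so it complements rather than replaces network-level protections.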
const status = await getStatus({ id: jobId });
if (status?.status === "failed") {
  console.error(status.error, status.errorCode);
  // errorCode is the HTTP status from Firecrawl (e.g., 402, 403, 429, 500)
}

Found a bug? Feature request? File an issue on the project's GitHub repository.