# Firecrawl Crawl Issue - n8n Integration

## Setup

- **Platform**: n8n workflow automation
- **Node**: `@mendable/n8n-nodes-firecrawl` (Firecrawl node)
- **Operation**: `crawl`
- **Plan**: Free tier
- **Node Version**: `@mendable/n8n-nodes-firecrawl` v1

## Current Configuration

```json
{
  "operation": "crawl",
  "url": "={{ $json.companyWebsite }}",
  "prompt": "Only extract content related to recent developments at the company that can be used as a point of relevance in an outreach message. Do not extract generic information that is not of current concern to the company. It can be things mentioned on the home page, blog, news, press, about, why etc.",
  "limit": 5,
  "delay": 1000,
  "maxConcurrency": null,
  "excludePaths": {
    "items": [
      { "path": "data/*" }
    ]
  },
  "crawlOptions": {
    "allowSubdomains": true
  },
  "scrapeOptions": {
    "options": {
      "headers": {}
    }
  },
  "requestOptions": {
    "batching": {
      "batch": {
        "batchSize": 1,
        "batchInterval": 3000
      }
    }
  }
}
```

**Note**: The `scrapeOptions` are set to default values (just empty headers), so they shouldn't cause any issues. However, I'm still getting errors related to `scrapeOptions.formats` on initial attempts.

## Issues

### 1. Initial API Error (First 2 Attempts)

The first two crawl attempts fail with this error:

```json
{
  "success": false,
  "code": "BAD_REQUEST",
  "error": "Bad Request",
  "details": [
    {
      "code": "invalid_type",
      "expected": "array",
      "received": "object",
      "path": ["scrapeOptions", "formats"],
      "message": "Expected array, received object"
    },
    {
      "code": "unrecognized_keys",
      "keys": ["formats"],
      "path": [],
      "message": "Unrecognized key in body -- please review the v2 API documentation for request body changes"
    }
  ]
}
```

**Note**: My `scrapeOptions` only contains default empty headers (`{"options": {"headers": {}}}`), so there's no `formats` key in my config. This suggests the n8n node might be adding `formats` automatically, or that there's a mismatch between what the node sends and what the v2 API expects.
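For reference, here is my reading of what the v2 validator is asking for, based purely on the error above (`"expected": "array"` at path `["scrapeOptions", "formats"]`): `formats` should be an array of strings, not an object. A minimal crawl body along these lines is what I would expect to pass validation — the `"markdown"` value is my assumption, since the error doesn't say which format names are valid:

```json
{
  "url": "https://example.com",
  "limit": 5,
  "scrapeOptions": {
    "formats": ["markdown"]
  }
}
```

If the n8n node is serializing `formats` as an object (e.g. a keyed collection instead of a list), that alone would explain the `invalid_type` error even though I never set `formats` myself.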
After 2-3 failures, the crawl eventually starts successfully, but then runs forever (see issue #3).

### 2. Rate Limiting

After the initial errors, I'm hitting rate limits on the free plan:

- "Too many requests" error
- This happens even with `delay: 1000` and `maxConcurrency: null` (which resolves to 5, per the `finalCrawlerOptions` below)

### 3. Crawl Runs Forever (Main Issue)

Once the crawl starts successfully (after the initial errors):

- The crawl job is created and returns a job ID ✅
- The status check shows the crawl as "running" ✅
- **But it never completes** - the status stays "running" indefinitely ❌
- Even with `limit: 5`, the crawl doesn't finish - I have to stop it manually in the Firecrawl dashboard
- The crawl appears stuck and not progressing; no pages are crawled or returned

**This is the main blocker** - the crawl starts but never completes, even with a small limit (5 pages).

## Expected Behavior

- The crawl should complete within a reasonable time (given `limit: 5`)
- It should return markdown content for each crawled page
- It should respect the prompt-generated paths and the `excludePaths` configuration
- The status should eventually change from "running" to "completed"

## Questions

1. **n8n Node Compatibility**: Is there a known issue with `@mendable/n8n-nodes-firecrawl` v1 and the Firecrawl v2 API? The `scrapeOptions.formats` error suggests the node might be sending incorrect parameters.
2. **Infinite Crawl**: Why would a crawl with `limit: 5` run forever and never complete? Is something in my configuration causing this, or is this a known issue?
3. **Rate Limiting**: Are there specific settings I should use on the free tier? I'm using `delay: 1000`, `maxConcurrency: null` (which becomes 5), and batching with `batchSize: 1` and `batchInterval: 3000`.
4. **scrapeOptions**: Should `scrapeOptions` be included for crawl operations at all? The error suggests `formats` is being sent even though I'm not specifying it. Should I remove `scrapeOptions` entirely, or is there a correct way to configure it?
5. **Prompt vs includePaths**: I'm using a `prompt` to generate paths, but I also have `excludePaths` set. Could the AI-generated paths from the prompt conflict with my explicit configuration?
6. **Status Polling**: Is there a recommended way to poll for crawl status in n8n? Should I be using a different approach to check completion?

## Actual Response Example

This is the actual response the crawl returns initially:

```json
[
  {
    "success": true,
    "id": "93cded59-9da6-4731-837d-a5a40dbb171f",
    "url": "https://api.firecrawl.dev/v2/crawl/93cded59-9da6-4731-837d-a5a40dbb171f",
    "promptGeneratedOptions": {
      "includePaths": [
        "/$",
        "/blogs/.*",
        "/pages/press",
        "/pages/story",
        "/pages/newsletter"
      ],
      "excludePaths": [
        "/collections/.*",
        "/products/.*",
        "/pages/contact",
        "/pages/career",
        "/pages/faq",
        "/pages/store-locator",
        "/tools/.*"
      ],
      "maxDepth": 10,
      "crawlEntireDomain": false,
      "allowExternalLinks": false,
      "allowSubdomains": false,
      "sitemap": "include",
      "ignoreQueryParameters": true,
      "deduplicateSimilarURLs": true
    },
    "finalCrawlerOptions": {
      "includePaths": [
        "/$",
        "/blogs/.*",
        "/pages/press",
        "/pages/story",
        "/pages/newsletter"
      ],
      "excludePaths": [
        "data/*"
      ],
      "limit": 5,
      "allowExternalLinks": false,
      "allowSubdomains": true,
      "ignoreRobotsTxt": false,
      "sitemap": "include",
      "deduplicateSimilarURLs": true,
      "ignoreQueryParameters": true,
      "regexOnFullURL": false,
      "delay": 1000,
      "origin": "api",
      "integration": "n8n",
      "maxConcurrency": 5,
      "maxDepth": 10,
      "crawlEntireDomain": false
    }
  }
]
```

**The Problem**: When I check the status of this crawl job, it never reaches the "completed" state. It stays "running" indefinitely until I stop it manually in the Firecrawl dashboard. The crawl appears stuck and not progressing, even though the limit is only 5 pages.

Any help would be greatly appreciated! 🙏
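For question 6, this is the kind of polling loop I had in mind — a sketch, not n8n-specific. `fetch_status` is a stand-in for an HTTP GET to the status URL returned above (`https://api.firecrawl.dev/v2/crawl/<job-id>`); the `"scraping"`/`"running"` status names and the timeout/interval values are my assumptions:

```python
import time


def wait_for_crawl(fetch_status, timeout_s=300, interval_s=5):
    """Poll a crawl job until it leaves the in-progress states.

    fetch_status: a zero-argument callable returning the parsed status JSON
    (in practice, an HTTP GET to the crawl status URL).
    Returns the final status dict, or raises TimeoutError if the job never
    settles within timeout_s seconds.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = fetch_status()
        # Assumed in-progress status names; anything else is treated as final
        # (completed, failed, cancelled, ...).
        if status.get("status") not in ("scraping", "running"):
            return status
        time.sleep(interval_s)
    raise TimeoutError("crawl did not finish within the timeout")
```

Even if it doesn't fix the underlying hang, an explicit timeout like this at least bounds the "runs forever" case instead of polling indefinitely.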