firecrawl

bonkboykzPRO

7 months ago

Description: firecrawl api server + worker without auth, works with dify

Category: AI/ML

URL: https://railway.app/template/AIaBEM

7 Replies

nxfi777PRO

7 months ago

What's the schema?


bonkboykzPRO

7 months ago

Are you trying to use LLM extraction?


nxfi777PRO

7 months ago

Yes, but it would be ideal to know all routes and schema.


jroth55HOBBY

4 months ago

It took a bit of messing around inside the Dockerfile.

Check the schema.txt attachment.

Here's a detailed explanation of the main routes and their functionality:

1. Scrape Endpoint (POST /v0/scrape)

Purpose: Scrapes content from a single webpage

Key Features:

  • Supports various scraping methods (fire-engine, ScrapingBee, Playwright)

  • Configurable page options:

    • onlyMainContent: Extract main content only

    • includeHtml: Include HTML in response

    • waitFor: Wait time after page load

    • screenshot: Capture page screenshot

    • headers: Custom request headers

  • Custom scraping handling for special cases (e.g., Readme Docs, Vanta portals)

  • PDF parsing support
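
For example, a minimal scrape call could look something like this (TypeScript; the host/port and the Authorization header are placeholders for your deployment, and the pageOptions names follow the list above — double-check against schema.txt):

  // Minimal single-page scrape sketch -- adjust the base URL to your deployment.
  const scrapeRes = await fetch("http://localhost:3002/v0/scrape", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: "Bearer your-api-key", // omit when USE_DB_AUTHENTICATION=false
    },
    body: JSON.stringify({
      url: "https://example.com",
      pageOptions: {
        onlyMainContent: true,
        includeHtml: false,
        waitFor: 1000,        // ms to wait after page load
        screenshot: false,
        headers: { "User-Agent": "my-scraper" },
      },
    }),
  });
  console.log(await scrapeRes.json());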

2. Crawl Endpoints

Start Crawl (POST /v0/crawl)

Purpose: Initiates a web crawling job

  • Sitemap detection and processing

  • robots.txt compliance

  • Configurable crawler options:

    • maxCrawledLinks: Limit on pages to crawl

    • maxDepth: Maximum crawl depth

    • includes/excludes: URL patterns to include/exclude

  • URL validation and blocking for restricted sites

Crawl Status (GET /v0/crawl/status/:jobId)

Purpose: Check the status of an ongoing crawl

Returns: Progress information and results

Cancel Crawl (DELETE /v0/crawl/cancel/:jobId)

Purpose: Cancels an ongoing crawl job

Crawl Preview (POST /v0/crawlWebsitePreview)

Purpose: Preview crawl results without full execution
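
Roughly, starting a crawl and polling it looks like this sketch (the jobId field name and the status values are my assumptions based on the routes above):

  // Start a crawl job, then poll its status until it is no longer active.
  const base = "http://localhost:3002"; // adjust to your deployment
  const startRes = await fetch(`${base}/v0/crawl`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      url: "https://example.com",
      crawlerOptions: {
        maxCrawledLinks: 50,
        maxDepth: 2,
        excludes: ["blog/*"], // URL patterns to skip
      },
    }),
  });
  const { jobId } = await startRes.json(); // field name assumed from the status route

  let status: any;
  do {
    await new Promise((r) => setTimeout(r, 2000));
    status = await (await fetch(`${base}/v0/crawl/status/${jobId}`)).json();
    console.log(status); // progress information and, once finished, results
  } while (status.status === "active"); // status values may differ in your build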

3. Search Endpoint (POST /v0/search)

Purpose: Search the web and optionally fetch page content

Features:

  • Google search integration via the Serper API

  • Configurable search options:

    • limit: Number of results (default: 7)

    • lang: Language (default: "en")

    • country: Country (default: "us")

    • location: Geographic location

  • Optional content fetching: can return just search results or include scraped content

  • Filters out blocked URLs

  • Credit system integration

  • Rate limiting and error handling
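
A search call would look roughly like this (it needs a Serper API key configured on the server; the searchOptions names come from the list above, while the fetchPageContent flag is my assumption for the "results only" mode):

  // Web search sketch; returns search results, optionally with scraped content.
  const searchRes = await fetch("http://localhost:3002/v0/search", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      query: "firecrawl self hosting",
      searchOptions: { limit: 7, lang: "en", country: "us" },
      pageOptions: { fetchPageContent: false }, // assumed flag: results only, no scraping
    }),
  });
  console.log(await searchRes.json());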

4. Authentication and Health Routes

Key Authentication (GET /v0/keyAuth)

  • Purpose: Validate API keys

  • Supports environment variable bypass (USE_DB_AUTHENTICATION=false)

Health Checks:

  • GET /v0/health/liveness: Check if service is alive

  • GET /v0/health/readiness: Check if service is ready
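
These are easy to smoke-test, e.g.:

  // Quick probes against the health and key-auth routes.
  const base = "http://localhost:3002"; // adjust to your deployment
  console.log((await fetch(`${base}/v0/health/liveness`)).status);  // expect 200 when alive
  console.log((await fetch(`${base}/v0/health/readiness`)).status); // expect 200 when ready
  const keyCheck = await fetch(`${base}/v0/keyAuth`, {
    headers: { Authorization: "Bearer your-api-key" },
  });
  console.log(keyCheck.status); // 200 for a valid key, 401 otherwise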

Common Features Across Routes:

  • Authentication:

    • Bearer token authentication

    • Optional bypass with environment variable

    • Rate limiting per endpoint

  • Error Handling:

    • Consistent error response format

    • Sentry error tracking

    • Detailed logging

  • Credit System:

    • Credit checking before operations

    • Usage tracking

    • Billing integration

  • Response Format:

    • Standardized success/error responses

    • Status codes for different scenarios

    • Detailed error messages

  • Monitoring:

    • Job logging

    • Performance tracking

    • System monitoring integration
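
As a rough mental model of that response format (these exact shapes are my guess, not lifted from the code — check schema.txt for the real ones):

  // Hypothetical response shapes, for orientation only.
  interface ScrapeSuccess {
    success: true;
    data: {
      content: string;
      markdown?: string;
      html?: string;
      metadata: Record<string, unknown>;
    };
  }

  interface ApiError {
    error: string; // human-readable message; the HTTP status carries the error class
  }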

Authentication

This is how the authentication system works:

1. Authentication Middleware (withAuth.ts)

The system uses a higher-order function, withAuth, that wraps authentication logic around API endpoints. It supports two modes (a simplified sketch follows below):

  • Bypass Mode: When USE_DB_AUTHENTICATION=false

    • Automatically returns { success: true } without checking credentials

    • Logs a warning message (up to 5 times) to notify about bypassed authentication

  • Normal Mode: When USE_DB_AUTHENTICATION is not "false"

    • Executes the normal authentication flow
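
A simplified sketch of that wrapper (not the exact source, just the shape described above):

  // Simplified withAuth sketch: bypass when USE_DB_AUTHENTICATION=false,
  // otherwise run the real authentication function.
  let warningsEmitted = 0;

  export function withAuth<T extends unknown[]>(
    authenticate: (...args: T) => Promise<{ success: boolean; error?: string }>
  ) {
    return async (...args: T): Promise<{ success: boolean; error?: string }> => {
      if (process.env.USE_DB_AUTHENTICATION === "false") {
        if (warningsEmitted < 5) {
          console.warn("Authentication is bypassed (USE_DB_AUTHENTICATION=false)");
          warningsEmitted++;
        }
        return { success: true };
      }
      return authenticate(...args);
    };
  }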

2. Main Authentication Flow (auth.ts)

When authentication is enabled, the flow works as follows (a rough end-to-end sketch follows after this list):

Token Extraction:

  • Expects a Bearer token in the Authorization header

  • Format: Authorization: Bearer <token>

  • Returns 401 if the header or token is missing

Token Validation:

  • Special case: if the token is "this_is_just_a_preview_token":

    • Sets teamId to "preview"

    • Uses preview rate limiting

  • Normal case:

    • Normalizes the API key

    • Validates that it is in UUID format

    • Returns 401 if invalid

Token Verification:

  • Uses a caching system for performance:

    • First checks the Redis cache using the key api_key:{normalized_token}

    • If not in the cache, calls the Supabase RPC function get_key_and_price_id_2 and caches the result for 10 seconds

  • Retrieves:

    • team_id

    • price_id (for the subscription plan)

Rate Limiting:

  • Uses Redis-based rate limiting

  • Different limits based on mode (preview vs. normal)

  • Tracks limits by IP + token combination
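
Put together, the flow looks roughly like this (the cache key format, RPC name, and 10-second TTL are as described above; the client setup, the RPC argument name, and the shape of its result are my assumptions, and rate limiting is left out for brevity):

  // Rough end-to-end sketch of the token flow described above.
  import { createClient } from "@supabase/supabase-js";
  import Redis from "ioredis";

  const redis = new Redis(process.env.REDIS_URL!);
  const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_SERVICE_TOKEN!);

  export async function authenticate(authHeader?: string) {
    const token = authHeader?.startsWith("Bearer ") ? authHeader.slice(7) : undefined;
    if (!token) return { success: false, status: 401, error: "Missing Authorization header" };

    // Preview tokens skip the key lookup and use preview rate limits.
    if (token === "this_is_just_a_preview_token") {
      return { success: true, teamId: "preview" };
    }

    const normalized = token.trim(); // UUID-format validation omitted for brevity
    const cacheKey = `api_key:${normalized}`;
    let cached = await redis.get(cacheKey);
    if (!cached) {
      // Cache miss: look up the team and plan behind this key in Supabase.
      const { data, error } = await supabase.rpc("get_key_and_price_id_2", { api_key: normalized });
      if (error || !data) return { success: false, status: 401, error: "Invalid API key" };
      cached = JSON.stringify(data);
      await redis.set(cacheKey, cached, "EX", 10); // cache for 10 seconds
    }
    const { team_id, price_id } = JSON.parse(cached); // result shape assumed
    return { success: true, teamId: team_id, priceId: price_id };
  }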

3. Database Integration:

  • Uses Supabase for API key storage and verification

  • The RPC function get_key_and_price_id_2 checks:

    • the api_keys table for valid API keys

    • the subscriptions table for associated pricing plans

  • Returns team_id and price_id if valid

4. Error Handling:

  • Returns appropriate HTTP status codes:

    • 401 for missing/invalid tokens

    • 500 for server errors

  • Includes error messages in the response

  • Logs errors and captures exceptions with Sentry

5. Usage:

  1. With Authentication (default):

    curl -H "Authorization: Bearer your-api-key" ...

  2. Without Authentication (development/testing):

    export USE_DB_AUTHENTICATION=false
    curl ...  # No Authorization header needed

Attachments: schema.txt


jroth55HOBBY

4 months ago

Here's how API keys are managed in the system:

  1. API keys are stored in the api_keys table in Supabase with the following structure:

    • key: The API key string

    • team_id: Associated team ID

    • project_id: Optional project ID (foreign key)

  2. Adding API Keys:

    • Currently there doesn't seem to be a direct API endpoint for users to create API keys

    • API keys are likely managed through the Supabase dashboard or administrative tools

    • You can add keys directly in Supabase using SQL:

    INSERT INTO api_keys (key, team_id, project_id) VALUES ('your-api-key', 'team-id', 'project-id');

  3. Removing API Keys:

    • Similarly, keys can be removed through the Supabase dashboard or using SQL:

    DELETE FROM api_keys WHERE key = 'your-api-key';

  4. Key Validation:

    • The system validates API keys using the getKeyAndPriceId function in auth.ts

    • It checks whether the key exists in the api_keys table and returns the associated team ID and price ID

  5. Security Considerations:

    • API keys are used with Bearer token authentication

    • The system includes rate limiting and usage tracking

    • Keys are associated with teams and their subscription plans

To manage API keys, you would need to:

  1. Have administrative access to your Supabase instance

  2. Use the Supabase dashboard or SQL interface to add/remove keys

  3. Ensure keys are associated with valid team IDs and subscription plans

The database is accessed through Supabase and requires two main environment variables to be configured:

  1. SUPABASE_URL: The URL of your Supabase instance

  2. SUPABASE_SERVICE_TOKEN: The service role API key for authentication

Additionally, there's a feature flag:

  • USE_DB_AUTHENTICATION: If set to "false", database access is disabled

To access the database:

  1. The default way to access the database is through the supabase_service client exported from /services/supabase.ts (a minimal sketch of that client setup appears after this list). You can use it like this:

    import { supabase_service } from "../services/supabase";

    // Query example
    const { data, error } = await supabase_service
      .from("table_name")
      .select("*");

  2. You need to set up these environment variables:

    SUPABASE_URL=your_supabase_project_url
    SUPABASE_SERVICE_TOKEN=your_supabase_service_role_key
    USE_DB_AUTHENTICATION=true

  3. The service token should be the "service_role" key from your Supabase project settings, NOT the anon or public key, as it needs full database access.

  4. If you don't have these environment variables set:

    • If USE_DB_AUTHENTICATION=false: The system will work but without database functionality

    • If USE_DB_AUTHENTICATION=true but missing credentials: The system will throw errors when trying to access the database
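
For reference, a minimal version of that client setup could look like this (the real /services/supabase.ts may add extra guards or wrapping):

  // Minimal sketch of a service-role Supabase client like the one
  // exported from /services/supabase.ts.
  import { createClient, SupabaseClient } from "@supabase/supabase-js";

  const useDbAuth = process.env.USE_DB_AUTHENTICATION !== "false";

  export const supabase_service: SupabaseClient | null =
    useDbAuth && process.env.SUPABASE_URL && process.env.SUPABASE_SERVICE_TOKEN
      ? createClient(process.env.SUPABASE_URL, process.env.SUPABASE_SERVICE_TOKEN)
      : null; // database features unavailable without credentials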


bveisehPRO

4 months ago

Hi, I am trying to run this, but the worker gets a "cannot connect" error. In the variables, REDIS_URL and REDIS_RATE_LIMIT_URL are set to the public URL. I modified them to use the internal URL with the username and password, but then I get a NOT FOUND error, even though Redis is up and running and communicating properly with the API server. Any ideas?


rifadm817HOBBY

a month ago

Hey, this doesn't have the new map feature?