firecrawl

bonkboykz PRO

7 months ago

Description: firecrawl api server + worker without auth, works with dify

Category: AI/ML

URL: https://railway.app/template/AIaBEM

7 Replies

nxfi777 PRO

7 months ago

What's the schema?


bonkboykz PRO

7 months ago

Are you trying to use LLM extraction?


nxfi777 PRO

7 months ago

Yes, but it would be ideal to know all routes and schema.


jroth55 HOBBY

4 months ago

It took a bit of messing around inside the Dockerfile.

Check the schema.txt attachment.

Here's a detailed explanation of the main routes and their functionality:

1. Scrape Endpoint (POST /v0/scrape)

Purpose: Scrapes content from a single webpage

Key Features:

Supports various scraping methods (fire-engine, ScrapingBee, Playwright)

Configurable page options:

onlyMainContent: Extract main content only

includeHtml: Include HTML in response

waitFor: Wait time after page load

screenshot: Capture page screenshot

headers: Custom request headers

Custom scraping handling for special cases (e.g., Readme Docs, Vanta portals)

PDF parsing support
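To make the scrape route above a bit more concrete, here is a minimal TypeScript sketch that only builds the request and does not send it. The base URL is a placeholder, and nesting the options under a `pageOptions` key (plus treating `waitFor` as milliseconds) is an assumption about the v0 body shape, so check it against schema.txt before relying on it.

```typescript
// Hypothetical base URL; replace it with your own deployment's address.
const SCRAPE_BASE_URL = "https://your-firecrawl.up.railway.app";

// Page options named in the post above; all of them are optional.
interface PageOptions {
  onlyMainContent?: boolean;
  includeHtml?: boolean;
  waitFor?: number; // assumed to be milliseconds
  screenshot?: boolean;
  headers?: Record<string, string>;
}

interface ScrapeRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds (but does not send) a POST /v0/scrape request.
// Assumption: the options travel under a `pageOptions` key in the JSON body.
function buildScrapeRequest(
  targetUrl: string,
  pageOptions: PageOptions = {},
  apiKey?: string,
): ScrapeRequest {
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (apiKey !== undefined) {
    // Only needed when the server runs with authentication enabled.
    headers["Authorization"] = `Bearer ${apiKey}`;
  }
  return {
    url: `${SCRAPE_BASE_URL}/v0/scrape`,
    method: "POST",
    headers,
    body: JSON.stringify({ url: targetUrl, pageOptions }),
  };
}

const scrapeReq = buildScrapeRequest("https://example.com", {
  onlyMainContent: true,
  waitFor: 1000,
});
```

The resulting object can be handed to any HTTP client, for example `fetch(scrapeReq.url, { method: scrapeReq.method, headers: scrapeReq.headers, body: scrapeReq.body })`.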

2. Crawl Endpoints

Start Crawl (POST /v0/crawl)

Purpose: Initiates a web crawling job

Sitemap detection and processing

robots.txt compliance

Configurable crawler options:

maxCrawledLinks: Limit on pages to crawl

maxDepth: Maximum crawl depth

includes/excludes: URL patterns to include/exclude

URL validation and blocking for restricted sites

Crawl Status (GET /v0/crawl/status/:jobId)

Purpose: Check status of ongoing crawl

Returns: Progress information and results

Cancel Crawl (DELETE /v0/crawl/cancel/:jobId)

Purpose: Cancels an ongoing crawl job

Crawl Preview (POST /v0/crawlWebsitePreview)

Purpose: Preview crawl results without full execution
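The crawl routes above form a small lifecycle (start, poll status, optionally cancel). A hedged sketch of request builders for those three routes follows. The `baseUrl` argument is a placeholder, and sending the options under a `crawlerOptions` key is an assumption; the option names are the ones listed in this post.

```typescript
// Crawler options as named in the post above; all optional.
interface CrawlerOptions {
  maxCrawledLinks?: number;
  maxDepth?: number;
  includes?: string[];
  excludes?: string[];
}

interface CrawlRequest {
  url: string;
  method: "GET" | "POST" | "DELETE";
  body?: string;
}

// Returns request builders for the crawl routes.
// `baseUrl` is a hypothetical deployment address supplied by the caller.
function crawlRoutes(baseUrl: string) {
  return {
    // POST /v0/crawl (assumption: options are nested under `crawlerOptions`)
    start(siteUrl: string, crawlerOptions: CrawlerOptions = {}): CrawlRequest {
      return {
        url: `${baseUrl}/v0/crawl`,
        method: "POST",
        body: JSON.stringify({ url: siteUrl, crawlerOptions }),
      };
    },
    // GET /v0/crawl/status/:jobId
    status(jobId: string): CrawlRequest {
      return {
        url: `${baseUrl}/v0/crawl/status/${encodeURIComponent(jobId)}`,
        method: "GET",
      };
    },
    // DELETE /v0/crawl/cancel/:jobId
    cancel(jobId: string): CrawlRequest {
      return {
        url: `${baseUrl}/v0/crawl/cancel/${encodeURIComponent(jobId)}`,
        method: "DELETE",
      };
    },
  };
}

const crawl = crawlRoutes("https://your-firecrawl.up.railway.app");
const startReq = crawl.start("https://example.com", { maxDepth: 2, excludes: ["blog/*"] });
```

The job ID to pass to `status` and `cancel` would come from the response to the start request; its exact field name is not stated in this thread.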

3. Search Endpoint (POST /v0/search)

Purpose: Search the web and optionally fetch page content

Features:

Google search integration via Serper API

Configurable search options:

limit: Number of results (default: 7)

lang: Language (default: "en")

country: Country (default: "us")

location: Geographic location

Optional content fetching:

Can return just search results or include scraped content

Filters out blocked URLs

Credit system integration

Rate limiting and error handling
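For the search route, a small sketch of how the defaults listed above (limit 7, lang "en", country "us") could be applied when building a request body. The body layout (`searchOptions` plus `pageOptions.fetchPageContent`) is an assumption on my part, not something confirmed in this thread.

```typescript
interface SearchOptions {
  limit: number;
  lang: string;
  country: string;
  location?: string;
}

// Defaults taken from the list above.
const SEARCH_DEFAULTS: SearchOptions = { limit: 7, lang: "en", country: "us" };

// Builds the JSON body for POST /v0/search.
// Assumption: options are sent as `searchOptions`, and `pageOptions.fetchPageContent`
// toggles between plain search results and results with scraped content.
function buildSearchBody(
  query: string,
  overrides: Partial<SearchOptions> = {},
  fetchPageContent: boolean = true,
) {
  const searchOptions: SearchOptions = { ...SEARCH_DEFAULTS, ...overrides };
  return { query, searchOptions, pageOptions: { fetchPageContent } };
}

const defaultSearch = buildSearchBody("railway firecrawl template");
const germanSearch = buildSearchBody("railway firecrawl template", { limit: 3, lang: "de" }, false);
```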

4. Authentication and Health Routes

Key Authentication (GET /v0/keyAuth)

Purpose: Validate API keys

Supports environment variable bypass (USE_DB_AUTHENTICATION=false)

Health Checks:

GET /v0/health/liveness: Check if service is alive

GET /v0/health/readiness: Check if service is ready

Common Features Across Routes:

Authentication:

Bearer token authentication

Optional bypass with environment variable

Rate limiting per endpoint

Error Handling:

Consistent error response format

Sentry error tracking

Detailed logging

Credit System:

Credit checking before operations

Usage tracking

Billing integration

Response Format:

Standardized success/error responses

Status codes for different scenarios

Detailed error messages

Monitoring:

Job logging

Performance tracking

System monitoring integration

Authentication

This is how the authentication system works:

1. Authentication Middleware (withAuth.ts)

The system uses a higher-order function withAuth that wraps authentication logic around API endpoints

It supports two modes:

Bypass Mode: When USE_DB_AUTHENTICATION=false

Automatically returns { success: true } without checking credentials

Logs a warning message (up to 5 times) to notify about bypassed authentication

Normal Mode: When USE_DB_AUTHENTICATION is not 'false'

Executes the normal authentication flow
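The two modes above can be sketched as a higher-order function. This is a simplified, synchronous stand-in and not the actual withAuth.ts: the environment value and the warning logger are passed in as parameters so the sketch stays self-contained, and the warning text is made up.

```typescript
type AuthResult = { success: boolean; error?: string; status?: number };

// Counts how many bypass warnings have been logged so far.
let bypassWarnings = 0;
const MAX_BYPASS_WARNINGS = 5;

// Wraps an authentication function. When `useDbAuthentication` is the string
// "false", credentials are not checked at all and { success: true } is returned.
function withAuth<A extends unknown[]>(
  authenticate: (...args: A) => AuthResult,
  useDbAuthentication: string | undefined,
  warn: (message: string) => void = () => {},
): (...args: A) => AuthResult {
  return (...args: A): AuthResult => {
    if (useDbAuthentication === "false") {
      // Bypass mode: warn at most five times, then stay quiet.
      if (bypassWarnings < MAX_BYPASS_WARNINGS) {
        warn("Authentication is being bypassed (USE_DB_AUTHENTICATION=false)");
        bypassWarnings += 1;
      }
      return { success: true };
    }
    // Normal mode: run the wrapped authentication flow.
    return authenticate(...args);
  };
}

// Usage: a hypothetical check that rejects everything.
const alwaysDeny = (_token: string): AuthResult => ({
  success: false,
  error: "Unauthorized",
  status: 401,
});
const logged: string[] = [];
const bypassedAuth = withAuth(alwaysDeny, "false", (m) => logged.push(m));
const enforcedAuth = withAuth(alwaysDeny, "true");
```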

2. Main Authentication Flow (auth.ts)

When authentication is enabled, the flow works as follows:

Token Extraction:

Expects a Bearer token in the Authorization header

Format: Authorization: Bearer <token>

Returns 401 if header or token is missing

Token Validation:

Special case: If token is "this_is_just_a_preview_token"

Sets teamId to "preview"

Uses preview rate limiting

Normal case:

Normalizes the API key

Validates that it is in UUID format

Returns 401 if invalid
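The extraction and validation steps above could look roughly like the sketch below. The normalization step here (trim and lowercase) is only a placeholder, since the post does not say what the real normalization does; the function name, result shape, and error messages are likewise illustrative.

```typescript
const PREVIEW_TOKEN = "this_is_just_a_preview_token";
const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

type TokenCheck =
  | { ok: true; mode: "preview" | "normal"; token: string }
  | { ok: false; status: 401; error: string };

// Checks an Authorization header of the form "Bearer <token>".
function checkAuthorizationHeader(header: string | undefined): TokenCheck {
  if (!header) {
    return { ok: false, status: 401, error: "Unauthorized: missing header" };
  }
  const [scheme, token] = header.trim().split(/\s+/);
  if (scheme !== "Bearer" || !token) {
    return { ok: false, status: 401, error: "Unauthorized: missing token" };
  }
  // Special case: the preview token skips key validation.
  if (token === PREVIEW_TOKEN) {
    return { ok: true, mode: "preview", token };
  }
  // Placeholder normalization (assumption): trim and lowercase.
  const normalized = token.trim().toLowerCase();
  if (!UUID_PATTERN.test(normalized)) {
    return { ok: false, status: 401, error: "Unauthorized: invalid token" };
  }
  return { ok: true, mode: "normal", token: normalized };
}
```

In preview mode the caller would then set the team ID to "preview" and apply the preview rate limit, as described above.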

Token Verification:

Uses a caching system for performance:

First checks Redis cache using key api_key:{normalized_token}

If not in cache:

Calls Supabase RPC function get_key_and_price_id_2

Caches result for 10 seconds

Retrieves:

team_id

price_id (for subscription plan)
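The verification step above is a cache-aside pattern with a 10 second TTL. Here is a minimal sketch of that idea in which an in-memory Map stands in for Redis and a plain callback stands in for the Supabase RPC call. It is synchronous for brevity (the real flow would be asynchronous), and all function names are hypothetical.

```typescript
interface KeyInfo {
  team_id: string;
  price_id: string | null;
}

// Creates a lookup function that caches results under "api_key:{token}".
// `lookup` stands in for the database call; `now` is injectable for testing.
function createApiKeyCache(
  lookup: (token: string) => KeyInfo | null,
  ttlMs: number = 10000,
  now: () => number = () => Date.now(),
): (token: string) => KeyInfo | null {
  const entries = new Map<string, { value: KeyInfo; expiresAt: number }>();

  return (token: string): KeyInfo | null => {
    const cacheKey = `api_key:${token}`;
    const hit = entries.get(cacheKey);
    if (hit !== undefined && hit.expiresAt > now()) {
      return hit.value; // served from cache
    }
    const fresh = lookup(token); // cache miss or expired entry
    if (fresh !== null) {
      entries.set(cacheKey, { value: fresh, expiresAt: now() + ttlMs });
    }
    return fresh;
  };
}
```

Unknown keys are deliberately not cached in this sketch; whether the real implementation caches failed lookups is not stated in the post.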

Rate Limiting:

Uses Redis-based rate limiting

Different limits based on mode (preview vs normal)

Tracks limits by IP + token combination
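The post does not say which rate limiting algorithm is used, so the sketch below swaps in a simple fixed-window counter to illustrate the "IP + token" keying and the per-mode limits. The in-memory Map stands in for Redis, and the limit values are invented for the example.

```typescript
type RateLimitMode = "preview" | "normal";

// Hypothetical limits per window; the real values are not given in this thread.
const LIMITS_PER_WINDOW: Record<RateLimitMode, number> = { preview: 5, normal: 20 };

// Creates a fixed-window limiter keyed by the IP + token combination.
function createRateLimiter(
  now: () => number = () => Date.now(),
  windowMs: number = 60000,
): (ip: string, token: string, mode: RateLimitMode) => boolean {
  const windows = new Map<string, { windowStart: number; count: number }>();

  return (ip: string, token: string, mode: RateLimitMode): boolean => {
    const key = `${ip}:${token}`;
    const t = now();
    const current = windows.get(key);
    // Start a new window if none exists or the old one has elapsed.
    if (current === undefined || t - current.windowStart >= windowMs) {
      windows.set(key, { windowStart: t, count: 1 });
      return true;
    }
    if (current.count >= LIMITS_PER_WINDOW[mode]) {
      return false; // over the limit for this window
    }
    current.count += 1;
    return true;
  };
}
```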

3. Database Integration:

Uses Supabase for API key storage and verification

The RPC function get_key_and_price_id_2 checks:

api_keys table for valid API keys

subscriptions table for associated pricing plans

Returns team_id and price_id if valid

4. Error Handling:

Returns appropriate HTTP status codes:

401 for missing/invalid tokens

500 for server errors

Includes error messages in the response

Logs errors and captures exceptions with Sentry

5. Usage:

1. With Authentication (default):

curl -H "Authorization: Bearer your-api-key" ...

2. Without Authentication (development/testing), with the variable set in the API server's environment:

export USE_DB_AUTHENTICATION=false
curl ... # No Authorization header needed



jroth55 HOBBY

4 months ago

Here's how API keys are managed in the system:

  1. API keys are stored in the api_keys table in Supabase with the following structure:

    • key: The API key string

    • team_id: Associated team ID

    • project_id: Optional project ID (foreign key)

  2. Adding API Keys:

    • Currently there doesn't seem to be a direct API endpoint for users to create API keys

    • API keys are likely managed through the Supabase dashboard or administrative tools

    • You can add keys directly in Supabase using SQL:

    INSERT INTO api_keys (key, team_id, project_id) VALUES ('your-api-key', 'team-id', 'project-id');

  3. Removing API Keys:

    • Similarly, keys can be removed through the Supabase dashboard or using SQL:

    DELETE FROM api_keys WHERE key = 'your-api-key';

  4. Key Validation:

    • The system validates API keys using the getKeyAndPriceId function in auth.ts

    • It checks if the key exists in the api_keys table and returns the associated team ID and price ID

  5. Security Considerations:

    • API keys are used with Bearer token authentication

    • The system includes rate limiting and usage tracking

    • Keys are associated with teams and their subscription plans

To manage API keys, you would need to:

  1. Have administrative access to your Supabase instance

  2. Use the Supabase dashboard or SQL interface to add/remove keys

  3. Ensure keys are associated with valid team IDs and subscription plans
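One practical wrinkle: since the auth flow described earlier in the thread validates keys as UUIDs, a literal like 'your-api-key' would presumably be rejected at validation time. The sketch below builds parameterized statements for the two SQL operations and checks the key format first. The `$1`-style placeholders assume a PostgreSQL client that supports them, and the helper names are hypothetical.

```typescript
interface SqlStatement {
  text: string;
  values: (string | null)[];
}

const KEY_UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Builds a parameterized INSERT for the api_keys table described above.
// project_id is optional, so it defaults to null.
function insertApiKeyStatement(
  key: string,
  teamId: string,
  projectId: string | null = null,
): SqlStatement {
  if (!KEY_UUID_PATTERN.test(key)) {
    throw new Error("API key should be a UUID, otherwise validation will reject it");
  }
  return {
    text: "INSERT INTO api_keys (key, team_id, project_id) VALUES ($1, $2, $3)",
    values: [key, teamId, projectId],
  };
}

// Builds a parameterized DELETE for a single key.
function deleteApiKeyStatement(key: string): SqlStatement {
  return { text: "DELETE FROM api_keys WHERE key = $1", values: [key] };
}

const insertStmt = insertApiKeyStatement("123e4567-e89b-12d3-a456-426614174000", "team-id");
```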

The database is accessed through Supabase, and requires two main environment variables to be configured:

  1. SUPABASE_URL: The URL of your Supabase instance

  2. SUPABASE_SERVICE_TOKEN: The service role API key for authentication

Additionally, there's a feature flag:

  • USE_DB_AUTHENTICATION: If set to "false", database access is disabled

To access the database:

  1. The default way to access the database is through the supabase_service client that's exported from /services/supabase.ts. You can use it like this:

    import { supabase_service } from "../services/supabase";

    // Query example
    const { data, error } = await supabase_service
      .from("table_name")
      .select("*");

  2. You need to set up these environment variables:

    SUPABASE_URL=your_supabase_project_url
    SUPABASE_SERVICE_TOKEN=your_supabase_service_role_key
    USE_DB_AUTHENTICATION=true

  3. The service token should be the "service_role" key from your Supabase project settings, NOT the anon or public key, as it needs full database access.

  4. If you don't have these environment variables set:

    • If USE_DB_AUTHENTICATION=false: The system will work but without database functionality

    • If USE_DB_AUTHENTICATION=true but missing credentials: The system will throw errors when trying to access the database
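The combinations above can be checked at startup instead of waiting for the first database error. A minimal sketch, assuming the environment is passed in as a plain object (for example `process.env` in Node); the `resolveDbMode` name and its result shape are made up for illustration.

```typescript
type DbMode =
  | { kind: "bypass" }
  | { kind: "database"; url: string; serviceToken: string }
  | { kind: "misconfigured"; missing: string[] };

// Decides how the server would behave for a given set of environment variables.
function resolveDbMode(env: Record<string, string | undefined>): DbMode {
  // Only the exact string "false" disables database authentication.
  if (env["USE_DB_AUTHENTICATION"] === "false") {
    return { kind: "bypass" };
  }
  const url = env["SUPABASE_URL"];
  const serviceToken = env["SUPABASE_SERVICE_TOKEN"];
  const missing: string[] = [];
  if (!url) missing.push("SUPABASE_URL");
  if (!serviceToken) missing.push("SUPABASE_SERVICE_TOKEN");
  if (!url || !serviceToken) {
    // Auth is enabled but credentials are absent: database calls would fail.
    return { kind: "misconfigured", missing };
  }
  return { kind: "database", url, serviceToken };
}
```

Calling this once before the server starts listening and refusing to boot on "misconfigured" would surface the problem earlier than a failed query.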


bveiseh PRO

3 months ago

Hi, I am trying to run this, but the worker fails with a "cannot connect" error. In the variables, redisurl and redisratelimiturl are set to the public URL. I changed them to the internal URL with the username and password, but then I get a NOT FOUND error, even though Redis is up and running and communicating properly with the API server. Any ideas?


rifadm817 HOBBY

25 days ago

Hey, this doesn't have the new map feature?


firecrawl - Railway Help Station