2 years ago
Description: firecrawl api server + worker without auth, works with dify
Category: AI/ML
8 Replies
2 years ago
What's the schema?
2 years ago
Are you trying to use LLM extraction?
2 years ago
Yes, but it would be ideal to know all routes and schema.
a year ago
It took a bit of messing around inside the docker file.
Check the schema.txt attachment.
Here's a detailed explanation of the main routes and their functionality:
1. Scrape Endpoint (POST /v0/scrape)
Purpose: Scrapes content from a single webpage
Key Features:
Supports various scraping methods (fire-engine, ScrapingBee, Playwright)
Configurable page options:
onlyMainContent: Extract main content only
includeHtml: Include HTML in response
waitFor: Wait time after page load
screenshot: Capture page screenshot
headers: Custom request headers
Custom scraping handling for special cases (e.g., Readme Docs, Vanta portals)
PDF parsing support
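To make the page options above concrete, here is a minimal sketch of how a request body for POST /v0/scrape could be assembled. The option names are taken from the list above; the top-level `{ url, pageOptions }` layout and the helper name `buildScrapeRequest` are assumptions for illustration, so check them against the schema.txt attachment before relying on them.

```typescript
// Field names inside pageOptions come from the list above. The top-level
// { url, pageOptions } layout is an assumption, not confirmed in this thread.
interface PageOptions {
  onlyMainContent?: boolean;
  includeHtml?: boolean;
  waitFor?: number; // wait time after page load; units are not stated in the thread
  screenshot?: boolean;
  headers?: Record<string, string>;
}

interface ScrapeRequest {
  url: string;
  pageOptions: PageOptions;
}

// Hypothetical helper: checks the URL scheme and assembles the JSON body.
function buildScrapeRequest(url: string, pageOptions: PageOptions = {}): ScrapeRequest {
  if (!/^https?:\/\//i.test(url)) {
    throw new Error(`Expected an http(s) URL, got: ${url}`);
  }
  return { url, pageOptions };
}

const scrapeBody = buildScrapeRequest("https://example.com/docs", {
  onlyMainContent: true,
  waitFor: 1000,
});

// This string is what would be sent as the POST body.
const scrapeJson = JSON.stringify(scrapeBody);
```

The resulting JSON can be sent with any HTTP client, with or without the Authorization header depending on how the server is configured.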
2. Crawl Endpoints
Start Crawl (POST /v0/crawl)
Purpose: Initiates a web crawling job
Sitemap detection and processing
robots.txt compliance
Configurable crawler options:
maxCrawledLinks: Limit on pages to crawl
maxDepth: Maximum crawl depth
includes/excludes: URL patterns to include/exclude
URL validation and blocking for restricted sites
Crawl Status (GET /v0/crawl/status/:jobId)
Purpose: Check status of ongoing crawl
Returns: Progress information and results
Cancel Crawl (DELETE /v0/crawl/cancel/:jobId)
Purpose: Cancels an ongoing crawl job
Crawl Preview (POST /v0/crawlWebsitePreview)
Purpose: Preview crawl results without full execution
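The crawl routes above follow a start/status/cancel pattern keyed by a job ID. A rough sketch of that pattern follows. The methods and paths come from the list above; the `{ url, crawlerOptions }` body layout, the helper names, and the example exclude pattern are assumptions.

```typescript
// Option names come from the list above; the request-body layout is an assumption.
interface CrawlerOptions {
  maxCrawledLinks?: number;
  maxDepth?: number;
  includes?: string[];
  excludes?: string[];
}

interface RouteCall {
  method: "GET" | "POST" | "DELETE";
  path: string;
  body?: { url: string; crawlerOptions: CrawlerOptions };
}

// Hypothetical helpers that describe each call without performing it.
function startCrawl(url: string, crawlerOptions: CrawlerOptions = {}): RouteCall {
  return { method: "POST", path: "/v0/crawl", body: { url, crawlerOptions } };
}

function crawlStatus(jobId: string): RouteCall {
  // encodeURIComponent keeps an unexpected "/" in the ID from changing the route.
  return { method: "GET", path: `/v0/crawl/status/${encodeURIComponent(jobId)}` };
}

function cancelCrawl(jobId: string): RouteCall {
  return { method: "DELETE", path: `/v0/crawl/cancel/${encodeURIComponent(jobId)}` };
}

// The exclude pattern syntax below is a guess; the thread does not specify it.
const startCall = startCrawl("https://example.com", { maxDepth: 2, excludes: ["/blog/*"] });
const statusCall = crawlStatus("job-123");
const cancelCall = cancelCrawl("job-123");
```

Keeping the calls as plain data makes it easy to log or inspect them before wiring up a real HTTP client.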
3. Search Endpoint (POST /v0/search)
Purpose: Search the web and optionally fetch page content
Features:
Google search integration via Serper API
Configurable search options:
limit: Number of results (default: 7)
lang: Language (default: "en")
country: Country (default: "us")
location: Geographic location
Optional content fetching:
Can return just search results or include scraped content
Filters out blocked URLs
Credit system integration
Rate limiting and error handling
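The search defaults listed above (limit 7, lang "en", country "us") can be sketched as a small option resolver. The type and function names here are hypothetical, and the server may apply its defaults differently.

```typescript
interface SearchOptions {
  limit: number;
  lang: string;
  country: string;
  location?: string;
}

// Hypothetical resolver: fills in the defaults listed above for any missing option.
function resolveSearchOptions(opts: Partial<SearchOptions> = {}): SearchOptions {
  const resolved: SearchOptions = {
    limit: opts.limit ?? 7,
    lang: opts.lang ?? "en",
    country: opts.country ?? "us",
  };
  // location has no default in the list above, so it is only set when provided.
  if (opts.location !== undefined) {
    resolved.location = opts.location;
  }
  return resolved;
}

const defaultSearch = resolveSearchOptions();
const customSearch = resolveSearchOptions({ limit: 3, location: "Berlin" });
```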
4. Authentication and Health Routes
Key Authentication (GET /v0/keyAuth)
Purpose: Validate API keys
Supports environment variable bypass (USE_DB_AUTHENTICATION=false)
Health Checks:
GET /v0/health/liveness: Check if service is alive
GET /v0/health/readiness: Check if service is ready
Common Features Across Routes:
Authentication:
Bearer token authentication
Optional bypass with environment variable
Rate limiting per endpoint
Error Handling:
Consistent error response format
Sentry error tracking
Detailed logging
Credit System:
Credit checking before operations
Usage tracking
Billing integration
Response Format:
Standardized success/error responses
Status codes for different scenarios
Detailed error messages
Monitoring:
Job logging
Performance tracking
System monitoring integration
Authentication
This is how the authentication system works:
1. Authentication Middleware (withAuth.ts)
The system uses a higher-order function withAuth that wraps authentication logic around API endpoints
It supports two modes:
Bypass Mode: When USE_DB_AUTHENTICATION=false
Automatically returns { success: true } without checking credentials
Logs a warning message (up to 5 times) to notify about bypassed authentication
Normal Mode: When USE_DB_AUTHENTICATION is not 'false'
Executes the normal authentication flow
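The two modes described above can be sketched roughly as follows. This is not the code from withAuth.ts: the real middleware is presumably asynchronous, while this version is synchronous and takes the environment and logger as parameters so it can run on its own. `makeWithAuth`, `AuthResult`, and the warning text are assumptions.

```typescript
type AuthResult =
  | { success: true }
  | { success: false; error: string; status: number };

type AuthEnv = Record<string, string | undefined>;

// Hypothetical factory: returns a withAuth wrapper bound to an environment and a logger.
function makeWithAuth(env: AuthEnv, warn: (message: string) => void) {
  let warningsLogged = 0;

  return function withAuth<Args extends unknown[]>(
    authenticate: (...args: Args) => AuthResult,
  ): (...args: Args) => AuthResult {
    return (...args: Args): AuthResult => {
      // Bypass mode: only the exact string "false" disables authentication.
      if (env["USE_DB_AUTHENTICATION"] === "false") {
        if (warningsLogged < 5) {
          warningsLogged += 1;
          warn("Authentication bypassed: USE_DB_AUTHENTICATION=false");
        }
        return { success: true };
      }
      // Normal mode: run the wrapped authentication logic.
      return authenticate(...args);
    };
  };
}

const bypassWarnings: string[] = [];
const alwaysDeny = (_token: string): AuthResult => ({
  success: false,
  error: "Unauthorized",
  status: 401,
});

const guardedBypassed = makeWithAuth(
  { USE_DB_AUTHENTICATION: "false" },
  (message) => { bypassWarnings.push(message); },
)(alwaysDeny);

const guardedNormal = makeWithAuth({}, () => {})(alwaysDeny);
```

In bypass mode the wrapped function is never called, which is why a deployment "without auth" accepts requests that carry no Authorization header.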
2. Main Authentication Flow (auth.ts)
When authentication is enabled, the flow works as follows:
Token Extraction:
Expects a Bearer token in the Authorization header
Format: Authorization: Bearer your-api-key
Returns 401 if header or token is missing
Token Validation:
Special case: If token is "this_is_just_a_preview_token"
Sets teamId to "preview"
Uses preview rate limiting
Normal case:
Normalizes the API key
Validates that it is in UUID format
Returns 401 if invalid
Token Verification:
Uses a caching system for performance:
First checks Redis cache using key api_key:{normalized_token}
If not in cache:
Calls Supabase RPC function get_key_and_price_id_2
Caches result for 10 seconds
Retrieves:
team_id
price_id (for subscription plan)
Rate Limiting:
Uses Redis-based rate limiting
Different limits based on mode (preview vs normal)
Tracks limits by IP + token combination
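The rate limiting described above can be illustrated with a small fixed-window limiter. This sketch uses an in-memory Map as a stand-in for Redis, and the limit values are made up for the example; the thread does not state the real limits or the algorithm used.

```typescript
type RateLimitMode = "preview" | "normal";

// Hypothetical limits; the real values are not given in this thread.
const RATE_LIMITS: Record<RateLimitMode, { points: number; windowMs: number }> = {
  preview: { points: 5, windowMs: 60 * 1000 },
  normal: { points: 20, windowMs: 60 * 1000 },
};

// In-memory stand-in for a Redis-backed limiter. The clock is injected
// so the behaviour can be checked without waiting for real time to pass.
function makeRateLimiter(now: () => number) {
  const windows = new Map<string, { count: number; resetAt: number }>();

  return (mode: RateLimitMode, ip: string, token: string): boolean => {
    const { points, windowMs } = RATE_LIMITS[mode];
    // Limits are tracked per IP + token combination, as described above.
    const key = `${mode}:${ip}:${token}`;
    const current = windows.get(key);

    if (!current || current.resetAt <= now()) {
      windows.set(key, { count: 1, resetAt: now() + windowMs });
      return true;
    }
    if (current.count >= points) {
      return false; // over the limit for this window
    }
    current.count += 1;
    return true;
  };
}

const limiterClock = { nowMs: 0 };
const allowRequest = makeRateLimiter(() => limiterClock.nowMs);
```

Each call to `allowRequest(mode, ip, token)` returns true while the caller is under the limit for the current window and false once it is exceeded.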
3. Database Integration:
Uses Supabase for API key storage and verification
The RPC function get_key_and_price_id_2 checks:
api_keys table for valid API keys
subscriptions table for associated pricing plans
Returns team_id and price_id if valid
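The cache-then-database lookup described above can be sketched like this. A Map stands in for Redis and a plain callback stands in for the Supabase RPC call; both would be asynchronous in practice. The `api_key:` prefix and the 10 second TTL come from the description above, while `makeCachedKeyLookup`, `KeyInfo`, and the choice to cache only successful lookups are assumptions.

```typescript
interface KeyInfo {
  team_id: string;
  price_id: string | null;
}

// Hypothetical helper: wraps a database lookup with a short-lived cache.
function makeCachedKeyLookup(
  fetchFromDb: (apiKey: string) => KeyInfo | null, // stand-in for the RPC call
  now: () => number,
  ttlMs: number = 10 * 1000,
) {
  const cache = new Map<string, { value: KeyInfo; expiresAt: number }>();

  return (apiKey: string): KeyInfo | null => {
    const cacheKey = `api_key:${apiKey}`;
    const hit = cache.get(cacheKey);
    if (hit && hit.expiresAt > now()) {
      return hit.value; // served from cache, no database call
    }
    const value = fetchFromDb(apiKey);
    // Assumption: only successful lookups are cached.
    if (value !== null) {
      cache.set(cacheKey, { value, expiresAt: now() + ttlMs });
    }
    return value;
  };
}

const cacheClock = { nowMs: 0 };
const dbStats = { calls: 0 };
const lookupKey = makeCachedKeyLookup(
  (_apiKey) => {
    dbStats.calls += 1;
    return { team_id: "team-1", price_id: "price-1" };
  },
  () => cacheClock.nowMs,
);
```

With a 10 second TTL, a key removed from the database can keep working for up to 10 seconds until its cache entry expires.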
4. Error Handling:
Returns appropriate HTTP status codes:
401 for missing/invalid tokens
500 for server errors
Includes error messages in the response
Logs errors and captures exceptions with Sentry
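The token extraction and validation steps from the flow above, together with the 401 responses listed here, can be sketched as one function. The preview token string comes from the description above. The normalization step (trim and lowercase) is only a placeholder because the thread does not say what normalization does, and the function name and error messages are assumptions.

```typescript
const PREVIEW_TOKEN = "this_is_just_a_preview_token";
const UUID_PATTERN = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

type TokenCheck =
  | { ok: true; mode: "preview"; teamId: "preview" }
  | { ok: true; mode: "normal"; apiKey: string }
  | { ok: false; status: 401; error: string };

// Hypothetical helper covering extraction and validation only.
// Database verification and rate limiting would follow a successful result.
function checkAuthorizationHeader(header: string | undefined): TokenCheck {
  if (!header) {
    return { ok: false, status: 401, error: "Missing Authorization header" };
  }
  const [scheme, token] = header.trim().split(/\s+/);
  if (scheme !== "Bearer" || !token) {
    return { ok: false, status: 401, error: "Expected: Authorization: Bearer your-api-key" };
  }
  if (token === PREVIEW_TOKEN) {
    return { ok: true, mode: "preview", teamId: "preview" };
  }
  // Placeholder normalization; the real rules are not described in this thread.
  const normalized = token.trim().toLowerCase();
  if (!UUID_PATTERN.test(normalized)) {
    return { ok: false, status: 401, error: "Invalid token" };
  }
  return { ok: true, mode: "normal", apiKey: normalized };
}

const missingHeader = checkAuthorizationHeader(undefined);
const previewHeader = checkAuthorizationHeader(`Bearer ${PREVIEW_TOKEN}`);
const uuidHeader = checkAuthorizationHeader("Bearer 123E4567-E89B-12D3-A456-426614174000");
const badHeader = checkAuthorizationHeader("Bearer not-a-uuid");
```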
5. Usage:
1. With Authentication (default):
curl -H "Authorization: Bearer your-api-key" ...
2. Without Authentication (development/testing):
export USE_DB_AUTHENTICATION=false  # set this in the API server's environment, not in the client shell
curl ...  # No Authorization header needed
a year ago
Here's how API keys are managed in the system:
- API keys are stored in the api_keys table in Supabase with the following structure:
  - key: The API key string
  - team_id: Associated team ID
  - project_id: Optional project ID (foreign key)
- Adding API Keys:
  - Currently there doesn't seem to be a direct API endpoint for users to create API keys
  - API keys are likely managed through the Supabase dashboard or administrative tools
  - You can add keys directly in Supabase using SQL:
    INSERT INTO api_keys (key, team_id, project_id) VALUES ('your-api-key', 'team-id', 'project-id');
- Removing API Keys:
  - Similarly, keys can be removed through the Supabase dashboard or using SQL:
    DELETE FROM api_keys WHERE key = 'your-api-key';
- Key Validation:
  - The system validates API keys using the getKeyAndPriceId function in auth.ts
  - It checks if the key exists in the api_keys table and returns the associated team ID and price ID
- Security Considerations:
- API keys are used with Bearer token authentication
- The system includes rate limiting and usage tracking
- Keys are associated with teams and their subscription plans
To manage API keys, you would need to:
- Have administrative access to your Supabase instance
- Use the Supabase dashboard or SQL interface to add/remove keys
- Ensure keys are associated with valid team IDs and subscription plans
The database is accessed through Supabase, and requires two main environment variables to be configured:
- SUPABASE_URL: The URL of your Supabase instance
- SUPABASE_SERVICE_TOKEN: The service role API key for authentication
Additionally, there's a feature flag:
- USE_DB_AUTHENTICATION: If set to "false", database access is disabled
To access the database:
- The default way to access the database is through the supabase_service client that's exported from /services/supabase.ts. You can use it like this:
  import { supabase_service } from "../services/supabase";
  // Query example
  const { data, error } = await supabase_service
    .from("table_name")
    .select("*");
- You need to set up these environment variables:
  SUPABASE_URL=your_supabase_project_url
  SUPABASE_SERVICE_TOKEN=your_supabase_service_role_key
  USE_DB_AUTHENTICATION=true
- The service token should be the "service_role" key from your Supabase project settings, NOT the anon or public key, as it needs full database access.
- If you don't have these environment variables set:
  - If USE_DB_AUTHENTICATION=false: The system will work but without database functionality
  - If USE_DB_AUTHENTICATION=true but missing credentials: The system will throw errors when trying to access the database
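The two cases above can be sketched as a small configuration check. Unlike the behaviour described (errors appearing only when the database is first accessed), this sketch fails immediately when credentials are missing, which is a deliberate design choice for the example. `resolveDbConfig` and the `DbConfig` type are hypothetical, and the environment is passed in as a plain object so the snippet stays self-contained.

```typescript
type ConfigEnv = Record<string, string | undefined>;

type DbConfig =
  | { mode: "bypass" }
  | { mode: "database"; url: string; serviceToken: string };

// Hypothetical startup check based on the three variables described above.
function resolveDbConfig(env: ConfigEnv): DbConfig {
  // Only the exact string "false" turns database authentication off.
  if (env["USE_DB_AUTHENTICATION"] === "false") {
    return { mode: "bypass" };
  }
  const url = env["SUPABASE_URL"];
  const serviceToken = env["SUPABASE_SERVICE_TOKEN"];
  if (!url || !serviceToken) {
    throw new Error(
      "USE_DB_AUTHENTICATION is enabled but SUPABASE_URL or SUPABASE_SERVICE_TOKEN is missing",
    );
  }
  return { mode: "database", url, serviceToken };
}

const bypassConfig = resolveDbConfig({ USE_DB_AUTHENTICATION: "false" });
const databaseConfig = resolveDbConfig({
  USE_DB_AUTHENTICATION: "true",
  SUPABASE_URL: "https://your-project.example",
  SUPABASE_SERVICE_TOKEN: "your_supabase_service_role_key",
});
```

In a Node.js process the function would typically be called with `process.env`.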
a year ago
Hi, I am trying to run this, but the worker fails with a "cannot connect" error. In the variables, redisurl and redisratelimiturl are set to the public URL. I modified them to use the internal URL with the username and password, but then I get a NOT FOUND error, even though Redis is up and running and communicating properly with the API server. Any ideas?
a year ago
Hey, this doesn't have the new map feature?
a year ago
This image is out of date/last updated 7 months ago. Can we get the latest from the source repo?

