7 months ago
Description: Firecrawl API server + worker without auth; works with Dify
Category: AI/ML
7 Replies
7 months ago
What's the schema?
7 months ago
Are you trying to use LLM extraction?
7 months ago
Yes, but it would be ideal to know all routes and schema.
4 months ago
It took a bit of messing around inside the Dockerfile.
Check the schema.txt attachment.
Here's a detailed explanation of the main routes and their functionality:
1. Scrape Endpoint (POST /v0/scrape)
Purpose: Scrapes content from a single webpage
Key Features:
Supports various scraping methods (fire-engine, ScrapingBee, Playwright)
Configurable page options:
onlyMainContent: Extract main content only
includeHtml: Include HTML in response
waitFor: Wait time after page load
screenshot: Capture page screenshot
headers: Custom request headers
Custom scraping handling for special cases (e.g., Readme Docs, Vanta portals)
PDF parsing support
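Based on the page options listed above, a `/v0/scrape` request body can be sketched like this. This is an illustrative sketch, not the actual Firecrawl source: the option names come from the list above, but the `ScrapeRequest` wrapper type and `buildScrapeRequest` helper are assumptions.

```typescript
// Hedged sketch: builds a request body for POST /v0/scrape using the
// pageOptions described above. The exact wire format may differ from
// the real Firecrawl source; treat this as illustrative.
interface PageOptions {
  onlyMainContent?: boolean;        // extract main content only
  includeHtml?: boolean;            // include raw HTML in the response
  waitFor?: number;                 // ms to wait after page load
  screenshot?: boolean;             // capture a page screenshot
  headers?: Record<string, string>; // custom request headers
}

interface ScrapeRequest {
  url: string;
  pageOptions?: PageOptions;
}

function buildScrapeRequest(url: string, pageOptions: PageOptions = {}): ScrapeRequest {
  return { url, pageOptions };
}

// Example body for scraping a single page, main content only:
const scrapeBody = buildScrapeRequest("https://example.com", {
  onlyMainContent: true,
  waitFor: 2000,
});
```

The body would be sent as JSON with `Content-Type: application/json`, plus a Bearer token unless `USE_DB_AUTHENTICATION=false`.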
2. Crawl Endpoints
Start Crawl (POST /v0/crawl)
Purpose: Initiates a web crawling job
Sitemap detection and processing
robots.txt compliance
Configurable crawler options:
maxCrawledLinks: Limit on pages to crawl
maxDepth: Maximum crawl depth
includes/excludes: URL patterns to include/exclude
URL validation and blocking for restricted sites
Crawl Status (GET /v0/crawl/status/:jobId)
Purpose: Check status of ongoing crawl
Returns: Progress information and results
Cancel Crawl (DELETE /v0/crawl/cancel/:jobId)
Purpose: Cancels an ongoing crawl job
Crawl Preview (POST /v0/crawlWebsitePreview)
Purpose: Preview crawl results without full execution
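The crawl lifecycle above (start, poll status by `jobId`, optionally cancel) can be sketched as follows. The crawler option names come from the list above; the helper functions and the exact request/URL shapes are assumptions, not the real Firecrawl source.

```typescript
// Hedged sketch of the crawl lifecycle described above. Option names
// (maxCrawledLinks, maxDepth, includes, excludes) come from the list
// above; everything else is illustrative.
interface CrawlerOptions {
  maxCrawledLinks?: number; // limit on pages to crawl
  maxDepth?: number;        // maximum crawl depth
  includes?: string[];      // URL patterns to include
  excludes?: string[];      // URL patterns to exclude
}

function buildCrawlRequest(url: string, crawlerOptions: CrawlerOptions = {}) {
  return { url, crawlerOptions };
}

// POST /v0/crawl returns a jobId; status and cancellation use it:
function crawlStatusUrl(base: string, jobId: string): string {
  return `${base}/v0/crawl/status/${jobId}`;
}

function crawlCancelUrl(base: string, jobId: string): string {
  return `${base}/v0/crawl/cancel/${jobId}`;
}
```

A typical client loop would POST the body from `buildCrawlRequest`, then GET the status URL until the job reports completion.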
3. Search Endpoint (POST /v0/search)
Purpose: Search the web and optionally fetch page content
Features:
Google search integration via Serper API
Configurable search options:
limit: Number of results (default: 7)
lang: Language (default: "en")
country: Country (default: "us")
location: Geographic location
Optional content fetching:
Can return just search results or include scraped content
Filters out blocked URLs
Credit system integration
Rate limiting and error handling
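Putting the search options and defaults above together, a `/v0/search` request body might look like this. The defaults (limit 7, lang "en", country "us") are taken from the list above; the `buildSearchRequest` helper and payload shape are illustrative assumptions.

```typescript
// Hedged sketch of a /v0/search request using the options and defaults
// listed above; the actual payload shape in Firecrawl may differ.
interface SearchOptions {
  limit: number;     // number of results (default: 7)
  lang: string;      // language (default: "en")
  country: string;   // country (default: "us")
  location?: string; // optional geographic location
}

function buildSearchRequest(query: string, opts: Partial<SearchOptions> = {}) {
  const defaults: SearchOptions = { limit: 7, lang: "en", country: "us" };
  return { query, searchOptions: { ...defaults, ...opts } };
}

// Example: top 3 English results for "web scraping"
const searchBody = buildSearchRequest("web scraping", { limit: 3 });
```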
4. Authentication and Health Routes
Key Authentication (GET /v0/keyAuth)
Purpose: Validate API keys
Supports environment variable bypass (USE_DB_AUTHENTICATION=false)
Health Checks:
GET /v0/health/liveness: Check if service is alive
GET /v0/health/readiness: Check if service is ready
Common Features Across Routes:
Authentication:
Bearer token authentication
Optional bypass with environment variable
Rate limiting per endpoint
Error Handling:
Consistent error response format
Sentry error tracking
Detailed logging
Credit System:
Credit checking before operations
Usage tracking
Billing integration
Response Format:
Standardized success/error responses
Status codes for different scenarios
Detailed error messages
Monitoring:
Job logging
Performance tracking
System monitoring integration
Authentication
This is how the authentication system works:
1. Authentication Middleware (withAuth.ts)
The system uses a higher-order function withAuth that wraps authentication logic around API endpoints
It supports two modes:
Bypass Mode: When USE_DB_AUTHENTICATION=false
Automatically returns { success: true } without checking credentials
Logs a warning message (up to 5 times) to notify about bypassed authentication
Normal Mode: When USE_DB_AUTHENTICATION is not 'false'
Executes the normal authentication flow
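The `withAuth` wrapper described above can be sketched as a minimal higher-order function. The bypass behavior (short-circuit to success when `USE_DB_AUTHENTICATION=false`, warning at most 5 times) follows the description above; the type names are assumptions, and the env value is passed in as a parameter here for testability, whereas the real code reads `process.env`.

```typescript
// Minimal sketch of the withAuth higher-order function described above.
// It wraps an auth check; when USE_DB_AUTHENTICATION is "false" it
// short-circuits to success and warns at most 5 times.
type AuthResult = { success: boolean; error?: string };
type AuthFn = (token?: string) => Promise<AuthResult>;

let bypassWarnings = 0;

function withAuth(originalAuth: AuthFn, useDbAuthentication?: string): AuthFn {
  return async (token) => {
    if (useDbAuthentication === "false") {
      if (bypassWarnings < 5) {
        // warn at most 5 times, as described above
        console.warn("WARNING: authentication is bypassed (USE_DB_AUTHENTICATION=false)");
        bypassWarnings++;
      }
      return { success: true };
    }
    // normal mode: run the real authentication flow
    return originalAuth(token);
  };
}
```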
2. Main Authentication Flow (auth.ts)
When authentication is enabled, the flow works as follows:
Token Extraction:
Expects a Bearer token in the Authorization header
Format: Authorization: Bearer <token>
Returns 401 if header or token is missing
Token Validation:
Special case: If token is "this_is_just_a_preview_token"
Sets teamId to "preview"
Uses preview rate limiting
Normal case:
Normalizes the API key
Validates it's a UUID format
Returns 401 if invalid
Token Verification:
Uses a caching system for performance:
First checks Redis cache using key api_key:{normalized_token}
If not in cache:
Calls Supabase RPC function get_key_and_price_id_2
Caches result for 10 seconds
Retrieves:
team_id
price_id (for subscription plan)
Rate Limiting:
Uses Redis-based rate limiting
Different limits based on mode (preview vs normal)
Tracks limits by IP + token combination
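The token extraction, preview special case, UUID validation, and 10-second cache described above can be sketched like this. A `Map` stands in for Redis, and the injected `lookupTeam` callback stands in for the Supabase RPC `get_key_and_price_id_2`; both are illustrative assumptions, as are the type names.

```typescript
// Hedged sketch of the token flow described above: extract the Bearer
// token, special-case the preview token, validate UUID format, and
// consult a short-lived cache before falling back to the database.
const PREVIEW_TOKEN = "this_is_just_a_preview_token";
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function extractBearerToken(authHeader: string | undefined): string | null {
  if (!authHeader?.startsWith("Bearer ")) return null; // -> 401
  return authHeader.slice("Bearer ".length).trim();
}

interface TeamInfo { teamId: string; priceId?: string }
const cache = new Map<string, { value: TeamInfo; expires: number }>();

async function verifyToken(
  token: string,
  lookupTeam: (key: string) => Promise<TeamInfo | null> // Supabase RPC in the real code
): Promise<TeamInfo | null> {
  if (token === PREVIEW_TOKEN) return { teamId: "preview" }; // preview rate limits apply
  const key = token.trim().toLowerCase();                    // normalize the API key
  if (!UUID_RE.test(key)) return null;                       // -> 401
  const hit = cache.get(`api_key:${key}`);
  if (hit && hit.expires > Date.now()) return hit.value;     // cache hit
  const info = await lookupTeam(key);
  if (info) cache.set(`api_key:${key}`, { value: info, expires: Date.now() + 10_000 });
  return info;
}
```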
3. Database Integration:
Uses Supabase for API key storage and verification
The RPC function get_key_and_price_id_2 checks:
api_keys table for valid API keys
subscriptions table for associated pricing plans
Returns team_id and price_id if valid
4. Error Handling:
Returns appropriate HTTP status codes:
401 for missing/invalid tokens
500 for server errors
Includes error messages in the response
Logs errors and captures exceptions with Sentry
5. Usage:
With authentication (default):

```shell
curl -H "Authorization: Bearer your-api-key" ...
```

Without authentication (development/testing):

```shell
export USE_DB_AUTHENTICATION=false
curl ...  # No Authorization header needed
```
Attachments
4 months ago
here's how API keys are managed in the system:
API keys are stored in the api_keys table in Supabase with the following structure:
key: The API key string
team_id: Associated team ID
project_id: Optional project ID (foreign key)
Adding API Keys:
Currently there doesn't seem to be a direct API endpoint for users to create API keys
API keys are likely managed through the Supabase dashboard or administrative tools
You can add keys directly in Supabase using SQL:
```sql
INSERT INTO api_keys (key, team_id, project_id)
VALUES ('your-api-key', 'team-id', 'project-id');
```
Removing API Keys:
Similarly, keys can be removed through the Supabase dashboard or using SQL:
```sql
DELETE FROM api_keys WHERE key = 'your-api-key';
```
Key Validation:
The system validates API keys using the getKeyAndPriceId function in auth.ts
It checks if the key exists in the api_keys table and returns the associated team ID and price ID
Security Considerations:
API keys are used with Bearer token authentication
The system includes rate limiting and usage tracking
Keys are associated with teams and their subscription plans
To manage API keys, you would need to:
Have administrative access to your Supabase instance
Use the Supabase dashboard or SQL interface to add/remove keys
Ensure keys are associated with valid team IDs and subscription plans
The database is accessed through Supabase, and requires two main environment variables to be configured:
SUPABASE_URL: The URL of your Supabase instance
SUPABASE_SERVICE_TOKEN: The service role API key for authentication
Additionally, there's a feature flag:
USE_DB_AUTHENTICATION: If set to "false", database access is disabled
To access the database:
The default way to access the database is through the supabase_service client that's exported from /services/supabase.ts. You can use it like this:

```typescript
import { supabase_service } from "../services/supabase";

// Query example
const { data, error } = await supabase_service
  .from("table_name")
  .select("*");
```
You need to set up these environment variables:
```shell
SUPABASE_URL=your_supabase_project_url
SUPABASE_SERVICE_TOKEN=your_supabase_service_role_key
USE_DB_AUTHENTICATION=true
```
The service token should be the "service_role" key from your Supabase project settings, NOT the anon or public key, as it needs full database access.
If you don't have these environment variables set:
If USE_DB_AUTHENTICATION=false: The system will work but without database functionality
If USE_DB_AUTHENTICATION=true but missing credentials: The system will throw errors when trying to access the database
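The two failure modes above can be made explicit with a small startup check. This is a hypothetical sketch, not part of Firecrawl: the `checkSupabaseEnv` function name and error message are assumptions; in a real app you would call it with `process.env`.

```typescript
// Hedged sketch of the environment checks described above: if database
// authentication is enabled, both Supabase variables must be present.
function checkSupabaseEnv(env: Record<string, string | undefined>): void {
  if (env.USE_DB_AUTHENTICATION === "false") return; // DB features disabled, nothing to check
  if (!env.SUPABASE_URL || !env.SUPABASE_SERVICE_TOKEN) {
    throw new Error(
      "USE_DB_AUTHENTICATION is enabled but SUPABASE_URL / SUPABASE_SERVICE_TOKEN are missing"
    );
  }
}
```

Failing fast like this surfaces the "missing credentials" case at boot instead of on the first database query.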
3 months ago
Hi, I am trying to run this, but the worker gets a "cannot connect" error. In the variables, REDIS_URL and REDIS_RATE_LIMIT_URL are set to the public URL. I modified them to use the internal URL with the username and password, but then I get a NOT FOUND error, even though Redis is up and running and communicating properly with the API server. Any ideas?
25 days ago
Hey, this doesn't have the new map feature?