Celery worker crashes after 12am every day and doesn't work as it's supposed to.

3 months ago

Celery worker crashes after 12am and doesn't work as it's supposed to. It should auto clock out all the users, but it doesn't do a thing.

$10 Bounty

10 Replies


bytekeim
PRO

3 months ago

yo, are you running celery beat as a separate process or just the worker?


3 months ago

it's like this

Attachments


bytekeim
PRO

3 months ago

add this to ur celery config:

from celery.schedules import crontab

app.conf.beat_schedule = {
    'auto-clock-out': {
        'task': 'your_app.tasks.auto_clock_out',
        'schedule': crontab(hour=0, minute=0),
    },
}
app.conf.timezone = 'UTC'  # change to ur timezone

also add this so workers dont crash from memory:

app.conf.worker_max_tasks_per_child = 100

check ur logs to see if beat is actually running. a big chance thats ur issue

also make sure ur redis connection isnt timing out and ur timezone is set right or itll run at the wrong time

lmk if that works
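
A quick way to see why the timezone setting matters for a `crontab(hour=0, minute=0)` schedule. This is only a stdlib sketch, not Celery code: the fixed UTC+10 offset is an assumption standing in for Australia/Sydney (it ignores daylight saving), and `local_midnight_in_utc` is a made-up helper name.

```python
from datetime import date, datetime, timedelta, timezone

# Assumption: fixed UTC+10 stands in for Australia/Sydney and ignores DST.
# Real code would more likely use zoneinfo.ZoneInfo("Australia/Sydney").
SYDNEY_STANDARD = timezone(timedelta(hours=10), "AEST")


def local_midnight_in_utc(day, tz):
    """Return the UTC instant at which `day` starts in timezone `tz`."""
    local_midnight = datetime(day.year, day.month, day.day, tzinfo=tz)
    return local_midnight.astimezone(timezone.utc)


if __name__ == "__main__":
    # Midnight in the local zone is a different wall-clock hour in UTC.
    print(local_midnight_in_utc(date(2024, 7, 1), SYDNEY_STANDARD))
```

With a UTC+10 offset, midnight on 1 July is 14:00 UTC on 30 June, so a midnight schedule evaluated in UTC would fire ten hours away from local midnight.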


3 months ago

done this:
import os

from celery import Celery
from celery.schedules import crontab

# Set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'attendance_system.settings')

app = Celery('attendance_system')

# Using a string here means the worker doesn't have to serialize
# the configuration object to child processes.
app.config_from_object('django.conf:settings', namespace='CELERY')

# Load task modules from all registered Django apps.
app.autodiscover_tasks()

# Celery Beat schedule
app.conf.beat_schedule = {
    # Auto clock-out check every 30 minutes (no email notification)
    'auto-clock-out-check': {
        'task': 'attendance.tasks.auto_clock_out_check',
        'schedule': crontab(minute='*/30'),  # Every 30 minutes
    },
    # Weekly reports every Friday at 5 PM (beautiful HTML emails)
    'weekly-reports': {
        'task': 'attendance.tasks.send_weekly_reports',
        'schedule': crontab(hour=17, minute=0, day_of_week=5),  # Friday 5 PM
    },
    # Disabled email notifications (only weekly reports are sent):
    # - Missed clock-out reminders (disabled)
    # - Auto clock-out notifications (disabled in tasks.py)
    # - Early clock-out alerts (disabled)
}

# Set timezone for beat scheduler
app.conf.timezone = 'Australia/Sydney'

# Additional Celery configuration
app.conf.update(
    worker_max_tasks_per_child=100,  # Reduced to prevent memory issues
    worker_prefetch_multiplier=1,
    task_acks_late=True,
    task_reject_on_worker_lost=True,
)

@app.task(bind=True)
def debug_task(self):
    print(f'Request: {self.request!r}')

and this (startworker.sh):
#!/bin/bash

echo "=== STARTING CELERY WORKER ==="
echo "Current time: $(date)"

# Run migrations (safe to run multiple times)
python manage.py migrate --noinput

# Start Celery worker
exec celery -A attendance_system worker \
    --loglevel=info \
    --concurrency=2 \
    --max-tasks-per-child=100


bytekeim
PRO

3 months ago

yo ok so your config looks pretty solid actually. the issue is ur running the auto clock out every 30 mins but the worker is still crashing

couple things i see:

1. ur task might be the problem

show me ur attendance.tasks.auto_clock_out_check function. if theres an unhandled exception in there its gonna kill the worker. add this to it:

import logging
from celery import shared_task

logger = logging.getLogger(__name__)

@shared_task(bind=True, max_retries=3, soft_time_limit=300)
def auto_clock_out_check(self):
    try:
        ...  # ur code here
    except Exception as e:
        logger.error(f"auto clock out failed: {e}")
        raise self.retry(exc=e, countdown=300)

2. database connections timing out

django db connections can timeout overnight. add this to ur task:

from django.db import connection

@shared_task(bind=True, max_retries=3)
def auto_clock_out_check(self):
    try:
        connection.close()  # close old connection
        # then do ur work
    except Exception as e:
        logger.error(f"error: {e}")

3. check ur redis

redis might be dropping connections. add this to ur django settings:

CELERY_BROKER_CONNECTION_RETRY_ON_STARTUP = True
CELERY_BROKER_POOL_LIMIT = None

whats probably happening: the task runs fine during the day but at night theres either a db connection timeout or the task hits some edge case that crashes everything

can you paste the worker logs from when it crashes? that'll tell me exactly whats breaking
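
Since the logs are what will show the real failure, it may help to log the whole traceback and not only the message. A minimal stdlib sketch, not Celery-specific: the logger name, `run_with_traceback_logging`, and `capture_failure_log` are hypothetical names used only for this illustration.

```python
import io
import logging

logger = logging.getLogger("attendance.tasks.demo")  # hypothetical logger name


def run_with_traceback_logging(func):
    """Call func(); on failure, log the full traceback and re-raise."""
    try:
        return func()
    except Exception:
        # logger.exception logs at ERROR level and appends the traceback,
        # which logger.error(f"... {e}") on its own does not include.
        logger.exception("auto clock out failed")
        raise


def capture_failure_log():
    """Run a deliberately failing callable and return what was logged."""
    buffer = io.StringIO()
    handler = logging.StreamHandler(buffer)
    logger.addHandler(handler)
    try:
        run_with_traceback_logging(lambda: 1 / 0)
    except ZeroDivisionError:
        pass  # expected: the helper re-raises after logging
    finally:
        logger.removeHandler(handler)
    return buffer.getvalue()


if __name__ == "__main__":
    print(capture_failure_log())
```

Inside a task, the same idea would be swapping `logger.error(f"...: {e}")` for `logger.exception("...")` in the `except` block, so the pasted worker logs show the failing line.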



3 months ago

Thanks for checking it out. Tasks I have:
@shared_task
def auto_clock_out_check():
    """
    Auto clock-out employees after required shift hours OR at office closing time (whichever comes first)
    Controlled by SystemSettings
    """
    # Load system settings
    system_settings = SystemSettings.load()

    # Check if auto clock-out is enabled
    if not system_settings.enable_auto_clockout:
        return "Auto clock-out is disabled in system settings"

    sydney_tz = pytz.timezone('Australia/Sydney')
    now = timezone.now().astimezone(sydney_tz)
    today = now.date()
    current_time = now.time()

    # Get all employees currently clocked IN
    employees_in = DailySummary.objects.filter(
        date=today,
        current_status='IN'
    )

    for summary in employees_in:
        should_clock_out = False

        # Check if it's office closing time or later
        if current_time >= system_settings.office_end_time:
            should_clock_out = True
        # Check if required shift hours have passed since first clock in
        elif summary.first_clock_in:
            first_in_dt = datetime.combine(today, summary.first_clock_in)
            hours_elapsed = (now - sydney_tz.localize(first_in_dt)).total_seconds() / 3600
            if hours_elapsed >= float(system_settings.required_shift_hours):
                should_clock_out = True

        if should_clock_out:
            # Create auto clock-out tap
            AttendanceTap.objects.create(
                employee_id=summary.employee_id,
                employee_name=summary.employee_name,
                action='OUT',
                notes='Auto clock-out'
            )

            # Update daily summary
            summary.last_clock_out = current_time
            summary.tap_count += 1
            summary.current_status = 'OUT'

            # Calculate hours
            first_in_dt = datetime.combine(today, summary.first_clock_in)
            last_out_dt = datetime.combine(today, summary.last_clock_out)
            time_diff = last_out_dt - first_in_dt
            raw_hours = Decimal(time_diff.total_seconds() / 3600)
            summary.raw_hours = raw_hours

            # Apply break deduction (use system settings)
            if raw_hours > 5:
                summary.break_deduction = system_settings.break_duration_hours
            else:
                summary.break_deduction = Decimal('0')

            summary.final_hours = summary.raw_hours - summary.break_deduction
            summary.save()

            # Email notifications disabled - only weekly summaries are sent
            # send_auto_clockout_notification.delay(summary.employee_id, str(current_time))

    return f"Auto clock-out check completed. {employees_in.count()} employees checked."

I have few more other tasks but this one is what I am after.

Celery settings right now:
# Celery Production Settings
CELERY_TASK_TRACK_STARTED = True
CELERY_TASK_TIME_LIMIT = 30 * 60  # 30 minutes
CELERY_TASK_SOFT_TIME_LIMIT = 25 * 60  # 25 minutes
CELERY_WORKER_MAX_TASKS_PER_CHILD = 100
CELERY_BROKER_CONNECTION_RETRY_ON_STARTUP = True
CELERY_BROKER_POOL_LIMIT = 10  # Limit Redis connections
CELERY_TASK_ACKS_LATE = True  # Acknowledge task after completion
CELERY_WORKER_PREFETCH_MULTIPLIER = 1  # Only fetch one task at a time
CELERY_TASK_REJECT_ON_WORKER_LOST = True


bytekeim
PRO

3 months ago

we found it! the problem is in ur task - ur using datetime.combine() with a naive datetime and then trying to do math with timezone-aware datetimes. that's causing crashes

here's the fixed version:

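import logging
from datetime import datetime
from decimal import Decimal

import pytz
from celery import shared_task
from django.db import connection
from django.utils import timezone

# SystemSettings, DailySummary and AttendanceTap come from ur app's models
# (import path isn't shown in this thread)

logger = logging.getLogger(__name__)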

@shared_task(bind=True, max_retries=3)
def auto_clock_out_check(self):
    try:
        connection.close()  # close stale connections
        
        system_settings = SystemSettings.load()
        
        if not system_settings.enable_auto_clockout:
            return "Auto clock-out is disabled in system settings"
        
        sydney_tz = pytz.timezone('Australia/Sydney')
        now = timezone.now().astimezone(sydney_tz)
        today = now.date()
        current_time = now.time()
        
        employees_in = DailySummary.objects.filter(
            date=today,
            current_status='IN'
        )
        
        clocked_out = 0
        
        for summary in employees_in:
            should_clock_out = False
            
            if current_time >= system_settings.office_end_time:
                should_clock_out = True
            elif summary.first_clock_in:
                # FIX: make datetime timezone-aware
                first_in_dt = sydney_tz.localize(datetime.combine(today, summary.first_clock_in))
                hours_elapsed = (now - first_in_dt).total_seconds() / 3600
                
                if hours_elapsed >= float(system_settings.required_shift_hours):
                    should_clock_out = True
            
            if should_clock_out:
                AttendanceTap.objects.create(
                    employee_id=summary.employee_id,
                    employee_name=summary.employee_name,
                    action='OUT',
                    notes='Auto clock-out'
                )
                
                summary.last_clock_out = current_time
                summary.tap_count += 1
                summary.current_status = 'OUT'
                
                # FIX: make both datetimes timezone-aware
                first_in_dt = sydney_tz.localize(datetime.combine(today, summary.first_clock_in))
                last_out_dt = sydney_tz.localize(datetime.combine(today, summary.last_clock_out))
                time_diff = last_out_dt - first_in_dt
                raw_hours = Decimal(time_diff.total_seconds() / 3600)
                
                summary.raw_hours = raw_hours
                
                if raw_hours > 5:
                    summary.break_deduction = system_settings.break_duration_hours
                else:
                    summary.break_deduction = Decimal('0')
                
                summary.final_hours = summary.raw_hours - summary.break_deduction
                summary.save()
                
                clocked_out += 1
        
        logger.info(f"Auto clock-out completed. Checked: {employees_in.count()}, Clocked out: {clocked_out}")
        return f"Auto clock-out check completed. {clocked_out} employees clocked out."
        
    except Exception as e:
        logger.error(f"Auto clock-out failed: {e}")
        raise self.retry(exc=e, countdown=300)

main issues:

1- ur mixing naive and timezone-aware datetimes which breaks at midnight

2- no error handling so when it crashes the whole worker dies

3- stale db connections

the connection.close() at the start fixes db timeouts and the timezone fixes stop the crashes. try this and lmk if it works
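
A small stdlib-only sketch of the naive vs aware point in the list above, plus one more thing that might be worth checking around midnight. Assumptions: a fixed UTC+10 offset stands in for Australia/Sydney (no DST), the dates and the 17:00 closing time are made up, and the helper names are hypothetical. It is not the task itself, just the datetime behaviour it relies on.

```python
from datetime import datetime, time, timedelta, timezone

# Assumption: fixed UTC+10 stands in for Australia/Sydney (no DST handling).
SYDNEY = timezone(timedelta(hours=10))


def mixing_raises(now):
    """Return True if subtracting a naive datetime from aware `now` raises TypeError."""
    naive_first_in = datetime.combine(now.date(), time(9, 0))  # no tzinfo
    try:
        _ = now - naive_first_in
    except TypeError:
        return True
    return False


def hours_since(first_clock_in, now):
    """Hours elapsed since `first_clock_in` (a time of day) on the local date of `now`."""
    first_in = datetime.combine(now.date(), first_clock_in, tzinfo=now.tzinfo)
    return (now - first_in).total_seconds() / 3600


def past_closing(now, office_end):
    """Time-of-day comparison, like the closing-time check in the task."""
    return now.time() >= office_end


if __name__ == "__main__":
    evening = datetime(2024, 7, 1, 17, 30, tzinfo=SYDNEY)
    after_midnight = datetime(2024, 7, 2, 0, 5, tzinfo=SYDNEY)
    office_end = time(17, 0)  # hypothetical closing time

    print(mixing_raises(evening))
    print(hours_since(time(9, 0), evening))
    print(past_closing(evening, office_end), past_closing(after_midnight, office_end))
    print(evening.date(), after_midnight.date())
```

Subtracting a naive datetime from an aware one raises `TypeError`, while two aware (or two naive) values subtract fine. Separately, a run just after midnight sees a new local date and a time of day that is below the closing time, so a `date=today` filter combined with a `current_time >= office_end_time` check may not match anyone who clocked in the day before. Whether that applies here depends on how the summary rows are dated.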


3 months ago

Hi I tried this. I will have to wait till midnight to see if it crashes or not. But the main thing it should do was to auto logout employees. It's not doing that.



3 months ago

Didn't work.

