A Cool Bug
Posted on: September 10, 2013
I got a request from users of the weekly reporting email: it was missing numbers. It turned out that a few days' logs were missing from the system. I checked the cron jobs and the cron running system, and the log sync looked like it should be fine. I also ran the log sync scripts manually, and they worked fine too.
So why hadn't the sync happened for the past few days?
After talking to other people, we realized what was going on. Our cron running system runs all the executable scripts in a directory, with the frequency set per directory. So a script that runs before the one I needed could be having problems and holding it up. We checked the first script using the cron log system. Nothing. Then the second script. It turned out it had been hanging for all of those days: all the cron processes were still there in the system, and they couldn't make any progress at all. This was because the script needs to access some production machines, but the production setup (DNS service etc.) had been changed in the past few days.
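To make the failure mode concrete, here is a minimal Python sketch of a runner that goes through a directory of executable scripts one by one. It is not our actual cron running system; `run_directory` and the 300 second default are made up for illustration. The per-script timeout is an assumption about how one hung script could be kept from blocking the ones after it, not something the real system did.

```python
import os
import subprocess
from pathlib import Path


def run_directory(directory, timeout_seconds=300):
    """Run each executable file in `directory` in name order.

    Hypothetical helper: returns a dict mapping script name to a short
    status string. The timeout value is an illustrative assumption.
    """
    results = {}
    for script in sorted(Path(directory).iterdir()):
        # Skip anything that is not an executable regular file.
        if not (script.is_file() and os.access(script, os.X_OK)):
            continue
        try:
            completed = subprocess.run([str(script)], timeout=timeout_seconds)
            results[script.name] = f"exit {completed.returncode}"
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child when the timeout expires,
            # so the loop moves on to the next script instead of hanging.
            results[script.name] = "timed out"
        except OSError as error:
            # For example, a script with a missing or broken interpreter line.
            results[script.name] = f"failed to start: {error}"
    return results
```

Without the `timeout` argument, `subprocess.run` waits for as long as the script takes, which matches what we saw: one script stuck on a machine it could no longer reach, and nothing after it getting a turn.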
The solution: instead of saving files to a certain machine in production, we now upload them to S3. That way the script will not be affected by future production upgrades.
Quite a cool experience once we figured it out, after starting with so much confusion 🙂