bug: incorrect BasicCrawler clean up after CriticalError #2807

Open
barjin opened this issue Jan 10, 2025 · 0 comments
Labels: bug (Something isn't working.), t-tooling (Issues with this label are in the ownership of the tooling team.)


barjin commented Jan 10, 2025

As reported by @danielcrabtree under #2777, Crawlee expects a CriticalError to bring the entire process down, so it doesn't clean up all parts of the crawler correctly when the error is caught instead.

Example:

import log from '@apify/log';
import { CheerioCrawler, CriticalError } from 'crawlee';

log.setLevel(log.LEVELS.DEBUG); // to see the culprit

const crawler = new CheerioCrawler({
    requestHandler: async () => {
        throw new CriticalError('Critical error! The crawler won\'t recover from this!');
    },
    statusMessageLoggingInterval: 1, // to see the culprit faster
});

try {
    await crawler.run(['https://crawlee.dev']);
} catch (e) {
    // We catch the critical error here, so the process doesn't crash.
    // Right after this catch clause, the process should exit with code 0,
    // but it hangs indefinitely instead, because the periodic status-logging `setInterval` is never cleared and keeps the event loop alive.
}
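
For context on why the process hangs: in Node.js, a pending `setInterval` holds a reference on the event loop, and the process does not exit while such a reference exists. The sketch below is independent of Crawlee; `describeTimer` and `statusTimer` are hypothetical names standing in for the crawler's periodic status logging. It calls `unref()` only to make the reference visible through `hasRef()`, not to suggest it as the fix.

```javascript
// Illustration only: `statusTimer` is a hypothetical stand-in for the
// crawler's periodic status logging, not Crawlee's actual code.
function describeTimer() {
    const statusTimer = setInterval(() => {}, 1000);

    // A freshly created interval is "ref'd": it keeps the process alive.
    const refedWhilePending = statusTimer.hasRef();

    // After unref(), the timer no longer prevents the process from exiting.
    statusTimer.unref();
    const refedAfterUnref = statusTimer.hasRef();

    // Clearing the timer removes it entirely.
    clearInterval(statusTimer);

    return { refedWhilePending, refedAfterUnref };
}
```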

Proposed solution:

Revise the BasicCrawler.run method so that all timeouts, intervals, etc. are cleared correctly in every possible execution branch, including the one where a CriticalError is thrown.
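
One way to cover every execution branch is to put the cleanup in a `finally` block. The following is a minimal sketch of that pattern under stated assumptions, not Crawlee's actual implementation: `runWithCleanup`, its `task` argument, and the `intervalMs` / `onTick` options are hypothetical names, and the local `CriticalError` class only stands in for the one exported by `crawlee`.

```javascript
// Hypothetical stand-in for the CriticalError exported by `crawlee`.
class CriticalError extends Error {}

// Hypothetical helper, not part of Crawlee: runs `task` while a periodic
// callback is active and guarantees that the interval is cleared afterwards.
async function runWithCleanup(task, { intervalMs = 1000, onTick = () => {} } = {}) {
    // Analogous to the periodic status logging described in this issue.
    const statusInterval = setInterval(onTick, intervalMs);
    try {
        return await task();
    } finally {
        // Runs when the task resolves, when it throws an ordinary error,
        // and when it throws a CriticalError, so the timer cannot outlive the run.
        clearInterval(statusInterval);
    }
}
```

With this shape, a caller that catches the CriticalError is left with no pending interval, so the process is free to exit once the catch clause finishes.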

barjin added the bug label Jan 10, 2025
github-actions bot added the t-tooling label Jan 10, 2025
barjin changed the title from "bug: incorrect BasicCrawler clean-up after CriticalError" to "bug: incorrect BasicCrawler clean up after CriticalError" Jan 10, 2025