Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default mail if nothing new since last newsletter #12

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

Aurelien-Gindre
Copy link
Contributor

Problem

The user should receive an email specifying there are no news this time.

Solution

The agent checks for the date of the different piece of news and filter only those published since the last newsletter.
image

How To Test

Go to newsletterFormat.ts and modify the links, interests or minimumDate of the articles.

@adguernier adguernier self-requested a review February 5, 2025 10:50
max: 5,
minimumDate : new Date('2025-02-01'),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit-pick:

Suggested change
minimumDate : new Date('2025-02-01'),
minDate : new Date('2025-02-01'),

};

async function scrapeDate($: cheerio.CheerioAPI) {
// Extract the latest dazte of publication
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Extract the latest dazte of publication
// Extract the latest date of publication

@@ -15,5 +16,77 @@ export const scrape = async ({
const $ = cheerio.load(await response.text());
const dom = new JSDOM($.html());
const article = new Readability(dom.window.document).parse();
return article?.textContent.substring(0, maxContentSize);
const publicationDate = await scrapeDate($);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thought: I'm afraid the way to get the publication date of the article is a bit random. For example, targeting the HTML element to hope to find the publication date with .publication-date, #pub-date is really specific and will only work for a few pages!
How to be sure that there is a publication date in the page and how to be sure to get the right one if there are several dates in the page?
IMO we can find a better way to handle this feature (using the database for instance).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants