Improve RSS Feed aggregator #361

Open
knyghty opened this issue Nov 12, 2023 · 4 comments

@knyghty
Member

knyghty commented Nov 12, 2023

We should use a simple SQLite database to store the feed state.

We should have a command for adding a feed to a channel. Something like:

/add_feed <channel> <feed_name> <feed_url>

Not sure if the converter is needed. Maybe optional list of choices? But only if we need it for the currently planned feeds.

When you run the command, it grabs the latest entry of that feed and adds it to the database. We store the current datetime so we know to only grab posts later than this in the future.

Model something like:

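from dataclasses import dataclass
from datetime import datetime

@dataclass
class Feed:
    id: int
    name: str
    channel_id: int  # channel the feed posts into
    url: str
    time_of_latest_post: datetime  # cutoff: only newer posts get sent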

I could be wrong but I don't think there's a reason to store any feed items anywhere.
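As a rough sketch of how the model above could map onto the built-in sqlite3 module: the table name, column types, and the add_feed helper below are assumptions for illustration, not something decided in this issue. The timestamp is stored as an ISO 8601 string in UTC.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema mirroring the Feed model; names and types are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS feed (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    channel_id INTEGER NOT NULL,
    url TEXT NOT NULL,
    time_of_latest_post TEXT NOT NULL  -- ISO 8601 timestamp, UTC
)
"""


def add_feed(conn, name, channel_id, url, now=None):
    """Insert a feed row, using the current time as the initial cutoff."""
    now = now or datetime.now(timezone.utc)
    cur = conn.execute(
        "INSERT INTO feed (name, channel_id, url, time_of_latest_post) "
        "VALUES (?, ?, ?, ?)",
        (name, channel_id, url, now.isoformat()),
    )
    conn.commit()
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
feed_id = add_feed(conn, "example", 1234, "https://example.com/feed.xml")
```

Only one row per feed is kept, which matches the idea of not storing individual feed items.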

Then we poll for posts later than time_of_latest_post, stick them in the channel, and update time_of_latest_post.
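The polling step could look roughly like this. Entry is a hypothetical stand-in for whatever a feed parser returns, and select_new_entries is an assumed helper name; the sketch only shows the cutoff comparison and how time_of_latest_post would advance.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Entry:  # hypothetical stand-in for a parsed feed item
    title: str
    published: datetime


def select_new_entries(entries, time_of_latest_post):
    """Return entries newer than the cutoff (oldest first) and the new cutoff."""
    new = sorted(
        (e for e in entries if e.published > time_of_latest_post),
        key=lambda e: e.published,
    )
    # If nothing is new, the cutoff stays where it was.
    new_cutoff = new[-1].published if new else time_of_latest_post
    return new, new_cutoff


cutoff = datetime(2023, 11, 12, tzinfo=timezone.utc)
entries = [
    Entry("old", datetime(2023, 11, 1, tzinfo=timezone.utc)),
    Entry("newer", datetime(2023, 11, 14, tzinfo=timezone.utc)),
    Entry("new", datetime(2023, 11, 13, tzinfo=timezone.utc)),
]
to_post, cutoff = select_new_entries(entries, cutoff)
```

Sorting oldest first means the posts land in the channel in publication order.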

It's worth noting that the publication date is optional in RSS. Not sure what we can do about this. Maybe we can do something similar to what we do now and check whether the posts are already in the channel, but I don't see a good way of not splurging every ancient post into the channel. Maybe we should just ignore stuff without a date.

@jakdevmail

I'd like to avoid an additional dependency for the database. I have an idea for a very minimalistic migration system using Python's built-in sqlite3 module.
I don't think we'll be making any major schema changes or anything like that, but it makes your deployment a lot easier.

Did you have something else in mind, or can I start some work in a branch and we'll see where it goes?

I'm not married to the idea, so I'm open to suggestions :)

On the matter of possibly missing post dates:
I'd just consider such RSS feeds to be badly behaved (even though they are following the RSS standard).
We'll cross that bridge when we come to it. For now, let's ignore (but log!) those feed items.
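A minimal sketch of "ignore but log", using only the standard library. The function name dated_items, the logger name, and the sample feed are made up for illustration; malformed dates are not handled here.

```python
import logging
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

logger = logging.getLogger("rss")  # logger name is an assumption


def dated_items(rss_xml):
    """Yield (title, published) for items with a pubDate; log and skip the rest."""
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        title = item.findtext("title", default="(untitled)")
        pub_date = item.findtext("pubDate")
        if not pub_date:
            # pubDate is optional in RSS, so this is legal but unusable for us.
            logger.warning("Ignoring feed item without pubDate: %s", title)
            continue
        # RSS dates use the RFC 822 format, which email.utils can parse.
        yield title, parsedate_to_datetime(pub_date)


SAMPLE = """<rss version="2.0"><channel>
<item><title>Dated</title><pubDate>Sun, 12 Nov 2023 10:00:00 GMT</pubDate></item>
<item><title>Undated</title></item>
</channel></rss>"""

items = list(dated_items(SAMPLE))
```

Here only the "Dated" item is returned, and the "Undated" one produces a warning in the log.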

@knyghty
Member Author

knyghty commented Nov 14, 2023

Personally I'm happy to use what's built into Python, and I'm fine with raw SQL. My main concern is indeed the migrations. As we only have one deploy and that's unlikely to change, something minimal could work, but I'd be interested to know what it is.

@jakdevmail

In practice, we keep one directory (let's call it "migrations") which stores Python or SQL files.
Those files are the actual migrations, applied in order from the last applied migration upwards.

The files have a prefix which indicates the order.

Note: Anything like this isn't going to have the near-magical Django migration experience; it's pretty bare-bones. Anything else would be overkill, I think.

Now, we only have one problem: how does the instance know which migration to run next (or which migration it is currently at)?

SQLite comes to the rescue:
The pragma user_version (https://www.sqlite.org/pragma.html#pragma_user_version) is a single integer which any application can use as it wants. SQLite only stores it and uses it for nothing else.
It's practically begging to be used for stuff like this. We can just store our current migration index in there.

I don't mind writing SQL, and we can just run the migrations on startup every time, making the deployment near effortless.
Rollbacks can be implemented in practically the same way, although I would push them off to another time. I don't think we'll be breaking stuff that fast :)
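To make the idea concrete, here is a rough sketch of such a runner. It assumes SQL-only migration files with a numeric prefix (for example "0001_create_feed.sql"); the function name run_migrations and the file names are hypothetical, and Python migration files and rollbacks are left out.

```python
import sqlite3
import tempfile
from pathlib import Path


def migration_index(path):
    """Extract the numeric prefix from a name like '0001_create_feed.sql'."""
    return int(path.name.split("_", 1)[0])


def run_migrations(conn, migrations_dir):
    """Apply pending .sql migrations in prefix order; track progress in user_version."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for path in sorted(Path(migrations_dir).glob("*.sql"), key=migration_index):
        index = migration_index(path)
        if index <= current:
            continue  # already applied
        conn.executescript(path.read_text())
        # PRAGMA takes no bound parameters; index is an int, so formatting is safe.
        conn.execute(f"PRAGMA user_version = {index}")
        conn.commit()
        current = index
    return current


with tempfile.TemporaryDirectory() as d:
    Path(d, "0001_create_feed.sql").write_text(
        "CREATE TABLE feed (id INTEGER PRIMARY KEY, url TEXT NOT NULL);"
    )
    Path(d, "0002_add_name.sql").write_text("ALTER TABLE feed ADD COLUMN name TEXT;")
    conn = sqlite3.connect(":memory:")
    version = run_migrations(conn, d)        # applies both migrations
    version_again = run_migrations(conn, d)  # nothing left to apply
```

Running this on every startup is harmless because already-applied files are skipped. One caveat: the script and the version bump are not wrapped in a single transaction here, so a migration that fails halfway would need manual cleanup.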

@knyghty
Member Author

knyghty commented Nov 15, 2023

Seems reasonable. I would hold off for now until I've done some more refactoring though.


2 participants