Wrap Mozilla's readability package #10
Comments
Hi! Thanks for the topic suggestion. To help us prepare for the hackathon event, it would be great to prepare a quick 30-60 second overview of the topic to introduce it to the group and seek interested collaborators. You can use the prompts below to help with this:
What is the headline idea?
What is the (realistic) outcome being aimed for during the event?
What types of contributions would be welcomed (i.e. specific skills, tasks)?
Hi there, I'm sorry for joining the conversation a bit late. I want to help out with this project, and I see that Dean has posted several prompts to be filled in. I'll try my best to fill the gaps, and perhaps @jimjam-slam can add more detail if any changes are needed.
What is the headline idea?
Writing an R package to act as a wrapper for the
What is the (realistic) outcome being aimed for during the event?
A development version of an R package which satisfies the following criteria,
What types of contributions would be welcomed (i.e. specific skills, tasks)?
It might be helpful to read this as a starting point: https://book.javascript-for-r.com/widgets-intro-intro
Hey @janithwanni, welcome, and thanks for filling the template in! I think it's a great summary of what this project would add and involve 🥳 Although you could potentially use
Absolutely agree that human-readable plain text done more easily for the user than
One potential 'stretch goal' (or second-version goal) could be to also deliver stripped-back HTML output, as the readability library returns both a plain text version of the article and an HTML version. There are also configuration options that could be supported down the road.
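For anyone new to the library, here is a rough sketch of how the two outputs mentioned above (plain text and stripped-back HTML) could be pulled out of a parsed article. It assumes the `Readability` class from the `@mozilla/readability` npm package and a DOM document (for example, one built with `jsdom`); the helper name `extractArticle` and the shape of its return value are made up for illustration and are not part of readability or of any R package discussed here.

```javascript
// Sketch only: `extractArticle` is a hypothetical helper, not an existing API.
// `ReadabilityCtor` is assumed to be the `Readability` class exported by
// @mozilla/readability; `doc` is a DOM Document (e.g. from jsdom in Node).
function extractArticle(ReadabilityCtor, doc) {
  // parse() returns null when no article content can be identified.
  const article = new ReadabilityCtor(doc).parse();
  if (article === null) {
    return null;
  }
  return {
    title: article.title,
    text: article.textContent, // plain-text version of the article
    html: article.content,     // simplified HTML version of the article
  };
}

// Possible usage in Node, assuming both packages are installed:
//   const { Readability } = require("@mozilla/readability");
//   const { JSDOM } = require("jsdom");
//   const dom = new JSDOM(htmlString, { url: pageUrl });
//   const result = extractArticle(Readability, dom.window.document);
```

Passing the constructor in as an argument keeps the sketch independent of how the library is loaded, which may matter if the R wrapper ends up running it inside an embedded JavaScript engine rather than Node.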
Thank you so much for the links, @jimjam-slam! I will read these before the hackathon 💪 I had only worked with HTML widget-style stuff earlier, so I decided to put the only resource I knew, haha 😄. The stretch goals sound interesting and valuable to me as well!
Our repo: https://github.com/jimjam-slam/readabilityr
The person who beat us to the punch 😅: https://github.com/nanxstats/r-readability-parser
Mozilla's readability tool (and library) powers Firefox's Reader View, but it can also be used to extract article text from web pages.
I've had a mind to wrap it for R for a while: we've done stories before based on text scraping where the sources are varied enough that trying to automate it simply with rvest is a recipe for frustration. I think an R wrapper for readability would be a good complement to existing tools (not to mention it would discourage people from simply aiming an LLM at a webpage, which is often overkill).