An easy-to-use, non-profit platform for anyone with an interest in Alzheimer's or dementia.
The purpose is to make resources such as articles, journals, practical information, and other kinds of knowledge easily accessible by gathering them all in one place. Alzh.info is not affiliated with any Alzheimer's organizations or patient advocacy groups in any way.
Link to project: https://www.alzh.info/

For more images of the application, go to the Examples section.
Tech used: HTML, CSS, Tailwind, JavaScript, React, NextJS, NodeJS, Express, MongoDB, Cheerio, Puppeteer
The front end consists of a React application made with NextJS and Tailwind. Currently, there are a few static pages, but the most important page is the "News" page, with over 6000 articles from 6 different publishers.
To optimize search, queries to the DB are debounced, so a request is only sent once the user pauses typing instead of on every keystroke.
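A rough sketch of the idea (the hook name and the 300 ms delay are illustrative, not the project's actual values):

```js
import { useEffect, useState } from 'react';

// Return `value`, but only after it has stopped changing for `delay` ms.
function useDebouncedValue(value, delay = 300) {
  const [debounced, setDebounced] = useState(value);

  useEffect(() => {
    const timer = setTimeout(() => setDebounced(value), delay);
    return () => clearTimeout(timer); // reset the timer on every keystroke
  }, [value, delay]);

  return debounced;
}

// Usage: only query the DB when the debounced term settles.
// const debouncedTerm = useDebouncedValue(searchTerm);
// useEffect(() => { fetchArticles(debouncedTerm); }, [debouncedTerm]);
```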
The dashboard is not visible to users yet. Admins use it to quality-assure the scraped articles and to update or delete them.
Made with NodeJS and Express.
The backend has two responsibilities. The first is to act as an API so we can perform CRUD operations on the articles in the DB. The second is scraping articles.
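A minimal sketch of what those CRUD routes could look like in Express (the route paths, the Mongoose `Article` model, and the admin update/delete endpoints are assumptions based on the description here):

```js
const express = require('express');
const Article = require('../models/Article'); // hypothetical model path

const router = express.Router();

// Public: only approved articles are exposed to the front end.
router.get('/articles', async (req, res) => {
  const articles = await Article.find({ status: 'APPROVED' });
  res.json(articles);
});

// Admin (dashboard): update any property of an article.
router.put('/articles/:id', async (req, res) => {
  const updated = await Article.findByIdAndUpdate(req.params.id, req.body, { new: true });
  res.json(updated);
});

// Admin (dashboard): delete an article.
router.delete('/articles/:id', async (req, res) => {
  await Article.findByIdAndDelete(req.params.id);
  res.status(204).end();
});

module.exports = router;
```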
Made with Cheerio and Puppeteer. A script runs a few times a day, fetching the newest articles from 6 different sources. These articles are then displayed in the dashboard, where we can update any property or delete articles. For example, each article has a status field that is initially set to pending, and an admin then either approves or rejects the scraped article. The dashboard lets us enforce a consistent standard for every article: only articles with the status 'APPROVED' are shown to users.
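A minimal sketch of how such a recurring job could be wired up (node-cron, the six-hour schedule, and the scraper module layout are assumptions; the project only specifies that the script runs a few times a day):

```js
const cron = require('node-cron');
const { scrapePublisher } = require('./scrapers'); // hypothetical scraper module

// "A few times a day": here, every six hours.
cron.schedule('0 */6 * * *', async () => {
  const sources = ['publisher-a', 'publisher-b']; // placeholders for the 6 real sources
  for (const source of sources) {
    await scrapePublisher(source); // new articles are saved with status PENDING
  }
});
```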
The data that is currently being scraped is:
- title, subtitle, URL, publisher, publisher URL, publish date, categories, type, status
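Put together, a hedged sketch of the article model could look like this (field names, types, and the exact status values other than 'APPROVED' are assumptions; only the list of properties above comes from the project):

```js
const mongoose = require('mongoose');

const articleSchema = new mongoose.Schema({
  title: { type: String, required: true },
  subtitle: String,
  url: { type: String, required: true },
  publisher: String,
  publisherUrl: String,
  publishDate: Date,
  categories: [String],
  type: String,
  // Scraped articles start as pending and are approved or rejected in the dashboard.
  status: {
    type: String,
    enum: ['PENDING', 'APPROVED', 'REJECTED'],
    default: 'PENDING',
  },
});

module.exports = mongoose.model('Article', articleSchema);
```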
To prevent duplicates, each scraped article is checked against the articles already scraped in the same run and against those already in the database. If the title and URL match an existing article, it is not added to the DB.
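A sketch of that duplicate check, assuming the Mongoose model above and that a duplicate means matching both title and URL:

```js
const Article = require('../models/Article'); // hypothetical model path

async function isDuplicate(scraped, alreadyScrapedThisRun) {
  // Check against articles scraped earlier in the same run...
  const inBatch = alreadyScrapedThisRun.some(
    a => a.title === scraped.title && a.url === scraped.url
  );
  if (inBatch) return true;

  // ...and against articles already stored in the database.
  const inDb = await Article.findOne({ title: scraped.title, url: scraped.url });
  return inDb !== null;
}
```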
One of the original plans was to gather all journals and scientific papers from major publishers.
While researching, I quickly found that journals and scientific papers are hard to navigate: they are generally poorly presented, with bad UI/UX, and it is difficult to keep track of how many different publishers there are. In other words, it is easy to miss crucial research.
The next big mission is therefore to gather as many journals as possible and display them in a way that makes it easy for users to find exactly what they need. Journals will be categorized by publisher, similar to how news articles are handled.
In the future, users will be able to log in and save articles and journals to their own dashboard.
Users will also be able to submit news articles, which will be sent to the admin dashboard for approval.
If you want to see current issues and planned features, or want to make a feature request, you can do it here
I learned a lot about NextJS, and I've come to love it! I really like the developer experience, especially the easy routing and all the features it provides for production. I can only agree with Vercel when they say: "Server rendering React applications has never been easier".
I learned about the difference between token-based and server-side session authentication, and implemented token-based auth using the NextAuth library. Currently, the front end only lets me log in to the dashboard with credentials. Login and signup functionality for users is already built and will be shipped as soon as the user dashboard is ready.
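A hedged sketch of the credentials setup with NextAuth (v4 syntax; the `verifyAdmin` helper and the JWT session strategy are assumptions, only the use of NextAuth with credentials login comes from the text above):

```js
// pages/api/auth/[...nextauth].js
import NextAuth from 'next-auth';
import CredentialsProvider from 'next-auth/providers/credentials';
import { verifyAdmin } from '../../../lib/auth'; // hypothetical helper

export default NextAuth({
  providers: [
    CredentialsProvider({
      name: 'Credentials',
      credentials: {
        email: { label: 'Email', type: 'text' },
        password: { label: 'Password', type: 'password' },
      },
      async authorize(credentials) {
        // Check the credentials against the backend; return a user object or null.
        const user = await verifyAdmin(credentials.email, credentials.password);
        return user ?? null;
      },
    }),
  ],
  session: { strategy: 'jwt' }, // token-based auth, as described above
});
```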
This was my first time building web scraping scripts. I learned about two different libraries: Cheerio and Puppeteer.
Cheerio, "parses markup and provides an API for traversing/manipulating the resulting data structure."
- Since Cheerio is lightweight compared to other options (like headless browsers), I use it for scraping whenever there is no need to interact with the page (see the sketch below).
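A minimal Cheerio sketch for a static listing page (the URL, the selectors, and the use of axios as the HTTP client are placeholders/assumptions):

```js
const axios = require('axios');
const cheerio = require('cheerio');

async function scrapeStaticList(url) {
  const { data: html } = await axios.get(url);
  const $ = cheerio.load(html);

  // Extract title and link from each listed article.
  return $('article.news-item')
    .map((_, el) => ({
      title: $(el).find('h2').text().trim(),
      url: $(el).find('a').attr('href'),
    }))
    .get();
}
```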
Puppeteer, "is a Node library which provides a high-level API to control headless Chrome or Chromium over the DevTools Protocol."
- If interactions with the webpage are needed, I use Puppeteer instead (see the sketch after this list). Some examples:
- If articles are lazy loaded or paginated (infinite scroll or a "load more" button).
- To remove popups.
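A hedged Puppeteer sketch covering both cases from the list above, a "load more" button and a popup (all selectors are placeholders, not the real publisher markup):

```js
const puppeteer = require('puppeteer');

async function scrapeDynamicList(url) {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'networkidle2' });

  // Dismiss a cookie/consent popup if one is shown.
  const popup = await page.$('.cookie-banner button');
  if (popup) await popup.click();

  // Click "load more" a few times to reveal lazy-loaded articles.
  for (let i = 0; i < 3; i++) {
    const loadMore = await page.$('button.load-more');
    if (!loadMore) break;
    await loadMore.click();
    await new Promise(resolve => setTimeout(resolve, 1000)); // wait for new items to render
  }

  const articles = await page.$$eval('article.news-item', nodes =>
    nodes.map(el => ({
      title: el.querySelector('h2')?.textContent.trim(),
      url: el.querySelector('a')?.href,
    }))
  );

  await browser.close();
  return articles;
}
```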
- Working with a lot of data and figuring out how to display it in a way that is easy to work with. The dashboard table is built with the react-table library, with additional features added on top, such as opening a modal for each article where we can edit it (see the sketch after this list).
- Additional search functionality has been added, such as displaying only recommended articles or filtering articles by status.
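A simplified sketch of the table component with react-table (v7-style hooks; column definitions, styling, and the real modal wiring are omitted):

```jsx
import { useTable } from 'react-table';

function ArticlesTable({ columns, data, onRowClick }) {
  const { getTableProps, getTableBodyProps, headerGroups, rows, prepareRow } =
    useTable({ columns, data });

  return (
    <table {...getTableProps()}>
      <thead>
        {headerGroups.map(headerGroup => (
          <tr {...headerGroup.getHeaderGroupProps()}>
            {headerGroup.headers.map(column => (
              <th {...column.getHeaderProps()}>{column.render('Header')}</th>
            ))}
          </tr>
        ))}
      </thead>
      <tbody {...getTableBodyProps()}>
        {rows.map(row => {
          prepareRow(row);
          // Clicking a row opens the edit modal for that article.
          return (
            <tr {...row.getRowProps()} onClick={() => onRowClick(row.original)}>
              {row.cells.map(cell => (
                <td {...cell.getCellProps()}>{cell.render('Cell')}</td>
              ))}
            </tr>
          );
        })}
      </tbody>
    </table>
  );
}
```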
Integrated Nextparticle into the app.
I had to figure out how the vanilla script worked and how to create a component from the constructor function inside it. I also learned how to dynamically import components with Next's dynamic API, so that components interacting with Web APIs (such as window and document) are not rendered on the server, where those APIs are not available.
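The relevant pattern, roughly (the component path is hypothetical):

```jsx
import dynamic from 'next/dynamic';

// Render the particle component on the client only, since it needs `window`/`document`.
const ParticleBackground = dynamic(() => import('../components/ParticleBackground'), {
  ssr: false,
});

export default function Hero() {
  return <ParticleBackground />;
}
```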
Portfolio: https://github.com/kgni/portfolio
RUN AWAY: https://github.com/kgni/run-away
Salon Hair Valby: https://github.com/kgni/salonhair-valby