Following the procedure described here, we performed these steps:
- Download Node.js and install it: https://nodejs.org/en/download/
- Install Puppeteer by running this in your shell: npm i puppeteer
- Download grab_url.js, grab_all.sh, and urls.txt and put them all in the same directory (or copy/paste the code further down this blog post)
- Open a shell and change to the directory where you put the files from the previous step.
- Run: bash grab_all.sh
Make sure you prepare a local Python environment with Poetry before running this section. The analysis consists of two main activities:
- Resizing the pictures so that they have a uniform size across the whole sample
- Investigating how webpages change over time by computing similarity and dissimilarity measures as described here
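The two activities above can be illustrated with a minimal, dependency-free sketch. It is not the code used in this post: images are represented as plain lists of grayscale rows, the resize uses nearest-neighbour sampling, and the mean absolute pixel difference stands in for whichever similarity measures the linked description uses. All function names (`resize_nearest`, `mean_abs_difference`, `similarity`, `compare_over_time`) are hypothetical. In practice you would likely load and resize the screenshots with an image library installed through Poetry.

```python
from typing import List

# Assumption: an image is a list of rows of grayscale values in 0..255.
Image = List[List[int]]


def resize_nearest(img: Image, width: int, height: int) -> Image:
    """Resize a grayscale image to width x height (nearest-neighbour)."""
    src_h, src_w = len(img), len(img[0])
    return [
        [img[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]


def mean_abs_difference(a: Image, b: Image) -> float:
    """Dissimilarity in [0, 1]: mean absolute pixel difference over 255."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        raise ValueError("images must have the same size")
    total = sum(
        abs(pa - pb)
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )
    return total / (255 * len(a) * len(a[0]))


def similarity(a: Image, b: Image) -> float:
    """Similarity in [0, 1]: 1 means identical pixels."""
    return 1.0 - mean_abs_difference(a, b)


def compare_over_time(snapshots: List[Image], width: int, height: int) -> List[float]:
    """Resize every snapshot, then compare each one with the next."""
    resized = [resize_nearest(s, width, height) for s in snapshots]
    return [similarity(p, q) for p, q in zip(resized, resized[1:])]


if __name__ == "__main__":
    # Three toy "screenshots" of different sizes, ordered by capture time.
    black_small = [[0, 0], [0, 0]]
    black_large = [[0] * 4 for _ in range(4)]
    white_small = [[255, 255], [255, 255]]
    print(compare_over_time([black_small, black_large, white_small], 2, 2))
```

Resizing first is what makes the pixel-wise comparison meaningful: without a common size, two screenshots of the same page could not be compared position by position.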