
August 2021 - Make the code nicer, urlscan.io integration.

@Rafiot Rafiot released this 30 Aug 13:34
· 1388 commits to main since this release
v1.8.0

New Features:

  • Integration with urlscan.io - Documentation
  • Trigger a capture from the URL - #248
  • Archiving: captures more than 6 months old (configurable) are moved to an archive directory, so they are no longer listed on the index but can still be accessed by UUID (permanent URLs keep working)
  • One index file per directory, covering every capture (archived or not). This greatly reduces the I/O when initializing the known captures in redis.
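
To illustrate the archiving rule described above, here is a minimal sketch, not Lookyloo's actual implementation. The function name `archive_old_captures`, the 180-day default, and the timestamp-based directory naming (`%Y-%m-%dT%H-%M-%S`) are all assumptions made for the example.

```python
from datetime import datetime, timedelta
from pathlib import Path
from typing import List
import shutil

# Hypothetical naming scheme: each capture directory is named after its timestamp.
NAME_FORMAT = "%Y-%m-%dT%H-%M-%S"


def archive_old_captures(captures_dir: Path, archive_dir: Path,
                         now: datetime, max_age_days: int = 180) -> List[Path]:
    """Move capture directories older than max_age_days into archive_dir.

    Returns the new locations of the directories that were moved.
    """
    cutoff = now - timedelta(days=max_age_days)
    moved = []
    for capture in sorted(captures_dir.iterdir()):
        if not capture.is_dir():
            continue
        try:
            captured_at = datetime.strptime(capture.name, NAME_FORMAT)
        except ValueError:
            continue  # name is not a timestamp: not a capture, leave it alone
        if captured_at < cutoff:
            archive_dir.mkdir(parents=True, exist_ok=True)
            destination = archive_dir / capture.name
            shutil.move(str(capture), str(destination))
            moved.append(destination)
    return moved
```

Because only the directory moves and its name is unchanged, a lookup that maps a UUID to a path can be updated to the new location without changing the UUID itself, which is what keeps permanent URLs valid.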

Fixes:

  • Missing 3rd party web dependencies in docker (thanks to @FafnerKeyZee)

Changes - this release implements a lot of back-end changes:

  • Captures are now stored by year and month (instead of in a single directory) to avoid having too many entries in the same directory (ext4 dislikes it). All new captures follow this layout, but you need to run tools/change_captures_dir.py to move the existing ones to the new format (only useful if you feel restarting the app takes too much time)
  • Move all the capture-related code from Lookyloo to AsyncCapture
  • Move all the services management code to abstractmanager
  • Use redis pooling to manage connections to the database in Lookyloo and Indexing
  • New process to trigger occasional actions, currently: generate the daily user-agent file if Lookyloo is using the UAs of its own users.
  • Reinitialize the list of capture UUIDs when starting the app instead of in the website itself
  • Improvements in process handling (TL;DR: don't stop redis until all the async capture processes are down)
  • Move some methods from Lookyloo to the helpers
  • Simplify code in Lookyloo to make it more readable, remove dead code.
  • Bump dependencies, add hiredis to speed up redis interactions
  • Return proper HTTP error codes (mostly 4XX), when appropriate
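
The year-and-month storage layout mentioned in the changes above can be pictured with a short sketch. This is an illustration under assumptions, not the code in tools/change_captures_dir.py: the helper name `capture_dir_for`, the `scraped` root, and the use of an identifier as the leaf directory name are hypothetical.

```python
from datetime import datetime
from pathlib import Path


def capture_dir_for(root: Path, captured_at: datetime, name: str) -> Path:
    """Return root/YYYY/MM/<name> for a capture taken at captured_at.

    Splitting by year and month keeps the number of entries per
    directory small, even when the total number of captures is large.
    """
    return root / f"{captured_at.year:04d}" / f"{captured_at.month:02d}" / name


# Usage: a capture taken on 30 August 2021 lands under 2021/08.
path = capture_dir_for(Path("scraped"), datetime(2021, 8, 30, 13, 34), "example-capture")
print(path.as_posix())  # scraped/2021/08/example-capture
```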