Utility for downloading fanfiction in bulk from the Archive of Our Own

nianeyna/ao3downloader

What is this?

This is a program intended to help you download fanfiction from the Archive of Our Own in bulk. This program is primarily intended to work with links to the Archive of Our Own itself, but has a secondary function of downloading any Pinboard bookmarks that link to the Archive of Our Own. You can ignore the Pinboard functionality if you don't know what Pinboard is or don't use Pinboard.

PSA: The Troubleshooting section of this readme exists and I swear to you it's not bullshit. If you encounter problems with the script DO THE TROUBLESHOOTING STEPS before giving up and/or sending a bug report. Thank you! 🙏

Table of Contents

  • Announcements: List of changes that may be of note for returning users (not a complete changelog).
  • Instructions: Complete instructions for downloading and starting ao3downloader on Windows and Mac (running ao3downloader on Linux is left as an exercise for the reader). I have tried to make this as easy to follow as possible, even for those who have little experience with computers. If any of it is confusing, or you have a suggestion to improve the instructions, please contact me.
  • Menu Options: Explanation of the options you will see when you start ao3downloader and what they do. Note that most of these options will in turn present you with a series of prompts. These should largely be self-explanatory; however, if you are confused by any of the prompts, your question may be answered in the notes.
  • Notes: Explanation of some of ao3downloader's features and quirks that may not be immediately obvious. I recommend reading this.
  • Known Issues: List of bugs that I know about but haven't yet been able to fix. If you encounter strange behavior, there may be a workaround here.
  • Troubleshooting: If you encounter a problem running the script, please read this section carefully and do all of the steps in order to the best of your ability before sending a bug report.
  • Contact: How to get in contact with me. Don't be shy!

Announcements

Sometimes python version updates break the script, so be careful which version of python you use. See Troubleshooting if you don't know how to check your python version. The most recent version of python confirmed to work with ao3downloader is: Python 3.11.4
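
If you'd rather check the version from inside python, here is a minimal sketch (not part of ao3downloader itself). The function name `version_is_supported` is made up for illustration, and the bounds are simply the numbers given in this readme: nothing lower than 3.9.0, nothing higher than 3.11.4.

```python
import sys

# Bounds taken from this readme: nothing below 3.9.0, nothing above the
# most recent version confirmed to work (3.11.4 at the time of writing).
MIN_VERSION = (3, 9, 0)
MAX_VERSION = (3, 11, 4)


def version_is_supported(version=None):
    """Return True if a (major, minor, micro) version is within the bounds.

    Hypothetical helper for illustration; defaults to the running interpreter.
    """
    if version is None:
        version = sys.version_info[:3]
    return MIN_VERSION <= tuple(version) <= MAX_VERSION


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if version_is_supported():
        print(f"Python {current} is within the range this readme lists.")
    else:
        print(f"Python {current} is outside the range this readme lists.")
```

Save it as a .py file and run it with the same python command you use to start the script, so that it reports on the version that is actually active.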

Filename customization is here! You can change the filename pattern by editing the file settings.ini (instructions are in the file). If you don't wish to customize filenames, you can just not change anything and the program will continue to work the same way.

As of March 8, 2022 I have changed how file names are generated to allow for the inclusion of non-alphanumeric characters (cnovel fans rejoice). If you have a Process going on which relies on file names for the same fic being the same, please take note of this if/when you download the new version of the code.

As of May 14, 2022 I have reduced the maximum length of file and folder names generated by the script from 100 characters to 50 characters. This is to reduce the incidence of download failures caused by exceeding the maximum Windows file path length. Once again, note that this may cause the same fic to be saved under a different name than when it was downloaded previously.

As of September 16, 2022 I have very regretfully removed the series subfolders option, due to the fact that it was causing a huge amount of unnecessary repeated downloads even for people who weren't using it.

As of January 17, 2023 I have changed how file names are generated (again). All file names will now be prefixed with the work id. This is to fix the problem where fics with the same title and author would sometimes overwrite each other in the downloads folder. I have also removed the fandom from the file name, because it was usually getting cut off by the path length restriction anyway.
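
To see why the work id prefix and the 50 character limit matter together, here is a rough sketch of the idea. This is not the script's actual code: `build_file_name`, the "id title - author" layout, and the choice to apply the limit before adding the extension are all assumptions made for illustration (the real pattern is whatever settings.ini says).

```python
MAX_NAME_LENGTH = 50  # limit stated in the May 14, 2022 announcement


def build_file_name(work_id, title, author, extension):
    """Build an illustrative file name prefixed with the work id.

    Hypothetical helper; the layout of the name is an assumption.
    """
    base = f"{work_id} {title} - {author}"
    # Assumption: the limit applies to the name without its extension.
    truncated = base[:MAX_NAME_LENGTH].rstrip()
    return f"{truncated}.{extension}"


# Two fics with the same title and author no longer collide,
# because the work id comes first and survives truncation.
first = build_file_name(1001, "Untitled", "anonymous", "epub")
second = build_file_name(1002, "Untitled", "anonymous", "epub")
print(first)
print(second)
```

Because the id is at the front, it is the title and author that get cut off when a name is too long, never the part that makes the name unique.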

Instructions

  1. install python from this link. do not install the latest version of python (it may be newer than the version listed in Announcements as confirmed to work), or a version of python lower than 3.9.0.
    • if on Windows, make sure you get the "installer" and not the "embeddable package" (if you are not sure which of the installers you need, get the 64-bit one)
    • during installation, choose "Customize installation" when prompted, and check the "Add Python to environment variables" checkbox when it appears. (this option was previously called "add to PATH"). everything else can be left as default.
  2. download the repository as a zip file. the "repository" means the folder containing the code.
    • if you are reading this on github, you can download the repository by clicking on the "Code" button in github and selecting "Download ZIP"
    • if you are reading this on my website, you can download the repository by clicking the button at the top of the page that says "Click to Download"
  3. unzip the zip file you just downloaded. this will create a folder. open it. if you see a file called "ao3downloader.py" then you're in the right place.
    • to unzip the file, you must right-click on it and select the option that says something like "Extract All" - don't just double-click it! this may appear to open the folder, but it's really just a preview that won't work correctly.
  4. run the script using the instructions for your operating system:
    • windows: double-click on "ao3downloader.cmd" (if you can't see the file extensions: this is the file named "ao3downloader" which does not have a python logo)
      • note: don't use the search bar to find the right file - the script will not work properly when run from the search results pane
    • mac:
      • open a terminal window pointed to the folder containing "ao3downloader.py".
        • You can do this by right-clicking on the folder, going to Services at the bottom of the menu, and clicking "New Terminal at Folder". Alternatively, you can type "cd " and drag the folder to the terminal to copy the folder path.
      • enter the following commands one by one:
      python3 -m venv venv
      source venv/bin/activate
      python3 -m pip install --upgrade pip
      pip install -r requirements.txt
      python3 ao3downloader.py
      • after this initial setup, when you want to run the program you only need to enter:
      source venv/bin/activate
      python3 ao3downloader.py
      • note that if you delete the "venv" folder for any reason you will need to do the initial setup again.
    • other platforms: ao3downloader should work on any platform that supports python; however, you will need to do your own research into how to run python programs on your system.

Menu Options Explanation

  • 'download from ao3 link' - this works for most links to ao3. for example, you can use this to download a single work, a series, or any ao3 page that contains links to works or series (such as your bookmarks or an author's works). the program will download multiple pages automatically without the need to enter the next page link manually.
  • 'get all work links from an ao3 listing (saves links only)' - instead of downloading works, this will simply get a list of all the work links on the page you specify (as well as subsequent pages) and save them in a .txt file inside the downloads folder (one link on each line). this is useful if you prefer to download fics through FanFicFare or some other method, rather than using the ao3 download buttons. this option is much, much faster than a full download - usually only a few seconds per page. when using this option you can also choose to download a csv (spreadsheet) file containing detailed work metadata, instead of a plain text file containing links only.
  • 'download links from file' - allows downloading links from a text file with one work or series link on each line. good if you have already harvested the links you want to download via some other method.
  • 'download latest version of incomplete fics' - you can use this to check a folder on your computer (and any subfolders) for files downloaded from ao3 that are incomplete works. for each incomplete fic found, the program will check ao3 to see if there are any new chapters, and if so, will download the new version to the downloads folder.
  • 'download missing fics from series' - checks for files downloaded from ao3 that are part of a series, and for each series found, checks the series page on ao3 and downloads any fics in the series that are not already in your library.
  • 're-download fics saved in one format in a different format' - checks for all files downloaded from ao3 and redownloads every fic it finds (if possible - failed downloads due to deletion or other reasons will be logged). good if you change your mind about what format you want your library to be in. (file type choices for this option are not saved to settings.)
  • 'download marked for later list and mark all as read (requires login)' - for those who like to use their marked for later as a download queue, this option takes the headache out of clearing the list after a download. note that this option does not generate 'starting page x' notifications in the console, but will still download all pages.
  • 'download bookmarks from pinboard' - download ao3 bookmarks from pinboard. ignore this if you don't use pinboard. to get the api token go to settings -> password on the pinboard website.
  • 'convert logfile into interactable html' - all downloads from ao3 (and some other actions) are logged in a file called log.jsonl in the 'logs' folder (if this folder does not exist it means no logs have been generated yet), along with information such as whether or not the download was successful, details about errors encountered, and so on. this option converts log.jsonl into a much more human-readable, searchable and sortable (click on the column headers to sort) html file that can be opened in any browser. the file is called 'logvisualization.html' (filename will also include some numbers indicating the timestamps of the first and last log messages it contains) and is saved in the same place as log.jsonl. If your log file is particularly large, it may get split up across several html files. Note that the searching and sorting functionality (searchbox, filters, etc) may take some time to load in after the page opens. (If it never loads, you can try refreshing the page in your browser.)
  • 'configure ignore list (list of links to never try to download)' - creates (if it does not already exist) a file in the main script folder which allows you to specify links to works or series that you never want the script to attempt to download. particularly good if the work or series update option is perpetually grabbing junk you don't want. this option also gives you a chance to auto-add links to the ignore list if they were previously tagged in the log file as failed downloads due to deletion.
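
If you build a link file by hand for 'download links from file', it can help to sanity-check it first. The sketch below is not how ao3downloader reads the file; `parse_link_file` is a made-up helper that only assumes the file has one link per line, as described above, and that work and series links contain /works/ or /series/ followed by a numeric id.

```python
import re

# AO3 work and series links contain a numeric id,
# e.g. .../works/123 or .../series/456.
LINK_PATTERN = re.compile(r"archiveofourown\.org/(works|series)/(\d+)")


def parse_link_file(text):
    """Return (kind, id) pairs for each recognizable link, one per line.

    Hypothetical helper; blank lines and unrecognized lines are skipped.
    """
    found = []
    for line in text.splitlines():
        match = LINK_PATTERN.search(line.strip())
        if match:
            found.append((match.group(1), int(match.group(2))))
    return found


sample = (
    "https://archiveofourown.org/works/123\n"
    "\n"
    "https://archiveofourown.org/series/456\n"
    "not a link\n"
)
print(parse_link_file(sample))
```

Any line that does not show up in the result is worth a second look before you start a long download.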

Notes

  • IMPORTANT: some of your input choices are saved in a file called settings.json (in the same folder as ao3downloader.py). In some cases you will not be able to change these choices unless you clear your settings by deleting settings.json (or editing it, if you are comfortable with json). In addition, please note that saved settings include passwords and keys and are saved in plain text. Use appropriate caution with this file.
  • You may change certain behaviors of the script by editing the file settings.ini. Current configurable options are:
    • Whether the script should save your password - if set to 'false', you will need to re-enter your password every time you log in via the script.
    • How many seconds to pause between requests to Ao3 - the default is 0 seconds, which means that pauses will only be initiated when Ao3 requests them. Normally you should not need to adjust this, but it can be useful if you are running into odd behavior related to the rate limit.
    • The file naming pattern to use. For most people ao3downloader's default file names should work fine, but if you don't like them, you can change that here.
  • The purpose of entering your ao3 login information is to download archive-locked works or anything else that is not visible when you are not logged in. If you don't care about that, there is no need to enter your login information.
  • Ao3 limits the number of requests a single user can make to the site in a given time period. When this limit is reached, the script will pause for the amount of time (usually a few minutes) that Ao3 requests. When this happens, the start time, end time, and length of the pause in seconds will be printed to the console. If you try to access Ao3 from your browser during this period, you will see a "Retry later" message. Don't be alarmed by this - it's normal, and you aren't in trouble. Simply wait for the specified amount of time and then refresh the page. Other than during these required pauses, you can use Ao3 as normal while the script is running.
  • If you choose to 'get works from all encountered series links' then if the script encounters a work that is part of a series, it will also download the entire series that the work is a part of. This can dramatically extend the amount of time the script takes to run. If you don't want this, choose 'n' when you get this prompt. (Series that you have bookmarked directly will always be fully downloaded, regardless of what you choose here.)
  • If you choose to 'download embedded images' the script will look for image links on all works it downloads and attempt to save those images to an 'images' subfolder. Images will be titled with the name of the fic + 'imgxxx' to distinguish them.
    • Note that this feature does not encode any association between the downloaded images and the fic file aside from the file name.
    • Most file formats will include embedded image files anyway, regardless of whether you choose this option. I have confirmed this for PDF, EPUB, MOBI, and AZW3 file formats. (If you saw me contradict this in an earlier version of this readme... no you didn't)
    • Should an image download fail, the details of the failure will be logged in the log file with the message 'Problem getting image' along with the work link and the image link. It's a good idea to check the log file for these messages, since you may still be able to download the image manually or track it down some other way.
  • If you need to stop a download in the middle, you can just close the window. When you restart the script:
    • If you are using the option 'download from ao3 link', you will be given an option to restart the download from the page you left off on. The program will attempt to avoid re-downloading works that are already in the downloads folder.
    • If you are using the option 'download bookmarks from pinboard' or 're-download fics saved in one format in a different format', the list of fics to download will be retrieved as normal but will then be filtered to remove work links that meet the following conditions:
      • A record of a download attempt for that link is present in the log file AND
        • There is a fic with the same title already in the downloads folder OR
        • The download was marked as unsuccessful
    • If you are using the option 'download latest version of incomplete fics' or 'download missing fics from series', just make sure to add any fics you don't want to download again to your library (that is, the folder you entered when prompted 'input path to folder containing files you want to check for updates') and clean up any old versions before re-starting the download.
    • Most methods of avoiding repeat downloads rely on a file called log.jsonl which is generated by the script. Make sure not to move, delete, or modify log.jsonl if you want these features to work. (Using the option to generate the log visualization file is fine.)
  • When checking for incomplete fics, the code makes certain assumptions about how fic files are formatted. I have tried to make this logic as flexible as possible, but there is still some possibility that not all incomplete fics will be properly identified by the updater, especially if the files are old (since ao3 may have made changes to how they format fics for download over time) or have been edited.
  • Custom work skins are not preserved in downloaded files. I don't currently have a way around that; however, when a work is downloaded, the log entry for the download will contain a column (called 'workskin') indicating whether the work had a custom skin or not, so you can at least know which fics are in danger of looking garbled.
  • If you need to keep a different version of python on your system for some other purpose, please note that these instructions may not work as expected if you have multiple versions of python installed. However, I can point you toward the following resources:
    • Windows: the py launcher may be helpful to you
    • Mac and Linux: pyenv may be helpful to you
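
The filtering rule described above for restarted pinboard and re-download runs (a log record exists for the link AND either a fic with the same title is already in the downloads folder OR the download was marked unsuccessful) can be sketched like this. This is an illustration only: the key names "link", "title" and "success" are assumptions, not the real layout of log.jsonl, and `load_log` and `should_skip` are made-up helpers.

```python
import json


def load_log(lines):
    """Parse JSON Lines text: one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def should_skip(link, log_entries, titles_in_folder):
    """Return True if the link would be filtered out of a restarted run.

    Assumed (hypothetical) log keys: "link", "title", "success".
    """
    for entry in log_entries:
        if entry.get("link") != link:
            continue  # no record of an attempt for this link in this entry
        same_title_present = entry.get("title") in titles_in_folder
        marked_failed = entry.get("success") is False
        if same_title_present or marked_failed:
            return True
    return False


log = load_log([
    '{"link": "a", "title": "First", "success": true}',
    '{"link": "b", "title": "Second", "success": false}',
])
print(should_skip("a", log, {"First"}))  # logged, and the title is in the folder
print(should_skip("c", log, {"First"}))  # never attempted, so it is kept
```

Note that a link with no log record is never skipped, which is why moving or deleting log.jsonl causes everything to be downloaded again.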

Known Issues

  • The script will enter an infinite loop if you give it a link to an ao3 page that contains links to works or series, but does not support multiple pages of results. The most common example of this is user dashboard pages. (To download an author's works, make sure to put in the link to their "works" page, and not their "dashboard" page.) If you get into an infinite loop, you can simply close the window to get out of it.
  • When downloading missing fics from series, if you are logged in, and the downloader finds a link to a series that is inaccessible because you do not have permission to access the series page, the downloader will download all of the works linked on your user dashboard page, instead. Yes... really.
  • Links containing more than 4095 characters may cause issues on Mac and Linux. To work around this (on Mac and Linux only!) enter stty -icanon into your terminal before running ao3downloader. When you are finished running ao3downloader, enter stty icanon to restore the default behavior. H/t github user verotheelf for this workaround.
  • Links containing more than 8191 characters will cause problems on Windows. There is no workaround, other than using a different link. Thankfully, it is unlikely you will run into this problem, as 8191 characters is quite a lot.
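
If you suspect a link is too long, a quick check like the one below will tell you before you paste it in. It is a sketch, not part of ao3downloader: `link_length_warning` is a made-up helper, and the only facts it uses are the two limits quoted above (4095 characters on Mac and Linux, 8191 on Windows).

```python
import sys

# Limits quoted in the Known Issues list above.
POSIX_TERMINAL_LIMIT = 4095  # Mac and Linux, unless `stty -icanon` is set
WINDOWS_LIMIT = 8191


def link_length_warning(link, platform=None):
    """Return a warning string if the link exceeds the platform limit, else None.

    Hypothetical helper; `platform` defaults to sys.platform
    ("win32" on Windows, "darwin" on Mac, "linux" on Linux).
    """
    platform = sys.platform if platform is None else platform
    if platform.startswith("win"):
        if len(link) > WINDOWS_LIMIT:
            return "Link is longer than 8191 characters; use a different link."
    elif len(link) > POSIX_TERMINAL_LIMIT:
        return "Link is longer than 4095 characters; enter stty -icanon first."
    return None


print(link_length_warning("https://archiveofourown.org/works/123"))
```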

Troubleshooting

  • Make absolutely sure that your active python version has a version number that is not lower than 3.9.0 and IS NOT HIGHER THAN the most recent version confirmed to work with this script (this number is listed in the Announcements section). If your python version is too low OR TOO HIGH (did you hear me? make sure your python version number is not too high!), uninstall python, then install the version linked in Announcements. Then check that your active version is correct. To check which version of python is active:
    • Windows: open a command prompt and enter python --version
    • Mac or Linux: open a terminal window and enter python3 --version
  • Ensure that you have unzipped the repository (see the note about unzipping in the Instructions). Furthermore, ensure that you have not accidentally clicked into the original, zipped version of the code. The original zip folder will allow you to open it with a double-click as if it were a regular folder, and will display the same list of files, but it's all a lie. You must run ao3downloader from an unzipped file folder. On most operating systems, the zipped version will have a zipper in the icon. Don't use that version. Unzip, and use the version with a regular, zipperless icon.
  • If you are able to create logvisualization.html (menu option 'v'), take a look through the logs to see if there are any helpful error messages.
  • If there are no logs or the logs are unhelpful, look for a folder called "venv" inside the repository. Delete "venv" and try re-running the script. (Re-running the script will re-create "venv" - that's fine. You only need to do this step once.)
  • If deleting venv doesn't work, try deleting the entire repository and re-downloading from github (but remember to save your existing downloads and log files if you have any!)
  • If re-downloading the repository doesn't work, try uninstalling and reinstalling python.
    • Make sure you install a compatible version of python as described in the first troubleshooting step.
    • Choose "Customize installation" when prompted, and check the "Add Python to environment variables" checkbox when it appears. (This option was previously called "add to PATH"). Everything else can be left as default.
  • If reinstalling python doesn't work, and you are on Windows, see this stackoverflow answer.
  • If you have tried all of the above and it still doesn't work, see below for how to send me a bug report.

Questions? Comments? Bug reports?

Feel free to head over to the discussion board and make a post, or create an issue. I prefer to communicate through the above channels if possible; however, I understand many of my users don't have github accounts and may not want to make one just for this, so you can also email me at [email protected] if you prefer. Please include "ao3downloader" in the subject line of emails about the downloader. If you are reporting a bug, please describe exactly what you did to make the bug happen to the best of your ability. (More is more! Be as detailed as possible.)

(Please note that while I will absolutely do my best to get back to you, I can't make any promises - I have a job, etc.)
