Interactive CLI application that fetches your Twitter bookmarks using a headless browser and saves them to a SQLite database
Twitter recently launched v2 of their developer API, which provides an official means for you to fetch and manage your bookmarks. I'd strongly recommend using their official API over this application for your needs. I am not anticipating making changes on this app going forward.
I originally created this app to solve a personal need: saving my Twitter bookmarks to a local, offline database so I could read them later, without relying on an API that didn't exist at the time.
My approach relied heavily on the idea that you could act as yourself by logging in via a headless browser, access the bookmarks page, and use an API endpoint to pull bookmarks from.
- Be a means to fetch your bookmarks without needing Twitter's developer API
- Fetch bookmarks using a means as reliable and sturdy as possible (that is, this tool should work on its own for as long as possible, without needing breaking changes)
- Command line argument to set the starting point (cursor) to begin fetching tweets from
- Command line argument to begin fetching bookmarks from the top, instead of continuing from the last fetched tweet
- Not all functionality has tests written
- Anything else mentioned in TODO comments
- Twitter's bookmarks page makes an HTTP GET request to an API endpoint matching this pattern:
https://twitter.com/i/api/graphql/$QUERY_ID/$OPERATION_NAME?variables=$QUERY_PARAMETERS
For fetching bookmarks, `$OPERATION_NAME` will be `'Bookmarks'`. Both `$QUERY_ID` and `$OPERATION_NAME` are referenced in a specific object (below) inside a script loaded on the bookmarks page called `main.XXXXXXXXX.js`. These variables in the URL path are required to make successful calls to the API. Because playwright can listen for any network calls matching this URL pattern, we're able to avoid hunting for the necessary info in this script file.
{queryId:"$QUERY_ID",operationName:"Bookmarks",operationType:"query"}
The query parameter `variables` is URL-encoded JSON, and represents the query parameters. For convenience, and to doubly make sure the tool presents like a browser to Twitter, the same parameters the bookmarks page already passes to the query are used, including request headers. There will be an option later to allow you to override a selection of these query parameters.
There are certain request headers the API seems to expect (e.g. an authorization token), which is the sole reason why the step of logging you in with a browser is necessary for the app to fetch data from this endpoint.
A successful response returns the most recent bookmarks (the same ones you see when you first load the bookmarks page), as well as a cursor. The cursor is a marker to begin fetching the next set of bookmarks from. The most recent bookmarks from that first response will be saved, if the application doesn't have a cursor to fetch the next set of bookmarks from (which is the case if you're running it for the first time). This cursor also enables the application to resume fetching bookmarks from the last successful point in case an error is encountered, or if you decide to stop.
- Using said response, superagent then makes subsequent requests to the API URL and fetches as many bookmarks as possible. An intentional delay of 300ms is added between these subsequent requests to mimic a human scrolling through the bookmarks page; an option to change or remove that delay will be added later. Upon each successful response, the bookmarks and the next cursor are saved to the database.
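The request pattern and cursor-based paging described above can be sketched roughly as follows. This is only an illustration, not the app's actual code: every name here (`parseBookmarksUrl`, `buildBookmarksUrl`, `fetchAllBookmarks`, `BookmarksPage`, and the `cursor` key inside `variables`) is an assumption, and the real HTTP call and database write are passed in as plain callbacks rather than tied to superagent or Sequelize.

```typescript
// Hypothetical sketch: only the URL shape and the 300ms delay come from the text above.
const BOOKMARKS_URL_PATTERN =
  /^https:\/\/twitter\.com\/i\/api\/graphql\/([^\/]+)\/Bookmarks\?(?:[^#]*&)?variables=([^&#]*)/;

interface BookmarksRequest {
  queryId: string;
  variables: Record<string, unknown>;
}

interface BookmarksPage {
  bookmarks: unknown[];
  nextCursor?: string;
}

// Returns the query ID and decoded `variables` if the URL is a bookmarks API call.
function parseBookmarksUrl(rawUrl: string): BookmarksRequest | null {
  const match = BOOKMARKS_URL_PATTERN.exec(rawUrl);
  if (match === null) return null;
  const variables = JSON.parse(decodeURIComponent(match[2] as string));
  return { queryId: match[1] as string, variables };
}

// Rebuilds the API URL, reusing the captured variables and optionally adding a cursor.
// Assumption: the cursor travels as a `cursor` key inside `variables`.
function buildBookmarksUrl(req: BookmarksRequest, cursor?: string): string {
  const variables = cursor === undefined ? req.variables : { ...req.variables, cursor };
  const encoded = encodeURIComponent(JSON.stringify(variables));
  return `https://twitter.com/i/api/graphql/${req.queryId}/Bookmarks?variables=${encoded}`;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Fetches pages until one comes back empty or without a next cursor.
// Each page (bookmarks plus next cursor) is handed to `savePage` before moving on,
// so a later run could resume from the last saved cursor.
async function fetchAllBookmarks(
  req: BookmarksRequest,
  fetchPage: (url: string) => Promise<BookmarksPage>,
  savePage: (page: BookmarksPage) => Promise<void>,
  startCursor?: string,
  delayMs: number = 300
): Promise<number> {
  let cursor = startCursor;
  let total = 0;
  while (true) {
    const page = await fetchPage(buildBookmarksUrl(req, cursor));
    if (page.bookmarks.length === 0) break;
    await savePage(page);
    total += page.bookmarks.length;
    if (page.nextCursor === undefined) break;
    cursor = page.nextCursor;
    await sleep(delayMs); // pause between requests, as described above
  }
  return total;
}
```

Passing `fetchPage` and `savePage` in as callbacks keeps the loop independent of any particular HTTP client or database layer, which also makes it easy to exercise with canned pages.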
I've only run this on the following node and npm versions, on Windows 10 with WSL2 running Ubuntu. There's a chance it may run on earlier versions, but I haven't tested this.
- `node` >= 14.x
- `npm` >= 7.x
- A TTY-capable terminal. Quickly check this by running `node -e "console.log(process.stdout.isTTY)"`. If it prints `true`, you're set.
- From a directory of your choice, clone this repo with
git clone https://github.com/helmetroo/fetch-twitter-bookmarks.git
- Change into the newly created directory, and install all the required modules.
cd fetch-twitter-bookmarks
npm i
The application is an interactive shell that accepts commands, powered by Vorpal.
You can run the `help` command at any time for help on commands available to you at the current moment.
If you're in the same directory as the application, you can run it with
npm start
See the arguments section for more details about the arguments you can provide and how to provide them.
- When you first run the app, you'll see a list of browsers you can run. Run the command `set-browser <name>` (alias `browser`). `<name>` can be any one of the available browser choices runnable on your machine, which you can see with `help`.
- After you set the browser, you can `login` (aliases `signin`, `authenticate`). You'll be prompted for your credentials. If login was successful, you'll see a success message and can start fetching bookmarks. However, you may be prompted to log in with your username/phone only, or be asked to provide a specific authorization code (2FA or other identification code Twitter may ask you for to make sure it's you).
- You can start fetching bookmarks with `fetch` (alias `start`). The browser will navigate to the bookmarks page, watch for a call to the bookmarks API (see how it works for more info on this), then repeatedly make calls to this API until no more bookmarks can be fetched (either because there are no more to fetch, or an error was encountered). You can stop fetching at any time with the `stop` command.
- When you're finished, you can either end your session with `end` (alias `close`), if you want to choose a different browser, or you can `exit` the application entirely. Both commands will log you out first if you were signed in.
At any time, you can dump bookmarks saved in the database to a JSON file with the `dump <filename>` command. See the filenames section for how filenames are resolved.
Filenames can be absolute or relative. Relative paths are resolved against the current working directory from which you run the app.
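As a minimal sketch of that rule (not the app's actual implementation), Node's built-in `path.resolve` behaves this way: an absolute filename is kept, and a relative one is anchored at the working directory. The helper name `resolveFilename` is hypothetical.

```typescript
import * as path from "path";

// Hypothetical helper: absolute filenames pass through (normalized),
// relative ones are resolved against the given working directory.
function resolveFilename(filename: string, cwd: string = process.cwd()): string {
  return path.resolve(cwd, filename);
}

// Example: a relative name ends up inside the current working directory.
const resolved = resolveFilename("bookmarks.json");
console.log(path.isAbsolute(resolved)); // true
```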
To show help prompts for commands, you can add the option `--help` after each command.
Shows help prompt for all currently available commands.
Sets the browser. Necessary to log in and fetch bookmarks. Browser choices available to you are determined each time you start the app. To change browsers after setting a browser, run `end`, then choose another browser.
Sets the database bookmarks are saved to. Command not available if you are in the middle of fetching bookmarks.
Providing `<filename>` is the same as setting `-f` or `--file`.
- `-m` or `--in-memory`: Database is stored in-memory. Useful if you don't want to save a database file. All saved bookmarks will be lost when you exit. Ignored if `-f` or `-d` is set.
- `-f <filename>` or `--file <filename>`: Database is saved to `<filename>`. Ignored if `-m` or `-d` is set.
- `-d` or `--default`: Database is saved to the default location (`$APP_ROOT/twitter-bookmarks.db`). Ignored if `-m` or `-f` is set.
Sets the location for log files.
Clears the current log file by removing it and creating a new empty file.
Log in to Twitter. Necessary to begin fetching bookmarks. Available after you've set the browser (see above).
Begins fetching bookmarks. Bookmarks will be fetched from either the last saved cursor, or from the very beginning if the app hasn't saved a cursor to the bookmarks database. Command available after you've set a browser and logged in.
Stops fetching bookmarks. Command available while you're fetching bookmarks.
Ends your session. Bookmarks will stop being fetched, and you'll also be logged out. Run when you want to switch browsers and/or accounts.
Dumps saved bookmarks in the database to a JSON file.
Exits the application.
- `-m` or `--in-memory`: Database is kept in memory. Useful if you don't want to save a database file or do a one-time dump. All saved bookmarks will be lost when you exit. Ignored if `-f` is set.
- `-f <filename>` or `--file <filename>`: Database is saved to `<filename>`. Ignored if `-m` is set.
- `-d` or `--default`: Database is saved to the default location (`$APP_ROOT/twitter-bookmarks.db`). Ignored if `-m` or `-f` is set.
- `-l <filename>` or `--log-file <filename>`: Log files are saved to `<filename>`.
- `--clear-log`: Logs are cleared. If a log file is provided through `-l`, that log file will be cleared. Otherwise, the default log will be cleared.
If you're running the app via npm, arguments are added after an extra set of dashes, like below:
npm start -- -f $HOME/bookmarks.db
When you use this app to log in to Twitter, you'll more than likely receive login notifications like the ones I did with my account below. When you run tests, you'll probably see several of these. If you log in too frequently, Twitter may ask you to sign in with your username/phone only.
Network requests and full details of internal errors that aren't shown to you are saved to log files via winston.
The default location for logs is `$APP_ROOT/logs/debug.log`.
Bookmarked tweets and available metadata for their respective authors, as well as the cursor, are saved in a SQLite database file. The default location of this database is `$APP_ROOT/twitter-bookmarks.db`, although you can change this with the designated command-line arguments and commands above. You can explore the database with a CLI tool or GUI.
Definitions for the database schema for bookmarked tweets and their authors can be found in tweets-db.ts.
TypeScript interfaces for bookmarked tweets and their authors can be found in twitter.ts. The interfaces were written based on successful responses from the bookmarks API endpoint as observed in Chrome DevTools, plus some speculation.
Sequelize is used to maintain the database schema, as well as save and retrieve bookmarked tweets, their respective authors, and the cursor to fetch the next set of bookmarks from.
(Incomplete) tests have been set up with Jest to exercise working functionality.
All tests are set up to run against every browser playwright can run on your machine.
Some tests require real user credentials to work. You can add these in a `.env` file in this application's root directory:
FTB_TWITTER_USERNAME=...
FTB_TWITTER_PASSWORD=...
Run available tests with
npm test
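For illustration, a test helper could read the two credentials described above along these lines. This is a hedged sketch: `readTestCredentials` and `TestCredentials` are hypothetical names, and it assumes the `.env` values have already been loaded into `process.env` by whatever loader the test setup uses (the source doesn't say which).

```typescript
interface TestCredentials {
  username: string;
  password: string;
}

// Hypothetical helper: reads the two variables named in the .env example
// and fails early with a clear message if either is missing or empty.
function readTestCredentials(
  env: Record<string, string | undefined> = process.env
): TestCredentials {
  const username = env["FTB_TWITTER_USERNAME"];
  const password = env["FTB_TWITTER_PASSWORD"];
  if (!username || !password) {
    throw new Error("FTB_TWITTER_USERNAME and FTB_TWITTER_PASSWORD must both be set");
  }
  return { username, password };
}
```

Failing fast here means credential-dependent tests report a missing `.env` entry instead of a confusing login failure.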
For my first approach at writing this app, I used Puppeteer to first visit the bookmarks page, then made it continuously scroll down and scrape encountered tweets. It assumed Twitter contained all tweets in `<article />` elements. More details on how that worked, as well as the implementation can be found here. I decided that was far too unstable and the code was growing unnecessarily complex for its own good, so I rewrote it.
TBD