These instructions tell you how to set up a literature bot that automatically posts papers on whatever you're intersted in to Bluesky. If you use them to build a bot, please post a note in the "Show and tell" of the discussions.
A key pre-requisite is that you'll need an account with Microsoft Power Automate. Many people have that for free through an institutional Office365 subscription. To see if you have a subscription, go to: https://make.powerautomate.com/, and try to log in. Microsoft Power Automate is a horrendous way to build anything, but the massive advantage for literature bots is that if you have a subscription, you get free server time. So once your literature bot is running, you're all done.
Literature bots can be a useful way to keep up with the latest research. Casey Bergman started it all with a Drosophila literature bot called flypapers back when Twitter existed. There are now hundreds similar ones, many of which can be found on this list: https://twitter.com/caseybergman/lists/literaturebots/members. The detailed instructions here were originally inspired by Casey Bergman's blog post.
This repo tells you how to build your own literature bot, using the phypapers bot on Bluesky as an example.
This is a new set of instructions which I've cleaned up substantially, and streamlined for BlueSky. If you're looking for the old instructions, which had notes for Twitter, you can find them here
If you just want to see what this is all about, here are some examples I know of.
Subject | Name | Account |
---|---|---|
Phylogenetics | phy_papers | @phypapers.bsky.social |
Biogeography | biogeo_papers | @biogeopapers.bsky.social |
Here's an overview of what the literature bot does:
- Runs once every 24 hours
- Checks a lot of RSS feeds for papers that have come out in the last 24 hours
- Makes sure the papers are relevant (i.e. some places let you generate relevant RSS feeds, but others don't)
- Removes duplicate papers
- Posts the papers to Bluesky account over a 23 hour period, evenly spaced (this is so I don't flood everyone's feed with 100 papers at once)
Obviously you need an account to post to. This part gets you set up on Bluesky, whether you have an existing personal account or not.
- If you don't have a Bluesky account: Go to https://bsky.app/, and click 'Sign Up'
- If you do have a Bluesky account, log in then go to
Settings
and clickAdd Account
- Leave the hosting provider as Bluesky Social
- Fill out your details to set up your new account:
- Protip: If you already use your gmail address for your account, you can just append to it to create a new account. E.g. if your personal account is [email protected], you could use [email protected] (the '+' stays). This helps keep mail separate.
- Decide on a handle. Following flypapers' lead, I suggest a short prefix followed immediately by
papers
, e.g.flypapers
,phypapers
, etc. This means we all know it when we see a literature bot. - Click on your new profile, go to
Edit Profile
Username
: I suggest making thisprefix_papers
e.g.fly_papers
orphy_papers
. As above, this helps everyone know what's a literature botDescription
: pretty obvious, but it's always nice to know the human who runs it, so good to put your name there if you want to. It would be great if you could also put a link to these instructions on your literature bot - that way anyone who sees yours can also make their own. On my profile I just wrote: "Make your own literature bot with these instructions: https://github.com/roblanf/phypapers"
- Go to
Settings
and scroll down toApp Passwords
and set one up (this is something which allows another service to post to Bluesky on your behalf)
Literature bots use RSS feeds to post papers. You can use any RSS feed you like, but for the purposes of this tutorial I'll show you how to do the three big ones for what I do: pubmed, arXiv, and bioRxiv. The general rule though is that you should establish RSS feeds that cover as much of the literature as you possibly can for whatever your literature bot is for.
- Go here: http://www.ncbi.nlm.nih.gov/pubmed/
- Type in your favourite search terms remembering that wildcards are useful (e.g.
phylogen*
will match anything starting withphylogen
, and logical operators can be really good, e.g. you can havephylogen* OR raxml OR splitstree
. - Click
search
- Click the
Create RSS
link just below the search box - Name is something sensible
- Set
Number of items to be displayed
to 100 - Click
Create RSS
- Record the RSS URL somewhere
Create as many RSS feeds as you like from PubMed, and note them down. Why would you create more than one? Because each feed is limited to retreiving 100 papers, so on the off chance that there's a bumper day for your subject (like a special issue coming out), you might miss papers by having one general feed. For phypapers I went totally overboard and made the following list of RSS feeds:
- phylogenetics[Title]
- phylogenomics[Title]
- phylogenomic[Title]
- phylogenetic[Title]
- phylogenetics[TitleAbstract]
- phylogenomics[TitleAbstract]
- phylogenomic analysis[TitleAbstract]
- phylogenetic analysis[TitleAbstract]
- ancestral recombination graph[TitleAbstract]
If you click on them you can see what each one entails (the search terms are near the top). And note that it doesn't matter that these will pick up a lot of duplicate papers.
arXiv has preprints for many subjects, so it's worth considering. Setting up the RSS feed is trivial. All you need to do is edit this URL to include your search term:
http://export.arxiv.org/api/query?search_query=all:[YOURSEARCHTERMHERE]&start=0&max_results=100&sortBy=lastUpdatedDate&sortOrder=descending
For phypapers I used two RSS feeds. I include the text of the links below so you can see how they're built
- phylogen*
https://export.arxiv.org/api/query?search_query=all:phylogen*&start=0&max_results=100&sortBy=lastUpdatedDate&sortOrder=descending
- "ancestral recombination graph"
https://export.arxiv.org/api/query?search_query=all:%22ancestral%20recombination%20graph%22&start=0&max_results=100&sortBy=lastUpdatedDate&sortOrder=descending
bioRxiv and EcoEvoRxiv are great for biology preprints. I don't know of a way to get RSS feed with search terms from them though. However, we can get ALL the preprints in an RSS, and filter them using search terms. For bioRxiv you have two options. You can do the simple thing and just get the single RSS feed with all recent paper:
http://connect.biorxiv.org/biorxiv_xml.php?subject=all
OR... you can go overboard like me and get each subject category indpendently. This is probably overkill, but since bioRxiv returns only the last 30 papers, it will help avoid missing anything. Here's the full list in the format you'll need for Microsoft Flow.
[
"https://ecoevorxiv.org/rss/preprints/",
"http://connect.biorxiv.org/biorxiv_xml.php?subject=animal_behavior",
"http://connect.biorxiv.org/biorxiv_xml.php?subject=biochemistry",
"http://connect.biorxiv.org/biorxiv_xml.php?subject=bioinformatics",
"http://connect.biorxiv.org/biorxiv_xml.php?subject=biophysics",
"http://connect.biorxiv.org/biorxiv_xml.php?subject=cancer_biology",
"http://connect.biorxiv.org/biorxiv_xml.php?subject=cell_biology",
"http://connect.biorxiv.org/biorxiv_xml.php?subject=developmental_biology",
"http://connect.biorxiv.org/biorxiv_xml.php?subject=ecology",
"http://connect.biorxiv.org/biorxiv_xml.php?subject=evolutionary_biology",
"http://connect.biorxiv.org/biorxiv_xml.php?subject=genetics",
"http://connect.biorxiv.org/biorxiv_xml.php?subject=genomics",
"http://connect.biorxiv.org/biorxiv_xml.php?subject=immunology",
"http://connect.biorxiv.org/biorxiv_xml.php?subject=microbiology",
"http://connect.biorxiv.org/biorxiv_xml.php?subject=molecular_biology",
"http://connect.biorxiv.org/biorxiv_xml.php?subject=neuroscience",
"http://connect.biorxiv.org/biorxiv_xml.php?subject=paleontology",
"http://connect.biorxiv.org/biorxiv_xml.php?subject=pathology",
"http://connect.biorxiv.org/biorxiv_xml.php?subject=pharmacology",
"http://connect.biorxiv.org/biorxiv_xml.php?subject=physiology",
"http://connect.biorxiv.org/biorxiv_xml.php?subject=plant_biology",
"http://connect.biorxiv.org/biorxiv_xml.php?subject=scientific_communication_and_education",
"http://connect.biorxiv.org/biorxiv_xml.php?subject=synthetic_biology",
"http://connect.biorxiv.org/biorxiv_xml.php?subject=systems_biology",
"http://connect.biorxiv.org/biorxiv_xml.php?subject=zoology"
]
Once you've decided on your list of RSS feeds, you then need some search terms - these will be used to find papers with any matches in the title or abstract. For phypapers I use these:
[
"phylogenetic",
"phylogenomic",
"ancestral recombination graph"
]
First we have to upload the template, which will do all the posting for you:
- Download the zip file in this repo called
bluesky_literature_bot.zip
, but don't unzip it. - Log into https://make.powerautomate.com/
- On the left hand navigation bar, click "My Flows"
- Up the top, click 'Import', then 'Import Package (Legacy)'
- Upload the
bluesky_literature_bot.zip
file, then click 'Create as new', then the blue 'Save' button - If there are other rows in the "Review Package Content" table beyond the
bluesky_literature_bot
, you have to review those too. For each one, just click the link under theimport setup
column, then follow the instructions and click the blue 'Save' button. - When you've done every row, the 'Import' button at the bottom will turn blue. Click it. Wait for your bot to import.
Next you just have to edit a few of the variables at the top of the template:
- Click
My Flows
- Click
bluesky_literature_bot
, thenEdit
at the top left - Click the purple variable
RssFeedsThatNeedKeywordSearch
, and edit it to include any RSS feeds which are not pre-searched (by default it has all of bioRxiv and EcoEvoRxiv, but you can change this to anything). Note the format is JSON, as in step 2.3 above. - Click the purple
OtherRssFeeds
, and add in all your PubMed and arXiv RSS feeds. I've left a couple in there so you can see the format, but you will need to delete these and replace them (unless you want to mostly duplicate phypapers). - Click on the purple
Keywords
variable, and put in your search terms. As before, I've left mine there so you can see them. (Hint, don't include TOO many - Microsoft Power Automate is terrifyingly slow, I'd say 10 maximum) - Click on the purple
BlueskyUsername
variable and change the bottom part fromYOUR_USERNAME.bsky.social
to your username, e.g.phypapers.bsky.social
is phypapers'. - Click on the purple
BlueskyAPIPassword
and change the dummy passwordxxxx-xxxx-xxxx-xxxx
to yours from step 1 above. - Click
Save
at the top. - Click the
<-
back arrow at the top left - Click
Turn on
at the top.
It's a good idea to do a dry run first. To do that, just hit the Run
button at the top. This will run your literature bot, and as long as it found at least one paper matching your search terms from the last 24 hours, you'll be able to see it posted to your Bluesky account.
If nothing posts, either follow up the errors in Microsoft Flow, or if it says it 'Succeeded' in the 28-day-run-history, then check your RSS feeds. The chances are that there is nothing from the last 24 hours to post.
Finally, if you want you can set the time that the search happens each day. By default it starts at 10AM AEST, but to change it just:
- Click
Edit
- Click the blue
Recurrence
variable - Change the
Time zone
andAt These Hours
to what you want. But ONLY use one hour. The system is built to run once every 24 hours, and if you do more than that you'll just get a lot of duplicates.
Make sure you follow and check your own feed regularly. If it seems like it's posting rubbish, go tweak the RSS feeds. Leave it runnning for a week or two before telling too many people, so that you can see that it works like you want, and so when people find it they can see a bunch of good papers too.
I'd love to know if you built a literature bot with these instructions. If you did, please let me know via "Show and tell" of the discussions.