🔍 Advanced Search on Twitter

👀

These operators work on Web, Mobile, Tweetdeck.

There is some overlap, but largely these will not work for v1.1 Search, Premium Search, or v2 Search APIs.

Adapted from TweetDeck Help, @lucahammer Guide, @eevee Twitter Manual, @pushshift and Twitter / Tweetdeck itself. Contributions / tests, examples welcome!

Class Operator Finds Tweets… Eg:
Tweet content nasa esa
(nasa esa)
Containing both "nasa" and "esa". Spaces are implicit AND. Brackets can be used to group individual words if using other operators. 🔗
  nasa OR esa Either "nasa" or "esa". OR must be in uppercase. 🔗
  "state of the art" The complete phrase "state of the art". Will also match "state-of-the-art". Also use quotes to prevent spelling correction. 🔗
  "this is the * time this week" A complete phrase with a wildcard. * does not work outside of a quoted phrase or without spaces. 🔗
  +radiooooo Force a term to be included as-is. Useful to prevent spelling correction. 🔗
  -love
-"live laugh love"
- is used for excluding "love". Also applies to quoted phrases and other operators. 🔗
  #tgif A hashtag 🔗
  $TWTR A cashtag, like hashtags but for stock symbols 🔗
  What ? Question marks are matched 🔗
  :) OR :( Some emoticons are matched, positive :) :-) :P :D or negative :-( :( 🔗
  👀 Emoji searches are also matched. Usually needs another operator to work. 🔗
  url:google.com URLs are tokenized and matched. This works very well for subdomains and domains, less well for long URLs, depending on the URL. YouTube IDs work well. Works for both shortened and canonical URLs, eg: gu.com shortener for theguardian.com. When searching for domains containing hyphens, replace each hyphen with an underscore (like url:t_mobile.com), but underscores _ are also tokenized out, and may not match 🔗
  lang:en Search for tweets in specified language, not always accurate, see the full list and special lang codes below. 🔗
 
Users from:user Sent by a particular @username e.g. "dogs from:NASA" 🔗
  to:user Replying to a particular @username 🔗
  @user Mentioning a particular @username. Combine with -from:username to get only mentions 🔗
  list:715919216927322112
list:esa/astronauts
Tweets from members of this public list. Use the list ID from the API or with urls like twitter.com/i/lists/715919216927322112. List slug is for old list urls like twitter.com/esa/lists/astronauts. Cannot be negated, so you can't search for "not on list". 🔗
  filter:verified From verified users 🔗
  filter:blue_verified From "verified" users that paid $8 for Twitter Blue 🔗
  filter:follows Only from accounts you follow. Cannot be negated. 🔗
  filter:social
filter:trusted
Only from an algorithmically expanded network of accounts based on your own follows and activities. Works on "Top" results, not "Latest" 🔗
 
Geo near:city Geotagged in this place. Also supports Phrases, eg: near:"The Hague" 🔗
  near:me Near where twitter thinks you are 🔗
  within:radius Within specific radius of the "near" operator, to apply a limit. Can use km or mi. e.g. fire near:san-francisco within:10km 🔗
  geocode:lat,long,radius E.g., to get tweets 10km around Twitter's HQ, use geocode:37.7764685,-122.4172004,10km 🔗
  place:96683cc9126741d1 Search tweets by Place Object ID eg: USA Place ID is 96683cc9126741d1 🔗
 
Time since:2021-12-31 On or after (inclusive) a specified date. 4 digit year, 2 digit month, 2 digit day separated by - a dash. 🔗
  until:2021-12-31 Before (NOT inclusive) a specified date. Combine with a "since" operator for dates between. 🔗
  since:2021-12-31_23:59:59_UTC On or after (inclusive) a specified date and time in the specified timezone. 4 digit year, 2 digit month, 2 digit day separated by - dashes, an _ underscore separating the 24 hour clock format hours:minutes:seconds and timezone abbreviation. 🔗
  until:2021-12-31_23:59:59_UTC Before (NOT inclusive) a specified date and time in the specified timezone. Combine with a "since" operator for dates between. 🔗
  since_time:1142974200 On or after a specified unix timestamp in seconds. Combine with the "until" operator for dates between. Maybe easier to use than since_id below. 🔗
  until_time:1142974215 Before a specified unix timestamp in seconds. Combine with a "since" operator for dates between. Maybe easier to use than max_id below. 🔗
  since_id:tweet_id After (NOT inclusive) a specified Snowflake ID 🔗
  max_id:tweet_id At or before (inclusive) a specified Snowflake ID (see Note below) 🔗
  within_time:2d
within_time:3h
within_time:5m
within_time:30s
Search within the last number of days, hours, minutes, or seconds 🔗
 
Tweet Type filter:nativeretweets Only retweets created using the retweet button. Works well combined with from: to show only retweets. Only works within the last 7-10 days or so. 🔗
  include:nativeretweets Native retweets are excluded by default. This shows them. In contrast to filter:, which shows only retweets, this includes retweets in addition to other tweets. Only works within the last 7-10 days or so. 🔗
  filter:retweets Old style retweets ("RT") + quoted tweets. 🔗
  filter:replies Tweet is a reply to another Tweet. Good for finding conversations, or threads if you add or remove to:user 🔗
  conversation_id:tweet_id Tweets that are part of a thread (direct replies and other replies) 🔗
  filter:quote Contain Quote Tweets 🔗
  quoted_tweet_id:tweet_id Search for quotes of a specific tweet 🔗
  quoted_user_id:user_id Search for all quotes of a specific user 🔗
  card_name:poll2choice_text_only
card_name:poll3choice_text_only
card_name:poll4choice_text_only
card_name:poll2choice_image
card_name:poll3choice_image
card_name:poll4choice_image
Tweets containing polls. For polls containing 2, 3, 4 choices, or image Polls. 🔗
 
Engagement filter:has_engagement Has some engagement (replies, likes, retweets). Can be negated to find tweets with no engagement. Note all of these are mutually exclusive with filter:nativeretweets or include:nativeretweets, as they apply to the retweet, not the original tweet, so they won't work as expected. 🔗
  min_retweets:5 A minimum number of Retweets. Counts seem to be approximate for larger (1000+) values. 🔗
  min_faves:10 A minimum number of Likes 🔗
  min_replies:100 A minimum number of replies 🔗
  -min_retweets:500 A maximum number of Retweets 🔗
  -min_faves:500 A maximum number of Likes 🔗
  -min_replies:100 A maximum number of replies 🔗
 
Media filter:media All media types. 🔗
  filter:twimg Native Twitter images (pic.twitter.com links) 🔗
  filter:images All images. 🔗
  filter:videos All video types, including native Twitter video and external sources such as Youtube. 🔗
  filter:periscope Periscopes 🔗
  filter:native_video All Twitter-owned video types (native video, vine, periscope) 🔗
  filter:vine Vines (RIP) 🔗
  filter:consumer_video Twitter native video only 🔗
  filter:pro_video Twitter pro video (Amplify) only 🔗
  filter:spaces Twitter Spaces only 🔗
 
More Filters filter:links Only containing some URL, includes media. use -filter:media for urls that aren't media 🔗
  filter:mentions Containing any sort of @mentions 🔗
  filter:news Containing link to a news story. Combine with a list operator to narrow the user set down further. 🔗
  filter:safe Excluding NSFW content. Excludes content that users have marked as "Potentially Sensitive". Doesn't always guarantee SFW results. 🔗
  filter:hashtags Only Tweets with Hashtags. 🔗
 
App specific source:client_name Sent from a specified client e.g. source:tweetdeck (See Note for common ones) eg: twitter_ads doesn't work on its own, but does with another operator. 🔗
  card_domain:pscp.tv Matches domain name in a Twitter Card. Mostly equivalent to url: operator. 🔗
  card_url:pscp.tv Matches domain name in a Card, but with different results to card_domain. 🔗
  card_name:audio Tweets with a Player Card (Links to Audio sources, Spotify, Soundcloud etc.) 🔗
  card_name:animated_gif Tweets With GIFs 🔗
  card_name:player Tweets with a Player Card 🔗
  card_name:app
card_name:promo_image_app
Tweets with links to an App Card. promo_app does not work, promo_image_app is for an app link with a large image, usually posted in Ads. 🔗
  card_name:summary Only Small image summary cards 🔗
  card_name:summary_large_image Only large image Cards 🔗
  card_name:promo_website Larger than summary_large_image, usually posted via Ads 🔗
  card_name:promo_image_convo
card_name:promo_video_convo
Finds Conversational Ads cards. 🔗
  card_name:3260518932:moment Finds Moments cards. 3260518932 is the user ID of @TwitterMoments, but the search finds moments for everyone, not that specific user. 🔗

Matching:

On web and mobile, keyword operators can match on: The user's name, the @ screen name, tweet text, and shortened, as well as expanded url text (eg, url:trib.al finds accounts that use that shortener, even though the full url is displayed).

By default "Top" results are shown, where "Top" means tweets with some engagements (replies, RTs, likes). "Latest" has most recent tweets. People search will match on descriptions, but not all operators work. "Photos" and "Videos" are presumably equivalent to filter:images and filter:videos.

The exact tokenization is not known, but it's most likely a custom one to preserve entities. URLs are also tokenized. Spelling correction appears sometimes, and plurals are also matched, eg: bears will also match tweets with bear. A - not preceding an operator is removed, so "state-of-the-art" is the same as "state of the art".

Private accounts are not included in the search index, and their tweets do not appear in results. Locked and suspended accounts are also hidden from results. There are other situations where tweets may not appear: anti-spam measures, or tweets that simply have not been indexed due to server issues.

Twitter is using some words as signal words. E.g. when you search for “photo”, Twitter assumes you’re looking for Tweets with attached photos. If you want to search for Tweets which literally contain the word “photo”, you have to wrap it in double quotes "photo".

Building Queries:

Most "filter:type" can also be negated using the "-" symbol, with exceptions like filter:follows which can't be negated. exclude:links is the same as -filter:links. It's sometimes worth trying an alias like that in case the search doesn't work first time.

Example: I want Tweets from @Nasa with all types of media except images

from:NASA filter:media -filter:images

Combine complex queries together with booleans and parentheses to refine your results.

Example 1: I want mentions of either "puppy" or "kitten", with mentions of either "sweet" or "cute", excluding Retweets, with at least 10 likes.

(puppy OR kitten) AND (sweet OR cute) -filter:nativeretweets min_faves:10

Example 2: I want mentions of "space" and either "big" or "large" by members of the NASA astronauts List, sent from an iPhone or twitter.com, with images, excluding mentions of #asteroid, since 2011.

space (big OR large) list:nasa/astronauts (source:twitter_for_iphone OR source:twitter_web_client) filter:images since:2011-01-01 -#asteroid
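
Queries like the two examples above can also be assembled programmatically. The following is a minimal Python sketch, not an official client: the helper names (`any_of`, `build_query`, `search_url`) are hypothetical, and the `https://twitter.com/search?q=...` URL shape with `f=live` for "Latest" is an assumption based on how the web search page behaves, so it may change.

```python
from urllib.parse import urlencode


def any_of(*terms):
    """Group alternatives with an uppercase OR inside parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(*parts):
    """Join parts with spaces, which search treats as an implicit AND."""
    return " ".join(part for part in parts if part)


def search_url(query, latest=False):
    """Build a web search URL (assumed URL shape, see the note above)."""
    params = {"q": query}
    if latest:
        params["f"] = "live"  # assumption: selects the "Latest" tab
    return "https://twitter.com/search?" + urlencode(params)


# Example 1 from above, relying on the implicit AND between groups
query = build_query(
    any_of("puppy", "kitten"),
    any_of("sweet", "cute"),
    "-filter:nativeretweets",
    "min_faves:10",
)
print(query)
print(search_url(query, latest=True))
```

Building the string from parts keeps each operator visible and makes it easy to drop one when a query returns nothing.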

To find any quote tweets, search for the tweet permalink, or the tweet ID with url eg: https://twitter.com/NASA/status/1138631847783608321 or url:1138631847783608321, see note for more.

For some queries you may want to use parameters with hyphens or spaces in it, e.g. site:t-mobile.com or app:Twitter for iOS. Twitter doesn’t accept hyphens or spaces in parameters and won’t display any tweets for this query. You can still search for those parameters by replacing all hyphens and spaces with underscores, e.g. site:t_mobile.com or app:Twitter_for_iOS.
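
The underscore substitution described above is mechanical, so it can be sketched in a couple of lines of Python. The function name `operator_value` is hypothetical, and as noted in the table, underscores may themselves be tokenized out, so the result is not guaranteed to match.

```python
def operator_value(raw):
    """Replace hyphens and spaces with underscores for use after an operator."""
    return raw.strip().replace("-", "_").replace(" ", "_")


print("site:" + operator_value("t-mobile.com"))
print("app:" + operator_value("Twitter for iOS"))
```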

Limitations:

Known limitations: card_name: only works for the last 7-8 days.

The maximum number of operators seems to be about 22 or 23.

All the Time operators have to be used in conjunction with something else to work.
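
As a rough illustration of the Time operators from the table, the Python sketch below formats a pair of datetimes into `since:` / `until:` and `since_time:` / `until_time:` strings. The function names are hypothetical, and the output still has to be combined with a keyword or another operator, as noted above.

```python
from datetime import datetime, timezone

# Format from the table: YYYY-MM-DD_HH:MM:SS_UTC
STAMP = "%Y-%m-%d_%H:%M:%S_UTC"


def _to_utc(moment):
    """Normalise an aware datetime to UTC; reject naive ones to avoid guessing."""
    if moment.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    return moment.astimezone(timezone.utc)


def date_window(start, end):
    """since: is inclusive, until: is exclusive."""
    return f"since:{_to_utc(start).strftime(STAMP)} until:{_to_utc(end).strftime(STAMP)}"


def unix_window(start, end):
    """The same window expressed as unix timestamps in seconds."""
    return (
        f"since_time:{int(_to_utc(start).timestamp())} "
        f"until_time:{int(_to_utc(end).timestamp())}"
    )


start = datetime(2021, 12, 31, 23, 59, 59, tzinfo=timezone.utc)
end = datetime(2022, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print("nasa " + date_window(start, end))
print("nasa " + unix_window(start, end))
```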

Tweetdeck Equivalents:

Tweetdeck options for columns have equivalents you can use on web search:

  • Tweets with Images: filter:images
  • Videos: filter:videos
  • Tweets with GIFs: card_name:animated_gif
  • "Tweets with broadcasts": (card_domain:pscp.tv OR card_domain:periscope.tv OR "twitter.com/i/broadcasts/")
  • "Any Media" (filter:images OR filter:videos)
  • "Any Links (includes media)": filter:links

Notes:

Web, Mobile, and Tweetdeck Search run on one type of system (as far as I can tell), Standard API Search is a different index, and Premium Search and Enterprise Search are another separate thing based on Gnip products. API docs already exist for the API and Premium, but I might add guides for those separately.

Snowflake IDs:

Tweet, DM, and some other object IDs on Twitter have been Snowflake IDs since 2010-06-01, and user IDs since 2013-01-22. In short, each ID embeds a timestamp.

To use these with since_id / max_id as time delimiters, either pick a tweet ID that roughly has a created_at time you need, remembering that all times on twitter are UTC, or use the following (This works for all tweets after Snowflake was implemented):

To convert a Twitter ID to millisecond epoch:

(tweet_id >> 22) + 1288834974657 -- This gives the millisecond epoch of when the tweet or user was created.

Convert from epoch back to a tweet id:

(millisecond_epoch - 1288834974657) << 22 = tweet id

Here's a use case:

You want to start gathering all tweets for specific search terms starting at a specific time. Let's say this time is August 4, 2019 09:00:00 UTC. You can use the max_id parameter by first converting the millisecond epoch time to a tweet id. You can use https://www.epochconverter.com.

August 4, 2019 09:00:00 UTC = 1564909200000 (epoch milliseconds)

(1564909200000 - 1288834974657) << 22 = 1157939227653046272 (tweet id)

So if you set max_id to 1157939227653046272, you will start collecting tweets earlier than that datetime. This can be extremely helpful when you need to get a very specific portion of the timeline.

Here's a quick Python function:

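TWITTER_EPOCH_MS = 1288834974657  # offset used in the formulas above

def convert_milliepoch_to_tweet_id(milliepoch):
    """Return the lowest Snowflake ID for a millisecond Unix timestamp."""
    if milliepoch <= TWITTER_EPOCH_MS:
        raise ValueError("Date is too early (before snowflake implementation)")
    return (milliepoch - TWITTER_EPOCH_MS) << 22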
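
The opposite direction, from a tweet ID back to a timestamp, follows from the first formula above. This is a minimal sketch with hypothetical function names. It only gives meaningful results for IDs created after Snowflake was introduced.

```python
from datetime import datetime, timezone

TWITTER_EPOCH_MS = 1288834974657  # offset used in the formulas above


def tweet_id_to_milliepoch(tweet_id):
    """Extract the millisecond Unix timestamp embedded in a Snowflake ID."""
    return (tweet_id >> 22) + TWITTER_EPOCH_MS


def tweet_id_to_datetime(tweet_id):
    """Return the creation time of a Snowflake ID as an aware UTC datetime."""
    millis = tweet_id_to_milliepoch(tweet_id)
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


# The ID from the use case above maps back to the chosen start time
print(tweet_id_to_datetime(1157939227653046272).isoformat())
# 2019-08-04T09:00:00+00:00
```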

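Unfortunately, remember that JavaScript's Number type cannot represent every 64-bit integer exactly (only integers up to 2^53 - 1 are safe), so these calculations and other operations on IDs often fail in unexpected ways unless the IDs are handled as strings or BigInt values.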
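
The precision loss is easy to reproduce outside JavaScript. A Python float is the same IEEE 754 double that JavaScript uses for its Number type, so the sketch below round-trips a tweet ID from this document through a float to show the low digits changing. Keeping IDs as strings or as arbitrary-precision integers avoids the problem.

```python
tweet_id = 1138631847783608321  # ID from the NASA permalink example above

as_double = float(tweet_id)     # roughly what a JavaScript Number would hold
round_tripped = int(as_double)

print(tweet_id)
print(round_tripped)
print(round_tripped == tweet_id)  # False: the low bits were rounded away

# Safe alternatives: keep the ID as text, or as a Python int
id_as_text = str(tweet_id)
print(int(id_as_text) == tweet_id)  # True
```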

More details on snowflake can be found in @pushshift document here.

Quote-Tweets

From a technical perspective Quote-Tweets are Tweets with a URL of another Tweet. It's possible to find Tweets that quote a specific Tweet by searching for the URL of that Tweet. Any parameters need to be removed or only Tweets that contain the parameter as well are found. Twitter appends a Client-parameter when copying Tweet URLs through the sharing menu. Eg. ?s=20 for the Web App and ?s=09 for the Android app. Example: twitter.com/jack/status/20/ -from:jack

To find all Tweets that quote a specific user, you search for the first part of the Tweet-URL and exclude Tweets from the user: twitter.com/jack/status/ -from:jack.
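
The two quote searches described above can be derived from a copied tweet URL. The Python sketch below strips any tracking parameters such as ?s=20 and builds both query strings. The function names are hypothetical, and it assumes a full URL including the scheme.

```python
from urllib.parse import urlsplit


def quotes_of_tweet(tweet_url):
    """Query for tweets quoting one tweet, excluding the author's own tweets."""
    parts = urlsplit(tweet_url)  # the query string (?s=20 etc.) is dropped below
    if not parts.netloc:
        raise ValueError("expected a full URL such as https://twitter.com/jack/status/20")
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < 3 or segments[1] != "status":
        raise ValueError("not a tweet permalink")
    user, tweet_id = segments[0], segments[2]
    return f"twitter.com/{user}/status/{tweet_id}/ -from:{user}"


def quotes_of_user(username):
    """Query for tweets quoting any tweet by this user."""
    return f"twitter.com/{username}/status/ -from:{username}"


print(quotes_of_tweet("https://twitter.com/jack/status/20?s=20"))
print(quotes_of_user("jack"))
```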

Geo Searches

Very few tweets have exact geo coordinates. Exact Geo coordinates are phased out for normal tweets, but will remain for photos: https://twitter.com/TwitterSupport/status/1141039841993355264

Tweets instead can be tagged by Place
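
For the tweets that do carry coordinates, the geocode: operator from the table takes a latitude, a longitude and a radius. A small Python sketch for formatting it follows. The function name is hypothetical, and accepting mi as a unit here is an assumption carried over from the within: operator.

```python
def geocode_operator(lat, lon, radius, unit="km"):
    """Format a geocode:lat,long,radius operator after basic range checks."""
    if not -90 <= lat <= 90:
        raise ValueError("latitude must be between -90 and 90")
    if not -180 <= lon <= 180:
        raise ValueError("longitude must be between -180 and 180")
    if unit not in ("km", "mi"):
        raise ValueError("unit must be 'km' or 'mi'")
    return f"geocode:{lat},{lon},{radius}{unit}"


# The example from the table: 10km around Twitter's HQ
print(geocode_operator(37.7764685, -122.4172004, 10))
```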

How did I find these in the first place?

Reading Twitter Documentation and help docs from as many sources as possible - eg: Developer Documentation, Help pages, Tool-specific help pages, eg: Tweetdeck help etc. Using Share feature on tweetdeck to copy the search string. Searching google and pastebin and github for rarely documented ones together to find other lists of operators others have compiled.

Known Unknowns and Assumptions:

I have no idea how Twitter decides what should match filter:news, my guess is that it's based on a list of whitelisted domain names, as tweets from anyone can appear as long as they link to a news site, no idea if this list is public. No idea if or how this filter changed over time. But we can try to retrieve tweets and see. lang:und will match most empty tweets or tweets with a single number or link. filter:safe presumably uses the User setting "Contains Sensitive Content" - but may also apply to specific tweets somehow.

It would be great to be able to reliably find Promoted tweets - this may be possible with some of the card searches. Tweets composed in Twitter Ads are available with source:twitter_ads but other promoted tweets may not have been created with that app.

I'd also like to search for Collections (Timelines) and Moments, but this seems to work ok with just url: searches. eg: url:twitter.com/i/events and url:twitter.com/i/moments (I think the difference is events are curated?) but url:twitter.com url:timelines has many false positives.

In Search Settings, "Hide Sensitive Content" equivalent is filter:safe - is there an equivalent to "Remove Blocked and Muted Accounts"?

Supported Languages:

Language is specified as 2 letter ISO codes. Language is tagged automatically from the tweet text, and is not always accurate, see here for notes on accuracy. The list from TweetDeck dropdown menu has all of them:

All Languages

lang:am Amharic (አማርኛ)
lang:ar Arabic (العربية)
lang:bg Bulgarian (Български)
lang:bn Bengali (বাংলা)
lang:bo Tibetan (བོད་སྐད)
lang:ca Catalan (Català)
lang:chr Cherokee (ᏣᎳᎩ)
lang:cs Czech (čeština)
lang:da Danish (Dansk)
lang:de German (Deutsch)
lang:dv Maldivian (ދިވެހި)
lang:el Greek (Ελληνικά)
lang:en English (English)
lang:es Spanish (Español)
lang:et Estonian (eesti)
lang:fa Persian (فارسی)
lang:fi Finnish (Suomi)
lang:fr French (Français)
lang:gu Gujarati (ગુજરાતી)
lang:hi Hindi (हिंदी)
lang:ht Haitian Creole (Kreyòl ayisyen)
lang:hu Hungarian (Magyar)
lang:hy Armenian (Հայերեն)
lang:in Indonesian (Bahasa Indonesia)
lang:is Icelandic (Íslenska)
lang:it Italian (Italiano)
lang:iu Inuktitut (ᐃᓄᒃᑎᑐᑦ)
lang:iw Hebrew (עברית)
lang:ja Japanese (日本語)
lang:ka Georgian (ქართული)
lang:km Khmer (ខ្មែរ)
lang:kn Kannada (ಕನ್ನಡ)
lang:ko Korean (한국어)
lang:lo Lao (ລາວ)
lang:lt Lithuanian (Lietuvių)
lang:lv Latvian (latviešu valoda)
lang:ml Malayalam (മലയാളം)
lang:my Myanmar (မြန်မာဘာသာ)
lang:ne Nepali (नेपाली)
lang:nl Dutch (Nederlands)
lang:no Norwegian (Norsk)
lang:or Oriya (ଓଡ଼ିଆ)
lang:pa Panjabi (ਪੰਜਾਬੀ)
lang:pl Polish (Polski)
lang:pt Portuguese (Português)
lang:ro Romanian (limba română)
lang:ru Russian (Русский)
lang:si Sinhala (සිංහල)
lang:sk Slovak (slovenčina)
lang:sl Slovene (slovenski jezik)
lang:sv Swedish (Svenska)
lang:ta Tamil (தமிழ்)
lang:te Telugu (తెలుగు)
lang:th Thai (ไทย)
lang:tl Tagalog (Tagalog)
lang:tr Turkish (Türkçe)
lang:uk Ukrainian (українська мова)
lang:ur Urdu (ﺍﺭﺩﻭ)
lang:vi Vietnamese (Tiếng Việt)
lang:zh Chinese (中文)

Searching for lang:chr, lang:iu, lang:sk seems to fail, as tweets matching the keywords are returned instead of the language.

There are also some special language codes that work. For example:

  • lang:und for undefined language
  • lang:qam for tweets with mentions only (works for tweets since 2022-06-14)
  • lang:qct for tweets with cashtags only (works for tweets since 2022-06-14)
  • lang:qht for tweets with hashtags only (works for tweets since 2022-06-14)
  • lang:qme for tweets with media links (works for tweets since 2022-06-14)
  • lang:qst for tweets with a very short text (works for tweets since 2022-06-14)
  • lang:zxx for tweets with either media or Twitter Card only, without any additional text (works for tweets since 2022-06-14)

Common clients:

source: should work for any API client, try putting the client name in quotes or replace spaces with underscores. This is the App name field that you can alter in the developer app configuration page, so anyone can set anything here and appear to tweet from a made up client.

You cannot copy an existing name. This operator needs to be combined with something else to work, eg: lang:en. These are some common ones:

Official Twitter Clients:

twitter_web_client
twitter_web_app
twitter_for_iphone
twitter_for_ipad
twitter_for_mac
twitter_for_android
twitter_ads
tweetdeck
tweetdeck_web_app
twitter_for_advertisers
twitter_media_studio
cloudhopper (tweets via sms service)

Very Common 3rd Party Clients:

facebook
instagram
twitterfeed
tweetbot.net
IFTTT

Notable, weird and wonderful ones:

"LG Smart Refrigerator"
"GUCCI SmartToilet™"