
Commit bb09560

twitter-team committed
[minor] Fix grammar + typo issues
Closes twitter#557, closes twitter#678, closes twitter#748, closes twitter#806, closes twitter#818, closes twitter#842, closes twitter#866, closes twitter#948, closes twitter#1024, closes twitter#1313, closes twitter#1458, closes twitter#1461, closes twitter#1465, closes twitter#1491, closes twitter#1503, closes twitter#1539, closes twitter#1611
1 parent 36588c6 commit bb09560


20 files changed (+138 -158 lines changed)

README.md

+7-7
@@ -1,6 +1,6 @@
-# Twitter Recommendation Algorithm
+# Twitter's Recommendation Algorithm
 
-The Twitter Recommendation Algorithm is a set of services and jobs that are responsible for constructing and serving the
+Twitter's Recommendation Algorithm is a set of services and jobs that are responsible for constructing and serving the
 Home Timeline. For an introduction to how the algorithm works, please refer to our [engineering blog](https://blog.twitter.com/engineering/en_us/topics/open-source/2023/twitter-recommendation-algorithm). The
 diagram below illustrates how major services and jobs interconnect.
 

@@ -13,24 +13,24 @@ These are the main components of the Recommendation Algorithm included in this r
 | Feature | [SimClusters](src/scala/com/twitter/simclusters_v2/README.md) | Community detection and sparse embeddings into those communities. |
 | | [TwHIN](https://github.com/twitter/the-algorithm-ml/blob/main/projects/twhin/README.md) | Dense knowledge graph embeddings for Users and Tweets. |
 | | [trust-and-safety-models](trust_and_safety_models/README.md) | Models for detecting NSFW or abusive content. |
-| | [real-graph](src/scala/com/twitter/interaction_graph/README.md) | Model to predict likelihood of a Twitter User interacting with another User. |
+| | [real-graph](src/scala/com/twitter/interaction_graph/README.md) | Model to predict the likelihood of a Twitter User interacting with another User. |
 | | [tweepcred](src/scala/com/twitter/graph/batch/job/tweepcred/README) | Page-Rank algorithm for calculating Twitter User reputation. |
 | | [recos-injector](recos-injector/README.md) | Streaming event processor for building input streams for [GraphJet](https://github.com/twitter/GraphJet) based services. |
 | | [graph-feature-service](graph-feature-service/README.md) | Serves graph features for a directed pair of Users (e.g. how many of User A's following liked Tweets from User B). |
 | Candidate Source | [search-index](src/java/com/twitter/search/README.md) | Find and rank In-Network Tweets. ~50% of Tweets come from this candidate source. |
 | | [cr-mixer](cr-mixer/README.md) | Coordination layer for fetching Out-of-Network tweet candidates from underlying compute services. |
-| | [user-tweet-entity-graph](src/scala/com/twitter/recos/user_tweet_entity_graph/README.md) (UTEG)| Maintains an in memory User to Tweet interaction graph, and finds candidates based on traversals of this graph. This is built on the [GraphJet](https://github.com/twitter/GraphJet) framework. Several other GraphJet based features and candidate sources are located [here](src/scala/com/twitter/recos) |
+| | [user-tweet-entity-graph](src/scala/com/twitter/recos/user_tweet_entity_graph/README.md) (UTEG)| Maintains an in memory User to Tweet interaction graph, and finds candidates based on traversals of this graph. This is built on the [GraphJet](https://github.com/twitter/GraphJet) framework. Several other GraphJet based features and candidate sources are located [here](src/scala/com/twitter/recos). |
 | | [follow-recommendation-service](follow-recommendations-service/README.md) (FRS)| Provides Users with recommendations for accounts to follow, and Tweets from those accounts. |
-| Ranking | [light-ranker](src/python/twitter/deepbird/projects/timelines/scripts/models/earlybird/README.md) | Light ranker model used by search index (Earlybird) to rank Tweets. |
+| Ranking | [light-ranker](src/python/twitter/deepbird/projects/timelines/scripts/models/earlybird/README.md) | Light Ranker model used by search index (Earlybird) to rank Tweets. |
 | | [heavy-ranker](https://github.com/twitter/the-algorithm-ml/blob/main/projects/home/recap/README.md) | Neural network for ranking candidate tweets. One of the main signals used to select timeline Tweets post candidate sourcing. |
-| Tweet mixing & filtering | [home-mixer](home-mixer/README.md) | Main service used to construct and serve the Home Timeline. Built on [product-mixer](product-mixer/README.md) |
+| Tweet mixing & filtering | [home-mixer](home-mixer/README.md) | Main service used to construct and serve the Home Timeline. Built on [product-mixer](product-mixer/README.md). |
 | | [visibility-filters](visibilitylib/README.md) | Responsible for filtering Twitter content to support legal compliance, improve product quality, increase user trust, protect revenue through the use of hard-filtering, visible product treatments, and coarse-grained downranking. |
 | | [timelineranker](timelineranker/README.md) | Legacy service which provides relevance-scored tweets from the Earlybird Search Index and UTEG service. |
 | Software framework | [navi](navi/navi/README.md) | High performance, machine learning model serving written in Rust. |
 | | [product-mixer](product-mixer/README.md) | Software framework for building feeds of content. |
 | | [twml](twml/README.md) | Legacy machine learning framework built on TensorFlow v1. |
 
-We include Bazel BUILD files for most components, but not a top level BUILD or WORKSPACE file.
+We include Bazel BUILD files for most components, but not a top-level BUILD or WORKSPACE file.
 
 ## Contributing
 
ann/src/main/python/dataflow/faiss_index_bq_dataset.py

+1-1
@@ -91,7 +91,7 @@ def parse_metric(config):
 elif metric_str == "linf":
 return faiss.METRIC_Linf
 else:
-raise Exception(f"Uknown metric: {metric_str}")
+raise Exception(f"Unknown metric: {metric_str}")
 
 
 def run_pipeline(argv=[]):

cr-mixer/README.md

+2-2
@@ -2,6 +2,6 @@
 
 CR-Mixer is a candidate generation service proposed as part of the Personalization Strategy vision for Twitter. Its aim is to speed up the iteration and development of candidate generation and light ranking. The service acts as a lightweight coordinating layer that delegates candidate generation tasks to underlying compute services. It focuses on Twitter's candidate generation use cases and offers a centralized platform for fetching, mixing, and managing candidate sources and light rankers. The overarching goal is to increase the speed and ease of testing and developing candidate generation pipelines, ultimately delivering more value to Twitter users.
 
-CR-Mixer act as a configurator and delegator, providing abstractions for the challenging parts of candidate generation and handling performance issues. It will offer a 1-stop-shop for fetching and mixing candidate sources, a managed and shared performant platform, a light ranking layer, a common filtering layer, a version control system, a co-owned feature switch set, and peripheral tooling.
+CR-Mixer acts as a configurator and delegator, providing abstractions for the challenging parts of candidate generation and handling performance issues. It will offer a 1-stop-shop for fetching and mixing candidate sources, a managed and shared performant platform, a light ranking layer, a common filtering layer, a version control system, a co-owned feature switch set, and peripheral tooling.
 
-CR-Mixer's pipeline consists of 4 steps: source signal extraction, candidate generation, filtering, and ranking. It also provides peripheral tooling like scribing, debugging, and monitoring. The service fetches source signals externally from stores like UserProfileService and RealGraph, calls external candidate generation services, and caches results. Filters are applied for deduping and pre-ranking, and a light ranking step follows.
+CR-Mixer's pipeline consists of 4 steps: source signal extraction, candidate generation, filtering, and ranking. It also provides peripheral tooling like scribing, debugging, and monitoring. The service fetches source signals externally from stores like UserProfileService and RealGraph, calls external candidate generation services, and caches results. Filters are applied for deduping and pre-ranking, and a light ranking step follows.
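The CR-Mixer README describes a four-step flow: source signal extraction, candidate generation with cached results, filtering (deduping), and light ranking. The Python sketch below illustrates only that control flow under stated assumptions; it is not CR-Mixer's code (the service is written in Scala), and `Candidate`, `mix_candidates`, and the injected callables are hypothetical stand-ins for the external stores and compute services the README mentions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Candidate:
    # Hypothetical candidate record; CR-Mixer's real types are not part of this diff.
    tweet_id: int
    score: float


def mix_candidates(
    user_id: int,
    extract_signals: Callable[[int], List[int]],
    generators: List[Callable[[int], List[Candidate]]],
    light_rank: Callable[[Candidate], float],
    cache: Dict[int, List[Candidate]],
    limit: int = 10,
) -> List[Candidate]:
    # Step 1: source signal extraction (stand-in for stores like UserProfileService or RealGraph).
    signals = extract_signals(user_id)

    # Step 2: candidate generation, calling each generator once per signal and caching results.
    candidates: List[Candidate] = []
    for signal in signals:
        if signal not in cache:
            cache[signal] = [c for generate in generators for c in generate(signal)]
        candidates.extend(cache[signal])

    # Step 3: filtering, here only deduping by tweet id (first occurrence wins).
    seen = set()
    deduped: List[Candidate] = []
    for candidate in candidates:
        if candidate.tweet_id not in seen:
            seen.add(candidate.tweet_id)
            deduped.append(candidate)

    # Step 4: light ranking, keeping the `limit` highest-scoring candidates.
    return sorted(deduped, key=light_rank, reverse=True)[:limit]
```

Passing the signal extractor, generators, and ranker in as functions mirrors the README's framing of CR-Mixer as a delegator: the coordinating logic stays small while the expensive work happens in the services it calls.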

recos-injector/README.md

+10-14
@@ -1,13 +1,10 @@
-# recos-injector
-Recos-Injector is a streaming event processor for building input streams for GraphJet based services.
-It is general purpose in that it consumes arbitrary incoming event stream (e.x. Fav, RT, Follow, client_events, etc), applies
-filtering, combines and publishes cleaned up events to corresponding GraphJet services.
-Each GraphJet based service subscribes to a dedicated Kafka topic. Recos-Injector enables a GraphJet based service to consume any
-event it wants
+# Recos-Injector
 
-## How to run recos-injector-server tests
+Recos-Injector is a streaming event processor used to build input streams for GraphJet-based services. It is a general-purpose tool that consumes arbitrary incoming event streams (e.g., Fav, RT, Follow, client_events, etc.), applies filtering, and combines and publishes cleaned up events to corresponding GraphJet services. Each GraphJet-based service subscribes to a dedicated Kafka topic, and Recos-Injector enables GraphJet-based services to consume any event they want.
 
-Tests can be run by using this command from your project's root directory:
+## How to run Recos-Injector server tests
+
+You can run tests by using the following command from your project's root directory:
 
 $ bazel build recos-injector/...
 $ bazel test recos-injector/...
@@ -28,17 +25,16 @@ terminal:
 $ curl -s localhost:9990/admin/ping
 pong
 
-Run `curl -s localhost:9990/admin` to see a list of all of the available admin
-endpoints.
+Run `curl -s localhost:9990/admin` to see a list of all available admin endpoints.
 
-## Querying recos-injector-server from a Scala console
+## Querying Recos-Injector server from a Scala console
 
-Recos Injector does not have a thrift endpoint. It reads Event Bus and Kafka queues and writes to recos_injector kafka.
+Recos-Injector does not have a Thrift endpoint. Instead, it reads Event Bus and Kafka queues and writes to the Recos-Injector Kafka.
 
 ## Generating a package for deployment
 
-To package your service into a zip for deployment:
+To package your service into a zip file for deployment, run:
 
 $ bazel bundle recos-injector/server:bin --bundle-jvm-archive=zip
 
-If successful, a file `dist/recos-injector-server.zip` will be created.
+If the command is successful, a file named `dist/recos-injector-server.zip` will be created.
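Recos-Injector's description reduces to filter-then-route: events arrive, unwanted ones are dropped, and the rest are published to the dedicated Kafka topic of each GraphJet-based service that wants them. The Python sketch below models that routing with in-memory lists instead of Kafka. `Event`, `inject`, the subscription map, and the topic names in the usage are hypothetical, and the "combine" step from the README is not modeled.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Event:
    # Hypothetical event record.
    kind: str       # e.g. "Fav", "RT", "Follow"
    user_id: int
    target_id: int


def inject(
    events: Iterable[Event],
    subscriptions: Dict[str, List[str]],
    keep: Callable[[Event], bool],
) -> Dict[str, List[Event]]:
    # In-memory stand-in for per-service Kafka topics: topic name -> published events.
    topics: Dict[str, List[Event]] = defaultdict(list)
    for event in events:
        # Filtering step: drop events the (hypothetical) predicate rejects.
        if not keep(event):
            continue
        # Fan-out step: publish to every topic whose service subscribed to this event kind.
        for topic, kinds in subscriptions.items():
            if event.kind in kinds:
                topics[topic].append(event)
    return dict(topics)
```

For example, `inject(events, {"uteg-events": ["Fav", "RT"], "follow-events": ["Follow"]}, lambda e: e.user_id > 0)` sends likes and retweets to one topic and follows to another, dropping any event with a non-positive user id.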

simclusters-ann/README.md

+1-1
@@ -15,7 +15,7 @@ SimClusters from the Linear Algebra Perspective discussed the difference between
 However, calculating the cosine similarity between two Tweets is pretty expensive in Tweet candidate generation. In TWISTLY, we scan at most 15,000 (6 source tweets * 25 clusters * 100 tweets per clusters) tweet candidates for every Home Timeline request. The traditional algorithm needs to make API calls to fetch 15,000 tweet SimCluster embeddings. Consider that we need to process over 6,000 RPS, it’s hard to support by the existing infrastructure.
 
 
-## SimClusters Approximate Cosine Similariy Core Algorithm
+## SimClusters Approximate Cosine Similarity Core Algorithm
 
 1. Provide a source SimCluster Embedding *SV*, *SV = [(SC1, Score), (SC2, Score), (SC3, Score) …]*
 
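The hunk above shows only step 1 of the core algorithm; the remaining steps are outside this diff. As an assumption, the Python sketch below uses a common inverted-index approach: keep the source embedding's strongest clusters, look up the Tweets stored under each cluster, and sum the products of scores. If each stored score comes from an L2-normalized Tweet embedding, that sum equals the cosine similarity restricted to the retrieved clusters. `approx_cosine`, `cluster_index`, and `top_clusters` are hypothetical names, and `MAX_SCANNED` only restates the README's 6 * 25 * 100 = 15,000 bound.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# [(cluster_id, score), ...], the same shape as SV in the README.
Embedding = List[Tuple[int, float]]

# Upper bound quoted in the README: 6 source Tweets * 25 clusters * 100 Tweets per cluster.
MAX_SCANNED = 6 * 25 * 100  # 15,000


def l2_normalize(embedding: Embedding) -> Embedding:
    norm = math.sqrt(sum(score * score for _, score in embedding))
    return [(c, s / norm) for c, s in embedding] if norm > 0 else list(embedding)


def approx_cosine(
    source: Embedding,
    cluster_index: Dict[int, List[Tuple[int, float]]],
    top_clusters: int = 25,
) -> Dict[int, float]:
    # Keep only the source's strongest clusters, after L2 normalization.
    src = sorted(l2_normalize(source), key=lambda cs: cs[1], reverse=True)[:top_clusters]
    scores: Dict[int, float] = defaultdict(float)
    for cluster_id, src_score in src:
        # Hypothetical index: cluster id -> [(tweet_id, L2-normalized tweet score)].
        # Summing products over shared clusters gives a partial cosine similarity.
        for tweet_id, tweet_score in cluster_index.get(cluster_id, []):
            scores[tweet_id] += src_score * tweet_score
    return dict(scores)
```

With non-negative scores the result can only underestimate the full cosine, because clusters a Tweet shares with the source but that were not retrieved contribute nothing. In exchange, this sketch reads scores straight from the index instead of fetching a full embedding for every candidate.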
src/java/com/twitter/search/common/converter/earlybird/BasicIndexingConverter.java

+2-2
@@ -513,12 +513,12 @@ public static void buildRetweetAndReplyFields(
 Optional<Long> inReplyToUserId = Optional.of(inReplyToUserIdVal).filter(x -> x > 0);
 Optional<Long> inReplyToStatusId = Optional.of(inReplyToStatusIdVal).filter(x -> x > 0);
 
-// We have six combinations here. A tweet can be
+// We have six combinations here. A Tweet can be
 // 1) a reply to another tweet (then it has both in-reply-to-user-id and
 // in-reply-to-status-id set),
 // 2) directed-at a user (then it only has in-reply-to-user-id set),
 // 3) not a reply at all.
-// Additionally, it may or may not be a retweet (if it is, then it has retweet-user-id and
+// Additionally, it may or may not be a Retweet (if it is, then it has retweet-user-id and
 // retweet-status-id set).
 //
 // We want to set some fields unconditionally, and some fields (reference-author-id and
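The comment in `buildRetweetAndReplyFields` lists three reply states crossed with "Retweet or not", which gives the six combinations. The Python sketch below classifies raw ids the same way and treats non-positive ids as unset, like the `Optional.of(...).filter(x -> x > 0)` lines. `ReplyKind` and `classify` are hypothetical. Treating a status id without a user id as "not a reply" is an assumption, since the comment does not cover that case.

```python
from enum import Enum
from typing import Optional, Tuple


class ReplyKind(Enum):
    REPLY = "reply"              # both in-reply-to-user-id and in-reply-to-status-id set
    DIRECTED_AT = "directed_at"  # only in-reply-to-user-id set
    NOT_A_REPLY = "not_a_reply"


def _positive(value: int) -> Optional[int]:
    # Mirrors Optional.of(x).filter(x -> x > 0): non-positive ids count as unset.
    return value if value > 0 else None


def classify(
    in_reply_to_user_id: int,
    in_reply_to_status_id: int,
    retweet_user_id: int,
    retweet_status_id: int,
) -> Tuple[ReplyKind, bool]:
    user = _positive(in_reply_to_user_id)
    status = _positive(in_reply_to_status_id)
    if user is not None and status is not None:
        kind = ReplyKind.REPLY
    elif user is not None:
        kind = ReplyKind.DIRECTED_AT
    else:
        # Assumption: a status id without a user id is treated as "not a reply".
        kind = ReplyKind.NOT_A_REPLY
    # A Retweet has both retweet-user-id and retweet-status-id set.
    is_retweet = _positive(retweet_user_id) is not None and _positive(retweet_status_id) is not None
    return kind, is_retweet
```

Returning the pair `(kind, is_retweet)` keeps the two independent axes from the comment separate, so each of the six cases is a distinct value.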

src/java/com/twitter/search/earlybird/ml/ScoringModelsManager.java

+2-2
@@ -22,13 +22,13 @@
 /**
 * Loads the scoring models for tweets and provides access to them.
 *
-* This class relies on a list ModelLoader objects to retrieve the objects from them. It will
+* This class relies on a list of ModelLoader objects to retrieve the objects from them. It will
 * return the first model found according to the order in the list.
 *
 * For production, we load models from 2 sources: classpath and HDFS. If a model is available
 * from HDFS, we return it, otherwise we use the model from the classpath.
 *
-* The models used in for default requests (i.e. not experiments) MUST be present in the
+* The models used for default requests (i.e. not experiments) MUST be present in the
 * classpath, this allows us to avoid errors if they can't be loaded from HDFS.
 * Models for experiments can live only in HDFS, so we don't need to redeploy Earlybird if we
 * want to test them.
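The Javadoc describes an ordered list of loaders in which the first model found wins, with HDFS checked before the classpath in production. The Python sketch below shows that lookup order. `Loader`, `load_first`, and `missing_from_classpath` are hypothetical, and plain dicts stand in for the HDFS and classpath sources; this is not the Earlybird `ModelLoader` API.

```python
from typing import Callable, Iterable, List, Optional, Sequence

# Hypothetical loader shape: returns the named model, or None if this source lacks it.
Loader = Callable[[str], Optional[object]]


def load_first(model_name: str, loaders: Sequence[Loader]) -> Optional[object]:
    # Try loaders in list order and return the first model found.
    for loader in loaders:
        model = loader(model_name)
        if model is not None:
            return model
    return None


def missing_from_classpath(default_models: Iterable[str], classpath: Loader) -> List[str]:
    # Default (non-experiment) models must be loadable from the classpath alone.
    return [name for name in default_models if classpath(name) is None]
```

With `loaders = [hdfs.get, classpath.get]`, an HDFS copy overrides the classpath copy. If HDFS has nothing (modeled here as a loader returning `None`), the classpath copy still serves default requests, which is why the Javadoc requires default models to be on the classpath.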
