PhD placeholder: learn-to-rank, decentralised AI, on-device AI, something. #7586
Hmmm, very difficult choice.
Re-read papers regarding learn-to-rank and learned how to use IPv8. With it I created an algorithm which simulates a number of nodes that send messages to one another. From there I worked with Marcel and started implementing a system whereby one node sends a query to the swarm and then receives recommendations of content back from it. The progress is detailed in ticket 7290. There are 2 design choices: One issue discovered concerns the size of the IPv8 network packet, which is currently smaller than the entire model serialized with PyTorch; Marcel is working on that. We have 720k weights at the moment (already roughly 2.9 MB as float32), and the maximum network packet size for IPv8 is 2.7 MB, so we have to fit in as many weight updates per packet as possible. You can see a demonstration of the prototype below: I'm currently working on how to aggregate the recommendations of the swarm (for example, what happens if the recommendations of each node which received the query are entirely different?). My branch on Marcel's repository: https://github.com/mg98/p2p-ol2r/tree/petrus-branch
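A minimal sketch of the packet-size workaround discussed above: serialize the model state and split it into chunks that each fit under the IPv8 ceiling. The 2.7 MB figure comes from the comment; the header margin and all function names are assumptions, not the actual prototype code.

```python
# Sketch only: chunk a serialized PyTorch model into IPv8-sized packets.
import io
import torch

PACKET_BUDGET = 2_700_000  # bytes; the IPv8 packet ceiling mentioned above
HEADER_MARGIN = 4_096      # assumed slack for IPv8/serialization overhead

def serialize_model(model: torch.nn.Module) -> bytes:
    buf = io.BytesIO()
    torch.save(model.state_dict(), buf)
    return buf.getvalue()

def to_packets(blob: bytes, budget: int = PACKET_BUDGET - HEADER_MARGIN):
    """Yield (seq, total, chunk) triples, each small enough for one packet."""
    total = -(-len(blob) // budget)  # ceiling division
    for seq in range(total):
        yield seq, total, blob[seq * budget:(seq + 1) * budget]

def from_packets(packets) -> bytes:
    """Reassemble chunks; assumes every packet arrived (no retransmit logic)."""
    ordered = sorted(packets, key=lambda p: p[0])
    return b"".join(chunk for _, _, chunk in ordered)
```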
It's beyond amazing what you accomplished in 6 weeks after starting your PhD. 🦄 🦄 🦄 Can we upgrade to transformers? That is the cardinal question for scientific output. We had distributed AI in unusable form deployed already in 2012 within our Tribler network. Doing model updates is too complex compared to simply starting with sending training triplets around in an IPv8 community. The key is simplicity, ease of deployment, correctness, and ease of debugging. Nobody has a self-organising live AI with lifelong learning, as you have today in embryonic form. We even removed our deployed clicklog code in 2015 because it was not good enough. Options:
For a YouTube-alternative smartphone app we have a single simple network primitive:
Next sprint goal: get a performance graph!
After looking into what datasets we could use for training a hypothetical model, I found ORCAS, which consists of almost 20 million queries and the relevant website link for each query. It is compiled by Microsoft and represents searches made on Bing over a period of a few months (with a few caveats to preserve privacy, such as showing only queries which have been searched a number of times, and not including user IDs and the like). The data seems good, but the fact that we have links instead of titles of documents makes it impossible to use the triplet model we have right now (where we need to calculate the 768-dimension embedding of the title of the document: since we only have a link and no document title, we cannot do that). So I was looking for another model architecture usable in our predicament and found Transformer Memory as a Differentiable Search Index. The paper argues that instead of using a dual-encoder method (where we encode the query and the document in the same space and then find the document which is the nearest neighbour to the query), we can use the differentiable search index (DSI), where a neural network maps the query directly to the document. The paper presents a number of methods to achieve this, but the easiest one for me to implement at this time was to simply assign each document a number, have the output layer of the network be composed of as many neurons as there are documents, and make the network essentially assign probabilities to each document given a query. Additionally, the paper performs this work with a Transformer architecture, raising the possibility of us integrating nanoGPT into the future architecture. I implemented an intermediate version of the network whereby the same encoder that Marcel used (the allenai/specter language model) encodes a query and the output is the probability of each document individually; a sketch of this variant is shown below. The rest of the architecture is left unmodified. Moving forward, I'm looking to finally implement a good number of peers in a network that send each other the query and answer (from ORCAS) and get the model to train.
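To make the intermediate DSI variant concrete, here is a hedged sketch: a frozen allenai/specter encoder produces the 768-d query embedding and a linear head assigns a probability to every document ID. The document count and class structure are illustrative assumptions, not the actual implementation.

```python
# Sketch: DSI as classification over document IDs, with a frozen specter encoder.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

NUM_DOCS = 100_000  # illustrative; ORCAS itself is far larger

tokenizer = AutoTokenizer.from_pretrained("allenai/specter")
encoder = AutoModel.from_pretrained("allenai/specter")

class DSIHead(nn.Module):
    def __init__(self, num_docs: int, dim: int = 768):
        super().__init__()
        self.classifier = nn.Linear(dim, num_docs)

    def forward(self, query: str) -> torch.Tensor:
        tokens = tokenizer(query, return_tensors="pt", truncation=True)
        with torch.no_grad():  # encoder kept frozen in this sketch
            emb = encoder(**tokens).last_hidden_state[:, 0]  # [CLS] embedding
        return self.classifier(emb).softmax(dim=-1)  # P(doc | query)

head = DSIHead(NUM_DOCS)
probs = head("deep learning for information retrieval")
print(probs.argmax(dim=-1))  # index of the most probable document ID
```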
Cool stuff 👍 Could you tell me more about your performance metrics? I have two questions:
This matters a lot for deployment in Tribler.
But keep in mind, this is extremely preliminary; I did not implement NanoGPT with this setup, so that's bound to increase computing requirements.
Paper idea to try out for 2 weeks:
Related-work example of LLMs for search on GitHub, called vimGPT: vimgpt.mov
I got the T5 LLM to generate the IDs of ORCAS documents.
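A hedged sketch of what this step might look like with Hugging Face transformers: in DSI-style training each document ID is treated as a target string for T5 to generate. The checkpoint name is a placeholder and the fine-tuning loop on (query, doc_id) pairs is omitted.

```python
# Sketch: generating candidate ORCAS doc IDs with a (placeholder) T5 model.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")  # in practice, fine-tuned on (query, doc_id) pairs

def generate_doc_ids(query: str, num_candidates: int = 5):
    inputs = tokenizer(query, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_length=12,              # doc IDs are short token sequences
        num_beams=num_candidates,   # beam search over candidate IDs
        num_return_sequences=num_candidates,
    )
    return [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]

print(generate_doc_ids("symptoms of vitamin d deficiency"))
```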
I was looking into what to do moving forward. I found a survey paper on the use of LLMs in the context of information retrieval. It was very informative; there's a LOT of research in this area at the moment. I made a list of 23 papers referenced there that I'm planning to go through at an accelerated pace. At the moment I'm still wondering what to do next to make the work I've already performed publishable by the conference deadline on the 5th of January.
update
In the past weeks I've introduced 10 users who send each other query–doc_id pairs. The mechanism implemented is the following:
For the future I think trying to use DAS6 to perform a test with 100 peers may be worthwhile, to check the integrity of the model and its evolution as the number of peers increases. A rough sketch of this kind of exchange follows below.
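Since the mechanism itself isn't spelled out above, here is only a minimal, assumption-laden simulation of the exchange as described: N peers gossip (query, doc_id) pairs to random neighbours and buffer what they receive for local training. All names and the fanout are illustrative.

```python
# Sketch: peers gossiping (query, doc_id) pairs; training step left abstract.
import random

class Peer:
    def __init__(self, pid: int):
        self.pid = pid
        self.seen_pairs = []  # stands in for a local training buffer

    def receive(self, query: str, doc_id: str):
        self.seen_pairs.append((query, doc_id))
        # in the real prototype, a training step on the local model goes here

def gossip_round(peers, pairs, fanout: int = 2):
    for query, doc_id in pairs:
        for target in random.sample(peers, fanout):
            target.receive(query, doc_id)

peers = [Peer(i) for i in range(10)]   # 10 users, as in the experiment above
gossip_round(peers, [("best c++ tutorial", "D1834952")])  # ORCAS-style pair (illustrative)
print(sum(len(p.seen_pairs) for p in peers))  # pairs delivered this round
```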
AI with access to all human knowledge, art, and entertainment. AGI could help humanity by developing new drugs and treatments for diseases, and by turbocharging the global economy.
Related: How is AI impacting science? (Metascience 2023 Conference in Washington, D.C., May 2023.)
Public AI with associative democracy
Who owns AI? Who owns The Internet, Bitcoin, and Bittorrent? We applied public infrastructure principles to AI. We build an AI ecosystem which is owned by both nobody and everybody. The result is a democratically self-governing association for AI. We pioneered 1) a new ownership model for AI, 2) a novel model for training, and 3) competitive access to GPU hardware. AI should be public and contribute to the common good. More than just open weights, we envision full democratic self-governance. AI improvements are a social process! The way to create long-enduring communities is to slowly grow and evolve them. The first permissionless open source machine learning infrastructure was Internet-deployed in 2012.
Solid progress! Operational decentralised machine learning 🚀 🚀 🚀 De-DSI for the win. A possible next step is enabling unbounded scalability and on-device LLMs. See Enabling On-Device Large Language Model Personalization with Self-Supervised Data Selection and Synthesis or the knowledge graph direction. We might want to schedule both! New hardware will come for the on-device 1-bit LLM era.
update: Nature paper 😲 Uses an LLM for parsing 1200 sentences and 1100 abstracts of scientific papers. Avoids the hard work of PDF knowledge extraction. Structured information extraction from scientific text with large language models
Poster for the De-DSI paper:
In the last few days I've read papers on
I also thought about how a mixture-of-experts with multi-layered semantic sharding would work (a rough routing sketch is below). At the moment, something I could try would be:
I also haven't found any paper on personalized models in decentralized federated learning, so it would be an unexplored gap and thus maybe easy to publish about.
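One possible reading of the semantic-sharding intuition, sketched under heavy assumptions: cluster item embeddings, treat each cluster as an "expert" shard, and route a query to the shard whose centroid is nearest. Deeper sharding layers would re-cluster within a shard; none of this is the actual design.

```python
# Sketch: first layer of semantic sharding via k-means routing.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
item_embeddings = rng.normal(size=(1000, 64))  # placeholder semantic vectors

NUM_SHARDS = 8  # assumed shard count for the first sharding layer
router = KMeans(n_clusters=NUM_SHARDS, n_init="auto").fit(item_embeddings)

def route(query_embedding: np.ndarray) -> int:
    """Return the shard (expert) responsible for this query."""
    return int(router.predict(query_embedding.reshape(1, -1))[0])

print(route(rng.normal(size=64)))
```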
Focus on finding a PhD problem to solve. Avoid the "technology push" that makes much science useless. We need GPUs for training. We need a dataset. We need a publishable problem. Perhaps it is time to dive for 3 weeks into a production system? Some ideas and links
Hipster publishable idea: secure information dissemination for decentralised AI (e.g. MeritRank, clicklog, long-lived ID, sharing data, not an unverifiable vector of gradient descent)
btw about teaching... prepare for helping out with MSc students more + the master course on Blockchain Engineering.
update: machine learning for 1) personalisation 2) De-DSI content discovery 3) decentralised seeder content discovery {the DHT becomes 👉 IPv4 generative AI} 4) sybil protection 5) spam protection 6) learn-to-rank
In the last few weeks I was on vacation. After that, I got a recommendation engine working based on collaborative filtering of the MovieLens dataset. Nothing too fancy, just an SVD algorithm applied to the MovieLens-1M data (a sketch follows after the next paragraph). I've also read a few papers, including a literature review on foundation models in recommendation algorithms. I got two preliminary ideas for future research that I haven't yet seen implemented:
The two ideas could be used together as well, I imagine.
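For reference, a hedged sketch of the SVD baseline mentioned above, using the `surprise` library (one common choice; the actual implementation may differ in library and hyperparameters).

```python
# Sketch: plain SVD collaborative filtering on MovieLens-1M with `surprise`.
from surprise import SVD, Dataset, accuracy
from surprise.model_selection import train_test_split

data = Dataset.load_builtin("ml-1m")  # prompts a download on first use
trainset, testset = train_test_split(data, test_size=0.2)

algo = SVD(n_factors=100)  # latent dimensionality is an assumption
algo.fit(trainset)
accuracy.rmse(algo.test(testset))

# Predict how user "1" would rate movie "1193"
print(algo.predict(uid="1", iid="1193").est)
```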
Still a few months left to find a great paper idea 🕙 "As simple as possible" architecture: 3 items sent; 3 recommended items received.
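A minimal sketch of that primitive, under stated assumptions: a peer receives 3 item IDs and answers with the 3 items that co-occur most often with them in its own local, never-shared clicklog. The co-occurrence heuristic and all data are illustrative.

```python
# Sketch: 3 items in, 3 recommended items out, from a local clicklog.
from collections import Counter

LOCAL_CLICKLOG = [  # illustrative sessions of MovieLens-style item IDs
    [1, 32, 260], [1, 260, 1196], [32, 50, 260], [50, 1196, 2571],
]

def recommend(liked: list[int], k: int = 3) -> list[int]:
    counts = Counter()
    for session in LOCAL_CLICKLOG:
        if any(item in session for item in liked):
            counts.update(i for i in session if i not in liked)
    return [item for item, _ in counts.most_common(k)]

print(recommend([1, 32, 50]))  # -> 3 recommended item IDs
```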
Paper idea: aim to have a recommender without clicklog leakage. No text queries. Peers do not explicitly exchange profiles. Spread real clicklog snippets from an unknown peer. Focus on unlinkability. Peers replay old recommendation requests to hide their own requests. Use this as a naive approach, with a known spam vulnerability.
goal for 19 Aug 2024: the above architecture. 100 IPv8 peers listening; send 3 items to a random peer and you get 3 recommended items back. MovieLens. Outcome format: a single amazing .GIF .... 🎉
update: share the embedding with another user. This could somehow be used to train a model. On-device model. One protocol query/response for both real-time search/recommendation and online continual learning in the background. Build upon our strength: permissionless gen-AI with full scalability. Possible goal:
In the last 2 months I went with Marcel to the Oxford NLP summer school, took a vacation back home, and worked on an idea I had recently. I refreshed my understanding of the topic, not having touched it professionally in the last few years. The professor was from King Abdullah University in Saudi Arabia; his name is Naeemullah Khan. While there I thought more deeply about an idea I came up with previously, and pitched it to Prof. Khan and a postdoc from a lab at Oxford, Dr. Naman Goel. The idea is to use the upcoming Microsoft Recall feature (which takes screenshots of the activity on the PC every few minutes) to get an idea of the preferences of the user. These preferences can be used to generate query recommendations for web services, including Tribler. Both Prof. Khan and Dr. Goel gave their approval, and Dr. Goel even said he's willing to contribute with weekly calls and analysis of results (the code would be my task).
Venue: LCN or the Collective Intelligence journal: https://journals.sagepub.com/editorial-board/COL
A potentially interesting topic for your PhD is to check out self-evolving distributed ontologies based on tries and, at least in this text-based proof of concept, on Gemini (but other models like ChatGPT should also work). Of course, communicating with the Gemini model using human language is (probably) not a good way forward, and this would need some more sophisticated hooking into the underlying model (i.e., Gemini here). My txt-based intuition is here: learningtrees.txt
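My reading of the trie intuition, as a minimal sketch (the actual proposal lives in learningtrees.txt; everything here is assumed): ontology concepts are paths in a trie, and each observed label path reinforces the nodes it traverses, so the ontology "evolves" with usage.

```python
# Sketch: a usage-weighted trie as a self-evolving ontology.
class TrieNode:
    def __init__(self):
        self.children: dict[str, "TrieNode"] = {}
        self.weight = 0  # how often this concept has been observed

    def insert(self, path: list[str]):
        node = self
        for concept in path:
            node = node.children.setdefault(concept, TrieNode())
            node.weight += 1  # self-evolving: usage reinforces the branch

    def dump(self, depth: int = 0):
        for concept, child in sorted(self.children.items()):
            print("  " * depth + f"{concept} ({child.weight})")
            child.dump(depth + 1)

root = TrieNode()
root.insert(["science", "ai", "nlp"])
root.insert(["science", "ai", "vision"])
root.dump()
```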
Great progress! For next sprint
update
update2
Since you're doing automatic content analysis for decentralized search, here are some papers for related work:
For the past 2 weeks I was reading papers and trying to understand the cutting edge in distributed training. In particular, I focused on a recent preprint paper. I spent time understanding the mathematics of the issue (convergence and privacy guarantees) and made good progress. I realised that in order to perform this kind of work I would need to go through the references to understand the theorems used in this field. This would take a while; it remains to be decided whether it's a good use of my time. Additionally, I ran their algorithm, posted here.
A systems or networking storyline for publication: IEEE LCN or PETS or Middleware. The future ambition is NeurIPS or ICML. For the next meeting in 2 weeks: attack ideas, IPv8 porting effort, get an experiment graph out of SHATTER
I have further looked into the code from SHATTER, data inference/reconstruction attack methods, and (as per Jeremie's recommendation) into MixNN, which does similar work, though more basic. I presented the attack idea on models which mix their parameters and send them to different people to Dr. Naman Goel from the Oxford lab, and he suggested that since the method is not widely accepted, it may be an attack on an architecture which not many people use, thus being not very interesting. I thought of looking into byzantine attacks in decentralized networks, then saw that a normal gradient-similarity method was already published in June this year, so I'd have to see if I can come up with something new. I found a literature review on the topic which I believe would be useful to read.
Idea: a user has consumed some content, each item with a semantic coordinate (calculated with an LLM, for example). Then we calculate the semantic coordinate of the user as the average of the coordinates of the content they have consumed. If I search with a query, I get the coordinates of the query and then check around me for the people whose semantic coordinates are closest to the query; I then ask them, as they are the users most likely to have content in which I'm interested. A sketch of this routing follows below.
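A hedged sketch of that routing idea: a user's semantic coordinate is the mean of the embeddings of consumed content, and a query is routed to the peers whose coordinates lie closest to it. The embeddings here are random stand-ins for LLM-derived vectors.

```python
# Sketch: route a query to the peers with the nearest semantic coordinates.
import numpy as np

rng = np.random.default_rng(42)

def user_coordinate(consumed_embeddings: np.ndarray) -> np.ndarray:
    return consumed_embeddings.mean(axis=0)  # average of consumed content

def nearest_peers(query_emb, peer_coords, k=3):
    # cosine similarity between the query and every peer coordinate
    sims = peer_coords @ query_emb / (
        np.linalg.norm(peer_coords, axis=1) * np.linalg.norm(query_emb)
    )
    return np.argsort(-sims)[:k]  # indices of the k most promising peers

peer_coords = np.stack(
    [user_coordinate(rng.normal(size=(20, 64))) for _ in range(50)]
)
print(nearest_peers(rng.normal(size=64), peer_coords))
```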
A document is needed for the PhD progress meeting. Mixture-of-Experts scaling is a great opportunity for decentralisation, which we already talked about on 18 Oct 2023. Idea outline:
update: much related work exists on 6G federated learning. Yet it is highly theoretical, impractical, and immature. Great stuff to help realise for real 😃 IEEE/ACM Transactions on Networking CfP
15 Jan 2025 deadline, super rush! 🤔
Idea 1: Decentralized file-search based on taste embeddings
Description: When searching for a file in a decentralized network, instead of flooding the network with the query, the system finds people who have items similar to my query and only queries them.
Idea 2: Decentralized learning with model-parallelism
Description: Investigate different aspects of model training in decentralized networks when single nodes can hold only a section of the model (a sketch follows below). Methodology:
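A minimal sketch of the model-parallel setting in Idea 2, under assumptions: two simulated nodes each hold only a section of the model, and the forward pass moves activations (not parameters) between them. A real deployment would replace the direct function call with a network hop; layer sizes are illustrative.

```python
# Sketch: pipeline-style model parallelism across two simulated nodes.
import torch
import torch.nn as nn

class NodeA(nn.Module):  # holds the first section of the model
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(nn.Linear(64, 128), nn.ReLU())

    def forward(self, x):
        return self.layers(x)

class NodeB(nn.Module):  # holds the second section
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(nn.Linear(128, 10))

    def forward(self, activations):
        return self.layers(activations)

node_a, node_b = NodeA(), NodeB()
x = torch.randn(8, 64)
logits = node_b(node_a(x))  # "sending" activations from node A to node B
print(logits.shape)  # torch.Size([8, 10])
```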
Brief update after a discussion with Naman: both ideas should be pursued at the same time. If the second fails to deliver because the field is too crowded, at least I have the first one. So the general plan with De-DSI:
And the general plan with the decentralized model-parallel training:
Some reading pointers:
- Semantic Overlay Networks. Arturo Crespo and Hector Garcia-Molina
- Kademlia: A Peer-to-Peer Information System Based on the XOR Metric
- Epidemic Broadcast Trees
ToDo: determine PhD focus and scope
PhD funding project: https://www.tudelft.nl/en/2020/tu-delft/eur33m-research-funding-to-establish-trust-in-the-internet-economy
Duration: 1 Sep 2023 - 1 Sep 2027
First weeks: reading and learning. See this looong Tribler reading list of 1999-2023 papers, the "short version". The long version is 236 papers 😄 . Run Tribler from the sources.
Before doing fancy decentralised machine learning and learn-to-rank, first have stability, semantic search, and classical algorithms deployed. Current dev team focus: #3868
update: Sprint focus? Read more Tribler articles and get this code going again: https://github.com/devos50/decentralized-rules-prototype
Dreams from a young man 👴 From IETF Journal Oct 2012, "Moving Toward a Censorship-free Internet" (page 16), using phone-to-phone communication as used during the Arab Spring uprising.
Wise words on the difficulty of distributed systems for young engineers/scientists (see also the discussion on Hacker News)