Cache and output files for "users", "search", and "likes" mode conflicts #23

Open
nadesai opened this issue Feb 2, 2022 · 1 comment
Labels
bug Something isn't working

Comments

Contributor

nadesai commented Feb 2, 2022

Currently, the "users", "search", and "likes" modes conflict because they reuse some of the same cache files, as noted in #17 and #18.

For example, under default settings, here are the cache files associated with the target eleurent for different modes:

  • users mode: cache data stored in out/eleurent/cache/followers.json and out/eleurent/cache/friends.json
  • search mode: cache data stored in out/eleurent/cache/followers.json and out/eleurent/cache/tweets.json
  • likes mode: cache data stored in out/eleurent/cache/followers.json and out/eleurent/cache/tweets.json

Note also that for all three modes, the final results are written to out/eleurent/edges.csv and out/eleurent/nodes.csv. Running different modes against the same target therefore overwrites the result graph, which could lead to data loss.

It may make more sense to add a directory layer under the target name for the mode being used, i.e. use these files instead:

  • users mode: cache data stored in out/eleurent/users/cache/followers.json and out/eleurent/users/cache/friends.json
  • search mode: cache data stored in out/eleurent/search/cache/authors.json and out/eleurent/search/cache/tweets.json
  • likes mode: cache data stored in out/eleurent/likes/cache/authors.json and out/eleurent/likes/cache/tweets.json
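
A minimal sketch of how the mode-scoped cache paths above might be computed, assuming Python and `pathlib`. The names `CACHE_FILES` and `cache_paths` are hypothetical and not taken from the project's code; only the file names and directory layout come from the list above.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical mapping from mode to the cache files proposed in this issue.
CACHE_FILES = {
    "users": ("followers.json", "friends.json"),
    "search": ("authors.json", "tweets.json"),
    "likes": ("authors.json", "tweets.json"),
}


def cache_paths(target: str, mode: str, out_dir: str = "out") -> list[Path]:
    """Return the cache file paths for a target, scoped by mode."""
    if mode not in CACHE_FILES:
        raise ValueError(f"unknown mode: {mode!r}")
    # The mode becomes its own directory layer between the target and cache/.
    cache_dir = Path(out_dir) / target / mode / "cache"
    return [cache_dir / name for name in CACHE_FILES[mode]]
```

With this layout, "search" and "likes" can keep using the same file names without sharing files, because their cache directories differ.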

Perhaps the output files could also sit under this new directory layer, so that the final outputs for the different modes do not overwrite each other. E.g.

  • users mode: final results stored in out/eleurent/users/edges.csv and out/eleurent/users/nodes.csv
  • search mode: final results stored in out/eleurent/search/edges.csv and out/eleurent/search/nodes.csv
  • likes mode: final results stored in out/eleurent/likes/edges.csv and out/eleurent/likes/nodes.csv
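
The same idea for the final outputs could look roughly like this. This is only a sketch: `MODES` and `output_paths` are hypothetical names, and the real code may build its paths differently.

```python
from pathlib import Path

# The three modes discussed in this issue.
MODES = ("users", "search", "likes")


def output_paths(target, mode, out_dir="out"):
    """Return the edges/nodes CSV paths for a target, scoped by mode."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    mode_dir = Path(out_dir) / target / mode
    return {
        "edges": mode_dir / "edges.csv",
        "nodes": mode_dir / "nodes.csv",
    }
```

Since each mode resolves to a distinct directory, a "likes" run would no longer overwrite the graph produced by an earlier "users" run for the same target. The mode directory would need to be created before writing, for example with `Path.mkdir(parents=True, exist_ok=True)`.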
nadesai changed the title from 'Caching logic for "users", "search", and "likes" mode conflicts' to 'Cache and output files for "users", "search", and "likes" mode conflicts' on Feb 2, 2022
Owner

eleurent commented Feb 3, 2022

Yes I think this totally makes sense.

The only "desirable" conflict that I can think of is friendships.json, which can be filled a first time when creating the graph of someone's followers and reused later when creating the graph of their friends, since there is probably a significant overlap. But that is not an essential feature, and it would be preserved by your suggestion anyway, so yes, by all means :)

eleurent added the bug label on Feb 9, 2022