Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #28 from ittia-research/dev
Docs: add factdb, datasets; update to-do
Showing 6 changed files with 91 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
Project-related docs: | ||
- [To-do](./to-do.md): project to-do | ||
- [Workflow](./workflow.md): workflow related docs and to-do | ||
- [FactDB](./factdb.md): a database experiment | ||
- [Experiences](./experiences.md): project experiences | ||
- [Change-log](changelog.md): main project changes | ||
|
||
## Reminders | ||
LLMs are getting better and better at their jobs. We need to be aware of the temptation to outsource everything, including the foundation (facts) and decision making (thoughts, verdicts), to LLMs. | ||
|
||
Be aware of closed information ecosystems. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
Collection of datasets and related models, tools, code, etc. | ||
|
||
## Index & Search | ||
- [Google Cloud Datasets](https://cloud.google.com/datasets) | ||
- [Google Dataset Search](https://datasetsearch.research.google.com) | ||
|
||
## Datasets | ||
### Data Commons | ||
A knowledge database by Google, covering public as well as private data. | ||
- [Concept](https://docs.datacommons.org/data_model.html) | ||
|
||
Models: | ||
- [DataGemma](https://blog.google/technology/ai/google-datagemma-ai-llm/) | ||
* [paper](https://arxiv.org/abs/2409.13741) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,2 @@ | ||
## Prompt | ||
- Unclear reasoning or logic in a prompt might have a huge impact on an LLM's reasoning ability. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,39 @@ | ||
Media, including social media, should follow some basic standards. | ||
|
||
Words are not just words: they expand, and they expand differently for different people at different times. They are like seeds; a web grows out of them and keeps expanding. One goal of countering disinformation is to stop the wrong ones from expanding. | ||
|
||
We have some common expansion directions, for example: | ||
- When an object is connected with a bad thing: blame | ||
|
||
## Goals | ||
- Index the web in the age of automation | ||
- [ ] What are the new data (database) standards in the age of AI? | ||
- [ ] How do LLMs store facts? | ||
- [ ] Maybe we can learn how to build a better database by separating the fact part from the LLM? | ||
- [ ] With a separate and accurate way to store facts, maybe the better way to provide facts to an LLM is to connect the database with the LLM natively. | ||
- research: | ||
- https://arxiv.org/pdf/2308.09124 | ||
- https://arxiv.org/pdf/2310.05177 | ||
- [ ] With the rapid development of the information environment, humans alone are destined to lose. The solution is machine vs. machine. | ||
|
||
## To-do | ||
### Get Started | ||
- [ ] Collect examples of mis/dis-information. (Anything related to facts can be used as an example; focus on mis/dis-information at the moment.) | ||
- [ ] How to connect parts of one event in the database? | ||
- [ ] How to connect entities to their roots, and what to choose as roots? | ||
- [ ] From low-level factual events to high-level summary events. | ||
- [ ] Split the examples into events: time range, known entities, what happened (split so that each event can be matched exactly to a source) | ||
- [ ] When a statement is about a general conclusion, when and how to update its related facts to the latest? | ||
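
As a starting point for the event split described in the list above, here is a minimal Python sketch of one possible record shape. The class name `Event`, its field names, the `source` field, and the `overlaps` helper are all assumptions made for illustration; none of them come from the project. The idea is that each event carries a time range, its known entities, and one atomic statement of what happened, so it can be matched against a single source and linked to other events about the same entity.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet


@dataclass(frozen=True)
class Event:
    """One low-level factual event, small enough to match a single source.

    All field names here are hypothetical, chosen only for this sketch.
    """
    start: date                  # start of the time range
    end: date                    # end of the time range (inclusive)
    entities: FrozenSet[str]     # known entities involved
    happened: str                # what happened, as one atomic statement
    source: str                  # where the event was reported (assumed field)


def overlaps(a: Event, b: Event) -> bool:
    """Return True when two events share an entity and their time ranges intersect.

    This is one naive way to decide that two records may be parts of the
    same larger event; a real system would likely need richer rules.
    """
    shares_entity = bool(a.entities & b.entities)
    shares_time = a.start <= b.end and b.start <= a.end
    return shares_entity and shares_time
```

Events that `overlaps` links together could then be grouped under a higher-level summary event, which is one way to approach the "low-level to high-level" item, though the grouping rule itself remains an open question.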
|
||
### Ensure | ||
- [ ] Semantic search as well as exact search | ||
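
To illustrate the difference between the two search modes, here is a small Python sketch using only the standard library. Everything in it is an assumption for illustration: the function names are hypothetical, and the bag-of-words cosine score is only a crude placeholder for the similarity a real semantic search would get from an embedding model. The exact search requires the query to appear as a contiguous token sequence, while the similarity search ranks documents that merely share vocabulary with the query.

```python
import math
import re
from collections import Counter
from typing import List


def tokens(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def exact_search(query: str, docs: List[str]) -> List[int]:
    """Return indices of docs containing the query as a contiguous token sequence."""
    q = tokens(query)
    if not q:
        return []
    hits = []
    for i, doc in enumerate(docs):
        d = tokens(doc)
        if any(d[j:j + len(q)] == q for j in range(len(d) - len(q) + 1)):
            hits.append(i)
    return hits


def similarity(a: str, b: str) -> float:
    """Cosine similarity of bag-of-words counts (placeholder for embeddings)."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0


def similarity_search(query: str, docs: List[str], threshold: float = 0.2) -> List[int]:
    """Return indices of docs scoring at least `threshold`, best match first."""
    scored = [(similarity(query, doc), i) for i, doc in enumerate(docs)]
    return [i for score, i in sorted(scored, reverse=True) if score >= threshold]
```

With this split, a claim can first be checked for a verbatim match against stored events, and fall back to ranked near-matches only when no exact hit exists. The `threshold` default is arbitrary and would need tuning.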
|
||
## Database components | ||
- actual event | ||
|
||
## Case Study | ||
### Missing Context | ||
The facts are correct but lack context, which will lead people to misinterpret them. | ||
``` | ||
A photo of people wearing T-shirts that say "Walz's for Trump". These people might be named Walz, but they have never spoken to the Tim Walz. | ||
``` |