WIP: Add support for Bitcask #138

Draft
wants to merge 5 commits into base: v2

Conversation

prologic

This PR adds (or tries to, so far) support for Bitcask, an embedded KV store that uses a WAL+LSM and is optimized for sequential writes, fast low-latency reads, and high throughput.

This is still a work in progress, as I've had to make changes in Bitcask itself in the refactor_trie branch, which adds support for an Iterator/Cursor (and I may also add support for Transactions too!)

The tests are not yet passing, and I need some help with this actually as I may have gotten some of the implementation wrong 🤔

.vscode/settings.json Outdated
go.work Outdated
go.work.sum Outdated
@prologic
Author

That's better. To test this branch (not yet fully working):

git clone https://git.mills.io/prologic/bitcask
cd bitcask
git checkout refactor_trie
cd ..
git clone https://github.com/ostafen/clover
cd clover
go work init
go work use .
go work use ../bitcask

@prologic prologic marked this pull request as draft October 23, 2023 13:53
@prologic
Author

Ahh I think I've found my first problem. These lines

clover/db.go

Lines 92 to 95 in aa688ad

func (db *DB) hasCollection(name string, tx store.Tx) (bool, error) {
	value, err := tx.Get([]byte(getCollectionKey(name)))
	return value != nil, err
}
assume that no database returns an error for "key not found". Bitcask does: it returns a bitcask.ErrKeyNotFound error and a nil value. We don't assume that nil values mean "not found". Hmmm, what to do... 🤔
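
One way to reconcile the two conventions, as a minimal sketch only: treat the store's "not found" sentinel as "absent" and pass every other error through. Everything named here is an assumption made for illustration. ErrKeyNotFound is a local stand-in for bitcask.ErrKeyNotFound (assuming the real error can be matched with errors.Is), and getter, mapStore and hasKey are hypothetical; they are not clover's store.Tx or the fix that eventually went into this branch.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrKeyNotFound is a local stand-in for a store-specific sentinel such as
// bitcask.ErrKeyNotFound (assumption: the real error works with errors.Is).
var ErrKeyNotFound = errors.New("key not found")

// getter is a hypothetical, pared-down view of a transaction's Get method.
type getter interface {
	Get(key []byte) ([]byte, error)
}

// mapStore is a toy in-memory store that reports missing keys as an error,
// mirroring the behaviour described above.
type mapStore map[string][]byte

func (m mapStore) Get(key []byte) ([]byte, error) {
	v, ok := m[string(key)]
	if !ok {
		return nil, ErrKeyNotFound
	}
	return v, nil
}

// hasKey maps the sentinel to (false, nil) and returns any other error as-is.
func hasKey(g getter, key string) (bool, error) {
	value, err := g.Get([]byte(key))
	if errors.Is(err, ErrKeyNotFound) {
		return false, nil
	}
	if err != nil {
		return false, err
	}
	return value != nil, nil
}

func main() {
	s := mapStore{"coll:todos": []byte(`{"Size":0,"Indexes":null}`)}

	ok, err := hasKey(s, "coll:todos")
	fmt.Println(ok, err)

	ok, err = hasKey(s, "coll:missing")
	fmt.Println(ok, err)
}
```

Doing the translation inside the store adapter, rather than in db.go, would keep callers such as hasCollection unchanged for stores that already return a nil value without an error.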

@prologic
Author

Nice, that got things working a little better 🥳

$ bitcask -p test.db dump | jq '. | map_values(@base64d)'
{
  "key": "coll:todos",
  "value": "{\"Size\":0,\"Indexes\":null}"
}

@prologic
Author

Most tests pass now, except this one:

=== RUN   TestUpdateCollection/bitcask

Which appears to be "hanging" hmmm

@ostafen
Owner

ostafen commented Oct 23, 2023

Hey, @prologic, first of all thank you for the PR and the interest in clover.
I have one question to you: is your Bitcask storage engine able to support sorted iterations on keys?

@prologic
Author

Hey, @prologic, first of all thank you for the PR and the interest in clover. I have one question to you: is your Bitcask storage engine able to support sorted iterations on keys?

Yes it does.

@prologic
Author

Once I get this working, are we good to merge this without full transaction support? (Bitcask has never supported transactions until now; it's becoming possible because I'm close to deciding to swap out the internal trie implementation.)

@ostafen
Owner

ostafen commented Oct 23, 2023

Is there any advantage in using Bitcask over this storage engine: https://github.com/nutsdb/nutsdb?
As I understand it, they are both based on the Bitcask model (nutsdb additionally supports transactions). I would be interested in understanding which one would be the better addition to cloverdb.

@prologic
Author

Based on this comparison of nutsdb vs. others the main advantage of using Bitcask is its use of a trie:

Compared with B+ trees, radix trees have smaller read and write amplifications since they do not store the entire keys in internal nodes

Otherwise I'm not really that familiar with NutsDB myself, and it looks like it was developed around the same time I was developing Bitcask (although I no longer actively use GitHub to store/collaborate on my projects :/)

I've not done any other types of comparisons either and don't really want to get into "benchmark wars" 🤣 -- As an aside, I've used Bitcask in many production projects, and it's used a fair bit around the place if you look here

@prologic
Author

I'm also thinking about and planning to extend Bitcask's functionality a bit to support flushing the keyspace out to disk and using something like SSTables in addition to the WAL+LSM and Radix tree already in use. My hope/goal is to be able to use Bitcask for much larger datasets, where currently the limiting factor is "the entire keyspace has to be held in memory".

@ostafen
Owner

ostafen commented Oct 23, 2023

If I can ask, do you need to run clover on top of Bitcask for any specific project/workload type?

@prologic
Author

If I can ask, do you need to run clover on top of Bitcask for any specific project/workload type?

I was intending to use it for a new production project (startup). yes. Why's that? 🤔

@ostafen
Owner

ostafen commented Oct 23, 2023

BTW, my main concern with Bitcask is its lack of transaction support. Ideally, each storage engine supported by clover should offer the same guarantees (for example, all documents should be inserted or modified in a transaction).
If Bitcask can provide transactions then I'm happy to merge it into the clover code base; otherwise the better option is a separate repository containing a cloverdb-bitcask storage engine, which I can link in the README.
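
To make the requested guarantee concrete, here is a minimal sketch of all-or-nothing writes: changes are buffered in the transaction and reach the store only on commit. All names (memStore, memTx, errTxDone) are hypothetical and exist only for this illustration; this is neither clover's store interface nor Bitcask's planned transaction design, and it ignores durability, isolation and concurrent access.

```go
package main

import (
	"errors"
	"fmt"
)

// errTxDone is returned when a finished transaction is used again.
var errTxDone = errors.New("transaction already finished")

// memStore is a toy in-memory key-value store (illustration only).
type memStore struct {
	data map[string][]byte
}

// memTx buffers writes so they become visible together or not at all.
type memTx struct {
	store   *memStore
	pending map[string][]byte
	done    bool
}

func newMemStore() *memStore {
	return &memStore{data: make(map[string][]byte)}
}

// Begin starts a transaction with an empty write buffer.
func (s *memStore) Begin() *memTx {
	return &memTx{store: s, pending: make(map[string][]byte)}
}

// Set records a write in the buffer without touching the store.
func (t *memTx) Set(key string, value []byte) error {
	if t.done {
		return errTxDone
	}
	t.pending[key] = value
	return nil
}

// Commit copies every buffered write into the store.
func (t *memTx) Commit() error {
	if t.done {
		return errTxDone
	}
	for k, v := range t.pending {
		t.store.data[k] = v
	}
	t.done = true
	return nil
}

// Rollback discards every buffered write.
func (t *memTx) Rollback() {
	t.pending = nil
	t.done = true
}

func main() {
	s := newMemStore()
	tx := s.Begin()
	_ = tx.Set("doc:1", []byte(`{"done":false}`))
	_ = tx.Set("doc:2", []byte(`{"done":true}`))
	fmt.Println("visible before commit:", len(s.data))

	if err := tx.Commit(); err != nil {
		fmt.Println("commit failed:", err)
		return
	}
	fmt.Println("visible after commit:", len(s.data))
}
```

A real engine would also need the commit itself to be atomic on disk (for example by writing the batch as a single log record), which this sketch does not attempt.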

@ostafen
Owner

ostafen commented Oct 23, 2023

I was intending to use it for a new production project (_startup_). yes. Why's that?

Just interested in the usage :=)

@prologic
Author

Hmm, I still have a few more failing tests to figure out...

Example:

=== RUN   TestSortWithIndex/bitcask
    db_test.go:1136:
        	Error Trace:	/Users/prologic/Contributions/clover/db_test.go:1136
        	            				/Users/prologic/Contributions/clover/db_test.go:86
        	Error:      	Not equal:
        	            	expected: 4408
        	            	actual  : 0
        	Test:       	TestSortWithIndex/bitcask
--- FAIL: TestSortWithIndex (0.96s)

@prologic
Author

BTW, my main concern with Bitcask is its lack of transaction support. Ideally, each storage engine supported by clover should offer the same guarantees (for example, all documents should be inserted or modified in a transaction). If Bitcask can provide transactions then I'm happy to merge it into the clover code base; otherwise the better option is a separate repository containing a cloverdb-bitcask storage engine, which I can link in the README.

I will likely be adding this support, so we should be all good 👌

@prologic
Author

I was intending to use it for a new production project (_startup_). yes. Why's that?

Just interested in the usage :=)

I basically don't want to reinvent my own "document storage" engine 🤣 You seem to have done a nice job of that already 🤣 -- I have this really long-standing PR that adds List, Hash and SortedSet data structures to Bitcask, but honestly no-one (myself included) has ever really bothered using it 😅 So it just sits there. There is also bitraft, which also uses Bitcask internally, that one day I hope to spend a bit more time with 🤔

@ostafen
Owner

ostafen commented Oct 23, 2023

BTW, before starting the clover project, I made an attempt to implement a bitcask-based storage engine too: https://github.com/ostafen/eagle (it's mainly experimental). I also used your project as a reference.
I never lost interest in building a really robust bitcask-based storage engine (but I would definitely need more time), so I was thinking that we could collaborate if you like

@prologic
Author

BTW, before starting the clover project, I made an attempt to implement a bitcask-based storage engine too: https://github.com/ostafen/eagle (it's mainly experimental). I also used your project as a reference. I never lost interest in building a really robust bitcask-based storage engine (but I would definitely need more time), so I was thinking that we could collaborate if you like

I would love that! 😍 I've had many good contributors come and go over the years and many folks love my version of Bitcask 😅 (I do too!) -- It's not perfect, but it works quite well and I use it everywhere. I'd still love to keep improving it, optimizing it and making it one of the best pure-Go KV stores around (although Badger, BBolt and others are pretty good too, but pros/cons 🤷‍♂️)

@ostafen
Owner

ostafen commented Oct 23, 2023

If you are interested maybe we can continue our discussion about this privately

@prologic
Author

Sure thing!

@Shane-XB-Qian
Contributor

Shane-XB-Qian commented Oct 23, 2023 via email

@ostafen
Owner

ostafen commented Oct 24, 2023

@prologic: could you share some contact info? Email address?
@Shane-XB-Qian: I guess this page is definitely not the right context. But if you are interested, you are welcome to join the private discussions too :=)

@prologic
Author

On my website and twtxt.net/~prologic 👌

@ostafen
Owner

ostafen commented Oct 24, 2023

Cool, sent you an email

3 participants