Skip to content
This repository was archived by the owner on Apr 13, 2022. It is now read-only.

WIP: Replace Array[Byte] by ByteString and Seq[Byte] #221

Closed
wants to merge 1 commit into from

Conversation

ceilican
Copy link
Contributor

@ceilican ceilican commented Apr 7, 2018

Closes #208 .

Only "src/main" done so far.

ToDo:

  • replace in "src/test"
  • replace in "testkit"
  • replace in "examples"

Note that external libraries such as scorex.crypto and iodb require Array[Byte]. This forces as to convert from Seq to Array, which is not nice, especially because, right now, there are dependencies on these external libraries spread everywhere in Scorex. We should think about a better solution for this problem. At the very least, I think we should have a unique component that depends on each of these external libraries.

@catena2w
Copy link
Contributor

@ceilican what is the choice for wrapper? I've created the base project (https://github.com/ScorexFoundation/scorex-util) and what to move some common parts to it. This wrapper is needed in our different projects, including iodb, scrypto, sigmastate-interpreter, etc. that do not have scorex in dependencies, so we should use it in some base class.

@ceilican
Copy link
Contributor Author

@catena2w , I have been replacing Array[Byte] by akka's ByteString, without any wrapper. Unlike Array[Byte], ByteString is a Seq. Therefore, we avoid the bug-prone annoyances of Array[Byte] (e.g. having to use sameElements instead of == and problems with hashcode). ByteString is supposed to be efficient.

The problem I encountered in this PR is that scorex.crypto requires Array[Byte]. And dependencies of scorex.core on scorex.crypto are spread everywhere. That why I started another PR (#261) to isolate the dependency on scorex.crypto to a single file.

That PR (#261) is essentially ready, but @kushti preferred not to merge it yet. I will probably abandon this PR (#221) and start a new PR X on top of #261. PR #221 has accumulated to many conflicts that are hard to solve. It is easier to start a new PR.

In this new PR X, I would modify that single file that interfaces with scorex.crypto in order to convert ByteString to Array[Byte] when calling functions of scorex.crypto. This is, of course, not ideal, but at least scorex.core would not have Arrays anymore...

It would be great if scorex.util used ByteString instead of Array[Byte]. Then, in my PR X I could simply change the interface file and call ByteString-based functions from scorex.util instead of Array-based functions from scorex.crypto.

What do you think?

@catena2w
Copy link
Contributor

I'll discuss with kushti, if everything is ok, we'll extract ByteString it to scorex-utils and will use in scrypto, and other libraries.

@catena2w
Copy link
Contributor

@ceilican I also found that there are akka.protobuf.ByteString and akka.util.ByteString. What kind of ByteString we're going to use and why there are 2 classes there?

@ceilican
Copy link
Contributor Author

@catena2w, I don't know why Akka has two ByteString classes... I'm using akka.util.ByteString in this PR, but that's simply because I didn't know about the other one.

The following may help in understanding the other one:
https://developers.google.com/protocol-buffers/
akka/akka#18738

Also look at:
https://doc.akka.io/japi/akka/2.4.0/akka/protobuf/ByteString.html
https://doc.akka.io/api/akka/2.0/akka/util/ByteString.html

My guess is that akka.protobuf.ByteString is Java oriented for compatibility with Google's Protocol Buffers.

@catena2w
Copy link
Contributor

@ceilican I discussed with Alex - he told that ByteString might be the worst choice for Array[Byte] wrapper. May you consider the best choice, we're waiting for decision to start using it in other projects

@ceilican
Copy link
Contributor Author

@catena2w , that is strange. If I remember correctly, it was @kushti who suggested ByteString.

Also, note that ByteString is not just a wrapper around Array[Byte]. It is an actual data structure for collection of bytes that is supposed to provide efficient operations for manipulating the data (cf. https://en.wikipedia.org/wiki/Rope_(data_structure)). I don't know any other collection data structure with more efficient operations.

Did @kushti say why he thinks it might be the worst option? Does he think that the rope concept (on which akka's ByteString) is implemented is a bad option? Or does he just think that relying on the akka implementation of this concept is a bad option?

Anyway, what is important to use a collection data structure that inherits the Seq[Byte] trait. Then we can use == and hashCode properly, without any wrapping.

scorex.util, scorex.crypto and other projects should have a general public interface. This means that their public methods should take arguments of type Seq[Byte]. In other words, they should be neutral w.r.t. to the data structure that the client uses... Then the client may use ByteString, WrappedArray[Byte] or any other collection of bytes that he wants.

In general, whenever we fave a public function def foo(a: A): B, we want A to be as general as possible and B to be as specific as possible.

@mike-aksarin
Copy link
Contributor

mike-aksarin commented Jul 4, 2018

I agree with Bruno, akka.protobuf.ByteString should be for Google protobuf protocol implementation.

It also uses RopeByteString as they state in docs. I don't think that we should use it.

@mike-aksarin
Copy link
Contributor

mike-aksarin commented Jul 4, 2018

I also don't think that we need Rope here, because we do not perform any insertion, deletion or random access operations. We just need efficient memory storage (like in array) and equality checking operation (like in seq), I believe. Also I think it's nice to have a readable toString operation

@ceilican
Copy link
Contributor Author

ceilican commented Jul 4, 2018

We just need efficient memory storage (like in array) and equality checking operation (like in seq), I believe

Yes. I agree.

If we have to choose one or the other, though, I would prefer to keep a good equality operation (i.e. inheriting Seq) and giving up some efficiency...

@catena2w
Copy link
Contributor

replaced with #290

@catena2w catena2w closed this Jul 14, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants