Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BITFIELD and BITFIELD_RO feature #2107

Open
wants to merge 18 commits into
base: main
Choose a base branch
from
Open

BITFIELD and BITFIELD_RO feature #2107

wants to merge 18 commits into from

Conversation

slorello89
Copy link
Collaborator

@slorello89 slorello89 commented Apr 19, 2022

Adding BITFIELD and BITFIELD_RO support as part of #2055.

Couple notes about this command

  1. It's variadic in the allowable sub-commands, so I provided a single command for each sub-command, and then split out the sub-commands into their own structures and allowed you to arbitrarily create an array of sub-commands you want to execute.
  2. The literal return-type of single sub-commands are an array of integers, however I abstracted that away and just have them yield a single 64 bit signed integer, which meant some updates to handle multi-bulk return-types in the single-value processor for long and long?
  3. This is smart enough to switch between BITFIELD and BITFIELD_RO if the RO command is available and it determines that the all the commands are read-only. My question though, is there a better way to check for BITFIELD_RO availability, I just do a check against the version for 6.2.0 which is when BITFIELD_RO was added.
  4. The BITFIELD command's encodings top out at i64/u63, so an i64 should be a suitable return type regardless of what the encoding type is.

/// <summary>
/// Represents a Bitfield GET, which returns the number stored in the specified offset of a bitfield at the given encoding.
/// </summary>
public sealed class BitfieldGet : BitfieldSubCommand
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the approach is a bit heavy here - we're doing a lot of class creation on what seems like should be structs local to the command generation itself and not public API surface area. I'm happy to yank this local and try other approaches - will try and play some tonight.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interested to see what you come up with, this seemed liked best way to me. It's a weird command because of how variadic and structured it is. There's other commands where you just pass in arrays of literals and you can work everything out from that, but in this case you're really expecting things in a very specific order. Kind of tricky to design an API around, might be why it's been around for 6 years without making it into the library. I will tell you though I give a talk on bit-operations and the inability to use bitfield natively has been a real bummer 😆 so I have a bit of skin in this one.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking is: these are Get*Message methods in RedisDatabase like others - they are just specific overloads. I don't know why we'd need classes on these (maybe should change the implementation on the other PR already in as well). This approach allocates another object to make the message as well as exposes them on the public API but as far as I can tell they don't need to be public. I think we can simplify these subcommands to some of those methods and remove the extra classes/allocations here overall.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe for the individual commands yes, but the command itself is weirdly variadic (you can execute an arbitrary number of subcommand, it's almost like a script), so you need some structure to maintain them all in, which is the thinking behind having them all broken out into different classes with an abstract driver. The alternative would be to just expose the single-commands and just let folks batch/pipeline them, but it sort of breaks the command's API.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well I poked at this for a bit and then git just royally screwed me with a UI change recently on main vs. current branch merging in my tool and lost everything so I'm hands off tonight. Basically I had BitfieldSubCommand -> IBitfieldSubCommand, internal members on that interface still, explicitly implemented on members, and all the API changes that entails. However, I'm not sure this is a good path or not - I don't like the amount of allocations we're doing for what is intended to be, on the Redis side, a very efficient/optimized op. Generally our audiences who want this functionality will also care about the cost (I know I would).

Overall, this one may wait, would like to talk it over with @mgravell on surface area and I'm supposed to be out this week. Given git giving me signs, gonna step away for now :)

@NickCraver
Copy link
Collaborator

@slorello89 Haven't forgotten about this, just also unlikely to get so much time-wise here before trip next week. Overall want I want to try is instead of an array feeding into the method, we can have each operation as a struct (internal, not exposed). Then using a builder model (actually on the API) we can build up the command to pass in. Ultimately having a linked list of structs, each with a ref to the next via internal interface or something (need to play - just spitballing).

Then all we're creating is a single object for the command - it could even be based on Message and control the writing internally. Given the audience for this almost certainly cares about performance I'd like to make it as efficient as reasonably possible but need a few hours to hack at that idea. If you wanna give it a stab by all means! Just didn't want this hanging without an update :)

@slorello89
Copy link
Collaborator Author

Hey @NickCraver - no worries - I'm aware you were on break (I feel bad you were taking any time out at all to look at this stuff).

Just so I'm clear what you're looking for. For the variadic command, you want to convert the command classes -> structs, remove them from the public API, and use a builder to create a linked list of those structs under the hood and use the builder in the public API:

long?[] StringBitfield(RedisKey key, BitfieldCommandBuilder builder, CommandFlags flags = CommandFlags.None);

Couple other questions:

  1. Are you ok with the non-variadic commands? Internally they would be converted to use the structs rather than the classes
  2. Are you ok with the Signedness & BitfieldOverflowHandling enums?

@NickCraver
Copy link
Collaborator

Yeah that looks roughly what I'm guessing at - I think the enums themselves are fine (given they'll be args somewhere either way) - can revisit naming in the end, don't need to get it perfect to figure out the API surface here!

@slorello89
Copy link
Collaborator Author

@NickCraver - what do you think of this?

I blasted out the classes/structs for the sub-commands and let the builder stand on its own. There didn't seem that much sense to have the individual structures broken out with the builder in place, the non-variadic ones could have used them I suppose but then that's an awful lot of repeated code and it seemed a bit redundant to have the builder initialize a bunch of structs when it could just maintain the values independently.

It maintains a LinkedList of the values that you're going to pass into Redis and the Build message builds the command message.

Inside the write for the message, it IS using a foreach, so it's incurring a bit of extra cost to allocate the enumerator, perhaps the better approach is to just use the CommandKeyValuesMessage and the LinkedList of arguments to an array when initializing the message (those writes are being done in one of the more performance-sensitive areas right?)

@NickCraver NickCraver self-assigned this May 10, 2022
@NickCraver
Copy link
Collaborator

Playing with the builder in https://github.com/StackExchange/StackExchange.Redis/tree/craver/bitfield-exp - I think after trying this out having all the newed up struct is odd and I think we can mostly make them an implementation detail instead. Will poke more tomorrow I hope!

@shacharPash
Copy link
Contributor

@NickCraver
So what is actually missing here?
I would be happy to complete what is missing so that this PR can be merged

@NickCraver
Copy link
Collaborator

@slorello89 I don't think there's something missing so much as the current API is extremely heavy and we never got back to designing what this should look like in .NET. I don't have an ETA on getting back to it currently.

Copy link
Collaborator

@mgravell mgravell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

has potential - lots of thoughts; sorry this fell into a hole

/// </summary>
public class BitfieldCommandBuilder
{
private readonly LinkedList<RedisValue> _args = new LinkedList<RedisValue>();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Linked list is usually suboptimal; honestly I think we should default to List-T here

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Been a little while since I put this together so my recollection might be a bit off - IIRC my thinking was:

  1. We don't know the size of the array ahead of time - hence we can't really initialize a List without knowing it won't need to be resized.
  2. LinkedLists allow O(1) tail insertion
  3. We only need to enumerate when sending it, hence the O(N) scan is what you'd expect anyway.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though now that I think of it, I suppose you are performing allocation each time you perform the addlast 🤔

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

frankly, linked-list just barely gets used - I'd sooner use a List<T>, but again I wonder whether this is actually a List<SomeUnionStructThatIsGetSetAndIncrby>, so each element in the list is not an argument but a logical operation - this might also make it much easier to do very efficient single-shot operations, which I expect to be relatively common. Let me have a think here - I like the ideas in this PR, but I think we can iterate the API a little.

/// <param name="offset">The offset into the bitfield to increment.</param>
/// <param name="increment">The value to increment by.</param>
/// <param name="overflowHandling">How overflows will be handled when incrementing.</param>
public BitfieldCommandBuilder Incrby(BitfieldEncoding encoding, BitfieldOffset offset, long increment, BitfieldOverflowHandling overflowHandling = BitfieldOverflowHandling.Wrap)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should be Increment for consistency with StringIncrement

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense.

/// <param name="offset">The offset into the bitfield for the subcommand.</param>
public BitfieldCommandBuilder Get(BitfieldEncoding encoding, BitfieldOffset offset)
{
_eligibleForReadOnly = true;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

surely we can only ever set this to false, with it starting true? if we Get(...).Set(...).Get(...) we're not eligible for readonly

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, definitely.


internal class BitfieldCommandMessage : Message
{
private readonly LinkedList<RedisValue> _args;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto List, although I think if possible we should skip an alloc here and go straight to an array; for now I'd settle for List-T, though

/// <summary>
/// The encoding that a sub-command should use. This is either a signed or unsigned integer of a specified length.
/// </summary>
public readonly struct BitfieldEncoding
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should have full struct equality impl - IEquatable, override GetHashCode, override Equals (via => obj is BitFieldEncoding other && Equals(other);

/// An offset into a bitfield. This is either a literal offset (number of bits from the beginning of the bitfield) or an
/// encoding based offset, based off the encoding of the sub-command.
/// </summary>
public readonly struct BitfieldOffset
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto struct equality

/// <summary>
/// Returns the BitfieldOffset as a RedisValue.
/// </summary>
internal RedisValue RedisValue => $"{(ByEncoding ? "#" : string.Empty)}{Offset}";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm; can't cache this one - I wonder if RedisValue is hampering us here, and we should be using a custom internal struct with a custom writer.... meh, leave it like this for now, we can change that later - let's just get it correct for now

/// </summary>
/// <param name="byEncoding">Whether or not the BitfieldOffset will work off of the sub-commands integer encoding.</param>
/// <param name="offset">The number of either bits or encoded integers to offset into the bitfield.</param>
public BitfieldOffset(bool byEncoding, long offset)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should be (long offset, long byEncoding = true), perhaps even with an implicit conversion operator that does the same

public BitfieldOffset(bool byEncoding, long offset)
{
ByEncoding = byEncoding;
Offset = offset;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this have to be non-negative?

/// <summary>
/// Returns the BitfieldOffset as a RedisValue.
/// </summary>
internal RedisValue RedisValue => $"{(ByEncoding ? "#" : string.Empty)}{Offset}";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: this is unnecessarily allocatey in the raw offset case; should be IMO: => ByEncoding ? new($"#{Offset}") : new(Offset);

@mgravell
Copy link
Collaborator

added thoughts, but overall I agree that the API looks weird with the custom structs; I wonder if we can hide those and just take more args on the methods; @NickCraver thoughts on how we resurrect this?

@slorello89
Copy link
Collaborator Author

@mgravell - merged main and pushed up.

With structs vs extra args, I guess my main thought was it would be a fair number of args - SET would go from 3->5, INCRBY would go from 3/4->5/6, GET would go from 2->4. So it's right around the threshold where I'd consider introducing new structures. But to your and Nick's point, it is a fair amount of allocation for something that is meant to be quite lightweight.

So think just nix those structures, move to LinkedList->List, some minor name changes?

@mgravell
Copy link
Collaborator

mgravell commented Nov 28, 2023

I want to play with an alternative API before we get too carried away. This feels pretty complex at the moment. What I have in mind is something like:

public readonly struct BitfieldOperation
{
  long offset, value
  SomeInternalEnum : byte opType
  byte encoding // low 6 == width
  // high 2 == by bit/enc, signed/unsigned
  public static BitfieldOperation Get(long offset,
      byte width, bool unsigned = false, offsetByBit = false)
    public static BitfieldOperation Set(long offset,
      byte width, bool unsigned = false, offsetByBit = false)
   public static BitfieldOperation Increment(long offset,
      byte width, BitfieldOverflow overflow = /* Todo */, bool unsigned = false, offsetByBit = false)

  // Other bits not shown
}

With the main bitfield methods taking either a single BitfieldOperation or an array.

Internally, we can check the BitfieldOperation if the operand(s) to compute RO. No need for the builder, list, or packs of RedisValue - we should also be able to have each operation say "I contribute this many args", so we can do efficient write

But more importantly, it allows simple usage at the call site, i.e.

dB.Bitfield(key, [ BitfieldOperation.Get(...), BitfieldOperation.Set(...) ])

Thoughts?

@slorello89
Copy link
Collaborator Author

Seems pretty sensible, I think my only thing would be to initialize the offset/encoding as their proper RedisValues in Get Set Increment (can toss validation errors there) WDYT?

@mgravell
Copy link
Collaborator

I think my only thing would be to initialize the offset/encoding as their proper RedisValues

It seems to me likely that we can avoid that entirely - ultimately we don't need a RedisValue here, IMO

@slorello89
Copy link
Collaborator Author

@mgravell - Isn't it going to be implicitly cast to a bulk string when it's written to the socket anyway?

@mgravell
Copy link
Collaborator

mgravell commented Dec 1, 2023

@slorello89 yes and no; we have power to do anything we like there - we could, for example, stackalloc a small chunk, write the prefix+value to that, and write that combination to the output, zero allocs. Or possibly zero copy. It doesn't need to be a string, necessarily

@slorello89
Copy link
Collaborator Author

@mgravell - I updated the API per your comments. An additional thought:

I find these signatures a bit weird: BitfieldOperation Set(long offset, byte width, long value, bool offsetByBit = true, bool unsigned = false) - because it disconnects the offset/encoding parts of the command. A more 'sensible' one to me would be BitfieldOperation Set(long offset, bool offsetByBit, byte width, bool unsigned, long value) - I know in this case they'd no longer be optional parameters, but should they be? Idk just a thought.

@slorello89
Copy link
Collaborator Author

@mgravell that's fair - but is there anywhere currently that's being done (in the context of writing out to the pipeline?)

@NickCraver
Copy link
Collaborator

@slorello89 It took me a while to get over the array allocations on this for a lightweight API until I realized: "users are probably calling it the same way over and over"...in which case with proper usage that shouldn't be a problem, we just need examples of where they're re-using the same array passed in every time. It's really hard to judge the optional params because there's zero usage of this today to go by.

Is there any chance you have examples from other projects, that are calling this API set and what they're usually doing? I'd argue: if they're used most of the time, we can just not make them optional at all (we can even intelligently pack the struct better).

This and other things has made me realize just how much we're lacking a samples section (keeps coming up with logging and event handlers), that would be equally helpful in evaluating PR/API additions too. Not asking for that here, just saying: top of mind and likely a good prototype approach once that's in place.

@slorello89
Copy link
Collaborator Author

@NickCraver - looking across the client ecosystem it looks like there are a couple of approaches to this API:

  1. Just send a bunch of strings and figure it out yourself e.g. jedis, go-redis
  2. Builder pattern with arbitrary arguments. e.g. redis-py
  3. Strongly-typed builder pattern e.g. lettuce or redisson

I can't say I've seen many usages of this command in the wild, and I've only ever heard a few people ask for the command in .NET (I've pointed them to the ad-hoc API) - but I suspect people tend to lean on the builder patterns where available.

Examples are always nice, we are working on getting examples for us to use in code snippets in redis.io across the language communities (the issue that got the ball rolling on this again is part of that effort). See an example of these snippets here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants