Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance by parsing directly into struct #44

Open
Boudewijn26 opened this issue Sep 19, 2023 · 1 comment
Open

Improve performance by parsing directly into struct #44

Boudewijn26 opened this issue Sep 19, 2023 · 1 comment

Comments

@Boudewijn26
Copy link
Collaborator

Boudewijn26 commented Sep 19, 2023

In working with this package and looking for performance optimisations one of the possible bottlenecks we identified was that queries were first read into a map, be that FluxRecord or GenericMap after which they'd be converted into the supplied struct through the FromMap trait. While at the moment we wouldn't have a need for optimising this last bit of performance, we thought it might be worthwhile to share the thinking we've done on this so far. Both to get the ball rolling should the need for this last bit of performance arise or for others to pick up if they should feel like it.

For simplicity we'll assume the following query and struct:

from(bucket: "bucket")
|> range(start: -1d)
|> keep(columns: ["_time", "_value", "location"])
struct Row {
  value: f64,
  time: DateTime<FixedOffset>,
  location: String
}

The main goal is for the impl FallibleIterator for QueryTableResult to have a generic Item. The way I'd go about it is by defining a

trait RowBuilder<T: ?Sized> {
  fn column_order(&mut self, columns: Vec<&str>);
  fn next_val(&mut self, val: Value);
  fn build_row(&mut self) -> Result<T, RowBuildError>;
}

enum RowBuildError {
    MissingValues,
    IncorrectValue,
    MissingColumn,
}

and adding a

trait Buildable<T: RowBuilder<Self>> where Self: Sized {
  fn builder() -> T;
}

then the trick would be to define the #[derive(Buildable)] to add to our struct Row. struct QueryTableResult would then contain the impl RowBuilder.

An implementation of RowBuilder for our Row would be:

struct Builder {
    value: Option<f64>, // fields in original struct as Options
    time: Option<DateTime<FixedOffset>>,
    location: Option<String>,
    value_index: usize, // index for each field
    time_index: usize,
    location_index: usize,
    index: usize,
}

impl RowBuilder<Row> for Builder {
    fn column_order(&mut self, columns: Vec<&str>) -> Result<(), RowBuildError> {
        let mut value_set = false;
        let mut time_set = false;
        let mut location_set = false;
        for (idx, col) in columns.iter().enumerate() {
            if *col == "_value" || *col == "value" {
                self.value_index = idx;
                value_set = true;
            }
            if *col == "_time" || *col == "time" {
                self.time_index = idx;
                time_set = true;
            }
            if *col == "_location" || *col == "location" {
                self.location_index = idx;
                location_set = true;
            }
        }
        if value_set && time_set && location_set {
            Ok(())
        } else {
            Err(RowBuildError::MissingColumn)
        }
    }

    fn next_val(&mut self, val: Value) -> Result<(), RowBuildError> {
        if self.index == self.value_index {
            match val {
                Value::Double(d) => self.value = Some(d.0),
                _ => Err(RowBuildError::IncorrectValue)?
            }
        } else if self.index == self.time_index {
            match val {
                Value::TimeRFC(t) => self.time = Some(t),
                _ => Err(RowBuildError::IncorrectValue)?
            }
        } else if self.index == self.location_index {
            match val {
                Value::String(s) => self.location = Some(s),
                _ => Err(RowBuildError::IncorrectValue)?
            }
        }
        self.index += 1;
        Ok(())
    }

    fn build_row(&mut self) -> Result<Row, RowBuildError> {
        let result = match (self.value, self.time, self.location) {
            (Some(value), Some(time), Some(location)) => Ok(Row { value, time, location }),
            _ => Err(RowBuildError::MissingValues),
        };
        self.value = None;
        self.time = None;
        self.location = None;
        self.index = 0;
        result
    }
}

While the implementation of Builder isn't the easiest, I think it wouldn't be that difficult to see where it can be generalised in terms of the struct fields.

To maintain compatibility you'd probably want an impl RowBuilder<FluxRecord> as well. I hope you find this approach constructive.

@Boudewijn26
Copy link
Collaborator Author

On second thought this is pretty similar to what csv and serde are already doing. It'd probably be easier to lean on them instead of going about it alone. Though I do need to do some more reading to fully understand how csv and serde interact

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant