-
Notifications
You must be signed in to change notification settings - Fork 142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimised the classes
field of EGraph
to avoid extra hashing
#284
Conversation
Interesting idea, seems like it's probably faster but I'd want to make sure before merging something like this with non-trivial complexity. You can run the built-in test suite and control various knobs as seen in the test.rs file. Something like EGG_BENCH_CSV=out.csv cargo test --release --test=math -- --test-threads=1 You need test threads to be 1 to avoid concurrent writes to the CSV. You can do the same for the Or you could try the e-matching benchmark suites: EGG_ITER_LIMIT=5 cargo test --release -- --nocapture bench Try tuning that limit up to make a more "significant" benchmark run. |
Ok I tried to do some benchmarking. Since the performance was quite variable between runs I made a script to run the benchmark 20 times and summarize the results. use polars::prelude::*;
use std::fs::File;
use std::io::{BufWriter, Write};
use tempdir::TempDir;
fn quantile(e: Expr, n: f32) -> Expr {
e.quantile(lit(n), QuantileInterpolOptions::Nearest)
}
fn main() {
let samples = std::env::args().nth(1).unwrap().parse().unwrap();
let args = [
"test",
"--release",
"--test=math",
"--test=lambda",
"--",
"--test-threads=1",
];
let tmp_dir = TempDir::new("temp").unwrap();
let tmp_path = tmp_dir.path().join("out.csv");
let _ = File::create(&tmp_path).unwrap();
let mut command = std::process::Command::new("cargo");
command.args(args).env("EGG_BENCH_CSV", &tmp_path);
for i in 0..samples {
command.output().unwrap();
print!("{i} ");
std::io::stdout().flush().unwrap();
}
std::env::set_var("POLARS_FMT_MAX_COLS", "-1");
std::env::set_var("POLARS_FMT_MAX_ROWS", "-1");
let df = LazyCsvReader::new(tmp_path)
.has_header(false)
.finish()
.unwrap();
let df = df.rename(["column_1", "column_2"], ["test", "time"]);
let df = df.group_by([col("test")]).agg([
col("time").mean().alias("mean"),
col("time").median().alias("median"),
quantile(col("time"), 0.25).alias(".25"),
quantile(col("time"), 0.75).alias(".75"),
col("time").count().alias("count"),
]);
let df = df.sort("test", SortOptions::default());
let out_path = std::env::args().nth(2).unwrap();
let mut writer = BufWriter::new(File::create(out_path).unwrap());
write!(writer, "{}", df.collect().unwrap()).unwrap()
} I alternated running the repo before and after the changes and here are the results: Before 1shape: (28, 6)
Before 2
After 1
After2
I'm not sure if you have a more stable way of benchmarking? If might also be interesting to benchmark under PGO or BOLT to see if they affect one version more that another. I also changed the implementation of pub(super) fn find_mut_full(&mut self, mut current: Id) -> (Id, ClassId) {
let canon = self.find(current);
loop {
match self.parent(current) {
UnionFindElt::Parent(parent) => {
self.set_parent(current, canon);
current = parent;
}
UnionFindElt::Root(cid) => {
debug_assert!(current == canon);
return (current, cid);
}
}
}
} since one of the advantages of the path halving strategy was that the parent of a root was itself which isn't true anymore. A more similar strategy would be to do: pub(super) fn find_mut_full(&mut self, mut current: Id) -> (Id, ClassId) {
loop {
match self.parent(current) {
UnionFindElt::Parent(parent) => {
match self.parent(parent) {
UnionFindElt::Parent(grand_parent) => {
self.set_parent(current, grand_parent);
current = grand_parent
}
UnionFindElt::Root(cid) => return (parent, cid),
}
},
UnionFindElt::Root(cid) => return (current, cid),
}
};
} Another strategy I thought of was pub(super) fn find_mut_full(&mut self, mut current: Id) -> (Id, ClassId) {
let mut count = 0u32;
let mut old_current = current;
let (canon, cid) = loop {
match self.parent(current) {
UnionFindElt::Parent(parent) => {
count += 1;
current = parent
},
UnionFindElt::Root(cid) => break (current, cid),
}
};
while count > 1 {
let next = self.parents[usize::from(old_current)].as_parent();
count-=1;
self.set_parent(old_current, canon);
old_current = next;
}
(canon, cid)
} With the idea being that most of the time the second loop won't be needed I don't think my benchmarking is precise enough to tell the difference, but if anyone wants to try comparing these they can. |
Nice work! Yes, this is sort of what I expected. I think the cost of this change isn't worth the complexity, since performance is so dominated by other factors. So I think we should close this for now unless/until a need is demonstrated. |
I noticed that the
classes
field of anEGraph
is currently represented using aHashMap<Id, EClass>
(or possibly anIndexMap
). Since the set of possibleId
s form a continuous range and theunion_find
field already contains an element for eachId
, I thought I could try to use it like an "index" for anIndexMap
, this works especially well since it currently maps non-canonicalId
s to there parent, and we would need it to map canonicalId
s to the index into the class list, so each element of theunion_find
would only need to have an index into another list and a tag of whether or not it's canonical. I decided to use one bit of a u32 as the tag so that theunion_find
would stay the same size.Hopefully, this will improve performance (do you have a setup for reliably benchmarking two commits and comparing them?)
The main downside to this change is that
EGraph
s can now only support 2^31Id
s instead of 2^32 and that we now rely more on theEClass
sid
field so if a user changes it (this is possible since it is public) it would likely break theEGraph
(if it didn't already).