
View function disassembly or raw instructions? #184
Open · novacrazy opened this issue Apr 29, 2020 · 6 comments
novacrazy commented Apr 29, 2020

I'm about to start using Inkwell for a highly optimized JIT system, and it would be great if there were a way to view the resulting compiled code, or even just to get a pointer and length for where the code lives so I can read it directly.

I'm aware of the print_to_string/print_to_stderr methods on FunctionValue, but those only seem to print the raw LLVM IR.

Without access to horizontal vector ops, I'm hoping LLVM will be able to autovectorize vector sums and products well enough, but without a way to see the resulting instructions I can't know.

Please let me know if I'm missing something obvious! Also if you have any ideas for autovectorization or horizontal vector ops, I'd love to hear them.

Here is the kind of thing I plan to do:

// Extract a single lane of a SIMD vector as a scalar float.
fn simd_extract<'ctx>(cg: &CodeGen<'ctx>, ty: &types::JITTypes<'ctx>, x: VectorValue<'ctx>, lane: u64) -> FloatValue<'ctx> {
    cg.builder
        .build_extract_element(x, ty.i32_t.const_int(lane, false), &format!("lane_{}", lane))
        .into_float_value()
}

// Multiply two <4 x float> vectors lane-wise, then sum the lanes with scalar adds.
fn build_dot_product<'ctx>(cg: &CodeGen<'ctx>, ty: &types::JITTypes<'ctx>, a: VectorValue<'ctx>, b: VectorValue<'ctx>) -> FloatValue<'ctx> {
    let product = cg.builder.build_float_mul(a, b, "product");
    let x = simd_extract(cg, ty, product, 0);
    let y = simd_extract(cg, ty, product, 1);
    let z = simd_extract(cg, ty, product, 2);
    let w = simd_extract(cg, ty, product, 3);
    let xy = cg.builder.build_float_add(x, y, "xy");
    let xyz = cg.builder.build_float_add(xy, z, "xyz");
    cg.builder.build_float_add(xyz, w, "xyzw")
}

which results in this LLVM IR:

define float @dot_product(<4 x float> %0, <4 x float> %1) {
entry:
  %product = fmul <4 x float> %0, %1
  %lane_0 = extractelement <4 x float> %product, i32 0
  %lane_1 = extractelement <4 x float> %product, i32 1
  %lane_2 = extractelement <4 x float> %product, i32 2
  %lane_3 = extractelement <4 x float> %product, i32 3
  %xy = fadd float %lane_0, %lane_1
  %xyz = fadd float %lane_2, %xy
  %xyzw = fadd float %lane_3, %xyz
  ret float %xyzw
}

This was printed after running the optimization passes shown in the Kaleidoscope demo, which didn't seem to change much; adding the two "vectorize" passes didn't seem to do anything either.

@novacrazy (Author)

After more research, it seems horizontal ops are a bit weird in general; even Rust's simd_reduce_add_unordered, which compiles to @llvm.experimental.vector.reduce.v2.fadd.f32.v4f32, results in a simple add/shuffle/extract sequence. I had been expecting hadd instructions or something.
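For my own reference, here's roughly what that add/shuffle/extract lowering looks like written by hand, a sketch using the same CodeGen/JITTypes helpers as above, and assuming inkwell's VectorType::const_vector and build_shuffle_vector work the way I think they do:

// Hand-rolled add/shuffle/extract reduction, roughly what the intrinsic lowers to.
fn build_shuffle_sum<'ctx>(cg: &CodeGen<'ctx>, ty: &types::JITTypes<'ctx>, v: VectorValue<'ctx>) -> FloatValue<'ctx> {
    let i32_t = ty.i32_t;
    // Build a constant <4 x i32> shuffle mask from lane indices.
    let mask = |lanes: [u64; 4]| {
        let consts: Vec<BasicValueEnum> =
            lanes.iter().map(|&l| i32_t.const_int(l, false).into()).collect();
        VectorType::const_vector(&consts)
    };
    // v = [a,b,c,d]; swap halves and add -> [a+c, b+d, ...]
    let hi = cg.builder.build_shuffle_vector(v, v, mask([2, 3, 0, 1]), "hi");
    let sum1 = cg.builder.build_float_add(v, hi, "sum1");
    // Swap adjacent lanes and add -> lane 0 now holds a+b+c+d.
    let odd = cg.builder.build_shuffle_vector(sum1, sum1, mask([1, 0, 3, 2]), "odd");
    let sum2 = cg.builder.build_float_add(sum1, odd, "sum2");
    simd_extract(cg, ty, sum2, 0)
}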

@nlewycky (Contributor)

hadd is slow on real hardware; it's only useful as a size optimization.

LLVM has the @llvm.experimental.vector.reduce intrinsics that you've identified, which are part of an effort to improve support for horizontal/reduce operations on vectors. Taking a quick look at the LLVM 10 source, I don't believe any optimization pass outputs those intrinsics yet, but you can use them yourself. If you need them, LLVM also exposes native instructions through target-specific intrinsics like @llvm.x86.sse3.hadd.ps.

I suggest writing the natural code in target-neutral LLVM IR, including the experimental vector reduce intrinsics, and once you see the resulting assembly, try to beat it. Previously I'd have recommended the Intel Architecture Code Analyzer for analyzing assembly performance, but its webpage now redirects to llvm-mca: https://llvm.org/docs/CommandGuide/llvm-mca.html
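For example, a rough sketch of calling that reduction intrinsic from inkwell, declared by name like any external function. I'm assuming your CodeGen exposes the module and that JITTypes has an f32_t scalar type, so adjust to taste:

// Dot product via the target-neutral reduction intrinsic (LLVM 10 naming).
fn build_dot_product_reduce<'ctx>(cg: &CodeGen<'ctx>, ty: &types::JITTypes<'ctx>, a: VectorValue<'ctx>, b: VectorValue<'ctx>) -> FloatValue<'ctx> {
    let product = cg.builder.build_float_mul(a, b, "product");

    let f32_t = ty.f32_t; // assumed scalar float type on JITTypes
    let fn_type = f32_t.fn_type(&[f32_t.into(), f32_t.vec_type(4).into()], false);

    // Intrinsics can be declared by name and signature like external functions.
    let name = "llvm.experimental.vector.reduce.v2.fadd.f32.v4f32";
    let reduce = cg.module.get_function(name)
        .unwrap_or_else(|| cg.module.add_function(name, fn_type, None));

    // The v2 form takes a scalar accumulator first; -0.0 is the fadd identity.
    let acc = f32_t.const_float(-0.0);
    cg.builder
        .build_call(reduce, &[acc.into(), product.into()], "dot")
        .try_as_basic_value()
        .left()
        .unwrap()
        .into_float_value()
}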

@novacrazy (Author)

Well, the original point of the issue still stands: how would I go about viewing the generated machine code from Inkwell?

In fact, I'm also not sure how to inject arbitrary LLVM IR other than to create an entirely new module out of it.

@TheDan64 TheDan64 added this to the 0.1.0 milestone Apr 29, 2020
@TheDan64 (Owner) commented Apr 29, 2020

I'm not certain you can do the former at the moment. For the latter, maybe Module::parse_bitcode_from_buffer or Context::create_module_from_ir? I don't think you can inject IR into an existing module other than by creating a new one from scratch.
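Something like this sketch, if create_module_from_ir behaves the way I remember (the IR text and function name here are just made up for illustration):

use inkwell::context::Context;
use inkwell::memory_buffer::MemoryBuffer;

fn module_from_ir_text(context: &Context) {
    let ir = r#"
define float @add(float %a, float %b) {
entry:
  %sum = fadd float %a, %b
  ret float %sum
}
"#;
    // Wrap the IR text in a MemoryBuffer, then parse it into a new module.
    let buffer = MemoryBuffer::create_from_memory_range_copy(ir.as_bytes(), "handwritten");
    let module = context.create_module_from_ir(buffer).expect("invalid IR");
    assert!(module.get_function("add").is_some());
}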

@novacrazy (Author) commented Apr 30, 2020

I'll have to experiment with that. Perhaps handwrite a few modules for common ops and rely on link_in_module to combine them with the generated code, something like the sketch below.
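A rough sketch of that approach, assuming the parsing above works and that link_in_module consumes the donor module on success:

// Merge a handwritten module into the main JIT module.
let donor = context.create_module_from_ir(buffer).expect("invalid IR");
let main_module = context.create_module("jit");
main_module.link_in_module(donor).expect("link failed");
// main_module now contains the handwritten definitions alongside generated code.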

As for viewing the assembly, perhaps reinterpreting the raw function pointer in JitFunction as a slice and scanning for a ret instruction could work to get a range, depending on what underlying code LLVM actually provides. I mean, it's still just raw bytes at that point, but it's a start. Never mind, dumb idea.

@novacrazy (Author) commented May 2, 2020

Oh. Of course, this is already available.

use inkwell::targets::{CodeModel, FileType, InitializationConfig, RelocMode, Target, TargetMachine};
use inkwell::OptimizationLevel;

// Initialize the native target so the host triple/CPU/features can be queried.
Target::initialize_native(&InitializationConfig::default())
    .expect("Failed to initialize native target");

let triple = TargetMachine::get_default_triple();
let cpu = TargetMachine::get_host_cpu_name().to_string();
let features = TargetMachine::get_host_cpu_features().to_string();

// Build a TargetMachine matching the host.
let target = Target::from_triple(&triple).unwrap();
let machine = target
    .create_target_machine(
        &triple,
        &cpu,
        &features,
        OptimizationLevel::Aggressive,
        RelocMode::Default,
        CodeModel::Default,
    )
    .unwrap();

// ... create a module and do JIT stuff ...

// Emit the module's assembly to a file.
machine.write_to_file(&module, FileType::Assembly, "out.asm".as_ref()).unwrap();

So yeah, it took me a while to find out how, but it does indeed save the whole assembly, with labels, attributes, and so forth.

It also confirms that it's producing highly-optimized machine code just like I hoped.
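TargetMachine can also emit straight into memory, which avoids the temp file; a small sketch, assuming write_to_memory_buffer and the same machine and module as above:

// Capture the assembly as an in-memory buffer instead of a file.
let buf = machine
    .write_to_memory_buffer(&module, FileType::Assembly)
    .unwrap();
println!("{}", String::from_utf8_lossy(buf.as_slice()));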

However, some better documentation around target machines would be very helpful. Is a TargetMachine stateful? Does it actually affect codegen? Other than being used to export that module, it doesn't touch the JIT code, so its effect is unclear to me.

You're welcome to close this if this solution is acceptable, though my questions still stand.

@TheDan64 TheDan64 modified the milestones: 0.1.0, 0.2.0 Mar 29, 2022
@TheDan64 TheDan64 modified the milestones: 0.5.0, 0.6.0 Jul 2, 2024