
View function disassembly or raw instructions? #184
Open · novacrazy opened this issue Apr 29, 2020 · 6 comments
novacrazy commented Apr 29, 2020

I'm about to start using Inkwell for a highly optimized JIT system, and it would be great if there were a way to view the resulting compiled code, or even just to get a pointer and length for where the code lives so I can read it directly.

I'm aware of the print_to_string/print_to_stderr methods on FunctionValue, but those only seem to print the raw LLVM IR.

Without access to horizontal vector ops, I'm hoping LLVM will be able to autovectorize vector sums and products well enough, but without a way to see the resulting instructions I can't know.

Please let me know if I'm missing something obvious! Also if you have any ideas for autovectorization or horizontal vector ops, I'd love to hear them.

Here is the kind of thing I plan to do:

// Extract a single lane of a SIMD vector as a scalar float.
fn simd_extract<'ctx>(cg: &CodeGen<'ctx>, ty: &types::JITTypes<'ctx>, x: VectorValue<'ctx>, lane: u64) -> FloatValue<'ctx> {
    cg.builder
        .build_extract_element(x, ty.i32_t.const_int(lane, false), &format!("lane_{}", lane))
        .into_float_value()
}

// Multiply two <4 x float> vectors lane-wise, then sum the lanes with scalar adds.
fn build_dot_product<'ctx>(cg: &CodeGen<'ctx>, ty: &types::JITTypes<'ctx>, a: VectorValue<'ctx>, b: VectorValue<'ctx>) -> FloatValue<'ctx> {
    let product = cg.builder.build_float_mul(a, b, "product");
    let x = simd_extract(cg, ty, product, 0);
    let y = simd_extract(cg, ty, product, 1);
    let z = simd_extract(cg, ty, product, 2);
    let w = simd_extract(cg, ty, product, 3);
    let xy = cg.builder.build_float_add(x, y, "xy");
    let xyz = cg.builder.build_float_add(xy, z, "xyz");
    cg.builder.build_float_add(xyz, w, "xyzw")
}

which results in this LLVM IR:

define float @dot_product(<4 x float> %0, <4 x float> %1) {
entry:
  %product = fmul <4 x float> %0, %1
  %lane_0 = extractelement <4 x float> %product, i32 0
  %lane_1 = extractelement <4 x float> %product, i32 1
  %lane_2 = extractelement <4 x float> %product, i32 2
  %lane_3 = extractelement <4 x float> %product, i32 3
  %xy = fadd float %lane_0, %lane_1
  %xyz = fadd float %lane_2, %xy
  %xyzw = fadd float %lane_3, %xyz
  ret float %xyzw
}

This was printed after running the optimization passes shown in the Kaleidoscope demo, which didn't seem to change much; adding the two "vectorize" passes didn't seem to do anything either.

@novacrazy (Author)

After more research, it seems horizontal ops are a bit weird in general; even Rust's simd_reduce_add_unordered, which compiles to @llvm.experimental.vector.reduce.v2.fadd.f32.v4f32, results in a simple add/shuffle/extract sequence. I had been expecting hadd instructions or something.
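For my own reference, here's roughly what that add/shuffle/extract lowering looks like written by hand, a sketch using the same CodeGen/JITTypes helpers as above, and assuming inkwell's VectorType::const_vector and build_shuffle_vector work the way I think they do:

// Hand-rolled add/shuffle/extract reduction, roughly what the intrinsic lowers to.
fn build_shuffle_sum<'ctx>(cg: &CodeGen<'ctx>, ty: &types::JITTypes<'ctx>, v: VectorValue<'ctx>) -> FloatValue<'ctx> {
    let i32_t = ty.i32_t;
    // Build a constant <4 x i32> shuffle mask from lane indices.
    let mask = |lanes: [u64; 4]| {
        let consts: Vec<BasicValueEnum> =
            lanes.iter().map(|&l| i32_t.const_int(l, false).into()).collect();
        VectorType::const_vector(&consts)
    };
    // v = [a,b,c,d]; swap halves and add -> [a+c, b+d, ...]
    let hi = cg.builder.build_shuffle_vector(v, v, mask([2, 3, 0, 1]), "hi");
    let sum1 = cg.builder.build_float_add(v, hi, "sum1");
    // Swap adjacent lanes and add -> lane 0 now holds a+b+c+d.
    let odd = cg.builder.build_shuffle_vector(sum1, sum1, mask([1, 0, 3, 2]), "odd");
    let sum2 = cg.builder.build_float_add(sum1, odd, "sum2");
    simd_extract(cg, ty, sum2, 0)
}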

@nlewycky (Contributor)

hadd is slow on real hardware; it's only useful as a size optimization.

LLVM has the @llvm.experimental.vector.reduce intrinsics that you've identified, which are part of an effort to improve support for horizontal/reduce operations on vectors. Taking a quick look at the LLVM 10 source, I don't believe any optimization pass outputs those intrinsics yet, but you can use them yourself. If you need them, LLVM also exposes native instructions through target-specific intrinsics like @llvm.x86.sse3.hadd.ps.

I suggest writing the natural code in target-neutral LLVM IR, including the experimental vector reduce intrinsics, and once you see the resulting assembly, try to beat it. Previously I'd have recommended the Intel Architecture Code Analyzer for analyzing assembly performance, but its webpage now redirects to llvm-mca: https://llvm.org/docs/CommandGuide/llvm-mca.html
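For example, a rough sketch of calling that reduction intrinsic from inkwell, declared by name like any external function. I'm assuming your CodeGen exposes the module and that JITTypes has an f32_t scalar type, so adjust to taste:

// Dot product via the target-neutral reduction intrinsic (LLVM 10 naming).
fn build_dot_product_reduce<'ctx>(cg: &CodeGen<'ctx>, ty: &types::JITTypes<'ctx>, a: VectorValue<'ctx>, b: VectorValue<'ctx>) -> FloatValue<'ctx> {
    let product = cg.builder.build_float_mul(a, b, "product");

    let f32_t = ty.f32_t; // assumed scalar float type on JITTypes
    let fn_type = f32_t.fn_type(&[f32_t.into(), f32_t.vec_type(4).into()], false);

    // Intrinsics can be declared by name and signature like external functions.
    let name = "llvm.experimental.vector.reduce.v2.fadd.f32.v4f32";
    let reduce = cg.module.get_function(name)
        .unwrap_or_else(|| cg.module.add_function(name, fn_type, None));

    // The v2 form takes a scalar accumulator first; -0.0 is the fadd identity.
    let acc = f32_t.const_float(-0.0);
    cg.builder
        .build_call(reduce, &[acc.into(), product.into()], "dot")
        .try_as_basic_value()
        .left()
        .unwrap()
        .into_float_value()
}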

@novacrazy (Author)

Well, the original point of the issue still stands: how would I go about viewing the generated machine code from Inkwell?

In fact, I'm also not sure how to inject arbitrary LLVM IR other than to create an entirely new module out of it.

@TheDan64 TheDan64 added this to the 0.1.0 milestone Apr 29, 2020
@TheDan64 (Owner) commented Apr 29, 2020

I'm not certain you can do the former at the moment. For the latter, maybe Module::parse_bitcode_from_buffer or Context::create_module_from_ir? I don't think you can inject IR into an existing module other than by creating a new one from scratch.
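Something like this sketch, if create_module_from_ir behaves the way I remember (the IR text and function name here are just made up for illustration):

use inkwell::context::Context;
use inkwell::memory_buffer::MemoryBuffer;

fn module_from_ir_text(context: &Context) {
    let ir = r#"
define float @add(float %a, float %b) {
entry:
  %sum = fadd float %a, %b
  ret float %sum
}
"#;
    // Wrap the IR text in a MemoryBuffer, then parse it into a new module.
    let buffer = MemoryBuffer::create_from_memory_range_copy(ir.as_bytes(), "handwritten");
    let module = context.create_module_from_ir(buffer).expect("invalid IR");
    assert!(module.get_function("add").is_some());
}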

@novacrazy (Author) commented Apr 30, 2020

I'll have to experiment with that. Perhaps handwrite a few modules for common ops and rely on link_in_module to combine them with the generated code, something like the sketch below.
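A rough sketch of that approach, assuming the parsing above works and that link_in_module consumes the donor module on success:

// Merge a handwritten module into the main JIT module.
let donor = context.create_module_from_ir(buffer).expect("invalid IR");
let main_module = context.create_module("jit");
main_module.link_in_module(donor).expect("link failed");
// main_module now contains the handwritten definitions alongside generated code.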

As for viewing the assembly, perhaps reinterpreting the raw function pointer in JitFunction as a slice and scanning for a ret instruction could work to get a range, depending on what underlying code LLVM actually provides. I mean, it's still just raw bytes at that point, but it's a start. Never mind, dumb idea.

@novacrazy (Author) commented May 2, 2020

Oh. Of course, this is already available.

use inkwell::targets::{CodeModel, FileType, InitializationConfig, RelocMode, Target, TargetMachine};
use inkwell::OptimizationLevel;

// Initialize the native target so the host triple/CPU/features can be queried.
Target::initialize_native(&InitializationConfig::default())
    .expect("Failed to initialize native target");

let triple = TargetMachine::get_default_triple();
let cpu = TargetMachine::get_host_cpu_name().to_string();
let features = TargetMachine::get_host_cpu_features().to_string();

// Build a TargetMachine matching the host.
let target = Target::from_triple(&triple).unwrap();
let machine = target
    .create_target_machine(
        &triple,
        &cpu,
        &features,
        OptimizationLevel::Aggressive,
        RelocMode::Default,
        CodeModel::Default,
    )
    .unwrap();

// ... create a module and do JIT stuff ...

// Emit the module's assembly to a file.
machine.write_to_file(&module, FileType::Assembly, "out.asm".as_ref()).unwrap();

So yeah, it took me a while to find out how, but it does indeed save the whole assembly, with labels, attributes, and so forth.

It also confirms that it's producing highly-optimized machine code just like I hoped.
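TargetMachine can also emit straight into memory, which avoids the temp file; a small sketch, assuming write_to_memory_buffer and the same machine and module as above:

// Capture the assembly as an in-memory buffer instead of a file.
let buf = machine
    .write_to_memory_buffer(&module, FileType::Assembly)
    .unwrap();
println!("{}", String::from_utf8_lossy(buf.as_slice()));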

However, some better documentation around target machines would be very helpful. Is a TargetMachine stateful? Does it actually affect codegen? Other than being used to export that module, it doesn't touch the JIT code, so its effect is unclear to me.

You're welcome to close this if this solution is acceptable, though my questions still stand.

@TheDan64 TheDan64 modified the milestones: 0.1.0, 0.2.0 Mar 29, 2022
@TheDan64 TheDan64 modified the milestones: 0.5.0, 0.6.0 Jul 2, 2024