Save JSON/CSV manifests automatically #444

Open
tavinathanson opened this issue Mar 27, 2017 · 1 comment

Comments

@tavinathanson
Member

From @tavinathanson:

confused about the CSV in hammerlab/epidisco#149 vs. the JSON output, and which of those (or both?) we'd want to have for arbitrary pipelines?

From @smondet:

@ihodes once we figure out exactly what we want we should bake saving+jsonoutput (+csv?) directly into the To_workflow compiler.
my understanding is that JSON is a superset of the CSV information

Also relates to discohorts. From @tavinathanson:

discohorts will rely on more manual determination of file paths w/out a manifest, but will try to make it easy to switch over to using the manifest when available

@ihodes
Member

ihodes commented Mar 28, 2017

I've added this to pipelines in an ad-hoc manner (e.g. for @timodonnell below)

(** We want to extend the compiler to handle a new function,
    `write_csv_manifest`, so we define the new signature that the compiler must
    have. *)
module type Semantics = sig
  include Biokepi.EDSL.Semantics

  val write_csv_manifest :
    normal:[ `Bam ] repr ->
    tumor:[ `Bam ] repr ->
    vcfs:(string * [ `Vcf ] repr) list ->
    string -> unit
end

(** Here we add the function itself for the `To_workflow` compiler (the compiler
    which handles the transformation from the eDSL to actual Ketrew workflow
    nodes). All we're adding is a function which outputs a CSV locally when the
    workflow is compiled. *)
module To_workflow
    (Config : Biokepi.EDSL.Compile.To_workflow.Compiler_configuration) =
struct
  include Biokepi.EDSL.Compile.To_workflow.Make(Config)

  let write_csv_manifest ~normal ~tumor ~vcfs out =
    let csv =
      let module F = Biokepi.EDSL.Compile.To_workflow.File_type_specification in
      let header = ["name"; "path"] in
      let vcfs =
        List.map ~f:(fun (name, n) -> [name; (F.get_vcf n)#product#path]) vcfs
      in
      [
        header;
        ["normal";  (F.get_bam normal)#product#path];
        ["tumor";  (F.get_bam tumor)#product#path];
      ] @ vcfs
    in
    let outc = Csv.to_channel (open_out out) in
    Csv.output_all outc csv;
    (* Close the channel so the manifest is flushed to disk. *)
    Csv.close_out outc
end
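
For comparing the CSV and JSON options discussed above, here is a rough stdlib-only sketch that renders the same (name, path) entries in both formats. It is not Biokepi code and does not use the `Csv` library: `manifest`, `csv_field`, `json_string`, `to_csv`, `to_json`, and `save` are hypothetical names, the paths are made up, and the JSON escaping only covers quotes, backslashes, and newlines.

```ocaml
(* Hypothetical sketch: none of these names exist in Biokepi. *)

(* A manifest is a list of (name, path) entries. *)
type manifest = (string * string) list

(* Quote a CSV field when it contains a comma, a double quote, or a newline. *)
let csv_field s =
  let needs_quoting =
    String.contains s ',' || String.contains s '"' || String.contains s '\n'
  in
  if not needs_quoting then s
  else begin
    let b = Buffer.create (String.length s + 2) in
    Buffer.add_char b '"';
    String.iter
      (fun c ->
        if c = '"' then Buffer.add_string b "\"\"" else Buffer.add_char b c)
      s;
    Buffer.add_char b '"';
    Buffer.contents b
  end

(* Render the manifest as CSV with a "name,path" header row. *)
let to_csv (m : manifest) : string =
  let rows = ("name", "path") :: m in
  let line (name, path) = csv_field name ^ "," ^ csv_field path in
  String.concat "\n" (List.map line rows) ^ "\n"

(* Minimal JSON string escaping (quotes, backslashes, newlines only). *)
let json_string s =
  let b = Buffer.create (String.length s + 2) in
  Buffer.add_char b '"';
  String.iter
    (function
      | '"' -> Buffer.add_string b "\\\""
      | '\\' -> Buffer.add_string b "\\\\"
      | '\n' -> Buffer.add_string b "\\n"
      | c -> Buffer.add_char b c)
    s;
  Buffer.add_char b '"';
  Buffer.contents b

(* Render the manifest as a JSON array of {"name", "path"} objects. *)
let to_json (m : manifest) : string =
  let entry (name, path) =
    Printf.sprintf "{\"name\": %s, \"path\": %s}"
      (json_string name) (json_string path)
  in
  "[" ^ String.concat ", " (List.map entry m) ^ "]"

(* Write a string to a file, closing the channel even if writing fails. *)
let save path contents =
  let oc = open_out path in
  Fun.protect
    ~finally:(fun () -> close_out oc)
    (fun () -> output_string oc contents)

let () =
  (* Made-up entries, printed rather than saved to keep the demo side-effect free. *)
  let m = [ ("normal", "/work/normal.bam"); ("tumor", "/work/tumor.bam") ] in
  print_string (to_csv m);
  print_endline (to_json m)
```

Calling `save "manifest.csv" (to_csv m)` and `save "manifest.json" (to_json m)` would write both files from the same list. The JSON objects could later carry extra fields that a two-column CSV has no room for.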
