Merge pull request #86 from CSBiology/developer
Update with changes from developer for 1.1.0
kMutagene authored Mar 23, 2020
2 parents af41feb + 23c2435 commit 0091061
Showing 25 changed files with 16,392 additions and 163 deletions.
1 change: 1 addition & 0 deletions BioFSharp.sln
@@ -55,6 +55,7 @@ Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "content", "content", "{8E6D
docsrc\content\Obo.fsx = docsrc\content\Obo.fsx
docsrc\content\PetideClassification.fsx = docsrc\content\PetideClassification.fsx
docsrc\content\Readers.fsx = docsrc\content\Readers.fsx
docsrc\content\SOFT.fsx = docsrc\content\SOFT.fsx
docsrc\content\StringMatching.fsx = docsrc\content\StringMatching.fsx
docsrc\content\tutorial.fsx = docsrc\content\tutorial.fsx
docsrc\content\UniProt.fsx = docsrc\content\UniProt.fsx
14 changes: 14 additions & 0 deletions RELEASE_NOTES.md
@@ -1,3 +1,17 @@
#### 1.1.0 - Monday, March 23, 2020
**Additions:**
* **BioFSharp.BioContainers:**
    * Add [fasterq-dump](https://github.com/CSBiology/BioFSharp/commit/425fbb93b41700eeece8f8ab063c9c37b15124bd) and [prefetch](https://github.com/CSBiology/BioFSharp/commit/b08f307f203eea4c2a84cce10f1a72d05453806b) DSL for the SRATools BioContainer
    * Add full [STAR](https://github.com/alexdobin/STAR) RNASeq aligner DSL for the respective BioContainer. [Commit details](https://github.com/CSBiology/BioFSharp/commit/d2cbc0a8691564a487d70d9825867e7eb261d03a)
* **BioFSharp.IO:**
    * [Add load script for referencing pretty printers](https://github.com/CSBiology/BioFSharp/commit/130e1c63264989978e54f114dbd04b6dfb9458d3), included in the nuget package
    * [Add multiple new pretty printers for SOFT](https://github.com/CSBiology/BioFSharp/commit/97cca9bd06f63455ebafbf3cbb8029a0651137cb)

**Bugfixes:**
* **BioFSharp.IO:**
    * [Fix GFF3 pretty printer return type](https://github.com/CSBiology/BioFSharp/commit/bcec2cc719eef7e43827521bd281582a8b5ebe72)


#### 1.0.03 - Wednesday, February 26, 2020
* **BioFSharp.Stats:**
    * Massively improved SAILENT characterization speed for [preprocessing of large datasets](https://github.com/CSBiology/BioFSharp/pull/82)
98 changes: 96 additions & 2 deletions docsrc/content/FSIPrinters.fsx
@@ -23,7 +23,11 @@ a bunch of functions to view our data types in a structured string format. This
visual investigation. To use these printers, use the `fsi.AddPrinter` function to register the desired printer. This will override the default
printing behaviour of the respective type in the FSI.
Currently, the following printers are implemented:
We also provide a `BioFSharp.IO.fsx` convenience script in our nuget packages that registers all printers (except the modification printers) with the FSI.
Just `#load` the script to make all of these printers available.
However, if you want to register FSI printers selectively, these are the currently implemented ones:
BioItems
--------
@@ -173,4 +177,94 @@ Console output using `prettyPrintClustal`:
//register the desired printer
fsi.AddPrinter(FSIPrinters.prettyPrintClustal)

(*** include-value:clustalPrnt ***)


(**
SOFT
----
There are 5 printers available:
`prettyPrintGSE` and `prettyPrintGPL` format the top level types `SOFT.Series.GSE` and `SOFT.Platform.GPL`, including associated record metadata.
`prettyPrintSampleRecord`, `prettyPrintSeriesRecord`, and `prettyPrintPlatformRecord` format single record metadata.
All SOFT types are very large and hard to read from standard output, especially the nested top level types, so we omit the standard output for readability.
*)

//register the desired printers
fsi.AddPrinter(FSIPrinters.prettyPrintGPL)
fsi.AddPrinter(FSIPrinters.prettyPrintGSE)

fsi.AddPrinter(FSIPrinters.prettyPrintSampleRecord)
fsi.AddPrinter(FSIPrinters.prettyPrintSeriesRecord)
fsi.AddPrinter(FSIPrinters.prettyPrintPlatformRecord)

let gplPath = __SOURCE_DIRECTORY__ + "/data/GPL15922_family.soft"

let gpl15922 = SOFT.Platform.fromFile gplPath

(***hide***)
let gplPrnt = FSIPrinters.prettyPrintGPL gpl15922

(**
Console output using `prettyPrintGPL`:
*)

(*** include-value:gplPrnt ***)

let gsePath = __SOURCE_DIRECTORY__ + "/data/GSE71469_family.soft"

let gse71469 = SOFT.Series.fromFile gsePath

(***hide***)
let gsePrnt = FSIPrinters.prettyPrintGSE gse71469

(**
Console output using `prettyPrintGSE`:
*)

(*** include-value:gsePrnt ***)

let smplRecord =
    gse71469
    |> SOFT.Series.getAssociatedSamples
    |> List.item 0

(***hide***)
let smplPrint = FSIPrinters.prettyPrintSampleRecord smplRecord

(**
Console output using `prettyPrintSampleRecord`:
*)

(*** include-value:smplPrint ***)

let seriesRecord =
    gpl15922
    |> SOFT.Platform.getAssociatedSeries
    |> List.item 0

(***hide***)
let seriesPrint = FSIPrinters.prettyPrintSeriesRecord seriesRecord

(**
Console output using `prettyPrintSeriesRecord`:
*)

(*** include-value:seriesPrint ***)


let pltfrmRecord =
    gse71469
    |> SOFT.Series.getAssociatedPlatforms
    |> List.item 0

(***hide***)
let pltfrmPrint = FSIPrinters.prettyPrintPlatformRecord pltfrmRecord

(**
Console output using `prettyPrintPlatformRecord`:
*)

(*** include-value:pltfrmPrint ***)
94 changes: 94 additions & 0 deletions docsrc/content/SOFT.fsx
@@ -0,0 +1,94 @@
(*** hide ***)
// This block of code is omitted in the generated HTML documentation. Use
// it to define helpers that you do not want to show in the documentation.
#I @"../../bin/BioFSharp/net47/"
#I @"../../bin/BioFSharp.BioDB/net45/"
#I @"../../bin/BioFSharp.ImgP/net47"
#I @"../../bin/BioFSharp.IO/net47/"
#I @"../../bin/BioFSharp.Parallel/net47/"
#I @"../../bin/BioFSharp.Stats/net47/"
#I @"../../bin/BioFSharp.Vis/net47/"
#r @"../../lib/Formatting/FSharp.Plotly.dll"
#r "FSharpAux.dll"
#r "FSharpAux.IO.dll"
#r "BioFSharp.dll"
#r "BioFSharp.IO.dll"

open BioFSharp

open FSharpAux

(**
Parsing SOFT formatted family files
===================================
GEO (Gene Expression Omnibus) is a data repository of high-throughput gene expression and hybridization array data. All record metadata are provided in the [SOFT format](https://www.ncbi.nlm.nih.gov/geo/info/soft.html),
which can be parsed and analyzed using the `BioFSharp.IO.SOFT` module.
SOFT types are very large and hard to read from standard FSI output, especially the nested top level types, so we omit the standard output for readability and use
the FSI printers provided in `BioFSharp.IO.FSIPrinters` instead.
*)

open BioFSharp.IO

//register the desired printers
fsi.AddPrinter(FSIPrinters.prettyPrintGPL)
fsi.AddPrinter(FSIPrinters.prettyPrintGSE)

fsi.AddPrinter(FSIPrinters.prettyPrintSampleRecord)
fsi.AddPrinter(FSIPrinters.prettyPrintSeriesRecord)
fsi.AddPrinter(FSIPrinters.prettyPrintPlatformRecord)


(**
Parsing platform (GPL) files
----------------------------
SOFT formatted platform family files can be parsed using the `SOFT.Platform.fromFile` function.
As the format (.soft) does not specify the type of record (GSE or GPL), please make sure that you only
parse GPL*.soft files with this function, as other files may return errors.
*)

let testPlatform = SOFT.Platform.fromFile (__SOURCE_DIRECTORY__ + "/data/GPL15922_family.soft")

(**
Parsing series (GSE) files
--------------------------
SOFT formatted series family files can be parsed using the `SOFT.Series.fromFile` function.
As the format (.soft) does not specify the type of record (GSE or GPL), please make sure that you only
parse GSE*.soft files with this function, as other files may return errors.
*)

let testSeries = SOFT.Series.fromFile (__SOURCE_DIRECTORY__ + "/data/GSE71469_family.soft")
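
(**
Since the parsers cannot tell GPL and GSE files apart, a small guard on the file name may help to pick the right function before parsing.
The following is only a sketch: `SoftFileKind` and `tryGetSoftFileKind` are hypothetical names that are not part of BioFSharp, and the check assumes that the files keep the `GPL`/`GSE` accession prefix in their names, as the example files used here do.
*)

/// Hypothetical helper type (not part of BioFSharp): the kind of family file suggested by the file name
type SoftFileKind =
    | PlatformFile
    | SeriesFile

/// Hypothetical helper (not part of BioFSharp): inspects only the file name, never the file content
let tryGetSoftFileKind (path: string) =
    let fileName = System.IO.Path.GetFileName path
    if fileName.StartsWith("GPL", System.StringComparison.Ordinal) then Some PlatformFile
    elif fileName.StartsWith("GSE", System.StringComparison.Ordinal) then Some SeriesFile
    // unknown prefix: do not guess which parser applies
    else None

// Usage sketch with the parsers shown above:
// match tryGetSoftFileKind path with
// | Some PlatformFile -> printfn "%A" (SOFT.Platform.fromFile path)
// | Some SeriesFile   -> printfn "%A" (SOFT.Series.fromFile path)
// | None              -> printfn "not a GPL*/GSE* family file: %s" path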

(**
Convenience functions
---------------------
We implemented some convenience functions for `SOFT.Platform.GPL` and `SOFT.Series.GSE`.
`Platform.getAssociatedSampleAccessions`, for example, retrieves all associated sample accessions. This could be useful for batch downloads of these files.
*)

let sampleAccessions =
    testPlatform |> SOFT.Platform.getAssociatedSampleAccessions

(*** include-value:sampleAccessions ***)

(**
This can be especially useful to retrieve all samples that are associated with this platform (e.g. for meta analysis of the files).
The full sample records can also be retrieved, which makes it possible to access even more metadata:
*)

let relations =
    testPlatform
    |> SOFT.Platform.getAssociatedSamples
    |> List.map (fun x -> x.Relation)
    //showing only the first 5 relations for ease of view
    |> fun x -> x.[0..4]

(*** include-value:relations ***)
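
(**
Coming back to the sample accessions retrieved above, a batch download could be prepared by turning each accession into a download link.
This is only a sketch under assumptions: `toGeoSampleUrl` and `geoSampleUrls` are hypothetical helpers that are not part of BioFSharp, the accessions are assumed to be plain strings such as `GSM...`, and the link uses the `acc.cgi` query parameters (`acc`, `targ`, `form`, `view`) of the GEO accession display.
*)

/// Hypothetical helper (not part of BioFSharp): builds the text-format record URL for one accession
let toGeoSampleUrl (accession: string) =
    sprintf "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=%s&targ=self&form=text&view=full" accession

/// Hypothetical helper (not part of BioFSharp): builds the URLs for a collection of accessions
let geoSampleUrls (accessions: seq<string>) =
    accessions
    |> Seq.map toGeoSampleUrl
    |> Seq.toList

// Usage sketch, assuming `sampleAccessions` from above is a collection of strings:
// let sampleUrls = sampleAccessions |> geoSampleUrls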