-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expand file tokenisation #53
Expand file tokenisation #53
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@LibraChris looks good, but i think we should separate the layers a bit cleaner.
- The API in
FileSystem
should provide a function that simply tokenizes a single URI (e.g. signatureSystem.Uri -> Cvparam
) - That weay, the calling function can decide on the type of uris it provides, and we do not need to provide one function for paths and files each here.
- I like the idea of "traversing" the URI segments, and i think all these match cases could be unified to a single one
- These changes definately need some tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
after some small naming tweaks and added tests, lgtm
src/ARCTokenization/Tokenization.fs
Outdated
@@ -158,3 +159,40 @@ module Tokenization = | |||
at.Columns | |||
|> Array.map CompositeColumn.tokenize | |||
|> List.ofArray | |||
|
|||
module SpecificTokens = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename SpecificTokens -> ArcFileSystem
src/ARCTokenization/FileSystem.fs
Outdated
|
||
|
||
let private normalisePath (path:string) = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use internal
instead of private
src/ARCTokenization/Tokenization.fs
Outdated
| _ -> StructuralOntology.AFSO.``File Path`` |> fun t -> CvParam(t,relativePath) | ||
|
||
/// Gets CvParams based on the root path, file system type, and full path | ||
let getSpecificTokens (rootPath:string) (pType:PType) (path:string) = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename getSpecificTokens
-> getArcFileSystemTokens
src/ARCTokenization/Tokenization.fs
Outdated
| Directory | ||
|
||
/// Matches a CvParam based on the relative path and file system type | ||
let matchFileSystemTokenByRelativePath (pType:PType) (relativePath: string) = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename matchFileSystemTokenByRelativePath
-> convertRelativePath
Add .gitkeep to arcStructureTests in oder to track folder
@kMutagene |
I would suggest by adding the root path as argument |
- Rework MetadataSheet parsers by using the new Tokens - Add parseProcessGraphColumnsFromToken and parseProcessGraphColumnsFromTokens
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice how this comes together now.
I have a few naming nitpicks. Also, you should adapt the xml documentation for the new token-based top level parsers. Most annotations for the token arguments are
/// he path to the xlsx file
Which is not true.
@@ -9,54 +9,68 @@ module internal ISA = | |||
|
|||
open System.IO | |||
|
|||
let tryParseMetadataSheetFromToken (isaFileName: string) (isaMdsParsingF: string -> IParam list) (absFileToken: IParam) = | |||
let tryParseMetadataSheetFromToken (rootPath:string) (isaCvTerm: CvTerm) (isaMdsParsingF: string -> IParam list) (absFileToken: IParam) = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: rename absFileToken -> relFileToken, since we ar enow tokenizing the farc file system as relative paths by convention
with _ -> | ||
None | ||
else None | ||
|
||
let parseMetadataSheetsFromTokens (isaFileName: string) (isaMdsParsingF: string -> IParam list) (absFileTokens: #IParam seq) = | ||
let parseMetadataSheetsFromTokens (rootPath:string) (isaCvTerm: CvTerm) (isaMdsParsingF: string -> IParam list) (absFileTokens: #IParam seq) = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
absFileToken -> relFileToken
/// </summary> | ||
/// <param name="rootPath">ARC root path</param> | ||
/// <param name="absFileToken">he path to the study xlsx file</param> | ||
let parseProcessGraphColumnsFromToken (rootPath:string) (absFileToken: IParam) = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
absFileToken -> relFileToken
/// <param name="rootPath">ARC root path</param> | ||
/// <param name="isaCvTerm">The term to which the token should be found</param> | ||
/// <param name="absFileToken">he path to the study xlsx file</param> | ||
let parseProcessGraphColumnsFromTokens (rootPath:string) (isaCvTerm: CvTerm) (absFileTokens: #IParam seq) = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
absFileTokens -> relFileTokens
/// </summary> | ||
/// <param name="rootPath">ARC root path</param> | ||
/// <param name="absFileToken">he path to the study xlsx file</param> | ||
static member parseProcessGraphColumnsFromToken (rootPath:string) (absFileToken: IParam) = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
absFileToken -> relFileToken
/// </summary> | ||
/// <param name="rootPath">ARC root path</param> | ||
/// <param name="absFileToken">he path to the xlsx file</param> | ||
static member parseProcessGraphColumnsFromTokens (rootPath:string) (isaCsvTerm) (absFileTokens: #IParam seq) = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
absFileTokens -> relFileTokens
/// <param name="rootPath">ARC root path</param> | ||
/// <param name="absFileToken">he path to the study xlsx file</param> | ||
static member parseProcessGraphColumnsFromToken (rootPath:string) (absFileToken: IParam) = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
absFileToken -> relFileToken
/// </summary> | ||
/// <param name="rootPath">ARC root path</param> | ||
/// <param name="absFileToken">he path to the xlsx file</param> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
absFileTokens -> relFileTokens
Adress the requested Changes
| true -> Some (parseProcessGraphColumnsFromToken rootPath token) | ||
| false -> None | ||
) | ||
|> fun x -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these functions should not fail but just return an empty sequence (maybe additionally printing warnings). Imagine if this is used in a validation package. It would lead to the parsing failing, without any validation being performed.
refFileTokens | ||
|> Seq.choose (fun token -> tryParseMetadataSheetFromToken rootPath isaCvTerm isaMdsParsingF token) | ||
|
||
let parseProcessGraphColumnsFromToken (rootPath:string) (refFileToken: IParam) = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
refFileToken
can you elaborate on that naming? i'd have gone with relFileToken
or relativeFileToken
let parseProcessGraphColumnsFromTokens (rootPath:string) (isaCvTerm: CvTerm) (refFileTokens: #IParam seq) = | ||
refFileTokens | ||
|> Seq.choose (fun token -> | ||
match token.Name = isaCvTerm.Name with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we can do a full match here. There are terms out there with the same name from different ontologies.
/// <param name="rootPath">ARC root path</param> | ||
/// <param name="refFileToken">IParam seq of the ARC Tokens</param> | ||
static member parseProcessGraphColumnsFromTokens (rootPath:string) (refFileTokens: #IParam seq) = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The xml comments still do not help that much.
IParam seq of the ARC Tokens
says nothing to me.
Also
Returns a seq of annotation tables from an IParam seq if the given tokens contains a filepath with the standard assay file name ("isa.assay.xlsx")
is only technically true, as we now match the term. So i would suggest something like this:
Returns a seq of annotation tables from an IParam seq for each contained token that is annotated with the term 'Assay File' (e.g, files in the ARC named
isa.assay.xlsx
).
and for refFileTokens (typo there as well)
a seq of IParams that may contain relevant tokens
) | ||
|> fun x -> | ||
match Seq.length x with | ||
| 0 -> failwith "No token found" | ||
| 0 -> Seq.empty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
convoluted way of just returning x :D
Expand the tokenisation of files and directories as described in #52.