<tool id="istex-teeft" name="Generation of a TEEFT “doc × term” file" version="0.5.1">
<description>from an ISTEX id file</description>
<requirements>
<container type="docker">visatm/istex-metadata</container>
</requirements>
<command><![CDATA[
IstexMetadata.pl -i "$input" -t "$docterm"
]]></command>
<inputs>
<param name="input" type="data" format="txt" label="ISTEX id file" />
</inputs>
<outputs>
<data format="tabular" label="TEEFT “doc × term” from ${on_string}" name="docterm" />
</outputs>
<tests>
<test>
<param name="input" value="istexCorpus.txt" />
<output name="docterm" file="istexTeeft.txt" />
</test>
</tests>
<help><![CDATA[
From the list of ISTEX ids in the input file, this tool extracts the *TEEFT* descriptors (roughly, keywords) of the
corresponding documents and produces a *“doc × term”* file.
-----
**Options**
This programme has just one **mandatory** argument:
+ **ISTEX id filename**
-----
**Input file**
The input file **must** contain a line with the marker **ISTEX** in square brackets,
followed by a list of ids, one per line, each preceded by its id type.
Any content above the **[ISTEX]** line is ignored. Each id
may be followed by a pound sign ('#') and a filename.
Example:
::
[ISTEX]
ark ark:/67375/0T8-VC3W50FL-B # test001
ark ark:/67375/6H6-ZNZ982C4-N # test002
ark ark:/67375/6H6-4NH81FDN-R # test003
ark ark:/67375/6H6-1FSRLFB6-Q # test004
ark ark:/67375/WNG-RHR3302C-7 # test005
ark ark:/67375/WNG-R0P0D6BZ-L # test006
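The format described above can be read with a few lines of Python. This is only a sketch and is not part of IstexMetadata.pl: the function name ``parse_istex_ids`` is hypothetical, and it assumes that every id line has the form “type id”, optionally followed by “# filename”.

```python
def parse_istex_ids(lines):
    """Return (id_type, identifier, filename_or_None) tuples found after the [ISTEX] line."""
    entries = []
    in_section = False
    for raw in lines:
        line = raw.strip()
        if not in_section:
            # Everything above the [ISTEX] marker is ignored.
            in_section = (line == "[ISTEX]")
            continue
        if not line:
            continue
        # An optional filename follows a pound sign.
        body, _, name = line.partition("#")
        fields = body.split()
        if len(fields) != 2:
            continue  # assumption: skip lines that are not "type id"
        entries.append((fields[0], fields[1], name.strip() or None))
    return entries


# Sample lines taken from the example above; the second id has no filename.
sample = [
    "free text, ignored",
    "[ISTEX]",
    "ark ark:/67375/0T8-VC3W50FL-B # test001",
    "ark ark:/67375/6H6-ZNZ982C4-N",
]
print(parse_istex_ids(sample))
```

Entries without a filename come back with ``None`` in the third position, so a caller can decide how to name them.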
-----
**“Doc × term” file**
The “doc × term” file contains a list of “filename” and “descriptor” pairs separated by a tab, with one
pair per line. If the ISTEX id file does not contain filenames, the programme generates
a name of the form “f_nnn_”, where *nnn* is a sequential number.
Example:
::
test002 absolute value
test002 behaviour
test002 birkhoff theorem
test002 boundary conditions
test002 cubic interaction
test002 curvature
test002 different energies
test002 disordered regions
test002 geometrical structure
test002 harmonic behaviour
test002 harmonicity threshold
test002 high energy
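A downstream script can regroup the descriptors by document. The sketch below is illustrative only: ``group_terms`` is a hypothetical helper, the “docB” row is made up, and the code assumes one tab-separated “filename, descriptor” pair per line, as described above.

```python
from collections import defaultdict


def group_terms(lines):
    """Map each filename to the list of its descriptors, in file order."""
    terms = defaultdict(list)
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        # Split on the first tab only: descriptors may contain spaces.
        name, sep, term = line.partition("\t")
        if not sep:
            continue  # assumption: ignore lines without a tab
        terms[name].append(term)
    return dict(terms)


# Two rows from the example above plus one made-up row for a second document.
sample = [
    "test002\tabsolute value\n",
    "test002\tbehaviour\n",
    "docB\tcurvature\n",
]
print(group_terms(sample))
```

With a real output file, the same function can be called on an open file handle, since iterating over a file yields its lines.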
]]></help>
</tool>