Release v1.5.4.0
This bigger release adds a number of useful features to trident, some of them long requested. The highlights are ordered output for forge, a way to preserve key information if forge is applied to a single source package, a new Web-API option to return the content of all available .janno columns, and better error messages for common trident issues.
Order forge output with --ordered
The order of samples in a Poseidon package created with trident forge depends on the order in which the relevant source packages are discovered by trident (e.g. when it crawls for packages in the -d base directories) and then on the sample order within these packages. This mechanism offered no convenient way to set the output order manually.
v1.5.4.0 adds a new option --ordered, which causes trident to output the resulting package with samples ordered according to the selection in -f or --forgeFile. This works through an alternative, slower sample selection algorithm that loops through the list of entities and checks, for each entity, which samples it adds to or removes from the final selection.
For simple, positive selection, packages, groups and samples are added as expected. Negative selection removes samples from the list again. If an entity is selected twice via positive selection, then its first occurrence is considered for the ordering.
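The selection rules described above can be illustrated with a small Python sketch. It is not trident's actual implementation: the Sample and Entity types, the matches helper and the example data are hypothetical and only model the described behaviour (entities are processed in order, positive selections append, negative selections remove, and the first occurrence fixes the position).

```python
from typing import NamedTuple


class Sample(NamedTuple):
    name: str
    group: str
    package: str


class Entity(NamedTuple):
    kind: str             # "package", "group" or "sample" (hypothetical model)
    name: str
    include: bool = True  # False models a negative selection


def matches(entity: Entity, sample: Sample) -> bool:
    """Check whether a sample is covered by a package, group or sample entity."""
    value = {
        "package": sample.package,
        "group": sample.group,
        "sample": sample.name,
    }[entity.kind]
    return value == entity.name


def ordered_selection(entities: list[Entity], samples: list[Sample]) -> list[Sample]:
    """Loop through the entities and build the final selection in entity order."""
    selected: list[Sample] = []
    for entity in entities:
        hits = [s for s in samples if matches(entity, s)]
        if entity.include:
            for s in hits:
                # A sample that is already selected keeps its first position.
                if s not in selected:
                    selected.append(s)
        else:
            # Negative selection removes the matching samples again.
            selected = [s for s in selected if s not in hits]
    return selected


samples = [
    Sample("a1", "A", "P1"),
    Sample("a2", "A", "P1"),
    Sample("b1", "B", "P2"),
]
entities = [
    Entity("sample", "b1"),
    Entity("package", "P1"),
    Entity("sample", "a1", include=False),
    Entity("group", "B"),  # b1 is already selected, so its position is unchanged
]
print([s.name for s in ordered_selection(entities, samples)])  # ['b1', 'a2']
```

In this sketch a sample that is removed and later selected again is appended at the later position; how trident handles that case is not stated here, so treat it as an assumption of the example.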
Preserve the source package in forge with --preservePyml
For the specific task of subsetting a single, existing Poseidon package it can be useful to preserve some fields of the POSEIDON.yml file of the source package, as well as supplementary information in the README.md and the CHANGELOG.md file. These are typically discarded by forge, but can now be copied over to the output package with the new --preservePyml output mode. Naturally this only works with a single source package!
--preservePyml specifically preserves the following POSEIDON.yml fields:
- description
- contributor
- packageVersion
- lastModified
- readmeFile
- changelogFile
Note that this does not include the package title, which can easily be set to be identical to the source with -n or -o if desired. The poseidonVersion field is also not copied, because trident can only ever produce output packages with the latest Poseidon schema version.
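To make the field handling concrete, here is a minimal Python sketch that treats both POSEIDON.yml files as plain dictionaries. The function name preserve_pyml_fields and all example values are hypothetical; the sketch only mirrors the rule that the six listed fields are carried over while the title and poseidonVersion come from the newly created package.

```python
# Fields that --preservePyml carries over, as listed above.
PRESERVED_FIELDS = (
    "description",
    "contributor",
    "packageVersion",
    "lastModified",
    "readmeFile",
    "changelogFile",
)


def preserve_pyml_fields(source_yml: dict, output_yml: dict) -> dict:
    """Return a copy of output_yml with the preserved fields taken from source_yml."""
    merged = dict(output_yml)
    for field in PRESERVED_FIELDS:
        if field in source_yml:
            merged[field] = source_yml[field]
    return merged


# Hypothetical example values, not taken from a real package.
source = {
    "poseidonVersion": "older-schema",
    "title": "MyPac",
    "description": "An example package",
    "packageVersion": "1.2.0",
    "readmeFile": "README.md",
}
fresh = {
    "poseidonVersion": "latest-schema",
    "title": "MyPac2",
    "packageVersion": "0.1.0",
}
result = preserve_pyml_fields(source, fresh)
print(result["title"], result["poseidonVersion"], result["packageVersion"])
```

The title and schema version stay those of the new package, while the description, package version and README reference follow the source.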
While implementing this we clearly separated the different forge output modes (--onlyGeno, --minimal, --preservePyml and the default) and made them mutually exclusive. We did so to avoid an increasingly complex set of interactions between them in the future.
One particular application of --preservePyml is the reordering of samples in an existing Poseidon package MyPac with the new --ordered flag. We suggest the following workflow for this application:
- Generate a --forgeFile with the desired order of the samples in MyPac. This can be done manually or with any suitable tool. Here is an example, where we employ qjanno to generate a forge selection so that the samples are ordered alphabetically by their Poseidon_ID:
qjanno "SELECT '<'||Poseidon_ID||'>' FROM d(MyPac) ORDER BY Poseidon_ID" --raw --noOutHeader > myOrder.txt
- Use trident forge with --ordered and --preservePyml to create the package with the specified order:
trident forge -d MyPac --forgeFile myOrder.txt -o MyPac2 --ordered --preservePyml
- Apply trident rectify to increment the package version number and document the reordering:
trident rectify -d MyPac2 --packageVersion Minor --logText "reordered the samples alphabetically by Poseidon_ID"
MyPac2 then acts as a drop-in replacement for MyPac that only differs in the order of samples (and maybe the order of variables/fields in the POSEIDON.yml, .janno, .ssf or .bib files). This workflow is not as convenient as in-place reordering would be, but it is much safer.
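The forge file for the first step can also be produced without qjanno. The following Python sketch is one possible alternative, assuming the package's .janno file is tab-separated and has a Poseidon_ID column; the function name and the file paths are hypothetical. Python's default string sorting may differ from qjanno's ORDER BY for IDs with non-ASCII characters.

```python
import csv
from pathlib import Path


def write_ordered_forge_file(janno_path: str, forge_path: str) -> list[str]:
    """Write one <Poseidon_ID> entity per line, sorted alphabetically by ID."""
    with open(janno_path, newline="", encoding="utf-8") as handle:
        # Assumption: tab-separated file with a header row, no quoting rules.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        ids = sorted(row["Poseidon_ID"] for row in reader)
    lines = "".join(f"<{poseidon_id}>\n" for poseidon_id in ids)
    Path(forge_path).write_text(lines, encoding="utf-8")
    return ids


# Example call with hypothetical paths:
# write_ordered_forge_file("MyPac/MyPac.janno", "myOrder.txt")
```

The resulting myOrder.txt can then be passed to trident forge with --forgeFile exactly as in the second step of the workflow.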
Request all .janno columns in list and the Web-API
trident list --individuals gives access to per-sample information for Poseidon packages on the command line. With the -j option arbitrary additional columns from the .janno files can be appended to the output. Here, for example, the Country and the Genetic_Sex columns:
trident list -d 2010_RasmussenNature --individuals -j "Country" -j "Genetic_Sex"
.------------.---------------------.----------------------.----------------.-----------.-----------.-------------.
| Individual | Group               | Package              | PackageVersion | Is Latest | Country   | Genetic_Sex |
:============:=====================:======================:================:===========:===========:=============:
| Inuk.SG    | Greenland_Saqqaq.SG | 2010_RasmussenNature | 2.1.1          | True      | Greenland | M           |
'------------'---------------------'----------------------'----------------'-----------'-----------'-------------'
v1.5.4.0 adds a --fullJanno flag to request all columns at once, without having to list them individually with many -j arguments.
This convenience feature was also added to the Web-API, where it can be triggered with ?additionalJannoColumns=ALL on the /individuals endpoint:
https://server.poseidon-adna.org/individuals?additionalJannoColumns=ALL
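For scripted access, the same request can be assembled in Python. This is a minimal sketch using only the standard library; the helper names are hypothetical, and the assumption that the endpoint answers with a JSON document is not confirmed here and should be checked against the actual response.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://server.poseidon-adna.org"


def individuals_url(base_url: str = BASE_URL, all_janno_columns: bool = True) -> str:
    """Build the /individuals URL, optionally requesting all .janno columns."""
    url = f"{base_url}/individuals"
    if all_janno_columns:
        url += "?" + urlencode({"additionalJannoColumns": "ALL"})
    return url


def fetch_individuals(url: str):
    """Download and parse the response (assumption: the body is JSON)."""
    with urlopen(url, timeout=60) as response:
        return json.load(response)


print(individuals_url())
# https://server.poseidon-adna.org/individuals?additionalJannoColumns=ALL
```

Calling fetch_individuals(individuals_url()) performs the actual request; the response for all packages with all columns can be large, so a generous timeout is used.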
Better error messages
In previous trident versions some common error messages were not rendered well on the command line. This particularly concerned errors when parsing command line input, the POSEIDON.yml file or genotype data. We applied multiple changes here to improve the CLI output.
The behaviour of the global trident option --errLength was also changed. It now only truncates genotype data-related messages, but also does so if these are raised on the [Warning] log level. This should make the previously often illegible trident output upon broken genotype data more readable.