Old files archive are not correct #58

Open
SantaMcCloud opened this issue Sep 7, 2024 · 2 comments

Comments

@SantaMcCloud

Hello,

Sorry for raising this here; I could not find an email address for contacting the CAMI staff. I am currently working on my bachelor thesis, which includes building a workflow on the web server https://usegalaxy.eu/, which offers many different bioinformatics tools. Since AMBER is available there now, I need some benchmarks to test the workflow, and I discovered that you provide the old archives such as CAMI low, mouse gut toy, etc. I have worked with the CAMI low and the mouse gut toy low archives, but I also want to test the high and medium archives, and that is where the problem lies. I downloaded both tarballs [from http://gigadb.org/dataset/100344] and extracted them, but they contain only the samples and no other files, although the gsa and binning files should be there as well. Is it possible to fix this, or is there another source where the correct tarballs can be downloaded?

This would be a great help. Thank you in advance, and again, I'm sorry if this is the wrong place for this topic!
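For anyone wanting to run the same check without unpacking a large archive, the sketch below lists the members of a tarball and reports which ones mention `gsa` or `binning` in their names. The helper name `find_members` and the keyword list are illustrative assumptions, not part of any CAMI tooling; the actual file names inside the GigaDB tarballs may follow a different pattern.

```python
import tarfile


def find_members(tar_path, keywords=("gsa", "binning")):
    """Return {keyword: [member names containing that keyword]} for a tarball.

    Hypothetical helper: the keywords are a guess at how the gold standard
    files are named, so adjust them to the archive at hand.
    """
    hits = {kw: [] for kw in keywords}
    # "r:*" lets tarfile detect the compression (gzip, bz2, xz or none).
    with tarfile.open(tar_path, "r:*") as tar:
        for member in tar.getmembers():
            name = member.name.lower()
            for kw in keywords:
                if kw in name:
                    hits[kw].append(member.name)
    return hits
```

Calling `find_members("path/to/archive.tar.gz")` on each downloaded tarball (the path is a placeholder) returns empty lists when only the sample files are present, which matches the situation described above.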

@SantaMcCloud changed the title from "Old files archive not are not correct" to "Old files archive are not correct" on Sep 8, 2024
@fernandomeyer
Contributor

You can download the binning gold standards for the Medium and High pooled assemblies here:
Medium: https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_I_MEDIUM/pooled_gsa_mapping.binning.tsv
High: https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_I_HIGH/gsa_mapping_pool.binning
Other files are available, as listed in the description of each dataset at https://data.cami-challenge.org/participate. The camiClient.jar can sometimes be useful for listing and downloading the available files.
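As a rough sketch, the two gold standard files linked above could be fetched with Python's standard library as follows. The URLs are the ones given in this comment; the function names, the `label_` prefix on local file names, and the destination directory are assumptions made for illustration. The server may be unavailable or may restrict access, in which case `urlretrieve` raises an error.

```python
import os
import urllib.request
from urllib.parse import urlparse

# URLs copied from the comment above.
GOLD_STANDARD_URLS = {
    "medium": "https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_I_MEDIUM/pooled_gsa_mapping.binning.tsv",
    "high": "https://openstack.cebitec.uni-bielefeld.de:8080/swift/v1/CAMI_I_HIGH/gsa_mapping_pool.binning",
}


def local_name(label, url):
    """Build a local file name such as 'high_gsa_mapping_pool.binning'."""
    return f"{label}_{os.path.basename(urlparse(url).path)}"


def download_all(dest_dir="."):
    """Fetch every gold standard into dest_dir and return the local paths."""
    os.makedirs(dest_dir, exist_ok=True)
    paths = []
    for label, url in GOLD_STANDARD_URLS.items():
        target = os.path.join(dest_dir, local_name(label, url))
        # Raises urllib.error.URLError (or HTTPError) if the request fails.
        urllib.request.urlretrieve(url, target)
        paths.append(target)
    return paths
```

Running `download_all("gold_standards")` would place both files in a `gold_standards` directory (a hypothetical name), ready to be passed to AMBER as gold standard binnings.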

@SantaMcCloud
Author

Yes, this helped, thank you, but there is a problem with the high dataset. The reads and the binning files of the sample do not have matching IDs. I don't know whether this happens only because the reads were downloaded from GigaDB and not from the OpenStack server. I tried to download them from there, but I don't have access, at least not for the first sample; I did not try the others.
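A mismatch like this can be quantified with a small script. The sketch below is only an illustration under several assumptions: the binning file is tab-separated with the sequence ID in the first column and header lines starting with `@` or `#`, the reads are in plain 4-line FASTQ, and mate suffixes such as `/1` and `/2` should be ignored. It is only meaningful when the binning file is meant to label reads rather than contigs. All function names are hypothetical.

```python
def binning_ids(lines):
    """Collect first-column sequence IDs, skipping '@'/'#' header lines."""
    ids = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith(("@", "#")):
            continue
        ids.add(line.split("\t")[0])
    return ids


def fastq_ids(lines, strip_mate_suffix=True):
    """Collect read IDs from the header line of each 4-line FASTQ record."""
    ids = set()
    for i, line in enumerate(lines):
        if i % 4 != 0:  # only the first line of each record is a header
            continue
        tokens = line.strip().split()
        if not tokens:
            continue
        read_id = tokens[0]
        if read_id.startswith("@"):
            read_id = read_id[1:]
        if strip_mate_suffix and read_id.endswith(("/1", "/2")):
            read_id = read_id[:-2]
        ids.add(read_id)
    return ids


def compare_ids(read_ids, bin_ids):
    """Count IDs shared between the two sets and IDs unique to each."""
    return {
        "shared": len(read_ids & bin_ids),
        "only_in_reads": len(read_ids - bin_ids),
        "only_in_binning": len(bin_ids - read_ids),
    }
```

Both readers accept any iterable of text lines, so a gzipped file can be passed via `gzip.open(path, "rt")`. A `shared` count of zero would suggest that the two files use different ID schemes altogether, while a partial overlap would point to a subset or version difference.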

Then I tried switching to the CAMI2 Toy set, which has results in these repositories, but the archive in the dataset directory is missing some 'tar.gz' files. For example, sample_3 contains only the contigs, and the reads are missing. Since there is a 'README.txt' file for every file, I assume the missing files should be in there? Would it be possible to make the missing files accessible?

Sorry for these kinds of questions, but if there is no way to update this archive or to fix the mismatched sequence IDs in the CAMI high dataset, just let me know!

Thank you in advance!
