Question about extract columns from hmmsearch domain output #82

Open
Xinpeng021001 opened this issue Dec 19, 2024 · 1 comment

@Xinpeng021001

Hi,

Thank you for designing this program! I'm using it to replace HMMER in my Python script, but I have run into a couple of issues:

    import os
    import psutil
    import pyhmmer

    # `self` and `HMMFiles` come from the surrounding class/script (not shown here)
    available_memory = psutil.virtual_memory().available
    target_size = os.stat(self.input_faa).st_size
    hmm_files = HMMFiles(self.hmm_file)
    with open("test_hmmer_result.txt", "wb") as f:
        with pyhmmer.easel.SequenceFile(self.input_faa, digital=True) as seqs:
            if target_size < available_memory * 0.1:
                # Pre-fetch targets into memory when the file is small enough
                targets = seqs.read_block()
            else:
                targets = seqs
            for hits in pyhmmer.hmmsearch(hmm_files, targets, cpus=os.cpu_count(), domE=1e-15):
                hits.write(f, format="domains", header=False)

I'm using hits.write(f, format="domains", header=False) to get the domain table, but I only want the columns [query name, qlen, target name, tlen, i-Evalue (this domain), hmm_from, ali_from], written as a new tab-separated output. I read the documentation, but I'm still unsure how to extract this information.

Could you please also tell me how the top hits are selected? I noticed that the hits.write output can contain two top hits for a single protein accession.

Really appreciate your help!

Best Regards,
Xinpeng

@Xinpeng021001

I think I found a way to do it, but it would be really helpful if you could verify it:

    for hits in pyhmmer.hmmsearch(hmm_files, targets, cpus=os.cpu_count(), domE=1e-15):
        for hit in hits:
            for domain in hit.domains.included:
                ali = domain.alignment
                print(
                    ali.hmm_name, ali.hmm_length, ali.target_name, ali.target_length,
                    domain.i_evalue, ali.hmm_from, ali.hmm_to, ali.target_from, ali.target_to,
                )
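
For the tab-separated output asked about above, here is a minimal sketch of how the printed fields could be turned into rows. The helper names to_text and write_domain_rows and the example values are hypothetical and not part of pyhmmer. The sketch assumes that name fields may come back as bytes (depending on the pyhmmer version) and that printing E-values in scientific notation with two decimals is acceptable.

```python
import csv
import io


def to_text(value):
    """Return a printable string for one column value."""
    if isinstance(value, bytes):
        # Names may be bytes rather than str; decode them for text output.
        return value.decode("utf-8")
    if isinstance(value, float):
        # Assumption: scientific notation with two decimals is precise enough.
        return f"{value:.2e}"
    return str(value)


def write_domain_rows(rows, handle):
    """Write an iterable of column tuples to a text handle, tab-separated."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for row in rows:
        writer.writerow([to_text(value) for value in row])


# Hypothetical values standing in for one domain:
# query name, qlen, target name, tlen, i-Evalue, hmm_from, ali_from
example_rows = [(b"PF00001", 120, b"prot_1", 350, 1.5e-20, 3, 17)]

buffer = io.StringIO()
write_domain_rows(example_rows, buffer)
print(buffer.getvalue(), end="")
```

Inside the search loop, each row would be a tuple of the alignment and domain attributes in the desired column order, written to a file opened in text mode rather than to the in-memory buffer used here.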
