The ncc file #1

Open
ivyhzau opened this issue Jul 28, 2017 · 3 comments

Comments

ivyhzau commented Jul 28, 2017

I ran nuc_dynamics on a large file. The program took several days to finish, but it produced no result file and gave no error message. As a test, I ran it on just the first 10,000 lines of the file; that worked and produced the PDB result. I don't know how to solve this.

Owner

tjs23 commented Jul 28, 2017

How many contacts (NCC lines) were in the large input file, what was the smallest scale resolution for the calculation (-s option), and what was the last reported output? Also, is this single-cell data? The nuc_dynamics program has been tested, without issue, on about 500,000 single-cell contacts at 100 kb resolution on a machine with 16 GB RAM. That calculation completed overnight on a 3.5 GHz Xeon E5 CPU.

Author

ivyhzau commented Jul 28, 2017

There were 130 million contacts. The command was as follows:
nuc_dynamics K0.ncc -m 10 -o K0.pdb -s 8 2 1 0.5
The data is not single-cell data. It is traditional Hi-C data from a cell line.

Owner

tjs23 commented Aug 2, 2017

In essence, this software is not designed for traditional, population Hi-C data. As described in the README, it is really only for single-cell Hi-C data. I suspect that 130 million contacts would cause a memory problem with the restraint list. In any case, the structure calculation will not work as well with population Hi-C as with single-cell data, because each bead would be restrained to a large proportion of the others. I will investigate the memory allocation for the restraints and impose a limit (with a user warning) so that the code at least doesn't break.

Nonetheless, to get some kind of averaged structure from population data, I recommend binning the contacts into regions corresponding to the finest resolution (so 0.5 Mb in this case), then selecting the most significant pairs (e.g. those with the largest fold change when comparing the observed count with the expected count at each sequence separation, which is the basic Hi-C normalization) and limiting the selection to 500,000 pairs. An NCC format input file would have to be made, but it need only reflect the chromosomal regions for the binned sections. Also, the "-upper" distance limit for the restraints may have to be increased to accommodate the highly antagonistic contacts.
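
The binning and selection steps described above could be sketched roughly as follows. This is only an illustration, not part of nuc_dynamics: the function name `select_significant_pairs`, its parameters, and the input as plain `(chrom_a, pos_a, chrom_b, pos_b)` tuples are all assumptions, and NCC parsing and writing are left out. As a simplification, the expected count at each sequence separation is taken as the mean over the observed bin pairs at that separation (pooled across chromosomes), and all inter-chromosomal pairs are treated as one group.

```python
from collections import defaultdict


def select_significant_pairs(contacts, bin_size=500_000, max_pairs=500_000):
    """Bin contacts and keep the pairs with the largest observed/expected ratio.

    contacts: iterable of (chrom_a, pos_a, chrom_b, pos_b) tuples
              (hypothetical input layout, not the NCC format itself).
    Returns a list of (chrom_a, start_a, chrom_b, start_b, count, fold_change).
    """
    # Count contacts for each unordered pair of bins.
    counts = defaultdict(int)
    for chrom_a, pos_a, chrom_b, pos_b in contacts:
        bin_a = (chrom_a, pos_a // bin_size)
        bin_b = (chrom_b, pos_b // bin_size)
        if bin_a == bin_b:
            continue  # contacts within one bin carry no restraint information
        if bin_b < bin_a:
            bin_a, bin_b = bin_b, bin_a
        counts[(bin_a, bin_b)] += 1

    def separation(pair):
        # Separation in bins for same-chromosome pairs, None for trans pairs.
        (chrom_a, idx_a), (chrom_b, idx_b) = pair
        return idx_b - idx_a if chrom_a == chrom_b else None

    # Expected count per separation: mean over observed bin pairs (assumption).
    totals = defaultdict(lambda: [0, 0])
    for pair, n in counts.items():
        entry = totals[separation(pair)]
        entry[0] += n
        entry[1] += 1
    expected = {sep: total / num for sep, (total, num) in totals.items()}

    # Rank by fold change, largest first; ties are broken by bin order.
    scored = [(n / expected[separation(pair)], pair, n)
              for pair, n in counts.items()]
    scored.sort(key=lambda item: (-item[0], item[1]))

    selected = []
    for fold, (bin_a, bin_b), n in scored[:max_pairs]:
        selected.append((bin_a[0], bin_a[1] * bin_size,
                         bin_b[0], bin_b[1] * bin_size, n, fold))
    return selected
```

Each returned pair gives the start coordinates of the two binned regions, which is the information an NCC-style input for the binned sections would need. With this simplified expectation, a separation that has only a single observed bin pair always gets a fold change of exactly 1, so a real analysis would likely want a proper Hi-C normalization in its place.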
