mask didn't work in the call2plink workflow #43

Open
Zhenghongc opened this issue Aug 18, 2023 · 1 comment

@Zhenghongc

I tried to use a mask file to exclude some individuals in the call2plink workflow, but the removal did not seem to work, and several other issues came up when using a mask file.

  1. I supplied the one-column mask file below to exclude these individuals (mask_type=sample-label):
    G4968
    G4969
    G4970
    G4971
    G4972
    sheet2fam.py raised an error at line 130 because the mask file does not have a second column:
    indivs[(data[0],data[1])]=data[-1]
    so I modified the code to:
    indivs[(data[0],data[0])]=data[-1]

  2. The PLINK log file shows that a --remove option was added to the command, but no individuals were removed. I have no idea what causes this bug.
    PLINK v1.90b7 64-bit (16 Jan 2023)
    Options in effect:
    --a2-allele emptyZ0ref.txt
    --bed raw.bed
    --bim raw.bim
    --fam raw.fam
    --flip flips.lst
    --make-bed
    --out plink_out
    --remove mask.inds
    4819 people (0 males, 0 females, 4819 ambiguous) loaded from .fam.
    --remove: 4819 people remaining.

  3. The fixfam process marked the masked individuals with 'MSK', so why were they not removed?
    G4969_MSK__ G4969 0 0 1 0
    G4970_MSK__ G4970 0 0 1 0
    G4971_MSK__ G4971 0 0 2 0
    G4972_MSK__ G4972 0 0 2 0
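The issue does not show the contents of mask.inds, so the log alone cannot confirm the cause. One hypothesis worth checking: PLINK 1.9 --remove matches each line on the (FID, IID) pair in its first two columns, and the fixfam excerpt in item 3 shows the masked samples with a rewritten FID (for example G4969_MSK__). If that is the .fam PLINK reads and mask.inds still holds the original pair (G4969 G4969), no line would match, which would be consistent with "4819 people remaining". The Python sketch below lists --remove entries that have no exact match in a .fam file; the function names and the example mask.inds lines are hypothetical and not part of call2plink.

```python
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # (FID, IID), the key PLINK 1.9 uses for --keep/--remove


def first_two_columns(lines: Iterable[str]) -> List[Pair]:
    """Take the whitespace-delimited FID and IID from each line with at least two fields."""
    pairs: List[Pair] = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            pairs.append((fields[0], fields[1]))
    return pairs


def unmatched_remove_entries(fam_lines: Iterable[str],
                             remove_lines: Iterable[str]) -> List[Pair]:
    """Return --remove entries with no exact (FID, IID) match in the .fam lines."""
    present: Set[Pair] = set(first_two_columns(fam_lines))
    return [pair for pair in first_two_columns(remove_lines) if pair not in present]


if __name__ == "__main__":
    # .fam lines copied from the fixfam output quoted in the issue.
    fam = ["G4969_MSK__ G4969 0 0 1 0", "G4970_MSK__ G4970 0 0 1 0"]
    # Hypothetical mask.inds contents: one original pair, one rewritten pair.
    remove = ["G4969 G4969", "G4970_MSK__ G4970"]
    print(unmatched_remove_entries(fam, remove))  # [('G4969', 'G4969')]
```

On real files, open file objects can be passed directly, e.g. with open("raw.fam") as fam, open("mask.inds") as rm: print(unmatched_remove_entries(fam, rm)). A non-empty result for every masked sample would point to an ID mismatch rather than a problem with --remove itself.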

@Zhenghongc (issue author)

I checked the scripts in call2plink/bin and have solved the issues:

  1. In sheet2fam.py, indivs[(data[0],data[0])]=data[-1] no longer raises the error.
  2. In list_error_inds.py, line 28 should be return (data,data) so that the sample ID is extracted correctly when the mask file has only one column.
  3. In sheet2fam.py, line 214 checks if (ofid,oiid) != (fid,real_id): and, when that is true, writes (ofid,oiid,fid,real_id) to wrn. I added a continue to skip that write. This may affect the result if the replicate parameter is set.
  4. An additional finding: a mask file in CSV format causes an error when reading the first line. I have not worked out why.
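Items 1, 2 and 4 all come down to how the mask file is parsed. As a hedged sketch, not call2plink's own code, a reader that accepts one or two columns and either whitespace or comma delimiters could look like the Python below. The function name read_mask is hypothetical, and stripping a UTF-8 byte-order mark is only a guess at what might break the first line of a CSV exported from a spreadsheet; the issue does not identify the actual cause.

```python
import csv
from typing import List, Tuple


def read_mask(text: str) -> List[Tuple[str, str]]:
    """Return (FID, IID) keys from mask-file text.

    Assumptions, not taken from call2plink's own scripts:
    - a leading UTF-8 byte-order mark is dropped;
    - a line containing a comma is parsed as CSV, otherwise split on whitespace;
    - a one-column line is keyed as (id, id), like the indivs[(data[0],data[0])] fix;
    - blank lines and empty fields are ignored.
    """
    keys: List[Tuple[str, str]] = []
    for raw in text.lstrip("\ufeff").splitlines():
        line = raw.strip()
        if not line:
            continue
        if "," in line:
            fields = next(csv.reader([line]))
        else:
            fields = line.split()
        fields = [f.strip() for f in fields if f.strip()]
        if not fields:
            continue
        # One column: reuse the sample label as both FID and IID.
        fid = fields[0]
        iid = fields[1] if len(fields) > 1 else fields[0]
        keys.append((fid, iid))
    return keys


if __name__ == "__main__":
    print(read_mask("\ufeffG4968\nG4969,G4969\n"))  # [('G4968', 'G4968'), ('G4969', 'G4969')]
```

A CSV with a header row would still have that header read as a sample ID here, so such a line would need to be skipped explicitly. When reading from disk, opening the file with encoding="utf-8-sig" is an alternative way to drop a byte-order mark.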

I hope this helps others who want to use the call2plink workflow. Thanks to h3abionet for the great work.
