Code for "The battle of the sexes is highly polygenic"

harpak-lab/SDS_humans

The battle of the sexes in humans is highly polygenic

Jared M. Cole, Carly B. Scott, Mackenzie M. Johnson, Peter R. Golightly, Jedidiah Carlson, Matthew J. Ming, Arbel Harpak, & Mark Kirkpatrick

Welcome! Here you can find the code to reproduce the analyses of "The battle of the sexes in humans is highly polygenic".

Overview:

  1. All data generated from this project can be found at Zenodo: DOI
  2. Code used to perform all analyses and plotting can be found in the scripts directory.
  3. The following software was used:
  4. The following R packages were used:
    • optparse
    • dplyr
    • tidyverse
    • ggplot2
    • ggExtra
    • cowplot
    • scales
    • data.table
    • qqman
    • abc

Data documentation

Detailed descriptions of all data files in the Zenodo repository are given in the file Data_file_descriptions.txt.

Script documentation

Scripts are numbered roughly in the order in which the corresponding analyses appear in the text. The main scripts (in both R and bash, numbered 1-10) call several unnumbered subscripts to perform various tasks (see the scripts directory). ML_functions.R contains most of the functions used across multiple scripts, including the likelihood functions.

1.Simulate_SAS.R

Simulates data and fits the likelihood to estimate selection coefficients on the simulated data. Requires the following data files:

  • UKB_mafs_imputed_filtered.txt
  • UKB_r2_genotyped_filtered.txt
  • UKB_mafs_r2_genotyped.txt
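
The simulate-then-fit idea can be illustrated with a small Python sketch. This is not the repository's R code: the toy model (selection shifts the allele frequency in opposite directions in the two sexes), the function names, and all parameter values are assumptions made for illustration. The actual likelihood functions are in ML_functions.R.

```python
import math
import random

def sex_specific_freqs(p, s):
    """Toy model (an assumption): selection shifts the allele frequency
    up by s*p*(1-p) in males and down by the same amount in females."""
    delta = s * p * (1.0 - p)
    return p + delta, p - delta

def simulate_counts(p, s, n_male, n_female, rng):
    """Draw counts of the focal allele among sampled haplotypes of each sex."""
    p_m, p_f = sex_specific_freqs(p, s)
    k_m = sum(rng.random() < p_m for _ in range(n_male))
    k_f = sum(rng.random() < p_f for _ in range(n_female))
    return k_m, k_f

def log_likelihood(s, p, k_m, n_male, k_f, n_female):
    """Binomial log-likelihood of the observed counts (constant terms dropped)."""
    p_m, p_f = sex_specific_freqs(p, s)
    return (k_m * math.log(p_m) + (n_male - k_m) * math.log(1.0 - p_m)
            + k_f * math.log(p_f) + (n_female - k_f) * math.log(1.0 - p_f))

def fit_s(p, k_m, n_male, k_f, n_female, grid_size=401, s_max=0.2):
    """Grid-search maximum-likelihood estimate of s on [-s_max, s_max]."""
    grid = [-s_max + 2.0 * s_max * i / (grid_size - 1) for i in range(grid_size)]
    return max(grid, key=lambda s: log_likelihood(s, p, k_m, n_male, k_f, n_female))

# Simulate one site with a known coefficient, then try to recover it.
rng = random.Random(1)
k_m, k_f = simulate_counts(p=0.3, s=0.05, n_male=20000, n_female=20000, rng=rng)
s_hat = fit_s(0.3, k_m, 20000, k_f, 20000)
```

Comparing `s_hat` with the simulated value over many replicate sites gives a sense of how much sampling noise a single-site estimate carries at a given sample size.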

2.UKB_data_extraction.sh

Set of bash commands to extract the relevant haplotype and imputation data from UKB (BGEN files), as well as the relevant metadata fields (as a text file).

3.Sample_level_QC.R

R script that conducts sample-level quality control (QC) steps using metadata (e.g., missingness, relatedness).

4.PLINK_site_filtering_steps_and_processing.sh

Bash commands (using several intermediary scripts) that conduct site-level QC on the UKB genomic data (e.g., MAF filtering, genotype quality, regions homologous to sex chromosome regions). Outputs filtered haplotype data.

Filtering is done using PLINK 1.9 and PLINK 2.0.

This script is used in conjunction with Site_marker_QC_steps.R below.

4.Site_marker_QC_steps.R

R script that performs additional site-level QC (e.g., removing sites with excess heterozygosity, assessing missingness between males and females).

This script is used in conjunction with 4.PLINK_site_filtering_steps_and_processing.sh above.
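
One common way to express an excess-heterozygosity check is through the inbreeding coefficient F = 1 - H_obs / H_exp, where H_exp = 2p(1-p) under Hardy-Weinberg proportions. The Python sketch below illustrates that general idea only. The function names and the cutoff value are hypothetical, and the README does not state which statistic or threshold Site_marker_QC_steps.R uses.

```python
def inbreeding_coefficient(n_hom_ref, n_het, n_hom_alt):
    """Return F = 1 - H_obs / H_exp for one biallelic site.

    Negative values indicate more heterozygotes than expected under
    Hardy-Weinberg proportions.
    """
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        raise ValueError("no genotyped samples at this site")
    p = (2 * n_hom_alt + n_het) / (2 * n)   # alternate-allele frequency
    expected_het = 2.0 * p * (1.0 - p)
    if expected_het == 0.0:
        raise ValueError("site is monomorphic; F is undefined")
    observed_het = n_het / n
    return 1.0 - observed_het / expected_het

def has_excess_heterozygosity(n_hom_ref, n_het, n_hom_alt, f_min=-0.1):
    """Flag a site whose F falls below f_min (a hypothetical cutoff)."""
    return inbreeding_coefficient(n_hom_ref, n_het, n_hom_alt) < f_min
```

A site with genotype counts in exact Hardy-Weinberg proportions gives F = 0 and is kept, while a site where every sample is heterozygous gives F = -1 and is flagged.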

5.Count_haplotypes.sh

Bash commands (using several intermediary scripts) to count haplotypes from the filtered output generated by 4.PLINK_site_filtering_steps_and_processing.sh.

6.Run_likelihood_analyses_with_bootstrapping.sh

Bash commands (using several intermediary R and bash scripts) to run the likelihood analyses and perform bootstrapping.
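
For readers unfamiliar with bootstrapping, here is a minimal Python sketch of a percentile bootstrap confidence interval. It is a generic illustration, not the repository's procedure: the README does not say what unit is resampled (sites, blocks, or individuals) or which interval type is used, so the function name, the defaults, and the choice of statistic are all assumptions.

```python
import random
import statistics

def bootstrap_ci(values, statistic=statistics.fmean, n_boot=1000,
                 alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for statistic(values).

    Resamples the values with replacement n_boot times and returns the
    empirical alpha/2 and 1 - alpha/2 quantiles of the resampled statistic.
    """
    rng = random.Random(seed)
    n = len(values)
    estimates = sorted(
        statistic(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    lower = estimates[int((alpha / 2.0) * n_boot)]
    upper = estimates[int((1.0 - alpha / 2.0) * n_boot) - 1]
    return lower, upper

# Example: interval for the mean of some hypothetical per-site estimates.
per_site_estimates = [0.01, -0.02, 0.03, 0.00, 0.02, -0.01, 0.04, 0.01]
ci_low, ci_high = bootstrap_ci(per_site_estimates)
```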

7.Run_SMA.sh

Bash commands for running the Standard Major Axis regression on 27 traits.
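
Standard major axis (SMA) regression fits a line whose slope is the ratio of the two standard deviations, signed by the correlation: slope = sign(r) * sd(y) / sd(x), with the line passing through the point of means. The Python sketch below shows that formula on hypothetical paired values. It does not reproduce the repository's pipeline, and the variable names are assumptions.

```python
import math

def sma_regression(x, y):
    """Return (slope, intercept) of the standard major axis line of y on x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0.0 or syy == 0.0 or sxy == 0.0:
        raise ValueError("SMA slope is undefined for zero variance or covariance")
    # Magnitude is sd(y)/sd(x); the sign follows the covariance.
    slope = math.copysign(math.sqrt(syy / sxx), sxy)
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

Unlike ordinary least squares, the SMA line is symmetric in the two variables: swapping x and y inverts the slope, which is why it is often preferred when both variables are measured with error.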

8.Run_ABC.R

R script used to perform Approximate Bayesian Computation. Analysis implemented by Carly B. Scott.
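
The simplest form of Approximate Bayesian Computation is rejection sampling: draw a parameter from the prior, simulate data, and keep the draw if a summary of the simulated data is close to the observed summary. The Python sketch below illustrates that scheme on a toy problem (inferring a proportion). The summary statistic, prior, tolerance, and function names are all assumptions, and the sketch does not use or mirror the R abc package listed above.

```python
import random
import statistics

def abc_rejection(observed, simulate, sample_prior, n_draws, tolerance, rng):
    """Keep prior draws whose simulated summary lies within tolerance of observed."""
    accepted = []
    for _ in range(n_draws):
        theta = sample_prior(rng)
        if abs(simulate(theta, rng) - observed) <= tolerance:
            accepted.append(theta)
    return accepted

def simulate_fraction(theta, rng, n_trials=200):
    """Toy simulator: fraction of successes in n_trials Bernoulli(theta) draws."""
    return sum(rng.random() < theta for _ in range(n_trials)) / n_trials

rng = random.Random(42)
posterior = abc_rejection(
    observed=0.30,                      # hypothetical observed summary
    simulate=simulate_fraction,
    sample_prior=lambda r: r.random(),  # uniform prior on [0, 1]
    n_draws=5000,
    tolerance=0.02,
    rng=rng,
)
posterior_mean = statistics.fmean(posterior)
```

Tightening `tolerance` makes the accepted draws a better approximation of the posterior at the cost of a lower acceptance rate.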

9.UKB_main_supp_analyses.R

R script to run statistical analyses on data generated by 6.Run_likelihood_analyses_with_bootstrapping.sh (e.g., Mann-Whitney U tests, chi-square tests), covering Figs 1C-3. Includes all analyses presented in the main text and in the supplement.

9.UKB_main_supp_plots.R

R script to perform all plotting in the main text and the supplement.
