Merge Antibody and Interface Features R scripts #1

Open: wants to merge 56 commits into base: master
0ddb7b3
initial copy of scripts. No changes yet made to them.
jadolfbr Jun 14, 2016
58264ce
Fix all interface scripts. Remove some non-useful plots that I have …
jadolfbr Jun 14, 2016
4e6c58a
Fix all antibody scripts.
jadolfbr Jun 15, 2016
dd2a3bc
add script for local installation from dunbrack repo for now.
jadolfbr Jun 15, 2016
edefc3e
add a script to run the features by passing a configuration file.
jadolfbr Jun 15, 2016
d029098
Update README.md
jadolfbr Jun 15, 2016
cca80cf
re-implement char_as_factor in query_sample_sources.
jadolfbr Jun 15, 2016
79f3459
Merge branch 'master' of github.com:DunbrackLab/RosettaFeatures
jadolfbr Jun 15, 2016
f0a3ee4
fixes.
jadolfbr Jun 16, 2016
b3afa42
fixes.
jadolfbr Jun 16, 2016
e6bef23
fixes.
jadolfbr Jun 16, 2016
683d04c
fixes.
jadolfbr Jun 16, 2016
d7df97a
fixes.
jadolfbr Jun 16, 2016
12d1c24
fixes.
jadolfbr Jun 16, 2016
1b4d17b
fixes.
jadolfbr Jun 16, 2016
e69a250
fixes.
jadolfbr Jun 16, 2016
58758ea
remove char_as_factor
jadolfbr Jun 16, 2016
a131d5a
comment out cdr4 option or now, until it can be set via json.
jadolfbr Jun 16, 2016
4fba537
changes to local install. Please update .Renviron with your path-to-r…
Jun 28, 2016
912b36c
update gitignore. remove Rprofile and Renviron from PR.
jadolfbr Sep 14, 2016
894a0bc
attempt ordering fix by using adply instead of ddply.
jadolfbr Sep 14, 2016
84a02e1
print features for now during debugging.
jadolfbr Sep 14, 2016
9e1b752
print features for now during debugging.
jadolfbr Sep 14, 2016
42f99af
print features for now during debugging.
jadolfbr Sep 14, 2016
c85723a
print features for now during debugging.
jadolfbr Sep 14, 2016
2407512
print features for now during debugging.
jadolfbr Sep 14, 2016
cb7035d
print features for now during debugging.
jadolfbr Sep 14, 2016
fddc323
print features for now during debugging.
jadolfbr Sep 14, 2016
b2d93c0
print features for now during debugging.
jadolfbr Sep 14, 2016
06299bb
remove debugging info.
jadolfbr Sep 14, 2016
50e23c0
make abbreviations less severe. I wish this was automatic.
jadolfbr Sep 14, 2016
ba956a9
increase length for abbreviations.
jadolfbr Sep 14, 2016
0377b85
fix sasa plots.
jadolfbr Sep 14, 2016
c596146
debugging interface hbond scripts.
jadolfbr Jan 18, 2017
920a6a2
fix a bunch more plots with ggplot2 upgrade. fun stuff.
jadolfbr Jan 19, 2017
f11e4e9
fix more crap that the new ggplot2 broke.
jadolfbr Jan 19, 2017
e015f63
fix more crap that the new ggplot2 broke.
jadolfbr Jan 19, 2017
5357ee7
more fixes.
jadolfbr Jan 19, 2017
1d41ab4
some trimming to scripts. Better filtering of low data.
jadolfbr Feb 6, 2018
1b5054d
remove some unused interface features scripts. fix up.
jadolfbr Feb 6, 2018
f853f3b
finish fixing features scripts.
jadolfbr Feb 6, 2018
72274f3
testing.
jadolfbr Feb 6, 2018
fd6d311
fix top n percent plots to do ddplyr to correctly select top models.
jadolfbr Feb 6, 2018
55cc8db
fix top n percent plots to do ddplyr to correctly select top models.
jadolfbr Feb 6, 2018
cbb1238
fix top n percent plots to do ddplyr to correctly select top models.
jadolfbr Feb 6, 2018
acb4a28
fix top n percent plots to do ddplyr to correctly select top models.
jadolfbr Feb 6, 2018
2b27ece
fix top n percent plots to do ddplyr to correctly select top models.
jadolfbr Feb 6, 2018
6f0b3a9
fix top n percent plots to do ddplyr to correctly select top models.
jadolfbr Feb 6, 2018
b00c382
remove lm plot from dG vs.
jadolfbr Feb 8, 2018
e677412
add plots for total score testing by native.
jadolfbr Feb 8, 2018
e370958
add plots for total score testing by native.
jadolfbr Feb 8, 2018
b9414f9
add native plots.
jadolfbr Feb 8, 2018
1b12257
add native plots.
jadolfbr Feb 8, 2018
d03a9ab
add native plots.
jadolfbr Feb 8, 2018
6707544
add native plots.
jadolfbr Feb 8, 2018
036a1e5
add native plots.
jadolfbr Feb 8, 2018
2 changes: 2 additions & 0 deletions .gitignore
@@ -10,3 +10,5 @@ inst/doc
*.Rds
Rplots.pdf
*~
.Rprofile
.Renviron
3 changes: 2 additions & 1 deletion DESCRIPTION
@@ -5,7 +5,7 @@ Title: Tools for analyzing macromolecular feature distributions with Rosetta
Description: A Stela is a slab, such as the Rosetta stone, that
illustrates small symbols or diagrams showing rules or
patterns. This package supports the analysis of molecular energy
function by comparing distributions of local geometric features,
function or other molecular characteristics by comparing distributions of local geometric features,
often obtained from native or Rosetta-simulated macromolecular
conformations.
Authors@R: person("Matthew", "O'Meara", email = "[email protected]",
@@ -27,6 +27,7 @@ Imports:
optparse,
reshape2,
proto,
grid,
ggplot2 (>= 1.0.1),
RSQLite,
logspline,
2 changes: 0 additions & 2 deletions R/compare_sample_sources.R
@@ -8,8 +8,6 @@
# (c) For more information, see http://www.rosettacommons.org. Questions about this can be
# (c) addressed to University of Washington UW TechTransfer, email: [email protected].



load_config_file <- function(config_filename, verbose=F){
if(!file.exists(config_filename)){
cat("ERROR: Config file '", config_filename, "' does not exist.\n", sep="")
4 changes: 2 additions & 2 deletions R/support-ggplot2_geom_indicator.R
@@ -114,12 +114,12 @@ GeomIndicator <- ggplot2::ggproto(
size <- data$size[1]
level <- data$group[1] - 1

textGrob(
grid::textGrob(
indicator_display_value,
unit(xpos, "npc"),
unit(ypos, "npc") - unit(level, "line"),
just=c(xjust, yjust),
gp=gpar(
gp=grid::gpar(
col=alpha(data$colour[1], data$alpha[1]),
fontsize=size*12/5,
fontfamily=data$family[1],
20 changes: 19 additions & 1 deletion R/support-query_sample_sources.R
@@ -27,7 +27,7 @@ query_sample_sources <- function(
tryCatch(sele,error=function(e){
cat("ERROR: The select statement is not defined.\n")
})
features <- plyr::ddply(sample_sources, c("sample_source"), function(ss){
features <- plyr::adply(sample_sources, 1, function(ss){
tryCatch(c(ss),error=function(e){
cat("ERROR: The specified sample source is not defined.\n")
})
@@ -62,6 +62,15 @@ query_sample_sources <- function(
cat("WARNING: The following query returned no rows:\n")
cat(sele)
}


#if(char_as_factor){
# for(col in names(features)){
# if(is.character(features[,col])){
# features[,col] <- factor(features[,col])
# }
# }
#}
features
}

@@ -139,6 +148,15 @@ In the returned data.frame there will be the following columns:
cat(sele)
return(features)
}


#if(char_as_factor){
# for(col in names(features)){
# if(is.character(features[,col])){
# features[,col] <- factor(features[,col])
# }
# }
#}
data.frame(
ref_sample_source = factor(ref_ss$sample_source[1]),
new_sample_source = factor(features$sample_source),
1 change: 1 addition & 0 deletions R/support-save_plots.R
@@ -10,6 +10,7 @@

# Save the last ggplot() object created. For each output format,
# generate a plot and put in the output directory

#' @export
save_plots <- function(
features_analysis,
10 changes: 9 additions & 1 deletion README.md
@@ -37,6 +37,13 @@ To install this package, in R:
}
devtools::install_github("momeara/RosettaFeatures")

To install locally, run the ```install_local.R``` script or run the following:

devtools::document() # if you changed function signatures
devtools::build()

devtools::install_local(PATH)

Generate features databases following the features_benchmark protocol capture

https://github.com/RosettaCommons/demos/tree/master/protocol_capture/features_benchmark/README.md
@@ -45,10 +52,11 @@ Generate features databases following the features_benchmark protocol capture
Then to report features, in R:

library(RosettaFeatures)
library(methods)
compare_sample_sources(
config_filename="analysis_configuration.json")

Where the `analysis_configuration.json` looks like:
Where the `analysis_configuration.json` looks like (note the removal of the compare_sample_sources main dictionary used in the previous, pre-library version):

{
"output_dir" : "native_vs_relax_native",
Empty file.
111 changes: 111 additions & 0 deletions inst/scripts/analysis/plots/antibodies/SASA/ab_cdr_SASA_den.R
@@ -0,0 +1,111 @@
# -*- tab-width:2;indent-tabs-mode:t;show-trailing-whitespace:t;rm-trailing-spaces:t -*-
# vi: set ts=2 noet:
#
# (c) Copyright Rosetta Commons Member Institutions.
# (c) This file is part of the Rosetta software suite and is made available under license.
# (c) The Rosetta software is developed by the contributing members of the Rosetta Commons.
# (c) For more information, see http://www.rosettacommons.org. Questions about this can be
# (c) addressed to University of Washington UW TechTransfer, email: [email protected].

library(ggplot2)
library(plyr)
library(grid)

feature_analyses <- c(feature_analyses, methods::new("FeaturesAnalysis",
id = "ab_SASA-CDR_den",
author = "Jared Adolf-Bryfogle",
brief_description = "CDR Sasas",
feature_reporter_dependencies = c("AntibodyFeatures"),
run=function(self, sample_sources, output_dir, output_formats){

#Query SASA, CDR name, and length from cdr_metrics for every sample source


#if ("FALSE" %in% opt$options$include_cdr4 & "FALSE" %in% opt$options$cdr4_only){
sele = "
SELECT
SASA,
CDR,
length
FROM
cdr_metrics
WHERE
CDR NOT LIKE '%Proto%'
"
#}

# if ("TRUE" %in% opt$options$include_cdr4){
# sele = "
# SELECT
# SASA,
# CDR,
# length
# FROM
# cdr_metrics"
# }

# if ("TRUE" %in% opt$options$cdr4_only){
# sele = "
# SELECT
# SASA,
# CDR,
# length
# FROM
# cdr_metrics
# WHERE
# CDR LIKE '%Proto%'"
# }

data = query_sample_sources(sample_sources, sele)

parts <- list(
geom_indicator(aes(indicator=counts, colour=sample_source, group=sample_source)),
scale_y_continuous("Feature Density"),
theme_bw())

plot_field = function(p, plot_id, grid = NULL){

if (! is.null(grid)){
p <- p+ facet_wrap(facets=grid, ncol=3)
}
if(nrow(sample_sources) <= 3){
p <- p + theme(legend.position="bottom", legend.direction="horizontal")
}
save_plots(self, plot_id, sample_sources, output_dir, output_formats)
}


#CDR SASA
group = c("sample_source", "CDR")
dens <- estimate_density_1d(data, group, c("SASA"))
p <- ggplot(data=dens, na.rm=T) + parts +
geom_line(aes(x, y, colour=sample_source), size=1.2) +
xlab("SASA") +
ggtitle("CDR SASA")
plot_field(p, "cdr_sasa_den", ~CDR)

cdr_avgs = ddply(data, .(sample_source, CDR), function(data){
data.frame(m=mean(data$SASA))
})

len_avgs = ddply(data, .(sample_source, CDR, length), function(data){
data.frame(m=mean(data$SASA))
})

p <- ggplot(data=cdr_avgs) +
geom_bar(aes(x=CDR, y=m, fill=sample_source), position="dodge", stat='identity') +
xlab("CDR") +
ylab("SASA") +
ggtitle("Average CDR SASA")
plot_field(p, "avg_cdr_sasa_hist")

p <- ggplot(data=len_avgs) +
geom_bar(aes(x=length, y=m, fill=sample_source), position="dodge", stat='identity') +
xlab("CDR Length") +
ylab("SASA") +
ggtitle("Average CDR SASA")
plot_field(p, "avg_cdr_sasa_hist_by_length", grid= ~CDR)



})) # end FeaturesAnalysis
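
Review note: the two `ddply` calls in this script reduce the query result to one mean SASA per group before plotting. The snippet below is a minimal base-R sketch of that reduction, not the code in this PR. The data frame `toy` and its values are invented stand-ins for the output of `query_sample_sources`, and `aggregate` is swapped in for `ddply` so the example needs no extra packages.

```r
# Toy stand-in for the data frame returned by query_sample_sources();
# the values are invented for illustration only.
toy <- data.frame(
  sample_source = c("native", "native", "native", "design", "design"),
  CDR           = c("H1", "H1", "H3", "H1", "H3"),
  SASA          = c(100, 200, 400, 150, 300),
  stringsAsFactors = FALSE
)

# One mean SASA per (sample_source, CDR) group, analogous to
# ddply(data, .(sample_source, CDR), function(d) data.frame(m = mean(d$SASA)))
cdr_avgs <- aggregate(SASA ~ sample_source + CDR, data = toy, FUN = mean)

# Rename the value column to `m`, matching the column name the plots expect.
names(cdr_avgs)[names(cdr_avgs) == "SASA"] <- "m"

print(cdr_avgs)
```

The resulting frame has one row per group, which is the shape `geom_bar(..., stat='identity')` needs for the dodged bar plots.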
127 changes: 127 additions & 0 deletions inst/scripts/analysis/plots/antibodies/SASA/ab_paratope_SASA_den.R
@@ -0,0 +1,127 @@
# -*- tab-width:2;indent-tabs-mode:t;show-trailing-whitespace:t;rm-trailing-spaces:t -*-
# vi: set ts=2 noet:
#
# (c) Copyright Rosetta Commons Member Institutions.
# (c) This file is part of the Rosetta software suite and is made available under license.
# (c) The Rosetta software is developed by the contributing members of the Rosetta Commons.
# (c) For more information, see http://www.rosettacommons.org. Questions about this can be
# (c) addressed to University of Washington UW TechTransfer, email: [email protected].

library(ggplot2)
library(plyr)
library(grid)

feature_analyses <- c(feature_analyses, methods::new("FeaturesAnalysis",
id = "ab_SASA-paratope_den",
author = "Jared Adolf-Bryfogle",
brief_description = "Paratope SASA densities",
feature_reporter_dependencies = c("AntibodyFeatures"),
run=function(self, sample_sources, output_dir, output_formats){

#Query paratope SASA values from ab_metrics for every sample source



sele = "
SELECT
paratope_SASA,
paratope_hSASA,
paratope_SASA - paratope_hSASA as paratope_pSASA
FROM
ab_metrics
"

data = query_sample_sources(sample_sources, sele)

parts <- list(
geom_indicator(aes(indicator=counts, colour=sample_source, group=sample_source)),
scale_y_continuous("Feature Density"),
theme_bw())

plot_field = function(p, plot_id, grid = NULL){

if (! is.null(grid)){
p <- p+ facet_grid(facets=grid)
}
if(nrow(sample_sources) <= 3){
p <- p + theme(legend.position="bottom", legend.direction="horizontal")
}
save_plots(self, plot_id, sample_sources, output_dir, output_formats)
}


#Paratope SASA
data_rm_out <- ddply(data, .(sample_source), function(d2){
subset(d2, subset=(d2$paratope_SASA <= quantile(d2$paratope_SASA, .90))) #Remove high-SASA outliers (keep values at or below the 90th percentile)
})

data_top <- ddply(data, .(sample_source), function(d2){
subset(d2, subset=(d2$paratope_SASA <= quantile(d2$paratope_SASA, .10))) #Lowest 10 percent by paratope SASA
})

group = c("sample_source")
dens <- estimate_density_1d(data, group, c("paratope_SASA"))
p <- ggplot(data=dens, na.rm=T) + parts +
geom_line(aes(x, y, colour=sample_source), size=1.2) +
xlab("SASA") +
ggtitle("CDR Paratope SASA")
plot_field(p, "paratope_sasa_den")

group = c("sample_source")
dens <- estimate_density_1d(data_rm_out, group, c("paratope_SASA"))
p <- ggplot(data=dens, na.rm=T) + parts +
geom_line(aes(x, y, colour=sample_source), size=1.2) +
xlab("SASA") +
ggtitle("CDR Paratope SASA")
plot_field(p, "top_90_percent_paratope_sasa_den")

group = c("sample_source")
dens <- estimate_density_1d(data_top, group, c("paratope_SASA"))
p <- ggplot(data=dens, na.rm=T) + parts +
geom_line(aes(x, y, colour=sample_source), size=1.2) +
xlab("SASA") +
ggtitle("CDR Paratope SASA")
plot_field(p, "paratope_sasa_top_10_percent_den")

#Natives

sele = "
SELECT
paratope_SASA,
paratope_hSASA,
paratope_SASA - paratope_hSASA as paratope_pSASA,
natives.native as native
FROM
ab_metrics,
natives
WHERE
ab_metrics.struct_id = natives.struct_id
"

data = query_sample_sources(sample_sources, sele)

data_rm_out <- ddply(data, .(sample_source, native), function(d2){
subset(d2, subset=(d2$paratope_SASA <= quantile(d2$paratope_SASA, .90))) #Remove high-SASA outliers (keep values at or below the 90th percentile)
})

data_top <- ddply(data, .(sample_source, native), function(d2){
subset(d2, subset=(d2$paratope_SASA <= quantile(d2$paratope_SASA, .10))) #Lowest 10 percent by paratope SASA
})

group = c("sample_source")
dens <- estimate_density_1d(data_rm_out, group, c("paratope_SASA"))
p <- ggplot(data=dens, na.rm=T) + parts +
geom_line(aes(x, y, colour=sample_source), size=1.2) +
xlab("SASA") +
ggtitle("CDR Paratope SASA")
plot_field(p, "top_90_percent_paratope_sasa_den_by_native")

group = c("sample_source")
dens <- estimate_density_1d(data_top, group, c("paratope_SASA"))
p <- ggplot(data=dens, na.rm=T) + parts +
geom_line(aes(x, y, colour=sample_source), size=1.2) +
xlab("SASA") +
ggtitle("CDR Paratope SASA")
plot_field(p, "paratope_sasa_top_10_percent_den_by_native")

})) # end FeaturesAnalysis
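
Review note: the filters in this script keep, within each group, the rows whose `paratope_SASA` is at or below a quantile cutoff. A cutoff of `.90` trims the largest values and a cutoff of `.10` keeps only the smallest ones. The snippet below is a base-R sketch of that per-group selection. The helper `keep_below_quantile` and the data frame `toy` are hypothetical names introduced for illustration, and `split`/`rbind` stand in for `ddply` so no extra packages are needed.

```r
# Hypothetical helper (not part of this PR): within each level of `group`,
# keep the rows whose `column` value is at or below that group's p-th quantile.
keep_below_quantile <- function(df, group, column, p) {
  pieces <- lapply(split(df, df[[group]]), function(d) {
    d[d[[column]] <= quantile(d[[column]], p), , drop = FALSE]
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# Invented toy data: two sample sources with ten values each.
toy <- data.frame(
  sample_source = rep(c("a", "b"), each = 10),
  paratope_SASA = c(1:10, seq(10, 100, by = 10))
)

# Cutoff computed separately per sample source, as in the ddply calls.
low10  <- keep_below_quantile(toy, "sample_source", "paratope_SASA", 0.10)
trim90 <- keep_below_quantile(toy, "sample_source", "paratope_SASA", 0.90)

print(low10)
print(nrow(trim90))
```

Computing the cutoff inside each group matters when sample sources have different SASA ranges: a single global quantile would keep rows mostly from whichever source has the smaller values.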