11-BayesianStatistics.Rmd

---
output:
  pdf_document: default
  bookdown::gitbook:
    lib_dir: "book_assets"
    includes:
      in_header: google_analytics.html
  html_document: default
---
<!--# Bayesian statistics-->
# Estadística Bayesiana {#bayesian-statistics}


```{r echo=FALSE,warning=FALSE,message=FALSE}
library(tidyverse)
library(ggplot2)
library(cowplot)
library(boot)
library(MASS)
library(BayesFactor)
library(knitr)

set.seed(123456) # set random seed to exactly replicate results

# load the NHANES data library
library(NHANES)

# drop duplicated IDs within the NHANES dataset
NHANES <-
  NHANES %>%
  dplyr::distinct(ID, .keep_all = TRUE)

NHANES_adult <-
  NHANES %>%
  drop_na(Weight) %>%
  subset(Age >= 18)

```

<!--In this chapter we will take up the approach to statistical modeling and inference that stands in contrast to the null hypothesis testing framework that you encountered in Chapter \@ref(hypothesis-testing).  This is known as "Bayesian statistics" after the Reverend Thomas Bayes, whose theorem you have already encountered in Chapter \@ref(probability).  In this chapter you will learn how Bayes' theorem provides a way of understanding data that solves many of the conceptual problems that we discussed regarding null hypothesis testing, while also introducing some new challenges. -->
En este capítulo estaremos trabajando desde el enfoque del modelado estadístico e inferencial que contrasta con el marco de prueba de hipótesis nula que vimos en el capítulo \@ref(hypothesis-testing).  Esto se conoce como "Estadística Bayesiana", en honor al reverendo Thomas Bayes, cuyo teorema ya has visto en el capítulo  \@ref(probability). En este capítulo aprenderás cómo el teorema de Bayes proporciona una forma de entender los datos que resuelve muchos de los problemas conceptuales que discutimos con respecto a la prueba de hipótesis nula, mientras introduce a su vez nuevos retos.  

<!--## Generative models-->
## Modelos Generativos 

<!--Say you are walking down the street and a friend of yours walks right by but doesn't say hello.  You would probably try to decide why this happened -- Did they not see you?  Are they mad at you?  Are you suddenly cloaked in a magic invisibility shield?  One of the basic ideas behind Bayesian statistics is that we want to infer the details of how the data are being generated, based on the data themselves.  In this case, you want to use the data (i.e. the fact that your friend did not say hello) to infer the process that generated the data (e.g. whether or not they actually saw you, how they feel about you, etc).-->
Digamos que estás caminando por la calle y unx amigx tuyx camina a tu lado pero no te saluda. Probablemente vas a tratar de decidir por qué pasó esto, ¿no te vieron? ¿están enojadxs contigo? ¿De repente traes una capa de invisibilidad y no te has dado cuenta? Una de las ideas básicas de la estadística Bayesiana es que queremos inferir los detalles de cómo son generados los datos, basándonos en los datos mismos. En este caso, tú quieres usar los datos (por ejemplo, el hecho de que tu amigx no te saludó), para inferir el proceso que generó esos datos (si de verdad no te vieron, cómo se sienten con respecto tuyo, etc.).

<!--The idea behind a generative model is that a *latent* (unseen) process generates the data we observe, usually with some amount of randomness in the process. When we take a sample of data from a population and estimate a parameter from the sample, what we are doing in essence is trying to learn the value of a latent variable (the population mean) that gives rise through sampling to the observed data (the sample mean).  Figure \@ref(fig:GenerativeModel) shows a schematic of this idea.-->
La idea detrás de los modelos generativos es que un proceso *latente* (que no se ha visto) genera los datos que observamos, usualmente con una cantidad de aleatoridad en el proceso. Cuando tomamos una muestra de datos de una población y estimamos el parámetro a partir de la muestra, lo que estamos haciendo en esencia es tratar de conocer el valor de la variable latente (la media de la población), la cual da lugar a través del muestreo a los datos observados (la media de la muestra). La Figura \@ref(fig:GenerativeModel) muestra un esquema de esta idea. 

```{r GenerativeModel, echo=FALSE,fig.cap="A schematic of the idea of a generative model.",fig.width=6, out.width="80%"}
knitr::include_graphics("images/BayesianInference.png")
```

<!--If we know the value of the latent variable, then it's easy to reconstruct what the observed data should look like.  For example, let's say that we are flipping a coin that we know to be fair, such that we would expect it to land on heads 50% of the time.  We can describe the coin by a binomial distribution with a value of $P_{heads}=0.5$, and then we could generate random samples from such a distribution in order to see what the observed data should look like. However, in general we are in the opposite situation: We don't know the value of the latent variable of interest, but we have some data that we would like to use to estimate it.-->
Si conociéramos el valor de la variable latente, entonces debería de ser fácil reconstruir cómo deberían verse los datos observados. Por ejemplo, digamos que lanzamos una moneda que sabemos que está balanceada, por lo que podemos esperar que caiga cara el 50% de las veces que se lanza. Podemos describir la moneda con una distribución binomial con un valor de $P_{heads}=0.5$, y luego podemos generar una muestra aleatoria de tal distribución con el fin de visualizar cómo se deberían de ver los datos. No obstante, en general estamos en la posición contraria: No sabemos el valor de la variable de interés latente, pero tenemos algunos datos que nos gustaría usar para estimarlo. 

<!--## Bayes' theorem and inverse inference-->
## El Teorema de Bayes y la Inferencia Inversa

<!--The reason that Bayesian statistics has its name is because it takes advantage of Bayes' theorem to make inferences from data about the underlying process that generated the data.  Let's say that we want to know whether a coin is fair.  To test this, we flip the coin 10 times and come up with 7 heads.  Before this test we were pretty sure that the $P_{heads}=0.5$, but finding 7 heads out of 10 flips would certainly give us pause if we believed that $P_{heads}=0.5$.  We already know how to compute the conditional probability that we would flip 7 or more heads out of 10 if the coin is really fair ($P(n\ge7|p_{heads}=0.5)$), using the binomial distribution.-->
La razón por la cual la estadística Bayesiana tiene tal nombre es porque aprovecha el teorema de Bayes para hacer inferencias a partir de los datos sobre el proceso subyacente que generó los datos. Digamos que queremos saber si una moneda está balanceada. Para probar esto, lanzamos la moneda 10 veces y en 7 ocasiones cae en cara. Antes de esta prueba, estábamos muy seguros de que $P_{heads}=0.5$, pero al caer cara 7 veces de 10 lanzamientos sin duda nos daría una pausa si creyéramos que $P_{heads}=0.5$. Ya sabemos cómo calcular la probabilidad condicional de que caiga 7 o más veces en cara (de 10 veces que se lanza la moneda) si la moneda realmente está balanceada ($P(n\ge7|p_{heads}=0.5)$), usando la distribución binomial.    

```{r echo=FALSE}
# *TBD: MOTIVATE SWITCH FROM 7 To 7 OR MORE*
```

<!--The resulting probability is `r I(sprintf("%.3f",pbinom(7, 10, .5, lower.tail = FALSE)))`.  That is a fairly small number, but this number doesn't really answer the question that we are asking -- it is telling us about the likelihood of 7 or more heads given some particular probability of heads, whereas what we really want to know is the true probability of heads for this particular coin. This should sound familiar, as it's exactly the situation that we were in with null hypothesis testing, which told us about the likelihood of data rather than the likelihood of hypotheses.-->
La probabilidad resultante es `r I(sprintf("%.3f",pbinom(7, 10, .5, lower.tail = FALSE)))`. Ese es un número bastante pequeño, pero este número no responde realmente a la pregunta que estamos haciendo-- nos está diciendo la probabilidad de que en 7 ocasiones o más la moneda caiga en cara, dada una probabilidad particular de cara, mientras que lo que queremos saber en realidad es la probabilidad verdadera de que caiga cara en esta moneda en particular. Esto debería sonar familiar, ya que es una situación exactamente igual a la que estábamos con la prueba de hipótesis nula, la cual nos mostró la probabilidad de los datos, en lugar de la probabilidad de las hipótesis.  

<!--Remember that Bayes' theorem provides us with the tool that we need to invert a conditional probability:-->
Recuerda que el teorema de Bayes nos provee con una herramienta que necesitamos para invertir una probabilidad condicional. 

$$
P(H|D) = \frac{P(D|H)*P(H)}{P(D)}
$$

<!--We can think of this theorem as having four parts:-->
Podemos pensar este teorema como si tuviéramos cuatro partes (entre paréntesis están sus nombres en inglés, y su notación matemática):

<!--- *prior* ($P(Hypothesis)$): Our degree of belief about hypothesis H before seeing the data D
- *likelihood* ($P(Data|Hypothesis)$): How likely are the observed data D under hypothesis H?
- *marginal likelihood* ($P(Data)$): How likely are the observed data, combining over all possible hypotheses?
- *posterior* ($P(Hypothesis|Data)$): Our updated belief about hypothesis H, given the data D-->
- *Probabilidad previa* (*prior*: $P(Hipótesis)$): Nuestro grado de creencia sobre la hipótesis H antes de ver los datos D.
- *Probabilidad* (*Likelihood*: $P(Datos|Hipótesis)$): ¿Qué tan probables son los datos observados D bajo la hipótesis H?
- *Probabilidad marginal* (*marginal likelihood*: $P(Datos)$): ¿Qué tan probables son los datos observados, combinando todas las hipótesis posibles? 
- *Probabilidad Posterior* (*posterior*: $P(Hipótesis|Datos)$): Nuestra creencia actualizada sobre la hipótesis H, dados los datos D.

<!--In the case of our coin-flipping example:-->
En el caso de nuestro ejemplo de lanzar la moneda: 

<!--- *prior* ($P_{heads}$): Our degree of belief about the likelhood of flipping heads, which was $P_{heads}=0.5$
- *likelihood* ($P(\text{7 or more heads out of 10 flips}|P_{heads}=0.5)$): How likely are 7 or more heads out of 10 flips if $P_{heads}=0.5)$?
- *marginal likelihood* ($P(\text{7 or more heads out of 10 flips})$): How likely are we to observe 7 heads out of 10 coin flips, in general?
- *posterior* ($P_{heads}|\text{7 or more heads out of 10 coin flips})$): Our updated belief about $P_{heads}$ given the observed coin flips-->
- *Probilidad previa* ($P_ {cara}$): Nuestro grado de creencia sobre la probabilidad de que caiga en cara, que fue $P_ {cara} = 0.5$.
- *Probabilidad* ($P(\text{7 o más caras en 10 lanzamientos}|P_{cara}=0.5)$): ¿Qué probabilidad hay de 7 o más caras de 10 lanzamientos si $P_{cara}=0.5)$?
- *Probabilidad marginal* ($P(\text{7 o más caras en 10 lanzamientos})$): ¿Qué probabilidades hay de que observemos 7 caras de cada 10 lanzamientos de moneda, en general?
- *Probabilidad posterior* ($P_{cara}|\text{7 o más caras en 10 lanzamientos})$): Nuestra creencia actualizada sobre $P_{cara}$ dados los lanzamientos de moneda observados.

<!--Here we see one of the primary differences between frequentist and Bayesian statistics. Frequentists do not believe in the idea of a probability of a hypothesis (i.e. our degree of belief about a hypothesis) -- for them, a hypothesis is either true or it isn't. Another way to say this is that for the frequentist, the hypothesis is fixed and the data are random, which is why frequentist inference focuses on describing the probability of data given a hypothesis (i.e. the p-value). Bayesians, on the other hand, are comfortable making probability statements about both data and hypotheses.-->
Aquí vemos una de las principales diferencias entre la estadística frecuentista y bayesiana. Lxs frecuentistas no creen en la idea de una probabilidad de una hipótesis (es decir, nuestro grado de creencia sobre una hipótesis); para ellxs, una hipótesis es verdadera o no lo es. Otra forma de decir esto es que para la/el frecuentista, la hipótesis es fija y los datos son aleatorios, por lo que la inferencia frecuentista se centra en describir la probabilidad de los datos dada una hipótesis (es decir, el valor p). Lxs bayesianos, por otro lado, se sienten cómodos haciendo declaraciones de probabilidad sobre datos e hipótesis.

<!--## Doing Bayesian estimation {#doing-bayesian-estimation}-->
## Haciendo estimaciones Bayesianas {#doing-bayesian-estimation}

<!--We ultimately want to use Bayesian statistics to make decisions about  hypotheses, but before we do that we need to estimate the parameters that are necessary to make the decision. Here we will walk through the process of Bayesian estimation.  Let's use another screening example: Airport security screening.  If you fly a lot, it's just a matter of time until one of the random explosive screenings comes back positive; I had the particularly unfortunate experience of this happening soon after September 11, 2001, when airport security staff were especially on edge.--> 
En última instancia, queremos utilizar la estadística bayesiana para tomar decisiones sobre las hipótesis, pero antes de hacerlo, debemos estimar los parámetros que son necesarios para tomar la decisión. Aquí recorreremos el proceso de estimación bayesiana. Usemos otro ejemplo de inspecciones (*screening*): la inspección de seguridad del aeropuerto. Si vuelas mucho, es solo cuestión de tiempo para que una de las inspecciones aleatorias de explosivos resulte positiva; tuve la experiencia particularmente desafortunada de que esto sucediera poco después del 11 de septiembre de 2001, cuando el personal de seguridad del aeropuerto estaba especialmente nervioso.

<!--What the security staff want to know is what is the likelihood that a person is carrying an explosive, given that the machine has given a positive test.  Let's walk through how to calculate this value using Bayesian analysis.-->
Lo que el personal de seguridad quiere saber es cuál es la probabilidad de que una persona lleve un explosivo, dado que la máquina ha dado positivo en la prueba. Veamos cómo calcular este valor mediante el análisis bayesiano.

<!--### Specifying the prior-->
### Especificar la probabilidad previa

<!--To use Bayes' theorem, we first need to specify the prior probability for the hypothesis.  In this case, we don't know the real number but we can assume that it's quite small.  According to the [FAA](https://www.faa.gov/air_traffic/by_the_numbers/media/Air_Traffic_by_the_Numbers_2018.pdf), there were 971,595,898 air passengers in the U.S. in 2017.  Let's say that one of those travelers was carrying an explosive in their bag --- that would give a prior probability of 1 out of 971 million, which is very small!  The security personnel may have reasonably held a stronger prior in the months after the 9/11 attack, so let's say that their subjective belief was that one out of every million flyers was carrying an explosive.-->
Para usar el teorema de Bayes, primero debemos especificar la probabilidad previa para la hipótesis. En este caso, no sabemos el número real pero podemos asumir que es pequeño. De acuerdo a la [FAA](https://www.faa.gov/air_traffic/by_the_numbers/media/Air_Traffic_by_the_Numbers_2018.pdf), había 971,595,898 pasajeros aéreos en los Estados Unidos de América en 2017. Digamos que uno de esos viajeros llevaba un explosivo en su bolsa --- esto nos daría una probabilidad previa de 1 entre 971 millones, lo cual es muy pequeño. El personal de seguridad seguramente tuvo en mente una probabilidad previa mucho mayor en los meses después del ataque del 9/11, así que diremos que su pensamiento subjetivo era que uno en un millón de viajeros traía consigo un explosivo. 

```{r echo=FALSE}
bayes_df = data.frame(prior=NA, 
                      likelihood=NA, 
                      marginal_likelihood=NA, 
                      posterior=NA)

bayes_df$prior <- 1/1000000 


nTests <- 3
nPositives <- 3
sensitivity <- 0.99
specificity <- 0.99

bayes_df$likelihood <- dbinom(nPositives, nTests, 0.99)

bayes_df$marginal_likelihood <- 
  dbinom(
    x = nPositives, 
    size = nTests, 
    prob = sensitivity
  ) * bayes_df$prior + 
  dbinom(
    x = nPositives, 
    size = nTests, 
    prob = 1 - specificity
  ) * 
  (1 - bayes_df$prior)

bayes_df$posterior <- (bayes_df$likelihood * bayes_df$prior) / bayes_df$marginal_likelihood

```

<!--### Collect some data-->
### Recolectar los datos

<!--The data are composed of the results of the explosive screening test.  Let's say that the security staff runs the bag through their testing apparatus `r I(nTests)` times, and it gives a positive reading on `r I(nPositives)` of the `r I(nTests)` tests.-->
Los datos se componen de los resultados de las pruebas de screening de explosivos. Digamos que el staff de seguridad pasa una bolsa a través de su aparato para comprobar que sea seguro, lo pasa durante `r I(nTests)` veces, y da positivo en `r I(nPositives)` de las `r I(nTests)` pruebas.

<!--### Computing the likelihood-->
### Calcular la probabilidad (likelihood)

<!--We want to compute the likelihood of the data under the hypothesis that there is an explosive in the bag.  Let's say that we know (from the machine's manufacturer) that the sensitivity of the test is `r I(sprintf('%.2f',sensitivity))` -- that is, when a device is present, it will detect it `r I(sprintf('%.0f%%',sensitivity*100))` of the time. To determine the likelihood of our data under the hypothesis that a device is present, we can treat each test as a Bernoulli trial (that is, a trial with an outcome of true or false) with a probability of success of `r I(sprintf('%.2f',sensitivity))`, which we can model using a binomial distribution.-->
Queremos calcular la probabilidad (*likelihood*) de los datos observados bajo la hipótesis de que hay un explosivo en la bolsa. Digamos que sabemos (por los fabricantes de la máquina) que la sensibilidad de la prueba es `r I(sprintf('%.2f',sensitivity))` -- o sea, cuando un objeto explosivo está presente lo detectará un `r I(sprintf('%.0f%%',sensitivity*100))` de las veces. Para determinar la probabilidad de nuestros datos bajo la hipótesis de que un objeto explosivo está presente, podemos tratar cada prueba como un ensayo de Bernoulli (o sea, un ensayo con un resultado de verdadero o falso) con una probabilidad de éxito de `r I(sprintf('%.2f',sensitivity))`, lo cual podemos modelar con una distribución binomial.  


<!--### Computing the marginal likelihood-->
### Calcular la probabilidad marginal (marginal likelihood)

<!--We also need to know the overall likelihood of the data -- that is, finding `r I(nPositives)` positives out of `r I(nTests)` tests. Computing the marginal likelihood is often one of the most difficult aspects of Bayesian analysis, but for our example it's simple because we can take advantage of the specific form of Bayes' theorem for a binary outcome that we introduced in Section \@ref(bayestheorem):-->
También necesitamos saber la probabilidad (*likelihood*) total de los datos -- esto es, el encontrar `r I(nPositives)` positivos de `r I(nTests)` pruebas. Calcular la probabilidad marginal es comúnmente uno de los aspectos del análisis Bayesiano más difíciles, pero para nuestro ejemplo, es simple, ya que, podemos tomar ventaja de la forma específica del teorema de Bayes para un resultado binario que presentamos en la sección \@ref(bayestheorem):
$$
P(E|T) = \frac{P(T|E)*P(E)}{P(T|E)*P(E) + P(T|\neg E)*P(\neg E)}
$$

<!--where $E$ refers to the presence of explosives, and $T$ refers to a postive test result.-->
en donde $E$ se refiere a la presencia de explosivos y $T$ se refiere a un resultado positivo de la prueba.

<!--The marginal likelihood in this case is a weighted average of the likelihood of the data under either presence or absence of the explosive, multiplied by the probability of the explosive being present (i.e. the prior).  In this case, let's say that we know (from the manufacturer) that the specificity of the test is `r I(sprintf('%.2f', specificity))`, such that the likelihood of a positive result when there is no explosive ($P(T|\neg E)$) is `r I(sprintf('%.2f', 1 - specificity))`.-->
En este caso, la probabilidad marginal es una probabilidad ponderada (*weighted probability*) de la probabilidad de los datos en la presencia y en la ausencia de explosivos, multiplicada por la probabilidad de que haya un explosivo presente (es decir, la probabilidad previa). En este caso, digamos que sabemos (gracias al fabricante), que la especificidad de la prueba es `r I(sprintf('%.2f', specificity))`, por lo que la probabilidad de un resultado positivo cuando no hay explosivo ($P(T|\neg E)$) es `r I(sprintf('%.2f', 1 - specificity))`. 

<!--### Computing the posterior-->
### Calcular la probabilidad posterior

<!--We now have all of the parts that we need to compute the posterior probability of an explosive being present, given the observed `r I(nPositives)` positive outcomes out of `r I(nTests)` tests.  
This result shows us that the posterior probability of an explosive in the bag given these positive tests (`r I(sprintf('%.3f', bayes_df$posterior))`) is just under 50%, again highlighting the fact that testing for rare events is almost always liable to produce high numbers of false positives, even when the specificity and sensitivity are very high.-->
Ahora que tenemos todas las partes que necesitamos para calcular la probabilidad posterior de que un explosivo esté presente en la bolsa, dados los resultados observados de `r I(nPositives)` positivos de `r I(nTests)` pruebas.
Este resultado nos muestra que la probabilidad posterior de que haya un explosivo en la bolsa dadas estas pruebas positivas (`r I(sprintf('%.3f', bayes_df$posterior))`) está justo por debajo del 50%, de nuevo resaltando el hecho de que las pruebas para detectar eventos raros casi siempre pueden producir un gran número de falsos positivos, incluso cuando la especificidad y la sensibilidad son muy altas.

<!--An important aspect of Bayesian analysis is that it can be sequential.  Once we have the posterior from one analysis, it can become the prior for the next analysis!-->
Un aspecto importante del análisis Bayesiano es que puede ser secuencial. Una vez que tenemos la probabilidad posterior de un análisis, se puede convertir en la probabilidad previa del siguiente análisis.

<!--## Estimating posterior distributions {#estimating-posterior-distributions}-->
## Estimar distribuciones posteriores {#estimating-posterior-distributions}

<!--In the previous example there were only two possible outcomes -- the explosive is either there or it's not -- and we wanted to know which outcome was most likely given the data.  However, in other cases we want to use Bayesian estimation to estimate the numeric value of a parameter.  Let's say that we want to know about the effectiveness of a new drug for pain; to test this, we can administer the drug to a group of patients and then ask them whether their pain was improved or not after taking the drug.  We can use Bayesian analysis to estimate the proportion of people for whom the drug will be effective using these data.-->
En el ejemplo anterior solamente había dos posibilidades -- el explosivo estaba ahí o no -- y queríamos saber cuál resultado era más probable dados los datos que teníamos. Sin embargo, en otros casos queremos usar estimaciones Bayesianas para estimar el valor numérico de un parámetro. Digamos que queremos conocer la efectividad de un nuevo medicamento para el dolor; para probar esto, podemos administrar el medicamento a un grupo de pacientes y luego preguntarles si su dolor disminuyó o no después de tomar el medicamento. Podemos usar un análisis Bayesiano para estimar la proporción de personas para las cuales el medicamento será efectivo utilizando estos datos.

<!--### Specifying the prior-->
### Especificar la probabilidad previa

```{r echo=FALSE}
# *TBD: MH: USE PRIOR BIASED TOWARDS ZERO?*
```


<!--In this case, we don't have any prior information about the effectiveness of the drug, so we will use a *uniform distribution* as our prior, since all values are equally likely under a uniform distribution.  In order to simplify the example, we will only look at a subset of 99 possible values of effectiveness (from .01 to .99, in steps of .01). Therefore, each possible value has a prior probability of 1/99. -->
En ete caso, no tenemos información previa sobre la efectividad del medicamento, así que usaremos una *distribución uniforme* como nuestra probabilidad previa, ya que los valores son igualmente probables en una distribución uniforme. Para simplificar este ejemplo, solo veremos un subconjunto de 99 posibles valores de efectividad (de .01 a .99, en pasos de .01). Por lo que cada valor posible tiene una probabilidad previa de 1/99. 

<!--### Collect some data-->
### Recolectar algunos datos


```{r echo=FALSE}
# create a table with results
nResponders <- 64
nTested <- 100

drugDf <- tibble(
  outcome = c("improved", "not improved"),
  number = c(nResponders, nTested - nResponders)
)

```

<!--We need some data in order to estimate the effect of the drug.  Let's say that we administer the drug to 100 individuals, we find that `r I(nResponders)` respond positively to the drug.-->
Necesitamos algunos datos para poder estimar la efectividad del medicamento. Digamos que administramos el medicamento a 100 individuos, y encontramos que `r I(nResponders)` responden positivamente al medicamento (*responders*).

<!--### Computing the likelihood-->
### Calcular la probabilidad (likelihood)

<!--We can compute the likelihood of the observed data under any particular value of the effectiveness parameter using the binomial density function. In Figure \@ref(fig:like2) you can see the likelihood curves over numbers of responders for several different values of $P_{respond}$. Looking at this, it seems that our observed data are relatively more likely under the hypothesis of $P_{respond}=0.7$, somewhat less likely under the hypothesis of $P_{respond}=0.5$, and quite unlikely under the hypothesis of $P_{respond}=0.3$.  One of the fundamental ideas of Bayesian inference is that we should upweight our belief in values of our parameter of interest in proportion to how likely the data are under those values, balanced against what we believed about the parameter values before having seen the data (our prior knowledge).-->
Podemos calcular la probabilidad (*likelihood*) de los datos observados bajo cualquier valor particular del parámetro de efectividad usando la función de densidad binomial. En la Figura \@ref(fig:like2) puedes ver las curvas de probabilidad sobre el número de respondientes (quienes respondieron positivamente al medicamento, *responders*) para varios valores de $P_ {respond}$. Observando esto, parece que nuestros datos observados son relativamente más probables bajo la hipótesis de $P_ {respond} = 0.7$, algo menos probable bajo la hipótesis de $P_ {respond} = 0.5$, y bastante improbable bajo la hipótesis de $P_ {respond} = 0.3$. Una de las ideas fundamentales de la inferencia bayesiana es que debemos cambiar nuestra creencia sobre los valores de nuestro parámetro de interés en proporción a qué tan probables serían los datos bajo esos valores, contrastados contra lo que creíamos acerca de los valores del parámetro antes de haber visto los datos (nuestro conocimiento previo).

<!-- fig.cap='Likelihood of each possible number of responders under several different hypotheses (p(respond)=0.5 (solid), 0.7 (dotted), 0.3 (dashed).  Observed value shown in the vertical line' -->
```{r like2,echo=FALSE,fig.cap='Probabilidad (likelihood) de cada número de respondientes posible bajo diferentes hipótesis (p(respond)=0.5 (línea contínua), 0.7 (línea punteada), 0.3 (línea discontinua).  Los valores observados se muestran en la línea vertical.',fig.width=4,fig.height=4,out.height='50%'}

likeDf <-
  tibble(resp = seq(1,99,1)) %>%
  mutate(
    presp=resp/100,
    likelihood5 = dbinom(resp,100,.5),
    likelihood7 = dbinom(resp,100,.7),
    likelihood3 = dbinom(resp,100,.3)
)

ggplot(likeDf,aes(resp,likelihood5)) + 
  geom_line() +
  xlab('number of responders') + ylab('likelihood') +
  geom_vline(xintercept = drugDf$number[1],color='blue') +
  geom_line(aes(resp,likelihood7),linetype='dotted') +
  geom_line(aes(resp,likelihood3),linetype='dashed')


```

<!--### Computing the marginal likelihood-->
### Calcular la probabilidad marginal

<!--In addition to the likelihood of the data under different hypotheses, we need to know the overall likelihood of the data, combining across all hypotheses (i.e., the marginal likelihood). This marginal likelihood is primarily important because it helps to ensure that the posterior values are true probabilities. In this case, our use of a set of discrete possible parameter values makes it easy to compute the marginal likelihood, because we can just compute the likelihood of each parameter value under each hypothesis and add them up.-->
Además de la probabilidad de los datos bajo diferentes hipótesis, necesitamos conocer la probabilidad general de los datos, combinando todas las hipótesis (es decir, la probabilidad marginal). Esta probabilidad marginal es particularmente importante porque ayuda a asegurar que los valores posteriores sean probabilidades verdaderas. En este caso, nuestro uso de un conjunto de posibles valores discretos de parámetros facilita el cálculo de la probabilidad marginal, porque podemos simplemente calcular la probabilidad de cada valor del parámetro bajo cada hipótesis y sumarlos.


```{r ,echo=FALSE}
#  *MH:*not sure there’s a been clear discussion of the marginal likelihood up this point. it’s a confusing and also very deep construct..  the overall likelihood of the data is the likelihood of the data under each hypothesis, averaged together (weighted by) the prior probability of those hypotheses. it is how likely the data is under your prior beliefs about the hypotheses.

# might be worth thinking of two examples, where the likelihood of the data under that hypothesis of interest is the same, but where the marginal likelihood changes i.e., the hypothesis is pretty good at predicting the data, while other hypothese are bad vs. other hypotheses are always good (perhaps better)
```

```{r echo=FALSE}
# compute marginal likelihood
likeDf <- 
  likeDf %>%
  mutate(uniform_prior = array(1 / n()))

# multiply each likelihood by prior and add them up
marginal_likelihood <- 
  sum(
    dbinom(
      x = nResponders, # the number who responded to the drug
      size = 100, # the number tested
      likeDf$presp # the likelihood of each response 
    ) * likeDf$uniform_prior
  )

```

<!--### Computing the posterior-->
### Calcular la probabilidad posterior

<!--We now have all of the parts that we need to compute the posterior probability distribution across all possible values of $p_{respond}$, as shown in Figure \@ref(fig:posteriorDist).-->
Ahora tenemos todas las partes que necesitamos para calcular la distribución de probabilidad posterior a lo largo de todos los valores posibles de $p_{respond}$, como se muestra en la Figura \@ref(fig:posteriorDist). 

```{r echo=FALSE}
# Create data for use in figure
bayesDf <-
  tibble(
    steps = seq(from = 0.01, to = 0.99, by = 0.01)
  ) %>%
  mutate(
    likelihoods = dbinom(
      x = nResponders, 
      size = 100, 
      prob = steps
    ),
    priors = dunif(steps) / length(steps),
    posteriors = (likelihoods * priors) / marginal_likelihood
  )
```

<!-- fig.cap="Posterior probability distribution for the observed data plotted in solid line against uniform prior distribution (dotted line). The maximum a posteriori (MAP) value is signified by the diamond symbol." -->
```{r posteriorDist,echo=FALSE,fig.cap="Distribución de probabilidad posterior para los datos observados graficada con la línea sólida en comparación con la distribución de probabilidad previa uniforme (línea punteada). El valor máximo a posteriori (MAP) está señalado con el símbolo de diamante.",fig.width=4,fig.height=4,out.height='50%'}

# compute MAP estimate
MAP_estimate <- 
  bayesDf %>% 
  arrange(desc(posteriors)) %>% 
  slice(1) %>% 
  pull(steps)


# compute likelihoods for the observed data under all values of p(heads).  here we use the quantized values from .01 to .99 in steps of 0.01


ggplot(bayesDf,aes(steps,posteriors)) +
  geom_line() +
  geom_line(aes(steps,priors),color='black',linetype='dotted') +
  xlab('p(respond)') + ylab('posterior probability of the observed data') +
  annotate(
    "point", 
    x = MAP_estimate, 
    y = max(bayesDf$posteriors), shape=9, 
    size = 3
  )


```

<!--### Maximum a posteriori (MAP) estimation-->
### Estimación máxima a posteriori (MAP, maximum a posteriori)

<!--Given our data we would like to obtain an estimate of $p_{respond}$ for our sample.  One way to do this is to find the value of $p_{respond}$ for which the posterior probability is the highest, which we refer to as the *maximum a posteriori* (MAP) estimate.  We can find this from the data in \@ref(fig:posteriorDist) --- it's the value shown with a marker at the top of the distribution.  Note that the result (`r I(MAP_estimate)`) is simply the proportion of responders from our sample -- this occurs because the prior was uniform and thus didn't influence our estimate.-->
Dados nuestros datos, nos gustaría obtener una estimación de $p_{respond}$ para nuestra muestra. Una forma de hacer esto es encontrar el valor de $p_{respond}$ para el cual la probabilidad posterior es la más alta, al que nos referimos como la estimación *máxima a posteriori* (MAP). Podemos encontrar esto a partir de los datos en la Figura \@ref(fig:posteriorDist) --- es el valor que se muestra con un marcador en la parte superior de la distribución. Ten en cuenta que el resultado (`r I (MAP_estimate)`) es simplemente la proporción de personas respondientes (quienes respondieron positivamente al medicamento) de nuestra muestra; esto ocurre porque la probabilidad a priori (probabilidad previa) era uniforme y por lo tanto no influyó en nuestra estimación.

<!--### Credible intervals-->
### Intervalos de credibilidad

<!--Often we would like to know not just a single estimate for the posterior, but an interval in which we are confident that the posterior falls.  We previously discussed the concept of confidence intervals in the context of frequentist inference, and you may remember that the interpretation of confidence intervals was particularly convoluted: It was an interval that will contain the the value of the parameter 95% of the time.  What we really want is an interval in which we are confident that the true parameter falls, and Bayesian statistics can give us such an interval, which we call a *credible interval*.-->
Frecuentemente nos gustaría saber no solo una estimación única para la probabilidad posterior, sino un intervalo en el que confiamos que la probabilidad posterior caerá. Anteriormente discutimos el concepto de intervalos de confianza en el contexto de la inferencia frecuentista, y podrás recordar que la interpretación de los intervalos de confianza fue particularmente complicada: era un intervalo que contendrá el valor del parámetro el 95% del tiempo. Lo que realmente queremos es un intervalo en el que estemos segurxs que el verdadero parámetro estará incluido, y las estadísticas bayesianas pueden darnos ese intervalo, al que llamamos *intervalo de credibilidad*.

```{r ,echo=FALSE}
#  *TBD: USE POSTERIOR FROM ABOVE*

```

<!--The interpretation of this credible interval is much closer to what we had hoped we could get from a confidence interval (but could not): It tells us that there is a 95% probability that the value of $p_{respond}$ falls between these two values.  Importantly, in this case it shows that we have high confidence that $p_{respond} > 0.0$, meaning that the drug seems to have a positive effect.-->
La interpretación de este intervalo de credibilidad está mucho más cerca de lo que esperábamos que pudiéramos obtener de un intervalo de confianza (pero que no obtuvimos): nos dice que hay un 95% de probabilidad de que el valor de $p_{respond}$ se encuentre entre estos dos valores. Es importante destacar que, en este caso, muestra que tenemos una alta confianza en que $p_{respond} > 0.0$, lo que significa que el medicamento parece tener un efecto positivo.

<!--In some cases the credible interval can be computed *numerically* based on a known distribution, but it's more common to generate a credible interval by sampling from the posterior distribution and then to compute quantiles of the samples. This is particularly useful when we don't have an easy way to express the posterior distribution numerically, which is often the case in real Bayesian data analysis.  One such method (rejection sampling) is explained in more detail in the Appendix at the end of this chapter.-->
En algunos casos el intervalo de credibilidad puede ser calculado *numéricamente* basado en una distribución conocida, pero es mucho más común generar un intervalo de credibilidad al muestrar de la distribución posterior y luego calcular cuantiles de las muestras. Esto es particularmente útil cuando no tenemos una forma sencilla de expresar la distribución posterior numéricamente, que es común en un análisis de datos Bayesiano real. Uno de estos métodos (muestreo de rechazo, *rejection sampling*) se explica con más detalle en el Apéndice al final de este capítulo.

<!--### Effects of different priors-->
### Efectos de diferentes probabilidades previas

<!--In the previous example we used a *flat prior*, meaning that we didn't have any reason to believe that any particular value of $p_{respond}$ was more or less likely.  However, let's say that we had instead started with some previous data: In a previous study, researchers had tested 20 people and found that 10 of them had responded positively.  This would have lead us to start with a prior belief that the treatment has an effect in 50% of people.  We can do the same computation as above, but using the information from our previous study to inform our prior (see panel A in Figure \@ref(fig:posteriorDistPrior)).-->
En el ejemplo anterior usamos una *probabilidad previa plana*, lo cual quiere decir que no teníamos ninguna razón para creer que algún valor en particular de $p_{respond}$ era más o menos probable. Sin embargo, digamos que en su lugar hubiéramos comenzado con algunos datos previos: En un estudio previo, investigadores habían puesto a prueba a 20 personas y habían encontrado que 10 de ellas respondieron positivamente. Esto nos habría llevado a comenzar con la creencia previa de que el tratamiento tiene efecto en el 50% de las personas. Podemos hacer el mismo cálculo que en el anterior, pero usando la información de nuestro estudio anterior para informar nuestra probabilidad previa (ve el panel A en la Figura \@ref(fig:posteriorDistPrior)).

```{r ,echo=FALSE}

# *MH:* i wonder what you’re doing here: is this the same thing as doing a bayesian inference assuming 10 / 20 data and using the posterior from that as the prior for this analysis? that is what woud normally be the straightfoward thing to do.

```

```{r echo=FALSE}
# compute likelihoods for data under all values of p(heads) 
# using a flat or empirical prior.  
# here we use the quantized values from .01 to .99 in steps of 0.01

df <-
  tibble(
    steps = seq(from = 0.01, to = 0.99, by = 0.01)
  ) %>%
  mutate(
    likelihoods = dbinom(nResponders, 100, steps),
    priors_flat = dunif(steps) / sum(dunif(steps)),
    priors_empirical = dbinom(10, 20, steps) / sum(dbinom(10, 20, steps))
  )

marginal_likelihood_flat <- 
  sum(dbinom(nResponders, 100, df$steps) * df$priors_flat)

marginal_likelihood_empirical <- 
  sum(dbinom(nResponders, 100, df$steps) * df$priors_empirical)

df <- 
  df %>%
  mutate(
    posteriors_flat = 
      (likelihoods * priors_flat) / marginal_likelihood_flat,
    posteriors_empirical = 
      (likelihoods * priors_empirical) / marginal_likelihood_empirical
  )

p1 <- ggplot(df, aes(steps, posteriors_flat)) +
  geom_line(color = "blue") +
  xlab("p(heads)") + ylab("Posterior probability") +
  geom_line(aes(steps, posteriors_empirical), color = "red") +
  geom_line(aes(steps, priors_empirical), linetype = "dotted")

```

<!--Note that the likelihood and marginal likelihood did not change - only the prior changed.  The effect of the change in prior to was to pull the posterior closer to the mass of the new prior, which is centered at 0.5.-->
Nota que la probabilidad y la probabilidad marginal no cambiaron - solamente cambió la probabilidad previa. El efecto del cambio en la probabilidad previa fue jalar la probabilidad posterior más cerca de la masa de la nueva probabilidad previa, la cual está centrada en 0.5.

<!--Now let's see what happens if we come to the analysis with an even stronger prior belief.  Let's say that instead of having previously observed 10 responders out of 20 people, the prior study had instead tested 500 people and found 250 responders.  This should in principle give us a much stronger prior, and as we see in panel B of Figure \@ref(fig:posteriorDistPrior) , that's what happens: The prior is much more concentrated around 0.5, and the posterior is also much closer to the prior.  The general idea is that Bayesian inference combines the information from the prior and the likelihood, weighting the relative strength of each.-->
Ahora veamos qué pasa si llegamos al análisis con un conocimiento previo más fuerte aún. Digamos que en lugar de haber observado con anterioridad a 10 pesonas que respondieron positivamente de 20, el estudio previo había puesto a prueba a 500 personas y encontró 250 que respondieron positivamente. Esto en principio nos debería de dar una probabilidad previa más fuerte, y como vemos en el panel B de la Figura \@ref(fig:posteriorDistPrior), eso es lo que pasa: La probabilidad previa está mucho más concentrada alrededor de 0.5, y la probabilidad posterior está mucho más cerca de la probabilidad previa. La idea general es que la inferencia Bayesiana combina la información de la probabilidad previa y la probabilidad (*likelihood*), ponderando el peso relativo de cada una.   
 
```{r echo=FALSE}
# compute likelihoods for data under all values of p(heads) using strong prior.

df <-
  df %>%
  mutate(
    priors_strong = dbinom(250, 500, steps) / sum(dbinom(250, 500, steps))
  )

marginal_likelihood_strong <- 
  sum(dbinom(nResponders, 100, df$steps) * df$priors_strong)

df <-
  df %>%
  mutate(
    posteriors_strongprior = (likelihoods * priors_strong) / marginal_likelihood_strong
  )

p2 <- ggplot(df,aes(steps,posteriors_empirical)) + 
  geom_line(color='blue') + 
  xlab('p(heads)') + ylab('Posterior probability') +
  geom_line(aes(steps,posteriors_strongprior),color='red') +
  geom_line(aes(steps,priors_strong),linetype='dotted')


```


<!--This example also highlights the sequential nature of Bayesian analysis -- the posterior from one analysis can become the prior for the next analysis.-->
Este ejemplo también destaca la naturaleza secuencial de los análisis Bayesianos -- la probabilidad posterior de un análisis puede tornarse en la probabilidad previa del siguiente análisis. 

<!--Finally, it is important to realize that if the priors are strong enough, they can completely overwhelm the data.  Let's say that you have an absolute prior that $p_{respond}$ is 0.8 or greater, such that you set the prior likelihood of all other values to zero.  What happens if we then compute the posterior?-->
Finalmente, es importante destacar que si las probabilidades previas son lo suficientemente fuertes, pueden abrumar completamente a los datos. Digamos que tienes una probabilidad previa absoluta en donde $p_{respond}$ es 0.8 o más, por lo que defines la probabilidad previa de todos los demás valores a cero. ¿Qué pasa entonces cuando calculamos la probabilidad posterior?

```{r echo=FALSE}
# compute likelihoods for data under all values of p(respond) using absolute prior. 
df <-
  df %>%
  mutate(
    priors_absolute = array(data = 0, dim = length(steps)),
    priors_absolute = if_else(
      steps >= 0.8,
      1, priors_absolute
    ),
    priors_absolute = priors_absolute / sum(priors_absolute)
  )

marginal_likelihood_absolute <- 
  sum(dbinom(nResponders, 100, df$steps) * df$priors_absolute)

df <-
  df %>%
  mutate(
    posteriors_absolute = 
      (likelihoods * priors_absolute) / marginal_likelihood_absolute
  )

```

<!-- fig.cap="A: Effects of priors on the posterior distribution.  The original posterior distribution based on a flat prior is plotted in blue. The prior based on the observation of 10 responders out of 20 people is plotted in the dotted black line, and the posterior using this prior is plotted in red.  B: Effects of the strength of the prior on the posterior distribution. The blue line shows the posterior obtained using the prior based on 50 responders out of 100 people.  The dotted black line shows the prior based on 250 responders out of 500 people, and the red line shows the posterior based on that prior. C: Effects of the strength of the prior on the posterior distribution. The blue line shows the posterior obtained using an absolute prior which states that p(respond) is 0.8 or greater.  The prior is shown in the dotted black line." -->
```{r posteriorDistPrior,echo=FALSE,fig.cap="A: Efectos de probabilidades previas sobre la distribución posterior.  La distribución posterior original qur se basó en una probabilidad previa plana se muestra en azul. La probabilidad previa basada en la observación de 10 respondientes de un total de 20 personas está graficada con la línea negra punteada, y la probabilidad posterior usando esta probabilidad previa está graficada en rojo.  B: Efectos de la fuerza de la probabilidad previa sobre la distribución posterior. La línea azul muestra la probabilidad posterior obtenida usando una probabilidad previa basada en 50 respondientes de un total de 100 personas. La línea negra punteada muestra la probabilidad previa basada en 250 respondientes de 500 personas, y la línea roja muestra la probabilidad posterior basada en esta probabilidad previa. C: Efectos de la fuerza de la probabilidad previa sobre la distribución posterior. La línea azul muestra la probabilidad posterior obtenida usando una probabilidad previa absoluta que indica que p(respond) es 0.8 o mayor. La probabilidad previa se muestra con la línea negra punteada.",fig.width=8,fig.height=8,out.width='80%'}

p3 <- ggplot(df,aes(steps,posteriors_absolute)) + 
  geom_line(color='blue') + 
  xlab('p(heads)') + 
  ylab('Posterior probability') +
  ylim(0,max(df$posteriors_absolute)*1.1) + 
  geom_line(aes(steps,
            priors_absolute*max(df$posteriors_absolute)*20),
            linetype='dotted',
            size=1)

plot_grid(p1, p2,p3, labels='AUTO')
```

<!--In panel C of Figure \@ref(fig:posteriorDistPrior) we see that there is zero density in the posterior for any of the values where the prior was set to zero - the data are overwhelmed by the absolute prior.-->
En el panel de C de la Figura \@ref(fig:posteriorDistPrior) observamos que hay una densidad cero en la probabilidad posterior para cualquiera de los valores en donde la probabilidad previa se ha puesto en cero - los datos se ven abrumados por el valor previo absoluto.

<!--## Choosing a prior-->
## Elegir una probabilidad previa

<!--The impact of priors on the resulting inferences are the most controversial aspect of Bayesian statistics. What is the right prior to use? If the choice of prior determines the results (i.e., the posterior), how can you be sure you results are trustworthy? These are difficult questions, but we should not back away just because we are faced with hard questions. As we discussed previously, Bayesian analyses give us interpretable results (credible intervals, etc.). This alone should inspire us to think hard about these questions so that we can arrive with results that are reasonable and interpretable.-->
El impacto de las probabilidades previas en el resultado de las inferencias es uno de los aspectos más controversiales de la estadística Bayesiana. ¿Cuál es la probabilidad correcta a usar? Si la elección de probabilidad previa determina los resultados (es decir, la probabilidad posterior), ¿cómo puedes estar segurx que los resultados son confiables? Estas son preguntas difíciles, pero no deberíamos de retroceder solamente porque nos enfrentemos con preguntas difíciles. Como lo discutimos previamente, los análisis Bayesianos nos dan resultados interpretables (intervalos de credibilidad, etc.). Esto nos debería de inspirar a pensar seriamente sobre estas preguntas, y así poder llegar a resultados que sean razonables e interpretables.

<!--There are various ways to choose one's priors, which (as we saw above) can impact the resulting inferences. Sometimes we have a very specific prior, as in the case where we expected our coin to lands heads 50% of the time, but in many cases we don't have such strong a starting point. *Uninformative priors* attempt to influence the resulting posterior as little as possible, as we saw in the example of the uniform prior above.  It's also common to use *weakly informative priors* (or *default priors*), which influence the result only very slightly. For example, if we had used a binomial distribution based on one heads out of two coin flips, the prior would have been centered around 0.5 but fairly flat, influence the posterior only slightly.  It is also possible to use priors based on the scientific literature or pre-existing data, which we would call *empirical priors*.  In general, however, we will stick to the use of uninformative/weakly informative priors, since they raise the least concern about influencing our results.-->
Hay varias formas de elegir probabilidades previas, que (como vimos anteriormente) pueden afectar las inferencias resultantes. A veces tenemos una probabilidad previa muy específica, como en el caso en el que esperábamos que nuestra moneda saliera cara el 50% de las veces, pero en muchos casos no tenemos un punto de partida tan sólido. Las *probabilidades previas no informativas* (*Uninformative priors*) intentan influir lo menos posible en la probabilidad posterior resultante, como vimos en el ejemplo anterior con probabilidad uniforme. También es común usar *probabilidades previas débilmente informativas* (o *probabilidades previas por defecto*, *weakly informative priors* o *default priors*), que influyen en el resultado sólo muy levemente. Por ejemplo, si hubiéramos usado una distribución binomial basada en el resultado de una cara de dos lanzamientos de moneda, la probabilidad previa se habría centrado alrededor de 0.5, pero bastante plana, influyendo en la probabilidad posterior sólo ligeramente.  También es posible utilizar probabilidades previas basadas en la literatura científica o datos preexistentes, que llamaríamos *probabilidades previas empíricas* (*empirical priors*). En general, sin embargo, nos ceñiremos al uso de probabilidades previas no-informativas / poco-informativas, ya que generan la menor preocupación de influir en nuestros resultados.

<!--## Bayesian hypothesis testing-->
## Prueba de hipótesis Bayesiana

<!--Having learned how to perform Bayesian estimation, we now turn to the use of Bayesian methods for hypothesis testing.  Let's say that there are two politicians who differ in their beliefs about whether the public is in favor an extra tax to support the national parks. Senator Smith thinks that only 40% of people are in favor of the tax, whereas Senator Jones thinks that 60% of people are in favor.  They arrange to have a poll done to test this, which asks 1000 randomly selected people whether they support such a tax. The results are that 490 of the people in the polled sample were in favor of the tax. Based on these data, we would like to know: Do the data support the claims of one senator over the other,and by how much?  We can test this using a concept known as the [Bayes factor](https://bayesfactor.blogspot.com/2014/02/the-bayesfactor-package-this-blog-is.html), which quantifies which hypothesis is better by comparing how well each predicts the observed data.-->
Habiendo aprendido cómo realizar la estimación bayesiana, ahora pasamos al uso de métodos bayesianos para la prueba de hipótesis. Digamos que hay dos políticos que difieren en sus creencias sobre si el público está a favor de un impuesto extra para apoyar los parques nacionales. El senador Smith piensa que solo el 40% de la gente está a favor del impuesto, mientras que el senador Jones cree que el 60% de la gente está a favor. Organizan una encuesta para probar esto, que pregunta a 1000 personas seleccionadas al azar si apoyan tal impuesto. Los resultados son que 490 de las personas de la muestra encuestada estaban a favor del impuesto. Con base en estos datos, nos gustaría saber: ¿Los datos respaldan las afirmaciones de un senador sobre el otro y en qué medida? Podemos probar esto usando un concepto conocido como el [factor de Bayes](https://bayesfactor.blogspot.com/2014/02/the-bayesfactor-package-this-blog-is.html) (*Bayes Factors*), que cuantifica qué hipótesis es mejor comparando qué tan bien predicen los datos cada una de las hipótesis.


<!--### Bayes factors {#Bayes-factors}-->
### Factores de Bayes {#Bayes-factors}

```{r echo=FALSE}
# compute Bayes factor for Smith vs. Jones

bf <-
  dbinom(
    x = 490,
    size = 1000,
    prob = 0.4 #Smith's hypothesis
  ) / dbinom(
    x = 490, 
    size = 1000, 
    prob = 0.6 #Jones' hypothesis
  )

```


<!--The Bayes factor characterizes the relative likelihood of the data under two different hypotheses.  It is defined as:-->
El factor de Bayes caracteriza la probabilidad relativa de los datos bajo dos hipótesis diferentes. Es definido como:

$$
BF = \frac{p(data|H_1)}{p(data|H_2)}
$$

<!--for two hypotheses $H_1$ and $H_2$.  In the case of our two senators, we know how to compute the likelihood of the data under each hypothesis using the binomial distribution; let's assume for the moment that our prior probability for each senator being correct is the same ($P_{H_1} = P_{H_2} = 0.5$).  We will put Senator Smith in the numerator and Senator Jones in the denominator, so that a value greater than one will reflect greater evidence for Senator Smith, and a value less than one will reflect greater evidence for Senator Jones. The resulting Bayes Factor (`r I(bf)`) provides a measure of the evidence that the data provides regarding the two hypotheses - in this case, it tells us the data support Senator Smith more than 3000 times more strongly than they support Senator Jones.-->
para dos hipótesis $H_1$ y $H_2$. En el caso de nuestros dos senadores, sabemos cómo calcular la probabilidad de los datos bajo cada hipótesis utilizando la distribución binomial; supongamos por el momento que nuestra probabilidad previa de que cada senador esté en lo correcto es la misma ($P_{H_1} = P_{H_2} = 0.5$). Pondremos al senador Smith en el numerador y al senador Jones en el denominador, de modo que un valor mayor que uno reflejará una mayor evidencia para el senador Smith, y un valor menor que uno reflejará una mayor evidencia para el senador Jones. El factor de Bayes resultante (`r I(bf)`) proporciona una medida de la evidencia que los datos proporcionan con respecto a las dos hipótesis; en este caso, nos dice que los datos apoyan al senador Smith más de 3000 veces más de lo que apoyan al senador Jones.

<!--### Bayes factors for statistical hypotheses-->
### Factores de Bayes para hipótesis estadísticas

<!--In the previous example we had specific predictions from each senator, whose likelihood we could quantify using the binomial distribution. In addition, our prior probability for the two hypotheses was equal.  However, in real data analysis we generally must deal with uncertainty about our parameters, which complicates the Bayes factor, because we need to compute the marginal likelihood (that is, an integrated average of the likelihoods over all possible model parameters, weighted by their prior probabilities).  However, in exchange we gain the ability to quantify the relative amount of evidence in favor of the null versus alternative hypotheses.-->
En el ejemplo anterior teníamos predicciones específicas de cada senador, cuya probabilidad pudimos cuantificar utilizando la distribución binomial. Además, nuestra probabilidad previa para las dos hipótesis era igual. Sin embargo, en el análisis de datos reales generalmente debemos lidiar con la incertidumbre acerca de nuestros parámetros, lo que complica el factor de Bayes, porque necesitamos calcular la probabilidad marginal (es decir, un promedio integrado de las probabilidades sobre todos los parámetros posibles del modelo, ponderado por su probabilidades). Sin embargo, a cambio, obtenemos la capacidad de cuantificar la cantidad relativa de evidencia a favor de la hipótesis nula frente a la alternativa.

<!--Let's say that we are a medical researcher performing a clinical trial for the treatment of diabetes, and we wish to know whether a particular drug reduces blood glucose compared to placebo. We recruit a set of volunteers and randomly assign them to either drug or placebo group, and we measure the change in hemoglobin A1C (a marker for blood glucose levels) in each group over the period in which the drug or placebo was administered.  What we want to know is: Is there a difference between the drug and placebo?-->
Digamos que somos un investigador médico que realiza un ensayo clínico para el tratamiento de la diabetes y deseamos saber si un medicamento en particular reduce la glucosa en sangre en comparación con el placebo. Reclutamos un conjunto de voluntarios y los asignamos aleatoriamente al grupo de fármaco o placebo, y medimos el cambio en la hemoglobina A1C (un marcador de los niveles de glucosa en sangre) en cada grupo durante el período en el que se administró el fármaco o el placebo. Lo que queremos saber es: ¿Existe alguna diferencia entre el fármaco y el placebo?

<!--First, let's generate some data and analyze them using null hypothesis testing (see Figure \@ref(fig:bayesTesting)). Then let's perform an independent-samples t-test, which shows that there is a significant difference between the groups:-->
Primero, generemos algunos datos y analicémoslos usando la prueba de hipótesis nula (ve la Figura \@ref(fig:bayesTesting)). Luego, realicemos una prueba t de muestras independientes, que muestra que hay una diferencia significativa entre los grupos:


```{r echo=FALSE}
# create simulated data for drug trial example

set.seed(1234567)
nsubs <- 40
effect_size <- 0.6

# randomize indiviuals to drug (1) or placebo (0)
drugDf <-
  tibble(
    group = as.integer(runif(nsubs) > 0.5)
  ) %>%
  mutate(
    hbchange = rnorm(nsubs) - group * effect_size
  )

```

<!-- fig.cap="Box plots showing data for drug and placebo groups." -->
```{r bayesTesting,echo=FALSE,fig.cap="Box plots mostrando los datos de los grupos de medicamento y de placebo.",fig.width=4,fig.height=4,out.height='50%'}

drugDf %>%
  mutate(
    group = as.factor(
      recode(
        group,
        "1" = "Drug",
        "0" = "Placebo"
      )
    )
  ) %>%
  ggplot(aes(group, hbchange)) +
  geom_boxplot() +
  annotate("segment", x = 0.5, xend = 2.5, y = 0, yend = 0, linetype = "dotted") +
  labs(
    x = "",
    y = "Change in hemoglobin A1C"
  )
```


```{r echo=FALSE}
# compute t-test for drug example
drugTT <- t.test(hbchange ~ group, alternative = "greater", data = drugDf)
print(drugTT)
```

<!--This test tells us that there is a significant difference between the groups, but it doesn't quantify how strongly the evidence supports the null versus alternative hypotheses.  To measure that, we can compute a Bayes factor using `ttestBF` function from the BayesFactor package in R:-->
Esta prueba nos dice que hay una diferencia significativa entre los grupos, pero no cuantifica la fuerza con la que la evidencia apoya la hipótesis nula versus la alternativa. Para medir eso, podemos calcular un factor de Bayes usando la función `ttestBF` del paquete de BayesFactor en R:

```{r echo=FALSE, message=FALSE,warning=FALSE}
# compute Bayes factor for drug data
bf_drug <- ttestBF(
  formula = hbchange ~ group, data = drugDf,
  nullInterval = c(0, Inf)
)

bf_drug
```

<!--We are particularly interested in the Bayes Factor for an effect greater than zero, which is listed in the line marked "[1]" in the report.  The Bayes factor here tells us that the alternative hypothesis (i.e. that the difference is greater than zero) is about 3 times more likely than the point null hypothesis (i.e. a mean difference of exactly zero) given the data.  Thus, while the effect is significant, the amount of evidence it provides us in favor of the alternative hypothesis is rather weak.-->
Estamos particularmente interesadxs en el factor de Bayes para un efecto mayor que cero, que se enumera en la línea marcada "[1]" en el informe. El factor de Bayes aquí nos dice que la hipótesis alternativa (es decir, que la diferencia sea mayor que cero) es aproximadamente 3 veces más probable que la hipótesis nula puntual (es decir, una diferencia media de exactamente cero) dados los datos. Por lo tanto, si bien el efecto es significativo, la cantidad de evidencia que nos proporciona a favor de la hipótesis alternativa es bastante débil.

<!--#### One-sided tests-->
#### Pruebas unilaterales (one-sided tests)

<!--We generally are less interested in testing against the null hypothesis of a specific point value (e.g. mean difference = 0) than we are in testing against a directional null hypothesis (e.g. that the difference is less than or equal to zero).  We can also perform a directional (or *one-sided*) test using the results from `ttestBF` analysis, since it provides two Bayes factors: one for the alternative hypothesis that the mean difference is greater than zero, and one for the alternative hypothesis that the mean difference is less than zero.  If we want to assess the relative evidence for a positive effect, we can compute a Bayes factor comparing the relative evidence for a positive versus a negative effect by simply dividing the two Bayes factors returned by the function:-->
Por lo general, estamos menos interesados en contrastar la hipótesis nula de un valor puntual específico (por ejemplo, diferencia media = 0) que en contrastar una hipótesis nula direccional (por ejemplo, que la diferencia es menor o igual a cero). También podemos realizar una prueba direccional (o *unilateral*, en inglés *one-sided*) utilizando los resultados del análisis `ttestBF`, ya que proporciona dos factores de Bayes: uno para la hipótesis alternativa de que la diferencia media es mayor que cero y otro para la hipótesis alternativa de que la diferencia media es menor que cero. Si queremos evaluar la evidencia relativa de un efecto positivo, podemos calcular un factor de Bayes comparando la evidencia relativa de un efecto positivo versus negativo simplemente dividiendo los dos factores de Bayes devueltos por la función:

```{r echo=FALSE}
bf_drug[1]/bf_drug[2]
```

<!--Now we see that the Bayes factor for a positive effect versus a negative effect is substantially larger (almost 30).--> 
Ahora vemos que el factor Bayes para un efecto positivo frente a un efecto negativo es sustancialmente mayor (casi 30).

<!--#### Interpreting Bayes Factors-->
#### Interpretar Factores de Bayes

<!--How do we know whether a Bayes factor of 2 or 20 is good or bad? There is a general guideline for interpretation of Bayes factors suggested by [Kass & Rafferty (1995)](https://www.andrew.cmu.edu/user/kk3n/simplicity/KassRaftery1995.pdf):-->
¿Cómo sabemos si un factor de Bayes de 2 o 20 es bueno o malo? Existe una guía general para la interpretación de los factores de Bayes sugerida por [Kass & Rafferty (1995)](https://www.andrew.cmu.edu/user/kk3n/simplicity/KassRaftery1995.pdf):

<!-- 
|BF|	Strength of evidence|
|---------|---------------------|
|1 to 3 |  not worth more than a bare mention|
|3 to 20| positive|
|20 to 150| strong|
|>150 | very strong|
--> 
|BF|	Fuerza de la evidencia|
|---------|---------------------|
|1 to 3 |  No merece más que una simple mención|
|3 to 20| Positiva|
|20 to 150| Fuerte|
|>150 | Muy fuerte|

<!--Based on this, even though the statisical result is significant, the amount of evidence in favor of the alternative vs. the point null hypothesis is weak enough that it's hardly worth even mentioning, whereas the evidence for the directional hypothesis is relatively strong.-->
Con base en esto, aunque el resultado estadístico es significativo, la cantidad de evidencia a favor de la hipótesis alternativa frente a la hipótesis nula puntual es tan débil que apenas vale la pena mencionarla, mientras que la evidencia para la hipótesis direccional es relativamente sólida.


<!--### Assessing evidence for the null hypothesis-->
### Evaluar evidencia a favor de la hipótesis nula

<!--Because the Bayes factor is comparing evidence for two hypotheses, it also allows us to assess whether there is evidence in favor of the null hypothesis, which we couldn't do with standard null hypothesis testing (because it starts with the assumption that the null is true).  This can be very useful for determining whether a non-significant result really provides strong evidence that there is no effect, or instead just reflects weak evidence overall.-->
Debido a que el factor de Bayes está comparando evidencia para dos hipótesis, también nos permite evaluar si hay evidencia a favor de la hipótesis nula, lo cual no podríamos hacer con la prueba estándar de hipótesis nula (porque comienza con la suposición de que la nula es cierta). Esto puede ser muy útil para determinar si un resultado no significativo realmente proporciona pruebas sólidas de que no hay ningún efecto o, en cambio, solo refleja una evidencia débil en general.

<!--## Learning objectives-->
## Objetivos de aprendizaje

<!--After reading this chapter, should be able to:-->
Después de leer este capítulo, debes ser capaz de:

<!--* Describe the main differences between Bayesian analysis and null hypothesis testing
* Describe and perform the steps in a Bayesian analysis
* Describe the effects of different priors, and the considerations that go into choosing a prior
* Describe the difference in interpretation between a confidence interval and a Bayesian credible interval-->
* Describir las principales diferencias entre el análisis bayesiano y la prueba de hipótesis nula.
* Describir y realizar los pasos en un análisis bayesiano.
* Describir los efectos de diferentes probabilidades previas y las consideraciones que intervienen en la elección de una probabilidad previa.
* Describir la diferencia de interpretación entre un intervalo de confianza y un intervalo de credibilidad bayesiano.

<!-- ## Suggested readings -->
## Lecturas sugeridas

- *The Theory That Would Not Die: How Bayes' Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant from Two Centuries of Controversy*, por Sharon Bertsch McGrayne.
- *Doing Bayesian Data Analysis: A Tutorial Introduction with R*, por John K. Kruschke.

<!-- ## Appendix:  -->
## Apéndice:

<!-- ### Rejection sampling -->
### Muestreo de rechazo

<!-- We will generate samples from our posterior distribution using a simple algorithm known as [*rejection sampling*](https://am207.github.io/2017/wiki/rejectionsampling.html).  The idea is that we choose a random value of x (in this case $p_{respond}$) and a random value of y (in this case, the posterior probability of $p_{respond}$) each from a uniform distribution. We then only accept the sample if $y < f(x)$ - in this case, if the randomly selected value of y is less than the actual posterior probability of y.  Figure \@ref(fig:rejectionSampling) shows an example of a histogram of samples using rejection sampling, along with the 95% credible intervals obtained using this method (with the values presented in Table \@ref(tab:credInt))   -->
Generaremos muestras a partir de nuestra distribución de probabilidad posterior usando un algoritmo simple conocido como [*muestreo de rechazo* (*rejection sampling*)](https://am207.github.io/2017/wiki/rejectionsampling.html). La idea es que elegimos un valor aleatorio de x (en este caso $p_{respond}$) y un valor aleatorio de y (en este caso, la probabilidad posterior de $p_{respond}$) cada uno de una distribución uniforme. Luego aceptamos la muestra sólo si $y < f(x)$ - en este caso, si el valor seleccionado aleatoriamente de y es menor que la probabilidad posterior real de y. La Figura \@ref(fig:rejectionSampling) muestra un ejemplo de un histograma de muestras usando el muestreo de rechazo, junto con el intervalo de credibilidad del 95% obtenido usando este método (con los valores presentados en la Tabla \@ref(tab:credInt)).

```{r credInt, echo=FALSE}
# Compute credible intervals for example

nsamples <- 100000

# create random uniform variates for x and y
x <- runif(nsamples)
y <- runif(nsamples)

# create f(x)
fx <- dbinom(x = nResponders, size = 100, prob = x)

# accept samples where y < f(x)
accept <- which(y < fx)
accepted_samples <- x[accept]

credible_interval <- quantile(x = accepted_samples, 
                              probs = c(0.025, 0.975))
kable(credible_interval)
```

<!-- fig.cap="Rejection sampling example.The black line shows the density of all possible values of p(respond); the blue lines show the 2.5th and 97.5th percentiles of the distribution, which represent the 95 percent credible interval for the estimate of p(respond)." -->
```{r rejectionSampling,echo=FALSE,fig.cap="Ejemplo de muestreo de rechazo. La línea negra muestra la densidad de todos los posibles valores de p(respond); las líneas azules muestran los percentiles 2.5 y 97.5 de la distribución, que representan el intervalo de credibilidad al 95 porciento para la estimación de p(respond).",fig.width=4,fig.height=4,out.height='50%'}

# plot histogram

p=ggplot(data.frame(samples=accepted_samples),aes(samples)) + 
  geom_density()

for (i in 1:2) {
  p = p + annotate('segment',x=credible_interval[i],xend=credible_interval[i],
           y=0,yend=2,col='blue',lwd=1) 
} 
print(p)
```