- Jupyter Notebooks:
  - `CP Confidence.ipynb`: Analysis related to confidence, using subject-level data.
  - `CP Confidence No Subject.ipynb`: Confidence analysis without subject-specific data.
  - `CP Risk.ipynb`: Risk analysis using subject-level data.
  - `CP Risk No Subject.ipynb`: Risk analysis without subject-specific data.
  - `Demeaned Confidence.ipynb`: Confidence analysis with demeaned data for statistical adjustments.
  - `Demeaned Risk.ipynb`: Risk analysis with demeaned data.
  - `Peterson.ipynb`: Analysis or replication related to a study by Peterson (details inside the notebook).
  - `Shubatt Replication.ipynb`: Replication of a study by Shubatt (details inside the notebook).
- Stata `.do` file `calc_index.do`: A Stata script to calculate the complexity indices from Quantifying Lottery Choice Complexity (Enke & Shubatt 2023):
  - OPC: Objective Problem Complexity
  - SPC: Subjective Problem Complexity
  - OAC: Objective Aggregation Complexity
  - SAC: Subjective Aggregation Complexity
  - OLC: Objective Lottery Complexity
  - SLC: Subjective Lottery Complexity
Depending on the input, the script calculates these indices and saves them in the `output` folder, together with the features needed to obtain them.
The data I used are from Peterson et al. (2021) and are available in the choices13k dataset.
Input requirements:
- Max number of lotteries: 2
- Max number of states per lottery: 9
- A payout value is ignored if its probability is 0
- Payouts should be distinct within each lottery
- The problem ID is optional and should be stored in a column named `problem`
- CSV format
- Non-existing probabilities and states can be indicated by `,"",`, `,"NA",` or `,,`
- Additional columns may be included in the input data and are not modified by the code, as long as their names do not match the patterns `x_`, `p_`, `_a_`, `_a`, `_b_`, `_b` and `cor_`
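The rules above can be illustrated with a short Python sketch that reads one lottery from a CSV row and lists the columns that would be passed through. It is not the logic of `calc_index.do`: the helper names `read_lottery` and `passthrough_columns` are hypothetical, and the sketch assumes the `x_{l}_{i}` / `p_{l}_{i}` (or `x_{i}` / `p_{i}`) column naming described next.

```python
import csv
import io
import re

MAX_STATES = 9          # at most 9 states per lottery
MISSING = {"", "NA"}    # ,"",  ,"NA",  and ,, all mark a non-existing state
# Column names containing one of these fragments are reserved for the tool;
# every other column is passed through unchanged.
RESERVED = re.compile(r"x_|p_|_a_|_a|_b_|_b|cor_")


def read_lottery(row, lottery=None):
    """Return the (payout, probability) pairs of one lottery in a CSV row.

    lottery is "a" or "b" for x_a_1/p_a_1-style columns, or None for x_1/p_1.
    """
    infix = f"_{lottery}_" if lottery else "_"
    states = []
    for i in range(1, MAX_STATES + 1):
        x = (row.get(f"x{infix}{i}") or "").strip()
        p = (row.get(f"p{infix}{i}") or "").strip()
        if x in MISSING or p in MISSING:
            continue  # this state does not exist
        prob = float(p)
        if prob == 0:
            continue  # payouts with probability 0 are ignored
        states.append((float(x), prob))
    payouts = [x for x, _ in states]
    if len(payouts) != len(set(payouts)):
        raise ValueError(f"payouts of lottery {lottery or 'a'} must be distinct")
    return states


def passthrough_columns(fieldnames):
    """Columns the tool would leave untouched."""
    return [c for c in fieldnames if not RESERVED.search(c)]


sample = (
    "problem,x_a_1,p_a_1,x_a_2,p_a_2,x_b_1,p_b_1,x_b_2,p_b_2,note\n"
    '1,10,0.5,0,0.5,5,1,"",NA,demo\n'
)
reader = csv.DictReader(io.StringIO(sample))
row = next(reader)
print(read_lottery(row, "a"))                  # [(10.0, 0.5), (0.0, 0.5)]
print(read_lottery(row, "b"))                  # [(5.0, 1.0)]
print(passthrough_columns(reader.fieldnames))  # ['problem', 'note']
```

In the sample, the second state of lottery b is written once as `""` and once as `NA`, so it is skipped, while `problem` and `note` survive as pass-through columns.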
If two lotteries are supplied, the column names should follow this scheme: payout columns take the form `x_{l}_{i}` and probability columns take the form `p_{l}_{i}`, where `l` is the lottery (`a` or `b`) and `i` is the state index, starting at 1. For example, lottery a uses `x_a_1`, `x_a_2`, `p_a_1`, `p_a_2`, and similarly for lottery b.
If only one lottery is supplied, the lottery columns can either follow the scheme above (using only the `_a_` columns and no `_b_` columns) or omit the `_a_` segment, in which case payout columns take the form `x_{i}` and probability columns take the form `p_{i}`. See, for example, `sample_just_OLC_SLC_calculation_1.csv` or `sample_just_OLC_SLC_calculation_2.csv` in the `sample_data` folder.
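To make the two naming schemes concrete, here is a small Python sketch that groups payout and probability columns by lottery and state. The function `parse_columns` is hypothetical; it assumes single-digit state indices 1 to 9 (matching the limit of 9 states) and treats `x_{i}` / `p_{i}` columns as lottery a.

```python
import re

# x_{l}_{i} / p_{l}_{i} with l in {a, b}, or x_{i} / p_{i} when only one lottery is supplied.
LOTTERY_COLUMN = re.compile(r"(?P<kind>[xp])_(?:(?P<lottery>[ab])_)?(?P<state>[1-9])")


def parse_columns(columns):
    """Map each lottery ("a", "b") to the sorted state indices found in the header."""
    kinds_by_state = {}
    for name in columns:
        match = LOTTERY_COLUMN.fullmatch(name)
        if match is None:
            continue  # e.g. problem or an extra pass-through column
        lottery = match.group("lottery") or "a"  # x_{i}/p_{i} behave like x_a_{i}/p_a_{i}
        key = (lottery, int(match.group("state")))
        kinds_by_state.setdefault(key, set()).add(match.group("kind"))

    layout = {}
    for (lottery, state), kinds in sorted(kinds_by_state.items()):
        if kinds != {"x", "p"}:
            raise ValueError(f"state {state} of lottery {lottery} needs both x and p columns")
        layout.setdefault(lottery, []).append(state)
    if "b" in layout and "a" not in layout:
        raise ValueError("a single lottery must use the _a_ columns (or none), not _b_")
    return layout


print(parse_columns(["problem", "x_a_1", "p_a_1", "x_a_2", "p_a_2", "x_b_1", "p_b_1"]))
# {'a': [1, 2], 'b': [1]}
print(parse_columns(["x_1", "p_1", "x_2", "p_2"]))
# {'a': [1, 2]}
```

Mapping `x_{i}` onto lottery a keeps the single-lottery case on the same code path as the two-lottery case.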
The results are saved in the `output` folder as `index_calculated_stata.csv` (depending on which script you run), together with the features necessary for the calculations. Depending on the executed script, `.dta` files are saved as well.
If two lotteries are supplied as above, all 6 indices are calculated automatically. The columns of the output CSV are ordered as follows: `Problem [optional] | supplied probabilities and payouts | OPC | SPC | OAC | SAC | OLC_a | SLC_a | OLC_b | SLC_b | features needed for each index (in the same order as the indices) | any other input columns not used by the code`. The suffixes `_a` and `_b` on `OLC` and `SLC` indicate which lottery the lottery complexity index refers to. The `compound` column is optional.
The features for each index are listed below. See Section 4 of Quantifying Lottery Choice Complexity for the development of the indices, and its appendix Potential Complexity Features for the detailed feature definitions.

Problem complexity (OPC, SPC): consider a choice between two lotteries indexed by $j \in \{A, B\}$, where lottery $j$ has $k_j$ states with payouts $x_s^j$ and probabilities $p_s^j$.
- Log excess dissimilarity (`ln_excess_dissimilarity`): Let $F_A(x)$ and $F_B(x)$ be the CDFs of lotteries A and B, and let $EV(\cdot)$ denote the expected value of a lottery. Log excess dissimilarity is defined in terms of these quantities; see the appendix Potential Complexity Features for the exact formula.
- No dominance (`no_dominance`): 1 if neither lottery first-order stochastically dominates the other, i.e. if
  $$\exists x_1, x_2: F_A(x_1) < F_B(x_1) \land F_A(x_2) > F_B(x_2)$$
- Average log payout magnitude (`ave_ln_payout_magn`):
  $$\frac{1}{2} \Big[ \log\Big(1 + \frac{1}{k_A} \sum_{s=1}^{k_A} |x_s^A| \Big) + \log\Big(1 + \frac{1}{k_B} \sum_{s=1}^{k_B} |x_s^B| \Big) \Big]$$
- Average log number of states (`ave_ln_num_states_a`):
  $$\frac{\log(1 + k_A) + \log(1 + k_B)}{2}$$
- Fraction of lotteries involving a loss (`frac_involves_losses`)
- 1 if one lottery in the choice is compound, i.e. involves compound probabilities
- Absolute expected value difference (`abs_ev_diff`):
  $$|EV(A) - EV(B)|$$
- Absolute expected value difference squared (`abs_ev_diff_sq`):
  $$|EV(A) - EV(B)|^2$$
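The problem-level features above can be sketched in Python for two discrete lotteries given as lists of `(payout, probability)` pairs with zero-probability states already removed. This is an illustration, not the code in `calc_index.do`, and two parts are assumptions: excess dissimilarity is taken to be $\int |F_A(x) - F_B(x)|\,dx - |EV(A) - EV(B)|$, logged as $\log(1 + \cdot)$ like the other features, and a lottery "involves a loss" if it has a negative payout. Check both against the appendix Potential Complexity Features before relying on them.

```python
import math


def ev(lottery):
    """Expected value of a list of (payout, probability) pairs."""
    return sum(x * p for x, p in lottery)


def cdf(lottery, t):
    """F(t) = P(payout <= t)."""
    return sum(p for x, p in lottery if x <= t)


def _support(a, b):
    return sorted({x for x, _ in a} | {x for x, _ in b})


def no_dominance(a, b, tol=1e-12):
    """1 if the CDFs cross, i.e. neither lottery first-order stochastically dominates."""
    # Step CDFs only change at payout values, so checking the joint support suffices.
    diffs = [cdf(a, t) - cdf(b, t) for t in _support(a, b)]
    return int(any(d < -tol for d in diffs) and any(d > tol for d in diffs))


def ln_excess_dissimilarity(a, b):
    # ASSUMPTION: excess dissimilarity = integral of |F_A - F_B| minus |EV(A) - EV(B)|.
    support = _support(a, b)
    # Both CDFs are constant on [lo, hi) between consecutive payouts, so this integral is exact.
    area = sum(abs(cdf(a, lo) - cdf(b, lo)) * (hi - lo) for lo, hi in zip(support, support[1:]))
    excess = area - abs(ev(a) - ev(b))
    return math.log(1 + max(excess, 0.0))  # clamp tiny negative rounding errors


def _ln_payout_magn(lottery):
    return math.log(1 + sum(abs(x) for x, _ in lottery) / len(lottery))


def ave_ln_payout_magn(a, b):
    return (_ln_payout_magn(a) + _ln_payout_magn(b)) / 2


def ave_ln_num_states(a, b):
    return (math.log(1 + len(a)) + math.log(1 + len(b))) / 2


def frac_involves_losses(a, b):
    # ASSUMPTION: a lottery involves a loss if any payout is negative.
    return sum(any(x < 0 for x, _ in lot) for lot in (a, b)) / 2


def abs_ev_diff(a, b):
    return abs(ev(a) - ev(b))


def abs_ev_diff_sq(a, b):
    return abs_ev_diff(a, b) ** 2


A = [(10.0, 0.5), (0.0, 0.5)]  # 50/50 between 10 and 0
B = [(5.0, 1.0)]               # 5 for sure
print(no_dominance(A, B), abs_ev_diff(A, B))  # 1 0.0
```

Under this assumed definition, excess dissimilarity is zero whenever one lottery dominates the other, because the CDF distance then equals the expected value difference; in the example the CDFs cross, so it is positive.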
Aggregation complexity (OAC, SAC): as above, but without the `abs_ev_diff` and `abs_ev_diff_sq` features.
If only one lottery is supplied, only the lottery complexity indices (OLC/SLC) are calculated. The output is ordered in the same way as the choice complexity output, except that payouts and probabilities are named `x_1`, `x_2`, ..., `p_1`, `p_2`, ..., and the features and indices carry no `_a` suffix, since the dataset contains just one lottery.
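The single-lottery renaming can be written as a small mapping. The helper `single_lottery_name` is hypothetical and only mirrors the rule just stated: `x_a_i` / `p_a_i` become `x_i` / `p_i`, and a trailing `_a` is dropped from feature and index names.

```python
import re


def single_lottery_name(column):
    """Output name of a lottery-a column when only one lottery is supplied."""
    match = re.fullmatch(r"([xp])_a_([1-9])", column)
    if match:
        return f"{match.group(1)}_{match.group(2)}"  # x_a_1 -> x_1, p_a_2 -> p_2
    if column.endswith("_a"):
        return column[: -len("_a")]  # OLC_a -> OLC, ln_variance_a -> ln_variance
    return column  # e.g. problem or pass-through columns


print([single_lottery_name(c) for c in ["problem", "x_a_1", "p_a_1", "OLC_a", "SLC_a", "ln_variance_a"]])
# ['problem', 'x_1', 'p_1', 'OLC', 'SLC', 'ln_variance']
```

Stripping a trailing `_a` is safe here because pass-through columns may not contain `_a` in the first place.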
Lottery complexity (OLC, SLC): the following features are defined for both lotteries, indexed by $j \in \{A, B\}$ (suffix `_a` or `_b` in the column names):
- Log variance (`ln_variance_a/b`):
  $$\log\Big(1 + \sum_{s=1}^{k_j} p_s^j (x_s^j)^2 - \Big(\sum_{s=1}^{k_j} p_s^j x_s^j\Big)^2 \Big)$$
- Log payout magnitude (`ln_payout_magn_a/b`):
  $$\log\Big(1 + \frac{1}{k_j} \sum_{s=1}^{k_j} |x_s^j| \Big)$$
- Log number of states (`ln_num_states_a/b`):
  $$\log(1 + k_j)$$
- 1 if the lottery involves a loss (`involves_loss_a/b`)
- 1 if the lottery involves compound probabilities (`compound`)
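The lottery-level features can be sketched the same way for one lottery given as `(payout, probability)` pairs with zero-probability states removed. The function names are hypothetical, the loss indicator assumes "involves a loss" means a negative payout, and `compound` is left out because this sketch has no information about how the lottery is presented.

```python
import math


def ln_variance(lottery):
    """log(1 + Var) with Var = sum p x^2 - (sum p x)^2."""
    mean = sum(p * x for x, p in lottery)
    second_moment = sum(p * x * x for x, p in lottery)
    return math.log(1 + max(second_moment - mean ** 2, 0.0))  # guard against rounding


def ln_payout_magn(lottery):
    """log(1 + mean absolute payout over the k states)."""
    return math.log(1 + sum(abs(x) for x, _ in lottery) / len(lottery))


def ln_num_states(lottery):
    return math.log(1 + len(lottery))


def involves_loss(lottery):
    # ASSUMPTION: a loss is any negative payout.
    return int(any(x < 0 for x, _ in lottery))


def lottery_features(lottery, suffix=""):
    """Feature dictionary for one lottery; suffix is e.g. "_a" or "_b"."""
    return {
        f"ln_variance{suffix}": ln_variance(lottery),
        f"ln_payout_magn{suffix}": ln_payout_magn(lottery),
        f"ln_num_states{suffix}": ln_num_states(lottery),
        f"involves_loss{suffix}": involves_loss(lottery),
    }


# 50/50 between 10 and 0: Var = 25, so ln_variance_a = log(26).
print(lottery_features([(10.0, 0.5), (0.0, 0.5)], "_a"))
```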
See `Shubatt Replication.ipynb` for the replication code.