# Health resilience spatial microsimulation {#ressim}
<!-- Notes:
- Is the error small enough to identify resilient individuals?
+ Just state assumptions, same as you would any other method.
+ "Within the limits of this study..." [@cairns2012a, p. 932].
Initial literature review of resilience needs to set out how health resilience
is used to explain why I use the two approaches.
- Why do the systematic review?
- Why clinical depression?
-->
## Introduction {#ressim-intro}
After successfully simulating a pilot spatial micro dataset in Chapter \@ref(methods), I moved on to simulate health resilience, which includes clinical depression and measures of deprivation.
I also simulated indicators of poverty, which I use to examine the likely effects of a number of local and national policy proposals in Chapter \@ref(policy).
This was again a simulation of Doncaster, my case study area, at output area level.
To perform the simulation I used the same data sources as the pilot simulation, namely *Understanding Society* and the 2011 census tables.
Where this simulation differed was in the increased number of target variables that I simulated to help identify health resilience, and in the increased number of constraint variables I used to improve the accuracy of the simulation.
For the target variables I compared two approaches to identify resilience.
One approach was to simulate mental health outcomes, specifically prevalence of clinical depression, at the area level.
I then combined these results with area--level deprivation measures to identify which area or areas could be considered resilient, if any.
This is similar to the approach taken by much contemporary social science research into health resilience, such as that by @bartley2006b, @mitchell2009a, or @cairns2013a.
The other approach was to simulate variables that identify concepts thought to promote resilience, as outlined in Chapter \@ref(sysrev).
With this approach I was able to specify which areas might be resilient under certain assumptions.
These two approaches are documented in Section \@ref(ressim-results).
Finally I simulated various indicators of economic and social status, which I use to examine the possible effects of proposed national and local policy changes in Chapter \@ref(policy).
For the constraints I wanted to test additional variables because more constraints can lead to a more accurate simulation, although some authors suggest the number of possible categories for each constraint is at least as important as the number of constraints themselves:
> ...a model constrained by two variables, each containing 10 categories (20 constraint categories in total), will be better constrained than a model constrained by 5 binary variables such as male/female, young/old etc. [@lovelace2016a, p. 52].
Regardless of the efficacy of using multiple variables or multiple levels, by testing additional constraints I was able to satisfy both requirements, as many of the constraints have several response categories.
Of course, the constraints are only as good as their ability to predict the target variable, so I empirically tested this relationship in Section \@ref(ressim-test).
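
The difference between the number of constraints and the number of constraint categories in the quotation above can be made concrete with a small count.
The sketch below is illustrative only: the constraint names and category counts are invented and are not the constraints used in this chapter.

```r
# Hypothetical constraint sets: each element is the number of categories
# in one constraint variable (names and counts are illustrative only).
two_detailed <- c(age_band = 10, occupation = 10)
five_binary  <- c(sex = 2, young_old = 2, owner_renter = 2,
                  employed = 2, car_access = 2)

# Count the constraint variables and the constraint categories in a set.
count_categories <- function(constraints) {
  c(n_constraints = length(constraints),
    n_categories  = sum(constraints))
}

count_categories(two_detailed)
count_categories(five_binary)
```

Under these assumed counts the two-variable set has twice as many constraint categories as the five-variable set, which is the point the quotation makes.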
## Target variables {#ressim-targets}
```{r res-charac-operationalise-prep}
res_charac_operationalise <- data.frame(
paper_13a = c("13",
"Neighbourhood cohesion",
"scopngbhe; scopngbhg"),
paper_13b = c("",
"Neighbourhood trust",
"nbrcoh3; sctrust (wave a_ only)"),
paper_13c = c("",
"Neighbourhood belonging",
"scopngbha"),
paper_13d = c("",
"Civic participation",
"orga"),
paper_13e = c("",
"Social cohesion",
"nbrcoh4 (reversed)"),
paper_13f = c("",
"Mutual respect",
"No suitable measure"),
paper_13g = c("",
"Heterogeneous relationships",
"siminc"),
paper_13h = c("",
"",
"simrace"),
paper_13i = c("",
"Political participation",
"No suitable measure"),
paper_13j = c("",
"Political activism",
"No suitable measure"),
paper_13l = c("",
"Political efficacy",
"poleff4 (reversed)"),
paper_13m = c("",
"Political trust",
"No suitable measure"),
paper_16 = c("16",
"Cognitive ability",
"No suitable measure"),
paper_23 = c("23",
"Support/encouragement",
"No suitable measure"),
paper_32a = c("32",
"Place attachment",
"As neighbourhood belonging"),
paper_32b = c("",
"Social capital",
"As paper 13"),
paper_32c = c("",
"Natural environment",
"No suitable measure"),
paper_37a = c("37",
"Employment",
"Excluded - constraint"),
paper_37b = c("",
"Finances/income",
"finnow"),
paper_37c = c("",
"Social isolation",
"closenum (>0)"),
paper_37d = c("",
"Occupational capital",
"No suitable measure"),
paper_37e = c("",
"Social support",
"No suitable measure"),
paper_46a = c("46",
"Place attachment",
"As paper 32"),
paper_46b = c("",
"Social capital",
"As paper 32"),
paper_46c = c("",
"Natural environment",
"No suitable measure"),
paper_67 = c("67",
"Sport involvement in youth",
"No suitable measure"),
paper_78a = c("78",
"Coping strategy",
"GHQ"),
paper_90a = c("90",
"Smoking",
"ncigs (0)"),
paper_90b = c("",
"Alcohol consumption",
"xpaltob_g3 (household measure)"),
paper_90c = c("",
"Diet",
"No suitable measure"),
paper_90d = c("",
"Exercise",
"No suitable measure"),
paper_96 = c("96",
"Sickness benefit provision",
"Excluded - not generally applicable"),
paper_98 = c("98",
"Peer support",
"No suitable measure"),
paper_173a = c("173",
"Repressive coping",
"No suitable measure"),
paper_195 = c("195",
"Access to healthcare",
"Excluded - not generally applicable"),
paper_204a = c("204",
"Greater distance to brownfield",
"No suitable measure"),
paper_204b = c("",
"Low environmental deprivation",
"No suitable measure"),
paper_206 = c("206",
"Resilience Scale (RS-25)",
"Items not provided"),
paper_208a = c("208",
"Gender",
"Excluded - constraint"),
paper_208b = c("",
"Age",
"Excluded - constraint"),
paper_208c = c("",
"Education level",
"Excluded - constraint"),
paper_208d = c("",
"Employment",
"Excluded - constraint"),
paper_208e = c("",
"Financial problems in last year",
"As paper 37"),
paper_208f = c("",
"Area-level deprivation",
"No suitable measure"),
paper_241a = c("241",
"Number of 'nodes' within 5 minutes",
"netlv (< 1 mile)"),
paper_241b = c("",
"Support from network",
"No suitable measure"),
paper_241c = c("",
"Frequent contacts (>1/week)",
"netph (at least weekly)"),
paper_241d = c("",
"Number of cohabitants",
"hhsize"),
paper_241e = c("",
"Binary: network include spouse/partner",
"Excluded - constraint"),
paper_241f = c("",
"Number of different relationship 'types'",
"No suitable measure"),
paper_241g = c("",
"Number of network pairs who know each other",
"No suitable measure"),
paper_241h = c("",
"Support given to others",
"Unknown types of support"),
paper_241i = c("",
"Social resources",
"No suitable measures"),
paper_241j = c("",
"Involvement in groups or organisations",
"As paper 13"),
paper_241k = c("",
"Binary: network member lost in last 12 months",
"No suitable measure"),
paper_241l = c("",
"Total network members lost in 12 months",
"No suitable measure"),
paper_242 = c("242",
"Similarity with area status",
"scopngbhg"),
paper_250 = c("250",
"Adverse Childhood Experiences (ACEs)",
"No suitable measures"),
paper_272 = c("272",
"Parental and grandparental mental health",
"No suitable measures"),
paper_307 = c("307",
"Budgeting/money management skills",
"finnow; save"),
stringsAsFactors = FALSE
)
# Transpose so that each paper/measure pair becomes one row of the table
res_charac_operationalise <- t(res_charac_operationalise)
colnames(res_charac_operationalise) <- c("Paper", "Original measure",
                                         "Understanding Society Variable")
```
Each of the two approaches to identify resilience that I described in Section \@ref(ressim-intro) required different target variables.
The first approach identified areas as resilient if they have low prevalence of clinical depression but high area--level deprivation.
I chose clinical depression because it is more closely associated with the concept of psychological resilience that originated in the early resilience literature, outlined in Chapter \@ref(reslit-intro).
To calculate this I simulated the prevalence of clinical depression.
In *Understanding Society* this was asked as 'Has a doctor or other health professional ever told you that you have any of these conditions?' [@understandingsociety2016a].
Respondents were asked if they had any one or more of 17 conditions, which included clinical depression.
Self--reported depression has been shown to be adequately correlated with clinical records of depression [@sanchez2008a].
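
The rule for the first approach (low prevalence of clinical depression combined with high area--level deprivation) can be sketched as follows.
This is a minimal illustration, not the code used for the analysis: the data are invented, the column names are hypothetical, and the median cut-offs are an assumption chosen only to make the example concrete.

```r
# Toy area-level data; all values are invented for illustration.
areas <- data.frame(
  oa_code        = c("OA1", "OA2", "OA3", "OA4"),
  depression_pct = c(4.0, 12.0, 5.0, 11.0),    # simulated prevalence (%)
  imd_score      = c(45.0, 50.0, 10.0, 12.0),  # higher = more deprived
  stringsAsFactors = FALSE
)

# Assumed cut-offs: the median of each measure. Other thresholds
# (for example quintiles) could be substituted in the same way.
low_depression   <- areas$depression_pct < median(areas$depression_pct)
high_deprivation <- areas$imd_score > median(areas$imd_score)

# An area is flagged as potentially resilient if both conditions hold.
areas$resilient <- low_depression & high_deprivation
areas[areas$resilient, "oa_code"]
```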
The second approach simulated characteristics thought to relate to higher levels of resilience, as identified by the systematic literature review described in Chapter \@ref(sysrev).
Table \@ref(tab:res-charac-measures-table) outlines the characteristics identified in each paper thought to affect resilience.
These included: social capital and social networks; a mentor or someone to provide support; place attachment; natural environment; being in or returning to employment, income, or social class; involvement in sports in childhood and youth; coping mechanisms; cognitive ability in childhood; behaviour change; sickness benefit provision; access to---especially primary---healthcare; demographics such as gender, age, ethnicity, and education level; congruity between individual circumstances and neighbourhood or area circumstances; absence of Adverse Childhood Experiences (ACE); parental and grandparental mental health; budgeting and money management skills; and bespoke resilience scales.
### Social capital {#ressim-social-capital}
@poortinga2012a tested the role of bonding, bridging, and linking social capital in community resilience, and @cairns2013a and @nagi2013a also identify social capital as a source of resilience.
@poortinga2012a used nine variables from the 2007 and 2009 Citizenship Surveys in England to articulate social capital [@poortinga2012a, pp. 289--290].
The authors tested bonding social capital by asking about: the extent to which people in the respondent's neighbourhood pull together to improve the neighbourhood; how many people in the neighbourhood can be trusted; and how strongly the respondent feels they belong to their neighbourhood.
In *Understanding Society* there is no exact analogy to the first question, but respondents are asked if they would be 'willing to work together with others on something to improve my neighbourhood' and if 'I think of myself as similar to the people that live in this neighbourhood'.
I treated respondents who strongly agreed or agreed with both statements as experiencing neighbourhood cohesion, as a proxy for the original question.
Trust in people in the neighbourhood and feeling of belonging to the neighbourhood have more direct analogies in *Understanding Society*.
Trust was asked as 'people in this neighbourhood can be trusted' in waves `f_` and `c_`, and as 'generally speaking would you say that most people can be trusted, or that you can’t be too careful in dealing with people?' in wave `a_`.
I coded neighbourhood trust as either strongly agree or agree in waves `f_` and `c_`, or 'most people can be trusted' in wave `a_`, taking the most recent response if respondents answered in more than one wave.
Belonging was asked as 'I feel like I belong to this neighbourhood'.
I coded respondents who strongly agreed or agreed with this statement as feeling they belong to their neighbourhood.
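
The three bonding social capital indicators described above can be sketched in base R.
This is a hedged illustration rather than the code used for the simulation: the data are invented, the wave-prefixed column names (`f_nbrcoh3`, `c_nbrcoh3`, `a_sctrust`) and the wide layout are assumptions, and the response codes (1 = strongly agree, 2 = agree, and 1 = 'most people can be trusted') are assumed rather than taken from the survey documentation.

```r
# Toy respondent data; values and response codes are assumptions.
us_toy <- data.frame(
  scopngbhe = c(1, 2, 4, 2),      # willing to work to improve neighbourhood
  scopngbhg = c(2, 3, 1, 1),      # similar to people in neighbourhood
  scopngbha = c(1, 5, 2, NA),     # belong to neighbourhood
  f_nbrcoh3 = c(2, NA, NA, NA),   # neighbourhood trust, wave f
  c_nbrcoh3 = c(4, 5, NA, NA),    # neighbourhood trust, wave c
  a_sctrust = c(1, 1, 1, NA)      # general trust, wave a
)

# TRUE for 'strongly agree' or 'agree'; missing responses stay NA.
agrees <- function(x) x <= 2

# Cohesion proxy: agreement with both statements.
us_toy$nbr_cohesion <- agrees(us_toy$scopngbhe) & agrees(us_toy$scopngbhg)

# Trust: use the most recent wave with a response (f, then c, then a).
trust_f <- agrees(us_toy$f_nbrcoh3)
trust_c <- agrees(us_toy$c_nbrcoh3)
trust_a <- us_toy$a_sctrust == 1
us_toy$nbr_trust <- ifelse(!is.na(trust_f), trust_f,
                           ifelse(!is.na(trust_c), trust_c, trust_a))

# Belonging: agreement with the single belonging statement.
us_toy$nbr_belong <- agrees(us_toy$scopngbha)
```

Respondents with no response in any wave keep a missing value rather than being coded as lacking trust.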
Bridging social capital was assessed by asking: if respondents think people from different backgrounds in their neighbourhood get on well together; if residents respect ethnic differences between people; what proportion of the respondent's friends have a similar income to them; and what proportion of the respondent's friends are of the same ethnic group as them.
*Understanding Society* asks respondents to agree or disagree with the statement, 'People in this neighbourhood generally don't get along with each other'.
This is reversed from the use in @poortinga2012a, but tests the same concept so I used this as a proxy.
There is no direct analogy asking about respect for ethnic differences so I could not include this.
Proportion of friends with a similar income and proportion of friends of the same ethnic group have direct analogies in *Understanding Society*.
@poortinga2012a suggested heterogeneous friendship groups were conducive to resilience, so I coded 'about half' and 'less than half' as heterogeneous in both cases.
Linking social capital was assessed by asking: if respondents had contacted a political representative, such as a councillor or Member of Parliament, in the last twelve months; if the respondent had attended a public rally, meeting, demonstration, or protest, or signed a petition in the last twelve months; to what degree the respondent felt they could influence decisions affecting their local area; and how much they trust the local council, the police, and parliament.
The first two questions ask about activities, except voting, that the respondent has participated in, for which there was no adequate analogy in *Understanding Society*, which forced me to exclude these questions from my analysis.
The third question asks about the respondent's ability to influence local decisions, for which I used 'People like me don't have any say in what the government does' as a proxy.
I coded respondents who strongly disagreed or disagreed as having political efficacy.
Finally, levels of trust in the local council, police, and parliament were not asked so I could not use these.
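
The reversed items and the heterogeneity coding for bridging and linking social capital can be illustrated in the same way.
The sketch assumes a five-point agreement scale (4 = disagree, 5 = strongly disagree) for `nbrcoh4` and `poleff4`, and a four-point scale for `siminc` and `simrace` in which 3 = 'about half' and 4 = 'less than half'.
These codes, the toy data, and the choice to treat disagreement with `nbrcoh4` as indicating social cohesion are assumptions for illustration.

```r
# Toy respondent data; values and response codes are assumptions.
us_toy <- data.frame(
  nbrcoh4 = c(1, 4, 5, 3),    # "people ... don't get along" (reversed item)
  poleff4 = c(5, 2, 4, NA),   # "people like me don't have any say" (reversed)
  siminc  = c(1, 3, 4, 2),    # proportion of friends with similar income
  simrace = c(4, 2, 3, 1)     # proportion of friends of same ethnic group
)

# TRUE for 'disagree' or 'strongly disagree'; missing responses stay NA.
disagrees <- function(x) x >= 4

# Reversed items: disagreement indicates the protective characteristic.
us_toy$social_cohesion <- disagrees(us_toy$nbrcoh4)
us_toy$pol_efficacy    <- disagrees(us_toy$poleff4)

# Heterogeneous friendships: 'about half' or 'less than half' are similar.
us_toy$het_income <- us_toy$siminc >= 3
us_toy$het_ethnic <- us_toy$simrace >= 3
```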
### Social networks
@reeves2014a reviewed the effectiveness of social networks for patients managing a long--term condition.
They suggested network member characteristics, social network characteristics, and member change were important for effective social networks.
They articulated these as: number of network members within five minutes; percentage of network members giving support within five minutes; number of network members in contact at least weekly; number of cohabitants; if the network includes a spouse or partner; number of different relationship 'types'; number of network members who know each other; amount of support given to other network members; score of social resources; extent of involvement in groups or organisations; a binary measure if any network members were lost in the previous twelve months; and number of members of the network lost in the previous twelve months.
In *Understanding Society* respondents are only asked details of up to three 'best friends' or network contacts, so it was not appropriate to use the number of contacts as this was capped.
Instead, I used a binary yes or no if any one of the respondent's friends met the respective criteria.
For network members within five minutes I used friends who live less than one mile away as a proxy.
There was no suitable variable asking if friends provided support, so I was not able to include this.
Respondents were asked how frequently they were in touch with friends, so I coded respondents as binary yes or no if they were in touch with at least one friend, at least weekly.
I derived number of cohabitants by subtracting one---the respondent---from household size.
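
Because details are only recorded for up to three friends, the binary 'any friend meets the criterion' coding and the cohabitant count can be sketched as below.
The per-friend column names (`netlv_1` to `netlv_3`, `netph_1` to `netph_3`), the response codes, and the data are hypothetical and chosen for illustration only.

```r
# Toy data: one row per respondent, up to three friends each.
# Assumed codes: netlv 1 = lives less than one mile away;
# netph 1 = in touch most days, 2 = at least once a week.
friends <- data.frame(
  netlv_1 = c(1, 3, NA), netlv_2 = c(2, 4, NA), netlv_3 = c(NA, 2, NA),
  netph_1 = c(3, 2, NA), netph_2 = c(4, 4, NA), netph_3 = c(NA, 1, NA),
  hhsize  = c(1, 4, 2)
)

# TRUE if at least one of the listed friends has a qualifying code.
# Respondents with no friends recorded are coded FALSE in this sketch.
any_friend <- function(df, cols, ok_codes) {
  Reduce(`|`, lapply(df[cols], function(x) x %in% ok_codes))
}

friends$friend_nearby  <- any_friend(friends, paste0("netlv_", 1:3), 1)
friends$weekly_contact <- any_friend(friends, paste0("netph_", 1:3), c(1, 2))

# Number of cohabitants: household size minus the respondent.
friends$cohabitants <- friends$hhsize - 1
```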
Marital status was the most appropriate proxy for whether the network included a spouse or partner, but I could not include this as a target variable because I already used it as a constraint (see Section \@ref(ressim-marital-status)).
There were no suitable measures for number of different relationship 'types' or number of network members who know each other, so I could not include these.
The paper used a count of up to seven types of support given to others by the participant in the last month, but it is not known what these seven types of support were.
*Understanding Society* asks if the respondent cares for others either inside or outside of the household, but I was not able to use these responses as they might not capture all of the types of support used by @reeves2014a.
Social resources were assessed using the Resource Generator--UK (RG--UK) instrument [@webber2007a].
This asked 27 items about the help available to the respondent across four domains, such as if the respondent had a friend who could help with jobs around the home or who had a professional occupation [@webber2007a, p. 486].
*Understanding Society* did not ask comparable questions about the nature and extent of support provided by friends so I could not include these measures.
Extent of involvement in groups or organisations was measured by @reeves2014a as the number attended from a list of 14 different types.
The authors did not specify what the 14 types are, but in *Understanding Society* respondents are asked if they participate in any of 16 organisations or activities.
While there is no guarantee the 16 items in *Understanding Society* map to the 14 in @reeves2014a, they do cover a broad range of organisations and groups and respondents are asked if they participate in any other groups not captured.
I coded respondents as being involved if they participated in at least one group or organisation.
*Understanding Society* does not ask if any friendships or network 'nodes' have been lost in the preceding twelve months or about work done by lost 'nodes' in that period.
I was therefore unable to include these concepts in my analysis.
### Peer support {#ressim-peer-support}
@matthews2012a found that respondents who self--reported that they had "...someone to support, push or encourage them" were more likely to look after their health and seek treatment when necessary [@matthews2012a, p. 404].
*Understanding Society* asks about social networks, but not if the respondent feels they receive support from members of their network.
Similarly only respondents completing the youth questionnaire---those aged 16--21---were asked if they feel they receive support from their family.
For this reason I was not able to include this measure, but other measures of the quality and quantity of the respondent's social network are included based on measures in Section \@ref(ressim-social-capital).
@robinson2015a identified peer support as a protective factor against poor health in men.
As discussed above there were no suitable measures in *Understanding Society* for this concept so I was not able to include it.
### Place attachment
@cairns2013a and @nagi2013a are based on the same doctoral research so repeat the same measures.
They identify self--reported place attachment, social capital, and the quality of the natural environment as potential protective mechanisms.
Place attachment was defined by the authors as "the emotional attachment acquired by individuals to their environmental surroundings which enables them to develop a strong sense of belonging, which is important for personal identity and emotional well--being" [@nagi2013a, p. 232].
*Understanding Society* asks if the respondent feels like they belong to their neighbourhood, which I already coded in Section \@ref(ressim-social-capital).
### Natural environment
@cairns2013a and @nagi2013a identify the quality of the natural environment as a potential protective mechanism.
@bambra2015a hypothesised that reduced or limited proximity to 'brownfield' sites (sites that are categorised as previously developed land, or PDL) and low environmental deprivation are potential sources of health resilience.
*Understanding Society* does not ask about the local environment so I was not able to include these concepts.
### Employment status and occupational capital {#ressim-employment-status}
@cameron2013a found that self--reported employment status, financial situation, social isolation, 'occupational capital', and social support affected health outcomes.
Employment status is already used as a constraint so I had to exclude it.
Respondents in *Understanding Society* are asked about their current subjective financial status, so I included this as a proxy for financial situation.
I coded respondents who reported they were living comfortably or doing alright as a 'good' financial situation and potential source of resilience.
The number of close friends (which can also include family members) is asked in *Understanding Society*, so I coded respondents with one or more close friends as not socially isolated.
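
The two indicators described above can be sketched as follows.
The response codes for `finnow` (1 = living comfortably, 2 = doing alright, with higher codes indicating greater difficulty) are an assumption for this example, and the data are invented.

```r
# Toy respondent data; values and response codes are assumptions.
us_toy <- data.frame(
  finnow   = c(1, 2, 3, 5, NA),  # subjective financial situation
  closenum = c(0, 1, 4, NA, 2)   # number of close friends
)

# 'Good' financial situation: living comfortably or doing alright.
us_toy$good_finances <- us_toy$finnow <= 2

# Not socially isolated: one or more close friends.
us_toy$not_isolated <- us_toy$closenum > 0
```

Missing responses remain missing in both indicators rather than being coded as the absence of the characteristic.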
Occupational capital is defined by the author as "accessible external opportunities" [@cameron2013a, p. 197], which I take to mean the availability or number of jobs which the candidate could reasonably perform and be appointed to within a reasonable distance.
This is only applicable to individuals who are currently seeking work, mostly those who are unemployed, so is not applicable to the general population.
I could not combine this in any way with employment status, either, as I used this as a constraint (see Section \@ref(ressim-economic-activity)).
For these reasons I excluded this from my analysis.
I was not able to include social support as I discussed in Section \@ref(ressim-peer-support).
### Sports participation {#ressim-sports-participation}
@haycock2014a determined that sports participation in youth had a strong association with sports participation, and therefore improved health, in adult life.
In *Understanding Society* sports participation is asked, but only for the youth panel or if there is a child in the home, so it was not possible to include this measure.
### Coping mechanisms
@lai2014a provide a systematic review of coping mechanisms employed to mitigate stress and challenges from caregiving.
As this is a review of other literature, multiple instruments were identified to measure coping ability and strategy including Coping Health Inventory for Parents (CHIP), Ways of Coping Scale (WCS), and the Multidimensional Coping Inventory (MCI), as well as qualitative and self--reported measures.
*Understanding Society* does not capture this breadth of information about coping, and likely should not as many of these instruments are not designed to be self--completed.
It does, however, ask the General Health Questionnaire (GHQ) which includes items on the respondent's ability to overcome difficulties and to face problems.
I used these as a proxy for 'coping' overall, although these will not articulate the nuances of *how* respondents cope.
I coded 'not at all' or 'no more than usual' to problems overcoming difficulties and 'more so than usual' or 'same as usual' to ability to face problems as potential sources of resilience.
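
A minimal sketch of this coding is given below.
The column names `ghq_overcome` and `ghq_face` are hypothetical stand-ins for the two GHQ items, and the four-point response codes (1 and 2 being the two favourable responses on each item) are an assumption.
The two indicators are kept separate here; whether they should be combined into a single coping measure is left open.

```r
# Toy GHQ responses; names, values and codes are assumptions.
# ghq_overcome: 1 = not at all, 2 = no more than usual, 3-4 = more than usual
# ghq_face:     1 = more so than usual, 2 = same as usual, 3-4 = less able
ghq_toy <- data.frame(
  ghq_overcome = c(1, 2, 3, 4, NA),
  ghq_face     = c(2, 3, 1, 4, 2)
)

# Favourable responses on each item are treated as potential resilience.
ghq_toy$copes_difficulties <- ghq_toy$ghq_overcome <= 2
ghq_toy$copes_problems     <- ghq_toy$ghq_face <= 2
```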
@erskine2016a looked at the protection provided by repressive coping in old age.
I cannot include detailed information about coping styles because these are not asked in *Understanding Society*.
I have included the GHQ which asks about coping overall, but not about *how* the respondent copes.
### Cognitive ability
@mottus2012a tested the efficacy of cognitive ability, measured with the Moray House Test no. 12 [@mottus2012a, p. 1370], as a protective mechanism for health.
I had to exclude this because there was no suitable comparable measure in *Understanding Society*.
### Behaviour change {#ressim-behaviour-change}
@mackenbach2015a describe the relationship between education and cause--specific mortality in Europe, in which mortality deviated from the 'expected' level in some circumstances.
They determined that much of the deviation, particularly for preventable diseases, is due to behaviour change, medical intervention, and injury prevention [@mackenbach2015a, p. 59].
Medical intervention and injury prevention, although clearly important, are not of interest to this study because they focus on the prevention and treatment of a specific pathology or event, not on psychological or physiological improvement overall.
Behaviours they identified as protective included not smoking and low alcohol consumption.
Smoking is recorded in *Understanding Society* as the usual number of cigarettes smoked per day, which I coded as either no cigarettes for non--smokers or one or more cigarettes per day for smokers.
Alcohol consumption is not directly asked in *Understanding Society* but the amount of money the household spent on alcohol in the preceding four weeks is.
By dividing this figure by the average unit cost of alcohol [@ias2014a] I estimated the household alcohol consumption in units.
Dividing this figure by four gave the weekly household alcohol consumption.
I further divided this by the number of individuals living in the household aged 16 and over to arrive at an estimated consumption of alcohol per person in units.
Consumption of more than 14 units per week is considered risky [@cmo2016a, p. 4] so I have coded respondents as low or high risk based on this threshold.
This should be treated as highly indicative only as it is based on a number of assumptions, not least that all individuals within the household drink the same amount of alcohol.
Parental attitudes and behaviours towards alcohol consumption demonstrably influence child alcohol consumption [@nash2005a; @yu2003a] but clearly there will be variation within the household to a greater or lesser degree.
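
The steps used to estimate weekly alcohol consumption per person can be sketched as follows.
This is an illustration under stated assumptions: the household data are invented, `n_adults_16plus` is a hypothetical column name, and `unit_cost` is a placeholder value, not the average unit cost reported by @ias2014a.

```r
# Toy household data; all values are invented for illustration.
households <- data.frame(
  xpaltob_g3      = c(0, 80, 240),  # spend on alcohol, last four weeks (GBP)
  n_adults_16plus = c(1, 2, 2)      # household members aged 16 and over
)

unit_cost       <- 1.00  # PLACEHOLDER cost per unit (GBP), not the cited figure
risky_threshold <- 14    # units per week considered risky

# Household units over four weeks, then per week, then per adult.
units_4wk     <- households$xpaltob_g3 / unit_cost
units_week_hh <- units_4wk / 4
households$units_week_pp <- units_week_hh / households$n_adults_16plus

# Every adult in a household receives the same estimate, so this flag
# carries the assumption that consumption is shared equally.
households$high_risk <- households$units_week_pp > risky_threshold
```

With the placeholder unit cost, only the third household exceeds the threshold; a different unit cost would rescale every estimate proportionally.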
There are no analogies for diet and exercise in *Understanding Society* so I have had to exclude these.
### Sickness benefit arrangements
@wel2015a compare sickness benefit arrangements across Europe and their effect on health inequalities.
Sickness benefit is an important safety net, potentially applicable to any and all employed individuals.
*Understanding Society* asks if the respondent is usually employed but on sick leave in the last week, but does not include details of any amounts paid because of sick leave.
Further, sickness benefit will only apply to respondents who are employed, who account for only about $`r format(table(us$econ_act)["eca_emp"] / nrow(us) * 100, digits = 0)`\%$ of the sample.
Employment status is a constraint, so I was not able to combine this with sickness benefit provision to create a measure for the whole population.
For these reasons I was not able to include sickness benefit in my analysis of resilience.
### Accessing health care
@mastrocola2015a identified barriers women involved in street--based prostitution face in accessing health care, especially primary care, and suggest that improved access would be a protective factor for these women.
Respondents in *Understanding Society* are asked if they experienced any difficulties accessing local services, but this is grouped together as one question which includes healthcare, food shops, and learning facilities.
I was therefore not able to include this measure as there was no way to differentiate between access to health care services and all other services.
### Personal and area demographics
@glonti2015a is a systematic review of health resilience during economic crises across ten countries.
Extracting just the UK--based papers, the sources of resilience were, variously, gender, age, education level, employment, financial constraints, and low area--level deprivation.
I could not include gender, age, education level, and employment because they are already included in the simulation as constraints.
*Understanding Society* asks about subjective financial situation which I used as an indicator for financial constraints, as I coded in Section \@ref(ressim-employment-status).
Area--based measures of deprivation, such as IMD score, are not recorded in *Understanding Society* but I attached these to the aggregated simulation.
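
Attaching an area--based deprivation score to aggregated simulation results amounts to a join on an area code.
The sketch below assumes the simulated output areas carry a lookup code for the larger area at which the deprivation score is published; the column names, codes, and values are all hypothetical, and this is not necessarily how the join was performed for this study.

```r
# Toy aggregated simulation results, one row per output area.
sim_oa <- data.frame(
  oa_code           = c("OA1", "OA2", "OA3"),
  lsoa_code         = c("L1", "L1", "L2"),   # assumed lookup to larger area
  good_finances_pct = c(60, 55, 72),
  stringsAsFactors  = FALSE
)

# Toy deprivation scores, one row per larger area.
imd <- data.frame(
  lsoa_code = c("L1", "L2"),
  imd_score = c(41.2, 9.8),
  stringsAsFactors = FALSE
)

# Left join: keep every simulated area, adding the score where available.
sim_with_imd <- merge(sim_oa, imd, by = "lsoa_code", all.x = TRUE)
sim_with_imd
```

Using `all.x = TRUE` keeps areas without a matching score, which appear with a missing value rather than being silently dropped.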
### Neighbourhood congruity
@albor2014a tested to see if sharing a similar socio--economic status to other residents in the neighbourhood---neighbourhood congruity---can be a source of health resilience.
Individual socio--economic status was derived from household occupational class and educational achievement, and neighbourhood socio--economic status was based on census occupational status and educational status.
*Understanding Society* asks if respondents agree or disagree that they are similar to others in their neighbourhood, which is what I based neighbourhood congruity on.
I was not able to include occupational status or educational status as they are both constraints.
### Adverse Childhood Experiences (ACEs)
@bellis2014a explored the association between adverse childhood experiences (ACEs) and health--harming behaviours, specifically if an absence of ACEs can lead to resilience.
Respondents were asked about ACEs using the Centers for Disease Control and Prevention short ACE tool which covered: physical, verbal, and sexual abuse; parental separation; exposure to domestic violence; or growing up in a household with mental illness, alcohol abuse, drug abuse, or incarceration [@bellis2014a, p. 3].
I was not able to include ACEs as *Understanding Society* does not ask respondents about household conditions during childhood or adolescence, but I was able to code household alcohol consumption (Section \@ref(ressim-behaviour-change)).
### Familial mental health
@johnston2013a used the 1970 British Cohort Study to test if parental or grandparental mental health affected the mental health of the grandchild.
Childhood or adolescent household conditions were not asked of respondents in *Understanding Society*, so I was unable to include parental or grandparental mental health.
I was able to include an indicator for the respondent's mental health using the General Health Questionnaire (GHQ).
### Financial and budgeting skills
@fenge2012a used semi--structured interviews to explore older people's resilience to the effects of economic recession, specifically if budgeting and money management skills enabled them to maintain their well--being and quality of life.
*Understanding Society* asks respondents if they save any money, which is a binary response, and about the respondent's subjective financial situation, which I have already coded in Section \@ref(ressim-employment-status).
### Resilience scale (RS--25)
@sull2015a used the Resilience Scale (RS--25) to measure resilience among NHS workers which tests concepts of "a purposeful life, perseverance, equanimity, self--reliance and existential aloneness" [@sull2015a, p. 3].
The RS--25 is a proprietary measure of resilience marketed as the 'True Resilience Scale' which can be licensed for use from The Resilience Centre [@rs25].
I contacted The Resilience Centre by email in April 2017 asking to see the items on the RS--25, explaining the nature of this research and that I did not intend to use the resilience scale in a clinical or organisational setting.
After repeated emails [@wagnild-priv-comm] The Resilience Centre did not provide the items, so I could not include them.
The RS--25 instrument might be valid but is of limited use for policy or research if it cannot be reviewed by other researchers.
Table \@ref(tab:res-charac-operationalise-table) summarises the concepts and variables I used to operationalise these.
```{r res-charac-operationalise-table}
knitr::kable(res_charac_operationalise, row.names = FALSE,
caption = "Operationalisation of resilience sources")
```
## Constraints {#ressim-constraints}
In selecting constraints I began with those I used in the pilot simulation (see Section \@ref(methods-constraints)).
These constraints simulated limiting long--term illness or disability well because they correlated well with this variable, and my aim here was to simulate similar health--related variables.
The constraints I used were sex, highest qualification, ethnicity, housing tenure, car ownership, and age.
Now that I had a working model I also wanted to test additional constraints.
As in the pilot simulation (Chapter \@ref(methods)), I was limited to variables available in both the census and the survey data, which in practice usually meant the census was the limiting factor.
Nevertheless the census contained additional variables that I tested for inclusion in the simulation.
These were: economic activity; overcrowding (greater than 1.0 person per room, as described by @townsend1988a); marital status; and social class.
### Economic activity {#ressim-economic-activity}
The first additional variable I tried was economic activity, as this is a powerful predictor of many health outcomes [@wilkinson2003a; @bartley2006a].
Most levels matched across both the survey and the census data, but a few required recoding or re--aggregating.
Economic activity data in the census covered only individuals aged 16--74 whereas *Understanding Society* covered all individuals aged 16 and above.
To solve this issue in setting up the census I added all individuals aged 75 and above from the census to the 'retired' category.
This was the most pragmatic choice as, even though some individuals aged 75 and above may still be working, especially in part--time or informal capacities, the majority will have left the primary employment or career which influenced their social class.
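As a minimal sketch of this adjustment, assuming a hypothetical census table with one row per zone (the column names and counts below are illustrative and are not taken from the census or the thesis source code):

```r
# Hypothetical census counts for two zones; names and values are illustrative
census_eca <- data.frame(
  code        = c("zone_a", "zone_b"),
  eca_retired = c(40, 55),  # retired residents aged 16-74
  age_75_plus = c(12, 20)   # all residents aged 75 and above
)

# Treat everyone aged 75 and above as retired so the economic activity
# constraint covers the same age range (16 and above) as the survey
census_eca$eca_retired <- census_eca$eca_retired + census_eca$age_75_plus
```

After this step the economic activity totals for each zone should sum to the population aged 16 and above, which is the property the other constraints need to share.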
An option for maternity leave was present in the survey data but not in the census data so I needed to choose the most suitable group to combine this with.
Similarly apprenticeships, government training schemes, and 'unpaid worker in family business' were options in the survey data but not in the census.
I ultimately decided that because apprenticeships and government training schemes were conceptually similar I would combine these into 'other' in both the census and survey levels.
Combining government training scheme and apprenticeship with unpaid worker in a family business was not ideal as they are conceptually different forms of economic activity.
However, only a small number of respondents in *Understanding Society* were unpaid workers in a family business ($n = 48$), so the effect was negligible and the 'other' group could be thought of as mostly comprising individuals on training schemes designed to enhance their skills and improve their careers.
Because of this, it did not seem appropriate to include people on maternity leave in the 'other' group, as women on maternity leave can choose to return to their previous role and economic activity.
I considered grouping maternity leave and long--term sick and disabled together in the survey, as both groups have 'paused' their previous economic activity.
However, maternity leave comes with an expectation that the individual returns to their previous economic activity within a defined period, usually twelve months.
Individuals who are long--term sick or disabled and receiving a personal independence payment (PIP) must have a condition expected to last at least nine months, but in practice there is no maximum length of time people can claim for before returning to their previous economic activity as they are 'regularly reassessed' [@govuk2017a].
I ultimately decided to group individuals on maternity leave with individuals looking after family or home.
This has the same issue that those on maternity leave are likely to return to their 'previous' economic activity while those looking after the family or home or those who are long--term sick or disabled are more likely to remain so.
It has the advantage, though, of the two being conceptually similar involving care for family members.
In addition, *by definition*, people with a long--term illness or disability will necessarily have a health issue, while both those on maternity leave and those looking after family or home may or may not have a health issue: a health issue is not *a priori* known for these individuals.
Finally, it preserves the distinction between individuals who are fundamentally performing a caring role and those who are receiving formal training.
In the census students are split between those who are economically active and those who are economically inactive, which is usually students who are studying full--time.
In *Understanding Society* students are not distinguished in this way, so it was necessary to group economically active and economically inactive students in the census.
Even though economically active students may not be full--time students, or may participate in the labour market in other ways, their primary economic activity is arguably studying to improve their skills so the two are conceptually similar.
The census splits self--employed groups by part--time and full--time, and those with employees and those without employees.
These had to be aggregated to match the survey, which had a single category for self--employed.
Similarly full--time and part--time employed individuals in the census were aggregated---to simply 'employed'---to match the survey.
*Understanding Society* does not explicitly state the 'unemployed' group is the same as 'economically active unemployed' from the census.
To be 'economically active unemployed' requires the individual to be "actively looking for work" or "waiting to start a new job" [@nomis2013e], while *Understanding Society* instead asks respondents to choose the economic activity that 'best' describes their current circumstances.
Again, I do not believe this will affect the simulation significantly as they fundamentally measure the same concept: an individual looking to return to some other form of economic activity, be that employment, self--employment, or studying.
The final levels for economic activity in the census and the survey I used are: employed; looking after home or family; long--term sick or disabled; retired; self--employed; student; unemployed; and other.
These are coded in `data-raw/0-prep-understanding-society.R` in the thesis source code.
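The actual recoding lives in the script named above and is not reproduced here. As an illustrative sketch only, the grouping decisions described in this section could be expressed as a lookup vector. The raw labels and most of the recoded level names below are assumptions made for the example; only `eca_emp` appears elsewhere in this chapter.

```r
# Hypothetical raw survey labels mapped to the eight final levels
eca_lookup <- c(
  "employed"                = "eca_emp",
  "self-employed"           = "eca_self_emp",
  "unemployed"              = "eca_unemp",
  "retired"                 = "eca_retired",
  "student"                 = "eca_student",
  "long-term sick/disabled" = "eca_sick",
  "family care or home"     = "eca_home",
  "maternity leave"         = "eca_home",   # grouped with looking after family or home
  "apprenticeship"          = "eca_other",
  "government training"     = "eca_other",
  "unpaid, family business" = "eca_other"
)

# Recode a few example responses
raw <- c("maternity leave", "apprenticeship", "employed",
         "unpaid, family business")
recoded <- factor(unname(eca_lookup[raw]))
```

Keeping the mapping in one named vector makes each grouping decision explicit and easy to audit against the census levels.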
### Overcrowding
The concept of 'overcrowding' is based on the definition used by @townsend1988a [pp. 36--37] in their construction of a deprivation index.
A private household is considered overcrowded if there is more than one person per room in the household.
The definition of room excludes bathrooms, toilets, halls or landings, rooms that can only be used for storage, or any rooms shared between different households.
All other rooms, including kitchens and utility rooms, are included.
If two rooms have been converted into one room they are counted as one room [@nomis2014a].
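A minimal sketch of this definition, assuming a hypothetical household table with `persons` and `rooms` columns (these names are illustrative and are not the variable names used in the census or in *Understanding Society*):

```r
# Hypothetical household data; column names and values are illustrative only
households <- data.frame(
  persons = c(2, 5, 3, 4),
  rooms   = c(4, 4, 3, 3)  # qualifying rooms only (no bathrooms, halls, storage)
)

# A household is overcrowded if there is more than 1.0 person per room
households$persons_per_room <- households$persons / households$rooms
households$ocrowd <- households$persons_per_room > 1
```

Note that a household with exactly one person per room is not classed as overcrowded under this definition.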
Unfortunately it proved impossible to use overcrowding as a constraint variable.
The data is available in the census for households or individuals, but crucially only for the whole population: it is not possible to obtain persons per room with an associated age breakdown.
This makes it impossible to subset the data and remove individuals aged less than 16 from the census tables so there are approximately 50,000 'extra' individuals.
```{r kids-ocrowd-model, cache=TRUE}
kids_ocrowd <- glm(ocrowd ~ kids, data = us, family = binomial())
kids_ocrowd <- check_logit(kids_ocrowd)
```
Arguably I could reweight the overcrowding population using the respective proportions to that of the known population that is 16 and above, as I did for car ownership (Section \@ref(matching-census-populations)).
The discrepancy for car ownership was approximately $5,000$ individuals, or approximately $2.1\%$, so the reweighting had a much smaller effect on the data than reweighting $50,000$ individuals would.
This is additionally problematic because children are not randomly distributed among households that are overcrowded and those that are not.
A hypothesis test using logistic regression with data from *Understanding Society* indicates that the number of children in the household and overcrowding are correlated (Nagelkerke pseudo--$R^2 = `r kids_ocrowd$over$nagelkerke`$, model $\chi^2$ $p$--value $\approx$ $`r kids_ocrowd$over$chisq_prob`$).
This would not be the case if families with more children had access to larger houses, but clearly something---perhaps income or availability of suitable housing stock---is preventing many families with children from moving into suitably--sized accommodation.
For these reasons I decided recalculating the populations was not appropriate and chose not to include overcrowding, or persons per room, as a constraint.
This is unlikely to pose an issue for the simulation, however, because other constraints capture different dimensions of reduced material or economic circumstances or deprivation, which overcrowding is associated with.
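For illustration, the proportional reweighting considered (and rejected) above amounts to the following, sketched here with hypothetical counts for a single zone:

```r
# Hypothetical counts for one zone; values are illustrative only
ocrowd_all_ages <- c(ocrowd_yes = 30, ocrowd_no = 270)  # whole population
pop_16_plus <- 240                                      # known population aged 16+

# Scale each category so the total matches the 16+ population while keeping
# the all-age proportions. This implicitly assumes children are spread evenly
# across the categories, which the regression above suggests is not the case.
ocrowd_16_plus <- ocrowd_all_ages / sum(ocrowd_all_ages) * pop_16_plus
```

The rescaled counts sum to the known population, but any uneven distribution of children between overcrowded and other households is carried straight into the constraint, which is why the size of the adjustment matters.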
### Marital status {#ressim-marital-status}
Evidence suggests marital status is associated with health outcomes [@hosseinpour2012a; @robards2012a], although not conclusively [@sacker2009a], and not always equally across social class [@choi2013a].
For the most part, levels recorded in *Understanding Society* closely matched those in the census.
There were levels for married, in a civil partnership, single, separated, divorced, or widowed, and these required no additional matching.
For respondents in *Understanding Society* there were additional levels for separated from a civil partnership, divorced from a civil partnership, or a surviving partner in a civil partnership.
I simply combined these with separated, divorced, or widowed, respectively.
There was a relatively small number of respondents in a civil partnership, so this did not affect the simulation.
### Social class
Socio--economic position or social class is another powerful determinant of health.
Social class is usually measured using the National Statistics Socio--economic Classification (NS--SEC) [@ons2015a].
```{r nssec-cases, cache=TRUE}
nssec_cases <- us %>%
select(pidp,
age, sex, eth, marital,
qual, econ_act,
car, ten) %>%
na.omit()
m_nssec <- glm(llid ~ class8, data = us, family = binomial())
m_nssec <- check_logit(m_nssec)
```
There were a large number of missing cases for social class in *Understanding Society* (missing $n = `r format(nrow(us[is.na(us$class8), ]), big.mark = ",", trim = FALSE)`$).
To help in deciding whether to remove or include social class I ran a logistic regression test to see if NS--SEC is useful in predicting limiting long--term illness or disability, as a proxy for a health outcome.
The model was statistically significant ($p \approx `r m_nssec$over$chisq_prob`$) but the predictive power was negligible (Nagelkerke pseudo--$R^2 \approx `r m_nssec$over$nagelkerke`$), the difference in deviances was small ($`r m_nssec$over$diff_deviance`$), and none of the levels of the variable were statistically significant.
The poor predictive power of social class and the fact that there were so many missing data points led me to exclude this variable from the simulation.
I did not consider this a significant problem as I was able to include education in the model which is arguably a more robust measure.
Because highest level of education is generally 'fixed' there is no problem of 'reverse causality', making it clearer if poor health in old age affects socio--economic position, or if socio--economic position negatively affects health.
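The `check_logit()` helper used in the chunk above is not shown in this chapter. As a rough sketch of how the summary statistics it reports could be computed, the example below uses simulated data and a hypothetical `nagelkerke_r2()` function. It assumes an ungrouped binary outcome, so that the deviance equals $-2$ times the log--likelihood.

```r
# Hypothetical helper; this is not the thesis's check_logit()
nagelkerke_r2 <- function(model) {
  n <- nobs(model)
  # Cox and Snell R^2 from the change in deviance
  cox_snell <- 1 - exp((model$deviance - model$null.deviance) / n)
  # Maximum attainable Cox and Snell R^2 for this null model
  max_r2 <- 1 - exp(-model$null.deviance / n)
  cox_snell / max_r2
}

# Simulated data, for illustration only
set.seed(1)
toy <- data.frame(x = rnorm(200))
toy$y <- rbinom(200, 1, plogis(toy$x))
m <- glm(y ~ x, data = toy, family = binomial())

r2 <- nagelkerke_r2(m)
# Difference in deviances between the null and fitted models
diff_deviance <- m$null.deviance - m$deviance
# Model chi-squared p-value
chisq_prob <- pchisq(diff_deviance, df = m$df.null - m$df.residual,
                     lower.tail = FALSE)
```

A model can have a very small $p$--value and still have a pseudo--$R^2$ close to zero when the sample is large, which is the pattern described for social class.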
### Final constraint choice
After excluding social class and overcrowding, the final list of constraints I tested was: age; sex; ethnicity; marital status; highest qualification; economic activity; car ownership; and housing tenure.
## Empirically test constraints {#ressim-test}
In this section I tested the constraints to see if they correlated with clinical depression.
Respondents in *Understanding Society* are asked if they have a broad range of health conditions, including clinical depression, and responses are coded as 'yes' or 'no'.
Of the $`r format(nrow(us), big.mark = ",", trim = FALSE)`$ respondents in *Understanding Society*, $`r format(nrow(us[us$depress == "depress_yes" & !(is.na(us$depress)), ]), big.mark = ",", trim = FALSE)`$ reported having clinical depression.
As with the pilot microsimulation the dependent variable is binary, so logistic regression is the most appropriate technique to establish correlation between the constraints and depression.
I set up an initial model using age, sex, ethnicity, marital status, highest qualification, car ownership, housing tenure, economic activity, and limiting long--term illness or disability as independent variables.
Clinical depression, with 'no clinical depression' coded as the base category, was the dependent variable.
The overall results of this model are displayed in table \@ref(tab:model-depress-results-over).
```{r model-depress-setup, cache=TRUE}
dep_df <- us %>%
select(depress,
age, sex, eth, marital, qual, car, ten, econ_act, llid) %>%
rename(
mar = marital,
eca = econ_act
) %>%
na.omit()
```
```{r dep-model-relevel}
# Relevels factors so output of models is more sensible
dep_df$depress <- relevel(dep_df$depress, ref = "depress_no")
dep_df$age <- relevel(dep_df$age, ref = "age_90_plus")
dep_df$sex <- relevel(dep_df$sex, ref = "sex_female")
dep_df$eth <- relevel(dep_df$eth, ref = "eth_british")
dep_df$mar <- relevel(dep_df$mar, ref = "mar_single")
dep_df$qual <- relevel(dep_df$qual, ref = "qual_0")
dep_df$car <- relevel(dep_df$car, ref = "car_0")
dep_df$ten <- relevel(dep_df$ten, ref = "ten_rented")
dep_df$eca <- relevel(dep_df$eca, ref = "eca_emp")
dep_df$llid <- relevel(dep_df$llid, ref = "llid_no")
```
```{r model-depress, cache=TRUE}
m_dep <- glm(depress ~ age + sex + eth + mar + qual + car + ten + eca + llid,
data = dep_df, family = binomial())
m_dep_aic <- AIC(m_dep)
m_dep_base_aic <- AIC(glm(depress ~ 1, data = dep_df, family = binomial()))
m_dep <- check_logit(m_dep)
```
```{r model-depress-results-over}
knitr::kable(m_dep$over, row.names = FALSE,
caption = "Overall results of depression model")
```
```{r model-depress-test, echo=FALSE, include=FALSE}
assertthat::assert_that(m_dep_aic < m_dep_base_aic)
```
The AIC of the model (`r m_dep_aic`) is less than the AIC of the baseline (`r m_dep_base_aic`) so the model overall predicts depression (difference in deviances = `r m_dep$over$diff_deviance`, Nagelkerke pseudo--$R^2 = `r m_dep$over$nagelkerke`$, $p \approx `r m_dep$over$chisq_prob`$).
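The comparison against an intercept--only baseline can be sketched on simulated data as follows; the data frame and probabilities are invented for the example and do not reflect *Understanding Society*.

```r
# Simulated data, for illustration only
set.seed(42)
toy <- data.frame(
  sex = factor(sample(c("sex_female", "sex_male"), 500, replace = TRUE))
)
toy$depress <- rbinom(500, 1, ifelse(toy$sex == "sex_female", 0.3, 0.1))

# Fitted model and intercept-only baseline
m_full <- glm(depress ~ sex, data = toy, family = binomial())
m_base <- glm(depress ~ 1, data = toy, family = binomial())

# A lower AIC indicates the predictors improve on the baseline
# after penalising the extra parameters
aic_full <- AIC(m_full)
aic_base <- AIC(m_base)
```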
The breakdown of individual results is provided in table \@ref(tab:model-depress-results-ind).
```{r model-depress-results-ind}
m_dep$ind$predictor <-
stringr::str_replace(m_dep$ind$predictor,
"^age|^sex|^eth|^mar|^qual|^car|^ten|^eca|^llid",
"")
knitr::kable(m_dep$ind, row.names = FALSE,
caption = "Individual results of depression model")
```
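The individual results table is built by `check_logit()`, which is not shown here. As a hedged sketch of how odds ratios relative to a reference level can be obtained from a fitted model, the example below uses simulated data and Wald confidence intervals; the helper may use a different interval method.

```r
# Simulated data, for illustration only
set.seed(7)
toy <- data.frame(
  sex = factor(sample(c("sex_female", "sex_male"), 1000, replace = TRUE))
)
toy$sex <- relevel(toy$sex, ref = "sex_female")  # reference category
toy$depress <- rbinom(1000, 1, ifelse(toy$sex == "sex_male", 0.08, 0.16))

m <- glm(depress ~ sex, data = toy, family = binomial())

# Exponentiate the coefficients to obtain odds ratios
or <- exp(coef(m))
ci <- exp(confint.default(m))  # Wald 95% confidence intervals

or_table <- data.frame(
  predictor  = names(or),
  odds_ratio = unname(or),
  lower      = ci[, 1],
  upper      = ci[, 2],
  row.names  = NULL
)
```

An odds ratio below one for a level means lower odds of the outcome than the reference level, and an interval that spans one corresponds to a level that is not statistically significantly different from the reference.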
The odds ratios suggest all age groups except age 85--89 are statistically significantly more likely to have clinical depression than respondents aged 90 and over.
The odds of having clinical depression increase from age 16--17 to their peak between ages 25--44, then decline again with age to their lowest at age 85 and above.
The increase in odds to age 44 might be a result of cumulative exposure to environments and events that contribute to clinical depression.
After this age the decreasing likelihood of clinical depression may be a genuine change so that older people 'recover' from or are otherwise resistant to clinical depression.
It may also be a cohort effect such that older generations are less likely to report or seek diagnoses for mental illness.
Sex is statistically significant, with males less likely than females to have a diagnosis of clinical depression.
Most levels of ethnicity were statistically significant compared to the reference group of White British; only the Irish ethnic group was not statistically significant.
White British respondents are the most likely to have clinical depression, with all other ethnic groups having lower odds.
Black African or Black Caribbean British respondents were less than half as likely to have clinical depression than White British respondents.
These results are consistent with the findings of the limiting long--term illness or disability model in Section \@ref(methods-test-cons).
Respondents who are married were less likely to have clinical depression compared to those who were single and never married.
Respondents who were divorced or separated were more likely to have clinical depression than those who were single and never married.
Respondents in a civil partnership and those who were widowed were not statistically significantly different from the reference group (single), suggesting similar levels of clinical depression.
The confidence intervals for the odds for civil partnership are wide, perhaps because of the small number of respondents in a civil partnership ($n = `r nrow(us[us$marital == "mar_civil_part" & !is.na(us$marital), ])`$).
Interestingly, respondents with any level of qualification were *more* likely to have clinical depression than those with no qualifications.
This could be because individuals with qualifications may be more likely to know of services available or more willing to obtain an appropriate diagnosis in order to obtain support.
Individuals from households with at least one car were less likely to have clinical depression than the reference group (no car), with decreasing odds ratios for individuals from households with more cars.
Home owners, either those who owned their home outright or with a mortgage, were less likely to have depression than individuals who rent their homes (the reference group).
These results suggest that increased financial means are associated with lower risks of clinical depression.
This is supported by the fact that employed respondents are least likely to have clinical depression compared to other statistically significant levels of economic activity.
Respondents looking after the home or family, who are long--term sick, retired, or unemployed are all more likely to have clinical depression than employed respondents.
Respondents who are self--employed or who are students have similar levels of clinical depression to employed respondents.
```{r llid-depress, cache=TRUE}
m_dep_llid <- glm(depress ~ llid, data = dep_df, family = binomial())
m_dep_llid <- check_logit(m_dep_llid)
```
Limiting long--term illness or disability is also correlated with clinical depression.
The correlation is not high (pseudo--$R^2 = `r m_dep_llid$over$nagelkerke`$), but it does suggest that: either some people have depression severe enough for them to consider it 'limiting'; or that some people have a different limiting condition with clinical depression as a co--morbidity; or both.
Overall these variables correlated meaningfully with clinical depression, so I was able to use them as constraints for the spatial microsimulation model.
### Constraint order {#sms-constraint-order}
As seen in Section \@ref(methods-test-cons) the order the constraints were entered into the model made negligible differences to the outcome.
I used the absolute $\beta$ values to guide the order I entered the constraints into the model, although a number of random orders converged on the same result.
The final order of entry I used was: car ownership, housing tenure, highest qualification, marital status, economic activity, sex, ethnicity, and age.
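The text does not state exactly how the $\beta$ values were turned into an order. One possible reading, sketched here on simulated data, is to rank each constraint by the largest absolute coefficient among its levels. Both the ranking rule and the direction of the sort are assumptions made for illustration.

```r
# Simulated data with two constraints, for illustration only
set.seed(3)
toy <- data.frame(
  car = factor(sample(c("car_0", "car_1"), 400, replace = TRUE)),
  ten = factor(sample(c("ten_owned", "ten_rented"), 400, replace = TRUE))
)
toy$y <- rbinom(400, 1, plogis(-1 + 1.5 * (toy$car == "car_0") +
                                 0.3 * (toy$ten == "ten_rented")))
m <- glm(y ~ car + ten, data = toy, family = binomial())

# Map each coefficient (excluding the intercept) to the term it belongs to
beta <- coef(m)[-1]
term_id <- attr(model.matrix(m), "assign")[-1]
term_names <- attr(terms(m), "term.labels")[term_id]

# Largest absolute coefficient per constraint, sorted from largest to smallest
max_abs_beta <- sapply(split(abs(beta), term_names), max)
constraint_rank <- sort(max_abs_beta, decreasing = TRUE)
```

Because random orders converged on the same result, any such ranking is a convenience for reproducibility and not a requirement of the algorithm.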
## Weight {#ressim-weight}
```{r depress-don-map, out.width="100%", fig.cap="Simulated clinical depression prevalence in Doncaster", cache=TRUE}
depress_don <- tmap::tm_shape(don_oa) +
tmap::tm_polygons("depress_yes", textNA = "Prison OA",
title = "Depression prevalence") +
tmap::tm_layout(frame = FALSE)
depress_don
```
```{r depress-don-map-export}
# export as A3 for presentations/printing
if (!file.exists("figures/cache/depress_don.pdf")) {
tmap::save_tmap(depress_don, filename = "figures/cache/depress_don.pdf",
width = 420, height = 297, units = "mm", asp = 1)
}
```
Weighting was performed with the `rakeR` package.
I ordered the constraints as specified in Section \@ref(sms-constraint-order) in both the census and survey and then checked for compatibility using `rakeR::check_constraint()`.
I produced the fractional weights using the iterative proportional fitting algorithm (Section \@ref(methods-weighting)), as was the case for the pilot simulation.
For this I used the `rakeR::weight()` function.
I then 'extracted' the weights to produce aggregate results for each variable in each zone with `rakeR::extract()`.
I integerised the weights to use as case studies in Section \@ref(policy-case-studies), but I used the extracted weights in most of my analysis because I did not need cases to use in a subsequent agent--based or dynamic model.
As demonstrated in Section \@ref(methods-weight-int-comp) the fractional weights are also slightly more accurate than the integerised weights.
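To make the weighting step concrete without relying on the `rakeR` interface, the following base R sketch applies iterative proportional fitting to a single zone. The survey of five individuals, the census totals, and the `ipf_zone()` function are all hypothetical, and the sketch omits the checks and multi--zone handling that `rakeR` provides.

```r
# Hypothetical survey of five individuals with two constraint variables
ind <- data.frame(
  sex = c("sex_female", "sex_male", "sex_male", "sex_female", "sex_male"),
  car = c("car_0", "car_0", "car_1", "car_1", "car_1"),
  stringsAsFactors = FALSE
)

# Hypothetical census totals for a single zone (each sums to 100 people)
cons <- list(
  sex = c(sex_female = 55, sex_male = 45),
  car = c(car_0 = 30, car_1 = 70)
)

# Hypothetical helper: iterative proportional fitting for one zone
ipf_zone <- function(ind, cons, iterations = 50) {
  w <- rep(1, nrow(ind))  # start with equal weights
  for (i in seq_len(iterations)) {
    for (v in names(cons)) {
      # Current weighted total for each level of constraint v
      current <- sapply(split(w, ind[[v]]), sum)
      # Ratio of the census total to the current weighted total
      ratio <- cons[[v]][names(current)] / current
      # Rescale each individual's weight by the ratio for their level
      w <- w * unname(ratio[ind[[v]]])
    }
  }
  w
}

weights <- ipf_zone(ind, cons)
```

In this toy example the fractional weights converge to 20, 10, 17.5, 35, and 17.5, so the weighted totals reproduce both sets of census margins. Summing the weights by the level of any survey variable then gives the simulated count for the zone, which is the role the extraction step plays.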
Figure \@ref(fig:depress-don-map) shows the initial results of simulated clinical depression by output area in Doncaster.
Output areas with significant prison populations have been removed as discussed in Section \@ref(communal-establishment-residents), and are displayed in grey.
## Validate {#ressim-validate}
As with the pilot simulation, it is possible to statistically compare the simulated constraints with the actual, known constraints to internally validate the accuracy of the model.
This will involve an assessment of: correlation; a two--sided, equal variance *t*--test; total absolute error and standardised absolute error of the model overall; and standardised absolute error for each zone.
### Correlation
The simulated population ($`r format(don_pop_sim, big.mark = ",", trim = FALSE)`$) matched the actual population ($`r format(don_pop_act, big.mark = ",", trim = FALSE)`$) exactly, indicating the simulation constrained accurately overall.
This was further confirmed by the correlation statistic, which is a standardised statistic so a value of $1.0$ is ideal.
The correlation statistic was $`r cor(rowSums(res_con[, grep("sex_", colnames(res_con))]), res_weights_ext[["total"]])`$, indicating the population simulated in each area accurately matched the respective known population.
```{r depress-pop-plot-prep}
res_con <- arrange(res_con, code)
res_weights_ext <- arrange(res_weights_ext, code)
dep_sim_act <- data.frame(
code = res_con$code,
act = rowSums(res_con[, grep("sex_", colnames(res_con))]),
sim = res_weights_ext$total,
stringsAsFactors = FALSE
)
```
```{r depress-pop-plot, fig.width=7, fig.height=7, fig.cap="Actual population against simulated population by output area", cache=TRUE}
ggplot(data = dep_sim_act) +
geom_point(aes(act, sim)) +
geom_smooth(aes(act, act), method = "lm") +
coord_equal()
```
Figure \@ref(fig:depress-pop-plot) compares the simulated population against the actual, known population for each output area.
The simulated populations were a perfect match with their known counterparts, indicating that each individual area simulated accurately.
In addition to the overall plot for each area shown in figure \@ref(fig:depress-pop-plot), I created a plot for each level of each variable for inspection.
These all demonstrated the same high level of fit as the overall area plot, further indicating the model simulation was accurate.
These figures are not displayed here to avoid repetition, as they all show essentially the same relationship, but can be found in the `figures/cache/` directory of the thesis source code if required.
```{r depress-var-level-correlation, include=FALSE, message=FALSE, warning=FALSE}
if (!file.exists("figures/cache/depression_validation_sex_male.pdf")) {
variables <-
colnames(res_weights_ext[, !names(res_weights_ext) %in%
c("code", "total", "depress_no", "depress_yes")])
variables <-
variables[str_detect(
variables, "^car_|^ten_|^qual_|^mar_|^eca_|^sex_|^eth_|^age_")]
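lapply(as.list(variables), function(x) {
  # Actual (census) values on the x axis, simulated values on the y axis,
  # so the data match the axis labels
  p <- ggplot() +
    geom_point(aes(res_con[[x]], res_weights_ext[[x]])) +
    geom_smooth(aes(res_con[[x]], res_con[[x]]), method = "lm") +
    xlab(paste(x, "(actual)")) +
    ylab(paste(x, "(simulated)")) +
    coord_equal()
  # Pass the plot explicitly rather than relying on the last plot created
  ggsave(filename = paste0("depression_validation_", x, ".pdf"),
         plot = p, path = "figures/cache/")
})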
}
```
### *t*--test
```{r depress-ttests}
variables <- colnames(res_con[, 2:ncol(res_con)]) # drop `code`
depress_ttests <- lapply(as.list(variables), function(x) {
result <- t.test(res_con[[x]], res_weights_ext[[x]],
var.equal = TRUE, alternative = "two.sided")
result <- data.frame(
x,
result[["statistic"]],
result[["p.value"]],
stringsAsFactors = FALSE, row.names = NULL
)
colnames(result) <- c("variable", "statistic", "p_value")
result
})
depress_ttests <- dplyr::bind_rows(depress_ttests)
knitr::kable(
depress_ttests,
caption = "Result of t-tests comparing simulated against actual data")
```
Table \@ref(tab:depress-ttests) shows the results of the equal variance, two--sided *t*--test for each constraint.
This statistically compares the simulated values with the actual, known values from the census and tests the null hypothesis that their means do not differ.
In all cases the result of the *t*--test was not statistically significant, so I could not reject the null hypothesis that the simulated and actual values do not differ.
This indicates the simulation was a good fit with the census data.
### Total absolute error {#ressim-tae}
```{r depress-tae}
depress_tae <- calc_tae(res_con_pops, res_weights_ext$total)
depress_sae <- calc_sae(depress_tae, res_weights_ext$total)
```
The total absolute error and the standardised absolute error were both overall $\approx `r sum(depress_tae)`$.
Together, these indicate the model overall simulated very well as the differences between the simulated and the observed data are negligible, and certainly well within the thresholds suggested by @smith2009a [p. 1256] discussed in Section \@ref(smslit-validation).
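The `calc_tae()` and `calc_sae()` helpers are not shown in this chapter. As a sketch of the underlying measures, total absolute error is the sum of absolute differences between observed and simulated counts, and standardised absolute error divides this by the observed population. The `tae()` and `sae()` functions and the counts below are hypothetical.

```r
# Hypothetical helpers; these are not the thesis's calc_tae() and calc_sae()
tae <- function(observed, simulated) sum(abs(observed - simulated))
sae <- function(observed, simulated) tae(observed, simulated) / sum(observed)

# Illustrative counts for three zones
observed  <- c(120, 98, 143)
simulated <- c(119, 99, 143)

model_tae <- tae(observed, simulated)  # two people misallocated in total
model_sae <- sae(observed, simulated)  # as a proportion of the population
```

Because the standardised measure is a proportion of the population, it can be compared across zones and models of different sizes, whereas the total absolute error cannot.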
### External validation {#ressim-ext-val}
<!--
Liddy: Can always be confusing to start be saying what you can’t do or haven’t done – clearer to start by simply described what you did do (and why). Only then if essential mention why you did not use alternative approach but here I suspect not needed.
TODO: explain in sms lit rev chapter that it's not easy to validate the simulated data (otherwise you wouldn't need to simulate it!) and refer back to this here
-->
By aggregating the simulated values for clinical depression I was able to determine the total simulated prevalence for the Doncaster local authority area.
I then compared this aggregated value against a known value to provide reassurance that the simulation was realistic and plausible.
These values were unlikely to match precisely because of differences in the populations and because I had to exclude output areas whose population was predominantly prisoners.
The population of the simulation was individuals aged 16 and above as this is based on the sample of individuals in *Understanding Society*.
The measures from Public Health England (PHE) only include those aged 18 and over.
I also had to exclude three output areas that had a population consisting predominantly of prisoners.
The prison population was $`r format(sum(don_oa@data$prison_pop, na.rm = TRUE), big.mark = ",", trim = TRUE)`$ in 2011, and it is likely a substantial proportion of these individuals will have clinical depression.
<!-- TODO: reference for prisoners having depression -->
Data from @phe2016a provides the prevalence of depression in the Doncaster clinical commissioning group (CCG) area for patients registered with a GP aged 18 and over, for the years $2011$--$12$ to $2015$--$16$.
The clinical commissioning group area is coterminous with the local authority boundaries in Doncaster, so the two could be compared directly.
Based on the results of my simulation, the number of people in Doncaster with clinical depression was $`r format(depress_sim, big.mark = ",")`$, or approximately $`r ((depress_sim / don_pop_act) * 100)`\%$ of the overall population aged 16 and above.
The 'known' prevalence of clinical depression was $12.8\%$ in $2011$--$12$ for Doncaster CCG.
I used the 2011--12 prevalence because the simulation was constrained by census data from this year.
The population aged 18 and above in Doncaster was $`r format(don_pop_18p, big.mark = ",", trim = FALSE)`$ in 2011, so this prevalence equates to approximately $`r format(depress_act, big.mark = ",", trim = FALSE)`$ individuals with clinical depression.
At face value this indicated the model simulated only about half the cases of clinical depression.
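The conversion between a prevalence and a count of individuals is simple arithmetic. The sketch below uses the $12.8\%$ figure from the text with a deliberately hypothetical population of 100,000, not the Doncaster population, and a hypothetical simulated count chosen only to show the reverse calculation.

```r
# Prevalence from the text; the population is a hypothetical round number
prevalence_pct <- 12.8
pop_18_plus <- 100000

# Expected number of individuals with clinical depression
cases_expected <- prevalence_pct / 100 * pop_18_plus

# Reverse calculation: a hypothetical simulated count as a percentage
cases_simulated <- 6000
prevalence_simulated_pct <- cases_simulated / pop_18_plus * 100
```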
A more careful examination of the PHE data suggested the 2011--12 data point was problematic and the simulation was more accurate than initial inspection suggested.
I believe the 'known' prevalence provided by @phe2016a for 2011--12 is inconsistent with the data from the surrounding time points, suggesting this data point could be spurious.
```{r phe-dep-trend, fig.align="center", fig.cap="Prevalence of clinical depression in Doncaster (blue) and the Yorkshire and The Humber region (black), source: Public Health England (2016)"}
knitr::include_graphics("figures/phe-depression-trend.png", dpi = 300)
```
Figure \@ref(fig:phe-dep-trend) depicts the trend in clinical depression prevalence in Doncaster and the Yorkshire and The Humber region between $2009$--$10$ and $2015$--$16$ [@phe2016a].
This trend data indicates that the prevalence of clinical depression in Doncaster in $2012$--$13$ was only $6.1\%$, less than half the $2011$--$12$ figure.
This figure is more congruous with subsequent years, for which the prevalence of clinical depression increased to $8.2\%$ by $2015$--$16$.
The $2011$--$12$ prevalence figure therefore seems at odds with later data points.
Data before $2011$--$12$ for Doncaster is not provided, but data for the Yorkshire and The Humber region suggest the prevalence of clinical depression prior to $2011$--$12$ was less than $5.0\%$.
This is congruous with $2012$--$13$ and later data, further suggesting the $2011$--$12$ figure is anomalous.
One possible explanation for this discrepancy is that the Quality and Outcomes Framework (QOF), "...the annual reward and incentive programme detailing GP practice achievement results" [@qof], changed between $2010$--$11$ and $2011$--$12$.
Indicators for clinical depression---DEP2/DEP4 and DEP3/DEP5---were changed to be worth fewer 'points', potentially affecting the measurement and reporting of this diagnosis [@qof-indicators, p. 3].
For this reason I believe it is likely that the prevalence of clinical depression is closer to $5$--$6\%$ than the chart initially suggests.
This would be the approximate prevalence if the 2011--12 data point were removed and the trend used instead.
This places my simulated results in line with the surrounding data, suggesting they are plausible and certainly more likely to be valid than initial comparison to 'known' data suggested.
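As a rough check on this reasoning, the trend can be extrapolated back to 2011--12 from the two Doncaster figures quoted above ($6.1\%$ in 2012--13 and $8.2\%$ in 2015--16).
The sketch below assumes a straight-line trend between those two points, which is a simplifying assumption for illustration and not a method taken from @phe2016a; the object names are illustrative only.

```r
# Doncaster prevalence figures quoted in the text; each financial year is
# labelled by its starting calendar year (2012 = 2012--13)
known <- data.frame(
  year = c(2012, 2015),
  prevalence = c(6.1, 8.2)
)

# Assumption: prevalence changes linearly between the two known points
slope <- diff(known$prevalence) / diff(known$year)

# Extrapolate one year back to 2011--12
est_2011 <- known$prevalence[1] - slope * (known$year[1] - 2011)
round(est_2011, 1)
```

Under this assumption the extrapolated 2011--12 figure falls within the $5$--$6\%$ range suggested above.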
## Results {#ressim-results}
### Resilience {#ressim-results-resilience}
```{r oa-imd-map, fig.width=7, fig.height=7, fig.cap="Doncaster IMD 2015 rank (lower rank is more deprived)", cache=TRUE}
tmap::tm_shape(don_oa) +
tmap::tm_polygons("imd_rank", title = "IMD 2015 rank", palette = "-YlOrBr") +
tmap::tm_layout(frame = FALSE)
```
```{r prep-res-results-table}
# Identify the resilience indicator columns (names containing "res_")
res_vars <- stringr::str_detect(colnames(don_oa@data), "res_")
# Count the output areas scoring 2 on each indicator, ignoring missing values
res_results <- lapply(don_oa@data[, res_vars], function(x) {
  sum(x == 2, na.rm = TRUE)
})
res_results <- data.frame(
  "measure" = colnames(don_oa@data[, res_vars]),
  "freq" = unlist(res_results),
  stringsAsFactors = FALSE
)
```
Having simulated and validated the prevalence of clinical depression, I compared this with various indicators of area--based socio--economic deprivation.
These were: unemployment; long--term unemployment; low--grade employment (routine employment, NS--SEC 7); index of multiple deprivation (IMD) score; and output area classification supergroup 'hard--pressed living'.
Deprivation based on unemployment, long--term unemployment, and low--grade employment was calculated by summing the number of individuals in each output area matching these criteria and selecting the areas with the highest number of these individuals.
The 2015 Index of Multiple Deprivation (IMD) is provided for lower layer super output areas (LSOAs), but not output areas directly.
An official tool to look up the IMD score for individual postcodes is provided by @postcode-imd, so it is possible to use IMD scores at geographies smaller than the LSOAs provided.
For each LSOA I applied the overall LSOA score to each of its constituent output areas, then selected the lowest ranks as the most deprived areas of Doncaster.
Figure \@ref(fig:oa-imd-map) shows the IMD rank for each output area in Doncaster, with lower ranks representing higher deprivation.
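Assigning LSOA values to output areas amounts to a lookup join.
The following is a minimal sketch using made-up codes and ranks; `oa_lookup`, `lsoa_imd`, and their columns are hypothetical names and not objects from this analysis.

```r
# Hypothetical lookup from output areas to their parent LSOA
oa_lookup <- data.frame(
  oa_code = c("E00000001", "E00000002", "E00000003"),
  lsoa_code = c("E01000001", "E01000001", "E01000002"),
  stringsAsFactors = FALSE
)

# Hypothetical IMD ranks by LSOA (lower rank is more deprived)
lsoa_imd <- data.frame(
  lsoa_code = c("E01000001", "E01000002"),
  imd_rank = c(1200, 25000),
  stringsAsFactors = FALSE
)

# Every output area inherits the rank of the LSOA that contains it
oa_imd <- merge(oa_lookup, lsoa_imd, by = "lsoa_code", all.x = TRUE)
oa_imd
```

A consequence of this approach is that all output areas within the same LSOA share an identical rank, so variation in deprivation within an LSOA is not captured.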
<!--
DB: There is a need to provide a bit more detail on this - classified according to what? Perhaps a paragraph on what these classifications are and how they are derived
TODO: add deprivation OAC to literature and refer back here
-->
Areas classified as being in the 'hard--pressed living' supergroup are used to identify high deprivation areas using the output area classification system.
These areas are indicative of higher rates of social renting, lower rates of higher--level qualifications, and unemployment rates above the national average [@ons2015f, p. 19].
Figure \@ref(fig:comm-oac) shows the output area classification supergroup of Doncaster output areas.
```{r res-results-table}
res_results <- res_results %>%
  mutate(threshold = stringr::str_extract(measure, "_[:digit:].*$")) %>%
  mutate(
    threshold = stringr::str_replace(threshold, "_", ""),
    measure = stringr::str_replace(measure, "^res_", ""),
    measure = stringr::str_replace(measure, "_[:digit:].*$", "")
  ) %>%
  mutate(
    measure = stringr::str_replace(measure, "unem", "High unemployment"),
    measure = stringr::str_replace(measure,
                                   "ltun", "High long-term unemployment"),
    measure = stringr::str_replace(measure,
                                   "rout", "High low-grade employment"),
    measure = stringr::str_replace(measure, "oac", "\'Hard-pressed living\'"),
    measure = stringr::str_replace(measure, "imd", "IMD score")
  ) %>%
  select(measure, threshold, freq) %>%
  arrange(threshold) %>%
  rename(
    "Area-based deprivation measure" = measure,
    "Threshold (%)" = threshold,
    "Number of resilient areas" = freq
  )
knitr::kable(res_results, row.names = FALSE, caption = "Number of resilient areas by area-based deprivation measure")
```
I considered output areas as 'resilient' if they had both high deprivation, using the indicators described above, and low prevalence of clinical depression.
To determine what to classify as 'low' and 'high', I tested a number of thresholds, from $20\%$ to $40\%$, of respondents being both clinically depressed and in the highest deprivation classification.
Table \@ref(tab:res-results-table) summarises the results of these tests.
Selecting a threshold will always include an element of subjective choice and is arguably more an art than a science.
There are two properties that I used to help guide my decision in selecting a threshold, however.
First, resilience is, by definition, an outlying phenomenon so a threshold should mark a relatively small number of areas as resilient.
Second, I suggest it is desirable if a threshold does not treat too many cases as 'high' deprivation or 'low' health, as it is important for these to remain differentiated from 'background' cases.
After testing, thresholds of $20\%$, $25\%$, and $30\%$ resulted in very few 'resilient' areas, sometimes none at all.
Conversely, a threshold of $40\%$ arguably resulted in too many resilient areas being identified.
Using $40\%$ also felt unsatisfactory as this resulted in similar numbers of areas being classified as 'high' deprivation and 'low' clinical depression as not.
A threshold of $\frac{1}{3}$ (specifically $33\%$) resulted in approximately $1\%$ of output areas being classified as resilient.
I selected this threshold because I believe it offered the most satisfactory balance between identifying suitable resilient areas and maintaining separation of 'high' and 'low' areas.
Of course, this decision is my own and could be argued to be arbitrary.
I will nevertheless proceed on this basis because any reasonable threshold can provide useful insight, and subsequent researchers can select and test other thresholds using the code in this repository.
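To make the classification concrete, the sketch below flags an output area as resilient when it falls in both the most deprived third and the lowest third for depression prevalence.
It uses randomly generated values, and the column names (`unemployed`, `depress_p`, `res_unem`) and the use of quantile cut-offs are assumptions for illustration only; the analysis code in this repository should be consulted for the actual implementation.

```r
set.seed(42)

# Hypothetical output areas with made-up values
oa <- data.frame(
  unemployed = rpois(200, lambda = 20),       # count of unemployed residents
  depress_p  = runif(200, min = 2, max = 12)  # depression prevalence (%)
)

threshold <- 1 / 3

# 'High' deprivation: top third of areas by number of unemployed residents
high_dep <- oa$unemployed >= quantile(oa$unemployed, probs = 1 - threshold)
# 'Low' depression: bottom third of areas by prevalence
low_dep <- oa$depress_p <= quantile(oa$depress_p, probs = threshold)

# A score of 2 means both conditions hold, so the area counts as 'resilient'
oa$res_unem <- as.integer(high_dep) + as.integer(low_dep)
sum(oa$res_unem == 2)
```

Setting `threshold` to other values, such as `0.2` or `0.4`, shows how the number of areas flagged as resilient changes with the cut-off.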
```{r res-map, fig.width=7, fig.height=7, fig.cap="Resilient output areas in Doncaster", cache=TRUE}
# colorbrewer 6 class YlGn
res_colours <- c("#78c679", "#31a354", "#006837")
tmap::tm_shape(don_oa) +
tmap::tm_borders("light grey") +
tmap::tm_shape(don_oa[!is.na(don_oa@data$res_total) &
don_oa@data$res_total > 0, ]) +
tmap::tm_fill("res_total",
labels = c("Resilient in one domain",
"Resilient in two domains",
"Resilient in three domains"),
title = "Resilient OAs in Doncaster",
palette = res_colours) +
tmap::tm_layout(frame = FALSE)
```
<!--
DB (refering to figure: I think that this is very interesting – it would be worth expanding on this here and have a profile of these areas using your spatial microsimulation output – for instance, have a profile of these areas by providing estimates of average household income, number of children in poverty and other cross-tabulations (making the most of your microsimulated output) and how these compare to the Doncaster average.
-->
Having selected an appropriate threshold, I plotted the output areas that the various models identified as resilient.
The simulation identifies $`r nrow(don_oa@data[!is.na(don_oa@data$res_total) & don_oa@data$res_total > 0, ])`$ output areas as resilient in total based on the five deprivation criteria, of which $`r nrow(don_oa@data[!is.na(don_oa@data$res_total) & don_oa@data$res_total > 1, ])`$ are identified as resilient by two or more measures of area--based deprivation.
One area, to the north east near Thorne, is rural, but the majority of resilient areas are in urban or suburban centres.
These include output areas in: Adwick le Street to the north; Stainforth to the north east; Armthorpe to the east; New Edlington to the south; Conisbrough, Mexborough, and Denaby Main to the west; as well as Doncaster town itself.
<!--
Liddy: You seem to have generated a testable hypotheis here but is there any previous evidence or potential further analysis that could actually test this hypothesis – a very tantalizing way to end the chapter!!
DB: I agree- there is a need to discuss this further here (and/or in the concluding section) even in a speculative manner but ideally with some references to relevant literature (e.g. on social capital regarding the community centres etc)
-->
<!-- TODO: is it worth plotting the location of GP practices or similar? -->
### Resilient characteristics {#ressim-res-charac}
```{r remove-prisons}
# Variables with resilience characteristics end with _yes, _good, or _low
# So does 'depress_yes' so we need to remove this
# 'no_p' is for 'isol_no'
res_chars <- grepl("no_p$|_yes_p$|_good_p$|_low_p$", colnames(don_oa@data))
res_chars <- colnames(don_oa@data[, res_chars])
don_oa@data[!is.na(don_oa@data$prison_pop), res_chars] <- NA
```
```{r prep-res-char-plots, include=FALSE}
if (length(list.files("figures/cache/", "res_char_")) < 17) {
lapply(res_chars, function(x) {
map <-
tmap::tm_shape(don_oa) +
tmap::tm_fill(col = x, palette = "BuGn", textNA = "Prison OA") +
tmap::tm_borders(col = "black") +
tmap::tm_layout(frame = FALSE)
tmap::save_tmap(