---
title: "Visualising CHD Pathway Inequalities by ethnicity"
lang: en-GB
author: "Jacqueline Grout"
date: last-modified
date-format: "YYYY-MM-DD"
title-block-banner: "#f9bf07"
title-block-banner-color: "#333739"
format:
  html:
    self-contained: true
    grid:
      sidebar-width: 200px
      body-width: 950px
      margin-width: 150px
      gutter-width: 1.5rem
    embed-resources: true
    smooth-scroll: true
    theme: cosmo
    fontcolor: black
    toc: true
    toc-location: left
    toc-title: Contents
    toc-depth: 3
editor: visual
execute:
  echo: false
  message: false
  warning: false
  freeze: auto
editor_options:
  chunk_output_type: console
css: styles.css
---
```{r}
#| echo: false
library(targets)
library(gt)
library(tidyverse)
library(grid)
library(gridExtra)
```
# Introduction
In 2022, the Strategy Unit worked with the British Heart Foundation (BHF) to explore ways of visualising socio-economic inequalities as they emerge through the coronary heart disease (CHD) pathway. We made these visualisations available so that health and care staff can understand where on the pathway socio-economic inequalities emerge and at which points they are moderated or exacerbated. The report[^1] and web-based tool[^2] are available via the BHF and Strategy Unit websites.
[^1]: <https://www.strategyunitwm.nhs.uk/publications/socio-economic-inequalities-coronary-heart-disease>
[^2]: <https://www.bhf.org.uk/icb-tool>
Whilst this initial analysis was focused on inequalities between socio-economic groups, BHF asked the Strategy Unit if we might explore the feasibility of assessing inequalities across other dimensions of inequality. This report explores inequalities by ethnicity along the coronary heart disease (CHD) pathway.
The objectives of the report are to:
1. Set out the methods by which the Strategy Unit have sought to represent CHD pathway inequalities by ethnicity.
2. Quantify and illustrate the inequalities by ethnicity over the disease progression and treatment pathway for CHD.
The analysis has been conducted by the Strategy Unit on behalf of the British Heart Foundation.
# Inequities in healthcare
The term ‘inequities’ is used to describe unjustifiable differences in rates of access between subgroups. An equity analysis must control for levels of need within each population subgroup. Having done this, an equitable distribution of services is one where rates of access to a service or population follow the distribution of need, such that a patient with a given level of need in one subgroup has the same chance of accessing a service as their counterparts with a similar level of need in other subgroups. This is the standard that the NHS seeks to achieve. Assessing equity is challenging. Further detail about inequalities and inequities in healthcare can be found in a previous Strategy Unit report[^3].
[^3]: <https://www.strategyunitwm.nhs.uk/publications/socio-economic-inequalities-access-planned-hospital-care-causes-and-consequences>
In our previous work, visualising socio-economic inequalities, our units of analysis were GP practices. The metrics of interest (risk factors, primary prevention interventions, secondary care procedures, outcomes, etc.) are readily available at GP practice level and there are established methods of assigning GP practices to deciles of deprivation. Having calculated a metric value for each decile of GP practices, we used the relative index of inequality (RII) to estimate the scale and direction of inequality. The RII can only be used when the dimension of inequality can be expressed as a set of ordered groups.
Assessing inequalities across ethnic groups is more challenging since, unlike deprivation, ethnicity cannot be expressed as a set of ordered groups. The distribution of patients across ethnic groups is unique to each practice. There is no meaningful way to numerically aggregate these distributions into a single variable (e.g. % BME) without significant loss of information. An alternative method of grouping practices according to the distribution of their patients over ethnic groups was required. We have used k-medoids clustering to assign practices to one of a small number of groups such that practices within a group have similar distributions of patients over ethnic groups. K-medoids is an established and commonly used unsupervised machine learning technique. This clustering approach required data on the ethnicity of each practice's registered population. We imputed it based on the Lower Super Output Area (LSOA) of residence of GP registrants and the ethnicity distribution of patients living in these LSOAs using the Census 2021 data.
Ethnicity, unlike deprivation, is not an ordered variable. The approach in our previous work used the relative index of inequality to measure the degree of inequality. This measure relies on the ordered quality of the socio-economic deprivation variable. To handle the categorical nature of the ethnicity variable we have used the relative index of disparity[^4] to indicate the extent to which the rate of an activity or event varies across groups. The index estimates the proportion of events (e.g., admissions) that would need to be redistributed between clusters in order that event rates follow levels of need. Further detailed explanations of the methods used in the analysis can be found in the appendix.
[^4]: Pearcy J, Keppel K. *A Summary Measure of Health Disparity*. Public Health Reports, Vol 117, May-June 2002. <https://open.umich.edu/sites/default/files/downloads/PublicHealthRep-Pearcy.pdf>
# CHD Metrics
This report quantifies and illustrates levels of inequity across 33 metrics at various points along the continuum of coronary heart disease progression and over a typical treatment pathway. They are shown in the table below grouped by domain (risk factors, risk factor identification, primary prevention, disease identification, secondary prevention, tertiary prevention, intermediate and full outcomes), which represent the various stages along the pathway. Full definitions and data sources for each metric are included in the appendix.
*Table 1 - coronary heart disease pathway metrics*
```{r}
library(gt)
metrics_table <- tibble(metric_domain = c("Need","Risk factors","Risk factor identification","Risk factor identification","Risk factor identification","Risk factor identification","Risk factor identification","Risk factor identification","Primary prevention","Primary prevention","Primary prevention","Primary prevention","Disease identification","Disease identification","Disease identification","Secondary prevention","Secondary prevention","Secondary prevention","Secondary prevention","Secondary prevention","Secondary prevention","Secondary prevention","Secondary prevention","Tertiary prevention","Tertiary prevention","Tertiary prevention","Tertiary prevention","Tertiary prevention","Tertiary prevention","Intermediate outcome","Intermediate outcome","Intermediate outcome","Full outcomes","Full outcomes","Full outcomes","Full outcomes"),
name_metric=c("CHD synthetic prevalence estimates",
"Smoking synthetic prevalence estimates",
"Smoking register",
"Obesity register",
"Diabetes register",
"Depression register",
"The percentage of patients aged 45 or over who have a record of blood pressure in the preceding 5 years",
"CVD risk register",
"Percentage of patients aged 18 and over with GP recorded CVD (narrow definition), who are currently treated with lipid lowering therapy",
"Percentage of patients aged 18 and over, with no GP recorded CVD and a GP recorded QRISK score of 10% or more, CKD (G3a to G5), T1 diabetes (aged 40 and over) or T2 diabetes aged 60 and over, who are currently treated with lipid lowering therapy.",
"Smoking cessation support offered",
"Exception reporting for Smoking cessation support offered",
"CHD register",
"CT angiography",
"Electrocardiography",
"Aspirin, anti-platelet or anti-coagulant",
"Exception reporting for Aspirin, anti-platelet or anti-coagulant",
"Flu vaccination",
"Exception reporting for Flu vaccination",
"Percentage of patients aged 65 or over who received a seasonal influenza vaccination between 1 September 2022 and 31 March 2023",
"Percentage of patients aged 18 to 64 years and in a clinical at-risk group who received a seasonal influenza vaccination between 1 September 2022 and 31 March 2023",
"Referral to cardiology (First outpatient)",
"Cardiology outpatient DNAs",
"Elective PCI","Elective CABG","Waiting time for elective PCI / CABG","Elective PCI / CABG patients discharged before trimpoint",
"Cardiac rehabilitation - Started", "Cardiac rehabilitation - Completed","BP reading < 140/90","Readmission within 30 days of elective PCI / CABG","Emergency admissions for CHD","Deaths in hospital from CHD","Deaths in hospital from CHD <75","Deaths from CHD","Deaths from CHD <75"
)
)
metrics_table |>
gt()|>
cols_hide(metric_domain)|>
# tab_stubhead(label = md("**Domain**"))|>
tab_header(
title = md("**Table 1** - *Coronary heart disease pathway metrics*")
)|>
tab_row_group(
id="Need",
label = md("**Need**"),
rows = 1
) |>
tab_row_group(
id="Risk factors",
label = md("**Risk factors**"),
rows = 2
) |>
tab_row_group(
id="Risk factor identification",
label = md("**Risk factor identification**"),
rows = 3:7
)|>
tab_row_group(
id="Primary prevention",
label = md("**Primary prevention**"),
rows = 8:12
)|>
tab_row_group(
id="Disease identification",
label = md("**Disease identification**"),
rows = 13:15
)|>
tab_row_group(
id="Secondary prevention",
label = md("**Secondary prevention**"),
rows = 16:23
)|>
tab_row_group(
id="Tertiary prevention",
label = md("**Tertiary prevention**"),
rows = 24:29
)|>
tab_row_group(
id="Intermediate outcome",
label = md("**Intermediate outcome**"),
rows = 30:32
)|>
tab_row_group(
id="Full outcomes",
label = md("**Full outcomes**"),
rows = 33:36
)|>
row_group_order(groups=c("Need","Risk factors","Risk factor identification","Primary prevention","Disease identification","Secondary prevention","Tertiary prevention","Intermediate outcome","Full outcomes"))|>
cols_label(name_metric="")
```
# Clustering
In this analysis we used the k-medoids clustering method to group practices. Each practice was assigned to one of a small number of groups such that practices within the group have similar distributions of patients over ethnic groups. In order to obtain data on the ethnicity of a practice's registered population we imputed it based on the Lower Super Output Area (LSOA) of residence of GP registrants and the ethnicity distribution of patients living in those LSOAs using the Census 2021 data. Further detailed explanations of the method can be found in the appendix.
The clustering analysis resulted in 5 groups (clusters) of practices which are illustrated in the charts and map below.
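The two steps described above, imputing each practice's ethnic mix from where its registrants live and then grouping practices with k-medoids, can be sketched as follows. This is a minimal illustration rather than the project pipeline: the practice codes, LSOA codes, counts and census shares are invented toy values, and `cluster::pam()` with its default Euclidean distance is an assumption standing in for whichever k-medoids configuration the analysis used (see the appendix).

```r
library(cluster)  # provides pam(), a k-medoids implementation

# Toy inputs (hypothetical): registrants by practice (rows) and LSOA (columns)
registrants <- matrix(
  c(900, 100,   0,
    800, 150,  50,
     50, 200, 750,
      0, 100, 900),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("P1", "P2", "P3", "P4"), c("LSOA_A", "LSOA_B", "LSOA_C"))
)

# Toy census shares (hypothetical): ethnic mix of residents in each LSOA
census_share <- matrix(
  c(0.95, 0.02, 0.01, 0.01, 0.01,
    0.70, 0.15, 0.08, 0.04, 0.03,
    0.30, 0.45, 0.15, 0.05, 0.05),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("LSOA_A", "LSOA_B", "LSOA_C"),
                  c("White", "Asian", "Black", "Mixed", "Other"))
)

# Step 1: weight each LSOA's ethnic mix by the share of the practice list living there
lsoa_weight <- registrants / rowSums(registrants)
practice_share <- lsoa_weight %*% census_share

# Step 2: k-medoids on the imputed distributions (k = 2 only because the toy data is tiny)
fit <- pam(practice_share, k = 2)
practice_cluster <- unname(fit$clustering)

print(round(practice_share, 3))
print(practice_cluster)
```

Each row of `practice_share` sums to one, so the clustering compares practices on the shape of their ethnic distribution rather than on list size.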
## Cluster descriptions
The median percentage of each ethnic group for each cluster was calculated and is presented in these charts to describe their respective diversity.
### Cluster 1 - Least diverse
This cluster is the least diverse of the five and the median percentage of White patients is 97% (94% White British).
::: panel-tabset
## Five ethnic groups
```{r}
tar_read(cluster2_treemap_1)
```
## Fourteen ethnic groups
94% White British
```{r}
tar_read(cluster2_eth_chart_1)
```
:::
### Cluster 2
In this cluster the median percentage of White patients is 93% (87% White British). Those patients whose ethnicity is White are more likely to be White Irish or other White ethnicities (0.7% and 4% respectively) when compared to cluster 1. The median percentage of patients with a mixed ethnicity in this cluster is 2%.
::: panel-tabset
## Five ethnic groups
```{r}
tar_read(cluster2_treemap_2)
```
## Fourteen ethnic groups
87% White British
```{r}
tar_read(cluster2_eth_chart_2)
```
:::
### Cluster 3
In this cluster the median percentage of White patients is 78% (69% White British). The median percentage of patients with a mixed ethnicity in this cluster is 3.8%, which is higher than in cluster 2. Indian patients make up 3.3% of the patients, and 2.6% are patients whose ethnicity is Black African.
::: panel-tabset
## Five ethnic groups
```{r}
tar_read(cluster2_treemap_3)
```
## Fourteen ethnic groups
69% White British
```{r}
tar_read(cluster2_eth_chart_3)
```
:::
### Cluster 4
In cluster 4 just over half the patients are White, with a median percentage of 55% (35% White British). A smaller share of the White group is White British than in clusters 1 to 3, with 17% of patients having an ethnicity of White Other. The median percentage of patients whose ethnicity is Black African in this cluster is 8.3%, which is higher than in cluster 3, and the median percentage of Black Caribbean patients is 3.9%. The median percentage of Black and Asian is 29% in this cluster.
::: panel-tabset
## Five ethnic groups
```{r}
tar_read(cluster2_treemap_4)
```
## Fourteen ethnic groups
35% White British
```{r}
tar_read(cluster2_eth_chart_4)
```
:::
### Cluster 5 - Most diverse
This cluster is the most diverse. Overall the median percentage of Asian patients in this cluster is 45%. Indian patients make up 13.3%, Pakistani patients 9.7% and Bangladeshi patients 2.9%. Mixed and other ethnic groups form a large proportion of the patients in this cluster.
::: panel-tabset
## Five ethnic groups
```{r}
tar_read(cluster2_treemap_5)
```
## Fourteen ethnic groups
22% White British
```{r}
tar_read(cluster2_eth_chart_5)
```
:::
## Age, Sex and Deprivation Profile of Clusters
Table 2 shows that the most diverse clusters are younger and the least diverse clusters are older. In particular, almost one quarter of cluster 5 are under 18, compared to 19% of cluster 1, and 12% of cluster 1 are aged 75+, compared to only 4% of clusters 4 and 5.
```{r}
gp_reg_pat_prac_sing_age_female<- tar_read(gp_reg_pat_prac_sing_age_female)|>as_tibble()
gp_reg_pat_prac_sing_age_male<- tar_read(gp_reg_pat_prac_sing_age_male)|>as_tibble()
clusters_for_nacr <- tar_read(clusters_for_nacr)|>as_tibble()
females_gp_reg_pat <- gp_reg_pat_prac_sing_age_female |>
filter(age != "ALL") |>
mutate(age=as.numeric(case_when(age=="95+" ~ 95,
.default = as.numeric(age))))|>
mutate(age_group=case_when(age<18~'<18',
age>=18 & age <45 ~'18-44',
age>=45 & age <65 ~'45-64',
age>=65 & age <75 ~'65-74',
age>=75 ~'75+'))|>
select(-age)|>
mutate(sex=2)
males_gp_reg_pat <- gp_reg_pat_prac_sing_age_male |>
filter(age != "ALL") |>
mutate(age=as.numeric(case_when(age=="95+" ~ 95,
.default = as.numeric(age))))|>
mutate(age_group=case_when(age<18~'<18',
age>=18 & age <45 ~'18-44',
age>=45 & age <65 ~'45-64',
age>=65 & age <75 ~'65-74',
age>=75 ~'75+'))|>
select(-age)|>
mutate(sex=1)
all_gp_reg_pat <- males_gp_reg_pat|>
rbind(females_gp_reg_pat)|>
rename(gp_practice_code=org_code)|>
group_by(sex,age_group,gp_practice_code)|>
summarise(number_of_patients=sum(number_of_patients))|>
ungroup()|>
left_join(clusters_for_nacr|>select(gp_practice_code,cluster))|>
group_by(sex,age_group,cluster)|>
summarise(number_of_patients=sum(number_of_patients))|>
ungroup()|>
mutate(sex_name=case_when(sex==1~'Male',
sex==2~'Female'))|>
select(-sex)|>
group_by(cluster)|>
mutate(patients_total_cluster=sum(number_of_patients))|>
mutate(perc=number_of_patients/patients_total_cluster)|>
select(-number_of_patients,-patients_total_cluster)
all_gp_reg_pat |>
pivot_wider(names_from=c(cluster,sex_name),
names_sep = "_",
values_from = perc
)|>
select(age_group, starts_with("1"), starts_with("2"), starts_with("3"),starts_with("4"),starts_with("5")) |>
arrange(factor(age_group,
levels = c("<18","18-44","45-64","65-74","75+")))|>
gt(rowname_col = "age_group")|>
tab_spanner_delim(delim = "_") |>
tab_spanner(label = "Cluster", columns = c(starts_with("1"), starts_with("2"), starts_with("3"),starts_with("4"),starts_with("5")))|>
tab_stubhead(label=md("**Age Group**"))|>
tab_options(heading.title.font.size = 18,
heading.title.font.weight = "bolder",
column_labels.font.weight = "bold")|>
fmt_percent(decimals = 2)|>
tab_header(title = md("**Table 2** - *Age / Sex Breakdown by Cluster*"))
```
Table 3 shows that the most diverse cluster (cluster 5) has the greatest proportion of patients in practices in the most deprived quantile. Cluster 2 has the greatest proportion of patients in practices in the least deprived quantile.
```{r}
source("R/get_imd_data_by_gpprac.R")
patientweighted_practice_imd |>
mutate(gp_imd_quantile = case_when(gp_imd_decile<=2 ~ "1 - most deprived",
gp_imd_decile==3 ~"2",
gp_imd_decile==4 ~"2",
gp_imd_decile==5 ~"3",
gp_imd_decile==6 ~"3",
gp_imd_decile==7 ~"4",
gp_imd_decile==8 ~"4",
gp_imd_decile>=9 ~"5 - least deprived",
.default = "0"
))|>
group_by(gp_imd_quantile,cluster)|>
summarise(patients=sum(patients))|>
ungroup()|>
group_by(cluster)|>
mutate(cluster_total=sum(patients),
perc=patients/cluster_total)|>
select(-patients,-cluster_total)|>
pivot_wider(names_from=c(cluster),
names_sep = "_",
values_from = perc
)|>
select(gp_imd_quantile, starts_with("1"), starts_with("2"), starts_with("3"),starts_with("4"),starts_with("5")) |>
arrange(factor(gp_imd_quantile,
levels = c("1 - most deprived","2","3","4","5 - least deprived")))|>
gt(rowname_col = "gp_imd_quantile")|>
tab_stubhead(label=md("**IMD Quantile**"))|>
tab_options(heading.title.font.size = 18,
heading.title.font.weight = "bolder",
column_labels.font.weight = "bold")|>
tab_spanner(label = "Cluster", columns = c(starts_with("1"), starts_with("2"), starts_with("3"),starts_with("4"),starts_with("5")))|>
fmt_percent(columns=c(starts_with("1"), starts_with("2"), starts_with("3"),starts_with("4"),starts_with("5")),decimals = 2)|>
tab_header(title = md("**Table 3** - *Index of Multiple Deprivation Breakdown by Cluster*"))
```
## Clustered GP Practices
The interactive map below presents each GP practice coloured according to its assigned cluster. Cluster 1 (red) is the least diverse and cluster 5 (orange) is the most diverse.
```{r}
tar_read(cluster2_map)
```
# Rate of activity
The charts below show the activity rates by cluster for each metric along the CHD pathway. The horizontal line is the overall rate. The 95% confidence intervals are shown at the top of each bar. These activity rates take account of need by using CHD prevalence (or list size for risk factor) as the denominator of the calculation. Further detail on the methodology can be found in the appendix.
In some cases these upper and lower limits are so close together as to not be visible (e.g. Obesity register), whereas for other metrics (e.g. Readmission within 30 days of a PCI or CABG) they are much wider. Metrics with larger volumes of activity data give rise to greater confidence in the calculated rate (i.e. limits are closer together) when compared to metrics with smaller volumes of activity data, where there is less confidence in the calculated rate (i.e. limits are further apart).
By calculating these rates for each cluster within each metric, and then comparing the rates by cluster to the global rate for the metric, it is possible to identify how activity rates vary across the 5 clusters. For there to be equity between the clusters the rates would need to be the same. The charts below present the metrics grouped according to pathway domain.
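As an illustration of the rate comparison described above, the sketch below computes a rate per cluster, the global rate, and a 95% confidence interval for each cluster rate. The counts are hypothetical, and the exact Poisson interval from `poisson.test()` is an assumption made for this example only; the interval method actually used is set out in the appendix.

```r
# Hypothetical counts for one metric: events (numerator) and need (denominator) by cluster
activity <- data.frame(
  cluster = 1:5,
  events  = c(1200, 950, 700, 260, 180),
  need    = c(10000, 9000, 8000, 4000, 3000)
)

# Rate per cluster and the global rate (the horizontal line on each chart)
activity$rate <- activity$events / activity$need
global_rate <- sum(activity$events) / sum(activity$need)

# Exact Poisson 95% limits for each rate (assumed method, for illustration only)
limits <- mapply(
  function(x, d) poisson.test(x, T = d)$conf.int,
  activity$events, activity$need
)
activity$lower_ci <- limits[1, ]
activity$upper_ci <- limits[2, ]

# Width of the interval relative to the rate: smaller counts give wider intervals
activity$relative_width <- (activity$upper_ci - activity$lower_ci) / activity$rate

print(activity)
print(global_rate)
```

In this toy data the cluster with the fewest events has the widest interval relative to its rate, which is the pattern the charts show for low-volume metrics.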
### Risk factors
::: callout-tip
## Key Findings
- Activity rates vary very little for the risk factor metric % patients 45+ with a record of BP \< 5 years
- Rates of patients on the diabetes register are highest in the most diverse cluster
- Rates of patients on the depression register are highest in the least diverse cluster and lowest in the most diverse cluster
:::
#### Risk factors
```{r}
tar_read(rate_chart_risk_fact)
```
#### Risk factor identification
```{r}
tar_read(rate_chart_risk_fact_ident)
```
### Primary prevention
::: callout-tip
## Key Findings
- The activity rate for CVD patients treated with LLT is highest for those patients whose practice is in the least diverse cluster and lowest for patients whose practice is in cluster 4.
- A similar pattern applies for at risk patients treated with LLT, activity rates being highest for those in the least diverse cluster.
:::
```{r}
tar_read(rate_chart_prim_prevent)
```
### Disease identification
::: callout-tip
## Key Findings
- Activity rates vary considerably across the clusters in the disease identification metrics, with differing patterns for each metric
- Patients from clusters 4 and 5 (most diverse) are less likely to be recorded on the CHD register
- Clusters 5 and 4 have lower rates of CT angiography and electrocardiography
:::
```{r}
tar_read(rate_chart_disease_ident)
```
### Secondary prevention
::: callout-tip
## Key Findings
- There is little variation between clusters 1 to 4 in referral rates to outpatient cardiology, although the referrals are lower for the most diverse cluster (cluster 5)
- Flu vaccination rates for 65+ patients are highest among patients whose practice is in the least diverse cluster, with the two most diverse clusters having the lowest rates
- Amongst patients under 65 who are at risk, flu vaccination rates are less varied between the clusters than for 65+ patients. Cluster 4 has the lowest rate of vaccination
- Cardiology outpatient DNA rates are higher the more diverse the cluster
:::
```{r}
tar_read(rate_chart_second_prevent)
```
### Tertiary prevention
::: callout-tip
## Key Findings
- All the tertiary prevention metrics, except waiting time for elective PCI/CABG, follow broadly the same pattern of activity rates with the highest rates being in the least diverse cluster (cluster 1), followed by cluster 2 and then clusters 5 (the most diverse) and 3. Cluster 4 has the lowest activity rates.
- The rate for waiting time for elective PCI/CABG is greatest in the most diverse cluster (cluster 5) and lowest in cluster 4, although overall there is little variation between the clusters.
:::
```{r}
tar_read(rate_chart_tert_prevent)
```
### Intermediate outcome
::: callout-tip
## Key Findings
- The least diverse cluster has the highest rates of BP readings \< 140/90.
- Emergency admissions for CHD and readmissions within 30 days of a PCI or CABG are highest for the least diverse cluster and lowest for cluster 4.
:::
```{r}
tar_read(rate_chart_int_out)
```
### Full outcomes
::: callout-tip
## Key Findings
- The least diverse cluster has the highest rate of CHD deaths and premature CHD deaths. Cluster 4 has the lowest rate of CHD deaths.
- CHD hospital deaths and premature CHD hospital deaths also have the lowest rate in cluster 4. Premature CHD hospital deaths have a similar rate for the other clusters including the most diverse cluster.
:::
```{r}
tar_read(rate_chart_full_out)
```
::: {.callout-note appearance="minimal"}
Work led by Professor Sir Michael Marmot and colleagues in the late 1980s revealed that first-generation South Asians living in the UK have a higher rate of coronary heart disease and diabetes compared to White Europeans[^5]. This gives rise to the question: why are the death rates lower in the more diverse clusters? There are a number of factors to consider in understanding this, including the age/sex profile of the clusters, the methodology for calculating the CHD synthetic prevalence (the measure of need in this project), and age and sex standardised mortality rates (ASMR) compared to the project CHD mortality rates. These are examined further in the appendix.
:::
[^5]: <https://www.bhf.org.uk/what-we-do/our-research/research-successes/ethnicity-and-heart-disease>
# Relative Index of Disparity
The relative index of disparity for the metric is expressed here as a %, such that the % indicates the amount of activity that would need to be redistributed between clusters in order that event rates follow levels of need. The detail of the methodology used for its calculation can be found in the appendix.
The chart below presents the metrics along the CHD pathway from risk factor and identification through to outcome measures of death from CHD. For each metric the index of disparity is shown as a point estimate (yellow dot) with the upper and lower confidence intervals in grey.
::: callout-tip
## Key Findings
- The greatest disparity along the CHD pathway is flu vaccinations for patients aged 65+, where the relative index of disparity is 17.14% (95% CI: 17.11%, 17.17%), indicating that 17.1% of activity needs to be redistributed so that rates reflect need.
- The most equitably distributed metric is blood pressure checks within the last 5 years for patients aged 45+, with a relative index of disparity of 0.8% (95% CI: 0.78%, 0.82%).
- There are 7 metrics where the index of disparity is less than 5%. As well as blood pressure checks, these include metrics such as referrals to outpatient cardiology, outpatient DNAs and waiting times for elective procedures.
- Out of the 33 metrics along the CHD pathway, 13 have an index of disparity of greater than 10%.
- In general (but not exclusively) these greater levels of disparity occur in the disease identification and secondary and tertiary prevention pathway domains and consequently the outcome metrics related to deaths from CHD.
:::
```{r}
#| out-height: 600px
tar_read(ci_iod_chart)
```
```{r}
#| code-fold: true
#| code-summary: "Data"
rate_chart_data <- tar_read(rate_chart_data) |> as_tibble()
iod_with_ci <- tar_read(iod_with_ci) |> as_tibble()
metric_names <- rate_chart_data |>
select(pathway,metric_name,name) |>
unique() |>
rename(metric_name_full=metric_name, metric_name=name)
disparity_data <- iod_with_ci |> left_join(metric_names)
disparity_data |>
select(-metric_name) |>
gt() |>
cols_move(columns = c(iod,lower_ci,upper_ci),
after = metric_name_full) |>
tab_row_group(
label = md("**Full outcomes**"),
rows = pathway == "Full outcomes"
) |>
tab_row_group(
label = md("**Intermediate outcome**"),
rows = pathway == "Intermediate outcome"
) |>
tab_row_group(
label = md("**Tertiary prevention**"),
rows = pathway == "Tertiary prevention"
) |>
tab_row_group(
label = md("**Secondary prevention**"),
rows = pathway == "Secondary prevention"
) |>
tab_row_group(
label = md("**Disease identification**"),
rows = pathway == "Disease identification"
) |>
tab_row_group(
label = md("**Primary prevention**"),
rows = pathway == "Primary prevention"
) |>
tab_row_group(
label = md("**Risk factor identification**"),
rows = pathway == "Risk factor identification"
) |>
tab_row_group(
label = md("**Risk factors**"),
rows = pathway == "Risk factors"
) |>
fmt_number(columns=c(iod,lower_ci,upper_ci),
decimals=3) |>
cols_label(pathway = md("**Pathway**"),
metric_name_full = md("**Metric**"),
iod= md("**IOD**"),
lower_ci = md("**Lower CI**"),
upper_ci = md("**Upper CI**"))
```
::: {.callout-tip icon="false"}
## Confidence Intervals
Confidence intervals are shown in grey. There is 95% confidence that the index of disparity is within this range. The confidence interval estimation methods are detailed in the appendix.
:::
## Routes to equity
A previous Strategy Unit report, "Strategies to reduce inequalities in access to planned hospital procedures"[^6], highlights in chapter 1 three of the many routes from inequity to equity: levelling-up, levelling-down and zero-sum redistribution. Using the zero-sum redistribution route as an example, along with the metric flu vaccinations for patients aged 65+, the illustration below shows how equity could be achieved by increasing and decreasing the number of vaccinations delivered in each cluster to give the same activity rate per cluster.
[^6]: <https://www.midlandsdecisionsupport.nhs.uk/knowledge-library/strategies-to-reduce-inequalities-in-access-to-planned-hospital-procedures/>
*Table 4 - achieving equity via zero-sum redistribution*
| Cluster | Number of vaccinations | CHD synthetic prevalence | Rate (No. of vaccinations / CHD synthetic prevalence) | Global rate | Change in number of vaccinations to match global rate |
|:----------:|-----------:|-----------:|:----------:|:----------:|:----------:|
| | *n* | *d* | *r=n/d* | *gr=𝚺n/𝚺d* | *(gr-r) x d* |
| 1 | 3,356,002 | 1,228,453 | 2.73 | 1.92 | -1,001,284 |
| 2 | 2,638,698 | 1,124,040 | 2.35 | 1.92 | -484,121 |
| 3 | 1,862,430 | 1,181,796 | 1.58 | 1.92 | 402,855 |
| 4 | 476,616 | 571,049 | 0.83 | 1.92 | 617,979 |
| 5 | 332,728 | 415,950 | 0.80 | 1.92 | 464,571 |
| **Total** | **8,666,474** | **4,521,288** | 1.92 | | **0** |
In reality, in attempting to achieve equity for this metric, it is unlikely that the offer of vaccination would be removed from patients whose practice is in clusters 1 and 2. More likely would be an approach to focus public health and vaccination roll-out campaigns on practices in clusters 3, 4 and 5.
The metrics along the CHD pathway with the higher relative indices of disparity would almost certainly require a variety of different routes from inequity to equity, given their differing nature. The more effective interventions on the pathway might be levelled-up, whilst those of more limited value might be levelled-down.
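
The arithmetic behind Table 4 can be reproduced with a few lines of base R. This is a minimal sketch rather than the code used to produce the report: the figures are copied from Table 4, and the object names (`table4`, `global_rate`, `share_moved`) are illustrative.

```r
# Figures copied from Table 4 (flu vaccinations for patients aged 65+).
table4 <- data.frame(
  cluster = 1:5,
  n = c(3356002, 2638698, 1862430, 476616, 332728),  # number of vaccinations
  d = c(1228453, 1124040, 1181796, 571049, 415950)   # CHD synthetic prevalence
)

# Global rate: total activity divided by total need.
global_rate <- sum(table4$n) / sum(table4$d)

# Cluster rate, and the change in activity needed to match the global rate.
table4$rate <- table4$n / table4$d
table4$change <- (global_rate - table4$rate) * table4$d

# Share of all activity that would have to move between clusters.
share_moved <- sum(abs(table4$change)) / (2 * sum(table4$n))

print(table4)
print(round(100 * share_moved, 1))
```

The changes sum to zero by construction, which is what makes the redistribution zero-sum. For these figures roughly 17% of vaccinations would need to move between clusters.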
# Regional Analysis
## Clusters
The charts below show the distribution of the patients in each NHS region according to the cluster of their GP practice.
::: panel-tabset
## London
```{r}
#1
region_cluster_charts <- tar_read(region_cluster_charts)
region_cluster_charts[[2]]
```
## South West
```{r}
#2
region_cluster_charts[[7]]
```
## South East
```{r}
#3
region_cluster_charts[[6]]
```
## Midlands
```{r}
#4
region_cluster_charts[[3]]
```
## East of England
```{r}
#5
region_cluster_charts[[1]]
```
## North West
```{r}
#6
region_cluster_charts[[5]]
```
## North East and Yorkshire
```{r}
#7
region_cluster_charts[[4]]
```
:::
## Relative Index of Disparity
::: callout-tip
## Key Findings
- Looking at a regional level, the greatest disparity is in CT angiography in the South West, Midlands, and the North West.
:::
::: panel-tabset
## London
```{r}
#| out-height: 600px
regional_charts <- tar_read(regional_charts)
regional_charts[1]
```
## South West
```{r}
#| out-height: 600px
regional_charts[2]
```
## South East
```{r}
#| out-height: 600px
regional_charts[3]
```
## Midlands
```{r}
#| out-height: 600px
regional_charts[4]
```
## East of England
```{r}
#| out-height: 600px
regional_charts[5]
```
## North West
```{r}
#| out-height: 600px
regional_charts[6]
```
## North East and Yorkshire
```{r}
#| out-height: 600px
regional_charts[7]
```
:::
The appendix contains equivalent charts to those above, showing the Index of Disparity for each CHD pathway metric for each of the 42 ICBs.
# Conclusions
In conclusion, there are three main aspects to this analysis on which to reflect: methods, findings and next steps.
Assessing inequalities across ethnic groups was challenging on a number of levels; it was both experimental and complex. Ethnicity can’t be expressed as a set of ordered groups and there is no meaningful way to aggregate it numerically into a single variable. Consequently, the relative index of inequality can’t be used.
This analysis attempted to overcome these challenges using novel methods: creating 5 clusters of GP practices using K-Medoids clustering on ethnicity percentages, then using the relative index of disparity to indicate the extent to which the rate of an activity or event on the CHD pathway varied across the clusters.
Further challenges arose during the analysis, which are detailed in the appendix. However, this analysis has made progress in the methods used, and they are useful for visualising how disparity varies across the CHD pathway.
The results of the analysis should be viewed as tentative rather than actionable, adding to a fuller picture alongside other work and thinking on inequalities and CHD.
Early findings show that variation clearly exists across the pathway with some metrics showing less than 1% disparity (blood pressure checks within 5 years for 45+), whilst others have as much as 17% disparity (flu vaccinations for 65+).
The analysis found that risk factors varied little across clusters, although diabetes was identified more often in the most diverse cluster and depression more often in the least diverse cluster. Moving through the pathway, CVD patients or at-risk patients had higher rates of lipid lowering therapy in the least diverse clusters. Rates of diagnosis (on the CHD register, CT angiography, and electrocardiography) were all lower in the more diverse clusters.
Moving across the pathway towards treatment and care, inequity between clusters was more evident in some of the secondary and tertiary prevention metrics, in particular CHD patients receiving aspirin, APT or ACT (12.5%), starting cardiac rehabilitation (11%) and elective CABG (10%).
Mortality (CHD deaths and premature CHD deaths, including in hospital) was higher in the least diverse cluster, which may reflect age and sex profiles of the population (and which is examined and discussed in further detail in the appendix).
Many risk factors had similar rates to those found when examining the pathway through a socio-economic lens, but rates for several diagnostic metrics, such as being on the CHD register or receiving CT angiography, were lower in the more diverse clusters. There are also parallels with the previous work examining socio-economic inequalities along the pathway for secondary and tertiary prevention metrics, with many of these metrics, such as elective PCI and CABG and receipt of aspirin, also showing that inequity favoured the least deprived.
Many of these findings give rise to further questions and therefore there are a number of further routes for analysis and possible next steps that could be taken to extend and follow up this project.
Whilst this project includes some regional and ICB level presentations, in some cases the majority of GP practices in a region or ICB have the same cluster. The analysis could be recreated at a regional level, whereby new unique clusters could be assigned separately for a region, potentially giving rise to more nuanced clusters from which local disparities could be more easily identified.
The methods used in this project could be transferred to other pathways and services, re-using the same clusters and selecting different sets of metrics, for example those related to COPD, cancer or mental health.
The code for this project is available on GitHub, which enables the work to be reused and extended more easily.
Many of the metrics used in this analysis were only available at a GP practice level, necessitating some of the approaches taken. Where record level data is available this would allow for more detailed exploration and understanding of some of these initial findings. Of particular interest would be further exploration of the findings surrounding mortality rates and the inter-relationships between different factors of inequality such as ethnicity and deprivation, as well as better understanding at an individual ethnicity level.
# Appendix
## Definitions and data sources of pathway metrics
The table below sets out the sources of the various metrics as well as the time-period to which they relate, and details of the selection criteria used.
```{r}
library(gt)
metrics_table <- tibble(metric_domain = c("Need","Risk factors","Risk factor identification","Risk factor identification","Risk factor identification","Risk factor identification","Risk factor identification","Risk factor identification","Primary prevention","Primary prevention","Primary prevention","Primary prevention","Disease identification","Disease identification","Disease identification","Secondary prevention","Secondary prevention","Secondary prevention","Secondary prevention","Secondary prevention","Secondary prevention","Tertiary prevention","Tertiary prevention","Tertiary prevention","Tertiary prevention","Tertiary prevention","Tertiary prevention","Intermediate outcome","Intermediate outcome","Intermediate outcome","Full outcomes","Full outcomes","Full outcomes","Full outcomes"),
name_metric=c("CHD synthetic prevalence estimates",
"Smoking synthetic prevalence estimates",
"Smoking register",
"Obesity register",
"Diabetes register",
"Depression register",
"The percentage of patients aged 45 or over who have a record of blood pressure in the preceding 5 years",
"CVD risk register",
"Percentage of patients aged 18 and over with GP recorded CVD (narrow definition), who are currently treated with lipid lowering therapy",
"Percentage of patients aged 18 and over, with no GP recorded CVD and a GP recorded QRISK score of 10% or more, CKD (G3a to G5), T1 diabetes (aged 40 and over) or T2 diabetes aged 60 and over, who are currently treated with lipid lowering therapy",
"Smoking cessation support offered",
"Exception reporting for Smoking cessation support offered",
"CHD register",
"CT angiography",
"Electrocardiography",
"Aspirin, anti-platelet or anti-coagulent",
"Exception reporting for Aspirin, anti-platelet or anti-coagulent",
"Percentage of patients aged 65 or over
who received a seasonal influenza vaccination
between 1 September 2022 and 31 March 2023",
" Percentage of patients aged 18 to 64
years and in a clinical at-risk group who received
a seasonal influenza vaccination between 1
September 2022 and 31 March 2023",
"Referral to cardiology (First outpatient)",
"Cardiology outpatient DNAs",
"Elective PCI","Elective CABG","Waiting time for elective PCI / CABG","Elective PCI / CABG patients discharged before trimpoint",
"Cardiac rehabilitation - Started", "Cardiac rehabilitation - Completed","BP reading < 140/90","Readmission within 30 days of elective PCI / CABG","Emergency admissions for CHD","Deaths in hospital from CHD","Deaths in hospital from CHD <75","Deaths from CHD","Deaths from CHD <75"
),
data_source=c("PHE","GP Survey","NHSD QOF","NHSD QOF","NHSD QOF","NHSD QOF","NHSD QOF","NHSD QOF","CVD Prevent","CVD Prevent",
"NHSD QOF","NHSD QOF","NHSD QOF","SUS","SUS","NHSD QOF","NHSD QOF","IIF","IIF","SUS",
"SUS","SUS","SUS","SUS","SUS","NACR","NACR","NHSD QOF","SUS","SUS",
"SUS","SUS","ONS Death records","ONS Death records"),
year=c("2015","2023","2022/23","2022/23","2022/23","2022/23","2022/23","2019/20","2022/23","2022/23",
"2022/23","2022/23","2022/23","2022/23","2022/23","2022/23","2022/23","March 2023","March 2023","2022/23",
"2022/23","2022/23","2022/23","2022/23","2022/23","2021 & 2022","2021 & 2022","2022/23","2022/23 (Mar-Feb)","2022/23",
"2022/23","2022/23","2022/23","2022/23"),
definition=c("Provided directly by OHID (formerly indicator 92847 in FingertipsR)","","Indicator 91280 extracted from FingertipsR","Indicator 93088 extracted from FingertipsR","Indicator 241 extracted from FingertipsR","Indicator 848 extracted from FingertipsR","Indicator 91262 extracted from FingertipsR","Formerly indicator 92589 extracted from FingertipsR","https://www.cvdprevent.nhs.uk/home","https://www.cvdprevent.nhs.uk/home",
"Indicator 90619 extracted from FingertipsR","Indicator 90619 extracted from FingertipsR","Indicator 273 extracted from FingertipsR","OPCS Code: U102 - Cardiac computed tomography angiography. Elective only","OPCS: U19 & U34. Elective only","Indicator 90999 extracted from FingertipsR","Indicator 90999 extracted from FingertipsR","https://www.england.nhs.uk/primary-care/primary-care-networks/network-contract-des/iif/","https://www.england.nhs.uk/primary-care/primary-care-networks/network-contract-des/iif/","TFC = 320",
"TFC = 320","OPCS Code: K49, K50, K75. FCE = 1","OPCS Code: K40-K46, FCE = 1","Elective admissions. OPCS Codes as per metrics 20 and 21","Main PCI and CABG Trim points for 2022/23 linked for HRG","Data supplied by ICB. Aggregated ACS and HF patients","Data supplied by ICB. Aggregated ACS and HF patients","CHD008","No. of emerg. spells up to 31/03/2023 within 0-29 days (inclusive) of the last, previous discharge from hospital / No. of finished spells with discharge date between 01/03/2022 and 28/02/2023 Exclude: TFC = 501, 560, 610, OPCS starting with O, Classpat = 1 (Ord.), Any diagnosis = C00*-C97*, D37*-D48*","PRIMARY diagnosis = I20 or I21 or I22 or I23 or I24 or I25",
"Following any admission Elective or Emergency. PRIMARY diagnosis = I20 or I21 or I22 or I23 or I24 or I25","Following any admission Elective or Emergency. PRIMARY diagnosis = I20 or I21 or I22 or I23 or I24 or I25. Age <=75","Underlying Cause of Death = I20 or I21 or I22 or I23 or I24 or I25","Underlying Cause of Death = I20 or I21 or I22 or I23 or I24 or I25. Age <=75")
)
metrics_table |>
gt()|>
tab_header(
title = md("**Table 5** - *Metric definitions and data sources*"))|>
cols_label(name_metric=md("**Metric**"),
metric_domain=md("**Domain**"),
data_source=md("**Data Source**"),
year=md("**Year**"),
definition=md("**Definition and selection criteria/codes**"))
```
## Methods explained
### QOF data
Metrics taken from the QOF data via Fingertips have been extracted by GP practice, and the counts (the numerator of the measure) from the calculations have been used. Exception reporting metrics have used the Personalised Care Adjustment (PCA) data to obtain the number of patients by GP practice for whom a PCA has been recorded for the relevant metric. Possible reasons for a PCA include: newly diagnosed or registered, intervention clinically unsuitable, patient choice, did not respond to offers of care, and service not available.
### SUS data
SUS data was extracted using Transact-SQL with the relevant OPCS, ICD-10, HRG and treatment function codes, as detailed in Table 5 above.
### Cardiac rehabilitation
This data was supplied by NACR from an extract taken in March 2024. The data wasn’t available by GP practice, so an alternative method was followed. NACR was supplied with a list of practices, ICBs and the cluster to which each practice had been allocated by the Strategy Unit (SU). The data was then supplied to the SU aggregated into clusters and ICBs. The data relates to patients who started rehabilitation in the 2021 or 2022 calendar years and were ACS or HF according to the NHS England reporting. These two years are the most recent complete years and, following changes in the way rehabilitation has been delivered due to the Covid pandemic, most closely reflect the rehabilitation model now in place. Patients needed to have a GP practice code recorded to be matched to a cluster. A total of 42,637 patients, representing 48% of patient records, could not be matched because they had no GP code completed, or no matching GP code or GMC number added. Within the data included in this analysis, there are 11 clusters across 9 ICBs that have no rehabilitation data despite there being practices in the cluster. Due to these high levels of missing data, the two cardiac rehabilitation metrics have been included in the presentations of the index of disparity and activity rates at a national level, but excluded from the regional and ICB level presentations.
### Readmission within 30 days
The methodology used for this metric follows, as far as possible, that detailed by NHS Digital[^7]. The readmissions counted relate to those who had an elective PCI / CABG in the period 1/3/2022 to 28/2/2023.
[^7]: <https://digital.nhs.uk/data-and-information/publications/statistical/ccg-outcomes-indicator-set/specifications/3.2-emergency-readmissions-within-30-days-of-discharge-from-hospital_1_4>
### ONS Death records
ONS death records contain an encrypted HES ID which was used to link to outpatient, inpatient and A&E HES records, from which the latest GP practice recorded (on a spine traced record) for the patient was then assigned to the death record.
### CHD synthetic prevalence estimates
Public Health England (PHE) CHD synthetic prevalence estimates data was previously available from OHID via Fingertips. This data has now been removed from Fingertips and archived; however, OHID supplied it by email for this project following a request. The data covers 7564 practices from 2015, of which 6297 are current practices. To assign a prevalence to practices with missing data, a nearest neighbours methodology was used: the prevalence was imputed as the average prevalence of the practices within a 1.5km radius of the practice without data. Further details, including code with sample data, for this methodology are available in a blog post on the Strategy Unit Data Science website[^8].
[^8]: <https://the-strategy-unit.github.io/data_science/blogs/posts/2024-01-17_nearest_neighbour.html>
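
The radius-based imputation described above can be sketched in base R as follows. This is an illustrative sketch, not the project's implementation (which is described in the blog post): the practice table, its column names, the coordinates and the helper functions `haversine_km()` and `impute_prevalence()` are all hypothetical, and great-circle distance is assumed as the distance measure.

```r
# Hypothetical practice table: coordinates in decimal degrees, prevalence in %.
practices <- data.frame(
  practice_code = c("A", "B", "C", "D"),
  lat = c(52.4800, 52.4850, 52.4900, 53.0000),
  lon = c(-1.9000, -1.9050, -1.8950, -2.5000),
  prevalence = c(3.0, 5.0, NA, 4.0)
)

# Great-circle (haversine) distance in km between one point and many points.
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Fill each missing prevalence with the mean of practices within max_km.
impute_prevalence <- function(df, max_km = 1.5) {
  known <- df[!is.na(df$prevalence), ]
  for (i in which(is.na(df$prevalence))) {
    dist_km <- haversine_km(df$lat[i], df$lon[i], known$lat, known$lon)
    neighbours <- known$prevalence[dist_km <= max_km]
    # Leave the value as NA if no practice with data lies inside the radius.
    if (length(neighbours) > 0) df$prevalence[i] <- mean(neighbours)
  }
  df
}

imputed <- impute_prevalence(practices)
print(imputed)
```

In this toy example practice C has two neighbours with data inside 1.5km (A and B), so it receives their average; practice D is too far away to contribute.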
### Ethnicity by GP practice
Ethnicity by GP practice was imputed based on the Lower Super Output Area (LSOA) of residence of GP registrants and the ethnicity distribution of patients living in these LSOAs using the Census 2021 data.
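
As a rough illustration of this approach, the sketch below weights each LSOA's ethnicity distribution by the number of a practice's registrants living there. The data frames, column names and the two ethnic groups are hypothetical and far smaller than the real inputs.

```r
# Hypothetical inputs: registrants by practice and LSOA, and ethnicity % by LSOA.
registrants <- data.frame(
  practice_code = c("P1", "P1", "P2"),
  lsoa = c("L1", "L2", "L2"),
  patients = c(300, 100, 200)
)
lsoa_ethnicity <- data.frame(
  lsoa = c("L1", "L2"),
  pct_group_a = c(80, 40),
  pct_group_b = c(20, 60)
)

joined <- merge(registrants, lsoa_ethnicity, by = "lsoa")
pct_cols <- c("pct_group_a", "pct_group_b")

# Expected patients per group = registrants x LSOA share, summed by practice.
expected <- joined[pct_cols] / 100 * joined$patients
totals <- aggregate(
  cbind(expected, patients = joined$patients),
  by = list(practice_code = joined$practice_code),
  FUN = sum
)

# Convert the expected counts back to percentages of the practice list.
practice_ethnicity <- data.frame(
  practice_code = totals$practice_code,
  100 * totals[pct_cols] / totals$patients
)
print(practice_ethnicity)
```

Each practice's percentages sum to 100 because every registrant is assigned, in expectation, to exactly one group.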
#### GP list size by LSOA
Since metric data is mainly from the 2022/23 year, the October 2022 list size data was taken from the NHS Digital Patients Registered at a GP practice dataset[^9], as this is the data closest to the midpoint of the 2022/23 year. The GP to ICB mapping file was also downloaded and joined to the list size data to simplify analysis by ICB and region.
[^9]: <https://digital.nhs.uk/data-and-information/publications/statistical/patients-registered-at-a-gp-practice/october-2022>
#### Census 2021 Ethnicity by LSOA
The 2021 Census Ethnicity data was downloaded from the gov.uk website.[^10] Summary details of the census ethnicity data are also available, including descriptions of the standardised list of ethnic groups and the differences from the 2011 ethnic groups.[^11]
[^10]: <https://www.ethnicity-facts-figures.service.gov.uk/uk-population-by-ethnicity/demographics/age-groups/latest/#download-the-data>
[^11]: <https://www.ethnicity-facts-figures.service.gov.uk/uk-population-by-ethnicity/demographics/age-groups/latest/>
The 2021 Census Ethnicity data by LSOA uses the 2021 LSOAs, whereas the GP list size data uses the 2011 LSOAs. This results in gaps when the datasets are combined. To overcome this the following method was used:
- An LSOA lookup showing 2011 and 2021 LSOA parent and child relationships was joined to the 2021 Census Ethnicity data.
- Where one 2011 LSOA has been split into multiple 2021 LSOAs, the average ethnicity % of the 2021 child LSOAs was applied to the 2011 parent.
- Where several 2011 LSOAs have been combined into one new 2021 LSOA, the ethnicity % of the 2021 parent was applied to each of the 2011 children.
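
The mapping steps above can be sketched as a join followed by an average. The lookup and census tables below are hypothetical miniatures, with a single illustrative ethnicity column; the real lookup and its column names may differ.

```r
# Hypothetical lookup: one row per 2011-2021 LSOA pairing.
# "A" was split into "A1" and "A2"; "B" and "C" were merged into "BC";
# "D" is unchanged.
lookup <- data.frame(
  lsoa11 = c("A", "A", "B", "C", "D"),
  lsoa21 = c("A1", "A2", "BC", "BC", "D")
)

# Hypothetical Census 2021 ethnicity % for one group, by 2021 LSOA.
census21 <- data.frame(
  lsoa21 = c("A1", "A2", "BC", "D"),
  pct_group_a = c(10, 30, 50, 70)
)

joined <- merge(lookup, census21, by = "lsoa21")

# Averaging over each 2011 LSOA covers all three cases: split areas take
# the mean of their 2021 parts, while merged and unchanged areas have a
# single 2021 match, so the mean is simply that value.
ethnicity11 <- aggregate(pct_group_a ~ lsoa11, data = joined, FUN = mean)
print(ethnicity11)
```

The split LSOA "A" receives the simple average of its two 2021 parts, and the merged LSOAs "B" and "C" both inherit the value of "BC".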
Further details about item editing and imputation processes for Census 2021, England and Wales are available on the ONS website. This details techniques used by ONS with the aim of arriving at a fully populated clean database. These techniques include manual imputation and nearest neighbour donor imputation. For the England and Wales Census 2021 the imputation rate for ethnic group was 1.3%.[^12]
[^12]: <https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/populationestimates/methodologies/itemeditingandimputationprocessforcensus2021englandandwales>
### K-Medoids Clustering
K-medoids is an established and commonly used unsupervised machine learning technique. This clustering approach required data on the ethnicity of each practice's registered population, which was obtained using the methodology explained above (Ethnicity by GP practice). The data used in the clustering was the % of patients in each ethnic group per practice, such that for each practice the ethnic group percentages summed to 100%.
The K-medoids clustering was performed using the {cluster} package[^13] in R using partitioning (clustering) of the data into k clusters "around medoids" (PAM), which is a more robust version of K-means. The algorithm is based on the search for k representative objects or medoids among the observations of the dataset. After finding a set of k medoids, k clusters are constructed by assigning each observation to the nearest medoid. The goal is to find k representative objects which minimise the sum of the dissimilarities of the observations to their closest representative object.[^14]
[^13]: Maechler M, Rousseeuw P, Struyf A, Hubert M, Hornik K (2023). *cluster: Cluster Analysis Basics and Extensions*. R package version 2.1.6, <https://CRAN.R-project.org/package=cluster>.
[^14]: <https://cran.r-project.org/web/packages/cluster/cluster.pdf>
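
A minimal sketch of PAM clustering with `cluster::pam()` is shown below. The data is simulated (60 hypothetical practices and three ethnic groups rather than the 14 used in the report), and the choice of `k = 3` is purely for illustration.

```r
library(cluster)  # provides pam()

set.seed(42)
# Simulated data: 60 practices, % of patients in three ethnic groups,
# with each row summing to 100.
raw <- matrix(rexp(60 * 3), ncol = 3)
raw[1:20, 1] <- raw[1:20, 1] + 10    # practices dominated by group 1
raw[21:40, 2] <- raw[21:40, 2] + 10  # practices dominated by group 2
ethnicity_pct <- 100 * raw / rowSums(raw)
colnames(ethnicity_pct) <- c("group_1", "group_2", "group_3")

# Partition the practices around k medoids.
fit <- pam(ethnicity_pct, k = 3)

table(fit$clustering)  # number of practices assigned to each cluster
fit$medoids            # the representative practice profile of each cluster
```

Because each medoid is an actual observation, every cluster can be described by the ethnicity profile of a real practice, which helps when naming and interpreting clusters.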
Clustering experiments were conducted using a variety of ethnic breakdowns, such as 14 ethnic groups and 5 ethnic groups, as well as variations that excluded the % of White British patients or factored in age group. Techniques such as the elbow method, gap plots and scree plots were used to attempt to determine the optimal number of clusters. Consideration was given to combining some ethnic groups, and a correlation matrix was produced to identify ethnic groups that commonly featured together. It was also important to seek a number of clusters that could be defined easily, without characteristics overlapping to the point that the resulting clusters were too similar to each other. Finally, it was necessary to ensure a sufficient number of practices in each cluster: too few practices could result in small amounts of metric data when calculating activity rates and consequently lead to wide confidence intervals in the index of disparity calculations.
The final version selected, and presented here, used the percentage of the GP list size in each of the 14 Census 2021 ethnic groups. Five clusters were felt to give a sufficient number of describable clusters for which the index of disparity calculations could be performed. The chart below is an elbow plot generated for the 14 ethnic groups, showing an elbow at around 4 or 5 clusters.
```{r}
tar_read(elbow_plot)
```
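
For readers unfamiliar with the elbow method, the sketch below shows one way such a plot could be built. It is not the code behind the chart above: the data is simulated, and the average dissimilarity to the nearest medoid (the `objective` component returned by `cluster::pam()`) is assumed here as the quantity plotted against the number of clusters.

```r
library(cluster)

set.seed(1)
# Simulated data: three well-separated groups of 30 practices each,
# with rows summing to roughly 100.
centres <- rbind(c(90, 5, 5), c(40, 50, 10), c(20, 20, 60))
ethnicity_pct <- centres[rep(1:3, each = 30), ] +
  matrix(rnorm(90 * 3, sd = 2), ncol = 3)

# Average dissimilarity to the nearest medoid for k = 1 to 8.
ks <- 1:8
avg_dissimilarity <- sapply(ks, function(k) {
  pam(ethnicity_pct, k = k)$objective[["swap"]]
})

plot(ks, avg_dissimilarity, type = "b",
     xlab = "Number of clusters (k)",
     ylab = "Average dissimilarity to medoid")
```

The "elbow" is the value of k after which adding further clusters yields only small reductions in dissimilarity; in this simulated example it falls at k = 3 by construction.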
Table 6 shows the number of GP practices assigned to each of the 5 clusters and the characteristics of the clusters are described in the main body of the report above.
```{r}
final_data_full_cats_percent_5_clusters <- tar_read(final_data_full_cats_percent_5_clusters) |> as_tibble()
final_data_full_cats_percent_5_clusters |>
# Count the rows in each cluster
count(cluster) |>
gt()|>
tab_header(title = md("**Table 6** - *Number of practices per cluster*"))|>
cols_label(cluster=md("**Cluster**"),
n=md("**Number of practices**"))
```
### Index of Disparity calculations
#### Rate of activity
Calculations of the index of disparity begin by calculating an activity rate: a numerator of the activity (e.g. the number of elective procedures) divided by a denominator, which for metrics other than risk factors has been taken to be CHD synthetic prevalence (as a determinant of need).
Where the metric is in the risk factor domain the denominator used is GP practice list size (with the exception of the percentage of patients aged 45 or over who have a record of blood pressure in the preceding 5 years, where the 45+ list size has been used).
Calculating these rates for each cluster within each metric, and then comparing them to the global rate for the metric, makes it possible to identify how activity rates vary across the 5 clusters.
#### Index of disparity
The relative index of disparity indicates the extent to which the rate of an activity or event varies across groups. Having calculated the rates for each cluster within each metric, the disparity is calculated by taking the difference between each cluster's rate and the global rate and then determining the amount of activity by which the cluster varies from the global rate. These absolute differences are then summed across the 5 clusters and divided by twice the sum of the numerators. This produces a relative index of disparity for the metric, expressed as a %, which indicates the amount of activity that would need to be redistributed between clusters in order that event rates follow levels of need.
$$
\text{Index of disparity} = \frac{\left( \sum_{i=1}^{n} \lvert r_i - R \rvert \right) / n}{R} \times 100
$$

where $r_i$ = rate for group $i$, $R$ = total population rate and $n$ = number of groups
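
The sketch below evaluates both the formula and the prose description above on the Table 4 figures. It is an illustration only, not the report's code, and the variable names are hypothetical. Note that the two expressions are not algebraically identical: the formula averages the absolute rate differences with equal weight per group, whereas the prose description converts each difference into activity (weighting by the denominator) before dividing by twice the total activity.

```r
# Table 4 figures: flu vaccinations for patients aged 65+.
n <- c(3356002, 2638698, 1862430, 476616, 332728)  # numerators (activity)
d <- c(1228453, 1124040, 1181796, 571049, 415950)  # denominators (need)

r <- n / d            # cluster rates
R <- sum(n) / sum(d)  # global rate

# Formula as written: mean absolute deviation of the cluster rates from
# the global rate, as a percentage of the global rate.
iod_unweighted <- mean(abs(r - R)) / R * 100

# Prose description: absolute differences converted to activity, summed,
# and divided by twice the total activity.
iod_redistribution <- sum(abs(r - R) * d) / (2 * sum(n)) * 100

print(round(c(unweighted = iod_unweighted,
              redistribution = iod_redistribution), 1))
```

For these figures the redistribution form gives roughly 17%, in line with the flu vaccination disparity quoted in the conclusions; the unweighted form gives a larger value because the two smallest clusters have the largest rate differences.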
A report by Pearcy and Keppel (2002) presents the Index of Disparity as a summary measure of disparity across population groups and advocates its use for groups defined in terms of categories such as ethnicity.[^15]
[^15]: <https://journals.sagepub.com/doi/epdf/10.1093/phr/117.3.273>
### Confidence Intervals
95% confidence intervals are shown in the national charts in this report. Confidence intervals have been used rather than tests of statistical significance, as they present a clearer picture of the likely range of the disparity, and because of concerns about the suitability of statistical significance in this case.
> "Practitioners need to be aware that statistical significance differs from practical importance in that statistical significance is highly dependent upon sample size. For a large sample, a statistically significant result is likely to be obtained even when the actual magnitude of an effect is small and of little or no practical importance. On the other hand, for a small sample, it is quite likely that insufficient evidence of a statistically significant result will be obtained – even when there is, indeed, an effect of practical importance." (Hahn, G. J., Doganaksoy, N., Meeker, W. Q., 2019)[^16]
[^16]: <https://academic.oup.com/jrssig/article/16/4/20/7038025>
To estimate the confidence intervals for the index of disparity, a bootstrap method was adopted, assuming independence, generating 1000 random samples for each metric for which an index of disparity was calculated. The 2.5th and 97.5th percentiles were then taken for each metric to give the lower and upper limits of the 95% confidence interval.
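
A percentile bootstrap of this kind could be sketched as below. The resampling unit is an assumption: this sketch resamples practices with replacement within each cluster, which may differ from the scheme used in the analysis. The data is simulated, and the helper names `index_of_disparity()` and `resample_within_cluster()` are hypothetical.

```r
set.seed(2024)

# Simulated practice-level data: cluster, need (denominator) and activity.
practices <- data.frame(
  cluster = rep(1:5, each = 40),
  need = round(runif(200, 200, 800))
)
# Activity rates differ by cluster so that some disparity exists.
true_rate <- rep(c(2.5, 2.2, 1.8, 1.2, 1.0), each = 40)
practices$activity <- rpois(200, lambda = practices$need * true_rate)

# Index of disparity: share of activity to redistribute, as a %.
index_of_disparity <- function(df) {
  n <- tapply(df$activity, df$cluster, sum)
  d <- tapply(df$need, df$cluster, sum)
  R <- sum(n) / sum(d)
  sum(abs(n / d - R) * d) / (2 * sum(n)) * 100
}

# Assumed scheme: resample practices with replacement within each cluster.
resample_within_cluster <- function(df) {
  rows_by_cluster <- split(seq_len(nrow(df)), df$cluster)
  idx <- unlist(lapply(rows_by_cluster, function(rows) {
    sample(rows, length(rows), replace = TRUE)
  }))
  df[idx, ]
}

boot_iod <- replicate(
  1000,
  index_of_disparity(resample_within_cluster(practices))
)

estimate <- index_of_disparity(practices)
ci <- quantile(boot_iod, probs = c(0.025, 0.975))
print(round(c(estimate = estimate, ci), 2))
```

Resampling within clusters keeps the number of practices per cluster fixed in every replicate, so the spread of `boot_iod` reflects variation between practices rather than changes in cluster size.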
An explanation of bootstrap theory and confidence intervals is available in Practical Statistics for Data Scientists (Bruce, P. and Bruce, A., 2017).[^17]
[^17]: <https://www.oreilly.com/library/view/practical-statistics-for/9781491952955/>