This repository has been archived by the owner on Sep 26, 2024. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathcj-tulsa-05-19.Rmd
444 lines (312 loc) · 11.4 KB
/
cj-tulsa-05-19.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
---
title: "CJ: Tulsa County 05-19"
author:
- Leticia Calvillo, Fellow
- Open Justice Oklahoma
date: "`r Sys.Date()`"
output:
html_document:
number_sections: yes
toc: yes
toc_float: yes
pdf_document:
toc: yes
---
```{r setup, include=FALSE}
library(ojodb)
library(knitr)
nc <- function(x) {comma(x, digits = 0)}
```
Background information: Why are we researching this topic? What is the expected product? A blog post? Analysis for an internal or external partner?
This project aims to shed light on the extent of medical debt in Oklahoma. The goal is to produce blog posts, fact sheets, and/or reports that emphasize the scale of the medicial debt problem in Oklahoma and urge citizens to vote in favor of SQ 802.
> **Research Question**
How many medical debt cases were filled in Oklahoma County small claims court each year from 2005-2019?
# Data sources
Where is the data from? The ojo database? OSCN? ODCR? Crime in Oklahoma reports?
The data is from the ojo database.
## Timeframe
What years/months are you looking at? Year of case filing, first disposition, prison admission, etc.? Be as specific as possible.
The analysis aims to identify trends in medical debt cases over the past 15 years (2005-2019).
## Geography
What courts/counties/states are you looking at?
This analysis focuses on CJ court cases in Tulsa County.
## Variables
What variables did you use? What types of cases, crimes, etc.?
This project focuses on medical debt cases.
## Query
If you're using data from the OJO database, include the query you used to pull your data. This may be a function that starts `ojo_query_` or `dbGetQuery`.
``` {r}
CJ <- ojo_query_disps("TULSA", "CJ", 2005:2019)
```
This gives us over 364,318 rows of data. Each row in the table is a disposition; since there may be multiple dispositions per case, we should expect that the number of cases is somewhat lower.
# Checking Data
## Check for Completeness
If you're using court data, you can use the `ojo_check_comp` to see how complete the data is.
```{r}
#ojo_check_comp(CJ)
#completeness
```
The pct_complete column shows that we have nearly 100 percent of expected cases for each year.
## Check for NAs
Detail the NAs and blank items in your data and, if necessary and possible, fill them in.
``` {r}
nas_plaint<- CJ %>%
group_by(casenum) %>%
filter(all(is.na(iss_plaint) | iss_plaint == ""))
nas_plaint
nas_plaint <- nas_plaint %>%
group_by(casenum) %>%
slice(1)
year_nas_plaint <- nas_plaint%>%
ungroup %>%
count(file_year)
year_nas_plaint
```
Most of the missing plaintiff data occurs in years 2005.
``` {r}
nas_desc <- CJ %>%
group_by(casenum) %>%
filter(all(is.na(iss_desc) | iss_desc == ""))
nas_desc
nas_desc <- nas_desc%>%
group_by(casenum) %>%
slice(1)
year_nas_desc <- nas_desc%>%
ungroup %>%
count(file_year)
year_nas_desc
```
Next, we will add the debt amounts to our dataset.
## Extract debt amount from minutes
```{r}
mins <- ojo_query_mins("TULSA", "CJ", 2005:2019, min_code = "A/")
mins <- mins %>%
mutate(debt_amt = min_desc %>%
str_remove_all("[:alpha:]|\\$|,") %>%
str_squish %>%
as.numeric)
debt_amts <- mins %>%
select(casenum, debt_amt)
```
``` {r}
new <- left_join(CJ, debt_amts, by= "casenum")
CJ <- new
```
Find number of cases
``` {r}
#CJ <- CJ %>%
#group_by(casenum) %>%
#slice(1)
```
That gives us 102,082 lawsuits filed.
# Defining Measures
## Debt Cases
CJ cases generally involve a debt of more than 10,000. To narrow down our data set, we’ll limit to cases that involve debt.That information is found in the issue description (iss_desc) column.
``` {r}
CJ %>%
count(iss_desc) %>%
arrange(desc(n))
```
We see that over 132,788 cases are foreclosure. We can discard those.
We use the str_detect() function to look for rows with the character string “FORECLOSURE”. Putting a ! before it means we want the rows where the string is not detected.
``` {r}
debt <- CJ %>%
filter(!str_detect(iss_desc, "FORECLOSURE|PROMISSORY|BREACH"))
debt %>%
count(iss_desc) %>%
arrange(desc(n))
```
Medical debt cases will be filed with “Indebtedness” issues only. We discard all other cases.
``` {r}
debt <- debt %>%
filter(str_detect(iss_desc, "DEBT"))
debt %>%
count(iss_desc) %>%
arrange(desc(n))
```
## Medical Providers
Next, we will use the mutate() function to classify the cases that have a medical provider plaintiff.
Let’s start by finding the most common plaintiffs.
``` {r}
debt %>%
count(iss_plaint) %>%
arrange(desc(n))
```
Many of the most common plaintiffs are clearly medical providers or financial institutions. We use the case_when() function to classify all rows where one of the words is detected as being filed by a medical provider.
We can also classify the names that contain “BANK” or “LOAN” as not being medical providers. The TRUE category at the end will be applied to all rows that don’t fit any previous conditions.
``` {r}
debt <- debt %>%
mutate(plaint_type = case_when(str_detect(iss_plaint, "BANK|CREDIT|LOAN") ~ "OTHER",
str_detect(iss_plaint, "HEALTH|CLINIC|MEDICAL") ~ "MEDICAL",
TRUE ~ "UNKNOWN"))
```
To check how many we were able to classify, we’ll look at the count in the plaint_type column.
``` {r}
debt %>%
count(plaint_type) %>%
arrange(desc(n))
```
To classify the remaining 16,860, we comb through the plaintiffs and classify them. We can repeat the three steps above until we have enough of them classified.
To get the most common plaintiffs that are not classified yet:
``` {r}
debt %>%
filter(plaint_type == "UNKNOWN") %>%
count(iss_plaint) %>%
arrange(desc(n))
```
I came up with the following classification:
``` {r}
debt <- debt %>%
mutate(plaint_type = case_when(str_detect(iss_plaint, "BANK|LAWN|BOND|CASH|CREDIT|MIDLAND|TAX\\sCOMMISSION|FIA\\sCARD\\sSERVICES|CACH|TULSA\\sADJUSTMENT\\sBUREAU|TAX|CACH|ASSET\\sACCEPTANCE|UNIFUND\\sCCR\\sPARTNERS|JEFFERSON CAPITAL\\sSYSTEMS|CAVALRY|AUTO|CITY\\sOF\\sTULSA|STUDENT\\sLOAN|HOME\\sOPTIONS\\sMEDICAL\\sEQUIPMENT\\sINC|MEGA\\sLIFE") ~ "OTHER",
str_detect(iss_plaint, "HEALTH|CLINIC|MEDICAL|HOSPITAL|AHS|DENTAL|ORTHOPAEDIC|ORTHOPEDIC|HILLCREST\\sRIVERSIDE\\sINC|ONCOLOGY|ST\\sJOHN\\sSAPULPA|HOME\\sIV\\sCARE\\sINC|B\\sP\\sMD|NEONATOLOGY\\sGROUP|MEGA\\sLIFE\\sAND\\sHEALTH\\sINSURANCE\\sCOMPANY|MERITS\\sHEALTH\\sPRODUCTS\\sINC|APEX\\sMEDICAL\\sGAS\\sSYSTEMS\\sINC|MAXWELL\\sHEALTHCARE\\sCONSULTING\\sPLLC|TEJ\\sHOSPITALITY\\sINC|AMERICAN\\sINSTITUTE\\sOF\\sMEDICAL\\sTECHNOLOGY\\sLLC|TOTAL\\sMEDICAL\\sPERSONNEL\\sSTAFFING\\sLLC|TRI\\sANIM\\sHEALTH\\sSERVICES\\sINC|JEFFREY\\sDO") ~ "MEDICAL",
TRUE ~ "UNKNOWN"))
```
This gives us a count of:
```{r}
debt %>%
count(plaint_type) %>%
arrange(desc(n))
```
This gives us the most common UNKNOWN plaintiffs.
``` {r}
debt %>%
filter(plaint_type == "UNKNOWN") %>%
count(iss_plaint) %>%
arrange(desc(n))
```
This gives us the most common OTHER plaintiffs.
```{r}
debt %>%
filter(plaint_type == "OTHER") %>%
count(iss_plaint) %>%
arrange(desc(n))
```
This gives us the most common medical plaintiffs.
```{r}
debt %>%
filter(plaint_type == "MEDICAL") %>%
count(iss_plaint) %>%
arrange(desc(n))
```
To get a data frame with only the medical debt cases, filter on the plaint_type variable we created.
``` {r}
debt <- debt %>%
filter(plaint_type == "MEDICAL")
```
# Summarizing and Visualizing Data
## Summarizing to the case level
``` {r}
debt <- debt %>%
group_by(casenum) %>%
slice(1)
```
That gives us 1,740 lawsuits filed.
## Summarizing by year
To find out how many medicial debt cases were filed in Tulsa County small claims court each year we just need to count up the number of rows with each distinct file year in our data.
``` {r}
year_sum <- debt %>%
ungroup %>%
count(file_year)
```
Now we have a data frame with the number of cases filed in each year. To create a simple line plot with this data.
``` {r}
ggplot(year_sum, aes(file_year, n)) +
geom_line() +
geom_text(aes(y = n + 10, label = n), family = "Menlo", size = 3) +
xlab("Year") + ylab("Cases") +
ylim(0, NA) +
theme_ojo() +
theme(plot.title = element_text(hjust = 0.5)) +
theme(plot.subtitle = element_text(hjust = 0.5)) +
labs(title = "CJ medical debt cases",
subtitle = "Tulsa County 2005-2019")
```
## Summarizing by provider
The most common plaintiffs seem to be SAINT FRANCIS,ST JOHN, HILLCREST. Again, we’ll use the str_detect() function to classify the cases.
``` {r}
debt <- debt %>%
mutate(plaint_cat = case_when(str_detect(iss_plaint, "SAINT FRANCIS") ~ "SAINT FRANCIS",
str_detect(iss_plaint, "ST JOHN") ~ "ST JOHN",
str_detect(iss_plaint, "HILLCREST MEDICAL CENTER") ~ "HILLCREST",
TRUE ~ "OTHER"))
```
Now we can see how many cases each group filed each year:
``` {r}
plaint_sum <- debt %>%
ungroup %>% # Our data frame is still grouped by casenum from the code above; ungroup to summarize using broader groups
count(file_year, plaint_cat)
ggplot(plaint_sum, aes(file_year, n, group = plaint_cat, color = plaint_cat)) +
geom_line() +
xlab("Year") + ylab("Cases")+
ylim(0, NA) + # Extends y-axis down to zero
theme_ojo() + # Adds ojo styling to the plot
theme(legend.title=element_blank())+
theme(plot.title = element_text(hjust = 0.5, size = 14)) +
theme(plot.subtitle = element_text(hjust = 0.5, size = 13)) +
theme(legend.text=element_text(size = 9))+
theme(legend.position="bottom")+
labs(title = "Small claims medical debt cases\nin Tulsa County 2005-2019",
subtitle = "By medical provider") +
scale_color_manual(values = ojo_pal) # Gives lines colors from ojo's palette
```
### Find median debt amount
Created a new dataframe with the median medical debt amount by file_year.
```{r}
debt_new <-debt%>%
group_by(file_year) %>%
summarise(n_cases = n(),
n_cases_with_debt = sum(!is.na(debt_amt)),
avg = median(debt_amt, na.rm = TRUE)) %>%
mutate(pct_with_debt = n_cases_with_debt/n_cases)
debt_new
```
### Graph median medical debt by year
```{r}
ggplot(debt_new, aes(x= file_year, y= avg)) +
geom_line() +
#geom_text(aes(y = avg + 50, label = avg), family = "Menlo", size = 3) +
xlab("Year") + ylab("Amount") +
ylim(0, NA) + # Extends y-axis down to zero
theme_ojo() +
theme(plot.title = element_text(hjust = 0.5)) +
theme(plot.subtitle = element_text(hjust = 0.5)) +
labs(title = "Median medical debt amount",
subtitle = "Tulsa County 2005-2019")
```
## Summary 1
- How you summarized and why
To answer the original question, I summarized by year. To take a deeper look into what is behind the general trends, I also summarized by medical provider.
- How you visualized the data
I used a line graph to visualize the data.
- Potential leads and trends
```{r results="asis", echo=FALSE}
cat("
<style>
body {
padding-top: 63px;
}
h1.title.toc-ignore {
font-family: 'Pluto Sans';
font-weight: bold;
background-color: #F8D64E;
}
h1, h2, h3 {
font-family: 'Pluto Sans';
font-weight: bold;
}
#TOC {
font-family: 'Menlo'
}
.list-group-item.active, .list-group-item.active:focus, .list-group-item.active:hover {
font-weight: bold;
color: black;
background-color: #F8D64E;
}
p a {
color: black;
background-color: #F8D64E;
font-weight: bold;
}
</style>
")
```