This project analyzes unstructured Electronic Medical Records (EMR) to extract key information: patient demographics, diagnoses, medications, and vital statistics. The goal is to turn raw EMR text into structured summaries from which meaningful insights can be drawn.
The data source is a single text file of unstructured medical records, chosen for its relevance to diabetes. The analysis uses regular expressions to extract critical health information. Future enhancements may include further analysis or integration with structured datasets.
Patient Demographics:
- Number of male patients: 32
- Number of female patients: 2
- Total number of patients: 34
- Mean age of patients: 50.20 years
Diagnosis:
- Type 1 DM (diabetes mellitus)
Medications:
- Full list: ['glyburide - 5 mg daily', 'Potassium Chl - 25 meq po qd', 'glimepiride - 4 mg daily', 'Insulin Nph - 80 1 80 UNITS SC AC', 'Insulin Nph - 20 1 20 UNITS SC AC', 'Potassium Chl - 25 meq po qd', 'Insulin', 'Insulin', 'glimepiride - 4 mg daily', 'Insulin Nph - 20 1 20 UNITS SC AC', 'glyburide - 5 mg daily', 'Insulin', 'glimepiride - 4 mg daily']
- Unique Medications: ['Insulin Nph - 80 1 80 UNITS SC AC', 'glyburide - 5 mg daily', 'glimepiride - 4 mg daily', 'Insulin Nph - 20 1 20 UNITS SC AC', 'Potassium Chl - 25 meq po qd', 'Insulin']
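The full list contains repeats, and `list(set(...))` returns the unique entries in arbitrary order. A minimal standard-library sketch that deduplicates while keeping first-seen order (`dict.fromkeys`) and tallies each entry with `collections.Counter`. The list is copied from the results above; treating the text before " - " as the drug name is an assumption about the entry format.

```python
from collections import Counter

# Entries copied from the "Full list" above
medications = [
    'glyburide - 5 mg daily', 'Potassium Chl - 25 meq po qd',
    'glimepiride - 4 mg daily', 'Insulin Nph - 80 1 80 UNITS SC AC',
    'Insulin Nph - 20 1 20 UNITS SC AC', 'Potassium Chl - 25 meq po qd',
    'Insulin', 'Insulin', 'glimepiride - 4 mg daily',
    'Insulin Nph - 20 1 20 UNITS SC AC', 'glyburide - 5 mg daily',
    'Insulin', 'glimepiride - 4 mg daily',
]

# dict.fromkeys keeps the first occurrence of each entry, in order
unique_medications = list(dict.fromkeys(medications))

# How many times each exact entry (drug plus dose) was extracted
counts = Counter(medications)

# Group by drug name only (assumed to be the text before " - "), ignoring the dose
by_drug = Counter(entry.partition(" - ")[0] for entry in medications)

for name, n in by_drug.most_common():
    print(f"{name}: {n}")
```

Grouping by name shows, for example, that "Insulin Nph" appears at two different doses.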
Vital Statistics:
- Height: 68.0 in
- Weight: 168.0 lbs
- Mean Height: 68.0 in
- Mean Weight: 168.0 lbs
- Mean BMI: 25.54
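The reported BMI can be checked by hand. BMI is weight in kilograms divided by the square of height in meters; the sketch below uses the same conversion factors as the script (1 in = 0.0254 m, 1 lb = 0.453592 kg). The function name `bmi_from_imperial` is illustrative. With these factors, 25.54 is only consistent with a height of 68 inches; treating 68 as centimeters would give a BMI above 160.

```python
def bmi_from_imperial(height_in: float, weight_lb: float) -> float:
    """Return BMI (kg/m^2) from height in inches and weight in pounds."""
    height_m = height_in * 0.0254      # inches to meters
    weight_kg = weight_lb * 0.453592   # pounds to kilograms
    return weight_kg / height_m ** 2

bmi = bmi_from_imperial(68.0, 168.0)
print(f"{bmi:.2f}")  # 25.54
```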
Tools Used:
- Python
- Visual Studio Code
- Regular Expressions (Regex)
Project Status:
- Complete
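Code:
```python
import re

# Raw string so the backslash in the Windows path is not read as an escape sequence
file_path = r'D:\Text_Sample_Diabetes(RM).txt'

# Read the file as a list of lines; each line is treated as one record below
with open(file_path, 'r') as file:
    records = file.readlines()

print(file_path)
```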
```python
# Sex indicators; with re.IGNORECASE the single letters also match lowercase "m"/"f"
male_pattern = re.compile(r'\b(?:M|Male|man|boy|male)\b', re.IGNORECASE)
female_pattern = re.compile(r'\b(?:F|Female|woman|girl|female)\b', re.IGNORECASE)

male_count = 0
female_count = 0

# Each line is counted at most once; a line matching both patterns counts as male
for record in records:
    if male_pattern.search(record):
        male_count += 1
    elif female_pattern.search(record):
        female_count += 1

total_patients = male_count + female_count

print(f"Number of male patients: {male_count}")
print(f"Number of female patients: {female_count}")
print(f"Total number of patients: {total_patients}")
```
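Two caveats apply to these counts: each matching line is counted as one patient, and because of `re.IGNORECASE` the single-letter codes `M` and `F` also match a standalone lowercase "m" or "f" (for example a unit such as "5 m"). A minimal sketch of one alternative, assuming one record per line as the script does: the words stay case-insensitive, while the single letters must be upper case. The function `classify_sex` and the sample lines are hypothetical.

```python
import re
from typing import Optional

# Whole words, matched case-insensitively ("man" does not match inside "woman")
MALE_WORDS = re.compile(r'\b(?:male|man|boy)\b', re.IGNORECASE)
FEMALE_WORDS = re.compile(r'\b(?:female|woman|girl)\b', re.IGNORECASE)
# Single-letter codes, matched only in upper case to avoid stray "m"/"f"
MALE_CODE = re.compile(r'\bM\b')
FEMALE_CODE = re.compile(r'\bF\b')

def classify_sex(record: str) -> Optional[str]:
    """Return 'male', 'female', or None for one record line."""
    if MALE_WORDS.search(record) or MALE_CODE.search(record):
        return "male"
    if FEMALE_WORDS.search(record) or FEMALE_CODE.search(record):
        return "female"
    return None

for line in ["45 yr old M with DM", "62 year old woman", "walked 5 m today"]:
    print(line, "->", classify_sex(line))
```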
```python
# Ages written as "<number> yr", "<number> yrs" or "<number> year"
# (note: "years" is not matched, because \b fails between "year" and "s")
age_pattern = re.compile(r'\b(\d{1,3})\s*(?:yr|year|yrs)\b', re.IGNORECASE)

ages = []
for record in records:
    match = age_pattern.search(record)
    if match:
        # The extracted age is in the first capturing group
        age = int(match.group(1))
        ages.append(age)

mean_age = sum(ages) / len(ages) if ages else 0
print(f"Mean age of patients: {mean_age:.2f}")
```
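The age pattern above matches "yr", "yrs" and "year", but not "years" or hyphenated forms such as "62-year-old". A sketch of a broader pattern, assuming ages in the records are written in one of these forms; `AGE_PATTERN`, `extract_ages` and the sample lines are illustrative names and data.

```python
import re
from typing import List

# Matches "28 yr", "28 yrs", "28 year", "28 years" and "28-year-old"
AGE_PATTERN = re.compile(r'\b(\d{1,3})\s*-?\s*(?:yrs?|years?)\b', re.IGNORECASE)

def extract_ages(records: List[str]) -> List[int]:
    """Return the first age found in each record line that mentions one."""
    ages = []
    for record in records:
        match = AGE_PATTERN.search(record)
        if match:
            ages.append(int(match.group(1)))  # the age is the first capturing group
    return ages

sample = ["28 yr old M", "a 62-year-old woman", "aged 71 years", "no age here"]
ages = extract_ages(sample)
mean_age = sum(ages) / len(ages) if ages else 0
print(ages)               # [28, 62, 71]
print(f"{mean_age:.2f}")  # 53.67
```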
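```python
# Re-read the whole file as a single string for the marker-based patterns
with open(file_path, 'r') as file:
    medical_record_text = file.read()

# Diagnosis: text after "a:" up to the next "p:" or "s:" marker or the end of the text
# (without re.DOTALL, "." does not cross line breaks)
diagnosis_match = re.search(r"a:(.*?)(?=(?:p:|s:|$))", medical_record_text)
if diagnosis_match:
    diagnosis = diagnosis_match.group(1).strip()
    print(f"Diagnosis: {diagnosis}")

# Medications: each "p:administered <entry>", up to the next comma, semicolon or end
medication_matches = re.finditer(r"p:administered\s(.*?)(?=(?:,|;|$))", medical_record_text)
medications = [match.group(1).strip() for match in medication_matches]
print(f"Medications: {medications}")

# A set removes duplicates but does not preserve the original order
unique_medications = list(set(medications))
print("Unique Medications:", unique_medications)
```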
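Without `re.MULTILINE`, `$` matches only at the end of the text (or just before a final newline), so a diagnosis whose line ends before the next marker is not matched at all, and an unanchored "a:" can also match inside words such as "data:". A sketch of line-aware versions of both patterns; the sample record is hypothetical, and reading "s:", "a:" and "p:" as section markers is an assumption based on the script's patterns.

```python
import re

# Hypothetical multi-line record using the markers the script searches for
record = (
    "s: follow-up visit\n"
    "a: Hypertension\n"
    "p:administered glyburide - 5 mg daily, p:administered Insulin\n"
)

# re.MULTILINE lets "$" match at every line end; \b keeps "a:" from matching mid-word
DIAGNOSIS = re.compile(r"\ba:\s*(.*?)\s*(?=\b[ps]:|$)", re.MULTILINE)
MEDICATION = re.compile(r"\bp:administered\s+(.*?)\s*(?=[,;]|$)", re.MULTILINE)

diagnoses = [m.group(1) for m in DIAGNOSIS.finditer(record)]
medications = [m.group(1) for m in MEDICATION.finditer(record)]
print(diagnoses)    # ['Hypertension']
print(medications)  # ['glyburide - 5 mg daily', 'Insulin']
```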
```python
# Height and weight: first occurrence of each in the text.
# "\." matches a literal decimal point (an unescaped "." would match any character).
height_match = re.search(r"Height\s(\d+\.?\d*)\s*(cm|in)\b", medical_record_text)
weight_match = re.search(r"Weight\s(\d+\.?\d*)\s*lbs\b", medical_record_text)

heights = []  # in inches; a list in case several heights are collected
weights = []  # in pounds
if height_match:
    height = float(height_match.group(1))
    if height_match.group(2) == "cm":
        height /= 2.54  # normalize centimeters to inches
    heights.append(height)
    print(f"Height: {height} in")
if weight_match:
    weight = float(weight_match.group(1))
    weights.append(weight)
    print(f"Weight: {weight} lbs")

mean_height = sum(heights) / len(heights) if heights else None
mean_weight = sum(weights) / len(weights) if weights else None

# BMI = weight (kg) / height (m)^2, computed only when both values were found
if mean_height is not None and mean_weight is not None:
    mean_height_meters = mean_height * 0.0254  # inches to meters
    mean_weight_kg = mean_weight * 0.453592    # pounds to kilograms
    mean_bmi = round(mean_weight_kg / (mean_height_meters ** 2), 2)
    print(f"Mean Height: {mean_height} in")
    print(f"Mean Weight: {mean_weight} lbs")
    print(f"Mean BMI: {mean_bmi}")
```
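The vitals code keeps only the first Height and Weight found by `re.search`, although its comments anticipate several. A sketch that collects every reading with `re.findall` before averaging, assuming heights are recorded in inches and weights in pounds (as the BMI conversion does); the sample text is hypothetical.

```python
import re

# Hypothetical text with two sets of vitals
text = "Height 68 in Weight 168 lbs\nHeight 64.5 in Weight 140 lbs\n"

# With one capturing group, re.findall returns the list of captured numbers
heights = [float(h) for h in re.findall(r"Height\s(\d+\.?\d*)\s*in\b", text)]
weights = [float(w) for w in re.findall(r"Weight\s(\d+\.?\d*)\s*lbs\b", text)]

mean_height = sum(heights) / len(heights) if heights else None
mean_weight = sum(weights) / len(weights) if weights else None
print(heights, weights)          # [68.0, 64.5] [168.0, 140.0]
print(mean_height, mean_weight)  # 66.25 154.0
```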