Data Mining in Healthcare: Predictive Analytics and Outbreak Detection of Lyme Disease

Authors

  • Mazharul Islam Tusher, Department of Computer Science, Monroe University, New York, USA
  • Md Rayhan Hassan Mahin, Department of Computer Science, Monroe University, New York, USA
  • Estak Ahmed, Department of Computer Science, Monroe University, New York, USA
  • Redoyan Chowdhury, Department of Business Administration, International American University, California, LA, USA
  • Mujiba Shaima, Department of Computer Science, Monroe University, New York, USA

DOI:

https://doi.org/10.59675/P323

Keywords:

Lyme Disease; Random Forest Classifier; Linear Regression; Lasso Regression; Ridge Regression; Logistic Regression

Abstract

Lyme disease is the most common tick-borne disease in the United States, and its incidence is rising as a result of ecological and environmental factors. This study explores how data mining and machine learning can improve outbreak prediction and early detection of Lyme disease. Using CDC surveillance data from 1996 to 2023 and Python-based data analytics, it assesses seasonal trends, geographic hotspots, and demographic patterns. Through data preprocessing, visualization, and predictive modeling, including Random Forest classification, the study identifies possible epidemic zones with 75.7% accuracy. It focuses on high-risk states such as Pennsylvania and Delaware and on more vulnerable groups, such as children aged 5 to 14 and adults over 75. Analysis of historical data from Kaggle datasets with machine learning algorithms such as Linear Regression and Logistic Regression indicates that the models perform comparably. Random Forest models in particular show promise for identifying outbreak periods from spatial and temporal features. The findings suggest that data-driven surveillance is a powerful tool for public health decision-making. To improve accuracy, the study recommends incorporating socioeconomic and environmental factors into future models. This research highlights the importance of predictive analytics in public health responses and lays the groundwork for real-time Lyme disease monitoring systems.
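
The Random Forest classification step mentioned in the abstract might look roughly like the following minimal sketch. It assumes scikit-learn and NumPy are available and uses synthetic records with hypothetical features (month, latitude, longitude) and an invented labeling rule. It does not use the study's data or features, and it is not expected to reproduce the reported 75.7% accuracy.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def make_synthetic_records(n_rows=2000, seed=0):
    """Build hypothetical location-month records; NOT the study's data."""
    rng = np.random.default_rng(seed)
    month = rng.integers(1, 13, size=n_rows)             # temporal feature
    latitude = rng.uniform(25.0, 48.0, size=n_rows)      # spatial feature
    longitude = rng.uniform(-124.0, -67.0, size=n_rows)  # spatial feature
    # Assumed toy rule: risk is higher in May-August and at higher latitudes.
    summer = np.isin(month, [5, 6, 7, 8]).astype(float)
    risk = 0.6 * summer + 0.04 * (latitude - 25.0) + rng.normal(0.0, 0.3, n_rows)
    outbreak = (risk > 0.9).astype(int)                  # 1 = outbreak period
    X = np.column_stack([month, latitude, longitude])
    return X, outbreak


def fit_outbreak_classifier(X, y, seed=0):
    """Fit a Random Forest and report accuracy on a held-out split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return model, accuracy


X, y = make_synthetic_records()
model, accuracy = fit_outbreak_classifier(X, y)
print(f"Held-out accuracy on synthetic data: {accuracy:.3f}")
```

Evaluating on a held-out split, as here, is one common way to arrive at an accuracy figure of the kind the abstract reports; the split ratio and number of trees are illustrative choices, not values taken from the study.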

References

Centers for Disease Control and Prevention (CDC). Lyme disease data & statistics [Internet]. 2023 [cited 2025 Nov 12]. Available from: https://www.cdc.gov/lyme/data-research/facts-stats/surveillance-data-1.html

Data.gov. Lyme disease county-level incidence data [Internet]. 2023 [cited 2025 Nov 12]. Available from: https://catalog.data.gov/dataset/lymedisease-9211-county

Jordan RA, Gable S, Egizi A. Relevance of spatial and temporal trends in nymphal tick density and infection prevalence for public health and surveillance practice in long-term endemic areas: a case study in Monmouth County, NJ. J Med Entomol. 2022;59(4):1451–66. https://doi.org/10.1093/jme/tjac073

Steere AC, Redel H, Blaser M. Lyme disease (Lyme borreliosis) due to Borrelia burgdorferi. In: Mandell, Douglas, and Bennett's Principles and Practice of Infectious Diseases. 9th ed. Vol. 1-2. 2019.

Wooten RM, Ma Y, Yoder RA, Brown JP, Weis JH, Zachary JF, Kirschning CJ, Weis JJ. Toll-like receptor 2 is required for innate, but not acquired, host defense to Borrelia burgdorferi. J Immunol. 2002;168(1):348–55.

Mead P. Epidemiology of Lyme disease. Infect Dis Clin North Am. 2022;36(3):495–521. https://doi.org/10.1016/j.idc.2022.03.004

Grimm D, Tilly K, Byram R, Stewart PE, Krum JG, Bueschel DM, et al. Outer-surface protein C of the Lyme disease spirochete: a protein induced in ticks for infection of mammals. Proc Natl Acad Sci U S A. 2004;101(9):3142–7. https://doi.org/10.1073/pnas.0306845101

Hook SA, Jeon S, Niesobecki SA, Hansen AP, Meek JI, Bjork JKH, et al. Economic burden of reported Lyme disease in high-incidence areas, United States, 2014–2016. Emerg Infect Dis. 2022;28(6):1165–74. https://doi.org/10.3201/eid2806.211335

Kugeler KJ, Scotty E, Hinckley AF, Hook SA, Nawrocki CC, Nikolai AM, Linz AM, Meece J, Schotthoefer AM. Epidemiology of Lyme disease as identified through electronic health records in a large midwestern health system, 2016–2019. Open Forum Infect Dis. 2025;12(2):ofae758.

Penn Medicine. Lyme disease [Internet]. 2023 [cited 2025 Nov 12]. Available from: https://www.pennmedicine.org/for-patients-and-visitors/patient-information/conditions-treated-a-to-z/lyme-disease

Bostic TD, Kugeler KJ, Hinckley AF. Pregnancy Among Reported Lyme Disease Cases—United States, 1992–2019. Zoonoses and Public Health. 2024 Dec;71(8):972-7.

Cartter M, Lynfield R, Feldman KA, Hook SA, Hinckley AF. Lyme disease surveillance in the United States: looking for ways to cut the Gordian knot. Zoonoses Public Health. 2018;65(3):227–9. https://doi.org/10.1111/zph.12448

Bay Area Lyme Foundation. Lyme disease facts & statistics [Internet]. 2024 [cited 2025 Nov 12]. Available from: https://www.bayarealyme.org/about-lyme/lyme-disease-facts-statistics

Kilpatrick AM, Randolph SE. Drivers, dynamics, and control of emerging vector-borne zoonotic diseases. Lancet. 2012;380(9857):1946–55.

Statista. Lyme disease incidence by state [Internet]. 2022 [cited 2025 Nov 12]. Available from: https://www.statista.com/statistics/742936/incidence-rates-of-lyme-disease-cases-by-state/

Eisen L, Dolan MC. Evidence for personal protective measures… J Med Entomol. 2016;53(5):1063–92.

Shikamaru S. Lyme disease dataset [Internet]. 2022 [cited 2025 Nov 12]. Available from: https://www.kaggle.com/datasets/sshikamaru/lyme-disease-rashes

Birkhead GS, Klompas M, Shah NR. Uses of electronic health records for public health surveillance. Annu Rev Public Health. 2015;36:345–59.

Nelson CA, Saha S, Kugeler KJ, Delorey MJ, Shankar MB, Hinckley AF, Mead PS. Incidence of clinician-diagnosed Lyme disease, United States, 2005–2010. Emerg Infect Dis. 2015;21(9):1625–31.

Tahmid M. Lyme disease dataset [Internet]. 2024 [cited 2025 Nov 12]. Available from: https://www.kaggle.com/datasets/tahmidmir/lyme-disease-dataset

Breiman L. Random forests. Mach Learn. 2001;45(1):5–32. https://doi.org/10.1023/A:1010933404324

Published

01-11-2025

Issue

Section

Articles

How to Cite

Mazharul Islam Tusher, Md Rayhan Hassan Mahin, Estak Ahmed, Redoyan Chowdhury, & Mujiba Shaima. (2025). Data Mining in Healthcare: Predictive Analytics and Outbreak Detection of Lyme Disease. Academic International Journal of Pure Science, 3(2), 22-34. https://doi.org/10.59675/P323