

OBJECTIVE Continuous glucose monitoring (CGM) is essential in diabetes care and research; however, extracting key data (e.g., time above, in, or below range) from CGM reports is a manual, time-consuming, and inefficient process. Natural language processing (NLP) can extract data from unstructured sources (e.g., images), but its application to CGM reports remains unexplored. We aimed to evaluate the accuracy of extracting CGM data using NLP.

RESEARCH DESIGN AND METHODS We analyzed CGM reports stored as PDF files in the electronic health record at New York University Langone Health. Our algorithm pipeline consists of four steps: (1) performing optical character recognition (OCR) to obtain glucose matrix data from CGM reports, (2) determining the CGM document type based on keywords in the OCR results, (3) extracting glucose variables according to the CGM document type, and (4) storing the extracted glucose data in a structured database. Two experts with experience in CGM research and clinical practice independently reviewed 1% of the documents (n = 226) manually. We calculated accuracy (correct extraction of CGM data) by comparing the algorithm's results with the manual review.

RESULTS Of the documents analyzed, 36.8% were FreeStyle Libre reports and 63.2% were Dexcom reports. For information extraction, agreement between the two experts in evaluating Libre results was 99.93%. Compared with manual review, the algorithm's accuracy was 99.87% for Libre and 100.00% for Dexcom.

CONCLUSIONS Using an NLP approach to extract glucose data from CGM PDF files is feasible and accurate and could benefit clinical practice and diabetes research.
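Steps 2 to 4 of the pipeline, plus the accuracy comparison against manual review, can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it starts from text that an OCR engine has already produced (step 1 is out of scope), and the document-type keywords, the per-type field labels, the `cgm` table schema, and the field-level definition of accuracy are all hypothetical choices made for the example.

```python
import re
import sqlite3

# Step 2 (hypothetical keywords): phrases assumed to identify each report
# type in OCR text. Real reports may need different or additional cues.
DOC_TYPE_KEYWORDS = {
    "libre": ("freestyle libre", "libreview"),
    "dexcom": ("dexcom", "clarity"),
}

# Step 3 (hypothetical labels): the line label assumed to precede each
# percentage in each report type. Illustrative only.
FIELD_LABELS = {
    "libre": {"time_above_range": "High", "time_in_range": "Target Range",
              "time_below_range": "Low"},
    "dexcom": {"time_above_range": "High", "time_in_range": "In Range",
               "time_below_range": "Low"},
}


def classify_document(ocr_text):
    """Return the first document type whose keyword appears, else None."""
    lowered = ocr_text.lower()
    for doc_type, keywords in DOC_TYPE_KEYWORDS.items():
        if any(k in lowered for k in keywords):
            return doc_type
    return None


def extract_fields(ocr_text, doc_type):
    """Pull 'Label: NN%' values for the given type; missing fields -> None."""
    fields = {}
    for field, label in FIELD_LABELS[doc_type].items():
        # Anchor at line start so 'High' does not match 'Very High'.
        pattern = rf"^\s*{re.escape(label)}\s*:?\s*(\d+(?:\.\d+)?)\s*%"
        match = re.search(pattern, ocr_text, re.IGNORECASE | re.MULTILINE)
        fields[field] = float(match.group(1)) if match else None
    return fields


def store(conn, doc_id, doc_type, fields):
    """Step 4: write one extracted record to a structured table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cgm (doc_id TEXT PRIMARY KEY, doc_type TEXT,"
        " time_above_range REAL, time_in_range REAL, time_below_range REAL)"
    )
    conn.execute(
        "INSERT INTO cgm VALUES (?, ?, ?, ?, ?)",
        (doc_id, doc_type, fields["time_above_range"],
         fields["time_in_range"], fields["time_below_range"]),
    )


def accuracy(extracted, reviewed):
    """Assumed metric: share of reviewed fields the algorithm got exactly right."""
    correct = sum(extracted.get(k) == v for k, v in reviewed.items())
    return correct / len(reviewed)


if __name__ == "__main__":
    sample = "Dexcom Clarity Overview\nVery High 3%\nHigh 22%\nIn Range 70%\nLow 4%"
    doc_type = classify_document(sample)
    fields = extract_fields(sample, doc_type)
    conn = sqlite3.connect(":memory:")
    store(conn, "doc-001", doc_type, fields)
    print(fields)
    # {'time_above_range': 22.0, 'time_in_range': 70.0, 'time_below_range': 4.0}
```

Keying the extraction rules by document type keeps step 3 a simple lookup once step 2 has classified a report, so supporting another report layout means adding one keyword entry and one label map rather than new parsing code.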
Medical Journal
|15th Jan, 2026
|Nature Medicine's Advance Online Publication (AOP) table of contents.
Medical Journal
|15th Jan, 2026
|Wiley