Published on in Vol 13 (2024)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/50849, first published .
Consequences of Data Loss on Clinical Decision-Making in Continuous Glucose Monitoring: Retrospective Cohort Study

Consequences of Data Loss on Clinical Decision-Making in Continuous Glucose Monitoring: Retrospective Cohort Study

Consequences of Data Loss on Clinical Decision-Making in Continuous Glucose Monitoring: Retrospective Cohort Study

Original Paper

1Biomedical Signal and Systems, Faculty of Electrical Engineering, Mathematics And Computer Science, University of Twente, Enschede, Netherlands

2Internal Medicine, Ziekenhuisgroep Twente, Almelo, Netherlands

*these authors contributed equally

Corresponding Author:

Carlijn I R Braem, MSc

Biomedical Signal and Systems

Faculty of Electrical Engineering, Mathematics And Computer Science

University of Twente

PO BOX 217

Enschede, 7500 AE

Netherlands

Phone: 31 534898830

Email: c.i.r.braem@utwente.nl


Background: The impact of missing data on individual continuous glucose monitoring (CGM) data is unknown but can influence clinical decision-making for patients.

Objective: We aimed to investigate the consequences of data loss on glucose metrics in individual patient recordings from continuous glucose monitors and assess its implications on clinical decision-making.

Methods: The CGM data were collected from patients with type 1 and 2 diabetes using the FreeStyle Libre sensor (Abbott Diabetes Care). We selected 7-28 days of 24 hours of continuous data without any missing values from each individual patient. To mimic real-world data loss, missing data ranging from 5% to 50% were introduced into the data set. From this modified data set, clinical metrics including time below range (TBR), TBR level 2 (TBR2), and other common glucose metrics were calculated in the data sets with and that without data loss. Recordings in which glucose metrics deviated relevantly due to data loss, as determined by clinical experts, were defined as expert panel boundary error (εEPB). These errors were expressed as a percentage of the total number of recordings. The errors for the recordings with glucose management indicator <53 mmol/mol were investigated.

Results: A total of 84 patients contributed to 798 recordings over 28 days. With 5%-50% data loss for 7-28 days recordings, the εEPB varied from 0 out of 798 (0.0%) to 147 out of 736 (20.0%) for TBR and 0 out of 612 (0.0%) to 22 out of 408 (5.4%) recordings for TBR2. In the case of 14-day recordings, TBR and TBR2 episodes completely disappeared due to 30% data loss in 2 out of 786 (0.3%) and 32 out of 522 (6.1%) of the cases, respectively. However, the initial values of the disappeared TBR and TBR2 were relatively small (<0.1%). In the recordings with glucose management indicator <53 mmol/mol the εEPB was 9.6% for 14 days with 30% data loss.

Conclusions: With a maximum of 30% data loss in 14-day CGM recordings, there is minimal impact of missing data on the clinical interpretation of various glucose metrics.

Trial Registration: ClinicalTrials.gov NCT05584293; https://clinicaltrials.gov/study/NCT05584293

Interact J Med Res 2024;13:e50849

doi:10.2196/50849

Keywords



Improved glucose sensing techniques have led to the increased availability of continuous glucose monitoring (CGM) technology for patients with diabetes. These minimally invasive sensors measure the glucose concentration of the interstitial fluid every 1 or 5 minutes, representing the blood glucose concentration with an average delay of 10-12 minutes [1,2]. CGM provides insights into glucose concentration fluctuations throughout the day and enables comparisons over time using predefined glucose metrics. This offers better prevention of out-of-range values compared to glycated hemoglobin (HbA1c) measurements or traditional finger prick methods. Consequently, CGM has the potential to improve glycemic control and adherence to lifestyle and drug regimens [3-7]. Therefore, CGM devices have been included in clinical guidelines and standards of care for patients with type 1 and 2 diabetes [8].

Commonly used glucose metrics to assess glycemic control in clinical practice from CGM devices include time in range (TIR), representing the percentage of time spent within the glucose range of 70-180 mg/dL (3.9-10.0 mmol/L), time below range (TBR) in 2 levels, time above range (TAR) in 2 levels, glucose management indicator (GMI), and the coefficient of variation (CV). Although treatment targets are individualized, the general aim is to achieve a TIR of over 70% while minimizing TBR below 4% [9,10].

Incomplete data collection during CGM monitoring can affect the clinical interpretation of glucose metrics. Several factors such as loss of connectivity, sensor or reader malfunction, depleted battery, or delayed interaction with the device when data are only temporarily stored on the sensor, could lead to data loss. This data loss can introduce bias in the estimation of glucose metrics [11]. Evaluating the potential impact of data loss on clinical interpretations and determining acceptable levels of error is crucial for clinical decision-making [12].

The type and distribution characteristics of missing data are important when evaluating data loss. There are 3 types of missing data, they are, missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR) [13-15]. MCAR implies no systematic differences between participants with complete and incomplete data. MAR occurs when missing values are independent of the missing variables, but the pattern of missing data is dependent on time. In the case of MNAR, valuable information is lost from the data and there is no general method to manage the missing data properly [16,17]. Consecutive missing data, or gaps, have a certain gap size and incidence in the data, which can be represented by a gap probability distribution. Missing data are characterized as MCAR when the gap probability follows an exponential decline, with minimal affecting analysis and outcomes. However, MAR patterns skew the research outcomes, as missing data are more prevalent at certain times than others. Therefore, insight into the gap probability distribution is essential for understanding the influence of missing data on desired outcomes [15,18-20].

Currently, the recommendation for reliable interpretation of CGM data is to evaluate either 10 consecutive days without data loss or 14 consecutive days with a maximum data loss of 30% [8,10,21]. However, these studies have not investigated the impact of deviations from missing data in common CGM metrics on clinical decisions for individual patients [12,22,23]. Furthermore, previous research has determined that at least 14-15 days of CGM data provide a good estimation of CGM metrics compared to monitoring every 3 months or HbA1c [21,24]. These studies primarily focused on the correlation between CGM data and HbA1c, which does not give insight into the impact of missing data on the clinical interpretation of different CGM metrics. Additionally, it only reflects long-term glycemic control, overlooking the potential for assessing short-term variations afforded by CGM data. Therefore, this study aims to investigate the effects of data loss on glucose metrics in individual patient recordings and its influence on clinical decision-making.


Patient Inclusion and Data Collection

This study was performed in the Diabase cohort, which is a registry of adult patients with type 1 diabetes and those with type 2 diabetes who use CGM technology as part of their care. They are treated in Ziekenhuisgroep Twente (ZGT), a local hospital in the Netherlands (NCT05584293). The exclusion criteria were dependency on hemodialysis or inability to provide informed consent.

All patients included in the Diabase cohort between September 2020 and March 2022 were reported upon, which contains retrospective data from June 2016.

Ethical Considerations

The study was performed in accordance with the Declaration of Helsinki, the guidelines of good clinical practice. The Medical Research Ethics Committees United (MEC-U) in Nieuwegein, the Netherlands (registration AW23.009/W20.197), reviewed and approved the protocol. Prior to participation, patients provided informed consent to collect their (retrospective) glucose sensors and to retrieve relevant patient information from electronic patient files (age, gender, HbA1c, and BMI).

For this study, deidentified data were provided by the Diabase cohort, ensuring the confidentiality and privacy of participant information. Participants did not receive financial compensation for their participation in this study, as it solely involved the collection of data readily available from standard practice.

CGM Derived Clinical Metrics

We collected CGM data using FreeStyle Libre sensors, and the data were stored in LibreView (Abbott Diabetes Care). The sensors have a storage capacity of up to 8 hours of data and measure for 2 weeks.

Derived from the CGM data were the mean glucose, TIR (glucose between 70 and 180 mg/dL or 3.9 and 10.0 mmol/L; %), TBR (glucose <70 mg/dL or 3.9 mmol/L; %), TBR level 2 (TBR2; glucose <54 mg/dL or 3.0 mmol/L; %), TAR (glucose >180 mg/dL or 10.0 mmol/L; %), TAR level 2 (TAR2; glucose >250.2 mg/dL or 13.9 mmol/L; %), SD glucose (mmol/L), CV (%), GMI (mmol/mol), low blood glucose index (LBGI), high blood glucose index (HBGI), and the risk index (RI; Multimedia Appendix 1) [8,10,25-27].

Data Processing and Analysis

Data from LibreView were extracted as a CSV file and processed in Python 3.9 (extension Spyder 5.3.1; Python Software Foundation). Figure 1 illustrates the data processing steps from the original CGM data into the various outputs used in this study. Data from the FreeStyle Libre sensor were recorded at intervals of 13-19 minutes (with an average every 15 minutes). First, duplicate data and data from multiple sources within 14 minutes were removed from the data set. Subsequently, data were resampled to 1 sample per 15 minutes using linear interpolation, starting at midnight. Interpolated data points were marked as missing when the time difference between the 2 nearest original data points exceeded 19 minutes. These steps resulted in the preprocessed data set.

Figure 1. Data processing pipeline showing the data processing steps, the different data sets and the outputs. AE95: 95th percentile of absolute error; CGM: continuous glucose monitoring; εCT: error clinical target; εEPB: error expert panel boundaries.

From the preprocessed data set, the data loss characteristics were determined (illustrated in Figure 1). The percentage of missing data was calculated, and we determined the percentage of time that patients adhered to the data loss guidelines outlined in prior studies. These studies indicated that maintaining >70% of CGM data over the last 14 days or 10 days out of the 14 days strongly correlates with mean glucose, estimated HbA1c, TIR, and hyperglycemic metrics over a 3-month period. Consequently, our study adopts a 14-day measurement window with <30% data loss or a 10-day window with no data loss [8,21,24,28] The gap length and gap probability distribution were computed (Figure 1). To assess whether the missing data were MAR, we tested the fit of an exponential probability function to the gap probability distribution for gaps smaller than 96 samples [13]. The data loss dispersion over time was researched by comparing the average data loss per patient of the preprocessed CGM data set between various time periods—hours within the day, days in the week, business days (Monday to Friday) versus weekends, months, seasons, and days of the year.

It is important to consider the circadian rhythm of glucose metabolism during data analysis. Numerous studies have shown diurnal variations in glucose tolerance, insulin secretion, and peripheral insulin sensitivity, with poorer glycemic control observed in the evening and at night in healthy individuals [29,30]. To account for circadian rhythm, we constructed a baseline data set for each patient by combining 28 segments of 24-hour data without missing data (Figure 1). In cases that were where available, we merged multiple periods of 28 days per patient for further analysis, excluding patients with less than 28 segments of data. From the 28-day recordings, we created shorter recordings ranging from 7 up to 28 days.

Hereafter, data loss ranging from 5% to 50% was randomly introduced into the baseline data set in accordance with the gap distribution and data loss dispersion determined from the preprocessed data set. This means the gap distribution of the synthetic missing data is as close as possible to the gap distribution found over the whole population in the preprocessed data set. Also, synthetic data have higher chance of missing following the found data loss dispersion over time. Therefore, the synthetic missing data mimics the missing data as seen in day-to-day patient care.

Absolute Errors

The CGM metrics were calculated for each recording in the baseline and synthetic missing data set, and compared for every recording. The absolute error (AE), the median AE (MedAE), and the 95th percentile of the AE (AE95) as a result of missing data were calculated for all CGM metrics (Figure 1).

Clinical Target Errors (εCT)

The clinical targets follow clinical guidelines—<4% TBR, <1% TBR2, >70% TIR, <25% TAR, <5% TAR2, <36% CV, and <53 mmol/mol GMI [9,10]. The percentage of recordings that surpassed the clinical target cut-off because of missing data was determined as the clinical target error (εCT). This step is illustrated in Figure 1.

Expert Panel Boundary Error (εEPB)

A panel of experts consisting of a diabetes specialist nurse, a diabetes nurse, and a technical physician (a medical specialist in diabetes-related technology in health care) was interviewed. Each expert was interviewed individually to discuss clinically relevant changes in CGM metrics. During the interviews, CGM metrics TIR, TAR, TAR2, TBR, TBR2, GMI, and CV were discussed separately. The experts were instructed to consider a generic patient with diabetes and to evaluate each CGM metric separately. They were asked to identify when a change in the CGM metric would likely result in therapy alteration, indicating clinical relevance. A clinically relevant change can be dependent on the initial value of the CGM metric. For example, a change in TBR from 2% to 4% can have more impact than a change from 8% to 10%, even though the absolute change is the same. Therefore, the experts determined clinically relevant changes for various initial values of all CGM metrics (Multimedia Appendix 2). From these discussions, the strictest relevant change per CGM metric was selected as the expert panel boundary (Multimedia Appendix 2). The percentage of recordings that exceed the defined expert panel boundary due to data loss is defined as the expert panel boundary error (εEPB), this step is depicted in Figure 1. When data loss resulted in an εCT or εEPB of more than 5%, we assumed that data loss had considerable influence on the CGM metric and should be interpreted with caution, consistent with the clinically significant criterion for a 5% increase in TIR [8].

GMI Subgroup Analysis

To see whether missing data had a different influence on TBR, the recordings were divided into 3 GMI-based groups (<53, 53-64, and ≥64 mmol/mol), based on 3 commonly used HbA1c categories as determined by the American Diabetes Association [31]. Hereafter, the MedAE, εCT, and εEPB were calculated for TBR in the 3 GMI groups.

Statistical Analysis

The median and IQR were calculated for the clinical and sensor data characteristics, that is, the recording length, data loss, and overall CGM metrics.

The fit between the gap distribution and the exponential probability density function was tested with a 2-sided Kolmogorov-Smirnov test and the sum of the squared error. Differences in data loss dispersion between several time periods were evaluated using the Kruskal-Wallis test with post hoc Dunn test and Bonferroni correction. Differences between the glucose metrics between the preprocessed data set and the baseline data set with 28 days were evaluated using Kruskal-Wallis test with Bonferroni correction. For these statistical analyses, the packages SciPy and scikit-posthocs in Python were used [32,33]. All data visualizations were made with the Python package Matplotlib [34].

Scatterplots were generated for CGM metrics from 14-day recordings with complete data versus those with 30% synthetic missing data. Additionally, histograms of the AE were constructed from 14-day recordings with 30% data loss in TBR and TBR2. The histogram bin widths were determined with Freedman-Diaconis rule. A P value <.01 was considered significant.


Data and Patient Characteristics

Between September 2020 and March 2022, 92 patients using a FreeStyle Libre CGM device were included in the Diabase study. Descriptive statistics of the population can be found in Table 1.

Table 1. Characteristics of the population and their data used in this research.

Values
Type 1 diabetes, n (%)79 (95.8)
Type 2 diabetes, n (%)13 (14.2)
Men, n (%)47 (51)
Age (years), median (IQR)52 (37.3-60.5)
BMI (kg/m2), median (IQR)26.2 (23.1-29.4)
HbA1ca (%), median (IQR)7.6 (7.0-8.3)
HbA1c (mmol/mol), median (IQR)60 (53.3-67.0)
Recording length (days), median (IQR)655 (499-925)
Data loss (%), median (IQR)13.5 (6.3-35.6)
Time recordings had 0% data loss for 10 days (%), median (IQR)0.0 (0.0-0.3)
Time recordings had <30% data loss for 14 days (%), median (IQR)88.9 (63.0-98.7)

aHbA1c: glycated hemoglobin.

Data Loss Statistics

The gap probability distribution of the preprocessed data set differs significantly from a fitted exponential probability density function (P=.002). This means the data loss is not completely random (Multimedia Appendix 3).

During the hours of 10-11 PM, 11 PM to midnight, and midnight to 1 AM, a higher data loss of 35.8%, 43.3%, and 35.3%, respectively, was observed compared to other hours of the day with an average data loss of 10.4% (IQR 2.7%-30.5%; P<.001). Therefore, to mimic the unequal dispersion we created synthetic missing data following the probability of missing data during the day. No significant differences in data loss were found between other time periods.

Baseline Data Set

Among the 92 patients, 84 (91%) patients had sufficient data to construct a data set of 28 segments with complete 24-hour segments of data. These 84 patients contributed to a total of 798 recordings over 28 days, forming the baseline data set. Each patient provided median of 7.0 (IQR 3.8-14.0) in 28-day recordings. The median (IQR) for TIR, TBR, TBR2, TAR, and TAR2 of all 28-day recordings were 60.3% (53.2%-67.8%), 3.3% (1.4%-5.8%), 0.3% (0.0%-1.0%), 35.4% (27.4%-43.3%), and 9.4% (5.5%-13.9%), respectively. No significant differences were found between all the clinical metrics of the preprocessed data and the 28-day baseline data set, indicating that the selected baseline data set reflects the original data.

Expert Panel

The experts unanimously agreed that the most important metrics were TBR and TBR2, resulting in strict expert panel boundaries. In the TBR range of 0%-4% and the TBR2 range of 0%-5%, a difference of 1% was deemed clinically relevant (Multimedia Appendix 2). However, as the TBR increased, the experts allowed for higher changes, accepting a maximum of 5% for TBR and 10% for TBR2 when the initial value was 100%. Expert panel boundaries for the remaining glucose metrics can be found in Multimedia Appendix 2.

CGM Metrics Without and Those With Synthetic Data Loss

In the baseline data set, 734 (92.0%), 408 (51.1%), 796 (99.8%), and 767 (96.1%) of the total (N=798) 7-day recordings contained TBR, TBR2, TAR, and TAR2, respectively. For 28-day recordings, this is 798 (100%), 611 (76.6%), 798 (100%), and 790 (99.0%) for TBR, TBR2, TAR, and TAR2, respectively.

The scatterplots of Figure 2A represent the relation between the original 14-day data of TBR and TBR2, and with 30% data loss, per individual recording. These figures show that 30% data loss in these metrics results in small deviations from the true value with data loss, as all data points are close to the identity line. More recordings fell into the εEPB area for TBR, as the expert panel boundary was stricter compared to TBR2. The histograms of Figure 2B show the AE, MedAE, and AE95 for these CGM metrics as a result of 30% data loss in a 14-day recording. The MedAE in the histograms is small, indicating that the majority of errors are small. However, the histograms also display a long tail, indicating that there are instances where the AE is larger than the MedAE.

Figure 3 shows the values of AE95, εCT, and εEPB for the CGM metrics TBR and TBR2, with missing data increasing from 5% to 50% over a period of 7 to 28 days. This analysis includes only the recordings that had TBR episodes. As expected, the AE95, εCT, and εEPB increase as the percentage of data loss increases. When more days are available, the influence of data loss is reduced (Figure 3A). Applying the current guidelines of 30% data loss for a 14-day recording period, we observed an AE95 of 1.0% for TBR and 0.5% for TBR2. The corresponding εCT values were 29 out of 786 (3.7%) recordings for TBR and 28 out of 522 (5.4%) recordings for TBR2 and the εEPB were 28 out of 786 recordings (5.0%) for TBR 1 out of 522 recordings and 0.2% for TBR2. These findings suggest that the impact of missing data on CGM metrics can vary significantly depending on the specific metric and recording period used.

Figure 2. (A) The density scatterplots of the TBR and (B) TBR2 indicates the relation between the 14-day-long original data without missing data (horizontal axis) and the data with 30% data loss (vertical axis). More detail of the critical areas of TBR and TBR2 are illustrated in (C) and (D). The color bar indicates the Gaussian kernel-density estimate of the recordings. The black dashed lines represent the expert panel boundaries, and the horizontal and vertical black dotted lines represent the clinical targets. Values falling outside the clinical target and expert panel boundary, in the hatched areas, are labeled as errors (εCT and εEPB). (E) The histograms show the AE (%) with the median AE (MedAE) and AE95 as a result of 30% data loss of the TBR and (F) TBR2. AE: absolute error; AE95: 95th percentile of AE; CT: clinical target; EPB: expert panel boundary; MedAE: median absolure error; TBR: time below range; TBR2: time below range level 2; εCT: clinical target errors; εEPB: expert panel boundary errors.
Figure 3. (A) The εEPB of 5%-50% data loss in recordings (horizontal axis) of 7-28 days (vertical axis) of TBR. The white line indicates the boundary where the εEPB exceeds 5%. (B,C) show the AE95, (D,E) the clinical target errors (εCT), and (F,G) the εEPB of TBR and TBR2 respectively of 10%, 20%, 30%, 40%, and 50% data loss in recordings of 7, 14, 21, and 28 days. The color bar indicates the errors in percentages. AE95: 95th percentile of the absolute error; TBR: time below range; TBR2: time below range level 2; εEPB: expert panel boundary errors; εCT: clinical target errors.

For a 14-day measurement period with 30% data loss, the AE95 was 2.4%, 1.3%, and 1.0%, εCT was 17 (2.1%), 23 (2.9%), and 14 (1.8%) and εEPB was 0 (0.0%), 0 (0.0%), and 0 (0.0%) for TIR, CV, and GMI, respectively (Figures S1 and S2 in Multimedia Appendix 4). For TAR and TAR2 the εEPB was 0.0% for all cases (Figure S3 in Multimedia Appendix 4).

TBR and TBR2 have the highest errors, especially with greater data loss in shorter recordings, with a maximum εEPB of 147 out of 736 (20.0%) of the recordings for TBR and 22 out of 408 (5.4%) of the recordings for TBR2, compared to a maximum of 5 out of 798 (0.6%) for TIR, CV, and GMI. In 2 (0.3%) of the 14-day recordings, the TBR value went from 0.2% to 0.0%, resulting in its disappearance due to data loss. Similarly, the TBR2 metric disappeared in 32 (6.1%) recordings in the 14-day recordings with 30% data loss. The median TBR2 value of the original recordings was 0.07% (IQR 0.07%-0.15%). No other metrics disappeared due to data loss.

For the SD, LBGI, HBGI, and RI, only the AE95 was available. The overall AE95 was low, with 0.1 for SD, 0.2 for LBGI, and 0.6 for HBGI and RI. The highest AE95 of 0.8% was observed for the HBGI and RI metrics in a 7-day recording with 50% data loss (Figure S4 in Multimedia Appendix 4).

GMI Subgroup Analysis

The recordings were divided into 260 (32.6%) low GMI (<54 mmol/mol), 445 (55.8%) moderately elevated (53-64 mmol/mol), and 93 (11.7%) elevated GMI recordings (≥64 mmol/mol). Figure 4 shows the values of AE95, εCT, and εEPB for TBR, with missing data increasing from 5% to 50% over a period of 7 to 28 days for the GMI subgroups. The error caused by missing data is highest in the low GMI group (Figures 4A, 4D, and 4G). The AE95, εCT, and εEPB are 1.2%, 10 (3.8%), and 25 out of recordings 260 (9.6%), respectively, for a recording of 14 days with 30% data loss. This means that for 9.6% of the recordings a clinical expert would see a relevant change.

Figure 4. (A-C) The AE95 are shown, (D-F) the εCT, and (G-I) the εEPB of TBR within 3 GMI groups of <53 mmol/mol, 53 mmol/mol≤ GMI <64 mmol/mol, and GMI ≥64 mmol/mol. These panels illustrate the impact of 10%, 20%, 30%, 40%, and 50% data loss in recordings of 7, 14, 21, and 28 days. The color bar indicates the errors in percentages. AE95: 95th percentile of the absolute error; GMI: glucose management indicator; TBR: time below range; εCT: clinical target errors; εEPB: expert panel boundary errors.

Principal Findings

This study provides a thorough analysis of missing data’s impact on real-world CGM recordings for patients with diabetes. By merging data analysis and expert evaluations, it assesses the clinical implications of missing data on CGM metrics, improving our understanding of its practical effects and challenges. Findings indicate that a minimum of 14 days of glucose data collection with no more than 30% missing data suffice for clinical decision-making, ensuring adequate patient care.

The key finding of this study is that the interpretation of TBR metrics is more vulnerable to greater clinical consequences due to missing data compared to other metrics. The results indicate that up to 30% data loss in a 14-day recording results in a misinterpretation of the glucose metric in 5.0% of the time. Therefore, it should be realized that 30% data loss over a 14-day measurement will occasionally lead to false clinical interpretations of TBR. Furthermore, when analyzing the data by GMI levels, it is notable that in 9.6% of recordings featuring low GMI levels, a clinical expert would observe a relevant change due to missing data on TBR. These findings underscore the substantial impact of missing data on TBR, emphasizing the importance of cautious interpretation in clinical practice.

Similarly, for TBR2, 14-day recordings with 30% data loss resulted in misinterpretation in 0.2% of cases, and the complete disappearance of TBR2 episodes occurred in 6.1% of the cases. One might, therefore, suggest that severe hypoglycemic episodes would be missed, which can have potentially serious clinical consequences as symptoms might not always be recognized when they occur during nighttime [35]. However, this complete loss of TBR2 episodes occurs only when the actual TBR2 value is below the clinical target of 1%, and thus, would not have required any action. In contrast, missing data had limited influence on the clinical interpretation of other glucose metrics such as TIR, CV, GMI, TAR and TAR2. This limited influence was partly caused by broader expert panel boundaries for these metrics compared to TBR and TBR2. The expert panel accepted errors of 1%-5% for TBR2 and 1%-10% for TBR, while the accepted errors for TIR, TAR, and TAR2 ranged between 5% and 10%. When TBR is not a primary concern, 14 days of CGM data can be interpreted with 50% missing data. Even measurements shorter than 14 days could be used, but the representativeness of the data for long-term glucose values will diminish, and the correlation of the metrics with HbA1c may decrease [21].

Based on the findings of this study, we propose several recommendations. First, we support the current recommendation of having a minimum of 70% data available in a CGM recording of 14 days. However, health care practitioners should be aware that there is still a chance of misinterpreting TBR. The guideline of a 10-day CGM recording with no data loss might be impractical, as the real-life CGM data analyzed in this study rarely met this criterion [8]. Therefore, our second recommendation is to use a measurement period of at least 14 days, which aligns with the durability of the current sensors. However, this result should be validated by other studies with different glucose sensors.

In this study, the expert panel boundaries are an important feature that may contribute to current clinical guidelines as they give insight into when CGM metrics result in clinically relevant changes due to data loss. The current clinical targets for most CGM metrics are valuable in the clinical setting to serve as goals for optimal glucose management. However, the targets are a limited tool for determining when a patient has a clinically relevant improvement or deterioration of their glucose management. When assessing the clinical consequences, the clinical target may not always be useful because significant changes due to data loss may not result in a different categorization if the initial value is already far from the target. However, such changes may still be clinically relevant.

In the literature, the CGM metrics SD, LBGI, HBGI, and RI are defined [21]. However, there are currently no targets defined for these metrics. Consequently, the expert panel was unable to determine such a boundary for these metrics. Nevertheless, AE95 can provide insight into the error size, but not indicate the impact on clinical decision-making. Previous studies investigating the influence of missing data on glycemic variability metrics reported small MedAEs [12,22]. This corresponds to our findings of small MedAEs, suggesting that even with excessive data loss, the CGM metrics would not be altered significantly. However, the AE95 is in some cases relatively high, indicating that in 5% of the population data loss can have quite significant clinical consequences. Therefore, we reported on AE95 instead of MedAE in this study.

The data loss in the real-world CGM data was MNAR, as the gap probability distribution did not follow an exponential function. Next to that, the chance of missing data during the night (10 PM-1 AM) was higher. MNAR data loss has a profound impact on the effects of missing data and synthetic data loss cannot be applied arbitrarily. Therefore, in this research, we mimicked these characteristics of data loss to create a simulation that closely resembled the actual missing data experienced by patients using CGM devices in everyday life.

Limitations

Several potential limitations of this study need to be addressed. In clinical practice, all CGM metrics are typically evaluated together rather than individually which was done to determine the clinical expert boundaries. Additionally, these metrics will be reviewed at every clinical visit, further reducing the actual risk that missing data will cause a change in treatment. Another potential limitation of our study is the small number of experts involved in defining the boundaries for CGM metrics. The subjectivity and potential biases inherent in expert input can pose challenges. Furthermore, the cut-off to accept an εEPB at 5% was chosen arbitrarily and choosing a different cut-off could influence the interpretation of the results. It is important to note that defining different boundaries than those presented in this study could influence the observed errors.

It should also be considered that this research has been performed on data measured only with the FreeStyle Libre 1 sensor, and some findings might differ for other devices. For example, the 8-hour data storage capacity may explain why more data are lost during the early night (10 PM-1 AM), which is a critical time for glucose measurements [36]. However, the presented research methodology can be easily adopted and applied to other CGM devices or populations. The results presented in our study were obtained from patients with diabetes, predominantly type 1 diabetes, who were treated in a hospital in the Netherlands. Differences in patient demographics, disease progression, and device accuracy could impact the applicability of findings across diverse populations. Therefore, caution should be exercised when extrapolating these results to other populations. However, the average HbA1c value of 7.6% (IQR 7.0%-8.3%; 60.0 mmol/mol, IQR 53.3-67.0 mmol/mol) was comparable with a large cohort from Germany and Austria, with a mean HbA1c of 7.8% (IQR 6.9%-8.9%; 62, IQR 52.0-74.0 mmol/mol), suggesting some level of generalizability [37]. Furthermore, the presented methodology can be applied to investigate the consequences of missing data in other diabetes populations using CGM.

Methodological Decisions

Some noteworthy methodological decisions were made in this study. First, we decided to include all the available 24-hour windows, thus including multiple 28-day recordings per patient in the data set. With this approach, there were no significant differences in CGM metrics between the preprocessed data with the baseline data set, validating the inclusion of multiple recordings. Second, not all patients used their CGM device continuously. Patients may have periods of several months where they did not use a CGM device. We marked this as missing data, which may have led to a potential overestimation of the reported data loss, compared to day-to-day CGM use. Third, we decided to create recordings of 7 to 28 days to study a range of CGM measurement lengths. The studies of Riddlesworth et al [21] and Xing et al [24] suggest that at least 14-15 days of CGM data provide a good estimation of CGM metrics compared to monitoring every 3 months or HbA1c. Also, Akturk et al [23] state that while the optimal recording length depends on the size of the gaps, duration of 14 days generally proves to be adequate. Using measurements of 7-28 days, we can ensure that an adequate measurement duration was covered.

Future Research

Future research should focus on including a heterogeneous population of patients with type 1, type 2, and other subtypes of diabetes. Next to that, more commonly used sensors should be added to the analysis to give a more generalizable result. Finally, the expert panel should be expanded and implemented similarly to the Parkes and Clarke error grids, involving a larger and more diverse panel of experts, to enhance the reliability and generalizability of the established boundaries [38,39].

Conclusions

To conclude, our aim was to examine the impact of data loss on glucose metrics within individual patient recordings from continuous glucose monitors and its implications for clinical decision-making. Through integrated data analysis and expert evaluations, we underscore the importance of comprehending missing data’s clinical consequences and recommend a maximum of 30% missing data in 14-day CGM recordings to enhance accurate interpretation and glucose management, acknowledging the possibility of misinterpreting TBR even with this threshold. For reliable interpretation of TBR in recordings with a low GMI, data loss should be below10%. Further research is needed to explore the consequences of missing data in diverse populations using various CGM devices, emphasizing the importance of comprehensive data collection for optimal glucose management and clinical decision-making.

Acknowledgments

The authors used the generative artificial intelligence tool ChatGPT 3.5 by OpenAI (2022) for language checking, suggestions on sentence structure, and improving overall readability. The authors thank Anouk Veldhuis, Ziekenhuisgroep Twente, for her contribution to Diabase. This research was financially supported by the grant Exceptional and Deep Intelligent Coach (EDIC) financed by the Nederlandse Organisatie voor Wetenschappelijk Onderzoek (NWO), grant (628.011.02). NdB and CIRB are appointed by this grant. Diabase was financially supported by an unrestricted grant from the Pioneers in Health Care Innovation Fund 2020 established by the University of Twente, Medisch Spectrum Twente, and Ziekenhuisgroep Twente.

Data Availability

The data sets generated and analyzed during this study are not publicly available due to the ongoing inclusion of subjects but are available from the principal investigator (TU) on reasonable request.

Authors' Contributions

Conceptualization, methodology, and validation were performed by NdB, CIRB, USY, MMRVH, and GDL. Formal analysis, visualization, and software were completed by NdB and CIRB. Writing, the original draft preparation was done by NdB and CIRB. Writing, the review and editing were performed by NdB, CIRB, USY, TU, MMRVH, GDL, PHV, and HJH. Supervision was performed by MMRVH, GDL, PHV, and HJH. All authors have read and agreed to the published version of the study. GDL is the guarantor of this work and, as such, had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Conflicts of Interest

None declared.

Multimedia Appendix 1

Formulas of the clinical continuous glucose monitoring (CGM) metrics.

DOCX File , 28 KB

Multimedia Appendix 2

Expert panel boundaries and the clinically relevant change, using the initial metric value as a baseline, for the seven most commonly used clinical continuous glucose monitoring (CGM) metrics, along with their respective clinical targets.

DOCX File , 24 KB

Multimedia Appendix 3

Description of the method and results of fitting the gap probability distribution of preprocessed continuous glucose monitoring (CGM) data with the exponential and other common probability mass functions.

DOCX File , 54 KB

Multimedia Appendix 4

Density scatterplots for time in range (TIR) and coefficient of variation (CV) and the error plots for glucose metrics TIR, CV, glucose management indicator (GMI), time above range (TAR), TAR level 2 (TAR2), SD, low blood glucose index (LBGI), high blood glucose index (HBGI), and risk index (RI).

DOCX File , 968 KB

  1. Schmelzeisen-Redeker G, Schoemaker M, Kirchsteiger H, Freckmann G, Heinemann L, Del Re L. Time delay of CGM sensors: relevance, causes, and countermeasures. J Diabetes Sci Technol. 2015;9(5):1006-1015. [FREE Full text] [CrossRef] [Medline]
  2. Zaharieva DP, Turksoy K, McGaugh SM, Pooni R, Vienneau T, Ly T, et al. Lag time remains with newer real-time continuous glucose monitoring technology during aerobic exercise in adults living with type 1 diabetes. Diabetes Technol Ther. 2019;21(6):313-321. [FREE Full text] [CrossRef] [Medline]
  3. Jangam S, Dunn T, Xu Y, Hayter G, Ajjan RA. Flash glucose monitoring improves glycemia in higher risk patients: a longitudinal, observational study under real-life settings. BMJ Open Diabetes Res Care. 2019;7(1):e000611. [FREE Full text] [CrossRef] [Medline]
  4. Dunn TC, Xu Y, Hayter G, Ajjan RA. Real-world flash glucose monitoring patterns and associations between self-monitoring frequency and glycaemic measures: a European analysis of over 60 million glucose tests. Diabetes Res Clin Pract. 2018;137:37-46. [FREE Full text] [CrossRef] [Medline]
  5. Rodbard D. Continuous glucose monitoring: a review of recent studies demonstrating improved glycemic outcomes. Diabetes Technol Ther. 2017;19(S3):S25-S37. [FREE Full text] [CrossRef] [Medline]
  6. Taylor PJ, Thompson CH, Luscombe-Marsh ND, Wycherley TP, Wittert G, Brinkworth GD. Efficacy of real-time continuous glucose monitoring to improve effects of a prescriptive lifestyle intervention in type 2 diabetes: a pilot study. Diabetes Ther. 2019;10(2):509-522. [FREE Full text] [CrossRef] [Medline]
  7. Oser TK, Cucuzzella M, Stasinopoulos M, Moncrief M, McCall A, Cox DJ. An innovative, paradigm-shifting lifestyle intervention to reduce glucose excursions with the use of continuous glucose monitoring to educate, motivate, and activate adults with newly diagnosed type 2 diabetes: pilot feasibility study. JMIR Diabetes. 2022;7(1):e34465. [FREE Full text] [CrossRef] [Medline]
  8. Danne T, Nimri R, Battelino T, Bergenstal RM, Close KL, DeVries JH, et al. International consensus on use of continuous glucose monitoring. Diabetes Care. 2017;40(12):1631-1640. [FREE Full text] [CrossRef] [Medline]
  9. Battelino T, Danne T, Bergenstal RM, Amiel SA, Beck R, Biester T, et al. Clinical targets for continuous glucose monitoring data interpretation: recommendations from the international consensus on time in range. Diabetes Care. 2019;42(8):1593-1603. [FREE Full text] [CrossRef] [Medline]
  10. Battelino T, Alexander CM, Amiel SA, Arreaza-Rubin G, Beck RW, Bergenstal RM, et al. Continuous glucose monitoring and metrics for clinical trials: an international consensus statement. Lancet Diabetes Endocrinol. 2023;11(1):42-57. [CrossRef] [Medline]
  11. Rubin DB. Inference and missing data. Biometrika. 1976;63(3):581-592. [CrossRef]
  12. Kucharski P, Pagacz K, Szadkowska A, Młynarski W, Romanowski A, Fendler W. Resistance to data loss of glycemic variability measurements in long-term continuous glucose monitoring. Diabetes Technol Ther. 2018;20(12):833-842. [CrossRef] [Medline]
  13. Zhou N, Patel D, Iyengar A, Shrivastava S, Bhamidipaty A. A verifiable imputation analysis for univariate time series and enabling package. 2020. Presented at: Proceedings - 2020 IEEE International Conference on Big Data (Big Data); 2020 Dec 10:1764-1773; Atlanta, Georgia, United States. [CrossRef]
  14. Moritz S, Sardá A, Bartz-Beielstein M, Zaefferer M, Stork J. Comparison of different methods for univariate time series imputation in R. arXiv. Oct 13, 2015:1-20. [FREE Full text]
  15. Braem CIR, Yavuz US, Hermens HJ, Veltink PH. Missing data statistics provide causal insights into data loss in diabetes health monitoring by wearable sensors. Sensors (Basel). 2024;24(5):1526. [FREE Full text] [CrossRef] [Medline]
  16. Kang H. The prevention and handling of the missing data. Korean J Anesthesiol. 2013;64(5):402-406. [FREE Full text] [CrossRef] [Medline]
  17. García-Laencina PJ, Sancho-Gómez JL, Figueiras-Vidal AR. Pattern classification with missing data: a review. Neural Comput Appl. 2009;19(2):263-282. [CrossRef]
  18. Khayati M, Lerner A, Tymchenko Z, Cudré-Mauroux P. Mind the gap: an experimental evaluation of imputation of missing values techniques in time series. Proc VLDB Endow. 2020;13(5):768-782. [CrossRef]
  19. Moritz S, Bartz-Beielstein T. ImputeTS: time series missing value imputation in R. R J. 2017;9(1):207. [CrossRef]
  20. Blankers M, Koeter MWJ, Schippers GM. Missing data approaches in eHealth research: simulation study and a tutorial for nonmathematically inclined researchers. J Med Internet Res. 2010;12(5):e54. [FREE Full text] [CrossRef] [Medline]
  21. Riddlesworth TD, Beck RW, Gal RL, Connor CG, Bergenstal RM, Lee S, et al. Optimal sampling duration for continuous glucose monitoring to determine long-term glycemic control. Diabetes Technol Ther. 2018;20(4):314-316. [CrossRef] [Medline]
  22. Kingsnorth AP, Whelan ME, Orme MW, Routen AC, Sherar LB, Esliger DW. Resistance to data loss from the Freestyle Libre: impact on glucose variability indices and recommendations for data analysis. Appl Physiol Nutr Metab. 2021;46(2):148-154. [CrossRef] [Medline]
  23. Akturk HK, Herrero P, Oliver N, Wise H, Eikermann E, Snell-Bergeon J, et al. Impact of different types of data loss on optimal continuous glucose monitoring sampling duration. Diabetes Technol Ther. 2022;24(10):749-753. [CrossRef] [Medline]
  24. Xing D, Kollman C, Beck RW, Tamborlane WV, Laffel L, Buckingham BA, et al. Juvenile Diabetes Research Foundation Continuous Glucose Monitoring Study Group. Optimal sampling intervals to assess long-term glycemic control using continuous glucose monitoring. Diabetes Technol Ther. 2011;13(3):351-358. [FREE Full text] [CrossRef] [Medline]
  25. Bergenstal RM, Beck RW, Close KL, Grunberger G, Sacks DB, Kowalski A, et al. Glucose management indicator (GMI): a new term for estimating A1C from continuous glucose monitoring. Diabetes Care. 2018;41(11):2275-2280. [FREE Full text] [CrossRef] [Medline]
  26. Kovatchev BP, Cox DJ, Kumar A, Gonder-Frederick L, Clarke WL. Algorithmic evaluation of metabolic control and risk of severe hypoglycemia in type 1 and type 2 diabetes using self-monitoring blood glucose data. Diabetes Technol Ther. 2003;5(5):817-828. [CrossRef] [Medline]
  27. Fabris C, Patek SD, Breton MD. Are risk indices derived from CGM interchangeable with SMBG-based indices? J Diabetes Sci Technol. 2015;10(1):50-59. [FREE Full text] [CrossRef] [Medline]
  28. Beck RW, Connor CG, Mullen DM, Wesley DM, Bergenstal RM. The fallacy of average: how using HbA alone to assess glycemic control can be misleading. Diabetes Care. 2017;40(8):994-999. [FREE Full text] [CrossRef] [Medline]
  29. Poggiogalle E, Jamshed H, Peterson CM. Circadian regulation of glucose, lipid, and energy metabolism in humans. Metabolism. 2018;84:11-27. [FREE Full text] [CrossRef] [Medline]
  30. Martorina W, Tavares A. Effects of melatonin on glycemic variability in type 2 diabetes mellitus: protocol for a crossover, double-blind, placebo-controlled trial. JMIR Res Protoc. 2023;12:e47887. [FREE Full text] [CrossRef] [Medline]
  31. American Diabetes Association Professional Practice Committee. 6. Glycemic targets: standards of medical care in diabetes-2022. Diabetes Care. 2022;45(Suppl 1):S83-S96. [CrossRef] [Medline]
  32. Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, Polat, et al. SciPy 1.0 Contributors. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods. 2020;17(3):261-272. [FREE Full text] [CrossRef] [Medline]
  33. Terpilowski MA. scikit-posthocs: pairwise multiple comparison tests in Python. J Open Source Softw. 2019;4(36):1169. [CrossRef]
  34. Hunter JD. Matplotlib: a 2D graphics environment. Comput Sci Eng. 2007;9(3):90-95. [CrossRef]
  35. Fidler C, Elmelund Christensen T, Gillard S. Hypoglycemia: an overview of fear of hypoglycemia, quality-of-life, and impact on costs. J Med Econ. 2011;14(5):646-655. [FREE Full text] [CrossRef] [Medline]
  36. den Braber N, Vollenbroek-Hutten MMR, Westerik KM, Bakker SJL, Navis G, van Beijnum BF, et al. Glucose regulation beyond HbA1C in type 2 diabetes treated with insulin: real-world evidence from the DIALECT-2 cohort. Diabetes Care. 2021;44(10):2238-2244. [FREE Full text] [CrossRef] [Medline]
  37. Gerstl EM, Rabl W, Rosenbauer J, Gröbe H, Hofer SE, Krause U, et al. Metabolic control as reflected by HbA1c in children, adolescents and young adults with type-1 diabetes mellitus: combined longitudinal analysis including 27,035 patients from 207 centers in Germany and Austria during the last decade. Eur J Pediatr. 2008;167(4):447-453. [CrossRef] [Medline]
  38. Clarke WL, Cox D, Gonder-Frederick LA, Carter W, Pohl SL. Evaluating clinical accuracy of systems for self-monitoring of blood glucose. Diabetes Care. 1987;10(5):622-628. [CrossRef] [Medline]
  39. Parkes JL, Slatin SL, Pardo S, Ginsberg BH. A new consensus error grid to evaluate the clinical significance of inaccuracies in the measurement of blood glucose. Diabetes Care. 2000;23(8):1143-1148. [CrossRef] [Medline]


εCT: clinical target error
εEPB: expert panel boundary error
AE: absolute error
AE95: 95th percentile of the absolute error
CGM: continuous glucose monitoring
CV: coefficient of variation
GMI: glucose management indicator
HbA1c: glycated hemoglobin
HBGI: high blood glucose index
LBGI: low blood glucose index
MAR: missing at random
MCAR: missing completely at random
MEC-U: Medical Research Ethics Committees United
MedAE: median absolute error
MNAR: missing not at random
RI: risk index
TAR: time above range
TAR2: time above range level 2
TBR: time below range
TBR2: time below range level 2
TIR: time in range
ZGT: Ziekenhuisgroep Twente


Edited by T de Azevedo Cardoso; submitted 14.07.23; peer-reviewed by W Jia, T Vandenbroek, A Degraaf; comments to author 19.01.24; revised version received 21.02.24; accepted 10.04.24; published 31.07.24.

Copyright

©Niala den Braber, Carlijn I R Braem, Miriam M R Vollenbroek-Hutten, Hermie J Hermens, Thomas Urgert, Utku S Yavuz, Peter H Veltink, Gozewijn D Laverman. Originally published in the Interactive Journal of Medical Research (https://www.i-jmr.org/), 31.07.2024.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Interactive Journal of Medical Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.i-jmr.org/, as well as this copyright and license information must be included.