'Big' Real-World Data in Drug Development:
A Flood of Promise, But Is It Always Clean Water?
Summary: The massive volume of Big Data from genomics, EHRs, and sensors has the potential to transform healthcare and enable precision medicine. However, this scale often masks problems such as poor quality, lack of standardization, and difficult integration. Achieving personalized care relies on converting this data into reliable, validated, and secure "Good Data," a task complicated by technical, ethical, and regulatory hurdles, especially for new technologies in clinical settings. Until these hurdles are properly addressed, Big Real-World Data (RWD) is unlikely to gain much traction.
We live in an era where "Big Data" feels almost synonymous with "progress," especially in medicine. The sheer volume of information being generated – from high-throughput molecular technologies to ubiquitous electronic health records and increasingly sophisticated connected sensors – promises a revolution, ushering in the age of precision medicine and transforming how we diagnose, treat, and prevent disease. Yet, as someone who has navigated the complexities of drug development and clinical practice, I've learned a crucial lesson: Big Data is not always Good Data. This isn't a call for skepticism about the potential, but a necessary dose of realism about the profound challenges inherent in harnessing this deluge of information. Without rigorous attention to data quality, integration, and interpretation, the promise of Big Data remains just that – a promise.
The Multilayered Challenges of Big Data
The challenges in applying Big Data in medicine are not merely technical; they are scientific, logistical, ethical, and regulatory. The references listed at the end of this article highlight several key areas where the "bigness" of the data can paradoxically impede rather than accelerate progress.
Volume and Complexity: The rapid accumulation of vast, complex datasets, including -omic data and comprehensive EHRs, presents significant management challenges. The mix of structured and unstructured data, particularly from sources such as IoT sensors, is hard to handle under resource or time constraints.
Data Quality and Management: The sheer volume does not guarantee accuracy. If not managed properly, data can be redundant, incomplete, or outdated, leading to potentially poor forecasting and decision-making. Challenges persist in managing and interpreting data, even with large-scale sequencing.
Integration and Standardization: Integrating diverse genomic data and comprehensive electronic health records (EHRs) onto a Big Data infrastructure is challenging. There is a critical need for standardizing data content, format, and clinical definitions. Conflicting data formats and semantic data exchange issues exist, particularly with medical IoT devices, where standards are often poorly developed or dispersed.
Analytical Hurdles: Although new tools exist to extract meaning from large volumes of information, effectively doing so remains a significant challenge. Applying Big Data techniques to translational medicine also faces hindrances. There's a recognized need to reconsider how analytic methodology is taught to medical researchers.
Ethical, Legal, and Regulatory Landscapes: Compliance with regulations regarding patient privacy, security, and confidentiality poses a significant challenge, necessitating robust data protection measures. Legal complexities arise concerning liability and accountability for outcomes derived from AI-driven decisions based on Big Data. Ethical considerations surrounding privacy, bias, accountability, transparency, and patient consent are paramount, and implementing frameworks to address these is challenging.
Bias and Fairness: AI algorithms applied to Big Data can be susceptible to biases and errors if not rigorously validated with diverse datasets. Continuous monitoring is needed to mitigate these algorithmic errors. Bias in data itself, reflecting existing healthcare disparities, can be amplified by Big Data analytics.
Technical and Infrastructure Needs: Developing and deploying Big Data solutions require substantial investment in infrastructure, talent, and expertise. Ongoing maintenance adds to resource constraints.
Organizational Barriers: Resistance to change within healthcare systems can hinder the adoption of new Big Data technologies and AI.
In essence, while Big Data provides unprecedented opportunities for uncovering patterns and insights, its inherent characteristics and the complexity of the healthcare environment necessitate significant effort to transform raw data into reliable, actionable "Good Data."
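To make the "redundant, incomplete, or outdated" failure modes concrete, here is a minimal Python sketch of a record-level quality check. The field names (`patient_id`, `visit`, `hba1c`), the sample records, and the two-year staleness cutoff are illustrative assumptions, not taken from any real EHR schema or from the cited sources.

```python
from datetime import date

# Hypothetical records; the field names are illustrative only.
records = [
    {"patient_id": "P1", "visit": date(2024, 1, 10), "hba1c": 6.9},
    {"patient_id": "P1", "visit": date(2024, 1, 10), "hba1c": 6.9},  # exact duplicate
    {"patient_id": "P2", "visit": date(2019, 3, 2), "hba1c": 7.4},   # old measurement
    {"patient_id": "P3", "visit": date(2024, 2, 5), "hba1c": None},  # missing value
]


def quality_report(rows, required, as_of, max_age_days):
    """Count redundant, incomplete, and outdated rows in a list of dict records."""
    seen = set()
    report = {"redundant": 0, "incomplete": 0, "outdated": 0}
    for row in rows:
        # A row is redundant if an identical row was already seen.
        key = tuple(row[name] for name in sorted(row))
        if key in seen:
            report["redundant"] += 1
            continue
        seen.add(key)
        # A row is incomplete if any required field is missing or None.
        if any(row.get(name) is None for name in required):
            report["incomplete"] += 1
        # A row is outdated if its visit date is older than the cutoff.
        if (as_of - row["visit"]).days > max_age_days:
            report["outdated"] += 1
    return report


report = quality_report(
    records,
    required=("patient_id", "visit", "hba1c"),
    as_of=date(2024, 6, 1),
    max_age_days=730,  # assumed two-year staleness cutoff
)
print(report)
```

Counts like these say nothing about whether a value is clinically plausible; they only flag the cheapest problems to detect before any analysis begins.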
Connected Sensors and the Trial Landscape
The advent of connected sensor technologies – those digital medicine products that use mobile sensors and algorithms to measure behavioral and physiological function – offers exciting prospects for clinical trials. Consider smartwatches that capture activity, wireless blood pressure cuffs, or even microphones that analyze vocal patterns. These technologies enable remote data acquisition, facilitating decentralized clinical trials (DCTs) and potentially reducing patient burden. They offer the potential for collecting richer, more frequent, or continuous data compared to traditional methods, capturing Real-World Data (RWD) and Patient-Generated Health Data (PGHD). This continuous, high-resolution data stream, while voluminous, can theoretically allow for smaller sample sizes in some trials.
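One way to see why denser measurement can shrink a trial is a textbook sample-size calculation. The sketch below is an assumption-laden illustration, not the method used in the cited studies: it assumes a two-arm comparison of means, an endpoint defined as each participant's average over `m` repeated measurements, and a constant within-participant correlation `icc` between those measurements (compound symmetry). The function name and defaults are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(delta, sigma, m=1, icc=0.5, alpha=0.05, power=0.8):
    """Participants per arm needed to detect a mean difference `delta`.

    sigma: standard deviation of a single measurement
    m:     number of repeated measurements averaged per participant
    icc:   assumed correlation between measurements of one participant
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    # Averaging m correlated measurements scales the variance by this factor.
    variance_factor = (1 + (m - 1) * icc) / m
    return ceil(2 * (z * sigma / delta) ** 2 * variance_factor)


single_visit = n_per_arm(delta=0.5, sigma=1.0, m=1)
daily_for_a_month = n_per_arm(delta=0.5, sigma=1.0, m=30)
print(single_visit, daily_for_a_month)
```

Under these assumptions the gain is bounded: as `m` grows the variance factor approaches `icc`, so highly correlated measurements buy little, and the calculation assumes the data are actually collected and valid.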
However, the integration of these connected sensor technologies into pivotal clinical trials introduces specific data quality and logistical challenges.
Validation: The pharmaceutical industry uses "validation" in one specific sense, but the term carries widely different meanings for other stakeholders. Ensuring the analytical validation (accurately measuring what the technology is intended to measure) and clinical validation (demonstrating that the measurement relates to meaningful health outcomes) of these biometric monitoring technologies is foundational but complex. A shortcut question like "Is this wearable clinically validated?" glosses over this ambiguity.
Workflow Integration: Integrating these technologies into trials requires careful consideration of workflow – how data is aggregated, the type of statistical analysis needed, participant training, site monitoring, how clinicians act on the data, and providing technical support. Insufficient attention here can reduce the technology's effectiveness.
Data Aggregation and Missingness: While continuous data is valuable, aggregating it effectively and handling potential data missingness (e.g., device not worn, connectivity issues) are practical challenges. High adherence and low data missingness are indicators of high-quality data acquisition from these sources.
Security and Privacy: Connected devices broaden attack vectors, making cybersecurity risks significant. Ensuring patient privacy and data rights when collecting data with connected sensor technologies is crucial. Robust data sharing and privacy practices are essential considerations.
These challenges directly impact whether the Big Data generated by connected sensors can be considered "Good Data" suitable for the high stakes of pivotal clinical trials. The reliability, accuracy, completeness, and security of this data must be rigorously established.
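Adherence and missingness can be tracked with very simple arithmetic. The following Python sketch assumes the device reports minutes of wear per day, with `None` for days on which no data arrived. The 600-minute threshold for a "valid day" and the function name are illustrative assumptions that a real protocol would need to define and justify.

```python
def wear_summary(minutes_worn_per_day, min_valid_minutes=600):
    """Summarize adherence and missingness from daily wear time in minutes.

    None marks a day on which no data were received at all.
    """
    total_days = len(minutes_worn_per_day)
    if total_days == 0:
        raise ValueError("at least one day of follow-up is required")
    # A day counts as adherent only if wear time reaches the threshold.
    valid_days = sum(
        1 for m in minutes_worn_per_day
        if m is not None and m >= min_valid_minutes
    )
    observed = sum(m for m in minutes_worn_per_day if m is not None)
    expected = total_days * 1440  # minutes in a day
    return {
        "adherence": valid_days / total_days,
        "missing_fraction": 1 - observed / expected,
    }


# Hypothetical week: one day with no upload, one day with short wear time.
week = [1380, 1425, None, 540, 1400, 1440, 1300]
summary = wear_summary(week)
print(summary)
```

A summary like this does not explain why data are missing; whether gaps are related to the participant's health status matters for the analysis and needs separate investigation.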
FDA's View on Digital Health and Data
The regulatory landscape, particularly concerning digital health technologies (DHTs) and the use of RWD, is evolving rapidly. The FDA has provided guidance on topics pertinent to connected sensors and Big Data, including remote data acquisition in clinical investigations and premarket/postmarket cybersecurity for medical devices.
While the cited sources do not explicitly state that the FDA is concerned about IoT data quality, they highlight regulatory areas that imply this concern:
Data Standards for Submissions: The FDA has sought input on challenges in using standards such as HL7 FHIR to submit clinical study data collected from RWD sources, and on challenges in structuring and standardizing those submissions. This focus on standardization and structure strongly suggests a regulatory need for data that is not just voluminous ("Big") but also consistent, well-defined, and usable ("Good").
Cybersecurity: The emphasis on premarket and postmarket cybersecurity for medical devices, the need for evolved risk management, and "security-by-design" approaches reflects concerns about the integrity and confidentiality of data collected by connected devices. Compromised data is not "Good Data".
Validation Frameworks: As evaluation frameworks adapt for connected sensor technologies, the FDA's role in defining what constitutes sufficient validation for these novel data sources in regulatory submissions becomes critical. Ensuring the data is "fit-for-purpose" is key.
Liability and Accountability: As discussed generally with AI and Big Data, determining liability for adverse events potentially linked to decisions or actions based on data from connected devices in a clinical trial or practice setting is a complex regulatory consideration.
These regulatory considerations underscore the importance of ensuring that data derived from connected sensors and other Big Data sources meet high standards of reliability, security, and standardization before they can be fully leveraged in regulated environments like pivotal clinical trials or integrated into clinical decision-making.
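As an illustration of what standardization means in practice, the Python sketch below wraps a single wearable heart-rate reading in the general shape of an HL7 FHIR Observation resource, using the LOINC code for heart rate and UCUM units. The function name, patient identifier, and reading are hypothetical, and the result is not validated against any FHIR profile or submission requirement; it only shows how a standard pins down content, format, and units that a raw device export leaves implicit.

```python
import json


def heart_rate_observation(patient_id, bpm, measured_at):
    """Build a dict shaped like a FHIR Observation for one heart-rate reading."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "category": [{
            "coding": [{
                "system": "http://terminology.hl7.org/CodeSystem/observation-category",
                "code": "vital-signs",
            }]
        }],
        # LOINC 8867-4 identifies heart rate.
        "code": {
            "coding": [{
                "system": "http://loinc.org",
                "code": "8867-4",
                "display": "Heart rate",
            }]
        },
        "subject": {"reference": f"Patient/{patient_id}"},
        "effectiveDateTime": measured_at,
        # UCUM code "/min" expresses beats per minute.
        "valueQuantity": {
            "value": bpm,
            "unit": "beats/minute",
            "system": "http://unitsofmeasure.org",
            "code": "/min",
        },
    }


# Hypothetical reading from a hypothetical participant.
observation = heart_rate_observation("example-1", 72, "2024-05-29T08:30:00Z")
print(json.dumps(observation, indent=2))
```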
Bringing it Together for Precision Medicine
The ultimate goal of applying Big Data analytics, whether to genomics, EHRs, or connected sensor data, is to drive precision medicine, tailoring medical decisions to the individual based on their unique characteristics. This requires integrating data from multiple sources, such as diverse genomic data and comprehensive EHRs. Successfully identifying clinically actionable genetic variants, for instance, relies on the efficient and effective manipulation and analysis of large-scale sequencing data and clinical data from EHRs. Similarly, integrating -omic-derived knowledge into EHRs is one way to use molecular information for clinical decision support and deliver precision medicine.
However, if the underlying data is not "Good Data" – if it's incomplete, inaccurate, biased, or poorly integrated – the precision medicine insights derived from it will be flawed. Trying to build individualized strategies for diagnostic or therapeutic decision-making using unreliable data will not lead to precision; it will lead to uncertainty and potentially harmful errors.
The challenges of data management, integration, standardization, and analysis are not just academic points; they directly impact the feasibility and trustworthiness of precision medicine initiatives. Collaborative networks with data and expertise sharing are essential to address these issues. Moreover, the ethical and regulatory challenges surrounding data privacy, security, and the appropriate use of AI algorithms must be navigated carefully to build trust and ensure responsible implementation.
Conclusion: A Sobering Perspective, Not a Pessimistic One
Big Data holds undeniable potential to transform medicine. The ability to collect, store, and analyze information on an unprecedented scale opens doors to insights previously unattainable. However, achieving the promise of precision medicine hinges on our ability to ensure that the data powering this revolution is not just Big, but truly Good.
Ignoring the significant challenges in data quality, integration, validation, and security, especially when dealing with novel sources like connected sensor technologies in clinical trials, is a path toward unreliable results and eroded trust. The regulatory focus on data standards, cybersecurity, and appropriate validation frameworks highlights the critical need for robust practices in data stewardship.
As we move forward, let's embrace the power of Big Data with open eyes. Let's invest not just in collecting more data, but in the infrastructure, standards, talent, and ethical frameworks required to make that data reliable, interoperable, secure, and interpretable. The future of precision medicine depends on it. It's time to focus on cultivating "Good Data" – because science without conscience (and reliable data) is indeed but the ruin of the soul.
References
Wu P-Y, Cheng C-W, Kaddi CD, Venugopalan J, Hoffman R, Wang MD. -Omic and Electronic Health Record Big Data Analytics for Precision Medicine. IEEE Trans Biomed Eng. 2017 Feb;64(2):263–73. doi: 10.1109/TBME.2016.2573285.
He KY, Ge D, He MM. Big Data Analytics for Genomic Medicine. Int J Mol Sci. 2017 Feb 15;18(2):412. doi: 10.3390/ijms18020412.
Anonymous. Big data analysis using modern statistical and machine learning methods in medicine [Internet]. PubMed; [cited 2024 May 29]. Available from: https://pubmed.ncbi.nlm.nih.gov/30514117/
Anonymous. Comment submitted to FDA Dockets Management, docket FDA-2019-N-2683 [Internet]. [cited 2024 May 29]. Available from: https://www.regulations.gov/comment/FDA-2019-N-2683-0016
Hulsen T, Jamuar SS, Moody AR, Karnes JH, Varga O, Hedensted S, et al. From Big Data to Precision Medicine. Front Med (Lausanne). 2019 Mar 1;6:34. doi: 10.3389/fmed.2019.00034.
Hassan M, Awan FM, Naz A, deAndrés-Galiana EJ, Alvarez O, Cernea A, et al. Innovations in Genomics and Big Data Analytics for Personalized Medicine and Health Care: A Review. Int J Mol Sci. 2022 Apr 22;23(9):4645. doi: 10.3390/ijms23094645.
Anonymous. Modernizing and designing evaluation frameworks for connected sensor technologies in medicine [Internet]. Federal Trade Commission; [cited 2024 May 29]. Available from: https://www.ftc.gov/policy/policy-reports/modernizing-and-designing-evaluation-frameworks-connected-sensor-technologies
Dodge HH, Moore ST, Kaye JA. Use of high-frequency in-home monitoring data may reduce sample sizes needed in clinical trials. PLOS ONE. 2015 Oct 7;10(10):e0138095. doi: 10.1371/journal.pone.0138095.
Sonawane AR, Weiss ST, Glass K, Sharma A. Network Medicine in the Age of Biomedical Big Data. Front Genet. 2019 Apr 11;10:294. doi: 10.3389/fgene.2019.00294.
Khoury MJ. Planning for the Future of Epidemiology in the Era of Big Data and Precision Medicine. Am J Epidemiol. 2015 Dec 15;182(12):977–9. doi: 10.1093/aje/kwv228.
Vicini P, Fields O, Lai E, Litwack ED, Martin A-M, Morgan TM, et al. Precision medicine in the age of big data: The present and future role of large-scale unbiased sequencing in drug discovery and development. Clin Pharmacol Ther. 2016 Feb;99(2):198–207. doi: 10.1002/cpt.293.
Canales C, Lee C, Cannesson M. Science Without Conscience Is but the Ruin of the Soul: The Ethics of Big Data and Artificial Intelligence in Perioperative Medicine. Anesth Analg. 2020 May;130(5):1234–43. doi: 10.1213/ANE.0000000000004728.
McCue ME, McCoy AM. The Scope of Big Data in One Medicine: Unprecedented Opportunities and Challenges. Front Vet Sci. 2017 Nov 16;4:194. doi: 10.3389/fvets.2017.00194.
Austin C, Kusumoto F. The application of Big Data in medicine: current implications and future directions. J Interv Card Electrophysiol. 2016 Oct;47(1):51–9. doi: 10.1007/s10840-016-0104-y.
Leopold JA, Maron BA, Loscalzo J. The application of big data to cardiovascular disease: paths to precision medicine. J Clin Invest. 2020 Jan 2;130(1):29–38. doi: 10.1172/JCI129203.
Jordan L. The problem with Big Data in Translational Medicine. A review of where we’ve been and the possibilities ahead. Appl Transl Genom. 2015 Sep;6:3–6. doi: 10.1016/j.atg.2015.07.005.
Cook TW, Wilstermann AM, Mitchell JT, Arnold NE, Rajasekaran S, Bupp CP, et al. Understanding Insulin in the Age of Precision Medicine and Big Data: Under-Explored Nature of Genomics. Biomolecules. 2023 Jan 30;13(2):257. doi: 10.3390/biom13020257.
#RealWorldData #ClinicalTrials #DrugDevelopment #DigitalHealth #BigData


