Public health remains a major challenge in most developing countries, especially in sub-Saharan Africa, where poverty is most severe. For instance, only half of pregnant women receive the recommended minimum of four antenatal care visits; only 56 percent of births in rural areas are attended by skilled health personnel, compared with 87 percent in urban areas; and in 2014, fewer than 40 percent of young people aged 15 to 24 years in sub-Saharan Africa had comprehensive, correct knowledge of HIV. This is in large part because critical data for health and development policymaking are still lacking in these parts of the world. Poor data collection, poor data quality, a lack of timely data, and the unavailability of data disaggregated along important dimensions are major challenges faced by policymakers 1. Consequently, making decisions and monitoring progress towards good health for all is extremely difficult, hence the need for more efficient and holistic health information systems.
The advent of the digital age has led to a rise in different types of complex data that need to be stored, processed, and analysed to generate timely information 2-4. Applied to public health, Big Data can combine all data related to a population to give a complete view of its health status, which big data analytics can then use to analyse and predict outcomes. It can improve the development of new drugs, healthcare financing, clinical approaches, healthcare quality and efficiency, fraud detection, and early disease detection 5. In low- and middle-income countries, applications of Big Data in public health include genome sequencing techniques, personal health records, and data support for universal health coverage. However, the concept raises privacy and security, ownership, infrastructure, and socio-technical challenges, as well as epistemological dilemmas 6, 7.
Data collection, transformation, and analytics are the core elements of the general framework of Big Data analytics 5. Because data collection in public health is a major weakness in developing countries, the results of Big Data transformation and analytics in those contexts are likely to be unreliable. As a result, developing countries do not benefit from the full potential of this concept. Given these challenges, which are largely specific to developing countries, this research aims to propose a comprehensive Big Data framework that would enable developing countries to exploit the full potential of Big Data in public health.
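To make the three stages named above (collection, transformation, analytics) and the propagation of collection gaps more concrete, the following Python sketch chains them together and adds a simple coverage check. It is an illustration under stated assumptions, not the framework proposed in this research: the function names `collect`, `transform`, and `analyse`, the record fields `group` and `attended`, the population shares, the 0.5 tolerance, and the toy records are all hypothetical.

```python
from collections import Counter


def collect(*sources):
    """Stage 1 (collection): pool records from several data sources."""
    return [record for source in sources for record in source]


def transform(records, field):
    """Stage 2 (transformation): keep only records where `field` was captured."""
    return [r for r in records if r.get(field) is not None]


def analyse(records, field, population_share, tolerance=0.5):
    """Stage 3 (analytics): estimate an indicator and flag under-covered groups.

    `population_share` (hypothetical input) maps each group to its share of
    the real population. A group is flagged when its share of the usable
    records falls below `tolerance` times its population share.
    """
    if not records:
        raise ValueError("no usable records after transformation")
    estimate = sum(bool(r[field]) for r in records) / len(records)
    counts = Counter(r["group"] for r in records)
    flagged = [
        group
        for group, share in population_share.items()
        if counts[group] / len(records) < tolerance * share
    ]
    return estimate, sorted(flagged)


# Hypothetical toy data: an EHR extract that mostly covers urban patients
# and a small rural survey with one missing answer.
ehr = [{"group": "urban", "attended": i < 8} for i in range(10)]
survey = [{"group": "rural", "attended": True}, {"group": "rural", "attended": None}]

records = transform(collect(ehr, survey), "attended")
estimate, flagged = analyse(records, "attended", {"urban": 0.4, "rural": 0.6})
print(round(estimate, 2), flagged)  # 0.82 ['rural']
```

In this toy case the pooled estimate is driven almost entirely by urban records, and the coverage check flags the rural group as under-represented relative to its assumed population share; no amount of downstream analytics can correct for records that were never collected.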
The following section presents background information on Big Data applied to public health.
Large amounts of data that can be used in public health are being generated every day at an unprecedented speed. This has made the application of big data in healthcare one of the fastest-growing fields, with many new discoveries and methodologies published in the last five years 8. These studies have revealed that, although the concept is still nascent, big data has the potential to transform the healthcare industry 9: it can support transitions in models of healthcare delivery, enable more effective disease surveillance, detect and prevent fraud, and improve pharmaceutical development 5, 6. However, data scientists working in healthcare face several challenges in collecting, analysing, and storing these massive and complex datasets 10. To address them, they rely on multiple new and powerful technologies to extract useful information and enable more broad-based healthcare solutions 8.
In public health, information is useful when it supports three core functions: assessment, policy development, and assurance 8. Big data solutions are therefore only useful to public health if they help fulfil these three core functions, which determine the robustness of health systems. The problem is that big data does not always reflect the growing diversity of the population. According to Malanga et al. (2016) 11, it is particularly difficult for big data to capture certain marginalized groups, especially people with low socioeconomic status, the LGBT community, and immigrants. This is because most of these people are missing from the usual data sources for big data analytics, such as internet histories, social media, electronic health records (EHRs), and genomic databases. These gaps often arise because members of these groups lack insurance, cannot access healthcare, or have low levels of health literacy, among other factors. This implies that, in developing countries where the poor are the majority, big data solutions effectively exclude the very people who most need improved healthcare services.
To make big data solutions inclusive, data is often collected from traditional health systems (medical records, surveys, etc.), large-scale datasets in the molecular and biological fields (genomics, microbiomics, etc.), and web tools (social media, the internet, etc.) 12 to assess the healthcare needs of the population. The resulting information should then be analysed to ensure that adequate policies are developed to meet those needs. Finally, big data analytics needs to provide information for evaluating how well public health institutions have met, or are able to meet, their objectives. However, this is not enough for developing countries, where data collection for big data analytics needs to be even more inclusive. New data collection techniques must be developed, and these would in turn shape the transformation, analysis, and presentation processes of big data analytics in these countries.
Figure 1 presents the commonly applied conceptual architecture (framework) of big data analytics. Applying it raises organizational, technological, legal, and ethical issues that need to be addressed (Table 1). While technological issues may be common to health systems worldwide, organizational, legal, and ethical issues vary from one country to another. However, no literature was found proposing a methodology or framework for applying big data to public health that accounts for the specificities of developing countries. This suggests that such studies are either absent or very scarce. Based on these research gaps, the following elements were developed: