Best performer Epidemium 2018 – Open Cancer – Grand prize

Benjamin Schannes, Martin Micalef, Lino Galiana and Benoît Rolland

Open data from the repositories of international organizations requires careful cleaning and preprocessing, but can then be used to identify and select the most relevant, statistically significant and causal macro drivers likely to enter the equation describing the evolution of cancer incidence. To meet the accuracy and interpretability standards that are both required to provide medical safeguards, we adopt a localization approach (using grouping variables to define strata) at each preprocessing and processing step. We have designed fine-tuned methods to build a large, clean dataset merged from WHO, WB, ILO and FAO sources that anyone can use directly, and a prediction tool that helps health institutions identify the most prominent risk factors, quantify their impact on cancer incidence and prioritize prevention actions.
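The abstract does not give the preprocessing code. As one possible illustration of a localization step, the sketch below merges two sources keyed by (country, year) and fills missing values with the median of the observation's stratum rather than the global median. It assumes a hypothetical region-year stratum; the function names, variable names, country codes and values are all hypothetical toy inputs, not real statistics. Median imputation is one stratum-wise operation chosen for illustration, not necessarily the method the authors used.

```python
from collections import defaultdict
from statistics import median

def merge_sources(*sources):
    """Outer-merge dicts keyed by (country, year); later sources win on name clashes."""
    merged = defaultdict(dict)
    for source in sources:
        for key, values in source.items():
            merged[key].update(values)
    return dict(merged)

def impute_by_stratum(records, stratum_of, variable):
    """Fill missing `variable` values with the median observed in the same stratum."""
    observed = defaultdict(list)
    for key, row in records.items():
        if row.get(variable) is not None:
            observed[stratum_of(key)].append(row[variable])
    filled = {}
    for key, row in records.items():
        new_row = dict(row)  # copy so the merged input is left untouched
        pool = observed.get(stratum_of(key), [])
        if new_row.get(variable) is None and pool:
            new_row[variable] = median(pool)
        filled[key] = new_row
    return filled

# Toy inputs with hypothetical country codes and values (not real statistics).
incidence_source = {("A", 2010): {"incidence": 320.0}, ("B", 2010): {"incidence": 300.0},
                    ("C", 2010): {"incidence": 280.0}, ("D", 2010): {"incidence": 110.0}}
covariate_source = {("A", 2010): {"gdp_pc": 40.0}, ("B", 2010): {"gdp_pc": 44.0},
                    ("C", 2010): {"gdp_pc": None}, ("D", 2010): {"gdp_pc": 2.0}}
region = {"A": "R1", "B": "R1", "C": "R1", "D": "R2"}  # hypothetical grouping variable

merged = merge_sources(incidence_source, covariate_source)
# Stratum = (region, year): C is imputed from A and B only, not from D.
filled = impute_by_stratum(merged, lambda key: (region[key[0]], key[1]), "gdp_pc")
print(filled[("C", 2010)]["gdp_pc"])  # 42.0 (a global median would give 40.0)
```

The point of the design is that the fill value for country C comes only from comparable countries in its stratum, so a distant outlier such as D does not pull the imputed value; the same grouping function could be reused for other stratum-wise steps such as scaling.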