With COVID-19, risk factors and co-morbidities are at stake: patients presenting such risks should receive specific attention and care. However, reports of risk factors and co-morbidities are spread across many publications, which does not provide a global view with actionable information.
In 2020, to address this problem, we developed a project named VIDAR-19 (VIsualization of Diseases At Risk in CORD-19). It automatically extracts from the coronavirus literature the diseases listed in ICD-11, and identifies those which might be considered as risk factors or co-morbidities.
In 2021, with the extraordinary worldwide efforts to provide medical care for this new disease, the emphasis shifted to the side effects of the rapidly designed vaccines. To meet this new challenge, we extended VIDAR-19 to also extract vaccine side effects, reusing the techniques developed for risk factors.
This dashboard shows the outcome of the project. The different tabs are briefly presented below. The online dashboard is available at: https://vidar-19.yotta-conseil.fr/
This tab shows the share of branches in 3 sets of diseases: diseases with a code in ICD-11, diseases found in the corpus and diseases which might be considered as risk factors.
It also shows 2 treemaps which compare, by branch (or sub-branch), the occurrences of diseases in the corpus with the occurrences of risk factors.
This tab highlights the diseases, or the branches, for which the document frequency as a risk factor, or as a side effect, is higher than the document frequency as a disease in the corpus.
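The comparison behind this tab can be illustrated with a minimal Python sketch. It is not the project's actual code: it assumes that a document frequency is the share of documents in a set that mention a disease at least once, and the function names and toy data are hypothetical.

```python
from collections import Counter


def document_frequency(doc_mentions):
    """Map each disease to the share of documents mentioning it at least once.

    Assumption: this is one plausible definition of "document frequency";
    the dashboard may normalize differently.
    """
    n_docs = len(doc_mentions)
    counts = Counter()
    for mentions in doc_mentions:
        counts.update(set(mentions))  # count each disease once per document
    return {disease: count / n_docs for disease, count in counts.items()}


def over_represented(corpus_mentions, risk_mentions):
    """Return diseases whose risk-factor frequency exceeds their corpus frequency."""
    df_corpus = document_frequency(corpus_mentions)
    df_risk = document_frequency(risk_mentions)
    return sorted(
        disease
        for disease, df in df_risk.items()
        if df > df_corpus.get(disease, 0.0)
    )


# Hypothetical toy data: one set of diseases per document.
corpus_mentions = [
    {"diabetes", "pneumonia"},
    {"pneumonia"},
    {"pneumonia", "asthma"},
    {"diabetes"},
]
risk_mentions = [
    {"diabetes"},
    {"diabetes", "asthma"},
]

highlighted = over_represented(corpus_mentions, risk_mentions)
print(highlighted)
```

In this toy example, pneumonia is frequent in the corpus but never appears as a risk factor, so it is not highlighted, whereas diabetes and asthma are.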
This tab lets you search for documents which mention a disease or a branch. The source document can then be viewed.
This tab lets you search for documents which mention a variant. A short list is available: D614G, E484K, E484Q, K417N, L452R, N439K, N501Y, P681H, P681R, R346K, S477N, V367F, Y453F. The source document can then be viewed.
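As an illustration, a variant search of this kind could be sketched in Python with a regular expression built from the list above. This is only a sketch under assumptions: the matching rule (exact, case-sensitive, whole-word), the function names, and the sample documents are hypothetical and not taken from the project.

```python
import re

# Variant names as listed in the dashboard.
VARIANTS = [
    "D614G", "E484K", "E484Q", "K417N", "L452R", "N439K", "N501Y",
    "P681H", "P681R", "R346K", "S477N", "V367F", "Y453F",
]

# Assumption: a variant counts as mentioned when its name appears as a whole word.
VARIANT_PATTERN = re.compile(r"\b(" + "|".join(VARIANTS) + r")\b")


def find_variants(text):
    """Return the sorted list of distinct variants mentioned in a text."""
    return sorted(set(VARIANT_PATTERN.findall(text)))


def search_documents(documents, variant):
    """Return the ids of the documents whose text mentions the given variant."""
    return [
        doc_id
        for doc_id, text in documents.items()
        if variant in find_variants(text)
    ]


# Hypothetical sample documents keyed by an identifier.
documents = {
    "doc1": "The D614G substitution became dominant.",
    "doc2": "N501Y and E484K were found in the spike protein.",
    "doc3": "No mutation is discussed here.",
}

print(search_documents(documents, "E484K"))
print(find_variants(documents["doc2"]))
```

Compiling a single pattern for the whole list keeps the search to one pass per document, which matters little for a toy example but helps on a corpus of the size of CORD-19.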
CORD-19 is a corpus of academic papers about COVID-19 and related coronavirus research. It has been curated and maintained by the Semantic Scholar team at the Allen Institute for AI to support text mining and NLP research.
This tab presents the CORD-19 documents dataset. It provides some insights into the processed documents from the selected datasets: How many documents? How many documents mentioning at least one disease? How many documents mentioning at least one disease which might be considered as a risk factor, or as a side effect? This tab also lets you select documents dealing with COVID-19 (virus or disease) or vaccines.
ICD-11 is the International Classification of Diseases from the World Health Organization.
This tab presents the ICD-11 documents dataset. It gives some insights into the diseases which have been extracted: How many diseases in the database? How many diseases mentioned in the corpus? How many diseases which might be considered as risk factors?
The final release of the CORD-19 dataset was produced on June 2, 2022, so there will be no further updates of the dashboard after this last release.
The VIDAR-19 project is implemented in Python. The graphical part of the project is built with the Dash framework from Plotly to provide this web application. It is hosted on PythonAnywhere.
VIDAR sounds like radar or lidar. Viðarr is also the name of a god in Norse mythology.
Francis Wolinski, Visualization of Diseases at Risk in the COVID-19 Literature, arXiv, May 2020, https://arxiv.org/abs/2005.00848
Francis Wolinski, Automatic Extractions of Risk Factors from COVID-19 Literature, 1st SciNLP: Natural Language Processing and Data Mining for Scientific Text, June 2020, [poster abstract], presented at the SciNLP 2020 workshop hosted at AKBC 2020: [poster video]
Francis Wolinski, Systematic Extraction of Covid-19 Risk Factors and Vaccine Side Effects, 2nd SciNLP: Natural Language Processing and Data Mining for Scientific Text, October 2021, [poster abstract], presented at the SciNLP 2021 workshop hosted at AKBC 2021: [poster video]