Category Archives: Bayesian networks

Predictive Models in Medicine and Health – Forecasting Epidemics

I’m interested in everything under the sun relating to forecasting – including sunspots (another future post). But the focus on medicine and health is special for me, since my closest companion, until her untimely death a few years ago, was a physician. So I pay particular attention to details on forecasting in medicine and health, with my conversations from the past somewhat in mind.

One major area needs attention before this first pass on the subject can be called complete – forecasting epidemics.

Several major diseases ebb and flow according to a pattern many describe as an epidemic or outbreak – influenza being the most familiar to people in North America.

I’ve already posted on the controversy over Google Flu Trends, which still seems to be underperforming, judging from the 2013-2014 flu season numbers.

However, combining Google Flu Trends with other forecasting models, and, possibly, additional data, is reported to produce improved forecasts. In other words, there is information there.

In tropical areas, malaria and dengue fever, both carried by mosquitos, have seasonal patterns and time profiles that health authorities need to anticipate in order to stock supplies, keep fatalities lower, and take other preparatory steps.

Early Warning Systems

The following slide from A Prototype Malaria Forecasting System illustrates the promise of early warning systems, keying off of weather and climatic predictions.


There is a marked seasonal pattern, in other words, to malaria outbreaks, and this pattern is linked with developments in weather.

Researchers from the Howard Hughes Medical Institute, for example, recently demonstrated that temperatures in a large area of the tropical South Atlantic are directly correlated with the size of malaria outbreaks in India each year – lower sea surface temperatures led to changes in how the atmosphere over the ocean behaved and, over time, led to increased rainfall in India.

Another mosquito-borne disease claiming many thousands of lives each year is dengue fever.

And there is interesting, sophisticated research detailing the development of an early warning system for climate-sensitive disease risk from dengue epidemics in Brazil.

The following exhibits show the strong seasonality of dengue outbreaks, and a revealing mapping application, showing geographic location of high incidence areas.


This research used out-of-sample data to test the performance of the forecasting model.

The model was compared to a simple conceptual model of current practice, based on dengue cases three months previously. The developed model, which includes climate, past dengue risk, and observed and unobserved confounding factors, produced better dengue predictions than a model based on past dengue risk alone.
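The kind of out-of-sample comparison described above can be sketched in a few lines. This is only an illustration under loud assumptions: the monthly case counts are made-up numbers, the literal "value three months ago" rule is a crude stand-in for the current-practice benchmark, and a same-month-last-year rule stands in for a model that knows about seasonality. It is not the model from the Brazilian study.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute forecast error over the evaluation window."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def lag_forecast(series, lag, start):
    """Forecast each point from index `start` on as the value seen `lag` steps earlier."""
    return [series[t - lag] for t in range(start, len(series))]


# Hypothetical monthly dengue case counts (illustrative only, not real data).
cases = [120, 150, 210, 260, 230, 160, 90, 60, 50, 70, 110, 170,
         130, 165, 220, 280, 240, 150, 95, 55]

holdout_start = 12                 # months 12..19 are kept out of sample
actual = cases[holdout_start:]

# Benchmark: cases three months previously (stand-in for current practice).
baseline = lag_forecast(cases, lag=3, start=holdout_start)
# Candidate: same month one year earlier (stand-in for a seasonality-aware model).
candidate = lag_forecast(cases, lag=12, start=holdout_start)

baseline_mae = mean_absolute_error(actual, baseline)
candidate_mae = mean_absolute_error(actual, candidate)
# Skill score: fraction of the benchmark's error removed by the candidate.
skill = 1 - candidate_mae / baseline_mae

print(f"baseline MAE:  {baseline_mae:.3f}")
print(f"candidate MAE: {candidate_mae:.3f}")
print(f"skill score:   {skill:.3f}")
```

The point of the design is that both forecasts are scored only on months neither rule was tuned on, and the richer model is judged relative to the benchmark rather than in isolation.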


The latest global threat, of course, is MERS – or Middle East Respiratory Syndrome, which is caused by a coronavirus. Its transmission from source areas in Saudi Arabia is pointedly suggested by the following graphic.


The World Health Organization is, as yet, refusing to declare MERS a global health emergency. Instead, spokesmen for the organization say,

..that much of the recent surge in cases was from large outbreaks of MERS in hospitals in Saudi Arabia, where some emergency rooms are crowded and infection control and prevention are “sub-optimal.” The WHO group called for all hospitals to immediately strengthen infection prevention and control measures. Basic steps, such as washing hands and proper use of gloves and masks, would have an immediate impact on reducing the number of cases..

Millions of people, of course, will travel to Saudi Arabia for Ramadan in July and the hajj in October. Thirty percent of the cases so far diagnosed have resulted in fatalities.

Causal and Bayesian Networks

In his Nobel Acceptance Lecture, Sir C.W.J. Granger mentions that he did not realize people had so many conceptions of causality, nor that his proposed test would be so controversial – resulting in its being confined to a special category, “Granger Causality.”

That’s an astute observation – people harbor many conceptions and shades of meaning for the idea of causality. It’s in this regard that renewed efforts recently – motivated by machine learning – to operationalize the idea of causality, linking it with both directed graphs and equation systems, are nothing less than heroic.

However, despite the confusion engendered by quantum theory and perhaps other “new science,” the identification of “cause” can be materially important in the real world. For example, if you are diagnosed with metastatic cancer, it is important for doctors to discover where in the body the cancer originated – in the lungs, in the breast, and so forth. This can be challenging, because cancer mutates, but making this identification can be crucial for selecting chemotherapy agents. In general, medicine is full of problems of identifying causal nexus, cause and effect.

In economics, Herbert Simon, also a Nobel Prize recipient, actively promoted causal analysis and its representation in graphs and equations. In Causal Ordering and Identifiability, Simon writes,


For example, we cannot reverse the causal chain poor growing weather → small wheat crops → increase in price of wheat by an attribution increase in price of wheat → poor growing weather.

Simon then proposes that the weather to price causal system might be represented by a series of linear, simultaneous equations, as follows:


This example can be solved recursively, first by solving for x1, then by using this value of x1 to solve for x2, and then using the so-obtained values of x1 and x2 to solve for x3. So the system is self-contained, and Simon discusses other conditions. Probably the most important are asymmetry and the direct relationship between variables.
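The recursive solution described above is what is usually called forward substitution on a lower-triangular system. Here is a minimal sketch of it. The coefficients are hypothetical numbers chosen for illustration, not Simon's, and the function name `solve_recursively` is my own.

```python
def solve_recursively(coeffs, rhs):
    """Solve a lower-triangular linear system by forward substitution.

    coeffs[i][j] multiplies x_(j+1) in equation i+1. Each equation may involve
    only its own variable and earlier ones, which is what makes the
    system recursive (and the causal ordering asymmetric).
    """
    n = len(rhs)
    x = []
    for i in range(n):
        if any(coeffs[i][j] != 0 for j in range(i + 1, n)):
            raise ValueError("equation depends on a later variable; not recursive")
        already_known = sum(coeffs[i][j] * x[j] for j in range(i))
        x.append((rhs[i] - already_known) / coeffs[i][i])
    return x


# Hypothetical coefficients: x1 = weather, x2 = wheat crop, x3 = wheat price.
coeffs = [
    [2.0, 0.0, 0.0],   # 2*x1             = 4   (weather is determined alone)
    [1.0, 1.0, 0.0],   # 1*x1 + 1*x2      = 5   (crop depends on weather)
    [0.0, 3.0, 1.0],   # 3*x2 + 1*x3      = 10  (price depends on crop)
]
rhs = [4.0, 5.0, 10.0]

solution = solve_recursively(coeffs, rhs)
print(solution)
```

Reordering the equations so that price comes first makes the function raise an error, which is one way to see the asymmetry: the system can be solved from weather toward price, not the other way around.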

Readers interested in the milestones in this discourse, leading to the present, need to be aware of Pearl’s seminal 1998 article, which begins,

It is an embarrassing but inescapable fact that probability theory, the official mathematical language of many empirical sciences, does not permit us to express sentences such as “Mud does not cause rain”; all we can say is that the two events are mutually correlated, or dependent – meaning that if we find one, we can expect to encounter the other.

Positive Impacts of Machine Learning

So far as I can tell, the efforts of Simon and even perhaps Pearl would have been lost in endless and confusing controversy, were it not for the emergence of machine learning as a distinct specialization.

A nice, more recent discussion of causality, graphs, and equations is Denver Dash’s A Note on the Correctness of the Causal Ordering Algorithm. Dash links equations with directed graphs, as in the following example.

Dash shows that Simon’s causal ordering algorithm (COA) to match equations to a cluster graph is consistent with more recent methods of constructing directed causal graphs from the same equation set.

My reading suggests a direct line of development, involving attention to the nodes and edges of directed acyclic graphs (DAGs) – or graphs without any backward connections or loops – and evolution to Bayesian networks – which are directed acyclic graphs with associated conditional probabilities.
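Whether a directed graph is acyclic can be checked mechanically by trying to list its nodes cause-before-effect. The sketch below uses the standard topological-sort approach (Kahn's algorithm) on the weather, crop, and price chain from Simon's example. The node names and the function name `topological_order` are my own choices for illustration.

```python
from collections import deque


def topological_order(edges):
    """Return the nodes in cause-before-effect order, or None if there is a cycle."""
    children = {}
    indegree = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
        children.setdefault(child, [])
        indegree[child] = indegree.get(child, 0) + 1
        indegree.setdefault(parent, 0)

    # Start with nodes that have no incoming edges (no causes inside the graph).
    ready = deque(sorted(node for node, degree in indegree.items() if degree == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    # Any node never released sits on a cycle or downstream of one.
    return order if len(order) == len(indegree) else None


chain = [("weather", "crop"), ("crop", "price")]
looped = chain + [("price", "weather")]   # adds a backward connection

print(topological_order(chain))    # a valid causal ordering
print(topological_order(looped))   # None, because the graph is no longer a DAG
```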

Here are two examples of Bayesian networks.

First, another contribution from Dash and others:


So clearly Bayesian networks are closely akin to expert systems, combining elements of causal reasoning, directed graphs, and conditional probabilities.
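To make the combination of a directed graph and conditional probabilities concrete, here is a toy Bayesian network over the weather, crop, and price chain, queried by brute-force enumeration. Every probability in it is a made-up number, and the variable and function names are hypothetical. It is a sketch of the idea, not a reconstruction of any network shown in this post.

```python
from itertools import product

# Hypothetical conditional probability tables for weather -> crop -> price.
P_POOR_WEATHER = 0.3
P_SMALL_CROP = {True: 0.8, False: 0.1}   # P(small crop | poor weather?)
P_HIGH_PRICE = {True: 0.9, False: 0.2}   # P(high price | small crop?)

NAMES = ("poor_weather", "small_crop", "high_price")


def bernoulli(p_true, value):
    """Probability of `value` for a binary variable that is True with probability p_true."""
    return p_true if value else 1.0 - p_true


def joint(poor_weather, small_crop, high_price):
    """Joint probability: product of each node's probability given its parent."""
    return (bernoulli(P_POOR_WEATHER, poor_weather)
            * bernoulli(P_SMALL_CROP[poor_weather], small_crop)
            * bernoulli(P_HIGH_PRICE[small_crop], high_price))


def probability(**evidence):
    """Sum the joint over every assignment consistent with the evidence."""
    total = 0.0
    for values in product([True, False], repeat=len(NAMES)):
        assignment = dict(zip(NAMES, values))
        if all(assignment[name] == value for name, value in evidence.items()):
            total += joint(**assignment)
    return total


# Predictive query: how likely is a high price?
p_high = probability(high_price=True)
# Diagnostic query: given a high price, how likely was poor weather?
p_poor_given_high = probability(poor_weather=True, high_price=True) / p_high

print(f"P(high price)                = {p_high:.4f}")
print(f"P(poor weather | high price) = {p_poor_given_high:.4f}")
```

Note that the diagnostic query reasons from effect back to cause, which is exactly the expert-system flavor: the arrows run one way, but inference runs both ways.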

The scale of Bayesian networks can be much larger, even society-wide, as in this example from Using Influence Nets in Financial Informatics: A Case Study of Pakistan.


The development of machine systems capable of responding to their environment – robots, for example – is a driver of this work currently. This leads to the distinction between identifying causal relations by observation or from existing data, and from intervention, action, or manipulation. Uncovering mechanisms by actively energizing nodes in a directed graph, one-by-one, is, in some sense, an ideal approach. However, there are clearly circumstances – again medical research provides excellent examples – where full-scale experimentation is simply not possible or allowable.
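The gap between observing and intervening can be shown with a three-node toy network in which a background factor Z influences both a treatment X and an outcome Y. The numbers below are hypothetical, and the adjustment used for the intervention case is the standard one from the causal-graph literature (averaging over Z with its own distribution rather than its distribution among the treated). Treat it as a sketch of the distinction, not as anyone's actual study.

```python
# Hypothetical network: Z -> X, Z -> Y, X -> Y (all variables binary).
P_Z = 0.5                                  # P(Z = True)
P_X_GIVEN_Z = {True: 0.8, False: 0.2}      # P(X = True | Z)
P_Y_GIVEN_XZ = {                           # P(Y = True | X, Z)
    (True, True): 0.9, (True, False): 0.5,
    (False, True): 0.6, (False, False): 0.2,
}


def p_z(z):
    return P_Z if z else 1.0 - P_Z


def p_x_given_z(x, z):
    return P_X_GIVEN_Z[z] if x else 1.0 - P_X_GIVEN_Z[z]


def observed(x):
    """P(Y = True | X = x), as estimated from passively collected data."""
    numerator = sum(p_z(z) * p_x_given_z(x, z) * P_Y_GIVEN_XZ[(x, z)]
                    for z in (True, False))
    denominator = sum(p_z(z) * p_x_given_z(x, z) for z in (True, False))
    return numerator / denominator


def intervened(x):
    """P(Y = True | do(X = x)): X is set by fiat, so Z keeps its own distribution."""
    return sum(p_z(z) * P_Y_GIVEN_XZ[(x, z)] for z in (True, False))


print(f"Observed    P(Y | X=1)     = {observed(True):.2f}")
print(f"Intervened  P(Y | do(X=1)) = {intervened(True):.2f}")
```

With these made-up numbers, the observational figure overstates the effect of X, because cases with X present are disproportionately cases with Z present. When experimentation is ruled out, that bias has to be removed by adjustment on the graph rather than by manipulating X directly.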

At some point, combinatorial analysis is almost always involved in developing accurate causal networks, and certainly in developing Bayesian networks. But this means that full implementation of these methods must stay confined to smaller systems, cut corners in various ways, or wait for development (one hopes) of quantum computers.
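One way to see the combinatorial problem is to count how many distinct directed acyclic graphs can be drawn on n labeled nodes, since a search over causal structures has to contend with that space. The sketch below uses the standard inclusion-exclusion recurrence for labeled DAGs, usually attributed to Robinson. The function name `count_dags` is my own.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n):
    """Number of distinct DAGs on n labeled nodes.

    Recurrence: a(n) = sum_{k=1..n} (-1)^(k+1) * C(n, k) * 2^(k*(n-k)) * a(n-k),
    with a(0) = 1. Here k counts nodes chosen to have no incoming edges.
    """
    if n == 0:
        return 1
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
               for k in range(1, n + 1))


for n in range(1, 11):
    print(n, count_dags(n))
```

The count passes half a thousand at four nodes and grows faster than exponentially after that, which is why exhaustive structure search is only feasible for small systems and larger ones rely on heuristics or expert-supplied structure.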
