Medical Decision Making

Dr. Ali M. Hadianfard (Medical Informatics)
Faculty member of AJUMS (paramedical school)
Further reading
Biomedical Informatics: Computer Applications in Health Care and Biomedicine,
Edward H. Shortliffe, James J. Cimino, 4th Ed., 2014 (chapters 3 & 22).
Decision Support Systems and Intelligent Systems, Efraim Turban, Jay E.
Aronson, Ting-Peng Liang, 7th Ed., 2004 (chapter 2).
Medical Decision Making, Harold C. Sox, Michael C. Higgins, Douglas K. Owens,
2nd Ed., 2013 (chapters 3, 5 & 6).
From Patient Data to Medical Knowledge: The Principles and Practice of Health
Informatics, Paul Taylor, 2006 (chapter 10).
Clinical Decision Support Systems: Theory and Practice (Health Informatics), Eta
S. Berner, 2nd Ed., 2006 (chapters 1 & 2).
Fuzzy Control and Identification, John H. Lilly, 2010 (chapters 1 & 2).
What is Medical Informatics?
The study of applying computer
technology to manage medical
information in order to affect medical
care and support problem-solving
and decision-making.
Decision making can be considered the cognitive process
(the thought process) of selecting a logical choice
from the available alternatives.
Common examples include shopping,
deciding what to eat, and deciding
whom or what to vote for in
an election or referendum.
A problem occurs when a system does
not meet its established goals, does
not yield the predicted results, or does
not work as planned. Problem-solving
may also deal with identifying new
Sometimes the terms decision-making
and problem-solving are used interchangeably.
Decision-making process
Simon's model (1977) is the most concise
and yet complete characterization of rational
decision-making. This involves three major
phases: Intelligence, Design, and Choice.
He later added a fourth phase,
Implementation. Monitoring can be
considered a fifth phase.
Intelligence phase:
• Scanning the environment
• Data collection (objectives)
• Problem identification
• Problem ownership (a problem exists in an organization only if
someone or some group takes on the responsibility
of attacking it and if the organization has the ability
to solve it; some problems are owned at the national or
international level)
• Problem classification (definable category: well-structured or
unstructured problems)
• Problem statement
Design phase:
This phase includes understanding the problem and testing solutions
for feasibility. A model of the decision-making problem is
constructed, tested, and validated.
• Formulate a model based on the relationships among all the
variables (modeling involves conceptualizing the problem
and abstracting it to quantitative and/or qualitative form; for
a mathematical model, the variables are identified and their
mutual relationships are established)
• Validate the model
• Set criteria for choice
• Search for alternatives
• Predict and measure outcomes
Choice phase:
Selection of a proposed solution to the model; selection of the
best (good) alternative(s); plan for implementation.
The solution is tested to determine its viability and
The boundary between the design and choice phases is often
unclear because certain activities can be performed during
both of them and because one can return frequently from
choice activities to design activities. For example, one can
generate new alternatives while performing an evaluation of
existing ones.
Implementation phase:
Once the proposed solution seems reasonable,
we are ready for the last phase: implementation
of the decision. Implementation means putting
a recommended solution to work. Successful
implementation results in solving the real
problem. Failure leads to a return to an earlier
phase of the process. In fact, we can return to
an earlier phase during any of the latter three phases.
Decision-making under different conditions
Certainty:
• There is perfect knowledge of all the information needed to make a decision
• Problems are structured
• Solutions are already available from past experiences
Risk:
• Information is incomplete
• The problem and the alternatives are defined, but there is no guarantee
of how each solution will work
• It is feasible to make a list of all possible outcomes and assign
probabilities to the various outcomes
Uncertainty:
• Information is very poor
• Problems are unstructured
• The decision maker cannot list all possible outcomes and/or cannot assign
probabilities to the various outcomes
Medical Decision Making is under Uncertainty
Decision making is one of the quintessential activities of
the healthcare professional.
Some decisions are made on the basis of deductive
reasoning (reasoning from general principles to a specific
conclusion; antonym: inductive) or of physiological principles.
Many decisions, however, are made on the basis of
knowledge that has been gained through collective
experience: the clinician often must rely on empirical
knowledge of associations between symptoms and
disease to evaluate a problem.
A decision that is based on these usually imperfect
associations will be, to some degree, uncertain.
Clinical data are imperfect. The degree of imperfection
varies, but all clinical data—including the results of
diagnostic tests, the history given by the patient, and the
findings on physical examination— are uncertain.
Example: uncertain condition
Mr. James is a 59-year-old man with coronary artery disease. The patient often experiences
chest pain (angina). Mr. James has twice undergone coronary artery bypass graft (CABG)
surgery. Unfortunately, he has again begun to have chest pain, which becomes
progressively more severe, despite medication. If the heart muscle is deprived of oxygen,
the result can be a heart attack (myocardial infarction), in which a section of the muscle
Should Mr. James undergo a third operation?
The medications are not working; without surgery, he runs a high risk of
suffering a heart attack, which may be fatal. On the other hand, the surgery is
hazardous. Not only is the surgical mortality rate for a third operation higher
than that for a first or second one but also the chance that surgery will relieve
the chest pain is lower than that for a first operation.
All choices in the example entail considerable uncertainty. Furthermore, the
risks are grave; an incorrect decision may substantially increase the chance
that Mr. James will die. The decision will be difficult even for experienced
The use of probability or odds as an expression of uncertainty avoids the
ambiguities inherent in common descriptive terms.
Probability is represented numerically by a number
between 0 and 1. Statements with a probability of 0 are
false. Statements with a probability of 1 are true. An
event that is certain to occur has a probability of 1; an
event that is certain not to occur has a probability of 0.
Statements with a probability of 0.5 (50%) are just as likely to be true as
false. The probability of event A is written p[A]. The sum
of the probabilities of all possible, collectively exhaustive
outcomes of a chance event must be equal to 1. e.g.,
p[heads]+ p[tails] = 1.0.
Probabilities can be combined to yield new probabilities.
and (independent events): p[A ∩ B] = p[A] × p[B]
or (mutually exclusive events): p[A ∪ B] = p[A] + p[B]
Conditional Probability
The probability that event A will occur given that event B is
known to occur is called the conditional probability of event A
given event B, denoted by p[A|B] and read as “the probability of
A given B.” Thus a post-test probability is a conditional
probability predicated on the test or finding. For example, if 30 %
of patients who have a swollen leg have a blood clot, we say the
probability of a blood clot given a swollen leg is 0.3, denoted:
p[blood clot | swollen leg] = 0.3.
p[A ∩ B] = p[A] × p[B|A]
Rearranging gives Bayes’ theorem: p[A|B] = p[B|A] × p[A] / p[B]
Odds
The ratio of the probability of an event occurring to the probability of
the event not occurring. Odds and probability are equivalent ways of
expressing uncertainty. The relationship between the odds of an event and
its probability is the following:
odds = p / (1 − p), where p is the probability that the event will occur.
For example, if the probability of the event is 0.67, the odds of the event
are 0.67 divided by 0.33, or 2 to 1. Another way to express the odds of an
event is p:(1−p). Thus, writing 2:1 is equivalent to saying ‘‘2 to 1 odds.’’
Some find it especially useful to use odds to express their opinion about
very infrequent events (1 to 99 odds, rather than a probability of 0.01) or
very common events (99 to 1 odds, rather than a probability of 0.99).
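The conversion between probability and odds can be sketched in a few lines of Python. This is a minimal illustration; the function names are our own and are not taken from the cited textbooks.

```python
def probability_to_odds(p: float) -> float:
    """Convert a probability p (0 <= p < 1) to odds in favor of the event."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1.0 - p)


def odds_to_probability(odds: float) -> float:
    """Convert odds in favor of an event (odds >= 0) back to a probability."""
    if odds < 0.0:
        raise ValueError("odds must be non-negative")
    return odds / (1.0 + odds)


# A probability of 0.01 corresponds to odds of 1 to 99.
rare_event_odds = probability_to_odds(0.01)
# Converting back recovers the original probability (up to rounding).
recovered = odds_to_probability(rare_event_odds)
```

Calling `probability_to_odds(0.67)` gives roughly 2, matching the "2 to 1" example above.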
Probability Assessment
Probability assessment means asking a person to use a number to express
how strongly he/she believes that an event will occur.
When to estimate probability:
If the probability of a disease is very low, doing nothing will be the best
choice. Treating without further testing is the best choice if the
probability of the target condition is relatively high. Testing (getting
more information) is best when the probability of disease is intermediate.
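The three-way choice between doing nothing, testing, and treating can be sketched as a comparison of the estimated probability against two cut-offs. The threshold values below are hypothetical placeholders chosen only for illustration. In practice they depend on the disease, the test, and the treatment.

```python
def choose_action(p_disease: float, test_threshold: float, treat_threshold: float) -> str:
    """Pick an action from the estimated probability of disease.

    Below test_threshold: do nothing. Above treat_threshold: treat.
    In between: get more information by testing.
    """
    if not 0.0 <= test_threshold <= treat_threshold <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= test <= treat <= 1")
    if p_disease < test_threshold:
        return "do nothing"
    if p_disease > treat_threshold:
        return "treat"
    return "test"


# Hypothetical thresholds (assumptions, not clinical recommendations).
TEST_THRESHOLD = 0.05
TREAT_THRESHOLD = 0.80

action = choose_action(0.30, TEST_THRESHOLD, TREAT_THRESHOLD)
```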
How to estimate probability:
Subjective Probability Assessment
Objective Probability Estimates
Probability Assessment- The source of information
1. Personal experience: When estimating probability, a clinician
relies on personal experience with similar events. For example, a
surgeon uses her experience with similar patients when she
estimates the probability that Mr. Jones will survive an open heart
2. Published experience: Published articles report the frequency of
death after surgical procedures. These reports provide an average
frequency for a large but not necessarily diverse population,
raising questions about its applicability to a specific population.
3. Attributes of the patient: The experienced clinician uses
published reports and personal experience to make an estimate that
applies to the average patient. She/he then adjusts the estimate
upward or downward starting from this average figure if the
patient has unusual characteristics that might affect his risk (e.g.,
advanced age or many chronic conditions).
Subjective Probability Assessment
It is based on personal experience and on unconscious
mental processes that have been described and studied
by cognitive psychologists. These processes are termed
cognitive heuristics. A cognitive heuristic is a mental
process by which we learn, recall, or process
information; we can think of heuristics as rules of thumb
(guide or principle based on experience or practice). We
may make mistakes in estimating probability in
deceptive clinical situations.
Three heuristics have been identified as important in estimation of
probability:
1) Representativeness. The probability that A belongs to class B is judged
by the degree to which A is representative of, or similar to, B. For
instance, what is the probability that this patient who has a swollen leg
belongs to the class of patients who have blood clots? If the patient
has all the classic findings (signs and symptoms) associated with
a blood clot, the clinician judges that the patient is highly likely
to have a blood clot. Difficulties occur with the use of this
heuristic when the disease is rare (very low prior probability, or
prevalence); when the clinician’s previous experience with the
disease is atypical, thus giving an incorrect mental representation;
when the patient’s clinical profile is atypical; and when the
probability of certain findings depends on whether other findings
are present.
(More examples can be found in Medical Decision Making, Harold C. Sox, Michael
C. Higgins, Douglas K. Owens, 2nd Ed., 2013, pp. 38-44.)
2) Availability. Our estimate of the probability of an event is influenced by the
ease with which we remember similar events. Events more easily remembered
are judged more probable; this rule is the availability heuristic, and it is often
misleading. We remember dramatic, atypical, or emotion-laden events more
easily and therefore are likely to overestimate their probability. A clinician
who had cared for a patient who had a swollen leg and who then died from a
blood clot would vividly remember thrombosis as a cause of a swollen leg.
The clinician would remember other causes of swollen legs less easily, and he
or she would tend to overestimate the probability of a blood clot in patients
with a swollen leg.
3) Anchoring and adjustment. A clinician makes an initial probability
estimate (the anchor) and then adjusts the estimate based on further
information. For instance, the clinician makes an initial estimate of the
probability of heart disease as 0.5. If he or she then learns that all the patient’s
brothers had died of heart disease, the clinician should raise the estimate
because the patient’s strong family history of heart disease increases the
probability that he or she has heart disease, a fact the clinician could ascertain
from the literature. The usual mistake is to adjust the initial estimate (the
anchor) insufficiently in light of the new information.
Objective Probability Estimates
Published research results can serve as a guide for more objective estimates
of probabilities.
We can use the prevalence of disease in the population or in a subgroup of
the population, or clinical prediction rules, to estimate the probability of
disease.
Estimates of disease prevalence in a defined population often are available
in the medical literature.
The prevalence of a disease in patients who have in common a symptom,
physical finding, or diagnostic test result helps a clinician to diagnose the
disease.
prostate nodule, can be used to place patients into a clinical subgroup in
which the probability of disease is known. This approach may be limited by
difficulty in placing a patient in the correct clinically defined subgroup,
especially if the criteria for classifying patients are ill-defined. A trend has
been to develop guidelines, known as clinical prediction rules, to help
clinicians assign patients to well-defined subgroups in which the
probability of disease is known.
Objective Probability Estimates– Example
A medical student evaluates a young man with abdominal pain. She
is concerned about the possibility of appendicitis. The pain is present
throughout the abdomen and is associated with loose bowel
movements. The patient does not have localized abdominal
tenderness, fever, or an increased blood leukocyte count. The
medical student presents the patient to the chief surgical resident
who, to the student’s surprise, discharges the patient from the
emergency room.
The chief surgical resident knows that the prevalence of appendicitis
among self-referred adult males with abdominal pain is only 1%. The
student should use this information as a starting point as she uses the
patient’s clinical findings to estimate the probability of appendicitis. If the
history and physical examination do not suggest appendicitis, the
probability of appendicitis is very low, since it was 1% in the average
patient with abdominal pain. If the examination does suggest appendicitis,
the student’s estimate of probability must reflect the low prevalence of
appendicitis in all men with abdominal pain.
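The effect of a low prevalence on the final estimate can be illustrated with Bayes’ theorem. The sketch below starts from the 1% prevalence in the example. The sensitivity and specificity of the "suggestive examination" are hypothetical values chosen for illustration and do not come from the source.

```python
def post_test_probability(prior: float, sensitivity: float, specificity: float,
                          positive: bool = True) -> float:
    """Probability of disease after a finding, computed with Bayes' theorem.

    prior:       pre-test probability (for example, the prevalence)
    sensitivity: p[finding present | disease]
    specificity: p[finding absent | no disease]
    positive:    True if the finding is present, False if it is absent
    """
    if positive:
        p_given_disease = sensitivity
        p_given_no_disease = 1.0 - specificity
    else:
        p_given_disease = 1.0 - sensitivity
        p_given_no_disease = specificity
    numerator = p_given_disease * prior
    return numerator / (numerator + p_given_no_disease * (1.0 - prior))


PREVALENCE = 0.01    # from the example: 1% of self-referred adult males
SENSITIVITY = 0.90   # hypothetical value, for illustration only
SPECIFICITY = 0.90   # hypothetical value, for illustration only

p_if_suggestive = post_test_probability(PREVALENCE, SENSITIVITY, SPECIFICITY, positive=True)
p_if_not_suggestive = post_test_probability(PREVALENCE, SENSITIVITY, SPECIFICITY, positive=False)
```

Under these assumed values, a suggestive examination raises the probability from 1% to only about 8%, which is why the estimate must reflect the low starting prevalence.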
Objective Probability Estimates
Clinical prediction rules are developed from systematic study of
patients who have a particular diagnostic problem; they define how
clinicians can use combinations of clinical findings to estimate
probability. Clinical prediction rules are statistical models of the
diagnostic process. The symptoms or signs that make an
independent contribution to the probability that a patient has a
disease are identified and assigned numerical weights based on
statistical analysis of the finding’s contribution (regression analysis
assigns a numerical weight to each predictor). The result is a list of
symptoms and signs for an individual patient, each with a
corresponding numerical contribution to a total score (the discriminant
score). The total score places a patient in a subgroup with a known
probability of disease.
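The scoring mechanism of a clinical prediction rule can be sketched as a weighted sum followed by a table lookup. The finding names, weights, and score bands below are hypothetical placeholders and do not reproduce any published rule. A real rule takes its weights from a regression analysis of patient data.

```python
# Hypothetical weights per finding (placeholders, not a published rule).
WEIGHTS = {"finding_a": 5, "finding_b": 3, "finding_c": 2}

# Hypothetical score bands: (lowest score, highest score, probability of disease).
RISK_BANDS = [(0, 2, 0.05), (3, 6, 0.20), (7, 10, 0.60)]


def total_score(findings_present) -> int:
    """Add the weights of all findings that are present in the patient."""
    return sum(WEIGHTS[name] for name in findings_present)


def probability_for_score(score: int) -> float:
    """Look up the probability of disease for the subgroup the score falls in."""
    for low, high, probability in RISK_BANDS:
        if low <= score <= high:
            return probability
    raise ValueError(f"score {score} is outside the defined bands")


patient_findings = {"finding_a", "finding_c"}
score = total_score(patient_findings)
estimated_probability = probability_for_score(score)
```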
Clinical prediction rules – Example
Ms. Troy, a 65-year-old woman who had a heart attack 4 months ago,
has an abnormal heart rhythm (arrhythmia), is in poor medical
condition, and is about to undergo elective surgery.
What is the probability that Ms. Troy will suffer a cardiac complication?
Table 3.1 lists clinical findings and their corresponding diagnostic
weights. We add the diagnostic weights for each of the patient’s
clinical findings to obtain the total score. The total score places the
patient in a group with a defined probability of cardiac complications,
as shown in Table 3.2. Ms. Troy receives a score of 20; thus, the
clinician can estimate that the patient has a 27 % chance of
developing a severe cardiac complication.
Recursive partitioning
Recursive partitioning is a statistical process that leads to an
algorithm for classifying patients. In recursive partitioning, the
diagnostic process is represented by a series of yes–no decision
points. If a patient has a finding, he is placed in one group; if not, he
is placed in a second group. Each of the two groups resulting from
the first yes–no decision point is subjected to a second yes–no
question about another finding. The process continues until it
reaches a pre-defined stopping point. The goal of the process is to
place each patient into a group in which the prevalence of disease is
either very high or very low. Typically, the finding that is used at
each yes–no decision point is the one that best discriminates
between the diseased and non-diseased patients at that point in
the partitioning process.
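The partitioning procedure can be sketched as a short recursive function. This is a simplified illustration. It picks the finding with the largest difference in disease prevalence between the yes and no groups, whereas statistical packages typically use impurity measures such as the Gini index. The stopping thresholds, finding names, and toy patients are all assumptions.

```python
def prevalence(patients):
    """Fraction of patients in the group who have the disease."""
    return sum(1 for p in patients if p["disease"]) / len(patients)


def best_finding(patients, findings):
    """Return the finding whose yes/no split separates prevalence the most."""
    best, best_gap = None, 0.0
    for finding in findings:
        yes = [p for p in patients if p[finding]]
        no = [p for p in patients if not p[finding]]
        if not yes or not no:
            continue  # this finding does not split the group
        gap = abs(prevalence(yes) - prevalence(no))
        if gap > best_gap:
            best, best_gap = finding, gap
    return best


def partition(patients, findings, low=0.1, high=0.9, min_size=2):
    """Recursively split patients until prevalence is very high or very low."""
    prev = prevalence(patients)
    leaf = {"prevalence": prev, "n": len(patients)}
    if prev <= low or prev >= high or len(patients) < min_size or not findings:
        return leaf  # pre-defined stopping point reached
    finding = best_finding(patients, findings)
    if finding is None:
        return leaf
    remaining = [f for f in findings if f != finding]
    return {
        "finding": finding,
        "yes": partition([p for p in patients if p[finding]], remaining, low, high, min_size),
        "no": partition([p for p in patients if not p[finding]], remaining, low, high, min_size),
    }


# Toy data (hypothetical): disease tracks finding_x and is unrelated to finding_y.
toy_patients = [
    {"finding_x": True, "finding_y": True, "disease": True},
    {"finding_x": True, "finding_y": False, "disease": True},
    {"finding_x": False, "finding_y": True, "disease": False},
    {"finding_x": False, "finding_y": False, "disease": False},
]
tree = partition(toy_patients, ["finding_x", "finding_y"])
```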
Recursive partitioning – Example
A modified version of recursive partitioning was used to categorize
adults with a sore throat as having a high, medium, or low
probability of having a beta hemolytic streptococcal infection
(Figure 3.8). According to some authors, patients with a high
likelihood of infection should have treatment without obtaining a
throat culture. In patients with a low likelihood of infection, neither
throat culture nor treatment may be indicated.
Decision Tree
The decision tree is a method for representing and comparing
the expected outcomes of each decision alternative. It is one
way to display an algorithm. It can be used when the
outcomes are uncertain, e.g., when the results of a surgical operation
are unknown. This technique helps clinicians to clarify the
decision problem and thus to choose the alternative that is
most likely to help the patient.
There are two available therapies for a fatal illness. The length of a patient’s life
after either therapy is unpredictable, as illustrated by the frequency distribution
shown in Fig. 3.6 and summarized in Table 3.5. Each therapy is associated
with uncertainty: regardless of which therapy a patient receives, he will die by the
end of the fourth year, but there is no way to know which year will be the patient’s
last. Figure 3.6 shows that survival until the fourth year is more likely with
therapy B, but the patient might die in the first year with therapy B or might
survive to the fourth year with therapy A. Which of the two therapies is preferable?
Decision tree - component
A decision tree diagram consists of 3 types of nodes:
1) Decision nodes - commonly represented by squares
2) Chance nodes - represented by circles
3) End (terminal) nodes or Outcomes - represented by triangles or
solid circles
A chance node is shown as a circle from which several lines
emanate. Each line represents one of the possible outcomes.
An example of decision tree
The example of decision tree – A chance node
The outcome of a chance event, unknowable for the individual, can be
represented by the expected value at the chance node.
Decision Tree – Expected-value Decision Making
Calculating Expected Value: The term expected value is used to
characterize a chance event, such as the outcome of a therapy. If the
outcomes of a therapy are measured in units of duration of survival,
units of sense of well-being, or dollars, the therapy is characterized
by the expected duration of survival, expected sense of well-being,
or expected monetary cost that it will confer on, or incur for, the
patient, respectively.
To use expected-value decision making, we follow this strategy
when there are therapy choices with uncertain outcomes:
(1) calculate the expected value of each decision alternative and then
(2) pick the alternative with the highest expected value.
Calculating Expected Value – Example 1
We use the average duration of life after therapy (survival) as a
criterion for choosing among therapies. The first step we take in
calculating the mean survival for a therapy is to divide the
population receiving the therapy into groups of patients who have
similar survival rates. Then, we multiply the survival time in each
group by the fraction of the total population in that group. Finally,
we sum these products over all possible survival values.
Mean survival for therapy A
A = (0.2 × 1) + (0.4 × 2) + (0.3 × 3) + (0.1 × 4) = 2.3 years.
Mean survival for therapy B
B = (0.05 × 1) + (0.15 × 2) + (0.45 × 3) + (0.35 × 4) = 3.1 years.
Because therapy B has the higher mean survival, we should select therapy B.
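The two mean-survival calculations can be reproduced with a short Python sketch. The survival distributions are the ones used in the equations above. The function and variable names are our own.

```python
def expected_value(distribution):
    """Expected value of a chance event given (probability, value) pairs."""
    total_probability = sum(p for p, _ in distribution)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * value for p, value in distribution)


# (fraction of patients, years of survival) for each therapy, from Example 1.
therapy_a = [(0.20, 1), (0.40, 2), (0.30, 3), (0.10, 4)]
therapy_b = [(0.05, 1), (0.15, 2), (0.45, 3), (0.35, 4)]

mean_survival = {
    "A": expected_value(therapy_a),  # about 2.3 years
    "B": expected_value(therapy_b),  # about 3.1 years
}

# Expected-value decision making: pick the alternative with the highest value.
preferred = max(mean_survival, key=mean_survival.get)
```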
Calculating Expected Value – Example 2
The person who faced this decision was a 60-year-old man we will call ‘‘Hank.’’ Hank
had a history of eczema. Because of this chronic condition, he was unconcerned when a
rash first appeared near his anus. However, the persistent discomfort eventually led
Hank to seek medical attention. His dermatologist performed a biopsy, which showed
that a rare skin cancer, called perianal Paget’s disease, was causing Hank’s newly
discovered rash. This disease starts in the epidermis but often will metastasize.
Therefore, the dermatologist referred Hank to an oncologist for treatment.
First treatment alternative – traditional surgery: In this case Hank would lose the
function of his rectum and be forced to live the remainder of his life with a colostomy bag.
Second treatment alternative – microscopically directed surgery: the resections
might stop short of the anal mucosa, thereby avoiding the risk of a colostomy.
Third treatment alternative – do nothing: This alternative would leave him with untreated
local disease, which ultimately could result in an invasive cancer, metastases, and death.
Expected value analysis captures only a single
aspect of what can happen to Hank – the length of his
life. It ignores the impact of the decision on other
factors, such as his quality of life.
Sensitivity Analysis
Sensitivity analysis is the systematic exploration of how the value of one or more
parameters will affect the decision-making implications of a model.
This tool can support the validity of decision analysis by revealing
how changes in the probabilities will affect the conclusions of the analysis.
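A one-way sensitivity analysis can be sketched by sweeping a single probability and recording which alternative has the higher expected value at each step. The decision problem below (a treatment with an uncertain success probability versus no treatment) and all of its outcome values are hypothetical, chosen only to show the mechanics.

```python
def expected_value_treat(p_success, value_success, value_failure):
    """Expected value of treating when success has probability p_success."""
    return p_success * value_success + (1.0 - p_success) * value_failure


def one_way_sensitivity(p_values, value_success, value_failure, value_no_treat):
    """For each probability, report the expected value and preferred choice."""
    results = []
    for p in p_values:
        ev = expected_value_treat(p, value_success, value_failure)
        choice = "treat" if ev > value_no_treat else "no treatment"
        results.append((p, ev, choice))
    return results


def threshold_probability(value_success, value_failure, value_no_treat):
    """Success probability at which both alternatives have equal expected value."""
    return (value_no_treat - value_failure) / (value_success - value_failure)


# Hypothetical outcome values in years of life (assumptions for illustration).
VALUE_SUCCESS, VALUE_FAILURE, VALUE_NO_TREAT = 10.0, 1.0, 4.0

sweep = one_way_sensitivity([0.2, 0.5, 0.8], VALUE_SUCCESS, VALUE_FAILURE, VALUE_NO_TREAT)
threshold = threshold_probability(VALUE_SUCCESS, VALUE_FAILURE, VALUE_NO_TREAT)
```

If the preferred choice stays the same across the plausible range of the probability, the conclusion is robust. If it flips near the threshold, the estimate of that probability deserves closer scrutiny.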
Folding Back: evaluating a decision tree when the problem includes
more than one decision node
Algorithm for folding back a decision tree
1. Start with the most distal nodes.
2. Replace each chance node with its expected value:
   p1·y1 + p2·y2 + p3·y3 + . . . + pN·yN
   where p1, p2, p3, . . . , pN are the probabilities for the possible outcomes and
   y1, y2, y3, . . . , yN are the corresponding values associated with the outcomes.
3. Replace each decision node with the maximum expected value for the possible
   alternatives:
   Maximum of x1, x2, x3, . . . , xM
   where x1, x2, x3, . . . , xM are the expected values for the possible alternatives.
4. Repeat until the initial node is reached.
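The folding-back steps can be sketched as a recursive function over a tree of nested dictionaries. The node representation (`type`, `branches`, `options`) is our own assumption, not a standard format. The example tree reuses the survival figures for therapies A and B from Example 1.

```python
def fold_back(node):
    """Return the expected value of a decision tree node by folding back."""
    kind = node["type"]
    if kind == "outcome":
        return node["value"]
    if kind == "chance":
        # Replace the chance node with its expected value: sum of p_i * y_i.
        return sum(p * fold_back(child) for p, child in node["branches"])
    if kind == "decision":
        # Replace the decision node with the maximum over its alternatives.
        return max(fold_back(child) for child in node["options"].values())
    raise ValueError(f"unknown node type: {kind}")


def best_option(decision_node):
    """Name of the alternative with the highest folded-back expected value."""
    options = decision_node["options"]
    return max(options, key=lambda name: fold_back(options[name]))


def outcome(value):
    return {"type": "outcome", "value": value}


def chance(*branches):
    return {"type": "chance", "branches": list(branches)}


tree = {
    "type": "decision",
    "options": {
        "therapy A": chance((0.20, outcome(1)), (0.40, outcome(2)),
                            (0.30, outcome(3)), (0.10, outcome(4))),
        "therapy B": chance((0.05, outcome(1)), (0.15, outcome(2)),
                            (0.45, outcome(3)), (0.35, outcome(4))),
    },
}

root_value = fold_back(tree)   # about 3.1 years
choice = best_option(tree)     # "therapy B"
```

Because the function recurses into every branch, a decision node nested below a chance node is resolved before its parent, which is what "start with the most distal nodes" requires.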
Any computer system, which utilizes clinical data or
medical knowledge to help health care practitioners to
make a decision, can be considered as a Clinical
Decision Support System (CDSS).
CDSSs are computer systems designed to impact
clinician decision making about individual patients at the
point in time that these decisions are made. In other words, they are
designed to assist clinicians at the point of care.
Systems that provide CDS do not simply assist with the
retrieval of relevant information; they communicate
information that takes into consideration the particular
clinical context, offering situation-specific information
and recommendations. At the same time, such systems
do not themselves perform clinical decision making;
they provide relevant knowledge and analyses that
enable the ultimate decision makers, clinicians, patients,
and health care organizations to develop more informed
If used properly, CDSSs have the potential to change the
way medicine has been taught and practiced.
To influence:
• physician behavior,
• diagnostic test ordering
• other care processes
• costs of care
• clinical outcomes
Although the use of CDSSs is still not widely accepted
among clinicians, they have been applied in many fields.
CDSSs were used as:
• Diagnostic systems
• Reminder and Alert systems
• Disease management systems
• Drug-dosing or Prescribing systems
• In Outpatient services for diverse purposes such as
prevention/screening, drug dosing, acute disease
management, and chronic disease management
CDS systems may be described in terms
of five right things that they do: they “provide
I. the right information,
II. to the right person,
III. in the right format,
IV. through the right channel,
V. at the right point
in workflow to improve health and health care decisions and outcomes.”
Systems that provide CDS come in three basic varieties:
1) They may use information about the current clinical context to
retrieve highly relevant online documents, as with so-called
2) They may provide patient-specific, situation-specific alerts,
reminders, physician order sets, or other recommendations for
direct action.
3) They may organize and present information in a way that
facilitates problem solving and decision making, as in dashboards,
graphical displays, documentation templates, structured reports,
and order sets.
Order sets are a good example of the latter because they both may
facilitate decision making by providing a mnemonic function and also
may enhance workflow by providing a means to select a group of
relevant activities quickly. Many observers consider knowledge
resources that distill the medical literature and that facilitate manual
selection of content relevant to the current situation to be simple
decision-support systems.
• The timing at which they provide support: before, during,
or after the clinical decision is made.
• Whether the support is active or passive, that is, whether the
CDSS actively provides alerts or passively responds to
physician input or patient-specific information.
• Whether they are stand-alone systems or part of noncommercial
computer-based patient record systems or computerized physician
order entry (CPOE) systems.
• Whether they are knowledge-based systems or non-knowledge-based
systems that employ machine learning and other
statistical pattern recognition approaches.
A Knowledge-based system (KBS) is a computer program that reasons and
uses a knowledge base to solve complex problems.
A program that symbolically encodes, in a knowledge base, facts, heuristics,
and models derived from experts in a field and uses that knowledge to
provide problem analysis or advice that the expert might have provided if
asked the same question.
Many of today’s knowledge-based CDSS arose out of earlier expert systems.
Many of the earliest systems were diagnostic decision support systems. The
intent of these CDSS was no longer to simulate an expert’s decision making,
but to assist the clinician in his or her own decision making. The system was
expected to provide information for the user, rather than to come up with
“the answer,” as was the goal of earlier expert systems.
There are three parts to most CDSS.
These parts are:
I. the knowledge base,
II. the inference or reasoning engine,
III. and a mechanism to communicate
with the user
The knowledge base consists of compiled information that is often, but not always,
in the form of if–then rules (rule - based system).
The rule structure contains an antecedent and a consequent. The general form of
the rule is ‘IF {condition} THEN {statement}’.
An example of an if–then rule might be, for instance, IF a new order is placed for a
particular blood test that tends to change very slowly, AND IF that blood test was
ordered within the previous 48 hours, THEN alert the physician. In this case, the
rule is designed to prevent duplicate test ordering.
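The duplicate-order rule above can be sketched as a small Python function. The data layout (a list of earlier orders and a set of slow-changing tests) and the test name `test_x` are assumptions made for illustration. A real CDSS would read this information from the patient record.

```python
from datetime import datetime, timedelta

DUPLICATE_WINDOW = timedelta(hours=48)  # from the rule: "within the previous 48 hours"


def duplicate_order_alert(test_name, new_order_time, previous_orders, slow_changing_tests):
    """Return an alert string if the rule fires, otherwise None.

    previous_orders:     list of (test_name, order_time) tuples
    slow_changing_tests: set of test names that tend to change very slowly
    """
    # IF the new order is for a slow-changing test ...
    if test_name not in slow_changing_tests:
        return None
    # ... AND IF the same test was ordered within the previous 48 hours ...
    for previous_name, previous_time in previous_orders:
        elapsed = new_order_time - previous_time
        if previous_name == test_name and timedelta(0) <= elapsed <= DUPLICATE_WINDOW:
            # ... THEN alert the physician.
            return f"Alert: {test_name} was already ordered at {previous_time:%Y-%m-%d %H:%M}"
    return None


now = datetime(2024, 1, 3, 12, 0)
history = [("test_x", datetime(2024, 1, 2, 12, 0))]  # hypothetical earlier order
alert = duplicate_order_alert("test_x", now, history, {"test_x"})
```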
Other types of knowledge bases might include probabilistic associations of signs and
symptoms with diagnoses, or known drug–drug or drug–food interactions.
The second part of the CDSS is called the inference
engine or reasoning mechanism, which contains the
formulas for combining the rules or associations in
the knowledge base with actual patient data. It
operates on the rules to generate the necessary
Finally, there has to be a communication
mechanism, a way of getting the patient data into
the system and getting the output of the system to
the user who will make the actual decision.
The decision support system’s knowledge base
contains information about diseases and their signs
and symptoms. The inference engine maps the patient
signs and symptoms to those diseases and might
suggest some diagnoses for the clinicians to consider.
These systems generally do not generate only a
single diagnosis, but usually generate a set of
diagnoses based on the available information.
Because the clinician often knows more about the
patient than can be put into the computer, the clinician
will be able to eliminate some of the choices. Most of
the diagnostic systems have been stand-alone
Some CDSS are part of computerized
physician order entry (CPOE) systems. They take
a new medication order and the patient’s current
medications as input; the knowledge base might
include a drug database, and the output would
be an alert about drug
interactions so that the physician could change
the order.
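The CPOE check described above can be sketched as a lookup of the new order against each current medication. The interaction table is a hypothetical stand-in for a drug database. The drug names and the interaction note are placeholders and not clinical facts.

```python
# Hypothetical knowledge base: unordered drug pairs mapped to an interaction note.
INTERACTIONS = {
    frozenset({"drug_a", "drug_b"}): "placeholder interaction description",
}


def interaction_alerts(new_drug, current_medications):
    """Return one alert per current medication that interacts with the new order."""
    alerts = []
    for current in current_medications:
        note = INTERACTIONS.get(frozenset({new_drug, current}))
        if note is not None:
            alerts.append(f"{new_drug} + {current}: {note}")
    return alerts


# Input: a new medication order plus the patient's current medications.
alerts = interaction_alerts("drug_b", ["drug_a", "drug_c"])
```

Using `frozenset` keys makes the lookup independent of the order in which the two drugs are named.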
Unlike knowledge-based decision support systems,
some of the non-knowledge-based CDSS use a form of
artificial intelligence called machine learning, which
allows the computer to learn from past experiences
and/or to recognize patterns in the clinical data.
Artificial neural networks and genetic algorithms are
two types of non-knowledge-based systems.
Top-down systems use rules, typically derived from
Bottom-up systems use tools like neural networks or
machine learning in which “smart” software can find novel
or unexpected information by analyzing large datasets for
Top-down systems typically require on-going maintenance
and supervision, whereas bottom-up systems can be self-teaching.
The use of diagnostic systems dates back many years.
Starting in the late 1960s, F. T. de Dombal and his associates
at the University of Leeds developed the Leeds abdominal pain system,
which used sensitivity, specificity, and disease-prevalence data for
various signs, symptoms, and test results to calculate, using
Bayes’ theorem, the probability of seven possible explanations
for acute abdominal pain (appendicitis, diverticulitis, perforated
ulcer, cholecystitis, small-bowel obstruction, pancreatitis, and
nonspecific abdominal pain), with surgical or pathologic
diagnoses serving as the gold standard.
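A Bayesian calculation over several competing diagnoses can be sketched as follows. This is not the Leeds implementation. It is a minimal illustration that assumes the findings are conditionally independent given the diagnosis (a naive Bayes simplification), and every name and number in it is hypothetical.

```python
def posterior_probabilities(priors, likelihoods, findings):
    """Apply Bayes' theorem across mutually exclusive, exhaustive diagnoses.

    priors:      {diagnosis: prior probability (prevalence)}
    likelihoods: {diagnosis: {finding: p[finding | diagnosis]}}
    findings:    list of findings observed in the patient
    """
    unnormalized = {}
    for diagnosis, prior in priors.items():
        joint = prior
        for finding in findings:
            joint *= likelihoods[diagnosis][finding]  # independence assumption
        unnormalized[diagnosis] = joint
    total = sum(unnormalized.values())
    return {diagnosis: value / total for diagnosis, value in unnormalized.items()}


# Hypothetical two-diagnosis example (placeholder numbers).
priors = {"dx_1": 0.1, "dx_2": 0.9}
likelihoods = {
    "dx_1": {"finding_1": 0.8},
    "dx_2": {"finding_1": 0.1},
}
posterior = posterior_probabilities(priors, likelihoods, ["finding_1"])
```

With these placeholder values, observing `finding_1` raises the probability of `dx_1` from 0.1 to about 0.47, because the finding is eight times more likely under `dx_1` than under `dx_2`.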
In 1972 and 1973, Internist-1, a
diagnostic consultant system, was designed by
Myers, Pople, and Miller at the University of
Pittsburgh, Pennsylvania, United States. The
system was introduced as a stand-alone,
knowledge-based expert system.
In 1985-1986, the QMR (Quick Medical Reference) system was
developed based on Internist-1. The system included about
600 diseases and 4500 clinical findings or disease manifestations
related to the diseases in general internal medicine. The clinical
findings were obtained from medical history, symptoms, signs, and
laboratory results. QMR used a many-to-many relationship data
model between the diseases and the findings; for example, fever,
as a disease manifestation, was associated with multiple diseases.
QMR was designed to ask questions in order to
formulate a differential diagnosis, in a passive role.
MYCIN, a stand-alone system, was developed at Stanford University in the
mid-1970s by Dr. Edward Shortliffe as a knowledge-based expert
system to diagnose and recommend suitable treatments for
infectious diseases. Artificial intelligence (AI) techniques were used to
design the MYCIN program, and the inference engine was based on a
set of ‘IF-THEN’ rules. As a consultation system, MYCIN helped
clinicians decide on the best option to take, with a
passive role in giving advice; for example, it determined the bacterial
cause of bacteremia and meningitis and recommended proper
antibiotics and dosages.
In the 1980s DXplain was developed at the
Massachusetts General Hospital, Boston, United
States as a diagnosis support system to aid
practitioners in making a differential diagnosis based
on manifestations entered into the system. DXplain
was an expert knowledge-based system containing
about 5000 clinical manifestations (signs, symptoms,
medical laboratory results) associated with about
2200 diseases.
LDS (Latter-day Saints) Hospital in Salt Lake City, Utah, United States,
established a hospital information system under the name HELP (Health
Evaluation through Logical Processing) in the 1970s. Unlike stand-alone
diagnostic systems such as QMR, MYCIN, and DXplain, the HELP
system was an attempt to integrate a diagnosis system with the
electronic patient record (EPR). Therefore, if an abnormality was
documented in the patient record, the HELP system, acting as a
monitoring program, could alert users. For example, if a patient who is
allergic to penicillin is prescribed a drug in the penicillin class, the
HELP system generates an alert and recommends an alternative plan
that may be preferable. The
HELP system provided a model of an active role in the clinical decision
Providing alerts/reminders automatically as part of the workflow;
Providing the suggestions at a time and location where the
decisions were being made;
Providing actionable recommendations;
Computerizing the entire process.
How the data are entered.
The development and maintenance of the knowledge base
The vocabulary and user interface.
Since these systems may represent a change in the usual way
patient care.
Guidelines for Selecting and Implementing Clinical
Decision Support Systems
Assuring that users understand the limitations: for example, when
the knowledge base and/or reasoning mechanism of the CDSS is
not transparent to the user.
Assuring that the knowledge is from reputable sources: for
example, what rules are actually included in the system, and what
is the evidence behind the rules?
Assuring that the system is appropriate for the local site: for example,
does the clinical vocabulary in the system match that in the EMR?
What normal values are assumed by a system that alerts on
abnormal laboratory tests?
Assuring that users are properly trained: for example, vendors of
CDSS need to be clear about what expertise is assumed in using
the system. Clinician training is needed for physicians to use the
system appropriately.
Assuring the knowledge base is monitored and maintained: for
example, alerts must be calibrated to warn the user often enough to
prevent serious errors, and the responsibility for updating the
knowledge base in a timely manner must be assigned, because new
diseases are discovered and new medications come on the market.
Soft computing is a term applied to a field within computer science
which is characterized by the use of inexact solutions to
computationally hard tasks. The solutions are unpredictable and
uncertain. Earlier computational approaches could model and
precisely analyze only relatively simple systems. More complex
systems arising in biology, medicine, the humanities, management
Fuzzy logic, neural networks, genetic algorithms, and Bayesian
networks (a probabilistic approach) are discussed under the field
of soft computing.
Fuzzy logic is modeled on the human reasoning process.
Fuzzy logic was presented by Zadeh (Professor Dr. Lotfali
Askar Zadeh).
In order to implement or simulate fuzzy systems, it is almost
unavoidable to write computer programs. MATLAB is used
exclusively for simulations due to its ease of programming,
matrix manipulation, and plotting. MATLAB is a high-level
language and interactive environment for numerical
computation, visualization, and programming. Using MATLAB,
you can analyze data, develop algorithms, and create models
and applications.
Fuzzy identification and control methods are used in
many systems. Some automobile manufacturers use
fuzzy logic to control automatic braking systems, and
some washing machines employ a fuzzy system to control
turbidity in the wash water. Fuzzy set theory and fuzzy
logic are a highly suitable and applicable basis for
developing knowledge-based systems in medicine for
tasks such as the interpretation of sets of medical
findings and the diagnosis of diseases.
A fuzzy set is a collection of real numbers having partial
membership in the set. This is in contrast with conventional, or
crisp sets, to which a number can belong or not belong, but not
partially belong. A crisp set can be described by a characteristic
function μ: X → {0,1}. A fuzzy set can be described as a function
μ: X → [0,1]. The value μ(x) indicates the degree to which x has
the property.
Total membership in the set is specified by a membership value of
1, absolute exclusion from the set is specified by a membership
value of 0, and partial membership in the set is specified by a
membership value between 0 and 1.
An example of representing the medical concept “high fever” as a fuzzy set is
illustrated in the figure.
a) If x is greater than 39 °C, then the membership function μ(x) of the medical
concept “high fever” is 1, meaning that x surely has “high fever”.
b) If x is less than 38.5 °C, then the membership function μ(x) of the medical
concept “high fever” is 0, meaning that x surely does not have “high fever”.
c) If x is in the interval [38.5 °C, 39 °C], then x has the property “high
fever” with some degree in [0,1].
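The three cases above can be sketched as a small membership function. This is only an illustration: the thresholds 38.5 °C and 39 °C come from the example, but the linear rise between them is an assumption, since the shape of the curve in the figure is not reproduced here. The function name is hypothetical.

```python
def high_fever_membership(temp_c: float) -> float:
    """Degree (0..1) to which a temperature in deg C belongs to 'high fever'."""
    low, high = 38.5, 39.0  # thresholds taken from the example
    if temp_c <= low:
        return 0.0  # surely not "high fever"
    if temp_c >= high:
        return 1.0  # surely "high fever"
    # Assumption: membership rises linearly between the two thresholds.
    return (temp_c - low) / (high - low)


for t in (38.0, 38.5, 38.75, 39.0, 40.0):
    print(t, high_fever_membership(t))
```

A crisp set would return only 0 or 1; the fuzzy set returns intermediate degrees for temperatures inside the interval.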
If the physical quantity under consideration is described by
a word rather than a symbol, that word is called a linguistic
variable. Fuzzy sets are also usually given linguistic
names, called linguistic values. For instance, for the
“height” variable, we could define a fuzzy set with the linguistic
value “tall”. We could also define fuzzy sets with the linguistic
values “short” and “medium”. Linguistic names are used
for variables and their values in fuzzy logic because
people usually think and speak in linguistic terms, not
mathematical symbols.
As another example, the linguistic values for the nutrition status
variable are severe malnutrition, moderate malnutrition,
mild malnutrition, and normal.
BMI value      Linguistic value
< 18.5         underweight
18.5 – 24.9    normal
25 – 30        overweight
> 30           obesity
A fuzzy logic model comprises four parts: the fuzzifier, the fuzzy rules,
the fuzzy inference engine, and the defuzzifier.
The fuzzifier is the first phase of fuzzy logic and is also known as
fuzzification or fuzzy classification. In this step, a crisp set of input
data is converted into a fuzzy variable by using fuzzy
linguistic values instead of the original numerical values and by
determining the type of membership function. The membership value is
considered from 0 to 1 for minimum and maximum membership
The fuzzy rule structure contains an antecedent and a
consequent. The general form of a rule is
‘IF {condition} THEN {statement}’. The logical operations
‘AND’, ‘OR’, and ‘NOT’ are also used to build up the rules. For example,
Rule: if (BMI is underweight) or (BMI is overweight) or (BMI is
obesity) then (the patient has malnutrition)
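The rule above can be evaluated numerically once each linguistic value has a membership function. The sketch below is illustrative only: the breakpoints of the BMI membership functions are hypothetical, and fuzzy ‘OR’ is taken as the maximum of the memberships, which is a common choice but not the only one.

```python
def ramp_up(x, a, b):
    """0 below a, 1 above b, linear in between."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def bmi_memberships(bmi):
    # Hypothetical breakpoints placed around the usual BMI cut-offs.
    obesity = ramp_up(bmi, 29.0, 31.0)
    return {
        "underweight": 1.0 - ramp_up(bmi, 17.5, 19.5),
        "overweight": min(ramp_up(bmi, 24.0, 26.0), 1.0 - obesity),
        "obesity": obesity,
    }


def malnutrition_degree(bmi):
    """Firing strength of: IF underweight OR overweight OR obesity THEN malnutrition."""
    m = bmi_memberships(bmi)
    return max(m["underweight"], m["overweight"], m["obesity"])  # fuzzy OR as max


for bmi in (16.0, 18.5, 22.0, 30.0, 35.0):
    print(bmi, malnutrition_degree(bmi))
```

With ‘AND’ the minimum would typically be used instead of the maximum, and ‘NOT’ would be 1 minus the membership.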
The inference engine interprets and reasons over the fuzzy rules
to generate the output from the input.
There are two types of fuzzy inference systems: the Mamdani
method and the Takagi-Sugeno-Kang (TSK) method. Both methods are
similar in many aspects.
The main difference between the Mamdani and the Sugeno
methods lies in the output membership functions: in the
Mamdani method the output is a fuzzy set that requires
defuzzification, while in the Sugeno method the output
membership functions are either linear or constant.
The defuzzifier, also called defuzzification, is the last
phase in the fuzzy logic model; it converts the output fuzzy
linguistic value back into a single crisp numerical
value. As shown in the figure, there are several
defuzzification methods, including the largest of max, the
centroid of area, the bisector of area, and the mean of
max. Centroid defuzzification is a popular method; it
returns the center of the area under the
membership function curve of the output variable.
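Three of the defuzzification methods named above can be sketched on a sampled output membership function. The example is a rough illustration: the output fuzzy set (a triangle peaking at 4 and clipped at 0.5) is hypothetical, and the centroid is approximated by a weighted average over the sample points rather than an exact integral.

```python
def centroid(xs, mus):
    """Center of area: membership-weighted average of the sample points."""
    total = sum(mus)
    if total == 0:
        raise ValueError("output fuzzy set is empty")
    return sum(x * m for x, m in zip(xs, mus)) / total


def maxima(xs, mus, tol=1e-9):
    """Sample points where the membership reaches its maximum."""
    peak = max(mus)
    return [x for x, m in zip(xs, mus) if peak - m <= tol]


def mean_of_max(xs, mus):
    points = maxima(xs, mus)
    return sum(points) / len(points)


def largest_of_max(xs, mus):
    return max(maxima(xs, mus))


# Hypothetical output set: triangle peaking at x = 4, clipped at 0.5.
xs = [float(x) for x in range(11)]
mus = [min(0.5, max(0.0, 1.0 - abs(x - 4.0) / 4.0)) for x in xs]

print("centroid:", centroid(xs, mus))
print("mean of max:", mean_of_max(xs, mus))
print("largest of max:", largest_of_max(xs, mus))
```

For a symmetric output set the centroid and the mean of max coincide; they differ when the set is skewed.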
Artificial Neural Networks (ANN)
The history of neural networking arguably started in the late 1800s
with scientific attempts to study the workings of the human brain. In
1890, William James published the first work about brain activity
patterns. In 1943, McCulloch and Pitts produced a model of the
neuron that is still used today in artificial neural networking. In 1949,
Donald Hebb published The Organization of Behavior, which outlined
a law for synaptic neuron learning.
MATLAB is an ideal tool for working with artificial neural networks for
a number of reasons. First, MATLAB is highly efficient in performing
vector and matrix calculations. Second, MATLAB comes with a
specialized Neural Network Toolbox which contains a number of
useful tools for working with artificial neural networks.
ANNs simulate human thinking and learn from examples. An
ANN consists of nodes called neurodes (which correspond to
neurons) and weighted connections (which correspond to
nerve synapses) that transmit signals between the neurodes in
a unidirectional manner.
An ANN contains three kinds of layers: the input layer, the output
layer, and the hidden (middle) layer. The input layer is the data
receiver and the output layer communicates the results, while
the hidden layer processes the incoming data and determines
the results.
An artificial neural network may be described as a set of
neurons or nodes $X_i$, each transforming its total or net input
$x_{in,i}$ into an output or activity $x_i$ according to an activation
function (or transfer function) $f$, so that $x_i = f(x_{in,i})$. Each
node $X_i$ sends its output to other units $X_j$ through connections,
each having a certain effectiveness or weight $w_{ij}$. The net input
to any unit $X_j$ is usually modelled as the sum of all the outputs
$x_i$ from other units (and, in recurrent nets, from itself), weighted
by the weights $w_{ij}$ of the respective connections:
$x_{in,j} = \sum_i w_{ij} x_i$.
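The net-input and activation step described above can be sketched in a few lines. This is a minimal illustration: the logistic (sigmoid) activation is an assumption, since the text does not fix a particular activation function, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic activation function (one possible choice of f)."""
    return 1.0 / (1.0 + math.exp(-z))


def node_output(inputs, weights, activation=sigmoid):
    """Weighted sum of incoming activities, passed through the activation."""
    net_input = sum(w * x for w, x in zip(weights, inputs))
    return activation(net_input)


def layer_output(inputs, weight_rows):
    """Outputs of a layer: one row of incoming weights per node."""
    return [node_output(inputs, row) for row in weight_rows]


print(node_output([1.0, 2.0], [0.5, -0.25]))          # net input is 0
print(layer_output([1.0, 0.0], [[0.0, 0.0], [2.0, 0.0]]))
```

In a feed-forward net, the list returned by `layer_output` would become the input of the next layer.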
The most important distinction is that between feed-forward
and feed-back (or recurrent) nets. In the former, the
information is thought to pass just once through the net,
starting in an input layer of units and ending in an output
layer. Between the input and the output layers, hidden layers
of neurons may exist, which as a rule enhances the
computational power of the ANN. Recurrent nets have
more complicated dynamics, with signals going back and
forth between the nodes for some time.
These systems can learn from examples when supplied with
known results for a large amount of data. The system will
study this information, make guesses for the correct output,
compare the guesses to the given results, find patterns that
match the input to the correct output, and adjust the weights
of the connections between the neurodes accordingly in
order to produce the correct results. This iterative process is
known as training the artificial neural network.
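The guess, compare, and adjust cycle described above can be sketched with a single artificial neuron. This is a toy illustration under stated assumptions: the training data (the logical OR of two inputs), the sigmoid activation, the bias term, the learning rate, and the number of passes are all hypothetical choices, and real networks have many neurons and layers.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, inputs):
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train(samples, epochs=2000, rate=0.5):
    """Repeatedly guess, compare with the known result, and adjust the weights."""
    weights, bias = [0.0] * len(samples[0][0]), 0.0
    history = []  # mean squared error per pass over the data
    for _ in range(epochs):
        squared_error = 0.0
        for inputs, target in samples:
            guess = predict(weights, bias, inputs)
            error = target - guess
            squared_error += error * error
            # Move each weight in the direction that reduces the error.
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
        history.append(squared_error / len(samples))
    return weights, bias, history


# Toy examples with known results (logical OR of two inputs).
OR_SAMPLES = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
              ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]

weights, bias, history = train(OR_SAMPLES)
print("error in first pass:", history[0])
print("error in last pass:", history[-1])
```

The error recorded in `history` shrinks as the weights are adjusted, which is what the iterative training process is meant to achieve.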
Taking myocardial infarction as an example, data on
a variety of signs and symptoms from large
numbers of patients who are known either to have or not to
have a myocardial infarction can be used to train the neural
network. Once the network is trained, i.e., once the weighted
associations of signs and symptoms with the diagnosis are
determined, the system can be used on new cases to
determine whether the patient has a myocardial infarction.
There are many advantages and disadvantages to using
artificial neural networks. Advantages include eliminating the
need to program IF–THEN rules and eliminating the need for
direct input from experts. ANNs can also process incomplete
data by inferring what the data should be and can improve
every time they are used because of their dynamic nature.
ANNs also do not require a large database to make
predictions about outcomes, but the more comprehensive the
training data set is, the more accurate the ANN is likely to be.
Even though all of these advantages exist, there are some
disadvantages. The training process involved can be time
consuming. ANNs follow a statistical pattern recognition
approach to derive their formulas for weighting and
combining data. The resulting formulas and weights are
often not easily interpretable, and the system cannot explain
or justify why it uses certain data the way it does, which can
make the reliability and accountability of these systems a
concern.
Despite the above concerns, artificial neural networks have
many applications in the medical field. In a review article on
the use of neural networks in health care, Baxt provides a
chart that shows various applications of ANNs, which
include the diagnosis of appendicitis, back pain, dementia,
myocardial infarction, psychiatric emergencies, sexually
transmitted diseases, skin disorders, and temporal arteritis.
ANNs can predict which patients are at high risk for cancers
such as oral cancer.
Genetic Algorithm (GA)
GAs were developed by John Holland at the University of
Michigan in the 1960s and 1970s, and are based on
Darwin’s evolutionary theories of natural
selection and survival of the fittest. Just as species change
to adapt to their environment, GAs ‘reproduce’ themselves
in various recombinations in an effort to find a new
recombinant that is better adapted than its predecessors.
In other words, without any domain-specific knowledge,
components of random sets of solutions to a problem are
evaluated, the best ones are kept and are then
recombined and mutated to form the next set of possible
solutions to be evaluated, and this continues until the
proper solution is discovered. The fitness function is used
to determine which solutions are good and which ones
should be eliminated.
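The evaluate, keep, recombine, and mutate loop described above can be sketched on a toy problem. This is only an illustration: the problem (maximizing the number of 1s in a bit string), the population size, the mutation rate, and the rule of always keeping the single best solution unchanged are all assumptions chosen for brevity.

```python
import random


def fitness(bits):
    """Toy fitness function: the number of 1s in the bit string."""
    return sum(bits)


def evolve(n_bits=20, pop_size=30, generations=40, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    # Start from random solutions, with no domain-specific knowledge.
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        best_per_generation.append(fitness(population[0]))
        survivors = population[: pop_size // 2]  # keep the better half
        children = [survivors[0][:]]             # carry the best one over unchanged
        while len(children) < pop_size:
            mother, father = rng.sample(survivors, 2)
            cut = rng.randint(1, n_bits - 1)
            child = mother[:cut] + father[cut:]   # recombination
            child = [1 - b if rng.random() < mutation_rate else b
                     for b in child]              # mutation
            children.append(child)
        population = children
    best = max(population, key=fitness)
    best_per_generation.append(fitness(best))
    return best, best_per_generation


best, progress = evolve()
print("best fitness per generation:", progress)
```

In a clinical application the bit string would encode a candidate solution and the fitness function would score it against patient data; both are problem specific.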
GAs are similar to neural networks in that they derive their
knowledge from patient data. Genetic algorithms have also
been applied in health care, but there are fewer examples of
this type of CDSS than those based on neural networks.
However, GAs have proved to be a helpful aid in the
diagnosis of female urinary incontinence. Genetic
Algorithms are explored in medical applications to
characterize patterns and results.
Examples include optimizing image analysis, such as assessing
classes of cells in blood cell microscope images, and
facilitating magnetic resonance tomography (MRT) treatment
planning and 3D visualization of image data. Genetic
algorithms developed for mammography were adapted for
mining data on patients with abdominal aortic aneurysms by
analyzing abdominal computed tomography (CT) scan reports
for common patterns and features of successful and
unsuccessful surgeries.
Genetic algorithms can be used for optimizing
pharmaceutical products. One study showed that
genetic algorithms were able to identify additional
antibacterial peptides with high activity.
Genetic algorithms have also been shown to enhance the
precision of artificial neural networks (ANNs), for example in
hip-bone fracture prediction, and to optimize the search
strategies of ANNs that predict and discriminate pneumonia
within a training group.
It is suggested that combining Genetic
Algorithms and artificial neural networks to form
genetic algorithm neural networks (GANNs) is
an important approach for improving the analysis
of medical data.
Measurement of the Operating
Characteristics of Decision
Models (Diagnostic Tests)
Classification of Test Results
                 Disease present    Disease absent
Test positive    True Positive      False Positive
Test negative    False Negative     True Negative
• A True Positive (TP) is a positive test result obtained for a patient in whom the disease
is present (the test result correctly classifies the patient as having the disease).
• A True Negative (TN) is a negative test result obtained for a patient in whom the
disease is absent (the test result correctly classifies the patient as not having the disease).
• A False Positive (FP) is a positive test result obtained for a patient in whom the disease
is absent (the test result incorrectly classifies the patient as having the disease).
• A False Negative (FN) is a negative test result obtained for a patient in whom the
disease is present (the test result incorrectly classifies the patient as not having the
disease).
How to measure the performance of a decision model (test)
To measure the performance of a test for a disease, first perform the
test in patients who are known to have the target condition and in
patients who are known to be free of the target condition (but might
have other diseases). Then, calculate the frequency of a result in
patients with the target condition and in patients who do not have the
target condition. We also need a gold standard test (The procedure that
defines the true state of the patient in a study of test performance. Also
known as ‘‘diagnostic reference standard’’.).
Sensitivity, Specificity, Accuracy, Precision, false-negative rate, and
false-positive rate are calculated to measure the concordance and
discordance between index test (decision model) and disease state.
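The measures listed above can all be computed from the four counts TP, FP, FN, and TN. The sketch below uses the standard definitions; the function name and the counts in the usage example are hypothetical and are not taken from any study.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Operating characteristics of a test from the four cell counts."""
    sensitivity = tp / (tp + fn)   # true-positive rate
    specificity = tn / (tn + fp)   # true-negative rate
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),               # positive predictive value
        "negative_predictive_value": tn / (tn + fn),
        "false_negative_rate": 1.0 - sensitivity,
        "false_positive_rate": 1.0 - specificity,
    }


# Hypothetical counts: 100 diseased and 900 disease-free patients.
metrics = diagnostic_metrics(tp=80, fp=90, fn=20, tn=810)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity describe the test itself, whereas the predictive values also depend on how common the disease is in the group tested.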
Sensitivity (True-Positive Rate)
The ability to detect true positives (really sick).
The likelihood that a diseased patient has a positive test.
In conditional probability notation, the true-positive rate of a
test result is
P[positive test result|disease] or P[+|D]
The true-positive rate (TPR) of a model is
$$\mathrm{TPR} = \frac{TP}{TP + FN}$$
Specificity (True-Negative Rate)
The ability to detect true negatives (really healthy).
The likelihood that a patient who does not have the target
condition has a negative test.
In conditional probability notation, the true-negative rate of a
test result is
P[negative test result|no disease] or P[−|no D]
The true-negative rate (TNR) of a model is
$$\mathrm{TNR} = \frac{TN}{TN + FP}$$
Accuracy
The ability to detect true results (true positives and
true negatives).
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
Precision (Positive Predictive Value)
The ability to detect true positives (really sick) of all positive
results.
PV+ is the fraction of patients with a positive test who also
have the target condition.
$$\mathrm{PV}^{+} = \frac{TP}{TP + FP}$$
Negative Predictive Value
The ability to detect true negatives (really healthy) of
all negative results.
PV− is the fraction of patients with a negative test
result who do not have disease.
$$\mathrm{PV}^{-} = \frac{TN}{TN + FN}$$
False-Negative Rate (FNR)
The likelihood that a patient who has the target
condition has a negative test.
P[negative test result|disease] or P[−|D]
$$\mathrm{FNR} = 1 - \mathrm{TPR} = \frac{FN}{TP + FN}$$
False-Positive Rate (FPR)
The likelihood that a patient who does not have the
target condition has a positive test.
P[positive test result|no disease] or P[+|no D]
$$\mathrm{FPR} = 1 - \mathrm{TNR} = \frac{FP}{TN + FP}$$
Likelihood Ratio
The LR of a test combines the measures of test discrimination to give one
number that characterizes the discriminatory power of a test.
The LR indicates the amount that the odds of disease change based on the
test result. We describe the performance of a test that has only two possible
outcomes (e.g., positive or negative) by two LRs: one corresponding to a
positive test result and the other corresponding to a negative test. These ratios
are abbreviated LR + and LR−, respectively.
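$$\text{Likelihood ratio}\,(+) = \frac{\text{sensitivity}}{1 - \text{specificity}}$$
$$\text{Likelihood ratio}\,(-) = \frac{1 - \text{sensitivity}}{\text{specificity}}$$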
Example 1
One test used to screen blood donors for HIV antibody is an
enzyme-linked immunoassay (EIA). So that the performance of the
EIA can be measured, the test is performed on 400 patients.
Example 2
A mammogram, or breast X-ray, is the diagnostic test used in screening for
breast cancer. Imagine that mammography has a sensitivity of 77% and a
specificity of 95%, and assume that the incidence of breast cancer in the
screening population is 0.6%. If the mammogram is positive, how likely is it that
the patient has breast cancer?
An incidence of 0.6% would mean that in a population of 10 000 there would be 60
cases of cancer. A sensitivity of 77% would mean that out of 60 cases of cancer 46
would be correctly identified. A specificity of 95% would mean that out of 9940
disease-free cases, 9443 would be correctly identified as disease-free, leaving 497
false positives. It follows that a total of 543 cases would be identified as cancer: 497
disease-free cases plus 46 disease cases. If you had a positive mammogram, the
likelihood of your having cancer is therefore 46/543 or around 0.08.
We can do the same calculation using Bayes’ theorem. In the
above data, the probability of a positive mammogram if cancer is
present, p(s|d), is 0.77. The prior probability of cancer, p(d), is
6/1000 = 0.006. The probability of a positive mammogram, p(s),
is 543/10 000 = 0.0543.
By Bayes’ theorem,
$$p(d|s) = \frac{p(s|d)\, p(d)}{p(s)} = \frac{0.77 \times 0.006}{0.0543} \approx 0.085,$$
which agrees with the result obtained by counting cases.
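The mammography calculation can be checked with a short script. The sketch below uses the sensitivity, specificity, and prior probability given in the example; the function names are hypothetical. It computes the post-test probability in two ways: directly with Bayes’ theorem, and through the positive likelihood ratio applied to the pre-test odds.

```python
def posterior_bayes(sensitivity, specificity, prior):
    """p(d|s) = p(s|d) p(d) / p(s), with p(s) summed over diseased and healthy."""
    p_positive = sensitivity * prior + (1.0 - specificity) * (1.0 - prior)
    return sensitivity * prior / p_positive


def posterior_from_lr(sensitivity, specificity, prior):
    """Same quantity via odds: post-test odds = pre-test odds x LR+."""
    lr_positive = sensitivity / (1.0 - specificity)
    pre_test_odds = prior / (1.0 - prior)
    post_test_odds = pre_test_odds * lr_positive
    return post_test_odds / (1.0 + post_test_odds)


# Values from the mammography example.
p_bayes = posterior_bayes(0.77, 0.95, 0.006)
p_odds = posterior_from_lr(0.77, 0.95, 0.006)
print(f"Bayes' theorem:   {p_bayes:.3f}")
print(f"Likelihood ratio: {p_odds:.3f}")
```

Both routes give the same answer, which shows that the likelihood ratio is simply another way of packaging the information that Bayes’ theorem uses.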