### 6. Dr.Anil Analysis - karpagam faculty of medical sciences and

```
Appropriate techniques of statistical analysis
Anil C Mathew PhD
Professor of Biostatistics &
General Secretary ISMS
PSG Institute of Medical Sciences and
Research
Coimbatore 641 004
Types of studies
• Case study
• Case series
• Cross sectional studies
• Case control study
• Cohort study
• Randomized controlled trials
• Screening test evaluation
Data analysis-Case series
Measures of averages
• Mean, Median, Mode
• Length of stay for 5 patients: 1, 3, 2, 4, 5
  Mean length of stay:   3 days
  Median length of stay: 3 days
  Mode length of stay:   no mode
Which is the best average?
          Mean    Median   Mode
DBP       81      79       76
Height    180     180      180
SAL       7.5     7.6      8.1
Data analysis-case series
• Frequency distribution
RBC            Frequency   Relative frequency
5.95-7.95      1           0.029
7.95-9.95      8           0.229
9.95-11.95     14          0.400
11.95-13.95    9           0.257
13.95-15.95    2           0.057
15.95-17.95    1           0.029
Total          35          1.000
Design of Cohort Study
Time →
Direction of inquiry →
Population → People without the disease
    Exposed       → disease
                  → no disease
    Not Exposed   → disease
                  → no disease
pregnancy outcomes? Women with a Body
Mass Index > 30 delivering singletons.
Ref- University of Udine, Italy,2006
          Preterm birth   No preterm birth   Total   % preterm
Obese     16              35                 51      31.4
Normal    46              487                533     8.6
RR = 31.4 / 8.6 = 3.65
Design of Case Control Study
Disease:      Exposed
              Not Exposed
No Disease:   Exposed
              Not Exposed
Results of a Case Control Study
                   Lung Cancer (D+)   No Lung Cancer (D-)   Totals
Exposed (E+)       80  (a)            30  (b)               a+b
Non exposed (E-)   20  (c)            70  (d)               c+d
Totals             100 (a+c)          100 (b+d)
Analysis of Case-control study
Odds ratio = (a*d)/(b*c) = (80*70)/(30*20) = 9.3
Data Analysis-Screening Test Evaluation
Can plasma levels of Breast Carcinoma Promoting Factor (BCPF) be
used to diagnose breast cancer?
Positive criterion: BCPF > 150 units, compared against breast biopsy
(the gold standard)
              D+      D-      Total
BCPF  T+      570     150     720
Test  T-      30      850     880
Total         600     1000    1600
TP = 570
FN = 30
FP = 150
TN = 850
Sensitivity = P(T+ | D+) = 570/600 = 95%
Specificity = P(T- | D-) = 850/1000 = 85%
False negative rate = 1 – sensitivity
False positive rate = 1 – specificity
Prevalence = P(D+) = 600/1600 = 38%
Positive predictive value = P(D+ | T+) = 570/720 = 79%
and specificity
• When the consequences of missing a case are potentially grave:
  prefer high sensitivity
• When a false positive diagnosis may lead to risky treatment:
  prefer high specificity
Data analysis-case series
Measures of variation
Group 1   Group 2
29        25
30        30
31        35
• Range
• Standard deviation
Data analysis- Analytical studies
• Tests of significance
Case Study 1: Drug A and Drug B
• Aim: Efficacy of two drugs on lowering serum
cholesterol levels
• Method: Drug A – 50 Patients
Drug B – 50 Patients
• Result: Average serum cholesterol level is
lower in those receiving drug B than
drug A at the end of 6 months
What is the Conclusion?
A) Drug B is superior to Drug A in lowering
cholesterol levels :
Possible/Not possible
B) Drug B is not superior to Drug A, instead
the difference may be due to chance:
Possible/Not possible
C) It is not due to drug, but uncontrolled
differences other than treatment between
the sample of men receiving drug A and
drug B account for the difference:
Possible/Not possible
D) Drug A may have been selectively given to patients
whose cholesterol levels were more refractory
to drug therapy:
Possible/Not possible
Observed difference in a study can
be due to
1) Random chance
2) Biased comparison
3) Uncontrolled confounding variables
Solutions: A and B
• Test of Significance – p value
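• P<0.05 means that, if there were truly no
difference, a difference at least this large would
arise by random chance less than 5% of the time
• P<0.01 means that, if there were truly no
difference, a difference at least this large would
arise by random chance less than 1% of the time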
• P value will not tell about the magnitude of
the difference
Solutions: C and D
• Random allocation and compare the
baseline characteristics
Figure 1
Table 1-Baseline Characteristics
Characteristic                       Vitamin group   Placebo group
                                     (n = 141)       (n = 142)
Mean age ± SD, y                     28.9 ± 6.4      29.8 ± 5.6
Smokers, n (%)                       22 (15.6)       14 (9.9)
Mean body mass index ± SD, kg/m2     25.3 ± 6.0      25.6 ± 5.6
Mean blood pressure ± SD, mm Hg
  Systolic                           112 ± 15        110 ± 12
  Diastolic                          67 ± 11         68 ± 10
Parity, n (%)
  0                                  91 (65)         87 (61)
  1                                  39 (28)         42 (30)
  2                                  9 (6)           8 (6)
  >2                                 2 (1)           5 (4)
Coexisting disease, n (%)
  Essential hypertension             10 (7)          7 (5)
  Lupus/antiphospholipid syndrome    4 (3)           1 (1)
  Diabetes                           2 (1)           3 (2)
“t” Test
Ho: There is no difference in mean birth weight of children from HSE
and LSE in the population
CR = t = |X1 - X2| / ( SD * sqrt(1/n1 + 1/n2) )
SD = sqrt( ((n1-1)*SD1^2 + (n2-1)*SD2^2) / (n1 + n2 - 2) )
SD = sqrt( (14*0.27^2 + 9*0.22^2) / 23 ) = 0.25
t = |2.91 – 2.26| / ( 0.25 * sqrt(1/15 + 1/10) ) = 6.36
DF = n1 + n2 – 2 = 23
If calculated t > table value, REJECT Ho
GENERAL STEPS IN
HYPOTHESIS TESTING
1) State the hypothesis to be tested
2) Select a sample and collect data
3) Calculate the test statistic
4) Evaluate the evidence against the null hypothesis
5) State the conclusion
Commonly used statistical tests
• T test-compare two mean values
• Analysis of variance-Compare more than
two mean values
• Chi square test-Compare two proportions
• Correlation coefficient-relationship of two
continuous variables
Data entry format
Treatment   Age   weight   Diabetes   Painscore-b   Painscore-a   Vomiting
1           21    50       1          9             6             0
1           24    53       0          10            9             0
1           25    55       1          9             9             1
1           28    50       0          10            6             1
1           29    60       0          10            5             0
1           20    65       0          10            8             0
0           26    60       0          9             9             0
0           25    90       1          9             9             1
0           24    80       1          9             9             1
0           28    89       0          10            8             1
0           22    86       1          10            9             1
0           22    45       0          10            9             0
Example t test
Body temperature (°C)   Simple febrile seizure   Febrile without seizure   P value
                        N = 25                   N = 25
Mean                    39.01                    38.64                     P<0.001
SD                      0.56                     0.45
Example-Analysis of variance
• Serum zinc level in simple febrile seizure patients,
by duration of seizure
Duration (min)   n    Mean    SD     P value
<5               3    10.27   0.25   P<0.001
5 to 10          18   9.02    0.81
>10              4    6.90    0.98
Example Chi-square test
• Characteristics of patients in the two groups
Duration of fever (hour)   Simple febrile seizure   Febrile without seizure   P value
< 24                       16                       6                         P<0.05
More than 24               9                        19
Example Correlation
• We found a negative correlation between
serum zinc level and simple febrile seizure
event (r = -0.86, p < 0.001)
Type 1 and Type 2 Errors
             Ho True                  Ho False / H1 True
Accept Ho    Correct decision         Type 2 error
                                      β = P (Type 2 error)
Reject Ho    Type 1 error             Correct decision
             α = P (Type 1 error)     Power = 1 - β
Multivariate problem
• Main outcome
• Continuous variable-Linear regression
• Dichotomous variable-Logistic regression
• Introduction- Why did you start?
• Methods- What did you do?
• Results- What did you find?
• Discussion- What does it mean?
How to begin writing?
• Data → Tables → Methods, Results →
Introduction, Discussion → Abstract →
Title, Key words, References
Thank you
```
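
The worked figures in the slides above (relative risk for the cohort example, odds ratio for the case-control table, sensitivity and specificity for the BCPF screening test, the pooled-SD t test on birth weight, and the chi-square comparison of fever duration) can be reproduced with a short script. What follows is a minimal Python sketch under stated assumptions, not the presenter's own method. The function names are hypothetical and chosen only for illustration. The chi-square formula is the standard uncorrected 2x2 shortcut, which is an assumption because the slides do not say which variant was used. The value 3.841 is the usual 5% critical value for 1 degree of freedom. Computed without intermediate rounding, the relative risk comes out near 3.64 and t near 6.33, slightly different from the slide values of 3.65 and 6.36, which appear to rely on rounded intermediates.

```python
import math


def relative_risk(a, b, c, d):
    """Cohort 2x2 table: a, b = exposed with/without outcome; c, d = unexposed."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Case-control 2x2 table laid out as in the slides: OR = (a*d)/(b*c)."""
    return (a * d) / (b * c)


def screening_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, prevalence and PPV from a test-vs-gold-standard table."""
    total = tp + fn + fp + tn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "prevalence": (tp + fn) / total,
        "ppv": tp / (tp + fp),
    }


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic with pooled SD; returns (t, pooled_sd, df)."""
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    t = abs(mean1 - mean2) / (pooled_sd * math.sqrt(1 / n1 + 1 / n2))
    return t, pooled_sd, df


def chi_square_2x2(a, b, c, d):
    """Uncorrected chi-square statistic for a 2x2 table (1 degree of freedom)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Figures taken from the slides.
rr = relative_risk(16, 35, 46, 487)           # obese vs normal, preterm birth
or_ = odds_ratio(80, 30, 20, 70)              # lung cancer case-control table
bcpf = screening_metrics(tp=570, fn=30, fp=150, tn=850)
t, pooled_sd, df = pooled_t(2.91, 0.27, 15, 2.26, 0.22, 10)
chi2 = chi_square_2x2(16, 6, 9, 19)           # fever duration by seizure group

print(f"RR = {rr:.2f}, OR = {or_:.2f}")
print({name: round(value, 3) for name, value in bcpf.items()})
print(f"pooled SD = {pooled_sd:.2f}, t = {t:.2f}, df = {df}")
print(f"chi-square = {chi2:.2f}, exceeds 3.841: {chi2 > 3.841}")
```

Plain functions over the four cell counts keep each formula visible next to the slide it comes from. For real analyses a statistics library would also report exact p values and confidence intervals, which this sketch leaves out.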