### Introduction to data handling

```
Data Domains and Introduction to Statistics
Chemistry 243
Instrumental methods and
what they measure
Electromagnetic
methods
Electrical methods
Instruments are translators

Convert physical or chemical properties that
we cannot directly observe into information that
we can interpret.
$T = P / P_0$
$A = \varepsilon b c = -\log T = -\log \frac{P}{P_0}$
(b: path length; c: concentration)
Sometimes multiple
translations are needed

Thermometer


Bimetallic coil converts temperature to
physical displacement
Scale converts angle of the pointer to an
observable value of meaning
adapted from C.G. Enke, The Art and
Science of Chemical Analysis, 2001. 
Thermostat: Displacement
used to activate switch
http://static.howstuffworks.com/gif/home-thermostat-thermometer.jpg
Components in translation
Data domains

Information is
encoded and
transferred
between domains

Non-electrical
domains


Beginning and end of
a measurement
Electrical domains

Intermediate data
collection and
processing
Data domains
Initial
conversion
device
Intermediate
conversion
device
conversion
device
Often viewed on a GUI
(graphical user interface)
PMT
Resistor
Digital
voltmeter
Electrical domains

Analog signals



Magnitude of voltage, current, charge, or power
Continuous in both amplitude and time
Time-domain signals

Time relationship of signal fluctuations



(not amplitudes)
Frequency, pulse width, phase
Digital information



Data encoded in only two discrete levels
A simplification for transmission and storage of
information which can be re-combined with great
accuracy and precision
The heart of modern electronics
Digital and analog signals

Analog signals



Magnitude of voltage, current, charge, or power
Continuous in both amplitude and time
Digital information

Data encoded in only discrete levels
Analog-to-digital conversion

Limited by bit resolution of ADC



4-bit card has $2^4$ = 16 discrete binary levels
8-bit card has $2^8$ = 256 discrete binary levels
32-bit card has $2^{32}$ = 4,294,967,296 discrete binary levels



Common today
Maximum resolution comes from full use of ADC
voltage range.


More bits is usually slower
More expensive
K.A. Rubinson, J.F. Rubinson, Contemporary Instrumental Analysis, 2000.
Byte prefixes
Serial and parallel binary
encoding
Slow – not digital; outdated
(serial)
Fast – between instruments
“serial-coded binary” data
Binary Parallel:
Very Fast – within an instrument
“parallel digital” data
Introductory statistics



Statistical handling of data is essential
because it gives the data significance.
The ability or inability to state definitively that
two values are statistically different has
profound ramifications for data interpretation.
Measurements are not absolute, so robust
methods for establishing run-to-run
reproducibility and instrument-to-instrument
variability are essential.
Introductory statistics:
Mean, median, and mode

Population mean ($\mu$): average value of replicate data
$\mu = \lim_{N \to \infty} \frac{\sum_{i=1}^{N} x_i}{N} = \lim_{N \to \infty} \frac{x_1 + x_2 + x_3 + \dots + x_N}{N}$
Median ($\mu_{1/2}$): ½ of the observations are greater; ½ are less
Mode ($\mu_{md}$): most probable value
For a symmetrical distribution:
$\mu_{1/2} = \mu_{md} = \mu$

Real distributions are rarely perfectly symmetrical
Statistical distribution

Often follows a Gaussian functional form
Introductory statistics:
Standard deviation and variance

Standard deviation ($\sigma$):
$\sigma = \lim_{N \to \infty} \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
Variance ($\sigma^2$):
$\sigma^2 = \lim_{N \to \infty} \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
Gaussian distribution

Common distribution with well-defined stats



$y = \frac{1}{\sigma \sqrt{2\pi}} e^{-(x - \mu)^2 / (2\sigma^2)}$
68.3% of data is within $\pm 1\sigma$ of the mean
95.5% within $\pm 2\sigma$
99.7% within $\pm 3\sigma$
Statistical distribution


50 Abs measurements of an identical sample
Let’s go to Excel
Table a1-1,
Skoog
But no one has
an infinite data set …

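Sample mean: $\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N}$
Sample standard deviation: $s = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}}$
Sample variance: $s^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}$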
Standard deviation and
variance, continued

s is a measure of precision (magnitude of
indeterminate error)
$s_{total}^2 = s_1^2 + s_2^2 + s_3^2 + \dots + s_n^2$

Other useful definitions:

Standard error of mean
$s_m = \frac{s}{\sqrt{N}}$
Confidence intervals

In most situations $\mu$ cannot be determined


Would require an infinite number of measurements
Statistically we can establish a confidence interval
around $\bar{x}$ in which $\mu$ is expected to lie with a
certain level of probability.
Calculating confidence
intervals


We cannot absolutely
determine $\sigma$, so when s is
not a good estimate of $\sigma$ (small
number of samples) use:
CI for $\mu$ = $\bar{x} \pm \frac{t s}{\sqrt{N}}$
Note that t approaches z as
N increases.
2-sided t values
Example of confidence interval
determination for smaller number of
samples

Given the following values for
serum carcinoembryonic antigen
(CEA) measurements,
determine the 95% confidence
interval:

16.9 ng/mL, 12.7 ng/mL,
15.3 ng/mL, 17.2 ng/mL
Sample mean = 15.525 ng/mL
s = 2.059733 ng/mL
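Answer: 15.525 ± 3.277 ng/mL, using t = 3.18 for N − 1 = 3 degrees of freedom
(t = 2.78, the value for 4 degrees of freedom, would give ± 2.863).
Considering sig figs: 16 ± 3 ng/mL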
Propagation of errors

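How do errors at each
step contribute to the
final result?
$x = f(p, q, r, \dots)$
$dx_i = f(dp_i, dq_i, dr_i, \dots)$
$dx = \left(\frac{\partial x}{\partial p}\right)_v dp + \left(\frac{\partial x}{\partial q}\right)_v dq + \left(\frac{\partial x}{\partial r}\right)_v dr + \dots$
$s_x^2 = \left(\frac{\partial x}{\partial p}\right)^2 s_p^2 + \left(\frac{\partial x}{\partial q}\right)^2 s_q^2 + \left(\frac{\partial x}{\partial r}\right)^2 s_r^2 + \dots$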
```
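
The arithmetic in the slides (ADC levels, sample mean and standard deviation, the confidence interval for the CEA values, and propagation of errors) can be checked with a short script. The following is a minimal Python sketch, not part of the original lecture. The function names, the small `T95` lookup table (standard two-sided 95% Student's t values, assumed here to be indexed by degrees of freedom N − 1), the 10 V full-scale range, and the 0.3 and 0.4 standard deviations in the last example are all illustrative assumptions.

```python
import math

# Two-sided 95% Student's t values indexed by degrees of freedom (N - 1).
# Standard table values; a fuller table or scipy.stats.t.ppf could replace this.
T95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}


def adc_levels(bits):
    """Number of discrete levels an ADC with `bits` bits can encode (2**bits)."""
    return 2 ** bits


def adc_step(full_scale_volts, bits):
    """Approximate smallest resolvable voltage step over the full ADC range."""
    return full_scale_volts / adc_levels(bits)


def sample_stats(values):
    """Return (sample mean, sample standard deviation s, standard error s_m)."""
    n = len(values)
    mean = sum(values) / n
    # N - 1 in the denominator: finite data set, not the population formula.
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return mean, s, s / math.sqrt(n)


def confidence_half_width(values, t):
    """Half-width t*s/sqrt(N) of the confidence interval around the mean."""
    _, _, s_m = sample_stats(values)
    return t * s_m


def propagated_sd(partials, sds):
    """s_x from partial derivatives dx/dp, dx/dq, ... and their s_p, s_q, ..."""
    return math.sqrt(sum((d * s) ** 2 for d, s in zip(partials, sds)))


if __name__ == "__main__":
    for bits in (4, 8, 32):
        print(bits, "bits:", adc_levels(bits), "levels")
    # Hypothetical 10 V full-scale range on an 8-bit card.
    print("8-bit step over 10 V:", adc_step(10.0, 8), "V")

    cea = [16.9, 12.7, 15.3, 17.2]  # ng/mL, from the worked example
    mean, s, s_m = sample_stats(cea)
    half = confidence_half_width(cea, T95[len(cea) - 1])
    print(f"mean = {mean:.3f}, s = {s:.6f}, 95% CI = {mean:.3f} +/- {half:.3f}")

    # For a sum x = p + q both partial derivatives are 1 (hypothetical s values).
    print("s_x for x = p + q:", propagated_sd([1, 1], [0.3, 0.4]))
```

The half-width depends on which t value is looked up: t = 3.182 (3 degrees of freedom for N = 4) gives about 3.28, while t = 2.776 (4 degrees of freedom) gives about 2.86. Either result rounds to 16 ± 3 ng/mL. For a simple sum, `propagated_sd` reduces to the variance-addition rule $s_{total}^2 = s_1^2 + s_2^2 + \dots$.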