Theories of Classical Conditioning

Critical CS-US relationship
• Important (critical) things to note about classical conditioning:
– the CS MUST precede the US
– the CS MUST predict the US
– if the CS does not predict the US, no conditioning
– the CR does not have to be identical to the UR
• E.g., subtle differences (even Pavlov noticed)
• may even be opposite: Morphine studies
• Any response is a classically conditioned response if it
– occurs to a CS
– after that CS has been paired with a US
– but does NOT occur to a randomly presented CS-US pairing
Theories: WHY do organisms respond to predictability?
• Pavlov: Stimulus Substitution Theory
• Kamin: Surprise theory
• Rescorla and Wagner: Computational Model
Pavlov: Stimulus Substitution Theory
– Basic premise of theory
• w/repeated pairings between CS and US, CS becomes
substitute for the US
• thus, the response initially elicited only by US is now also
elicited by CS
– sounds pretty good:
• salivary conditioning: US and CS both elicit salivation
• eyeblink conditioning: both elicit eyeblinks
– Theory was doing well until we found compensatory responses
Pavlov: Stimulus Substitution Theory
– Criticisms and Flaws:
• CR is almost never an exact replica of the UR
• eyeblink to US of air puff = large, rapid closure
• eyeblink to CS of tone = smaller, more gradual closure
• Defense of theory: Hilgard (1936): Why differences in CR and UR:
– intensity and stimulus modality of the CS and US are different
– Thus: differences in response magnitude and timing are to be expected
– But still doesn’t explain OPPOSITE CR
Pavlov: Stimulus Substitution Theory
– whereas many US's elicit several different R's, as a general rule not all of these
R's are later elicited by the CS
• E.g. Zener (1937)
– dog presented w/food as US:
• found that the food elicited a number of URs from the dog
• E.g., salivation, chewing, swallowing, etc.
– CS did not elicit all of those responses
• NO CRs of chewing and swallowing
• Just the CR of salivation
• on other hand: CR may contain some responses that are not part of the UR:
– Zener found that dogs turned head to bell
– But no head turns to presentation of food
Modifications of SST
– only some components of UR transferred to CR
– CS such as a bell often elicits unconditioned responses of its own, and
these may become part of CR
• SIGN TRACKING: Hearst and Jenkins 1974
– emphasized this change in form of CR vs. UR
– Also Jenkins, Barrara, Ireland and Woodside (1976)
• Sign Tracking : animals tend to
– orient themselves toward
– approach
– explore any stimuli that are good predictors of important events such
as the delivery of food
Set up (Jenkins, Barrara, Ireland and Woodside, 1976):
1. Initial training: Light turns on above feeder; feeder releases pieces of hot dog
2. Test:
a. Light turns on above feeder, then above each of the other
b. Forms a sequence of 1234
3. What is optimal response?
4. But: Dog “tracked the
Modifications of SST
• Strongest data against SST theory: Paradoxical
– CR in opposite direction of UR
• Black (1965):
– heart rate decreases to CS paired w/shock
– US of shock elicits UR of heart rate INCREASE
– But CS of light or tone elicits CR of heart rate DECREASE
• Siegel (1979): conditioned compensatory responses
Morphine studies
evidence of down regulation in addiction
Actual cellular process in neurons (and other cells, too!)
thus SST theory appears incorrect
Perceptual Gating Theory
• Perceptual gating theory:
– Idea that only if CS is biologically relevant will it get processed
– If a CS doesn’t get processed it can’t be predictive/informative
– Animals attend to biologically relevant stimuli
• Problem:
– Data show that under certain circumstances a stimulus is
“attended to” or “processed”, but still does not serve as a CS
with an accompanying CR
– Issue remains: is the stimulus the most predictive?
– Second issue: Defining “biologically relevant”
Kamin’s work: 1967-1974
Blocking and overshadowing
• Overshadowing:
use one "weak" and one "strong" CS
reaction to weaker stimulus is blotted out by stronger CS
Demonstrated by Pavlov
• Blocking:
– Train 1 CS, then add a second CS to it:
• CS1 US
• CS1+CS2US
– test each individually after training
– Find that only one supports a CR
– One stimulus “blocks” learning to second CS
– Demonstrated by Kamin
Kamin’s blocking experiment
• used multiple CS's and 4 groups of rats
• the blocking group receives
– series of L+ trials which produce strong CR
– series of LT+ trials
– then tested to just the T
• control groups receive
– no first phase
– L+ only; Test T
– T+ only; Test T
– LT+ only; Test T
Kamin’s blocking experiment
• prediction: since both received same # of trials to the tone- should
get equal conditioning to the tone
• results quite different: Blocking group shows no CR to the tone- the
prior conditioning to the light "blocked" any more conditioning to
the tone
• directly contradicts frequency principle (remember associationism!)
[Table: columns Phase I, Phase II, Test Phase, Result; the group rows were lost. Results listed: T elicits no CR; T elicits CR; T elicits a CR; T elicits no CR]
Things we know about blocking:
the animal does "detect" the stimulus:
– can’t be perceptual gating issue
– EXT of CR with either T alone or with LT
– EXT occurred faster with compound LT
• appears to be independent of:
– length of presentation of the CS
– number of trials of conditioning to compound CS
• constancy of US from phase 1 to 2 important!!!!
– US must remain identical between the two phases or no blocking
• influenced by:
– Type of CR measure (used CER, not as stable as non fear CR)
– nature of CS may be important- e.g. modality
– intensity of CS or US stimuli important
• depends on amount of conditioning to blocking stimulus which already
Change in either US or CS can
prevent/ overcome blocking
• Change the intensity of the CS from phase 1 to phase 2
Overshadowing could be playing a role
strong vs weak stimulus
e.g. experiments when changed from 1 mA to 4 mA shock
quickly condition to compound stimulus
little or no overshadowing or blocking
• Change in intensity of either CS stimulus
• lT then T
• Lt then T
– Change in context from Phase 1 to Phase 2
– presents a different learning situation and no blocking
• Any ideas about what is happening?
Explanations of Blocking:
Poor Explanation: Perceptual gating theory:
– tone never gets processed
– tone not informative
– data not really support this (evidence that do “hear” tone)
Good Explanation: Kamin's Surprise theory:
to condition requires some mental work on part of animal
animal only does mental work when surprised
bio genetic advantage: prevents having to carry around excess mental baggage
thus only learn with "surprise"
situation must be different from original learning situation
Better Explanation: Rescorla Wagner model:
– particular US only supports a certain amount of conditioning
– if one CS “hogs” all that conditioning- none is left over for another CS to be added
– question- how do we show this?
Rescorla: Which is more important?
CS-US correlation vs. contiguity
• CS-US contiguity:
– CS and US are next to one another in time/space
– In most cases, CS and US are contiguous
• CS-US correlation: CS followed by the US in a
predictive correlation:
• If perfect correlation (most predictive)- most conditioning
• p(US/CS) = 1.0
• p(US/no CS) = 0.0
• But: life not always a perfect correlation
CS-US correlation is more critical
• Rescorla (1966, 1968): Showed how 2 probabilities
interact to determine size of the CR
– CS = 2 min tone; presented at random intervals (M = 8 min)
– for: Group 1: p(shock/CS) = 0.4 during 2 min presentation
– For Group 2: p(shock/no CS) = 0.2
• Which group should show more conditioning?
• WHY?
Robert Rescorla (1966)
Examined predictability: 6 types of groups
• CS-alone
– present CS alone with no US pairing
– problem: not have same number of US trials as
experimental animals do, may actually be extinction
• Novel CS group:
– looks at whether stimulus is truly "neutral"
– may produce habituation- animal doesn't respond
because it "gets used to it"
• US-alone
– present US alone with no CS pairing
– problem: not have same number of CS trials
Rescorla: 6 types of control groups
• Explicitly unpaired control
– CS NEVER predicts US
– that is- presence of CS is really CS-, predicts NO US
– animal learns new rule: if CS, then no US
• Backward conditioning:
– US precedes CS
– assumes temporal order is important (but not able to
explain why)
– again, animal learns that CS predicts no US
• Discrimination conditioning (CS+ vs CS-)
– use one CS as a plus; one CS as a minus
– same problem as explicitly unpaired and backward
works, but can work in certain circumstances (taste
Rescorla: Results with 6 Groups
• CS-alone: No conditioning, but habituation to CS
• Novel CS group: novel worked better than CS with previous experience.
• US-alone: habituation to CS
• Explicitly unpaired control:
– Got GREAT conditioning
– Learned that the CS NEVER predicts the US!
• Backward conditioning:
– US preceded CS
– assumed temporal order is important
– It was: Animal learned that CS predicts NO US, but US predicted CS
• Discrimination conditioning (CS+ vs CS-)
– use one CS as a plus; one CS as a minus
– Got discrimination
– Animals paid attention to whatever stimulus was MOST PREDICTIVE!
CS-US correlation: Summary of Results
• whenever p(US/CS) > p(US/no CS):
– that is, CS predicts US
– amount of learning depended on size of difference between p(US/CS) and
p(US/no CS)
• whenever p(US/CS) < p(US/no CS):
– CS predicts ABSENCE of US
– amount of learning depended on size of difference between p(US/CS) and
p(US/no CS)
• whenever p(US/CS) = p(US/no CS):
– CS neither predicts the US nor predicts its absence
– no learning will occur because there is no predictability.
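The three rules above can be sketched in a few lines of Python. This is only an illustration: the function name `contingency` and the labels it returns are made up for this example, and the probabilities in the calls are sample values, not data from Rescorla's experiments.

```python
def contingency(p_us_given_cs: float, p_us_given_no_cs: float) -> str:
    """Classify a CS by comparing p(US/CS) with p(US/no CS)."""
    # The size of this difference is what the amount of learning depends on.
    diff = p_us_given_cs - p_us_given_no_cs
    if diff > 0:
        return "excitatory"   # CS predicts the US
    if diff < 0:
        return "inhibitory"   # CS predicts ABSENCE of the US
    return "neutral"          # no predictability, no learning


print(contingency(1.0, 0.0))  # perfect correlation
print(contingency(0.4, 0.4))  # US equally likely with or without CS
print(contingency(0.1, 0.4))  # US less likely when CS is present
```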
CS-US correlation vs. contiguity
• Thus: appears to be the CORRELATION
between the CS and US, not the contiguity
(closeness in time) that is important
• Can write this more succinctly:
– correlation carries more information
– if r = + then excitatory CS
– if r = - then inhibitory CS
– if r = 0 then neutral CS (not really even a CS)
Classical conditioning is “cognitive”
(oh the horror of that statement, I am in pain)
• PREDICTABILITY is critical
• Learning occurs slowly, trial by trial
– Each time the CS predicts the US, the strength of the correlation is increased
– The resulting learning curve is monotonically increasing:
• Initial steep curve
• Levels off as reaches asymptote
– There is an asymptote to conditioning to the CS:
• Maximum amount of learning that can occur
• Maximum amount of responding that can occur to CS in anticipation of the
upcoming US
• We can explain this through an equation!
Answers to
Blocking and Overshadowing
• Overshadowing:
– use one "weak" and one "strong" CS
– reaction to weaker stimulus: less CR
– Reaction to stronger stimulus: more CR
• Blocking:
– What is being predicted
– Does LT give any more information/predictability than
L alone?
– If not, then L “blocks” learning to LT
Assumptions of
Rescorla-Wagner (1972) model
• Model developed to accurately predict and map learning as it occurs trial
by trial
• Assumes a bunch of givens:
– Assume animal can perceive CS and US, and can exhibit UR and CR
– Helpful for the animal to know 2 things about conditioning:
• what TYPE of event is coming
• the SIZE of the upcoming event
• Thus, classical conditioning is really learning about:
– signals (CS's) which are PREDICTORS for
– important events (US's)
Assumptions of R-W model
• assumes that with each CS-US pairing 1 of 3 things can happen:
– the CS might become more INHIBITORY
– the CS might become more EXCITATORY
– there is no change in the CS
• how do these 3 rules work?
– if US is larger than expected: CS = excitatory
– if US is smaller than expected: CS= inhibitory
– if US = expectations: No change in CS
• The effect of reinforcers or nonreinforcers on the change of associative
strength depends upon:
– the existing associative strength of THAT CS
– AND on the associative strength of other stimuli concurrently present
More assumptions
• Explanation of how an animal anticipates what type of US is coming:
– direct link is assumed between "CS center" and "US center":
• e.g. between a tone center and food center
• In 1970’s: other researchers thought R and W were crazy with this idea
• Now: neuroscience shows formation of neural circuits!
– assumes that STRENGTH of an event is given
• the conditioning situation is predicted by the strength of the learned
– THUS: when learning is complete:
• the strength of the association relates directly to the size or intensity of the US
• Asymptote of learning = max learning that can occur to that size or intensity of
a US
• Maximum amount of learning that a given US can support
More assumptions
• The change in associative strength of a CS as the result of
any given trial can be predicted from the composite
strength resulting from all stimuli presented on that trial:
– Composite strength = summation of conditioning that
occurs to all stimuli present during a conditioning trial
– if composite strength is LOW:
• the ability of reinforcer to produce increments in the strength of
component stimuli is HIGH
• More can be learned for this trial
– if the composite strength is HIGH:
• reinforcement is relatively less effective (LOW)
• Less can be learned for this trial- approaching max of learning
More assumptions:
• Can expand to extinction, or nonreinforced trials:
– if composite associative strength of a stimulus
compound is high, then the degree to which a
nonreinforced presentation will produce a
decrease in associative strength of the
components is LARGE
– if composite associative strength is low: nonreinforcement effects reduced
The Equation!:
Yields an equation: THE Rescorla-Wagner (1972) model!!!!!
ΔVi = αi βj (λj - Vsum)
ΔVi = amount learned (conditioned) on a given trial
αi = the salience of the CS
βj = the salience of the US
(λj - Vsum) = total amount of conditioning that can still occur to a particular CS-US pairing
What does this equation say?
The amount of conditioning that will occur on a given trial is a function of:
• The size of the salience of the CS multiplied by
• The size of the salience of the US multiplied by
• (The maximum amount of learning minus the amount of learning that has already occurred).
Let’s use this in an example:
First example:
A rat is subjected to conditioned suppression procedure:
CS (light) ---> US (1 mA shock)
Question: what is associative strength?
λ1 = associative strength that a 1 mA shock can support at asymptote (λj)
(I am arbitrarily setting this value for easy math)
So, we will say that the associative strength of a 1 mA shock = 100 units of association/learning
VL = associative strength of the light (strength of the CS-US association)
thus: λ1 = animal’s maximum reaction to the size of the observed event (actual shock)
VL = measure of the subject's current "expectation" about the light predicting the shock.
VL will approach λ1 over the course of conditioning: at asymptote, VL = λ1
First trial: CSL --> USshock
• CS (light) --> 1 mA shock on trial 1 (no previous pairing)
– λj = max amount of conditioning that can occur to the
CSL : Let’s set it at 100
– Vsum = assoc. strength of all paired trials so far (0)
– Can set αi = 0.5
– Can set βj = 1.0
ΔVL = αi βj (λj - Vsum); just plug in numbers
– ΔVL = 0.5*1.0(100-0) = 50 units of conditioning/learning
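The same plug-in can be written as a tiny Python function. This is a minimal sketch of the single-trial update, using the values set above; the function name `rw_delta` is just a label chosen for this example.

```python
def rw_delta(alpha: float, beta: float, lam: float, v_sum: float) -> float:
    """Change in associative strength on one trial: alpha * beta * (lambda - Vsum)."""
    return alpha * beta * (lam - v_sum)


# First trial: light --> 1 mA shock, nothing learned yet (Vsum = 0)
delta_vl = rw_delta(alpha=0.5, beta=1.0, lam=100.0, v_sum=0.0)
print(delta_vl)  # 50.0 units of conditioning/learning
```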
Second example: 2CS's:
• CS (light+tone) --> 1 mA shock
– Vsum = VL + VT = assoc. strength of the 2 CSs
– (still 0 on trial 1)
– ΔVsum = αi βj (λj - Vsum)
– if VL and VT equally salient:
• ΔVL = 0.5 αi βj (λj - Vsum);
• ΔVT = 0.5 αi βj (λj - Vsum)
– ΔVT = 0.5*0.5*(100-0) = 25 units of learning
WHY is this equation important?
• We can use the three rules to make predictions about amount
and direction of classical conditioning
• λj > Vsum = excitatory conditioning
– The US was LARGER than the CS led you to expect, so you react MORE to the CS next trial
• λj < Vsum = inhibitory conditioning
– The US was SMALLER than the CS led you to expect, so you react LESS to the CS next trial
• λj = Vsum = no change:
– The CS predicted the size of the US exactly as you expected
Now have the Rescorla-Wagner Model:
• Model makes predictions on a trial-by-trial basis
• for each trial: predicts increase or decrement
in associative strength for every CS present
• Can specify amount and direction of the
change in conditioning!
Now have the Rescorla-Wagner Model:
• Restate the equation: ΔVi = αi βj (λj - Vsum)
• ΔVi = change in associative strength that occurs for any CS, i,
on a single trial
• λj = associative strength that some US, j, can support at asymptote
• Vsum = associative strength of the sum of the CSs (strength of
CS-US pairing)
• αi = measure of salience of the CS (must have value between
0 and 1)
• βj = learning rate parameter associated with the US
(assumes that different beta values may depend upon the
particular US employed)
Can say this easier!
• How much you will learn on a given trial (ΔVi) is a
function of:
– αi or how good a stimulus the CS is (how well it grabs
your attention)
– βj or how good a stimulus the US is (how well it grabs
your attention)
– λj or how much can be learned about the
CS-US relationship
– AND Vsum or how much you have learned ALREADY!
Okay, you got all that?
Let’s put this baby to work……..
…….we will try a few examples
The equation: ΔVi = αi βj (λj - Vsum)
ΔVi = change in associative strength that occurs for any CS, i, on a single trial
αi = stimulus salience (assumes that different stimuli may acquire associative
strength at different rates, despite equal reinforcement)
βj = learning rate parameter associated with the US (assumes that different beta
values may depend upon the particular US employed)
Vsum = associative strength of the sum of the CSs (strength of CS-US pairing)
λj = associative strength that some US, j, can support at asymptote
In English: How much you learn on a given trial is a function of the value of the
stimulus x value of the reinforcer x (the absolute amount you can learn minus the
amount you have already learned).
first conditioning trial: Assume (our givens)
– CS = light; US = 1 mA shock
– Vsum = VL; no trials so VL = 0
– thus: λj - Vsum = 100-0 = 100
– first trial must be EXCITATORY
• BUT: must consider the salience of the light and the learning rate:
– αi = 1.0
– βj = 0.5
• Plug into the equation: for TRIAL 1
– ΔVL = (1.0)(0.5)(100-0) = 0.5(100) = 50
– thus: ΔVL equals only 50% of the discrepancy between λj and
Vsum for the first trial
• TRIAL 2:
– Same assumptions!
– ΔVL = (1.0)(0.5)(100-50) = 0.5(50) = 25
– Vsum = (50+25) = 75
• TRIAL 3:
– ΔVL = (1.0)(0.5)(100-75) = 0.5(25) = 12.5
– Vsum = (50+25+12.5) = 87.5
• TRIAL 4:
– ΔVL = (1.0)(0.5)(100-87.5) = 0.5(12.5) = 6.25
– Vsum = (50+25+12.5+6.25) = 93.75
• TRIAL 10: Vsum = 99.90, etc., until reach ~100 on approx. trial
• When will you reach asymptote?
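To see the whole learning curve at once, the trial-by-trial arithmetic can be looped. A minimal sketch, using the same givens as the worked trials (salience 1.0, learning rate 0.5, asymptote 100); the function name `acquire` is made up for this example.

```python
def acquire(alpha, beta, lam, n_trials):
    """Return Vsum after each trial for a single CS paired with the US."""
    v = 0.0
    history = []
    for _ in range(n_trials):
        v += alpha * beta * (lam - v)  # learn a fixed fraction of what is left
        history.append(v)
    return history


curve = acquire(alpha=1.0, beta=0.5, lam=100.0, n_trials=10)
for trial, v in enumerate(curve, start=1):
    print(trial, round(v, 2))
```

Each trial closes half of the remaining gap, so Vsum gets ever closer to 100 without any single trial making up the entire difference.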
R-W explains 1 CS learning
[Figure: amount of learning across trials, showing learning to Vlight on each trial and total amount learned (Vsum)]
How to explain overshadowing?
Yep, it is good old Rescorla-Wagner
to the rescue!
Remember Overshadowing
• Pavlov: compound CS with 1 intense CS, 1 weak
– after a number of trials found: strong CS elicits
strong CR
– Weak CS elicits weak or no CR
• Note: BOTH CSs are presented at same time
– Why would one overshadow or overpower the other?
– Why did animal not attend equally to both?
• Rescorla-Wagner model helps to explain why:
• Assume
– αL = light = 0.2; αT = tone = 0.5
– βL = light = 1.0; βT = tone = 1.0
• TRIAL 1:
– Vsum = VL + VT = 0 on trial 1
– ΔVL = 0.2(1)(100-0) = 20
– ΔVT = 0.5(1)(100-0) = 50
– after trial 1: Vsum = 70
• TRIAL 2:
– ΔVL = 0.2(1)(100-(50+20)) = 6
– ΔVT = 0.5(1)(100-(50+20)) = 15
– Vsum = (70+(6+15)) = 91
• TRIAL 3:
– ΔVL = 0.2(1)(100-(91)) = 1.8
– ΔVT = 0.5(1)(100-(91)) = 4.5
– Vsum = (91+(1.8+4.5)) = 97.3 and so on
• thus: reaches asymptote (by trial 6) MUCH faster w/2 CSs
• NOTE: CSt takes up about 71 units of assoc. strength; CSl takes up only
about 29 units of assoc. strength
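The overshadowing arithmetic can be checked with a short simulation. A sketch only, using the saliences assumed above (light 0.2, tone 0.5, learning rate 1.0); the helper name `compound_trial` and the choice of 20 trials are made up for this example.

```python
def compound_trial(v, alphas, beta, lam):
    """One reinforced trial: every CS present shares the same error term."""
    error = lam - sum(v.values())
    return {cs: v[cs] + alphas[cs] * beta * error for cs in v}


alphas = {"light": 0.2, "tone": 0.5}
v = {"light": 0.0, "tone": 0.0}
for trial in range(1, 21):
    v = compound_trial(v, alphas, beta=1.0, lam=100.0)
    if trial <= 3:
        rounded = {cs: round(x, 2) for cs, x in v.items()}
        print(trial, rounded, round(sum(v.values()), 2))
print("final", {cs: round(x, 2) for cs, x in v.items()})
```

The more salient tone ends up with the larger share of the 100 units, which is the overshadowing result.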
R-W explains 2 CS learning
[Figure: amount of learning across trials; curves for Vsum for light, Vsum for tone, and Vsum total]
• Blocking: similar explanation to overshadowing:
– Does not matter whether VL has more or
less saliency than Vt,
– first CS has basically absorbed all the associative
strength that the US can support
• Why?
• give trials of light alone to asymptote:
– reach asymptote: VL = λj = 100 = Vsum
NOW add trials to compound stimuli:
– CS of the light has salience: αL = .5465
– CS of tone has salience of: αT = 0.464
– Note that CStone has higher salience!
– Eh, oh, the math is going to be TOO HARD to do
• Or IS the math too hard to do?
• First compound V1 Trial:
• ΔVt = αβ(λj - Vsum)
• What is Vsum after the training to the CS light?
• That’s right Vsum = ___________
• Vt=0.*1.0*(100-100)= _____________
• No learning!
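Blocking falls out of the same update rule when it is run in two phases. The sketch below is an illustration, not Kamin's procedure: the saliences (0.5 each), the learning rate (1.0) and the trial counts are hypothetical values, and the `train` helper is a name made up for this example. It compares a blocking group (L+ then LT+) with a control group that only gets LT+.

```python
def train(v, present, alphas, beta, lam, n_trials):
    """Run n reinforced trials with the CSs in `present`; updates v in place."""
    for _ in range(n_trials):
        error = lam - sum(v[cs] for cs in present)  # shared surprise term
        for cs in present:
            v[cs] += alphas[cs] * beta * error
    return v


alphas = {"light": 0.5, "tone": 0.5}  # hypothetical saliences

blocking = {"light": 0.0, "tone": 0.0}
train(blocking, ["light"], alphas, 1.0, 100.0, 20)          # Phase I: L+
train(blocking, ["light", "tone"], alphas, 1.0, 100.0, 10)  # Phase II: LT+

control = {"light": 0.0, "tone": 0.0}
train(control, ["light", "tone"], alphas, 1.0, 100.0, 10)   # LT+ only

print("blocking group, V tone:", round(blocking["tone"], 3))
print("control group,  V tone:", round(control["tone"], 3))
```

In the blocking group the light has already used up nearly all of the 100 units, so the error term in Phase II is close to zero and the tone gains almost nothing.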
How could one eliminate blocking effect?
• increase the intensity of the US to 2 mA,
so λj now equals 160
– Learning so far: Vsum still equals 100
(learned to 1 mA shock)
– But now: TOTAL learning is increased to 160
How could one eliminate blocking effect?
• plug into the equation:
• (assume Vl and Vt equally salient)
– ΔVt = 0.2(1)(160-100) = 0.2(60) = 12
– ΔVl = 0.2(1)(160-100) = 0.2(60) = 12
– Vsum = 100+12+12 = 124
How could one eliminate blocking effect?
• on trial 2:
– Vsum = 124
– ΔVt = 0.2(1)(160-124) = 0.2(36) = 7.2
– ΔVl = 0.2(1)(160-124) = 0.2(36) = 7.2
– Vsum now = (124+14.4) = 138.4
– Again, monotonically increasing curve.
• Thus, altering the intensity of the US alters the learning
• Does altering the CS make the same change?
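The two trials with the stronger shock can be reproduced directly. A minimal sketch with the numbers used above: asymptote raised to 160, light already at 100, tone at 0, both saliences 0.2 and learning rate 1.0.

```python
alpha, beta, lam = 0.2, 1.0, 160.0  # 2 mA shock now supports 160 units
v_light, v_tone = 100.0, 0.0        # light was trained to asymptote with 1 mA

for trial in (1, 2):
    error = lam - (v_light + v_tone)  # the bigger US is surprising again
    v_light += alpha * beta * error
    v_tone += alpha * beta * error
    print(trial, round(v_tone, 2), round(v_light + v_tone, 2))
```

Because the error term is no longer zero, the tone picks up associative strength and blocking is overcome.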
Can also explain why probability of reward given
CS vs no CS makes a difference:
π = probability of US given the CS or No US given No CS
can make up three rules:
– if πax > πa then Vx should be POSITIVE
– if πax < πa then Vx should be NEGATIVE
– if πax = πa then Vx should be ZERO
modified formula: (assume λ1 = 1.0; λ2 = 0; β1 = .10; β2 = .05; α1 = .10; α2 = .5)
πa = probability of reward.
Explaining loss of associative value
despite pairings with the US:
• R-W model makes a unique prediction:
Conditioned properties of stimuli can
DECREASE despite continued pairings with the US
• Lose associative value if presented together
on conditioning trial after they have been
trained separately
Explaining loss of associative value
Phase 1: A --> 1 pellet; B --> 1 pellet
Phase 2: A+B --> 1 pellet
Phase 3 TEST: A alone; B alone
• At the end of Phase 1: VA and VB = λ; both equally and perfectly predict 1 pellet
• Phase 2: Compound stimuli with same US
• No change in US
• Should VA and VB remain unchanged?
• But animal interprets differently: VA + VB = 2λ
• Animal is surprised (disappointed): get suppression to A and B in Phase 3
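This loss of associative value can be simulated with the same update rule. A sketch under assumed values: λ is set to 100 units for 1 pellet, and the salience (0.2), learning rate (1.0) and number of Phase 2 trials (30) are hypothetical.

```python
alpha, beta, lam = 0.2, 1.0, 100.0  # alpha and beta are hypothetical values
v_a, v_b = lam, lam  # end of Phase 1: each CS alone fully predicts 1 pellet

for trial in range(1, 31):  # Phase 2: A+B together --> still 1 pellet
    error = lam - (v_a + v_b)  # negative: the compound "expects" 2 pellets
    v_a += alpha * beta * error
    v_b += alpha * beta * error

print(round(v_a, 2), round(v_b, 2), round(v_a + v_b, 2))
```

Both stimuli lose strength even though every Phase 2 trial is reinforced, until together they predict just the 1 pellet that is delivered.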
Conditioned Inhibition
• Two kinds of trials:
– CS+: CS predicts US
– CS+ and CS-: predicts NO US
• Must consider CS+ and CS+ & CS- trials separately:
– CS+: pairs CS+ --> US, V+ approaches λ
• Excitatory conditioning ceases as V+ approaches λ
– On nonreinforced trials: CS+ and CS-
• No excitatory conditioning to CS+, but disappointment
• BUT: inhibitory conditioning to CS-
• Value of CS+ + CS- must sum to 0 to get inhibition
• CS- value is then NEGATIVE: V(CS+) + V(CS-) = 0
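The two trial types can be interleaved in a short simulation. A sketch only: the salience (0.3), learning rate (1.0), asymptote (100 with the US, 0 without) and the 200 repetitions are hypothetical values chosen for illustration.

```python
alpha, beta = 0.3, 1.0  # hypothetical salience and learning rate
v_plus, v_minus = 0.0, 0.0

for _ in range(200):
    # Reinforced trial: CS+ alone --> US, lambda = 100
    v_plus += alpha * beta * (100.0 - v_plus)
    # Nonreinforced trial: CS+ and CS- together --> no US, lambda = 0
    error = 0.0 - (v_plus + v_minus)
    v_plus += alpha * beta * error
    v_minus += alpha * beta * error

print(round(v_plus, 2), round(v_minus, 2), round(v_plus + v_minus, 2))
```

The CS- ends up with a negative value that cancels the CS+, so the compound predicts no US.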
Extinction of excitation and inhibition
• V for CS+ has reached λ
– Now begin presenting CS+ without US
• CS+ begins to lose its excitatory value
– V for CS+ will approach 0
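Extinction is the same equation with the asymptote set to 0 because no US arrives. A minimal sketch, assuming salience 1.0 and learning rate 0.5 as in the earlier acquisition example; using the same learning rate for nonreinforced trials is an assumption made here for easy math.

```python
alpha, beta, lam_no_us = 1.0, 0.5, 0.0  # no US, so nothing supports the association
v = 100.0  # CS+ starts at asymptote
history = []

for _ in range(4):
    v += alpha * beta * (lam_no_us - v)  # error is negative, so V drops
    history.append(v)

print(history)  # [50.0, 25.0, 12.5, 6.25]
```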
Critique of the Rescorla-Wagner Model:
• R-W model really a theory about the US
– says nothing about CS effectiveness
• How WELL a CS predicts as a combo of salience and
– states that an unpredicted US is effective in
promoting learning, whereas a well-predicted US is
• Reason has to do with brain processing of all of
Critique of the Rescorla-Wagner Model:
• Fails to predict the CS-pre-exposure effect:
– two groups of subjects (probably rats)
– Grp I: CS-US pairings
– Grp II (PRE-Expos): CS alone, then CS-US pairings
• Bob and Tom effect
– Bob always hangs with Tom
– You are dating Tom
– You have a BAAAAAD breakup with Tom
– Now you hate Bob….why?
Critique of the Rescorla-Wagner Model:
• In pre-exposure effect, simply being around a neutral stimulus
alters its ability to become conditioned
• Original R-W model doesn't predict any difference,
– Assumes no conditioning trials occur when CSs presented
in absence of US so Vsum = 0
– This appears to be wrong
• Conditioning likely occurring any time 2 stimuli are together
– Form an incidental association
– Need to modify the equation to account for this
– They have, but we won’t!
Critique of the Rescorla-Wagner Model:
• Original R-W model implies that salience is fixed for any given CS
– R-W assume CS salience doesn't change w/experience
– these data strongly suggest CS salience DOES change w/experience
• Newer data support changes in salience
– data suggest that Salience to a CS DECREASES when CS is repeatedly
presented without consequence
– CS that is accidentally paired with another CS INCREASES in salience
– NOW: appears that CS and US effectiveness are both highly important
• Model has stood test of time, now widely used in neuroscience
• Given birth to attentional models of CC
Attentional Models of CC
• Alternative focus: how well the CS commands attention
– Assumes that increased attention facilitates learning about
a stimulus
– Procedures that disrupt attention to CS disrupt learning
• Different attentional models differ in assumptions
about what determines how much attention a CS
commands on any given trial
– Single attentional mechanisms: Kamin’s surprise
– Multiple attentional mechanisms:
Multiple attentional mechanisms:
• Three attentions:
– looking for action: attention a CS commands after it has become
a good predictor of the US
– Looking for learning: how well the organism processes cues that
are not yet good predictors of the US, and thus have to be
“learned about”
– Looking for liking: the emotional/affective properties of the CS
• Assume that the outcome of a given trial alters the degree
of attention commanded by the CS on future trials
– Surprise? Then an increase in looking for learning on next trial
– Pleasant outcome? Increases emotional value of CS on next trial
Timing and Information Theory
• Recognized that time is important factor in CC
– Focal search responses become conditioned when
CS-US interval is short
– General search responses become conditioned
when CS-US interval is long
– Suggests that organisms learn both
• What is predicted
• WHEN what is predicted will occur
Temporal coding hypothesis
• Organisms learn when the US occurs in relation
to the CS
• Use this information in blocking, second-order
conditioning, etc.
• What is learned in one phase of training
influences what is learned in subsequent phase
• Large literature supports this
Importance of Inter-trial interval
More conditioned responding observed with longer inter-trial interval
– Intertrial interval and CS duration (CS-US interval) act in combination to determine responding
– Critical factor: relative duration of these two temporal intervals rather than absolute value of
either one by itself
Holland (2000)
– Conditioned rats to an auditory cue that was presented just before delivery of food
– CR to CS: nosing of food cup (goal tracking)
– Each group conditioned with
1 of 2 CS durations: 10 or 20 sec
1 of 6 intertrial intervals: 15 to 960 sec
– Characterized responses in terms of ratio of the intertrial interval (I) and the CS duration (T).
Time spent nosing the food cup during CS plotted as function of relative value of I/T
– Results: as I/T ratio increases, the percentage of time the rats spend with the nose in the food
cup increases
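The I/T ratio itself is simple arithmetic. The sketch below computes it for the two CS durations and only the shortest and longest intertrial intervals mentioned above; the function name `it_ratio` is made up for this example, and no response data are implied.

```python
def it_ratio(iti_sec: float, cs_sec: float) -> float:
    """Ratio of the intertrial interval (I) to the CS duration (T)."""
    return iti_sec / cs_sec


for cs in (10, 20):         # CS durations in sec
    for iti in (15, 960):   # shortest and longest intertrial intervals in sec
        print(f"T={cs} I={iti} I/T={it_ratio(iti, cs)}")
```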
Importance of Inter-trial interval
• Relative Waiting Time Hypothesis
– Organism making comparison between events during the I and T
– How long one has to wait for the US during the CS vs. how long one
has to wait for the US during the intertrial interval
• When US waiting time during CS is shorter than intertrial interval:
– I/T ratio is high and CS is highly informative about the next occurrence
of the US
– Lots of responding
• When US waiting time during CS is same or longer than intertrial
interval wait:
– I/T ratio is low, CS is not highly informative
– Less responding
Comparator Hypothesis
• Comparator hypothesis assumes that animal compares what
happens in one situation to what happens in another: animal
COMPARES expectations across settings
• Revaluation effects: e.g. in blocking
– Not that can’t learn to second CS, but that responding is blocked to
– Can get responding to CS2 by presenting alone, without the US!
• Anytime there is a change in the predictive value of a CS the
organism will re-evaluate its value
• Result is a disruption in responding to the changed CS
Comparator Hypothesis
• Note that this model is a PERFORMANCE model
– It is not what is learned, but what is performed that is
– Organism compares cues that may occur in various
settings and alters responding depending on value of
the cues in a given setting
• Not changing excitatory value of the US, but
comparing the value of the predictive CSs for that
Comparator Models
Model assumes organism learns three associations during course of conditioning:
1. Association between target CS and US
2. Association between the target CS and the comparator cues
3. Association between comparator stimuli and the US
Comparison between the direct and indirect activations determines the degree of excitatory
or inhibitory responding
Comparator model predictions
• The comparison between the CS-US and the comparator-US associations
at testing are important:
• Allows prediction that extinction of comparator-US associations following
training of a target CS will enhance responding to that CS
• Thus, in blocking, extinction of CSA will unmask conditioned responding to CSB
• Not that responding to CSB was blocked, but that it was masked because,
– when comparing CSA to CSA+CSB, the compound CSs provided no increased
– Only when lessen predictiveness of CSA does CSB become “important”
• Organism responds to the BEST predictor under the circumstances!
Dopamine and
Rescorla Wagner Model
• Turns out that changes in dopamine (DA) levels in dorsal striatal limbic
cortical pathway vary as we learn
• And guess what: these levels can be predicted by the RW model!
• But, once a CS-US pairing (or an operant R-SR pairing) becomes well
learned, the circuit begins to involve lower parts of the brain
– Circuit begins to involve basal striatal areas
– Becomes an “automated” or mastered behavior
– No longer involves being “surprised”; is the most robust predictor amongst
• A response to another CS will occur along the DA pathway if the CS-US
relation changes!!!!!
– Change in the conditional value of a CS
