Instrumental Conditioning: Motivational Mechanisms

Chapter 7 – Instrumental Conditioning: Motivational Mechanisms
• Outline
– The Associative Structure of Instrumental Conditioning
• S-R association and the Law of Effect
• S-O Association
– expectancy of reward
• R-O relations in Instrumental Conditioning
– Behavioral Regulation
• Early Behavioral Regulation theories
– Consummatory-Response Theory
– The Premack Principle
• The Behavioral Bliss Point
• What Motivates Instrumental Responding?
• Two different perspectives.
• 1. The associative structure of instrumental conditioning
– A molecular perspective
– Similar to the tradition of Pavlov
• Relationships among specific stimuli
• 2. Behavioral Regulation
– A molar perspective
– Skinnerian tradition
• Concerned with how instrumental conditioning sets limits on
the organism's free flow of activity
• The associative structure of instrumental conditioning
• Thorndike
– Instrumental conditioning involves more than just
a response and reinforcer
• It occurs in a specific context (S)
• Three events
– 1) Stimulus context (S)
– 2) The instrumental response (R)
– 3) The response outcome (O)
• These events can be associated in a variety of ways.
– Figure 7.1
• The S-R Association and the Law of Effect
– Behaviors that are followed by a satisfying state of affairs
become more probable.
– Behaviors that are followed by an annoying state of affairs
become less probable.
• Thorndike thought that the key association was the
S-R association.
– The role of the outcome (O) was to stamp in the
association between the contextual cues (S) and the
instrumental response (R)
– Instrumental conditioning did not involve learning about the
reinforcer (O) or about the R-O relationship.
• Thorndike did not believe that animals “knew” why they were
running the maze (or pressing the lever)
– They did not “expect” reward.
– Behaviors were automatic (“robotic”)
• They were stamped in by O (the reinforcer).
• This view was strongly challenged by the cognitive revolution.
• Some resurgence in explaining certain kinds of human behavior
– Habit formation
• Drugs
• Infidelity
• Gambling
– Context (S) can induce drug seeking (R)
• The important point is that, from an S-R perspective, the response is
– Out of the individual's control
• Expectancy of Reward and the S-O Association
– Clark Hull (1931) and Kenneth Spence (1956)
• Thought that animals may come to expect reward
– Expectancy
– Perhaps established through Pavlovian conditioning
• Perhaps organisms learn two things about
the Stimulus (S)
– Two-Process theory
• 1) S comes to evoke the response directly by
association with R
– S-R association
» O (RF) stamps in R in the context of S
• 2) Instrumental activity also comes to be made in
response to an expectancy of reward
– S-O association
» S → Food
» (CS → US)
• Modern Two-Process Theory
– (Rescorla & Solomon, 1967)
• There are two distinct kinds of learning
– Pavlovian
– Instrumental
– They are related, however, in a special way
• During Instrumental conditioning
– As S-R learning progresses a Pavlovian process kicks in
• S becomes associated with O
• S (context) --------- O(response outcome) = Emotion
– Chamber ------- Food
– maze ------------ Shock
= Hope
= Fear
• This S-O association further motivates responding.
• Implication
– The rate of instrumental responding will be modified by the
presentation of a classically conditioned stimulus.
– Tone → Food = hope
• Making the tone a CS+ for food
• Presenting a food CS+ while an animal is
responding for food reinforcement should increase hope and
thus increase the response rate
Results Consistent with Modern Two-Process Theory
Pavlovian-Instrumental Transfer Test
Phase 1
– Instrumental training
» Bar press → Food
Phase 2
– Pavlovian training
» Tone → Food
Phase 3
– Transfer phase
– The CS from Phase 2 is periodically presented to observe its effect on bar pressing.
If two-process theory is correct, when should animals respond the fastest?
• Does this procedure look familiar?
– Conditioned emotional response
– Conditioned suppression
• Pavlovian fear conditioning to the tone disrupted
instrumental responding
• Thus two-process theory works in either direction
– Positive emotions (a CS signaling a good outcome) increase
motivation to respond
– Negative emotions (a CS signaling a bad outcome) decrease
motivation to respond
• R-O Relations
– Thorndike’s S-R explanation of instrumental
responding and two-process theories ignore R-O relations
• Common sense suggests that animals may
associate outcomes with particular responses
– Push button on remote → expect visual reward
– Open fridge door → expect food reward
• Evidence for R-O relations
– Outcome devaluation studies
• Example: Colwill and Rescorla (1986)
– Phase 1
• Train rat to push a vertical rod
– Left (VI 60s) = food pellets
– Right (VI 60s) = sugar solution
– Phase 2
• Devalue food or sugar (depending on rat)
– Sugar → LiCl (lithium chloride, which induces illness)
– Test
• Which way does the rat push the rod?
– The response is altered by changing the value of the outcome
• Implies that animals expect that outcome when they make the response
– An R-O relation
– The rat doesn't want sugar, so it makes the response associated with food
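The competing predictions for the devaluation test can be made explicit in a small sketch. The response tendencies are on an arbitrary 0-1 scale chosen for illustration (not data from Colwill and Rescorla), and the function names are hypothetical; the code only encodes the qualitative logic of each account.

```python
# Hypothetical sketch of the S-R vs. R-O predictions for an outcome-devaluation test.
# Phase 1: each direction of the rod push earned its own outcome (VI 60 s each).
RESPONSE_OUTCOME = {"left": "food pellets", "right": "sugar solution"}


def outcome_values(devalued):
    """Assumed 0-1 value scale after Phase 2: the LiCl-paired outcome drops to 0."""
    return {o: (0.0 if o == devalued else 1.0) for o in RESPONSE_OUTCOME.values()}


def predict_sr(devalued):
    """S-R account: O only stamped in the habits during Phase 1, and both responses
    were trained equally, so devaluing O afterwards changes nothing.
    (`devalued` is deliberately ignored.)"""
    return {response: 1.0 for response in RESPONSE_OUTCOME}


def predict_ro(devalued):
    """R-O account: each response is made according to the current value of the
    outcome the rat expects that response to produce."""
    values = outcome_values(devalued)
    return {response: values[outcome] for response, outcome in RESPONSE_OUTCOME.items()}


print("S-R:", predict_sr("sugar solution"))  # S-R: {'left': 1.0, 'right': 1.0}
print("R-O:", predict_ro("sugar solution"))  # R-O: {'left': 1.0, 'right': 0.0}
```

Only the R-O version predicts that the rat will shift toward the response associated with food, which is the pattern the devaluation result supports.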
• Behavioral Regulation
– This view of instrumental behavior is quite
different from the associative account we just discussed
– Does not focus on molecular stimuli
• How does reinforcement of responding in the
presence of a particular stimulus affect behavior?
– The focus is molar
• How do instrumental contingencies put limitations
on an organism's activity and cause redistributions
of those activities?
• Early Behavioral Regulation Theories
– Consummatory Response Theory
• Sheffield
• Is it the food that is reinforcing or the behavior (eating)
that is reinforcing?
– Consummatory responses
• Chewing, licking, swallowing
• Consummatory responses are special
– Represent consumption (or completion) of an instinctive behavior
» Getting food and then consuming it.
– Fundamentally different from other instrumental behaviors, such as
running, jumping, or lever pressing.
– A big change in the view of RF
• RF is no longer a stimulus
• RF is a behavior
• David Premack
– disagreed with Sheffield
• consummatory responses are not necessarily
more reinforcing than other behaviors
• According to Premack
– consummatory responses are special only
because they occur more often than other
behaviors (e.g., lever pressing)
– Free environment with a lever and food
• A rat that knows nothing about lever pressing
(naïve) is likely to spend more time eating than
pressing the lever
• The Differential Probability Principle
– Premack Principle
• Of any two responses, the more probable response will
reinforce the less probable one.
– Two responses of different probabilities
• H – high likelihood
• L – low likelihood
– The opportunity to perform H after L will result in
reinforcement of L
• LH reinforces L
– The opportunity to perform L after H will not result
in reinforcement of H
• HL does not reinforce H
• Behaviors that an animal does a lot will
reinforce behaviors that the animal does not
perform as much.
– Strictly empirical.
– does not posit that some behaviors are enjoyed
more than others.
• Simply get a baseline measurement of both
– A kid may engage in video game playing behavior
quite often, but engage in homework activity
much less.
• If you make access to the video game
contingent on homework activity do you
think that homework activity will increase?
– Do homework → get to play video games?
• If you make homework activity contingent
on video game activity do you think that
video game activity will increase?
– Play video games → get to do homework?
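The differential probability principle reduces to a comparison of baseline measurements, so it can be written as a one-line test. This is a minimal sketch: the function name and the baseline minutes below are hypothetical, and time spent at baseline stands in for response probability, as in Premack's procedure of simply measuring both activities.

```python
def premack_reinforces(baseline, instrumental, contingent):
    """Differential probability principle: making access to `contingent` depend on
    `instrumental` should reinforce `instrumental` only if `contingent` is the
    more probable (more time-consuming) response at baseline."""
    return baseline[contingent] > baseline[instrumental]


# Hypothetical baseline: minutes per day the kid spends freely on each activity.
baseline = {"video games": 120, "homework": 20}

# Do homework -> get to play video games: H follows L, so L should increase.
print(premack_reinforces(baseline, instrumental="homework", contingent="video games"))  # True
# Play video games -> get to do homework: L follows H, so H should not increase.
print(premack_reinforces(baseline, instrumental="video games", contingent="homework"))  # False
```

Because the prediction depends only on the measured baselines, the same function gives the opposite answer for a kid whose baseline favors homework.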
• Empirical Evidence
• Premack deprived rats of water
– If given a choice between water and running in a wheel, the rat would
now spend more time drinking water
• What if you make water-drinking activity contingent on running in a wheel?
– The rat runs in the wheel more than it normally would.
• What if you could make running in a wheel more valuable than drinking water?
– How would you do this?
• Allow the rat all the water it wants
• Restrict the opportunity to run in a wheel.
• Now make access to the running wheel contingent on drinking
– What happens?
– The rats drink three times as much water as their baseline rate
• Premack principle in kids
• First graders
– Eat candy or play pinball
• Get the baseline
– Some prefer candy, some prefer pinball
• How would Premack increase pinball playing for
children who preferred to eat candy?
– Make access to candy contingent on playing pinball
• Play pinball → get candy
• How would Premack increase candy eating for
children who preferred to play pinball?
– Make access to the pinball machine contingent on eating candy
• Eat candy → get to play pinball
• What is nice about Premack’s theory is
that it is strictly empirical.
– It contains no hypothetical constructs.
• No reference to unobservables like hunger
• No reference to pleasurable vs. nonpleasurable activities
• The Behavioral Bliss Point
– If we have several activities that we can
engage in
– We distribute our behavior among those
activities in a way that is optimal for us
• The bliss point can be measured the way
Premack measured baselines
– Time spent engaging in each activity when access is unrestricted
• Student
– Time spent watching TV
– Time spent studying
• In Figure 7.8, the student's bliss point is to
spend much more time watching TV (60 min)
than studying (15 min)
• The line in Fig. 7.8 represents an
instrumental contingency.
– Now the student is allowed to watch TV only
for as long as they study
– They can no longer achieve the bliss point
– They will now redistribute their behavior
• How do they redistribute?
– Must make a compromise
– Minimum-deviation model (Staddon)
• The rate of one response is brought as close to its
preferred level as possible without moving the
other response too far away from its preferred level
• Filled circle on Fig. 7.8
– 37.5 minutes of each activity
» 22.5 more minutes of studying: 15 + 22.5 = 37.5 min studying
» 22.5 fewer minutes of TV: 60 - 22.5 = 37.5 min TV
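The filled-circle arithmetic can be reproduced with a short calculation. This sketch assumes that "deviation" means plain Euclidean distance from the bliss point with both activities weighted equally; the slides do not state a distance measure, so that choice (and the function name) is an assumption. Under it, the closest point on the line TV = k × study has a closed form.

```python
def min_deviation_point(bliss_study, bliss_tv, tv_per_study):
    """Return the (study, TV) allocation on the schedule line TV = tv_per_study * study
    that lies closest, in equal-weight Euclidean distance, to the bliss point.

    Minimizing (s - s0)**2 + (k*s - t0)**2 and setting the derivative to zero
    gives s = (s0 + k*t0) / (1 + k**2)."""
    k = tv_per_study
    study = (bliss_study + k * bliss_tv) / (1 + k * k)
    return study, k * study


# Bliss point from Fig. 7.8: 15 min studying, 60 min TV; contingency of 1 min TV per 1 min study.
study, tv = min_deviation_point(bliss_study=15.0, bliss_tv=60.0, tv_per_study=1.0)
print(f"Study {study} min, TV {tv} min")  # Study 37.5 min, TV 37.5 min
print(f"Extra studying: {study - 15.0} min; TV given up: {60.0 - tv} min")
# Extra studying: 22.5 min; TV given up: 22.5 min
```

With a 1:1 contingency the compromise splits the difference evenly, which is why both activities move by the same 22.5 minutes.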
• Application of Bliss-Point to Behavior
– Figure 7.9
• Left to his own devices, the child wants a lot of social RF
from his parents while emitting very few positive behaviors
– Bliss point
• The parents have been trying to RF positive behaviors,
so they provide social rewards only after the child has
engaged in two positive behaviors (2:1 ratio)
– Dotted line
• If this is not going well, a therapist might be tempted to tell the
parents to RF every positive behavior (1:1 ratio)
– Solid line
• Note: the minimum-deviation model
actually predicts fewer positive behaviors
after RF is increased
– The two solid dots
• Certainly an important consideration
• Things are not always as simple as they seem.
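The counterintuitive prediction can be checked with the same equal-weight Euclidean assumption used for simple minimum-deviation calculations. The bliss point below is hypothetical (10 positive behaviors, 15 units of social RF) and is not read from Figure 7.9; under this assumption the drop in positive behaviors appears only when the bliss point's RF level is less than three times its behavior level, so the numbers were chosen to satisfy that. The function name is also hypothetical.

```python
def ratio_schedule_allocation(bliss_behavior, bliss_rf, rf_per_behavior):
    """Closest point (equal-weight Euclidean distance) to the bliss point on the
    schedule line rf = rf_per_behavior * behavior."""
    k = rf_per_behavior
    behavior = (bliss_behavior + k * bliss_rf) / (1 + k * k)
    return behavior, k * behavior


# Hypothetical bliss point: 10 positive behaviors, 15 units of social RF per session.
BLISS_BEHAVIOR, BLISS_RF = 10.0, 15.0

# 2:1 ratio = one social reward per two positive behaviors; 1:1 = one per behavior.
for label, k in [("2:1 ratio", 0.5), ("1:1 ratio", 1.0)]:
    behavior, rf = ratio_schedule_allocation(BLISS_BEHAVIOR, BLISS_RF, k)
    print(f"{label}: {behavior:.1f} positive behaviors, {rf:.1f} units of RF")
# 2:1 ratio: 14.0 positive behaviors, 7.0 units of RF
# 1:1 ratio: 12.5 positive behaviors, 12.5 units of RF
```

Richer reinforcement lets the child get closer to his preferred level of social RF while giving up less of his preferred (low) level of positive behavior, so positive behaviors fall even though RF rises.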
