Chapter 7 – Instrumental Conditioning: Motivational Mechanisms

• Outline
  – The Associative Structure of Instrumental Conditioning
    • S-R association and the Law of Effect
    • S-O association: expectancy of reward
    • R-O relations in instrumental conditioning
  – Behavioral Regulation
    • Early behavioral regulation theories
      – Consummatory-Response Theory
      – The Premack Principle
    • The Behavioral Bliss Point

• What Motivates Instrumental Responding?
  • Two different perspectives.
  • 1. The associative structure of instrumental conditioning
    – A molecular perspective
    – Similar to the tradition of Pavlov
      • Relationships among specific stimuli
  • 2. Behavioral regulation
    – A molar perspective
    – Skinnerian tradition
      • Concerned with how instrumental conditioning sets limits on the organism’s free flow of activity

• The associative structure of instrumental conditioning
  • Thorndike
    – Instrumental conditioning involves more than just a response and a reinforcer
      • It occurs in a specific context (S)
  • Three events
    – 1) Stimulus context (S)
    – 2) The instrumental response (R)
    – 3) The response outcome (O)
  • These three events can be associated in a variety of ways.
    – Figure 7.1

• The S-R Association and the Law of Effect
  – Behaviors that are followed by a satisfying state of affairs become more probable.
  – Behaviors that are followed by an annoying state of affairs become less probable.
  • Thorndike thought that the key association was the S-R association.
    – The role of the outcome (O) was to stamp in the association between the contextual cues (S) and the instrumental response (R).
    – Instrumental conditioning did not involve learning about the reinforcer (O) or about the R-O relationship.
  • Thorndike did not believe that animals “knew” why they were running the maze (or pressing the lever).
    – They don’t “expect” reward.
    – Behaviors were robotic, stamped in by O (the reinforcer).
  • This view was hit pretty hard by the cognitive revolution.
  • Some resurgence in subcategories of human behavior
    – Habit formation
      • Drugs
      • Infidelity
      • Gambling
    – Context (S) can induce drug seeking (R)
  • The important point is that, from an S-R perspective, the response is automatic
    – Out of their control

• Expectancy of Reward and the S-O association
  – Clark Hull (1931) and Kenneth Spence (1956)
    • Thought that animals may come to expect reward
      – Expectancy, perhaps established through Pavlovian conditioning
  • Perhaps organisms learn two things about the stimulus (S)
    – Two-Process theory
      • 1) S comes to evoke the response directly by association with R
        – S-R association
          » O (RF) stamps in R in the context of S
      • 2) Instrumental activity also comes to be made in response to expectancy of reward
        – S-O association
          » S → Food
          » CS → US

• Modern Two-Process Theory (Rescorla & Solomon, 1967)
  • There are two distinct kinds of learning
    – Pavlovian
    – Instrumental
    – They are related, however, in a special way
  • During instrumental conditioning
    – As S-R learning progresses, a Pavlovian process kicks in
      • S becomes associated with O
      • S (context) → O (response outcome) = Emotion
        – Chamber → Food = Hope
        – Maze → Shock = Fear
  • This S-O association further motivates responding.
  • Implication
    – The rate of instrumental responding will be modified by the presentation of a classically conditioned stimulus.
    – Tone → Food = hope
      • Making the tone a CS+ for food
      • Presentation of a food CS+ while an animal is responding for food RF should increase hope and thus increase response rate

• Results Consistent with Modern Two-Process Theory
  • Pavlovian-Instrumental Transfer Test
    – Phase 1: Instrumental training
      • Barpress → Food
    – Phase 2: Pavlovian training
      • CS → US
      • Tone → Food
    – Phase 3: Transfer phase
      • The CS from Phase 2 is periodically presented to observe its effect on barpressing.
  • If two-process theory is correct, when should animals respond the fastest?
  • Does this procedure look familiar?
    – Conditioned emotional response
    – Conditioned suppression
      • Pavlovian fear conditioning to the tone disrupted instrumental responding
  • Thus two-process theory works in either case
    – Positive emotions increase motivation to respond when the outcome is good
    – Negative emotions decrease motivation to respond when the outcome is bad

• R-O Relations
  – Thorndike’s S-R explanation of instrumental responding and the two-process theories ignore R-O relations
  • Common sense implies that animals may associate outcomes with particular responses
    – Push button on remote → expect visual reward
    – Open door on fridge → expect food reward
  • Evidence for R-O relations
    – Outcome devaluation studies
  • Example: Colwill and Rescorla (1986)
    – Phase 1
      • Train rat to push a vertical rod
        – Left (VI 60 s) = food pellets
        – Right (VI 60 s) = sugar solution
    – Phase 2
      • Devalue food or sugar (depending on rat)
        – Sugar → LiCl
    – Test
      • Which way does the rat push the rod?
        – The response is altered by changing the value of the outcome.
  • Implies that animals expect that outcome when they make the response.
    – An R-O relation
    – Don’t want sugar, so make the response associated with food

• Behavioral Regulation
  – This view of instrumental behavior is quite different from the associative account we just discussed.
  – Does not focus on molecular stimuli
    • How does reinforcement of responding in the presence of a particular stimulus affect behavior?
  – The focus is molar
    • How do instrumental contingencies put limitations on an organism’s activities and cause redistributions of those activities?

• Early Behavioral Regulation Theories
  – Consummatory-Response Theory
    • Sheffield
    • Is it the food that is reinforcing or the behavior (eating) that is reinforcing?
      – Consummatory responses
        • Chewing, licking, swallowing
    • Consummatory responses are special
      – Represent consumption (or completion) of an instinctive behavior sequence.
        » Getting food and then consuming it.
      – Fundamentally different from other instrumental behaviors, such as running, jumping, or lever pressing.
  – A big change in the view of RF
    • RF is no longer a stimulus
    • RF is a behavior

• David Premack
  – Disagreed with Sheffield
    • Consummatory responses are not necessarily more reinforcing than other behaviors
  • According to Premack
    – Consummatory responses are special only because they occur more often than other behaviors (e.g., lever pressing)
    – Free environment with a lever and food
      • A rat that knows nothing about lever pressing (naïve) is likely to spend more time eating than pressing the lever

• The Differential Probability Principle
  – Premack Principle
    • Of any two responses, the more probable response will reinforce the less probable one.
  – Two responses of different probabilities
    • H: high likelihood
    • L: low likelihood
  – The opportunity to perform H after L will result in reinforcement of L
    • L → H reinforces L
  – The opportunity to perform L after H will not result in reinforcement of H
    • H → L does not reinforce H
  • Behaviors that an animal performs a lot will reinforce behaviors that it performs less often.
    – Strictly empirical.
    – Does not posit that some behaviors are enjoyed more than others.
      • Simply get a baseline measurement of both activities.
    – A kid may play video games quite often but do homework much less.
      • If you make access to the video game contingent on doing homework, do you think that homework activity will increase?
        – Do homework → get to play video games?
      • If you make access to homework contingent on playing video games, do you think that video game activity will increase?
        – Play video games → get to do homework?

• Empirical Evidence
  • Premack deprived rats of water
    – If given a choice between water and running in a wheel, the rat would now spend more time drinking water
  • What if you make water drinking contingent on running in a wheel?
    – The rat runs in the wheel more than it normally would.
  • What if you could make running in a wheel more valuable than water?
    – How would you do this?
      • Allow the rat all the water it wants
      • Restrict the opportunity to run in a wheel.
  • Now make access to the running wheel contingent on drinking water.
    – What happens?
    – The rats drink three times as much water as at baseline

• Premack principle in kids
  • First graders
    – Eat candy or play pinball
  • Get the baseline
    – Some prefer candy, some prefer pinball
  • How would Premack increase pinball playing for children who preferred to eat candy?
    – Make access to candy contingent on playing pinball
      • Play pinball → get candy
  • How would Premack increase candy eating for children who preferred to play pinball?
    – Make access to the pinball machine contingent on eating candy
      • Eat candy → get to play pinball
  • What is nice about Premack’s theory is that it is strictly empirical.
    – It contains no hypothetical constructs.
      • No references to unobservables like hunger
      • No reference to pleasurable vs. nonpleasurable things.

• The Behavioral Bliss Point
  – If we have several activities that we can engage in, we distribute our behavior among those activities in a way that is optimal
  • The bliss point can be determined the way Premack did
    – Time spent engaging in each activity
      • Student
        – Time spent watching TV
        – Time spent studying
  • In Figure 7.8 the student’s bliss point is to spend much more time watching TV (60 min) than studying (15 min)
  • The line in Fig. 7.8 represents an instrumental contingency.
    – Now the student is only allowed to watch TV for the same amount of time that they study
    – They can no longer achieve the bliss point
    – They will now redistribute their behavior
  • How do they redistribute?
    – Must make a compromise
    – Minimum-deviation model (Staddon)
      • The rate of one response is brought as close to its preferred level as possible without moving the other response too far away from its preferred level
      • Filled circle on Fig. 7.8
        – 37.5 minutes of each activity
          » 22.5 more minutes of studying
          » 15 + 22.5 = 37.5 studying
          » 22.5 fewer minutes of TV
          » 60 - 22.5 = 37.5 TV

• Application of Bliss-Point to Behavior Therapy
  – Figure 7.9
  • Left to his own devices, the child likes a lot of social RF from parents while emitting very few positive behaviors
    – Bliss point
  • The parents have been trying to RF positive behaviors, so they provide social rewards only after the child has engaged in two positive behaviors (2:1 ratio)
    – Dotted line
  • If this is not going well, a therapist might be tempted to tell the parents to RF every positive behavior (1:1 ratio)
    – Solid line
  • Note: the minimum-deviation model actually predicts fewer positive behaviors after RF is increased
    – The two solid dots
  • Certainly an important consideration
  • Things are not always as simple as they seem.
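
The arithmetic behind the filled circle in Fig. 7.8 can be checked with a short sketch. It rests on an assumption these notes do not state: that the minimum-deviation compromise is the point on the schedule line closest to the bliss point, with both activities weighted equally (an unweighted least-squares reading). The function name `minimum_deviation` and its parameters are hypothetical, chosen only for illustration.

```python
def minimum_deviation(bliss_instrumental, bliss_reinforcer, ratio=1.0):
    """Return the point on the schedule line closest to the bliss point.

    ASSUMPTION (not from the notes): deviation is the unweighted sum of
    squared differences from the bliss point on both activities.

    The schedule line is: reinforcer = ratio * instrumental.
    ratio=1.0 means one minute of TV is earned per minute of studying.
    """
    x0, y0, m = bliss_instrumental, bliss_reinforcer, ratio
    # Minimizing (x - x0)**2 + (m*x - y0)**2 over x gives this closed form.
    x = (x0 + m * y0) / (1 + m * m)
    return x, m * x


# Student example from Fig. 7.8: bliss point = 15 min studying, 60 min TV.
study, tv = minimum_deviation(15, 60, ratio=1.0)
print(study, tv)             # 37.5 37.5
print(study - 15, 60 - tv)   # 22.5 22.5 (more studying, less TV)
```

Under this reading the student moves the same distance (22.5 minutes) away from the preferred level of each activity, which matches the 37.5-minute figure above. The same function accepts other schedule slopes, so the 2:1 schedule (ratio=0.5 here, one reward per two positive behaviors) and the 1:1 schedule of Figure 7.9 could be compared once the child's bliss-point coordinates are read off that figure; these notes do not give those values.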