Partial reinforcement
Skinner, who completely rejected the theoretical law of effect, devoted several years of research (e.g. Ferster & Skinner, 1957) to exploring and demonstrating the power of the empirical law. He worked mostly with pigeons, trained in a Skinner box to peck a disc set in the wall for food reinforcement. Skinner investigated the effects of partial reinforcement [partial reinforcement: a procedure in operant conditioning in which the reinforcer is delivered after only a proportion of the responses, rather than after all of them (continuous reinforcement)], in which food was presented after some responses but not all. Animals will usually respond well in these conditions, and with some schedules of reinforcement [schedules of reinforcement: rules that determine which responses will be followed by a reinforcer in operant conditioning] the rate of response can be very high indeed. If, for example, the animal is required to respond a certain number of times before food is delivered (known as a fixed ratio schedule), there will usually be a pause after reinforcement, but this will be followed by a high-frequency burst of responding.
Other ways of scheduling reinforcement control different but equally systematic patterns of response. There is a clear parallel here between the pigeon responding on a partial reinforcement schedule and the human gambler who works persistently at a one-armed bandit for occasional pay-outs.
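To make the scheduling rule concrete, the short sketch below simulates responding on a fixed ratio schedule. It is not taken from Ferster and Skinner's work; the ratio value, response probabilities and pause lengths are illustrative assumptions chosen only to reproduce the qualitative pause-then-burst pattern described above.

```python
import random

def fixed_ratio_session(ratio=10, n_steps=300, seed=0):
    """Illustrative simulation of a fixed ratio (FR) schedule.

    The schedule rule: a reinforcer is delivered after every `ratio`-th
    response. The simulated animal pauses briefly after each reinforcer
    and then responds in a high-rate burst, the qualitative pattern
    described in the text. All numbers are assumptions, not fitted data.
    """
    random.seed(seed)
    responses_since_food = 0
    pause_remaining = 0          # time steps left in the post-reinforcement pause
    events = []                  # 'peck', 'food' or '-' per time step

    for _ in range(n_steps):
        if pause_remaining > 0:
            pause_remaining -= 1
            events.append("-")
            continue
        if random.random() < 0.9:              # high-rate responding between pauses
            responses_since_food += 1
            events.append("peck")
            if responses_since_food == ratio:  # the fixed ratio rule
                events.append("food")
                responses_since_food = 0
                pause_remaining = random.randint(3, 6)  # post-reinforcement pause
        else:
            events.append("-")
    return events

if __name__ == "__main__":
    session = fixed_ratio_session()
    # '|' = response, 'F' = reinforcer, '.' = no response
    print("".join("F" if e == "food" else ("." if e == "-" else "|") for e in session))
    print("reinforcers earned:", session.count("food"))
```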
According to the theoretical version of the law of effect, the only function of the reinforcer is to strengthen a connection between the response (R) that produced that reinforcer and the stimulus (S) that preceded the R. It follows that an S–R learner does not actively know what the consequence of the R will be; rather, the response is simply triggered on the basis of previous contingencies. In other words, the rat in the Skinner box is compelled in a reflex-like fashion to make the R when the S is presented, and it is presumed to be as surprised at the delivery of the food pellet after the hundredth reinforced response as it was after the first. Not only is this an implausible notion, but experimental evidence disproves it. The evidence comes from studies of the effects of reinforcer revaluation on instrumental performance. In one such study, during a first stage of training, rats were allowed to press the lever in a Skinner box 100 times, each response being followed by a sugar pellet. Half the animals were then given a nausea-inducing injection after eating sugar pellets – a flavour-aversion learning procedure. As you might expect, these rats developed an aversion to the pellets, so the reinforcer was effectively devalued.
In the subsequent test phase, the rats were returned to the Skinner box and allowed access to the lever (although no pellets were now delivered). The researchers found that rats given the devaluation treatment were reluctant to press the lever, compared with the control animals. This result makes common sense – but no sense in terms of the theoretical law of effect. According to the strict interpretation of the law of effect, an S–R connection would have been established at the end of the first stage of training by virtue of the reinforcers that followed responding, before the nausea-inducing injection was administered. Subsequent changes in the value of this reinforcer (which, according to the theory, has already done its job in mediating a ‘state of satisfaction’) should have been of no consequence. These results suggest that the critical association in instrumental learning is not between stimulus and response, but between representations of (a) the response and (b) the reinforcer (or more generally, between the behaviour and its outcome). The stronger this association, assuming that the outcome is valued, the more probable the response will be. But an association with an aversive outcome (e.g. a devalued foodstuff or a punishment) will lead to a suppression of responding. This does not mean that S–R learning can never occur. Often, after long practice, we acquire patterns of behaviour (habits) that have all the qualities of reflexes. In other words, they are automatically evoked by the stimulus situation and not guided by consideration of their consequences. The following result may be an experimental example of this. One group of rats was given extensive initial training in lever-pressing (500 rather than 100 reinforced trials) prior to the reinforcer-devaluation treatment. These animals continued to press the lever in the test phase. One interpretation of this result is that with extensive training, behaviour that is initially goal-directed (i.e. controlled by a response–outcome association) can be converted into an automatic S–R habit. When next you absent-mindedly take the well-worn path from your home to the college library, forgetting that on this occasion you were intending to go to the corner shop, your behaviour has been controlled by an S–R habit rather than the response–outcome relationship – just like the rats.
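The contrast between goal-directed (response–outcome) control and S–R habit can also be stated as a minimal computational sketch. The model below is not from the original study; it simply assumes that a goal-directed agent weights its response–outcome association by the current value of the outcome, whereas a habitual agent responds on the basis of stimulus–response strength alone, so devaluation suppresses responding only in the former.

```python
def response_tendency(association_strength, outcome_value, habitual):
    """Illustrative model of instrumental performance at test.

    Goal-directed control: performance depends on the response-outcome
    association *and* the current value of the outcome, so devaluation
    (outcome_value <= 0) suppresses responding.

    Habitual (S-R) control: performance depends only on the accumulated
    stimulus-response strength, so devaluation has no effect.
    """
    if habitual:
        return association_strength                            # S-R habit ignores outcome value
    return association_strength * max(outcome_value, 0.0)      # goal-directed control

# Moderate training (100 reinforced presses): behaviour remains goal-directed.
before = response_tendency(association_strength=1.0, outcome_value=1.0, habitual=False)
after = response_tendency(association_strength=1.0, outcome_value=-1.0, habitual=False)
print("moderate training, before/after devaluation:", before, after)   # 1.0 -> 0.0

# Extended training (500 reinforced presses): behaviour has become an S-R habit.
before = response_tendency(association_strength=1.0, outcome_value=1.0, habitual=True)
after = response_tendency(association_strength=1.0, outcome_value=-1.0, habitual=True)
print("extended training, before/after devaluation:", before, after)   # 1.0 -> 1.0
```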
If an animal has acquired an S–R habit, then we can predict that the R will occur whenever the S is presented. But what controls performance if learning is the result of a response–outcome association? A rat can be trained to press for food or jump to avoid shock only in the presence of a given stimulus (called a discriminative stimulus) which signals that food or shock is likely to occur. Presumably the response–outcome association is there all the time, so why is it effective in producing behaviour only when the stimulus is present? How does the presentation of the discriminative stimulus activate the existing instrumental association?