ClickerSolutions Training Articles

Introductory Chicken Training Camp
July 23-27, 1999

Note: These are the notes I took during the lecture portions of the introductory chicken training camp offered by Bob and Marian Bailey. If there are any mistakes, they are mine. I have no doubt that my paraphrasing -- or perhaps my plain old understanding -- is incorrect or incomplete in places. If you have any doubts about the validity of something, please verify it with the Baileys themselves at behavior@hsnp.com.

Day 1, Friday, July 23, 1999

We train responses, not behavior.

Behavior is anything an animal does. The Dead Man's Rule: If the dead can do it, it's not behavior.

Behavior can be complex. One behavior is rarely separated from other behaviors.

Behavior can be stereotypic. For example, one step is pretty much like the next step. However, stereotypic behavior isn't identical. One can take shorter or longer steps or can step high or low or right or left.

Scientists try to break behavior into smaller bits called responses.

The definition of a specific response is arbitrary, convenient, and changeable. For example, I can define a response as touching a target. Once defined, it is predictable, consistent, and precise. A defined behavior is a response.

Behavioral scientists predict and control behavior. Good trainers are good behavior analysts.

Defining a response. Responses may be defined by a higher authority or may be predetermined.

Train responses, not behaviors.

Two training approaches: lumping and splitting. Splitting is a reductionist approach. Lumping is collectivist. The collectivist method results in erratic learning and learning plateaus.

No matter what responses you see during training, you are dealing with innate responses. An animal can only do what nature has made it capable of. We can't be trained to fly, for example. That doesn't mean that complex behaviors can't be learned.

A reflex is the relationship between the stimulus and the response. Reflexes tend to be involuntary and internal.

Continuum - gradual transition from one extreme to the other.

Reflexes are rigid and consistent and require no training. They are hard to change. They sit at one end of the continuum; flexible, learned responses sit at the other.

People who work with assistance dogs should know the dog's satiation curve.

Revel in simplicity!

Animal training is a mechanical skill using hand/eye coordination and a methodology including procedures and rules that teaches behaviors.

Methods of animal training: traditional, OC, clicker, other.

Mechanical skill + Experience + Education = Proficiency

Animal training should be a science, a technology, and an art. Preferably more science and technology and less art.

OC is a science that studies behavior and ignores mental processes. It is a technology that changes behavior - not attitudes, not thought processes.

Day 2, Saturday, July 24, 1999

The technology works. If something isn't working, then it's what the trainer is doing. This assumes that the animal is healthy and capable of what the trainer is asking.

Principles

  • Reinforcement - increases behavior; animal decides what is reinforcing
  • Extinction - decreases rate of behavior through non-reinforcement
  • Punishment - decreases rate of behavior by suppressing behavior; animal decides what is punishing
  • Generalization - stimulus and response

Applying the principles

  • Reinforcement - use it to get behavior
  • Extinction - use it to reduce or eliminate unwanted behavior
  • Generalization - stimulus and response
  • Punishment - use it to suppress behavior

Punishment is difficult to apply correctly. It generalizes easily, causes fear and aggression, and timing and severity are critical. It is easy to apply incorrectly. Overuse and ill-timing are common.

If you have to apply a punishment more than two or three times, you're not doing it correctly.

The Baileys have used punishment approximately a dozen times in 50 years (in training 10,000 animals). In every case it was the client who insisted; Bob doesn't believe the punishment was necessary. In one case they were teaching a dog not to chase cars. In all the other cases, they were teaching military dogs to locate trip wires and keep soldiers from tripping them. The military insisted that the dogs be punished for tripping the wires. In the military's original program, the dogs were punished early in training, and then a long time was spent recovering. Bob trained the dogs first, then applied the punishment once, as a proofing exercise - and he had to set the dogs up to fail to do even that. He punished each dog only one time, and he still didn't believe it was necessary.

An effective reinforcer is immediate, contingent, and valued. Reinforcement is the key to effective training.

Factors of reinforcement:

  • Timing: the when
  • Criteria: the what
  • Rate: the how much

80% of (non-Pavlovian) training problems are created by poor timing, incorrect raises in criteria, or poor rate of reinforcement.

You get what you reinforce, not necessarily what you want.

A training period is any preset block of time set aside for training behaviors. Bob suggests 10-15 minutes. A training period is made up of training sessions. A session can be a preset period of time, a number of repetitions, or whatever you define it to be. Bob likes a session to be 10 trials. At the end of the session, evaluate the animal's performance and either increase the criteria or stay where you are.

Don't change your criteria during a training session.

Don't do more than three sessions in a row where the animal is getting fewer than 50% correct responses.
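To make those bookkeeping rules concrete, here is a minimal sketch in Python (my own illustration, nothing the Baileys presented). The 10-trial session size and the 50%/three-session limit are the numbers from the notes above; the function names are mine.

    # Sketch of the session bookkeeping described above: 10-trial sessions,
    # plus the rule not to run more than three sessions in a row below 50% correct.

    def session_rate(correct, trials=10):
        """Proportion of correct responses in one session."""
        return correct / trials

    def should_stop_and_replan(recent_rates, threshold=0.5, max_poor_sessions=3):
        """True if the most recent sessions were all below the threshold."""
        if len(recent_rates) < max_poor_sessions:
            return False
        return all(rate < threshold for rate in recent_rates[-max_poor_sessions:])

    # Example: three sessions in a row under 50% correct -> stop, put the animal
    # away, and make a new plan before starting again.
    rates = [session_rate(c) for c in (4, 3, 4)]
    print(should_stop_and_replan(rates))  # True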

You, as trainer, can alter any "rule" of training, but you have to live with the results.

Color discrimination. Present the hot target alone. Once the behavior is freely offered, present the cold target as well. Ignore the first mistake - that's a freebie to let the animal know that responses to that target are not reinforced. If the animal makes a second mistake - and ANY mistake thereafter, even in later sessions - remove the hot target and let the behavior against the cold target begin to extinguish. When the rate of that behavior diminishes, add the hot target back, so you can reinforce not interacting with the cold target. (Premack)

Day 3, Sunday, July 25, 1999

Variations on the training game. Have several people holding hands. On one end is the trainer, who knows the behavior being trained. On the other end is the person with the clicker. When the trainer wants to click, he squeezes the hand of the person next to him, who squeezes the next person's hand, and so on down the line. There can be a huge delay in the click, which can cause huge frustration. This is a particularly important lesson for corporate managers. Another variation: put the clicker person in the middle and have two trainers - one at each end. The clicker person can't click until he gets squeezes from *both* ends. Depending on how each trainer processes information (lumper vs. splitter), the clicks could be incredibly different.

People make decisions precipitously when they don't have much information coming in, and they delay unnecessarily when there is too much. Keep that in mind when planning a training program. Don't change your overall plan just because you're not getting data - don't second guess yourself.

How you present a jackpot is important. The process of ingesting food is reinforcing - present more pieces and present them individually. Jackpots make the trainer feel good. (Training should be worthwhile for all.) In the long-term scheme, there is almost no difference between those who do and don't jackpot. Only jackpot when the dog vastly outperforms previous attempts, or if the behavior is of such long duration that the animal has had few opportunities to earn reinforcements.

If your animal is not responding, the trainer can choose to end the session - don't bludgeon yourself or your chicken. Put the animal away, make a new plan, then start a new session. However, give your plan a chance - you took the time to make a plan so it deserves an opportunity. Use your judgement about whether the animal is working through and making progress or if it is completely confused and perhaps going backwards. Also, don't forget to take baggage into account.

What kind of data to keep. Session time is invariant. Since rate of response is high, count errors. Hopefully the number of errors will get lower and lower.

When do you proof what you've got? When you have a strong response on what you've got.

If you have a session of 10 trials and the animal makes no errors, no learning has taken place. Likewise, if the animal makes no correct responses, no learning has taken place. The data suggest that a rate between 70 and 90% correct gives the best overall learning rate and performance increase. Bob likes 80%. Do 10 trials. If you get more than two wrong, stick with your criteria. If not, increase your criteria.
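As a minimal sketch of that decision rule (my own illustration, using Bob's numbers of 10 trials and at most two errors):

    # Sketch of the rule above: after a 10-trial session, raise criteria only
    # if no more than two trials were wrong (i.e., 80% or better).

    def raise_criteria(errors, max_errors=2):
        return errors <= max_errors

    for errors in (0, 2, 3):
        rate = (10 - errors) / 10
        action = "raise criteria" if raise_criteria(errors) else "stay at current criteria"
        print(f"{errors} errors ({rate:.0%} correct): {action}")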

We have to develop discipline. When the timer goes off, the session ends - even if your animal is "on a roll."

At CCI, dog sessions start at 2-3 minutes. Eventually, if they're working on a particular problem, they may do five minutes. After five minutes, the learning curve tapers off. If you use a timer with a dog, don't set it for the end of the session. Set it for one minute before the end - otherwise the timer becomes a type of aversive.

Stimulus reversal. At the end of the stimulus discrimination exercise, you have three targets: a hot target (which was cold before you started), a cold target (which was hot before you started), and a warm target (which was ignored both times). For the stimulus reversal, make the warm target the new hot target.

This is a shaping exercise. All three targets are placed on the table in a row about two inches apart. The old hot target is on an end. The trainer can choose whether the new hot target will be in the middle or on the other end.

To start, reinforce the bird for two pecks on the old target. After that, no pecks on the old target will be reinforced. The trainer must shape the bird to peck the other target. The trainer cannot lure in any way. The trainer's body language must remain neutral. For best results, maintain a high level of reinforcement.

In the beginning, you have two responses: pecking and not pecking. Reinforce anything that isn't pecking at the old hot target. Remember lumpers and splitters. Don't wait for the bird to do something in the vicinity of the new target - just reinforce not pecking.

Timing is important, but you won't fail if your timing is off. Rather, you will get the behavior faster if your timing is on.

Gary Wilkes analogy: Imagine getting into a cab in NYC and listing every place that you don't want to go, leaving the cabbie to figure out that the one place you didn't mention is the right place. That's the traditional style of training.

Emotions are reflex responses that cause many physiological responses inside. Fear and anger are debilitating. If they are persistent, then the body develops psychosomatic illnesses: the body maintains the physiological state created by the emotion, causing illness. Psychosomatic illnesses are not imaginary.

The central nervous system is the brain and spinal cord. The peripheral nervous system lies outside the central nervous system; the part relevant here is the autonomic nervous system, which controls emotions, digestion, and other involuntary reactions.

The autonomic system has two parts: sympathetic and parasympathetic. The sympathetic governs the negative emotions like fear and anger. These reactions make most people feel bad, though "adrenaline junkies" feel energized. The sympathetic is related to the consumption of energy; the parasympathetic is related to the conservation of energy and governs the happy emotions. All of these are subject to Pavlovian conditioning.

Most of the behaviors we train are not reflex behaviors. However, we must be concerned with those inasmuch as they are responses to our training.

Operant conditioning is also called instrumental or Skinnerian conditioning, and it is concerned with motor behaviors.

To condition a neutral stimulus, pair it with an unconditioned stimulus (either at the same time or just before), and eventually the neutral stimulus will be able to replace the unconditioned stimulus and elicit the response on its own. At that point it is a conditioned stimulus. This is Pavlovian conditioning.

Operant. Learning to differentiate between stimuli. Start with lots of irrelevant stimuli. There's a response and a reinforcement. Some stimuli begin to stand out - these are called salient stimuli. In the presence of these, the response/reinforcement cycle occurs. When certain stimuli stand out above all others, and thus the stimuli cue the response, they are discriminative stimuli (SD). The key to making an excellent, strong cue is to get rid of all but one SD.

Respondent and operant conditioning combine in common situations.

Characteristics of action patterns:

  • Skeletal or striped muscle
  • Species-specific
  • Instinctive
  • Require maturation

Motor actions have these characteristics:

  • Can be modified by learning to some extent
  • Physical and neurological

Discrete responses: particular events which can be easily identified and clicked. For example, pecking a target: click when the beak hits the target.

Nondiscrete responses: You have to define the moment when you're going to reinforce. For example, choosing an arbitrary distance to click when the chicken is pulling a rubber band.

Day 4, Monday, July 26, 1999

This is a very forgiving system. You can get whatever level of precision you want. Or you can use it with a novice and get a basic behavior.

You need to know your animal so well that you recognize precursors to behavior. Otherwise you will be late.

A confused student is the teacher's fault. Always. The teacher's task is to make the first step (and all subsequent steps) as easy as possible.

The reason we remove the hot target instead of the cold target is that the animal must have the chance to extinguish the behavior.

There are two parts of training: mechanics and behavior. Right now, we must concentrate (at least equally) on the mechanics side. If we don't, the mechanics will begin to break down. We'll begin to accept our lower standard of mechanics. Eventually, whether we set a low standard or high standard of mechanics, the mechanics will become automatic, and we will concentrate entirely on behavior.

If you go for more than two sessions and don't see changes in behavior, you need to make changes. Ask, "What am I doing wrong?"

Stimulus (pl. stimuli): A change in physical energy to which an animal can respond or react. If the animal cannot react to it, it is not a stimulus for that animal. For example, we cannot hear many tones beyond 25,000 cycles per second. There are various kinds of physical energy. Electromagnetic energy is the most important to us. There is a tiny span within the electromagnetic spectrum that is visible to us. There are others for our other senses.

It's the change that makes it a stimulus. If there is no change, there is no stimulus - that's why we don't see the blind spot in our eye. Eye movement is sufficient to be a stimulus for normal vision.

Salient stimuli can be innately salient or learned. Your name, spoken in a large noisy room, is salient, and we have learned to focus on it.

Stimulus context: all the stimuli present in a situation. For example, I'm listening to the lecture, but my desire for lunch is becoming a salient stimulus. The other people in the room are stimuli, but mostly they are background stimuli. When training an animal, you must be aware of all the stimuli present in the situation. Just because we don't consider something salient doesn't mean the animal doesn't. Sometimes the best thing you can do for your training is to change the context.

Genetic factors. Genetic means anything that is inborn, innate, inherited, instinctive, whatever.

Reflexes help keep the animal alive and were selected through natural selection for that purpose. What about operants? Why are some operants more prevalent than others? Operants are selected by their consequences.

Motor action patterns are species-specific. For example, the Laysan albatross has a very complex courting dance.

Learning involves a change of behavior that is brought about by some experience with the environment. Learning is a very comprehensive word. One of the things that constitutes learning is learning discriminations.

Methods of learning: respondent conditioning, operant conditioning.

Four principles of learning: reinforcement, extinction, punishment, and generalization.

The ABCs of learning: Antecedent, Behavior, and Consequence.

Antecedents include several categories:

  • Stimuli (salient and background)
  • Establishing operations (setting factors) - a condition that causes the value of the reinforcement to change. For example, hunger causes the value of food to increase. Hormone changes can also do this - obviously this is out of the trainer's control, but the trainer must be aware of the effect of heat cycles, etc. There are similar changes that occur due to illness.
  • Previous reinforcement history

Consequences are events that follow the behavior. We use positive reinforcement the most. Extinction is another consequence. Extinction is very important if you don't want to use punishment. Very often it is the only thing you need to do to extinguish a behavior. One way to speed up extinction is to reinforce an incompatible behavior. There are also aversives/punishment. Many things in nature are aversive, but we don't label them as such.

Features of extinction.

  • Extinction burst - a very rapid run of the behavior that is being extinguished. Usually occurs early, but may occur later. Extinction bursts tend to be very vigorous. You can take advantage of this to shape a more vigorous response if you're only getting weak responses.
  • Spontaneous recovery - the behavior appears to have extinguished, then after the passage of time, it suddenly reappears just as strong.

Extinction bursts and spontaneous recovery are completely natural parts of the phenomenon and you simply have to ride them out.

Aversives. Pain is naturally aversive. Many odors are naturally offensive. You have to study each species - and even the individual - to determine what's aversive. An aversive is anything the animal will avoid or try to get rid of.

Reinforcement. A positive reinforcer is anything that an animal will seek out or work to get.

Day 5, Tuesday, July 27, 1999

Bob and Marian are willing to use aversives if a situation involves life or death for the dog, a human, or property. Bob suggests applying the punishment before the dog has a large history of doing what you don't want it to do. If you have to use aversives, apply them no more than twice; if the behavior hasn't stopped by then, you're not doing it right.

Don't use a variable schedule of reinforcement unless you have to. Bob has never seen a pet dog or obedience dog who needed a variable schedule. If your dog is going to be blasted into space, you need a variable schedule. Search and rescue dogs who work for long durations may need a variable schedule.

One behavior can reinforce another behavior. Premack. This is how chains work. Reinforce only at the end of a chain.

Bob has seen only a few obedience competitions, but he said he came away with the same opinion each time: the behaviors weren't strong enough. The behaviors should be carbon copies - no variation. Every sit should be the same as every other sit, every time. Use videotape. Your dog should look the same every single time. If that is true, then there should be no problem not reinforcing in the ring.

Bob has found that giving people a form to fill out showing the work they did with their dog during the week makes people just a bit more honest. Michele divides behaviors into levels, and won't give out the sheets for the next level until they have made it all the way through the current level. People who do the work should be heavily reinforced.

Philosophy. Bob and Marian believe that in 50 years psychology won't exist as a discrete science, because it can only be reduced to a certain point; beyond that, you're out of the realm of psychology.

We should make an effort to develop an individual philosophy about why we do things. This will build our confidence about what we're doing. It gives us a foundation to stand on. A philosophy isn't confining. It gives us principles to apply to other areas of our life. A philosophy gives you the arrogance, the confidence, to walk into a kennel and say, "I know what I want, and I know how to get it." That attitude will help you be effective, to have better timing.

Why the Baileys believe in OC/Behavior Analysis: It works for them - 15,000 animals in 140 species over 50 years. Their business, which used OC, supported more than 70 employees for over 47 years. OC works for others. No other training method has proven effective with humans and all tested animals. It is based on science. Applied Behavior Analysis is a technology based on science.

Behavior Analysis is not a theory. It is a description.

Applied behavior analysis is used even by detractors when changing behavior.

They (the Baileys) do not believe for personal or personality reasons. Not for friendship, not for loyalty, not for self-aggrandizement. They have no personal investment. If a new humane training method proved better than behavior analysis, they would change immediately and without regret.

Practical training philosophy. Suggestions for efficient applied behavior analysis and production training: Think. Plan. Do.

The biggest training problem is timing. The second biggest is establishing criteria. The third biggest is rate of reinforcement.

Training is a mechanical skill.

Operant conditioning is a few simple principles applied in a complex environment.

Creating complexity is easy. Creating simplicity is hard.

Fundamentally, you get behavior with reinforcement. You reduce or eliminate behavior with extinction. As a last resort, you suppress behavior with punishment. PUNISHMENT IS A LAST RESORT. An effective punisher should be timely, contingent, sufficiently intense, and safe. Punishment should rarely exceed three applications.

Punishment generalizes better than reinforcement.

More behavior is better than less behavior. Having more gives you more to change. If you don't have enough, you have to work to get it. To a traditional trainer, less behavior is desirable - they're working to get rid of behavior.

More reinforcements are better than fewer reinforcements. By and large, if you have a doubt about reinforcing, reinforce.

Know what you want. Know what you don't want. If you aren't sure, don't use the clicker - just chuck food.

If you feel emotional, stop training.

The game must be worthwhile for all.

Behavior is determined.

Behavior is lawful. It is a function of natural variables. The trainer's job: Find the variables.

Do solve mysteries. Don't create mysteries. Look at the behavior. Keep looking at the behavior.

Do gather data. Don't guess. Don't base your training program on anecdotal evidence.

Problem solving. Look for simplicity: timing, criteria, rate of reinforcement.

Behavior should continue to improve steadily. Don't back up. Slice behavior into tiny responses. Learning plateaus are created by the trainer, by and large.

Trainers get bored. Animals don't. (Don't project.)

Trainer's motto: CAN DO. If you question whether you can do it, don't even try. Leave your doubts at the kennel door. Believe in yourself and your technology. Believe in what you are doing.

Believe. Believe. Believe.

More about variable versus continuous rates of reinforcement. Variable reinforcement makes a stronger behavior, but the behavior is more variable. Bob has never had a properly trained behavior extinguish because of continuous reinforcement. Karen encourages variable schedules because she thinks beginning trainers will try to advance too quickly, the behavior will extinguish, and the trainer will get frustrated. Bob doesn't deny that could happen with beginners, but he believes Karen should tell advanced students to use continuous reinforcement. To get strong, non-variable, precise behaviors, you need to use continuous reinforcement.
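For clarity on the terms: under continuous reinforcement every correct response is reinforced, while a variable (ratio) schedule reinforces only some correct responses, unpredictably. A tiny sketch of the difference (my own illustration, approximating a variable-ratio schedule with a random ratio):

    import random

    def continuous(correct):
        # Continuous reinforcement: every correct response earns a click/treat.
        return correct

    def variable_ratio(correct, mean_ratio=3):
        # Variable-ratio schedule, approximated with a random ratio: on average
        # one reinforcement per mean_ratio correct responses, unpredictably spaced.
        return correct and random.random() < 1 / mean_ratio

    responses = [True] * 10
    print(sum(continuous(r) for r in responses))      # 10 reinforcements
    print(sum(variable_ratio(r) for r in responses))  # about 3, and it varies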

Positive primary reinforcers are innately, genetically desired: food, water, sex, creature comforts, social interaction (for social creatures), petting (for social creatures). Positive secondary reinforcers are learned. To create a positive secondary reinforcer, you only have to pair it with a primary reinforcer. Some neutral stimuli are easier to pair with primary reinforcers than others, depending on biological preferences. Watch the animal and see what it likes.

There are also primary and secondary aversives. Primary aversives are things that are naturally offensive, like pain and some odors. Conditioned aversives include our tone of voice, certain words, and warning signs.

Positive and negative. These are mathematical signs: + and -. They mean add and take away.

Reinforcement always strengthens. (Reinforcing the troops.)

Punishment is to suppress.

Positive reinforcement: Add something that the animal wants, seeks, or needs to strengthen (increase the frequency of) a behavior.

Positive punishment: Add something that the animal will work to avoid (an aversive) to suppress (lessen the frequency of) a behavior.

Negative reinforcement: Remove something that the animal will work to avoid (an aversive) to strengthen (increase the frequency of) a behavior.

Negative punishment: Take away something that the animal wants, seeks, or needs to suppress (lessen the frequency of) a behavior. This includes taking away something irrelevant: taking away the car for failing a test.
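One way to see the structure of these four definitions is as a two-by-two grid: whether something is added or removed, crossed with whether the behavior is strengthened or suppressed. A small sketch (mine, not from the lecture):

    # The four quadrants above, keyed by (what you do, effect on the behavior).
    quadrants = {
        ("add",    "strengthen"): "positive reinforcement",  # add something the animal wants
        ("add",    "suppress"):   "positive punishment",     # add an aversive
        ("remove", "strengthen"): "negative reinforcement",  # remove an aversive
        ("remove", "suppress"):   "negative punishment",     # remove something the animal wants
    }

    print(quadrants[("remove", "strengthen")])  # negative reinforcement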

Extinction doesn't appear directly on this chart. It's a different class of procedure, which can include one of these principles. Its characteristics include spontaneous recovery, the extinction burst, and ease of re-extinction if the behavior reappears. Extinction is the removal of the chance for reinforcement. The behavior weakens because it is not reinforced, not because it is suppressed by unpleasant consequences.

Melissa Alexander
mca @ clickersolutions.com
copyright 1999 Melissa Alexander

 
