Role of dopamine in Pavlovian reward conditioning

Author: Marvin Griffin

1 Role of dopamine in Pavlovian reward conditioning
Associative learning is critical for animal survival. Pavlovian conditioning allows animals to execute approach behaviors that increase their chances of obtaining reward. Instrumental conditioning is not covered here, although it is thought to share both similarities and differences with Pavlovian forms of learning. The focus is on natural rewards like food, as distinct from other rewards like money or drugs of abuse, although dopamine shares many functions in processing those rewards as well.
Sotiris Masmanidis May

2 Outline
Review of anatomy & physiology
Associative learning models & computational properties of dopamine neurons
Behavioral role of dopamine neurons

3 Overview of dopamine function
Role in healthy brain:
1. Cognition
2. Control of movement
3. Reward-guided learning and behavior
Role in disease:
Cognition: Schizophrenia, ADHD
Movement: Parkinson’s disease
Reward: Addiction, Depression
Also implicated in:
Mood, motivation
Attention, salience
Social behaviors
Arousal, sleep
Note: PubMed or DOI links to the cited papers are provided in the presenter notes.

4 History of dopamine
2003: Dopamine: the movie
1990s-2000s: Electrophysiological properties of dopamine system. Discovery of reward prediction error coding. Genetic tools to study dopaminergic function.
1960s-80s: Discovery of dopamine receptors. Discovery of principal dopaminergic pathways. Implication of dopamine in reward / addiction. Model of basal ganglia function (direct/indirect).
1957-1960: Discovery of DA in human brain. Discovery of DA concentration in striatum. Discovery of DA depletion in Parkinson’s disease. Implication of dopamine in movement.
1950: Compound name coined as dopamine. Synthesis of chlorpromazine (antipsychotic, D2R).
1910: First laboratory synthesis of 3,4-dihydroxyphenylethylamine.
Marsden, 2006

5 Anatomy of dopaminergic systems – Cell bodies
Main dopaminergic nuclei:
Ventral tegmental area (VTA)
Substantia nigra pars compacta (SNc)
Tyrosine hydroxylase (TH): enzyme used in dopamine synthesis.
Figure: TH labeling of the SNc and VTA; scale bar 0.5 mm.

6 Anatomy of dopaminergic systems - Inputs
Dopaminergic neurons receive input from both external and local excitatory and inhibitory sources. The combined effect of these inputs is thought to give rise to the reward processing properties of dopaminergic neurons.
Figure labels: Dopamine, Glutamate, GABA.
Abbreviations: mPFC: medial prefrontal cortex; VP: ventral pallidum; LHb: lateral habenula; LHT: lateral hypothalamus.
Morales & Margolis, 2017

7 Anatomy of dopaminergic systems - Projections
Main dopaminergic pathways:
Mesolimbic: VTA to nucleus accumbens (reward)
Mesocortical: VTA to prefrontal cortex (cognition)
Nigrostriatal: SNc to dorsal striatum (movement)
Other tracts: VTA to amygdala and hippocampus
Figure: Mesocorticolimbic pathway (Russo & Nestler, 2013)

8 Anatomy of dopaminergic systems - Projections
Dopaminergic neurons densely project to the striatum.
Figure: TH immuno-staining; scale bar 1 mm.

9 Actions of dopamine on brain function
1. Acts on dopamine receptors (2 major categories). D1 receptor-expressing neurons increase their excitability in the presence of DA. D2 receptor-expressing neurons decrease their excitability in the presence of DA.
2. Modulates neuronal excitability. Basal ganglia output regulates cortical activity. Thought to be important for movement control.
3. Modulates neuronal plasticity. Thought to be important for reward learning.

10 Actions of dopamine on brain function - Excitability
Classical model of basal ganglia function:
Dopamine enhances direct pathway activity to promote movement.
Dopamine reduces indirect pathway activity to promote movement.
Parkinson’s disease: loss of DA suppresses movement by reducing direct pathway activity and increasing indirect pathway activity.
Direct pathway: D1 receptor
Indirect pathway: D2 receptor
Albin & Penney, 1989. Gerfen & Surmeier, 2011.

11 Actions of dopamine on brain function - Excitability
Dopamine has opposing effects on the excitability of D1 and D2 receptor-expressing neurons (Gerfen, 2006).
In addition, some dopamine neurons co-release other neurotransmitters that regulate excitability:
Glutamate (Stuber & Bonci 2010; Tecuapetla & Koos 2010)
GABA (Tritsch & Sabatini 2016)

12 Actions of dopamine on brain function - Plasticity
Striatum:
Striatum is an important site of plasticity in reward-based learning.
Dopamine modulates both LTP and LTD on corticostriatal synapses.
LTP is thought to increase efficiency of corticostriatal communication.
Higher corticostriatal communication is associated with improved motor task performance.
Corticostriatal communication increases with reward-based motor learning.
Glutamatergic projections to striatum: cortex, thalamus, amygdala, hippocampus.
Dopaminergic projections to striatum: VTA/SNc.
Reynolds & Wickens, 2001
Kreitzer & Malenka, 2008
Gerfen & Surmeier, 2011
Yin & Costa, 2009
Koralek & Carmena, 2012

13 Dual mechanisms for striatal dopamine release
Dopamine release in the striatum is independently controlled by:
VTA/SNc dopaminergic neuron activity (the familiar route).
Cholinergic interneurons acting on nicotinic acetylcholine receptors on dopaminergic axon terminals.
Threlfell & Cragg, 2012
Cachope & Cheer, 2012
Mamaligas & Ford, 2016

14 Summary of dopaminergic system actions
Serves to modulate both neuronal excitability and plasticity.
Promotes movement and reinforcement by enhancing the direct pathway and suppressing the indirect pathway of the basal ganglia.
Dopaminergic neurons are not just squeeze bottles of dopamine: they use multiple neurotransmitters & release mechanisms.
Although much has been discovered, our understanding of how dopamine influences brain activity is still incomplete and remains an active area of research.

15 Latest research trends – From cells to systems
New viral tracing approaches make it possible to identify the inputs and outputs of brain regions with unprecedented specificity and detail. These anatomical maps allow us to make informed hypotheses about the function of specific brain circuits.
Watabe-Uchida et al. DOI: /j.neuron
Beier et al. PMC

16 Map data repositories: Allen Brain Projection Atlas

17 Associative learning: Pavlovian conditioning
Pairing of a conditioned stimulus (CS) with an unconditioned stimulus (US).
CS: a sensory cue (tone, light, odor).
US: a reinforcer (drop of water, juice, money, etc.).
After repeated paired trials, animals acquire a conditioned response (CR) upon exposure to just the CS.
CR: reward-anticipatory response (salivation, licking, approach).
The presence of a CR indicates that animals have learned the CS-US association and are predicting the reward.
In extinction, the CS is no longer followed by a US, and the CR is abolished.

18 Pavlovian vs operant conditioning – how distinct?
Pavlovian: stimuli allow animals to prepare for reward presented irrespective of behavior.
Operant: stimuli elicit actions needed to obtain the reward.
There is a large literature pointing to both similarities and differences in the brain circuits mediating Pavlovian and operant learning and behavior. In general:
The associative learning process is thought to be mediated by dopamine in both cases.
However, the site of plasticity for performing Pavlovian and operant responses is thought to often lie in different circuits (e.g., ventral and dorsal striatum).
These distinctions can also be task-dependent (e.g., licking vs lever-pressing).
Keep an open mind: Do not over-generalize claims that particular brain areas do or don’t mediate Pavlovian or operant learning and behavior.
Further reading on Pavlovian conditioning: Fanselow & Wassum, 2016 (DOI: /cshperspect.a021717)

19 Example: Pavlovian trace conditioning in mice
Trace conditioning: delay between CS and US.
Delay conditioning: timing of CS and US overlap.
CS: olfactory cue (amyl acetate).
Reward (US): drop of sweetened milk.
Food restriction to increase motivation.
Bakhurin & Masmanidis, 2016

20 Advantages of trial-based learning tasks
Most of what we know about the computational properties of dopamine neurons from the last 25 years comes from trial-based learning tasks.
Can model how behavior will evolve over successive trials.
Can deliver precisely timed stimuli, and thus can also model how behavior will evolve as a function of time.
Can vary the dosage and probability of stimulus delivery.
Example of single-trial Pavlovian learning task: conditioned place preference.
Effective for studying the reinforcing properties of various compounds.
Can vary dosage, but not timing or probability.

21 Rescorla-Wagner model of associative learning
Used to model the strength of a CS-US association.
Association strength is related to the likelihood of executing a conditioned response.
Trial-based learning model (association strength gets updated on each successive trial).
Learning is driven by errors (discrepancy between predicted and actual reward).
Model equation: $V_{i+1} = V_i + \alpha\beta(\lambda - V_{tot})$
$V_{i+1}$ is the associative strength of a US to a specific CS on trial $i+1$.
$V_i$ is the associative strength of a US to a specific CS on trial $i$.
$V_{tot}$ is the total associative strength of a US to all associated CS types.
$\alpha$ is the salience (constant from 0 to 1).
$\beta$ is the learning rate (constant from 0 to 1).
$\lambda$ is the maximum possible association strength to the US ($\lambda$ is related to the reward value).
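As a worked illustration of the update rule, the first few trials can be computed by hand. This is only a sketch: it assumes a single CS (so that $V_{tot} = V_i$) and borrows the parameter values from the Matlab tutorial script ($\alpha = 0.9$, $\beta = 0.1$, $\lambda = 100$, $V_1 = 0$).

```latex
\begin{align*}
V_2 &= V_1 + \alpha\beta(\lambda - V_1) = 0 + 0.09\,(100 - 0) = 9 \\
V_3 &= V_2 + \alpha\beta(\lambda - V_2) = 9 + 0.09\,(100 - 9) = 17.19 \\
V_4 &= V_3 + \alpha\beta(\lambda - V_3) = 17.19 + 0.09\,(100 - 17.19) \approx 24.64
\end{align*}
% Under the same single-CS assumption the recursion has the closed form
% V_i = \lambda \left[ 1 - (1 - \alpha\beta)^{\,i-1} \right],
% so the error (\lambda - V_i) shrinks by a factor (1 - \alpha\beta) on every trial.
```

Each step is smaller than the last because the error term $(\lambda - V_i)$ shrinks as the association strength approaches $\lambda$.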

22 Matlab tutorial: Rescorla-Wagner model
Open Matlab, choose New, then Script. Copy the script below into the script editor, save the file, and run it. Alter the initial parameters and rerun the script.
%***Initialize parameters***
alpha=0.9; %salience (parameter from 0 to 1)
beta=0.1; %learning rate (parameter from 0 to 1)
lambda=100; %maximum possible association strength to US (depends on reward value).
Vinit=0; %initial cue-reward association strength (Vi=0 for naive animals)
n=100; %number of trials
%****************************
close all
V=zeros(1,n);
V(1)=Vinit;
for i=1:(n-1)
    deltaV=alpha*beta*(lambda-V(i)); %error-driven change in association strength
    V(i+1)=V(i)+deltaV;
end
figure(1); clf;
plot(1:n, V, '.-')
xlabel('Trial #', 'FontSize', 12)
ylabel('Association strength', 'FontSize', 12)
title('Rescorla-Wagner Model', 'FontSize', 12)
set(gca,'FontSize',12,'TickDir','out')

23 Rescorla-Wagner model of acquisition
Run RW Matlab script.

24 Rescorla-Wagner model of extinction
Run RW Matlab script.
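One way to produce an extinction curve with the Rescorla-Wagner rule is sketched below. It is not the presenter's script: it assumes that omitting the US can be modeled by setting lambda to zero, and the names lambdaAcq, lambdaExt, nAcq, nExt and rpe are hypothetical.

```matlab
% Sketch: acquisition followed by extinction in the Rescorla-Wagner model.
% Assumption: US omission is modeled as lambda = 0 (not stated in the slides).
alpha = 0.9;       % salience (0 to 1)
beta = 0.1;        % learning rate (0 to 1)
lambdaAcq = 100;   % reward value while the US is delivered
lambdaExt = 0;     % assumed reward value while the US is omitted
nAcq = 50;         % hypothetical number of acquisition trials
nExt = 50;         % hypothetical number of extinction trials

n = nAcq + nExt;
lambda = [lambdaAcq*ones(1,nAcq), lambdaExt*ones(1,nExt)];
V = zeros(1,n+1);  % V(1) is the association strength of a naive animal
rpe = zeros(1,n);  % error term (lambda - V) on each trial
for i = 1:n
    rpe(i) = lambda(i) - V(i);          % positive in acquisition, negative in extinction
    V(i+1) = V(i) + alpha*beta*rpe(i);  % same update rule as the tutorial script
end

figure; plot(0:n, V, '.-')
xlabel('Trial #'); ylabel('Association strength')
title('Rescorla-Wagner: acquisition then extinction')
```

With these values the association strength should rise toward lambdaAcq and then decay back toward zero once the error term turns negative.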

25 Pros and cons of Rescorla-Wagner model
Advantages:
Simple, intuitive, few free parameters.
Has made some successful predictions (e.g., blocking).
Disadvantages:
Fails to explain some behavioral effects. For example, extinction is not an unlearning of previous associations, as the model would imply.
Does not treat time as a variable.
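The blocking prediction can be illustrated with a short simulation. This is a sketch under stated assumptions: two cues (A and B) have equal salience, cue A is pretrained alone, and the control group uses a novel cue C in place of A. The variable names and trial counts are hypothetical.

```matlab
% Sketch: blocking in the Rescorla-Wagner model.
% Assumptions: equal salience for all cues; trial counts are illustrative.
alpha = 0.9; beta = 0.1; lambda = 100;
n1 = 50;   % phase 1 trials: cue A alone, paired with reward
n2 = 50;   % phase 2 trials: compound cue, paired with the same reward

% Blocking group: A is pretrained, then A+B are presented together.
VA = 0; VB = 0;
for i = 1:n1
    VA = VA + alpha*beta*(lambda - VA);
end
for i = 1:n2
    Vtot = VA + VB;                     % total prediction from all cues present
    dV = alpha*beta*(lambda - Vtot);    % shared error drives both cues
    VA = VA + dV;
    VB = VB + dV;
end

% Control group: no pretraining, novel cue C presented together with B.
VC = 0; VBcontrol = 0;
for i = 1:n2
    Vtot = VC + VBcontrol;
    dV = alpha*beta*(lambda - Vtot);
    VC = VC + dV;
    VBcontrol = VBcontrol + dV;
end

fprintf('V(B) after blocking: %.2f, V(B) in control: %.2f\n', VB, VBcontrol);
```

Because cue A already predicts the reward, the error term is close to zero during compound training and cue B acquires little association strength, whereas in the control group B shares the available strength with C.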

26 When is dopamine released in the brain?
Microelectrode recordings of dopaminergic neurons: dopamine neurons fire to uncued (i.e., unpredicted) rewards.
So, do dopamine neurons just signal the presence of reward? NO!
Schultz et al., 1997

27 Dopamine neurons encode reward prediction error
RPE coding (figure panels): Positive RPE, Zero RPE, Negative RPE.
Schultz et al., 1997

28 How does DA RPE signal fit into Rescorla-Wagner model?
$V_{i+1} = V_i + \alpha\beta(\lambda - V_{tot})$
Dopamine’s ability to modulate plasticity is qualitatively related to $\lambda$.
The parameter $\lambda$ represents the available reward value. Any error between association strength and $\lambda$ will lead to a change in association strength.
$\lambda > V_{initial}$: Positive RPE, increased learning
$\lambda = V_{initial}$: Zero RPE, no learning
$\lambda < V_{initial}$: Negative RPE

29 Directly measuring dopamine release
Fast scan cyclic voltammetry (FSCV) in nucleus accumbens:
Early learning, unpredicted reward: Positive RPE.
Late learning, predicted reward: Diminished RPE.
Notice that the DA reward response is reduced but not zero. This is a fairly common observation.
Day & Carelli, 2007 (DOI: /nn1923)

30 What is the CS response for?
With more learning, dopamine signaling shifts to coincide with the CS. This indicates that the CS has now acquired the ability to predict that a reward is likely to occur at a particular time. This predictive property allows animals to initiate anticipatory behavioral responses. Sometimes, this is taken to mean that the CS has the same hedonic value as the actual reward, but that’s not known to be universally true.
Figure: CS response.
Schultz et al., 1997
Day & Carelli, 2007 (DOI: /nn1923)

31 Caveat 1: Striatal dopamine does not just signal RPE
In a spatial navigation task, striatal dopamine shows a ramping profile that signals proximity to reward.
Howe & Graybiel. Also see Gershman, 2014, Dopamine Ramps Are a Consequence of Reward Prediction Errors (DOI: /NECO_a_00559).

32 Caveat 2: Striatal dopamine signals are not uniform
Dopamine signals in ventral striatal areas are more reward-related, whereas those in dorsal areas are more movement-related.
Figure panels: Dorsal striatum, Central striatum, Ventral striatum.
Howe & Dombeck, 2016.

33 Predictive properties of dopamine neurons
The RPE coding properties of dopamine neurons can be modeled using temporal-difference (TD) models of learning.
TD models of learning:
Incorporate time as a variable within the trial, which the RW model does not (RW is trial-based).
Thus, TD models can be used to predict the time of reward.
Can be viewed as an extension of the RW model, since the timing of the reward signal influences the learning rate.
Further reading on TD models in neuroscience: Schultz et al., 1997; Suri, 2002
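For reference, the TD update can be written out explicitly. The form below is a sketch transcribed from the learning rules in the Matlab TD tutorial script in this deck (the lines computing Vij, deltaij and W); the symbols $x_k(t)$, $w_k$ and $r_t$ are notation assumed here for the time-shifted stimulus vector, the weight vector and the reward.

```latex
\begin{align*}
V_t &= \sum_k w_k \, x_k(t)
  && \text{(reward prediction in time bin } t\text{)} \\
\delta_t &= r_t + \gamma V_t - V_{t-1}
  && \text{(prediction error; } \gamma \text{ is the temporal discount factor)} \\
w_k &\leftarrow w_k + \alpha \, \delta_t \, x_k(t-1)
  && \text{(weight update; } \alpha \text{ is the learning rate)}
\end{align*}
```

Unlike the Rescorla-Wagner rule, the error $\delta_t$ is computed in every time bin, which is what allows the prediction to develop a time course within the trial.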

34 TD model of reward prediction error
Data: Suri, 2002

35 Matlab tutorial: TD model
Open Matlab, choose New, then Script. Copy the script below into the script editor, save the file, and run it. Alter the initial parameters and rerun the script.
%****Acknowledgment: Code was adapted from David S. Touretzky (October, 1998). Original code:
%****For explanation of model see Suri, Neural Networks 2002.
%********Set parameters******
stimtime=5; %time of CS
rewardtime=25; %time of US
numberofbins=30; %number of time bins. Make sure this is greater than rewardtime.
numberoftrials=50; %number of trials.
alpha=0.9; %learning rate (0 to 1).
gamma=0.99; %temporal discount factor (0 to 1). More distant rewards are weighted less. default=0.99.
reset_learning='y'; %if not 'y', then will use final {W, delta, and V} values from the last run.
%*****************************
stim=zeros(numberofbins,1);
reward=zeros(numberofbins,1);
stim(stimtime)=1; %defines stimulus vector (value of 1 at stimtime). For surprise reward, set this value to zero.
reward(rewardtime)=1; %defines reward vector (value of 1 at rewardtime). For extinction, set this value to zero.
if strcmp(reset_learning,'y')
    W=zeros(numberofbins,1); %predictive synaptic weight of each time bin. Initially, all weights are zero but they get updated every time bin.
    delta=zeros(numberoftrials,numberofbins);
    V=zeros(numberoftrials,numberofbins);
else %use final {W, delta, and V} values from the last run.
    delta=delta(numberoftrials,:);
    V=V(numberoftrials,:);
end
Vij=V(1,1); %prediction of cue x at time t. initial value=0.
for i=1:numberoftrials
    x=zeros(numberofbins,1); %Initialize time-shifted stimulus vector.
    for j=stimtime:numberofbins %start at j=stimtime but could also set initial j=1; result is the same.
        x_prev=x; %note that the vector x is zero for j = 1 through stimtime-1, thus as expected there is no predictive value for times before stimtime.
        Vij_prev=Vij;
        %Generate new time-shifted stimulus vector:
        x(2:end)=x(1:(end-1)); %vector shifts forward by one time bin (the value 1 moves forward in time).
        x(1)=stim(j); %assigns first element of x to correspond to the stimulus value at time bin j. x has the value 1 in one time bin.
        R=reward(j); %R will be 1 at j=rewardtime, 0 at all other times. note that when j=rewardtime, x will have the value 1 at element rewardtime-stimtime+1
        %TD learning rules:
        Vij=sum(W.*x); %updated prediction of current time bin.
        deltaij=R+gamma*Vij-Vij_prev; %prediction error at current time bin. On the first trial that R=1, deltaij=1 at j=rewardtime (strong positive error).
        V(i,j)=Vij;
        delta(i,j)=deltaij;
        %Update the synaptic weight vector for the next time bin:
        W=W+alpha*deltaij*x_prev; %With successive iterations W increases with time from stimtime and reaches a peak value at t=rewardtime-stimtime.
        %A key property is that W becomes nonzero for times earlier than reward (because of term x_prev).
    end
end
figure(1); clf;
subplot(2,1,1)
plot(1:numberofbins,V');
xlabel('Time bin #', 'FontSize', 12);
ylabel('Prediction', 'FontSize', 12);
set(gca,'FontSize',12,'TickDir','out')
subplot(2,1,2)
plot(1:numberofbins,delta');
xlabel('Time bin #', 'FontSize', 12);
ylabel('Prediction Error', 'FontSize', 12);
set(gca,'FontSize',12,'TickDir','out')

36 Matlab: TD model of positive RPE
Initially: CS has no predictive value.
After learning: Reward prediction increases with time and reaches its maximum just before the reward.

37 Matlab: TD model of negative RPE
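A negative RPE can be reproduced by training the TD model with reward and then omitting the reward on a single probe trial. The script below is a self-contained sketch that reuses the learning rules of the TD tutorial script; the number of training trials, the probe-trial design, and the reset of the previous prediction at the start of each trial are assumptions made here for illustration.

```matlab
% Sketch: negative RPE from reward omission after TD learning.
% Assumptions: 200 rewarded training trials followed by one omission trial;
% the previous prediction (Vprev) is reset to zero at the start of each trial.
stimtime = 5; rewardtime = 25; numberofbins = 30;
numberoftrials = 200;   % hypothetical number of rewarded training trials
alpha = 0.9;            % learning rate
gamma = 0.99;           % temporal discount factor

stim = zeros(numberofbins,1); stim(stimtime) = 1;
W = zeros(numberofbins,1);                        % predictive weights
delta = zeros(numberoftrials+1, numberofbins);    % last row is the omission trial

for i = 1:(numberoftrials+1)
    reward = zeros(numberofbins,1);
    if i <= numberoftrials
        reward(rewardtime) = 1;   % rewarded training trial
    end                           % otherwise the reward is omitted
    x = zeros(numberofbins,1);    % time-shifted stimulus vector
    Vprev = 0;
    for j = 1:numberofbins
        x_prev = x;
        x(2:end) = x(1:end-1);    % shift the stimulus trace forward one bin
        x(1) = stim(j);
        Vj = sum(W.*x);                               % current prediction
        delta(i,j) = reward(j) + gamma*Vj - Vprev;    % TD prediction error
        W = W + alpha*delta(i,j)*x_prev;              % weight update
        Vprev = Vj;
    end
end

figure; plot(1:numberofbins, delta(end,:), '.-')
xlabel('Time bin #'); ylabel('Prediction error (omission trial)')
```

On the omission trial the error should remain positive at the CS and dip to roughly -1 at the expected reward time, while the error at reward time on the last rewarded trial should be close to zero.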

38 How do dopamine neurons compute RPE?
Hypothesis: The combined effect of inputs allows dopaminergic neurons to compute RPE signals.
Unknown: What kind of information is provided by each input.
Figure labels: Dopamine, Glutamate, GABA.
Morales & Margolis, 2017

39 Theoretical model of how dopamine neurons compute RPE
Keiflin & Janak, 2015

40 VTA GABAergic neurons inhibit dopamine neurons
Because of their diverse receptor expression and strong inhibitory influence over DA neurons, much attention has focused on local GABAergic neurons.
Relevance to addiction (see paper below).
These cells are projection neurons, so they don’t just couple to DA neurons.
Figure labels: GABA, Dopamine.
Johnson & North

41 VTA GABAergic neurons encode reward prediction
In order to compute RPE, the brain also needs to compute reward prediction (RP).
Figure panels: GABAergic firing rate; Dopaminergic firing rate.
Cohen & Uchida, 2012

42 From correlative to causal analysis approaches
Neural recordings provide a correlative link between brain activity and computation, but ultimately, we want to establish a causal relationship.
Correlative: GABAergic neurons encode reward prediction signals, which may be necessary for DA neurons to encode RPE.
Causal: Inhibiting GABAergic neurons alters RPE encoding in DA neurons.
Establishing causality requires experiments involving loss-of-function or gain-of-function of specific brain circuits.
Approaches: pharmacology, gene knockout/rescue, lesions, optogenetics, chemogenetics.

43 Targeting genetically defined cell types with Cre-Lox recombination
Cardin & Moore, 2010

44 Viral vectors for spatially confined cell targeting
1. Choose the region and cell type of interest.
2. Identify genes that are selectively expressed in that cell type, but not in other neighboring cells.
3. Obtain animal strains that selectively express Cre recombinase under control of the gene of interest (e.g., VGAT-Cre or TH-Cre).
4. Inject a Cre-dependent virus in the region of interest. Example: AAV-Flex-ChR2-YFP. There are several viral types with different expression properties.
5. Histologically confirm that the YFP reporter is selectively expressed in the cells of interest.
Figure: VGAT and TH expression (Allen Brain Atlas).

45 Be aware of potential pitfalls
Cre expression is not perfectly confined to the cell type of interest; if the selectivity is poor, this can limit how the results are interpreted.
Lammel & Malenka, PMC

46 Optogenetic manipulation of GABAergic neurons alters dopamine RPE signals
Activating GABAergic neurons (with ChR2) reduces dopamine RPE signal.
Inhibiting GABAergic neurons (with Arch) increases dopamine RPE signal.
Eshel & Uchida, 2015

47 External sources of dopaminergic input – Frontal cortex
Lesioning orbitofrontal cortex reduces the magnitude of positive and negative reward prediction error signals encoded by dopamine neurons.
Takahashi & Schoenbaum, 2011

48 External sources of input - Habenula
Lesioning habenula selectively attenuates the negative RPE signal.
Figure: Reward omission trials.
Tian & Uchida, 2015

49 Summary of dopamine computational properties
Dopamine neurons encode reward prediction error (RPE).
Rescorla-Wagner and TD models can be used to simulate RPE signals.
Higher than expected reward (positive RPE) promotes stronger cue-reward associations.
Lower than expected reward (negative RPE) promotes extinction.
The computational properties of dopamine neurons are thought to be generated by signals from functionally diverse inputs.

50 Latest research trends – embracing diversity
Increasingly, studies show that dopamine signals in the brain are heterogeneous, suggesting a diverse set of functions beyond just reward learning. The diversity appears to be largely related to anatomy, and thus, new brain maps greatly help us understand the organizational principles of dopamine functional diversity.
Howe & Dombeck, PMC
Parker & Witten, PMC

51 Behavioral role of dopamine in learning
We will briefly sample the dopamine literature relying on the following approaches:
Pharmacology
Genetic models (dopamine KO mice)
Optogenetics

52 DA receptor blockade, or DA pathway lesions, impair Pavlovian learning
There is extensive literature on these effects, dating back to the 1980s. Rather than covering a specific paper, I refer anyone interested in reading further to the following review papers:
Wise, DOI: /nrn1406
Belin & Everitt, DOI: /j.bbr
Caveat of pharmacology: lacks time specificity.

53 Pavlovian learning deficits in TH knockout mice lacking dopamine
Darvas & Palmiter, 2014

54 Selectively restoring striatal dopamine rescues Pavlovian learning
Ventral striatal dopamine is necessary for Pavlovian reward learning, and restoring dopamine in that region is sufficient to rescue learning.
Caveat: lacks time specificity.
Darvas & Palmiter, 2014

55 Optogenetically mimicking a positive RPE signal drives associative learning
Used a blocking task.
Provided a well-timed laser stimulus paired with reward.
Unpaired laser had no effect on behavior.
Steinberg & Janak, 2013
Keiflin & Janak, 2015

56 Optogenetically mimicking a negative RPE signal drives extinction
Used an over-expectation task.
Provided a well-timed laser stimulus coinciding with expected reward.
Together with the Steinberg & Janak paper, these results show that dopamine RPE signals bidirectionally control learning.
Chang & Schoenbaum, PMC

57 Summary – Some general principles emerging from optogenetic manipulations
Activation of VTA dopamine neurons drives stronger learning (or is appetitive).
Inhibition of VTA dopamine neurons drives weaker learning (or is aversive).
Inputs that increase DA activity thus tend to drive stronger learning (or are appetitive).
Inputs that decrease DA activity thus tend to drive weaker learning (or are aversive).
VTA preferentially controls reward learning, while the SNc preferentially controls movement.
See “Other Recent Literature” slide at the end of this presentation.

58 Sufficiency versus Necessity in Optogenetics
When reading the literature (or designing your own study), it is important to distinguish between two fundamentally different types of experiments:
Test for sufficiency of a brain circuit in behavior.
Test for necessity of a brain circuit in behavior.
One does not imply the other.
Sufficiency is often tested via optogenetic activation. Necessity is often tested via optogenetic inhibition.
Be aware of potential caveats. For example, activation may not be a pure gain-of-function experiment, and inhibition may not be a pure loss-of-function experiment. For a thoughtful review of caveats, see Allen & Boyden, 2015.
Bottom line: there are pros and cons of using optogenetics, and if there is uncertainty in the interpretation of the results, other methods (e.g., knockout/rescue, chemogenetics) could be used to confirm the results.

59 Another thing about sufficiency
If a study concludes that a particular circuit is sufficient to drive a certain behavior, that does not mean that other circuits are not involved in the behavior.

60 Latest research trends – Multiple circuits and functions
There is major interest in understanding how dopamine signaling is orchestrated with other neuromodulatory & neurotransmitter signaling mechanisms in different brain areas to mediate associative learning, as well as other behavioral functions. More precise cell mapping and targeting approaches allow us to dissect these functions in genetically and anatomically defined circuits.
Russo & Nestler, 2013

61 Other Recent Literature (this is not meant to be a complete list)
Cholinergic inputs to VTA and SNc: Xiao & Gradinaru, 2016. Cholinergic Mesopontine Signals Govern Locomotion and Reward through Dissociable Midbrain Pathways.
Hypothalamic inputs to VTA: Nieh & Tye, 2015. Decoding neural circuits that control compulsive sucrose seeking.
Excitatory and inhibitory inputs to VTA: Lammel & Malenka, 2012. Input-specific control of reward and aversion in the ventral tegmental area.
VTA dopamine neuron inhibition by local GABAergic neurons: Tan & Lüscher, 2012. GABA neurons of the VTA drive conditioned place aversion.