ccnbook-motor_summary

Differences

This shows you the differences between two versions of the page.

ccnbook-motor_summary [2018/06/02 21:52]
n.arakawa created
ccnbook-motor_summary [2018/06/02 22:21]
n.arakawa
Line 1: Line 1:
 The aim of this article is to present an actor-critic model based on [[https://grey.colorado.edu/CompCogNeuro/index.php/CCNBook/Main|the CCNBook]].  As the description of reinforcement learning in [[https://grey.colorado.edu/CompCogNeuro/index.php/CCNBook/Motor|the Motor chapter]] seems a bit ‘roundabout’, this memo tries to simplify it.
 ----
-The chapter is based on the hypothesis that **the basal ganglia (BG) uses the actor-critic type** of reinforcement learning, which in turn is based on the finding that **the dopamine output from SNc encodes the TD (temporal difference) error δ** used in AC learning.
+The chapter is based on the hypothesis that **the basal ganglia (BG) uses the actor-critic type** of reinforcement learning, which in turn is based on the finding that **the dopamine output from [[https://en.wikipedia.org/wiki/Pars_compacta|SNc]] encodes the TD (temporal difference) error δ** used in AC learning.
  
-{{https://grey.colorado.edu/mediawiki/sites/CompCogNeuro/images/8/85/fig_actor_critic_basic.png}}
-
-**Figure 7.6**: Basic structure of the actor critic architecture (from [[https://grey.colorado.edu/CompCogNeuro/index.php/File:fig_actor_critic_basic.png|CCNBook]])
+|  {{https://grey.colorado.edu/mediawiki/sites/CompCogNeuro/images/8/85/fig_actor_critic_basic.png}}  |
+|  **Figure 7.6**: Basic structure of the actor critic architecture (from [[https://grey.colorado.edu/CompCogNeuro/index.php/File:fig_actor_critic_basic.png|CCNBook]])  |
+
  
  
 Since SNc provides δ, it is presumably part of the Critic. Following Figures 7.2 and 7.4 of the chapter, the Actor consists of the loop through the Frontal Cortex, Striatum, and Thalamus, with the dopamine (δ) input to the Striatum (Figure 1).
  
-Figure 1
+|  {{ccnmotornutshell1.png}}
+|  **Figure 1**  |
  
 Here, the Actor determines its action based on the state representation (of the environment) in the Frontal Cortex, which is in turn formed from its input (not shown) from other cortical areas, the amygdala, the hippocampus, and other subcortical nuclei (the input sources vary with the area of the Frontal Cortex).  While the Frontal Cortex provides the output options, the Striatum selects the option to be output.
 The reward r to the Critic is explained in the PVLV (Primary Value, Learned Value) model section of the chapter (Figure 7.8).
  
-Figure 7.8: Biological mapping of the PVLV algorithm
-VS: Ventral Striatum
-VTA: Ventral Tegmental Area
-PPT: Pedunculopontine Tegmental Nucleus
-LHA: Lateral Hypothalamic Area
-CNA: Central Nucleus of the Amygdala
-CS: Conditioned Stimuli
-US: Unconditioned Stimuli (〜Reward)
-An apparent problem of Figure 7.8 is that SNc does not receive a reward signal (US).  The problem is solved in Figure PV.1 in the PVLV page, where SNc is substituted with VTA.
-
-Figure PV.1
-LHB: Lateral Habenula
-RMTg: Rostromedial Tegmental Nucleus
+|  {{https://grey.colorado.edu/mediawiki/sites/CompCogNeuro/images/thumb/e/e9/fig_pvlv_bio_no_cereb.png/400px-fig_pvlv_bio_no_cereb.png}}
+|  **Figure 7.8**: Biological mapping of the PVLV algorithm  (from [[https://grey.colorado.edu/CompCogNeuro/index.php/File:fig_pvlv_bio_no_cereb.png|CCNBook]])  |
+|  VS: Ventral Striatum, VTA: [[https://en.wikipedia.org/wiki/Ventral_tegmental_area|Ventral Tegmental Area]], PPT: [[https://en.wikipedia.org/wiki/Pedunculopontine_nucleus|Pedunculopontine Tegmental Nucleus]], LHA: Lateral Hypothalamic Area, CNA: Central Nucleus of the Amygdala, CS: Conditioned Stimuli, US: Unconditioned Stimuli (〜Reward)  |
+
+An apparent problem of Figure 7.8 is that SNc does not receive a reward signal (US).  The problem is solved in Figure PV.1 in [[https://grey.colorado.edu/CompCogNeuro/index.php/CCNBook/Sims/Motor/PVLV|the PVLV page]], where SNc is substituted with VTA.
+
+|  {{https://grey.colorado.edu/mediawiki/sites/CompCogNeuro/images/thumb/0/02/fig_bvpvlv_pv_lv_only.png/800px-fig_bvpvlv_pv_lv_only.png?500}}
+|  **Figure PV.1**  |
+|  LHB: [[https://en.wikipedia.org/wiki/Habenula#Lateral_habenula|Lateral Habenula]], RMTg: Rostromedial Tegmental Nucleus  |
 If you want to distinguish all the parts shown in Figure PV.1, you should keep them in your model.  However, if the function of the circuit is that of the Critic in AC learning, this complexity is not necessary in engineering terms.  Figure 2 shows a model in which the complexity is encapsulated (parts such as the amygdala, VTA/SNc, and part of the striatum are hidden).  Note that the TD error δ encodes $r_{t+1} + \gamma V(s_{t+1}) - V(s_t)$, where r stands for the reward, V(s) for the evaluation of the state s, and γ for the discount coefficient.
  
-Figure 2
+|  **Figure 2**  |
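
The TD error described above can be written out as a small function. This is only a minimal sketch: the function name `td_error`, the default value of γ, and the numbers in the three cases are assumptions chosen for illustration and do not come from the CCNBook.

```python
def td_error(reward, value_next, value_current, gamma=0.9):
    """TD error: delta = r_{t+1} + gamma * V(s_{t+1}) - V(s_t)."""
    return reward + gamma * value_next - value_current


# Hypothetical numbers, for illustration only.
# A reward arrives that the Critic did not predict (V(s_t) = 0).
delta_unexpected = td_error(reward=1.0, value_next=0.0, value_current=0.0)
# The same reward arrives but was fully predicted (V(s_t) = 1).
delta_predicted = td_error(reward=1.0, value_next=0.0, value_current=1.0)
# A predicted reward fails to arrive.
delta_omitted = td_error(reward=0.0, value_next=0.0, value_current=1.0)
```

In this reading, a positive δ means the outcome was better than the Critic expected, zero means it was as expected, and a negative δ means it was worse.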
 You might want to distinguish the reward system from the punishment system, but its physiology may still be unclear.
 A simple overall AC scheme can be modeled as below (Figure 3).  Note that the Frontal Cortex is also included in the State box.
  
-Figure 3
+|  **Figure 3**  |
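
As a rough engineering illustration of the overall scheme, the following is a minimal tabular actor-critic sketch. Everything specific in it is an assumption made for the example and is not taken from the CCNBook: the corridor environment, the learning rates, the softmax policy, and all names (`step`, `policy`, `train`). The Critic holds V(s) and computes δ; the same δ updates both the Critic's value and the Actor's preference for the action just taken.

```python
import math
import random

N_STATES = 3          # non-terminal states 0, 1, 2; reaching state 3 ends an episode
LEFT, RIGHT = 0, 1
GAMMA = 0.9           # discount coefficient (assumed value)
CRITIC_LR = 0.1       # learning rate for V(s) (assumed value)
ACTOR_LR = 0.2        # learning rate for action preferences (assumed value)


def step(state, action):
    """Hypothetical toy environment: a short corridor with reward 1 at the right end."""
    next_state = state + 1 if action == RIGHT else max(state - 1, 0)
    reward = 1.0 if next_state == N_STATES else 0.0
    return next_state, reward


def policy(prefs, state):
    """Actor: softmax over the action preferences of one state."""
    exps = [math.exp(p) for p in prefs[state]]
    total = sum(exps)
    return [e / total for e in exps]


def train(episodes=500, seed=0, max_steps=1000):
    rng = random.Random(seed)
    values = [0.0] * (N_STATES + 1)                # Critic: V(s); terminal value stays 0
    prefs = [[0.0, 0.0] for _ in range(N_STATES)]  # Actor: one preference per (state, action)
    for _episode in range(episodes):
        state = 0
        for _t in range(max_steps):
            probs = policy(prefs, state)
            action = RIGHT if rng.random() < probs[RIGHT] else LEFT
            next_state, reward = step(state, action)
            # Critic computes the TD error from reward and state values.
            delta = reward + GAMMA * values[next_state] - values[state]
            values[state] += CRITIC_LR * delta         # Critic update
            prefs[state][action] += ACTOR_LR * delta   # Actor update, driven by the same delta
            state = next_state
            if state == N_STATES:
                break
    return values, prefs


values, prefs = train()
p_right = [policy(prefs, s)[RIGHT] for s in range(N_STATES)]
```

The point of the sketch is the division of labor: the Actor never sees the reward directly, only the δ broadcast by the Critic, which loosely mirrors the dopamine input to the Striatum in the figures above.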
-Reference
-Daphna Joel, Yael Niv, Eytan Ruppin: Actor–critic models of the basal ganglia, Neural Networks 15 (2002).
+=== Reference ===
+Daphna Joel, Yael Niv, Eytan Ruppin: [[https://www.princeton.edu/~yael/Publications/NN2002.pdf|Actor–critic models of the basal ganglia]], Neural Networks 15 (2002).
  • ccnbook-motor_summary.txt
  • Last modified: 2018/06/02 22:32
  • by n.arakawa