Differences

This shows you the differences between two versions of the page.

ccnbook-motor_summary [2018/06/02 22:14]
n.arakawa
ccnbook-motor_summary [2018/06/02 22:32] (current)
n.arakawa
Line 1: Line 1:
 +==== CCNBook Motor in a Nutshell ====
 The aim of this article is to present an actor-critic model based on [[https://grey.colorado.edu/CompCogNeuro/index.php/CCNBook/Main|the CCNBook]].  As the description of reinforcement learning in [[https://grey.colorado.edu/CompCogNeuro/index.php/CCNBook/Motor|the Motor chapter]] seems a bit 'roundabout', this memo tries to simplify it.
 ----
Line 16: Line 17:
  
 |  {{https://grey.colorado.edu/mediawiki/sites/CompCogNeuro/images/thumb/e/e9/fig_pvlv_bio_no_cereb.png/400px-fig_pvlv_bio_no_cereb.png}}  |
-|  **Figure 7.8**: Biological mapping of the PVLV algorithm  (from [[https://grey.colorado.edu/CompCogNeuro/index.php/File:fig_pvlv_bio_no_cereb.png|CCNBook]])  |
 +|  **Figure 7.8**: Biological mapping of the PVLV algorithm (from [[https://grey.colorado.edu/CompCogNeuro/index.php/File:fig_pvlv_bio_no_cereb.png|CCNBook]])  |
 |  VS: Ventral Striatum, VTA: [[https://en.wikipedia.org/wiki/Ventral_tegmental_area|Ventral Tegmental Area]], PPT: [[https://en.wikipedia.org/wiki/Pedunculopontine_nucleus|Pedunculopontine Tegmental Nucleus]], LHA: Lateral Hypothalamic Area, CNA: Central Nucleus of the Amygdala, CS: Conditioned Stimuli, US: Unconditioned Stimuli (≈ Reward)  |
  
 An apparent problem with Figure 7.8 is that SNc (substantia nigra pars compacta) does not receive a reward signal (US).  This is resolved in Figure PV.1 on [[https://grey.colorado.edu/CompCogNeuro/index.php/CCNBook/Sims/Motor/PVLV|the PVLV page]], where VTA takes the place of SNc.
-  
-Figure PV.1 
-LHB: Lateral Habenula 
-RMTg: Rostral Medial Tegmental gyrus 
-If you want distinguish all the parts shown in Figure PV.1, you should keep them in your model. ​ However, if the function of the circuit is the Critic in AC learning, the complication would not be necessary in engineering terms. ​ Figure 2 shows a model in which the complication is encapsulated (parts such as the amygdala, VTA/SNc, part of the striatum are hidden). ​ Note that the TD error δ encodes rt+1 +γV(st+1)−V(st),​ where r stands for reward, V(s) the evaluation of the state s, and γ the discount coefficient. 
  
-Figure 2
 +|  {{https://grey.colorado.edu/mediawiki/sites/CompCogNeuro/images/thumb/0/02/fig_bvpvlv_pv_lv_only.png/800px-fig_bvpvlv_pv_lv_only.png?500}}  |
 +|  **Figure PV.1** (from [[https://grey.colorado.edu/CompCogNeuro/index.php/File:fig_bvpvlv_pv_lv_only.png|CCNBook]])  |
 +|  LHB: [[https://en.wikipedia.org/wiki/Habenula#Lateral_habenula|Lateral Habenula]], RMTg: Rostromedial Tegmental Nucleus  |
 + 
 +If you want to distinguish all the parts shown in Figure PV.1, you should keep them in your model.  However, if the circuit's function is that of the Critic in actor-critic (AC) learning, this complexity is unnecessary in engineering terms.  Figure 2 shows a model in which it is encapsulated (parts such as the amygdala, VTA/SNc, and part of the striatum are hidden).  Note that the TD error is δ = r<sub>t+1</sub> + γV(s<sub>t+1</sub>) − V(s<sub>t</sub>), where r stands for reward, V(s) for the evaluation of the state s, and γ for the discount factor.
 +|  {{ccnmotornutshell2.png}} ​ | 
 +|  **Figure 2**  |
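
The TD error that the encapsulated Critic emits can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tabular value store, the learning rate `alpha`, the default `gamma`, and the state names `"cs"` and `"us"` are hypothetical and do not come from the CCNBook or from Figure 2.

```python
def td_error(reward_next, value_next, value_current, gamma=0.9):
    """delta = r_{t+1} + gamma * V(s_{t+1}) - V(s_t)."""
    return reward_next + gamma * value_next - value_current


def critic_update(values, state, next_state, reward_next, gamma=0.9, alpha=0.1):
    """Tabular TD(0) step (assumed update rule): nudge V(state) by alpha * delta."""
    delta = td_error(reward_next, values[next_state], values[state], gamma)
    values[state] += alpha * delta
    return delta


# Hypothetical two-state example: a cue state followed by a rewarded state.
values = {"cs": 0.0, "us": 0.0}
delta = critic_update(values, "cs", "us", reward_next=1.0)
```

With all values initialised to zero, the first rewarded transition yields a positive δ, and V for the cue state moves a fraction `alpha` of the way toward the reward.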
 You might want to distinguish the reward system from the punishment system, but the physiology underlying that distinction remains unclear.
 A simple overall actor-critic (AC) scheme can be modeled as in Figure 3.  Note that the Frontal Cortex is also included in the State box.
-
 +|  {{ccnmotornutshell3.png}}  |
-Figure 3
 +|  **Figure 3**  |
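
An overall AC loop of this kind can be sketched as follows. This is a minimal toy under loud assumptions, not the CCNBook model: the single-state, one-step task, the reward table, the softmax Actor, and both learning rates are hypothetical choices made for illustration. The Critic computes the TD error δ, and the same δ trains both the Critic's value and the Actor's action preferences.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into selection probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def run_actor_critic(episodes=2000, gamma=0.9, alpha_critic=0.1,
                     alpha_actor=0.1, seed=0):
    rng = random.Random(seed)
    value = 0.0           # Critic: V(s) for the single non-terminal state
    prefs = [0.0, 0.0]    # Actor: preferences for the two actions
    rewards = [0.0, 1.0]  # hypothetical task: only action 1 is rewarded
    for _ in range(episodes):
        probs = softmax(prefs)
        action = 0 if rng.random() < probs[0] else 1
        reward = rewards[action]
        # The episode ends after one step, so V(s_{t+1}) is taken as 0.
        delta = reward + gamma * 0.0 - value
        value += alpha_critic * delta
        # Actor update scaled by delta; (1[a == b] - pi(b)) is the gradient
        # of log pi(a) with respect to the preference of action b.
        for b in range(len(prefs)):
            grad = (1.0 if b == action else 0.0) - probs[b]
            prefs[b] += alpha_actor * delta * grad
    return value, softmax(prefs)


value, probs = run_actor_critic()
```

In this toy, a positive δ after the rewarded action and a non-positive δ after the unrewarded one both shift preference toward the rewarded action, so the Actor ends up favouring it.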
-Reference
 +=== Reference ===
-Daphna Joela, Yael Niva, Eytan Ruppin: Actor–critic models of the basal ganglia, Neural Networks 15 (2002).
 +Daphna Joel, Yael Niv, Eytan Ruppin: [[https://www.princeton.edu/~yael/Publications/NN2002.pdf|Actor–critic models of the basal ganglia]], Neural Networks 15 (2002).