ccnbook-motor_summary [2018/06/02 22:28] n.arakawa
==== CCNBook Motor in a Nutshell ====
The aim of this article is to present an actor-critic model based on [[https://grey.colorado.edu/CompCogNeuro/index.php/CCNBook/Main|the CCNBook]]. As the description of reinforcement learning in [[https://grey.colorado.edu/CompCogNeuro/index.php/CCNBook/Motor|the Motor chapter]] seems a bit roundabout, this memo tries to simplify it.
----
| LHB: [[https://en.wikipedia.org/wiki/Habenula#Lateral_habenula|Lateral Habenula]], RMTg: Rostromedial Tegmental Nucleus |
If you want to distinguish all the parts shown in Figure PV.1, you should keep them in your model. However, if the function of the circuit is the Critic in actor-critic (AC) learning, that complication is not necessary in engineering terms. Figure 2 shows a model in which the complication is encapsulated (parts such as the amygdala, VTA/SNc, and part of the striatum are hidden). Note that the TD error δ encodes r<sub>t+1</sub> + γV(s<sub>t+1</sub>) − V(s<sub>t</sub>), where r stands for the reward, V(s) for the evaluation of state s, and γ for the discount coefficient.
| {{ccnmotornutshell2.png}} |
| **Figure 2** |
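The TD error above can be sketched as a small tabular Critic update. The three-state table, the sample transition, and the learning rate are illustrative assumptions, not details from the chapter:

```python
import numpy as np

def td_error(r_next, v_next, v_now, gamma=0.9):
    # delta = r_{t+1} + gamma * V(s_{t+1}) - V(s_t)
    return r_next + gamma * v_next - v_now

# Hypothetical tabular Critic: a value per state, nudged toward the TD target.
V = np.zeros(3)                       # values for states 0..2
alpha = 0.1                           # Critic learning rate (assumed)
s, s_next, r = 0, 1, 1.0              # one observed transition with reward 1
delta = td_error(r, V[s_next], V[s])  # 1.0 + 0.9*0.0 - 0.0 = 1.0
V[s] += alpha * delta                 # V(s) moves toward r + gamma * V(s')
```

In the biological reading of Figure 2, δ is the dopamine signal broadcast by VTA/SNc, and the `V[s] += alpha * delta` step is the plasticity it drives in the Critic.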
You might want to distinguish the reward system from the punishment system, but the physiology of that distinction is still largely unclear.
A simple overall AC scheme can be modeled as in Figure 3. Note that the Frontal Cortex is also included in the State box.
| {{ccnmotornutshell3.png}} |
| **Figure 3** |
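A minimal runnable sketch of the overall scheme, assuming a toy two-state environment (not from the chapter) in which action 1 taken in state 0 yields a reward and ends the episode. The Actor holds action preferences turned into a softmax policy; the Critic's TD error trains both boxes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy environment (an assumption for illustration): two states, two actions.
# Action 1 in state 0 is rewarded and terminal; anything else switches state.
def step(s, a):
    if s == 0 and a == 1:
        return None, 1.0          # terminal state, reward 1
    return 1 - s, 0.0             # move to the other state, no reward

n_states, n_actions = 2, 2
V = np.zeros(n_states)                    # Critic: state values
prefs = np.zeros((n_states, n_actions))   # Actor: action preferences
alpha_v, alpha_p, gamma = 0.1, 0.1, 0.9   # assumed learning rates / discount

for _ in range(500):                      # episodes
    s = 0
    while s is not None:
        # Actor: softmax policy over action preferences
        p = np.exp(prefs[s] - prefs[s].max())
        p /= p.sum()
        a = rng.choice(n_actions, p=p)
        s_next, r = step(s, a)
        v_next = 0.0 if s_next is None else V[s_next]
        delta = r + gamma * v_next - V[s]  # TD error (dopamine signal)
        V[s] += alpha_v * delta            # Critic update
        prefs[s, a] += alpha_p * delta     # Actor update: reinforce the action
        s = s_next
```

After training, the preference for the rewarded action in state 0 dominates, which is the division of labor Figure 3 depicts: the Critic evaluates states, and its single scalar δ is enough to train the Actor.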
=== Reference ===
Daphna Joel, Yael Niv, Eytan Ruppin: [[https://www.princeton.edu/~yael/Publications/NN2002.pdf|Actor–critic models of the basal ganglia]], Neural Networks 15 (2002).