Differences
This shows you the differences between two versions of the page.
ccnbook-motor_summary [2018/06/02 21:52] n.arakawa created |
ccnbook-motor_summary [2018/06/02 22:14] n.arakawa |
||
---|---|---|---|
Line 1: | Line 1: | ||
The aim of this article is to present an actor-critic model based on [[https:// | The aim of this article is to present an actor-critic model based on [[https:// | ||
---- | ---- | ||
- | The chapter is based on the hypothesis that **the basal ganglia (BG) uses the actor-critic type** of reinforcement learning, which in turn is based on the finding that **the dopamine output from SNc encodes TD (temporal difference) δ** used in AC learning. | + | The chapter is based on the hypothesis that **the basal ganglia (BG) uses the actor-critic type** of reinforcement learning, which in turn is based on the finding that **the dopamine output from [[https:// |
- | {{https:// | + | | |
- | + | | | |
- | **Figure 7.6**: Basic structure of the actor critic architecture (from [[https:// | + | |
Since SNc provides δ, it is presumably part of the Critic; following Figures 7.2 & 7.4 of the chapter, the Actor comprises the loop of Frontal Cortex, Striatum, and Thalamus, with the dopamine (δ) input to the Striatum (Figure 1). | Since SNc provides δ, it is presumably part of the Critic; following Figures 7.2 & 7.4 of the chapter, the Actor comprises the loop of Frontal Cortex, Striatum, and Thalamus, with the dopamine (δ) input to the Striatum (Figure 1). | ||
- | Figure 1 | + | | {{ccnmotornutshell1.png}} |
+ | | **Figure 1** | | ||
Here, the Actor determines its action based on the state representation (of the environment) in the Frontal Cortex, which is in turn formed with its input (not shown) from other cortical areas, the amygdala, the hippocampus, | Here, the Actor determines its action based on the state representation (of the environment) in the Frontal Cortex, which is in turn formed with its input (not shown) from other cortical areas, the amygdala, the hippocampus, | ||
The reward r to the Critic is explained in the PVLV (Primary Value, Learned Value) model section of the chapter (Figure 7.8). | The reward r to the Critic is explained in the PVLV (Primary Value, Learned Value) model section of the chapter (Figure 7.8). | ||
- | Figure 7.8: Biological mapping of the PVLV algorithm | + | | {{https:// |
- | VS: Ventral Striatum | + | | **Figure 7.8**: Biological mapping of the PVLV algorithm |
- | VTA: Ventral Tegmental Area | + | | |
- | PPT: Pedunculopontine Tegmental Nucleus | + | |
- | LHA: Lateral Hypothalamic Area | + | An apparent problem of Figure 7.8 is that SNc does not receive a reward signal (US). The problem is solved in Figure PV.1 in [[https:// |
- | CNA: Central Nucleus of the Amygdala | + | |
- | CS: Conditioned Stimuli | + | |
- | US: Unconditioned Stimuli (〜Reward) | + | |
- | An apparent problem of Figure 7.8 is that SNc does not receive a reward signal (US). The problem is solved in Figure PV.1 in the PVLV page, where SNc is substituted with VTA. | + | |
Figure PV.1 | Figure PV.1 |
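
As a rough illustration of the actor-critic scheme summarized above, the following minimal sketch shows how a single TD error δ, computed by the Critic, can drive both the value update and the action-preference update of the Actor. It is not taken from the chapter: the chain environment, the tabular representation, the function names (`td_error`, `run_actor_critic`), and the learning rates are all assumptions made for illustration, and the mapping to SNc or the Striatum is only by analogy.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into selection probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def td_error(r, v_s, v_next, gamma=0.9):
    """Temporal-difference error: delta = r + gamma * V(s') - V(s)."""
    return r + gamma * v_next - v_s


def run_actor_critic(n_states=4, episodes=500, alpha_v=0.1, alpha_p=0.1,
                     gamma=0.9, seed=0):
    """Tabular actor-critic on a hypothetical chain 0..n_states-1.

    Action 0 moves left (bounded at state 0), action 1 moves right.
    Stepping right from the last state ends the episode with reward 1.
    """
    rng = random.Random(seed)
    # Critic: state values; the extra last entry is the terminal state (stays 0).
    V = [0.0] * (n_states + 1)
    # Actor: preferences for [left, right] in each non-terminal state.
    prefs = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        s = 0
        for _ in range(100):  # step cap so that every episode terminates
            probs = softmax(prefs[s])
            a = 0 if rng.random() < probs[0] else 1
            s_next = max(s - 1, 0) if a == 0 else s + 1
            r = 1.0 if s_next == n_states else 0.0

            # The Critic computes delta (the role the text assigns to dopamine).
            delta = td_error(r, V[s], V[s_next], gamma)
            V[s] += alpha_v * delta         # Critic: value update
            prefs[s][a] += alpha_p * delta  # Actor: simple preference update

            s = s_next
            if s == n_states:
                break
    return V, prefs


if __name__ == "__main__":
    V, prefs = run_actor_critic()
    print("state values:", [round(v, 3) for v in V])
    print("preferences [left, right]:", [[round(p, 3) for p in row] for row in prefs])
```

The same δ is used in both updates, which is the point of the architecture: the Critic needs only the reward r and its own value estimates to produce a teaching signal, and the Actor needs only that signal to adjust its action selection.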