Mathematical Foundations of Reinforcement Learning (English Edition)

Author: Shiyu Zhao

Publisher: Tsinghua University Press  Publication date: 2024-07-01

Format: 16mo  Pages: 312


Price: ¥94.4 (20% off)  List price: ~~¥118.0~~



Mathematical Foundations of Reinforcement Learning (English Edition) Publication Details

ISBN: 9787302658528
Barcode: 9787302658528; 978-7-302-65852-8
Binding: standard offset paper
Volumes: not available
Weight: not available
Category: Computer/Network > Computer Theory

Mathematical Foundations of Reinforcement Learning (English Edition) Features

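Starts from scratch and explains not only what works but why. More than 2000 stars on GitHub. Course videos viewed more than 800,000 times online. Well received by readers in China and abroad. The textbook, videos, and slides form one integrated package.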

Mathematical Foundations of Reinforcement Learning (English Edition) Summary

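This book starts from the most basic concepts of reinforcement learning and introduces the fundamental analysis tools, including the Bellman equation and the Bellman optimality equation. It then extends to model-based and model-free reinforcement learning algorithms, and finally to reinforcement learning methods based on function approximation. The book emphasizes introducing concepts, analyzing problems, and analyzing algorithms from a mathematical perspective; it does not emphasize the programming implementation of the algorithms.

No prior knowledge of reinforcement learning is required; readers only need some background in probability theory and linear algebra. For readers who already have a foundation in reinforcement learning, the book can deepen their understanding of certain problems and offer new perspectives.

The book is intended for undergraduate students, graduate students, researchers, and practitioners in companies or research institutes who are interested in reinforcement learning.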

Mathematical Foundations of Reinforcement Learning (English Edition) Table of Contents

Overview of this Book

Chapter 1 Basic Concepts
1.1 A grid world example
1.2 State and action
1.3 State transition
1.4 Policy
1.5 Reward
1.6 Trajectories, returns, and episodes
1.7 Markov decision processes
1.8 Summary
1.9 Q&A

Chapter 2 State Values and the Bellman Equation
2.1 Motivating example 1: Why are returns important?
2.2 Motivating example 2: How to calculate returns?
2.3 State values
2.4 The Bellman equation
2.5 Examples for illustrating the Bellman equation
2.6 Matrix-vector form of the Bellman equation
2.7 Solving state values from the Bellman equation
2.7.1 Closed-form solution
2.7.2 Iterative solution
2.7.3 Illustrative examples
2.8 From state value to action value
2.8.1 Illustrative examples
2.8.2 The Bellman equation in terms of action values
2.9 Summary
2.10 Q&A

Chapter 3 Optimal State Values and the Bellman Optimality Equation
3.1 Motivating example: How to improve policies?
3.2 Optimal state values and optimal policies
3.3 The Bellman optimality equation
3.3.1 Maximization of the right-hand side of the BOE
3.3.2 Matrix-vector form of the BOE
3.3.3 Contraction mapping theorem
3.3.4 Contraction property of the right-hand side of the BOE
3.4 Solving an optimal policy from the BOE
3.5 Factors that influence optimal policies
3.6 Summary
3.7 Q&A

Chapter 4 Value Iteration and Policy Iteration
4.1 Value iteration
4.1.1 Elementwise form and implementation
4.1.2 Illustrative examples
4.2 Policy iteration
4.2.1 Algorithm analysis
4.2.2 Elementwise form and implementation
4.2.3 Illustrative examples
4.3 Truncated policy iteration
4.3.1 Comparing value iteration and policy iteration
4.3.2 Truncated policy iteration algorithm
4.4 Summary
4.5 Q&A

Chapter 5 Monte Carlo Methods
5.1 Motivating example: Mean estimation
5.2 MC Basic: The simplest MC-based algorithm
5.2.1 Converting policy iteration to be model-free
5.2.2 The MC Basic algorithm
5.2.3 Illustrative examples
5.3 MC Exploring Starts
5.3.1 Utilizing samples more efficiently
5.3.2 Updating policies more efficiently
5.3.3 Algorithm description
5.4 MC ε-Greedy: Learning without exploring starts
5.4.1 ε-greedy policies
5.4.2 Algorithm description
5.4.3 Illustrative examples
5.5 Exploration and exploitation of ε-greedy policies
5.6 Summary
5.7 Q&A

Chapter 6 Stochastic Approximation
6.1 Motivating example: Mean estimation
6.2 Robbins-Monro algorithm
6.2.1 Convergence properties
6.2.2 Application to mean estimation
6.3 Dvoretzky's convergence theorem
6.3.1 Proof of Dvoretzky's theorem
6.3.2 Application to mean estimation
6.3.3 Application to the Robbins-Monro theorem
6.3.4 An extension of Dvoretzky's theorem
6.4 Stochastic gradient descent
6.4.1 Application to mean estimation
6.4.2 Convergence pattern of SGD
6.4.3 A deterministic formulation of SGD
6.4.4 BGD, SGD, and mini-batch GD
6.4.5 Convergence of SGD
6.5 Summary
6.6 Q&A

Chapter 7 Temporal-Difference Methods
7.1 TD learning of state values
7.1.1 Algorithm description
7.1.2 Property analysis
7.1.3 Convergence analysis
7.2 TD learning of action values: Sarsa
7.2.1 Algorithm description
7.2.2 Optimal policy learning via Sarsa
7.3 TD learning of action values: n-step Sarsa
7.4 TD learning of optimal action values: Q-learning
7.4.1 Algorithm description
7.4.2 Off-policy vs. on-policy
7.4.3 Implementation
7.4.4 Illustrative examples
7.5 A unified viewpoint
7.6 Summary
7.7 Q&A

Chapter 8 Value Function Approximation
8.1 Value representation: From table to function
8.2 TD learning of state values with function approximation
8.2.1 O


Mathematical Foundations of Reinforcement Learning (English Edition) About the Author

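Shiyu Zhao is a specially appointed researcher in the AI division of the School of Engineering at Westlake University and head of the Intelligent Unmanned Systems Laboratory. He is a recipient of the youth program of China's national plan for recruiting high-level overseas talent. He received his bachelor's and master's degrees from Beihang University and his PhD from the National University of Singapore, and previously served as a Lecturer in the Department of Automatic Control and Systems Engineering at the University of Sheffield, UK. His work aims to develop interesting, useful, and challenging next-generation robotic systems, with a focus on control, decision-making, and perception in multi-robot systems.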
