08-08-2012, 09:43 AM
Markov Systems with Rewards, Markov Decision Processes
MarkovModels10.pdf
Where We Are and Outline
• Planning
– Deterministic state, preconditions, effects
– Uncertainty
• Conditional planning, conformant planning, nondeterministic
• Probabilistic modeling of systems with uncertainty and rewards
• Modeling probabilistic systems with control, i.e., action selection
• Reinforcement learning
Markov Systems with Rewards
• Finite set of n states, s_i
• State transition probability matrix, P, with entries p_ij = Prob(next state is s_j | current state is s_i)
• “Goal achievement” - Reward for each state, r_i
• Discount factor - γ
• Process/observation:
– Assume start state s_i
– Receive immediate reward r_i
– Move, or observe a move, randomly to a new state according to the probability transition matrix
– Future rewards (of next state) are discounted by γ
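The process above implies that the expected discounted sum of rewards starting from state s_i, written here as V(s_i), satisfies V(s_i) = r_i + γ Σ_j p_ij V(s_j). The following is a minimal Python sketch of that idea, not the slides' own code. It assumes a made-up 3-state system: the matrix P_EXAMPLE, the rewards R_EXAMPLE, and γ = 0.9 are illustrative values only, and the function names are hypothetical. One function repeatedly applies the equation until the values stop changing (value iteration for a Markov system with rewards), the other samples a single run of the process.

```python
import random


def value_iteration(P, r, gamma, tol=1e-10, max_iters=10_000):
    """Repeat V <- r + gamma * P V until the largest change is below tol."""
    n = len(r)
    V = [0.0] * n
    for _ in range(max_iters):
        V_new = [
            r[i] + gamma * sum(P[i][j] * V[j] for j in range(n))
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(V_new, V)) < tol:
            return V_new
        V = V_new
    return V


def simulate_discounted_return(P, r, gamma, start, horizon=200, rng=None):
    """Sample one run: collect r_i, discount, move according to row i of P."""
    rng = rng or random.Random()
    state, total, discount = start, 0.0, 1.0
    for _ in range(horizon):
        total += discount * r[state]      # immediate reward of current state
        state = rng.choices(range(len(r)), weights=P[state])[0]  # random move
        discount *= gamma                 # later rewards count for less
    return total


# Illustrative system (assumed values): each row of P sums to 1.
P_EXAMPLE = [
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
]
R_EXAMPLE = [0.0, 1.0, 10.0]
GAMMA = 0.9

V_EXAMPLE = value_iteration(P_EXAMPLE, R_EXAMPLE, GAMMA)
```

Averaging simulate_discounted_return over many runs from the same start state should approach the corresponding entry of V_EXAMPLE. The horizon cut-off is a simplification that works because γ < 1 makes late rewards negligible.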
Summary
• Markov Models with Reward
– Value Iteration
• Markov Decision Process
– Value Iteration
– Policy Iteration
• Reinforcement Learning