2 Formulate business problems as Markov decision processes

 

This chapter covers

  • How the Markov decision process offers a framework for tackling complex business problems
  • The key components of a Markov decision process
  • A hands-on case study of formulating a real-world business problem as a Markov decision process
  • Strategies for reward engineering and constraint handling

There is only one basic way of dealing with complexity: divide and conquer.

Bjarne Stroustrup, creator of C++

They say the best way to eat an elephant is one bite at a time. As odd as that sounds, it’s a popular metaphor for tackling big problems: break them down into manageable pieces. But here’s the catch: imagine you actually did cut an elephant into parts. Could you ever put it back together and call it alive again? Not likely.

That’s the problem with how we often approach complex business challenges. Yes, dividing a problem helps us understand it — but if we don’t have a way to reassemble the pieces meaningfully, we risk ending up with a puzzle we can’t solve, or worse, a lifeless mess.

This is where systems thinking comes in. It tells us that analysis — the art of taking things apart — must be paired with synthesis — the ability to see the whole forest, not just the individual trees. Solving real-world problems isn’t just about breaking them down. It’s also about knowing how to bring the parts back together in a smart, coherent way.
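
Before diving into the formal machinery, it helps to see the pieces we will assemble: a set of states, a set of actions, a transition rule, and a reward function. The minimal sketch below uses a hypothetical toy inventory example (the class names, state values, and costs are illustrative assumptions, not the production-planning model developed in section 2.5) simply to show how those four components fit together in code.

import random
from dataclasses import dataclass

@dataclass
class MDP:
    """The core pieces every MDP formulation needs."""
    states: list    # S: every situation the decision maker can face
    actions: list   # A: the choices available in each state

    def transition(self, state, action):
        """Sample the next state s' given the current state s and action a."""
        raise NotImplementedError

    def reward(self, state, action, next_state):
        """Immediate payoff r(s, a, s') for the transition just taken."""
        raise NotImplementedError

class ToyInventoryMDP(MDP):
    """Hypothetical toy example: keep a small warehouse stocked."""
    def __init__(self):
        super().__init__(states=[0, 1, 2, 3], actions=["produce", "hold"])

    def transition(self, state, action):
        demand = random.choice([0, 1])                    # random customer demand
        produced = 1 if action == "produce" else 0
        return max(0, min(3, state + produced - demand))  # inventory stays in [0, 3]

    def reward(self, state, action, next_state):
        holding_cost = 0.1 * next_state                       # cost of carrying stock
        stockout_penalty = 1.0 if next_state == 0 else 0.0    # cost of running empty
        production_cost = 0.2 if action == "produce" else 0.0
        return -(holding_cost + stockout_penalty + production_cost)

# One simulated decision step
mdp = ToyInventoryMDP()
state, action = 2, "produce"
next_state = mdp.transition(state, action)
print(next_state, mdp.reward(state, action, next_state))

Sections 2.1 through 2.4 define each of these components precisely, and section 2.5 replaces the toy example with a realistic production-planning model.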

2.1 State: Anatomy of sequential decision making

2.2 Markov chains and the Markov property

2.3 Markov decision process

2.4 Examples of Markov decision processes

2.5 Build a Markov decision process for production planning

2.6 Reward engineering and constraint handling strategies

2.6.1 Design rewards to be stepwise whenever possible

2.6.2 Inject constraint information into the state

2.6.3 Handle soft constraints with stepwise penalties

2.6.4 Use action masking with penalties to handle hard constraints

2.6.5 Avoid mismatched scales with reward normalization/balancing

2.6.6 Avoid deceptive shortcuts in the reward function

2.7 Summary