Blue Yonder interview question

Implement post-training of an LLM using reinforcement learning (RL).
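
One possible shape of an answer, offered as a minimal sketch rather than a reference solution: a REINFORCE-style policy-gradient loop with a batch-mean baseline and a KL penalty toward a frozen reference policy. Everything concrete here is an assumption made for illustration. A tiny bigram softmax policy stands in for the LLM, `reward` is a hypothetical stand-in for a reward model or verifier, and the names `sample_sequence`, `train`, and `kl_coef` are invented. A fuller answer would likely replace the table with a transformer and the plain policy gradient with a method such as PPO.

```python
import math
import random

VOCAB = ["a", "b", "c", "<eos>"]
V = len(VOCAB)
MAX_LEN = 6  # cap on generated tokens per rollout


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def sample_sequence(policy, rng):
    """Roll out one sequence; returns a list of (previous_token, token) pairs."""
    prev, steps = V, []  # row index V acts as the start-of-sequence context
    for _ in range(MAX_LEN):
        probs = softmax(policy[prev])
        tok = rng.choices(range(V), weights=probs)[0]
        steps.append((prev, tok))
        if VOCAB[tok] == "<eos>":
            break
        prev = tok
    return steps


def reward(steps):
    """Hypothetical task reward: +1 for every 'a' generated."""
    return sum(1.0 for _, tok in steps if VOCAB[tok] == "a")


def train(steps=300, batch=16, lr=0.3, kl_coef=0.05, seed=0):
    rng = random.Random(seed)
    policy = [[0.0] * V for _ in range(V + 1)]  # trainable logits per context
    reference = [row[:] for row in policy]      # frozen copy of the initial policy
    history = []
    for _ in range(steps):
        rollouts = [sample_sequence(policy, rng) for _ in range(batch)]
        shaped = []
        for seq in rollouts:
            # Sampled estimate of KL(policy || reference) along the rollout,
            # folded into the reward as a penalty.
            kl = sum(
                math.log(softmax(policy[p])[t]) - math.log(softmax(reference[p])[t])
                for p, t in seq
            )
            shaped.append(reward(seq) - kl_coef * kl)
        baseline = sum(shaped) / batch  # batch mean as a variance-reducing baseline
        grads = [[0.0] * V for _ in range(V + 1)]
        for seq, r in zip(rollouts, shaped):
            adv = r - baseline
            for p, t in seq:
                probs = softmax(policy[p])
                for k in range(V):
                    # d log pi(t | p) / d logit_k = 1[k == t] - pi(k | p)
                    grads[p][k] += adv * ((1.0 if k == t else 0.0) - probs[k]) / batch
        for p in range(V + 1):
            for k in range(V):
                policy[p][k] += lr * grads[p][k]  # gradient ascent on shaped reward
        history.append(sum(reward(s) for s in rollouts) / batch)
    return policy, history


if __name__ == "__main__":
    _, history = train()
    print("mean task reward, first 20 steps:", sum(history[:20]) / 20)
    print("mean task reward, last 20 steps:", sum(history[-20:]) / 20)
```

The KL term is the part most specific to LLM post-training: it keeps the tuned policy close to the model it started from, so the reward cannot be maximized by drifting arbitrarily far from the original distribution. In this sketch it is treated as part of the reward, not differentiated directly.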