The policy optimization method