(WIP) How is Mixtral trained efficiently?