InstructGPT

step 1: supervised fine-tuning (SFT)

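  • collect prompts and have human labelers write a demonstration of the desired response for each one
  • fine-tune GPT-3 on these prompt/demonstration pairs with supervised learning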
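
The supervised objective in step 1 is ordinary next-token cross-entropy on the demonstrations. The sketch below assumes the model has already produced the probability of each actual token in the prompt+demonstration sequence (the input `token_probs` and the function name `sft_loss` are hypothetical). Ignoring the prompt tokens and scoring only the response is a common choice, but it is an assumption here, not something these notes state.

```python
import math
from typing import Sequence


def sft_loss(token_probs: Sequence[float], prompt_len: int) -> float:
    """Mean negative log-likelihood over the demonstration tokens.

    token_probs[i] is the probability the model assigned to the real i-th
    token of the prompt+demonstration sequence (hypothetical input).
    The first prompt_len tokens belong to the prompt and are masked out
    (an assumed, commonly used choice).
    """
    response = token_probs[prompt_len:]
    if not response:
        raise ValueError("demonstration must contain at least one token")
    return -sum(math.log(p) for p in response) / len(response)


# 3 prompt tokens followed by a 2-token demonstration.
probs = [0.1, 0.2, 0.3, 0.5, 0.25]
print(round(sft_loss(probs, prompt_len=3), 4))  # 1.0397
```

Minimizing this loss pushes up the probability of the labeler-written answers, which is all step 1 does; the fine-tuned model then serves as the starting point for steps 2 and 3.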

step 2: reward model (RM) training

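  • for each prompt, sample several outputs from the model and have labelers rank them from best to worst
  • train a reward model that assigns a scalar score to a prompt/response pair, so that outputs the labelers preferred get higher scores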
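
A ranking of K outputs for one prompt is turned into K(K-1)/2 pairwise comparisons, and the InstructGPT paper trains the reward model r to minimize the average of -log σ(r(x, y_w) - r(x, y_l)) over those pairs, where y_w is the preferred output. The sketch below computes that loss for a single prompt from scores that are assumed to be already computed; the reward network itself is not shown, and the function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def reward_model_loss(ranked_rewards: Sequence[float]) -> float:
    """Pairwise ranking loss for one prompt.

    ranked_rewards[i] is the reward model's score for the output the
    labeler ranked i-th (index 0 = best). Every pair (w, l) with w < l
    is one comparison in which output w was preferred over output l.
    """
    pairs = list(combinations(range(len(ranked_rewards)), 2))
    if not pairs:
        raise ValueError("need at least two ranked outputs")
    total = sum(-log_sigmoid(ranked_rewards[w] - ranked_rewards[l]) for w, l in pairs)
    return total / len(pairs)


# Scores that agree with the labeler's ranking give a lower loss
# than scores that contradict it.
print(reward_model_loss([2.0, 1.0, 0.0]))
print(reward_model_loss([0.0, 1.0, 2.0]))
```

Averaging over all pairs from one ranking (rather than treating each pair as an independent example) keeps a single prompt with many ranked outputs from dominating the loss.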

step 3: reinforcement learning (PPO)

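  • optimize the fine-tuned model (the policy) against the fixed reward model with reinforcement learning (PPO); the reward model is not updated, it only supplies the reward signal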
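
Two pieces of step 3 fit in a short sketch. First, the reward for a sampled response is the reward model's score minus a KL penalty, β·log(π_RL(y|x) / π_SFT(y|x)), which keeps the policy close to the step 1 model. Second, PPO updates the policy with a clipped surrogate objective. The function names, the default `beta` and `clip_eps` values, and the per-token log-probability inputs are assumptions for illustration; value functions, advantage estimation, and the pretraining-mix term the paper adds in its PPO-ptx variant are not shown.

```python
import math
from typing import Sequence


def kl_penalized_reward(rm_score: float,
                        policy_logprobs: Sequence[float],
                        sft_logprobs: Sequence[float],
                        beta: float = 0.02) -> float:
    """Reward for one sampled response y to prompt x.

    rm_score: reward model output r(x, y).
    policy_logprobs / sft_logprobs: per-token log-probs of y under the RL
    policy and under the frozen SFT model. Their summed difference is a
    single-sample estimate of log(pi_RL(y|x) / pi_SFT(y|x)).
    beta: KL coefficient (placeholder value, not taken from the paper).
    """
    log_ratio = sum(policy_logprobs) - sum(sft_logprobs)
    return rm_score - beta * log_ratio


def ppo_clipped_objective(new_logprob: float, old_logprob: float,
                          advantage: float, clip_eps: float = 0.2) -> float:
    """PPO clipped surrogate for one action (to be maximized)."""
    ratio = math.exp(new_logprob - old_logprob)
    clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    # Taking the minimum makes the objective pessimistic: large policy
    # changes cannot increase it beyond the clipped value.
    return min(ratio * advantage, clipped * advantage)


# A policy identical to the SFT model pays no KL penalty.
print(kl_penalized_reward(1.5, [-1.0, -2.0], [-1.0, -2.0]))
# A ratio of 2 with positive advantage is clipped to 1 + clip_eps.
print(ppo_clipped_objective(math.log(2.0), 0.0, advantage=1.0))
```

Without the KL term, the policy can drift toward outputs that exploit weaknesses of the reward model while scoring highly; the penalty trades some reward for staying near the supervised model.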