InstructGPT
Step 1: supervised fine-tuning
Collect prompts and have labelers write the desired response for each one.
Fine-tune GPT-3 on these prompt/response demonstrations.
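As a rough illustration of what fine-tuning on demonstrations optimizes, here is a minimal sketch of the usual objective: the average negative log-likelihood of the labeler-written response tokens. The function name `sft_loss` and the idea of passing in precomputed per-token log-probabilities are assumptions made for this example, not details from these notes.

```python
import math


def sft_loss(response_token_logprobs):
    """Average negative log-likelihood of the demonstration's response tokens.

    `response_token_logprobs` is a hypothetical input: the log-probability the
    model assigns to each token of the labeler-written response.
    """
    if not response_token_logprobs:
        raise ValueError("need at least one response token")
    return -sum(response_token_logprobs) / len(response_token_logprobs)


# A model that gives every target token probability 0.5 has loss log(2).
example_loss = sft_loss([math.log(0.5), math.log(0.5)])
```

A lower loss means the model assigns higher probability to the responses the labelers wrote.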
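Step 2: reward model
For each prompt, labelers rank several model responses from best to worst.
Train a reward model on these rankings to predict which response a labeler would prefer.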
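One common way to train a reward model from rankings is a pairwise loss: split each ranking into (preferred, rejected) pairs and penalize the model when the preferred response does not score higher. The sketch below assumes that setup. `score_fn` is a hypothetical stand-in for the reward model, and all function names are illustrative.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_responses):
    """Turn a best-to-worst ranking into (preferred, rejected) pairs."""
    # combinations() keeps input order, so the first item of each pair
    # is always the one the labeler ranked higher.
    return list(combinations(ranked_responses, 2))


def pairwise_loss(score_preferred, score_rejected):
    """Compute -log(sigmoid(score_preferred - score_rejected)) stably."""
    diff = score_preferred - score_rejected
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


def ranking_loss(ranked_responses, score_fn):
    """Average pairwise loss over all pairs from one ranking."""
    pairs = ranking_to_pairs(ranked_responses)
    total = sum(pairwise_loss(score_fn(a), score_fn(b)) for a, b in pairs)
    return total / len(pairs)


# Toy scores (assumed values): a scorer that agrees with the ranking
# gets a lower loss than one that disagrees.
ranking = ["best", "middle", "worst"]
agree = {"best": 2.0, "middle": 1.0, "worst": 0.0}
disagree = {"best": 0.0, "middle": 1.0, "worst": 2.0}
loss_agree = ranking_loss(ranking, agree.get)
loss_disagree = ranking_loss(ranking, disagree.get)
```

Averaging over all pairs from one ranking keeps prompts with many ranked responses from dominating the loss.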
Step 3: reinforcement learning
Optimize the fine-tuned model against the reward model with reinforcement learning (PPO in the InstructGPT paper).
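In step 3 the reward model's score is typically combined with a penalty that keeps the policy close to the fine-tuned model from step 1. The sketch below shows only that reward shaping, not the PPO update itself. The function name, the log-probability inputs, and the value of `beta` are assumptions chosen for illustration.

```python
def kl_penalized_reward(reward_model_score, logprobs_policy,
                        logprobs_reference, beta=0.02):
    """Reward model score minus a KL-style penalty against the reference model.

    `logprobs_policy` and `logprobs_reference` are hypothetical inputs: the
    per-token log-probabilities of the sampled response under the policy being
    trained and under the step 1 model. `beta` is an illustrative value.
    """
    if len(logprobs_policy) != len(logprobs_reference):
        raise ValueError("log-probability lists must have the same length")
    # Sum of per-token log-ratios: a sample-based estimate of the KL term.
    kl_estimate = sum(p - r for p, r in zip(logprobs_policy, logprobs_reference))
    return reward_model_score - beta * kl_estimate


# When the policy matches the reference, the penalty vanishes.
unchanged = kl_penalized_reward(1.0, [-1.0, -2.0], [-1.0, -2.0])
# When the policy drifts toward its own samples, the reward is reduced.
drifted = kl_penalized_reward(1.0, [-0.5, -0.5], [-1.0, -1.0], beta=0.1)
```

The penalty discourages the policy from exploiting weaknesses in the reward model by drifting far from the step 1 model.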