This module covers techniques for aligning language models with human preferences. While supervised fine-tuning helps models learn tasks, preference alignment encourages outputs to match human expectations and values.
Typical alignment methods involve multiple stages:
- Supervised Fine-Tuning (SFT) to adapt models to specific domains
- Preference alignment (like RLHF or DPO) to improve response quality
Alternative approaches like ORPO combine instruction tuning and preference alignment into a single process. Here, we will focus on the DPO and ORPO algorithms.
To learn more about the different alignment techniques, see the Argilla Blog.
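To make the two-stage flow concrete, here is a minimal sketch using the TRL library, with DPO (covered next) as the preference stage. The model and dataset names are only examples, and exact trainer arguments (for instance `processing_class` vs. `tokenizer`) differ between TRL releases, so treat this as a rough outline rather than a drop-in script.

```python
# Sketch of the two-stage pipeline: SFT first, then preference alignment with DPO.
# Assumes a recent TRL release; names and hyperparameters are illustrative.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer, DPOConfig, DPOTrainer

model_name = "HuggingFaceTB/SmolLM2-135M"  # any small causal LM works for experimentation
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Stage 1: supervised fine-tuning on instruction-following conversations
sft_trainer = SFTTrainer(
    model=model,
    args=SFTConfig(output_dir="sft-model", max_steps=1000),
    train_dataset=load_dataset("HuggingFaceTB/smoltalk", "everyday-conversations", split="train"),
    processing_class=tokenizer,
)
sft_trainer.train()

# Stage 2: preference alignment with DPO on (prompt, chosen, rejected) triples
dpo_trainer = DPOTrainer(
    model=model,
    ref_model=None,  # with ref_model=None, TRL keeps a frozen copy of the model as the reference
    args=DPOConfig(output_dir="dpo-model", beta=0.1, max_steps=1000),
    train_dataset=load_dataset("trl-lib/ultrafeedback_binarized", split="train"),
    processing_class=tokenizer,
)
dpo_trainer.train()
```

ORPO, discussed later in this module, collapses these two stages into a single training run.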
DPO simplifies preference alignment by directly optimizing models using preference data, eliminating the need for separate reward models and complex reinforcement learning. This makes it more stable and efficient than traditional RLHF.
Key benefits:
- No separate reward model needed
- More stable training process
- Lower computational requirements
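Concretely, DPO trains on triples of a prompt $x$, a preferred response $y_w$, and a rejected response $y_l$, and minimizes a classification-style loss in which the policy's log-probability ratios against a frozen reference model play the role of the reward:

$$
\mathcal{L}_{\text{DPO}} = -\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}\right)\right]
$$

Here $\sigma$ is the logistic function and $\beta$ controls how far the trained policy $\pi_\theta$ may drift from the reference model $\pi_{\text{ref}}$ (typically the SFT checkpoint). Because the implicit reward is expressed directly through these log-ratios, no separate reward model and no reinforcement learning loop are required.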
ORPO combines instruction tuning and preference alignment into a single training process. It modifies the standard language modeling objective by combining the token-level negative log-likelihood loss with an odds ratio term that favors chosen over rejected responses.
Key innovations:
- Unified single-stage training process
- Reference model-free architecture
- Improved computational efficiency
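In slightly more detail, following the ORPO paper's notation, the loss adds a weighted odds ratio term to the usual negative log-likelihood on the chosen response, where the odds are computed from the model's own (length-normalized) sequence likelihoods:

$$
\text{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}, \qquad
\mathcal{L}_{OR} = -\log \sigma\!\left(\log \frac{\text{odds}_\theta(y_w \mid x)}{\text{odds}_\theta(y_l \mid x)}\right)
$$

$$
\mathcal{L}_{ORPO} = \mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}}\left[\mathcal{L}_{SFT} + \lambda \cdot \mathcal{L}_{OR}\right]
$$

Only the trained policy appears in these terms, which is why no frozen reference model is needed, and $\lambda$ trades off instruction tuning against preference optimization.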
ORPO has shown impressive results across various benchmarks:
- Better performance on AlpacaEval compared to traditional methods
- Strong results on MT-Bench, even without multi-turn training
- Effective across different model sizes (125M to 1.3B parameters)