RHELAI-2756 Adding Adaptive Throttling #476

Open
wants to merge 2 commits into
base: main
from


eshwarprasadS

Why Adaptive Throttling?

The Problem:

If too many requests are sent concurrently to the server, it might:

  • Run out of GPU memory (VRAM) or other resources.
  • Fail to respond to requests within the timeout period.

Reducing concurrency manually (batch_num_workers) is a workaround, but the setting is static and does not adapt to server conditions.

The Solution:

  • Monitor the server's behavior (e.g., response success or failure).
  • Dynamically increase or decrease concurrency based on server feedback.
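The feedback loop described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the PR's code: it assumes an additive-increase/multiplicative-decrease (AIMD) policy, one common choice for this kind of control, and replaces real server feedback with a toy server that fails whenever concurrency exceeds a fixed capacity. The names `next_concurrency` and `simulate` are invented for illustration.

```python
def next_concurrency(current: int, succeeded: bool, lo: int, hi: int) -> int:
    """Return the concurrency for the next batch (AIMD-style sketch)."""
    if succeeded:
        # Server kept up: probe for more capacity, one worker at a time.
        return min(hi, current + 1)
    # Server failed (e.g. out of VRAM or timed out): back off multiplicatively.
    return max(lo, current // 2)


def simulate(capacity: int, start: int, steps: int, lo: int = 1, hi: int = 64) -> list[int]:
    """Drive next_concurrency against a toy server that fails above `capacity`."""
    history = [start]
    current = start
    for _ in range(steps):
        # Toy stand-in for real feedback: success iff we stay within capacity.
        succeeded = current <= capacity
        current = next_concurrency(current, succeeded, lo, hi)
        history.append(current)
    return history


print(simulate(capacity=10, start=4, steps=12))
# → [4, 5, 6, 7, 8, 9, 10, 11, 5, 6, 7, 8, 9]
```

The sawtooth in the output (climb until just past the capacity, back off, climb again) is the characteristic behavior of AIMD: concurrency hovers near what the server can sustain instead of staying fixed as a static batch_num_workers setting would.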

Key Components in the Code

The AdaptiveThrottler Class

This class encapsulates the logic for adjusting concurrency:

Parameters:

  • min_workers: Minimum number of concurrent workers; the throttler never goes below this.
  • max_workers: Maximum number of concurrent workers; matches num_cpus from instructlab core.
  • initial_workers: Starting number of concurrent workers, set to 50% of max_workers.
  • tolerance: Fraction by which the worker count is reduced when errors occur.
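A minimal sketch of how a class with these four parameters might hold and adjust the worker count is shown here. It is hypothetical and not the PR's implementation: the method names `record_success` and `record_failure`, the +1 step on success, the reading of `tolerance` as "reduce the current count by this fraction", the default `tolerance=0.2`, and the fallback of `max_workers` to `os.cpu_count()` (standing in for instructlab's num_cpus) are all assumptions. A lock keeps updates consistent when many worker threads report outcomes at once.

```python
import os
import threading
from typing import Optional


class AdaptiveThrottler:
    """Hypothetical sketch: tracks how many concurrent workers to allow."""

    def __init__(
        self,
        min_workers: int = 1,
        max_workers: Optional[int] = None,
        initial_workers: Optional[int] = None,
        tolerance: float = 0.2,  # assumed default, not taken from the PR
    ) -> None:
        if max_workers is None:
            # Assumption: fall back to the local CPU count, mirroring num_cpus.
            max_workers = os.cpu_count() or 1
        if not 1 <= min_workers <= max_workers:
            raise ValueError("require 1 <= min_workers <= max_workers")
        if not 0.0 < tolerance < 1.0:
            raise ValueError("tolerance must be in (0, 1)")
        if initial_workers is None:
            # Start at 50% of max_workers, but never below min_workers.
            initial_workers = max(min_workers, max_workers // 2)
        self.min_workers = min_workers
        self.max_workers = max_workers
        self.tolerance = tolerance
        # Clamp the starting point into [min_workers, max_workers].
        self._current = max(min_workers, min(max_workers, initial_workers))
        self._lock = threading.Lock()

    @property
    def current_workers(self) -> int:
        with self._lock:
            return self._current

    def record_success(self) -> None:
        """Additively probe for more capacity after a successful request."""
        with self._lock:
            self._current = min(self.max_workers, self._current + 1)

    def record_failure(self) -> None:
        """Reduce the worker count by `tolerance` after a failed request."""
        with self._lock:
            reduced = int(self._current * (1 - self.tolerance))
            self._current = max(self.min_workers, reduced)


throttler = AdaptiveThrottler(min_workers=2, max_workers=16, tolerance=0.5)
print(throttler.current_workers)  # 8 (50% of max_workers)
throttler.record_failure()
print(throttler.current_workers)  # 4
throttler.record_success()
print(throttler.current_workers)  # 5
```

Flooring at min_workers and capping at max_workers keeps the count inside the bounds the parameters describe. Because `int()` truncates, every recorded failure lowers the count by at least one until it reaches min_workers, so repeated errors always make progress toward backing off.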

@mergify (mergify bot) added the testing ("Relates to testing") and ci-failure labels on Jan 10, 2025