Rework of Rewards and Queueing + Small hotfixes #17
Conversation
enlorenz commented on Jan 28, 2026
- After the refactor, the "new" energy-efficiency reward was gone. It is reintroduced and passed through wherever the efficiency reward is carried.
- Introduce helper_queue to handle job-submission overflow. The agent still only sees the 1000-slot MAX_QUEUE as observation space, while excess jobs are caught by the helper (see the sketch after this list).
- Reworked the job-age penalty. It now takes the age of the longest-waiting job; the penalty rises smoothly with age and reaches its maximum after one week.
- Added more metrics/logs to the callbacks for TensorBoard.
- Some small bug fixes.
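Below is a minimal sketch of the overflow mechanism described in the second bullet, assuming a `collections.deque` backlog and a fixed-size NumPy job queue. The function names `add_new_jobs` and `fill_queue_from_backlog` mirror the helpers discussed in the review below, but the bodies are illustrative, not the repository's actual implementation.

```python
from collections import deque

import numpy as np

MAX_QUEUE_SIZE = 1000  # the agent only ever observes this many queue slots


def add_new_jobs(job_queue_2d, new_jobs, next_empty_slot, backlog_queue):
    """Place new jobs into the observed queue; overflow goes to the backlog."""
    for job in new_jobs:  # job = [duration, age, nodes, cores]
        if next_empty_slot < MAX_QUEUE_SIZE:
            job_queue_2d[next_empty_slot] = job
            next_empty_slot += 1
        else:
            backlog_queue.append(list(job))  # no job is ever rejected
    return next_empty_slot


def fill_queue_from_backlog(job_queue_2d, backlog_queue, next_empty_slot):
    """Refill freed slots from the backlog in FIFO order."""
    moved = 0
    while backlog_queue and next_empty_slot < MAX_QUEUE_SIZE:
        job_queue_2d[next_empty_slot] = backlog_queue.popleft()
        next_empty_slot += 1
        moved += 1
    return next_empty_slot, moved


# Usage: 1200 submissions against an empty queue -> 1000 queued, 200 backlogged.
queue = np.zeros((MAX_QUEUE_SIZE, 4), dtype=np.int64)
backlog = deque()
slot = add_new_jobs(queue, [[4, 0, 2, 8]] * 1200, 0, backlog)
print(slot, len(backlog))  # 1000 200
```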
… refactoring. Now reintroduced and used instead of "efficiency" function.
Force-pushed 8bf27e2 to 963be49 (Compare)
📝 Walkthrough
Adds deque-backed backlog handling and helpers to age/fill backlogs, integrates backlog into baseline and environment flows, disables age-based job dropping, refactors reward composition to separate energy-efficiency and blackout terms, expands hourly/episode metrics, introduces MAX_JOB_AGE_OBS, renames weight prefixes, and adds …
Sequence Diagram(s)
sequenceDiagram
participant Env as Environment
participant JM as JobManagement
participant Q as MainQueue
participant BQ as BacklogQueue
participant BL as Baseline
participant Nodes as Nodes
Env->>JM: age_backlog_queue(BQ, metrics)
JM-->>Env: updated BQ, dropped_count
Env->>JM: fill_queue_from_backlog(Q, BQ, next_empty_slot)
JM->>BQ: pop FIFO jobs
JM->>Q: insert jobs
JM-->>Env: next_empty_slot, moved_count
Env->>JM: add_new_jobs(Q, new_jobs..., backlog_queue=BQ)
alt main queue full
JM->>BQ: append overflow job
else space available
JM->>Q: insert job
end
JM-->>Env: new_jobs, next_empty_slot
Env->>BL: baseline_step(..., baseline_backlog_queue=BQ)
BL->>JM: may call age/fill helpers
BL-->>Env: baseline results
Env->>Nodes: assign_jobs_to_available_nodes(...)
Nodes-->>Env: running jobs / completions
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ❌ 1 failed (1 warning) | ✅ 2 passed
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@src/config.py`:
- Around line 9-10: The observation `job_queue` uses MAX_JOB_AGE for all four
job attributes causing job_age to be massively overbounded; add a realistic
observation cap like MAX_JOB_AGE_OBS (e.g., set to EPISODE_HOURS * 26 for ~6
months) in config.py and replace the job_age entry in the `high` vector used to
build the `job_queue` observation space in environment.py (referencing the
`job_queue` observation construction and `MAX_JOB_AGE` symbol) so that the
job_age bound uses MAX_JOB_AGE_OBS while duration/nodes/cores keep their
existing reasonable bounds.
In `@src/environment.py`:
- Line 340: Replace the extraneous f-string used in the env output: change the
call to self.env_print(f"[5] Calculating reward...") so it uses a plain string
without the f prefix (self.env_print("[5] Calculating reward...")); update the
occurrence inside the surrounding method where self.env_print is called to
remove the unnecessary f-string formatting.
In `@src/reward_calculation.py`:
- Around line 202-204: The combined efficiency term (efficiency_reward_norm) can
be in [-1,1] because _reward_energy_efficiency_normalized returns [0,1] and
_blackout_term returns [-1,1]; if downstream code expects [0,1], replace the
current sum with a normalized mapping and clamp: compute raw =
self._reward_energy_efficiency_normalized(...) + self._blackout_term(...), then
set efficiency_reward_norm = max(0.0, min(1.0, (raw + 1.0) / 2.0)) so the result
is guaranteed in [0,1]; update the expression where efficiency_reward_norm is
computed (referencing _reward_energy_efficiency_normalized, _blackout_term, and
the efficiency_reward_norm variable) and add a short comment explaining the
normalization.
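A minimal sketch of the remapping suggested in the last prompt, assuming `_reward_energy_efficiency_normalized` yields a value in [0, 1] and `_blackout_term` a value in [-1, 1]; the standalone function below is illustrative and not the module's actual code.

```python
import numpy as np


def combined_efficiency_reward(energy_efficiency_norm: float, blackout_term: float) -> float:
    """Combine the two terms and remap into [0, 1], as the fix prompt suggests.

    energy_efficiency_norm is assumed to lie in [0, 1] and blackout_term in
    [-1, 1]; the affine remap (raw + 1) / 2 plus a clip keeps the result in
    [0, 1] for downstream code that expects that range.
    """
    raw = energy_efficiency_norm + blackout_term
    return float(np.clip((raw + 1.0) / 2.0, 0.0, 1.0))


# Example: full efficiency but a nearly saturated blackout penalty
print(combined_efficiency_reward(1.0, -0.982))  # ~0.509
```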
🧹 Nitpick comments (6)
requirements.txt (1)
8-8: Consider pinning dependency versions for reproducibility. The new `torchinfo` dependency follows the existing pattern of unpinned versions. For production stability and reproducible builds, consider pinning versions for all dependencies (e.g., `torchinfo==1.8.0`). This is optional since it matches the current style.
src/baseline.py (2)
16-17: Update docstring to include `baseline_helper_queue` parameter. The new `baseline_helper_queue` parameter is added to the function signature but not documented in the Args section of the docstring.
Proposed fix
Add after line 35:

    baseline_helper_queue: Helper queue for baseline overflow jobs

Also applies to: 36-36
50-53: Use underscore prefix for unused variable. `new_baseline_jobs` is never used after assignment. Per Python convention, prefix unused variables with `_` to signal intent.
Proposed fix

    - new_baseline_jobs, baseline_next_empty_slot = add_new_jobs(
    + _new_baseline_jobs, baseline_next_empty_slot = add_new_jobs(
          job_queue_2d, new_jobs_count, new_jobs_durations, new_jobs_nodes,
          new_jobs_cores, baseline_next_empty_slot, baseline_helper_queue
      )

src/job_management.py (1)
231-232: Document the hotfix more explicitly for future cleanup. The `if False and new_age > MAX_JOB_AGE` pattern effectively disables dropping. Consider adding a TODO comment with a tracking reference (e.g., issue number) to ensure this temporary measure gets revisited.
📝 Suggested improvement

    - # TEMP HOTFIX: disable age-based dropping (keep logic for later).
    - if False and new_age > MAX_JOB_AGE:
    + # TEMP HOTFIX: disable age-based dropping (keep logic for later).
    + # TODO(enlorenz): Re-enable age-based dropping once queue overflow is properly handled.
    + if False and new_age > MAX_JOB_AGE:

src/reward_calculation.py (2)
135-139: Normalization produces confusing values: `_normalize(current, 0, -1)` inverts the scale unexpectedly. The `_normalize` method computes `(current - minimum) / (maximum - minimum)`. With `minimum=0` and `maximum=-1`:
- When `current_penalty = 0`: returns `(0 - 0) / (-1 - 0) = 0`
- When `current_penalty = 1`: returns `(1 - 0) / (-1 - 0) = -1`

This effectively maps `[0, 1]` → `[0, -1]`, which is the intended outcome, but the parameter ordering is confusing (passing `-1` as `maximum` when it's actually the minimum of the output range). The subsequent `clip(-1, 0)` ensures correctness, but the code is misleading.
♻️ Clearer implementation

      def _penalty_job_age_normalized(self, num_off_nodes, job_queue_2d):
          """Calculate normalized job age penalty [-1, 0]."""
          current_penalty = self._penalty_job_age(num_off_nodes, job_queue_2d)
    -     normalized_penalty = self._normalize(current_penalty, 0, -1)
    -     return np.clip(normalized_penalty, -1, 0)
    +     # current_penalty is in [0, 1], map to [-1, 0]
    +     return -current_penalty
158-176: Blackout term uses hardcoded magic numbers. `BLACKOUT_QUEUE_THRESHOLD = 10` and `SATURATION_FACTOR = 2` are defined inline. Consider extracting these to `src/config.py` for consistency with other configurable parameters and easier tuning.

The penalty formula `exp(-ratio * 2) - 1` produces:
- 0 jobs → `exp(0) - 1 = 0` (but handled separately, returns 1.0)
- 10 jobs → `exp(-2) - 1 ≈ -0.865`
- 20 jobs → `exp(-4) - 1 ≈ -0.982`

This provides smooth saturation toward -1, which is sensible for the design.
📝 Consider moving constants to config

      from src.config import (
          COST_IDLE_MW, COST_USED_MW, PENALTY_IDLE_NODE,
    -     PENALTY_DROPPED_JOB, MAX_NODES, MAX_NEW_JOBS_PER_HOUR, MAX_QUEUE_SIZE, WEEK_HOURS
    +     PENALTY_DROPPED_JOB, MAX_NODES, MAX_NEW_JOBS_PER_HOUR, MAX_QUEUE_SIZE, WEEK_HOURS,
    +     BLACKOUT_QUEUE_THRESHOLD, BLACKOUT_SATURATION_FACTOR
      )
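For reference, a small self-contained sketch of the saturation curve worked through above, using the constants quoted in the review (`BLACKOUT_QUEUE_THRESHOLD = 10`, `SATURATION_FACTOR = 2`); the empty-queue case returning 1.0 follows the note in the first bullet, and the function name is illustrative.

```python
import math

BLACKOUT_QUEUE_THRESHOLD = 10  # jobs waiting until the penalty is near saturation
SATURATION_FACTOR = 2


def blackout_term(num_waiting_jobs: int) -> float:
    """Reward 1.0 on an empty queue, otherwise a penalty saturating toward -1."""
    if num_waiting_jobs == 0:
        return 1.0
    ratio = num_waiting_jobs / BLACKOUT_QUEUE_THRESHOLD
    return math.exp(-ratio * SATURATION_FACTOR) - 1.0


for n in (0, 10, 20):
    print(n, round(blackout_term(n), 3))  # 0 -> 1.0, 10 -> -0.865, 20 -> -0.982
```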
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/job_management.py`:
- Around line 9-33: The age_helper_queue docstring is inaccurate because
age-based dropping is currently disabled by a temporary hotfix (the hardcoded
"if False" guard) so jobs are always kept; update the docstring for
age_helper_queue to state that jobs are incremented for age but dropping based
on MAX_JOB_AGE is temporarily disabled (mention the "if False" hotfix and
MAX_JOB_AGE) and add a brief TODO indicating where/how to re-enable the drop
logic (remove the "if False" guard and use job[1] > MAX_JOB_AGE) and that
metrics updates live in the now-disabled branch.
- Around line 132-141: The add_new_jobs() function currently omits jobs pushed
into helper_queue from the returned new_jobs list, causing environment.py to
undercount submitted jobs; update add_new_jobs() so that whenever a job is
appended to helper_queue (the branch that uses helper_queue.append([...]) ),
also append a corresponding job representation to new_jobs (same
structure/metadata you use for queue entries) so helper-queued jobs are included
in the returned list, and update the function docstring to state that returned
new_jobs includes both directly queued and helper-queued jobs (next_empty_slot
return stays the same); ensure this change keeps compatibility with
fill_queue_from_helper() which will still consume helper_queue.
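A minimal sketch of the return contract this prompt asks for, assuming the four-field job layout (duration, age, nodes, cores) used elsewhere in this PR; the body is illustrative rather than the repository's `add_new_jobs`.

```python
from collections import deque


def add_new_jobs(job_queue_2d, new_jobs_count, durations, nodes, cores,
                 next_empty_slot, helper_queue):
    """Return (new_jobs, next_empty_slot); new_jobs covers queued AND helper-queued jobs."""
    new_jobs = []
    for i in range(new_jobs_count):
        job = [durations[i], 0, nodes[i], cores[i]]  # duration, age, nodes, cores
        if next_empty_slot < len(job_queue_2d):
            job_queue_2d[next_empty_slot] = job
            next_empty_slot += 1
        else:
            helper_queue.append(job)   # overflow branch
        new_jobs.append(job)           # report the job either way
    return new_jobs, next_empty_slot


# Two-slot queue, three submissions: one overflows, all three are reported.
queue, overflow = [None, None], deque()
jobs, slot = add_new_jobs(queue, 3, [4, 2, 6], [1, 2, 1], [8, 16, 8], 0, overflow)
print(len(jobs), slot, len(overflow))  # 3 2 1
```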
🧹 Nitpick comments (4)
src/job_management.py (1)
231-248: Replace `if False` with a config flag for the hotfix toggle. Line 232's `if False and ...` is non-idiomatic and easy to miss. Prefer an explicit flag (e.g., `ENABLE_AGE_DROP`) in config so the behavior is discoverable and consistent.
♻️ Suggested refactor

    -from src.config import (
    -    MAX_NODES, MAX_JOB_AGE, CORES_PER_NODE
    -)
    +from src.config import (
    +    MAX_NODES, MAX_JOB_AGE, CORES_PER_NODE, ENABLE_AGE_DROP
    +)
     ...
    -    if False and new_age > MAX_JOB_AGE:
    +    if ENABLE_AGE_DROP and new_age > MAX_JOB_AGE:

src/reward_calculation.py (2)
117-139: Optional simplification: avoid inverted bounds in job-age normalization. `_penalty_job_age()` already returns `[0, 1]`, so `_normalize(..., 0, -1)` is effectively `-current_penalty`. Consider returning `-current_penalty` directly for clarity.
♻️ Possible simplification

    - normalized_penalty = self._normalize(current_penalty, 0, -1)
    - return np.clip(normalized_penalty, -1, 0)
    + return np.clip(-current_penalty, -1, 0)
141-176: Consider centralizing blackout constants for tuning. `BLACKOUT_QUEUE_THRESHOLD` and `SATURATION_FACTOR` are core tuning knobs; promoting them to `src/config.py` keeps constants centralized and consistent across modules.
As per coding guidelines, configuration constants must be centralized and updated consistently across all modules.
test/test_sanity_env.py (1)
87-87: Use the MAX_JOB_AGE constant to keep the test in sync. You already import config constants; consider replacing the hard-coded `168 * 52 * 10` with `MAX_JOB_AGE` to avoid drift.
♻️ Suggested change

     from src.config import (
         MAX_JOB_DURATION, MIN_NODES_PER_JOB, MAX_NODES_PER_JOB, MIN_CORES_PER_JOB,
    -    CORES_PER_NODE, EPISODE_HOURS
    +    CORES_PER_NODE, EPISODE_HOURS, MAX_JOB_AGE
     )
     ...
    -    assert np.all((age >= 0) & (age <= 168 * 52 * 10)), f"age out of bounds min={age.min()} max={age.max()}"
    +    assert np.all((age >= 0) & (age <= MAX_JOB_AGE)), f"age out of bounds min={age.min()} max={age.max()}"

As per coding guidelines, configuration constants must be centralized and updated consistently across all modules.
Force-pushed 6763cc8 to 440c0b6 (Compare)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/job_management.py (1)
116-131: Add `helper_queue` to the Args docstring. The new parameter is undocumented, which can mislead callers.
Suggested doc fix

     Args:
         job_queue_2d: 2D job queue array (MAX_QUEUE_SIZE, 4)
         new_jobs_count: Number of new jobs to add
         new_jobs_durations: List of job durations
         new_jobs_nodes: List of nodes required per job
         new_jobs_cores: List of cores per node required per job
         next_empty_slot: Index of next empty slot in queue
    +    helper_queue: Optional overflow queue for jobs when main queue is full
🤖 Fix all issues with AI agents
In `@src/reward_calculation.py`:
- Around line 135-139: The normalization call in _penalty_job_age_normalized is
using inverted bounds; call _normalize with the true source range (minimum=0,
maximum=1) and then invert the normalized result to map into [-1,0].
Specifically, in _penalty_job_age_normalized replace the
_normalize(current_penalty, 0, -1) usage with _normalize(current_penalty, 0, 1)
and negate the returned value (e.g., -normalized) before applying np.clip; this
keeps _normalize semantics correct and ensures the final value lies in [-1, 0].
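A tiny sketch of the corrected call order described above, with a generic min-max `_normalize` standing in for the class method; names and bodies are illustrative.

```python
import numpy as np


def _normalize(current: float, minimum: float, maximum: float) -> float:
    """Standard min-max normalization onto [0, 1]."""
    return (current - minimum) / (maximum - minimum)


def penalty_job_age_normalized(current_penalty: float) -> float:
    """current_penalty is in [0, 1]; normalize with true bounds, then negate into [-1, 0]."""
    normalized = _normalize(current_penalty, 0.0, 1.0)
    return float(np.clip(-normalized, -1.0, 0.0))


print(penalty_job_age_normalized(0.25))  # -0.25
```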
🧹 Nitpick comments (5)
test/test_sanity_env.py (1)
85-87: Consider importing `MAX_JOB_AGE` instead of hardcoding the magic number. The file already imports several constants from `src.config` (lines 24-29). While the comment says "keep hard-coded here to avoid importing env constants," using `MAX_JOB_AGE` would prevent this check from silently becoming stale if the config value changes again.
Suggested improvement

     from src.config import (
         MAX_JOB_DURATION, MIN_NODES_PER_JOB, MAX_NODES_PER_JOB, MIN_CORES_PER_JOB,
    -    CORES_PER_NODE, EPISODE_HOURS
    +    CORES_PER_NODE, EPISODE_HOURS, MAX_JOB_AGE
     )

Then at line 87:

    - assert np.all((age >= 0) & (age <= 168 * 52 * 10)), f"age out of bounds min={age.min()} max={age.max()}"
    + assert np.all((age >= 0) & (age <= MAX_JOB_AGE)), f"age out of bounds min={age.min()} max={age.max()}"

src/baseline.py (1)
50-53: Rename unused variable to `_` per static analysis hint. The `new_baseline_jobs` return value is not used. Prefixing with underscore signals intentional discard and silences the linter.
Proposed fix

    - new_baseline_jobs, baseline_next_empty_slot = add_new_jobs(
    + _, baseline_next_empty_slot = add_new_jobs(
          job_queue_2d, new_jobs_count, new_jobs_durations, new_jobs_nodes,
          new_jobs_cores, baseline_next_empty_slot, baseline_helper_queue
      )
163-164: Consider moving reward constants to `config.py`. `BLACKOUT_QUEUE_THRESHOLD` and `SATURATION_FACTOR` are reward-shaping parameters that affect RL training. Per coding guidelines, configuration constants should be centralized for consistent tuning.
Suggested refactor
In `src/config.py`:

    # Blackout reward shaping
    BLACKOUT_QUEUE_THRESHOLD = 10  # jobs waiting until blackout penalty saturates
    BLACKOUT_SATURATION_FACTOR = 2

Then import and use in `_blackout_term`.
Based on coding guidelines: "Configuration constants must be centralized and updated consistently across all modules".
src/job_management.py (2)
43-59: Document the required deque-like interface for `helper_queue`. `popleft()` implies a `collections.deque`-style container; a brief Args note (or type hint) would prevent accidental misuse.
Suggested doc tweak

     def fill_queue_from_helper(job_queue_2d, helper_queue, next_empty_slot):
         """
         Move jobs from helper queue into the real queue (FIFO) until full.
    +    Args:
    +        helper_queue: deque-like queue supporting popleft() (e.g., collections.deque).
         """
236-238: Consider a named flag instead of `if False` to track the hotfix. A local constant makes intent clearer and easier to toggle/search.
Example refactor

    +# TEMP HOTFIX toggle for age-based drops
    +AGE_DROP_ENABLED = False
     ...
    -    if False and new_age > MAX_JOB_AGE:
    +    if AGE_DROP_ENABLED and new_age > MAX_JOB_AGE:
Force-pushed 440c0b6 to b62bdbd (Compare)
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/environment.py`:
- Around line 265-270: The baseline backlog is being aged twice each step: once
via age_backlog_queue(self.baseline_backlog_queue, ...) in environment.py and
again inside baseline.baseline_step(baseline_backlog_queue, metrics, ...).
Remove the duplicate call in environment.py by deleting the
age_backlog_queue(self.baseline_backlog_queue, self.metrics, is_baseline=True)
line so baseline aging is only performed by baseline_step; leave the normal
backlog aging (age_backlog_queue(self.backlog_queue, ...)) and the
fill_queue_from_backlog call unchanged.
🧹 Nitpick comments (4)
test/test_sanity_env.py (1)
87-87: Consider importing `MAX_JOB_AGE` instead of hardcoding the value. The hardcoded `168 * 52 * 10` matches `MAX_JOB_AGE` from `src/config.py`, but using the constant directly would ensure consistency if the value changes in the future.
♻️ Suggested improvement
Add to the imports at lines 24-29:

    from src.config import (
        MAX_JOB_DURATION, MIN_NODES_PER_JOB, MAX_NODES_PER_JOB, MIN_CORES_PER_JOB,
        CORES_PER_NODE, EPISODE_HOURS, MAX_JOB_AGE  # Add this
    )

Then update line 87:

    - assert np.all((age >= 0) & (age <= 168 * 52 * 10)), f"age out of bounds min={age.min()} max={age.max()}"
    + assert np.all((age >= 0) & (age <= MAX_JOB_AGE)), f"age out of bounds min={age.min()} max={age.max()}"

As per coding guidelines: configuration constants must be centralized and updated consistently across all modules.
src/baseline.py (1)
50-53: Unused variable `new_baseline_jobs` can be replaced with `_`. The return value `new_baseline_jobs` is never used since metrics are updated using `new_jobs_count` directly. This is correct behavior, but the variable should be discarded to satisfy the linter and clarify intent.
♻️ Suggested fix

    - new_baseline_jobs, baseline_next_empty_slot = add_new_jobs(
    + _, baseline_next_empty_slot = add_new_jobs(
          job_queue_2d, new_jobs_count, new_jobs_durations, new_jobs_nodes,
          new_jobs_cores, baseline_next_empty_slot, baseline_backlog_queue
      )
src/reward_calculation.py (1)
158-176: Consider extracting magic numbers as module-level or class-level constants. `BLACKOUT_QUEUE_THRESHOLD = 10` and `SATURATION_FACTOR = 2` are defined inline within the method. For tunability and visibility, these could be moved to `src/config.py` or at least to class-level constants.
♻️ Suggested approach
Either add to `src/config.py`:

    BLACKOUT_QUEUE_THRESHOLD = 10
    BLACKOUT_SATURATION_FACTOR = 2

Or define as class attributes in `RewardCalculator`:

    class RewardCalculator:
        BLACKOUT_QUEUE_THRESHOLD = 10
        BLACKOUT_SATURATION_FACTOR = 2
        ...
src/job_management.py (1)
236-237: The `if False` guard creates dead code that linters may flag. While this is intentional as a temporary hotfix, consider using a named constant (e.g., `AGE_DROPPING_ENABLED = False`) to make the intent clearer and avoid lint warnings about unreachable code.
♻️ Suggested improvement
Add to `src/config.py`:

    AGE_DROPPING_ENABLED = False  # TEMP: disabled; re-enable when ready

Then in `src/job_management.py`:

    +from src.config import AGE_DROPPING_ENABLED
     ...
    -    if False and new_age > MAX_JOB_AGE:
    +    if AGE_DROPPING_ENABLED and new_age > MAX_JOB_AGE:

This also makes the `age_backlog_queue` function's guard consistent.
Force-pushed b62bdbd to bae5879 (Compare)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/environment.py (1)
310-358: Include backlog jobs in the blackout/queue wait count. `num_unprocessed_jobs` reflects only the main queue. If the backlog still holds jobs, the blackout term can incorrectly reward a full blackout despite pending work. Use total waiting jobs (main queue + backlog) when calling `RewardCalculator.calculate`.
🐛 Suggested fix

    - num_unprocessed_jobs = np.sum(job_queue_2d[:, 0] > 0)
    - combined_queue_size = num_unprocessed_jobs + len(self.backlog_queue)
    + num_unprocessed_jobs = np.sum(job_queue_2d[:, 0] > 0)
    + combined_queue_size = num_unprocessed_jobs + len(self.backlog_queue)
    + total_waiting_jobs = combined_queue_size
      @@
    - step_reward, step_cost, eff_reward_norm, price_reward_norm, idle_penalty_norm, job_age_penalty_norm = self.reward_calculator.calculate(
    + step_reward, step_cost, eff_reward_norm, price_reward_norm, idle_penalty_norm, job_age_penalty_norm = self.reward_calculator.calculate(
          num_used_nodes, num_idle_nodes, current_price, average_future_price,
          num_off_nodes, num_launched_jobs, num_node_changes, job_queue_2d,
    -     num_unprocessed_jobs, self.weights, num_dropped_this_step, self.env_print
    +     total_waiting_jobs, self.weights, num_dropped_this_step, self.env_print
      )
🤖 Fix all issues with AI agents
In `@src/baseline.py`:
- Around line 50-55: The variable new_baseline_jobs returned from the call to
add_new_jobs is unused and triggers a lint warning; rename that binding to
_new_baseline_jobs (or _) in the assignment where add_new_jobs is called so the
call remains the same but the unused result is ignored—update the assignment
that currently reads new_baseline_jobs, baseline_next_empty_slot =
add_new_jobs(...) to _new_baseline_jobs, baseline_next_empty_slot =
add_new_jobs(...), leaving subsequent uses of baseline_next_empty_slot and
metrics.baseline_jobs_submitted / metrics.episode_baseline_jobs_submitted
unchanged.
In `@src/job_management.py`:
- Around line 9-18: The function age_backlog_queue currently has unused
parameters metrics and is_baseline causing ARG001 while the drop branch is
disabled; rename those parameters to _metrics and _is_baseline in the
age_backlog_queue signature (or, alternatively, add a no-op reference like "del
metrics, is_baseline" at the top of the function) so the linter knows they are
intentionally unused until the drop logic is re-enabled; ensure you update any
internal references to those names accordingly when you later restore the
branch.
🧹 Nitpick comments (1)
src/callbacks.py (1)
54-71: Consider extracting helper functions to reduce duplication. The completion-rate and average-wait-time calculation patterns are repeated for both agent and baseline metrics. While the current implementation is correct and readable, extracting these into small helper functions would improve maintainability.
♻️ Optional refactor to reduce duplication

    +def _safe_completion_rate(completed: int, submitted: int) -> float:
    +    return (completed / submitted * 100) if submitted > 0 else 0.0
    +
    +def _safe_avg_wait(total_wait: float, completed: int) -> float:
    +    return (total_wait / completed) if completed > 0 else 0.0
    +
     class ComputeClusterCallback(BaseCallback):

Then in `_on_step`:

     # Job metrics (agent)
    - completion_rate = (
    -     env.metrics.episode_jobs_completed / env.metrics.episode_jobs_submitted * 100
    -     if env.metrics.episode_jobs_submitted > 0
    -     else 0.0
    - )
    - avg_wait = (
    -     env.metrics.episode_total_job_wait_time / env.metrics.episode_jobs_completed
    -     if env.metrics.episode_jobs_completed > 0
    -     else 0.0
    - )
    + completion_rate = _safe_completion_rate(
    +     env.metrics.episode_jobs_completed, env.metrics.episode_jobs_submitted
    + )
    + avg_wait = _safe_avg_wait(
    +     env.metrics.episode_total_job_wait_time, env.metrics.episode_jobs_completed
    + )
     ...
     # Job metrics (baseline)
    - baseline_completion_rate = (
    -     env.metrics.episode_baseline_jobs_completed / env.metrics.episode_baseline_jobs_submitted * 100
    -     if env.metrics.episode_baseline_jobs_submitted > 0
    -     else 0.0
    - )
    - baseline_avg_wait = (
    -     env.metrics.episode_baseline_total_job_wait_time / env.metrics.episode_baseline_jobs_completed
    -     if env.metrics.episode_baseline_jobs_completed > 0
    -     else 0.0
    - )
    + baseline_completion_rate = _safe_completion_rate(
    +     env.metrics.episode_baseline_jobs_completed, env.metrics.episode_baseline_jobs_submitted
    + )
    + baseline_avg_wait = _safe_avg_wait(
    +     env.metrics.episode_baseline_total_job_wait_time, env.metrics.episode_baseline_jobs_completed
    + )
Force-pushed f0924ed to 88b68ea (Compare)
helper_queue makes it possible to keep the observation space at the "reduced" queue length while no job is rejected.
Set the max age to a very high number (the observation-space limit). Hotfixed the dropping of jobs after one week but kept the logic in case actual dropping is implemented later. The age penalty now looks at the age of the oldest waiting job: a smooth exponential for early times that saturates at -1 after one week (sketched below).
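A minimal sketch of a penalty with the shape described above (smooth exponential growth with the oldest job's age, saturating at -1 after one week); the time constant and function name are illustrative choices, not the values in `src/reward_calculation.py`.

```python
import math

WEEK_HOURS = 168


def job_age_penalty(oldest_job_age_hours: float) -> float:
    """Smoothly approach -1 as the oldest waiting job nears one week of age."""
    if oldest_job_age_hours >= WEEK_HOURS:
        return -1.0  # maximum penalty after one week
    # Exponential ramp: small for young jobs, steeper as the week mark approaches.
    return -(1.0 - math.exp(-3.0 * oldest_job_age_hours / WEEK_HOURS))


for hours in (0, 24, 84, 168):
    print(hours, round(job_age_penalty(hours), 3))  # 0.0, -0.349, -0.777, -1.0
```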
…erlapping tags
The metrics added beyond the original five are: metrics/episode_baseline_cost, metrics/episode_baseline_cost_off, metrics/episode_jobs_submitted, metrics/episode_jobs_completed, metrics/episode_completion_rate, metrics/episode_avg_wait_hours, metrics/episode_max_queue_size, metrics/episode_jobs_rejected_queue_full, metrics/episode_baseline_jobs_submitted, metrics/episode_baseline_jobs_completed, metrics/episode_baseline_completion_rate, metrics/episode_baseline_avg_wait_hours, metrics/episode_baseline_max_queue_size, metrics/episode_baseline_jobs_dropped, metrics/episode_baseline_jobs_rejected_queue_full
Force-pushed 88b68ea to e7d63d2 (Compare)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/environment.py (1)
355-358: Job-age penalty ignores backlog jobs still waiting. The reward uses `job_queue_2d` only, so backlog jobs (which can be the oldest) won't affect the age penalty while they wait outside the main queue. Consider folding backlog entries into the penalty input.
🛠️ One way to include backlog ages

    - step_reward, step_cost, eff_reward_norm, price_reward_norm, idle_penalty_norm, job_age_penalty_norm = self.reward_calculator.calculate(
    -     num_used_nodes, num_idle_nodes, current_price, average_future_price,
    -     num_off_nodes, num_launched_jobs, num_node_changes, job_queue_2d,
    -     num_unprocessed_jobs, self.weights, num_dropped_this_step, self.env_print
    - )
    + job_queue_for_penalty = job_queue_2d
    + if self.backlog_queue:
    +     job_queue_for_penalty = np.vstack(
    +         [job_queue_2d, np.array(self.backlog_queue, dtype=job_queue_2d.dtype)]
    +     )
    + step_reward, step_cost, eff_reward_norm, price_reward_norm, idle_penalty_norm, job_age_penalty_norm = self.reward_calculator.calculate(
    +     num_used_nodes, num_idle_nodes, current_price, average_future_price,
    +     num_off_nodes, num_launched_jobs, num_node_changes, job_queue_for_penalty,
    +     num_unprocessed_jobs, self.weights, num_dropped_this_step, self.env_print
    + )
🤖 Fix all issues with AI agents
In `@src/plotter.py`:
- Line 225: The prefix used when building filenames is inconsistent: update the
prefix inside plot_cumulative_savings() so it matches the new format used
elsewhere (use
e{env.weights.efficiency_weight}_p{env.weights.price_weight}_i{env.weights.idle_weight}_a{env.weights.job_age_weight}_d{env.weights.drop_weight});
replace the old `_d{job_age_weight}` segment with
`_a{env.weights.job_age_weight}_d{env.weights.drop_weight}` and ensure the
function references env.weights the same way as the earlier prefix assignment.
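A small sketch of the prefix format the prompt describes, with a hypothetical `Weights` dataclass standing in for `env.weights`; illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Weights:
    efficiency_weight: float
    price_weight: float
    idle_weight: float
    job_age_weight: float
    drop_weight: float


def run_prefix(w: Weights) -> str:
    """Filename prefix in the e/p/i/a/d format described above."""
    return (f"e{w.efficiency_weight}_p{w.price_weight}_i{w.idle_weight}"
            f"_a{w.job_age_weight}_d{w.drop_weight}")


print(run_prefix(Weights(1.0, 0.5, 0.2, 0.3, 0.1)))  # e1.0_p0.5_i0.2_a0.3_d0.1
```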
🧹 Nitpick comments (2)
requirements.txt (1)
1-8: Consider pinning dependency versions for reproducibility. None of the dependencies specify version constraints, which can lead to non-reproducible builds and unexpected breakage from upstream changes. Consider pinning versions (e.g., `stable-baselines3==2.2.1`) or using version ranges (e.g., `numpy>=1.24,<2.0`) to ensure consistent environments across installations.
src/reward_calculation.py (1)
163-176: Move blackout tuning constants to config. `BLACKOUT_QUEUE_THRESHOLD` and `SATURATION_FACTOR` look like configuration knobs; centralizing them makes experiments easier to control and audit.
As per coding guidelines: Configuration constants must be centralized and updated consistently across all modules.
…re filenames in plotter.
Force-pushed d218798 to 6f75711 (Compare)