Rework of Rewards and Queueing + Small hotfixes #17
Conversation
enlorenz commented on Jan 28, 2026
- After the refactor, the "new" energy-efficiency reward was gone. It is reintroduced and passed through wherever the efficiency reward is carried.
- Introduce helper_queue to handle job-submission overflow. The agent still only sees the 1000-slot MAX_QUEUE as observation space, while excess jobs are caught by the helper (see the sketch after this list).
- Reworked the job-age penalty. It now takes the age of the longest-waiting job; the penalty rises smoothly with age and reaches its maximum after one week.
- Added more metrics/logs to the callbacks for TensorBoard.
- Some small bug fixes.
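Below is a minimal sketch of the overflow mechanism described in the second bullet, assuming a `collections.deque` backlog and a fixed-size NumPy job queue. The function names `add_new_jobs` and `fill_queue_from_backlog` mirror the helpers discussed in the review below, but the bodies are illustrative, not the repository's actual implementation.

```python
from collections import deque

import numpy as np

MAX_QUEUE_SIZE = 1000  # the agent only ever observes this many queue slots


def add_new_jobs(job_queue_2d, new_jobs, next_empty_slot, backlog_queue):
    """Place new jobs into the observed queue; overflow goes to the backlog."""
    for job in new_jobs:  # job = [duration, age, nodes, cores]
        if next_empty_slot < MAX_QUEUE_SIZE:
            job_queue_2d[next_empty_slot] = job
            next_empty_slot += 1
        else:
            backlog_queue.append(list(job))  # no job is ever rejected
    return next_empty_slot


def fill_queue_from_backlog(job_queue_2d, backlog_queue, next_empty_slot):
    """Refill freed slots from the backlog in FIFO order."""
    moved = 0
    while backlog_queue and next_empty_slot < MAX_QUEUE_SIZE:
        job_queue_2d[next_empty_slot] = backlog_queue.popleft()
        next_empty_slot += 1
        moved += 1
    return next_empty_slot, moved


# Usage: 1200 submissions against an empty queue -> 1000 queued, 200 backlogged.
queue = np.zeros((MAX_QUEUE_SIZE, 4), dtype=np.int64)
backlog = deque()
slot = add_new_jobs(queue, [[4, 0, 2, 8]] * 1200, 0, backlog)
print(slot, len(backlog))  # 1000 200
```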
… refactoring. Now reintroduced and used instead of "efficiency" function.
Force-pushed 8bf27e2 to 963be49 (Compare)
📝 Walkthrough
Adds deque-backed backlog handling and helpers to age/fill backlogs, integrates backlog into baseline and environment flows, disables age-based job dropping, refactors reward composition to separate energy-efficiency and blackout terms, expands hourly/episode metrics, introduces MAX_JOB_AGE_OBS, renames weight prefixes, and adds …
Sequence Diagram(s)
sequenceDiagram
participant Env as Environment
participant JM as JobManagement
participant Q as MainQueue
participant BQ as BacklogQueue
participant BL as Baseline
participant Nodes as Nodes
Env->>JM: age_backlog_queue(BQ, metrics)
JM-->>Env: updated BQ, dropped_count
Env->>JM: fill_queue_from_backlog(Q, BQ, next_empty_slot)
JM->>BQ: pop FIFO jobs
JM->>Q: insert jobs
JM-->>Env: next_empty_slot, moved_count
Env->>JM: add_new_jobs(Q, new_jobs..., backlog_queue=BQ)
alt main queue full
JM->>BQ: append overflow job
else space available
JM->>Q: insert job
end
JM-->>Env: new_jobs, next_empty_slot
Env->>BL: baseline_step(..., baseline_backlog_queue=BQ)
BL->>JM: may call age/fill helpers
BL-->>Env: baseline results
Env->>Nodes: assign_jobs_to_available_nodes(...)
Nodes-->>Env: running jobs / completions
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ❌ 1 failed (1 warning) | ✅ 2 passed
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@src/config.py`:
- Around line 9-10: The observation `job_queue` uses MAX_JOB_AGE for all four
job attributes causing job_age to be massively overbounded; add a realistic
observation cap like MAX_JOB_AGE_OBS (e.g., set to EPISODE_HOURS * 26 for ~6
months) in config.py and replace the job_age entry in the `high` vector used to
build the `job_queue` observation space in environment.py (referencing the
`job_queue` observation construction and `MAX_JOB_AGE` symbol) so that the
job_age bound uses MAX_JOB_AGE_OBS while duration/nodes/cores keep their
existing reasonable bounds.
In `@src/environment.py`:
- Line 340: Replace the extraneous f-string used in the env output: change the
call to self.env_print(f"[5] Calculating reward...") so it uses a plain string
without the f prefix (self.env_print("[5] Calculating reward...")); update the
occurrence inside the surrounding method where self.env_print is called to
remove the unnecessary f-string formatting.
In `@src/reward_calculation.py`:
- Around line 202-204: The combined efficiency term (efficiency_reward_norm) can
be in [-1,1] because _reward_energy_efficiency_normalized returns [0,1] and
_blackout_term returns [-1,1]; if downstream code expects [0,1], replace the
current sum with a normalized mapping and clamp: compute raw =
self._reward_energy_efficiency_normalized(...) + self._blackout_term(...), then
set efficiency_reward_norm = max(0.0, min(1.0, (raw + 1.0) / 2.0)) so the result
is guaranteed in [0,1]; update the expression where efficiency_reward_norm is
computed (referencing _reward_energy_efficiency_normalized, _blackout_term, and
the efficiency_reward_norm variable) and add a short comment explaining the
normalization.
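A minimal sketch of the remapping suggested in the last prompt, assuming `_reward_energy_efficiency_normalized` yields a value in [0, 1] and `_blackout_term` a value in [-1, 1]; the standalone function below is illustrative and not the module's actual code.

```python
import numpy as np


def combined_efficiency_reward(energy_efficiency_norm: float, blackout_term: float) -> float:
    """Combine the two terms and remap into [0, 1], as the fix prompt suggests.

    energy_efficiency_norm is assumed to lie in [0, 1] and blackout_term in
    [-1, 1]; the affine remap (raw + 1) / 2 plus a clip keeps the result in
    [0, 1] for downstream code that expects that range.
    """
    raw = energy_efficiency_norm + blackout_term
    return float(np.clip((raw + 1.0) / 2.0, 0.0, 1.0))


# Example: full efficiency but a nearly saturated blackout penalty
print(combined_efficiency_reward(1.0, -0.982))  # ~0.509
```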
🧹 Nitpick comments (6)
requirements.txt (1)
8-8: Consider pinning dependency versions for reproducibility. The new `torchinfo` dependency follows the existing pattern of unpinned versions. For production stability and reproducible builds, consider pinning versions for all dependencies (e.g., `torchinfo==1.8.0`). This is optional since it matches the current style.
src/baseline.py (2)
16-17: Update docstring to include `baseline_helper_queue` parameter. The new `baseline_helper_queue` parameter is added to the function signature but not documented in the Args section of the docstring.
Proposed fix
Add after line 35:

    baseline_helper_queue: Helper queue for baseline overflow jobs

Also applies to: 36-36
50-53: Use underscore prefix for unused variable. `new_baseline_jobs` is never used after assignment. Per Python convention, prefix unused variables with `_` to signal intent.
Proposed fix

    - new_baseline_jobs, baseline_next_empty_slot = add_new_jobs(
    + _new_baseline_jobs, baseline_next_empty_slot = add_new_jobs(
          job_queue_2d, new_jobs_count, new_jobs_durations, new_jobs_nodes,
          new_jobs_cores, baseline_next_empty_slot, baseline_helper_queue
      )

src/job_management.py (1)
231-232: Document the hotfix more explicitly for future cleanup. The `if False and new_age > MAX_JOB_AGE` pattern effectively disables dropping. Consider adding a TODO comment with a tracking reference (e.g., issue number) to ensure this temporary measure gets revisited.
📝 Suggested improvement

    - # TEMP HOTFIX: disable age-based dropping (keep logic for later).
    - if False and new_age > MAX_JOB_AGE:
    + # TEMP HOTFIX: disable age-based dropping (keep logic for later).
    + # TODO(enlorenz): Re-enable age-based dropping once queue overflow is properly handled.
    + if False and new_age > MAX_JOB_AGE:

src/reward_calculation.py (2)
135-139: Normalization produces confusing values: `_normalize(current, 0, -1)` inverts the scale unexpectedly. The `_normalize` method computes `(current - minimum) / (maximum - minimum)`. With `minimum=0` and `maximum=-1`:
- When `current_penalty = 0`: returns `(0 - 0) / (-1 - 0) = 0`
- When `current_penalty = 1`: returns `(1 - 0) / (-1 - 0) = -1`

This effectively maps `[0, 1]` → `[0, -1]`, which is the intended outcome, but the parameter ordering is confusing (passing `-1` as `maximum` when it's actually the minimum of the output range). The subsequent `clip(-1, 0)` ensures correctness, but the code is misleading.
♻️ Clearer implementation

      def _penalty_job_age_normalized(self, num_off_nodes, job_queue_2d):
          """Calculate normalized job age penalty [-1, 0]."""
          current_penalty = self._penalty_job_age(num_off_nodes, job_queue_2d)
    -     normalized_penalty = self._normalize(current_penalty, 0, -1)
    -     return np.clip(normalized_penalty, -1, 0)
    +     # current_penalty is in [0, 1], map to [-1, 0]
    +     return -current_penalty
158-176: Blackout term uses hardcoded magic numbers. `BLACKOUT_QUEUE_THRESHOLD = 10` and `SATURATION_FACTOR = 2` are defined inline. Consider extracting these to `src/config.py` for consistency with other configurable parameters and easier tuning.

The penalty formula `exp(-ratio * 2) - 1` produces:
- 0 jobs → `exp(0) - 1 = 0` (but handled separately, returns 1.0)
- 10 jobs → `exp(-2) - 1 ≈ -0.865`
- 20 jobs → `exp(-4) - 1 ≈ -0.982`

This provides smooth saturation toward -1, which is sensible for the design.
📝 Consider moving constants to config

      from src.config import (
          COST_IDLE_MW, COST_USED_MW, PENALTY_IDLE_NODE,
    -     PENALTY_DROPPED_JOB, MAX_NODES, MAX_NEW_JOBS_PER_HOUR, MAX_QUEUE_SIZE, WEEK_HOURS
    +     PENALTY_DROPPED_JOB, MAX_NODES, MAX_NEW_JOBS_PER_HOUR, MAX_QUEUE_SIZE, WEEK_HOURS,
    +     BLACKOUT_QUEUE_THRESHOLD, BLACKOUT_SATURATION_FACTOR
      )
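For reference, a small self-contained sketch of the saturation curve worked through above, using the constants quoted in the review (`BLACKOUT_QUEUE_THRESHOLD = 10`, `SATURATION_FACTOR = 2`); the empty-queue case returning 1.0 follows the note in the first bullet, and the function name is illustrative.

```python
import math

BLACKOUT_QUEUE_THRESHOLD = 10  # jobs waiting until the penalty is near saturation
SATURATION_FACTOR = 2


def blackout_term(num_waiting_jobs: int) -> float:
    """Reward 1.0 on an empty queue, otherwise a penalty saturating toward -1."""
    if num_waiting_jobs == 0:
        return 1.0
    ratio = num_waiting_jobs / BLACKOUT_QUEUE_THRESHOLD
    return math.exp(-ratio * SATURATION_FACTOR) - 1.0


for n in (0, 10, 20):
    print(n, round(blackout_term(n), 3))  # 0 -> 1.0, 10 -> -0.865, 20 -> -0.982
```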
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/job_management.py`:
- Around line 9-33: The age_helper_queue docstring is inaccurate because
age-based dropping is currently disabled by a temporary hotfix (the hardcoded
"if False" guard) so jobs are always kept; update the docstring for
age_helper_queue to state that jobs are incremented for age but dropping based
on MAX_JOB_AGE is temporarily disabled (mention the "if False" hotfix and
MAX_JOB_AGE) and add a brief TODO indicating where/how to re-enable the drop
logic (remove the "if False" guard and use job[1] > MAX_JOB_AGE) and that
metrics updates live in the now-disabled branch.
- Around line 132-141: The add_new_jobs() function currently omits jobs pushed
into helper_queue from the returned new_jobs list, causing environment.py to
undercount submitted jobs; update add_new_jobs() so that whenever a job is
appended to helper_queue (the branch that uses helper_queue.append([...]) ),
also append a corresponding job representation to new_jobs (same
structure/metadata you use for queue entries) so helper-queued jobs are included
in the returned list, and update the function docstring to state that returned
new_jobs includes both directly queued and helper-queued jobs (next_empty_slot
return stays the same); ensure this change keeps compatibility with
fill_queue_from_helper() which will still consume helper_queue.
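A minimal sketch of the return contract this prompt asks for, assuming the four-field job layout (duration, age, nodes, cores) used elsewhere in this PR; the body is illustrative rather than the repository's `add_new_jobs`.

```python
from collections import deque


def add_new_jobs(job_queue_2d, new_jobs_count, durations, nodes, cores,
                 next_empty_slot, helper_queue):
    """Return (new_jobs, next_empty_slot); new_jobs covers queued AND helper-queued jobs."""
    new_jobs = []
    for i in range(new_jobs_count):
        job = [durations[i], 0, nodes[i], cores[i]]  # duration, age, nodes, cores
        if next_empty_slot < len(job_queue_2d):
            job_queue_2d[next_empty_slot] = job
            next_empty_slot += 1
        else:
            helper_queue.append(job)   # overflow branch
        new_jobs.append(job)           # report the job either way
    return new_jobs, next_empty_slot


# Two-slot queue, three submissions: one overflows, all three are reported.
queue, overflow = [None, None], deque()
jobs, slot = add_new_jobs(queue, 3, [4, 2, 6], [1, 2, 1], [8, 16, 8], 0, overflow)
print(len(jobs), slot, len(overflow))  # 3 2 1
```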
🧹 Nitpick comments (4)
src/job_management.py (1)
231-248: Replace `if False` with a config flag for the hotfix toggle. Line 232's `if False and ...` is non-idiomatic and easy to miss. Prefer an explicit flag (e.g., `ENABLE_AGE_DROP`) in config so the behavior is discoverable and consistent.
♻️ Suggested refactor

    -from src.config import (
    -    MAX_NODES, MAX_JOB_AGE, CORES_PER_NODE
    -)
    +from src.config import (
    +    MAX_NODES, MAX_JOB_AGE, CORES_PER_NODE, ENABLE_AGE_DROP
    +)
     ...
    -    if False and new_age > MAX_JOB_AGE:
    +    if ENABLE_AGE_DROP and new_age > MAX_JOB_AGE:

src/reward_calculation.py (2)
117-139: Optional simplification: avoid inverted bounds in job-age normalization. `_penalty_job_age()` already returns `[0, 1]`, so `_normalize(..., 0, -1)` is effectively `-current_penalty`. Consider returning `-current_penalty` directly for clarity.
♻️ Possible simplification

    - normalized_penalty = self._normalize(current_penalty, 0, -1)
    - return np.clip(normalized_penalty, -1, 0)
    + return np.clip(-current_penalty, -1, 0)
141-176: Consider centralizing blackout constants for tuning. `BLACKOUT_QUEUE_THRESHOLD` and `SATURATION_FACTOR` are core tuning knobs; promoting them to `src/config.py` keeps constants centralized and consistent across modules.
As per coding guidelines, configuration constants must be centralized and updated consistently across all modules.
test/test_sanity_env.py (1)
87-87: Use the MAX_JOB_AGE constant to keep the test in sync. You already import config constants; consider replacing the hard-coded `168 * 52 * 10` with `MAX_JOB_AGE` to avoid drift.
♻️ Suggested change

     from src.config import (
         MAX_JOB_DURATION, MIN_NODES_PER_JOB, MAX_NODES_PER_JOB, MIN_CORES_PER_JOB,
    -    CORES_PER_NODE, EPISODE_HOURS
    +    CORES_PER_NODE, EPISODE_HOURS, MAX_JOB_AGE
     )
     ...
    -    assert np.all((age >= 0) & (age <= 168 * 52 * 10)), f"age out of bounds min={age.min()} max={age.max()}"
    +    assert np.all((age >= 0) & (age <= MAX_JOB_AGE)), f"age out of bounds min={age.min()} max={age.max()}"

As per coding guidelines, configuration constants must be centralized and updated consistently across all modules.
Force-pushed 6763cc8 to 440c0b6 (Compare)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/job_management.py (1)
116-131: Add `helper_queue` to the Args docstring. The new parameter is undocumented, which can mislead callers.
Suggested doc fix

     Args:
         job_queue_2d: 2D job queue array (MAX_QUEUE_SIZE, 4)
         new_jobs_count: Number of new jobs to add
         new_jobs_durations: List of job durations
         new_jobs_nodes: List of nodes required per job
         new_jobs_cores: List of cores per node required per job
         next_empty_slot: Index of next empty slot in queue
    +    helper_queue: Optional overflow queue for jobs when main queue is full
🤖 Fix all issues with AI agents
In `@src/reward_calculation.py`:
- Around line 135-139: The normalization call in _penalty_job_age_normalized is
using inverted bounds; call _normalize with the true source range (minimum=0,
maximum=1) and then invert the normalized result to map into [-1,0].
Specifically, in _penalty_job_age_normalized replace the
_normalize(current_penalty, 0, -1) usage with _normalize(current_penalty, 0, 1)
and negate the returned value (e.g., -normalized) before applying np.clip; this
keeps _normalize semantics correct and ensures the final value lies in [-1, 0].
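A tiny sketch of the corrected call order described above, with a generic min-max `_normalize` standing in for the class method; names and bodies are illustrative.

```python
import numpy as np


def _normalize(current: float, minimum: float, maximum: float) -> float:
    """Standard min-max normalization onto [0, 1]."""
    return (current - minimum) / (maximum - minimum)


def penalty_job_age_normalized(current_penalty: float) -> float:
    """current_penalty is in [0, 1]; normalize with true bounds, then negate into [-1, 0]."""
    normalized = _normalize(current_penalty, 0.0, 1.0)
    return float(np.clip(-normalized, -1.0, 0.0))


print(penalty_job_age_normalized(0.25))  # -0.25
```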
🧹 Nitpick comments (5)
test/test_sanity_env.py (1)
85-87: Consider importing `MAX_JOB_AGE` instead of hardcoding the magic number. The file already imports several constants from `src.config` (lines 24-29). While the comment says "keep hard-coded here to avoid importing env constants," using `MAX_JOB_AGE` would prevent this check from silently becoming stale if the config value changes again.
Suggested improvement

     from src.config import (
         MAX_JOB_DURATION, MIN_NODES_PER_JOB, MAX_NODES_PER_JOB, MIN_CORES_PER_JOB,
    -    CORES_PER_NODE, EPISODE_HOURS
    +    CORES_PER_NODE, EPISODE_HOURS, MAX_JOB_AGE
     )

Then at line 87:

    - assert np.all((age >= 0) & (age <= 168 * 52 * 10)), f"age out of bounds min={age.min()} max={age.max()}"
    + assert np.all((age >= 0) & (age <= MAX_JOB_AGE)), f"age out of bounds min={age.min()} max={age.max()}"

src/baseline.py (1)
50-53: Rename unused variable to `_` per static analysis hint. The `new_baseline_jobs` return value is not used. Prefixing with underscore signals intentional discard and silences the linter.
Proposed fix

    - new_baseline_jobs, baseline_next_empty_slot = add_new_jobs(
    + _, baseline_next_empty_slot = add_new_jobs(
          job_queue_2d, new_jobs_count, new_jobs_durations, new_jobs_nodes,
          new_jobs_cores, baseline_next_empty_slot, baseline_helper_queue
      )
163-164: Consider moving reward constants to `config.py`. `BLACKOUT_QUEUE_THRESHOLD` and `SATURATION_FACTOR` are reward-shaping parameters that affect RL training. Per coding guidelines, configuration constants should be centralized for consistent tuning.
Suggested refactor
In `src/config.py`:

    # Blackout reward shaping
    BLACKOUT_QUEUE_THRESHOLD = 10  # jobs waiting until blackout penalty saturates
    BLACKOUT_SATURATION_FACTOR = 2

Then import and use in `_blackout_term`.
Based on coding guidelines: "Configuration constants must be centralized and updated consistently across all modules".
src/job_management.py (2)
43-59: Document the required deque-like interface for `helper_queue`. `popleft()` implies a `collections.deque`-style container; a brief Args note (or type hint) would prevent accidental misuse.
Suggested doc tweak

     def fill_queue_from_helper(job_queue_2d, helper_queue, next_empty_slot):
         """
         Move jobs from helper queue into the real queue (FIFO) until full.
    +    Args:
    +        helper_queue: deque-like queue supporting popleft() (e.g., collections.deque).
         """
236-238: Consider a named flag instead of `if False` to track the hotfix. A local constant makes intent clearer and easier to toggle/search.
Example refactor

    +# TEMP HOTFIX toggle for age-based drops
    +AGE_DROP_ENABLED = False
     ...
    -    if False and new_age > MAX_JOB_AGE:
    +    if AGE_DROP_ENABLED and new_age > MAX_JOB_AGE:
Force-pushed 440c0b6 to b62bdbd (Compare)
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/environment.py`:
- Around line 265-270: The baseline backlog is being aged twice each step: once
via age_backlog_queue(self.baseline_backlog_queue, ...) in environment.py and
again inside baseline.baseline_step(baseline_backlog_queue, metrics, ...).
Remove the duplicate call in environment.py by deleting the
age_backlog_queue(self.baseline_backlog_queue, self.metrics, is_baseline=True)
line so baseline aging is only performed by baseline_step; leave the normal
backlog aging (age_backlog_queue(self.backlog_queue, ...)) and the
fill_queue_from_backlog call unchanged.
🧹 Nitpick comments (4)
test/test_sanity_env.py (1)
87-87: Consider importing `MAX_JOB_AGE` instead of hardcoding the value. The hardcoded `168 * 52 * 10` matches `MAX_JOB_AGE` from `src/config.py`, but using the constant directly would ensure consistency if the value changes in the future.
♻️ Suggested improvement
Add to the imports at lines 24-29:

    from src.config import (
        MAX_JOB_DURATION, MIN_NODES_PER_JOB, MAX_NODES_PER_JOB, MIN_CORES_PER_JOB,
        CORES_PER_NODE, EPISODE_HOURS, MAX_JOB_AGE  # Add this
    )

Then update line 87:

    - assert np.all((age >= 0) & (age <= 168 * 52 * 10)), f"age out of bounds min={age.min()} max={age.max()}"
    + assert np.all((age >= 0) & (age <= MAX_JOB_AGE)), f"age out of bounds min={age.min()} max={age.max()}"

As per coding guidelines: configuration constants must be centralized and updated consistently across all modules.
src/baseline.py (1)
50-53: Unused variable `new_baseline_jobs` can be replaced with `_`. The return value `new_baseline_jobs` is never used since metrics are updated using `new_jobs_count` directly. This is correct behavior, but the variable should be discarded to satisfy the linter and clarify intent.
♻️ Suggested fix

    - new_baseline_jobs, baseline_next_empty_slot = add_new_jobs(
    + _, baseline_next_empty_slot = add_new_jobs(
          job_queue_2d, new_jobs_count, new_jobs_durations, new_jobs_nodes,
          new_jobs_cores, baseline_next_empty_slot, baseline_backlog_queue
      )
src/reward_calculation.py (1)
158-176: Consider extracting magic numbers as module-level or class-level constants. `BLACKOUT_QUEUE_THRESHOLD = 10` and `SATURATION_FACTOR = 2` are defined inline within the method. For tunability and visibility, these could be moved to `src/config.py` or at least to class-level constants.
♻️ Suggested approach
Either add to `src/config.py`:

    BLACKOUT_QUEUE_THRESHOLD = 10
    BLACKOUT_SATURATION_FACTOR = 2

Or define as class attributes in `RewardCalculator`:

    class RewardCalculator:
        BLACKOUT_QUEUE_THRESHOLD = 10
        BLACKOUT_SATURATION_FACTOR = 2
        ...
src/job_management.py (1)
236-237: The `if False` guard creates dead code that linters may flag. While this is intentional as a temporary hotfix, consider using a named constant (e.g., `AGE_DROPPING_ENABLED = False`) to make the intent clearer and avoid lint warnings about unreachable code.
♻️ Suggested improvement
Add to `src/config.py`:

    AGE_DROPPING_ENABLED = False  # TEMP: disabled; re-enable when ready

Then in `src/job_management.py`:

    +from src.config import AGE_DROPPING_ENABLED
     ...
    -    if False and new_age > MAX_JOB_AGE:
    +    if AGE_DROPPING_ENABLED and new_age > MAX_JOB_AGE:

This also makes the `age_backlog_queue` function's guard consistent.
Force-pushed b62bdbd to bae5879 (Compare)
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/environment.py (1)
310-358: Include backlog jobs in the blackout/queue wait count. `num_unprocessed_jobs` reflects only the main queue. If the backlog still holds jobs, the blackout term can incorrectly reward a full blackout despite pending work. Use total waiting jobs (main queue + backlog) when calling `RewardCalculator.calculate`.
🐛 Suggested fix

    - num_unprocessed_jobs = np.sum(job_queue_2d[:, 0] > 0)
    - combined_queue_size = num_unprocessed_jobs + len(self.backlog_queue)
    + num_unprocessed_jobs = np.sum(job_queue_2d[:, 0] > 0)
    + combined_queue_size = num_unprocessed_jobs + len(self.backlog_queue)
    + total_waiting_jobs = combined_queue_size
      @@
    - step_reward, step_cost, eff_reward_norm, price_reward_norm, idle_penalty_norm, job_age_penalty_norm = self.reward_calculator.calculate(
    + step_reward, step_cost, eff_reward_norm, price_reward_norm, idle_penalty_norm, job_age_penalty_norm = self.reward_calculator.calculate(
          num_used_nodes, num_idle_nodes, current_price, average_future_price,
          num_off_nodes, num_launched_jobs, num_node_changes, job_queue_2d,
    -     num_unprocessed_jobs, self.weights, num_dropped_this_step, self.env_print
    +     total_waiting_jobs, self.weights, num_dropped_this_step, self.env_print
      )
🤖 Fix all issues with AI agents
In `@src/baseline.py`:
- Around line 50-55: The variable new_baseline_jobs returned from the call to
add_new_jobs is unused and triggers a lint warning; rename that binding to
_new_baseline_jobs (or _) in the assignment where add_new_jobs is called so the
call remains the same but the unused result is ignored—update the assignment
that currently reads new_baseline_jobs, baseline_next_empty_slot =
add_new_jobs(...) to _new_baseline_jobs, baseline_next_empty_slot =
add_new_jobs(...), leaving subsequent uses of baseline_next_empty_slot and
metrics.baseline_jobs_submitted / metrics.episode_baseline_jobs_submitted
unchanged.
In `@src/job_management.py`:
- Around line 9-18: The function age_backlog_queue currently has unused
parameters metrics and is_baseline causing ARG001 while the drop branch is
disabled; rename those parameters to _metrics and _is_baseline in the
age_backlog_queue signature (or, alternatively, add a no-op reference like "del
metrics, is_baseline" at the top of the function) so the linter knows they are
intentionally unused until the drop logic is re-enabled; ensure you update any
internal references to those names accordingly when you later restore the
branch.
🧹 Nitpick comments (1)
src/callbacks.py (1)
54-71: Consider extracting helper functions to reduce duplication. The completion-rate and average-wait-time calculation patterns are repeated for both agent and baseline metrics. While the current implementation is correct and readable, extracting these into small helper functions would improve maintainability.
♻️ Optional refactor to reduce duplication

    +def _safe_completion_rate(completed: int, submitted: int) -> float:
    +    return (completed / submitted * 100) if submitted > 0 else 0.0
    +
    +def _safe_avg_wait(total_wait: float, completed: int) -> float:
    +    return (total_wait / completed) if completed > 0 else 0.0
    +
     class ComputeClusterCallback(BaseCallback):

Then in `_on_step`:

     # Job metrics (agent)
    - completion_rate = (
    -     env.metrics.episode_jobs_completed / env.metrics.episode_jobs_submitted * 100
    -     if env.metrics.episode_jobs_submitted > 0
    -     else 0.0
    - )
    - avg_wait = (
    -     env.metrics.episode_total_job_wait_time / env.metrics.episode_jobs_completed
    -     if env.metrics.episode_jobs_completed > 0
    -     else 0.0
    - )
    + completion_rate = _safe_completion_rate(
    +     env.metrics.episode_jobs_completed, env.metrics.episode_jobs_submitted
    + )
    + avg_wait = _safe_avg_wait(
    +     env.metrics.episode_total_job_wait_time, env.metrics.episode_jobs_completed
    + )
     ...
     # Job metrics (baseline)
    - baseline_completion_rate = (
    -     env.metrics.episode_baseline_jobs_completed / env.metrics.episode_baseline_jobs_submitted * 100
    -     if env.metrics.episode_baseline_jobs_submitted > 0
    -     else 0.0
    - )
    - baseline_avg_wait = (
    -     env.metrics.episode_baseline_total_job_wait_time / env.metrics.episode_baseline_jobs_completed
    -     if env.metrics.episode_baseline_jobs_completed > 0
    -     else 0.0
    - )
    + baseline_completion_rate = _safe_completion_rate(
    +     env.metrics.episode_baseline_jobs_completed, env.metrics.episode_baseline_jobs_submitted
    + )
    + baseline_avg_wait = _safe_avg_wait(
    +     env.metrics.episode_baseline_total_job_wait_time, env.metrics.episode_baseline_jobs_completed
    + )
Force-pushed f0924ed to 88b68ea (Compare)
helper_queue makes it possible to keep the observation space at the "reduced" queue length while no job is rejected.
Set the max age to a very high number (the observation-space limit). Hotfixed the dropping of jobs after one week but kept the logic in case actual dropping is implemented later. The age penalty now looks at the age of the oldest waiting job: a smooth exponential for early times that saturates at -1 after one week (sketched below).
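A minimal sketch of a penalty with the shape described above (smooth exponential growth with the oldest job's age, saturating at -1 after one week); the time constant and function name are illustrative choices, not the values in `src/reward_calculation.py`.

```python
import math

WEEK_HOURS = 168


def job_age_penalty(oldest_job_age_hours: float) -> float:
    """Smoothly approach -1 as the oldest waiting job nears one week of age."""
    if oldest_job_age_hours >= WEEK_HOURS:
        return -1.0  # maximum penalty after one week
    # Exponential ramp: small for young jobs, steeper as the week mark approaches.
    return -(1.0 - math.exp(-3.0 * oldest_job_age_hours / WEEK_HOURS))


for hours in (0, 24, 84, 168):
    print(hours, round(job_age_penalty(hours), 3))  # 0.0, -0.349, -0.777, -1.0
```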
…erlapping tags
The metrics added beyond the original five are: metrics/episode_baseline_cost, metrics/episode_baseline_cost_off, metrics/episode_jobs_submitted, metrics/episode_jobs_completed, metrics/episode_completion_rate, metrics/episode_avg_wait_hours, metrics/episode_max_queue_size, metrics/episode_jobs_rejected_queue_full, metrics/episode_baseline_jobs_submitted, metrics/episode_baseline_jobs_completed, metrics/episode_baseline_completion_rate, metrics/episode_baseline_avg_wait_hours, metrics/episode_baseline_max_queue_size, metrics/episode_baseline_jobs_dropped, metrics/episode_baseline_jobs_rejected_queue_full
Force-pushed 88b68ea to e7d63d2 (Compare)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/environment.py (1)
355-358: Job-age penalty ignores backlog jobs still waiting. The reward uses `job_queue_2d` only, so backlog jobs (which can be the oldest) won't affect the age penalty while they wait outside the main queue. Consider folding backlog entries into the penalty input.
🛠️ One way to include backlog ages

    - step_reward, step_cost, eff_reward_norm, price_reward_norm, idle_penalty_norm, job_age_penalty_norm = self.reward_calculator.calculate(
    -     num_used_nodes, num_idle_nodes, current_price, average_future_price,
    -     num_off_nodes, num_launched_jobs, num_node_changes, job_queue_2d,
    -     num_unprocessed_jobs, self.weights, num_dropped_this_step, self.env_print
    - )
    + job_queue_for_penalty = job_queue_2d
    + if self.backlog_queue:
    +     job_queue_for_penalty = np.vstack(
    +         [job_queue_2d, np.array(self.backlog_queue, dtype=job_queue_2d.dtype)]
    +     )
    + step_reward, step_cost, eff_reward_norm, price_reward_norm, idle_penalty_norm, job_age_penalty_norm = self.reward_calculator.calculate(
    +     num_used_nodes, num_idle_nodes, current_price, average_future_price,
    +     num_off_nodes, num_launched_jobs, num_node_changes, job_queue_for_penalty,
    +     num_unprocessed_jobs, self.weights, num_dropped_this_step, self.env_print
    + )
🤖 Fix all issues with AI agents
In `@src/plotter.py`:
- Line 225: The prefix used when building filenames is inconsistent: update the
prefix inside plot_cumulative_savings() so it matches the new format used
elsewhere (use
e{env.weights.efficiency_weight}_p{env.weights.price_weight}_i{env.weights.idle_weight}_a{env.weights.job_age_weight}_d{env.weights.drop_weight});
replace the old `_d{job_age_weight}` segment with
`_a{env.weights.job_age_weight}_d{env.weights.drop_weight}` and ensure the
function references env.weights the same way as the earlier prefix assignment.
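A small sketch of the prefix format the prompt describes, with a hypothetical `Weights` dataclass standing in for `env.weights`; illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Weights:
    efficiency_weight: float
    price_weight: float
    idle_weight: float
    job_age_weight: float
    drop_weight: float


def run_prefix(w: Weights) -> str:
    """Filename prefix in the e/p/i/a/d format described above."""
    return (f"e{w.efficiency_weight}_p{w.price_weight}_i{w.idle_weight}"
            f"_a{w.job_age_weight}_d{w.drop_weight}")


print(run_prefix(Weights(1.0, 0.5, 0.2, 0.3, 0.1)))  # e1.0_p0.5_i0.2_a0.3_d0.1
```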
🧹 Nitpick comments (2)
requirements.txt (1)
1-8: Consider pinning dependency versions for reproducibility. None of the dependencies specify version constraints, which can lead to non-reproducible builds and unexpected breakage from upstream changes. Consider pinning versions (e.g., `stable-baselines3==2.2.1`) or using version ranges (e.g., `numpy>=1.24,<2.0`) to ensure consistent environments across installations.
src/reward_calculation.py (1)
163-176: Move blackout tuning constants to config. `BLACKOUT_QUEUE_THRESHOLD` and `SATURATION_FACTOR` look like configuration knobs; centralizing them makes experiments easier to control and audit.
As per coding guidelines: Configuration constants must be centralized and updated consistently across all modules.
…re filenames in plotter.
Force-pushed d218798 to 6f75711 (Compare)