Tree: fix gateway channel close/abort #590
Merged
thiell merged 2 commits into cea-hpc:master on Aug 8, 2025
Fix PropagationChannel.ev_close(), where gateway channel termination is handled. An actual rc > 0 comes from the gateway command itself and means the gateway is defective or misconfigured; in that case, the gateway is marked as unreachable at the Task level. In addition, if the remote commands have not been launched yet, they are redistributed to other available gateways. rc=None is now handled as a normal termination of the propagation channel, and the corresponding gateway is no longer marked as unreachable. Fixes cea-hpc#566.
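The close-handling rule described in this commit message can be illustrated with a minimal standalone sketch. This is not the ClusterShell implementation: `GatewayState`, `handle_channel_close()` and their parameters are hypothetical names used only to model the decision (rc > 0 marks the gateway unreachable and redistributes commands that were not yet launched; rc=None is a normal termination).

```python
from dataclasses import dataclass, field


@dataclass
class GatewayState:
    """Hypothetical stand-in for the Task-level gateway bookkeeping."""
    unreachable: set = field(default_factory=set)
    redistributed: list = field(default_factory=list)


def handle_channel_close(state, gateway, rc, pending_targets, started):
    """Model the decision taken when a gateway channel closes.

    rc              -- return code of the gateway command, or None
    pending_targets -- targets that were to be reached via this gateway
    started         -- True if remote commands were already launched
    """
    if rc is None or rc <= 0:
        # Normal termination: the gateway stays usable.
        return "normal"
    # rc > 0 comes from the gateway command itself: treat the gateway
    # as defective/misconfigured and mark it unreachable.
    state.unreachable.add(gateway)
    if not started:
        # Nothing launched yet: hand the targets to other gateways.
        state.redistributed.extend(pending_targets)
        return "redistributed"
    return "unreachable"


state = GatewayState()
first = handle_channel_close(state, "gw1", None, ["n1"], started=False)
second = handle_channel_close(state, "gw1", 1, ["n1", "n2"], started=False)
```

Here `first` takes the normal-termination path and leaves `gw1` usable, while `second` marks `gw1` unreachable and queues `n1` and `n2` for redistribution.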
This commit adds the ability to abort a specific gateway channel from the initiator. Until now, this was not properly handled. This also fixes gateway failover.

Changes:

* Implement TreeWorker._gateway_abort(), which can be used to abort/cancel all tasks being done by the TreeWorker via the specified gateway. On such an abort (likely due to some gateway failure), a special return code 76 (os.EX_PROTOCOL) is used for closing all running remote commands via this gateway. This return code is sometimes used to indicate a "Remote protocol error" / "An error occurred in a remote communication protocol", which seems appropriate here.
* Implement a new Task._pchannel_closing() method that is called on PropagationChannel.ev_close(), so deterministically every time a gateway channel is closing (self-initiated or not). This method performs the necessary cleanup actions and, most notably, calls TreeWorker._gateway_abort(gateway) on each worker currently using the gateway channel.
* Update Task._pchannel_release() so that it now calls PropagationChannel._close() instead of Worker.abort(), to properly reset the channel's opened/setup flags.
* Update TreeWorkerTest with tests to better cover the above and gateway failover.

Part of cea-hpc#229 and extended work on cea-hpc#566.
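The abort path listed above can be pictured with a small self-contained model. It is only a sketch under stated assumptions: `SketchTreeWorker`, `launch()`, `gateway_abort()` and `pchannel_closing()` are hypothetical names that mirror the roles of TreeWorker._gateway_abort() and Task._pchannel_closing(), not their actual code. The only real API used is the `os.EX_PROTOCOL` constant (76), with a fallback for platforms where it is not defined.

```python
import os

# os.EX_PROTOCOL is 76 on Unix; fall back to 76 where it is undefined.
EX_PROTOCOL = getattr(os, "EX_PROTOCOL", 76)


class SketchTreeWorker:
    """Hypothetical worker tracking which nodes run via which gateway."""

    def __init__(self):
        self.running = {}  # gateway -> set of nodes with a running command
        self.closed = {}   # node -> return code recorded at close time

    def launch(self, gateway, nodes):
        self.running.setdefault(gateway, set()).update(nodes)

    def gateway_abort(self, gateway):
        # Close every command still running via this gateway with the
        # "remote protocol error" return code.
        for node in sorted(self.running.pop(gateway, ())):
            self.closed[node] = EX_PROTOCOL


def pchannel_closing(workers, gateway):
    """Called whenever a gateway channel closes: abort on each worker."""
    for worker in workers:
        worker.gateway_abort(gateway)


worker = SketchTreeWorker()
worker.launch("gw1", ["n1", "n2"])
worker.launch("gw2", ["n3"])
pchannel_closing([worker], "gw1")
```

After the call, `n1` and `n2` are closed with return code 76, while `n3`, reached via `gw2`, is still running.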
Fixes #566.