[LTS 8.6] RDMA: CVE-2023-2176, CVE-2022-48925, CVE-2022-50543; mm: CVE-2023-53178, CVE-2024-26832 #851
Open
pvts-mat wants to merge 6 commits into ctrliq:ciqlts8_6
jira VULN-175450
cve CVE-2022-48925
commit-author Jason Gunthorpe <jgg@nvidia.com>
commit 22e9f71

If the state is not idle then resolve_prepare_src() should immediately fail and no change to global state should happen. However, it unconditionally overwrites the src_addr trying to build a temporary any address.

For instance if the state is already RDMA_CM_LISTEN then this will corrupt the src_addr and would cause the test in cma_cancel_operation():

    if (cma_any_addr(cma_src_addr(id_priv)) && !id_priv->cma_dev)

Which would manifest as this trace from syzkaller:

  BUG: KASAN: use-after-free in __list_add_valid+0x93/0xa0 lib/list_debug.c:26
  Read of size 8 at addr ffff8881546491e0 by task syz-executor.1/32204

  CPU: 1 PID: 32204 Comm: syz-executor.1 Not tainted 5.12.0-rc8-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  Call Trace:
   __dump_stack lib/dump_stack.c:79 [inline]
   dump_stack+0x141/0x1d7 lib/dump_stack.c:120
   print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:232
   __kasan_report mm/kasan/report.c:399 [inline]
   kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:416
   __list_add_valid+0x93/0xa0 lib/list_debug.c:26
   __list_add include/linux/list.h:67 [inline]
   list_add_tail include/linux/list.h:100 [inline]
   cma_listen_on_all drivers/infiniband/core/cma.c:2557 [inline]
   rdma_listen+0x787/0xe00 drivers/infiniband/core/cma.c:3751
   ucma_listen+0x16a/0x210 drivers/infiniband/core/ucma.c:1102
   ucma_write+0x259/0x350 drivers/infiniband/core/ucma.c:1732
   vfs_write+0x28e/0xa30 fs/read_write.c:603
   ksys_write+0x1ee/0x250 fs/read_write.c:658
   do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
   entry_SYSCALL_64_after_hwframe+0x44/0xae

This is indicating that an rdma_id_private was destroyed without doing cma_cancel_listens().

Instead of trying to re-use the src_addr memory to indirectly create an any address derived from the dst build one explicitly on the stack and bind to that as any other normal flow would do. rdma_bind_addr() will copy it over the src_addr once it knows the state is valid.

This is similar to commit bc0bdc5 ("RDMA/cma: Do not change route.addr.src_addr.ss_family")

Link: https://lore.kernel.org/r/0-v2-e975c8fd9ef2+11e-syz_cma_srcaddr_jgg@nvidia.com
Cc: stable@vger.kernel.org
Fixes: 732d41c ("RDMA/cma: Make the locking for automatic state transition more clear")
Reported-by: syzbot+c94a3675a626f6333d74@syzkaller.appspotmail.com
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
(cherry picked from commit 22e9f71)
Signed-off-by: Marcin Wcisło <marcin.wcislo@conclusive.pl>
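The ordering problem described in this commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel code: `CmId`, `State`, `bind_addr`, `prepare_src_buggy` and `prepare_src_fixed` are hypothetical names, and addresses are plain strings standing in for sockaddr structures.

```python
from dataclasses import dataclass
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    ADDR_BOUND = auto()
    LISTEN = auto()


ANY_ADDR = "0.0.0.0"  # stand-in for a wildcard ("any") sockaddr


@dataclass
class CmId:
    """Toy stand-in for rdma_id_private; only the fields the example needs."""
    state: State = State.IDLE
    src_addr: str = ANY_ADDR


def bind_addr(cm_id: CmId, addr: str) -> bool:
    """Model of the bind step: validate the state first, then copy the address."""
    if cm_id.state is not State.IDLE:
        return False
    cm_id.src_addr = addr
    cm_id.state = State.ADDR_BOUND
    return True


def prepare_src_buggy(cm_id: CmId) -> bool:
    # Pre-fix shape: the shared src_addr is used as scratch space
    # before anything has checked the state.
    cm_id.src_addr = ANY_ADDR
    return bind_addr(cm_id, cm_id.src_addr)


def prepare_src_fixed(cm_id: CmId) -> bool:
    # Post-fix shape: the wildcard lives in a local variable and only
    # reaches src_addr through bind_addr(), after the state check passed.
    any_addr = ANY_ADDR
    return bind_addr(cm_id, any_addr)
```

In this model, calling `prepare_src_buggy()` on an ID in `State.LISTEN` fails but still rewrites `src_addr`, while `prepare_src_fixed()` fails and leaves the listener's address untouched.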
jira VULN-4121
cve CVE-2023-2176
commit-author Patrisious Haddad <phaddad@nvidia.com>
commit 8d03797

Refactor rdma_bind_addr function so that it doesn't require that the cma destination address be changed before calling it.

So now it will update the destination address internally only when it is really needed and after passing all the required checks.

Which in turn results in a cleaner and more sensible call and error handling flows for the functions that call it directly or indirectly.

Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
Reported-by: Wei Chen <harperchen1110@gmail.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Link: https://lore.kernel.org/r/3d0e9a2fd62bc10ba02fed1c7c48a48638952320.1672819273.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
(cherry picked from commit 8d03797)
Signed-off-by: Marcin Wcisło <marcin.wcislo@conclusive.pl>
jira VULN-4121
cve-bf CVE-2023-2176
commit-author Shiraz Saleem <shiraz.saleem@intel.com>
commit 0e15863

8d03797 ("RDMA/core: Refactor rdma_bind_addr") introduces a regression on irdma devices in certain tests which use rdma CM, such as cmtime.

No connections can be established, and the MAD QP experiences a fatal error on the active side.

The cma destination address is not updated with the dst_addr when the ULP on the active side calls rdma_bind_addr followed by rdma_resolve_addr. The id_priv state is 'bound' in resolve_prepare_src and the update is skipped.

This leaves the dgid passed into the irdma driver to create an Address Handle (AH) for the MAD QP at 0. The create AH descriptor as well as the ARP cache entry are invalid and HW throws an asynchronous event as a result.

[ 1207.656888] resolve_prepare_src caller: ucma_resolve_addr+0xff/0x170 [rdma_ucm] daddr=200.0.4.28 id_priv->state=7
[....]
[ 1207.680362] ice 0000:07:00.1 rocep7s0f1: caller: irdma_create_ah+0x3e/0x70 [irdma] ah_id=0 arp_idx=0 dest_ip=0.0.0.0 destMAC=00:00:64:ca:b7:52 ipvalid=1 raw=0000:0000:0000:0000:0000:ffff:0000:0000
[ 1207.682077] ice 0000:07:00.1 rocep7s0f1: abnormal ae_id = 0x401 bool qp=1 qp_id = 1, ae_src=5
[ 1207.691657] infiniband rocep7s0f1: Fatal error (1) on MAD QP (1)

Fix this by updating the CMA destination address when the ULP calls a resolve address with the CM state already bound.

Fixes: 8d03797 ("RDMA/core: Refactor rdma_bind_addr")
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Link: https://lore.kernel.org/r/20230712234133.1343-1-shiraz.saleem@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
(cherry picked from commit 0e15863)
Signed-off-by: Marcin Wcisło <marcin.wcislo@conclusive.pl>
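A minimal userspace sketch of the regression described in this commit message, assuming a heavily simplified state machine. `CmId`, `State` and the `copy_dst_when_bound` switch are hypothetical and exist only so the old and the fixed behaviour can be compared side by side; the real resolve_prepare_src() does more than this.

```python
from dataclasses import dataclass
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    ADDR_BOUND = auto()
    ADDR_QUERY = auto()


@dataclass
class CmId:
    """Toy stand-in for rdma_id_private."""
    state: State = State.IDLE
    src_addr: str = ""
    dst_addr: str = ""


def resolve_prepare_src(cm_id: CmId, src: str, dst: str,
                        copy_dst_when_bound: bool) -> bool:
    if cm_id.state is State.ADDR_BOUND:
        # The ULP already called bind, so only the state transition
        # happens here. Without the fix, dst_addr is never written.
        cm_id.state = State.ADDR_QUERY
        if copy_dst_when_bound:
            cm_id.dst_addr = dst
        return True
    if cm_id.state is not State.IDLE:
        return False
    # Idle ID: the implicit bind records both addresses.
    cm_id.src_addr = src
    cm_id.dst_addr = dst
    cm_id.state = State.ADDR_QUERY
    return True
```

With an ID that is already in `State.ADDR_BOUND`, passing `copy_dst_when_bound=False` leaves `dst_addr` empty, which corresponds to the all-zero destination seen in the log lines of the commit message; passing `True` records the destination.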
jira VULN-158126
cve CVE-2022-50543
commit-author Li Zhijian <lizhijian@fujitsu.com>
commit 7d984da

rxe_mr_cleanup() which tries to free mr->map again will be called when rxe_mr_init_user() fails:

   CPU: 0 PID: 4917 Comm: rdma_flush_serv Kdump: loaded Not tainted 6.1.0-rc1-roce-flush+ #25
   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
   Call Trace:
    <TASK>
    dump_stack_lvl+0x45/0x5d
    panic+0x19e/0x349
    end_report.part.0+0x54/0x7c
    kasan_report.cold+0xa/0xf
    rxe_mr_cleanup+0x9d/0xf0 [rdma_rxe]
    __rxe_cleanup+0x10a/0x1e0 [rdma_rxe]
    rxe_reg_user_mr+0xb7/0xd0 [rdma_rxe]
    ib_uverbs_reg_mr+0x26a/0x480 [ib_uverbs]
    ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x1a2/0x250 [ib_uverbs]
    ib_uverbs_cmd_verbs+0x1397/0x15a0 [ib_uverbs]

This issue was firstly exposed since commit b18c7da ("RDMA/rxe: Fix memory leak in error path code") and then we fixed it in commit 8ff5f5d ("RDMA/rxe: Prevent double freeing rxe_map_set()") but this fix was reverted together at last by commit 1e75550 (Revert "RDMA/rxe: Create duplicate mapping tables for FMRs")

Simply let rxe_mr_cleanup() always handle freeing the mr->map once it is successfully allocated.

Fixes: 1e75550 ("Revert "RDMA/rxe: Create duplicate mapping tables for FMRs"")
Link: https://lore.kernel.org/r/1667099073-2-1-git-send-email-lizhijian@fujitsu.com
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
(cherry picked from commit 7d984da)
Signed-off-by: Marcin Wcisło <marcin.wcislo@conclusive.pl>
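The ownership rule in this commit message (only the cleanup routine frees the map) can be sketched with a toy allocator that detects double frees. Everything here is a hypothetical model: `Heap`, `Mr`, `init_user`, `cleanup`, `register` and the `free_on_error` switch are illustration names, not rxe driver functions.

```python
class Heap:
    """Tiny allocator model that raises on a double free."""

    def __init__(self):
        self.live = set()
        self._next = 0

    def alloc(self):
        self._next += 1
        self.live.add(self._next)
        return self._next

    def free(self, handle):
        if handle not in self.live:
            raise RuntimeError("double free of handle %d" % handle)
        self.live.remove(handle)


class Mr:
    """Toy memory region; `map` models mr->map."""

    def __init__(self):
        self.map = None


def init_user(heap, mr, free_on_error):
    mr.map = heap.alloc()
    # Pretend a later step (for example pinning user pages) fails here.
    if free_on_error:
        heap.free(mr.map)  # buggy shape: mr.map is left dangling
    return False


def cleanup(heap, mr):
    # Single owner of the map: frees it whenever it was allocated.
    if mr.map is not None:
        heap.free(mr.map)
        mr.map = None


def register(heap, free_on_error):
    mr = Mr()
    if not init_user(heap, mr, free_on_error):
        cleanup(heap, mr)  # always runs on the failure path
        return None
    return mr
```

With `free_on_error=True` the model raises on the second free, mirroring the KASAN report; with `free_on_error=False` the map is released exactly once by `cleanup()`.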
jira VULN-154318
cve CVE-2023-53178
commit-author Domenico Cerasuolo <cerasuolodomenico@gmail.com>
commit 04fc781
upstream-diff Version ciqlts8_6 lacks commit 75fa68a ("mm/swap: convert delete_from_swap_cache() to take a folio") so `delete_from_swap_cache()' operates on pages directly and the `page_folio()' call is not needed. (That function is not defined in ciqlts8_6 anyway, as it was introduced in a non-backported commit 7b230db)

The zswap writeback mechanism can cause a race condition resulting in memory corruption, where a swapped out page gets swapped in with data that was written to a different page.

The race unfolds like this:
1. a page with data A and swap offset X is stored in zswap
2. page A is removed off the LRU by zpool driver for writeback in zswap-shrink work, data for A is mapped by zpool driver
3. user space program faults and invalidates page entry A, offset X is considered free
4. kswapd stores page B at offset X in zswap (zswap could also be full, if so, page B would then be IOed to X, then skip step 5.)
5. entry A is replaced by B in tree->rbroot, this doesn't affect the local reference held by zswap-shrink work
6. zswap-shrink work writes back A at X, and frees zswap entry A
7. swapin of slot X brings A in memory instead of B

The fix:
Once the swap page cache has been allocated (case ZSWAP_SWAPCACHE_NEW), zswap-shrink work just checks that the local zswap_entry reference is still the same as the one in the tree. If it's not the same it means that it's either been invalidated or replaced, in both cases the writeback is aborted because the local entry contains stale data.

Reproducer:
I originally found this by running `stress` overnight to validate my work on the zswap writeback mechanism, it manifested after hours on my test machine. The key to make it happen is having zswap writebacks, so whatever setup pumps /sys/kernel/debug/zswap/written_back_pages should do the trick.

In order to reproduce this faster on a vm, I setup a system with ~100M of available memory and a 500M swap file, then running `stress --vm 1 --vm-bytes 300000000 --vm-stride 4000` makes it happen in matter of tens of minutes. One can speed things up even more by swinging /sys/module/zswap/parameters/max_pool_percent up and down between, say, 20 and 1; this makes it reproduce in tens of seconds. It's crucial to set `--vm-stride` to something other than 4096 otherwise `stress` won't realize that memory has been corrupted because all pages would have the same data.

Link: https://lkml.kernel.org/r/20230503151200.19707-1-cerasuolodomenico@gmail.com
Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Chris Li (Google) <chrisl@kernel.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 04fc781)
Signed-off-by: Marcin Wcisło <marcin.wcislo@conclusive.pl>
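The seven-step race and the recheck that closes it can be replayed deterministically in a toy model. This is a sketch under strong simplifying assumptions: the zswap tree and the swap cache are plain dictionaries, there is no locking or real I/O, and `Entry`, `writeback`, `swapin`, `run_race` and the `recheck` switch are hypothetical names.

```python
class Entry:
    """Toy zswap entry holding the page contents."""

    def __init__(self, data):
        self.data = data


def writeback(tree, swap_cache, offset, entry, recheck):
    """Write `entry` back; `entry` is the shrinker's local reference."""
    swap_cache[offset] = None  # new swap cache page (ZSWAP_SWAPCACHE_NEW)
    if recheck and tree.get(offset) is not entry:
        # The entry was invalidated or replaced: drop the page and abort.
        del swap_cache[offset]
        return False
    swap_cache[offset] = entry.data  # page filled from the local entry
    return True


def swapin(tree, swap_cache, offset):
    # The swap cache is consulted first, then zswap.
    if offset in swap_cache:
        return swap_cache[offset]
    return tree[offset].data


def run_race(recheck):
    x = 7                          # swap offset X
    tree, swap_cache = {}, {}
    tree[x] = Entry("A")           # step 1: A stored at X
    local = tree[x]                # step 2: shrinker picks A for writeback
    del tree[x]                    # step 3: A invalidated, X is free
    tree[x] = Entry("B")           # steps 4-5: B stored at X
    writeback(tree, swap_cache, x, local, recheck)  # step 6
    return swapin(tree, swap_cache, x)              # step 7
```

Calling `run_race(False)` yields the stale data of A, while `run_race(True)` aborts the writeback and the swap-in sees B.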
jira VULN-175655
cve CVE-2024-26832
commit-author Yosry Ahmed <yosryahmed@google.com>
commit e3b63e9
upstream-diff Manual change; no commit was actually cherry-picked as the auto resolver puts the change completely out of place and its contents had to be rewritten entirely anyway. Kernel ciqlts8_6 lacks the transition from page to folio in `zswap_writeback_entry()' introduced in 96c7b0b, so in the backported version `unlock_page(page)' is used instead of `folio_unlock(folio)' and `put_page(page)' instead of `folio_put(folio)'. (See also the relevant, non-backported commit 4e13642).

In the upstream, up until e3b63e9, the `zswap_writeback_entry()' function underwent major refactors in ff9d5ba, 98804a9, 32acba4 and 96c7b0b compared to the ciqlts8_6 version, but the writeback race path addressed by the fix remained largely intact, and in ciqlts8_6 falls under the ZSWAP_SWAPCACHE_NEW case.

In zswap_writeback_entry(), after we get a folio from __read_swap_cache_async(), we grab the tree lock again to check that the swap entry was not invalidated and recycled. If it was, we delete the folio we just added to the swap cache and exit.

However, __read_swap_cache_async() returns the folio locked when it is newly allocated, which is always true for this path, and the folio is ref'd. Make sure to unlock and put the folio before returning.

This was discovered by code inspection, probably because this path handles a race condition that should not happen often, and the bug would not crash the system, it will only strand the folio indefinitely.

Link: https://lkml.kernel.org/r/20240125085127.1327013-1-yosryahmed@google.com
Fixes: 04fc781 ("mm: fix zswap writeback race condition")
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Nhat Pham <nphamcs@gmail.com>
Cc: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit e3b63e9)
Signed-off-by: Marcin Wcisło <marcin.wcislo@conclusive.pl>
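The stranded page described in this commit message comes down to lock and reference bookkeeping on the abort path. The sketch below models it with a toy `Page` object; `read_swap_cache_new`, `delete_from_swap_cache`, `abort_writeback` and the `release` switch are simplified stand-ins assumed for illustration, not the kernel functions of similar names.

```python
class Page:
    """Toy page with a lock bit and a reference count."""

    def __init__(self):
        self.locked = False
        self.refcount = 0


def read_swap_cache_new(swap_cache, offset):
    """Model of the newly-allocated case: page comes back locked and ref'd."""
    page = Page()
    page.refcount = 1        # reference held by the caller
    page.locked = True       # returned locked when newly allocated
    swap_cache[offset] = page
    page.refcount += 1       # reference held by the swap cache
    return page


def delete_from_swap_cache(swap_cache, offset):
    page = swap_cache.pop(offset)
    page.refcount -= 1       # swap cache drops its reference


def abort_writeback(swap_cache, offset, page, release):
    """Abort path taken when the entry was invalidated or replaced."""
    delete_from_swap_cache(swap_cache, offset)
    if release:
        page.locked = False  # models unlock_page(page)
        page.refcount -= 1   # models put_page(page)
```

Without `release`, the page leaves the abort path still locked and with one reference outstanding, so nothing can ever free it; with `release`, the count drops to zero.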
[LTS 8.6]
Commits
CVE-2023-2176 (+ CVE-2022-48925)
Commit "RDMA/cma: Do not change route.addr.src_addr outside state checks" was a prerequisite for "RDMA/core: Refactor rdma_bind_addr", with its own CVE-2022-48925. The explanation for the CVE-2023-2176 fix (which isn't a mere code refactor) can be found in #599 for LTS 9.2. Another, independently developed solution converging to the same result for FIPS 9 compliant can be found in #584.

While RHEL 8 is listed as not affected by CVE-2022-48925 on https://access.redhat.com/security/cve/cve-2022-48925, the discussion on Slack led to the inclusion of the fix for LTS 8.6 anyway, with a dedicated jira ticket.
CVE-2022-50543
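The question of whether LTS 8.6 is affected by CVE-2022-50543 is a bit convoluted. From the commit message of the upstream fix 7d984da:

    This issue was firstly exposed since commit b18c7da ("RDMA/rxe: Fix memory leak in error path code") and then we fixed it in commit 8ff5f5d ("RDMA/rxe: Prevent double freeing rxe_map_set()") but this fix was reverted together at last by commit 1e75550 (Revert "RDMA/rxe: Create duplicate mapping tables for FMRs")

In LTS 8.6 the last two commits are missing, while the first one is backported as 09e1409, so the bug applies.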
The reason 8ff5f5d ("RDMA/rxe: Prevent double freeing rxe_map_set()") wasn't used for the actual fix on LTS 8.6 is that it requires 647bf13 ("RDMA/rxe: Create duplicate mapping tables for FMRs") as a prerequisite. The reverting commit 1e75550 reverts both of them, so the addressed function rxe_mr_init_user(…) goes back to its state at b18c7da ("RDMA/rxe: Fix memory leak in error path code"). As a result it is 7d984da that cherry-picks cleanly on ciqlts8_6, not 8ff5f5d, despite the latter being chronologically the first solution to this problem.

CVE-2023-53178 (+ CVE-2024-26832)
The CVE-2024-26832 fix is a bugfix of the CVE-2023-53178 fix.
The inclusion of the e3b63e9 backport can be contested given RH's classification of RHEL 8 (and others) as "Not affected" by CVE-2024-26832: https://access.redhat.com/security/cve/cve-2024-26832. However, the fix specifically addresses the "writeback race path" added by 04fc781 in the zswap_writeback_entry() function; therefore LTS 8.6 is either affected by both CVE-2023-53178 and CVE-2024-26832 (provided that the former was fixed by 04fc781), or by neither of them. It's likely that RH's CVE-2024-26832 evaluation was simply the result of CVE-2023-53178 not being fixed at the time yet: see "Last modified: October 3, 2025" at https://access.redhat.com/security/cve/cve-2024-26832 and "Issued: 2025-11-12" at https://access.redhat.com/errata/RHSA-2025:21084.

Additionally, the end result (of backporting both CVE fixes) is the same as in rocky8_10: compare the function zswap_writeback_entry() in rocky8_10 (https://github.com/ctrliq/kernel-src-tree/blob/rocky8_10/mm/zswap.c#L892) and in this patch, specifically the case ZSWAP_SWAPCACHE_NEW fragment. The CVEs were addressed in rocky8_10 in the back-engineered commits 60f2415 and b923a51. Given the very similar histories of the mm/zswap.c file in ciqlts8_6 and rocky8_10, this further supports the thesis that LTS 8.6 is affected by CVE-2024-26832. It's worth noting that the Rocky 8.10 solution was discovered only after the vulnerabilities were already solved on LTS 8.6, so the convergence of the results increases confidence in the solution.

kABI check: passed
Boot test: passed
boot-test.log
Kselftests: passed relative
Reference
kselftests–ciqlts8_6–run1.log
Patch
kselftests–ciqlts8_6-CVE-batch-21–run1.log
Comparison
The test results for the reference and the patch are the same.