| In the Linux kernel, the following vulnerability has been resolved:
bnxt_en: Fix lockdep warning during rmmod
The commit under the Fixes tag added a netdev_assert_locked() in
bnxt_free_ntp_fltrs(). The lock should be held during normal run-time,
but the assert is triggered (see below) during bnxt_remove_one(), which
should not need the lock. The netdev is already unregistered by
then. Fix it by calling netdev_assert_locked_or_invisible(), which will
not assert if the netdev is unregistered.
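A minimal sketch of the one-line change, assuming the helper's current signature (the filter-freeing body is elided):

```c
static void bnxt_free_ntp_fltrs(struct bnxt *bp, bool all)
{
	/* was: netdev_assert_locked(bp->dev); it fired during rmmod
	 * because the netdev is already unregistered at that point */
	netdev_assert_locked_or_invisible(bp->dev);

	/* free the NTUPLE filter structures (elided) */
}
```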
WARNING: CPU: 5 PID: 2241 at ./include/net/netdev_lock.h:17 bnxt_free_ntp_fltrs+0xf8/0x100 [bnxt_en]
Modules linked in: rpcrdma rdma_cm iw_cm ib_cm configfs ib_core bnxt_en(-) bridge stp llc x86_pkg_temp_thermal xfs tg3 [last unloaded: bnxt_re]
CPU: 5 UID: 0 PID: 2241 Comm: rmmod Tainted: G S W 6.16.0 #2 PREEMPT(voluntary)
Tainted: [S]=CPU_OUT_OF_SPEC, [W]=WARN
Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.4.3 01/17/2017
RIP: 0010:bnxt_free_ntp_fltrs+0xf8/0x100 [bnxt_en]
Code: 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 48 8b 47 60 be ff ff ff ff 48 8d b8 28 0c 00 00 e8 d0 cf 41 c3 85 c0 0f 85 2e ff ff ff <0f> 0b e9 27 ff ff ff 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90
RSP: 0018:ffffa92082387da0 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff9e5b593d8000 RCX: 0000000000000001
RDX: 0000000000000001 RSI: ffffffff83dc9a70 RDI: ffffffff83e1a1cf
RBP: ffff9e5b593d8c80 R08: 0000000000000000 R09: ffffffff8373a2b3
R10: 000000008100009f R11: 0000000000000001 R12: 0000000000000001
R13: ffffffffc01c4478 R14: dead000000000122 R15: dead000000000100
FS: 00007f3a8a52c740(0000) GS:ffff9e631ad1c000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055bb289419c8 CR3: 000000011274e001 CR4: 00000000003706f0
Call Trace:
<TASK>
bnxt_remove_one+0x57/0x180 [bnxt_en]
pci_device_remove+0x39/0xc0
device_release_driver_internal+0xa5/0x130
driver_detach+0x42/0x90
bus_remove_driver+0x61/0xc0
pci_unregister_driver+0x38/0x90
bnxt_exit+0xc/0x7d0 [bnxt_en] |
| In the Linux kernel, the following vulnerability has been resolved:
dm: dm-crypt: Do not partially accept write BIOs with zoned targets
Read and write operations issued to a dm-crypt target may be split
according to the dm-crypt internal limits defined by the max_read_size
and max_write_size module parameters (default is 128 KB). The intent is
to improve processing time of large BIOs by splitting them into smaller
operations that can be parallelized on different CPUs.
For zoned dm-crypt targets, this BIO splitting is still done but without
the parallel execution to ensure that the issuing order of write
operations to the underlying devices remains sequential. However, the
splitting itself causes other problems:
1) Since dm-crypt relies on the block layer zone write plugging to
handle zone append emulation using regular write operations, the
remainder of a split write BIO will always be plugged into the target
zone write plug. Once the on-going write BIO finishes, this
remainder BIO is unplugged and issued from the zone write plug work.
If this remainder BIO itself needs to be split, the remainder will be
re-issued and plugged again, but that causes a call to
blk_queue_enter(), which may block if a queue freeze operation was
initiated. This results in a deadlock as the DM submission path still
holds BIOs that the queue freeze side is waiting for.
2) dm-crypt relies on the emulation done by the block layer using
regular write operations for processing zone append operations. This
still requires properly returning the written sector as the BIO
sector of the original BIO. However, this can be done correctly if and
only if there is a single clone BIO used for processing the
original zone append operation issued by the user. If the size of a
zone append operation is larger than dm-crypt max_write_size, then
the original BIO will be split and processed as a chain of regular
write operations. Such chaining results in an incorrect written sector
being returned to the zone append issuer using the original BIO
sector. This in turn results in file system data corruption when using
xfs or btrfs.
Fix this by modifying get_max_request_size() to always return the size
of the BIO, to avoid it being split with dm_accept_partial_bio() in
crypt_map(). get_max_request_size() is renamed to
get_max_request_sectors() to clarify the unit of the value returned,
and its interface is changed to take a struct dm_target pointer and a
pointer to the struct bio being processed. In addition to this change,
to ensure that crypt_alloc_buffer() works correctly, set the dm-crypt
device max_hw_sectors limit to be at most
BIO_MAX_VECS << PAGE_SECTORS_SHIFT (1 MB with a 4KB page architecture).
This forces DM core to split write BIOs before passing them to
crypt_map(), thus guaranteeing that dm-crypt can always accept an
entire write BIO without needing to split it.
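A hedged sketch of the reworked helper, using the names given in the message (max_read_size and max_write_size are the byte-based module parameters; the conversion and zoned check are simplified assumptions):

```c
static unsigned int get_max_request_sectors(struct dm_target *ti,
					    struct bio *bio)
{
	struct crypt_config *cc = ti->private;
	bool wrt = op_is_write(bio_op(bio));

	/* Zoned targets must accept every write BIO whole: splitting
	 * would break zone append emulation and written-sector return. */
	if (wrt && bdev_is_zoned(cc->dev->bdev))
		return bio_sectors(bio);

	return (wrt ? max_write_size : max_read_size) >> SECTOR_SHIFT;
}
```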
This change does not have any effect on the read path of dm-crypt. Read
operations can still be split and the BIO fragments processed in
parallel. There is also no impact on the performance of the write path
given that all zone write BIOs were already processed inline instead of
in parallel.
This change also does not affect in any way regular dm-crypt block
devices. |
| In the Linux kernel, the following vulnerability has been resolved:
ext4: avoid deadlock in fs reclaim with page writeback
Ext4 has a filesystem wide lock protecting ext4_writepages() calls to
avoid races with switching of journalled data flag or inode format. This
lock can however cause a deadlock like:
CPU0                                CPU1
ext4_writepages()
  percpu_down_read(sbi->s_writepages_rwsem);
                                    ext4_change_inode_journal_flag()
                                      percpu_down_write(sbi->s_writepages_rwsem);
                                        - blocks, all readers block from now on
ext4_do_writepages()
  ext4_init_io_end()
    kmem_cache_zalloc(io_end_cachep, GFP_KERNEL)
      fs_reclaim frees dentry...
        dentry_unlink_inode()
          iput() - last ref =>
            iput_final() - inode dirty =>
              write_inode_now()...
                ext4_writepages() tries to acquire sbi->s_writepages_rwsem
                  and blocks forever
Make sure we cannot recurse into filesystem reclaim from writeback code
to avoid the deadlock.
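A minimal sketch of the avoidance, assuming the fix scopes writeback allocations with memalloc_nofs_save() while s_writepages_rwsem is held (the helper names are illustrative):

```c
#include <linux/sched/mm.h>

/* Pair the rwsem with an NOFS allocation scope so any allocation made
 * under it (e.g. the io_end) cannot recurse into filesystem reclaim. */
static int ext4_writepages_down_read(struct super_block *sb)
{
	percpu_down_read(&EXT4_SB(sb)->s_writepages_rwsem);
	return memalloc_nofs_save();
}

static void ext4_writepages_up_read(struct super_block *sb, int alloc_ctx)
{
	memalloc_nofs_restore(alloc_ctx);
	percpu_up_read(&EXT4_SB(sb)->s_writepages_rwsem);
}
```
Wrapping both in one helper keeps the NOFS scope exactly coextensive with the rwsem. |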
| In the Linux kernel, the following vulnerability has been resolved:
e1000: Move cancel_work_sync to avoid deadlock
Previously, e1000_down called cancel_work_sync for the e1000 reset task
(via e1000_down_and_stop), which takes RTNL.
As reported by users and syzbot, a deadlock is possible in the following
scenario:
CPU 0:
- RTNL is held
- e1000_close
- e1000_down
- cancel_work_sync (cancel / wait for e1000_reset_task())
CPU 1:
- process_one_work
- e1000_reset_task
- take RTNL
To remedy this, avoid calling cancel_work_sync from e1000_down
(e1000_reset_task does nothing if the device is down anyway). Instead,
call cancel_work_sync for e1000_reset_task when the device is being
removed.
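A hedged sketch of the rearrangement (teardown details elided); the point is that the cancel now runs where RTNL is not held:

```c
static void e1000_remove(struct pci_dev *pdev)
{
	struct net_device *netdev = pci_get_drvdata(pdev);
	struct e1000_adapter *adapter = netdev_priv(netdev);

	/* Moved here from e1000_down_and_stop(): no RTNL is held, so the
	 * reset task can take RTNL and finish, or be cancelled cleanly. */
	cancel_work_sync(&adapter->reset_task);

	unregister_netdev(netdev);	/* takes RTNL internally */
	/* remaining teardown elided */
}
```
e1000_down keeps the rest of its teardown duties; only the cancel moves. |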
| In the Linux kernel, the following vulnerability has been resolved:
smb: client: fix potential deadlock when reconnecting channels
Fix cifs_signal_cifsd_for_reconnect() to take the locks in the correct
order and prevent the deadlock reported in the lockdep splat below.
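A hedged sketch of the corrected ordering, assuming the hierarchy shown in the report (cifs_tcp_ses_lock, then srv_lock, then ses_lock, then chan_lock); the per-channel marking is elided:

```c
struct TCP_Server_Info *server;
struct cifs_ses *ses;

/* Take each server's srv_lock before descending into its sessions,
 * instead of acquiring it last while chan_lock is already held. */
spin_lock(&cifs_tcp_ses_lock);
list_for_each_entry(server, &cifs_tcp_ses_list, tcp_ses_list) {
	spin_lock(&server->srv_lock);		/* outermost per-server */
	list_for_each_entry(ses, &server->smb_ses_list, smb_ses_list) {
		spin_lock(&ses->ses_lock);
		spin_lock(&ses->chan_lock);	/* innermost */
		/* mark this session's channels for reconnect */
		spin_unlock(&ses->chan_lock);
		spin_unlock(&ses->ses_lock);
	}
	spin_unlock(&server->srv_lock);
}
spin_unlock(&cifs_tcp_ses_lock);
```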
======================================================
WARNING: possible circular locking dependency detected
6.16.0-rc3-build2+ #1301 Tainted: G S W
------------------------------------------------------
cifsd/6055 is trying to acquire lock:
ffff88810ad56038 (&tcp_ses->srv_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0x134/0x200
but task is already holding lock:
ffff888119c64330 (&ret_buf->chan_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0xcf/0x200
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (&ret_buf->chan_lock){+.+.}-{3:3}:
validate_chain+0x1cf/0x270
__lock_acquire+0x60e/0x780
lock_acquire.part.0+0xb4/0x1f0
_raw_spin_lock+0x2f/0x40
cifs_setup_session+0x81/0x4b0
cifs_get_smb_ses+0x771/0x900
cifs_mount_get_session+0x7e/0x170
cifs_mount+0x92/0x2d0
cifs_smb3_do_mount+0x161/0x460
smb3_get_tree+0x55/0x90
vfs_get_tree+0x46/0x180
do_new_mount+0x1b0/0x2e0
path_mount+0x6ee/0x740
do_mount+0x98/0xe0
__do_sys_mount+0x148/0x180
do_syscall_64+0xa4/0x260
entry_SYSCALL_64_after_hwframe+0x76/0x7e
-> #1 (&ret_buf->ses_lock){+.+.}-{3:3}:
validate_chain+0x1cf/0x270
__lock_acquire+0x60e/0x780
lock_acquire.part.0+0xb4/0x1f0
_raw_spin_lock+0x2f/0x40
cifs_match_super+0x101/0x320
sget+0xab/0x270
cifs_smb3_do_mount+0x1e0/0x460
smb3_get_tree+0x55/0x90
vfs_get_tree+0x46/0x180
do_new_mount+0x1b0/0x2e0
path_mount+0x6ee/0x740
do_mount+0x98/0xe0
__do_sys_mount+0x148/0x180
do_syscall_64+0xa4/0x260
entry_SYSCALL_64_after_hwframe+0x76/0x7e
-> #0 (&tcp_ses->srv_lock){+.+.}-{3:3}:
check_noncircular+0x95/0xc0
check_prev_add+0x115/0x2f0
validate_chain+0x1cf/0x270
__lock_acquire+0x60e/0x780
lock_acquire.part.0+0xb4/0x1f0
_raw_spin_lock+0x2f/0x40
cifs_signal_cifsd_for_reconnect+0x134/0x200
__cifs_reconnect+0x8f/0x500
cifs_handle_standard+0x112/0x280
cifs_demultiplex_thread+0x64d/0xbc0
kthread+0x2f7/0x310
ret_from_fork+0x2a/0x230
ret_from_fork_asm+0x1a/0x30
other info that might help us debug this:
Chain exists of:
&tcp_ses->srv_lock --> &ret_buf->ses_lock --> &ret_buf->chan_lock
Possible unsafe locking scenario:
       CPU0                         CPU1
       ----                         ----
  lock(&ret_buf->chan_lock);
                                    lock(&ret_buf->ses_lock);
                                    lock(&ret_buf->chan_lock);
  lock(&tcp_ses->srv_lock);
*** DEADLOCK ***
3 locks held by cifsd/6055:
#0: ffffffff857de398 (&cifs_tcp_ses_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0x7b/0x200
#1: ffff888119c64060 (&ret_buf->ses_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0x9c/0x200
#2: ffff888119c64330 (&ret_buf->chan_lock){+.+.}-{3:3}, at: cifs_signal_cifsd_for_reconnect+0xcf/0x200 |
| In the Linux kernel, the following vulnerability has been resolved:
af_packet: move notifier's packet_dev_mc out of rcu critical section
Syzkaller reports the following issue:
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:578
__mutex_lock+0x106/0xe80 kernel/locking/mutex.c:746
team_change_rx_flags+0x38/0x220 drivers/net/team/team_core.c:1781
dev_change_rx_flags net/core/dev.c:9145 [inline]
__dev_set_promiscuity+0x3f8/0x590 net/core/dev.c:9189
netif_set_promiscuity+0x50/0xe0 net/core/dev.c:9201
dev_set_promiscuity+0x126/0x260 net/core/dev_api.c:286 packet_dev_mc net/packet/af_packet.c:3698 [inline]
packet_dev_mclist_delete net/packet/af_packet.c:3722 [inline]
packet_notifier+0x292/0xa60 net/packet/af_packet.c:4247
notifier_call_chain+0x1b3/0x3e0 kernel/notifier.c:85
call_netdevice_notifiers_extack net/core/dev.c:2214 [inline]
call_netdevice_notifiers net/core/dev.c:2228 [inline]
unregister_netdevice_many_notify+0x15d8/0x2330 net/core/dev.c:11972
rtnl_delete_link net/core/rtnetlink.c:3522 [inline]
rtnl_dellink+0x488/0x710 net/core/rtnetlink.c:3564
rtnetlink_rcv_msg+0x7cf/0xb70 net/core/rtnetlink.c:6955
netlink_rcv_skb+0x219/0x490 net/netlink/af_netlink.c:2534
Calling `PACKET_ADD_MEMBERSHIP` on an ops-locked device can trigger
the `NETDEV_UNREGISTER` notifier, which may require disabling promiscuous
and/or allmulti mode. Both of these operations require acquiring
the netdev instance lock.
Move the call to `packet_dev_mc` outside of the RCU critical section.
The `mclist` modifications (add, del, flush, unregister) are protected by
the RTNL, not the RCU. The RCU only protects the `sklist` and its
associated `sks`. The delayed operation on the `mclist` entry remains
within the RTNL.
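A sketch of the two-phase shape the commit describes, with simplified and partly hypothetical names (packet_mclist_detach is illustrative; packet_dev_mc and the mclist fields are from af_packet):

```c
/* Phase 1: runs under RTNL (notifier context); mclist needs no RCU. */
static struct packet_mclist *packet_mclist_detach(struct packet_sock *po,
						  int ifindex)
{
	struct packet_mclist *ml, **mlp = &po->mclist;

	while ((ml = *mlp) != NULL) {
		if (ml->ifindex == ifindex) {
			*mlp = ml->next;	/* unlink only */
			return ml;
		}
		mlp = &ml->next;
	}
	return NULL;
}

/* Phase 2: caller, after rcu_read_unlock(); this may sleep because it
 * can take the netdev instance lock. */
	ml = packet_mclist_detach(po, dev->ifindex);
	if (ml) {
		packet_dev_mc(dev, ml, -1);
		kfree(ml);
	}
```
The RTNL, held by the notifier path, is what makes the unlinked entry safe to hand off. |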
| A denial of service vulnerability was found in tipc_crypto_key_revoke in net/tipc/crypto.c in the Linux kernel’s TIPC subsystem. This flaw allows guests with local user privileges to trigger a deadlock and potentially crash the system. |
| A denial of service vulnerability due to a deadlock was found in sctp_auto_asconf_init in net/sctp/socket.c in the Linux kernel’s SCTP subsystem. This flaw allows guests with local user privileges to trigger a deadlock and potentially crash the system. |
| In the Linux kernel, the following vulnerability has been resolved:
usb: typec: tcpm: move tcpm_queue_vdm_unlocked to asynchronous work
A state check was previously added to tcpm_queue_vdm_unlocked to
prevent a deadlock where the DisplayPort Alt Mode driver would be
executing work and attempting to grab the tcpm_lock while the TCPM
was holding the lock and attempting to unregister the altmode, blocking
on the altmode driver's cancel_work_sync call.
Because the state check isn't protected, there is a small window
where the Alt Mode driver could determine that the TCPM is
in a ready state and attempt to grab the lock while the
TCPM grabs the lock and changes the TCPM state to one that
causes the deadlock. The callstack is provided below:
[110121.667392][ C7] Call trace:
[110121.667396][ C7] __switch_to+0x174/0x338
[110121.667406][ C7] __schedule+0x608/0x9f0
[110121.667414][ C7] schedule+0x7c/0xe8
[110121.667423][ C7] kernfs_drain+0xb0/0x114
[110121.667431][ C7] __kernfs_remove+0x16c/0x20c
[110121.667436][ C7] kernfs_remove_by_name_ns+0x74/0xe8
[110121.667442][ C7] sysfs_remove_group+0x84/0xe8
[110121.667450][ C7] sysfs_remove_groups+0x34/0x58
[110121.667458][ C7] device_remove_groups+0x10/0x20
[110121.667464][ C7] device_release_driver_internal+0x164/0x2e4
[110121.667475][ C7] device_release_driver+0x18/0x28
[110121.667484][ C7] bus_remove_device+0xec/0x118
[110121.667491][ C7] device_del+0x1e8/0x4ac
[110121.667498][ C7] device_unregister+0x18/0x38
[110121.667504][ C7] typec_unregister_altmode+0x30/0x44
[110121.667515][ C7] tcpm_reset_port+0xac/0x370
[110121.667523][ C7] tcpm_snk_detach+0x84/0xb8
[110121.667529][ C7] run_state_machine+0x4c0/0x1b68
[110121.667536][ C7] tcpm_state_machine_work+0x94/0xe4
[110121.667544][ C7] kthread_worker_fn+0x10c/0x244
[110121.667552][ C7] kthread+0x104/0x1d4
[110121.667557][ C7] ret_from_fork+0x10/0x20
[110121.667689][ C7] Workqueue: events dp_altmode_work
[110121.667697][ C7] Call trace:
[110121.667701][ C7] __switch_to+0x174/0x338
[110121.667710][ C7] __schedule+0x608/0x9f0
[110121.667717][ C7] schedule+0x7c/0xe8
[110121.667725][ C7] schedule_preempt_disabled+0x24/0x40
[110121.667733][ C7] __mutex_lock+0x408/0xdac
[110121.667741][ C7] __mutex_lock_slowpath+0x14/0x24
[110121.667748][ C7] mutex_lock+0x40/0xec
[110121.667757][ C7] tcpm_altmode_enter+0x78/0xb4
[110121.667764][ C7] typec_altmode_enter+0xdc/0x10c
[110121.667769][ C7] dp_altmode_work+0x68/0x164
[110121.667775][ C7] process_one_work+0x1e4/0x43c
[110121.667783][ C7] worker_thread+0x25c/0x430
[110121.667789][ C7] kthread+0x104/0x1d4
[110121.667794][ C7] ret_from_fork+0x10/0x20
Change tcpm_queue_vdm_unlocked to queue tcpm_queue_vdm_work, which can
perform the state check while holding the TCPM lock, with the Alt
Mode lock no longer held. This requires a new struct to hold the vdm
data, altmode_vdm_event.
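A hedged sketch of the deferred path; the struct comes from the commit message, while the ready-state check and the ownership of the data buffer are assumptions:

```c
struct altmode_vdm_event {
	struct kthread_work work;
	struct tcpm_port *port;
	u32 header;
	u32 *data;
	int cnt;
};

static void tcpm_queue_vdm_work(struct kthread_work *work)
{
	struct altmode_vdm_event *event =
		container_of(work, struct altmode_vdm_event, work);
	struct tcpm_port *port = event->port;

	mutex_lock(&port->lock);
	/* Re-check the state with the TCPM lock held: a VDM queued for
	 * a port that has left the ready states is silently dropped. */
	if (port->state == SRC_READY || port->state == SNK_READY)
		tcpm_queue_vdm(port, event->header, event->data, event->cnt);
	mutex_unlock(&port->lock);
	kfree(event->data);
	kfree(event);
}
```
In this sketch the event owns its copied data and frees it after use. |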
| In the Linux kernel, the following vulnerability has been resolved:
fs/dax: Fix "don't skip locked entries when scanning entries"
Commit 6be3e21d25ca ("fs/dax: don't skip locked entries when scanning
entries") introduced a new function, wait_entry_unlocked_exclusive(),
which waits for the current entry to become unlocked without advancing
the XArray iterator state.
Waiting for the entry to become unlocked requires dropping the XArray
lock. This requires calling xas_pause() prior to dropping the lock
which leaves the xas in a suitable state for the next iteration. However
this has the side-effect of advancing the xas state to the next index.
Normally this isn't an issue because xas_for_each() contains code to
detect this state and thus avoid advancing the index a second time on
the next loop iteration.
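To make the hazard concrete, here is an illustrative wait helper (not the upstream code) showing where xas_pause() advances the iterator and an explicit rewind that compensates:

```c
/* Illustrative only: xas_pause() moves xa_index past the current entry,
 * so after re-locking we rewind to the saved index before reloading. */
static void *wait_entry_unlocked_exclusive_sketch(struct xa_state *xas)
{
	unsigned long index = xas->xa_index;	/* entry being waited on */

	xas_pause(xas);		/* leaves xas valid, but advances the index */
	xas_unlock_irq(xas);
	/* sleep here until the entry's lock bit is cleared (elided) */
	xas_lock_irq(xas);
	xas_set(xas, index);	/* undo the advance before reloading */
	return xas_load(xas);
}
```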
However, both the callers of wait_entry_unlocked_exclusive() and the
function itself subsequently use the xas state to reload the entry. As
xas_pause() updated the state to the next index, this causes the
current entry being waited on to be skipped. This caused the following
warning to fire intermittently when running xfstest generic/068 on an
XFS filesystem with FS DAX enabled:
[ 35.067397] ------------[ cut here ]------------
[ 35.068229] WARNING: CPU: 21 PID: 1640 at mm/truncate.c:89 truncate_folio_batch_exceptionals+0xd8/0x1e0
[ 35.069717] Modules linked in: nd_pmem dax_pmem nd_btt nd_e820 libnvdimm
[ 35.071006] CPU: 21 UID: 0 PID: 1640 Comm: fstest Not tainted 6.15.0-rc7+ #77 PREEMPT(voluntary)
[ 35.072613] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/204
[ 35.074845] RIP: 0010:truncate_folio_batch_exceptionals+0xd8/0x1e0
[ 35.075962] Code: a1 00 00 00 f6 47 0d 20 0f 84 97 00 00 00 4c 63 e8 41 39 c4 7f 0b eb 61 49 83 c5 01 45 39 ec 7e 58 42 f68
[ 35.079522] RSP: 0018:ffffb04e426c7850 EFLAGS: 00010202
[ 35.080359] RAX: 0000000000000000 RBX: ffff9d21e3481908 RCX: ffffb04e426c77f4
[ 35.081477] RDX: ffffb04e426c79e8 RSI: ffffb04e426c79e0 RDI: ffff9d21e34816e8
[ 35.082590] RBP: ffffb04e426c79e0 R08: 0000000000000001 R09: 0000000000000003
[ 35.083733] R10: 0000000000000000 R11: 822b53c0f7a49868 R12: 000000000000001f
[ 35.084850] R13: 0000000000000000 R14: ffffb04e426c78e8 R15: fffffffffffffffe
[ 35.085953] FS: 00007f9134c87740(0000) GS:ffff9d22abba0000(0000) knlGS:0000000000000000
[ 35.087346] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 35.088244] CR2: 00007f9134c86000 CR3: 000000040afff000 CR4: 00000000000006f0
[ 35.089354] Call Trace:
[ 35.089749] <TASK>
[ 35.090168] truncate_inode_pages_range+0xfc/0x4d0
[ 35.091078] truncate_pagecache+0x47/0x60
[ 35.091735] xfs_setattr_size+0xc7/0x3e0
[ 35.092648] xfs_vn_setattr+0x1ea/0x270
[ 35.093437] notify_change+0x1f4/0x510
[ 35.094219] ? do_truncate+0x97/0xe0
[ 35.094879] do_truncate+0x97/0xe0
[ 35.095640] path_openat+0xabd/0xca0
[ 35.096278] do_filp_open+0xd7/0x190
[ 35.096860] do_sys_openat2+0x8a/0xe0
[ 35.097459] __x64_sys_openat+0x6d/0xa0
[ 35.098076] do_syscall_64+0xbb/0x1d0
[ 35.098647] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 35.099444] RIP: 0033:0x7f9134d81fc1
[ 35.100033] Code: 75 57 89 f0 25 00 00 41 00 3d 00 00 41 00 74 49 80 3d 2a 26 0e 00 00 74 6d 89 da 48 89 ee bf 9c ff ff ff5
[ 35.102993] RSP: 002b:00007ffcd41e0d10 EFLAGS: 00000202 ORIG_RAX: 0000000000000101
[ 35.104263] RAX: ffffffffffffffda RBX: 0000000000000242 RCX: 00007f9134d81fc1
[ 35.105452] RDX: 0000000000000242 RSI: 00007ffcd41e1200 RDI: 00000000ffffff9c
[ 35.106663] RBP: 00007ffcd41e1200 R08: 0000000000000000 R09: 0000000000000064
[ 35.107923] R10: 00000000000001a4 R11: 0000000000000202 R12: 0000000000000066
[ 35.109112] R13: 0000000000100000 R14: 0000000000100000 R15: 0000000000000400
[ 35.110357] </TASK>
[ 35.110769] irq event stamp: 8415587
[ 35.111486] hardirqs last enabled at (8415599): [<ffffffff8d74b562>] __up_console_se
---truncated--- |
| In the Linux kernel, the following vulnerability has been resolved:
block: don't use submit_bio_noacct_nocheck in blk_zone_wplug_bio_work
Bios queued up in the zone write plug have already gone through all the
preparation in the submit_bio path, including the freeze protection.
Submitting them through submit_bio_noacct_nocheck duplicates the work
and can cause deadlocks when freezing a queue with pending bio write
plugs.
Go straight to ->submit_bio or blk_mq_submit_bio to bypass the
superfluous extra freeze protection and checks.
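A hedged sketch of the direct dispatch (simplified; the upstream version also deals with accounting details):

```c
static void blk_zone_wplug_bio_submit(struct bio *bio)
{
	struct block_device *bdev = bio->bi_bdev;

	/* The BIO already passed all submit_bio checks and holds its
	 * queue reference: go straight to the driver. */
	if (bdev->bd_disk->fops->submit_bio)
		bdev->bd_disk->fops->submit_bio(bio);
	else
		blk_mq_submit_bio(bio);
}
```
Either path skips the re-entry into blk_queue_enter(), which is the call that can block behind a freeze. |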
| In the Linux kernel, the following vulnerability has been resolved:
firmware: arm_ffa: Replace mutex with rwlock to avoid sleep in atomic context
The current use of a mutex to protect the notifier hashtable accesses
can lead to issues in atomic context. It results in the below
kernel warnings:
| BUG: sleeping function called from invalid context at kernel/locking/mutex.c:258
| in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 9, name: kworker/0:0
| preempt_count: 1, expected: 0
| RCU nest depth: 0, expected: 0
| CPU: 0 UID: 0 PID: 9 Comm: kworker/0:0 Not tainted 6.14.0 #4
| Workqueue: ffa_pcpu_irq_notification notif_pcpu_irq_work_fn
| Call trace:
| show_stack+0x18/0x24 (C)
| dump_stack_lvl+0x78/0x90
| dump_stack+0x18/0x24
| __might_resched+0x114/0x170
| __might_sleep+0x48/0x98
| mutex_lock+0x24/0x80
| handle_notif_callbacks+0x54/0xe0
| notif_get_and_handle+0x40/0x88
| generic_exec_single+0x80/0xc0
| smp_call_function_single+0xfc/0x1a0
| notif_pcpu_irq_work_fn+0x2c/0x38
| process_one_work+0x14c/0x2b4
| worker_thread+0x2e4/0x3e0
| kthread+0x13c/0x210
| ret_from_fork+0x10/0x20
To address this, replace the mutex with an rwlock to protect the
notifier hashtable accesses. This ensures that read-side locking does
not sleep and that multiple readers can acquire the lock concurrently,
avoiding unnecessary contention and potential deadlocks. Writer access
remains exclusive, preserving correctness.
This change resolves warnings from lockdep about a potential sleep in
atomic context.
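A sketch of the replacement under the assumption that the hashtable guard is a single global lock (the name notifier_lock is illustrative):

```c
static DEFINE_RWLOCK(notifier_lock);	/* replaces the former mutex */

/* Read side: the per-CPU notification IRQ work path runs in atomic
 * context, so the lookup must never sleep. */
read_lock(&notifier_lock);
/* hashtable lookup and callback invocation */
read_unlock(&notifier_lock);

/* Write side: notifier registration and removal stay exclusive. */
write_lock(&notifier_lock);
/* hash_add() / hash_del() on the notifier hashtable */
write_unlock(&notifier_lock);
```
Callbacks invoked under read_lock must themselves be non-sleeping, which the atomic notification path already requires. |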
| In the Linux kernel, the following vulnerability has been resolved:
IB/mlx5: Fix potential deadlock in MR deregistration
The issue arises when kzalloc() is invoked while holding umem_mutex or
any other lock acquired under umem_mutex. This is problematic because
kzalloc() can trigger fs_reclaim_acquire(), which may, in turn, invoke
mmu_notifier_invalidate_range_start(). This function can lead to
mlx5_ib_invalidate_range(), which attempts to acquire umem_mutex again,
resulting in a deadlock.
The problematic flow:
CPU0 | CPU1
---------------------------------------|------------------------------------------------
mlx5_ib_dereg_mr() |
→ revoke_mr() |
→ mutex_lock(&umem_odp->umem_mutex) |
| mlx5_mkey_cache_init()
| → mutex_lock(&dev->cache.rb_lock)
| → mlx5r_cache_create_ent_locked()
| → kzalloc(GFP_KERNEL)
| → fs_reclaim()
| → mmu_notifier_invalidate_range_start()
| → mlx5_ib_invalidate_range()
| → mutex_lock(&umem_odp->umem_mutex)
→ cache_ent_find_and_store() |
→ mutex_lock(&dev->cache.rb_lock) |
Additionally, when kzalloc() is called from within
cache_ent_find_and_store(), we encounter the same deadlock due to
re-acquisition of umem_mutex.
Solve this by releasing umem_mutex in dereg_mr() after umr_revoke_mr()
and before acquiring rb_lock. This restores a proper lock ordering and
ensures that we never hold umem_mutex while performing memory
allocations that could trigger the reclaim path.
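A hedged sketch of the reordering in the deregistration path, using only names from the message and trace (error handling and surrounding code elided):

```c
	mutex_lock(&umem_odp->umem_mutex);
	ret = umr_revoke_mr(mr);		/* revoke under umem_mutex */
	mutex_unlock(&umem_odp->umem_mutex);	/* drop it before rb_lock */

	/*
	 * cache_ent_find_and_store() takes dev->cache.rb_lock and may
	 * kzalloc(GFP_KERNEL); reclaim can re-enter umem_mutex via
	 * mlx5_ib_invalidate_range(), which is safe only because we no
	 * longer hold umem_mutex here.
	 */
	ret = cache_ent_find_and_store(dev, mr);
```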
The following lockdep warning demonstrates the deadlock:
python3/20557 is trying to acquire lock:
ffff888387542128 (&umem_odp->umem_mutex){+.+.}-{4:4}, at:
mlx5_ib_invalidate_range+0x5b/0x550 [mlx5_ib]
but task is already holding lock:
ffffffff82f6b840 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}, at:
unmap_vmas+0x7b/0x1a0
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #3 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}:
fs_reclaim_acquire+0x60/0xd0
mem_cgroup_css_alloc+0x6f/0x9b0
cgroup_init_subsys+0xa4/0x240
cgroup_init+0x1c8/0x510
start_kernel+0x747/0x760
x86_64_start_reservations+0x25/0x30
x86_64_start_kernel+0x73/0x80
common_startup_64+0x129/0x138
-> #2 (fs_reclaim){+.+.}-{0:0}:
fs_reclaim_acquire+0x91/0xd0
__kmalloc_cache_noprof+0x4d/0x4c0
mlx5r_cache_create_ent_locked+0x75/0x620 [mlx5_ib]
mlx5_mkey_cache_init+0x186/0x360 [mlx5_ib]
mlx5_ib_stage_post_ib_reg_umr_init+0x3c/0x60 [mlx5_ib]
__mlx5_ib_add+0x4b/0x190 [mlx5_ib]
mlx5r_probe+0xd9/0x320 [mlx5_ib]
auxiliary_bus_probe+0x42/0x70
really_probe+0xdb/0x360
__driver_probe_device+0x8f/0x130
driver_probe_device+0x1f/0xb0
__driver_attach+0xd4/0x1f0
bus_for_each_dev+0x79/0xd0
bus_add_driver+0xf0/0x200
driver_register+0x6e/0xc0
__auxiliary_driver_register+0x6a/0xc0
do_one_initcall+0x5e/0x390
do_init_module+0x88/0x240
init_module_from_file+0x85/0xc0
idempotent_init_module+0x104/0x300
__x64_sys_finit_module+0x68/0xc0
do_syscall_64+0x6d/0x140
entry_SYSCALL_64_after_hwframe+0x4b/0x53
-> #1 (&dev->cache.rb_lock){+.+.}-{4:4}:
__mutex_lock+0x98/0xf10
__mlx5_ib_dereg_mr+0x6f2/0x890 [mlx5_ib]
mlx5_ib_dereg_mr+0x21/0x110 [mlx5_ib]
ib_dereg_mr_user+0x85/0x1f0 [ib_core]
---truncated--- |
| In the Linux kernel, the following vulnerability has been resolved:
drm/scheduler: signal scheduled fence when kill job
When an entity from application B is killed, drm_sched_entity_kill()
removes all jobs belonging to that entity through
drm_sched_entity_kill_jobs_work(). If application A's job depends on a
scheduled fence from application B's job, and that fence is not properly
signaled during the killing process, application A's dependency cannot be
cleared.
This leads to application A hanging indefinitely while waiting for a
dependency that will never be resolved. Fix this issue by ensuring that
scheduled fences are properly signaled when an entity is killed, allowing
dependent applications to continue execution.
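A hedged sketch of the kill-work path with the added signaling; the work-item layout is assumed from the scheduler code, and the exact error code is an assumption:

```c
static void drm_sched_entity_kill_jobs_work(struct work_struct *wrk)
{
	struct drm_sched_job *job = container_of(wrk, typeof(*job), work);

	/* Assumed addition: signal the scheduled fence so jobs from
	 * other applications that registered it as a dependency run. */
	drm_sched_fence_scheduled(job->s_fence, NULL);
	drm_sched_fence_finished(job->s_fence, -ESRCH);
	job->sched->ops->free_job(job);
}
```
Signaling both fences keeps the scheduled/finished pairing consistent for any waiter. |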
| In the Linux kernel, the following vulnerability has been resolved:
KVM: Allow CPU to reschedule while setting per-page memory attributes
When running an SEV-SNP guest with a sufficiently large amount of memory (1TB+),
the host can experience CPU soft lockups when running an operation in
kvm_vm_set_mem_attributes() to set memory attributes on the whole
range of guest memory.
watchdog: BUG: soft lockup - CPU#8 stuck for 26s! [qemu-kvm:6372]
CPU: 8 UID: 0 PID: 6372 Comm: qemu-kvm Kdump: loaded Not tainted 6.15.0-rc7.20250520.el9uek.rc1.x86_64 #1 PREEMPT(voluntary)
Hardware name: Oracle Corporation ORACLE SERVER E4-2c/Asm,MB Tray,2U,E4-2c, BIOS 78016600 11/13/2024
RIP: 0010:xas_create+0x78/0x1f0
Code: 00 00 00 41 80 fc 01 0f 84 82 00 00 00 ba 06 00 00 00 bd 06 00 00 00 49 8b 45 08 4d 8d 65 08 41 39 d6 73 20 83 ed 06 48 85 c0 <74> 67 48 89 c2 83 e2 03 48 83 fa 02 75 0c 48 3d 00 10 00 00 0f 87
RSP: 0018:ffffad890a34b940 EFLAGS: 00000286
RAX: ffff96f30b261daa RBX: ffffad890a34b9c8 RCX: 0000000000000000
RDX: 000000000000001e RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000018 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffad890a356868
R13: ffffad890a356860 R14: 0000000000000000 R15: ffffad890a356868
FS: 00007f5578a2a400(0000) GS:ffff97ed317e1000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f015c70fb18 CR3: 00000001109fd006 CR4: 0000000000f70ef0
PKRU: 55555554
Call Trace:
<TASK>
xas_store+0x58/0x630
__xa_store+0xa5/0x130
xa_store+0x2c/0x50
kvm_vm_set_mem_attributes+0x343/0x710 [kvm]
kvm_vm_ioctl+0x796/0xab0 [kvm]
__x64_sys_ioctl+0xa3/0xd0
do_syscall_64+0x8c/0x7a0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f5578d031bb
Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 2d 4c 0f 00 f7 d8 64 89 01 48
RSP: 002b:00007ffe0a742b88 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 000000004020aed2 RCX: 00007f5578d031bb
RDX: 00007ffe0a742c80 RSI: 000000004020aed2 RDI: 000000000000000b
RBP: 0000010000000000 R08: 0000010000000000 R09: 0000017680000000
R10: 0000000000000080 R11: 0000000000000246 R12: 00005575e5f95120
R13: 00007ffe0a742c80 R14: 0000000000000008 R15: 00005575e5f961e0
While looping through the range of memory setting the attributes,
call cond_resched() to give the scheduler a chance to run a higher
priority task on the runqueue if necessary and avoid staying in
kernel mode long enough to trigger the lockup.
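A minimal sketch of the pattern, assuming the xarray-backed attribute store visible in the trace (kvm, start, end and attributes are in scope in the real loop; locking elided):

```c
	for (i = start; i < end; i++) {
		r = xa_err(xa_store(&kvm->mem_attr_array, i,
				    xa_mk_value(attributes),
				    GFP_KERNEL_ACCOUNT));
		if (r)
			break;
		/* Yield between per-page stores so a 1TB+ update cannot
		 * hold the CPU long enough to trip the watchdog. */
		cond_resched();
	}
```
cond_resched() is nearly free when nothing else is runnable, so the common case costs little. |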
| In the Linux kernel, the following vulnerability has been resolved:
mm/shmem, swap: fix softlockup with mTHP swapin
Following softlockup can be easily reproduced on my test machine with:
echo always > /sys/kernel/mm/transparent_hugepage/hugepages-64kB/enabled
swapon /dev/zram0 # zram0 is a 48G swap device
mkdir -p /sys/fs/cgroup/memory/test
echo 1G > /sys/fs/cgroup/test/memory.max
echo $BASHPID > /sys/fs/cgroup/test/cgroup.procs
while true; do
dd if=/dev/zero of=/tmp/test.img bs=1M count=5120
cat /tmp/test.img > /dev/null
rm /tmp/test.img
done
Then after a while:
watchdog: BUG: soft lockup - CPU#0 stuck for 763s! [cat:5787]
Modules linked in: zram virtiofs
CPU: 0 UID: 0 PID: 5787 Comm: cat Kdump: loaded Tainted: G L 6.15.0.orig-gf3021d9246bc-dirty #118 PREEMPT(voluntary)
Tainted: [L]=SOFTLOCKUP
Hardware name: Red Hat KVM/RHEL-AV, BIOS 0.0.0 02/06/2015
RIP: 0010:mpol_shared_policy_lookup+0xd/0x70
Code: e9 b8 b4 ff ff 31 c0 c3 cc cc cc cc 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 0f 1f 44 00 00 41 54 55 53 <48> 8b 1f 48 85 db 74 41 4c 8d 67 08 48 89 fb 48 89 f5 4c 89 e7 e8
RSP: 0018:ffffc90002b1fc28 EFLAGS: 00000202
RAX: 00000000001c20ca RBX: 0000000000724e1e RCX: 0000000000000001
RDX: ffff888118e214c8 RSI: 0000000000057d42 RDI: ffff888118e21518
RBP: 000000000002bec8 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000bf4 R11: 0000000000000000 R12: 0000000000000001
R13: 00000000001c20ca R14: 00000000001c20ca R15: 0000000000000000
FS: 00007f03f995c740(0000) GS:ffff88a07ad9a000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f03f98f1000 CR3: 0000000144626004 CR4: 0000000000770eb0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
shmem_alloc_folio+0x31/0xc0
shmem_swapin_folio+0x309/0xcf0
? filemap_get_entry+0x117/0x1e0
? xas_load+0xd/0xb0
? filemap_get_entry+0x101/0x1e0
shmem_get_folio_gfp+0x2ed/0x5b0
shmem_file_read_iter+0x7f/0x2e0
vfs_read+0x252/0x330
ksys_read+0x68/0xf0
do_syscall_64+0x4c/0x1c0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f03f9a46991
Code: 00 48 8b 15 81 14 10 00 f7 d8 64 89 02 b8 ff ff ff ff eb bd e8 20 ad 01 00 f3 0f 1e fa 80 3d 35 97 10 00 00 74 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 4f c3 66 0f 1f 44 00 00 55 48 89 e5 48 83 ec
RSP: 002b:00007fff3c52bd28 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 0000000000040000 RCX: 00007f03f9a46991
RDX: 0000000000040000 RSI: 00007f03f98ba000 RDI: 0000000000000003
RBP: 00007fff3c52bd50 R08: 0000000000000000 R09: 00007f03f9b9a380
R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000040000
R13: 00007f03f98ba000 R14: 0000000000000003 R15: 0000000000000000
</TASK>
The reason is simple: readahead brought some order-0 folios into the
swap cache, and they conflict with the mTHP folio being allocated for
swapin, so swapcache_prepare fails and shmem_swap_alloc_folio returns
-EEXIST; shmem then simply retries again and again, causing this loop.
Fix it by applying a fix similar to the one used for anon mTHP swapin.
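A hedged sketch of the retry shape borrowed from the anon swapin path; the function arguments and the surrounding loop are simplified assumptions:

```c
	folio = shmem_swap_alloc_folio(inode, vma, index, entry, order, gfp);
	if (IS_ERR(folio) && PTR_ERR(folio) == -EEXIST) {
		/*
		 * Raced with a smaller (e.g. order-0 readahead) folio that
		 * already sits in the swap cache: back off briefly instead
		 * of spinning on -EEXIST, then retry or fall back.
		 */
		schedule_timeout_uninterruptible(1);
	}
```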
The performance change is very slight, time of swapin 10g zero folios
with shmem (test for 12 times):
Before: 2.47s
After: 2.48s
[kasong@tencent.com: add comment] |
| In the Linux kernel, the following vulnerability has been resolved:
riscv:uprobe fix SR_SPIE set/clear handling
On riscv, the uprobe code clears SPIE before executing the original
instruction and sets SPIE again after that. But when accessing the page
where the original instruction has been placed, a page fault may happen,
and since IRQs were disabled in the arch_uprobe_pre_xol function, this
causes the WARN shown below.
There is no need to clear/set SPIE in arch_uprobe_pre/post/abort_xol,
so we can just remove it.
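A hedged sketch of the simplified pre-xol hook, with the bookkeeping kept and the SPIE manipulation dropped (field names per arch/riscv):

```c
int arch_uprobe_pre_xol(struct arch_uprobe *auprobe, struct pt_regs *regs)
{
	struct uprobe_task *utask = current->utask;

	utask->autask.saved_cause = current->thread.bad_cause;
	instruction_pointer_set(regs, utask->xol_vaddr);

	/* removed: regs->status &= ~SR_SPIE; clearing SPIE disabled IRQs,
	 * so a page fault on the XOL page tripped the might_sleep WARN */
	return 0;
}
```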
[ 31.684157] BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1488
[ 31.684677] in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 76, name: work
[ 31.684929] preempt_count: 0, expected: 0
[ 31.685969] CPU: 2 PID: 76 Comm: work Tainted: G
[ 31.686542] Hardware name: riscv-virtio,qemu (DT)
[ 31.686797] Call Trace:
[ 31.687053] [<ffffffff80006442>] dump_backtrace+0x30/0x38
[ 31.687699] [<ffffffff80812118>] show_stack+0x40/0x4c
[ 31.688141] [<ffffffff8081817a>] dump_stack_lvl+0x44/0x5c
[ 31.688396] [<ffffffff808181aa>] dump_stack+0x18/0x20
[ 31.688653] [<ffffffff8003e454>] __might_resched+0x114/0x122
[ 31.688948] [<ffffffff8003e4b2>] __might_sleep+0x50/0x7a
[ 31.689435] [<ffffffff80822676>] down_read+0x30/0x130
[ 31.689728] [<ffffffff8000b650>] do_page_fault+0x166/x446
[ 31.689997] [<ffffffff80003c0c>] ret_from_exception+0x0/0xc |
| In the Linux kernel, the following vulnerability has been resolved:
tty: n_gsm: fix deadlock and link starvation in outgoing data path
The current implementation queues up new control and user packets as
needed and processes this queue down to the ldisc in the same code path.
That means that the upper and the lower layer are hard-coupled in the
code. Due to this, deadlocks can happen as seen below while transmitting
data, especially during ldisc congestion. Furthermore, the data channels
starve the control channel under high transmission load on the ldisc.
Introduce an additional control channel data queue to prevent timeouts and
link hangups during ldisc congestion. This is being processed before the
user channel data queue in gsm_data_kick(), i.e. with the highest priority.
Put the queue to ldisc data path into a workqueue and trigger it whenever
new data has been put into the transmission queue. Change
gsm_dlci_data_sweep() accordingly to fill up the transmission queue until
TX_THRESH_HI. This solves the locking issue, keeps latency low and provides
good performance on high data load.
Note that now all packets from a DLCI are removed from the internal queue
if the associated DLCI was closed. This ensures that no data is sent by the
introduced write task to an already closed DLCI.
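A hedged sketch of the decoupling, with the tx_work member and locking shape assumed: producers only queue data and kick the work, and all pushes to the ldisc happen from the work item:

```c
/* Runs only from the workqueue, never from the queueing call sites or
 * IRQ paths, so the ldisc write cannot nest inside the port lock. */
static void gsmld_write_task(struct work_struct *work)
{
	struct gsm_mux *gsm = container_of(work, struct gsm_mux, tx_work);
	unsigned long flags;

	spin_lock_irqsave(&gsm->tx_lock, flags);
	if (gsm->tty)
		gsm_data_kick(gsm, NULL);	/* control queue first */
	spin_unlock_irqrestore(&gsm->tx_lock, flags);
}

/* producer side, after queueing a packet: */
	schedule_work(&gsm->tx_work);
```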
BUG: spinlock recursion on CPU#0, test_v24_loop/124
lock: serial8250_ports+0x3a8/0x7500, .magic: dead4ead, .owner: test_v24_loop/124, .owner_cpu: 0
CPU: 0 PID: 124 Comm: test_v24_loop Tainted: G O 5.18.0-rc2 #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Call Trace:
<IRQ>
dump_stack_lvl+0x34/0x44
do_raw_spin_lock+0x76/0xa0
_raw_spin_lock_irqsave+0x72/0x80
uart_write_room+0x3b/0xc0
gsm_data_kick+0x14b/0x240 [n_gsm]
gsmld_write_wakeup+0x35/0x70 [n_gsm]
tty_wakeup+0x53/0x60
tty_port_default_wakeup+0x1b/0x30
serial8250_tx_chars+0x12f/0x220
serial8250_handle_irq.part.0+0xfe/0x150
serial8250_default_handle_irq+0x48/0x80
serial8250_interrupt+0x56/0xa0
__handle_irq_event_percpu+0x78/0x1f0
handle_irq_event+0x34/0x70
handle_fasteoi_irq+0x90/0x1e0
__common_interrupt+0x69/0x100
common_interrupt+0x48/0xc0
asm_common_interrupt+0x1e/0x40
RIP: 0010:__do_softirq+0x83/0x34e
Code: 2a 0a ff 0f b7 ed c7 44 24 10 0a 00 00 00 48 c7 c7 51 2a 64 82 e8 2d
e2 d5 ff 65 66 c7 05 83 af 1e 7e 00 00 fb b8 ff ff ff ff <49> c7 c2 40 61
80 82 0f bc c5 41 89 c4 41 83 c4 01 0f 84 e6 00 00
RSP: 0018:ffffc90000003f98 EFLAGS: 00000286
RAX: 00000000ffffffff RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff82642a51 RDI: ffffffff825bb5e7
RBP: 0000000000000200 R08: 00000008de3271a8 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000030 R14: 0000000000000000 R15: 0000000000000000
? __do_softirq+0x73/0x34e
irq_exit_rcu+0xb5/0x100
common_interrupt+0xa4/0xc0
</IRQ>
<TASK>
asm_common_interrupt+0x1e/0x40
RIP: 0010:_raw_spin_unlock_irqrestore+0x2e/0x50
Code: 00 55 48 89 fd 48 83 c7 18 53 48 89 f3 48 8b 74 24 10 e8 85 28 36 ff
48 89 ef e8 cd 58 36 ff 80 e7 02 74 01 fb bf 01 00 00 00 <e8> 3d 97 33 ff
65 8b 05 96 23 2b 7e 85 c0 74 03 5b 5d c3 0f 1f 44
RSP: 0018:ffffc9000020fd08 EFLAGS: 00000202
RAX: 0000000000000000 RBX: 0000000000000246 RCX: 0000000000000000
RDX: 0000000000000004 RSI: ffffffff8257fd74 RDI: 0000000000000001
RBP: ffff8880057de3a0 R08: 00000008de233000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000100 R14: 0000000000000202 R15: ffff8880057df0b8
? _raw_spin_unlock_irqrestore+0x23/0x50
gsmtty_write+0x65/0x80 [n_gsm]
n_tty_write+0x33f/0x530
? swake_up_all+0xe0/0xe0
file_tty_write.constprop.0+0x1b1/0x320
? n_tty_flush_buffer+0xb0/0xb0
new_sync_write+0x10c/0x190
vfs_write+0x282/0x310
ksys_write+0x68/0xe0
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f3e5e35c15c
Code: 8b 7c 24 08 89 c5 e8 c5 ff ff ff 89 ef 89 44 24
---truncated--- |
| In the Linux kernel, the following vulnerability has been resolved:
virtio-net: fix recursive rtnl_lock() during probe()
The deadlock appears in a stack trace like:
virtnet_probe()
  rtnl_lock()
  virtio_config_changed_work()
    netdev_notify_peers()
      rtnl_lock()
It happens if the VMM sends a VIRTIO_NET_S_ANNOUNCE request while the
virtio-net driver is still probing.
The config_work from probe() will not get scheduled until virtnet_open()
enables the config change notification via
virtio_config_driver_enable().
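A hedged sketch of the deferral, using the API name given in the message; the exact placement inside virtnet_open() is an assumption:

```c
static int virtnet_open(struct net_device *dev)
{
	struct virtnet_info *vi = netdev_priv(dev);

	/* queue and NAPI setup elided */

	/* Config-change processing stays disabled through probe, so
	 * netdev_notify_peers() cannot re-take RTNL under virtnet_probe();
	 * it is only enabled once the device is actually up. */
	virtio_config_driver_enable(vi->vdev);
	return 0;
}
```
The probe path correspondingly keeps notifications disabled until this point. |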
| In the Linux kernel, the following vulnerability has been resolved:
iavf: get rid of the crit lock
Get rid of the crit lock.
That frees us from the error-prone logic of try_locks.
Thanks to Jakub's netdev_lock() it is now easy, and in most cases we
were already protected by it - replace the crit lock with the netdev
lock where that was not the case.
Lockdep reports that we should cancel the work under crit_lock [splat1],
and that was the scheme we have mostly followed since [1] by Slawomir.
But when that is done we still get into deadlocks [splat2]. So instead
we should look at the bigger problem, namely the "weird
locking/scheduling" of the iavf. The first step to fix that is to
remove the crit lock.
I will follow up with a -next series that simplifies scheduling/tasks.
Cancel the work without the netdev lock (weird unlock+lock scheme),
to fix [splat2] (which would be totally ugly if we had kept the
crit lock).
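A hedged sketch of the unlock+lock scheme in the remove path (netdev and adapter are in scope in iavf_remove(); work member names per the driver):

```c
	/* Both work items take the netdev instance lock themselves, so
	 * waiting for them while holding it would deadlock (splat2). */
	netdev_unlock(netdev);
	cancel_work_sync(&adapter->reset_task);
	cancel_delayed_work_sync(&adapter->watchdog_task);
	netdev_lock(netdev);
```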
Extend protected part of iavf_watchdog_task() to include scheduling
more work.
Note that the removed comment in iavf_reset_task() was misplaced,
it belonged to inside of the removed if condition, so it's gone now.
[splat1] - w/o this patch - The deadlock during VF removal:
WARNING: possible circular locking dependency detected
sh/3825 is trying to acquire lock:
((work_completion)(&(&adapter->watchdog_task)->work)){+.+.}-{0:0}, at: start_flush_work+0x1a1/0x470
but task is already holding lock:
(&adapter->crit_lock){+.+.}-{4:4}, at: iavf_remove+0xd1/0x690 [iavf]
which lock already depends on the new lock.
[splat2] - when cancelling work under crit lock, w/o this series,
see [2] for the band aid attempt
WARNING: possible circular locking dependency detected
sh/3550 is trying to acquire lock:
((wq_completion)iavf){+.+.}-{0:0}, at: touch_wq_lockdep_map+0x26/0x90
but task is already holding lock:
(&dev->lock){+.+.}-{4:4}, at: iavf_remove+0xa6/0x6e0 [iavf]
which lock already depends on the new lock.
[1] fc2e6b3b132a ("iavf: Rework mutexes for better synchronisation")
[2] https://github.com/pkitszel/linux/commit/52dddbfc2bb60294083f5711a158a |