| CVE |
Vendors |
Products |
Updated |
CVSS v3.1 |
| In the Linux kernel, the following vulnerability has been resolved:
Bluetooth: btnxpuart: Fix random crash seen while removing driver
This fixes the random kernel crash seen while removing the driver, when
running the load/unload test over multiple iterations.
1) modprobe btnxpuart
2) hciconfig hci0 reset
3) hciconfig (check hci0 interface up with valid BD address)
4) modprobe -r btnxpuart
Repeat steps 1 to 4
The ps_wakeup() call in btnxpuart_close() schedules the psdata->work(),
which gets scheduled after module is removed, causing a kernel crash.
This hidden issue got highlighted after enabling Power Save by default
in 4183a7be7700 (Bluetooth: btnxpuart: Enable Power Save feature on
startup)
The new ps_cleanup() deasserts UART break immediately while closing
serdev device, cancels any scheduled ps_work and destroys the ps_lock
mutex.
[ 85.884604] Unable to handle kernel paging request at virtual address ffffd4a61638f258
[ 85.884624] Mem abort info:
[ 85.884625] ESR = 0x0000000086000007
[ 85.884628] EC = 0x21: IABT (current EL), IL = 32 bits
[ 85.884633] SET = 0, FnV = 0
[ 85.884636] EA = 0, S1PTW = 0
[ 85.884638] FSC = 0x07: level 3 translation fault
[ 85.884642] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000041dd0000
[ 85.884646] [ffffd4a61638f258] pgd=1000000095fff003, p4d=1000000095fff003, pud=100000004823d003, pmd=100000004823e003, pte=0000000000000000
[ 85.884662] Internal error: Oops: 0000000086000007 [#1] PREEMPT SMP
[ 85.890932] Modules linked in: algif_hash algif_skcipher af_alg overlay fsl_jr_uio caam_jr caamkeyblob_desc caamhash_desc caamalg_desc crypto_engine authenc libdes crct10dif_ce polyval_ce polyval_generic snd_soc_imx_spdif snd_soc_imx_card snd_soc_ak5558 snd_soc_ak4458 caam secvio error snd_soc_fsl_spdif snd_soc_fsl_micfil snd_soc_fsl_sai snd_soc_fsl_utils gpio_ir_recv rc_core fuse [last unloaded: btnxpuart(O)]
[ 85.927297] CPU: 1 PID: 67 Comm: kworker/1:3 Tainted: G O 6.1.36+g937b1be4345a #1
[ 85.936176] Hardware name: FSL i.MX8MM EVK board (DT)
[ 85.936182] Workqueue: events 0xffffd4a61638f380
[ 85.936198] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 85.952817] pc : 0xffffd4a61638f258
[ 85.952823] lr : 0xffffd4a61638f258
[ 85.952827] sp : ffff8000084fbd70
[ 85.952829] x29: ffff8000084fbd70 x28: 0000000000000000 x27: 0000000000000000
[ 85.963112] x26: ffffd4a69133f000 x25: ffff4bf1c8540990 x24: ffff4bf215b87305
[ 85.963119] x23: ffff4bf215b87300 x22: ffff4bf1c85409d0 x21: ffff4bf1c8540970
[ 85.977382] x20: 0000000000000000 x19: ffff4bf1c8540880 x18: 0000000000000000
[ 85.977391] x17: 0000000000000000 x16: 0000000000000133 x15: 0000ffffe2217090
[ 85.977399] x14: 0000000000000001 x13: 0000000000000133 x12: 0000000000000139
[ 85.977407] x11: 0000000000000001 x10: 0000000000000a60 x9 : ffff8000084fbc50
[ 85.977417] x8 : ffff4bf215b7d000 x7 : ffff4bf215b83b40 x6 : 00000000000003e8
[ 85.977424] x5 : 00000000410fd030 x4 : 0000000000000000 x3 : 0000000000000000
[ 85.977432] x2 : 0000000000000000 x1 : ffff4bf1c4265880 x0 : 0000000000000000
[ 85.977443] Call trace:
[ 85.977446] 0xffffd4a61638f258
[ 85.977451] 0xffffd4a61638f3e8
[ 85.977455] process_one_work+0x1d4/0x330
[ 85.977464] worker_thread+0x6c/0x430
[ 85.977471] kthread+0x108/0x10c
[ 85.977476] ret_from_fork+0x10/0x20
[ 85.977488] Code: bad PC value
[ 85.977491] ---[ end trace 0000000000000000 ]---
Preset since v6.9.11 |
| In the Linux kernel, the following vulnerability has been resolved:
platform/x86: intel-vbtn: Protect ACPI notify handler against recursion
Since commit e2ffcda16290 ("ACPI: OSL: Allow Notify () handlers to run on
all CPUs") ACPI notify handlers like the intel-vbtn notify_handler() may
run on multiple CPU cores racing with themselves.
This race gets hit on Dell Venue 7140 tablets when undocking from
the keyboard, causing the handler to try and register priv->switches_dev
twice, as can be seen from the dev_info() message getting logged twice:
[ 83.861800] intel-vbtn INT33D6:00: Registering Intel Virtual Switches input-dev after receiving a switch event
[ 83.861858] input: Intel Virtual Switches as /devices/pci0000:00/0000:00:1f.0/PNP0C09:00/INT33D6:00/input/input17
[ 83.861865] intel-vbtn INT33D6:00: Registering Intel Virtual Switches input-dev after receiving a switch event
After which things go seriously wrong:
[ 83.861872] sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:1f.0/PNP0C09:00/INT33D6:00/input/input17'
...
[ 83.861967] kobject: kobject_add_internal failed for input17 with -EEXIST, don't try to register things with the same name in the same directory.
[ 83.877338] BUG: kernel NULL pointer dereference, address: 0000000000000018
...
Protect intel-vbtn notify_handler() from racing with itself with a mutex
to fix this. |
| In the Linux kernel, the following vulnerability has been resolved:
net/tcp: Disable TCP-AO static key after RCU grace period
The lifetime of TCP-AO static_key is the same as the last
tcp_ao_info. On the socket destruction tcp_ao_info ceases to be
with RCU grace period, while tcp-ao static branch is currently deferred
destructed. The static key definition is
: DEFINE_STATIC_KEY_DEFERRED_FALSE(tcp_ao_needed, HZ);
which means that if RCU grace period is delayed by more than a second
and tcp_ao_needed is in the process of disablement, other CPUs may
yet see tcp_ao_info which atent dead, but soon-to-be.
And that breaks the assumption of static_key_fast_inc_not_disabled().
See the comment near the definition:
> * The caller must make sure that the static key can't get disabled while
> * in this function. It doesn't patch jump labels, only adds a user to
> * an already enabled static key.
Originally it was introduced in commit eb8c507296f6 ("jump_label:
Prevent key->enabled int overflow"), which is needed for the atomic
contexts, one of which would be the creation of a full socket from a
request socket. In that atomic context, it's known by the presence
of the key (md5/ao) that the static branch is already enabled.
So, the ref counter for that static branch is just incremented
instead of holding the proper mutex.
static_key_fast_inc_not_disabled() is just a helper for such usage
case. But it must not be used if the static branch could get disabled
in parallel as it's not protected by jump_label_mutex and as a result,
races with jump_label_update() implementation details.
Happened on netdev test-bot[1], so not a theoretical issue:
[] jump_label: Fatal kernel bug, unexpected op at tcp_inbound_hash+0x1a7/0x870 [ffffffffa8c4e9b7] (eb 50 0f 1f 44 != 66 90 0f 1f 00)) size:2 type:1
[] ------------[ cut here ]------------
[] kernel BUG at arch/x86/kernel/jump_label.c:73!
[] Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
[] CPU: 3 PID: 243 Comm: kworker/3:3 Not tainted 6.10.0-virtme #1
[] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[] Workqueue: events jump_label_update_timeout
[] RIP: 0010:__jump_label_patch+0x2f6/0x350
...
[] Call Trace:
[] <TASK>
[] arch_jump_label_transform_queue+0x6c/0x110
[] __jump_label_update+0xef/0x350
[] __static_key_slow_dec_cpuslocked.part.0+0x3c/0x60
[] jump_label_update_timeout+0x2c/0x40
[] process_one_work+0xe3b/0x1670
[] worker_thread+0x587/0xce0
[] kthread+0x28a/0x350
[] ret_from_fork+0x31/0x70
[] ret_from_fork_asm+0x1a/0x30
[] </TASK>
[] Modules linked in: veth
[] ---[ end trace 0000000000000000 ]---
[] RIP: 0010:__jump_label_patch+0x2f6/0x350
[1]: https://netdev-3.bots.linux.dev/vmksft-tcp-ao-dbg/results/696681/5-connect-deny-ipv6/stderr |
| In the Linux kernel, the following vulnerability has been resolved:
scsi: ufs: core: Fix ufshcd_clear_cmd racing issue
When ufshcd_clear_cmd is racing with the completion ISR, the completed tag
of the request's mq_hctx pointer will be set to NULL by the ISR. And
ufshcd_clear_cmd's call to ufshcd_mcq_req_to_hwq will get NULL pointer KE.
Return success when the request is completed by ISR because sq does not
need cleanup.
The racing flow is:
Thread A
ufshcd_err_handler step 1
ufshcd_try_to_abort_task
ufshcd_cmd_inflight(true) step 3
ufshcd_clear_cmd
...
ufshcd_mcq_req_to_hwq
blk_mq_unique_tag
rq->mq_hctx->queue_num step 5
Thread B
ufs_mtk_mcq_intr(cq complete ISR) step 2
scsi_done
...
__blk_mq_free_request
rq->mq_hctx = NULL; step 4
Below is KE back trace:
ufshcd_try_to_abort_task: cmd pending in the device. tag = 6
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000194
pc : [0xffffffd589679bf8] blk_mq_unique_tag+0x8/0x14
lr : [0xffffffd5862f95b4] ufshcd_mcq_sq_cleanup+0x6c/0x1cc [ufs_mediatek_mod_ise]
Workqueue: ufs_eh_wq_0 ufshcd_err_handler [ufs_mediatek_mod_ise]
Call trace:
dump_backtrace+0xf8/0x148
show_stack+0x18/0x24
dump_stack_lvl+0x60/0x7c
dump_stack+0x18/0x3c
mrdump_common_die+0x24c/0x398 [mrdump]
ipanic_die+0x20/0x34 [mrdump]
notify_die+0x80/0xd8
die+0x94/0x2b8
__do_kernel_fault+0x264/0x298
do_page_fault+0xa4/0x4b8
do_translation_fault+0x38/0x54
do_mem_abort+0x58/0x118
el1_abort+0x3c/0x5c
el1h_64_sync_handler+0x54/0x90
el1h_64_sync+0x68/0x6c
blk_mq_unique_tag+0x8/0x14
ufshcd_clear_cmd+0x34/0x118 [ufs_mediatek_mod_ise]
ufshcd_try_to_abort_task+0x2c8/0x5b4 [ufs_mediatek_mod_ise]
ufshcd_err_handler+0xa7c/0xfa8 [ufs_mediatek_mod_ise]
process_one_work+0x208/0x4fc
worker_thread+0x228/0x438
kthread+0x104/0x1d4
ret_from_fork+0x10/0x20 |
| In the Linux kernel, the following vulnerability has been resolved:
scsi: ufs: core: Fix ufshcd_abort_one racing issue
When ufshcd_abort_one is racing with the completion ISR, the completed tag
of the request's mq_hctx pointer will be set to NULL by ISR. Return
success when request is completed by ISR because ufshcd_abort_one does not
need to do anything.
The racing flow is:
Thread A
ufshcd_err_handler step 1
...
ufshcd_abort_one
ufshcd_try_to_abort_task
ufshcd_cmd_inflight(true) step 3
ufshcd_mcq_req_to_hwq
blk_mq_unique_tag
rq->mq_hctx->queue_num step 5
Thread B
ufs_mtk_mcq_intr(cq complete ISR) step 2
scsi_done
...
__blk_mq_free_request
rq->mq_hctx = NULL; step 4
Below is KE back trace.
ufshcd_try_to_abort_task: cmd at tag 41 not pending in the device.
ufshcd_try_to_abort_task: cmd at tag=41 is cleared.
Aborting tag 41 / CDB 0x28 succeeded
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000194
pc : [0xffffffddd7a79bf8] blk_mq_unique_tag+0x8/0x14
lr : [0xffffffddd6155b84] ufshcd_mcq_req_to_hwq+0x1c/0x40 [ufs_mediatek_mod_ise]
do_mem_abort+0x58/0x118
el1_abort+0x3c/0x5c
el1h_64_sync_handler+0x54/0x90
el1h_64_sync+0x68/0x6c
blk_mq_unique_tag+0x8/0x14
ufshcd_err_handler+0xae4/0xfa8 [ufs_mediatek_mod_ise]
process_one_work+0x208/0x4fc
worker_thread+0x228/0x438
kthread+0x104/0x1d4
ret_from_fork+0x10/0x20 |
| In the Linux kernel, the following vulnerability has been resolved:
drm/drm_file: Fix pid refcounting race
<maarten.lankhorst@linux.intel.com>, Maxime Ripard
<mripard@kernel.org>, Thomas Zimmermann <tzimmermann@suse.de>
filp->pid is supposed to be a refcounted pointer; however, before this
patch, drm_file_update_pid() only increments the refcount of a struct
pid after storing a pointer to it in filp->pid and dropping the
dev->filelist_mutex, making the following race possible:
process A process B
========= =========
begin drm_file_update_pid
mutex_lock(&dev->filelist_mutex)
rcu_replace_pointer(filp->pid, <pid B>, 1)
mutex_unlock(&dev->filelist_mutex)
begin drm_file_update_pid
mutex_lock(&dev->filelist_mutex)
rcu_replace_pointer(filp->pid, <pid A>, 1)
mutex_unlock(&dev->filelist_mutex)
get_pid(<pid A>)
synchronize_rcu()
put_pid(<pid B>) *** pid B reaches refcount 0 and is freed here ***
get_pid(<pid B>) *** UAF ***
synchronize_rcu()
put_pid(<pid A>)
As far as I know, this race can only occur with CONFIG_PREEMPT_RCU=y
because it requires RCU to detect a quiescent state in code that is not
explicitly calling into the scheduler.
This race leads to use-after-free of a "struct pid".
It is probably somewhat hard to hit because process A has to pass
through a synchronize_rcu() operation while process B is between
mutex_unlock() and get_pid().
Fix it by ensuring that by the time a pointer to the current task's pid
is stored in the file, an extra reference to the pid has been taken.
This fix also removes the condition for synchronize_rcu(); I think
that optimization is unnecessary complexity, since in that case we
would usually have bailed out on the lockless check above. |
| In the Linux kernel, the following vulnerability has been resolved:
usb: gadget: u_audio: Fix race condition use of controls after free during gadget unbind.
Hang on to the control IDs instead of pointers since those are correctly
handled with locks. |
| In the Linux kernel, the following vulnerability has been resolved:
platform/chrome: cros_ec_uart: properly fix race condition
The cros_ec_uart_probe() function calls devm_serdev_device_open() before
it calls serdev_device_set_client_ops(). This can trigger a NULL pointer
dereference:
BUG: kernel NULL pointer dereference, address: 0000000000000000
...
Call Trace:
<TASK>
...
? ttyport_receive_buf
A simplified version of crashing code is as follows:
static inline size_t serdev_controller_receive_buf(struct serdev_controller *ctrl,
const u8 *data,
size_t count)
{
struct serdev_device *serdev = ctrl->serdev;
if (!serdev || !serdev->ops->receive_buf) // CRASH!
return 0;
return serdev->ops->receive_buf(serdev, data, count);
}
It assumes that if SERPORT_ACTIVE is set and serdev exists, serdev->ops
will also exist. This conflicts with the existing cros_ec_uart_probe()
logic, as it first calls devm_serdev_device_open() (which sets
SERPORT_ACTIVE), and only later sets serdev->ops via
serdev_device_set_client_ops().
Commit 01f95d42b8f4 ("platform/chrome: cros_ec_uart: fix race
condition") attempted to fix a similar race condition, but while doing
so, made the window of error for this race condition to happen much
wider.
Attempt to fix the race condition again, making sure we fully setup
before calling devm_serdev_device_open(). |
| In the Linux kernel, the following vulnerability has been resolved:
netfilter: nf_tables: flush pending destroy work before exit_net release
Similar to 2c9f0293280e ("netfilter: nf_tables: flush pending destroy
work before netlink notifier") to address a race between exit_net and
the destroy workqueue.
The trace below shows an element to be released via destroy workqueue
while exit_net path (triggered via module removal) has already released
the set that is used in such transaction.
[ 1360.547789] BUG: KASAN: slab-use-after-free in nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables]
[ 1360.547861] Read of size 8 at addr ffff888140500cc0 by task kworker/4:1/152465
[ 1360.547870] CPU: 4 PID: 152465 Comm: kworker/4:1 Not tainted 6.8.0+ #359
[ 1360.547882] Workqueue: events nf_tables_trans_destroy_work [nf_tables]
[ 1360.547984] Call Trace:
[ 1360.547991] <TASK>
[ 1360.547998] dump_stack_lvl+0x53/0x70
[ 1360.548014] print_report+0xc4/0x610
[ 1360.548026] ? __virt_addr_valid+0xba/0x160
[ 1360.548040] ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[ 1360.548054] ? nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables]
[ 1360.548176] kasan_report+0xae/0xe0
[ 1360.548189] ? nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables]
[ 1360.548312] nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables]
[ 1360.548447] ? __pfx_nf_tables_trans_destroy_work+0x10/0x10 [nf_tables]
[ 1360.548577] ? _raw_spin_unlock_irq+0x18/0x30
[ 1360.548591] process_one_work+0x2f1/0x670
[ 1360.548610] worker_thread+0x4d3/0x760
[ 1360.548627] ? __pfx_worker_thread+0x10/0x10
[ 1360.548640] kthread+0x16b/0x1b0
[ 1360.548653] ? __pfx_kthread+0x10/0x10
[ 1360.548665] ret_from_fork+0x2f/0x50
[ 1360.548679] ? __pfx_kthread+0x10/0x10
[ 1360.548690] ret_from_fork_asm+0x1a/0x30
[ 1360.548707] </TASK>
[ 1360.548719] Allocated by task 192061:
[ 1360.548726] kasan_save_stack+0x20/0x40
[ 1360.548739] kasan_save_track+0x14/0x30
[ 1360.548750] __kasan_kmalloc+0x8f/0xa0
[ 1360.548760] __kmalloc_node+0x1f1/0x450
[ 1360.548771] nf_tables_newset+0x10c7/0x1b50 [nf_tables]
[ 1360.548883] nfnetlink_rcv_batch+0xbc4/0xdc0 [nfnetlink]
[ 1360.548909] nfnetlink_rcv+0x1a8/0x1e0 [nfnetlink]
[ 1360.548927] netlink_unicast+0x367/0x4f0
[ 1360.548935] netlink_sendmsg+0x34b/0x610
[ 1360.548944] ____sys_sendmsg+0x4d4/0x510
[ 1360.548953] ___sys_sendmsg+0xc9/0x120
[ 1360.548961] __sys_sendmsg+0xbe/0x140
[ 1360.548971] do_syscall_64+0x55/0x120
[ 1360.548982] entry_SYSCALL_64_after_hwframe+0x55/0x5d
[ 1360.548994] Freed by task 192222:
[ 1360.548999] kasan_save_stack+0x20/0x40
[ 1360.549009] kasan_save_track+0x14/0x30
[ 1360.549019] kasan_save_free_info+0x3b/0x60
[ 1360.549028] poison_slab_object+0x100/0x180
[ 1360.549036] __kasan_slab_free+0x14/0x30
[ 1360.549042] kfree+0xb6/0x260
[ 1360.549049] __nft_release_table+0x473/0x6a0 [nf_tables]
[ 1360.549131] nf_tables_exit_net+0x170/0x240 [nf_tables]
[ 1360.549221] ops_exit_list+0x50/0xa0
[ 1360.549229] free_exit_list+0x101/0x140
[ 1360.549236] unregister_pernet_operations+0x107/0x160
[ 1360.549245] unregister_pernet_subsys+0x1c/0x30
[ 1360.549254] nf_tables_module_exit+0x43/0x80 [nf_tables]
[ 1360.549345] __do_sys_delete_module+0x253/0x370
[ 1360.549352] do_syscall_64+0x55/0x120
[ 1360.549360] entry_SYSCALL_64_after_hwframe+0x55/0x5d
(gdb) list *__nft_release_table+0x473
0x1e033 is in __nft_release_table (net/netfilter/nf_tables_api.c:11354).
11349 list_for_each_entry_safe(flowtable, nf, &table->flowtables, list) {
11350 list_del(&flowtable->list);
11351 nft_use_dec(&table->use);
11352 nf_tables_flowtable_destroy(flowtable);
11353 }
11354 list_for_each_entry_safe(set, ns, &table->sets, list) {
11355 list_del(&set->list);
11356 nft_use_dec(&table->use);
11357 if (set->flags & (NFT_SET_MAP | NFT_SET_OBJECT))
11358 nft_map_deactivat
---truncated--- |
| In the Linux kernel, the following vulnerability has been resolved:
netfilter: nf_tables: Fix potential data-race in __nft_flowtable_type_get()
nft_unregister_flowtable_type() within nf_flow_inet_module_exit() can
concurrent with __nft_flowtable_type_get() within nf_tables_newflowtable().
And thhere is not any protection when iterate over nf_tables_flowtables
list in __nft_flowtable_type_get(). Therefore, there is pertential
data-race of nf_tables_flowtables list entry.
Use list_for_each_entry_rcu() to iterate over nf_tables_flowtables list
in __nft_flowtable_type_get(), and use rcu_read_lock() in the caller
nft_flowtable_type_get() to protect the entire type query process. |
| In the Linux kernel, the following vulnerability has been resolved:
f2fs: compress: fix to cover {reserve,release}_compress_blocks() w/ cp_rwsem lock
It needs to cover {reserve,release}_compress_blocks() w/ cp_rwsem lock
to avoid racing with checkpoint, otherwise, filesystem metadata including
blkaddr in dnode, inode fields and .total_valid_block_count may be
corrupted after SPO case. |
| In the Linux kernel, the following vulnerability has been resolved:
tmpfs: fix race on handling dquot rbtree
A syzkaller reproducer found a race while attempting to remove dquot
information from the rb tree.
Fetching the rb_tree root node must also be protected by the
dqopt->dqio_sem, otherwise, giving the right timing, shmem_release_dquot()
will trigger a warning because it couldn't find a node in the tree, when
the real reason was the root node changing before the search starts:
Thread 1 Thread 2
- shmem_release_dquot() - shmem_{acquire,release}_dquot()
- fetch ROOT - Fetch ROOT
- acquire dqio_sem
- wait dqio_sem
- do something, triger a tree rebalance
- release dqio_sem
- acquire dqio_sem
- start searching for the node, but
from the wrong location, missing
the node, and triggering a warning. |
| In the Linux kernel, the following vulnerability has been resolved:
octeontx2-af: Use separate handlers for interrupts
For PF to AF interrupt vector and VF to AF vector same
interrupt handler is registered which is causing race condition.
When two interrupts are raised to two CPUs at same time
then two cores serve same event corrupting the data. |
| In the Linux kernel, the following vulnerability has been resolved:
mm: swap: fix race between free_swap_and_cache() and swapoff()
There was previously a theoretical window where swapoff() could run and
teardown a swap_info_struct while a call to free_swap_and_cache() was
running in another thread. This could cause, amongst other bad
possibilities, swap_page_trans_huge_swapped() (called by
free_swap_and_cache()) to access the freed memory for swap_map.
This is a theoretical problem and I haven't been able to provoke it from a
test case. But there has been agreement based on code review that this is
possible (see link below).
Fix it by using get_swap_device()/put_swap_device(), which will stall
swapoff(). There was an extra check in _swap_info_get() to confirm that
the swap entry was not free. This isn't present in get_swap_device()
because it doesn't make sense in general due to the race between getting
the reference and swapoff. So I've added an equivalent check directly in
free_swap_and_cache().
Details of how to provoke one possible issue (thanks to David Hildenbrand
for deriving this):
--8<-----
__swap_entry_free() might be the last user and result in
"count == SWAP_HAS_CACHE".
swapoff->try_to_unuse() will stop as soon as soon as si->inuse_pages==0.
So the question is: could someone reclaim the folio and turn
si->inuse_pages==0, before we completed swap_page_trans_huge_swapped().
Imagine the following: 2 MiB folio in the swapcache. Only 2 subpages are
still references by swap entries.
Process 1 still references subpage 0 via swap entry.
Process 2 still references subpage 1 via swap entry.
Process 1 quits. Calls free_swap_and_cache().
-> count == SWAP_HAS_CACHE
[then, preempted in the hypervisor etc.]
Process 2 quits. Calls free_swap_and_cache().
-> count == SWAP_HAS_CACHE
Process 2 goes ahead, passes swap_page_trans_huge_swapped(), and calls
__try_to_reclaim_swap().
__try_to_reclaim_swap()->folio_free_swap()->delete_from_swap_cache()->
put_swap_folio()->free_swap_slot()->swapcache_free_entries()->
swap_entry_free()->swap_range_free()->
...
WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries);
What stops swapoff to succeed after process 2 reclaimed the swap cache
but before process1 finished its call to swap_page_trans_huge_swapped()?
--8<----- |
| In the Linux kernel, the following vulnerability has been resolved:
net: phy: qcom: at803x: fix kernel panic with at8031_probe
On reworking and splitting the at803x driver, in splitting function of
at803x PHYs it was added a NULL dereference bug where priv is referenced
before it's actually allocated and then is tried to write to for the
is_1000basex and is_fiber variables in the case of at8031, writing on
the wrong address.
Fix this by correctly setting priv local variable only after
at803x_probe is called and actually allocates priv in the phydev struct. |
| In the Linux kernel, the following vulnerability has been resolved:
packet: annotate data-races around ignore_outgoing
ignore_outgoing is read locklessly from dev_queue_xmit_nit()
and packet_getsockopt()
Add appropriate READ_ONCE()/WRITE_ONCE() annotations.
syzbot reported:
BUG: KCSAN: data-race in dev_queue_xmit_nit / packet_setsockopt
write to 0xffff888107804542 of 1 bytes by task 22618 on cpu 0:
packet_setsockopt+0xd83/0xfd0 net/packet/af_packet.c:4003
do_sock_setsockopt net/socket.c:2311 [inline]
__sys_setsockopt+0x1d8/0x250 net/socket.c:2334
__do_sys_setsockopt net/socket.c:2343 [inline]
__se_sys_setsockopt net/socket.c:2340 [inline]
__x64_sys_setsockopt+0x66/0x80 net/socket.c:2340
do_syscall_64+0xd3/0x1d0
entry_SYSCALL_64_after_hwframe+0x6d/0x75
read to 0xffff888107804542 of 1 bytes by task 27 on cpu 1:
dev_queue_xmit_nit+0x82/0x620 net/core/dev.c:2248
xmit_one net/core/dev.c:3527 [inline]
dev_hard_start_xmit+0xcc/0x3f0 net/core/dev.c:3547
__dev_queue_xmit+0xf24/0x1dd0 net/core/dev.c:4335
dev_queue_xmit include/linux/netdevice.h:3091 [inline]
batadv_send_skb_packet+0x264/0x300 net/batman-adv/send.c:108
batadv_send_broadcast_skb+0x24/0x30 net/batman-adv/send.c:127
batadv_iv_ogm_send_to_if net/batman-adv/bat_iv_ogm.c:392 [inline]
batadv_iv_ogm_emit net/batman-adv/bat_iv_ogm.c:420 [inline]
batadv_iv_send_outstanding_bat_ogm_packet+0x3f0/0x4b0 net/batman-adv/bat_iv_ogm.c:1700
process_one_work kernel/workqueue.c:3254 [inline]
process_scheduled_works+0x465/0x990 kernel/workqueue.c:3335
worker_thread+0x526/0x730 kernel/workqueue.c:3416
kthread+0x1d1/0x210 kernel/kthread.c:388
ret_from_fork+0x4b/0x60 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:243
value changed: 0x00 -> 0x01
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 27 Comm: kworker/u8:1 Tainted: G W 6.8.0-syzkaller-08073-g480e035fc4c7 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024
Workqueue: bat_events batadv_iv_send_outstanding_bat_ogm_packet |
| In the Linux kernel, the following vulnerability has been resolved:
net/bnx2x: Prevent access to a freed page in page_pool
Fix race condition leading to system crash during EEH error handling
During EEH error recovery, the bnx2x driver's transmit timeout logic
could cause a race condition when handling reset tasks. The
bnx2x_tx_timeout() schedules reset tasks via bnx2x_sp_rtnl_task(),
which ultimately leads to bnx2x_nic_unload(). In bnx2x_nic_unload()
SGEs are freed using bnx2x_free_rx_sge_range(). However, this could
overlap with the EEH driver's attempt to reset the device using
bnx2x_io_slot_reset(), which also tries to free SGEs. This race
condition can result in system crashes due to accessing freed memory
locations in bnx2x_free_rx_sge()
799 static inline void bnx2x_free_rx_sge(struct bnx2x *bp,
800 struct bnx2x_fastpath *fp, u16 index)
801 {
802 struct sw_rx_page *sw_buf = &fp->rx_page_ring[index];
803 struct page *page = sw_buf->page;
....
where sw_buf was set to NULL after the call to dma_unmap_page()
by the preceding thread.
EEH: Beginning: 'slot_reset'
PCI 0011:01:00.0#10000: EEH: Invoking bnx2x->slot_reset()
bnx2x: [bnx2x_io_slot_reset:14228(eth1)]IO slot reset initializing...
bnx2x 0011:01:00.0: enabling device (0140 -> 0142)
bnx2x: [bnx2x_io_slot_reset:14244(eth1)]IO slot reset --> driver unload
Kernel attempted to read user page (0) - exploit attempt? (uid: 0)
BUG: Kernel NULL pointer dereference on read at 0x00000000
Faulting instruction address: 0xc0080000025065fc
Oops: Kernel access of bad area, sig: 11 [#1]
.....
Call Trace:
[c000000003c67a20] [c00800000250658c] bnx2x_io_slot_reset+0x204/0x610 [bnx2x] (unreliable)
[c000000003c67af0] [c0000000000518a8] eeh_report_reset+0xb8/0xf0
[c000000003c67b60] [c000000000052130] eeh_pe_report+0x180/0x550
[c000000003c67c70] [c00000000005318c] eeh_handle_normal_event+0x84c/0xa60
[c000000003c67d50] [c000000000053a84] eeh_event_handler+0xf4/0x170
[c000000003c67da0] [c000000000194c58] kthread+0x1c8/0x1d0
[c000000003c67e10] [c00000000000cf64] ret_from_kernel_thread+0x5c/0x64
To solve this issue, we need to verify page pool allocations before
freeing. |
| In the Linux kernel, the following vulnerability has been resolved:
RDMA/irdma: Fix KASAN issue with tasklet
KASAN testing revealed the following issue assocated with freeing an IRQ.
[50006.466686] Call Trace:
[50006.466691] <IRQ>
[50006.489538] dump_stack+0x5c/0x80
[50006.493475] print_address_description.constprop.6+0x1a/0x150
[50006.499872] ? irdma_sc_process_ceq+0x483/0x790 [irdma]
[50006.505742] ? irdma_sc_process_ceq+0x483/0x790 [irdma]
[50006.511644] kasan_report.cold.11+0x7f/0x118
[50006.516572] ? irdma_sc_process_ceq+0x483/0x790 [irdma]
[50006.522473] irdma_sc_process_ceq+0x483/0x790 [irdma]
[50006.528232] irdma_process_ceq+0xb2/0x400 [irdma]
[50006.533601] ? irdma_hw_flush_wqes_callback+0x370/0x370 [irdma]
[50006.540298] irdma_ceq_dpc+0x44/0x100 [irdma]
[50006.545306] tasklet_action_common.isra.14+0x148/0x2c0
[50006.551096] __do_softirq+0x1d0/0xaf8
[50006.555396] irq_exit_rcu+0x219/0x260
[50006.559670] irq_exit+0xa/0x20
[50006.563320] smp_apic_timer_interrupt+0x1bf/0x690
[50006.568645] apic_timer_interrupt+0xf/0x20
[50006.573341] </IRQ>
The issue is that a tasklet could be pending on another core racing
the delete of the irq.
Fix by insuring any scheduled tasklet is killed after deleting the
irq. |
| In the Linux kernel, the following vulnerability has been resolved:
net: bridge: switchdev: Skip MDB replays of deferred events on offload
Before this change, generation of the list of MDB events to replay
would race against the creation of new group memberships, either from
the IGMP/MLD snooping logic or from user configuration.
While new memberships are immediately visible to walkers of
br->mdb_list, the notification of their existence to switchdev event
subscribers is deferred until a later point in time. So if a replay
list was generated during a time that overlapped with such a window,
it would also contain a replay of the not-yet-delivered event.
The driver would thus receive two copies of what the bridge internally
considered to be one single event. On destruction of the bridge, only
a single membership deletion event was therefore sent. As a
consequence of this, drivers which reference count memberships (at
least DSA), would be left with orphan groups in their hardware
database when the bridge was destroyed.
This is only an issue when replaying additions. While deletion events
may still be pending on the deferred queue, they will already have
been removed from br->mdb_list, so no duplicates can be generated in
that scenario.
To a user this meant that old group memberships, from a bridge in
which a port was previously attached, could be reanimated (in
hardware) when the port joined a new bridge, without the new bridge's
knowledge.
For example, on an mv88e6xxx system, create a snooping bridge and
immediately add a port to it:
root@infix-06-0b-00:~$ ip link add dev br0 up type bridge mcast_snooping 1 && \
> ip link set dev x3 up master br0
And then destroy the bridge:
root@infix-06-0b-00:~$ ip link del dev br0
root@infix-06-0b-00:~$ mvls atu
ADDRESS FID STATE Q F 0 1 2 3 4 5 6 7 8 9 a
DEV:0 Marvell 88E6393X
33:33:00:00:00:6a 1 static - - 0 . . . . . . . . . .
33:33:ff:87:e4:3f 1 static - - 0 . . . . . . . . . .
ff:ff:ff:ff:ff:ff 1 static - - 0 1 2 3 4 5 6 7 8 9 a
root@infix-06-0b-00:~$
The two IPv6 groups remain in the hardware database because the
port (x3) is notified of the host's membership twice: once via the
original event and once via a replay. Since only a single delete
notification is sent, the count remains at 1 when the bridge is
destroyed.
Then add the same port (or another port belonging to the same hardware
domain) to a new bridge, this time with snooping disabled:
root@infix-06-0b-00:~$ ip link add dev br1 up type bridge mcast_snooping 0 && \
> ip link set dev x3 up master br1
All multicast, including the two IPv6 groups from br0, should now be
flooded, according to the policy of br1. But instead the old
memberships are still active in the hardware database, causing the
switch to only forward traffic to those groups towards the CPU (port
0).
Eliminate the race in two steps:
1. Grab the write-side lock of the MDB while generating the replay
list.
This prevents new memberships from showing up while we are generating
the replay list. But it leaves the scenario in which a deferred event
was already generated, but not delivered, before we grabbed the
lock. Therefore:
2. Make sure that no deferred version of a replay event is already
enqueued to the switchdev deferred queue, before adding it to the
replay list, when replaying additions. |
| In the Linux kernel, the following vulnerability has been resolved:
vfio/pci: Lock external INTx masking ops
Mask operations through config space changes to DisINTx may race INTx
configuration changes via ioctl. Create wrappers that add locking for
paths outside of the core interrupt code.
In particular, irq_type is updated holding igate, therefore testing
is_intx() requires holding igate. For example clearing DisINTx from
config space can otherwise race changes of the interrupt configuration.
This aligns interfaces which may trigger the INTx eventfd into two
camps, one side serialized by igate and the other only enabled while
INTx is configured. A subsequent patch introduces synchronization for
the latter flows. |