| CVE |
Vendors |
Products |
Updated |
CVSS v3.1 |
| In the Linux kernel, the following vulnerability has been resolved:
f2fs: ensure node page reads complete before f2fs_put_super() finishes
Xfstests generic/335, generic/336 sometimes crash with the following message:
F2FS-fs (dm-0): detect filesystem reference count leak during umount, type: 9, count: 1
------------[ cut here ]------------
kernel BUG at fs/f2fs/super.c:1939!
Oops: invalid opcode: 0000 [#1] SMP NOPTI
CPU: 1 UID: 0 PID: 609351 Comm: umount Tainted: G W 6.17.0-rc5-xfstests-g9dd1835ecda5 #1 PREEMPT(none)
Tainted: [W]=WARN
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:f2fs_put_super+0x3b3/0x3c0
Call Trace:
<TASK>
generic_shutdown_super+0x7e/0x190
kill_block_super+0x1a/0x40
kill_f2fs_super+0x9d/0x190
deactivate_locked_super+0x30/0xb0
cleanup_mnt+0xba/0x150
task_work_run+0x5c/0xa0
exit_to_user_mode_loop+0xb7/0xc0
do_syscall_64+0x1ae/0x1c0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
</TASK>
---[ end trace 0000000000000000 ]---
It appears that sometimes it is possible that f2fs_put_super() is called before
all node page reads are completed.
Adding a call to f2fs_wait_on_all_pages() for F2FS_RD_NODE fixes the problem. |
| In the Linux kernel, the following vulnerability has been resolved:
iommufd/selftest: Check for overflow in IOMMU_TEST_OP_ADD_RESERVED
syzkaller found it could overflow math in the test infrastructure and
cause a WARN_ON by corrupting the reserved interval tree. This only
effects test kernels with CONFIG_IOMMUFD_TEST.
Validate the user input length in the test ioctl. |
| In the Linux kernel, the following vulnerability has been resolved:
erspan: Initialize options_len before referencing options.
The struct ip_tunnel_info has a flexible array member named
options that is protected by a counted_by(options_len)
attribute.
The compiler will use this information to enforce runtime bounds
checking deployed by FORTIFY_SOURCE string helpers.
As laid out in the GCC documentation, the counter must be
initialized before the first reference to the flexible array
member.
After scanning through the files that use struct ip_tunnel_info
and also refer to options or options_len, it appears the normal
case is to use the ip_tunnel_info_opts_set() helper.
Said helper would initialize options_len properly before copying
data into options, however in the GRE ERSPAN code a partial
update is done, preventing the use of the helper function.
Before this change the handling of ERSPAN traffic in GRE tunnels
would cause a kernel panic when the kernel is compiled with
GCC 15+ and having FORTIFY_SOURCE configured:
memcpy: detected buffer overflow: 4 byte write of buffer size 0
Call Trace:
<IRQ>
__fortify_panic+0xd/0xf
erspan_rcv.cold+0x68/0x83
? ip_route_input_slow+0x816/0x9d0
gre_rcv+0x1b2/0x1c0
gre_rcv+0x8e/0x100
? raw_v4_input+0x2a0/0x2b0
ip_protocol_deliver_rcu+0x1ea/0x210
ip_local_deliver_finish+0x86/0x110
ip_local_deliver+0x65/0x110
? ip_rcv_finish_core+0xd6/0x360
ip_rcv+0x186/0x1a0
Reported-at: https://launchpad.net/bugs/2129580 |
| In the Linux kernel, the following vulnerability has been resolved:
cpuset: fix warning when disabling remote partition
A warning was triggered as follows:
WARNING: kernel/cgroup/cpuset.c:1651 at remote_partition_disable+0xf7/0x110
RIP: 0010:remote_partition_disable+0xf7/0x110
RSP: 0018:ffffc90001947d88 EFLAGS: 00000206
RAX: 0000000000007fff RBX: ffff888103b6e000 RCX: 0000000000006f40
RDX: 0000000000006f00 RSI: ffffc90001947da8 RDI: ffff888103b6e000
RBP: ffff888103b6e000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: ffff88810b2e2728 R12: ffffc90001947da8
R13: 0000000000000000 R14: ffffc90001947da8 R15: ffff8881081f1c00
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f55c8bbe0b2 CR3: 000000010b14c000 CR4: 00000000000006f0
Call Trace:
<TASK>
update_prstate+0x2d3/0x580
cpuset_partition_write+0x94/0xf0
kernfs_fop_write_iter+0x147/0x200
vfs_write+0x35d/0x500
ksys_write+0x66/0xe0
do_syscall_64+0x6b/0x390
entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x7f55c8cd4887
Reproduction steps (on a 16-CPU machine):
# cd /sys/fs/cgroup/
# mkdir A1
# echo +cpuset > A1/cgroup.subtree_control
# echo "0-14" > A1/cpuset.cpus.exclusive
# mkdir A1/A2
# echo "0-14" > A1/A2/cpuset.cpus.exclusive
# echo "root" > A1/A2/cpuset.cpus.partition
# echo 0 > /sys/devices/system/cpu/cpu15/online
# echo member > A1/A2/cpuset.cpus.partition
When CPU 15 is offlined, subpartitions_cpus gets cleared because no CPUs
remain available for the top_cpuset, forcing partitions to share CPUs with
the top_cpuset. In this scenario, disabling the remote partition triggers
a warning stating that effective_xcpus is not a subset of
subpartitions_cpus. Partitions should be invalidated in this case to
inform users that the partition is now invalid(cpus are shared with
top_cpuset).
To fix this issue:
1. Only emit the warning only if subpartitions_cpus is not empty and the
effective_xcpus is not a subset of subpartitions_cpus.
2. During the CPU hotplug process, invalidate partitions if
subpartitions_cpus is empty. |
| In the Linux kernel, the following vulnerability has been resolved:
media: mediatek: vcodec: Use spinlock for context list protection lock
Previously a mutex was added to protect the encoder and decoder context
lists from unexpected changes originating from the SCP IP block, causing
the context pointer to go invalid, resulting in a NULL pointer
dereference in the IPI handler.
Turns out on the MT8173, the VPU IPI handler is called from hard IRQ
context. This causes a big warning from the scheduler. This was first
reported downstream on the ChromeOS kernels, but is also reproducible
on mainline using Fluster with the FFmpeg v4l2m2m decoders. Even though
the actual capture format is not supported, the affected code paths
are triggered.
Since this lock just protects the context list and operations on it are
very fast, it should be OK to switch to a spinlock. |
| In the Linux kernel, the following vulnerability has been resolved:
drm/msm: adreno: fix deferencing ifpc_reglist when not declared
On plaforms with an a7xx GPU not supporting IFPC, the ifpc_reglist
if still deferenced in a7xx_patch_pwrup_reglist() which causes
a kernel crash:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
...
pc : a6xx_hw_init+0x155c/0x1e4c [msm]
lr : a6xx_hw_init+0x9a8/0x1e4c [msm]
...
Call trace:
a6xx_hw_init+0x155c/0x1e4c [msm] (P)
msm_gpu_hw_init+0x58/0x88 [msm]
adreno_load_gpu+0x94/0x1fc [msm]
msm_open+0xe4/0xf4 [msm]
drm_file_alloc+0x1a0/0x2e4 [drm]
drm_client_init+0x7c/0x104 [drm]
drm_fbdev_client_setup+0x94/0xcf0 [drm_client_lib]
drm_client_setup+0xb4/0xd8 [drm_client_lib]
msm_drm_kms_post_init+0x2c/0x3c [msm]
msm_drm_init+0x1a4/0x228 [msm]
msm_drm_bind+0x30/0x3c [msm]
...
Check the validity of ifpc_reglist before deferencing the table
to setup the register values.
Patchwork: https://patchwork.freedesktop.org/patch/688944/ |
| In the Linux kernel, the following vulnerability has been resolved:
block: Remove queue freezing from several sysfs store callbacks
Freezing the request queue from inside sysfs store callbacks may cause a
deadlock in combination with the dm-multipath driver and the
queue_if_no_path option. Additionally, freezing the request queue slows
down system boot on systems where sysfs attributes are set synchronously.
Fix this by removing the blk_mq_freeze_queue() / blk_mq_unfreeze_queue()
calls from the store callbacks that do not strictly need these callbacks.
Add the __data_racy annotation to request_queue.rq_timeout to suppress
KCSAN data race reports about the rq_timeout reads.
This patch may cause a small delay in applying the new settings.
For all the attributes affected by this patch, I/O will complete
correctly whether the old or the new value of the attribute is used.
This patch affects the following sysfs attributes:
* io_poll_delay
* io_timeout
* nomerges
* read_ahead_kb
* rq_affinity
Here is an example of a deadlock triggered by running test srp/002
if this patch is not applied:
task:multipathd
Call Trace:
<TASK>
__schedule+0x8c1/0x1bf0
schedule+0xdd/0x270
schedule_preempt_disabled+0x1c/0x30
__mutex_lock+0xb89/0x1650
mutex_lock_nested+0x1f/0x30
dm_table_set_restrictions+0x823/0xdf0
__bind+0x166/0x590
dm_swap_table+0x2a7/0x490
do_resume+0x1b1/0x610
dev_suspend+0x55/0x1a0
ctl_ioctl+0x3a5/0x7e0
dm_ctl_ioctl+0x12/0x20
__x64_sys_ioctl+0x127/0x1a0
x64_sys_call+0xe2b/0x17d0
do_syscall_64+0x96/0x3a0
entry_SYSCALL_64_after_hwframe+0x4b/0x53
</TASK>
task:(udev-worker)
Call Trace:
<TASK>
__schedule+0x8c1/0x1bf0
schedule+0xdd/0x270
blk_mq_freeze_queue_wait+0xf2/0x140
blk_mq_freeze_queue_nomemsave+0x23/0x30
queue_ra_store+0x14e/0x290
queue_attr_store+0x23e/0x2c0
sysfs_kf_write+0xde/0x140
kernfs_fop_write_iter+0x3b2/0x630
vfs_write+0x4fd/0x1390
ksys_write+0xfd/0x230
__x64_sys_write+0x76/0xc0
x64_sys_call+0x276/0x17d0
do_syscall_64+0x96/0x3a0
entry_SYSCALL_64_after_hwframe+0x4b/0x53
</TASK> |
| In the Linux kernel, the following vulnerability has been resolved:
md/raid5: fix possible null-pointer dereferences in raid5_store_group_thread_cnt()
The variable mddev->private is first assigned to conf and then checked:
conf = mddev->private;
if (!conf) ...
If conf is NULL, then mddev->private is also NULL. In this case,
null-pointer dereferences can occur when calling raid5_quiesce():
raid5_quiesce(mddev, true);
raid5_quiesce(mddev, false);
since mddev->private is assigned to conf again in raid5_quiesce(), and conf
is dereferenced in several places, for example:
conf->quiesce = 0;
wake_up(&conf->wait_for_quiescent);
To fix this issue, the function should unlock mddev and return before
invoking raid5_quiesce() when conf is NULL, following the existing pattern
in raid5_change_consistency_policy(). |
| In the Linux kernel, the following vulnerability has been resolved:
mm/slub: reset KASAN tag in defer_free() before accessing freed memory
When CONFIG_SLUB_TINY is enabled, kfree_nolock() calls kasan_slab_free()
before defer_free(). On ARM64 with MTE (Memory Tagging Extension),
kasan_slab_free() poisons the memory and changes the tag from the
original (e.g., 0xf3) to a poison tag (0xfe).
When defer_free() then tries to write to the freed object to build the
deferred free list via llist_add(), the pointer still has the old tag,
causing a tag mismatch and triggering a KASAN use-after-free report:
BUG: KASAN: slab-use-after-free in defer_free+0x3c/0xbc mm/slub.c:6537
Write at addr f3f000000854f020 by task kworker/u8:6/983
Pointer tag: [f3], memory tag: [fe]
Fix this by calling kasan_reset_tag() before accessing the freed memory.
This is safe because defer_free() is part of the allocator itself and is
expected to manipulate freed memory for bookkeeping purposes. |
| In the Linux kernel, the following vulnerability has been resolved:
drm/msm/dpu: Add missing NULL pointer check for pingpong interface
It is checked almost always in dpu_encoder_phys_wb_setup_ctl(), but in a
single place the check is missing.
Also use convenient locals instead of phys_enc->* where available.
Patchwork: https://patchwork.freedesktop.org/patch/693860/ |
| In the Linux kernel, the following vulnerability has been resolved:
nvme: avoid double free special payload
If a discard request needs to be retried, and that retry may fail before
a new special payload is added, a double free will result. Clear the
RQF_SPECIAL_LOAD when the request is cleaned. |
| In the Linux kernel, the following vulnerability has been resolved:
block/ioctl: prefer different overflow check
Running syzkaller with the newly reintroduced signed integer overflow
sanitizer shows this report:
[ 62.982337] ------------[ cut here ]------------
[ 62.985692] cgroup: Invalid name
[ 62.986211] UBSAN: signed-integer-overflow in ../block/ioctl.c:36:46
[ 62.989370] 9pnet_fd: p9_fd_create_tcp (7343): problem connecting socket to 127.0.0.1
[ 62.992992] 9223372036854775807 + 4095 cannot be represented in type 'long long'
[ 62.997827] 9pnet_fd: p9_fd_create_tcp (7345): problem connecting socket to 127.0.0.1
[ 62.999369] random: crng reseeded on system resumption
[ 63.000634] GUP no longer grows the stack in syz-executor.2 (7353): 20002000-20003000 (20001000)
[ 63.000668] CPU: 0 PID: 7353 Comm: syz-executor.2 Not tainted 6.8.0-rc2-00035-gb3ef86b5a957 #1
[ 63.000677] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[ 63.000682] Call Trace:
[ 63.000686] <TASK>
[ 63.000731] dump_stack_lvl+0x93/0xd0
[ 63.000919] __get_user_pages+0x903/0xd30
[ 63.001030] __gup_longterm_locked+0x153e/0x1ba0
[ 63.001041] ? _raw_read_unlock_irqrestore+0x17/0x50
[ 63.001072] ? try_get_folio+0x29c/0x2d0
[ 63.001083] internal_get_user_pages_fast+0x1119/0x1530
[ 63.001109] iov_iter_extract_pages+0x23b/0x580
[ 63.001206] bio_iov_iter_get_pages+0x4de/0x1220
[ 63.001235] iomap_dio_bio_iter+0x9b6/0x1410
[ 63.001297] __iomap_dio_rw+0xab4/0x1810
[ 63.001316] iomap_dio_rw+0x45/0xa0
[ 63.001328] ext4_file_write_iter+0xdde/0x1390
[ 63.001372] vfs_write+0x599/0xbd0
[ 63.001394] ksys_write+0xc8/0x190
[ 63.001403] do_syscall_64+0xd4/0x1b0
[ 63.001421] ? arch_exit_to_user_mode_prepare+0x3a/0x60
[ 63.001479] entry_SYSCALL_64_after_hwframe+0x6f/0x77
[ 63.001535] RIP: 0033:0x7f7fd3ebf539
[ 63.001551] Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 14 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
[ 63.001562] RSP: 002b:00007f7fd32570c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 63.001584] RAX: ffffffffffffffda RBX: 00007f7fd3ff3f80 RCX: 00007f7fd3ebf539
[ 63.001590] RDX: 4db6d1e4f7e43360 RSI: 0000000020000000 RDI: 0000000000000004
[ 63.001595] RBP: 00007f7fd3f1e496 R08: 0000000000000000 R09: 0000000000000000
[ 63.001599] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 63.001604] R13: 0000000000000006 R14: 00007f7fd3ff3f80 R15: 00007ffd415ad2b8
...
[ 63.018142] ---[ end trace ]---
Historically, the signed integer overflow sanitizer did not work in the
kernel due to its interaction with `-fwrapv` but this has since been
changed [1] in the newest version of Clang; It was re-enabled in the
kernel with Commit 557f8c582a9ba8ab ("ubsan: Reintroduce signed overflow
sanitizer").
Let's rework this overflow checking logic to not actually perform an
overflow during the check itself, thus avoiding the UBSAN splat.
[1]: https://github.com/llvm/llvm-project/pull/82432 |
| Unspecified vulnerability in the ExternalInterface ActionScript functionality in Adobe Flash Player before 10.3.183.67 and 11.x before 11.6.602.171 on Windows and Mac OS X, and before 10.3.183.67 and 11.x before 11.2.202.273 on Linux, allows remote attackers to execute arbitrary code via crafted SWF content, as exploited in the wild in February 2013. |
| In the Linux kernel, the following vulnerability has been resolved:
media: venus: Add a check for packet size after reading from shared memory
Add a check to ensure that the packet size does not exceed the number of
available words after reading the packet header from shared memory. This
ensures that the size provided by the firmware is safe to process and
prevent potential out-of-bounds memory access. |
| In the Linux kernel, the following vulnerability has been resolved:
media: usbtv: Lock resolution while streaming
When an program is streaming (ffplay) and another program (qv4l2)
changes the TV standard from NTSC to PAL, the kernel crashes due to trying
to copy to unmapped memory.
Changing from NTSC to PAL increases the resolution in the usbtv struct,
but the video plane buffer isn't adjusted, so it overflows.
[hverkuil: call vb2_is_busy instead of vb2_is_streaming] |
| In the Linux kernel, the following vulnerability has been resolved:
serial: 8250: fix panic due to PSLVERR
When the PSLVERR_RESP_EN parameter is set to 1, the device generates
an error response if an attempt is made to read an empty RBR (Receive
Buffer Register) while the FIFO is enabled.
In serial8250_do_startup(), calling serial_port_out(port, UART_LCR,
UART_LCR_WLEN8) triggers dw8250_check_lcr(), which invokes
dw8250_force_idle() and serial8250_clear_and_reinit_fifos(). The latter
function enables the FIFO via serial_out(p, UART_FCR, p->fcr).
Execution proceeds to the serial_port_in(port, UART_RX).
This satisfies the PSLVERR trigger condition.
When another CPU (e.g., using printk()) is accessing the UART (UART
is busy), the current CPU fails the check (value & ~UART_LCR_SPAR) ==
(lcr & ~UART_LCR_SPAR) in dw8250_check_lcr(), causing it to enter
dw8250_force_idle().
Put serial_port_out(port, UART_LCR, UART_LCR_WLEN8) under the port->lock
to fix this issue.
Panic backtrace:
[ 0.442336] Oops - unknown exception [#1]
[ 0.442343] epc : dw8250_serial_in32+0x1e/0x4a
[ 0.442351] ra : serial8250_do_startup+0x2c8/0x88e
...
[ 0.442416] console_on_rootfs+0x26/0x70 |
| In the Linux kernel, the following vulnerability has been resolved:
NFS: Fix filehandle bounds checking in nfs_fh_to_dentry()
The function needs to check the minimal filehandle length before it can
access the embedded filehandle. |
| In the Linux kernel, the following vulnerability has been resolved:
Revert "fs/ntfs3: Replace inode_trylock with inode_lock"
This reverts commit 69505fe98f198ee813898cbcaf6770949636430b.
Initially, conditional lock acquisition was removed to fix an xfstest bug
that was observed during internal testing. The deadlock reported by syzbot
is resolved by reintroducing conditional acquisition. The xfstest bug no
longer occurs on kernel version 6.16-rc1 during internal testing. I
assume that changes in other modules may have contributed to this. |
| In the Linux kernel, the following vulnerability has been resolved:
smb: client: Fix use-after-free in cifs_fill_dirent
There is a race condition in the readdir concurrency process, which may
access the rsp buffer after it has been released, triggering the
following KASAN warning.
==================================================================
BUG: KASAN: slab-use-after-free in cifs_fill_dirent+0xb03/0xb60 [cifs]
Read of size 4 at addr ffff8880099b819c by task a.out/342975
CPU: 2 UID: 0 PID: 342975 Comm: a.out Not tainted 6.15.0-rc6+ #240 PREEMPT(full)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x53/0x70
print_report+0xce/0x640
kasan_report+0xb8/0xf0
cifs_fill_dirent+0xb03/0xb60 [cifs]
cifs_readdir+0x12cb/0x3190 [cifs]
iterate_dir+0x1a1/0x520
__x64_sys_getdents+0x134/0x220
do_syscall_64+0x4b/0x110
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f996f64b9f9
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01
f0 ff ff 0d f7 c3 0c 00 f7 d8 64 89 8
RSP: 002b:00007f996f53de78 EFLAGS: 00000207 ORIG_RAX: 000000000000004e
RAX: ffffffffffffffda RBX: 00007f996f53ecdc RCX: 00007f996f64b9f9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 00007f996f53dea0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000207 R12: ffffffffffffff88
R13: 0000000000000000 R14: 00007ffc8cd9a500 R15: 00007f996f51e000
</TASK>
Allocated by task 408:
kasan_save_stack+0x20/0x40
kasan_save_track+0x14/0x30
__kasan_slab_alloc+0x6e/0x70
kmem_cache_alloc_noprof+0x117/0x3d0
mempool_alloc_noprof+0xf2/0x2c0
cifs_buf_get+0x36/0x80 [cifs]
allocate_buffers+0x1d2/0x330 [cifs]
cifs_demultiplex_thread+0x22b/0x2690 [cifs]
kthread+0x394/0x720
ret_from_fork+0x34/0x70
ret_from_fork_asm+0x1a/0x30
Freed by task 342979:
kasan_save_stack+0x20/0x40
kasan_save_track+0x14/0x30
kasan_save_free_info+0x3b/0x60
__kasan_slab_free+0x37/0x50
kmem_cache_free+0x2b8/0x500
cifs_buf_release+0x3c/0x70 [cifs]
cifs_readdir+0x1c97/0x3190 [cifs]
iterate_dir+0x1a1/0x520
__x64_sys_getdents64+0x134/0x220
do_syscall_64+0x4b/0x110
entry_SYSCALL_64_after_hwframe+0x76/0x7e
The buggy address belongs to the object at ffff8880099b8000
which belongs to the cache cifs_request of size 16588
The buggy address is located 412 bytes inside of
freed 16588-byte region [ffff8880099b8000, ffff8880099bc0cc)
The buggy address belongs to the physical page:
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x99b8
head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
anon flags: 0x80000000000040(head|node=0|zone=1)
page_type: f5(slab)
raw: 0080000000000040 ffff888001e03400 0000000000000000 dead000000000001
raw: 0000000000000000 0000000000010001 00000000f5000000 0000000000000000
head: 0080000000000040 ffff888001e03400 0000000000000000 dead000000000001
head: 0000000000000000 0000000000010001 00000000f5000000 0000000000000000
head: 0080000000000003 ffffea0000266e01 00000000ffffffff 00000000ffffffff
head: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000008
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff8880099b8080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8880099b8100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff8880099b8180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8880099b8200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8880099b8280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
POC is available in the link [1].
The problem triggering process is as follows:
Process 1 Process 2
-----------------------------------
---truncated--- |
| In the Linux kernel, the following vulnerability has been resolved:
net/ipv6: release expired exception dst cached in socket
Dst objects get leaked in ip6_negative_advice() when this function is
executed for an expired IPv6 route located in the exception table. There
are several conditions that must be fulfilled for the leak to occur:
* an ICMPv6 packet indicating a change of the MTU for the path is received,
resulting in an exception dst being created
* a TCP connection that uses the exception dst for routing packets must
start timing out so that TCP begins retransmissions
* after the exception dst expires, the FIB6 garbage collector must not run
before TCP executes ip6_negative_advice() for the expired exception dst
When TCP executes ip6_negative_advice() for an exception dst that has
expired and if no other socket holds a reference to the exception dst, the
refcount of the exception dst is 2, which corresponds to the increment
made by dst_init() and the increment made by the TCP socket for which the
connection is timing out. The refcount made by the socket is never
released. The refcount of the dst is decremented in sk_dst_reset() but
that decrement is counteracted by a dst_hold() intentionally placed just
before the sk_dst_reset() in ip6_negative_advice(). After
ip6_negative_advice() has finished, there is no other object tied to the
dst. The socket lost its reference stored in sk_dst_cache and the dst is
no longer in the exception table. The exception dst becomes a leaked
object.
As a result of this dst leak, an unbalanced refcount is reported for the
loopback device of a net namespace being destroyed under kernels that do
not contain e5f80fcf869a ("ipv6: give an IPv6 dev to blackhole_netdev"):
unregister_netdevice: waiting for lo to become free. Usage count = 2
Fix the dst leak by removing the dst_hold() in ip6_negative_advice(). The
patch that introduced the dst_hold() in ip6_negative_advice() was
92f1655aa2b22 ("net: fix __dst_negative_advice() race"). But 92f1655aa2b22
merely refactored the code with regards to the dst refcount so the issue
was present even before 92f1655aa2b22. The bug was introduced in
54c1a859efd9f ("ipv6: Don't drop cache route entry unless timer actually
expired.") where the expired cached route is deleted and the sk_dst_cache
member of the socket is set to NULL by calling dst_negative_advice() but
the refcount belonging to the socket is left unbalanced.
The IPv4 version - ipv4_negative_advice() - is not affected by this bug.
When the TCP connection times out ipv4_negative_advice() merely resets the
sk_dst_cache of the socket while decrementing the refcount of the
exception dst. |