| CVE |
Vendors |
Products |
Updated |
CVSS v3.1 |
| In the Linux kernel, the following vulnerability has been resolved:
dmaengine: sf-pdma: pdma_desc memory leak fix
Commit b2cc5c465c2c ("dmaengine: sf-pdma: Add multithread support for a
DMA channel") changed sf_pdma_prep_dma_memcpy() to unconditionally
allocate a new sf_pdma_desc each time it is called.
The driver previously recycled descs, by checking the in_use flag, only
allocating additional descs if the existing one was in use. This logic
was removed in commit b2cc5c465c2c ("dmaengine: sf-pdma: Add multithread
support for a DMA channel"), but sf_pdma_free_desc() was not changed to
handle the new behaviour.
As a result, each time sf_pdma_prep_dma_memcpy() is called, the previous
descriptor is leaked, over time leading to memory starvation:
unreferenced object 0xffffffe008447300 (size 192):
comm "irq/39-mchp_dsc", pid 343, jiffies 4294906910 (age 981.200s)
hex dump (first 32 bytes):
00 00 00 ff 00 00 00 00 b8 c1 00 00 00 00 00 00 ................
00 00 70 08 10 00 00 00 00 00 00 c0 00 00 00 00 ..p.............
backtrace:
[<00000000064a04f4>] kmemleak_alloc+0x1e/0x28
[<00000000018927a7>] kmem_cache_alloc+0x11e/0x178
[<000000002aea8d16>] sf_pdma_prep_dma_memcpy+0x40/0x112
Add the missing kfree() to sf_pdma_free_desc(), and remove the redundant
in_use flag. |
| In the Linux kernel, the following vulnerability has been resolved:
net/mlx5: Devcom, fix error flow in mlx5_devcom_register_device
In case devcom allocation is failed, mlx5 is always freeing the priv.
However, this priv might have been allocated by a different thread,
and freeing it might lead to use-after-free bugs.
Fix it by freeing the priv only in case it was allocated by the
running thread. |
| In the Linux kernel, the following vulnerability has been resolved:
io_uring/net: don't overflow multishot recv
Don't allow overflowing multishot recv CQEs, it might get out of
hand, hurt performance, and in the worst case scenario OOM the task. |
| In the Linux kernel, the following vulnerability has been resolved:
drm/msm/hdmi: Add missing check for alloc_ordered_workqueue
Add check for the return value of alloc_ordered_workqueue as it may return
NULL pointer and cause NULL pointer dereference in `hdmi_hdcp.c` and
`hdmi_hpd.c`.
Patchwork: https://patchwork.freedesktop.org/patch/517211/ |
| In the Linux kernel, the following vulnerability has been resolved:
RDMA/rxe: Fix the error "trying to register non-static key in rxe_cleanup_task"
In the function rxe_create_qp(), rxe_qp_from_init() is called to
initialize qp, internally things like rxe_init_task are not setup until
rxe_qp_init_req().
If an error occurred before this point then the unwind will call
rxe_cleanup() and eventually to rxe_qp_do_cleanup()/rxe_cleanup_task()
which will oops when trying to access the uninitialized spinlock.
If rxe_init_task is not executed, rxe_cleanup_task will not be called. |
| In the Linux kernel, the following vulnerability has been resolved:
btrfs: fix racy bitfield write in btrfs_clear_space_info_full()
From the memory-barriers.txt document regarding memory barrier ordering
guarantees:
(*) These guarantees do not apply to bitfields, because compilers often
generate code to modify these using non-atomic read-modify-write
sequences. Do not attempt to use bitfields to synchronize parallel
algorithms.
(*) Even in cases where bitfields are protected by locks, all fields
in a given bitfield must be protected by one lock. If two fields
in a given bitfield are protected by different locks, the compiler's
non-atomic read-modify-write sequences can cause an update to one
field to corrupt the value of an adjacent field.
btrfs_space_info has a bitfield sharing an underlying word consisting of
the fields full, chunk_alloc, and flush:
struct btrfs_space_info {
struct btrfs_fs_info * fs_info; /* 0 8 */
struct btrfs_space_info * parent; /* 8 8 */
...
int clamp; /* 172 4 */
unsigned int full:1; /* 176: 0 4 */
unsigned int chunk_alloc:1; /* 176: 1 4 */
unsigned int flush:1; /* 176: 2 4 */
...
Therefore, to be safe from parallel read-modify-writes losing a write to
one of the bitfield members protected by a lock, all writes to all the
bitfields must use the lock. They almost universally do, except for
btrfs_clear_space_info_full() which iterates over the space_infos and
writes out found->full = 0 without a lock.
Imagine that we have one thread completing a transaction in which we
finished deleting a block_group and are thus calling
btrfs_clear_space_info_full() while simultaneously the data reclaim
ticket infrastructure is running do_async_reclaim_data_space():
T1 T2
btrfs_commit_transaction
btrfs_clear_space_info_full
data_sinfo->full = 0
READ: full:0, chunk_alloc:0, flush:1
do_async_reclaim_data_space(data_sinfo)
spin_lock(&space_info->lock);
if(list_empty(tickets))
space_info->flush = 0;
READ: full: 0, chunk_alloc:0, flush:1
MOD/WRITE: full: 0, chunk_alloc:0, flush:0
spin_unlock(&space_info->lock);
return;
MOD/WRITE: full:0, chunk_alloc:0, flush:1
and now data_sinfo->flush is 1 but the reclaim worker has exited. This
breaks the invariant that flush is 0 iff there is no work queued or
running. Once this invariant is violated, future allocations that go
into __reserve_bytes() will add tickets to space_info->tickets but will
see space_info->flush is set to 1 and not queue the work. After this,
they will block forever on the resulting ticket, as it is now impossible
to kick the worker again.
I also confirmed by looking at the assembly of the affected kernel that
it is doing RMW operations. For example, to set the flush (3rd) bit to 0,
the assembly is:
andb $0xfb,0x60(%rbx)
and similarly for setting the full (1st) bit to 0:
andb $0xfe,-0x20(%rax)
So I think this is really a bug on practical systems. I have observed
a number of systems in this exact state, but am currently unable to
reproduce it.
Rather than leaving this footgun lying around for the future, take
advantage of the fact that there is room in the struct anyway, and that
it is already quite large and simply change the three bitfield members to
bools. This avoids writes to space_info->full having any effect on
---truncated--- |
| In the Linux kernel, the following vulnerability has been resolved:
bpf: Fix exclusive map memory leak
When excl_prog_hash is 0 and excl_prog_hash_size is non-zero, the map also
needs to be freed. Otherwise, the map memory will not be reclaimed, just
like the memory leak problem reported by syzbot [1].
syzbot reported:
BUG: memory leak
backtrace (crc 7b9fb9b4):
map_create+0x322/0x11e0 kernel/bpf/syscall.c:1512
__sys_bpf+0x3556/0x3610 kernel/bpf/syscall.c:6131 |
| In the Linux kernel, the following vulnerability has been resolved:
USB: gadget: Fix use-after-free during usb config switch
In the process of switching USB config from rndis to other config,
if the hardware does not support the ->pullup callback, or the
hardware encounters a low probability fault, both of them may cause
the ->pullup callback to fail, which will then cause a system panic
(use after free).
The gadget drivers sometimes need to be unloaded regardless of the
hardware's behavior.
Analysis as follows:
=======================================================================
(1) write /config/usb_gadget/g1/UDC "none"
gether_disconnect+0x2c/0x1f8
rndis_disable+0x4c/0x74
composite_disconnect+0x74/0xb0
configfs_composite_disconnect+0x60/0x7c
usb_gadget_disconnect+0x70/0x124
usb_gadget_unregister_driver+0xc8/0x1d8
gadget_dev_desc_UDC_store+0xec/0x1e4
(2) rm /config/usb_gadget/g1/configs/b.1/f1
rndis_deregister+0x28/0x54
rndis_free+0x44/0x7c
usb_put_function+0x14/0x1c
config_usb_cfg_unlink+0xc4/0xe0
configfs_unlink+0x124/0x1c8
vfs_unlink+0x114/0x1dc
(3) rmdir /config/usb_gadget/g1/functions/rndis.gs4
panic+0x1fc/0x3d0
do_page_fault+0xa8/0x46c
do_mem_abort+0x3c/0xac
el1_sync_handler+0x40/0x78
0xffffff801138f880
rndis_close+0x28/0x34
eth_stop+0x74/0x110
dev_close_many+0x48/0x194
rollback_registered_many+0x118/0x814
unregister_netdev+0x20/0x30
gether_cleanup+0x1c/0x38
rndis_attr_release+0xc/0x14
kref_put+0x74/0xb8
configfs_rmdir+0x314/0x374
If gadget->ops->pullup() return an error, function rndis_close() will be
called, then it will causes a use-after-free problem.
======================================================================= |
| In the Linux kernel, the following vulnerability has been resolved:
soc: qcom: smsm: Fix refcount leak bugs in qcom_smsm_probe()
There are two refcount leak bugs in qcom_smsm_probe():
(1) The 'local_node' is escaped out from for_each_child_of_node() as
the break of iteration, we should call of_node_put() for it in error
path or when it is not used anymore.
(2) The 'node' is escaped out from for_each_available_child_of_node()
as the 'goto', we should call of_node_put() for it in goto target. |
| In the Linux kernel, the following vulnerability has been resolved:
block: fix memory leak in __blkdev_issue_zero_pages
Move the fatal signal check before bio_alloc() to prevent a memory
leak when BLKDEV_ZERO_KILLABLE is set and a fatal signal is pending.
Previously, the bio was allocated before checking for a fatal signal.
If a signal was pending, the code would break out of the loop without
freeing or chaining the just-allocated bio, causing a memory leak.
This matches the pattern already used in __blkdev_issue_write_zeroes()
where the signal check precedes the allocation. |
| In the Linux kernel, the following vulnerability has been resolved:
ASoC: da7219: Fix an error handling path in da7219_register_dai_clks()
If clk_hw_register() fails, the corresponding clk should not be
unregistered.
To handle errors from loops, clean up partial iterations before doing the
goto. So add a clk_hw_unregister().
Then use a while (--i >= 0) loop in the unwind section. |
| In the Linux kernel, the following vulnerability has been resolved:
net/ieee802154: don't warn zero-sized raw_sendmsg()
syzbot is hitting skb_assert_len() warning at __dev_queue_xmit() [1],
for PF_IEEE802154 socket's zero-sized raw_sendmsg() request is hitting
__dev_queue_xmit() with skb->len == 0.
Since PF_IEEE802154 socket's zero-sized raw_sendmsg() request was
able to return 0, don't call __dev_queue_xmit() if packet length is 0.
----------
#include <sys/socket.h>
#include <netinet/in.h>
int main(int argc, char *argv[])
{
struct sockaddr_in addr = { .sin_family = AF_INET, .sin_addr.s_addr = htonl(INADDR_LOOPBACK) };
struct iovec iov = { };
struct msghdr hdr = { .msg_name = &addr, .msg_namelen = sizeof(addr), .msg_iov = &iov, .msg_iovlen = 1 };
sendmsg(socket(PF_IEEE802154, SOCK_RAW, 0), &hdr, 0);
return 0;
}
----------
Note that this might be a sign that commit fd1894224407c484 ("bpf: Don't
redirect packets with invalid pkt_len") should be reverted, for
skb->len == 0 was acceptable for at least PF_IEEE802154 socket. |
| In the Linux kernel, the following vulnerability has been resolved:
vdpa_sim: fix possible memory leak in vdpasim_net_init() and vdpasim_blk_init()
Inject fault while probing module, if device_register() fails in
vdpasim_net_init() or vdpasim_blk_init(), but the refcount of kobject is
not decreased to 0, the name allocated in dev_set_name() is leaked.
Fix this by calling put_device(), so that name can be freed in
callback function kobject_cleanup().
(vdpa_sim_net)
unreferenced object 0xffff88807eebc370 (size 16):
comm "modprobe", pid 3848, jiffies 4362982860 (age 18.153s)
hex dump (first 16 bytes):
76 64 70 61 73 69 6d 5f 6e 65 74 00 6b 6b 6b a5 vdpasim_net.kkk.
backtrace:
[<ffffffff8174f19e>] __kmalloc_node_track_caller+0x4e/0x150
[<ffffffff81731d53>] kstrdup+0x33/0x60
[<ffffffff83a5d421>] kobject_set_name_vargs+0x41/0x110
[<ffffffff82d87aab>] dev_set_name+0xab/0xe0
[<ffffffff82d91a23>] device_add+0xe3/0x1a80
[<ffffffffa0270013>] 0xffffffffa0270013
[<ffffffff81001c27>] do_one_initcall+0x87/0x2e0
[<ffffffff813739cb>] do_init_module+0x1ab/0x640
[<ffffffff81379d20>] load_module+0x5d00/0x77f0
[<ffffffff8137bc40>] __do_sys_finit_module+0x110/0x1b0
[<ffffffff83c4d505>] do_syscall_64+0x35/0x80
[<ffffffff83e0006a>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
(vdpa_sim_blk)
unreferenced object 0xffff8881070c1250 (size 16):
comm "modprobe", pid 6844, jiffies 4364069319 (age 17.572s)
hex dump (first 16 bytes):
76 64 70 61 73 69 6d 5f 62 6c 6b 00 6b 6b 6b a5 vdpasim_blk.kkk.
backtrace:
[<ffffffff8174f19e>] __kmalloc_node_track_caller+0x4e/0x150
[<ffffffff81731d53>] kstrdup+0x33/0x60
[<ffffffff83a5d421>] kobject_set_name_vargs+0x41/0x110
[<ffffffff82d87aab>] dev_set_name+0xab/0xe0
[<ffffffff82d91a23>] device_add+0xe3/0x1a80
[<ffffffffa0220013>] 0xffffffffa0220013
[<ffffffff81001c27>] do_one_initcall+0x87/0x2e0
[<ffffffff813739cb>] do_init_module+0x1ab/0x640
[<ffffffff81379d20>] load_module+0x5d00/0x77f0
[<ffffffff8137bc40>] __do_sys_finit_module+0x110/0x1b0
[<ffffffff83c4d505>] do_syscall_64+0x35/0x80
[<ffffffff83e0006a>] entry_SYSCALL_64_after_hwframe+0x46/0xb0 |
| In the Linux kernel, the following vulnerability has been resolved:
wifi: mt76: mt7921s: fix slab-out-of-bounds access in sdio host
SDIO may need addtional 511 bytes to align bus operation. If the tailroom
of this skb is not big enough, we would access invalid memory region.
For low level operation, increase skb size to keep valid memory access in
SDIO host.
Error message:
[69.951] BUG: KASAN: slab-out-of-bounds in sg_copy_buffer+0xe9/0x1a0
[69.951] Read of size 64 at addr ffff88811c9cf000 by task kworker/u16:7/451
[69.951] CPU: 4 PID: 451 Comm: kworker/u16:7 Tainted: G W OE 6.1.0-rc5 #1
[69.951] Workqueue: kvub300c vub300_cmndwork_thread [vub300]
[69.951] Call Trace:
[69.951] <TASK>
[69.952] dump_stack_lvl+0x49/0x63
[69.952] print_report+0x171/0x4a8
[69.952] kasan_report+0xb4/0x130
[69.952] kasan_check_range+0x149/0x1e0
[69.952] memcpy+0x24/0x70
[69.952] sg_copy_buffer+0xe9/0x1a0
[69.952] sg_copy_to_buffer+0x12/0x20
[69.952] __command_write_data.isra.0+0x23c/0xbf0 [vub300]
[69.952] vub300_cmndwork_thread+0x17f3/0x58b0 [vub300]
[69.952] process_one_work+0x7ee/0x1320
[69.952] worker_thread+0x53c/0x1240
[69.952] kthread+0x2b8/0x370
[69.952] ret_from_fork+0x1f/0x30
[69.952] </TASK>
[69.952] Allocated by task 854:
[69.952] kasan_save_stack+0x26/0x50
[69.952] kasan_set_track+0x25/0x30
[69.952] kasan_save_alloc_info+0x1b/0x30
[69.952] __kasan_kmalloc+0x87/0xa0
[69.952] __kmalloc_node_track_caller+0x63/0x150
[69.952] kmalloc_reserve+0x31/0xd0
[69.952] __alloc_skb+0xfc/0x2b0
[69.952] __mt76_mcu_msg_alloc+0xbf/0x230 [mt76]
[69.952] mt76_mcu_send_and_get_msg+0xab/0x110 [mt76]
[69.952] __mt76_mcu_send_firmware.cold+0x94/0x15d [mt76]
[69.952] mt76_connac_mcu_send_ram_firmware+0x415/0x54d [mt76_connac_lib]
[69.952] mt76_connac2_load_ram.cold+0x118/0x4bc [mt76_connac_lib]
[69.952] mt7921_run_firmware.cold+0x2e9/0x405 [mt7921_common]
[69.952] mt7921s_mcu_init+0x45/0x80 [mt7921s]
[69.953] mt7921_init_work+0xe1/0x2a0 [mt7921_common]
[69.953] process_one_work+0x7ee/0x1320
[69.953] worker_thread+0x53c/0x1240
[69.953] kthread+0x2b8/0x370
[69.953] ret_from_fork+0x1f/0x30
[69.953] The buggy address belongs to the object at ffff88811c9ce800
which belongs to the cache kmalloc-2k of size 2048
[69.953] The buggy address is located 0 bytes to the right of
2048-byte region [ffff88811c9ce800, ffff88811c9cf000)
[69.953] Memory state around the buggy address:
[69.953] ffff88811c9cef00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[69.953] ffff88811c9cef80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[69.953] >ffff88811c9cf000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[69.953] ^
[69.953] ffff88811c9cf080: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[69.953] ffff88811c9cf100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc |
| In the Linux kernel, the following vulnerability has been resolved:
virtio_vdpa: build affinity masks conditionally
We try to build affinity mask via create_affinity_masks()
unconditionally which may lead several issues:
- the affinity mask is not used for parent without affinity support
(only VDUSE support the affinity now)
- the logic of create_affinity_masks() might not work for devices
other than block. For example it's not rare in the networking device
where the number of queues could exceed the number of CPUs. Such
case breaks the current affinity logic which is based on
group_cpus_evenly() who assumes the number of CPUs are not less than
the number of groups. This can trigger a warning[1]:
if (ret >= 0)
WARN_ON(nr_present + nr_others < numgrps);
Fixing this by only build the affinity masks only when
- Driver passes affinity descriptor, driver like virtio-blk can make
sure to limit the number of queues when it exceeds the number of CPUs
- Parent support affinity setting config ops
This help to avoid the warning. More optimizations could be done on
top.
[1]
[ 682.146655] WARNING: CPU: 6 PID: 1550 at lib/group_cpus.c:400 group_cpus_evenly+0x1aa/0x1c0
[ 682.146668] CPU: 6 PID: 1550 Comm: vdpa Not tainted 6.5.0-rc5jason+ #79
[ 682.146671] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
[ 682.146673] RIP: 0010:group_cpus_evenly+0x1aa/0x1c0
[ 682.146676] Code: 4c 89 e0 5b 5d 41 5c 41 5d 41 5e c3 cc cc cc cc e8 1b c4 74 ff 48 89 ef e8 13 ac 98 ff 4c 89 e7 45 31 e4 e8 08 ac 98 ff eb c2 <0f> 0b eb b6 e8 fd 05 c3 00 45 31 e4 eb e5 cc cc cc cc cc cc cc cc
[ 682.146679] RSP: 0018:ffffc9000215f498 EFLAGS: 00010293
[ 682.146682] RAX: 000000000001f1e0 RBX: 0000000000000041 RCX: 0000000000000000
[ 682.146684] RDX: ffff888109922058 RSI: 0000000000000041 RDI: 0000000000000030
[ 682.146686] RBP: ffff888109922058 R08: ffffc9000215f498 R09: ffffc9000215f4a0
[ 682.146687] R10: 00000000000198d0 R11: 0000000000000030 R12: ffff888107e02800
[ 682.146689] R13: 0000000000000030 R14: 0000000000000030 R15: 0000000000000041
[ 682.146692] FS: 00007fef52315740(0000) GS:ffff888237380000(0000) knlGS:0000000000000000
[ 682.146695] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 682.146696] CR2: 00007fef52509000 CR3: 0000000110dbc004 CR4: 0000000000370ee0
[ 682.146698] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 682.146700] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 682.146701] Call Trace:
[ 682.146703] <TASK>
[ 682.146705] ? __warn+0x7b/0x130
[ 682.146709] ? group_cpus_evenly+0x1aa/0x1c0
[ 682.146712] ? report_bug+0x1c8/0x1e0
[ 682.146717] ? handle_bug+0x3c/0x70
[ 682.146721] ? exc_invalid_op+0x14/0x70
[ 682.146723] ? asm_exc_invalid_op+0x16/0x20
[ 682.146727] ? group_cpus_evenly+0x1aa/0x1c0
[ 682.146729] ? group_cpus_evenly+0x15c/0x1c0
[ 682.146731] create_affinity_masks+0xaf/0x1a0
[ 682.146735] virtio_vdpa_find_vqs+0x83/0x1d0
[ 682.146738] ? __pfx_default_calc_sets+0x10/0x10
[ 682.146742] virtnet_find_vqs+0x1f0/0x370
[ 682.146747] virtnet_probe+0x501/0xcd0
[ 682.146749] ? vp_modern_get_status+0x12/0x20
[ 682.146751] ? get_cap_addr.isra.0+0x10/0xc0
[ 682.146754] virtio_dev_probe+0x1af/0x260
[ 682.146759] really_probe+0x1a5/0x410 |
| In the Linux kernel, the following vulnerability has been resolved:
virtio-crypto: fix memory leak in virtio_crypto_alg_skcipher_close_session()
'vc_ctrl_req' is alloced in virtio_crypto_alg_skcipher_close_session(),
and should be freed in the invalid ctrl_status->status error handling
case. Otherwise there is a memory leak. |
| In the Linux kernel, the following vulnerability has been resolved:
exfat: fix refcount leak in exfat_find
Fix refcount leaks in `exfat_find` related to `exfat_get_dentry_set`.
Function `exfat_get_dentry_set` would increase the reference counter of
`es->bh` on success. Therefore, `exfat_put_dentry_set` must be called
after `exfat_get_dentry_set` to ensure refcount consistency. This patch
relocate two checks to avoid possible leaks. |
| In the Linux kernel, the following vulnerability has been resolved:
exfat: fix divide-by-zero in exfat_allocate_bitmap
The variable max_ra_count can be 0 in exfat_allocate_bitmap(),
which causes a divide-by-zero error in the subsequent modulo operation
(i % max_ra_count), leading to a system crash.
When max_ra_count is 0, it means that readahead is not used. This patch
load the bitmap without readahead. |
| In the Linux kernel, the following vulnerability has been resolved:
net: vxlan: prevent NULL deref in vxlan_xmit_one
Neither sock4 nor sock6 pointers are guaranteed to be non-NULL in
vxlan_xmit_one, e.g. if the iface is brought down. This can lead to the
following NULL dereference:
BUG: kernel NULL pointer dereference, address: 0000000000000010
Oops: Oops: 0000 [#1] SMP NOPTI
RIP: 0010:vxlan_xmit_one+0xbb3/0x1580
Call Trace:
vxlan_xmit+0x429/0x610
dev_hard_start_xmit+0x55/0xa0
__dev_queue_xmit+0x6d0/0x7f0
ip_finish_output2+0x24b/0x590
ip_output+0x63/0x110
Mentioned commits changed the code path in vxlan_xmit_one and as a side
effect the sock4/6 pointer validity checks in vxlan(6)_get_route were
lost. Fix this by adding back checks.
Since both commits being fixed were released in the same version (v6.7)
and are strongly related, bundle the fixes in a single commit. |
| In the Linux kernel, the following vulnerability has been resolved:
gfs2: Prevent recursive memory reclaim
Function new_inode() returns a new inode with inode->i_mapping->gfp_mask
set to GFP_HIGHUSER_MOVABLE. This value includes the __GFP_FS flag, so
allocations in that address space can recurse into filesystem memory
reclaim. We don't want that to happen because it can consume a
significant amount of stack memory.
Worse than that is that it can also deadlock: for example, in several
places, gfs2_unstuff_dinode() is called inside filesystem transactions.
This calls filemap_grab_folio(), which can allocate a new folio, which
can trigger memory reclaim. If memory reclaim recurses into the
filesystem and starts another transaction, a deadlock will ensue.
To fix these kinds of problems, prevent memory reclaim from recursing
into filesystem code by making sure that the gfp_mask of inode address
spaces doesn't include __GFP_FS.
The "meta" and resource group address spaces were already using GFP_NOFS
as their gfp_mask (which doesn't include __GFP_FS). The default value
of GFP_HIGHUSER_MOVABLE is less restrictive than GFP_NOFS, though. To
avoid being overly limiting, use the default value and only knock off
the __GFP_FS flag. I'm not sure if this will actually make a
difference, but it also shouldn't hurt.
This patch is loosely based on commit ad22c7a043c2 ("xfs: prevent stack
overflows from page cache allocation").
Fixes xfstest generic/273. |