sysctl
sysctl lets you configure Linux kernel parameters at runtime.
Depending on your Linux distribution, the parameters are stored in /etc/sysctl.conf and /etc/sysctl.d/ and applied at system startup.
Warning
The same parameter can be configured in /etc/sysctl.conf and /etc/sysctl.d/ with different values.
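One way to spot such conflicts is to scan the configuration files for keys that receive more than one distinct value. The following is a minimal sketch, not a full parser: the helper name sysctl_conflicts is hypothetical, and it does not model the order in which sysctl reads the files.

```shell
# Print every sysctl key that is assigned more than one distinct value
# across the files given as arguments (hypothetical helper).
sysctl_conflicts() {
    awk '
        /^[ \t]*[#;]/ || /^[ \t]*$/ { next }   # skip comments and blank lines
        index($0, "=") == 0 { next }           # skip lines without an assignment
        {
            key = substr($0, 1, index($0, "=") - 1)
            val = substr($0, index($0, "=") + 1)
            gsub(/[ \t]/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            if ((key in seen) && seen[key] != val) conflict[key] = 1
            seen[key] = val
        }
        END { for (k in conflict) print k }
    ' "$@" | sort
}
```

For example: sysctl_conflicts /etc/sysctl.conf /etc/sysctl.d/*.conf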
Note
To load settings from all system configuration files, run sysctl --system instead of sysctl -p. The --system option isn’t available on old distributions like CentOS 6.
When possible, I am following recommendations from
Some parameters are not available on older kernels.
dev
dev.tty.ldisc_autoload
Disable tty line discipline autoloading
dev.tty.ldisc_autoload = 0
File system configuration options
ANSSI R14 recommends setting:
# Disable coredump creation for setuid executables
# Note that it is possible to disable all coredumps with the
# configuration CONFIG_COREDUMP=n
fs.suid_dumpable = 0
# Available from version 4.19 of the Linux kernel. Prohibits opening
# FIFOs and "regular" files that are not owned by the user in
# world-writable sticky directories.
fs.protected_fifos = 2
fs.protected_regular = 2
# Restrict the creation of symbolic links to files that the user
# owns. This option is part of the prevention mechanisms against
# Time of Check - Time of Use (TOCTOU) vulnerabilities.
fs.protected_symlinks = 1
# Restrict the creation of hard links to files that the user
# owns. This sysctl is part of the prevention mechanisms against
# Time of Check - Time of Use vulnerabilities, but also against the
# possibility of retaining access to obsolete files.
fs.protected_hardlinks = 1
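Since some of these parameters are missing on older kernels, it can help to verify the running values after applying the file. The sketch below reads procfs directly. The helper name sysctl_check and the PROC_SYS variable are assumptions made for illustration (PROC_SYS defaults to /proc/sys and exists only so the helper can be tried against a test directory). The key-to-path conversion is naive and will not work for interface names that contain a dot.

```shell
# Compare one sysctl key against an expected value (hypothetical helper).
# Prints OK, FAIL or SKIP; returns non-zero only on a mismatch.
sysctl_check() {
    key=$1
    expected=$2
    # fs.protected_fifos -> /proc/sys/fs/protected_fifos
    path="${PROC_SYS:-/proc/sys}/$(printf '%s' "$key" | tr . /)"
    if [ ! -r "$path" ]; then
        echo "SKIP $key (not available)"
        return 0
    fi
    actual=$(cat "$path")
    if [ "$actual" = "$expected" ]; then
        echo "OK $key = $actual"
    else
        echo "FAIL $key = $actual (expected $expected)"
        return 1
    fi
}
```

For example: sysctl_check fs.protected_fifos 2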
kernel
kernel.core_uses_pid
kernel.core_uses_pid = 1
The default coredump filename is "core". By setting
core_uses_pid to 1, the coredump filename becomes core.PID.
If core_pattern does not include "%p" (default does not)
and core_uses_pid is set, then .PID will be appended to
the filename.
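The naming rule above can be sketched as a small function. This is a simplified model for illustration only: core_filename is a hypothetical helper, it handles just the %p specifier, and it ignores every other core_pattern feature.

```shell
# Model of the resulting coredump filename (hypothetical helper).
# Arguments: core_pattern, core_uses_pid (0 or 1), PID of the crashing process.
core_filename() {
    pattern=$1
    uses_pid=$2
    pid=$3
    case $pattern in
        *%p*)
            # The pattern already contains %p: substitute the PID.
            printf '%s\n' "$pattern" | sed "s/%p/$pid/g"
            ;;
        *)
            # No %p: .PID is appended only when core_uses_pid is set.
            if [ "$uses_pid" = 1 ]; then
                echo "$pattern.$pid"
            else
                echo "$pattern"
            fi
            ;;
    esac
}
```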
kernel.dmesg_restrict
To conform to R9,
kernel.dmesg_restrict = 1
This toggle indicates whether unprivileged users are prevented
from using dmesg(8) to view messages from the kernel's log buffer.
When dmesg_restrict is set to (0) there are no restrictions. When
dmesg_restrict is set to (1), users must have CAP_SYSLOG to use
dmesg(8).
The kernel config option CONFIG_SECURITY_DMESG_RESTRICT sets the
default value of dmesg_restrict.
Note
Debian 10 and later versions restrict access by default.
$ grep CONFIG_SECURITY_DMESG_RESTRICT /boot/config-$(uname -r)
CONFIG_SECURITY_DMESG_RESTRICT=y
This is also the case from AlmaLinux 10 onwards.
kernel.kptr_restrict
To conform to R9,
kernel.kptr_restrict = 2
This toggle indicates whether restrictions are placed on
exposing kernel addresses via /proc and other interfaces.
When kptr_restrict is set to 0 (the default) the address is hashed before
printing. (This is the equivalent to %p.)
When kptr_restrict is set to (1), kernel pointers printed using the %pK
format specifier will be replaced with 0's unless the user has CAP_SYSLOG
and effective user and group ids are equal to the real ids. This is
because %pK checks are done at read() time rather than open() time, so
if permissions are elevated between the open() and the read() (e.g. via
a setuid binary) then %pK will not leak kernel pointers to unprivileged
users. Note, this is a temporary solution only. The correct long-term
solution is to do the permission checks at open() time. Consider removing
world read permissions from files that use %pK, and using dmesg_restrict
to protect against uses of %pK in dmesg(8) if leaking kernel pointer
values to unprivileged users is a concern.
When kptr_restrict is set to (2), kernel pointers printed using
%pK will be replaced with 0's regardless of privileges.
kernel.panic
kernel.panic = 60
The value in this file represents the number of seconds the kernel
waits before rebooting on a panic. When you use the software watchdog,
the recommended setting is 60.
kernel.panic_on_oops
To conform to R9,
kernel.panic_on_oops = 1
Controls the kernel's behaviour when an oops or BUG is encountered.
0: try to continue operation
1: panic immediately. If the panic sysctl is also non-zero then the
machine will be rebooted.
kernel.modules_disabled
ANSSI R10 recommends setting:
kernel.modules_disabled = 1
Warning
Once this option is activated, it cannot be deactivated without rebooting the system. Thus, the loading of a new kernel module will require rebooting as well.
I haven’t tried configuring it in production yet.
kernel.perf_cpu_time_max_percent
To conform to R9,
kernel.perf_cpu_time_max_percent = 1
kernel.perf_event_max_sample_rate
To conform to R9,
kernel.perf_event_max_sample_rate = 1
But I am using
kernel.perf_event_max_sample_rate = 100000
kernel.perf_event_paranoid
kernel.perf_event_paranoid = 3
R9 recommends 2 or greater. The patch section lists a kernel patch, see https://lwn.net/Articles/696216/.
Lynis checks for values 2, 3 and 4.
Controls use of the performance events system by unprivileged
users (without CAP_SYS_ADMIN). The default value is 2.
-1: Allow use of (almost) all events by all users
Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
>=0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN
Disallow raw tracepoint access by users without CAP_SYS_ADMIN
>=1: Disallow CPU event access by users without CAP_SYS_ADMIN
>=2: Disallow kernel profiling by users without CAP_SYS_ADMIN
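The levels are cumulative: each value adds restrictions on top of the lower ones. A minimal sketch of that reading follows. perf_paranoid_restrictions is a hypothetical helper that covers only the levels listed above.

```shell
# List what a given perf_event_paranoid level denies to users
# without CAP_SYS_ADMIN (hypothetical helper, upstream levels -1 to 2 only).
perf_paranoid_restrictions() {
    level=$1
    if [ "$level" -ge 0 ]; then
        echo "ftrace function tracepoint and raw tracepoint access"
    fi
    if [ "$level" -ge 1 ]; then
        echo "CPU event access"
    fi
    if [ "$level" -ge 2 ]; then
        echo "kernel profiling"
    fi
}
```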
kernel.randomize_va_space
ANSSI R9 recommends setting:
kernel.randomize_va_space = 2
AFAIK it’s now the default on all Linux distributions.
This option can be used to select the type of process address
space randomization that is used in the system, for architectures
that support this feature.
0 - Turn the process address space randomization off. This is the
default for architectures that do not support this feature anyways,
and kernels that are booted with the "norandmaps" parameter.
1 - Make the addresses of mmap base, stack and VDSO page randomized.
This, among other things, implies that shared libraries will be
loaded to random addresses. Also for PIE-linked binaries, the
location of code start is randomized. This is the default if the
CONFIG_COMPAT_BRK option is enabled.
2 - Additionally enable heap randomization. This is the default if
CONFIG_COMPAT_BRK is disabled.
There are a few legacy applications out there (such as some ancient
versions of libc.so.5 from 1996) that assume that brk area starts
just after the end of the code+bss. These applications break when
start of the brk area is randomized. There are however no known
non-legacy applications that would be broken this way, so for most
systems it is safe to choose full randomization.
Systems with ancient and/or broken binaries should be configured
with CONFIG_COMPAT_BRK enabled, which excludes the heap from process
address space randomization.
kernel.sysrq
ANSSI R9 recommends setting:
kernel.sysrq = 0
But I am allowing the sync command:
kernel.sysrq = 16
Here is the list of possible values in /proc/sys/kernel/sysrq:
- 0 - disable sysrq completely
- 1 - enable all functions of sysrq
- >1 - bitmask of allowed sysrq functions (see below for detailed function
description):
2 = 0x2 - enable control of console logging level
4 = 0x4 - enable control of keyboard (SAK, unraw)
8 = 0x8 - enable debugging dumps of processes etc.
16 = 0x10 - enable sync command
32 = 0x20 - enable remount read-only
64 = 0x40 - enable signalling of processes (term, kill, oom-kill)
128 = 0x80 - allow reboot/poweroff
256 = 0x100 - allow nicing of all RT tasks
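To see which functions a given value enables, the bitmask can be decoded as shown in this sketch. sysrq_decode is a hypothetical helper. The labels are shortened from the list above.

```shell
# Decode a kernel.sysrq value into the functions it enables (hypothetical helper).
sysrq_decode() {
    mask=$1
    case $mask in
        0) echo "disabled"; return 0 ;;
        1) echo "all functions enabled"; return 0 ;;
    esac
    for entry in \
        "2:console logging level" \
        "4:keyboard (SAK, unraw)" \
        "8:debugging dumps" \
        "16:sync" \
        "32:remount read-only" \
        "64:signalling of processes" \
        "128:reboot/poweroff" \
        "256:nicing of all RT tasks"
    do
        bit=${entry%%:*}
        # Print the label when the corresponding bit is set in the mask.
        if [ $((mask & bit)) -ne 0 ]; then
            echo "${entry#*:}"
        fi
    done
}
```

With the value 16 used above, sysrq_decode 16 prints only "sync". Values can be combined by adding them, e.g. 16 + 32 + 128 = 176 enables sync, remount read-only and reboot/poweroff.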
kernel.unprivileged_bpf_disabled
ANSSI R9 recommends setting:
kernel.unprivileged_bpf_disabled = 1
Source: https://docs.kernel.org/_sources/admin-guide/sysctl/kernel.rst.txt
Writing 1 to this entry will disable unprivileged calls to bpf();
once disabled, calling bpf() without CAP_SYS_ADMIN or CAP_BPF
will return -EPERM. Once set to 1, this can’t be cleared from the
running kernel anymore.
Writing 2 to this entry will also disable unprivileged calls to bpf(),
however, an admin can still change this setting later on, if needed, by
writing 0 or 1 to this entry.
If BPF_UNPRIV_DEFAULT_OFF is enabled in the kernel config, then this
entry will default to 2 instead of 0.
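| Value | Meaning |
|---|---|
| 0 | Unprivileged calls to bpf() are enabled |
| 1 | Unprivileged calls to bpf() are disabled without recovery |
| 2 | Unprivileged calls to bpf() are disabled |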
kernel.yama.ptrace_scope
To conform to R11, use 1 or greater.
kernel.yama.ptrace_scope = 1
The sysctl settings (writable only with CAP_SYS_PTRACE) are:
- 0 - classic ptrace permissions: a process can PTRACE_ATTACH to any other
process running under the same uid, as long as it is dumpable (i.e.
did not transition uids, start privileged, or have called
prctl(PR_SET_DUMPABLE...) already). Similarly, PTRACE_TRACEME is
unchanged.
- 1 - restricted ptrace: a process must have a predefined relationship
with the inferior it wants to call PTRACE_ATTACH on. By default,
this relationship is that of only its descendants when the above
classic criteria is also met. To change the relationship, an
inferior can call prctl(PR_SET_PTRACER, debugger, ...) to declare
an allowed debugger PID to call PTRACE_ATTACH on the inferior.
Using PTRACE_TRACEME is unchanged.
- 2 - admin-only attach: only processes with CAP_SYS_PTRACE may use ptrace
with PTRACE_ATTACH, or through children calling PTRACE_TRACEME.
- 3 - no attach: no processes may use ptrace with PTRACE_ATTACH nor via
PTRACE_TRACEME. Once set, this sysctl value cannot be changed.
net
https://www.kernel.org/doc/Documentation/networking/ip-sysctl.rst
accept_local
net.ipv4.conf.{all,default}.accept_local = 0
- accept_local - BOOLEAN
Accept packets with local source addresses. In combination with suitable routing, this can be used to direct packets between two local interfaces over the wire and have them accepted properly. default FALSE
arp_filter
net.ipv4.conf.all.arp_filter = 1
- arp_filter - BOOLEAN
1 - Allows you to have multiple network interfaces on the same subnet, and have the ARPs for each interface be answered based on whether or not the kernel would route a packet from the ARP’d IP out that interface (therefore you must use source based routing for this to work). In other words it allows control of which cards (usually 1) will respond to an arp request.
0 - (default) The kernel can respond to arp requests with addresses from other interfaces. This may seem wrong but it usually makes sense, because it increases the chance of successful communication. IP addresses are owned by the complete host on Linux, not by particular interfaces. Only for more complex setups like load-balancing, does this behaviour cause problems.
arp_filter for the interface will be enabled if at least one of conf/{all,interface}/arp_filter is set to TRUE, it will be disabled otherwise
Note
Lynis 3.1.5 doesn’t check it, but the ANSSI guidelines recommend setting it to 1.
Warning
Setting it to 1 is known to break communications on OVH VPS. Use 0 in this case:
net.ipv4.conf.all.arp_filter = 0
arp_ignore
net.ipv4.conf.all.arp_ignore = 2
- arp_ignore - INTEGER
Define different modes for sending replies in response to received ARP requests that resolve local target IP addresses:
0 - (default): reply for any local target IP address, configured on any interface
1 - reply only if the target IP address is local address configured on the incoming interface
2 - reply only if the target IP address is local address configured on the incoming interface and both with the sender’s IP address are part from same subnet on this interface
3 - do not reply for local addresses configured with scope host, only resolutions for global and link addresses are replied
4-7 - reserved
8 - do not reply for all local addresses
The max value from conf/{all,interface}/arp_ignore is used when ARP request is received on the {interface}
Warning
On oVirt nodes and OVH VPS, the value 2 is known to break things. Use 1 instead.
net.ipv4.conf.all.arp_ignore = 1
log_martians
net.ipv4.conf.all.log_martians = 1
net.ipv4.conf.default.log_martians = 1
- log_martians - BOOLEAN
Log packets with impossible addresses to kernel log. log_martians for the interface will be enabled if at least one of conf/{all,interface}/log_martians is set to TRUE, it will be disabled otherwise
Warning
log_martians is currently enabled on all hosts except oVirt hosts, which use:
net.ipv4.conf.all.log_martians = 0
net.ipv4.conf.default.log_martians = 0
route_localnet
net.ipv4.conf.all.route_localnet = 0
- route_localnet - BOOLEAN
Do not consider loopback addresses as martian source or destination while routing. This enables the use of 127/8 for local routing purposes.
default FALSE
rp_filter
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.rp_filter = 1
There are a few exceptions on some firewalls configured with
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.INTERFACE_NAME.rp_filter = 0
- rp_filter - INTEGER
0 - No source validation.
1 - Strict mode as defined in RFC3704 Strict Reverse Path. Each incoming packet is tested against the FIB and if the interface is not the best reverse path the packet check will fail. By default failed packets are discarded.
2 - Loose mode as defined in RFC3704 Loose Reverse Path. Each incoming packet’s source address is also tested against the FIB and if the source address is not reachable via any interface the packet check will fail.
Current recommended practice in RFC3704 is to enable strict mode to prevent IP spoofing from DDoS attacks. If using asymmetric routing or other complicated routing, then loose mode is recommended.
The max value from conf/{all,interface}/rp_filter is used when doing source validation on the {interface}.
Default value is 0. Note that some distributions enable it in startup scripts.
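The "max value" rule explains why the firewall exception above sets both all and the interface to 0: with all = 1, a per-interface 0 would still result in strict mode. A minimal sketch of the rule, where effective_rp_filter is a hypothetical helper:

```shell
# Effective rp_filter mode on an interface: the larger of the "all"
# value and the per-interface value (hypothetical helper).
effective_rp_filter() {
    all=$1
    iface=$2
    if [ "$all" -gt "$iface" ]; then
        echo "$all"
    else
        echo "$iface"
    fi
}
```

For example, effective_rp_filter 1 0 prints 1 (strict mode still applies), while effective_rp_filter 0 0 prints 0 (no source validation).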
TODO: Check if rp_filter = 2 can be used instead.
ICMP redirect
I think it’s better to push static routes via Ansible (or a similar configuration manager) and ignore all redirects, as recommended by the ANSSI guide. Configuring the kernel options (R9) lists:
# Deny receipt of ICMP redirect packet. The suggested setting of this
# option is to be strongly considered in the case of routers which must not
# depend on an external element to determine the calculation of a route. Even
# for non-router machines, this setting protects against
# traffic diversions with ICMP redirect packets.
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.all.secure_redirects = 0
net.ipv4.conf.default.secure_redirects = 0
net.ipv4.conf.all.shared_media = 0
net.ipv4.conf.default.shared_media = 0
Note that Ubuntu recommends using the default values instead.
accept_redirects
- accept_redirects - BOOLEAN
Accept ICMP redirect messages. accept_redirects for the interface will be enabled if:
both conf/{all,interface}/accept_redirects are TRUE in the case forwarding for the interface is enabled
or
at least one of conf/{all,interface}/accept_redirects is TRUE in the case forwarding for the interface is disabled
accept_redirects for the interface will be disabled otherwise
default:
TRUE (host)
FALSE (router)
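The combination rule for accept_redirects differs between routers and hosts, which is why both the all and default values are set to 0 above. The rule can be sketched as follows. effective_accept_redirects is a hypothetical helper and each argument is 0 or 1.

```shell
# Effective accept_redirects for an interface (hypothetical helper).
# Arguments: conf/all value, conf/<interface> value, forwarding enabled (0 or 1).
effective_accept_redirects() {
    all=$1
    iface=$2
    forwarding=$3
    if [ "$forwarding" -eq 1 ]; then
        # Forwarding enabled: both values must be TRUE.
        if [ "$all" -eq 1 ] && [ "$iface" -eq 1 ]; then echo 1; else echo 0; fi
    else
        # Forwarding disabled: one TRUE value is enough.
        if [ "$all" -eq 1 ] || [ "$iface" -eq 1 ]; then echo 1; else echo 0; fi
    fi
}
```

On a host without forwarding, effective_accept_redirects 0 1 0 prints 1: setting only conf/all to 0 does not disable redirects while the interface value is still 1.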
secure_redirects
- secure_redirects - BOOLEAN
Accept ICMP redirect messages only to gateways listed in the interface’s current gateway list. Even if disabled, RFC1122 redirect rules still apply.
Overridden by shared_media.
secure_redirects for the interface will be enabled if at least one of conf/{all,interface}/secure_redirects is set to TRUE, it will be disabled otherwise
default TRUE
net.core.bpf_jit_harden
net.core.bpf_jit_harden = 2
icmp_echo_ignore_broadcasts
net.ipv4.icmp_echo_ignore_broadcasts = 1
icmp_echo_ignore_broadcasts - BOOLEAN
If enabled, then the kernel will ignore all ICMP ECHO and
TIMESTAMP requests sent to it via broadcast/multicast.
Possible values:
- 0 (disabled)
- 1 (enabled)
Default: 1 (enabled)
I don’t remember when the default was to accept broadcasts, but it’s checked in scap-security-guide.
icmp_ignore_bogus_error_responses
net.ipv4.icmp_ignore_bogus_error_responses = 1
- icmp_ignore_bogus_error_responses - BOOLEAN
Some routers violate RFC1122 by sending bogus responses to broadcast frames. Such violations are normally logged via a kernel warning. If enabled, the kernel will not give such warnings, which will avoid log file clutter.
Possible values:
0 (disabled)
1 (enabled)
Default: 1 (enabled)
send_redirects
It’s more secure to configure static routes on all hosts than to rely on redirects.
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0
- send_redirects - BOOLEAN
Send redirects, if router.
send_redirects for the interface will be enabled if at least one of conf/{all,interface}/send_redirects is set to TRUE, it will be disabled otherwise
Default: TRUE
tcp_rfc1337
net.ipv4.tcp_rfc1337 = 1
- tcp_rfc1337 - BOOLEAN
If enabled, the TCP stack behaves conforming to RFC1337. If unset, we are not conforming to RFC, but prevent TCP TIME_WAIT assassination.
Possible values:
0 (disabled)
1 (enabled)
Default: 0 (disabled)
vm
vm.mmap_min_addr
On x86_64 architecture at least, 65536 is the recommended value. To increase security, force
vm.mmap_min_addr = 65536
NULL pointer dereference flaws in the Linux kernel were often abused by local, unprivileged users to gain root privileges by mapping low memory pages and filling them with malicious instructions. In Linux kernel version 2.6.23, the /proc/sys/vm/mmap_min_addr tunable was introduced to prevent unprivileged users from creating new memory mappings below the configured minimum address. This feature has been backported to some older operating systems (e.g. Red Hat Enterprise Linux 5.2).
The default value is defined when the kernel is compiled. Here is how I found this value on some distributions:
[user@tst-c8 ~]$ uname -r; grep CONFIG_DEFAULT_MMAP_MIN_ADDR /boot/config-$(uname -r)
4.18.0-553.76.1.el8_10.x86_64
CONFIG_DEFAULT_MMAP_MIN_ADDR=4096
| Distribution | Kernel | CONFIG_DEFAULT_MMAP_MIN_ADDR |
|---|---|---|
| AlmaLinux 8 | 4.18.0-553.76.1.el8_10.x86_64 | 4096 |
| AlmaLinux 9 | 5.14.0-570.46.1.el9_6.x86_64 | 65536 |
| CentOS 6 | 2.6.32-754.35.1.el6.x86_64 | 4096 |
| CentOS 7 | 3.10.0-1160.119.1.el7.x86_64 | 4096 |
| CentOS 7 | 3.10.0-1160.88.1.el7.centos.plus.x86_64 | 4096 |
| CentOS 9 | 5.14.0-542.el9.x86_64 | 65536 |
| Debian 8 | 3.16.0-4-amd64 | 65536 |
| SLES 15 | 5.14.21-150500.55.39-default | 65536 |
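The same lookup can be wrapped in a small function to collect the compiled-in default on each host. This is a sketch: mmap_min_addr_default is a hypothetical helper, and it assumes the kernel config is available as a plain-text file (by default /boot/config-$(uname -r), as in the grep above).

```shell
# Print the compiled-in default of vm.mmap_min_addr (hypothetical helper).
# Optional argument: path to a kernel config file.
mmap_min_addr_default() {
    config=${1:-/boot/config-$(uname -r)}
    sed -n 's/^CONFIG_DEFAULT_MMAP_MIN_ADDR=//p' "$config"
}
```

A host needs the explicit vm.mmap_min_addr = 65536 setting when the printed value is lower than 65536.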