sysctl

sysctl lets you configure Linux kernel parameters at runtime. Depending on your Linux distribution, the parameters are stored in /etc/sysctl.conf and /etc/sysctl.d/ and applied at system startup.

Warning

The same parameter can be configured in /etc/sysctl.conf and /etc/sysctl.d/ with different values.

Note

To load settings from all system configuration files, run sysctl -p --system instead of sysctl -p. sysctl -p --system isn’t available on old distributions like CentOS 6.
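Because the same key can be set in several files, a quick audit catches conflicting values before they surprise you. Below is a minimal Python sketch; it is not part of sysctl. It parses sysctl.conf-style text (`key = value` lines, with `#` or `;` comments) and reports keys that are assigned more than one distinct value.

It assumes you pass the files in the order they are applied, so the last value listed wins. That order depends on the tool (sysctl --system or systemd-sysctl) and on the distribution, so check it on your systems. The file names in the usage example are hypothetical.

```python
from collections import defaultdict


def parse_sysctl_conf(text):
    """Yield (key, value) pairs from sysctl.conf-style text."""
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments ('#' or ';').
        if not line or line[0] in "#;":
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a "key = value" line; ignored in this sketch
        # Collapse internal whitespace so "4096  87380" equals "4096 87380".
        yield key.strip(), " ".join(value.split())


def find_conflicts(files):
    """Return {key: [(filename, value), ...]} for keys with differing values.

    `files` is a list of (filename, text) pairs, assumed to be in the order
    they are applied, so the last entry of each list is the applied value.
    """
    seen = defaultdict(list)
    for name, text in files:
        for key, value in parse_sysctl_conf(text):
            seen[key].append((name, value))
    return {
        key: entries
        for key, entries in seen.items()
        if len({value for _, value in entries}) > 1
    }


if __name__ == "__main__":
    files = [
        # Hypothetical file names and contents.
        ("/etc/sysctl.d/10-hardening.conf", "kernel.sysrq = 0\n"),
        ("/etc/sysctl.conf", "kernel.sysrq = 16\n"),
    ]
    for key, entries in find_conflicts(files).items():
        print(key, entries)
```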

When possible, I am following recommendations from

Some parameters are not available on older kernels.

dev

dev.tty.ldisc_autoload

Disable tty line discipline autoloading

dev.tty.ldisc_autoload = 0

File system configuration options

ANSSI R14 recommends setting:

# Disable coredump creation for setuid executables.
# Note that it is possible to disable all coredumps with the
# kernel configuration option CONFIG_COREDUMP=n.
fs.suid_dumpable = 0
# Available since Linux 4.19: prohibit opening (with O_CREAT) FIFOs and
# regular files that the user does not own in world-writable sticky
# directories, unless the directory owner owns them. The value 2 also
# applies this to group-writable sticky directories.
fs.protected_fifos = 2
fs.protected_regular = 2
# Only follow symbolic links in world-writable sticky directories when
# the follower or the directory owner owns the link. This is part of the
# protections against Time of Check to Time of Use (TOCTOU)
# vulnerabilities.
fs.protected_symlinks = 1
# Only allow users to create hard links to files they own or can read
# and write. This is part of the protections against TOCTOU
# vulnerabilities, but also against retaining access to obsolete files.
fs.protected_hardlinks = 1
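To check that a running system actually applies these values, you can read them back from /proc/sys, where each dotted key maps to a file path. Below is a minimal Python sketch with three assumptions:

- No path component contains a dot. VLAN interface names such as eth0.100 break the dotted mapping.
- Some /proc/sys entries are readable only by root. These are reported rather than raised.
- `EXPECTED` simply restates the ANSSI R14 block above; extend it with other sections as needed.

```python
from pathlib import Path

# Values from the ANSSI R14 block above.
EXPECTED = {
    "fs.suid_dumpable": "0",
    "fs.protected_fifos": "2",
    "fs.protected_regular": "2",
    "fs.protected_symlinks": "1",
    "fs.protected_hardlinks": "1",
}


def proc_path(key, root="/proc/sys"):
    """Map a dotted key such as fs.suid_dumpable to /proc/sys/fs/suid_dumpable."""
    return Path(root, *key.split("."))


def check(expected, root="/proc/sys"):
    """Return {key: (expected, actual)} for every mismatch."""
    mismatches = {}
    for key, want in expected.items():
        try:
            have = " ".join(proc_path(key, root).read_text().split())
        except FileNotFoundError:
            have = None  # parameter not available on this kernel
        except PermissionError:
            have = "unreadable (try as root)"
        if have != want:
            mismatches[key] = (want, have)
    return mismatches


if __name__ == "__main__":
    for key, (want, have) in check(EXPECTED).items():
        print(f"{key}: expected {want}, found {have}")
```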

kernel

kernel.core_uses_pid

kernel.core_uses_pid = 1

Extract from https://www.kernel.org/doc/Documentation/sysctl/kernel.txt

The default coredump filename is "core".  By setting
core_uses_pid to 1, the coredump filename becomes core.PID.
If core_pattern does not include "%p" (default does not)
and core_uses_pid is set, then .PID will be appended to
the filename.
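Below is a minimal Python sketch of the naming rule quoted above. It only models the .PID suffix; other % specifiers are not expanded.

When core_pattern starts with | (piping the dump to a helper such as systemd-coredump), the kernel does not apply core_uses_pid at all. The function returns None in that case.

```python
def core_filename(core_pattern, core_uses_pid, pid):
    """Coredump file name after applying the core_uses_pid rule.

    Sketch only: %p is expanded, other specifiers are left untouched.
    Returns None for piped patterns ("|/path/to/helper ..."), which the
    kernel never suffixes with .PID.
    """
    if core_pattern.startswith("|"):
        return None
    if core_uses_pid and "%p" not in core_pattern:
        return f"{core_pattern}.{pid}"
    return core_pattern.replace("%p", str(pid))
```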

kernel.dmesg_restrict

To conform to ANSSI R9, set:

kernel.dmesg_restrict = 1

Extract from https://www.kernel.org/doc/Documentation/sysctl/kernel.txt

This toggle indicates whether unprivileged users are prevented
from using dmesg(8) to view messages from the kernel's log buffer.
When dmesg_restrict is set to (0) there are no restrictions. When
dmesg_restrict is set to (1), users must have CAP_SYSLOG to use
dmesg(8).

The kernel config option CONFIG_SECURITY_DMESG_RESTRICT sets the
default value of dmesg_restrict.

Note

Debian 10 and later versions restrict access by default:

$ grep CONFIG_SECURITY_DMESG_RESTRICT /boot/config-$(uname -r)
CONFIG_SECURITY_DMESG_RESTRICT=y
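The grep above works for any kernel option. To script it, here is a minimal Python sketch that looks up an option in kernel config text. It assumes your distribution ships /boot/config-$(uname -r), which not all do, and it reports `# CONFIG_X is not set` lines as "n".

```python
import platform
from pathlib import Path


def kconfig_value(text, option):
    """Return the value of `option` in kernel config text.

    Returns the raw value ("y", "m", a number or a quoted string),
    "n" for "# CONFIG_X is not set", or None if the option is absent.
    """
    for line in text.splitlines():
        line = line.strip()
        if line == f"# {option} is not set":
            return "n"
        name, sep, value = line.partition("=")
        if sep and name == option:
            return value
    return None


def running_kernel_config():
    """Read /boot/config-<release> for the running kernel (same as uname -r)."""
    return Path("/boot", f"config-{platform.release()}").read_text()


if __name__ == "__main__":
    try:
        text = running_kernel_config()
    except FileNotFoundError:
        print("no /boot/config file for the running kernel")
    else:
        print(kconfig_value(text, "CONFIG_SECURITY_DMESG_RESTRICT"))
```

The same lookup works for CONFIG_DEFAULT_MMAP_MIN_ADDR.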

AlmaLinux 10 and later also restrict access by default.

kernel.kptr_restrict

To conform to ANSSI R9, set:

kernel.kptr_restrict = 2

Extract from https://www.kernel.org/doc/Documentation/sysctl/kernel.txt

This toggle indicates whether restrictions are placed on
exposing kernel addresses via /proc and other interfaces.

When kptr_restrict is set to 0 (the default) the address is hashed before
printing. (This is the equivalent to %p.)

When kptr_restrict is set to (1), kernel pointers printed using the %pK
format specifier will be replaced with 0's unless the user has CAP_SYSLOG
and effective user and group ids are equal to the real ids. This is
because %pK checks are done at read() time rather than open() time, so
if permissions are elevated between the open() and the read() (e.g via
a setuid binary) then %pK will not leak kernel pointers to unprivileged
users. Note, this is a temporary solution only. The correct long-term
solution is to do the permission checks at open() time. Consider removing
world read permissions from files that use %pK, and using dmesg_restrict
to protect against uses of %pK in dmesg(8) if leaking kernel pointer
values to unprivileged users is a concern.

When kptr_restrict is set to (2), kernel pointers printed using
%pK will be replaced with 0's regardless of privileges.
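The three levels amount to a small decision function. Below is a minimal Python sketch of the extract above:

- `ids_match` is a hypothetical flag meaning "effective uid/gid equal the real ids".
- The function returns a label rather than the printed value; the real hash or zero padding is not reproduced.

With kptr_restrict = 2, even root sees zeroed pointers, for example in /proc/kallsyms.

```python
def pk_pointer(kptr_restrict, has_cap_syslog=False, ids_match=True):
    """How a %pK pointer is shown: "hashed", "raw" or "zeroed"."""
    if kptr_restrict == 0:
        return "hashed"
    if kptr_restrict == 1:
        # Raw only for CAP_SYSLOG holders whose effective ids equal real ids.
        return "raw" if has_cap_syslog and ids_match else "zeroed"
    if kptr_restrict == 2:
        return "zeroed"  # regardless of privileges
    raise ValueError("kptr_restrict must be 0, 1 or 2")
```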

kernel.panic

kernel.panic = 60

Extract from https://www.kernel.org/doc/Documentation/sysctl/kernel.txt

The value in this file represents the number of seconds the kernel
waits before rebooting on a panic. When you use the software watchdog,
the recommended setting is 60.

kernel.panic_on_oops

To conform to ANSSI R9, set:

kernel.panic_on_oops = 1

Extract from https://www.kernel.org/doc/Documentation/sysctl/kernel.txt

Controls the kernel's behaviour when an oops or BUG is encountered.

0: try to continue operation

1: panic immediately.  If the panic sysctl is also non-zero then the
   machine will be rebooted.
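Combined with kernel.panic = 60 above, panic_on_oops = 1 turns an oops into a panic followed by a reboot 60 seconds later. Below is a minimal Python sketch of that interaction. The handling of panic = 0 (loop forever) and of negative values (reboot immediately) comes from the current kernel documentation, not from the extract above.

```python
def after_oops(panic_on_oops, panic):
    """What happens after an oops, given kernel.panic_on_oops and kernel.panic."""
    if not panic_on_oops:
        return "try to continue"
    if panic == 0:
        return "panic and loop forever"
    if panic < 0:
        return "panic and reboot immediately"
    return f"panic and reboot after {panic} s"
```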

kernel.modules_disabled

ANSSI R10 recommends setting:

kernel.modules_disabled = 1

Warning

Once this option is activated, it cannot be deactivated without rebooting the system. Loading any new kernel module therefore also requires a reboot.

I haven’t tried it in production yet.

kernel.perf_cpu_time_max_percent

To conform to ANSSI R9, set:

kernel.perf_cpu_time_max_percent = 1

kernel.perf_event_max_sample_rate

To conform to ANSSI R9, set:

kernel.perf_event_max_sample_rate = 1

But I am using

kernel.perf_event_max_sample_rate = 100000

kernel.perf_event_paranoid

kernel.perf_event_paranoid = 3

R9 recommends 2 or greater. The patch section lists a kernel patch that adds level 3, see https://lwn.net/Articles/696216/.
Lynis checks for values 2, 3 and 4.

Extract from https://www.kernel.org/doc/Documentation/sysctl/kernel.txt

Controls use of the performance events system by unprivileged
users (without CAP_SYS_ADMIN).  The default value is 2.

-1: Allow use of (almost) all events by all users
     Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
>=0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN
     Disallow raw tracepoint access by users without CAP_SYS_ADMIN
>=1: Disallow CPU event access by users without CAP_SYS_ADMIN
>=2: Disallow kernel profiling by users without CAP_SYS_ADMIN
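Below is a minimal Python sketch listing what unprivileged users lose at each level, per the extract above. Level 3 is not in that list. It comes from the out-of-tree patch mentioned earlier, which some distributions carry and which refuses perf_event_open() entirely for unprivileged users; `patched` is a hypothetical flag for that case.

Without the patch, the upstream checks compare against thresholds, so 3 should behave like 2. That is my assumption and worth verifying on your kernel.

```python
# Restrictions from the extract above: (minimum level, what is denied).
RESTRICTIONS = [
    (0, "ftrace function tracepoints"),
    (0, "raw tracepoints"),
    (1, "CPU events"),
    (2, "kernel profiling"),
]


def denied_without_cap_sys_admin(paranoid, patched=False):
    """What a user without CAP_SYS_ADMIN cannot do at a given level."""
    if patched and paranoid >= 3:
        # Out-of-tree patch: perf_event_open() is refused altogether.
        return ["all perf_event_open() calls"]
    return [what for level, what in RESTRICTIONS if paranoid >= level]
```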

kernel.randomize_va_space

ANSSI R9 recommends setting:

kernel.randomize_va_space = 2

AFAIK it’s now the default on all Linux distributions.

Extract from https://www.kernel.org/doc/Documentation/sysctl/kernel.txt

This option can be used to select the type of process address
space randomization that is used in the system, for architectures
that support this feature.

0 - Turn the process address space randomization off.  This is the
    default for architectures that do not support this feature anyways,
    and kernels that are booted with the "norandmaps" parameter.

1 - Make the addresses of mmap base, stack and VDSO page randomized.
    This, among other things, implies that shared libraries will be
    loaded to random addresses.  Also for PIE-linked binaries, the
    location of code start is randomized.  This is the default if the
    CONFIG_COMPAT_BRK option is enabled.

2 - Additionally enable heap randomization.  This is the default if
    CONFIG_COMPAT_BRK is disabled.

    There are a few legacy applications out there (such as some ancient
    versions of libc.so.5 from 1996) that assume that brk area starts
    just after the end of the code+bss.  These applications break when
    start of the brk area is randomized.  There are however no known
    non-legacy applications that would be broken this way, so for most
    systems it is safe to choose full randomization.

    Systems with ancient and/or broken binaries should be configured
    with CONFIG_COMPAT_BRK enabled, which excludes the heap from process
    address space randomization.
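Below is a minimal Python sketch mapping each value to the regions that the extract says are randomized, on architectures that support the feature. It deliberately ignores CONFIG_COMPAT_BRK and the "norandmaps" boot parameter.

```python
def randomized_regions(value):
    """Address-space regions randomized for kernel.randomize_va_space."""
    if value not in (0, 1, 2):
        raise ValueError("randomize_va_space must be 0, 1 or 2")
    regions = []
    if value >= 1:
        regions += ["mmap base", "stack", "VDSO", "PIE code start"]
    if value == 2:
        regions.append("heap (brk)")  # level 2 adds heap randomization
    return regions
```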

kernel.sysrq

ANSSI R9 recommends setting:

kernel.sysrq = 0

But I only allow the sync command:

kernel.sysrq = 16

Extract from https://www.kernel.org/doc/Documentation/admin-guide/sysrq.rst

Here is the list of possible values in /proc/sys/kernel/sysrq:

 -  0 - disable sysrq completely
 -  1 - enable all functions of sysrq
 - >1 - bitmask of allowed sysrq functions (see below for detailed function
   description):

        2 =   0x2 - enable control of console logging level
        4 =   0x4 - enable control of keyboard (SAK, unraw)
        8 =   0x8 - enable debugging dumps of processes etc.
       16 =  0x10 - enable sync command
       32 =  0x20 - enable remount read-only
       64 =  0x40 - enable signalling of processes (term, kill, oom-kill)
      128 =  0x80 - allow reboot/poweroff
      256 = 0x100 - allow nicing of all RT tasks
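To double-check a bitmask such as the 16 used here, below is a minimal Python sketch that decodes a kernel.sysrq value into the functions listed above.

```python
# Bit values and descriptions from the list above.
SYSRQ_BITS = {
    0x2: "control of console logging level",
    0x4: "control of keyboard (SAK, unraw)",
    0x8: "debugging dumps of processes etc.",
    0x10: "sync command",
    0x20: "remount read-only",
    0x40: "signalling of processes (term, kill, oom-kill)",
    0x80: "reboot/poweroff",
    0x100: "nicing of all RT tasks",
}


def sysrq_functions(value):
    """Decode a kernel.sysrq value into the enabled functions."""
    if value == 0:
        return []
    if value == 1:
        return ["all functions"]
    return [desc for bit, desc in SYSRQ_BITS.items() if value & bit]
```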

kernel.unprivileged_bpf_disabled

ANSSI R9 recommends setting:

kernel.unprivileged_bpf_disabled = 1

Source: https://docs.kernel.org/_sources/admin-guide/sysctl/kernel.rst.txt

Writing 1 to this entry will disable unprivileged calls to bpf(); once disabled, calling bpf() without CAP_SYS_ADMIN or CAP_BPF will return -EPERM. Once set to 1, this can’t be cleared from the running kernel anymore.

Writing 2 to this entry will also disable unprivileged calls to bpf(), however, an admin can still change this setting later on, if needed, by writing 0 or 1 to this entry.

If BPF_UNPRIV_DEFAULT_OFF is enabled in the kernel config, then this entry will default to 2 instead of 0.

0	Unprivileged calls to `bpf()` are enabled
1	Unprivileged calls to `bpf()` are disabled without recovery
2	Unprivileged calls to `bpf()` are disabled
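The table implies a one-way state machine. Below is a minimal Python sketch of privileged writes to the entry. It assumes, as I read the kernel code, that writing 1 again while the value is already 1 is accepted; any other value is refused until reboot.

```python
def write_unprivileged_bpf_disabled(current, new):
    """Value after a privileged write, or PermissionError (EPERM)."""
    if new not in (0, 1, 2):
        raise ValueError("value must be 0, 1 or 2")
    if current == 1 and new != 1:
        # 1 is locked: it cannot be cleared from the running kernel.
        raise PermissionError("EPERM: locked until reboot")
    return new
```

If you may need to allow unprivileged BPF temporarily, 2 keeps that option open; 1 does not.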

kernel.yama.ptrace_scope

To conform to R11, use 1 or greater.

kernel.yama.ptrace_scope = 1

Extract from https://www.kernel.org/doc/Documentation/security/Yama.txt

The sysctl settings (writable only with CAP_SYS_PTRACE) are:

- 0 - classic ptrace permissions: a process can PTRACE_ATTACH to any other
    process running under the same uid, as long as it is dumpable (i.e.
    did not transition uids, start privileged, or have called
    prctl(PR_SET_DUMPABLE...) already). Similarly, PTRACE_TRACEME is
    unchanged.

- 1 - restricted ptrace: a process must have a predefined relationship
    with the inferior it wants to call PTRACE_ATTACH on. By default,
    this relationship is that of only its descendants when the above
    classic criteria is also met. To change the relationship, an
    inferior can call prctl(PR_SET_PTRACER, debugger, ...) to declare
    an allowed debugger PID to call PTRACE_ATTACH on the inferior.
    Using PTRACE_TRACEME is unchanged.

- 2 - admin-only attach: only processes with CAP_SYS_PTRACE may use ptrace
    with PTRACE_ATTACH, or through children calling PTRACE_TRACEME.

- 3 - no attach: no processes may use ptrace with PTRACE_ATTACH nor via
    PTRACE_TRACEME. Once set, this sysctl value cannot be changed.
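Below is a simplified Python model of the four levels. The boolean inputs are hypothetical flags the caller must work out; they are not kernel APIs. The model also assumes, as I read the Yama code, that CAP_SYS_PTRACE lifts the descendant requirement of level 1.

In practice, with level 1, `gdb -p PID` against an unrelated process of the same user is refused. `gdb ./program` still works because the program is a child of gdb.

```python
def may_ptrace_attach(scope, *, cap_sys_ptrace=False, classic_ok=False,
                      is_descendant=False, declared_ptracer=False):
    """Is PTRACE_ATTACH allowed under kernel.yama.ptrace_scope?

    classic_ok: the classic checks pass (same uid, target dumpable).
    declared_ptracer: the target called prctl(PR_SET_PTRACER, ...) for us.
    """
    if scope == 3:
        return False  # no attach at all, even with CAP_SYS_PTRACE
    if scope == 2:
        return cap_sys_ptrace  # admin-only attach
    # CAP_SYS_PTRACE also satisfies the classic checks.
    classic = classic_ok or cap_sys_ptrace
    if scope == 0:
        return classic
    if scope == 1:
        return classic and (is_descendant or declared_ptracer or cap_sys_ptrace)
    raise ValueError("ptrace_scope must be 0, 1, 2 or 3")
```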

net

https://www.kernel.org/doc/Documentation/networking/ip-sysctl.rst

accept_local

net.ipv4.conf.all.accept_local = 0
net.ipv4.conf.default.accept_local = 0

accept_local - BOOLEAN: Accept packets with local source addresses. In combination with suitable routing, this can be used to direct packets between two local interfaces over the wire and have them accepted properly. default FALSE

arp_filter

net.ipv4.conf.all.arp_filter = 1

arp_filter - BOOLEAN

1 - Allows you to have multiple network interfaces on the same subnet, and have the ARPs for each interface be answered based on whether or not the kernel would route a packet from the ARP’d IP out that interface (therefore you must use source based routing for this to work). In other words it allows control of which cards (usually 1) will respond to an arp request.
0 - (default) The kernel can respond to arp requests with addresses from other interfaces. This may seem wrong but it usually makes sense, because it increases the chance of successful communication. IP addresses are owned by the complete host on Linux, not by particular interfaces. Only for more complex setups like load-balancing, does this behaviour cause problems.

arp_filter for the interface will be enabled if at least one of conf/{all,interface}/arp_filter is set to TRUE, it will be disabled otherwise
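Several net.ipv4.conf parameters combine the all and per-interface values. Below is a minimal Python sketch of the two rules quoted in this section:

- OR rule: arp_filter, log_martians, shared_media, secure_redirects and send_redirects are enabled if either value is true.
- Max rule: arp_ignore and rp_filter use the larger of the two values.

accept_redirects has its own forwarding-dependent rule and is left out. One consequence of the OR rule: with conf.all.arp_filter = 1, you cannot turn arp_filter off for a single interface. The `effective` helper assumes the usual /proc/sys/net/ipv4/conf layout.

```python
from pathlib import Path

# Parameters enabled if conf/all OR conf/<interface> is true.
OR_RULE = {"arp_filter", "log_martians", "shared_media",
           "secure_redirects", "send_redirects"}
# Parameters for which the larger of the two values is used.
MAX_RULE = {"arp_ignore", "rp_filter"}


def combine(name, all_value, iface_value):
    """Effective value on an interface from conf/all and conf/<iface>."""
    if name in OR_RULE:
        return int(bool(all_value) or bool(iface_value))
    if name in MAX_RULE:
        return max(all_value, iface_value)
    raise ValueError(f"no combination rule recorded for {name}")


def effective(name, iface, root="/proc/sys/net/ipv4/conf"):
    """Read both values from /proc and combine them."""
    def read(scope):
        return int(Path(root, scope, name).read_text())
    return combine(name, read("all"), read(iface))
```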

Note

Lynis 3.1.5 doesn’t check it, but the ANSSI guidelines recommend setting it to 1.

Warning

Setting it to 1 is known to break network communication on OVH VPS; use 0 in that case:

net.ipv4.conf.all.arp_filter = 0

arp_ignore

net.ipv4.conf.all.arp_ignore = 2

arp_ignore - INTEGER

Define different modes for sending replies in response to received ARP requests that resolve local target IP addresses:

0 - (default): reply for any local target IP address, configured on any interface
1 - reply only if the target IP address is local address configured on the incoming interface
2 - reply only if the target IP address is local address configured on the incoming interface and both with the sender’s IP address are part from same subnet on this interface
3 - do not reply for local addresses configured with scope host, only resolutions for global and link addresses are replied
4-7 - reserved
8 - do not reply for all local addresses

The max value from conf/{all,interface}/arp_ignore is used when ARP request is received on the {interface}
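Below is a minimal Python sketch of modes 0, 1, 2 and 8, to see how they differ for a given request. Mode 3 (address scope) is not modelled. `mode` is the effective value, i.e. the max of the all and interface values. The addresses in the example are documentation ranges, not real hosts.

Mode 2 differs from mode 1 only by dropping requests whose sender is outside the subnet of the target address.

```python
import ipaddress


def arp_reply(mode, target_ip, sender_ip, incoming_iface_nets, all_local_ips):
    """Would the kernel answer this ARP request under arp_ignore `mode`?

    incoming_iface_nets: "ip/prefix" strings configured on the receiving
    interface. all_local_ips: every local address on the host.
    """
    target = ipaddress.ip_address(target_ip)
    sender = ipaddress.ip_address(sender_ip)
    local = {ipaddress.ip_address(a) for a in all_local_ips}
    ifaces = [ipaddress.ip_interface(n) for n in incoming_iface_nets]
    on_iface = [i for i in ifaces if i.ip == target]
    if mode == 0:
        return target in local  # any local address, on any interface
    if mode == 1:
        return bool(on_iface)  # target configured on the incoming interface
    if mode == 2:
        # ...and the sender is in the same subnet as that address
        return any(sender in i.network for i in on_iface)
    if mode == 8:
        return False  # never reply for local addresses
    raise NotImplementedError(f"mode {mode} is not modelled")
```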

Warning

On oVirt nodes and OVH VPS, the value 2 is known to break things. Use 1 instead.

net.ipv4.conf.all.arp_ignore = 1

log_martians

net.ipv4.conf.all.log_martians = 1
net.ipv4.conf.default.log_martians = 1

log_martians - BOOLEAN: Log packets with impossible addresses to kernel log. log_martians for the interface will be enabled if at least one of conf/{all,interface}/log_martians is set to TRUE, it will be disabled otherwise

Warning

log_martians is currently enabled on all hosts except oVirt hosts, which use:

net.ipv4.conf.all.log_martians = 0
net.ipv4.conf.default.log_martians = 0

route_localnet

net.ipv4.conf.all.route_localnet = 0

route_localnet - BOOLEAN

Do not consider loopback addresses as martian source or destination while routing. This enables the use of 127/8 for local routing purposes.

default FALSE

rp_filter

net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.rp_filter = 1

A few firewalls are exceptions and are configured with:

net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.INTERFACE_NAME.rp_filter = 0

rp_filter - INTEGER

0 - No source validation.
1 - Strict mode as defined in RFC3704 Strict Reverse Path. Each incoming packet is tested against the FIB and if the interface is not the best reverse path the packet check will fail. By default failed packets are discarded.
2 - Loose mode as defined in RFC3704 Loose Reverse Path. Each incoming packet’s source address is also tested against the FIB and if the source address is not reachable via any interface the packet check will fail.

Current recommended practice in RFC3704 is to enable strict mode to prevent IP spoofing from DDoS attacks. If using asymmetric routing or other complicated routing, then loose mode is recommended.

The max value from conf/{all,interface}/rp_filter is used when doing source validation on the {interface}.

Default value is 0. Note that some distributions enable it in startup scripts.

TODO: Check if rp_filter = 2 can be used instead.
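To see what strict and loose mode change, below is a minimal Python sketch with a toy routing table of (prefix, interface) pairs. It uses a longest-prefix match as a stand-in for the kernel’s FIB lookup; policy routing and ECMP are ignored.

It also covers the firewall exception above: with all = 0 and INTERFACE_NAME = 0, packets arriving on that interface are not validated.

```python
import ipaddress


def best_iface(routes, addr):
    """Longest-prefix match over (prefix, iface) pairs; None if no route."""
    ip = ipaddress.ip_address(addr)
    matches = [
        (ipaddress.ip_network(prefix), iface)
        for prefix, iface in routes
        if ip in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def rp_check(all_value, iface_value, routes, src, in_iface):
    """True if a packet from `src` arriving on `in_iface` passes rp_filter."""
    mode = max(all_value, iface_value)  # max of conf/all and conf/<iface>
    if mode == 0:
        return True  # no source validation
    best = best_iface(routes, src)
    if mode == 1:
        return best == in_iface  # strict: reverse path must use in_iface
    return best is not None  # loose: source reachable via any interface
```

With a default route, loose mode only rejects sources that match no route at all. Strict mode is therefore the stronger anti-spoofing setting, at the cost of breaking asymmetric routing.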

ICMP redirect

I think it’s better to push static routes with Ansible (or a similar configuration manager) and ignore all redirects, as recommended by the ANSSI guide. Its section “Configuring the kernel options” (R9) lists:

# Deny receipt of ICMP redirect packets. The suggested setting of this
# option should be strongly considered for routers, which must not
# depend on an external element to determine the calculation of a route. Even
# for non-router machines, this setting protects against
# traffic diversion with ICMP redirect packets.
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.all.secure_redirects = 0
net.ipv4.conf.default.secure_redirects = 0
net.ipv4.conf.all.shared_media = 0
net.ipv4.conf.default.shared_media = 0

Note that Ubuntu recommends using the default values instead.

accept_redirects

accept_redirects - BOOLEAN

Accept ICMP redirect messages. accept_redirects for the interface will be enabled if:

both conf/{all,interface}/accept_redirects are TRUE in the case forwarding for the interface is enabled

or

at least one of conf/{all,interface}/accept_redirects is TRUE in the case forwarding for the interface is disabled

accept_redirects for the interface will be disabled otherwise

default:

TRUE (host)

FALSE (router)
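The rule above explains why the ANSSI block sets both all and default to 0. On a host (forwarding disabled) the two values are ORed, so all = 0 alone does not help if the interface value, copied from default when the interface is created, is still 1. Below is a minimal Python sketch of the rule.

```python
def accept_redirects_enabled(all_value, iface_value, forwarding):
    """Effective accept_redirects for an interface."""
    if forwarding:
        # Router: both conf/all and conf/<iface> must be TRUE.
        return bool(all_value) and bool(iface_value)
    # Host: at least one of them must be TRUE.
    return bool(all_value) or bool(iface_value)
```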

shared_media

shared_media - BOOLEAN

Send(router) or accept(host) RFC1620 shared media redirects. Overrides secure_redirects.

shared_media for the interface will be enabled if at least one of conf/{all,interface}/shared_media is set to TRUE, it will be disabled otherwise

default TRUE

secure_redirects

secure_redirects - BOOLEAN

Accept ICMP redirect messages only to gateways listed in the interface’s current gateway list. Even if disabled, RFC1122 redirect rules still apply.

Overridden by shared_media.

secure_redirects for the interface will be enabled if at least one of conf/{all,interface}/secure_redirects is set to TRUE, it will be disabled otherwise

default TRUE

net.core.bpf_jit_harden

net.core.bpf_jit_harden = 2

icmp_echo_ignore_broadcasts

net.ipv4.icmp_echo_ignore_broadcasts = 1

Extract from https://www.kernel.org/doc/Documentation/networking/ip-sysctl.rst

icmp_echo_ignore_broadcasts - BOOLEAN
      If enabled, then the kernel will ignore all ICMP ECHO and
      TIMESTAMP requests sent to it via broadcast/multicast.

      Possible values:

      - 0 (disabled)
      - 1 (enabled)

      Default: 1 (enabled)

I don’t remember when the default was to accept broadcasts, but scap-security-guide still checks this setting.

icmp_ignore_bogus_error_responses

net.ipv4.icmp_ignore_bogus_error_responses = 1

icmp_ignore_bogus_error_responses - BOOLEAN

Some routers violate RFC1122 by sending bogus responses to broadcast frames. Such violations are normally logged via a kernel warning. If enabled, the kernel will not give such warnings, which will avoid log file clutter.

Possible values:

0 (disabled)
1 (enabled)

Default: 1 (enabled)

send_redirects

It’s more secure to configure static routes on all hosts than to rely on ICMP redirects.

net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0

send_redirects - BOOLEAN

Send redirects, if router.

send_redirects for the interface will be enabled if at least one of conf/{all,interface}/send_redirects is set to TRUE, it will be disabled otherwise

Default: TRUE

tcp_rfc1337

net.ipv4.tcp_rfc1337 = 1

tcp_rfc1337 - BOOLEAN

If enabled, the TCP stack behaves conforming to RFC1337. If unset, we are not conforming to RFC, but prevent TCP TIME_WAIT assassination.

Possible values:

0 (disabled)
1 (enabled)

Default: 0 (disabled)

tcp_syncookies

net.ipv4.tcp_syncookies = 1

tcp_syncookies - INTEGER

Only valid when the kernel was compiled with CONFIG_SYN_COOKIES. Send out syncookies when the syn backlog queue of a socket overflows. This is to prevent against the common ‘SYN flood attack’.

Default: 1

Note

Syncookies is fallback facility. It MUST NOT be used to help highly loaded servers to stand against legal connection rate. If you see SYN flood warnings in your logs, but investigation shows that they occur because of overload with legal connections, you should tune another parameters until this warning disappear. See: tcp_max_syn_backlog, tcp_synack_retries, tcp_abort_on_overflow.

If you want to test which effects syncookies have to your network connections you can set this knob to 2 to enable unconditionally generation of syncookies.

Warning

syncookies seriously violate TCP protocol, do not allow to use TCP extensions, can result in serious degradation of some services (f.e. SMTP relaying), visible not by you, but your clients and relays, contacting you. While you see SYN flood warnings in logs not being really flooded, your server is seriously misconfigured.
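Below is a minimal Python sketch of the three tcp_syncookies values described above, assuming a kernel built with CONFIG_SYN_COOKIES:

- With 1 (the value used here), cookies are only sent once a socket’s SYN backlog overflows, so normal handshakes are unaffected.
- With 2, cookies are sent unconditionally; this value is meant for testing.

```python
def sends_syncookie(tcp_syncookies, backlog_overflowing):
    """Does the kernel answer a SYN with a syncookie?"""
    if tcp_syncookies == 0:
        return False
    if tcp_syncookies == 1:
        return backlog_overflowing  # fallback only
    if tcp_syncookies == 2:
        return True  # unconditional, for testing
    raise ValueError("tcp_syncookies must be 0, 1 or 2")
```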

vm

vm.mmap_min_addr

At least on x86_64, 65536 is the recommended value. To increase security, set it explicitly:

vm.mmap_min_addr = 65536

NULL pointer dereference flaws in the Linux kernel were often abused by local, unprivileged users to gain root privileges by mapping low memory pages and crafting them to contain valid malicious instructions. Linux 2.6.23 introduced the /proc/sys/vm/mmap_min_addr tunable to prevent unprivileged users from creating new memory mappings below the configured minimum address. This feature has been backported to some older OS releases (e.g. Red Hat Enterprise Linux 5.2).

The default value is defined when the kernel is compiled. Here is how I found this value on some distributions:

[user@tst-c8 ~]$ uname -r; grep CONFIG_DEFAULT_MMAP_MIN_ADDR /boot/config-$(uname -r)
4.18.0-553.76.1.el8_10.x86_64
CONFIG_DEFAULT_MMAP_MIN_ADDR=4096

Distribution	Kernel	CONFIG_DEFAULT_MMAP_MIN_ADDR
AlmaLinux 8	4.18.0-553.76.1.el8_10.x86_64	4096
AlmaLinux 9	5.14.0-570.46.1.el9_6.x86_64	65536
CentOS 6	2.6.32-754.35.1.el6.x86_64	4096
CentOS 7	3.10.0-1160.119.1.el7.x86_64	4096
CentOS 7	3.10.0-1160.88.1.el7.centos.plus.x86_64	4096
CentOS 9	5.14.0-542.el9.x86_64	65536
Debian 8	3.16.0-4-amd64	65536
SLES 15	5.14.21-150500.55.39-default	65536
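Below is a minimal Python sketch that does two things:

- It summarises the table, listing which distributions need the explicit 65536 setting.
- It models the capability check: a mapping at an address below vm.mmap_min_addr is refused unless the process has CAP_SYS_RAWIO.

LSM checks, such as SELinux’s own rules for low memory, are ignored here.

```python
# CONFIG_DEFAULT_MMAP_MIN_ADDR per distribution, from the table above.
DEFAULT_MMAP_MIN_ADDR = {
    "AlmaLinux 8": 4096,
    "AlmaLinux 9": 65536,
    "CentOS 6": 4096,
    "CentOS 7": 4096,  # same value for both kernels in the table
    "CentOS 9": 65536,
    "Debian 8": 65536,
    "SLES 15": 65536,
}


def needs_override(defaults, target=65536):
    """Distributions whose compiled-in default is below `target`."""
    return sorted(name for name, value in defaults.items() if value < target)


def may_map_at(addr, mmap_min_addr, cap_sys_rawio=False):
    """Capability check for a mapping at `addr` (LSM rules not modelled)."""
    return addr >= mmap_min_addr or cap_sys_rawio


if __name__ == "__main__":
    print(needs_override(DEFAULT_MMAP_MIN_ADDR))
    print(may_map_at(0x0, 65536))  # NULL page: refused without CAP_SYS_RAWIO
```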