Why Linux OS takes over 5 minutes during boot-up process
In this article, I would show you how to find which systemd unit might lead the boot-up times through `systemd-analyze`[1] command.
Problem
As the following output of journalctl
command, we can view the network unit takes around 5 minutes between initial the LSB and failed to start:
- Jun 01 15:59:45:
Starting LSB: Bring up/down networking...
- Jun 01 16:04:45:
Failed to start LSB: Bring up/down networking.
$ journalctl --unit=network
-- Logs begin at Tue 2021-06-01 15:59:40 UTC, end at Tue 2021-06-01 16:05:37 UTC. --
Jun 01 15:59:45 ip-172-31-29-127.eu-west-1.compute.internal systemd[1]: Starting LSB: Bring up/down networking...
Jun 01 15:59:45 ip-172-31-29-127.eu-west-1.compute.internal network[666]: Bringing up loopback interface: [ OK ]
Jun 01 15:59:45 ip-172-31-29-127.eu-west-1.compute.internal network[666]: Bringing up interface eth0:
Jun 01 15:59:45 ip-172-31-29-127.eu-west-1.compute.internal dhclient[789]: DHCPREQUEST on eth0 to 255.255.255.255 port 67 (xid=0x7487dbae)
Jun 01 15:59:45 ip-172-31-29-127.eu-west-1.compute.internal dhclient[789]: DHCPACK from 172.31.16.1 (xid=0x7487dbae)
Jun 01 15:59:47 ip-172-31-29-127.eu-west-1.compute.internal NET[833]: /usr/sbin/dhclient-script : updated /etc/resolv.conf
Jun 01 15:59:47 ip-172-31-29-127.eu-west-1.compute.internal network[666]: Determining IP information for eth0... done.
Jun 01 15:59:48 ip-172-31-29-127.eu-west-1.compute.internal dhclient[856]: XMT: Solicit on eth0, interval 1020ms.
Jun 01 15:59:49 ip-172-31-29-127.eu-west-1.compute.internal dhclient[856]: XMT: Solicit on eth0, interval 2120ms.
Jun 01 15:59:51 ip-172-31-29-127.eu-west-1.compute.internal dhclient[856]: XMT: Solicit on eth0, interval 4450ms.
Jun 01 15:59:55 ip-172-31-29-127.eu-west-1.compute.internal dhclient[856]: XMT: Solicit on eth0, interval 9080ms.
Jun 01 16:00:05 ip-172-31-29-127.eu-west-1.compute.internal dhclient[856]: XMT: Solicit on eth0, interval 18980ms.
Jun 01 16:00:24 ip-172-31-29-127.eu-west-1.compute.internal dhclient[856]: XMT: Solicit on eth0, interval 37130ms.
Jun 01 16:01:01 ip-172-31-29-127.eu-west-1.compute.internal dhclient[856]: XMT: Solicit on eth0, interval 77810ms.
Jun 01 16:02:19 ip-172-31-29-127.eu-west-1.compute.internal dhclient[856]: XMT: Solicit on eth0, interval 113620ms.
Jun 01 16:04:12 ip-172-31-29-127.eu-west-1.compute.internal dhclient[856]: XMT: Solicit on eth0, interval 35520ms.
Jun 01 16:04:45 ip-172-31-29-127.eu-west-1.compute.internal systemd[1]: network.service start operation timed out. Terminating.
Jun 01 16:04:45 ip-172-31-29-127.eu-west-1.compute.internal systemd[1]: Failed to start LSB: Bring up/down networking.
Jun 01 16:04:45 ip-172-31-29-127.eu-west-1.compute.internal systemd[1]: Unit network.service entered failed state.
Jun 01 16:04:45 ip-172-31-29-127.eu-west-1.compute.internal systemd[1]: network.service failed.
Jun 01 16:04:48 ip-172-31-29-127.eu-west-1.compute.internal network[666]: Determining IPv6 information for eth0... failed.
Jun 01 16:04:48 ip-172-31-29-127.eu-west-1.compute.internal network[666]: WARN : [/etc/sysconfig/network-scripts/ifup-eth] Unable to obtain IPv6 DHCP address eth0.
Find the systemd unit that take a long time
In this case, we can use systemd-analyze blame
command to show a list of all running units, ordered by the time they took to initialize.
$ sudo systemd-analyze blame
5min 115ms network.service
3.866s dev-xvda1.device
3.101s cloud-init-local.service
1.206s cloud-init.service
From the output message of systemd-analyze blame
, we can know the network.service
take around 5 min. Moreover, we also can check the configuration of network.service
that defined the TimeoutSec=5min
by default[2].
$ sudo systemctl edit --full network.service
...
...
[Service]
Type=forking
Restart=no
TimeoutSec=5min
IgnoreSIGPIPE=no
KillMode=process
GuessMainPID=no
RemainAfterExit=yes
ExecStart=/etc/rc.d/init.d/network start
ExecStop=/etc/rc.d/init.d/network stop
This means that network.service
might be waiting for something and cause the timeout. Therefore, we should check the networking configurations.
$ cat /etc/sysconfig/network-scripts/ifcfg-ens5
# Created by cloud-init on instance boot automatically, do not edit.
#
BOOTPROTO=dhcp
DEVICE=ens5
DHCPV6C=yes
IPV6INIT=yes
ONBOOT=yes
TYPE=Ethernet
USERCTL=no
From this configuration of the interface ens5
, we can know IPv6 address is enabled. However, the configurations in /etc/sysconfig/network-scripts
are created by cloud-init on instance boot automatically.
By checking the configurations in /etc/cloud/cloud.cfg.d/
, we can confirm the IPv6 address is enabled in cloud-init.
...
network:
version: 1
config:
- type: physical
name: eth0
subnets:
- type: dhcp
- type: dhcp6
In the meantime, I can view the VPC did not be enabled the IPv6 support[3].
Summary
If we enable the IPv6 at the Linux OS level, but we don't enable IPv6 support for your VPC and resources. During the boot-up process, network.unit
would like to retrieve IPv6 IP address and wait for DHCP server for 5 min. After the timeout period, the systemd.unit
would continue the process, and cause a slow boot-up time over 5 minutes.