Network monitoring with node_exporter: CloudWatch network metrics and Docker --net=host

I started updating dashboards in Grafana, and I ran into two interesting things.

The first: what does CloudWatch actually show in the NetworkIn/NetworkOut (Bytes) graphs, how should this data be interpreted correctly, and how does the CloudWatch data correlate with the data from node_exporter?

The second: why must node_exporter be launched in the host network mode?

First, let's figure out what exactly CloudWatch shows: we'll launch a test EC2 instance, run node_exporter in Docker on it, connect it to Prometheus, load the network, and compare the graphs from node_exporter and CloudWatch.

We start the instance (a t2.small here) and install Docker and Docker Compose:

root@host:/home/ubuntu# curl -fsSL https://download.docker.com/linux/ubuntu/gpg | gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
root@host:/home/ubuntu# echo \
> "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
>   $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
root@host:/home/ubuntu# apt-get update && apt-get -y install docker-ce docker-ce-cli containerd.io
root@host:/home/ubuntu# curl -s -L "https://github.com/docker/compose/releases/download/1.29.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose

Launching node_exporter

A ready-made Docker Compose file for node_exporter:

---
version: '3.8'

services:
  node_exporter:
    image: quay.io/prometheus/node-exporter:latest
    container_name: node_exporter
    command:
      - '--path.rootfs=/host'
    network_mode: host
    pid: host
    restart: unless-stopped
    volumes:
      - '/:/host:ro,rslave'

We launch:
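Assuming the Compose file above is saved as docker-compose.yml in the current directory, the launch would be something like:

root@host:/home/ubuntu# docker-compose up -d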

Pulling node_exporter (quay.io/prometheus/node-exporter:latest)…
latest: Pulling from prometheus/node-exporter

Check if there is data:
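A quick way to check, assuming the exporter is listening on its default port 9100:

root@host:/home/ubuntu# curl -s localhost:9100/metrics | head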

# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0
go_gc_duration_seconds{quantile="0.25"} 0
go_gc_duration_seconds{quantile="0.5"} 0

Connecting Prometheus and Grafana

We add it to Prometheus; here I already have an instance running on our Dev monitoring host:

...
  - job_name: 'node-exporter'
    metrics_path: '/metrics'
    static_configs:
      - targets:
        - '18.117.88.151:9100'   # test node
...
    metric_relabel_configs:

        # test node
      - source_labels: [instance]
        regex: '18.117.88.151:9100'
        target_label: host
        replacement: 'test-node-exporter'
...

Restart Prometheus and check the new target:
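For example, if Prometheus runs directly on the monitoring host, a configuration reload can be triggered with SIGHUP (just a sketch; how exactly it gets restarted depends on the setup):

kill -HUP $(pidof prometheus)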

Add graphs to Grafana using the following queries:

rate(node_network_receive_bytes_total{host="test-node-exporter", device="eth0"}[5m])
rate(node_network_transmit_bytes_total{host="test-node-exporter", device="eth0"}[5m])

rate(node_network_receive_packets_total{host="test-node-exporter", device="eth0"}[5m])
rate(node_network_transmit_packets_total{host="test-node-exporter", device="eth0"}[5m])

Checking the chart:

Network testing with iperf

Install iperf on the test machine and on the monitoring server:
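On Ubuntu it is available as a regular package, so on both hosts something like:

apt-get -y install iperf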

On the test machine, run iperf in server mode (-s); it will receive traffic:
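Nothing more than the -s flag is strictly needed here (judging by the warning below, a custom window size was also requested with -w, but that is optional):

root@host:/home/ubuntu# iperf -s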

------------------------------------------------------------
Server listening on TCP port 5001
TCP window size:  416 KByte (WARNING: requested 1.91 MByte)
------------------------------------------------------------

And on the monitoring server, run it in client mode to drive traffic to the test machine for 1800 seconds:
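Assuming only the server address and the test duration are specified (the warning below suggests a -w window-size option was also passed):

iperf -c 18.117.88.151 -t 1800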

WARNING: TCP window size set to 2 bytes. A small window size
will give poor performance. See the Iperf documentation.
------------------------------------------------------------
Client connecting to 18.117.88.151, TCP port 5001
TCP window size: 4.50 KByte (WARNING: requested 2.00 Byte)
------------------------------------------------------------
[  3] local 10.0.0.8 port 50174 connected with 18.117.88.151 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3] 0.0- 1.0 sec 49.8 MBytes 417 Mbits/sec
[  3] 1.0- 2.0 sec 50.9 MBytes 427 Mbits/sec
[  3] 2.0- 3.0 sec 51.6 MBytes 433 Mbits/sec
[  3] 3.0- 4.0 sec 51.9 MBytes 435 Mbits/sec
[  3] 4.0- 5.0 sec 51.8 MBytes 434 Mbits/sec
[  3] 5.0- 6.0 sec 52.5 MBytes 440 Mbits/sec
[  3] 6.0- 7.0 sec 48.5 MBytes 407 Mbits/sec
[  3] 7.0- 8.0 sec 46.4 MBytes 389 Mbits/sec

First there was a spike, then it stabilized at around 15 megabytes per second:

The iperf client on the monitoring host reports:

[  3] 1718.0-1719.0 sec 14.6 MBytes 123 Mbits/sec
[  3] 1719.0-1720.0 sec 14.5 MBytes 122 Mbits/sec
[  3] 1720.0-1721.0 sec 14.6 MBytes 123 Mbits/sec
[  3] 1721.0-1722.0 sec 14.5 MBytes 122 Mbits/sec
[  3] 1722.0-1723.0 sec 14.6 MBytes 123 Mbits/sec
[  3] 1723.0-1724.0 sec 14.5 MBytes 122 Mbits/sec
[  3] 1724.0-1725.0 sec 14.5 MBytes 122 Mbits/sec

We convert bits to bytes and get the same ~15 MB/s:
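Taking the steady ~123 Mbit/sec reported by iperf above:

123/8

15.375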

CloudWatch vs node_exporter

Now we look in CloudWatch:

We have:

  • 4,862,000,000 bytes (Statistic: Sum)
  • transmitted in 5 minutes (Period: 5 Minutes)

To convert this into megabytes per second, we:

  • divide 4,862,000,000 by 300, the number of seconds in 5 minutes
  • then divide the result by 1024 twice: bytes to kilobytes, then kilobytes to megabytes

To get bits per second instead, multiply the result by 8.

Let's calculate:

4862000000/300/1024/1024

15

Which gives us the expected ~15 MByte/sec, or ~120 Mbit/sec.

Now for the second interesting thing: why should node_exporter run in host network mode?

To check, let's restart node_exporter, but remove network_mode: host:

---
version: '3.8'

services:

  node_exporter:
    image: quay.io/prometheus/node-exporter:latest
    container_name: node_exporter
    command:
      - '--path.rootfs=/host'
    pid: host
    restart: unless-stopped
    volumes:
      - '/:/host:ro,rslave'

We repeat the test with iperf and see… nothing:

72 bytes/second, even though iperf reports the same result in the region of ~120 Mbit/sec.

node_exporter and netstat

To begin with, let's look at the node_exporter documentation: how exactly does it collect data about the network?

netstat: Exposes network statistics from /proc/net/netstat. This is the same information as netstat -s. (OS: Linux)

So it reads the contents of /proc/net/netstat.

What is the difference between Docker host mode and bridge mode? Reading the Docker documentation:

container’s network stack is not isolated from the Docker host (the container shares the host’s networking namespace)

Well, let's make sure: let's run two node_exporter instances in parallel, one in host mode and one in bridge mode:

---
version: '3.8'

services:

  node_exporter_1:
    image: quay.io/prometheus/node-exporter:latest
    container_name: node_exporter_host
    command:
      - '--path.rootfs=/host'
    network_mode: host
    pid: host
    restart: unless-stopped
    volumes:
      - '/:/host:ro,rslave'

  node_exporter_2:
    image: quay.io/prometheus/node-exporter:latest
    container_name: node_exporter_bridge
    command:
      - '--path.rootfs=/host'
    pid: host
    restart: unless-stopped
    volumes:
      - '/:/host:ro,rslave'

Find the Container ID:
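Presumably via docker ps:

root@host:/home/ubuntu# docker ps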

CONTAINER ID   IMAGE                                     COMMAND                  CREATED          STATUS         PORTS      NAMES
47b7fd130812   quay.io/prometheus/node-exporter:latest   "/bin/node_exporter …"   9 seconds ago    Up 8 seconds   9100/tcp   node_exporter_bridge
daff8458e7bc   quay.io/prometheus/node-exporter:latest   "/bin/node_exporter …"   54 seconds ago   Up 8 seconds              node_exporter_host

Using CID 47b7fd130812, the container with node_exporter in bridge mode, find the PID under which it is running on the host:
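One way to get it, assuming docker inspect with its State.Pid field; judging by the nsenter call below, the PID here was 4561:

root@host:/home/ubuntu# docker inspect --format '{{.State.Pid}}' 47b7fd130812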

Using nsenter, we enter the process's network namespace and check the contents of /proc/net/netstat:

root@host:/home/ubuntu# nsenter --net=/proc/4561/ns/net cat /proc/net/netstat

TcpExt: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

IpExt: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Empty.

We repeat the same for the container in host network mode and find its PID:
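Again with docker inspect (an assumption); the nsenter call below shows this PID was 4505:

root@host:/home/ubuntu# docker inspect --format '{{.State.Pid}}' daff8458e7bc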

And check the data in the namespace:

root@host:/home/ubuntu# nsenter --net=/proc/4505/ns/net cat /proc/net/netstat

TcpExt: 0 0 0 8 23 0 0 0 0 0 100 0 0 0 0 95 1 9 0 0 17105774 3547 3110 0 0 0 0 0 0 0 0 0 0 19 0 0 0 0 0 36 1 0 0 0 3758 0 9 0 0 0 2 3 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 62 217266 0 0 1 1 0 0 0 0 0 0 0 0 0 2261 0 0 81 17 11692 2 32 0 0 0 0 0 0 0 0 0 0 0 0 11823 0 126847 0 0 0 0 21 0 0 0

IpExt: 0 0 0 4 0 0 72718074409 510047067 0 160 0 0 0 49856224 0 22 0 0

Repeat directly on the host machine:
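That is, simply reading the file in the host's own namespace:

root@host:/home/ubuntu# cat /proc/net/netstat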

TcpExt: 0 0 0 8 23 0 0 0 0 0 100 0 0 0 0 95 1 9 0 0 17105774 3555 3116 0 0 0 0 0 0 0 0 0 0 19 0 0 0 0 0 36 1 0 0 0 3758 0 9 0 0 0 2 3 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 62 217266 0 0 1 1 0 0 0 0 0 0 0 0 0 2269 0 0 81 17 11717 2 32 0 0 0 0 0 0 0 0 0 0 0 0 11848 0 126847 0 0 0 0 21 0 0 0

IpExt: 0 0 0 4 0 0 72718075935 510071965 0 160 0 0 0 49856242 0 22 0 0

Everything matches: a container in host network mode reads the host's network statistics, while a container in bridge mode sees only its own (empty) network namespace, which is why node_exporter reported almost nothing.
