Discussion:
Slowness, I/O and kswapd issues
Peter Chant
2013-05-22 12:05:49 UTC
Hi all,

I've been experiencing slowdowns on my machine that last several
seconds. Using iotop it is apparent that kswapd and various btrfs
threads are periodically maxing out the disk I/O capability. These
invariably freeze the machine (the mouse still works but not much else) until
they settle down.

I'm running btrfs on 64-bit Slackware 14.0 with kernel 3.7.7, and I had swap
set up with a swappiness of 10. I've reduced it to 2. I have 4GB of
memory. The KDE memory plasmoid shows me currently using 2.4GB out of
4GB. This is the most I've ever seen used. Swap is barely used.

Any thoughts? If I could not see the memory usage plasmoid, I'd put the
periods of unresponsiveness down to swapping due to low memory, i.e. disk
thrashing. However, I'm running at about 60% memory use, so I can't see it
being swap. I could turn off swap completely; I only configured it as
a 'safety net' in case something caused me to run low on memory. I do
note that Chromium seems to require /dev/shm, so removing that is not a
good option.
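For anyone following along, here is a minimal sketch (shell, assuming a Linux /proc filesystem) of how the figures above can be read from the command line rather than from a plasmoid. The function name mem_summary is hypothetical. It treats buffers and page cache as reclaimable, which may or may not match how the KDE widget counts "used" memory. The root-only commands for changing swappiness or disabling swap appear only as comments.

```shell
# Hypothetical helper: summarise memory and swap from /proc/meminfo (values are in kB).
# "Used" here excludes buffers and page cache, which the kernel can normally reclaim.
mem_summary() {
  local meminfo="${1:-/proc/meminfo}"
  awk '
    /^(MemTotal|MemFree|Buffers|Cached|SwapTotal|SwapFree):/ {
      # Strip the trailing colon from the field name and store the kB value.
      kb[substr($1, 1, length($1) - 1)] = $2
    }
    END {
      used = kb["MemTotal"] - kb["MemFree"] - kb["Buffers"] - kb["Cached"]
      printf "mem used (excl. buffers/cache): %d MiB of %d MiB\n", used / 1024, kb["MemTotal"] / 1024
      printf "swap used: %d MiB of %d MiB\n", (kb["SwapTotal"] - kb["SwapFree"]) / 1024, kb["SwapTotal"] / 1024
    }' "$meminfo"
  echo "swappiness: $(cat /proc/sys/vm/swappiness 2>/dev/null)"
}

# Changing settings needs root, for example:
#   sysctl -w vm.swappiness=2   # takes effect immediately, lost on reboot
#   swapoff -a                  # run without swap for a while; swapon -a re-enables it
```

Run it as mem_summary with no arguments. The optional path argument only exists so the function can be tried on a saved copy of meminfo.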

I'm not against increasing from 4GB to, say, 8GB of RAM, but would only do
so if I really needed to. If it is not a lack-of-memory problem, then
I'd be better off saving the money and putting it toward the next upgrade.

Thoughts?

Pete
Henrik Carlqvist
2013-05-22 18:30:24 UTC
Post by Peter Chant
I've been experiencing slowdowns on my machine that last several
seconds. Using iotop it is apparent that kswapd and various btrfs
threads are periodically maxing out the disk I/O capability. These
invariably freeze the machine (the mouse still works but not much else) until
they settle down.
If this suddenly happens on a machine which has worked fine before, I
would start by looking at the output from dmesg. If you find something there
like I/O errors or warnings from your HD, it might explain your problems.

If so, you might have to replace your HD, but if you are lucky the problem
is only a SATA or IDE cable making a loose contact.
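A rough way to follow this advice is to filter the kernel log for messages that commonly accompany disk or cable trouble. This is a sketch. The function name disk_error_lines is made up. The pattern is a heuristic covering a few familiar libata and block-layer phrases ("I/O error", ataN "exception", "failed command", "hard resetting link", "medium error"), not a complete list of failure messages.

```shell
# Hypothetical helper: keep only kernel log lines that look like disk/link errors.
# Heuristic pattern; absence of matches does not prove the disk is healthy.
disk_error_lines() {
  grep -iE 'i/o error|ata[0-9.]+:.*(exception|error|failed command|hard resetting link)|medium error|uncorrectable'
}

# Usage (reading dmesg may need root on some systems):
#   dmesg | disk_error_lines
```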

regards Henrik
--
The address in the header is only to prevent spam. My real address is:
hc351(at)poolhem.se Examples of addresses which go to spammers:
***@localhost ***@localhost
Peter Chant
2013-05-22 18:41:24 UTC
Post by Henrik Carlqvist
Post by Peter Chant
I've been experiencing slowdowns on my machine that last several
seconds. Using iotop it is apparent that kswapd and various btrfs
threads are periodically maxing out the disk I/O capability. These
invariably freeze the machine (the mouse still works but not much else) until
they settle down.
If this suddenly happens on a machine which has worked fine before, I
would start by looking at the output from dmesg. If you find something there
like I/O errors or warnings from your HD, it might explain your problems.
This machine has always seemed slow to me with respect to I/O. I could never
put my finger on why, though. Nothing too bad in the output of dmesg, just
some USB resets (the HD is SATA):

[351729.756298] usb 2-4: reset high-speed USB device number 2 using ehci_hcd
[353927.388971] btrfs: unlinked 4 orphans
[353937.618770] btrfs: unlinked 2 orphans
[357512.010242] btrfs: unlinked 4 orphans
[357514.751109] btrfs: unlinked 2 orphans
[359247.005269] usb 2-4: reset high-speed USB device number 2 using ehci_hcd
[359602.819408] nm-applet[763]: segfault at d ip 00007f0bd1d8237e sp 00007fffce3f2d10 error 4 in libc-2.15.so[7f0bd1d38000+1b5000]
[361112.030008] btrfs: unlinked 4 orphans
[361118.775111] btrfs: unlinked 2 orphans
[364705.332288] btrfs: unlinked 4 orphans
[364711.712064] btrfs: unlinked 2 orphans
[368299.718498] btrfs: unlinked 4 orphans
[368305.277005] btrfs: unlinked 2 orphans
[369291.818879] usb 2-4: reset high-speed USB device number 2 using ehci_hcd
[371894.960340] btrfs: unlinked 4 orphans
[371899.599045] btrfs: unlinked 2 orphans
[372744.914650] usb 2-4: reset high-speed USB device number 2 using ehci_hcd
[375491.422723] btrfs: unlinked 4 orphans
[375496.592978] btrfs: unlinked 2 orphans
[376218.465138] usb 2-4: reset high-speed USB device number 2 using ehci_hcd
[379087.458468] btrfs: unlinked 4 orphans
[379094.887328] btrfs: unlinked 2 orphans
[380561.299067] usb 2-4: reset high-speed USB device number 2 using ehci_hcd
[381174.910231] usb 2-4: reset high-speed USB device number 2 using ehci_hcd
[382697.136238] btrfs: unlinked 6 orphans
[382715.872084] btrfs: unlinked 4 orphans
Henrik Carlqvist
2013-05-23 21:13:31 UTC
Post by Peter Chant
This machine has always seemed slow to me with respect to I/O. I could never
put my finger on why, though.
Maybe it would be worth doing some benchmarks to find out when and why
the machine is slow. Some quick tests of raw I/O read performance:

Raw read of the first 100 MB of the disk (the first part is usually the quickest):
bash-4.1# dd if=/dev/sda of=/dev/null bs=1024 count=102400
102400+0 records in
102400+0 records out
104857600 bytes (105 MB) copied, 0.965756 s, 109 MB/s

Some more data:
bash-4.1# dd if=/dev/sda of=/dev/null bs=1024 count=10240000
10240000+0 records in
10240000+0 records out
10485760000 bytes (10 GB) copied, 91.3584 s, 115 MB/s

Not at the beginning of the disk but 400 GB into the disk:
bash-4.1# dd if=/dev/sda of=/dev/null bs=1024 count=10240000 skip=409600000
10240000+0 records in
10240000+0 records out
10485760000 bytes (10 GB) copied, 114.32 s, 91.7 MB/s
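The same raw read tests can be wrapped in a small function. This is a sketch with a hypothetical name, seq_read, and it assumes GNU dd. It uses bs=1M so that count and skip are in MiB. For example, bs=1024 count=10240000 skip=409600000 covers exactly the same bytes as bs=1M count=10000 skip=400000, and larger blocks usually cut per-read overhead. Repeated reads of the same region may be served from the page cache. GNU dd's iflag=direct avoids that, as does dropping caches as root with `sync; echo 3 > /proc/sys/vm/drop_caches`.

```shell
# Hypothetical helper: sequential read of COUNT_MIB MiB starting SKIP_MIB MiB into a
# device or file, printing dd's throughput summary. Reading a raw device needs root.
seq_read() {
  local src="$1" count_mib="$2" skip_mib="${3:-0}"
  # dd writes its statistics to stderr; keep only the final "... copied, ... s, ..." line.
  dd if="$src" of=/dev/null bs=1M count="$count_mib" skip="$skip_mib" 2>&1 | tail -n 1
}

# The third test above, expressed in MiB:
#   seq_read /dev/sda 10000 400000
```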

You get the idea. If you find that your raw disk I/O performance is good,
the next step is to do some benchmarks on your file system. Make sure to
create files that are a lot bigger than your RAM so that you do not measure
the speed of the cache in RAM.
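A file-system test along these lines could look like the following sketch. It assumes GNU dd, whose conv=fdatasync flushes the data before dd reports a time. fs_bench and ram_mib are hypothetical helper names, and the example path is only a placeholder. The file should live on the filesystem being tested and be about twice the size of RAM, so that most of the read-back cannot come from the page cache. If the btrfs filesystem is mounted with compression, a file of zeros may compress very well and flatter the figures. Reading from /dev/urandom instead avoids that, though it can be slow to generate.

```shell
# Hypothetical helper: write then read back a test file on the filesystem under test.
fs_bench() {
  local file="$1" size_mib="$2"
  echo "write:"
  # conv=fdatasync (GNU dd) flushes the data to disk before dd reports its timing.
  dd if=/dev/zero of="$file" bs=1M count="$size_mib" conv=fdatasync 2>&1 | tail -n 1
  echo "read:"
  dd if="$file" of=/dev/null bs=1M 2>&1 | tail -n 1
  rm -f "$file"
}

# Physical RAM in MiB, taken from MemTotal (reported in kB) in /proc/meminfo.
ram_mib() {
  awk '/^MemTotal:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# Example (placeholder path on the btrfs filesystem being tested):
#   fs_bench /path/on/btrfs/ddtest.bin "$(( $(ram_mib) * 2 ))"
```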

There are also benchmarks like iobench, and the -t and -T flags for
hdparm, but I like dd myself as it is rather clear what is
going on.

It would also be possible to test raw I/O write performance, but that
would destroy any contents on the drive. If you want to test raw I/O write
performance, you might want to do that on an unused partition only.

regards Henrik
Peter Chant
2013-05-24 10:00:05 UTC
Post by Henrik Carlqvist
Post by Peter Chant
This machine has always seemed slow to me with respect to I/O. I could never
put my finger on why, though.
Maybe it would be worth doing some benchmarks to find out when and why
I've had a quick go with your I/O tests; results are below for info. sda is
an SSD and sdb an HDD. Apart from the last large read, the numbers don't
look too bad.

However, I'm coming to a firm conclusion that the main culprit is
kswapd0. Every so often iotop reports that it is maxing out the disk I/O
despite reporting nothing for read, write or 'SWAPIN'. Googling shows
that others have had such an issue, but not a great number, and I can't
find a clear solution. There does not appear to be any specific bug
identified or fixed. I'm running 3.7.7; there are a few comments on
3.7.x and 3.8.x kernels, but also one from the 2.4.x series. I can't
nail down anything specific in searches.
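iotop shows per-thread I/O, but the kernel's own counters in /proc/vmstat can help tell swapping apart from other kswapd activity. pswpin and pswpout count pages swapped in and out, while the pgscan_kswapd* and pgsteal_kswapd* counters count pages kswapd scanned and reclaimed; the exact names vary between kernel versions. The sketch below uses a hypothetical function name, vmstat_delta. It samples these counters twice and prints the difference. Treat the interpretation as a hedge: if the swap counters stay flat while the kswapd counters climb during a stall, kswapd is presumably reclaiming page cache rather than swapping. That would fit iotop showing nothing under SWAPIN. `vmstat 1` (si/so columns) and `iotop -b -o` give similar views interactively.

```shell
# Hypothetical helper: print how much selected /proc/vmstat counters grew over INTERVAL seconds.
vmstat_delta() {
  local interval="${1:-5}" src="${2:-/proc/vmstat}"
  local pattern='^(pswpin|pswpout|pgscan_kswapd|pgsteal_kswapd)'
  local before after
  before=$(grep -E "$pattern" "$src")
  sleep "$interval"
  after=$(grep -E "$pattern" "$src")
  # Feed both snapshots to awk, separated by a marker line, and subtract old from new.
  printf '%s\n---\n%s\n' "$before" "$after" | awk '
    NF == 0     { next }
    $0 == "---" { phase = 1; next }
    !phase      { old[$1] = $2; next }
                { print $1, $2 - old[$1] }'
}

# Usage: run while a stall is happening, e.g.
#   vmstat_delta 10
```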

One reporter did suggest faulty RAM, but stated that memtest did not
show anything up. I'm not convinced by this argument; surely if I had
faulty RAM then the machine would be very flaky - and faulty RAM that
does not show up in memtest??? I've not got any spare RAM to swap in,
and I don't fancy shelling out on a small chance it might be the RAM.
Post by Henrik Carlqvist
bash-4.1# dd if=/dev/sda of=/dev/null bs=1024 count=102400
102400+0 records in
102400+0 records out
104857600 bytes (105 MB) copied, 0.965756 s, 109 MB/s
bash-4.2# dd if=/dev/sda of=/dev/null bs=1024 count=102400
102400+0 records in
102400+0 records out
104857600 bytes (105 MB) copied, 0.404738 s, 259 MB/s

bash-4.2# dd if=/dev/sdb of=/dev/null bs=1024 count=102400
102400+0 records in
102400+0 records out
104857600 bytes (105 MB) copied, 0.939816 s, 112 MB/s
bash-4.2#
Post by Henrik Carlqvist
bash-4.1# dd if=/dev/sda of=/dev/null bs=1024 count=10240000
10240000+0 records in
10240000+0 records out
10485760000 bytes (10 GB) copied, 91.3584 s, 115 MB/s
bash-4.2# dd if=/dev/sda of=/dev/null bs=1024 count=10240000
10240000+0 records in
10240000+0 records out
10485760000 bytes (10 GB) copied, 38.7211 s, 271 MB/s
bash-4.2#
Post by Henrik Carlqvist
bash-4.1# dd if=/dev/sda of=/dev/null bs=1024 count=10240000 skip=409600000
10240000+0 records in
10240000+0 records out
10485760000 bytes (10 GB) copied, 114.32 s, 91.7 MB/s
bash-4.2# dd if=/dev/sdb of=/dev/null bs=1024 count=10240000
10240000+0 records in
10240000+0 records out
10485760000 bytes (10 GB) copied, 151.027 s, 69.4 MB/s
bash-4.2#
Peter Chant
2013-05-26 08:51:17 UTC
Replying to myself so that anyone who is interested knows of my findings.
Post by Peter Chant
However, I'm coming to a firm conclusion that the main culprit is
kswapd0. Every so often iotop reports that it is maxing out the disk I/O
despite reporting nothing for read, write or 'SWAPIN'. Googling shows
that others have had such an issue, but not a great number, and I can't
find a clear solution. There does not appear to be any specific bug
identified or fixed. I'm running 3.7.7; there are a few comments on
3.7.x and 3.8.x kernels, but also one from the 2.4.x series. I can't
nail down anything specific in searches.
Now running kernel 3.9.3. kswapd seems to be behaving. However, it is
perhaps a little early to firmly state that there are no issues, as I've
not been running this kernel very long.
Post by Peter Chant
One reporter did suggest faulty RAM, but stated that memtest did not
show anything up. I'm not convinced by this argument; surely if I had
faulty RAM then the machine would be very flaky - and faulty RAM that
does not show up in memtest??? I've not got any spare RAM to swap in,
and I don't fancy shelling out on a small chance it might be the RAM.
Memtest86 showed no issues with my RAM. I have had the odd hang on this
machine, so there was a small element of doubt. However, though there
was the odd hang, my uptimes were days or weeks (I never checked, but
they must have been), which made me think the RAM was probably OK.

Pete
j***@gmail.com
2020-02-24 15:20:15 UTC
Hey, did updating your kernel work? Was that the solution? I have been running into the same problem: while doing anything related to reading or writing to disk, kswapd takes 100% of my I/O. I thought the problem was related to my swap size, but even after fixing that the problem is still there.