This is part of a series of articles that covers the booting of an OSR5 machine. See Booting OSR5 for other related articles.
Thanks to Bela Lubkin for some comments and suggestions that helped clarify this article.
Normally, the swap and dump devices are the same. You could change that by editing the file /etc/conf/cf.d/sassign and relinking a new kernel. But why would you want to? There might be reasons, but first we need to understand what swap and dump are for.
Dump is pretty simple. Its only purpose is to receive a kernel dump. Two things follow immediately: first, unless you are a kernel or driver developer who expects to crash your system regularly, you may NEVER need dump at all; second, if you do ever need it, it had better be big enough to hold everything currently in memory. That's what a dump is: the contents of memory. If you have 256 MB of memory, you'd need 256 MB of dump space.
What if you had 4 GiB of memory? OSR5 does support that, unlikely as it used to be for most of us. Oddly, the maximum file size is still 2 GiB, while memory can go to the full 4 GiB. That's a problem: to analyze a system dump, you would normally copy it (using dd) from the swap or dump device to a file on disk. But since the maximum size of a file on OSR5 is currently 2 GiB, you could only copy half of a 4 GiB dump. Given that, it would seem there would be little value in having dump be more than 2 GiB no matter how much memory you had.
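To make that arithmetic concrete, here is a minimal C sketch. It assumes, as described above, that a dump is an image of all of physical memory and that a regular file on OSR5 tops out at 2 GiB; the constant and function names are hypothetical, chosen only for illustration.

```c
#include <stdint.h>

/* Sketch only: assumes a dump is a byte-for-byte image of physical RAM
   and that a regular file on OSR5 is capped at 2 GiB, as described above. */
#define GIB ((uint64_t)1 << 30)
#define OSR5_MAX_FILE_BYTES (2 * GIB)   /* assumed file size limit */

/* Dump space needed to hold a complete dump: all of memory. */
uint64_t dump_bytes_needed(uint64_t ram_bytes)
{
    return ram_bytes;
}

/* How much of that dump could be copied (e.g. with dd) into a regular file. */
uint64_t dump_bytes_copyable_to_file(uint64_t ram_bytes)
{
    uint64_t need = dump_bytes_needed(ram_bytes);
    return need < OSR5_MAX_FILE_BYTES ? need : OSR5_MAX_FILE_BYTES;
}
```

For a 256 MB machine the whole dump fits in a file; for a 4 GiB machine only half of it does.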
However, it turns out that "crash" (which is what you'd use to analyze the dump) can read directly from the dump device; you don't have to transfer it to a file. So on a machine with more than 2 GiB of RAM, you might very well want a separate dump device. Even if you can't do crash analysis yourself, you could write the dump directly from the dump device to tape and send it to someone who can. Are you likely to do that? Probably not. When most of us have panic problems, we either have a pretty good idea where they're coming from (because we just added something) or we just start ripping things out until the problem goes away.
The most common crash is a Panic Trap 0x0000000E, and that's often bad memory, which is both cheap and easy to fix.
Given the uses your machine is put to, what are the chances that you'd run "crash" to debug a dump or pay money for someone else to do it? Maybe you don't need that dump space at all? Remember though: if you have a crash and don't have enough dump space, it may be too late to change your mind.
You can test dumping with the "sysdump" program. This is actually a tool for copying, compressing and uncompressing dumps, but it can also be used to create a dump on demand. That could be useful if a kernel needs to be professionally examined or if you just want to test your dump device. You could, for example, run
/etc/sysdump -i /dev/mem -n /unix -o /dev/swap
Bela Lubkin commented on that:
This is a decent test, but! If you do this to a running system which is
actually swapping, you will have a big problem. You should have them
run `swap -l` first to make sure swap is idle. You could then go into a
big discussion of what to do if it isn't, but I think that would be a
distraction. (If it isn't: could run `swap -d /dev/swap` and see if
that works -- it will start shoving stuff in from swap, and it'll
succeed if there's enough RAM, as would be the case if you swapped once
for a little while but current memory requirements fit within RAM.
Otherwise they could make a swap _file_ and `swap -a` it, then `swap -d
/dev/swap` and it'll shuffle stuff from one to the other.)
But as I said, a distraction. All you really need is to warn them --
don't try this if swap is in use ("blocks" != "free" in `swap -l`).
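Bela's check can be sketched in C. This hypothetical helper takes one data line of `swap -l` output, in the column layout shown in this article (path, dev, swaplo, blocks, free), and reports whether that swap area is idle. The function name and the assumption that the columns are whitespace-separated are illustrative, not part of any SCO tool.

```c
#include <stdio.h>

/* Hypothetical helper: given one data line from `swap -l`, e.g.
       /dev/swap 1,41 0 2000 2000
   (columns: path dev swaplo blocks free), report whether that swap
   area is idle, i.e. "blocks" equals "free".
   Returns 1 if idle, 0 if in use, -1 if the line can't be parsed
   (for example, the header line). */
int swap_line_is_idle(const char *line)
{
    char path[256], dev[32];
    long swaplo, blocks, freeblk;

    if (sscanf(line, "%255s %31s %ld %ld %ld",
               path, dev, &swaplo, &blocks, &freeblk) != 5)
        return -1;
    return blocks == freeblk;
}
```

You might feed it each line read from popen("swap -l", "r") and skip the sysdump test if any swap area returns 0.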
You specify where the dump goes with a "dump" keyword (dump=/dev/mydump), either in /etc/default/boot or on the boot command line. Unlike swap, dump can't span multiple devices or use a file in the filesystem. You can also say "dump=none" if you definitely don't want to save anything.
See https://aplawrence.com/cgi-bin/ta.pl?arg=105935 for more information.
If you really want to test by creating an actual panic, see https://aplawrence.com/cgi-bin/ta.pl?arg=103679.
Swap is more complex. It actually has two purposes: to hold processes if and when the kernel gets so low on free memory that it has to swap them out, and to serve as backing store for virtual memory. That second purpose is often misunderstood. For example, it is "common knowledge" that you need as much swap as you have memory, even if you have a separate dump device. It turns out that this is not true, at least not on OSR5 systems.
As I write this, my machine has 128 MB of RAM and just 1 MB of swap configured. Here's the output of "swap -l" and "memsize":
# swap -l
path dev swaplo blocks free
/dev/swap 1,41 0 2000 2000
# memsize
129626112
You might think I couldn't do much with that configuration, but that's not the case. Here's what's happening right now:
# ps -el
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
71 S 0 0 0 0 95 20 fb117000 0 f03ab55c ? 00:00:00 sched
20 S 0 1 0 0 66 20 fb117158 140 e0000000 ? 00:00:01 init
71 S 0 2 0 0 95 20 fb1172b0 0 f01a7aa0 ? 00:00:00 vhand
71 S 0 3 0 0 95 20 fb117408 0 f001f820 ? 00:00:00 bdflush
71 S 0 4 0 0 95 20 fb117560 0 f0228004 ? 00:00:00 kmdaemon
71 S 0 5 1 0 95 20 fb1176b8 0 c01b0150 ? 00:00:02 htepi_daemon
71 S 0 6 0 0 95 20 fb117810 0 f02ef708 ? 00:00:00 strd
20 S 0 752 1 0 73 20 fb117968 160 fb117968 tty01 00:00:00 login
20 S 0 48 1 0 76 20 fb117ac0 108 f0240ce4 ? 00:00:00 syslogd
20 S 0 52 1 0 73 20 fb117c18 340 fb117c18 ? 00:00:00 ifor_pmd
20 S 0 53 52 0 76 20 fb117d70 448 f0240ce4 ? 00:00:01 ifor_pmd
71 S 0 41 1 0 95 20 fb117ec8 0 c102b150 ? 00:00:00 htepi_daemon
20 S 0 79 1 0 75 24 fb118020 52 fce282d0 ? 00:00:00 strerr
20 S 0 92 1 0 76 20 fb118178 956 f0240ce4 ? 00:00:17 agent
20 S 0 61 53 0 76 20 fb1182d0 284 f0240ce4 ? 00:00:00 sco_cpd
20 S 0 62 53 0 76 20 fb118428 332 f0240ce4 ? 00:00:00 ifor_sld
20 S 0 700 1 0 76 24 fb118580 556 f0240ce4 ? 00:00:00 httpd
20 S 28 753 722 0 75 24 fb1186d8 100 fce2a0e8 ? 00:00:00 dnsserver
20 S 28 749 722 0 75 24 fb118830 100 fce2a448 ? 00:00:00 dnsserver
20 S 0 390 1 0 76 20 fb118988 144 fc1ffb56 ? 00:00:00 cron
20 S 28 701 700 0 75 24 fb118ae0 560 fce2a7a8 ? 00:00:00 httpd
20 S 0 253 1 0 66 20 fb118c38 156 e0000000 ? 00:00:00 pwrd
20 S 0 435 1 0 76 20 fb118d90 256 f0240ce4 ? 00:00:00 pppd
71 S 0 190 1 0 95 20 fb118ee8 0 c102f150 ? 00:00:00 htepi_daemon
71 S 0 194 1 0 95 20 fb119040 0 c1033150 ? 00:00:00 htepi_daemon
71 S 0 198 1 0 95 20 fb119198 0 c1037150 ? 00:00:00 htepi_daemon
71 S 0 202 1 0 95 20 fb1192f0 0 c103b150 ? 00:00:00 htepi_daemon
20 S 0 255 253 0 76 20 fb119448 64 fc1fc7c6 ? 00:00:00 listen
20 S 0 264 1 0 76 20 fb1195a0 64 f0240ce4 ? 00:00:00 dlpid
20 S 0 529 528 0 76 20 fb1196f8 1060 f0240ce4 ? 00:00:00 ns-admin
20 S 0 405 1 0 76 20 fb119850 216 fc1f55b8 ? 00:00:01 lpsched
20 S 0 434 1 0 76 24 fb1199a8 140 f0240ce4 ? 00:00:00 inetd
20 S 0 436 435 0 76 20 fb119b00 284 f0240ce4 ? 00:00:00 pppd
20 S 0 1415 729 0 66 24 fb119c58 56 e0000000 ? 00:00:00 sleep
20 S 0 445 1 0 76 24 fb119db0 108 f0240ce4 ? 00:00:00 lpd
20 S 0 823 436 0 66 20 fb119f08 284 e0000000 ? 00:00:00 pppd
20 S 0 457 1 0 76 24 fb11a060 304 f0240ce4 ? 00:00:00 snmpd
20 S 0 999 755 0 73 20 fb11a1b8 68 fb11a1b8 tty03 00:00:00 sh
20 S 17 462 1 0 66 20 fb11a310 136 e0000000 ? 00:00:00 deliver
20 S 0 530 529 0 75 20 fb11a468 660 fce2ad48 ? 00:00:00 ns-admin
20 S 0 1006 999 0 73 20 fb11a5c0 68 fb11a5c0 tty03 00:00:00 sh
20 S 0 528 1 0 66 20 fb11a718 388 e0000000 ? 00:00:00 ns-admin
20 S 0 669 665 0 76 20 fb11a870 676 f0240ce4 ? 00:00:00 vfsd
20 S 0 665 1 0 76 20 fb11a9c8 676 f0240ce4 ? 00:00:00 vfsd
20 S 0 1007 1006 0 73 20 fb11ab20 88 fb11ab20 tty03 00:00:00 edge.nightly
20 S 0 563 1 0 76 24 fb11ac78 240 fc206406 ? 00:00:01 logsrv
20 S 0 664 1 0 76 18 fb11add0 200 f0240ce4 ? 00:00:00 vfslockd
20 S 0 670 669 0 76 20 fb11af28 676 f0240ce4 ? 00:00:00 vfsd
20 R 0 1618 799 2 76 20 fb11b080 236 - tty01 00:00:01 vi
20 S 28 722 1 0 76 24 fb11b1d8 692 f0240ce4 ? 00:00:01 squid
20 S 0 711 1 0 76 24 fb11b330 196 fc205e8e ? 00:00:00 calserver
20 S 0 713 711 0 76 24 fb11b488 204 fc205c36 ? 00:00:00 calserver
20 S 0 715 1 0 76 24 fb11b5e0 88 fc209ab6 ? 00:00:00 caldaemon
20 S 0 696 1 0 76 24 fb11b738 320 f0240ce4 ? 00:00:00 scohttpd
20 S 0 729 1 0 73 24 fb11b890 60 fb11b890 ? 00:00:00 sh
20 S 28 750 722 0 75 24 fb11b9e8 100 fce2a328 ? 00:00:00 dnsserver
20 R 0 1138 1007 30 39 20 fb11bb40 8720 - tty03 00:14:53 edge
20 S 28 738 722 0 75 24 fb11bc98 100 fce2a568 ? 00:00:00 dnsserver
20 S 201 1063 754 0 73 20 fb11bdf0 60 fb11bdf0 tty02 00:00:00 sh
20 S 28 751 722 0 75 24 fb11bf48 100 fce2a208 ? 00:00:00 dnsserver
20 S 0 754 1 0 73 20 fb11c0a0 160 fb11c0a0 tty02 00:00:00 login
20 S 0 755 1 0 73 20 fb11c1f8 160 fb11c1f8 tty03 00:00:00 login
20 S 0 756 1 0 75 20 fb11c350 124 f02289f4 tty04 00:00:00 getty
20 S 0 757 1 0 75 20 fb11c4a8 124 f0228a5c tty05 00:00:00 getty
20 S 28 758 722 0 76 24 fb11c600 136 f0240ce4 ? 00:00:00 ftpget
20 S 0 759 1 0 75 20 fb11c758 124 f0228ac4 tty06 00:00:00 getty
20 S 0 760 1 0 75 20 fb11c8b0 124 f0228b94 tty08 00:00:00 getty
20 S 0 761 1 0 75 20 fb11ca08 124 f0228bfc tty09 00:00:00 getty
20 S 0 762 1 0 75 20 fb11cb60 124 f0228c64 tty10 00:00:00 getty
20 S 0 763 1 0 75 20 fb11ccb8 124 f0228ccc tty11 00:00:00 getty
20 S 0 764 1 0 75 20 fb11ce10 124 f0228d34 tty12 00:00:00 getty
20 S 28 765 722 0 76 24 fb11cf68 48 f083d640 ? 00:00:00 unlinkd
20 S 0 766 1 0 81 20 fb11d0c0 104 fc20a4e0 ? 00:00:00 sdd
20 S 0 792 752 0 73 20 fb11d218 68 fb11d218 tty01 00:00:00 sh
20 S 0 799 792 0 73 20 fb11d370 128 fb11d370 tty01 00:00:00 ksh
20 S 201 1076 1063 0 73 20 fb11d4c8 68 fb11d4c8 tty02 00:00:00 sh
20 S 201 1080 1076 0 73 20 fb11d620 212 fb11d620 tty02 00:00:00 xinit
20 S 201 1081 1080 0 76 0 fb11d778 11788 f0240ce4 tty02 00:03:24 Xsco
20 S 201 1083 1081 0 76 0 fb11d8d0 1112 f09056c0 tty02 00:00:01 vbiosd
20 S 201 1145 1088 0 76 20 fb11da28 2736 f0240ce4 tty02 00:00:15 xdt3_binary
20 S 201 1205 1088 0 76 20 fb11db80 712 f0240ce4 tty02 00:00:01 pmwm
20 S 201 1088 1080 0 76 20 fb11dcd8 504 f0240ce4 tty02 00:00:00 scosession
20 S 201 1206 1145 2 76 20 fb11de30 13056 f0240ce4 tty02 00:02:00 netscape-expor
30 S 0 1179 1138 1 81 20 fb11df88 612 f031fd64 tty03 00:00:31 edge
20 S 201 1212 1206 0 76 20 fb11e0e0 3940 f0240ce4 tty02 00:00:00 netscape-expor
20 S 28 1663 1662 0 75 20 fb11e238 468 fce2cd10 ? 00:00:00 httpd
20 S 28 1664 1662 0 75 20 fb11e390 468 fce2cd10 ? 00:00:00 httpd
20 S 28 1665 1662 0 75 20 fb11e4e8 468 fce2cd10 ? 00:00:00 httpd
20 S 28 1666 1662 0 75 20 fb11e640 468 fce2cd10 ? 00:00:00 httpd
20 S 0 1662 1 0 76 20 fb11e798 452 f0240ce4 ? 00:00:00 httpd
20 S 28 1667 1662 0 75 20 fb11e8f0 468 fce2cd10 ? 00:00:00 httpd
20 S 0 1668 1618 3 73 20 fb11ea48 60 fb11ea48 tty01 00:00:00 sh
20 O 0 1669 1668 8 48 20 fb11eba0 148 - tty01 00:00:00 ps
Notice that there's an X session with Netscape running on tty02 and an Edge backup running on tty03, plus this editing session on tty01, plus all the system processes including Squid and Apache! Another interesting thing to note is the sum of the "SZ" (size) column:
# ps -e -o size| awk '{ sum += $1
}
END { print "Sum", sum }'
Sum 62256
That's over 62 megabytes' worth of programs running right now; "sar -r" confirms that:
# sar -r 1 1

SCO_SV scobox 3.2v5.0.4 Pentium    09/30/99

12:14:56  freemem  freeswp (-r)
12:14:57    13942     2000
Memory is expressed in 4K pages here and swap in 512-byte blocks, so that's 57,106,432 bytes of free memory and 1 MB of swap.
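The unit conversions can be captured in a tiny C sketch. It assumes the units stated above for this 5.0.4 system (freemem in 4 KB pages, freeswp in 512-byte blocks); the function names are illustrative only.

```c
/* Units as described for `sar -r` on this system:
   freemem is counted in 4 KB pages, freeswp in 512-byte blocks. */
#define PAGE_BYTES  4096LL
#define BLOCK_BYTES  512LL

long long pages_to_bytes(long long pages)   { return pages * PAGE_BYTES; }
long long blocks_to_bytes(long long blocks) { return blocks * BLOCK_BYTES; }
```

pages_to_bytes(13942) gives the 57,106,432 bytes quoted above, and blocks_to_bytes(2000) gives 1,024,000 bytes, the "1 MB" of swap.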
This happens to be a 5.0.4 machine. On 5.0.5, sar -r shows two more columns; we'll get to those shortly.
There are 13942 free pages, which means that (13942 * 4096) 57,106,432 bytes are free. The system started out (after loading the kernel and allocating its buffers and variables) with 27,589 pages.
That number is the kernel variable availrmem. On 5.0.5 it shows up in sar -r; on older systems you can get it with:
# echo "od -d availrmem" | crash
dumpfile = /dev/mem, namelist = /unix, outfile = stdout
f020e120: 0000027589
The figure isn't completely constant, but it stays close to that value most of the time.
Since 27589 - 13942 = 13647 pages, and 13647 * 4096 is 55,898,112 bytes, some memory usage obviously changed between the times I took these samples (total available user memory minus currently free pages should equal the memory in use, and ps showed more than that). This script tries to get closer to making everything agree:
echo "od -d freemem" | crash &
ps -e -o size| awk '{ sum += $1
}
END { print "Sum", sum }'
That "freemem" is the same thing sar -r reports. I put the "crash" session in background so that it has a chance of being included in the ps output; the results are:
Sum 66476
dumpfile = /dev/mem, namelist = /unix, outfile = stdout
f020e118: 0000012961
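The estimate being chased here can be written as a one-line C sketch: memory in use is roughly availrmem minus freemem, in 4 KB pages. It assumes availrmem stays near the 27589 pages measured earlier; since the readings are taken at different moments on a busy system, treat the result as approximate. The function name is hypothetical.

```c
/* Rough estimate of user memory in use, in bytes:
   (availrmem - freemem) pages times the 4 KB page size. */
long long in_use_bytes(long availrmem_pages, long freemem_pages)
{
    return (long long)(availrmem_pages - freemem_pages) * 4096LL;
}
```

With the sar sample (freemem 13942) this gives 55,898,112 bytes; with the freemem of 12961 from the script run it gives 59,916,288 bytes, about 60 MB.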
That comes out closer, but you'll never get exact on a system that's working while you are measuring. The important point is that here we have a system with 60 megabytes or so of memory in use, and it is running quite happily with 1 MB of swap. Why?
The "common knowledge" goes wrong because what really matters is virtual memory, which is the sum of available memory and available swap. With nothing running, that would be availrmem plus swap. Of course you never have nothing running, so the kernel keeps track of what remains of that sum in a variable called "availsmem" (note "availSmem" vs. "availRmem"). Prior to 5.0.5, the only way to find out its value at any given time was to run
echo "od -d availsmem" | crash
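As a rough sketch of that relationship, the starting size of the virtual memory pool can be computed from availrmem and the swap size. It assumes 4 KB pages and the 512-byte blocks reported by `swap -l`, so 8 blocks make one page (consistent with the freeswap value of 250 shown for 2000 blocks in the runs that follow). The function is hypothetical and does not model the kernel's exact accounting.

```c
/* Approximate virtual memory pool with nothing running, in 4 KB pages:
   RAM left after the kernel (availrmem) plus swap converted from
   512-byte blocks to pages (8 blocks per page). */
long vm_ceiling_pages(long availrmem_pages, long swap_blocks)
{
    return availrmem_pages + swap_blocks / 8;
}
```

For this machine, 27589 pages of availrmem plus 2000 blocks of swap give 27839 pages, about 114 MB, of which swap contributes less than 1%.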
Starting with 5.0.5, "sar -r" lists availsmem and availrmem (the amount of RAM not being used by the kernel). So let's do some testing to see what happens here when we ask for even more memory. First, a little shell script:
# cat once
#!/bin/sh
# "once"
echo availsmem freeswap freemem
echo "od -d availsmem
od -d freeswap
od -d freemem" | crash
swap -l
ps -l
# ./once
availsmem freeswap freemem
dumpfile = /dev/mem, namelist = /unix, outfile = stdout
f020e124: 0000012163
f020e11c: 0000000250
f020e118: 0000012795
path dev swaplo blocks free
/dev/swap 1,41 0 2000 2000
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
20 S 0 752 1 0 73 20 fb117968 160 fb117968 tty01 00:00:00 login
20 S 0 1618 799 1 76 20 fb11b080 240 f0f20300 tty01 00:00:04 vi
20 S 0 792 752 0 73 20 fb11d218 68 fb11d218 tty01 00:00:00 sh
20 S 0 799 792 0 73 20 fb11d370 128 fb11d370 tty01 00:00:00 ksh
20 S 0 2009 1618 1 73 20 fb11f258 60 fb11f258 tty01 00:00:00 sh
20 S 0 2010 2009 29 73 20 fb11f3b0 60 fb11f3b0 tty01 00:00:00 sh
20 O 0 2014 2010 6 48 20 fb11f508 148 - tty01 00:00:00 ps
That gives us a quick snapshot of what's happening. Now let's write some C programs to use some memory. The first allocates a 2 MB buffer on its stack; the second uses a static array. Both call the "once" script three times while running:
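# cat stackarray.c
/* stackarray.c */
#include <stdlib.h>

void memfunc(void);
void outfunc(void);

int main(void)
{
    system("./once");   /* snapshot before the array exists */
    memfunc();
    outfunc();          /* snapshot after memfunc() has returned */
    exit(0);
}

void outfunc(void)
{
    system("./once");
}

void memfunc(void)
{
    char array[2 * 1024 * 1024];   /* 2 MB automatic (stack) array, never touched */
    outfunc();                      /* snapshot while the array is in scope */
}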
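# cat staticarray.c
/* staticarray.c */
#include <stdlib.h>

void memfunc(void);
void outfunc(void);

int main(void)
{
    system("./once");   /* snapshot at startup: the array is already in .bss */
    memfunc();
    outfunc();          /* snapshot after memfunc() has returned */
    exit(0);
}

void outfunc(void)
{
    system("./once");
}

void memfunc(void)
{
    static char array[2 * 1024 * 1024];   /* 2 MB static array in .bss, never touched */
    outfunc();
}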
# cc -o staticarray staticarray.c
# cc -o stackarray stackarray.c
# size stackarray staticarray
stackarray: 26396 + 4312 + 440 = 31148
staticarray: 26392 + 4312 + 2097592 = 2128296
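Note the difference in the last (.bss) column: staticarray's 2097592 bytes is exactly 2 MB (2097152 bytes) more than stackarray's 440. The static array is reserved in the program's .bss, while stackarray's 2 MB array is an automatic variable that won't be set up until memfunc() is called at run time.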
# ./stackarray
availsmem freeswap freemem
dumpfile = /dev/mem, namelist = /unix, outfile = stdout
f020e124: 0000014629
f020e11c: 0000000250
f020e118: 0000015313
path dev swaplo blocks free
/dev/swap 1,41 0 2000 2000
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
20 S 0 2067 2041 0 73 20 fb11b080 16 fb11b080 ttyp1 00:00:00 stackarray
20 S 0 2068 2067 0 73 20 fb11f258 60 fb11f258 ttyp1 00:00:00 sh
20 S 0 2041 2040 0 73 20 fb11f660 60 fb11f660 ttyp1 00:00:00 sh
20 S 0 2069 2068 9 73 20 fb11f7b8 60 fb11f7b8 ttyp1 00:00:00 sh
20 O 0 2073 2069 8 48 20 fb11f910 148 - ttyp1 00:00:00 ps
Here we see that stackarray uses only 16K of memory (the SZ column) when it first loads.
availsmem freeswap freemem
dumpfile = /dev/mem, namelist = /unix, outfile = stdout
f020e124: 0000014117
f020e11c: 0000000250
f020e118: 0000015312
path dev swaplo blocks free
/dev/swap 1,41 0 2000 2000
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
20 S 0 2067 2041 10 73 20 fb11b080 2064 fb11b080 ttyp1 00:00:00 stackarray
20 S 0 2074 2067 1 73 20 fb11f258 60 fb11f258 ttyp1 00:00:00 sh
20 S 0 2041 2040 0 73 20 fb11f660 60 fb11f660 ttyp1 00:00:00 sh
20 S 0 2075 2074 21 73 20 fb11f7b8 60 fb11f7b8 ttyp1 00:00:00 sh
20 O 0 2079 2075 7 48 20 fb11f910 148 - ttyp1 00:00:00 ps
After the function is called, memory usage goes up to 2064K, and availsmem goes down accordingly (14117 vs. 14629). But "freemem" stays about the same, because we haven't actually done anything with those pages yet: they are allocated, which affects availsmem, but no physical RAM has been assigned to them, and none will be unless and until we write something into them, which we don't in this test.
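A quick check that the drop in availsmem matches the array: availsmem is counted in 4 KB pages, while the SZ column is read in kilobytes in the text above. This small C sketch (the function name is hypothetical) converts the drop to kilobytes so the two can be compared directly.

```c
/* Convert a drop in availsmem (4 KB pages) to kilobytes so it can be
   compared with the growth in the ps SZ column. */
long availsmem_drop_kb(long before_pages, long after_pages)
{
    return (before_pages - after_pages) * 4L;
}
```

availsmem_drop_kb(14629, 14117) returns 2048, the same as the growth in SZ from 16 to 2064: the 2 MB array.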
availsmem freeswap freemem
dumpfile = /dev/mem, namelist = /unix, outfile = stdout
f020e124: 0000014117
f020e11c: 0000000250
f020e118: 0000015312
path dev swaplo blocks free
/dev/swap 1,41 0 2000 2000
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
20 R 0 2067 2041 20 43 20 fb11b080 2064 - ttyp1 00:00:00 stackarray
20 S 0 2080 2067 2 73 20 fb11f258 60 fb11f258 ttyp1 00:00:00 sh
20 S 0 2041 2040 0 73 20 fb11f660 60 fb11f660 ttyp1 00:00:00 sh
20 R 0 2081 2080 18 44 20 fb11f7b8 60 - ttyp1 00:00:00 sh
20 O 0 2085 2081 6 48 20 fb11f910 148 - ttyp1 00:00:00 ps
After the function returns, the space is still shown as used, but of course it all comes back when the program exits:
# ./once
availsmem freeswap freemem
dumpfile = /dev/mem, namelist = /unix, outfile = stdout
f020e124: 0000014657
f020e11c: 0000000250
f020e118: 0000015345
path dev swaplo blocks free
/dev/swap 1,41 0 2000 2000
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
20 R 0 2086 2041 30 39 20 fb11b080 60 - ttyp1 00:00:00 sh
20 O 0 2090 2086 7 48 20 fb11f258 148 - ttyp1 00:00:00 ps
20 S 0 2041 2040 4 73 20 fb11f660 60 fb11f660 ttyp1 00:00:00 sh
Now let's try the static array:
# ./staticarray
availsmem freeswap freemem
dumpfile = /dev/mem, namelist = /unix, outfile = stdout
f020e124: 0000014117
f020e11c: 0000000250
f020e118: 0000015312
path dev swaplo blocks free
/dev/swap 1,41 0 2000 2000
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
20 S 0 2091 2041 2 73 20 fb11b080 2064 fb11b080 ttyp1 00:00:00 staticarray
20 S 0 2092 2091 2 73 20 fb11f258 60 fb11f258 ttyp1 00:00:00 sh
20 S 0 2041 2040 1 73 20 fb11f660 60 fb11f660 ttyp1 00:00:00 sh
20 R 0 2093 2092 31 39 20 fb11f7b8 60 - ttyp1 00:00:00 sh
20 O 0 2097 2093 7 48 20 fb11f910 148 - ttyp1 00:00:00 ps
The immediate difference is that the memory use shows up right away, as we'd expect for a static array. There's still no use of real RAM, for the same reason.
availsmem freeswap freemem
dumpfile = /dev/mem, namelist = /unix, outfile = stdout
f020e124: 0000014117
f020e11c: 0000000250
f020e118: 0000015312
path dev swaplo blocks free
/dev/swap 1,41 0 2000 2000
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
20 R 0 2091 2041 22 42 20 fb11b080 2064 - ttyp1 00:00:00 staticarray
20 S 0 2098 2091 1 73 20 fb11f258 60 fb11f258 ttyp1 00:00:00 sh
20 S 0 2041 2040 0 73 20 fb11f660 60 fb11f660 ttyp1 00:00:00 sh
20 R 0 2099 2098 17 45 20 fb11f7b8 60 - ttyp1 00:00:00 sh
20 O 0 2103 2099 7 48 20 fb11f910 148 - ttyp1 00:00:00 ps
availsmem freeswap freemem
dumpfile = /dev/mem, namelist = /unix, outfile = stdout
f020e124: 0000014117
f020e11c: 0000000250
f020e118: 0000015312
path dev swaplo blocks free
/dev/swap 1,41 0 2000 2000
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
20 R 0 2091 2041 50 32 20 fb11b080 2064 - ttyp1 00:00:00 staticarray
20 S 0 2104 2091 3 73 20 fb11f258 60 fb11f258 ttyp1 00:00:00 sh
20 S 0 2041 2040 0 73 20 fb11f660 60 fb11f660 ttyp1 00:00:00 sh
20 R 0 2105 2104 28 40 20 fb11f7b8 60 - ttyp1 00:00:00 sh
20 O 0 2109 2105 5 49 20 fb11f910 148 - ttyp1 00:00:00 ps
#
Note that through all of this we had only 1 MB of swap to work with. Each of these programs allocated more than 2 MB, not even counting the 50 or 60 megabytes being used by other programs. This shows that you do not need swap for virtual memory if you have enough real memory. Also note that "swap -l" never changes, because no swap has been used (and swap -l wouldn't show you VM usage anyway).
What happens if we turn things upside down? To find out, I put swap back at 128 MB, and forced memory to 48 MB by typing
mem=0k-639k,1m-16m,16m-48m/s/n
at the boot prompt before booting. The "once" program shows this before starting up anything other than the single login:
# ./once
availsmem freswap freemem
dumpfile = /dev/mem, namelist = /unix, outfile = stdout
f020e124: 0000036463
f020e11c: 0000032000
f020e118: 0000005219
path dev swaplo blocks free
/dev/swap 1,41 0 256000 256000
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
20 S 0 809 808 1 73 20 fb11dcd8 60 fb11dcd8 ttyp0 00:00:00 sh
20 S 0 817 809 33 73 20 fb11de30 60 fb11de30 ttyp0 00:00:00 sh
20 O 0 821 817 5 49 20 fb11e238 148 - ttyp0 00:00:00 ps
#
No swapping, 20 meg or so free. Now start up X and Netscape:
# ./once
availsmem freswap freemem
dumpfile = /dev/mem, namelist = /unix, outfile = stdout
f020e124: 0000030911
f020e11c: 0000031946
f020e118: 0000000039
path dev swaplo blocks free
/dev/swap 1,41 0 256000 255576
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
20 S 0 940 809 17 73 20 fb119f08 60 fb119f08 ttyp0 00:00:00 sh
20 S 0 809 808 0 73 20 fb11dcd8 60 fb11dcd8 ttyp0 00:00:00 sh
20 O 0 944 940 9 48 20 fb11ea48 148 - ttyp0 00:00:00 ps
It had to use a little swap to get Netscape up.
# ./stackarray
availsmem freswap freemem
dumpfile = /dev/mem, namelist = /unix, outfile = stdout
f020e124: 0000030873
f020e11c: 0000031944
f020e118: 0000000044
path dev swaplo blocks free
/dev/swap 1,41 0 256000 255552
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
20 S 0 946 809 3 73 20 fb119f08 16 fb119f08 ttyp0 00:00:00 stackarray
20 S 0 809 808 2 73 20 fb11dcd8 60 fb11dcd8 ttyp0 00:00:00 sh
20 S 0 947 946 3 73 20 fb11ea48 60 fb11ea48 ttyp0 00:00:00 sh
20 S 0 948 947 31 73 20 fb11ecf8 60 fb11ecf8 ttyp0 00:00:00 sh
20 O 0 952 948 6 48 20 fb11ee50 148 - ttyp0 00:00:00 ps
availsmem freswap freemem
dumpfile = /dev/mem, namelist = /unix, outfile = stdout
f020e124: 0000030361
f020e11c: 0000031944
f020e118: 0000000043
path dev swaplo blocks free
/dev/swap 1,41 0 256000 255552
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
20 S 0 946 809 22 73 20 fb119f08 2064 fb119f08 ttyp0 00:00:00 stackarray
20 S 0 809 808 1 73 20 fb11dcd8 60 fb11dcd8 ttyp0 00:00:00 sh
20 S 0 953 946 1 73 20 fb11ea48 60 fb11ea48 ttyp0 00:00:00 sh
20 R 0 954 953 26 41 20 fb11ecf8 60 - ttyp0 00:00:00 sh
20 O 0 958 954 7 48 20 fb11ee50 148 - ttyp0 00:00:00 ps
availsmem freswap freemem
dumpfile = /dev/mem, namelist = /unix, outfile = stdout
f020e124: 0000030361
f020e11c: 0000031944
f020e118: 0000000043
path dev swaplo blocks free
/dev/swap 1,41 0 256000 255552
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
20 R 0 946 809 58 29 20 fb119f08 2064 - ttyp0 00:00:00 stackarray
20 S 0 809 808 1 73 20 fb11dcd8 60 fb11dcd8 ttyp0 00:00:00 sh
20 S 0 959 946 3 73 20 fb11ea48 60 fb11ea48 ttyp0 00:00:00 sh
20 R 0 960 959 30 39 20 fb11ecf8 60 - ttyp0 00:00:00 sh
20 O 0 964 960 6 48 20 fb11ee50 148 - ttyp0 00:00:00 ps
Notice that the swap figures change a little (from 255576 to 255552 free blocks) but then stay the same while availsmem goes down (from 30873 to 30361). This shows that "swap -l" tells you nothing about availsmem; they are entirely separate statistics.
Now let's try something else. We'll modify the "stackarray" code so that it actually uses the memory:
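/* stackarray.c with actual use of array */
#include <stdlib.h>

void memfunc(void);
void outfunc(void);

int main(void)
{
    system("./once");
    memfunc();
    outfunc();      /* snapshot after the array's pages have been touched */
    return 0;
}

void outfunc(void)
{
    system("./once");
}

void memfunc(void)
{
    int x;
    char array[2 * 1024 * 1024];
    outfunc();      /* snapshot: array allocated but not yet touched */
    /* write one byte in each 4 KB page so every page gets real RAM */
    for (x = 0; x < 2 * 1024 * 1024; x += 4096) {
        array[x] = (char)x;
    }
}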
When we run it, there's an interesting difference: "freemem" goes down once the memory is actually used, while "availsmem" drops only when the array is allocated and doesn't change when its pages are touched. That's because until we actually put something in the array, it's just reserved virtual memory; no real memory gets assigned until we really need it. This run is with 128 MB of memory and 128 MB of swap, but it shows what actually happens (there is no difference when run with 1 MB of swap; only the "swap" figures change):
availsmem freswap freemem
dumpfile = /dev/mem, namelist = /unix, outfile = stdout
f020e124: 0000047704
f020e11c: 0000032000
f020e118: 0000016512
path dev swaplo blocks free
/dev/swap 1,41 0 256000 256000
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
20 S 0 777 1 0 73 20 fb118988 160 fb118988 tty03 00:00:00 login
20 S 0 1344 777 0 73 20 fb11b080 68 fb11b080 tty03 00:00:00 sh
20 S 0 1351 1344 1 73 20 fb11e390 128 fb11e390 tty03 00:00:01 ksh
20 S 0 2190 1351 1 73 20 fb11e798 16 fb11e798 tty03 00:00:00 stackarray
20 S 0 2191 2190 1 73 20 fb11f3b0 60 fb11f3b0 tty03 00:00:00 sh
20 R 0 2192 2191 21 43 20 fb11f508 60 - tty03 00:00:00 sh
20 O 0 2196 2192 6 48 20 fb11f660 148 - tty03 00:00:00 ps
availsmem freswap freemem
dumpfile = /dev/mem, namelist = /unix, outfile = stdout
f020e124: 0000047192
f020e11c: 0000032000
f020e118: 0000016511
path dev swaplo blocks free
/dev/swap 1,41 0 256000 256000
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
20 S 0 777 1 0 73 20 fb118988 160 fb118988 tty03 00:00:00 login
20 S 0 1344 777 0 73 20 fb11b080 68 fb11b080 tty03 00:00:00 sh
20 S 0 1351 1344 1 73 20 fb11e390 128 fb11e390 tty03 00:00:01 ksh
20 R 0 2190 1351 31 39 20 fb11e798 2064 - tty03 00:00:00 stackarray
20 S 0 2197 2190 4 73 20 fb11f3b0 60 fb11f3b0 tty03 00:00:00 sh
20 R 0 2198 2197 30 39 20 fb11f508 60 - tty03 00:00:00 sh
20 O 0 2202 2198 6 48 20 fb11f660 148 - tty03 00:00:00 ps
availsmem freswap freemem
dumpfile = /dev/mem, namelist = /unix, outfile = stdout
f020e124: 0000047192
f020e11c: 0000032000
f020e118: 0000016000
path dev swaplo blocks free
/dev/swap 1,41 0 256000 256000
F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
20 S 0 777 1 0 73 20 fb118988 160 fb118988 tty03 00:00:00 login
20 S 0 1344 777 0 73 20 fb11b080 68 fb11b080 tty03 00:00:00 sh
20 S 0 1351 1344 1 73 20 fb11e390 128 fb11e390 tty03 00:00:01 ksh
20 R 0 2190 1351 79 21 20 fb11e798 2064 - tty03 00:00:00 stackarray
20 S 0 2203 2190 3 73 20 fb11f3b0 60 fb11f3b0 tty03 00:00:00 sh
20 R 0 2204 2203 29 40 20 fb11f508 60 - tty03 00:00:00 sh
20 O 0 2208 2204 6 48 20 fb11f660 148 - tty03 00:00:00 ps
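The modified memfunc() writes one byte every 4096 bytes. Because each write lands in a different 4 KB page, the loop touches every page of the 2 MB array once, which is what forces real RAM to be assigned. The sketch below just counts those writes; the function name is hypothetical, and the 4 KB page size is the one used throughout this article.

```c
#include <stddef.h>

/* Count how many pages a stride-`stride` write loop over an array of
   `array_bytes` bytes touches. With stride == page size, every write
   hits a different page. */
size_t pages_touched(size_t array_bytes, size_t stride)
{
    size_t n = 0;

    if (stride == 0)
        return 0;               /* avoid an infinite loop */
    for (size_t x = 0; x < array_bytes; x += stride)
        n++;
    return n;
}
```

pages_touched(2 * 1024 * 1024, 4096) returns 512, close to the 511-page drop in freemem (16511 to 16000) between the last two snapshots.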
Note that this also means that if a program requests (allocates) but does not actually use memory (as was the case in the previous tests), you can have the strange circumstance where you have free memory (because no physical pages have been assigned) but you can't run any more programs because you have run out of virtual memory (availsmem). That doesn't mean you need more swap; it means you need more availsmem, and adding either more swap or more real RAM will fix the problem.
Of course, if you need to run that program right now, adding swap is easier than adding memory. To add swap, you could just do:
touch /mynewswapfile
swap -a /mynewswapfile 256000
which instantly and magically adds another 128 MB of swap to your system. Note that it's not permanent; you'd need to redo it at every boot.
So how much swap do you need? Who knows? What kind of programs are you running? How much data and stack do they need? How much dump space do you think you'll need? That's the only way you could try to really figure it out; most folks just add 50% to memory and hope for the best. But if the only use is for dump, does that make any sense? Will you just be adding 50% more memory?
With today's large hard drives, it doesn't cost much to configure more swap than you think you'll ever need. When does it become ridiculous though? If you currently run 128 MB of memory, should you size for potentially having 256 MB? 512? A gigabyte?
If you have a separate dump device and lots of real memory, you may not need much swap at all. I think I'd always configure some, just in case there's some kernel code somewhere that expects it, though that's darn unlikely. You should certainly understand (despite the "common knowledge" to the contrary) that virtual memory doesn't NEED swap: swap can and will be used for VM, but it isn't REQUIRED. As for dump space, it's only useful if you actually have a crash AND you expect the results to be analyzed. Otherwise, it truly is wasted space, and while today's hard drives are inexpensive, you might have better uses for that space.
Bela Lubkin was kind enough to make some comments and suggestions on this article, which caused me to rewrite a few sections of it to try to make things clearer. Whether or not I succeeded, I thought it would be good to add his actual comments here as well, and he agreed to let me publish his email. What follows are extracts from those emails, with explanatory comments from me in italics.
Here I had said that I hadn't stressed "freemem" in the original article because it didn't seem important to me in the context of writing about swap:
(Bela)
It is definitely important. freemem measures the amount of actual RAM that isn't currently in use, while availsmem measures the amount of virtual space that hasn't yet been promised to someone. availsmem is the upper bound on how much [measured in terms of memory usage] you can run at all. freemem is the upper bound on how much you can run without actually performing swap I/O, which is rather costly in performance. If you were to graph performance vs. memory usage, you would see something like this:
100% |====================================
| /=====
| (1) =====
| =====
| /==
| (1) memory getting tight(*); kernel starts to (2) ==
| page non-dirty pages out of executable ==
| binaries and other such read-only sources ==
| ==
| (2) freemem approaching 0 (crosses GPGSHI), kernel /=
| starts paging dirty pages out to swap (3)=
| =
| (3) all inactive pages have been written out to swap(@); =
| active pages start getting written out to swap; =
| system starts to thrash =
|
0% +-----------------------------------------------------------------------------
(*) I'm not sure what the exact technical threshold is here. It
isn't GPGSHI, it isn't MINASMEM...
(@) This isn't a technical threshold; more of a user tolerance
threshold. As you run more stuff, and as that stuff touches
more of its memory more frequently, and as a higher
percentage of that memory gets pushed out to swap,
performance is going to degrade rapidly until the user finds
it intolerable.
Also, there's no real distinction between "active" and "not
active" pages, in this context. The question is, on average,
how long will it be before this page that's being written out
to swap will be needed in RAM again? The kernel has
strategies which make this average quite high when things
aren't too tight. As memory tightens, the average
necessarily goes down. When a significant portion of memory
accesses actually become disk accesses, performance is
extremely degraded; time to add more RAM.
freemem is important for performance; availsmem is important for being
able to run things at all -- quickly or not.
=============================================================================
The independence of these variables is also quite confusing. For instance, availsmem can approach 0 while freemem is still a large number. It's easy -- just run a lot of programs (like the examples in this question) which *allocate* a lot of memory, but never touch it. Suppose a system had 64MB RAM and 256MB swap. It would start out with availsmem around 80000 (*4K) and freemem around 15000. Now run 10 instances of a simple program that allocates 30MB of RAM, but doesn't touch it. These will decrement availsmem by about 300MB == 75000, leaving about 5000. But they won't take up an appreciable amount of real RAM, so freemem's still around 15000. Now try to run one more copy. There is still plenty of freemem; `sar -r` "freemem" looks fine. But you get EAGAIN because the program can't allocate another 7500 pages of availsmem.
What good does this mechanism do? Well, what if all those programs *did* suddenly start touching their memory? The kernel would have to find actual backing store for those pages -- either RAM or swap. availsmem tells it how many pages of that backing store are not yet claimed. Thus it can refuse to start a process whose memory might not be available when the process eventually needs it. The mechanism has an underlying assumption that processes will *not*, in general, allocate huge amounts of memory that they won't actually use. When that assumption is broken, the mechanism is over-protective -- it prevents you from using your RAM just because someone is hogging (and not using) address space.
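The worst case that availsmem guards against can be shown using the numbers from the 64MB RAM / 256MB swap example. This sketch assumes, as a simplification, that the 80000 pages of unclaimed RAM + swap are all the backing store there is; the function names are hypothetical.

```python
PAGE = 4096
MB = 1024 * 1024

BACKING_PAGES = 80000                 # unclaimed RAM + swap at start (from the example)
PAGES_PER_PROGRAM = 30 * MB // PAGE   # 7680 pages for a 30MB allocation


def pages_if_all_touch(nprocs):
    """Backing-store pages needed if every program touches all of its memory."""
    return nprocs * PAGES_PER_PROGRAM


def can_back_everything(nprocs):
    """True if RAM + swap could hold every page of every program at once."""
    return pages_if_all_touch(nprocs) <= BACKING_PAGES


for n in (10, 11):
    print(n, pages_if_all_touch(n), can_back_everything(n))
```

If the kernel had let the eleventh copy start and all eleven then touched their memory, there would be 84480 pages to back but only 80000 places to put them. Refusing the allocation up front is the over-protective but safe choice.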
I had responded with:
> And there's a common misunderstanding: most folks seem to
> believe that it has to be swap, and it doesn't. If it did,
> I'd never be able to run 60 MB of programs when swap was
> 1 MB. I could be wrong, but I believe that the source of
> this is that older Unices (like Sun 4.x releases) actually
> DID require swap space and couldn't use RAM -- but I don't
> have one of those anymore to mess with, so I can't be sure.
(Bela)
Some versions of Unix use static mappings of virtual space to swap space. The act of allocating virtual space (whether through malloc() (== [s]brk()), growing your stack, fork(), or initial mapping of a process's .data and .bss) also allocates matching pages of swap. In such a system, the total weight of processes that you can run at once equals your swap space. OSR5 doesn't bind virtual space to specific swap pages; as a result, it can use *all* of the potential backing store -- both RAM and swap -- to hold the combined weight of processes.
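The practical difference between the two designs comes down to which resources count as backing store. A deliberately simplified sketch follows; the function name and the 64 MB RAM figure are assumptions for illustration, and kernel memory is ignored.

```python
def max_combined_weight_mb(ram_mb, swap_mb, static_swap_binding):
    """Rough upper bound on the combined size of all running processes.

    static_swap_binding=True models systems where every page of virtual space
    is tied to its own swap page; False models OSR5, where RAM and swap are
    both counted as backing store. Kernel memory is ignored in both cases.
    """
    if static_swap_binding:
        return swap_mb
    return ram_mb + swap_mb


# The 60 MB-of-programs / 1 MB-of-swap case from the exchange,
# on a hypothetical machine with 64 MB of RAM.
print("static binding:", max_combined_weight_mb(64, 1, True), "MB")
print("OSR5-style:    ", max_combined_weight_mb(64, 1, False), "MB")
```

Under static binding, 60 MB of programs would need at least 60 MB of swap; with RAM counted as backing store, 1 MB of swap is no obstacle.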
Then I asked:
> One more thing: suppose you actually had 4 gig of RAM. I'd
> assume, given what I think I know about OSR5, that there
> would be no point in having swap (assuming separate dump) at
> all -- that availsmem couldn't exceed 4 gig anyway? Not
> anything I can test with my cheap hardware!
availsmem is counted in 4K pages, not bytes: 4GiB is only 1048576 (2^20) pages. So availsmem itself doesn't limit (RAM + swap) to 4GiB.
I don't know the answer to your broader question. In principle, I see no reason you couldn't have multiple large swap spaces. No one swap area can be larger than 4GiB, but the total size of swap can be much larger. As long as the swap page number can be stored in a 32-bit integer, it should be usable. Of course, you could never have more than 4GiB worth of it in memory at once -- so if you were really using all that swap, the ratio of slow-access to fast-access pages would be bad -- performance would bite.
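The arithmetic behind these two answers is easy to check. This sketch takes the 4K page size and the 4GiB-per-area limit from the text and assumes that swap page numbers are unsigned 32-bit integers; the helper names are hypothetical.

```python
PAGE = 4 * 1024   # availsmem and swap are both counted in 4K pages
GiB = 1024 ** 3
TiB = 1024 ** 4

# 4GiB expressed in pages: a small number for a page counter.
pages_in_4gib = 4 * GiB // PAGE

# Assumption: swap page numbers are unsigned 32-bit values.
max_total_swap = (2 ** 32) * PAGE


def swap_areas_needed(total_swap_gib, max_area_gib=4):
    """Fewest swap areas that hold total_swap_gib if none may exceed 4GiB."""
    return -(-total_swap_gib // max_area_gib)   # ceiling division


def slow_to_fast_ratio(total_swap_gib, ram_gib=4):
    """Swap pages per RAM page if all of swap were actually in use."""
    return total_swap_gib / ram_gib


print(pages_in_4gib)                  # 1048576
print(max_total_swap // TiB, "TiB")   # 16 TiB
print(swap_areas_needed(20), slow_to_fast_ratio(20))
```

So, under that 32-bit assumption, total swap could in principle reach 16 TiB, but 20GiB of swap on a 4GiB machine already means five slow pages for every fast one once it is all in use.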
As long as I had him on the hook, I figured I might as well ask all my questions:
> Sun used to give a swap sizing guide for Solaris 2.x that
> went down as memory increased, and swap disappeared entirely
> eventually. Is there anything in OSR5 that would assume the
> existence of a swap device and cause a problem if there
> wasn't any?
I don't know, but I've deliberately run test systems with no swap at all (boot keyword "swap=none") for long enough to feel reasonably safe about it.
© September 1999 A.P. Lawrence. All rights reserved.
© 2011-03-12 Tony Lawrence
Booting SCO_OSR5- Swap and Dump Copyright © September 1999 Tony Lawrence