RAM-Problems

And how to fight them


I updated this page today (8 Nov 2003) after a great deal of work getting Red Hat 9 running on the Aero. It now runs smoothly, and I will soon add a new page for it. The installation once again taught me a lot about the Aero's RAM problem.

This problem affects everyone who has upgraded the Aero's RAM to 20 MB and wants to do networking (what else would one do with Linux?).

The best proof that the solution described here works is this website: it is served by an Aero 4/25 with 20 MB RAM that has been running since February 18th.



If you don't want to do all the patching and compiling stuff right now...

If you have followed this documentation up to this point, Linux has just been freshly installed. Maybe you first want to look around and configure some settings, and for that you need a stable system before you start patching and compiling the kernel. And if you do all this directly on the Aero (and no longer with the hard disk in the desktop), you cannot patch and compile anyway on a machine that freezes and crashes.

So at this stage of the install we simply disable PCMCIA and thereby avoid the RAM problem. For security I also disabled most of the daemons that Linux starts at boot time.

To do all this:

Press "i" when prompted during the boot process (the line "Hit i to enter interactive setup"). You can then answer yes or no for each daemon that is about to be started. Choose "no" for all daemons.

After Red Hat has booted you can select which daemons are started at boot with the tool ntsysv. Enter the command: "/usr/sbin/ntsysv"

Most important is to disable the pcmcia daemon.


The RAM-problem

Linux on the Compaq Contura Aero may become unstable on machines that have been upgraded to 20 MB RAM. The instability is caused by invalid accesses to specific memory addresses above 16 MB, where Linux conflicts with the PCMCIA controller.

As a workaround, Linux runs stably if you limit the Aero's RAM with "mem=16M" at boot or disable PCMCIA. Neither is satisfying, so let's take a closer look at the problem.

The invalid memory addresses cannot be described as a fixed memory hole. It is more like a shadow that the memory addresses actually used by the controller throw into the area above 16 MB.

One Aero user, Donald Gordon, suspects that this behaviour is caused by "incomplete address decoding. Some parts of the system, including the PCMCIA controller, may ignore the top 8 address bits, thus causing a false image of (for instance) the memory-mapped area used by the PCMCIA controller to appear at 16mb + expected location".
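The effect Donald Gordon describes can be illustrated with a few lines of arithmetic. This is only a sketch of the suspected mechanism, not a description of the actual hardware: it assumes a 32-bit address and a device that looks at the low 24 bits only. The names `DECODE_MASK` and `seen_by_device` are made up for this example.

```python
# Sketch of incomplete address decoding (assumption: the device only
# decodes the low 24 bits of a 32-bit address and ignores the top 8).
DECODED_BITS = 24
DECODE_MASK = (1 << DECODED_BITS) - 1   # 0x00ffffff


def seen_by_device(address: int) -> int:
    """Return the address as a device that ignores the top 8 bits sees it."""
    return address & DECODE_MASK


window_start = 0x000b0000                  # example PCMCIA memory window
alias = window_start + 16 * 1024 * 1024    # same offset, 16 MB higher

print(hex(alias))                  # 0x10b0000
print(hex(seen_by_device(alias)))  # 0xb0000
```

Under that assumption an access to RAM at 0x010b0000 looks, to the controller, exactly like an access to its own window at 0xb0000. That would explain why the trouble only starts once the machine has RAM above 16 MB.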

This 'false image' or shadow in the RAM addresses above 16 MB, thrown by the real memory addresses used by the PCMCIA controller, was irrelevant with Compaq's stock memory expansion modules: they added only 4 or 8 MB to the Aero's soldered-in 4 MB of RAM, for a maximum of 12 MB.

With the production of third-party (Kingston) 16 MB expansion modules for the Aero in the late 90s, which gave the machine a maximum of 20 MB RAM, this shadow became visible, and a problem for Linux users.

It is a problem that can be solved. Two things are important here:

  • A user-defined setting of the memory that the PCMCIA controller should use. This is done with the line "include memory" in the configuration file "/etc/pcmcia/config.opts" of the pcmcia-cs package by David Hinds.
    The package must be installed if you want to use PCMCIA with kernel 2.2 or 2.4 and can be downloaded here:
    http://pcmcia-cs.sourceforge.net/

  • A kernel patch (for 2.2 or 2.4) called BadRAM by Rick van Rein (and maybe its successor BadMEM by Nico Schmoigl, although I have no experience with BadMEM). It lets you specify the RAM addresses in the 'shadowed area' that Linux should avoid.

    The BadRAM patch can be downloaded here:
    http://rick.vanrein.org/linux/badram/

    BadMEM is found here:
    http://badmem.sourceforge.net/

    Patches adapted for Red Hat stock kernels (2.4) can be found here:
    http://dynamicnetservices.com/~will/badram/
  • All BadRAM-patches that I have found, I have collected here.


    1. Specifying the memory for the pcmcia-controller

    With kernel 2.2 I successfully used the RAM addresses "0xb0000-0xb7fff". So try specifying the ports and the memory that the PCMCIA controller should use in /etc/pcmcia/config.opts with the following values:

    /etc/pcmcia/config.opts 
    
    include port 0x100-0x4ff, port 0xc00-0xcff
    include memory 0xb0000-0xb7fff
    

    All other "include memory" statements must be commented out.
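If you prefer not to edit the file by hand, the commenting-out step could be sketched like this. The function name `keep_only_window` and the sample file content are hypothetical and only serve as illustration; check the result before copying it over your real /etc/pcmcia/config.opts.

```python
def keep_only_window(config_text: str, wanted: str) -> str:
    """Comment out every 'include memory' line except the wanted one."""
    result = []
    found = False
    for line in config_text.splitlines():
        stripped = line.strip()
        if stripped == wanted:
            found = True
            result.append(line)
        elif stripped.startswith("include memory"):
            result.append("#" + line)   # disable competing memory windows
        else:
            result.append(line)         # ports, comments, everything else
    if not found:
        result.append(wanted)           # add the window if it was missing
    return "\n".join(result) + "\n"


# Hypothetical excerpt of a config.opts file, for illustration only.
sample = "\n".join([
    "include port 0x100-0x4ff, port 0xc00-0xcff",
    "include memory 0xc0000-0xfffff",
    "include memory 0xa0000000-0xa0ffffff",
])

print(keep_only_window(sample, "include memory 0xb0000-0xb7fff"))
```

Lines that are already commented out are left alone, because they no longer start with "include memory".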

    Kernel 2.4 (including the BadRAM patch) seems to react more sensitively to the RAM map provided by the Aero's BIOS and reserves pages for the video RAM that were still available under kernel 2.2. So with kernel 2.4 we need to find a new free RAM address range for the PCMCIA controller. It can be read from /proc/iomem. Boot Linux with PCMCIA disabled and enter in the shell:

    cat /proc/iomem
    

    For me this produced the following output:

    00000000-0009efff : System RAM
    000a0000-000bffff : Video RAM area
    000c0000-000c7fff : Video ROM
    000f0000-000fffff : System ROM
    00100000-013fffff : System RAM
      00100000-00240c09 : Kernel code
      00240c0a-00283d5f : Kernel data 
    

    Kernel 2.2 used the memory range 0xb0000-0xb7fff. Now, with kernel 2.4, this range is reserved for video RAM and cannot be used. The address range from 0xd0000 to 0xeffff still seems to be free. So let's try these addresses:

    Set in /etc/pcmcia/config.opts

    include port 0x100-0x4ff, port 0xc00-0xcff
    include memory 0xd0000-0xd7fff
    

    All other "include memory" statements must be commented out. The above values work OK for me.
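The search for a free range in the /proc/iomem listing above can also be done mechanically. The following is a minimal sketch that only looks at the top-level entries of the listing and reports unclaimed gaps between 0xa0000 and 0xfffff; the function names are made up for this example. A gap in the listing is only a candidate: the kernel does not necessarily know about every piece of hardware that sits in that area.

```python
import re

ENTRY = re.compile(r"^([0-9a-f]+)-([0-9a-f]+) : (.+)$")

# The /proc/iomem output quoted above, one list item per line.
IOMEM = "\n".join([
    "00000000-0009efff : System RAM",
    "000a0000-000bffff : Video RAM area",
    "000c0000-000c7fff : Video ROM",
    "000f0000-000fffff : System ROM",
    "00100000-013fffff : System RAM",
    "  00100000-00240c09 : Kernel code",
    "  00240c0a-00283d5f : Kernel data",
])


def top_level_ranges(iomem_text):
    """Return sorted (start, end) pairs of the non-indented entries."""
    ranges = []
    for line in iomem_text.splitlines():
        if line[:1].isspace():          # indented lines are sub-ranges
            continue
        match = ENTRY.match(line.strip())
        if match:
            ranges.append((int(match.group(1), 16), int(match.group(2), 16)))
    return sorted(ranges)


def free_gaps(ranges, low, high):
    """Return the unclaimed (start, end) gaps between low and high."""
    gaps = []
    cursor = low
    for start, end in ranges:
        if end < cursor:
            continue
        if start > high:
            break
        if start > cursor:
            gaps.append((cursor, start - 1))
        cursor = max(cursor, end + 1)
    if cursor <= high:
        gaps.append((cursor, high))
    return gaps


for start, end in free_gaps(top_level_ranges(IOMEM), 0xa0000, 0xfffff):
    print("free: 0x%05x-0x%05x" % (start, end))   # free: 0xc8000-0xeffff
```

For the listing above this reports 0xc8000-0xeffff as unclaimed. The window 0xd0000-0xd7fff that I use lies inside that gap.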



    2. Configuring BadRAM

    Specifying the memory addresses for the PCMCIA controller does NOT finish the job. We must keep in mind that the memory range used by the controller throws a shadow of unusable RAM addresses into the range above 16 MB. These RAM addresses have to be excluded, otherwise Linux will freeze as soon as you enable PCMCIA. This can be done with the kernel patches BadRAM or BadMEM.

    Apply the appropriate BadRAM patch for your kernel. The patch "BadRAM-2.2.19.A1.patch" also works for all later 2.2 kernels up to and including 2.2.25. For kernel 2.4 there are patches from different contributors. I use the version adapted for the newest Red Hat kernel source, kernel-source-2.4.20-20.9.i386.rpm.

    I noticed that the patch does not work if the kernel is compiled with the CFLAGS "-O3". -O3 is often used to get more performance out of the compiled kernel and can be set by hand in some lines of the file /usr/src/linux/Makefile. The default in all kernels is "-O2", which works fine with BadRAM. So if you have not changed the Makefile by hand, don't worry about this.

    See the chapter "Patching and compiling a new kernel" for information on how to apply a patch.

    After patching, the kernel has a new configuration option under "General setup" called

    CONFIG_BADRAM=y
    

    In some kernels this option is found under "Kernel hacking". Make sure to enable it.

    Now specify the RAM addresses you don't want to use,
    either at boot time with the command

    linux badram=value1,value2
    

    or in lilo.conf with the line

    append="badram=value1,value2"
    

    The two values specify a so-called V1 pattern. value1 is the first RAM address to exclude, value2 is a so-called mask. So the second value is NOT the last address to exclude but (as I understand it) a description of the geometrical shape of the RAM area to be excluded. In practice, an address is excluded when it agrees with value1 in every bit position where the mask has a 1. That suits the original purpose of BadRAM, which is using defective RAM modules: the defective areas of a RAM bank do not stretch over one coherent logical address range, they stretch over a geometrical area on the physical RAM module.

    A description of these patterns (unfortunately in German) can be found in a report by Nico Schmoigl for the Linux Magazin.
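How an address/mask pair selects addresses can be sketched as follows. This reflects my understanding of the BadRAM pattern format, not code taken from the patch itself; the function names are invented, and 32-bit addresses are assumed.

```python
def matches(address: int, base: int, mask: int) -> bool:
    """True if address agrees with base in every bit where mask has a 1."""
    return (address & mask) == (base & mask)


def covered_block(base: int, mask: int):
    """First and last excluded address, for a mask of leading 1 bits.

    Assumption: 32-bit addresses and a mask whose 1 bits are all at the
    top, so that the excluded addresses form one contiguous block.
    """
    free_bits = ~mask & 0xffffffff      # bits that may take any value
    first = base & mask
    last = first | free_bits
    return first, last


first, last = covered_block(0x010b0000, 0xffff8000)
print(hex(first), hex(last))            # 0x10b0000 0x10b7fff
print((last - first + 1) // 1024)       # 32
```

With the mask 0xffff8000 the lowest 15 bits are free, so one pair excludes a block of 32 KB. For a mask with 0 bits scattered between 1 bits, the excluded addresses would no longer form one block, which is the "geometrical" case mentioned above.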


    Now, how do we find the correct values?

    value2 was provided by Donald Gordon in his mail to the Aero user group. It is:

    0xffff8000
    

    value1 depends on the address range specified for the PCMCIA controller in /etc/pcmcia/config.opts.
    If you used

    include memory 0xb0000-0xb7fff
    

    the correct value1 for BadRAM would be the start of that range plus 16 MB (0x01000000):

    0x010b0000
    

    So these are the BadRAM settings that correspond to the settings for the PCMCIA controller in /etc/pcmcia/config.opts:

    include memory (config.opts)    corresponding badram value
    
    0xb0000-0xb7fff                 0x010b0000,0xffff8000
    0xd0000-0xd7fff                 0x010d0000,0xffff8000
    0xe0000-0xe7fff                 0x010e0000,0xffff8000
    

    and so on.
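The rule behind the table can be written down as a small helper. It is a sketch under the assumptions of this page: the shadow appears exactly 16 MB above the window, and the window has a power-of-two size and is aligned to that size, so that a single address/mask pair covers it. The name `badram_for_window` is made up.

```python
SHADOW_OFFSET = 16 * 1024 * 1024        # the shadow sits 16 MB higher


def badram_for_window(start: int, end: int) -> str:
    """Build the badram= parameter for one PCMCIA memory window."""
    size = end - start + 1
    if size & (size - 1) or start % size:
        raise ValueError("window must be a power-of-two size, "
                         "aligned to its size")
    value1 = start + SHADOW_OFFSET
    value2 = 0xffffffff & ~(size - 1)   # 1 bits above the window size
    return "badram=0x%08x,0x%08x" % (value1, value2)


print(badram_for_window(0xb0000, 0xb7fff))  # badram=0x010b0000,0xffff8000
print(badram_for_window(0xd0000, 0xd7fff))  # badram=0x010d0000,0xffff8000
print(badram_for_window(0xe0000, 0xe7fff))  # badram=0x010e0000,0xffff8000
```

A window that is not 32 KB in size gets a different mask, so the fixed value 0xffff8000 only applies to 32 KB windows like the ones listed.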

    For instance, with kernel 2.4 I have this line in /etc/pcmcia/config.opts:

    include memory 0xd0000-0xd7fff
    

    So in lilo.conf I must use the line

    append="badram=0x010d0000,0xffff8000"
    

    We have now taken into account the 'shadow' that the PCMCIA controller throws into the RAM range above 16 MB and excluded these areas with the help of BadRAM. After running lilo -v we can safely reboot the Aero and use PCMCIA.


    3. Conclusion

    Running Linux on laptops once again means that users have to work around mistakes made by the manufacturer, in this case Compaq. On the other hand I cannot blame them: they never thought that the Contura Aero would one day run with more than 12 MB RAM, and never dreamed of Red Hat 9 being installed on it instead of Windows 3.1. ;-)

    My thanks go to David Hinds, who helped with practical tips for finding the correct RAM values, and to the Linux community for providing all the information, knowledge and software.

    Linux means learning. Thanks for helping. Hope this may help others.

    Uli

    The exact procedure of patching and configuring is described in the following chapters "Patching and Compiling a new kernel", "Solving the RAM-problem" and in the section "PCMCIA", subsection "The BadRAM-options for the card-manager".



    Annotations:

    Interesting feature for 2.4-kernels

    Kernel 2.4 has a boot option called "mem=exactmap" that can be passed to the kernel.

    Unlike the BadRAM patch (where you EXCLUDE particular addresses), this option lets you pass the memory that the kernel is SUPPOSED TO USE on your machine.

    In the lilo-configuration file /etc/lilo.conf you could add:

    append="mem=exactmap mem=X@Y mem=X@Y"
    

    where X is the size of a memory region and Y is its start address.
    The whole procedure is described, for instance, at

    http://www.geocities.com/rlcomp_1999/memory.html

    This solution may turn out to be very hard to realise. BadRAM uses geometrical patterns of the affected RAM, exactmap uses coherent address ranges. So I would guess that it is hard to gather all the knowledge needed to specify every part of the RAM that is not affected by the "shadow" of the PCMCIA controller's memory usage. You would also have to change all of this if you move the PCMCIA controller's memory somewhere else (e.g. when upgrading from a 2.2 to a 2.4 kernel).
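If the shadow really is just the one 32 KB block that the badram value describes, the complementary ranges could at least be computed. The following is an untested sketch of that idea and not something I have booted the Aero with. It takes the two "System RAM" ranges from my /proc/iomem output, cuts the shadow out and prints size@start pairs. Writing the values in hexadecimal is an assumption about what the kernel accepts, and the name `exactmap_args` is made up.

```python
def exactmap_args(ram_ranges, hole_start, hole_end):
    """Build mem=exactmap arguments that skip one hole in the RAM."""
    parts = ["mem=exactmap"]
    for start, end in ram_ranges:
        if hole_end < start or hole_start > end:
            pieces = [(start, end)]             # hole is outside this range
        else:
            pieces = []
            if start < hole_start:
                pieces.append((start, hole_start - 1))
            if hole_end < end:
                pieces.append((hole_end + 1, end))
        for first, last in pieces:
            size = last - first + 1
            parts.append("mem=0x%x@0x%x" % (size, first))   # size@start
    return " ".join(parts)


# "System RAM" ranges from the /proc/iomem output on my Aero.
SYSTEM_RAM = [(0x00000000, 0x0009efff), (0x00100000, 0x013fffff)]

# Shadow of the window 0xd0000-0xd7fff, 16 MB higher.
print(exactmap_args(SYSTEM_RAM, 0x010d0000, 0x010d7fff))
```

The string would go into the append line of lilo.conf and, as said above, would have to be recomputed whenever the PCMCIA window is moved.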

    If someone knows more about this, they are very welcome to mail me or leave a comment.




    memtest86

    The author of the BadRAM patch, Rick van Rein, suggests testing the RAM with the test software memtest86 (memtest 3.0). The software can be found at:

    http://www.memtest86.com/

    Although testing RAM can never hurt, keep in mind that the Aero's problems with RAM addresses are not caused by defective RAM modules (see above). Nevertheless I tried the tests. They took quite some time on the Aero: 16 passes of all 7 default tests took one and a half DAYS (!) and, as expected, did not show any errors.




    The original finder of the BadRAM solution: Donald Gordon

    The idea of using the BadRAM patch to make all 20 MB of RAM usable on the Aero came from a mail to the aero list by Donald Gordon, dated 10.07.02.


    
    ---------------------snip----------------------------
    
    I've got PCMCIA and the 20mb upgrade working under 
    linux on the aero. The problem appears to be incomplete
    address decoding (at a guess) - I suspect some parts of 
    the system, including the PCMCIA controller, ignore the 
    top 8 address bits, thus causing a false image of e.g. 
    the memory-mapped area used by the PCMCIA controller to
    appear at 16mb + expected location.
    
    The solution (works for me on 2.2.20 and debian woody) 
    is to patch your kernel with Rick van Rein's BadRAM patch.
    e.g. If you tell cardmgr to only use memory locations 
    0xb0000-0xb7fff, then tell badram to exclude 16mb + this 
    area with the kernel parameters
    
    "badram=0x010b0000,0xffff8000".
    
    This only wastes 32kb of RAM instead of the 4mb you 
    give up when you use "mem=16384k". 
    
    So far it appears to be as stable, too.
    
    don
    
    
    -----------------snap---------------------
    



