Memory Errors and Testing
Most of the time when Windows freezes up, it's the fault of
Windows itself. Sometimes, though, the problem is not the software but
the hardware, and in those cases the memory is almost always the
culprit. Fortunately, the Micro-Scope diagnostic software is an
excellent tool for figuring out which is which.
To use Micro-Scope to its best advantage, it helps to understand
not only the various Micro-Scope tests, but also the different types
of memory errors. There is a lot to say on the topic, so this week
we will cover memory errors and some background on memory testing in
general, and next week we will go over each of the memory tests
available in Micro-Scope.
Types of Errors
On the subject of memory errors, there are basically three categories:
hard errors, timing errors, and soft errors. These are further broken
down into data errors and addressing errors.
Hard Errors: These are permanent errors that occur without fail, and
thus are easy to reproduce and diagnose. They are usually caused by
shorted or open connections in the RAM chips or modules. They include:
- Stuck bits, either singly stuck as ones or zeros, or stuck together
so that the value of one is always matched to another.
- Address errors, where attempting to write Address A always writes
Address B instead. (Address A may also be written.)
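As a rough illustration of how these two kinds of hard error show up
in a test, here is a minimal Python sketch. It is a toy model and not
how Micro-Scope works: the FaultyMemory class, its parameters, and the
two helper functions are all made up for this example, and a real test
would work on physical RAM rather than a Python list.

```python
class FaultyMemory:
    """Simulated byte-wide RAM with optional hard faults (hypothetical model)."""

    def __init__(self, size, stuck_at_one=0, stuck_at_zero=0, alias=None):
        self.cells = [0] * size
        self.stuck_at_one = stuck_at_one    # mask of bits that always read 1
        self.stuck_at_zero = stuck_at_zero  # mask of bits that always read 0
        self.alias = alias or {}            # writing key address also writes value address

    def write(self, addr, value):
        # Apply the stuck bits to whatever the CPU tried to store.
        value = (value | self.stuck_at_one) & ~self.stuck_at_zero & 0xFF
        self.cells[addr] = value
        if addr in self.alias:
            self.cells[self.alias[addr]] = value

    def read(self, addr):
        return self.cells[addr]


def find_stuck_bits(mem, addr):
    """Return (stuck_one_mask, stuck_zero_mask) for one cell."""
    mem.write(addr, 0x00)
    stuck_one = mem.read(addr)             # bits still 1 after writing all zeros
    mem.write(addr, 0xFF)
    stuck_zero = ~mem.read(addr) & 0xFF    # bits still 0 after writing all ones
    return stuck_one, stuck_zero


def find_address_errors(mem, size):
    """Return (written_addr, disturbed_addr) pairs."""
    for addr in range(size):
        mem.write(addr, 0x55)              # background pattern everywhere
    errors = []
    for a in range(size):
        mem.write(a, 0xAA)                 # mark one address
        for b in range(size):
            if b != a and mem.read(b) != 0x55:
                errors.append((a, b))      # some other address changed too
        mem.write(a, 0x55)                 # put the background back
    return errors
```

With a simulated bit 2 stuck at one, find_stuck_bits reports it in the
first mask. With address 3 wired so that it also writes address 7,
find_address_errors reports the pair (3, 7).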
Timing Errors: This type of error is harder to spot. Timing errors
occur infrequently, more often during periods of heavy RAM access.
The error can be either incorrect data or an incorrect address, and
each one may involve a different address or different data than the
previous error. One cause of timing errors is overclocking, either by
setting the RAM clock higher than the spec or by putting in DIMM
modules with a lower speed rating, which amounts to the same thing.
You can also get errors if the clock is too slow, because the memory
cells must be refreshed periodically, and the refresh timing signal is
derived from the clock. Rounding out the picture, dirty contacts on
the DIMM or an uneven clock signal can give symptoms that are
indistinguishable from overclocking or underclocking.
Soft Errors: This is a catch-all term for errors that do not fit the
above categories. It includes errors that only happen once, and ones
that are intermittent like timing errors but don't have anything to
do with timing. They can be caused by faulty RAM chips, dirty or
intermittent contacts, or by external factors such as a nearby noisy
power line or device. An often overlooked source of one-time errors
is cosmic rays. There is no point trying to track down these or other
one-time errors, but tests designed for timing errors should also
catch intermittent soft errors.
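Because intermittent errors do not show up on every access, the
practical way to catch them is to repeat the same test many times and
count the failures per address. The Python sketch below shows the idea
on a simulated memory. FlakyMemory, flip_probability, and
count_failures are hypothetical names invented for this example, and
the random bit flips are only a stand-in for real timing or contact
problems.

```python
import random


class FlakyMemory:
    """Simulated RAM whose reads occasionally flip one bit (hypothetical model)."""

    def __init__(self, size, flip_probability, seed=0):
        self.cells = [0] * size
        self.flip_probability = flip_probability
        self.rng = random.Random(seed)     # seeded so runs are repeatable

    def write(self, addr, value):
        self.cells[addr] = value & 0xFF

    def read(self, addr):
        value = self.cells[addr]
        if self.rng.random() < self.flip_probability:
            value ^= 1 << self.rng.randrange(8)   # flip one random bit
        return value


def count_failures(mem, size, passes, pattern=0xA5):
    """Run the same fill-and-verify pass repeatedly; return {addr: failure_count}."""
    failures = {}
    for _ in range(passes):
        for addr in range(size):
            mem.write(addr, pattern)
        for addr in range(size):
            if mem.read(addr) != pattern:
                failures[addr] = failures.get(addr, 0) + 1
    return failures
```

An address that fails on every pass points to a hard error. One that
fails only now and then points to a timing or intermittent soft error.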
Memory Testing Pitfalls
There are some engineering considerations that are peculiar to
memory testing. You don't necessarily have to know these to
test your memory, but you might find them interesting, and they are
definitely worth knowing when considering the features in Micro-Scope.
System Cache Memory: The idea behind cache
is that you have a small amount of higher-speed (but more expensive)
memory right next to the CPU, and the system tries to anticipate
what RAM contents the CPU will want next and have those waiting
in the cache. The CPU gets data from the cache rather than system
RAM whenever it can, because it's faster. There are two approaches
to making this work. One assumes that the CPU will access the same
data again, so the cache is filled with whatever the CPU has been
using most recently. The other is called look-ahead
cache, and it assumes that data is used in blocks, so whatever location
is being accessed, the following locations are what will be needed
next. Most systems use some balance of these two methods.
Cache speeds up system performance beyond a doubt, but it must be
circumvented in a memory test to be sure you are testing a particular
RAM cell and not the cache chip. For instance, if you want to test
a block of memory addresses, as soon as the first address is accessed
the following addresses are loaded into cache and the CPU will access
them in the cache rather than in system RAM. The RAM is not being
tested at all, except for the first address.
There are two ways around this problem. On 486 and later processors,
there is a CPU instruction to invalidate the cache.
This tells the processor that the cache contents are invalid
and that the next read must be done from system RAM, not from the
cache. However, running an extra instruction after every read makes
for a very lengthy test. A better solution is to fill the entire
memory area with the test pattern before reading it back, so that
the earliest values have already been pushed out of the cache by the
time they are read. This works well for extended memory, because there
is much more system RAM than cache memory. This is not necessarily true
for Base Memory, though, because many systems have more than 640KB
of cache. For Base Memory, the only choices are to run the Invalidate
Cache instruction after every read or have the user turn off caching
in CMOS before running the test.
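To make the cache problem concrete, here is a small Python simulation.
It is a sketch under simplifying assumptions and not a model of any
real CPU or of Micro-Scope: the CachedMemory class, its bad_addr
parameter, and the two test functions are invented for this example.
The cache is modeled as a simple write-through cache that keeps the
most recently used addresses.

```python
from collections import OrderedDict


class CachedMemory:
    """Simulated RAM behind a small most-recently-used cache (hypothetical model)."""

    def __init__(self, size, cache_lines, bad_addr=None):
        self.ram = [0] * size
        self.cache = OrderedDict()         # addr -> value, oldest entry first
        self.cache_lines = cache_lines
        self.bad_addr = bad_addr           # this RAM cell always stores 0

    def _cache_put(self, addr, value):
        self.cache[addr] = value
        self.cache.move_to_end(addr)
        if len(self.cache) > self.cache_lines:
            self.cache.popitem(last=False)  # evict the least recently used entry

    def write(self, addr, value):
        # Write-through: the value goes to RAM (where the fault lives) and to cache.
        self.ram[addr] = 0 if addr == self.bad_addr else value
        self._cache_put(addr, value)

    def read(self, addr):
        if addr in self.cache:             # cache hit: RAM is never consulted
            self.cache.move_to_end(addr)
            return self.cache[addr]
        value = self.ram[addr]             # cache miss: the real RAM cell is read
        self._cache_put(addr, value)
        return value

    def invalidate(self):
        self.cache.clear()                 # stand-in for an invalidate-cache instruction


def write_then_read(mem, size, pattern=0xA5):
    """Naive test: read each address right after writing it."""
    bad = []
    for addr in range(size):
        mem.write(addr, pattern)
        if mem.read(addr) != pattern:
            bad.append(addr)
    return bad


def fill_then_read(mem, size, pattern=0xA5, invalidate=False):
    """Fill the whole area first, then verify; optionally invalidate before each read."""
    for addr in range(size):
        mem.write(addr, pattern)
    bad = []
    for addr in range(size):
        if invalidate:
            mem.invalidate()
        if mem.read(addr) != pattern:
            bad.append(addr)
    return bad
```

With 16 addresses, a 4-entry cache, and a bad cell at address 5,
write_then_read finds nothing because every read is served from the
cache, while fill_then_read finds address 5. If the cache is made
larger than the area under test (the Base Memory situation),
fill_then_read also finds nothing unless invalidate=True is passed.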
Line Charges: This has nothing to do with American
football. A line charge is the residual electrical field that remains
for a short time on a signal line and makes it appear that the signal
is still there. This applies to memory testing because if a location
is read too quickly after it is written, the data lines are still
charged with the correct value and any timing or refresh errors
may be masked. This can be easily avoided by not reading the same
location immediately after writing to it, but the problem has to
be understood and accounted for in the design of the test.
Restoring Memory: Although sloppy test designers
can (and often do) ignore the previous two situations, they all
have to deal with this one. Even while the test is busy writing
and reading its own values from various RAM locations, there are
certain RAM contents, especially in base memory, that must be preserved
and then properly restored in order for the test program to continue
running and for the operating system not to crash.
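The save-and-restore step can be sketched in a few lines of Python.
This is only an illustration on a bytearray standing in for RAM, and
check_region_preserving is a hypothetical helper named for this
example. A real test has the extra difficulty that its own code and
data may live in the region being tested.

```python
def check_region_preserving(mem, start, end, pattern=0x5A):
    """Test mem[start:end] with one pattern and put the original contents back."""
    saved = bytes(mem[start:end])          # preserve what was there before the test
    bad = []
    try:
        for addr in range(start, end):     # write the whole region first...
            mem[addr] = pattern
        for addr in range(start, end):     # ...then read it back
            if mem[addr] != pattern:
                bad.append(addr)
    finally:
        mem[start:end] = saved             # restore even if the test is interrupted
    return bad
```

For example, after running
check_region_preserving(ram, 8, 24) on ram = bytearray(range(32)),
the contents of ram are the same as before the call.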
This is actually less of a problem with Micro-Scope and its bootable
operating system, because we allow you to boot to the Base Memory
test, or to the full diagnostic with multiple memory tests for everything
except base memory, but you can't do both at the same time.
For those of you who stayed awake nights wondering why we did it
that way, now you know.
Disclaimer - The Micro 2000 Tech Tip is a free service
providing information only. While we use reasonable care to see
that this information is correct, we do not guarantee it for accuracy,
completeness or fitness for a particular purpose. Micro 2000, Inc.
shall not be liable for damages of any kind in connection with the
use or misuse of this information.