Fast P3 Memory Access
Sree found this wonderful article on SGI's site:
Of course, since the Visual Workstations are just P3s, these techniques are useful to a lot of people. The SGI article describes a technique that alleviates alignment problems, wildly improves caching, and uses all the write-combining features of the P3. It's almost magic, it's so fast.
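To make the idea concrete, here is a rough sketch of how those three ingredients can fit together. This is my own illustration, not the code from the SGI article or from memcpy.zip. The function name block_prefetch_copy, the 4 KB block size, and the 32-byte cache line constant are all assumptions chosen for the example. It requires 16-byte-aligned buffers (the alignment part), reads one byte per cache line of a block before copying it (the caching part), and writes with SSE non-temporal stores so the output goes through the write-combining buffers instead of the cache. On a compiler without SSE support it falls back to plain memcpy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE__) || defined(_M_IX86) || defined(_M_X64)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Assumed tuning constants, not taken from the article:
   P6-family parts such as the P3 use 32-byte cache lines. */
enum { CACHE_LINE = 32, BLOCK_BYTES = 4096 };

/* Hypothetical helper: copy n bytes from src to dst.
   Both pointers must be 16-byte aligned and n a multiple of 16. */
static void block_prefetch_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert((uintptr_t)d % 16 == 0);
    assert((uintptr_t)s % 16 == 0);
    assert(n % 16 == 0);

    while (n > 0) {
        size_t chunk = n < BLOCK_BYTES ? n : BLOCK_BYTES;
#if HAVE_SSE
        /* Pass 1: read one byte per cache line so the whole source
           block is pulled into the cache before the copy starts. */
        volatile unsigned char sink = 0;
        for (size_t i = 0; i < chunk; i += CACHE_LINE)
            sink = s[i];
        (void)sink;

        /* Pass 2: aligned 16-byte loads from the now-cached block,
           non-temporal stores that bypass the cache on the way out. */
        for (size_t i = 0; i < chunk; i += 16) {
            __m128 v = _mm_load_ps((const float *)(s + i));
            _mm_stream_ps((float *)(d + i), v);
        }
#else
        memcpy(d, s, chunk);  /* portable fallback, no streaming stores */
#endif
        d += chunk;
        s += chunk;
        n -= chunk;
    }
#if HAVE_SSE
    _mm_sfence();  /* make the streamed stores visible before returning */
#endif
}
```

Calling it looks like memcpy: block_prefetch_copy(dst, src, bytes). Whether it actually beats memcpy depends heavily on the CPU, the chipset, and the buffer size, so measure before trusting it.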
To give you a teaser, numbers from my P3/600:
Here's memcpy.zip [24k].
If you have an Athlon, I'd love to hear how this code runs there. Or if your machine is just amazingly fast, let me know that too. Here's the complete output for my P3/600, with a P3V4X motherboard:
(I repeat the same memfill test multiple times to test variability.)

    SGI ex1: 246.866850ms = 129.624532mb/sec
    SGI ex2: 156.955982ms = 203.878818mb/sec
    SGI ex3: 159.718903ms = 200.351990mb/sec
    SGI ex4: 102.957219ms = 310.808705mb/sec
    memcpy   183.452366ms = 174.432201mb/sec
    memfill   50.355841ms = 635.477418mb/sec
    memfill   50.456692ms = 634.207251mb/sec
    memfill   49.485340ms = 646.656166mb/sec
    memfill   49.908857ms = 641.168759mb/sec
    memset    93.590386ms = 341.915459mb/sec
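If you want to compare your own timings, the mb/sec column is just data moved divided by elapsed time. Working backwards, the figures above are consistent with each test moving 32 MB (for example, 32 / 0.246866850 s is about 129.62). The helper below is a small sketch of that arithmetic. The name throughput_mb_per_sec is made up for this example, and the 32 MB figure is inferred from the output, not read from the benchmark source.

```c
#include <assert.h>

/* Hypothetical helper: convert a transfer size in megabytes and an
   elapsed time in milliseconds into megabytes per second. */
static double throughput_mb_per_sec(double megabytes, double elapsed_ms)
{
    assert(elapsed_ms > 0.0);
    return megabytes / (elapsed_ms / 1000.0);
}
```

For instance, throughput_mb_per_sec(32.0, 50.355841) gives roughly 635.48, which matches the first memfill line.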