Ego,
Thanks for all your options. :) It is appreciated!
I'll keep in mind that a small amount of garbage collection is probably not a bad thing. The only GC hiccups I ever got were when I was very stupidly allocating millions of small classes to store a rectangle over the course of a minute (it should have been a struct, so it wouldn't have needed a heap allocation each time). Even then I barely noticed the GC handling all of that, so I'm sure that even if I new'ed a 4-element Color array 60 times a second, it would be nothing for the GC to handle. Right now, the GC isn't invoked at all during my gameplay, so I have a lot of room to work with.
I could unroll the loop, but it seems like more complexity than it's worth. Thanks for the suggestion, though.
Thanks for the info regarding how easy it is to do my own custom thread local storage. Yes, all I have to do is allocate it myself and pass it into the threads, so they each have their own copy. Simple. :)
A simple thread local storage array is definitely the answer. It's just too bad the code isn't as easy as it would be if the .NET Compact Framework on the Xbox supported thread local storage the way the full framework does on the PC.
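
For anyone reading along, here is a rough sketch of what I mean by allocating the storage myself and handing it to each thread. It is only an illustration, not my actual game code: the Worker class, the Scratch property, and the little Color struct (a stand-in for the XNA one so the sample compiles on its own) are all made-up names, and I'm assuming plain Thread and ThreadStart are what you have to work with.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for the XNA Color struct, so this sketch is self-contained.
struct Color
{
    public byte R, G, B, A;
}

// Hypothetical worker: each instance owns its own scratch array,
// allocated once up front, so nothing is new'ed while the thread runs.
class Worker
{
    private readonly Color[] scratch = new Color[4];
    private readonly int id;

    public Worker(int id)
    {
        this.id = id;
    }

    public Color[] Scratch
    {
        get { return scratch; }
    }

    // Thread entry point. It only touches this worker's own array,
    // so no locking is needed for the scratch data.
    public void Run()
    {
        for (int i = 0; i < scratch.Length; i++)
        {
            scratch[i].R = (byte)id;
        }
    }
}

static class Program
{
    static void Main()
    {
        const int threadCount = 3;
        Worker[] workers = new Worker[threadCount];
        Thread[] threads = new Thread[threadCount];

        // Allocate every worker (and its private array) before starting any thread.
        for (int i = 0; i < threadCount; i++)
        {
            workers[i] = new Worker(i);
            threads[i] = new Thread(new ThreadStart(workers[i].Run));
            threads[i].Start();
        }

        // Wait for all threads to finish before reading their results.
        for (int i = 0; i < threadCount; i++)
        {
            threads[i].Join();
        }

        for (int i = 0; i < threadCount; i++)
        {
            Console.WriteLine("worker " + i + " wrote R = " + workers[i].Scratch[0].R);
        }
    }
}
```

The point of putting the array inside the worker object is that the thread gets its "own copy" just by running an instance method, so there is no need for a parameterized thread start or any framework-level thread local support.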