
T2000 in non-technical terms

Posted on March 26th, 2006 at 11:45 by John Sinteur in category: Sun Coolthreads T2000

Most of the reviews I’ve found on the net for this machine are pretty technical in nature. Let’s see if I can write something you can use to convince your non-technical manager that this computer is interesting. Again, feel free to ask for clarification on any point.

Most people are only familiar with the computer that sits on their desk and is used to browse the Internet. It has one important chip in it, usually the one that gets them an “Intel Inside” sticker on the box, and the salesman has told the buyer that this “CPU” is a “3 GHz Pentium” and that it is very fast. And, indeed, it is. This T2000 computer from Sun also has one important chip in it, the CPU, and it runs at “only” 1 GHz. So if you didn’t know any better, you’d think this machine would be a third of the speed of the machine you’re browsing on, right? Well, no. Not at all.

That Intel chip generally has one “core” on it – you could compare it with an office where one person is working very fast. The Sun CoolThreads chip has many cores on it, and each core can handle several jobs at once – compare it with an office that has 32 people working in it. Although each of those 32 people works more slowly, together they can get through massively more work than the office with one person in it.

This kind of multiple-core technology will reach the desktop soon enough – Apple already has a laptop available with a new Intel chip, the “Core Duo”, and if you guessed from its name that you could compare this with an office with 2 people working in it, you’d be right.

This trend of moving chips from “one core” to “more cores” has been going on for a while now. Why did that happen, exactly?

Apologies to Intel – I’m going to use them to illustrate some points, but what I say is more or less true for the entire industry; Intel just happens to be the most visible example. A few years ago there was a “MHz war” going on: Intel and its competitors (both AMD and the Power consortium) were all marketing to their customers with “we are faster because our chip runs at a higher clock rate”. Although this used to be true, it also forced the engineers at those companies to focus on just one thing for the next version of their chip: higher frequencies.

One of the things the engineers realized is that if you chop your work into smaller pieces, you can get through more pieces per second, and thus reach a higher clock rate. How does that work for a computer? Let’s say we have a program in memory that says “add one to whatever number is in that piece of memory”. Sounds like a simple thing, right? But you can split it into a lot of small steps: fetch the program instruction from memory to the chip, decode it to see what it wants to do (“add 1”), fetch the number that is currently in memory, add one to it, store it back into memory. The trick Intel and others used was to have different parts of the chip handle the different steps. In older chips, after the “fetch the program instruction from memory” bit was done, no new program instructions were fetched until the entire “add 1” operation was completed. These days, that’s no longer true: the part of the chip that gets instructions from memory will go on and fetch the next one while the rest of the chip is still busy adding one to a number.

Clearly this can be faster than waiting. However, suppose the next instruction after “add 1” is “if the result is 100, do this, otherwise do that”. Which flow of instructions is going to be fetched next? Modern chips have logic for that, called “branch prediction”, and the chip will take a guess. Sometimes this guess is wrong, of course, and then the chip has to back up and redo a small bit of work. The overall speed gain from guessing is worth the occasional miss.

This idea is called “pipelining”, and it was taken so far in the Pentium 4 that the chip is known for having a “long pipeline”. That means the entire process of doing work on the chip was chopped into so many pieces that an instruction has to go through a fairly large number of stages to get done. The advantage is high clock rates, and thus good marketing material; the cost is a fairly large hit in speed whenever it turns out, somewhere in the pipeline, that the wrong “guess” was made. Every time that happens, a large part of the pipeline is cleared and must be refilled, and that costs you speed and processing power.
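If you like seeing this sort of thing in code, here is a small C sketch I put together to make the cost of a wrong guess visible. It is purely my own toy example, nothing from Intel or Sun, and the exact effect depends on your compiler and machine (a clever optimizer may remove the branch entirely). The loop does exactly the same arithmetic in both passes; only the predictability of the “if” changes:

    /* Toy illustration of branch prediction: the same conditional sum is
     * computed twice, once over random data (the chip keeps guessing
     * wrong and refilling its pipeline) and once over sorted data (the
     * guesses are almost always right).  Build with something like
     * "cc -std=c99 toy.c" -- heavy optimization may hide the effect. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N 1000000

    static int cmp(const void *a, const void *b) {
        return *(const int *)a - *(const int *)b;
    }

    static long long conditional_sum(const int *data, int n) {
        long long sum = 0;
        for (int pass = 0; pass < 100; pass++)
            for (int i = 0; i < n; i++)
                if (data[i] >= 128)        /* hard to predict on random data */
                    sum += data[i];
        return sum;
    }

    int main(void) {
        int *data = malloc(N * sizeof *data);
        for (int i = 0; i < N; i++)
            data[i] = rand() % 256;

        clock_t t0 = clock();
        long long s1 = conditional_sum(data, N);   /* random order */
        clock_t t1 = clock();

        qsort(data, N, sizeof *data, cmp);         /* now the branch is predictable */
        long long s2 = conditional_sum(data, N);
        clock_t t2 = clock();

        printf("unsorted: %.2fs  sorted: %.2fs  (sums %lld and %lld)\n",
               (double)(t1 - t0) / CLOCKS_PER_SEC,
               (double)(t2 - t1) / CLOCKS_PER_SEC, s1, s2);
        free(data);
        return 0;
    }

On a long-pipeline chip the unsorted pass is typically the slower one, for exactly the “cleared and refilled pipeline” reason above.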

The Pentium 4’s competitors used different methods to get speed out of their processors, and although they run at a lower frequency (AMD is typically at 2 or 2.2 GHz), they’ll give you about the same amount of actual work as a Pentium 4 at 3 GHz.

A while ago Intel engineers found themselves running into a few technical problems getting the clock rate any higher, and since then it has become clear that to get speeds up again, something else had to be done. The new Intel “Core Duo” is the first big result. Intel went more or less back to the Pentium 3 chips and evolved from there in a different direction. Instead of making a longer pipeline, they doubled the chip. Instead of one program doing “add 1 to a number”, the chip can have two programs both doing “add 1 to a number” at the same time. This isn’t really a new idea; it had been done before – either by actually sticking two processors in the same computer, or by dividing up work inside the chip itself. For example, a calculation with two “floating point” numbers (such as 3.14159 times 2.718) would be done in a different part of the chip from a calculation with “integer” numbers (such as 2 times 3), and those two parts could work on different calculations at the same time. This multiple-core thing is more or less the same, except now just about all the functionality of the chip is duplicated. There’s a whole bunch of extremely technical stuff I’m glossing over right now, such as how the memory that is on the chip (level 1 and level 2 cache) is shared, but if you want to read about that, there are better places on the Internet than this post.
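To picture the “two people in the office” idea in code, here is another tiny sketch of my own (assuming a Unix-like system with POSIX threads – again, not anything shipped by Intel or Sun). Two completely independent workers each add 1 to their own number many times; on a single-core chip they have to take turns, while on a dual-core chip they can genuinely run side by side, so the wall-clock time stays close to that of one worker alone:

    /* Two independent "workers", each adding 1 to its own number.
     * On one core the jobs take turns; on two cores they can run at
     * the same time.  Build with something like "cc twojobs.c -lpthread". */
    #include <stdio.h>
    #include <pthread.h>

    static void *add_one_many_times(void *arg) {
        volatile long counter = 0;            /* volatile: keep the work honest */
        (void)arg;
        for (long i = 0; i < 500000000L; i++)
            counter += 1;                     /* "add 1 to a number" */
        return NULL;
    }

    int main(void) {
        pthread_t a, b;
        /* Two jobs with no shared data between them. */
        pthread_create(&a, NULL, add_one_many_times, NULL);
        pthread_create(&b, NULL, add_one_many_times, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        puts("both workers finished");
        return 0;
    }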

So, back to the T2000 and the CoolThreads chip. The Intel Core Duo presents itself as two processors to the operating system. The CoolThreads chip in the T2000 I’m evaluating presents itself not as one, not as two, but as 32 processors. They’re not blazingly fast processors – each presents itself as a 1 GHz chip – but it sure makes up for that in quantity.
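If you’re curious how a program sees this, the little C sketch below simply asks the operating system how many processors it thinks it has. sysconf(_SC_NPROCESSORS_ONLN) is a common Unix call (Solaris and Linux both have it); on a typical Pentium 4 desktop I’d expect it to print 1, and on this T2000 it should print 32 – those numbers are expectations on my part, not something I’m quoting from a spec sheet:

    /* Ask the operating system how many processors it sees. */
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        long n = sysconf(_SC_NPROCESSORS_ONLN);
        printf("the operating system sees %ld processor(s)\n", n);
        return 0;
    }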

It also means there are a lot of things this chip is not good at. You probably would not want to run Word or Excel on it. In that case, you’d be doing only one thing at a time, and you would get one of those 32 parts working for you while the other 31 sit idle.

Sun also clearly states on their web site that this chip is not good at floating point calculations. So, if you need a machine that is good at floating point and has multiple processors, you’re probably still going to end up with an Enterprise 6900, which has 24 separate UltraSPARC IV chips, each at 1.5 GHz. But that machine costs a cool million dollars, and my T2000 is listed at just a little over 12,000 dollars. Clearly the T2000 is limited compared to the E6900, but there are a few things the T2000 excels at (and might even give the E6900 a run for its money – I’d love to test drive one of those for 60 days and find out).

The Sun web site calls it “the fastest web server”, and not without reason. Let’s look at what a busy web server does: serving lots and lots of people web pages. Some of those pages will need to be generated on the fly (for example because they contain personalized information). Lots and lots of websites these days do this on a server with a Pentium 4 in it. The server hosting this weblog, for example, is a computer with one chip in it, a Pentium 4. That may change in the coming week or so, as the software I’m installing on the T2000 should be able to handle my weblog nicely, and that’s a great thing to try. I get between 3,000 and 4,000 visitors on my weblog per day, spread out over the day. That’s not much, but imagine the other website I’m working on, where those same 3,000 to 4,000 visitors browsing the site at the same time would be considered a quiet moment. And mind you, when I say “at the same time” I mean they’re browsing at the same time, each requesting something like ten to fifteen web pages during the ten minutes that their visit lasts.

Back to that same image of the office with one worker in it: if that one worker had to serve web pages to those 4,000 people, he would have to switch jobs a lot – so often, in fact, that the overhead of switching would hurt performance. That same office with 32 slower workers, however, would serve those 4,000 people a lot better. The kind of work (composing web pages and handing them out over the net) and the nature of the work (lots and lots of jobs that have no dependency on each other – each visitor gets their own web pages, which have no relationship to the 4,000 other pages being generated at the same time) make the T2000 a perfect match.
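For the programmers reading along, here is a rough C sketch of that 32-worker office (again my own illustration, assuming POSIX threads – this is not the actual web server software I’m installing). A pool of workers pulls page requests off a shared counter; the “build_page” work and the numbers are made up, but the shape of the problem – many small jobs with no dependencies on each other – is the point:

    /* A pool of 32 workers handing out 4,000 independent "page requests". */
    #include <stdio.h>
    #include <pthread.h>

    #define WORKERS  32                       /* matches how the T2000 presents itself */
    #define REQUESTS 4000                     /* roughly "4,000 visitors at once" */

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static int next_request = 0;

    static void build_page(int request) {
        volatile long work = 0;
        (void)request;
        for (long i = 0; i < 200000; i++)     /* stand-in for composing a page */
            work += i;
    }

    static void *worker(void *arg) {
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&lock);
            int r = next_request++;           /* grab the next independent job */
            pthread_mutex_unlock(&lock);
            if (r >= REQUESTS)
                return NULL;
            build_page(r);                    /* no relation to any other job */
        }
    }

    int main(void) {
        pthread_t pool[WORKERS];
        for (int i = 0; i < WORKERS; i++)
            pthread_create(&pool[i], NULL, worker, NULL);
        for (int i = 0; i < WORKERS; i++)
            pthread_join(pool[i], NULL);
        printf("served %d requests\n", REQUESTS);
        return 0;
    }

On a one-core machine those 32 workers just take turns; on the T2000 they can all be busy at once, which is the whole point of the design.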

Now, since most of my work involves getting web servers to handle lots and lots of visitors, you can see why I’m test-driving it.

Whilst installing software I’ve already seen the first effects of the way this machine works. Building software is sequential work – the compiler will generally only do one thing at a time – and the machine does not “feel” fast when I’m doing that. But for some software I can install two parts at the same time, so I open a second window and start a second build there. That’s not the way I’m used to doing things, because when you do that on a machine with a Pentium in it, you’ll notice both builds go slower: the total amount of time it takes to build both pieces of software stays the same (or sometimes goes up, since you add job-switching overhead). With this T2000, that is very clearly not the case. Build three or four pieces of software at the same time and you won’t notice any slowdown in any of them. It helps that the machine has nice little fast disks, of course, since the build results need to be stored on disk, but it’s a nice indicator of things to come.

  1. Hey John, will Sun give me a free server box if I write about it on my weblog?

  2. PS: why don’t you use a parallel make for large builds instead of starting another build in another shell?

  3. I will do a parallel build later this week; I haven’t got my toolset complete yet.

  4. Ask them 😉

    When you fill out the forms, they want to know why you’re evaluating the machine. For my work there’s a clear and obvious need to know whether this machine is useful – and the try-60-days program is a godsend. Rumor has it that they are indeed saying “keep it” to some people who write about it, but I would have written this anyway: it’s a good way to organize my thoughts, and I have to present my findings to others as well. That will be a more formal report, but my informal thoughts are highly regarded at KPN 🙂

    Having said that – I’m sure somebody at Sun will be reading this weblog, and indeed this comment, in the near future, so let the record show that I would welcome such a message from Sun – it’s a very nice machine, but since my boss pays me more than this machine is worth, it will not influence the conclusions I am going to reach a few weeks from now.
