## Supercomputers Vs Mobile Phones

June 2nd, 2010 | Categories: Android, java

Ever wondered how fast the fastest computer on Earth is? Well, wonder no more, because the latest edition of the Top 500 supercomputer list was published earlier this week. Thanks to this list we can see that the fastest (publicly announced) computer in the world is currently an American system called Jaguar. Jaguar consists of 37,376 six-core AMD Istanbul processors and has a speed of 1.75 petaflops as measured by the Linpack benchmark. According to the BBC, a computation that takes Jaguar a day would keep a standard desktop PC busy for 100 years. Whichever way you look at it, Jaguar is a seriously quick piece of kit.
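As a rough sanity check of the BBC comparison, the Java sketch below works out what speed the "100 years" figure implies for a desktop PC. It assumes that both machines run flat out at a constant rate for the whole job and that a year is 365.25 days; these are simplifying assumptions, not the BBC's stated method, and the class and method names are made up for illustration.

```java
// Back-of-the-envelope check of the "one Jaguar-day = 100 desktop-years" claim.
// Assumption: both machines sustain a constant flop rate for the entire job.
public class JaguarVsDesktop {

    static final double JAGUAR_FLOPS = 1.75e15;   // Linpack speed quoted above, flop/s
    static final double SECONDS_PER_DAY = 86_400.0;
    static final double DAYS_PER_YEAR = 365.25;   // assumption: Julian year

    /** Flop rate a desktop would need to finish one Jaguar-day of work in the given number of years. */
    static double impliedDesktopFlops(double years) {
        double jaguarWork = JAGUAR_FLOPS * SECONDS_PER_DAY;            // total operations in one Jaguar-day
        double desktopSeconds = years * DAYS_PER_YEAR * SECONDS_PER_DAY;
        return jaguarWork / desktopSeconds;
    }

    public static void main(String[] args) {
        double desktop = impliedDesktopFlops(100);
        System.out.printf("Implied desktop speed: %.1f Gflop/s%n", desktop / 1e9);
    }
}
```

On those assumptions the "standard desktop PC" works out at just under 48 Gflop/s.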

All this got me thinking: how fast is my mobile phone compared with these computational behemoths?

The key to answering this question lies with the Linpack benchmarks, developed by Jack Dongarra back in 1979. Wikipedia explains:

‘they [the Linpack benchmarks] measure how fast a computer solves a dense N by N system of linear equations Ax = b, which is a common task in engineering. The solution is obtained by Gaussian elimination with partial pivoting, with 2/3·N³ + 2·N² floating point operations. The result is reported in millions of floating point operations per second (MFLOP/s).’
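The operation count in that quote is easy to turn into code. The sketch below computes 2/3·N³ + 2·N² for a given N and converts a measured solve time into Mflop/s. The class and method names are hypothetical, and the real Java Linpack source may count and time operations somewhat differently; this only illustrates the formula quoted above.

```java
// Sketch: nominal Linpack operation count and the resulting Mflop/s rate.
// Hypothetical helper names; not taken from any real Linpack implementation.
public class LinpackRate {

    /** Nominal floating-point operation count 2/3*N^3 + 2*N^2 for a dense N x N solve. */
    static double flopCount(int n) {
        double nd = n;   // promote to double before cubing so large N cannot overflow an int
        return 2.0 / 3.0 * nd * nd * nd + 2.0 * nd * nd;
    }

    /** Converts a measured solve time (seconds) into millions of floating-point operations per second. */
    static double mflops(int n, double seconds) {
        return flopCount(n) / seconds / 1e6;
    }

    public static void main(String[] args) {
        for (int n : new int[] {100, 1000}) {
            System.out.printf("N = %4d: %.4g operations%n", n, flopCount(n));
        }
        // Hypothetical timing, for illustration only: an N = 100 solve taking 0.3 s.
        System.out.printf("%.2f Mflop/s%n", mflops(100, 0.3));
    }
}
```

Note how quickly the work grows with N: the N = 1000 problem needs nearly 1,000 times as many operations as N = 100, which is one reason results for different problem sizes are not directly comparable.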

People have been using the Linpack benchmarks to measure the speed of computers for decades, so we can use the historical results to see just how far computers have come over the last thirty years or so. Back in 1979, for example, the fastest computer on the block according to the N=100 Linpack benchmark was the Cray 1 supercomputer, which had a measured speed of 3.4 Mflop/s per processor.

More recently, a Java version of the Linpack benchmark was developed and this was used by GreeneComputing to produce an Android version of the benchmark.

I installed the benchmark onto my trusty T-Mobile G2 (a rebadged HTC Hero, currently running Android 1.5) and on firing it up discovered that it tops out at around 2.3 Mflop/s, which makes it roughly two-thirds as fast as a single processor of a 1979 Cray 1 supercomputer. OK, so maybe that’s not particularly impressive, but the very latest crop of Android phones is a different matter entirely.

According to the current Top 10 Android Linpack results, a tweaked Motorola Droid is capable of scoring 52 Mflop/s, which is over 15 times faster than the 1979 Cray 1 CPU. Put another way, if you transported that mobile phone back to 1987, it would be on par with the processors in one of the fastest computers in the world at the time, the ETA 10-E, a machine whose processors had to be cooled by liquid nitrogen.
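The comparisons in the two paragraphs above are simple ratios against the Cray 1's 3.4 Mflop/s (N=100, per processor). A small Java sketch of that arithmetic, using only the scores quoted in this post (the class and constant names are illustrative):

```java
// Sketch: phone Linpack scores expressed as multiples of one 1979 Cray 1 processor.
public class PhoneVsCray {

    static final double CRAY1_MFLOPS = 3.4;   // Cray 1, N = 100 Linpack, per processor
    static final double HERO_MFLOPS = 2.3;    // T-Mobile G2 / HTC Hero on Android 1.5
    static final double DROID_MFLOPS = 52.0;  // tweaked Motorola Droid

    /** Speed of a device as a multiple of one Cray 1 processor. */
    static double relativeToCray(double mflops) {
        return mflops / CRAY1_MFLOPS;
    }

    public static void main(String[] args) {
        System.out.printf("HTC Hero: %.0f%% of a Cray 1 CPU%n", 100 * relativeToCray(HERO_MFLOPS));
        System.out.printf("Motorola Droid: %.1fx a Cray 1 CPU%n", relativeToCray(DROID_MFLOPS));
    }
}
```

This gives about 68% (roughly two-thirds) for the Hero and about 15.3 times for the Droid.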

As with all benchmarks, however, you need to take this one with a pinch of salt. As explained on the Java Linpack page, ‘This [the Java version of the] test is more a reflection of the state of the Java systems than of the floating point performance of the underlying processors.’ In other words, the underlying processors of our mobile phones are probably faster than these Java-based tests imply.

1. Really enjoyed reading this post. Impressed by the capabilities of mobile.

2. I just tried to make sense of a number like 1.76 PFlops. I believe that if 1 Flop is equivalent to, say, Graeme Swann bowling a delivery in one second… then 1.76 PFlops is bowling the ball to Alpha Centauri. (Or thereabouts.)
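One way to read this analogy (an interpretation, not necessarily the commenter's) is that each floating-point operation is one delivery travelling the 22-yard (20.1168 m) length of a cricket pitch, so a 1.76 PFlops machine "bowls" 1.76 × 10¹⁵ pitch-lengths every second. A quick Java check of that reading, assuming 1 light year ≈ 9.4607 × 10¹⁵ m (names are illustrative):

```java
// Sketch of one reading of the bowling analogy: each flop = one delivery down a 22-yard pitch.
public class BowlingToAlphaCentauri {

    static final double PITCH_METRES = 20.1168;         // 22 yards
    static final double LIGHT_YEAR_METRES = 9.4607e15;  // approximate length of a light year

    /** Distance, in light years, covered by the given number of deliveries laid end to end. */
    static double lightYears(double deliveries) {
        return deliveries * PITCH_METRES / LIGHT_YEAR_METRES;
    }

    public static void main(String[] args) {
        // 1.76 PFlops sustained for one second = 1.76e15 "deliveries"
        System.out.printf("%.2f light years%n", lightYears(1.76e15));
    }
}
```

That comes to about 3.7 light years, a little short of the roughly 4.4 light years to Alpha Centauri, so "or thereabouts" is fair.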

3. I think you’re misreading the results on the websites you link to. The numbers you quote for the Cray-1 and ETA-10 are Linpack runs with n=100. It’s a small test, and thus gets very little of peak performance. I suspect the Linpack you’re running on Android is n=1000 or higher. If you look at the n=1000 results for old supercomputers, you’ll find that they get a much higher percentage of peak performance. While there’s no listing for the Cray-1, it would probably clock in at around 100 Mflop/s if you extrapolate from the X-MP.

The point is true that modern devices perform well compared to decades-old computers, but I think you’ve greatly exaggerated the difference.

4. The Linpack benchmark needs to be recompiled with the Android NDK for more accurate results…

5. Wikipedia quotes the Cray-1’s performance at 136 MFLOPS (sustained) and up to 250 MFLOPS (peak).

6. Android 2.2 supposedly sees a 2x to 5.5x speed-up due to improvements in Java execution. It would be interesting to see how any phone upgraded to 2.2 ranks on the charts.

7. Matt: “I just tried to make sense of a number like 1.76 PFlops… then 1.76 PFlops is bowling the ball to Alpha Centauri. (Or thereabouts.)”

And that made it totally clear to you? It made me boggle even harder.

8. Very true; I was quite surprised by your post and looked it up myself to check. Still, you have to consider that these types of computers were specifically optimized for these types of computations, since that was usually what they were for anyway: scientific/military number crunching… I notice it said that the Cray 1 was a 64-bit machine and had multiple memory paths, so that may help.

Also, since I think these computers used n=100 for their performance figures, it wasn’t a standardized test. It basically sounds like they just gave it a specific type of computation to solve, and the engineers were allowed to use whatever they wanted and optimize whatever they needed to get the best results… not a luxury really afforded to us today.

9. Matt: “I just tried to make sense of a number like 1.76 PFlops… then 1.76 PFlops is bowling the ball to Alpha Centauri. (Or thereabouts.)”

“And that made it totally clear to you? It made me boggle even harder.”

It made it clear how mind-buggeringly huge that number is. When I said “make sense of” I meant “find a way to conceptualize/visualize”.

10. I didn’t know anything about these benchmarks before now, but there are some nice papers on how they work and on how the benchmarks have had to evolve over time with changing technology and architecture. (I thought this one was the pick of the bunch: http://www.mathworks.com/company/newsletters/news_notes/pdf/sumfall94cleve.pdf.)

Maybe a larger matrix size would perform better on a Cray-1 in terms of raw calculations, but according to Wikipedia (which is of course always accurate…!) the Cray-1 CPU had no data cache (although it did have a lot of registers, amounting to maybe 4k?), so perhaps the overhead of shunting additional data around slow memory to process a larger matrix would offset any advantage in raw number crunching. Sure, Wikipedia quotes sustained 136 and peak 250 MFLOPS… but I’m surprised no sources or citations are given, so we don’t know anything about the nature of the processing involved, and so we don’t know how to compare it with what the benchmark is doing. The kind of ARM processors in Android handsets might have 32KB of cache memory available to hold data for the CPU.

After having a good nose around Wikipedia articles and the like, I found a few more interesting comparisons:
– The Cray-1A and the HTC Hero technically have the same number of CPUs – just one. The Cray’s CPU is that tower full of logic boards and cabling.
– All-in, the HTC Hero weighs 135g, whereas the Cray-1A weighed in at 5.5 tons (about 37,000 times as much).
– The Cray, with all its paraphernalia, had a power consumption of around 230kW. I’ve not found exact figures for a Hero, but 10mW is likely a conservative estimate, meaning that you could power around twenty-three million phones or one Cray-1A.
– The cheapest Cray-1 sold cost $5 million in 1980s money. HTC Hero: free! OK, OK, you need to pay for a contract. I wonder what the maintenance contract cost on one of those supercomputers?
– The Cray-1A had one million words of main memory, which I reckon is about 8 megabytes on a 64-bit system. The Hero has 288MB of main RAM – maybe it’s not fair to count the 32GB SD Card…!
– The Cray didn’t have a touch-screen, accelerometers, wifi or a camera! :) In fact the Cray needed a minicomputer to be plugged into it to operate it.
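The arithmetic in these comparisons can be checked with a few lines of Java. The sketch assumes the "5.5 tons" are US short tons of 2,000 lb, which is what the 37,000 figure implies (metric tonnes would give about 41,000), treats "one million words" as 2^20 64-bit words, and uses the commenter's 10 mW phone figure as an estimate rather than a measurement. Names are illustrative.

```java
// Sketch: re-checking the Cray-1A vs HTC Hero comparisons with the figures quoted above.
public class CrayVsHero {

    static final double HERO_GRAMS = 135;
    static final double CRAY_SHORT_TONS = 5.5;             // assumption: US short tons
    static final double GRAMS_PER_SHORT_TON = 907_184.74;  // 2,000 lb
    static final double CRAY_WATTS = 230_000;              // ~230 kW with all its paraphernalia
    static final double HERO_WATTS = 0.010;                // commenter's 10 mW estimate
    static final long CRAY_WORDS = 1L << 20;               // assumption: "one million words" = 2^20
    static final int BYTES_PER_WORD = 8;                   // 64-bit words

    /** How many times heavier the Cray-1A is than the phone. */
    static double weightRatio() {
        return CRAY_SHORT_TONS * GRAMS_PER_SHORT_TON / HERO_GRAMS;
    }

    /** How many phones the Cray-1A's power budget would run. */
    static double phonesPerCray() {
        return CRAY_WATTS / HERO_WATTS;
    }

    /** Cray-1A main memory in mebibytes. */
    static double crayMemoryMiB() {
        return CRAY_WORDS * BYTES_PER_WORD / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        System.out.printf("Weight ratio: about %,.0f%n", weightRatio());
        System.out.printf("Phones per Cray-1A: about %,.0f%n", phonesPerCray());
        System.out.printf("Cray-1A memory: %.0f MiB%n", crayMemoryMiB());
    }
}
```

Under those assumptions the figures come out at roughly 37,000 times the weight, 23 million phones per Cray-1A and 8 MiB of Cray main memory, so the arithmetic in the list holds up.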

Great post at any rate, it’s entertained me for ages and I think I even learned some stuff! (Hope I didn’t mess up my arithmetic in those comparisons!)

11. I’d like to see it running on mobile GPUs too. They will be supporting OpenCL very soon, so I think it’s fair to run the benchmark on both (CPU + GPU) and then sum the results, because they operate in parallel. After that the results will be much more impressive :P

12. A stimulating article, I thought – just the idea of trying to use a “phone” to perform a calculation for which a “supercomputer” would have been used just forty years ago is the stuff of science fiction. In fact, that’s literally true: only this morning, I was reading William Gibson’s short story “New Rose Hotel” (first published in 1981), in which he casually refers to a hand-held “Cray minicomputer”. It sounds as if that’s what we’ve all got in our pockets today.

13. Consider also that the US military classified these supercomputers as munitions and forbade their export to certain countries. When Sony’s PlayStation 2 came out, it fell into this category, and it is dramatically less powerful than the latest smartphones.

Due to a lack of floating-point hardware, desktop machines circa 1984 were capable of about 0.05 MFLOPS. Now they are in the 100,000-2,000,000 MFLOPS range, and mobile devices aren’t far behind. This is a staggering increase in capability. Since memory and I/O bandwidth haven’t scaled as quickly, there are all sorts of fascinating implications for software developers.

14. Some interesting points:

The Cray 1 is a vector processor (as such machines still are), which Intel’s chips are not; however, the PS3 has 9 vector processors.

The Cray 1 has a 1024-bit data bus that you can funnel data through 24/7.

You need a licence from the government to buy a Cray 1 as it is fast enough to do the calculations to make nuclear weapons.

A student at Imperial College wrote a real-time radiosity engine, bolted it onto Quake and used 23 minutes of their 30-minute yearly processing allocation before being kicked off.

The Cray 1 does not have a CPU; it is a CPU, which is why you need a minicomputer to control it. It is 70,000 NAND gates wired together. Unlike silicon, they pass electrons ultra-fast, and therefore the actual processing time is minute compared to silicon. Send it a problem and it spits out 90 GB of data, which you then spend a month analysing on your PC!

Good page, thanks for the article! :o)