2013 B-Sides San Francisco Talk Summary Series
Whenever I attend these talks, I always include a couple that are pure indulgence to keep me awake, sustain my enthusiasm, and broaden my knowledge. At DefCon there was one about using quantum physics for random key generation and another one using GPUs for massively parallel password cracking. Schuyler Towne’s locks talks are always a joy, and this talk fits nicely into that category. I really should say, “pure indulgence” is not entirely correct. While it is true that there will never be a one-domino causality chain from any of these indulgence talks I mentioned here to any security assessment code I might write for NTO, the stimulation of thought does seep into product and some things oblique to a particular software product like physics and numerical analysis do have a way of popping up in algorithms I write for the product.
What are side-channel attacks?
So first things first… I expect at least some of you, like me, had to look up “side-channel attacks.” There have been side channel attacks in the news recently, like the one last year where, as published in ThreatPost, a side channel attack was used to steal a cryptography key from co-locoated virtual machines. Wikipedia defines a side channel attack as “any attack based on information gained from the physical implementation of a cryptosystem, rather than brute force or theoretical weaknesses in the algorithms(compare cryptanalysis).” Side channel attacks have to do with measuring fluctuations in hardware and then intuiting the behaviour of an algorithm running on that hardware. Or, monitoring something related to the information you are pursuing and then doing further analysis of the monitored information to tease out the desired information.
Obtain RSA key by monitoring power usage, Passive methods
The first example the speaker addressed was ascertaining an RSA key by monitoring power usage of the CPU executing the algorithm. The RSA encryption algorithm bottom lines to a sequence of squares and multiplies. But the multiplies are executed only for 1-bits in the key. So what you see in the power graph is a sequence of spikes with time differentials between them that are proportional to whether or not a multiply was executed in that iteration and from this one can piece together the key. The countermeasure is to do a dummy multiply when the key bit is zero so each iteration does a square and multiply. This of course increases the execution time of the algorithm but it is also not a sure thing; the dummy multiply is still slightly different from the actual multiply though you do have to try harder to get the data. With this and other approaches the speaker discussed, a common denominator is that if you have alot of time with the device in question, you can simply do massive amounts of iterations and overwhelm subtleties with statistics.
Clarifying Statistics and Algorithms
Interesting related side note: I knew a guy on a previous job who did astronomical photography involving multiple all-night exposures of the subject being photographed (a galaxy in his case). It turns out that the more pictures you take of the same subject and then combine later, the more purturbances like atmospheric distortion are averaged out and the image becomes clearer. Statistics in general works like this. The persistent factors become ever more emergent and pronounced and the error ever smaller the more samples you take. Sometimes the algorithm such as ECDSA may power spike in such a way that you do not directly get the variable you are after but you get one of the variables in the formula and so with a bit of algebra and several iterations you can get what you are after. Also such things as the algorithm using 24 bit numbers and dealing with them 8 bits at a time can be used to analyse the power profile of the algorithm. Interestingly, the speaker said that even if the algorithm used 16 bit numbers, using an 8 bit approach gets you not as good but still usable correlations.
Side channel attacks – Active methods
That fairly accounts for the passive methods he discussed. He then went on to discuss active methods. These include glitching supply voltage, glitching the clock, and glitching the chip itself using powerful optical spikes. A well placed supply glitch introduces errors in the execution of the algorithm that can yield information as to the data it was dealing with when it errored. Clock glitches can cause the algorithm to skip instructions such as branches that can also produce useful data in the power signature. Optical glitches target specific parts of the chip with electromagnetic interference (light is an EM wave) which, again, can yield information via how they affect the running of the algorithm. Countermeasures to these techniques include inserting random waits before comparisons and doing multiple comparisons and requiring the results to be the same (being wary of compiler optimizations, i.e. turn them off).
As you would expect, these too can be circumvented but they make the attacker’s job harder. The data one gets from glitched execution of a crypto algorithm can in some cases be analysed by lattice methods. As the speaker said, he didn’t have time to fully elucidate this but in summary, one calculates a lattice and then calculate closest vector within that lattice (this is admittedly a glossover paraphrase of an admitted glossover to begin with) and it can be used to reconstruct crypto keys from the glitched and power-signatured algorithm.
This talk was most enjoyable to someone like me. In security, it is always valuable to be made to think about unexpected ways to acquire information since of course the more clever of the attackers are doing that. We have all noticed how computers have become orders of magnitude faster and more efficient. What once took hundreds of dollars worth of Cray time and about as much electrical power can now be done on a $300 computer for “too cheap to meter” electrical power. If you have ever designed anything around a 6502 chip, you know those old chips consume whatever power they consume nearly constantly regardless of what they are doing. This is not to say the methods elucidated in this talk would not work on a 6502 but modern chips that throttle themselves according to what they are doing greatly help these methods along compared to the old chips. The biggest software threat to security in the Apple-II days was getting a virus. On a computer that was not connected to the internet or any other communications net, not running services that listen for commands to execute, and barely fast/capacious enough to run the one program it was running, one didn’t worry about security much. But as we obsess on CSRF, XSS, SSL, SQLI, etc., we must remember that hardware has evolved with software and therefore hardware vulnerability has also evolved with software vulnerability.