The Center for Education and Research in Information Assurance and Security (CERIAS)

The Center for Education and Research in
Information Assurance and Security (CERIAS)

CERIAS Blog

Page Content

Follow-up on the CA Hack

Share:

Yesterday, I posted a long entry on the recent news about how some researchers obtained a “rogue” certificate from one of the Internet Certificate Authorities. There are some points I missed in the original post that should be noted.

     
  • The authors of the exploit have a very readable, interesting description of what they did and why it worked. I should have included a link to it in the original posting, but forgot to edit it in. The interested reader should definitely see that article online, include the animations.
  •  
  • There are other ways this attack can be defeated, certainly, but they are stop-gap measures. I didn’t explain them because I don’t view them as other than quick patches. However, if you are forced to continue to use MD5 and you issue certificates, then it is important to randomize the certificate serial number that is issued, and to insert a random delay interval in the validity time field. Both will introduce enough random bits so as to make this particular attack against the CA infeasible given current technology.
  •  
  • I suggested that vendors use another hash algorithm, and have SHA-1 as an example. SHA-2 would be better, as SHA-1 has been shown to have a theoretical weakness similar to MD5, although it has proven more resistant to attack to date. Use of SHA-1 could possible result in a similar problem within a few years (or, as suggested in the final part of my post, within a few weeks if a breakthrough occurs). However, use of SHA-1 would be preferable to MD5!
  •  
  • MD5 is not “broken” in a complete way. There are several properties of a message digest that are valuable, including collision resistance: that it is infeasible to end up with two inputs giving the same hash value. To the best of my knowledge, MD5 has only been shown to be susceptible to “weak collisions”—instances where the attacker can pick one or both inputs so as to produce identical hash values. The stronger form of preimage resistance, where there is an arbitrary hash output H and an attacker cannot form an input that also produces H, still holds for MD5. Thus, applications that depend on this property (including many signing applications and integrity tools) are apparently still okay.
  •  
  • One of our recent PhD grads, William Speirs, worked on defining hash functions for his PhD dissertation. His dissertation, Dynamic Cryptographic Hash Functions, is available online for those interested in seeing it.

I want to reiterate that there are more fundamental issues of trust involved than what hash function is used. The whole nature of certificates is based around how much we trust the certificate authorities that issue the certificates, and the correctness of the software that verifies those certificates then shows us the results. If an authority is careless or rogue, then the certificates may be technically valid but not match our expectations for validity. If our local software (such as a WWW browser) incorrectly validates a certificate, or presents the results incorrectly, we may trust a certificate we shouldn’t. Even such mundane issues as having one’s system at the correct time/date can be important: the authors of this particular hack demonstrated that by backdating their rogue certificate.

My continuing message to the community is to not lose sight of those things we assume. Sometimes, changes in the world around us render those assumptions invalid, and everything built on them becomes open to question. If we forget those assumptions—and our chains of trust built on them—we will continue to be surprised by the outcomes.

That is perhaps fitting to state (again) on the last day of the year. Let me observe that as human beings we sometimes take things for granted in our lives. Spend a few moments today (and frequently, thereafter) to pause and think about the things in your life that you may be taking for granted: family, friends, love, health, and the wonder of the world around you. Then as is your wont, celebrate what you have.

Best wishes for a happy, prosperous, safe—and secure—2009.



[12/31/08 Addition]: Steve Bellovin has noted that transition to the SHA-2 hash algorithm in certificates (and other uses) would not be simple or quick. He has written a paper describing the difficulties and that paper is online.

 

A Serious Threat to Online Trust

Share:

There are several news stories now appearing (e.g., Security News) about a serious flaw in how certificates used in online authentication are validated. Ed Felten gives a nice summary of how this affects online WWW site authentication in his Freedom to Tinker blog posting. Brian Krebs also has his usual readable coverage of the problem in his Washington Post article. Steve Bellovin has some interesting commentary, too, about the legal climate.

Is there cause to be concerned? Yes, but not necessarily about what is being covered in the media. There are other lessons to be learned from this.

Short tutorial

First, for the non-geek reader, I’ll briefly explain certificates.

Think about how, online, I can assure myself that the party at the other end of a link is really who they claim to be. What proof can they offer, considering that I don’t have a direct link? Remember that an attacker can send any bits down the wire to me and may access to faster computers than I do.

I can’t base my decision on how the WWW pages appear, or embedded images. Phishing, for instance, succeeds because the phishers set up sites with names and graphics that look like the real banks and merchants, and users trust the visual appearance. This is a standard difficulty for people—understanding the difference between identity (claiming who I am) and authentication (proving who I am).

In the physical world, we do this by using identity tokens that are issued by trusted third parties. Drivers licenses and passports are two of the most common examples. To get one, we need to produce sufficient proof of identity to a third party to meet its standards of proof. Then, the third party issues a document that is very difficult to forge (almost nothing constructed is impossible to forge or duplicate—but some things require so much time and expenditure it isn’t worthwhile). Because the criteria for proof of identity and strength of construction of the document are known, various other parties will accept the document as “proof” of identity. Of course, other problems occur that I’m not going to address—this USACM whitepaper (of which I was principal author) touches on many of them.

Now, in the online world we cannot issue or see physical documents. Instead, we use certificates. We do this by putting together an electronic document that gives the information we want some entity to certify as true about us. The format of this certificate is generally fixed by standards, the most common one being the X.509 suite. This document is sent to an organization known as a Certificate Authority (CA), usually along with a fee. The certificate authority is presumably well-known, and performs a check (to their own standards) that the information in the document is correct, and it has the right form. The CA then calculate a digital hash value of the data, and creates a digital signature of that hash value. This is then added to the certificate and sent back to the user. This is the equivalent of putting a signature on a license and then sealing it in plastic. Any alteration of the data will change the digital hash, and a third party will find that the new hash and the hash value signed with the key of the CA don’t match. The reason this works is that the hash function and encryption algorithm used are presumed to be so computationally difficult to forge that it is basically not possible.

As an example of a certificate , if you visit “https://www.cerias.purdue.edu” you can click on the little padlock icon that appears somewhere in the browser window frame (this is browser dependent) to view details of the CERIAS SSL certificate.

You can get more details on all this by reading the referenced Wikipedia pages, and by reading chapters 5 & 7 in Web Security, Privacy and Commerce.

Back to the hack

In summary, some CAs have been negligent about updating their certificate signing mechanisms in the wake of news that MD5 is weak, published back in 2004. The result is that malicious parties can generate and obtain a certificate “authenticating” them as someone else. What makes it worse is that the root certificate of most of these CAs are “built in” to browser and application trust lists to simplify look-up of new certificates. Thus, most people using standard WWW browsers can be fooled into thinking they have connected to real, valid sites—even through they are connecting to rogue sites.

The approach is simple enough: a party constructs two certificates. One is for the false identity she wishes to claim, and the other is real. She crafts the contents of the certificate so that the MD5 hash of the two, in canonical format, is the same. She submits the real identity certificate to the authority, which verifies her bona fides, and returns the certificate with the MD5 hash signed with the CA private key. Our protagonist then copies that signature to the false certificate, which has the same MD5 hash value and thus the same digital signature, and proceeds with her impersonation!

What makes this worse is that the false key she crafts is for a secondary certificate authority. She can publish this in appropriate places, and is now able to mint as many false keys as she wishes—and they will all have signatures that verify in the chain of trust back to the issuer! She can even issue these new certificates using a stronger hash algorithm than MD5!

What makes this even worse is that it has been known for years that MD5 is weak, yet some CAs have continued to use it! Particularly unfortunate is the realization that Lenstra, Wang and de Weger described how this could be done back in 2005. Methinks that may be grounds for some negligence lawsuits if anyone gets really burned by this….

And adding to the complexity of all this is the issue of certificates in use for other purposes. For example, certificates are used with encrypted S/MIME email to digitally sign messages. Certificates are used to sign ActiveX controls for Microsoft software. Certificates are used to verify the information on many identity cards, including (I believe) government-issued Common Access Cards (CAC). Certificates also provide identification for secured instant messaging sessions (e.g., iChat). There may be many other sensitive uses because certificates are a “known” mechanism. Cloud computing services , software updates, and more may be based on these same assumptions. Some of these services may accept and/or use certificates issued by these deficient CAs.

Fixes

Fixing this is not trivial. Certainly, all CAs need to start issuing certificates based on other message digests, such as SHA-1. However, this will take time and effort, and may not take effect before this problem can be exploited by attackers. Responsible vendors will cease to issue certificates until they get this fixed, but that has an economic impact some many not wish to incur.

We can try to educate end-users about this, but the problem is so complicated with technical details, the average person won’t know how to actually make a determination about valid certificates. It might even cause more harm by leading people to distrust valid certificates by mistake!

It is not possible to simply say that all existing applications will no longer accept certificates rooted at those CAs, or will not accept certificates based on MD5: there are too many extant, valid certificates in place to do that. Eventually, those certificates will expire, and be replaced. That will eventually take care of the problem—perhaps within the space of the next 18 months or so (most certificates are issued for only a year at a time, in part for reasons such as this).

Vendors of applications, and especially WWW browsers, need to give careful thought about updates to their software to flag MD5-based certificates as deserving of special attention. This may or may not be a worthwhile approach, for the reason given above: even with a warning, too few people will be able to know what to do.

Bigger issue

We base a huge amount of trust on certificates and encryption. History has shown how easy it is to get implementations and details wrong. History has also shown how quickly things can be destabilized with advances in technology.

In particular, too many people and organizations take for granted the assumptions on which this vast certificate system is based. For instance, we assume that the hash/digest functions in use are computationally difficult to reverse or cause collisions. We also assume that certain mathematical functions underlying public/private key encryption are too difficult to reverse or “brute force.” However, all it takes is some new insight or analysis, or maybe new, affordable technology (e.g., practical quantum computing, or massively parallel computing) to violate those assumptions.

If you look at the way our systems are constructed, too little thought is given to what happens to existing infrastructure when something breaks. Designs can include compensating and recovery code, but doing so requires some cost in space or time. However, all too often people are willing to avoid the investment by putting off the danger to “if and when that happens.” Thus, we instance such as the Y2K problems and the issues here with potentially rogue CAs.

(I’ll note as an aside, that when I designed the original version of Tripwire and what became the Signacert product, I specifically included simultaneous use of several different message digest functions in different families for this very reason. I knew it was a matter of time before one or two were broken. I still believe that it is beyond reason to find files that will match multiple, different algorithms simultaneously.)

Another issue is the whole question of who we trust, and for what. As noted in the USACM whitepaper, authentication is always relative to a third party. How much do we trust those third parties? How much trust have we invested in the companies and agencies issuing certificates? Are they properly verifying identities? How good is there internal security? How do we know, and how much is at risk from our trust in those entities?

Let me leave you with a final thought. How do we know that this problem has not already been quietly exploited? The basic concept has been in the open literature for years. The general nature of this attack on certificates has been known for well over a decade, if not two. Given the technical and infrastructure resources available to national agencies and organized criminals, and given the motivation to use this hack selectively and quietly, how can we know that it is not already being used?


[Added 12/31/2008]: A follow-up post to this one is available in the blog.

 

Word documents being used in new attacks

Share:

I have repeatedly pointed out (e.g., this post) to people that sending Word files as attachments is a bad idea. This has been used many, many times to circulate viruses, worms, and more. People continue to push back because (basically) it is convenient for them. How often have we heard that convenience trumps good security (and good sense)?

Now comes this story of yet another attack being spread with Word documents.

There are multiple reasons why I don’t accept Word documents in email. This is simply one of the better reasons.

If you want to establish a sound security posture at your organization, one of the things you should mandate is no circulation of executable formats—either out or in. “.doc” files are in this category. I am unsure if the new “.docx” format is fully immune to these kinds of things but it seems “.rtf” is.

 

Rethinking computing insanity, practice and research

Share:

[A portion of this essay appeared in the October 2008 issue of Information Security magazine. My thanks to Dave Farber for a conversation that spurred me to post this expanded version.]

[Small typos corrected in April 2010.]

I’d like to repeat (portions of) a theme I have been speaking about for over a decade. I’ll start by taking a long view of computing.

Fifty years ago, IBM introduced the first all-transistor computer (the 7000 series). Transistors were approximately $60 apiece (in current dollars). Secondary storage was about 10 cents per byte (also in current dollars) and had a density of approximately 2000 bits per cubic inch. According to Wikipedia, a working IBM 7090 system with a full 32K of memory (the capacity of the machine) cost about $3,000,000 to purchase—over $21,000,000 in current dollars. Software, peripherals, and maintenance all cost more. Rental of a system (maintenance included) could be well over $500,000 per month (in 1958 dollars). Other vendors soon brought their own transistorized systems to market, at similar costs.

These early computing systems came without an operating system. However, the costs of having such a system sit idle between jobs (and during I/O) led the field to develop operating systems that supported sharing of hardware to maximize utilization. It also led to the development of user accounts for cost accounting. And all of these soon led to development of security features to ensure that the sharing didn’t go too far, and that accounting was not disabled or corrupted. As the hardware evolved and became more capable, the software also evolved and took on new features.

Costs and capabilities of computing hardware have changed by a factor of tens of millions in five decades. Currently, transistors cost less than 1/7800 of a cent apiece in modern CPU chips (Intel Itanium). Assuming I didn’t drop a decimal place, that is a drop in price by 7 orders of magnitude.  Ed Lazowska made a presentation a few years ago where he indicated that the number of grains of rice harvested worldwide in 2004 was ten quintillion—10 raised to the 18th power. But in 2004, there were also ten quintillion transistors manufactured, and that number has increased faster than the rice harvest ever since. We have more transistors being produced and fielded each year than all the grains of rice harvested in all the countries of the world. Isn’t that amazing?

Storage also changed drastically. We have gone from core memory to semiconductor memory. And in secondary storage we have gone from drum memory to disks to SSDs. If we look at consumer disk storage, it is now common to get storage density of better than 500Gb per cubic inch at a cost of less than $.20 per Gb (including enclosure and controller)—a price drop of nearly 8 orders of magnitude. Of course, weight, size, speed, noise, heat, power, and other factors have all also undergone major changes. To think of it another way, that same presentation by Professor Lazowska, noted that the computerized greeting cards you can buy at the store to record and play back a message to music have more computing power and memory in them than some of those multi-million $ computers of the 1950s, all for under $10.

Yet, despite these incredible transformations, the operating systems, databases, languages, and more that we use are still basically the designs we came up with in the 1960s to make the best use of limited, expensive, shared equipment. More to the theme of this blog, overall information security is almost certainly worse now than it was in the 1960s. We’re still suffering from problems known for decades, and systems are still being built with intrinsic weaknesses, yet now we have more to lose with more valuable information coming online every week.

Why have we failed to make appreciable progress with the software? In part, it is because we’ve been busy trying to advance on every front. Partially, it is because it is simpler to replace the underlying hardware with something faster, thus getting a visible performance gain. This helps mask the ongoing lack of quality and progression to really new ideas. As well, the speed with which the field of computing (development and application) moves is incredible, and few have the time or inclination to step back and re-examine first principles. This includes old habits such as the sense of importance in making code “small” even to the point of leaving out internal consistency checks and error handling. (Y2K was not a one-time fluke—it’s an instance of an institutional bad habit.)

Another such habit is that of trying to build every system to have the capability to perform every task. There is a general lack of awareness that security needs are different for different applications and environments; instead, people seek uniformity of OS, hardware architecture, programming languages and beyond, all with maximal flexibility and capacity. Ostensibly, this uniformity is to reduce purchase, training, and maintenance costs, but fails to take into account risks and operational needs. Such attitudes are clearly nonsensical when applied to almost any other area of technology, so it is perplexing they are still rampant in IT.   

For instance, imagine buying a single model of commercial speedboat and assuming it will be adequate for bass fishing, auto ferries, arctic icebreakers, Coast Guard rescues, oil tankers, and deep water naval interdiction—so long as we add on a few aftermarket items and enable a few options. Fundamentally, we understand that this is untenable and that we need to architect a vessel from the keel upwards to tailor it for specific needs, and to harden it against specific dangers. Why cannot we see the same is true for computing? Why do we not understand that the commercial platform used at home to store Aunt Bee’s pie recipes is NOT equally suitable for weapons control, health care records management, real-time utility management, storage of financial transactions, and more? Trying to support everything in one system results in huge, unwieldy software on incredibly complex hardware chips, all requiring dozens of external packages to attempt to shore up the inherent problems introduced by the complexity. Meanwhile, we require more complex hardware to support all the software, and this drives complexity, cost and power issues.

The situation is unlikely to improve until we, as a society, start valuing good security and quality over the lifetime of our IT products. We need to design systems to enforce behavior within each specific configuration, not continually tinker with general systems to stop each new threat. Firewalls, IDS, antivirus, DLP and even virtual machine “must-have” products are used because the underlying systems aren’t trustworthy—as we keep discovering with increasing pain. A better approach would be to determine exactly what we want supported in each environment, build systems to those more minimal specifications only, and then ensure they are not used for anything beyond those limitations. By having a defined, crafted set of applications we want to run, it will be easier to deny execution to anything we don’t want; To use some current terminology, that’s “whitelisting” as opposed to “blacklisting.” This approach to design is also craftsmanship—using the right tools for each task at hand, as opposed to treating all problems the same because all we have is a single tool, no matter how good that tool may be. After all, you may have the finest quality multitool money can buy, with dozens of blades and screwdrivers and pliers. But you would never dream of building a house (or a government agency) using that multitool. Sure, it does a lot of things passably, but it is far from ideal for expertly doing most complex tasks.

Managers will make the argument that using a single, standard component means it can be produced, acquired and operated more cheaply than if there are many different versions. That is often correct insofar as direct costs are concerned. However, it fails to include secondary costs such as reducing the costs of total failure and exposure, and reducing the cost of “bridge” and “add-on” components to make items suitable. Smaller and more directed systems need to be patched and upgraded far less often than large, all-inclusive systems because they have less to go wrong and don’t change as often. There is also a defensive benefit to the resulting diversity: attackers need to work harder to penetrate a given system because they don’t know what is running. Taken to an extreme, having a single solution also reduces or eliminates real innovation as there is no incentive for radical new approaches; with a single platform, the only viable approach is to make small, incremental changes built to the common format. This introduces a hidden burden on progress that is well understood in historical terms—radical new improvements seldom result from staying with the masses in the mainstream.

Therein lies the challenge, for researchers and policy-makers. The current cyber security landscape is a major battlefield. We are under constant attack from criminals, vandals, and professional agents of governments. There is such an urgent, large-scale need to simply bring current systems up to some bare minimum that it could soak up way more resources than we have to throw at the problems. The result is that there is a huge sense of urgency to find ways to “fix” the current infrastructure. Not only is this where the bulk of the resources is going, but this flow of resources and attention also fixes the focus of our research establishment on these issues, But when this happens, there is great pressure to direct research towards the current environment, and towards projects with tangible results. Program managers are encouraged to go this way because they want to show they are good stewards of the public trust by helping solve major problems. CIOs and CTOs are less willing to try outlandish ideas, and cringe at even the notion of replacing their current infrastructure, broken as it may be. So, researchers go where the money is—tried and true, incremental, “safe” research.

We have crippled our research community as a result. There are too few resources devoted to far-ranging ideas that may not have immediate results. Even if the program managers encourage vision, review panels are quick to quash it. The recent history of DARPA is one that has shifted towards immediate results from industry and away from vision, at least in computing. NSF, DOE, NIST and other agencies have also shortened their horizons, despite claims to the contrary. Recommendations for action (including the recent CSIS Commission report to the President) continue this by posing the problem as how to secure the current infrastructure rather than asking how we can build and maintain a trustable infrastructure to replace what is currently there.

Some of us see how knowledge of the past combined with future research can help us have more secure systems. The challenge continues to be convincing enough people that “cheap” is not the same as “best,” and that we can afford to do better. Let’s see some real innovation in building and deploying new systems, languages, and even networks. After all, we no longer need to fit in 32K of memory on a $21 million computer. Let’s stop optimizing the wrong things, and start focusing on discovering and building the right solutions to problems rather than continuing to try to answer the same tired (and wrong) questions. We need a major sustained effort in research into new operating systems and architectures, new software engineering methods, new programming languages and systems, and more, some with a (nearly) clean-slate starting point. Small failures should be encouraged, because they indicate people are trying risky ideas. Then we need a sustained effort to transition good ideas into practice.

I’ll conclude with s quote that many people attribute to Albert Einstein, but I have seen multiple citations to its use by John Dryden in the 1600s in his play “The Spanish Friar”:

  “Insanity: doing the same thing over and over again expecting different results.”

What we have been doing in cyber security has been insane. It is past time to do something different.

[Added 12/17: I was reminded that I made a post last year that touches on some of the same themes; it is here.]