Monday, March 23, 2009

Matthew Caesar's talk on Trustworthy Internet Infrastructure

On 11th March, 2009, I attended UIUC CS Assistant Prof. Matthew Caesar's talk on trustworthy Internet Infrastructure. This was part of the ITI/UIUC TSS Seminar Series.

Matt talked about how to make the Internet routing system trustworthy. Internet infrastructure is very complex. Some people consider it as the most complex thing ever designed. And modeling internet operation is difficult ...

Because of the enormous cost of changing the way the Internet operates, traditional approaches have been to focus on incremental workarounds. Any radical redesign would be useless as it is not practical to deploy.

However, the quick hacks are not clearly working, as we face many problems.

Matt pondered, what if we could start over? Start from scratch and focus on a clean slate approach? He argued that any redesign is not going to happen because of the enormous task of changing routers. So, the best we can do is to work with the existing schemes and make them trustworthy.

Caesar talked about 3 schemes towards trustworthy Internet. The first is easier network monitoring, second, making making network operations/debugging/monitoring simpler to design. and third, making network more available and resilient, with high uptimes.

Among these, Matt talked about Bug tolerant networks.

Building bug tolerant networks with bug tolerant routers

Today's routers have a lot of complex software running on them. Because of the complexity, bugs abound, causing a string of high profile vulnerabilities, outages. Like any complex software , complexity is the cause of bugs, which cause serious issues, that propagate , ISPs often run routers in a way that errors are cascaded.

A recent example was Misconfiguration at AS 47868, Supronet, that started as a BGP error. BGP routing updates, has path vectors, A particular isp supronet misconfigured their routers , a typo, they were running a mikrotik router, this vendor has a bug in router, and cisco also has a bug in their OS, long AS path would trigger buffer overflow. FEb 16, 2009, 1/3 of internet was slow.

Challenges of router bugs

Bugs are different from traditional failures. fail stop failures cause routers to stop working. Bugs cause misbehavior, violate protocol, need vendor repair, exploitable by attackers.

Vikram Adve asked what code was the problem -- can a malicious guy write bad code to run in the router. Caesar replied that's not the main issue they are concerned with. Carl Gunter asked how similar this is to active networking, it seems like a similar concept. Caesar said this is similar to active networks.

Solving the bug problem

Caesar's main research here focuses on how to run multiple copies of router software on the router. A single copy/implementation may have bugs, but with voting among multiple different implementations, bugs can be filtered out.

So, he and his team built a router that had 3 routing software daemons (Quaga, zorp, etc.) running in parallel. These run on top of a hypervisor. After these run, a voting tool decides the final result. The hypervisor gives the illusion of a single routing daemon.

Nitin Vaidya asked, does Caesar assume that similar specs used by all the implementations? Has Caesar thought if two BGP implementations can have different results? Caesar replied that yes, because there are some non-determinism, so results may vary between implementations.

Bill Sanders asked, there are lots of work in voting issues in faut tolerant computing, . Caesar said, the non-determinism is disabled by ISP operators. because for traffic engineering, they need to make things deterministic.

Majoz (?) asked what you do when you have a buggy daemon ... do you remove it? Caesar said they haven't thought of it, but you can patch/update a daemon, or give weights.

Someone asked how much overhead is there? Caesar said update has some additional latency, they implemented it, overheads are 10s of milliseconds to process. But that's OK, because routers only exchange updates every several seconds.

Carl Gunter remarked that most problems are caused by misconfiguration, so what if someone misconfigured all 3 of the daemons. Caesars said that, configuration issues are orthogonal to their research.

Someone asked about the overhead with additional instances of daemons. Caesar replied that each daemons can be run on a separate core in multicore proc (routers have desktop like processors).

Bill Sanders asked if Matt knew about papers or reports that document quantifiably how diverse different copies of software are to produce the same things. Caesar commented it is similar to Knight-Levenson experiment, but Sanders said that was a 1985 experiment done with a few group of students. He then pondered, how much diversity do you need here?

Anecdotal evidence by Bill Sanders - they had a big DARPA projects, two different operating systems, turned off all unneeded service. and then looked how many advisories are common to both operating systems. the answer was 60%. Is it good or bad number? They don't know.

Carl Gunter said that there is a much better chance of replicating the knight experiment because the code size is small. Sanders commented what fraction of error will be caught by the small piece of errors doing voting.

Vikram Adve said: if you are running 3 copies of code, then isn't performance reduced by a factor of 3. Caesar said the bottleneck is in the hardware in data plane.

When sending updates to neighbors, the router also uses voting to choose the update to advertise.

Finally, Matt presented the research questions.
  • how to do voting?
  • nature o routing bugs?
  • how to achieve diversity?

They also looked into different voting strategies:

Wait for consensus: output majority. Downside = wait for consensus to happen, that slows down updates.

Master slave - always output master's answer, but switch to slave on buggy behavior.

Another aspect Matt had little time to talk about was the Study of nature of router bugs. They looked at Bugzilla bug db, and characterized bugs, most are from se fault/crash/freeze.

Wrong behavior (Bugs) = 43%, router crash = 57%.

There was a big diversity across virtual routers, different techniques, code bases, software versions, configurations,

And then he ran out of time.

My take: Interesting idea. But the problem is, often all different implementations of a particular algorithm or routing scheme reuse some components (e.g. random number generator? Some existing common library?). Often, security vulnerabilities are in the common library code, rather than in any particular implementation. How much dependability do you get by using 3 different copies? I'll be very interested in looking into more numbers. I hope Matt's paper on this has more experimental results.

My take2: And obviously, I'm interested to know how secure provenance can solve the malicious router update problem. It seems that by considering provenance of an update, it may be possible to filter out bad/tampered update info.

Sunday, March 08, 2009

On Secure Provenance and the logic behind the threat model

In our USENIX FAST 2009 paper (the "Picasso" paper), we discussed a scheme for providing integrity and confidentiality assurances to provenance of files. While this is a good first step towards securing provenance, I think there are many more issues we need to resolve.

These days, I see many security related papers advocating this or that scheme to secure objects. However, I really don't buy anything that claims to solve problems by having access control or policies. Here is why: access control works fine if the system is centralized, or the sysadmin of the system is incorruptible. However, when you have a distributed system with no control over other principals/their systems, OR when even sysadmins may become an attacker, there is no guarantee that access control constraints will be honored.

So, in the "REAL World", we can't claim to have a system that will prevent attacks from happening. With enough money, even trusted hardware devices can be breached (my co-advisor Radu Sion likes to stress on this point ... nothing is invincible). So, what can we do? We can't prevent someone from lying about themselves, or from deleting / changing things in their possession. What we CAN do is to prevent people from lying about others (i.e. "honest" others). This is exactly what guarantee we provide in our Secure Provenance work ... we prevent people from undetectably "invent" history involving other honest people.

To give a real life analogy, suppose a forger has painted a fake Picasso painting. The forger benefits here by taking his fake Picasso, and then inventing a fake history / provenance record involving his painting. He must have some honest buyers / art galleries listed in the provenance, otherwise, if the provenance only lists his cronies, it won't be believed.

The forger will NEVER do the opposite thing, i.e. take a real Picasso, and then remove its provenance and claim it to be painted by him. :)

The analogy applies to many scenarios involving data. I won't claim that it applies to all cases ... there are scenarios where the adversary might want to claim something as his own. An example would be the case of copyright disputes ... imagine two scientists bickering over who discovered something. But in most cases, the forger's goal with data is just like real life objects ... the forger wants to pass off something as what it's not ... so he needs a fake history, and that fake history must involve "honest" principals.

There are tons of issues to solve in order to have secure provenance ... but I'll write more about them later.

BTW, the painting shown above is a "real" Picasso, it is the painting titled "Dora Maar au Chat" (Dora Maar with cat). It is one of the most expensive paintings in the world; it was auctioned off in 2004 for $95 million!! Now, that has got to be the costliest painting of a cat!!

Thursday, March 05, 2009

Ouri Wolfson's talk on using IT for Intelligent Transportation

This afternoon, I listened to Prof. Ouri Wolfson of UIC talk at the DAIS seminar about using IT in managing transportation in an intelligent manner. The talk title was Information Technology and Intelligent Transportation - A Marriage Made in Heaven.

Prof. Wolfson presented some futuristic ideas ... if vehicles could talk to each other, if things such as vehicle speed, road congestion, brake malfunction etc. could be exchanged between adjacent vehicles, and then propagated to a larger area, what things can we do? Wolfson argued that we could do a LOT. For example, if a road is congested, and our car learned that, we can take an alternate route. If suddenly a vehicle in front of a my car lost control, my car could take proactive steps to avoid an accident.

There are of course sensors showing traffic congestion, but those are usually deployed at main highways. Wolfson's vision is to create a vehicular p2p system that will work for arterial roads as well. There are of course many issues such as how cars exchange and propagate the info.

Prof. Wolfson showed a small box-device that can collect vehicle movement/location info, and then transmit that to other such devices. The vision is to have these devices deployed on a large number of cars, and then beam the aggregated system state to all of these devices. The device can then show a map, and provide situational awareness to the entire set of vehicles.

The work is part of the IGERT initiative, for intelligent transportation.

My take: As a security researcher, the first thing that comes into my mind is how we would handle security issues. Thinking like an attacker, I see a lot of ways to subvert the system. For example, how do we know that the boxes are not lying and feeding fake data? If I want other vehicles to erroneously believe a side road is congestion free, move over there, and hence clear the road ahead of me, shouldn't I (or a lot of people) be tempted to do that?

Besides lying for personal benefits, there may be more sinister reasons to cheat/subvert the system. Attackers can also cause congestion, and jam critical highways. Attackers can also feed fake "accident" information to cause nearby vehicles take emergency evasive action, which itself can cause more problems.

When we take information from a mass of potentially untrusted sources, we have to be careful from the outset. Unless we can ensure the provenance of information, and also verify trustworthiness of things we get fed from others, it is better not to look at the info.

People don't even have to be malicious ... a vehicular sensor unit can malfunction ... how do we ensure it won't cause a ripple effect and mess up the entire system?

I won't even touch on the "Privacy" issues i.e. Big brother like monitoring of cars ... though this is a big concern as well.

Of course, the networking and data propagation issues are quite interesting by their own right, and that's where Ouri Wolfson focused on the talk. Quite interesting.