02 Oct 2015

FatTire: Declarative Fault Tolerance for Software Defined Networks

HotSDN’13 author site pdf

Abstract: FatTire is a new language for writing fault-tolerant network programs. The central feature of this language is a new programming construct based on regular expressions that allows developers to specify the set of paths that packets may take through the network as well as the degree of fault tolerance required. This construct is implemented by a compiler that targets the in-network fast-failover mechanisms provided in recent versions of the OpenFlow standard, and facilitates simple reasoning about network programs even in the presence of failures. We describe the design of FatTire, present algorithms for compiling programs to OpenFlow switch configurations, describe our prototype FatTire implementation, and demonstrate its use on simple examples.

Introduction

mechanisms (20) that allow routers and switches to respond rapidly to failures, restoring connectivity within tens of milliseconds.

high-level constructs that allow SDN programmers to specify distinct policy concerns, such as forwarding, performance, security, and fault-tolerance. In addition, SDN programmers should be able to reason about the interactions between those constructs when they are combined in a single program.

2 Programming fault tolerance

e.g., the network continues forwarding SSH traffic even if a single link fails

… fault-tolerance property, each of the links in this primary path also needs a backup …
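
To make the "every link on the primary path needs a backup" requirement concrete, here is a minimal Python sketch. It is not taken from the paper: the topology, the helper names `reachable` and `unprotected_links`, and the simplification that a backup is any alternative route from the switch just upstream of the failed link are all assumptions made for illustration. It also ignores whether the backup route still satisfies the path expression.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Topology = Dict[str, Set[str]]  # switch name -> neighbouring switches


def reachable(topo: Topology, src: str, dst: str, failed: Tuple[str, str]) -> bool:
    """Breadth-first search from src to dst that ignores one failed (undirected) link."""
    dead = {failed, failed[::-1]}
    seen = {src}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in topo[node]:
            if (node, nxt) not in dead and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def unprotected_links(topo: Topology, primary: List[str]) -> List[Tuple[str, str]]:
    """Links of the primary path whose failure strands the upstream switch."""
    dst = primary[-1]
    return [
        (u, v)
        for u, v in zip(primary, primary[1:])
        if not reachable(topo, u, dst, (u, v))
    ]


# Hypothetical topology (an assumption, not the paper's figure).
topo: Topology = {
    "GW": {"S1", "S2"},
    "S1": {"GW", "IDS"},
    "S2": {"GW", "IDS"},
    "IDS": {"S1", "S2", "A"},
    "A": {"IDS"},
}
primary = ["GW", "S1", "IDS", "A"]
print(unprotected_links(topo, primary))  # [('IDS', 'A')]
```

In this made-up topology the links GW-S1 and S1-IDS can be routed around, but IDS-A is the only way into A, so no 1-link-fault-tolerant path exists for it.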

Fast failover: OpenFlow 1.3 (rule table and group table) supports conditional rules whose forwarding behavior depends on the local state of the switch. The group table contains entries whose rules include “an ordered list of action buckets” … provide the ability to define multiple forwarding behaviors.
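
The selection rule of a fast-failover group can be sketched in a few lines of Python. This is only a model of the semantics (the switch executes the first bucket whose watched port is live), not an OpenFlow API; the `Bucket` class, `select_bucket`, and the port numbers are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Set


@dataclass(frozen=True)
class Bucket:
    watch_port: int  # port whose liveness guards this bucket
    out_port: int    # port the packet is forwarded on if the bucket is chosen


def select_bucket(buckets: Sequence[Bucket], live_ports: Set[int]) -> Optional[Bucket]:
    """Return the first bucket whose watched port is live, or None if all are down."""
    for bucket in buckets:
        if bucket.watch_port in live_ports:
            return bucket
    return None  # no live bucket: the packet is dropped


# Primary forwarding on port 1, backup on port 2 (hypothetical ports).
group = [Bucket(watch_port=1, out_port=1), Bucket(watch_port=2, out_port=2)]

print(select_bucket(group, {1, 2}).out_port)  # 1 (primary is live)
print(select_bucket(group, {2}).out_port)     # 2 (fails over to the backup)
print(select_bucket(group, set()))            # None
```

Because the decision depends only on local port state, the switch can fail over without contacting the controller, which is where the fast recovery comes from.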

while fast-failover groups make it possible to implement rapid failure recovery, using them correctly places a heavy burden on the SDN programmer.

3 FatTire Language

central feature: regular expressions that specify sets of legal paths through the network, along with fault-tolerance requirements for those paths

atomic policy: Predicate => Path_exp with n

FatTire adds high-level constructs for paths and regular expressions, along with fault-tolerance annotations (the default annotation is 0). Semantically, intersecting two policies yields a policy whose paths are those described by both policies and whose fault tolerance is the maximum of the fault tolerance provided by the individual policies.
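
The intersection semantics can be illustrated with a small Python model. As a simplifying assumption, a policy here is just a finite set of explicit paths plus a fault-tolerance level; predicates are left out, and the `Policy` class, `intersect` function, and example paths are hypothetical rather than part of FatTire.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

Path = Tuple[str, ...]


@dataclass(frozen=True)
class Policy:
    paths: FrozenSet[Path]  # legal paths (finite here; regular sets in FatTire)
    tolerance: int = 0      # number of link failures to survive, default 0


def intersect(p: Policy, q: Policy) -> Policy:
    """Keep the paths allowed by both policies and the stronger tolerance level."""
    return Policy(p.paths & q.paths, max(p.tolerance, q.tolerance))


via_ids = Policy(frozenset({("GW", "S1", "IDS", "A"), ("GW", "S2", "IDS", "A")}))
resilient = Policy(
    frozenset({("GW", "S1", "IDS", "A"), ("GW", "S2", "IDS", "A"), ("GW", "S1", "A")}),
    tolerance=1,
)

combined = intersect(via_ids, resilient)
print(sorted(combined.paths))  # only the two paths through IDS remain
print(combined.tolerance)      # 1
```

Taking the maximum means the combined policy never weakens a fault-tolerance requirement stated by either operand.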

4 The FatTire Compiler

  1. normalize the input policy into a union of atomic policies, each with non-overlapping predicates

       (tpDst = 22 => [*.IDS.*])
     /\(tpDst = 22 => [*] with 1)
     /\(any => [GW.*.A])
    

    normalized to

     (tpDst = 22 => [GW.*.IDS.*.A] with 1)
    
  2. construct a fault-tolerant forwarding graph for each atomic policy
  3. translate the forwarding graphs to policies in NetCore.
  4. compile the resulting policies to OpenFlow
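
The normalization in step 1 can be sanity-checked with a rough Python sketch that translates path expressions into ordinary regular expressions over switch names. This is not the FatTire compiler: it assumes that `*` stands for any (possibly empty) sequence of switches, that switch names contain no whitespace, and the helper names `compile_path_exp` and `matches` are made up for the example.

```python
import re
from typing import List


def compile_path_exp(exp: str) -> re.Pattern:
    """Translate a dotted path expression such as 'GW.*.A' into a Python regex."""
    parts = []
    for element in exp.split("."):
        if element == "*":
            parts.append(r"(?:\S+ )*")              # any sequence of hops
        else:
            parts.append(re.escape(element) + " ")  # exactly this switch
    return re.compile("".join(parts))


def matches(exp: str, path: List[str]) -> bool:
    """Check a concrete path (list of switch names) against a path expression."""
    encoded = "".join(hop + " " for hop in path)    # each hop followed by a space
    return compile_path_exp(exp).fullmatch(encoded) is not None


ORIGINALS = ["*.IDS.*", "*", "GW.*.A"]  # the three path expressions being intersected
NORMALIZED = "GW.*.IDS.*.A"             # the single expression after normalization

samples = [
    ["GW", "S1", "IDS", "S2", "A"],
    ["GW", "IDS", "A"],
    ["GW", "S1", "A"],   # skips the IDS
    ["S1", "IDS", "A"],  # does not start at GW
]
for path in samples:
    in_all = all(matches(exp, path) for exp in ORIGINALS)
    print(path, in_all, matches(NORMALIZED, path))
```

On these sample paths, matching all three original expressions agrees with matching the normalized one, which is what the intersection is expected to produce. The check covers only the path expressions, not the predicates or the `with 1` annotation.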

6 Evaluation

used iperf to transfer 100MB of data between a host attached to GW and one attached to A in the topology, and measured the time needed to complete the transfer: the transfer completion time achieved with fast failover, as enabled by FatTire, is only slightly higher than when no failure occurs.

integrating fault-tolerance and traffic engineering (19) could potentially be used in conjunction with FatTire abstractions.

a fault management approach similar to MPLS global path protection (7), which (the authors argue) should be part of OpenFlow: focus is on extending the OpenFlow switch software with end-to-end path monitoring capabilities.

reference