http://conferences.sigcomm.org/sigcomm/2015/pdf/papers/p227.pdf
Network middleboxes must offer high availability, with automatic failover when a device fails … challenging because failover must correctly restore lost state (e.g., activity logs, port mappings) but must do so quickly (e.g., in less than typical transport timeout values to minimize disruption to applications) and with little overhead to failure-free operation (e.g., additional per-packet latencies of 10-100s of μs).
Because middleboxes typically involve proprietary monolithic software running on dedicated hardware, they can be expensive to deploy and manage ... To rectify this situation, network operators are moving towards Network Function Virtualization (NFV), in which middlebox functionality is moved out of dedicated physical boxes into virtual appliances that can be run on commodity processors [32].
While the NFV vision solves the dedicated hardware problem, it presents some technical challenges of its own.
We argue that an equally important challenge — one that has received far less attention — is that of fault-tolerance.
traditional middleboxes ... limiting the introduction of faults ... will not apply to NFV: vendor diversity in hardware and applications will explode the test space
greater openness and agility in middlebox infrastructure
With current middleboxes, operators often maintain a dedicated per-appliance backup. This is inefficient and offers only a weak form of recovery for the many middlebox applications that are stateful — e.g., NAT …
dynamic state about flows, users, and network conditions
correct recovery from failures …
tailors the classic approach of rollback recovery to the middlebox domain and achieves correct recovery in a general and passive manner
FTMB: achieves rapid recovery (40-275ms for practical system configurations) while adding 30μs to median per-packet latencies