http://research.microsoft.com/en-US/people/mzh/statesman.pdf
Statesman allows multiple network management apps to operate independently, while maintaining network-wide safety and performance invariants
e.g., TE application + firmware upgrade: depending on which action happens first, either the TE application fails to create the tunnel (because the [machine] is already down), or the already-established tunnel ultimately drops traffic during the firmware upgrade.
thus, DCNs (datacenter networks) … need a way to keep the applications separate. … a monolithic application would be highly complex, and worse, it would need to be extended as new needs arise … coupling greatly increases application complexity
typically the “control loop”: each management application measures the state of the network, performs a computation, and then reconfigures the network. …
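A minimal sketch of the control loop described above: one iteration measures, computes, and reconfigures. The `FakeNetwork` class, the flat name-to-value state model, and the `upgrade_firmware` application are all hypothetical stand-ins for illustration, not anything defined in the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

State = Dict[str, str]  # hypothetical flat model: switch name -> firmware version


@dataclass
class FakeNetwork:
    """In-memory stand-in for a real network (illustrative only)."""
    state: State = field(default_factory=dict)

    def measure(self) -> State:
        # Return a copy so the application cannot mutate the network directly.
        return dict(self.state)

    def reconfigure(self, changes: State) -> None:
        self.state.update(changes)


def control_loop_step(net: FakeNetwork, compute: Callable[[State], State]) -> State:
    """One iteration of the loop: measure, compute, reconfigure."""
    observed = net.measure()
    changes = compute(observed)
    net.reconfigure(changes)
    return changes


def upgrade_firmware(observed: State) -> State:
    # Hypothetical application logic: move every switch still on "v1" to "v2".
    return {name: "v2" for name, version in observed.items() if version == "v1"}


net = FakeNetwork({"sw1": "v1", "sw2": "v2"})
changes = control_loop_step(net, upgrade_firmware)
```

Here each application writes to the network directly, which is exactly the setup in which independent loops can conflict.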
these applications can conflict with each other, even if they interact with the network at different levels, such as establishing network paths, assigning IP addresses to interfaces, or installing firmware on switches … running multiple … also raises the risk of network-wide failures … while each application alone is fine, their joint actions would disconnect the ToR (top-of-rack) …
explicit coordination among applications (Corybantic [23]) … as a general solution to the problem of co-existence, it would require each application to understand the intended network changes of all others. … worse, each time an app is changed or a new one is developed, DCN operators would need to test again, and potentially retrofit some existing applications … advocate … loosely coupled … conflict resolution and invariant enforcement should be handled by a separate management system.
exposing “controllability” to applications … denoting whether the parent state variable is currently controllable, and its value is computed by Statesman based on lower-level dependencies.
e.g., DeviceFirmwareVersion is controllable only if … switches’ power and admin states are appropriate. The firmware-upgrade application can work with DeviceFirmwareVersion only if it is controllable.
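The DeviceFirmwareVersion example could be sketched roughly as follows. The notes elide the exact conditions, so treating power `"on"` and admin `"up"` as the "appropriate" states is an assumption, and `SwitchState`, `firmware_controllable`, and `propose_firmware` are hypothetical names rather than Statesman's API.

```python
from dataclasses import dataclass


@dataclass
class SwitchState:
    power: str             # assumed values: "on" / "off"
    admin: str             # assumed values: "up" / "down"
    firmware_version: str


def firmware_controllable(sw: SwitchState) -> bool:
    """Controllability derived from lower-level state variables.

    Assumption: "appropriate" means powered on and administratively up.
    """
    return sw.power == "on" and sw.admin == "up"


def propose_firmware(sw: SwitchState, target: str) -> bool:
    """Apply the change only if the variable is currently controllable."""
    if not firmware_controllable(sw):
        return False
    sw.firmware_version = target
    return True


healthy = SwitchState(power="on", admin="up", firmware_version="v1")
powered_off = SwitchState(power="off", admin="up", firmware_version="v1")

ok = propose_firmware(healthy, "v2")
rejected = propose_firmware(powered_off, "v2")
```

The application never inspects power or admin state itself; it only reads the derived flag, which is the decoupling the notes describe.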
… minimum safety and performance requirements, independent of what applications are currently running.
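One way to picture an application-independent invariant is the ToR case mentioned earlier: each proposal alone is safe, but together they would disconnect a ToR. The sketch below assumes a hypothetical invariant (every ToR keeps at least one uplink switch up) and a simple first-come acceptance order; neither the topology model nor the acceptance policy is taken from the paper.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical topology model: ToR name -> aggregation switches it uplinks to.
Uplinks = Dict[str, Set[str]]


def tors_connected(uplinks: Uplinks, down: Set[str], min_up: int = 1) -> bool:
    """Assumed invariant: every ToR keeps at least `min_up` uplink switches up."""
    return all(len(aggs - down) >= min_up for aggs in uplinks.values())


def accept_proposals(
    uplinks: Uplinks,
    already_down: Set[str],
    proposals: List[Tuple[str, Set[str]]],
) -> List[str]:
    """Accept proposals in order, skipping any that would break the invariant."""
    down = set(already_down)
    accepted = []
    for app, switches in proposals:
        candidate = down | switches
        if tors_connected(uplinks, candidate):
            down = candidate
            accepted.append(app)
    return accepted


uplinks = {"tor1": {"agg1", "agg2"}}
proposals = [
    ("firmware-upgrade", {"agg1"}),    # wants to take agg1 down
    ("failure-mitigation", {"agg2"}),  # wants to take agg2 down
]
accepted = accept_proposals(uplinks, set(), proposals)
```

The check runs against the combined effect of all accepted changes, so it holds regardless of which applications happen to be running.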
split the monitoring responsibility across many monitor instances, so each instance covers roughly 1,000 switches … currently the monitors run periodically to collect all switches’ power states, firmware versions, device configurations, and various counters (and forwarding states for a subset of switches) …
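Splitting switches across monitor instances can be illustrated with simple fixed-size chunking. This is only a sketch: the notes give the rough figure of 1,000 switches per instance, but the `partition_switches` helper and the contiguous-chunk assignment are assumptions, not how Statesman is known to assign switches.

```python
from typing import List


def partition_switches(switches: List[str], per_instance: int = 1000) -> List[List[str]]:
    """Split the switch list into contiguous chunks, one per monitor instance."""
    if per_instance <= 0:
        raise ValueError("per_instance must be positive")
    return [
        switches[i:i + per_instance]
        for i in range(0, len(switches), per_instance)
    ]


# Hypothetical inventory of 2,500 switches.
inventory = [f"sw{i}" for i in range(2500)]
shards = partition_switches(inventory)
shard_sizes = [len(shard) for shard in shards]
```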
builds on a long line of prior work on SDN [1, 2, 3, 6, 8, 9, 22]. … in contrast, Statesman supports a wider range of network management functions (e.g., switch upgrade, link failure mitigation, elastic scaling, etc.) …
Mesos [11] schedules competing applications using the cluster-resource abstraction, which is quite different from our network-state abstraction (e.g., no cross-variable dependency).
Onix [18, 19] and Hercules [16] provide a shared network-state platform for all applications. But these systems neither resolve application conflicts, in particular those caused by state variable dependency, nor enforce network-wide invariants.
Corybantic (local copy) [23] proposes a different way of resolving conflicts …
[23] J. Mogul, A. AuYoung, S. Banerjee, J. Lee, J. Mudigonda, L. Popa, P. Sharma, and Y. Turner. Corybantic: Towards Modular Composition of SDN Control Programs. In ACM HotNets Workshop, November 2013.
[1] M. Caesar, D. Caldwell, N. Feamster, J. Rexford, A. Shaikh, and J. van der Merwe. Design and Implementation of a Routing Control Platform. In USENIX NSDI, May 2005.
[2] M. Casado, M. J. Freedman, J. Pettit, J. Luo, N. McKeown, and S. Shenker. Ethane: Taking Control of the Enterprise. ACM SIGCOMM CCR, 37(4):1–12, August 2007.
[3] M. Casado, T. Garfinkel, A. Akella, M. J. Freedman, D. Boneh, N. McKeown, and S. Shenker. SANE: A Protection Architecture for Enterprise Networks. In USENIX Security Symposium, July 2006.
[6] N. Feamster, J. Rexford, and E. Zegura. The Road to SDN. ACM Queue, 11(12):20:20–20:40, December 2013.
[8] A. Greenberg, G. Hjalmtysson, D. A. Maltz, A. Myers, J. Rexford, G. Xie, H. Yan, J. Zhan, and H. Zhang. A Clean Slate 4D Approach to Network Control and Management. ACM SIGCOMM CCR, 35(5):41–54, October 2005.
[9] N. Gude, T. Koponen, J. Pettit, B. Pfaff, M. Casado, N. McKeown, and S. Shenker. NOX: Towards an Operating System for Networks. ACM SIGCOMM CCR, 38(3):105–110, July 2008.
[22] N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford, S. Shenker, and J. Turner. OpenFlow: Enabling Innovation in Campus Networks. ACM SIGCOMM CCR, 38(2):69–74, March 2008.