10 Oct 2015

ORCHESTRA: Rapid, Collaborative Sharing of Dynamic Data

http://www.cidrdb.org/cidr2005/papers/P09.pdf

bottom-up collaborative data sharing, in which independent researchers or groups with different goals, schemas, and data can share information in the absence of global agreement. … peer-to-peer data sharing, which considers revision, disagreement, authority, and intermittent partition. … yields to others with greater authority.

the central problem … science evolves in a “bottom-up” fashion, resulting in a fundamental mismatch with top-down data integration methods. Scientists make and publish new discoveries, and others build upon and refine the most convincing work. Science does not revisit the global models after every discovery: this is time-consuming, requires consensus, and may not be necessary, depending on the long-term significance of the discovery.

The web evolves rapidly, and it is self-organizing and self-maintaining.

rapidly contribute new schemas, data, and revisions. …

Orchestra emphasizes managing disagreement, and it supports rapidly changing membership …

conflicting data and updates: the traditional emphasis in distributed data sharing has been on providing (at least eventual) consistency … the goal is to merge update sequences in an ordered and consistent way, yielding a globally consistent data instance. … we propose a model that intuitively resembles that of incomplete information [2]. … each participant has an internally consistent database instance, formed by the set of tuples that it accepts. Importantly, no participant is required to modify its data instance to reach agreement with the others, although it has the option if it so chooses.

Orchestra coordinates a set of autonomous participants who make updates to local relation instances and later publish them for others to access. The general mode of operation is to operate in disconnected fashion, then to reconcile.

a participant p reconciles its updates with those made by others in three steps:

  1. compute the effects of these updates on all shared relations … determine which updates would be accepted by p and remove those that conflict
  2. propagate to p’s relations those updates that are accepted and non-conflicting
  3. record the updates originating from p, or accepted by it, in Orchestra for future reconciliation operations.
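
The three steps above can be sketched roughly as follows. This is only an illustration, not the paper's algorithm: the `Update` record, the `reconcile` function, the per-key conflict test, and the numeric `authority` map are all assumptions made for the example. The one idea taken from the notes is that p never gives up its own data and yields only to others with greater authority.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Row = Tuple[object, ...]


@dataclass(frozen=True)
class Update:
    """Hypothetical record of one published change (not from the paper)."""
    origin: str            # participant that published the update
    key: str               # key of the affected tuple
    value: Optional[Row]   # new tuple, or None for a deletion


def reconcile(instance: Dict[str, Row],
              own_updates: List[Update],
              published: List[Update],
              authority: Dict[str, int]):
    """Reconcile participant p's instance with updates published by others.

    `instance` is assumed to already reflect `own_updates`.
    `authority` maps each origin that p recognises to a rank (assumption).
    """
    own_keys = {u.key for u in own_updates}

    # Step 1: decide which published updates p would accept.
    # Skip unrecognised origins and anything touching a key p changed itself.
    by_key: Dict[str, List[Update]] = {}
    for u in published:
        if u.origin in authority and u.key not in own_keys:
            by_key.setdefault(u.key, []).append(u)

    accepted: List[Update] = []
    for group in by_key.values():
        top = max(authority[u.origin] for u in group)
        winners = [u for u in group if authority[u.origin] == top]
        # Equally ranked origins that disagree are an unresolved conflict:
        # none of their updates for this key is accepted.
        if len({u.value for u in winners}) == 1:
            accepted.append(winners[0])

    # Step 2: propagate accepted, non-conflicting updates into a copy of
    # p's instance (the caller's dictionary is left untouched).
    new_instance = dict(instance)
    for u in accepted:
        if u.value is None:
            new_instance.pop(u.key, None)
        else:
            new_instance[u.key] = u.value

    # Step 3: record p's own and accepted updates for future reconciliation.
    log = list(own_updates) + accepted
    return new_instance, log
```

Calling `reconcile` with p's current instance, its own updates, the updates published by others, and p's authority map returns the new instance together with the log to keep for the next round. Updates from participants that do not appear in `authority` are simply ignored, which is one simple way to model "p does not accept them".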

4 Consistency with conflicting data

The fundamental unit of storage and propagation is an atomic delta over a single relation, representing a minimal encoding for the insertion, deletion, or replacement of a single tuple.
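
One possible way to picture such an atomic delta is the small sketch below. The `Delta` class, its `old`/`new` fields, and `apply_delta` are hypothetical names chosen for illustration; the paper's actual encoding may differ. The only point carried over from the excerpt is that a single delta covers exactly one tuple in one relation, and that insertion, deletion, and replacement are the three cases.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Tuple

Row = Tuple[object, ...]
Database = Dict[str, FrozenSet[Row]]   # relation name -> set of tuples


@dataclass(frozen=True)
class Delta:
    """Hypothetical atomic delta over a single relation."""
    relation: str
    old: Optional[Row] = None   # tuple removed (None for an insertion)
    new: Optional[Row] = None   # tuple added (None for a deletion)

    def __post_init__(self):
        if self.old is None and self.new is None:
            raise ValueError("a delta must insert, delete, or replace a tuple")

    @property
    def kind(self) -> str:
        if self.old is None:
            return "insert"
        if self.new is None:
            return "delete"
        return "replace"


def apply_delta(db: Database, delta: Delta) -> Database:
    """Return a new database with the delta applied to its relation."""
    rows = set(db.get(delta.relation, frozenset()))
    if delta.old is not None:
        if delta.old not in rows:
            raise KeyError(f"{delta.old!r} not in relation {delta.relation!r}")
        rows.remove(delta.old)
    if delta.new is not None:
        rows.add(delta.new)
    return {**db, delta.relation: frozenset(rows)}
```

Modelling replacement as "remove `old`, add `new`" in one record keeps the three cases in a single shape, so a sequence of deltas can be applied one after another with the same function.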

reference