diff --git a/ChannelSnapshots/ChannelSnapshots.html b/ChannelSnapshots/ChannelSnapshots.html index df8a0ba..c230f03 100644 --- a/ChannelSnapshots/ChannelSnapshots.html +++ b/ChannelSnapshots/ChannelSnapshots.html @@ -99,156 +99,202 @@
- A global snapshot algorithm for determining a state in a computation. -
- A global snapshot is the state at a consistent cut. + A global snapshot is the system state at an input-closed set of + events. Systems can be monitored by scheduling repeated global snapshots. - All state detection algorithms are based on global snapshots. + Algorithms that detect states are based on global snapshots. - This page gives an algorithm to determine a global snapshot; we will - look at other algorithms later. + This page gives one algorithm to determine a global snapshot, and + other algorithms are described later. -
We discuss the design of an algorithm by which agents collaborate to determine -the state at a consistent cut; this state is called a global - snapshot. +the global state at an input-closed set of events; this state is +called a global snapshot.
-The algorithm is executed by an operating system (OS) on behalf of a -client. - The algorithm consists of OS actions that - do not change the client's causality graph. - OS actions can record, but not modify, states. -Likewise, OS actions can send and receive OS messages without changing - client states and messages. +The algorithm is executed by a distributed operating system (OS) on behalf of a + client. + + Each client agent has an OS agent that supervises it. + + OS agents use the same processors and channels as clients do. + + OS agents can record, but not modify, states of their clients. + + OS agents can send and receive OS messages that are not seen + by clients.
- The OS may use the same processors and channels as clients do. - So, the OS may change when client actions are executed. - For example, execution of the OS on a processor may delay execution of a client's - action on the same processor. - The OS may change a client's computation, but we require that it - must not change a client's causality graph. + + + Execution of an OS agent on a processor may delay a client's + events on the same processor. + + The OS may change the order in which agents execute events. + + So, + execution of an OS algorithm may change a client's + computation. +
+We will design OS algorithms that do not change the client's event +graph. + +Equivalently, we design OS algorithms that leave agent computations +unchanged. + +
One way to record a global snapshot is for the OS to stop a client computation, then take a global snapshot, and then restart the - client computation. +client computation. + We describe a snapshot algorithm that runs concurrently with the client computation without stopping the client. -
-The algorithm is as follows. Each agent records its own local state, at -some point in the computation. -An agent's record of its local state is called the agent's local snapshot. -An agent's actions before it takes its local snapshot are called -pre-snapshot actions. + +Each agent records its own local state, at +some point in the computation, and does so exactly once. -
Proof
-If \(v\) and \(v'\) are actions by the same agent and if \(v'\) -is a pre-snapshot action then so is \(v\) because \(v\) -precedes \(v'\). -If \(v\) and \(v'\) are actions at different agents then \(v'\) - receives the message sent in \(v\); from the condition of the -theorem, if \(v'\) is a pre-snapshot action then so is \(v\). -- A cut is consistent - - if for all edges \((v, v')\) of the causality -graph if \(v'\) is in the cut then so is \(v\). -Therefore, the cut consisting of pre-snapshot actions is consistent. -
-The representation of the computation is the causality graph with a -vertex sequence in which each vertex appears after vertices on which it depends. -Labels of edges aren't shown so as not to overcrowd the figure. + +The figure shows a computation with event sequence \([0, 1, 2, 3, +\ldots,]\). -In figure 2, each agent takes a local snapshot shown as a yellow -circle. -Taking local snapshots may change the order in which actions of the -client computation are executed. -Action 3 occurs after after actions 4, 5, and 6 in figure 2 whereas 3 -is before 4, 5, and 6 in figure 1. +In figure 2, each agent takes a local snapshot which is shown as a yellow +circle on the agent's timeline. -
-The computation shown in figure 2 is different from that in figure 1. -The computation in figure 2 has a state in which there is -a message in a channel from agent X to agent Y, and a message in a + + + +
Example: Taking Snapshots may change Client +Computations
+ +Taking local snapshots may change the order in which events of the +client computation occur. + +Whe agent \(Y\) takes its local snapshot it delays event 3. + +So, event 3 occurs after after events 4, 5, and 6 in the computation +of figure 2 whereas +event 3 occurs before events 4, 5, and 6 in the computation of figure 1. + +The computation in figure 2 has a state in which there is both a +message in a channel from agent X to agent Y, and a message in a channel from Z to Y (see edges (4, 5) and (1, 3)). -By contrast, in figure 1, -message (1, 3) is delivered before message (4, 5) is sent. -The action of making observations (recording states) of the client -computation changes the computation. --Though the computations are different, the causality graphs for -figures 1 and 2 are the same. +That state does not occur in the computation of figure 1. -
-The pre-snapshot vertices are 0, 1, 2, 4, 6. -The cut is the set of vertices to the left of the curved brown line. -Snapshots don't change the client's causality graph. -So the label of the edge from 4 to 6 (the state of agent X between the -actions) isn't modified by the action of -taking the local snapshot; the label of the edge from 4 to the -snapshot action is the same as that from the snapshot action to 6. +
Though the computations are different, the event graphs, and hence +the agent computations, are the same for figures 1 and 2 are the same. -
-There is only one message received in a pre-snapshot action, namely the -message represented by the edge (0, 2). -The action of sending this message is a pre-snapshot action. -So, the local snapshots satisfy the condition of the lemma. -The lemma tells us that the cut consisting of pre-snapshot vertices is -consistent. + +
Example: Pre-Snapshot Set that is Input +Closed
+ +The pre-snapshot vertices are 0, 1, 2, 4, 6 in figure 2. + +The cut is the boundary between the pre-snapshot events to the left of +the boundary and post-snapshot events to the right. +
In figure 2 there is only one message received in a pre-snapshot +action, namely the message represented by the edge (0, 2). + +Sending this message is also a pre-snapshot event. + +So, every message received in a pre-snapshot event is sent in a +pre-snapshot event. + +The figure shows that the set of pre-snapshot +events is input closed -- there is no edge from a post- to a +pre-snapshot event. + + + + + +
Example: Pre-Snapshot Set that is not Input +Closed
+ +Figure 3 shows an event graph that has a message edge directed from a +post-snapshot event, 6, to a pre-snapshot event, 7. + +This set of pre-snapshot events is not input closed. + +Proof
+We prove that the pre-snapshot set is input closed if the +condition of the theorem holds. The only-if part of the proof is +straightforward. + ++A set of events is input-closed exactly when for every edge \((e, +e')\) of the event graph, if \(e'\) is in the set then so is \(e\). + +If the edge is at an agent \(u\), then \(e\) precedes \(e'\) in agent +\(u\)'s computation, and + +so, if \(e'\) is pre-snapshot then so is \(e\). + +
+If the edge is a message edge, and the message is received in a +pre-snapshot event \(e'\), then from the condition of the theorem, the +message is sent in a pre-snapshot event \(e\).
-We assume that the +The agent-channel graph -- the directed graph of agents (vertices) and channels (edges) -- is strongly connected. -So the algorithm sends exactly one marker on -every channel, and every agent takes its local snapshot. +So the algorithm sends a marker on every channel.
+When agents X and Z each receive the markers, they take their local +snapshots because they haven't taken snapshots earlier. -When agents X and Z each receive the markers, they take their local snapshots -because they haven't taken snapshots earlier. The actions by X and Z of taking their snapshots are shown as yellow vertices on their timelines. + +
When X and Z take their snapshots they send markers out on their -output channels; these markers are not shown in figure 4. -A total of 6 markers are sent in the algorithm, one marker for each channel. +output channels. + +The markers sent by X are shown in figure 6. +The markers sent by Z are not shown in the figure. +
+A total of 6 markers are sent in the algorithm, one marker for each +channel. +
From rule 2 of the algorithm, the messages sent by an agent along a channel before the agent takes its snapshot are the messages that the agent sends along the channel before sending a marker on the channel. + Because channels are first-in-first-out, the messages sent along a -channel before the -sender takes its snapshot are the messages received along the channel -before the marker on the channel. +channel before the sender takes its snapshot are the messages received +along the channel before the marker on the channel. + Therefore:
Note: If an agent takes its local snapshot when it receives a marker along a channel, then the snapshot of the channel is the empty sequence of messages. -
Figure 6 shows how agent Y determines the state of the channel from X to Y in the global snapshot. + Y starts recording the messages it receives along this channel after Y takes its snapshot and stops the recording when it receives a marker -on this channel. +on this channel + The only message in this interval is the message corresponding to edge (6, 7). +Look at figure 7 to see the snapshot of the channel from X to Y in +more detail. + +The message corresponding to edge \((0, 2)\) is from X to Y but is not +in the snapshot of +the channel because both \(0\) and \(2\) are pre-snapshot events. + +Likewise, the message corresponding to edge \((12, 13)\) is from X to +Y but is not in the snapshot of +the channel because both \(12\) and \(13\) are post-snapshot events. + +The message corresponding to edge \((6, 7)\) was sent in a +pre-snapshot event and received in a post-snapshot event, and so it is +in the snapshot of the channel. +
+The global snapshot is a state of an input-closed event set of the +changed computation -- the client computation that may have +been changed by the execution of the snapshot algorithm. + +Does the snapshot provide information about the original +computation -- the computation without an execution of the +snapshot algorithm?
-Let the states in which the snapshot algorithm starts and ends be +The key idea is that the event graphs are the same in the original and +changed computations. + +So, the states of input-closed event sets are the same in the original +and changed computations. + +
+Let the global states in which the snapshot algorithm starts and ends be \(s_{init}\) and \(s_{fini}\), respectively, and let \(s_{snap}\) be the global state recorded by the algorithm. -Because \(s_{snap}\) is the state of a consistent cut, it follows that - +Because these states are states of input-closed event sets, it +follows from the + +theorem on computations from sets to supersets + +that in both the original and changed computations:
+Global snapshots are used to detect persistent properties such as the computation is in a terminated or deadlocked state. If the computation has terminated when the snapshot algorithm starts then the snapshot shows that the computation has terminated. And, if the snapshot shows that the computation has terminated then the computation has terminated at the point that the snapshot algorithm -finishes. - - - -
Stable set of States
-A stable set of states is a set \(P\) such that every transition from a -state in \(P\) is to a state in \(P\). There is no transition from -inside \(P\) to outside \(P\), and so after a computation visits a -state in \(P\) the computation remains forever in \(P\). The set of -states in which agents are deadlocked is an example of a stable set: -there is no transition from a deadlocked state to a non-deadlocked -state. - - --What do we know about the snapshot if the snapshot algorithm is -initiated when the system is outside a stable set \(P\) and enters -\(P\) while the snapshot algorithm is still running? In this case, the -specification doesn't tell us whether the snapshot will be in \(P\) or -not. - -
-If the snapshot is in \(P\) then the system is in \(P\) when the -snapshot algorithm ends. If the snapshot is not in \(P\) then -the system is not in \(P\) when the snapshot algorithm begins, -but we don't know whether the system is in \(P\) when the snapshot -algorithm ends. - -
-If a snapshot is not in \(P\) then more -snapshots must be taken to detect whether the system may have entered -\(P\) after the last snapshot was initiated. -A general method to detect whether a system is in a stable set of -states is to monitor the system by scheduling repeated snapshots. - - -
+The state at an input-closed event set is a concept that is +used in many algorithms. +A global snapshot is a state at an input-closed event set. -
-The state at a consistent cut is a concept that is used in all algorithms -that detect properties of computations. -A global snapshot is a state at a consistent cut. -The global snapshot algorithm uses markers to separate past from -future to identify a consistent cut (past, future). -We will look at more algorithms that determine global snapshots. +The global snapshot algorithm uses markers to separate events in an +input-closed set from events outside it.