Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

Abstract

This part of the documentation is still under construction.

It is intended to become a beginner-friendly introduction (and less of a draft of a research paper) to the threshold automaton model and a guide to what flavors of distributed algorithms can be modeled using threshold automaton model.

For a formal introduction consult the papers that introduced the threshold automata model alongside model-checking algorithms and applied the models to interesting algorithms 12345678910111213141516.

Also check out the previous state-of-the art model checker for threshold automata ByMC 1718 which unfortunately not maintained anymore and its website is no longer available. Still it contains features like a Promela which are not implemented in TACO.

Keywords:tacodocumentationtaco documentationtaco docsuser guideuser manualverificationthreshold automatonprotocol verificationmodel checkingdistributed systemsfault toleranceconsensus algorithms

This section of the documentation will be dedicated to introducing you to the theoretical models that are used in TACO. We will start by introducing the formal definition of the system model,i.e., threshold automata and their properties with a simple example. Then, Section: Model Checking Algorithms gives you a high-level overview of the three model checking algorithms implemented in TACO.

Distributed Algorithms and the Byzantine Generals Problem

Before jumping into the model lets first take a look at what kinds of algorithms TACO has been designed for.

So let’s start with the question: What are distributed algorithms? In essence, a distributed algorithm is an algorithm designed to be executed by a group of computers, also called nodes, that can communicate over some kind of network. Such algorithms are, for example, vital to scaling modern applications across many servers, often across multiple data centers, or even across the globe.

However, the distributed setting is hard, or in the words of Lesslie Lamport:

A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable. 19

One key challenge that makes the distributed setting hard, is that not all nodes necessarily agree on the current view of the system. For example, initially the nodes of the system might not agree on which nodes are part of the system. To get a number of nodes to agree, one needs to solve the challenge of consensus.

The consensus problem asks a set of nodes to agree on a data value by exchanging messages. Many protocols have been developed for achieving consensus, a famous example used in many systems is Raft 20. However, in a real world distributed setting failures or incorrect behavior must be expected. Unfortunately, failures are often not simple crash-stop failures.

To model more complex failure behavior, the model of “Byzantine Faults” or byzantine behavior has been developed 21. A byzantine fault allows a node to behave arbitrarily, it is allowed to follow the protocol and stop responding, or it can actively try to sabotage correct nodes.

Achieving consensus in such a setting is especially hard, or even impossible if there are more than 1/3n1/3 n faulty nodes where nn is the number of nodes 21.

In general most of the consensus algorithms rely on some form of voting. This is where TACOs internal model comes in.

Threshold Automata

This section introduces the underlying theoretical framework of TACO, which are (extended) threshold automata (TA) 4516.

Definition:

A threshold automaton (TA) 45 is a tuple A=(L,I,Γ,Π,R,RC)A=(L, \mI, \Shared, \Pi,\Rules, RC) where:

Most commonly, the parameter variables are Π={n,t,f}\Pi = \{n,t,f\}, where nn is the number of processes, tt is a bound on the number of faulty processes, and ff is the actual number of faulty processes. This allows RCRC to express assumptions about the fraction of faulty processes in the system, e.g., RC=n>3ttfRC = n > 3t\, \land\, t \geq f.

If uvN0Γ\vup \in \Nat_0^{|\Shared|} and no variables are reset, then the TA is called a canonical TA (CTA). Otherwise we call it an extended TA (ETA)16.

Note that rechability is in general undecidable for ETA1016. Consequently, TACO’s decision procedures become semi-decision procedures for the ETAs, which will in general not terminate when an ETA is safe.

Semantics

For a vector pN0Π\vp \in \Nat_0^{|\Pi|}, we write pRC\vp \models RC if RCRC holds after substituting parameter variables with values according to p\vp. Then the set of admissible parameters is PRC={pN0Π:pRC}\vP_{RC} = \{\vp \in \Nat_0^{|\Pi|} : \vp \models RC\}.

Once a TA is specified, the number of processes to be modeled must also be defined through the function N:PRCN0\mN: \vP_{RC} \rightarrow \Nat_0 (usually N(n,t,f)=nf\mN(n,t,f) = n - f).

Configurations. A configuration of a TA is a triple σ=(k,g,p)\sigma = (\vk, \vg, \vp), where kN0L\vk \in \Nat_0^{|L|} assigns a number of processes to each location, gN0Γ\vg \in \Nat_0^{|\Shared|} assigns a valuation to the shared variables, and pPRC\vp \in \vP_{RC} assigns a valuation to the parameters. We say that a configuration is initial if g=0\vg = \vec{0} and i\forall i k[i]>0liI\vk[i] > 0 \iff l_i \in \mI.

Paths. A rule r=(from,to,φ,uv,τ)r = (\from, \to, \varphi,\vup,\resets) is enabled in a configuration σ=(k,g,p)\sigma = (\vk,\vg,\vp) if k[from]>0\vec{k}[\from] > 0 and (g,p)(\vg,\vp) satisfies the guard φ\varphi. In this case, we say that r(σ)=(k,g,p)r(\sigma) = (\vk',\vg',\vp') where k[from]=k[from]1\vk'[\from]=\vk[\from] -1, k[to]=k[to]+1\vk'[\to]=\vk[\to] + 1, g=g+uv\vg'=\vg + \vup, and p=p\vp'=\vp. A path π\pi is then a finite or infinite sequence of configurations π=σ0,σ1,\pi = \sigma_0, \sigma_1, \ldots such that for each ii there exists an enabled rule rir_i with σi+1=ri(σi)\sigma_{i+1} = r_i(\sigma_i). We denote by P(TA)\Paths the set of all paths π\pi in the threshold automaton.

Pseudocode of a reliable broadcast protocol.

Figure 1:Pseudocode of a reliable broadcast protocol.

Example:

Figure 1 shows the pseudocode of a reliable broadcast, inspired by 22. In every round, processes with input v=1v=1 will broadcast a message, then processes with v=0v=0 will set v=1v=1 if they received at least t+1t+1 messages, and finally processes set accept=trueaccept=true if at least ntn-t messages have been received. If acceptaccept is false at the end of the round, a new round starts.

Threshold automaton of the algorithm in Figure %s.

Figure 2:Threshold automaton of the algorithm in Figure 1.

Example: Figure 2 depicts a TA for this algorithm, with L={V0,V1,RV0,SE,AC}L = \{V_0, V_1, RV_0, SE, AC\}, I={V0,V1}\mI = \{V_0, V_1\}, Γ={nsnt,rec}\Shared = \{nsnt, rec\}, Π={n,t,f}\Pi = \{n, t, f\}, and RC=n>3ttf0RC = n > 3t \land t \geq f \geq 0. A process in V1V_1 has input 1 and can move freely (there is no guard) to SESE, incrementing both variables nsntnsnt and recrec to simulate lines 4–6 of the algorithm. A process in location V0V_0 has input 0 and can move to RV0RV_0, incrementing recrec to simulate line 6 (but not line 5, since the condition in line 4 evaluates to false). From RV0RV_0, a process can move to SESE if nsntφ1nsnt \geq \varphi_1, simulating lines 7-8. Processes that started with v=1v=1 already are in SESE, corresponding to the fact that v=1v=1 is not changed in line 8.

Note that instead of being t+1t+1, φ1\varphi_1 is chosen as t+1ft + 1 - f. This prevents processes from making a transition based on messages from faulty processes, hence the TA represents the behavior of correct processes and ff the effect of faulty processes on correct ones. For more details on the role of ff, see 12.

Processes in SESE can move to ACAC if nsntφ2nsnt \geq \varphi_2, simulating lines 9-10. If nsnt<φ2nsnt < \varphi_2 and recϕ3rec \geq \phi_3 (this constraint corresponds to waiting long enough for all non-faulty processes to receive all messages), the condition in line 9 was not satisfied and a new iteration of the while-loop is started by moving back from SESE to V1V_1. The first process taking this transition will reset nsntnsnt and recrec, the others take the transition that does not change any variables. Similarly, processes in RV0RV_0 will move to V0V_0 to start the new round.

The RCRC condition is critical here to ensure fault-tolerance: if f>tf > t, then ACAC will be reachable even if all processes start in V0V_0, violating the validity property.

Specifications on Threshold Automata

Basic Specifications

We consider two basic kinds of specifications: coverability, which refers to the notion of coverability that is widely used in parametrized verification, e.g., Petri nets 23 or VASS 24, and reachability.

Definition: The parametrized coverability problem is: Given a TA and a coverability specification cN0L\vec{c} \in \Nat_0^{|L|}, decide if there is a reachable configuration σr\conf_r[1] with σr.k[i]c[i]\conf_r.\vk[i] \geq \vec{c}[i] for all 0iL10 \leq i \leq |L|-1.

Definition: The parametrized reachability problem is: Given a TA and a reachability specification crN0L\vec{c_r} \in \Nat_0^{|L|}, decide if there is a reachable configuration σr\conf_r with σr.k[i]cr[i](σr.k[i]=0cr[i]=0)\conf_r.\vk[i] \geq \vec{c_r}[i] \land \left( \conf_r.\vk[i] = 0 \iff \vec{c_r}[i] = 0 \right) for all 0iL10 \leq i \leq |L|-1.

Usually, our specifications are definitions of error configurations, and therefore paths that reach them are called error paths.

ELTLFTELTL_{FT}

TACO supports safety specifications in the fragment of ELTLFTELTL_{FT}725 that can be translated into reachability or coverability properties. Formally, the syntax we support is defined as follows:

ψ::=pform        ψ        ψψpform::=cform        gformcformcform::=lLocsσ.k[l]0        lLocsσ.k[l]=0        cformcformgform::=φ        ¬gform        gformgform\begin{align*} \psi &::= \mathsf{pform} \;\;|\;\; \globally \psi \;\;|\;\; \psi \wedge \psi \\ \mathsf{pform} &::= \mathsf{cform} \;\;|\;\; \mathsf{gform} \vee \mathsf{cform} \\ \mathsf{cform} &::= \bigwedge_{l \in Locs} \conf.\vk[l] \neq 0 \;\;|\;\; \bigvee_{l \in Locs} \conf.\vk[l] = 0 \;\;|\;\; \mathsf{cform} \wedge \mathsf{cform} \\ \mathsf{gform} &::= \varphi \;\;|\;\; \neg \mathsf{gform} \;\;|\;\; \mathsf{gform} \wedge \mathsf{gform} \end{align*}

where φ\varphi is a guard as defined in Threshold Automata and LocsLLocs \subseteq L.

Furthermore, TACO also supports lower integer bounds on the number of processes in a specific location. The extended cform\mathsf{cform} syntax is:

cform::=σ.k[l]n        σ.k[l]=0        σ.k[l]0        cformcform        cformcform\begin{align*} \mathsf{cform} ::= \conf.\vk[l] * n \;\;|\;\; \conf.\vk[l] = 0 \;\;|\;\; \conf.\vk[l] \neq 0 \;\;|\;\; \mathsf{cform} \wedge \mathsf{cform} \;\;|\;\; \mathsf{cform} \vee \mathsf{cform} \end{align*}

where lL,nNl \in L, n \in \mathbb{N} and {>,}* \in \{>, \geq\}.

Example: One important property for consensus-based distributed algorithms is to check for validity, i.e., if no process received a specific value (e.g., 0), then no process should decide on that value. 1 specifies the valdity property in ELTLFTELTL_{FT} as follows:

V1==0    (AC==0)V1 == 0 \implies \globally (AC == 0)

To check if this property holds, TACO transforms this formula into a coverability specification that requires at least one process to be eventually in location ACAC while all correct processes initially start in location V1V1. If such a configuration is reachable, then the validity property is violated. Otherwise the property holds.

For further reading on the model of threshold automata, we refer the reader to 123456781713141516.

The next section Model Checking Algorithms will introduce the model-checking algorithms implemented in TACO.

Footnotes
  1. A configuration σr\conf_r is said to be reachable if there exists a path starting from an initial configuration and ending in σr\conf_r.

References
  1. John, A., Konnov, I., Schmid, U., Veith, H., & Widder, J. (2013). Towards Modeling and Model Checking Fault-Tolerant Distributed Algorithms. In E. Bartocci & C. R. Ramakrishnan (Eds.), Model Checking Software - 20th International Symposium, SPIN 2013, Stony Brook, NY, USA, July 8-9, 2013. Proceedings (Vol. 7976, pp. 209–226). Springer. 10.1007/978-3-642-39176-7_14
  2. John, A., Konnov, I., Schmid, U., Veith, H., & Widder, J. (2013). Parameterized model checking of fault-tolerant distributed algorithms by abstraction. Formal Methods in Computer-Aided Design, FMCAD 2013, Portland, OR, USA, October 20-23, 2013, 201–209. https://ieeexplore.ieee.org/document/6679411/
  3. Gmeiner, A., Konnov, I., Schmid, U., Veith, H., & Widder, J. (2014). Tutorial on Parameterized Model Checking of Fault-Tolerant Distributed Algorithms. In M. Bernardo, F. Damiani, R. Hähnle, E. B. Johnsen, & I. Schaefer (Eds.), Formal Methods for Executable Software Models - 14th International School on Formal Methods for the Design of Computer, Communication, and Software Systems, SFM 2014, Bertinoro, Italy, June 16-20, 2014, Advanced Lectures (Vol. 8483, pp. 122–171). Springer. 10.1007/978-3-319-07317-0_4
  4. Konnov, I., Veith, H., & Widder, J. (2014). On the Completeness of Bounded Model Checking for Threshold-Based Distributed Algorithms: Reachability. In P. Baldan & D. Gorla (Eds.), CONCUR 2014 - Concurrency Theory - 25th International Conference, CONCUR 2014, Rome, Italy, September 2-5, 2014. Proceedings (Vol. 8704, pp. 125–140). Springer. 10.1007/978-3-662-44584-6_10
  5. Konnov, I. V., Veith, H., & Widder, J. (2017). On the completeness of bounded model checking for threshold-based distributed algorithms: Reachability. Inf. Comput., 252, 95–109. 10.1016/J.IC.2016.03.006
  6. Konnov, I. V., Veith, H., & Widder, J. (2015). What You Always Wanted to Know About Model Checking of Fault-Tolerant Distributed Algorithms. In M. Mazzara & A. Voronkov (Eds.), Perspectives of System Informatics - 10th International Andrei Ershov Informatics Conference, PSI 2015, in Memory of Helmut Veith, Kazan and Innopolis, Russia, August 24-27, 2015, Revised Selected Papers (Vol. 9609, pp. 6–21). Springer. 10.1007/978-3-319-41579-6_2
  7. Konnov, I. V., Lazic, M., Veith, H., & Widder, J. (2017). A short counterexample property for safety and liveness verification of fault-tolerant distributed algorithms. In G. Castagna & A. D. Gordon (Eds.), Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages, POPL 2017, Paris, France, January 18-20, 2017 (pp. 719–734). ACM. 10.1145/3009837.3009860
  8. Konnov, I., Lazic, M., Veith, H., & Widder, J. (2016). A Short Counterexample Property for Safety and Liveness Verification of Fault-tolerant Distributed Algorithms. CoRR, abs/1608.05327. http://arxiv.org/abs/1608.05327
  9. Konnov, I. V., Widder, J., Spegni, F., & Spalazzi, L. (2017). Accuracy of Message Counting Abstraction in Fault-Tolerant Distributed Algorithms. In A. Bouajjani & D. Monniaux (Eds.), Verification, Model Checking, and Abstract Interpretation - 18th International Conference, VMCAI 2017, Paris, France, January 15-17, 2017, Proceedings (Vol. 10145, pp. 347–366). Springer. 10.1007/978-3-319-52234-0_19
  10. Kukovec, J., Konnov, I., & Widder, J. (2018). Reachability in Parameterized Systems: All Flavors of Threshold Automata. In S. Schewe & L. Zhang (Eds.), 29th International Conference on Concurrency Theory, CONCUR 2018, September 4-7, 2018, Beijing, China (Vol. 118, p. 19:1-19:17). Schloss Dagstuhl - Leibniz-Zentrum für Informatik. 10.4230/LIPICS.CONCUR.2018.19
  11. Bertrand, N., Konnov, I., Lazic, M., & Widder, J. (2019). Verification of Randomized Consensus Algorithms Under Round-Rigid Adversaries. In W. J. Fokkink & R. van Glabbeek (Eds.), 30th International Conference on Concurrency Theory, CONCUR 2019, Amsterdam, The Netherlands, August 27-30, 2019 (Vol. 140, p. 33:1-33:15). Schloss Dagstuhl - Leibniz-Zentrum für Informatik. 10.4230/LIPICS.CONCUR.2019.33
  12. Stoilkovska, I., Konnov, I., Widder, J., & Zuleger, F. (2020). Eliminating Message Counters in Threshold Automata. In D. V. Hung & O. Sokolsky (Eds.), Automated Technology for Verification and Analysis - 18th International Symposium, ATVA 2020, Hanoi, Vietnam, October 19-23, 2020, Proceedings (Vol. 12302, pp. 196–212). Springer. 10.1007/978-3-030-59152-6_11
  13. Konnov, I., Lazic, M., Stoilkovska, I., & Widder, J. (2023). Survey on Parameterized Verification with Threshold Automata and the Byzantine Model Checker. Log. Methods Comput. Sci., 19(1). 10.46298/LMCS-19(1:5)2023
  14. Konnov, I., Lazic, M., Stoilkovska, I., & Widder, J. (2020). Survey on Parameterized Verification with Threshold Automata and the Byzantine Model Checker. CoRR, abs/2011.14789. https://arxiv.org/abs/2011.14789
  15. Balasubramanian, A. R., Esparza, J., & Lazic, M. (2020). Complexity of Verification and Synthesis of Threshold Automata. https://arxiv.org/abs/2007.06248