\section{Observer theory}
\label{s-obsv}
An observer theory describes the knowledge accumulated by an observer
in its interaction with a process (in the form of messages sent over
networks), and its capability in analyzing and synthesizing messages.
Since messages can be encrypted, and the encryption key may be
unknown to the observer, it is not always the case that
the observer can decompose all messages sent over the networks.
In the presence of an active intruder, the traditional notion
of bisimulation is not fine grained enough to prove interesting equivalence
of protocols. A notion of bisimulation in which the knowledge
and capability of the intruder is taken into account is
often called an {\em environment-sensitive bisimulation}.
{\em Messages} are expressions formed from names, pairing constructor, e.g., $\spr{M}{N}$,
and symmetric encryption, e.g., $\enc M K$, where $K$ is the encryption key and
$M$ is the message being encrypted.
Note that we restrict to pairing and encryption to simplify discussion;
there is no difficulty in extending the set of messages to include
other constructors, including asymmetric encryption, natural numbers, etc.
For technical reasons, we shall distinguish two kinds of names:
{\em flexible names} and {\em rigid names}. We shall refer to flexible names as
simply names. Names will be denoted with lower-case letters, e.g., $a$, $x$, $y$, etc.,
and rigid names will be denoted with bold letters, e.g., $\rna$, $\rnb$, etc.
We let $\mathcal N$ denote the set of names and
\Ne\ denote the set of pairs $(x,x)$ of the same name.
A name is really just a variable, i.e.,
a site for substitutions, and rigid names are just constants.
This slightly different terminology is to conform with a ``tradition'' in name-passing
process calculi where names are sometimes confused with variables (see e.g.,
\cite{sangiorgi96acta}). In the context of open bisimulation for
the spi-calculus \cite{tiu-trace-based-bisim-abs}, names stand for undetermined messages
which can be synthesized by the observer.
There are two aspects of an observer theory which are relevant to bisimulation methods for
protocols verification (for a more detailed discussion, see, e.g., \cite{abadi98njc}):
\begin{itemize}
\item {\em Message analysis and synthesis}: This is often formalised
as a deduction system with judgments of the form $\Sigma \vdash M$,
where $\Sigma$ is a set of messages and $M$ is a message.
The intuitive meaning is that the observer can derive $M$ given $\Sigma.$
The deduction system is given in Figure~\ref{f-msg} using sequent
calculus. The usual formulation is based on natural deduction, but
there is an easy correspondence between the two presentations (see \cite{tiu-gore-intruder}
for details).
One can derive, for example, that $\Sigma \vdash M$ holds if
$\Sigma \vdash \enc M K$ and $\Sigma \vdash K$ hold, i.e., if the observer
can derive $\enc M K$ and the key $K$, then it can derive $M$.
\item {\em Indistinguishability of messages}:
This notion arises when an observer tries to differentiate two processes
based on the messages output by the processes. In the presence of
encryption, indistinguishability does not simply mean syntactic equality.
The judgment of interest in this case takes the form
$
\Gamma \vdash M \eqm N
$
where $\Gamma$ is a finite set of pairs of messages. It means, intuitively,
that the observer cannot distinguish between $M$ and $N$, given the
{\em indistinguishability assumption} $\Gamma.$
We shall not go into detailed discussion on this notion of indistinguishability;
it has been discussed extensively in the
literature~\cite{abadi98njc,borgstrom05,boreale-siam,tiu-trace-based-bisim-abs}.
Instead we give a proof system for message indistinguishability (or message
equivalence) in Figure~\ref{f-msg-ind}.
\end{itemize}
Note that there are some minor differences between the inference
rules in Figure~\ref{f-msg} and Figure~\ref{f-msg-ind} and those
given in \cite{tiu-trace-based-bisim-abs}.
That is, the ``principal'' message pairs for the rules $(pl)$ and $(el)$ in \cite{tiu-trace-based-bisim-abs},
(\pairM,\pairN) and (\encM,\encN), are also in the premises.
We proved that the alternative system is equivalent and that, in both
systems, weakening on the left of $\vdash$ is admissible:
see Appendix~\ref{ded-eqv}.
We note that, by a cut-admissibility-like result,
it is possible to further remove $(M_k, N_k)$
from the second premise of $(el)$: see Appendix~\ref{inv-ca}.
Subsequent results in this paper are concerned mainly with the above notion
of indistinguishability. We therefore identify an {\em observer theory}
with its underlying indistinguishability assumptions (i.e., $\Gamma$ in the second
item above).
Hence, from now on, an observer theory (or theory) is a just finite set
of pairs of messages, and will be denoted with $\Gamma$.
Given a theory $\Gamma$, we write $\pi_1(\Gamma)$ to denote the
set obtained by projecting on the first components of the pairs in $\Gamma$.
The set $\pi_2(\Gamma)$ is defined analogously.
{\em Observer theory consistency:} ~
An important notion in the theory of environment sensitive bisimulation
is that of {\em consistency} of an observer theory. This amounts to the requirement
that any observation (i.e., any ``destructive'' operations related to constructors of
the messages, e.g., projection, decryption) that is applicable to the first projection
of the theory is also applicable to the second projection.
For example, the theory
$
\{ (\enc \rna \rnb, \enc \rnc \rnd),
(\rnb, \rnc)
\}
$
is not consistent, since on the first projection (i.e., the set
$\{\enc \rna \rnb, \rnb\}$), one can decrypt the
first message $\enc \rna \rnb$ using the second message $\rnb$, but
the same operation cannot be done on the second projection.
The formal definition of consistency involves checking
all message pairs $(M,N)$ such that $\Gamma \vdash M \eqm N$ is derivable
for certain similarity of observations. The first part of this paper is
about verifying that this infinite quantification is not necessary. This
involves showing that for every theory $\Gamma$, there is a corresponding
{\em reduced theory} that is equivalent, but for which consistency checking
requires only checking finitely many message pairs.
\begin{figure}[t]
$$
\infer[(var)]
{\Sigma \seqsym x}
{x \in {\mathcal{N}}}
\qquad
\infer[(id)]
{\Sigma, M \seqsym M}
{}
\qquad
\infer[(pr)]
{\Sigma \seqsym \spr M N}
{\Sigma \seqsym M & \Sigma \seqsym N}
$$
$$
\infer[(er)]
{\Sigma \seqsym \enc M N}
{\Sigma \seqsym M & \Sigma \seqsym N}
\qquad
\infer[(pl)]
{\Sigma, \spr M N \seqsym R}
{\Sigma, M, N \seqsym R}
\qquad
\infer[(el)]
{\Sigma, \enc M N \seqsym R}
{\Sigma \seqsym N & \Sigma, M, N \seqsym R}
$$
\caption{A proof system for message synthesis}
\label{f-msg}
\end{figure}
\begin{figure}[t]
$$
\infer[(var)]
{\Gamma \vdash x \eqm x}
{x \in \mathcal N}
\qquad
\infer[(id)]
{\Gamma \vdash M \eqm N}
{(M, N) \in \Gamma}
$$
$$
\infer[(pr)]
{\Gamma \vdash \pairM \eqm \pairN}
{\Gamma \vdash M_a \eqm N_a & \Gamma \vdash M_b \eqm N_b}
\qquad
\infer[(er)]
{\Gamma \vdash \encM \eqm \encN}
{\Gamma \vdash M_p \eqm N_p & \Gamma \vdash M_k \eqm N_k}
$$
$$
\infer[(pl)]
{\Gamma, {(\pairM,\pairN)} \vdash M \eqm N}
{\Gamma, (M_a, N_a), (M_b, N_b) \vdash M \eqm N}
$$
$$
\infer[(el)]
{\Gamma, {(\encM,\encN)} \vdash M \eqm N}
{\Gamma \vdash M_k \eqm N_k &
\Gamma, (M_p, N_p), (M_k, N_k) \vdash M \eqm N}
$$
\caption{A proof system for deducing message equivalence}
\label{f-msg-ind}
\end{figure}
{\em Symbolic observer theory:} ~
The definition of open bisimulation for name-passing calculi, such as
the $\pi$-calculus, typically includes closure under a certain
notion of {\em respectful substitutions}~\cite{sangiorgi96acta}.
In the $\pi$-calculus, this notion of respectfulness is defined w.r.t.
to a notion of {\em distinction} among names, i.e., an irreflexive relation on
names which forbids identification of certain names.
In the case of the spi-calculus, things get more complicated because the
bisimulation relation is indexed by an observer theory, not just a simple
distinction on names. We need to define a symbolic representation
of observer theories, and an appropriate notion of consistency for
the symbolic theories.
These are addressed in \cite{tiu-trace-based-bisim-abs}
via a structure called {\em bi-traces}. A bi-trace is essentially a
list of pairs of messages. It can be seen as a pair of symbolic traces,
in the sense of \cite{boreale}.
The order of the message pairs in the list indicates the order of
their creation (i.e., by the intruder or by the processes themselves).
Names in a bi-trace indicate undetermined messages, which are open to
instantiations. Therefore the notion of consistency of bi-traces needs
to take into account these possible instantiations.
Consider the following sequence of message pairs:
$
(\rna, \rnd), (\enc \rna \rnb, \enc \rnd \rnk), (\enc \rnc {\enc x \rnb}, \enc \rnk \rnl).
$
Considered as a theory, it is consistent, since none of the encryption keys are
known to the observer. However, if we allow $x$ to be instantiated to $a$,
then the resulting theory
$
\{ (\rna, \rnd), (\enc \rna \rnb, \enc \rnd \rnk), (\enc \rnc {\enc \rna \rnb}, \enc \rnk \rnl) \}
$
is inconsistent, since on the first projection,
$\enc \rna \rnb$ can be used as a key to decrypt $\enc \rnc {\enc \rna \rnb}$, while
in the second projection, no decryption is possible.
Therefore to check consistency of a bi-trace, one needs to consider
potentially infinitely many instances of the bi-trace.
Section~\ref{s-bt} shows some key steps to simplify
consistency checking for bi-traces.