\hyphenation{
assump-tion
in-volves
non-classical
con-nec-tion
typi-cally
tradi-tional
frame-works
con-clusion
deriv-ation
}
%% Here begins the main text
\section{Introduction} \label{s-intro}
Sequent calculi provide a rigorous basis for meta-theoretic studies of
various logics. The central theorem is cut-elimination (or
cut-admissibility), which states that detours through lemmata can be
avoided; it underpins many important logical properties like
consistency, interpolation, and Beth definability. Cut-free sequent calculi are
also used for automated deduction, for nonclassical extensions of
logic programming, and for studying the connection between normalising
lambda calculi and functional programming. Sequent calculi, and their
extensions, therefore play an important role in logic and computation.
% Meta-theoretic reasoning using sequent calculi is error-prone and
% difficult with numerous false proofs in the literature: a good example
% is the sequent calculus for provability logic \GL{}. Shallow
% embeddings of sequent calculi into various interactive theorem provers
% exist but they are not satisfactory when the sequent calculus contains
% explicit structural rules.
Meta-theoretic reasoning about sequent calculi is error-prone because
it involves checking many combinatorial cases,
some being very difficult, but many being
similar. Invariably, authors resort to expressions like ``the other
cases are similar'', or ``we omit details''. The literature contains
many examples of pencil-and-paper meta-theoretic proofs
with serious and subtle errors. For example, the
cut-elimination theorem for the modal ``provability logic'' \GL{},
where $\Box \varphi$ can be read as ``$\varphi$ is provable in Peano
Arithmetic'', has a long and chequered history which has only recently
been resolved~\cite{gr-valentini}.
% Various proof assistants and logical frameworks now allow us to
% formalise basic reasoning about sequent
% calculi~\cite{pfenning-structural-lics94,twelf-system,abella-gacek},
% and some even allow for automatic cut-elimination. But in % all of
% these, it is not possible to add and reason about arbitrary structural rules.
% % since they invariably use a shallow embedding that identifies
% % the properties of the object logic with those of some fixed meta-logic.
% On the other hand, methods based on ``deeply embedding'' the notion of a
% derivation often requires overly tedious reasoning, making it difficult to
% generalise such results.
% NEW TEXT HERE
When reasoning about sequent calculi using proof assistants, we have to
represent proof-theoretical concepts like ``sequents'',
``derivations'' and ``derivability'' in the meta-logic of the given proof
assistant.
The granularity of this representation plays a critical role in
determining its versatility.
At one extreme, we have ``deep'' embeddings in which a sequent and a derivation
are represented explicitly as terms of the meta-logic, as espoused in our
previous work on display calculi \cite{dawson-gore-cut-admissibility}.
The advantage of this approach is that it allows us to encode
fine-grained notions
like ``each structure variable appears at most once in the conclusion of a
rule''.
The disadvantage of this approach is that it requires a lot of
ancillary low-level work
to capture essential notions like ``rule instance''.
At the other extreme, we have ``shallow'' embeddings like the one in the
Isabelle/Sequents package, which allows us to derive sequents, but does not
allow us to reason about the derivations or derivability.
Most practitioners~\cite{pfenning-structural-lics94,twelf-system,abella-gacek} choose an approach somewhere between these extremes,
which then limits their ability to generalise their work
to other calculi or, for example, to handle arbitrary structural rules.
% A general methodology which allows us to mix and match deep and shallow
% embeddings is therefore As far as we know, no current work allows us to
% ``mix and match'' shallow and deep embeddings as desired.
Here, we describe general methods for reasoning in Isabelle/HOL about
the proof-theory of traditional, propositional, multiset-based sequent
calculi. Our methods are modular in allowing an arbitrary set of
rules, permit explicit structural rules,
and are widely applicable to many sequent systems, even to other
styles of calculi like natural deduction and term rewriting
systems. They advance the ``state of the art'' in that they allow us
to ``mix and match'' shallow and deep embeddings where appropriate,
which, as far as we know, has not been done before. We then show how
we used them to formalise a highly non-trivial proof of
cut-admissibility for \GL{} based on, but different from, that
of~\cite{gr-valentini}.
In Section~\ref{sec:preliminaries}, we briefly describe traditional
sequent calculi, discuss the need for multisets, and describe the
controversy surrounding the cut-elimination theorem for the set-based
sequent system \GLS{} for provability logic \GL{}.
%
In Section~\ref{s-deriv} we describe general deep and shallow techniques
and functions
we have defined for reasoning about derivations
and derivability,
independently of the rules of a particular sequent system.
We give very general induction principles
which are useful beyond the application to \GLS{}.
We show how we formalise formulae, sequents and rules,
and then show some of the \GL{} sequent rules as examples.
%
In Section~\ref{s-axtc} we describe an Isabelle axiomatic type class
which we developed to facilitate reasoning about multisets of
formulae, and sequents based on them. This development explores the
interaction between lattice ($\land$, $\lor$, $\leq$) and
``arithmetic'' ($+$, $-$, 0, $\leq$) operations.
%
In Section~\ref{s-cagen} we discuss how we
%aspects of our Isabelle proofs which we tried to
made our Isabelle proofs as general as possible, and how they are
useful for proving cut-elimination and other results in arbitrary
sequent calculi which meet the associated preconditions.
%
In Section~\ref{s-caglr} we describe the cut-admissibility proof for
\GLS.
%and in Section~\ref{s-concl-fw} we conclude.
We assume the reader is familiar with basic
proof-theory, ML and Isabelle/HOL.
%
In the paper we show some Isabelle code, edited
to use mathematical symbols. % in it, a `\verb|?|' precedes a variable name.
The Appendix gives the actual Isabelle text
of many definitions and theorems.
Our Isabelle code can be found at
\url{http://users.rsise.anu.edu.au/~jeremy/isabelle/200x/seqms/}.
Some of this work was reported informally in \cite{gore-india}.
%TO DO - include explanation of Isabelle notation
\section{Sequents, Multisets, Sets and Provability Logic}
\label{sec:preliminaries}
Proof-theorists typically work with sequents $\Gamma
\vdash \Delta$ where $\Gamma$ and $\Delta$ are ``collections'' of
formulae. The ``collections'' found in the literature increase in
complexity from simple sets for classical logic, to multisets for
linear logic, to ordered lists for substructural logics, to complex
tree structures for display logics.
%
A sequent rule typically has a rule name, a (finite) number of premises,
a side-condition and a conclusion.
% \begin{prooftree}
% \AxiomC{$\Gamma_1 \vdash \Delta_1
% \qquad\cdots\qquad
% \Gamma_n \vdash \Delta_n$}
% \LeftLabel{RuleName}
% \RightLabel{Condition}
% \UnaryInfC{$\Gamma_0 \vdash \Delta_0$}
% \end{prooftree}
Rules are read top-down as ``if all the premises hold then the
conclusion holds''. A derivation of the judgement $\Gamma \vdash
\Delta$ is typically a finite tree of judgements with root $\Gamma
\vdash \Delta$ where parents are obtained from children by ``applying
a rule''. We use ``derivation'' to refer to a
proof \emph{within} a calculus, reserving ``proof'' for a
meta-theoretic proof of a theorem \emph{about} the calculus.
Provability logic \GL{} is obtained by adding L\"ob's axiom
$\Box (\Box A \to A) \to \Box A$ to propositional normal modal logic
\textrm{K}.
Its Kripke semantics is based on rooted transitive Kripke frames without
infinite forward chains.
%
It rose to prominence when Solovay showed that $\Box A$
could be interpreted as ``A is provable in Peano
Arithmetic''~\cite{Sol76}.
%
An initial proof-theoretic account was given by Leivant in
1981 when he ``proved'' cut-elimination for a set-based sequent
calculus for \GL~\cite{Lei81}.
%
But Valentini in 1983 found a simple counter-example and gave a new
cut-elimination proof~\cite{Val83}.
%
%The issue seemed to have been settled, but
In 2001, Moen~\cite{moen}
claimed that Valentini's transformations do not terminate if the
sequents $\Gamma \vdash \Delta$ are based on multisets. There is of
course no \textit{a priori} reason why a proof based on sets should
not carry over with some modification to a proof based on multisets.
%so this set the cat amongst the pigeons.
%
% In response, Negri~\cite{Neg05} in 2005 gave a new cut-elimination
% proof using sequents built from labelled formulae $w: A$, which
% captures that the traditional formula $A$ is true at the possible
% world $w$. But this is not satisfactory as it brings the underlying
% (Kripke) semantics of modal logic into the proof theory. Mints in 2005
% announced a new proof using traditional methods~\cite{Min06}.
%
The issue was recently resolved by Gor\'e and
Ramanayake~\cite{gr-valentini} who gave a pen-and-paper proof
that Moen is incorrect, and
that Valentini's proof using multisets is \textit{mostly} okay.
%referee objected to following sentence
%Many such examples exist in the literature.
The sequent system \GLS{} for the logic \GL{} as given by Gor\'e and
Ramanayake in \cite{gr-valentini}, like Valentini's, contains explicit
weakening and contraction rules and, modulo these, a typical
(additive) set of rules for the classical logical connectives $\lnot,
\land, \lor, \to$.
The axiom $A \vdash A$ does not require atomic $A$.
Since \GLS\ admits axiom extension, it could have been
formulated using $p \vdash p$, for $p$ atomic.
In fact the general result Theorem~\ref{t-gs2ssr} does not depend on the
axiom or on axiom extension.
The one additional rule \GLR{} which characterises \GL{} is shown below:
\begin{prooftree}
\AxiomC{$\GLRp$}
\RightLabel{\GLR \mbox{ or } \GLRB \mbox{ or } \GLRXB}
\UnaryInfC{$\GLRc$}
\end{prooftree}
% $$
% \frac \GLRp \GLRc ~~ GLR \mbox{ or } GLR(B) \mbox{ or } GLR(X,B)
% $$
The rule is unusual since the principal formula $\Box B$ changes
polarity from conclusion to premise.
When we wish to identify the principal formula, or all of the formulae
involved, we refer to the rule as \GLRB{} or \GLRXB{} respectively.
The full set of rules of \GLS\ is
shown in \cite{gr-valentini}, but
note that our formalisation does not
regard the cut or multicut rules shown below as being part of the
system.
We show a context-sharing cut rule and a context-splitting multicut rule.
Given the contraction and weakening rules, the
context-sharing and context-splitting versions are equivalent;
our formal proofs show the admissibility of a context-splitting multicut rule
where $A$ is not contained in $\Gamma''$ or $ \Delta''$.
\[
\AxiomC{$\Gamma \vdash A, \Delta$}
\AxiomC{$\Gamma, A \vdash \Delta$}
\LeftLabel{(cut)}
\BinaryInfC{$\Gamma \vdash \Delta$}
\DisplayProof
\qquad
\AxiomC{$\Gamma' \vdash A^{n}, \Delta'$}
\AxiomC{$\Gamma'', A^{m} \vdash \Delta''$}
\LeftLabel{(multicut)}
\BinaryInfC{$\Gamma', \Gamma'' \vdash \Delta', \Delta''$}
\DisplayProof
\]
Thus our results will be lemmata of the form:
if $\Gamma \vdash A, \Delta$
and $\Gamma, A \vdash \Delta$ are derivable
then $\Gamma \vdash \Delta$ is derivable,
where ``derivable'' means without using (cut).
\section{Reasoning About Derivations and Derivability}
\label{s-deriv}
% \section{Isabelle} \label{s-isa}
% Isabelle is an automated proof assistant \cite{isabelle-ref}. Its
% meta-logic is an intuitionistic typed higher-order logic, sufficient
% to support the built-in inference steps of higher-order unification
% and term rewriting. Isabelle accepts inference rules of the form
% $\alpha_1, \alpha_2, \ldots, \alpha_n, \Longrightarrow \beta$
% where the $\alpha_i$ and $\beta$ are expressions of the Isabelle
% meta-logic, or are expressions using a new syntax, defined by the
% user, for some ``object logic''. Most users build on one of the
% comprehensive ``object logics'' already supplied, like Isabelle/HOL
% \cite{IsLogHOL}, which is an Isabelle theory based on the higher-order
% logic of Church % \cite{church-types}
% and the HOL system of Gordon \cite{HOL-introduction}. HOL offers
% inductively defined sets, and recursive datatypes, which we use
% extensively.
% COULD DELETE THIS PARA
% In previous work~\cite{dawson-gore-embedding-comparing}, we describe
% our initial attempts to formalise display calculi in various logical
% frameworks and describe why we selected Isabelle/HOL. When we model a
% derivation tree, explicitly, we do so using the capability in HOL of
% defining types like the datatypes of SML, as explained next.
% %Here we return to simpler sequents and seek generality.
\subsection{Deep and Shallow Embeddings} \label{s-dse} In
\cite{dawson-gore-cut-admissibility} we proved cut admissibility for
\dRA, the display calculus for relation algebras. The proof was based
on a proof of Belnap, which applied generally to display calculi whose
inference rules satisfied certain properties. In that paper we
described the formalisation as a ``deep embedding''. We now believe
that to describe a particular formalisation as either a ``deep
embedding'' or a ``shallow embedding'' over-simplifies the issue, as we
now discuss.
In~\cite{dawson-gore-cut-admissibility}, we modelled an explicit derivation tree
in HOL as a recursive structure which consists of a
root sequent (which should be the conclusion of some rule), and an
associated list of (sub)trees (each of which should derive a premise of
that rule).
This is expressed in the following recursive Isabelle datatype declaration:
% The type of explicit derivation trees is a recursive
% datatype, such as:
\begin{verbatim}
datatype 'a dertree = Der 'a ('a dertree list)
| Unf 'a (* unfinished leaf which remains unproved *)
\end{verbatim}
%
We then had to express the property that such a tree is in fact
correctly based on the given rules, and so represents a
valid derivation.
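To make the discussion concrete, here is a minimal Python sketch (ours, not the Isabelle development) of such trees and of the validity property, with a rule modelled as a pair of a premise tuple and a conclusion, and rule application simplified to exact matching of ground sequents:

```python
from dataclasses import dataclass

@dataclass
class Der:                 # a node: root sequent plus list of subtrees
    concl: object
    subs: list

@dataclass
class Unf:                 # unfinished leaf which remains unproved
    concl: object

def valid(tree, rules):
    """A tree is a valid derivation iff it has no Unf leaves and every
    node is the conclusion of some rule whose premises are exactly the
    conclusions of its subtrees (ground rules, exact matching)."""
    if isinstance(tree, Unf):
        return False
    prems = tuple(sub.concl for sub in tree.subs)
    return (prems, tree.concl) in rules and all(valid(s, rules) for s in tree.subs)

# hypothetical ground rule instances: two axioms and a conjunction rule
example_rules = {((), "A"), ((), "B"), (("A", "B"), "A&B")}
```

Here \texttt{valid} plays the role of the well-formedness check described above; any tree containing an \texttt{Unf} leaf is rejected.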
%
We modelled a sequent rule as an object in HOL consisting of a list of
premises and a conclusion, each of which was a sequent in the language
of the logic (of relation algebras). Notably, our language of formulae
included ``variables'' which could be instantiated with formulae,
so we defined functions for such substitutions.
This was necessary because we had to express conditions on the rules such
as that a variable appearing in the premises also appears in the conclusion.
It is much more common to let the variables in a rule be the variables of the
theorem prover, and for the theorem prover to do the substitution.
%
Thus it is more accurate to say that in
\cite{dawson-gore-cut-admissibility}, we used a deep embedding for
\emph{derivations}, \emph{rules} and \emph{variables}, since we
modelled each of these features of the proof-theory explicitly rather
than identifying them with
%the same
analogous features of Isabelle.
In alternative (``shallow derivations'') approaches to work of this type,
the set of derivable sequents can be modelled as an inductively defined set,
%\texttt{ders} (say),
% (as we describe in \S\ref{s-dp}).
and there is no term representing the derivation tree as such,
although the steps used in proving that a specific sequent is derivable
could be written in the form of a derivation tree. That is,
% where the derivation tree does not exist, but it ``happens'' in the
% course of proving that a specific sequent is derivable. That is,
derivability in the sequent calculus studied equates to derivability
in Isabelle, giving a shallow embedding of {\em derivations}.
We now briefly describe the functions we used to describe
derivability. More details are in Appendix~\ref{a-gdp},
and a more expository account is given in \cite{gore-india}.
\begin{verbatim}
types 'a psc = "'a list * 'a" (* single step inference *)
consts
derl, adm :: "'a psc set => 'a psc set"
derrec :: "'a psc set => 'a set => 'a set"
\end{verbatim}
\noindent
An inference rule of type \texttt{'a psc} is a list of premises % \texttt{ps}
and a conclusion. Then
\texttt{derl rls} is the set of
rules derivable from the rule set \texttt{rls},
\texttt{adm rls} is the set of
admissible rules of the rule set \texttt{rls},
and
\texttt{derrec rls prems}
is the set of sequents derivable using rules \texttt{rls}
from the set \texttt{prems} of premises. These were defined separately
using Isabelle's package for inductively defined sets as below.
Thus, using shallow derivations, the rules of the sequent calculus can
either be supplied as an explicit parameter, as in \texttt{derrec},
or they can be hard-coded in the clauses of the inductive definition of
the set of derivable sequents, as in \texttt{ders}.
% So $\mathtt{(ps, c) \in \derl~\rls}$ reflects the
% shape of a derivation tree: \texttt{ps} is an (ordered)
% \emph{list} of exactly the
% unproved leaf sequents of the tree,
% % premises used, in the correct order,
% whereas $\mathtt{c \in \derrec\ \rls\ prems}$ holds
% for any superset of the \emph{set} of the premises used.
% % even if the \emph{set} of premises \texttt{prems}
% % contains superfluous sequents.
%
%
\begin{description}
\item[\rm Shallow Embedding of Derivations and Deep Embedding of Rules:] \ \\[0.5em]
\begin{tabular}[c]{l}
$(\{\Gamma \vdash P, ~~\Gamma \vdash Q\},
\;\; \Gamma \vdash P \land Q) \in \mathtt{rules}$ \quad (etc for
other rules) \\[0.5em]
$ ~c \in \mathtt{prems} \Longrightarrow c \in \mathtt{derrec~rules~prems}$ \\
$ [|~(ps, c) \in \mathtt{rules} ~;~
ps \subseteq \mathtt{derrec~rules~prems}~|]
\Longrightarrow c \in \mathtt{derrec~rules~prems}$
\\[0.5em]
\end{tabular}
\item[\rm Shallow Embedding of Derivations and Shallow Embedding of Rules:] \ \\[0.5em]
\begin{tabular}[c]{l}
$ ~c \in \mathtt{prems} \Longrightarrow c \in \mathtt{ders~prems}$
\\
$[|~\Gamma \vdash P \in \mathtt{ders~prems} ~;~
\Gamma \vdash Q \in \mathtt{ders~prems}~|]
\Longrightarrow \Gamma \vdash P \land Q \in \mathtt{ders~prems}$
\end{tabular}
\end{description}
% These functions give a
% deep embedding of rules and shallow embedding of derivations.
% following now earlier
% This framework is general in that a rule merely consists of ``premises''
% and a ``conclusion'', and is independent of whether the things derived
% are formulae, sequents, or other constructs, but we will refer to them
% as sequents.
% In the former case, the set of derivable sequents (\texttt{derrec})
% from premises \texttt{prems}
% is defined in terms of \texttt{rules}, the set of rules.
% A rule $(ps, c)$ is a set or list $ps$ of premises, and a conclusion $c$.
The first clause for \texttt{derrec} says that each premise $c$ in
\texttt{prems} is derivable from \texttt{prems},
and the second says that a sequent $c$ ``obtained'' from a set of derivable
sequents $ps$ by a rule $(ps, c)$ is itself derivable. The set of
rules \texttt{rules} is a parameter.
%
The first clause for \texttt{ders} also says that each premise $c$
in \texttt{prems} is derivable from \texttt{prems}. The rules
however are no longer a parameter but are hard-coded
as clauses in the definition of \texttt{ders} itself.
Thus, with a shallow embedding of derivations, we have a choice of
either a deep or a shallow embedding of {\em rules}. It would also be
possible to combine our deep embedding of derivations
(\texttt{dertree}) with a shallow embedding of rules by encoding the
rules in the function which checks whether a derivation tree is valid.
Note however, that when we use a deep embedding of derivations in
Lemma~\ref{l-derrec-valid}, our definition of validity is
parameterised over the set of rules.
Our framework is generic
in that a rule merely consists of ``premises'' and a ``conclusion'',
and is independent of whether the things derived
are formulae, sequents, or other constructs,
but we will refer to them as sequents.
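As an illustration of \texttt{derrec} with a deep embedding of rules, the following Python sketch (ours; it assumes a finite set of ground rule instances, whereas an Isabelle rule set may be infinite) computes the inductively defined set as a least fixpoint:

```python
def derrec(rules, prems):
    """The least set containing prems and closed under rules, where each
    rule is a (premise-tuple, conclusion) pair; computed as a fixpoint."""
    derivable = set(prems)
    changed = True
    while changed:
        changed = False
        for ps, c in rules:
            if c not in derivable and all(p in derivable for p in ps):
                derivable.add(c)
                changed = True
    return derivable

# hypothetical ground instances: axioms |- P and |- Q, and a (|- /\) step
example_rules = {((), "|-P"), ((), "|-Q"), (("|-P", "|-Q"), "|-P&Q")}
```

Then \texttt{derrec example\_rules \{\}} contains all three sequents, while the first clause alone already makes every premise derivable from itself.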
%where \textit{ders} means the set of derivable sequents:
% \begin{gather}
% (\{P, Q\}, {P \land Q}) \in \mathit{rules} \tag{deep}
% \qquad
% \frac{ (ps, c) \in \mathit{rules} \quad ps \subseteq \mathtt{ders}}
% { c \in \mathtt{ders}}
% \\[1ex]
% \frac {P \in \mathtt{ders} \qquad Q \in \mathtt{ders}}
% {P \land Q \in \mathtt{ders}} \tag{shallow}
% \end{gather}
%
% In general, however, a deep embedding of one feature can be combined with
% a shallow embedding of another.
Our experience is that shallow embeddings generally permit easier
proofs, but sometimes they are inadequate to express a desired
concept. For example, with a deep embedding of rules it is easy to
express that one set of rules is a subset of another set,
but with a shallow embedding this is not possible.
With a deep
embedding of derivation trees it is easy to express that one property
of derivation trees is the conjunction of two other properties, or
that a derivation tree has a particular height,
since each such tree has an explicit representation as a term,
whereas to express
such things using \texttt{ders} or \derrec\ (as above), one would
have to redefine these predicates incorporating the particular
properties of interest. Indeed, in this work (see \S\ref{s-caglr}) we
discuss how we found a shallow embedding of derivability inadequate,
and we describe there how we ``mix and match'' the various styles as required.
\comment{ not significant
We have developed the code on which
\cite{dawson-gore-cut-admissibility} was based to reflect the fact
that Belnap's original proof applied generally to display calculi
satisfying certain conditions. Some of these conditions required a
deep embedding of variables (and, therefore, of the rules), for
example, that no structure variable appears twice in the conclusion of
the statement of a rule. For others, a shallow embedding of variables
was adequate, since the condition applied equally to the statement of
the rule and to all its substitution instances, for example that a
premise of a rule contains no variable not contained in the
conclusion.
}
% In \cite{dawson-gore-machine-checked-strong-normalisation}, we
% considered the individual steps involved in changing a derivation
% containing cuts to a derivation not containing cuts. The explicit
% manipulation of derivation trees, such that each change makes a tree
% ``smaller'' in some sense, made it necessary to have a deep embedding
% of derivation trees.
\subsection{Properties of Our Generic Derivability Predicates} \label{s-dp}
We obtained the expected results linking \texttt{derl} and \texttt{derrec},
and a number of results expressing transitivity of derivation and the
results of derivation using derived rules, of which the most elegant are:
\begin{verbatim}
derl_deriv_eq : "derl (derl ?rls) = derl ?rls"
derrec_trans_eq : "derrec ?rls (derrec ?rls ?prems)
= derrec ?rls ?prems"
\end{verbatim}
\begin{description}
\item[\texttt{derl\_deriv\_eq}] states that
derivability using derived rules implies derivability using the original rules;
\item[\texttt{derrec\_trans\_eq}] states that
derivability from derivable sequents implies derivability
from the original premises.
\end{description}
A simplified version of the induction principle
generated by the definition of the inductive set \derrec\ is as follows:
$$
\frac {x \in derrec\ rls\ prems \quad
\forall c \in prems. P\ c \quad
\forall (ps, c) \in rls. (\forall p \in ps. P\ p) \Rightarrow P\ c}
{P\ x}
$$
The principle says that if $P$ holds for every premise in $prems$, and
each rule preserves the property $P$ from its premises to its conclusion,
then $P$ holds for every sequent derivable from $prems$,
even though there is no explicit derivation as such.
We contrast this principle
with induction on the height of derivations where it is
possible, and sometimes necessary, to transform a sub-tree in
some height-preserving (or height-reducing) way, and
assume that the transformed tree has property $P$. Such
transformations are not available when using the induction principle
above.
%, and were not needed in our earlier work.
Where we have a property of two derivations, such as
cut-admissibility, we need a more complex induction principle
\textbf{Ind} which we
derived,
%where the left tree derives $cl$ and the right tree derives
%$cr$
though, again, there are no explicit representations of derivation trees:
% (limited to the case
% where the set \textit{prems} of ultimate assumptions is empty):
$$
\begin{array}{c}
cl \in derrec\ rlsl\ \{\} \qquad cr \in derrec\ rlsr\ \{\} \\
\forall (lps, lc) \in rlsl. \forall (rps, rc) \in rlsr. \hfill \\
\hfill \qquad \qquad
(\forall lp \in lps. P~lp~rc) \land (\forall rp \in rps. P~lc~rp)
\Rightarrow P~lc~rc \\
\hline { P\ cl\ cr}
\end{array}
$$
\textbf{Ind:} Suppose $cl$ and $cr$ are the conclusions of the left
and right subderivations. Assume that for every rule pair $(lps, lc)$
and $(rps, rc)$, if each premise $lp$ in $lps$ satisfies $P~lp ~rc$, and each
premise $rp$ in $rps$ satisfies $P~lc~rp$, then $P~lc~rc$ holds. Then we
can conclude that $P~cl~cr$ holds.
Finally, $(ps, c)$ is an admissible rule iff derivability of all
premises in $ps$ implies derivability of $c$:
$(ps, c) \in \adm\ \rls \iff
(ps \subseteq \derrec\ \rls\ \{\} \ \Rightarrow \ c \in \derrec\ \rls\ \{\})$.
We obtained the following four results, which were surprisingly tricky
because \texttt{adm} is not monotonic
(since any rule with a premise is in \texttt{adm} $\{\}$).
For example, the last of these says that a derived rule,
derived from the admissible rules, is itself admissible.
\begin{verbatim}
"derl ?rls <= adm ?rls" "adm (adm ?rls) = adm ?rls"
"adm (derl ?rls) = adm ?rls" "derl (adm ?rls) = adm ?rls"
\end{verbatim}
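The non-monotonicity of \texttt{adm} can be seen concretely in a toy Python model (ours, restricted to finite sets of ground rules): with the empty rule set, a rule with an underivable premise is vacuously admissible.

```python
def derivable(c, rules):
    """c is in derrec rules {}: a least-fixpoint check over ground rules."""
    derived, changed = set(), True
    while changed:
        changed = False
        for ps, concl in rules:
            if concl not in derived and all(p in derived for p in ps):
                derived.add(concl)
                changed = True
    return c in derived

def admissible(ps, c, rules):
    """(ps, c) is in adm rules: if all premises are derivable, so is c."""
    return (not all(derivable(p, rules) for p in ps)) or derivable(c, rules)
```

For instance, the rule with premise ``X'' and conclusion ``Y'' is admissible for the empty rule set, because ``X'' is not derivable there; this illustrates why enlarging the rule set can shrink the set of admissible rules.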
\subsection{Sequents, Formulae and the \GLS\ rules} \label{s-gls}
We define a language of formula connectives, formula variables and
primitive (atomic) propositions:
\begin{verbatim}
datatype formula = FC string (formula list) (* formula connective *)
| FV string (* formula variable *)
| PP string (* primitive proposition *)
\end{verbatim}
%
Although the formula connectives are fixed for each logic,
the datatype is more general, using a single constructor \texttt{FC}
for all formula connectives.
We then define (for example) $P \land Q$ as \texttt{FC ''Btimes'' [P, Q]}.
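For illustration (this is our rendering, not the Isabelle text), the datatype and the encoding of $P \land Q$ look as follows in Python:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FC:          # formula connective applied to a tuple of arguments
    name: str
    args: tuple

@dataclass(frozen=True)
class FV:          # formula variable
    name: str

@dataclass(frozen=True)
class PP:          # primitive (atomic) proposition
    name: str

def conj(p, q):
    """P /\\ Q encoded as FC ''Btimes'' [P, Q], following the paper."""
    return FC("Btimes", (p, q))
```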
%
%
A sequent is a pair of multisets of formulae, written
$\Gamma \vdash \Delta$.
% \begin{verbatim}
% datatype 'a sequent =
% Sequent "'a multiset" "'a multiset" ("((_)/ |- (_))" [6,6] 5)
% \end{verbatim}
Given a rule such as $(\vdash \land)$ in the two forms below,
$$
\MC_s = \frac{\vdash A \quad \vdash B} {\vdash A \land B}
\qquad\qquad\qquad\qquad
\MC_e = \frac{X \vdash Y, A \quad X \vdash Y, B} {X \vdash Y, A \land B}
$$
we call $\MC_e$ an \emph{extension} of $\MC_s$, and
we define functions \texttt{pscmap} and \texttt{extend},
where \texttt{pscmap} $f$ applies $f$ to premises and conclusion,
% such that (for example), where $+$ means multiset union,
so, using $+$ for multiset union,
%
\begin{gather*}
\texttt{extend}\ (X \vdash Y)\ (U \vdash V) = (X + U) \vdash (Y + V) \\
\MC_e = \texttt{pscmap}\ (\texttt{extend}\ (X \vdash Y))\ \MC_s
\end{gather*}
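Modelling multisets as Python \texttt{Counter}s (our illustration, not the Isabelle definitions), \texttt{extend} and \texttt{pscmap} can be sketched directly, and one can check that $\MC_e$ is obtained from $\MC_s$ exactly as stated:

```python
from collections import Counter

def extend(ctx, seq):
    """extend (X |- Y) (U |- V) = (X + U) |- (Y + V); + is multiset union."""
    (x, y), (u, v) = ctx, seq
    return (x + u, y + v)

def pscmap(f, rule):
    """Apply f to every premise and to the conclusion of a rule."""
    prems, concl = rule
    return ([f(p) for p in prems], f(concl))

empty = Counter()
skel = ([(empty, Counter(["A"])), (empty, Counter(["B"]))],
        (empty, Counter(["A&B"])))              # the skeleton rule M_s
ctx = (Counter(["X"]), Counter(["Y"]))          # the context X |- Y
ext = pscmap(lambda s: extend(ctx, s), skel)    # the extension M_e
```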
Then we define \texttt{glss}, the set of rules of \GLS{}, via the following sets:
%We define these sets of rules:
\begin{description}
\item[\texttt{glil} and \texttt{glir}:] the unextended left and right
introduction rules, like $\MC_s$ above;
\item[\texttt{wkrls} and \texttt{ctrrls} $A$:] the unextended
weakening and contraction (on $A$) rules;
\item[\texttt{glne}:] all of the above;
\item[\texttt{glr} $B$:] the \GLR(B) rule;
\item[\texttt{glss}:] the axiom $A \vdash A$ (not requiring $A$ to be atomic),
the \GLR(B) rule for all $B$, and all extensions of all rules in \texttt{glne}.
\end{description}
The Isabelle definitions are given in Appendix~\ref{a-gls}.
% Since \GLS\ admits axiom extension, the logic could equally have been
% formulated with the axiom $p \vdash p$, for $p$ atomic.
% In fact the general result Theorem~\ref{t-gs2ssr} doesn't depend on either
% axiom or on axiom extension.
Note that in the \GLR\ rule,
$X$ is a multiset, and $\Box X$ is informal notation for applying
$\Box$ to each member of $X$; this is formalised as \texttt{mset\_map},
used in the formalised \GLR\ rule.
Using a similar notation we write $\Box B^k$ for $(\Box B)^k$,
the multiset containing $k$ copies of $\Box B$.
Development of \texttt{mset\_map} and relevant lemmas is in the source files
\texttt{Multiset\_no\_le.\{thy,ML\}}.
Our results there also show that multisets form a monad;
see Appendix~\ref{a-ms} for details.
Our treatment of sequents and formulae amounts to a deep embedding of
sequents and formulae which is independent of the set of rules. The
implementation in \cite{pfenning-structural-lics94} is a shallow
embedding of sequents, which automatically implies the admissibility of
certain structural rules like contraction and weakening.
\section{An Axiomatic Type Class for Multisets and Sequents} \label{s-axtc}
Isabelle provides a theory of finite multisets with an ordering
which we did not use;
% based on an underlying ordering on members: given multiset $M$,
% we obtain a multiset $N$ less than $M$ by removing an occurrence of some $m$
% from $M$, and replacing it with any number of occurrences of elements
% smaller than $m$.
we defined a different ordering $\leq$ analogous to
$\subseteq$ for sets: $N \leq M$ if, for all $x$, $N$ contains no more
occurrences of $x$ than does $M$.
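Concretely, with multisets modelled as Python \texttt{Counter}s (our illustration), this ordering reads:

```python
from collections import Counter

def mset_le(n, m):
    """N <= M iff, for every x, N has no more occurrences of x than M."""
    return all(cnt <= m[x] for x, cnt in n.items())
```

Since Python 3.10, \texttt{Counter} also provides this comparison directly as \texttt{n <= m}.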
An axiomatic type class in Isabelle is characterised by a number of axioms,
which hold for all members of a type in the type class.
% The axioms can be used to prove theorems, which then apply
% and can be used efficiently for any type which is in the class.
The multiset operators $\leq$, $+$, $-$ and $0$ have several useful properties,
which are described by the axiomatic type classes \pmz{} and \pmgez.
\comment{
\begin{verbatim}
axclass pm0 < comm_monoid_add, minus
pm0_plus_minus : "A + B - A = B"
pm0_minus_minus : "A - B - C = A - (B + C)"
\end{verbatim}
}
%
For any type in class \pmz, the operations
$+$ and $0$ form a commutative monoid and
% the type allows the $-$ operator, and
the following two properties hold.
\begin{equation*}
A + B - A = B \qquad \qquad A - B - C = A - (B + C)
\end{equation*}
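These identities rely on multiset difference being truncated. A quick Python spot-check (ours; particular values, not a proof) uses \texttt{Counter}, whose subtraction already discards non-positive counts:

```python
from collections import Counter

# Spot-check of the two pm0 axioms on sample multisets.
A, B, C = Counter("aab"), Counter("abc"), Counter("ac")

assert A + B - A == B                 # pm0_plus_minus
assert A - B - C == A - (B + C)       # pm0_minus_minus
```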
%
\comment{
\begin{verbatim}
axclass pm_ge0 < pm0, ord, leq
all_ge0 : "0 <= A"
diff_0_le : "(m - n = 0) = (m <= n)"
le_add_diff_eq :
"B <= A ==> B + (A - B) = A"
pm_ge0_less_le :
"(x < y) = (x <= y & x ~= y)"
pg_leq_def : "a [= b == a <= b"
\end{verbatim}
}
%
We then define a class \pmgez\
which also has an $\leq$ operator and a smallest element $0$,
in which the axioms of \pmz\ and the following hold.
\begin{gather*}
0 \leq A \qquad \qquad
B \leq A \Rightarrow B + (A - B) = A \\
m \leq n \Leftrightarrow m - n = 0 \qquad
x < y \Leftrightarrow x \leq y \land x \not= y \qquad
a \sqsubseteq b \Leftrightarrow a \leq b
\end{gather*}
The last three axioms could instead be given as definitions,
except for types where $\leq$, $<$ or $\sqsubseteq$
are already defined.
% but for natural numbers, $\leq$ and $<$ are already defined.
We define $\sqsubseteq$ as a synonym for $\leq$, because
Isabelle's lattice type class uses $\sqsubseteq$ as the order operator.
% We explain $\sqsubseteq$ below.
\begin{lemma}
Multisets are in \pmz\ and \pmgez\ using our definition of $\leq$,
and, if $\Gamma$ and $\Delta$ are of any type in the classes
\pmz\ or \pmgez, then so is sequent $\Gamma \vdash \Delta$.
\end{lemma}
Isabelle has ``simplification procedures'',
which will (for example) simplify $a - b + c + b$ to $a + c$ for integers,
or $a + b + c - b$ to $a + c$
for integers \emph{or} naturals.
The naturals obey the axioms above.
% We have been able to apply almost all of the ones which apply to naturals,
We have been able to apply the simplification procedures for naturals,
other than those involving the successor function \texttt{Suc},
to types of the classes \pmz\ and \pmgez.
This was a great help in carrying out the proofs discussed in
\S\ref{s-caglr}, especially since
$X \vdash Y$ can be derived from
$X' \vdash Y'$ by weakening if and only if
$X \vdash Y \leq X' \vdash Y'$.
It is easy to show that, in the obvious way, multisets form a lattice.
In fact we found the interesting result
that the axioms of \pmgez\ are sufficient to give a lattice
(with $\sqsubseteq$ as the order operator,
% as in Isabelle's lattice type class,
defined as $a \sqsubseteq b $ iff $ a \leq b$), so we have:
% Isabelle's lattice type class uses $\sqsubseteq$ as the order operator,
% so, defining $a \sqsubseteq b $ iff $ a \leq b$,
\begin{lemma}
Any type of class \pmgez\ forms a lattice,
using the definitions \\
% {$c \land d = c - (c - d) $} and $ c \lor d = c + (d - c)$.
$$c \land d = c - (c - d) \qquad \qquad c \lor d = c + (d - c)$$
\end{lemma}
% \begin{proof}
% The definitions of meet and join are shown below:
% \begin{verbatim}
% "pg_meet ?c ?d == ?c - (?c - ?d)" "pg_join ?c ?d == ?c + (?d - ?c)"
% \end{verbatim}
From these definitions it is possible (at some length) to prove
the axioms for a lattice
and so any type in the class \pmgez\ is also in Isabelle's class
\texttt{lattice}.
% \qed
% \end{proof}
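To illustrate this lemma, with truncated subtraction the given definitions of meet and join instantiate to \texttt{min} and \texttt{max} on the naturals, and to multiset intersection and union on multisets; a Python sketch (an illustration only, not the Isabelle proof):

```python
from collections import Counter

def monus(a, b):
    # truncated subtraction on the naturals
    return max(a - b, 0)

# naturals: c - (c - d) is min, and c + (d - c) is max
for c in range(8):
    for d in range(8):
        assert monus(c, monus(c, d)) == min(c, d)
        assert c + monus(d, c) == max(c, d)

# multisets: Counter '-' is already truncated, so the same
# definitions give multiset intersection (&) and union (|)
X, Y = Counter("aabc"), Counter("abbd")
assert X - (X - Y) == X & Y
assert X + (Y - X) == X | Y
```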
%
The source files for this development are \texttt{pmg*.\{thy,ML\}}.
\section{Capturing the Core of Cut-Admissibility Proofs} \label{s-cagen}
% Many cut-elimination proofs proceed via two main steps. The first step
% is to transform the given sub-derivations (subtrees) so that the
% cut-formula is principal in the resulting left and right
% sub-derivations, meaning that it is introduced as the principal
% formula of a logical rule. The second step ``reduces''
% such principal cuts into cuts which are ``smaller'' with respect to
% some well-founded ordering. We now describe how we captured these
% core steps in a very general form, widely applicable to many
% different sequent calculi.
Many cut-elimination proofs proceed via two main phases. The first
phase transforms the given derivations using several ``parametric''
steps until the cut-formula is the principal formula of the final
rule in the resulting sub-derivations above the cut.
(In the diagram for $\MC_e$ above, for example,
a parametric formula in the rule application is one within
the $X$ or $Y$, but $A \land B$ is principal;
a parametric step is used when the cut-formula is parametric
in the bottom rule application of a sub-derivation above the cut.)
The ``principal cut'' is then
``reduced'' into cuts which are ``smaller'' in some well-founded
ordering such as size of cut-formula. We describe how we captured
this two-phase structure of cut-elimination proofs, and present a
widely applicable result that a parametric step is possible under
certain conditions.
In \S\ref{s-dp} we mentioned the induction principle \textbf{Ind} used for
deriving cut-admissibility, or indeed any property $P$ of pairs of
subtrees. In the diagram below, to prove $P (\MC_{l}, \MC_{r})$, the
induction hypothesis is that $P (\MQ_{li}, \MC_{r})$ and $P (\MC_{l},
\MQ_{rj})$ hold for all $i$ and $j$:
% \begin{center}
% \bpf
% \A "$\MQ_{l1} \ldots \MQ_{ln}$"
% \U "$\MC_{l}$" "$\rho_l$"
% \A "$\MQ_{r1} \ldots \MQ_{rm}$"
% \U "$\MC_{r}$" "$\rho_r$"
% \B "?" "(\textit{cut ?})"
% \epf
% \end{center}
%
\begin{center}
\begin{prooftree}
\AxiomC{$\MQ_{l1} \ldots \MQ_{ln}$}
\RightLabel{$\rho_l$}
\UnaryInfC{$\MC_{l}$}
\AxiomC{$\MQ_{r1} \ldots \MQ_{rm}$}
\RightLabel{$\rho_r$}
\UnaryInfC{$\MC_{r}$}
\RightLabel{(\textit{cut ?})}
\dottedLine
\BinaryInfC{?}
\end{prooftree}
\end{center}
A proof of $P (\MC_{l}, \MC_{r})$ using this induction hypothesis
inevitably proceeds by cases on the actual rules $\rho_l$ and
$\rho_r$, and for a cut-formula $A$, on whether it is principal
in either or both of $\rho_l$ and $\rho_r$. But we also use
induction on the size of the cut-formula, or, more generally, on some
well-founded relation on formulae.
%
% So we actually consider a property $P\ A\ (cl, cr)$
% where $psl$ are the premises $\MQ_{l1} \ldots \MQ_{ln}$ of rule
% $\rho_l$, and $cl$ is its conclusion $\MC_{l}$, and analogously for
% $\rho_r$, and $A$ is the cut-formula.
So we actually consider a property $P$ of a (cut) formula $A$
and (left and right subtree conclusion) sequents $(\MC_l, \MC_r)$.
In proving $P\ A\ (\MC_{l}, \MC_{r})$,
in addition to the inductive hypothesis above, we assume
that $P\ A'\ (\MD_l, \MD_r)$ holds generally for $A'$ smaller than $A$ and
all ``\texttt{rls}-derivable'' sequents $\MD_l$ and $\MD_r$:
\textit{i.e.} derivable from the empty set of
premises using rules from \texttt{rls}. These intuitions give the
following definition \texttt{gen\_step2ssr} of
a condition which permits one step of the inductive proof.
See Appendix~\ref{a-gen-step2ssr} for reference to related more complex
predicates and theorems.
% where we shorten ``derivable from the empty set of premises using rules
% from \texttt{rls}'' to ``\texttt{rls}-derivable'':
\begin{definition}[\texttt{gen\_step2ssr}] \label{d-gen-step2ssr}
For a formula $A$, a property $P$,
a subformula relation \texttt{sub},
a set of rules \texttt{rls},
inference rule instances $\MR_l = {(\psl, \MC_l)}$
and $\MR_r = {(\psr, \MC_r)}$,
\texttt{gen\_step2ssr} $P\ A$ \texttt{sub rls} $(\MR_l, \MR_r)$
means: %\\%[0.5em]
% if %all the following hold:
\begin{description}
\item[if] forall ${A'}$ such that $(A', A) \in \texttt{sub}$
and all \texttt{rls}-derivable sequents
$\MD_l$ and $\MD_r$,
${P\ A'\ (\MD_l, \MD_r)}$ holds
\item[ ] and for each ${\MP_{li}}$ in ${\psl}$,
% ${\MP}$ is \texttt{rls}-derivable and
${P\ A\ (\MP_{li}, \MC_r)}$ holds
\item[ ] and for each ${\MP_{rj}}$ in ${\psr}$,
% \texttt{\MP} is \texttt{rls}-derivable and
${P\ A\ (\MC_l, \MP_{rj})}$ holds
% \item ${\MC_l}$ and ${\MC_r}$ are \texttt{rls}-derivable.
\item[then] ${P\ A\ (\MC_{l}, \MC_{r})}$ holds.
\end{description}
\end{definition}
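To make the shape of Definition~\ref{d-gen-step2ssr} concrete, here is a small executable paraphrase in Python, with \texttt{sub} and the set of derivable sequents given as finite enumerations (an assumption made purely for illustration; the Isabelle predicate is defined logically, not by enumeration):

```python
def gen_step2ssr(P, A, sub, derivable, Rl, Rr):
    """One step of the double induction of gen_step2ssr.
    Rl = (psl, Cl) and Rr = (psr, Cr) are rule instances;
    sub(A) enumerates the formulae below A in the subformula
    relation, and derivable enumerates the rls-derivable sequents."""
    psl, Cl = Rl
    psr, Cr = Rr
    # P A' (Dl, Dr) for all smaller A' and derivable Dl, Dr
    hyp_smaller = all(P(A2, (Dl, Dr))
                      for A2 in sub(A)
                      for Dl in derivable
                      for Dr in derivable)
    hyp_left = all(P(A, (Pl, Cr)) for Pl in psl)
    hyp_right = all(P(A, (Cl, Pr)) for Pr in psr)
    # "if all hypotheses hold then P A (Cl, Cr) holds"
    return not (hyp_smaller and hyp_left and hyp_right) or P(A, (Cl, Cr))

# with no premises and no smaller formulae the hypotheses hold
# vacuously, so the step holds iff P holds of the conclusions
assert gen_step2ssr(lambda A, p: True, "A",
                    lambda a: [], [], ([], "Cl"), ([], "Cr"))
assert not gen_step2ssr(lambda A, p: False, "A",
                        lambda a: [], [], ([], "Cl"), ([], "Cr"))
```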
The main theorem \texttt{gen\_step2ssr\_lem} below for proving an
arbitrary property $P$ states that if the step of the inductive proof
is possible for all cases of final rule instances
$\MR_l$ and $\MR_r$
% ${(\psl, \MC_l)}$ and ${(\psr, \MC_r)}$
on each side, then $P$ holds in all cases.
\begin{theorem}[\texttt{gen\_step2ssr\_lem}] \label{t-gen-step2ssr-lem}
If
\begin{itemize}
\item \texttt{A} is in the well-founded part of the subformula relation
\texttt{sub},
\item sequents $\MS_l$ and $\MS_r$ are \texttt{rls}-derivable, and
\item for all formulae $A'$, and all rules $\MR_l$ and $\MR_r$,
% ${(\psl, \MC_l)}$ and ${(\psr, \MC_r)}$,
our induction step condition
\texttt{gen\_step2ssr} $P\ A'\ \texttt{sub rls}\ (\MR_l, \MR_r)$
holds
\end{itemize}
then ${P\ A\ (\MS_l, \MS_r)}$ holds.
\end{theorem}
This enables us to split up an inductive proof, by showing,
separately, that \texttt{gen\_step2ssr} holds for particular cases of
the final rule instances $\MR_l$ and $\MR_r$ on each side.
For
example, the inductive step for the case where the cut-formula $A$ is
parametric, not principal, on the left is encapsulated in the
following theorem where
\mbox{\texttt{prop2 mar erls} $A\ (\MC_l, \MC_r)$} means
that the conclusion of a multicut on \texttt{A} whose premises are
$\MC_l$ and $\MC_r$ is derivable using rules \texttt{erls}.
\begin{theorem}[\texttt{lmg\_gen\_steps}]~\label{t-gs2ssr}
For any relation \texttt{sub} and any rule set \texttt{rls},
given an instance of multicut with left and right subtrees
ending with rules $\MR_l$ and $\MR_r$:
\begin{description}
\item[if] weakening is admissible for the rule set \texttt{erls},
\item[ ] and all extensions of some rule $(\MP, X \vdash Y)$ are in the rule set
\texttt{erls},
\item[ ] and $\MR_l$ is an extension of $(\MP, X \vdash Y)$,
\item[ ] and the cut-formula $A$ is not in $Y$
(meaning that $A$ is parametric on the left)
\item[then] \texttt{gen\_step2ssr (prop2 mar erls)} $A$ \texttt{sub rls}
$(\MR_l, \MR_r)$ holds.
\end{description}
\end{theorem}
Theorem~\ref{t-gs2ssr} codifies multi-cut elimination for parametric
cut-formulae, and applies generally to many different
calculi since it holds independently of the values of \texttt{sub} and
\texttt{rls}. Of course, for a system with explicit weakening rules,
such as \GLS{}, weakening is {\it a fortiori} admissible.
As we note later, the proof
for \GLS\ involves one really difficult case and a lot of fairly
routine cases. In dealing with the routine cases, automated theorem
proving has the benefit of ensuring that no detail is overlooked.
Moreover,
%We also have the fact that,
as in this example, we often have more
general theorems that apply directly to a set of rules such as \GLS.
Notice that all of this section has used a shallow embedding of
derivations since no explicit derivation trees were required.
% Of course, for a system with explicit weakening rules, such as
% \GLS{}, weakening is {\it a fortiori} admissible. But
% %we have made our results as general as we can.
% this result also applies generally to many different calculi since
% this theorem holds independently of the
% values of \texttt{sub} and \texttt{rls}!
% This result codifies parametric proof steps, and is quite general
% enough to apply in the proof for \GLS. As we note later, the proof
% for \GLS\ involves one really difficult case and a lot of fairly
% routine cases. In dealing with the routine cases, automated theorem
% proving has the benefit of ensuring that no detail is overlooked; in
% addition we have the fact that, as in this example, we often have more
% general theorems that apply directly to a set of rules such as \GLS.
% Notice that all of this section has used a shallow embedding of
% derivations since no explicit derivation trees were required.
\section{The Proof of Cut-Admissibility for \GLS} \label{s-caglr}
Valentini's proof of cut-admissibility for \GLS{} uses a triple
induction on the size of the cut-formula, the heights of the
derivations of the left and right premises of cut, and a third
parameter which he called the ``width''.
Roughly speaking, the width of a cut-instance is the number of \GLR{}
rule instances above that cut which contain a parametric ancestor of
the cut-formula in their conclusion. The Gor\'e and
Ramanayake~\cite{gr-valentini} pen-and-paper
proof follows Valentini
%in using these three measures,
but gives a constructive way to calculate the width of
a cut by inspecting the branches of the left and right sub-derivations
of a cut rule instance.
% \begin{figure}[t]
% \centering
% \normalsize
% \bpf
% \A "$\displaystyle \mu\left\{\ \frac{\Pi_l}{\GLRp}\right.$"
% \U "\GLRc" "\GLR(B)"
% \A "$\Pi_r$"
% \U "${\Box B^k, Y \vdash Z}$" "($\rho$)"
% \B [2em] "${\Box X, Y \vdash Z}$" "(\textit{multicut})"
% \epf
% \caption{A multicut on cut formula $\Box B$ where $\Box B$ is
% left-principal via \GLR{}}
% \label{f-muxbn}
% \end{figure}
\begin{figure}[t]
\centering
\normalsize
\begin{prooftree}
\AxiomC{$\displaystyle \mu\left\{\ \frac{\Pi_l}{\GLRp}\right.$}
\RightLabel{\GLRB}
\UnaryInfC{\GLRc}
\AxiomC{$\Pi_r$}
\RightLabel{$\rho$}
\UnaryInfC{${\Box B^k, Y \vdash Z}$}
\dottedLine
\RightLabel{(\textit{multicut ?})}
\BinaryInfC{${\Box X, Y \vdash Z}$}
\end{prooftree}
\caption{A multicut on cut formula $\Box B$ where $\Box B$ is
left-principal via \GLR{}}
\label{f-muxbn}
\end{figure}
As usual, the proof of cut-admissibility for \GLS{} proceeds by
considering whether the cut-formula is principal in the left or right
premise of the cut, or principal in both. The crux of the proof is a
``reduction'' when the cut-formula is of the form $\Box B$ and is
principal in both the left and right premises of the cut.
% as shown below:
% %
% \begin{prooftree}
% \AxiomC{$X, \Box X, \Box B \vdash B$}
% \RightLabel{GLR(B)}
% \UnaryInfC{$\Box X \vdash \Box B$}
% %
% \AxiomC{$Y, \Box Y, B, \Box B, \Box C\vdash C$}
% \RightLabel{GLR(C)}
% \UnaryInfC{$\Box B, \Box Y \vdash \Box C$}
% %
% \dottedLine
% \RightLabel{cut}
% \BinaryInfC{$\Box X, \Box Y \vdash \Box C$}
% \end{prooftree}
% %
The solution is to replace this cut on $\Box B$ by cuts which are
``smaller'' either because their cut-formula is smaller, or because
their width is smaller. In reality, most of the work involves a cut
instance which is only left-principal as shown in
Figure~\ref{f-muxbn}, and most of this section is devoted to showing how we
utilised our general methods to formalise these ``reductions'' as
given by Gor\'e and Ramanayake~\cite{gr-valentini}. But there are some
important differences which we now explain.
As explained in \S\ref{s-dp}, our general methods do not model
derivation trees explicitly, since we use a shallow embedding. So our proof
uses induction on the size of the cut-formula and on the fact of
derivation, rather than on the size of the cut-formula and the height
of derivation trees.
Also, we actually proved the admissibility of ``multicut'':
that is, if {$\Gamma' \vdash A^{n}, \Delta'$} and
{$\Gamma'', A^{m} \vdash \Delta''$} are both cut-free derivable,
then so is {$\Gamma', \Gamma'' \vdash \Delta', \Delta''$}.
This avoids problems in the cases where these
sequents are derived using contraction on $A$.
In all other aspects, our proof of cut-admissibility for \GLS\ is
based on that given by Gor\'e and Ramanayake~\cite{gr-valentini}. In
particular, although we do not actually require
\cite[Lemma~19]{gr-valentini}, we use the construction in that lemma,
which is fundamental to overcoming the difficulties of adapting
standard proofs to \GLS, as we explain shortly. Consequently, our
proof uses the idea of induction on the \emph{width} as defined in
\cite{gr-valentini}, although as we shall see, our proof is expressed
in terms of \delz, which approximates to the \delzs\ of
\cite{gr-valentini}, not width {\em per se}.
To cater for width/\delzs, we could have altered our shallow embedding,
\texttt{derrec}, but that would destroy the modularity of our
approach. Instead, we defined our \delz{} by using the datatype
\texttt{dertree} from \S\ref{s-dse} to represent explicit derivation
tree objects. These trees, and the general lemmas about them, are
similar to the trees of \cite{dawson-gore-cut-admissibility}. Thus we
``mix and match'' a deep embedding of derivation trees with a shallow
embedding of inductively defined sets of derivable sequents.
To ensure the correctness of our ``mixing and matching'' we needed the
following relationship between our definitions of derivability according
to the two embeddings.
A \textit{valid} tree is one whose inferences are in the set of rules
and which as a whole has no premises.
\begin{lemma} \label{l-derrec-valid}
Sequent $X \vdash Y$ is derivable, shallowly,
from the empty set of premises using rules \texttt{rls}
(ie, is in \texttt{derrec rls} $\{\}$)
%(ie, according to the definitions of \S\ref{s-dp})
iff some explicit derivation tree
\texttt{dt} is valid wrt.\ \texttt{rls} and has a conclusion
$X \vdash Y$.
\begin{verbatim}
"(?a : derrec ?rls {}) = (EX dt. valid ?rls dt & conclDT dt = ?a)"
\end{verbatim}
\end{lemma}
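The content of Lemma~\ref{l-derrec-valid} can be illustrated on a toy rule set in Python (an illustration only; here sequents are atoms and rules are premise-list/conclusion pairs):

```python
# shallow embedding: the least set closed under the rules
def derrec(rules, prems):
    derivable = set(prems)
    changed = True
    while changed:
        changed = False
        for ps, c in rules:
            if c not in derivable and all(p in derivable for p in ps):
                derivable.add(c)
                changed = True
    return derivable

# deep embedding: a tree (conclusion, subtrees) is valid if every
# inference is a rule instance and the tree as a whole has no premises
def valid(rules, tree):
    concl, subs = tree
    return (any(ps == [t[0] for t in subs] and c == concl
                for ps, c in rules)
            and all(valid(rules, t) for t in subs))

rules = [([], "p"), (["p"], "q"), (["p", "q"], "r")]
tp = ("p", [])
tq = ("q", [tp])
tr = ("r", [tp, tq])
# "r" is shallowly derivable, and tr is a valid tree concluding "r"
assert "r" in derrec(rules, set())
assert valid(rules, tr)
```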
% The provability logic \GL\
% and its sequent system \GLS, including the \GLR\ rule,
% are described in \S\ref{s-intro}.
% It is clear that the \GLR\ rule implies the transitivity rule
% (by weakening $\Box B$ from the premise of \GLR),
% as shown on the left below;
% the rule on the right holds,
% as can be shown using \textit{(cut)} by the derivation in Figure~\ref{f-dlxxc},
% or by consideration of the semantics.
% $$
% \begin{array}{c}
% \lxxc \\ \hline \GLRp \\ \hline \GLRc
% \end{array}
% \qquad \qquad
% \frac \GLRp \lxxc
% $$
We now define a function \delz{} % in Figure~\ref{fig:delz}
which is closely related to
$\partial^0$ and the width of a cut of \cite{gr-valentini}, although
we can avoid using the annotated derivations of
\cite{gr-valentini}.
\begin{definition}[\delz] \label{d-del0}
For derivation tree \texttt{dt} and formula $B$,
define \delz\ $B$ \texttt{dt}: % is as follows:
% \delz\ $B$ \texttt{dt} is given by:
\begin{itemize}
\item if the bottom rule of \texttt{dt} is
\GLR($Y,A$) (for \emph{any} $Y,A$),
then \delz\ $B$ \texttt{dt} is 1 if $\Box B$ is in the antecedent
of the conclusion of \texttt{dt}, and 0 otherwise
\item if the bottom rule of \texttt{dt} is not \GLR, then \delz\ $B$ \texttt{dt}
is obtained by summing \delz\ $B$ \texttt{dt'} over all premise subtrees
\texttt{dt'} of \texttt{dt}.
\end{itemize}
\end{definition}
%
%
Thus, the calculation of \delz\ traces up each branch of an explicit
derivation tree until an instance of the \GLR\ rule is found: it then
counts 1 if $\Box B$ is in the antecedent, meaning that $B$ is in the
$X$ of the statement of \GLR. Where the derivation tree branches
below any \GLR\ rule, the value is summed over the branches.
(When we use \delz, its argument will be the
sub-tree $\mu$ of Figure~\ref{f-muxbn}.)
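A sketch of the \delz\ computation in Python, over a simplified tree type \texttt{(rule, antecedent, subtrees)} standing in for the Isabelle \texttt{dertree} datatype (the representation is hypothetical, chosen only to illustrate Definition~\ref{d-del0}):

```python
def delz(B, dt):
    """Trace up each branch until a GLR inference is found; count 1
    there if Box B is in the antecedent of its conclusion, else 0;
    below any GLR rule, sum over the premise subtrees."""
    rule, antecedent, subs = dt
    if rule == "GLR":
        return 1 if "Box " + B in antecedent else 0
    return sum(delz(B, t) for t in subs)

# a GLR inference above another GLR is not counted: the trace stops
inner = ("GLR", ["Box B"], [])
upper = ("GLR", ["Box B"], [inner])
side  = ("GLR", ["Box X"], [])
root  = ("ctr", ["Box B"], [upper, side])
assert delz("B", root) == 1
```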
We now give a sequence of lemmata which eventually give the
``reduction'' for left-and-right \GLR{}-principal occurrences
of a cut-formula of the form $\Box B$.
\begin{lemma}[\texttt{gr19e}] \label{l-gr19e} If $\mu$ is a valid
derivation tree with conclusion \GLRp{}, and \delz\ $B\ \mu = 0$, then
\lxxc\ is derivable.
\end{lemma}
\begin{proof}
By applying the \GLR\ rule to the conclusion, we can derive \GLRc.
Tracing $\Box B$ upwards in $\mu$, it is parametric in each
inference, except possibly weakening, contraction or the axiom $\Box
B \vdash \Box B$. That is, since \delz\ $B\ \mu = 0$, when we meet a
\GLR\ inference as we trace upwards, $\Box B$ must have already
disappeared (through weakening). So, tracing upwards, we can
change each instance of $\Box B$ to $\Box X$ in the usual way.
The axiom $\Box B \vdash \Box B$ is changed to \GLRc, which is
derivable. Contraction is not problematic since we use, as the
inductive hypothesis, that \emph{all} occurrences of $\Box B$ can be
replaced by $\Box X$. \qed
\end{proof}
To abbreviate the statement of several lemmas, we define a function
\muxbn, based on Figure~\ref{f-muxbn}, which is from~\cite{gr-valentini}.
It concerns a multicut on a cut-formula $\Box B$ which is left-principal
because the bottom rule on the left is \GLRB.
\begin{definition}[\muxbn] \label{d-muxbn}
\muxbn\ $B\ n$ holds iff:
for all instances of Figure~\ref{f-muxbn} (for fixed $B$) such that
\delz\ $B\ \mu \leq n$, the multicut in Figure~\ref{f-muxbn}
is admissible.
\end{definition}
% Intuitively, \texttt{muxbn B n} states that if the left premise
% derivation consists of a derivation $\mu$ whose conclusion
% is the premise of an instance of the \GLRB\ rule, and \delz~\texttt{B}
% $\mu \leq n$, then multicut-admissibility
% holds for GLS (rule set \texttt{glss}) with cut-formula $\Box B$:
% see Figure~\ref{f-muxbn}, which is from~\cite{gr-valentini}.
% % (In the following diagram, $\mu = \pi / \GLRp$).
% That is, the multicut in Figure~\ref{f-muxbn} is admissible.
% \vpf
% \begin{figure}
% \centering
% \normalsize
% \bpf
% \A "$\displaystyle \mu\left\{\ \frac{\pi}{\GLRp}\right.$"
% \U "\GLRc" "\GLR"
% \A "$\sigma$"
% \U "${\Box B^k, U \vdash W}$"
% \B [2em] "${\Box X, U \vdash W}$" "(\textit{multicut})"
% \epf
% \caption{Definition of \muxbn}
% \label{f-muxbn}
% \end{figure}
The next lemma says that multicut admissibility on $B$
implies \muxbn\ $B$ 0.
\begin{lemma}[\texttt{del0\_ca', caB\_muxbn'}] \label{l-caB_muxbn'} If
$\mu$ is a valid derivation tree with conclusion \GLRp, and \delz\
$B\ \mu = 0$, and multicut on $B$ is admissible, and $\Box B^k, Y
\vdash Z$ is derivable, then $\Box X, Y \vdash Z$ is derivable.
That is, if multicut on $B$ is admissible, then \muxbn\ $B$ 0 holds.
\end{lemma}
% \begin{verbatim}
% caB_muxbn' : "mas glss ?B = UNIV ==> muxbn ?B 0"
% \end{verbatim}
\begin{proof}
\GLRc{} is derivable from \GLRp{} via \GLRXB. By
Lemma~\ref{l-gr19e}, \lxxc\ is derivable. The rest of the proof is
by induction on the derivation of $\Box B^k, Y \vdash Z$, in effect,
by tracing relevant occurrences of $\Box B$ up that
derivation. Weakening and contraction involving $\Box B$ are not
problematic, and the axiom $\Box B \vdash \Box B$ is changed to
\GLRc, which is derivable. Suppose an inference \GLR($Y,C$) is
encountered, where $B$ is in $Y$. This inference, allowing for
multiple copies of $B$, is as shown below left, where $Z$ is $Y$ with
$B$ deleted:
% NOTE - slides include these using prooftree in bussproofs
\vpf
$$
\bpf
\A "$\Box B^k, B^k, \Box Z, Z, \Box C \vdash C$"
\U "$\Box B^k , \Box Z \vdash \Box C$" "\GLR($Y,C$)"
\epf
\hspace*{-2em}
\bpf
\A "Lemma~\ref{l-gr19e}"
\U "\lxxc"
\A "${\Box X, B^k, \Box Z, Z, \Box C \vdash C}$"
% following line for context-sharing multicut
% \B "${\Box X, X, \Box Z, Z, \Box C \vdash C}$" "$\textit{mcut}^*(B)$"
\B "${\Box X, \Box X, X, \Box Z, Z, \Box C \vdash C}$"
"$\textit{mcut}(B)$"
\U "${\Box X, X, \Box Z, Z, \Box C \vdash C}$" "\textit{ctr}"
\U "${\Box X, \Box Z \vdash \Box C}$" "\GLR($C$)"
\epf
$$
By induction,
${\Box X, B^k, \Box Z, Z, \Box C \vdash C}$ is derivable.
From there we have the derivation shown above right.
% where $\textit{mcut}^*(B)$ means weakening and
% then the admissible multicut on $B$.
As the machine-checking process showed us,
additional weakening steps are necessary
if $\Box B$ is in $Z$ or if $B$ is in $\Box Z$.
% \vpf
% \begin{center}
% \bpf
% \A "\lxxc"
% \A "${\Box X, B^k, \Box Z, Z, \Box C \vdash C}$"
% \B "${\Box X, \Box X, X, \Box Z, Z, \Box C \vdash C}$"
% "(\textit{multicut on B})"
% \U "${\Box X, X, \Box Z, Z, \Box C \vdash C}$" "(\textit{contraction})"
% \U "${\Box X, \Box Z \vdash \Box C}$" "(\GLR($C$))"
% \epf
% \end{center}
\qed
\end{proof}
This construction is like that of case 2(a)(ii) in the proof of
Theorem~20 of \cite{gr-valentini}. Having shown, in
Lemma~\ref{l-caB_muxbn'}, that \muxbn\ $B$ 0 holds, we now use the
construction of \cite[Lemma~19]{gr-valentini} to show that \muxbn\ $B$
$n$ holds for all $n$: except that our inductive
assumptions and results involve admissibility of multicut, not cut.
Again we use induction: we assume that \muxbn\ $B$ $n$, and show that
\muxbn\ $B$ $(n+1)$ holds.
So suppose a derivation tree $\mu / \GLRc$ has a bottom inference
\GLRXB, as shown in Figure~\ref{f-muxbn}, and \delz\ $B\ \mu = n
+ 1$. We follow the construction of \cite[Lemma~19]{gr-valentini} to
obtain a derivation $\mu'$ of \GLRp, where \delz\ $B\ \mu' \leq n$.
% We first look at deriving \lxxc.
Since \delz\ $B\ \mu > 0$, the tree $\mu / \GLRc$ is as shown below left
(with other branches not shown).
We first delete the topmost application of \GLR($A$), leaving
a tree $\mu^{-}$.
Then adjoin $\Box A$ to each antecedent of $\mu^{-}$
obtaining the tree on the right
(call it $\mu^A / \Box A, \GLRc$),
whose topmost sequent is now a weakened axiom,
and which requires us to weaken in an occurrence of
$A$ just above the final GLR{}-rule instance:
% NOTE - slides include these using prooftree in bussproofs
\vpf
\begin{center}
\bpf
\A "${\Box G, G, \Box B^k, B^k, \Box A\vdash A} $"
\U "\gbna" "\GLR($A$)"
\U "\vdots"
\U "\GLRp"
\U "\GLRc" "\GLR($X,B$)"
\epf
\qquad
\bpf
% \A "$\Box A \vdash \Box A$"
% \U "$\Box A, A, \gbna$" "(\textit{weakening})"
\A "$\Box A, \gbna$"
\U "\vdots"
\U "$\Box A, \GLRp$"
\U "$\Box A, A, \GLRp$" "(\textit{weakening})"
\U "$\Box A, \GLRc$" "\GLR($B$)"
\epf
\end{center}
Now \delz\ $B\ \mu > \mbox{\delz}\ B\ \mu^A$,
and so $\mu^A / \Box A, \GLRc$ can be
% the tree on the right has a smaller value of \delz\ $B$, and so can be
used as the left branch of an admissible (i.e.\ ``smaller'') multicut.
We do this twice, the right branches being derivations of \GLRp\ and
${\Box G, G, \Box B^k, B^k, \Box A\vdash A} $ respectively, which,
after contractions, give derivations of $\Box A, \lxxc$ and ${\Box G,
G, \Box X, B^k, \Box A\vdash A}$. These two are cuts 1 and 2 of the
description in \cite[Lemma~19]{gr-valentini}, producing derivations
which are cut-free equivalents of $\Lambda_{11}$ and $\Lambda_{12}$
from \cite[Lemma~19]{gr-valentini}. The result \texttt{gr19a} in the
Isabelle code %Lemma~\ref{l-gr19a}
gives the existence of these two derivations; it uses the result
\texttt{add\_Box\_ant} which permits adding $\Box A$ to the antecedent
of each sequent of a derivation tree.
We combine these derivations of $\Box A, \lxxc$ and $\Box G, G, \Box
X, B^k, \Box A\vdash A$ using an admissible (``smaller'') multicut on
$B$, and then use contraction to obtain $\Box G, G, \Box A, \Box X, X
\vdash A$. This is cut 3 of~\cite[Lemma~19]{gr-valentini}. Then the
\GLR\ rule gives $\Box G, \Box X \vdash \Box A$. This is derivation
$\Lambda_{2}$ of \cite[Lemma~19]{gr-valentini}. Because of this \GLR\
rule at this point, we do not need to worry about the \delz\ $B$
values of any of the subtrees mentioned above, whose existence is
given by the inductive assumptions of multicut-admissibility. We now
weaken the conclusion of this tree to $\Box X, \Box G, \Box B^k \vdash
\Box A$, giving (a counterpart of) the derivation $\Lambda_{3}$ of
\cite[Lemma~19]{gr-valentini}.
% \begin{figure}[t]
% \centering
% \normalsize
% \bpf
% \A "\GLRp"
% \U "\GLRc" "(\GLR)"
% \A "\GLRp"
% \B "${\Box X, \Box X, X \vdash B}$" "(\textit{cut})"
% \U "\lxxc" "(\textit{contraction})"
% \epf
% \caption{Derivation of \lxxc\ from \GLRp}
% \label{f-dlxxc}
% \end{figure}
Returning to $\mu^{-}$, as below left, this time we adjoin $\Box X$
to the antecedent, giving the tree below right,
but we can now use $\Lambda_{3}$ as a derivation for its leaf.
That is, we have replaced a given derivation $\mu$ of \GLRp{}, where
\delz\ $B\ \mu = n + 1$, with a derivation $\mu'$ of \GLRp\ where
\delz\ $B\ \mu' = n$.
\vpf
\begin{center}
\bpf
\A "\gbna"
\U "\vdots"
\U "\GLRp"
\U "\GLRc" "\GLR(X,B)"
\epf
\qquad \qquad
\bpf
\A "$\Lambda_3$"
\U "$\Box X, \gbna$"
\U "\vdots"
\U "$\Box X, \GLRp$"
\U "$\GLRp$" "(\textit{contraction})"
\U "\GLRc" "\GLR"
\epf
\end{center}
We record this result as Lemma~\ref{l-gr19bc}\ref{l-gr19b}
(\texttt{gr19b}) below; however, we do not use this result directly.
Instead, we obtain
Lemma~\ref{l-gr19bc}\ref{l-gr19c} (\texttt{gr19c'}) below by referring to
Figure~\ref{f-muxbn} and noting that we have replaced $\mu$ by $\mu'$.
\comment{
\begin{verbatim}
PROBABLY OMIT THIS - AT PRESENT WE HAVE A DESCRIPTION OF THIS RESULT BUT NO
FORMAL STATEMENT
gr19a; "[| ([conclDT ?dtglp], ?seq) : glrxb ?X ?B;
valid glss ?dtglp; del0 ?B ?dtglp = Suc ?n; muxbn ?B ?n |]
==> EX A H dtH.
({#Box A#} + mset_map Box ?X + ?X |- {#?B#}) : derrec glss {} &
({#Box A#} + mset_map Box (?X + ms_delete {?B} H) + H |- {#A#})
: derrec glss {} &
premsDT dtH = [mset_map Box H |- {#Box A#}] &
conclDT dtH = conclDT ?dtglp &
allDT (frg glss) dtH & del0 ?B dtH <= ?n" : Thm.thm
\end{verbatim}
}
\begin{lemma}[\texttt{gr19b, gr19c'}] \label{l-gr19bc} Assume that
multicut-admissibility holds for cut-formula $B$, and
that \muxbn\ $B$ $n$ holds.
\begin{enumerate}
\item \label{l-gr19b}
If $\mu$ is a derivation of \GLRp, where \delz\ $B\ \mu = n + 1$,
then there exists another derivation $\mu'$ of \GLRp\ with
\delz\ $B\ \mu' \leq n$.
\item \label{l-gr19c} \muxbn\ $B$ $(n+1)$ holds.
\end{enumerate}
\end{lemma}
\begin{proof} The proof of the first part is the construction
described above. For the second part, we are given an instance of
Figure~\ref{f-muxbn}, where $\mu$ has \delz\ $B\ \mu = n + 1$.
Using the first part, we can transform $\mu$ to $\mu'$, with \delz\
$B\ \mu' = n$. Since \muxbn\ $B$ $n$ holds, the multicut on $\Box
B$ where the left premise derivation is $\mu' / \GLRc$ is
admissible. Hence the conclusion $\Box X, Y \vdash Z$ of
Figure~\ref{f-muxbn} is derivable.
\qed
\end{proof}
% OMITTING THESE TWO LINES SAVES A LOT OF SPACE
The next result, which we do not use directly, approximates to
\cite[Lemma~19]{gr-valentini}.
\begin{lemma}[\texttt{gr19d}] \label{l-gr19d} If
multicut-admissibility holds for cut-formula $B$, \muxbn\ $B\ 0$
holds, and \GLRp\ is derivable, then \lxxc\ is
derivable.
\end{lemma}
% \begin{verbatim}
% gr19d : "[| ((conclDT ?dtglp) =
% (mset_map Box ?X + ?X + {# Box ?B#} |- {#?B#})) ;
% valid glss ?dtglp; mas glss ?B = UNIV; muxbn ?B 0 |]
% ==> (mset_map Box ?X + ?X |- {#?B#}) : derrec glss {}"
% \end{verbatim}
\begin{proof}
By using Lemma~\ref{l-gr19bc}\ref{l-gr19c} repeatedly, \muxbn\ $B\
n$ holds for any $n$.
\qed
% Then the cut of Figure~\ref{f-dlxxc} is
% admissible, giving the result.
\end{proof}
\begin{lemma}[\texttt{caB\_muxbn, cut\_glr}] \label{l-cut-glr} If
multicut-admissibility holds for cut-formula
$B$, then it holds for cut-formula $\Box B$
for the case where the left premise derivation ends with a \GLRB\
rule. That is, \muxbn\ $B\ n$ holds for any $n$.
\end{lemma}
\begin{proof}
By combining Lemma \ref{l-caB_muxbn'} with
repeated use of Lemma~\ref{l-gr19bc}\ref{l-gr19c}.
\qed
\end{proof}
We now have the admissibility of principal multicuts on $\Box B$,
which we indicated was the crux of cut-admissibility for \GLS{}.
For the other cases, the usual proofs hold. Although there is no
particular difficulty in showing this, many details need to be
checked: this takes almost three pages of \cite{gr-valentini}, even with
``omitting details'' and citing ``standard transformations''. The
value of automated theorem proving at this point is simply to ensure
that all the necessary details have been checked. The proof uses
techniques described in \S\ref{s-cagen}, which means that relatively
few of the proofs remaining for this theorem are peculiar to
the \GLS\ calculus. Consequently, we get:
\begin{theorem}[\texttt{glss\_ca}] \label{t-glss-ca}
Multicut is admissible in \GLS.
\end{theorem}
\section{Conclusions}
\label{s-concl-fw}
We have described a formalised proof in the Isabelle theorem prover of
cut admissibility for the sequent calculus \GLS\ for the provability logic \GL.
The proof is based on the one from \cite{gr-valentini},
particularly the construction in \cite[Lemma~19]{gr-valentini},
though we use it in a slightly different way.
The Isabelle proof moves to and fro between our deep embedding
of explicit trees (\texttt{dertree}) and our shallow embedding using
\texttt{derrec},
so our proof demonstrates the technique of combining a shallow
embedding where adequate with a deep embedding where necessary.
% The work in \S\ref{s-cagen}, while unpleasantly complex, was valuable in
% completing the proof
% since the proof of Theorem~\ref{t-glss-ca} depended,
% apart from Lemma~\ref{l-cut-glr},
% mostly upon general results such as Theorem~\ref{t-gs2ssr}.
The work in \S\ref{s-cagen}, while complex, succeeds in extracting those parts
of the cut-elimination proof which follow a common pattern,
and expressing them in a way which will be useful
beyond the particular sequent system \GLS.
% we know of no other work which features this degree of freedom:
% shallow while possible, deep when essential.
We have made considerable use of definitions and theorems which are
useful beyond this particular sequent system, or, indeed, beyond
proofs about sequent systems. We have developed axiomatic type
classes based on properties of $+$, $-$, $0$ and $\leq$ for multisets,
and shown how a rather small set of axioms about these operators is
sufficient for defining a lattice. Multiset sequents (pairs of
multisets) also belong to this type class. We have applied relevant
simplification procedures to this type class, which was useful in our
proofs. We have described extensive definitions and theorems relating
to abstract derivability, which we have used in several different
metalogical theories and proofs, and we have discussed the issue of
deep versus shallow embeddings in the light of % the experience of
this and previous work.
%%% Local Variables:
%%% mode: latex
%%% TeX-master: "root"
%%% End: