# Solving a Logic Problem with Coq

I’ve been playing around with the Coq proof assistant tool over the weekend. Here’s an interesting problem and its solution proven with Coq. I am by no means an expert in this. I’ve used Coq for less than 6 hours at this point. There are a lot of nicer tutorials out there, including the one on the Coq web site. But this is an interesting motivating example, and I’m writing to share my experiences.

The puzzle I’ll solve here comes from John Pratt’s page. A lot of supposed “logic puzzle” sites aren’t really logic puzzles at all, but just plays on words; but that one is okay. Here’s the puzzle:

When asked her 3 children’s ages, Mrs. Muddled said that Alice is the youngest unless Bill is, and that if Carl isn’t the youngest then Alice is the oldest. Who is the oldest and who is the youngest?

The first step is to start up Coq. In my case I’ve generally used CoqIde, which is a simple apt-get in Ubuntu. However, for this blog post I’ll use the command-line `coq-interface`

in hopes of more portability. Let's tell Coq a few of the basics of the problem, such as that there's a domain of discourse, and it has three children in it named Alice, Bill, and Carl.

Section Ages. Variable Child : Set. Variable Alice : Child. Variable Bill : Child. Variable Carl : Child. Variable Youngest : Child -> Prop. Variable Oldest : Child -> Prop.

The last two lines declare unary predicates for the domain of discourse, indicating whether a child is the youngest or the oldest. Unfortunately, Coq doesn't know some things that we're assuming here. One such fact is that the three children are different from each other. So let's tell it that.

Hypothesis Diff1 : Alice <> Carl. Hypothesis Diff2 : Bill <> Carl. Hypothesis Diff3 : Alice <> Bill.

Each of these hypotheses has a name, so that we can refer to it in the proof later.

Next, there's some implicit meaning in the superlative words "youngest" and "oldest".

Hypothesis Superlative1 : forall A B, (A <> B) -> Youngest A -> ~Youngest B. Hypothesis Superlative2 : forall A B, (A <> B) -> Oldest A -> ~Oldest B.

In other words, you can't have two different children who are both the youngest or the oldest. Even twins are usually born a few minutes apart!

Finally, "youngest" and "oldest" are opposites. Since there is more than one child, no one can be both.

Hypothesis Antonym1 : forall A, Oldest A -> ~ Youngest A. Hypothesis Antonym2 : forall A, Youngest A -> ~ Oldest A.

Having now explained the terminology to Coq, we may now enter the information given by Mrs. Muddled in the problem.

Hypothesis Given1 : ~ Youngest Alice -> Youngest Bill. Hypothesis Given2 : ~ Youngest Carl -> Oldest Alice.

Now that Coq understands the premises, we need to tell it what we are proving. That requires solving the puzzle. Simple enough: Alice is the youngest unless Bill is, which means Carl can't be the youngest. But if Carl is not the youngest, then Alice is the oldest, which leaves Bill as the youngest. So the youngest is Bill, the middle child is Carl, and the oldest is Alice. Let's tell Coq that.

Goal Oldest Alice /\ Youngest Bill.

And Coq responds by listing the goal, and all the hypotheses we may use to prove it:

1 subgoal Child : Set Alice : Child Bill : Child Carl : Child Youngest : Child -> Prop Oldest : Child -> Prop Diff1 : Alice <> Carl Diff2 : Bill <> Carl Diff3 : Alice <> Bill Superlative1 : forall A B : Child, A <> B -> Youngest A -> ~ Youngest B Superlative2 : forall A B : Child, A <> B -> Oldest A -> ~ Oldest B Antonym1 : forall A : Child, Oldest A -> ~ Youngest A Antonym2 : forall A : Child, Youngest A -> ~ Oldest A Given1 : ~ Youngest Alice -> Youngest Bill Given2 : ~ Youngest Carl -> Oldest Alice ============================ Oldest Alice /\ Youngest Bill

Fair enough. The proof wizard doesn't know how to solve this, so we'll need to take it on by hand. To prove a conjunction, we can individually prove each side. Coq calls this the "split" tactic.

split.

And Coq responds:

2 subgoals Child : Set Alice : Child Bill : Child Carl : Child Youngest : Child -> Prop Oldest : Child -> Prop Diff1 : Alice <> Carl Diff2 : Bill <> Carl Diff3 : Alice <> Bill Superlative1 : forall A B : Child, A <> B -> Youngest A -> ~ Youngest B Superlative2 : forall A B : Child, A <> B -> Oldest A -> ~ Oldest B Antonym1 : forall A : Child, Oldest A -> ~ Youngest A Antonym2 : forall A : Child, Youngest A -> ~ Oldest A Given1 : ~ Youngest Alice -> Youngest Bill Given2 : ~ Youngest Carl -> Oldest Alice ============================ Oldest Alice subgoal 2 is: Youngest Bill

In other words, we're currently working on proving `Oldest Alice`

, but once we finish that, we'll need to prove `Youngest Bill`

. Well, there's only one good way to conclude `Oldest Alice`

, and that's to show that Carl is not the youngest. So let's prove the current subgoal by applying `Given2`

.

apply Given2.

I'll stop quoting the entirety of Coq's responses, but here's the important part:

============================ ~ Youngest Carl

So how do we know that Carl is not the youngest? Well, it's because Bill is the youngest. That's rule `Superlative1`

, but we'll also have to tell Coq to use Bill as the other child.

apply Superlative1 with Bill.

And Coq responds:

============================ Bill <> Carl subgoal 2 is: Youngest Bill

So we need to prove that Bill is not Carl, and then that Bill is the youngest. (That Bill is the youngest is also subgoal 3 left over from splitting the conjunction, a fact that we'll think about shortly.) For now, it's easy to prove that Bill is not Carl, because that was one of our hypotheses.

apply Diff2.

And Coq responds:

============================ Youngest Bill

Now it remains only to prove that Bill is the youngest. The way to go about that is to apply the first given.

apply Given1.

And Coq responds:

============================ ~ Youngest Alice

This is the tricky part. The reason that Alice is not the youngest is that if she were the youngest, then Carl would not be the youngest, but then the second given statement would lead us to conclude that Alice is the oldest, which is a contradiction. Proof by contradiction, though, is generally a technique valid only in classical logic, not in the constructive logic that Coq typically uses. It turns out we won't need to resort to classical logic for this theorem, but just to make things easier in our first pass we'll go ahead and use it. This is one statement in Coq.

Require Import Classical.

So now having classical logic at our disposal, we can go about proving that Alice is not the youngest by applying Peirce's law.

apply Peirce.

And Coq's response:

============================ (~ Youngest Alice -> False) -> ~ Youngest Alice

We'll split up the implication by giving a name to the hypothesis and then proving the conclusion. This tactic is called "intro".

intro H.

Coq responds:

H : ~ Youngest Alice -> False ============================ ~ Youngest Alice

Note that the new hypothesis is the double-negation of "Alice is the youngest." In classical logic, which we're now using, that's the same as Alice being the youngest, so we've essentially used Peirce's law to start the proof by contradiction. Now we'll go backward through the contradiction argument given above. First, Alice can't be the youngest because she's the oldest. She's the oldest because Carl is not the youngest. Carl is not the youngest because we've assumed (to obtain the contradiction) that Alice is the youngest.

apply Antonym1. apply Given2. apply Superlative1 with Alice. apply Diff1.

And here's where we are, according to Coq.

H : ~ Youngest Alice -> False ============================ Youngest Alice

This is just a double negation: a rule from the Classical package that goes by the name NNPP. Once we double-negate the conclusion, it will look slightly different from the hypothesis H, but they will actually mean the same thing. (`~A`

is just shorthand for `A -> False`

).

apply NNPP. exact H.

Coq says:

============================ Youngest Bill

So we've proven the left half of the conjunction: that Alice is the oldest. We now need to prove the right half. But wait! We already did that, as the second step to proving the left half. For now, we'll just copy and paste the last part of the earlier proof.

apply Given1. apply Peirce. intro H. apply Antonym1. apply Given2. apply Superlative1 with Alice. apply Diff1. apply NNPP. exact H.

and Coq says:

Proof completed.

Finally, we get to say:

Qed.

Ah, we're done. But it turns out we can improve the proof a little by using a few more Coq features: specifically, by proving some lemmas (or lemmata, if your prefer), and by eliminating the use of classical logic (since this happens to be a constructively valid statement; not all classically true statements will be, though).

First, although Peirce's law (`(~P -> P) -> P`

) is not true in constructive logic, but we can prove `(P -> ~P) -> ~P`

. Let's do so, by introducing a lemma, and then by using the automatic prover.

Lemma NPeirce : forall P : Prop, (P -> ~P) -> ~P. auto. Qed.

In this case, the automated theorem prover worked fine. We could have proven the theorem by hand, as well (by treating the `~P`

as `P -> False`

, using the intro tactic, applying the modus ponens rule, and then easily satisfying all of the hypotheses). Using this, we can simplify our proof of `Youngest Bill`

. The process is the same, but instead of applying Peirce's law and the beginning and the double-negation at the end, we apply our new NPeirce lemma at the beginning, and we don't need double-negation at the end.

Lemma YoungBill : Youngest Bill. apply Given1. apply NPeirce. intro H. apply Antonym1. apply Given2. apply Superlative1 with Alice. apply Diff1. exact H. Qed.

Notice that I've also defined this as a lemma. That's because we needed to prove it twice in the earlier attempt. May as well prove it once up front instead. So now comes the final goal.

Goal Oldest Alice /\ Youngest Bill. split. apply Given2. apply Superlative1 with Bill. apply Diff2. apply YoungBill. apply YoungBill. Qed.

And we're done. For reference, here's the entire proof as entered in CoqIde.

Variable Child : Set. Variable Alice : Child. Variable Bill : Child. Variable Carl : Child. Variable Youngest : Child -> Prop. Variable Oldest : Child -> Prop. Hypothesis Diff1 : Alice <> Carl. Hypothesis Diff2 : Bill <> Carl. Hypothesis Diff3 : Alice <> Bill. Hypothesis Superlative1 : forall A B, (A <> B) -> Youngest A -> ~Youngest B. Hypothesis Superlative2 : forall A B, (A <> B) -> Oldest A -> ~Oldest B. Hypothesis Antonym1 : forall A, Oldest A -> ~ Youngest A. Hypothesis Antonym2 : forall A, Youngest A -> ~ Oldest A. Hypothesis Given1 : ~ Youngest Alice -> Youngest Bill. Hypothesis Given2 : ~ Youngest Carl -> Oldest Alice. Lemma NPeirce : forall P : Prop, (P -> ~P) -> ~P. auto. Qed. Lemma YoungBill : Youngest Bill. apply Given1. apply NPeirce. intro H. apply Antonym1. apply Given2. apply Superlative1 with Alice. apply Diff1. exact H. Qed. Goal Oldest Alice /\ Youngest Bill. split. apply Given2. apply Superlative1 with Bill. apply Diff2. apply YoungBill. apply YoungBill. Qed.

An entire article devoted to formally proving something, and it isn’t even proofread!

You missed a > in a /code tag there. :-)

WHY NAME A LANGUAGE TO RHYME WITH COCK

@Anonymous: The language comes from France. QED.

Leon, thanks. Fixed.

Uh .. Prolog?

youngest(alice) :- not(oldest(alice)).

youngest(bill) :- not(oldest(bill)).

oldest(alice) :- not(youngest(carl)).

?- oldest(X),youngest(Y).

X = alice,

Y = bill.

Coq means rooster (= cock out of the USA), it’s a play on CoC (Calculus of Construction).

Karl: Haha, my thought exactly. Been a while since I’ve been working with prolog, but I ended up with something like you. That’s what, 3 lines in Prolog vs. 30 in Coq? Coq has been outsmarted.

FWIW, I posted a solution in Coq that, given the problem as formulated in this post, solves it with a single tactic: firstorder. My entire reply can be found here.

Cool, Prolog does it in three lines total.

Sure, but this misses the point: it tells us a lot about the simplicity of the example rather than anything particularly compelling about either Prolog or Coq. First of all, the point of Coq is not to fully automate proof, although the firstorder tactic does everything that Prolog’s resolution does, and more. Secondly, precisely because the Calculus of Inductive Constructions is more expressive than the decidable fragment of first-order logic that Prolog treats, cdsmith had to make certain things that are implicit in Prolog explicit, e.g. that the children are members of a set, that these members are disjoint, that if “youngest” or “oldest” are true of one then they are not true of another, that “youngest” and “oldest” are opposites, and so on. But once these non-problem-specific things are encoded, the actual problem encoding—Given1 and Given2—are direct from the problem description, and firstorder does indeed completely automate the proof.

But again, you should ignore most of that, because all it shows is that Coq does

at least as muchas Prolog. What Coq is really for is helping people prove things that can’t be automatically proven at all, e.g. the four-color map theorem, or that a compiler for a programming language correctly compiles all possible source programs into their semantically-identical respective target programs.Okay, but I didn’t find your one line FOL, and I don’t know Coq so I seriously don’t see what you have done. What is “Unnamed_thm” and what is “firstorder”?

Sorry, yeah, I should have explained the transcript better. I’ll try to be brief. Basically, Coq is a proof assistant using the natural deduction style. Given a set of definitions such as the Variables and Hypotheses in this problem, when you say what it is that you want to prove, Coq shows you the current context above the “=========” line, and the current (sub)goal below that line. Given the context, the Coq standard libraries, and possibly third-party libraries if necessary, you apply what are called “tactics” in an effort to prove, or at least simplify, the current subgoal, and you repeat until you don’t have any subgoals left. What this process is

literallydoing is constructing a term in Coq’s logic, the Calculus of Inductive Constructions, piecemeal. This term is the actual proof. The stuff you, the user, are typing in is more properly called a “proof script.” So one thing cdsmith and I have shown is that there may be more than one proof script that will successfully construct a proof for a goal, and some of these scripts will be shorter than others. :-)When cdsmith said “Goal Oldest Alice /\ Youngest Bill.” s/he was literally saying “Here’s a theorem I want to prove.” “Goal” essentially introduces anonymous theorems, so that’s where “Unnamed_thm” came from. S/he could also have said “Theorem OldestAliceAndYoungestBill Oldest Alice /\ Youngest Bill” and the prompt would have become “OldestAliceAndYoungestBill < ” instead. When you say you want to prove something to Coq, it goes into proof-editing mode and the prompt changes to reflect what you’re proving. This is the context in which you ask Coq to apply tactics in an attempt to simplify (away) the current (sub)goal. Coq has a

lotof built-in tactics, and many, many more are available in libraries, but again, this is anextremelysimple puzzle, so the built-in ones are enough. cdsmith is a newcomer to Coq, so s/he used the most basic tactics that all of the introductory materials teach you about (intro, split, apply, exact…) and some understanding of both classical and intuitionistic logic variants of Pierce’s law. What I did was to pull out one of Coq’s sledgehammer tactics, firstorder, which implements a very powerful first-order proof-search procedure, the basis of which is documented in Pierre Corbineau’s paper, First-order reasoning in the Calculus of Inductive Constructions. So the firstorder tactic does automatically what cdsmith did somewhat more manually, although it’s worth emphasizing that both cdsmith’s proof script and mine generate a term in the Calculus of Inductive Constructions that is quite a bit larger than either script. :-) You can see the term by asking Coq to “Print Unnamed_thm.” after it is defined.This, by the way, is another difference between Prolog and Coq: typically in Prolog, you just want the result(s) of the resolution, and in fact one of the nice things is you get to ask “Who is oldest and who is youngest?” In Coq, you have to say what you want to prove (“I intend to prove that Alice is oldest and Bill is youngest”), but you don’t just get True/False, you also get the proof. When you’re just doing logic, this probably doesn’t matter much, but one of Coq’s most interesting applications is that you can develop software with it that is proven correct, and then literally extract the code from the proof, in any of Scheme, Haskell, or OCaml. This code is “correct by construction” with respect to its specification in Coq, and in Coq, specifications can be

extremelystrong, again thanks to the expressive power of the Calculus of Inductive Constructions. A nice example of this is this implementation of Finger Trees in Coq.By the way, I hope you don’t believe I’m attempting to downplay the power of Prolog here—my point really is more about the simplicity of the puzzle. My copy of “PROLOG Programming for Artificial Intelligence,” 3rd edition, is about 6″ away from my right elbow, and I recently got ODBC working on my Mac OS X box, which means that I can play with FLORA-2‘s Persistent Module package. FLORA-2, in turn, relies on the XSB Prolog system. I’m a huge fan.

I hope this helps clarify what Coq and firstorder are a bit!

I think your encoding of Given1 is not correct. You have:

Hypothesis Given1 : ~ Youngest Alice -> Youngest Bill.

The text says: Alice is the youngest unless Bill is

Which IMO should translate to:

Hypothesis Given1 : ~ Youngest Bill -> Youngest Alice.

I think the Lemma YoungBill (and the entire goal) cannot be proven without classical with this encoding anymore. It can be proved with double negative elimination.

Also, the firstorder tactic does not give an answer to this.