I'm trying to formalize some properties on regular expressions (REs) using Coq. But, I've got some troubles to prove a rather simple property:
For all strings s, if s is in the language of (epsilon)* RE, then s = "", where epsilon and * denotes the empty string RE and Kleene star operation.
This seems to be an obvious application of induction / inversion tactics, but I couldn't make it work.
The minimal working code with the problematic lemma is in the following gist. Any tip on how should I proceed will be appreciated.
EDIT:
One of my tries was something like:
Lemma star_lemma : forall s, s <<- (#1 ^*) -> s = "".
Proof.  
  intros s H.
  inverts* H.
  inverts* H2.
  inverts* H1.
  inverts* H1.
  inverts* H2.
  simpl in *.
  -- stuck here
that leave me with the following goal:
s' : string
H4 : s' <<- (#1 ^*)
============================
s' = ""
At least to me, it appears that using induction would finish the proof, since I could use H4 in induction hypothesis to finish the proof, but when I start the proof using
induction H
instead of
inverts* H
I got some (at least for me) senseless goals. In Idris / Agda, such proof just follows by pattern matching and recursion over the structure of s <<- (#1 ^*). My point is how to do such recursion in Coq.
Here is one possible solution of the original problem:
Lemma star_lemma : forall s,
    s <<- (#1 ^*) -> s = "".
Proof.
  refine (fix star_lemma s prf {struct prf} : s = "" := _).
  inversion_clear prf; subst.
  inversion_clear H; subst.
  - now inversion H0.
  - inversion_clear H0; subst. inversion_clear H; subst.
    rewrite (star_lemma s' H1).
    reflexivity.
Qed.
The main idea is to introduce a term in the context which will resemble the recursive call in a typical Idris proof. The approaches with remember and dependent induction don't work well (without modifications of in_regex) because they introduce impossible to satisfy equations as induction hypotheses' premises.
Note: it can take a while to check this lemma (around 40 seconds on my machine under Coq 8.5pl3). I think it's due to the fact that the inversion tactic tends to generate big proof terms.
This problem has obsessed me for a week, and I have finally found a solution that I find elegant.
I had already read that when an induction principle does not fit your needs, you can write and prove another one, more adapted to your problem. That is what I have done in this case. What we would want is the one obtained when using the more natural definition given in this answer. By doing this, we can keep the same definition (if changing it implies too many changes, for example) and reason about it more easily.
Here is the proof of the induction principle (I use a section to specify precisely the implicit arguments, since otherwise I observe strange behaviours with them, but the section mechanism is not necessary at all here).
Section induction_principle.
Context (P : string -> regex -> Prop)
  (H_InEps : P "" #1)
  (H_InChr : forall c, P (String c "") ($ c))
  (H_InCat : forall {e e' s s' s1}, s <<- e -> P s e -> s' <<- e' ->
    P s' e' -> s1 = s ++ s' -> P s1 (e @ e'))
  (H_InLeft : forall {s e e'}, s <<- e -> P s e -> P s (e :+: e'))
  (H_InRight : forall {s' e e'}, s' <<- e' -> P s' e' -> P s' (e :+: e'))
  (H_InStar_Eps : forall e, P "" (e ^*))
  (H_InStar_Cat : forall {s1 s2 e}, s1 <<- e -> s2 <<- (e ^*) ->
    P s1 e -> P s2 (e ^*) -> P (s1++s2) (e ^*)).
Arguments H_InCat {_ _ _ _ _} _ _ _ _ _.
Arguments H_InLeft {_ _ _} _ _.
Arguments H_InRight {_ _ _} _ _.
Arguments H_InStar_Cat {_ _ _} _ _ _ _.
Definition in_regex_ind2 : forall (s : string) (r : regex), s <<- r -> P s r.
Proof.
  refine (fix in_regex_ind2 {s r} prf {struct prf} : P s r :=
    match prf with
    | InEps => H_InEps
    | InChr c => H_InChr c
    | InCat prf1 prf2 eq1 =>
        H_InCat prf1 (in_regex_ind2 prf1) prf2 (in_regex_ind2 prf2) eq1
    | InLeft _ prf => H_InLeft prf (in_regex_ind2 prf)
    | InRight _ prf => H_InRight prf (in_regex_ind2 prf)
    | InStar prf => _
    end).
  inversion prf; subst.
  - inversion H1. apply H_InStar_Eps.
  - inversion H1; subst.
    apply H_InStar_Cat; try assumption; apply in_regex_ind2; assumption.
Qed.
End induction_principle.
And it turned out that the Qed of this proof was not instantaneous (probably due to inversion producing large terms as in this answer), but took less than 1s (maybe because the lemma is more abstract).
The star_lemma becomes nearly trivial to prove (as soon as we know the remember trick), as with the natural definition.
Lemma star_lemma : forall s, s <<- (#1 ^*) -> s = "".
Proof.
  intros s H. remember (#1 ^*) as r.
  induction H using in_regex_ind2; try discriminate.
  - reflexivity.
  - inversion Heqr; subst.
    inversion H. rewrite IHin_regex2 by reflexivity. reflexivity.
Qed.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With