Skip to content
Ruben Van de Velde edited this page Nov 28, 2022 · 69 revisions

Porting status

The current state of play:

  1. mathport more or less works.
    • Lots of warts, but it gives an output that is much easier to clean up than to translate from scratch.
    • mathport is getting minor fixes and improvements from Mario and Gabriel, but likely there's no full-time attention on it in the near future
  2. mathlib4 tactics are still pretty primitive
    • There is a lot of work-in-progress and resources allocated to porting more tactics to mathlib4.
    • Many advanced parts of the library are not portable with the current state of tactics.
    • However a lot of basic material is portable now, with only minor changes
    • Even that material is going to regularly discover issues with the existing ports of tactics, or issues in lean4 (but this is highly desirable).
  3. The theory development in mathlib4 is at a cusp, transitioning from "a mix of directly ported stuff and adhoc by-hand porting or experimentation", to "most files in it are good approximations of a direct port from mathlib3"
  4. Porting theory development is helpful for tactic porting (judicious use of sorry gets you a long way, but there's no particular need for theory development to lag behind).
  5. There is (hopefully!!!) lots of latent enthusiasm for putting in the work to port the theory development in mathlib3.
  6. But it is not immediately obvious what to do.

Porting procedure

Note there is a video showing the whole process on a very simple file.

  1. Identify a file X.lean in mathlib4 which
    • either doesn't exist, or exists in a partially-ported or adhoc state
    • but for which all imports are "approximately directly ported" You can run scripts/port_status.py in the main mathlib directory (on a recent commit) to identify these files.
  2. Open the mathlib3port repository and copy down the mathlib3 SHA on which your current mathlib3port is based; this is found in the README file. You will need this for the PR to mathlib4.
  3. If you would like to "claim" that you are working on the file, you can visit the port status page and change No to No: WIP by [your name] [mathlib3 SHA you just found]
  4. Open that file X.lean in the mathlib3port repository.
  5. Copy and paste it into a new branch in the mathlib4 repository, and commit it as-is. This allows reviewers to see the diff between mathport and the manually ported version, making it easy to check aligns and similar statements.
  6. Perform a bunch of mechanical updates on the file (i.e. things that in principle mathport could do, but doesn't yet), see below.
  7. Try to get everything compiling again. It's usually useful to have the original mathlib file open in a separate window, so you can typecheck, discover where instances are coming from, etc. Potential obstacles include
    • Discovering that an import Y.lean which you thought was sufficiently well ported already in fact isn't. (Either patch it up, or go back to step 1 with X = Y.)
    • Finding that a tactic T used in a proof is still missing. (Either, hesitantly, replace it with simpler tactics, or decide to pause on this file, perhaps noting somewhere that X.lean is now waiting on T.)
    • Encountering a surprising change in behavior in either a ported tactic or Lean core. (Ask on Zulip, create an issue, hesitantly work around or preferably note the X.lean is waiting on that issue.)
    • Some small changes are just going to have to happen. Leave TODO or FIXME liberally, and note any name changes you make using #align so mathport can follow your changes in its next run.
    • Realizing that it is not a trivial port, e.g. because it depends fundamentally on the significant changes to coercions or typeclass resolution in Lean4. (e.g. data.nat.cast.basic and data.int.cast.basic are on the nearby horizon in this category). Time to assemble the relevant experts.
  8. At this point you've either noted the file as blocked on other changes, or successfully ported the entire file! In this case, add it to the Mathlib.lean file, that is supposed to import all files of Mathlib.
  9. If it's a successful port, make the PR to mathlib4 and be sure to include the mathlib3 SHA you found in Step 2 to the PR description.
  10. Once it is accepted
  • Go to the port status page and change No to Yes mathlib4#XXX [mathlib3 SHA], to indicate that has been ported and specify the mathlib3 SHA on which it was based. This will update the results that scripts/port_status.py produces.
  • Wait for CI to update port comments in mathlib
  • Once this mathlib PR is merged, the port_status will flag your file as changed after the SHA you wrote in the port status (because of your PR!). You need to update the SHA to this newer mathlib3 version, but please be sure no other modification have been done to the file.

Implicit in this proposed scenario there is no "flag day" for all of mathlib: we just work our way up from the bottom.

In the optimistic scenario, the problems in step 5 become less and less frequent, and mathport gets some tweaks that reduce the effort required in step 4, and it becomes possible to do multiple files, or whole directories or areas in one go!

For any file that isn't already a "complete port" but has been manually partly ported, we take "starting over" as the default route.

This might involve:

  • move the existing X.lean to X2.lean
  • take X.lean from mathlib3port, and get it working as X.lean
  • import X.lean in X2.lean
  • delete everything in X2.lean that by now has already been defined in X.lean
  • find homes for or discard anything that remains (judgement required here!)
  • get everything downstream working again

You can also port from Lean 3 core, as well as from mathlib3. If you find something is missing that was in Lean 3 core, please put it in the Mathlib/Init/ directory. There is a repository lean3port which contains the output of mathport running on Lean 3 core, analogous to mathlib3port.

Some common fixes

  • change import Mathbin to import Mathlib
  • Add explicit imports to resolve missing dependencies where transitive imports have changed (often by identifying what the mathlib equivalent resolves to & what the mathlib4 equivalent is)
  • If a file imports another file from the tactic or meta folder: look at the status of which tactics that still need to be ported in this list. Tactics, attributes and commands that are not on this list should be working in Lean 4. If the tactics mentioned in the imports are already ported, you can import the corresponding mathlib4 files (the file name might have changed a bit). If not, you can replace the 'No' on https://github.com/leanprover-community/mathlib/wiki/mathlib4-port-status by 'Waiting on tactic X' or something.
  • remove in names
  • Missing attributes:
    • For refl:
      import Mathlib.Tactic.Relation.Rfl
      
      @[refl]
      theorem foo : 1 = 1 := rfl
      
    • trans: for now use import Mathlib.Mathport.Attributes.
    • protect_proj: add protected before each field
    • If you run into other attributes, hopefully you will find them noted as needing porting in Mathlib/Mathport/Syntax.lean (which you can freely import if needed, I guess), or they should already be implemented somewhere, and searching for register.*attr should help you locate it.
  • See also the Lean 4 documentation about differences with Lean 3.
  • When declaring a structure, if a field needs to be put on a new line, then each fields need to be on a new line, or at least the first after the with keyword must be on a new line.
  • The command include and omit are gone in Lean4, you can safely delete the corresponding lines. Lean should be able to do the right thing by default.

Linting

Run lints by adding #lint at the bottom of the file.

If the file you are importing don't give you access to the #lint command then you need to add

import Std.Tactic.Lint.Frontend
import Std.Tactic.Lint.Misc

Note that we now expect docstrings for each field of each inductive type. For instance

/-- The natural numbers. -/
inductive Nat where
  /-- `Nat.zero`, normally written `0 : Nat`, is the smallest natural number.
  This is one of the two constructors of `Nat`. -/
  | zero : Nat
  /-- The successor function on natural numbers, `succ n = n + 1`.
  This is one of the two constructors of `Nat`. -/
  | succ (n : Nat) : Nat

We also expect documentation of notations. You should usually use the @[inherit_doc] attribute, which will copy the documentation from another declaration. This attribute takes a declaration name as an optional parameter in case Lean doesn't figure out which declaration should give its documentation to the notation.

Aligning names

In Mathlib.Mathport.Rename there are a bunch of commands of the form #align lean3_name lean4Name, but you can place them anywhere. These instruct mathport to adjust its naming on a case-by-case basis, and we need to be using it a lot more than we are. Note that mathport#187 removes the \_x's in exchange for now quite often running into severe type errors and having to stub stuff out. Basically, it will now assume that any name clashes are intentional and will use the clashed name even if it's not type correct (instead of making up \_x names), but that means that some definitions that actually do need to be different, like CoeT or Stream, need to be manually marked for realignment.

But even beyond this, whenever mathport gets a name wrong you want to use an #align directive to change the name to the desired thing, even if you have already fixed the definition, because that influences all future references of that definition.

If Lean complains it does not know about the #align command then you need to import Mathlib.Mathport.Rename.

Note that #align does not automatically include namespaces.

Naming convention

(note: These are work in progress)

We use the following conventions

  1. Terms of Props (proofs, theorem names) are in snake_case.
  2. Props and Types (or Sort) (inductive types, structures, classes) are in UpperCamelCase. defs that are Type or Prop valued follow this rules. (please note the distinction between def Foo : Type and def foo: (α: Type)
  3. All other terms of Types (basically anything else) are in lowerCamelCase.
  4. When something named with UpperCamelCase is part of something named with snake_case, it is referenced in lowerCamelCase.
  5. Acronyms like LE are written upper-/lowercase as a group, depending on what the first character would be.
  6. Rules 1-5 apply to fields of a structure or constructors of an inductive type in the same way.

There are some rare counter-exceptions to preserve local naming symmetry: e.g., we use Ne rather than NE to follow the example of Eq; any such counter-exceptions should be discussed on Zulip.

Note that if a theorem contains a lowerCamelCase component with multiple words as in rule 4, mathport will usually get the name wrong and it has to be #aligned by hand.

Examples

-- follows rule 2
structure OneHom (M : Type _) (N : Type _) [One M] [One N] where
  toFun : M → N -- follows rule 3 via rule 6
  map_one' : toFun 1 = 1 -- follows rule 1 via rule 6

-- follows rule 2
class CoeIsOneHom [One M] [One N] : Prop where
  coe_one : (↑(1 : M) : N) = 1 -- follows rule 1 via rule 6

-- follows rule 1
theorem map_one [OneHomClass F M N] (f : F) : f 1 = 1 := sorry

-- follows rules 1 and 4
theorem MonoidHom.toOneHom_injective [MulOneClass M] [MulOneClass N] :
  Function.Injective (MonoidHom.toOneHom : (M →* N) → OneHom M N) := sorry
-- manual align is needed due to `lowerCamelCase` with several words inside `snake_case`
#align monoid_hom.to_one_hom_injective MonoidHom.toOneHom_injective

-- follows rule 3
def outParam (α : Sort u) : Sort u := α

-- follows rule 2
class HPow (α : Type u) (β : Type v) (γ : outParam (Type w)) where
  hPow : α → β → γ -- follows rule 3 via rule 6; note that rule 5 does not apply

-- follows rules 2 and 5
class LT (α : Type u) where
  lt : α → α → Prop -- follows rule 3 and 5

-- follows rules 1 and 5
theorem gt_iff_lt [LT α] {a b : α} : a > b ↔ b < a := sorry

-- follows rule 2; `Ne` is an exception to rule 5
class NeZero : Prop := sorry

-- follows rules 1 and 4
theorem neZero_iff {R : Type _} [Zero R] {n : R} : NeZero n ↔ n ≠ 0 := sorry
-- manual align is needed due to `lowerCamelCase` with several words inside `snake_case`
#align ne_zero_iff neZero_iff

Further conventions

Type Classes

Where Lean 3 often named type classes by the predicate implementing them implied, Lean 4 prefers to name the noun of the attribute implied by implementing them.

Examples

  • Lean 3: has_mul, Lean 4: Mul
  • Lean 3: is_lawful_monad, Lean 4: LawfulMonad

For the mathlib port, we wish to follow this for has_foo typeclasses, which should be named Foo; for is_foo typeclasses, we <?>

Categories

  • Categories are suffixed with Cat, e.g. LatticeCat.

Coercions

In Lean 4 coercions are elaborated directly, so given

variable (f : α ≃ β) (a : α)

the application f a is syntactically equal to f.toFun a.

UPD: It seems that we have some magic that prevents Lean from unfolding past FunLike.coe, so for types with FunLike instance we have a Lean3-style behavior (thus we can have generic lemmas like map_mul). If you define a CoeFun instance instead, then it unfolds to toFun.

Similarly

variable (g : α → Prop) (x : Subtype g)

the coercion (x : α) is elaborated as x.val.

There are several different typeclasses for coercions, the most common ones being Coe, CoeFun and CoeTail. Usually, mathlib3's coe corresponds to Coe, coe_fn to CoeFun and coeTC to CoeTail.

Consequences for porting

This means that we can get rid of lemmas that used to do the simplification f.to_fun x to ⇑f x as these lemmas are now syntactic tautologies. This lemma for Subtype is not needed anymore in mathlib4:

theorem val_eq_coe {x : Subtype p} : x.1 = ↑x :=
  rfl

Another consequence is that certain instances become unnecessary:

instance coe_trans [Zero M] [Coe R S] [CoeTail S M] [h : NeZero (r : M)] : NeZero ((r : S) : M) :=
  ⟨h.out⟩

If the coercion function appears explicitly in a theorem, then it needs to be replaced by the function that is used in the coercion. So for example

theorem cast_injective : injective (coe : ℕ → R) := sorry

becomes

theorem cast_injective : injective (Nat.cast : ℕ → R) := sorry

Coercion tag

To define coercions, there exists now the coe attribute to automatically create the delaborator and the norm_cast attribute. A complete definition of a coercion might look like this:

@[coe] def cast [AddGroupWithOne R] : ℤ → R := AddGroupWithOne.intCast

instance [AddGroupWithOne R] : CoeTail ℤ R where coe := cast

Known tactic issues

  • change e defined in Mathlib.Tactic.Basic as an alias for show e, but the at form and change .. with .. are both defined without implementation in lean 4 core. These are just missing an implementation (and they should either be PR'd to lean 4 or moved to mathlib4).
  • to_additive attribute doesn't correctly translate some names. Just like in Lean 3, it has issues with flipping variable order in pow -> smul and zpow -> zsmul. In such cases, use @[to_additive my_additive_theorem]` before the multiplicative theorem.
  • When applied to a structure: to_additive: doesn't automatically translate projections to extended structures other than the first, in case that the extended structures have a field in common. In this case, manually add these tags for all projections other than the first. Example: attribute [to_additive AddCommGroup.toAddCommMonoid] CommGroup.toCommMonoid. You can easily see which projections are translated with attribute [to_additive? ...] ....
  • to_additive might not reliable deal with numerals or other operations on fixed types, like Nat. In this case, make a Zulip thread tagging @fpvandoorn. As a workaround, you can first manually state the additive version, and then the multiplicative version tagged with @[to_additive].

Tips and tricks

Auto-implicits

Lean 4 has automatic generation for implicit types, so that the following code is valid without explicitly mentioning α or β:

variable (f : α ≃ β)

When porting this may cause errors due to mathlib3port not aligning names correctly, so it can help to turn auto-implicits off with

set_option autoImplicit false

Finding instances

#synth is useful for discovering if and where instances are available (e.g. as #synth Semiring Nat)