From ffc28da3311876fa9e39803690ea42b8f7188f70 Mon Sep 17 00:00:00 2001 From: Alan Jeffrey <403333+asajeffrey@users.noreply.github.com> Date: Mon, 9 Oct 2023 18:57:50 -0500 Subject: [PATCH] RFC for new non-strict mode (#1037) --- rfcs/new-nonstrict.md | 340 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 340 insertions(+) create mode 100644 rfcs/new-nonstrict.md diff --git a/rfcs/new-nonstrict.md b/rfcs/new-nonstrict.md new file mode 100644 index 00000000..42e9cc6f --- /dev/null +++ b/rfcs/new-nonstrict.md @@ -0,0 +1,340 @@ +# New non-strict mode + +## Summary + +Currently, strict mode and non-strict mode infer different types for +the same program. With this feature, strict and non-strict modes will +share the [local type inference](local-type-inference.md) +engine, and the only difference between the modes will be in which +errors are reported. + +## Motivation + +Having two different type inference engines is unnecessarily +confusing, and can result in unexpected behaviors such as changing the +mode of a module can cause errors in the users of that module. + +The current non-strict mode infers very coarse types (e.g. all local +variables have type `any`) and so is not appropriate for type-driven +tooling, which results in expensively and inconsistently using +different modes for different tools. + +## Design + +### Code defects + +The main goal of non-strict mode is to minimize false positives, that +is if non-strict mode reports an error, then we have high confidence +that there is a code defect. Example defects are: + +* Run-time errors +* Dead code +* Using an expression whose only possible value is `nil` +* Writing to a table property that is never read + +*Run-time errors*: this is an obvious defect. Examples include: + +* Built-in operators (`"hi" + 5`) +* Luau APIs (`math.abs("hi")`) +* Function calls from embeddings (`CFrame.new("hi")`) +* Missing properties from embeddings (`CFrame.new().nope`) + +Detecting run-time errors is undecidable, for example + +```lua +if cond() then + math.abs(“hi”) +end +``` + +It is undecidable whether this code produces a run-time error, but we +do know that if `math.abs("hi")` is executed, it will produce a +run-time error, and so report a type error in this case. + +*Expressions guaranteed to be `nil`*: Luau tables do not error when a +missing property is accessed (though embeddings may). So something +like + +```lua +local t = { Foo = 5 } +local x = t.Fop +``` + +won’t produce a run-time error, but is more likely than not a +programmer error. In this case, if the programmer intent was to +initialize `x` as `nil`, they could have written + +```lua +local t = { Foo = 5 } +local x = nil +``` + +For this reason, we consider it a code defect to use a value that the +type system guarantees is of type `nil`. + +*Writing properties that are never read*: There is a matching problem +with misspelling properties when writing. For example + +```lua +function f() + local t = {} + t.Foo = 5 + t.Fop = 7 + print(t.Foo) +end +``` + +won’t produce a run-time error, but is more likely than not a +programmer error, since `t.Fop` is written but never read. We can use +read-only and write-only table properties for this, and make it an +error to create a write-only property. + +We have to be careful about this though, because if `f` ended with +`return t`, then it would be a perfectly sensible function with type +`() -> { Foo: number, Fop: number }`. The only way to detect that `Fop` +was never read would be whole-program analysis, which is prohibitively +expensive. + +*Implicit coercions*: Luau supports various implicit coercions, such +as allowing `math.abs("-12")`. These should be reported as defects. + +### New Non-strict error reporting + +The difficult part of non-strict mode error-reporting is detecting +guaranteed run-time errors. We can do this using an error-reporting +pass that generates a type context such that if any of the `x : T` in +the type context are satisfied, then the program is guaranteed to +produce a type error. + +For example in the program + +```lua +function h(x, y) + math.abs(x) + string.lower(y) +end +``` + +an error is reported when `x` isn’t a `number`, or `y` isn’t a `string`, so the generated context is + +``` +x : ~number +y : ~string +``` + +In the function: + +```lua +function f(x) + math.abs(x) + string.lower(x) +end +``` + +an error is reported when x isn’t a number or isn’t a string, so the constraint set is + +``` +x : ~number | ~string +``` + +Since `~number | ~string` is equivalent to `unknown`, non-strict mode +can report a warning, since calling the function is guaranteed to +throw a run-time error. In contrast: + +```lua +function g(x) + if cond() then + math.abs(x) + else + string.lower(x) + end +end +``` + +generates context + +``` +x : ~number & ~string +``` + +Since `~number & ~string` is not equivalent to `unknown`, non-strict mode reports no warning. + +* The disjunction of contexts `C1` and `C2` contains `x : T1|T2`, + where `x : T1` is in `C1` and `x : T2` is in `C2`. +* The conjunction of contexts `C1` and `C2` contains `x : T1&T2`, + where `x : T1` is in `C1` and `x : T2` is in `C2`. + +The context generated by a block is: + +* `x = E` generates the context of `E : never`. +* `B1; B2` generates the disjunction of the context of `B1` and the + context of `B2`. +* `if C then B1 else B2` end generates the conjunction of the context + of `B1` and the context of `B2`. +* `local x; B` generates the context of `B`, removing the constraint + `x : T`. If the type inferred for `x` is a subtype of `T`, then + issue a warning. +* `function f(x1,...,xN) B end` generates the context for `B`, + removing `x1 : T1, ..., xN : TN`. If any of the `Ti` are equivalent to + `unknown`, then issue a warning. + +The constraint set generated by a typed expression is: + +* The context generated by `x : T` is `x : T`. +* The context generated by `s : T` (where `s` is a scalar) is + trivial. Issue a warning if `s` has type `T`. +* The context generated by `F(M1, ..., MN) : T` is the disjunction of + the contexts generated by `F : ~function`, and by + `M1 : T1`, ...,`MN : TN` where for each `i`, `F` has an overload + `(unknown^(i-1),Ti,unknown^(N-i)) -> error`. (Pick `Ti` to be + `never` if no such overload exists). Issue a warning if `F` has an + overload `(unknown^N) -> S` where `S <: (T | error)`. +* The context generated by `M.p` is the context generated by `M : ~table`. +* The context generated by `M[N]` is the disjunction of the contexts + generated by `M : ~table` and `N : never`. + +For example: + +* The context generated by `math.abs("hi") : never` is + * the context generated by `"hi" : ~number`, since `math.abs` has an + overload `(~number) -> error`, which is trivial. + * A warning is issued since `"hi"` has type `~number`. +* The context generated by `function f(x) math.abs(x); string.lower(x) end` is + * the context generated by `math.abs(x); string.lower(x)` which is the disjunction of + * the context generated by `math.abs(x)`, which is + * the context `x : ~number`, since `math.abs` has an overload `(~number)->error` + * the context generated by `string.lower(x)`, which is + * the context `x : ~string`, since `string.lower` has an overload `(~string)->error` + * remove the binding `x : (~number | ~string)` + * A warning is issued since `(~number | ~string)` is equivalent to `unknown`. +* The context generated by `math.abs(string.lower(x))` is + * the context generated by `string.lower(x) : ~number`, since `math.abs` has an overload `(~number)->error`, which is + * the text`x : ~string`, since `string.lower` has an overload `(~string)->error`. + * An warning is issued, since `string.lower` has an overload `(unknown) -> (string | error)` and `(string | error) <: (~number | error)`. + +### Ergonomics + +*Error reporting*. A straightforward implementation of this design +issues warnings at the point that data flows into a place +guaranteed to later produce a run-time error, which may not be perfect +ergonomics. For example, in the program: + +```lua +local x +if cond() then + x = 5 +else + x = nil +end +string.lower(x) +``` + +the type inferred for `x` is `number?` and the context generated is `x +: ~string`. Since `number? <: ~string`, a warning is issued at the +declaration `local x`. For ergonomics, we might want to identify +either `string.lower(x)` or `x = 5` (or both!) in the error report. + +*Stringifying checked functions*. This design depends on functions +having overloads with `error` return type. This integrates with +[type error suppression](type-error-suppression.md), but would not be +a perfect way to present types to users. A common case is that the +checked type is the negation of the function type, for example the +type of `math.abs`: + +``` +(number) -> number & (~number) -> error +``` + +This might be better presented as an annotation on the argument type, something like: + +``` +@checked (number) -> number +``` + +The type + +``` + @checked (S1,...,SN) -> T +``` + +is equivalent to + + +``` + (S1,...,SN) -> T + & (~S1, unknown^N-1) -> error + & (unknown, ~S2, unknown^N-2) -> error + ... + & (unknown^N-1, SN) -> error +``` + +As a further extension, we might allow users to explicitly provide `@checked` type annotations. + +Checked functions are known as strong functions in Elixir. + +## Drawbacks + +This is a breaking change, since it results in more errors being issued. + +Strict mode infers more precise (and hence more complex) types than +current non-strict mode, which are presented by type error messages +and tools such as type hover. + +## Alternatives + +Success typing (used in Erlang Dialyzer) is the nearest existing +solution. It is similar to this design, except that it only works in +(the equivalent of) non-strict mode. The success typing function type +`(S)->T` is the equivalent of our +`(~S)->error & (unknown)->(T|error)`. + +We could put the `@checked` annotation on individual function argments +rather than the function type. + +We could use this design to infer checked functions. In function +`f(x1, ..., xN) B end`, we could generate the context +`(x1 : T1, ..., xN : TN)` for `B`, and add an overload +`(unknown^(i-1),Ti,unknown^(N-i))->error` to the inferred type of `f`. For +example, for the function + +```lua +function h(x, y) + math.abs(x) + string.lower(y) +end +``` + +We would infer type + +``` + (number, string) -> () +& (~number, unknown) -> error +& (unknown, ~string) -> error +``` + +which is the same as + +``` + @checked (number, string) -> () +``` + +The problem with doing this is what to do about recursive functions. + +## References + +Lily Brown, Andy Friesen and Alan Jeffrey +*Position Paper: Goals of the Luau Type System*, +in HATRA '21: Human Aspects of Types and Reasoning Assistants, +2021. +https://doi.org/10.48550/arXiv.2109.11397 + +Giuseppe Castagna, Guillaume Duboc, José Valim +*The Design Principles of the Elixir Type System*, +2023. +https://doi.org/10.48550/arXiv.2306.06391 + +Tobias Lindahl and Konstantinos Sagonas, +*Practical Type Inference Based on Success Typings*, +in PPDP '06: Principles and Practice of Declarative Programming, +pp. 167–178, 2006. +https://doi.org/10.1145/1140335.1140356