luau/rfcs/new-nonstrict.md
2023-10-09 18:57:50 -05:00

341 lines
10 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# New non-strict mode
## Summary
Currently, strict mode and non-strict mode infer different types for
the same program. With this feature, strict and non-strict modes will
share the [local type inference](local-type-inference.md)
engine, and the only difference between the modes will be in which
errors are reported.
## Motivation
Having two different type inference engines is unnecessarily
confusing, and can result in unexpected behaviors such as changing the
mode of a module can cause errors in the users of that module.
The current non-strict mode infers very coarse types (e.g. all local
variables have type `any`) and so is not appropriate for type-driven
tooling, which results in expensively and inconsistently using
different modes for different tools.
## Design
### Code defects
The main goal of non-strict mode is to minimize false positives, that
is if non-strict mode reports an error, then we have high confidence
that there is a code defect. Example defects are:
* Run-time errors
* Dead code
* Using an expression whose only possible value is `nil`
* Writing to a table property that is never read
*Run-time errors*: this is an obvious defect. Examples include:
* Built-in operators (`"hi" + 5`)
* Luau APIs (`math.abs("hi")`)
* Function calls from embeddings (`CFrame.new("hi")`)
* Missing properties from embeddings (`CFrame.new().nope`)
Detecting run-time errors is undecidable, for example
```lua
if cond() then
math.abs(hi)
end
```
It is undecidable whether this code produces a run-time error, but we
do know that if `math.abs("hi")` is executed, it will produce a
run-time error, and so report a type error in this case.
*Expressions guaranteed to be `nil`*: Luau tables do not error when a
missing property is accessed (though embeddings may). So something
like
```lua
local t = { Foo = 5 }
local x = t.Fop
```
wont produce a run-time error, but is more likely than not a
programmer error. In this case, if the programmer intent was to
initialize `x` as `nil`, they could have written
```lua
local t = { Foo = 5 }
local x = nil
```
For this reason, we consider it a code defect to use a value that the
type system guarantees is of type `nil`.
*Writing properties that are never read*: There is a matching problem
with misspelling properties when writing. For example
```lua
function f()
local t = {}
t.Foo = 5
t.Fop = 7
print(t.Foo)
end
```
wont produce a run-time error, but is more likely than not a
programmer error, since `t.Fop` is written but never read. We can use
read-only and write-only table properties for this, and make it an
error to create a write-only property.
We have to be careful about this though, because if `f` ended with
`return t`, then it would be a perfectly sensible function with type
`() -> { Foo: number, Fop: number }`. The only way to detect that `Fop`
was never read would be whole-program analysis, which is prohibitively
expensive.
*Implicit coercions*: Luau supports various implicit coercions, such
as allowing `math.abs("-12")`. These should be reported as defects.
### New Non-strict error reporting
The difficult part of non-strict mode error-reporting is detecting
guaranteed run-time errors. We can do this using an error-reporting
pass that generates a type context such that if any of the `x : T` in
the type context are satisfied, then the program is guaranteed to
produce a type error.
For example in the program
```lua
function h(x, y)
math.abs(x)
string.lower(y)
end
```
an error is reported when `x` isnt a `number`, or `y` isnt a `string`, so the generated context is
```
x : ~number
y : ~string
```
In the function:
```lua
function f(x)
math.abs(x)
string.lower(x)
end
```
an error is reported when x isnt a number or isnt a string, so the constraint set is
```
x : ~number | ~string
```
Since `~number | ~string` is equivalent to `unknown`, non-strict mode
can report a warning, since calling the function is guaranteed to
throw a run-time error. In contrast:
```lua
function g(x)
if cond() then
math.abs(x)
else
string.lower(x)
end
end
```
generates context
```
x : ~number & ~string
```
Since `~number & ~string` is not equivalent to `unknown`, non-strict mode reports no warning.
* The disjunction of contexts `C1` and `C2` contains `x : T1|T2`,
where `x : T1` is in `C1` and `x : T2` is in `C2`.
* The conjunction of contexts `C1` and `C2` contains `x : T1&T2`,
where `x : T1` is in `C1` and `x : T2` is in `C2`.
The context generated by a block is:
* `x = E` generates the context of `E : never`.
* `B1; B2` generates the disjunction of the context of `B1` and the
context of `B2`.
* `if C then B1 else B2` end generates the conjunction of the context
of `B1` and the context of `B2`.
* `local x; B` generates the context of `B`, removing the constraint
`x : T`. If the type inferred for `x` is a subtype of `T`, then
issue a warning.
* `function f(x1,...,xN) B end` generates the context for `B`,
removing `x1 : T1, ..., xN : TN`. If any of the `Ti` are equivalent to
`unknown`, then issue a warning.
The constraint set generated by a typed expression is:
* The context generated by `x : T` is `x : T`.
* The context generated by `s : T` (where `s` is a scalar) is
trivial. Issue a warning if `s` has type `T`.
* The context generated by `F(M1, ..., MN) : T` is the disjunction of
the contexts generated by `F : ~function`, and by
`M1 : T1`, ...,`MN : TN` where for each `i`, `F` has an overload
`(unknown^(i-1),Ti,unknown^(N-i)) -> error`. (Pick `Ti` to be
`never` if no such overload exists). Issue a warning if `F` has an
overload `(unknown^N) -> S` where `S <: (T | error)`.
* The context generated by `M.p` is the context generated by `M : ~table`.
* The context generated by `M[N]` is the disjunction of the contexts
generated by `M : ~table` and `N : never`.
For example:
* The context generated by `math.abs("hi") : never` is
* the context generated by `"hi" : ~number`, since `math.abs` has an
overload `(~number) -> error`, which is trivial.
* A warning is issued since `"hi"` has type `~number`.
* The context generated by `function f(x) math.abs(x); string.lower(x) end` is
* the context generated by `math.abs(x); string.lower(x)` which is the disjunction of
* the context generated by `math.abs(x)`, which is
* the context `x : ~number`, since `math.abs` has an overload `(~number)->error`
* the context generated by `string.lower(x)`, which is
* the context `x : ~string`, since `string.lower` has an overload `(~string)->error`
* remove the binding `x : (~number | ~string)`
* A warning is issued since `(~number | ~string)` is equivalent to `unknown`.
* The context generated by `math.abs(string.lower(x))` is
* the context generated by `string.lower(x) : ~number`, since `math.abs` has an overload `(~number)->error`, which is
* the text`x : ~string`, since `string.lower` has an overload `(~string)->error`.
* An warning is issued, since `string.lower` has an overload `(unknown) -> (string | error)` and `(string | error) <: (~number | error)`.
### Ergonomics
*Error reporting*. A straightforward implementation of this design
issues warnings at the point that data flows into a place
guaranteed to later produce a run-time error, which may not be perfect
ergonomics. For example, in the program:
```lua
local x
if cond() then
x = 5
else
x = nil
end
string.lower(x)
```
the type inferred for `x` is `number?` and the context generated is `x
: ~string`. Since `number? <: ~string`, a warning is issued at the
declaration `local x`. For ergonomics, we might want to identify
either `string.lower(x)` or `x = 5` (or both!) in the error report.
*Stringifying checked functions*. This design depends on functions
having overloads with `error` return type. This integrates with
[type error suppression](type-error-suppression.md), but would not be
a perfect way to present types to users. A common case is that the
checked type is the negation of the function type, for example the
type of `math.abs`:
```
(number) -> number & (~number) -> error
```
This might be better presented as an annotation on the argument type, something like:
```
@checked (number) -> number
```
The type
```
@checked (S1,...,SN) -> T
```
is equivalent to
```
(S1,...,SN) -> T
& (~S1, unknown^N-1) -> error
& (unknown, ~S2, unknown^N-2) -> error
...
& (unknown^N-1, SN) -> error
```
As a further extension, we might allow users to explicitly provide `@checked` type annotations.
Checked functions are known as strong functions in Elixir.
## Drawbacks
This is a breaking change, since it results in more errors being issued.
Strict mode infers more precise (and hence more complex) types than
current non-strict mode, which are presented by type error messages
and tools such as type hover.
## Alternatives
Success typing (used in Erlang Dialyzer) is the nearest existing
solution. It is similar to this design, except that it only works in
(the equivalent of) non-strict mode. The success typing function type
`(S)->T` is the equivalent of our
`(~S)->error & (unknown)->(T|error)`.
We could put the `@checked` annotation on individual function argments
rather than the function type.
We could use this design to infer checked functions. In function
`f(x1, ..., xN) B end`, we could generate the context
`(x1 : T1, ..., xN : TN)` for `B`, and add an overload
`(unknown^(i-1),Ti,unknown^(N-i))->error` to the inferred type of `f`. For
example, for the function
```lua
function h(x, y)
math.abs(x)
string.lower(y)
end
```
We would infer type
```
(number, string) -> ()
& (~number, unknown) -> error
& (unknown, ~string) -> error
```
which is the same as
```
@checked (number, string) -> ()
```
The problem with doing this is what to do about recursive functions.
## References
Lily Brown, Andy Friesen and Alan Jeffrey
*Position Paper: Goals of the Luau Type System*,
in HATRA '21: Human Aspects of Types and Reasoning Assistants,
2021.
https://doi.org/10.48550/arXiv.2109.11397
Giuseppe Castagna, Guillaume Duboc, José Valim
*The Design Principles of the Elixir Type System*,
2023.
https://doi.org/10.48550/arXiv.2306.06391
Tobias Lindahl and Konstantinos Sagonas,
*Practical Type Inference Based on Success Typings*,
in PPDP '06: Principles and Practice of Declarative Programming,
pp. 167178, 2006.
https://doi.org/10.1145/1140335.1140356