luau/rfcs/new-nonstrict.md
2023-10-09 18:57:50 -05:00

10 KiB
Raw Permalink Blame History

New non-strict mode

Summary

Currently, strict mode and non-strict mode infer different types for the same program. With this feature, strict and non-strict modes will share the local type inference engine, and the only difference between the modes will be in which errors are reported.

Motivation

Having two different type inference engines is unnecessarily confusing, and can result in unexpected behaviors such as changing the mode of a module can cause errors in the users of that module.

The current non-strict mode infers very coarse types (e.g. all local variables have type any) and so is not appropriate for type-driven tooling, which results in expensively and inconsistently using different modes for different tools.

Design

Code defects

The main goal of non-strict mode is to minimize false positives, that is if non-strict mode reports an error, then we have high confidence that there is a code defect. Example defects are:

  • Run-time errors
  • Dead code
  • Using an expression whose only possible value is nil
  • Writing to a table property that is never read

Run-time errors: this is an obvious defect. Examples include:

  • Built-in operators ("hi" + 5)
  • Luau APIs (math.abs("hi"))
  • Function calls from embeddings (CFrame.new("hi"))
  • Missing properties from embeddings (CFrame.new().nope)

Detecting run-time errors is undecidable, for example

if cond() then
    math.abs(hi)
end

It is undecidable whether this code produces a run-time error, but we do know that if math.abs("hi") is executed, it will produce a run-time error, and so report a type error in this case.

Expressions guaranteed to be nil: Luau tables do not error when a missing property is accessed (though embeddings may). So something like

local t = { Foo = 5 }
local x = t.Fop

wont produce a run-time error, but is more likely than not a programmer error. In this case, if the programmer intent was to initialize x as nil, they could have written

local t = { Foo = 5 }
local x = nil

For this reason, we consider it a code defect to use a value that the type system guarantees is of type nil.

Writing properties that are never read: There is a matching problem with misspelling properties when writing. For example

function f()
  local t = {}
  t.Foo = 5
  t.Fop = 7
  print(t.Foo)
end

wont produce a run-time error, but is more likely than not a programmer error, since t.Fop is written but never read. We can use read-only and write-only table properties for this, and make it an error to create a write-only property.

We have to be careful about this though, because if f ended with return t, then it would be a perfectly sensible function with type () -> { Foo: number, Fop: number }. The only way to detect that Fop was never read would be whole-program analysis, which is prohibitively expensive.

Implicit coercions: Luau supports various implicit coercions, such as allowing math.abs("-12"). These should be reported as defects.

New Non-strict error reporting

The difficult part of non-strict mode error-reporting is detecting guaranteed run-time errors. We can do this using an error-reporting pass that generates a type context such that if any of the x : T in the type context are satisfied, then the program is guaranteed to produce a type error.

For example in the program

function h(x, y)
  math.abs(x)
  string.lower(y)
end

an error is reported when x isnt a number, or y isnt a string, so the generated context is

x : ~number
y : ~string

In the function:

function f(x)
  math.abs(x)
  string.lower(x)
end

an error is reported when x isnt a number or isnt a string, so the constraint set is

x : ~number | ~string

Since ~number | ~string is equivalent to unknown, non-strict mode can report a warning, since calling the function is guaranteed to throw a run-time error. In contrast:

function g(x)
  if cond() then
    math.abs(x)
  else
    string.lower(x)
  end
end

generates context

x : ~number & ~string

Since ~number & ~string is not equivalent to unknown, non-strict mode reports no warning.

  • The disjunction of contexts C1 and C2 contains x : T1|T2, where x : T1 is in C1 and x : T2 is in C2.
  • The conjunction of contexts C1 and C2 contains x : T1&T2, where x : T1 is in C1 and x : T2 is in C2.

The context generated by a block is:

  • x = E generates the context of E : never.
  • B1; B2 generates the disjunction of the context of B1 and the context of B2.
  • if C then B1 else B2 end generates the conjunction of the context of B1 and the context of B2.
  • local x; B generates the context of B, removing the constraint x : T. If the type inferred for x is a subtype of T, then issue a warning.
  • function f(x1,...,xN) B end generates the context for B, removing x1 : T1, ..., xN : TN. If any of the Ti are equivalent to unknown, then issue a warning.

The constraint set generated by a typed expression is:

  • The context generated by x : T is x : T.
  • The context generated by s : T (where s is a scalar) is trivial. Issue a warning if s has type T.
  • The context generated by F(M1, ..., MN) : T is the disjunction of the contexts generated by F : ~function, and by M1 : T1, ...,MN : TN where for each i, F has an overload (unknown^(i-1),Ti,unknown^(N-i)) -> error. (Pick Ti to be never if no such overload exists). Issue a warning if F has an overload (unknown^N) -> S where S <: (T | error).
  • The context generated by M.p is the context generated by M : ~table.
  • The context generated by M[N] is the disjunction of the contexts generated by M : ~table and N : never.

For example:

  • The context generated by math.abs("hi") : never is
    • the context generated by "hi" : ~number, since math.abs has an overload (~number) -> error, which is trivial.
    • A warning is issued since "hi" has type ~number.
  • The context generated by function f(x) math.abs(x); string.lower(x) end is
    • the context generated by math.abs(x); string.lower(x) which is the disjunction of
      • the context generated by math.abs(x), which is
        • the context x : ~number, since math.abs has an overload (~number)->error
      • the context generated by string.lower(x), which is
        • the context x : ~string, since string.lower has an overload (~string)->error
    • remove the binding x : (~number | ~string)
    • A warning is issued since (~number | ~string) is equivalent to unknown.
  • The context generated by math.abs(string.lower(x)) is
    • the context generated by string.lower(x) : ~number, since math.abs has an overload (~number)->error, which is
      • the textx : ~string, since string.lower has an overload (~string)->error.
      • An warning is issued, since string.lower has an overload (unknown) -> (string | error) and (string | error) <: (~number | error).

Ergonomics

Error reporting. A straightforward implementation of this design issues warnings at the point that data flows into a place guaranteed to later produce a run-time error, which may not be perfect ergonomics. For example, in the program:

local x
if cond() then
  x = 5
else
  x = nil
end
string.lower(x)
the type inferred for x is number? and the context generated is `x
~string. Since number? <: ~string, a warning is issued at the declaration local x. For ergonomics, we might want to identify either string.lower(x)orx = 5` (or both!) in the error report.

Stringifying checked functions. This design depends on functions having overloads with error return type. This integrates with type error suppression, but would not be a perfect way to present types to users. A common case is that the checked type is the negation of the function type, for example the type of math.abs:

(number) -> number & (~number) -> error

This might be better presented as an annotation on the argument type, something like:

@checked (number) -> number

The type

   @checked (S1,...,SN) -> T

is equivalent to

   (S1,...,SN) -> T
   & (~S1, unknown^N-1) -> error
   & (unknown, ~S2, unknown^N-2) -> error
   ...
   & (unknown^N-1, SN) -> error

As a further extension, we might allow users to explicitly provide @checked type annotations.

Checked functions are known as strong functions in Elixir.

Drawbacks

This is a breaking change, since it results in more errors being issued.

Strict mode infers more precise (and hence more complex) types than current non-strict mode, which are presented by type error messages and tools such as type hover.

Alternatives

Success typing (used in Erlang Dialyzer) is the nearest existing solution. It is similar to this design, except that it only works in (the equivalent of) non-strict mode. The success typing function type (S)->T is the equivalent of our (~S)->error & (unknown)->(T|error).

We could put the @checked annotation on individual function argments rather than the function type.

We could use this design to infer checked functions. In function f(x1, ..., xN) B end, we could generate the context (x1 : T1, ..., xN : TN) for B, and add an overload (unknown^(i-1),Ti,unknown^(N-i))->error to the inferred type of f. For example, for the function

function h(x, y)
  math.abs(x)
  string.lower(y)
end

We would infer type

  (number, string) -> ()
& (~number, unknown) -> error
& (unknown, ~string) -> error

which is the same as

  @checked (number, string) -> ()

The problem with doing this is what to do about recursive functions.

References

Lily Brown, Andy Friesen and Alan Jeffrey Position Paper: Goals of the Luau Type System, in HATRA '21: Human Aspects of Types and Reasoning Assistants, 2021. https://doi.org/10.48550/arXiv.2109.11397

Giuseppe Castagna, Guillaume Duboc, José Valim The Design Principles of the Elixir Type System, 2023. https://doi.org/10.48550/arXiv.2306.06391

Tobias Lindahl and Konstantinos Sagonas, Practical Type Inference Based on Success Typings, in PPDP '06: Principles and Practice of Declarative Programming, pp. 167178, 2006. https://doi.org/10.1145/1140335.1140356