diff --git a/rfcs/README.md b/rfcs/README.md
deleted file mode 100644
index 4b5e7b04..00000000
--- a/rfcs/README.md
+++ /dev/null
@@ -1,60 +0,0 @@
-Background
-===
-
-Whenever the Luau language changes its syntax or semantics (including the behavior of builtin libraries), we need to consider the many implications of the changes.
-
-Whenever new syntax is introduced, we need to ask:
-
-- Is it backwards compatible?
-- Is it easy for machines and humans to parse?
-- Does it create grammar ambiguities for current and future syntax?
-- Is it stylistically coherent with the rest of the language?
-- Does it present challenges with editor integration like autocomplete?
-
-For changes in semantics, we should be asking:
-
-- Is the behavior easy to understand and non-surprising?
-- Can it be implemented performantly today?
-- Can it be sandboxed assuming malicious usage?
-- Is it compatible with type checking and other forms of static analysis?
-
-For new standard library functions, we should be asking:
-
-- Is the new functionality used/useful often enough in existing code?
-- Does the standard library implementation carry important performance benefits that can't be achieved in user code?
-- Is the behavior general and unambiguous, as opposed to solving a problem / providing an interface that's too specific?
-- Is the function interface amenable to type checking / linting?
-
-In addition to these questions, we also need to consider that every addition carries a cost, and too many features will result in a language that is harder to learn, harder to implement and ensure consistent implementation quality throughout, slower, etc. In addition, any language is greater than the sum of its parts, and features often have non-intuitive interactions with each other.
-
-Since reversing these decisions is incredibly costly and can be impossible due to backwards compatibility implications, all user-facing changes to the Luau language and core libraries must go through an RFC process.
-
-Process
-===
-
-To open an RFC, a Pull Request must be opened which creates a new Markdown file in the `rfcs/` folder. The RFCs should follow the template `rfcs/TEMPLATE.md`, and should have a file name that is a short human-readable description of the feature (using lowercase alphanumeric characters and dashes only). Try using the general area of the RFC as a prefix, e.g. `syntax-generic-functions.md` or `function-debug-info.md`.
-
-**Please make sure to add the `rfc` label to PRs *before* creating them!** This makes sure that our automatic notifications work correctly.
-
-Every RFC will remain open for at least two calendar weeks. This is to make sure that there is sufficient time to review the proposal and raise concerns or suggest improvements. The discussion points should be reflected in the PR comments; when discussion happens outside of the comment stream, the points salient to the RFC should be summarized as a follow-up.
-
-When the initial comment period expires, the RFC can be merged if there's consensus that the change is important and that the details of the syntax/semantics presented are workable. The decision to merge the RFC is made by the Luau team.
-
-When revisions to the RFC text that affect syntax/semantics are suggested, they need to be incorporated before the RFC is merged; a merged RFC represents a maximally accurate version of the language change that is going to be implemented.
-
-In some cases RFCs may contain conditional compatibility clauses. E.g. there are cases where a change is potentially not backwards compatible, but is believed to be beneficial enough that it can be implemented if, in practice, the backwards compatibility implications are minimal. As a strawman example, if we wanted to introduce a non-context-specific keyword `globallycoherent`, we would be able to do so if our analysis of Luau code (based on the Roblox platform at the moment) informs us that no script in existence uses this keyword. In cases like this an RFC may need to be revised after the initial implementation attempt based on the data that we gather.
-
-In general, RFCs can also be updated after merging to make the language of the RFC clearer, but should not change their meaning. When a new feature is built on top of an existing feature that has an RFC, a new RFC should be created instead of editing the existing RFC.
-
-When there's no consensus that the feature is broadly beneficial and can be implemented, an RFC will be closed. The decision to close the RFC is made by the Luau team.
-
-Note that in some cases an RFC may be closed because we don't have sufficient data or believe that at this point in time, the stars do not line up sufficiently for this change to be worthwhile, but this doesn't mean that it may never be considered again; an RFC PR may be reopened if new data has become available since the original discussion, or if the PR has changed substantially to address the core problems raised in the prior round.
-
-Implementation
-===
-
-When an RFC gets merged, the feature *can* be implemented; however, there's no set timeline for that implementation. In some cases the implementation may land within days after an RFC is merged; in others it may take months.
-
-To avoid having permanently stale RFCs, in rare cases the Luau team can *remove* a previously merged RFC when the landscape is believed to have changed enough for a feature like this to warrant further discussion.
-
-When an RFC is implemented and the implementation is enabled via feature flags, the RFC should be updated to include "**Status**: Implemented" at the top level (before the *Summary* section).
diff --git a/rfcs/STATUS.md b/rfcs/STATUS.md
deleted file mode 100644
index 653f168f..00000000
--- a/rfcs/STATUS.md
+++ /dev/null
@@ -1,14 +0,0 @@
-This document tracks unimplemented RFCs.
-
-## Read-only and write-only properties
-
-[RFC: Read-only properties](https://github.com/Roblox/luau/blob/master/rfcs/property-readonly.md) |
-[RFC: Write-only properties](https://github.com/Roblox/luau/blob/master/rfcs/property-writeonly.md)
-
-**Status**: Needs implementation
-
-## Expanded Subtyping for Generic Function Types
-
-[RFC: Expanded Subtyping for Generic Function Types](https://github.com/Roblox/luau/blob/master/rfcs/generic-function-subtyping.md)
-
-**Status**: Implemented but not fully rolled out yet.
diff --git a/rfcs/TEMPLATE.md b/rfcs/TEMPLATE.md
deleted file mode 100644
index 266922b2..00000000
--- a/rfcs/TEMPLATE.md
+++ /dev/null
@@ -1,21 +0,0 @@
-# Feature name
-
-## Summary
-
-One-paragraph explanation of the feature.
-
-## Motivation
-
-Why are we doing this? What use cases does it support? What is the expected outcome?
-
-## Design
-
-This is the bulk of the proposal. Explain the design in enough detail for somebody familiar with the language to understand, and include examples of how the feature is used.
-
-## Drawbacks
-
-Why should we *not* do this?
-
-## Alternatives
-
-What other designs have been considered? What is the impact of not doing this?
diff --git a/rfcs/behavior-eq-metamethod.md b/rfcs/behavior-eq-metamethod.md
deleted file mode 100644
index eeb768f0..00000000
--- a/rfcs/behavior-eq-metamethod.md
+++ /dev/null
@@ -1,59 +0,0 @@
-# Always call `__eq` when comparing for equality
-
-> Note: this RFC was adapted from an internal proposal that predates the RFC process
-
-**Status**: Implemented
-
-## Summary
-
-The `__eq` metamethod will always be called during `==`/`~=` comparison, even for objects that are rawequal.
-
-## Motivation
-
-Lua 5.x uses the following algorithm for comparing userdatas and tables:
-
-- If two objects are not of the same type (userdata vs number), they aren't equal
-- If two objects are referentially equal, they are equal (!)
-- If no object has a metatable with an `__eq` metamethod, they are equal iff they are referentially equal
-- Otherwise, pick one of the `__eq` metamethods, call it with both objects as arguments and return the result.
-
-In mid-2019, we released Luau, which implements a fast path for userdata comparison. This fast path accidentally omitted step 2 for userdatas with C `__eq` implementations (!), and thus comparing a userdata object with itself would actually run the `__eq` metamethod. This is significant as it allowed users to use `v == v` as a NaN check for vectors, coordinate frames, and other objects that have floating-point contents.
-
-Since this was a bug, we're in a rather inconsistent state:
-
-- `==` and `~=` in the code always call `__eq` for userdata with C `__eq`
-- `==` and `~=` don't call `__eq` for tables and custom newproxy-like userdatas with Lua `__eq` when objects are referentially equal
-- `table.find` *doesn't* call `__eq` when objects are referentially equal
-
-## Design
-
-Since developers started relying on `==` behavior for NaN checks in the two years since Luau's release, the bug has become a feature. Additionally, it's arguably a good feature, since it allows implementing NaN semantics for custom types - userdatas, tables, etc.
-
-Thus the proposal suggests changing the rules so that when an `__eq` metamethod is present, `__eq` is always called, even when comparing the object to itself.
-
-This would effectively make the current ruleset for userdata objects official, and change the behavior for `table.find` (which is probably not significant) and, more significantly, start calling user-provided `__eq` even when the object is the same. It's expected that any reasonable `__eq` implementation can handle comparing the object to itself, so this is not expected to result in breakage.
-
-## Drawbacks
-
-This represents a difference in a rather core behavior from all upstream versions of Lua.
-
-## Alternatives
-
-We could instead equalize (ha!) the behavior between Luau and Lua. In fact, this is what we tried to do initially, as the userdata behavior was considered a bug, but we ran into games that already depended on the new behavior.
-
-We could work with developers to change their games to stop relying on this. However, this is more complicated to deploy and - upon reflection - makes `==` less intuitive than the main proposal when comparing objects with NaN, since e.g. it means that these two functions behave differently:
-
-```lua
-function compare1(a: Vector3, b: Vector3)
-    return a == b
-end
-
-function compare2(a: Vector3, b: Vector3)
-    return a.X == b.X and a.Y == b.Y and a.Z == b.Z
-end
-```
-
-## References
-
-https://devforum.roblox.com/t/call-eq-even-when-tables-are-rawequal/1088886
-https://devforum.roblox.com/t/nan-vector3-comparison-broken-cframe-too/1130778
diff --git a/rfcs/change-global-version.md b/rfcs/change-global-version.md
deleted file mode 100644
index cdb6c1e7..00000000
--- a/rfcs/change-global-version.md
+++ /dev/null
@@ -1,21 +0,0 @@
-# Change \_VERSION global to "Luau"
-
-> Note: this RFC was adapted from an internal proposal that predates the RFC process
-
-**Status**: Implemented
-
-## Summary
-
-Change the \_VERSION global to "Luau" to differentiate Luau from Lua
-
-## Motivation
-
-Provide an official way to distinguish Luau from Lua implementations.
-
-## Design
-
-We inherit the global string \_VERSION from Lua (this is distinct from the Roblox `version()` function that returns a full version number such as 0.432.43589).
-
-The string is set to "Lua 5.1" for us (and "Lua 5.2", etc., for newer versions of Lua).
-
-Since our implementation is sufficiently divergent from upstream, this proposal suggests setting \_VERSION to "Luau".
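The runtime check this RFC enables can be sketched as follows, assuming only the `_VERSION` behavior described above. `isLuau` and `describeRuntime` are hypothetical helper names, not part of any library; the code avoids Luau-only syntax, so it also runs under stock Lua.

```lua
-- Hypothetical helpers (not a real library API): detect Luau through the
-- _VERSION global, which this RFC sets to "Luau".
local function isLuau()
    return _VERSION == "Luau"
end

local function describeRuntime()
    if isLuau() then
        return "Luau"
    end
    -- Upstream Lua reports strings such as "Lua 5.1" or "Lua 5.4" instead.
    return "Lua (" .. _VERSION .. ")"
end

print(describeRuntime())
```

Because Luau previously reported "Lua 5.1", code that must also run on builds predating this change cannot rely on this check alone.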
diff --git a/rfcs/config-luaurc.md b/rfcs/config-luaurc.md
deleted file mode 100644
index b321b94b..00000000
--- a/rfcs/config-luaurc.md
+++ /dev/null
@@ -1,68 +0,0 @@
-# Configure analysis via .luaurc
-
-**Status**: Implemented
-
-## Summary
-
-Introduces a way to configure the type checker and linter using JSON-like .luaurc files
-
-## Motivation
-
-While Luau analysis tools try to provide sensible defaults, it's difficult to establish rules that work for all code.
-For example, some packages may decide that unused variables aren't interesting; other packages may decide that all files should be using strict typechecking mode.
-
-While it's possible to configure some aspects of analysis behavior using `--!` comments, it can be cumbersome to replicate this in all files.
-
-## Design
-
-To solve this problem, we are going to introduce support for `.luaurc` files for users of command-line Luau tools.
-For a given .lua file, Luau will search for .luaurc files starting from the folder that the .lua file is in; all files in the ancestry chain will be parsed and their configuration
-applied. When multiple files are used, the file closer to the .lua file overrides the settings.
-
-.luaurc is a JSON file that can also contain comments and trailing commas. The file can have the following keys:
-
-- `"languageMode"`: type checking mode, can be one of "nocheck", "nonstrict", "strict"
-- `"lint"`: lints to enable; points to an object that maps string literals that correspond to the names of linting rules (see https://luau-lang.org/lint), or `"*"` that means "all rules", to a boolean (to enable/disable the lint)
-- `"lintErrors"`: a boolean that controls whether lint issues are reported as errors or warnings (off by default)
-- `"typeErrors"`: a boolean that controls whether type issues are reported as errors or warnings (on by default)
-- `"globals"`: extra global values; points to an array of strings where each string names a global that the type checker and linter must assume is valid and of type `any`
-
-Example of a valid .luaurc file:
-
-```json5
-{
-    "languageMode": "nonstrict",
-    "lint": { "*": true, "LocalUnused": false },
-    "lintErrors": true,
-    "globals": ["expect"] // TestEZ
-}
-```
-
-Note that in the absence of a configuration file, we will use default settings: languageMode will be set to nonstrict, a set of lint warnings is going to be enabled by default (this proposal doesn't detail that set - that will be subject to a different proposal), type checking issues are going to be treated as errors, and lint issues are going to be treated as warnings.
-
-## Design -- compatibility
-
-Today we support .robloxrc files; this proposal will keep parsing the legacy configuration format for compatibility:
-
-- Top-level `"language"` key can refer to an object that has a `"languageMode"` key that also defines the language mode
-- Top-level `"lint"` object values can refer to a string `"disabled"`/`"enabled"`/`"fatal"` instead of a boolean as a value.
-
-These keys are only going to be supported for compatibility and only when the file name is .robloxrc (which is only going to be parsed by internal Roblox command line tools, but this proposal mentions it for completeness).
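The Design section states that every .luaurc in the ancestry chain is applied and that the file closest to the source file wins. A minimal sketch of that layering follows, assuming the files have already been parsed into Lua tables. `mergeConfigs` is a hypothetical name, and merging `"lint"` rule by rule (rather than replacing the whole object) is an assumption, since the RFC does not say how nested objects combine.

```lua
-- Hypothetical sketch: merge already-parsed .luaurc tables, ordered from the
-- outermost ancestor folder to the folder containing the source file.
-- Assumption: "lint" maps are merged rule by rule; every other key is
-- simply replaced by the nearer file's value.
local function mergeConfigs(chain)
    local result = { lint = {} }
    for _, config in ipairs(chain) do
        for key, value in pairs(config) do
            if key == "lint" then
                for rule, enabled in pairs(value) do
                    result.lint[rule] = enabled
                end
            else
                result[key] = value
            end
        end
    end
    return result
end

local effective = mergeConfigs({
    { languageMode = "nonstrict", lint = { ["*"] = true } },      -- repository root
    { languageMode = "strict", lint = { LocalUnused = false } },  -- package folder
})
```

Under the per-rule assumption, the package folder disables `LocalUnused` while still inheriting `"*": true` from the root; if nearer files replaced the whole `"lint"` object instead, the `"*"` entry would be lost.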
-
-## Drawbacks
-
-The introduction of configuration files means that it's now impossible to type check or lint sources in isolation, which complicates the code setup.
-
-File-based JSON configuration may or may not map cleanly to environments that don't support files, such as Roblox Studio.
-
-Using JSON5 instead of vanilla JSON limits interoperability.
-
-There's no way to force specific lints to be fatal, although this can be solved in the future by promoting the "compatibility" feature, where a lint can be set to a string such as `"fatal"`, to a non-compatibility feature.
-
-## Alternatives
-
-It's possible to consider forcing users to specify the source settings via `--!` comments exclusively. This is problematic, though, as it may require excessive amounts of annotation, which this proposal aims to reduce.
-
-The format of the configuration file does not have to be JSON; for example, it can be a valid Luau source file, which is the approach luacheck takes. This makes it more difficult to process the .luaurc file with third-party tools, though; e.g. a package manager would need to learn how to parse Luau syntax to store configuration in .luaurc.
-
-It's possible to use the old style of lint rule specification with "enabled"/"fatal"/etc., but it's more verbose and more difficult to use in common scenarios; for example, "all enabled lints are fatal and these are the ones we need to enable in addition to the default set" is impossible to specify.
diff --git a/rfcs/deprecate-getfenv-setfenv.md b/rfcs/deprecate-getfenv-setfenv.md
deleted file mode 100644
index 9e80f9c6..00000000
--- a/rfcs/deprecate-getfenv-setfenv.md
+++ /dev/null
@@ -1,38 +0,0 @@
-# Deprecate getfenv/setfenv
-
-**Status**: Implemented
-
-## Summary
-
-Mark getfenv/setfenv as deprecated
-
-## Motivation
-
-getfenv and setfenv are problematic for a host of reasons:
-
-- They allow uncontrolled mutation of the global environment, which results in deoptimization; various important performance features
-like builtin calls or imports are disabled when these functions are used.
-- Because of the uncontrolled mutation, code that uses getfenv/setfenv can't be typechecked correctly; in particular, injecting new
-globals is going to produce "unknown globals" warnings, and modifying existing globals can trivially violate soundness with respect to type
-checking
-- While these functions can be used for good (once you ignore the issues above), such as custom module systems, statistically speaking
-they are mostly used to obfuscate code to hide malicious intent.
-
-## Design
-
-We will mark getfenv and setfenv as deprecated. The only consequence of this change is that the linter will start emitting warnings when they are used.
-
-Removing support for getfenv/setfenv, while tempting, is not planned in the foreseeable future because it would cause significant backwards compatibility issues.
-
-## Drawbacks
-
-There are valid uses for getfenv/setfenv, which include extra logging (in Roblox code this manifests as `getfenv(1).script`), monkey patching for mocks in unit tests, and custom
-module systems that inject globals into the calling environment. We do have a replacement for logging use cases, `debug.info`, and we do have an officially recommended replacement
-for custom module systems, which is to use `require`; it doesn't carry the issues of fenv modification and can be understood by the type checker. However, we do not have an
-alternative for mocks. As such, testing frameworks that implement mocking via setfenv/getfenv will need to use `--!nolint DeprecatedGlobal` to avoid this warning.
-
-## Alternatives
-
-Besides the obvious alternative "do nothing", we could also consider implementing Lua 5.2 support for `_ENV`. However, since we do not have a way to load script files other than
-via `require`, which doesn't support `_ENV`, and `loadstring` is supported but discouraged, we do not currently plan to implement `_ENV`, although it's possible that this will happen
-in the future.
diff --git a/rfcs/deprecate-table-getn-foreach.md b/rfcs/deprecate-table-getn-foreach.md
deleted file mode 100644
index ea8e1e6b..00000000
--- a/rfcs/deprecate-table-getn-foreach.md
+++ /dev/null
@@ -1,31 +0,0 @@
-# Deprecate table.getn/foreach/foreachi
-
-**Status**: Implemented
-
-## Summary
-
-Mark table.getn/foreach/foreachi as deprecated
-
-## Motivation
-
-`table.getn`, `table.foreach` and `table.foreachi` were deprecated in Lua 5.1, which Luau is based on, and removed in Lua 5.2.
-
-`table.getn(x)` is equivalent to `rawlen(x)` when `x` is a table; when `x` is not a table, `table.getn` produces an error. It's difficult to imagine code where `table.getn(x)` is better than either `#x` (idiomatic) or `rawlen(x)` (fully compatible replacement). However, `table.getn` is slower and provides yet another way to perform an operation, leading new users of the language to use it unknowingly.
-
-`table.foreach` is equivalent to a `for .. pairs` loop; `table.foreachi` is equivalent to a `for .. ipairs` loop; both may also be replaced by generalized iteration. Both functions are significantly slower than equivalent `for` loop replacements, are more restrictive because the function can't yield, and lead new users (particularly those coming from a JS background) to use them unknowingly, producing non-idiomatic, non-performant code.
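The replacements named above can be shown side by side in a minimal sketch; the table contents are illustrative. Luau's generalized iteration (`for k, v in t do`) is left out so the snippet stays valid in stock Lua as well as Luau.

```lua
local fruits = { "apple", "banana", "cherry" }
local prices = { apple = 3, banana = 1 }

-- Instead of table.getn(fruits): the idiomatic length operator.
-- rawlen(fruits) is the fully compatible replacement named by the RFC.
local count = #fruits

-- Instead of table.foreachi(fruits, f): an ipairs loop.
local upper = {}
for i, name in ipairs(fruits) do
    upper[i] = string.upper(name)
end

-- Instead of table.foreach(prices, f): a pairs loop. Unlike a callback
-- passed to table.foreach, the loop body is free to yield or return early.
local total = 0
for _, price in pairs(prices) do
    total = total + price
end
```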
-
-In both cases, the functions bring no value over other library or language alternatives, and thus just serve as a distraction.
-
-## Design
-
-We will mark all three functions as deprecated. The only consequence of this change is that the linter will start emitting warnings when they are used.
-
-Removing support for these functions doesn't provide any measurable value and as such is not planned in the foreseeable future because it may cause backwards compatibility issues.
-
-## Drawbacks
-
-None
-
-## Alternatives
-
-If we consider table.getn/etc. as supported, we'd want to start optimizing their usage, which gets particularly tricky with foreach and requires more compiler machinery than this is probably worth.
diff --git a/rfcs/disallow-proposals-leading-to-ambiguity-in-grammar.md b/rfcs/disallow-proposals-leading-to-ambiguity-in-grammar.md
deleted file mode 100644
index d9c5c7d7..00000000
--- a/rfcs/disallow-proposals-leading-to-ambiguity-in-grammar.md
+++ /dev/null
@@ -1,129 +0,0 @@
-# Disallow `name T` and `name(T)` in future syntactic extensions for type annotations
-
-## Summary
-
-We propose to disallow the syntax ``<name> `('`` as well as `<name> <type>` in future syntax extensions for type annotations to ensure that all existing programs continue to parse correctly. This still keeps the door open for future syntax extensions of different forms such as ``<name> `<' <type> `>'``.
-
-## Motivation
-
-The syntax of Lua, and by extension Luau, is very free-form, which means that when the parser finishes parsing a node, it doesn't try to look for a semicolon or any termination token, e.g. a `{` to start a block, or `;` to end a statement, or a newline, etc. It just immediately invokes the next parser to figure out how to parse the next node based on the remainder's starting token.
-
-That feature is sometimes quite troublesome when we want to add new syntax.
-
-We have had cases where we talked about using syntax like `setmetatable(T, MT)` and `keyof T`. They all look innocent, but when you look beyond that and try to apply them to Luau's grammar, things break down really fast.
-
-### `F(T)`?
-
-An example that _will_ cause a change in semantics:
-
-```
-local t: F
-(u):m()
-```
-
-where today, `local t: F` is one statement, and `(u):m()` is another. If we had the syntax for `F(T)` here, it would become invalid input because it would get parsed as
-
-```
-local t: F(u)
-:m()
-```
-
-This is important because of the `setmetatable(T, MT)` case:
-
-```
-type Foo = setmetatable({ x: number }, { ... })
-```
-
-For `setmetatable`, the parser isn't sure whether `{}` is actually a type or an expression, because _today_ `setmetatable` is parsed as a type reference, and `({}, {})` is the remainder that we'll attempt to parse as a statement. This means `{ x: number }` is an invalid table _literal_. Recovery by backtracking is technically possible here, but this means a performance loss on invalid input and may introduce false positives with respect to how things are parsed. We'd much rather take a very strict stance about how things get parsed.
-
-### `F T`?
-
-An example that _will_ cause a change in semantics:
-
-```
-local function f(t): F T
-    (t or u):m()
-end
-```
-
-where today, the return type annotation `F T` is simply parsed as just `F`, followed by an ambiguous parse error from the statement `T(t or u)` because its `(` is on the next line. If at some point in the future we were to allow `T` followed by `(` on the next line, then there's yet another semantic change. `F T` could be parsed as a type annotation, and the first statement would be `(t or u):m()` instead of `F` followed by `T(t or u):m()`.
-
-For `keyof`, here's a practical example of the above issue:
-
-```
-type Vec2 = {x: number, y: number}
-
-local function f(t, u): keyof Vec2
-    (t or u):m()
-end
-```
-
-There are three possible outcomes:
- 1. Return type of `f` is `keyof`, statement throws a parse error because `(` is on the next line after `Vec2`,
- 2. Return type of `f` is `keyof Vec2` and next statement is `(t or u):m()`, or
- 3. Return type of `f` is `keyof` and next statement is `Vec2(t or u):m()` (if we allow `(` on the next line to be part of the previous line).
-
-This particular case is even worse when we keep going:
-
-```
-local function f(t): F
-    T(t or u):m()
-end
-```
-
-```
-local function f(t): F T
-    {1, 2, 3}
-end
-```
-
-where today, `F` is the return type annotation of `f`, and `T(t or u):m()`/`T{1, 2, 3}` is the first statement, respectively.
-
-Adding some syntax for `F T` **will** cause the parser to change the semantics of the above three examples.
-
-### But what about `typeof(...)`?
-
-This syntax is grandfathered in because the parser supported `typeof(...)` before we stabilized our syntax, and especially before type annotations were released to the public, so we didn't need to worry about compatibility here. We are very glad that we used parentheses in this case, because it's natural for expressions to belong within parentheses `()`, and types to belong within angles `<>`.
-
-## The One Exception with a caveat
-
-This is a strict requirement!
-
-`function() -> ()` has been talked about in the past, and this one is different despite falling under the same category as ``<name> `('``. The token `function` is in fact a "hard keyword," meaning that it cannot be parsed as a type annotation because it is not an identifier, just a keyword.
-
-Likewise, we have also talked about adding standalone `function` as a type annotation (its semantics are irrelevant for this RFC).
-
-It's possible that we may end up adding both, but the requirements are as follows:
- 1. `function() -> ()` must be added first before standalone `function`, OR
- 2. `function` can be added first, but with a future-proofing parse error if `<` or `(` follows after it
-
-If #1 is what ends up happening, there's not much to worry about because the type annotation parser will parse greedily already, so any new valid input will remain valid and have the same semantics, except it also allows omitting `(` and `<`.
-
-If #2 is what ends up happening, there could be a problem if we didn't future-proof against `<` and `(` following `function`:
-
-```
- return f :: function(T) -> U
-```
-
-which would be a parse error because at the point of `(` we expect one of `until`, `end`, or `EOF`, and
-
-```
- return f :: function<a>(a) -> a
-```
-
-which would also be a parse error by the time we reach `->`; that is, the above would be parsed as something semantically equivalent to `(f < a) > (a)`, which would compare whether the value of `f` is less than the value of `a`, then whether that result is greater than `a`.
-
-## Alternatives
-
-Only allow this syntax when used inside parentheses, e.g. `(F T)` or `(F(T))`. This makes it inconsistent with the existing `typeof(...)` type annotation, and changing that over is also a breaking change.
-
-Support backtracking in the parser, so if `: MyType(t or u):m()` is invalid syntax, revert and parse `MyType` as a type, and `(t or u):m()` as an expression statement. Even so, this option is terrible for:
- 1. parsing performance (backtracking means losing progress on invalid input),
- 2. user experience (why was this annotation parsed as `X(...)` instead of `X` followed by a statement `(...)`), and
- 3. correctness, since it has false positives (`foo(bar)(baz)` may be parsed as `foo(bar)` as the type annotation and `(baz)` as the remainder to parse)
-
-## Drawbacks
-
-Exposing some kind of type-level operations using `F<T>` syntax means one of the following must be chosen:
- 1. introduce the concept of "magic type functions" into type inference, or
- 2. introduce them into the prelude as `export type F = ...` (where `...` is to be read as "we haven't decided")
diff --git a/rfcs/function-bit32-byteswap.md b/rfcs/function-bit32-byteswap.md
deleted file mode 100644
index fff22924..00000000
--- a/rfcs/function-bit32-byteswap.md
+++ /dev/null
@@ -1,46 +0,0 @@
-# bit32.byteswap
-
-## Summary
-
-Add `bit32.byteswap` to swap the endianness of a 32-bit integer.
-
-## Motivation
-
-The endianness of an integer is generally invisible to Luau users. Numbers are treated as expected regardless of their underlying representation, as is standard across programming languages. However, in some file formats and algorithms, the endianness of an integer is important, so it becomes necessary to swap the order of bytes of an integer to 'pretend' that it is one endian or the other.
-
-While the endianness of numbers can be swapped through a few methods, doing so is cumbersome. Modern CPUs have instructions dedicated to this (`bswap` on x86-64, `rev` on aarch64) but in Luau, the current best method is to manually shift bytes around and OR them together. For 32-bit integers, this becomes a total of 7 calls:
-
-```lua
-bit32.bor(
-    bit32.lshift(n, 24),
-    bit32.band(bit32.lshift(n, 8), 0xFF0000),
-    bit32.band(bit32.rshift(n, 8), 0xFF00),
-    bit32.rshift(n, 24)
-)
-```
-
-Along with being inefficient, this code is also difficult to read and remember. It took the author of this RFC several tries to write the above example correctly.
-
-## Design
-
-The `bit32` library will gain a new function, `bit32.byteswap`:
-
-```
-bit32.byteswap(n: number): number
-```
-
-`byteswap` will take the bytes of a number and reverse their order. To be exact, for an integer `0xA1B2_C3D4`, it will return `0xD4C3_B2A1`.
-
-## Drawbacks
-
-There is a reasonable expectation that `bit32` functions receive built-in implementations to improve their performance. This is even more true with native codegen. As this functionality is relatively niche, it may not be worth including for that reason alone, because it would occupy a built-in function slot in the VM.
-
-However, even without a built-in call, an initial implementation was still significantly faster than the alternative presented above. So the only known drawback is the marginal increase in overall VM complexity, which is not considered to be a serious drawback.
-
-## Alternatives
-
-A function to simply convert an integer to little-endian was considered, but was rejected for a simple reason: it is impossible to know whether a given integer is already little-endian, so the function may as well be a generic swapping function. Naming such a function without being verbose is also difficult (`bit32.tole` is a bad name, but `bit32.tolittleendian` is too long).
-
-Simply using the existing `bit32` functions as presented at the beginning of the RFC is not unworkably slow, so it is a viable alternative for a niche use case like this. However, as noted before, it is hard to parse visually.
-
-It may be more reasonable to identify and implement use cases for this function rather than the function itself. However, this is not sustainable: it is doubtful anyone wishes to include support for MD5 hashing natively, as an example.
\ No newline at end of file
diff --git a/rfcs/function-bit32-countlz-countrz.md b/rfcs/function-bit32-countlz-countrz.md
deleted file mode 100644
index b4ccb197..00000000
--- a/rfcs/function-bit32-countlz-countrz.md
+++ /dev/null
@@ -1,52 +0,0 @@
-# bit32.countlz/countrz
-
-**Status**: Implemented
-
-## Summary
-
-Add bit32.countlz (count left zeroes) and bit32.countrz (count right zeroes) to accelerate bit scanning
-
-## Motivation
-
-Modern CPUs have instructions to determine the position of the first/last set bit in an integer. These instructions have a variety of uses, the popular ones being:
-
-- Fast implementation of integer logarithm (essentially allowing `floor(log2(value))` to be computed quickly)
-- Scanning set bits in an integer, which allows efficient traversal of compact bitmap representations
-- Allocating bits out of a bitmap quickly
-
-Today it's possible to approximate `countlz` using `floor` and `log`, but this approximation is relatively slow; approximating `countrz` is difficult without iterating through each bit.
-
-## Design
-
-The `bit32` library will gain two new functions, `countlz` and `countrz`:
-
-```
-function bit32.countlz(n: number): number
-function bit32.countrz(n: number): number
-```
-
-`countlz` takes an integer number (converting the input number to a 32-bit unsigned integer as all other `bit32` functions do), and returns the number of consecutive left-most zero bits - that is, the number of most significant zero bits in a 32-bit number until the first 1. The result is in the `[0, 32]` range.
-
-For example, when the input number is `0`, it's `32`. When the input number is `2^k`, the result is `31-k`.
-
-`countrz` takes an integer number (converting the input number to a 32-bit unsigned integer as all other `bit32` functions do), and returns the number of consecutive right-most zero bits - that is,
-the number of least significant zero bits in a 32-bit number until the first 1. The result is in the `[0, 32]` range.
-
-For example, when the input number is `0`, it's `32`. When the input number is `2^k`, the result is `k`.
-
-> Non-normative: a proof of concept implementation shows that a polyfill for `countlz` takes ~34 ns per loop iteration when computing `countlz` for an increasing number sequence, whereas
-> a builtin implementation takes ~4 ns.
-
-## Drawbacks
-
-None known.
-
-## Alternatives
-
-These functions can alternatively be specified as "find the position of the most/least significant bit set" (e.g. "ffs"/"fls" for "find first set"/"find last set"). This formulation
This formulation -can be more immediately useful since the bit position is usually more important than the number of bits. However, the bit position is undefined when the input number is zero, -returning a sentinel such as -1 seems non-idiomatic, and returning `nil` seems awkward for calling code. Counting functions don't have this problem. - -An early version of this proposal suggested `clz`/`ctz` (leading/trailing) as names; however, using a full verb is more consistent with other operations like shift/rotate, and left/right may be easier to understand intuitively compared to leading/trailing. left/right are used by C++20. - -Of the two functions, `countlz` is vastly more useful than `countrz`; we could implement just `countlz`, but having both is nice for symmetry. diff --git a/rfcs/function-coroutine-close.md b/rfcs/function-coroutine-close.md deleted file mode 100644 index b9ffbf6f..00000000 --- a/rfcs/function-coroutine-close.md +++ /dev/null @@ -1,36 +0,0 @@ -# coroutine.close - -**Status**: Implemented - -## Summary - -Add `coroutine.close` function from Lua 5.4 that takes a suspended coroutine and makes it "dead" (non-runnable). - -## Motivation - -When implementing various higher level objects on top of coroutines, such as promises, it can be useful to cancel the coroutine execution externally - when the caller is not -interested in getting the results anymore, execution can be aborted. Since coroutines don't provide a way to do that externally, this requires the framework to implement -cancellation on top of coroutines by keeping extra status/token and checking that token in all places where the coroutine is resumed. - -Since coroutine execution can be aborted with an error at any point, coroutines already implement support for "dead" status. If it were possible to externally transition a coroutine -to that status, it would be easier to implement cancellable promises on top of coroutines. 
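As a sketch of the workaround described above, a framework without `coroutine.close` must carry its own cancellation flag and check it at every resume site, whereas `coroutine.close` moves that state into the coroutine itself. The names `newTask`, `resumeTask`, `cancelTask` and `cancelWithClose` are hypothetical and do not belong to any library; the code assumes Lua 5.4 or Luau, both of which provide `coroutine.close`.

```lua
-- Hypothetical task wrapper, for illustration only.
local function newTask(fn)
  return { co = coroutine.create(fn), cancelled = false }
end

-- Workaround without coroutine.close: every resume site must check the flag.
local function resumeTask(task, ...)
  if task.cancelled then
    return false, "cancelled"
  end
  return coroutine.resume(task.co, ...)
end

local function cancelTask(task)
  task.cancelled = true
end

-- With coroutine.close: the coroutine itself becomes "dead", so any later
-- coroutine.resume fails without extra bookkeeping in the framework.
local function cancelWithClose(task)
  return coroutine.close(task.co)
end
```

Per the Lua 5.4 contract quoted in the Design section, `cancelWithClose` is only valid for suspended or already dead coroutines.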
 - -## Design - -We implement Lua 5.4 behavior exactly, with the exception of to-be-closed variables that we don't support. Quoting the Lua 5.4 manual: - -> coroutine.close (co) -> Closes coroutine co, that is, puts the coroutine in a dead state. The given coroutine must be dead or suspended. In case of error (either the original error that stopped the coroutine or errors in closing methods), returns false plus the error object; otherwise returns true. - -The `co` argument must be a coroutine object (of type `thread`). - -After closing, the coroutine transitions to the dead state, which means that `coroutine.status` will return `"dead"` and attempts to resume the coroutine will fail. In addition, the coroutine stack (which can be accessed via `debug.traceback` or `debug.info`) will become empty. Calling `coroutine.close` on a closed coroutine will return `true` - after closing, the coroutine transitions into a "dead" state with no error information. - -## Drawbacks - -None known, as this function doesn't introduce any new states to coroutines, and is similar to running the coroutine to completion/error. - -## Alternatives - -Lua's name for this function is likely in part motivated by to-be-closed variables that we don't support. As such, a more appropriate name could be `coroutine.cancel` which also -aligns with use cases better. However, since the semantics are otherwise the same, using the same name as Lua 5.4 reduces library fragmentation. diff --git a/rfcs/function-debug-info.md b/rfcs/function-debug-info.md deleted file mode 100644 index 5f486db4..00000000 --- a/rfcs/function-debug-info.md +++ /dev/null @@ -1,109 +0,0 @@ -# debug.info - -> Note: this RFC was adapted from an internal proposal that predates RFC process - -**Status**: Implemented - -## Summary - -Add `debug.info` as programmatic debug info access API, similarly to Lua's `debug.getinfo` - -## Motivation - -Today Luau provides only one method to get the callstack, `debug.traceback`.
This method traverses the entire stack and returns a string containing the call stack details - with no guarantees about the format of the call stack. As a result, the string doesn't present a formal API and can't be parsed programmatically. - -There are a few cases where this can be inconvenient: - -- Sometimes it is useful to pass the resulting call stack to some system expecting a structured input, e.g. for crash aggregation -- Sometimes it is useful to use the information about the caller for logging or filtering purposes; in these cases using just the script name can be useful, and getting script name out of the traceback is slow and imprecise - -Additionally, in some cases instead of getting the information (such as script or function name) from the callstack, it can be useful to get it from a function object for diagnostic purposes. For example, maybe you want to call a callback and if it doesn't return expected results, display a user-friendly error message that contains the function name & script location - these aren't possible today at all. - -## Design - -The proposal is to expose a function from Lua standard library, `debug.getinfo`, to fix this problem - but change the function's signature for efficiency: - -> debug.info([thread], [function | level], options) -> any... 
 - -(note that the function has been renamed to make it more obvious that the behavior differs from that of Lua) - -The parameters of the function match those of Lua's variant - the first argument is either a function object or a stack level (which is a number starting from 1, where 1 means "my caller"), or a thread (followed by the stack level), followed by a string that contains a list of things the result needs to contain: - - * s - function source identifier, in the Roblox environment this is equal to the full name of the script the function is defined in - * l - line number that the function is defined on (when examining a function) or line number of the stack frame (when examining a stack frame) - * n - function name if present; this can be absent for anonymous functions or some C functions that don't have an assigned debug name - * a - function arity information, which refers to the parameter count and whether the function is variadic or not - * f - function object - -Unlike the Lua version, which would use the options given to fill a resulting table (e.g. "l" would map to the "currentline" and "linedefined" fields of the output table), our version will return the requested information in the order that it was requested in the string - all letters specified above map to one extra returned value, except "a", which maps to a pair of a parameter number and a boolean indicating variadic status.
- -For example, here's how you implement a stack trace function: - -``` - for i=1,100 do -- limit at 100 entries for very deep stacks - local source, name, line = debug.info(i, "snl") - if not source then break end - if line >= 0 then - print(string.format("%s(%d): %s", source, line, name or "anonymous")) - else - print(string.format("%s: %s", source, name or "anonymous")) - end - end -``` - -output: - -``` - cs.lua(3): stacktrace - cs.lua(17): bar - cs.lua(13): foo - [C]: pcall - cs.lua(20): anonymous -``` - -When the first argument is a number and the input level is out of bounds, the function returns no values. - -### Why the difference from Lua? - -Lua's variant of this function has the same string as an input and the same thread/function/level combo as arguments before that, but returns a table with the requested data - or nil, when stack is exhausted. - -The problem with this solution is performance. It results in generating excessive garbage by wrapping results in a table, which slows down the function call itself and generates extra garbage that needs to be collected later. This is not a problem for error handling scenarios, but can be an issue when logging is required; for example, `debug.info` with options containing a single result, "s" (mapping to source identifier aka script name), runs 3-4x slower when using a table variant with the current implementation of both functions in our VM. - -While the difference in behavior is unfortunate, note that Lua has a long-standing precedent of using characters in strings to define the set of inputs or outputs for functions; of particular note is string.unpack which closely tracks this proposal where input string characters tell the implementation what data to return. - -### Why not hardcode the options? - -One possibility is that we could return all data associated with the function or a stack frame as a tuple. - -This would work but has issues: - -1. 
Because of the tuple-like API, the code becomes more error-prone and less self-descriptive. -2. Some data is more expensive to access than other data - by forcing all callers to process all possible data we regress in performance; this is also why the original Lua API has an options string - -To appropriately address point 1, unlike in the Lua API, specifying the options string is mandatory in our API. - -### Sandboxing risk? - -Compared to information that you can already parse from the traceback, the only extra data we expose is the function object. This is valuable when collecting stacks because retrieving the function object is faster than retrieving the associated source/name data - for example, a very performant stack tracing implementation could collect data using "fl" (function and line number), and later, when it comes time to display the results, use `debug.info` again with "sn" to get script & name data from the object. - -This technically wasn't possible to get before - this means in particular that if your function is ever called by another function, a malicious script could grab that function object again and call it with different arguments. However, given that it's already possible to mutate the global environment of any function on the callstack using getfenv/setfenv, the extra risk presented here seems minimal.
 - -### Options delta from Lua - -Lua presents the following options in getinfo: - -* `n` selects fields name and namewhat -* `f` selects field func -* `S` selects fields source, short_src, what, and linedefined -* `l` selects field currentline -* `u` selects field nup - -We chose to omit `namewhat` as it's not meaningful in our implementation, omit `what` as it's redundant wrt source/short_src for C functions, replace source/short_src with only a single option (`s`) to avoid leaking script source via callstack API, remove `u` because there are no use cases for knowing the number of upvalues without a debug.getupvalue API, and add `a` which has been requested by the Roact team before for complex backwards compatibility workarounds wrt passed callbacks. - -## Drawbacks - -Having a different way to query debug information from Lua requires language-specific dispatch for code that wants to work on Lua and Luau. - -## Alternatives - -We could expose `debug.getinfo` from Lua as is; the problem is that in addition to performance issues highlighted above, the Luau implementation doesn't track the same data and as such can't provide a fully compatible implementation short of implementing a shim for the sake of compatibility - an option this proposal keeps open. diff --git a/rfcs/function-string-pack-unpack.md b/rfcs/function-string-pack-unpack.md deleted file mode 100644 index 5315f4c3..00000000 --- a/rfcs/function-string-pack-unpack.md +++ /dev/null @@ -1,71 +0,0 @@ -# string.pack/unpack/packsize from Lua 5.3 - -> Note: this RFC was adapted from an internal proposal that predates RFC process - -**Status**: Implemented - -## Summary - -Add string pack/unpack from Lua 5.3 for binary interop, with small tweaks to format specification to make format strings portable. - -## Motivation - -While Luau's dominant use case is game programming, for backend work it's sometimes the case that developers need to work with formats defined outside of Roblox.
When these are structured as JSON, it's easy, but if they are binary, it's not. Additionally for the game programming, often developers end up optimizing their data transmission using custom binary codecs where they know the range of the data (e.g. it's much more efficient to send a number using 1 byte if you know the number is between 0 and 1 and 8 bits is enough, but RemoteEvent/etc won't do it for you because it guarantees lossless roundtrip). For both working with external data and optimizing data transfer, it would be nice to have a way to work with binary data. - -This is doable in Luau using `string.byte`/`string.char`/`bit32` library/etc. but tends to be a bit cumbersome. Lua 5.3 provides functions `string.pack`/`string.unpack`/`string.packsize` that, while not solving 100% of the problems, often make working with binary much easier and much faster. This proposal suggests adding them to Luau - this will both further our goal to be reasonably compatible with latest Lua versions, and make it easier for developers to write some types of code. - -## Design - -Concretely, this proposal suggests adding the following functions: - -``` -string.pack (fmt, v1, v2, ···) -``` - -Returns a binary string containing the values v1, v2, etc. packed (that is, serialized in binary form) according to the format string fmt. - -``` -string.packsize (fmt) -``` - -Returns the size of a string resulting from string.pack with the given format. The format string cannot have the variable-length options 's' or 'z'. - -``` -string.unpack (fmt, s [, pos]) -``` - -Returns the values packed in string s (see string.pack) according to the format string fmt. An optional pos marks where to start reading in s (default is 1). After the read values, this function also returns the index of the first unread byte in s. - -The format string is a sequence of characters that define the data layout that is described here in full: https://www.lua.org/manual/5.3/manual.html#6.4.2. 
We will adopt this wholesale, but we will guarantee that the resulting code is cross-platform by: - -a) Ensuring native endian is little endian (de-facto true for all our platforms) -b) Fixing sizes of native formats to 2b short, 4b int, 8b long -c) Treating `size_t` in context of `T` and `s` formats as a 32-bit integer - -Of course, the functions are memory-safe; if the input string is too short to provide all relevant data they will fail with "data string is too short" error. - -This may seem slightly unconventional but it's very powerful and expressive, in much the same way format strings and regular expressions are :) Here's a basic example of how you might transmit a 3-component vector with this: - -``` --- returns a 24-byte string with 64-bit double encoded three times, similar to how we'd replicate 3 raw numbers -string.pack("ddd", x, y, z) - --- returns a 12-byte string with 32-bit float encoded three times, similar to how we'd replicate Vector3 -string.pack("fff", x, y, z) - --- returns a 3-byte string with each value stored in 8 bits --- assumes -1..1 range; this code doesn't round the right way because I'm too lazy -string.pack("bbb", x * 127, y * 127, z * 127) -``` - -The unpacking of the data is symmetrical - using the same format string and `string.unpack` you get the encoded data back. - -## Drawbacks - -The format specification is somewhat arbitrary and is likely to be unfamiliar to people who come with prior experience in other languages (having said that, this feature closely follows equivalent functionality from Ruby). - -The implementation of string pack/unpack requires yet another format string matcher, which increases complexity of the builtin libraries and static analysis (since we need to provide linting for another format string syntax). - -## Alternatives - -We could force developers to rely on existing functionality for string packing; it is possible to replicate this proposal in a library, although at a much reduced performance. 
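The 3-byte vector example in the Design section notes that it doesn't round correctly. A minimal sketch of a round-to-nearest variant, together with the symmetric decoding step, might look like the following. `encodeUnitVector`, `decodeUnitVector` and `quantize` are hypothetical helper names, and the code assumes every component is already clamped to [-1, 1], since values outside -128..127 can't be represented by the signed-byte `b` format.

```lua
-- Hypothetical helpers: encode a unit-range 3-vector into 3 signed bytes and back.
-- Assumes every component is already clamped to [-1, 1].

-- Round to nearest (halves round up) instead of truncating.
local function quantize(v)
  return math.floor(v * 127 + 0.5)
end

local function encodeUnitVector(x, y, z)
  -- "bbb": three signed 8-bit integers, 3 bytes in total
  return string.pack("bbb", quantize(x), quantize(y), quantize(z))
end

local function decodeUnitVector(s)
  -- string.unpack also returns the next read position; it is ignored here
  local qx, qy, qz = string.unpack("bbb", s)
  return qx / 127, qy / 127, qz / 127
end
```

Because both directions use the same format string, decoding mirrors encoding exactly; only the quantization step loses precision (at most half a step, i.e. 0.5/127 per component).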
 diff --git a/rfcs/function-table-clear.md b/rfcs/function-table-clear.md deleted file mode 100644 index 92279928..00000000 --- a/rfcs/function-table-clear.md +++ /dev/null @@ -1,21 +0,0 @@ -# table.clear - -> Note: this RFC was adapted from an internal proposal that predates RFC process and as such doesn't follow the template precisely - -**Status**: Implemented - -## Summary - -Add `table.clear` function that removes all elements from the table but keeps internal capacity allocated. - -## Design - -`table.clear` adds a fast way to clear a Lua table. This is effectively a sister function to `table.create()`, only for reclaiming an existing table's memory rather than pre-allocating a new one. Use cases: - -* Often you want to recalculate a set or map data structure based on a table. Currently there is no good way to do this; the fastest way is simply to throw away the old table and construct a new empty one to work with. This is wasteful since often the new structure will take a similar amount of memory to the old one. - -* Sometimes you have a shared table which multiple scripts access. In order to clear this kind of table, you have no other option than to use a slow for loop setting each index to nil. - -These use cases can technically be accomplished via `table.move` moving from an empty table to the table which is to be edited, but I feel that they are frequent enough to warrant a clearer, more understandable method which has an opportunity to be more efficient. - -Like `table.move`, `table.clear` does not invoke any metamethods. A manual clearing loop wouldn't invoke them either, given that assigning nil to an existing key never invokes a metamethod. diff --git a/rfcs/function-table-clone.md b/rfcs/function-table-clone.md deleted file mode 100644 index 8cb97984..00000000 --- a/rfcs/function-table-clone.md +++ /dev/null @@ -1,66 +0,0 @@ -# table.clone - -**Status**: Implemented - -## Summary - -Add `table.clone` function that, given a table, produces a copy of that table with the same keys/values/metatable.
- -## Motivation - -There are multiple cases today when cloning tables is a useful operation. - -- When working with tables as data containers, some algorithms may require modifying the table that can't be done in place for some reason. -- When working with tables as objects, it can be useful to obtain an identical copy of the object for further modification, preserving the metatable. -- When working with immutable data structures, any modification needs to clone some parts of the data structure to produce a new version of the object. - -While it's possible to implement this function in user code today, it's impossible to implement it with maximum efficiency; furthermore, cloning is a reasonably fundamental -operation so from the ergonomics perspective it can be expected to be provided by the standard library. - -## Design - -`table.clone(t)` takes a table, `t`, and returns a new table that: - -- has the same metatable -- has the same keys and values -- is not frozen, even if `t` was - -The copy is shallow: implementing a deep recursive copy automatically is challenging (for similar reasons why we decided to avoid this in `table.freeze`), and often only certain keys need to be cloned recursively which can be done after the initial clone. - -The table can be modified after cloning; as such, functions that compute a slightly modified copy of the table can be easily built on top of `table.clone`. - -`table.clone(t)` is functionally equivalent to the following code, but it's more ergonomic (on the account of being built-in) and significantly faster: - -```lua -assert(type(t) == "table") -local nt = {} -for k,v in pairs(t) do - nt[k] = v -end -if type(getmetatable(t)) == "table" then - setmetatable(nt, getmetatable(t)) -end -``` - -The reason why `table.clone` can be dramatically more efficient is that it can directly copy the internal structure, preserving capacity and exact key order, and is thus -limited purely by memory bandwidth. 
In comparison, the code above can't predict the table size ahead of time, has to recreate the internal table structure one key at a time, -and bears the interpreter overhead (which can be avoided for numeric keys with `table.move` but that doesn't work for the general case of dictionaries). - -Out of an abundance of caution, `table.clone` will fail to clone the table if it has a protected metatable. This is motivated by the fact that you can't do this today, so -there are no new potential vectors to escape various sandboxes. Superficially it seems like it's probably reasonable to allow cloning tables with protected metatables, but -there may be cases where code manufactures tables with unique protected metatables expecting a 1-1 relationship and cloning would break that, so for now this RFC proposes a more -conservative route. We are likely to relax this restriction in the future. - -## Drawbacks - -Adding a new function to the `table` library theoretically increases complexity. In practice, though, we already effectively implement `table.clone` internally for some VM optimizations, so exposing this to the users bears no cost. - -Assigning a type to this function is a little difficult if we want to enforce the "argument must be a table" constraint. It's likely that we'll need to type this as `table.clone(T): T` for the time being, which is less precise. - -## Alternatives - -We can implement something similar to `Object.assign` from JavaScript instead, which simultaneously assigns extra keys. However, this won't be fundamentally more efficient than -assigning the keys afterwards, and can be implemented in user space. Additionally, we can later extend `clone` with an extra argument if we so choose, so this proposal is the -minimal viable one. - -We can immediately remove the rule wrt protected metatables, as it's not clear that it's actually problematic to be able to clone tables with protected metatables.
 diff --git a/rfcs/function-table-create-find.md b/rfcs/function-table-create-find.md deleted file mode 100644 index 671e16af..00000000 --- a/rfcs/function-table-create-find.md +++ /dev/null @@ -1,28 +0,0 @@ -# table.create and table.find - -> Note: this RFC was adapted from an internal proposal that predates RFC process and as such doesn't follow the template precisely - -**Status**: Implemented - -## Design - -This proposal suggests adding two new builtin table functions: - -`table.create(count, value)`: Creates an array with count values, initialized to value. This can be useful to preallocate large tables, since repeatedly appending elements to a table causes it to be reallocated repeatedly. count is converted to an integer using standard conversion/coercion rules (strings are converted to doubles, doubles are converted to integers using truncation). Negative counts result in the function failing. Positive counts that are too large and would cause a heap allocation error also result in the function failing. When value is nil or omitted, the table is preallocated without storing anything in it - this is roughly equivalent to creating a large table literal filled with `nil`, or preallocating a table by assigning a sufficiently large numeric index to a value and then erasing it by reassigning it to nil. - -`table.find(table, value [, init])`: Looks for value in the array part of the table; returns the index of the first occurrence, or nil if value is not found. Comparison is performed using standard equality (non-raw) to make sure that objects like Vector3 etc. can be found. The first nil value in the array part of the table terminates the traversal. init is an optional numeric index where the search starts and it defaults to 1; this can be useful to go through repeat occurrences. - -`table.create` cannot be replicated efficiently in Lua at all; `table.find` is provided as a faster and more convenient option compared to the equivalent Lua code below.
 - -`table.find` is roughly equivalent to the following code, modulo semantic oddities with #t and performance: - -``` -function find(table, value, init) - for i=init or 1, #table do - if rawget(table, i) == value then - return i - end - end - return nil -end -``` diff --git a/rfcs/function-table-freeze.md b/rfcs/function-table-freeze.md deleted file mode 100644 index ca819882..00000000 --- a/rfcs/function-table-freeze.md +++ /dev/null @@ -1,55 +0,0 @@ -# table.freeze - -> Note: this RFC was adapted from an internal proposal that predates RFC process - -**Status**: Implemented - -## Summary - -Add `table.freeze`, which makes a table read-only in a shallow way. - -## Motivation - -Lua tables by default are freely modifiable in every possible way: you can add new fields, change values for existing fields, or set or unset the metatable. - -Today it is possible to customize the behavior for *adding* new fields by setting a metatable that overrides `__newindex` (including setting `__newindex` to a function that always errors to prohibit additions of new fields). - -Today it is also possible to customize the behavior of setmetatable by "locking" the metatable - this can be achieved by setting the `__metatable` field of the metatable to something, which would block setmetatable from functioning and force getmetatable to return the provided value. With this it's possible to prohibit customizations of a table's behavior, but existing fields can still be assigned to. - -To make an existing table read-only, one needs to combine these mechanisms by creating a new table with a locked metatable, which has an `__index` function pointing to the old table. However, iteration and the length operator then don't work on the resulting table, and this carries a performance cost - both for creating the table, and for repeated property access.
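For comparison, the proxy-based workaround described above can be sketched as follows; `readonlyProxy` is a hypothetical helper name, and `__index` is set directly to the original table, which is the simplest form of that forwarding. Reads pass through, writes are rejected by `__newindex`, and the `__metatable` field locks the metatable. As the text points out, the length operator and raw iteration only see the empty proxy.

```lua
-- Hypothetical helper: a read-only view of t built from metatable mechanisms only.
local function readonlyProxy(t)
  return setmetatable({}, {
    __index = t, -- forward every read to the original table
    __newindex = function()
      error("attempt to modify a read-only table", 2)
    end,
    __metatable = "locked", -- getmetatable returns this; setmetatable fails
  })
end
```

Every read goes through an extra `__index` lookup and every proxy is a separate allocation, which is the performance cost the Motivation mentions.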
 - -## Design - -This proposal formalizes the notion of "read-only" tables by providing two new table functions: - -- `table.freeze(t)`: given a non-frozen table t, freezes it; fails when t is not a table or is already frozen. Returns t. -- `table.isfrozen(t)`: given a table t, returns a boolean indicating the frozen status; fails when t is not a table. - -When a table is frozen, the following is true: - -- Attempts to modify the existing keys of the table fail (regardless of how they are performed - via table assignments, rawset, or any other methods like table.sort) -- Attempts to add new keys to the table fail, unless `__newindex` is defined on the metatable (in which case the assignment is routed through `__newindex` as usual) -- Attempts to change the metatable of the table fail -- Reading the table fields or iterating through the table proceeds as usual - -This feature is useful for two reasons: - -a) It provides an easier way to expose sandboxed objects that can't be monkey-patched, for security reasons. We actually already have support for freezing and use it internally on various builtin tables like `math`; we just don't expose it to Lua. - -b) It provides an easier way to expose immutable objects for consistency/correctness reasons. For example, the Cryo library provides an implementation of immutable data structures; with this functionality, it's possible to implement a lighter-weight library by, for example, extending a table with methods to return mutated versions of the table, but retaining the usual table interface. - -To limit the use of `table.freeze` to cases when table contents can be freely manipulated, `table.freeze` shall fail when the table has a locked metatable (but will succeed if the metatable isn't locked).
- -## Drawbacks - -Exposing the internal "readonly" feature may have an impact on interoperability between scripts - for example, it becomes possible to freeze some tables that scripts may be expecting to have write access to from other scripts. Since we don't provide a way to unfreeze tables and freezing a table with a locked metatable fails, in theory the impact should not be any worse than allowing to change a metatable, but the full extents are unclear. - -There may be existing code in the VM that allows changing frozen tables in ways that are benign to the current sandboxing code, but expose a "gap" in the implementation that becomes significant with this feature; thus we would need to audit all table writes when implementing this. - -## Alternatives - -We've considered exposing a recursive freeze. The correct generic implementation is challenging since it requires supporting infinitely nested tables when working on the C stack (or a stackless implementation that requires heap allocation); also, to handle self-recursive tables requires a separate temporary tracking table since stopping the traversal at frozen sub-tables is insufficient as their children may not have been frozen. As such, we leave recursive implementation to user code. - -We've considered exposing thawing. The problem with this is that freezing is required for sandboxing, and as such we'd need to support "permafrozen" status that is separate from "frozen". This complicates implementation and we didn't find compelling use cases for thawing - if it becomes necessary we can always expose it separately. - -We've considered calling this "locking", but the term has connotations coming from multithreading that aren't applicable here, and in absence of unlocking, "locking" makes a bit less sense. 
diff --git a/rfcs/generalized-iteration.md b/rfcs/generalized-iteration.md deleted file mode 100644 index c28156ff..00000000 --- a/rfcs/generalized-iteration.md +++ /dev/null @@ -1,126 +0,0 @@ -# Generalized iteration - -**Status**: Implemented - -## Summary - -Introduce support for iterating over tables without using `pairs`/`ipairs` as well as a generic customization point for iteration via `__iter` metamethod. - -## Motivation - -Today there are many different ways to iterate through various containers that are syntactically incompatible. - -To iterate over arrays, you need to use `ipairs`: `for i, v in ipairs(t) do`. The traversal goes over a sequence `1..k` of numeric keys until `t[k] == nil`, preserving order. - -To iterate over dictionaries, you need to use `pairs`: `for k, v in pairs(t) do`. The traversal goes over all keys, numeric and otherwise, but doesn't guarantee an order; when iterating over arrays this may happen to work but is not guaranteed to work, as it depends on how keys are distributed between array and hash portion. - -To iterate over custom objects, whether they are represented as tables (user-specified) or userdata (host-specified), you need to expose special iteration methods, for example `for k, v in obj:Iterator() do`. - -All of these rely on the standard Lua iteration protocol, but it's impossible to trigger them in a generic fashion. Additionally, you *must* use one of `pairs`/`ipairs`/`next` to iterate over tables, which is easy to forget - a naive `for k, v in tab do` doesn't work and produces a hard-to-understand error `attempt to call a table value`. - -This proposal solves all of these by providing a way to implement uniform iteration with self-iterating objects by allowing to iterate over objects and tables directly via convenient `for k, v in obj do` syntax, and specifies the default iteration behavior for tables, thus mostly rendering `pairs`/`ipairs` obsolete - making Luau easier to use and teach. 
- -## Design - -In Lua, `for vars in iter do` has the following semantics (otherwise known as the iteration protocol): `iter` is expanded into three variables, `gen`, `state` and `index` (using `nil` if `iter` evaluates to fewer than 3 results); after this the loop is converted to the following pseudocode: - -```lua -while true do - vars... = gen(state, index) - index = vars... -- copy the first variable into the index - if index == nil then break end - - -- loop body goes here -end -``` - -This is a general mechanism that can support iteration through many containers, especially if `gen` is allowed to mutate state. Importantly, the *first* returned variable (which is exposed to the user) is used to continue the process on the next iteration - this can be limiting because it may require `gen` or `state` to carry extra internal iteration data for efficiency. To work around this for table iteration to avoid repeated calls to `next`, Luau compiler produces a special instruction sequence that recognizes `pairs`/`ipairs` iterators and stores the iteration index separately. - -Thus, today the loop `for k, v in tab do` effectively executes `k, v = tab()` on the first iteration, which is why it yields `attempt to call a table value`. If the object defines `__call` metamethod then it can act as a self-iterating method, but this is not idiomatic, not efficient and not pure/clean. - -This proposal comes in two parts: general support for `__iter` metamethod and default implementation for tables without one. With both of these in place, there's going to be a single, idiomatic, general and performant way to iterate through the object of any type: - -```lua -for k, v in obj do -... 
-end -``` - -### __iter - -To support self-iterating objects, we modify the iteration protocol as follows: instead of simply expanding the result of expression `iter` into three variables (`gen`, `state` and `index`), we check if the first result has an `__iter` metamethod (which can be the case if it's a table, userdata or another composite object, e.g. a record in the future). If it does, the metamethod is called with `gen` as the first argument, and the three returned values replace `gen`/`state`/`index`. This happens *before* the loop: - -```lua -local genmt = rawgetmetatable(gen) -- pseudocode for getmetatable that bypasses __metatable -local iterf = genmt and rawget(genmt, "__iter") -if iterf then - gen, state, index = iterf(gen) -end -``` - -This check is comparatively trivial: usually `gen` is a function, and functions don't have metatables; as such we can simply check the type of `gen`, and if it's a table/userdata, check if it has an `__iter` metamethod. Due to the tag-method cache, this check is also very cheap if the metamethod is absent. - -This allows objects to provide a custom function that guides the iteration. Since the function is called once, it is easy to reuse other functions in the implementation; for example, here's a node object that exposes iteration through its children: - -```lua -local Node = {} -Node.__index = Node - -function Node.new(children) - return setmetatable({ children = children }, Node) -end - -function Node:__iter() - return next, self.children -end -``` - -The Luau compiler already emits a bytecode instruction, FORGPREP*, to perform initial loop setup - this is where we can evaluate `__iter` as well. - -Naturally, this means that if the table has an `__iter` metamethod and you need to iterate through the table fields instead of using the provided metamethod, you can't rely on the general iteration scheme and need to use `pairs`.
This is similar to other parts of the language, like `t[k]` vs `rawget(t, k)`, where the default behavior is overridable but a library function can help peek behind the curtain. - -### Default table iteration - -If the argument is a table and it does not implement the `__iter` metamethod, we treat this as an attempt to iterate through the table using the builtin iteration order. - -> Note: we also check if the table implements `__call`; if it does, we fall back to the existing behavior of calling the table on every iteration. We may be able to remove this check in the future, but we will need it initially to preserve backwards compatibility with custom table-driven iterator objects that implement `__call`. In either case, we will be able to collect detailed analytics about the use of `__call` in iteration, and if neither is present we can emit a specialized error message such as `object X is not iterable`. - -To have a single, unified iteration scheme over tables regardless of whether they are arrays or dictionaries, we establish the following semantics: - -- First, the traversal goes over numeric keys `1, 2, ...` in order, stopping at the first `k` such that `t[k] == nil` -- Then, the traversal goes over the remaining keys (with non-nil values), numeric and otherwise, in unspecified order. - -For arrays with gaps, this iterates in order until the first gap, and the remaining order is not specified. - -> Note: This behavior is similar to what `pairs` happens to provide today, but `pairs` doesn't give any guarantees, and it doesn't always provide this behavior in practice. - -To ensure that this traversal is performant, the actual implementation of the traversal involves going over the array part (in index order) and then over the hash part (in hash order). For that implementation to satisfy the criteria above, we need to make two additional changes to table insertion/rehash: - -- When inserting key `k` in the table when `k == t->sizearray + 1`, we force the table to rehash (resize its array portion).
Today this is only performed if the hash portion is full, so numeric keys can sometimes end up in the hash part. -- When rehashing the table, we ensure that the hash part doesn't contain the key `newsizearray + 1`. This requires checking if the table has this key, which may require an additional hash lookup, but we only need to do this in rare cases based on the analysis of power-of-two key buckets that we already collect during rehash. - -These changes guarantee that the order observed via standard traversal with `next`/`pairs` matches the guarantee above, which is nice because it means we can minimize the complexity cost of this change by reusing the traversal code, including VM optimizations. They also mean that the array boundary (aka `#t`) can *always* be computed from just the array portion, which simplifies the table length computation and may slightly speed it up. - -## Drawbacks - -This makes `for` desugaring and implementation a little more complicated; it's not a large complexity factor in Luau because we already have special handling for `for` loops in the VM, but it's something to keep in mind. - -While the proposed iteration scheme should be a superset of both `pairs` and `ipairs` for tables, for arrays `ipairs` may in some cases be faster because it stops at the first `nil`, whereas the proposed new scheme (like `pairs`) needs to iterate through the rest of the table's array storage. This may be fixable in the future if we replace our cached table length (`aboundary`) with Lua 5.4's `alimit`, which maintains the invariant that all values after `alimit` in the array are `nil`. This would make default table iteration maximally performant as well as help us accelerate GC in some cases, but it will require extra checks during table assignments, which is a cost we may not be willing to pay. Thus it is theoretically possible that we will end up with `ipairs` being a slightly faster equivalent for array iteration forever.
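The loop setup described in the Design section can be emulated on a stock Lua 5.1+ interpreter, which may help when reasoning about the drawbacks above. The block below is a minimal sketch under stated assumptions, not the Luau implementation: `iterate` and `defaultIter` are hypothetical helper names, only tables (not userdata) are handled, and `getmetatable` stands in for the raw metatable lookup used in the RFC pseudocode. Its second phase also makes the cost discussed above visible: after the `1..k` prefix, default iteration keeps walking every remaining key, whereas `ipairs` stops at the first `nil`.

```lua
-- Hypothetical emulation of the proposed loop setup on stock Lua 5.1+.
-- `iterate` and `defaultIter` are illustrative names, not Luau APIs.

-- Default table order: keys 1, 2, ... in order until the first nil,
-- then every remaining key in unspecified (`next`) order.
local function defaultIter(t)
  local i = 0     -- last index yielded by the 1..k prefix
  local n = nil   -- prefix length, known once the first nil is reached
  local key = nil -- `next` cursor for the second phase
  return function()
    if n == nil then
      local v = rawget(t, i + 1)
      if v ~= nil then
        i = i + 1
        return i, v
      end
      n = i
    end
    while true do
      local v
      key, v = next(t, key)
      if key == nil then
        return nil
      end
      -- skip integer keys already produced by the prefix phase
      local inPrefix = type(key) == "number" and key % 1 == 0
        and key >= 1 and key <= n
      if not inPrefix then
        return key, v
      end
    end
  end
end

-- Runs once before the loop and returns the gen/state/index triple.
local function iterate(gen, state, index)
  if type(gen) == "table" then
    -- getmetatable stands in for the raw lookup in the RFC pseudocode
    -- (unlike it, getmetatable honours a __metatable field)
    local mt = getmetatable(gen)
    local iterf = type(mt) == "table" and rawget(mt, "__iter")
    if iterf then
      return iterf(gen)
    end
    -- tables with __call keep the old behaviour: the table is the generator
    if not (type(mt) == "table" and rawget(mt, "__call")) then
      return defaultIter(gen)
    end
  end
  return gen, state, index
end
```

Here `for k, v in iterate(obj) do ... end` stands in for the proposed `for k, v in obj do ... end`; ordinary iterator triples such as `iterate(pairs(t))` pass through unchanged.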
- -The resulting iteration behavior, while powerful, increases the divergence between Luau and Lua, meaning that more programs written for Luau will not run in Lua. The Luau language in general does not consider this type of compatibility essential, but this is noted for posterity. - -The changes in insertion behavior that facilitate a single iteration order may have a small cost; that said, they are currently understood to belong to paths that are already slow, and the added cost is minimal. - -The extra semantics will make inferring the types of the variables in a `for` loop more difficult, though this is probably not a problem if we know the type of the expression that is being iterated through. - -## Alternatives - -Other major designs have been considered. - -A minor variation of the proposal involves having `__iter` be called on every iteration instead of at loop startup, effectively having `__iter` work as an alternative to `__call`. The issue with this variant is that while it's a little simpler to specify and implement, it restricts the options when implementing custom iterable objects: since more than one concurrent iteration needs to be supported, `__iter` can't store iteration state on the object itself, so it would effectively need to be pure, which makes it difficult for iterable objects to keep custom iteration state anywhere else. - -A major variation of the proposal involves instead supporting `__pairs` from Lua 5.2. The issue with this variant is that it still requires the use of a library method, `pairs`, to work, which doesn't make the language simpler as far as table iteration (the 95% case) is concerned. Additionally, with some rare exceptions, metamethods today extend the *language* behavior, not the *library* behavior, and extending extra library functions with metamethods does not seem true to the core of the language. Finally, this only works if the user uses `pairs` to iterate and doesn't work with `ipairs`/`next`.
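The case for calling `__iter` once per loop rather than on every step can be made concrete with a small sketch. The `Range` object below is hypothetical: its `__iter` returns a fresh closure on each call, so every loop owns its own cursor and nothing is written to the object, which is what lets nested loops over the same object work. Stock Lua has no `__iter` support, so the sketch invokes the metamethod by hand, standing in for the single call that the proposed loop setup would make.

```lua
-- Hypothetical Range object: iterating it yields 1..n.
local Range = {}
Range.__index = Range

function Range.new(n)
  return setmetatable({ n = n }, Range)
end

-- Called once per loop; the cursor `i` lives in the returned closure,
-- not on the object, so concurrent loops do not interfere.
function Range:__iter()
  local i = 0
  return function()
    if i < self.n then
      i = i + 1
      return i
    end
  end
end

-- In Luau this would be `for a in r do`; here __iter is invoked by hand.
local r = Range.new(2)
local seen = {}
for a in Range.__iter(r) do
  for b in Range.__iter(r) do
    seen[#seen + 1] = a .. "," .. b
  end
end
-- seen now holds "1,1", "1,2", "2,1", "2,2"
```

Under the per-step variant there is no such per-loop closure, so the cursor would have to be recovered from the `state`/`index` arguments alone.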
- -Another variation involves using a new pseudo-keyword, `foreach`, instead of overloading existing `for`, and only using the new `__iter` semantics there. This can more cleanly separate behavior, requiring the object to have an `__iter` metamethod (or be a table) in `foreach` - which also avoids having to deal with `__call` - but it also requires teaching the users a new keyword which fragments the iteration space a little bit more. Compared to that, the main proposal doesn't introduce new divergent syntax, and merely tweaks existing behavior to be more general, thus making an existing construct easier to use. - -Finally, the author also considered and rejected extending the iteration protocol as part of this change. One problem with the current protocol is that the iterator requires an allocation (per loop execution) to keep extra state that isn't exposed to the user. The builtin iterators like `pairs`/`ipairs` work around this by feeding the user-visible index back to the search function, but that's not always practical. That said, having a different iteration protocol in effect only when `__iter` is used makes the language more complicated for unclear efficiency gains, thus this design doesn't suggest a new core protocol to favor simplicity. diff --git a/rfcs/generic-function-subtyping.md b/rfcs/generic-function-subtyping.md deleted file mode 100644 index b9c0c430..00000000 --- a/rfcs/generic-function-subtyping.md +++ /dev/null @@ -1,211 +0,0 @@ -# Expanded Subtyping for Generic Function Types - -## Summary - -Extend the subtyping relation for function types to relate generic function -types with compatible instantiated function types. - -## Motivation - -As Luau does not have an explicit syntax for instantiation, there are a number -of places where the typechecker will automatically perform instantiation with -the goal of permitting more programs. 
These instances of instantiation are -ad-hoc and strategic, but useful in practice for permitting programs such as: - -```lua -function id(x: T): T - return x -end - -local idNum : (number) -> number -idNum = id -- ok -``` - -However, they have also been a source of some typechecking bugs because of how -they actually make a determination as to whether the instantation should happen, -and they currently open up some potential soundness holes when instantiating -functions in table types since properties of tables are mutable and thus need to -be invariant (which the automatic-instantiation potentially masks). - -## Design - -The goal then is to rework subtyping to support the relationship we want in the -first place: allowing polymorphic functions to be used where instantiated -functions are expected. In particular, this means adding instantiation itself to -the subtyping relation. Formally, that'd look something like: - -``` -instantiate((T1) -> T2) = (T1') -> T2' -(T1') -> T2' <: (T3) -> T4 --------------------------------------------- -(T1) -> T2 <: (T3) -> T4) -``` - -Or informally, we'd say that a generic function type is a subtype of another -function type if we can instantiate it and show that instantiated function type -to be a subtype of the original function type. Implementation-wise, this loose -formal rule suggests a strategy of when we'll want to apply instantiation. -Namely, whenever the subtype and supertype are both functions with the potential -subtype having some generic parameters and the supertype having none. So, if we -look once again at our simple example from motivation, we can walk through how -we expect it to type check: - -```lua -function id(x: T): T - return x -end - -local idNum : (number) -> number -idNum = id -- ok -``` - -First, `id` is given the type `(T) -> T` and `idNum` is given the type -`(number) -> number`. 
When we actually perform the assignment, we must show that -the type of the right-hand side is compatible with the type of the left-hand -side according to subtyping. That is, we'll ask if `<T>(T) -> T` is a subtype of -`(number) -> number`, which matches the rule to apply instantiation since the -would-be subtype has a generic parameter while the would-be supertype has no -generic parameters. This contrasts with the current implementation which, before -asking the subtyping question, checks if the type of the right-hand side -contains any generics at any point and if the type of the left-hand side cannot -_possibly_ contain generics, and instantiates the right-hand side if so. - -Adding instantiation to subtyping does still pose some additional questions -about when exactly to instantiate. Namely, we need to consider cases like -function application. We can see why by looking at some examples: - -```lua -function rank2(f: <a>(a) -> a): (number) -> number - return f -end -``` - -In this case, we expect to allow the instantiation of `f` from `<a>(a) -> a` to -`(number) -> number`. After all, we can consider other cases like where the body -instead applies `f` to some particular value, e.g. `f(42)`, and we'd want the -instantiation to be allowed there. However, this means we'd potentially run into -issues if we allowed call sites to `rank2` to pass in non-polymorphic functions. -A naive approach to implementing this proposal would do exactly that because we -currently treat contravariant subtyping positions (i.e. for the arguments of -functions) as being the same as our normal (i.e. covariant) subtyping relation -but with the arguments reversed. So, to type check an application like -`rank2(function(str: string) return str .. "s" end)` (where the function argument -is of type `(string) -> string`), we would ask if `<a>(a) -> a` is a subtype of -`(string) -> string`.
This is precisely the question we asked in the original -example, but in the contravariant context, this is actually unsound since -`rank2` would then function as a general coercion from, e.g., -`(string) -> string` to `(number) -> number`. - -This sort of behavior does come up in other languages that mix polymorphism and -subtyping. If we consider the same example in F#, we can compare its behavior: - -```fsharp -let ranktwo (f : 'a -> 'a) : int -> int = f -let pluralize (s : string) : string = s + "s" -let x = ranktwo pluralize -``` - -For this example, F# produces one warning and one error. The warning is applied -to the function definition of `ranktwo` itself (coded `FS0064`), and says "This -construct causes code to be less generic than indicated by the type annotations. -The type variable 'a has been constrained to be type 'int'." This warning -highlights the actual difference between our example in Luau and the F# -translation. In F#, `'a` is really a free type variable rather than a generic -type parameter of the function `ranktwo`; as such, this code actually -constrains the type of `ranktwo` to be `(int -> int) -> (int -> int)`. Consequently, -the application on line 3 errors because our `(string -> string)` function is -simply not compatible with that type. With higher-rank polymorphic function -parameters, it doesn't make sense to warn on their instantiation (as illustrated -by the example of actually applying `f` to some particular data in the -definition of `rank2`), but it's still just as problematic if we were to accept -instantiated functions at polymorphic types. Thus, we must ensure that -subtyping only instantiates in covariant contexts. - -It may also be helpful to consider an example of rank-1 polymorphism to -understand the full scope of the behavior.
So, we can look at what happens if we -simply move the type parameter out in our working example: - -```lua -function rank1<a>(f: (a) -> a): (number) -> number - return f -end -``` - -In this case, we expect an error to occur because the type of `f` depends on -what we instantiate `rank1` with. If we allowed this, it would naturally be -unsound because we could again provide a `(string) -> string` argument (by -instantiating `a` with `string`). This reinforces the idea that the presence of -the generic type parameter is likely to be a good option for determining -instantiation (at least when compared to the presence of free type variables). - -## Drawbacks - -One of the aims of this proposal is to provide a clear and predictable mental -model of when instantiation will take place in Luau. The author feels this -proposal is a step forward compared to the existing ad-hoc usage of instantiation -in the typechecker, but it's possible that programmers are already comfortable -with the mental model they have built for the existing implementation. -Hopefully, this is mitigated by the fact that the new setup should allow all of -the _sound_ uses of instantiation permitted by the existing system. Notably, -however, programmers may be surprised by the added restriction when it comes to -properties in tables. In particular, we can consider a small variation of our -original example with identity functions: - -```lua -function id<T>(x: T): T - return x -end - -local poly : { id : <a>(a) -> a } = { id = id } - -local mono : { id : (number) -> number } -mono = poly -- error! -mono.id = id -- also an error! -``` - -In this case, the fact that we're dealing with a _property_ of a table type -means that we're in a context that needs to be invariant (i.e. not allow -subtyping) to avoid unsoundness caused by interactions between mutable -references and polymorphism (see things like the [value -restriction in OCaml][value-restriction] to understand why).
In most cases, we -believe programmers will be using functions in tables as an implementation of -methods for objects, so we don't anticipate that they'll actually _want_ to do -the unsound thing here. The accepted RFC for [read-only -properties][read-only-props] gives us a technically-precise solution since -read-only properties would be free to be typechecked as a covariant context -(since they disallow mutation), and thus if the property `id` was marked -read-only, we'd be able to do both of the assignments in the above example. - -## Alternatives - -The main alternatives would likely be keeping the existing solution (and -likely having to tactically fix future bugs where instantiation either happens -too much or not enough), or removing automatic instantiation altogether in favor -of manual instantiation syntax. The former solution (changing nothing) is cheap -now (both in terms of runtime performance and also development cost), but the -existing implementation involves extra walks of both types to make a decision -about whether or not to perform instantiation. To minimize the performance -impact, the functions that perform these questions (`isGeneric` and -`maybeGeneric`) actually do not perform a full walk, and instead try to -strategically look at only enough to make the decision. We already found and -fixed one bug that was caused by these functions being too imprecise against -their spec, but fleshing them out entirely could potentially be a noticeable -performance regression since the decision to potentially instantiate is one that -comes up often. 
- -Removing automatic instantiation altogether, by contrast, will definitely be -"correct" in that we'll never instantiate in the wrong spot and programmers will -always have the ability to instantiate, but it would be a marked regression in -developer experience since it would increase the annotation burden considerably -and generally runs counter to the overall design strategy of Luau (which focuses -heavily on type inference). It would also require us to actually pick a syntax -for manual instantiation (which we are still open to doing in the future if we -maintain an automatic instantiation solution), which is fraught with parser -ambiguity issues or requires the introduction of a sigil like Rust's turbofish -for instantiation. Discussion of that syntax is present in the [generic -functions][generic-functions] RFC. - -[value-restriction]: https://stackoverflow.com/questions/22507448/the-value-restriction#22507665 -[read-only-props]: https://github.com/Roblox/luau/blob/master/rfcs/property-readonly.md -[generic-functions]: https://github.com/Roblox/luau/blob/master/rfcs/generic-functions.md diff --git a/rfcs/generic-functions.md b/rfcs/generic-functions.md deleted file mode 100644 index 3ac1bbba..00000000 --- a/rfcs/generic-functions.md +++ /dev/null @@ -1,155 +0,0 @@ -# Generic functions - -**Status**: Implemented - -## Summary - -Extend the syntax and semantics of functions to support explicit generic functions, which can bind type parameters as well as data parameters. - -## Motivation - -Currently Luau allows generic functions to be inferred but not given explicit type annotations. For example - -```lua -function id(x) return x end -local x: string = id("hi") -local y: number = id(37) -``` - -is fine, but there is no way for a user to write the type of `id`. - -## Design - -Allow functions to take type parameters as well as function parameters, similar to Java/TypeScript/...
- -```lua -function id<a>(x : a) : a return x end -``` - -Functions may also take generic type pack arguments for varargs, for instance: - -```lua -function compose<a...>(... : a...) : (a...) return ... end -``` - -Generic type and type pack parameters can also be used in function types, for instance: - -```lua -local id: <a>(a)->a = function(x) return x end -``` - -This change is *not* only syntax, as explicit type parameters need to be part of the semantics of types. For example, we can define a generic identity function - -```lua -local function id(x) return x end -local x: string = id("hi") -local y: number = id(37) -type Id = typeof(id) -``` - -and two functions - -```lua -function f() - return id -end -function g() - local y - local function oh(x) - if not(y) then y = x end - return y - end - return oh -end -``` - -The types of these functions are - -```lua - f : () -> <a>(a) -> a - g : <a>() -> (a) -> a -``` - -so this is okay: - -```lua - local i: Id = f() - local x: string = i("hi") - local y: number = i(37) -``` - -but this is not: - -```lua - -- This assignment shouldn't typecheck! - local i: Id = g() - local x: string = i("hi") - -- This is unsound, since it assigns a string to a variable of type number - local y: number = i(37) -``` - -Currently, Luau does not have explicit type binders, so `f` and `g` have the same type. We propose making type binders part of the semantics of types as well as their syntax (so `f` and `g` have different types, and the unsound example does not typecheck). - -We propose supporting type parameters which can be instantiated with any type (jargon: Rank-N Types) but not type functions (jargon: Higher Kinded Types) or types with constraints (jargon: F-bounded polymorphism). - -## Turbofish - -Note that this RFC proposes a syntax for adding generic parameters to functions, but it does *not* propose syntax for adding generic arguments to function call sites.
For example, for the `id` function you *can* write: - -```lua - -- generic type gets inferred as a number in all these cases -local x = id(4) -local x = id(y) :: number -local x: number = id(y) -``` - -but you can *not* write `id<number>(y)`. - -This syntax is difficult to parse as it's ambiguous with respect to the grammar for comparison, and disambiguating it requires being able to parse types in expression context, which makes parsing slow and complicated. It's also worth noting that today there are programs with this syntax that are grammatically correct (e.g. `id<string>('4')` parses as "compare variable `id` to variable `string`, and compare the result to string '4'"). The specific example with a single argument will always fail at runtime because booleans can't be compared with relational operators, but multi-argument cases such as `print(foo<number, string>(4))` can execute without errors in certain cases. - -Note that in many cases the types can be inferred, whether through function arguments (`id(4)`) or through the expected return type (`id(y) :: number`). It's also often possible to cast the function object to a given type, even though that can be unwieldy (`(id :: (number)->number)(y)`). Some languages don't have a way to specify the types at the call site either, Swift being a prominent example. Thus it's not a given that we need this feature in Luau. - -If we ever want to implement this though, we can use a solution inspired by Rust's turbofish and require an extra token before `<`. Rust uses `::<` but that doesn't work in Luau because as part of this RFC, `id::<a>(a)->a` is a valid, if redundant, type ascription, so we need to choose a different prefix.
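The claim that such code already parses, and can even run, is easy to check on any Lua 5.1-compatible interpreter. In the sketch below, the locals `foo`, `number`, `string` and `id` are hypothetical and shadow the would-be type names, so the angle brackets are read as ordinary relational operators.

```lua
-- Hypothetical locals that shadow the would-be type names.
local foo, number, string = 1, 2, 3

-- Parses as two expressions: (foo < number) and (string > (4)).
local a, b = foo<number, string>(4)
-- a == true (1 < 2), b == false (3 > 4)

-- The single-argument form parses as (id < string) > ('4'), which
-- compares a boolean with a string and therefore raises an error.
local id = 0
local ok = pcall(function()
  return id<string>('4')
end)
-- ok == false
```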
- -The following two variants are grammatically unambiguous in expression context in Luau, and are a better parallel for Rust's turbofish (in Rust, `::` is more similar to Luau's `:` or `.` than to Luau's `::`, which corresponds to Rust's `as`): - -```lua -foo:<number>() -- require : before <; this is only valid in Luau in variable declaration context, so it's safe to use in expression context -foo.<number>() -- require . before <; this is currently never valid in Luau -``` - -This RFC doesn't propose using either of these options, but notes that either one of them could be specified and implemented in the future if we so desire. - -## Drawbacks - -This is a breaking change, in that examples like the unsound program above will no longer typecheck. - -Types become more complex, which makes them harder for programmers to reason about and increases their space usage. This is particularly noticeable anywhere the typechecker has exponential blowup, since small increases in type size can result in large increases in space or time usage. - -Not having higher-kinded types stops some examples which are parameterized on container types, for example: - -```lua - function g<c>(f : <a>(a) -> c<a>) : <b>(b) -> c<c<b>> - return function(x) return f(f(x)) end - end -``` - -Not having bounded types stops some examples like giving a type to the function that sums a non-empty array: - -```lua - function sum(xs) - local result = xs[1] - for i=2,#xs do - result += xs[i] - end - return result - end -``` - -## Alternatives - -We did originally consider Rank-1 types, but the problem is that they are not backward-compatible, as DataBrain pointed out in the [Dev Forum](https://devforum.roblox.com/t/luau-recap-march-2021/1141387/29), since `typeof` allows users to construct generic types even without syntax for them. Rank-1 types would give a false positive type error in this case, which occurs in deployed code.
- -We could introduce syntax for generic types without changing the semantics, but then there'd be a gap between the syntax (where the types `<a>() -> (a) -> a` and `() -> <a>(a) -> a` are different) and the semantics (where they are not). As noted above, this isn't sound. - -Rather than using Rank-N types, we could use SML-style polymorphism, but this would need something like the [value restriction](http://users.cis.fiu.edu/~smithg/cop4555/valrestr.html) to be sound. diff --git a/rfcs/len-metamethod-rawlen.md b/rfcs/len-metamethod-rawlen.md deleted file mode 100644 index 60278dda..00000000 --- a/rfcs/len-metamethod-rawlen.md +++ /dev/null @@ -1,45 +0,0 @@ -# Support `__len` metamethod for tables and `rawlen` function - -**Status**: Implemented - -## Summary - -The `__len` metamethod will be called by the `#` operator on tables, matching Lua 5.2. - -## Motivation - -Lua 5.1 invokes `__len` only on userdata objects, whereas Lua 5.2 extends this to tables. In addition to making the `__len` metamethod more uniform and making Luau -more compatible with later versions of Lua, this has the important advantage of making it possible to implement an index-based container. - -Before `__iter` and `__len`, it was possible to implement a custom container using `__index`/`__newindex`, but to iterate through the container a custom function was -necessary, because Luau didn't support generalized iteration, `__pairs`/`__ipairs` from Lua 5.2, or `#` override. - -With generalized iteration, a custom container can implement its own iteration behavior, so as long as code uses the `for k,v in obj` iteration style, the container can -be used the same way as a table. However, when the container uses integer indices, manual iteration via `#` would still not work - which is required for some -more complicated algorithms, or even to simply iterate through the container backwards.
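The container use case above can be sketched in plain Lua. Since stock Lua 5.1 ignores `__len` on tables, the sketch routes length through a hypothetical `length` helper that mirrors the semantics proposed in the Design section (call `__len` when the table's metatable defines it and require a number result, otherwise use the ordinary length); under the proposal, `#list` itself would behave this way. `List` is likewise an illustrative name.

```lua
-- Hypothetical helper mirroring the proposed `#` semantics, so the sketch
-- runs on any Lua version (Lua 5.1 ignores __len on tables).
local function length(v)
  local mt = getmetatable(v)
  local lenf = type(v) == "table" and type(mt) == "table" and rawget(mt, "__len")
  if lenf then
    local n = lenf(v)
    if type(n) ~= "number" then
      error("__len must return a number")
    end
    return n
  end
  return #v -- ordinary length for strings and tables without __len
end

-- Hypothetical integer-indexed container: items live in a hidden array,
-- and indexing, assignment and length are all forwarded to it.
local List = {}
List.__index = function(self, i) return rawget(self, "_items")[i] end
List.__newindex = function(self, i, v) rawget(self, "_items")[i] = v end
List.__len = function(self) return #rawget(self, "_items") end

local list = setmetatable({ _items = {} }, List)
list[1], list[2], list[3] = "a", "b", "c"

-- Iterating backwards, which needs a length the container controls.
local out = {}
for i = length(list), 1, -1 do
  out[#out + 1] = list[i]
end
-- out is { "c", "b", "a" }
```

Because the items live in the hidden `_items` table, `rawget(list, 1)` is `nil`, and under this proposal `rawlen(list)` would report 0 while `#list` would report 3.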
- -Supporting `__len` would make it possible to implement a custom integer-based container that exposes the same interface as a table does. - -## Design - -`#v` will call the `__len` metamethod if the object is a table and the metamethod exists; the result of the metamethod will be returned if it's a number (an error will be raised otherwise). - -`table.` functions that implicitly compute table length, such as `table.getn`, `table.insert`, will continue using the actual table length. This is consistent with the -general policy that Luau doesn't support metamethods in `table.` functions. - -A new function, `rawlen(v)`, will be added to the standard library; given a string or a table, it will return the length of the object without calling any metamethods. -The new function has the previous behavior of the `#` operator with the exception of not supporting userdata inputs, as userdata doesn't have an inherent definition of length. - -## Drawbacks - -`#` is an operator that is used frequently and as such an extra metatable check here may impact performance. However, `#` is usually called on tables without metatables, -and even when the table has a metatable, a test version of the change to support `__len`, using the metamethod-absence caching approach we already use for many other metamethods, shows no -statistically significant difference on the existing benchmark suite. This does complicate the `#` computation a little more, which may affect JIT as well, but even if the -table doesn't have a metatable, the process of computing `#` involves a series of condition checks and as such will likely require slow paths anyway. - -This technically changes the semantics of `#` when called on tables with an existing `__len` metamethod, and as such has the potential to change the behavior of an existing valid program.
-That said, it's unlikely that any table would have a metatable with a `__len` metamethod, since outside of userdata it would not do anything, and this drawback is not feasible to resolve with any alternate version of the proposal. - -## Alternatives - -Do not implement `__len`. diff --git a/rfcs/local-type-inference.md b/rfcs/local-type-inference.md deleted file mode 100644 index 88cfe3cd..00000000 --- a/rfcs/local-type-inference.md +++ /dev/null @@ -1,180 +0,0 @@ -# Local Type Inference - -## Summary - -We are going to supplant the current type solver with one based on Benjamin Pierce's Local Type Inference algorithm: - -https://www.cis.upenn.edu/~bcpierce/papers/lti-toplas.pdf - -## Motivation - -Luau's type inference algorithm is used for much more than typechecking scripts. It is also the backbone of an autocomplete algorithm which has to work even for people who don't know what types or type systems are. - -We originally implemented nonstrict mode by making some tactical adjustments to the type inference algorithm. This was great for reducing false positives in untyped code, but carried with it the drawback that the inference result was usually not good enough for the autocomplete system. In order to offer a high-quality experience, we have found ourselves running type inference on nonstrict scripts twice: once for error feedback, and once again to populate the autocomplete database. - -Separately, we would also like more accurate type inference in general. Our current type solver jumps to conclusions a little bit too quickly. For example, it cannot infer an accurate type for an ordinary search function: - -```lua -function index_of(tbl, el) - for i = 0, #tbl do - if tbl[i] == el then - return i - end - end - return nil -end -``` - -Our solver sees two `return` statements and assumes that, because the first statement yields a `number`, so too must the second.
- -To fix this, we are going to move to an architecture where type inference and type checking are two separate steps. Whatever mode the user is programming with, we will run an accurate type inference pass over their code and then run one of two typechecking passes over it. - -## Notation - -We'll use the standard notation `A <: B` to indicate that `A` is a subtype of `B`. - -## Design - -At a very high level, local type inference is built around the idea that we track the lower and upper bounds of each binding. The lower bounds of a binding are the set of values that it might conceivably receive. If a binding receives a value outside of its upper bounds, the program will fail. - -At the implementation level, we reencode free types as the space between these bounds. - -Upper bounds arise only from type annotations and certain builtin operations, whereas lower bounds arise from assignments, return statements, and uses. - -Free types all start out with bounds `never <: 't <: unknown`. Intuitively, we say that `'t` represents some set of values whose domain is at least `never` and at most `unknown`, so initially it could be any value at all. - -When dispatching a constraint `T <: 't`, we replace the lower bounds of `'t` by the union of its old lower bounds and `T`. When dispatching a constraint `'t <: T`, we replace the upper bounds of `'t` by the intersection of its old upper bounds and `T`. In other words, the lower bounds grow from nothing as we see the value used, whereas the upper bounds initially encompass everything and shrink as we constrain them. - -### Constraint Generation Rules - -A return statement expands the lower bounds of the enclosing function's return type. - -```lua -function f(): R - local x: X - return x - -- X <: R -end -``` - -An assignment adds to the lower bounds of the assignee. - -```lua -local a: A -local b: B -a = b --- B <: A -``` - -A function call adds to the upper bounds of the function being called.
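The bound bookkeeping described above can be modelled in a few lines of plain Lua. This is a toy sketch, not the Luau solver: types are represented as sets of primitive tags drawn from a hypothetical four-element universe (`nil`, `boolean`, `number`, `string`), so `never` is the empty set, `unknown` is the full set, and union and intersection become set operations. All function names are illustrative.

```lua
-- Toy model (not the Luau solver): a type is a set of primitive tags
-- drawn from a hypothetical four-element universe.
local function set(...)
  local s = {}
  for _, tag in ipairs({ ... }) do
    s[tag] = true
  end
  return s
end

local NEVER = set()
local UNKNOWN = set("nil", "boolean", "number", "string")

local function union(a, b)
  local r = {}
  for tag in pairs(a) do r[tag] = true end
  for tag in pairs(b) do r[tag] = true end
  return r
end

local function intersect(a, b)
  local r = {}
  for tag in pairs(a) do
    if b[tag] then r[tag] = true end
  end
  return r
end

-- Every free type starts with bounds never <: 't <: unknown.
local function freeType()
  return { lower = NEVER, upper = UNKNOWN }
end

-- Dispatching T <: 't grows the lower bound: lower := lower | T.
local function addLower(t, T)
  t.lower = union(t.lower, T)
end

-- Dispatching 't <: T shrinks the upper bound: upper := upper & T.
local function addUpper(t, T)
  t.upper = intersect(t.upper, T)
end

-- Consistent while every value the binding may receive is still accepted.
local function consistent(t)
  for tag in pairs(t.lower) do
    if not t.upper[tag] then return false end
  end
  return true
end

-- Render a set as a union type, e.g. "nil | number".
local function show(s)
  local tags = {}
  for tag in pairs(s) do tags[#tags + 1] = tag end
  if #tags == 0 then return "never" end
  table.sort(tags)
  return table.concat(tags, " | ")
end

-- index_of's return type 'r: each `return` adds a lower bound.
local r = freeType()
addLower(r, set("number")) -- return i
addLower(r, set("nil"))    -- return nil
-- show(r.lower) == "nil | number"; r.upper is still all four tags

-- A parameter 'n passed to a function expecting number gains an upper bound.
local n = freeType()
addUpper(n, set("number"))
-- show(n.lower) == "never"; show(n.upper) == "number"
```

In this vocabulary, the function-call rule above is conceptually an `addUpper` on the callee's type, although the toy model has no way to represent function types.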
- -Equivalently, passing a value to a function adds to the upper bounds of that value and to the lower bounds of its return value. - -```lua -local g -local h: H -local j = g(h) --- G <: (H) -> I... --- I... <: J -``` - -Property access is a constraint on a value's upper bounds. -```lua -local a: A -a.b = 2 --- A <: {b: number} - -a[1] = 3 --- A <: {number} -``` - -### Generalization - -Generalization is the process by which we infer that a function argument is generic. Broadly speaking, we solve constraints that arise from the function interior, we scan the signature of the function for types that are unconstrained, and we replace those types with generics. This much is all unchanged from the old solver. - -Unlike with the old solver, we never bind free types when dispatching a subtype constraint under local type inference. We only bind free types during generalization. - -If a type only appears in covariant positions in the function's signature, we can replace it by its lower bound. If it only appears in contravariant positions, we replace it by its upper bound. If it appears in both, we'll need to implement bounded generics to get it right. This is beyond the scope of this RFC. - -If a free type has neither upper nor lower bounds, we replace it with a generic. - -Some simple examples: - -```lua -function print_number(n: number) print(n) end - -function f(n) - print_number(n) -end -``` - -We arrive at the solution `never <: 'n <: number`. When we generalize, we can replace `'n` by its upper bound, namely `number`. We infer `f : (number) -> ()`. - -Next example: - -```lua -function index_of(tbl, el) -- index_of : ('a, 'b) -> 'r - for i = 0, #tbl do -- i : number - if tbl[i] == el then -- 'a <: {'c} - return i -- number <: 'r - end - end - return nil -- nil <: 'r -end -``` - -When typechecking this function, we have two constraints on `'r`, the return type. 
We can combine these constraints by taking the union of the lower bounds, leading us to `number | nil <: 'r <: unknown`. The type `'r` only appears in the return type of the function, so the return type of this function is `number | nil`. - -At runtime, Luau allows any two values to be compared. Comparisons of values of mismatched types always return `false`. We therefore cannot produce any interesting constraints about `'b` or `'c`. - -We end up with these bounds: - -``` -never <: 'a <: {'c} -never <: 'b <: unknown -never <: 'c <: unknown -number | nil <: 'r <: unknown -``` - -`'a` appears in the argument position, so we replace it with its upper bound `{'c}`. `'b` and `'c` have no constraints at all, so they are replaced by generics `B` and `C`. `'r` appears only in the return position and so is replaced by its lower bound `number | nil`. - -The final inferred type of `index_of` is `({C}, B) -> number | nil`. - -## Drawbacks - -This algorithm requires that we create a lot of union and intersection types. We need to be able to consistently pare down degenerate unions like `number | number`. - -Local type inference is also more permissive than what we have been doing up until now. For instance, the following is perfectly fine: - -```lua -local x = nil -if something then - x = 41 -else - x = "forty one" -end -``` - -We'll infer `x : number | string | nil`. If the user wishes to constrain a value more tightly, they will have to write an annotation. - -## Alternatives - -### What TypeScript does - -TypeScript clearly makes this work in what we would call a strict mode context, but we need more in order to offer a high quality nonstrict mode. For instance, TypeScript's autocomplete is completely helpless in the face of this code fragment: - -```ts -let x = null; -x = {a: "a", b: "pickles"}; -x. -``` - -TypeScript will complain that the assignment to `x` is illegal because `x` has type `null`.
It will further offer no autocomplete suggestions at all when the user types the final `.`. - -It's not viable for us to require users to write type annotations. Many of our users do not yet know what types are but we are nevertheless committed to providing them a tool that is helpful to them. - -### Success Typing - -Success typing is the algorithm used by the Dialyzer inference engine for Erlang. Instead of attempting to prove that values always flow in sensible ways, it tries to prove that values _could_ flow in sensible ways. - -Success typing is quite nice in that it's very forgiving and can draw surprisingly useful information from untyped code, but that forgiving nature works against us in the case of strict mode. diff --git a/rfcs/lower-bounds-calculation.md b/rfcs/lower-bounds-calculation.md deleted file mode 100644 index 7208bf1a..00000000 --- a/rfcs/lower-bounds-calculation.md +++ /dev/null @@ -1,219 +0,0 @@ -# Lower Bounds Calculation - -**Status**: Abandoned in favor of a future design for full local inference - -## Summary - -We propose adapting lower bounds calculation from Pierce's Local Type Inference paper into the Luau type inference algorithm. - -https://www.cis.upenn.edu/~bcpierce/papers/lti-toplas.pdf - -## Motivation - -There are a number of important scenarios that occur where Luau cannot infer a sensible type without annotations. - -Many of these revolve around type variables that occur in contravariant positions. - -### Function Return Types - -A very common thing to write in Luau is a function to try to find something in some data structure. These functions habitually return the relevant datum when it is successfully found, or `nil` in the case that it cannot. For instance: - -```lua --- A.lua -function find_first_if(vec, f) - for i, e in ipairs(vec) do - if f(e) then - return i - end - end - - return nil -end -``` - -This function has two `return` statements: One returns `number` and the other `nil`. Today, Luau flags this as an error. 
We ask authors to add a return annotation to make this error go away. - -We would like to automatically infer `find_first_if : ({T}, (T) -> boolean) -> number?`. - -Higher order functions also present a similar problem. - -```lua --- B.lua -function foo(f) - f(5) - f("string") -end -``` - -There is nothing wrong with the implementation of `foo` here, but Luau fails to typecheck it all the same because `f` is used in an inconsistent way. This too can be worked around by introducing a type annotation for `f`. - -The fact that the return type of `f` is never used confounds things a little, but for now it would be a big improvement if we inferred `f : ((number | string) -> T...) -> ()`. - -## Design - -We introduce a new kind of TypeVar, `ConstrainedTypeVar` to represent a TypeVar whose lower bounds are known. We will never expose syntax for a user to write these types: They only temporarily exist as type inference is being performed. - -When unifying some type with a `ConstrainedTypeVar` we _broaden_ the set of constraints that can be placed upon it. - -It may help to realize that what we have been doing up until now has been _upper bounds calculation_. - -When we `quantify` a function, we will _normalize_ each type and convert each `ConstrainedTypeVar` into a `UnionTypeVar`. - -### Normalization - -When computing lower bounds, we need to have some process by which we reduce types down to a minimal shape and canonicalize them, if only to have a clean way to flush out degenerate unions like `A | A`. Normalization is about reducing union and intersection types to a minimal, canonicalizable shape. - -A normalized union is one where there do not exist two branches on the union where one is a subtype of the other. It is quite straightforward to implement. - -A normalized intersection is a little bit more complicated: - -1. The tables of an intersection are always combined into a single table. Coincident properties are merged into intersections of their own. 
- * e.g. `normalize({x: number, y: string} & {y: number, z: number}) == {x: number, y: string & number, z: number}` - * This is recursive. e.g. `normalize({x: {y: number}} & {x: {y: string}}) == {x: {y: number & string}}` -1. If two functions in the intersection have a subtyping relationship, the normalization results only in the super-type-most function. (more on function subtyping later) - -### Function subtyping relationships - -If we are going to infer intersections of functions, then we need to be very careful about keeping combinatorics under control. We therefore need to be very deliberate about what subtyping rules we have for functions of differing arity. We have some important requirements: - -* We'd like some way to canonicalize intersections of functions, and yet -* optional function arguments are a great feature that we don't want to break - -A very important use case for us is the case where the user is providing a callback to some higher-order function, and that function will be invoked with extra arguments that the callback's author doesn't actually care about. For example: - -```lua --- C.lua -function map_array(arr, f) - local result = {} - for i, e in ipairs(arr) do - table.insert(result, f(e, i, arr)) - end - return result -end - -local example = {1, 2, 3, 4} -local example_result = map_array(example, function(i) return i * 2 end) -``` - -This function mirrors the actual `Array.map` function in JavaScript. It is very common for users of this function to provide a lambda that only accepts one argument. It would be annoying for callers to be forced to provide a lambda that accepts two unused arguments. This obviously becomes even worse if the function later changes to provide yet more optional information to the callback. - -This use case is very important for Roblox, as we have many APIs that accept callbacks. Implementors of those callbacks frequently omit arguments that they don't care about.
- - Here is an example straight out of the Roblox developer documentation. ([full example here](https://developer.roblox.com/en-us/api-reference/event/BasePart/Touched)) - -```lua --- D.lua -local part = script.Parent - -local function blink() - -- ... -end - -part.Touched:Connect(blink) -``` - -The `Touched` event actually passes a single argument: the part that touched the `Instance` in question. In this example, it is omitted from the callback handler. - -We therefore want _oversaturation_ of a function to be allowed, but this combines with optional function arguments to create a problem with soundness. Consider the following: - -```lua --- E.lua -type Callback = (Instance) -> () - -local cb: Callback -function register_callback(c: Callback) - cb = c -end - -function invoke_callback(i: Instance) - cb(i) -end - ---- - -function bad_callback(x: number?) -end - -local obscured: () -> () = bad_callback - -register_callback(obscured) - -function good_callback() -end - -register_callback(good_callback) -``` - -The problem we run into is that, if we allow the subtyping rule `(T?) -> () <: () -> ()` and also allow oversaturation of a function, it becomes easy to obscure an argument type and pass the wrong type of value to it. - -Next, consider the following type aliases: - -```lua --- F.lua -type OldFunctionType = (any, any) -> any -type NewFunctionType = (any) -> any -type FunctionType = OldFunctionType & NewFunctionType -``` - -If we have a subtyping rule `(T0..TN) <: (T0..TN-1)` to permit the function subtyping relationship `(T0..TN-1) -> R <: (T0..TN) -> R`, then the above type alias normalizes to `(any) -> any`. In order to call the two-argument variation, we would need to permit oversaturation, which runs afoul of the soundness hole from the previous example. - -We need a solution here.
- - To resolve this, let's reframe things in simpler terms: - -If there is never a subtyping relationship between packs of different length, then we don't have any soundness issues, but we find ourselves unable to register `good_callback`. - -To resolve _that_, consider that we are in truth being a bit hasty when we say `good_callback : () -> ()`. We can pass any number of arguments to this function safely. We could choose to type `good_callback : () -> () & (any) -> () & (any, any) -> () & ...`. Luau already has syntax for this particular sort of infinite intersection: `good_callback : (any...) -> ()`. - -So, we propose some different inference rules for functions: - -1. The AST fragment `function(arg0..argN) ... end` is typed `(T0..TN, any...) -> R` where `arg0..argN : T0..TN` and `R` is the inferred return type of the function body. Function statements are inferred the same way. -1. Type annotations are unchanged. `() -> ()` is still a nullary function. - -For reference, the subtyping rules for unions and functions are unchanged. We include them here for clarity. - -1. `A <: A | B` -1. `B <: A | B` -1. `A | B <: T` if `A <: T` and `B <: T` -1. `T -> R <: U -> S` if `U <: T` and `R <: S` - -We propose new subtyping rules for type packs: - -1. `(T0..TN) <: (U0..UN)` if, for each `T` and `U`, `T <: U` -1. `(U...)` is the same as `() | (U) | (U, U) | (U, U, U) | ...`, therefore -1. `(T0..TN) <: (U...)` if for each `T`, `T <: U`, therefore -1. `(U...) -> R <: (T0..TN) -> R` if for each `T`, `T <: U` - -The important difference is that we remove all subtyping rules that mention options. Functions of different arities are no longer considered subtypes of one another. Optional function arguments are still allowed, but they work as a feature of function calls rather than of subtyping. - -Under these rules, functions of different arities can never be converted to one another, but actual functions are known to be safe to oversaturate with anything, and so gain a type that says so.
- - Under these subtyping rules, snippets `C.lua` and `D.lua` check the way we want: literal functions are implicitly safe to oversaturate, so it is fine to cast them as the necessary callback function type. - -`E.lua` also typechecks the way we need it to: `(Instance) -> () </: () -> ()` and so `obscured` cannot receive the value `bad_callback`, which prevents it from being passed to `register_callback`. However, `good_callback : (any...) -> ()` and `(any...) -> () <: (Instance) -> ()` and so it is safe to register `good_callback`. - -Snippet `F.lua` is also fixed with this ruleset: There is no subtyping relationship between `(any) -> ()` and `(any, any) -> ()`, so the intersection is not combined under normalization. - -This works, but it creates some small problems that we need to resolve: - -First, the `...` symbol still needs to be unavailable for functions that have been given this implicit `any...` type. This is actually taken care of in the Luau parser, so no code change is required. - -Secondly, we do not want to silently allow oversaturation of direct calls to a function if we know that the arguments will be ignored. We need to treat these variadic packs differently when unifying for function calls. - -Thirdly, we don't want to display this variadic in the signature if the author doesn't expect to see it. - -We solve these issues by adding a property `bool VariadicTypePack::hidden` to the implementation and switching on it in the above scenarios. The implementation is relatively straightforward for all 3 cases. - -## Drawbacks - -There is a potential cause for concern that we will be inferring unions of functions in cases where we previously did not. Unions are known to be potential sources of performance issues. One possibility is to allow Luau to be less intelligent and have it "give up" and produce less precise types. This would come at the cost of accuracy and soundness.
- -If we allow functions to be oversaturated, we are going to miss out on opportunities to warn the user about legitimate problems with their program. I think we will have to work out some kind of special logic to detect when we are oversaturating a function whose exact definition is known and warn on that. - -Allowing indirect function calls to be oversaturated with `nil` values only should be safe, but a little bit unfortunate. As long as we statically know for certain that `nil` is actually a permissible value for that argument position, it should be safe. - -## Alternatives - -If we are willing to sacrifice soundness, we could adopt success typing and come up with an inference algorithm that produces less precise type information. - -We could also technically choose to do nothing, but this has some unpalatable consequences: Something I would like to do in the near future is to have the inference algorithm assume the same `self` type for all methods of a table. This will make inference of common OO patterns dramatically more intuitive and ergonomic, but inference of polymorphic methods requires some kind of lower bounds calculation to work correctly. diff --git a/rfcs/never-and-unknown-types.md b/rfcs/never-and-unknown-types.md deleted file mode 100644 index 5ad216ef..00000000 --- a/rfcs/never-and-unknown-types.md +++ /dev/null @@ -1,146 +0,0 @@ -# never and unknown types - -**Status**: Implemented - -## Summary - -Add `unknown` and `never` types that are inhabited by everything and nothing respectively. - -## Motivation - -There are lots of cases in local type inference, semantic subtyping, -and type normalization, where it would be useful to have top and -bottom types. Currently, `any` is filling that role, but it has -special "switch off the type system" superpowers. - -Any use of `unknown` must be narrowed by type refinements unless another `unknown` or `any` is expected. 
For -example a function which can return any value is: - -```lua - function anything() : unknown ... end -``` - -and can be used as: - -```lua - local x = anything() - if type(x) == "number" then - print(x + 1) - end -``` - -The type of this function cannot be given concisely in current -Luau. The nearest equivalent is `any`, but this switches off the type system, for example -if the type of `anything` is `() -> any` then the following code typechecks: - -```lua - local x = anything() - print(x + 1) -``` - -This is fine in nonstrict mode, but strict mode should flag this as an error. - -The `never` type comes up whenever type inference infers incompatible types for a variable, for example - -```lua - function oops(x) - print("hi " .. x) -- constrains x to be a string - print(math.abs(x)) -- constrains x to be a number - end -``` - -The most general type of `x` is `string & number`, so this code gives -a type error, but we still need to provide a type for `oops`. With a -`never` type, we can infer the type `oops : (never) -> ()`. - -or when exhaustive type casing is achieved: - -```lua - function f(x: string | number) - if type(x) == "string" then - -- x : string - elseif type(x) == "number" then - -- x : number - else - -- x : never - end - end -``` - -or even when the type casing is simply nonsensical: - -```lua - function f(x: string | number) - if type(x) == "string" and type(x) == "number" then - -- x : string & number which is never - end - end -``` - -The `never` type is also useful in cases such as tagged unions where -some of the cases are impossible. For example: - -```lua - type Result<T, E> = { err: false, val: T } | { err: true, error: E } -``` - -For code which we know is successful, we would like to be able to -indicate that the error case is impossible. With a `never` type, we -can do this with `Result<T, never>`. Similarly, code which cannot succeed -has type `Result<never, E>`.
- - These types can _almost_ be defined in current Luau, but only quite verbosely: - -```lua - type never = number & string - type unknown = nil | number | boolean | string | {} | (...never) -> (...unknown) -``` - -But even for `unknown` it is impossible to include every single data type, e.g. every root class. - -Providing `never` and `unknown` as built-in types makes the code for -type inference simpler, for example we have a way to represent a union -type with no options (as `never`). Otherwise we have to contend with ad hoc -corner cases. - -## Design - -Add: - -* a type `never`, inhabited by nothing, and -* a type `unknown`, inhabited by everything. - -And under success types (nonstrict mode), `unknown` is exactly equivalent to `any` because `unknown` -encompasses everything as does `any`. - -The interesting thing is that `() -> (never, string)` is equivalent to `() -> never` because all -values in a pack must be inhabitable in order for the pack itself to also be inhabitable. In fact, -the type `() -> never` is not completely accurate; it should be `() -> (never, ...never)` to avoid -cascading type errors. Ditto for an expression list `f(), g()`: if the resulting type pack is -`(never, string, number)`, it is still the same as `(never, ...never)`. - -```lua - function f(): never error() end - function g(): string return "" end - - -- no cascading type error where count mismatches, because the expression list f(), g() - -- was made to return (never, ...never) due to the presence of a never type in the pack - local x, y, z = f(), g() - -- x : never - -- y : never - -- z : never -``` - -## Drawbacks - -Another bit of complexity budget spent. - -These types will be visible to creators, so yay bikeshedding! - -Replacing `any` with `unknown` is a breaking change: code in strict mode may now produce errors. - -## Alternatives - -Stick with the current use of `any` for these cases. - -Make `never` and `unknown` type aliases rather than built-ins.
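The `never` rule for type packs in the RFC above can be modelled in a few lines: a pack is inhabited only if every fixed element is, so one `never` collapses it to `(never, ...never)`. This is a hypothetical Python sketch, not Luau's implementation. Types are plain strings, and `Pack`, `normalize_pack`, `element_type` and `expr_list` are illustrative names invented here. The sketch also ignores the count-mismatch diagnostics a real checker would emit.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

NEVER = "never"


@dataclass(frozen=True)
class Pack:
    """A type pack: fixed leading element types plus an optional variadic tail (`...tail`)."""
    head: Tuple[str, ...]
    tail: Optional[str] = None


def normalize_pack(pack: Pack) -> Pack:
    # A pack is inhabited only if every fixed element is, so one `never`
    # anywhere in the head makes the whole pack equivalent to (never, ...never).
    if NEVER in pack.head:
        return Pack((NEVER,), NEVER)
    return pack


def element_type(pack: Pack, i: int) -> str:
    """Type seen by the i-th (0-based) target of `local a, b, c = ...`."""
    if i < len(pack.head):
        return pack.head[i]
    # Past the fixed elements: the variadic tail if there is one, otherwise nil
    # (a simplification; a real checker may instead report a count mismatch).
    return pack.tail if pack.tail is not None else "nil"


def expr_list(*packs: Pack) -> Pack:
    """Pack for the expression list `e1, ..., en`, given each expression's result pack.

    As in Lua, every expression except the last is truncated to its first value;
    the last expression contributes its whole pack.
    """
    if not packs:
        return Pack(())
    head = tuple(element_type(p, 0) for p in packs[:-1])
    last = packs[-1]
    return normalize_pack(Pack(head + last.head, last.tail))


# Mirrors the RFC example: f(): never, g(): string, then `local x, y, z = f(), g()`.
f_ret = normalize_pack(Pack((NEVER,)))
g_ret = Pack(("string",))
xs = expr_list(f_ret, g_ret)
print([element_type(xs, i) for i in range(3)])  # ['never', 'never', 'never']
```

Because normalization happens when the expression list is built, `y` and `z` also receive `never` rather than `string` and `nil`. That is the "no cascading type error" behaviour the RFC describes.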
diff --git a/rfcs/new-nonstrict.md b/rfcs/new-nonstrict.md deleted file mode 100644 index 42e9cc6f..00000000 --- a/rfcs/new-nonstrict.md +++ /dev/null @@ -1,340 +0,0 @@ -# New non-strict mode - -## Summary - -Currently, strict mode and non-strict mode infer different types for -the same program. With this feature, strict and non-strict modes will -share the [local type inference](local-type-inference.md) -engine, and the only difference between the modes will be in which -errors are reported. - -## Motivation - -Having two different type inference engines is unnecessarily -confusing, and can result in unexpected behaviors such as changing the -mode of a module causing errors in the users of that module. - -The current non-strict mode infers very coarse types (e.g. all local -variables have type `any`) and so is not appropriate for type-driven -tooling, which results in expensively and inconsistently using -different modes for different tools. - -## Design - -### Code defects - -The main goal of non-strict mode is to minimize false positives, that -is, if non-strict mode reports an error, then we have high confidence -that there is a code defect. Example defects are: - -* Run-time errors -* Dead code -* Using an expression whose only possible value is `nil` -* Writing to a table property that is never read - -*Run-time errors*: this is an obvious defect. Examples include: - -* Built-in operators (`"hi" + 5`) -* Luau APIs (`math.abs("hi")`) -* Function calls from embeddings (`CFrame.new("hi")`) -* Missing properties from embeddings (`CFrame.new().nope`) - -Detecting run-time errors is undecidable, for example - -```lua -if cond() then - math.abs("hi") -end -``` - -It is undecidable whether this code produces a run-time error, but we -do know that if `math.abs("hi")` is executed, it will produce a -run-time error, and so report a type error in this case.
- -*Expressions guaranteed to be `nil`*: Luau tables do not error when a -missing property is accessed (though embeddings may). So something -like - -```lua -local t = { Foo = 5 } -local x = t.Fop -``` - -won’t produce a run-time error, but is more likely than not a -programmer error. In this case, if the programmer intent was to -initialize `x` as `nil`, they could have written - -```lua -local t = { Foo = 5 } -local x = nil -``` - -For this reason, we consider it a code defect to use a value that the -type system guarantees is of type `nil`. - -*Writing properties that are never read*: There is a matching problem -with misspelling properties when writing. For example - -```lua -function f() - local t = {} - t.Foo = 5 - t.Fop = 7 - print(t.Foo) -end -``` - -won’t produce a run-time error, but is more likely than not a -programmer error, since `t.Fop` is written but never read. We can use -read-only and write-only table properties for this, and make it an -error to create a write-only property. - -We have to be careful about this though, because if `f` ended with -`return t`, then it would be a perfectly sensible function with type -`() -> { Foo: number, Fop: number }`. The only way to detect that `Fop` -was never read would be whole-program analysis, which is prohibitively -expensive. - -*Implicit coercions*: Luau supports various implicit coercions, such -as allowing `math.abs("-12")`. These should be reported as defects. - -### New Non-strict error reporting - -The difficult part of non-strict mode error-reporting is detecting -guaranteed run-time errors. We can do this using an error-reporting -pass that generates a type context such that if any of the `x : T` in -the type context are satisfied, then the program is guaranteed to -produce a type error. 
- - For example, in the program - -```lua -function h(x, y) - math.abs(x) - string.lower(y) -end -``` - -an error is reported when `x` isn’t a `number`, or `y` isn’t a `string`, so the generated context is - -``` -x : ~number -y : ~string -``` - -In the function: - -```lua -function f(x) - math.abs(x) - string.lower(x) -end -``` - -an error is reported when x isn’t a number or isn’t a string, so the generated context is - -``` -x : ~number | ~string -``` - -Since `~number | ~string` is equivalent to `unknown`, non-strict mode -can report a warning, since calling the function is guaranteed to -throw a run-time error. In contrast: - -```lua -function g(x) - if cond() then - math.abs(x) - else - string.lower(x) - end -end -``` - -generates the context - -``` -x : ~number & ~string -``` - -Since `~number & ~string` is not equivalent to `unknown`, non-strict mode reports no warning. - -* The disjunction of contexts `C1` and `C2` contains `x : T1|T2`, - where `x : T1` is in `C1` and `x : T2` is in `C2`. -* The conjunction of contexts `C1` and `C2` contains `x : T1&T2`, - where `x : T1` is in `C1` and `x : T2` is in `C2`. - -The context generated by a block is: - -* `x = E` generates the context of `E : never`. -* `B1; B2` generates the disjunction of the context of `B1` and the - context of `B2`. -* `if C then B1 else B2 end` generates the conjunction of the context - of `B1` and the context of `B2`. -* `local x; B` generates the context of `B`, removing the constraint - `x : T`. If the type inferred for `x` is a subtype of `T`, then - issue a warning. -* `function f(x1,...,xN) B end` generates the context for `B`, - removing `x1 : T1, ..., xN : TN`. If any of the `Ti` are equivalent to - `unknown`, then issue a warning. - -The context generated by a typed expression is: - -* The context generated by `x : T` is `x : T`. -* The context generated by `s : T` (where `s` is a scalar) is - trivial. Issue a warning if `s` has type `T`.
-* The context generated by `F(M1, ..., MN) : T` is the disjunction of - the contexts generated by `F : ~function`, and by - `M1 : T1`, ..., `MN : TN` where for each `i`, `F` has an overload - `(unknown^(i-1),Ti,unknown^(N-i)) -> error`. (Pick `Ti` to be - `never` if no such overload exists.) Issue a warning if `F` has an - overload `(unknown^N) -> S` where `S <: (T | error)`. -* The context generated by `M.p` is the context generated by `M : ~table`. -* The context generated by `M[N]` is the disjunction of the contexts - generated by `M : ~table` and `N : never`. - -For example: - -* The context generated by `math.abs("hi") : never` is - * the context generated by `"hi" : ~number`, since `math.abs` has an - overload `(~number) -> error`, which is trivial. - * A warning is issued since `"hi"` has type `~number`. -* The context generated by `function f(x) math.abs(x); string.lower(x) end` is - * the context generated by `math.abs(x); string.lower(x)` which is the disjunction of - * the context generated by `math.abs(x)`, which is - * the context `x : ~number`, since `math.abs` has an overload `(~number)->error` - * the context generated by `string.lower(x)`, which is - * the context `x : ~string`, since `string.lower` has an overload `(~string)->error` - * remove the binding `x : (~number | ~string)` - * A warning is issued since `(~number | ~string)` is equivalent to `unknown`. -* The context generated by `math.abs(string.lower(x))` is - * the context generated by `string.lower(x) : ~number`, since `math.abs` has an overload `(~number)->error`, which is - * the context `x : ~string`, since `string.lower` has an overload `(~string)->error`. - * A warning is issued, since `string.lower` has an overload `(unknown) -> (string | error)` and `(string | error) <: (~number | error)`. - -### Ergonomics - -*Error reporting*.
A straightforward implementation of this design -issues warnings at the point that data flows into a place -guaranteed to later produce a run-time error, which may not give the best -ergonomics. For example, in the program: - -```lua -local x -if cond() then - x = 5 -else - x = nil -end -string.lower(x) -``` - -the type inferred for `x` is `number?` and the context generated is `x -: ~string`. Since `number? <: ~string`, a warning is issued at the -declaration `local x`. For ergonomics, we might want to identify -either `string.lower(x)` or `x = 5` (or both!) in the error report. - -*Stringifying checked functions*. This design depends on functions -having overloads with `error` return type. This integrates with -[type error suppression](type-error-suppression.md), but would not be -a perfect way to present types to users. A common case is that the -checked type is the negation of the function type, for example the -type of `math.abs`: - -``` -(number) -> number & (~number) -> error -``` - -This might be better presented as an annotation on the argument type, something like: - -``` -@checked (number) -> number -``` - -The type - -``` - @checked (S1,...,SN) -> T -``` - -is equivalent to - -``` - (S1,...,SN) -> T - & (~S1, unknown^N-1) -> error - & (unknown, ~S2, unknown^N-2) -> error - ... - & (unknown^N-1, ~SN) -> error -``` - -As a further extension, we might allow users to explicitly provide `@checked` type annotations. - -Checked functions are known as strong functions in Elixir. - -## Drawbacks - -This is a breaking change, since it results in more errors being issued. - -Strict mode infers more precise (and hence more complex) types than -current non-strict mode, which are presented by type error messages -and tools such as type hover. - -## Alternatives - -Success typing (used in Erlang Dialyzer) is the nearest existing -solution. It is similar to this design, except that it only works in -(the equivalent of) non-strict mode.
The success typing function type -`(S)->T` is the equivalent of our -`(~S)->error & (unknown)->(T|error)`. - -We could put the `@checked` annotation on individual function arguments -rather than the function type. - -We could use this design to infer checked functions. In a function -`f(x1, ..., xN) B end`, we could generate the context -`(x1 : T1, ..., xN : TN)` for `B`, and add an overload -`(unknown^(i-1),Ti,unknown^(N-i))->error` to the inferred type of `f`. For -example, for the function - -```lua -function h(x, y) - math.abs(x) - string.lower(y) -end -``` - -we would infer the type - -``` - (number, string) -> () -& (~number, unknown) -> error -& (unknown, ~string) -> error -``` - -which is the same as - -``` - @checked (number, string) -> () -``` - -The problem with doing this is what to do about recursive functions. - -## References - -Lily Brown, Andy Friesen and Alan Jeffrey, -*Position Paper: Goals of the Luau Type System*, -in HATRA '21: Human Aspects of Types and Reasoning Assistants, -2021. -https://doi.org/10.48550/arXiv.2109.11397 - -Giuseppe Castagna, Guillaume Duboc, José Valim, -*The Design Principles of the Elixir Type System*, -2023. -https://doi.org/10.48550/arXiv.2306.06391 - -Tobias Lindahl and Konstantinos Sagonas, -*Practical Type Inference Based on Success Typings*, -in PPDP '06: Principles and Practice of Declarative Programming, -pp. 167–178, 2006. -https://doi.org/10.1145/1140335.1140356 diff --git a/rfcs/property-readonly.md b/rfcs/property-readonly.md deleted file mode 100644 index 6d09212d..00000000 --- a/rfcs/property-readonly.md +++ /dev/null @@ -1,148 +0,0 @@ -# Read-only properties - -## Summary - -Allow properties of classes and tables to be inferred as read-only. - -## Motivation - -Currently, Roblox APIs have read-only properties of classes, but our -type system does not track this. As a result, users can write (and -indeed due to autocomplete, are encouraged to write) programs with -run-time errors.
- -In addition, user code may have properties (such as methods) -that are expected to be used without modification. Currently there is -no way for user code to indicate this, even if it has explicit type -annotations. - -It is very common for functions to only require read access to a parameter, -and this can be inferred during type inference. - -## Design - -### Properties - -Add a modifier to table properties indicating that they are read-only. - -This proposal is not about syntax, but it will be useful for examples to have some. Write: - -* `get p: T` for a read-only property of type `T`. - -For example: -```lua -function f(t) - t.p = 1 + t.p + t.q -end -``` -has inferred type: -``` -f: (t: { p: number, get q: number }) -> () -``` -indicating that `p` is used read-write but `q` is used read-only. - -### Subtyping - -Read-only properties are covariant: - -* If `T` is a subtype of `U` then `{ get p: T }` is a subtype of `{ get p: U }`. - -Read-write properties are a subtype of read-only properties: - -* If `T` is a subtype of `U` then `{ p: T }` is a subtype of `{ get p: U }`. - -### Indexers - -Indexers can be marked read-only just like properties. In -particular, this means there are read-only arrays `{get T}`, that are -covariant, so we have a solution to the "covariant array problem": - -```lua -local dogs: {Dog} -function f(a: {get Animal}) ... end -f(dogs) -``` - -It is sound to allow this program, since `f` only needs read access to -the array, and `{Dog}` is a subtype of `{get Dog}`, which is a subtype -of `{get Animal}`. This would not be sound if `f` had write access, -for example `function f(a: {Animal}) a[1] = Cat.new() end`. - -### Functions - -Functions are not normally mutated after they are initialized, so -```lua -local t = {} -function t.f() ... end -function t:m() ... 
end -``` - -should have the type -``` -t : { - get f : () -> (), - get m : (self) -> () -} -``` - -If developers want a mutable function, -they can use the anonymous function version -```lua -t.g = function() ... end -``` - -For example, if we define: -```lua - type RWFactory<A> = { build : () -> A } -``` - -then we do *not* have that `RWFactory<Dog>` is a subtype of `RWFactory<Animal>` -since the build method is read-write, so users can update it: -```lua - local mkdog : RWFactory<Dog> = { build = Dog.new } - local mkanimal : RWFactory<Animal> = mkdog -- Does not typecheck - mkanimal.build = Cat.new -- Assigning to methods is OK for RWFactory - local fido : Dog = mkdog.build() -- Oh dear, fido is a Cat at runtime -``` - -but if we define: -```lua - type ROFactory<A> = { get build : () -> A } -``` - -then we do have that `ROFactory<Dog>` is a subtype of `ROFactory<Animal>` -since the build method is read-only, so users cannot update it: -```lua - local mkdog : ROFactory<Dog> = { build = Dog.new } - local mkanimal : ROFactory<Animal> = mkdog -- Typechecks now! - mkanimal.build = Cat.new -- Fails to typecheck, since build is read-only -``` - -Since most idiomatic Lua does not update methods after they are -initialized, it seems sensible for the default access for methods to -be read-only. - -*This is a possibly breaking change.* - -### Classes - -Classes can also have read-only properties and accessors. - -Methods in classes should be read-only by default. - -Many of the Roblox APIs can be marked as having getters but not -setters, which will improve accuracy of type checking for Roblox APIs. - -## Drawbacks - -This is adding to the complexity budget for users, -who will be faced with inferred get modifiers on many properties. - -## Alternatives - -Rather than making read-write access the default, we could make read-only the -default and add a new modifier for read-write. This is not backwards compatible.
- -We could continue with read-write access to methods, -which means no breaking changes, but means that users may be faced with type -errors such as "`Factory<Dog>` is not a subtype of `Factory<Animal>`". diff --git a/rfcs/property-writeonly.md b/rfcs/property-writeonly.md deleted file mode 100644 index 1a49c26b..00000000 --- a/rfcs/property-writeonly.md +++ /dev/null @@ -1,179 +0,0 @@ -# Write-only properties - -## Summary - -Allow properties of classes and tables to be inferred as write-only. - -## Motivation - -This RFC is a follow-on to supporting read-only properties. - -Read-only properties have many obvious use-cases, but write-only properties -are more technical. - -The reason for wanting write-only properties is that they mean -that we can infer a most specific type for functions, which we can't do if -we only have read-write and read-only properties. - -For example, consider the function -```lua - function f(t) t.p = Dog.new() end -``` - -The obvious type for this is -```lua - f : ({ p: Dog }) -> () -``` - -but this is not the most specific type, since read-write properties -are invariant; we could have inferred `f : ({ p: Animal }) -> ()`. -These types are incomparable (neither is a subtype of the other) -and there are uses of `f` that fail to typecheck depending on which one we choose. - -If `f : ({ p: Dog }) -> ()` then -```lua - local x : { p : Animal } = { p = Cat.new() } - f(x) -- Fails to typecheck -``` - -If `f : ({ p: Animal }) -> ()` then -```lua - local x : { p : Dog } = { p = Dog.new() } - f(x) -- Fails to typecheck -``` - -The reason for these failures is that neither of these is the most -specific type. The most specific type records that `t.p` is written to, and -not read from: -```lua - f : ({ set p: Dog }) -> () -``` - -This allows both example uses of `f` to typecheck.
To see that it is more specific than `({ p: Animal }) -> ()`: - -* `Dog` is a subtype of `Animal` -* so (since write-only properties are contravariant) `{ set p: Dog }` is a supertype of `{ set p: Animal }` -* and (since read-write properties are a subtype of write-only properties) `{ set p: Animal }` is a supertype of `{ p: Animal }` -* so (by transitivity) `{ set p: Dog }` is a supertype of `{ p: Animal }` -* so (since function arguments are contravariant) `({ set p: Dog }) -> ()` is a subtype of `({ p: Animal }) -> ()` - -and similarly `({ set p: Dog }) -> ()` is a subtype of `({ p: Dog }) -> ()`. - -Local type inference depends on the existence of most specific (and most general) types, -so if we want to use it "off the shelf" we will need write-only properties. - -There are also some security reasons why some properties should be -write-only. If `t` is a shared table, and any security domain can -write to `t.p`, then it may be possible to use this as a back-channel -if `t.p` is readable. If there is a dynamic check that a property is -write-only then we may wish to present a script analysis error if a -user tries reading it. - -## Design - -### Properties - -Add a modifier to table properties indicating that they are write-only. - -This proposal is not about syntax, but it will be useful for examples to have some. Write: - -* `set p: T` for a write-only property of type `T`. - -For example: -```lua -function f(t) - t.p = 1 + t.q -end -``` -has inferred type: -``` -f: (t: { set p: number, get q: number }) -> () -``` -indicating that `p` is used write-only but `q` is used read-only. - -### Adding read-only and write-only properties - -There are various points where type inference adds properties to types; we now have to consider how to treat each of these. - -When reading a property from a free table, we should add a read-only -property if there is no such property already.
If there is already a -write-only property, we should make it read-write. - -When writing a property to a free table, we should add a write-only -property if there is no such property already. If there is already a -read-only property, we should make it read-write. - -When writing a property to an unsealed table, we should add a read-write -property if there is no such property already. - -When declaring a method in a table or class, we should add a read-only property for the method. - -### Subtyping - -Write-only properties are contravariant: - -* If `T` is a subtype of `U` then `{ set p: U }` is a subtype of `{ set p: T }`. - -Read-write properties are a subtype of write-only properties: - -* If `T` is a subtype of `U` then `{ p: U }` is a subtype of `{ set p: T }`. - -### Indexers - -Indexers can be marked write-only just like properties. In -particular, this means there are write-only arrays `{set T}`, that are -contravariant. These are sometimes useful, for example: - -```lua -function move(src, tgt) - for i,v in ipairs(src) do - tgt[i] = src[i] - src[i] = nil - end -end -``` - -we can give this function the type -``` - move: ({a},{set a}) -> () -``` - -and since write-only arrays are contravariant, we can call this with differently-typed -arrays: -```lua - local dogs : {Dog} = {fido,rover} - local animals : {Animal} = {tweety,sylvester} - move (dogs,animals) -``` - -This program does not type-check with read-write arrays. - -### Classes - -Classes can also have write-only properties and indexers. - -Some Roblox APIs which manipulate callbacks are write-only for security reasons. - -### Separate read and write types - -Once we have read-only properties and write-only properties, type intersection -gives read-write properties with different types. - -```lua - { get p: T } & { set p : U } -``` - -If we infer such types, we may wish to present them differently, for -example TypeScript allows both a getter and a setter. 
- -## Drawbacks - -This is adding to the complexity budget for users, who will be faced -with inferred set modifiers on many properties. There is a trade-off -here about how to spend the user's complexity budget: on understanding -inferred types with write-only properties, or on debugging false positive -type errors caused by variance issues. - -## Alternatives - -Just stick with read-only and read-write accesses. diff --git a/rfcs/recursive-type-restriction.md b/rfcs/recursive-type-restriction.md deleted file mode 100644 index 6f69d43a..00000000 --- a/rfcs/recursive-type-restriction.md +++ /dev/null @@ -1,65 +0,0 @@ -# Recursive type restriction - -**Status**: Implemented - -## Summary - -Restrict generic type aliases to only be able to refer to the exact same instantiation of the generic that's being declared. - -## Motivation - -Luau supports recursive type aliases, but with an important restriction: -users can declare functions of recursive types, such as: -```lua - type Tree<a> = { data: a, children: {Tree<a>} } -``` -but *not* recursive type functions, such as: -```lua - type Weird<a> = { data: a, children: Weird<{a}> } -``` -If types such as `Weird` were allowed, they would have infinite unfoldings, for example: -```lua - Weird<number> = { data: number, children: Weird<{number}> } - Weird<{number}> = { data: {number}, children: Weird<{{number}}> } - Weird<{{number}}> = { data: {{number}}, children: Weird<{{{number}}}> } - ... -``` - -Currently Luau has this restriction, but does not enforce it, and instead -produces unexpected types, which can result in free types leaking into -the module exports. - -## Design - -To enforce the restriction that recursive type aliases produce functions of -recursive types, we require that in any recursive type alias defining `T<gs>`, -in any recursive use of `T<Us>`, we have that `gs` and `Us` are equal.
- -This allows types such as: -```lua - type Tree<a> = { data: a, children: {Tree<a>} } -``` -but *not*: -```lua - type Weird<a> = { data: a, children: Weird<{a}> } -``` -since in the recursive use `a` is not equal to `{a}`. - -This restriction applies to mutually recursive types too. - -## Drawbacks - -This restriction bans some type declarations which do not produce infinite unfoldings, -such as: -```lua - type WeirdButFinite<a> = { data: a, children: WeirdButFinite<number> } -``` -This restriction is stricter than TypeScript, which allows programs such as: -```typescript -interface Foo<a> { x: Foo<a[]>[]; y: a; } -let x: Foo<number> = { x: [], y: 37 } -``` - -## Alternatives - -We could adopt a solution more like TypeScript's, which is to lazily rather than eagerly instantiate types. diff --git a/rfcs/sealed-table-subtyping.md b/rfcs/sealed-table-subtyping.md deleted file mode 100644 index 73714909..00000000 --- a/rfcs/sealed-table-subtyping.md +++ /dev/null @@ -1,106 +0,0 @@ -# Sealed table subtyping - -**Status**: Implemented - -## Summary - -In Luau, tables have a state, which can, among others, be "sealed". A sealed table is one that we know the full shape of and cannot have new properties added to it. We would like to introduce subtyping for sealed tables, to allow users to express some subtyping relationships that they currently cannot. 
- -## Motivation - -We would like this code to type check: -```lua -type Interface = { - name: string, -} - -type Concrete = { - name: string, - id: number, -} - -local x: Concrete = { - name = "foo", - id = 123, -} - -local function getImplementation(): Interface - return x -end -``` -Right now this code fails to type check, because `x` contains an extra property, `id`. Allowing sealed tables to be subtypes of other sealed tables would permit this code to type check successfully. - -## Design - -In order to do this, we will make sealed tables act as a subtype of other sealed tables if they contain all the properties of the supertype.
- -```lua -type A = { - name: string, -} - -type B = { - name: string, - id: number, -} - -type C = { - id: number, -} - -local b: B = { - name = "foo", - id = 123, -} - --- works: B is a subtype of A -local a: A = b - --- works: B is a subtype of C -local c: C = b - --- fails: C is not a subtype of A -local a2: A = c -``` - -This change affects existing code, but it should be a strictly more permissive change - it won't break any existing code, but it will allow code that was previously rejected. - -## Drawbacks - -This change will mean that sealed tables that don't exactly match may be permitted. In the past, this was an error; users may be relying on the type checker to perform these checks. We think the risk of this is minimal, as the presence of extra properties is unlikely to break user code. This is an example of code that would have raised a type error before: - -```lua -type A = { - name: string, -} - -local a: A = { - name = "foo", - -- Before, we would have raised a type error here for the presence of the - -- extra property `id`. - id = 123, -} -``` - -## Alternatives - -In order to avoid any chance of breaking backwards-compatibility, we could introduce a new state for tables, "interface" or something similar, that can only be produced via new syntax. This state would act like a sealed table, except with the addition of the subtyping rule described in this RFC.
An example syntax for this: - -```lua --- `interface` context-sensitive keyword denotes an interface table -type A = interface { - name: string, -} - -type B = { - name: string, - id: number, -} - -local b: B = { - name = "foo", - id = 123, -} - -local a: A = b -``` diff --git a/rfcs/syntax-array-like-table-types.md b/rfcs/syntax-array-like-table-types.md deleted file mode 100644 index 486f7deb..00000000 --- a/rfcs/syntax-array-like-table-types.md +++ /dev/null @@ -1,65 +0,0 @@ -# Array-like table types - -> Note: this RFC was adapted from an internal proposal that predates RFC process - -**Status**: Implemented - -## Summary - -Add special syntax for array-like table types, `{ T }` - -## Motivation - -Luau supports annotating table types. Tables are quite complex beasts, acting as essentially an associative container mapping any value to any other value, and to make it possible to reason about them at type level we have a more constrained definition of what a table is: - -- A table can contain a set of string keys with a specific type for each key -- A table can additionally have an "indexer" for a given key/value type, meaning that it acts as an associative container mapping keys of type K to values of type V - -The syntax for this right now looks like this: - -``` -{ key1: Type1, key2: Type2, [KeyType]: ValueType } -``` - -This is an example of a hybrid table that has both an indexer and a list of specific key/value pairs. - -While Luau technically doesn't support arrays, canonically tables with integer keys are called arrays, or, more precisely, array-like tables. The Luau way to specify these is to use an indexer with a number key: - -``` -{ [number]: ValueType } -``` - -(note that this permits use of non-integer keys, so it's technically richer than an array).
- -As the use of arrays is very common - for example, many library functions such as `table.insert`, `table.find`, `ipairs`, work on array-like tables - Luau users who want to type-annotate their code have to use array-like table annotations a lot. - -`{ [number]: Type }` is verbose, and the only alternative is to define a slightly shorter generic alias: - -``` -type Array<T> = { [number]: T } -``` - -... but this is necessary to specify in every single script, as we don't support preludes. - -## Design - -This proposal suggests adding syntactic sugar to make this less cumbersome: - -``` -{T} -``` - -This will be exactly equivalent to `{ [number]: T }`. `T` must be a type definition immediately followed by `}` (ignoring whitespace characters of course). - -Conveniently, `{T}` syntax matches the syntax for arrays in Typed Lua (a research project from 2014) and Teal (a recent initiative for a TypeScript-like Lua extension language from 2020). - -## Drawbacks - -This introduces a potential ambiguity wrt a tuple-like table syntax; to represent a table with two values, a number and a string, it's natural to use syntax `{ number, string }`; however, how would you represent a table with just one value of type number? This may seem concerning but can be resolved by requiring a trailing comma for one-tuple table type in the future, so `{ number, }` would mean "a table with one number", vs `{ number }` which means "an array-like table of numbers". - -## Alternatives - -A different syntax along the lines of `[T]` or `T[]` was considered and rejected in favor of the current syntax: - -a) This allows us to, in the future - if we find a good workaround for b - introduce "real" arrays with a distinct runtime representation, maybe even starting at 0!
(whether we do this or not is uncertain and outside the scope of this proposal) -b) Square brackets don't nest nicely due to Lua lexing rules, where [[foo]] is a string literal "foo", so with either syntax with square brackets array-of-arrays is not easy to specify diff --git a/rfcs/syntax-compound-assignment.md b/rfcs/syntax-compound-assignment.md deleted file mode 100644 index 6ab97f6a..00000000 --- a/rfcs/syntax-compound-assignment.md +++ /dev/null @@ -1,49 +0,0 @@ -# Compound assignment using `op=` syntax - -> Note: this RFC was adapted from an internal proposal that predates RFC process and as such doesn't follow the template precisely - -**Status**: Implemented - -## Design - -A feature present in many programming languages is assignment operators that perform operations on the left hand side, for example - -``` -a += b -``` - -Lua doesn't provide this right now, so it requires code that's more verbose, for example - -``` -data[index].cost = data[index].cost + 1 -``` - -This proposal suggests adding `+=`, `-=`, `*=`, `/=`, `%=`, `^=` and `..=` operators to remedy this. This improves the ergonomics of writing code, and occasionally makes code that is easier to read also faster to execute. - -The semantics of the operators are going to be as follows: - -- Only one value can be on the left and right hand side -- The left hand side is evaluated once as an l-value, similarly to the left hand side of an assignment operator -- The right hand side is evaluated as an r-value (which results in a single Lua value) -- The assignment-modification is performed, which can involve table access if the left hand side is a table dereference -- Unlike C++, these are *assignment statements*, not expressions - code like this `a = (b += 1)` is invalid.
- -Crucially, this proposal does *not* introduce new metamethods, and instead uses the existing metamethods and table access semantics, for example - -``` -data[index].cost += 1 -``` - -translates to - -``` -local table = data[index] -local key = "cost" -table[key] = table[key] + 1 -``` - -Which can invoke `__index` and `__newindex` on table as necessary, as well as `__add` on the element. In this specific example, this is *faster* than `data[index].cost = data[index].cost + 1` because `data[index]` is only evaluated once, but in general the compound assignment is expected to have the same performance and the goal of this proposal is to make code easier and more pleasant to write. - -The proposed new operators are currently invalid in Lua source, and as such this is a backwards compatible change. - -From the implementation perspective, this requires adding new code/structure to AST but doesn't involve adding new opcodes, metatables, or any extra cost at runtime. diff --git a/rfcs/syntax-continue-statement.md b/rfcs/syntax-continue-statement.md deleted file mode 100644 index 94e2009a..00000000 --- a/rfcs/syntax-continue-statement.md +++ /dev/null @@ -1,98 +0,0 @@ -# continue statement - -> Note: this RFC was adapted from an internal proposal that predates RFC process - -**Status**: Implemented - -## Summary - -Add `continue` statement to `for`, `while` and `repeat` loops using a context-sensitive keyword to preserve compatibility. - -## Motivation - -`continue` statement is a feature present in basically all modern programming languages. It's great for ergonomics - often you want the loop to only process items of a specific kind, so you can say `if item.kind ~= "blah" then continue end` in the beginning of the loop. - -`continue` never makes code that was previously impossible to write possible, but it makes some code easier to write. 
- -We'd like to add this to Luau but we need to keep backwards compatibility - all existing scripts that parse correctly must parse as they do now. The rest of the proposal outlines the exact syntax and semantics that make it possible. - -## Design - -`continue` statement shall be the statement that *starts* with "continue" identifier (*NOT* keyword - effectively it will be a context-sensitive keyword), and such that the *next* token is none of (`.`, `[`, `:`, `{`, `(`, `=`, string literal or `,`). - -These rules effectively say that continue statement is the statement that *does not* parse as a function call or the beginning of an assignment statement. - -This is a continue statement: - -``` -do -continue -end -``` - -This is not a continue statement: - -``` -do -continue = 5 -end -``` - -This is not a continue statement: - -``` -do -continue(5) -end -``` - -This is not a continue statement either, why do you ask? - -``` -do -continue, foo = table.unpack(...) -end -``` - -These rules are simple to implement. In any Lua parser there is already a point where you have to disambiguate an identifier that starts an assignment statement (`foo = 5`) from an identifier that starts a function call (`foo(5)`). It's one of the few places in the Lua grammar, if not the only one, where single token lookahead is not sufficient to parse Lua, because you could have `foo.bar(5)` or `foo.bar=5` or `foo.bar(5)[6] = 7`. - -Because of this, we need to parse the entire left hand side of an assignment statement (primaryexp in Lua's BNF) and then check if it was a function call; if not, we'd expect it to be an assignment statement. - -Alternatively in this specific case we could parse "continue", parse the next token, and if it's one of the exclusion list above, roll the parser state back and re-parse the non-continue statement. Our lexer currently doesn't support rollbacks but it's also an easy strategy that other implementations might employ for `continue` specifically.
- -The rules make it so that the only time we interpret `continue` as a continuation statement is when in the old Lua the program would not have compiled correctly - because this is not valid Lua 5.x: - -``` -do -continue -end -``` - -There is one case where this can create new confusion in the newly written code - code like this: - -``` -do -continue -(foo())(5) -end -``` - -could be interpreted both as a function call to `continue` (which it is!) and as a continuation statement followed by a function call (which it is not!). Programmers writing this code might expect the second treatment, which is wrong. - -We have an existing linter rule to prevent this; however, *for now* we will solve this in a stronger way: - -Once we parse `continue`, we will treat this as a block terminator - similarly to `break`/`return`, we will expect the block to end and the next statement will have to be `end`. This will make sure there's no ambiguity. We may relax this later and rely on the linter to tell people when the code is wrong. - -Semantically, continue will work as you would expect - it would skip the rest of the loop body, evaluate the condition for loop continuation (e.g. check the counter value for numeric loops, call the loop iterator for generic loops, evaluate while/repeat condition for while/repeat loops) and proceed accordingly. Locals declared in the loop body would be closed as well. - -One special case is the `until` expression: since it has access to the entire scope of the `repeat` statement, using `continue` is invalid when it would result in the `until` expression accessing local variables that are declared after `continue`.
- -Implementing `continue` requires special care for the `until` statement as highlighted in the design, which may make the compiler slower and more complicated. - -## Alternatives - -In later versions of Lua, instead of `continue` you can use `goto`. However, that changes control flow to be unstructured and requires more complex implementation and syntactic changes. diff --git a/rfcs/syntax-default-type-alias-type-parameters.md b/rfcs/syntax-default-type-alias-type-parameters.md deleted file mode 100644 index 443bbac3..00000000 --- a/rfcs/syntax-default-type-alias-type-parameters.md +++ /dev/null @@ -1,97 +0,0 @@ -# Default type alias type parameters - -**Status**: Implemented - -## Summary - -Introduce syntax to provide default type values inside the type alias type parameter list. - -## Motivation - -Luau has support for type parameters for type aliases and functions. -In languages with similar features like C++, Rust, Flow and TypeScript, it is possible to specify default values for looser coupling and easier composability, and users with experience in those languages would like to have these design capabilities in Luau. - -Here is an example that is coming up frequently during development of the GraphQL Luau library: -```lua -export type GraphQLFieldResolver< - TSource, - TContext, - TArgs = { [string]: any } -> = (TSource, TArgs, TContext, GraphQLResolveInfo) -> any -``` -If we could specify defaults like that, we wouldn't have to write long type names when the type alias is used unless specific customization is required. -Some engineers already skip these extra arguments and use `'any'` to save time, which gives worse typechecking quality. - -Without default parameter values it's also harder to refactor the code as each type alias reference that uses 'common' type arguments has to be updated.
- -While the previous example uses a concrete type for the default type value, it should also be possible to reference generic types from the same list: -```lua -type Eq<T, U = T> = (l: T, r: U) -> boolean - -local a: Eq<number> = ... -local b: Eq<number, string> = ... -``` - -Generic functions in Luau also have a type parameter list, but it's not possible to specify type arguments at the call site and because of that, default type parameter values for generic functions are not proposed. - -## Design - -If a default type parameter value is assigned, following type parameters (on the right) must also have default type parameter values. -```lua -type A<T, U = string, V> = ... -- not allowed -``` - -Default type parameter values can reference type parameters which were defined earlier (to the left): -```lua -type A<T, U = T> = ... -- ok - -type A<T, U = V, V = T> = ... -- not allowed -``` - -Default type parameter values are also allowed for type packs: -```lua -type A<T... = ...number> -- ok, variadic type pack -type B<T... = ()> -- ok, type pack with no elements -type C<T... = (number)> -- ok, type pack with one element -type D<T... = (number, string)> -- ok, type pack with two elements -type E<T... = (number, ...string)> -- ok, variadic type pack with a different first element -type F<T..., U... = T...> -- ok, same type pack as T... -``` - ---- - -Syntax for type alias type parameter is extended as follows: - -```typeparameter ::= Name [`...'] [`=' typeannotation]``` - -Instead of storing a simple array of names in AstStatTypeAlias, we will store an array of structs containing the name and an optional default type value. - -When a type alias is referenced, missing type parameters are replaced with default type values, if they are available. 
- -If all type parameters have a default type value, it is now possible to reference that without providing a type parameter list: -```lua -type All<T = string, U = number> = ... - -local a: All -- ok -local b: All<> -- ok as well -``` - -If a type is exported from a module, default type parameter values will still be available when the module is imported.
- ---- -Type annotations in Luau are placed after `':'`, but we use `'='` here to assign a type value, not to imply that the type parameter on the left has a certain type. - -A type annotation with `':'` could be used in the future for bounded quantification, which is orthogonal to the default type value. - -## Drawbacks - -Other languages might allow references to the type alias without arguments inside the scope of that type alias to resolve into a recursive reference to the type alias with the same arguments. - -While that is not allowed in Luau right now, if we decide to change that in the future, we will have an ambiguity when all type alias parameters have default values: -```lua --- ok if we allow Type to mean Type<T> -type Type<T> = { x: number, b: Type? } - --- ambiguity, Type could mean Type<T> or Type<number> -type Type<T = number> = { x: number, b: Type? } -``` diff --git a/rfcs/syntax-floor-division-operator.md b/rfcs/syntax-floor-division-operator.md deleted file mode 100644 index b84ddd18..00000000 --- a/rfcs/syntax-floor-division-operator.md +++ /dev/null @@ -1,64 +0,0 @@ -# Floor division operator - -**Status**: Implemented - -## Summary - -Add a floor division operator `//` to ease computing with integers. - -## Motivation - -Integers are everywhere. Indices, pixel coordinates, offsets, ranges, quantities, counters, rationals, fixed point arithmetic and bitwise operations all use integers. - -Luau is generally well suited to work with integers. The math operators +, -, \*, ^ and % support integers. That is, given integer operands these operators produce an integer result (provided that the result fits into the representable range of integers). However, that is not the case with the division operator `/` which in the general case produces numbers with fractional parts.
- -A typical mistake is to forget to use `math.floor`. This can produce subtle issues ranging from slightly wrong results to script errors. A script error could occur, for example, when the result of division is used to fetch from a table with only integer keys, which produces nil, and a script error follows soon after. Another type of error occurs when an accidental fractional number is passed to a C function. Depending on the implementation, the C function could raise an error (if it checks that the number is actually an integer) or cause logic errors due to rounding. - -Particularly problematic is incorrect code which seems to work with frequently used data, only to fail with some rare input. For example, image sizes often have power-of-two dimensions, so code dealing with them may appear to work fine until much later some rare image has an odd size and a division by two in the code does not produce the correct result. Due to the better ergonomics of the floor division operator, it becomes second nature to write `//` everywhere when integers are involved and thus this class of bugs is much less likely to happen. - -Another issue with using `math.floor` as a workaround is that code performing a lot of integer calculations is harder to understand, write and maintain. - -Especially with applications dealing with pixel graphics, such as 2D games, integer math is so common that `math.floor` could easily become the most commonly used math library function. For these applications, avoiding the calls to `math.floor` is alluring from the performance perspective. - -> Non-normative: Here are the top math library functions used by a shipped game that heavily uses Lua: -> `floor`: 461 matches, `max`: 224 matches, `sin`: 197 matches, `min`: 195 matches, `clamp`: 171 matches, `cos`: 106 matches, `abs`: 85 matches. -> The majority of `math.floor` calls disappear from this codebase with the floor division operator.
- -Lua has had a floor division operator since version 5.3, so adding it to Luau makes it easier to migrate from Lua to Luau and, perhaps more importantly, to use the wide variety of existing Lua libraries in Luau. Among other languages, Python most notably has a floor division operator with the same semantics and syntax. R and Julia also have similar operators. - -## Design - -The design mirrors Lua 5.3: - -New operators `//` and `//=` will be added to the language. The operator `//` performs division of two operands and rounds the result towards negative infinity. By default, the operator is only defined for numbers. The operator has the same precedence as the normal division operator `/`. `//=` is the compound-assignment operator for floor division, similar to the existing operator `/=`. - -A new metamethod `__idiv` will be added. The metamethod is invoked when any operand of floor division is not a number. The metamethod can be used to implement floor division for any user-defined data type as well as the built-in vector type. - -The typechecker does not need special handling for the new operators. It can simply apply the same rules for floor division as it does for the normal division operator. - -Examples of usage: - -``` --- Convert offset into 2d indices -local i, j = offset % 5, offset // 5 - --- Halve dimensions of an image or UI element -width, height = width // 2, height // 2 - --- Draw an image to the center of the window -draw_image(image, window_width // 2 - element_width // 2, window_height // 2 - element_height // 2) -``` - -## Drawbacks - -The addition of the new operator adds some complexity to the implementation (mostly to the VM) and to the language, which can be seen as a drawback. - -C-like languages use `//` for line comments. Using the symbol `//` for floor division closes the door on using it for line comments in Luau.
On the other hand, Luau already has long and short comment syntax, so adding yet another syntax for comments would add complexity to the language for little benefit. Moreover, it would make it harder to translate code from Lua to Luau and use existing Lua libraries if the symbol `//` had a completely different meaning in Lua and Luau. - -## Alternatives - -An alternative would be to do nothing, but this would not solve the issues caused by the lack of floor division. - -An alternative implementation would treat `//` and `//=` only as syntactic sugar. A new VM opcode for floor division could be omitted, and the compiler could simply be modified to automatically emit a call to `math.floor` when necessary. This would require only minimal changes to Luau, but it would not support overloading the floor division operator using metamethods and would not have the performance benefits of the full implementation. diff --git a/rfcs/syntax-if-expression.md b/rfcs/syntax-if-expression.md deleted file mode 100644 index 76f76cf1..00000000 --- a/rfcs/syntax-if-expression.md +++ /dev/null @@ -1,108 +0,0 @@ -# if-then-else expression - -> Note: this RFC was adapted from an internal proposal that predates the RFC process - -**Status**: Implemented - -## Summary - -Introduce a form of ternary conditional using `if cond then value else alternative` syntax. - -## Motivation - -Luau does not have a first-class ternary operator; when a ternary operator is needed, it is usually emulated with an `and/or` expression, such as `cond and value or alternative`. - -This expression evaluates to `value` if `cond` and `value` are truthy, and `alternative` otherwise. In particular, this means that when `value` is `false` or `nil`, the result of the entire expression is `alternative` even when `cond` is truthy, which doesn't match the expected ternary logic and is a frequent source of subtle errors.
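The `and/or` pitfall can be reproduced in plain Lua. In this sketch, `andOrTernary` mirrors the `cond and value or alternative` idiom. The hypothetical `ifTernary` uses an `if` statement to get true ternary semantics, which is the result the proposed `if cond then value else alternative` expression gives directly. Both helpers are illustrative only. Because they are ordinary functions, their arguments are evaluated eagerly, so they do not model the short-circuiting of the real expression.

```lua
-- The common and/or emulation of a ternary.
local function andOrTernary(cond, value, alternative)
  return cond and value or alternative
end

-- Hypothetical reference helper with true ternary semantics.
local function ifTernary(cond, value, alternative)
  if cond then
    return value
  else
    return alternative
  end
end

-- Same inputs, different results once `value` is falsy.
print(andOrTernary(true, false, "fallback")) --> fallback
print(ifTernary(true, false, "fallback"))    --> false
print(andOrTernary(true, nil, 0))            --> 0
print(ifTernary(true, nil, 0))               --> nil
```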
- -Instead of `and/or`, an `if/else` statement can be used, but since it requires a separate mutable variable, this option isn't ergonomic. An immediately invoked function expression is also unergonomic and results in performance issues at runtime. - -## Design - -To solve these problems, we propose introducing a first-class ternary conditional. Instead of the `? :` common in C-like languages, we propose an `if-then-else` expression form that is syntactically similar to the `if-then-else` statement, but lacks the terminating `end`. - -Concretely, the `if-then-else` expression must match `if <expr> then <expr> else <expr>`; it can also contain an arbitrary number of `elseif` clauses, like `if <expr> then <expr> elseif <expr> then <expr> else <expr>`. Unlike if statements, `else` is mandatory. - -The result of the expression is the then-expression when the condition is truthy (not `nil` or `false`) and the else-expression otherwise. Only one of the two possible resulting expressions is evaluated. - -Example: - -```lua -local x = if FFlagFoo then A else B - -MyComponent.validateProps = t.strictInterface({ - layoutOrder = t.optional(t.number), - newThing = if FFlagUseNewThing then t.whatever() else nil, -}) -``` - -Note that `else` is mandatory because it's always better to be explicit. If it weren't mandatory, someone writing a chain of if-then-else could forget to add the final `else` and silently get a `nil` value! Enforcing this syntactically ensures that such a program fails to parse instead of running. Also, making `else` mandatory resolves many cases where parsing the expression would be ambiguous due to the infamous [dangling else](https://en.wikipedia.org/wiki/Dangling_else). - -If `else` were optional, the following example would not do what it looks like it's supposed to do! The if expression would _successfully_ parse and be interpreted as returning `h()` if `g()` evaluates to some falsy value, when in actual fact the clear intention is to evaluate `h()` only if `f()` is falsy. - -```lua -if f() then - ... - local foo = if g() then x -else - h() - ...
-end -``` - -The only way to solve this, had we chosen an optional `else` branch, would be to wrap the if expression in parentheses or to add a semicolon. - -## Drawbacks - -Studio's script editor autocomplete currently adds an indented block followed by `end` whenever a line containing a `then` token ends. This can make using the if expression unpleasant, as developers have to keep fixing the code by removing the auto-inserted `end`. We can work around this on the editor side by (short-term) differentiating based on whether the `if` token is the first on its line, and (long-term) by refactoring the completion engine to use an infallible parser for the block completer. - -Parser recovery can also be more fragile due to the leading `if` keyword: previously, encountering `if` while parsing an expression always meant that the expression was unfinished, but now it may start an `if-expr` that, when confused with an `if-end` statement, can lead to a substantially incorrect parse that is difficult to recover from. However, similar issues occur frequently due to function call statements, and as such it's not clear that this makes the recovery materially worse. - -While this is not a problem today, in the past we've contemplated adding support for mid-block `return` statements; these would create an odd grammatical quirk where an `if..then` statement following an empty `return` would parse as an `if` expression. This would happen even without `if` expressions, though, for function calls (e.g. `return` followed by `print(1)`), and is more of a problem with the potential `return` statement changes than with this proposal. - -## Alternatives - -We've evaluated many alternatives for the proposed syntax. - -### Python syntax -``` -b if a else c -``` -Undesirable because expression evaluation order is not left-to-right, which is a departure from all other Lua expressions.
Additionally, since `b` may be ending a statement (followed by an `if` statement), resolving this ambiguity requires parsing `a` as an expression and backtracking if `else` is not found, which is expensive and likely to introduce further ambiguities. - -### C-style ternary operator -``` -a ? b : c -``` -Problematic because `:` is used for method calls. In Julia, `? :` and `:` are both operators, which are disambiguated by _requiring_ spaces in the first case and _prohibiting_ them in the second case; this breaks backwards compatibility and doesn't match the rest of the language, where whitespace in the syntax is not significant. - -### Function syntax -``` -iff(a, b, c) -``` -If implemented as a regular function call, this would break short-circuit behavior. If implemented as a special builtin, it would look like a regular function call but have magical behavior -- something likely to confuse developers. - -### Perl 6 syntax -``` -a ?? b !! c -``` -Syntax deemed too unconventional to use in Luau. - -### Smaller variations -``` -(if a then b else c) -``` -Ada uses this syntax (with parentheses required for clarity). Similar solutions were discussed for `as` previously and rejected to make it easier for humans and machines to understand the language syntax. - -``` -a then b else c -``` -This is ambiguous in some cases (like within an `if` condition), so it is not feasible from a grammar perspective. - -``` -if a then b else c end -``` -The `end` here is unnecessary since `c` is not a block of statements -- it is simply an expression. Thus, use of `end` here would be inconsistent with its other uses in the language. It also makes the syntax more cumbersome to use and could lead to developers sticking with the error-prone `a and b or c` alternative. - -### `elseif` support - -We discussed a simpler version of this proposal without `elseif` support. Unlike if statements, here `elseif` is purely syntactic sugar as it's fully equivalent to `else if`.
However, supporting `elseif` makes the if expression more consistent with the if statement; developers familiar with Luau are likely to try using `elseif` out of habit. Since supporting `elseif` here is trivial, we decided to keep it for consistency. diff --git a/rfcs/syntax-named-function-type-args.md b/rfcs/syntax-named-function-type-args.md deleted file mode 100644 index 536e5606..00000000 --- a/rfcs/syntax-named-function-type-args.md +++ /dev/null @@ -1,58 +0,0 @@ -# Named function type arguments - -**Status**: Implemented - -## Summary - -Introduce syntax for optional names of function type arguments. - -## Motivation - -This feature will be useful to improve code documentation and provide additional information to LSP clients. - -## Design - -This proposal uses the same syntax that functions use to name the arguments: `(a: number, b: string) -> string` - -Names can be provided anywhere a function type is used, for example: - -* in type aliases: -``` -type MyFunc = (cost: number, name: string) -> string -``` - -* in definition files for table types: -``` -declare string: { - rep: (pattern: string, repeats: number) -> string, - sub: (string, start: number, end: number?) -> string -- names are optional; here the first argument doesn't use a name -} -``` - -* for variables: -``` -local cb: (amount: number) -> number -local function foo(cb: (name: string) -> ()) -``` - -Variadic arguments cannot have a name; they are already written as `...: number`. - -This feature can be found in other languages: - -* TypeScript (names are required): `let func: (p: type) => any` -* C++: `void (*f)(int cost, std::string name) = nullptr;` - -The implementation will store the names inside the function type description. - -Parsing the argument list will require a single-token lookahead that we already support. -The argument list parser will check whether the current token is an identifier and the lookahead token is a colon, in which case it will consume both tokens.
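The single-token lookahead described above can be sketched in plain Lua. Everything here is hypothetical and deliberately tiny: the token shape, `tokenize`, and `parseArgList` are illustrations, not Luau's actual parser. Types are bare names only (no `?`, unions or nested function types). The key step is the check "identifier followed by `:`", which marks an optional argument name and consumes both tokens.

```lua
-- Hypothetical tokenizer: names plus the punctuation "(", ")", ",", ":".
local function tokenize(src)
  local tokens, i = {}, 1
  while i <= #src do
    local c = src:sub(i, i)
    if c:match("%s") then
      i = i + 1
    elseif c:match("[%a_]") then
      local name = src:match("^[%w_]+", i)
      tokens[#tokens + 1] = { kind = "name", text = name }
      i = i + #name
    elseif c:match("[(),:]") then
      tokens[#tokens + 1] = { kind = c }
      i = i + 1
    else
      error("unexpected character: " .. c)
    end
  end
  return tokens
end

-- Parses "(" [arg {"," arg}] ")" where arg ::= [Name ":"] TypeName.
local function parseArgList(tokens)
  local pos = 1
  local function peek(offset)
    return tokens[pos + (offset or 0)]
  end
  local function expect(kind)
    local tok = tokens[pos]
    assert(tok and tok.kind == kind, "expected " .. kind)
    pos = pos + 1
    return tok
  end

  local args = {}
  expect("(")
  if peek() and peek().kind ~= ")" then
    repeat
      local name
      -- Single-token lookahead: an identifier followed by ":" is an argument name.
      local cur, nxt = peek(0), peek(1)
      if cur and cur.kind == "name" and nxt and nxt.kind == ":" then
        name = cur.text
        pos = pos + 2 -- consume both the name and the colon
      end
      local typeTok = expect("name")
      args[#args + 1] = { name = name, type = typeTok.text }
      local more = peek() ~= nil and peek().kind == ","
      if more then
        pos = pos + 1
      end
    until not more
  end
  expect(")")
  return args
end

-- Example: the first argument is unnamed, the second is named.
for _, arg in ipairs(parseArgList(tokenize("(string, start: number)"))) do
  print(arg.name, arg.type)
end
```

Because names are stored next to the types but play no role in the parse of the type itself, a comparison of two signatures can simply skip the `name` fields.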
- -Function type comparisons will ignore the argument names; this proposal doesn't change the semantics of the language or how typechecking is performed. - -## Drawbacks - -Argument names require that we create unique function types even when these types are 'identical', so we can't compare types using pointer identity. - -This is already the case in the current Luau implementation, but it might reduce optimization opportunities in the future. - -There might also be cases of pointer identity checks that are currently hidden, and named arguments might expose places where correct unification is required in the type checker. diff --git a/rfcs/syntax-number-literals.md b/rfcs/syntax-number-literals.md deleted file mode 100644 index 2ad6a6fa..00000000 --- a/rfcs/syntax-number-literals.md +++ /dev/null @@ -1,14 +0,0 @@ -# Extended numeric literal syntax - -> Note: this RFC was adapted from an internal proposal that predates the RFC process and as such doesn't follow the template precisely - -**Status**: Implemented - -## Design - -This proposal suggests extending Lua number syntax with: - -1. Binary literals: `0b10101010101`. The prefix is either `0b` or `0B` (to match Lua's `0x` and `0X`), followed by at least one `0` or `1` digit. -2. Number literal separators: `1_034_123`. We will allow an arbitrary number and arrangement of underscores in all numeric literals, including hexadecimal and binary. This helps with readability of long numbers. - -Both of these features are common in modern languages and can help write readable code. diff --git a/rfcs/syntax-singleton-types.md b/rfcs/syntax-singleton-types.md deleted file mode 100644 index 2c1f5442..00000000 --- a/rfcs/syntax-singleton-types.md +++ /dev/null @@ -1,91 +0,0 @@ -# Singleton types - -> Note: this RFC was adapted from an internal proposal that predates the RFC process - -**Status**: Implemented - -## Summary - -Introduce a new kind of type, called singleton types.
They are just like normal types but have the capability to represent a constant runtime value as a type. - -## Motivation - -There are two primary drivers for adding two kinds of singleton types: `string` and `boolean`. - -### `string` singleton types - -The Luau type checker can get by mostly fine without constant string types, but they let it work at its best with user code. - -One popular pattern is tagged unions (sometimes called algebraic data types), which could be supported: - -``` -type Ok<T> = { type: "ok", value: T } -type Err<E> = { type: "error", error: E } -type Result<T, E> = Ok<T> | Err<E> - -local result: Result<number, string> = ... -if result.type == "ok" then - -- result :: Ok<number> - print(result.value) -else - -- result :: Err<string> - error(result.error) -end -``` - -### `boolean` singleton types - -At the moment, the Luau type checker is completely unable to discern the state of a boolean, which makes it impossible to determine all the possible types of expressions such as `a and b`, `a and b or c`, or `a or b`. - -## Design - -Both design components of singleton types should be intuitive for everyone by default. - -### Syntax - -A constant string token, as well as a constant boolean token, is now allowed to appear in a type annotation context. - -``` -type Animals = "Dog" | "Cat" | "Bird" -type TrueOrNil = true? -``` - -Adding constant strings as types means that it is now legal to write -`{["foo"]:T}` as a table type. This should be parsed as a property, -not an indexer. For example: -```lua - type T = { - ["foo"]: number, - ["$$bar"]: string, - baz: boolean, - } -``` -The table type `T` is a table with three properties and no indexer. - -### Semantics - -A value of a singleton type may be used where its general primitive type is expected: - -```lua -local foo: "Hello world" = "Hello world" -local bar: string = foo -- allowed - -local foo: true = true -local bar: boolean = foo -- also allowed -``` - -The inverse is not true, because it would narrow an arbitrary value of the primitive type to one specific value.
- -```lua -local foo: string = "Hello world" -local bar: "Hello world" = foo -- not allowed - -local foo: boolean = true -local bar: true = foo -- not allowed -``` - -## Drawbacks - -This may increase the cost of type checking, since some types now need to carry a string literal value that may need to be copied and compared. The cost can be mitigated through interning, although this is not trivial due to cross-module type checking and the need to be able to typecheck a module graph incrementally. - -This may make the type system a bit more complex to understand, as many programmers have a mental model of types that doesn't include being able to use literal values as types, with each such type being a subtype of a more general type. diff --git a/rfcs/syntax-string-interpolation.md b/rfcs/syntax-string-interpolation.md deleted file mode 100644 index b2b9b0ea..00000000 --- a/rfcs/syntax-string-interpolation.md +++ /dev/null @@ -1,171 +0,0 @@ -# String interpolation - -**Status**: Implemented - -## Summary - -New string interpolation syntax. - -## Motivation - -The problems with `string.format` are many. - -1. Each format specifier must exactly match the type of its corresponding value. -2. Using `%d` is the idiomatic default for most people, but this loses precision. - * `%d` casts the number into `long long`, which has a lower max value than `double` and does not support decimals. - * `%f` by default will format to the millionths, e.g. `5.5` is `5.500000`. - * `%g` by default will format to six significant digits, e.g. `5.5` is `5.5` and `5.5312389` is `5.53123`. It will also switch to scientific notation for numbers equal to or greater than 10^6. - * To avoid losing too much precision, you need to use `%s`, but even then the type checker assumes you actually wanted strings. -3. No support for `boolean`. You must use `%s` **and** call `tostring`. -4. No support for values implementing the `__tostring` metamethod. -5. 
Using `%` is itself a dangerous operation within `string.format`. - * `"Your health is %d% so you need to heal up."` causes a runtime error because `% s` is parsed as a `%s` specifier (with the space flag) followed by a literal `o`, and now requires a corresponding string. -6. Having to use parentheses around string literals just to call a method on them. - -## Design - -To fix all of those issues, we need to do a few things. - -1. A new string interpolation expression (fixes #5, #6) -2. Extend `string.format` to accept values of arbitrary types (fixes #1, #2, #3, #4) - -Because we care about backward compatibility, we need some new syntax in order to not change the meaning of existing strings. There are a few components of this new expression: - -1. A string chunk (`` `...{ ``, `}...{`, and `` }...` ``) where `...` is a range of zero or more characters. - * `\` escapes `` ` ``, `{`, and itself `\`. - * The pairs must be on the same line (unless a `\` escapes the newline), but expressions needn't be on the same line. -2. An expression between the braces. This is the value that will be interpolated into the string. - * Restriction: we explicitly reject `{{` as it is considered an attempt to escape and get a single `{` character at runtime. -3. A formatting specification may follow the expression, delimited by an unambiguous character. - * Restriction: the formatting specification must be constant at parse time. - * In the absence of an explicit formatting specification, the `%*` token will be used. - * For now, we explicitly reject any formatting specification syntax. A future extension may add an optional specification to the syntax. - -To put the above into formal EBNF grammar: - -``` -stringinterp ::= <INTERP_BEGIN> exp {<INTERP_MID> exp} <INTERP_END> -``` - -Which, in actual Luau code, will look like the following: - -``` -local world = "world" -print(`Hello {world}!`) ---> Hello world!
- -local combo = {5, 2, 8, 9} -print(`The lock combinations are: {table.concat(combo, ", ")}`) ---> The lock combinations are: 5, 2, 8, 9 - -local set1 = Set.new({0, 1, 3}) -local set2 = Set.new({0, 5, 4}) -print(`{set1} ∪ {set2} = {Set.union(set1, set2)}`) ---> {0, 1, 3} ∪ {0, 5, 4} = {0, 1, 3, 4, 5} - -print(`Some example escaping the braces \{like so}`) -print(`backslash \ that escapes the space is not a part of the string...`) -print(`backslash \\ will escape the second backslash...`) -print(`Some text that also includes \`...`) ---> Some example escaping the braces {like so} ---> backslash that escapes the space is not a part of the string... ---> backslash \ will escape the second backslash... ---> Some text that also includes `... -``` - -Newlines are handled the same way as in other string literals. Any text between the `{}` delimiters is not considered part of the string, hence newlines are OK there. The main thing is that each string chunk is scanned from its opening delimiter until either a closing delimiter or an unescaped newline is encountered. - -``` -local name = "Luau" - -print(`Welcome to { - name -}!`) ---> Welcome to Luau! - -print(`Welcome to \ -{name}!`) ---> Welcome to --- Luau! -``` - -We currently *prohibit* using interpolated strings in function calls without parentheses; this is illegal: - -``` -local name = "world" -print`Hello {name}` -``` - -> Note: This restriction is likely temporary while we work through string interpolation DSLs, i.e. the ability to pass individual components of interpolated strings to a function. - -The restriction on `{{` exists solely for people coming from languages such as C#, Rust, or Python, which use `{{` to escape and get the character `{` at runtime.
We also reject this at parse time, since the proper way to escape it is `\{`, so: - -```lua -print(`{{1, 2, 3}} = {myCoolSet}`) -- parse error -``` - -If we did not make this a parse error, then the above would wind up printing as the following, which is obviously a gotcha we can and should avoid. - -``` ---> table: 0xSOMEADDRESS = {1, 2, 3} -``` - -Since the string interpolation expression is going to be lowered into a `string.format` call, we'll also need to extend `string.format`. The bare minimum to support the lowering is to add a new token whose definition is to perform a `tostring` call. `%*` is currently an invalid token, so this is a backward-compatible extension. This RFC defines `%*` to behave as if `tostring` were called on the corresponding value. - -```lua -print(string.format("%* %*", 1, 2)) ---> 1 2 -``` - -Every `%*` must have a corresponding value among the arguments passed to `string.format`; referencing more values than were passed is an error. - -```lua -local function return_one_thing() return "hi" end -local function return_two_nils() return nil, nil end - -print(string.format("%*", return_one_thing())) ---> hi - -print(string.format("%*", Set.new({1, 2, 3}))) ---> {1, 2, 3} - -print(string.format("%* %*", return_two_nils())) ---> nil nil - -print(string.format("%* %* %*", return_two_nils())) ---> error: value #3 is missing, got 2 -``` - -Note that we are not allowing this style of string literal in type annotations at this time, regardless of whether it contains zero or more interpolated expressions, so the following two type annotations are illegal syntax: - -```lua -local foo: `foo` -local bar: `bar{baz}` -``` - -String interpolation syntax will also support escape sequences. Apart from `\u{...}`, there is no ambiguity with other escape sequences. If `\u{...}` occurs within a string interpolation literal, it takes priority.
- -```lua -local foo = `foo\tbar` -- "foo bar" -local bar = `\u{0041} \u{42}` -- "A B" -``` - -## Drawbacks - -If we want to use backticks for other purposes, it may introduce some potential ambiguity. One option to solve that is to only ever produce string interpolation tokens from the context of an expression. This is messy but doable because the parser and the lexer are already implemented to work in tandem. The other option is to pick a different delimiter syntax to keep backticks available for use in the future. - -If we were to naively compile the expression into a `string.format` call, then implementation details would be observable if you write `` `Your health is {hp}% so you need to heal up.` ``. When lowering the expression, we would need to escape every `%` character that appears in a string interpolation chunk (as `%%`). Otherwise, attempting to run this will produce a runtime error because the resulting `% s` specifier is missing its corresponding string value. - -## Alternatives - -Rather than coming up with a new syntax and extending `string.format` to accept an extra token, we could just make `%s` call `tostring` and be done (though this would not address issues #5 and #6). However, doing so would make programs more lenient, and the type checker would have no way to infer strings from a `string.format` call. To preserve that, we would need a different token anyway. - -Language | Syntax | Conclusion -----------:|:----------------------|:----------- -Python | `f'Hello {name}'` | Rejected because it's ambiguous with function call syntax. -Swift | `"Hello \(name)"` | Rejected because it changes the meaning of existing strings. -Ruby | `"Hello #{name}"` | Rejected because it changes the meaning of existing strings. -JavaScript | `` `Hello ${name}` `` | Viable option as long as we don't intend to use backticks for other purposes. -C# | `$"Hello {name}"` | Viable option and guarantees no ambiguities with future syntax.
- -This leaves us with only two syntaxes that already exist in other programming languages. The current proposal uses backticks, so the only backward-compatible alternative is `$""` literals. We don't necessarily need to use the `$` symbol here, but if we were to choose a different symbol, `#` cannot be used. I picked backticks because they don't require us to add a stack of closing delimiters in the lexer to make sure each nested string interpolation literal is correctly closed with its opening pair. You only have to count them. diff --git a/rfcs/syntax-type-alias-type-packs.md b/rfcs/syntax-type-alias-type-packs.md deleted file mode 100644 index d5bb6065..00000000 --- a/rfcs/syntax-type-alias-type-packs.md +++ /dev/null @@ -1,218 +0,0 @@ -# Type alias type packs - -**Status**: Implemented - -## Summary - -Provide semantics for referencing type packs inside the body of a type alias declaration - -## Motivation - -We now have the ability to declare a placeholder for a type pack in a type alias declaration, but there is no support for referencing this pack inside the body of the alias: -```lua -type X = () -> A... -- cannot reference A... as the return value pack - -type Y = X -- invalid number of arguments -``` - -Additionally, while a simple introduction of these generic type packs into the scope will provide an ability to reference them in function declarations, we want to be able to use them to instantiate other type aliases as well. - -Declaration syntax also supports multiple type packs, but we don't have defined semantics for instantiating such a type alias. - -## Design - -We currently support type packs at these locations: -```lua --- for variadic function parameter when type pack is generic -local function f(...: a...) - --- for multiple return values -local function f(): a... - --- as the tail item of function return value pack -local function f(): (number, a...)
-``` - -We want to be able to use type packs for type alias instantiation: -```lua -type X = -- - -type A = X -- T... = (S...) -``` - -Similar to function calls, we want to be able to assign zero or more regular types to a single type pack: -```lua -type A = X<> -- T... = () -type B = X -- T... = (number) -type C = X -- T... = (number, string) -``` - -The definition of `A` doesn't parse right now; we would like to make it legal going forward. - -Variadic types can also be assigned to a type alias type pack: -```lua -type D = X<...number> -- T... = (...number) -``` - -### Multiple type pack parameters - -We have to keep in mind that it is also possible to declare a type alias that takes multiple type pack parameters. - -Again, type parameters that haven't been matched with type arguments are combined together into the first type pack. -After the first type pack parameter has been assigned, the following type parameters are not allowed. -Type pack parameters after the first one have to be type packs: -```lua -type Y = -- - -type A = Y -- T... = S..., U... = S... -type B = Y<...string, S...> -- T... = (...string), U... = S... -type C = Y -- T... = (number, string), U... = S... -type D = Y<...number> -- error, T = (...number), but U... = undefined, not (...number) even though one infinite set is enough to fill two, we may have '...number' inside a type pack argument and we'll be unable to see its content -type E = Y -- error, type parameters are not allowed after a type pack - -type Z = -- - -type F = Z -- T = number, U... = S... -type G = Z -- error, not enough regular type arguments, can't split the front of S... into T - -type W = -- - -type H = W -- U... = S..., V... = R... -type I = W -- U... = (string), V... = S... -``` - -### Explicit type pack syntax - -To enable additional control over the content of a type pack, especially in cases where multiple type pack parameters are expected, we introduce an explicit type pack syntax for use in type alias instantiation.
- -Similar to variadic types `...a` and generic type packs `T...`, explicit type packs can only be used at type pack positions: -```lua -type Y = (T...) -> (U...) - -type F1 = Y<(number, string), (boolean)> -- T... = (number, string), U... = (boolean) -type F2 = Y<(), ()> -- T... = (), U... = () -type F3 = Y -- T... = (string, number), U... = (number, S...) -``` - -In a type parameter list, types inside parentheses always produce a type pack. -This is in contrast to function return type pack annotations, where `() -> number` is the same as `() -> (number)`. - -However, to preserve backwards compatibility with optional parentheses around regular types, type alias instantiation is allowed to assign a non-variadic type pack argument with a single element to a regular type parameter: -```lua -type X = (T) -> U? -type A = X<(number), (string)> -- T = number, U = string -type A = X<(number), string> -- same - -type Y = (T...) -> () -type B = Y<(number), (string)> -- error: too many type pack parameters -``` - -Explicit type pack syntax is not available in other type pack annotation contexts. - -## Drawbacks - -### Type pack element extraction - -Because our type alias instantiations are not lazy, it's impossible to split off a single type from a type pack: -```lua -type Car = T - -type X = Car -- number -type Y = Car -- error, not enough regular type arguments -type Z = Y -- error, Y doesn't have a valid definition -``` - -With our immediate instantiation, at the point of `Car`, we only know that `S...` is a type pack, but its contents are not known. - -Splitting off a single type is a common pattern with variadic templates in C++, but we don't allow type alias overloads, so use cases are more limited. - -### Type alias can't result in a type pack - -We don't propose allowing type aliases to generate type packs, which could have looked like: -```lua -type Car = T -type Cdr = U... -type Cons = (T, U...)
- ---[[ - using type functions to operate on type packs as a list of types -]] -``` - -We wouldn't be able to differentiate if an instantiation results in a type or a type pack and our type system only allows variadic types as the type pack tail element. - -Support for variadic types in the middle of a type pack can be found in TypeScript's tuples. - -## Alternatives - -### Function return type syntax for explicit type packs - -Another option that was considered is to parse `(T)` as `T`, like we do for return type annotation. - -This option complicates the match ruleset since the typechecker will never know if the user has written `T` or `(T)` so each regular type could be a single element type pack and vice versa. -```lua -type X -type C = X -- T... = (number, number) -type D = X<(number), (number)> -- T... = (number, number) - -type Y - ---- two items that were enough to satisfy only a single T... in X are enough to satisfy two T..., U... in Y -type E = Y -- T... = (number), U... = (number) -``` - -### Special mark for single type type packs - -In the Rust language, there is a special disambiguation syntax for single element tuples and single element type packs using a trailing comma: -```rust -(Type,) -``` - -In Python, the same idea is used for single element tuple values: -```python -value = (1, ) -``` - -Since our current ruleset no longer has a problem with single element type tuples, I don't think we need syntax-directed disambiguation option like this one. - -### Only type pack arguments for type pack parameters - -One option that we have is to remove implicit pack assignment from a set of types and always require new explicit type pack syntax: - -```lua -type X = -- - -type B = X<> -- invalid -type C = X -- invalid -type D = X -- invalid - -type B = X<()> -- T... = () -type C = X<(number)> -- T... = (number) -type D = X<(number, string)> -- T... 
= (number, string) -``` - -But this doesn't allow users to define type aliases where they only care about a few types and use the rest as a 'tail': - -```lua -type X = (T, U, Rest...) -> Rest... - -type A = X -- forced to use a type pack when there are no tail elements -``` - -It also makes it harder to change the type parameter count without fixing up the instantiations. - -### Combining types together with the following type pack into a single argument - -An earlier version of the proposal allowed types to be combined together with a type pack as a tail: -```lua -type X = -- - -type A = X --- T... = (number, S...) -``` - -But this syntax resulted in some confusing behavior when multiple type pack arguments are expected: -```lua -type Y = -- - -type B = Y -- not enough type pack parameters -``` diff --git a/rfcs/syntax-type-ascription-bidi.md b/rfcs/syntax-type-ascription-bidi.md deleted file mode 100644 index 0831aba5..00000000 --- a/rfcs/syntax-type-ascription-bidi.md +++ /dev/null @@ -1,36 +0,0 @@ -# Relaxing type assertions - -**Status**: Implemented - -## Summary - -The way `::` works today is really strange. The best solution we can come up with is to allow `::` to convert between any two related types. - -## Motivation - -Due to an accident of the implementation, the Luau `::` operator can only be used for downcasts and casts to `any`. - -Because of this property, `::` works as users expect in a great many cases, but doesn't actually make a whole lot of sense when scrutinized. - -```lua -local t = {x=0, y=0} - -local a = t :: {x: number, y: number, z: number} -- OK -local a = t :: {x: number} -- Error: This is an upcast! -``` - -Originally, we intended for type assertions to only be useful for upcasts. This would make it consistent with the way annotations work in OCaml and Haskell and would never break soundness. However, users have yet to report this oddity! It is working correctly for them!
- -From this, we conclude that users are actually much more interested in having a convenient way to write a downcast. We should bless this use and clean up the rules so they make more sense. - -## Design - -I propose that we change the meaning of the `::` operator to permit conversions between any two types for which either is a subtype of the other. - -## Drawbacks - -`::` was originally envisioned to be a way for users to make the type inference engine work smarter and better for them. The fact of the matter is, though, that downcasts are useful to our users. We should be responsive to that. - -## Alternatives - -We initially discussed allowing `::` to coerce anything to anything else, acting as a full bypass of the type system. We are not doing this because the proposed rule is barely harder to implement: all we need to do is to succeed if unification works between the two types in either direction. Additionally, requiring one type to be a subtype of the other catches mistakes when two types are completely unrelated, e.g. casting a `string` to a table will still produce an error when this proposal is in effect; this makes sure that `::` is as safe a bypass as it can be in practice. diff --git a/rfcs/syntax-type-ascription.md b/rfcs/syntax-type-ascription.md deleted file mode 100644 index e48b723a..00000000 --- a/rfcs/syntax-type-ascription.md +++ /dev/null @@ -1,68 +0,0 @@ -# Type ascriptions - -> Note: this RFC was adapted from an internal proposal that predates the RFC process - -**Status**: Implemented - -## Summary - -Implement syntax for type ascriptions using `::` - -## Motivation - -Luau would like to provide a mechanism for requiring a value to be of a specific type: - -``` --- Asserts that the result of a + b is a number. --- Emits a type error if it isn't. -local foo = (a + b) as number -``` - -This syntax was proposed in the original Luau syntax proposal.
Unfortunately, we discovered that there is a syntactical ambiguity with `as`: - -``` --- Two function calls or a type assertion? -foo() as (bar) -``` - -## Design - -To provide this functionality without introducing syntactical confusion, we want to change this syntax to use the `::` symbol instead of `as`: - -``` -local foo = (a + b) :: number -``` - -This syntax is borrowed from Haskell, where it performs the same function. - -The `::` operator will bind very tightly, like `as`: - -``` --- type assertion applies to c, not (b + c). -local a = b + c :: number -``` - -Note that `::` can only cast a *single* value to a type - not a type pack (multiple values). This means that in the following context, `::` changes runtime behavior: - -``` -foo(1, bar()) -- passes all values returned by bar() to foo() -foo(1, bar() :: any) -- passes just the first value returned by bar() to foo() -``` - -## Drawbacks - -It's somewhat unusual for Lua to use symbols as operators, with the exception of arithmetic (and `..`). Also, many Luau users may be familiar with TypeScript, where the equivalent concept uses `as`. - -`::` may make it more difficult for us to use Turbofish (`::<>`) in the future. - -## Alternatives - -We considered requiring `as` to be wrapped in parentheses, and then relaxing this restriction where there's no chance of syntactical ambiguity: - -``` -local foo: SomeType = (fn() as SomeType) --- Parentheses not needed: unambiguous! -bar(foo as number) -``` - -We decided not to go with this due to concerns about the complexity of the grammar - it requires users to internalize knowledge of our parser to know when they need to surround an `as` expression with parentheses. The rules for when you can leave the parentheses out are somewhat nonintuitive.
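The note in the Design section that `::` casts a single value, and therefore changes runtime behavior, can be checked without a Luau toolchain: in standard Lua a call wrapped in parentheses is also adjusted to exactly one value, which is the runtime effect described for `bar() :: any`. The sketch below uses parentheses instead of `::` so that it runs under plain Lua as well as Luau; `bar` and `count` are hypothetical helpers.

```lua
-- Hypothetical helper: returns three values.
local function bar()
	return 2, 3, 4
end

-- Hypothetical helper: reports how many arguments it received.
local function count(...)
	return select("#", ...)
end

-- A call in the last argument position expands to all of its results.
local all = count(1, bar())

-- Wrapping the call in parentheses keeps only its first result,
-- the same effect described for `bar() :: any` in Luau.
local first = count(1, (bar()))

print(all, first)
```

Here `all` is 4 (the arguments are `1, 2, 3, 4`) and `first` is 2 (the arguments are `1, 2`).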
diff --git a/rfcs/syntax-typed-variadics.md b/rfcs/syntax-typed-variadics.md deleted file mode 100644 index 2988787e..00000000 --- a/rfcs/syntax-typed-variadics.md +++ /dev/null @@ -1,45 +0,0 @@ -# Typed variadics - -> Note: this RFC was adapted from an internal proposal that predates the RFC process - -**Status**: Implemented - -## Summary - -Add syntax for ascribing a type to a variadic pack (`...`). - -## Motivation - -Luau's type checker can internally represent a typed variadic: any number of values of the same type. Developers should be able to describe this construct in their own code, for cases where they have a function that accepts an arbitrary number of `string`s, for example. - -## Design - -We think that the postfix `...: T` syntax is the best balance of readability and simplicity. In function type annotations, we will use `...T`: - -``` -function math.max(...: number): number -end - -type fn = (...string) -> string - -type fn2 = () -> ...string -``` - -This doesn't introduce syntactical ambiguity and should cover all cases where we need to represent this construct. Like `...` itself, this syntax is only legal as the last parameter to a function. - -Like all type annotations, the `...: T` syntax has no effect on runtime behavior versus an unannotated `...`. - -There are currently no plans to introduce named variadics, but this proposal leaves room to adopt them with the form `...name: Type` in function declarations in the future. - -## Drawbacks - -The mismatch between the type of `...` in a function declaration (`number`) and in a type declaration (`...number`) is a bit awkward. This also gets more complicated when we introduce generic variadic packs. - -## Alternatives - -We considered several other syntaxes for this construct: - -* `...T`: leaves no room to introduce named variadics -* `...: T...`: redundant `...` -* `...
: ...T`: feels redundant, same as above -* `...: T*`: potentially confusing for users with C knowledge, where `T*` is a pointer type diff --git a/rfcs/type-byte-buffer.md b/rfcs/type-byte-buffer.md deleted file mode 100644 index b01ac00c..00000000 --- a/rfcs/type-byte-buffer.md +++ /dev/null @@ -1,185 +0,0 @@ -# Byte buffer type - -## Summary - -A new built-in type to serve as a mutable array of bytes, with a library for reading and writing the contents. - -## Motivation - -The existing mechanisms for representing binary data in Luau can be insufficient for performance-oriented use cases. - -A binary blob may be represented as an array of numbers 0-255 (idiomatic and reasonably performant, but very space-inefficient: each element takes 16 bytes, and it's difficult to work with data that is wider than bytes) or a string (only works for read-only cases; data extraction is possible via `string.unpack` but is not very efficient). Neither of the two options is optimal, especially when the use case is data encoding (as opposed to decoding). - -While the host can provide custom data types that close this gap using `userdata` with overridden `__index`/`__newindex` that provide byte storage, the resulting type would be memory-efficient but not performance-efficient due to the cost of metamethod dispatch for every access. Additionally, since every host has a different API, this would make it difficult to write portable Luau algorithms that require efficient binary access. - -With this type, we solve the use cases for binary format encoding and decoding. This opens the door for developers to work with file formats that might've been too large to represent with tables or to write to strings. It also allows for writing algorithms that deal with raw data often, such as compression or hashing. Web services that exchange data in packed formats could also benefit from this.
The new type can also serve as a more efficient internal representation for libraries that provide higher-level objects like images or geometry data. - -Other high-level languages support similar data structures, for example [Java ByteBuffer](https://docs.oracle.com/en/java/javase/18/docs/api/java.base/java/nio/ByteBuffer.html) or [JavaScript ArrayBuffer](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/ArrayBuffer). - -## Design - -This type will be called 'buffer' and will be implemented using a new built-in type (a GCObject with a new tag). - -By default, no metatable is set for this type, and one can only be set using the `lua_setmetatable` C API. - -Operations on this type will be exposed through a new Luau library called 'buffer', with the following functions: - -`buffer.create(size: number): buffer` - -Instantiates the object with a fixed size. -Each byte is initialized to 0. - -'size' has to be an integer and it cannot be negative. The maximum size is implementation-defined, but it is at least the maximum string size. - -`buffer.fromstring(str: string): buffer` - -Instantiates the object from a string. -The size of the buffer is fixed and equal to the length of the string. - -`buffer.tostring(b: buffer): string` - -Returns the buffer data as a string. - -`buffer.len(b: buffer): number` - -Returns the size of the buffer. - -`buffer.copy(target_buffer: buffer, target_offset: number, source_buffer: buffer, source_offset: number?, count: number?): ()` - -Copies 'count' bytes from 'source_buffer' starting at offset 'source_offset' into the 'target_buffer' at 'target_offset'. - -It is possible for 'source_buffer' and 'target_buffer' to be the same. -Copying an overlapping region inside the same buffer acts as if the source region is copied into a temporary buffer and then that buffer is copied over to the target. - -If 'source_offset' is nil or is omitted, it defaults to 0.
-If 'count' is 'nil' or is omitted, all of the 'source_buffer' data starting from 'source_offset' is copied. - -`buffer.fill(b: buffer, offset: number, value: number, count: number?): ()` - -Sets 'count' bytes in the buffer, starting at the specified offset, to 'value'. - -'value' is converted to an unsigned integer using `bit32` library semantics, and the lower 8 bits of the resulting integer are used as the byte value. - -If 'count' is 'nil' or is omitted, all bytes from the specified offset to the end of the buffer are set. - -`buffer.readi8(b: buffer, offset: number): number` - -`buffer.readu8(b: buffer, offset: number): number` - -`buffer.readi16(b: buffer, offset: number): number` - -`buffer.readu16(b: buffer, offset: number): number` - -`buffer.readi32(b: buffer, offset: number): number` - -`buffer.readu32(b: buffer, offset: number): number` - -`buffer.readf32(b: buffer, offset: number): number` - -`buffer.readf64(b: buffer, offset: number): number` - -Used to read data from the buffer by reinterpreting the bytes at the offset as the type named by the function and converting the result into a number. - -When reading the value of any NaN representation, the implementation can (but is not required to) replace it with a different quiet NaN representation. - -`buffer.writei8(b: buffer, offset: number, value: number): ()` - -`buffer.writeu8(b: buffer, offset: number, value: number): ()` - -`buffer.writei16(b: buffer, offset: number, value: number): ()` - -`buffer.writeu16(b: buffer, offset: number, value: number): ()` - -`buffer.writei32(b: buffer, offset: number, value: number): ()` - -`buffer.writeu32(b: buffer, offset: number, value: number): ()` - -`buffer.writef32(b: buffer, offset: number, value: number): ()` - -`buffer.writef64(b: buffer, offset: number, value: number): ()` - -Used to write data to the buffer by converting the number into the type named by the function and reinterpreting it as individual bytes. - -Conversion to integer numbers performs a truncation of the number value.
Results of converting special number values (inf/nan) are platform-specific. -Conversion to unsigned numbers uses `bit32` library semantics. - -`buffer.readstring(b: buffer, offset: number, count: number): string` - -Used to read a string of length 'count' from the buffer at the specified offset. - -`buffer.writestring(b: buffer, offset: number, value: string, count: number?): ()` - -Used to write data from a string into the buffer at the specified offset. - -If an optional 'count' is specified, only 'count' bytes are taken from the string. 'count' cannot be larger than the string length. - ---- - -All offsets start at 0 (not to be confused with indices that start at 1 in Luau tables). -This choice is made both for performance reasons (no need to subtract 1) and for compatibility with data formats that often describe field positions using offsets. -While there is a way to solve the performance problem using the LuaJIT trick where the table array part is allocated from index 0, this would mean that data in the buffer has 1 extra byte, which complicates bounds checking. - -Offsets and 'count' numbers are cast to an integer in an implementation-defined way. - -Read and write operations for multi-byte types are little endian, as it is the most common use case, and conversion to other byte orders is often trivial to do manually. - -Integer numbers are read and written using two's complement representation. - -Floating-point numbers are read and written using the binary32 and binary64 formats specified by IEEE 754. - -Additionally, unaligned offsets in all operations are valid and behave as expected. - -Unless otherwise specified, if a read or write operation would cause an access outside the data in the buffer, an error is thrown. - -### Public C API - -`void* lua_tobuffer(lua_State* L, int idx, size_t* len);` - -Used to fetch the buffer data pointer and buffer size at the specified location. - -If there is no buffer at the location, `NULL` is returned and `len` is not modified.
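The integer rules stated earlier for the read and write functions (offsets starting at 0, little-endian byte order, two's complement, truncation when converting to an integer) fully determine which bytes an `i32` value occupies. As a sketch under those rules, the plain-Lua helpers below compute the four bytes that `buffer.writei32` would be expected to store and decode them back the way `buffer.readi32` would. The helper names are hypothetical and not part of the proposed library, and they only handle finite values in the `i32` range.

```lua
-- Hypothetical helper (not part of the 'buffer' library): returns the four
-- bytes of a little-endian, two's-complement i32; bytes[1] is offset 0.
local function i32_to_le_bytes(value)
	-- convert to an integer by truncating toward zero
	local n = value >= 0 and math.floor(value) or math.ceil(value)
	-- wrap into the unsigned 32-bit range (two's complement)
	n = n % 4294967296
	local bytes = {}
	for i = 1, 4 do
		bytes[i] = n % 256 -- least significant remaining byte
		n = math.floor(n / 256)
	end
	return bytes
end

-- Hypothetical inverse: reassembles the bytes and reinterprets them as signed.
local function le_bytes_to_i32(bytes)
	local n = bytes[1] + bytes[2] * 256 + bytes[3] * 65536 + bytes[4] * 16777216
	if n >= 2147483648 then
		n = n - 4294967296
	end
	return n
end

local bytes = i32_to_le_bytes(0x01020304) -- bytes[1] is 4: least significant byte first
```

In Luau with the proposed library, the corresponding check would be that after `buffer.writei32(b, 0, 0x01020304)`, `buffer.readu8(b, 0)` is 4.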
- -`void* lua_newbuffer(lua_State* L, size_t l);` - -Pushes a new buffer of size `l` onto the stack. - -`lua_isbuffer(L, n)` - -C macro helper to check if the value at the specified location is a buffer. - -Similar to `lua_istable`/`lua_isvector`/`lua_isthread`, it is a simple wrapper over a `lua_type` call and doesn't require internal coercions/internal field access like `lua_isnumber`/`lua_iscfunction`. - -`void* luaL_checkbuffer(lua_State* L, int narg, size_t* len);` - -Similar to `lua_tobuffer`, but throws a tag error if there is no buffer at the specified location. - -`int luaopen_buffer(lua_State* L);` - -Registers the 'buffer' library. If `luaL_openlibs` is used, the 'buffer' library is included. - -`LUA_BUFFERLIBNAME` - -Macro containing the 'buffer' library name. - -## Drawbacks - -This introduces 'buffer' as a class type in the global typing context and adds a new global 'buffer' table. -While the class type might intersect with a user-defined 'buffer' type, such type redefinitions are already allowed in Luau, so this should not cause new type errors. -The same goes for the global table: users can already override globals like 'string', so the addition of a new global is backwards-compatible, although the new table will not be accessible in such a case. - -This increases the complexity of the VM a little bit, since support for a new tagged type is required in the interpreter loop and the GC. - -There is also a string buffer C API; since its functions also talk about a 'buffer' (like `luaL_extendbuffer`) and use `luaL_Buffer`, this might be a point of confusion for C API users. - -## Alternatives - -The workarounds without this feature are significantly inefficient: - -* Tables can, at most, represent 64 bits per slot using expensive `vector` packing. -* Tables with or without packing severely bloat memory, as each array entry is subject to Luau value size and alignment. -* Strings are immutable and can’t be used to efficiently construct binary data without repeated allocations and copies.
-* Built-in `string.pack` and `string.unpack` can’t cover more complex schemas on their own or formats which are edited mid-creation. - -The proposed buffer object has no cursor/position as part of its state; while it would be possible to implement this along with a separate set of APIs like `pushTYPE` and `takeTYPE`, this addition is always possible to implement later and it makes the buffer structure more complicated; additionally, external offset management might be easier to optimize and is more orthogonal, as we do not need to duplicate stateful and stateless functions. - -The proposed buffer object is not resizeable; this is possible to implement later using an explicit `buffer.resize` call, but it may have a performance impact on the native implementation, as the data would be read through a pointer indirection and would be more difficult to optimize; thus, this version of the RFC only proposes fixed-length buffers. That said, if resizeable buffers are desired in the future, we would plan to enhance the current buffer type instead of making a parallel resizeable buffer type, to reduce complexity. diff --git a/rfcs/type-error-suppression.md b/rfcs/type-error-suppression.md deleted file mode 100644 index 27a1c132..00000000 --- a/rfcs/type-error-suppression.md +++ /dev/null @@ -1,131 +0,0 @@ -# Type Error Suppression - -## Summary - -An alternative approach to type error suppression and the `any` type. - -## Motivation - -There are two reasons for this RFC: to make clearer how we're -approaching error suppression, and to remove the magic "both top and -bottom" behavior of the `any` type.
- -### Error suppression - -Currently, we have ad hoc error suppression, where we try to avoid cascading errors, for example in -```lua - local x = t.p.q.r -``` - -if `t` is a table without a `p` field, we report a type error on -`t.p`, but we avoid cascading errors by assigning `t.p` an internal -`error` type, and suppressing errors in property access `M.p` when `M` -has type `error`. - -In this RFC, we clarify that error suppression occurs when the error -is caused by a type `T`, and `error` is a subtype of `T`. - -### The `any` type - -The `any` type is an outlier in the type system, in that currently it -is both a top type (`T` is a subtype of `any` for all types `T`) and a -bottom type (`any` is a subtype of `U` for all types `U`). This is -"consistent subtyping" (written `T ≾ U`) from Siek and Taha (2007), -which has the issue of not being transitive (if it were, then `T ≾ U` -for all types `T` and `U`, which is not a very useful definition of -subtyping). - -The solution used by Siek and Taha is to split consistent subtyping (`S ≾ U`) -into a *consistency relation* `S ~ T` and a *subtyping relation* (`T <: U`). -The role of the consistency relation is to allow `any` to stand in for any type -(`any ~ T` for all types `T`). - -We propose something different: performing *error suppression* on -failures of subtyping. We treat `any` as a top type, so `T <: any`, -but suppress type error messages caused by `any <: U` failing. - -## Design - -This design uses an `error` type (though adding user syntax for it is -out of scope of this RFC). - -Call a type: - - * shallowly safe when any uses of `error` or `any` are inside a table or function type, and - * deeply safe when it does not contain `error` or `any` anywhere. - -A type `T` is shallowly unsafe precisely when `error <: T`. 
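The two safety definitions above can be made concrete with a small model. The sketch below uses a hypothetical table-based encoding of types (it is not how the Luau type checker represents them) and treats unions and intersections as top level, so an `any` or `error` member makes the whole type shallowly unsafe, while an `any` nested inside a table or function type only affects deep safety.

```lua
-- Hypothetical toy encoding of types, for illustration only:
--   { tag = "number" }, { tag = "any" }, { tag = "error" }
--   { tag = "union", types = { ... } }, { tag = "intersection", types = { ... } }
--   { tag = "table", props = { name = type, ... } }
--   { tag = "function", args = { ... }, rets = { ... } }

local function is_any_or_error(t)
	return t.tag == "any" or t.tag == "error"
end

-- Shallowly safe: every `any`/`error` sits inside a table or function type.
local function is_shallowly_safe(t)
	if is_any_or_error(t) then
		return false
	end
	if t.tag == "union" or t.tag == "intersection" then
		for _, member in ipairs(t.types) do
			if not is_shallowly_safe(member) then
				return false
			end
		end
	end
	-- primitives, tables and functions are shallowly safe
	return true
end

-- Collects every directly nested type, whatever kind of type `t` is.
local function children(t)
	local result = {}
	for _, list in ipairs({ t.types or {}, t.args or {}, t.rets or {} }) do
		for _, child in ipairs(list) do
			table.insert(result, child)
		end
	end
	for _, child in pairs(t.props or {}) do
		table.insert(result, child)
	end
	return result
end

-- Deeply safe: no `any`/`error` anywhere in the type.
local function is_deeply_safe(t)
	if is_any_or_error(t) then
		return false
	end
	for _, child in ipairs(children(t)) do
		if not is_deeply_safe(child) then
			return false
		end
	end
	return true
end

local NUMBER = { tag = "number" }
local ANY = { tag = "any" }
local refined = { tag = "intersection", types = { ANY, NUMBER } } -- like (any & Y)
local record = { tag = "table", props = { p = ANY } } -- like { p: any }
```

In this model `refined` is shallowly unsafe, which per the definition above corresponds to `error <: T` and so to errors on it being suppressed, while `record` is shallowly safe but not deeply safe.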
- -We add a new subtyping relationship: - - * `any <: unknown | error` - -We keep the existing subtyping relationships: - - * `T <: any` for any type `T` - -We add a proviso to `unknown` being a top type: - - * `T <: unknown` for any *shallowly safe* type `T` - -Currently, we consider a subtype test to have succeeded when it generates -no errors. We separate out the result of the check from its errors, -and instead have a requirement: - - * If checking `T <: U` succeeds, it produces no errors. - -It is now possible for a subtyping test to fail but produce no errors. -For example, `number <: any` succeeds (since `any` is the top type) -and `number <: string` fails with an error, but now `any <: string` fails -*but produces no errors*. - -For end users, who only care about errors being reported, this will not be -a noticeable change (but see the discussion of breaking changes below). -Internally though, it helps us avoid footguns, since now subtyping -is transitive. - -The subtype testing algorithm changes: - - * Subtype checking returns a boolean. - * Replace all of the current tests of "errors are empty" by testing the return value. - * In the case of testing `any <: T`, return `false` with no errors (unless `unknown | error <: T`, per the new relationship above). - * In the case of testing `T <: any`, return `true` with no errors. - * In the case of testing `T <: unknown`, check `T` for being a shallowly safe type. - -These changes are not huge, and can be implemented for both the current greedy unifier -and future constraint solvers. - -These changes have been prototyped: https://github.com/luau-lang/agda-typeck/pull/4 - -## Drawbacks - -This is theoretically a breaking change, but my word, you have to work hard at it. -For just checking subtyping there is no user-visible difference: the new algorithm reports an error precisely -when the old algorithm does. But it can result in different unifications.
- -For example, if `Y` is a free type variable, then currently checking `(any & Y) <: number` -will not perform any unification, which makes a difference to the program: - -```lua - function f(x : any, y) -- introduces a new free type Y for y - if x == y then -- type refinement makes y have type (any & Y) - return math.abs(y) -- checks (any & Y) <: number - end - end -``` - -Currently we infer type `(any, a) -> number` for `f`. With the new -algorithm, checking `(any & Y) <: number` will succeed by unifying `Y` -with `number`, so `f` will be given the more accurate type -`(any, number) -> number`. - -So this is a breaking change, but results in a more accurate type. -In practice it is unlikely that this change will do anything but help find bugs. - -## Alternatives - -We could implement Siek and Taha's algorithm, but that only helps with -`any`, not with more general error suppression. - -We could leave everything alone, and live with the weirdness of non-transitive subtyping. - -diff --git a/rfcs/unsealed-table-assign-optional-property.md b/rfcs/unsealed-table-assign-optional-property.md deleted file mode 100644 index 477399c2..00000000 --- a/rfcs/unsealed-table-assign-optional-property.md +++ /dev/null @@ -1,60 +0,0 @@ -# Unsealed table assignment creates an optional property - -**Status**: Implemented - -## Summary - -In Luau, tables have a state, which can, among others, be "unsealed". -An unsealed table is one that we are still constructing. Currently -assigning a table literal to an unsealed table does not introduce new -properties, so it is a type error if they are read. -We would like to change this so that assigning a table -literal to an unsealed table creates an optional property. - -## Motivation - -In lua-apps, there is testing code which (simplified) looks like: - -```lua -local t = { u = {} } -t = { u = { p = 37 } } -t = { u = { q = "hi" } } -local x: number? = t.u.p -local y: string?
= t.u.q -``` - -Currently, this code doesn't typecheck, due to `p` and `q` being unknown properties of `t.u`. - -## Design - -In order to support this idiom, we propose that assigning a table -to an unsealed table should add an optional property. - -For example, before this change the type of `t` is `{ u: {} }`, -and after this change is `{ u: { p: number?, q: string? } }`. - -This is implemented by adding a case to unification where the supertype -is an unsealed table, and the subtype is a table with extra properties. -Currently the extra properties are ignored, but with this change we would -add the property to the unsealed table (making it optional if necessary). - -Since tables with optional properties of the same type are subtypes of -tables with indexers, this allows table literals to be used as dictionaries; -for example, if `q` were also assigned a number, the type of `t` would be a subtype of `{ u: { [string]: number } }`. - -Note that the added property must be optional; otherwise the example above will not typecheck: -```lua -local t = { u = {} } -t = { u = { p = 37 } } -t = { u = { q = "hi" } } -- fails because there's no u.p -``` - -## Drawbacks - -The implementation of this proposal introduces optional types during unification, -and so needs access to an allocator. - -## Alternatives - -Rather than introducing optional properties, we could introduce an indexer. For example we could infer the type of -`t` as `{ u: { [string]: number | string } }`. diff --git a/rfcs/unsealed-table-literals.md b/rfcs/unsealed-table-literals.md deleted file mode 100644 index 669b67d4..00000000 --- a/rfcs/unsealed-table-literals.md +++ /dev/null @@ -1,78 +0,0 @@ -# Unsealed table literals - -**Status**: Implemented - -## Summary - -Currently the only way to create an unsealed table is as an empty table literal `{}`. -This RFC proposes making all table literals unsealed. - -## Motivation - -Table types can be *sealed* or *unsealed*.
These are different in that: - -* Unsealed table types are *precise*: if a table has unsealed type `{ p: number, q: string }` -  then it is guaranteed to have only properties `p` and `q`. - -* Sealed tables support *width subtyping*: if a table has sealed type `{ p: number }` -  then it is guaranteed to have at least property `p`, so we allow `{ p: number, q: string }` -  to be treated as a subtype of `{ p: number }`. - -* Unsealed tables can have properties added to them: if `t` has unsealed type -  `{ p: number }` then after the assignment `t.q = "hi"`, `t`'s type is updated to be -  `{ p: number, q: string }`. - -* Unsealed tables are subtypes of sealed tables. - -Currently the only way to create an unsealed table is using an empty table literal, so -```lua -  local t = {} -  t.p = 5 -  t.q = "hi" -``` -typechecks, but -```lua -  local t = { p = 5 } -  t.q = "hi" -``` -does not. - -This causes problems in practice; in particular, developers -may initialize properties but not methods: -```lua -  local t = { p = 5 } -  function t.f() return t.p end -``` - -## Design - -The proposed change is straightforward: make all table literals unsealed. - -## Drawbacks - -Making all table literals unsealed is a conservative change: it only removes type errors. - -It does encourage developers to add new properties to tables during initialization, which -may be considered poor style. - -It does mean that some spelling mistakes will not be caught, for example: -```lua -local t = {x = 1, y = 2} -if foo then -  t.z = 3 -- is z a typo or intentional 2-vs-3 choice? -end -``` - -In particular, we no longer warn about adding properties to array-like tables. -```lua -local a = {1,2,3} -a.p = 5 -``` - -## Alternatives - -We could introduce a new table state for unsealed-but-precise -tables. The trade-off is that this would be more precise, at the cost -of adding user-visible complexity to the type system. - -We could continue to treat array-like tables as sealed.
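The effect of this RFC can be illustrated with a toy model of the property-adding rule from the Motivation section. The representation below is hypothetical and deliberately simplified (a table type is reduced to a `sealed` flag plus its properties, and property types are plain strings compared for equality); it is not how the Luau type checker stores table types. It shows that `t.q = "hi"` is accepted after `local t = { p = 5 }` only once the literal's type is unsealed.

```lua
-- Hypothetical toy model: a table type is { sealed = boolean, props = { [name] = typeName } }.
-- Returns true if `t.name = <value of propType>` typechecks, extending the
-- table type in place when that is allowed.
local function assign_property(tableType, name, propType)
	local existing = tableType.props[name]
	if existing ~= nil then
		return existing == propType -- simplification: exact type match
	end
	if tableType.sealed then
		return false -- new properties cannot be added to a sealed table
	end
	tableType.props[name] = propType -- unsealed: the type grows
	return true
end

-- The type of `local t = { p = 5 }` before this RFC (sealed) and after it (unsealed).
local before = { sealed = true, props = { p = "number" } }
local after = { sealed = false, props = { p = "number" } }

local okBefore = assign_property(before, "q", "string") -- t.q = "hi"
local okAfter = assign_property(after, "q", "string") -- t.q = "hi"
```

With these definitions `okBefore` is `false` and `okAfter` is `true`, and the unsealed type now records `q` as a `string` property.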
diff --git a/rfcs/unsealed-table-subtyping-strips-optional-properties.md b/rfcs/unsealed-table-subtyping-strips-optional-properties.md deleted file mode 100644 index d99c1f81..00000000 --- a/rfcs/unsealed-table-subtyping-strips-optional-properties.md +++ /dev/null @@ -1,68 +0,0 @@ -# Only strip optional properties from unsealed tables during subtyping - -**Status**: Implemented - -## Summary - -Currently subtyping allows optional properties to be stripped from table types. -This RFC proposes only allowing that when the subtype is unsealed and the supertype is sealed. - -## Motivation - -Table types can be *sealed* or *unsealed*. These are different in that: - -* Unsealed table types are *precise*: if a table has unsealed type `{ p: number, q: string }` -  then it is guaranteed to have only properties `p` and `q`. - -* Sealed tables support *width subtyping*: if a table has sealed type `{ p: number }` -  then it is guaranteed to have at least property `p`, so we allow `{ p: number, q: string }` -  to be treated as a subtype of `{ p: number }`. - -* Unsealed tables can have properties added to them: if `t` has unsealed type -  `{ p: number }` then after the assignment `t.q = "hi"`, `t`'s type is updated to be -  `{ p: number, q: string }`. - -* Unsealed tables are subtypes of sealed tables. - -Currently we allow subtyping to strip away optional fields -as long as the supertype is sealed. -This is needed for code such as: -```lua -  local t : { p: number, q: string? } = { p = 5, q = "hi" } -  t = { p = 7 } -``` -typechecks because `{ p : number }` is a subtype of -`{ p : number, q : string? }`. Unfortunately this is not sound, -since sealed tables support width subtyping: -```lua -  local t : { p: number, q: string? } = { p = 5, q = "hi" } -  local u : { p: number } = { p = 5, q = false } -  t = u -``` - -## Design - -The fix for this source of unsoundness is twofold: - -1. make all table literals unsealed, and -2.
only allow stripping optional properties when the -  supertype is sealed and the subtype is unsealed. - -This RFC is for (2). There is a [separate RFC](unsealed-table-literals.md) for (1). - -## Drawbacks - -This introduces new type errors (it has to, since it is fixing a source of -unsoundness). This means that there are now false positives such as: -```lua -  local t : { p: number, q: string? } = { p = 5, q = "hi" } -  local u : { p: number } = { p = 5, q = "lo" } -  t = u -``` -These false positives are so similar to sources of unsoundness -that it is difficult to see how to allow them soundly. - -## Alternatives - -We could just live with unsoundness. -
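Point (2) of the Design section can be sketched with a toy model. The sketch assumes a hypothetical representation in which each property records a type name and whether it is optional, and it ignores property variance, nested tables and indexers. Extra properties in the subtype are allowed only when the supertype is sealed (width subtyping). An optional property of the supertype may be missing from the subtype only when the supertype is sealed and the subtype is unsealed.

```lua
-- Hypothetical toy model, for illustration only: a table type is
--   { sealed = boolean, props = { [name] = { type = typeName, optional = boolean } } }
local function is_table_subtype(sub, super)
	for name, superProp in pairs(super.props) do
		local subProp = sub.props[name]
		if subProp == nil then
			-- A missing property may only be an optional one being stripped,
			-- and only from an unsealed subtype into a sealed supertype.
			if not (superProp.optional and super.sealed and not sub.sealed) then
				return false
			end
		elseif subProp.type ~= superProp.type then
			return false
		elseif subProp.optional and not superProp.optional then
			return false
		end
	end
	for name in pairs(sub.props) do
		-- Extra properties rely on width subtyping, which needs a sealed supertype.
		if super.props[name] == nil and not super.sealed then
			return false
		end
	end
	return true
end

local function prop(typeName, optional)
	return { type = typeName, optional = optional or false }
end

-- t : { p: number, q: string? }, assumed sealed
local T = { sealed = true, props = { p = prop("number"), q = prop("string", true) } }
-- the literal { p = 7 }, unsealed under the companion RFC
local literal = { sealed = false, props = { p = prop("number") } }
-- u : { p: number }, assumed sealed
local U = { sealed = true, props = { p = prop("number") } }
```

With these definitions, `is_table_subtype(literal, T)` holds while `is_table_subtype(U, T)` does not. This matches the Motivation examples: `t = { p = 7 }` is still accepted, and the unsound assignment `t = u` is now rejected.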