# What's changed?
* Support for new 'require by string' RFC with relative paths and
aliases in now enabled in Luau REPL application
### New Type Solver
* Fixed assertion failure on generic table keys (`[expr] = value`)
* Fixed an issue with type substitution traversing into the substituted
parts during type instantiation
* Fixed crash in union simplification when that union contained
uninhabited unions and other types inside
* Union types in binary type families like `add<a | b, c>` are expanded
into `add<a, c> | add<b, c>` to handle
* Added handling for type family solving creating new type families
* Fixed a bug with normalization operation caching types with unsolved
parts
* Tables with uninhabited properties are now simplified to `never`
* Fixed failures found by fuzzer
### Native Code Generation
* Added support for shared code generation between multiple Luau VM
instances
* Fixed issue in load-store propagation and new tagged LOAD_TVALUE
instructions
* Fixed issues with partial register dead store elimination causing
failures in GC assists
---
### Internal Contributors
Co-authored-by: Aaron Weiss <aaronweiss@roblox.com>
Co-authored-by: Alexander McCord <amccord@roblox.com>
Co-authored-by: Andy Friesen <afriesen@roblox.com>
Co-authored-by: James McNellis <jmcnellis@roblox.com>
Co-authored-by: Vighnesh Vijay <vvijay@roblox.com>
Co-authored-by: Vyacheslav Egorov <vegorov@roblox.com>
# What's Changed
## New Type Solver
- Many more fixes to crashes, assertions, and hangs
- Annotated locals now countermand the inferred types of locals, meaning
that for a type `type MyType = number | string`, `local foo : MyType =
5` behaves the same as `local foo = 5 :: MyType`, where before, foo
would be assigned the type of the value on the rhs.
- Type Normalization now respects resource limits.
- Subtyping between classes and cyclic tables now supported
## Native Code Generation
- Work on the Native Code Generation(NCG) allocator continues
---
# Internal Contributors
Co-authored-by: Aaron Weiss <aaronweiss@roblox.com>
Co-authored-by: Alexander McCord <amccord@roblox.com>
Co-authored-by: Andy Friesen <afriesen@roblox.com>
Co-authored-by: James McNellis <jmcnellis@roblox.com>
Co-authored-by: Vighnesh Vijay <vvijay@roblox.com>
Co-authored-by: Vyacheslav Egorov <vegorov@roblox.com>
---------
Co-authored-by: Aaron Weiss <aaronweiss@roblox.com>
Co-authored-by: Alexander McCord <amccord@roblox.com>
Co-authored-by: Andy Friesen <afriesen@roblox.com>
Co-authored-by: Aviral Goel <agoel@roblox.com>
Co-authored-by: David Cope <dcope@roblox.com>
Co-authored-by: Lily Brown <lbrown@roblox.com>
Co-authored-by: Vyacheslav Egorov <vegorov@roblox.com>
# What's Changed
## New Type Solver
- Many fixes to crashes, assertions, and hangs
- Binary type family aliases now have a default parameter
- Added a debug check for unsolved types escaping the constraint solver
- Overloaded functions are no longer inferred
- Unification creates additional subtyping constraints for blocked types
- Attempt to guess the result type for type families that are too large
to resolve timely
## Native Code Generation
- Fixed `IrCmd::CHECK_TRUTHY` lowering in a specific case
- Detailed compilation errors are now supported
- More work on the new allocator
---
# Internal Contributors
Co-authored-by: Aaron Weiss <aaronweiss@roblox.com>
Co-authored-by: Alexander McCord <amccord@roblox.com>
Co-authored-by: Andy Friesen <afriesen@roblox.com>
Co-authored-by: James McNellis <jmcnellis@roblox.com>
Co-authored-by: Lily Brown <lbrown@roblox.com>
Co-authored-by: Vyacheslav Egorov <vegorov@roblox.com>
Co-authored-by: Vighnesh Vijay <vvijay@roblox.com>
# What's changed
### Debugger
* Values after a 'continue' statement should not be accessible by
debugger in the 'until' condition
### New Type Solver
* Many fixes to crashes and hangs
* Better bidirectional inference of table literal expressions
### Native Code Generation
* Initial steps toward a shared code allocator
---
### Internal Contributors
Co-authored-by: Aaron Weiss <aaronweiss@roblox.com>
Co-authored-by: Lily Brown <lbrown@roblox.com>
Co-authored-by: Vyacheslav Egorov <vegorov@roblox.com>
# What's Changed
* Fix a case where the stack wasn't completely cleaned up where
`debug.info` errored when passed `"f"` option and a thread.
* Fix a case of uninitialized field in `luaF_newproto`.
### New Type Solver
* When a local is captured in a function, don't add a new entry to the
`DfgScope::bindings` if the capture occurs within a loop.
* Fix a poor performance characteristic during unification by not trying
to simplify an intersection.
* Fix a case of multiple constraints mutating the same blocked type
causing incorrect inferences.
* Fix a case of assertion failure when overload resolution encounters a
return typepack mismatch.
* When refining a property of the top `table` type, we no longer signal
an unknown property error.
* Fix a misuse of free types when trying to infer the type of a
subscript expression.
* Fix a case of assertion failure when trying to resolve an overload
from `never`.
### Native Code Generation
* Fix dead store optimization issues caused by partial stores.
---
### Internal Contributors
Co-authored-by: Aaron Weiss <aaronweiss@roblox.com>
Co-authored-by: Andy Friesen <afriesen@roblox.com>
Co-authored-by: Aviral Goel <agoel@roblox.com>
Co-authored-by: David Cope <dcope@roblox.com>
Co-authored-by: Lily Brown <lbrown@roblox.com>
Co-authored-by: Vyacheslav Egorov <vegorov@roblox.com>
---------
Co-authored-by: Aaron Weiss <aaronweiss@roblox.com>
Co-authored-by: Andy Friesen <afriesen@roblox.com>
Co-authored-by: Vighnesh <vvijay@roblox.com>
Co-authored-by: Aviral Goel <agoel@roblox.com>
Co-authored-by: David Cope <dcope@roblox.com>
Co-authored-by: Lily Brown <lbrown@roblox.com>
Co-authored-by: Vyacheslav Egorov <vegorov@roblox.com>
When lowering LOADK for booleans/numbers/nils, we deconstruct the
operation using STORE_TAG which informs the rest of the optimization
pipeline about the tag of the value. This is helpful to remove various
tag checks.
When the constant is a string or a vector, we just use
LOAD_TVALUE/STORE_TVALUE. For strings, this could be replaced by pointer
load/store, but for vectors there's no great alternative using current
IR ops; in either case, the optimization needs to be carefully examined
for profitability as simply copying constants into registers for
function calls could become more expensive.
However, there are cases where it's still valuable to preserve the tag.
For vectors, doing any math with vector constants contains tag checks
that could be removed. For both strings and vectors, storing them into a
table has a barrier that for vectors could be elided, and for strings
could be simplified as there's no need to confirm the tag.
With this change we now carry the optional tag of the value with
LOAD_TVALUE. This has no performance effect on existing benchmarks but
does reduce the generated code for benchmarks by ~0.1%, and it makes
vector code more efficient (~5% lift on X64 log1p approximation).
vm.mmap_rnd_bits has been recently changed to 32 on GHA, which triggers
issues in ASAN builds that spuriously fail on startup. The fix requires
a more recent clang/gcc than the agents have available (clang 17, not
sure what GCC version), so for now we need to work around this by
restricting the ASLR randomness.
See https://github.com/google/sanitizers/issues/1614
When the input is a constant, we use a fairly inefficient sequence of
fmov+fcvt+dup or, when the double isn't encodable in fmov,
adr+ldr+fcvt+dup.
Instead, we can use the same lowering as X64 when the input is a
constant, and load the vector from memory. However, if the constant is
encodable via fmov, we can use a vector fmov instead (which is just one
instruction and doesn't need constant space).
Fortunately the bit encoding of fmov for 32-bit floating point numbers
matches that of 64-bit: the decoding algorithm is a little different
because it expands into a larger exponent, but the values are
compatible, so if a double can be encoded into a scalar fmov with a
given abcdefgh pattern, the same pattern should encode the same float;
due to the very limited number of mantissa and exponent bits, all values
that are encodable are also exact in both 32-bit and 64-bit floats.
This strategy is ~same as what gcc uses. For complex vectors, we
previously used 4 instructions and 8 bytes of constant storage, and now
we use 2 instructions and 16 bytes of constant storage, so the memory
footprint is the same; for simple vectors we just need 1 instruction (4
bytes).
clang lowers vector constants a little differently, opting to synthesize
a 64-bit integer using 4 instructions (mov/movk) and then move it to the
vector register - this requires 5 instructions and 20 bytes, vs ours/gcc
2 instructions and 8+16=24 bytes. I tried a simpler version of this that
would be more compact - synthesize a 32-bit integer constant with
mov+movk, and move it to vector register via dup.4s - but this was a
little slower on M2, so for now we prefer the slightly larger version as
it's not a regression vs current implementation.
On the vector approximation benchmark we get:
- Before this PR (flag=false): ~7.85 ns/op
- After this PR (flag=true): ~7.74 ns/op
- After this PR, with 0.125 instead of 0.123 in the benchmark code (to
use fmov): ~7.52 ns/op
- Not part of this PR, but the mov/dup strategy described above: ~8.00
ns/op
The last line of the help message was missing a newline character. I
feel a little silly creating a pull request for a 2 character change but
it was bothering me. Fixes#1185
# What's Changed
* Add a compiler hint to improve Luau memory allocation inlining
### New Type Solver
* Added a system for recommending explicit type annotations to users in
cases where we've inferred complex generic types with type families.
* Marked string library functions as `@checked` for use in new
non-strict mode.
* Fixed a bug with new non-strict mode where we would incorrectly report
arity mismatches when missing optional arguments.
* Implement an occurs check for unifications that would produce
self-recursive types.
* Fix bug where overload resolution would fail when applied to
non-overloaded functions.
* Fix bug that caused the subtyping to report an error whenever a
generic was instantiated in an invariant context.
* Fix crash caused by `SetPropConstraint` not blocking properly.
### Native Code Generation
* Implement optimization to eliminate dead stores
* Optimize vector ops for X64 when the source is computed (thanks,
@zeux!)
* Use more efficient lowering for UNM_* (thanks, @zeux!)
---
### Internal Contributors
Co-authored-by: Aaron Weiss <aaronweiss@roblox.com>
Co-authored-by: Alexander McCord <amccord@roblox.com>
Co-authored-by: Andy Friesen <afriesen@roblox.com>
Co-authored-by: David Cope <dcope@roblox.com>
Co-authored-by: Lily Brown <lbrown@roblox.com>
Co-authored-by: Vyacheslav Egorov <vegorov@roblox.com>
---------
Co-authored-by: Alexander McCord <amccord@roblox.com>
Co-authored-by: Andy Friesen <afriesen@roblox.com>
Co-authored-by: Vighnesh <vvijay@roblox.com>
Co-authored-by: Aviral Goel <agoel@roblox.com>
Co-authored-by: David Cope <dcope@roblox.com>
Co-authored-by: Lily Brown <lbrown@roblox.com>
Co-authored-by: Vyacheslav Egorov <vegorov@roblox.com>
# What's changed?
* Luau allocation scheme was changed to handle allocations in 513-1024
byte range internally without falling back to global allocator
* coroutine/thread creation no longer requires any global allocations,
making it up to 15% faster (vs libc malloc)
* table construction for 17-32 keys or 33-64 array elements is up to 30%
faster (vs libc malloc)
### New Type Solver
* Cyclic unary negation type families are reduced to `number` when
possible
* Class types are skipped when searching for free types in unifier to
improve performance
* Fixed issues with table type inference when metatables are present
* Improved inference of iteration loop types
* Fixed an issue with bidirectional inference of method calls
* Type simplification will now preserve error suppression markers
### Native Code Generation
* Fixed TAG_VECTOR skip optimization to not break instruction use counts
(broken optimization wasn't included in 614)
* Fixed missing side-effect when optimizing generic loop preparation
instruction
---
### Internal Contributors
Co-authored-by: Aaron Weiss <aaronweiss@roblox.com>
Co-authored-by: Andy Friesen <afriesen@roblox.com>
Co-authored-by: Lily Brown <lbrown@roblox.com>
Co-authored-by: Vyacheslav Egorov <vegorov@roblox.com>
---------
Co-authored-by: Aaron Weiss <aaronweiss@roblox.com>
Co-authored-by: Alexander McCord <amccord@roblox.com>
Co-authored-by: Andy Friesen <afriesen@roblox.com>
Co-authored-by: Vighnesh <vvijay@roblox.com>
Co-authored-by: Aviral Goel <agoel@roblox.com>
Co-authored-by: David Cope <dcope@roblox.com>
Co-authored-by: Lily Brown <lbrown@roblox.com>
With the TAG_VECTOR change, we can now confidently distinguish cases
when the .w component
contains TVECTOR tag from cases where it doesn't: loads and tag ops
produce the tag, whereas
other instructions don't.
We now take advantage of this fact and only apply vandps with a mask
when we need to.
It would be possible to use a positive filter (explicitly checking for
source coming from ADD_VEC
et al), but there are more instructions to check this way and this is
purely an optimization so
it is allowed to be conservative (as in, the cost of a mistake here is a
potential slowdown,
not a correctness issue).
Additionally, this change only performs vandps once when the arguments
are the same instead
of doing it twice.
On the function that computes a polynomial approximation this change
makes it ~20% faster on Zen4.
When --vector-lib is not specified, CompileOptions::vectorLib was set to
an empty string. This resulted in the builtin matching not working,
since vectorLib must either be a null pointer or a pointer to a valid
global identifier.
---------
Co-authored-by: vegorov-rbx <75688451+vegorov-rbx@users.noreply.github.com>
# What's changed?
Add program argument passing to scripts run using the Luau REPL! You can
now pass `--program-args` (or shorthand `-a`) to the REPL which will
treat all remaining arguments as arguments to pass to executed scripts.
These values can be accessed through variadic argument expansion. You
can read these values like so:
```
local args = {...} -- gets you an array of all the arguments
```
For example if we run the following script like `luau test.lua -a test1
test2 test3`:
```
-- test.lua
print(...)
```
you should get the output:
```
test1 test2 test3
```
### Native Code Generation
* Improve A64 lowering for vector operations by using vector
instructions
* Fix lowering issue in IR value location tracking!
- A developer reported a divergence between code run in the VM and
Native Code Generation which we have now fixed
### New Type Solver
* Apply substitution to type families, and emit new constraints to
reduce those further
* More progress on reducing comparison (`lt/le`)type families
* Resolve two major sources of cyclic types in the new solver
### Miscellaneous
* Turned internal compiler errors (ICE's) into warnings and errors
-------
Co-authored-by: Aaron Weiss <aaronweiss@roblox.com>
Co-authored-by: Alexander McCord <amccord@roblox.com>
Co-authored-by: Andy Friesen <afriesen@roblox.com>
Co-authored-by: Aviral Goel <agoel@roblox.com>
Co-authored-by: Vyacheslav Egorov <vegorov@roblox.com>
---------
Co-authored-by: Aaron Weiss <aaronweiss@roblox.com>
Co-authored-by: Alexander McCord <amccord@roblox.com>
Co-authored-by: Andy Friesen <afriesen@roblox.com>
Co-authored-by: Aviral Goel <agoel@roblox.com>
Co-authored-by: David Cope <dcope@roblox.com>
Co-authored-by: Lily Brown <lbrown@roblox.com>
Co-authored-by: Vyacheslav Egorov <vegorov@roblox.com>
Instead of patching the tag component with TVECTOR in every instruction
that produces a vector value, we now use a separate IR instruction to do
this. This reduces implementation redundancy, but more importantly
allows for a class of optimizations:
- NUM_TO_VECTOR previously patched the component unconditionally but the
result was used only in MUL/DIV_VEC instructions that ignore it anyway;
we can now remove this.
- ADD_VEC et al can now forward the source of TAG_VECTOR instruction of
either input; this shortens the latency chain and in the future could
allow us to generate optimal vector instruction sequence once the
temporary stores are marked as dead.
- In the future on X64, ADD_VEC et al will be able to analyze the input
instruction and remove tag masking conditionally. This is not part of
this PR as it requires a decision around expected FP environment and/or
the necessity of the existing masking to begin with.
I've also renamed NUM_TO_VECTOR to NUM_TO_VEC so that "VEC" always
refers to "3 float values" and for consistency with ADD/etc.
Note: ADD_VEC input forwarding is currently performed unconditionally;
it may or may not increase the spills that can't be reloaded from the
stack.
On A64 this makes the Taylor series computation a tiny bit faster
(11.3ns => 11.0ns) as it removes the redundant ins instructions along
the NUM_TO_VEC path. Curiously, the optimization of forwarding
TAG_VECTOR input to arithmetic instructions actually has a small penalty
as without it this PR runs at 10.9 ns. I don't know if this is a
property of the benchmark though, as I just noticed that in this
benchmark type inference actually fails to infer parts of the
computation as a vector op. If desired I will happily omit this part of
the change and we can explore that separately.