# What's changed?
* Luau allocation scheme was changed to handle allocations in 513-1024
byte range internally without falling back to global allocator
* coroutine/thread creation no longer requires any global allocations,
making it up to 15% faster (vs libc malloc)
* table construction for 17-32 keys or 33-64 array elements is up to 30%
faster (vs libc malloc)
### New Type Solver
* Cyclic unary negation type families are reduced to `number` when
possible
* Class types are skipped when searching for free types in unifier to
improve performance
* Fixed issues with table type inference when metatables are present
* Improved inference of iteration loop types
* Fixed an issue with bidirectional inference of method calls
* Type simplification will now preserve error suppression markers
### Native Code Generation
* Fixed TAG_VECTOR skip optimization to not break instruction use counts
(broken optimization wasn't included in 614)
* Fixed missing side-effect when optimizing generic loop preparation
instruction
---
### Internal Contributors
Co-authored-by: Aaron Weiss <aaronweiss@roblox.com>
Co-authored-by: Andy Friesen <afriesen@roblox.com>
Co-authored-by: Lily Brown <lbrown@roblox.com>
Co-authored-by: Vyacheslav Egorov <vegorov@roblox.com>
---------
Co-authored-by: Aaron Weiss <aaronweiss@roblox.com>
Co-authored-by: Alexander McCord <amccord@roblox.com>
Co-authored-by: Andy Friesen <afriesen@roblox.com>
Co-authored-by: Vighnesh <vvijay@roblox.com>
Co-authored-by: Aviral Goel <agoel@roblox.com>
Co-authored-by: David Cope <dcope@roblox.com>
Co-authored-by: Lily Brown <lbrown@roblox.com>
With the TAG_VECTOR change, we can now confidently distinguish cases
when the .w component
contains TVECTOR tag from cases where it doesn't: loads and tag ops
produce the tag, whereas
other instructions don't.
We now take advantage of this fact and only apply vandps with a mask
when we need to.
It would be possible to use a positive filter (explicitly checking for
source coming from ADD_VEC
et al), but there are more instructions to check this way and this is
purely an optimization so
it is allowed to be conservative (as in, the cost of a mistake here is a
potential slowdown,
not a correctness issue).
Additionally, this change only performs vandps once when the arguments
are the same instead
of doing it twice.
On the function that computes a polynomial approximation this change
makes it ~20% faster on Zen4.
When --vector-lib is not specified, CompileOptions::vectorLib was set to
an empty string. This resulted in the builtin matching not working,
since vectorLib must either be a null pointer or a pointer to a valid
global identifier.
---------
Co-authored-by: vegorov-rbx <75688451+vegorov-rbx@users.noreply.github.com>
# What's changed?
Add program argument passing to scripts run using the Luau REPL! You can
now pass `--program-args` (or shorthand `-a`) to the REPL which will
treat all remaining arguments as arguments to pass to executed scripts.
These values can be accessed through variadic argument expansion. You
can read these values like so:
```
local args = {...} -- gets you an array of all the arguments
```
For example if we run the following script like `luau test.lua -a test1
test2 test3`:
```
-- test.lua
print(...)
```
you should get the output:
```
test1 test2 test3
```
### Native Code Generation
* Improve A64 lowering for vector operations by using vector
instructions
* Fix lowering issue in IR value location tracking!
- A developer reported a divergence between code run in the VM and
Native Code Generation which we have now fixed
### New Type Solver
* Apply substitution to type families, and emit new constraints to
reduce those further
* More progress on reducing comparison (`lt/le`)type families
* Resolve two major sources of cyclic types in the new solver
### Miscellaneous
* Turned internal compiler errors (ICE's) into warnings and errors
-------
Co-authored-by: Aaron Weiss <aaronweiss@roblox.com>
Co-authored-by: Alexander McCord <amccord@roblox.com>
Co-authored-by: Andy Friesen <afriesen@roblox.com>
Co-authored-by: Aviral Goel <agoel@roblox.com>
Co-authored-by: Vyacheslav Egorov <vegorov@roblox.com>
---------
Co-authored-by: Aaron Weiss <aaronweiss@roblox.com>
Co-authored-by: Alexander McCord <amccord@roblox.com>
Co-authored-by: Andy Friesen <afriesen@roblox.com>
Co-authored-by: Aviral Goel <agoel@roblox.com>
Co-authored-by: David Cope <dcope@roblox.com>
Co-authored-by: Lily Brown <lbrown@roblox.com>
Co-authored-by: Vyacheslav Egorov <vegorov@roblox.com>
Instead of patching the tag component with TVECTOR in every instruction
that produces a vector value, we now use a separate IR instruction to do
this. This reduces implementation redundancy, but more importantly
allows for a class of optimizations:
- NUM_TO_VECTOR previously patched the component unconditionally but the
result was used only in MUL/DIV_VEC instructions that ignore it anyway;
we can now remove this.
- ADD_VEC et al can now forward the source of TAG_VECTOR instruction of
either input; this shortens the latency chain and in the future could
allow us to generate optimal vector instruction sequence once the
temporary stores are marked as dead.
- In the future on X64, ADD_VEC et al will be able to analyze the input
instruction and remove tag masking conditionally. This is not part of
this PR as it requires a decision around expected FP environment and/or
the necessity of the existing masking to begin with.
I've also renamed NUM_TO_VECTOR to NUM_TO_VEC so that "VEC" always
refers to "3 float values" and for consistency with ADD/etc.
Note: ADD_VEC input forwarding is currently performed unconditionally;
it may or may not increase the spills that can't be reloaded from the
stack.
On A64 this makes the Taylor series computation a tiny bit faster
(11.3ns => 11.0ns) as it removes the redundant ins instructions along
the NUM_TO_VEC path. Curiously, the optimization of forwarding
TAG_VECTOR input to arithmetic instructions actually has a small penalty
as without it this PR runs at 10.9 ns. I don't know if this is a
property of the benchmark though, as I just noticed that in this
benchmark type inference actually fails to infer parts of the
computation as a vector op. If desired I will happily omit this part of
the change and we can explore that separately.
This change replaces scalar versions of vector opcodes for A64 with
actual vector instructions.
We take the approach similar to X64: patch last component with zero,
perform the math, patch last component with type tag. I'm hoping that in
the future the type tag will be placed separately (separate IR opcode?)
because right now chains of math operations result in excessive type tag
operations.
To patch the type tag without always keeping a mask in a register,
ins.4s instructions can be used; unfortunately it's only capable of
patching a register in-place, so we need an extra register copy in case
it's not last-use. Usually it's last-use so the patch is free; probably
with IR rework mentioned above all of this can be improved (e.g.
load-with-patch will never need to copy).
~It's not 100% clear if we *have* to patch type tag: Apple does preserve
denormals but we'd need to benchmark this to see if there's an actual
performance impact. But for now we're playing it safe.~
This was tested by running the conformance tests, and new opcode
implementations were checked by comparing the result with
https://armconverter.com/.
Performance testing is complicated by the fact that OSS Luau doesn't
support vector constructor out of the box, and other limitations of
codegen. I've hacked vector constructor/type into REPL and confirmed
that on a test that calls this function in a loop (not inlined):
```
function fma(a: vector, b: vector, c: vector)
return a * b + c
end
```
... this PR improves performance by ~6% (note that probably most of the
overhead here is the call dispatch; I didn't want to brave testing a
more complex expression). The assembly for an individual operation
changes as follows:
Before:
```
# %14 = MUL_VEC %12, %13 ; useCount: 2, lastUse: %22
dup s29,v31.s[0]
dup s28,v30.s[0]
fmul s29,s29,s28
ins v31.s[0],v29.s[0]
dup s29,v31.s[1]
dup s28,v30.s[1]
fmul s29,s29,s28
ins v31.s[1],v29.s[0]
dup s29,v31.s[2]
dup s28,v30.s[2]
fmul s29,s29,s28
ins v31.s[2],v29.s[0]
```
After:
```
# %14 = MUL_VEC %12, %13 ; useCount: 2, lastUse: %22
ins v31.s[3],w31
ins v30.s[3],w31
fmul v31.4s,v31.4s,v30.4s
movz w17,#4
ins v31.s[3],w17
```
**edit** final form (see comments):
```
# %14 = MUL_VEC %12, %13 ; useCount: 2, lastUse: %22
fmul v31.4s,v31.4s,v30.4s
movz w17,#4
ins v31.s[3],w17
```
# What's changed?
* Compiler now targets bytecode version 5 by default, this includes
support for vector type literals and sub/div opcodes with a constant on
lhs
### New Type Solver
* Normalizer type inhabitance check has been optimized
* Added ability to reduce cyclic `and`/`or` type families
### Native Code Generation
* `CodeGen::compile` now returns more specific causes of a code
generation failure
* Fixed linking issues on platforms that don't support unwind frame data
registration
---
### Internal Contributors
Co-authored-by: Andy Friesen <afriesen@roblox.com>
Co-authored-by: Aviral Goel <agoel@roblox.com>
Co-authored-by: Vyacheslav Egorov <vegorov@roblox.com>
---------
Co-authored-by: Aaron Weiss <aaronweiss@roblox.com>
Co-authored-by: Alexander McCord <amccord@roblox.com>
Co-authored-by: Andy Friesen <afriesen@roblox.com>
Co-authored-by: Vighnesh <vvijay@roblox.com>
Co-authored-by: Aviral Goel <agoel@roblox.com>
Co-authored-by: David Cope <dcope@roblox.com>
Co-authored-by: Lily Brown <lbrown@roblox.com>
New solver
* Fix bugs where bidirectional type inference would fail to take effect
at the proper stage.
* Improve inference of mutually recursive functions
* Fix crashes
---------
Co-authored-by: Aaron Weiss <aaronweiss@roblox.com>
Co-authored-by: Andy Friesen <afriesen@roblox.com>
Co-authored-by: Vyacheslav Egorov <vegorov@roblox.com>
# What's changed?
### Native Code Generation
* Fixed an UAF relating to reusing a hash key after a weak table has
undergone some GC.
* Fixed a bounds check on arm64 to allow access to the last byte of a
buffer.
### New Type Solver
* Type states now preserves error-suppression, i.e. `local x: any = 5`
and `x.foo` does not error.
* Made error-suppression logic in subtyping more accurate.
* Subtyping now knows how to reduce type families.
* Fixed function call overload resolution so that the return type
resolves to the correct overload.
* Fixed a case where we attempted to reduce irreducible type families a
few too many times, leading to duplicate errors.
* Type checker needs to type check annotations in function signatures to
be able to report errors relating to those annotations.
* Fixed an UAF from a pointer to stack-allocated data in Subtyping's
`explainReasonings`.
### Nonstrict Type Checker
* Fixed a crash when calling a checked function of the form `math.abs`
with an incorrect argument type.
* Fixed a crash when calling a checked function with a number of
arguments that did not exactly match the number of parameters required.
---
### Internal Contributors
Co-authored-by: Aaron Weiss <aaronweiss@roblox.com>
Co-authored-by: Andy Friesen <afriesen@roblox.com>
Co-authored-by: Vyacheslav Egorov <vegorov@roblox.com>
---------
Co-authored-by: Aaron Weiss <aaronweiss@roblox.com>
Co-authored-by: Andy Friesen <afriesen@roblox.com>
Co-authored-by: Vighnesh <vvijay@roblox.com>
Co-authored-by: Aviral Goel <agoel@roblox.com>
Co-authored-by: David Cope <dcope@roblox.com>
Co-authored-by: Lily Brown <lbrown@roblox.com>
Co-authored-by: Vyacheslav Egorov <vegorov@roblox.com>
GitHub just released support for M1 runners for open source projects via
GHA. It can be selected by using macos-14 OS to run on.
This is very valuable because until now, Luau wasn't tested in OSS CI on
Arm64, despite the presence of a fairly large volume of AArch64 specific
code courtesy of A64 codegen. This change fixes that.
In the future we could also expand codecov reporting as it seems to work
well but it needs a little bit of cleanup for the build in question so
let's start small.
Currently calling `luaL_sandbox` on Lua instance without loaded strlib
causes crash (assertion).
It happens because inside `luaL_sandbox` there is no check that
metatable for strings is present.
# What's changed?
* Check interrupt handler inside the pattern match engine to eliminate
potential for programs to hang during string library function execution.
* Allow iteration over table properties to pass the old type solver.
### Native Code Generation
* Use in-place memory operands for math library operations on x64.
* Replace opaque bools with separate enum classes in IrDump to improve
code maintainability.
* Translate operations on inferred vectors to IR.
* Enable support for debugging native-compiled functions in Roblox
Studio.
### New Type Solver
* Rework type inference for boolean and string literals to introduce
bounded free types (bounded below by the singleton type, and above by
the primitive type) and reworked primitive type constraint to decide
which is the appropriate type for the literal.
* Introduce `FunctionCheckConstraint` to handle bidirectional
typechecking for function calls, pushing the expected parameter types
from the function onto the arguments.
* Introduce `union` and `intersect` type families to compute deferred
simplified unions and intersections to be employed by the constraint
generation logic in the new solver.
* Implement support for expanding the domain of local types in
`Unifier2`.
* Rework type inference for iteration variables bound by for in loops to
use local types.
* Change constraint blocking logic to use a set to prevent accidental
re-blocking.
* Add logic to detect missing return statements in functions.
### Internal Contributors
Co-authored-by: Aaron Weiss <aaronweiss@roblox.com>
Co-authored-by: Alexander McCord <amccord@roblox.com>
Co-authored-by: Andy Friesen <afriesen@roblox.com>
Co-authored-by: Aviral Goel <agoel@roblox.com>
Co-authored-by: Vyacheslav Egorov <vegorov@roblox.com>
---------
Co-authored-by: Alexander McCord <amccord@roblox.com>
Co-authored-by: Andy Friesen <afriesen@roblox.com>
Co-authored-by: Vighnesh <vvijay@roblox.com>
Co-authored-by: Aviral Goel <agoel@roblox.com>
Co-authored-by: David Cope <dcope@roblox.com>
Co-authored-by: Lily Brown <lbrown@roblox.com>
Co-authored-by: Vyacheslav Egorov <vegorov@roblox.com>
### What's changed?
* Syntax for [read-only and write-only
properties](https://github.com/luau-lang/rfcs/pull/15) is now parsed,
but is not yet supported in typechecking
### New Type Solver
* `keyof` and `rawkeyof` type operators have been updated to match final
text of the [RFC](https://github.com/luau-lang/rfcs/pull/16)
* Fixed issues with cyclic type families that were generated for mutable
loop variables
### Native Code Generation
* Fixed inference for number / vector operation that caused an
unnecessary VM assist
---
### Internal Contributors
Co-authored-by: Aaron Weiss <aaronweiss@roblox.com>
Co-authored-by: Andy Friesen <afriesen@roblox.com>
Co-authored-by: Lily Brown <lbrown@roblox.com>
Co-authored-by: Vyacheslav Egorov <vegorov@roblox.com>
# Old Solver:
- Fix a bug in the old solver where a user could use the keyword
`typeof` as the name of a type alias.
- Fix stringification of scientific notation to omit a trailing decimal
place when not followed by a digit e.g. `1.e+20` -> `1e+20`
# New Solver
- Continuing work on the New non-strict mode
- Introduce `keyof` and `rawkeyof` type function for acquiring the type
of all keys in a table or class
(https://github.com/luau-lang/rfcs/pull/16)
---
Co-authored-by: Aaron Weiss <aaronweiss@roblox.com>
Co-authored-by: Alexander McCord <amccord@roblox.com>
Co-authored-by: Andy Friesen <afriesen@roblox.com>
Co-authored-by: Aviral Goel <agoel@roblox.com>
Co-authored-by: Lily Brown <lbrown@roblox.com>
Co-authored-by: Vyacheslav Egorov <vegorov@roblox.com>
Co-authored-by: Vighnesh Vijay <vvijay@roblox.com>
---------
Co-authored-by: Aaron Weiss <aaronweiss@roblox.com>
Co-authored-by: Alexander McCord <amccord@roblox.com>
Co-authored-by: Andy Friesen <afriesen@roblox.com>
Co-authored-by: Aviral Goel <agoel@roblox.com>
Co-authored-by: David Cope <dcope@roblox.com>
Co-authored-by: Lily Brown <lbrown@roblox.com>
Co-authored-by: Vyacheslav Egorov <vegorov@roblox.com>
# What's changed?
* Fix up the `std::iterator_traits` definitions for some Luau data
structures.
* Replace some of the usages of `std::unordered_set` and
`std::unordered_map` with Luau-provided data structures to increase
performance and reduce overall number of heap allocations.
* Update some of the documentation links in comments throughout the
codebase to correctly point to the moved repository.
* Expanded JSON encoder for AST to support singleton types.
* Fixed a bug in `luau-analyze` where exceptions in the last module
being checked during multithreaded analysis would not be rethrown.
### New type solver
* Introduce a `refine` type family to handle deferred refinements during
type inference, replacing the old `RefineConstraint`.
* Continued work on the implementation of type states, fixing some known
bugs/blockers.
* Added support for variadic functions in new non-strict mode, enabling
broader support for builtins and the Roblox API.
### Internal Contributors
Co-authored-by: Aaron Weiss <aaronweiss@roblox.com>
Co-authored-by: Alexander McCord <amccord@roblox.com>
Co-authored-by: Andy Friesen <afriesen@roblox.com>
Co-authored-by: Vyacheslav Egorov <vegorov@roblox.com>
---------
Co-authored-by: Alexander McCord <amccord@roblox.com>
Co-authored-by: Andy Friesen <afriesen@roblox.com>
Co-authored-by: Vighnesh <vvijay@roblox.com>
Co-authored-by: Aviral Goel <agoel@roblox.com>
Co-authored-by: David Cope <dcope@roblox.com>
Co-authored-by: Lily Brown <lbrown@roblox.com>
Co-authored-by: Vyacheslav Egorov <vegorov@roblox.com>