luau/CodeGen/include/Luau
Arseny Kapoulkine 9aa82c6fb9
CodeGen: Improve lowering of NUM_TO_VEC on A64 for constants (#1194)
When the input is a constant, we use a fairly inefficient sequence of
fmov+fcvt+dup or, when the double isn't encodable in fmov,
adr+ldr+fcvt+dup.

Instead, we can use the same lowering as X64 when the input is a
constant, and load the vector from memory. However, if the constant is
encodable via fmov, we can use a vector fmov instead (which is just one
instruction and doesn't need constant space).

Fortunately, the bit encoding of fmov for 32-bit floating-point numbers
matches that of 64-bit: the decoding algorithm is a little different
because it expands into a larger exponent, but the values are
compatible. If a double can be encoded into a scalar fmov with a given
abcdefgh pattern, the same pattern should encode the same float. Due to
the very limited number of mantissa and exponent bits, all encodable
values are also exact in both 32-bit and 64-bit floats.

This strategy is roughly the same as what gcc uses. For complex vectors
(constants that fmov can't encode), we previously used 4 instructions and
8 bytes of constant storage, and now we use 2 instructions and 16 bytes
of constant storage, so the memory footprint is the same; for simple
vectors we just need 1 instruction (4 bytes).
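
The footprint comparison works out as follows. This assumes every A64 instruction occupies 4 bytes, since A64 uses fixed-width 32-bit encodings. Here "complex" means a constant that needs the adr+ldr path, and "simple" means one that fmov can encode.

```latex
\begin{aligned}
\text{before, complex} &= 4 \times 4\,\text{B} + 8\,\text{B} = 24\,\text{B} \\
\text{after, complex}  &= 2 \times 4\,\text{B} + 16\,\text{B} = 24\,\text{B} \\
\text{before, simple}  &= 3 \times 4\,\text{B} = 12\,\text{B} \\
\text{after, simple}   &= 1 \times 4\,\text{B} = 4\,\text{B}
\end{aligned}
```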

clang lowers vector constants a little differently, opting to synthesize
a 64-bit integer using 4 instructions (mov/movk) and then move it to the
vector register; this requires 5 instructions and 20 bytes, versus
2 instructions and 8+16=24 bytes for ours/gcc. I tried a simpler, more
compact version of this (synthesize a 32-bit integer constant with
mov+movk and move it to the vector register via dup.4s), but it was a
little slower on M2, so for now we prefer the slightly larger version, as
it's not a regression versus the current implementation.

On the vector approximation benchmark we get:

- Before this PR (flag=false): ~7.85 ns/op
- After this PR (flag=true): ~7.74 ns/op
- After this PR, with 0.125 instead of 0.123 in the benchmark code (to
use fmov): ~7.52 ns/op
- Not part of this PR, but the mov/dup strategy described above: ~8.00
ns/op
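
For scale, these are the approximate relative changes against the ~7.85 ns/op baseline. They are computed only from the rounded figures above, so they are equally approximate.

```latex
\begin{aligned}
\text{after, } 0.123: &\quad (7.74 - 7.85) / 7.85 \approx -1.4\% \\
\text{after, } 0.125 \text{ (fmov)}: &\quad (7.52 - 7.85) / 7.85 \approx -4.2\% \\
\text{mov/dup strategy}: &\quad (8.00 - 7.85) / 7.85 \approx +1.9\%
\end{aligned}
```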
2024-03-13 12:56:11 -07:00
AddressA64.h Sync to upstream/release/613 (#1167) 2024-02-15 18:04:39 -08:00
AssemblyBuilderA64.h CodeGen: Improve lowering of NUM_TO_VEC on A64 for constants (#1194) 2024-03-13 12:56:11 -07:00
AssemblyBuilderX64.h Sync to upstream/release/613 (#1167) 2024-02-15 18:04:39 -08:00
BytecodeAnalysis.h Sync to upstream/release/605 (#1118) 2023-12-01 23:46:57 -08:00
BytecodeSummary.h Sync to upstream/release/613 (#1167) 2024-02-15 18:04:39 -08:00
CodeAllocator.h Sync to upstream/release/590 (#1008) 2023-08-11 07:42:37 -07:00
CodeBlockUnwind.h Sync to upstream/release/576 (#928) 2023-05-12 10:50:47 -07:00
CodeGen.h Sync to upstream/release/613 (#1167) 2024-02-15 18:04:39 -08:00
CodeGenCommon.h Sync to upstream/release/613 (#1167) 2024-02-15 18:04:39 -08:00
ConditionA64.h Sync to upstream/release/573 (#903) 2023-04-21 15:14:26 -07:00
ConditionX64.h Sync to upstream/release/613 (#1167) 2024-02-15 18:04:39 -08:00
IrAnalysis.h Sync to upstream/release/613 (#1167) 2024-02-15 18:04:39 -08:00
IrBuilder.h Sync to upstream/release/605 (#1118) 2023-12-01 23:46:57 -08:00
IrCallWrapperX64.h Sync to upstream/release/572 (#899) 2023-04-14 11:06:22 -07:00
IrData.h Sync to upstream/release/616 (#1184) 2024-03-08 16:47:53 -08:00
IrDump.h Sync to upstream/release/610 (#1154) 2024-01-26 19:20:56 -08:00
IrRegAllocX64.h Sync to upstream/release/596 (#1050) 2023-09-22 12:12:15 -07:00
IrUtils.h CodeGen: Extract all vector tag patching into TAG_VECTOR (#1171) 2024-02-21 07:06:11 -08:00
IrVisitUseDef.h Sync to upstream/release/616 (#1184) 2024-03-08 16:47:53 -08:00
Label.h Sync to upstream/release/529 (#505) 2022-05-26 15:08:16 -07:00
OperandX64.h Sync to upstream/release/613 (#1167) 2024-02-15 18:04:39 -08:00
OptimizeConstProp.h Sync to upstream/release/574 (#910) 2023-04-28 12:55:13 -07:00
OptimizeDeadStore.h Sync to upstream/release/616 (#1184) 2024-03-08 16:47:53 -08:00
OptimizeFinalX64.h Sync to upstream/release/563 (#833) 2023-02-10 11:40:38 -08:00
RegisterA64.h Sync to upstream/release/613 (#1167) 2024-02-15 18:04:39 -08:00
RegisterX64.h Sync to upstream/release/613 (#1167) 2024-02-15 18:04:39 -08:00
UnwindBuilder.h Sync to upstream/release/592 (#1018) 2023-08-25 10:23:55 -07:00
UnwindBuilderDwarf2.h Sync to upstream/release/592 (#1018) 2023-08-25 10:23:55 -07:00
UnwindBuilderWin.h Sync to upstream/release/592 (#1018) 2023-08-25 10:23:55 -07:00