mirror of
https://github.com/luau-lang/luau.git
synced 2024-11-15 14:25:44 +08:00
9aa82c6fb9
When the input is a constant, we currently use a fairly inefficient sequence: fmov+fcvt+dup or, when the double isn't encodable in fmov, adr+ldr+fcvt+dup. Instead, when the input is a constant, we can use the same lowering as X64 and load the vector from memory. However, if the constant is encodable via fmov, we can use a vector fmov instead, which is a single instruction and needs no constant space.

Fortunately, the bit encoding of fmov for 32-bit floating point numbers matches that of 64-bit. The decoding algorithm is a little different because it expands into a larger exponent, but the values are compatible: if a double can be encoded into a scalar fmov with a given abcdefgh pattern, the same pattern encodes the same float. Due to the very limited number of mantissa and exponent bits, all encodable values are exact in both 32-bit and 64-bit floats. This strategy is roughly the same as what gcc uses.

For vectors whose constant isn't fmov-encodable, we previously used 4 instructions and 8 bytes of constant storage; we now use 2 instructions and 16 bytes of constant storage, so the memory footprint is the same (4*4+8 = 2*4+16 = 24 bytes). For fmov-encodable vectors we need just 1 instruction (4 bytes).

clang lowers vector constants a little differently: it synthesizes a 64-bit integer using 4 instructions (mov/movk) and then moves it to the vector register. This requires 5 instructions and 20 bytes, vs. 2 instructions and 8+16 = 24 bytes for ours/gcc.

I tried a simpler, more compact version of this (synthesize a 32-bit integer constant with mov+movk and move it to the vector register via dup.4s), but it was a little slower on M2, so for now we prefer the slightly larger version, as it's not a regression vs. the current implementation.
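The encodability argument can be checked mechanically. The sketch below is an illustration only: `expandImm8ToDouble`, `expandImm8ToFloat`, `findFmovImm8`, `VectorConstCost`, `chooseVectorConstLowering` and `selfCheck` are hypothetical names, not Luau's actual AssemblyBuilderA64 or IR lowering API. It assumes the Arm `VFPExpandImm` rule for the 8-bit abcdefgh immediate (sign = a, exponent = NOT(b) followed by replicated b followed by cd, fraction = efgh followed by zeros), and it assumes the 2-instruction memory path is an address computation plus a 128-bit load.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>

// Hypothetical helpers modelling AArch64 FMOV (immediate) decoding:
// imm8 = abcdefgh encodes +/-(1 + efgh/16) * 2^e with e in [-3, 4].
static uint64_t expandImm8ToDoubleBits(uint8_t imm8)
{
    uint64_t a = (imm8 >> 7) & 1, b = (imm8 >> 6) & 1;
    uint64_t cd = (imm8 >> 4) & 3, efgh = imm8 & 15;
    // 11-bit exponent: NOT(b), b replicated 8 times, then cd
    uint64_t exponent = ((b ^ 1) << 10) | ((b ? 0xFFull : 0ull) << 2) | cd;
    return (a << 63) | (exponent << 52) | (efgh << 48);
}

static uint32_t expandImm8ToFloatBits(uint8_t imm8)
{
    uint32_t a = (imm8 >> 7) & 1, b = (imm8 >> 6) & 1;
    uint32_t cd = (imm8 >> 4) & 3, efgh = imm8 & 15;
    // 8-bit exponent: NOT(b), b replicated 5 times, then cd
    uint32_t exponent = ((b ^ 1) << 7) | ((b ? 0x1Fu : 0u) << 2) | cd;
    return (a << 31) | (exponent << 23) | (efgh << 19);
}

static double expandImm8ToDouble(uint8_t imm8)
{
    uint64_t bits = expandImm8ToDoubleBits(imm8);
    double d;
    std::memcpy(&d, &bits, sizeof(d)); // bit cast
    return d;
}

static float expandImm8ToFloat(uint8_t imm8)
{
    uint32_t bits = expandImm8ToFloatBits(imm8);
    float f;
    std::memcpy(&f, &bits, sizeof(f)); // bit cast
    return f;
}

// Returns the abcdefgh pattern if the double is encodable by scalar FMOV
// (immediate); zero is not representable in this form.
static std::optional<uint8_t> findFmovImm8(double value)
{
    for (int imm = 0; imm < 256; ++imm)
        if (expandImm8ToDouble(uint8_t(imm)) == value)
            return uint8_t(imm);
    return std::nullopt;
}

// Every pattern decodes to the same value at 32-bit and 64-bit width, so the
// double's pattern can be reused for a 4S vector FMOV.
static bool allPatternsAgree()
{
    for (int imm = 0; imm < 256; ++imm)
        if (double(expandImm8ToFloat(uint8_t(imm))) != expandImm8ToDouble(uint8_t(imm)))
            return false;
    return true;
}

struct VectorConstCost
{
    bool useVectorFmov;
    int instructions;  // each A64 instruction is 4 bytes
    int constantBytes; // bytes of constant storage
};

// Sketch of the selection: vector fmov when encodable, else load from memory.
static VectorConstCost chooseVectorConstLowering(double value)
{
    if (findFmovImm8(value))
        return {true, 1, 0};
    return {false, 2, 16};
}

// Usage: the benchmark constant 0.125 takes the one-instruction path,
// while 0.123 falls back to the 16-byte constant load.
static void selfCheck()
{
    assert(allPatternsAgree());
    assert(chooseVectorConstLowering(0.125).useVectorFmov);
    assert(!chooseVectorConstLowering(0.123).useVectorFmov);
}
```

The brute-force search over all 256 patterns keeps the sketch obviously correct; a real assembler would more likely test the exponent range and low mantissa bits directly.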
On the vector approximation benchmark we get:
- Before this PR (flag=false): ~7.85 ns/op
- After this PR (flag=true): ~7.74 ns/op
- After this PR, with 0.125 instead of 0.123 in the benchmark code (to use fmov): ~7.52 ns/op
- Not part of this PR, but the mov/dup strategy described above: ~8.00 ns/op
conformance
require
AssemblyBuilderA64.test.cpp
AssemblyBuilderX64.test.cpp
AstJsonEncoder.test.cpp
AstQuery.test.cpp
AstQueryDsl.cpp
AstQueryDsl.h
AstVisitor.test.cpp
Autocomplete.test.cpp
BuiltinDefinitions.test.cpp
ClassFixture.cpp
ClassFixture.h
CodeAllocator.test.cpp
Compiler.test.cpp
Config.test.cpp
Conformance.test.cpp
ConstraintGeneratorFixture.cpp
ConstraintGeneratorFixture.h
ConstraintSolver.test.cpp
CostModel.test.cpp
DataFlowGraph.test.cpp
DenseHash.test.cpp
DiffAsserts.cpp
DiffAsserts.h
Differ.test.cpp
Error.test.cpp
Fixture.cpp
Fixture.h
Frontend.test.cpp
InsertionOrderedMap.test.cpp
IostreamOptional.h
IrBuilder.test.cpp
IrCallWrapperX64.test.cpp
IrLowering.test.cpp
IrRegAllocX64.test.cpp
JsonEmitter.test.cpp
Lexer.test.cpp
Linter.test.cpp
LValue.test.cpp
main.cpp
Module.test.cpp
NonstrictMode.test.cpp
NonStrictTypeChecker.test.cpp
Normalize.test.cpp
NotNull.test.cpp
Parser.test.cpp
RegisterCallbacks.cpp
RegisterCallbacks.h
Repl.test.cpp
RequireByString.test.cpp
RequireTracer.test.cpp
RuntimeLimits.test.cpp
ScopedFlags.h
Set.test.cpp
Simplify.test.cpp
StringUtils.test.cpp
Subtyping.test.cpp
Symbol.test.cpp
ToDot.test.cpp
TopoSort.test.cpp
ToString.test.cpp
Transpiler.test.cpp
TxnLog.test.cpp
TypeFamily.test.cpp
TypeInfer.aliases.test.cpp
TypeInfer.annotations.test.cpp
TypeInfer.anyerror.test.cpp
TypeInfer.builtins.test.cpp
TypeInfer.cfa.test.cpp
TypeInfer.classes.test.cpp
TypeInfer.definitions.test.cpp
TypeInfer.functions.test.cpp
TypeInfer.generics.test.cpp
TypeInfer.intersectionTypes.test.cpp
TypeInfer.loops.test.cpp
TypeInfer.modules.test.cpp
TypeInfer.negations.test.cpp
TypeInfer.oop.test.cpp
TypeInfer.operators.test.cpp
TypeInfer.primitives.test.cpp
TypeInfer.provisional.test.cpp
TypeInfer.refinements.test.cpp
TypeInfer.singletons.test.cpp
TypeInfer.tables.test.cpp
TypeInfer.test.cpp
TypeInfer.tryUnify.test.cpp
TypeInfer.typePacks.test.cpp
TypeInfer.typestates.test.cpp
TypeInfer.unionTypes.test.cpp
TypeInfer.unknownnever.test.cpp
TypePack.test.cpp
TypePath.test.cpp
TypeVar.test.cpp
Unifier2.test.cpp
Variant.test.cpp
VecDeque.test.cpp
VisitType.test.cpp