luau/CodeGen at 80928acb92d1e4b6db16bada6d21b1fb6fa66265 - luau

mirror of https://github.com/luau-lang/luau.git synced 2024-11-15 06:15:44 +08:00


Arseny Kapoulkine 80928acb92 CodeGen: Extract all vector tag patching into TAG_VECTOR (#1171)	2024-02-21 07:06:11 -08:00
..
include	CodeGen: Extract all vector tag patching into TAG_VECTOR (#1171)	2024-02-21 07:06:11 -08:00
src	CodeGen: Extract all vector tag patching into TAG_VECTOR (#1171)	2024-02-21 07:06:11 -08:00

Arseny Kapoulkine 80928acb92

CodeGen: Extract all vector tag patching into TAG_VECTOR (#1171)

Instead of patching the tag component with TVECTOR in every instruction
that produces a vector value, we now use a separate IR instruction to do
this. This reduces implementation redundancy, but more importantly
allows for a class of optimizations:

- NUM_TO_VECTOR previously patched the tag component unconditionally, but
the result was only used by MUL/DIV_VEC instructions, which ignore it
anyway; we can now remove this patching.

- ADD_VEC et al. can now forward the source of the TAG_VECTOR instruction
that produced either input; this shortens the latency chain and in the
future could allow us to generate an optimal vector instruction sequence
once the temporary stores are marked as dead.

- In the future on X64, ADD_VEC et al will be able to analyze the input
instruction and remove tag masking conditionally. This is not part of
this PR as it requires a decision around expected FP environment and/or
the necessity of the existing masking to begin with.
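The input forwarding described in the second point can be sketched on a toy IR. This is a minimal illustration under stated assumptions, not the actual Luau CodeGen implementation: the `Op`, `Inst`, `skipTag`, `forwardTagVectorInputs` and `buildExample` names are all hypothetical, and the real IR has its own operand and instruction types.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, heavily simplified IR; not Luau's actual instruction types.
enum class Op { LoadVec, NumToVec, AddVec, MulVec, TagVector };

struct Inst
{
    Op op;
    int a; // index of the first operand instruction, or -1 if unused
    int b; // index of the second operand instruction, or -1 if unused
};

// If `idx` refers to a TAG_VECTOR, return the index of its untagged source;
// otherwise return `idx` unchanged.
static int skipTag(const std::vector<Inst>& code, int idx)
{
    if (idx >= 0 && code[idx].op == Op::TagVector)
        return code[idx].a;
    return idx;
}

// Rewrites the operands of vector arithmetic so that they read the untagged
// value directly. Returns the number of operands that were forwarded.
int forwardTagVectorInputs(std::vector<Inst>& code)
{
    int forwarded = 0;
    for (Inst& inst : code)
    {
        if (inst.op != Op::AddVec && inst.op != Op::MulVec)
            continue;

        int na = skipTag(code, inst.a);
        int nb = skipTag(code, inst.b);
        forwarded += (na != inst.a) + (nb != inst.b);
        inst.a = na;
        inst.b = nb;
    }
    return forwarded;
}

// Builds a small instruction sequence for illustration.
std::vector<Inst> buildExample()
{
    return {
        {Op::LoadVec, -1, -1},   // %0
        {Op::LoadVec, -1, -1},   // %1
        {Op::AddVec, 0, 1},      // %2 = ADD_VEC %0, %1
        {Op::TagVector, 2, -1},  // %3 = TAG_VECTOR %2
        {Op::AddVec, 3, 1},      // %4 = ADD_VEC %3, %1
        {Op::TagVector, 4, -1},  // %5 = TAG_VECTOR %4
    };
}
```

After the pass, `%4` reads `%2` directly instead of waiting on `%3`, which is the shorter dependency chain the commit message refers to. The `TAG_VECTOR` at `%3` stays in place in this sketch; removing it would need a separate liveness check.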

I've also renamed NUM_TO_VECTOR to NUM_TO_VEC so that "VEC" always
refers to "3 float values" and for consistency with ADD/etc.
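To make the "3 float values" plus tag layout concrete, here is a small model of a four-lane register in which the last lane overlaps the type tag. It is only a sketch: `Vec4`, `kHypotheticalVectorTag`, and the helper functions are assumptions made for illustration, the tag constant is a placeholder rather than the real TVECTOR value, and the real lowering operates on SIMD registers rather than a struct.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical model of a 128-bit register holding a vector value:
// lanes 0..2 are x, y, z; lane 3 overlaps the type tag.
struct Vec4
{
    float lane[4];
};

// Placeholder tag bits for illustration only; not the real TVECTOR constant.
constexpr std::uint32_t kHypotheticalVectorTag = 0x1234u;

// NUM_TO_VEC-like step: splat a number into every lane, with no tag patching.
Vec4 numToVec(double n)
{
    float f = static_cast<float>(n);
    return Vec4{{f, f, f, f}};
}

// MUL_VEC-like step: only lanes 0..2 are meaningful. This model writes zero
// to lane 3 to stay deterministic; the input's lane 3 is never inspected.
Vec4 mulVec(const Vec4& a, const Vec4& b)
{
    return Vec4{{a.lane[0] * b.lane[0], a.lane[1] * b.lane[1], a.lane[2] * b.lane[2], 0.0f}};
}

// TAG_VECTOR-like step: overwrite the bits of lane 3 with the tag.
Vec4 tagVector(Vec4 v)
{
    std::memcpy(&v.lane[3], &kHypotheticalVectorTag, sizeof(kHypotheticalVectorTag));
    return v;
}

// Reads back the bits stored in lane 3.
std::uint32_t tagOf(const Vec4& v)
{
    std::uint32_t bits;
    std::memcpy(&bits, &v.lane[3], sizeof(bits));
    return bits;
}
```

In this model, `tagVector(mulVec(v, numToVec(2.0)))` produces a correctly tagged result even though `numToVec` never writes a tag, which mirrors why the patching could be dropped from the NUM_TO_VEC path.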

Note: ADD_VEC input forwarding is currently performed unconditionally;
it may or may not increase the spills that can't be reloaded from the
stack.

On A64 this makes the Taylor series computation a tiny bit faster
(11.3 ns => 11.0 ns) as it removes the redundant ins instructions along
the NUM_TO_VEC path. Curiously, forwarding the TAG_VECTOR input to
arithmetic instructions actually carries a small penalty: without it,
this PR runs at 10.9 ns. I don't know whether this is a property of the
benchmark, though, as I just noticed that type inference in this
benchmark fails to infer parts of the computation as vector ops. If
desired, I will happily omit this part of the change and we can explore
it separately.

2024-02-21 07:06:11 -08:00
