For now we only do this in strict mode. This will help us track performance
over time, although the behavior is going to keep changing, so it won't be
a fully reliable metric for a few weeks.
We now run luau with --codegen during benchmark runs and collect the data
into a separate JSON file. Note that we don't yet have historical data for
these runs; it will be backfilled later.
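
As a rough sketch (not the actual bench.py code), this amounts to running the
suite twice and keeping the results apart; the file names, helper, and
benchmark path below are placeholders for illustration:

    # Run the benchmark runner with and without --codegen and keep the
    # results in separate JSON files. run_suite and the paths are made up.
    import json
    import subprocess

    def run_suite(extra_args):
        result = subprocess.run(
            ["./luau", *extra_args, "path/to/benchmark.lua"],
            capture_output=True, text=True, check=True,
        )
        return {"output": result.stdout}

    with open("bench-output.json", "w") as f:
        json.dump(run_suite([]), f)

    with open("bench-codegen-output.json", "w") as f:
        json.dump(run_suite(["--codegen"]), f)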
We don't need to run any cachegrind benchmarks in benchmark-dev, since
benchmark uses our new callgrind setup instead. This change also removes
prototyping filters that we no longer need from all builds.
This change adds another file for benchmarking luau-analyze and sets up
benchmarks for both non-strict and strict modes for analysis, as well as
all three optimization levels for compilation performance.
To avoid race conditions on repository updates, we do all of this in the
same job in benchmark.yml.
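
For the compilation side, the setup boils down to measuring each optimization
level in turn. This is only an illustrative sketch: the -O<n> and --compile
flag spellings are assumptions about the CLI, and the file path is a
placeholder.

    # Time compilation of a benchmark file at each optimization level.
    import subprocess
    import time

    for level in (0, 1, 2):
        start = time.perf_counter()
        subprocess.run(
            ["./luau", "--compile", f"-O{level}", "path/to/benchmark.lua"],
            stdout=subprocess.DEVNULL, check=True,
        )
        print(f"compile -O{level}: {time.perf_counter() - start:.3f}s")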
To be able to benchmark both modes from a single file, luau-analyze gains a
--mode argument that overrides the default typechecking mode. It's not clear
yet whether we'll want this to be a hard override on top of the
module-specified mode in the future, but it works for now.
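
Driving both analysis modes from a single file then looks roughly like the
sketch below; the exact value syntax accepted by --mode ("nonstrict" and
"strict" here) is an assumption, and the file path is a placeholder.

    # Analyze the same file in both typechecking modes via --mode.
    import subprocess

    for mode in ("nonstrict", "strict"):
        subprocess.run(
            ["./luau-analyze", f"--mode={mode}", "path/to/benchmark.lua"],
            check=True,
        )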
Since callgrind allows controlling stats collection from the guest, we can
reset the collection right before the benchmark starts. This change exposes
that to the benchmark runner and integrates callgrind data parsing into
bench.py, so that when we run bench.py with the --callgrind argument (and
the runner was built with callgrind support), we get instruction counts
from the run.
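
A minimal sketch of the parsing side, assuming the runner produces standard
callgrind profile files: the presence of a "summary:"/"totals:" line and the
assumption that the first event is Ir (instruction reads) are assumptions
about the output format, and the file name is a placeholder.

    # Extract a total instruction count from a callgrind profile file.
    def read_callgrind_instructions(path):
        with open(path) as f:
            for line in f:
                if line.startswith(("summary:", "totals:")):
                    # First number on the line is the total for the first event.
                    return int(line.split()[1])
        raise ValueError(f"no summary/totals line found in {path}")

    # instructions = read_callgrind_instructions("callgrind.out.12345")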
We convert instruction counts to seconds using a fixed rate of 10G
instructions/second; there's no accurate way to do this without simulating
the full CPU pipeline, but it produces time units on a scale similar to
real runs.
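
Concretely, the conversion is just a fixed scaling:

    # Report callgrind instruction counts as time at 10G instructions per
    # "second". This is a convention, not a CPU model.
    CALLGRIND_INSN_PER_SEC = 10e9

    def insn_to_seconds(instruction_count):
        return instruction_count / CALLGRIND_INSN_PER_SEC

    # e.g. 25_000_000_000 instructions -> 2.5 reported "seconds"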