luau/tests/StringUtils.test.cpp

// This file is part of the Luau programming language and is licensed under MIT License; see LICENSE.txt for details
#include "Luau/StringUtils.h"

#include "doctest.h"

#include <chrono>
#include <iostream>
#include <string>
#include <string_view>
#include <vector>

namespace
{

using LevenshteinMatrix = std::vector<std::vector<size_t>>;

std::string format(std::string_view a, std::string_view b, size_t expected, size_t actual)
{
    return "Distance of '" + std::string(a) + "' and '" + std::string(b) + "': expected " + std::to_string(expected) + ", got " +
           std::to_string(actual);
}

// Each call to this function is not a single check: it runs one check for every pair of prefixes of a and b
// (including the empty prefix), i.e. (a.size() + 1) * (b.size() + 1) checks in total.
void compareLevenshtein(const LevenshteinMatrix& distances, std::string_view a, std::string_view b)
{
    for (size_t x = 0; x <= a.size(); ++x)
    {
        for (size_t y = 0; y <= b.size(); ++y)
        {
            std::string_view currentA = a.substr(0, x);
            std::string_view currentB = b.substr(0, y);

            size_t actual = Luau::editDistance(currentA, currentB);
            size_t expected = distances[x][y];
            CHECK_MESSAGE(actual == expected, format(currentA, currentB, expected, actual));
        }
    }
}

} // namespace
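
// A minimal sketch, not part of Luau: the expected matrices used by the Levenshtein tests in this file can be
// regenerated or cross-checked with the classic Wagner-Fischer dynamic program instead of being filled in by
// hand. The name referenceLevenshteinMatrix is hypothetical. It computes plain Levenshtein distance (insert,
// delete, substitute) and knows nothing about transpositions, so it can disagree with Luau::editDistance when
// adjacent characters are swapped: for "CA" and "ABC" it yields 3.
#include <algorithm>
#include <cstddef>
#include <string_view>
#include <vector>

namespace
{

[[maybe_unused]] std::vector<std::vector<size_t>> referenceLevenshteinMatrix(std::string_view a, std::string_view b)
{
    // result[x][y] is the distance between the first x characters of a and the first y characters of b.
    std::vector<std::vector<size_t>> result(a.size() + 1, std::vector<size_t>(b.size() + 1, 0));

    for (size_t x = 0; x <= a.size(); ++x)
        result[x][0] = x; // delete all x characters
    for (size_t y = 0; y <= b.size(); ++y)
        result[0][y] = y; // insert all y characters

    for (size_t x = 1; x <= a.size(); ++x)
    {
        for (size_t y = 1; y <= b.size(); ++y)
        {
            size_t substitution = result[x - 1][y - 1] + (a[x - 1] == b[y - 1] ? 0 : 1);
            size_t deletion = result[x - 1][y] + 1;
            size_t insertion = result[x][y - 1] + 1;
            result[x][y] = std::min({substitution, deletion, insertion});
        }
    }

    return result;
}

} // namespace

// Possible usage inside a test case:
//     compareLevenshtein(referenceLevenshteinMatrix("kitten", "sitting"), "kitten", "sitting");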

TEST_SUITE_BEGIN("StringUtilsTest");

#if 0
// This unit test is only used to measure the performance of the current Levenshtein distance algorithm.
// It is entirely OK to submit this, but keep the #if 0.
TEST_CASE("BenchmarkLevenshteinDistance")
{
    // For reference: running this benchmark on a MacBook Pro 16 takes ~250ms.
    int count = 1'000'000;

    // These words were specifically chosen because they:
    // - are real words,
    // - have a common prefix and suffix, and
    // - are long enough to stress test with
    std::string_view a("Intercalate");
    std::string_view b("Interchangeable");

    auto start = std::chrono::steady_clock::now();

    for (int i = 0; i < count; ++i)
        Luau::editDistance(a, b);

    auto end = std::chrono::steady_clock::now();
    auto time = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);

    MESSAGE("Running levenshtein distance ", count, " times took ", time.count(), "ms");
}
#endif

TEST_CASE("LevenshteinDistanceKittenSitting")
{
    LevenshteinMatrix distances{
        {0, 1, 2, 3, 4, 5, 6, 7}, //    S  I  T  T  I  N  G
        {1, 1, 2, 3, 4, 5, 6, 7}, // K
        {2, 2, 1, 2, 3, 4, 5, 6}, // I
        {3, 3, 2, 1, 2, 3, 4, 5}, // T
        {4, 4, 3, 2, 1, 2, 3, 4}, // T
        {5, 5, 4, 3, 2, 2, 3, 4}, // E
        {6, 6, 5, 4, 3, 3, 2, 3}, // N
    };

    compareLevenshtein(distances, "kitten", "sitting");
}

TEST_CASE("LevenshteinDistanceSaturdaySunday")
{
    LevenshteinMatrix distances{
        {0, 1, 2, 3, 4, 5, 6}, //    S  U  N  D  A  Y
        {1, 0, 1, 2, 3, 4, 5}, // S
        {2, 1, 1, 2, 3, 3, 4}, // A
        {3, 2, 2, 2, 3, 4, 4}, // T
        {4, 3, 2, 3, 3, 4, 5}, // U
        {5, 4, 3, 3, 4, 4, 5}, // R
        {6, 5, 4, 4, 3, 4, 5}, // D
        {7, 6, 5, 5, 4, 3, 4}, // A
        {8, 7, 6, 6, 5, 4, 3}, // Y
    };

    compareLevenshtein(distances, "saturday", "sunday");
}

TEST_CASE("EditDistanceIsAgnosticOfArgumentOrdering")
{
    CHECK_EQ(Luau::editDistance("blox", "block"), Luau::editDistance("block", "blox"));
}

TEST_CASE("AreWeUsingDistanceWithAdjacentTranspositionsAndNotOptimalStringAlignment")
{
    size_t distance = Luau::editDistance("CA", "ABC");
    CHECK_EQ(distance, 2);
}
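
// A minimal sketch, not part of Luau: the optimal string alignment (restricted) variant that the test above is
// meant to rule out. osaDistance is a hypothetical name. OSA never edits the same substring twice, so after
// transposing "CA" into "AC" it cannot also insert "B" between the swapped characters; it needs 3 edits for
// "CA" -> "ABC", whereas the test above expects 2 from Luau::editDistance.
#include <algorithm>
#include <cstddef>
#include <string_view>
#include <vector>

namespace
{

[[maybe_unused]] size_t osaDistance(std::string_view a, std::string_view b)
{
    // d[i][j] is the OSA distance between the first i characters of a and the first j characters of b.
    std::vector<std::vector<size_t>> d(a.size() + 1, std::vector<size_t>(b.size() + 1, 0));

    for (size_t i = 0; i <= a.size(); ++i)
        d[i][0] = i;
    for (size_t j = 0; j <= b.size(); ++j)
        d[0][j] = j;

    for (size_t i = 1; i <= a.size(); ++i)
    {
        for (size_t j = 1; j <= b.size(); ++j)
        {
            size_t cost = a[i - 1] == b[j - 1] ? 0 : 1;
            d[i][j] = std::min({d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost});

            // Adjacent transposition: the last two characters of both prefixes are swapped copies of each other.
            if (i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1])
                d[i][j] = std::min(d[i][j], d[i - 2][j - 2] + 1);
        }
    }

    return d[a.size()][b.size()];
}

} // namespace

// osaDistance("ab", "ba") is 1 (a single transposition), but osaDistance("CA", "ABC") is 3, not 2.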

TEST_CASE("EditDistanceSupportsUnicode")
{
    // The expected values below are measured in bytes rather than code points: replacing the 1-byte 'A' with an
    // N-byte UTF-8 character costs N edits (one substitution plus N - 1 insertions).

    // ASCII character
    CHECK_EQ(Luau::editDistance("A block", "X block"), 1);

    // UTF-8 2 byte character
    CHECK_EQ(Luau::editDistance("A block", "À block"), 2);

    // UTF-8 3 byte character
    CHECK_EQ(Luau::editDistance("A block", "⪻ block"), 3);

    // UTF-8 4 byte character
    CHECK_EQ(Luau::editDistance("A block", "𒋄 block"), 4);

    // UTF-8 extreme characters
    CHECK_EQ(Luau::editDistance("A block", "R̴̨̢̟̚ŏ̶̳̳͚́ͅb̶̡̻̞̐̿ͅl̸̼͝ợ̷̜͓̒̏͜͝ẍ̴̝̦̟̰́̒́̌ block"), 85);
}

TEST_SUITE_END();