diff --git a/docs/_pages/performance.md b/docs/_pages/performance.md index 4e87ec5b..b4fd3a7b 100644 --- a/docs/_pages/performance.md +++ b/docs/_pages/performance.md @@ -167,3 +167,11 @@ While Luau uses an incremental garbage collector, once per each collector cycle Normally objects that have been modified after the GC marked them in an incremental mark phase need to be rescanned during atomic phase, so frequent modifications of existing tables may result in a slow atomic step. To address this, we run a "remark" step where we traverse objects that have been modified after being marked once more (incrementally); additionally, the write barrier that triggers for object modifications changes the transition logic during remark phase to reduce the probability that the object will need to be rescanned. Another source of scalability challenges is coroutines. Writes to coroutine stacks don't use a write barrier, since that's prohibitively expensive as they are too frequent. This means that coroutine stacks need to be traversed during atomic step, so applications with many coroutines suffer large atomic pauses. To address this, we implement incremental marking of coroutines: marking a coroutine makes it "inactive" and resuming a coroutine (or pushing extra objects on the coroutine stack via C API) makes it "active". Atomic step only needs to traverse active coroutines again, which reduces the cost of atomic step by effectively making coroutine collection incremental as well. + +While large tables can be a problem for incremental GC in general since currently marking a single object is indivisible, large weak tables are a unique challenge because they also need to be processed during atomic phase, and the main use case for weak tables - object caches - may result in tables with large capacity but few live objects in long-running applications that exhibit bursts of activity. To address this, weak tables in Luau can be marked as "shrinkable" by including `s` as part of `__mode` string, which results in weak tables being resized to the optimal capacity during GC. This option may result in missing keys during table iteration if the table is resized while iteration is in progress and as such is only recommended for use in specific circumstances. + +## Optimized garbage collector sweeping + +The incremental garbage collector in Luau runs three phases for each cycle: mark, atomic and sweep. Mark incrementally traverses all live objects, atomic finishes various operations that need to happen without mutator intervention (see previous section), and sweep traverses all objects in the heap, reclaiming memory used by dead objects and performing minor fixup for live objects. While objects allocated during the mark phase are traversed in the same cycle and thus may get reclaimed, objects allocated during the sweep phase are considered live. Because of this, the faster the sweep phase completes, the less garbage will accumulate; and, of course, the less time sweeping takes the less overhead there is from this phase of garbage collection on the process. + +Since sweeping traverses the whole heap, we maximize the efficiency of this traversal by allocating garbage-collected objects of the same size in 16 KB pages, and traversing each page at a time, which is otherwise known as a paged sweeper. This ensures good locality of reference as consecutively swept objects are contiugous in memory, and allows us to spend no memory for each object on sweep-related data or allocation metadata, since paged sweeper doesn't need to be able to free objects without knowing which page they are in. Compared to linked list based sweeping that Lua/LuaJIT implement, paged sweeper is 2-3x faster, and saves 16 bytes per object on 64-bit platforms.