# string.pack/unpack/packsize from Lua 5.3

> Note: this RFC was adapted from an internal proposal that predates the RFC process.

**Status**: Implemented

## Summary

Add `string.pack`, `string.unpack`, and `string.packsize` from Lua 5.3 for binary interop, with small tweaks to the format specification to make format strings portable.

## Motivation

While the dominant use case for Luau is game programming, developers doing backend work sometimes need to handle formats defined outside of Roblox. When these are structured as JSON this is easy, but when they are binary it is not. Additionally, game developers often optimize data transmission with custom binary codecs when they know the range of the data (e.g. it's much more efficient to send a number using 1 byte if you know it lies between 0 and 1 and 8 bits of precision is enough, but RemoteEvent and similar APIs won't do that for you because they guarantee a lossless roundtrip). For both working with external data and optimizing data transfer, it would be nice to have a way to work with binary data.

This is doable in Luau using `string.byte`, `string.char`, the `bit32` library, etc., but tends to be cumbersome. Lua 5.3 provides the functions `string.pack`, `string.unpack`, and `string.packsize` that, while not solving 100% of the problems, often make working with binary data much easier and much faster. This proposal suggests adding them to Luau: this both furthers our goal of being reasonably compatible with the latest Lua versions and makes it easier for developers to write some types of code.

## Design

Concretely, this proposal suggests adding the following functions:

```
string.pack (fmt, v1, v2, ···)
```

Returns a binary string containing the values `v1`, `v2`, etc. packed (that is, serialized in binary form) according to the format string `fmt`.

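As a minimal sketch of a call, the snippet below packs a made-up record. The field layout (a 16-bit id, a 32-bit score, a short name) is purely illustrative and not part of the proposal; the format options themselves (`<`, `I2`, `i4`, `s1`) come from the Lua 5.3 manual.

```lua
-- Hypothetical record: a 16-bit id, a 32-bit score, and a short name.
-- "<" selects little endian, "I2"/"i4" are explicitly sized integers,
-- and "s1" is a string prefixed with a 1-byte length.
local packed = string.pack("<I2i4s1", 513, -2, "abc")

-- 2 (id) + 4 (score) + 1 (length prefix) + 3 (characters) = 10 bytes
print(#packed) --> 10

-- The id 513 is 0x0201, so little endian stores 0x01 first, then 0x02.
print(string.byte(packed, 1, 2)) --> 1  2
```

Spelling out the sizes and byte order, as done here, means the result does not depend on what the native formats happen to be.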
```
string.packsize (fmt)
```

Returns the size of a string resulting from `string.pack` with the given format. The format string cannot have the variable-length options `s` or `z`.

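For illustration, here is a small sketch that uses `string.packsize` on a hypothetical fixed-size header (the header layout and the names `headerFormat` and `headerSize` are assumptions made for the example) and shows that a variable-length option is rejected.

```lua
-- Hypothetical fixed-size header: 1-byte tag, 4-byte length, 8-byte double.
local headerFormat = "<I1I4d"
local headerSize = string.packsize(headerFormat)
print(headerSize) --> 13

-- The size agrees with the length of an actual packed value.
assert(#string.pack(headerFormat, 7, 1024, 0.5) == headerSize)

-- Variable-length options have no fixed size, so packsize raises an error.
local ok = pcall(string.packsize, "<s2")
print(ok) --> false
```

Knowing the size up front is useful when slicing a stream of fixed-size records without unpacking each one first.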
```
string.unpack (fmt, s [, pos])
```

Returns the values packed in string `s` (see `string.pack`) according to the format string `fmt`. An optional `pos` marks where to start reading in `s` (default is 1). After the read values, this function also returns the index of the first unread byte in `s`.

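The extra return value makes it straightforward to read a buffer in several steps. The sketch below assumes a buffer of three 16-bit integers; the variable names are illustrative only.

```lua
local data = string.pack("<I2I2", 10, 20) .. string.pack("<I2", 30)

-- Read two values from the start; the last result is the next unread index.
local a, b, nextPos = string.unpack("<I2I2", data)
print(a, b, nextPos) --> 10  20  5

-- Continue reading from where the previous call stopped.
local c, endPos = string.unpack("<I2", data, nextPos)
print(c, endPos) --> 30  7
```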
The format string is a sequence of characters that define the data layout; it is described in full at https://www.lua.org/manual/5.3/manual.html#6.4.2. We will adopt this wholesale, but we will guarantee that the resulting code is cross-platform by:

a) Ensuring native endian is little endian (de facto true for all our platforms)
b) Fixing the sizes of native formats to a 2-byte short, 4-byte int, and 8-byte long
c) Treating `size_t` in the context of the `T` and `s` formats as a 32-bit integer

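A rough way to observe these guarantees is sketched below. The sizes in the comments are what rules (a) and (b) imply for Luau; stock Lua 5.3 takes them from the host C compiler, so they can differ there. The variable `wire` is just a name chosen for the example.

```lua
-- Under rule (b), these should be the same on every platform Luau runs on.
print(string.packsize("h")) -- short: 2 bytes
print(string.packsize("i")) -- int: 4 bytes
print(string.packsize("l")) -- long: 8 bytes

-- Under rule (a), native order "=" and little endian "<" give the same bytes.
print(string.pack("=I4", 1) == string.pack("<I4", 1))

-- Spelling out size and byte order keeps a format unambiguous for readers
-- and for non-Luau programs that parse the same data.
local wire = string.pack("<i4", 1)
print(string.byte(wire, 1, 4)) --> 1  0  0  0
```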
Of course, the functions are memory-safe; if the input string is too short to provide all relevant data, they will fail with a "data string is too short" error.

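Because the failure is an ordinary error, code that handles untrusted input can guard the call with `pcall`. The following is a sketch only; `tryUnpack` is a hypothetical helper, not part of the proposal, and the exact wording of the error message may vary between implementations.

```lua
-- Only 2 bytes of input, but the format asks for a 4-byte integer.
local truncated = string.pack("<I2", 1)
local ok, err = pcall(string.unpack, "<I4", truncated)
print(ok)  --> false
print(err) -- an error message reporting that the data string is too short

-- A hypothetical helper that returns nil instead of raising on bad input.
local function tryUnpack(fmt, s, pos)
    local results = table.pack(pcall(string.unpack, fmt, s, pos))
    if not results[1] then
        return nil
    end
    return table.unpack(results, 2, results.n)
end

print(tryUnpack("<I4", truncated)) --> nil
print(tryUnpack("<I2", truncated)) --> 1  3
```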
This may seem slightly unconventional, but it's very powerful and expressive, in much the same way format strings and regular expressions are :) Here's a basic example of how you might transmit a 3-component vector with this:

```
local x, y, z = 0.5, -0.25, 1.0 -- sample components

-- returns a 24-byte string with a 64-bit double encoded three times, similar to how we'd replicate 3 raw numbers
local asDoubles = string.pack("ddd", x, y, z)

-- returns a 12-byte string with a 32-bit float encoded three times, similar to how we'd replicate Vector3
local asFloats = string.pack("fff", x, y, z)

-- returns a 3-byte string with each value stored in a signed 8-bit integer
-- assumes the -1..1 range; math.floor(v + 0.5) rounds to the nearest integer
local function quantize(v)
    return math.floor(v * 127 + 0.5)
end
local asBytes = string.pack("bbb", quantize(x), quantize(y), quantize(z))
```

Unpacking is symmetrical: passing the same format string to `string.unpack` returns the encoded values.

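As an illustration of that symmetry, here is a sketch of a full round trip for the 8-bit encoding. The helpers `encodeUnit` and `decodeUnit` are hypothetical names, and the code assumes every component lies in the -1..1 range.

```lua
-- Hypothetical helpers for a vector whose components lie in -1..1.
local function encodeUnit(x, y, z)
    local function quantize(v)
        return math.floor(v * 127 + 0.5) -- round to the nearest step
    end
    return string.pack("bbb", quantize(x), quantize(y), quantize(z))
end

local function decodeUnit(s)
    -- The fourth result (next unread index) is not needed here.
    local qx, qy, qz = string.unpack("bbb", s)
    return qx / 127, qy / 127, qz / 127
end

local encoded = encodeUnit(0.3, -0.25, 1)
local x, y, z = decodeUnit(encoded)
print(#encoded) --> 3

-- Decoded values land within half a quantization step (1/254) of the inputs.
print(math.abs(x - 0.3) <= 1 / 254, math.abs(y + 0.25) <= 1 / 254) --> true  true
```

The round trip is lossy by design: the format string is shared between both sides, while the scaling by 127 is a convention the two helpers have to agree on.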
## Drawbacks

The format specification is somewhat arbitrary and is likely to be unfamiliar to people with prior experience in other languages (having said that, this feature closely follows equivalent functionality from Ruby).

The implementation of string pack/unpack requires yet another format string matcher, which increases the complexity of the builtin libraries and static analysis (since we need to provide linting for another format string syntax).

## Alternatives

We could force developers to rely on existing functionality for string packing; it is possible to replicate this proposal in a library, although with much lower performance.