Features and Usage

Features

LuaSrcDiet features include the following:

Predefined default, --basic (token-only) and --maximum settings.
Avoid deleting a block comment containing a certain message with --keep; this is meant for copyright or license texts.
Special handling for #! (shbang) lines and for the implicit self parameter in functions.
Dumping of raw information using --dump-lexer and --dump-parser. See the samples directory.
An HTML plugin: outputs files that highlight globals and locals, useful for eliminating globals. See the samples directory.
An SLOC plugin: counts significant lines of Lua code, like SLOCCount.
Source and binary equivalence testing with --opt-srcequiv and --opt-binequiv.

List of optimizations:

Line endings are always normalized to LF, except those embedded in comments or strings.
--opt-comments: Removal of comments and comment blocks.
--opt-whitespace: Removal of whitespace, excluding end-of-line characters.
--opt-emptylines: Removal of empty lines.
--opt-eols: Removal of unnecessary end-of-line characters.
--opt-strings: Rewrite strings and long strings. See the samples directory.
--opt-numbers: Rewrite numbers. See the samples directory.
--opt-locals: Rename local variable names. Does not rename field or method names.
--opt-entropy: Tries to improve symbol entropy when renaming locals by calculating actual letter frequencies.
--opt-experimental: Apply experimental optimizations.
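Both --opt-whitespace and --opt-eols have to decide whether the separator between two tokens can be dropped, which comes down to whether the two tokens would lex differently once joined. The Lua sketch below shows one conservative way to make that decision. The rules are my own approximation for Lua 5.1 tokens (plus a few newer operators), and `needs_separator` and `join_tokens` are hypothetical helpers, not LuaSrcDiet functions.

```lua
-- Conservative sketch: must a space stay between tokens a and b?
-- These rules are an approximation, not LuaSrcDiet's own code.
local MERGING_PAIRS = {
  ["--"] = true,                 -- would start a comment
  ["[["] = true, ["[="] = true,  -- could start a long string
  ["=="] = true, ["<="] = true, [">="] = true, ["~="] = true,
  [".."] = true,                 -- "." or ".." followed by "."
  ["//"] = true, ["<<"] = true, [">>"] = true, ["::"] = true, -- added in Lua 5.2/5.3
}

local function needs_separator(a, b)
  local last, first = a:sub(-1), b:sub(1, 1)
  -- identifier characters on both sides would fuse names, keywords and numbers
  if last:match("[%w_]") and first:match("[%w_]") then
    return true
  end
  -- a numeral followed by "." would be read as part of the numeral, e.g. "1.."
  if a:match("^%.?%d") and first == "." then
    return true
  end
  return MERGING_PAIRS[last .. first] == true
end

-- Join token texts, inserting a single space only where one is required.
local function join_tokens(tokens)
  local out = { tokens[1] }
  for i = 2, #tokens do
    if needs_separator(tokens[i - 1], tokens[i]) then
      out[#out + 1] = " "
    end
    out[#out + 1] = tokens[i]
  end
  return table.concat(out)
end
```

For example, joining `local`, `x`, `=`, `1` gives `local x=1`, while joining `a`, `-`, `-`, `b` keeps one space (`a- -b`) because `--` would otherwise start a comment.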

LuaSrcDiet tries to allow each option to be enabled or disabled separately, but they are not completely orthogonal.

If comment removal is disabled, LuaSrcDiet only removes trailing whitespace. Trailing whitespace is not removed in long strings; a warning is generated instead. If empty line removal is disabled, LuaSrcDiet keeps all significant code on the same lines. Thus, a user is able to debug using the original sources as a reference, since the line numbering is unchanged.

String optimization deals mainly with optimizing escape sequences, but delimiters can be switched between single quotes and double quotes if the source size of the string can be reduced. For long strings and long comments, LuaSrcDiet also tries to reduce the = separators in the delimiters if possible. For number optimization, LuaSrcDiet saves space by trying to generate the shortest possible sequence, and in the process it does not produce “proper” scientific notation (e.g. 1.23e5) but does away with the decimal point (e.g. 123e3) instead.
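To make the number and quote rewrites concrete, here is a small Lua sketch. It assumes Lua 5.1 semantics, where every number is a double (on Lua 5.3 and later, rewriting 123000 as 123e3 would turn an integer into a float), and it only handles plain decimal integers and string contents without newlines or control characters. `shorten_integer`, `quote` and `shortest_literal` are hypothetical names, and this is not LuaSrcDiet's actual algorithm.

```lua
-- Shortest spelling of a decimal integer numeral: move trailing zeros into an
-- exponent ("123000" -> "123e3") only when that is strictly shorter.
local function shorten_integer(digits)
  local mantissa, zeros = digits:match("^(%d-)(0*)$")
  if mantissa == "" or #zeros == 0 then
    return digits
  end
  local candidate = mantissa .. "e" .. #zeros
  if #candidate < #digits then
    return candidate
  end
  return digits
end

-- Quote raw string contents with delimiter q, escaping backslashes and q itself.
local function quote(content, q)
  local body = content:gsub("\\", "\\\\"):gsub(q, "\\" .. q)
  return q .. body .. q
end

-- Pick whichever delimiter gives the shorter literal (double quotes on a tie).
local function shortest_literal(content)
  local single, double = quote(content, "'"), quote(content, '"')
  if #single < #double then
    return single
  end
  return double
end
```

With these helpers, `shorten_integer("123000")` yields `123e3`, `shorten_integer("100")` stays `100` because `1e2` saves nothing, and text containing double quotes such as `say "hi"` is emitted with single quotes to avoid escapes.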

The local variable name optimizer uses a full parser of Lua 5.1 source code, thus it can rename all local variables, including upvalues and function parameters. It should handle the implicit self parameter gracefully. In addition, local variable names are either renamed into the shortest possible names, ordered by how frequently letters are used in English, or are arranged by calculating entropy with the --opt-entropy option. Variable names are reused whenever possible, reducing the number of unique variable names. For example, for LuaSrcDiet.lua (version 0.11.0), 683 local identifiers representing 88 unique names were optimized into 32 unique names, all of which are one character in length, saving over 2600 bytes.
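The naming scheme can be sketched in a few lines of Lua: rank the 53 possible single-character names (26 lowercase letters, 26 uppercase letters and the underscore), emit them first, then emit two-character names, skipping Lua 5.1 keywords. Counting letters in the source, as `letter_order` does below, is only loosely in the spirit of --opt-entropy; the fallback order is an approximate English letter-frequency ranking that I chose, and all helpers are hypothetical. Deciding when a name can be reused (locals whose scopes do not overlap) needs the full parser and is not shown.

```lua
-- Lua 5.1 reserved words can never be used as local names.
local KEYWORDS = {}
for kw in ([[and break do else elseif end false for function if in local nil
  not or repeat return then true until while]]):gmatch("%a+") do
  KEYWORDS[kw] = true
end

-- Approximate English letter-frequency order (an assumption), used to break ties.
local FALLBACK = "etaoinshrdlcumwfgypbvkjxqz"
FALLBACK = FALLBACK .. FALLBACK:upper() .. "_"

-- Return all 53 single-character names, most frequent in `source` first.
local function letter_order(source)
  local counts, order, rank = {}, {}, {}
  for c in source:gmatch("[%a_]") do
    counts[c] = (counts[c] or 0) + 1
  end
  for c in FALLBACK:gmatch(".") do
    order[#order + 1] = c
    rank[c] = #order
  end
  table.sort(order, function(x, y)
    local cx, cy = counts[x] or 0, counts[y] or 0
    if cx ~= cy then
      return cx > cy
    end
    return rank[x] < rank[y]
  end)
  return order
end

-- All one-character names, then two-character names (the second character may
-- also be a digit), skipping keywords such as "do", "if", "in" and "or".
local function candidate_names(order)
  local names, seconds = {}, {}
  for _, c in ipairs(order) do
    names[#names + 1] = c
    seconds[#seconds + 1] = c
  end
  for d = 0, 9 do
    seconds[#seconds + 1] = tostring(d)
  end
  for _, c1 in ipairs(order) do
    for _, c2 in ipairs(seconds) do
      local name = c1 .. c2
      if not KEYWORDS[name] then
        names[#names + 1] = name
      end
    end
  end
  return names
end
```

Since only 53 single-character names exist, a file that needs more unique local names than that at the same time will start using two-character ones.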

If you need some kind of reassurance that your app will still work at reduced size, see the section on verification below.

Usage

LuaSrcDiet needs a Lua 5.1.x (preferably Lua 5.1.4) binary to run. On Unix machines, one can use the following command line:

LuaSrcDiet myscript.lua -o myscript_.lua

On Windows machines, the above command line can be used on Cygwin, or you can run Lua with the LuaSrcDiet script like this:

lua LuaSrcDiet.lua myscript.lua -o myscript_.lua

When run without arguments, LuaSrcDiet prints a list of options. Also, you can check the Makefile for some examples of command lines to use. For example, for maximum code size reduction and maximum verbosity, use:

LuaSrcDiet --maximum --details myscript.lua -o myscript_.lua

Output Example

A sample output of LuaSrcDiet 0.11.0 for processing LuaSrcDiet.lua at --maximum settings is as follows:

Statistics for: LuaSrcDiet.lua -> sample/LuaSrcDiet.lua

*** local variable optimization summary ***
----------------------------------------------------------
Variable          Unique   Decl.   Token    Size   Average
Types              Names   Count   Count   Bytes     Bytes
----------------------------------------------------------
Global                10       0      19      95      5.00
----------------------------------------------------------
Local (in)            88     153     683    3340      4.89
TOTAL (in)            98     153     702    3435      4.89
----------------------------------------------------------
Local (out)           32     153     683     683      1.00
TOTAL (out)           42     153     702     778      1.11
----------------------------------------------------------

*** lexer-based optimizations summary ***
--------------------------------------------------------------------
Lexical            Input   Input     Input  Output  Output    Output
Elements           Count   Bytes   Average   Count   Bytes   Average
--------------------------------------------------------------------
TK_KEYWORD           374    1531      4.09     374    1531      4.09
TK_NAME              795    3963      4.98     795    1306      1.64
TK_NUMBER             54      59      1.09      54      59      1.09
TK_STRING            152    1725     11.35     152    1717     11.30
TK_LSTRING             7    1976    282.29       7    1976    282.29
TK_OP                997    1092      1.10     997    1092      1.10
TK_EOS                 1       0      0.00       1       0      0.00
--------------------------------------------------------------------
TK_COMMENT           140    6884     49.17       1      18     18.00
TK_LCOMMENT            7    1723    246.14       0       0      0.00
TK_EOL               543     543      1.00     197     197      1.00
TK_SPACE            1270    2465      1.94     263     263      1.00
--------------------------------------------------------------------
Total Elements      4340   21961      5.06    2841    8159      2.87
--------------------------------------------------------------------
Total Tokens        2380   10346      4.35    2380    7681      3.23
--------------------------------------------------------------------

Overall, the file size is reduced by more than 13 kiB, from 21961 to 8159 bytes. Tokens in the above report can be classified into “real” or actual tokens, and “fake” or whitespace tokens. The number of “real” tokens remained the same. Long comments were completely eliminated, and only one 18-byte short comment remains. The number of line endings was reduced by 346, while all but 263 whitespace characters were optimized away. So, token separators (whitespace, including line endings) now take up less than 6 % of the total file size. No optimization of number tokens was possible, while 8 bytes were saved for string tokens.

For local variable name optimization, the report shows that 88 unique local variable names were reduced to 32 unique names. The number of identifier tokens stays the same (there is currently no option to optimize away non-essential or unused “real” tokens). Since there can be at most 53 single-character identifiers, all local variables are now one character in length. Over 2600 bytes were saved. --details will give a longer report and much more information.
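The savings can be re-derived directly from the 0.11.0 table with a few lines of Lua; the numbers below are copied from that report, and nothing here calls LuaSrcDiet.

```lua
-- Figures copied from the LuaSrcDiet 0.11.0 --maximum report above.
local total_in, total_out = 21961, 8159  -- "Total Elements" input/output bytes
local eol_out, space_out = 197, 263      -- TK_EOL and TK_SPACE output bytes
local local_in, local_out = 3340, 683    -- "Local (in)" / "Local (out)" size bytes

local saved = total_in - total_out
local separator_share = 100 * (eol_out + space_out) / total_out
local rename_saved = local_in - local_out

print(string.format("total saved: %d bytes (%.1f kiB)", saved, saved / 1024))
print(string.format("separators in output: %.1f %%", separator_share))
print(string.format("local renaming saved: %d bytes", rename_saved))
```

This prints a saving of 13802 bytes (13.5 kiB), a separator share of 5.6 % and 2657 bytes saved by renaming locals.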

A sample output of LuaSrcDiet 0.12.0 for processing the one-file LuaSrcDiet.lua program itself at --maximum and --opt-experimental settings is as follows:

*** local variable optimization summary ***
----------------------------------------------------------
Variable          Unique   Decl.   Token    Size   Average
Types              Names   Count   Count   Bytes     Bytes
----------------------------------------------------------
Global                27       0      51     280      5.49
----------------------------------------------------------
Local (in)           482    1063    4889   21466      4.39
TOTAL (in)           509    1063    4940   21746      4.40
----------------------------------------------------------
Local (out)           55    1063    4889    4897      1.00
TOTAL (out)           82    1063    4940    5177      1.05
----------------------------------------------------------

*** BINEQUIV: binary chunks are sort of equivalent

Statistics for: LuaSrcDiet.lua -> app_experimental.lua

*** lexer-based optimizations summary ***
--------------------------------------------------------------------
Lexical            Input   Input     Input  Output  Output    Output
Elements           Count   Bytes   Average   Count   Bytes   Average
--------------------------------------------------------------------
TK_KEYWORD          3083   12247      3.97    3083   12247      3.97
TK_NAME             5401   24121      4.47    5401    7552      1.40
TK_NUMBER            467     494      1.06     467     494      1.06
TK_STRING            787    7983     10.14     787    7974     10.13
TK_LSTRING            14    3453    246.64      14    3453    246.64
TK_OP               6381    6861      1.08    6171    6651      1.08
TK_EOS                 1       0      0.00       1       0      0.00
--------------------------------------------------------------------
TK_COMMENT          1611   72339     44.90       1      18     18.00
TK_LCOMMENT           18    4404    244.67       0       0      0.00
TK_EOL              4419    4419      1.00    1778    1778      1.00
TK_SPACE           10439   24475      2.34    2081    2081      1.00
--------------------------------------------------------------------
Total Elements     32621  160796      4.93   19784   42248      2.14
--------------------------------------------------------------------
Total Tokens       16134   55159      3.42   15924   38371      2.41
--------------------------------------------------------------------
* WARNING: before and after lexer streams are NOT equivalent!

The command line was:

lua LuaSrcDiet.lua LuaSrcDiet.lua -o app_experimental.lua --maximum --opt-experimental --noopt-srcequiv

The important thing to note is that while the binary chunks are equivalent, the source lexer streams are not. Hence, --noopt-srcequiv is used so that LuaSrcDiet only reports a warning, instead of an error, for failing the source equivalence test.

LuaSrcDiet.lua was reduced from 157 kiB to about 41.3 kiB. The --opt-experimental option saves an extra 205 bytes over standard --maximum. Note the reduction in TK_OP count due to a reduction in semicolons and parentheses. TK_SPACE has actually increased a bit due to semicolons that are changed into single spaces; some of these spaces could not be removed.

For more performance numbers, see the Performance Statistics page.

Verification

Code size reduction can be quite a hairy thing (even I peer at the results with suspicion), so some kind of verification is desirable for users who expect processed files to not blow up. Since LuaSrcDiet has been talked about as a tool to reduce code size in projects such as WoW add-ons, eLua and nspire, adding a verification step will reduce risk for all users of LuaSrcDiet.

LuaSrcDiet performs two kinds of equivalence testing as of version 0.12.0. The two tests can be very, very loosely termed source equivalence testing and binary equivalence testing. They are controlled by the --opt-srcequiv and --opt-binequiv options, which are enabled by default.

Testing behaviour can be summarized as follows:

Both tests are always executed. The options control the resulting actions taken.
Both options are normally enabled. This makes any failing test throw an error.
When an option is disabled, LuaSrcDiet will at most print a warning.
For what a passing result means, see the following subsections, which describe what the tests actually do.
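The rules above boil down to a small decision, sketched here in Lua with hypothetical names (`report_equivalence` is not a LuaSrcDiet function): the test result is always computed, and the option only decides whether a failure is fatal.

```lua
-- Hypothetical sketch of the pass/fail policy; not LuaSrcDiet's code.
-- `passed` is the result of a test that always runs; `option_enabled` mirrors
-- whether the corresponding --opt-srcequiv or --opt-binequiv option is on.
local function report_equivalence(test_name, passed, option_enabled)
  if passed then
    return "pass"
  end
  if option_enabled then
    error(test_name .. ": equivalence test failed", 0)
  end
  print("warning: " .. test_name .. ": equivalence test failed")
  return "warning"
end
```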

You only need to disable a testing option for experimental optimizations (see the following section for more information on this). For anything up to and including --maximum, both tests should pass. If any test fails under these conditions, then something has gone wrong with LuaSrcDiet, and I would be interested to know what has blown up.

--opt-srcequiv Source Equivalence

The source equivalence test uses LuaSrcDiet’s lexer to read and compare the before and after lexer token streams. Numbers and strings are dumped as binary chunks using loadstring() and string.dump(), and the results are compared.
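A minimal Lua sketch of the literal comparison: each number or string literal is compiled into a one-line chunk with a fixed chunk name, and the string.dump() bytes are compared. `literals_equivalent` is a hypothetical helper; LuaSrcDiet's own check lives in equiv.lua and may differ in detail.

```lua
-- loadstring where it exists (Lua 5.1), otherwise load (Lua 5.2+).
local compile = loadstring or load

-- Compile "return <literal>" under a fixed chunk name so the source name
-- embedded in the dump is identical for both sides.
local function dump_literal(literal)
  local fn, err = compile("return " .. literal, "=literal")
  if not fn then
    error(err)
  end
  return string.dump(fn)
end

local function literals_equivalent(a, b)
  return dump_literal(a) == dump_literal(b)
end
```

On Lua 5.3 and later, `1` and `1.0` produce different dumps (integer versus float constants), which is the kind of difference such a comparison is meant to catch; on Lua 5.1 they compare equal.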

If your file passes this test, it means that a Lua 5.1.x binary should see the exact same token streams for both before and after files. That is, the parser in Lua will see the same lexer sequence coming from the source for both files and thus they should be equivalent. Touch wood. Heh.

However, if you are cross-compiling, it may be possible for this test to fail. Experienced Lua developers can modify equiv.lua to handle such cases.

--opt-binequiv Binary Equivalence

The binary equivalence test uses loadstring() and string.dump() to generate binary chunks of the entire before and after files. Also, any shbang (#!) lines are removed prior to generation of the binary chunks.

The binary chunks are then run through a fake undump routine to verify the integrity of the binary chunks and to compare all parts that ought to be identical.

On a per-function prototype basis (where ignored means that any difference between the two binary chunks is ignored):

All debug information is ignored.
The source name is ignored.
Any line number data is ignored. For example, linedefined and lastlinedefined.

The rest of the two binary chunks must be identical. So, while the two are not binary-exact, they can be loosely termed as “equivalent” and should run in exactly the same manner. Sort of. You get the idea.
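The sketch below is a much cruder stand-in for this test, written in Lua: it strips a leading #! line, compiles both sources with the same chunk name, and compares the string.dump() output byte for byte. It does not implement the fake undump routine, so it cannot ignore line number data the way the real test does; the helper names are hypothetical.

```lua
-- loadstring where it exists (Lua 5.1), otherwise load (Lua 5.2+).
local compile = loadstring or load

-- Drop a leading #! line; loading from a string does not skip it the way
-- loadfile() does for files whose first line starts with "#".
local function strip_shbang(src)
  return (src:gsub("^#![^\n]*\n?", "", 1))
end

local function dump_source(src)
  local fn, err = compile(strip_shbang(src), "=binequiv")
  if not fn then
    error(err)
  end
  -- `true` asks Lua 5.3+ to strip debug information; Lua 5.1 and 5.2 ignore
  -- the argument and keep line numbers in the dump.
  return string.dump(fn, true)
end

local function chunks_equivalent(before, after)
  return dump_source(before) == dump_source(after)
end
```

Because `linedefined` is written to the dump even when debug information is stripped (Lua 5.3 and 5.4), moving a nested function to a different line makes this naive comparison fail, which is why the real test ignores line number data.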

This test may also cause problems if you are cross-compiling.

Experimental Stuff

The --opt-experimental option applies experimental optimizations that generally make changes to “real” tokens. Such changes may or may not cause the result to fail binary chunk equivalence testing. They would likely fail source lexer stream equivalence testing, so the --noopt-srcequiv option needs to be applied so that LuaSrcDiet just gives a warning instead of an error.

For sample files, see the samples directory.

Currently implemented experimental optimizations are as follows:

Semicolon Operator Removal

The semicolon (;) operator is an optional operator that is used to separate statements. The optimization turns all of these operators into single spaces, which are then run through whitespace removal. At worst, there will be no change to file size.

Fails source lexer stream equivalence.
Passes binary chunk equivalence.
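Because a semicolon is also a field separator in table constructors, and because dropping it in front of a parenthesis can turn two statements into a function call (`b=f;(g)()` versus `b=f (g)()`), a safe version of this rewrite needs some care. The Lua sketch below works on a hypothetical list of `{type, text}` token pairs, with type names borrowed from the report above, and applies my own conservative rules: only semicolons outside any braces and not followed by `(` are replaced. It is not LuaSrcDiet's implementation.

```lua
local SKIP = { TK_SPACE = true, TK_EOL = true, TK_COMMENT = true, TK_LCOMMENT = true }

-- True if the next token that is not whitespace or a comment is "(".
local function followed_by_paren(tokens, i)
  for j = i + 1, #tokens do
    local kind, text = tokens[j][1], tokens[j][2]
    if not SKIP[kind] then
      return kind == "TK_OP" and text == "("
    end
  end
  return false
end

-- Replace statement-separating semicolons with single spaces.
local function remove_semicolons(tokens)
  local depth = 0  -- brace nesting; semicolons inside braces are left alone
  local out = {}
  for i, tk in ipairs(tokens) do
    local kind, text = tk[1], tk[2]
    if kind == "TK_OP" and text == "{" then
      depth = depth + 1
    elseif kind == "TK_OP" and text == "}" then
      depth = depth - 1
    end
    if kind == "TK_OP" and text == ";" and depth == 0
       and not followed_by_paren(tokens, i) then
      out[#out + 1] = { "TK_SPACE", " " }
    else
      out[#out + 1] = tk
    end
  end
  return out
end

-- Rebuild source text from a token list.
local function concat_tokens(tokens)
  local parts = {}
  for i, tk in ipairs(tokens) do
    parts[i] = tk[2]
  end
  return table.concat(parts)
end
```

As with LuaSrcDiet, the spaces produced here would then go through whitespace removal; the sketch stops before that step.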

Function Call Syntax Sugar Optimization

This optimization turns function calls that take a single string or long string parameter into their syntax-sugar representation, which leaves out the parentheses. Since strings can abut anything, each instance saves 2 bytes.

For example, the following:

fish("cow")fish('cow')fish([[cow]])

is turned into:

fish"cow"fish'cow'fish[[cow]]

Fails source lexer stream equivalence.
Passes binary chunk equivalence.
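A token-level Lua sketch of this rewrite follows, again over a hypothetical list of `{type, text}` pairs. The main hazard is telling a call's parentheses from grouping parentheses: `("cow"):upper()` must keep them, since a bare string literal cannot be indexed. The sketch treats `(` as opening a call only when it follows a name, `)` or `]`, which is my own conservative rule rather than LuaSrcDiet's.

```lua
local SKIP = { TK_SPACE = true, TK_EOL = true, TK_COMMENT = true, TK_LCOMMENT = true }

-- Previous token that is not whitespace or a comment, or nil.
local function prev_real(tokens, i)
  for j = i - 1, 1, -1 do
    if not SKIP[tokens[j][1]] then
      return tokens[j]
    end
  end
  return nil
end

-- Conservatively decide whether the "(" at position i opens a call's arguments.
local function opens_call(tokens, i)
  local p = prev_real(tokens, i)
  if not p then
    return false
  end
  return p[1] == "TK_NAME" or (p[1] == "TK_OP" and (p[2] == ")" or p[2] == "]"))
end

-- Rewrite f("s") / f('s') / f([[s]]) into f"s" / f's' / f[[s]].
local function sugar_calls(tokens)
  local out, i = {}, 1
  while i <= #tokens do
    local a, b, c = tokens[i], tokens[i + 1], tokens[i + 2]
    if a[1] == "TK_OP" and a[2] == "(" and opens_call(tokens, i)
       and b and (b[1] == "TK_STRING" or b[1] == "TK_LSTRING")
       and c and c[1] == "TK_OP" and c[2] == ")" then
      out[#out + 1] = b  -- keep the string, drop both parentheses
      i = i + 3
    else
      out[#out + 1] = a
      i = i + 1
    end
  end
  return out
end

-- Rebuild source text from a token list.
local function concat_tokens(tokens)
  local parts = {}
  for i, tk in ipairs(tokens) do
    parts[i] = tk[2]
  end
  return table.concat(parts)
end
```

Applied to the tokens of `fish("cow")fish('cow')fish([[cow]])`, this produces `fish"cow"fish'cow'fish[[cow]]`, while `x=("cow"):upper()` is left untouched.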

Other Experimental Optimizations

There are two more of these optimizations planned, before focus is turned to the Lua 5.2.x series:

Simple local keyword removal. Planned to work for a few kinds of patterns only.
User directed name replacement, which will need user input to modify names or identifiers used in table keys and function methods or fields.
