LuaSrcDiet features include the following:
-
Predefined default, --basic (token-only) and --maximum settings.
-
Avoid deleting a block comment with a certain message with --keep; this is for copyright or license texts.
-
Special handling for
#!
(shbang) lines and in functions,self
implicit parameters. -
Dumping of raw information using --dump-lexer and --dump-parser. See the
samples
directory. -
A HTML plugin: outputs files that highlights globals and locals, useful for eliminating globals. See the
samples
directory. -
An SLOC plugin: counts significant lines of Lua code, like SLOCCount.
-
Source and binary equivalence testing with --opt-srcequiv and --opt-binequiv.
List of optimizations:
-
Line endings are always normalized to LF, except those embedded in comments or strings.
-
--opt-comments: Removal of comments and comment blocks.
-
--opt-whitespace: Removal of whitespace, excluding end-of-line characters.
-
--opt-emptylines: Removal of empty lines.
-
--opt-eols: Removal of unnecessary end-of-line characters.
-
--opt-strings: Rewrite strings and long strings. See the
samples
directory. -
--opt-numbers: Rewrite numbers. See the
samples
directory. -
--opt-locals: Rename local variable names. Does not rename field or method names.
-
--opt-entropy: Tries to improve symbol entropy when renaming locals by calculating actual letter frequencies.
-
--opt-experimental: Apply experimental optimizations.
LuaSrcDiet tries to allow each option to be enabled or disabled separately, but they are not completely orthogonal.
If comment removal is disabled, LuaSrcDiet only removes trailing whitespace. Trailing whitespace is not removed in long strings, a warning is generated instead. If empty line removal is disabled, LuaSrcDiet keeps all significant code on the same lines. Thus, a user is able to debug using the original sources as a reference since the line numbering is unchanged.
String optimization deals mainly with optimizing escape sequences, but delimiters can be switched between single quotes and double quotes if the source size of the string can be reduced.
For long strings and long comments, LuaSrcDiet also tries to reduce the =
separators in the
delimiters if possible.
For number optimization, LuaSrcDiet saves space by trying to generate the shortest possible sequence, and in the process it does not produce “proper” scientific notation (e.g. 1.23e5) but does away with the decimal point (e.g. 123e3) instead.
The local variable name optimizer uses a full parser of Lua 5.1 source code, thus it can rename all local variables, including upvalues and function parameters.
It should handle the implicit self
parameter gracefully.
In addition, local variable names are either renamed into the shortest possible names following English frequent letter usage or are arranged by calculating entropy with the --opt-entropy option.
Variable names are reused whenever possible, reducing the number of unique variable names.
For example, for LuaSrcDiet.lua
(version 0.11.0), 683 local identifiers representing 88 unique names were optimized into 32 unique names, all which are one character in length, saving over 2600 bytes.
If you need some kind of reassurance that your app will still work at reduced size, see the section on verification below.
LuaSrcDiet needs a Lua 5.1.x (preferably Lua 5.1.4) binary to run. On Unix machines, one can use the following command line:
LuaSrcDiet myscript.lua -o myscript_.lua
On Windows machines, the above command line can be used on Cygwin, or you can run Lua with the LuaSrcDiet script like this:
lua LuaSrcDiet.lua myscript.lua -o myscript_.lua
When run without arguments, LuaSrcDiet prints a list of options.
Also, you can check the Makefile
for some examples of command lines to use.
For example, for maximum code size reduction and maximum verbosity, use:
LuaSrcDiet --maximum --details myscript.lua -o myscript_.lua
A sample output of LuaSrcDiet 0.11.0 for processing llex.lua
at --maximum settings is as follows:
Statistics for: LuaSrcDiet.lua -> sample/LuaSrcDiet.lua *** local variable optimization summary *** ---------------------------------------------------------- Variable Unique Decl. Token Size Average Types Names Count Count Bytes Bytes ---------------------------------------------------------- Global 10 0 19 95 5.00 ---------------------------------------------------------- Local (in) 88 153 683 3340 4.89 TOTAL (in) 98 153 702 3435 4.89 ---------------------------------------------------------- Local (out) 32 153 683 683 1.00 TOTAL (out) 42 153 702 778 1.11 ---------------------------------------------------------- *** lexer-based optimizations summary *** -------------------------------------------------------------------- Lexical Input Input Input Output Output Output Elements Count Bytes Average Count Bytes Average -------------------------------------------------------------------- TK_KEYWORD 374 1531 4.09 374 1531 4.09 TK_NAME 795 3963 4.98 795 1306 1.64 TK_NUMBER 54 59 1.09 54 59 1.09 TK_STRING 152 1725 11.35 152 1717 11.30 TK_LSTRING 7 1976 282.29 7 1976 282.29 TK_OP 997 1092 1.10 997 1092 1.10 TK_EOS 1 0 0.00 1 0 0.00 -------------------------------------------------------------------- TK_COMMENT 140 6884 49.17 1 18 18.00 TK_LCOMMENT 7 1723 246.14 0 0 0.00 TK_EOL 543 543 1.00 197 197 1.00 TK_SPACE 1270 2465 1.94 263 263 1.00 -------------------------------------------------------------------- Total Elements 4340 21961 5.06 2841 8159 2.87 -------------------------------------------------------------------- Total Tokens 2380 10346 4.35 2380 7681 3.23 --------------------------------------------------------------------
Overall, the file size is reduced by more than 9 kiB. Tokens in the above report can be classified into “real” or actual tokens, and “fake” or whitespace tokens. The number of “real” tokens remained the same. Short comments and long comments were completely eliminated. The number of line endings was reduced by 59, while all but 152 whitespace characters were optimized away. So, token separators (whitespace, including line endings) now takes up just 10 % of the total file size. No optimization of number tokens was possible, while 2 bytes were saved for string tokens.
For local variable name optimization, the report shows that 38 unique local variable names were reduced to 20 unique names. The number of identifier tokens should stay the same (there is currently no optimization option to optimize away non-essential or unused “real” tokens). Since there can be at most 53 single-character identifiers, all local variables are now one character in length. Over 600 bytes was saved. --details will give a longer report and much more information.
A sample output of LuaSrcDiet 0.12.0 for processing the one-file LuaSrcDiet.lua
program itself at --maximum and --opt-experimental settings is as follows:
*** local variable optimization summary *** ---------------------------------------------------------- Variable Unique Decl. Token Size Average Types Names Count Count Bytes Bytes ---------------------------------------------------------- Global 27 0 51 280 5.49 ---------------------------------------------------------- Local (in) 482 1063 4889 21466 4.39 TOTAL (in) 509 1063 4940 21746 4.40 ---------------------------------------------------------- Local (out) 55 1063 4889 4897 1.00 TOTAL (out) 82 1063 4940 5177 1.05 ---------------------------------------------------------- *** BINEQUIV: binary chunks are sort of equivalent Statistics for: LuaSrcDiet.lua -> app_experimental.lua *** lexer-based optimizations summary *** -------------------------------------------------------------------- Lexical Input Input Input Output Output Output Elements Count Bytes Average Count Bytes Average -------------------------------------------------------------------- TK_KEYWORD 3083 12247 3.97 3083 12247 3.97 TK_NAME 5401 24121 4.47 5401 7552 1.40 TK_NUMBER 467 494 1.06 467 494 1.06 TK_STRING 787 7983 10.14 787 7974 10.13 TK_LSTRING 14 3453 246.64 14 3453 246.64 TK_OP 6381 6861 1.08 6171 6651 1.08 TK_EOS 1 0 0.00 1 0 0.00 -------------------------------------------------------------------- TK_COMMENT 1611 72339 44.90 1 18 18.00 TK_LCOMMENT 18 4404 244.67 0 0 0.00 TK_EOL 4419 4419 1.00 1778 1778 1.00 TK_SPACE 10439 24475 2.34 2081 2081 1.00 -------------------------------------------------------------------- Total Elements 32621 160796 4.93 19784 42248 2.14 -------------------------------------------------------------------- Total Tokens 16134 55159 3.42 15924 38371 2.41 -------------------------------------------------------------------- * WARNING: before and after lexer streams are NOT equivalent!
The command line was:
lua LuaSrcDiet.lua LuaSrcDiet.lua -o app_experimental.lua --maximum --opt-experimental --noopt-srcequiv
The important thing to note is that while the binary chunks are equivalent, the source lexer streams are not equivalent. Hence, the --noopt-srcequiv makes LuaSrcDiet report a warning for failing the source equivalence test.
LuaSrcDiet.lua
was reduced from 157 kiB to about 41.3 kiB.
The --opt-experimental option saves an extra 205 bytes over standard --maximum.
Note the reduction in TK_OP
count due to a reduction in semicolons and parentheses.
TK_SPACE
has actually increased a bit due to semicolons that are changed into single spaces; some of these spaces could not be removed.
For more performance numbers, see the Performance Statistics page.
Code size reduction can be quite a hairy thing (even I peer at the results in suspicion), so some kind of verification is desirable for users who expect processed files to not blow up.
Since LuaSrcDiet has been talked about as a tool to reduce code size in projects such as WoW add-ons, eLua
and nspire
, adding a verification step will reduce risk for all users of LuaSrcDiet.
LuaSrcDiet performs two kinds of equivalence testing as of version 0.12.0. The two tests can be very, very loosely termed as source equivalence testing and binary equivalence testing. They are controlled by the --opt-srcequiv and --opt-binequiv options and are enabled by default.
Testing behaviour can be summarized as follows:
-
Both tests are always executed. The options control the resulting actions taken.
-
Both options are normally enabled. This will make any failing test to throw an error.
-
When an option is disabled, LuaSrcDiet will at most print a warning.
-
For passing results, see the following subsections that describe what the tests actually does.
You only need to disable a testing option for experimental optimizations (see the following section for more information on this). For anything up to and including --maximum, both tests should pass. If any test fail under these conditions, then something has gone wrong with LuaSrcDiet, and I would be interested to know what has blown up.
The source equivalence test uses LuaSrcDiet’s lexer to read and compare the before and after lexer token streams.
Numbers and strings are dumped as binary chunks using loadstring()
and string.dump()
and the results compared.
If your file passes this test, it means that a Lua 5.1.x binary should see the exact same token streams for both before and after files. That is, the parser in Lua will see the same lexer sequence coming from the source for both files and thus they should be equivalent. Touch wood. Heh.
However, if you are cross-compiling, it may be possible for this test to fail.
Experienced Lua developers can modify equiv.lua
to handle such cases.
The binary equivalence test uses loadstring()
and string.dump()
to generate binary chunks of the entire before and after files.
Also, any shbang (#!
) lines are removed prior to generation of the binary chunks.
The binary chunks are then run through a fake undump
routine to verify the integrity of the binary chunks and to compare all parts that ought to be identical.
On a per-function prototype basis (where ignored means that any difference between the two binary chunks is ignored):
-
All debug information is ignored.
-
The source name is ignored.
-
Any line number data is ignored. For example,
linedefined
andlastlinedefined
.
The rest of the two binary chunks must be identical. So, while the two are not binary-exact, they can be loosely termed as “equivalent” and should run in exactly the same manner. Sort of. You get the idea.
This test may also cause problems if you are cross-compiling.
The --opt-experimental option applies experimental optimizations that generally, makes changes to “real” tokens. Such changes may or may not lead to the result failing binary chunk equivalence testing. They would likely fail source lexer stream equivalence testing, so the --noopt-srcequiv option needs to be applied so that LuaSrcDiet just gives a warning instead of an error.
For sample files, see the samples
directory.
Currently implemented experimental optimizations are as follows:
The semicolon (;
) operator is an optional operator that is used to separate statements.
The optimization turns all of these operators into single spaces, which are then run through whitespace removal.
At worst, there will be no change to file size.
-
Fails source lexer stream equivalence.
-
Passes binary chunk equivalence.
This optimization turns function calls that takes a single string or long string parameter into its syntax-sugar representation, which leaves out the parentheses. Since strings can abut anything, each instance saves 2 bytes.
For example, the following:
fish("cow")fish('cow')fish([[cow]])
is turned into:
fish"cow"fish'cow'fish[[cow]]
-
Fails source lexer stream equivalence.
-
Passes binary chunk equivalence.
There are two more of these optimizations planned, before focus is turned to the Lua 5.2.x series:
-
Simple
local
keyword removal. Planned to work for a few kinds of patterns only. -
User directed name replacement, which will need user input to modify names or identifiers used in table keys and function methods or fields.