A tiny, fast JavaScript and Objective-J parser with a complete implementation of the GNU C preprocessor, written completely in JavaScript.
objj-acorn is an extension of the acorn JavaScript parser by Marijn Haverbeke, and is the foundation of the Objective-J compiler.
Acorn can be invoked in several ways.
- From a Node script.
- From the command line.
- From a browser script.
To use acorn from a Node script, install objj-acorn as a package as usual using npm:
npm install objj-acorn
Alternately, download the source and link to that:
git clone https://github.com/mrcarlberg/acorn.git
cd acorn
npm link
cd /path/to/project
npm link objj-acorn
Now you can require objj-acorn in your node scripts. The main entrypoint to acorn is the parse function, which returns an object with the AST nodes:
var fs = require('fs'),
    acorn = require('objj-acorn');

try
{
    // pathToFile is the path of the source file you want to parse
    var options = {trackComments: true},
        code = fs.readFileSync(pathToFile, 'utf8'),
        ast = acorn.parse(code, options);
}
catch (e)
{
    console.error(e.message);
    process.exit(1);
}
To use acorn from the command line, use the acorn binary, which is installed when you use npm to install or link the objj-acorn package. Alternately, you can execute bin/acorn directly. The syntax is as follows:
usage: acorn [options] file

Parses <file> and outputs the parsed AST as a stringified JSON object.
If <file> is '-', reads from stdin. You may also pipe or redirect into acorn and omit <file>.

Options:
  --ecma3|--ecma5     Sets the ECMAScript version to parse. Default is version 5.
  --strictSemicolons  Prevents the parser from doing automatic semicolon insertion.
                      Statements that do not end in semicolons will generate an error.
  --trackComments     Attaches an array of comments found before and after each expression node.
  --trackSpaces       Attaches an array of whitespaces found before and after each expression node.
  --locations         Attaches a "loc" object to each node with "start" and "end" subobjects,
                      each of which contains the one-based line and zero-based column numbers.
  -Dmacro[([param, ...])][=definition]
                      Defines a macro. A name with no parameters and no definition will be defined
                      with the value 1. To be safe from shell expansion, the values on either side
                      of the = should be enclosed in '', for example -D'PLUS_ONE(arg)'='arg + 1'.
                      May be used multiple times to define multiple macros.
  --no-objj           Turns off Objective-J syntax parsing.
  --no-preprocess     Turns off the preprocessor. Results in ~30% speed increase.
  --compact           No whitespace is used in the AST output.
  --silent            Do not output the AST, just return the exit status.
  --version           Print the current version and exit.
  --help              Print this usage information and exit.
As noted above, the acorn binary accepts input from a file or from stdin, so you can do all of these:
acorn test.js
cat test.js | acorn
acorn < test.js
acorn -
In the last instance, acorn - allows you to type one or more lines of code directly and terminate the input with Control-D (on POSIX systems).
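When the acorn binary is driven from a build script, the flags listed above can be assembled programmatically. The following is only a sketch: buildAcornArgs and the shape of its settings object are hypothetical names chosen for illustration, and it emits only flags that appear in the usage text.

```javascript
// Hypothetical helper: turns a plain settings object into an argument list
// for the acorn binary, using only flags documented in the usage text.
function buildAcornArgs(settings, file)
{
    var args = [];

    // ECMAScript 5 is the default, so only version 3 needs a flag
    if (settings.ecmaVersion === 3)
        args.push('--ecma3');

    // Simple on/off flags share their name with the settings key
    ['strictSemicolons', 'trackComments', 'trackSpaces', 'locations', 'compact', 'silent'].forEach(function(name)
    {
        if (settings[name])
            args.push('--' + name);
    });

    // Objective-J parsing and the preprocessor are on unless explicitly disabled
    if (settings.objj === false)
        args.push('--no-objj');

    if (settings.preprocess === false)
        args.push('--no-preprocess');

    // Each macro definition becomes one -D argument
    (settings.macros || []).forEach(function(macro)
    {
        args.push('-D' + macro);
    });

    // '-' tells acorn to read from stdin
    args.push(file || '-');

    return args;
}

var args = buildAcornArgs({locations: true, preprocess: false, macros: ['DEBUG', 'PLUS_ONE(arg)=arg + 1']}, 'test.js');

console.log(args);
```

If the resulting array is handed to Node's child_process.execFile, no shell is involved, so the quoting advice for -D does not apply; each array element reaches acorn as a single argument.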
To use acorn in the browser, load acorn.js with a <script> tag. If you are using the preprocessor, you must also load util/walk.js first:
<script src="util/walk.js" type="text/javascript"></script>
<script src="acorn.js" type="text/javascript"></script>
Acorn is compatible with AMD, so you may also use loaders like require.js to load acorn in the browser. When loaded without any kind of module management, a single global object acorn will be defined, and all the exported properties will be added to that.
Once acorn is loaded, you may use acorn within your own scripts by calling functions or accessing attributes of the acorn object:
- parse(input, options) – This is the main interface used to parse a JavaScript program. The input parameter is a string; options can be undefined or an object setting one or more of the options listed below. The return value will be an abstract syntax tree object as specified by the Mozilla Parser API.
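Because the result follows the Mozilla Parser API, every node is a plain object with a string type property, so a tree can be explored with a generic recursive walk. The sketch below counts nodes by type. countNodeTypes is a hypothetical helper, the hand-written tree merely stands in for what parse might return for a one-line program, and the walk assumes the tree contains no cycles.

```javascript
// Hypothetical helper: recursively counts AST nodes by their "type" property.
function countNodeTypes(node, counts)
{
    counts = counts || {};

    if (Array.isArray(node))
    {
        node.forEach(function(child)
        {
            countNodeTypes(child, counts);
        });
    }
    else if (node && typeof node === 'object')
    {
        // Only objects with a string "type" are treated as AST nodes
        if (typeof node.type === 'string')
            counts[node.type] = (counts[node.type] || 0) + 1;

        // Descend into every property; child nodes may live under any key
        Object.keys(node).forEach(function(key)
        {
            countNodeTypes(node[key], counts);
        });
    }

    return counts;
}

// Hand-written stand-in for the tree of the program "x = 1;"
var sampleTree = {
    type: 'Program',
    body: [
        {
            type: 'ExpressionStatement',
            expression: {
                type: 'AssignmentExpression',
                operator: '=',
                left: {type: 'Identifier', name: 'x'},
                right: {type: 'Literal', value: 1}
            }
        }
    ]
};

console.log(countNodeTypes(sampleTree));
```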
- getLineInfo(input, offset) – This function can be used to get a {line, column} object for a given program string and character offset.
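To make the one-based line and zero-based column convention concrete, here is an illustrative re-implementation of the same idea. It is not acorn's code: lineInfoAt is a hypothetical name, and for simplicity it treats only \n as a line break.

```javascript
// Hypothetical illustration: maps a character offset to a {line, column}
// object with a one-based line and a zero-based column.
function lineInfoAt(input, offset)
{
    var line = 1,
        lineStart = 0;

    // Count the line breaks that occur before the offset
    for (var i = 0; i < offset && i < input.length; i++)
    {
        if (input.charAt(i) === '\n')
        {
            line++;
            lineStart = i + 1;
        }
    }

    return {line: line, column: offset - lineStart};
}

var source = 'var a = 1;\nvar b = 2;';

// The identifier b sits on the second line, four characters in
console.log(lineInfoAt(source, source.indexOf('b')));  // { line: 2, column: 4 }
```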
- tokenize(input, options) – This function exports a primitive interface to Acorn's tokenizer. The function takes an input string and options similar to parse (though only some options are meaningful here), and returns a function that can be called repeatedly to read a single token. The tokenizing function returns a {start, end, type, value} object (with added startLoc and endLoc properties when the locations option is enabled). This object will be reused (updated) for each token, so you can't count on it staying stable.
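Because the token object is reused, anything you want to keep has to be copied before the next token is requested. The sketch below shows that pattern. collectTokens is a hypothetical helper that receives the function returned by tokenize together with the token type that marks the end of input; that an eof entry exists in tokTypes is an assumption you should verify against your version. A stub tokenizer stands in for acorn so the example runs on its own.

```javascript
// Hypothetical helper: drains a tokenizing function, copying each token
// because the tokenizer reuses a single object.
function collectTokens(getToken, eofType)
{
    var tokens = [];

    for (;;)
    {
        var token = getToken();

        if (token.type === eofType)
            break;

        // Copy the fields now; the next call to getToken() overwrites them
        tokens.push({start: token.start, end: token.end, type: token.type, value: token.value});
    }

    return tokens;
}

// Stand-in for the function returned by tokenize: like the real one,
// it updates and returns the same object on every call.
function makeStubTokenizer(values, eofType)
{
    var shared = {},
        index = 0;

    return function()
    {
        var done = index >= values.length;

        shared.type = done ? eofType : 'stub';
        shared.value = done ? undefined : values[index];
        shared.start = index;
        shared.end = index + 1;
        index++;

        return shared;
    };
}

var EOF = {type: 'eof'},
    tokens = collectTokens(makeStubTokenizer(['a', 'b'], EOF), EOF);

console.log(tokens.map(function(token) { return token.value; }));  // [ 'a', 'b' ]
```

With acorn loaded, the call would presumably look like collectTokens(acorn.tokenize(code, options), acorn.tokTypes.eof), subject to the assumption noted above.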
- tokTypes – This attribute holds an object mapping names to the token type objects that end up in the type properties of tokens.
When an error occurs, acorn throws a SyntaxError with the following attributes:
- message – A descriptive message of the error. If the lineNoInErrorMessage option is on, the error message will end with (line:column), where line is the one-based line number on which the error occurred, and column is the zero-based column within that line.
- line – The one-based line number on which the error occurred.
- column – The zero-based column number within line.
- lineStart – The zero-based character position of the start of line.
- lineEnd – The zero-based character position of the end of line.
- fileName – The value of the sourceFile option passed in to acorn, or null if none was passed in.
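These attributes are enough to print a compiler-style diagnostic with the offending line and a caret under the error column. The sketch below uses a hand-built object in place of a real SyntaxError. describeSyntaxError is a hypothetical helper, and it assumes lineEnd points just past the last character of the line, before any line terminator.

```javascript
// Hypothetical helper: formats an error carrying the attributes listed above.
// Assumes error.lineEnd is the position just past the last character of the line.
function describeSyntaxError(error, source)
{
    var sourceLine = source.slice(error.lineStart, error.lineEnd),
        // column is zero-based, so it equals the number of spaces before the caret
        caret = new Array(error.column + 1).join(' ') + '^',
        location = (error.fileName || '<unknown>') + ':' + error.line + ':' + error.column;

    return location + ': ' + error.message + '\n' + sourceLine + '\n' + caret;
}

var badSource = 'var a = 1;\nvar = 2;';

// Hand-built stand-in for an error thrown while parsing badSource
var fakeError = {
    message: 'Unexpected token',
    line: 2,
    column: 4,
    lineStart: 11,
    lineEnd: 19,
    fileName: null
};

console.log(describeSyntaxError(fakeError, badSource));
```

Since this output already includes the line and column, it pairs best with lineNoInErrorMessage set to false; otherwise the location appears twice.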
The optional second parameter to the parse and tokenize functions is an options object. Acorn supports a number of options that control its behavior and its output.
- ecmaVersion – Indicates the ECMAScript version to parse. Must be either 3 or 5. This influences support for strict mode, the set of reserved words, and support for getters and setters. Default: 5
- strictSemicolons – If true, prevents the parser from doing automatic semicolon insertion, and statements that do not end with a semicolon will generate an error. Default: false
- allowTrailingCommas – If false, the parser will not allow trailing commas in array and object literals.
- forbidReserved – If true, using a reserved word as an identifier will generate an error. If "everywhere", using a reserved word as a member of an object will generate an error. Default: false
- trackComments – If true, the parser attaches "commentsBefore" and "commentsAfter" properties, which contain an array of comments, to each AST node that has comments before or after. A single comment may appear in both "commentsBefore" and "commentsAfter". Default: false
- trackCommentsIncludeLineBreak – When true, the parser will include, if present, line breaks before comments and all the whitespace in between. Default: false
- trackSpaces – If true, the parser attaches "spacesBefore" and "spacesAfter" properties, which contain an array of whitespace, to each AST node that has whitespace before or after. The same whitespace may appear in both "spacesBefore" and "spacesAfter". Default: false
- locations – When true, each node has a "loc" object attached with "start" and "end" subobjects, each of which contains the one-based line and zero-based column numbers in {line, column} form. Default: false
- onComment – If a function is passed for this option, whenever a comment is encountered the function will be called with the following parameters:
  - block – true if the comment is a block comment, false if it is a line comment.
  - text – The content of the comment.
  - start – Character offset of the start of the comment.
  - end – Character offset of the end of the comment.
  When the locations option is on, the {line, column} locations of the comment’s start and end are passed as two additional parameters. Default: null
- ranges – Nodes have their start and end character offsets recorded in "start" and "end" properties (directly on the node, rather than the "loc" object, which holds line/column data). To also add a semi-standardized "range" property holding a [start, end] array with the same numbers, set the ranges option to true. Default: false
- program – It is possible to parse multiple files into a single AST by passing the tree produced by parsing the first file as the program option in subsequent parses. This will add the toplevel forms of the parsed file to the "Program" (top) node of an existing parse tree. Default: null
- sourceFile – When the locations option is true, you can pass this option to record the source file in every node’s loc object. Note that the contents of this option are not examined or processed in any way; you are free to use whatever format you choose. When acorn is invoked via the command line, this option is set to the full path of the file being parsed. Default: null
- directSourceFile – Like sourceFile, but the property will be added directly to the nodes, rather than to a loc object.
- objj – When true, the parser recognizes and parses Objective-J syntax. Default: true
- preprocess – When true, the parser recognizes and follows preprocessor directives. For more information, see the Preprocessor section below. Default: true
- macros – When preprocess is true, you may pass an array of macro objects and/or text definitions in this option, which will create predefined macros with the given names. Macro objects will be added as is. Text definitions may be in one of three forms:
  - name
  - name=definition
  - name(parameter, ...)=definition
  The first form predefines name as a macro with the value 1. The second form predefines the object macro name, and the contents of definition are tokenized and processed as if they appeared in a #define directive. The third form is similar to the second form, but predefines the function macro name with the given parameters and definition. Default: null
- lineNoInErrorMessage – When true, error messages are suffixed with (line:column), where line is the one-based line number on which the error occurred, and column is the zero-based column within that line. Default: true
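As one illustration of how these options fit together, the sketch below builds an onComment callback that records every comment it is handed. makeCommentCollector is a hypothetical helper. Since acorn is not loaded here, the callback is invoked by hand with made-up values using the parameter order described under onComment.

```javascript
// Hypothetical helper: returns an options object whose onComment callback
// appends every reported comment to a shared array.
function makeCommentCollector()
{
    var comments = [];

    function onComment(block, text, start, end, startLoc, endLoc)
    {
        var comment = {block: block, text: text, start: start, end: end};

        // startLoc and endLoc are only passed when the locations option is on
        if (startLoc && endLoc)
            comment.loc = {start: startLoc, end: endLoc};

        comments.push(comment);
    }

    return {comments: comments, options: {onComment: onComment}};
}

var collector = makeCommentCollector();

// With acorn loaded, collector.options would be passed as the second argument to parse.
// Here the callback is called directly to show what gets recorded.
collector.options.onComment(false, ' a line comment', 0, 17);
collector.options.onComment(true, ' a block comment ', 18, 39, {line: 2, column: 0}, {line: 2, column: 21});

console.log(collector.comments.length);  // 2
```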
When the preprocess option is true (the default), acorn recognizes preprocessor directives defined in the GNU C 4.0 Preprocessor User Guide. Acorn offers a complete implementation of the following preprocessor features:
- Macros
- Conditionals
- Diagnostics
#pragma is recognized, but the remaining tokens on the line are not examined.
In conjunction with the macros option, there are many powerful uses of the preprocessor. For example, you can define a debug logging macro that will compile to nothing unless you pass ["DEBUG"] in the macros option (or -DDEBUG from the command line):
#if DEBUG
#define LOG(format, args...) console.log(format, ##args)
#else
#define LOG(...)
#endif
LOG("(%d, %d)", x, y);
LOG("This is awesome!");
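The text forms accepted by the macros option are easy to generate from configuration data. As a sketch, the hypothetical helper below (toMacroDefinitions is not part of acorn) turns a plain object into an array of definitions: a value of true yields the bare name, which is then defined as 1, and any other value yields the name=definition form.

```javascript
// Hypothetical helper: converts {NAME: value} pairs into the text
// definitions understood by the macros option.
function toMacroDefinitions(settings)
{
    return Object.keys(settings).map(function(name)
    {
        var value = settings[name];

        // A bare name is predefined with the value 1
        return value === true ? name : name + '=' + value;
    });
}

var definitions = toMacroDefinitions({DEBUG: true, LOG_LEVEL: 2, 'PLUS_ONE(arg)': 'arg + 1'});

console.log(definitions);  // [ 'DEBUG', 'LOG_LEVEL=2', 'PLUS_ONE(arg)=arg + 1' ]
```

The resulting array can be passed as the value of the macros option in the options object given to parse.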
After calling the acorn.parse function, you may retrieve any defined macros with the acorn.getMacros function. If the preprocessor is off, it will return null. If the preprocessor is on, it will return an array of macro objects defined during the call to acorn.parse, including any macros you passed in via options.macros, but not including acorn’s own predefined macros.
This allows you to pass the macros created in one file to other files. For example, let’s say you have a file that defines debug logging macros:
#define LOG_LEVEL_NONE 0
#define LOG_LEVEL_DEBUG 1
#define LOG_LEVEL_INFO 2
#define LOG_LEVEL_VERBOSE 3
#if LOG_LEVEL >= LOG_LEVEL_DEBUG
#define LOG(format, args...) console.log(format, ##args)
#else
#define LOG(...)
#endif
#if LOG_LEVEL >= LOG_LEVEL_INFO
#define LOG_INFO(format, args...) console.log(format, ##args)
#else
#define LOG_INFO(...)
#endif
#if LOG_LEVEL >= LOG_LEVEL_VERBOSE
#define LOG_VERBOSE(format, args...) console.log(format, ##args)
#else
#define LOG_VERBOSE(...)
#endif
We can create these macros by parsing the file and passing in a definition for LOG_LEVEL. For example, let’s say we want LOG_LEVEL_INFO (2):

var fs = require('fs'),
    acorn = require('objj-acorn');

function getLoggingMacros(logLevel)
{
    var code = fs.readFileSync('logging-defines.js', 'utf8');

    // Parsing runs the #define directives in the file; the resulting macros are then available from getMacros()
    acorn.parse(code, {macros: ['LOG_LEVEL=' + logLevel]});

    return acorn.getMacros();
}

// 2 corresponds to LOG_LEVEL_INFO in logging-defines.js
var macros = getLoggingMacros(2);

We use these logging macros in all of our source files. Here’s an example:

if (someFlag === 0)
{
    LOG_INFO("someFlag was 0!");
    doSomething();
}
else
{
    LOG_VERBOSE("someFlag == " + someFlag);
    doSomethingElse();
}

We can pass the macros defined in "logging-defines.js" to another file that uses the logging macros:

var code = fs.readFileSync('something.js', 'utf8'),
    ast = acorn.parse(code, {macros: macros});

And in this case, where LOG_LEVEL was defined as 2 (LOG_LEVEL_INFO), the following code would be generated:
if (someFlag === 0)
{
    console.log("someFlag was 0!");
    doSomething();
}
else
{
    doSomethingElse();
}
Acorn defines the following predefined macros:
- _OBJJ_ – Defined with the value 1 if the objj option is true, otherwise undefined.
- _BROWSER_ – Defined with the value 1 if acorn is executed in a browser, otherwise undefined.
For the supported features mentioned above, the acorn preprocessor implementation is identical in every detail to the GNU C preprocessor, with the following exceptions:
- For various reasons, preprocessor directives are effectively implemented as JavaScript statements, and thus may not break up regular JavaScript statements. For example, you cannot do this:

  #ifdef FOO
  function foo()
  #else
  function bar()
  #endif
  {
      doSomething();
  }

  This doesn’t work because the body of a function is actually part of the function statement. So when the parser reaches the {, it chokes, because that isn’t a valid statement.
- Preprocessor directives may not be used within macro arguments.
- If you use regular expression literals in a macro, to be safe you should enclose them in parentheses to be sure they are parsed as regular expressions. This is due to an ambiguity in the ECMAScript grammar. For example, you would do this:

  #define isFoo(arg) (/foo/).test(arg)

  if (isFoo('bar'))