`verible-verilog-syntax` checks SystemVerilog syntax and provides some useful options for examining lexed/parsed representations. When troubleshooting Verible's SystemVerilog tools, start with this tool.
You can read about lexer and parser implementation details here.
```
usage: verible-verilog-syntax [options] <file(s)...>

  Flags from verilog/tools/syntax/verilog_syntax.cc:
    --error_limit (Limit the number of syntax errors reported. (0: unlimited));
      default: 0;
    --export_json (Uses JSON for output. Intended to be used as an input for
      other tools.); default: false;
    --lang (Selects language variant to parse. Options:
      auto: SystemVerilog-2017, but may auto-detect alternate parsing modes
      sv: strict SystemVerilog-2017, with explicit alternate parsing modes
      lib: Verilog library map language (LRM Ch. 33)
      ); default: auto;
    --printrawtokens (Prints all lexed tokens, including filtered ones.);
      default: false;
    --printtokens (Prints all lexed and filtered tokens); default: false;
    --printtree (Whether or not to print the tree); default: false;
    --verifytree (Verifies that all tokens are parsed into tree, prints
      unmatched tokens); default: false;
```
The parser supports alternative parsing modes for files that are intended to be included in another context, such as a file containing only module body items. These modes can be triggered with comments near the top of the file, like `// verilog_syntax: parse-as-module-body`.
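Below is a minimal Python sketch for preparing such a file. It assumes the directive takes effect when it is placed on the very first line. The helper names are hypothetical. `check_syntax` runs `verible-verilog-syntax` on the file and returns the process exit code, or `None` when the tool is not on `PATH`.

```python
import os
import shutil
import subprocess
import tempfile

# The directive comment quoted in the text above.
DIRECTIVE = "// verilog_syntax: parse-as-module-body"


def write_module_body_file(body: str) -> str:
    """Write module items (no `module ... endmodule` wrapper) to a new .sv
    file whose first line is the parse-mode directive. Returns the path."""
    fd, path = tempfile.mkstemp(suffix=".sv")
    with os.fdopen(fd, "w") as f:
        f.write(DIRECTIVE + "\n")
        f.write(body)
    return path


def check_syntax(path: str):
    """Run verible-verilog-syntax on `path` and return its exit code,
    or None if the tool is not on PATH."""
    tool = shutil.which("verible-verilog-syntax")
    if tool is None:
        return None
    return subprocess.run([tool, path]).returncode
```

For example, `check_syntax(write_module_body_file("wire w;\nassign w = 1'b0;\n"))` checks a fragment that has module items but no `module` header.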
The following code:
```systemverilog
// This is module foo.
module foo(input a, b, output z);
endmodule : foo
```
produces the following tokens (shown using `--printrawtokens`):
```
All lexed tokens:
(#"// end of line comment" @0-22: "// This is module foo.")
(#"<<\\n>>" @22-23: "
")
(#"module" @23-29: "module")
(#"<<space>>" @29-30: " ")
(#SymbolIdentifier @30-33: "foo")
(#'(' @33-34: "(")
(#"input" @34-39: "input")
(#"<<space>>" @39-40: " ")
(#SymbolIdentifier @40-41: "a")
(#',' @41-42: ",")
(#"<<space>>" @42-43: " ")
(#SymbolIdentifier @43-44: "b")
(#',' @44-45: ",")
(#"<<space>>" @45-46: " ")
(#"output" @46-52: "output")
(#"<<space>>" @52-53: " ")
(#SymbolIdentifier @53-54: "z")
(#')' @54-55: ")")
(#';' @55-56: ";")
(#"<<\\n>>" @56-57: "
")
(#"endmodule" @57-66: "endmodule")
(#"<<space>>" @66-67: " ")
(#':' @67-68: ":")
(#"<<space>>" @68-69: " ")
(#SymbolIdentifier @69-72: "foo")
(#"<<\\n>>" @72-73: "
")
(#"<<\\n>>" @73-74: "
")
(#$end @74-74: "")
```
The token names (after `#`) correspond to description strings in the yacc grammar file. Keywords are shown the same as the text they match. Byte offsets are shown as the range that follows `@`. The raw, unfiltered token stream is lossless with respect to the original input text: concatenating the text of every token reproduces the input exactly.
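The following minimal Python sketch makes the lossless property concrete. It parses `--printrawtokens` output in the format shown above and rebuilds the source text from it. The function names are hypothetical. The regular expression assumes every token is printed exactly as `(#TAG @START-END: "TEXT")`, with TEXT written verbatim, so newline tokens span two lines as they do above.

```python
import re

# One printed token: (#TAG @START-END: "TEXT").
# Assumption: TEXT never contains the two characters `")`.
TOKEN_RE = re.compile(
    r'\(#(?P<tag>.+?) @(?P<start>\d+)-(?P<end>\d+): "(?P<text>.*?)"\)',
    re.DOTALL,  # let TEXT span lines, as newline tokens do
)


def parse_raw_tokens(output: str):
    """Return (tag, start, end, text) tuples from --printrawtokens output."""
    return [
        (m["tag"], int(m["start"]), int(m["end"]), m["text"])
        for m in TOKEN_RE.finditer(output)
    ]


def reconstruct(tokens) -> str:
    """Concatenate token texts after checking that byte ranges tile the input."""
    pos = 0
    for tag, start, end, _text in tokens:
        if start != pos:
            raise ValueError(f"gap or overlap before {tag} at byte {start}")
        pos = end
    return "".join(text for _tag, _start, _end, text in tokens)
```

The regex does not match the `All lexed tokens:` header. Feeding the sketch the complete listing above should therefore return the original input, including its trailing blank line.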
With `--printtokens`, you should see whitespace tokens filtered out.
The following code (same as above):
```systemverilog
// This is module foo.
module foo(input a, b, output z);
endmodule : foo
```
produces this concrete syntax tree (CST), rendered by `verible-verilog-syntax --printtree`:
```
Parse Tree:
Node @0 (tag: kDescriptionList) {
  Node @0 (tag: kModuleDeclaration) {
    Node @0 (tag: kModuleHeader) {
      Leaf @0 (#"module" @23-29: "module")
      Leaf @2 (#SymbolIdentifier @30-33: "foo")
      Node @5 (tag: kParenGroup) {
        Leaf @0 (#'(' @33-34: "(")
        Node @1 (tag: kPortDeclarationList) {
          Node @0 (tag: kPortDeclaration) {
            Leaf @0 (#"input" @34-39: "input")
            Node @2 (tag: kDataType) {
            }
            Node @3 (tag: kUnqualifiedId) {
              Leaf @0 (#SymbolIdentifier @40-41: "a")
            }
            Node @4 (tag: kUnpackedDimensions) {
            }
          }
          Leaf @1 (#',' @41-42: ",")
          Node @2 (tag: kPort) {
            Node @0 (tag: kPortReference) {
              Node @0 (tag: kUnqualifiedId) {
                Leaf @0 (#SymbolIdentifier @43-44: "b")
              }
            }
          }
          Leaf @3 (#',' @44-45: ",")
          Node @4 (tag: kPortDeclaration) {
            Leaf @0 (#"output" @46-52: "output")
            Node @2 (tag: kDataType) {
            }
            Node @3 (tag: kUnqualifiedId) {
              Leaf @0 (#SymbolIdentifier @53-54: "z")
            }
            Node @4 (tag: kUnpackedDimensions) {
            }
          }
        }
        Leaf @2 (#')' @54-55: ")")
      }
      Leaf @7 (#';' @55-56: ";")
    }
    Node @1 (tag: kModuleItemList) {
    }
    Leaf @2 (#"endmodule" @57-66: "endmodule")
    Node @3 (tag: kLabel) {
      Leaf @0 (#':' @67-68: ":")
      Leaf @1 (#SymbolIdentifier @69-72: "foo")
    }
  }
}
```
The `N` in `Node @N` or `Leaf @N` is the child rank of that node or leaf within its immediate parent node, starting at 0. `nullptr` nodes are skipped, so they appear as gaps in the rank sequence.
Nodes of the CST may link to other nodes or to leaves, which contain tokens. Nodes are tagged with language-specific enumerations. Each leaf encapsulates a token and is shown with that token's byte offsets in the original text (as `@left-right`). Null nodes are not shown.
When the `--export_json` flag is set, the concrete syntax tree is printed as a JSON object. See the parser tree object description below for details.
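The same tree can be walked from the JSON output. The minimal Python sketch below prints a JSON CST in a layout similar to `--printtree`. It assumes that null children are kept as `null` entries in each `children` array, so a child's array index equals its rank. Skipping those entries then reproduces the rank gaps described above. The function name `render` is hypothetical. Its leaf format leaves out the token text and the quoting that `--printtree` applies to keyword tags.

```python
def render(node, rank=0, depth=0, out=None):
    """Render a JSON CST (from --export_json) in a --printtree-like layout.

    Assumes Node objects have "tag" and "children", Token objects have
    "tag", "start", "end" (and maybe "text"), and null children are kept
    as placeholders so that a child's list index is its rank.
    """
    if out is None:
        out = []
    pad = "  " * depth
    if "children" in node:  # Node object
        out.append(f"{pad}Node @{rank} (tag: {node['tag']}) {{")
        for i, child in enumerate(node["children"]):
            if child is not None:  # skipped nulls leave gaps in the ranks
                render(child, i, depth + 1, out)
        out.append(f"{pad}}}")
    else:  # Token object, i.e. a leaf
        out.append(f"{pad}Leaf @{rank} (#{node['tag']} @{node['start']}-{node['end']})")
    return out
```

Printing `"\n".join(render(tree))` for a parsed file gives an indented outline whose ranks can be compared with the `--printtree` output.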
The exact structure of the SystemVerilog CST is fragile and should not be considered stable. At any time, node enumerations can be created or removed, and subtree structures can be re-shaped. In the above example, `kModuleHeader` is an implementation detail of a module definition's composition and doesn't map directly to a named grammar construct in the SV-LRM. The `verilog/CST` library provides functions that abstract away internal structure.
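The JSON output exposes the same unstable structure. One way for its consumers to limit breakage is to locate nodes by tag instead of hard-coding child positions such as `tree["children"][0]["children"][0]`. Below is a minimal Python sketch with a hypothetical `find_nodes` helper. It is not the `verilog/CST` library, which is C++, and it only reduces reliance on exact nesting. It cannot protect against tags that are renamed or removed.

```python
def find_nodes(node, tag):
    """Yield every Node object in a JSON CST whose "tag" equals `tag`,
    in depth-first pre-order, skipping null children and Token objects."""
    if node is None or "children" not in node:
        return  # null child or Token object: nothing to search
    if node["tag"] == tag:
        yield node
    for child in node["children"]:
        yield from find_nodes(child, tag)
```

For example, `list(find_nodes(tree, "kModuleDeclaration"))` collects every module declaration regardless of how deeply the description list nests them.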
The JSON root is an object that maps each input file name to an object containing the parsing results for that file:

Key | Type | Description |
---|---|---|
`tokens` | array | List of Token objects, with whitespace tokens filtered out. Present only when the `--printtokens` flag is specified. |
`rawtokens` | array | List of Token objects. Present only when the `--printrawtokens` flag is specified. |
`tree` | object | Parser tree. Present only when the `--printtree` flag is specified and parsing errors didn't prevent tree creation. |
`errors` | array | List of Error objects. Present only when there were errors. |

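Below is a minimal sketch of invoking the tool from Python and reading the JSON root. The helper names are hypothetical. The sketch assumes that the JSON document is written to standard output even when some files have syntax errors. Extra flags are passed through unchanged, so they should come from the usage text above.

```python
import json
import shutil
import subprocess


def run_syntax(paths, *flags):
    """Run verible-verilog-syntax with --export_json and return the parsed
    JSON root (a dict keyed by input file name)."""
    tool = shutil.which("verible-verilog-syntax")
    if tool is None:
        raise FileNotFoundError("verible-verilog-syntax not found on PATH")
    proc = subprocess.run(
        [tool, "--export_json", *flags, *paths],
        capture_output=True,
        text=True,
    )
    # Assumption: JSON goes to stdout even if the exit code is non-zero.
    return json.loads(proc.stdout)


def summarize(root):
    """Map each file name to (has_tree, number_of_errors)."""
    return {
        name: ("tree" in result, len(result.get("errors", [])))
        for name, result in root.items()
    }
```

For a file that parses cleanly, `summarize(run_syntax(["foo.sv"], "--printtree"))` should give `{"foo.sv": (True, 0)}`.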
The tree consists of Node and Token objects. The tree root is a Node object. A Node object has the following keys:

Key | Type | Description |
---|---|---|
`tag` | string | Node tag. See `NodeEnum` in `verilog_nonterminals.h` for available values. |
`children` | array | List of children (Node or Token objects, or `null`). |

A Token object has the following keys:

Key | Type | Description |
---|---|---|
`start`, `end` | int | Byte offset of the token's first character (`start`) and of the byte just past its last character (`end`) in the source text. |
`tag` | string | Token tag. See the possible token tag values described below. |
`text` (optional) | string | Token text. Not present in operator and keyword token objects. |

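To get a token's text, either use the `text` value (if present) or read the source file from byte `start` (inclusive) to byte `end` (exclusive). Example in Python:

```python
def token_text(token, source_file_path):
    """Return a Token object's text, reading the source file if needed."""
    if "text" in token:
        return token["text"]
    start = token["start"]
    end = token["end"]
    # Read source file contents as bytes (offsets are byte offsets)
    with open(source_file_path, "rb") as f:
        source = f.read()
    # Get token text from source file contents
    return source[start:end].decode("utf-8")
```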
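The snippet reads the file as bytes because `start` and `end` are byte offsets, not character indices. The difference only shows up with non-ASCII text. The small self-contained demonstration below uses a made-up source string. It shows that a slice taken from the decoded string drifts by one character after the two-byte `é`:

```python
# Made-up source: a comment with one non-ASCII character, then code.
src = "// café\nmodule m;\n"
data = src.encode("utf-8")  # "é" takes 2 bytes in UTF-8

start = data.index(b"module")  # byte offset, like a token's "start"
end = start + len(b"module")   # one past the last byte, like "end"

by_bytes = data[start:end].decode("utf-8")  # slice bytes, then decode
by_chars = src[start:end]                   # byte offsets misused on a str

# by_bytes is "module"; by_chars is "odule " (shifted by one character)
```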
Token tag enumerations come from the parser generator, with a few overrides specified in `verilog_token.cc`. There are 3 types of values:

- Named tokens (e.g. `SymbolIdentifier`, `TK_DecNumber`), which come from `%token TOKEN_TAG` lines.
- String literals (e.g. `module`, `==`), which come from `%token SOME_ID "token_tag"` lines.
- Single characters (e.g. `;`, `=`). They can be found using the `'.'` regular expression.
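Below is a rough Python sketch of telling the three kinds apart in JSON output. In that output, a single-character tag is simply a one-character string, so that check is easy. Distinguishing named tokens from string literals really requires the grammar's `%token` declarations. `NAMED_TAGS` is therefore a hypothetical, deliberately incomplete stand-in, and any multi-character tag not in it is treated as a string literal.

```python
import re

# Hypothetical, incomplete set of named token tags (from the examples above).
# The authoritative list is the grammar's `%token TOKEN_TAG` lines.
NAMED_TAGS = {"SymbolIdentifier", "TK_DecNumber"}


def tag_kind(tag: str) -> str:
    """Classify a JSON token tag as one of the three kinds described above."""
    if re.fullmatch(".", tag):  # exactly one character, e.g. ";" or "="
        return "single character"
    if tag in NAMED_TAGS:
        return "named token"
    # e.g. "module" or "=="; also the fallback for unknown named tags
    return "string literal"
```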
An Error object has the following keys:

Key | Type | Description |
---|---|---|
`line`, `column` | int | Line and column in the source text. 0-based. |
`text` | string | Character sequence that caused the error. |
`phase` | string | Phase during which the error occurred. One of: `lex`, `parse`, `preprocess`, `unknown`. |
`message` (optional) | string | Error explanation. |

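Because `line` and `column` are 0-based, tools that report errors in the common 1-based `file:line:column:` style need to add 1 to both. Below is a minimal sketch with a hypothetical `format_error` helper. Its output format is only an illustration, not what `verible-verilog-syntax` itself prints.

```python
def format_error(file_name: str, err: dict) -> str:
    """Format a JSON Error object as a 'file:line:column:' message.

    The JSON line and column are 0-based, so both are shifted to 1-based.
    """
    line = err["line"] + 1
    column = err["column"] + 1
    message = err.get("message", "")  # "message" is optional
    detail = f": {message}" if message else ""
    return f'{file_name}:{line}:{column}: {err["phase"]} error at "{err["text"]}"{detail}'
```

To report every error, iterate over the JSON root with `for name, result in root.items()`, then over `result.get("errors", [])`.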
The `export_json_examples` directory contains Python wrappers for `verible-verilog-syntax --export_json` (the `verible_verilog_syntax.py` file) and some examples.