A GLSL ES 1.0 and 3.0 parser that can preserve whitespace and comments

ShaderFrog/glsl-parser

Shaderfrog GLSL Compiler

The Shaderfrog GLSL compiler is an open source GLSL 1.00 and 3.00 parser and preprocessor that compiles back to GLSL. Both the parser and preprocessor can preserve comments and whitespace.

The parser is built with a PEG grammar, via the Peggy Javascript library. The PEG grammars for both the preprocessor and parser are available on Github.

See the state of this library for limitations and goals of this compiler.

Usage

Installation

npm install --save @shaderfrog/glsl-parser

Parsing

import { parser, generate } from '@shaderfrog/glsl-parser';

// To parse a GLSL program's source code into an AST:
const program = parser.parse('float a = 1.0;');

// To turn a parsed AST back into a source program
const transpiled = generate(program);

The parser accepts an optional second options argument:

parser.parse('float a = 1.0;', options);

Where options is:

type ParserOptions = {
  // Hide warnings. If set to false or not set, then the parser logs warnings
  // like undefined functions and variables. If `failOnWarn` is set to true,
  // warnings will still cause the parser to raise an error. Defaults to false.
  quiet: boolean;

  // An optional string representing the origin of the GLSL, for debugging and
  // error messages. For example, "main.js". If the parser raises an error, the
  // grammarSource shows up in the error.source field. If you format the error
  // (see the errors section), the grammarSource shows up in the formatted error
  // string. Defaults to undefined.
  grammarSource: string;

  // If true, sets location information on each AST node, in the form of
  // { column: number, line: number, offset: number }. Defaults to false.
  includeLocation: boolean;

  // If true, causes the parser to raise an error instead of logging a warning.
  // The parser does limited type checking, and things like undeclared variables
  // are treated as warnings. Defaults to false.
  failOnWarn: boolean;
}

The program type

parser.parse() returns a Program, which is a special AST Node:

interface Program {
  // Hard coded AST node type of "program"
  type: 'program';
  // The AST itself is an array of the top level statements
  program: AstNode[];
  // All of the scopes found during parsing
  scopes: Scope[];
  // Leading whitespace of the source code
  wsStart?: string;
  // Trailing whitespace of the source code
  wsEnd?: string;
}

Scope

parse() returns a Program, which has a scopes array on it. A scope looks like:

type Scope = {
  name: string;
  parent?: Scope;
  bindings: ScopeIndex;
  types: TypeScopeIndex;
  functions: FunctionScopeIndex;
  location?: LocationObject;
}

The name of a scope is either "global", the name of the function that introduced the scope, or in anonymous blocks, "{". In each scope, bindings represents variables, types represents user-created types (structs in GLSL), and functions represents functions.
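
Because each scope can point at a parent, you can walk outward from an inner scope to find where a name lives. The following is a minimal sketch under assumptions: ScopeLike is a simplified stand-in for the library's Scope type, findScopeOfBinding is a hypothetical helper (not part of this library), and the scopes are built by hand rather than produced by the parser.

```typescript
// Simplified stand-in for the library's Scope type: only the fields used here.
type ScopeLike = {
  name: string;
  parent?: ScopeLike;
  bindings: { [name: string]: { declaration?: unknown; references: unknown[] } };
};

// Hypothetical helper: find the nearest enclosing scope that has an entry
// for `name`, starting at `scope` and walking up through `parent`.
function findScopeOfBinding(scope: ScopeLike, name: string): ScopeLike | undefined {
  let current: ScopeLike | undefined = scope;
  while (current) {
    if (Object.prototype.hasOwnProperty.call(current.bindings, name)) {
      return current;
    }
    current = current.parent;
  }
  return undefined;
}

// Hand-built scopes: a global scope and a function scope named "main"
const globalScope: ScopeLike = {
  name: 'global',
  bindings: { color: { declaration: {}, references: [] } },
};
const mainScope: ScopeLike = {
  name: 'main',
  parent: globalScope,
  bindings: { uv: { declaration: {}, references: [] } },
};

console.log(findScopeOfBinding(mainScope, 'uv')?.name); // main
console.log(findScopeOfBinding(mainScope, 'color')?.name); // global
console.log(findScopeOfBinding(mainScope, 'missing')); // undefined
```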

For bindings and types, the scope index looks like:

type ScopeIndex = {
  [name: string]: {
    declaration?: AstNode;
    references: AstNode[];
  }
}

Where name is the name of the variable or type. declaration is the AST node where the variable was declared. In the case the variable is used without being declared, declaration won't be present. If you set the failOnWarn parser option to true, the parser will throw an error when encountering an undeclared variable, rather than allow a scope entry without a declaration.
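
As a rough sketch of how the optional declaration field can be used, the helper below lists names that have a scope entry but no declaration. The types are simplified stand-ins for the library's types (AST nodes are left as unknown), findUndeclared is a hypothetical helper, and the data is hand-built for illustration.

```typescript
// Simplified stand-ins for the library's types; AST nodes are left opaque.
type ScopeEntryLike = { declaration?: unknown; references: unknown[] };
type ScopeIndexLike = { [name: string]: ScopeEntryLike };

// Hypothetical helper: names that have a scope entry but no declaration.
function findUndeclared(bindings: ScopeIndexLike): string[] {
  return Object.entries(bindings)
    .filter(([, entry]) => entry.declaration === undefined)
    .map(([name]) => name);
}

// Hand-built example data in the shape described above
const exampleBindings: ScopeIndexLike = {
  a: { declaration: { type: 'declaration' }, references: [{}, {}] },
  b: { references: [{}] }, // used without being declared
};

console.log(findUndeclared(exampleBindings)); // [ 'b' ]
```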

For functions, the scope index is slightly different:

type FunctionScopeIndex = {
  [name: string]: {
    [signature: string]: {
      returnType: string;
      parameterTypes: string[];
      declaration?: FunctionNode;
      references: AstNode[];
    }
  }
};

Where name is the name of the function, and signature is a string representing the function's return and parameter types, in the form of "returnType: paramType1, paramType2, ..." or "returnType: void" in the case of no arguments. Each signature in this index represents an "overloaded" function in GLSL, as in:

void someFunction(int x) {};
void someFunction(int x, int y) {};

With this source code, there will be two entries under name, one for each overload signature. The references are the uses of that specific overloaded version of the function. references also contains the function prototypes for the overloaded function, if present.

In the case there is only one declaration for a function, there will still be a single entry under name with the function's signature.
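
To make the signature format concrete, here is a small sketch that builds a signature key and looks it up in a hand-built index. FunctionScopeIndexLike is a simplified stand-in for FunctionScopeIndex, and makeSignature is a hypothetical helper. The exact spacing of the key (one space after the colon and after each comma) is an assumption based on the description above, so check it against real parser output before relying on it.

```typescript
// Simplified stand-in for FunctionScopeIndex; AST nodes are left opaque.
type FunctionScopeIndexLike = {
  [name: string]: {
    [signature: string]: {
      returnType: string;
      parameterTypes: string[];
      declaration?: unknown;
      references: unknown[];
    };
  };
};

// Hypothetical helper: build a signature key in the format described above.
// The spacing is an assumption, not confirmed against the library.
function makeSignature(returnType: string, parameterTypes: string[]): string {
  const params = parameterTypes.length > 0 ? parameterTypes.join(', ') : 'void';
  return `${returnType}: ${params}`;
}

// Hand-built index matching the two someFunction overloads above
const functionIndex: FunctionScopeIndexLike = {
  someFunction: {
    'void: int': { returnType: 'void', parameterTypes: ['int'], references: [] },
    'void: int, int': { returnType: 'void', parameterTypes: ['int', 'int'], references: [] },
  },
};

const overloads = functionIndex['someFunction'] ?? {};
console.log(Object.keys(overloads).length); // 2
console.log(makeSignature('void', ['int', 'int']) in overloads); // true
console.log(makeSignature('float', [])); // float: void
```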

⚠️ Caution! This parser does very limited type checking. This leads to a known case where a function call can match to the wrong overload in scope:

void someFunction(float, float);
void someFunction(bool, bool);
someFunction(true, true); // This will be attributed to the wrong scope entry

The parser doesn't know the type of the operands in the function call, so it matches based on the name and arity of the functions.
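
The toy matcher below illustrates why arity alone can't tell these overloads apart. It is not the parser's actual implementation, only a sketch of name-and-arity matching with hypothetical names: both overloads take two parameters, so both remain candidates for the call.

```typescript
type Overload = { parameterTypes: string[] };

// Toy matcher: keeps every overload whose parameter count equals the
// number of arguments in the call. Argument types are never consulted.
function matchByArity(
  overloads: { [signature: string]: Overload },
  argCount: number
): string[] {
  return Object.entries(overloads)
    .filter(([, overload]) => overload.parameterTypes.length === argCount)
    .map(([signature]) => signature);
}

const someFunctionOverloads = {
  'void: float, float': { parameterTypes: ['float', 'float'] },
  'void: bool, bool': { parameterTypes: ['bool', 'bool'] },
};

// someFunction(true, true) has two arguments, so both overloads match
console.log(matchByArity(someFunctionOverloads, 2));
// [ 'void: float, float', 'void: bool, bool' ]
```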

See also Utility Functions for renaming scope references.

Errors

If you have invalid GLSL, the parser throws a GlslSyntaxError. The parser uses Peggy under the hood, so GlslSyntaxError is a convenience type alias for a peggy.SyntaxError.

import { parser, GlslSyntaxError } from '@shaderfrog/glsl-parser';

let error: GlslSyntaxError | undefined;
try {
  // Line without a semicolon
  parser.parse(`float a`);
} catch (e) {
  error = e as GlslSyntaxError;
}

GlslSyntaxError is only a type alias, so it can't be used with instanceof. To check at runtime whether a caught error is a syntax error, use the Peggy SyntaxError class, which lives on the parser:

console.log(error instanceof parser.SyntaxError)
// true

The only error the parser intentionally throws is a GlslSyntaxError. It should be safe to cast a caught error to GlslSyntaxError using the as keyword in TypeScript.

The error message string is automatically generated by Peggy:

console.log(error.message)
// 'Expected ",", ";", "=", or array specifier but end of input found'

The error object includes the location of the error. It is not printed in the error message by default.

console.log(error.location)
/*
{
  source: undefined,
  start: { offset: 7, line: 1, column: 8 },
  end: { offset: 7, line: 1, column: 8 }
}
*/

Note that the source field shown above is the grammarSource string from the parser options, which is undefined by default. If you pass a grammarSource to parser.parse(), it shows up in the error object. This helps you track which source file you're parsing. For example, you could pass "myfile.glsl" as the grammarSource option so that the error records that your source GLSL came from your application's myfile.glsl file.
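
If you want a short "file:line:column" label for logging, one option is to build it from the location object yourself. This is a sketch under assumptions: ErrorLocation mirrors the shape printed above, describeLocation is a hypothetical helper rather than part of this library or Peggy, and the object is hand-built instead of coming from a real error.

```typescript
// Shape of error.location as printed above
type ErrorLocation = {
  source: string | undefined;
  start: { offset: number; line: number; column: number };
  end: { offset: number; line: number; column: number };
};

// Hypothetical helper: "source:line:column", with a placeholder when no
// grammarSource was given to the parser
function describeLocation(location: ErrorLocation, fallback = '<unknown>'): string {
  const source = location.source ?? fallback;
  return `${source}:${location.start.line}:${location.start.column}`;
}

// Hand-built copy of the location shown above
const exampleLocation: ErrorLocation = {
  source: undefined,
  start: { offset: 7, line: 1, column: 8 },
  end: { offset: 7, line: 1, column: 8 },
};

console.log(describeLocation(exampleLocation)); // <unknown>:1:8
console.log(describeLocation({ ...exampleLocation, source: 'myfile.glsl' })); // myfile.glsl:1:8
```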

The error object also has a fairly confusing format() method, which comes from the underlying Peggy error object. It produces an ASCII formatted string with arrows and underlines. The source option passed to .format() must match your grammarSource in parser options (which is undefined by default). This API is awkward and I might override it in future versions of the parser.

// text is the GLSL source string that was passed to parser.parse()
console.log(error.format([{ text, source: undefined }]));
/*
Error: Expected ",", ";", "=", or array specifier but "f" found.
  --> undefined:2:1
  |
2 | float b
  | ^
*/

Preprocessing

The parser also ships with a preprocess() function.

The preprocessor takes in a program source code string and produces a preprocessed program source code string. If you want to access and manipulate the AST produced by preprocessing, see the next sections.

import preprocess from '@shaderfrog/glsl-parser/preprocessor';

// Preprocess a program
console.log(preprocess(`
  #define a 1
  float b = a;
`, options));

Where options is:

type PreprocessorOptions = {
  // Don't strip comments before preprocessing
  preserveComments: boolean,
  // Macro definitions to use when preprocessing
  defines: {
    SOME_MACRO_NAME: 'macro body'
  },
  // A map of callbacks, keyed by AST node type. Each callback returns whether
  // or not that AST node is subject to preprocessing
  preserve: {
    ast_node_name: (path) => boolean
  }
}

A preprocessed program string can be handed off to the main GLSL parser. Preprocessing is optional, but a program string may not be valid until it is preprocessed.

If you want more control over preprocessing, the preprocess function above is a convenience method for approximately the following:

import {
  preprocessAst,
  preprocessComments,
  generate,
  parser,
} from '@shaderfrog/glsl-parser/preprocessor';

// Remove comments before preprocessing
const commentsRemoved = preprocessComments(`float a = 1.0;`);

// Parse the source text into an AST
const ast = parser.parse(commentsRemoved);

// Then preprocess it, expanding #defines, evaluating #ifs, etc
preprocessAst(ast);

// Then convert it back into a program string, which can be passed to the
// core glsl parser
const preprocessed = generate(ast);

Manipulating and Searching ASTs

Visitors

The Shaderfrog parser provides an AST visitor function for manipulating and searching an AST. The visitor API loosely follows the Babel visitor API. A visitor object looks like:

const visitors = {
  function_call: {
    enter: (path) => {},
    exit: (path) => {},
  }
}

Where every key in the object is a node type, and every value is an object with optional enter and exit functions. What's passed to each function is not the AST node itself, instead it's a "path" object, which gives you information about the node's parents, methods to manipulate the node, and the node itself. The path object:

{
  // Properties:

  // The node itself
  node: AstNode;
  // The parent of this node
  parent: AstNode | null;
  // The parent path of this path
  parentPath: Path | null;
  // The key of this node in the parent object, if node parent is an object
  key: string | null;
  // The index of this node in the parent array, if node parent is an array
  index: number | null;

  // Methods:

  // Don't visit any children of this node
  skip: () => void;
  // Stop traversal entirely
  stop: () => void;
  // Remove this node from the AST
  remove: () => void;
  // Replace this node with another AST node. See replaceWith() documentation.
  replaceWith: (replacer: any) => void;
  // Search for parents of this node's parent using a test function
  findParent: (test: (p: Path) => boolean) => Path | null;
}

Visit an AST by calling the visit method with an AST and visitors:

import { visit } from '@shaderfrog/glsl-parser/ast';

visit(ast, visitors);

The visit function doesn't return a value. If you want to collect data from the AST, use a variable in the outer scope to collect data. For example:

let numberOfFunctionCalls = 0;
visit(ast, {
  function_call: {
    enter: (path) => {
      numberOfFunctionCalls += 1;
    },
  }
});
console.log('There are ', numberOfFunctionCalls, 'function calls');

You can also visit the preprocessed AST with visitPreprocessedAst. Visitors follow the same convention outlined above.

import {
  parser,
  visitPreprocessedAst,
} from '@shaderfrog/glsl-parser/preprocessor';

// Parse the source text into an AST
const ast = parser.parse(`float a = 1.0;`);
visitPreprocessedAst(ast, visitors);

Stopping traversal

To skip all children of a node, call path.skip().

To stop traversal entirely, call path.stop() in either enter() or exit(). No further enter() or exit() callbacks will fire.
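
To show how skip() and stop() shape a traversal, here is a toy depth-first visitor over a made-up tree. Everything in it (ToyNode, ToyPath, toyVisit) is hypothetical and much simpler than the library's visit(). It only models the two rules above, and what the real visitor does with exit() after skip() is not covered here.

```typescript
// Toy tree and visitor types; not the library's AST or Path types
type ToyNode = { type: string; children?: ToyNode[] };
type ToyPath = { node: ToyNode; skip: () => void; stop: () => void };
type ToyVisitors = {
  [nodeType: string]: { enter?: (p: ToyPath) => void; exit?: (p: ToyPath) => void };
};

// Toy depth-first traversal. Returns false once stop() has been requested,
// so that no caller higher up the tree fires any more callbacks.
function toyVisit(node: ToyNode, visitors: ToyVisitors): boolean {
  const state = { skipped: false, stopped: false };
  const path: ToyPath = {
    node,
    skip: () => { state.skipped = true; },
    stop: () => { state.stopped = true; },
  };
  const visitor = visitors[node.type];
  visitor?.enter?.(path);
  if (state.stopped) return false;
  if (!state.skipped) {
    for (const child of node.children ?? []) {
      if (!toyVisit(child, visitors)) return false;
    }
  }
  visitor?.exit?.(path);
  return !state.stopped;
}

const toyTree: ToyNode = {
  type: 'root',
  children: [
    { type: 'a', children: [{ type: 'leaf' }] },
    { type: 'b', children: [{ type: 'leaf' }] },
  ],
};

const visitLog: string[] = [];
toyVisit(toyTree, {
  // Skipping "a" means its leaf child is never entered
  a: { enter: (p) => { visitLog.push('enter a'); p.skip(); } },
  b: { enter: () => { visitLog.push('enter b'); }, exit: () => { visitLog.push('exit b'); } },
  // Stopping at the first leaf reached means no exit callbacks fire at all
  leaf: { enter: (p) => { visitLog.push('enter leaf'); p.stop(); }, exit: () => { visitLog.push('exit leaf'); } },
});

console.log(visitLog); // [ 'enter a', 'enter b', 'enter leaf' ]
```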

Visitor .replaceWith() Behavior

When you visit a node and call path.replaceWith(otherNode) inside the visitor's enter() method:

  1. otherNode and its children are visited by the same visitors.
  2. The exit() function of the visitor is not called.

Notes:

  • Calling .replaceWith() in a visitor's exit() method is undefined behavior.
  • Replacing a node with the same type can cause infinite recursion, as the visitor will continue to visit the replaced node of the same type. You must handle this case manually.

These rules apply to all visitors, both GLSL AST visitors and preprocessor AST visitors.
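
One way to handle the same-type replacement case manually is to remember which nodes your visitor created and bail out when it sees them again. The sketch below assumes a simplified node and path shape (NodeLike and PathLike are stand-ins), and simulateVisit is a toy driver that only mimics the "replacement gets visited again" rule. It is not the library's traversal.

```typescript
// Minimal stand-ins for the node and path shapes used by a visitor
type NodeLike = { type: string; [key: string]: unknown };
type PathLike = { node: NodeLike; replaceWith: (replacer: NodeLike) => void };

// Nodes created by this visitor. A WeakSet doesn't keep them alive.
const alreadyReplaced = new WeakSet<NodeLike>();

const renameVisitors = {
  identifier: {
    enter: (path: PathLike) => {
      // Skip nodes we produced ourselves, otherwise the visitor would keep
      // replacing its own replacement
      if (alreadyReplaced.has(path.node)) return;
      const replacement: NodeLike = {
        ...path.node,
        identifier: `${String(path.node.identifier)}_x`,
      };
      alreadyReplaced.add(replacement);
      path.replaceWith(replacement);
    },
  },
};

// Toy driver: re-visits a replacement node with the same visitor, and gives
// up after maxVisits. Returns the final node and how many times enter() ran.
function simulateVisit(start: NodeLike, maxVisits = 100): { node: NodeLike; visits: number } {
  let node = start;
  let visits = 0;
  let replaced = true;
  while (replaced && visits < maxVisits) {
    replaced = false;
    visits += 1;
    renameVisitors.identifier.enter({
      node,
      replaceWith: (replacer) => {
        node = replacer;
        replaced = true;
      },
    });
  }
  return { node, visits };
}

const guarded = simulateVisit({ type: 'identifier', identifier: 'a' });
console.log(guarded.node.identifier, guarded.visits); // a_x 2
```

Without the WeakSet check, the toy driver would run until it hits maxVisits, appending _x each time.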

Utility Functions

Rename variables / identifiers in a program

You can rename bindings (aka variables), functions, and types (aka structs) with renameBindings, renameFunctions, and renameTypes respectively.

The signature for these methods:

const renameBindings = (
  // The scope index to rename the bindings in. ast.scopes[0] is the global
  // scope, so passing ast.scopes[0].bindings renames all global variables
  bindings: ScopeIndex,

  // The rename function. This is called once per scope entry with the original
  // name in the scope, to generate the renamed variable. 
  mangle: (name: string) => string
): ScopeIndex

These scope renaming functions, renameBindings, renameFunctions, and renameTypes, do two things:

  1. Each function mutates the AST to rename identifiers in place.
  2. They return an immutable new ScopeIndex where the scope references themselves are renamed.

If you want your ast.scopes array to stay in sync with your AST, you need to re-assign it to the output of the functions! Examples:

import { renameBindings, renameFunctions, renameTypes } from '@shaderfrog/glsl-parser/parser/utils';

// Suffix top level variables with _x, and update the scope
ast.scopes[0].bindings = renameBindings(ast.scopes[0].bindings, (name) => `${name}_x`);

// Suffix function names with _x
ast.scopes[0].functions = renameFunctions(ast.scopes[0].functions, (name) => `${name}_x`);

// Suffix struct names and usages (including constructors) with _x
ast.scopes[0].types = renameTypes(ast.scopes[0].types, (name) => `${name}_x`);
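
The mangle callback is just a function from string to string, so it can leave some names alone. Here's a minimal sketch: makeMangler is a hypothetical helper, and whether names like main or gl_FragColor actually show up in a given scope index is an assumption you should check against your own AST.

```typescript
// Hypothetical factory for a mangle callback: suffixes every name except
// the reserved ones and anything starting with "gl_"
function makeMangler(suffix: string, reserved: string[]): (name: string) => string {
  const keep = new Set(reserved);
  return (name) =>
    keep.has(name) || name.startsWith('gl_') ? name : `${name}${suffix}`;
}

const mangle = makeMangler('_x', ['main']);
console.log(mangle('color')); // color_x
console.log(mangle('main')); // main
console.log(mangle('gl_FragColor')); // gl_FragColor
```

A callback like this could be passed as the second argument to renameBindings, renameFunctions, or renameTypes.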

There are also functions to rename only one variable/identifier in a given scope. Use these if you know specifically which variable you want to rename.

import { renameBinding, renameFunction, renameType } from '@shaderfrog/glsl-parser/parser/utils';

// Replace all instances of "oldVar" with "newVar" in the global scope, and
// creates a new global scope entry named "newVar"
ast.scopes[0].bindings.newVar = renameBinding(
  ast.scopes[0].bindings.oldVar,
  'newVar',
);
// You need to manually delete the old scope entry if you want the scope to stay
// in sync with your program AST
delete ast.scopes[0].bindings.oldVar;

// Rename a specific function
ast.scopes[0].functions.newFn = renameFunction(
  ast.scopes[0].functions.oldFn,
  'newFn',
);
delete ast.scopes[0].functions.oldFn;

// Rename a specific type/struct
ast.scopes[0].types.newType = renameType(
  ast.scopes[0].types.oldType,
  'newType',
);
delete ast.scopes[0].types.oldType;

Debugging utility functions

The parser also exports a debugging function, useful for logging information about the AST.

import { debugScopes } from '@shaderfrog/glsl-parser/parser/utils';

// Print a condensed representation of the AST scopes to the console
debugScopes(ast);

What are "parsing" and "preprocessing"?

In general, a parser is a computer program that analyzes source code and turns it into a data structure called an "abstract syntax tree" (AST). The AST is a tree representation of the source program, which can be analyzed or manipulated. A use of this GLSL parser could be to parse a program into an AST, find all variable names in the AST, rename them, and generate new GLSL source code with renamed variables.

Preprocessing is a text manipulation step supported in shader source code. One way to think about preprocessing is it's a glorified find and replace language that's also part of your program's source code. Special lines starting with # tell the preprocessor how to manipulate other text in the program source code. Some programs are not parsable until they are preprocessed, because the source code may be invalid until the text find and replacements are done. During preprocessing, the source code is treated purely as text. Said another way, there is no consideration for the GLSL source code structure, like float, vec2, etc. Preprocessing only handles special lines and rules starting with #.
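
To make the "find and replace" idea concrete, here is a toy expander for the simplest kind of macro. It is an illustration only and has nothing to do with how the Shaderfrog preprocessor is implemented: toyExpandDefines is a hypothetical function that only understands object-like #define NAME body lines and ignores function-like macros, #if, #undef, comments, and everything else.

```typescript
// Toy illustration only: expands object-like "#define NAME body" lines by
// whole-word text replacement on every following line.
function toyExpandDefines(source: string): string {
  const defines = new Map<string, string>();
  const output: string[] = [];
  for (const line of source.split('\n')) {
    const match = /^\s*#define\s+(\w+)\s+(.*)$/.exec(line);
    if (match) {
      defines.set(String(match[1]), String(match[2]).trim());
      continue; // the #define line itself is dropped from the output
    }
    let expanded = line;
    defines.forEach((body, name) => {
      // \b keeps "a" from matching inside words like "float"
      expanded = expanded.replace(new RegExp(`\\b${name}\\b`, 'g'), body);
    });
    output.push(expanded);
  }
  return output.join('\n');
}

console.log(toyExpandDefines('#define a 1\nfloat b = a;')); // float b = 1;
```

Note how the replacement never looks at what float or b mean. It only matches text.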

See the GLSL Language Spec to learn more about GLSL preprocessing. Some notable differences from the C++ preprocessor are no "stringize" operator (#), no #include directive, and #if expressions can only operate on integer constants, not other types of data. The Shaderfrog GLSL preprocessor can't be used as a C/C++ preprocessor without modification.

Parsing, preprocessing, and code generation are all phases of a compiler. This library is technically a source code to source code compiler, also known as a "transpiler." The input and output source code are both GLSL.

State of this library

The Shaderfrog compiler has tests for the more complex parts of the GLSL ES 3.00 grammar. This library is definitively the most complete GLSL compiler written in Javascript.

This library is used by the experimental Shaderfrog 2.0 shader composer. The compiler has wide exposure to different GLSL programs.

This library also exposed:

  • A typo in the official GLSL grammar specification.
  • A bug in Chrome's ANGLE compiler.

This library doesn't support full "semantic analysis" required by the Khronos GLSL specification. For example, some tokens are only valid in GLSL 1.00 vs 3.00, like texture() vs texture2D(). This parser considers both valid as they're both part of the grammar. However if you send compiled source code off to a native compiler like ANGLE with the wrong texture function, it will fail to compile.

This library is mainly for manipulating ASTs before handing off a generated program to a downstream compiler such as ANGLE.

The preprocessor supports full macro evaluations and expansions, with the exception of __LINE__. Additional control lines like #error, #pragma, and #extension have no effect, and can be fully preserved as part of parsing.

Limitations of the Parser and Preprocessor

Known Issues

  • There are probably some bugs in the preprocessor logic. I haven't yet verified all of the evaluations of "binary" expressions in preprocessor.ts
  • preprocessor.ts has lots of yucky typecasting

Known missing semantic analysis compared to the specification

  • Compilers are supposed to raise an error if a switch body ends in a case or default label.
  • Currently no semantic analysis of vertex vs fragment shaders

Deviations from the Khronos Grammar

  • The parser allows programs to contain top level preprocessor statements, like #define X. This is a convenience that avoids preprocessing some programs, and for simple programs it lets you preserve preprocessor lines in the AST if you want to do something specific with them. Preprocessor statements anywhere else in the program are not allowed. The Khronos grammar does not include preprocessor statements at all.
  • selection_statement is renamed to if_statement
  • The grammar specifies declaration itself ends with a semicolon. I moved the semicolon into the declaration_statement rule.
  • The grammar has a left paren "(" in the function_call production. Due to how I de-left-recursed the function_call -> postfix_expression loop, I moved the left paren into the function_identifier production.
  • Function calls in the Khronos grammar are TYPE_NAME LEFT_PAREN. In my grammar they're IDENTIFIER LEFT_PAREN, because TYPE_NAME is only used for structs, and function names are stored in their own separate place in the scope.

Local Development

To work on the tests, run npx jest --watch.

The GLSL grammar definition lives in src/parser/glsl-grammar.pegjs. Peggy supports inlining Javascript code in the .pegjs file to define utility functions, but that means you have to write in vanilla Javascript, which is terrible. Instead, I've pulled out utility functions into the grammar.ts entrypoint. Some functions need access to Peggy's local variables, like location(s), so the makeLocals() function uses a closure to provide that access.

To submit a change, please open a pull request. Tests are appreciated!

See the Github workflow for the checks run against each PR.