can now write more complex semgrep rules, see semgrep.yml
=> semgrep support for ocaml (dogfood)
no more meta_ast_xxx.ml
parser ported from yacc grammar in official Go repository.
parser ported from Diamond-back Ruby, including its intermediate-language RIL.
right now supports mostly Python and Go though.
suited for complex static analysis (a mix between CIL/RIL/…).
foreach, lambda, &&, etc.
like for the other languages.
features
ported the Javascript, C++, and OCaml grammars to menhir
10 years of Pfff! Started in November 2009 (while at Facebook).
related, mini, and scheck in separate repositories either under github.com/returntocorp (pfff, sgrep, check_generic) or under github.com/aryx
a generic AST helping to factorize AST/visitor-based analysis, but now also CFG/DFG-based analysis, across multiple languages.
basic lexer and highlighter.
to factorize analysis on generic AST (e.g., control flow graph). Ported the control flow graph of PHP to AST generic (and the unreachable statement check). Ported the data flow analysis of PHP to AST generic (and the useless assignment check).
to gather all fuzzy parsing code.
to fully support Python3 (especially type annotations)
differentiate imported modules, entities, locals, params, globals! semantic highlighting!
C, and Java.
and sgrep will find aliases. Support for Python only for now (Javascript should be similar).
the PHP version of sgrep never had that! (coccinelle had it though, and did even better by working on the CFG).
e.g., unreachable statement detection using a generic control flow graph.
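A minimal sketch of the idea, not the pfff implementation: once a control flow graph exists, unreachable statements are the nodes that a traversal from the entry node never visits. The dictionary-based CFG encoding and the name `unreachable_nodes` are assumptions made for this illustration.

```python
from collections import deque

def unreachable_nodes(cfg, entry):
    """Return the CFG nodes that cannot be reached from `entry`.

    `cfg` maps each node id to the list of its successor node ids
    (a hypothetical encoding chosen for this sketch).
    """
    seen = {entry}
    todo = deque([entry])
    while todo:
        node = todo.popleft()
        for succ in cfg.get(node, []):
            if succ not in seen:
                seen.add(succ)
                todo.append(succ)
    return sorted(set(cfg) - seen)

# Hypothetical CFG for the body:  return x; y = 1;
example_cfg = {
    "entry": ["return x"],
    "return x": ["exit"],
    "y = 1": ["exit"],   # no edge leads here: dead statement
    "exit": [],
}
print(unreachable_nodes(example_cfg, "entry"))
```

Because the check only looks at nodes and edges, the same function works for any language whose code can be lowered to such a graph.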
transpile JSX, patterns, generators, just enough for codegraph to work on most node_modules/
(for r2c)
handle far more code. Far more robust Javascript parser.
do it both via error recovery as stated in the spec, and by using a fix_token parsing hack strategy when needed.
and globals
(I need some of the new code from pfff in efuns, which I also want to release in OPAM)
(cairo was not working anymore with opam 2.0.2 or ocaml > 4.06)
thanks to Cygwin and Cygwin/X. See install_windows.txt. (I had to port pfff to Windows after I bought the Surface book 2)
enough to recognize most attributes in the stdlib/ and to colorize them in codemap and efuns
extracted from codegraph, to avoid dependencies to GUI
without having to know ocaml
motivated by magicator?
(to automatically generate chunks of a literate program)
using toy datalog (written in Lua)
7000 and 100000 LOC respectively
it tries to automatically detect less relevant code, that is the yet-another-extension-plugins that are not important to understand the software architecture (e.g. for Linux it would be the thousands of device drivers, the many fs, internet protocols, etc).
so can experiment with datalog fact generation and datalog logic
(e.g. function pointers)
handle dependencies of cpp constructs!
mostly a port of miniC, but doing the linearization of expressions and handling more than just one file by hooking into graph_code_c!
can explore dependencies in emacs source :)
for graph_code_clang and also graph_code_c
so can explore usages of a field for instance via emacs middle click
run for instance on include/ kernel/ while still using information from the skip_list
with archi_code classification
and toplevel prototypes
4 spaces
for implicit fields
use of spatch for ocaml :)
(do not use the probably obsolete line information in graph_code, trust more the lexer/parser/highlighter fuzzy def/use classifier)
so can hover on some code and see the corresponding definition (as well as the other users of this definition and the uses of this definition).
scalable spirit! a visualization should apply from the topmost entities to the lowest one! Also factorized more code.
finally … basic TAGS functionality, “hypertext” because can follow “links”
so easy to manually configure which parts to not display when the heuristics in archi_code_lexer are not enough.
light db (while still filtering object files and skip list files)
fewer backward dependencies on plan9 for instance
Deadcode, UnusedExport, UndefinedDefOfDecl
!!deadcode false positives really help to find bugs in other parts of the analysis!!
(port what was in cmf)
for structures
extensions)
trailing commas, interpolated strings, and fix bugs in regexp and xhp
skip list
simplified configure output too
http://ocaml.org/meetings/ocaml/2013/program.html
via graph_code
via graph_code
via graph_code, great!
associate a depth in the use graph to each file and entities.
refresh to 4.01, fix many bugs, cleanup makefiles, added some builtins, added debugging support, better error message, stack trace, etc.
https://travis-ci.org/facebook/pfff
(but not for the rest of pfff/ so people don’t need OPAM to compile pfff)
with rpc calls for getting cell information
tested for ocaml
(fixed my long standing issue with canvas and flexible text font size)
(e.g. minus on a '...')
implicit fields, xhp comment syntax, trailing comma, (new Foo) and id, trait renaming feature
adapted each time:
- CST, AST (but not pretty print AST)
- visitors, dumpers, mappers
- tags, prolog db
- sgrep/spatch
- scheck/cmf
- codegraph
got from 7 s/r conflicts to 1, and can now parse more code by being more orthogonal.
make huge refactoring on ASTs easier
making codegraph usable on ugly projects with hundreds of subdirectories (but very fragile, lots of new exceptions)
a thin wrapper around the clang/llvm ast dumps of clang-check -emit-ast.
basic lexer and grammar
Handle nested classes, generic classes, anon classes (understand better how things are bytecompiled, e.g. where does foo$1$bar.class come from?)
added OCaml (.ml and .cmt), Java (.java and .class), and improved PHP. Also improved speed a lot.
in addition to PHP, support for java (via .class and .java files), OCaml (via .cmt files)
with graph_code support.
of ocaml code, including pfff code
regarding spacing issues
can fix bugs one by one
with support for PHP, C,
a simplified version of lang_cpp/ just focusing on C. Also added graph_code_c.ml, mainly for analyzing xv6
Java features. Also added graph_code_java.ml.
far fewer false positives
future backend of codegraph
good basis then for graph_code_php.ml, new, check_variables_php.ml, etc.
(please rerun ./configure if you have compilation problems)
basic support
update: removed later
fewer error messages, progress meter, split files, each PHP database has its own file, refactored abstract interpreter, etc.
especially useful to query inheritance information.
which leads to a more precise callgraph, type information, and opens the way for many more checks (including security checks using tainting analysis).
while still being reasonably fast, thanks to the use of a lazy entity finder.
after some transformation. Thx to julien. spatch --pretty-printer.
so can parse julien’s code which heavily uses modules.
(and prolog) instead of a PIL+cflow+dataflow+db+…
with some helper functions to check the validity of an overlay. Also added some support for overlay in codemap.
Overlays help organize and visualize a (bad) codebase from a different point of view.
a package/module dependency visualizer exporting data for Gephi (only for ocaml code for now). Played with it on the code of pfff (in package mode) and some of its components: codemap, cpp, php, cmf (in module mode with and without extern mode).
update: superseded by codegraph
can now display the treemap. Had to report many bugs to the ocsigen team to get this to work.
complete refactoring of the parser. Far fewer heuristic techniques. Closer to what was described in the CC’09 paper.
first play with ocsigen.
update: not used
(also using code from a stripped down version of ocamlnet)
(using code from ccss by dario teixeira)
later in codemap or pfff_statistics
- architecture/aspect layer (default one)
- static dead code layer
- dead: dynamic live code layer (using xhprof data)
- test coverage layer (using phpunit and xdebug data)
- bugs layer
- security layer
- cyclomatic complexity,
- age (and activity) layer
- number of authors layer
can see arguments passed by refs visually (TakeArgNByRef) as well as functions containing dynamic calls (ContainDynamicCall)
can now visualize the result of a git grep on a project
use different color for dirs and files labels, and highlight first letter of label at depth = 1
a DSL to express easily refactoring on PHP.
on attributes
can now express patterns like: sgrep -e 'foreach($A as $V) { if (strpos($T, $V) !== false) { return Z; }}'
so can visualize also Tex/Latex/Noweb source (which includes the documentation of pfff!)
based on joust
just a lexer-based highlighter. No real parser yet.
more highlights
update: superseded by ast_php_simple.ml
update: superseded by abstract interpreter
- does it take arguments by ref.
- does it contain dynamic calls ($fn(...))
This can help the visualizer to give more semantic visual feedback.
wrote wiki pages (intro, sgrep, spatch, features, vision, roadmap, etc)
applied codemap on many open source projects and generated screenshots.
thanks to literate programming and codemap itself to show the problem and assist in the refactoring
and put more generic stuff in h_program-lang/
a polymorphic wrapper around ocamlgraph
(to compute strongly connected components of php callgraph, in anticipation of a bottom up analysis of php)
update: also useful for codegraph backend.
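As an illustration of why strongly connected components help a bottom-up analysis, here is a small sketch using Tarjan's algorithm directly (the entry above relies on ocamlgraph instead; this Python version and the toy callgraph are assumptions for illustration only). Tarjan emits components with callees before their callers, which is the order a bottom-up pass wants, and groups mutually recursive functions together.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm; `graph` maps a function to the functions it calls.

    Components come out in reverse topological order (callees first).
    """
    index, lowlink = {}, {}
    stack, on_stack = [], set()
    result = []
    counter = [0]

    def visit(v):
        index[v] = lowlink[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:
            # v is the root of a component: pop it off the stack.
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            result.append(sorted(component))

    for v in graph:
        if v not in index:
            visit(v)
    return result

# Hypothetical callgraph: b and c are mutually recursive.
calls = {"a": ["b"], "b": ["c"], "c": ["b", "d"], "d": []}
print(strongly_connected_components(calls))
```

The recursive formulation is fine for a sketch; a very deep callgraph would need an iterative version to stay within Python's recursion limit.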
first public release! https://github.com/facebook/pfff
Real start of multi-language support.
Show treemap and thumbnails of file content! Have also minimap, zoom, labels, alpha for overlapping labels, labels in diagonal, anamorphic content showing in bigger fonts the important stuff, magnifying glass, clickable content where a click opens the file in your editor at the right place, etc. => A kind of google maps but on code :)
Support for PHP, Javascript, ML, C++, C, thrift.
For PHP, also do URL highlighting, which helps understand the control flow in webapps. Also highlight local, global, and parameter variables differently. Also highlight bad smells (especially security-related bad smells).
Integrate other PL artifacts:
- The builtins API reference
- PLEAC cookbooks
=> a single place to query information about the code (no need to first grep the code, then google for the function because it turns out to be a builtin).
Can easily go to the definition of a function (whether it’s a builtin or not, thanks to the parsable PHP manual and HPHP idl files).
Can easily go to the example of use of a function (whether it’s a builtin or not, thanks to PLEAC for the builtin functions).
Far more flexible and powerful than the previous treemap visualizer which was using Graphics. Now also render file content!
support for PHP ocaml
Allows using and experimenting with the treemap code visualizer on the pfff source itself, to see if such features are useful.
very basic support. Just highlighting
using JSON as support. Will help make pfff less php-specific.
support linear patterns (e.g. sgrep -e 'X & X') and a -pvar option to print matched metavariables instead of matched code
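To make the linear-pattern idea concrete, here is a toy matcher, not sgrep's actual code: a metavariable that occurs twice in a pattern must bind to the same subtree both times. The tuple encoding of expressions and the convention that an all-uppercase string is a metavariable are assumptions of this sketch.

```python
def match(pattern, code, env=None):
    """Return metavariable bindings if `code` matches `pattern`, else None."""
    env = dict(env or {})
    if isinstance(pattern, str) and pattern.isupper():
        # Metavariable: bind on first use, then require the same subtree.
        if pattern in env:
            return env if env[pattern] == code else None
        env[pattern] = code
        return env
    if isinstance(pattern, tuple) and isinstance(code, tuple):
        if len(pattern) != len(code):
            return None
        for sub_pattern, sub_code in zip(pattern, code):
            env = match(sub_pattern, sub_code, env)
            if env is None:
                return None
        return env
    return env if pattern == code else None

# Pattern 'X & X' in the hypothetical tuple encoding.
pattern = ("&", "X", "X")
same = ("&", ("var", "a"), ("var", "a"))    # $a & $a
differ = ("&", ("var", "a"), ("var", "b"))  # $a & $b
print(match(pattern, same))
print(match(pattern, differ))
```

The returned dictionary is the kind of information a flag like -pvar would print: the code bound to a metavariable rather than the whole matched expression.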
reorganized the treemap and h_program-lang/ to be less facebook and pfff specific. Have a commons/file_type.ml for instance.
warn about “unused variable” and “use of undefined variable”.
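A rough sketch of what these two warnings amount to, assuming a function body has already been flattened into a sequence of definition and use events (the event encoding and the name `check_variables` are hypothetical, and real PHP scoping rules are ignored here):

```python
def check_variables(events):
    """Report undefined uses and never-used definitions.

    `events` is a list of ("def", name) or ("use", name) pairs in
    source order, a simplification assumed for this sketch.
    """
    defined, used, warnings = [], set(), []
    for kind, name in events:
        if kind == "def":
            if name not in defined:
                defined.append(name)
        elif name in defined:
            used.add(name)
        else:
            warnings.append(f"use of undefined variable ${name}")
    warnings += [f"unused variable ${name}" for name in defined if name not in used]
    return warnings

# Hypothetical body:  $a = 1; $b = 2; echo $a; echo $c;
events = [("def", "a"), ("def", "b"), ("use", "a"), ("use", "c")]
print(check_variables(events))
```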
use fast global analysis (bonus: it’s flib-aware and desugars the require_module_xxx and other flib conventions).
a more precise TAGS file generator (bonus: it’s xhp-aware).
parsing/unparsing/dumping. preliminary refactoring support.
a more convenient AST to work on for doing complex analysis such as dataflow, type-inference, tainted analysis, etc.
to grab all the files needed to check one file, in a way similar to what gcc does with cpp. Provide a DFS and BFS algo.
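The two traversal orders can be sketched as follows, assuming the include/require relation has already been extracted into a dictionary (the file names and function names are made up for the example, and this is not the pfff code):

```python
from collections import deque

def needed_files_dfs(includes, start, seen=None):
    """Depth-first: follow each include chain to the end before the next one."""
    seen = [] if seen is None else seen
    if start in seen:
        return seen
    seen.append(start)
    for dep in includes.get(start, []):
        needed_files_dfs(includes, dep, seen)
    return seen

def needed_files_bfs(includes, start):
    """Breadth-first: direct includes first, then includes of includes."""
    order, todo = [start], deque([start])
    while todo:
        for dep in includes.get(todo.popleft(), []):
            if dep not in order:
                order.append(dep)
                todo.append(dep)
    return order

# Hypothetical include graph.
includes = {
    "a.php": ["b.php", "c.php"],
    "b.php": ["d.php"],
    "c.php": ["d.php"],
    "d.php": [],
}
print(needed_files_dfs(includes, "a.php"))
print(needed_files_bfs(includes, "a.php"))
```

Both return the same set of files; only the order differs, which matters if the consumer processes files as they are discovered.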
rank, filter, parallelize (using MPI), cronize.
fix parsing (lexer) and unparsing bugs
introduce the transfo field, mimicking part of coccinelle.
improve support for XHP and refactoring, merging tokens heuristic.
static method calls analysis (with self/parent special cases handling)
users_of_class, users_of_define, extenders/implementers of class
fix bugs in lexer, now can parse <?= code
and moved code from facebook/ to analyze_php/
unit tests for parsing, analysis, deadcode, callgraph, xdebug
extract and modularize php highlighting logic from gtk GUI.
started to integrate treemap and web GUI.
used by web ui of acrichton
static arrays lint checks
dead? proto of nondeterministic PHP bugs finder using diff and xdebug
control flow graph analysis: useful for cyclomatic complexity, and potentially useful for far more things (sgrep, dataflow, etc)
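For reference, cyclomatic complexity falls out of a control flow graph almost for free: with E edges, N nodes, and a single connected function body, McCabe's formula is M = E - N + 2. A minimal sketch, assuming a dictionary-based CFG encoding chosen only for this example:

```python
def cyclomatic_complexity(cfg):
    """McCabe's M = E - N + 2 for one function's control flow graph.

    `cfg` maps each node to its successor list (hypothetical encoding).
    """
    nodes = set(cfg)
    for succs in cfg.values():
        nodes.update(succs)
    edges = sum(len(succs) for succs in cfg.values())
    return edges - len(nodes) + 2

# Hypothetical CFG for:  if (c) { a } else { b }
if_else_cfg = {
    "entry": ["if"],
    "if": ["then", "else"],
    "then": ["exit"],
    "else": ["exit"],
    "exit": [],
}
print(cyclomatic_complexity(if_else_cfg))
```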
dead? start of dataflow analysis
start of coverage analysis (static and dynamic)
start of include_require static analysis (and flib file dependencies too)
dead? start of type unioning
but for now very rudimentary
update: not used for now
reorganized json/sexp output, factorize code and use more ocaml.ml
Done for type “inference/extraction” at the beginning and useful for coverage too!
update: not really used for now, superseded by julien static type inference
could be useful at some point for better type checking or type inference
update: not used for now
by source-to-source transformation.
now I can code in PHP :)
with map_php.ml. Used by ppp and closure implementation.
- a -emacs flag
- improved -xhp and made it the default operating mode
- do fixpoint analysis per file
a code matcher working at the AST level
update: superseded by cairo-based viewer (but reused most of the algorithms)
ref from sebastien bergmann
Just have a new -pp option to give the opportunity to call a preprocessor (e.g. 'xhpize -d').
a new -json option and json support
also supported in sgrep.
programmer manual for parsing_php/
internals manual for parsing_php/
!!use literate programming method (via noweb/syncweb)!! (hence the special marks in the source)
callgraph for methods (using weak heuristic), with optimisations to scale (partially because use weak heuristic)
deadcode analysis v2, v3, v4
IRC support (adapting ocamlirc/)
update: not used anymore
complement git.ml
deadcode analysis v1
update: superseded by codemap, a fancy GUI using cairo and gtk
reused Zend flex/bison code.