Skip to content

Latest commit

 

History

History
40 lines (29 loc) · 3.23 KB

mx-print-token-graph.md

File metadata and controls

40 lines (29 loc) · 3.23 KB

mx-print-token-graph

This tools helps to visualize the relationship between different types of tokens:

  • Parsed tokens, which are directly referenced by Decls, Stmts, Attrs, etc. These are the tokens that make it to the parsing and semantic analysis phases of the compiler. Whitespace and comments are represented in parsed tokens that are not a result of macro expansion.
  • Macro tokens, which are directly referenced by Macros. These are the tokens that are interpreted by the C preprocessor. There is a one-to-one relationship between the "final remaining" macro tokens after the pre-processing step and parsed tokens. Whitespace is not represented in macro tokens.
  • File tokens, which are the tokens that are present in a source file, whitespace and comments included. These are directly referenced by Files.

There is one more category of tokens, type tokens, which are referenced by Type entities. Types are more ephemeral than other entities in Multiplier; however, it's valuable to be able to print them, and so all Type entities have corresponding tokens representing their printable form.

This tool is also valuable at showing how, given a Token, one can find their way back into the AST using a feature called TokenContexts. Mostly, though, the output of this tool helps to reify various connections between entities in Multiplier's data model. Anyway, let's get back to the tool. First, we'll get a list of interesting functions and their corresponding fragments:

% ./bin/mx-list-functions --db /tmp/curl.db | grep scanf           
1152921504606847034 2305843009214747321 9475573621013348352 fscanf  decl
1152921504606847034 2305843009214747339 9475573621032222720 scanf decl
1152921504606847034 2305843009214747345 9475573621038514176 sscanf  decl
1152921504606847034 2305843009214747377 9475573621072068608 __svfscanf  decl
1152921504606847034 2305843009214747407 9475573621103525888 vfscanf decl
1152921504606847034 2305843009214747408 9475573621104574464 vscanf  decl
1152921504606847034 2305843009214747412 9475573621108768768 vsscanf decl
1152921504606847464 2305843009214764555 9475573639084507136 ber_scanf decl

We'll use the fragment ID, 2305843009214747339, for scanf from the second column of the above output. I chose scanf because it usually has a macro associated with it, marking it as a "scanf-like function". Then, we copy this fragment ID, and run mx-print-token-graph:

% mx-print-token-graph --db /tmp/curl.db --fragment_id 2305843009214747339 >/tmp/test.dot
% xdot /tmp/test.dot

The output is a DOT digraph, which rendered looks as follows:

Token graph of scanf

There are options to add additional data to the graphs, but it makes them less readable. These options are:

  • --with_categories: Shows the token categories (these are useful for syntax highlighting).
  • --with_related_entity_ids: Shows the related entity IDs built in for each token. These are not references, but are instead embedded by the indexer when saving the tokens to the database. These built-in related entity IDs are used to make "clickable" tokens.
  • --with_token_offsets: This is for debugging.