JavaScript source maps typically seem to be no finer-grained than tokens. For example, identity-map uses token granularity.
I know I've seen other examples, but I can't remember where.
Why don't we use AST-node-based granularity instead? That is, if our source maps had locations for all and only the starts of AST nodes, what would be the downside?
As I understand it, source maps are used for crash stack decoding and for debugging, and there will never be an error location or useful breakpoint that isn't at the start of some AST node, right?
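This reasoning depends on how positions are decoded. A debugger or stack decoder looks up the mapping at or immediately before a generated (line, column), which is the default greatest-lower-bound behaviour of the source-map library's `originalPositionFor`. The following is a minimal sketch of that lookup on a single generated line. It assumes a hypothetical minified output `const a=function*(){return a+ ++b}` of the example statement from A1 below, with mappings only at AST node starts. The mapping table and the function name `originalColumnFor` are illustrative, not part of any library.

```javascript
// Hypothetical mappings for one generated line, sorted by generated column.
// Each maps a column in `const a=function*(){return a+ ++b}` to a column in
// the original `const a = function *() { return a + ++ b }` (node starts only).
const mappings = [
  { generatedColumn: 0, originalColumn: 0 },   // VariableDeclaration
  { generatedColumn: 6, originalColumn: 6 },   // VariableDeclarator / `a`
  { generatedColumn: 8, originalColumn: 10 },  // FunctionExpression
  { generatedColumn: 19, originalColumn: 23 }, // BlockStatement
  { generatedColumn: 20, originalColumn: 25 }, // ReturnStatement
  { generatedColumn: 27, originalColumn: 32 }, // BinaryExpression / `a`
  { generatedColumn: 30, originalColumn: 36 }, // UpdateExpression `++b`
  { generatedColumn: 32, originalColumn: 39 }, // Identifier `b`
];

// Greatest-lower-bound lookup: binary search for the last mapping whose
// generated column is <= `column`. Returns null if no mapping precedes it.
function originalColumnFor(mappings, column) {
  let lo = 0;
  let hi = mappings.length - 1;
  let found = null;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (mappings[mid].generatedColumn <= column) {
      found = mappings[mid];
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return found === null ? null : found.originalColumn;
}

// A position at the UpdateExpression's start maps exactly:
console.log(originalColumnFor(mappings, 30)); // 36
// A position inside that node (the second `+`) resolves to the node's start:
console.log(originalColumnFor(mappings, 31)); // 36
```

The binary search works because decoded segments on a line are sorted by generated column. A coarser map gives it fewer entries to search, and any position that lands on a node start still resolves exactly.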
Some further clarification:
The question pertains to cases where the AST is already known. So "it's more expensive to generate an AST than an array of tokens" wouldn't answer the question.
The practical impact of this question is that if we could decrease the granularity of source maps while preserving the behavior of debuggers and crash stack decoders, then source maps could be much smaller. The main advantage would be debugger performance: dev tools can take a long time to process large source files, which makes debugging a pain.
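Fewer locations produce a smaller map because of how the Source Map v3 format stores them. Each mapping becomes one comma-separated segment of Base64 VLQ-encoded deltas in the `mappings` string. The sketch below assumes an identity map (one source, one line, generated column equal to original column) of the statement used in A1 below. It compares a mapping at every token start with a mapping at every ESTree-style AST node start, counting the function body's BlockStatement. The column lists were worked out by hand for that statement, the optional `names` field is ignored, and `vlq` and `encodeIdentityLine` are illustrative names.

```javascript
const BASE64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Base64 VLQ as used in Source Map v3: sign in the lowest bit, then 5-bit
// groups from least significant, with 32 as the continuation bit.
function vlq(value) {
  let v = value < 0 ? (-value << 1) | 1 : value << 1;
  let out = "";
  do {
    let digit = v & 31;
    v >>>= 5;
    if (v > 0) digit |= 32; // more groups follow
    out += BASE64[digit];
  } while (v > 0);
  return out;
}

// Encode one generated line of an identity map: every segment is
// [generated column delta, source index delta, original line delta, original column delta].
function encodeIdentityLine(columns) {
  let prev = 0;
  return columns
    .map((col) => {
      const delta = col - prev;
      prev = col;
      return vlq(delta) + vlq(0) + vlq(0) + vlq(delta);
    })
    .join(",");
}

// Starts in `const a = function *() { return a + ++ b }`.
const tokenCols = [0, 6, 8, 10, 19, 20, 21, 23, 25, 32, 34, 36, 39, 41];
const nodeCols = [0, 6, 10, 23, 25, 32, 36, 39];

const tokenMappings = encodeIdentityLine(tokenCols);
const nodeMappings = encodeIdentityLine(nodeCols);
console.log(tokenMappings.length, nodeMappings.length); // 69 39
```

Because every delta here is small, each segment is four characters, so the `mappings` string shrinks in proportion to the number of segments dropped.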
Here is an example of adding source map locations at the token level using the source-map library:
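```javascript
const { SourceMapGenerator } = require("source-map");
const generator = new SourceMapGenerator();

// One mapping per token start. `tokens`, `token.location()` and `generated.get(token)`
// are assumed helpers returning { line, column } (1-based lines, 0-based columns).
for (const token of tokens) {
  generator.addMapping({
    source: "source.js",
    original: token.location(),
    generated: generated.get(token).location(),
  });
}
```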
And here is an example of adding locations at the AST node level:
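```javascript
const { SourceMapGenerator } = require("source-map");
const generator = new SourceMapGenerator();

// One mapping per AST node start. `nodes`, `node.location()` and `generated.get(node)`
// are assumed helpers returning { line, column } (1-based lines, 0-based columns).
for (const node of nodes) {
  generator.addMapping({
    source: "source.js",
    original: node.location(),
    generated: generated.get(node).location(),
  });
}
```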
Q1: Why expect there to be fewer starts of AST Nodes than starts of tokens?
A1: Because if there were more starts of AST Nodes than starts of tokens, some AST Node would have to start somewhere other than the start of a token, which would be quite an accomplishment for the author of the parser! To make this concrete, suppose you have the following JavaScript statement:
```javascript
const a = function *() { return a + ++ b }
```
Here are the locations at the starts of tokens:
```javascript
const a = function *() { return a + ++ b } /*
^     ^   ^        ^^^ ^ ^      ^ ^ ^  ^ ^
*/
```
Here's roughly where most parsers will say the starts of AST Nodes are.
```javascript
const a = function *() { return a + ++ b } /*
^     ^   ^              ^      ^   ^  ^
*/
```
That's a 46% reduction in the number of source-map locations!
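A1's claim that every node start is also a token start can be checked mechanically. This minimal sketch uses a toy regex tokenizer that handles only the punctuators in this one statement. It also uses a hand-written ESTree-style AST whose node types and `start` offsets are meant to mirror what a parser such as acorn reports for this statement; it is not actual parser output. The sketch collects the distinct node starts and checks that each one is a token start. Counted this way, there are 14 token starts (the token diagram above leaves out the one for `=`) and 8 distinct node starts (including the function body's BlockStatement). The exact percentage therefore depends on what is counted, but the reduction is still large.

```javascript
const src = "const a = function *() { return a + ++ b }";

// Toy tokenizer: identifiers/keywords plus the few punctuators used here.
// Whitespace is skipped because the global regex jumps to the next match.
function tokenStarts(code) {
  const re = /[A-Za-z_$][\w$]*|\+\+|[=*(){}+]/g;
  const starts = [];
  let m;
  while ((m = re.exec(code)) !== null) starts.push(m.index);
  return starts;
}

// Hand-written ESTree-style AST for `src` (only types, starts and children).
const ast = {
  type: "VariableDeclaration", start: 0, kind: "const",
  declarations: [{
    type: "VariableDeclarator", start: 6,
    id: { type: "Identifier", start: 6, name: "a" },
    init: {
      type: "FunctionExpression", start: 10, generator: true, params: [],
      body: {
        type: "BlockStatement", start: 23,
        body: [{
          type: "ReturnStatement", start: 25,
          argument: {
            type: "BinaryExpression", start: 32, operator: "+",
            left: { type: "Identifier", start: 32, name: "a" },
            right: {
              type: "UpdateExpression", start: 36, operator: "++", prefix: true,
              argument: { type: "Identifier", start: 39, name: "b" },
            },
          },
        }],
      },
    },
  }],
};

// Collect the distinct `start` of every object that has a string `type`.
function nodeStarts(value, out = new Set()) {
  if (Array.isArray(value)) {
    for (const item of value) nodeStarts(item, out);
  } else if (value !== null && typeof value === "object") {
    if (typeof value.type === "string") out.add(value.start);
    for (const child of Object.values(value)) nodeStarts(child, out);
  }
  return out;
}

const tokens = tokenStarts(src);
const nodes = [...nodeStarts(ast)].sort((x, y) => x - y);
const everyNodeStartIsATokenStart = nodes.every((s) => tokens.includes(s));
const reductionPercent = Math.round((1 - nodes.length / tokens.length) * 100);

console.log(tokens.length, nodes.length, everyNodeStartIsATokenStart, reductionPercent);
// 14 8 true 43
```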
Q2: Why expect AST-Node-granularity source maps to be smaller?
A2: See A1 above.
Q3: What format would you use for referencing AST Nodes?
A3: No format. See the sample code above. I am talking about adding source map locations for the starts of AST Nodes. The process is almost exactly the same as adding source map locations for the starts of tokens, except that you add fewer locations.
Q4: How can you assert that all tools dealing with the source map use the same AST representation?
A4: Assume we control the entire pipeline and are using the same parser everywhere.
A source map is a file that maps from the transformed source to the original source: a mapping between the generated (transpiled or minified) JavaScript file and one or more original source files. The main purpose of source maps is to aid debugging.
A source map is a file that maps from the transformed source to the original source, enabling the browser to reconstruct the original source and present the reconstructed original in the debugger.
Source maps connect the bundle file with the corresponding source files. Source maps are not a Webpack-only concept; they are a common debugging technique supported by all modern browsers. Webpack can generate source maps.
Open the browser Dev Tools, go to "Sources", and click simple.js in the right panel. It states "Source Map detected".
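For reference, here is a minimal sketch of what such a file contains, assuming the Source Map v3 format. It is a JSON object whose `mappings` string carries the position data. A `//# sourceMappingURL=` comment at the end of the generated file tells dev tools where to find it. The file names (`simple.ts`, `simple.js.map`) and the one-segment `mappings` value are hypothetical.

```javascript
// Hypothetical Source Map v3 object for a one-line generated file.
const map = {
  version: 3,
  file: "simple.js",      // the generated file this map describes
  sources: ["simple.ts"], // original file(s); hypothetical name
  names: [],
  // One segment, "AAAA": generated line 1, column 0 maps to sources[0],
  // original line 1, column 0.
  mappings: "AAAA",
};

// The generated file references its map in a trailing comment.
const generatedCode = 'console.log("hi");\n//# sourceMappingURL=simple.js.map\n';

// Serialized form that would be written to simple.js.map.
const mapJson = JSON.stringify(map);
console.log(mapJson);
```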
The TypeScript compiler actually emits source map locations only at AST node boundaries, with some exceptions for compatibility with tools that expect mappings at certain positions, so token-based maps aren't quite universal. In the example you give, TS's source maps have locations at these positions:
```javascript
const a = function *() { return a + ++ b } /*
^     ^^  ^              ^      ^^  ^  ^^^
*/
```
These are generally both the start and the end of each Identifier node (plus the starts of other nodes).
The rationale for mapping both the start and the end of an Identifier node is simple: when you rename an Identifier, you want a selection range on the renamed identifier to map back to the original identifier without relying on heuristics.
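Here is a minimal sketch of that rationale with hypothetical mappings. A minifier renames `counter` to `c`, and a selection over `c` in the generated code is mapped back by looking up both ends with a greatest-lower-bound search. If only the start of the identifier is mapped, both ends resolve to the same original column and the range collapses. Adding an end mapping recovers the whole original name. The code strings, mapping pairs, and function names are assumptions chosen for the example.

```javascript
const original = "let counter = 0;";
const generated = "let c=0;"; // hypothetical minified output

// [generatedColumn, originalColumn] pairs for a single line, sorted.
const startsOnly = [[0, 0], [4, 4], [6, 14]];
// Adds the end of `c` (column 5) -> end of `counter` (column 11).
const startsAndEnds = [[0, 0], [4, 4], [5, 11], [6, 14]];

// Greatest-lower-bound lookup: original column of the last mapping at or before genColumn.
function toOriginalColumn(mappings, genColumn) {
  let result = null;
  for (const [gen, orig] of mappings) {
    if (gen > genColumn) break;
    result = orig;
  }
  return result;
}

// Map a generated [start, end) selection back to the original source text.
function mapSelection(mappings, start, end) {
  return original.slice(toOriginalColumn(mappings, start), toOriginalColumn(mappings, end));
}

const selStart = generated.indexOf("c"); // 4
console.log(JSON.stringify(mapSelection(startsOnly, selStart, selStart + 1)));    // ""
console.log(JSON.stringify(mapSelection(startsAndEnds, selStart, selStart + 1))); // "counter"
```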
It is possible to use AST granularity, but to build an AST you usually need to tokenize the code first anyway. For debugging purposes, the AST is an unnecessary step, since the syntax analyzer must be fed tokenized data in order to work.
An interesting resource on the topic
I also suggest exploring the acornJS source code to see how it produces an AST.