From 51a7a9dae7b76e63d3a432c993c89892223bba1a Mon Sep 17 00:00:00 2001
From: Kevin da Silva
Date: Tue, 19 Oct 2021 21:03:18 -0300
Subject: [PATCH 1/2] Create JsonParser.md

---
 Projects/3-Advanced/JsonParser.md | 72 +++++++++++++++++++++++++++++++
 1 file changed, 72 insertions(+)
 create mode 100644 Projects/3-Advanced/JsonParser.md

diff --git a/Projects/3-Advanced/JsonParser.md b/Projects/3-Advanced/JsonParser.md
new file mode 100644
index 00000000..b11907e8
--- /dev/null
+++ b/Projects/3-Advanced/JsonParser.md
@@ -0,0 +1,72 @@
+# JSON Parser
+
+**Tier:** 3-Advanced
+
+JSON is an essential part of a regular developer's workflow and one of the most popular
+communication formats between systems and web applications. But have you ever asked yourself:
+How is a standard text format such as .json translated into so many different data structures
+(objects in JS, arrays in PHP and data types in Haskell)?
+
+A good way to find out is to dig into the basics of compilers and automata theory.
+
+Requirements:
+
+- You must have a good knowledge of data structures (trees, tables, tuples) and be productive in at least one programming language.
+- You must be familiar with automata theory and formal languages (the theory of computation).
+
+Each of these topics probably has many courses and tutorials available in your native language,
+so I recommend searching YouTube for the topics above.
+
+## User Stories
+
+### Tokenization
+The first phase of the compiler is lexical analysis, which consists of reading a .json file and labeling each piece of its content with a token type.
+JSON has the following token types:
+OpenObjToken | CloseObjToken | IdentifierKeyToken | StringToken | NumberToken | NullToken
+| OpenArrayToken | CloseArrayToken | BooleanToken | SeparatorToken | Empty --end of file
+
+- [ ] Read the input and store the tokens in a data structure similar to this: `[(OpenObjToken, "{"), (SeparatorToken, ",")]`
+
+### Syntactic Analysis
+Syntactic analysis consists of checking whether every token is in the correct place (for example, an open object token "{" must appear before its close object token "}").
+The JSON rules are described by this context-free grammar:
+A -> [C] | {K} | {} | []
+C -> TEC | T
+K -> "key":TEK | "key":T
+E -> ,
+T -> "str" | 0-9 | 0.0-9.9 | A | true | false | null
+
+- [ ] Create the syntactic analysis rules and use them to process the token list from the previous step.
+- [ ] Build a parse tree that follows the rules of the context-free grammar, using the tokens you got previously.
+
+### Semantic Analysis
+Semantic analysis is about guaranteeing the actions and meaning of our programs,
+for example, checking that a variable exists before it is used, or that a value is numeric before it is multiplied.
+JSON has none of these constructs, so those errors cannot happen, but one semantic rule still applies.
+We have to check if a key is declared more than once in the same scope:
+{ "key": "123", "key": 123 } -- should print a warning: the key is declared twice
+{ "key": "123", "scope": { "key": 123 } } -- should not print a warning: the two keys are in different scopes
+
+- [ ] Process the parse tree and add a semantic analysis pass that warns when a key is duplicated in the same scope.
+- [ ] Create an intermediate representation (IR) if necessary; otherwise, just reuse the parse tree.
+
+### Code Generation
+In the compiler's backend phase we usually compile programs to binary,
+but our JSON parser will instead compile its input to a .js file with the same name as the .json file:
+src/{JSON FILE NAME}.js
+module.exports = {our processed JSON exported as an object}
+
+- [ ] Read the parse tree or the IR and convert it into JS code.
+- [ ] Save the output with the same name as the .json file.
+
+## Bonus features
+
+- [ ] Keep track of the line and column in the file and report them in an error message when you find an invalid .json structure.
+- [ ] Separate your errors into lexical errors and syntactic errors, with a label at the beginning of the error like: "[LEXICAL ERROR] error line:3 col:2"
+- [ ] Wrap it all into a CLI or a simple user interface.
+- [ ] Try adding colors to distinguish error, success and warning events in the parser.
+- [ ] Using the same parse tree or intermediate representation, try converting to a .yml file instead of a .js object.
+
+## Example projects
+
+[My implementation of the JSON parser](https://github.com/KevinDaSilvaS/sparser-invaders)

From 6f112d8a89a2363eef23ada5ca53e65485ea1d68 Mon Sep 17 00:00:00 2001
From: Kevin da Silva
Date: Tue, 19 Oct 2021 21:19:28 -0300
Subject: [PATCH 2/2] Update JsonParser.md

---
 Projects/3-Advanced/JsonParser.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/Projects/3-Advanced/JsonParser.md b/Projects/3-Advanced/JsonParser.md
index b11907e8..497fa7bf 100644
--- a/Projects/3-Advanced/JsonParser.md
+++ b/Projects/3-Advanced/JsonParser.md
@@ -30,11 +30,11 @@ OpenObjToken | CloseObjToken | IdentifierKeyToken | StringToken | NumberToken |
 ### Syntactic Analysis
 Syntactic analysis consists of checking whether every token is in the correct place (for example, an open object token "{" must appear before its close object token "}").
 The JSON rules are described by this context-free grammar:
-A -> [C] | {K} | {} | []
+``A -> [C] | {K} | {} | []
 C -> TEC | T
 K -> "key":TEK | "key":T
 E -> ,
-T -> "str" | 0-9 | 0.0-9.9 | A | true | false | null
+T -> "str" | 0-9 | 0.0-9.9 | A | true | false | null``
 
 - [ ] Create the syntactic analysis rules and use them to process the token list from the previous step.
 - [ ] Build a parse tree that follows the rules of the context-free grammar, using the tokens you got previously.
@@ -44,8 +44,8 @@ Semantic analysis is about guaranteeing the actions and meaning of our programs,
 for example, checking that a variable exists before it is used, or that a value is numeric before it is multiplied.
 JSON has none of these constructs, so those errors cannot happen, but one semantic rule still applies.
 We have to check if a key is declared more than once in the same scope:
-{ "key": "123", "key": 123 } -- should print a warning: the key is declared twice
-{ "key": "123", "scope": { "key": 123 } } -- should not print a warning: the two keys are in different scopes
+``{ "key": "123", "key": 123 } -- should print a warning: the key is declared twice
+{ "key": "123", "scope": { "key": 123 } } -- should not print a warning: the two keys are in different scopes``
 
 - [ ] Process the parse tree and add a semantic analysis pass that warns when a key is duplicated in the same scope.
 - [ ] Create an intermediate representation (IR) if necessary; otherwise, just reuse the parse tree.
@@ -54,7 +54,7 @@ We have to check if a key is declared more than once in the same scope:
 In the compiler's backend phase we usually compile programs to binary,
 but our JSON parser will instead compile its input to a .js file with the same name as the .json file:
 src/{JSON FILE NAME}.js
-module.exports = {our processed JSON exported as an object}
+``module.exports = {our processed JSON exported as an object}``
 
 - [ ] Read the parse tree or the IR and convert it into JS code.
 - [ ] Save the output with the same name as the .json file.
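To make the Tokenization story more concrete, here is a minimal Python sketch of a lexer that produces the `(token_type, lexeme)` list described there. It is one possible approach (a single regular expression with one named group per token type), not the reference implementation. The names `tokenize`, `TOKEN_SPEC`, `MASTER_PATTERN` and `LexicalError` are hypothetical, escape sequences inside strings are not supported, and the `:` after a key is swallowed into `IdentifierKeyToken` because the token list has no colon token.

```python
import re

# Token names follow the list in JsonParser.md. Assumptions for this sketch:
# string escapes are not supported, and the ':' after a key is consumed as
# part of IdentifierKeyToken because the spec lists no separate colon token.
TOKEN_SPEC = [
    ("Whitespace", r"\s+"),
    ("IdentifierKeyToken", r'"[^"\\]*"\s*:'),  # a string followed by ':' is a key
    ("StringToken", r'"[^"\\]*"'),
    ("NumberToken", r"-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?"),
    ("BooleanToken", r"true|false"),
    ("NullToken", r"null"),
    ("OpenObjToken", r"\{"),
    ("CloseObjToken", r"\}"),
    ("OpenArrayToken", r"\["),
    ("CloseArrayToken", r"\]"),
    ("SeparatorToken", r","),
]
# Alternatives are tried in list order, so keys are recognized before plain strings.
MASTER_PATTERN = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in TOKEN_SPEC))


class LexicalError(Exception):
    """Raised when no token pattern matches at the current position."""


def tokenize(text):
    """Return a list of (token_type, lexeme) tuples ending with ("Empty", "")."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER_PATTERN.match(text, pos)
        if match is None:
            raise LexicalError(f"[LEXICAL ERROR] unexpected {text[pos]!r} at offset {pos}")
        kind, lexeme = match.lastgroup, match.group()
        if kind == "IdentifierKeyToken":
            lexeme = lexeme[: lexeme.rindex('"') + 1]  # keep the quoted key, drop ':'
        if kind != "Whitespace":
            tokens.append((kind, lexeme))
        pos = match.end()
    tokens.append(("Empty", ""))  # end-of-file marker, as in the spec
    return tokens
```

The order of `TOKEN_SPEC` matters: if `StringToken` came before `IdentifierKeyToken`, every key would be lexed as a plain string and the following `:` would then raise a `LexicalError`.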
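For the Syntactic Analysis story, the grammar maps almost directly onto a recursive-descent parser with one method per non-terminal. The Python sketch below assumes a token list of `(token_type, lexeme)` tuples ending with `("Empty", "")`, and builds a hypothetical tree shape: `("object", [(key, node), ...])`, `("array", [node, ...])`, or a `(token_type, lexeme)` leaf. `Parser`, `ParseError` and the method names are illustrative only.

```python
class ParseError(Exception):
    """Raised when the token stream does not fit the grammar."""


class Parser:
    """Recursive-descent parser with one method per non-terminal (A, C, K, T).

    Input: (token_type, lexeme) tuples ending with ("Empty", "").
    Output (hypothetical tree shape): ("object", [(key, node), ...]),
    ("array", [node, ...]) or a leaf (token_type, lexeme).
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos][0]

    def expect(self, kind):
        found, lexeme = self.tokens[self.pos]
        if found != kind:
            raise ParseError(f"[SYNTAX ERROR] expected {kind}, found {found} {lexeme!r}")
        self.pos += 1
        return lexeme

    def parse(self):
        tree = self.parse_a()
        self.expect("Empty")  # nothing may follow the top-level object or array
        return tree

    def parse_a(self):
        # A -> [C] | [] | {K} | {}
        if self.peek() == "OpenArrayToken":
            self.expect("OpenArrayToken")
            items = [] if self.peek() == "CloseArrayToken" else self.parse_c()
            self.expect("CloseArrayToken")
            return ("array", items)
        self.expect("OpenObjToken")
        members = [] if self.peek() == "CloseObjToken" else self.parse_k()
        self.expect("CloseObjToken")
        return ("object", members)

    def parse_c(self):
        # C -> T E C | T, written as a loop: T (E T)*
        items = [self.parse_t()]
        while self.peek() == "SeparatorToken":
            self.expect("SeparatorToken")
            items.append(self.parse_t())
        return items

    def parse_k(self):
        # K -> "key":T E K | "key":T, written as a loop
        members = [(self.expect("IdentifierKeyToken"), self.parse_t())]
        while self.peek() == "SeparatorToken":
            self.expect("SeparatorToken")
            members.append((self.expect("IdentifierKeyToken"), self.parse_t()))
        return members

    def parse_t(self):
        # T -> "str" | number | A | true | false | null
        kind = self.peek()
        if kind in ("OpenArrayToken", "OpenObjToken"):
            return self.parse_a()
        if kind in ("StringToken", "NumberToken", "BooleanToken", "NullToken"):
            return (kind, self.expect(kind))
        raise ParseError(f"[SYNTAX ERROR] unexpected {kind}")
```

The right-recursive rules `C -> TEC | T` and `K -> "key":TEK | "key":T` are written as `while` loops, which accept the same language without one recursive call per element. Like the grammar, the sketch only accepts an object or array at the top level; RFC 8259 also allows a bare value there.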
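The duplicate-key check from the Semantic Analysis story can be a small recursive walk over the parse tree. This Python sketch assumes a hypothetical tree shape (`("object", [(key, node), ...])`, `("array", [node, ...])`, or a `(token_type, lexeme)` leaf, with keys keeping their quotes); `find_duplicate_keys` and the warning text are made up for illustration. The important detail is that a fresh `seen` set is created for every object, which is what makes the two examples in the story behave differently.

```python
def find_duplicate_keys(node, path="$"):
    """Return one warning per key repeated inside the same object (scope).

    Assumed tree shape: ("object", [(key, node), ...]), ("array", [node, ...])
    or a leaf (token_type, lexeme). Keys keep their quotes, e.g. '"key"'.
    """
    warnings = []
    kind, payload = node
    if kind == "object":
        seen = set()  # a fresh set per object: each object is its own scope
        for key, value in payload:
            if key in seen:
                warnings.append(f"[WARNING] duplicate key {key} in {path}")
            seen.add(key)
            name = key.strip('"')
            warnings.extend(find_duplicate_keys(value, f"{path}.{name}"))
    elif kind == "array":
        for index, item in enumerate(payload):
            warnings.extend(find_duplicate_keys(item, f"{path}[{index}]"))
    return warnings  # leaves (strings, numbers, ...) never contain keys
```

Encoding the story's first example as `("object", [('"key"', ("StringToken", '"123"')), ('"key"', ("NumberToken", "123"))])` yields a single warning, while the nested `"scope"` example yields none.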
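For the Code Generation story, one minimal approach is to print the parse tree back out as a JavaScript object literal and wrap it in `module.exports = ...;`. This Python sketch assumes a hypothetical tree shape (`("object", [(key, node), ...])`, `("array", [node, ...])`, or a `(token_type, lexeme)` leaf); `to_js`, `generate_module` and the `out_dir` parameter are illustrative names, and the output path follows the `src/{JSON FILE NAME}.js` convention from the story.

```python
from pathlib import Path


def to_js(node):
    """Render a parse-tree node as a JavaScript expression.

    Assumed tree shape: ("object", [(key, node), ...]), ("array", [node, ...])
    or a leaf (token_type, lexeme). Keys and leaf lexemes are emitted verbatim,
    since JSON strings, numbers, true, false and null are valid JS literals.
    """
    kind, payload = node
    if kind == "object":
        return "{" + ", ".join(f"{key}: {to_js(value)}" for key, value in payload) + "}"
    if kind == "array":
        return "[" + ", ".join(to_js(item) for item in payload) + "]"
    return payload


def generate_module(json_path, tree, out_dir="src"):
    """Write <out_dir>/<json file name>.js exporting the tree; return its path."""
    out_path = Path(out_dir) / (Path(json_path).stem + ".js")
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(f"module.exports = {to_js(tree)};\n")
    return out_path
```

For example, `generate_module("data/config.json", tree)` would write `src/config.js`. Emitting lexemes verbatim keeps the generator trivial; a YAML backend (one of the bonus features) could reuse the same tree walk with a different `to_js`-style renderer.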