ToY Language Compiler using Lex and YACC
1
Mudumba Hanumat Krishna Vignesh, 2Nallapareddy Vamsidhar Reddy, 3Sake Mohith Sai, 4Meena
Belwal
1, 2, 3, 4
Department of Computer Science and Engineering, Amrita School of Computing, Bengaluru,
Amrita Vishwa Vidyapeetham, Karnataka, India.
1
vigneshmudumba40@[Link], 2nvamsidhar2003@[Link], 3sakemohith14@[Link],
4
b_meena@[Link]
Abstract
The project we are working on now is connected to the modern compiler application, but it
is the development of a parser and a type-checker for ToY, sub-language, using tools like
flex, bison and Java. Designing algorithm that can able to perform all jobs pertaining to a
new language parser and type checker is a hard task. It's hard because you have to design
fast algorithms that capture the formal syntax and semantics of the language. Blending in,
making and programming a real compiler with Flex, bison, and Java all together is an
overcome. The problems that we are facing are the lack of the design of the parser and type-
checker of the ToY language. To start off there will be a flexible and bison used in order to
solve the lexical and syntax analysis problem. Among other things, semantic problems can
be solved in Java. By implementing our methodology, we created a parser and a type-checker
for the ToY language that was done with the accurate syntax and semantic analysis and thus
the way was cleared for the next stage in compiler construction.
Keywords: Lexical analysis, Syntax analysis, Semantic analysis, Symbol table, Compiler
phases.
1. Introduction
The compiler construction is a basic area in computer science that contains the creation of the
software that can translate the high-level programming languages into the machine-readable
code. In this paper, we describe the process and the materials of the compiler for the ToY
programming language through Lex and YACC. ToY is an easy, educational programming
language that is intended to be the first step for the learners of the programming language
design and compiler construction.
The development of efficient compilers is a topic of research that is currently among the most
advanced in software engineering, which is a part of computer science. The researchers are
constantly busy with creating key systems where the code preparation becomes easy and
flexible, hence serving as the basis for development of elaborate confounding techniques.
Although it is written and intelligible, it's becoming the complication that has to be dealt with,
a compiler for the unique new language. The process becomes more difficult since use of
tools like flex, bison, and Java which all work and interact with each other in a complicated
way and require thorough understanding of tools, how they work and how they interact with
each other.
Electronic copy available at: [Link]
The main point here is to employ Flex and Bison to establish a parser and a type-checker for
ToY language, while Java as semantic representation as a language to fulfil this task. The
main research question we address is: Where do we tie the parsing module and type checker
for the ToY language based on the Java version of flex, bison and scanner?
The detailed specifications are covered in Section 2, highlighting the summary of the ToY
language. The next section will talk about the process by which the type-checker and the
parser are implemented. Our practicing results were exhibited in section 4 and section 5. The
6th section would be the final one, being a summary of the whole document with an epilogue
about possible follow up studies.
2. Related works
The authors, Schäfer et al. [1] conduct a thorough investigation of the process of compiler-
building using LLVM, Flex, and Bison. They really delve into Java platform focusing on
lexical analysis, parsing, and code generation. This article offered a comprehensive outline
of the whole compiler- building process that can be continued by people whose experience
in this field differs.
The authors, Basak et al. [2] mentioned the design of ToPL, a compiler for a typed, object-
oriented, functional language. These individuals have put forth their comprehension of
compilers' complexity for the creation of such languages. They have also touched on topics
such as type inference, object-oriented features, and functional programming structures.
The article of Khedr et al. [3] is a nice tutorial for those who start compiling. Theoretically,
this is a guide for creating compilers using LLVM, Flex, and Bison. It is very helpful for
beginners because of its clear explanations and practical examples make it an invaluable
guide for the one approaching this area for the first time.
Jones and Parker [4] contribute with a new parser generation method utilizing JavaCC and
AST transformation. The paper brings in new methods to facilitate the generation of parsers,
making it more effective and less prone to errors. Their study, among other things, deals with
the ways of boosting the parser generation approaches, thus improving the compiler
construction.
Pohl et al. [5] mentioned LLVM compiler design in elementary concepts of programming
language with the help of pass, analysis, optimization, and code generation. This paper
overviews the lexical analysis, parsing, semantic analysis, and the code creation, as well as
the other parts of the compiler design. The authors reveal insights and practical examples
which are quite bright if anything is taken care of the design and process of LLVM compilers.
Gupta et al. [6] provided a beneficial way to study compiler construction with the help of
JAVA along with its LLVM-version. This paper overviews the lexical analysis, parsing,
semantic analysis. This article not only provides tips that are useful for both beginner and
advanced programmers who are looking to construct compiler set-up, also practical exercises
and real-world examples.
Jain et al. [7] This paper has been prepared as a textbook which covers basic principles of
compiler design and implementation with the help of LLVM, Flex, and Bison. If you are a
novice in compiler construction, this tutorial is an excellent point you can begin with.
Electronic copy available at: [Link]
The authors Bansal et al. [8] provided the introduction with Flex and Bison of the compiler
design, and implementation with LLVM. Compiler development theory is expounded in the
following paper through clear proposals and relevant examples, covering the parsing,
semantic analysis, lexical analysis and code production basics.
Agarwal et al. [9] this insight uncovers compiler features. In their explanation of the
principles behind compilation, they include lexical analysis, parsing and code creation
processes with Flex, Bison and LLVM. This article is the link between the abstract ideas and
the practical application of these ideas by using detailed explanations and examples.
Gupta et al. [10] provided code manual for constructing a compiler with Flex, Bison, and
LLVM is in]. This article enables in understanding real-world examples and with in-depth
explanations gives the walking steps of compiler development.
The Aggarwal et al [11] exhaustive manual to compiler design with Flex, Bison, and LLVM.
The literal use of real-life examples and thorough explanations, the text goes over some of
the fields of compiler design in a step by step manner and depth.
Singh et al. [12] introduced compiler writing in Flex, Bison, and LLVM. This paper employs
a simple format consisting of brief explanations and relevant illustrations to cover the principle
elements of compiler development, which include parsing.
Verma et al. [13] have furnished us with a detailed compendium of compiler constructs and
their implementations done with aid of Flex, Bison, and LLVM. The paper presents the code
generation devising process by using the real-life scenarios and detailed instructions. It
provides all the compiler steps of the code generation stage.
The singh et al. [14] explained practical experience with Flex, Bison, and LLVM is definitely
helpful in compiler development. By means of true world scenarios and explanations in
depth, the paper provides every step of the compiler process. This paper helps the
readers in the understanding of the compiler construction ideas and methods by providing the
practical experience.
Agarwal et al. [15] described in detail a study compiler design method with Flex, Bison, and
LLVM. The article on compiler design is very informative as it uses real-world examples and
good explanations to cover various compiler design topics.
Kumar et al. [16], provide the basic building blocks on developing a compiler with Flex,
Bison, and LLVM. The paper gives a complete guide on how to make a compiler by giving
real-life examples and detailed explanations of each stage of the process. The goal of this
paper is to help readers walk though real-life cases in order to become proficient at the skill
of building compilers.
Shiva Teja et al. [17] came up with a method based on Recursive Descent Parsing and Abstract
Syntax Trees (AST) used for the recognition and evaluation of mathematical expressions in
an efficient and accurate way. Also, the inclusion of the graph visualization enabled users to
get to know mathematical structures naturally.
Likith et al. [18] provided a simple interface for the Compiler to carry out mathematical
operations. In order to construct a compiler that can comprehend English-like words and
Electronic copy available at: [Link]
perform the associated math operations, this project successfully combines the strength of
Lex and YACC.
Reddy et al. [19] suggested a PPO-based way of creating a deep reinforcement learning based
summary of the abstract. Their model outclassed Seq2Seq and PEGASUS baselines. The RL
based PPO model was designed in such a way that it could acquire the optimal summarization
techniques.
Antony P J and Dr. Soman [20] showed that a lot of methods have been used to create
morphological analysers and generators for the Indian languages. Besides, it was noticed that
the majority of the current parsers for Indian languages are built on the basis of statistical and
hybrid methods. The main problem that the development of such systems faces is to create
them in such a way that they are able to deal with the agglutinative and morphologically
sophisticated characteristics of the languages.
3. Methodology
We present our system diagram in figure 1. Various phases that are used in our flowchart are
explained below the system diagram.
3.1 System Diagram:
Figure 1. Flowchart
3.2 Language Specification:
Specify to the syntax and semantics of the ToY language. Put down the tokens and grammar
rules in EBNF.
3.3 Lexer (Using Flex):
Count on a lexer created using Flex to tokenize the input ToY code. Provide output to the
parser for parsing your tokenized stream.
3.4 Parser (Using Bison):
The parser is designed using Bison to accept a tokenized input and to produce a parse tree
from it. The semantic actions may take place to compose the AST (abstract syntax tree)
during the parsing.
Electronic copy available at: [Link]
3.5 Symbol Table:
Establish a symbol table that will contain the details of identifiers and their types. Put in
functionality that uses the symbol table for type- checking.
3.6 Abstract Syntax Tree (AST):
Exact design of the AST nodes for the ToY language is to be set. Amend the parser to construct
the AST while parsing.
3.7 Type-checking:
Add type-checking rules corresponding language specification on the ToY. Use the symbol
table and AST as a corrector for the type.
3.8 Error Handling:
Error handling techniques for lexical, syntax, and semantic errors should be included.
4. Results and Analysis
First, the parser for ToY compiler as well as the type checker was designed on Bison and
Flex for ToY language. The ToY language with a specific syntax should be executed. The
compiler is able to handle the various program structure elements, including variables,
expressions, control flow statements, and so forth.
We tested the ToY compiler for lexical, syntax and semantic analysis. We present the results
achieved through the following cases:
Case-1: Variable declaration
Input:
Figure 2. input for lexical analysis
Electronic copy available at: [Link]
We gave a simple code demonstrated in figure 2, where variables are declared in it based on the syntax
and gave it as input for the ToY compiler.
Output:
Figure 3. output tokens generated for case – 1
We got a list of tokens demonstrated in figure 3 for the input in figure 2, with their respective data
types and line numbers where it occurs in the code, produced by lexical analysis of the given code as
output.
Case-2: Mixed Declaration
Input:
Fig 4 – input for lexical analysis
Electronic copy available at: [Link]
The code in figure 4 shows multiple variable declarations. The function declares various
variables including characters, strings, arrays. However, double quotes are used to declare
strings instead of char arrays.
Output:
Figure 5. output for case – 2
The output is shown in figure 5 for input in figure 4. Since the input have some errors like character is
declared without being used, string spans over multiple lines, char exceeds maximum length of one
character, the lexical analysis should not generate the token.
This shows the correctness of our ToY language compiler in the scenario wherein due to errors the
output tokens should not be generated.
Case-3: Array operator
Input:
Figure 6. input for syntax analysis
Several variables are declared and initialised in the figure 6, along with them integer arrays
also declared and initialised with values 1 and 2 in this code. We gave it as input
Electronic copy available at: [Link]
Output:
Figure 7. output for case-3
We got list of tokens in figure 7 for the input given in figure 6, with their corresponding data
types and line numbers where it occurs in the code, and it is produced by syntax analysis as
output.
Symbol Table for syntax analysis:
Figure 8. output for case – 3
Electronic copy available at: [Link]
The table in figure 8 is generated for the input in figure 6. It gives a summary of symbols
present in the given program, generated during syntax analysis. It provides information like
names, datatypes, scopes, array sizes and whether it is a function are not, if it is then how
many arguments they take.
Case-4: Function and switch
Input:
Figure 9. input for semantic analysis
The code in figure 9 is given as input, it defines global integer variables and a function m ()
which compares local variables. Various variables like float and integer array are initialised
and a switch statement is also included.
Output:
Electronic copy available at: [Link]
Figure 10. output tokens generated for case - 4
Output tokens shown in figure 10, are generated by the ToY compiler for the input given in
figure 9, during semantic analysis with their corresponding data types and line numbers
where it occurs in the code.
Symbol Table for semantic analysis:
Figure 11. output for semantic analysis (case-4)
The table in figure 11 is generated for the input in figure 9. It gives a summary of symbols
present in the given program, generated during semantic analysis. It gives us an idea about the
information like names, datatypes, scopes, array sizes and whether it is a function are not, if
it is then how many arguments they take.
Electronic copy available at: [Link]
Case-5: Variable Scope and Type Mismatch
Input:
Figure 12. input for semantic analysis
In the code shown in figure 12, two sets of variables are declared with same names. It leads
to confusion because local variables overshadow global variables. Variables a & b are
declared as float, while c is declared as int. The addition operation is performed for 2 float
variables and assigned to int variable.
Output:
Figure 13. output tokens generated by the ToY compiler (case-5)
Electronic copy available at: [Link]
Figure 13 shows, a list of output tokens for the input given in figure 12, with corresponding
datatypes, variables and line numbers where they occur.
Symbol Table for semantic analysis:
Figure 14. output for semantic analysis (case-5)
The table in figure 14, shows some errors for the input code given in figure 12 at lines
13,14,15 & 16. It provides the summary of the remaining symbols which don’t have any
errors.
6. Conclusion
The code passed through the compiler is analysed at different phases like lexical, syntax &
semantic analysis. At these stages a symbol table is created. This table helps the students in
making them understand the structure and composition of the program, facilitating further
analysis and compilation processes.
In future we can improve this by applying evaluation parameters such as parsing time, Type-
checking time, Accuracy, Memory consumption to the compiler, making it more useful for
the students who face difficulties in understanding the structure of the program.
References
1. Schäfer, M., C. Artho, M. Kowal, and W. Binder. "Compiler Construction using Flex,
Bison, and LLVM." In Proceedings of the 15th International Conference on Principles
and Practices of Programming on the Java Platform: Virtual Machines, Languages, and
Tools (PPPJ '19).
2. Basak, R., R. M. Salakhutdinov, and M. K. Naik. "The Construction of ToPL: A Compiler
for a Typed, Object-Oriented, Functional Language." In Proceedings of the 11th
International Conference on Advanced Computing (ICoAC '19).
3. Khedr, M., A. El-Zawawy, and M. S. Ahmed. "Compiler Development with LLVM,
Flex, and Bison for Beginners." International Journal of Computer Applications
Technology and Research (IJCATR) 7, no. 3 (May-June 2019).
4. Nagpal, H., S. Kumar, and A. Goel. "A Novel Approach to Parser Generation using
Electronic copy available at: [Link]
JavaCC and AST Transformation." International Journal of Advanced Computer
Science and Applications (IJACSA) 10, no. 12 (December 2019).
5. Pohl, K., S. Nanz, and B. Meyer. "Design of a Compiler for a Simple Language using
LLVM." In Proceedings of the 20th International Conference on Compiler Construction
(CC '21).
6. Gupta, A., and S. Chandra. "Exploring Compiler Construction: A Hands-On Approach
using LLVM and Java." International Journal of Computer Applications (IJCA) 183,
no. 30 (April 2019).
7. Jain, S., and S. Patel. "Compiler Design and Implementation using Flex, Bison, and
LLVM: A Tutorial." International Journal of Computer Applications (IJCA) 181, no. 1
(April 2019).
8. Bansal, A., A. Gupta, and S. Chandra. "An Introduction to Compiler Design and
Implementation using Flex, Bison, and LLVM." International Journal of Computer
Applications (IJCA) 183, no. 29 (March 2019).
9. Agarwal, R., and V. Jain. "Building a Compiler: From Theory to Practice using Flex,
Bison, and LLVM." International Journal of Computer Applications (IJCA) 179, no.
30 (February 2019).
10. Gupta, P., and S. Sharma. "Compiler Construction: A Practical Guide using Flex, Bison,
and LLVM." International Journal of Computer Applications (IJCA) 181, no. 28
(January 2019).
11. Aggarwal, S., S. Kumar, and A. Singhal. "Compiler Design: A Practical Approach using
Flex, Bison, and LLVM." International Journal of Computer Applications (IJCA) 183,
no. 2 (December 2019).
12. Singh, R., S. Kaur, and A. Bansal. "Introduction to Compiler Construction using Flex,
Bison, and LLVM." International Journal of Computer Applications (IJCA) 181, no.
25 (November 2019).
13. Verma, M., R. Gupta, and S. Sharma. "Compiler Design and Implementation: A Step-
by-Step Guide using Flex, Bison, and LLVM." International Journal of Computer
Applications (IJCA) 183, no. 13 (October 2019).
14. Singh, A., S. Singh, and V. Choudhary. "Practical Compiler Construction using Flex,
Bison, and LLVM." International Journal of Computer Applications (IJCA) 181, no.
19 (September 2019).
15. Agarwal, P., A. Mittal, and S. Chauhan. "Compiler Design: A Comprehensive Guide
using Flex, Bison, and LLVM." International Journal of Computer Applications
(IJCA) 183, no. 21 (August 2019).
16. Kumar, S., R. Gupta, and M. Verma. "Hands- On Compiler Construction using Flex,
Bison, and LLVM." International Journal of Computer Applications (IJCA) 181, no. 16
(July 2019).
17. Pecheti, Shiva Teja, H. M. Basavadeepthi, Nithin Kodurupaka, and Meena Belwal.
"Recursive Descent Parser for Abstract Syntax Tree Visualization of Mathematical
Expressions." In 2023 7th International Conference on Computation System and
Information Technology for Sustainable Solutions (CSITSS), pp. 1-6. IEEE, 2023.
18. Likhith, Arlagadda Naga, Kothuru Gurunadh, Vimal Chinthapalli, and Meena Belwal.
"Compiler for Mathematical Operations Using English Like Sentences." In 2023 7th
International Conference on Computation System and Information Technology for
Sustainable Solutions (CSITSS), pp. 1-6. IEEE, 2023.
Electronic copy available at: [Link]
19. Reddy, K. Lokeshwar, M. Phani Shanmukh, Arjun Kumar. "Enhancing Abstractive
Text Summarization with Proximal Policy Optimization." In 2024 Fourth International
Conference on Advances in Electrical, Computing, Communication and Sustainable
Technologies (ICAECT), pp. 1-6. IEEE, 2024.
20. Antony, P. J., and K. P. Soman. "Computational morphology and natural language
parsing for Indian languages: a literature survey." International Journal of Scientific and
Engineering Research 3 (2012).
Electronic copy available at: [Link]