
Custom Compiler Scanner Overview

The document presents a project on a custom compiler innovation, specifically focusing on a custom language and token scanner named Scanner.py. It outlines the innovative features, custom identifier rules, data types, reserved words, and the implementation of the scanner. The project aims to enhance clarity and functionality in language design while detailing the token recognition mechanism and providing examples of valid and invalid identifiers.

Uploaded by

AMNA NOOR

Custom Compiler Innovation
Custom Language and Token Scanner

Year 2024

Submitted to: Sir Mudassir Zaidi


Department of Computer Science

Amna Noor | BSCS51F21R025


Zain Ul Abidden | BSCS51F21R034
Yahya Khalid | BSCS51F21R045
Compiler Construction
Date of Submission: December 30, 2024
Table of Contents

1. Scanner.py Overview
• Introduction to Scanner.py
• Innovative Features and Design Choices
2. Custom Identifier Rules
• Overview
• Rules
• Regular Expression
• Examples
3. Custom Data Types
• digis (Integer)
• dot (Float)
• mark (Character)
• stream (String)
• toggle (Boolean)
4. Reserved Words
• When and Otherwise
• Multi-Way Condition: Selector, Option, Collapse, Preset
• Output Statement: Display
• Return Keyword: Giveback
• Iterative Loops: Iter and Iter-until
5. Scanner Implementation
• Token Recognition Mechanism
• Workflow Diagram
6. References
• Regular Expression Resources
Scanner.py Overview

Introduction to Scanner.py

This project takes a creative approach to compiler construction by introducing custom syntax rules, data types,
and identifiers. It simulates the scanner (lexical analysis) phase of a compiler: the scanner reads a source
file, identifies tokens based on the custom rules, and writes each token's serial number, type, value, and
line number to an output file.

Innovative Features and Design Choices

• Self-Descriptive Tokens: Keywords are meaningful, relevant words such as digis (integers) and
mark (characters), chosen for clarity.
• Unique and Relevant: Generic terms are avoided in favor of functional, distinct tokens.
• Optimized Short Words: Brevity is prioritized to reduce code size and improve ease of use.
• Custom Identifier Pattern: A unique pattern keeps identifiers consistent and structured.

Custom Identifier Rules

The identifier rules define the naming conventions for variables. They are meant to keep variable names
clear, readable, and uniform.

Rules

1. Prefix:
Must begin with a #.

2. Casing:
Pascal casing is mandatory (e.g., #MyVariableName).
If there are multiple words, each must start with an uppercase letter.

3. Character Restrictions:
No special characters or numbers are allowed within the word part.
Hyphens (-) can only separate the word part and the number part.

4. Number Usage:
Digits, if used, must appear after a hyphen (-) at the end of the identifier.
The hyphen and digits together are optional, but they cannot appear separately.

5. Length:
The entire identifier must not exceed 20 characters (including the # prefix).

6. Reserved Words:
Reserved keywords of the language cannot be used as identifiers.

Regular Expression

^(?=.{1,20}$)#[A-Z][a-z]+(?:[A-Z][a-z]+)*(-[0-9]+)?$

Explanation:

 ^ — Ensures the match starts at the beginning of the string.


 (?=.{1,20}$) — Limits the total length of the identifier to 20 characters.
 # — The identifier must start with a #.
 [A-Z][a-z]+ — The first word starts with an uppercase letter followed by lowercase letters.
 (?:[A-Z][a-z]+)* — Ensures subsequent words (if any) follow Pascal casing.
 (-[0-9]+)? — Allows an optional hyphen followed by one or more digits.
 $ — Ensures the match ends at the end of the string.

Examples

Valid Identifiers

• #MyVariable
• #ExampleName-123
• #PascalCase
• #Short-7
• #TestVariable-99

Invalid Identifiers

1. No Pascal Casing:
• #myvariable (first letter not capitalized).
2. No Digits After Hyphen:
• #Variable- (hyphen without digits).
3. Invalid Characters:
• #Var!able (special characters not allowed).
4. Exceeds Length:
• #ThisIdentifierIsTooLong-1234 (29 characters; the limit is 20).
5. Number Placement:
• #123Variable (numbers not allowed in the word part).
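
The pattern can be checked directly against the examples above. The following is a minimal Python sketch, not the project's actual Scanner.py code; the function name is_valid_identifier and the two example lists are illustrative assumptions.

```python
import re

# Pattern copied from the "Regular Expression" section of this report.
IDENTIFIER_RE = re.compile(r"^(?=.{1,20}$)#[A-Z][a-z]+(?:[A-Z][a-z]+)*(-[0-9]+)?$")


def is_valid_identifier(name: str) -> bool:
    """Return True if `name` satisfies the custom identifier pattern."""
    return IDENTIFIER_RE.match(name) is not None


# The valid and invalid examples listed in this report.
valid = ["#MyVariable", "#ExampleName-123", "#PascalCase", "#Short-7", "#TestVariable-99"]
invalid = ["#myvariable", "#Variable-", "#Var!able",
           "#ThisIdentifierIsTooLong-1234", "#123Variable"]

for name in valid + invalid:
    verdict = "valid" if is_valid_identifier(name) else "invalid"
    print(f"{name:32} {verdict}")
```

Because every identifier must begin with # and none of the reserved words do, a name that matches the pattern cannot collide with a reserved word.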
Custom Data Types

1. digis (Integer)

Purpose: Represents whole numbers without decimal points, such as counters or indices.
Example Usage:

digis #Age = 25;
digis #Count = 100;

2. dot (Float)

Purpose: Represents decimal numbers for precise values, such as measurements or monetary values.
Example Usage:

dot #Temperature = 36.5;
dot #Pi = 3.14159;

3. mark (Character)

Purpose: Holds a single character, such as a letter, digit, or symbol.
Example Usage:

mark #Initial = 'A';
mark #Symbol = '$';

4. stream (String)

Purpose: Represents a sequence of characters. Useful for names, messages, or general text data.
Example Usage:

stream #Name = "amna";
stream #Greeting = "Hello, world!";

5. toggle (Boolean)

Purpose: Represents a binary state, such as true or false. Ideal for decision-making or conditions.
Example Usage:

toggle #IsActive = true;
toggle #IsComplete = false;
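
A scanner has to recognize the literal values used in the examples above. The report does not state the literal patterns, so the regular expressions below are assumptions inferred from the sample declarations (escape sequences inside mark and stream literals are deliberately not handled). The names LITERAL_PATTERNS and literal_type are hypothetical.

```python
import re
from typing import Optional

# Assumed literal patterns, one per custom data type (not taken from Scanner.py).
LITERAL_PATTERNS = [
    ("toggle", re.compile(r"true|false")),      # boolean: true / false
    ("dot", re.compile(r"[0-9]+\.[0-9]+")),     # float: digits, a dot, digits
    ("digis", re.compile(r"[0-9]+")),           # integer: digits only
    ("mark", re.compile(r"'[^'\\]'")),          # one character in single quotes
    ("stream", re.compile(r'"[^"\\]*"')),       # text in double quotes
]


def literal_type(text: str) -> Optional[str]:
    """Return the data type whose pattern matches the whole literal, or None."""
    for type_name, pattern in LITERAL_PATTERNS:
        if pattern.fullmatch(text):
            return type_name
    return None


for sample in ["25", "36.5", "'A'", '"Hello, world!"', "true", "3.14.15"]:
    print(sample, "->", literal_type(sample))
```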
Custom Reserved Words

1. Conditional Keywords: When (if) and Otherwise (else)

Why These Words?

When and Otherwise: Easily understandable and match natural language flow for conditional logic.

Example Code:

digis #Temperature = 35;

when (#Temperature > 30) {
    display("It's hot outside!");
} otherwise {
    display("It's pleasant outside.");
}

2. Multi-Way Condition: Selector (switch), Option (case), Collapse (break), Preset (default)

Why These Words?

• Selector: Highlights the selection process in a multi-way condition.
• Option: Represents each possible case in a clear way.
• Collapse: Reflects the termination of a single case.
• Preset: Suggests a predefined default value.

Example Code:

digis main() {
    digis #Day = 3;

    selector(#Day) {
        option 1:
            display("Monday: Start of the week!\n");
            collapse;
        option 2:
            display("Tuesday: Keep going!\n");
            collapse;
        option 3:
            display("Wednesday: Midweek hustle!\n");
            collapse;
        option 4:
            display("Thursday: Almost there!\n");
            collapse;
        option 5:
            display("Friday: Weekend vibes!\n");
            collapse;
        option 6:
            display("Saturday: Relax and recharge!\n");
            collapse;
        option 7:
            display("Sunday: Rest and enjoy!\n");
            collapse;
        preset:
            display("Invalid day number\n");
    }

    giveback 0;
}

3. Output Statement: Display (printf or cout)

Why This Word?

• Display: Straightforward and represents the action of outputting information.

4. Return Keyword: Giveback (return)

Why This Word?

• Giveback: Describes, in plain words, handing a value or result back to the caller.

5. Iterative Loops: Iter (for) and Iter-until (while)

Why These Words?

Iter and Iter-until: Capture the essence of iteration in an easy-to-grasp and concise way.

Example Code:

// Using "iter" (for loop)
iter (digis #Index = 1; #Index <= 5; #Index++) {
    display("Iteration: ");
    display(#Index);
    display("\n");
}

// Using "iter-until" (while loop)
digis #Counter = 1;
iter-until (#Counter > 5) {
    display("Counter: ");
    display(#Counter);
    display("\n");
    #Counter++;
}
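
The keyword choices in this section amount to a lookup table. The sketch below is one possible way to recognize them in Python, assuming a single regular expression is used; RESERVED_RE and find_reserved_words are hypothetical names, and the actual Scanner.py may work differently. The mapping is taken from the section headings (display is mapped to printf here, although the heading also mentions cout).

```python
import re

# Custom reserved word -> conventional equivalent, as given in the headings.
RESERVED_WORDS = {
    "when": "if",
    "otherwise": "else",
    "selector": "switch",
    "option": "case",
    "collapse": "break",
    "preset": "default",
    "display": "printf",
    "giveback": "return",
    "iter": "for",
    "iter-until": "while",
}

# Sort longest first so "iter-until" is tried before its prefix "iter".
_words = sorted(RESERVED_WORDS, key=len, reverse=True)

# The lookarounds stop a keyword from matching inside a longer word
# or inside a #-prefixed identifier.
RESERVED_RE = re.compile(
    r"(?<![\w#-])(?:" + "|".join(re.escape(w) for w in _words) + r")(?![\w-])"
)


def find_reserved_words(line: str) -> list:
    """Return the reserved words found in one line, in order of appearance."""
    return RESERVED_RE.findall(line)


for line in ["iter-until (#Counter > 5) {", "} otherwise {", "giveback 0;"]:
    for word in find_reserved_words(line):
        print(f"{word:12} ~ {RESERVED_WORDS[word]}")
```

The hyphen in iter-until is the reason for the longest-first ordering: if iter were tried first, the scanner would split the keyword into iter, an operator, and until. This sketch does not skip string literals, so a keyword quoted inside a string would also be reported; a full scanner would consume strings first.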

Scanner Implementation

The scanner is implemented in Python and processes the source code as described below.

Token Recognition Mechanism

1. File Reading: Reads source code line by line from source_code.txt.
2. Token Recognition: Identifies tokens based on predefined rules, using regular expressions for
identifiers.
3. Token Classification: Categorizes tokens into:
• Reserved Words (e.g., when, selector)
• Data Types (e.g., digis, dot)
• Identifiers, Operators, Symbols, String Literals, Constants
4. Output: Writes tokens to scanner_output.txt with serial number, type, value, and line
number.
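
The four steps above can be sketched in Python as follows. This is an illustrative outline under stated assumptions, not the project's Scanner.py: the combined token pattern, the operator and symbol sets, the "Invalid Identifier" and "Unknown" labels, the treatment of true/false as constants, and the column layout of the output file are all assumptions. Only the file names, the category names, and the identifier pattern come from this report.

```python
import re

RESERVED_WORDS = {"when", "otherwise", "selector", "option", "collapse",
                  "preset", "display", "giveback", "iter", "iter-until"}
DATA_TYPES = {"digis", "dot", "mark", "stream", "toggle"}
IDENTIFIER_RE = re.compile(r"^(?=.{1,20}$)#[A-Z][a-z]+(?:[A-Z][a-z]+)*(-[0-9]+)?$")

# Assumed token pattern; earlier alternatives win at the same position.
TOKEN_RE = re.compile(r"""
    (?P<COMMENT>//.*)
  | (?P<STRING>"[^"\n]*")
  | (?P<CHAR>'[^'\n]')
  | (?P<HASHWORD>\#[\w-]*\w)
  | (?P<NUMBER>[0-9]+(?:\.[0-9]+)?)
  | (?P<WORD>iter-until|[A-Za-z_][A-Za-z0-9_]*)
  | (?P<OPERATOR>\+\+|--|<=|>=|==|!=|[-+*/%<>=!])
  | (?P<SYMBOL>[(){};:,])
  | (?P<SPACE>\s+)
  | (?P<UNKNOWN>.)
""", re.VERBOSE)


def classify(kind, text):
    """Map a raw match to one of the token categories named in the report."""
    if kind == "HASHWORD":
        return "Identifier" if IDENTIFIER_RE.match(text) else "Invalid Identifier"
    if kind == "WORD":
        if text in RESERVED_WORDS:
            return "Reserved Word"
        if text in DATA_TYPES:
            return "Data Type"
        return "Constant" if text in ("true", "false") else "Unknown"
    return {"STRING": "String Literal", "CHAR": "Constant", "NUMBER": "Constant",
            "OPERATOR": "Operator", "SYMBOL": "Symbol"}.get(kind, "Unknown")


def scan_lines(lines):
    """Return (serial, type, value, line_number) tuples for an iterable of lines."""
    tokens = []
    for line_no, line in enumerate(lines, start=1):
        for match in TOKEN_RE.finditer(line):
            kind = match.lastgroup
            if kind in ("COMMENT", "SPACE"):
                continue  # comments and whitespace produce no tokens
            tokens.append((len(tokens) + 1, classify(kind, match.group()),
                           match.group(), line_no))
    return tokens


def scan_file(src="source_code.txt", dst="scanner_output.txt"):
    """Read the source file line by line and write one row per token."""
    with open(src, encoding="utf-8") as source:
        tokens = scan_lines(source)
    with open(dst, "w", encoding="utf-8") as out:
        out.write(f"{'No.':<6}{'Type':<20}{'Value':<30}Line\n")
        for serial, kind, value, line_no in tokens:
            out.write(f"{serial:<6}{kind:<20}{value:<30}{line_no}\n")


if __name__ == "__main__":
    for row in scan_lines(["digis #Age = 25;", "iter-until (#Counter > 5) {"]):
        print(row)
```

Any #-prefixed word is captured first and only then validated against the identifier pattern, so a malformed name such as #myvariable is reported as a single invalid identifier instead of being broken into unrelated tokens.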

Example:

Input (source_code.txt): [figure not preserved]

Output (scanner_output.txt): [figure not preserved]

Workflow Diagram

[diagram not preserved]

References
1. [Link]
A tool for testing and understanding regular expressions. This link provides an interactive
platform where you can paste your regex pattern, test it against different inputs, and get
explanations for its components.
