Regular Expressions: Python For Everybody

Uploaded by

Anusha Vedanabhatla

Regular Expressions

Chapter 11

Python for Everybody

Regular Expressions
In computing, a regular expression, also referred to as
“regex” or “regexp”, provides a concise and flexible
means for matching strings of text, such as particular
characters, words, or patterns of characters. A regular
expression is written in a formal language that can be
interpreted by a regular expression processor.

Regular Expressions
Really clever “wild card” expressions for matching
and parsing strings

Really smart “Find” or “Search”
Understanding Regular Expressions

• Very powerful and quite cryptic

• Fun once you understand them
• Regular expressions are a language unto themselves
• A language of “marker characters” - programming with characters
• It is kind of an “old school” language - compact
Regular Expression Quick Guide
^        Matches the beginning of a line
$        Matches the end of a line
.        Matches any character (except a newline, by default)
\s       Matches a whitespace character
\S       Matches a non-whitespace character
*        Repeats the preceding character zero or more times
+        Repeats the preceding character one or more times
[aeiou]  Matches a single character in the listed set
[^XYZ]   Matches a single character not in the listed set
{m,n}    Repeats the preceding character between m and n times
[a-z]    Matches any single lowercase letter
[A-Z]    Matches any single uppercase letter
[0-9]    Matches any single digit
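
A few of the entries above can be tried out directly. This is only a small sketch: the sample text and the variable names are made up for illustration and are not part of the original slides.

```python
import re

# Hypothetical sample text, chosen only to exercise several patterns.
text = 'Order 66 shipped to Ann Lee'

# ^ and $ anchor a pattern to the start and end of the string.
starts_with_order = re.search(r'^Order', text) is not None
ends_with_lee = re.search(r'Lee$', text) is not None

# [0-9]+ matches one or more digits in a row.
digits = re.findall(r'[0-9]+', text)

# [A-Z][a-z]* matches an uppercase letter and any lowercase letters after it.
capitalized = re.findall(r'[A-Z][a-z]*', text)

# \S+ matches runs of non-whitespace characters.
words = re.findall(r'\S+', text)

# e{2} matches exactly two 'e' characters in a row.
double_e = re.findall(r'e{2}', text)

print(starts_with_order, ends_with_lee)
print(digits, capitalized, double_e)
print(words)
```

The patterns are written as raw strings (r'...') so that backslashes such as the one in \S reach the regular expression engine unchanged.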
The Regular Expression Module
• Before you can use regular expressions in your program, you must import the library using “import re”
• You can use re.search() to see if a string matches a regular expression, similar to using the find() method for strings
• You can use re.findall() to extract portions of a string that match your regular expression, similar to a combination of find() and slicing: var[5:10]
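
As a rough illustration of the comparison in the bullets above, the sketch below does the same extraction twice, once with find() and slicing and once with the re module. The sample string and variable names are assumptions made for this example.

```python
import re

# Hypothetical sample string.
var = 'Temp 21C at noon'

# find() plus slicing: locate the text by position, then cut it out by index.
start = var.find('21')
by_slicing = var[start:start + 2]

# re.search(): does the pattern occur anywhere in the string?
has_digits = re.search(r'[0-9]+', var) is not None

# re.findall(): pull out every substring that matches the pattern.
by_regex = re.findall(r'[0-9]+', var)

print(by_slicing)
print(has_digits)
print(by_regex)
```

With slicing you have to know the text you are looking for and how long it is; with the pattern you only describe its shape.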
Using re.search() Like find()

Without a regular expression:

hand = open('[Link]')
for line in hand:
    line = line.rstrip()
    if line.find('From:') >= 0:
        print(line)

With a regular expression:

import re
hand = open('[Link]')
for line in hand:
    line = line.rstrip()
    if re.search('From:', line):
        print(line)
Using re.search() Like startswith()

Without a regular expression:

hand = open('[Link]')
for line in hand:
    line = line.rstrip()
    if line.startswith('From:'):
        print(line)

With a regular expression:

import re
hand = open('[Link]')
for line in hand:
    line = line.rstrip()
    if re.search('^From:', line):
        print(line)

We fine-tune what is matched by adding special characters to the string
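
The two comparisons above can be run without a mail file. The sketch below uses a short list of made-up lines (an assumption for this example) in place of the file, and checks that each string method and its regular expression counterpart select the same lines.

```python
import re

# Hypothetical lines standing in for the contents of a mail file.
lines = [
    'From: alice@example.org',
    'Subject: Re: From: field is missing',
    'X-DSPAM-Result: Innocent',
]

# Substring test with a string method, and the same test with re.search().
with_find = [ln for ln in lines if ln.find('From:') >= 0]
with_search = [ln for ln in lines if re.search('From:', ln)]

# Prefix test with a string method, and the same test using the ^ anchor.
with_startswith = [ln for ln in lines if ln.startswith('From:')]
with_anchor = [ln for ln in lines if re.search('^From:', ln)]

print(with_search)
print(with_anchor)
```

Adding the single ^ character is all it takes to turn the "anywhere in the line" test into a "start of the line" test.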

Wild-Card Characters
• The dot character matches any character
• If you add the asterisk character, the character is matched “any number of times”

X-Sieve: CMU Sieve 2.3
X-DSPAM-Result: Innocent
X-DSPAM-Confidence: 0.8475
X-Content-Type-Message-Body: text/plain

^X.*:
^    Match the start of the line
.    Match any character
*    Many times
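
To see which part of each line the pattern actually covers, here is a small sketch that applies ^X.*: to the four header lines from the slide and keeps the matched text. The variable names are made up for this example.

```python
import re

# The four header lines shown on the slide.
lines = [
    'X-Sieve: CMU Sieve 2.3',
    'X-DSPAM-Result: Innocent',
    'X-DSPAM-Confidence: 0.8475',
    'X-Content-Type-Message-Body: text/plain',
]

matched = []
for line in lines:
    # re.search() returns a match object, or None when nothing matches.
    m = re.search(r'^X.*:', line)
    if m:
        # group() is the portion of the line covered by the pattern.
        matched.append(m.group())

for piece in matched:
    print(piece)
```

Each of these lines contains a single colon, so the match runs from the leading X up to and including that colon.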
Fine-Tuning Your Match
Depending on how “clean” your data is and the purpose of your application, you may want to narrow your match down a bit

X-Sieve: CMU Sieve 2.3
X-DSPAM-Result: Innocent
X-Plane is behind schedule: two weeks
X-: Very short

^X.*:
^    Match the start of the line
.    Match any character
*    Many times

This pattern matches all four lines, including the last two, which are not the kind of header line we are after.
Fine-Tuning Your Match
X-Sieve: CMU Sieve 2.3
X-DSPAM-Result: Innocent
X-: Very Short
X-Plane is behind schedule: two weeks

^X-\S+:
^    Match the start of the line
\S   Match any non-whitespace character
+    One or more times

This pattern matches only the first two lines.
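
The difference between the loose and the narrowed pattern can be checked with a short sketch like the one below. The list and variable names are assumptions for this example; the sample lines are the ones from the slides.

```python
import re

lines = [
    'X-Sieve: CMU Sieve 2.3',
    'X-DSPAM-Result: Innocent',
    'X-Plane is behind schedule: two weeks',
    'X-: Very short',
]

# Loose pattern: X, then any characters, then a colon.
loose = [ln for ln in lines if re.search(r'^X.*:', ln)]

# Narrowed pattern: X-, then one or more non-whitespace characters, then a colon.
strict = [ln for ln in lines if re.search(r'^X-\S+:', ln)]

print(len(loose), 'lines match the loose pattern')
print(len(strict), 'lines match the narrowed pattern')
```

The "X-Plane" line fails the narrowed pattern because a space appears before its colon, and the "X-:" line fails because nothing sits between the dash and the colon.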
Matching and Extracting Data
• re.search() returns a match object (treated as True) or None (treated as False), depending on whether the string matches the regular expression
• If we actually want the matching strings to be extracted, we use re.findall()

>>> import re
>>> x = 'My 2 favorite numbers are 19 and 42'
>>> y = re.findall('[0-9]+',x)
>>> print(y)
['2', '19', '42']

[0-9]+   One or more digits
Matching and Extracting Data
When we use re.findall(), it returns a list of zero or more sub-strings that match the regular expression

>>> import re
>>> x = 'My 2 favorite numbers are 19 and 42'
>>> y = re.findall('[0-9]+',x)
>>> print(y)
['2', '19', '42']
>>> y = re.findall('[AEIOU]+',x)
>>> print(y)
[]
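
The sketch below puts the two functions side by side on the same string. The extra step of summing the numbers is an addition for this example, meant only to show that the extracted items are strings that still need converting.

```python
import re

x = 'My 2 favorite numbers are 19 and 42'

# re.search() gives a match object for the first match, or None.
first = re.search(r'[0-9]+', x)
nothing = re.search(r'[AEIOU]+', x)

# re.findall() gives a list of every match, possibly empty.
numbers = re.findall(r'[0-9]+', x)
vowels = re.findall(r'[AEIOU]+', x)

# The list holds strings, so convert before doing arithmetic.
total = sum(int(n) for n in numbers)

print(first.group(), nothing)
print(numbers, vowels)
print(total)
```

An empty list from re.findall() plays the same role as None from re.search(): both mean nothing matched.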
Warning: Greedy Matching
The repeat characters (* and +) push outward in both directions (greedy) to match the largest possible string

>>> import re
>>> x = 'From: Using the : character'
>>> y = re.findall('^F.+:', x)
>>> print(y)
['From: Using the :']

^F.+:
F     First character in the match is an F
.+    One or more characters
:     Last character in the match is a :

Why not 'From:'? Because .+ is greedy, the match extends to the last colon in the string.
Non-Greedy Matching
Not all regular expression repeat codes are greedy! If you add a ? character, the + and * chill out a bit...

>>> import re
>>> x = 'From: Using the : character'
>>> y = re.findall('^F.+?:', x)
>>> print(y)
['From:']

^F.+?:
F     First character in the match is an F
.+?   One or more characters but not greedy
:     Last character in the match is a :
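
Here is a small side-by-side sketch of the greedy and non-greedy forms on the same string. The variable names are made up for this example.

```python
import re

x = 'From: Using the : character'

# Greedy: .+ takes as much as it can while still leaving a colon to end on.
greedy = re.findall(r'^F.+:', x)

# Non-greedy: .+? takes as little as it can before the first colon.
non_greedy = re.findall(r'^F.+?:', x)

print(greedy)
print(non_greedy)
```

Both patterns start at the F; they differ only in which colon ends the match.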
Fine-Tuning String Extraction
You can refine the match for re.findall() and separately determine which portion of the match is to be extracted by using parentheses

From [Link]@[Link] Sat Jan 5 09:14:16 2008

words = line.split()
email = words[1]            # '[Link]@[Link]'
pieces = email.split('@')   # ['[Link]', '[Link]']
print(pieces[1])            # '[Link]'
The Regex Version
From [Link]@[Link] Sat Jan 5 09:14:16 2008

import re
lin = 'From [Link]@[Link] Sat Jan 5 09:14:16 2008'
y = re.findall('@([^ ]*)',lin)
print(y)

['[Link]']

'@([^ ]*)'
@      Look through the string until you find an at sign
[^ ]   Match non-blank character
*      Match many of them
( )    Extract the non-blank characters
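
The split-based approach and the parenthesized pattern can be compared in one runnable sketch. The address below is invented for this example (it is not the one from the slides), and the variable names are assumptions.

```python
import re

# Hypothetical mail line; the address is made up for illustration.
lin = 'From alice.smith@mail.example.org Sat Jan 5 09:14:16 2008'

# String-method version: split on spaces, then split the address on '@'.
words = lin.split()
email = words[1]
pieces = email.split('@')
host_by_split = pieces[1]

# Regex version: the parentheses mark the part of the match to return.
host_by_regex = re.findall(r'@([^ ]*)', lin)

print(host_by_split)
print(host_by_regex)
```

Note that re.findall() returns a list, so the regex result is a one-item list rather than a plain string.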

Even Cooler Regex Version
From [Link]@[Link] Sat Jan 5 09:14:16 2008

import re
lin = 'From [Link]@[Link] Sat Jan 5 09:14:16 2008'
y = re.findall('^From .*@([^ ]*)',lin)
print(y)

['[Link]']

'^From .*@([^ ]*)'
^From    Starting at the beginning of the line, look for the string 'From '
.*@      Skip a bunch of characters, looking for an at sign
(        Start extracting
[^ ]     Match non-blank character
*        Match many of them
)        Stop extracting
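
The benefit of anchoring on 'From ' shows up when other lines also contain an at sign. The sketch below uses three invented lines (assumptions for this example) and compares the anchored pattern with the plain one.

```python
import re

# Hypothetical lines; the addresses are made up for illustration.
lines = [
    'From alice.smith@mail.example.org Sat Jan 5 09:14:16 2008',
    'Reply-To: bob@other.example.com',
    'From: alice.smith@mail.example.org',
]

anchored = []
unanchored = []
for ln in lines:
    # Only lines that begin with 'From ' (with a space) contribute here.
    anchored.extend(re.findall(r'^From .*@([^ ]*)', ln))
    # Any line containing an at sign contributes here.
    unanchored.extend(re.findall(r'@([^ ]*)', ln))

print(anchored)
print(unanchored)
```

The third line starts with 'From:' rather than 'From ', so the anchored pattern skips it.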
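Spam Confidence
import re
hand = open('[Link]')
numlist = list()
for line in hand:
    line = line.rstrip()
    # Capture the number that follows the header name
    stuff = re.findall('^X-DSPAM-Confidence: ([0-9.]+)', line)
    # Skip lines that are not X-DSPAM-Confidence headers
    if len(stuff) != 1 : continue
    num = float(stuff[0])
    numlist.append(num)
print('Maximum:', max(numlist))

A line the pattern matches:
X-DSPAM-Confidence: 0.8475

Running the program:
python [Link]
Maximum: 0.9907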
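
The same loop can be tried without a mail file. In this sketch a short list of made-up header lines stands in for the file (an assumption for this example), so the values and the maximum below are not the ones from the slide.

```python
import re

# Hypothetical header lines standing in for the mail file.
lines = [
    'X-DSPAM-Confidence: 0.8475',
    'X-DSPAM-Probability: 0.0000',
    'X-DSPAM-Confidence: 0.6178',
    'Subject: weekly report',
]

numlist = []
for line in lines:
    # The parentheses capture only the number after the header name.
    stuff = re.findall(r'^X-DSPAM-Confidence: ([0-9.]+)', line)
    # Lines without exactly one captured number are skipped.
    if len(stuff) != 1:
        continue
    numlist.append(float(stuff[0]))

print('Values:', numlist)
print('Maximum:', max(numlist))
```

If no line matched, numlist would be empty and max() would raise an error, so a real program may want to check for that case first.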
Escape Character
If you want a special regular expression character to just behave normally (most of the time) you prefix it with '\'. Writing the pattern as a raw string (r'...') keeps Python itself from interpreting the backslash.

>>> import re
>>> x = 'We just received $10.00 for cookies.'
>>> y = re.findall(r'\$[0-9.]+',x)
>>> print(y)
['$10.00']

\$[0-9.]+
\$       A real dollar sign
[0-9.]   A digit or period
+        At least one or more
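
To see what the backslash changes, the sketch below runs the pattern with and without it, and also uses re.escape() from the standard library to add the backslashes to a literal piece of text. The variable names are made up for this example.

```python
import re

x = 'We just received $10.00 for cookies.'

# Escaped: \$ matches a literal dollar sign.
escaped = re.findall(r'\$[0-9.]+', x)

# Unescaped: $ means "end of the line", so nothing can follow it.
unescaped = re.findall(r'$[0-9.]+', x)

# re.escape() adds the backslashes needed to match a piece of text literally.
literal = re.escape('$10.00')
found = re.findall(literal, x)

print(escaped)
print(unescaped)
print(literal, found)
```

re.escape() is handy when the text to match comes from data rather than being typed into the pattern by hand.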
Summary

• Regular expressions are a cryptic but powerful language for matching strings and extracting elements from those strings
• Regular expressions have special characters that indicate intent
Acknowledgements / Contributions
These slides are Copyright 2010- Charles R. Severance (... [Link]) of the University of Michigan School of Information and [Link] and made available under a Creative Commons Attribution 4.0 License. Please maintain this last slide in all copies of the document to comply with the attribution requirements of the license. If you make a change, feel free to add your name and organization to the list of contributors on this page as you republish the materials.

Initial Development: Charles Severance, University of Michigan School of Information

… Insert new Contributors and Translations here

