0% found this document useful (0 votes)
6 views35 pages

Python Data Structures Overview

The document provides an overview of various Python data structures including Stack, Queue, Deque, Set, Dictionary, Linked List, and Binary Tree, detailing their properties, implementations, and time complexities. It explains key concepts such as height, depth, and size of trees, along with examples of how to create and manipulate these structures in Python. Additionally, it highlights the importance of choosing the right data structure for efficiency in programming tasks.

Uploaded by

Ramesh Chaudhary
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
6 views35 pages

Python Data Structures Overview

The document provides an overview of various Python data structures including Stack, Queue, Deque, Set, Dictionary, Linked List, and Binary Tree, detailing their properties, implementations, and time complexities. It explains key concepts such as height, depth, and size of trees, along with examples of how to create and manipulate these structures in Python. Additionally, it highlights the importance of choosing the right data structure for efficiency in programming tasks.

Uploaded by

Ramesh Chaudhary
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

Distilled AI

Back to [Link]

search...

Primers • Python Data Structures and Time Complexities


Python Data Structures
Stack
Queue
Double-ended Queue (Deque)
Set
Dictionary
Linked List
Binary Tree
Concepts
Edge
Path
Height
Depth
Level
Size
Example
Implementation
Create Root
Inserting Into a Tree
Traversing a Tree
Tree Traversal Algorithms
In-order Traversal
Pre-order Traversal
Post-order Traversal
Summary
Heap
heapq for Creating Min-heaps
What If You Want a Max-heap?
Building a Heap
Comparing Across Multiple Fields
Trie
Dictionary
Accessing a Non-existent Key
Graph
Adjacency Matrix
Adjacency List
Generate a Graph Using Dictionary in Python
Generate Graph
Generate the Path from One Node to the Other Node
Generate All the possible Paths from One Node to the Other
Generate the Shortest Path
Python Time Complexities
List
String
[Link]
Set
Back to Top
Dict
Counter
Notes
References
Citation

Python Data Structures

Stack

A stack follows the last in, first out (LIFO) principle. Unlike an array structure, which allows random
access at all the positions, a stack limits the insert (push) and remove (pop) operation to only one end
of the data structure, i.e., to the top. A stack also allows us to get the top element in O(1) time.
Though there isn’t an explicit stack class in Python, we can use a list instead. To follow the LIFO
principle, inserting and removing operations both occur at the tail of the list (which emulates the top of
the stack).
We can use append and pop to add and remove elements off the end in amortized O(1) time
(refer Python Time Complexities). Python lists are implemented as dynamic arrays. Once the array is
full, Python will allocate more memory for a larger array and copy the old elements to it. Since this
reallocation doesn’t happen too often, append will still be O(1) in most cases. Also, we can index into
the list with -1 to get the last element in a list. Basically, the end of the list will be the “top of the
stack”.

stack = [] # Empty stack


[Link](1) # Push 1 onto stack, current stack: [1]
[Link](2) # Push 2 onto stack, current stack: [1,2]
print(stack[-1]) # Print the element at the top, prints "2"
[Link]() # Pop 2 off the stack, current stack: [1]

A wrapper class to emulate a stack using a list is as follows:

class stack:
# by default pass in [] as the initial value
def __init__(self, initialVal=[]):
[Link] = initialVal

# push is to append to the tail of the list


def push(self, elem):
[Link](elem)
return [Link]

# pop is to remove from the tail of the list Back to Top


def pop(self):
return [Link](-1)

def checkStack(self):
print([Link])

def checkTop(self):
print([Link][-1])

Queue
Similar to a stack, a queue also limits the insertion and removal of elements to specific ends of the
data structure. However, unlike a stack, a queue follows the first in, first out (FIFO) principle.
In Python, a queue can also be implemented using a list. In accordance with the FIFO principle, the
insertion operation occurs at the tail of the list, while the deletion operation occurs at the head of the
list.
A wrapper class to emulate a queue using a list is as follows:

class queue:
# by default pass in [] as the initial value
def __init__(self, initialVal=[]):
[Link] = initialVal

# enqueue is to append to the tail of the list


def enqueue(self, elem):
[Link](elem)
return [Link]

# dequeue is to remove from the head of the list


def dequeue(self):
return [Link](0)

def checkQueue(self):
print([Link])

def checkHead(self)
print([Link][0])

def checkTail(self)
print([Link][-1])

Double-ended Queue (Deque)

In the collections package, Python provides deque , a double-ended queue class. With deque , we
can insert and remove from both ends in O(1) time. The deque constructor takes in an iterable – we
can pass in an empty list to create an empty deque.

from collections import deque


queue = deque([])

The main methods are append , appendleft , pop and popleft . As the names suggest,
append and pop will add and remove elements from the end, whereas appendleft and
popleft do the same at the front. In order to emulate a queue, we add from one side and remove
Back to Top
from the other.
from collections import deque

queue = deque([]) # Create an empty deque, current deque: []


[Link](5) # Add 5 to the front of the queue, current deque: [5]
[Link](1) # Add 1 to the front of the queue, current deque: [1,
element = [Link]() # Pop 5 from the end of the queue, current deque: [1]
next_element = [Link]() # Pop 1 from the end of the queue, current deque: []

If you noticed, we can also use deque to emulate a stack: we add and remove from the same side.
On a different note, a common mistake is using lists as queues. Let’s consider two ways we can use
lists as queues. We can add to the front of the list and remove from the end. This requires using
[insert]([Link] at index 0 and
pop .

queue = [2, 3, 4, 5]
[Link](0, 1) # Add 1 at index 0, [1, 2, 3, 4, 5]
element = [Link]() # Pop 5 from the end, [1, 2, 3, 4]

The problem is that inserting an element at the front of a list is O(N) , since all the other elements must
be shifted to the right (Time Complexity).
Let’s say instead we decided to add at the end and remove from the front.

queue = [5, 4, 3, 2]
[Link](1) # Add 1 to the end, [5, 4, 3, 2, 1]
element = [Link](0) # Pop 5 from the front, [4, 3, 2, 1]

The problem still remains, since popping from the front of the list will be O(N) as everything to the
right must shift to the left.
Even though using lists as queues is fine if the lists are relatively small, using them over the more
efficient deque in an interview setting where time complexity is important is not ideal.

Set
A set object is an unordered collection of distinct hashable objects. It’s one of Python’s built-in types
and allows the dynamic adding and removing of elements, iteration, and operations with another set
objects.
Here’s some operations on a set:

# set initialization by passing in a list


myset = set([1, 2, 3, 3, 3])

# set initialization using {}


myset = {1, 2, 3, 3, 3}

# iteration of set
for ele in myset:
print(ele)

# check if ele in set:


print(True if ele in myset else False)

# add an element to set:


Back to Top
[Link](ele)
# remove an element from set
[Link](ele)

# get length of the set


print(len(myset))

Here’s some operations on two sets:

myset1 = {1, 2, 3}
myset2 = {1, 2, 4, 5}

# union
myset = [Link](myset2)

# intersection
myset = [Link](myset2)

# difference
myset = [Link](myset2)

Dictionary
A dictionary contains mapping information of (key, value) pairs.
Given a key, the corresponding value can be found in the dictionary. It’s also one of Python’s built-in
types. The (key, value) pairs can be dynamically added, removed, and modified. A dictionary
also allows iteration through the keys, values, and (key, value) pairs.
Here’s some operations on a dictionary:

# dictionary initialization using {}


mydict = {'a': 1, 'b': 2}

# add new (key,value) pair


mydict['c'] = 3

# modify existing (key,value) pair


mydict['a'] = 5

# remove (key,value) pair


[Link]('a')

# get length of the dictionary


print(len(mydict))

# iteration through keys


for key in [Link]():
print(key)

# iteration through values


for value in [Link]():
print(value)

# iteration through (key,value) pairs

Back to Top
for key, value in [Link]():
print(key,value)

Linked List
A linked list is a linear data structure, with the previous node pointing to the next node. It can be
implemented by defining a ListNode class.
Here’s some operations on a linked list:

class ListNode:
def __init__(self, val):
[Link] = val
[Link] = None

# initiation of linked list


headNode = ListNode(1)
secondNode = ListNode(2)
thirdNode = ListNode(3)

[Link] = secondNode
[Link] = thirdNode

# iterate through the linked list


curNode = headNode
while curNode:
print([Link])
curNode = [Link]

# insert new listnode with value of 5 in between the secondNode and thirdNode
curNode = headNode
while [Link] != 2:
curNode = [Link]
newNode = ListNode(5)
[Link] = [Link]
[Link] = newNode

# remove the listnode with value of 5


curNode = headNode
while [Link] != 5:
curNode = [Link]
[Link] = [Link]

Binary Tree
A tree is a nonlinear data structure where each node has at most two children, which are referred to
as the left child and the right child.

Concepts

There are three important properties of trees: height, depth and level, together with edge and path
and tree (data structure) on wiki also explains them briefly:

Edge
Back to Top
The connection between one node to another is an edge.

An example of edge is shown above between A and B. Basically, an edge is a line between two
nodes, or a node and a leaf.

Path

A path is a sequence of nodes and edges connecting a node with a descendant.

A path starts from a node and ends at another node or a leaf. Please don’t look over the following points:
Back to Top
1. When we talk about a path, it includes all nodes and all edges along the path, not just edges.
2. The direction of a path is strictly from top to bottom and cannot be changed in middle. In the diagram,
we can’t really talk about a path from B to F although B is above F. Also there will be no path starting
from a leaf or from a child node to a parent node.

Height

The height of a node is the number of edges on the longest downward path between that node and a
leaf.

At first, we can see the above wiki definition has a redundant term - downward - inside. As we already
know from previous section, path can only be downward.

When looking at height:


1. Every node has height. So B can have height, so does A, C and D.
2. Leaf cannot have height as there will be no path starting from a leaf.
3. It is the longest path from the node to a leaf. So A’s height is the number of edges of the path to
E, NOT to G. And its height is 3.
The height of the root is 1.

The height of a tree is the number of edges on the longest downward path between the root and a leaf.
Essentially, the height of a tree is the height of its root. Note that if a tree has only one node, then that node
is at the same time the root node and the only leaf node, so the height of the tree is 0. On the other hand, if
the tree has no nodes, it’s height is -1.

Frequently, we may be asked the question: what is the max number of nodes a tree can have if the
height of the tree is h?. Of course the answer is 2h − 1. When h = 1, the number of node inside is 1,
which is just the root; also when a tree has just root, the height of the root is 1. Hence, the two
inductions match.
How about giving a height of 0? Then it means we don’t have any node in the tree; but still we may
have leaf inside. This is why in most languages, the type of a tree can be a leaf alone.
Moreover, when we use 2h − 1 to calculate the max number of nodes, leaves are not taken into
account. Leaf is not Node. It carries no key or data, and acts only like a STOP sign. We need to
remember this when we deal with properties of trees.

Depth

The depth of a node is the number of edges from the node to the tree’s root node.

Back to Top
We don’t care about path any more when depth pops in. We just count how many edges between the
targeting node and the root, ignoring directions. For example, D’s depth is 2.
Recall that when talking about height, we actually imply a baseline located at bottom. For depth, the
baseline is at top which is root level. That’s why we call it depth.
Note that the depth of the root is 0.

The depth of a binary tree is usually used to refer to the height of the tree.

Level

The level of a node is defined by 1 + the number of connections between the node and the root.

Simply put, level is depth plus 1.


The important thing to remember is when talking about level, it starts from 1 and the level Back
of thetoroot
Top is
1. We need to be careful about this when solving problems related to level.
Size

The size of a binary tree is the total number of nodes in that tree.

Example

The height of node 2 is 1 because from 2 there is a path to two leaf nodes (4 and 5), and each of the
two paths is only 1 edge long, so the largest is 1.
The height of node 3 is 2 because from 3 there is a path to only one leaf node (7), and it consists of 2
edges.
The height of the tree is 3 because from the root node 1 there is a path to three leaf nodes (4, 5, and
7), and the path to the 4 and 5 consists of 2 edges, whereas the path to the 7 consists of 3 edges, so
we pick the largest, which is 3.
The size of the tree is 7.
The depth of node 2 is 1; The depth of node 3 is 1; the depth node 6 is 2.
The depth of the binary tree is equal to the height of the tree, so it is 3.

Implementation

A tree represents the nodes connected by edges. It is a non-linear data structure. It has the following
properties −
One node is marked as Root node.
Every node other than the root is associated with one parent node.
Each node can have an arbitrary number of child node.

Create Root

We just create a Node class and add assign a value to the node. This becomes a tree with only a
root node.

class Node:
def __init__(self, data):
[Link] = None
[Link] = None
[Link] = data
Back to Top
def PrintTree(self):
print([Link])

root = Node(10)
[Link]()

When the above code is executed, it produces the following result −

10

Inserting Into a Tree

To insert into a tree we use the same node class created above and add a insert class to it. The
insert class compares the value of the node to the parent node and decides to add it as a left
node or a right node. Finally the PrintTree class is used to print the tree.

class Node:
def __init__(self, data):
[Link] = None
[Link] = None
[Link] = data

def insert(self, data):


# Compare the new value with the parent node
if [Link]:
if data < [Link]:
if [Link] is None:
[Link] = Node(data)
else:
[Link](data)
elif data > [Link]:
if [Link] is None:
[Link] = Node(data)
else:
[Link](data)
else:
[Link] = data

# Print the tree


def PrintTree(self):
if [Link]:
[Link]()
print( [Link]),
if [Link]:
[Link]()

# Use the insert method to add nodes


root = Node(12)
[Link](6)
[Link](14)
[Link](3)
[Link]()

When the above code is executed, it produces the following result − Back to Top
3 6 12 14

Traversing a Tree

A tree can be traversed by deciding on a sequence to visit each node. We can start at a node then
visit the left subtree first and right subtree next, or we can also visit the right subtree first and left
subtree next. Accordingly there are different names for these tree traversal methods.

Tree Traversal Algorithms

Traversal is a process to visit all the nodes of a tree and may print their values too. Because, all nodes
are connected via edges (links) we always start from the root (head) node. That is, we cannot
randomly access a node in a tree. There are three ways which we use to traverse a tree.
In-order Traversal
Pre-order Traversal
Post-order Traversal

In-order Traversal

In this traversal method, the left subtree is visited first, then the root and later the right subtree. We
should always remember that every node may represent a subtree itself.
In the below python program, we use the Node class to create place holders for the root node as well
as the left and right nodes. Then, we create an insert function to add data to the tree. Finally, the In-
order traversal logic is implemented by creating an empty list and adding the left node first followed
by the root or parent node.
At last the left node is added to complete the In-order traversal. Please note that this process is
repeated for each subtree until all the nodes are traversed.

class Node:
def __init__(self, data):
[Link] = None
[Link] = None
[Link] = data

# Insert Node
def insert(self, data):
if [Link]:
if data < [Link]:
if [Link] is None:
[Link] = Node(data)
else:
[Link](data)
else data > [Link]:
if [Link] is None:
[Link] = Node(data)
else:
[Link](data)
else:
[Link] = data

# Print the Tree


def PrintTree(self):
if [Link]:
[Link]()
print([Link]),
if [Link]: Back to Top
[Link]()

# Inorder traversal
# Left -> Root -> Right
def inorderTraversal(self, root):
res = []
if root:
res = [Link]([Link])
[Link]([Link])
res = res + [Link]([Link])
return res

root = Node(27)
[Link](14)
[Link](35)
[Link](10)
[Link](19)
[Link](31)
[Link](42)
print([Link](root))

When the above code is executed, it produces the following result −

[10, 14, 19, 27, 31, 35, 42]

Pre-order Traversal

In this traversal method, the root node is visited first, then the left subtree and finally the right subtree.
In the below python program, we use the Node class to create place holders for the root node as well
as the left and right nodes. Then, we create an insert function to add data to the tree. Finally, the Pre-
order traversal logic is implemented by creating an empty list and adding the root node first followed
by the left node.
At last, the right node is added to complete the pre-order traversal. Please note that, this process is
repeated for each subtree until all the nodes are traversed.

class Node:
def __init__(self, data):
[Link] = None
[Link] = None
[Link] = data

# Insert Node
def insert(self, data):
if [Link]:
if data < [Link]:
if [Link] is None:
[Link] = Node(data)
else:
[Link](data)
elif data > [Link]:
if [Link] is None:
[Link] = Node(data)
else:
[Link](data)
else:
Back to Top
[Link] = data
# Print the Tree
def PrintTree(self):
if [Link]:
[Link]()
print([Link]),
if [Link]:
[Link]()

# Preorder traversal
# Root -> Left ->Right
def PreorderTraversal(self, root):
res = []
if root:
[Link]([Link])
res = res + [Link]([Link])
res = res + [Link]([Link])
return res

root = Node(27)
[Link](14)
[Link](35)
[Link](10)
[Link](19)
[Link](31)
[Link](42)
print([Link](root))

When the above code is executed, it produces the following result −

[27, 14, 10, 19, 35, 31, 42]

Post-order Traversal

In this traversal method, the root node is visited last, hence the name. First, we traverse the left
subtree, then the right subtree and finally the root node.
In the below python program, we use the Node class to create place holders for the root node as well
as the left and right nodes. Then, we create an insert function to add data to the tree. Finally, the Post-
order traversal logic is implemented by creating an empty list and adding the left node first followed
by the right node.
At last the root or parent node is added to complete the Post-order traversal. Please note that, this
process is repeated for each subtree until all the nodes are traversed.

class Node:
def __init__(self, data):
[Link] = None
[Link] = None
[Link] = data

# Insert Node
def insert(self, data):
if [Link]:
if data < [Link]:
if [Link] is None:
[Link] = Node(data)
Back to Top
else:
[Link](data)
else if data > [Link]:
if [Link] is None:
[Link] = Node(data)
else:
[Link](data)
else:
[Link] = data

# Print the Tree


def PrintTree(self):
if [Link]:
[Link]()
print([Link]),
if [Link]:
[Link]()

# Postorder traversal
# Left ->Right -> Root
def PostorderTraversal(self, root):
res = []
if root:
res = [Link]([Link])
res = res + [Link]([Link])
[Link]([Link])
return res

root = Node(27)
[Link](14)
[Link](35)
[Link](10)
[Link](19)
[Link](31)
[Link](42)
print([Link](root))

When the above code is executed, it produces the following result −

[10, 19, 14, 31, 42, 35, 27]

Summary

class TreeNode:
def __init__(self, val):
[Link] = val
[Link] = None
[Link] = None

# initialization of a tree
rootNode = TreeNode(1)
leftNode = TreeNode(2)
rightNode = TreeNode(3)

[Link] = leftNode
[Link] = rightNode
Back to Top
# inorderTraversal of the tree (Left, Root, Right)
def inorderTraversal(root):
result = []

if not root:
return [Link]

def df(node: TreeNode):


if not node:
return
df([Link])
[Link]([Link])
df([Link])

df(root)
return result

# preorderTraversal of the tree (Root, Left, Right)


def preorderTraversal(root):
result = []

if not root:
return result

def df(node: TreeNode):


if not node:
return
[Link]([Link])
df([Link])
df([Link])

df(root)
return result

# postorderTraversal of the tree (Left, Right, Root)


def postorderTraversal(root):
result = []

if not root:
return result

def df(node: TreeNode):


if not node:
return
df([Link])
df([Link])
[Link]([Link])

df(root)
return result

print('inorder: ', inorderTraversal(rootNode))


print('preorder: ', preorderTraversal(rootNode))
print('postorder: ', postorderTraversal(rootNode))

Heap

heapq for Creating Min-heaps Back to Top


Similar to stacks, there is no explicit class to create a heap (or a priority queue) in Python. Instead, we
have heapq , a collection of methods that allows us to use lists as heaps.

import heapq
heap = [] # Create a new 'heap'. it is really just a list

We can add elements to our heap with [Link] and pop elements out of our heap with
[Link] .

[Link](heap, 1) # Push the value 1 into the heap


element = [Link](heap) # Pop the value on top of the heap: 1. The heap is

[Link] and [Link] will maintain the min-heap invariant in the list that we use
as a heap. As an example:

heap = []
numbers = [9,2,4,1,3,11]
k = 3
ksmallest = []

# Build the heap


for number in numbers:
[Link](heap, number)

# Pop k smallest elements


for i in range(k):
[Link]([Link](heap)) # ksmallest is [1,2,3]

print(ksmallest)

If we were to push all the elements in numbers into heap and pop out 3 times, we would get the 3
smallest elements – in this case [1,2,3] . There is no “setting” that tells the heapq functions to
treat our heap as a max-heap: the functions “assume” that the passed in heap is a min-heap and only
provide the expected results if it is.

What If You Want a Max-heap?


Back to Top
As previously stated, we cannot directly use a max-heap with heapq . Instead, we can try to invert
our elements in the min-heap so that the larger elements get popped out first.
For example, say you have a list of numbers, and want the K largest numbers. To begin, we can invert
the numbers by multiplying by -1. The numbers that were the largest are now the smallest. We can
now turn this collection of numbers into a min-heap. When we pop our inverted numbers from the
heap, we simply multiply by -1 again to get our initial number back.

numbers = [4, 1, 24, 2, 1]

# Invert numbers so that the largest values are now the smallest
numbers = [-1 * n for n in numbers]

# Turn numbers into min heap


[Link](numbers)

# Pop out 2 times


k = 2
klargest = []
for i in range(k):
# Multiply by -1 to get our initial number back
[Link](-1 * [Link](numbers))

# K largest will be [24, 4]

Thus, in situations where you need a max-heap, consider how you can reverse the order of the
elements, turning the largest elements into the smallest.

Building a Heap

As shown above, one approach is to make an additional empty list and push the elements into that list
with [Link] . The time complexity of this approach is O(N log N) where N is the number
of elements in the list.
A more efficient approach is to use [Link] . Given a list, this function will swap its elements
in place to make the list a min-heap. Moreover, [Link] only takes O(N) time.

# Approach 1: push new elements into the heap


numbers = [4, 1, 24, 2, 1]
heap = [] # our "heap"
for num in numbers:
[Link](heap, num)
smallest_num = [Link](heap)
# Time complexity: O(nlogn)

# Approach 2: use heapify()


numbers = [4, 1, 24, 2, 1]
[Link](numbers) # numbers can act as our heap now'.
smallest_num = [Link](numbers)
# Time complexity: O(n)

Comparing Across Multiple Fields

Suppose we had the age and height (in inches) of students stored in a list.

# each entry is a tuple (age, height) Back to Top


students = [(12, 54), (18, 78), (18, 62), (17, 67), (16, 67), (21, 76)]

Let’s say we wanted to get the K youngest students, and if two students had the same age, let’s
consider the one who is shorter to be younger. Unfortunately, there is no way for us to use a custom
comparator function to handle cases where two students have the same age. Instead, we can use
tuples to allow the heapq functions to compare multiple fields. To provide an example, consider the
students at indices 0 and 1. Since, student 0’s age, 12, is less than student 1’s, 18, student 0 will be
considered smaller and will be closer to the top of the heap than student 1. Let’s now consider
student 1 and student 2. Since both of their ages are the same, the heapq functions will compare the
next field in the tuple, which is height. Student 2 has a smaller height, so student 2 will be considered
smaller and will be closer to the top.

# Each entry is a tuple (age, height)


students = [(12, 54), (18, 78), (18, 62), (17, 67), (16, 67), (21, 76)]

# Since our student ages and heights are already tuples, no need to pre-process
[Link](students)

k = 4
kyoungest = []
for i in range(k):
[Link]([Link](students))

# K youngest will be [(12, 54), (16, 67), (17, 67), (18, 62)]

If you need to compare against multiple fields to find out which element is smaller or bigger, consider
having the heap’s elements be tuples.

Trie
Let’s understand Trie first. Trie is basically a tree where each node holds a character and can have up
to 26 children in the word add-search context, because we have 26 lower case characters in the
alphabet (from a to z ). So, basically each node represents a single character. Graphically
speaking:

If we insert the word abc in our trie, a will have b as a child, which in turn will have c as a child.
Graphically, this is how it looks like:

Back to Top
Note that in the above example, c denotes the end of the word for the word abc . Now, if we added
another word example, say ab , we don’t add a and b back again since they are already available
to us. We can simply re-use these characters. So, we have two words along this path abc and ab .
Basically, all words that start with a and b would live in this space, that’s what makes it efficient.
Hence, a trie is also called a prefix tree.
Now let’s take an example and build our tree.

Input
["WordDictionary","addWord","addWord","addWord","search","search","search","searc
[[],["bad"],["dad"],["mad"],["pad"],["bad"],[".ad"],["b.."]]

Output
[null,null,null,null,false,true,true,true]

The first word we add is bad , i.e., b -> a -> d .


Next, let’s add another word dad . Given that we don’t share the same prefix as the earlier word
bad ( bad starts with b , while dad starts with d ), we have to start with a different path. So, let’s
add dad as d -> a -> d .
We have one last word before we start searching, say mad . So, we don’t have m , then let’s add it:
m -> a -> d

Back to Top
So far we have three word’s and all of them end with a different d . But all three of them have different
prefixes that’s why they are along different path’s:

Now let’s look at the searching path:


The first word we gonna search for is pad . We start at the root and and begin looking for the
first character in the word, i.e., p . We immediately return False as p doesn’t even exist in
our input.
Now, let’s search for another word bad . We start at the root and see there are any nodes with
b . Yes, we have b . Now let’s check does b have a child a , which it does. For the last
character d , let’s check if a has a child d , and surely it does! Lastly we have to say the last
character d is in our Trie, which designated as the end of the word (marked red in the diagram
below). Therefore, we return True for the input bad .
Let’s search for .ad next. The dot . character matches any character. We start at the root
and go through all possible paths using a DFS or backtracking approach. So, let’s say we start
with going down b as the first node. Check if b has a child a , which it does. Now, check if
a has a child d , which it also does. Lastly, note that the last character d in our Trie is
designated as the end of the word. Therefore, we return True for this input .ad .
Let’s try one last search, looking for b.. . So, we start at the root and see there are any nodes
with b . Yes, we have b . Now, check if we have a character under b to match . , which we
have as a . Now, we’re looking for a character for our last dot . . Yes we have d and it is the
end of the word. Therefore, we return True for this input b.. . Back to Top
Here’s a visual demonstration for the overall process:

Dictionary

Dictionaries are the Python equivalent to Hash Maps and are an overpowered data structure. In most
cases, we can look up, add, delete and update keys in O(1) time. Let’s go over the basics of
dictionaries.

# empty dict
d = {}
# dict with some keys and values
d = {
"name": "James",
"age": 29
5: 29.3
}

When declaring a dictionary, we put the keys on the left and values on the right, separated by a colon.
In Python, the keys of a dictionary can only be hashable values, like int, str, tuple, and more (Hashable
Objects). In addition, a dictionary can have keys and values of multiple types as shown above.
We can also add, update and delete keys with [] .

d["numbers"] = [1, 2, 4, 12] # add key "numbers" to d


d["numbers"] = [1, 3, 6, 18] # update key "numbers" of d
Back to Top
del d["numbers"] # delete key "numbers" of d

We can also easily check if a key is in a dictionary with in .

doesNumbersExist = "numbers" in d

Python also provides a nice way of iterating over the keys inside a dictionary.

d = {"a": 0, "b": 5, "c": 6, "d": 7, "e": 11, "f": 19}

# iterate over each key in d


for key in d:
print(d[key])

When you are iterating over keys in this manner, remember that you cannot add new keys or delete
any existing keys, as this will result in an error.

Accessing a Non-existent Key

d = {"a": 0, "b": 7}
c_count = d["c"] # KeyError: 'c'

In the above snippet, c does not exist. We will get a KeyError when we run the code. As a result, in
many situations, we need to check if the key exists in a dictionary before we try to access it.
Although this check is simple, it can be a hassle in an interview if we need to do it many times.
Alternatively, if we use [Link], there is no need for this check. defaultdict
operates in the same way as a dictionary in nearly all aspects except for handling missing keys. When
accessing keys that don’t exist, defaultdict automatically sets a default value for them. The factory
function to create the default value can be passed in the constructor of defaultdict .

from collections import defaultdict

# create an empty defaultdict with default value equal to []


default_list_d = defaultdict(list)
default_list_d['a'].append(1) # default_list_d is now {"a": [1]}

# create a defaultdict with default value equal to 0


default_int_d = defaultdict(int)
default_int_d['c'] += 1 # default_int_d is now {"c": 1}

We can also pass in a lambda as the factory function to return custom default values. Let’s say for our
default value we return the tuple (0, 0).

pair_dict = defaultdict(lambda: (0,0))


print(pair_dict['point'] == (0,0)) # prints True

Using a defaultdict can help reduce the clutter in your code, speeding up your implementation.
Back to Top
Graph
A graph is a data structure that consists of the following two components:
1. A finite set of vertices/nodes.
2. A finite set of an ordered pair of the form (u, v) called as edges/arcs. The pair is ordered
because (u, v) is not the same as (v, u) in case of a directed graph (di-graph). The pair
of the form (u, v) indicates that there is an edge from vertex u to vertex v. The edges may
contain weight/value/cost.
Graphs are used to represent many real-life applications. Graphs are used to represent networks. The
networks may include paths in a city, a telephone network, a circuit network etc. Graphs are also used
in social networks such as LinkedIn, Facebook. For example, in Facebook, each person is
represented with a vertex (or node). Each node is a structure and contains information such as the
person’s ID, name, gender, and locale. See this for more applications of graph.
The following is an example of an undirected graph with five vertices:

The following two are the most commonly used representations of a graph:
1. Adjacency Matrix
2. Adjacency List
There are other representations also such as the Incidence Matrix and Incidence List. The choice of
graph representation is situation-specific and depends on the type of operations to be performed and
ease of use of each graph representation type.

Adjacency Matrix

An adjacency matrix is a 2D array of size V × V where V is the number of vertices in a graph. Let
the 2D array be adj[][] , a slot adj[i][j] = 1 indicates that there is an edge from vertex i to
vertex j . Adjacency matrix for an undirected graph is always symmetric. Adjacency matrix is also
used to represent weighted graphs. If adj[i][j] = w , then there is an edge from vertex i to
vertex j with weight w .
The adjacency matrix for the above example graph is:

Back to Top
Pros:
Representation is easier to implement and follow.
Removing an edge takes O(1) time. Queries such as checking whether there is an edge from
vertex u to vertex v are efficient and can be done O(1).
Cons:
Inefficient in terms of the space requirements, since it consumes O(V 2 ) space. Even if the
graph is sparse (i.e., it contains less number of edges), it still consumes the same space.
Adding a vertex takes O(V 2 ) time. Computing all neighbors of a vertex takes O(V) time (thus,
inefficient).
Please see this for a sample Python implementation of adjacency matrix.

Adjacency List

An array of lists is used. The size of the array is equal to the number of vertices. Let the array be an
array[] . An entry array[i] represents the list of vertices adjacent to the ith vertex. This
representation can also be used to represent a weighted graph. The weights of edges can be
represented as lists of pairs. Following is the adjacency list representation of the above graph:

Here’s a Python implementation using a list (which is internally implemented using a dynamic array) to
represent adjacency lists instead of the linked list. The dynamic array implementation of lists offers the
distinct advantage of cache friendliness (for sequential accesses) compared to linked lists.
Back to Top
"""
A Python program to demonstrate the adjacency
list representation of the graph
"""

# A class to represent the adjacency list of the node

class AdjNode:
def __init__(self, data):
[Link] = data
[Link] = None

# A class to represent a graph. A graph


# is the list of the adjacency lists.
# Size of the array will be the no. of the
# vertices "V"
class Graph:
def __init__(self, vertices):
self.V = vertices
[Link] = [None] * self.V

# Function to add an edge in an undirected graph


def add_edge(self, src, dest):
# Adding the node to the source node
node = AdjNode(dest)
[Link] = [Link][src]
[Link][src] = node

# Adding the source node to the destination as


# it is the undirected graph
node = AdjNode(src)
[Link] = [Link][dest]
[Link][dest] = node

# Function to print the graph


def print_graph(self):
for i in range(self.V):
print("Adjacency list of vertex {}\n head".format(i), end="")
temp = [Link][i]
while temp:
print(" -> {}".format([Link]), end="")
temp = [Link]
print(" \n")

# Driver program to the above graph class


if __name__ == "__main__":
V = 5
graph = Graph(V)
graph.add_edge(0, 1)
graph.add_edge(0, 4)
graph.add_edge(1, 2)
graph.add_edge(1, 3)
graph.add_edge(1, 4)
graph.add_edge(2, 3)
graph.add_edge(3, 4)

Back to Top
graph.print_graph()

which outputs:

Adjacency list of vertex 0


head -> 1-> 4

Adjacency list of vertex 1


head -> 0-> 2-> 3-> 4

Adjacency list of vertex 2


head -> 1-> 3

Adjacency list of vertex 3


head -> 1-> 2-> 4

Adjacency list of vertex 4


head -> 0-> 1-> 3

Pros:
Saves space since it needs O(‖V‖ + ‖E‖) space. In the worst case, there can be C(V, 2)
number of edges in a graph thus consuming O(V 2 ) space. Adding a vertex is easier than an
adjacency matrix. Computing all neighbors of a vertex takes optimal time.
Cons:
Queries like whether there is an edge from vertex u to vertex v are inefficient and need O(V)
time.

2
In real-life problems, graphs are sparse (i.e., |E| << |V | ). That’s why
^2$$ bound on
adjacency lists are commonly used for storing graphs since an adjacency matrix V
time complexity.
enforces a $$

For more, refer Wikipedia: Graph (abstract data type).

Generate a Graph Using Dictionary in Python

Let’s see how to implement a graph in python using the dictionary data structure in python. The keys
of the dictionary are the nodes/vertices of our graph and the corresponding values are lists that
contain nodes, each of which are connected to the original node by an edge/arc.
The keys of the dictionary used are the nodes of our graph and the corresponding values are lists with
each nodes, which are connecting by an edge. This simple graph has six nodes ( a-f ) and five arcs:

a -> c
b -> c
b -> e
c -> a
c -> b
c -> d
c -> e
d -> c
e -> c
e -> b

Back to Top
It can be represented by the following Python data structure. This is a dictionary whose keys are the
nodes of the graph. For each key, the corresponding value is a list containing the nodes that are
connected by a direct arc from this node.

graph = { "a" : ["c"],


"b" : ["c", "e"],
"c" : ["a", "b", "d", "e"],
"d" : ["c"],
"e" : ["c", "b"],
"f" : []
}

Graphical representation of above example:

defaultdict : Usually, a Python dictionary throws a KeyError if you try to get an item with a key
that is not currently in the dictionary. defaultdict allows that if a key is not found in the dictionary,
then instead of a KeyError being thrown, a new entry is created. The type of this new entry is given by
the argument of defaultdict .

Generate Graph

# Python program for


# validation of a graph

# import dictionary for graph


from collections import defaultdict

# function for adding edge to graph


graph = defaultdict(list)
def addEdge(graph,u,v):
graph[u].append(v)

# definition of function
Back to Top
def generate_edges(graph):
edges = []

# for each node in graph


for node in graph:

# for each neighbour node of a single node


for neighbour in graph[node]:

# if edge exists then append


[Link]((node, neighbour))
return edges

# declaration of graph as dictionary


addEdge(graph,'a','c')
addEdge(graph,'b','c')
addEdge(graph,'b','e')
addEdge(graph,'c','d')
addEdge(graph,'c','e')
addEdge(graph,'c','a')
addEdge(graph,'c','b')
addEdge(graph,'e','b')
addEdge(graph,'d','c')
addEdge(graph,'e','c')

# Driver Function call


# to print generated graph
print(generate_edges(graph))

which outputs:

[('a', 'c'), ('c', 'd'), ('c', 'e'), ('c', 'a'), ('c', 'b'),
('b', 'c'), ('b', 'e'), ('e', 'b'), ('e', 'c'), ('d', 'c')]

Note that since we have taken example of an undirected graph, we print the same edge twice say as
(‘a’,’c’) and (‘c’,’a’) . This issue can be fixed using a directed graph.

Generate the Path from One Node to the Other Node

Using Python dictionary, we can find the path from one node to the other in a Graph. The idea is
similar to DFS in graphs.
In the function, initially, the path is an empty list. In the starting, if the start node matches with the end
node, the function will return the path. Otherwise the code goes forward and hits all the values of the
starting node and searches for the path using recursion.

# Python program to generate the first


# path of the graph from the nodes provided

graph = {
'a': ['c'],
'b': ['d'],
'c': ['e'],
'd': ['a', 'd'],
'e': ['b', 'c']
}

# function to find path Back to Top


def find_path(graph, start, end, path=[]):
path = path + [start]
if start == end:
return path
for node in graph[start]:
if node not in path:
newpath = find_path(graph, node, end, path)
if newpath:
return newpath

# Driver function call to print the path


print(find_path(graph, 'd', 'c'))

which outputs:

['d', 'a', 'c']

Generate All the possible Paths from One Node to the Other

Let’s generate all the possible paths from the start node to the end node. The basic functioning works
same as the functioning of the above code. The place where the difference comes is instead of
instantly returning the first path, it saves that path in a list named as paths in the example below.
Finally, after iterating over all the possible ways, it returns the list of paths. If there is no path from the
starting node to the ending node, it returns None .

# Python program to generate the all possible


# path of the graph from the nodes provided
graph ={
'a':['c'],
'b':['d'],
'c':['e'],
'd':['a', 'd'],
'e':['b', 'c']
}

# function to generate all possible paths


def find_all_paths(graph, start, end, path =[]):
path = path + [start]
if start == end:
return [path]
paths = []
for node in graph[start]:
if node not in path:
newpaths = find_all_paths(graph, node, end, path)
for newpath in newpaths:
[Link](newpath)
return paths

# Driver function call to print all


# generated paths
print(find_all_paths(graph, 'd', 'c'))

which outputs:
Back to Top
[['d', 'a', 'c'], ['d', 'a', 'c']]

Generate the Shortest Path

To get to the shortest from all the paths, we use a little different approach as shown below. In this, as
we get the path from the start node to the end node, we compare the length of the path with a
variable named as shortest which is initialized with the None value.
If the length of generated path is less than the length of shortest, if shortest is not None , the newly
generated path is set as the value of shortest. Again, if there is no path, it returns None .

# Python program to generate shortest path

graph ={
'a':['c'],
'b':['d'],
'c':['e'],
'd':['a', 'd'],
'e':['b', 'c']
}

# function to find the shortest path


def find_shortest_path(graph, start, end, path =[]):
path = path + [start]
if start == end:
return path
shortest = None
for node in graph[start]:
if node not in path:
newpath = find_shortest_path(graph, node, end, path)
if newpath:
if not shortest or len(newpath) < len(shortest):
shortest = newpath
return shortest

# Driver function call to print


# the shortest path
print(find_shortest_path(graph, 'd', 'c'))

which outputs:

['d', 'a', 'c']

Python Time Complexities


This page documents the time-complexity (aka “Big O” or “Big Oh”) of various operations in current
CPython. Other Python implementations (or older or still-under development versions of CPython) may
have slightly different performance characteristics. However, it is generally safe to assume that they
are not slower by more than a factor of O(logn) .
Generally, n is the number of elements currently in the container. k is either the value of a parameter
or the number of elements in the parameter.

Back to Top
List
The average case assumes parameters generated uniformly at random.
Internally, a list is represented as an array; the largest costs come from growing beyond the current
allocation size (because everything must move), or from inserting or deleting somewhere near the
beginning (because everything after that must move). Note that slicing a list is O(k) where k is the
length of the slice (since slicing is just “copying part of the list” so the time complexity is the same as
a copy). If you need to add/remove at both ends, consider using a [Link] instead.

String
Note that since strings are immutable, a new string is created whenever an operation on the input
string (say, [Link]() ) is performed. Further, similar to lists, slicing a string is O(k) where k is
the length of the slice (since slicing is just “copying part of the string” so the time complexity is the
same as a copy). A string is also an iterable, so the time complexities are for other usual operations
are similar to that of a list.

[Link]

A deque (double-ended queue) is represented internally as a doubly linked list. (Well, a list of arrays
rather than objects, for greater efficiency.) Both ends are accessible, but even looking at the middle is
slow, and adding to or removing from the middle is slower still.

Back to Top
Set
See dict – the implementation is intentionally very similar.

As seen in the source code the complexities for set difference s-t or [Link](t)
( set_difference() ) and in-place set difference s.difference_update(t)
( set_difference_update_internal() ) are different! The first one is O(len(s)) (for every element
in s add it to the new set, if not in t ). The second one is O(len(t)) (for every element in t remove
it from s ). So care must be taken as to which is preferred, depending on which one is the longest set
and whether a new set is needed.
To perform set operations like s-t , both s and t need to be sets. However you can do the
method equivalents even if t is any iterable, for example [Link](l) , where l is a list.

Dict
The average case times listed for dict objects assume that the hash function for the objects is
sufficiently robust to make collisions uncommon. The average case assumes the keys used in
parameters are selected uniformly at random from the set of all keys.
Note that there is a fast-path for dicts that (in practice) only deal with str keys; this doesn’t affect the
algorithmic complexity, but it can significantly affect the constant factors: how quickly a typical
program finishes.

Back to Top
Counter
[Link] is just a subclass of dict . Constructing it is O(n) , because it has to
iterate over the input, but operations on individual elements remain O(1) .

Notes
[1] = These operations rely on the “Amortized” part of “Amortized Worst Case”. Individual actions may
take surprisingly long, depending on the history of the container.
[2] = Popping the intermediate element at index k from a list of size n shifts all elements after k by one
slot to the left using memmove . n − k elements have to be moved, so the operation is O(n − k) . The
best case is popping the second to last element, which necessitates one move, the worst case is
popping the first element, which involves n − 1 moves. The average case for an average value of k is
popping the element the middle of the list, which takes O(n/2) = O(n) operations.
[3] = For these operations, the worst case n is the maximum size the container ever achieved, rather
than just the current size. For example, if N objects are added to a dictionary, then N − 1 are deleted,
the dictionary will still be sized for N objects (at least) until another insertion is made.

References
Python Implementations of Data Structures
Python Dictionary Tips
What are the differences between type() and isinstance()?
Quora: What is the height, size, and depth of a binary tree?
What is the time complexity of [Link]() in Python?
Python Time Complexity
Big-O of list slicing
What is the time complexity of slicing a list?
Graph and its representations
Generate a graph using Dictionary in Python

Citation
If you found our work useful, please cite it as:

@article{Chadha2020DistilledPythonDS,
title = {Python Data Structures and Time Complexities},
author = {Chadha, Aman},
journal = {Distilled AI},
Back to Top
year = {2020},
note = {\url{[Link]
}

| | | |

[Link]

Back to Top

You might also like