Python Data Structures Overview
Python Data Structures Overview
Back to [Link]
search...
Stack
A stack follows the last in, first out (LIFO) principle. Unlike an array structure, which allows random
access at all the positions, a stack limits the insert (push) and remove (pop) operation to only one end
of the data structure, i.e., to the top. A stack also allows us to get the top element in O(1) time.
Though there isn’t an explicit stack class in Python, we can use a list instead. To follow the LIFO
principle, inserting and removing operations both occur at the tail of the list (which emulates the top of
the stack).
We can use append and pop to add and remove elements off the end in amortized O(1) time
(refer Python Time Complexities). Python lists are implemented as dynamic arrays. Once the array is
full, Python will allocate more memory for a larger array and copy the old elements to it. Since this
reallocation doesn’t happen too often, append will still be O(1) in most cases. Also, we can index into
the list with -1 to get the last element in a list. Basically, the end of the list will be the “top of the
stack”.
class stack:
# by default pass in [] as the initial value
def __init__(self, initialVal=[]):
[Link] = initialVal
def checkStack(self):
print([Link])
def checkTop(self):
print([Link][-1])
Queue
Similar to a stack, a queue also limits the insertion and removal of elements to specific ends of the
data structure. However, unlike a stack, a queue follows the first in, first out (FIFO) principle.
In Python, a queue can also be implemented using a list. In accordance with the FIFO principle, the
insertion operation occurs at the tail of the list, while the deletion operation occurs at the head of the
list.
A wrapper class to emulate a queue using a list is as follows:
class queue:
# by default pass in [] as the initial value
def __init__(self, initialVal=[]):
[Link] = initialVal
def checkQueue(self):
print([Link])
def checkHead(self)
print([Link][0])
def checkTail(self)
print([Link][-1])
In the collections package, Python provides deque , a double-ended queue class. With deque , we
can insert and remove from both ends in O(1) time. The deque constructor takes in an iterable – we
can pass in an empty list to create an empty deque.
The main methods are append , appendleft , pop and popleft . As the names suggest,
append and pop will add and remove elements from the end, whereas appendleft and
popleft do the same at the front. In order to emulate a queue, we add from one side and remove
Back to Top
from the other.
from collections import deque
If you noticed, we can also use deque to emulate a stack: we add and remove from the same side.
On a different note, a common mistake is using lists as queues. Let’s consider two ways we can use
lists as queues. We can add to the front of the list and remove from the end. This requires using
[insert]([Link] at index 0 and
pop .
queue = [2, 3, 4, 5]
[Link](0, 1) # Add 1 at index 0, [1, 2, 3, 4, 5]
element = [Link]() # Pop 5 from the end, [1, 2, 3, 4]
The problem is that inserting an element at the front of a list is O(N) , since all the other elements must
be shifted to the right (Time Complexity).
Let’s say instead we decided to add at the end and remove from the front.
queue = [5, 4, 3, 2]
[Link](1) # Add 1 to the end, [5, 4, 3, 2, 1]
element = [Link](0) # Pop 5 from the front, [4, 3, 2, 1]
The problem still remains, since popping from the front of the list will be O(N) as everything to the
right must shift to the left.
Even though using lists as queues is fine if the lists are relatively small, using them over the more
efficient deque in an interview setting where time complexity is important is not ideal.
Set
A set object is an unordered collection of distinct hashable objects. It’s one of Python’s built-in types
and allows the dynamic adding and removing of elements, iteration, and operations with another set
objects.
Here’s some operations on a set:
# iteration of set
for ele in myset:
print(ele)
myset1 = {1, 2, 3}
myset2 = {1, 2, 4, 5}
# union
myset = [Link](myset2)
# intersection
myset = [Link](myset2)
# difference
myset = [Link](myset2)
Dictionary
A dictionary contains mapping information of (key, value) pairs.
Given a key, the corresponding value can be found in the dictionary. It’s also one of Python’s built-in
types. The (key, value) pairs can be dynamically added, removed, and modified. A dictionary
also allows iteration through the keys, values, and (key, value) pairs.
Here’s some operations on a dictionary:
Back to Top
for key, value in [Link]():
print(key,value)
Linked List
A linked list is a linear data structure, with the previous node pointing to the next node. It can be
implemented by defining a ListNode class.
Here’s some operations on a linked list:
class ListNode:
def __init__(self, val):
[Link] = val
[Link] = None
[Link] = secondNode
[Link] = thirdNode
# insert new listnode with value of 5 in between the secondNode and thirdNode
curNode = headNode
while [Link] != 2:
curNode = [Link]
newNode = ListNode(5)
[Link] = [Link]
[Link] = newNode
Binary Tree
A tree is a nonlinear data structure where each node has at most two children, which are referred to
as the left child and the right child.
Concepts
There are three important properties of trees: height, depth and level, together with edge and path
and tree (data structure) on wiki also explains them briefly:
Edge
Back to Top
The connection between one node to another is an edge.
An example of edge is shown above between A and B. Basically, an edge is a line between two
nodes, or a node and a leaf.
Path
A path starts from a node and ends at another node or a leaf. Please don’t look over the following points:
Back to Top
1. When we talk about a path, it includes all nodes and all edges along the path, not just edges.
2. The direction of a path is strictly from top to bottom and cannot be changed in middle. In the diagram,
we can’t really talk about a path from B to F although B is above F. Also there will be no path starting
from a leaf or from a child node to a parent node.
Height
The height of a node is the number of edges on the longest downward path between that node and a
leaf.
At first, we can see the above wiki definition has a redundant term - downward - inside. As we already
know from previous section, path can only be downward.
The height of a tree is the number of edges on the longest downward path between the root and a leaf.
Essentially, the height of a tree is the height of its root. Note that if a tree has only one node, then that node
is at the same time the root node and the only leaf node, so the height of the tree is 0. On the other hand, if
the tree has no nodes, it’s height is -1.
Frequently, we may be asked the question: what is the max number of nodes a tree can have if the
height of the tree is h?. Of course the answer is 2h − 1. When h = 1, the number of node inside is 1,
which is just the root; also when a tree has just root, the height of the root is 1. Hence, the two
inductions match.
How about giving a height of 0? Then it means we don’t have any node in the tree; but still we may
have leaf inside. This is why in most languages, the type of a tree can be a leaf alone.
Moreover, when we use 2h − 1 to calculate the max number of nodes, leaves are not taken into
account. Leaf is not Node. It carries no key or data, and acts only like a STOP sign. We need to
remember this when we deal with properties of trees.
Depth
The depth of a node is the number of edges from the node to the tree’s root node.
Back to Top
We don’t care about path any more when depth pops in. We just count how many edges between the
targeting node and the root, ignoring directions. For example, D’s depth is 2.
Recall that when talking about height, we actually imply a baseline located at bottom. For depth, the
baseline is at top which is root level. That’s why we call it depth.
Note that the depth of the root is 0.
The depth of a binary tree is usually used to refer to the height of the tree.
Level
The level of a node is defined by 1 + the number of connections between the node and the root.
The size of a binary tree is the total number of nodes in that tree.
Example
The height of node 2 is 1 because from 2 there is a path to two leaf nodes (4 and 5), and each of the
two paths is only 1 edge long, so the largest is 1.
The height of node 3 is 2 because from 3 there is a path to only one leaf node (7), and it consists of 2
edges.
The height of the tree is 3 because from the root node 1 there is a path to three leaf nodes (4, 5, and
7), and the path to the 4 and 5 consists of 2 edges, whereas the path to the 7 consists of 3 edges, so
we pick the largest, which is 3.
The size of the tree is 7.
The depth of node 2 is 1; The depth of node 3 is 1; the depth node 6 is 2.
The depth of the binary tree is equal to the height of the tree, so it is 3.
Implementation
A tree represents the nodes connected by edges. It is a non-linear data structure. It has the following
properties −
One node is marked as Root node.
Every node other than the root is associated with one parent node.
Each node can have an arbitrary number of child node.
Create Root
We just create a Node class and add assign a value to the node. This becomes a tree with only a
root node.
class Node:
def __init__(self, data):
[Link] = None
[Link] = None
[Link] = data
Back to Top
def PrintTree(self):
print([Link])
root = Node(10)
[Link]()
10
To insert into a tree we use the same node class created above and add a insert class to it. The
insert class compares the value of the node to the parent node and decides to add it as a left
node or a right node. Finally the PrintTree class is used to print the tree.
class Node:
def __init__(self, data):
[Link] = None
[Link] = None
[Link] = data
When the above code is executed, it produces the following result − Back to Top
3 6 12 14
Traversing a Tree
A tree can be traversed by deciding on a sequence to visit each node. We can start at a node then
visit the left subtree first and right subtree next, or we can also visit the right subtree first and left
subtree next. Accordingly there are different names for these tree traversal methods.
Traversal is a process to visit all the nodes of a tree and may print their values too. Because, all nodes
are connected via edges (links) we always start from the root (head) node. That is, we cannot
randomly access a node in a tree. There are three ways which we use to traverse a tree.
In-order Traversal
Pre-order Traversal
Post-order Traversal
In-order Traversal
In this traversal method, the left subtree is visited first, then the root and later the right subtree. We
should always remember that every node may represent a subtree itself.
In the below python program, we use the Node class to create place holders for the root node as well
as the left and right nodes. Then, we create an insert function to add data to the tree. Finally, the In-
order traversal logic is implemented by creating an empty list and adding the left node first followed
by the root or parent node.
At last the left node is added to complete the In-order traversal. Please note that this process is
repeated for each subtree until all the nodes are traversed.
class Node:
def __init__(self, data):
[Link] = None
[Link] = None
[Link] = data
# Insert Node
def insert(self, data):
if [Link]:
if data < [Link]:
if [Link] is None:
[Link] = Node(data)
else:
[Link](data)
else data > [Link]:
if [Link] is None:
[Link] = Node(data)
else:
[Link](data)
else:
[Link] = data
# Inorder traversal
# Left -> Root -> Right
def inorderTraversal(self, root):
res = []
if root:
res = [Link]([Link])
[Link]([Link])
res = res + [Link]([Link])
return res
root = Node(27)
[Link](14)
[Link](35)
[Link](10)
[Link](19)
[Link](31)
[Link](42)
print([Link](root))
Pre-order Traversal
In this traversal method, the root node is visited first, then the left subtree and finally the right subtree.
In the below python program, we use the Node class to create place holders for the root node as well
as the left and right nodes. Then, we create an insert function to add data to the tree. Finally, the Pre-
order traversal logic is implemented by creating an empty list and adding the root node first followed
by the left node.
At last, the right node is added to complete the pre-order traversal. Please note that, this process is
repeated for each subtree until all the nodes are traversed.
class Node:
def __init__(self, data):
[Link] = None
[Link] = None
[Link] = data
# Insert Node
def insert(self, data):
if [Link]:
if data < [Link]:
if [Link] is None:
[Link] = Node(data)
else:
[Link](data)
elif data > [Link]:
if [Link] is None:
[Link] = Node(data)
else:
[Link](data)
else:
Back to Top
[Link] = data
# Print the Tree
def PrintTree(self):
if [Link]:
[Link]()
print([Link]),
if [Link]:
[Link]()
# Preorder traversal
# Root -> Left ->Right
def PreorderTraversal(self, root):
res = []
if root:
[Link]([Link])
res = res + [Link]([Link])
res = res + [Link]([Link])
return res
root = Node(27)
[Link](14)
[Link](35)
[Link](10)
[Link](19)
[Link](31)
[Link](42)
print([Link](root))
Post-order Traversal
In this traversal method, the root node is visited last, hence the name. First, we traverse the left
subtree, then the right subtree and finally the root node.
In the below python program, we use the Node class to create place holders for the root node as well
as the left and right nodes. Then, we create an insert function to add data to the tree. Finally, the Post-
order traversal logic is implemented by creating an empty list and adding the left node first followed
by the right node.
At last the root or parent node is added to complete the Post-order traversal. Please note that, this
process is repeated for each subtree until all the nodes are traversed.
class Node:
def __init__(self, data):
[Link] = None
[Link] = None
[Link] = data
# Insert Node
def insert(self, data):
if [Link]:
if data < [Link]:
if [Link] is None:
[Link] = Node(data)
Back to Top
else:
[Link](data)
else if data > [Link]:
if [Link] is None:
[Link] = Node(data)
else:
[Link](data)
else:
[Link] = data
# Postorder traversal
# Left ->Right -> Root
def PostorderTraversal(self, root):
res = []
if root:
res = [Link]([Link])
res = res + [Link]([Link])
[Link]([Link])
return res
root = Node(27)
[Link](14)
[Link](35)
[Link](10)
[Link](19)
[Link](31)
[Link](42)
print([Link](root))
Summary
class TreeNode:
def __init__(self, val):
[Link] = val
[Link] = None
[Link] = None
# initialization of a tree
rootNode = TreeNode(1)
leftNode = TreeNode(2)
rightNode = TreeNode(3)
[Link] = leftNode
[Link] = rightNode
Back to Top
# inorderTraversal of the tree (Left, Root, Right)
def inorderTraversal(root):
result = []
if not root:
return [Link]
df(root)
return result
if not root:
return result
df(root)
return result
if not root:
return result
df(root)
return result
Heap
import heapq
heap = [] # Create a new 'heap'. it is really just a list
We can add elements to our heap with [Link] and pop elements out of our heap with
[Link] .
[Link] and [Link] will maintain the min-heap invariant in the list that we use
as a heap. As an example:
heap = []
numbers = [9,2,4,1,3,11]
k = 3
ksmallest = []
print(ksmallest)
If we were to push all the elements in numbers into heap and pop out 3 times, we would get the 3
smallest elements – in this case [1,2,3] . There is no “setting” that tells the heapq functions to
treat our heap as a max-heap: the functions “assume” that the passed in heap is a min-heap and only
provide the expected results if it is.
# Invert numbers so that the largest values are now the smallest
numbers = [-1 * n for n in numbers]
Thus, in situations where you need a max-heap, consider how you can reverse the order of the
elements, turning the largest elements into the smallest.
Building a Heap
As shown above, one approach is to make an additional empty list and push the elements into that list
with [Link] . The time complexity of this approach is O(N log N) where N is the number
of elements in the list.
A more efficient approach is to use [Link] . Given a list, this function will swap its elements
in place to make the list a min-heap. Moreover, [Link] only takes O(N) time.
Suppose we had the age and height (in inches) of students stored in a list.
Let’s say we wanted to get the K youngest students, and if two students had the same age, let’s
consider the one who is shorter to be younger. Unfortunately, there is no way for us to use a custom
comparator function to handle cases where two students have the same age. Instead, we can use
tuples to allow the heapq functions to compare multiple fields. To provide an example, consider the
students at indices 0 and 1. Since, student 0’s age, 12, is less than student 1’s, 18, student 0 will be
considered smaller and will be closer to the top of the heap than student 1. Let’s now consider
student 1 and student 2. Since both of their ages are the same, the heapq functions will compare the
next field in the tuple, which is height. Student 2 has a smaller height, so student 2 will be considered
smaller and will be closer to the top.
# Since our student ages and heights are already tuples, no need to pre-process
[Link](students)
k = 4
kyoungest = []
for i in range(k):
[Link]([Link](students))
# K youngest will be [(12, 54), (16, 67), (17, 67), (18, 62)]
If you need to compare against multiple fields to find out which element is smaller or bigger, consider
having the heap’s elements be tuples.
Trie
Let’s understand Trie first. Trie is basically a tree where each node holds a character and can have up
to 26 children in the word add-search context, because we have 26 lower case characters in the
alphabet (from a to z ). So, basically each node represents a single character. Graphically
speaking:
If we insert the word abc in our trie, a will have b as a child, which in turn will have c as a child.
Graphically, this is how it looks like:
Back to Top
Note that in the above example, c denotes the end of the word for the word abc . Now, if we added
another word example, say ab , we don’t add a and b back again since they are already available
to us. We can simply re-use these characters. So, we have two words along this path abc and ab .
Basically, all words that start with a and b would live in this space, that’s what makes it efficient.
Hence, a trie is also called a prefix tree.
Now let’s take an example and build our tree.
Input
["WordDictionary","addWord","addWord","addWord","search","search","search","searc
[[],["bad"],["dad"],["mad"],["pad"],["bad"],[".ad"],["b.."]]
Output
[null,null,null,null,false,true,true,true]
Back to Top
So far we have three word’s and all of them end with a different d . But all three of them have different
prefixes that’s why they are along different path’s:
Dictionary
Dictionaries are the Python equivalent to Hash Maps and are an overpowered data structure. In most
cases, we can look up, add, delete and update keys in O(1) time. Let’s go over the basics of
dictionaries.
# empty dict
d = {}
# dict with some keys and values
d = {
"name": "James",
"age": 29
5: 29.3
}
When declaring a dictionary, we put the keys on the left and values on the right, separated by a colon.
In Python, the keys of a dictionary can only be hashable values, like int, str, tuple, and more (Hashable
Objects). In addition, a dictionary can have keys and values of multiple types as shown above.
We can also add, update and delete keys with [] .
doesNumbersExist = "numbers" in d
Python also provides a nice way of iterating over the keys inside a dictionary.
When you are iterating over keys in this manner, remember that you cannot add new keys or delete
any existing keys, as this will result in an error.
d = {"a": 0, "b": 7}
c_count = d["c"] # KeyError: 'c'
In the above snippet, c does not exist. We will get a KeyError when we run the code. As a result, in
many situations, we need to check if the key exists in a dictionary before we try to access it.
Although this check is simple, it can be a hassle in an interview if we need to do it many times.
Alternatively, if we use [Link], there is no need for this check. defaultdict
operates in the same way as a dictionary in nearly all aspects except for handling missing keys. When
accessing keys that don’t exist, defaultdict automatically sets a default value for them. The factory
function to create the default value can be passed in the constructor of defaultdict .
We can also pass in a lambda as the factory function to return custom default values. Let’s say for our
default value we return the tuple (0, 0).
Using a defaultdict can help reduce the clutter in your code, speeding up your implementation.
Back to Top
Graph
A graph is a data structure that consists of the following two components:
1. A finite set of vertices/nodes.
2. A finite set of an ordered pair of the form (u, v) called as edges/arcs. The pair is ordered
because (u, v) is not the same as (v, u) in case of a directed graph (di-graph). The pair
of the form (u, v) indicates that there is an edge from vertex u to vertex v. The edges may
contain weight/value/cost.
Graphs are used to represent many real-life applications. Graphs are used to represent networks. The
networks may include paths in a city, a telephone network, a circuit network etc. Graphs are also used
in social networks such as LinkedIn, Facebook. For example, in Facebook, each person is
represented with a vertex (or node). Each node is a structure and contains information such as the
person’s ID, name, gender, and locale. See this for more applications of graph.
The following is an example of an undirected graph with five vertices:
The following two are the most commonly used representations of a graph:
1. Adjacency Matrix
2. Adjacency List
There are other representations also such as the Incidence Matrix and Incidence List. The choice of
graph representation is situation-specific and depends on the type of operations to be performed and
ease of use of each graph representation type.
Adjacency Matrix
An adjacency matrix is a 2D array of size V × V where V is the number of vertices in a graph. Let
the 2D array be adj[][] , a slot adj[i][j] = 1 indicates that there is an edge from vertex i to
vertex j . Adjacency matrix for an undirected graph is always symmetric. Adjacency matrix is also
used to represent weighted graphs. If adj[i][j] = w , then there is an edge from vertex i to
vertex j with weight w .
The adjacency matrix for the above example graph is:
Back to Top
Pros:
Representation is easier to implement and follow.
Removing an edge takes O(1) time. Queries such as checking whether there is an edge from
vertex u to vertex v are efficient and can be done O(1).
Cons:
Inefficient in terms of the space requirements, since it consumes O(V 2 ) space. Even if the
graph is sparse (i.e., it contains less number of edges), it still consumes the same space.
Adding a vertex takes O(V 2 ) time. Computing all neighbors of a vertex takes O(V) time (thus,
inefficient).
Please see this for a sample Python implementation of adjacency matrix.
Adjacency List
An array of lists is used. The size of the array is equal to the number of vertices. Let the array be an
array[] . An entry array[i] represents the list of vertices adjacent to the ith vertex. This
representation can also be used to represent a weighted graph. The weights of edges can be
represented as lists of pairs. Following is the adjacency list representation of the above graph:
Here’s a Python implementation using a list (which is internally implemented using a dynamic array) to
represent adjacency lists instead of the linked list. The dynamic array implementation of lists offers the
distinct advantage of cache friendliness (for sequential accesses) compared to linked lists.
Back to Top
"""
A Python program to demonstrate the adjacency
list representation of the graph
"""
class AdjNode:
def __init__(self, data):
[Link] = data
[Link] = None
Back to Top
graph.print_graph()
which outputs:
Pros:
Saves space since it needs O(‖V‖ + ‖E‖) space. In the worst case, there can be C(V, 2)
number of edges in a graph thus consuming O(V 2 ) space. Adding a vertex is easier than an
adjacency matrix. Computing all neighbors of a vertex takes optimal time.
Cons:
Queries like whether there is an edge from vertex u to vertex v are inefficient and need O(V)
time.
2
In real-life problems, graphs are sparse (i.e., |E| << |V | ). That’s why
^2$$ bound on
adjacency lists are commonly used for storing graphs since an adjacency matrix V
time complexity.
enforces a $$
Let’s see how to implement a graph in python using the dictionary data structure in python. The keys
of the dictionary are the nodes/vertices of our graph and the corresponding values are lists that
contain nodes, each of which are connected to the original node by an edge/arc.
The keys of the dictionary used are the nodes of our graph and the corresponding values are lists with
each nodes, which are connecting by an edge. This simple graph has six nodes ( a-f ) and five arcs:
a -> c
b -> c
b -> e
c -> a
c -> b
c -> d
c -> e
d -> c
e -> c
e -> b
Back to Top
It can be represented by the following Python data structure. This is a dictionary whose keys are the
nodes of the graph. For each key, the corresponding value is a list containing the nodes that are
connected by a direct arc from this node.
defaultdict : Usually, a Python dictionary throws a KeyError if you try to get an item with a key
that is not currently in the dictionary. defaultdict allows that if a key is not found in the dictionary,
then instead of a KeyError being thrown, a new entry is created. The type of this new entry is given by
the argument of defaultdict .
Generate Graph
# definition of function
Back to Top
def generate_edges(graph):
edges = []
which outputs:
[('a', 'c'), ('c', 'd'), ('c', 'e'), ('c', 'a'), ('c', 'b'),
('b', 'c'), ('b', 'e'), ('e', 'b'), ('e', 'c'), ('d', 'c')]
Note that since we have taken example of an undirected graph, we print the same edge twice say as
(‘a’,’c’) and (‘c’,’a’) . This issue can be fixed using a directed graph.
Using Python dictionary, we can find the path from one node to the other in a Graph. The idea is
similar to DFS in graphs.
In the function, initially, the path is an empty list. In the starting, if the start node matches with the end
node, the function will return the path. Otherwise the code goes forward and hits all the values of the
starting node and searches for the path using recursion.
graph = {
'a': ['c'],
'b': ['d'],
'c': ['e'],
'd': ['a', 'd'],
'e': ['b', 'c']
}
which outputs:
Generate All the possible Paths from One Node to the Other
Let’s generate all the possible paths from the start node to the end node. The basic functioning works
same as the functioning of the above code. The place where the difference comes is instead of
instantly returning the first path, it saves that path in a list named as paths in the example below.
Finally, after iterating over all the possible ways, it returns the list of paths. If there is no path from the
starting node to the ending node, it returns None .
which outputs:
Back to Top
[['d', 'a', 'c'], ['d', 'a', 'c']]
To get to the shortest from all the paths, we use a little different approach as shown below. In this, as
we get the path from the start node to the end node, we compare the length of the path with a
variable named as shortest which is initialized with the None value.
If the length of generated path is less than the length of shortest, if shortest is not None , the newly
generated path is set as the value of shortest. Again, if there is no path, it returns None .
graph ={
'a':['c'],
'b':['d'],
'c':['e'],
'd':['a', 'd'],
'e':['b', 'c']
}
which outputs:
Back to Top
List
The average case assumes parameters generated uniformly at random.
Internally, a list is represented as an array; the largest costs come from growing beyond the current
allocation size (because everything must move), or from inserting or deleting somewhere near the
beginning (because everything after that must move). Note that slicing a list is O(k) where k is the
length of the slice (since slicing is just “copying part of the list” so the time complexity is the same as
a copy). If you need to add/remove at both ends, consider using a [Link] instead.
String
Note that since strings are immutable, a new string is created whenever an operation on the input
string (say, [Link]() ) is performed. Further, similar to lists, slicing a string is O(k) where k is
the length of the slice (since slicing is just “copying part of the string” so the time complexity is the
same as a copy). A string is also an iterable, so the time complexities are for other usual operations
are similar to that of a list.
[Link]
A deque (double-ended queue) is represented internally as a doubly linked list. (Well, a list of arrays
rather than objects, for greater efficiency.) Both ends are accessible, but even looking at the middle is
slow, and adding to or removing from the middle is slower still.
Back to Top
Set
See dict – the implementation is intentionally very similar.
As seen in the source code the complexities for set difference s-t or [Link](t)
( set_difference() ) and in-place set difference s.difference_update(t)
( set_difference_update_internal() ) are different! The first one is O(len(s)) (for every element
in s add it to the new set, if not in t ). The second one is O(len(t)) (for every element in t remove
it from s ). So care must be taken as to which is preferred, depending on which one is the longest set
and whether a new set is needed.
To perform set operations like s-t , both s and t need to be sets. However you can do the
method equivalents even if t is any iterable, for example [Link](l) , where l is a list.
Dict
The average case times listed for dict objects assume that the hash function for the objects is
sufficiently robust to make collisions uncommon. The average case assumes the keys used in
parameters are selected uniformly at random from the set of all keys.
Note that there is a fast-path for dicts that (in practice) only deal with str keys; this doesn’t affect the
algorithmic complexity, but it can significantly affect the constant factors: how quickly a typical
program finishes.
Back to Top
Counter
[Link] is just a subclass of dict . Constructing it is O(n) , because it has to
iterate over the input, but operations on individual elements remain O(1) .
Notes
[1] = These operations rely on the “Amortized” part of “Amortized Worst Case”. Individual actions may
take surprisingly long, depending on the history of the container.
[2] = Popping the intermediate element at index k from a list of size n shifts all elements after k by one
slot to the left using memmove . n − k elements have to be moved, so the operation is O(n − k) . The
best case is popping the second to last element, which necessitates one move, the worst case is
popping the first element, which involves n − 1 moves. The average case for an average value of k is
popping the element the middle of the list, which takes O(n/2) = O(n) operations.
[3] = For these operations, the worst case n is the maximum size the container ever achieved, rather
than just the current size. For example, if N objects are added to a dictionary, then N − 1 are deleted,
the dictionary will still be sized for N objects (at least) until another insertion is made.
References
Python Implementations of Data Structures
Python Dictionary Tips
What are the differences between type() and isinstance()?
Quora: What is the height, size, and depth of a binary tree?
What is the time complexity of [Link]() in Python?
Python Time Complexity
Big-O of list slicing
What is the time complexity of slicing a list?
Graph and its representations
Generate a graph using Dictionary in Python
Citation
If you found our work useful, please cite it as:
@article{Chadha2020DistilledPythonDS,
title = {Python Data Structures and Time Complexities},
author = {Chadha, Aman},
journal = {Distilled AI},
Back to Top
year = {2020},
note = {\url{[Link]
}
| | | |
[Link]
Back to Top