
Understanding Binary Files in Python

The document provides an overview of binary files in Python, highlighting their differences from text files, advantages, and disadvantages. It explains the process of serialization and deserialization, particularly using the pickle module for converting Python objects to byte streams and vice versa. Additionally, it includes code examples for writing to and reading from binary files, as well as serializing and deserializing objects using pickle.

Uploaded by

dvnraju.svecw
Copyright © All Rights Reserved

Binary Files

Python Programming

Dr D V Naga Raju
Dept. of IT
Introduction
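A binary file is a file whose contents are raw bytes that are not meant to be read as text. Every
file on disk is ultimately a sequence of bytes; the difference lies in how those bytes are interpreted.
In a text file, the bytes encode characters (usually readable text) in some character encoding,
whereas in a binary file they can represent any kind of information, such as images, audio, or
executable code. In Python, opening a file in binary mode (a mode string containing 'b', such as
'rb' or 'wb') makes read() return bytes objects and makes write() accept bytes instead of str.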

Differences Between Binary Files and Text Files:

1. Format:
○ Text Files: Store data as a sequence of characters. Each character is encoded in a
specific encoding format (e.g., UTF-8).
○ Binary Files: Store data in binary form. They represent data as a series of bytes.
2. Human Readability:
○ Text Files: Human-readable (e.g., .txt files).
○ Binary Files: Not human-readable (e.g., .jpg, .exe files).
3. Usage:
○ Text Files: Used for storing textual data like documents, source code, etc.
○ Binary Files: Used for storing non-text data like images, audio, videos, and
compiled programs.
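To see the difference concretely, the minimal sketch below writes a string in text mode and then reads the same file back in both text mode and binary mode. The file name demo.txt is a hypothetical choice, and the file lives in a temporary directory so nothing is left behind. Text mode hands back characters; binary mode hands back the UTF-8 bytes that actually sit on disk.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")  # hypothetical file name

    # Text mode: we pass a str, and Python encodes the characters as UTF-8 for us
    with open(path, "w", encoding="utf-8") as f:
        f.write("héllo")

    # Text-mode read: the stored bytes are decoded back into a str
    with open(path, "r", encoding="utf-8") as f:
        as_text = f.read()

    # Binary-mode read: the raw bytes, exactly as stored on disk
    with open(path, "rb") as f:
        as_bytes = f.read()

print(type(as_text).__name__, len(as_text))  # str 5
print(type(as_bytes).__name__, as_bytes)     # bytes b'h\xc3\xa9llo'
```

The single character é occupies two bytes in UTF-8, which is why the text has 5 characters but the file holds 6 bytes.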

Advantages of Binary Files:

1. Efficient Storage: Binary files are often more compact, since values are stored in their native
binary representation rather than as text.
2. Performance: Reading and writing binary files can be faster because no character
encoding/decoding or text parsing is needed.
3. Complex Data: Suitable for storing complex data structures (such as serialized Python objects)
directly.
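The storage advantage can be illustrated with the standard struct module. In this sketch the format "<i" (a 4-byte little-endian signed integer) is an assumption chosen for the example; other formats are equally valid. The same number takes 10 bytes as decimal text but only 4 bytes in binary, and reading it back needs no digit parsing.

```python
import struct

value = 1234567890

# Text representation: one ASCII byte per decimal digit
as_text = str(value).encode("ascii")

# Binary representation: fixed-size 4-byte signed integer, little-endian ("<i")
as_binary = struct.pack("<i", value)

print(len(as_text), as_text)      # 10 b'1234567890'
print(len(as_binary), as_binary)  # 4 b'\xd2\x02\x96I'

# Reading the binary form back is a direct unpack, not a text parse
(restored,) = struct.unpack("<i", as_binary)
print(restored == value)          # True
```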

Disadvantages of Binary Files:

1. Not Human-Readable: Binary files cannot be read or edited easily with a text editor.
2. Portability Issues: Binary files may not be portable across different systems if they rely on
specific hardware or software details, such as byte order (endianness) or a layout defined by one
particular program.
3. Error-Prone: Errors in binary files are harder to detect and fix than errors in text files.
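Byte order is a common source of the portability problem mentioned above. The sketch below (again using the struct module, with "<I" and ">I" chosen for illustration) packs the same unsigned integer in little-endian and big-endian order, then shows that a reader assuming the wrong order silently gets a different number rather than an error.

```python
import struct

number = 1

little = struct.pack("<I", number)  # little-endian: least significant byte first
big = struct.pack(">I", number)     # big-endian: most significant byte first

print(little)  # b'\x01\x00\x00\x00'
print(big)     # b'\x00\x00\x00\x01'

# Interpreting little-endian bytes as big-endian gives a wrong value, with no error raised
(wrong,) = struct.unpack(">I", little)
print(wrong)   # 16777216
```

This is why binary file formats usually fix the byte order explicitly in their specification.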

# Open the file in binary write mode
f = open('[Link]', 'wb')
# Attempt to write a string to the binary file
f.write('Hai')  # raises TypeError: a bytes-like object is required, not 'str'
f.close()  # never reached, because the line above raises

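When a file is opened in binary mode ('wb'), its write() method accepts only bytes-like objects
(bytes, bytearray, memoryview). A str is a sequence of Unicode characters, not bytes, so it must
first be encoded (for example with str.encode('utf-8')) before it can be written to a binary file.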
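The sketch below shows the rule in action: the str is rejected, while several bytes-like objects are accepted. The file name demo.bin is hypothetical and the file is created in a temporary directory. Note that write() returns the number of bytes written, which for an encoded string can differ from the number of characters.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.bin")  # hypothetical file name
    with open(path, "wb") as f:
        try:
            f.write("Hai")  # a str is rejected in binary mode
        except TypeError as exc:
            print("TypeError:", exc)
        # Bytes-like objects are accepted; write() returns the number of bytes written
        n1 = f.write(b"Hai")               # bytes literal
        n2 = f.write(bytearray(b"!"))      # mutable bytes
        n3 = f.write("é".encode("utf-8"))  # one character, two bytes in UTF-8
    with open(path, "rb") as f:
        data = f.read()

print(n1, n2, n3)  # 3 1 2
print(data)        # b'Hai!\xc3\xa9'
```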

Writing data to a binary file

# Open the file in binary write mode
with open('[Link]', 'wb') as f:
    # Convert the string to bytes and write to the file
    f.write('Hai'.encode('utf-8'))

Reading data from a binary file

# Open the file in binary read mode
with open('[Link]', 'rb') as f:
    # Read the bytes from the file
    content = f.read()

# Decode the bytes to a string
string_content = content.decode('utf-8')

print(string_content)  # Output: Hai
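Calling read() with no argument loads the whole file into memory at once. For large binary files, a common alternative is to read a fixed number of bytes at a time with read(n), which returns at most n bytes and returns b"" at end of file. In this sketch the tiny chunk size of 4 bytes and the file name demo.bin are assumptions chosen only to make the loop visible (the walrus operator requires Python 3.8 or later).

```python
import os
import tempfile

CHUNK_SIZE = 4  # deliberately tiny for illustration; real code often uses e.g. 64 KiB

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.bin")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(bytes(range(10)))  # ten bytes: 0x00 .. 0x09

    chunks = []
    with open(path, "rb") as f:
        # read(n) returns at most n bytes, and b"" once the end of the file is reached
        while chunk := f.read(CHUNK_SIZE):
            chunks.append(chunk)

print(chunks)  # [b'\x00\x01\x02\x03', b'\x04\x05\x06\x07', b'\x08\t']
```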

Object Serialization

Serialization is the process of converting a Python object into a byte stream (or, for text-based
formats such as JSON, a string) that can be stored in a file or transmitted over a network.

The serialized form preserves the object's structure and internal data.

Common serialization formats used in Python include:

● pickle: A native Python serialization format that can handle most Python data types
● JSON: A human-readable text format that maps to basic data types: numbers, strings,
booleans, null, lists (arrays), and dictionaries (objects)
● XML: A markup language that can represent structured data
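A quick way to compare two of these formats is to serialize the same dictionary with both json and pickle from the standard library (XML is not shown here). The sketch shows that JSON produces readable text while pickle produces opaque bytes, and that pickle can handle a Python-specific type such as set, which json.dumps() rejects.

```python
import json
import pickle

record = {"name": "Alice", "age": 30}

as_json = json.dumps(record)      # JSON: a human-readable str
as_pickle = pickle.dumps(record)  # pickle: an opaque bytes object

print(type(as_json).__name__, as_json)  # str {"name": "Alice", "age": 30}
print(type(as_pickle).__name__)         # bytes

# pickle handles Python-specific types that JSON has no equivalent for, such as sets
tags = {"python", "files"}
print(pickle.loads(pickle.dumps(tags)) == tags)  # True
try:
    json.dumps(tags)
except TypeError as exc:
    print("JSON cannot encode a set:", exc)
```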

Deserialization

Deserialization is the reverse process of serialization, where the serialized data is converted back
into a Python object. This allows the original data to be reconstructed and used in the program.
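How faithfully the original object is reconstructed depends on the format. In the sketch below (the dictionary contents are hypothetical), a JSON round trip turns a tuple into a list, because JSON has only one array type, whereas a pickle round trip restores the tuple exactly.

```python
import json
import pickle

original = {"point": (3, 4), "label": "A"}

text = json.dumps(original)   # serialize to a JSON string
restored = json.loads(text)   # deserialize back into Python objects

print(restored)               # {'point': [3, 4], 'label': 'A'}
print(restored == original)   # False: the tuple came back as a list

# pickle records the exact Python type, so the tuple survives
print(pickle.loads(pickle.dumps(original)) == original)  # True
```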

3
Introduction to Pickle in Python

The pickle module in Python is used for serializing and deserializing Python objects. Serialization
(or pickling) is the process of converting a Python object into a byte stream, and deserialization
(or unpickling) is the process of converting the byte stream back into a Python object.

Key Features of the pickle Module

1. Serialization (Pickling):
○ Converts Python objects to a byte stream.
○ Supports complex data structures such as lists, dictionaries, instances of user-defined classes, and more.
2. Deserialization (Unpickling):
○ Converts a byte stream back to a Python object.
○ Reconstructs the original object with its original structure and data.
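As a minimal sketch of both directions, the example below pickles an instance of a user-defined class. The Student class and its fields are hypothetical. Pickle stores the instance's attributes together with a reference to its class by name, so the class must be importable (here, defined in the same script) when the data is unpickled; the result is a new object with the same data and behavior.

```python
import pickle

class Student:  # hypothetical example class
    def __init__(self, name, marks):
        self.name = name
        self.marks = marks

    def average(self):
        return sum(self.marks) / len(self.marks)

s = Student("Ravi", [80, 90, 100])
blob = pickle.dumps(s)         # pickling: object -> bytes
restored = pickle.loads(blob)  # unpickling: bytes -> a new Student object

print(type(restored).__name__, restored.name, restored.marks)  # Student Ravi [80, 90, 100]
print(restored.average())                                      # 90.0
print(restored is s)                                           # False: a separate object
```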

Disadvantages of using Pickle

● Pickle is not secure: a pickle stream can instruct the unpickler to call arbitrary Python
callables in order to construct objects. During deserialization, pickle cannot tell a malicious
callable from a harmless one, so loading a crafted pickle can execute arbitrary code. Only
unpickle data that comes from a trusted source.
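The sketch below demonstrates the mechanism harmlessly. The hypothetical class Sneaky uses __reduce__ to tell pickle "rebuild me by calling this callable", and the callable here is the built-in print standing in for something malicious. It then adapts the "restricting globals" idea described in the pickle documentation: a subclass of pickle.Unpickler whose find_class() only allows a small, hand-picked set of built-in types. This narrows the risk but is a sketch, not a complete defense; the safest rule remains not unpickling untrusted data.

```python
import builtins
import io
import pickle

class Sneaky:  # hypothetical class used only for this demonstration
    # __reduce__ tells pickle how to rebuild the object: "call this callable with these args".
    # print is a harmless stand-in for a dangerous callable.
    def __reduce__(self):
        return (print, ("this ran during unpickling!",))

payload = pickle.dumps(Sneaky())
pickle.loads(payload)  # prints: this ran during unpickling!

# Allow-list of globals the restricted unpickler may resolve (an assumption for this sketch)
SAFE_BUILTINS = {"list", "dict", "set", "frozenset", "tuple", "str", "int", "float"}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the stream references a global such as builtins.print
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def restricted_loads(data):
    return RestrictedUnpickler(io.BytesIO(data)).load()

print(restricted_loads(pickle.dumps([1, 2, 3])))  # [1, 2, 3]
try:
    restricted_loads(payload)
    blocked = False
except pickle.UnpicklingError as exc:
    blocked = True
    print("Blocked:", exc)
```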

The pickle module in Python provides several key methods for serializing and deserializing
objects.

Serializing an object

pickle.dump(obj, file, protocol=None, *, fix_imports=True, buffer_callback=None)

pickle.dumps(obj, protocol=None, *, fix_imports=True, buffer_callback=None)

Deserializing an object

pickle.load(file, *, fix_imports=True, encoding='ASCII', errors='strict', buffers=None)

pickle.loads(data, /, *, fix_imports=True, encoding='ASCII', errors='strict', buffers=None)
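Of these parameters, protocol is the one most often set explicitly. The sketch below (with hypothetical sample data) pickles the same dictionary with every protocol the running Python supports and checks that each round-trips; load() and loads() detect the protocol automatically, so it never needs to be passed when deserializing. The exact values of DEFAULT_PROTOCOL and HIGHEST_PROTOCOL depend on the Python version.

```python
import pickle

data = {"name": "Alice", "scores": [85, 90, 92]}

# Both constants depend on the Python version in use
print("default:", pickle.DEFAULT_PROTOCOL, "highest:", pickle.HIGHEST_PROTOCOL)

# Every protocol round-trips the data; loads() detects the protocol from the stream itself
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    blob = pickle.dumps(data, protocol=protocol)
    assert pickle.loads(blob) == data
    print(protocol, len(blob), "bytes")
```

Protocol 0 is an older ASCII-based format; higher protocols are more compact and support more features, so the default is usually the right choice unless the data must be read by an older Python.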

dump() - This method serializes an object and writes it to a file.

import pickle

student_names = ['A', 'B', 'C', 'D']

with open('student_file.pkl', 'wb') as f:  # open the file in binary write mode
    pickle.dump(student_names, f)  # serialize the list

Note: The extension does not have to be .pkl. You can name this anything you’d like, and the file
will still be created. However, it is good practice to use the .pkl extension so that you are
reminded that this is a Pickle file.
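A single file can also hold several pickled objects: each dump() call appends one complete pickle, and each load() call reads exactly one back, raising EOFError when none are left. The sketch below relies on that behavior; the file name records.pkl and the stored values are hypothetical.

```python
import os
import pickle
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "records.pkl")  # hypothetical file name

    # Each dump() call writes one complete pickle to the open file
    with open(path, "wb") as f:
        pickle.dump(["A", "B"], f)
        pickle.dump({"count": 2}, f)
        pickle.dump("done", f)

    # Each load() call reads back exactly one pickle; EOFError signals the end
    loaded = []
    with open(path, "rb") as f:
        while True:
            try:
                loaded.append(pickle.load(f))
            except EOFError:
                break

print(loaded)  # [['A', 'B'], {'count': 2}, 'done']
```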

dumps() - This method serializes an object and returns the serialized data as a bytes object
instead of writing it to a file.

import pickle

# Example data
data = {'name': 'Alice', 'age': 30, 'scores': [85, 90, 92]}

# Serialize the object to a byte stream
byte_stream = pickle.dumps(data)

print(byte_stream)
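Because dumps() simply returns bytes, its result can go anywhere bytes can go: a network socket, a database column, or an ordinary binary file. The sketch below writes the dumps() output with a plain f.write() and reads it back with pickle.load(), showing that the two styles interoperate. The file name data.pkl is hypothetical.

```python
import os
import pickle
import tempfile

data = {'name': 'Alice', 'age': 30, 'scores': [85, 90, 92]}
byte_stream = pickle.dumps(data)
print(type(byte_stream).__name__)  # bytes

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.pkl")  # hypothetical file name
    # The bytes can be written with an ordinary binary write...
    with open(path, "wb") as f:
        f.write(byte_stream)
    # ...and read back with pickle.load(), just as if dump() had written them
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == data)  # True
```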

load() - This method deserializes an object from a file. The following code reads
student_file.pkl back and prints the list:

import pickle

with open('student_file.pkl', 'rb') as f:  # open the file in binary read mode
    student_names_loaded = pickle.load(f)  # deserialize using load()

print(student_names_loaded)  # Output: ['A', 'B', 'C', 'D']
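In practice, load() can fail when the file does not exist yet, is empty, or does not contain valid pickle data. The helper below, load_or_default (a hypothetical name, not part of the pickle module), sketches one way to handle that by returning a fallback value. It catches only the most common exceptions; the pickle documentation notes that other exceptions can also be raised by malformed data, and none of this makes loading untrusted files safe.

```python
import os
import pickle
import tempfile

def load_or_default(path, default):
    """Hypothetical helper: unpickle one object from path, or return default if unavailable."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return default  # the file does not exist yet
    except (EOFError, pickle.UnpicklingError):
        return default  # the file is empty or its contents are not a readable pickle

with tempfile.TemporaryDirectory() as tmp:
    missing = load_or_default(os.path.join(tmp, "nope.pkl"), [])

    empty_path = os.path.join(tmp, "empty.pkl")
    open(empty_path, "wb").close()  # create an empty file
    empty = load_or_default(empty_path, [])

    good_path = os.path.join(tmp, "good.pkl")
    with open(good_path, "wb") as f:
        pickle.dump(['A', 'B', 'C', 'D'], f)
    good = load_or_default(good_path, [])

print(missing, empty, good)  # [] [] ['A', 'B', 'C', 'D']
```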

loads() - This method deserializes a bytes object (such as the output of dumps()) and returns
the reconstructed Python object.

import pickle

# Byte stream containing serialized data
byte_stream = pickle.dumps({'name': 'Alice', 'age': 30, 'scores': [85, 90, 92]})

# Deserialize the object from the byte stream
loaded_data = pickle.loads(byte_stream)

print(loaded_data)  # Output: {'name': 'Alice', 'age': 30, 'scores': [85, 90, 92]}
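Since loads() always builds new objects, a dumps()/loads() round trip produces an independent copy, including nested containers. The sketch below (with hypothetical data) mutates the nested list in the copy and shows the original is unaffected. For copying alone, copy.deepcopy from the standard library is the more direct tool; the round trip is shown here only to illustrate what deserialization produces.

```python
import pickle

original = {'name': 'Alice', 'scores': [85, 90, 92]}
clone = pickle.loads(pickle.dumps(original))  # round trip builds brand-new objects

clone['scores'].append(100)  # mutate the nested list in the clone only

print(original['scores'])  # [85, 90, 92]
print(clone['scores'])     # [85, 90, 92, 100]
```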
