IO & OS
File Management
Opening Files
The open function takes the path to the file as its first argument. If the file is in the program's current working directory, its name alone is enough. An optional second argument sets the mode used to open the file. Passing "r" opens it in read mode, which is the default. Passing "w" opens it in write mode, which creates the file if needed and discards any existing contents. Passing "a" opens it in append mode, which adds new content to the end of the file.
Adding "b" to a mode (for example "rb" or "wb") opens the file in binary mode, which is used for non-text files such as image and sound files. Once you have finished with a file, you should close it by calling the close method of the file object.
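The modes described above can be tried out with a short sketch. The file name modes_demo.txt and the temporary directory are assumptions made for illustration, so that no real file is touched.

```python
import os
import tempfile

# Hypothetical file inside a throwaway directory (not part of the original notes).
workdir = tempfile.mkdtemp()
demo_path = os.path.join(workdir, "modes_demo.txt")

# "w" creates the file, or discards its contents if it already exists.
f = open(demo_path, "w")
f.write("first\n")
f.close()

# "a" keeps the existing contents and adds to the end.
f = open(demo_path, "a")
f.write("second\n")
f.close()

# "r" is the default mode and returns text (str).
f = open(demo_path)
text = f.read()
f.close()

# Adding "b" returns raw bytes instead of text.
f = open(demo_path, "rb")
raw = f.read()
f.close()

print(repr(text))
print(type(raw).__name__)
```

Each open call here is paired with an explicit close, matching the advice above.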
Reading Files
- file.read(): returns the entire contents of the file as a single string.
- file.readlines(): returns a list in which each element is one line of the file, including its trailing newline.
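As a minimal sketch of the difference between the two methods, the following writes a small three-line file and reads it back both ways. The file name sample.txt and the temporary directory are hypothetical.

```python
import os
import tempfile

# Hypothetical sample file in a throwaway directory.
workdir = tempfile.mkdtemp()
sample_path = os.path.join(workdir, "sample.txt")

f = open(sample_path, "w")
f.write("alpha\nbeta\ngamma\n")
f.close()

# read() returns everything as one string.
f = open(sample_path)
whole = f.read()
f.close()

# readlines() returns a list with one element per line,
# each still carrying its trailing newline.
f = open(sample_path)
lines = f.readlines()
f.close()

print(repr(whole))
print(lines)
```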
Writing Files
- To write to a file, use the write method of the file object.
- On success, the write method returns the number of characters written (in binary mode, the number of bytes).
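In text mode the value returned by write counts characters, which can differ from the file's size in bytes. The sketch below illustrates this with a string containing one non-ASCII character; the file name count_demo.txt and the explicit UTF-8 encoding are assumptions chosen for the example.

```python
import os
import tempfile

# Hypothetical file in a throwaway directory.
workdir = tempfile.mkdtemp()
count_path = os.path.join(workdir, "count_demo.txt")

f = open(count_path, "w", encoding="utf-8")
# "h\u00e9llo" is five characters; the accented e takes two bytes in UTF-8.
count = f.write("h\u00e9llo")
f.close()

size = os.path.getsize(count_path)

print(count)  # characters written
print(size)   # bytes on disk
```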
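myfile = open("filename.txt", 'r+')  # 'r+' opens for both reading and writing
try:
    # read all content; this leaves the file position at the end of the file
    cont = myfile.read()
    # so this write lands after the existing content
    myfile.write("New line to insert in file")
    print(cont)
finally:
    myfile.close()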
myfile = open("textfile.txt", 'r+')  # 'r+' opens for both reading and writing
# print each line
for line in myfile:
    print(line.rstrip())
myfile.close()
This is line 1
This is line 2
This is line 3
This is line 4
This is line 5
This is line 6
This is line 7
This is line 8
This is line 9
This is line 10
An alternative is to use a with statement. It binds the open file to a variable (often called f) for use in the indented block, and closes the file automatically at the end of the block, even if an exception occurs inside it.
with open("filename.txt") as f:
    print(f.read())
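To check the claim that the file is closed even when an exception occurs, a small sketch can raise an error inside the block and then inspect the closed attribute of the file object. The file name with_demo.txt and the simulated ValueError are assumptions for illustration.

```python
import os
import tempfile

# Hypothetical file in a throwaway directory.
workdir = tempfile.mkdtemp()
with_path = os.path.join(workdir, "with_demo.txt")

with open(with_path, "w") as f:
    f.write("some text")

try:
    with open(with_path) as f:
        first = f.read(4)
        # Simulate a failure while the file is still open.
        raise ValueError("simulated failure")
except ValueError:
    pass

# The with statement closed the file on the way out of the block.
print(f.closed)
print(first)
```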
# Open a file for writing and create it if it doesn't exist
f = open("textfile.txt", "w+")
# Open the file for appending text to the end
# f = open("textfile.txt", "a+")
# write some lines of data to the file
for i in range(10):
    # text mode translates "\n" to the platform's line ending
    f.write("This is line %d\n" % (i + 1))
# close the file when done
f.close()
# Open the file back up and read the contents
f = open("textfile.txt", "r")
if f.mode == 'r':  # confirm the file was opened in read mode
    # use the read() function to read the entire file
    # contents = f.read()
    # print(contents)
    fl = f.readlines()  # readlines reads the individual lines into a list
    for x in fl:
        print(x, end="")  # each line already ends with a newline
f.close()
This is line 1
This is line 2
This is line 3
This is line 4
This is line 5
This is line 6
This is line 7
This is line 8
This is line 9
This is line 10
Fast Input/Output
# Program to compare the time taken by fast I/O
# and normal I/O in Python.
# Note: both timings include the wait for input, so the comparison is
# only meaningful when input is redirected from a file.
from sys import stdin, stdout
import time

# Function for fast I/O
def fastIO():
    # Stores the start time
    start = time.perf_counter()
    # Read one line (kept as a string, including its trailing newline)
    n = stdin.readline()
    # Read an array of integers from the next line
    arr = [int(x) for x in stdin.readline().split()]
    # Output the first line as read
    stdout.write(str(n))
    # Output array
    stdout.write(" ".join(map(str, arr)) + "\n")
    # Stores the end time
    end = time.perf_counter()
    print("Time taken in fast IO", end - start)

# Function for normal I/O
def normalIO():
    # Stores the start time
    start = time.perf_counter()
    # Input integer
    n = int(input())
    # Input array
    arr = [int(x) for x in input().split()]
    # Output integer
    print(n)
    # Output array
    for i in arr:
        print(i, end=" ")
    print()
    # Stores the end time
    end = time.perf_counter()
    print("Time taken in normal IO: ", end - start)

# Driver Code
if __name__ == '__main__':
    fastIO()
    normalIO()
Time taken in fast IO 0.001823046999994915
23
23
23
23
Time taken in normal IO: 5.173487307000002
OS Path Module
import os
from os import path
import datetime
import time
# Print the name of the OS
print (os.name)
# Check for item existence and type
print ("Item exists: ", path.exists("textfile.txt"))
print ("Item is a file: ", path.isfile("textfile.txt"))
print ("Item is a directory: ", path.isdir("textfile.txt"))
# Work with file paths
print ("Item's path: ",path.realpath("textfile.txt"))
print ("Item's path and name: ",path.split(path.realpath("textfile.txt")))
# Get the modification time
t = time.ctime(path.getmtime("textfile.txt"))
print (t)
print (datetime.datetime.fromtimestamp(path.getmtime("textfile.txt")))
# Calculate how long ago the item was modified
td= datetime.datetime.now() - datetime.datetime.fromtimestamp(path.getmtime("textfile.txt"))
print ("It has been " + str(td) + " since the file was modified")
print ("Or, " + str(td.total_seconds()) + " seconds")
posix
Item exists: True
Item is a file: True
Item is a directory: False
Item's path: /content/textfile.txt
Item's path and name: ('/content', 'textfile.txt')
Mon Jul 19 14:31:57 2021
2021-07-19 14:31:57.728190
It has been 0:06:55.454337 since the file was modified
Or, 415.454337 seconds
GLOB Module
The glob module finds all path names that match a specified pattern according to the rules used by the Unix shell. Results are returned in arbitrary order. It is useful whenever a program needs to work through the files at some location whose names follow a specific pattern: the module has several functions that list the files matching a given pattern under a specified folder.
Pattern Rules
- Follow standard Unix path expansion rules.
- Supported special characters: the two wildcards * and ?, and character ranges expressed in [].
- The pattern rules are applied to segments of the filename (stopping at the path separator, /).
- Paths in the pattern can be relative or absolute.
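The pattern rules above can be checked against a known set of files. The sketch below builds a throwaway directory with a few hypothetical file names, then applies each kind of pattern. The helper names() is an assumption introduced here; it sorts the matches because glob returns them in arbitrary order.

```python
import glob
import os
import tempfile

# Build a small, known directory layout (all names are hypothetical).
workdir = tempfile.mkdtemp()
for name in ["1.gif", "a.gif", "ab.gif", "notes.txt"]:
    open(os.path.join(workdir, name), "w").close()
os.mkdir(os.path.join(workdir, "sub"))
open(os.path.join(workdir, "sub", "deep.txt"), "w").close()

def names(pattern, recursive=False):
    """Return the sorted matches for pattern, relative to workdir."""
    # glob.escape guards against special characters in the directory path.
    full = os.path.join(glob.escape(workdir), pattern)
    hits = glob.glob(full, recursive=recursive)
    return sorted(os.path.relpath(h, workdir) for h in hits)

star = names("*.gif")      # * matches any run of characters in one segment
single = names("?.gif")    # ? matches exactly one character
digit = names("[0-9].*")   # [] matches one character from the range
flat = names("*.txt")      # * stops at the path separator
deep = names("**/*.txt", recursive=True)  # ** descends into subdirectories

print(star)
print(single)
print(digit)
print(flat)
print(deep)
```

Comparing flat with deep shows the segment rule: without the recursive ** form, the file inside sub is not matched.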
Application
- It is useful in any situation where your program needs to look for a list of files on the file system with names matching a pattern.
- If you need a list of filenames that have a certain extension, prefix, or any common string in the middle, use glob instead of writing code to scan the directory contents yourself.
import glob
# search .py files
# in the current working directory
for py in glob.glob("*.py"):
    print(py)
import glob
# Using character ranges []
print('Finding file using character ranges [] :- ')
print(glob.glob('./[0-9].*'))
# Using wildcard character *
print('\n Finding file using wildcard character * :- ')
print(glob.glob('*.gif'))
# Using wildcard character ?
print('\n Finding file using wildcard character ? :- ')
print(glob.glob('?.gif'))
# Using recursive attribute
print('\n Finding files using recursive attribute :- ')
print(glob.glob('**/*.txt', recursive=True))
Shutil Module: Filesystem Shell Methods
import os
from os import path
import shutil
from zipfile import ZipFile
# make a duplicate of an existing file
if path.exists("textfile.txt"):
    # get the path to the file in the current directory
    src = path.realpath("textfile.txt")
    # let's make a backup copy by appending ".bak" to the name
    dst = src + ".bak"
    # now use shutil to make a copy of the file
    shutil.copy(src, dst)
    # copy over the permissions, modification times, and other info
    shutil.copystat(src, dst)
    # rename the original file
    os.rename("textfile.txt", "newfile.txt")
    # now put things into a ZIP archive
    root_dir, tail = path.split(src)
    shutil.make_archive("archive", "zip", root_dir)
    # more fine-grained control over ZIP files
    with ZipFile("testzip.zip", "w") as newzip:
        newzip.write("newfile.txt")
        newzip.write("textfile.txt.bak")
PyFileSystem
pip install fs
pip show fs
Name: fs
Version: 2.4.14
Summary: Python's filesystem abstraction layer
Home-page: https://github.com/PyFilesystem/pyfilesystem2
Author: Will McGugan
Author-email: will@willmcgugan.com
License: MIT
Location: /usr/local/lib/python3.7/dist-packages
Requires: appdirs, six, setuptools, pytz
Required-by:
# import the PyFilesystem library for OS files
from fs.osfs import OSFS
# open a local filesystem for the current directory
with OSFS(".") as myfs:
    if not myfs.exists("testdir"):
        # create a sample data directory
        myfs.makedir("testdir")
    # create a file
    with myfs.open("testdir/samplefile.txt", mode='w') as f:
        f.write("This is some text")
    # read the file contents
    with myfs.open("testdir/samplefile.txt") as f:
        content = f.read()
        print(content)
    # use the getinfo() function to return resource information
    info = myfs.getinfo("testdir/samplefile.txt", namespaces=['details'])
    print(info.name)
    print(info.is_dir)
    print(info.size)
    print(info.type)
    print(info.modified)
samplefile.txt
False
17
ResourceType.file
2021-12-23 04:09:10.571525+00:00
# try opening and reading a ZIP archive
from fs.zipfs import ZipFS
with ZipFS("FileExamples.zip") as thezip:
    if thezip.exists("FileExamples/File1.txt"):
        with thezip.open("FileExamples/File1.txt") as f:
            content = f.read()
            print(content)
from fs.osfs import OSFS
# print a directory tree listing
with OSFS(".") as myfs:
    myfs.tree()
|-- .config
|   |-- configurations
|   |   `-- config_default
|   |-- logs
|   |   `-- 2021.12.03
|   |       |-- 14.32.30.027140.log
|   |       |-- 14.32.50.522723.log
|   |       |-- 14.33.09.955489.log
|   |       |-- 14.33.16.964195.log
|   |       |-- 14.33.36.903459.log
|   |       `-- 14.33.37.701606.log
|   |-- .last_opt_in_prompt.yaml
|   |-- .last_survey_prompt.yaml
|   |-- .last_update_check.json
|   |-- active_config
|   |-- config_sentinel
|   `-- gce
|-- sample_data
|   |-- anscombe.json
|   |-- california_housing_test.csv
|   |-- california_housing_train.csv
|   |-- mnist_test.csv
|   |-- mnist_train_small.csv
|   `-- README.md
`-- testdir
    `-- samplefile.txt
# use directory operation functions
# with OSFS(".") as myfs:
#     dirlist = myfs.listdir("testdir")
#     print(dirlist)
# with OSFS(".") as myfs:
#     dirlist = list(myfs.scandir("testdir"))
#     print(dirlist)
with OSFS(".") as myfs:
    dirlist = list(myfs.filterdir("testdir", files=["*.txt"]))
    print(dirlist)
# use resource info with scandir
with OSFS(".") as myfs:
    dirlist = myfs.scandir("testdir", namespaces=["details"])
    for info in dirlist:
        print(info.name, info.size)
# make a copy of a directory
# with OSFS(".") as myfs:
#     myfs.copydir("testdir", "CopyOftestdir", create=True)
# remove a directory
# with OSFS(".") as myfs:
#     if myfs.exists("CopyOftestdir"):
#         # myfs.removedir("CopyOftestdir")
#         myfs.removetree("CopyOftestdir")
[<file 'samplefile.txt'>]
samplefile.txt 17
# Example file for using the File System walker
from fs.osfs import OSFS
from fs.zipfs import ZipFS
# create a basic file walker
with OSFS(".") as myfs:
    print("-- Files --")
    # use the files walker to process files
    for path in myfs.walk.files(filter=["*.txt"]):
        print(path)
    print("-- Directories --")
    # use the dirs walker for directories
    for path in myfs.walk.dirs():
        print(path)
# use the info property to step through items
# with OSFS(".") as myfs:
#     for path, info in myfs.walk.info(namespaces=["details"]):
#         print(path, info.is_dir, info.size)
# use the walk object by itself:
# with OSFS("FileExamples") as myfs:
#     for step in myfs.walk():
#         print(step.path)
#         print(step.files)
#         print(step.dirs)
# use the walker with a ZIP
# with ZipFS("FileExamples.zip") as thezip:
#     print("-- Zip Contents --")
#     for path in thezip.walk.files():
#         print(path)
-- Files --
/testdir/samplefile.txt
-- Directories --
/.config
/testdir
/sample_data
/.config/logs
/.config/configurations
/.config/logs/2021.12.03
from fs.osfs import OSFS
# Challenge - figure out the total size of all text files in a folder structure
totalsize = 0
# Create a file walker to walk the current directory
with OSFS(".") as myfs:
    # We need to specify the details namespace to get size info
    for path, info in myfs.walk.info(namespaces=["details"]):
        # Check for an ending extension of .txt
        if path.endswith(".txt") and not info.is_dir:
            totalsize += info.size
# print the final results
print("Total size of files is: {0}".format(totalsize))
Total size of files is: 17
CSV IO
# importing the csv module
import csv
# my data rows as dictionary objects
mydict = [{'branch': 'COE', 'cgpa': '9.0', 'name': 'Nikhil', 'year': '2'},
          {'branch': 'COE', 'cgpa': '9.1', 'name': 'Sanchit', 'year': '2'},
          {'branch': 'IT', 'cgpa': '9.3', 'name': 'Aditya', 'year': '2'},
          {'branch': 'SE', 'cgpa': '9.5', 'name': 'Sagar', 'year': '1'},
          {'branch': 'MCE', 'cgpa': '7.8', 'name': 'Prateek', 'year': '3'},
          {'branch': 'EP', 'cgpa': '9.1', 'name': 'Sahil', 'year': '2'}]
# field names
fields = ['name', 'branch', 'year', 'cgpa']
# name of csv file
filename = "university_records.csv"
# writing to csv file; newline='' lets the csv module control line endings
with open(filename, 'w', newline='') as csvfile:
    # creating a csv dict writer object
    writer = csv.DictWriter(csvfile, fieldnames=fields)
    # writing headers (field names)
    writer.writeheader()
    # writing data rows
    writer.writerows(mydict)
# importing csv module
import csv
# csv file name
filename = "university_records.csv"
# initializing the titles and rows list
fields = []
rows = []
# reading csv file; newline='' lets the csv module handle line endings
with open(filename, 'r', newline='') as csvfile:
    # creating a csv reader object
    csvreader = csv.reader(csvfile)
    # extracting field names through first row
    fields = next(csvreader)
    # extracting each data row one by one
    for row in csvreader:
        rows.append(row)
    # line_num counts every line read, including the header row
    print("Total no. of rows: %d" % (csvreader.line_num))
# printing the field names
print('Field names are:' + ', '.join(fields))
# printing first 5 rows
print('\nFirst 5 rows are:\n')
for row in rows[:5]:
    # print each column of a row right-aligned, on one line
    for col in row:
        print("%10s" % col, end=" ")
    print()
Total no. of rows: 7
Field names are:name, branch, year, cgpa
First 5 rows are:
    Nikhil        COE          2        9.0
   Sanchit        COE          2        9.1
    Aditya         IT          2        9.3
     Sagar         SE          1        9.5
   Prateek        MCE          3        7.8