Learning Outcomes

  • To learn how to delete a single directory.
  • To learn how to delete multiple directories.
  • To practice deleting multiple .txt files and other file types.

So in the last episode we learned how to combine multiple .csv files within Python.

However, as is often the case, once we’ve finished wrangling the data, we’ll need to delete these specific files from our local environment.

Therefore, it’s vital that we can:

  • Delete multiple directories within Python.
  • Delete all of the files within our current working directory that are of a specific file type (.csv, .txt, etc.).

Import packages

import os
import glob
import pandas as pd
import shutil
!pwd
/Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files

Notice how when we run !ls from the Jupyter notebook, or ls from the terminal/command line, we can see several .txt files and directories: some that we’d like to delete and some that we’d like to keep safe:


!ls - Jupyter notebook
ls - Command line

Let’s create a scenario where we have several folders and files that we’d like to either keep or delete.


Directories To Delete:

  • ahrefs_backlink_data
  • csv_data_to_delete
  • digital_marketing_content
  • seo_marketing_content

Directories To Keep Safe:

  • I_never_want_to_delete_this_folder

Files To Delete:

  • delete_me.txt
  • delete_this_file.txt
  • practicing_deleting.txt

Files To Keep Safe:

  • keepthisfilesafe.txt
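
If you would like to follow along without risking any real files, the scenario above can be recreated in a throwaway location. This is only a minimal sketch: the sandbox variable and the use of a temporary directory are assumptions for practice purposes, not part of the original project folder.

```python
import os
import tempfile

# Assumption: we practise inside a fresh temporary directory,
# so nothing important can be deleted by accident.
sandbox = tempfile.mkdtemp(prefix="delete_practice_")

folders = [
    "ahrefs_backlink_data",
    "csv_data_to_delete",
    "digital_marketing_content",
    "seo_marketing_content",
    "I_never_want_to_delete_this_folder",
]
files = [
    "delete_me.txt",
    "delete_this_file.txt",
    "practicing_deleting.txt",
    "keepthisfilesafe.txt",
]

# Create the empty practice folders.
for name in folders:
    os.makedirs(os.path.join(sandbox, name), exist_ok=True)

# Opening a file in "w" mode and closing it creates an empty file.
for name in files:
    with open(os.path.join(sandbox, name), "w") as handle:
        handle.write("")

print(sandbox)
print(sorted(os.listdir(sandbox)))
```

You could then set path (or change your working directory) to the printed sandbox location before running the rest of the episode.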

Deleting Specific File Directories With Python

Firstly let’s see if we can find some patterns within the directories that we would like to delete or keep!

As we can see, the directories that we want to keep and the ones that we want to delete all contain underscores, so underscores can’t tell them apart. However, we want to delete every sub-directory except one, so we can just:

  1. Obtain all of the directories within the current working directory.
  2. Remove I_never_want_to_delete_this_folder from our Python list and then delete the remaining directories!

So let’s code that up!

!pwd
/Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files
# Let's define our current path here:
# You will need to change this to be unique to your specific directory path:
path = '/Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files'

This command will scan all of the files and folders within the path we defined above. We also filter the results by adding an if statement that checks that each item returned by os.scandir() is a folder:


[some_code_here if f.is_dir()]

list_subfolders_with_paths = [f.path for f in os.scandir(path) 
if f.is_dir()]

print(f'''These are all of the current subfolders within our current working directory:
      \n {list_subfolders_with_paths}''')

These are all of the current subfolders within our current working directory:
['/Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/seo_marketing_content', '/Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/ahrefs_backlink_data', '/Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/I_never_want_to_delete_this_folder', '/Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/.ipynb_checkpoints', '/Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/digital_marketing_content', '/Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/csv_data_to_delete']

Now we can use a list comprehension to select only the folder paths that do not contain the string “I_never_want”.

subfolders_to_delete = [folder_name for folder_name in list_subfolders_with_paths 
                        if "I_never_want" not in folder_name]
print(f"These are the subfolders that we would like to delete: \n\n {subfolders_to_delete}")
These are the subfolders that we would like to delete: 
['/Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/seo_marketing_content', '/Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/ahrefs_backlink_data', '/Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/.ipynb_checkpoints', '/Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/digital_marketing_content', '/Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/csv_data_to_delete']

The important syntax to note above is:


if "I_never_want" not in folder_name

This means that as we loop over every folder_name, any path that does not contain “I_never_want” is included in the new list. The folder I_never_want_to_delete_this_folder does contain this string, so it is excluded from the final Python list. Notice that the hidden .ipynb_checkpoints folder, which Jupyter creates automatically, is also in the list and will be deleted too.
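
One thing to be aware of is that the check runs against the whole path, so it would also exclude every folder if a parent directory happened to contain “I_never_want”. A slightly stricter sketch compares only the final folder name against a set of names to keep. The function name select_folders_to_delete and the example paths are hypothetical and only there for illustration.

```python
import os

# Folder names (not full paths) that must never be deleted.
folders_to_keep = {"I_never_want_to_delete_this_folder"}


def select_folders_to_delete(folder_paths, keep):
    """Return the paths whose final component is not listed in `keep`."""
    return [p for p in folder_paths if os.path.basename(p) not in keep]


# Hypothetical paths: note that the parent folder also contains "I_never_want".
example_paths = [
    "/tmp/I_never_want_project/seo_marketing_content",
    "/tmp/I_never_want_project/I_never_want_to_delete_this_folder",
]

selected = select_folders_to_delete(example_paths, folders_to_keep)
print(selected)
```

With a plain substring test on the full path, both example paths would have been skipped. Matching on the folder name alone selects only seo_marketing_content.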


Now that we’ve got all of the subfolders in a list we will just create a for loop to delete every folder with the following command:


shutil.rmtree()

for folder in subfolders_to_delete:
    print(folder)
    print('-----')
    shutil.rmtree(folder)
    print(f"Deleted the {folder} from your hard drive")

/Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/seo_marketing_content
-----
Deleted the /Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/seo_marketing_content from your hard drive
/Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/ahrefs_backlink_data
-----
Deleted the /Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/ahrefs_backlink_data from your hard drive
/Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/.ipynb_checkpoints
-----
Deleted the /Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/.ipynb_checkpoints from your hard drive
/Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/digital_marketing_content
-----
Deleted the /Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/digital_marketing_content from your hard drive
/Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/csv_data_to_delete
-----
Deleted the /Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/csv_data_to_delete from your hard drive

Pro Tip: shutil.rmtree() and os.remove() delete permanently. Nothing goes to your recycling bin, so there is no undo!

So definitely make sure to use print() statements and double check that the files / folders are the ones you would like to delete before committing to it!
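
One way to build that habit into your code is a “dry run” switch that only prints what would be removed unless you explicitly opt in. The sketch below is an assumption about how you might structure it: delete_folders and its dry_run parameter are hypothetical names, and the demo runs inside a temporary directory.

```python
import os
import shutil
import tempfile


def delete_folders(folder_paths, dry_run=True):
    """Delete each folder in `folder_paths`; with dry_run=True, only report."""
    deleted = []
    for folder in folder_paths:
        if dry_run:
            print(f"[dry run] would delete {folder}")
            continue
        shutil.rmtree(folder)
        deleted.append(folder)
        print(f"Deleted {folder}")
    return deleted


# Demo in a temporary directory so no real data is touched.
demo_root = tempfile.mkdtemp()
demo_folder = os.path.join(demo_root, "csv_data_to_delete")
os.mkdir(demo_folder)

dry_result = delete_folders([demo_folder])  # reports only, removes nothing
exists_after_dry_run = os.path.isdir(demo_folder)

real_result = delete_folders([demo_folder], dry_run=False)  # actually deletes
exists_after_real_run = os.path.isdir(demo_folder)
```

Because dry_run defaults to True, forgetting the argument is harmless. You have to type dry_run=False to delete anything.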


We can double check that all of the folders have been deleted by running either:


!ls in a Jupyter notebook
ls in a terminal / Git Bash
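
The same check can also be done from Python with os.path.isdir(), which is handy inside a script. This is a small sketch run against a temporary directory. The helper name remaining_folders is hypothetical.

```python
import os
import shutil
import tempfile


def remaining_folders(paths):
    """Return only the paths that still exist as directories."""
    return [p for p in paths if os.path.isdir(p)]


# Build a tiny practice tree in a temporary directory.
demo_root = tempfile.mkdtemp()
kept = os.path.join(demo_root, "I_never_want_to_delete_this_folder")
removed = os.path.join(demo_root, "seo_marketing_content")
os.mkdir(kept)
os.mkdir(removed)

# Delete one folder, then ask which of the two is still there.
shutil.rmtree(removed)
still_there = remaining_folders([kept, removed])
print(still_there)
```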

How To Delete Specific File Types In Your Current Working Directory With Python

Now that we’ve deleted all of the folders, along with the files and folders inside them, let’s practice deleting some specific .txt files from our current working directory!

I’m going to show you two different ways we could solve this problem:

  1. The file that we want to keep doesn’t contain underscores _ , therefore we could delete all of the files containing underscores.
  2. All of the files that we want to delete also contain the phrase “delet” so we could technically delete all files which match this text string.

How To Get All Of The Files Within The Current Working Directory In Python

Firstly, let’s list everything in the current working directory and then keep only the items that are files:

files = [f for f in os.listdir('.') 
         if os.path.isfile(f)]
print(files)

['.DS_Store', 'how-to-delete-multiple-files-in-python.ipynb', 'practicing_deleting.txt', 'delete_this_file.txt', 'keepthisfilesafe.txt', 'delete_me.txt']

Remember that the two methods below target the same files, so once you’ve run one of them, the other will find nothing left to delete:

Method One:


for f in files:
    # Look at every file name and, if it contains an underscore, delete that file!
    # Note: hidden files such as .DS_Store also contain an underscore and will match.
    if "_" in f:
        print(f)
        # os.remove() allows us to easily remove single files <3 
        os.remove(f)

# Output:
# .DS_Store
# practicing_deleting.txt
# delete_this_file.txt
# delete_me.txt

Method Two:

for f in files:
    # Only match files that end with .txt and contain "delet" within the file name
    if f.endswith(".txt") and "delet" in f:
        print(f)
        # You would need to uncomment the method below!
        # os.remove(f)
# Output: 
# practicing_deleting.txt
# delete_this_file.txt
# delete_me.txt
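
To compare the two rules side by side without touching the disk, you can apply them to a plain list of names. The list below reuses the file names printed earlier in this episode. Nothing is deleted, so it is safe to run anywhere.

```python
file_names = [
    ".DS_Store",
    "how-to-delete-multiple-files-in-python.ipynb",
    "practicing_deleting.txt",
    "delete_this_file.txt",
    "keepthisfilesafe.txt",
    "delete_me.txt",
]

# Rule one: any name containing an underscore.
contains_underscore = [name for name in file_names if "_" in name]

# Rule two: .txt files whose name contains "delet".
txt_with_delet = [
    name for name in file_names
    if name.endswith(".txt") and "delet" in name
]

# Names matched by rule one but not by rule two.
only_rule_one = [n for n in contains_underscore if n not in txt_with_delet]

print(contains_underscore)
print(txt_with_delet)
print(only_rule_one)
```

The only difference is .DS_Store, which rule one picks up because of its underscore. The more specific rule two leaves it alone.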

How To Delete Multiple File Types Within The Current Working Directory

Now let’s make Method Two slightly more complex. For example, let’s say we wanted to delete files with several different extensions, including .pdf, .csv and .txt files!

# The touch command allows us to create new files via terminal:
!touch awesomefile.pdf
!touch text.csv
!touch thisisatest.txt
files = [f for f in os.listdir('.') if os.path.isfile(f)]

for f in files:
    if f.endswith(('.pdf','.csv', '.txt')):
        # This will only look at files ending with the above extensions! 
        print(f, "would be deleted if os.remove() was uncommented!")
        # os.remove(f)
text.csv would be deleted if os.remove() was uncommented!
awesomefile.pdf would be deleted if os.remove() was uncommented!
keepthisfilesafe.txt would be deleted if os.remove() was uncommented!
thisisatest.txt would be deleted if os.remove() was uncommented!

Another method would be to use negation, i.e. delete everything except the files that match a condition:


if not (some_condition)  # some_condition evaluates to True or False
for f in files:
    if not f.endswith(('.txt', '.ipynb')):
        print(f, "would be deleted if os.remove() was uncommented!")
        # os.remove(f)
text.csv would be deleted if os.remove() was uncommented!
awesomefile.pdf would be deleted if os.remove() was uncommented!

Remember!

.endswith() accepts a tuple of things that you want to match against and returns True if any of them match.
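
Here is a quick sketch of that behaviour on a plain list of names. The names notes.TXT and notebook.ipynb are made up for the example. Two details are worth knowing: the match is case-sensitive, and a list is not accepted in place of the tuple.

```python
extensions = (".pdf", ".csv", ".txt")
names = ["awesomefile.pdf", "text.csv", "notes.TXT", "notebook.ipynb"]

# Case-sensitive match: "notes.TXT" is NOT matched by ".txt".
matches = [n for n in names if n.endswith(extensions)]

# Lower-casing the name first makes the check case-insensitive.
matches_any_case = [n for n in names if n.lower().endswith(extensions)]

# Passing a list instead of a tuple raises a TypeError.
try:
    "report.txt".endswith([".txt"])
    list_accepted = True
except TypeError:
    list_accepted = False

print(matches)
print(matches_any_case)
print(list_accepted)
```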

How To Search For Specific File Types From Current Directories Downwards

If you have lots of nested subfolders and would like to find matching files anywhere inside them, you can use the os.walk() function, which visits the starting folder and every folder beneath it:


for root, dirnames, filenames in os.walk(folder):
    for filename in filenames:
        if filename.endswith(extensions):
            do_something()
!pwd
/Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files
folder="/Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files"
# Create a list of matches 
matches = []
extensions = ('.csv', '.txt')

for root, dirnames, filenames in os.walk(folder):
    for filename in filenames:
        if filename.endswith(extensions):
            matches.append(os.path.join(root, filename))

print("The last directory visited by os.walk() was:", root)
print("\n These are the file matches obtained from the starting folder and all of the directories below it:", matches)
The last directory visited by os.walk() was: /Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/.ipynb_checkpoints

 These are the file matches obtained from the starting folder and all of the directories below it: ['/Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/text.csv', '/Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/keepthisfilesafe.txt', '/Users/jamesphoenix/Desktop/Imran_And_James/Python_For_SEO/3_how_to_delete_multiple_local_files/thisisatest.txt']
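
It helps to remember that the root variable changes on every iteration of os.walk(): it is whichever directory is currently being visited, starting with the folder you passed in. The sketch below records each visited directory in a small temporary tree. The folder and file names (reports, 2020, a.csv and so on) are invented for the demo.

```python
import os
import tempfile

# Build a small nested tree inside a temporary directory.
top = tempfile.mkdtemp()
os.makedirs(os.path.join(top, "reports", "2020"))

csv_file = os.path.join(top, "a.csv")
txt_file = os.path.join(top, "reports", "b.txt")
pdf_file = os.path.join(top, "reports", "2020", "c.pdf")
for file_path in (csv_file, txt_file, pdf_file):
    open(file_path, "w").close()  # create an empty file

visited = []  # every directory os.walk() passes through
matches = []  # full paths of the files we are interested in
extensions = (".csv", ".txt")

for root, dirnames, filenames in os.walk(top):
    visited.append(root)
    for filename in filenames:
        if filename.endswith(extensions):
            # filename is a bare name, so join it with root to get a full path.
            matches.append(os.path.join(root, filename))

print(visited)
print(matches)
```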

How To Delete Specific File Types In All Of The Directories Below Your Current Working Directory

Now what about recursively deleting all files with a specific file type in a series of subfolders?

No problem!

We will still use the os.walk() function, however notice that instead of appending the results to a list, we can just delete the file instead:


for root, dirnames, filenames in os.walk(folder):
    for filename in filenames:
        if filename.endswith(extensions):
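            # filename is only the bare file name, so join it with root to build the full path.
            os.remove(os.path.join(root, filename)) # Notice here how we are using os.remove() instead of appending to a list!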
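
Since os.walk() yields bare file names, each one has to be joined with root before it is handed to os.remove(). The following sketch wraps the loop in a reusable function with a dry-run switch and tries it out in a temporary directory. The function name delete_by_extension, its dry_run parameter and the demo file names are assumptions made for illustration.

```python
import os
import tempfile


def delete_by_extension(folder, extensions, dry_run=True):
    """Walk `folder` and delete files ending in `extensions`.

    Returns the full paths that matched. With dry_run=True nothing is removed.
    """
    targets = []
    for root, dirnames, filenames in os.walk(folder):
        for filename in filenames:
            if filename.endswith(extensions):
                full_path = os.path.join(root, filename)
                targets.append(full_path)
                if not dry_run:
                    os.remove(full_path)
    return targets


# Demo tree in a temporary directory.
demo_root = tempfile.mkdtemp()
nested = os.path.join(demo_root, "exports", "2020")
os.makedirs(nested)

csv_top = os.path.join(demo_root, "text.csv")
csv_nested = os.path.join(nested, "old_export.csv")
notebook = os.path.join(nested, "notes.ipynb")
for file_path in (csv_top, csv_nested, notebook):
    open(file_path, "w").close()

preview = delete_by_extension(demo_root, (".csv", ".txt"))  # dry run
removed = delete_by_extension(demo_root, (".csv", ".txt"), dry_run=False)
print(removed)
```

Running the dry run first and comparing preview against what you expect is a cheap safeguard before the real deletion.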

Hopefully you can see how easy it is to delete files and folders within your local folders at scale.

Being able to read and delete multiple files lets you create simple data pipelines such as:

  • Manually download 100x .csv files.
  • Automatically open all of the .csv files.
  • Concatenate the .csv files together into a pandas dataframe.
  • Perform some data manipulation on the merged data.
  • Save the concatenated pandas dataframe as a new csv i.e. master.csv
  • Delete all of the original .csv files.
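
The steps above can be sketched end to end as follows. This is only an illustration under several assumptions: the “downloaded” files are generated in a temporary directory rather than downloaded manually, the column names keyword and clicks are invented, and the manipulation step is a trivial placeholder.

```python
import glob
import os
import tempfile

import pandas as pd

# Stand-in for the manual download step: three small .csv files.
workdir = tempfile.mkdtemp()
for i in range(3):
    frame = pd.DataFrame({"keyword": [f"kw{i}"], "clicks": [i]})
    frame.to_csv(os.path.join(workdir, f"export_{i}.csv"), index=False)

# Collect the original files BEFORE master.csv exists, so it is never deleted.
csv_paths = sorted(glob.glob(os.path.join(workdir, "*.csv")))

# Open and concatenate the files into one dataframe.
merged = pd.concat([pd.read_csv(p) for p in csv_paths], ignore_index=True)

# Placeholder data manipulation.
merged["clicks_doubled"] = merged["clicks"] * 2

# Save the merged data, then delete the originals.
master_path = os.path.join(workdir, "master.csv")
merged.to_csv(master_path, index=False)
for p in csv_paths:
    os.remove(p)

print(os.listdir(workdir))
```

Building the list of files to delete before writing master.csv is the important design choice here. A fresh "*.csv" search afterwards would match the new file as well.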

In the next episode, we’ll be learning how we can further automate our data analysis and data pipelines by directly reading and writing data to Google Sheets in pandas!