How to Create a Python Script to Extract File Information Using AWS Cloud9

Abidoye Joshua mayowa
5 min read · Apr 7, 2023

Thanks for checking out my blog today. This is the 13th week with Level Up In Tech! This week we are going to use AWS Cloud9 to write a Python script that generates a list of dictionaries describing the files in my working directory, and then displays the files and their sizes across that directory and its subdirectories.

What is Python:

Python is a general-purpose, versatile, and powerful programming language. It’s a great first language because Python code is concise and easy to read. It is widely used for web development, machine learning, data science, and cloud resource automation. Python is also one of the easier programming languages to learn, largely due to its simple, readable syntax.

AWS Cloud9:

AWS Cloud9 is a cloud-based integrated development environment (IDE) that lets you write, run, and debug your code with a browser. It includes a code editor, debugger, and terminal. Cloud9 comes prepackaged with essential tools for popular programming languages, including JavaScript, Python, PHP, and more. You don’t need to install files or configure your development machine to start new projects. Since your Cloud9 IDE is cloud-based, you can work on your projects from your office, home, or anywhere using an internet-connected machine. Cloud9 also provides a seamless experience for developing serverless applications, enabling you to easily define resources, debug, and switch between local and remote execution. With Cloud9, you can quickly share your development environment with your team, enabling you to pair program and track each other’s inputs in real time.

Python Basics:

List: Lists are used to store multiple items in a single variable. Lists are one of 4 built-in data types in Python used to store collections of data. Lists are created using square brackets, with the items separated by commas.
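As a small illustration of the list description above (the file names are made up for this example):

```python
# A list literal uses square brackets; items keep their insertion order.
file_names = ["notes.txt", "script.py", "README.md"]

# Lists are changeable: append adds an item to the end.
file_names.append("data.csv")

print(file_names[0])    # access the first item by index
print(len(file_names))  # count the items in the list
```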

Dictionaries: Dictionaries store data values in key: value pairs. A dictionary is a collection that is ordered (as of Python 3.7), changeable, and does not allow duplicate keys.
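A short sketch of a dictionary, shaped like the entries the script builds later. The name and size values here are invented for illustration:

```python
# One dictionary describing one file.
file_entry = {"name": "script.py", "size": 512}

# Values are looked up by key.
print(file_entry["name"])

# Dictionaries are changeable: assigning to an existing key replaces its
# value, so a key never appears twice.
file_entry["size"] = 1024
print(file_entry)
```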

For loop: A for loop is used for iterating over a sequence (a list, a tuple, a dictionary, a set, or a string). It is less like the for keyword in some other programming languages and works more like an iterator method found in other object-oriented programming languages.
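For example, a for loop can visit each dictionary in a list in turn. This is only a sketch with made-up entries, and the total_size variable is an addition for illustration:

```python
files = [
    {"name": "a.txt", "size": 10},
    {"name": "b.txt", "size": 25},
]

total_size = 0
# The loop variable takes each list item in order, one per iteration.
for entry in files:
    print(entry["name"], entry["size"])
    total_size += entry["size"]

print(total_size)
```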

OS Module: The os module in Python provides functions for interacting with the operating system. It is part of Python’s standard library and offers a portable way of using operating system-dependent functionality. The os and os.path modules include many functions for interacting with the file system.
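To get a feel for the module, here is a minimal sketch that lists the entries in the current working directory and labels each one as a file or a directory. The output depends entirely on where it is run:

```python
import os

# os.getcwd returns the absolute path of the current working directory.
current_dir = os.getcwd()

# os.listdir returns the names of the entries directly inside a directory.
entries = os.listdir(current_dir)

for entry in entries:
    # os.path.join builds a path with the correct separator for the OS.
    full_path = os.path.join(current_dir, entry)
    kind = "dir" if os.path.isdir(full_path) else "file"
    print(kind, entry)
```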

Since we have covered the basics, let’s get the project done.

Scenario:

Your company needs to learn about the files located on various machines. You have been asked to build a script that extracts information such as the name and size of the files in the current working directory and stores it in a list of dictionaries.

Create a script that generates a list of dictionaries about files in the working directory. Then print the list.

Push your code to GitHub and include the link in your write-up.

Prerequisites:

  • Elementary knowledge of the AWS command line.
  • Foundational knowledge of the Python programming language.
  • Basic knowledge of the AWS Cloud9 integrated development environment (IDE).

Create the .py script:

To create a script, you need to have an AWS account, log into your AWS Cloud9 environment, and create a Python file to work on.

  • Firstly, we need to create a new file before we get started! Select File > New From Template > Python File, then Save As, and give it a name. Make sure you keep the .py extension.

To tell the system to run the script with the Python interpreter, use the following shebang line at the top of your code:

#!/usr/bin/env python3.7
  • Secondly, we will import the os module, which allows us to interact with the file system:
# importing os module
import os
  • Next, we will get the current working directory using the code below:
# Get the current working directory
current_dir = os.getcwd()
  • Create an empty list to store file information:
# Create an empty list to store file information
file_info = []
  • Create a dictionary with the name and size of each file and add it to the list:
# Walk through the directory tree and get file information
for dirpath, dirnames, filenames in os.walk(current_dir):
    for file in filenames:
        file_path = os.path.join(dirpath, file)
        file_size = os.path.getsize(file_path)
        file_info.append({"name": file, "size": file_size})
  • Finally, print the list to see the dictionaries with each file's name and size:
# Print the list of file information
for file in file_info:
    print(file)
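The walk step is the least obvious part of the script, so here is a sketch of what os.walk yields. It builds a small throwaway directory tree with the standard tempfile module. The file and folder names (app.py, logs, run.log) and the walked list are made up for this example:

```python
import os
import tempfile

# Build a small temporary directory tree so the walk has known contents.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "logs"))
    with open(os.path.join(root, "app.py"), "w") as handle:
        handle.write("print('hi')\n")
    with open(os.path.join(root, "logs", "run.log"), "w") as handle:
        handle.write("started\n")

    walked = []
    # os.walk yields one (dirpath, dirnames, filenames) tuple per directory,
    # starting at the root and descending into each subdirectory.
    for dirpath, dirnames, filenames in os.walk(root):
        relative = os.path.relpath(dirpath, root)
        walked.append((relative, sorted(dirnames), sorted(filenames)))
        print(relative, dirnames, filenames)
```

Because each tuple carries the directory path separately from the file names, the script has to join them with os.path.join before asking for a file's size.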

The whole code we need to generate a list of dictionaries for files in our working directory with a script is attached below:

#!/usr/bin/env python3.7

import os

# Get the current working directory
current_dir = os.getcwd()

# Create an empty list to store file information
file_info = []

# Walk through the directory tree and get file information
for dirpath, dirnames, filenames in os.walk(current_dir):
    for file in filenames:
        file_path = os.path.join(dirpath, file)
        file_size = os.path.getsize(file_path)
        file_info.append({"name": file, "size": file_size})

# Print the list of file information
for file in file_info:
    print(file)
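If you want to reuse the logic on other directories, one option is to wrap it in a function. The following is only a sketch, not part of the original script: the function name collect_file_info, the extra "path" key, and the error handling for unreadable entries are all assumptions added for illustration.

```python
import os


def collect_file_info(root_dir):
    """Return a list of dictionaries describing every file under root_dir."""
    file_info = []
    for dirpath, dirnames, filenames in os.walk(root_dir):
        for file_name in filenames:
            file_path = os.path.join(dirpath, file_name)
            try:
                file_size = os.path.getsize(file_path)
            except OSError:
                # Skip entries that cannot be read, such as broken symlinks.
                continue
            # "path" is an extra key (an assumption) that tells apart files
            # with the same name in different directories.
            file_info.append(
                {"name": file_name, "path": file_path, "size": file_size}
            )
    return file_info


if __name__ == "__main__":
    # Same behavior as the script above: report on the working directory.
    for entry in collect_file_info(os.getcwd()):
        print(entry)
```

Keeping the path alongside the name is a design choice: the original script records only the file name, so two files called README.md in different folders would look identical in the output.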

The next step is for us to run the code on AWS Cloud9. This script will generate a list of dictionaries containing the name and size of each file in our working directory.

To list the files, enter the following command:

ls

From the image above, we can see a list of the files in our current working directory. Great! It’s time for us to run the script!

We did it. Congrats, guys; the name and size of each file are displayed in the image above.

Finally, push the code to GitHub. Here the branch is named proj13:

git push --set-upstream origin proj13

Thank you, guys, for reading my post for the week.

As always, if you found this insightful, please give my blog a follow and connect with me on LinkedIn:

https://www.linkedin.com/in/joshua-abidoye-0ab796195/

Abidoye Joshua mayowa

DevOps Engineer. I'm interested in collaborating with anyone interested in cloud engineering or cloud DevOps.