Day 1 of 101 Days of Python

May 2, 2020
I've decided to start working on a series of posts that I hope can eventually become a useful cookbook for others. This series will be focused on beginner problems posted to Reddit that are easily digestible, and will use multiple components from Python in each post. I'm setting a goal of 101 posts, which is probably too large, but I'm going for it anyway.
So let's kick off 101 days of Python!
The problem from Reddit (I cleaned the grammar a bit):
Okay I have a CSV file with a bunch of variables... Name, Address, Phone Number, Email. In that order.
Now what I need to do is create another CSV file with only people who have certain characters in their address...
i.e. I need everyone from "12" Or "Maple Avenue" and "Wyoming". So Basically I need to search the Address from each csv row for those 3 terms and create another CSV with Name, Address, Phone Number, and Email of only the people that have those either (12 AND Wyoming) in it or (Maple Avenue AND Wyoming) in it...
Let's start with something concrete, and break the problem into smaller steps.
The program will run against some test data in a file named data/day_1.csv. The contents of the test file:
John,1234 Main St.,555-123-4567,
Naomi,234 Apartment 412,555-758-9865,
Bob,Amazing Penthouse on Park Place,555-785-5544,
Sarah,456 Main St.,555-456-7845,
Let's work through this with the following steps in mind:
(1) Read the data from the csv file
(2) Convert the data to something usable by Python
(3) Filter the data to only what is needed using the search terms
(4) Write the filtered data to an output csv file
Step (1) is easy: we just need to read the data. We can set up a simple function that opens the file, reads the data, then closes the file.
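def read_csv(input_file):
    # The with block closes the file even if reading raises an error
    with open(input_file, "r") as f:
        # One string per line, each still ending in "\n"
        data = f.readlines()
    return data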
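As an aside, Python's standard csv module can read and split the rows in one pass, and it copes with quoted fields that contain commas. Here is a minimal sketch, assuming the same four-column layout. The name read_rows and the quoted sample row are made up for illustration and are not part of the program in this post.

```python
import csv
import io


def read_rows(csv_file):
    """Return each non-empty CSV row as a list of field strings."""
    reader = csv.reader(csv_file)
    # csv.reader yields an empty list for blank lines, so skip those
    return [row for row in reader if row]


# In-memory stand-in for open("data/day_1.csv", newline="")
# The second row is hypothetical sample data with commas inside quotes
sample = io.StringIO(
    'John,1234 Main St.,555-123-4567,\n'
    '"Doe, Jane","12 Maple Avenue, Wyoming",555-000-1111,\n'
)
rows = read_rows(sample)
print(rows[1][1])
```

The rest of this post sticks with plain string splitting, which is fine as long as no field contains a comma.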
Step (2) is a little more complicated. We now have a list of strings from Step (1), where each string in the list contains a row of data from the csv. Let's convert each row into an object that we can use. This can be done with the ContactInfo class below. The data will be stored in the class fields name, address, phone_number and email. The class also has two helper methods: to_string builds the csv row for the output file, and print displays it.
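class ContactInfo:
    def __init__(self, name, address, phone_number, email):
        self.name = name
        self.address = address
        self.phone_number = phone_number
        # email is the last field in the row, so drop the trailing newline
        self.email = email.strip("\n")

    def to_string(self):
        # Rebuild the csv row in the original column order
        return ",".join([self.name, self.address, self.phone_number, self.email])

    def print(self):
        print(self.to_string())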
The ContactInfo class will store the data, but we need a way to convert each row of the csv (which is still a string) into a ContactInfo object. Let's do this with a function called create_contact_info.
def create_contact_info(data):
    # Expects exactly four comma-separated fields in the row
    [name, address, phone_number, email] = data.split(",")
    contact_info = ContactInfo(name, address, phone_number, email)
    return contact_info
Now we can pass create_contact_info one row of the csv as a string and get back a ContactInfo object.
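One thing to watch: the unpacking in create_contact_info raises a ValueError when a row does not split into exactly four fields, for example a blank line at the end of the file. Below is a minimal sketch of a guard. The helper split_row and the sample lines are hypothetical. It returns None for bad rows so the caller can skip them.

```python
def split_row(line, expected_fields=4):
    """Split one csv line; return None if the field count is wrong."""
    fields = line.rstrip("\n").split(",")
    if len(fields) != expected_fields:
        return None
    return fields


# A good row, a blank line, and a row with too few fields
lines = ["John,1234 Main St.,555-123-4567,\n", "\n", "Bob,too,few\n"]
rows = [fields for fields in map(split_row, lines) if fields is not None]
print(rows)
```

Only the first line survives the filter. Each surviving list could then be unpacked into ContactInfo as before.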
Step (3) requires a function that will filter a list of ContactInfo objects based on address search terms provided.
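import copy  # belongs with the other imports at the top of day_1.py

def search_address(search_terms, contact_info):
    # Don't alter contact info
    filtered_data = copy.deepcopy(contact_info)
    # Each pass keeps only the contacts whose address contains the term,
    # so a contact must match every term to survive
    for search_term in search_terms:
        filtered_data = list(filter(lambda x: search_term in x.address, filtered_data))
    return filtered_data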
Now search_address will take a list of search terms and a list of ContactInfo objects, and return the ContactInfo objects whose address contains all of the search terms.
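The original question actually asks for an OR of two AND groups: ("12" AND "Wyoming") OR ("Maple Avenue" AND "Wyoming"). search_address covers one AND group per call. Here is a sketch of how the groups could be combined in a single check. It works on plain address strings to stay self-contained, and both matches_any_group and the sample addresses are hypothetical.

```python
def matches_any_group(address, term_groups):
    """True if every term of at least one group appears in the address."""
    return any(all(term in address for term in group) for group in term_groups)


# Each inner list is one AND group; the outer list is the OR
term_groups = [["12", "Wyoming"], ["Maple Avenue", "Wyoming"]]

# Made-up addresses for illustration only
addresses = [
    "12 Oak St. Wyoming",
    "7 Maple Avenue Wyoming",
    "12 Maple Avenue Ohio",
    "99 Elm St. Wyoming",
]
selected = [a for a in addresses if matches_any_group(a, term_groups)]
print(selected)
```

Only the first two addresses are selected. The third is missing "Wyoming" and the fourth has neither "12" nor "Maple Avenue".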
For Step (4) we just need to write the filtered list of ContactInfo objects to a new csv file. Luckily this is just as easy as reading the csv lines.
def write_csv(contact_data, output_file):
    # The with block closes the file once every row is written
    with open(output_file, "w") as f:
        for contact_info in contact_data:
            f.write(contact_info.to_string() + "\n")
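If a field could ever contain a comma, joining with "," would produce a broken row. The standard csv module's writer quotes such fields automatically. This is a minimal sketch under that assumption. write_rows and the sample row are hypothetical, and an in-memory buffer stands in for the output file.

```python
import csv
import io


def write_rows(rows, csv_file):
    """Write each row (a list of field strings) as one csv line."""
    writer = csv.writer(csv_file, lineterminator="\n")
    writer.writerows(rows)


# Stand-in for open(output_file, "w", newline="")
buffer = io.StringIO()
# Made-up row whose name contains a comma
write_rows([["Doe, Jane", "12 Maple Avenue", "555-000-1111", ""]], buffer)
print(buffer.getvalue())
```

The name comes out wrapped in double quotes, so a csv reader will still see four fields.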
Let's create a test and put all of it together:
def test(with_io=False):
    search_terms = ["Main"]
    if with_io:
        csv_info = read_csv("data/day_1.csv")
        # Build a real list so search_address can deep-copy it
        contact_info = list(map(create_contact_info, csv_info))
    else:
        # Name,Address,Phone Number,Email
        csv_info = \
            "John,1234 Main St.,555-123-4567,\n" \
            "Naomi,234 Apartment 412,555-758-9865,\n" \
            "Bob,Amazing Penthouse on Park Place,555-785-5544,\n" \
            "Sarah,456 Main St.,555-456-7845,"
        contact_info = list(map(create_contact_info, csv_info.split("\n")))
    filtered_contact_info = search_address(search_terms, contact_info)
    # Print each contact whose address matched every search term
    for contact in filtered_contact_info:
        contact.print()
Looks like it works just fine:
>>> from day_1 import *
>>> test()
John,1234 Main St.,555-123-4567,
Sarah,456 Main St.,555-456-7845,
>>> test(True)
John,1234 Main St.,555-123-4567,
Sarah,456 Main St.,555-456-7845,
Finally, let's create a way to run this from the command line:
import argparse  # belongs with the other imports at the top of day_1.py

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('-f', '--input_file', help='Input file name', required=True)
    # nargs="+" collects one or more search terms into a list
    parser.add_argument('-s', '--search_terms', help='Address search terms', required=True, nargs="+")
    parser.add_argument('-o', '--output_file', help='Output file name', required=True)
    args = parser.parse_args()

    # Read, convert, filter, write: steps (1) through (4)
    csv_info = read_csv(args.input_file)
    contact_info = list(map(create_contact_info, csv_info))
    filtered_contact_info = list(search_address(args.search_terms, contact_info))
    write_csv(filtered_contact_info, args.output_file)
Try running it yourself from the command line to see if it gives you the correctly filtered csv file as an output. I'm searching for "Main" and "1234" in the address here, but the search terms from the original question work the same way. Since every term passed to -s must match, run it once for "12" "Wyoming" and once for "Maple Avenue" "Wyoming".
$ python3 day_1.py -f "data/day_1.csv" -s "Main" "1234" -o "data/day_1_filtered.csv"
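To check how the arguments are parsed without touching any files, parse_args also accepts an explicit list in place of the real command line. Here is a small sketch. build_parser is a hypothetical wrapper around the same three arguments used above.

```python
import argparse


def build_parser():
    """Build a parser with the same three options as the main block."""
    parser = argparse.ArgumentParser()
    parser.add_argument('-f', '--input_file', help='Input file name', required=True)
    parser.add_argument('-s', '--search_terms', help='Address search terms', required=True, nargs="+")
    parser.add_argument('-o', '--output_file', help='Output file name', required=True)
    return parser


# Same arguments as the shell command, passed as a list
args = build_parser().parse_args(
    ["-f", "data/day_1.csv", "-s", "Main", "1234", "-o", "data/day_1_filtered.csv"]
)
print(args.search_terms)
```

Because of nargs="+", both "Main" and "1234" end up in the args.search_terms list.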
All files are available here: