Day 2 of 101 Days of Python

May 2, 2020
Day 2 is another question from Reddit. This one seems interesting because it involves building a consolidated set of Google searches, where each one can be individually opened in a tab.
How to write a script for automating a set of Google searches with different parameters for a literature review
I am doing a literature review for my field (genetics), and I want to write a script for some of the searches I am doing as I have like 20+ DNA targets that I want to look for in association with other key terms. I have searched online for my issue but I get a lot of SEO and Google ad stuff. To explain in better detail:
Say I have
DNA-1
DNA-2
DNA-3
and I want to run 3 searches for
DNA-1 Immune System
DNA-2 Immune system
DNA-3 Immune system
In one go (albeit in three tabs), how can I automate this? Once I have this script then it would be really handy so I can change "Immune System" for "Asthma" and run another three searches at once easily looking for those three targets. It's just this is for 26 DNA sequences so manually doing this is becoming tedious very fast.
Apologies for the noob question, this isn't my field and I would spend time learning myself but I am pretty wrapped up in this review!
Thanks in advance.
To investigate, let's first take a look at what Google search requests look like when you search for something.
Let's search Google for "Green Apples" and see what the GET request looks like.
If we grab the url from the browser, it should look something like this:
https://www.google.com/search?sxsrf=ALeKk01jwsOF3KQqufsi0ZGRUkOTgRDCxA%3A1588521916826&source=hp&ei=vOuuXvyFMOeJggexgoL4Aw&q=Green+Apples&oq=Green+Apples&gs_lcp=CgZwc3ktYWIQDDICCAAyAggAMgIIADICCAAyAggAMgIIADICCAAyAggAMgIIADICCAA6BAgjECc6BQgAEJECOgUIABCDAVCPCVjVFmCiNmgBcAB4AIABtQGIAdIHkgEEMTAuMpgBAKABAaoBB2d3cy13aXo&sclient=psy-ab&ved=0ahUKEwj808WkiZjpAhXnhOAKHTGBAD8Q4dUDCAw
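One way to see which parts of that url carry the search terms is to split its query string into named parameters. Here is a minimal sketch using Python's standard urllib.parse module. The url in the sketch is a trimmed copy of the one above, with the long opaque values left out for readability.

```python
from urllib.parse import urlsplit, parse_qs

# Trimmed copy of the search url above (the long opaque values are left out).
url = (
    "https://www.google.com/search"
    "?source=hp&q=Green+Apples&oq=Green+Apples&sclient=psy-ab"
)

# parse_qs decodes "+" back into spaces and maps each name to a list of values.
params = parse_qs(urlsplit(url).query)

for name, values in params.items():
    print(name, "=", values)
```

The search terms show up under both q and oq, with the space encoded as +.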
Let's see if we can replace the search terms in q=Green+Apples and oq=Green+Apples with the DNA search terms.
Copy and paste the search below into your browser - it has the "Green Apples" search terms replaced with "DNA-1 Immune System" search terms.
https://www.google.com/search?sxsrf=ALeKk01jwsOF3KQqufsi0ZGRUkOTgRDCxA%3A1588521916826&source=hp&ei=vOuuXvyFMOeJggexgoL4Aw&q=DNA-1+Immune+System&oq=DNA-1+Immune+System&gs_lcp=CgZwc3ktYWIQDDICCAAyAggAMgIIADICCAAyAggAMgIIADICCAAyAggAMgIIADICCAA6BAgjECc6BQgAEJECOgUIABCDAVCPCVjVFmCiNmgBcAB4AIABtQGIAdIHkgEEMTAuMpgBAKABAaoBB2d3cy13aXo&sclient=psy-ab&ved=0ahUKEwj808WkiZjpAhXnhOAKHTGBAD8Q4dUDCAw
It works great! So now we know that all we need to do is build this search string for each set of desired search terms, then open the url in a new tab.
Let's see how this can be automated with Python. We will actually build a webpage with clickable links that will make this whole process nice to view when we want to see all of the links.
So the steps for automating this process with Python will be:
(1) Build a list of Google search urls from search terms provided
(2) Build a clickable html link for each of these search terms
(3) Put all of these links onto a webpage where we can click on them
Step (1) is really easy: we just need a function that inserts a search term into the Google search url above, with the words separated by the + character.
def get_google_search_address(search_term):
    # Spaces in the search term become "+" in both the q and oq parameters.
    search_address = \
        "https://www.google.com/search" + \
        "?sxsrf=ALeKk01jwsOF3KQqufsi0ZGRUkOTgRDCxA%3A1588521916826&" + \
        "source=hp&ei=vOuuXvyFMOeJggexgoL4Aw&q=" + \
        search_term.replace(" ", "+") + \
        "&oq=" + search_term.replace(" ", "+") + \
        "&gs_lcp=CgZwc3ktYWIQDDICCAAyAggAMgIIADICCAAyAggAMgIIADICCAAyAggAMgIIADICCAA6BAgjEC" + \
        "c6BQgAEJECOgUIABCDAVCPCVjVFmCiNmgBcAB4AIABtQGIAdIHkgEEMTAuMpgBAKABAaoBB2d3cy13aXo&s" + \
        "client=psy-ab&ved=0ahUKEwj808WkiZjpAhXnhOAKHTGBAD8Q4dUDCAw"
    return search_address
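Replacing spaces by hand works for simple terms, but a term containing characters such as & or % would need escaping as well. As an alternative sketch, urllib.parse.urlencode handles that escaping. This is not the function above: it sends only the q parameter, on the assumption that Google does not require the other session-specific parameters, and the name build_minimal_search_url is hypothetical.

```python
from urllib.parse import urlencode

def build_minimal_search_url(search_term):
    # Hypothetical helper: assumes the "q" parameter alone is enough
    # for Google to run the search.
    # urlencode turns spaces into "+" and percent-encodes reserved characters.
    return "https://www.google.com/search?" + urlencode({"q": search_term})

print(build_minimal_search_url("DNA-1 Immune System"))
print(build_minimal_search_url("R&D budget 100%"))
```

If the shorter url turns out not to behave the same way for you, the full url from the browser remains the fallback.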
Step (2) is also a breeze if you are familiar with html and css. We will add some css here to keep things pretty.
def build_search_term_link(search_term):
    # The href value is quoted so the "&" and "=" characters in the url
    # stay inside the attribute.
    return \
        "<div style=\"height: 50px;\">" + \
        "<a href=\"" + get_google_search_address(search_term) + "\" target=\"_blank\">" + \
        "<br>" + \
        "<h3>" + \
        "Open: " + search_term + " search" + \
        "</h3>" + \
        "<br>" + \
        "</a>" + \
        "</div>"
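Because the url contains & characters and the search term is arbitrary text, it can be safer to escape both before placing them in markup. Here is a small sketch with the standard html.escape function. The name build_escaped_link is hypothetical, and the helper takes the url as an argument so it can be tried on its own.

```python
from html import escape

def build_escaped_link(url, label):
    # Hypothetical helper: escape() converts &, <, > and quotes so the
    # values are safe inside an attribute and inside element text.
    return (
        '<a href="' + escape(url, quote=True) + '" target="_blank">'
        + escape(label)
        + "</a>"
    )

print(build_escaped_link(
    "https://www.google.com/search?q=DNA-1+Immune+System&source=hp",
    "Open: DNA-1 <Immune System> search",
))
```

The browser decodes &amp; back to & when the link is followed, so the search itself is unchanged.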
In Step (3), we need to combine all of these links into an html file, and then write the html file so that we can open it and click on all of the links. Here we will also add more css styling, as well as join search terms together.
Note that build_search_page will take two lists of search terms, the first being the DNA-1, DNA-2, etc... mentioned in the question, and the second being the search terms that we want to change on the fly.
def build_search_page(base_terms, search_terms):
    body_style = \
        "width: 500px;" + \
        "margin: 50px auto;" + \
        "border: 2px solid gray;" + \
        "padding: 20px;" + \
        "border-radius: 10px;"
    html = \
        "<!DOCTYPE html>" + \
        "<html>" + \
        "<head>" + \
        "</head>" + \
        "<body style= \"" + body_style + "\">" + \
        "<h1 style=\"text-align: center; line-height: 0;\"><br>Search Results<br></h1>"
    # Add one link per (base term, search term) combination.
    for base_term in base_terms:
        for search_term in search_terms:
            html += build_search_term_link(base_term + " " + search_term)
    # Close both the body and the html element.
    html += "</body></html>"
    return html

def write_html(base_terms, search_terms):
    # The with block closes the file even if writing fails.
    with open("day_2.html", "w") as f:
        f.write(build_search_page(base_terms, search_terms))
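The nested loops visit every pairing of a base term with a search term. The same pairing can be sketched with itertools.product from the standard library, which may be handy for checking how many searches a run will generate before writing the page. The name combine_terms is hypothetical.

```python
from itertools import product

def combine_terms(base_terms, search_terms):
    # Pair every base term with every search term, in the same order
    # as two nested for loops (base terms outermost).
    return [base + " " + term for base, term in product(base_terms, search_terms)]

queries = combine_terms(["DNA-1", "DNA-2", "DNA-3"], ["Immune System", "Asthma"])
print(len(queries))
for query in queries:
    print(query)
```

Three base terms and two search terms give six queries. The 26 DNA targets from the question with a single key term would give 26.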
Finally, let's build a main function that will perform all the steps listed. I'm setting this up so that we can run it from the console to generate an html page. I'm using the DNA search terms listed, along with some other disease-related search terms.
if __name__ == "__main__":
    base_terms = ["DNA-1", "DNA-2", "DNA-3"]
    search_terms = ["Immune System", "COVID-19", "Pandemic"]
    write_html(base_terms, search_terms)
Running this from the console will produce an html file named "day_2.html", which is a nice list of clickable links for each combination of search terms listed. Our use of target="_blank" asks the browser to open each link in a new tab.
So just run this command from the console, then click on the day_2.html file produced, and you will see the search links in your browser!
$ python3 day_2.py
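If you would rather have the tabs open directly, as the original question asked, the standard webbrowser module can do that. A hedged sketch follows. The name open_searches is hypothetical, the opener parameter exists only so the function can be tried without launching a browser, and whether you get tabs or windows depends on your browser.

```python
import webbrowser

def open_searches(urls, opener=webbrowser.open_new_tab):
    # Hypothetical helper: call the opener once per url and count the calls.
    # webbrowser.open_new_tab asks the default browser for a new tab.
    opened = 0
    for url in urls:
        opener(url)
        opened += 1
    return opened

# Dry run: print the url instead of opening it.
open_searches(
    ["https://www.google.com/search?q=DNA-1+Immune+System"],
    opener=print,
)
```

Calling open_searches(urls) without the opener argument would open one tab per url in the default browser, so it is worth keeping the list short.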