Lanterns in Loudoun, VA. Photo: Ozan Aygun
Web scraping using Selenium and Python
Let's say you would like to get some data through web scraping, but the information is somewhat buried in the target website. Perhaps it is not readily available at a simple URL, and you would need to perform a few "clicks" to reach it. Is there a way to do this programmatically? This is where Selenium comes in.
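To give a flavor of what that looks like, here is a minimal sketch: it launches a browser, waits for an element to become clickable, and clicks it. The URL and the element locator below are placeholders for illustration, not part of the NASDAQ example that follows.

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()          # requires a ChromeDriver available on your system
driver.get("https://example.com")    # placeholder URL

# Wait until the element is clickable, then "click" it programmatically
button = WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.ID, "some-button"))  # placeholder locator
)
button.click()

driver.quit()
```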
Here, I demonstrate how we can programmatically simulate browser actions using Selenium, with an example implementation in Python. The use case is scraping daily stock prices for about 4,000 securities listed on NASDAQ. The program accesses the NASDAQ website, waits for the page to load (the page has numerous banner ads), closes a pop-up window, and performs a few clicks to reach five years' worth of historical price data, downloading it as a CSV file for a given ticker symbol.
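As a rough sketch of that workflow, the function below opens the historical-data page for one ticker, closes a pop-up if one appears, switches to the 5-year view, and triggers the CSV download. The URL pattern, the element locators, and the helper name download_5y_history are all assumptions for illustration; the real page structure changes often, so you would adapt them to whatever the live site serves.

```python
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def download_5y_history(ticker: str, download_dir: str) -> None:
    # Route CSV downloads to a known folder instead of the browser default
    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", {"download.default_directory": download_dir})
    driver = webdriver.Chrome(options=options)
    wait = WebDriverWait(driver, 30)   # generous timeout: the page is ad-heavy

    try:
        # 1. Open the historical-data page for the ticker (URL pattern is assumed)
        driver.get(f"https://www.nasdaq.com/market-activity/stocks/{ticker.lower()}/historical")

        # 2. Close the pop-up window if it appears (locator is a placeholder)
        try:
            popup_close = wait.until(
                EC.element_to_be_clickable((By.CSS_SELECTOR, "button.popup-close"))
            )
            popup_close.click()
        except Exception:
            pass  # no pop-up this time

        # 3. Click through to the 5-year view (locator is a placeholder)
        five_year = wait.until(
            EC.element_to_be_clickable((By.XPATH, "//button[contains(., '5Y')]"))
        )
        five_year.click()

        # 4. Trigger the CSV download (locator is a placeholder)
        download = wait.until(
            EC.element_to_be_clickable((By.XPATH, "//button[contains(., 'Download')]"))
        )
        download.click()
        time.sleep(5)  # crude wait for the file to finish downloading
    finally:
        driver.quit()

# Example usage for a single ticker
download_5y_history("AAPL", "/tmp/nasdaq_csv")
```

Looping this over a list of ~4,000 ticker symbols gives the full download; in practice you would add retries and logging, since any one page load can fail.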
This tutorial should give you an idea of how to use Selenium to simulate a browser session and scrape relatively complex information from the web. One note before we start: the process is not necessarily the fastest and requires patience and optimization for a given project. As always, the recommended way of obtaining data is through databases, bulk downloads, or APIs whenever possible; web scraping should only be used as a last resort.
I hope you enjoyed reading this web scraping tutorial and learned something new today!