What is Web Scraping?
One of the more popular uses of Python, web scraping is a powerful technique for working with data found on the Internet. Also known as web harvesting, it involves a program reading through the HTML of websites to retrieve useful information, either for data processing or simply for sharing.
Before You Learn to Web Scrape...
To understand how web scraping is done, you need a basic understanding of HTML fundamentals and syntax. Being able to read and understand how HTML web pages are structured is good enough. Check out this resource if front-end languages seem foreign to you, or if you just need a bit of a refresher.
Modules Required
Web scraping revolves around breaking down the HTML content of web pages and extracting what you want. The third-party BeautifulSoup library (installed as the beautifulsoup4 package and imported from bs4) allows you to parse HTML into a format that you can work with. To download webpages, you can make use of urllib.request from the standard library or the third-party requests library.
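To give a feel for what "a format that you can work with" means, here is a minimal sketch of BeautifulSoup parsing a small piece of HTML. The HTML snippet, its tag ids and class names, and the variable names are all made up for illustration; a real page will have its own structure.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet standing in for a downloaded page.
SAMPLE_HTML = """
<html>
  <head><title>Sample Page</title></head>
  <body>
    <h1 id="main-title">Library News</h1>
    <ul class="links">
      <li><a href="/python">Python resources</a></li>
      <li><a href="/r">R resources</a></li>
    </ul>
  </body>
</html>
"""

# "html.parser" is Python's built-in parser, so no extra parser install is needed.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Text inside the <title> tag.
page_title = soup.title.string

# First <h1> whose id attribute is "main-title".
heading = soup.find("h1", id="main-title").get_text()

# The href attribute of every <a> tag on the page.
link_targets = [a["href"] for a in soup.find_all("a")]

print(page_title)
print(heading)
print(link_targets)
```

Once the HTML is parsed, finding a tag is mostly a matter of knowing its name, id or class, which is why being able to read HTML matters more than being able to write it.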
Approach
Web scraping can be done in many different ways, but the main approach is as follows:
- Use the requests library to pull data from the webpage
- Use the BeautifulSoup library to traverse and select the relevant portions
- Feed the extracted data into your main program, or save it to a file
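The three steps above can be sketched roughly as follows. This is only an illustration under some assumptions: the page layout (a div with class "quote" containing spans with classes "text" and "speaker"), the function names and the CSV output are all hypothetical. The download step uses urllib.request from the standard library; requests.get(url).text would serve the same purpose if you prefer the requests library.

```python
import csv
import urllib.request

from bs4 import BeautifulSoup


def fetch_html(url):
    """Step 1: download the raw HTML of a page."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def extract_quotes(html):
    """Step 2: select the parts of the page we care about.

    Assumes (hypothetically) that each quote sits in <div class="quote">,
    with the text in <span class="text"> and the speaker in
    <span class="speaker">.
    """
    soup = BeautifulSoup(html, "html.parser")
    quotes = []
    for block in soup.select("div.quote"):
        text = block.select_one("span.text").get_text(strip=True)
        speaker = block.select_one("span.speaker").get_text(strip=True)
        quotes.append({"speaker": speaker, "text": text})
    return quotes


def save_quotes(quotes, path):
    """Step 3: pass the results on, here by writing them to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["speaker", "text"])
        writer.writeheader()
        writer.writerows(quotes)


# Example usage (the URL is a placeholder, not a real quotes page):
# html = fetch_html("https://example.com/quotes")
# save_quotes(extract_quotes(html), "quotes.csv")
```

Keeping each step in its own function means you can try out the parsing step on a saved HTML string without hitting the website every time.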
Recommended Resources
There are many resources available to learn web scraping, depending on the type of learning style you prefer. Here are some resources we found to be most reliable and effective in learning the basics.
Python for Data Science Essential Training – Web Scrape in Practice
To access LyndaCampus, log in to NTULearn, go to “Self-paced Learning” and click on the LyndaCampus link provided, or simply log in at this link. Then, search for the course title given, or click on the image below.
This resource is also available in NTU Library. Click this link for more information.
Here’s an example of web scraping being used to extract random quotes from the TV series “How I Met Your Mother”.
Try making use of web scraping with the next application that you develop with Python, and share with us in the comments below.
Have fun scraping!
For more Python programming resources, check these other posts out.
Be sure to follow us on Twitter @NTUsgLibrary, and our hashtag #NTUsgLibraryDS!