Building Your First Web Scraper: A Step-by-Step Guide
Understanding the Basics of Web Scraping
Web scraping is a technique for extracting large amounts of data from websites and saving it to a local file on your computer or to a database in tabular form. To understand web scraping, let's first understand what a web page is. A web page is a document, most often written in HTML (HyperText Markup Language), the standard markup language for documents designed to be displayed in a web browser. HTML can embed scripts written in languages such as JavaScript, which affect the behavior and content of the page, while CSS (Cascading Style Sheets) defines the look and layout of the content. The browser's rendering engine interprets these languages to display the formatted page on your computer or mobile device.
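To make the idea of "extract, then save in tabular form" concrete, here is a minimal sketch that uses only Python's standard library. The sample page, the class name TableCellParser, the helper functions, and the column names "name" and "price" are all hypothetical and chosen purely for illustration. A real scraper would download the HTML over the network instead of using a hard-coded string.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this over HTTP.
SAMPLE_HTML = """
<html>
  <body>
    <table>
      <tr><td>Widget</td><td>4.99</td></tr>
      <tr><td>Gadget</td><td>12.50</td></tr>
    </table>
  </body>
</html>
"""


class TableCellParser(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows, each a list of cell strings
        self._row = None        # row currently being built, if any
        self._in_cell = False   # True while inside a <td> element

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._in_cell = False

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._in_cell and self._row:
            self._row[-1] += data.strip()


def scrape_rows(html):
    """Return the table rows found in the given HTML string."""
    parser = TableCellParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_csv(rows):
    """Render rows as CSV text with an assumed two-column header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name", "price"])
    writer.writerows(rows)
    return buffer.getvalue()


rows = scrape_rows(SAMPLE_HTML)
csv_text = rows_to_csv(rows)
print(csv_text)
```

The CSV text is built in memory here to keep the sketch self-contained. Saving it to a local file would only require passing a file opened with open(..., "w", newline="") to csv.writer in place of the StringIO buffer.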
When we scrape a page, we work directly with its HTML to collect data, so a basic understanding of HTML structure is essential. HTML consists of tags enclosed in angle brackets. For example, a paragraph opens with the tag <p> and closes with the tag </p>. Each web page is a hierarchy of tags. There are different types of tags like