Family of markup languages for displaying information viewable in a web browser.
HTML, or HyperText Markup Language, is the standard markup language for creating web pages. It is a cornerstone technology of the World Wide Web, and understanding it is essential for web scraping. This article gives an overview of the basics of HTML.
HTML is used to describe the structure of web pages using markup. The elements of HTML are the building blocks of all websites. HTML allows images and objects to be embedded and can be used to create interactive forms. It also provides a means to create structured documents by denoting structural semantics for text such as headings, paragraphs, lists, links, quotes and other items.
HTML tags are the hidden keywords within a web page that define how your web browser must format and display the content. Most tags must have two parts, an opening and a closing part. For example, <html> is the opening tag and </html> is the closing tag. Note that the closing tag has the same text as the opening tag, but with an additional forward slash (/).
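To make the pairing of opening and closing tags concrete, here is a minimal sketch using HTMLParser from Python's standard library. The class name TagLogger, the helper log_tags, and the sample markup are all assumptions made up for this illustration; they do not come from any particular page.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Records each opening and closing tag in the order the parser meets it."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # Called for an opening tag such as <p>
        self.events.append(("open", tag))

    def handle_endtag(self, tag):
        # Called for a closing tag such as </p>
        self.events.append(("close", tag))


# Hypothetical sample markup, used only for this illustration
SAMPLE = "<html><body><p>Hello</p></body></html>"


def log_tags(markup):
    """Return a list of (kind, tag_name) pairs for the given markup."""
    parser = TagLogger()
    parser.feed(markup)
    parser.close()
    return parser.events


if __name__ == "__main__":
    for kind, tag in log_tags(SAMPLE):
        print(kind, tag)
```

Each element in the sample is opened once and closed once, and the closing tags arrive in the reverse order of the opening tags, which reflects how elements nest inside one another.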
HTML attributes are special words used inside the opening tag to control the element's behaviour. An attribute either modifies the default functionality of an element type or provides functionality that certain element types need in order to work correctly. For example, the href attribute in the <a> (anchor) tag specifies the URL of the page the link goes to.
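As a rough sketch of how attributes can be read programmatically, the following example collects the href value of every <a> tag using Python's standard html.parser module. The names LinkCollector and collect_links, as well as the example.com URL and the /about path in the sample markup, are hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs taken from the opening tag
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


# Hypothetical sample markup, used only for this illustration
SAMPLE = (
    '<p>See <a href="https://example.com/docs">the docs</a> '
    'and <a href="/about">about</a>.</p>'
)


def collect_links(markup):
    """Return the href values found in the markup, in document order."""
    parser = LinkCollector()
    parser.feed(markup)
    parser.close()
    return parser.links


if __name__ == "__main__":
    print(collect_links(SAMPLE))
```

Note that the attributes are only available in the handler for the opening tag, which matches the description above: attributes live inside the opening tag, never the closing one.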
An HTML document is mainly structured into a head and a body. The head element contains the title and metadata of the document. The body element contains the information that you want to display on the web page. Each HTML document begins with the declaration <!DOCTYPE html> to help the browser identify the document type.
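The head and body layout described above can be sketched with a small parser that separates the declaration, the title, and the visible body text. This is only an illustration under stated assumptions: the class OutlineParser, the helper outline, and the sample page are invented for this example, and the section tracking is deliberately simple rather than a full HTML tree builder.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Splits a document into its doctype, its title, and its body text."""

    def __init__(self):
        super().__init__()
        self.doctype = None
        self.title = ""
        self.body_text = []
        self._section = None      # "head", "body", or None
        self._in_title = False

    def handle_decl(self, decl):
        # For <!DOCTYPE html> the parser passes the string "DOCTYPE html"
        self.doctype = decl

    def handle_starttag(self, tag, attrs):
        if tag in ("head", "body"):
            self._section = tag
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == self._section:
            self._section = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return  # skip whitespace between tags
        if self._in_title:
            self.title += text
        elif self._section == "body":
            self.body_text.append(text)


# Hypothetical sample page, used only for this illustration
SAMPLE = """<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title>Sample page</title>
  </head>
  <body>
    <h1>Hello</h1>
    <p>This text is displayed.</p>
  </body>
</html>"""


def outline(markup):
    """Return a dict with the doctype, title, and body text of the markup."""
    parser = OutlineParser()
    parser.feed(markup)
    parser.close()
    return {
        "doctype": parser.doctype,
        "title": parser.title,
        "body_text": parser.body_text,
    }


if __name__ == "__main__":
    print(outline(SAMPLE))
```

The title ends up in its own field because it belongs to the head, while only the text inside the body is collected as displayed content.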
While HTML provides the structure, Cascading Style Sheets (CSS) control presentation, formatting, and layout, so CSS is used alongside HTML to style websites. JavaScript, on the other hand, is a popular programming language used to create dynamic, interactive content on websites.
Understanding the basics of HTML is crucial for web scraping because it shows you how the data is structured and how to access it. In the next unit, we will introduce Beautiful Soup, a Python library for pulling data out of HTML and XML files.