What is a web browser
A web browser is a computer program that allows users to download and view websites.
History of the web browser
The first web browser, called WorldWideWeb, was created in 1990 by Sir Tim Berners-Lee. He then recruited Nicola Pellow to write the Line Mode Browser, which displayed web pages on dumb terminals. 1993 was a landmark year with the release of Mosaic, credited as “the world’s first popular browser”. Its innovative graphical user interface made the World Wide Web system easy to use and thus more accessible to the average person. This, in turn, sparked the Internet boom of the 1990s, when the Web grew at a very rapid rate. Marc Andreessen, the leader of the Mosaic team, soon started his own company, Netscape, which released the Mosaic-influenced Netscape Navigator in 1994. Navigator quickly became the most popular browser.
Microsoft debuted Internet Explorer in 1995, leading to a browser war with Netscape. Within a few years, Microsoft gained a dominant position in the browser market for two reasons: it bundled Internet Explorer with its popular Windows operating system and did so as freeware with no restrictions on usage. The market share of Internet Explorer peaked at over 95% in the early 2000s.
In 1998, Netscape launched what would become the Mozilla Foundation to create a new browser using the open source software model. This work evolved into the Firefox browser, first released by Mozilla in 2004. Firefox market share peaked at 32% in 2010.
Apple released its Safari browser in 2003. Safari remains the dominant browser on Apple devices, though it did not become popular elsewhere.
Google debuted its Chrome browser in 2008, which steadily took market share from Internet Explorer and became the most popular browser in 2012. Chrome has remained dominant ever since.
Microsoft released its Edge browser in 2015 as part of the Windows 10 release. (Internet Explorer is still used on older versions of Windows.)
Components
- User Interface: This component lets end-users interact with the browser's own visual elements, such as the address bar, home button, and back and forward buttons. It covers every part of the browser display except the area where the requested web page is shown.
- Browser Engine: It is a core component of every web browser. The browser engine functions as an intermediary or a bridge between the user interface and the rendering engine. It queries and handles the rendering engine as per the inputs received from the user interface.
- Rendering Engine: As the name suggests, this component is responsible for rendering a specific web page requested by the user on their screen.
- Networking: This component is responsible for managing network calls using standard protocols like HTTP or FTP. It also looks after security issues associated with internet communication.
- JavaScript Interpreter: As the name suggests, it is responsible for parsing and executing the JavaScript code embedded in a website.
- UI Backend: This component uses the user interface methods of the underlying operating system. It is mainly used for drawing basic widgets (windows and combo boxes).
- Data Storage/Persistence: It is a persistent layer. A web browser needs to store various types of data locally, for example, cookies. As a result, browsers must be compatible with data storage mechanisms such as WebSQL, IndexedDB, FileSystem, etc.
Process
Navigation
TCP Handshake
TCP (Transmission Control Protocol) uses a three-way handshake to set up a TCP/IP connection over an IP based network.
- SYNchronize
The host, generally the browser, sends a TCP SYNchronize packet to the server.
- SYNchronize-ACKnowledgement
The server receives the SYN and sends back a SYNchronize-ACKnowledgement.
- ACKnowledge
The host receives the server’s SYN-ACK and sends an ACKnowledge. The server receives ACK and the TCP socket connection is established.
This handshake step happens after a DNS lookup and before the TLS handshake, which creates a secure connection. The connection can be terminated independently by each side of the connection via a four-way handshake.
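The three steps above can be sketched as a small simulation. This is only an illustration of how the sequence and acknowledgement numbers relate, not a real network stack: the `Segment` class, the `three_way_handshake` function, and the initial sequence numbers 1000 and 5000 are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Segment:
    """A simplified TCP segment holding only the fields the handshake needs."""
    flags: str            # "SYN", "SYN-ACK", or "ACK"
    seq: int              # sender's sequence number
    ack: Optional[int]    # next sequence number expected from the peer


def three_way_handshake(client_isn: int, server_isn: int) -> list:
    """Return the three segments exchanged to open a connection."""
    # 1. Client -> server: SYN carrying the client's initial sequence number.
    syn = Segment("SYN", seq=client_isn, ack=None)
    # 2. Server -> client: SYN-ACK. A SYN consumes one sequence number,
    #    so the server acknowledges client_isn + 1.
    syn_ack = Segment("SYN-ACK", seq=server_isn, ack=syn.seq + 1)
    # 3. Client -> server: ACK acknowledging the server's SYN.
    ack = Segment("ACK", seq=syn_ack.ack, ack=syn_ack.seq + 1)
    return [syn, syn_ack, ack]


for segment in three_way_handshake(client_isn=1000, server_isn=5000):
    print(segment)
```

Real initial sequence numbers are chosen unpredictably by each side and wrap around at 32 bits; fixed values are used here only to keep the arithmetic easy to follow.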
TLS (Transport Layer Security) Negotiation
This step determines which cipher will be used to encrypt the communication, verifies the server, and establishes that a secure connection is in place before beginning the actual transfer of data. This requires additional round trips to the server before the request for content is actually sent: a full TLS 1.2 handshake adds two, while TLS 1.3 needs only one.
(Connection setup therefore consists of the DNS lookup, the TCP handshake, and the TLS handshake messages: ClientHello, ServerHello and certificate, client key exchange, and a Finished message from both client and server.)
While making the connection secure adds time to the page load, a secure connection is worth the latency expense, as the data transmitted between the browser and the web server cannot be decrypted by a third party.
After these round trips, the browser is finally able to make the request.
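To make the negotiation a little more concrete, the sketch below uses Python's standard `ssl` module to inspect what a client brings to the handshake: whether it will verify the server, and which cipher suites it is prepared to offer. It does not open a connection, and the exact list of suites depends on the local OpenSSL build, so treat the printed output as illustrative.

```python
import ssl

# Build a client-side context with Python's recommended defaults.
context = ssl.create_default_context()

# Server verification: both the certificate chain and the hostname are checked.
print("verify certificate:", context.verify_mode == ssl.CERT_REQUIRED)
print("check hostname:", context.check_hostname)

# Cipher suites this client is willing to use; the server picks one
# of them during the handshake.
offered = [cipher["name"] for cipher in context.get_ciphers()]
print(len(offered), "cipher suites offered, for example", offered[:3])
```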
TTFB (Time to First Byte)
TTFB refers to the time between the browser requesting a page and when it receives the first byte of information from the server. This time includes:
- DNS lookup
- establishing the connection using a TCP handshake
- TLS handshake (if the request is made over HTTPS)
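As a rough illustration of how these phases add up, the sketch below breaks a TTFB value into its parts. The `NavigationTimestamps` class, its field names, and the sample values are hypothetical; they are loosely modeled on the kind of timestamps browsers record during navigation, expressed in milliseconds since navigation started.

```python
from dataclasses import dataclass


@dataclass
class NavigationTimestamps:
    """Hypothetical timestamps, in milliseconds since navigation started."""
    dns_start: int
    dns_end: int
    connect_start: int
    tls_start: int
    connect_end: int      # TCP and TLS handshakes have both finished
    request_start: int
    response_start: int   # the first byte of the response arrives


def ttfb_breakdown(t: NavigationTimestamps) -> dict:
    """Split the time to first byte into the phases listed above."""
    return {
        "dns": t.dns_end - t.dns_start,
        "tcp": t.tls_start - t.connect_start,
        "tls": t.connect_end - t.tls_start,
        "waiting": t.response_start - t.request_start,
        "ttfb": t.response_start - t.dns_start,
    }


# Made-up sample values for a single HTTPS navigation.
sample = NavigationTimestamps(
    dns_start=0, dns_end=30, connect_start=30, tls_start=60,
    connect_end=120, request_start=120, response_start=200,
)
print(ttfb_breakdown(sample))
```

With these sample values the four phases (30, 30, 60, and 80 ms) sum to the 200 ms total, which shows that connection setup can account for a large share of TTFB before the server has done any work.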
TCP Slow Start / 14 KB rule
The first response will carry about 14 KB of data, which is the server's initial congestion window (typically ten TCP segments). This is part of TCP slow start, an algorithm that matches the sending rate to what the network connection can carry. Slow start gradually increases the amount of data transmitted until the network’s maximum bandwidth can be determined.
As the server sends data in TCP packets, the user’s client confirms delivery by returning acknowledgements, or ACKs. If the server sends too many packets too quickly, they will be dropped, meaning there will be no acknowledgement.
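The effect of slow start on page weight can be estimated with a small calculation. The sketch below assumes an initial window of ten segments of 1460 bytes each (a common default, about 14 KB), no packet loss, and a window that doubles after every fully acknowledged round; the function name `round_trips_to_deliver` and the sample sizes are made up for the example.

```python
def round_trips_to_deliver(total_bytes: int,
                           initial_window_segments: int = 10,
                           segment_bytes: int = 1460) -> int:
    """Count the round trips needed to deliver total_bytes under idealized slow start."""
    window = initial_window_segments * segment_bytes  # 14,600 bytes, about 14 KB
    sent = 0
    round_trips = 0
    while sent < total_bytes:
        sent += window
        round_trips += 1
        window *= 2  # every segment was acknowledged, so the window doubles
    return round_trips


for size in (14_000, 50_000, 100_000, 1_000_000):
    print(f"{size:>9} bytes -> {round_trips_to_deliver(size)} round trip(s)")
```

Under these assumptions a response that fits in the first 14 KB arrives in a single round trip, while anything larger needs at least one more, which is the reasoning behind the 14 KB rule.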
Critical Rendering Path
Parsing
Once the browser receives the first chunk of data, it can begin parsing the information received. Parsing is the step the browser takes to turn the data it receives over the network into the DOM and CSSOM, which are used by the renderer to paint a page to the screen.
It’s important for web performance optimization to include everything the browser needs to start rendering a page, or at least a template of the page (the CSS and HTML needed for the first render), in the first 14 kilobytes.
Building the DOM tree
The first step is processing the HTML markup and building the DOM tree. HTML parsing involves tokenization and tree construction. HTML tokens include start and end tags, as well as attribute names and values. If the document is well-formed, parsing it is straightforward and faster. The parser parses tokenized input into the document, building up the document tree.
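A minimal sketch of tokenization followed by tree construction is shown below. It uses Python's standard `html.parser` module as the tokenizer and a stack of open elements to build the tree. The `Node`, `TreeBuilder`, and `dump` names are hypothetical, and the logic is far simpler than what a real browser does (there is no error recovery and no implied tags), so it only suits well-formed input like the sample.

```python
from html.parser import HTMLParser


class Node:
    def __init__(self, name, attrs=None, text=None):
        self.name = name
        self.attrs = dict(attrs or [])
        self.text = text
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a simplified document tree from the tokens HTMLParser emits."""
    VOID = {"br", "hr", "img", "input", "link", "meta"}  # elements with no end tag

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]  # open elements; the last one is the current parent

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in self.VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest matching open element; stray end tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].name == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].children.append(Node("#text", text=data.strip()))


def dump(node, depth=0):
    """Return the tree as indented lines, one per node."""
    label = node.text if node.name == "#text" else node.name
    lines = ["  " * depth + label]
    for child in node.children:
        lines.extend(dump(child, depth + 1))
    return lines


builder = TreeBuilder()
builder.feed("<html><body><h1>Title</h1><p>Hello <b>world</b></p></body></html>")
print("\n".join(dump(builder.root)))
```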
When the parser finds non-blocking resources, such as an image, the browser will request those resources and continue parsing. Parsing can continue when a CSS file is encountered, but <script> tags (particularly those without an async or defer attribute) block rendering and pause the parsing of HTML. Though the browser’s preload scanner hastens this process, excessive scripts can still be a significant bottleneck.
Preload Scanner
While the browser builds the DOM tree, this process occupies the main thread. As this happens, the preload scanner will parse through the content available and request high priority resources like CSS, JavaScript, and web fonts.
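The idea of the preload scanner can be sketched as a lightweight pass that looks only for resource URLs and builds no tree. The example below again relies on Python's `html.parser`; the `PreloadScanner` class and the file names in the sample markup are hypothetical, and a real scanner also handles fonts, `srcset`, preload hints, and request prioritization.

```python
from html.parser import HTMLParser


class PreloadScanner(HTMLParser):
    """Collects URLs of external resources without building a tree."""

    def __init__(self):
        super().__init__()
        self.resources = []  # (kind, url) pairs in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel") == "stylesheet" and attrs.get("href"):
            self.resources.append(("style", attrs["href"]))
        elif tag == "script" and attrs.get("src"):
            self.resources.append(("script", attrs["src"]))
        elif tag == "img" and attrs.get("src"):
            self.resources.append(("image", attrs["src"]))


# Sample markup with made-up file names.
markup = """
<head>
  <link rel="stylesheet" href="styles.css">
  <script src="app.js"></script>
</head>
<body><img src="hero.png"></body>
"""

scanner = PreloadScanner()
scanner.feed(markup)
for kind, url in scanner.resources:
    print(kind, url)
```

Because the scan never waits for a script to execute, the URLs it finds can be requested while the main parser is still blocked.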
Building the CSSOM
The second step in the critical rendering path is processing CSS and building the CSSOM tree. The CSS object model is similar to the DOM. The DOM and CSSOM are both trees. They are independent data structures. The browser converts the CSS rules into a map of styles it can understand and work with. The browser goes through each rule set in the CSS, creating a tree of nodes with parent, child, and sibling relationships based on the CSS selectors.
As with HTML, the browser needs to convert the received CSS rules into something it can work with. Hence, it repeats the HTML-to-object process, but for the CSS.
The CSSOM tree includes styles from the user agent style sheet. The browser begins with the most general rule applicable to a node and recursively refines the computed styles by applying more specific rules. In other words, it cascades the property values.
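The cascade described above can be approximated in a few lines. This sketch supports only single tag, class, and id selectors, and it ignores inheritance, combinators, and `!important`; the functions `specificity`, `matches`, and `computed_style`, along with the sample rules, are assumptions made for illustration.

```python
def specificity(selector: str) -> tuple:
    """Return (ids, classes, tags) for one simple selector like 'p', '.note', or '#intro'."""
    if selector.startswith("#"):
        return (1, 0, 0)
    if selector.startswith("."):
        return (0, 1, 0)
    return (0, 0, 1)


def matches(selector: str, node: dict) -> bool:
    if selector.startswith("#"):
        return node.get("id") == selector[1:]
    if selector.startswith("."):
        return selector[1:] in node.get("classes", [])
    return node["tag"] == selector


def computed_style(node: dict, user_agent_rules: list, author_rules: list) -> dict:
    style = {}
    # User agent rules are applied first, so matching author rules override them.
    for origin in (user_agent_rules, author_rules):
        matching = [(sel, decl) for sel, decl in origin if matches(sel, node)]
        # sorted() is stable: rules of equal specificity keep source order,
        # so the later rule wins.
        for _, declarations in sorted(matching, key=lambda rule: specificity(rule[0])):
            style.update(declarations)
    return style


user_agent = [("p", {"display": "block", "margin": "1em"})]
author = [
    ("p", {"color": "black"}),
    (".note", {"color": "gray"}),
    ("p", {"margin": "0"}),
]
paragraph = {"tag": "p", "classes": ["note"]}
print(computed_style(paragraph, user_agent, author))
```

Here the class selector overrides the tag selector's color even though a tag rule appears later in the style sheet, because specificity is compared before source order.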
Other Processes
- JavaScript Compilation
While the CSS is being parsed and the CSSOM is created, other assets, including JavaScript files, are downloading (thanks to the preload scanner). JavaScript is parsed, compiled, and then executed. The scripts are parsed into abstract syntax trees. Some browser engines take the abstract syntax tree and pass it into an interpreter, outputting bytecode which is executed on the main thread. This is known as JavaScript compilation.
- Building the Accessibility Tree
The accessibility object model (AOM) is like a semantic version of the DOM. The browser updates the accessibility tree when the DOM is updated. The accessibility tree is not modifiable by assistive technologies themselves. Until the AOM is built, the content is not accessible to screen readers.
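The parse, compile, and execute pipeline described under JavaScript Compilation has a close analogue in Python itself, which makes it easy to observe. The sketch below is an analogy only: it shows Python source going through Python's own parser and bytecode compiler, not how any particular JavaScript engine works.

```python
import ast
import dis

source = "total = 2 + 3 * 4"

tree = ast.parse(source)                     # parse: source text -> abstract syntax tree
code = compile(tree, "<example>", "exec")    # compile: syntax tree -> bytecode
namespace = {}
exec(code, namespace)                        # execute the bytecode

print(ast.dump(tree.body[0]))                # the tree for the assignment statement
dis.dis(code)                                # the bytecode instructions
print(namespace["total"])
```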
Render
Rendering steps include style, layout, paint and, in some cases, compositing. The CSSOM and DOM trees created in the parsing step are combined into a render tree, which is then used to compute the layout of every visible element, which is then painted to the screen. In some cases, content can be promoted to its own layer and composited, improving performance by painting portions of the screen on the GPU instead of the CPU, freeing up the main thread.
Style
The third step in the critical rendering path is combining the DOM and CSSOM into a render tree. The computed style tree, or render tree, construction starts with the root of the DOM tree, traversing each visible node.
- Tags that aren’t going to be displayed, such as <head> and its children or any node with display: none, are not included in the render tree.
- Nodes with visibility: hidden applied are included in the render tree, as they do take up space.
- A script node is not included in the render tree unless a style rule overrides the user agent default of display: none.
Each visible node has its CSSOM rules applied to it. The render tree holds all the visible nodes with content and computed styles – matching up all the relevant styles to every visible node in the DOM tree, and determining, based on the CSS cascade, what the computed styles are for each node.
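The filtering rules above can be sketched as a recursive walk over a DOM-like structure. The dictionary layout, the `build_render_tree` function, and the sample document are hypothetical; each node is assumed to already carry its computed style.

```python
def build_render_tree(node: dict):
    """Return a copy of the subtree containing only nodes that generate boxes."""
    style = node.get("style", {})
    if style.get("display") == "none":
        return None  # the node and all of its descendants are skipped
    children = [build_render_tree(child) for child in node.get("children", [])]
    return {
        "tag": node["tag"],
        "style": style,
        "children": [child for child in children if child is not None],
    }


dom = {
    "tag": "html",
    "children": [
        {"tag": "head", "style": {"display": "none"}, "children": [{"tag": "title"}]},
        {
            "tag": "body",
            "children": [
                {"tag": "p"},
                {"tag": "p", "style": {"visibility": "hidden"}},  # kept: still takes up space
                {"tag": "script", "style": {"display": "none"}},  # dropped
            ],
        },
    ],
}

render_tree = build_render_tree(dom)
body = render_tree["children"][0]
print([child["tag"] for child in render_tree["children"]])
print([(child["tag"], child["style"]) for child in body["children"]])
```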
Layout
The fourth step in the critical rendering path is running layout on the render tree to compute the geometry of each node.
Once the render tree is built, layout commences. The render tree identifies which nodes are displayed (even if invisible) along with their computed styles, but not the dimensions or location of each node. To determine the exact size and location of each object, the browser starts at the root of the render tree and traverses it.
The first time the size and position of nodes are determined is called layout. Subsequent recalculations of node size and location are called reflows. Suppose the initial layout occurs before an image is returned. If the size of the image is not declared, there will be a reflow once the image size is known.
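The image example can be sketched with a toy block layout that stacks boxes vertically. The `layout` function, the block names, and the pixel sizes are made up; real layout also handles inline content, margins, floats, and much more.

```python
def layout(blocks: list, viewport_width: int) -> list:
    """Stack blocks vertically and return (name, x, y, width, height) boxes."""
    boxes, y = [], 0
    for name, height in blocks:
        boxes.append((name, 0, y, viewport_width, height))
        y += height
    return boxes


# Initial layout: the image has not arrived and has no declared size,
# so it occupies no vertical space yet.
first = layout([("heading", 40), ("image", 0), ("paragraph", 60)], viewport_width=800)

# Reflow: the image arrives and turns out to be 300 pixels tall, so
# everything below it has to move down.
second = layout([("heading", 40), ("image", 300), ("paragraph", 60)], viewport_width=800)

print(first[-1])
print(second[-1])
```

Declaring the image's dimensions up front would let the first pass reserve the space, so the paragraph would not move when the image loads.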
Paint
The last step in the critical rendering path is painting the individual nodes to the screen, the first occurrence of which is called the first meaningful paint.
In the painting or rasterization phase, the browser converts each box calculated in the layout phase to actual pixels on the screen. Painting involves drawing every visual part of an element to the screen, including text, colors, borders, shadows, and replaced elements like buttons and images. The browser needs to do this very quickly.
To ensure smooth scrolling and animation, everything occupying the main thread, including calculating styles, along with reflow and paint, must take the browser less than 16.67 ms to accomplish, which is the time available for one frame at 60 frames per second.
To ensure repainting can be done even faster than the initial paint, the drawing to the screen is generally broken down into several layers. If this occurs, then compositing is necessary.
Painting can break the elements in the layout tree into layers. Promoting content into layers on the GPU (instead of the main thread on the CPU) improves paint and repaint performance. Specific CSS properties and elements instantiate a layer.
Compositing
When sections of the document are drawn in different layers, overlapping each other, compositing is necessary to ensure they are drawn to the screen in the right order and the content is rendered correctly.
As the page continues to load assets, reflows can happen. A reflow sparks a repaint and a re-composite. Only the layer that needed to be repainted would be repainted, and composited if necessary.
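Compositing in the right order can be illustrated with a one-dimensional "screen" of characters. In this sketch each layer is a string of pixels with a position and a stacking order, and a space stands for a transparent pixel; the `composite` function and the layer data are assumptions for the example and say nothing about how a GPU compositor is actually implemented.

```python
def composite(layers: list, width: int) -> str:
    """Flatten layers into one row of pixels; higher z_index layers are drawn last."""
    frame = ["."] * width
    for layer in sorted(layers, key=lambda item: item["z_index"]):
        for offset, pixel in enumerate(layer["pixels"]):
            x = layer["x"] + offset
            if 0 <= x < width and pixel != " ":  # a space is a transparent pixel
                frame[x] = pixel
    return "".join(frame)


layers = [
    {"name": "tooltip", "z_index": 2, "x": 4, "pixels": "c c"},
    {"name": "background", "z_index": 0, "x": 0, "pixels": "aaaaaaaa"},
    {"name": "card", "z_index": 1, "x": 2, "pixels": "bbbb"},
]
print(composite(layers, width=8))

# Repainting only the card layer, then compositing again, leaves the
# other layers' pixels untouched.
layers[2]["pixels"] = "dddd"
print(composite(layers, width=8))
```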
Interactivity
Once the main thread is done painting the page, you would think we would be “all set.” That isn’t necessarily the case. If the load includes JavaScript that was correctly deferred and only executed after the onload event fires, the main thread might be busy and not available for scrolling, touch, and other interactions.
TTI (Time to Interactive)
TTI measures how long it takes from the first request, the one that led to the DNS lookup, until the page is interactive.