A Deep Dive into the W3C WebDriver Specification

It’s time to look under the hood

Madhan published on March 28, 2020



Before getting into the topic, let's first understand the difference between the terms WebDriver and Selenium.

In testing, the terms WebDriver and Selenium are often used interchangeably to refer to automating a web application.

WebDriver is an HTTP-based API for interacting with a web browser. The standard is published by the W3C. WebDriver is a remote control interface that enables introspection and control of user agents. It provides a platform- and language-neutral wire protocol as a way for out-of-process programs to remotely instruct the behavior of web browsers.

Selenium is a range of tools and libraries that enable and support the automation of web browsers. Selenium WebDriver refers to both the language bindings and the implementations of the individual browser controlling code. This is commonly referred to as just WebDriver.

Most browser vendors implement the W3C WebDriver capabilities and protocol as a standalone server, shipped as a binary executable. Selenium supports this protocol as of version 3.8.0, and the older JSON Wire Protocol is now obsolete.

Figure: Selenium architecture

In this article, we are going to see the WebDriver API in action. For that, we need to download ChromeDriver and the Postman tool.

The first step is to download the ChromeDriver executable. Download the appropriate driver version based on your OS.

To interact with the API, we need a tool that can make HTTP requests. For that, download Postman, which lets you send API requests and inspect the responses.

The test case that we are going to automate is as follows:

Open the Chrome browser
Navigate to Google page
Find search text box
Send a search value
Find the search button
Click on the search button
Quit driver

Let’s first start ChromeDriver in a terminal.

Extract the file and run it using the command ./chromedriver. You will be given the port on which the WebDriver API is running.

$ ./chromedriver

Starting ChromeDriver 80.0.3987.106 (f68069574609230cf9b635cd784cfb1bf81bb53a-refs/branch-heads/3987@{#882}) on port 9515
Only local connections are allowed.
Please protect ports used by ChromeDriver and related test frameworks to prevent access by malicious code.
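
Before sending any commands, it can help to confirm that the server is actually reachable. The sketch below is one way to do that from Python instead of Postman, using the GET /status endpoint from the WebDriver specification. It assumes ChromeDriver is listening on localhost:9515 as in the output above; the function names are made up for this illustration.

```python
import json
import urllib.request

# Assumption: ChromeDriver runs locally on the port printed at startup.
BASE_URL = "http://localhost:9515"


def status_request(base_url=BASE_URL):
    """Build the GET /status request (hypothetical helper name)."""
    return urllib.request.Request(f"{base_url}/status", method="GET")


def is_ready(base_url=BASE_URL):
    """Send GET /status and report whether the driver can create sessions.

    The spec wraps the answer in a "value" object with a boolean "ready".
    """
    with urllib.request.urlopen(status_request(base_url)) as resp:
        body = json.load(resp)
    return bool(body["value"]["ready"])
```

With ChromeDriver running, calling is_ready() should return True as long as the driver is willing to accept a new session.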

1. Open the Chrome browser


Now that ChromeDriver is running on its default port, 9515, let's open the browser. This is done by creating a new session: make an HTTP POST request to the /session endpoint. We also need to specify which browser to use; this information is sent as a JSON object in the POST body. On success, the response includes a sessionId.
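
For readers who prefer a script to Postman, here is a minimal sketch of the same New Session call in Python. It assumes ChromeDriver on localhost:9515 and uses the W3C-style capabilities body with alwaysMatch; the helper names are hypothetical and only the standard library is used.

```python
import json
import urllib.request

# Assumption: ChromeDriver runs locally on its default port.
BASE_URL = "http://localhost:9515"


def new_session_request(browser_name="chrome", base_url=BASE_URL):
    """Build POST /session with a W3C capabilities payload."""
    payload = {"capabilities": {"alwaysMatch": {"browserName": browser_name}}}
    return urllib.request.Request(
        f"{base_url}/session",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_session_id(response_body):
    """Read the session id from a W3C-style New Session response."""
    return response_body["value"]["sessionId"]


def open_browser(base_url=BASE_URL):
    """Send the request; a browser window opens and its session id is returned."""
    with urllib.request.urlopen(new_session_request(base_url=base_url)) as resp:
        return extract_session_id(json.load(resp))
```

The session id returned by open_browser() is what every later request puts in place of <session_id>.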

2. Navigate to Google page

The next step is to open a URL in the browser. This is done with an HTTP POST request to /session/<session_id>/url, with the POST body containing the URL to open.
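
As a rough Python equivalent of this request, the sketch below builds and sends the navigation call. It assumes ChromeDriver on localhost:9515 and a session id obtained from an earlier POST /session; the function names are illustrative, not part of any library.

```python
import json
import urllib.request

# Assumption: ChromeDriver runs locally on its default port.
BASE_URL = "http://localhost:9515"


def navigate_request(session_id, url, base_url=BASE_URL):
    """Build POST /session/<session_id>/url with the target URL in the body."""
    return urllib.request.Request(
        f"{base_url}/session/{session_id}/url",
        data=json.dumps({"url": url}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def navigate(session_id, url, base_url=BASE_URL):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(navigate_request(session_id, url, base_url)) as resp:
        return json.load(resp)
```

For example, navigate(session_id, "https://www.google.com") would load the Google page in the browser opened for that session.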

3. Find search text box


Now that we have opened the Google page, let's find the search text box. This is done with an HTTP POST request to /session/<session_id>/element, with the POST body containing the location strategy and the selector.
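
A sketch of the Find Element request in Python follows. The body uses the spec's "using" and "value" fields, and the element id comes back under the fixed web element key defined by the W3C specification. It assumes ChromeDriver on localhost:9515; the helper names and the example selector [name='q'] are assumptions for illustration, since Google's markup may differ.

```python
import json
import urllib.request

# Assumption: ChromeDriver runs locally on its default port.
BASE_URL = "http://localhost:9515"

# Key under which a W3C-compliant driver returns an element reference.
ELEMENT_KEY = "element-6066-11e4-a52e-4f735466cecf"


def find_element_request(session_id, using, value, base_url=BASE_URL):
    """Build POST /session/<session_id>/element with strategy and selector."""
    payload = {"using": using, "value": value}
    return urllib.request.Request(
        f"{base_url}/session/{session_id}/element",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_element_id(response_body):
    """Read the element id from a W3C-style Find Element response."""
    return response_body["value"][ELEMENT_KEY]


def find_element(session_id, using, value, base_url=BASE_URL):
    """Send the request and return the id of the first matching element."""
    req = find_element_request(session_id, using, value, base_url)
    with urllib.request.urlopen(req) as resp:
        return extract_element_id(json.load(resp))
```

A call such as find_element(session_id, "css selector", "[name='q']") would return the id to use as <element_id> in later requests. The same function can be reused with a different selector when locating the search button.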

4. Send a search value

After locating the search box, let's send the search value. This is done with an HTTP POST request to /session/<session_id>/element/<element_id>/value, with the POST body containing the value in the text parameter.
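
In Python, this request could look like the sketch below. It assumes ChromeDriver on localhost:9515 and that the session id and element id come from the earlier responses; the function names are hypothetical.

```python
import json
import urllib.request

# Assumption: ChromeDriver runs locally on its default port.
BASE_URL = "http://localhost:9515"


def send_keys_request(session_id, element_id, text, base_url=BASE_URL):
    """Build POST /session/<session_id>/element/<element_id>/value."""
    return urllib.request.Request(
        f"{base_url}/session/{session_id}/element/{element_id}/value",
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_keys(session_id, element_id, text, base_url=BASE_URL):
    """Type the given text into the element and return the JSON response."""
    req = send_keys_request(session_id, element_id, text, base_url)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```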

5. Find the search button


Let's find the search button. This is done with an HTTP POST request to /session/<session_id>/element, with the POST body containing the location strategy and the selector.

6. Click on search button

Now let's click the search button. This is done with an HTTP POST request to /session/<session_id>/element/<element_id>/click, with the POST body containing an empty JSON object.
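
The click request is the simplest of the element commands, since its body is just {}. A minimal Python sketch, assuming ChromeDriver on localhost:9515 and ids taken from the earlier responses (the function names are made up for this example):

```python
import json
import urllib.request

# Assumption: ChromeDriver runs locally on its default port.
BASE_URL = "http://localhost:9515"


def click_request(session_id, element_id, base_url=BASE_URL):
    """Build POST /session/<session_id>/element/<element_id>/click."""
    return urllib.request.Request(
        f"{base_url}/session/{session_id}/element/{element_id}/click",
        data=json.dumps({}).encode("utf-8"),  # the body is an empty JSON object
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def click(session_id, element_id, base_url=BASE_URL):
    """Click the element and return the decoded JSON response."""
    with urllib.request.urlopen(click_request(session_id, element_id, base_url)) as resp:
        return json.load(resp)
```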

7. Quit driver

To quit the driver, send an HTTP DELETE request to /session/<session_id>.
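
A short Python sketch of this last call, again assuming ChromeDriver on localhost:9515 and hypothetical helper names. A DELETE request carries no body, so only the method and the path matter.

```python
import json
import urllib.request

# Assumption: ChromeDriver runs locally on its default port.
BASE_URL = "http://localhost:9515"


def delete_session_request(session_id, base_url=BASE_URL):
    """Build DELETE /session/<session_id>; no request body is needed."""
    return urllib.request.Request(
        f"{base_url}/session/{session_id}", method="DELETE"
    )


def quit_session(session_id, base_url=BASE_URL):
    """End the session, which closes the browser it controls."""
    with urllib.request.urlopen(delete_session_request(session_id, base_url)) as resp:
        return json.load(resp)
```

Note that this ends the browser session only; the ChromeDriver process started in the terminal keeps running until it is stopped there.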

* * * *

Originally published on Medium

🌟 🌟 🌟 The source code for this blog post can be found here 🌟🌟🌟

GitHub - madhank93/automation_using_chromedriver_postman
