What happens when you type https://www.holbertonschool.com in your browser and press Enter
In this post we are going to talk about everything that happens behind the scenes when you request a web page by its URL. When you type, for example, https://www.holbertonschool.com in your browser, several things happen to take you there. To understand that process, we first need to discuss some concepts involved in requesting a URL.
The first thing to know is that web sites live on web servers. A web server is a computer system that hosts web pages and is designed to serve them.
On the hardware side, the web server is a computer that stores the components of the web page, such as HTML, images, CSS, documents and JS. A web server connects to the internet and supports the physical exchange of data with other devices connected to the web.
On the software side, a web server includes several parts that control how web users access hosted files.
Put simply, when a browser needs a file that is hosted on a web server, it sends a request over HTTP. When the request reaches the correct web server, the HTTP server (the software part) accepts it, finds the requested document and sends it back to the browser. If the server does NOT find the document, it sends back an error response instead.
Now, let’s go through some important concepts to understand all of this better.
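To make the request and response idea concrete, here is a minimal sketch in Python. It uses the standard library's small HTTP server as a stand-in for a real web server. The file name index.html, its content and the function name fetch_from_local_server are made up for this example, and everything runs on the local machine.

```python
import functools
import http.client
import http.server
import pathlib
import tempfile
import threading


def fetch_from_local_server(path):
    """Start a tiny static web server, request `path`, return (status, body)."""
    with tempfile.TemporaryDirectory() as root:
        # The "hosted file": one hypothetical HTML page stored on the server.
        pathlib.Path(root, "index.html").write_text("<h1>Hello</h1>", encoding="utf-8")

        handler = functools.partial(
            http.server.SimpleHTTPRequestHandler, directory=root
        )
        # Port 0 asks the operating system for any free port.
        server = http.server.HTTPServer(("127.0.0.1", 0), handler)
        threading.Thread(target=server.serve_forever, daemon=True).start()
        try:
            # The "browser" side: send an HTTP GET request and read the answer.
            conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
            conn.request("GET", path)
            response = conn.getresponse()
            body = response.read().decode("utf-8")
            conn.close()
            return response.status, body
        finally:
            server.shutdown()
            server.server_close()
```

Requesting "/index.html" should give back status 200 and the page, while a path that does not exist, such as "/missing.html", should give back the error status 404.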
Domain Name System (DNS).
When you type www.holbertonschool.com in your browser, a DNS request is made to find the Internet Protocol (IP) address associated with that domain. Every domain name you request has an IP address associated with it, which identifies the server that hosts it. This means that, in principle, you could type the IP address instead of the domain name, and saving us from doing that is the main reason DNS exists. The IP address is a number that identifies a server on the network, and it is how the server is found when we request a page, so DNS translates the domain name into its IP address. And what is the reason for that? For humans it is easier to remember the name of a web page than a string of numbers, so DNS lets us use friendly names made of letters. That is the simple version, but DNS follows a more detailed process to translate names into numbers, which goes as follows:
1. The request first goes to the resolver, which is usually run by our internet provider, and the resolver checks whether the IP address is stored in its cache. If it is not there, the resolver sends the request to the next level.
2. The next level is the root server, which doesn’t know the IP address of the web page, but knows where to send the request to find it. So the root server directs the resolver to the next level.
3. The next level is the top-level domain (TLD) server, which stores address information for top-level domains such as .org, .com, .net, among others. The web page we are requesting belongs to the .com top-level domain. The TLD server does not know the IP address itself, but it knows which name servers are responsible for the domain, so it points the resolver to them.
4. The last level is the authoritative name server, which is responsible for knowing everything about the domain, so it responds with the IP address of www.holbertonschool.com.
The full process does not happen on every request, because when the resolver receives the IP address it saves it in its cache. If it receives another request for www.holbertonschool.com while that entry is still cached, it answers from the cache and doesn’t have to go through the whole process again.
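The lookup chain above can be sketched as a toy model in Python. This is not how a real resolver is implemented: the server names, the dictionaries and the IP address 192.0.2.10 (taken from a range reserved for documentation) are all made up to illustrate the four steps and the cache.

```python
# Toy data standing in for the three levels of DNS servers (all hypothetical).
ROOT_SERVER = {"com": "tld-com"}
TLD_SERVERS = {"tld-com": {"holbertonschool.com": "ns-holberton"}}
AUTHORITATIVE_SERVERS = {"ns-holberton": {"www.holbertonschool.com": "192.0.2.10"}}


class Resolver:
    def __init__(self):
        self.cache = {}
        self.upstream_queries = 0  # how many times we had to ask another server

    def resolve(self, hostname):
        # Step 1: answer from the cache when possible.
        if hostname in self.cache:
            return self.cache[hostname]

        labels = hostname.split(".")
        tld = labels[-1]                 # "com"
        domain = ".".join(labels[-2:])   # "holbertonschool.com"

        # Step 2: the root server only knows which TLD server to ask.
        tld_server = ROOT_SERVER[tld]
        self.upstream_queries += 1

        # Step 3: the TLD server only knows the authoritative name server.
        name_server = TLD_SERVERS[tld_server][domain]
        self.upstream_queries += 1

        # Step 4: the authoritative name server knows the IP address.
        ip_address = AUTHORITATIVE_SERVERS[name_server][hostname]
        self.upstream_queries += 1

        self.cache[hostname] = ip_address
        return ip_address
```

The first call to resolve("www.holbertonschool.com") makes three upstream queries, and a second call for the same name is answered from the cache without any new query.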
Transmission Control Protocol/Internet Protocol.
With DNS we talked about the Internet Protocol (IP), but IP is not the only protocol used by the internet. We also have to look at the Transmission Control Protocol (TCP). TCP enables two hosts to establish a connection and exchange data with each other. Together, TCP/IP is a set of rules that defines how servers and clients interact over the network, and how data should be broken into packets, transferred, received, etc. I am not going deeper into this, but if you want to know more, you can visit this link: fortinet
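As a small illustration of two hosts connecting and exchanging data over TCP, here is a sketch using Python's socket module. Both "hosts" run on the same machine, and the function name tcp_round_trip and the upper-casing server are invented for the example.

```python
import socket
import threading


def tcp_round_trip(message):
    """Send `message` (bytes) to a local TCP server and return its reply."""
    # Server side: listen on a free port of the loopback interface.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]

    def serve_one_client():
        conn, _addr = server.accept()  # the TCP connection is established here
        with conn:
            data = conn.recv(1024)
            conn.sendall(data.upper())  # reply with the same text in upper case

    worker = threading.Thread(target=serve_one_client)
    worker.start()

    # Client side: connect, send the data and wait for the reply.
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        client.sendall(message)
        reply = client.recv(1024)

    worker.join()
    server.close()
    return reply
```

Calling tcp_round_trip(b"hello") should return b"HELLO". Splitting the data into packets and reassembling it in order is handled by TCP underneath, so the code only sees a stream of bytes.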
The firewall
Now we are going to talk about security. Any server can be attacked by hackers who know its address, and that is the reason for firewalls. A firewall is software or hardware that protects your network from unauthorized access by applying a set of rules.
The firewall filters incoming and outgoing traffic in order to protect communication on the network.
Put simply, a firewall can be understood as a wall with some doors in it. The doors are the ports, and each port has a set of rules that determines who is allowed to get through to the applications.
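The doors and rules idea can be sketched in a few lines of Python. This is only a toy filter, not a real firewall: the rule list, the addresses and the function name is_allowed are assumptions made for the example.

```python
import ipaddress

# Hypothetical rule set: (network allowed to connect, port it may reach).
RULES = [
    (ipaddress.ip_network("0.0.0.0/0"), 80),    # anyone may reach HTTP
    (ipaddress.ip_network("0.0.0.0/0"), 443),   # anyone may reach HTTPS
    (ipaddress.ip_network("10.0.0.0/8"), 22),   # only the internal network may use SSH
]


def is_allowed(source_ip, port):
    """Return True if some rule lets `source_ip` through the door `port`."""
    source = ipaddress.ip_address(source_ip)
    for network, rule_port in RULES:
        if port == rule_port and source in network:
            return True
    return False  # anything not explicitly allowed is blocked
```

With these rules, is_allowed("203.0.113.5", 443) is True, but the same outside address is blocked on port 22, which only opens for addresses inside 10.0.0.0/8.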
HTTPS/SSL
When we make a request to the web page www.holbertonschool.com, in the browser, before the subdomain (which for this example is www), there is the protocol https. It stands for Hypertext Transfer Protocol Secure and is the secure version of HTTP. Basically, this protocol encrypts the information that is sent and received between the user and the web site, so that a hacker cannot steal it in the middle of the transfer. HTTPS does not do the encryption by itself, and that is where SSL comes in.
SSL stands for Secure Sockets Layer, and it is the standard protocol used across the internet to secure communications (its modern successor is called TLS, although the name SSL is still widely used). SSL uses public-key encryption to protect the data. When we access the holbertonschool web page, the browser asks the site to identify itself, so the web server that hosts the page sends back a copy of its SSL certificate, which is a digital certificate used to identify a web site. Basically, SSL lets the browser know that the web site we are visiting is trustworthy. When a website has a valid certificate we can see an icon next to the website name in the address bar, and in some browsers it turns green.
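Here is a hedged sketch of the client side of that identification step, using Python's ssl module. The function names build_client_context and fetch_certificate are made up for the example, and fetch_certificate needs real network access to run.

```python
import socket
import ssl


def build_client_context():
    # The default context requires a valid certificate and checks that it
    # matches the hostname we asked for.
    return ssl.create_default_context()


def fetch_certificate(hostname, port=443):
    """Connect to `hostname` over TLS and return its certificate as a dict."""
    context = build_client_context()
    with socket.create_connection((hostname, port), timeout=5) as raw_sock:
        # The handshake happens here: the server sends its certificate and
        # the client verifies it before any page data is exchanged.
        with context.wrap_socket(raw_sock, server_hostname=hostname) as tls_sock:
            return tls_sock.getpeercert()
```

If the certificate cannot be verified, the handshake fails with an ssl.SSLCertVerificationError instead of returning, which is roughly what a browser does when it shows a security warning.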
Load Balancer
As we said before, web sites live on servers. Some web sites have so much traffic that they can’t be hosted on just one server, so they need more than one. In these situations a load balancer is useful and necessary, because it controls the flow of requests to the servers, with the purpose of making sure that no server gets overloaded with requests.
So basically a load balancer has the following functions:
· Distributes client requests or network load across multiple servers efficiently.
· Guarantees availability and reliability by sending requests only to servers that are online.
· Provides flexibility to add or remove servers as demand requires.
A load balancer uses load balancing algorithms to decide which server gets each request. We will not go into detail here, but if you are interested in knowing more about the algorithms used, visit this link: avinetworks
HAProxy is one of the most commonly used load balancers. It can be configured with the round-robin algorithm, which distributes the requests among the servers in sequential order.
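The round-robin idea itself is small enough to sketch in Python. This is not HAProxy's implementation, just the algorithm in its simplest form, and the class name and server names are invented for the example.

```python
from itertools import cycle


class RoundRobinBalancer:
    def __init__(self, servers):
        # cycle() repeats the list of servers forever, in order.
        self._servers = cycle(servers)

    def next_server(self):
        """Return the server that should handle the next request."""
        return next(self._servers)
```

With three servers named web-01, web-02 and web-03, four requests in a row go to web-01, web-02, web-03 and then back to web-01.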
Web server
As we said before, a web server is software that lives on a physical server and serves the static content of a web page, such as HTML, CSS, JS and other files. The most common web servers are Nginx and Apache. So when we type www.holbertonschool.com, all of the components inside the physical server work together to respond to our request so that the browser can display the content.
Application server
As we said in the web server section, a web server on its own mainly handles static content. That is why we need an application server to make the pages interactive.
Application servers handle the back-end operations, such as the business logic. This server is used to make an application dynamic by interacting with a database to retrieve the information that has been requested.
So, in simple words, a web server is designed to serve web pages, although it may not have the resources to run demanding web applications. That’s where the application server comes in, providing the processing power and memory to run specific applications.
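To show the difference between a static file and dynamic content, here is a minimal sketch of an application written against WSGI, the standard interface between Python web applications and servers. The greeting logic, the name parameter and the helper call_app are assumptions made for the example.

```python
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Build the response at request time instead of reading a fixed file."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    name = query.get("name", ["visitor"])[0]
    body = f"Hello, {name}!".encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def call_app(query_string):
    """Call the application the way a server would, without any network."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a fake but valid request
    environ["QUERY_STRING"] = query_string
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(application(environ, start_response))
    return captured["status"], body.decode("utf-8")
```

The same URL path gives a different page depending on the request: call_app("name=Betty") returns "Hello, Betty!", while call_app("") returns "Hello, visitor!".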
Database
Finally, a database is the collection of information that the web site will use, and it should be accessible and organized.
Databases are the last step in the web infrastructure. The database holds the data, and the DBMS (database management system) is the program used to interact with it and to retrieve, add, and modify data.
So, for example, if we want to add a new record, update it or remove it in www.holbertonschool.com, we have to use a database, which could be MySQL or another one.
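Those operations can be sketched with SQLite, which ships with Python. It is used here only as a stand-in for MySQL, and the students table and its contents are made up for the example.

```python
import sqlite3


def crud_demo():
    """Add, update and remove one record; return the rows seen along the way."""
    conn = sqlite3.connect(":memory:")  # temporary database kept in memory
    conn.execute(
        "CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )

    # Add a new record.
    conn.execute("INSERT INTO students (name) VALUES (?)", ("Betty",))

    # Update it.
    conn.execute(
        "UPDATE students SET name = ? WHERE name = ?", ("Betty Holberton", "Betty")
    )
    after_update = conn.execute("SELECT name FROM students").fetchall()

    # Remove it.
    conn.execute("DELETE FROM students WHERE name = ?", ("Betty Holberton",))
    after_delete = conn.execute("SELECT name FROM students").fetchall()

    conn.close()
    return after_update, after_delete
```

After the update the table holds one row with the new name, and after the delete it is empty. The SQL statements would look much the same in MySQL, although the connection code would use a MySQL driver instead of sqlite3.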
Conclusion
When we request a web page, we get a response in a matter of seconds, but the process behind it is huge. There are still many details missing here, but this is the base of what is happening underneath. I think it is really cool to know some details about the process of requesting a web page: it takes very little time and seems like magic, but it is really much more complex than that.