Introduction to HTTP requests
The Internet is all about connecting and getting together. As programmers, we are the ones who make this getting together happen. Getting together for normal people means fun, smiles, music, dance. For programmers, getting together means TCP/IP. This article looks at HTTP requests, one of the most common ways for programs to communicate with each other over the web.
Every time you open your browser and key in a URL, your browser makes an HTTP request and sends it to a web server. That is how web servers understand what you want from them. This step happens seamlessly on your behalf, and you never see it because you don't need to. Well, you don't need to know about it unless you are a programmer and want low-level control for debugging, optimization, learning, curiosity - you name it.
Simple HTTP GET requests
Let's see what HTTP requests look like.
In this first example we request the URL http://www.example.com/path/to/page.php?var1=value1&var2=value2 .
Here is a hypothetical request from Firefox:
    GET /path/to/page.php?var1=value1&var2=value2 HTTP/1.1
    Host: www.example.com
    User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:28.0) Gecko/20100101 Firefox/28.0
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
    Accept-Language: en-US,en;q=0.5
    Accept-Encoding: gzip, deflate
    Connection: keep-alive
    

In fact, not all of those headers are necessary. Stripped down to the bare essentials, the request line and the Host header that HTTP/1.1 requires, the request looks like this:

    GET /path/to/page.php?var1=value1&var2=value2 HTTP/1.1
    Host: www.example.com
    

Notice that in both cases there is an empty line at the bottom. That empty line separates the request's header from its body. Our request does not need a body, but the empty line is still required because it marks the end of the header.
One small detail makes a lot of difference with some servers: the line endings must be CR LF. In PHP that is echo "\r\n". A bare "\n" is not a "legal" line break for HTTP headers, and some web servers may not accept it.
Another detail that makes a difference in terms of performance is the header Connection: keep-alive. If your request comes from a web browser, you had better keep that setting. If it comes from an API client that does not make frequent calls, it helps the web server if you tell it that you will not need the connection any more. In the language of HTTP headers that is Connection: close . HTTP/1.1 connections are persistent by default, so the server keeps the connection open unless you send this header.
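The points above (CR LF line endings, the closing empty line, and Connection: close) can be sketched in a few lines of Python. This is a minimal illustration, not a full HTTP client; the function name build_get_request and its parameters are made up for this example.

```python
CRLF = "\r\n"  # HTTP header lines end with CR LF, not a bare LF


def build_get_request(host, target="/", extra_headers=None):
    """Return the raw bytes of a minimal HTTP/1.1 GET request (hypothetical helper)."""
    lines = [f"GET {target} HTTP/1.1", f"Host: {host}"]
    for name, value in (extra_headers or {}).items():
        lines.append(f"{name}: {value}")
    # Join the lines with CR LF, then add the empty line that ends the header.
    return (CRLF.join(lines) + CRLF + CRLF).encode("ascii")


# An API client that makes a single call asks the server to close the connection.
request = build_get_request(
    "www.example.com",
    "/path/to/page.php?var1=value1&var2=value2",
    {"Connection": "close"},
)
```

The resulting bytes could be written to a TCP socket connected to port 80 of the host. Sending them is left out here to keep the sketch self-contained.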
Finally, notice the first line in both versions of example 1. It starts with "GET", continues with the path of the URL plus any optional variables, and ends with "HTTP" and its version, in this case "1.1". There are three elements, and four things to notice.
1) That line must always be the first line of an HTTP request. The headers that follow can go in any order you like.
2) The term "GET" is called the "HTTP method" or "HTTP verb"; read on for more.
3) The URL must always be present. If you wish to access the front page of the website, this URL must be a simple "/". Otherwise, it is the URL without the scheme and domain that usually open a URL. The domain goes in the "Host" header that follows.
4) The closing part of that line is HTTP/1.1. It could also be HTTP/1.0 if you were following the HTTP v1.0 specification.
There are many HTTP headers available. If you are new to this, it is wise to study them in depth, as they can solve nasty problems or help you make useful optimization decisions. RFC 2616 was the original full list of HTTP v1.1 headers; it has since been replaced by RFC 7230 through RFC 7235 (and later by RFC 9110). The next version of HTTP, HTTP/2, was published as RFC 7540 in 2015.
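To make the split between the request line and the Host header concrete, here is a small Python sketch that takes a full URL apart. The helper name request_line_and_host is invented for this example, and it assumes a plain URL without a user:password part.

```python
from urllib.parse import urlsplit


def request_line_and_host(url, method="GET"):
    """Split a URL into the HTTP/1.1 request line and the Host header (hypothetical helper)."""
    parts = urlsplit(url)
    target = parts.path or "/"  # the front page is requested as a simple "/"
    if parts.query:
        target += "?" + parts.query
    # The domain is not part of the request line; it goes in the Host header.
    return f"{method} {target} HTTP/1.1", f"Host: {parts.netloc}"


line, host = request_line_and_host(
    "http://www.example.com/path/to/page.php?var1=value1&var2=value2"
)
```

With the URL of example 1, line holds the request line shown earlier and host holds "Host: www.example.com".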
URL encoded HTTP POST requests
We saw what GET requests look like; now let's examine POST requests. GET and POST are the two most common HTTP methods. POST requests are typically sent when users submit an HTML form whose method is set to POST.
There are two types of HTTP POST requests, URL encoded and multipart. Multipart requests become useful when users upload files. In this section we will examine URL encoded POST requests.
In the second example we request the URL http://www.example.com/path/to/page.php with the same variables from the previous example.
Here is a hypothetical request from Firefox:
    POST /path/to/page.php HTTP/1.1
    Host: www.example.com
    User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:28.0) Gecko/20100101 Firefox/28.0
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
    Accept-Language: en-US,en;q=0.5
    Accept-Encoding: gzip, deflate
    Connection: keep-alive
    Content-Type: application/x-www-form-urlencoded
    Content-Length: 23
    
    var1=value1&var2=value2

Notice the two new headers, "Content-Type" and "Content-Length", and the variables in the body of the request. You can add more variables; everything follows exactly the same format as in URLs. That is, separate each pair of variable name and value with an ampersand (&), and escape every character outside the unreserved ASCII set (letters, digits and a few punctuation marks) as one or more bytes in the hexadecimal format %xx.
The value in Content-Length is the number of bytes in the body. Be careful how you count bytes in strings. If your system uses multibyte strings, take steps to ensure that you count bytes, not characters.
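As an illustration of this format, Python's standard urllib.parse.urlencode produces exactly this kind of string. The variable names city and q in the second call are made up for the example.

```python
from urllib.parse import urlencode

# The variables from the example: pairs joined with "&", name and value with "=".
body = urlencode({"var1": "value1", "var2": "value2"})

# Non-ASCII and reserved characters are escaped as %xx, one escape per byte.
# "ü" is two bytes in UTF-8, so it becomes two escapes.
escaped = urlencode({"city": "Zürich", "q": "a&b=c"})

# In form encoding a space is written as "+".
spaced = urlencode({"q": "a b"})
```

Here body is "var1=value1&var2=value2", escaped is "city=Z%C3%BCrich&q=a%26b%3Dc", and spaced is "q=a+b".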
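The difference between characters and bytes is easy to see in Python, where a string counts characters and its encoded form counts bytes. This is a sketch that assumes the body is sent as UTF-8; the helper name content_length is invented here.

```python
def content_length(body_text, encoding="utf-8"):
    """Number of bytes the text occupies on the wire (hypothetical helper)."""
    return len(body_text.encode(encoding))


ascii_body = "var1=value1&var2=value2"
raw_text = "Zürich"  # "ü" is one character but two bytes in UTF-8

ascii_chars, ascii_bytes = len(ascii_body), content_length(ascii_body)
text_chars, text_bytes = len(raw_text), content_length(raw_text)
```

For the pure ASCII body both counts are 23. For "Zürich" the string has 6 characters but takes 7 bytes. A URL encoded body is plain ASCII once it is escaped, so the mismatch mostly bites when raw text is sent, as in the multipart format.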
Like we did in the previous example, let's see the bare minimum version of this example:
    POST /path/to/page.php HTTP/1.1
    Host: www.example.com
    Content-Type: application/x-www-form-urlencoded
    Content-Length: 23
    
    var1=value1&var2=value2
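The bare minimum request above can be assembled programmatically. The following Python sketch (the name build_form_post is an assumption of this example) encodes the variables, counts the bytes of the body, and puts the empty line between header and body.

```python
from urllib.parse import urlencode


def build_form_post(host, target, fields):
    """Return the raw bytes of a minimal URL encoded POST request (hypothetical helper)."""
    body = urlencode(fields).encode("ascii")
    header = "\r\n".join([
        f"POST {target} HTTP/1.1",
        f"Host: {host}",
        "Content-Type: application/x-www-form-urlencoded",
        f"Content-Length: {len(body)}",  # bytes in the body only
    ])
    # One CR LF ends the last header line, the second is the empty line.
    return header.encode("ascii") + b"\r\n\r\n" + body


post_request = build_form_post(
    "www.example.com", "/path/to/page.php", {"var1": "value1", "var2": "value2"}
)
```

For the two variables of the example, the computed Content-Length is 23, the same value as in the request shown above.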
Multipart HTTP POST requests
As we explained before, browsers create multipart HTTP POST requests when users upload files. Of course, you do not have to upload a file in order to use this type of request. Multipart requests use a different format from URL encoding. That saves a lot of space, because URL encoding expands every byte outside the safe ASCII range into three characters (%xx). Binary files contain many such bytes, so this format avoids wasting bandwidth and other resources when your data contain binary or non-Latin characters.
Our third example follows the lead of example 2. We request the URL http://www.example.com/path/to/page.php with the same variables plus a file for you to study the differences.
Here is a hypothetical request from Firefox:
    POST /path/to/page.php HTTP/1.1
    Host: www.example.com
    User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:28.0) Gecko/20100101 Firefox/28.0
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
    Accept-Language: en-US,en;q=0.5
    Accept-Encoding: gzip, deflate
    Connection: keep-alive
    Content-Type: multipart/form-data; boundary=AbCdeFg1234
    Content-Length: 374
    
    --AbCdeFg1234
    Content-Disposition: form-data; name="var1"
    Content-Type: text/plain
    
    value1
    --AbCdeFg1234
    Content-Disposition: form-data; name="var2"
    Content-Type: text/plain
    
    value2
    --AbCdeFg1234
    Content-Disposition: form-data; name="a_file"; filename="image.jpg"
    Content-Type: image/jpeg
    Content-Transfer-Encoding: binary
    
    [.. image data ..]
    --AbCdeFg1234--
There are a couple of things to notice here.
The first thing to notice is the boundary parameter in the main header. After the equals sign (=) you can set an arbitrary string. It can be anything, as long as there is no chance that it appears in the content of your variables. A usual practice is a long sequence of dashes followed by a hexadecimal number, like -------------------4c2b8253f3829d7fa1b92c626889c5a1. An easy way to get a string like this in PHP is $boundary = '-------------------' . md5(time()); .
The boundary you set in your request separates the different variables. Notice that each time it must be placed on its own line with the prefix --. The last occurrence signifies the end of the form, and it has two dashes both at the start and at the end.
In case you missed it in the previous examples, it is important to end your lines with CR LF, which in PHP is \r\n, not just \n.
As you already know from the previous examples, the main header is separated from the body of the request by an empty line. Notice that in this format every variable has its own header and body, and these two are also separated by an empty line.
If you wish to send files to the web server - and make use of the $_FILES variable in PHP, instead of $_POST - you have to add a filename parameter to Content-Disposition, like the one we used in the last variable.
One last important detail is the value of Content-Length in the main header. That value counts the total bytes of the body that follows, irrespective of how many variables you have in your request. The first empty line (CR LF bytes) that separates the main header from the body is not counted in that number.
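The same idea can be sketched in Python. Note that this example swaps in random bytes from the secrets module instead of hashing the current time, and it also checks the candidate against the data to be sent. The function name make_boundary and the number of dashes are arbitrary choices for this illustration.

```python
import secrets


def make_boundary(parts):
    """Return a boundary string that occurs in none of the given byte strings.

    Hypothetical helper: `parts` holds the values and file contents of the form.
    """
    while True:
        # A run of dashes followed by 32 random hexadecimal digits.
        boundary = "-" * 19 + secrets.token_hex(16)
        if not any(boundary.encode("ascii") in part for part in parts):
            return boundary


boundary = make_boundary([b"value1", b"value2", b"[.. image data ..]"])
```

A collision with 32 random hexadecimal digits is extremely unlikely, so the loop will almost always finish on the first pass. The explicit check simply makes the "no chance it exists in the content" rule visible.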
Let's see how this request shrinks to the bare minimum, like we did in the previous examples:
    POST /path/to/page.php HTTP/1.1
    Host: www.example.com
    Content-Type: multipart/form-data; boundary=AbCdeFg1234
    Content-Length: 287
    
    --AbCdeFg1234
    Content-Disposition: form-data; name="var1"
    
    value1
    --AbCdeFg1234
    Content-Disposition: form-data; name="var2"
    
    value2
    --AbCdeFg1234
    Content-Disposition: form-data; name="a_file"; filename="image.jpg"
    Content-Type: image/jpeg
    
    [.. image data ..]
    --AbCdeFg1234--
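To see how the body and its Content-Length fit together, here is a Python sketch that assembles the minimal multipart body above. The function name build_multipart_body and its argument layout are assumptions of this example, and the 18-byte placeholder text stands in for real image data.

```python
CRLF = b"\r\n"


def build_multipart_body(boundary, fields, files):
    """Assemble a multipart/form-data body (hypothetical helper).

    fields: {name: text value}
    files:  {name: (filename, content_type, data_bytes)}
    """
    delimiter = b"--" + boundary.encode("ascii")
    chunks = []
    for name, value in fields.items():
        chunks += [
            delimiter,
            f'Content-Disposition: form-data; name="{name}"'.encode("utf-8"),
            b"",  # empty line between the part's header and its body
            value.encode("utf-8"),
        ]
    for name, (filename, content_type, data) in files.items():
        disposition = (
            f'Content-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"'
        )
        chunks += [
            delimiter,
            disposition.encode("utf-8"),
            f"Content-Type: {content_type}".encode("ascii"),
            b"",
            data,
        ]
    chunks.append(delimiter + b"--")  # closing delimiter ends the form
    return CRLF.join(chunks)


body = build_multipart_body(
    "AbCdeFg1234",
    {"var1": "value1", "var2": "value2"},
    {"a_file": ("image.jpg", "image/jpeg", b"[.. image data ..]")},
)
content_length = len(body)
```

With the placeholder counted as literal bytes, content_length comes to 287, the value in the request above. The empty line between the main header and the body is not part of it.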
HTTP methods/verbs and RESTful APIs
HTTP methods/verbs can be GET, POST, HEAD, PUT, DELETE, TRACE, OPTIONS, CONNECT and PATCH.
HTTP was originally designed to serve simple HTML pages. From its beginning until today it has gone through stages where programmers experimented and attempted to expand the protocol in many different ways. Not all of the methods/verbs listed above are equally useful today. If you program in plain HTML, you can only use GET and POST anyway, because those are the only methods HTML forms support.
From plain HTML the rest of them are out of reach, and some are considered insecure or are rarely needed.
Verbs other than GET and POST are mostly used when building APIs or making calls to existing APIs.
What is worth noting is the caching behavior of GET requests. Proxies may cache the responses to GET requests, so if you build an API you should encourage, or even require, client-side programmers to use POST or other verbs for calls that must not be cached.
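As a small illustration of an API call with another verb, Python's standard urllib.request.Request accepts the method as an argument. The endpoint URL below is hypothetical, and the request is only constructed, not sent.

```python
from urllib.request import Request

# Hypothetical API endpoint; nothing is sent over the network here.
delete_call = Request("http://www.example.com/api/items/42", method="DELETE")

# Without an explicit method, the verb depends on whether there is a body.
plain_call = Request("http://www.example.com/api/items")
form_call = Request("http://www.example.com/api/items", data=b"var1=value1")

verbs = [r.get_method() for r in (delete_call, plain_call, form_call)]
```

Here verbs is ["DELETE", "GET", "POST"]. Passing any of these objects to urllib.request.urlopen would actually perform the call.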
UPDATE: On 2018-04-18 we made several minor rectifications in this article.