Gene's short & hopefully entertaining introduction to CGI for people who already know how to program

by Gene Michael Stover

created Thursday, 2013-01-17 T 20:43:00Z
updated Thursday, 2013-01-17 T 21:18:07Z

What is this?

CGI is old, old, old, but maybe you are like my 20-something coworkers who see quick-&-dirty web tools that I've built at work & think "Gene says he doesn't know a thing about JSP, but he writes these little web apps & says he used See Gee Eye. What the hell is that? Is that new? I can't imagine Gene using a new technology. So is it old? And if it's old, why would anyone use it?" Heck, they even asked me to do a presentation on it. So here's a written version of that presentation.

Yep, there are hundreds or thousands of CGI tutorials, & if you already know that, you sure can skip this one. But if you are like my coworkers, wondering what this CGI thing is, & if you want a (hopefully) entertaining tutorial about CGI, here ya go.

I assume that:

You already know how to program.
Your main motivation is your curiosity as a programmer.

I will show you:

what CGI is,
some examples & techniques,
some history,
some opinions & anecdotes, and
where you can learn more.

What is CGI?

CGI stands for Common Gateway Interface, which tells you nothing. If you already know that we're in webland, you might figure that the gateway and interface words imply that something is talking to something, & web servers are important in webland, so maybe it's a standard for web servers talking to... what? PayPal?

CGI is one of the most unhelpful acronyms I've seen, but at least it rolls off the tongue.

CGI is one way that a web server can execute your code that creates content on your web site.

The CGI specification & technology are...

old (dates back to 1993),
easy to understand,
light on dependencies (little or no supporting software beyond the web server),
portable,
stable,
flexible, &
defined in RFC 3875.

What is CGI, really?

CGI lets you write stand-alone programs that create dynamic content for your web site. Here's how it works:

Your web server...
1. receives a request & tracks it down to some file.
2. realizes that the file is a CGI program.
3. sets up environment variables, STDIN, & STDOUT.
4. launches your CGI program.
Your CGI program starts. It's in its own process, just like a command line program would be. It...
1. learns about the request from environment variables.
2. sends output to the web browser by printing to STDOUT.
  1. It must send headers first.
  2. It must end the headers by sending an empty line.
  3. It may send more content, most likely HTML content.
3. exits when it's finished (i.e. returns from main or whatever). Its process ends.
Your web server cleans up (probably reaping the dead process that was your CGI program).
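
One way to get a feel for steps 3 and 4 is to play web server yourself at a shell prompt. The sketch below is a toy under stated assumptions, not how any real server is implemented: hello_cgi is a hypothetical stand-in for a CGI program, and run_as_cgi is a made-up helper that sets a few of the RFC 3875 environment variables before launching it.

```sh
#!/bin/sh
# hello_cgi: hypothetical stand-in for a CGI program.
# It reads the request from environment variables and writes
# headers, an empty line, then the body to standard output.
hello_cgi() {
  printf 'Content-type: text/plain\r\n'
  printf '\r\n'
  printf 'method=%s query=%s\r\n' "$REQUEST_METHOD" "$QUERY_STRING"
}

# run_as_cgi: made-up helper that plays the web server's part.
# $1 is the query string; the remaining arguments are the command to run.
run_as_cgi() {
  query=$1
  shift
  # The subshell stands in for the separate process a real server
  # would launch, so the variables vanish when it ends.
  (
    GATEWAY_INTERFACE='CGI/1.1'
    REQUEST_METHOD='GET'
    QUERY_STRING=$query
    export GATEWAY_INTERFACE REQUEST_METHOD QUERY_STRING
    "$@"
  )
}

run_as_cgi 'name=gene' hello_cgi
```

The same trick works with a real CGI program file in place of hello_cgi, which makes it handy for testing without a web server.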

Example

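Here's a super simple CGI program in C.

#include <stdio.h>

int main(void) {
  /* Headers first: just a Content-type header. */
  printf("Content-type: text/plain\r\n");
  /* An empty line ends the headers. */
  printf("\r\n");
  /* Everything after that is the content. */
  printf("Hello, world!\r\n");
  return 0;
}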

That example shows:

It sends the headers first. We send just a Content-type header.
It ends the headers with an empty line.
It sends its content to the browser by printing to standard output.
1. In this example, the content is plain text, which is what the Content-type said we'd send.
It is careful to send CR LF to mark end-of-line.
It exits by returning from main.

The previous example was in C. Here's an equivalent example in Bourne shell to show that you can use any language you prefer when writing CGI programs:

#! /bin/sh

echo "Content-type: text/plain"
echo ""
echo "Hello, world!"

Producing HTML

Make sure you send the correct Content-type. That is, "text/html".
Send your HTML!

#! /bin/sh

echo "Content-type: text/html"
echo ""
echo "<html>"
echo "<head>"
echo "<title>Example that produces HTML</title>"
echo "</head>"
echo "<body>"
echo "<p>HTML, world!</p>"
echo "</body>"
echo "</html>"

Dynamic content

Let's send some content that changes every time. This time, we'll use clisp as another reminder that we can use any language we want (& because I like clisp).

#! /usr/local/bin/clisp

;;;
;;; Because we're going to print a lot to standard output, I'm
;;; defining this short-hand function that uses FORMAT to print
;;; to standard output.
;;;
(defun f (&rest args) (apply 'format t args))

(f "Content-type: text/html~%")
(terpri)

(defun now () (decode-universal-time (get-universal-time)))

(defun day-of-week ()
  (multiple-value-bind (se mi ho da mo ye dow) (now)
    (elt '("Monday" "Tuesday" "Wednesday" "Thursday"
           "Friday" "Saturday" "Sunday")
         dow)))

(f "<html>")
(f "~&<head>")
(f "~&<title>Yet Another Example (Yae!)</title>")
(f "~&</head>")
(f "~&<body>")
(f "~&<p>It's ~A.</p>" (day-of-week))
(f "~&</body>")
(f "~&</html>")
(terpri)

This example shows:

Yet again that you can use any language you want.
Sending the headers right away, then an empty line.
That you can define functions in your script. In this case, I used defun, but that of course depends on the language you are using.

How to learn about the request

Your CGI program can learn about the request from environment variables. Here's a clisp program that dumps all of those environment variables.

#! /usr/local/bin/clisp

;;;
;;; Because we're going to print a lot to standard output, I'm
;;; defining this short-hand function that uses FORMAT to print
;;; to standard output.
;;;
(defun f (&rest args) (apply 'format t args))

(f "Content-type: text/html~%")
(terpri)

;;;
;;; Encode the string for HTML.  In other words, if the string contains
;;; characters that would print one way as plain text but would screw up
;;; HTML if interpreted as (or embedded in) HTML, then you'll get a new
;;; string that will display (about) the same as HTML as would the original
;;; string as plain text.
;;;
(defun encode-html (x)
  (with-output-to-string (strm)
    (loop for c across x do
	(format strm "~A"
		(case c
		  (#\& "&amp;")
		  (#\< "&lt;")
		  (#\> "&gt;")
		  (#\" "&quot;")
		  (otherwise c))))))

(f "<html>")
(f "~&<head>")
(f "~&<title>Yet Another Example (Yae!)</title>")
(f "~&</head>")
(f "~&<body>")

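(f "~&<table>")
(dolist (pair (ext:getenv))
  ;; EXT:GETENV with no argument returns an association list,
  ;; so each pair is a dotted (variable . value) cons.
  (destructuring-bind (var . value) pair
    (f "~&<tr><td>~A</td> <td>~A</td></tr>"
       (encode-html var) (encode-html value))))
(f "~&</table>")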

(f "~&</body>")
(f "~&</html>")
(terpri)

Arguments in the request

From the previous examples, you know how to send content to the web browser, but how does your CGI program learn about the request? Like, what if...fixme
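
For what it's worth, here is a minimal sketch of one common case, assuming the arguments arrive in a GET request: the web server then puts everything after the "?" in the URL into the QUERY_STRING environment variable, which RFC 3875 defines. The helper name query_param is made up for this example, and the sketch does not undo URL encoding (%20 & friends), which a real program would need to do.

```sh
#!/bin/sh
# query_param NAME: print the raw (still URL-encoded) value of NAME
# from QUERY_STRING. Exit status is 1 if NAME is absent.
# The body runs in a subshell, so changing IFS & turning off
# globbing does not leak into the rest of the script.
query_param() (
  wanted=$1
  IFS='&'
  set -f                      # no filename expansion on values like "*"
  for pair in $QUERY_STRING; do
    name=${pair%%=*}          # text before the first "="
    if [ "$name" = "$wanted" ]; then
      case $pair in
        *=*) printf '%s\n' "${pair#*=}" ;;   # text after the first "="
        *)   printf '\n' ;;                  # bare name with no value
      esac
      exit 0
    fi
  done
  exit 1
)

# Stand-in value so the sketch also runs from a shell prompt;
# a web server would set QUERY_STRING for you.
: "${QUERY_STRING:=name=Gene&lang=clisp}"

printf 'Content-type: text/plain\r\n\r\n'
printf 'name is %s\r\n' "$(query_param name)"
```

Remember that the value comes straight from the user, so escape it before embedding it in HTML.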

Libraries

There are precisely 3,302 CGI libraries. There used to be more, but CGI isn't as popular now as it used to be.

CGI is simple enough that it's easy (& fun!) to write your own library.
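
To give an idea of how little such a library needs, here is a minimal sketch in Bourne shell. Every function name in it is made up for illustration; it is one possible starting point, not a recommended design.

```sh
#!/bin/sh
# A tiny home-made CGI library. All names here are made up.

# cgi_html_header: send the Content-type header & the empty line
# that ends the headers.
cgi_html_header() {
  printf 'Content-type: text/html\r\n\r\n'
}

# html_escape: filter standard input so the text is safe to embed
# in HTML. "&" is handled first so the entities added by the later
# substitutions stay intact.
html_escape() {
  sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g' -e 's/"/\&quot;/g'
}

# html_p TEXT: print TEXT as an escaped paragraph.
html_p() {
  printf '<p>%s</p>\n' "$(printf '%s\n' "$1" | html_escape)"
}

cgi_html_header
html_p 'Tom & Jerry say 1 < 2'
```

The html_escape function does the same job as the encode-html function in the clisp example.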

Tips, gotchas, & opinions

Send the headers as soon as possible

Send your headers as soon as possible.
Always send a Content-type header.
Seems to work best if you send the Content-type header first, though that's not part of the standard.
You normally can & should omit the Status header unless you want to achieve a specific result, probably redirection.
I've never sent a Location header. I suspect it's most useful for RESTful web services implemented with CGI.
I could imagine sending cache control headers (such as Don't Cache), but I've never tried it.
Send a Content-length if you can, but in my opinion...
1. It's not worth constructing the reply in memory just so you can calculate the Content-length, because the web server may be caching your result anyway. Seems that your best bet is to start sending output as soon as possible to minimize the time until the user's browser receives something, even if that doesn't reduce the total time to send the entire result & even if omitting the Content-length keeps the user's browser from estimating total transfer time.
2. On unix with plain C, there's no difference between text & binary, but on other OSes & other languages, there is sometimes a difference that can make correct calculation of Content-length difficult. So unless you are sure you can get it right, don't do it.
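The standard prohibits some HTTP headers: RFC 3875 says a script must not return header fields that relate to client-side communication issues & could affect the server's ability to send the response.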
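
A minimal sketch of the Status & Location headers mentioned in the list above, assuming the response forms that RFC 3875 describes: a document response may carry a Status header, and a client redirect is a Location header holding an absolute URL with no other headers. The function names, the /old path, & the example.com URL are made up for illustration.

```sh
#!/bin/sh
# not_found: a document response that overrides the default "200 OK".
# Content-type goes first, as suggested in the list above.
not_found() {
  printf 'Content-type: text/plain\r\n'
  printf 'Status: 404 Not Found\r\n'
  printf '\r\n'
  printf 'No such report.\r\n'
}

# redirect URL: a client redirect. URL must be absolute, & the
# web server is expected to turn this into a 302 response.
redirect() {
  printf 'Location: %s\r\n' "$1"
  printf '\r\n'
}

# Hypothetical routing on PATH_INFO, the extra path that follows
# the script's name in the URL.
case $PATH_INFO in
  /old) redirect 'http://www.example.com/new' ;;
  *)    not_found ;;
esac
```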

If your CGI program sends output before sending headers, the web server will send an error response to the browser (typically a 500 Internal Server Error). So if you are testing your CGI program with a browser, & you added some debug statements that happen to execute before it sends the headers, you'll see an error that isn't the error you're investigating with those debug statements. So in my experience, it's best to send those headers right away. You normally just need to send "Content-type: text/html" & no other headers, so you can simply hard-code that header at the very beginning.
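
One way to keep debug statements from colliding with the headers at all is to print them to standard error instead of standard output. This is a minimal sketch, and debug is a made-up helper name. Where standard error ends up is up to the web server; Apache, for one, writes it to its error log.

```sh
#!/bin/sh
# debug: hypothetical helper that writes to standard error, so its
# output never mixes with the headers & body on standard output.
debug() {
  printf 'debug: %s\n' "$*" >&2
}

printf 'Content-type: text/html\r\n\r\n'    # headers go out first
debug 'starting report'                     # safe at any point
printf '<html><body><p>Report goes here.</p></body></html>\n'
```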

End of line conventions

Do your best to make sure that your CGI program sends CR LF (a.k.a. ASCII 0x0D 0x0A) when terminating a header & for the empty line that ends the headers.
The web server will probably compensate for deviations from this, & some programming languages make it difficult to make sure you sent the CR LF instead of some other EOL convention, but it's best to try to get it right.
The end-of-line convention within the body of your reply depends on the content type. For example, HTML is very forgiving about EOL conventions, & if your content type is an image or other binary, non-text type, the concept of end-of-line doesn't apply at all.
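
In Bourne shell, echo ends each line with a bare LF, so printf is one way to send CR LF explicitly. A minimal sketch follows; crlf_line is a made-up helper name.

```sh
#!/bin/sh
# crlf_line TEXT: print TEXT followed by CR LF (0x0D 0x0A).
# printf expands \r & \n in its format string, so there is no
# guessing about which end-of-line convention was sent.
crlf_line() {
  printf '%s\r\n' "$1"
}

crlf_line 'Content-type: text/plain'
crlf_line ''                 # the empty line that ends the headers
printf 'Hello, world!\n'     # the body's EOL depends on the content type
```

To check what a program really sends, pipe its output through od -c & look for \r \n at the end of each header line.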

Hiding the fact that it's CGI

On Apache, use mod_rewrite.

Beware the current directory

The current working directory for your CGI process is supposed to be the directory that contains the program file. Nevertheless, I suspect it's safer (more secure) if your CGI program relies on absolute pathnames when it accesses other files & programs.
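
If your program needs files that live beside it, one option is to work out the program's own absolute directory once at startup & build every other pathname from that. This is a minimal sketch, assuming $0 holds a usable path to the script; abs_dir & visits.txt are made-up names.

```sh
#!/bin/sh
# abs_dir FILE: print the absolute directory that contains FILE.
# The parentheses run the body in a subshell, so the cd does not
# change the caller's working directory.
abs_dir() (
  CDPATH=
  cd -- "$(dirname -- "$1")" && pwd
)

# Resolve the script's directory once, then use absolute pathnames
# from here on. "visits.txt" is a made-up data file name.
script_dir=$(abs_dir "$0")
data_file=$script_dir/visits.txt

printf 'Content-type: text/plain\r\n\r\n'
printf 'Data file: %s\r\n' "$data_file"
```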

Notes

3,302: Dude, I'm kidding.

End.