CGI
- Common Gateway Interface (CGI)
- Some Alternatives to CGI Scripting
- Sending Form Data
- Environment Variables
- Output of a CGI Script
Common Gateway Interface (CGI)
- The user interface to an Internet application is a web page containing a form for the user to fill out (this page could have been dynamically generated itself)
- As soon as the user submits the form, the web browser collects the form data and, together with the form action, assembles it into a CGI request
- When the web server gets a CGI request, it typically executes the application in a separate process (some servers do not)
- The server passes the parameters received with the request (a name value pair for each field in the form) to the application and collects its output
- This output (usually HTML) is then returned to the client just like a static page
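The server side of this exchange can be sketched in a few lines of Perl. This is only an illustration under stated assumptions, not how any particular web server is implemented: the sub name `run_cgi` and the one-line stand-in script are hypothetical, and the sketch assumes a platform where Perl's list-form pipe `open` is available (any Unix-like system).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a CGI script: echoes the query string it was given.
my $script = 'print "Content-Type: text/plain\n\n", "Got: $ENV{QUERY_STRING}\n";';

# Plays the role of the web server: passes the parameters through the
# environment, runs the script in a separate process, collects its output.
sub run_cgi {
    my ($code, $query) = @_;
    local $ENV{REQUEST_METHOD} = 'GET';
    local $ENV{QUERY_STRING}   = $query;
    # $^X is the path of the running Perl interpreter; '-|' starts a child
    # process and lets us read whatever it writes to its standard output.
    open(my $child, '-|', $^X, '-e', $code) or die "cannot start child: $!";
    my $output = do { local $/; <$child> };
    close($child);
    return $output;
}

print run_cgi($script, 'name=value');
```

The child process sees the request only through its environment and writes its answer to standard output, which is exactly the contract a CGI script relies on.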
An Example CGI Script
- The following CGI script is implemented in Perl
- It computes the local time on the web server and returns the result to the browser
    #!/usr/bin/perl -w
    use strict;

    my $time = localtime;
    print "Content-Type: text/html\n\n";
    print "The current time on this server is: $time";
- It produces the following output:
The current time on this server is: Sat Jan 6 21:58:25 2001
Comments on the Time Server
#!/usr/bin/perl -w
- Tells the server to use the program "/usr/bin/perl" to interpret this script
- Run "which perl" on the command line to find out the location of the Perl interpreter
- The "-w" flag tells the Perl interpreter to produce warnings if there are potential errors in the script
use strict;
- Enables strict rules for variables, subroutines and references
- Helps locate subtle errors, and typos that might not generate a syntax error
- Enforces good programming practices (eg declare all variables)
print "Content-Type: text/html\n\n";
- Outputs header field declaring the content type of the document returned
- This is followed by an additional blank line to indicate that the header is complete
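The header, blank line, body layout can be made explicit with a small helper. This is a minimal sketch; the sub name `build_response` is hypothetical and not part of the original time server script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: joins header fields and a body into one CGI response.
# The empty line between the two is what marks the end of the header.
sub build_response {
    my ($headers, $body) = @_;
    my $head = join '', map { "$_->[0]: $_->[1]\n" } @$headers;
    return $head . "\n" . $body;
}

my $time = localtime;
print build_response(
    [ [ 'Content-Type', 'text/html' ] ],
    "The current time on this server is: $time\n",
);
```

If the blank line is left out, the web server cannot tell where the header ends, which typically shows up as a 500 Internal Server Error.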
Invoking CGI Scripts
- Server typically configured to map a particular path such as "/cgi-bin" to CGI scripts
http://www.host.com/cgi-bin/localtime.cgi
- On Linux, CGI scripts must be made executable by running
> chmod 755 myscript.cgi
- Forgetting to make scripts executable is a common mistake
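To see what a mode such as 755 actually grants, the following sketch turns a numeric mode into the familiar "rwx" notation printed by "ls -l". The sub name `mode_string` is hypothetical and used only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: renders the nine permission bits of a mode as text.
# The three groups are owner, group and others, in that order.
sub mode_string {
    my ($mode) = @_;
    my @flags = qw(r w x r w x r w x);
    my $text  = '';
    for my $i (0 .. 8) {
        my $bit = 1 << (8 - $i);          # highest bit first
        $text .= ($mode & $bit) ? $flags[$i] : '-';
    }
    return $text;
}

print mode_string(0755), "\n";   # prints rwxr-xr-x
print mode_string(0711), "\n";   # prints rwx--x--x
```

Mode 755 lets everyone read and execute the script while only the owner may change it; the web server needs the execute bit to run the script.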
Special Instructions for the Lab
- The web server for this course runs on host sigma10 and port 8000
- This host is best accessed from within the lab, but you can access it externally using SSH (see the instructions on the School Help page)
- HTML files and CGI scripts must be in a special directory
~<user>/course_html
- In addition, CGI scripts must have the extension ".cgi"; however, they don't have to be in a special subdirectory
- For example, assuming the time server script was called "localtime.cgi", the full URL to the script on the sigma10 server would be
sigma10:8000/~<user>/localtime.cgi
- Note that you must also set the permissions for both your home directory and the "course_html" directory to "711"
- Change into your home directory (if you are not there already)
- Change the permissions for the home and "course_html" directories
> cd
> chmod 711 .
> chmod 711 course_html
Some Alternatives to CGI Scripting
- To put CGI into perspective, let us briefly mention some alternatives
- These alternatives all build on CGI's legacy and share an underlying goal
- Handle requests and respond with dynamic content
- Alternatives attempt to avoid the main drawback of CGI scripts
- A separate process is created every time the script is executed
- Another reason is to simplify writing server-side programs, eg by allowing programmers to embed application code within their web pages
List of Alternatives
- ASP (Active Server Pages)
- Interpreter integrated into the web server
- Allows programmers to mix code and HTML on the same page instead of writing separate scripts
- Multiple languages supported (VBScript, JScript)
- Proprietary to Microsoft
- PHP (PHP: Hypertext Preprocessor)
- Interpreter integrated into the web server
- Like ASP and JSP, it supports code embedded in a web page
- Syntax similar to Perl
- Popular, in particular, because of its good integration with MySQL
- Open source
- ColdFusion
- HTML pages can contain tags that call ColdFusion functions
- Developers can create their own functions
- Proprietary to Allaire (since bought by Macromedia)
- Java servlets
- Similar to CGI scripts in that they are programs that generate documents
- Must be compiled before they are run; dynamically loaded by the web server
- Java Server Pages (JSP) allows developers to embed Java code within web pages
- Open source and commercial implementations
Sending Form Data
- There are two types of HTTP requests as indicated by the "method" attribute of the <form> tag: GET and POST
- In a GET request, form data is included with the URL and passed in the request line
- Request parameters are passed to the web server as part of the URL
- This means that the amount of data that can be sent is limited by the maximum length of a URL (the exact limit depends on the browser and server; 1024 chars is a conservative figure)
- In a POST request, parameters are sent as the content of the HTTP request
- HTTP itself places no limit on the size of the content (although a server may be configured to impose one)
- The rationale is then to use POST for any non-trivial request
- A more subtle reason is that the results of a POST request will not be cached by the browser (why is this important?)
- Note that, while the parameters of a POST request are not sent as part of the URL, they are not secret: users can still see the form's fields (including hidden ones) using the browser's "view source"
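The difference between the two methods is easiest to see by writing out the requests themselves. The sketch below builds a simplified HTTP/1.0 request for each method; the sub name `build_request` and the path "/cgi-bin/review.cgi" are hypothetical, and a real browser would send additional header fields.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: shows where the same form data ends up in a GET
# request (in the request line) versus a POST request (in the content).
sub build_request {
    my ($method, $path, $data) = @_;
    if ($method eq 'GET') {
        return "GET $path?$data HTTP/1.0\r\n\r\n";
    }
    return "POST $path HTTP/1.0\r\n"
         . "Content-Type: application/x-www-form-urlencoded\r\n"
         . "Content-Length: " . length($data) . "\r\n"
         . "\r\n"
         . $data;
}

my $data = 'rating=5&summary=Great+read';
print build_request('GET',  '/cgi-bin/review.cgi', $data), "\n";
print build_request('POST', '/cgi-bin/review.cgi', $data), "\n";
```

In the POST case the Content-Length field tells the receiver how many bytes of form data follow the blank line.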
Request Parameters
- Form input fields correspond to request parameters
<input type="checkbox" name="send_email" value="yes">
- In this example, if the checkbox is checked, the parameter "send_email" is sent to the web server with value "yes"
- When a form is submitted, its data is sent in a query string
- In a query string, each parameter is represented as a name value pair
- Name value pairs are separated by ampersands (&)
- In the case of a GET request, the query string is passed with the URL
- Syntactically it is separated from the URL specified in the form action by a "?"
register.cgi?email=joe@cool.com&send_email=yes
- Note that query strings can also include data that is not formatted as name value pairs (in this case, it is up to the CGI script to interpret it)
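Splitting a query string back into its name value pairs takes only a few lines. The following is a minimal sketch: the sub name `parse_pairs` is hypothetical, decoding of special characters is deliberately left out, and if a name occurs more than once only the last value is kept.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: splits a query string into a hash of name value pairs.
sub parse_pairs {
    my ($query) = @_;
    my %params;
    foreach my $pair (split /&/, $query) {
        # Limit of 2 so that a "=" inside the value is left alone
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $params{$name} = defined $value ? $value : '';
    }
    return %params;
}

my %form = parse_pairs('email=joe@cool.com&send_email=yes');
foreach my $name (sort keys %form) {
    print "$name = $form{$name}\n";
}
```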
Encoding of Special Characters
- The browser must encode special characters in the form data before sending them to the web server
- Collects names and values of form input fields
- Replaces special characters in the values with a "%" followed by the character's two-digit hexadecimal code
- Except for spaces, which are replaced with a "+"
- For example, "thanks for your help!" becomes
thanks+for+your+help%21
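The encoding rules above, and their inverse, can be sketched as two short subs. The names `form_encode` and `form_decode` are hypothetical, and the sketch assumes plain ASCII input; a CGI script only needs the decoding half, since the browser does the encoding.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: encodes a value the way a browser encodes form data.
sub form_encode {
    my ($text) = @_;
    # Letters, digits, "_", ".", "*", "-" and spaces are left alone here;
    # every other character becomes "%" plus its two-digit hex code.
    $text =~ s/([^A-Za-z0-9_.* -])/sprintf('%%%02X', ord($1))/ge;
    $text =~ tr/ /+/;                      # spaces become "+"
    return $text;
}

# Hypothetical helper: reverses form_encode.
sub form_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;                      # must happen before %2B is decoded
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

print form_encode('thanks for your help!'), "\n";   # prints thanks+for+your+help%21
print form_decode('thanks+for+your+help%21'), "\n"; # prints thanks for your help!
```

Decoding must happen after the query string has been split on "&" and "=", otherwise an encoded "&" inside a value would be mistaken for a separator.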
A Real Life Example
- Consider, again, the Amazon review form used as an example earlier
- When using a POST method, the form data will be submitted in the content of the HTTP request as follows (line breaks added for readability):
    rating=5&
    summary=Great+read&
    review=This+book+is+extremely+well+written+...&
    x=79&
    y=5
- Note that since the "preview" button does not have a name (consult the form in HTML > Lecture 2), the names "x" and "y" appear without a prefix (such as "preview.x")
Environment Variables
- The web server passes information to CGI scripts through environment variables (GET), and the standard input (POST)
- Script accesses environment variables via a predefined hashtable (%ENV)
- The following script displays all environment variables
    #!/usr/bin/perl -w
    use strict;

    print <<END_OF_HTML;
    Content-Type: text/html

    <table border="1">
    END_OF_HTML

    my $name;
    foreach $name (sort keys %ENV) {
        print <<END_OF_HTML;
    <tr> <th>$name</th> <td>$ENV{$name}</td> </tr>
    END_OF_HTML
    }

    print <<END_OF_HTML;
    </table>
    END_OF_HTML
- Running the script produces a table with the environment variables for the script (the output was edited somewhat to simplify it)
    HTTP_ACCEPT       application/futuresplash, application/rtf, application/sdp, application/x-itool, application/x-rtsp, application/x-shockwave-flash, audio/basic, audio/mpeg, audio/vnd.qcelp, audio/wav, audio/x-aiff, audio/x-midi, image/gif, image/jpeg, image/pict, image/png, image/tiff, image/x-macpaint, image/x-photoshop, image/x-quicktime, image/x-targa, image/x-xbitmap, image/xbm, text/html, text/plain, video/flc, video/quicktime, video/x-msvideo, */*
    HTTP_USER_AGENT   Mozilla/4.5 (compatible; OmniWeb/4.0.6; Mac_PowerPC)
    QUERY_STRING      param1=value1&param2=value2
    REMOTE_ADDR       127.0.0.1
    REQUEST_METHOD    GET
    SCRIPT_NAME       /~mrw/cgi-bin/env.cgi
Common Environment Variables
- REQUEST_METHOD
- Indicates the method by which the request was made
- QUERY_STRING
- Query information (a list of name-value pairs)
- CONTENT_LENGTH
- Length of the data passed to the CGI script through STDIN
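These three variables are enough to fetch the raw form data regardless of the request method. The sketch below is one possible way to do it; the sub name `read_form_data` is hypothetical, and it takes the environment and input filehandle as arguments so that it can be tried outside a web server.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the raw, still encoded form data.
# $env is a reference to a hash such as %ENV; $in is a filehandle such as STDIN.
sub read_form_data {
    my ($env, $in) = @_;
    my $method = $env->{REQUEST_METHOD} || 'GET';
    if ($method eq 'POST') {
        # POST: the data arrives on standard input, CONTENT_LENGTH bytes long
        my $length = $env->{CONTENT_LENGTH} || 0;
        my $data   = '';
        read($in, $data, $length) if $length > 0;
        return $data;
    }
    # GET: the data arrives in the QUERY_STRING environment variable
    return defined $env->{QUERY_STRING} ? $env->{QUERY_STRING} : '';
}

# In a real CGI script the call would be: read_form_data(\%ENV, \*STDIN)
# Here a fake environment and an in-memory filehandle stand in for the server.
my $body = 'rating=5&summary=Great+read';
open(my $fake_stdin, '<', \$body) or die "cannot open in-memory file: $!";
my %fake_env = (REQUEST_METHOD => 'POST', CONTENT_LENGTH => length($body));
print read_form_data(\%fake_env, $fake_stdin), "\n";
```

Reading exactly CONTENT_LENGTH bytes matters because the server is not required to close standard input after the data, so reading until end-of-file may block.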
More Environment Variables
- SCRIPT_NAME
- Path of the script being executed
- PATH_INFO
- Extra path information passed to a CGI script
- REMOTE_HOST
- Remote hostname of the client making the request (could be the address of a proxy between client and server)
- REMOTE_ADDR
- Remote IP address of the client making the request
Still More Environment Variables
- HTTP_USER_AGENT
- Name and version of client's browser
- HTTP_ACCEPT
- List of the content types the client can accept
- HTTP_COOKIE
- Name-value pair stored earlier at the client by the server
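When the client sends several cookies, HTTP_COOKIE holds them as name value pairs separated by semicolons. The sketch below splits such a value into a hash; the sub name `parse_cookies` and the cookie names "session" and "theme" are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: splits the value of HTTP_COOKIE into a hash.
sub parse_cookies {
    my ($header) = @_;
    my %cookies;
    return %cookies unless defined $header;   # no cookies were sent
    foreach my $pair (split /;\s*/, $header) {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $cookies{$name} = defined $value ? $value : '';
    }
    return %cookies;
}

# In a real CGI script the argument would be $ENV{HTTP_COOKIE}
my %cookies = parse_cookies('session=abc123; theme=dark');
foreach my $name (sort keys %cookies) {
    print "$name = $cookies{$name}\n";
}
```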
Output of a CGI Script
- Every CGI script must output at least a header field indicating the content type, a location redirect, or status
- There are three types of header fields the script can output:
- Content-Type
- Specifying the type of content
- Location
- Specifying a URL to redirect the client to
- Status
- With a status code the server should include in the status line
- Each of these will be discussed below
Returning Content
- The most common response for CGI scripts is to output HTML
- A CGI script must indicate which type of content it is returning
print "Content-Type: text/html\n\n";
- By specifying a content type other than HTML you can output other types of documents (eg application/pdf)
- A blank line after the Content-Type tells the web server that this is the last header field (to be followed by the content)
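A script that returns different kinds of documents needs to pick the matching content type. The sketch below does this from a file extension using a small lookup table; the sub names `content_type_for` and `content_header` and the choice of four extensions are assumptions made for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A deliberately small table; a real script would list more types
my %TYPE_FOR = (
    html => 'text/html',
    txt  => 'text/plain',
    pdf  => 'application/pdf',
    png  => 'image/png',
);

# Hypothetical helper: maps a file name to a content type.
sub content_type_for {
    my ($filename) = @_;
    my ($ext) = $filename =~ /\.([^.\/]+)$/;
    $ext = defined $ext ? lc($ext) : '';
    # Unknown extensions fall back to a generic binary type
    return exists $TYPE_FOR{$ext} ? $TYPE_FOR{$ext} : 'application/octet-stream';
}

# Hypothetical helper: the complete header, including the closing blank line.
sub content_header {
    my ($filename) = @_;
    return 'Content-Type: ' . content_type_for($filename) . "\n\n";
}

print content_header('report.pdf');
print "(the bytes of the document would follow here)\n";
```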
Forwarding to Another URL
- Reasons for forwarding:
- If the same message is returned by multiple CGI scripts, forward them to a common document (eg a help page)
- The current script wants to invoke another script (can't do so directly!)
- Output a Location header with the URL of the new location
print "Location: such_and_such_document.html\n\n";
- URLs may be absolute or relative
- Absolute URLs or relative URLs with a relative path are sent back to the browser, which then creates another request for the new URL
- Relative URLs with a full path (eg /index.html) produce an internal request, which is handled by the web server without talking to the browser
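The distinction between the two kinds of redirect depends only on the form of the URL, which the following sketch makes explicit. The sub names `redirect_kind` and `redirect_response` are hypothetical; the classification follows the rules listed above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: says who will handle a redirect to the given URL.
sub redirect_kind {
    my ($location) = @_;
    # A full path such as /index.html is handled inside the web server
    # (a leading "//" would start a host name, so it is excluded)
    return 'internal' if $location =~ m{^/(?!/)};
    # Absolute URLs and relative paths are sent back to the browser
    return 'browser';
}

# Hypothetical helper: the complete output of a redirecting CGI script.
sub redirect_response {
    my ($location) = @_;
    return "Location: $location\n\n";
}

foreach my $url ('/index.html', 'help.html', 'http://www.host.com/help.html') {
    print redirect_kind($url), ": ", redirect_response($url);
}
```

Only the browser redirect changes the URL shown to the user; an internal redirect is invisible to the client.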
Status Codes
- Typically a status code is something that the web server would add to the response by the CGI script, but you can send your own
print "Status: 400 Bad Request\n\n";
- Some common status codes are:
- 200 OK
- The request was processed successfully (added by web server)
- 302 Found
- The resource temporarily resides at the URL provided in the Location header; browsers should follow it but keep using the original URL for future requests
- 400 Bad Request
- Browser sent an invalid request (this one will be detected by the web server)
- 401 Unauthorized
- Requested resource is in a protected realm (invalid password)
- 403 Forbidden
- Client is not allowed to access the requested resource (eg no permissions)
- 500 Internal Server Error
- Something happened on the server that caused a failure (eg a syntax error)
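A script that sends its own status usually also returns a short page explaining the problem. The sketch below combines the two; the sub name `status_response` is hypothetical, the table covers only the codes listed above, and the message is assumed to be safe HTML.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reason phrases for the status codes discussed in this section
my %REASON = (
    200 => 'OK',
    302 => 'Found',
    400 => 'Bad Request',
    401 => 'Unauthorized',
    403 => 'Forbidden',
    500 => 'Internal Server Error',
);

# Hypothetical helper: a complete CGI response with a Status header field.
# $message is inserted as is, so it must not contain untrusted input.
sub status_response {
    my ($code, $message) = @_;
    my $reason = exists $REASON{$code} ? $REASON{$code} : 'Unknown';
    return "Status: $code $reason\n"
         . "Content-Type: text/html\n"
         . "\n"
         . "<h1>$code $reason</h1>\n"
         . "<p>$message</p>\n";
}

print status_response(400, 'The "email" field is missing.');
```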