The notion of a "thin-client" is a myth today. Perhaps this will change with
the proliferation of TV- or Palm-top browsers, but the vast majority of web
clients today use a highly functional PC, with plenty of client-side storage and
lots of interesting client-side content.
File upload is a useful feature for virtually any web
application. Here is a sample of how some of our customers are integrating file
upload with their web applications:
Web-based e-mail systems use file upload to add attachments
to messages.
Extranet applications use file upload to send files
among partners, such as certificates of conformance, software updates or
documentation.
Technical support sites use file upload to receive
error logs and defective documents from users.
Intranet document publishing systems use file upload to share
files among users with a friendly web interface.
Graphics libraries use file upload to control
submissions and generate thumbnails.
ISP-hosted storefronts use file upload to send
product images.
Web-based file upload is a vastly superior alternative to
other means of transferring files to a central server over Internet
protocols. Let's examine why.
HTTP vs. FTP
FTP has been the standard mechanism for sending files to
a server since the earliest days of TCP/IP. It is reliable, can take into
account text vs. binary formats across platforms, and there are ubiquitous
clients. However, compared to the flexibility of HTTP, it is deeply lacking.
Let's compare:
Authentication: With FTP uploads you must either manage many
user accounts or allow anonymous access. With uploads via a web application,
the application can determine who is allowed to upload, without a large
administrative burden.
Security: Uploads via HTTP can be sent over SSL so that the
information is encrypted during transmission. There is no means of doing that
using standard FTP.
Ease of configuration: FTP uploads require the administrator to fine-tune
NTFS permissions. With HTTP-based uploads, this can be
determined by your application as well as by the administrator, if desired.
Flexibility: Want to save DOC files in one location and graphics
in another? With FTP, your users have to know that. With a web application,
you can enforce these policies in your application and change them without
disrupting your users.
Power: With a web application, you can limit the size of the
uploaded file dynamically every time it is invoked. You could even change the
size depending on information contained in the same form. Additionally, you
can flush uploads that do not match certain criteria, such as the wrong MIME type or
file contents.
Simplicity and friendliness: A pleasing web page can offer
instructions, advice and on-line help. This is not possible with batch-based FTP.
More importantly, when errors occur, you can provide immediate feedback to the
user and offer corrective action.
Firewall support: Many organizations do not allow out-bound FTP
for security and intellectual property reasons. While this is simply a
configuration issue, most firewalls do allow HTTP uploads.
Supplemental Information: An HTTP upload (using RFC1867) renders
accessible additional information about the upload, such as the user's
original filename. This can be very useful in intranet scenarios.
Upload to a database: Server-side components, such as SA-FileUp,
allow you to upload to an OLE DB database. Try that with FTP!
Performance: Both FTP and HTTP ultimately use the TCP
protocol, which is the primary determinant of transfer performance.
Reliability and Restart: Both FTP and HTTP 1.1 allow for
transfer restart. Unfortunately, many servers, including IIS, do not support
restart of either protocol at this time. FTP restart is apparently coming in
IIS5.
In short, like the web itself, it is the programmability of the server that gives HTTP
uploads their vast advantage over FTP.
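To make the Flexibility and Power points above concrete, here is a minimal sketch of the kind of per-upload policy a web application can enforce. It is written in Python purely for illustration (the listings later in this article are ASP). The directory names, the extension table and the route_upload function are all hypothetical.

```python
import os

# Hypothetical policy table: file extension -> destination directory.
DESTINATIONS = {
    ".doc": "/uploads/documents",
    ".gif": "/uploads/graphics",
    ".jpg": "/uploads/graphics",
}


def route_upload(original_name, size_in_bytes, max_bytes):
    """Return the destination path for an upload, or None to reject it."""
    if size_in_bytes > max_bytes:
        return None  # over the limit chosen for this request

    # The browser may send a full client-side path with either separator,
    # so keep only the base name.
    base = original_name.replace("\\", "/").rsplit("/", 1)[-1]
    extension = os.path.splitext(base)[1].lower()

    directory = DESTINATIONS.get(extension)
    if directory is None:
        return None  # file type not covered by the policy
    return directory + "/" + base


print(route_upload("C:\\My Docs\\spec.DOC", 500, 1000))
print(route_upload("virus.exe", 10, 1000))
```

Because the policy lives on the server, the table or the size limit can change at any time without the users having to know about it.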
Forms of HTTP upload
There are three mechanisms of file upload via HTTP:
RFC1867, PUT and WebDAV.
HTTP Upload Method 1: RFC1867
RFC1867 (http://info.internet.isi.edu/in-notes/rfc/files/rfc1867.txt) stayed
as a proposed standard within the IETF for a while before it ultimately received
the blessing of the W3C in HTML 3.2. It was first implemented by Netscape
in Navigator 2.0, followed by Microsoft as an add-on to IE 3.02 (32-bit) and
native in IE 3.03 (16-bit). It is a very simple yet powerful idea: define a new
type of form field
<INPUT TYPE="FILE">
and add a different encoding scheme to the form itself,
ENCTYPE="multipart/form-data", rather than the typical default.
This encoding scheme is much more efficient at
transferring large amounts of data than the default
"application/x-www-form-urlencoded" form encoding scheme. As you may be aware,
URL encoding has a very limited character set. Anything outside of the character
set must be replaced by '%nn' where nn is the two digit hexadecimal equivalent.
For example, even the common <space> character is replaced by '%20'. If
the browser had to encode entire files using this inefficient scheme, the
transmitted size of the uploaded file could be 2-3 times larger than the original
file! Instead, RFC1867 uses Multipart MIME encoding, as commonly found in e-mail
messages, to transfer large amounts of data with no encoding,
and just a few simple but useful headers around the data.
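The size difference described above is easy to check. The following sketch, in Python for illustration only, percent-encodes a block of binary data and compares the result with the same data wrapped in a hand-built multipart body. The boundary string, field name and file name are made-up values, and the body is a simplified single-file example rather than exactly what any particular browser sends.

```python
from urllib.parse import quote


def urlencoded_size(data):
    """Number of characters needed to percent-encode the given bytes."""
    return len(quote(data, safe=""))


def multipart_body(field_name, filename, data, boundary):
    """Wrap raw bytes in a single-part multipart/form-data body."""
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("ascii")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return head + data + tail  # the file bytes travel unmodified


# 10,240 bytes that cover every possible byte value evenly.
payload = bytes(range(256)) * 40
encoded = urlencoded_size(payload)
body = multipart_body("f1", "sample.bin", payload, "----hypothetical-boundary")

print("original bytes:      ", len(payload))
print("percent-encoded size:", encoded)
print("multipart body size: ", len(body))
```

For evenly distributed binary data the percent-encoded form comes out at roughly two and a half times the original size, while the multipart body adds only the fixed cost of the boundary lines and headers.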
The result looks like a regular HTML form post, but
rather than being, say, 4 KB of form data, it can be megabytes long! RFC1867 also
proposed a number of attributes of the TYPE="FILE" tag that have yet to be adopted by the browser vendors. These include:
ACCEPT: to let the web site restrict the type of file
to be uploaded before receiving the file
SIZE: to set the size of a single filename text box or to
allow multiple files with a single <INPUT> tag
MAXLENGTH: to potentially set, on the client side, the
maximum size of file to be transferred.
Wildcards and directory uploads: neither IE nor
Navigator supports wildcarded names or directories even though this is suggested in the RFC.
Fortunately, both browser vendors implemented the
suggested "Browse..." button so the user can easily pick the file to
be uploaded using the native "Open File..." dialog box.
The use of the VALUE clause is interesting. Normally, it
is intuitive to let the web site preset values of form fields for user
convenience. However, in this case, it could allow a nefarious web site to preset
the name of the file to be uploaded and, coupled with a client-side form submit,
"steal" files off a user's PC without their consent. In the summer of
1997, the CERT, in conjunction with an employee at Bell Labs, issued a security
warning about this, and both Netscape and Microsoft quickly issued patches that
prevent presetting the file to be uploaded (see: http://www.microsoft.com/ie/security/bell.htm).
This is unfortunate, since the original RFC1867 clearly
specified "it is important that a user agent not send any file that the
user has not explicitly asked to be sent." So rather than disabling
presetting the name entirely, the browser vendors could have simply issued an
alert dialog box such as: "Are you sure you want to transmit files x, y, z to the
server?". As a final twist to this, yet another security hole was found in
IE 4.01 in mid-October that allows a web site to circumvent IE's current
security mechanism (see http://www.microsoft.com/windows/ie/security/paste.htm).
HTTP Upload Method 2: HTTP PUT
HTTP 1.1 introduced a new HTTP verb: PUT. When a web
server receives an HTTP PUT and an object name ("/myweb/image/x.gif"), it
will authenticate the user, take the content of the HTTP stream and store it
directly on the web server. Since this could wreak havoc on a web site, it is not
used frequently. It also takes away HTTP's greatest advantage:
programmability of the server. In the case of PUT, the web server
handles the request itself: there is no room for a CGI or ASP application to
step in. The only way for your application to capture a PUT is to operate at the
low-level ISAPI filter level. Most web developers have no interest in this,
with good reason.
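For readers who have not seen one, the sketch below assembles the bytes of a simple PUT request, using Python only to build the text. The host name and the file content are placeholders. Note that the request carries nothing but the target name and the raw content: there are no form fields for an application to inspect, which is part of why the server handles it on its own.

```python
def build_put_request(host, path, body, content_type="application/octet-stream"):
    """Return the raw bytes of a minimal HTTP/1.1 PUT request."""
    header_lines = [
        f"PUT {path} HTTP/1.1",
        f"Host: {host}",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(body)}",
        "",  # blank line that ends the headers
        "",
    ]
    return "\r\n".join(header_lines).encode("ascii") + body


# Placeholder host and a six-byte stand-in for real image data.
request = build_put_request("www.example.com", "/myweb/image/x.gif", b"GIF89a",
                            content_type="image/gif")
print(request.decode("ascii"))
```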
HTTP Upload Method 3: WebDAV
WebDAV (http://www.ietf.org/html.charters/webdav-charter.html) allows
Distributed Authoring and Versioning of web content. It introduces several new
HTTP verbs that permit uploading, locking/unlocking, check-in/check-out of web
content via HTTP. Think of it as a non-proprietary Configuration Management
(e.g. SourceSafe) plus file transfer for the web. Microsoft has publicly
announced that it will be supported in IIS5, Office 2000 and future versions of
IE. ISPs will love it as a replacement for the low-level, often broken,
mechanics of FrontPage server extensions. Note that it will not replace the
FrontPage server extensions: it will simply offer low-level standard services to
support the more sophisticated functions that the server extensions currently
perform. It is via WebDAV that Office 2000 can do those nifty "Save to
web" functions you may have seen
at the October '98 PDC.
Sounds great, right? Well, if all you are interested in
is uploading content, WebDAV is great. It solves many problems. However, if you
need file uploading within your web application, WebDAV will do nothing for you.
Like HTTP PUT, the WebDAV verbs are interpreted by the server, not your web
application. You need to work at the ISAPI filter level to access the WebDAV
verbs and interpret
the content in your application.
HTTP Upload Mechanisms: Conclusion
RFC1867 still remains the most flexible means of
uploading files to your web application. PUT has very limited use. WebDAV is
great for content authors, such as FrontPage users, but will be of little use to
web developers who want to add file
upload to their web application.
ASP Implementation
So we've concluded that RFC1867 is the best way to add file
upload capabilities to your web application. How is it actually implemented?
What tools does Microsoft supply? What other tools are
available?
Microsoft's Posting Acceptor
ASP does not understand the
"multipart/form-data" encoding scheme. Instead, Microsoft provides
the Posting Acceptor (http://www.microsoft.com/iis/support/iishelp/iis/htm/core/pareadme.htm) free of charge.
The Posting Acceptor is an ISAPI application that accepts the upload and then
reposts to an ASP page once the upload is complete. (See also Scott Stanfield's
article in the July '98 issue of MIND.)
SA-FileUp from Software Artisans
SA-FileUp (http://www.softartisans.com/softartisans/saf.html)
was one of the first commercial Active Server Components. Version 1 shipped in
May '97 and is currently in use on thousands of sites world-wide including
microsoft.com. Early betas used a combination of ISAPI filter and Active Server
component for integration with ASP. Microsoft then delivered ASP 1.0b (ASP.DLL
1.15.14.0) that provided a new method: Request.BinaryRead. The BinaryRead method
made available the raw, unprocessed data from the browser to an Active Server
component. Once that was available, SA-FileUp dropped the need for the ISAPI
filter
and now exists purely as an ASP component.
Using Request.BinaryRead, as SA-FileUp does, is mutually
exclusive with the Request.Form object. This makes sense: how could you read the
raw stream of data from the browser and concurrently parse it as if it were form
information? To make life easier for the ASP developer, SA-FileUp reimplements
all of the Request.Form functionality in its own .Form
collection. This makes using SA-FileUp familiar to
ASP coders who are used to using Request.Form.
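To give a feel for what turning the raw stream back into form fields involves, here is a rough sketch in Python. It is not how SA-FileUp is implemented. It simply leans on the MIME parser in Python's standard email package, buffers the whole request in memory, and uses a made-up boundary and sample body. A real upload component would stream the data rather than buffer it.

```python
from email.parser import BytesParser


def parse_form(content_type, raw_body):
    """Split a multipart/form-data body into text fields and uploaded files."""
    # The MIME parser expects headers first, so prepend the request's
    # Content-Type header (which carries the boundary) to the raw body.
    header = f"Content-Type: {content_type}\r\n\r\n".encode("ascii")
    message = BytesParser().parsebytes(header + raw_body)
    if not message.is_multipart():
        raise ValueError("not a multipart/form-data request")

    fields, files = {}, {}
    for part in message.get_payload():
        name = part.get_param("name", header="content-disposition")
        if name is None:
            continue  # not a form-data part
        data = part.get_payload(decode=True)
        filename = part.get_filename()
        if filename is None:
            fields[name] = data.decode("utf-8")  # ordinary form field
        else:
            files[name] = (filename, data)  # uploaded file and its original name
    return fields, files


BOUNDARY = "hypothetical-boundary"
SAMPLE_BODY = (
    f"--{BOUNDARY}\r\n"
    'Content-Disposition: form-data; name="descrip"\r\n'
    "\r\n"
    "Error log from Tuesday\r\n"
    f"--{BOUNDARY}\r\n"
    'Content-Disposition: form-data; name="f1"; filename="error.log"\r\n'
    "Content-Type: text/plain\r\n"
    "\r\n"
    "disk full\r\n"
    f"--{BOUNDARY}--\r\n"
).encode("ascii")

fields, files = parse_form(f"multipart/form-data; boundary={BOUNDARY}", SAMPLE_BODY)
print(fields)
print(files)
```

The fields dictionary plays the role of a Form collection here, and the files dictionary keeps the user's original filename alongside the content.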
Comparison of Posting Acceptor and SA-FileUp
Here is as objective a comparison as possible between Posting Acceptor (PA)
and SA-FileUp:
ASP Integration: SA-FileUp is fully scriptable
by Active Server Pages. Rather than existing as a separate ISAPI DLL,
SA-FileUp integrates very smoothly with your ASP application.
Standards support: PA Upload from IE browsers
uses the proprietary WebPost API, rather than the standard RFC 1867, so by
default you need different forms for Netscape and IE users.
Anonymous Connections: Since PA uses an ISAPI
DLL, it must provide additional security protection outside of your ASP
application. For this reason, PA disallows all anonymous connections by
default. PA 1.1 can allow anonymous uploads, but since there is no programmatic
control of the upload, there is a considerable security risk here. Since
SA-FileUp is integrated with ASP, your application can decide the appropriate
level of security, including anonymous.
Control of the Upload: PA does not allow any control of
the upload as it is being sent. With SA-FileUp, you can limit the size of the
upload, or decide at run-time to flush the upload. Best of all, you can
change the location of the upload dynamically.
Processing: PA uses a two-step upload-and-repost
process. With SA-FileUp, everything can be accomplished in a single step,
such as writing to a database depending on the status of the upload.
Uploading to a Database: PA can only upload to files.
SA-FileUp can upload to files as well as databases.
"Spaces in filenames": PA has a known issue when
processing filenames that contain spaces. SA-FileUp has no such
restriction.
Price: PA
is bundled with NT Option Pack and free for download from Microsoft.
SA-FileUp is not free: it is a supported commercial component.
Scott Stanfield, President of Vertigo Software (http://www.vertigosoftware.com ),
author of the Posting Acceptor article for the July '98 MIND Magazine, wrote
to Software Artisans upon learning about SA-FileUp after the MIND article was
published:
"We were very excited to learn
about [SA-FileUp]. Fantastic and very valuable product"
Common Support Issues
By far, the most common support issues for file upload
are security related. Typically, a site has secured NTFS permissions
too restrictively, which prevents the anonymous user account from
writing to the destination file location. Also, security is often misunderstood
by even advanced server administrators.
Remember that IIS/ASP executes each ASP page in a
specific security context. If no authentication mechanism is in place (no Basic,
no NT Challenge/Response), each page is executed as the anonymous user. The NT
account that corresponds to the anonymous user can be set by the web
admin.
For IIS3, the default anonymous user is
IUSR_<computername>.
For IIS4, the default anonymous user is
IUSR_<computername> for all in-process web applications ("Run in a
separate memory space" is not checked). The default
anonymous user is IWAM_<computername> for all out of process web
applications ("Run in a separate memory space" is
checked).
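The defaults just listed amount to a small decision table. The sketch below expresses it in Python for illustration. The function name and its parameters are invented here, and it covers only the default accounts, not one that a web admin has reconfigured.

```python
def default_anonymous_account(iis_version, computer_name, out_of_process=False):
    """Return the default NT account that anonymous requests run as.

    out_of_process corresponds to "Run in a separate memory space"
    being checked for the web application (IIS4 only).
    """
    if iis_version == 4 and out_of_process:
        return f"IWAM_{computer_name}"
    if iis_version in (3, 4):
        return f"IUSR_{computer_name}"
    raise ValueError(f"no default known for IIS version {iis_version}")


# WEBSRV1 is a placeholder computer name.
print(default_anonymous_account(3, "WEBSRV1"))
print(default_anonymous_account(4, "WEBSRV1"))
print(default_anonymous_account(4, "WEBSRV1", out_of_process=True))
```

Whichever account this turns out to be is the one that needs permissions on the upload directory.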
When using SA-FileUp, you must ensure that the
destination directory grants Read, Write and Delete permissions to the appropriate
user.
If authentication is in force, then IIS/ASP will
impersonate the authenticated user during the execution of the ASP page. So, the
authenticated user's NT login account must have Read, Write and Delete permissions to
the destination directory.
A complete discussion of IIS security is beyond the
scope of this article. Please see the IIS 4 Resource Kit for a very good
explanation.
Let's See Some Code
So enough theory, let's see what the ASP code looks
like.
A Single File Upload
Here is a simple HTML form that will upload a single file:
<HTML> <HEAD>
<TITLE>Please Upload Your File</TITLE>
</HEAD>
<BODY>
<form enctype="multipart/form-data" method="post" action="formresp.asp">
Enter filename to upload: <input type="file" name="f1"><br>
<input type="submit">
</form>
</BODY>
</HTML>
Here would be the file 'formresp.asp':
<%@ LANGUAGE="VBSCRIPT" %>
<HTML><HEAD>
<TITLE>Upload File Results</TITLE>
</HEAD>
<BODY>
Thank you for uploading your file.<br>
<% Set upl = Server.CreateObject("SoftArtisans.FileUp") %>
<% upl.SaveAs "C:\temp\upload.out" %><BR>
Total Bytes Written: <%=upl.TotalBytes%>
</BODY>
</HTML>
File Upload with Additional Form Elements
Adding additional form elements is easy. The form behaves just like a normal HTML
form, as long as the ENCTYPE is specified correctly:
<HTML> <HEAD>
<TITLE>Please Upload Your File</TITLE>
</HEAD>
<BODY>
<form enctype="multipart/form-data" method="post" action="mformresp.asp">
Enter description: <input type="text" name="descrip"><br>
Enter filename to upload: <input type="file" name="f1"><br>
<input type="submit">
</form>
</BODY>
</HTML>
Here would be the file 'mformresp.asp':
<%@ LANGUAGE="VBSCRIPT" %>
<HTML><HEAD>
<TITLE>Upload File Results</TITLE>
</HEAD>
<BODY>
Thank you for uploading your file.<br>
<% Set upl = Server.CreateObject("SoftArtisans.FileUp") %>
<% upl.SaveAs "C:\temp\upload.out" %><BR>
Your description is: '<%=upl.Form("descrip")%>'<BR>
Total Bytes Written: <%=upl.TotalBytes%>
</BODY>
</HTML>
What About Multiple Files?
For multiple files, since the browsers do not support the SIZE= attribute,
you must use an additional <INPUT> tag for every file:
Enter first filename: <input type="file" name="f1"><br>
Enter second filename: <input type="file" name="f2"><br>
The form processing is the same:
<%@ LANGUAGE="VBSCRIPT" %>
<HTML><HEAD>
<TITLE>Multiple File Upload Results</TITLE>
</HEAD>
<BODY>
Thank you for uploading your files.<br>
<% Set upl = Server.CreateObject("SoftArtisans.FileUp") %>
<% upl.Form("f1").SaveAs "C:\temp\upload1.out" %><BR>
Total Bytes Written for file 1: <%=upl.Form("f1").TotalBytes%>
<% upl.Form("f2").SaveAs "C:\temp\upload2.out" %><BR>
Total Bytes Written for file 2: <%=upl.Form("f2").TotalBytes%>
</BODY>
</HTML>
Limiting the Size of the Upload
To limit the size of the upload, simply set a property:
<%@ LANGUAGE="VBSCRIPT" %>
<HTML><HEAD>
<TITLE>Upload File Results</TITLE>
</HEAD>
<BODY>
Thank you for uploading your file.<br>
<% Set upl = Server.CreateObject("SoftArtisans.FileUp") %>
<% upl.MaxBytes = 1000 '--- limit the upload size to 1000 bytes %>
The maximum size that you are permitted to upload is <%=upl.MaxBytes%> bytes per file.<br>
<% upl.SaveAs "C:\temp\upload.out" %>
Total Bytes Written: <%=upl.TotalBytes%><br>
Server Filename: <%=upl.ServerName%><br>
Total Bytes Transmitted by you: <%=Request.TotalBytes%>
</BODY>
</HTML>
Any content after the 1000th byte will be discarded, so the web server's
disks are not unnecessarily filled.
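The effect of such a limit can be pictured with the following sketch, written in Python for illustration. It copies an incoming stream to a destination in chunks and stops writing once the limit is reached, while continuing to read and drop the rest. The function name, the chunk size and the choice to drain the remainder are assumptions of this sketch, not a description of SA-FileUp's internals.

```python
import io


def save_capped(source, destination, max_bytes, chunk_size=4096):
    """Copy at most max_bytes from source to destination.

    Returns the number of bytes actually written. Anything past the
    limit is read and thrown away so the request is fully consumed.
    """
    written = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break  # end of the upload
        room = max_bytes - written
        if room > 0:
            kept = chunk[:room]  # keep only what still fits under the limit
            destination.write(kept)
            written += len(kept)
    return written


upload = io.BytesIO(b"x" * 2500)  # stand-in for the incoming upload
saved = io.BytesIO()              # stand-in for the file on disk
total = save_capped(upload, saved, max_bytes=1000, chunk_size=300)
print("bytes written:", total)
```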
Conclusion
Uploading files to your web application is simple: it can be accomplished in
as little as two lines of ASP code. HTTP/RFC1867 file upload is the preferred
mechanism because of the rich programming environment offered by the server.
SA-FileUp, as an Active Server component integrated with ASP, offers significant
advantages over the free Posting Acceptor from Microsoft.
David Wihl is President of Software Artisans, Inc. (http://www.softartisans.com), in
Brookline, MA, a rapidly growing provider of high performance Active Server
Components. He was the original author of SA-FileUp and is still deeply involved
with its exciting upcoming enhancements. He can be reached at wihl@softartisans.com
SA-FileUp™ is a trademark of Software Artisans, Inc.
All other trademarks are the property of their respective owners.