@Mail System Documentation : v 3.4
The latest version of this file can be found at:
http://atmail.nl/docs/

Large User base Information

1.   Running @Mail for large user bases
      1.2   Understanding how @Mail works

2.   Recommended Hardware setup
      2.1 Web Server
      2.2 Sendmail / SMTP Server  (MTA)
      2.3 SQL database Server

3.   Suggested Performance Enhancements
      3.1 Use mod_perl and Apache as the Web Server
      3.2 Separate SQL Server
      3.3 Create multiple tables to store Emails in the Database
      3.4 Multiple Web Server machines to handle web requests
      3.5 Group of machines to handle the SMTP / Sendmail
      3.6 Use Reiserfs file system over ext2fs (Linux only)

4.   Hotmail case study


This document outlines how to configure @Mail to handle a large user base. The results depend heavily on the speed and storage of the hardware used. @Mail is designed for scalability, and by taking full advantage of an SQL database backend, the performance of your web eMail system can be increased dramatically.

By employing multiple machines to handle Web requests, and a powerful group of machines to handle the SQL database backend, @Mail can fulfil the needs of up to one million active users.

1. How @Mail runs for large userbases

To allow @Mail to handle the load of a large userbase, each function of webemail (web access, email storage, SMTP delivery/receiving) needs a dedicated server for the process, with the load balanced over several machines.

1.2 Understanding how @Mail works:

As the user logs in

         Login Screen: the user enters a username / password

         The Web Server machine sends the username/password to the SQL server. If both match, proceed

         @Mail opens the user's account in the web browser, with the menu bar / sidebar and Inbox loaded

         @Mail queries the SQL server to fetch the user's emails, with a query like the following (where # is the table suffix described in section 3.3):
select * from EmailDatabase_# where EmailTo='$username@$pop3server'
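
The login steps above can be sketched in a few lines. This is an illustration only, not @Mail's actual code: it is written in Python, an in-memory SQLite database stands in for the SQL server, and the Users table, its columns and the function names are assumptions made for the example. Values are bound with placeholders rather than interpolated into the SQL string.

```python
import sqlite3

def open_demo_db():
    # In-memory stand-in for the SQL server; table layouts are hypothetical.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE Users (Account TEXT PRIMARY KEY, Password TEXT)")
    db.execute(
        "CREATE TABLE EmailDatabase_jo "
        "(EmailTo TEXT, EmailFrom TEXT, EmailMessage TEXT)"
    )
    return db

def check_login(db, account, password):
    # Compare the submitted password with the stored one.
    # (A production system would store and compare a password hash.)
    row = db.execute(
        "SELECT Password FROM Users WHERE Account = ?", (account,)
    ).fetchone()
    return row is not None and row[0] == password

def fetch_inbox(db, table, account):
    # Table names cannot be bound as parameters, so validate before use.
    if not table.isidentifier():
        raise ValueError("unexpected table name: %r" % table)
    query = "SELECT EmailFrom, EmailMessage FROM %s WHERE EmailTo = ?" % table
    return db.execute(query, (account,)).fetchall()
```

A request handler would call check_login first and only call fetch_inbox when it returns True.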

Receiving eMail into the system

         A message is sent from a user on the Internet (e.g. ben@CGIsupport.com)

         The dedicated sendmail machine(s) receive the message from ben@CGIsupport.com

         The sendmail machine inserts a new record into the SQL database, for example:
insert into EmailDatabase_# (EmailTo,EmailFrom,EmailMessage) values ('$EmailTo','$EmailFrom','$Message')
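
As a rough sketch of the receiving step, the snippet below parses a raw message and stores it as one record. It is not how @Mail or sendmail implement the hand-off. The function name store_message is hypothetical, SQLite stands in for the SQL server, and the column names are taken from the example insert above.

```python
import sqlite3
from email import message_from_string
from email.utils import parseaddr

def store_message(db, table, raw_message):
    # Table names cannot be bound as parameters, so validate before use.
    if not table.isidentifier():
        raise ValueError("unexpected table name: %r" % table)
    msg = message_from_string(raw_message)
    # parseaddr returns (display name, address); keep only the address.
    email_to = parseaddr(msg.get("To", ""))[1]
    email_from = parseaddr(msg.get("From", ""))[1]
    db.execute(
        "INSERT INTO %s (EmailTo, EmailFrom, EmailMessage) VALUES (?, ?, ?)" % table,
        (email_to, email_from, raw_message),
    )
    db.commit()
    return email_to
```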

The process of sending a user eMail

         The user composes a message in the web browser and clicks 'submit'

         The Web Server machine passes the message to the SMTP server on the sendmail machines; Emailto = [user@host] msg = ....

         The dedicated sendmail machine delivers the message to the external host, or queues the message if an error occurs
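
The hand-off from the web server to the dedicated SMTP machine could look roughly like this. The sketch uses Python's standard smtplib and email modules; the function names and any host name passed to submit are assumptions for illustration, not part of @Mail.

```python
import smtplib
from email.message import EmailMessage

def build_message(sender, recipient, subject, body):
    # Assemble the message the user composed in the browser.
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg

def submit(msg, smtp_host, port=25):
    # Hand the message to the dedicated MTA. Delivery to the external
    # host, and queueing on errors, is then the MTA's job.
    with smtplib.SMTP(smtp_host, port) as smtp:
        smtp.send_message(msg)
```

The web server only needs to reach the internal SMTP host; it never contacts the recipient's mail server itself.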

@Mail is designed so the Server Load can be shared over several machines. The Web Server, SQL server and MTA can be run on different servers, significantly increasing the performance capabilities.

2.   Recommended Hardware Setup

Below is the recommended hardware for a userbase of 1 million users. The hardware suggested can be used as a guide on how to build the webemail system. For userbases above 1 million, implement the same hardware X times over. An understanding of how @Mail's internal functions store emails, user profiles and configuration is required before you implement the hardware.

2.1 Web Server

Description:

Used to handle web requests from users and to store temporary files from users' Inboxes

Number of Machines:

4 - To handle 250,000 users each

Software:

Apache Web Server configured with mod_perl, Perl 5+, and the Perl modules needed to communicate with the SQL server. @Mail set up under Apache

OS:

FreeBSD recommended, with a custom-built kernel dedicated to the task of @Mail. Alternatively, use Linux with the same setup

Server Type:

Pentium Class Machine

CPU:

Pentium 1.5 GHz+. Multiprocessor recommended but not required

RAM:

512MB - 1GB

HardDrives:

IDE or SCSI drives, 8 or 12 GB, with RAID mirroring to a 2nd drive. The hard disk will not be used for intensive I/O, since all user emails and profiles are stored on the master SQL server.

Network Card:

100 Mbit, connected directly to the SQL machine(s)


2.2 Sendmail / SMTP Server  (MTA)

Description:

Used to receive emails from the Internet and to communicate with the SQL server to store the email

Number of Machines:

2 to 4 - To handle 1,000,000 users total

Software:

Sendmail only

OS:

FreeBSD recommended with a custom built kernel dedicated to the task of SMTP

Server Type:

Intel Class Machines

CPU:

Pentium III 700 MHz+

RAM:

128 - 256 MB

HardDrives:

IDE drive, 5-10 GB. The server will not store emails on the local HDD, only the mail queue and temporary files.

Network Card:

100 Mbit, connected directly to the SQL machine(s)


2.3 SQL Database Server

Description:

Used to store emails received by the system and to store user-profiles/preferences for webemail accounts

Number of Machines:

1 - To handle 1,000,000 users each

Software:

SQL Database Software (Oracle recommended; MySQL can also handle the task)

OS:

SunOS recommended with a custom built kernel dedicated to the task of serving a large SQL database.

Server Type:

RISC Class

CPU:

700 MHz+. Dual processor (2 x) required

RAM:

1GB

HardDrives:

IDE or SCSI drives, 100 GB - 150 GB+, with RAID mirroring to a 2nd drive. The hard disk will store all user profiles and emails received by user accounts. The disk size required scales with the number of accounts hosted (1,000,000 users * X amount of disk quota for emails)

Network Card:

100 Mbit, connected directly to the Web Server machine(s)
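
The per-machine figures in this section can be turned into a rough sizing calculation. The sketch below is only arithmetic on the numbers quoted above (250,000 users per web server, 2 to 4 SMTP machines and 1 SQL server per 1,000,000 users). The function name and the per-user quota value are assumptions, and the storage figure is the worst case in which every user fills their quota.

```python
import math

USERS_PER_WEB_SERVER = 250_000      # section 2.1
USERS_PER_SQL_SERVER = 1_000_000    # section 2.3
MTA_PER_MILLION = (2, 4)            # section 2.2: 2 to 4 machines

def size_cluster(users, quota_mb):
    # Number of 1,000,000-user hardware blocks, rounded up.
    blocks = math.ceil(users / 1_000_000)
    return {
        "web_servers": math.ceil(users / USERS_PER_WEB_SERVER),
        "mta_servers": (MTA_PER_MILLION[0] * blocks, MTA_PER_MILLION[1] * blocks),
        "sql_servers": math.ceil(users / USERS_PER_SQL_SERVER),
        # users * quota, in decimal gigabytes, before RAID mirroring.
        "mail_storage_gb": users * quota_mb / 1000,
    }
```

For example, size_cluster(2_500_000, 5) gives 10 web servers, 6 to 12 SMTP machines, 3 SQL servers and 12,500 GB of mail storage for a hypothetical 5 MB quota.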

3.   Suggested Performance Enhancements

3.1 Use mod_perl and Apache as the Web Server
It is highly recommended that you use mod_perl for the web server, since it dramatically improves the response time of requests to the @Mail CGI scripts, caches commonly used functions and uses persistent database connections. Use a custom-compiled Apache web server with only the necessary modules included.
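
The benefit of persistent database connections can be illustrated with a small sketch. Under mod_perl this is commonly achieved with the Apache::DBI module. The Python below only mimics the idea, with SQLite as a stand-in and hypothetical function names: the connection is opened once per process and reused by every later request, instead of being opened and closed per request as a plain CGI script would do.

```python
import sqlite3
from functools import lru_cache

@lru_cache(maxsize=None)
def get_connection(dsn=":memory:"):
    # Opened on the first call for a given DSN, then served from the cache.
    return sqlite3.connect(dsn)

def handle_request(account):
    # Every request in this process shares the same connection object.
    db = get_connection()
    return db.execute("SELECT ?", (account,)).fetchone()[0]
```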

3.2 Maintain a Separate machine (1+) for storing user emails in the Database
To gain full performance, it is highly recommended that you use a dedicated machine to host the SQL server. The system should run a Unix OS with a custom kernel built for the single task of running the SQL server; all other system daemons / features should be disabled. The computer can run on a 32-bit processor, although 64-bit is recommended for larger user bases (Sun / IBM machines). The machine should have adequate RAM to cache entries stored in the Database Server. For 32-bit machines it is recommended to run MySQL; otherwise use Oracle DB where available.

3.3 Create multiple tables to store Emails in the Database
To cut down the size of the EmailDatabase table and to dramatically improve the response time of queries, the table that stores user emails needs to be split into several tables, depending on the Email address the message is for.

For example, a message is sent to john@domain.com. @Mail receives the email and goes to insert the record into the database. The software reads the first 2 characters of the recipient's email address (e.g. jo) and inserts the email into the EmailDatabase_jo table. Doing so decreases the size of each table in the database and allows for faster queries on the table.
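
The table selection described above amounts to a small mapping from address to table name. The following is a sketch under assumptions: the function name is hypothetical, and lowercasing the address and replacing characters other than letters and digits with an underscore are choices made here to keep the table name valid, since the text does not say how @Mail handles those cases.

```python
import re

def email_table_for(address, prefix="EmailDatabase_"):
    # Take the first 2 characters of the recipient address.
    key = address.strip().lower()[:2]
    # Assumption: anything that is not a-z or 0-9 becomes "_".
    key = re.sub(r"[^a-z0-9]", "_", key)
    return prefix + key
```

With this mapping, john@domain.com and joanna@domain.com share the EmailDatabase_jo table, while ben@CGIsupport.com goes to EmailDatabase_be.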

3.4 Set up multiple Web Server machines to handle web requests
By setting up a group of web servers, it is possible to balance the load between several different machines. When a user logs onto the system, they are redirected to a server within the group.
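
One simple way to pick the server a user is redirected to is to hash the username, so the same user always lands on the same machine. This is only one possible policy and the text does not say which one @Mail uses. The host names and function names below are placeholders.

```python
import zlib

# Hypothetical group of web servers.
WEB_SERVERS = [
    "web1.example.com",
    "web2.example.com",
    "web3.example.com",
    "web4.example.com",
]

def pick_server(username, servers=WEB_SERVERS):
    # crc32 is stable across processes, unlike Python's built-in hash().
    index = zlib.crc32(username.lower().encode("utf-8")) % len(servers)
    return servers[index]

def login_redirect_url(username, servers=WEB_SERVERS):
    # URL the login page would redirect the user to.
    return "http://%s/" % pick_server(username, servers)
```

Because the choice depends only on the username, the temporary files a web server keeps for a user stay on one machine. A round-robin choice would spread load more evenly but lose that property.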

3.5 Deploy a group of machines to handle the SMTP serving
With a large number of users live on the system, a large volume of email will be delivered to users active in the system. Since one server alone could not handle the load of hundreds or thousands of emails being delivered to users' Inboxes each minute, a group of machines is required to distribute the load.

3.6 Use Reiserfs file system over ext2fs (Linux only)
Reiserfs is a file system using a plug-in based, object oriented variant on classical balanced tree algorithms. The results, when compared to the conventional block allocation based ext2fs file system running under the same operating system and employing the same buffering code, suggest that these algorithms are overall more efficient.

4.   Large User base Case Study

By learning how Hotmail.com has implemented its webemail service, we can gain extra insight into how to build a similar system using @Mail. Scalability is the key; it is important to choose a reliable and fast platform to host the service on.

See http://www.unix-vs-nt.org/kirch/hotmail.html for reference

“The software giant has attempted to exchange the Sun/Solaris infrastructure of Hotmail with NT since buying it in December 1997. However, the demands of supporting 10 million users reportedly proved too great for NT, and Solaris was reinstated. In a leaked report, sources close to Hotmail said: "... its whole mail server infrastructure is Solaris. NT couldn't handle it. On the web server, they're running MP Pentiums and Apache on FreeBSD. They're moving to Solaris for threads. The engineering team did its best to run NT - and failed. The issue's being escalated." Hotmail is running Apache's /1.2.1 web server which is not available for NT due to technical difficulties. A statement on Apache's website states: "The road to Windows NT has not been a pretty one. Several attempts have been made, both by Apache Group members and outside folks, but due to a lack of stability and a clear consensus on how to manage a true cross-platform development project, NT is not yet a standard platform supported by Apache."”

Hotmail uses FreeBSD machines as the web servers, which communicate with a cluster of Sun machines that store user emails/profiles in an Oracle Database. Hotmail.com was initially created using both mod_perl and CGI. Since Microsoft purchased the service, they have attempted to port the software to Windows, without success.

At Hotmail a group of machines is set up to handle the delivery of emails to user accounts; the MX records of hotmail.com show 6 machines set up with the same MX priority. When a message is sent to user@hotmail.com, the sending server picks any one of the machines in the MX lookup at random.

hotmail.com. 28m26s IN MX 10 mail.hotmail.com.
hotmail.com. 28m26s IN MX 10 mc2.law5.hotmail.com.
hotmail.com. 28m26s IN MX 10 mc4.law5.hotmail.com.
hotmail.com. 28m26s IN MX 10 mc5.law5.hotmail.com.
hotmail.com. 28m26s IN MX 10 mc6.law5.hotmail.com.
hotmail.com. 28m26s IN MX 10 mc7.law5.hotmail.com.
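
The way a sending server chooses among MX records of equal priority can be sketched as follows. The parser assumes the six-field line layout shown in the listing above, and the function names are hypothetical. The sketch only models the selection step, not a real DNS lookup.

```python
import random

def parse_mx(lines):
    # Each line: name, TTL, class, type, preference, mail host.
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) == 6 and fields[3] == "MX":
            records.append((int(fields[4]), fields[5].rstrip(".")))
    return records

def choose_mx(records, rng=random):
    # Lowest preference value wins; ties are broken at random,
    # which spreads incoming mail over the whole group.
    best = min(pref for pref, _ in records)
    candidates = [host for pref, host in records if pref == best]
    return rng.choice(candidates)
```

With the six records above all at preference 10, every machine is an equally likely target for each incoming message.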
