Large User base Information
1. Running @Mail for large user bases
1.2 Understanding how @Mail works
2. Recommended Hardware setup
2.1 Web Server
2.2 Sendmail / SMTP Server (MTA)
2.3 SQL database Server
3. Suggested Performance Enhancements
3.1 Use mod_perl and Apache as the Web Server
3.2 Separate SQL Server
3.3 Create multiple tables to store Emails in the Database
3.4 Multiple Web Server machines to handle web requests
3.5 Group of machines to handle the SMTP / Sendmail
3.6 Use Reiserfs file system over ext2fs (Linux only)
4. Hotmail case study
This document outlines how to configure @Mail to handle a large user
base. The results depend heavily on the speed and storage capacity of
the hardware used. @Mail is designed for scalability, and taking full
advantage of an SQL database backend dramatically increases the
performance of your web email system.
By employing multiple machines to handle web requests, and a powerful
group of machines to handle the SQL database backend, @Mail can fulfill
the needs of up to one million active users.
1. How @Mail runs for large user bases
To allow @Mail to handle the load of a large user base, each function
of web email (web access, email storage, SMTP delivery/receiving) needs
a dedicated server, with the load balanced over several machines.
1.2 Understanding how @Mail works
As the user logs in
Login screen: the user enters a username and password.
The web server machine sends the username and password to the SQL server. If both match, the login proceeds.
@Mail opens the user's account in the web browser, with the menu bar, sidebar and Inbox loaded.
@Mail communicates with the SQL server to fetch the user's emails; it sends a query to the SQL server like:
select * from EmailDatabase_# where EmailTo='$username@$pop3server'
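The login and Inbox steps above can be sketched in Python as follows. This is only an illustration, not @Mail's own code: @Mail itself runs as Perl CGI scripts, an in-memory SQLite database stands in for the SQL server, and the Users table and its Username / Password columns are hypothetical names (only EmailDatabase_#, EmailTo, EmailFrom and EmailMessage appear in this document). Values are bound as placeholders rather than pasted into the SQL string.

```python
import sqlite3

def check_login(conn, username, password):
    """Return True if the pair matches a row in the (hypothetical) Users table."""
    # A production system would compare password hashes, not plain text.
    row = conn.execute(
        "SELECT 1 FROM Users WHERE Username = ? AND Password = ?",
        (username, password),
    ).fetchone()
    return row is not None

def fetch_inbox(conn, table, address):
    """Return (EmailFrom, EmailMessage) rows addressed to `address`."""
    # Table names cannot be bound as parameters, so validate before interpolating.
    if not table.replace("_", "").isalnum():
        raise ValueError(f"unexpected table name: {table!r}")
    cur = conn.execute(
        f"SELECT EmailFrom, EmailMessage FROM {table} WHERE EmailTo = ?",
        (address,),
    )
    return cur.fetchall()

# Stand-in for the SQL server, filled with one user and one message.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (Username TEXT, Password TEXT)")
conn.execute(
    "CREATE TABLE EmailDatabase_jo (EmailTo TEXT, EmailFrom TEXT, EmailMessage TEXT)"
)
conn.execute("INSERT INTO Users VALUES (?, ?)", ("john", "secret"))
conn.execute(
    "INSERT INTO EmailDatabase_jo VALUES (?, ?, ?)",
    ("john@domain.com", "ben@CGIsupport.com", "Hello"),
)

if check_login(conn, "john", "secret"):
    for sender, body in fetch_inbox(conn, "EmailDatabase_jo", "john@domain.com"):
        print(sender, body)
```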
Receiving email into the system
A message is sent from a user on the Internet (e.g. ben@CGIsupport.com).
The dedicated sendmail machine(s) receive the message from ben@CGIsupport.com.
The sendmail machine inserts a new record into the SQL database.
(Example) Insert into EmailDatabase_# (EmailTo,EmailFrom,EmailMessage)
Values ('$EmailTo','$EmailFrom','$Message')
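A minimal Python sketch of the insert step, assuming an in-memory SQLite database as a stand-in for the SQL server (how the sendmail machine actually hands the message to the database is not described here). The function name store_message is hypothetical. Binding the values as placeholders means a message containing quote characters cannot break the statement.

```python
import sqlite3

def store_message(conn, table, email_to, email_from, message):
    """Insert one received message into the given email table."""
    # Table names cannot be bound as parameters, so validate before interpolating.
    if not table.replace("_", "").isalnum():
        raise ValueError(f"unexpected table name: {table!r}")
    conn.execute(
        f"INSERT INTO {table} (EmailTo, EmailFrom, EmailMessage) VALUES (?, ?, ?)",
        (email_to, email_from, message),
    )
    conn.commit()

mail_db = sqlite3.connect(":memory:")  # stand-in for the production SQL server
mail_db.execute(
    "CREATE TABLE EmailDatabase_jo (EmailTo TEXT, EmailFrom TEXT, EmailMessage TEXT)"
)
store_message(
    mail_db, "EmailDatabase_jo",
    "john@domain.com", "ben@CGIsupport.com", "It's working",
)
```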
The process of sending an email
The user composes a message in the web browser and clicks 'submit'.
The web server machine communicates with the SMTP server on the sendmail machines; Emailto = [user@host] msg = ....
The dedicated sendmail machine delivers the message to the external host, or queues the message if an error occurs.
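The hand-off from the web server to the sendmail machine can be sketched with Python's standard smtplib. This is an assumption about the shape of the exchange, not @Mail's own code; the function names and any host name passed in are hypothetical. Delivery to the external host, and queueing on error, remain the MTA's job.

```python
import smtplib
from email.message import EmailMessage

def build_message(sender, recipient, subject, body):
    """Assemble the message composed in the web browser."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg

def hand_to_mta(msg, smtp_host, port=25, timeout=30):
    """Pass the message to a dedicated sendmail machine over SMTP."""
    with smtplib.SMTP(smtp_host, port, timeout=timeout) as smtp:
        smtp.send_message(msg)

outgoing = build_message("john@domain.com", "user@host.example", "Hi", "Hello\n")
# hand_to_mta(outgoing, "smtp1.example.com")  # hypothetical sendmail machine
```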
@Mail is designed so the Server Load can be shared over several machines.
The Web Server, SQL server and MTA can be run on different servers,
significantly increasing the performance capabilities.
2. Recommended Hardware Setup
Below is the recommended hardware for a user base of 1 million users.
Use it as a guide for building the web email system. For user bases
above 1 million, replicate the same hardware as many times as needed.
You should understand how @Mail stores emails, user profiles and
configuration before you implement the hardware.
| 2.1 Web Server |
| Description: | Used to handle web requests from users and store temporary files from users' Inboxes |
| Number of Machines: | 4 - To handle 250,000 users each |
| Software: | Apache Web Server configured with mod_perl, Perl 5+, and the Perl modules needed to communicate with the SQL server. @Mail set up under Apache |
| OS: | FreeBSD recommended, with a custom-built kernel dedicated to the task of @Mail. Alternatively, use Linux with the same setup |
| Server Type: | Pentium Class Machine |
| CPU: | Pentium 1.5 GHz+. Multiprocessor recommended but not required |
| RAM: | 512MB - 1GB |
| HardDrives: | IDE or SCSI drives. 8 or 12GB with RAID mirroring to a 2nd drive. The hard disk will not be used for intensive I/O, since all user emails and profiles are stored on the master SQL server. |
| Network Card: | 100 Mbit, connected directly to the SQL machine(s) |
| 2.2 Sendmail / SMTP Server (MTA) |
| Description: | Used to receive emails from the Internet and to communicate with the SQL server to store the email |
| Number of Machines: | 2 to 4 - To handle 1,000,000 users total |
| Software: | Sendmail only |
| OS: | FreeBSD recommended, with a custom-built kernel dedicated to the task of SMTP |
| Server Type: | Intel Class Machines |
| CPU: | Pentium III 700 MHz+ |
| RAM: | 128 - 256 MB |
| HardDrives: | IDE drive. 5-10 GB. The server will not store emails on the local HDD, only the mail queue and temporary files. |
| Network Card: | 100 Mbit, connected directly to the SQL machine(s) |
| 2.3 SQL Database Server |
| Description: | Used to store emails received by the system and to store user profiles/preferences for web email accounts |
| Number of Machines: | 1 - To handle 1,000,000 users each |
| Software: | SQL Database Software (Oracle recommended; MySQL can also handle the task) |
| OS: | SunOS recommended, with a custom-built kernel dedicated to the task of serving a large SQL database. |
| Server Type: | RISC Class |
| CPU: | 700 MHz+. 2 x multiprocessor required |
| RAM: | 1GB |
| HardDrives: | IDE or SCSI drives. 100GB - 150GB+ with RAID mirroring to a 2nd drive. The hard disk will store all user profiles and emails received by user accounts. The disk can be as large as the number of accounts hosted (1,000,000 users * X amount of disk quota for emails) |
| Network Card: | 100 Mbit, connected directly to the Web Server machine(s) |
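The disk sizing rule in the SQL server table (number of users * disk quota) can be worked through with a short Python sketch. The 5 MB quota is a made-up figure for illustration, and both function names are hypothetical.

```python
def worst_case_storage_gb(users, quota_mb):
    """Disk needed if every account fills its quota, in GB (1 GB = 1024 MB)."""
    return users * quota_mb / 1024

def average_space_per_user_mb(disk_gb, users):
    """Average space available per account on a disk of the given size."""
    return disk_gb * 1024 / users

# 1,000,000 users with a hypothetical 5 MB quota each:
print(worst_case_storage_gb(1_000_000, 5))        # 4882.8125
# The 150GB disk suggested above, shared by 1,000,000 accounts:
print(average_space_per_user_mb(150, 1_000_000))  # roughly 0.15 MB per account
```

Read this way, the 100GB - 150GB figure only holds if average mailbox usage stays far below the quota; the worst-case number is the one to plan toward as quotas grow.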
3. Suggested Performance Enhancements
3.1 Use mod_perl and Apache as the Web Server
It is highly recommended that you use mod_perl for the web server,
since it dramatically reduces the response time of requests to the
@Mail CGI scripts, caches commonly used functions and uses persistent
database connections. Use a custom-compiled Apache web server with only
the necessary modules included.
3.2 Maintain a separate machine (1+) for storing user emails in the
Database
To gain full performance, it is highly recommended that you use a
dedicated machine to host the SQL server. The system should run a Unix
OS with a custom kernel built for the single task of running the SQL
server; all other system daemons and features should be disabled.
The computer can run on a 32-bit processor, although 64-bit is
recommended for larger user bases (Sun / IBM machines). The machine
should have adequate RAM to cache entries stored in the database server.
For 32-bit machines it is recommended to run MySQL; otherwise use
Oracle DB where available.
3.3 Create multiple tables to store Emails in the Database
To cut down the size of the EmailDatabase table and to dramatically
reduce the response time of queries, the table that stores user emails
needs to be split into several tables, depending on the email address
the message is for.
For example, a message is sent to john@domain.com. @Mail receives the
email and goes to insert the record into the database. The software
reads the first 2 characters of the recipient's email address (e.g. jo)
and inserts the email into the EmailDatabase_jo table. Doing so
decreases the size of each table in the database and allows for faster
queries on the table.
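The table selection rule can be sketched in a few lines of Python. Two details are assumptions, since this document does not say how @Mail treats them: the address is lowercased first, and any prefix that is not two letters or digits falls back to a hypothetical EmailDatabase_other table so the result is always a safe table name.

```python
import re

def email_table_for(address, prefix="EmailDatabase_"):
    """Return the name of the table that stores mail for `address`."""
    key = address.strip().lower()[:2]
    # Assumption: anything other than two letters/digits goes to a shared bucket.
    if not re.fullmatch(r"[a-z0-9]{2}", key):
        key = "other"
    return prefix + key

print(email_table_for("john@domain.com"))  # EmailDatabase_jo
```

The same function has to be used on both paths, when the sendmail machine inserts a message and when the web server queries the Inbox, otherwise a message lands in one table and is looked up in another.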
3.4 Set up multiple Web Server machines to handle web requests
By setting up a group of web servers, it is possible to balance the
load between several different machines. When a user logs onto the
system, they are redirected to a server within the group.
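One way to choose the server at login is sketched below in Python. How @Mail picks a server is not specified in this document, so hashing the username is an assumption; it has the side effect that a given user always lands on the same machine, which keeps that user's temporary files in one place. The host names are hypothetical.

```python
import hashlib

# Hypothetical group of four web servers, as in the hardware table.
WEB_SERVERS = [
    "web1.example.com",
    "web2.example.com",
    "web3.example.com",
    "web4.example.com",
]

def pick_web_server(username, servers=WEB_SERVERS):
    """Map a username to one server in the group, the same one every time."""
    digest = hashlib.sha256(username.lower().encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]

def redirect_url(username):
    """URL the login page would redirect this user to."""
    return f"http://{pick_web_server(username)}/"

print(redirect_url("john"))
```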
3.5 Deploy a group of machines to handle SMTP serving
With a large number of users live on the system, a large volume of
email will be delivered to active users. Since one server alone could
not handle the load of hundreds or thousands of emails being delivered
to users' Inboxes each minute, a group of machines is required to
distribute the load.
3.6 Use Reiserfs FS over ext2fs (Linux only)
Reiserfs is a file system using a plug-in based, object-oriented
variant on classical balanced tree algorithms. The results, when compared
to the conventional block allocation based ext2fs file system running
under the same operating system and employing the same buffering code,
suggest that these algorithms are more efficient overall.
4. Large User Base Case Study
By learning how Hotmail.com implemented its web email service, we can
gain extra insight into how to build a similar system using @Mail.
Scalability is the key: it is important to choose a reliable and fast
platform to host the service on.
See http://www.unix-vs-nt.org/kirch/hotmail.html for reference.
“The software giant has attempted to exchange the Sun/Solaris infrastructure
of Hotmail with NT since buying it in December 1997. However, the demands
of supporting 10 million users reportedly proved too great for NT, and
Solaris was reinstated. In a leaked report, sources close to Hotmail
said: "... its whole mail server infrastructure is Solaris. NT
couldn't handle it. On the web server, they're running MP Pentiums and
Apache on FreeBSD. They're moving to Solaris for threads. The engineering
team did its best to run NT - and failed. The issue's being escalated."
Hotmail is running Apache's /1.2.1 web server which is not available
for NT due to technical difficulties. A statement on Apache's website
states: "The road to Windows NT has not been a pretty one. Several
attempts have been made, both by Apache Group members and outside folks,
but due to a lack of stability and a clear consensus on how to manage
a true cross-platform development project, NT is not yet a standard
platform supported by Apache."”
Hotmail uses FreeBSD machines as the web servers, which communicate with
a cluster of Sun machines that store user emails and profiles in an
Oracle database. Hotmail.com was initially created using both mod_perl
and CGI. Since Microsoft purchased the service, it has attempted to port
the software over to Windows, without success.
At Hotmail a group of machines is set up to handle the delivery of
emails to user accounts; the MX records of hotmail.com show that 6
machines are set up with the same MX priority. When a message is sent to
user@hotmail.com, the email is sent to a randomly chosen machine from
the MX lookup.
hotmail.com. 28m26s IN MX 10 mail.hotmail.com.
hotmail.com. 28m26s IN MX 10 mc2.law5.hotmail.com.
hotmail.com. 28m26s IN MX 10 mc4.law5.hotmail.com.
hotmail.com. 28m26s IN MX 10 mc5.law5.hotmail.com.
hotmail.com. 28m26s IN MX 10 mc6.law5.hotmail.com.
hotmail.com. 28m26s IN MX 10 mc7.law5.hotmail.com.
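How a sending mail server might choose among these records can be sketched in Python: keep the hosts with the lowest MX preference number, then pick one at random. This is a simplified model of what a sending MTA does, not Hotmail's own code, and the function name is hypothetical. The record list is copied from the lookup above.

```python
import random

# (preference, host) pairs from the MX lookup above.
MX_RECORDS = [
    (10, "mail.hotmail.com."),
    (10, "mc2.law5.hotmail.com."),
    (10, "mc4.law5.hotmail.com."),
    (10, "mc5.law5.hotmail.com."),
    (10, "mc6.law5.hotmail.com."),
    (10, "mc7.law5.hotmail.com."),
]

def pick_mx(records, rng=random):
    """Choose one host at random among those sharing the lowest preference."""
    lowest = min(preference for preference, _ in records)
    candidates = [host for preference, host in records if preference == lowest]
    return rng.choice(candidates)

print(pick_mx(MX_RECORDS))
```

Because all six records share preference 10, incoming mail is spread roughly evenly over the six machines without any extra load-balancing hardware.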