Advanced techniques for squeezing maximum performance out of a single server
It is funny, but while developing a product, a programmer rarely asks himself: can 2000 people press the same button at the same time? It turns out they can. Curiously enough, most engines were written by programmers whose code does not hold up under heavy load. Who would have thought that one extra INSERT, a missing index, or a crooked recursive function could significantly raise the load average?
This article describes how the developers of the project squeezed maximum performance out of a single server with a Pentium 4 HT and 512 MB of RAM, handling 700 simultaneous users on the forum and 120,000 on the tracker. It also shares some general highload details along the way.
Here is the project:
TorrentPier, an ordinary BitTorrent tracker engine (primarily based on phpBB 2.x):
• Server running FreeBSD 6.0
• Pentium 4 HT / 512 MB RAM
• Web server: Apache
• Database: MySQL
• Application logic: PHP
That is almost a classic LAMP stack.
Here are the steps:
• Installing an opcode cache on the server
• Replacing Apache with nginx
• Caching some frequently repeated queries outside the RDBMS
• Rewriting the key part (i.e., the tracker) in C++
• Optimizing the FreeBSD network stack and updating it to the latest STABLE
• Optimizing MySQL
• Caching rendered BB-code
• Rewriting search to use SphinxSearch
• Profiling the code and setting up monitoring
• Parsing queries from the MySQL slow query log
Here are more details on each of the steps listed above.
Installing an opcode cache on the server
This one is always needed! Installing a PHP opcode cache gave a 300% performance boost for 15 minutes of work.
There are several caches to choose from: eAccelerator, XCache, APC, and so on. APC was chosen for its good speed and its ability to also store user data.
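In that era, setting up APC came down to installing the extension and a few php.ini lines. The fragment below is an illustrative sketch; the values are assumptions, not the project's actual configuration:

```ini
; Illustrative php.ini fragment for APC (values are assumptions,
; not the project's actual settings)
extension = apc.so
apc.enabled  = 1
apc.shm_size = 64M   ; shared memory for the opcode and user caches
apc.ttl      = 3600  ; seconds before a stale entry may be evicted
apc.stat     = 1     ; re-check file mtimes (0 is faster, but needs a restart on deploy)
```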
Replacing Apache with nginx
Apache is heavy and slow. At first it was the main web server; then nginx was put in front of it to serve static files and compress responses with gzip. Later, the nginx + php-fpm combination (initially spawn_fcgi, but php-fpm is the better option now) was preferred over Apache. That combination was not yet popular in production in those days, but it worked great!
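A minimal nginx configuration sketch of that split: gzip on, static files served directly, PHP handed off to php-fpm over FastCGI. The server name, paths, and socket are assumptions for illustration:

```nginx
# Sketch only: server_name, root, and the php-fpm socket path are assumptions.
server {
    listen 80;
    server_name tracker.example.com;
    root /var/www/torrentpier;

    gzip on;                          # compress text responses
    gzip_types text/css application/x-javascript text/xml;

    location ~* \.(jpg|png|gif|css|js)$ {
        expires 7d;                   # static files served by nginx directly
    }

    location ~ \.php$ {
        include fastcgi_params;       # hand dynamic requests to php-fpm
        fastcgi_pass unix:/var/run/php-fpm.sock;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    }
}
```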
Caching some frequently repeated queries outside the RDBMS
An RDBMS is a disaster. It is convenient, but that convenience comes at a price, and in this case the price is speed. So the most popular query results that are not critical to freshness were cached in APC. Some people may ask: why not memcached? Memcached is not a solution for everything. APC was chosen in this particular case because it does not require a TCP connection and works much faster. Moreover, as long as everything runs on a single server, distributed storage is not really necessary.
You can pick any other key/value storage, and it does not necessarily have to keep the data in memory.
Quite possibly memcached / memcachedb / MemcacheQ will be the best option in your case.
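Whatever the storage, the access pattern is the classic cache-aside scheme (in PHP it would be `apc_fetch`/`apc_store`). Here is a minimal Python sketch of the idea; the function and key names are hypothetical:

```python
import time

class TtlCache:
    """Tiny in-process key/value cache with per-entry TTL,
    mimicking what apc_store()/apc_fetch() provide in PHP."""
    def __init__(self):
        self._data = {}  # key -> (expires_at, value)

    def store(self, key, value, ttl):
        self._data[key] = (time.monotonic() + ttl, value)

    def fetch(self, key):
        entry = self._data.get(key)
        if entry is None or entry[0] < time.monotonic():
            self._data.pop(key, None)  # expired or absent
            return None
        return entry[1]

def get_top_torrents(cache, db_query, ttl=300):
    """Cache-aside: serve a popular, freshness-insensitive query
    from the cache, falling back to the database only on a miss."""
    result = cache.fetch("top_torrents")
    if result is None:
        result = db_query()                       # the expensive RDBMS hit
        cache.store("top_torrents", result, ttl)  # keep it for ttl seconds
    return result
```

The point is that a five-minute TTL on a "top torrents" block is invisible to users but removes the query from almost every page view.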
Rewriting the key part (i.e., the tracker) in C++
120,000 active peers open connections to nginx, and then it gets even worse, because every one of them invokes PHP. Don't you think that is too much? Therefore, one developer adapted the XBTT code as a tracker frontend for TorrentPier. It was worth it: clients now announce to the tracker on port 2710, which keeps the peer list in memory, quickly finds and processes the relevant peers, and returns them to the client. Once a minute it flushes the results to the database. It gives excellent performance!
Here are the test results, with the announce interval set to one minute:
input (rl0) output
packets errs bytes packets errs bytes colls drops
20K 0 2.5M 16K 0 1.5M 0 0
PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
10 root 1 171 52 0K 8K RUN 1 538.6H 47.12% idle: cpu1
6994 root 1 108 0 98140K 96292K CPU0 0 3:57 33.98% xbt_tracker
11 root 1 171 52 0K 8K RUN 0 595.0H 31.20% idle: cpu0
35 root 1 -68 -187 0K 8K WAIT 0 17.1H 21.14% irq21: rl0
12 root 1 -44 -163 0K 8K WAIT 0 482:57 9.96% swi1: net
/usr/ports/devel/google-perftools/> netstat -an | wc -l
The price of the issue is 100 MB of memory and 30% CPU load. At the same load, one machine could hold about 8 million peers with a half-hour announce interval.
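The core trick here is separating the hot path (answering announces from RAM) from the cold path (persisting state), batching the latter. A minimal Python sketch of that idea, with hypothetical names, not the actual XBTT C++ code:

```python
import time
from collections import defaultdict

class InMemoryTracker:
    """Sketch of the XBTT idea: keep swarms in RAM, answer announces
    from memory, and batch-flush changes to the database periodically."""
    def __init__(self, flush_interval=60):
        self.peers = defaultdict(dict)  # info_hash -> {peer_id: (ip, port, last_seen)}
        self.dirty = set()              # swarms changed since the last flush
        self.flush_interval = flush_interval
        self.last_flush = time.monotonic()

    def announce(self, info_hash, peer_id, ip, port, max_peers=50):
        swarm = self.peers[info_hash]
        swarm[peer_id] = (ip, port, time.monotonic())  # register/refresh the peer
        self.dirty.add(info_hash)
        # Return the other peers in the swarm -- what the client actually wants.
        return [(p_ip, p_port)
                for pid, (p_ip, p_port, _) in swarm.items()
                if pid != peer_id][:max_peers]

    def maybe_flush(self, write_batch):
        """Once per interval, push all dirty swarms to the DB in one batch
        instead of issuing a write on every announce."""
        now = time.monotonic()
        if now - self.last_flush >= self.flush_interval:
            write_batch({h: dict(self.peers[h]) for h in self.dirty})
            self.dirty.clear()
            self.last_flush = now
```

With a one-minute flush interval, 120,000 announces per interval collapse into a single batched write.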
Optimizing the FreeBSD network stack and updating it to the latest STABLE
FreeBSD 6 has the decent 4BSD scheduler, while FreeBSD 7 has a nice thing called ULE, which is several times faster on SMP.
On any FreeBSD version it is also worth tuning sysctl, the interface for examining and dynamically changing kernel parameters on BSD (and Linux) systems.
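For a server holding tens of thousands of concurrent connections, the usual suspects are listen queue depth, file descriptor limits, and the ephemeral port range. The fragment below is an illustrative /etc/sysctl.conf sketch; the exact values depend on the workload and are assumptions here:

```
# Illustrative sysctl tuning for a busy FreeBSD web server;
# values are assumptions, not the project's actual settings.
kern.ipc.somaxconn=4096           # listen queue for incoming connections
kern.maxfiles=65536               # system-wide open file limit
kern.maxfilesperproc=32768        # per-process open file limit
net.inet.tcp.msl=5000             # shorter TIME_WAIT (2*MSL) for many short connections
net.inet.ip.portrange.first=1024  # widen the ephemeral port range
net.inet.ip.portrange.last=65535
```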
Optimizing MySQL
Any project will require a database, and this project is not an exception.
MyISAM is used for two reasons:
• it is the default storage engine
• it has a FULLTEXT index for the forum search
A lot of time was spent on the buffers. The tuning-primer.sh script helped a great deal here.
There are plans to migrate the database to XtraDB. In any case, the database fits into memory =).
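For a MyISAM-only server on a 512 MB box, tuning mostly comes down to a handful of my.cnf buffers. The numbers below are illustrative, roughly the kind of thing tuning-primer.sh nudges you toward, not the project's actual values:

```ini
# Illustrative my.cnf fragment for a MyISAM-only server; values are assumptions.
[mysqld]
key_buffer_size   = 128M  # MyISAM index cache; the most important knob
table_cache       = 512   # open table descriptors (table_open_cache in later versions)
query_cache_size  = 32M   # cache of identical SELECT results
thread_cache_size = 32    # reuse threads instead of spawning one per connection
tmp_table_size    = 32M   # in-memory temp tables before spilling to disk
```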
Caching rendered BB-code
It turns out that phpBB converts BB-code to HTML on every page view. That is not good. The generated HTML was cached in a separate database field for each post / signature. As a result, the database doubled in size, but in return the site got faster.
Rewriting search to use SphinxSearch
To pull this off, one developer reimplemented the entire site search on top of the ultra-fast SphinxSearch. The result exceeded all expectations; the server got faster yet again.
As a side effect, this made it possible to add a super-mega-convenient RSS feed with built-in search, at little cost to the server.
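A Sphinx setup of that era boiled down to a source (the SQL query that feeds the indexer) and an index definition. The fragment below is a sketch; credentials, table, and column names are assumptions:

```
# Illustrative sphinx.conf fragment for indexing forum posts;
# credentials, table and column names are assumptions.
source posts_src
{
    type      = mysql
    sql_host  = localhost
    sql_user  = forum
    sql_pass  = secret
    sql_db    = forum
    # First column must be the unique document id.
    sql_query = SELECT post_id, topic_id, post_text FROM bb_posts_text
}

index posts
{
    source = posts_src
    path   = /var/data/sphinx/posts
}
```

Offloading FULLTEXT searches to Sphinx also removes one of the main reasons to stay on MyISAM.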
Profiling the code and setting up monitoring
It is strange that many people still do not do this, but they should. If we do not know where the bottleneck is, how can we eliminate it? To find out, profiler hooks were put into the PHP code, and munin was installed on the server: a networked resource monitoring tool that helps analyze resource trends.
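A "profiler hook" can be as simple as wrapping interesting functions so each one records its call count and cumulative wall time, then dumping the table to a log or a munin plugin. A minimal Python sketch of the pattern (the PHP original would do the same with microtime()):

```python
import functools
import time
from collections import defaultdict

# name -> [call_count, total_seconds]
timings = defaultdict(lambda: [0, 0.0])

def profiled(fn):
    """Profiler hook: record call count and cumulative wall time per
    function, so the stats can be dumped to a log or a munin plugin."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            stats = timings[fn.__name__]
            stats[0] += 1                            # one more call
            stats[1] += time.perf_counter() - start  # accumulated time
    return wrapper
```

Sorting `timings` by total time immediately shows which code path deserves optimization first.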
Parsing queries from the MySQL slow query log
A classic! 20% of database queries take about 80% of the time. After the slow query log was parsed and FORCE INDEX hints were added to the worst queries, the load dropped by half during peak hours, and the home page started loading 10 times faster.
It is highly recommended to repeat this operation once or twice a year, or after introducing many small changes. The mysqlsla tool is very helpful here.
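Enabling the log is a couple of my.cnf lines (shown here with the pre-5.1 option names matching MySQL of that era; the path and threshold are illustrative):

```ini
# Illustrative my.cnf fragment enabling the slow query log; values are assumptions.
[mysqld]
log-slow-queries = /var/log/mysql/slow.log
long_query_time  = 1              # log anything slower than 1 second
log-queries-not-using-indexes     # also log full table scans
```

The resulting log can then be summarized with something like `mysqlsla -lt slow /var/log/mysql/slow.log`, which groups identical queries and sorts them by total time, pointing straight at the 20% worth fixing.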
These few steps turned an ordinary LAMP setup into an integrated system. The project now runs on a regular Core2Duo 2 GHz with 3 GB of RAM, which is enough even at peak hours.
If you have read this far, then the topic interested you.