We read directly from MySQL files
Somehow, I found a reference in the depths of documentation MySQL on dev.mysql.com/ about the fact that if MyISAM is used, we can get an increase in 5-7 times in reading speed from the table, if we read the data from the table ourselves. I wanted to verify this fact, so I did it. Let us see what came out of it.
What type of data will we read from MyISAM?
Format MyISAM has several formats (namely they are multiple storage formats of indexes and several types of data storage for the most common data, which are fixed and dynamic). I was interested to consider the very simple case: when a table has a fixed length of record, but it does not contain NULL fields and the type of queries that we are going to test. This is a simple SELECT * FROM tbl WHERE (condition) is without sharing the indexes and other tables (it is called full scan).
Basically, I was inspired by the sporting interest. Nevertheless, the results were good, so maybe someone besides me would want to try to use this method in practice.
How do we read?
Initially, I have considered quite a few options as possible to read data from a MyISAM including the use of indexes. In order to read directly from the index files there is a pretty good description in the file myisam.txt, which previously had distributed with the source codes of MySQL, and now it can be found, for example: in MariaDB. For a program that wants to read from MyISAM along with the index structures may need to collect a noticeable part of MySQL. A good part of it is that we need to do this only once:).
I will look at standalone application that does not depend on the MySQL source codes, and it does not use indexes, but only reads directly from .MYD files in “fixed” format strings.
Format storage in MYD
The storage format is described very well for the different structures of MyISAM here: forge.mysql.com/wiki/MySQL_Internals_MyISAM. The format is so simple for the fixed strings that it can be given straight to the text of the article:
1) .MYD consists of the consecutive string and there is not any official information in .MYD.
2) Each string has a header, which contains the information on NULL fields and a flag that indicates that the string is deleted. If there are not NULL fields, then the header length is 1 byte.
3) The fields go in a binary format in the string in that order, which defines the fields in the table with the reverse byte order (little endian) and no blanks.
For the different types of MySQL binary conception could be diverse and it is well described at the link above.
How do we read from .MYD?
Let us assume that we intend to read a data from .MYD that written in C language, here are a few interesting moments of my realization that relate to the performance:
1) If the length of one string is small (let us say 10 bytes), even when using fread() we should not read one by one string, let us say 100 strings at once, it is accelerated reading from a file in 2 times for my program.
2) Most likely, an access to the individual fields will be misaligned, so if an architecture is different from x86 (for example, PPC), then we should not use the following simple structure in order to get some specific field: *(int*)(ptr) (ptr is a pointer to the beginning of the field, thus the field itself has a type int).
In addition, we have to consider the tables lock issues at the beginning of the reading (this can be done with LOCK TABLES tbl READ). If we do not do it, there is a chance to run into a "dirty" data during reading a data. On the other hand, when reading from MyISAM we can safely ignore locks and read from a table at a time when somebody has been writing in there with the least side effects.
Here is something about the data record
MyISAM with the fixed format strings is so simple that we can create ourselves tables to MySQL and fill them with data without indexes, because we do not need always, but if they are really needed, we can use MySQL to create them through ALTER TABLE.
What do we need?
It is known that MyISAM tables consist of 3 files: .frm, .MYI and .MYD. We already know how to generate .MYD file. It remains to generate .frm and .MYI files. We probably ask: why do we need .MYI at all? Are not there stored indexes (which we do not plan)? In fact, there are stored not only indexes.
The easiest way to generate .frm and. MYI is to take the files from some "donor", namely from a different table. This table should not have any indexes and the table must have a structure that corresponds to our .MYD file, or MySQL cannot read such a table:).
If we just copy .frm and .MYI files from the tables and record the strings in .MYD, then most likely we will find that MySQL assumes that the newly-made table has 0 strings. This should lead us to thoughts that MyISAM table is not so simple:). In fact, there is a fairly well-known fact that MyISAM tables give the information about the number of strings, which is located in one of the fields of the structure of the table. We just need to find the field. Oddly enough, this field is in MYI and is not in .frm file. It is in the same file that is "indexes". Also, there is another information that is specific for MyISAM and it is not written in .frm file (.frm is common to several types of tables).
In the above description of format MyISAM is a description of MYI files. In fact, we just need to rewrite only 2 fields in this file (for version MySQL 5.1):
1) state->state.records (offset as for the beginning of file - 0x1C, length of 8 bytes, direct order of byte (big endian), contains the total number of strings minus the deleted)
2) state->state.data_file_length (offset as for the beginning of file - 0x44, length of 8 bytes, direct order of byte (big endian), contains the file size .MYD in bytes).
If we have problems reading the "newly" tables, we should first check that we are not rewriting an existing table and creating a new one. If we want to write a table instead of the old one – first, we need to do DROP TABLE of this table directly from MySQL, and then recreate the table (so that MySQL accidentally does not use cache of the open files). Another option, we can do FLUSH TABLE for the table and then record the new data there (it is useful if we do APPEND to the table).
So, if you read the article to the end (and if you know C), then you should have enough knowledge to write yourself C program that reads and writes simple MyISAM tables. It depends on the complexity of queries that you want to run with your program, the increase in speed can be up to 5-6 times. I was able to write a program that does something useful by reading the entire table of MyISAM, it took me about ¼ of time if I would run of the same query directly from MySQL. In my opinion, it is a good timing. By the way, the program has came out about ~5 Kb and ~ 150 strings of C code.
So, if you ever need a super-high-performance of full scan, then try to use MyISAM + your program on C to read from the table. You will be pleasantly surprised just how easy it is to raise the performance of reading in the several times.
Here are numbers that you asked for:
1. The numbers will depend on the realization. I already wrote that I was able to speed up my desired query in 4 times. Total speed reading was about 1 Gb/sec., with the recording size in 25 bytes.
2. I did not detect any differences between FULL SCAN when using HANDLER or SELECT.
Using FULL SCAN in MyISAM gives the speed reading, according to my estimates at least 3-4 times more than the FULL SCAN in InnoDB.
3. FULL SCAN itself in MyISAM works in 5-10 times faster than an index scan with the same number of records.
4. PRIMARY KEY scan runs just as fast in InnoDB as the FULL SCAN, the scan for any other index run in 3-4 times slower.
The comparisons do not make a great sense with NoSQL, if FULL SCAN is done for a large table:
1. Memcached cannot do a FULL SCAN memory.
2. HandlerSocket is able to do FULL SCAN, but the speed is the same as SELECT / HANDLER have (if there are a lot of strings); also it is not supported in MySQL 5.1.
3. Mongo, Redis and others occupy more memory / storage that is totally unacceptable in many cases. No doubt, they work quickly, but their purposes are different.
|Vote for this post
Bring it to the Main Page