On the one hand, it is great that this programming language is enjoying a kind of rebirth and universal popularity. On the other hand, growing script sizes weigh down the browser and hurt page load times. It is sad that downloading a single tweet (140 characters) requires 1.65 MB of scripts.
Let’s imagine that we have an electronic payment system with a database table of operations. Suppose we want to calculate the average operation amount. That is easy; here is the query:
SELECT avg(amount) FROM transfer;
generated in 3850 seconds
Now let’s imagine that this figure must always be up to date, that new entries are written to the table every second, and that after a month there are millions of them. Aggregating the same data from scratch each time is very expensive. Traditional databases do not offer such optimizations. What can we do?
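One general way to avoid rescanning every row, which is not necessarily the approach this article goes on to describe, is to maintain the aggregate incrementally as each entry arrives. The sketch below is a minimal illustration in C++; the class name RunningAverage and its methods are hypothetical and are not part of any database API.

```cpp
#include <cstdint>

// Hypothetical helper: keeps a running count and sum so the average
// can be read at any time without rescanning every stored entry.
class RunningAverage {
public:
    // Fold one new transfer amount into the aggregate.
    void add(double amount) {
        ++count_;
        sum_ += amount;
    }

    // Current average; 0.0 when nothing has been recorded yet.
    double average() const {
        return count_ == 0 ? 0.0 : sum_ / static_cast<double>(count_);
    }

    // Number of amounts folded in so far.
    std::uint64_t count() const { return count_; }

private:
    std::uint64_t count_ = 0;
    double sum_ = 0.0;
};
```

Each call to add() costs constant time, so reading the average stays cheap no matter how many entries have accumulated. The trade-off is that only the aggregates chosen in advance are available.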
There is a widespread view that the ordinary PHP developer does not need to manage memory, but "controlling" and "knowing" are slightly different things. I will try to shed light on some aspects of memory management when working with variables and arrays, and on some interesting pitfalls of PHP's internal optimizations. As you will see, the optimization is good, but if you do not know exactly how it works, you may run into pitfalls that can make you pretty nervous.
Learning the basics
In PHP, a variable consists of two parts: a "name", which is stored in the symbol_table (a hash_table), and a "value", which is stored in a zval container.
This design allows several variables to refer to one value, which in some cases reduces memory usage. How this looks in practice is described further on.
The most common code elements, without which a working script is hard to imagine, are the following:
- Creating, assigning and removing variables (numbers, strings, etc.).
- Creating arrays and traversing them (foreach is used as the example).
- Passing values to functions / methods and returning values from them.
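The name/value split can be pictured with a small conceptual model. The sketch below is written in C++, not PHP, and is only an analogy: the names Value, SymbolTable, assign, share and refCount are hypothetical, and a std::shared_ptr stands in for a reference-counted value container. It does not reproduce PHP's actual internals.

```cpp
#include <memory>
#include <string>
#include <unordered_map>

// Simplified stand-in for a value container; a real zval holds more.
struct Value {
    std::string data;
};

// Hypothetical symbol table: variable name -> shared value container.
using SymbolTable = std::unordered_map<std::string, std::shared_ptr<Value>>;

// Create (or overwrite) a variable with its own value container.
inline void assign(SymbolTable& table, const std::string& name,
                   const std::string& data) {
    table[name] = std::make_shared<Value>(Value{data});
}

// Make `to` refer to the same container as `from`; no data is copied.
inline void share(SymbolTable& table, const std::string& from,
                  const std::string& to) {
    std::shared_ptr<Value> value = table.at(from);
    table[to] = value;
}

// Number of names currently referring to the container behind `name`.
inline long refCount(const SymbolTable& table, const std::string& name) {
    return table.at(name).use_count();
}
```

In this model, sharing a value between two names adds one table entry and bumps a counter, while the string data exists only once. Removing one of the names lowers the counter again and leaves the value alive for the other.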
I decided to find out whether there is any practical sense in writing ++iterator instead of iterator++ when handling iterators. My interest in this question arose not out of love for art but for practical reasons. We have long intended to develop PVS-Studio not only in the direction of error detection but also in the direction of offering tips on code optimization. A message telling you that you had better write ++iterator fits well within the scope of optimization.
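To see where the advice comes from, here is a minimal sketch of the two operators on a made-up iterator type. CountingIterator and copyCount are hypothetical names used only for illustration. The postfix form has to return the previous state, so it typically creates a copy, while the prefix form can return a reference to itself. Whether an optimizing compiler removes that copy for real iterators is exactly the question under investigation.

```cpp
#include <cstddef>

// Global counter of how many times a CountingIterator has been copied.
inline std::size_t& copyCount() {
    static std::size_t count = 0;
    return count;
}

// Hypothetical iterator that records every copy made of it.
struct CountingIterator {
    int position = 0;

    CountingIterator() = default;
    CountingIterator(const CountingIterator& other) : position(other.position) {
        ++copyCount();
    }
    CountingIterator& operator=(const CountingIterator&) = default;

    // Prefix form: advance, then return a reference to this object.
    CountingIterator& operator++() {
        ++position;
        return *this;
    }

    // Postfix form: remember the old state, advance, return the old state.
    CountingIterator operator++(int) {
        CountingIterator old(*this);
        ++position;
        return old;
    }
};
```

With this type, ++it leaves the copy counter untouched, while it++ increases it at least once per call.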
But how relevant is this recommendation nowadays? In ancient times, for instance, it was advised not to repeat calculations. It was considered good manners to write:
TMP = A + 10;
X = TMP + B;
Y = TMP + C;
instead of:
X = A + 10 + B;
Y = A + 10 + C;
Such subtle manual optimization is meaningless now. The compiler handles this task just as well. It only complicates the code unnecessarily.
Let us learn how to scale an application. Without any experience, it is very difficult. There are now many websites devoted to these issues, but unfortunately there is no solution that suits every case. We still need to find solutions ourselves that fit our own requirements, just as I did.
Several years ago, my boss came to me and said: “We have a new project for you: to migrate a website that already has 1 million visitors per month. You need to move this website and make sure that traffic can grow in the future without any problems”. I was already an experienced programmer, but I had no experience in the field of scalability. I had to learn scalability the hard way.
2147483647 (2^31 - 1), a Mersenne prime, is the largest value that a signed 32-bit integer can hold.
What does this have to do with phone numbers? Ironically, the relation is very direct. It turns out that a significant number of American programmers, for the sake of optimization, build systems in which phone numbers are stored on the server as 32-bit integers. Thus, the maximum possible number in the United States is (214) 748-3647, where 214 is the area code of Dallas. When a greater value is entered, the database stores the maximum possible number, 2147483647, instead.
If we search the Internet, we can find hundreds of phone books from all over America that refer to the same number in Dallas. We can only sympathize with the owner of this phone number.
How could the clients of these projects fail to notice the developers' mistake? Probably many of them did business in regions where the area code is less than 214, so other phone numbers simply never got into the database. Maybe the developers convinced someone that this is the best way to optimize: in this form the numbers take up less space than when they are stored as individual characters. In fact, many people are obsessed with optimization. Not everyone learned the lessons of Y2K; moreover, a new generation of programmers has grown up who do not remember Y2K at all.
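The clamping behaviour described above can be modelled in a few lines. This is a sketch under an assumption: the function storeAsInt32 is a hypothetical stand-in for a column that saturates out-of-range values rather than rejecting them, and it does not reproduce any particular database engine.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical model of a 32-bit signed column that clamps values
// outside its range instead of reporting an error.
inline std::int32_t storeAsInt32(std::int64_t phoneNumber) {
    constexpr std::int64_t maxValue = std::numeric_limits<std::int32_t>::max();
    constexpr std::int64_t minValue = std::numeric_limits<std::int32_t>::min();
    if (phoneNumber > maxValue) {
        return static_cast<std::int32_t>(maxValue);  // becomes 2147483647
    }
    if (phoneNumber < minValue) {
        return static_cast<std::int32_t>(minValue);
    }
    return static_cast<std::int32_t>(phoneNumber);
}
```

A ten-digit number such as 2135550123 fits and is stored unchanged, whereas 4155550123 exceeds the limit and comes back as 2147483647, which reads as (214) 748-3647.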