
280 squiggles, or the explosive power of regular expressions


Like any beginning JavaScript programmer (this was two years ago), I wanted to write everything myself. That is how a very fast 280-character regular expression came about.

A little history


About a year and a half ago I learned about the “yass” library, which at the time was the fastest tool for finding DOM elements in JavaScript by CSS selector (as its benchmarks showed).
That got me terribly interested: I wanted to come up with something even faster. At the time I happened to be reading the second edition of Jeffrey Friedl's “Mastering Regular Expressions” (in the Russian edition, “Regular Expressions. Programmer's Library”). It was summer, I was still a student, and I had plenty of free time. Work began...

I decided to write this article because of the following expression, which can parse a CSS selector query almost completely (even slightly extended queries that go beyond standard CSS3):
/(?:(?:\s*[+>~,]\s*|\s+)|[^:+>~,\s\\[\]]+(?:\\.[^:+>~,\s\\[\]]*)*)|\[(?:[^\\[\]]*(?:\\.[^\\[\]]*)*|[^=]+=~?\s*(?:"[^\\"]*(?:\\.[^"\\]*)*"|'[^\\']*(?:\\.[^'\\]*)*'))\]|:[^\\:([]+(?:\\.[^\\:([]*)*(?:\((?:[^\\()]*(?:\\.[^\\()]*)*|"[^\\"]*(?:\\.[^"\\]*)*"|'[^\\']*(?:\\.[^'\\]*)*')\))?/g
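To see what the expression actually produces, here is a minimal sketch that applies it with String.prototype.match; because of the g flag, match returns every token in order. The tokenize helper and the sample selector are illustrative assumptions, not part of the original code.

```javascript
// The selector expression from above, copied verbatim.
const SELECTOR_TOKEN = /(?:(?:\s*[+>~,]\s*|\s+)|[^:+>~,\s\\[\]]+(?:\\.[^:+>~,\s\\[\]]*)*)|\[(?:[^\\[\]]*(?:\\.[^\\[\]]*)*|[^=]+=~?\s*(?:"[^\\"]*(?:\\.[^"\\]*)*"|'[^\\']*(?:\\.[^'\\]*)*'))\]|:[^\\:([]+(?:\\.[^\\:([]*)*(?:\((?:[^\\()]*(?:\\.[^\\()]*)*|"[^\\"]*(?:\\.[^"\\]*)*"|'[^\\']*(?:\\.[^'\\]*)*')\))?/g;

// Hypothetical helper: with the g flag, match() returns all matches
// in order, or null when there are none.
function tokenize(selector) {
  return selector.match(SELECTOR_TOKEN) || [];
}

const tokens = tokenize('div.item > a[href="x"]:not(.y), p');
// tokens: ["div.item", " > ", "a", "[href=\"x\"]", ":not(.y)", ", ", "p"]
```

Note that combinator tokens keep their surrounding spaces, and that with the g flag any character that no branch can match is silently skipped rather than reported as an error.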

Let's make friends with it


I have to admit that a normal person cannot make sense of the line above! I consider myself abnormal, and to write it I built a regular expression tester in JavaScript. It is really just a simple form: one box for the regular expression, another for the search string, a third for the result, plus a few checkboxes.
Let us rewrite this expression in readable form using the “x” (free-spacing) modifier, under which unescaped whitespace outside character classes is ignored. JavaScript does not support this modifier, so I emulated it.
(?:
    (?:\s*[+>~,]\s*|\s+)
  |
    [^:+>~,\s\\[\]]+(?:\\.[^:+>~,\s\\[\]]*)*
)
|
\[(?:
    [^\\[\]]*(?:\\.[^\\[\]]*)*
  |
    [^=]+=~?\s*
    (?:
        "[^\\"]*(?:\\.[^"\\]*)*"
      |
        '[^\\']*(?:\\.[^'\\]*)*'
    )
)\]
|
:[^\\:([]+(?:\\.[^\\:([]*)*
(?:
    \((?:
        [^\\()]*(?:\\.[^\\()]*)*
      |
        "[^\\"]*(?:\\.[^"\\]*)*"
      |
        '[^\\']*(?:\\.[^'\\]*)*'
    )\)
)?
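The article does not show how the “x” modifier was emulated, so here is one possible minimal sketch under simplified rules that are my assumption, not Perl's full semantics: unescaped whitespace outside character classes is dropped, escape sequences and character classes are copied verbatim, and “#” comments are not supported. The xRegExp name is hypothetical.

```javascript
// Hypothetical helper emulating the "x" (free-spacing) modifier that
// JavaScript's RegExp lacks. Simplified rules (an assumption): unescaped
// whitespace outside [...] is removed, escapes and character classes are
// kept verbatim, "#" comments are NOT supported.
function xRegExp(source, flags = "") {
  let out = "";
  let inClass = false;
  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (ch === "\\") {
      // Copy the backslash together with the character it escapes.
      out += ch + (i + 1 < source.length ? source[i + 1] : "");
      i++;
    } else if (inClass) {
      if (ch === "]") inClass = false; // end of the character class
      out += ch;
    } else if (ch === "[") {
      inClass = true; // start of a character class
      out += ch;
    } else if (!/\s/.test(ch)) {
      out += ch; // ordinary character outside a class
    }
    // Anything else is whitespace outside a class and is dropped.
  }
  return new RegExp(out, flags);
}

// The combinator branch written in free-spacing style.
const combinator = xRegExp(String.raw`
  (?: \s* [+>~,] \s*
    | \s+
  )
`);
// combinator.source is (?:\s*[+>~,]\s*|\s+)
```

String.raw is used so that backslashes reach xRegExp unchanged; tracking the inside of [...] matters because whitespace there is part of the class and must be kept.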

Some theory


To make this clear: the expression repeatedly uses a structure of the form “start (normal characters)* (special sequence (normal characters)*)* end”. It is an almost universal construction for finding anything between delimiters, for example text between quotes where the quote character may itself appear inside, escaped with a backslash. Friedl calls this technique “unrolling the loop”; more information can be found in the book mentioned above, in the chapter on crafting efficient regular expressions.
In our case it is used to find text between double (") and single (') quotes, inside round and square brackets, and between the characters “+”, “>”, “~”, “,”, space and “:”.
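As a small illustration of the “start normal* (special normal*)* end” shape, here is a sketch comparing the unrolled double-quoted-string pattern used inside the expression with a naive one. Here “normal” is any character except a backslash or a quote, and “special” is a backslash followed by any character. The constant and function names are illustrative only.

```javascript
// Unrolled loop for a double-quoted string with backslash escapes:
//   start  normal*   (special  normal*)*   end
//   "      [^\\"]*   (?:\\.    [^"\\]*)*   "
const QUOTED = /"[^\\"]*(?:\\.[^"\\]*)*"/;

// Naive version for comparison: it stops at the first quote it sees,
// even when that quote is escaped.
const NAIVE = /"[^"]*"/;

// Illustrative helper: return the first match of re in text, or null.
function firstMatch(re, text) {
  const m = re.exec(text);
  return m ? m[0] : null;
}

const input = 'a "b\\"c" d'; // the actual text is: a "b\"c" d
const unrolled = firstMatch(QUOTED, input); // "b\"c"  (the whole string)
const naive = firstMatch(NAIVE, input);     // "b\"    (cut at the escaped quote)
```

Because a normal character can never begin a special sequence, each input can be matched in only one way, which keeps backtracking low.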

Let us analyze


The expression is built on the idea that a CSS selector can be broken into parts. I split it into:

• where to look: among direct children (“>”), among siblings (“+”, “~”) or anywhere in the subtree (a space)
• pseudo-selectors of the form “:some_function(some_arguments)”
• attribute selectors of the form “[someAttr (an operator such as =) someValue]”
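The split above can be expressed as a small sketch that labels each token with the part it belongs to. tokenKind and its labels are hypothetical names. The sample tokens are written out by hand in the form the expression returns them, with combinators keeping their surrounding spaces.

```javascript
// Hypothetical helper: label a token produced by the selector expression.
// The comma only separates selectors in a group, so it gets its own label.
function tokenKind(token) {
  if (/^\s*,\s*$/.test(token)) return "separator";              // ", "
  if (/^\s*[+>~]\s*$|^\s+$/.test(token)) return "combinator";   // " > ", " "
  if (token.startsWith("[")) return "attribute";                // [attr=value]
  if (token.startsWith(":")) return "pseudo";                   // :name(args)
  return "simple"; // tag, class, id: the "everything in between" part
}

const kinds = ["div.item", " > ", "a", '[href="x"]', ":not(.y)", ", ", "p"].map(tokenKind);
// kinds: ["simple", "combinator", "simple", "attribute", "pseudo", "separator", "simple"]
```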

Let us now compare this with the expression.
The first part looks for “+”, “>”, “,”, “~” or whitespace (“\s+”); if none of these is found, it matches everything in between (for example, div.item#main as a single token).
The second part handles square brackets. The subpattern “[^=]+=~?\s*” was added so that attribute selectors can use arbitrarily complex regular expressions, written in quotes, as their values.
The third part matches pseudo-selectors; the argument in round brackets is optional.
Any character can be escaped with a backslash (“\”), or the value can be enclosed in single or double quotes; then the characters inside are not treated as control characters.
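The escaping and the quoted attribute branch can be checked with a short sketch that runs the full expression over selectors with an unescaped colon, an escaped colon, and a quoted regular expression containing square brackets. The tokenize helper and the sample selectors are assumptions for illustration.

```javascript
// The selector expression from the beginning of the article, verbatim.
const SELECTOR_TOKEN = /(?:(?:\s*[+>~,]\s*|\s+)|[^:+>~,\s\\[\]]+(?:\\.[^:+>~,\s\\[\]]*)*)|\[(?:[^\\[\]]*(?:\\.[^\\[\]]*)*|[^=]+=~?\s*(?:"[^\\"]*(?:\\.[^"\\]*)*"|'[^\\']*(?:\\.[^'\\]*)*'))\]|:[^\\:([]+(?:\\.[^\\:([]*)*(?:\((?:[^\\()]*(?:\\.[^\\()]*)*|"[^\\"]*(?:\\.[^"\\]*)*"|'[^\\']*(?:\\.[^'\\]*)*')\))?/g;

// Hypothetical helper: all tokens in order, or [] when nothing matches.
function tokenize(selector) {
  return selector.match(SELECTOR_TOKEN) || [];
}

// Without escaping, the colon starts a pseudo-selector:
const plain = tokenize("a.x:y:hover");
// plain: ["a.x", ":y", ":hover"]

// "\:" keeps the colon inside the name, and the quoted value may contain
// brackets because only the [^=]+=~?\s* branch can match it:
const escaped = tokenize('a.x\\:y[title=~"^a[bc]"]:hover');
// escaped: ["a.x\\:y", "[title=~\"^a[bc]\"]", ":hover"]
```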

Conclusion


From here, I think, it is clear how easy it is to write a CSS3 selector parser. Anyone interested in experimenting is welcome to try it here. I would be very grateful for comments on improving the speed or the strictness of the regular expression.
And of course, thanks to Jeffrey Friedl, the author of invaluable books on regular expressions.

PS: I apologize for the state of the regular expression tester. It was created as an intermediate step (it definitely works in Chrome and Firefox). If something does not work, note that it re-runs on change: click a checkbox and/or simply type a space into the regular expression field, and it should run.
“Translated from another resource”
Pirat, 5 September 2011, 14:53