Hello, folks!


While analyzing data from Twitter, I ran into the task of handling hashtags: each hashtag had to be split into separate words. The task seemed trivial, but it turned out I had underestimated it, and I had to try several algorithms before I found one that worked.

This article is a kind of chronology of how I solved the task, with an analysis of the advantages and disadvantages of each algorithm I tried. So if you are interested in this topic, make yourself comfortable.

It should be noted that splitting long text written without spaces into separate words is a very common task in natural language processing (NLP).
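To make the task concrete, here is a minimal sketch of one common baseline for this kind of splitting: dictionary-based dynamic programming that picks the segmentation with the fewest words. It is not necessarily one of the algorithms described in this article. The toy vocabulary `VOCAB` and the function name `segment` are hypothetical, chosen only for illustration; a real system would load a large word list.

```python
from typing import List, Optional, Set

# Hypothetical toy vocabulary; a real system would load a large word list.
VOCAB = {"i", "love", "big", "data", "nlp", "is", "fun"}


def segment(text: str, vocab: Set[str] = VOCAB) -> Optional[List[str]]:
    """Split `text` into the fewest vocabulary words, or return None if impossible."""
    text = text.lstrip("#").lower()  # drop the hashtag sign, ignore case
    n = len(text)
    # best[i] holds the shortest segmentation of text[:i], or None if none exists.
    best: List[Optional[List[str]]] = [None] * (n + 1)
    best[0] = []
    max_len = max(map(len, vocab), default=0)
    for end in range(1, n + 1):
        # Only words no longer than the longest vocabulary entry can end here.
        for start in range(max(0, end - max_len), end):
            prefix = best[start]
            word = text[start:end]
            if prefix is not None and word in vocab:
                candidate = prefix + [word]
                current = best[end]
                if current is None or len(candidate) < len(current):
                    best[end] = candidate
    return best[n]


print(segment("#ILoveBigData"))  # ['i', 'love', 'big', 'data']
print(segment("nlpisfun"))       # ['nlp', 'is', 'fun']
print(segment("xyz"))            # None
```

Note that "fewest words" is only one possible scoring rule. If the vocabulary also contained "bigdata", this sketch would return `['i', 'love', 'bigdata']`. Choosing among several valid segmentations is exactly where a simple dictionary approach starts to need something smarter.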
xially, 26 April 2012, 11:36