In fact, on the question of what counts as an algorithm, I think a colleague of mine put it right: an algorithm is not defined by how complex its mathematical model is. Even a simple formula you write yourself counts as an algorithm, as long as it solves an existing business pain point and models your own ideas. It may not be general-purpose, though, and may only solve a specific business need.
With large-scale data, many complex algorithms do not perform that well, so we end up looking for ways to simplify the process.
All these years, the excavators I have driven
(1) The first one I came into contact with was probably Bayesian classification
A simple hypothetical example: suppose we have a large dataset, say close to ten million blog posts. If you are given one blog post and asked to find the top N most similar posts, what would you normally do? The usual approach is to compute the similarity between this post and every other article. There are many ways to measure similarity; the simplest is the angle between the two posts' vectors (cosine similarity). OK, even with this simplest calculation, imagine how long nearly 10 million such computations would take. Some people might say, "I'll use Hadoop and let distributed computing handle it," but if you actually try it, you will find what a painful thing this is.
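To make the cost concrete, here is a minimal brute-force sketch of the "usual practice" described above. It assumes each blog post has already been turned into a numeric vector (for example, term frequencies); the function names, the `corpus` dictionary, and the toy vectors are hypothetical and only for illustration. Every query has to be compared against every document, so the work grows linearly with the corpus size for each single query.

```python
import heapq
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_n_similar(query: List[float],
                  corpus: Dict[str, List[float]],
                  n: int) -> List[Tuple[str, float]]:
    """Brute force: score the query against every document and keep the n best.

    Cost is one full-vector comparison per document, i.e. O(len(corpus) * dim)
    for a single query; with ~10 million posts that is ~10 million comparisons.
    """
    scored = ((doc_id, cosine_similarity(query, vec)) for doc_id, vec in corpus.items())
    return heapq.nlargest(n, scored, key=lambda item: item[1])


# Hypothetical toy corpus: three "posts" as 3-dimensional vectors.
corpus = {
    "post_a": [1.0, 0.0, 1.0],
    "post_b": [0.0, 1.0, 0.0],
    "post_c": [1.0, 1.0, 1.0],
}
print(top_n_similar([1.0, 0.0, 1.0], corpus, 2))
```

On the toy data, `post_a` (identical direction) ranks first and `post_c` second, while the orthogonal `post_b` scores 0. The point is not the ranking but the loop: nothing here avoids touching every document, which is exactly why the approach hurts at ten-million scale.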
I wrote this article after our department held an internal review and summary of some existing algorithms a few days ago. It was a bit embarrassing, though: because the boss was not at the discussion, it turned into a roasting ("Tucao") session, and half the time was spent complaining about the product and business departments. Still, this is something to be glad about, since it shows that our data department has moved past scratching the surface with a light excavator and into the stage of digging deep.
Here is a simple example (or "chestnut", as the pun goes): take SVM, an algorithm that is slow to converge. With big data, some people want to use it and train the model on more data; because the data volume is so large, many still want to use as much training data as possible so that the model will be more accurate. However, as the amount of training data grows, an algorithm like SVM that is hard to converge consumes a very large amount of computing resources.
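A back-of-the-envelope sketch of why this happens, assuming a kernel SVM whose training touches pairwise kernel values between samples. Practical solvers (for example SMO-style solvers) cache only part of the kernel (Gram) matrix instead of storing it all, so treat these numbers as an illustration of the quadratic growth rather than what any particular library actually allocates. The function name `gram_matrix_cost` is hypothetical.

```python
from typing import Tuple


def gram_matrix_cost(n_samples: int, bytes_per_entry: int = 8) -> Tuple[int, int]:
    """Return (unique kernel evaluations, bytes for a dense float64 Gram matrix).

    The Gram matrix is symmetric, so there are n*(n+1)/2 distinct entries
    (including the diagonal), but a dense n-by-n array stores all n*n of them.
    """
    unique_pairs = n_samples * (n_samples + 1) // 2
    dense_bytes = n_samples * n_samples * bytes_per_entry
    return unique_pairs, dense_bytes


for n in (10_000, 100_000, 1_000_000):
    pairs, size = gram_matrix_cost(n)
    print(f"n={n:>9,}: {pairs:.2e} kernel evaluations, "
          f"{size / 1e9:,.1f} GB if stored densely")
```

Each tenfold increase in training data multiplies both numbers by roughly a hundred: about 0.8 GB at ten thousand samples, 80 GB at a hundred thousand, and around 8,000 GB (8 TB) at a million, before counting the iterations the solver needs to converge.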
Having dragged in so many unrelated matters, let me get back on track, since the sorting-out work is not finished yet.
So I took the opportunity to also sort through the algorithms I have personally come into contact with, understood, or used. Honestly, I am not an algorithms person by training: in college I mostly studied networking and knew nothing about data mining algorithms.