The simplest description of how the Yandex search engine works

In this article I will explain what the Yandex search engine is, how it works, and give examples of the kinds of sites that Yandex demotes in its rankings.

In terms of popularity, the Yandex search engine ranks 20th in the world and 1st in Russia. Yandex was officially announced on September 23, 1997; its development began inside the company CompTek International, and in 2000 Yandex began to exist as a separate company.

The company's founders are Arkady Yuryevich Volozh, its general director, and Ilya Valentinovich Segalovich (1964-2013), co-founder and director of technology and development. Now that we know a little of Yandex's history, let's talk about its search engine.

Yandex's main product is its search engine, whose distinctive feature is fine-grained handling of search queries. The Yandex search engine lets you search in Russian, Ukrainian, Belarusian, Tatar, Kazakh, English, Turkish, German and French, taking the morphology of each of these languages into account.

Yandex has also developed a thorough algorithm for assessing relevance, along with a procedure for detecting duplicate documents even when they are stored in different encodings. Unlike Google with its PageRank ranking algorithm, Yandex introduced its own important link metric: the thematic citation index (TIC).

Yandex search engine

http://www.yandex.ru
The Yandex search engine uses robots: programs that crawl sites and check their relevance. Search robots reach a site by following direct links, index its new pages and save them to their database. For an indexed page to reach the top of the results, which is what matters most, you need to pay attention to indexing factors such as the frequency of keywords on the page, the number of external links pointing to your site, and the overall weight of the site, measured by the Yandex TIC indicator.

Examples of sites that Yandex demotes in its rankings

Sites with non-unique content that has been copied or rewritten from other sites.

Sites that link intensively to each other in groups.

Sites with meaningless content.

Websites that use deceptive techniques.

Forums and message boards that contain a lot of link spam.

Websites that try to gain ranking weight through external links placed purely for SEO, rather than as a genuine recommendation of the linked resource.

Good afternoon, dear readers of my SEO blog. This article is about how the Yandex search engine works: what technologies and algorithms it uses to rank sites, and how it prepares its answers for users. Many people know that this flagship of Russian search sets the tone on the Runet, owns the largest database in Eurasia, operates on more than a billion pages of content, and seems to know the answer to any question. According to Liveinternet data for August 2012, Yandex's share of search in Russia is 60.5%, and the portal's monthly audience is 48.9 million people. But the most important thing for us bloggers is how the search engine receives our queries, how it processes them, and what result comes out. Knowing and understanding this, it is easier both to use all of Yandex's resources and to promote our blogs. So let me walk you through the most important technologies of the best search engine on the Runet.

When an Internet user first turns to a search engine for information, he may have one question: "How does the search happen?" But once he gets his results, that question often changes to another: "Why so fast?" Indeed, why can searching for a file on your own computer take 20 seconds, while the answer to a query spanning computers all over the world appears in a second? The interesting thing is that both questions (how the search happens, and why it takes one second) share the same answer: the search engine prepared for the user's request in advance.

To understand how Yandex, like any other search engine, works, let's draw an analogy with a telephone directory. To find a phone number, you need to know the subscriber's last name, and the search takes a minute at most, because all the pages of the directory form one continuous alphabetical index. But imagine if the search worked the other way, with the entries ordered by the phone numbers themselves. Such a search would drag on much longer, and the numbers would swim before the seeker's eyes for quite a while. 🙂

So a search engine arranges all the information from the Internet in a convenient form. Most importantly, all this data is put into its directory in advance, before a visitor arrives with a query. That is, by the time we ask Yandex a question, it already knows our answer and gives it to us in a second. But that second contains a number of important processes, which we will now examine in detail.

Indexing the Internet

Yandex collects all the information on the Internet that it can reach. Special software examines all the content, including images (by their visual parameters). This collection and preparation of data is called indexing, and it is performed by a computer system called a search robot. The robot regularly crawls indexed sites, checks them for new content, and also scans the Internet for deleted pages. If it discovers that a page no longer exists or has been closed to indexing, it removes that page from the search.

How does a search robot find new sites? First, through links from other sites: if an already indexed site links to a new web resource, the robot will visit the new resource the next time it revisits the old one. Second, there is a well-known service popularly called the "addurilka" (from the English "add URL"), where you can submit the address of your new site; after a while the search robot will visit it. Third, the Yandex.Bar browser toolbar tracks the pages its users visit, so if a person lands on a new web resource, a robot will soon appear there too.
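The link-following discovery described above can be sketched as a breadth-first traversal. The site names and the link graph here are invented for illustration; a real robot would extract the links from fetched HTML pages:

```python
from collections import deque

def discover_pages(seed_urls, link_graph):
    """Breadth-first discovery of pages reachable by links.

    link_graph maps a URL to the list of URLs it links to
    (in a real robot these would be extracted from fetched HTML).
    """
    seen = set(seed_urls)
    queue = deque(seed_urls)
    order = []                      # order in which the robot finds pages
    while queue:
        url = queue.popleft()
        order.append(url)
        for linked in link_graph.get(url, []):
            if linked not in seen:  # a brand-new page: schedule a visit
                seen.add(linked)
                queue.append(linked)
    return order

# A tiny hypothetical web: an already indexed site links to a new one.
web = {
    "old-site.ru": ["old-site.ru/page2", "new-site.ru"],
    "new-site.ru": ["new-site.ru/about"],
}
print(discover_pages(["old-site.ru"], web))
```

Starting from the indexed seed, the robot reaches the new site and its pages purely by following links, which is exactly the first discovery mechanism described above.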

Do all pages make it into the search? Millions of pages are indexed every day, and their quality varies widely, from unique content to outright garbage; according to statistics, there is far more garbage. The search robot analyzes each document with special algorithms, determining whether it contains useful information and whether it can answer a user's query. If not, the page "isn't taken into the cosmonauts", as the Russian saying goes; if it can, the page is included in the search.

After a robot has visited a page and determined its usefulness, the page lands in the search engine's storage. There every document is taken apart down to the last screw, as mechanics at an auto center would say: the page is stripped of its HTML markup, and the clean text goes through a full inventory in which the position of every word is recorded. In this disassembled form the page becomes a table of numbers and letters, otherwise known as an index. Now, whatever happens to the web resource containing this page, its latest copy remains in the search. Even if the site no longer exists, copies of its documents are stored for some time.
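A minimal sketch of this "disassembly": strip the markup, then record the position of every word. The sample page is invented, and real indexers handle far more (encodings, morphology, service tags):

```python
import re

def build_index(html):
    """Strip markup and record the position of every word,
    the way a page is 'disassembled' into an index."""
    text = re.sub(r"<[^>]+>", " ", html)          # clear the HTML markup
    words = re.findall(r"[a-zа-яё0-9]+", text.lower())
    index = {}
    for pos, word in enumerate(words):
        index.setdefault(word, []).append(pos)    # word -> list of positions
    return index

page = "<h1>Yandex search</h1><p>Yandex answers in a second</p>"
print(build_index(page))
```

The resulting dictionary is the "table of numbers and letters": each word maps to the coordinates where it occurs, so the engine never needs the original HTML again to answer a query.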

Each index, together with data on document types, encodings and languages, plus the saved copies, makes up the search base. It is updated periodically, and it is stored on special servers that process the queries of the search engine's users.

How often does indexing happen? That depends above all on the type of site. One type of web resource changes the content of its pages very often: each time the search robot arrives, the pages hold different content, and next time nothing found there could be found again, so such sites are not included in the index. A second type is data warehouses, whose pages periodically gain links to downloadable documents but whose content otherwise barely changes, so the robot visits them extremely rarely. For all other sites, it depends on how often the content is updated: the faster new content appears, the more often the search robot comes. And priority goes first of all to the most important web resources (a news site is an order of magnitude more important than any blog, for example).

Indexing performs the first function of a search engine: collecting information about new pages on the Internet. But Yandex also has a second function: finding the answer to a user's query in the already prepared search base.

Yandex is preparing a response

Processing a query and returning relevant answers is the job of a computer system called Metasearch. First it collects all the input data: which region the query came from, what class it belongs to, whether it contains spelling errors, and so on. After this processing, Metasearch checks whether the database already contains exactly the same query with the same parameters. If it does, the system shows the user the previously saved results. If no such question exists in the database, Metasearch turns to the search base, which holds the index data.
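The cache check can be sketched like this. The `Metasearch` class, the `fake_base` function and the query parameters are simplified stand-ins for illustration, not Yandex's actual interfaces:

```python
class Metasearch:
    """Sketch of the cache check: identical queries with identical
    parameters are answered from previously saved results."""
    def __init__(self, search_base):
        self.search_base = search_base     # slow path: full index search
        self.cache = {}

    def answer(self, query, region):
        key = (query.lower().strip(), region)      # the query "parameters"
        if key in self.cache:                      # same query seen before
            return self.cache[key]
        results = self.search_base(query, region)  # otherwise hit the index
        self.cache[key] = results
        return results

calls = []
def fake_base(query, region):
    calls.append(query)                    # count how often the index is hit
    return [f"result for '{query}' in {region}"]

m = Metasearch(fake_base)
m.answer("napoleon", "Moscow")
m.answer("napoleon", "Moscow")   # served from cache, base not called again
print(len(calls))                # -> 1
```

The second identical query never reaches the search base, which is precisely why repeated popular queries come back so quickly.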

And this is where amazing things happen. Imagine a single super-powerful computer storing the entire Internet as processed by the search robots. A user submits a query, and the search for all matching documents begins across its memory cells. The answer is found and everyone is happy. But consider what happens when many queries contain the same words: the system would have to walk the same memory cells again and again, multiplying the processing time. The response time grows, and that can cost the engine a user, who will turn to another search engine for help.

To avoid such delays, all the copies in the index are distributed across different computers. After receiving a query, Metasearch instructs each of these servers to search its own piece of the index. All the machines then return their data to a central computer, which merges the results and gives the user the ten best answers. This technology kills two birds with one stone: search time drops several-fold (the answer arrives in a split second), and because information is duplicated across machines, data is not lost to sudden failures. The computers holding the duplicated information make up a data center: a room full of servers.
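The scatter-gather scheme might look like the following sketch, with each shard holding precomputed entries; the shard data and scores are invented for illustration:

```python
import heapq

def shard_search(shard, query_word):
    """Each server searches only its own piece of the index."""
    return [(score, url) for url, words, score in shard if query_word in words]

def metasearch(shards, query_word, top_n=10):
    """Scatter the query to every shard, then merge the partial results
    on the central computer and keep the best answers."""
    partial = []
    for shard in shards:                  # in reality these run in parallel
        partial.extend(shard_search(shard, query_word))
    return [url for score, url in heapq.nlargest(top_n, partial)]

# Invented shards: each entry is (url, words on the page, relevance score).
shards = [
    [("a.ru", {"apple"}, 0.9), ("b.ru", {"pear"}, 0.8)],
    [("c.ru", {"apple"}, 0.7), ("d.ru", {"apple"}, 0.95)],
]
print(metasearch(shards, "apple", top_n=2))  # -> ['d.ru', 'a.ru']
```

Because each shard scans only its own fraction of the data, the slowest step shrinks with the number of machines, and duplicating shards across servers gives the fault tolerance described above.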

When users ask a search engine a question, 20 queries out of 100 have ambiguous goals. For example, if someone types the word "Napoleon" into the search line, it is not yet clear what answer he expects: a cake recipe or the biography of a great commander. Or the phrase "Brothers Grimm": fairy tales, films, or a musical group. To narrow this fan of possible goals down to specific answers, Yandex has a special technology called Spectrum. It learns users' needs from search query statistics: out of all the questions visitors ask Yandex, Spectrum extracts distinct objects (people's names, book titles, car models, etc.) and assigns each to a category. At the moment there are more than 60 such categories. Thanks to them, the search engine keeps the different meanings of words in user queries in its base. Interestingly, these categories are re-analyzed a couple of times a week, which lets Yandex answer the questions posed more precisely.
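A toy illustration of the idea: the objects, categories and share numbers below are invented, while the real Spectrum derives them from query statistics over all user queries:

```python
# Hypothetical demand shares per category for ambiguous query objects
# (invented numbers standing in for learned query statistics).
CATEGORIES = {
    "napoleon": {"recipe": 0.45, "historical figure": 0.40, "film": 0.15},
    "brothers grimm": {"fairy tales": 0.5, "film": 0.3, "music": 0.2},
}

def spectrum_prompts(query):
    """Return the possible categories for an ambiguous query,
    most-demanded first."""
    cats = CATEGORIES.get(query.lower(), {})
    return sorted(cats, key=cats.get, reverse=True)

print(spectrum_prompts("Napoleon"))
```

Ordering categories by their demand share is what lets the engine offer the most likely meaning first while still keeping the alternatives available.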

Based on the Spectrum technology, Yandex offers dialog prompts. They appear under the search box as the user types an ambiguous query, listing the categories the query object may belong to. The subsequent search results depend on which category the user chooses.

Between 15% and 30% of all Yandex users want only local information (data for the region where they live), for example new films in their city's cinemas. The answer to such a query should therefore differ from region to region, so Yandex applies its region-based search technology. For example, residents searching for the repertoire of their local "Oktyabr" cinema might receive the following answers:

And this is the result that residents of Stavropol will receive for the same query:

The user's region is determined primarily by his IP address. Sometimes this is inaccurate, because some providers serve several regions at once and may change their users' IP addresses. If this happens to you, you can easily change your region in the search engine's settings; it is shown in the upper right corner of the results page.

The Yandex search engine: how results are displayed

When Metasearch has prepared an answer, Yandex must display it on the results page: a list of links to the documents found, with brief information on each. The job of the results technology is to present the most relevant answers as informatively as possible. The template for one such link looks like this:

Let's look at this result form in more detail. For the title of a search result, Yandex usually takes the page title (what optimizers write in the title tag). If there is none, words from the heading of the article or post appear here. If the heading text is long, the search engine puts into this field the fragment most relevant to the given query.

Very rarely, the heading does not match the query. In that case Yandex builds its own title for the result from the text of the article or post, and it will definitely contain the query words.

For the snippet, the search engine uses all the text on the page. It selects every fragment that answers the query, then picks the most relevant of them for the field under the document link. Thanks to this approach, a competent optimizer who sees the snippet can rework the page text and so improve the attractiveness of the link.
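Snippet selection can be approximated by scoring text windows by how many query words they contain and then bolding the matches. This is only a rough sketch of the idea, with invented sample text, not Yandex's actual algorithm:

```python
def make_snippet(text, query, window=6):
    """Pick the text fragment with the most query words,
    then highlight those words in bold."""
    query_words = set(query.lower().split())
    words = text.split()
    best_start, best_hits = 0, -1
    for start in range(max(1, len(words) - window + 1)):
        frag = words[start:start + window]
        hits = sum(w.lower().strip(".,") in query_words for w in frag)
        if hits > best_hits:               # keep the most relevant window
            best_start, best_hits = start, hits
    frag = words[best_start:best_start + window]
    return " ".join(
        f"<b>{w}</b>" if w.lower().strip(".,") in query_words else w
        for w in frag
    )

text = "Our cinema is open daily. Buy cheap tickets for new films online."
print(make_snippet(text, "cheap tickets"))
```

The bold tags mirror the highlighting on the real results page, where every query word found in the answer is emphasized for easier reading.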

For easier perception of the result, the headings are formatted as links in the text (highlighted in blue with underlining). For the attractiveness and recognizability of the web resource, a favicon, the site's small corporate icon, appears to the left of the text on the first line, before the heading. All the query words that appear in the answer are also shown in bold for ease of reading.

Recently the Yandex search engine has been adding extra information to snippets to help the user find his answer even faster and more accurately. For example, if the query contains the name of an organization, Yandex adds its address, contact phone numbers and a link to its location on a geographic map. If the search engine knows the structure of the site containing the answering document, it will show that as well, and it can add the site's most visited pages to the snippet so that the visitor can jump straight to the section he needs, saving time.

Some snippets contain the price of a product in an online store, a hotel or restaurant rating in the form of stars, or other interesting figures about the objects in the found documents. The purpose of this information is to give a complete picture of the subjects or objects the user is interested in.

With all these additions, a results page looks like this:

Ranking and assessors

Yandex's task is not only to find all possible answers but to select the best (most relevant) ones, since no user will dig through every link Yandex returns. The process of ordering search results is called ranking, and it is ranking that determines the quality of the answers offered.

There are rules by which Yandex determines the relevant pages:

  • sites that degrade search quality are demoted on the results page. These are usually web resources whose owners try to trick the search engine, for example with pages containing meaningless or invisible text (visible and understandable to the search robot, but not to the human reading the document), or sites that immediately redirect the visitor to a completely different site when he clicks a result.
  • sites containing erotic content do not appear in the results, or are greatly demoted, because such web resources often use aggressive promotion methods.
  • sites infected with viruses are neither demoted nor excluded from the results; instead the user is warned of the danger with a special icon, since Yandex assumes such web resources may still contain documents important to the searcher.

For example, this is how Yandex will rank sites for the query "apple":

In addition to ranking factors, Yandex uses special samples of queries together with the answers that search engine users consider most appropriate. No machine can produce such samples; at the moment this is a human prerogative. At Yandex these specialists are called assessors. Their task is to analyze the found documents and evaluate the answers to given queries. They choose the best answers and form a special training sample, from which the search engine learns the relationship between relevant pages and their properties. With this information Yandex can select an optimal ranking formula for each query. The method for building such formulas is called Matrixnet. Its advantage is resistance to overfitting, which makes it possible to take a very large number of ranking factors into account without inflating the number of spurious estimates and patterns.
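To make the idea of learning a ranking formula from assessor judgments concrete, here is a deliberately tiny stand-in: a linear formula fitted by gradient descent. Matrixnet itself builds far richer formulas from decision trees; the features, labels and numbers below are all invented:

```python
def fit_ranking_formula(samples, lr=0.1, epochs=500):
    """Learn weights for a linear ranking formula from a training
    sample of (feature_vector, assessor_relevance_label) pairs.
    This is only a toy stand-in for learning a formula from
    assessor judgments, not Matrixnet."""
    n = len(samples[0][0])
    w = [0.0] * n
    for _ in range(epochs):
        for features, label in samples:
            pred = sum(wi * fi for wi, fi in zip(w, features))
            err = pred - label                       # prediction error
            w = [wi - lr * err * fi for wi, fi in zip(w, features)]
    return w

# Invented features: (keyword frequency, external links), labels by assessors.
training_sample = [
    ((0.9, 0.8), 1.0),   # judged relevant
    ((0.1, 0.2), 0.0),   # judged irrelevant
    ((0.8, 0.9), 1.0),
    ((0.2, 0.1), 0.0),
]
w = fit_ranking_formula(training_sample)
score = lambda f: sum(wi * fi for wi, fi in zip(w, f))
print(score((0.9, 0.9)) > score((0.1, 0.1)))  # relevant-looking page ranks higher
```

The point of the sketch is the workflow: assessors supply labeled examples, and the system fits a formula that reproduces their judgments on new documents.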

At the end of my post, I want to show you some interesting statistics collected by the Yandex search engine in the course of its work.

1. The popularity of personal names in Russia and Russian cities (data taken from blogger accounts and social network profiles in March 2012).

Great seer

In 1863 the great writer Jules Verne wrote his next book, "Paris in the Twentieth Century". In it he described in detail the metro, the car, the electric chair, the computer and even the Internet. However, the publisher refused to print the book, and it lay untouched for over 120 years until Jules Verne's great-grandson found it in 1989. The book was published in 1994.

Search engines have long been an integral part of the Russian Internet. Today they are huge and complex mechanisms: not only a tool for finding information, but also an attractive area for business.

Most search engine users have never thought (or have thought, but not found an answer) about how search engines work: the scheme by which user queries are processed, what these systems consist of, and how they function.

This master class aims to answer the question of how search engines work. You will not, however, find here the factors that influence document ranking, nor should you count on a detailed explanation of Yandex's algorithm. According to Ilya Segalovich, director of technology and development at Yandex, it could be extracted only "under torture", and only from Ilya Segalovich himself.

2. Concept and functions of a search engine

A search engine is a software and hardware complex designed to search the Internet and respond to a user's query (a search query given as a text phrase) with a list of links to information sources, ordered by relevance to the query. The major international search engines are Google, Yahoo and MSN. On the Russian Internet these are Yandex, Rambler and Aport.

Let's take a closer look at the concept of a search query, using Yandex as an example. A search query should be formulated as briefly and simply as possible, in line with what the user wants to find. Say we want to find information on how to choose a car. We open the Yandex home page and enter the query "how to choose a car". Then our task is to open the links to the information sources returned for our query. It is quite possible, however, that we won't find what we need. If that happens, either the query needs to be rephrased, or the search engine's database really contains nothing relevant to it (which can happen with very "narrow" queries, such as "how to choose a car in Arkhangelsk").

The primary task of any search engine is to deliver exactly the information people are looking for. It is not possible to teach users to make "correct" queries, i.e. queries that match the principles by which search engines operate. Therefore developers create algorithms and principles that let users find the information they seek.

This means the search engine must "think" the way a user thinks when looking for information. When a user sends a query, he wants to find what he needs as quickly and easily as possible. Having received the result, he judges the system by several basic parameters. Did he find what he was looking for? If not, how many times did he have to rephrase the query? How relevant was the information he found? How fast did the search engine process the query? How conveniently were the results presented? Was the desired result first or hundredth? How much junk came up alongside the useful information? And will he find the information he needs when he turns to the search engine again in a week, or in a month?

To answer all these questions well, search engine developers constantly improve their search algorithms and principles, add new functions and capabilities, and try in every possible way to speed up the system.

3. The main characteristics of a search engine

Let's describe the main characteristics of search engines:

  • Completeness

    Completeness is one of the main characteristics of a search engine, which is the ratio of the number of documents found upon request to the total number of documents on the Internet that satisfy this request. For example, if there are 100 pages on the Internet containing the phrase “how to choose a car”, and only 60 of them were found for the corresponding query, then the search completeness will be 0.6. Obviously, the more complete the search, the less likely it is that a user will not find the document he needs, provided that it exists on the Internet at all.

  • Accuracy

    Accuracy is another key characteristic of a search engine, determined by the degree to which the found documents match the user's query. For example, if 100 documents are returned for the query "how to choose a car", 50 of them contain the exact phrase "how to choose a car", and the rest merely contain those words somewhere ("how to choose the right radio tape recorder and install it in a car"), then the search accuracy is 50/100 (= 0.5). The more accurate the search, the sooner the user finds the documents he needs, the less "garbage" of various kinds he encounters among them, and the less often the found documents fail to match the query.

  • Relevance

    Relevance (freshness) is an equally important component of search, characterized by the time that elapses between the publication of documents on the Internet and their entry into the search engine's index. For example, the day after an interesting piece of news appears, a large number of users turn to search engines with related queries. Objectively, less than a day has passed since the information was published, but the main documents have already been indexed and are available for search, thanks to the so-called "quick base" in large search engines, which is updated several times a day.

  • Search speed

    Search speed is closely tied to load resistance. For example, according to Rambler Internet Holding LLC, the Rambler search engine today receives about 60 queries per second during business hours. Such a workload demands that the processing time of each individual query be reduced. Here the interests of the user and the search engine coincide: the visitor wants results as quickly as possible, and the search engine must process the query as quickly as possible so as not to delay the computation of subsequent queries.

  • Visibility
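The first two characteristics above, completeness and accuracy, can be computed directly. The document identifiers below are invented placeholders reproducing the article's own numbers:

```python
def completeness(found, relevant_on_internet):
    """Share of all relevant documents that the engine actually found."""
    return len(set(found) & set(relevant_on_internet)) / len(relevant_on_internet)

def accuracy(found, relevant_on_internet):
    """Share of the found documents that really match the query."""
    return len(set(found) & set(relevant_on_internet)) / len(found)

# The article's first example: 100 relevant pages exist, 60 are found.
relevant = [f"doc{i}" for i in range(100)]
found = relevant[:60]
print(completeness(found, relevant))   # -> 0.6

# The second example: of 100 found documents, only 50 are truly relevant.
found2 = relevant[:50] + [f"junk{i}" for i in range(50)]
print(accuracy(found2, relevant))      # -> 0.5
```

In information-retrieval terminology these two ratios are usually called recall and precision, and improving one often costs some of the other.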

4. A brief history of search engine development

In the initial period of the Internet's development, the number of its users was small and the amount of available information relatively modest. For the most part, only research workers had access to the Internet, and the task of finding information online was not as urgent as it is now.

One of the first ways to organize access to the network's information resources was the creation of open site directories, in which links to resources were grouped by subject. The first such project was Yahoo.com, which opened in the spring of 1994. After the number of sites in the directory had grown significantly, the ability to search the catalog was added. In the full sense this was not yet a search engine, since the search was limited to the resources listed in the directory rather than all Internet resources.

Link directories were widely used in the past but have almost completely lost their popularity today, because even the huge modern catalogs cover only an insignificant part of the Internet. The largest directory on the net, DMOZ (also called the Open Directory Project), holds information on 5 million resources, while Google's search base consists of more than 8 billion documents.

In 1995 the search engines Lycos and AltaVista appeared; the latter was for many years the leader in Internet information retrieval.

In 1997, Sergey Brin and Larry Page created the Google search engine as part of a research project at Stanford University. Google is currently the most popular search engine in the world!

In September 1997, the Yandex search engine was officially announced, which is the most popular in the Russian-speaking Internet.

Currently there are three main international search engines with their own databases and search algorithms: Google, Yahoo and MSN. Most other search engines (and there are a great many) use the results of these three in one form or another. For example, AOL search (search.aol.com) uses the Google base, while AltaVista, Lycos and AllTheWeb use the Yahoo base.

5. The composition and principles of the search engine

In Russia the main search engine is Yandex, followed by Rambler.ru, Google.ru, Aport.ru and Mail.ru. At present, Mail.ru uses the engine and search base of Yandex.

Almost every major search engine has its own structure, different from the others. Nevertheless, it is possible to single out the main components common to all search engines; the differences lie only in how the interaction of these components is implemented.

Indexing module

The indexing module consists of three auxiliary programs (robots):

Spider: a program designed to download web pages. The spider downloads a page and extracts all the links it contains; the HTML code of each page is downloaded in full. Robots use the HTTP protocol to download pages. The spider works as follows: the robot sends a "GET /path/document" request and certain other HTTP request headers to the server, and in response receives a text stream containing service information and the document itself. For each page the spider saves:

  • the page URL
  • the date the page was downloaded
  • the server's HTTP response headers
  • the page body (HTML code)
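The spider's record for a downloaded page might be assembled as below. The `opener` is injected as a stand-in so the sketch runs without a network; a real spider would issue the HTTP GET itself:

```python
from datetime import datetime, timezone

def spider_fetch(url, opener):
    """Build the record a spider saves for one downloaded page.
    `opener` is any callable returning (status_line, headers, body)."""
    status, headers, body = opener(url)
    return {
        "url": url,                                            # page URL
        "downloaded": datetime.now(timezone.utc).isoformat(),  # download date
        "status": status,                                      # server response
        "headers": headers,                                    # HTTP headers
        "body": body,                                          # page body (HTML)
    }

# A stand-in opener with a canned response, so the sketch runs offline.
def fake_opener(url):
    return ("HTTP/1.1 200 OK",
            {"Content-Type": "text/html; charset=utf-8"},
            "<html><body>Hello</body></html>")

record = spider_fetch("http://example.ru/", fake_opener)
print(record["url"], record["headers"]["Content-Type"])
```

Storing the download date alongside the body is what lets the engine later decide how stale a copy is and when to revisit the page.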

Crawler (the "traveling" spider): a program that automatically follows all the links found on a page. It extracts every link present on the page and determines where the spider should go next, based on those links or on a predefined list of addresses. By following the links it finds, the crawler discovers new documents still unknown to the search engine.

Indexer: a program that analyzes the web pages downloaded by the spiders. The indexer parses a page into its component parts and analyzes them with its own lexical and morphological algorithms. It examines the various page elements: text, headings, links, structural and style features, special service HTML tags, and so on.

Thus, the indexing module crawls a given set of resources by following links, downloads the pages it encounters, extracts links to new pages from the received documents, and performs a complete analysis of those documents.

Database

A database, or search engine index, is a data storage system: an information array holding specially converted parameters of all the documents downloaded and processed by the indexing module.

Search Server

The search server is the most important element of the whole system, since the quality and speed of search depend directly on the algorithms underlying its functioning.

The search engine works as follows:

  • The query received from the user undergoes morphological analysis. Then an information context is generated for each document held in the database (it will later be displayed as the text information accompanying each result on the search results page).
  • The resulting data is passed as input to a special ranking module, which processes all the documents and computes for each a rating characterizing its relevance to the user's query, based on the various components of the document stored in the search engine's index.
  • Depending on the user's choices, this rating can be adjusted by additional conditions (for example, the so-called "advanced search").
  • Next the snippet is generated: for each document found, the title, a short annotation that best matches the query, and a link to the document itself are extracted from the document table, with the query words highlighted.
  • The search results obtained are delivered to the user as a SERP (Search Engine Results Page).

As you can see, all these components are closely interrelated and work in concert, forming the precise and rather complex mechanism of a search engine, one that requires enormous resources.

6. Conclusion

Now let's summarize all of the above.

  • The primary task of any search engine is to deliver people exactly the information they are looking for.
  • The main characteristics of search engines:
    1. Completeness
    2. Accuracy
    3. Relevance
    4. Search speed
    5. Visibility
  • The first full-fledged search engine was the WebCrawler project, published in 1994.
  • The search engine includes components:
    1. Indexing module
    2. Database
    3. Search Server

We hope this master class has helped you become more familiar with the concept of search engines and better understand their main functions, characteristics, and principles of operation.

Hello, dear friends! In this article we continue looking at the Yandex search engine; as you remember, previous articles covered the history of this great company, which ranks first among its competitors in Russia and beyond.

All this is good, but both beginners and experienced site builders care most about one question: how to get their projects to the top of the search results.

So let's look at how the Yandex search engine works, to understand which pitfalls you can run into and what to expect from the Russian search engine.

In the last article we covered a topic that turned out to be quite interesting and useful, so I decided to supplement it and, so to speak, deepen it.

Now, I probably got ahead of myself with the question "why does a search engine index documents": that part is a no-brainer. It remains to figure out "how".

Website ranking algorithms

First, let's get acquainted with some algorithms that are fundamental to any search engine:

- Direct search algorithm.

What is it? You remember reading a wonderful story in one of your books, and you start looking for it book by book: you take one, leaf through it, don't find it, take another... The principle is clear, but this method is extremely slow. That, too, is understandable.

- Reverse search algorithm.

For this algorithm, a text file is created for every page of your blog. The file lists, in alphabetical order, ALL the words you have used, and even records each word's position in the text (its coordinates).

This is a fairly fast method, but the search now happens with some loss of precision.

The main thing to understand is that this algorithm does not search the Internet, or even your blog, but a separate text file that was created earlier, when the robot visited you. These files (reverse, or inverted, indexes) are stored on Yandex's servers.

So, these were the basic search algorithms, that is, how Yandex simply finds the documents it needs. There should be no problems with this.
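The reverse (inverted) index idea can be sketched in a few lines of Python. This is only an illustration; real indexes are stored far more compactly and carry much more metadata:

```python
from collections import defaultdict

def build_inverted_index(pages):
    """Map each word to (page, position) pairs: a toy 'reverse index'.

    `pages` is a dict of page name -> text. The word's position in the
    text (its 'coordinates') is recorded alongside the page name.
    """
    index = defaultdict(list)
    for name, text in pages.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].append((name, pos))
    return index

def lookup(index, word):
    # Direct search would rescan every page; here it is one dictionary read.
    return index.get(word.lower(), [])
```

This is exactly why the inverted approach is fast: the expensive work (scanning every page) happens once, when the robot visits, not at query time.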

But Yandex knows not one document, and not even 100; according to the latest data from my sources, Yandex knows about 11 billion documents (10,727,736,489 pages).

And out of all this quantity, documents suitable for the query must be selected and, more importantly, somehow ranked, i.e., ordered by importance, or rather, by how useful they are to the reader.

Search Mathematical Models

To solve this problem, mathematical models come to the rescue. We will now talk about the simplest ones.

Boolean mathematical model: if the word occurs in the document, the document is considered found. A simple match, nothing complicated.

But there are problems here. For example, if you enter some popular word, or better yet the preposition "в" ("in"), the most common word in Russian and found in EVERY document, you will get so many results that you cannot even imagine the number of documents found. That is why the next mathematical model appeared.

Vector mathematical model: this model determines the "weight" of a document. It is not enough for the word to merely occur; the more often it occurs, the higher the relevance (correspondence).

It is the vector model that ALL search engines use.
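The difference between the two models can be shown in a minimal sketch. The code is illustrative, assuming plain word counts as the "weight"; real vector models use far more refined weighting such as TF-IDF:

```python
def boolean_match(query_word, doc):
    # Boolean model: the document is "found" if the word occurs at all.
    return query_word.lower() in doc.lower().split()

def tf_weight(query_word, doc):
    # Vector-style weight: the more often the word occurs relative to
    # the document's length, the more relevant the document is considered.
    words = doc.lower().split()
    if not words:
        return 0.0
    return words.count(query_word.lower()) / len(words)
```

Two documents can both "match" in the Boolean sense while the vector weight still tells them apart, which is what makes ranking possible.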

Probabilistic model: a more complex one. The principle is this: the search engine itself picks a reference page. For example, you are looking for information about the history of Yandex. Yandex has a certain standard; let's say it is my previous article about Yandex.

And it will compare all other documents against this article. The logic is: the more your blog page resembles my article, the MORE LIKELY it is that your page will also be useful to the reader and also tells the history of Yandex.
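One crude way to express "how much does this page resemble the reference document" is word-overlap (Jaccard) similarity. This is only to show the idea of comparing against a standard; real probabilistic ranking models are far more elaborate:

```python
def jaccard_similarity(doc_a, doc_b):
    """Crude document similarity: the share of distinct words the two
    texts have in common. A stand-in for 'compare every page to the
    reference page' from the probabilistic model described above."""
    a = set(doc_a.lower().split())
    b = set(doc_b.lower().split())
    if not (a | b):
        return 0.0
    return len(a & b) / len(a | b)
```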

To reduce the number of documents that need to be shown to the user, the concept of relevance, i.e., correspondence, was introduced: how well your blog page really matches the topic. This is an important issue when it comes to search quality.

Assessors: who they are and what they are responsible for

This relevance is also needed to assess the quality of the algorithms.

For this there is a special task force: they are called assessors. These are special people who review search results manually.

They have instructions on how to check sites, how to rate them, and so on. And they determine by hand whether your pages match the search queries or not.

And the quality of the search algorithms depends on the assessors' opinions. If all the assessors say that the search results do not match the queries, then the ranking algorithm is wrong, and only Yandex is to blame.

If the assessors say that just one site does not match the query, that site flies far down the search results. More precisely, not the whole site, just one article, but that is beside the point.

Of course, assessors cannot view and evaluate ALL articles with their own hands and eyes. That much is obvious.

So other parameters come to the rescue, and pages are ranked according to them.

There are a lot of them, for example:

  • page weight (VIC, PageRank, TIC, and the like);
  • domain authority;
  • the relevance of the text to the request;
  • the relevance of the texts of external links to the request;
  • as well as many other ranking factors.

The assessors make their comments, and the people responsible for tuning the mathematical ranking model edit the formula in turn, which makes the search engine work better.

The main criteria for evaluating the work of the formula:

1. Precision of the search results is the percentage of returned documents that match the query (are relevant). I.e., the fewer non-matching pages in the results, the better.

2. Completeness of the search results is the ratio of relevant web pages returned for a given query to the total number of relevant documents in the collection (the set of pages known to the search engine).

For example, if there are more relevant pages in the whole collection than in the search results, the results are incomplete. This can happen when some of the relevant web pages fall under a filter.

3. Freshness of the search results is how well the actual web page still matches what is written in the snippet. For example, a document may have changed significantly or may no longer exist at all, yet still be present in the results.

The freshness of the results depends directly on how often the search robot re-crawls the documents in its collection.
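The first two criteria are the classic precision and recall (completeness) metrics. A sketch over toy lists of document IDs, just to pin down the two ratios:

```python
def precision(retrieved, relevant):
    # Share of the returned documents that actually match the query.
    retrieved, relevant = set(retrieved), set(relevant)
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)

def recall(retrieved, relevant):
    # Share of all relevant documents in the collection that were returned.
    retrieved, relevant = set(retrieved), set(relevant)
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)
```

Note the trade-off: returning every page in the collection gives perfect recall but terrible precision, which is why both numbers are tracked.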

Collecting the collection (indexing site pages) is carried out by a special program, the search robot.

The search robot receives a list of addresses to index and copies the pages; the contents of the copied web pages are then sent for processing to an algorithm that converts them into reverse indexes.
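The robot's rounds can be sketched as a breadth-first walk over pages. Here `fetch` and `extract_links` are stand-ins for what a real robot does over HTTP; everything else is illustrative:

```python
from collections import deque

def crawl(start_urls, fetch, extract_links, max_pages=100):
    """Toy crawler: walks pages breadth-first from a seed list of addresses.

    `fetch(url)` returns page text and `extract_links(text)` returns the
    URLs found in it. The collected pages would then be handed off to the
    indexing module to be turned into reverse indexes.
    """
    queue = deque(start_urls)
    seen = set(start_urls)
    collected = {}
    while queue and len(collected) < max_pages:
        url = queue.popleft()
        text = fetch(url)
        collected[url] = text
        for link in extract_links(text):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return collected
```

A real robot also respects its crawl schedule, robots.txt, and re-visit frequency, which this sketch omits.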

Well, "in a nutshell", if I may put it that way, we have covered the principles of the search engine.

Let's summarize:

  1. A search robot comes to your blog.
  2. The search robot stores the reverse index of the page for later search.
  3. With the help of a mathematical model, the document is processed and shown in the search results according to the formulas, taking the assessors' opinions into account.

This is very, very simplified. Just to get a basic understanding of the Yandex search engine.

I have written a lot of text now, and perhaps not everything is clear. So I suggest you come back to this article a little later and watch this video.

This is an excellent guide that I used to study at one time.

Hopefully this information helps you better understand why your sites occupy the positions they do in relevant searches, and do everything to improve them.

On this I say goodbye to you, if you have any questions, I am always happy to answer them in the comments. Or maybe you want to supplement the article?

In any case, share your opinion!

Yandex today is the most popular search engine in Russia. LiveInternet statistics put Yandex's share of the all-Russian audience at 53.4%; counting only Moscow and the Moscow region, it is even higher, 67.9% (Moscow accounts for more than 50% of all queries in Russia).

The website www.yandex.ru was created in 1997; it needed only one server, which stood under the desk of one of the first Yandex developers, Dmitry Teiblyum. Very soon after launch a second server was acquired, and soon, when another one had to be installed, it became clear that the space under the desk would fit either three Yandex servers, or [...]

Search engine developers strive to give users the best answers to their queries. Sometimes this answer may be a number (for example, the weather in a city), a picture (for example, an address on a map), a translation of a word, or a quatrain. When there is a suitable array of information at hand, the answer can be given immediately. Therefore, Yandex supplements its search results on the Internet with answers from its [...]

Approximately every tenth query to Yandex is "navigational", that is, it consists of the name of an organization or a site, and the user wants to go to that organization's site. In this case the Yandex search bar is used instead of the browser's address bar, and the other nine search results are usually of no interest to the user. Without distracting the user from the main goal, we added after the main [...]

The main task of a search engine is to answer a user's question. When a user sets a query, the search engine does not refer to every site on the Internet, but searches through the database of pages known to it - the search index. There she finds all the pages with words from the query. The user sees links to these pages on the search results pages.

As we can see, Yandex is not standing still, and I am sure that the search technologies of this system will continue to develop in order to improve the quality of search, which is difficult to call ideal.

On November 10, 2009, Yandex announced a new version of its search algorithm, Snezhinsk. Fundamental changes took place in the algorithm for calculating relevance; Yandex representatives wrote the following: "We managed to create a more accurate and much more complex mathematical model, which led to a significant increase in the quality of search. Thanks to the redesign of the ranking architecture in search, it was possible to implement the accounting of several thousand [...]

Testing of the new version of the Yandex algorithm began on July 9, 2008. According to Yandex, "the main changes in the program are associated with a new approach to machine learning and, as a result, differences in the way ranking factors are taken into account in the formula."

On April 14, 2008, a new search algorithm, "Magadan", went into testing at buki.yandex.ru. In addition to doubling the number of ranking factors, the following innovations were added:

Before plunging into the algorithmic jungle, let's recall how a search engine works in general. The logical structure of a search engine can be represented as three modules (see diagram). The robot (crawler, spider) is a special program that crawls Internet sites and downloads their content. The robot has a special schedule according to which it makes its rounds. The site pages loaded by the robot, a special [...]

66. Which has a bigger influence: a link from a free platform (blogspot, livejournal, etc.) or from a standalone site/blog? Free platforms carry less weight than standalone sites. However, the influence can be greater; it depends on many factors: the current anchor list, the state of the sites being compared, etc. An unambiguous answer to this question is impossible. 67. The greatest weight is transferred between […]


What is accounting for external links to a site used for? As you can see from the previous section, almost all factors influencing ranking are under the control of the page author. This makes it impossible for a search engine to distinguish a truly high-quality document from a page created specifically for a given search phrase, or even a page generated by a robot that contains no useful information at all. […]

