Professional search for information on the Internet: an overview of programs for searching documents and data

Checking a nickname on dozens of services at once, counting Facebook shares, and visualizing the connections of a Twitter account.

Social media content analysis is a hot topic among startups. Every year there are more and more services for finding posts and people. But many of them either disappear quickly, remain unfinished, or are expensive to use.

This article covers the few of them that let you quickly, and free of charge, get genuinely useful or simply interesting information.

1. Search for profiles

The Snitch search system lets you look for a person's profiles across four dozen services, including the websites of the world's leading universities and US criminal records databases:

Unfortunately, some of the sites you can tick no longer work. For example, Google Uncle Sam was shut down five years ago. But despite this and other rough edges, Snitch is a useful service that can save you a lot of time when gathering information about a person.

If a service displays a blank screen instead of blocks of search results, follow the Open a new window link to view them:

2. Search for hashtags

It is very easy to use: type the desired hashtag into the search form, and within a second a list of recent posts tagged with it across six social networks appears:

3. Analysis of recent tweets

The service returns a list of the last hundred tweets containing the desired word, hashtag or account name, along with some analytics about the people who posted these tweets and the time they were created:

Say you need to identify which user drove an unusually high number of clicks to an article from Twitter. Look through the last 100 tweets and see which of the people who mentioned the original piece has the most followers:

Owners of a paid subscription can analyze a much larger number of tweets:

4. Analysis of Twitter account

On Mentionapp you enter an account name and get information about it (who retweets it most often, which hashtags it uses, etc.) in the form of a connection diagram:

5. Search for tweets on the map

Click anywhere on the map, and you can read the latest tweets posted nearby:

6. Number of mentions in social networks

Sharedcount helps assess the popularity of an article or site on social networks. You type in a URL, and within a couple of seconds you have mention statistics for Facebook, Google+, Pinterest, LinkedIn and StumbleUpon:

7. Search forums

Boardreader is a search engine for forums and message boards:

A rough estimate of its scale shows that the portal holds almost four posts for every inhabitant of Russia.

8. Checking a login across social networks

Go to knowem.com and enter the person's nickname. In response, you get information about the services on which that name is registered:
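The core of such a checker is simple: it maps one nickname onto the profile-URL patterns of many services. Below is a minimal sketch of that idea; the service list is illustrative and is not knowem.com's actual catalog.

```python
# Hypothetical service list: each entry maps a service name to its
# profile URL pattern. A real checker like knowem.com covers hundreds.
SERVICES = {
    "twitter": "https://twitter.com/{}",
    "github": "https://github.com/{}",
    "instagram": "https://www.instagram.com/{}/",
    "reddit": "https://www.reddit.com/user/{}",
}

def candidate_profiles(nickname: str) -> dict:
    """Return a service -> profile-URL map for the given nickname."""
    return {name: url.format(nickname) for name, url in SERVICES.items()}

# In a real checker you would issue an HTTP request per URL and treat
# a 200 response as "name taken"; here we only build the URL list.
urls = candidate_profiles("ivan_petrov")
print(urls["github"])  # https://github.com/ivan_petrov
```

The HTTP-probing step is left out deliberately: URL construction is the part every such service shares, while the "is it registered" check varies per site (status codes, redirects, error pages).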

9. Determine the name of the person by email

If you are still looking for people by typing their email addresses into Google, you should abandon this method: there is pipl.com. You type in an email address (or nickname) and get a list of social network profiles:

The information is not always accurate and complete, but the service is extremely useful.

That's all. Socialmention (rough review analytics), Yomapic (map-based search for photos from VK and Instagram) and Yandex would also have been worth covering.

By the middle of 2015, the global Internet already connected 3.2 billion users, almost 43.8% of the world's population. For comparison: 15 years ago only 6.5% of the population were Internet users, so their share has grown more than sixfold. Even more impressive, though, are not the quantitative but the qualitative indicators of how Internet technologies have spread into various areas of human activity: from the global communications of social networks to the household Internet of Things. Mobile Internet made it possible for users to be online outside the office and home: on the road, or out of town in the countryside.
Currently, there are hundreds of systems for finding information on the Internet. The most popular of them are accessible to the vast majority of users because they are free and easy to use: Google, Yandex, Nigma, Yahoo!, Bing. More advanced users are offered "advanced search" interfaces and specialized searches over social networks, news streams, and buy-and-sell classifieds. But all these wonderful search engines have a significant drawback, which I noted above as a virtue: they are free.
If investors pour billions of dollars into the development of search engines, a pertinent question arises: where do they earn money?
They earn, in particular, by serving in response to user queries not so much the information that would be useful from the user's point of view as the information that the owners of the search engines consider useful to show. This is done by manipulating the order of the result lists returned for search queries: both open advertising of certain Internet resources and covert manipulation of response relevance, driven by the commercial, political and ideological interests of the search engines' owners.
Therefore, among professional specialists in searching for information on the Internet, the problem of the pertinence of search engine results is very relevant.
Pertinence is the correspondence of the documents found by an information retrieval system to the user's information need, regardless of how fully and how exactly that need is expressed in the text of the query itself. It is the ratio of the amount of useful information to the total amount of information received; roughly speaking, search efficiency.
Specialists who carry out qualified searches on the Internet have to make an effort to filter results, screening out informational "noise". For this, professional-level search tools are used.
One such professional system is the Russian program FileForFiles & SiteSputnik (SiteSputnik), developed by Alexey Mylnikov from Volgograd.

"The FileForFiles & SiteSputnik program (SiteSputnik) is designed to organize and automate professional search, collection and monitoring of information posted on the Internet. Special attention is paid to receiving new incoming information on topics of interest. Several information analysis functions have been implemented."


1. Monitoring and categorization of information flows


First, a few words about monitoring information flows, a special case of which is the monitoring of media and social networks:

  • the user specifies the Sources that may contain the necessary information, and the Rules for selecting that information;

  • the program downloads fresh links from the Sources, strips their content of garbage and repetitions, and sorts it into Rubrics according to the Rules.

  • To watch live a simple but real monitoring process involving 6 sources and 4 rubrics:
  • open the Demo version of the program;


  • then, in the window that appears, click the Together button;

  • and as SiteSputnik executes this Project in real time, you will see:
    - in the "Pure Stream" list, all the new information from the Sources,
    - in the "Post-request" section, only the economic and financial news that satisfy the rule,
    - in the rubrics "About the President", "About the Prime Minister" and "Central Bank", the information related to the corresponding objects.

  • In real Projects, you can use almost any number of Sources and Rubrics.
    Your first working Projects can be created in a few hours and refined in the course of operation.
    The information processing described here is available in the SiteSputnik Pro + News package and above.

2. Simple and batch search, collection of information

To familiarize yourself with the capabilities of SiteSputnik Pro (the basic edition of the program):

  • open the Demo version of the program;

  • enter your first query, for example your full name, as I did:

    and click the Search button.


  • In a few seconds the program (see the table built by SiteSputnik) will query 7 sources, open 24 search pages in them, find 227 relevant links, remove repeated links, and compile a "Union" list from the remaining 156 unique links.

    Source name    Ordered pages  Downloaded pages  Found links  Search time  Search efficiency  New links  New efficiency
    Yandex         5              5                 50           0:00:05      32%                0          0
    Google         5              5                 44           0:00:03      28%                0          0
    Yahoo          5              5                 50           0:00:05      32%                0          0
    Rambler        5              4                 56           0:00:07      36%                0          0
    MSN (Bing)     5              3                 23           0:00:04      15%                0          0
    Yandex.Blogs   5              1                 1            0:00:01      1%                 0          0
    Google Blogs   5              1                 3            0:00:01      2%                 0          0
    Total:         35             24                227          0:00:26

    Total: 156 unique links; duplicate links: 46%.

  • (!) Repeat your query in a few hours or days, and you will see, in a separate list, only the new links that appeared in the results over that period. The last two columns of the table show how many new links each Source brought and its "novelty" efficiency. When a query is executed repeatedly, the list of new links is built relative to all previous executions of that query. It seems an elementary and obviously desirable function, yet the author is not aware of any other program that implements it.

  • (!!) The described capabilities are supported not only for individual queries, but also for whole query packages:

    The package you see consists of seven different queries collecting information about Vasily Shukshin from several Sources, including search engines, Wikipedia, exact search in Yandex news, metasearch, and a search for mentions on TV and radio stations. The TV and Radio script includes Channel One, TV Russia, NTV, RBC TV, Echo of Moscow, the Mayak radio company, and other sources. Each Source has its own search or viewing depth in pages, listed in the third column.

    Batch search enables comprehensive one-click collection of information on a given topic.
    A separate list of new links, on repeated executions of the package, will contain only links not found before.
    There is no need to remember what you asked the Internet, when, and what it answered: everything is saved automatically in the program's libraries and databases.
    I repeat that the capabilities described in this paragraph are fully included in the SiteSputnik Pro package.


  • Read more in the instructions: SiteSputnik Pro for beginners.
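The two ideas of this section, uniting deduplicated results from several engines and keeping a persistent "New" list across repeated runs, can be sketched in a few lines. The engine results here are stubbed lists standing in for real search output; this is an illustration of the bookkeeping, not SiteSputnik's implementation.

```python
def run_package(results_by_engine, history):
    """Union one run's results, dedupe them, and split off never-seen links.

    results_by_engine: dict of engine name -> list of result links.
    history: a set carried across runs, updated in place.
    """
    union = []
    for links in results_by_engine.values():
        for link in links:
            if link not in union:
                union.append(link)          # dedupe within this run
    new = [link for link in union if link not in history]
    history.update(union)                   # remember for future runs
    return union, new

history = set()
union1, new1 = run_package({"Yandex": ["a", "b"], "Google": ["b", "c"]}, history)
union2, new2 = run_package({"Yandex": ["a", "d"], "Google": ["c"]}, history)
# new2 contains only "d": links "a" and "c" were already seen in run 1
```

The key design point is that `history` is relative to all previous executions, not just the last one, which is exactly what makes repeated runs of a query (or a whole package) report genuine novelty.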

3. Objects and search monitoring

Quite often the user faces the following task: find out what the Internet holds about a specific object, a person or a company. For example, when hiring a new employee or when a new counterparty appears, you usually know the full name, company name, phone numbers, TIN, OGRN or OGRNIP, and perhaps ICQ, Skype and some other details. You then call the special SiteSputnik function "Collecting information about the object" (the SiteSputnik Pro + Objects edition):

You enter the data you know and, with one click of the mouse, perform an exact and complete search for links containing the specified information. The search runs on several search engines at once, over all the details at once, and over several possible ways of writing each detail at once: remember how many different ways a phone number can be written. After a while, without doing any tedious routine work, you receive a list of links cleared of repetitions and, most importantly, sorted by relevance to the desired object. Relevance (significance) is achieved because the first links in SiteSputnik's output are those pages containing the largest number of the details you specified, not the pages that webmasters have pushed up the search engine rankings.
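The "several ways of writing each detail" step can be made concrete for a phone number: generate the common written forms and wrap each in quotes for exact search. The sketch below is illustrative; the four formats are common Russian conventions, and a real parameter package may cover many more.

```python
def phone_variants(digits: str):
    """Common written forms of a Russian mobile number.

    digits: the 10 digits after the country code, e.g. "9261234567".
    """
    code, p1, p2, p3 = digits[:3], digits[3:6], digits[6:8], digits[8:]
    return [
        f"+7{code}{p1}{p2}{p3}",            # +79261234567
        f"8{code}{p1}{p2}{p3}",             # 89261234567
        f"+7 ({code}) {p1}-{p2}-{p3}",      # +7 (926) 123-45-67
        f"8-{code}-{p1}-{p2}-{p3}",         # 8-926-123-45-67
    ]

# Quote each variant so every search engine treats it as an exact phrase.
queries = [f'"{v}"' for v in phone_variants("9261234567")]
```

Sending all of these queries and uniting the results is what turns "search by phone number" from a single lucky guess into a systematic sweep.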

Important.
SiteSputnik is better than other programs at extracting real rather than official information about an Object. For example, a mobile operator's official database may say that a phone belongs to Vasily Terekhin, while the Internet shows that in 2013 this number was used by someone named Alexander to sell a Ford Focus: additional food for thought.

Search monitoring.
Search monitoring means the following: if you want to track the appearance of new links for a given object or an arbitrary package of queries, you just need to repeat the corresponding search periodically. Just as with a simple query, SiteSputnik will create a "New" list containing only those links that were not found in any of the previous searches.

Search monitoring is interesting not only in itself. It can be used for monitoring media, social networks and other news sources, as mentioned above in paragraph 1. Unlike other programs, which can extract new information only from RSS feeds, SiteSputnik can use built-in searches and search engines for this. It can also emulate (create) RSS feeds from arbitrary pages, and even emulate an RSS stream for a query or a whole package of queries.


  • To get the most out of the program, use its main features, namely:

    • query packages, packages with parameters, the Assembler (collector), the "Analytic union" operation over the results of several tasks, and, where necessary, the basic search functions on the invisible Internet;

    • connect your own sources to those built into the program: other search engines and embedded searches, existing RSS feeds, your own RSS feeds created from arbitrary pages, and the function for finding new sources;

    • use the available kinds of monitoring: media, social networks and other sources, monitoring of comments on news and messages, and tracking the appearance of new information on existing pages;

    • engage Rubrics, External Functions, the Task Scheduler, mailing lists, multiple computers, and the Project Instructor, install alarms to be notified of significant events, and use the other functions listed below.



4. The SiteSputnik program: editions and functions

The SiteSputnik program is constantly being improved in the direction of "I need to find everything, and with a guarantee".
"A program for interrogating the Internet" is another definition users give of the program's purpose.

A. Search and collection functions.

. Query package: execution of several queries at once, with search results combined or kept separate. When the combined result is formed, repeated links are removed. More about packages in the introduction to SiteSputnik; shown clearly in the video on joint and separate execution of queries. There are no analogues in domestic or foreign products.

. Parameter packages. Any queries and query packages designed for standard search tasks, such as search by phone number, full name or e-mail, can be parameterized, saved, and executed from a library of ready-made queries with the actual (required) parameter values substituted in. Each parameter package has its own extended search form, which can use not one but several search engines; forms of considerable functional complexity can be created. Crucially, forms can be created by users themselves, without the participation of the program's author or a programmer. This is described very simply in the instructions, in more detail in a separate publication on search parameterization and on the forum, and shown clearly in the video: searching at once over all ways of writing a mobile phone number and several ways of writing an email address. There are no analogues.

. Assembler (NEW): assembly of a search task from several ready-made parts: queries, query packages, and parameter packages. Packages can include other packages in their text; the nesting depth is unlimited. You can create several search tasks, for example about several legal entities and individuals, and execute them at the same time. More details on the forum and in a separate publication about the Assembler; shown clearly in the video. There are no analogues.

. Metasearch: execution of a single query on several search engines simultaneously, with its own search "depth" for each of them. Metasearch is possible over the built-in search engines, which include Yandex, Rambler, Google, Yahoo, MSN (Bing), Mail, and Yandex and Google blogs, and over connected search tools. Working with multiple search engines looks as if you were working with one; re-found links are removed. A clear metasearch over three connected social networks (VKontakte, Twitter and YouTube) is shown in the video.

. Site metasearch: combined site search in Google, Yahoo, Yandex and MSN (Bing). Shown in the video.

. Metasearch in office documents: combined search over PDF, XLS, DOC, RTF, PPT and FLASH files in Google, Yahoo, Yandex and MSN (Bing). You can choose any combination of file formats.

. Metasearch over cached copies of links in Yandex, Google, Yahoo and MSN (Bing). A list is compiled in which each entry collects all the snippets found for a link by every search engine. There are no analogues.

. Deep search for Yandex, Google and Rambler combines into one list all links from the regular search plus all links from the lists "More from the site", "Additional results from the site" and "Search on the site (Total ...)" respectively. Learn more about deep search on the forum. There are no analogues.

. Exact and complete search. This means the following: on the one hand, each query is executed on that and only that source in whose query language it is written; this is exact search. On the other hand, there can be any number of such queries and sources; this ensures complete search. More details in a separate publication on procedural search. There are no analogues.

. Search on the invisible Internet.

    It includes the following basic functions:

    - a special package of queries that can be refined by the user,
    - search for invisible links using a spider,
    - search for invisible links in the vicinity of a visible link or folder, "by image and likeness",
    - special searches for open folders,
    - search for invisible links and folders with standard names using special dictionaries,
    - use of your own built-in searches.

    More details in a separate publication on SiteSputnik Invisible. The basic functions are "well known in narrow circles", but the way they are applied has no analogues. The essence of the method is to build a map of the site as visible from the Internet (in other words, a materialization of the visible Internet), and to search for invisible links only on the basis of, and relative to, the visible ones. Links that are already visible are not searched for by "invisible" methods.

B. Information monitoring functions.

. Monitoring the appearance of new links on a given topic on the Internet. New links are tracked using whole query packages employing any of the search methods mentioned above, not just the first pages of individual search engines. Union and intersection of new links from multiple separate searches are implemented. More details in the publication on monitoring (see § 1) and on the forum. There are no analogues.

. Collective information processing: creation of a corporate or professional network for collective collection, monitoring and analysis of information. The participants and creators of such a network are corporate employees, members of a professional community, or interest groups; their geographic location does not matter. More details in a separate publication on organizing a network for collective collection, monitoring and analysis of information.

. Monitoring links (web pages) to detect changes in their content (beta version). Detected changes are highlighted with color and special characters. More details in a separate publication on monitoring (see § 2 and 3).

C. Information analysis functions.

. Categorization of materials, already described above. More details in a separate publication on Rubrics. The Rules for falling into a Rubric let you specify keywords and the distance between them, use logical AND, OR and NOT, apply a multilevel parenthesis structure, and use dictionaries (insert files) to which logical operations can also be applied.
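A minimal sketch of such a Rule, combining a word-distance condition with logical AND/NOT, might look as follows. The rule format and keywords are invented for illustration; SiteSputnik's actual Rule language (parentheses, dictionaries, OR) is richer.

```python
def words_within(text, a, b, max_dist):
    """True if words containing a and b occur within max_dist words of each other."""
    toks = text.lower().split()
    pos_a = [i for i, t in enumerate(toks) if a in t]
    pos_b = [i for i, t in enumerate(toks) if b in t]
    return any(abs(i - j) <= max_dist for i in pos_a for j in pos_b)

def matches_rule(text):
    """Illustrative Rule: ("bank" AND "rate" within 3 words) AND NOT "football"."""
    t = text.lower()
    return words_within(t, "bank", "rate", 3) and "football" not in t

print(matches_rule("Bank raises key rate amid inflation"))  # True
```

The distance test is what separates a real rubrication Rule from plain keyword matching: "bank" and "rate" on opposite ends of a long article are far weaker evidence than the two words in one clause.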

. VF technology: an almost arbitrary extension of the categorization capabilities through external functions that plug organically into the Rules for entering Rubrics and can be implemented by a programmer independently, without the participation of the program's author.

. Numerical analysis of Rubric occupancy, with alarms and notification of significant events by highlighting Rubrics in color and/or sending an alarm report by e-mail.

. Actual relevance. Links can be arranged in an order close to their significance for the problem being solved, bypassing the tricks of webmasters who use various ways of boosting a site's ranking in search engines. This is achieved by analyzing the results of several "diverse" queries on a given topic: the links containing the maximum of the required information are, in the literal sense of the word, computed. More details in the description of the method for finding an optimal supplier and on the forum. There are no analogues.
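One simple way to "compute" relevance from diverse queries is to let each query vote for the links it returned and rank links by vote count. This is only a sketch of the idea described above, not SiteSputnik's algorithm; the link names are placeholders.

```python
from collections import Counter

def rank_by_query_overlap(results_per_query):
    """Rank links by how many differently-worded queries returned them."""
    counts = Counter()
    for links in results_per_query:
        counts.update(set(links))       # each query votes once per link
    return [link for link, _ in counts.most_common()]

ranked = rank_by_query_overlap([
    ["supplier.ru", "spam.com", "catalog.ru"],   # query 1
    ["supplier.ru", "catalog.ru"],               # query 2
    ["supplier.ru", "blog.net"],                 # query 3
])
# "supplier.ru" ranks first: all three queries returned it
```

A page optimized to rank highly for one popular phrase rarely survives three independently worded queries, which is why cross-query agreement resists webmaster tricks better than any single ranking does.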

. Computing object relationships: searching for links, resources (sites), folders and domains on which objects are mentioned together. The most common objects are people and businesses. All the SiteSputnik tools mentioned on this page can be used for relationship search, which significantly increases its effectiveness. The operation works on any number of objects. More details in the introduction to the program and in the description of the new "objects and their connections" function. There are no analogues.

. Formation, union and intersection of information flows on a variety of topics; stream matching. More details in a separate publication on information flows.

. Building web maps of sites, resources, folders and searched objects, based on links belonging to the site found on the Internet with Google, Yahoo, Yandex, MSN (Bing) and Altavista. Experts can find out whether "superfluous" information on their own sites is visible from the Internet, and investigate competitors' sites in the same way. A web sitemap is a materialization of the visible Internet. More details in a separate publication on building web maps; shown clearly in the video. There are no analogues.

. Search for new sources of information on a given topic, which can then be used to track the appearance of new required information.

D. Service functions.

. The Scheduler provides scheduled operation: it performs the specified program functions at specified times. More details in a separate publication on the Scheduler.

. Project Instructor (NEW): an assistant for creating and maintaining Projects for searching, collecting, monitoring and analyzing information (categorization and alarms). More details on the forum.

. Automatic archiving. All the results of your work are automatically stored in databases: queries, query packages, search and monitoring protocols, and the results of any of the functions listed above. Work can be structured by topics and subtopics.

. The database supports sorting, simple search, and arbitrary SQL search; for the latter there is a wizard for writing SQL queries. Using these tools you can find and review the work you did yesterday, last month or a year ago, use a topic as a search criterion, or define another search criterion over the database contents.

. Technical limitations of search engines. Some limitations, such as the query string length, can be overcome by executing not one but several queries and combining their results (or keeping them separate). You can read about a way to work around the violation of the law of additivity in the major search engines. For a single word or a single phrase in quotation marks, case-sensitive search has been implemented, in particular search for abbreviations.
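The query-length workaround amounts to splitting one long OR-query into several shorter ones that each fit the engine's limit, then uniting their results. A minimal sketch of the splitting step, with an invented 15-character limit for demonstration:

```python
def split_or_query(terms, max_len):
    """Split terms into OR-queries, each no longer than max_len characters."""
    chunks, current = [], []
    for term in terms:
        candidate = " OR ".join(current + [term])
        if current and len(candidate) > max_len:
            chunks.append(" OR ".join(current))   # close the full chunk
            current = [term]
        else:
            current.append(term)
    if current:
        chunks.append(" OR ".join(current))
    return chunks

chunks = split_or_query(["alpha", "bravo", "charlie", "delta"], max_len=15)
# each chunk fits the limit; the chunks' results are then united,
# e.g. with the dedup-and-union bookkeeping shown earlier in this document
```

Because OR is logically additive, the union of the chunks' result sets should equal the result of the single long query the engine refused to accept.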

. Built-in browser with a page navigator, a multicolor marker for highlighting keywords and arbitrary words, and bilisting and N-listing of generated documents.

. Export of news feeds into a tabular form suitable for import into Excel, MySQL, Access, Kronos and other applications.


5. Installing and running the program; computer requirements

To install and run the program:

  • Download the file and copy the FileForFiles folder from it to your hard drive, for example to D:\;

  • The Demo version of the program will be installed and will open.

  • The program runs on any computer with any version of Windows installed.

    Alexey Kutovenko

    Professional search in the Internet

    Introduction

    Internet search is an important element of the Web. Hardly anyone knows the exact number of web resources on the modern Internet; in any case, the count runs into the billions. To be able to use the information you need at a given moment, whether for work or entertainment, you first have to find it in this constantly replenished ocean of resources. This is no easy task, since information on the modern Web is not structured, which makes finding it a problem. It is no coincidence that Internet search engines have become a kind of "window" into this information space.

    There are hardly any Internet users who have never used the large universal search engines. The names Google, Yandex and a couple of other big machines are on everyone's lips. They cope excellently with everyday Internet search tasks, and users often do not even bother looking for a replacement. Yet the number of Internet search engines today runs into the thousands. This variety of alternative machines has different roots. Some projects try to compete directly with the global market leaders through careful work with national Internet resources. Others offer query-composing capabilities not found in the well-known engines. A significant number of alternative machines specialize in a specific thematic area or a specific type of content, achieving impressive results in those problems. Be that as it may, including such engines in your own arsenal of Internet search tools can significantly improve the quality of your searches. There is one caveat, however: you need to know about such machines and be able to use their capabilities.

    We assume that the readers of this book are already quite familiar with search techniques using universal search engines. So familiar, in fact, that they have felt the limitations of relying on them. Most likely, such people have already tried to find and use certain additional tools. The printed word does not ignore the topic of Internet search: articles appear periodically and books are published. But their heroes, as a rule, are the same few leading universal search engines. Our book is different in that it attempts to cover the entire spectrum of modern search solutions. Here you will find descriptions of, and recommendations for using, the best modern services focused on the most common search problems. This book is for people who work a lot on the Internet and use the Web to find the information they need, be it for business, study or a hobby.

    For Internet searches to succeed, two conditions must be met: queries must be well formulated and must be asked in the right places. In other words, the user needs, on the one hand, the ability to translate his search interests into the language of a search query, and on the other, a good knowledge of search engines and the available search tools, with their advantages and disadvantages, so as to choose the most suitable search means in each case.

    Currently, no single resource meets all the requirements of Internet search. A serious approach to search therefore inevitably involves different tools, each used where it is most appropriate.

    There are many search tools available. They can be combined into several groups, each of which has certain advantages and disadvantages. The chapters of our book are devoted to the main groups of modern Internet search engines.

    Chapter 1, "Universal Internet Search Engines", is devoted to large universal information retrieval systems on the Web. The main emphasis is on their most modern instruments, which usually fall out of sight of the general public. An overview of the capabilities of known machines gives us a kind of starting point and allows us to clearly imagine the scope of application of alternative search solutions.

    Chapter 2, "Vertical Search," talks about systems that specialize in specific subject areas or specific types of content.

    Chapter 3, "Metasearch", discusses meta search engines, capable of sending a request simultaneously to several Internet search engines, and then collecting and processing the results in a single interface.

    Chapter 4, "Semantic and Visual Internet Search Engines," provides an overview of experimental systems that offer original user interfaces as well as interesting approaches to processing requests.

    Chapter 5, "Recommender Engines", discusses a newly emerging class of search engines, aptly named discovery engines. With their help, you can handle queries that are too tough for other types of Internet search engines.

    If no finished product suits you, you can create your own internet search engine. Chapter 6, Personal Search Engines, is devoted to the creation of such personal machines.

    Several chapters of our book are devoted to searching for different types of network content. Chapter 7, "Image Search", discusses current trends in Internet image search, as well as the capabilities of the corresponding experimental systems. Chapter 8, "Video Search", provides an overview of the video search tools of the leading universal Internet search engines, as well as the best specialized systems in this area.

    Chapter 9, Finding Hidden Content, provides an overview of systems that allow you to search for content that general search engines do not see. Such "hidden" content includes, for example, torrents or files hosted on FTP servers and file hosting.

    Chapter 10, “Searching for the Web 3.0,” introduces Internet searches for data in “Semantic Web” formats.

    The search does not end with just getting results from one or another search engine. The last chapter of our book, Chapter 11, "Helper Programs", is devoted to the tools for processing and saving the results.

    Before starting the story of specific products, it makes sense to understand the classification of modern Internet search tools and to define the terms that appear throughout the pages of our book.

    Basic Internet search tools can be divided into the following main groups:

    Search engines;

    Web directories;

    Help resources;

    Local programs for searching the Internet.

    The most popular search tools are the so-called Internet search engines (Search Engines). The three global leaders are quite stable: Google, Yahoo! and Bing. Many countries add their own local search engines, optimized for local content, to this list. With their help you can, in theory, find any specific word across the pages of many millions of sites.

    Despite many differences, all Internet search engines work on similar principles and, from a technical point of view, consist of similar subsystems.

    The first structural part of a search engine is the set of special programs used to find and then index web pages automatically. Such programs are commonly referred to as spiders, or bots. They scan the code of web pages, find the links on them, and thereby discover new web pages. There is also an alternative way for a site to enter the index: many search engines let resource owners add a site to the database themselves. Either way, the web pages are then downloaded, analyzed and indexed: structural elements are identified, keywords are extracted, and links to other sites and web pages are determined. Other operations are performed as well, and the result is the search engine's index base.

    This base is the second main element of any search engine. At present there is no single, absolutely complete index base containing information about all the content of the Internet. Since different search engines use different programs to discover web pages and build their indexes with different algorithms, their index bases can vary significantly. Some sites are indexed by several search engines, but there is always a certain percentage of resources included in the base of only one of them. The fact that each search engine has such an original, non-overlapping part of its index leads to an important practical conclusion: if you use only one search engine, even the largest, you will inevitably lose a certain percentage of useful links.
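    As an illustration of the two subsystems described above — a spider that follows links, and an index base that maps words to pages — here is a minimal sketch in Python. It crawls a tiny in-memory "web" instead of real sites, so the page names and contents are invented for the example; a real spider would fetch pages over HTTP.

```python
from html.parser import HTMLParser
from collections import defaultdict

# A tiny in-memory "web": URL -> HTML. A real spider would fetch these over HTTP.
PAGES = {
    "a.html": '<p>internet search engines</p><a href="b.html">next</a>',
    "b.html": '<p>index base of a search engine</p><a href="a.html">back</a>',
}

class LinkAndTextParser(HTMLParser):
    """Collects outgoing links and visible words from one page."""
    def __init__(self):
        super().__init__()
        self.links, self.words = [], []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links += [v for k, v in attrs if k == "href"]

    def handle_data(self, data):
        self.words += data.lower().split()

def crawl(start):
    """Spider: follow links from `start`; build an inverted index word -> set of URLs."""
    index, queue, seen = defaultdict(set), [start], set()
    while queue:
        url = queue.pop()
        if url in seen or url not in PAGES:
            continue
        seen.add(url)
        parser = LinkAndTextParser()
        parser.feed(PAGES[url])
        for word in parser.words:
            index[word].add(url)      # indexing step
        queue.extend(parser.links)    # discovery step
    return index

index = crawl("a.html")
print(sorted(index["search"]))  # → ['a.html', 'b.html']
```

    The non-overlapping-index observation above corresponds to the fact that each engine's `PAGES` (the reachable, fetched part of the web) differs, so the resulting `index` differs too.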

    Professional search on the Internet requires specialized software, as well as specialized search engines and search services.

    PROGRAMS

    http://dr-watson.wix.com/home - the program is designed to explore arrays of text information in order to identify entities and the relationships between them. The result of its work is a report on the object under study.

    http://www.fmsasg.com/ - Sentinel Visualizer, one of the world's best programs for visualizing links and relationships. The company has fully localized its products into Russian and runs a Russian-language hotline.

    http://www.newprosoft.com/ - "Web Content Extractor," powerful, easy-to-use software for extracting data from websites. The company also offers an effective Visual Web Spider.

    SiteSputnik - a unique software package that lets you search the visible and invisible Internet and process the results, using all the search engines the user needs.

    WebSite-Watcher - monitors web pages, including password-protected ones, as well as forums, RSS feeds, newsgroups and local files. It has a powerful system of filters. Monitoring runs automatically and results are delivered in a user-friendly form. The advanced version costs 50 euros. Constantly updated.
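    Monitoring tools of this kind generally boil down to periodically fetching a page and comparing it with the previously saved version. A minimal sketch of that idea in Python: the page content is passed in as a string here, while a real monitor would download it on a timer; the URLs and contents in the usage example are invented.

```python
import hashlib

def fingerprint(content: str) -> str:
    """Reduce page content to a short hash for cheap comparison."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()

def check_for_change(url: str, content: str, seen: dict) -> bool:
    """Return True if `content` differs from the last version stored for `url`.

    The first visit records a baseline and reports no change.
    """
    digest = fingerprint(content)
    changed = seen.get(url) is not None and seen[url] != digest
    seen[url] = digest
    return changed

seen = {}
check_for_change("http://example.com/", "version 1", seen)          # False: baseline
print(check_for_change("http://example.com/", "version 2", seen))   # → True
```

    Hashing instead of storing full pages keeps the state tiny; the trade-off is that the monitor can say *that* a page changed, but not *what* changed.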

    http://www.scribd.com/ - the world's most popular platform, increasingly used in Russia as well, for hosting documents, books and other materials for free access, with a very convenient search by title, topic, etc.

    http://www.atlasti.com/ - is the most powerful and effective tool for high-quality information analysis available for individual users, small and even medium-sized businesses. The program is multifunctional and therefore useful. It combines the possibilities of creating a unified information environment for working with various text, tabular, audio and video files as a whole, as well as tools for qualitative analysis and visualization.

    Ashampoo ClipFinder HD - an ever-increasing share of the information flow is video, so competitive intelligence practitioners need tools for working with this format. One such product is this free utility. It searches for videos by specified criteria on video hosting sites such as YouTube. The program is easy to use and displays all search results on one page with detailed information: titles, duration, upload time, etc. There is a Russian interface.

    http://www.advego.ru/plagiatus/ - the program was made by SEO optimizers, but it is quite suitable as an Internet intelligence tool. Plagiatus shows the degree of uniqueness of a text, the sources of the text, and the percentage of matching text. It can also check the uniqueness of a specified URL. The program is free.
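    Under the hood, uniqueness checkers of this kind typically compare word "shingles" (overlapping n-grams) between the text being checked and candidate sources. A minimal sketch of that measure in Python — the sample strings are invented, and real checkers add source retrieval and normalization on top:

```python
def shingles(text: str, n: int = 3) -> set:
    """Set of overlapping word n-grams ("shingles") of the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity of two texts' shingle sets: 0.0 (unique) to 1.0 (copy)."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

original = "professional search is the efficient search for reliable information"
rewrite  = "professional search is the efficient search for useful information"
print(round(similarity(original, rewrite), 2))
```

    A text's "uniqueness" against a set of sources can then be reported as one minus its highest similarity to any source.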

    http://neiron.ru/toolbar/ - includes an add-on combining Google and Yandex search, and also allows competitive analysis based on assessing the effectiveness of sites and contextual advertising. Implemented as a plugin for Firefox and Google Chrome.

    http://web-data-extractor.net/ - a one-stop solution for obtaining any data available on the Internet. Extracting data from any page is set up in a few mouse clicks: you simply select the area of data you want to save, and Datacol works out the formula for extracting that block.

    CaptureSaver - a professional Internet research tool. A simply indispensable working program that allows you to capture, store and export any Internet information, including not only web pages and blogs but also RSS news, emails, images and more. It has very broad functionality, an intuitive interface and a very low price.

    http://www.orbiscope.net/en/software.html - web monitoring system at more than affordable prices.

    http://www.kbcrawl.co.uk/ - software for working with, among other things, the "invisible Internet."

    http://www.copernic.com/en/products/agent/index.html - the program allows you to search using more than 90 search engines, more than 10 parameters. Allows you to combine results, eliminate duplicates, block broken links, show the most relevant results. Comes in free, personal and professional versions. Used by more than 20 million users.

    Maltego is a fundamentally new software that allows you to establish the relationship of subjects, events and objects in real life and on the Internet.

    SERVICES

    A new web browser with dozens of pre-installed OSINT tools.

    An effective search aggregator for finding people in the main Russian social networks.

    https://hunter.io/ is an efficient service for finding and verifying email addresses.

    https://www.whatruns.com/ is an easy-to-use yet effective scanner for detecting what a website is running and where its security holes are. Also implemented as a Chrome plugin.

    https://www.crayon.co/ is an affordable American platform for market and competitive intelligence on the Internet.

    http://www.cs.cornell.edu/~bwong/octant/ - host identifier.

    https://iplogger.ru/ is a simple and convenient service for determining someone else's IP.

    http://linkurio.us/ is a powerful new product for economic security workers and corruption investigators. Processes and visualizes huge amounts of unstructured information from financial sources.

    http://www.intelsuite.com/en - English-language online platform for competitive intelligence and monitoring.

    http://yewno.com/about/ - the first working system for turning information into knowledge and visualizing unstructured information. Currently supports English, French, German, Spanish and Portuguese.

    https://start.avalancheonline.ru/landing/?next=%2F - Andrey Masalovich's forecast and analytical services.

    https://www.outwit.com/products/hub/ - a complete set of standalone programs for professional work on the Web.

    https://github.com/search?q=user%3Acmlh+maltego - extensions for Maltego.

    http://www.whoishostingthis.com/ - search engine for hosting, IP addresses, etc.

    http://appfollow.ru/ - analysis of applications based on reviews, ASO optimization, and positions in the charts and search results of the App Store, Google Play and Windows Phone Store.

    http://spiraldb.com/ is a service implemented as a Chrome plugin that provides a wealth of valuable information about any electronic resource.

    https://millie.northernlight.com/dashboard.php?id=93 - a free service that collects and structures key information by industry and company. Offers dashboards based on text analysis.

    http://byratino.info/ - collection of factual data from publicly available sources on the Internet.

    http://www.datafox.co/ - CI platform that collects and analyzes information on companies of interest to clients. There is a demo.

    https://unwiredlabs.com/home is a specialized application with an API for searching by geolocation of any device connected to the Internet.

    http://visualping.io/ - a service for monitoring sites and, above all, the photos and images on them. Even if a photo appears only for a second, it will be delivered to the subscriber's e-mail. Has a Google Chrome plugin.

    http://spyonweb.com/ is a research tool that allows for an in-depth analysis of any Internet resource.

    http://bigvisor.ru/ - the service allows you to track advertising campaigns for certain segments of goods and services, or specific organizations.

    http://www.itsec.pro/2013/09/microsoft-word.html - Artem Ageev's instructions on using Windows programs for competitive intelligence.

    http://granoproject.org/ - an open-source tool for researchers who track networks of connections between people and organizations in politics, economics, crime, etc. It allows you to connect, analyze and visualize information obtained from various sources, and to show significant connections.

    http://imgops.com/ - a service for extracting metadata from graphic files and working with it.

    http://sergeybelove.ru/tools/one-button-scan/ - a small on-line scanner for checking security holes of sites and other resources.

    http://isce-library.net/epi.aspx - a service for finding primary sources from a fragment of English text.

    https://www.rivaliq.com/ is an effective tool for conducting competitive intelligence in the Western, primarily European and American markets for goods and services.

    http://watchthatpage.com/ is a service that allows you to automatically collect new information from monitored resources on the Internet. The service is free of charge.

    http://falcon.io/ is a kind of Rapportive for the Web. It does not replace Rapportive but provides additional tools: unlike Rapportive, it gives a general profile of a person, stitched together from social network data and mentions on the web.

    https://addons.mozilla.org/ru/firefox/addon/update-scanner/ - add-on for Firefox. Keeps track of updates to web pages. Useful for websites that do not have news feeds (Atom or RSS).

    http://agregator.pro/ - aggregator of news and media portals. Used by marketers, analysts, etc. to analyze news streams on certain topics.

    http://price.apishops.com/ - an automated web service for monitoring prices for selected product groups, specific online stores and other parameters.

    http://www.la0.ru/ is a convenient and relevant service for analyzing links and backlinks to an Internet resource.

    www.recordedfuture.com is a powerful data analysis and visualization tool implemented as an online cloud computing service.

    http://advse.ru/ - a service under the slogan "Learn everything about your competitors." Based on search queries, it identifies competitors' sites and analyzes competitors' advertising campaigns in Google and Yandex.

    http://spyonweb.com/ - the service identifies sites with shared characteristics, including the same Google Analytics identifiers, IP addresses, etc.

    http://www.connotate.com/solutions - a line of products for competitive intelligence, information flow management and information transformation into information assets. It includes both complex platforms and simple cheap services that allow effective monitoring along with the compression of information and obtaining only the necessary results.

    http://www.clearci.com/ - a competitive intelligence platform for businesses of all sizes, from start-ups and small companies to Fortune 500 companies. Implemented as SaaS.

    http://startingpage.com/ is a front end to Google that lets you search Google without your IP address being recorded. It fully supports all of Google's search capabilities, including in Russian.

    http://newspapermap.com/ is a unique service, very useful for the competitive intelligence practitioner, that links geolocation with an online media search engine. That is, you choose the region, city or language you are interested in, see the location and a list of online newspapers and magazines on the map, click the appropriate button and read. Supports Russian and has a very user-friendly interface.

    http://infostream.com.ua/ is the Infostream news monitoring system from one of the classics of Internet search, D.V. Lande: very convenient, with first-class source selection, and affordable for any budget.

    http://www.instapaper.com/ is a very simple and effective tool for saving essential web pages. Can be used on computers, iPhones, iPads, etc.

    http://screen-scraper.com/ - automatically extracts all information from web pages, downloads the vast majority of file formats, and automatically enters data into various forms. It stores downloaded files and pages in databases and performs many other extremely useful functions. Works on all major platforms; has a fully functional free version and a very powerful professional version.

    http://www.mozenda.com/ - a multifunctional web-monitoring service with several pricing plans that delivers the information the user needs from selected sites; affordable even for small businesses.

    http://www.recipdonor.com/ - the service allows automatic monitoring of everything that happens on competitors' websites.

    http://www.spyfu.com/ - and this is if you have foreign competitors.

    www.webground.su is a service for monitoring Runet created by professionals of Internet search, which includes all major providers of information, news, etc., capable of individual monitoring settings for the needs of the user.

    SEARCH

    https://www.idmarch.org/ - the best search engine, in terms of output quality, for the world archive of pdf documents. More than 18 million pdf documents have currently been indexed, ranging from books to classified reports.

    http://www.marketvisual.com/ is a unique search engine for finding owners and top managers by full name, company name, position, or a combination of these. The search results contain not only the desired objects but also their connections. Designed primarily for English-speaking countries.

    http://worldc.am/ is a publicly available photo search engine linked to geolocation.

    https://app.echosec.net/ is a publicly accessible search engine that describes itself as the most advanced analytical tool for law-enforcement, security and intelligence professionals. It searches for photos posted on various sites, social platforms and social networks by specific geolocation coordinates. Seven data sources are currently connected; by the end of the year there will be more than 450. Thanks to Dementiy for the tip.

    http://www.quandl.com/ - A search engine for seven million financial, economic and social databases.

    http://bitzakaz.ru/ - a search engine for tenders and government orders, with additional paid functions.

    Website-Finder - makes it possible to find sites that are poorly indexed by Google. The only limitation is that it searches just 30 websites per keyword. The program is easy to use.

    http://www.dtsearch.com/ - an extremely powerful search engine that can process terabytes of text. Works on the desktop, the Internet and intranets. Supports both static and dynamic data, and searches within all MS Office formats. Searching is by phrases, words, tags, indexes and more. It is one of the few available federated search systems. Has both paid and free versions.

    http://www.strategator.com/ - searches, filters and aggregates company information from tens of thousands of web sources. Covers the US, the UK, and the major EEC countries. Notable for high relevance and user-friendliness; has a free option and a paid option ($14 per month).

    http://www.shodanhq.com/ is an unusual search engine. Soon after it appeared, it was nicknamed "Google for hackers." It does not look for pages; instead it determines IP addresses and the types of routers, computers, servers and workstations at a particular address, traces chains of DNS servers, and offers many other functions of interest for competitive intelligence.
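    Shodan builds its index by connecting to hosts and recording the service banners they announce. A simplified sketch of that technique in Python using only the standard library; the host name in the commented-out example is a placeholder, and the classification rules here are a deliberately rough illustration, not Shodan's actual logic.

```python
import socket

def grab_banner(host: str, port: int, timeout: float = 3.0) -> str:
    """Connect to host:port and return the first chunk the service sends (its banner)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.settimeout(timeout)
        return sock.recv(1024).decode("ascii", errors="replace").strip()

def classify_banner(banner: str) -> str:
    """Very rough service identification from a banner string."""
    banner = banner.lower()
    if banner.startswith("ssh-"):
        return "ssh"
    if "ftp" in banner:
        return "ftp"
    if "smtp" in banner or banner.startswith("220"):
        return "smtp"
    return "unknown"

# Example (requires network access; host is a placeholder):
# print(classify_banner(grab_banner("scanme.example.org", 22)))
```

    A Shodan-style index is then just this probe applied across address ranges and ports, with the banners stored and made searchable.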

    http://search.usa.gov/ - a search engine for websites and open databases of all US government agencies. The databases contain a lot of practical useful information, including for use in our country.

    http://visual.ly/ - today, visualization is increasingly used to present data, and this is the first infographic search engine on the web. Alongside the search engine, the portal has powerful data visualization tools that require no programming skills.

    http://go.mail.ru/realtime - search for discussions of topics, events, objects and subjects in real or user-defined time. The previously much-criticized Mail.ru search now works very efficiently and produces interesting, relevant results.

    Zanran is a recently launched yet already excellent search engine for data — the first and only one that extracts data from PDF files, Excel spreadsheets, and tables in HTML pages.

    http://www.ciradar.com/Competitive-Analysis.aspx is one of the world's best search engines for competitive intelligence in the deep web. Extracts almost all kinds of files in all formats on a topic of interest. Implemented as a web service. The prices are more than reasonable.

    http://public.ru/ - Effective search and professional analysis of information, media archive since 1990. The online media library offers a wide range of information services: from access to electronic archives of Russian-language media publications and ready-made thematic press reviews to individual monitoring and exclusive analytical studies based on press materials.

    Cluuz is a young search engine with ample opportunities for competitive intelligence, especially on the English-speaking Internet. It allows not only finding, but also visualizing, establishing connections between people, companies, domains, e-mails, addresses, etc.

    www.wolframalpha.com is the search engine of tomorrow. In response to a query, it returns the statistical and factual information available about the query object, including in visualized form.

    www.ist-budget.ru - universal search in databases of government purchases, trades, auctions, etc.

    The machines must work.
    People have to think.

    The course "Professional Internet Search" is a convenient way to learn how to competently and effectively search for and find the information you need on the Web.

    What is professional search?

    The paradox of the Internet is that there is ever more information, yet finding the information you need becomes ever more difficult. Professional search is the efficient search for necessary and reliable information.
    In the modern world, information becomes capital, and the Internet a convenient means of obtaining it, which is why the ability to find valuable information marks a person as a high-class professional. Professional search should always be effective. Moreover, during a search, professionals not only locate where information is stored but also assess the authority of the resource and the relevance, accuracy, and completeness of the published information. Internet heuristics helps with this: a set of useful search rules and criteria for selecting and evaluating network information.

    What will you learn and what will you learn?

    Have you searched and failed to find? Then this course will be extremely useful to you. You will get comprehensive instructions for finding what is already on the Internet but at first glance seems impossible to find. It is possible! You will learn how to search so that you find. Each lesson combines knowledge and experience, and everything you learn is tested in practice.

    In class you will learn how the modern Internet develops and how electronic information is distributed, how directories are created and how search engines work, why metasearch engines are needed and where the "hidden" web came from, how forums differ from blogs, and what fundraising is.

    In the workshops you will learn to use the query language correctly, choose keywords well, find information on the "hidden" web, find the images and files you need, gauge public opinion in the blogosphere, search for personal information and, most importantly, correctly assess the reliability, relevance and completeness of the information you find.

    An Internet search course will allow you to significantly develop your cognitive, informational and communication skills.

    What topics are covered in the Professional Search course?

    The purpose of the course is to teach, in one month, the capabilities and subtleties of modern professional information search on the web.

    Each lesson (module) includes a lecture, a seminar in forum format, a test on the material covered, as well as several exercises and search tasks.

    In the updated course, one-hour webinars will be held weekly: interactive virtual online seminars devoted to the key tasks of professional Internet search.

    Each training module is equipped with useful additional materials on course topics and easy-to-print handouts.

    The thematic course plan consists of 10 interrelated modules:

    1. Internetics: History, Technology and Research of the Internet.

    2. Information search. Search directories.

    3. Information retrieval systems. A close-up look at search engines (Google, Yandex and others).

    4. Metasearch systems and programs.

    5. Internet Information Bureau: factual search in encyclopedias, reference books, dictionaries.

    6. Bibliographic search: libraries, catalogs, programs.

    7. Documentary search: electronic documents, electronic libraries, electronic journals.

    8. The Hidden Web: Search for multimedia, databases, knowledge bases and files.

    9. Searching news (blogs and forums), contacts, institutions, fundraising.

    10. Information retrieval strategies: generalization of Internet heuristic skills.

    Why distance course?

    A distance course has several advantages.

    First, each lesson is allocated not one or two academic hours but a whole week. You can take your time mastering the lecture material and completing the exercises and search tasks.

    Second, a distance course is interactive. This means you can always ask the teacher to clarify whatever you consider important. Your question will not go unanswered, and difficult search tasks can be discussed by the whole group so that everyone can compare their skills.

    Third, you can study at a time convenient for you, without wasting time commuting to class. Moreover, you can study anywhere in the world with Internet access.

    What does the course cost?

    The course "Internet Heuristics" lasts one month and consists of 10 modules; each module is made up of "quantum" lessons that let you keep the pace needed to master the new material. Each module costs only 300 rubles, so you will pay only 3,000 rubles for all the classes. Note that you do not need to buy additional textbooks: the course comes with all the necessary teaching materials. On successful completion, you will receive an MSU certificate for the course "Professional Internet Search."

    If you want to learn Internet resourcefulness, choose a convenient time to take the course and sign up (just click the sign-up link next to a convenient time slot at the top of the page)!

    After registration, you will still have time to think and make a final decision. By the way, you can get acquainted with

