Internet Marketing Blog



How Search Engines Work


Document and Data Search Programs review

 

In our age of information technology, when the volume of data available to individuals and to society as a whole keeps growing without limit, saying that information search and processing pose a lot of problems is stating the obvious. Everyone talks about it. So, in order not to bore you with subjective and, in part, objective opinions taken from various sources, let’s go straight to solving the problem. Today we’ll talk about search, that is, about programs and information systems that find the data and documents we need.

Upgrade of "straight search"


Not so long ago, when the trees were still big and there wasn’t much information even on an enterprise’s local network, any search meant tediously going through all the accessible files and checking their titles and contents one after another. This type of search is called straight search, and programs that use it are traditionally included in every OS and tool package. Yet even the most powerful modern computers cannot search gigantic data volumes this way. Looking through hundreds of documents on a disk is one thing; searching a huge library and dozens of mailboxes is quite another. That’s why straight search programs are gradually fading into insignificance as universal tools.
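As a rough illustration of straight search, here is a minimal Python sketch. The function name straight_search and the sample documents are made up for this example, and an in-memory dictionary stands in for files on a disk. The point is only that every query has to read every title and every text again.

```python
def straight_search(documents, keyword):
    """Scan every document's title and text for the keyword (case-insensitive)."""
    needle = keyword.lower()
    hits = []
    for title, text in documents.items():
        # Each query re-reads every title and every body in full.
        if needle in title.lower() or needle in text.lower():
            hits.append(title)
    return hits


# Hypothetical sample data, standing in for files on a disk.
documents = {
    "report.txt": "Quarterly sales report for the corporate network",
    "notes.txt": "Meeting notes about the search index",
    "index.html": "Welcome page",
}

print(straight_search(documents, "index"))
```

The work grows with the total size of the collection on every single query, which is why this approach stops being practical for large libraries and mailboxes.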

Obviously, this type of search is no longer adequate in the corporate sector. It simply can’t cope with the data volumes. That’s why, for several years now and especially lately, technologies capable of quick and precise search across various document formats have been in high demand. Not so long ago Microsoft’s “dad” Bill Gates himself, perhaps envying Google’s success, announced at a press conference that he wanted the software (and not only software) giant to promote the development of search systems and technologies by all means. Yet it will be a long time before we see some phenomenal program or a more or less competitive service from Microsoft (MSN is far behind Google). So let’s look at the developments that already exist.

Index, query, relevancy
Modern search technologies are based on two processes: indexing the available information, and processing a query and then outputting the results. As for the former, any program (whether it’s a desktop search engine, a corporate information system, or an Internet search engine) creates its own search domain. That is, it processes documents and builds their index (an organized structure that contains information on the processed data). This index is then used to process queries. What follows may be far from simple technology-wise, but it presents no difficulty to the average user: the program processes the query (a keyword or phrase) and shows a list of documents that contain this key phrase. Since the information is stored in a structured index, query processing takes much less time (up to a hundred times less) than straight search would: documents are selected not by looking through the files but by analyzing the text data in the index.
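The two processes can be sketched in a few lines of Python. This is only a toy inverted index under simplifying assumptions, not how any of the products discussed here is implemented: the names tokenize, build_index and search are invented for the example, documents are held in memory, and a multi-word query matches documents that contain all of its words, without checking that they form an exact phrase.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Indexing step: map each word to the set of documents containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Query step: return the documents that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    # Look each word up in the index instead of re-reading the files.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample data.
documents = {
    "a.txt": "Desktop search engines build an index",
    "b.txt": "The index speeds up query processing",
    "c.txt": "Straight search reads every file",
}

index = build_index(documents)
print(sorted(search(index, "search index")))
```

The documents are read once, when the index is built. After that each query costs a few dictionary lookups and set intersections, regardless of how long the documents themselves are.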

The program shows the list of results in order of relevancy (the degree to which a document corresponds to the query text). Different technologies, obviously, use different methods of searching and of determining relevancy (the number of occurrences of a word, usage frequency, the ratio of the query to the number of words in the document, the distance between the words of the query phrase in the searched files, etc.). Based on these criteria the “weight” of the document is determined, and the file takes its position in the list of results accordingly. With Internet search everything gets more complicated, since many other factors have to be considered (Google’s PageRank is one example). But that’s a topic for a whole other article, so let’s leave the Internet aside.
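To make the idea of a document’s “weight” concrete, here is a deliberately simple Python sketch. It uses just one of the criteria listed above, usage frequency (the share of a document’s words that are query words), and ignores the distance between query words. The function names weight and rank and the sample documents are assumptions made for this illustration; real products combine many more signals.

```python
import re


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def weight(text, query):
    """Toy weight: the share of the document's words that are query words."""
    words = tokenize(text)
    query_words = set(tokenize(query))
    if not words or not query_words:
        return 0.0
    occurrences = sum(1 for word in words if word in query_words)
    return occurrences / len(words)


def rank(documents, query):
    """Return (name, weight) pairs with non-zero weight, heaviest first."""
    scored = [(name, weight(text, query)) for name, text in documents.items()]
    scored = [(name, w) for name, w in scored if w > 0]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical sample data.
documents = {
    "a.txt": "search search index",
    "b.txt": "search tools for the desktop and the corporate network",
    "c.txt": "nothing relevant here",
}

for name, w in rank(documents, "search"):
    print(name, round(w, 3))
```

A short document that mentions the query word twice outranks a longer one that mentions it once, and documents without the word drop out of the list altogether.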

Search Engine Overview

This article discusses the features of some popular search engines that boast both speed and a good set of functions. Yet showing off in advertisements is one thing; withstanding an expert’s close scrutiny is something totally different. Speaking of experts, there was a whole office of them ready to explore the software from the ground up. A set of programs (dtSearch Desktop, "Ищейка Проф Deluxe", Google Desktop Search, SearchInform, Copernic Desktop Search, ISYS Desktop) was installed on a test computer (Athlon 2.2 GHz; 1 GB RAM; Seagate IDE hard drive, 160 GB, 7200 rpm; Windows XP). A 20 GB text database in doc, txt and html formats was created for testing. A group of people, including me, took part in testing, comparing the programs and sharing subjective impressions of each. My own impressions follow.

dtSearch Desktop 7.0

The prog


Author-Bio: Max Maglyas is a Belarusian journalist specializing in IT & software.

 



