What are search engines?

Search engines make the web usable and enjoyable. Without them, people would find it hard to locate the information they are looking for online: there are a huge number of web pages available, many of them titled only according to the whim of their authors, and most of them sitting on servers with cryptic names.

The first search engines held an index of a few hundred thousand pages and documents and received perhaps a couple of thousand queries a day. Today, a major Internet search engine indexes vast numbers of web pages and responds to millions of search queries every day. In this chapter, we’ll explain how these main tasks are accomplished and how search engines put it all together to let you find the information you need online.

When most people talk about searching the Internet, they are actually referring to Internet search engines. Before the web became the most visible part of the Internet, search engines were already helping users locate information online. Programs with names like Archie and Gopher maintained indexes of files stored on Internet-connected servers and dramatically reduced the time it took to find programs and documents. In the early 1990s, getting real value from the Internet meant knowing how to use Archie, Gopher, Veronica, and the rest.

Nowadays, most online users limit their searches to the web, so this chapter will concentrate on engines that index the contents of web pages. Before a search engine can tell you where a file or document is, it must be found. To locate information among the vast number of web pages out there, search engines employ special software robots, called spiders, to build lists of the words found on websites. The process a spider uses to build those lists is known as web crawling. To build and maintain a useful list of words, a search engine’s spiders have to look at a great many pages.
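To make that concrete, here is a minimal sketch in Python of how a spider might turn a single fetched page into a list of words. The starting URL, the crude tag-stripping, and the word-splitting rule are illustrative assumptions rather than a description of any real engine's parser.

```python
# A minimal sketch of the word-gathering step: fetch one page, strip the HTML
# tags, and count the words that remain. Standard library only.
import re
import urllib.request
from collections import Counter

def fetch(url: str) -> str:
    """Download a page and return its HTML as text."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")

def keyword_list(html: str) -> Counter:
    """Strip tags, split into lowercase words, and count occurrences."""
    text = re.sub(r"<[^>]+>", " ", html)            # crude tag removal
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(words)

if __name__ == "__main__":
    page = fetch("https://example.com/")            # hypothetical starting page
    print(keyword_list(page).most_common(10))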

So how does a spider begin its travels over the web? The usual starting points are lists of heavily used servers and very popular pages. The spider begins with a well-known site, indexing the words on its pages and following every link found within the site. In this way, the spidering system quickly begins to travel through and spread across the most widely used portions of the web.
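That “start from well-known pages and follow every link” idea amounts to a breadth-first crawl. The sketch below assumes the third-party requests and beautifulsoup4 packages, a single hypothetical seed URL, and a small page limit; a real spider would also respect robots.txt and crawl delays.

```python
# A breadth-first crawl: visit seed pages, queue every link they contain,
# and keep going until the page budget runs out.
from collections import deque
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

def crawl(seeds, max_pages=50):
    """Visit pages breadth-first and return every URL the spider reached."""
    queue = deque(seeds)
    seen = set(seeds)
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = requests.get(url, timeout=10).text
        except requests.RequestException:
            continue                                 # skip unreachable pages
        visited.append(url)
        for tag in BeautifulSoup(html, "html.parser").find_all("a", href=True):
            link = urljoin(url, tag["href"])
            if urlparse(link).scheme in ("http", "https") and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited

if __name__ == "__main__":
    for page in crawl(["https://example.com/"]):     # hypothetical seed list
        print(page)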

Google began as an academic search engine. The paper describing how the system was built, written by Sergey Brin and Lawrence Page, gives a good sense of how quickly its spiders could work. They built their initial system to use multiple spiders, usually three at a time. Each spider kept about 300 connections to web pages open at once. At peak performance, using four spiders, their system could crawl over a hundred pages per second, generating around six hundred kilobytes of data each second.
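Those figures come from the paper itself; the snippet below is only a toy analogue of the underlying idea, keeping many downloads in flight at once, using a thread pool from the standard library. The worker count and the hypothetical URL list are assumptions for illustration.

```python
# Many concurrent workers stand in for the paper's many open connections.
from concurrent.futures import ThreadPoolExecutor
import urllib.request

URLS = [f"https://example.com/?page={i}" for i in range(100)]   # hypothetical

def fetch(url: str) -> int:
    """Download one page and return how many bytes came back."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return len(resp.read())

with ThreadPoolExecutor(max_workers=30) as pool:
    total_bytes = sum(pool.map(fetch, URLS))

print(f"fetched {len(URLS)} pages, {total_bytes} bytes")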

Keeping everything running quickly meant building a system to feed the spiders the information they needed. The early Google system had a server dedicated to providing URLs to the spiders. Rather than depending on an Internet service provider for the domain name server (DNS) that translates a server’s name into an address, Google ran its own DNS so that delays were kept to a minimum.
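Here is a hedged sketch of those two ideas: a central queue that hands URLs to waiting spiders, and a local DNS cache so each hostname is resolved only once. The class names and the in-memory dictionary are assumptions made for illustration, not Google's actual design.

```python
# A URL server that feeds spiders, plus a cache that avoids repeated DNS lookups.
import socket
from queue import Queue

class DnsCache:
    """Resolve a hostname once, then answer from memory on repeat lookups."""
    def __init__(self):
        self._cache = {}

    def resolve(self, host: str) -> str:
        if host not in self._cache:
            self._cache[host] = socket.gethostbyname(host)
        return self._cache[host]

class UrlServer:
    """Central queue that hands pending URLs to any spider that asks."""
    def __init__(self, seeds):
        self._queue = Queue()
        for url in seeds:
            self._queue.put(url)

    def next_url(self) -> str:
        return self._queue.get()

if __name__ == "__main__":
    dns = DnsCache()
    urls = UrlServer(["https://example.com/a", "https://example.com/b"])
    print(urls.next_url(), dns.resolve("example.com"))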

Whenever a Google spider scanned an HTML web page, it took note of two things:

the words that appeared on the page;

where those words were located.

Words appearing in titles, subheadings, meta tags and other positions of relative importance were recorded for special consideration during later user searches. The Google spiders were built to index every significant word on a page, leaving out the articles “a”, “an” and “the”. Other spiders take different approaches.
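A rough sketch of that indexing step might look like the following: record each significant word, give extra weight to words found in the title, subheadings and meta tags, and skip “a”, “an” and “the”. The specific weights and the beautifulsoup4 dependency are assumptions made for illustration, not Google's actual scoring.

```python
# Index one page: weight words by where they appear, skipping the articles.
import re
from bs4 import BeautifulSoup

STOP_WORDS = {"a", "an", "the"}
WEIGHTS = {"title": 5, "h1": 3, "h2": 2, "meta": 2, "body": 1}   # assumed values

def index_page(html: str) -> dict:
    """Return {word: score}, weighting words by the element they appear in."""
    soup = BeautifulSoup(html, "html.parser")
    scores = {}

    def add(text, weight):
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            if word not in STOP_WORDS:
                scores[word] = scores.get(word, 0) + weight

    for name in ("title", "h1", "h2"):
        for tag in soup.find_all(name):
            add(tag.get_text(" "), WEIGHTS[name])
    for tag in soup.find_all("meta", attrs={"name": "keywords"}):
        add(tag.get("content", ""), WEIGHTS["meta"])
    add(soup.get_text(" "), WEIGHTS["body"])
    return scores

if __name__ == "__main__":
    sample = "<html><head><title>Search engines</title></head><body>The spider indexes a page.</body></html>"
    print(index_page(sample))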

These different approaches are usually attempts to make the spider operate faster, to let users search more efficiently, or both. For example, some spiders keep track of the words in the headings, subheadings and links, along with the 100 most frequently used words on the page and each word in the first 20 lines of text. Lycos is said to use this approach to spidering the web.
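As a sketch of that selective approach, the function below keeps only the headings, the link text, the 100 most frequent words and the first 20 lines of text. The cut-offs come from the paragraph above; the parsing details and the beautifulsoup4 dependency are assumptions.

```python
# Keep only the "interesting" parts of a page, in the spirit of the
# selective spidering approach described above.
import re
from collections import Counter
from bs4 import BeautifulSoup

def selective_summary(html: str) -> dict:
    soup = BeautifulSoup(html, "html.parser")
    text = soup.get_text("\n")
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {
        "headings": [h.get_text(" ") for h in soup.find_all(["h1", "h2", "h3"])],
        "link_text": [a.get_text(" ") for a in soup.find_all("a")],
        "top_100_words": [w for w, _ in Counter(words).most_common(100)],
        "first_20_lines": text.splitlines()[:20],
    }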

Other systems, such as AltaVista, go in the other direction, indexing every single word on a page, including “a”, “an”, “the” and other “insignificant” words. The push toward completeness in this approach is matched by the attention some systems give to the unseen portion of the web page, the meta tags.
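A short sketch of that full-text approach: index every word, “insignificant” ones included, and also read the meta tags that never appear on the rendered page. The output format and the beautifulsoup4 dependency are again assumptions for illustration.

```python
# Full-text indexing with no stop-word filter, plus the page's meta tags.
import re
from bs4 import BeautifulSoup

def full_text_index(html: str) -> dict:
    soup = BeautifulSoup(html, "html.parser")
    words = re.findall(r"[a-z0-9]+", soup.get_text(" ").lower())   # every word kept
    meta = {
        tag.get("name", "").lower(): tag.get("content", "")
        for tag in soup.find_all("meta")
        if tag.get("name")
    }
    return {"words": words, "meta_tags": meta}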

With the major engines (Google, Yahoo, and a few others) accounting for over 95% of searches performed online, they have become a true marketing powerhouse for anyone who understands how they work and how to use them.
