
If you are at all interested in using search engines to increase your website’s online visibility, you should take some time to understand and prepare for their ‘spiders’. Also known as ‘bots’ or ‘crawlers’, search engine spiders are automated software programs designed to move through the internet from page to page via HREF and SRC links. As they manoeuvre through the web, key pieces of information are sent back to the search engine for processing. The top search engines, Google, Yahoo!, Bing and Ask, command the lion’s share of the global search market (Google – 84.96%, Yahoo! – 6.24%, Bing – 3.39%, Ask – 0.76%), and their spiders are amongst the most advanced, backed by computers capable of herculean data processing and millions of calculations every second.
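To make the follow-the-links idea concrete, here is a toy spider sketched in Python. It is purely illustrative and nothing like a commercial crawler: the seed URL is a placeholder, and real spiders add politeness rules such as obeying robots.txt, rate limiting and far smarter scheduling.

    # A toy illustration of how a spider moves from page to page via links.
    # Not a real search engine crawler: the seed URL is a placeholder and
    # politeness rules (robots.txt, rate limiting) are omitted for brevity.
    from collections import deque
    from html.parser import HTMLParser
    from urllib.parse import urljoin
    from urllib.request import urlopen

    class LinkExtractor(HTMLParser):
        """Collect the HREF/SRC targets found in a page's HTML."""
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            for name, value in attrs:
                if name in ("href", "src") and value:
                    self.links.append(value)

    def crawl(seed, max_pages=10):
        """Breadth-first crawl starting from a single seed URL."""
        queue, seen, fetched = deque([seed]), {seed}, 0
        while queue and fetched < max_pages:
            url = queue.popleft()
            try:
                html = urlopen(url, timeout=5).read().decode("utf-8", "ignore")
            except Exception:
                continue  # skip pages that cannot be fetched
            fetched += 1
            print(url)  # a real spider would send the page back for indexing
            parser = LinkExtractor()
            parser.feed(html)
            for link in parser.links:
                absolute = urljoin(url, link)
                if absolute.startswith("http") and absolute not in seen:
                    seen.add(absolute)
                    queue.append(absolute)

    crawl("https://www.example.com/")  # example.com is just a placeholder seed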
When a spider crawls a webpage, the information it gathers allows your pages to be correctly indexed in a giant, tightly managed database of documents known as the search engine’s ‘index’. It is from this index that search engines retrieve the most relevant information in answer to the many millions of searcher queries. When a query is processed, the most relevant indexed documents are matched against it and then passed through a mathematical sorting equation, known as an ‘algorithm’, so that the most relevant results rank highest in the search engine results pages (SERPs).
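As a rough illustration of that index-and-rank process, the Python fragment below builds a tiny inverted index and ranks pages simply by counting matched query words; the two example ‘documents’ are invented, and real algorithms weigh hundreds of signals rather than simple word counts.

    # A highly simplified sketch of the index-and-rank idea described above.
    # The documents are invented and the 'algorithm' is nothing more than
    # counting matched query words; real engines weigh hundreds of signals.
    from collections import defaultdict

    documents = {
        "page1.html": "fresh flowers delivered daily across the uk",
        "page2.html": "flower delivery and fresh plants for the office",
    }

    # Build an inverted index: each word maps to the pages containing it.
    index = defaultdict(set)
    for page, text in documents.items():
        for word in text.split():
            index[word].add(page)

    def search(query):
        """Return pages ranked by how many query words each contains."""
        scores = defaultdict(int)
        for word in query.split():
            for page in index.get(word, ()):
                scores[page] += 1
        return sorted(scores, key=scores.get, reverse=True)

    print(search("fresh flowers"))  # ['page1.html', 'page2.html']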
In order for your webpages to be found and indexed by search engines, it is vital to have external links to your website, preferably from websites that are frequently visited by search engine spiders. Google is one of the search engines that lets you notify it of your website, via a dedicated submission form, so that it will be crawled. Submitting your site to search engines in this manner is generally considered slow for several reasons, including the sheer quantity of daily submissions from other webmasters. Spammers, people who break search engines’ terms and conditions for profit, for example by creating thousands of duplicate websites to earn AdSense revenue, have also flooded these submission forms with the details of their duplicate, poor-quality websites. This too has lowered the effectiveness of search engine submission forms as a way to get your site rapidly seen and crawled.
When constructing your website, care must be taken to ensure spiders are able to see all of your webpage information. Many hundreds of ranking factors exist, and it is your job to optimise for them whilst avoiding code that stops, or causes only partial, crawling and indexing by search engine bots. Where you can, minimise or externalise any coding that is not HTML, XHTML or CSS, since spiders do not yet fully understand other languages. Sitemaps, such as the XML example below, aid deeper crawling and indexing of your website’s pages. URLs should never contain more than two dynamic parameters (e.g. ‘?’ or ‘%’), and individual webpages should generally have fewer than 250 (preferably fewer than 100) unique outbound links; any more may simply not be followed. Pages that can only be seen after logging in or filling in a form, pages split into frames, and pages that require session IDs or cookies for navigation all tend to stop spidering, so keeping things as simple as possible is generally best. Decent quantities of quality content should ideally be placed as high up the source code of each page as possible, with your most important outbound links being the first things spiders see as they work down the page.
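A minimal XML sitemap, of the kind referred to above, follows the standard sitemaps.org format; the URL and values below are placeholders, and a real sitemap would list one entry per page you want crawled.

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- Minimal sitemap in the sitemaps.org format; the URL, date and
         values are placeholders. Add one <url> entry per page. -->
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>https://www.example.com/</loc>
        <lastmod>2010-06-01</lastmod>
        <changefreq>weekly</changefreq>
        <priority>1.0</priority>
      </url>
    </urlset>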
To focus spiders on the pages you want to rank, rather than those you don’t, such as privacy policies or terms and conditions documents, the most effective precautions are robots.txt files (see the example below) and configuring your server to return error status codes to specific suspect bot user-agents. Google’s ‘Webmaster Tools’ is also very useful for identifying pages that don’t crawl well. Similarly, by tweaking the crawl rate you can reduce excessive bandwidth use by visiting bots, lowering the chance that your site is temporarily taken off the SERPs by search engines.
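As an example of the robots.txt approach, the short file below, saved at the root of your domain, asks compliant spiders to skip hypothetical /privacy/ and /terms/ directories while leaving everything else crawlable; it can also point spiders at your sitemap.

    # robots.txt - saved at the root of the domain, e.g.
    # https://www.example.com/robots.txt (the directories are illustrative)
    User-agent: *
    Disallow: /privacy/
    Disallow: /terms/

    # Optionally tell spiders where your sitemap lives
    Sitemap: https://www.example.com/sitemap.xml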
Building and promoting a website tends to be a long and complicated process; ensuring search engines are able to access and understand your website is critical in your quest for top rankings.