Article: 4425

Search Engine Crawlers and Dynamic Web Pages



Jerry Yu

There is a good deal of misunderstanding and confusion in the search engine optimization (SEO) world about how search engines index dynamic web pages.

It has been claimed that search engine spiders don't index or crawl dynamic web pages well. This statement is only half true. The correct statement is: "Search engines don't index or crawl dynamic web pages well if the page URL contains a "?" character." Search engines index dynamic web pages very well if the page URL contains no "?" character.

URLs that contain a "?" are called dynamic URLs.
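The definition above can be sketched as a one-line check. This is only an illustration of the article's rule of thumb, not how any search engine actually classifies URLs; the function name is_dynamic_url is made up for this example.

```python
def is_dynamic_url(url: str) -> bool:
    """Return True if the URL contains a "?" character.

    This mirrors the article's working definition of a dynamic URL;
    it is an illustrative helper, not a search engine's real logic.
    """
    return "?" in url


print(is_dynamic_url("http://www.examplesite.com/product.asp?id=12345"))
print(is_dynamic_url("http://www.examplesite.com/product12345-23-3.asp"))
```

The first call reports a dynamic URL and the second does not, even though both pages could be generated by the same server-side script.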

Which web pages are dynamic

If you know HTML, you know that the web pages you create normally have a .htm or .html file extension. These files are static because the HTML code doesn't change on the fly when the page is requested; the web server sends the file as it is, without running any code. Static pages can be viewed without a web server.

A web page is said to be dynamic if it is created using a server-side scripting language such as PHP, ASP, JSP, or Perl (often run through CGI). These languages resemble general-purpose programming languages such as C++ and Java. The major difference is that the scripts are usually not compiled ahead of time; the web server processes them on the fly when a visitor requests the page. Dynamic pages can't be viewed as intended without a web server.

When a dynamic page is requested, the web server first looks at the page's source code. If any server-side scripting code exists, the server processes it and generates static HTML as the result. Once the full page has been processed, the web server sends only pure HTML to the visitor's browser.
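The request-time processing described above can be imitated in a few lines of Python. This is a rough sketch under stated assumptions: the template text, the product data, and the render_page function are all hypothetical, and a real web server with PHP, ASP, or JSP does considerably more.

```python
from string import Template

# Hypothetical page source: static HTML mixed with placeholders that stand in
# for server-side scripting code.
PAGE_SOURCE = Template(
    "<html><body><h1>$title</h1><p>Price: $price</p></body></html>"
)


def render_page(product: dict) -> str:
    """Process the page source at request time and return pure HTML."""
    return PAGE_SOURCE.substitute(title=product["name"], price=product["price"])


# The visitor's browser (or a search engine spider) only ever sees this output.
html = render_page({"name": "Widget", "price": "9.99"})
print(html)
```

The output contains no placeholders, which is why a spider cannot tell a dynamic page from a static one by its content alone.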

Using scripting languages to create web pages gives you the power to do nearly anything you want. If the dynamic page has no "?" character in its URL, search engine spiders treat the page the same as a normal static HTML page.

Query string parameters

When a "?" character is used, the page's full URL changes whenever the values after the "?" change. The portion after the "?" is called the page's query string parameters, or simply query parameters. Every time the parameters change, the resulting page can be different.
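To make the idea concrete, here is a small sketch using Python's standard urllib.parse module. The base URL and the build_url helper are assumptions chosen for illustration; the point is only that different parameter values produce different URLs, and that the part after the "?" splits into name and value pairs.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical dynamic page used only for this example.
BASE = "http://www.examplesite.com/product.asp"


def build_url(**params) -> str:
    """Append the given parameters to the base URL as a query string."""
    return f"{BASE}?{urlencode(params)}"


url_a = build_url(id=12345)
url_b = build_url(id=67890)

# Same script, different parameter value, different full URL.
print(url_a)
print(url_b)

# The portion after "?" parsed back into (name, value) pairs.
print(parse_qsl(urlsplit(url_a).query))
```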

A page URL can contain more than one query parameter after the "?" character. When this happens, search engine spiders have a difficult time indexing the resulting page. If the URL has only one parameter, major search engine spiders can crawl that page well. For example, Google can index and store a page's URL as http://www.examplesite.com/product.asp?id=12345. But if the same page's URL is

http://www.examplesite.com/product.asp?id=12345&category=23&page=3

most search engines will not be able to index it well, even though Googlebot and Yahoo! Slurp may be able to index it.

Note: Googlebot is Google's web-crawling robot. Yahoo! Slurp is Yahoo!'s web-crawling robot. Search engine robots collect documents from the web to build a searchable index.

Yahoo! help says:

"Yahoo! does index dynamic pages, but for page discovery, our crawler mostly follows static links. We recommend you avoid using dynamically generated links except in directories that are not intended to be crawled/indexed (e.g., those should have a /robots.txt exclusion)."

Google's Webmaster Guidelines say:

"If you decide to use dynamic pages (i.e., the URL contains a "?" character), be aware that not every search engine spider crawls dynamic pages as well as static pages. It helps to keep the parameters short and the number of them small."

Let's analyze what Google has stated above.

1. the URL contains a "?" character: this means dynamic pages are defined as those whose URL contains a "?" character.

2. keep the parameters short: this means each individual parameter should contain few characters. Google gives no quantitative limit, but we can look at some web forums for examples. My search engine friendly article, http://www.webactionguide/action-guide/build-site/se-friendly.php, referenced a black hat SEO discussion thread on Cre8ASiteForums. Its URL is http://www.cre8asiteforums.com/viewtopic.php?t=8386

This page was crawled by Google. The value of its query parameter is 4 characters long. Many other examples on the internet have longer parameters and were crawled successfully. The maximum number of characters Google accepts is unknown.

3. keep the number of them small: this means we should keep the number of parameters in each URL as small as possible. The above Cre8ASiteForums example has one parameter.

At the least, we can now say that Googlebot is able to crawl dynamic pages that have one query parameter whose value is 4 characters long.
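If you want to audit your own URLs against these two hints, a short script can report how many parameters a URL has and how long each value is. This is a minimal sketch: the parameter_summary function is hypothetical, and it deliberately applies no pass/fail threshold because Google has not published one.

```python
from urllib.parse import parse_qsl, urlsplit


def parameter_summary(url: str) -> dict:
    """Report the number of query parameters and the length of each value."""
    pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    return {
        "count": len(pairs),
        "value_lengths": {name: len(value) for name, value in pairs},
    }


summary = parameter_summary("http://www.cre8asiteforums.com/viewtopic.php?t=8386")
print(summary)  # {'count': 1, 'value_lengths': {'t': 4}}
```

For the Cre8ASiteForums thread this reports one parameter with a 4-character value, matching the analysis above.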

How to get your pages crawled when query parameters are unavoidable

Query parameters are often used for database calls that retrieve stored information by primary key from one or more tables. A database management system (DBMS) makes some tedious work easy to manage. When your site must use query parameters, consider building a site map page and hard-coding each page's URL. For example, the previous URL can be hard-coded as

http://www.examplesite.com/product12345-23-3.asp
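If the site map has many entries, the static-looking URLs could be generated rather than typed by hand. The sketch below assumes the naming scheme shown in the example (parameter values joined by hyphens and placed before the file extension); the static_style_url function is hypothetical, and your server still needs a way to serve the resulting URLs.

```python
from urllib.parse import parse_qsl, urlsplit


def static_style_url(url: str) -> str:
    """Fold query parameter values into the file name of the URL."""
    parts = urlsplit(url)
    values = [value for _, value in parse_qsl(parts.query)]
    if not values:
        return url  # nothing after "?", so nothing to fold in
    joined = "-".join(values)
    stem, dot, ext = parts.path.rpartition(".")
    if dot:
        new_path = f"{stem}{joined}{dot}{ext}"
    else:
        # No file extension in the path: simply append the values.
        new_path = f"{parts.path}{joined}"
    return f"{parts.scheme}://{parts.netloc}{new_path}"


print(static_style_url(
    "http://www.examplesite.com/product.asp?id=12345&category=23&page=3"
))
```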

Hand-coding every dynamic page is time-consuming. If you use the Apache web server, its mod_rewrite module (http://httpd.apache.org/docs/mod/mod_rewrite.html) can help: visitors and spiders request a URL with no "?" character embedded, and the module rewrites it on the fly to the real URL with its query parameters.

Another mod_rewrite resource site is www.modrewrite.com.

An interesting article on weberblog.com describes a practical example of how Google successfully indexed a dynamic page after mod_rewrite was applied. The page originally had 17 characters in its query parameter.

Before rewrite: http://www.weberblog.com/article.php?story=20040419170030157

After rewrite: http://www.weberblog.com/article.php/20040419170030157
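The mapping behind this rewrite can be sketched in Python. To be clear, this is not mod_rewrite configuration and not the weberblog.com setup; it only imitates the idea that the server accepts the clean path and translates it to the query form internally. The pattern, the function name, and the parameter name story are assumptions for illustration.

```python
import re

# Matches the clean, spider-friendly path and captures the article id.
REWRITE_PATTERN = re.compile(r"^/article\.php/(\d+)$")


def to_internal_url(path: str) -> str:
    """Translate a clean path into the query-string form the script expects."""
    match = REWRITE_PATTERN.match(path)
    if match is None:
        return path  # not a rewritten article URL, leave it unchanged
    return f"/article.php?story={match.group(1)}"


print(to_internal_url("/article.php/20040419170030157"))
```

The visitor and the spider only ever see the path without a "?"; the query form stays internal to the server.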

So, if your site is experiencing the same problem, hurry up and implement mod_rewrite now.




Last Updated: 2008-12-01