Site Explorer Blog Posts
This is the Site Explorer archive of the Yahoo! Search blog.
Yahoo! Slurp 3.0
Over the past few weeks, we've been preparing for the latest version of the Yahoo! Search crawler with some infrastructure updates, which recently caused a variance in our crawl behavior.
With everything now in place, the rollout has officially begun. The new Yahoo! Slurp 3.0 recognizes the same user-agent and all robots.txt directives for 'Yahoo! Slurp,' though it'll identify itself as Slurp 3.0 in your web logs.
As the new software undergoes a phased rollout to our production crawlers over the next several weeks, you'll see the following changes:
b) The crawlers will also publish a new user-agent, 'Yahoo! Slurp/3.0.' Existing robots.txt directives for 'Slurp' or 'Yahoo! Slurp' will continue to work. However, if you have directives specific to 'Slurp/2.0,' the new crawler won't recognize them (directives for 'Slurp/2.0' are very rare on the web, though, so you're unlikely to be affected). We recommend using the shorter form, User-agent: Slurp. Check out "How do I prevent my site or certain subdirectories from being crawled?" on our Help page for more details.
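To see why a rule written for 'Slurp' keeps matching the new crawler while one written for 'Slurp/2.0' does not, here is a minimal Python sketch using the standard-library urllib.robotparser. It is an approximation only: that parser compares each User-agent value against the product token of the requesting agent (the text before the first '/') with a case-insensitive substring test, which may differ from how Yahoo!'s crawlers actually match. The helper name slurp_may_fetch and the example paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def slurp_may_fetch(robots_txt: str, url: str, agent: str = "Yahoo! Slurp/3.0") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines and marks the rules as loaded.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# A group aimed only at the old version string.
VERSIONED = """\
User-agent: Slurp/2.0
Disallow: /private/
"""

# The recommended short form.
SHORT = """\
User-agent: Slurp
Disallow: /private/
"""

if __name__ == "__main__":
    url = "http://www.example.com/private/report.html"
    print(slurp_may_fetch(VERSIONED, url))  # True: the 'Slurp/2.0' group does not match
    print(slurp_may_fetch(SHORT, url))      # False: 'Slurp' matches 'Yahoo! Slurp'
```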
These changes will affect the main Yahoo! Web Search crawlers. Crawlers that similarly respect the Yahoo! Slurp directive but identify themselves more specifically, such as Yahoo! Slurp China and others, will not be impacted.
Let us know if you have any questions or observe anything unusual.
Sharad Verma & Yoram Arnon
Yahoo! Search
Yahoo! Search Support for X-Robots-Tag Directive to Simplify Webmasters' Control and Weather Update
Today we're announcing support for tags that give webmasters even more flexibility over which pages and documents are crawled and indexed by Yahoo! Search. Specifically, we're extending our support of page-level exclusion tags (NOINDEX, NOARCHIVE, NOSNIPPET, NOFOLLOW) to provide additional control over the archiving and summarization of ANY file type. Previously, these page-level tags could only be expressed within HTML pages through the META directive (e.g. <META NAME="Slurp" CONTENT="NOARCHIVE">). Based on feedback from our webmasters, Yahoo! now also accepts these tags through the X-Robots-Tag directive in the HTTP header. This gives webmasters the flexibility to apply exclusions to PDF files, Word documents, PowerPoint files, video, and other file types, including HTML files, through a simpler process. Additionally, webmasters no longer need access to HTML templates in order to express exclusions for HTML files. To take advantage of this feature, simply add the page-level tags to the X-Robots-Tag directive in the HTTP header. Here are a few examples:
Note: We'll still need to crawl the page to see and apply the tag, so if you don't wish to have the page crawled, use a Disallow rule in robots.txt.
X-Robots-Tag: NOARCHIVE -- If you don't want a cached link displayed in the search results page.
X-Robots-Tag: NOSNIPPET -- If you don't want a summary displayed in the search results page.
X-Robots-Tag: NOFOLLOW -- If you don't want Yahoo! to crawl the links in the page.
Along with this change, we'll be rolling out additional changes to our crawling, indexing and ranking algorithms over the next few days. We expect the update to be completed early next week, but you may see some changes in ranking as well as some shuffling of pages in the index during this process.
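Because X-Robots-Tag is an ordinary HTTP response header, any server or framework that lets you set headers can emit it. As one illustration, here is a minimal Python sketch of a WSGI middleware that adds the header to responses for selected file types. The extension-to-directive policy, the function name with_x_robots_tag, and the choice of WSGI are all assumptions made for this example, not part of Yahoo!'s announcement; many sites would instead set the header in their web server configuration.

```python
from typing import Callable, Dict, Optional

# Hypothetical policy for this sketch: file extension -> X-Robots-Tag value.
DEFAULT_POLICY: Dict[str, str] = {
    ".pdf": "NOARCHIVE",
    ".doc": "NOSNIPPET",
    ".ppt": "NOARCHIVE",
}


def with_x_robots_tag(app: Callable, policy: Dict[str, str] = DEFAULT_POLICY) -> Callable:
    """Wrap a WSGI app so responses for matching paths carry an X-Robots-Tag header."""

    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "").lower()
        # Pick the first directive whose extension matches the requested path.
        directive: Optional[str] = next(
            (value for ext, value in policy.items() if path.endswith(ext)), None
        )

        def start_with_tag(status, headers, exc_info=None):
            if directive is not None:
                headers = list(headers) + [("X-Robots-Tag", directive)]
            return start_response(status, headers, exc_info)

        return app(environ, start_with_tag)

    return wrapped
```

To try it locally, you could pass the wrapped application to wsgiref.simple_server.make_server from the standard library and inspect the response headers of a PDF request.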
We're at SES in Chicago and WebmasterWorld's PubCon in Las Vegas, participating in a few different panels this week. Please find us if you have any questions or suggestions or drop us your feedback here.
Sharad Verma
Yahoo! Search
Site Explorer Counts Resolved
Last week we announced that we were working on a fix to correct discrepancies in the page and inlink data counts. The good news is that the fix is now complete. You should see consistent page and inlink counts in Site Explorer, whether you're logged in or logged out.
Try a query in Site Explorer to confirm. As always, let us know your feedback.
Priyank Garg
Yahoo! Search
Update on Site Explorer Results and Counts Data
Recently, some of you noticed changes in counts for Site Explorer results, where the counts were different for logged-in users versus logged-out users.
While the counts have been incorrect in some cases, the actual returned results have been correct. However, we did roll out a product fix yesterday and will be rolling out a couple more over the next few days to resolve this difference in counts some of you have observed.
Please disregard any counts for inlinks reported by Site Explorer from October 11 through next week. Thank you for raising this issue.
Priyank Garg
Yahoo! Search
Come and Explore your Site...
Yahoo! Small Business just made it easier for customers to submit and authenticate their sites to Yahoo! Site Explorer. Now, all you have to do is make sure that 'sitemap.xml' is enabled and your site will be submitted to Yahoo! Site Explorer automatically.
With this feature, new stores as well as existing stores with 'sitemap.xml' enabled will have access to the toolkit inside Site Explorer. Within a few hours of enabling, you'll be able to locate your indexed pages and the links to your sites, as well as delete pages in the index or rewrite dynamic URLs. To double check if your site was auto-authenticated, take a look in the 'Source' column in the 'My Sites' page in Site Explorer.
If you're looking for more information on the Sitemap feature, take a look at Sitemap help from Yahoo! Small Business. You can also read more about this feature on the Yahoo! Stores blog.
Welcome Yahoo! Small Business customers! Let us know how things are working for you in the comments below.
Priyank Garg
Yahoo! Search
Be Dynamic, Be Confident -- Yahoo! Search Supports You
Please excuse the dramatic start to this post. Between the anticipation of rolling this out and my incessant Harry Potter reading, I couldn't resist.
Once upon a time, on the World Wide Web, all URLs were fixed strings -- static in form. The idea of URL parameters then came along, allowing for database driven sites and session ids in URLs to create personalized experiences for users. At that time, the Web was alive with rich data and experiences. Then came the crawlers, which made it easier for users to navigate through the Web; however, they inevitably battled with dynamic URL parameters and every webmaster had to choose between a dynamic site and search traffic.
Today comes a new wave for search engines with the first-ever Beta launch of 'Dynamic URL Rewriting' in Site Explorer. The new feature lets site owners tell Yahoo! which dynamic parameters in their URLs they'd like Yahoo! to ignore, and we'll then rewrite those URLs automatically. Try it for all the cases where you use parameters in your URLs that don't affect the content of your page but serve other important purposes.
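The idea behind this kind of rewriting can be sketched as URL canonicalization: drop the parameters the site owner marked as irrelevant, so URLs that differ only in those parameters collapse into one. Below is a minimal Python sketch using urllib.parse; the parameter name sessionid and the function rewrite_url are hypothetical, and this is not a description of how Site Explorer implements the feature internally.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def rewrite_url(url: str, ignored_params: set) -> str:
    """Drop query parameters that the site owner marked as not affecting content."""
    parts = urlsplit(url)
    # Keep the remaining parameters in their original order.
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in ignored_params
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment))


# Two URLs that differ only in a session id map to the same rewritten URL.
a = rewrite_url("http://www.example.com/item.php?id=42&sessionid=abc123", {"sessionid"})
b = rewrite_url("http://www.example.com/item.php?sessionid=zzz&id=42", {"sessionid"})
```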
How to get there?
So you might wonder what the feature really gives you. Utilizing the 'Dynamic URL Rewriting' feature enables:
Looking for more details on when to use URL parameters? Visit the Site Explorer Help page for additional background on the Beta feature, which will help you decide which dynamic URL parameters Yahoo! should ignore.
We're here to address any questions or needs that you have, so let us know how it works for you.
Priyank Garg
for Lakis, Amit B., Amit S., Jay, Judy, Srikanth, Zheng
Yahoo! Search
Webmasters Can Now Auto-Discover With Sitemaps
Since working with Google and Microsoft to support a single format for submission with Sitemaps, we have continued to discuss further enhancements to make it easy for webmasters to get their content to all search engines quickly.
All search crawlers recognize robots.txt, so it seemed like a good idea to use that mechanism to allow webmasters to share their Sitemaps. You agreed and encouraged us to allow robots.txt discovery of Sitemaps on our suggestion board. We took the idea to Google and Microsoft and are happy to announce today that you can now find your sitemaps in a uniform way across all participating engines. To do this, simply add the following line to your robots.txt file:
Sitemap: http://www.example.com/sitemap.xml
Please provide the complete URL for your Sitemap on this line. We will pick it up wherever you put it in your robots.txt file. This directive is not specific to user-agent. If you have multiple Sitemaps, you can point to your Sitemap index file on this line. Details about the Sitemaps protocol including this addition are available on the protocol website -- http://www.sitemaps.org.
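For a crawler-side view of the Sitemap line, Python's standard-library robots.txt parser (Python 3.8 and later) collects Sitemap directives regardless of where they appear or which user-agent group they sit in, which fits the behavior described above. This is a minimal sketch; discover_sitemaps and the example robots.txt are hypothetical.

```python
from typing import List
from urllib.robotparser import RobotFileParser


def discover_sitemaps(robots_txt: str) -> List[str]:
    """Return every Sitemap URL declared in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when the file has no Sitemap lines.
    return parser.site_maps() or []


# The Sitemap line sits inside a user-agent group, yet it still applies to everyone.
ROBOTS_TXT = """\
User-agent: Slurp
Sitemap: http://www.example.com/sitemap_index.xml
Disallow: /tmp/

User-agent: *
Disallow: /cgi-bin/
"""
```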
If you prefer, you can continue to submit Sitemaps to Yahoo! Search by simply entering the URL of your Sitemap and submitting it. Or you can add feeds to a site you are already managing under 'My Sites' in Site Explorer. This also allows us to give you more feedback about what we are doing with the Sitemap.
We're also happy to have some east coasters, Ask and IBM, announce their support for Sitemaps. The more the merrier!
We'll also be sharing more this week at SES NY.
If you have other thoughts about how we can collaborate with other search engines on standards such as robots.txt, we'd love to hear from you -- visit our suggestion board.
Priyank Garg
Product Manager, Yahoo! Search
Site Explorer Matures a Bit More and Accepts Mobile Feeds
It's been nearly two years since we first made Site Explorer available. How time flies! Since its inception, we've added a number of new features to Site Explorer, including Feed Submission, Site Authentication and more data for webmasters. And today, we've got a few more additions to share with our users.
Site Explorer offers Mobile Submit
We've enhanced our Mobile Site Submit feature: publishers can now submit mobile sites and feeds to Site Explorer, which enables them to get their mobile sites into Yahoo! oneSearch and gain access to Yahoo!'s mobile user base. Our mobile crawler will consume these feeds to help it find new pages. The feeds can be:
Site Explorer is out of Beta
A while back we added the Delete URL feature to give webmasters more direct control. That was a critical milestone for Site Explorer, and having successfully crossed it, today we're taking Site Explorer out of beta. Over the last few months, webmasters have tried out the various features and provided their feedback, which we're addressing in this release:
Report Spam
We've heard from a number of webmasters who are looking for ways to address spam, so we're trying out a new feature. Now when exploring your authenticated site, if you find a suspicious inlink, such as an off-topic link or a suspected linkfarm, just click on the 'Report Spam' button and submit a spam report.
We hope you find these updates useful. And as always, keep the feedback coming!
Yahoo! Site Explorer and Mobile Search teams
Keeping Ad Tracking and Dead URLs out of Yahoo! Search
We're often asked how Yahoo! Search determines which pages get indexed and which pages are left un-crawled. First and foremost, we honor the industry-standard robots.txt file format, which gives Webmasters several layers of control over which sites, pages and specific URLs should be indexed. Lately we've heard from a number of Webmasters asking how best to prevent ad tracking URLs and dead URLs from getting indexed, so we thought we'd respond via this post.
Ad tracking URLs
Ad tracking URLs are used by Webmasters to help determine what traffic is coming in from advertisements (e.g., Yahoo! Sponsored Search and Yahoo! Publisher Network), but they don't need to be included in the Yahoo! Search index. Sometimes you might notice that these URLs still appear in the index. That's because they've appeared on pages that are "crawlable" or may have been copied over to crawlable pages by users. If you don't want Yahoo! Slurp, our Web crawler, to index these URLs, you can use wildcards in robots.txt. For example, if you are using the parameter 'ref' to track ad sources, you can use a rule like the one below to keep your tracking URLs from being Slurped:
User-Agent: Yahoo! Slurp
Disallow: /*ref=YahooPublisherNetwork
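Python's standard-library robots.txt parser does not interpret '*' inside paths, so here is a minimal hand-written sketch of how a pattern like the one above can be read: '*' matches any run of characters, and the rule applies when the pattern matches the start of the URL's path plus query string. This is an assumption-based illustration of wildcard semantics, not Yahoo! Slurp's actual matcher; the function names and example URLs are hypothetical.

```python
import re


def pattern_to_regex(pattern: str):
    """Compile a robots.txt path pattern in which '*' matches any run of characters."""
    # Escape regex metacharacters such as '?' and '.', then restore '*' as a wildcard.
    return re.compile(re.escape(pattern).replace(r"\*", ".*"))


def is_disallowed(path_and_query: str, pattern: str) -> bool:
    """True if the URL path (plus query string) matches the Disallow pattern.

    re.match anchors only at the start, mirroring robots.txt prefix matching.
    """
    return pattern_to_regex(pattern).match(path_and_query) is not None


RULE = "/*ref=YahooPublisherNetwork"
```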
Dead URLs
The best way to remove dead URLs from the Yahoo! Search index is to return an HTTP Error 404 when our crawler requests the page. If you want to act before the 404 discovery and URL removal process completes, you can use Site Explorer to quickly delete the URLs from the index. One advantage to using Site Explorer is that you can delete multiple URLs, including an entire subpath, so long as the URL prefix is the same. As Danny Sullivan points out in his deep-dive post on the delete function, if you delete http://domain.com/subarea1/, then all the pages that begin with "domain.com/subarea1" will get removed. E.g.:
http://domain.com/subarea1/page1.html
http://domain.com/subarea1/page45.html
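The prefix rule is easy to model. In this minimal Python sketch (the function name and URL list are hypothetical), a delete removes every indexed URL that starts with the given string. It also shows why the trailing slash matters if matching is purely string-based: without it, http://domain.com/subarea10/ would be caught as well. Whether Site Explorer handles that edge case exactly this way is an assumption.

```python
from typing import Iterable, List


def urls_removed_by(prefix: str, indexed_urls: Iterable[str]) -> List[str]:
    """Return the indexed URLs that a plain string-prefix delete would cover."""
    return [url for url in indexed_urls if url.startswith(prefix)]


INDEX = [
    "http://domain.com/subarea1/page1.html",
    "http://domain.com/subarea1/page45.html",
    "http://domain.com/subarea10/page1.html",
    "http://domain.com/subarea2/page1.html",
]

with_slash = urls_removed_by("http://domain.com/subarea1/", INDEX)    # first two URLs
without_slash = urls_removed_by("http://domain.com/subarea1", INDEX)  # also subarea10
```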
We'll continue to use the Yahoo! Search blog to give Webmasters like you pointers on how to better manage your sites in the Yahoo! Search index. Be sure to visit us at the Site Explorer Suggestion Board if there are specific areas that you'd like us to address in more detail.
Thanks,
Priyank Garg
Yahoo! Search
Yahoo! Site Explorer: Authenticate your site via a META tag and more goodies
We spend a lot of time listening to our users, and I am happy to say we've gotten better at it. We've been using feedback forms and message boards, and finally, at the Chicago SES last December, we launched our new Site Explorer Suggestion Board. This is a new user-based ranking feedback tool, which was first introduced at an internal Yahoo! Hack Day and is currently being deployed across the Yahoo! network. It allows you to make suggestions for the product, vote for existing suggestions, or simply comment on them.
Today, we launched a new version of Site Explorer that addresses some of the Top Rated suggestions from our users. The key features are:
• Site Authentication using META tags: For those of you who cannot upload an authentication file to your site, such as a blog, you can now authenticate your site in Site Explorer by including an authentication key in a META tag on your site's home page. This is in addition to the existing mechanism of placing a file in your site's home directory.
• Detailed Authentication Errors: We now provide detailed errors on authentication failures, making it much easier to diagnose possible problems.
• Delete URLs: For your authenticated sites, you can now delete any URL from the index. Simply locate the URL in Site Explorer and click the 'Delete URL' button. The URL and all its subpaths will be deleted shortly thereafter. This is meant to work in conjunction with the robots.txt file while providing greater responsiveness. Please continue to use the robots.txt protocol to ensure that our crawler does not crawl pages you want to keep out of our index.
• Site Explorer Badge: Get a Site Explorer badge for your website and retrieve the count of live links to your site from across the whole web. Go ahead, watch your site become more popular, and show off your link wealth to your visitors.
These features address some of the most popular suggestions that we received on our new board. The full list of suggestions we will be able to address with this release is:
a) Allow removal of invalid or malformed URLs
b) Verification for blogs
c) Authentication Problem
d) More than 25 sitemaps
e) better labeling of TSV files
f) Site explorer should identify itself in the user agent string
g) https / ssl
h) Wait 1 day? (Speed of authentication)
Hope you'll enjoy the improvements. Please share your experience using these features with us and continue to send us your feedback. It's very valuable to us!
Priyank Garg, Amit Kumar, Apostolos 'Lakis' Karmirantzos, Di Chang, Judy Johnson
Yahoo! Search
Yahoo!, Google and Microsoft join forces (really !!) behind Sitemaps
The best part about to-do lists is getting to cross something off, and today we can cross one more item off the list of feedback we have collected from webmasters. You have asked us to support a single format for submission, and today we want to talk about how we are teaming up with Google and Microsoft to support Sitemaps 0.90.
Together we're announcing www.sitemaps.org, which provides details of the current release of the Sitemaps protocol and will include future updates as we continue to collaborate on this common protocol. With an open standard for web sites, webmasters can use a single format to create a catalog of their site URLs and to notify the major search engines of changes. This should make it easier for web sites to provide search engines with content and metadata. In turn, search engines can spend less time crawling unchanged pages and can update their indexes faster as new content is discovered. This will help us reflect changes more quickly and improve our ability to provide timely and relevant search results for users. Sitemaps is available to any site owner who wishes to communicate more easily with participating search engines. Simply create and upload an XML Sitemap and submit the URL of the file to search engines.
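For reference, a minimal Sitemap in the 0.90 format is an XML urlset of url entries, each with a required loc and optional fields such as lastmod. This Python sketch builds one with xml.etree.ElementTree; the function name build_sitemap and the example URLs and date are hypothetical, and sitemaps.org remains the authoritative description of the format.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Optional, Tuple

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries: Iterable[Tuple[str, Optional[str]]]) -> str:
    """Build a Sitemaps 0.90 document from (loc, lastmod) pairs.

    lastmod is an optional W3C date string such as "2006-11-16"; loc is required.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # ElementTree escapes '&' as '&amp;'
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


sitemap_xml = build_sitemap([
    ("http://www.example.com/", "2006-11-16"),
    ("http://www.example.com/catalog?item=12&desc=vacation", None),
])
```

ElementTree escapes characters such as '&' in URLs automatically, which the protocol requires for entity-escaped URLs.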
You can submit Sitemaps to Yahoo! Search through Site Explorer, just as you have been able to add RSS feeds until now. Just add the site to which the feed belongs to your list of sites, and then add the feed for that site. We will retrieve the Sitemap and use the data you provide.
We are open to feedback and ideas on what more we can do with Site Explorer and Sitemaps. Share your thoughts in our forum; we'd love to hear from you.
Thanks and keep the list growing,
Priyank Garg
Product Manager, Yahoo! Search
Site Explorer Update: Authenticating Yahoo! Stores
First, thank you to those who are using Yahoo! Site Explorer to keep tabs on how your site is indexed by us, and especially to the folks who are using the forum to ask questions and suggest new features. Your feedback helps us prioritize features as well as come up with new features to add to the roadmap. One of these requests was for the ability to authenticate Yahoo! Stores in Site Explorer, which we have just introduced. Basically, that means you now have the ability to add your Site Explorer key to your Small Business site. For more, please head over to the Yahoo! Store Blog for the complete rundown.
As always, please send us your feedback on Site Explorer, or visit our forum to share your thoughts with other users.
Thank you!
Priyank Garg
Product Manager, Yahoo! Site Explorer
Site Explorer Authentication - Some Improvements and Notes
We have had phenomenal response to the new version of Yahoo! Site Explorer we launched two months ago. Thanks to the many of you who have come by and used the new interface, authenticated your site, and asked us questions on the forum. We have been answering many questions on the board, and there are a few common themes that we want to respond to in more detail.
A few other tips we wanted to share regarding authentication:
We have also made other minor updates to the interface designed to make Site Explorer easier to use.
We appreciate your feedback and are doing our best to address it. One of our goals is to make Site Explorer even easier to use. So please let us know whether our tweaks make the tool a bit more webmaster-friendly, and continue to share your thoughts with us!
Priyank Garg, Amit Kumar, Apostolos 'Lakis' Karmirantzos, Di Chang
Site Explorer Team
Pointing Webmaster Queries to Site Explorer
A lot of webmasters use Yahoo! Search to get page and inlink data about their site, using 'site:', 'link:', 'linkdomain:' queries. Starting last night, we are redirecting all queries of this nature to the Site Explorer results pages, so that you can benefit from this tool's additional features.
To reiterate, the following types of queries will be redirected:
All other queries, such as the ones below, will not be redirected:
Site Explorer, since its launch last year, has had various features geared to serve webmasters' needs for data about their web sites, such as data downloads in TSV format and more accurate result counts. On Tuesday we launched an upgraded version of Site Explorer with several new features.
For those of you who will be seeing Site Explorer for the first time, we hope that you will find that these features make your lives easier.
If you want to extract this data programmatically, please use our Web Service APIs. The APIs provide the same data and will be more stable and easier to parse than our search pages, which we change regularly to improve the user experience.
A hearty thank you to the many webmasters who have tried out Site Explorer's new functionality since the Tuesday update. If you haven't visited yet, stop by to register your site and let us know your thoughts on Site Explorer in our forum.
Priyank Garg
Product Manager, Yahoo! Search
Site Explorer Update
We opened a little window into Yahoo! Search last year, when we launched Site Explorer. We hoped it would be useful to webmasters--providing you with information about the links to and from your site, neatly categorized and displayed in an easy-to-use interface. We've listened to your feedback, and are now ready with the next version of Site Explorer--our biggest update since December.
We're now organized around sites you'd like to track. You can explore these, and add feeds to each site. Once you authenticate your site, you can see much more information about your URLs as you explore your site, and monitor feeds you've submitted.
So what's new?
We hope you'll like our new interface, with a lot of little details sprinkled all over, such as the expandable results to reduce clutter, the ability to download more URLs from sites you own, and robust authentication. Share your comments through our feedback form or see what others are saying on the new Site Explorer forum.
We welcome you through the doors, and hope you'll forgive our tacky metaphors! :-)
Amit Kumar, Priyank Garg
and the entire Yahoo! Site Explorer Team
It's Search. It's Site Explorer. It's Webzari!
As Searchblog readers may remember, we launched a tool called Site Explorer last year that you can use to see what pages from a site are indexed in the Yahoo! Search engine. You can also use Site Explorer to see page links.
The Site Explorer interface is based on the search results page experience and returns lists of pages that are indexed, and inlinks to your site, as you can see for the Searchblog.
But the Yahoo! Korea team took the basic functionality and gave it an entirely new look, as you can see in the Webzari for the Searchblog. Sorry I can't translate it for you. Here's one screenshot that explains partly what the tool is showing:

If you mouse over the planets in the Webzari, it gives you more information about the links, and clicking on the planets returns the corresponding blog entry or other text. Try clicking around on it; even though you might not understand Korean, you'll get the gist of things.
You can even save Webzari searches in My Hub, the Korean version of My Web.
Give Webzari a spin and leave us a comment to let us know what you think!
Arah Cho & Priyank Garg
Yahoo! Search
Reaching the Weatherman at Yahoo! Search
At conferences, webmasters always ask me how they can connect with us here at Yahoo! Search.
I usually have the time to flash a slide that shows a list of URLs to different forms and services for webmaster support and webmaster feedback. I also talk about Site Explorer, which webmasters can use to explore the Yahoo! Search index. I know most people don't have the time to write these down, and hope that the information is disseminated via presentations made available to conference attendees.
I've also promised to make this information available via the Yahoo! Searchblog.
Information about Yahoo! Search can be obtained via a very handy URL - http://help.yahoo.com/search. Memorize, bookmark, email or copy it, whatever works...
We've also added a new link to this page, Webmaster Resources. This includes the list of resources that I usually pop up on screen at conferences. The URL is http://help.yahoo.com/search/resources. Save it and tag it on del.icio.us or MyWeb, whatever works...
Oh, and as we mentioned, we're retiring the old feedback email address and replacing it with a simple form. It is included in the webmaster resource list noted above, and the URL is http://help.yahoo.com/search/feedback.
We hope this will make it easier for you to get answers and provide feedback. Although you may not hear from us directly, we do pay attention and the product changes for the better based on it.
Thanks,
Rajat Mukherjee
Yahoo! Search
Submitting Site Feeds and other Site Explorer updates
Two months ago we launched Site Explorer, a tool to explore the pages from your site in the Yahoo! Search index and the inlinks to those pages. Many of you have been using the tool actively, and we appreciate the positive response and feedback we have received. It was gratifying to see the panelists at WebmasterWorld using the tool for site reviews. We have now launched it as a Beta on our international destinations, including Argentina, Australia & NZ, Brazil, Canada (English and French), France, Germany, India, Italy, Mexico, Singapore, Spain and the UK, as part of Services and Tools.
Site Explorer also tries to make it easy for you to tell us what we don't know about your site. To make it even simpler, we now accept site submissions in the following formats.
Note that for any URL (submitted directly or obtained from a feed), we will extract links from it and find pages we have not discovered already.
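As a rough illustration of that link discovery step, here is a minimal Python sketch using the standard-library html.parser: it collects the href targets of <a> tags, resolves them against the page URL, and keeps only those not already known. The known set stands in for pages already in the index; the class and function names are hypothetical, and a real crawler does considerably more (robots.txt checks, deduplication across sites, scheduling).

```python
from html.parser import HTMLParser
from typing import List, Set
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute href targets from <a> tags in an HTML page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def undiscovered_links(base_url: str, html: str, known: Set[str]) -> List[str]:
    """Return links on the page that are not already in `known`, in page order."""
    extractor = LinkExtractor(base_url)
    extractor.feed(html)
    extractor.close()
    new: List[str] = []
    for link in extractor.links:
        if link not in known and link not in new:
            new.append(link)
    return new
```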
We've also added something many of you have asked for, the ability to filter out internal inlinks when exploring the inlinks to your site or to particular pages. Please try out these new features and let us know, as many of you already have, what you think about Site Explorer. Even though we can't respond to all your emails, every piece of feedback is appreciated.
Enjoy exploring!
Priyank Garg
Product Manager
Webmasters, tell us what we don't know
Having been to a couple of Search Engine Strategies conferences, I realized how much you look to search engines for information on how your content is indexed by them. I've heard stories of elaborate scripts that scrape search engines, using 'site:', 'link:' and 'linkdomain:' queries to understand your content's relationship to other pages on the web. Through these queries, Yahoo! provides unique information, but often there is more that you are looking for.
Today we are launching Site Explorer from Yahoo! Search, a webmaster tool we talked about at SES San Jose. WebmasterWorld and Search Engine Roundtable have been tracking it, and so have some folks on My Web. Currently, you can use Site Explorer to:
Site Explorer is geared towards your needs, providing 50 results by default, web services APIs, the ability to export the data to a TSV file for further analysis, as well as free submission for missing URLs.
Tell us what we don't know. If you don't find a URL that you expect to be in the index, use free submit. In case you hadn't heard, we are also accepting lists of URLs, so you don't have to provide us one URL at a time.
This is a starting set of features for what we hope becomes a truly valuable tool for you to interact with us. So please send us feedback, tell us how the product works for you, and let us know what else you'd like to see. Enjoy exploring, and tell us what you don't find!
Priyank Garg
Product Manager, Yahoo! Search
