Matt Cutts Discusses Webmaster Tools

I’m up in the Kirkland office today for an offsite and a little bit of planning, and they said, you know what? Why don’t we see if we can throw together a video in 10 minutes or less. So we said, all right, let’s give it a shot.

We were thinking about some of the common things that you do with the webmaster console, or some topics that webmasters want to hear about. People like to go to the webmaster console and check their backlinks. They also like to find out if they have any penalties. There are a lot of really good stats in the webmaster console, but one thing that I’ve been hearing questions about is: how do I remove my URLs from Google?

Why would you want to do this? Well, suppose you’re a school and you accidentally left the social security numbers of all your students up on the web. Or you’re a store and you left up credit card numbers. Or maybe you run a forum and suddenly you’ve gotten spammed by Ukrainian porn spammers, which happened to a friend of mine recently. Whatever the reason, you want to get some URLs out of Google instead of getting URLs into Google.

So let’s look at some of the possible approaches, the different ways you can do it. I’ll talk through each one and draw a happy face by the one or two that I think are especially good at getting content out of Google, or preventing it from getting into Google in the first place.

The first thing that a lot
of people say is: OK, I just won’t link to the page. It’ll be my secret server page, Google will never find it, and that way I don’t have to worry about it showing up in the search engine. This is not a great approach, and I’ll give you a very simple reason why.

We see a lot of people surf to a page and then surf on to another web server, and that causes the browser to send a Referer header in the HTTP request, so the page you were at before shows up in that other web server’s logs. Now if that other web server says, oh, these are the top referrers to my page, and maybe makes each one a clickable hyperlink, then Google can crawl that other web server and find a link to your so-called secret page.

So it’s very weak to say, you know what? I’m just not going to link to it, I’ll keep it a secret, nobody will ever find out about it. Because for whatever reason somebody will surf away from that page with the referrer set, or somebody will accidentally link to the page, and if there’s a link on the web to that page, there’s a reasonable chance we might find it.
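To make that concrete, here is a sketch of the HTTP request a browser typically sends when a visitor follows a link from your secret page to another site (all hostnames here are made-up examples):

```http
GET /some-page.html HTTP/1.1
Host: other-site.example.com
Referer: http://your-site.example.com/secret-page.html
```

If other-site.example.com then publishes its top referrers as clickable links, your secret URL is linked from the public web and can be crawled.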
So I don’t recommend relying on that; it’s a relatively weak approach.

Another thing you can do is use something called .htaccess. That sounds scary, but it’s just a very simple file that lets you do things like redirect from one URL to another. The thing I’m specifically talking about is that you can password-protect a subdirectory, or even your entire site. Now, I don’t think we provide an .htaccess tool on the webmaster console, but that’s OK. There are a lot of them out on the web; if you search for something like an .htaccess tool or wizard, you’ll find ones where you can say, I’d like to password-protect a directory, tell it which directory, and it will generate the file for you. Then you just copy and paste that into your web site.

So this is very good. Why is it strong? Why am I going to draw a happy face here? Well, if you have a password on that directory, Googlebot is not going to guess that password, so we won’t be able to crawl that directory at all. And if we can’t get to it, it will never show up in our index. This is very strong and very robust, and it will work for every search engine, because someone has to know the password to get into that directory. So this is one of the two ways that I really, really recommend. It’s a preventative measure, so it’s not good if Google has already gotten into a particular area of your site, but if you’re planning ahead and you know what the sensitive areas are going to be, just slap a password on there and it will work really well.
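As a concrete sketch of that password-protection setup, assuming an Apache server (the file path and username here are illustrative):

```apache
# .htaccess in the directory you want to protect
AuthType Basic
AuthName "Private area"
# Password file created beforehand with: htpasswd -c /home/user/.htpasswd alice
AuthUserFile /home/user/.htpasswd
Require valid-user
```

Any request without valid credentials, including one from a crawler, gets a 401 response instead of the page content.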
one that a lot of people know about. It’s called robots.txt. The standard has been around
for over a decade, since at least 2006, and essentially
it’s like an electronic no trespassing sign. It says, here are areas of
your site that Google, or other search engines, are
not allowed to crawl. We do provide a robots.txt tool
on the web master consul, so given a web site you can test
out URLs and see whether Google bots allowed
to get to them. You can test out whether
different variants of Google bot, like the image Google bot,
is allowed to get to it, and you can take out
new robots.txt files for a test drive. So you say, OK, suppose I try
this as my robots.txt, Could you crawl this URL? Could you crawl this URL? And you can just try it
out and make sure that it works OK. That’s nice, because otherwise
you might shoot yourself in the foot. Suppose you just make this
robots.txt live and it had a syntax error that would
let everybody in or keep everybody out? Well, that could cause problems.
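You can also sanity-check a draft robots.txt yourself before putting it live. For example, Python’s standard library ships a parser for the robots exclusion standard (the rules and URLs below are made up for illustration):

```python
# Test a draft robots.txt against sample URLs before making it live.
from urllib import robotparser

draft = """
User-agent: *
Disallow: /private/
"""

rp = robotparser.RobotFileParser()
rp.parse(draft.splitlines())

# Public pages are crawlable; the disallowed directory is not.
print(rp.can_fetch("Googlebot", "http://example.com/index.html"))      # True
print(rp.can_fetch("Googlebot", "http://example.com/private/a.html"))  # False
```

This catches exactly the shoot-yourself-in-the-foot case: a rule that accidentally blocks (or admits) everything shows up immediately in the test results.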
So I recommend you take that tool out for a test drive, get a robots.txt that you like, and then put it live.

Now, robots.txt is interesting: different search engines can have slightly different policies about uncrawled URLs. I’ll give you a very simple example. Way back in the day, ebay.com and nytimes.com didn’t allow anyone to crawl their sites. They had a robots.txt that said user-agent star, disallow everybody: nobody is allowed to crawl the site if you’re a well-behaved search engine. That’s problematic, because if you are a search engine and somebody types in eBay and you don’t return ebay.com, you look dumb. So what we decided, and what our policy still is, is that while we won’t crawl such a page because of robots.txt, we can show an uncrawled reference.

And sometimes we can be pretty good about it. For example, if there’s an entry for nytimes.com in the Open Directory Project, then we can use the snippet from the ODP and show nytimes.com as an uncrawled reference, and it can look almost as good as if we had crawled it, even though we weren’t allowed to crawl it and didn’t. So use robots.txt to prevent crawling, but know that it won’t completely prevent that reference from showing up in Google. There are other ways to do that.

Let’s move on to the
noindex meta tag. What that essentially says, for Google at least, is: don’t show my page at all in the search engine. So if we see noindex, we will completely drop the page from Google’s search results. We’ll still crawl it, but we won’t actually show it in our search results if somebody does a query for that page. That’s pretty powerful, it works very well, and it’s very simple to understand.

There are a couple of complicating factors, though. Yahoo and Microsoft, even if you use the noindex meta tag, can still return a reference to the page. They won’t return the snippet and so on, but you might see the link. And we do see some people having problems with noindex. For example, if you’re a webmaster rolling out a new site, you might put up a noindex meta tag while you’re shifting things around and developing the site, and then forget to take it down. A very simple example: one of the country versions of BMW’s site, the Croatian one (.hr), I think, has done this. And Ben Harper, a musician you’ve probably heard of, has had a noindex meta tag on benharper.net for a long time, and I think it may still be there. So if you’re the webmaster for that site, we’d love it if you would take that down. There are various people within Google who have said, well, maybe we should go to a policy where we won’t show the full snippet, but we’ll just show a reference to that URL.

There’s one other interesting corner case on noindex, which is that we can only abide by the noindex meta tag if we’ve actually crawled the page. If we haven’t crawled the page, we haven’t seen the meta tag, so we don’t even know it’s there. So in theory it’s possible that you link to a page, we don’t get a chance to crawl it, and so we don’t see that there’s a noindex and we don’t drop it out completely. So there are a couple of cases in which you can at least see the reference show up in Google, and Yahoo and Microsoft, I believe, will pretty much always be willing to return the reference, even if you use the noindex meta tag.
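For reference, the noindex directive is a meta tag placed in the page’s head; a minimal sketch:

```html
<!-- In the <head> of the page you want kept out of search results -->
<meta name="robots" content="noindex">
<!-- Or, to target Google's crawler specifically: -->
<meta name="googlebot" content="noindex">
```

The `robots` form applies to all well-behaved crawlers; the `googlebot` form applies only to Google’s.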
Here’s another approach you can use: nofollow on individual links. This is another kind of weak approach, because inevitably you say, OK, there are 20 links to this page, and I’m going to put a nofollow on all of them. Maybe it’s a sign-in page. Expedia.com, for example, has a nofollow on its My Itineraries link, which makes perfect sense: why would you want Googlebot to crawl into itineraries, when that’s a personalized kind of thing? But inevitably somebody links to that page, or you forget and don’t mark every single link with a nofollow.

Let me draw a very simple example. Suppose we have a page A with a nofollowed link to page B. Well, we won’t follow that link; we drop it out of our link graph completely, so we won’t discover page B because of that link. But now suppose there’s some other guy on page C, and he does link to page B. Well, we might follow that link and, as a result, end up indexing B. So you can try to make sure that every link to a page is nofollowed, but it’s sometimes hard to make sure that every single one is correctly handled. Like noindex, this has weird corner cases where you could easily see a page get crawled just because not every single link to it was nofollowed. In the noindex case it could happen because we hadn’t actually gotten around to crawling the page, and so we didn’t see the noindex meta tag.
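Concretely, nofollow is a rel attribute on the individual link (the URL here is hypothetical):

```html
<!-- Page A: a link Googlebot is asked not to follow -->
<a href="http://example.com/page-b.html" rel="nofollow">Page B</a>
```

There is also a page-level form, a robots meta tag with content="nofollow", which applies to every link on the page at once.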
So let’s move on to another really powerful way, one I helped a friend use when her forum got spammed by porn spammers recently: the URL removal tool. .htaccess is great as a preventative measure: you’ve put a password on the directory, nobody can guess what it is, no search engine is going to get in there, and it won’t get indexed. But if you did let the search engines in and then want to take something out later, you can use our URL removal tool.

We’ve offered the URL removal tool for at least five years, probably more. For a long time it sat on services.google.com, it was completely self-service, and it would run 24/7. But just recently the webmaster console team has integrated the URL removal tool into the webmaster console, so it’s much, much simpler to use and the UI is much better. The way it used to work is that it would remove the URL for six months. And if that was a mistake, say you removed your entire domain and didn’t really mean to, then you’d have to email Google’s user support and say, I’m sorry, I didn’t mean to obliterate my entire site, can you revoke that? And someone at Google would have to do it. Now you can do that yourself. So it’s very powerful, but it also gives you a safety net, because at any time you can go in and say, oh, I didn’t mean to remove my entire site. Revoke that request, and that
gets revoked very quickly.

To use the Google webmaster console, it’s not that hard to prove that you own a site. You can either put a little page at the root of your domain that says, yes, this is me, or you can add a simple meta tag, a little signature that proves it’s your site. Once you’ve proven it’s your domain, we give you a lot more stats, plus this wonderful little URL removal tool.
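The meta-tag method of proving ownership looks roughly like this (the content value is a placeholder; the webmaster console generates the real token for your site):

```html
<!-- In the <head> of your home page -->
<meta name="google-site-verification" content="YOUR-TOKEN-HERE">
```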
The tool can remove things at a very nice level of granularity: you can remove a whole domain, you can remove just a subdirectory, and I think you can even remove individual URLs. And we show you the status the whole time. We’ll show you all the requests you’ve made, so each one is pending until it has gone live, and once it’s live it turns into a revoke option. Then you can say, you know what? I’ve gotten all the social security numbers or credit card numbers or whatever taken down, so revoke that. And now it’s safe to start
crawling again.

So, of the ways to remove your URLs from Google, or prevent them from showing up in the first place, there are a lot of different options. Some of them are pretty strong, like robots.txt and the noindex meta tag, but they do have those weird corner cases where we might still show a reference to the URL in various situations. The ones that I definitely recommend are .htaccess, which will prevent people from getting in in the first place, and, at least for Google, the URL removal tool. So if you have URLs crawled that you don’t want crawled, you can still get them out, and get them out relatively quickly. Thanks very much, and I hope that was helpful.

Daniel Ostrander

50 thoughts on “Matt Cutts Discusses Webmaster Tools”

  1. coax says:

    thanks interesting info here

  2. darksnowman56 says:

    BURN!

  3. adamdoesit says:

    That's because google is from the future.

  4. Yebbo Media says:

    Thank you Matt

  5. Vassil Hristov says:

    Yay Matt!
    He definitely should do his own channel! 🙂

  6. Michał Karnicki says:

    Guys! This was so great and educating video! Thank you very much for this ~10 min clip. I would like to see more of this kind. Thank you Matt Cutts!

  7. Jeff S says:

    yeah, i caught that too. thats what happens when computer nerds get on camera. at that moment, he thought "the world will see this!" heheh

  8. welington Borges says:

    qikigogos'di'caros

  9. mkemal1 says:

    good presentation style

  10. friendofyou says:

    "been around for at least a decade, since 2006" 😉

  11. Chung Poon says:

    Nice drawing…

  12. zkraso3755 says:

    Haha, I was just about to comment on that, but I thought I would see if anybody else already did.

  13. 600RR4Life says:

    i got my site rank 1 google i am really happy.

  14. Ajay Singh says:

    good

  15. jimbobeire says:

    I have one criticism of this vid, but it's a biggy considering it's Google. The topic is very specific, and the vid title is very general. Content relevance to title = poor.
    Interesting content, but not very clever title choice. If I was searching for a general discussion of tools

  16. jimbobeire says:

    Actually, the info box is even contradictory.
    I wonder, if the title and info box are the reason it has 58,400 views rather than say 20,000? I think a lot more people are excited about SEO than this topic (although I found this interesting. Thanks Mr. Cutts.

  17. 600RR4Life says:

    The Most Powerful Tool Of Seo Would be title.

  18. lucylovesguitar says:

    I've always thought Matt and Joel look alike. Nice catch. 🙂

  19. Shiva Purohith says:

    best video and best person to learn about Webmaster Tools

  20. John Britsios says:

    Matt thanks for the informative video. But I have a question:

    Don't protected protected directories cause dangling/nodes problems (PR black-hole) if the links to them are not attributed with the nofollow attribute?

  21. Martin Zelewitz says:

    Matt,
    thanks for info. We have now gone through the process: in the crawler access / remove url section a specific sub-page has got the "removed" status (since June 26). the subpage is no longer online, the cache is no longer visible but this sub-page still appears in serps. How can this be? for how long will this still be the case?

  22. satellitboy says:

    hahahahahahahahahahhahahahahahah
    Do you think he have time for you? rofl

  23. Bob McAlister says:

    you need to resize your video so that it stays loaded,

  24. wearealltubes says:

    Absolutely! We sell some products – they aren't dodgy, they are just a real pain for us – that we'd rather not sell unless a large, regular customer specifically asks for them, in which case we send them a "private" link (i.e. we just tell them it's private!).
    We also have deal/discount pages for special offers – only people coming through an email / affiliate etc can get to them.

    I'm sure there are dozens more reasons for wanting a page out of the index.

  25. Zacchaeus Nifong says:

    HEY! CHANGE THE TITLE OF THE VIDEO FROM, "MATT CUTTS DISCUSSES WEBMASTER TOOLS", TO "HOW TO REMOVE A URL FROM GOOGLE! Geez Matt, you know better.

  26. Konrad Braun says:

    @phxariz85020 I thought I was the only one wondering about that one. lol

  27. Creet says:

    You have wery nice ideas my site is maxiklicks.de and i have a very good googel pr

  28. MSSupport PrinceGeorge says:

    Matt- Have you ever tried to read the subtitles on a Google Video while following the verbal text? Other then that I really enjoy your videos

  29. Adonis K. says:

    dude you are far far away from 2010 😛

  30. Martin Toshev says:

    great video! Thanks for the tips.

  31. Poetra Anoegrah says:

    thank you for the enlightenment .. I so much know about google…

  32. Selvia Chandra says:

    Dear Sir, I am Herry Makassar, Indonesia. sorry my english not so good. please help to solaved my problem.

    I have error robot 3 . page not found 993 what should I do

    thank you

    Herry

  33. HarrisonHill says:

    04:22 could you craawl this URL, could you craawl this URL :))) Can't stop laughing :))

  34. 0u8124you says:

    Way to go again MATT, another "Good Will Hunting" equasion on the blackboard. No one can figure out but you and a few other MIT Techs and some Google employees. One day there will be a company who rivals google because they DONT ASSUME that the language they speak is understood by the masses.
    Google will then become the PanAm of the internet. A bankrupt defunk company who preaches to themselves. GOOD LUCK TRYING TO VERIFY YOUR WEBSITE on GOOGLE. Tool difficult and time consuming….DAMM…

  35. Dushyant Verma says:

    Thanks Matt

  36. Rafael Molinaro says:

    Nice

  37. ggg says:

    In the first point "don't link to a page" Cutts makes a claim about the Referer header which is not correct. He claims that if I have a webpage “A” open in my web browser and then type in the address of some other page “B” that I do not want to be indexed the browser will send the URL of page “A” as the value of the Referer header for the request made to retrieve the content of page “B”. This is not what the Referer header is for and I am not aware of any browser which does this.

  39. BluCoder says:

    The title of the video ("Matt Cutts Discusses Webmaster Tools") is misleading, please fix it.
    In the video Matts basically talks about how to prevent search engines to index a given web page.

  40. DecimateTheGamer says:

    hey

  41. Chandan Kumar Mehta says:

    Webmaster Tools, I use Daily, Its a Fantastic Web Application for Websters all around the Globe.

  42. Abdul Haseeb says:

    Thanks Matt, i was using meta tag 'CONTENT="noindex,nofollow" , I really Forgot to down those tag after developing website.

  43. Spook SEO says:

    Very educational video Matt! Webmaster tools are important for any strong SEO effort. It helps us to see our website as Google sees it. Using Webmaster Tools gives us insights into what pages have been indexed on our sites, what links are pointing and our rich keywords. Thanks for mentioning all the indicators that we can use in Webmaster Tools. Very well said again!

  44. FamousOn Web says:

    Nice Ideas given by Mr Matt Cuts to remove URL from the Google webmaster account. Thanks Matt Cuts for such an amazing Video.

  45. Aylin Zortu says:

    Really ?

  46. Cool Tony says:

    Thanks Matt. Helpful video. TT

  47. msbaddie marie says:

    Peppy pig

  48. test2 narasimha says:

    How to remove my links from other webmaster tools?? it is difficulty work that other webmaster didn't responded to mail for particular link to remove!! could you suggest us to remove bad links from other webmaster tools

  49. Networkrr says:

    I would really love to see a 2018 updated video reversion of this content. Many others probably would too!
