Over the past couple of weeks, one of the biggest concerns about SEOmoz’s new Linkscape tool (which I recently blogged about in reference to the bots that Rand refuses to identify, and then again due to the suspicious addition of a phantom 7 billion pages to one of his index sources) has been the complete lack of any method for someone to remove their data from the tool. Assuming that all of the hints Rand has been so “subtly” dropping are accurate, and the one bot that they do actually control is in fact DotBot, then the data was collected under false pretenses from the beginning. The DotBot website clearly states the following as its purpose:
Our purpose is rather simple. We want to make the internet as open as possible. Currently only a select few corporations have a complete and useful index of the web. Our goal is to change that fact by crawling the web and releasing as much information about its structure and content as possible. We plan on doing this in a manner that will cover our costs (selling our index) and releasing it for free for the benefit of all webmasters.
If, again, DotBot is owned by SEOmoz, then the actual goal of collecting those webpages was the development of a commercial tool. With that in mind, Rand’s refusal to remove pages from the index when their owners do not want them there takes on a whole new level of unreasonableness. When pressed about it, this is the most Rand is willing to compromise as far as removing sites from the index:
3) SEOmoz will ONLY remove your site from DISPLAYING your data through Linkscape if you add a customized SEOmoz meta tag to each and every page on your site, and even then, only after a 30-60 day time period.
Yes, although we are looking at ways to block an entire site from being shown in the future through a registration system. And yes, we can’t block anything until we’ve re-crawled and re-indexed that page, which can take 30-60 days depending on the speed with which we crawl/re-crawl a given URL.
4) SEOmoz is “unwilling to provide a clear concise way to keep data out of Linkscape.”
That’s what you said, and I merely copied it to point out that it had an exception. I know it’s a fun sound bite, but without the important caveat in the sentence it was in, it’s really unfair to keep using this phrase. That caveat is that we are willing to provide one clear, concise way to keep data out of Linkscape – the seomoz noindex meta tag.
So, the only way Rand will voluntarily remove your site from his index is if you agree to basically brand your website with a meta tag using his company name, and then wait 30-60 days. Unfortunately for him, that’s really not his call.
You own your website and the data it contains (assuming you did not scrape it from somewhere else, of course), and that ownership is protected under US copyright law. Anyone whose rights are violated under that law has specific remedies available to them under the Digital Millennium Copyright Act.
Now, I cannot stress this strongly enough… these remedies are not intended to harass a website owner. They should be used neither frivolously nor fraudulently, and there are penalties for filing false information. You should under no circumstances perform this process for any URLs or domains that you do not explicitly own, and if a counter-notification does get filed then you should in fact follow through with a lawsuit.
For all valid claims, I am outlining an easy-to-follow process for requesting that your information be removed from his index.
First, verify that your content is indeed in their tool. If it is, then the next step is to contact SEOmoz directly. Give them a chance to rectify the situation in a timely manner. Send a polite request that your entire domain be completely removed from the index powering their Linkscape tool, and ask for a way to confirm that the removal has been done once it is complete. The support email for SEOmoz is listed on the site as sitesupport@seomoz.org, or you can fax them the request at (206) 338-3797. In this request you should state who you are, the address of your domain, and your contact information. Despite Rand’s insistence that they cannot do this, it might turn out that they do in fact have the ability after all. Do not skip the step of contacting them first. For tracking purposes, you might want to CC their ISP on this initial request, to document that you did indeed attempt to resolve the issue with them first, although this is not required. If you do decide to do that, SEOmoz’s ISP is HopOne Internet Corporation. The appropriate email to use for these matters, according to HopOne’s AUP, is abuse@hopone.net, and their fax is (604) 608-2953.
If after a reasonable amount of time, say, 24 hours, they still have not removed your site’s information, then you can consider sending a formal DMCA letter to their ISP, HopOne. The requirements for such a letter are very specific, and are laid out in 17 U.S.C. § 512(c)(3), “Elements of notification”. A sample DMCA notice for this purpose might look something like this:
To: abuse@hopone.net
Subject: Notice of Copyright Infringement
The copyrighted work at issue is the entire set of links appearing on my domain at {www.mydomain.com}, each comprised of their respective URLs, anchor texts, and attributes, including both those constituting my website’s navigation, as well as those linking my website to other websites on the Internet. While I acknowledge that an individual URL in and of itself may not be copyrightable, I maintain that the set of links residing on my website, taken as a whole or in sections, does in fact comprise a structure that is unique and my own property.
The freely accessible URL where my copyrighted material is located is accessed through the gateway page located at http://www.seomoz.org/linkscape . Since the interface that is displaying my content is only visible via an HTTP POST request, it is necessary to enter my domain {www.mydomain.com} into the text box presented, and then press the button labeled “GO”, in order to view the infringing material. Note that while this does demonstrate the existence of the infringing material being used on the server, it is only the portion open to the general public without paying a fee. This request is for the removal of the information from the index completely, including from areas accessible only to paying members of the website.
The contact information for the company of the infringing website, as indicated by their Contact Us page, is as follows:
Office: (206) 632-3171
Fax: (206) 338-3797
sitesupport@seomoz.org
SEOmoz.org
1221 E. Pike St., Suite 200
Seattle, WA 98122
I can be reached at {your@email.com}, or via telephone at {your telephone number}. My mailing address is {your full mailing address, including street and number, any apartment number, city, state, and zip code}.
I have a good faith belief that use of the copyrighted materials described above as allegedly infringing is not authorized by the copyright owner, its agent, or the law.
I swear, under penalty of perjury, that the information in the notification is accurate and that I am the copyright owner or am authorized to act on behalf of the owner of an exclusive right that is allegedly infringed.
At your earliest convenience, please respond to this letter at my email address listed above, and let me know what actions have been taken to resolve this matter. Thank you.
My electronic signature is below:
{Put Your Name Here}
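Since the sample above relies on curly-brace placeholders, it is worth double-checking that none of them are left unfilled before the notice goes out. What follows is only a minimal sketch in Python, assuming every placeholder looks like `{...}` as in the sample; the helper name `unfilled_placeholders` and the draft text are hypothetical and purely for illustration.

```python
import re

# Assumption: placeholders are written as {...}, as in the sample notice above.
PLACEHOLDER = re.compile(r"\{[^{}]*\}")


def unfilled_placeholders(notice):
    """Return every {placeholder} still present in the notice text."""
    return PLACEHOLDER.findall(notice)


# Hypothetical draft: the email address was never filled in.
draft = "I can be reached at {your@email.com}, or via telephone at 555-0100."

leftovers = unfilled_placeholders(draft)
if leftovers:
    print("Still to fill in:", leftovers)
else:
    print("No placeholders left.")
```

An empty list means every placeholder has been replaced; it says nothing about whether the details you entered are accurate, so read the notice through before sending it.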
Bottom line is, it would be nice if Rand would simply step up to the plate and actually be the nice guy he wants everyone to believe that he is. Until such time as that actually happens, however, as sad as it may be, this may be our only recourse to keep him from using our information without consent.
Actually, you were talking about seven (7) billion false pages in the index, not 11. It is 11 *millions* of pages they earlier claimed to already have, as mentioned in your previous post.
Also, you may want to mention that this rule applies to any search engine or information gathering party and this rule is not targeted specifically at SEOmoz, since you can replace their details with the actual details of your own offending party and send a DMCA notice to anyone, even to Google (if someone scrapes your content, for example). Then again, for the sake of the argument, I guess it’d be offtopic, but I think it’d be a cautious move not to get another “personal attack” claim.
I think everyone knows that DMCA notices are not SEOmoz specific, and can apply to anyone who scrapes your website. I just thought it might be a good idea to point it out as an option to all of those webmasters that Rand is stonewalling.
Fixed the 7 vs. 11 thing, thanks.
Michael – Not a lawyer, but our lawyer looked into this carefully. Basically, we operate under the same rules as search engines. Since we don’t keep any content other than page titles, URLs and anchor text and display not even snippets of content (like the engines) or re-publishing content, I think DMCA isn’t an effective removal strategy. For example, you can’t use DMCA to ask Google to remove your content from their search results, because the courts have ruled that it’s legitimate for them to show those content snippets (and even to show the cache). Linkscape doesn’t even get close to showing as much as the search engines do, and, just like the engines, we have an opt-out via meta tags.
Not trying to rain on the parade, just pointing out that DMCA is designed to protect against copyright infringement and the courts have, in the past, not held this up against search engines or sites that display only snippets of information in results.
If you’d like, I can have Sarah answer this more accurately and completely.
Yeah… search engines… they’re the ones who allow opting out via robots.txt and manual requests, right?
I think you may want to revisit the whole “we’ve got that covered” line of thinking, Rand, and maybe not make the intellectual leap that just because a free search engine has fair use over certain things that your commercial product (that in your head, anyways, does something similar) must fall into the same category.
You are acting under the same rules as a search engine? Yeah okay. We all can freely search at your search engine, and opt out of our stuff being scraped and sold for profit… right. Got it. And this is like Google? I believe you should consult more than a few lawyers on this. Claiming to be just like a real search engine, which does not steal anything on our pages without our permission, is naive. Stealing stuff before the launch, and before any type of opt out procedure is put in place, is bad practice even by the standards of most true blackhats out there. You’ll have to do better than this.
Rand is absolutely correct here. Using the DMCA will backfire severely, and he could win massive damages in a counter claim over a frivolous DMCA takedown.
From the chilling effects FAQ:
“Question: Does copyright protect words or short phrases?
Answer: No. Names, titles, and short phrases are not subject to copyright protection. These are not deemed to be “original works of authorship” under the Copyright Act. Names may be protected by trademark, in some instances. See the Trademark FAQ for more information.”
So if you feel compelled to roll the dice, don’t say I didn’t warn you when your bank account gets transferred to the Linkscape development fund.
BTW, if someone else follows this advice and gets in trouble, guess who’s next in line for posting this advice?
Bill, I’m sorry, but you’re quoting irrelevant statements to what is being discussed. We are not talking about copyrighting “words or short phrases” here. That’s not even relevant to the defense Rand is claiming to be covered under. “Names, titles, and short phrases” would fall under “Bill”, “Doctor”, and “Hello World”.
Some of the reasons that Google and other search engines can get away with displaying copyrighted material have a lot to do with the fact that they give webmasters the means to remove or block their content (which trust me IS copyrighted) via robots.txt and other means.
I’ve seen Linkscape and nothing within the reports I saw would justify a DMCA take down.
I wouldn’t say this lightly because I’ve used the DMCA quite a few times and I know what my lawyer said it could and couldn’t be used for, but I’m no lawyer and suggest you consult with one quickly before you do something as silly as you suggest above.
There were no cache pages displayed, not even a complete snippet, just page titles and anchor text which the legal minds claim isn’t copyrightable, per the Chilling Effects FAQ and quite a few other places I’ve read.
The only way you would have a valid claim is if you hyperlinked an entire article of substantial text and then they displayed that entire article as anchor text, that simply wouldn’t fly as fair use.
If you want to continue and play the DMCA card which doesn’t apply whatsoever in this case, don’t be surprised when it blows up in your face.
BTW, my quote wasn’t irrelevant, I think we’re having a disconnect on the term title. Page titles are “short phrases” which aren’t the same as the type of title referenced in the quote but that doesn’t make it irrelevant since “short phrases” sums up page titles.
Fair enough, although looking at what they say in context (ie. in conjunction with names and titles) “short phrases” is still probably only referring to 3, maybe 4 words.
I still stand by my assertion that the collection taken as a whole has never been tested in court, does comprise a significant portion of the “structure” of a website, and should fall under the protection offered by copyright.
It has never been tested, that part is true… but it would be a far cry from frivolous to feel you have the right to keep someone else from scraping it without permission and then reselling it.
Some firm with bucks just might test this out in court if linkscrape does not give a clear way to opt out of the link data.
The funny thing about all of this is that if this firm would have simply come clean…. totally, at launch, I doubt that webmasters would be this pissed off about things.
If Rand would have stated what he did yesterday:
“we are roguish”
at the beginning, we all would have accepted his decision to go to the dark side and left it at that. His marketing ploy of misleading people into thinking they could “easily” opt out was pathetic. Because of this, someone out there might try the copyright thing. I think both Michael and Bill have good points on both sides, but I highly doubt trying would backfire for all reasons given.
Why can’t Linkscape robot obey the robots.txt like other good robots?
They should be prosecuted for theft.
michael is right. SE’s have to provide a way to opt out via robots.txt
they are a rogue bot and rogue organization at this point. and IMO they are propaganda agents for google. i mean look at the characters involved there. they’re all pretty chummy with cutts. and they buy and sell links as well, they all do dammit.
i do not trust 99% of them anymore, all “SEO’s” seem to do anymore is snitch each other out, complain when they can’t rank, and claim holy ethics but sell PR links right at all their SEO/marketing sites.
and im sure they all own a copy of xrumer and im sure they practice black just as much as “white”
really SEO has become a joke all over again. and who made it the joke that it is? the “SEO Experts”
P.S. i actually like your style of outing, and at least you’re outing the right parties who downright deserve it.
P.P.S. i know this is an old article, i was just really bored. did this ever change? have they released a way to opt out?