Posts Tagged ‘Site Architecture’

6 Not So Obvious Types of Duplicate Content

Friday, March 5th, 2010

When thinking about duplicate content, we generally only consider written content. Is what you are posting on your website original? Simply copying and pasting something from somewhere else is a big mistake – that much is obvious.

But something you may not consider to be duplicate content may be considered such by search engines like Google, Yahoo! and Bing. You see, they have a vested interest in ensuring that what they display on page 1 is helpful and diverse for their users.

That is what you have to consider – what do search engines consider duplicate? Not doing so could spell disaster for your site’s rankings. For instance, penalties can occur simply because a site is structured the same way as another.

Continue reading for 6 not so obvious types of duplicate content to ensure you are not penalized for such an infraction.

1. Two websites share the same structure and content

Two websites that have the same structure (e.g. the same three-column template), the same content on a single page or site-wide, and the same linking scheme are prone to trouble. This is by far the most extreme example of duplicate content, but also the easiest to identify.

2. Identical structure with paraphrased content

In this scenario, two sites have an identical structure but the content is not 100% identical. Copywriters and content developers may see this as a grey area. But Google has a zero tolerance policy on this issue…content from one site simply cannot be a rehashed version of the same thing from another site.

3. Identical structure with similar content

In structural terms, it’s pretty clear the two sites are identical. In this situation, the content on each site still bears too close a resemblance to the other. If it appears the content is managed in a similar fashion and presented in the same scope, the site(s) may be penalized.

4. Partially identical structure with similar content

While it may seem like splitting hairs, Google is very meticulous. Site A and Site B may only have a few pages that are identical, but if the content between the two sites is sufficiently similar, Google may take action and not index one of the sites.

5. Identical structure with reminiscent content

In this scenario, both sites have a similar structure and linking scheme while the content is relatively similar. Some content developers may think that simply using a thesaurus to change a few words will avoid detection, but the search engines can spot this kind of move.

6. Unique structure with pieced together content

Two sites may have their own unique site structure and linking scheme but their content is simply scraped together from different sources the writer found. Search engines will flag this as duplicate content and act accordingly.
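To get a feel for why swapping a few words rarely hides a copy, here is a rough sketch in Python. It compares two texts by their overlapping three-word sequences ("shingles") and reports a Jaccard similarity score. This is only an illustration of one well-known comparison technique, not a description of how Google or any other search engine actually detects duplicates, and the function names and the shingle size of 3 are assumptions made for the example.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word sequences (shingles) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Very short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(text_a, text_b, size=3):
    """Share of shingles the two texts have in common, from 0.0 to 1.0."""
    a = shingles(text_a, size)
    b = shingles(text_b, size)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


original = "we build search engine friendly websites for small businesses"
reworded = "we build search engine friendly websites for small companies"
print(jaccard_similarity(original, reworded))
```

Changing a single word only alters the few shingles that contain it, so the score stays high even after a thesaurus pass.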

Images, videos and other document formats are sometimes ignored by the search engines since most don’t have the capability to spot duplicate forms of these types of content. They do, however, sometimes attempt to remove duplicates based on file size, image size and file name. It will therefore be important to keep this in mind as the technology continues to evolve.

It should be obvious that simply copying and pasting content to your site is not only dishonest, it also robs the original creator of that piece of due credit and compensation. But these other scenarios where search engines may flag your site are just as important. While you may not think your site is a duplication of another, what the search engines see is really what matters.

Do Meta Tags Really Matter?

Monday, March 1st, 2010

As far as propelling your website to the top of the search engines, they don’t. While Meta tags have no significant impact on actual search engine rankings, they do provide value in how your website appears on a search engine results page (SERP).

Meta tags are basically text included in the source code of an HTML document that’s intended to describe the page to a search engine for the purpose of cataloging its content. There are two types of Meta tags – description and keyword.

So do Meta tags matter?

Yes they do, as the description found within the tag indicates what you want someone to see on a search engine results page. It helps a searcher easily determine whether or not your page is relevant to their needs. Without it, many people will simply move on and think your site doesn’t offer them any value.

If you do not include a Meta description tag in your source code, the search engine will scan your page and cherry-pick the words it thinks best describe it. This doesn’t work too well however and can result in terrible descriptions being displayed on a SERP.

You should be very careful in how you use a Meta tag though. Many SEOs have abused these tags in the past thinking it would garner them a competitive advantage. To avoid any potential problems, avoid repeating keywords and use only those words relevant to your site’s theme. Beware of any trademark infringements and check with legal counsel before using another company’s trademarked terms anywhere in your source code.

Typically, the character limit for both description and keyword Meta tags is 250, which includes spaces and commas. Anything past the 250 mark is generally ignored by the search engines.
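If you want to check your own pages against that limit, a small script can do it. The sketch below uses Python’s standard html.parser module to pull out the description and keywords Meta tags and warn when one is missing or runs past 250 characters. The 250 figure is simply the limit quoted above, and the names MetaTagParser and check_meta_tags are made up for this example.

```python
from html.parser import HTMLParser

MAX_META_LENGTH = 250  # the limit quoted in this post, not an official figure


class MetaTagParser(HTMLParser):
    """Collect the content of the description and keywords meta tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name in ("description", "keywords"):
            self.meta[name] = attributes.get("content") or ""


def check_meta_tags(html):
    """Return a list of warnings for one page's HTML source."""
    parser = MetaTagParser()
    parser.feed(html)
    warnings = []
    for name in ("description", "keywords"):
        content = parser.meta.get(name)
        if content is None:
            warnings.append("missing " + name + " meta tag")
        elif len(content) > MAX_META_LENGTH:
            warnings.append(
                "%s is %d characters (over %d)" % (name, len(content), MAX_META_LENGTH)
            )
    return warnings


page = '<head><meta name="description" content="SEO tips for small sites"></head>'
print(check_meta_tags(page))
```

An empty list means both tags are present and within the limit.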

Just be careful – improper use of a Meta tag could result in your site being penalized by the search engines.

Characteristics of Natural and Artificial Links

Wednesday, February 17th, 2010

Other sites linking to yours is one way search engines evaluate your site to determine where it should be displayed in a search engine results page. The more incoming links a site has, the more important the search engines consider it.

But it depends on the type of links too – simply having a bunch of links pointing to your site isn’t going to pass muster.

It’s possible to go out and pay lots of money and do other nefarious things to get links to your site. Search engines like Google and Bing can see this, as they can differentiate between sites that have natural links pointing to them and ones that have artificial links.

So what’s the difference between the two?

First, the anchor-text, or the keywords that make up a link, is very diverse with natural links. One link to a site may contain “search engine optimization firm” and another may be “online marketing experts” for example. Artificial links though will have more uniform anchor-text…all of the links pointing to a site will use only one or two terms for their anchor-text.

This is one red flag to the search engines that you have an artificial link structure, which in turn causes your site to lose the rankings battle.
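You can get a rough read on your own anchor-text diversity with a few lines of Python. The sketch below assumes you already have a list of the anchor texts used by links pointing to your site (exported from whatever backlink tool you use) and works out what share of links each phrase accounts for. The function name anchor_text_share and the sample data are made up for illustration.

```python
from collections import Counter


def anchor_text_share(anchor_texts):
    """Return (anchor text, share of all links) pairs, most common first."""
    normalized = [text.strip().lower() for text in anchor_texts if text.strip()]
    total = len(normalized)
    if total == 0:
        return []
    counts = Counter(normalized)
    return [(text, count / total) for text, count in counts.most_common()]


sample = [
    "search engine optimization firm",
    "online marketing experts",
    "Search Engine Optimization Firm",
    "search engine optimization firm",
]
for text, share in anchor_text_share(sample):
    print(text, round(share * 100), "%")
```

If one or two phrases account for nearly all of your links, that is the uniform pattern described above.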

Another difference between natural and artificial links is the rate at which links appear. Sites with a natural link structure will see consistent increases in their link count while sites with an artificial link structure will see sudden and dramatic increases then a lull in activity.

Sites designed around a natural link structure do not have reciprocal links. Meaning, the site linking to them did it voluntarily and does not expect a link back in return. Almost all links in an artificial environment are reciprocal.

And finally, natural links point to resources that can be of further use to the reader. Artificial links mainly point to link farms and other places that serve no purpose in making the site more useful for its visitors.

Remember these differences when thinking about your site’s link structure. You should strive to create the most natural looking link structure possible. From a search engine’s point of view, the best links are those that are unrequested…search engines reward those pages and sites that get voluntary links for great content.

Finding a Proper Balance of Links for your Website

Wednesday, February 10th, 2010

Search engines like Google, Bing and Yahoo! find your website through other sites linking to it. A site with a large number of quality sites linking to it signifies a certain importance to the search engines, boosting your rankings in the process.

There are many ways you can acquire links to your site. They can be purchased from a link farm, or you can get people to link to you through social networks like Facebook, StumbleUpon and Digg. In the end, the highest quality links come from sites in a similar industry whose audience will find your content appealing and useful.

Allowing the structure of links to your site to become too homogeneous can have many negative consequences for your site and its rankings. Links coming from only one type of site, links pointing only to your homepage, and links that all have the same anchor text are all red flags to the search engines that your site has an unnatural link structure.

As a result, search engines will penalize your site, perhaps even de-listing it altogether.

To avoid trouble like this, you should attempt a general 80/20 link balancing act, which means:

  • 80% of your links should come from sites that are topically relevant to yours with the remaining 20% coming from unrelated or marginally related sites
  • 80% of incoming links should go to your homepage with the remaining 20% (at minimum) going to sub-pages within your site
  • 80% of links should have your keywords in the anchor text while the remaining 20% have a less optimized link, like “click here” or your URL as the anchor text
  • 80% of your links should be one-way and the remaining 20% reciprocal

Of course, these are just general guidelines but a good rule of thumb to avoid any problems with the search engines. You don’t want your site to appear over-optimized to the search engines so you need to balance your link ratios to avoid this red flag.
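To see how your own link profile compares with these ratios, you could tally them up in Python. This is only a sketch: it assumes you have already classified each incoming link by hand or with a tool, and the InboundLink fields and the link_balance function are hypothetical names invented for the example.

```python
from dataclasses import dataclass


@dataclass
class InboundLink:
    """One incoming link, classified along the four 80/20 guidelines."""
    topically_relevant: bool
    points_to_homepage: bool
    keyword_in_anchor: bool
    one_way: bool


def link_balance(links):
    """Return the percentage of links for which each property is true."""
    total = len(links)
    if total == 0:
        return {}
    fields = (
        "topically_relevant",
        "points_to_homepage",
        "keyword_in_anchor",
        "one_way",
    )
    return {
        field: 100.0 * sum(getattr(link, field) for link in links) / total
        for field in fields
    }


links = [InboundLink(True, True, True, True)] * 4 + [
    InboundLink(False, False, False, False)
]
for name, percent in link_balance(links).items():
    print(name, percent)
```

Each figure can then be compared with the 80% target in the matching bullet above.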

Building Internal Link Structure after Google Indexes your Site

Friday, February 5th, 2010

Just what is the best way to unveil a new or vastly expanded site to the world? What I mean by “best way” is the best method for achieving high search engine rankings quickly.

There’s no universal way to answer that question. Every SEO/SEM has their own strategies that they implement, test and tweak. Simply throwing something up there and forgetting about it is a terrible idea.

But an interesting way of rolling out a new or renovated site was explored on a recent WebMasterWorld discussion thread. A senior member of the community, Wheel, is expanding a site he manages from 21 to approximately 5,000 pages. He’s looking to take a new approach to rolling out his site – let Google index all of it up front then go back and use Google and the site command to determine which pages to internally link to.

A popular SEO tactic is linking to other pages in a website from popular keywords. This gives you an added boost in the search engines for that keyword phrase.

What’s different about Wheel is that he’s going to post all of his pages at once and let Google go ahead and index them. He says he’s doing it this way because he has so much content that it would be impossible to sort through it all. Therefore, he will go ahead and get it all indexed then use the site command with keywords…[site:wheeldomain.com keyword+here]… to find the pages that contain the specific word(s) he wants to rank for. He will then choose the strongest pages and link to other pages on the site with that keyword.

Interesting method indeed, and it drew a mixed response in the forum, since it may initially seem backwards to most search engine optimization professionals. Some say Google will degrade his site outright while others think it would be wiser to unveil the site in bits and pieces rather than all at once.
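If you have a long list of keywords to check this way, you can generate the queries rather than type each one. Here is a minimal Python sketch that builds a site command query for each phrase. The function name site_query and the sample phrases are made up for illustration, and wheeldomain.com is just the placeholder domain from the example above. Note that the operator is written without a space after the colon.

```python
def site_query(domain, keyword_phrase):
    """Build a 'site:' query that limits results to a single domain."""
    domain = domain.strip()
    phrase = " ".join(keyword_phrase.split())  # collapse stray whitespace
    if not domain or not phrase:
        raise ValueError("both a domain and a keyword phrase are required")
    # No space between 'site:' and the domain.
    return "site:" + domain + " " + phrase


target_phrases = ["keyword here", "another keyword"]
for phrase in target_phrases:
    print(site_query("wheeldomain.com", phrase))
```

Each line it prints can be pasted into the search box to see which indexed pages mention that phrase.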

One reply to Wheel’s question at the bottom is pretty interesting – I suggest taking a look at it.

Where Does Site Traffic Come From?

Monday, January 11th, 2010

Of all the online marketing channels – organic search/SEO, referrals and PPC – where does the majority of traffic to a site originate from?

Does someone do a search on Google using keyword phrases to search for the products and/or services you offer online?

Or, are they referred to your site from an online directory like YellowPages or Google Maps? Or, do they see your PPC or social network ad?

Data recently compiled by HubSpot shows that organic search is the primary driver of traffic to websites – which underscores the importance of making them search engine friendly. From a survey of 2,100 of its customers, the company found that site traffic coming from online searches is 67.2% greater than from referral sites and 156% greater than from PPC.

They further break the data down by industry – traffic from search engines is much higher in manufacturing, medicine/health services and retail. Referrals play a more important role in other industries like technology, software and online marketing but still do not exceed online search as a primary source of traffic.

So from this data, it really depends on your industry in determining what you allocate to each of these online marketing areas.

In terms of organic search and SEO, Google is by far the most popular search engine still, handling 71% of online searches this past November according to Hitwise. The two closest were Yahoo! at 15% and Bing at 9%.

It’s clear though – having a website optimized for the search engines is key to driving traffic.

Use Caution with Session IDs and Dynamic URLs

Wednesday, January 6th, 2010

In order for a spider to crawl your website and index it in the search engines effectively, the web address or URL for your webpages should be as simple as possible.

As we’ve discussed in the past, sites with static URLs that are simple are crawled and indexed much more efficiently than those containing dynamic characters and session identifiers.

Session IDs are most common in ecommerce sites. They are embedded in a URL so the website can track customers from page to page and keep track of the items in a customer’s shopping cart. But these IDs cause problems for search engine spiders because they create a large number of links for the spider to crawl. This can create a situation where the search engine indexes essentially the same page over and over. Search engines like Google refer to it as a ‘spider trap’.

Below are a couple of examples of how session IDs can give the appearance of an endless number of pages within a single site. A spider coming to your website may find a page with the following URL:

http://www.yoursite/shop.cgi?id=dkom2354kle03i

This page gets indexed but when the spider returns later to look for new content, it finds the following:

http://www.yoursite/shop.cgi?id=hj545jkf93jf4k

This is actually the same page as before, just with a different session ID, but the spider sees it as a brand new URL. Because of this confusion, search engine spiders are programmed to avoid pages containing these session IDs.

While Google and others are trying to improve their ability to crawl URLs with session IDs, it’s best to avoid them until you absolutely must track what a customer is doing, like when they start adding items to their shopping cart.
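To make the problem concrete, here is a small Python sketch that strips the session parameter from a URL so that both example addresses collapse into one. It uses the standard urllib.parse module. The parameter name "id" is taken from the example URLs above, and the function name strip_session_id is made up for illustration. Your own shopping cart software may use a different parameter name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: "id" is the session parameter, as in the example URLs above.
SESSION_PARAMS = {"id"}


def strip_session_id(url, session_params=SESSION_PARAMS):
    """Return the URL with any session ID query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in session_params
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


first_visit = "http://www.yoursite/shop.cgi?id=dkom2354kle03i"
second_visit = "http://www.yoursite/shop.cgi?id=hj545jkf93jf4k"
print(strip_session_id(first_visit) == strip_session_id(second_visit))
```

Once the session ID is gone, the two addresses are identical, which is exactly what the spider cannot work out on its own.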

It’s also possible to store session IDs in cookies rather than URLs. Changing this may require the expertise of a web programmer though.

The gist of the story is this – the more dynamic variables in a URL, the more difficult it will be for search engines to index your pages. To maximize your position in the search engines, use simple URLs that are easy to locate, crawl and index.

What Extension Should I Choose for My Domain?

Wednesday, December 23rd, 2009

Choosing a domain name for your new website is the first step to developing your online brand and building rankings in the search engines.

But in addition to the domain name, you need to choose the domain extension as well – the .com, .net, .org, .biz or .info at the tail end of a web address.

In terms of ranking high in the search engines, .net and .org extensions are given the same weight as .com. Plus, you are likely to find more domain names available with these extensions, and they can often be purchased from their owners for less than the .com equivalents.

.com extensions hold some advantages, mainly because of most web users’ familiarity with that domain extension. Not controlling the .com version of your domain means you could perhaps lose out on what’s called type-in traffic, or traffic that comes when a searcher types their query directly in their browser’s address bar.

Also, if someone else owns the .com version of your domain name, they can possibly bleed traffic from your site if people type in your domain with the .com extension. This is okay if your main goal is to rank high in the search engines but if you think this diversion of traffic will be a problem, be sure you can at least control the .com version of your name or choose another name altogether.

If you’re based outside the United States, or your target market is, you can also consider country-specific domains like .co.uk (United Kingdom) or .co.in (India) for example. You will certainly garner an advantage in the search engines for people in the respective country performing search queries.

.info extensions are generally very cheap and are often abused by spammers, which is why they’re not recommended for building rankings in the search engines. The other domain extensions you’ve probably seen, .gov and .edu, are reserved exclusively for agencies of the U.S. government and recognized educational institutions, respectively.

Links from these sites though are extremely valuable.

Importance of Good Information Architecture

Monday, December 7th, 2009

Having a successful content and information oriented website means it has to be organized in a way that’s easy for users to navigate. This not only improves your conversion rate but your site’s rankings in the search engines as well.

It’s all too common for sites to have a lot of content – articles, blogs, video clips, photos, etc. – that’s totally disorganized and cluttered with noise and ads. Suffice it to say this does not lead to a good experience for any user. Without a good user experience, no amount of optimization will help your site’s rankings.

So how can I be sure my site’s information architecture is the best it can be to lure in the most visitors and make the most conversions?

Understanding how people search online is the first step to developing good information architecture. When searching online, we want content that’s fast and simple and in small chunks…we like to stay on task. Google knows this, so to achieve high rankings, create sites using keywords you know people respond well to. You can integrate head and long tail keywords to tap into the main terms people use when looking for what you’re offering.

Having too many links to off-site pages especially messes with a site’s information architecture. Having too much scattered and loosely connected information causes the site/page to lose its core message.

Accommodating your users is the number 1 goal of your website’s content. For it to work to your maximum advantage, it has to be set up in a way that doesn’t inhibit user friendliness or the search engine’s ability to crawl it.

Search engines look closely at user-friendliness when ranking websites. And especially since Google may begin factoring site speed into their ranking algorithm, flashy sites undoubtedly will suffer in terms of their ranking and conversion.

Google’s Search Engine Ranking Factors

Wednesday, November 25th, 2009

Just what are the factors Google uses in their algorithms to rank sites in the search engines?

No one knows exactly of course – search engine optimization professionals have been trying to figure out how search engines rank sites since even before Google was born.

At the recent PubCon conference, Google software engineer Matt Cutts commented that there are over 200 ranking factors in Google’s algorithm. So, SEOs on WebMasterWorld are starting to write down what these factors may be. There are only a few on there now so they have a long way to go.

Of course, determining the significance placed on each of these is a whole other kettle of fish. Search engine optimization pros have been trying to do this for years now. But as time drags on and more websites come online, this has only gotten more difficult.

Some of the major ranking factor categories include: domain, architecture, content, linking, penalties and more.

A list like this can be useful on some level…but knowing which elements carry more weight is what’s more important and where you need to focus your effort.

HTML Sitemap or XML Sitemap – Which is more valuable?

Friday, October 9th, 2009

Google software engineer Matt Cutts answers a question from an SEO in India about whether it’s better to build an HTML sitemap or an XML sitemap.

An HTML sitemap is basically a good old fashioned landing page that contains links to all the pages on your website. It’s very useful for users trying to locate specific information. Larger sites may require several HTML sitemap pages but for smaller sites, they are a perfect resource for visitors trying to figure out what’s on your site.

Check out SEO Advantage’s HTML sitemap here…as you can see, all of our important pages are linked from this page. A visitor looking for SEO services can go here and easily find what they’re looking for.

An XML sitemap can span several files but is meant only for search engine spiders, not for visitors.

So when asked which one to prioritize, Cutts says an HTML sitemap since it’s viewable by both site visitors and search engine spiders.

Remember, Cutts is a software engineer, not an SEO professional, but he is correct in his assessment of the workability of each type of sitemap.

Creating an XML sitemap is very easy once you create an HTML sitemap according to Rusty Brick at S.E. Roundtable. Therefore, he suggests making both so you can be sure all of your pages are properly indexed.
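To show how little work is involved, here is a rough Python sketch that reads the links out of an HTML sitemap page and writes them into the XML format defined at sitemaps.org. It only uses modules from the standard library. The function name html_sitemap_to_xml, the sample page and the example.com address are made up for illustration, and the sketch assumes every link on the HTML sitemap belongs to your own site.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urljoin

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


class LinkCollector(HTMLParser):
    """Collect the unique link targets found on an HTML sitemap page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            url = urljoin(self.base_url, href)  # resolve relative links
            if url not in self.urls:
                self.urls.append(url)


def html_sitemap_to_xml(html, base_url):
    """Return XML sitemap text listing every link in the HTML sitemap."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in collector.urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


html_sitemap = '<ul><li><a href="/services.html">Services</a></li></ul>'
print(html_sitemap_to_xml(html_sitemap, "http://www.example.com/"))
```

A real sitemap file would also start with an XML declaration line, and a very large site would need to split its URLs across several files.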

Watch the quick video to learn more.

Customize your Site’s Appearance in Google

Monday, October 5th, 2009

For all the talk here about web site customization and organic search engine rankings, one point we haven’t mentioned is customizing how your site appears in Google’s search engine results page.

That’s right, you can now easily customize how your listing appears in Google…before you were limited to just titles and descriptions but now you can include star ratings, product images, prices, business addresses and more.

Look at this example…you can see a star rating system, number of reviews and a price range.

Google’s Rich Snippets feature displays this information, pulling it from special tags embedded in the page’s HTML code. Those special tags come in two forms: microformats or RDFa.

While they sound complicated, each of these formats is pretty easy to master. Developers have yet to settle on a standard but Google accepts both. To denote data to be displayed on your Google listing, you simply wrap it in tags with descriptive class attributes in one of these two formats.

Here’s what the code would look like for Café Cakes:

    <div class="hreview">
      <span class="name">Café Cakes</span>
      <span class="rating">4</span> out of 5.
      <span class="count">28</span> reviews.
      <span class="pricerange">$</span>
    </div>

The “hreview” class tells Google that it’s a review…the other information is added using the name, rating, count and pricerange span classes.
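If you want to double check that your markup actually carries the values you think it does, you can read it back with a short script. The Python sketch below uses the standard html.parser module to pull the text out of each span by its class name. It mimics only the simplified Café Cakes markup shown above and says nothing about how Google itself parses a page. The ReviewParser name is made up for illustration.

```python
from html.parser import HTMLParser


class ReviewParser(HTMLParser):
    """Read the text of each span whose class is one of the review fields."""

    FIELDS = ("name", "rating", "count", "pricerange")

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None  # field whose text we are waiting for

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for field in self.FIELDS:
            if tag == "span" and field in classes:
                self._current = field

    def handle_data(self, data):
        if self._current:
            self.fields[self._current] = data.strip()
            self._current = None

    def handle_endtag(self, tag):
        self._current = None


markup = """
<div class="hreview">
  <span class="name">Café Cakes</span>
  <span class="rating">4</span> out of 5.
  <span class="count">28</span> reviews.
  <span class="pricerange">$</span>
</div>
"""
parser = ReviewParser()
parser.feed(markup)
print(parser.fields)
```

Note that the class attributes must use plain straight quotes. Curly quotes pasted in from a word processor will stop the class names from being recognized.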

For now, Google only has Rich Snippet Listings for marketing restaurants online. They are working to add more categories but currently, business directory sites and others based on user reviews and categorizing businesses stand to gain the most. But Google is rapidly expanding this program so Rich Snippets is likely to become more relevant to many other types of websites in the future as well.

Search Engine News says “…listings that are enhanced with Rich Snippets can expect to increase their click through rate – so we highly recommend them.”

Google has some great examples and tutorials on the following Rich Snippets: Reviews, People, Products and Businesses and organizations. And learn much more about Google’s Rich Snippets in general here.

Google Software Engineer Says Google Does Not Use Meta Keyword Tag in Determining Rankings

Friday, September 25th, 2009

Recently, Google’s Webmaster Blog has received a number of questions regarding the search engine’s use of meta keyword tags, specifically how they use them for ranking websites.

To my surprise, Google totally disregards meta keyword tags in ranking websites for the search engine!

So why do they not use these identifying features we put into our web pages?

In the late ’90s, Google and other search engines looked only at content and weren’t as concerned with the number of links pointing to a page as they are now. It didn’t take long for keyword meta tags to become a place where dishonest webmasters would stuff irrelevant keywords the public would never see to accelerate their rankings.

Since this abuse was becoming a problem, Google quit looking at keyword meta tags.

But meta description tags are useful and shouldn’t be disregarded in your web pages…search engines like Google sometimes use them in the page descriptions you see right under the link to your page in the search results.

But as far as your rankings are concerned, Google doesn’t use the meta description tag at all and it’s unlikely this information will be used in the future.

Read more about this on Google’s Webmaster Blog and watch the short video of Matt Cutts explaining how they use meta keyword and description tags.

Target Web page Indexing with your Robots.txt File

Thursday, September 10th, 2009

This strange sounding name isn’t some alternate website personality.

A robots.txt file is a simple text file placed in the root directory of a website that is used to provide instructions to a search engine spider that crawls and indexes your website.

Specifically, the file tells search engine spiders, which are actually computer programs, which pages NOT to index.

So why would you not want some of your pages crawled and indexed?

Well these computer programs have limited time and resources. You want them to spend their time indexing the high value pages on your site – ones with important content, product listings and sales pages.

Pages containing a shopping cart checkout for example are not that important so you do not want the spider to waste valuable resources and time indexing that. Anything in your cgi-bin folder and directories containing images or sensitive company info shouldn’t be indexed either.

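That’s another function of a robots.txt file – it helps keep sensitive areas of your site out of the search results. Search engine spiders will crawl and index just about anything they can get their hands on, including sensitive places like password files. Keep in mind that the file itself is publicly readable and only well-behaved spiders obey it, so it is no substitute for real access controls.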

One more very important thing about your robots.txt file – adding the following two items (User-agent: * and Disallow: /) to your file can prevent all search engines from ever indexing any of your site. The asterisk is a generic symbol for all and the forward slash in the disallow command indicates the root directory, meaning everything you have.

To prevent only certain places on your site from being crawled and indexed, spell them out in the “disallow” line (e.g. Disallow: /cgi-bin/).
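Before uploading a robots.txt file, it is worth testing that the rules do what you expect. Python ships with a robots.txt parser in its standard library (urllib.robotparser), so a quick check might look like the sketch below. The example.com addresses and the helper name build_parser are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt):
    """Parse robots.txt text and return a parser ready for can_fetch()."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Block only the cgi-bin folder for every spider.
partial_rules = """\
User-agent: *
Disallow: /cgi-bin/
"""

# Block the whole site for every spider.
block_everything = """\
User-agent: *
Disallow: /
"""

partial = build_parser(partial_rules)
print(partial.can_fetch("*", "http://www.example.com/cgi-bin/cart.cgi"))
print(partial.can_fetch("*", "http://www.example.com/products.html"))

everything = build_parser(block_everything)
print(everything.can_fetch("*", "http://www.example.com/products.html"))
```

With the first set of rules only the cgi-bin address is refused, while the second set refuses every page on the site.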

Of course, if you want every webpage in your site crawled and indexed, there is no need for a robots.txt file.

Optimize Web Pages for Search Engines – Be Very Careful with Frames, JavaScript and Flash

Thursday, April 23rd, 2009

Building a new web site? Or simply updating an existing one?

A new article at the search engine optimization knowledge center from SEO Advantage addresses three features of web pages that can hinder your rankings and how to deal with them.

Pages using Frames, JavaScript and Flash CAN be optimized for the search engines – BUT, each presents its own unique challenges.

It’s best not to use Frames, JavaScript and Flash, since web pages without them are easier for search engines to crawl and index…but there are ways to minimize any negative effects of these features if they are a must.

Read the article here.