Why no Social Engine Optimization?

We have been talking about SEO for the past 6 posts. Now I’m starting to get curious why there is no Social Engine Optimization, given that social networks are so popular and have become on par with search in how people discover things. My guess was that the acronym would clash with SEO, so maybe it is called something like Social Network Optimization (SNO). It turns out it is called Social Media Optimization (SMO). The definition on Wikipedia is:

Social media optimization (SMO) is the use of a number of social media outlets and communities to generate publicity to increase the awareness of a product, brand or event.

To me this does not sound like it has much engineering in it at all; it is closer to the marketing and promotion side. Why would these guys use the word “Optimization”? My guess is they want to sound smart in order to charge more money.

Well, let’s talk about some basic tech stuff that your website or blog needs to work better with social networks/media. First let’s have a look at how my blog is doing on Facebook and Google Plus:

As we can see, Facebook did a much better job than Google Plus on all the posts. For the last one, Facebook picked my background image because that post does not have any image. So yes, social networks will try their best to pick the right meta info to show, but a lot of the time they do a bad job. To help them, you can add some more instructions to your page. Most importantly, these help them pick the right image, but they also cover a lot of other information that they may get wrong or that you want to customize.

Facebook uses Open Graph. Twitter uses Twitter Cards. And Google Plus uses Schema.org markup, the same thing they suggest for SEO. Interesting, but it also makes sense; they are a search company after all. So I guess I should not say Google did a bad job. I did not do the SEO job properly.
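To give a feel for what these extra instructions look like, here is a minimal Python sketch that renders the four basic Open Graph properties (og:title, og:type, og:url, og:image) plus the optional og:description as <meta> tags. The function name and the example.com values are made up for illustration; this is not how WordPress or any plugin generates them.

```python
from html import escape


def open_graph_tags(title, url, image, description):
    """Render Open Graph <meta> tags meant for a page's <head>."""
    properties = {
        "og:title": title,
        "og:type": "article",
        "og:url": url,
        "og:image": image,  # the image the social network should pick
        "og:description": description,
    }
    return "\n".join(
        f'<meta property="{name}" content="{escape(value, quote=True)}" />'
        for name, value in properties.items()
    )


# Hypothetical post values, for illustration only.
print(open_graph_tags(
    title="Why no Social Engine Optimization?",
    url="http://www.example.com/why-no-social-engine-optimization/",
    image="http://www.example.com/images/post-cover.png",
    description="Helping social networks pick the right title and image.",
))
```

Escaping the values matters because a title with a quote in it would otherwise break the attribute.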

I’ll not cover the details of each in this post. You can read about them on their sites, or here is a general introduction article.

SEO Part 5: Microdata – help search engine understand better.

In my previous post, I explained that metadata lets search engines easily grab the most important content of your webpage. But natural language is still hard for machines to understand. You need background knowledge and context to avoid ambiguity. For example, if you have the word “Apple” in your blog post, are you talking about the fruit or the company? The question is extremely trivial for a human. One can read the context, or even just glance at the image in the post, and figure it out in a split second. Computers have improved a lot over the years at processing natural language. Smartphones can take voice commands from users pretty accurately. But this takes a lot of computing power, and the expected input is limited to a small set of commands. If we applied that to a search engine, it would be too slow and result in too many mistakes.

Adding HTML Microdata with the schemas from schema.org can help solve this problem. The Getting Started page on schema.org does a very good job of demonstrating how to do it, and it is very easy. And if you have a brief look at their Schemas page, you will realize that it is probably even more useful for types of websites other than blogs. For a blog post, someone typed the content in and can write proper metadata for the page. But if a page has multiple sections and is dynamically generated, it is better to use Microdata, since each section can just say what it is. This is very important for a website that serves categorized data or sells products online.
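As a rough illustration of the “Apple” example, here is a small Python sketch that wraps the same word in two different schema.org types, so a robot can tell the fruit from the company. The helper name and the property values are my own assumptions; itemscope, itemtype and itemprop are the standard Microdata attributes.

```python
from html import escape


def microdata_item(item_type, properties):
    """Render one section annotated with schema.org Microdata."""
    lines = [f'<div itemscope itemtype="https://schema.org/{item_type}">']
    for prop, value in properties.items():
        lines.append(f'  <span itemprop="{prop}">{escape(value)}</span>')
    lines.append("</div>")
    return "\n".join(lines)


# Same word, two different things (illustrative values only).
fruit = microdata_item("Product", {"name": "Apple", "description": "A crisp red fruit."})
company = microdata_item("Organization", {"name": "Apple", "legalName": "Apple Inc."})

print(fruit)
print(company)
```

Each dynamically generated section can call something like this for itself, which is the “each section can just say what it is” idea.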

apple the fruit

apple company logo


SEO Part 4: Meta, meta, metaaaaaaaa!

First let me apologize to Yoast. I questioned whether they know what they are doing in Part 3. After using their WordPress plugin a little bit, I think they did a good job. Using their plugin, I can easily add metadata to my blog posts. And they have an analysis feature that helps me a lot.

Yoast WordPress plugin for editing metadata

Yoast WordPress plugin that helps you with analysis

You can think of metadata as the abstract of an academic paper, or the highlights of an article, but for your webpage. And the robots from search engines are like lazy people: they mostly judge your page by the metadata you provide instead of the content. Not that they are really lazy (a robot doesn’t even know what lazy is), but there are technical challenges, so they may not be able to understand your page’s content even if they want to. This is especially true for pictures and videos. The same goes for anything you put into Flash, even text, but frankly who is still using Flash except for ads? A picture is worth a thousand words to us humans and also gets over language barriers, but for a robot it is really hard to figure out what is going on. They can easily tell your picture’s size, resolution and a couple of other technical values, but they really don’t know what is in your picture. So you need to add metadata. Some metadata comes with the picture file, for example when and where it was taken and with what camera. But for you, the most important metadata is what the picture is about, and you put that in the image’s title and alt fields.
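If you want to check your own pages, here is a minimal Python sketch that lists the images in a piece of HTML that have no alt text. It only uses the standard library’s html.parser; the class and function names are made up, and treating an empty alt as “missing” is my assumption (an empty alt can be legitimate for purely decorative images).

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collect the src of every <img> that has no usable alt text."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        # A valueless or blank alt is treated as missing in this sketch.
        if not (attributes.get("alt") or "").strip():
            self.missing.append(attributes.get("src", "(no src)"))


def images_missing_alt(html):
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.missing


sample = '<img src="apple.jpg" alt="apple the fruit"><img src="logo.png">'
print(images_missing_alt(sample))
```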

Recently Google announced that they have made some advances in having robots recognize images. But until that is implemented so that every image you post is recognized by the robot and then indexed, your best choice is to give proper text metadata to all your images, videos, etc. to make the robot happy. Hm… Are we already slaves to machines?


SEO Part 3: Sitemap added. Robots welcomed.

Just added a sitemap for my blog: http://www.gordonsun.me/sitemap_index.xml . Then I submitted it to Google Webmaster. Hopefully this will make my content more visible on Google and be worth the hour I spent.

I started by Googling “WordPress sitemap plugin” and got to this one: Google XML Sitemaps. After installing it, it did not work. I assumed this is because the plugin uses the file system, but my blog is on Heroku, which does not allow modification of the file system. I tried to search for solutions but could not find any related answers. Could it be that I’m the first one to do this? Impossible. This article has the exact title that I was looking for, but I could not figure out what it is talking about at all. It is using a different plugin, though: WordPress SEO by Yoast.

Their plugin seems popular and their artwork is funny. But the plugin has a premium version and the free one seems to be just bait. Also, their website redirects https://www.yoast.com/ to their naked domain https://yoast.com/ , which is the opposite of what we discussed in Part 2. This makes me wonder whether they really know what they are doing.

Anyhow, I installed their plugin. Their tutorial is nice. I turned on the sitemap feature. And same problem: 404 not found. But this time the plugin gave me a hint.

As you’re on NGINX, you’ll need the following rewrites:

rewrite ^/sitemap_index.xml$ /index.php?sitemap=1 last;
rewrite ^/([^/]+?)-sitemap([0-9]+)?.xml$ /index.php?sitemap=$1&sitemap_n=$2 last;

So the problem is that the sitemap URL http://www.gordonsun.me/sitemap_index.xml looks like it is accessing a static file. But the file does not actually exist. It needs to be dynamically generated by the plugin, and the web server NGINX does not know that unless the “rewrite” lines above are added to its configuration. I also needed “rewrite ^/main-sitemap.xsl$ /index.php?xsl=main last;” because the NGINX I’m running is a newer version.
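To see what those rewrite rules do to an incoming URL, here is a small Python sketch that applies the same two regular expressions (copied verbatim from the plugin’s hint) to a request path. It only imitates the matching; it is not how NGINX is implemented, and the function name is made up. Note that NGINX substitutes an empty string for an optional group that did not match, which the sketch copies.

```python
import re

# (pattern, target) pairs mirroring the two rewrite lines above.
REWRITES = [
    (re.compile(r"^/sitemap_index.xml$"), "/index.php?sitemap=1"),
    (
        re.compile(r"^/([^/]+?)-sitemap([0-9]+)?.xml$"),
        "/index.php?sitemap={0}&sitemap_n={1}",
    ),
]


def rewrite(path):
    """Return the internal URL a request path is rewritten to, if any."""
    for pattern, target in REWRITES:
        match = pattern.search(path)
        if match:
            # An unmatched optional group becomes an empty string.
            groups = [group or "" for group in match.groups()]
            return target.format(*groups)
    return path  # no rule matched, the path is served as is


for path in ["/sitemap_index.xml", "/post-sitemap.xml", "/post-sitemap2.xml", "/about/"]:
    print(path, "->", rewrite(path))
```

So a URL that looks like a static .xml file ends up as a normal dynamic request to index.php.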

It seems to me that the sitemap is something that started back when websites were very static and changed much less frequently. So the sitemap was also a static XML file on your server that you could change manually when you added or changed pages on your website. But nowadays, with plugins like the one I’m using, it is dynamically generated. And the rewrite rule is used to make it pretend to be static.
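For a rough idea of what “dynamically generated” means here, this is a minimal Python sketch that builds a sitemap document from a list of pages using the urlset/url/loc/lastmod elements from the sitemaps.org protocol. It is not the plugin’s code, and the page list is a made-up stand-in for what would normally come out of the database.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML from (url, last_modified) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages; a real site would query its posts instead.
pages = [
    ("http://www.example.com/seo-part-1/", "2014-01-01"),
    ("http://www.example.com/seo-part-2/", "2014-01-02"),
]
print(build_sitemap(pages))
```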

This seems pretty messed up to me. Dynamic and static content should be treated very differently, as we discussed in a previous post. There is a .org for sitemaps, http://www.sitemaps.org/ . But I doubt they have any plans to change that.

SEO Part 2: To www or not to www?

When I registered this blog in Google Webmaster Tools, I received a message from them, and the first thing they asked me to do was to add both www.gordonsun.me and gordonsun.me to the record for indexing.

Google Webmaster www

Later, Google also asked me to pick which one is the preferred link, with or without “www”.

In my case, I picked the “www” version as the preferred link. If you just enter “gordonsun.me” in your browser and press enter, you can see that the “www” is automatically added to the address. This is done by a 301 redirect that I set up with my DNS provider. I’m using NameCheap, which in my experience is better than GoDaddy. For one thing, GoDaddy forces me to log in with my account number, which is a 7-digit number that I can never remember.

NameCheap records

As you can see, the non-www domain with “HOST NAME” @ is set to simply redirect to the www URL. And the www record points to the Heroku subdomain where this site is actually hosted. Now for those who are too lazy to type “www.” (probably all of us), “gordonsun.me” simply works. To better understand URL Redirect, CNAME and A records, here is a good article.
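In case the redirect itself is unclear, here is a minimal Python sketch of the decision a redirect service makes: a request for the naked domain gets a 301 and a Location with “www.” added, and everything else passes through. The function name is made up, and this is only a guess at the logic, not NameCheap’s implementation; it also ignores edge cases such as user info in the URL.

```python
from urllib.parse import urlsplit, urlunsplit


def redirect_to_www(url):
    """Return (301, new_url) for a naked-domain URL, else None."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if not host or host.startswith("www."):
        return None  # nothing to do, already the preferred form
    new_url = urlunsplit((
        parts.scheme,
        "www." + parts.netloc,
        parts.path or "/",
        parts.query,
        parts.fragment,
    ))
    return 301, new_url


print(redirect_to_www("http://gordonsun.me"))
print(redirect_to_www("http://www.gordonsun.me/"))
```

The 301 status tells both browsers and search engines that the move is permanent, which is why it fits the “preferred link” setting.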

I have been doing this without too much thought for this blog and all my previous websites. But a little bit of research surprised me. This actually has a name, the “Naked Domain Problem”, and it is particularly hard on Heroku, to the point that they had to put up an article to explain it. This article explores it in a little more detail. There is also wwwizer.com, which provides a naked domain redirect back to your www URL. But if I can point my naked domain to them, why can’t I just do a 301 redirect to my www URL myself? My only guess is that some DNS providers force a naked domain to be an A record, so you need to point it to an IP address.

Lastly, there is www.yes-www.org, which advocates a setup like this. It points out that cookies can also be a problem when you use a naked domain.


The URL Redirect that I showed above is not actually a DNS record type. www.namecheap.com must have encountered the problem so many times that they just added it to the dropdown for selecting the DNS record type. Good job!

For example, at GoDaddy.com, the dropdown to select the DNS record type looks like this:

GoDaddy DNS records settings

There is no URL redirect. Instead, you need to use their URL forwarding section:

GoDaddy URL forwarding

SEO Part 1: This post url has its name in it

Today is the first day I start improving the SEO for my blog. Right now, this blog is not doing very well on Google. If I search “Gordon Sun blog”, I show up on the second page, and the result is a link to the Reddit entry I posted for one of my blog posts. I guess Reddit did better SEO for me than what I’m doing here. But why that particular post shows up is very strange. It was not even popular.

Anyway, the first thing I noticed is that the Reddit post has a better URL than my blog here. It has the title of the post in the URL. So that is what I’m going to fix first. And WordPress made it easy. A simple setting change in the UI worked.
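For the curious, turning a title into a URL piece (a “slug”) can be sketched in a few lines of Python. This is a simplified version of the idea, not WordPress’s actual algorithm, and the function name is my own.

```python
import re
import unicodedata


def slugify(title):
    """Turn a post title into a lowercase, hyphen-separated URL slug."""
    # Strip accents by decomposing characters and dropping non-ASCII bytes.
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace every run of non-alphanumeric characters with one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_title.lower())
    return slug.strip("-")


print(slugify("SEO Part 1: This post url has its name in it"))
# prints seo-part-1-this-post-url-has-its-name-in-it
```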

WordPress Url Change

Also, when I test-search in Google, “Gordon Sun personal blog” returns my page as the first result because I have “personal” in my site description in the <title> tag: <title>Gordon Sun | My personal blog about everything</title>

I have changed the tagline a little bit, and hopefully in the future searching “Gordon Sun blog” will return my page as #1. But it feels like Google takes some time to pick up changes. My Google Webmaster Tools is still showing nothing for this blog right now.

How does a robot see your page?

Say you have an awesome website running, but nobody comes. (Happens to me all the time.) What do you do?

You can of course take out your wallet and spend money on advertising, online or offline. But that will cost you money and may not get you the result you want. Trust me, your bar is pretty high when you are spending your hard-earned money.

Another way is to get all your friends and family to share it in every place you can think of. Remember to set up AddThis so it is easier for everyone. But regardless of how easy it is, after a few days you will likely be the only person posting everywhere, and you will get banned or silently neutered. (May have already happened to me on Reddit.)

Referral marketing has been getting more and more popular recently, but it only applies if you are selling products or services for real money. I’m not sure how a completely free site like this blog can give viewers a discount for broadcasting to their friends. (If you figure this out, please please let me know.) 100% off $0 is still $0. All I can do is deliver high quality content that is technically useful to you or funny enough to make you laugh (which is much more important), and hopefully 1% of you will click one of the share buttons. Will you be my 1%?

OK, let’s talk about the free and effective technical solution: SEO (Search Engine Optimization). It’s a pretty deep topic, and there are people and even businesses that do nothing else. I’m only going to talk about some basics here.

  1. Make sure the search engine’s robot (crawler) has access to your site. Make sure your robots.txt file is configured properly and allows search engines to crawl your content. Do not just include everything either; it will confuse the search engine and muddle the results.
  2. Have a site map. The site map was quite useful for human beings back in the web 1.0 days. But I don’t even remember the last time I specifically went to the site map of any site for discovery. Nowadays, site maps are more for search engines’ robots. They are robots after all; they need all the help they can get.
  3. Make your page URLs human readable so robots can read them. Weird, right? Well, in the end it is humans who search for things on search engines, and they enter human-readable things, so that’s why.
  4. Have structured content that can be easily and properly indexed by search engines. Add proper and complete <meta> tags in your HTML page’s <head>, use alt for images, and SCHEMA ALL THE THINGS.
  5. Lastly, how does a robot see your page? Use the tools provided by search engines, for example Google Webmaster Tools. This link is Google’s cached version of my blog.
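For the first point in the list, you can check what a crawler is allowed to fetch with Python’s built-in urllib.robotparser. The robots.txt content below is a made-up example (it blocks the admin area and allows everything else), not the actual file of this blog.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Sitemap: http://www.example.com/sitemap_index.xml
"""


def can_crawl(robots_txt, user_agent, url):
    """Tell whether the given robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(can_crawl(ROBOTS_TXT, "Googlebot", "http://www.example.com/seo-part-1/"))
print(can_crawl(ROBOTS_TXT, "Googlebot", "http://www.example.com/wp-admin/options.php"))
```

Blocking only what robots have no business indexing keeps the rest of the site open to them.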

Now I realize that I have not done proper SEO for my blog yet. I’ll focus on that for the next couple of days and share the process here. Hopefully I can do a good job. And if I don’t, then you should not be able to find my blog in the first place.

Moving datacenter is like moving apartment in real life!

  1. You have a date by which you have to move. Normally you set the date over a weekend.
  2. You tell all your visitors not to come visit you that weekend.
  3. You start packing a long time before the due date. But you have a lot of other things to do, so the packing is really slow.
  4. You are not very worried because you think you can pack things up in the last couple of days.
  5. You think you can move everything on Saturday, and Sunday will be for cleaning things up.
  6. Friday night, you realize your keys do not work. (Half of the team had no access to the new datacenter.)
  7. Saturday comes, you start packing and find a lot of things do not pack nicely into your boxes. So you start to carry things by hand one by one.
  8. The moving elevator (data pipe) is really slow and eventually breaks down. You have to call your super (admin).
  9. Sunday comes, you are not even halfway done and start to panic.
  10. When you are done, you can’t find a lot of things. (For a datacenter move, a lot of things do not work.)
  11. You spend a lot of time changing your address info all over the place. (This is way harder for a datacenter move. Imagine your garage, living room, bedrooms, closet, and even shelves all have their own addresses and they all need to know each other.)
  12. You still get deliveries at your old place. A lot of people did not receive the notice.
  13. You get a delivery at your new place and it falls through the floor. (Ah.. this is getting less comparable, I’m gonna stop.)

Because we worked long hours over the weekend, today was actually pretty good for us. Far fewer issues than we expected.


Should I post on Sunday?

I said before that I’m going to post here every day. But for some religious people, working on Sunday is a BIG NO NO. I learned that from this episode of my favorite TV show, The Good Wife. Though I’m not religious, I started to ask myself: would it actually make me produce better quality posts if I refrained from posting on Sunday, or even all weekend? The answer to my question is simply NO. I’m a bad writer, and if there is any hope for me to get better, I have to write as much as I can.

I actually had to work this weekend. The company is doing some migrations, and naturally we had to do them over the weekend when people are not using our system. Most of us do not work or go to school on weekends. The industrial world has a distinct 7-day cycle that I imagine would look very bizarre to an alien observing us from a faraway galaxy. My theory is that 7 is the largest prime below 10, so having 7 days in a week is the least boring way to do it.

But working 5 days and resting for 2 is something more recent. I’m from China, and back when I was a kid, we went to school 6 days a week and my parents went to work 6 days a week. Then one day, the government decided that we should work 5 days every other week instead of 6. For half a year, everything was so confusing. We quickly moved to just 5 workdays a week like the rest of the world. We felt so happy at the beginning but soon got used to it. Unfortunately for farmers, they still work every day in the summer because the crops don’t know what Sunday is; they actually need more care when it is a Sunny day. And of course for a lot of service businesses it is the reverse. They are busiest during the weekend, when people have time to shop and do stuff. Except banks here in the US. They have so much money, they are closed whenever they can be.

So let’s talk about programmers. For us, remote collaboration is pretty common in the industry. All the places I’ve worked at had pretty good WFH (work from home) policies. And frankly, once requirements are settled, it is much more efficient to work from home with fewer distractions. When I worked at Microsoft, we had long product cycles and requirements were settled pretty early. But after I moved to an internet-related job, the 2-week sprint seems to be the most common practice. And since requirements change a lot, I find it more beneficial to be physically around peers. We all know about Yahoo’s WFH ban. Personally I feel that is a little bit too extreme.

I feel we should ask a harder question: is the 5-workday week a good schedule for the computer/IT industry? I have heard of a very interesting proposal where you work 4 days, 10 hours each day. You come in early and leave late, and so avoid the bad traffic, which also saves you time. But that solution is probably focused more on solving the problem of commuters with kids who have no school on Friday.

What do you think a programmer’s schedule should be? Leave a comment below.


Image in this blog post. Cautious if you are using 56k modem.

My blog can now post images. And an image is worth a thousand words, so I’m a big writer now. Joking aside, when I think about that idiom as a programmer, it also makes a lot of sense. The size of an image can be much bigger than that of a thousand words. Images present us with quite a few technical challenges, such as compression, transformation, rendering, storage and serving them on the internet. I have a little experience with the last one, so I’ll talk about it just a little bit. There are basically 2 types of content on a website: dynamic and static.

  • Dynamic: For example, when you request this post, WordPress works hard to build it for you. If you look at the URL you are viewing, it has a query parameter p=151, which basically means you are asking for post 151. WordPress will do a couple of things, but most importantly it goes to the database and fetches the content that you are reading right now, applies templates and rules to build the HTML page, and gives it back to you. This is dynamic content. WordPress does this dynamically for each request. At least for now, before this blog gets too popular and I have to turn on memcached. I dare you to make this blog popular by sharing it everywhere using the buttons all over this page that I talked about in my previous post.
  • Static: After your browser gets the HTML that WordPress returns, it is not going to display it directly to you. If you want to see it directly, you can right-click on a blank area of this page and select “View Source”. The source HTML will not be eye candy. What the browser does is fetch all the static resources this HTML page needs, then render them together with the HTML and display the result to you. These include CSS, JavaScript, fonts and, most importantly, images. These things do not change from request to request and are static.
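As a very rough sketch of the two types, a server (or a proxy in front of it) could tell them apart just by looking at the file extension in the requested URL. The extension list and the function name below are my own assumptions for illustration; real servers use more careful rules.

```python
import posixpath
from urllib.parse import urlsplit

# Assumed list of extensions that count as static resources.
STATIC_EXTENSIONS = {".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".woff", ".woff2"}


def is_static(url):
    """Guess whether a URL points at a static resource."""
    path = urlsplit(url).path
    extension = posixpath.splitext(path)[1].lower()
    return extension in STATIC_EXTENSIONS


for url in ["/?p=151", "/wp-content/themes/style.css", "/wp-content/uploads/photo.JPG"]:
    print(url, "static" if is_static(url) else "dynamic")
```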

You should have some sense of the difference between dynamic and static content now. Static content tends to be larger in size and changes much less frequently. Being large slows things down, but changing less gives the web server an opportunity to optimize. So a lot of modern web server architectures suggest and support serving dynamic and static content separately. Just Google “separating static and dynamic web content”. For this site, we are using Amazon S3. I have another blog post about how I set it up here. You can think of it as a static content server for everyone. It is much more than just that, of course. It charges by traffic, so it does not cost much unless your site is really popular. Again, I dare you to make this blog popular by sharing it everywhere using the buttons all over this page that I talked about in my previous post. Have a good static day!

AcousticModem