DockerCon 2016 summary

I attended DockerCon 2016 on June 20th and 21st. I feel pretty lucky: some convention center in Silicon Valley needed renovation, so DockerCon chose to be held almost right next door to me. Today I finally finished watching the videos of the talks that I did not go to. And I have been playing with the Mac Beta version of Docker for the last month, so it’s time for this brief summary.

  1. The Docker community is growing fast! Everyone is on board now. Those who are not will be irrelevant soon.
  2. Docker Inc. is producing tons of stuff. So many new features in the Beta. How did they do it? Well, some purchases I’m sure, like Tutum, which I really like.
  3. DevOps is really interesting, but if you dive deep it is hard. Respect to all DevOps!
  4. I feel I know so little about my own trade (service and application development) after watching this video: https://www.youtube.com/watch?v=pD0rEtEEwck

How many creative projects should a person work on simultaneously?

This question was discussed in one of the podcasts I listen to, called Question of the Day. Their answer is 5-10. I think that is very true. I am currently stuck on one of the projects I’m working on. The project is CraigsMenu. Also check out the blog there if you can. I have been trying to push myself to work on it for the past 2 days, but for whatever reason I simply cannot. I feel like I’m in a video game and this particular magic spell has been used too much and now requires a very long cooldown period. I’m going to switch to something else and I hope that solves the problem. Now it’s time for me to pick another creative project. Already feeling excited!

Amazing story about Leicester City F.C.

Summary of the story: Leicester City F.C. is a small soccer club in the English Premier League. They only got into the league after the 13-14 season. In the 14-15 season, they were almost relegated. They were at the bottom after 29 of 38 games. But against all odds, they started a crazy comeback, won 7 of the remaining 9 games, and survived. The unbelievable part is the 15-16 season. Because their performance was not that great in the 14-15 season, their odds of winning the title, according to one sports gambling company, were 5000:1. They also got a new manager (head coach) who was not considered a good choice by most. They did not have tons of money to buy any superstars. Their main striker, Jamie Vardy, was still playing non-league soccer in 2012. However, against all odds, including the 5000:1 from the gambling company, they won the title, finishing 10 points ahead of second place. Jamie Vardy also broke the record for scoring in the most consecutive games.

I’m not a huge soccer fan, so I only learned this story today. It is really amazing. It’s a classic underdog story. I really feel inspired by it. Some high school teacher even put a sign in the classroom to encourage students, saying “If Leicester City can win the championship, you can do anything.”

Go Foxes!

Coding interview: Find median of 2 sorted arrays of int part 1

You are given 2 arrays of integers as input. They are sorted. Write a function that can find the median of all the numbers.

First we need to clarify what the median is. Given a bunch of numbers, if you sort them, the number right in the middle is called the median. For example, if we have [1, 2, 3, 4, 8, 9, 10], the median is 4. When we have an even number of numbers, it is the mean of the two in the middle. For example, if we have [1, 2, 3, 4, 8, 9], then the median is (3+4)/2=3.5

Next, let’s solve the problem with a simple, direct, and easy-to-understand solution:
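A minimal sketch of that idea in Python, assuming the standard library’s heapq.merge and statistics.median can stand in for the merge and median helpers. The function name find_median_sorted_arrays is only illustrative.

```python
from heapq import merge
from statistics import median


def find_median_sorted_arrays(a, b):
    """Return the median of all numbers in two sorted lists."""
    # heapq.merge lazily merges already-sorted inputs into one sorted stream.
    c = list(merge(a, b))
    # statistics.median averages the two middle values when the count is even.
    return median(c)
```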

Really simple, right? And we are very sure it is correct as long as the merge function and median function are correct. Well, you will probably be asked to implement them during an interview, so here is one version:

Now, what is the problem with this approach? If you say it is too slow, then you probably know the running time of the solution is O(m+n), which is linear. And you are right that we can do better than that, but there are problems other than time. Merging the arrays uses O(m+n) space, which we can also avoid. But the bigger problem is that when we try to allocate the space for array c, we may not be able to, because m+n may overflow, the same reason that I had to do the trick when finding the median. So let’s avoid it, and avoid allocating a new array altogether, in part 2.
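Here is one possible hand-written version, sketched in Python. The overflow-safe midpoint in median is my assumption about what such a trick looks like. Python integers never overflow, so it only matters when you port this to a language with fixed-width integers.

```python
def merge(a, b):
    """Merge two sorted lists into a new sorted list c."""
    c = []
    i = j = 0
    # Repeatedly take the smaller head element of the two lists.
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            c.append(a[i])
            i += 1
        else:
            c.append(b[j])
            j += 1
    # At most one of the two lists still has elements left.
    c.extend(a[i:])
    c.extend(b[j:])
    return c


def median(c):
    """Return the median of a non-empty sorted list."""
    n = len(c)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        return c[mid]
    low, high = c[mid - 1], c[mid]
    # low + (high - low) / 2 equals (low + high) / 2 but never forms the sum
    # low + high, which could overflow with fixed-width integers.
    return low + (high - low) / 2


def find_median_sorted_arrays(a, b):
    """Median of all numbers in two sorted lists, via a full merge."""
    return median(merge(a, b))
```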


Falsifiability

Recently I have been bothered by insomnia. There is no one but me to blame. However, one night I came across a video about Falsifiability by teachphilosophy on YouTube. I made some comments and, to my surprise, teachphilosophy replied nicely with a lot of constructive discussion. This kind of experience is very rare on the internet. I exchanged more comments with him/her, and clearly he/she has a deep philosophy background, while my thoughts are much less sophisticated and more scattered. “Less sophisticated” in my opinion may or may not be a bad thing, but “scattered” definitely is. So I hope I can do a better job with this blog post than with the comments I left on YouTube.

First, if you are not familiar with Falsifiability, please read the Wikipedia article. Now, let’s talk about all the examples in teachphilosophy’s video.

1. There is a planet between Mercury and Earth. This is a pretty straightforward falsifiable statement, unless the IAU goes nuts again and keeps changing the definition of “planet”, in which case the statement is not well defined. But I want to talk a little about how we know that the statement is true. We know the planet Venus is there between Mercury and Earth, but how can you say “you know”? I’m pretty sure you have not been to Venus and felt it with your hand. You have probably seen it in the sky, but how do you know it is a real planet and not a man-made satellite? And how do you know it is between Mercury and Earth? Well, we have to use Occam’s razor. You can safely assume that all the astronomers in the world did not create a big Venus hoax for no reason other than to fool you.

2. All swans are white. Classic example of something falsifiable that got falsified. I don’t have much to say about it.

3. Nonspatial/Nontemporal Fairies live inside my nose.

4. teachphilosophy skipped it, so I have no idea what he put there. But I’m guessing it may be “God exists” and he did not want to get into trouble.

5. I am currently conscious. This is the one where most of the discussion between me and teachphilosophy happened. His/her opinion, if I may summarize, is that this is not falsifiable but is very valuable and useful knowledge. I think it may or may not be falsifiable depending on how you define certain words, and when you make it not falsifiable, it is a topic as valuable and as ridiculous as #3. In my opinion, strictly non-falsifiable statements are all equivalent. For historical reasons, philosophers still like to cling to this topic, but with the advances in neuroscience, they are clearly losing their grip. So let’s start with the definition of conscious. Here are some that Google shows:

  • aware of and responding to one’s surroundings; awake.
  • having knowledge of something; aware.
  • painfully aware of; sensitive to.
  • concerned with or worried about a particular matter.
  • (of an action or feeling) deliberate and intentional.

From dictionary.com

  • aware of one’s own existence, sensations, thoughts, surroundings, etc.
  • fully aware of or sensitive to something (often followed by of):
  • having the mental faculties fully active:
  • known to oneself; felt:
  • aware of what one is doing:
  • aware of oneself; self-conscious.
  • deliberate; intentional:

Let’s start with a simple one: deliberate; intentional. Sounds like this will get used a lot in court, right? And who decides whether the accused was or was not deliberate/intentional? Not himself/herself; it will be the judge or jury, based on their opinions, which are hopefully based on facts. So for this definition, only what other people think matters. Remember this, because it will be useful later when we talk about other definitions that seem to involve only oneself and not others.

Next, several of the definitions are about being aware of things or surroundings. This is another thing that can be hotly debated in court, since it can also impact the outcome a lot. But again, it is debated by other people; only what others think matters. It probably shows up more in hospitals, though. It is still decided or tested by other people: doctors, nurses, etc.

Lastly, let’s discuss the one that does not involve other people: “aware of one’s own existence”. Put that in the statement and it becomes: “I am currently aware of my own existence.” The idea is that when a person makes this claim in his/her own head, he/she is certainly not sleeping or unconscious, so this becomes a self-fulfilling statement and is always true. And because it does not involve others, it is not falsifiable. Well, with the advances in neuroscience, we can easily tell if a person is awake or dreaming, and with the advances in machine learning, we can even guess what a person is seeing (link). But even ignoring that, let’s say no one is attaching an EEG machine to your shaved head, no one is even around to observe you, and no camera is recording what you are doing for anyone to see in the future. Let’s try to make the environment such that the statement “you are currently conscious” is truly unfalsifiable. Then you can think to yourself, “I am currently conscious”. But are you? It has become very similar to the question: “If a tree falls in a forest and no one is around to hear it, does it make a sound?” Except you say that you are the tree and you are there to hear it. Well, if perceiving that you are conscious is like the tree being able to make a record of the sound when it fell, then the statement becomes falsifiable, because it is possible that in the future we will have a machine that can read your memory. So you have to forget your experience to make it truly non-falsifiable.

In conclusion, I am not exactly sure what I am trying to say. I think what I’m suggesting is to apply falsifiability strictly to a topic before we decide to engage. I believe it will save us a lot of meaningless discussion and a lot of time.


Static over Dynamic

One reason I believe Unix and Linux are so successful is their philosophy that “Everything is a file”. And the reason this philosophy made them successful, in my opinion, is that files are relatively static compared to other things (Windows registry, anyone?).

Another example: for certain data that changes infrequently, simply putting it in a static file and making a code change is much better than saving it in a database. Even though a code change and re-deploy are needed when the data changes, the performance and reliability are much better. A lot of the time, I’ve seen people try to boost performance by introducing caching at all levels, thinking they get the best of both: flexibility and performance. Well, what they get is complexity, since cache invalidation is the second most difficult thing in computer science (right after naming things) according to Phil Karlton.

One interesting story that I experienced first hand: one day, we accidentally stopped a service that had been running without any issue for 6 months. We thought, OK, let’s just restart it. But it wouldn’t start, giving errors about not being able to read a URL. It turned out it was trying to do a one-time load of some static data from an external team. The developer did not want to write a static file, so he justified it by saying that this is dynamic: if the other team changes the data, we just need to restart the service. The other team had no idea this service depended on this URL to start. There was no traffic at all on it, and they had removed it together with a bunch of other things 5 months earlier.

Lastly, I have been trying to do some DevOps work the whole day today. I need to start a MariaDB Galera cluster. Our company’s setup scripts have evolved over the years from Chef to Puppet and finally to Salt. I copied a set of working Salt scripts for setting up a MariaDB Galera cluster from another team, but ended up reading everything in them to try to understand what dynamic things they do based on host name, configuration, etc. I really wish we were using Docker right now, because an image is a file; it is static. Pulling an image is basically copy-pasting a file. It is much, much faster and guaranteed to work.
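To make the static-file option concrete, here is a minimal sketch in Python. It assumes the rarely changing data is kept as a JSON file deployed together with the code. The function name load_reference_data and the file name in the usage note are hypothetical.

```python
import json


def load_reference_data(path):
    """Load rarely changing reference data from a file deployed with the code."""
    # A local file cannot vanish because another team cleaned up a URL.
    # Changing the data means editing the file and redeploying.
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

At startup the service would call something like load_reference_data("reference_data.json"), so a restart depends only on what was shipped with the service itself.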

Things I value in software/application design

Below are my personal opinions. I will explain some of them in detail in future posts.

  • Proper design over hacking.
    • Not the security type of hacking; for the difference, read my post here. There is an old Chinese saying: “Think three times before you act.” My experience so far is that if the developer thinks even halfway through before he starts, it will be a relatively successful project. And we give this all sorts of fancy names such as “Agile”, “Bias for action”, “RAD”. I call it stupid.
  • Static over dynamic.
    • I mentioned this preference in my previous post here. Another example I want to give: when I debug some “fancy” code, it gives me a lot of headaches when everything is picked dynamically. Reading the code gives you no clue what actually happens, and you have to set breakpoints and get dirty.
  • Separation over combination (which can be easier to use short term)
    • There are a lot of “powerful” libraries and frameworks out there. Their feature lists are long and getting longer. But in my opinion, they should “do one thing and do one thing well” (Unix philosophy). And you should also keep this in mind when you write your code.
  • Simplicity over optimization.
    • “Premature optimization is the root of all evil.” (Donald Knuth) Need I say more? Actually, one thing I would like to say is that quite often you don’t even realize you are doing premature optimization. You are simply doing what comes naturally to a well-trained engineer: cache this, minify that, etc. I say you should question everything you do, all the time: is it absolutely necessary?
  • Encapsulation over extensibility
    • Don’t get me wrong, inheritance and polymorphism are powerful programming concepts. Some libraries and frameworks leverage them, and you can simply implement an interface or two to use them. However, I rarely see them used properly when building internal systems. All the time I see an interface or parent class with one and only one implementer, which only makes the debugging experience horrible because you cannot get to the real code directly from the caller.
  • Configuration over convention (No I did not get the order wrong. I mean it.)
    • Convention over configuration is a software design paradigm advocated and embraced by a lot of people. It even has its own Wikipedia page. I hate it, especially those “RAD” frameworks that use it as an excuse to create tons of “dark magic”. The result is poor discoverability and code that is buggy, hard to maintain, and hard to debug. An example I have to mention is a PHP framework called Lithium. Just don’t use it.

Can we have fixed versions for all the things?

Recently, we ran into a version conflict among some Bower packages. I believe the story is that we installed a new package called angular-touch. We installed it locally in our dev environment using the following command:

bower install angular-touch --save

The --save flag adds the package to the bower.json file. However, for some reason, we did not get a warning that it needs a newer version of angular (1.4.3) than what we had (1.3.16). It may have just silently updated it. Only when we committed the change and the build machine started to build from scratch did the issue show up, and we got this error:

Unable to find a suitable version for angular, please choose one:
1) angular#>=1 <1.3.0 which resolved to 1.2.28 and is required by angular-bootstrap#0.12.0
2) angular#1.2.28 which resolved to 1.2.28 and is required by angular-loader#1.2.28
3) angular#1.3.12 which resolved to 1.3.12 and is required by angular-resource#1.3.12
4) angular#1.3.16 which resolved to 1.3.16 and is required by angular-mocks#1.3.16, eMenu-web
5) angular#>= 1.0.8 which resolved to 1.3.16 and is required by angular-ui-router#0.2.10
6) angular#>=1.2.10 which resolved to 1.3.16 and is required by angular-carousel#0.3.12
7) angular#>=1.0.8 which resolved to 1.3.16 and is required by ngGeolocation#0.0.7
8) angular#~1.x which resolved to 1.3.16 and is required by angular-spinner#0.6.2
9) angular#>= 1.2.23 which resolved to 1.3.16 and is required by ngCordova#0.1.15-alpha
10) angular#1.4.3 which resolved to 1.4.3 and is required by angular-touch#1.4.3
Prefix the choice with ! to persist it to bower.json

Here is something I really don’t like (that’s probably also why I’m writing a blog post about it): Bower, like a lot of other package management systems (npm, pip, Maven, etc.), allows you to specify version ranges. In this example, you can see the notations “>=”, “<”, “~1.x”, etc. Personally, I like all the things to be fixed. My thought is that deterministic is way better than random. That is probably why I like Docker a lot.

I understand why they did this, though. When your project depends on multiple libraries (let’s say A and B), they may in turn both depend on another library (let’s say C). If A wants v1.1 of C and B wants v1.2 of C, you get a conflict, even though A may work with anything greater than v1.1. So, to make it easy, they allow library writers to specify a version range, and there must be some smart logic to pick one of the many versions that satisfy all the requirements (or just pick the latest that satisfies them). Only when they really cannot find anything that satisfies them do they ask a human, as above. However, in our case it is too late, because it is the build machine installing the packages.

Now, maybe it would be very annoying if we did not have version ranges and developers had to resolve conflicts manually a lot. Personally, I feel much better knowing exactly what my program is using, but others may not care and just want to use some library easily. Then I would ask this question: why don’t we have a platform that can hold multiple versions of the same package and let each user pick whatever version they need? Maybe it is very hard and needs support at the programming language level. But in my opinion, it would be awesome. We would be able to encapsulate a lot better. Yes, the final size of your program may be several times larger, depending on how many versions of each package you include on average, but disk space and memory are getting much cheaper these days.
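The “smart logic” can be sketched in a few lines of Python. This is a simplified illustration, not Bower’s actual algorithm: it only understands the operators >=, <=, >, <, = and exact versions (no “~1.x”), and the names parse, satisfies, and resolve are made up for this sketch. The version numbers are taken from the error message above.

```python
import operator

# Comparison operators this sketch understands; two-character ones come first
# so that ">=" is not mistaken for ">".
OPERATORS = [(">=", operator.ge), ("<=", operator.le),
             (">", operator.gt), ("<", operator.lt), ("=", operator.eq)]


def parse(version):
    """Turn '1.3.16' into (1, 3, 16) so versions compare numerically."""
    return tuple(int(part) for part in version.strip().split("."))


def satisfies(version, clause):
    """Check a version against one clause such as '>=1.2.10', '<1.3.0' or '1.4.3'."""
    for symbol, compare in OPERATORS:
        if clause.startswith(symbol):
            return compare(parse(version), parse(clause[len(symbol):]))
    return parse(version) == parse(clause)  # a bare version means an exact pin


def resolve(available, requirements):
    """Return the highest available version that satisfies every clause, or None."""
    clauses = [c for wanted in requirements.values() for c in wanted]
    candidates = [v for v in available if all(satisfies(v, c) for c in clauses)]
    return max(candidates, key=parse) if candidates else None


available = ["1.2.28", "1.3.12", "1.3.16", "1.4.3"]
requirements = {
    "angular-ui-router": [">=1.0.8"],
    "angular-carousel": [">=1.2.10"],
}
print(resolve(available, requirements))  # ranges only: the newest version wins

requirements["angular-bootstrap"] = [">=1", "<1.3.0"]
requirements["angular-touch"] = ["1.4.3"]
print(resolve(available, requirements))  # nothing fits both, so a human must choose
```

With only open-ended ranges the resolver quietly picks the newest version. As soon as one package caps the range below 1.3.0 and another pins 1.4.3, no candidate is left and resolve returns None, which is the point where a real tool has to ask.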

Update: this problem has a name, Dependency Hell.

Explain WordPress as a house to a beginner

Today I met with a non-technical person who just started using WordPress. I used a metaphor to explain to her what WordPress is, and I felt it was pretty good, so I’m going to share it here.

Imagine all the domain names out there are plots of real estate. The website we are building for the domain name is going to be the house on the land. You can build it from scratch by writing HTML manually, but just like building a house by first cutting down some trees for wood, it will be very slow and the result will be ugly. WordPress is like a mobile home that is already built; you just need to tow it onto the land.

After the house is in place, you need to pick an overall look and feel for the house. This is picking the theme of the site. Unlike the style of a house, the theme can be changed pretty easily from one to another. Just like you can paint one side of the house a different color, you can tweak specific things by customizing the theme, but that requires some coding skill.

The house comes with a kitchen. By kitchen I mean the blog functionality. Using that, you can create dishes (blog posts) and share them with the world. However, if you want more rooms (static pages), you can create them. Again, you can create them manually by writing the HTML of the page, but a better way is to use a visual editor, which is like just ordering the carpet, curtains, and furniture and putting them in the room.

Tools like the visual editor, or others that help you create blog posts or help with SEO (making your house easy to find in a crowded city), are called plugins.

Got a better analogy? Leave a comment below.

Backup strategy of this blog when it is running on docker.

So you can see that I had some backup issues recently and lost a couple of posts. Honestly, I still have not figured out what the problem was, so I’m going to lay out my backup stack, and if you can spot anything wrong, leave a comment below.

First, this blog is running on 2 Docker containers. One is for WordPress. It just has the PHP code running, with some credentials as environment variables in memory, so nothing needs to be backed up. The other container runs MariaDB (basically MySQL), which contains all the data for the posts, comments, users, etc. That is what I want to back up.

If you read my previous posts about Docker, you know that a container is mostly read-only. So if you run a database in a container, it will lose the data whenever the container is removed and recreated, unless you use a data volume. So my MariaDB container has a volume for the folders /etc/mysql and /var/lib/mysql, where MariaDB saves its data. They stay the same when I stop/start the container.

However, I did not map the data volume to a physical folder on the host. I’m not 100% sure where the data volume actually lives, but I think if I destroy the container, the data volume will be gone. I did not map it to a physical folder because the machine is just a VM on Microsoft Azure. I don’t think there is much value in mapping it and saving it there, since the machine can easily be gone.

What I did was back up the data volume to Amazon S3. I originally used this dockup project and its Docker image, but it had some problems with restore. So I used a fork of it here. The fork did not have a Docker image, so I built one here. I created 2 services on tutum.co and told them to mount the data volume from my MariaDB container. With one click of a button, I can easily back up/restore my MariaDB data volume as a zip file to/from Amazon S3.

But I did not want to do this manually every time I post something. (Or maybe I should have.) So I set up a cron-job-like thing to back it up every day. I used https://github.com/sunshineo/tutum-schedule which is my fork of https://github.com/alexdebrie/tutum-schedule . What I changed was to make the project run a non-stop Python process directly instead of going through supervisord. I discussed this with alexdebrie in the Tutum Slack channel and we both agreed that this is a good idea.

So there you are: something that worked pretty well when I tested it at the end of May. I did not post much in the month of June, but the backup did happen every day. However, the backed-up file never changed, even after I made some posts. This is a mystery to me.
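One way I could have caught this earlier is to compare each new backup with the previous one. Below is a minimal sketch in Python, assuming both backup files have been downloaded locally. The function names file_digest and backup_changed are made up, and this is not part of dockup or tutum-schedule.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_changed(previous_path, current_path):
    """True if the two backup files differ byte for byte."""
    return file_digest(previous_path) != file_digest(current_path)
```

Archive formats often store timestamps, so two backups of identical data may still differ. The useful signal is the other direction: if backup_changed returns False right after a new post was published, the backup is not picking up the live data.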