jb…

Opinionated

May 24, 2010

As a carpenter has his tools, so do I, as a “knowledge worker” have my computer. I spend many hours a day with my Mac. I have my workflow honed and finely tuned, and I know when something is wrong with my computer, when there is more friction than there needs to be. I am a Systems Administrator, so my knowledge work is to ensure that other people can get their work done. My job is to keep the servers, services, and systems I support up and running 24x7. The tools I use to get this job done mean a lot to me, and over the years I’ve tried many of them with varying levels of success. I know exactly what my ideal setup is, and I’m working towards filling out my toolbox with the very best as I strive to bring my craft to the next level of mastery.

At home, my computer is college, entertainment, finance, photos, blogging, and fun. Mostly fun. There have been times when I’ve walked away from even owning a computer at home, seeing it as a distraction more than anything, but I always come back to wanting one around, if for nothing else than as an outlet for creativity.

In the years that I’ve been using computers, I’ve found that I desire simplicity more than configurability. Favoring fewer options over more. The machine I use needs to be beautiful to look at, because I spend a lot of time looking at it. It needs to be simple to use, because I have work to do, photos to edit, words to write, and I don’t want to have to mess with anti-virus updates or X windows crashing because of some beta driver bug that made its way into the mainstream release. I just want my computer to let me do what needs to be done.

Beautifully designed and crafted, simple to use, powerful… my computer needs to be a Mac. No one else on the market can release a computer that matches a Mac. I’m not sure why, it’s like they don’t know how. They try, but they fail.

Open source operating systems like Ubuntu are not as good because there are far too many cooks in the kitchen. Ubuntu is not an operating system like OS X is an operating system. It’s the Linux kernel, the ext4 filesystem, the Xwindow system, the Gnome desktop, the Firefox browser, and thousands of other open source packages and applications that work loosely together, and are developed by different teams. OS X is developed by Apple.

Windows XP is a suitable operating system to work with at the office, but I am far more productive on a Mac. With tools like Yojimbo, Spotlight, and Quicklook, Macs are far better suited for information management. I hear Windows 7 is nice. My wife has it on her PC, and so far, it is still just a PC.

As much as I love Macs for their design and ease of use, I also see the faults of some business decisions Apple has made in the past few years. The App Store is either a resounding success or a horrible failure, depending on who you talk to. In sheer volume, 200,000 apps is a lot of applications, but like Windows was last decade, most of them are crap. Apple’s decision to approve each app in the store is admirable to a degree, but they are not executing well at all. Some people are philosophically opposed to the app store, saying that the iPod/IPad/iPhone ecosystems should be open for any application to run on them, as is the case on the Mac. I do not care about this aspect, but I do wish that Apple would fix their approval process to make the system much more transparent. There should be clear cut guidelines on what is acceptable and what is not, and those guidelines should be applied across the board. Random app store rejections are the running gag of the current implementation. Its wrong, and it needs to be fixed.

Is the Apple today the same as the Apple so many fell in love with in the ’80s and ’90s? The scrappy underdog that just won’t die? No, and I couldn’t care less. I find it interesting that the era that some romanticize is actually one of the worst in the history of the company. Back when Apple was allowing clones and releasing crap with the Apple logo on it. Good riddance to bad rubbish. OS 9 was not interesting to me. OS X is.

I was using Linux and OpenBSD when I first heard of OS X, my first iBook was a revelation. Finally, someone had put a decent GUI on a Unix box. Apple has only gotten better from there.

I know there are a lot of very smart people who disagree with me. Lets let the next twenty years decide who is right.

Along with the app store debacle, there is Apple’s stance on Flash. My personal feeling is that if Flash were a true open standard, if anyone could create Flash applications without relying on Adobe, it’d be a whole different ball game. As it is, Flash is controlled by Adobe in its entirety, and that seems wrong for the Internet. The Web is the great leveling field, a mechanic in Kansas has the same chance of creating an awesome web site as a multi-billion dollar corporation. All the tools to create amazing web sites are free, and the specs for building the sties are readily available. All you need is a computer, Internet access, and a text editor. With Flash, you need some pretty expensive software. Also, having run a video serving site in the past, I can tell you that HTML 5 would have been a Godsend back then. It would have been so much simpler to just drop a .mov or .ogg file enclosed in video tags than the junk code I had to put in.

I’d like to watch Hulu on my iPad. Netflix already rocks on it. Flash is not a necessity.

Finally, there’s Google. I used to love Google, back when it was a search engine. They could have been happy with just being the best search engine in the world, and making billions, hand over fist, but no… they had to go and get greedy. Eric Schmidt sat on the Apple board of directors and saw what Apple was doing, and thought to himself… Google could do that. So, they “stabbed Apple in the back”, and released Android, and then the Nexus One, a direct competitor to Apple’s iPhone. Bad form, old boy, bad form indeed.

Also, I don’t like Google’s business model any more. I used to be fine with it, when they would show ads on the search results. Now though, Google wants to watch everything you do online, and figure out a way to monazite your activity. Your email, calendar, RSS feeds, photographs, friends, chats, videos, music, there’s even Google Health where you can put your medical record in Google. It all goes into the big black box that is Google, to be analyzed for who knows how long. Me, I like to be a little more honesty with my transactions. That’s why I pay for my email service. I give Apple money, they give me an email address, and a few other perks. It’s as simple as it gets.

I think that about does it for the major topics of the day. Of course, in all these things, I might be wrong. However, if I am wrong, and you want to call me out on it, I suggest you do your homework first. I’ve done mine. I have several years of experience, and a finely honed sense of craftsmanship.

I am, after all, strongly opinionated.

Add a User - Send an Email

May 20, 2010

I was asked on Twitter the other day why I disliked IBM’s enterprise software. This, in addition to my previous TWS rant, is my answer to that question.

We wanted to do two simple tasks. Add a new user to the system, and have the system send emails automatically when there’s a problem. Seems easy enough, unless you are using the Tivoli Workload Scheduler. Then it’s an entirely different matter.

Add A User

Some new websites like Postulous create a new user for you when you send them an email. Others like Tumblr need only three the username and password to get them setup. To add a user to TWS, you would think that there would be a nice GUI with a menu option that says “Add User”, but none exist. Instead, you have to log into the command line on the server, run the command “dumpsec”, and redirect it’s output to a file. Then, you have to vi that file, and edit the XML to add the username to the correct group. Save that file, and run “makesec filename” to load the new user into the system.

Then, restart the TWS application server. IBM is not sure if this is a required step or not, or a least the help I had on the phone wasn’t sure.

Then, you need to go into the web interface for TWS, and add the user into WebSphere as well.

It’s like the creators of TWS got together for a brainstorming session one day and asked “What is the most difficult, unintuitive way we can add a user to the system?” Congratulations, folks… I think you nailed it.

Send An Email

The heart of TWS is scheduling jobs, and then acting on the results of those jobs. One task we wanted from it was to let us know if it couldn’t run a job by sending us an email. For two years we worked with consultants and IBM to get this to work. We wound up having to get some of the original developers on the phone from Rome, and with their help we finally found the problem.

TWS stores some of it’s settings in a DB2 database. That right there is enough for me to toss the entire application in the trash. In Unix, configuration settings are stored in a plain text file, one file per application if possible. And if that wasn’t bad enough, we found that one of the binaries was modifying the configuration settings in the DB2 database when it was launched, changing the port that a certain daemon was supposed to be listening on. This daemon was responsible for listening for incoming configurations from the main server, including the configuration telling it to send the email. It’s hard for me to express how wrong this is, but I’ll try.

Any daemon should read its configuration from a file under the /etc directory. That’s how it works in Unix, and for the past thirty years its worked out pretty great. No daemon should have access to modify the configuration of any other daemon. Also, if listening on a certain port is central to the communications and functioning of the application, don’t make that configurable, just hard code the daemon to listen on that port. I suppose it would also be acceptable to allow configuration that would override the default, but only if the daemon reads its configuration from a plain text file and only if in the absence of the overriding configuration the default port is chosen to listen on.

Again, $30,000 and the application is held together with duct tape and silly putty.

MobileMe Mail Revisited

May 16, 2010

If the only computers you use are Macs, or if you only use Microsoft Outlook on a PC, you may not have spent a significant amount of time in the web interface to MobileMe. The web interface occupies a curious spot in Apple’s portfolio as one of their few products that tries to do everything, and gets very little of it right. One piece in particular that gets most things wrong is the web interface to MobileMe Mail.

MobileMe Mail is built on SproutCore, a library of Javascript and HTML 5 frameworks intended to build desktop like applications for the web. The Mail web app looks like the desktop Mail application, and has certain features that try to mimic the feel of the desktop application, but falls short in functionality. The Mail web app tries to bring the desktop experience to the web, but in doing so ignores the web paradigm that makes Gmail such a success. I’ve narrowed down the problems with the web interface to five main areas.

Speed. Too often I’ve seen the dreaded “MobileMe Mail is loading” message. MobieMe Mail Wait Message That this message even exists tells me that the application is too slow. The message is a web equivalent to an app splash screen; I don’t want to see a splash screen, I want to see my mail. Mail displays another spinning gear and a “Loading…” message when switching between folders, or sometimes just between messages in the same folder, instead of text. These messages do not tell me that Mail is working, they tell me that something is wrong with the applications internal design.

Search. In the top right hand corner of MobileMe Mail is a search field. Typing in the search filed will allow you to search in only the currently selected folder. This is wrong. When Apple introduced spotlight in Tiger, they made a huge deal out of it, and rightly so. Search on the desktop changed the way I used my Mac, and the speed improvements in Leopard made it the default way I find anything. In placing this search field in the web interface, I expect to be able to search not only all of my mail in all of my folders, but all of my contacts, documents, appointments, and media available in MobileMe. Instead, it searches in one folder in one application. Not only that, but it appears to only search basic fields in each message, not through the text of the message itself.

Rules. There are no rules for automatically processing incoming mail. This is also wrong. Rules should sync from the desktop Mail app. For some users who may be using an iPad, and only and iPad, there is no way to filter incoming mail. The lack of rules and filtering severely limits the usability of the web interface, and is another big advantage Gmail has over MobileMe Mail.

Folders. While Gmail has no folders and instead uses labels and search, MobileMe Mail has neither good search, nor good folder management. The only options available for folders is to create, delete, and rename the folders. Since Mail is unable to use search as an organizing feature, folders are the only option.

Notes. It’s almost hard to include notes as a part of MobileMe Mail that does things wrong, when it actually doesn’t do notes at all. iPhone OS 4 is supposed to (finally) bring notes syncing between the desktop Mail client and the iPhone over IMAP, and therefore MobileMe, but I haven’t seen an announcement of the Notes being added to the Mail interface, or as an additional web service.

Apple has made strides to improve their online services since the rocky conversion from DotMac, and I hope that they continue to make the MobileMe web interface better, faster, and more usable. Critics often call MobileMe Apple’s answer to Google’s, Yahoo’s, and Microsoft’s online services, offered at a premium price. However, it is important to keep in mind that MobileMe’s Mail is not a competitor to GMail, Yahoo Mail, or even Hotmail. MobileMe is meant to enhance the experience of owning multiple Apple devices: Macs, iPhones, iPads, and iPods, by providing a synchronization service between them for things that matter. The web interface to MobileMe is an addition to the MobileMe service, not the service itself.

Managing Nagios Configs

December 9, 2009

We don’t have a very big Nagios installation, comparatively anyway, but it is big enough to find that the default layout for configurations is insane. I tried using the provided layout, until I wound up with single text files with thousands of lines in them. This made it very hard to do individual customizations for servers, and separating out who wants to be notified for what. Here is what I came up with for managing our Nagios configs.

It seems that the repositories are always behind in Nagios, so it is one of the very few apps that I recommend installing from source. I install Nagios in /usr/local/nagios, the default when compiling, I’ll just call it $nag. The Nagios binary is in $nag/bin, the plugins in $nag/libexec, and the config files in $nag/etc. The easiest way to understand nagios is to follow its start up procedures. I keep an /etc/init.d/nagios file for initialization, The file defines, among other things, where the home directory for Nagios is, what config file to use as its base, and where the Nagios binary and plugins are. The important thing to understand is that this file is the first pointer in a long string of pointers that Nagios uses for configuration.

Inside the nagios.cfg file are the cfg_dir directives. These are pointers that tell Nagios that it can find additional configurations inside the directories listed. Once Nagios is given a directory to look at, it will read each file ending in .cfg inside of that directory. The first directory that I have listed is $nag/etc/defaults. I keep four files in this directory: commands.cfg, dependencies.cfg, generic.cfg, and timeperiods.cfg.

The file “commands.cfg” contains the definitions of all check commands that Nagios can understand. They look like this:

 # 'check_local_load' command definition
 define command{
        command_name    check_local_load
        command_line    $USER1$/check_load -w $ARG1$ -c $ARG2$
        }

The file also contains the alert commands, or what Nagios will do when it finds something that it needs to let you know about:

define command{
command_name notify-by-email
command_line	/usr/bin/printf "%b" "***** Nagios *****\n\nNotification Type: 	$NOTIFICATIONTYPE$\nHost: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: 	$HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n" | /usr/	bin/mail -s "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **" 	$CONTACTEMAIL$
}

This allows us to call a command later in Nagios by it’s defined command_name,such as check_local_load, instead of having to call the entire command including arguments. Keeps the configs clean.

The next file, “generic.cfg”, contains templates for host configurations. This file allows us to do two things: list common options that are defined for all of the hosts, and separate hosts into notification groups. The definitions look like this:

define host{
        name                            generic-admin
        notifications_enabled           1
        event_handler_enabled           1
        flap_detection_enabled          1
        process_perf_data               1
        retain_status_information       1
        retain_nonstatus_information    1
        register                        0
        check_command           check-host-alive
        max_check_attempts      3
        notification_interval   120
        notification_period     24x7
        notification_options    d,u,r
				contact_groups          admin,admin_pager
        action_url /nagios/pnp/index.php?host=$HOSTNAME$
        }

There are two separate types of generic definitions, hosts and services, for the two types of monitoring that Nagios does. The important section for most of my purposes above is the “contact_groups” line. This allows me to group contacts with hosts, so it answers the question of “who gets notified if this server goes down?”. The same thing applies to the service template below.

define service{
        name                            generic-full	
        active_checks_enabled           1
        passive_checks_enabled          1
        parallelize_check               1
        obsess_over_service             1
        check_freshness                 0
        notifications_enabled           1
        event_handler_enabled           1
        flap_detection_enabled          1
        process_perf_data               1
        retain_status_information       1
        retain_nonstatus_information    1
        register                        0
        is_volatile                     0
        check_period                    24x7
        max_check_attempts              3
        normal_check_interval           5
        retry_check_interval            1
        notification_interval           120
        notification_period             24x7
        notification_options            w,c,r
	contact_groups                  admins,admin_pager,webmin
	process_perf_data 1
	action_url /nagios/pnp/index.php?host=$HOSTNAME$&srv=$SERVICEDESC$
        }

The other two files, timeperiods.cfg and dependencies.cfg, I haven’t done a whole lot with yet.

The next directory parsed as defined in nagios.cfg is $nag/etc/users, which, surprisingly enough, is where all of the users are defined. I keep two files in this directory, users.cfg and contactgroups,cfg. The users.cfg file contains a list of every user, and since I have different needs for pagers and regular email alerts, each user is defined twice:

define contact{
        contact_name                    Jon
        alias                           Jon Buys
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    w,u,c,r
        host_notification_options       d,u,r
        service_notification_commands   service-notify-by-email
        host_notification_commands      host-notify-by-email
        email                           jbuys@dollarwork.com
        }

define contact{
        contact_name                    Jon_pager
        alias                           Jon Buys
  	  service_notification_period     24x7
  	  host_notification_period        24x7
  	  service_notification_options    u,c,r
  	  host_notification_options       d,u,r
  	  service_notification_commands   notify-for-disk
  	  host_notification_commands      host-notify-by-email
  	  email 				5555555555@my.phone.company.net
  	  }

This lets me group the users more effectively in the second file, contactgroups.cfg:

define contactgroup{
        contactgroup_name admins
  	  alias           sysadmins
  	  members Jon,Gary,nagios_alerts
  	  }

define contactgroup{
        contactgroup_name admin_pager
        alias           sysadmin pagers
        members Jon_pager,Gary_pager,OSS_Primary_Phone,nagios_alerts
}

Now, check the definitions in the generic.cfg file above, and you’ll start to see the chain of config files coming together. The glue sticking it all together is the server definition files. Each logical group of servers gets their own directory, defined in nagios.cfg. For example, we have a group of servers that provides a specific web service (which I’ll call “mesh”), there are web servers, application servers, and database servers that I group together in one directory, named “mesh”. Inside of this directory, each server has its own config file, named like $hostname.cfg. There is also a mesh.cfg, which groups all of the servers together in a host group. The $hostname.cfg files look like this:

 define host{
        use                     generic-host
  	  host_name              	m-app1 
  	  alias                   m-app1
  	  address                 10.10.10.1
  	  }

define service{
  	  use                             generic-full
  	  host_name                       m-app1
  	  service_description             PING
  	  check_command                   check_ping!100.0,20%!500.0,60%
  	  }

define service{
  	  use                             generic-full
  	  host_name                       m-app1
  	  service_description             DISKUSE
  	  check_command                   check_nrpe!check_df
  	  }

Each server has a host definition at the top, and all of the services that are monitored on that server at the bottom. The first section’s line “use generic-host” calls the “generic-host” template from the generic.cfg file above. Each subsequent “define service” section has a “use” line that also calls the templates defined in generic.cfg. Putting each server in its own file makes it very easy to add and remove servers from Nagios. To remove them, just remove (or, safer, rename) the $hostname.cfg file and delete the name from the $groupname.cfg file. It’s also very easy to script the creation of new hosts given a list of host names and IP addresses.

The mesh.cfg file contains the hostgroup configuration for the group:

define hostgroup{
   	 hostgroup_name  mesh
   	 alias           Mesh Production
   	 members         mdbs1,mdbs2,mdbs3,mdbs4,mdbs5,mdbs6,mdbs7,m-app1,m-app2,m-app3,m-store1,m-store2,m-nfs1,m-nfs2
   	 }

This file is not as important, but it makes the Nagios web interface a little more helpful.

You’ll also notice that the check_command line above contains “check_nrpe!check_df”. This means that I use the nrpe (Nagios Remote Plugin Execution) add-on to actually monitor the services on the remote hosts. Each server has nrpe installed, and has one configuration file (/usr/local/nagios/etc/nrpe.cfg). The nrpe.cfg file has a corresponding line that says

command[check_df]=/usr/local/nagios/libexec/check_disk -e -L -w 6% -c 4%

This translates the check_df command sent by the check_nrpe command into the longer command defined above. This makes it easy to install and configure nrpe once, then zip up the /usr/local/nagios directory and unzip it on all new servers.

Nagios is nearly limitless in its abilities, but but because of the complexity of its configuration it can be daunting to newcomers. This setup is designed to make it just a little bit easier to understand, and easier to script.

Blizzard 2009

December 9, 2009

100_1931, originally uploaded by jonbuys.

Iowa got its first big storm of the winter season yesterday, and as of right now its still going on. We couldn't go anywhere even if we wanted to. We got about 13" of snow so far, but the wind gusts up to 50mph are the big problem. Just about everything is shut down, schools, work places, and even some of the larger roads.

Good day to get caught up on somethings I've been meaning to get done.

New SysAdmin Tips

December 4, 2009

My answer to a great question over at serverfault.

First off, find your logs. Most Linux distros log to /var/log/messages, although I’ve seen a couple log to /var/log/syslog. If something is wrong, most likely there will be some relevant information in the logs. Also, if you are dealing with email at all, don’t forget /var/log/mail. Double-check your applications, find out if any of them log somewhere ridiculous, outside of syslog.

Brush up on your vi skills. Nano might be what all the cool kids are using these days, but experience has taught me that vi is the only text editor that is guaranteed to be on the system. Once you get used to the keyboard shortcuts, and start creating your own triggers, vi will be like second nature to you.

Read the man page, and then run the following commands on each machine, and copy the results into your documentation:

    cat /etc/*release*
    cat /etc/hosts
    cat /etc/resolv.conf
    cat /etc/nsswitch
    df -h
    ifconfig -a
    free -m
    crontab -l
    ls /etc/cron.d
    echo $SHELL

That will serve as the beginnings of your documentation. Those commands let you know your environment, and can help narrow down problems later on.

Grep through your logs and search for “error” or “failed”. That will give you an idea of what’s not working as it should. Your users will give you their opinion on whats wrong, listen closely to what they have to say. They don’t understand the system, but they see it in a different way than you do.

When you have a problem, check things in this order:

Disk Space (df -h): Linux, and some apps that run on Linux, do some very strange things when disk space runs out. It may seem unrelated, until you check and find a filesystem 100% full.
Top: Top will let you know if you’ve got some process that’s stuck out there eating up all of your available CPU cycles. Nothing should consume 99% CPU for any extended period of time. If its a legitimate process, it should probably fluctuate up and down. While you are in top, check…
System Load: The system load should normally be below 3 on a standard server or workstation. The system load is based on CPU, memory, and I/O.
Memory (free -m): RAM use in Linux is a little different. It’s not uncommon to see a server with nearly all of its RAM used up. Don’t Panic, if you see this, it’s mostly just cache, and will be cleared out as needed. However, pay close attention to the amount of swap in use. If possible, keep this as close to zero as you can. Insufficient memory can lead to all kinds of performance problems.

Logs: Go back to your logs, run tail -500 /var/log/messages

more and start reading through and seeing what’s been going on. Hopefully, the logs will be able to point you in the direction you need to go next.

A well maintained Linux server can run for years without problems. We just shut one down that had been running for 748 days, and we only shut it down because we had migrated the application over to new hardware. Hopefully, this will help you get your feet wet, and get you off to a good start.

One last thing, always make a copy of a config file you intend to change, and always copy the line you are changing, and comment out the original, adding your reason for changing it. This will get you into the habit of documenting as you go, and may save your hide 9 months down the road.

Linux Hidden ARP

October 9, 2009

To enable an interface on a web server to be part of an IBM load balanced cluster, we need to be able to share an ip address between multiple machines. This breaks the IP protocol however, because you could never be sure which machine will answer for a request for that IP address. To fix this problem, we need to get down into the IP protocol and investigate how the Address Resolution Protocol or ARP, works.

Bear with me as I go into a short description on how an IP device operates on an IP network. When a device receives a packet from its network, it will look at the destination IP address and ask a series of questions from it:

Is this MY ip address?
Is this ip address on a network that I am directly connected to?
Do I know how to get to this network?

If the answer to the first question is yes, then the job is done, because the packet reached its destination. If the answer is no, it asks the second question. If the answer to the second question is no, it asks the third question, and either drops the packet as unroutable, or forwards the packet on to the next IP hop, normally the device’s default gateway.

However, if the answer to the second question is yes, the device follows another method to determine how to get the packet to it’s destination. IP addresses are not really used on local networks except by higher level tools or network aware application. On the lower level, all local subnet traffic is routed by MAC address. So when the device needs to send a packet to an IP address on the subnet that it is attached to, it follows these steps:

Check my ARP table for an IP to MAC address mapping
If needed, issue an ARP broadcast for the IP address – an ARP broadcast is a question going out to all devices on the subnet that has the simple setup of “if this is your IP address, give me your MAC address”
Once the reply for the ARP address is received, the packet is forwarded to the appropriate host.

So, to put this all in perspective, when multiple machines share the same IP address, each of the machines will reply to the ARP request, and depending on the order in which the replies are received, it is entirely possible that a different machine will respond each time. When this happens, it breaks the load balancing architecture, and brings us down to one server actually in use.

The next question is normally: Why is that? Why do the web servers need that IP address anyway? The answer to this is also deep in the IP protocol, and requires a brief explanation of how the load balancing architecture works.

To the outside world, there is one ip address for myserv.whatever. Our public address is 192.268.0.181 (or, whatever). This address is assigned three places on one subnet: load balancer, first web server, and second web server. The only server that is needs to respond to ARP requests is load balancer. When the load balancer receives a packet destined for 192.168.0.181, it replaces the destination MAC address with one of the addresses from one of the web servers, first web server or second web server, and forwards it on. This packet still has the original source and destination IP addresses on it, so remember what happens when an IP device on an IP network receives a packet… it asks the three questions outlined above. So, if the web servers did not have the 192.168.0.181 address assigned to them, they would drop the packet (because they are not set up to route, they would not bother asking the second or third questions). Since the web servers do have the ip address assigned to one of their interfaces, they accept the packet and respond to the request (usually an http request).

So, that covers the why?, let’s look at how?. Enable the hidden ARP function by entering the following into /etc/sysctl.conf:

# Disable response to broadcasts. 
# You don't want yourself becoming a Smurf amplifier.
net.ipv4.icmp_echo_ignore_broadcasts = 1 
# enable route verification on all interfaces 
net.ipv4.conf.all.rp_filter = 1 
# enable ipV6 forwarding 
#net.ipv6.conf.all.forwarding = 1 
net.ipv4.conf.all.arp_ignore = 3 
net.ipv4.conf.all.arp_announce = 2

The relevant settings are explained here:

arp_ignore = 3: Do not reply for local addresses configured with scope host, only resolutions for global and link addresses are replied.

For this setting the really interesting part is the configured with scope host part. Before, using ifconfig to assign addresses to interfaces we did not have the option to configure a scope on an interface. A newer (well, relatively speaking) command, ip addr is needed to assign the scope of host to the loopback device. The command to do this is:

ip addr add 192.168.0.181/32 scope host dev lo label lo:1

There are some important differences in the syntax of this command that need to be understood to make use of it on a regular basis. The first is the idea of a label being added to an interface. ip addr does not attempt to fool you into thinking that you have multiple physical interfaces, it will allow you to add multiple addresses to an existing interface and apply labels to them to distinguish them from each other. The labels allow ifconfig to read the configuration and see the labels as different devices.

example:

lo	Link encap:Local Loopback 
	inet addr:127.0.0.1 Mask:255.0.0.0 
	inet6 addr: ::1/128 Scope:Host 
	UP LOOPBACK RUNNING MTU:16436 Metric:1 
	RX packets:9477 errors:0 dropped:0 overruns:0 frame:0 
	TX packets:9477 errors:0 dropped:0 overruns:0 carrier:0 
	collisions:0 txqueuelen:0 
	RX bytes:902055 (880.9 Kb) TX bytes:902055 (880.9 Kb)
	
lo:1	Link encap:Local Loopback
		inet addr:192.168.0.181 Mask:255.255.255.255
		UP LOOPBACK RUNNING MTU:16436 Metric:1
lo:2	Link encap:Local Loopback 
		inet addr:192.168.0.184 Mask:255.255.255.255 
		UP LOOPBACK RUNNING MTU:16436 Metric:1

Here, lo, lo:1, and lo:2 are viewed as separate devices by ifconfig.

Here is the output from the ip addr show command:

1: lo: <LOOPBACK,UP> mtu 16436 qdisc noqueue 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
    inet 192.168.0.181/32 scope host lo:1
    inet 192.168.0.184/32 scope host lo:2
    inet 192.168.0.179/32 scope host lo:3
    inet 192.168.0.174/32 scope host lo:4
    inet 192.168.0.199/32 scope host lo:5
    inet 192.168.0.213/32 scope host lo:8
    inet 192.168.0.223/32 scope host lo:9
    inet 192.168.0.145/32 scope host lo:10
    inet 192.168.0.217/32 scope host lo:11
    inet 192.168.0.205/32 scope host lo:12
    inet 192.168.0.202/32 scope host lo:13
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever

Here we can see that the lo:1 (etc…) addresses are assigned directly under the standard lo interface, and are only differentiated from the standard loopback address by their label.

Here is the same output from the eth2 device:

4: eth2: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:10:18:2e:2e:a2 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.73/24 brd 192.168.0.255 scope global eth2
    inet 192.168.0.186/24 brd 192.168.0.255 scope global secondary eth2:1
    inet 192.168.0.183/24 brd 192.168.0.255 scope global secondary eth2:2
    inet 192.168.0.176/24 brd 192.168.0.255 scope global secondary eth2:3
    inet 192.168.0.178/24 brd 192.168.0.255 scope global secondary eth2:4
    inet 192.168.0.201/24 brd 192.168.0.255 scope global secondary eth2:7
    inet 192.168.0.212/24 brd 192.168.0.255 scope global secondary eth2:8
    inet 192.168.0.222/24 brd 192.168.0.255 scope global secondary eth2:9
    inet 192.168.0.147/24 brd 192.168.0.255 scope global secondary eth2:10
    inet 192.168.0.219/24 brd 192.168.0.255 scope global secondary eth2:11
    inet 192.168.0.46/24 brd 192.168.0.255 scope global secondary eth2:5
    inet 192.168.0.208/24 brd 192.168.0.255 scope global secondary eth2:12
    inet 192.168.0.204/24 brd 192.168.0.255 scope global secondary eth2:13
    inet6 fe80::210:18ff:fe2e:2ea2/64 scope link 
       valid_lft forever preferred_lft forever

Same as above, the addresses do not create virtual interfaces, they are simply applied to the real interface and assigned a label for management by ifconfig. Without the label, ifconfig will not see the assigned address.

arp_announce = 2: Always use the best local address for this target. In this mode we ignore the source address in the IP packet and try to select local address that we prefer for talks with the target host. Such local address is selected by looking for primary IP addresses on all our subnets on the outgoing interface that include the target IP address. If no suitable local address is found we select the first local address we have on the outgoing interface or on all other interfaces, with the hope we will receive reply for our request and even sometimes no matter the source IP address we announce.

This one is a little tricky, but I believe it deals with how the web servers talk with the clients requesting web pages. In order for the web page to come up and maintain a session, when the server sends a packet back to the client, it needs to come from the source ip address of the hidden ip address. In order to do this, the web server looks at the destination address of the client packet, and then responds using that as it’s source IP address instead of it’s actual IP address. Clear as mud, right!

I hope this helps explain things a little better about the hows and whys of the web server’s side of load balancing. Note however, that I didn’t talk at all about the edge server. That’s because the edge servers job is done at the application level, and correct configuration of it does not require special consideration at the OS level.

SLES and RHEL

September 2, 2009

Comparing two server operating systems, like SuSE Linux Enterprise Server (SLES) and RedHat Enterprise Linux (RHEL), needs to answer one question, “what do we want to do with the overall system”? The version of Linux running underneath the application is immaterial, as long as the application supports that version. It is my opinion that we should choose the OS that supports all of our applications, and gives us the best value for our money.

Price

RHEL advertises “Unlimited Virtual Machines”, but what they put in the small print is that the number of virtual machines you can run is unlimited only if you are using their Xen virtualization. We already have a significant investment in both money and knowledge in VMWare, so the RHEL license doesn’t apply, and we have to purchase a new license for each virtual machine. There is an option to purchase a RHEL for VMWare license, but it is expensive, and still limits you to a maximum of 10 virtual machines per-server.

SLES allows unlimited virtual machines regardless of the virtualization technology used. SLES also has a special license for an entire blade center, which (and I’d have to double check on this fact) may let us license the blade center, and purchase additional blades without having to license those blades separately. This license would allow us to run unlimited virtual machines and add physical capacity to the blade center as needed. This is the license we have for one of our blade centers, and I believe it cost $4500 for a three year contract. As I understand it, that means that for the 9 blades we have in it now, we spent $500 each for a three year license, which equates to $167 per blade per year. We also have the ability to add an additional five blades to this blade center, which would also be covered under the agreement. Doing so would bring our total per blade per year cost down to $108, for unlimited virtual machines.

For a comparison, right now, if we want to bring up a new RHEL server in our environment, we have to purchase another minimal RHEL license for $350, more if we actually want support and not just patches.

Even without the special blade center pricing (which may be IBM only), a single license for SLES priced by CDW costs $910 for three years. So, for $304 per blade per year, we can license two blade centers for $6,080 annually, which will cover all virtual machines. That price is off-the-shelf, so I’m sure our vendors could lower the price even more. In another pre-production environment, which resides on three physical servers running VMWare, there are 40 virtual machines, which, if we migrate them to REHL, would cost $14,000 annually.

Related to base price is what is included with the base price. SLES gives you the option to create a local patching mirror and synchronize regularly with their servers. This same functionality is available for RHEL as the “RHN Satellite Server” at a cost of $13,500, annually.

Performance

As far as I can tell, neither RHEL or SLES have a significant performance advantage. However, SLES has the option to do a very bare-bones, minimal install with no reliance on a graphical user interface. RHEL requires either a remote or local X windows session running to access its management tools. There are versions of the management tools in the command line, but they are either marked as depreciated, or do not offer all of the options of the GUI.

One of our environments is run on SLES 9, and another ran on SLES 8 for several years and all systems have had excellent performance.

Elsewheres

RHEL has no YaST equivalent, and the individual command line configuration tools do not have all of the options of their GUI counterparts. To effectively manage RHEL, we either have to keep an X server running locally and tunnel X, or use the old school Unix tools, and edit text files. Also, RedHat keeps its text files in several different places, and it has taken us a lot of trial and error to find out which one is right. Admittedly, that’s more of an annoyance than anything, but it still takes time.

RHEL has major problems with LDAP. We had an outage on a database server that was a result of an improper LDAP configuration, the same LDAP config we have on all of the other servers. RHEL was attempting to authenticate a local daemon that inspects hardware against LDAP, before the NIC card was even discovered, much less started. I can think of no good reason that would ever be an option.

I’m not sure how RedHat is competing these days. Cent OS and Scientific Linux distribute the source of RHEL for free, Oracle has a lower price option, and Novell’s SLES kills RedHat in pricing. It almost seems to me that RedHat is living on it’s name alone.

Writing about Jekyll

August 25, 2009

I’m writing an article for TAB about my new blogging engine, Jekyll. I’ve taken most of the reliance on the command line out of dealing with Jekyll on a day to day basis, and instead have a few Automator workflows in the scripts menu in the Mac menubar. It’s a great setup, I’m really enjoying it. I’m sure there will be quite a bit of enhancement yet to come, but my initial workflow looks like this:

Click “New Blog Post”
Write the article
Click “Run Jekyll”
Make sure everything worked using the local webrick web server.
Click “Kill Jekyll”
Click “Sync Site”

Here’s what I’ve got so far in the automator workflows:

New Blog Post

First, I run the “Ask for Text” action to get the name of the post. Then, I run this script:

NAME=`echo $1 | sed s/\ /-/g`
USERNAME=`whoami`
POSTNAME=`date "+%Y-%m-%d"-$NAME`
POST_FQN=/Users/$USERNAME/Sites/_posts/$POSTNAME.markdown
touch $POST_FQN
echo "---" >> $POST_FQN
echo "layout: post" >> $POST_FQN
echo "title: $1" >> $POST_FQN
echo "---" >> $POST_FQN
/usr/bin/mate $POST_FQN

Run Jekyll

First, I run this script:

USERNAME=`whoami`
cd /Users/$USERNAME/Sites
/usr/bin/jekyll > /dev/null
/usr/bin/jekyll --server  > /dev/null 2>&1 &
/usr/local/bin/growlnotify --appIcon Automator Jekyll is Done -m 'And there was much rejoicing.'
echo "http://localhost:4000"

Followed by the “New Safari Document” Automator action. This runs Jekyll which converts the blog post I just wrote in markdown syntax to html, updates the site navigation, starts the local web server and opens the site in Safari to preview.

Kill Jekyll

Since I start the local server in the last step, I need to kill it in this step. This action does just that.

PID=`ps -eaf | grep "jekyll --server" | grep -v grep | awk '{ print $2 }'`
kill $PID
/usr/local/bin/growlnotify --appIcon Automator Jekyll is Dead -m 'Long Live Jekyll.'

This is entered in as a shell script action, and is the only action in this workflow.

Sync Site

Once I’m certain everything looks good, I run the final Automator action to upload the site:

cd /Users/USERNAME/Sites/_site/
rsync -avz -e ssh . USERNAME@jonathanbuys.com:/home/USERNAME/jonathanbuys.com/ > /dev/null
/usr/local/bin/growlnotify --appIcon Automator Site Sync Complete -m 'Check it out.'

This is also a single Automator action workflow. You’ll notice that I use Growl to notify me that the script is finished. This is also not really necessary, but it’s fun anyway.

Like I said, there’s a lot of improvement yet to go, but I think it’s a solid start. I’m at a point now where I’m tempted to start writing a Wordpress import feature, which seems to be the only major piece missing from the Jekyll puzzle. I’m not sure what this would take just yet, but I’ve got a few ideas. I haven’t tried uploading any images or media yet, but since everything is static, I assume it would just be a matter of placing the image in a /images folder and embedding it in html. So far, I’m having a lot of fun, and that’s what blogging is really all about.

The Unix Love Affair

August 10, 2009

There’s been times when I’ve walked away from the command line, times when I’ve thought about doing something else for a living. There’s even been brief periods of time when I’ve flirted with Windows servers. However, I’ve always come back to Unix, in one form or another. Starting with Solaris, then OpenBSD, then every flavor of Linux under the sun, to AIX, and back to Linux. Unix is something that I understand, something that makes sense.

Back in ‘96 when I started in the tech field, I discovered that I have a knack for understanding technology. Back then it was HF receivers and transmitters, circuit flow and 9600 baud circuits. Now I’m binding dual gigabit NICs together for additional bandwidth and failover in Red Hat. The process, the flow of logic, and the basics of troubleshooting still remain the same.

To troubleshoot a system effectively, you need to do more than just follow a list of pre-defined steps. You need to understand the system, you need to know the deep internals of not only how it works, but why. In the past 13 years of working in technology, I’ve found that learning the why is vastly more valuable.

Which brings me back to why I love working with Unix systems again. I understand why they act the way that they do, I understand the nature of the behavior. I find the layout of the filesystem to be elegant, and a minimally configured system to be best. I know that there are a lot of problems with the FSH, and I know that it’s been mangled more than once, but still. In Unix, everything is configured with a text file somewhere, normally in /etc, but from time to time somewhere else. Everything is a file, which is why tools like lsof work so well.

Yes, Unix can be frustrating, and yes, there are things that other operating systems do better. It is far from perfect, and has many faults. But, in the end, there is so much more to love about Unix then there is to hate.

Previous Page: 27 of 33 Next