Tuesday, November 10, 2009

Git vs. SVN (I know, right, another one?!?!)

I've been endeavoring to set up a code repository (or even a document repository, if that need should arise) and have been weighing the merits of both Git and SVN.

At the very heart of the comparisons lies the manner in which Git and SVN operate. SVN uses a central repository. When you "checkout" from SVN, your working copy holds only the most recent version and none of the history. Should you need to compare against older revisions, you must communicate with the server to do so. SVN relies on the availability of the central repository to operate.

Git, on the other hand, is fully distributed. There is generally a "blessed repository" from which everyone will start and to which everyone ultimately pushes, but when you "clone" that repository you get a full copy of it, history included. Backing up a Git repository with many contributors is actually trivial, as every clone floating around is a complete copy of that repository.

Branching

Another major difference revolves around branching. In Git, branching is a way of life (as is the subsequent merging of branches). You want to develop a new feature? Branch on your local box and work on it there, then merge it back into your local main branch before pushing to the blessed repository.

This is not so in SVN. Branches must be created in the central repository, so branching is not done as often (nor as easily) and merging is more of a chore. In this area Git outshines SVN.

Client Tools

One area where Git does not outshine SVN is in the client tools. SVN has been around forever (in digital terms). There are very elegant clients for SVN (such as TortoiseSVN) which allow for an incredible ease of use when working with repositories. Further, most modern IDEs have SVN repository manipulation as a core capability. There are several options for working with SVN in Eclipse, for instance, one of which is core to Eclipse itself.

Git, on the other hand, is young. The tools out there are not nearly as elegant, nor are they as widespread. What's worse, Git is incredibly Linux-centered. There are two Windows clients for Git (with the advent of JGit, that will climb to three), all of which require one to work with the command line. Some GUI projects, such as TortoiseGit, are in the works but will not be ready for prime time for a while. The last issue here is that there is only limited integration with IDEs. With time, these situations will change, but for now it is a major drawback to adoption by those other than the most determined.

Ease of Setup

Because I wanted to work with both systems, I decided to set up both on our Windows Server 2003 machine. For Git, I chose to use Cygwin and OpenSSH, along with Gitosis (a Python tool for managing hosted Git repositories). I used Shannon Cornish's tutorial to set things up (along with a little help from scie.nti.st on matters Gitosis). This turned out to be a rather easy and relatively painless way to go about things.

The basic gist is that you install Git when you install Cygwin, then install and set up OpenSSH (by far the most difficult part). At this point you can connect to the server using SSH and clone any repository you would like. Installing Gitosis on top of things (recursively using Git, no less, which is so cool in my book) allows you to use public/private key pairs to authenticate users. You can then use Git to clone the control repository of Gitosis and administer the system remotely. It is a very elegant setup, and one which does not require prospective users to enter a password or to have an account created for them on the server.

Setting up SVN was more difficult, and the differences between the two setups are myriad. While the Git scheme above works over SSH, the method I chose for SVN works over HTTP/HTTPS, which has advantages all of its own. I worked off of several tutorials, but the most significant was this tutorial.

The really difficult part here is that you have to rely on Apache. It seems a bit of overkill to have to install Apache and get it running in order to serve up your repository, but this is the accepted way of doing things. Once you have it running, you must still log into the server to create a username/password combo for any user who wants to use the system, and you must also log into the server in order to administer the repository.

The Best of Both Worlds

I have to say that the thing which gets me most excited about Git is the notion of branching it carries with it. I really like the thought of creating a local branch for every new feature. It seems natural to me.

On the other hand, I don't think that I want to saddle everyone else around me with command line tools and vi if they want to work with our repositories. So, can a compromise be made?

In fact, it can! Git has the wonderful ability to clone and commit to SVN repositories (through its git svn command). The real details are outlined here by Clinton R. Nixon. In this way, I can take the pain of the command line on myself without foisting it on anyone else, but I also get all of the wonderful features Git brings with it.

Conclusion

In light of all of this, we will be hosting our repositories using SVN. However, I will be keeping an eye on the maturity of the Git clients. If they should ever advance to the level where any "power user" can pick them up, then we very well might switch.

Wednesday, October 7, 2009

A Thought on Environments: Portability

My friend asked me yesterday what I thought of the Kindle. My response was that I was a fan of actually holding a book, feeling the paper, reading in a full fidelity mode. I spoke of how tired my eyes could become from reading on a screen all day. I made a decent case for not adopting the Kindle.

Then, on a whim, I checked out the Kindle app for the iPhone and immediately found myself sucked in.

The first thing that did it was the free availability of a book that we have all been discussing here at the office, Bertrand Russell's "The Problems of Philosophy". Turns out that it is a "classic" and Amazon offers many of the classics for free. I have now downloaded 10 free classics for my iPhone Kindle app and am well on my way to finding book reading Nirvana.

Free content put the hook in my mouth, but what set it was the concept of WhisperSync. WhisperSync is a service that Amazon offers which will sync your content between devices. Now, this is not just the raw content, this is the detailed content, the state content.

For instance, let's say that I am on page 50 of "The Problems of Philosophy" on my iPhone. Further, let us say that I have a Kindle at home on which I do the bulk of my reading. As I read on the iPhone, the Kindle app updates the state for that book. When I get home and fire up my (physical) Kindle my place in the book comes right up (in computer terms, the state is restored). No futzing around with finding my place, everything is just magically the same.

This got me thinking about the synchronicity of environments. In our research the environment you work in is of paramount importance. That environment can be unique to you, or you can share it with others. Everyone can have their own, if necessary. The environment is at least partially reflected in software.

One thing Dr. Sousa-Poza and I talk about a lot is the ability to "save" environments. Environments should be transportable and shareable. If you need to see what I see then you should be able to load up a copy of my environment, see things just as I see them with the data I've been using. What's more, your environment should be able to subsume my environment! Environments should be nestable yet discrete.

WhisperSync brings an interesting possibility to my mind. Shouldn't the environmental changes that I enact on one device translate to another device? What if I access my environment from my iPhone and then switch to my laptop or a web browser? Shouldn't the environment be exactly the same?

This necessitates two things: A place of storage (centralized or decentralized, makes no difference) that all environments have in common and the ability to capture state.

Ideally, the system would operate like this (from the 10,000 foot view):

  1. I access my environment and make some change to its state.
  2. That change is transmitted to the "server" which keeps track of environment state. ("Server" is in quotation marks because it is meant to capture an idea; it need not be an actual server. It might be better to think of it as a state oracle.)
  3. I switch to a different device.
  4. As I start up my environment on the new device the state is restored from the "server".
  5. I continue my work.

Such a system would be truly powerful from a portability standpoint. I could do my work wherever I needed to be and have that work mirrored wherever I go. I could work from multiple devices and not lose a thing.
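
The five steps above can be sketched in a few lines of JavaScript. This is only a minimal illustration under stated assumptions: StateOracle and Device are hypothetical names invented here, and the "server" is just an in-memory map standing in for whatever centralized or decentralized store would really hold the state.

```javascript
// Hypothetical in-memory "state oracle"; a real one could be a server or a
// decentralized store.
class StateOracle {
  constructor() {
    this.states = new Map(); // environmentId -> latest saved state
  }
  save(environmentId, state) {
    // Store a deep copy so later local edits do not leak into the oracle.
    this.states.set(environmentId, JSON.parse(JSON.stringify(state)));
  }
  restore(environmentId) {
    const saved = this.states.get(environmentId);
    return saved === undefined ? {} : JSON.parse(JSON.stringify(saved));
  }
}

// A device restores the environment on startup and reports every change.
class Device {
  constructor(name, oracle, environmentId) {
    this.name = name;
    this.oracle = oracle;
    this.environmentId = environmentId;
    this.state = oracle.restore(environmentId); // step 4: restore on startup
  }
  change(key, value) {
    this.state[key] = value;                          // step 1: local change
    this.oracle.save(this.environmentId, this.state); // step 2: transmit it
  }
}

const oracle = new StateOracle();
const phone = new Device("iPhone", oracle, "my-environment");
phone.change("book", "The Problems of Philosophy");
phone.change("page", 50);

// Step 3: switch devices. The new device starts from the saved state.
const laptop = new Device("laptop", oracle, "my-environment");
console.log(laptop.state.page); // 50
```

Saving a copy of the whole state on every change keeps the sketch short; a real system would more likely transmit only the individual changes.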

Monday, August 10, 2009

An Analyst's Development Environment


Here in the land of academic research we're working with a "new" take on mashups. It seems like a no-brainer to me but a lot of people have expressed interest and surprise when I explain to them what we're doing. For now let's call it an analyst's development environment (ADE).

One thing that mashups are really, really good at is taking disparate data sources and allowing "momentary" relationships in the sources to be created. This in effect creates a new data source that is a fusion of the inputs. As is often the case in fusions, this new source tends to be more than just the sum of the parts. You often come up with new views on the data as you add extra sources.

Most people stop here at the fusion stage. Once they have the new view onto the data they rely on other tools outside the scope of a mashup to do interesting things. They might pipe that data into a tool such as Fusion Charts in order to visualize it or they might pipe it into an analysis tool such as a model or sim. But, why do they need to leave the scope of the mashup to do this? What if that analysis or the creation of the Fusion Charts XML was an automated part of the mashup itself?

Mashups deal with web services primarily (though there are some nifty products out there that allow you to mash more than just web services). A web service is usually considered to be a data source. In practice, though, web services are much more than that. Consider all of the specialized web services provided by Google for geolocation or Amazon for looking up aspects of books. The simplest example I can give you is Google's web service which converts an address to a latitude and longitude pair (called geocoding). With these in mind let's take a different look at web services. Let's look at them as processing units.

A processing unit has 3 criteria: it takes input; does something interesting with that input; and provides output. Processing units are the basis of modern programming. They're known as methods, functions, procedures, etc. depending on context. We can most often build bigger processing units from simpler units.

Web services fit these 3 criteria handily. You can easily provide input, they can easily do something interesting with that input, and they can just as easily provide output. All communication is done in a standardized, protocol-driven environment.

The interesting thing about web services is that we can string them together (with the right tools) rather easily into processes. That's exactly what we're doing here. Each web service is either a data source or a processing unit. Given the ability to ferry data from one web service to the next (in an easy way) it is possible to create mashups that do more than just mash data. They actually do some form of processing.
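
To make the idea of stringing processing units together concrete, here is a minimal sketch in JavaScript. Everything in it is an assumption made for illustration: fetchSales, sumTotals and toChartXml are hypothetical stand-ins for real web services, the data is made up, and the XML shape is invented rather than taken from any particular charting product. Real web service calls would also be asynchronous; plain functions are used here to keep the composition itself in view.

```javascript
// Each processing unit takes input, does something with it, and returns output.
// All three units below are hypothetical stand-ins for real web services.
const fetchSales = (region) => [
  { region: region, month: "Jan", total: 120 },
  { region: region, month: "Feb", total: 95 }
];
const sumTotals = (rows) => ({
  region: rows[0].region,
  total: rows.reduce((sum, row) => sum + row.total, 0)
});
const toChartXml = (summary) =>
  '<chart><set label="' + summary.region + '" value="' + summary.total + '"/></chart>';

// Build a bigger processing unit by feeding each unit's output to the next.
const pipeline = (...units) => (input) =>
  units.reduce((data, unit) => unit(data), input);

const salesChart = pipeline(fetchSales, sumTotals, toChartXml);
console.log(salesChart("East"));
// <chart><set label="East" value="215"/></chart>
```

The composed salesChart is itself a processing unit, so it could in turn be dropped into a larger pipeline.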

Consider what it would be like if you had a web service endpoint attached to a model. You could pre-mash your data from various sources, then run it all through the model and create a new output that would be very interesting. It would be so easy.

Using Presto we recently put together a demo which worked along these lines. It made our demo come together in several weeks rather than over several months. We used Presto to access databases then ferried that data (in XML format) into a custom built web service that took said data and ran XSL transforms on it. That produced Fusion Charts XML which we then piped into our presentation layer for visualization. It was easy.

Here is a diagram of what the actual flow of the mashup was.

Here is a screen shot of the actual chart produced by the generated Fusion Charts XML.


An ADE would work in a similar way. Using provided tools which allow for ferrying of data from one endpoint to another, and given a grab-bag of analysis and transformation web services, an analyst could create some amazing things with little effort or technical know-how. The only developer support would be in the creation of any custom web services. It could be a very powerful tool.

Wednesday, July 22, 2009

The Walled Garden

Let me hereby declare that I love my iPhone. It is useful and wonderful and keeps me connected all the time. I have been using it in lieu of my computer at home for quite some time now. I write emails on it, craft witty 140 character tweets on a regular basis, listen to books on iPod and even play extremely enjoyable games. It is a great experience.

I have, however, begun to chafe under the strictures placed on my iPhone by both AT&T and Apple (often in conjunction with each other).

My gripes against AT&T are especially aggravating. I pay them enough money as it is for the privilege of using my iPhone, why do I need to pay them even more in order to use my iPhone as a modem? It does not seem fair that I will have to shell out an additional $30/month to do what is freely available on other, older and less capable smart phones.

What's worse, if AT&T sees an app as competitive to its business model, it will limit that app, or flat out deny it! Consider Skype: Skype offers free calls over the Internet to other Skype users, yet AT&T will not allow Skype to make calls over its 3G or EDGE networks. They pull the undue competition card.

On the Apple front, a nifty app came to my attention recently that I thought was a truly innovative and awesome use of the iPhone. Given an iPhone 3GS (with its video capabilities, compass and GPS) an "Augmented Reality" app has been developed called TwitARound.

TwitARound looks at the tweets from Twitter in your area and plots them on a map. The AR part, though, comes when you hold your phone up. The app takes your GPS position and your bearing from the compass and lays the tweet on the screen. So, as you move in a circle with your iPhone in front of your face, you can see the actual locations on your iPhone of the tweets as they would appear if the tweets were layered over real life. It's quite awesome and I would like to see more apps like this.
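
For the curious, the geometry behind this kind of overlay can be sketched in JavaScript. This is not TwitARound's actual code, which I have not seen; it is a minimal illustration using the standard initial-bearing formula, and the field of view, screen width and coordinates are all assumed values.

```javascript
// Convert between degrees and radians.
const toRad = (deg) => (deg * Math.PI) / 180;
const toDeg = (rad) => (rad * 180) / Math.PI;

// Initial great-circle bearing from the viewer to a target, in degrees
// clockwise from north (0 up to, but not including, 360).
function bearingTo(viewer, target) {
  const lat1 = toRad(viewer.lat);
  const lat2 = toRad(target.lat);
  const dLon = toRad(target.lon - viewer.lon);
  const y = Math.sin(dLon) * Math.cos(lat2);
  const x = Math.cos(lat1) * Math.sin(lat2) -
            Math.sin(lat1) * Math.cos(lat2) * Math.cos(dLon);
  return (toDeg(Math.atan2(y, x)) + 360) % 360;
}

// Horizontal pixel position for a target, or null when it falls outside the
// camera's field of view. fovDegrees and screenWidth are assumed values.
function screenX(bearing, heading, fovDegrees, screenWidth) {
  // Angle of the target relative to where the phone points, from -180 to 180.
  const relative = ((bearing - heading + 540) % 360) - 180;
  if (Math.abs(relative) > fovDegrees / 2) return null;
  return (relative / fovDegrees + 0.5) * screenWidth;
}

const me = { lat: 36.8855, lon: -76.3058 };    // hypothetical viewer position
const tweet = { lat: 36.8955, lon: -76.3058 }; // a tweet due north of the viewer
const bearing = bearingTo(me, tweet);
console.log(screenX(bearing, 0, 60, 320));   // facing north: 160, the centre
console.log(screenX(bearing, 180, 60, 320)); // facing south: null, it is behind you
```

The GPS supplies the viewer position, the compass supplies the heading, and as you turn in a circle the heading changes and each tweet slides across the screen.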

However, because TwitARound accesses APIs which Apple has not, but should have, made public, it cannot be published in the iTunes store.

Apple plays the non-public API card too much. For instance, they did not make their "find my phone" APIs public so that they could charge you a monthly fee through mobileMe. There are already jailbroken apps which can do this, but since they didn't make the APIs public, you won't see legitimate apps show up in the app store.

Call me naive or non-business-savvy, but all of this seems like bad business to me. As a consumer, I want freedom. It's my device, I should be able to do with it as I choose.

So, while I love my iPhone, I chafe. Yes, I chafe.

Update: (on 7/29/09)

First off, it turns out that Apple will release the video camera APIs with iPhone OS 3.1 (per Ars here). Yay for Apple on this one. It's good to see that some of the "hidden" functionality is being exposed. Now, let's see if they expose the "find my phone" API or if they milk it for more money.

Secondly, the app denial shenanigans continue. In a story here (also on Ars) it appears that all apps relating to Google Voice are being pulled and any apps which feature Google Voice are being denied. The scuttlebutt is that AT&T is pulling the strings here. Some disagree, but my vote goes towards AT&T.

The Dark Side of Twitter

I've seen a very interesting phenomenon going on in the Twitter-verse recently. It has brought to my attention that Twitter (and micro-blogging in general) can be used for purposes that are not aboveboard. What, pray tell, is this dark and nefarious phenomenon?

I keep getting followed by prostitutes.

The first time it happened I just thought it was some random individual with a sick sense of self. However, the next day, another woman of the same ilk followed me, and the next day another. That's when I started getting curious (not about what the women offered, but about what was really going on).

Invariably, they all posted a provocative picture of a woman with at least one post which was anywhere from lewd to slightly suggestive. That post would have a link attached. The link takes you to some triple-X "dating" service. Within a couple of days the account is shut down (you get the "Nothing to see here, move along" message when you try to visit the account).

No doubt, for some reason I am not aware of, my Twitter user name has been picked up by this "dating" service and they keep following me with fake accounts, all in vain hopes of promoting their "service". It's all at least partly automated, it has to be, and there's probably one person sitting behind a desk creating profiles, then running those profiles through some tool they had custom made to follow a few thousand people.

The practice, though, really brings questions to my mind about what Twitter can't be used for. If it can be used for prostitute marketing, why not black-market marketing or subversive political marketing? Why even marketing at all? I once had the privilege of speaking with an individual who detailed how an anarchist group used Twitter to attempt to disrupt the RNC in Colorado.

Of course, far from being upset by all of this I tend to think of this as rather ingenious. What uses can Twitter serve? What's the most creative use any of you have seen?

Monday, April 27, 2009

JavaScript: Callbacks in Loops

I just finished a mashup that had to be blogged about. I suffered to find this solution, and I wanted to share what I learned with the world.

In the mashup I took a twitter feed and plotted the tweets onto a map based on the location of the tweeter. Let me set the stage.

The Google Map has already been set up and the list of tweets has been obtained. It is now time to plot the tweets onto the map. This will be done within a function called addMarkers. The HTTP Geocoder that Google provides will be doing our geocoding. For more information on this service, see this.

Keep in mind that I'm doing all of this in a Presto Mashlet, and will be calling out to the HTTP Geocoder via a URLProxy call that is undocumented but available for use.

At first blush, the following approach seems appropriate. Here is an excerpt from the addMarkers function:



However, this suffers from a very serious drawback, and that drawback revolves around scope. Remember that you are calling out and receiving an asynchronous response via the callback. There's no telling where the loop will be when a callback returns, but every callback is a closure over the variables of addMarkers, so that single scope is kept alive, and shared, until all of the callbacks have completed.

When a callback returns, the current value of i will be used to index into tweets! Since all of these calls take time, the most common result is that i will actually be out of bounds of tweets. Recall that updating the loop variable is the last operation done in each pass of a JavaScript for loop. Once you have looped through all of your indexes, i must, of necessity, be out of bounds of tweets. Therefore, i will be equal to tweets.length.

The result is that you pass an undefined object into placeMarker in place of what should have been the tweet.
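
A minimal sketch of this failure is given below. It is not the original Mashlet code: geocode is a hypothetical stand-in for the HTTP Geocoder request made through URLProxy, the sample tweets are made up, and placeMarker is a plain function rather than a method. The stand-in queues its callbacks and runs them only when flushPending is called, which mimics responses arriving after the loop has finished.

```javascript
// Hypothetical stand-in for the asynchronous geocoder request: callbacks are
// queued and only run when flushPending() is called, after the loop ends.
var pending = [];
function geocode(location, callback) {
  pending.push(function () { callback({ location: location }); });
}
function flushPending() {
  while (pending.length > 0) { pending.shift()(); }
}

// Records which tweet each marker was given (undefined if there was none).
var placed = [];
function placeMarker(point, tweet) {
  placed.push(tweet ? tweet.text : undefined);
}

function addMarkersNaive(tweets) {
  for (var i = 0; i < tweets.length; i++) {
    geocode(tweets[i].location, function (point) {
      // Every callback shares the same i, which is tweets.length by now.
      placeMarker(point, tweets[i]);
    });
  }
}

addMarkersNaive([
  { text: "first", location: "Norfolk, VA" },
  { text: "second", location: "Denver, CO" },
  { text: "third", location: "Austin, TX" }
]);
flushPending();
console.log(placed); // [ undefined, undefined, undefined ]
```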

The next logical step is that you should create a variable to hold the value of i, like this:

var myTweet = i;
...
this.placeMarker(point, tweets[myTweet]);

However, this will fail as well!

The problem here is that myTweet is still within the scope of our addMarkers function. Variables declared with var are scoped to the function, not to the body of the loop, so addMarkers will have only one copy of myTweet. Once again, you end up in a situation where the loop will probably finish before any of the callbacks return. The net result this time, however, is slightly different. You will pass a valid tweet to placeMarker, but it will be the last tweet in every instance. You'll have the same tweet attached to all of your markers on the map: the last tweet in the list.
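
The same kind of sketch shows this second failure. As before, geocode is a hypothetical stand-in for the real asynchronous request, its callbacks are held back until flushPending is called, and the tweets are made up.

```javascript
// Hypothetical stand-in for the asynchronous geocoder request.
var pending = [];
function geocode(location, callback) {
  pending.push(function () { callback({ location: location }); });
}
function flushPending() {
  while (pending.length > 0) { pending.shift()(); }
}

// Records which tweet each marker was given (undefined if there was none).
var placed = [];
function placeMarker(point, tweet) {
  placed.push(tweet ? tweet.text : undefined);
}

function addMarkersWithCopy(tweets) {
  for (var i = 0; i < tweets.length; i++) {
    var myTweet = i; // function-scoped: one myTweet shared by every pass
    geocode(tweets[i].location, function (point) {
      placeMarker(point, tweets[myTweet]);
    });
  }
}

addMarkersWithCopy([
  { text: "first", location: "Norfolk, VA" },
  { text: "second", location: "Denver, CO" },
  { text: "third", location: "Austin, TX" }
]);
flushPending();
console.log(placed); // [ 'third', 'third', 'third' ]
```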

So, how do you remove the timing issues? This is where I suffered. I hunted and pecked out half-solutions for quite a while. Finally, I had to start thinking outside of the normal box to come up with a solution.

The whole problem revolves around all of the callbacks sharing one scope, that being the scope of addMarkers. Once you consider it that way, it becomes obvious that what is needed is to give each callback its own scope. The way to do that is to have a separate function fire off the HTTP Geocoder request. Every call to that function gets its own scope, with its own copies of its arguments. Let addMarkers maintain the loop and call this function whenever it wants to fire off a request. Pass in the tweets and the desired value of i to be remembered.

Consider the following:

This approach will result in the correct tweet being displayed with the correct marker on the map.
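
A minimal sketch of this approach follows. The helper name requestMarker is hypothetical, geocode is again a stand-in for the real asynchronous request with its callbacks held back until flushPending is called, and placeMarker is a plain function rather than a method.

```javascript
// Hypothetical stand-in for the asynchronous geocoder request.
var pending = [];
function geocode(location, callback) {
  pending.push(function () { callback({ location: location }); });
}
function flushPending() {
  while (pending.length > 0) { pending.shift()(); }
}

// Records which tweet each marker was given (undefined if there was none).
var placed = [];
function placeMarker(point, tweet) {
  placed.push(tweet ? tweet.text : undefined);
}

// Each call gets its own scope, so each callback remembers its own index.
function requestMarker(tweets, index) {
  geocode(tweets[index].location, function (point) {
    placeMarker(point, tweets[index]);
  });
}

function addMarkers(tweets) {
  for (var i = 0; i < tweets.length; i++) {
    requestMarker(tweets, i); // the current value of i is copied into index
  }
}

addMarkers([
  { text: "first", location: "Norfolk, VA" },
  { text: "second", location: "Denver, CO" },
  { text: "third", location: "Austin, TX" }
]);
flushPending();
console.log(placed); // [ 'first', 'second', 'third' ]
```

In JavaScript engines that support ES2015, declaring the loop variable with let instead of var gives each pass its own binding and has the same effect without the helper function.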

Wednesday, April 15, 2009

Como Se Llama?


Originally, I created my twitter account with the handle @jitlife. Obviously, jitlife is my blog, so I thought it made sense. After all, I want people reading my blog, right? Twitter seemed like a good pointer to my blog.

However, I started rethinking this mindset and eventually asked myself this question: Am I marketing my blog, or am I marketing me? By being @jitlife, I was marketing my blog. Therefore, I determined to change my Twitter handle.

In deciding on my new twitter handle, I came up with a few criteria:
  1. It has to be short
  2. It has to reference me as an individual

I was going to add a third, that it had to be clever, but the more that I thought about this, the more I realized that the first 2 are the most important, and in that order.

It has to be short because Twitter allows only 140 characters. If someone is @replying to me and they have to type in a 15-character handle, well, they'll be less inclined to do so (at least from a mobile device) and they'll also have less space to say what they want to say.

That it must reference me is quite obvious once you realize that I'm marketing myself. The problem here is that all of the obvious references to me were taken! @rollins, @mrollins, @mikerollins, etc. All, gone. Most were taken and had only one or two posts, which is frustrating, but so is life.

Barring the obvious, I decided to get clever. I chose @rollinsio. Briefly, it's a silly name I call myself when I'm talking in a fake Spanish accent but it's also clever in that it could stand for Rollins I/O: perfect for twitter! It's short and it references me (rollins is prominent).