The millionth Twitter parser

28 July 2009 –

For some reason, what I get contacted about most on my site is the latest-tweet box in the right column:

I’m trying to show my latest tweet on my website like yours. How do you make the names and links show up like that?

Conceptually it’s very simple. If you like Regular Expressions, the implementation is very simple too. If you don’t, those bits might be described as hellish. Here’s the workflow:

  • Fetch latest tweet via the Twitter API
  • Find all @name references, look up their real name, and replace with a link
  • Find all URLs (“http(s)://…”) and replace with a link to that URL
  • Find all hashtags (#hashtag) and replace with a link to Twitter Search
  • If the tweet starts with one or more @names, wrap them in a “To {@name(s)}:” salutation

Simple! Right? Right. So we take a tweet which looks like:

@ballance @bspotter a summary of my view on #sharepoint - http://rexmorgan.net/media/spcomparison.jpg

And format it to look like:

<span class="salutation">To 
    <a href="http://twitter.com/ballance">Chris</a>,
    <a href="http://twitter.com/bspotter">Brandon</a>:
</span>
a summary of my view on <a href="http://twitter.com/#search?q=%23sharepoint">#sharepoint</a>
 - <a href="http://rexmorgan.net/media/spcomparison.jpg">http://rexmorgan.net/media/spcomparison.jpg</a>

<a class="date" href="http://twitter.com/rexm/status/2772241681">— <span>8:06 PM yesterday</span></a>
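The regular-expression steps above might look roughly like this in C#. This is a sketch under stated assumptions, not the original code: TweetFormatter and Link are hypothetical names, the Twitter API name lookup is replaced by a plain dictionary, and the patterns are deliberately simple (trailing punctuation ends up inside a URL, for instance). In real use you would also HTML-encode the tweet text before adding markup.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sketch of the regex steps; the real-name lookup is a dictionary here.
public static class TweetFormatter
{
    // Deliberately simple patterns: a URL runs until whitespace, and @names and
    // #hashtags must start the tweet or follow whitespace.
    private static readonly Regex Url = new Regex(@"https?://[^\s<]+", RegexOptions.IgnoreCase);
    private static readonly Regex Hashtag = new Regex(@"(?<=^|\s)#(\w+)");
    private static readonly Regex Mention = new Regex(@"(?<=^|\s)@(\w+)");
    private static readonly Regex LeadingMentions = new Regex(@"^(?:@\w+\s+)+");

    public static string Format(string tweet, IDictionary<string, string> realNames)
    {
        // Any @names at the very start become the "To ...:" salutation.
        Match lead = LeadingMentions.Match(tweet);
        string body = tweet.Substring(lead.Length);

        // URLs go first: run later, this pattern would match inside the
        // href attributes that the hashtag step writes.
        body = Url.Replace(body, m => "<a href=\"" + m.Value + "\">" + m.Value + "</a>");
        body = Hashtag.Replace(body, m =>
            "<a href=\"http://twitter.com/#search?q=%23" + m.Groups[1].Value + "\">#" + m.Groups[1].Value + "</a>");
        body = Mention.Replace(body, m => Link(m.Groups[1].Value, realNames));

        if (!lead.Success)
        {
            return body;
        }

        IEnumerable<string> recipients = Mention.Matches(lead.Value).Cast<Match>()
            .Select(m => Link(m.Groups[1].Value, realNames));
        return "<span class=\"salutation\">To " + string.Join(", ", recipients) + ":</span> " + body;
    }

    // Links a screen name to its profile, showing the real name when one is known.
    private static string Link(string screenName, IDictionary<string, string> realNames)
    {
        string text = realNames.TryGetValue(screenName, out string realName) ? realName : "@" + screenName;
        return "<a href=\"http://twitter.com/" + screenName + "\">" + text + "</a>";
    }
}
```

Fed the example tweet with a dictionary mapping ballance to Chris and bspotter to Brandon, this produces the salutation, hashtag link and URL link shown above (on one line, without the date link).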

The code’s a bit too long to paste in this format, so I got to take Rick Strahl’s sweet CodePaste.NET for a spin and dropped the code here.

Two things to keep in mind: (1) Twitter can sometimes be slow. Multiple lookups to get people’s real names mean this can be even slower. Now, Twitter is pretty cool, but it’s not worth making your audience wait for it to time out. It’s not worth making your audience wait at all. Stick all this code in an HttpHandler and include it as a script at the bottom of your page, so it will execute after your page is safely in your user's browser. If it fails, you'll just have a blank spot on your page:

<script type="text/javascript" src="twitter.ashx"></script>
</body>

(2) Twitter's REST API has a 150-request-per-hour limit from a single account or IP. Cache your tweet results long enough to stay under that limit, or you will be blacklisted! I cache mine for two minutes. Cheers!
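Some arithmetic on that limit: a two-minute cache means at most 60 / 2 = 30 refreshes an hour. If each refresh costs one request for the tweet plus one real-name lookup for each of, say, four @names, that is 30 × 5 = 150 requests, the entire hourly budget, so caching the name lookups as well is worth considering. A minimal time-based cache might look like this; TimedCache is a hypothetical helper (not an ASP.NET class), and the clock is injectable only so the expiry logic can be tested.

```csharp
using System;

// Hypothetical helper: caches the result of an expensive fetch for a fixed lifetime.
public sealed class TimedCache<T>
{
    private readonly Func<T> _fetch;
    private readonly TimeSpan _lifetime;
    private readonly Func<DateTime> _now;
    private readonly object _sync = new object();
    private T _value;
    private DateTime _expiresAt = DateTime.MinValue;

    public TimedCache(Func<T> fetch, TimeSpan lifetime, Func<DateTime> now = null)
    {
        _fetch = fetch;
        _lifetime = lifetime;
        _now = now ?? (() => DateTime.UtcNow);
    }

    public T Value
    {
        get
        {
            // The lock ensures concurrent requests trigger at most one fetch per expiry.
            lock (_sync)
            {
                DateTime now = _now();
                if (now >= _expiresAt)
                {
                    // If the fetch throws, the exception propagates and nothing is cached.
                    _value = _fetch();
                    _expiresAt = now + _lifetime;
                }
                return _value;
            }
        }
    }
}
```

Inside the HttpHandler, a static TimedCache<string> wrapped around whatever method fetches and formats the tweet would let every request in a two-minute window share one round trip to Twitter.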

Rotate Image Hue in C#/.NET

3 July 2009 –

I am on vacation at the beach with my wife and a few friends, and was perusing Stack Overflow at the house while waiting for everyone to get ready to go out. I ran across a question about how to rotate hue in code, something I had wondered about before but never really needed to do. Here's what I started with:

  • I do enough design work to know the difference between HSB (also called HSV) and RGB, so I know changing the hue is extremely easy if I have the HSB value (the "H" stands for Hue!).
  • I also know GDI+ in .NET works primarily in (A)RGB. I can iterate over the pixels of a Bitmap object and read and write each pixel's RGB value with GetPixel and SetPixel.
  • There is a mathematical way to convert an RGB value to HSB and vice-versa, but I have no idea what that is.

A few minutes on Google yielded the equations, so I slammed this together in a few minutes more. The basic workflow is:

  • Load an image and an amount to shift the hue by
  • Iterate over all the pixels in the image
  • For each pixel, convert its RGB value to HSB, change the hue, convert back to RGB, set the pixel's new RGB value
  • Save the image

Here's a sample input/output:

[Images: Original | Hue Shifted]

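Here's the central code, which changes the hue for a single pixel. The helper class it uses works in HLS rather than HSB, but hue is defined the same way in both models, so the idea is unchanged: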


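// HLSRGB is an RGB <-> HLS conversion helper class; it is not part of System.Drawing.
private Color CalculateHueChange(Color oldColor, float hue)
{
    // Convert the pixel's RGB value to HLS so the hue can be changed directly.
    HLSRGB colorConverter = new HLSRGB(
        oldColor.R,
        oldColor.G,
        oldColor.B);

    // Shift the hue. Depending on how HLSRGB handles values past the end of its
    // hue range, you may need to wrap the sum back into that range yourself.
    float startHue = colorConverter.Hue;
    colorConverter.Hue = startHue + hue;

    // Convert back to RGB, keeping the original alpha so transparency survives.
    return Color.FromArgb(
        oldColor.A,
        colorConverter.Red,
        colorConverter.Green,
        colorConverter.Blue);
}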
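For anyone who would rather see the equations than a helper class, here is a self-contained sketch of the same per-pixel step using the standard HSV (HSB) conversion formulas. HueRotation and RotateHue are hypothetical names, and it works on a bare RGB triple instead of System.Drawing.Color so it runs without GDI+.

```csharp
using System;

public static class HueRotation
{
    // Rotates the hue of an RGB color by `degrees`, keeping saturation and value.
    public static (byte R, byte G, byte B) RotateHue(byte r, byte g, byte b, float degrees)
    {
        // RGB -> HSV, with each channel scaled to 0..1.
        float rf = r / 255f, gf = g / 255f, bf = b / 255f;
        float max = Math.Max(rf, Math.Max(gf, bf));
        float min = Math.Min(rf, Math.Min(gf, bf));
        float delta = max - min;

        float hue = 0f; // grays have no hue; 0 is an arbitrary choice
        if (delta > 0f)
        {
            if (max == rf) hue = 60f * (((gf - bf) / delta) % 6f);
            else if (max == gf) hue = 60f * (((bf - rf) / delta) + 2f);
            else hue = 60f * (((rf - gf) / delta) + 4f);
        }
        float saturation = max == 0f ? 0f : delta / max;
        float value = max;

        // Shift the hue and wrap it into [0, 360), including negative shifts.
        hue = ((hue + degrees) % 360f + 360f) % 360f;

        // HSV -> RGB: pick the 60-degree sector, then add the gray offset m.
        float c = value * saturation;
        float x = c * (1f - Math.Abs((hue / 60f) % 2f - 1f));
        float m = value - c;
        float r1, g1, b1;
        switch ((int)(hue / 60f))
        {
            case 0: r1 = c; g1 = x; b1 = 0f; break;
            case 1: r1 = x; g1 = c; b1 = 0f; break;
            case 2: r1 = 0f; g1 = c; b1 = x; break;
            case 3: r1 = 0f; g1 = x; b1 = c; break;
            case 4: r1 = x; g1 = 0f; b1 = c; break;
            default: r1 = c; g1 = 0f; b1 = x; break;
        }
        return (ToByte(r1 + m), ToByte(g1 + m), ToByte(b1 + m));
    }

    // Clamps a 0..1 channel and scales it back to a byte.
    private static byte ToByte(float channel) =>
        (byte)Math.Round(Math.Max(0f, Math.Min(1f, channel)) * 255f);
}
```

Inside CalculateHueChange you would pass oldColor.R, G and B and build the result with Color.FromArgb(oldColor.A, r, g, b). Pure red shifted by 120 degrees comes out pure green, and by -120 degrees pure blue.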

Not exactly rocket science, but interesting to me and presumably the person who asked the question. Here's the source. Disclaimer: This is *not* an efficient way to do any kind of image manipulation. It is just a proof-of-concept. Cheers!

Fixing ASP.NET “Keep Alive”

18 June 2009 –

One issue that has plagued ASP.NET developers for several years now is IIS’ default behavior of shutting down the worker process after a period of inactivity (20 minutes by default). Since ASP.NET’s biggest bottleneck is usually application startup (compiling ASPX files and loading assemblies into the AppDomain), the first visitor to your site after 20+ minutes of no activity can wait a very long time for that page to come back. Embarrassingly long, for you. Subsequent pages come through at the speed you know your site is capable of, but the first impression is done and gone.

If you control the web server, this isn’t a big deal. A simple config change in IIS6 or IIS7 will keep the worker process alive – and your site code in memory, waiting to be served – indefinitely.

But plenty of sites run on shared hosting. In that case, you don’t have any real access to IIS settings. It’s also not really in the interest of your shared hosting provider to offer to keep your site in memory any more than it absolutely needs to be. So what to do? There is a handful of services which offer to hit your site every x minutes to keep IIS from killing your application. Those just don’t seem right, though. Keeping an application alive should be something we have very granular control over – not some one-off service that might disappear tomorrow.

So I wrote a little class that will let an ASP.NET application decide when and whether to “keep alive” itself. Nothing special, really – it puts an object into cache for 10 minutes, and when the 10 minutes are up, the object requests a page in your site through IIS and re-inserts itself into the cache for 10 more minutes.

When your application wants to keep itself alive, it calls KeepAlive.Start(url). Just pass in any URL in your site (e.g. "http://www.rexmorgan.net/journal"). You can check IsKeepingAlive to see whether your app is currently keeping itself alive. If you want to stop, call KeepAlive.Stop(). (Of course, that won’t actually kill your app; it just allows IIS to shut down the process once it goes idle.)

Here’s the code. If you find it useful, let me know! Cheers:


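using System;
using System.Net;
using System.Web;
using System.Web.Caching;

public class KeepAlive
{
    private static KeepAlive instance;
    private static readonly object sync = new object();
    private readonly string _applicationUrl;
    private readonly string _cacheKey;

    private KeepAlive(string applicationUrl)
    {
        _applicationUrl = applicationUrl;
        // A unique key so this entry can't collide with anything else in the cache.
        _cacheKey = Guid.NewGuid().ToString();
    }

    public static bool IsKeepingAlive
    {
        get
        {
            lock (sync)
            {
                return instance != null;
            }
        }
    }

    public static void Start(string applicationUrl)
    {
        lock (sync)
        {
            // Check inside the lock so two concurrent callers can't both start.
            if (instance != null)
            {
                return;
            }
            instance = new KeepAlive(applicationUrl);
            instance.Insert();
        }
    }

    public static void Stop()
    {
        lock (sync)
        {
            // Nothing to stop if Start was never called (or Stop already ran).
            if (instance == null)
            {
                return;
            }
            HttpRuntime.Cache.Remove(instance._cacheKey);
            instance = null;
        }
    }

    // Called by ASP.NET whenever the cache entry is removed, for any reason.
    private void Callback(string key, object value, CacheItemRemovedReason reason)
    {
        // Only re-arm on expiry; Removed means Stop() took the entry out.
        if (reason != CacheItemRemovedReason.Expired)
        {
            return;
        }

        FetchApplicationUrl();

        lock (sync)
        {
            // Stop() (or Stop() then Start()) may have run while we were fetching.
            if (instance == this)
            {
                Insert();
            }
        }
    }

    private void Insert()
    {
        HttpRuntime.Cache.Add(_cacheKey,
            this,
            null,
            Cache.NoAbsoluteExpiration,
            TimeSpan.FromMinutes(10),          // sliding expiration; nothing reads the entry, so it expires 10 minutes after insertion
            CacheItemPriority.NotRemovable,    // don't let memory pressure evict the entry and silently end the loop
            this.Callback);
    }

    // Requests a page through IIS, which counts as activity and resets the idle timer.
    private void FetchApplicationUrl()
    {
        try
        {
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(_applicationUrl);
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            {
                HttpStatusCode status = response.StatusCode;
                // log status
            }
        }
        catch (Exception ex)
        {
            // log ex; the next expiry will simply try again
        }
    }
}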
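The cache-expiration callback is one way to get a task that reschedules itself. For comparison, here is the same loop sketched with System.Threading.Timer instead of HttpRuntime.Cache. This is a swapped-in technique, not the class above, and SelfReschedulingPinger is a hypothetical name; the ping delegate plays the role of FetchApplicationUrl. The timer alone does not stop IIS from shutting down an idle process; only the HTTP request it triggers through IIS does.

```csharp
using System;
using System.Threading;

// Hypothetical sketch: a one-shot timer that re-arms itself after each ping,
// the same shape as KeepAlive's expire -> fetch -> re-insert loop.
public sealed class SelfReschedulingPinger : IDisposable
{
    private readonly Action _ping;
    private readonly TimeSpan _interval;
    private readonly object _sync = new object();
    private readonly Timer _timer;
    private bool _stopped;

    public SelfReschedulingPinger(Action ping, TimeSpan interval)
    {
        _ping = ping;
        _interval = interval;
        // Create the timer disarmed, then arm it, so Callback never sees a null _timer.
        _timer = new Timer(Callback, null, Timeout.InfiniteTimeSpan, Timeout.InfiniteTimeSpan);
        _timer.Change(_interval, Timeout.InfiniteTimeSpan);
    }

    private void Callback(object state)
    {
        try
        {
            _ping();
        }
        catch (Exception)
        {
            // A failed ping must not end the loop; log it in real code.
        }

        lock (_sync)
        {
            // Re-arm for one more interval unless Dispose has been called.
            if (!_stopped)
            {
                _timer.Change(_interval, Timeout.InfiniteTimeSpan);
            }
        }
    }

    public void Dispose()
    {
        lock (_sync)
        {
            _stopped = true;
            _timer.Dispose();
        }
    }
}
```

Re-arming only after a ping finishes (rather than using a fixed period) means a slow request can never overlap the next one, which is the same property the cache-callback version has.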

Stack Overflow Search Provider

2 March 2009 –

Stack Overflow is a fantastic site. From the standpoint of a consumer (me), it’s a huge success on nearly all fronts. Kudos to Jeff and Joel and the team for the great work. However, I’ve noticed one glaring piece of suck on the site – the search. We are way too spoiled by Google to put up with these circa-2000-quality search results. So, until they fix it (given their track record so far, I expect it to eventually excel), I have taken to using Google’s index of the site. Typing site:stackoverflow.com gets old though, so to save me (and now, you) the extra 23 keystrokes, I’ve made a Stack Overflow Google Search provider for Firefox:


<SearchPlugin xmlns="http://www.mozilla.org/2006/browser/search/">
     <ShortName>Stack Overflow</ShortName>
     <Description>Stack Overflow Google Search</Description>
     <InputEncoding>UTF-8</InputEncoding>
     <Image height="16" width="16">data:image/x-icon;base64,R0lGODlhEAAQAMQAAGZmZu6+gNhWI+aeQfvv35lmM8+4oOB4T/fd0+CGEbuZd9bCrf///+HClvPPoOSWMbSPaeWObOiae9thMvro4vffwOTWyequYPj18f337/XXsOimUAAAAAAAAAAAAAAAACH5BAAHAP8ALAAAAAAQABAAAAVhICOOYjYJCKmSkjBRaxwJRxZXzpgdQhRviYdGRDlJVoRLIjgsTmyrTOARrCBSJMPCgmEQNINNrEAmKwyNSsyggJTJKgBAhbEsDAy5SC/v9/NzgIB+fIF6MXuGhIuHjIwMIQA7</Image>
     <Url template="http://www.google.com/search" method="GET" type="text/html">
         <Param value="site:stackoverflow.com {searchTerms}" name="q" />
         <Param value="utf-8" name="ie" />
         <Param value="utf-8" name="oe" />
         <Param value="t" name="aq" />
     </Url>
     <SearchForm>http://www.google.com/search?q=site%3Astackoverflow.com</SearchForm>
</SearchPlugin>

Just create a new .xml file in your Mozilla Firefox\searchplugins folder, paste the above in, and restart Firefox. You should see something like this:

[Screenshot: the Stack Overflow provider in Firefox's search box]
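To see what the plugin's Url and Param elements add up to, here is a hypothetical C# helper that builds a comparable Google query URL: the search terms are prefixed with site:stackoverflow.com and sent as the q parameter alongside the fixed ie, oe and aq values. Firefox's own encoding may differ in details, for example in how it encodes spaces.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StackOverflowSearch
{
    // Mirrors the plugin's <Url>/<Param> elements: a GET to Google whose
    // q parameter is the user's terms prefixed with "site:stackoverflow.com".
    public static string BuildUrl(string searchTerms)
    {
        var parameters = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("q", "site:stackoverflow.com " + searchTerms),
            new KeyValuePair<string, string>("ie", "utf-8"),
            new KeyValuePair<string, string>("oe", "utf-8"),
            new KeyValuePair<string, string>("aq", "t"),
        };

        // Percent-encode each name and value (":" becomes %3A, spaces become %20).
        string query = string.Join("&", parameters.Select(p =>
            Uri.EscapeDataString(p.Key) + "=" + Uri.EscapeDataString(p.Value)));
        return "http://www.google.com/search?" + query;
    }
}
```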

Code Name Velocity

4 June 2008 –

At TechEd 2008 (which unfortunately I was not able to attend, but a few colleagues are providing me continuous updates), Microsoft announced a lot of things that are really important to me. New breakthroughs in voice recognition are always cool, but I am a web guy at my core so that's what turns my crank.

I have a few side projects that need a distributed in-memory cache. These projects are all in .NET, so my obvious choices were SharedCache, ScaleOut StateServer, NCache, or memcached Win32 with one of several .NET clients. Memcached is a lot more free than ScaleOut StateServer and NCache and better regarded than SharedCache, so last week I started a little personal project of porting memcached to .NET. My C is pretty rusty, but relatively speaking memcached is pretty simple, and it seems like a managed version could perform quite well. Since I know C# better than English, a managed port would make enhancements and customizations to my caching layer much easier.

Thankfully I didn't burn too much time on memcached.NET before Microsoft announced Velocity, a distributed cache written to natively support the .NET stack. After digging into the CTP code, I came away very impressed. In the past I've viewed the caching layer of an application very simplistically from a usage perspective: a persistent dictionary of keys and frequently used objects. Distributing it across multiple machines adds complexity to the service but not to its consumers. Velocity has some features with huge implications for the kind of value a cache can deliver:

  • Lookups by tag. This means one thing to me: multi-dimensional object keying. No more ugly pointers or multiple copies of an object per key. This increases the value of the caching layer by orders of magnitude over a simple key-value pair table.
  • In-process storage. With a typical cache service, our cache will store a serialized copy of User::Bob in memory. Bob is very popular in our application, so his user object is deserialized into the application process on almost every request. This is wasteful and contributes to memory pressure since we are essentially storing the same object at least twice - sometimes more. In-process storage allows heavily used items like User::Bob to stay in the application process space instead of being serialized to the cache service.
  • Out-of-the-box session provider. Velocity includes a SessionStateStoreProvider implementation out of the box: a high-performance, high-availability state server for free, without putting any pressure on your SQL box (which is probably why you are considering Velocity in the first place).
  • Atomic operations & concurrency management. Optimistic locking with versioning opens the door to much more sophisticated functionality than would otherwise be practical with the typical pessimistic locking. I predict we'll see applications that align themselves with actual usage patterns much more intelligently.
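To make the tag and versioning bullets more concrete, here is a tiny single-process sketch in C#. TaggedVersionedCache and its methods are hypothetical and are not Velocity's API; a real distributed cache would also have to index tags efficiently and keep versions consistent across machines.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory sketch of lookup by tag plus optimistic, version-checked updates.
public sealed class TaggedVersionedCache
{
    private sealed class Entry
    {
        public object Value;
        public long Version;
        public HashSet<string> Tags;
    }

    private readonly Dictionary<string, Entry> _items = new Dictionary<string, Entry>();
    private readonly object _sync = new object();

    // Stores (or overwrites) an item with its tags and returns the new version number.
    public long Put(string key, object value, params string[] tags)
    {
        lock (_sync)
        {
            long version = _items.TryGetValue(key, out Entry old) ? old.Version + 1 : 1;
            _items[key] = new Entry { Value = value, Version = version, Tags = new HashSet<string>(tags) };
            return version;
        }
    }

    // Returns the value together with the version the caller saw.
    public bool TryGet(string key, out object value, out long version)
    {
        lock (_sync)
        {
            if (_items.TryGetValue(key, out Entry entry))
            {
                value = entry.Value;
                version = entry.Version;
                return true;
            }
            value = null;
            version = 0;
            return false;
        }
    }

    // Optimistic concurrency: no lock is held between the caller's read and this write;
    // the write succeeds only if nobody changed the item since `expectedVersion`.
    public bool TryUpdate(string key, object value, long expectedVersion)
    {
        lock (_sync)
        {
            if (!_items.TryGetValue(key, out Entry entry) || entry.Version != expectedVersion)
            {
                return false;
            }
            entry.Value = value;
            entry.Version++;
            return true;
        }
    }

    // One object, many ways to find it: every item carrying the tag (a linear scan here).
    public IReadOnlyList<object> GetByTag(string tag)
    {
        lock (_sync)
        {
            return _items.Values.Where(e => e.Tags.Contains(tag)).Select(e => e.Value).ToList();
        }
    }
}
```

A caller that loses the race (its TryUpdate returns false) re-reads the item and decides whether to retry, instead of blocking everyone else behind a lock while it works.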

As I noted in the beginning, memcached is pretty simple. And as my short list of highlights suggests, Velocity is anything but. Although memcached and Velocity solve the same problem - offloading high-volume requests from the data tier to an in-memory intermediary - they serve surprisingly different purposes. Scott Watermasysk summarizes the distinction neatly:

On one hand you have Memcached which treats the cache as something you should never rely on. It is there to help but you should always assume it is going to fail on you and even more importantly (to Memcached) you should accept that as a fact. If you read the Memcached FAQ you can almost hear the author laughing when talking about fault tolerance. On the other side of the fence you have features like replication and high availability.

The practical implications are significant. What kind of service are you running? What kind of data are you moving? Who is your audience? These are very high level architectural questions that help shape whether to go with a memcached-like solution or a Velocity-like solution. Jim Benedetto, in an interview with Baseline Magazine, described MySpace's answer to these questions:

100% reliability is not necessarily [the] top priority. "That's one of the benefits of not being a bank, of being a free service"...on MySpace the occasional glitch might mean the Web site loses track of someone's latest profile update, but it doesn't mean the site has lost track of that person's money. "That's one of the keys to the Web site's performance, knowing that we can accept some loss of data"

Usually my default reaction to such an idea is "no way!" Loosely coupling system components to the point where data in one layer might not make it to the next layer every time makes me very uncomfortable. Embracing discomfort is necessary for growth, though, so let's give it a big bear hug and dive in. The original purpose of memcached was to support LiveJournal. In that environment, a failure in the caching layer might just mean a few extra round trips to the database. As long as the caching layer works some of the time, we still see a benefit. But then issues start to get sticky. What if the failure comes when we try to invalidate a cache item because a journal post was updated? The old version will be exposed to the public longer than intended. Is this acceptable behavior? Obviously I don't know the answer to that for LiveJournal's case. But the point is significant: how does reliability affect the business value of your application? For MySpace, responding promptly to a hundred million requests and failing a percentage of those delivers greater value than rolling service brownouts or throttling everyone.
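To make the memcached attitude concrete, here is a hypothetical cache-aside read path in C#. ISimpleCache, CacheAside, DictionaryCache and UnavailableCache are made-up names for illustration, not any library's API. A cache failure on the read path costs only an extra database round trip; the write path is where it gets sticky, because an invalidation swallowed the same way would leave the old journal post visible until the item expires.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal cache contract; any call may throw if the cache is unreachable.
public interface ISimpleCache
{
    bool TryGet(string key, out string value);
    void Set(string key, string value);
}

public static class CacheAside
{
    // Reads through the cache, treating any cache error as a miss.
    public static string Read(string key, ISimpleCache cache, Func<string, string> loadFromDatabase)
    {
        try
        {
            if (cache.TryGet(key, out string cached))
            {
                return cached;
            }
        }
        catch (Exception)
        {
            // Cache down: the cost is one extra trip to the database, nothing more.
        }

        string value = loadFromDatabase(key);
        try
        {
            cache.Set(key, value);
        }
        catch (Exception)
        {
            // Best effort only; the next reader will simply miss again.
        }
        return value;
    }
}

// Example implementations used to exercise both paths (single-threaded use only).
public sealed class DictionaryCache : ISimpleCache
{
    private readonly Dictionary<string, string> _items = new Dictionary<string, string>();
    public bool TryGet(string key, out string value) => _items.TryGetValue(key, out value);
    public void Set(string key, string value) => _items[key] = value;
}

public sealed class UnavailableCache : ISimpleCache
{
    public bool TryGet(string key, out string value) => throw new InvalidOperationException("cache unreachable");
    public void Set(string key, string value) => throw new InvalidOperationException("cache unreachable");
}
```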

Ultimately, the Velocity announcement has brought these questions to the forefront of my mind. I'll be relaxing in Ireland for the next week and a half, but I'm sure I'll be thinking about how to answer these questions for my apps. Make sure to do the same for yours.

© 2002-2009 Rex Morgan.
Content available under a Creative Commons license.
Site code and design may not be reproduced.