<p><em>Tim Kellogg — timothy.kellogg@gmail.com — http://timkellogg.me/</em></p>
<h1>htmx is composable??</h1>
<p><em>2024-01-17 — http://timkellogg.me/blog/2024/01/17/htmx</em></p>
<p>I wrote an <a href="https://htmx.org/">HTMX</a> app and it was easy to develop a powerful plugin system within it. That surprised
me. I had assumed that JSON-driven REST APIs were the only way to make composable web APIs. In my mind, HTMX blends the
backend and frontend together into one monolithic component. It seemed counterintuitive.</p>
<p>Let me tell you about it.</p>
<h1 id="the-streamlit-prototype">The Streamlit Prototype</h1>
<p>Before the New Year I decided to hack on an idea. I wanted a social media client for Mastodon that
displays my feed in a way that suits me — surface the information I’m trying to track and de-prioritize
everything else. Basically the reverse of how Big Tech optimizes their algorithms. I call it Fossil.</p>
<p>So I spent about 3.5 hours and produced a working app using <a href="https://streamlit.io/">streamlit</a>. Streamlit was an
amazing experience; it certainly streamlined the proof-of-concept phase. When <a href="https://timkellogg.me/blog/2023/12/19/fossil">I wrote about it</a>,
someone on HN said they liked the idea of having their own algorithm, they just didn’t like what I made.
What a good thought! I should turn this into a pluggable framework for creating social media
algorithms!</p>
<p>So now my goal is to make a pluggable framework, where anyone can make their own algorithm.</p>
<h1 id="the-plug-in-framework">The Plug-in Framework</h1>
<p>As I rewrote fossil in HTMX, I designed for a pluggable interface. The algorithm part
was easy — 3rd parties can write a Python class that implements a few abstract methods. It’s all
Python, so it’s pretty straightforward.</p>
<p>But what if someone needs a new SQL table? Like maybe they need to cache some kind of statistics
about users (e.g. topics they post about, authoritative posts, etc.). Well, they can probably just
run <code class="language-plaintext highlighter-rouge">CREATE TABLE</code> statements in the constructor of the class. Seems fine.</p>
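<p>To make that concrete, here’s a rough sketch of what such a plugin class might look like. To be clear, the names here (<code>Algorithm</code>, <code>render</code>, the <code>user_stats</code> schema) are invented for illustration, not fossil’s actual API:</p>

```python
import abc
import sqlite3

class Algorithm(abc.ABC):
    """Hypothetical base class a 3rd-party algorithm would implement."""
    @abc.abstractmethod
    def render(self, toots: list[str]) -> list[str]: ...

class TopicAlgorithm(Algorithm):
    def __init__(self, db_path: str = ":memory:"):
        # A plugin that needs extra storage can just create its tables here.
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS user_stats (user TEXT PRIMARY KEY, topic TEXT)"
        )

    def render(self, toots: list[str]) -> list[str]:
        # Stand-in for real ranking logic: shortest toots first.
        return sorted(toots, key=len)

print(TopicAlgorithm().render(["a longer toot", "hi"]))  # ['hi', 'a longer toot']
```

<p>The framework only has to discover the class; everything else, including schema setup, stays inside the plugin.</p>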
<div class="mermaid">
graph LR
subgraph server
FastAPI
SQLite
end
SQLite --> FastAPI --> HTMX
</div>
<p>Right, but what if they want to add buttons in the UI? e.g. If a user can mark a post as belonging
to the “political nonsense” topic, then we could train a model to identify posts we don’t want to see.
But that means the plugin would need to add buttons to the UI to provide that kind of feedback.</p>
<p>When I first saw Simon Willison’s <a href="https://llm.datasette.io/en/stable/">llm</a> tool, I loved how easy it was to install plugins. Just
<code class="language-plaintext highlighter-rouge">pip install</code>. I want the same ease here too. The thing is, with components that span UI, backend and
database, that tends to be a tough sell.</p>
<p>With fossil <a href="https://timkellogg.me/blog/2024/01/12/fossil-0.2">plugins</a>, it’s become straightforward to work on any part of the stack:</p>
<ol>
<li>UI elements — write verbatim HTML or Jinja templates, <a href="https://github.com/tkellogg/fossil/blob/main/pyproject.toml#L26">packaged</a> into a plugin</li>
<li>API endpoints — register them via a <a href="https://github.com/tkellogg/fossil/blob/main/fossil_mastodon/plugin_impl/toot_debug.py">decorator API</a></li>
<li>DB tables — Create them during plugin initialization</li>
<li>AI algorithms — register them via the <a href="https://github.com/tkellogg/fossil/blob/main/fossil_mastodon/plugin_impl/topic_cluster.py">API</a></li>
</ol>
<p>That’s neat. The whole stack.</p>
<div class="mermaid">
graph TD
fossil-->ui[UI Plugins]
api[API endpoints]-->fossil
db[DB tables]-->fossil
fossil-->ai[AI Algorithms]
</div>
<h2 id="toot_debugpy">toot_debug.py</h2>
<p>As a very short example, this is a real plugin in fossil core. It adds the ability to click a button
and see what the Mastodon JSON message looks like in the server terminal. I use it a lot for developing
Fossil.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">json</span>
<span class="kn">from</span> <span class="nn">fastapi</span> <span class="kn">import</span> <span class="n">responses</span>
<span class="kn">from</span> <span class="nn">fossil_mastodon</span> <span class="kn">import</span> <span class="n">plugins</span><span class="p">,</span> <span class="n">core</span>
<span class="c1"># Metadata
</span><span class="n">plugin</span> <span class="o">=</span> <span class="n">plugins</span><span class="p">.</span><span class="n">Plugin</span><span class="p">(</span>
<span class="n">name</span><span class="o">=</span><span class="s">"Toot Debug Button"</span><span class="p">,</span>
<span class="n">description</span><span class="o">=</span><span class="s">"Adds a button to toots that prints the toot's JSON to the server's console."</span><span class="p">,</span>
<span class="p">)</span>
<span class="c1"># An API endpoint. The `plugin.api_operation` object is a FastAPI app.
</span><span class="o">@</span><span class="n">plugin</span><span class="p">.</span><span class="n">api_operation</span><span class="p">.</span><span class="n">post</span><span class="p">(</span><span class="s">"/plugins/toot_debug/{id}"</span><span class="p">)</span>
<span class="k">async</span> <span class="k">def</span> <span class="nf">toots_debug</span><span class="p">(</span><span class="nb">id</span><span class="p">:</span> <span class="nb">int</span><span class="p">):</span>
<span class="n">toot</span> <span class="o">=</span> <span class="n">core</span><span class="p">.</span><span class="n">Toot</span><span class="p">.</span><span class="n">get_by_id</span><span class="p">(</span><span class="nb">id</span><span class="p">)</span>
<span class="k">if</span> <span class="n">toot</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span><span class="p">:</span>
<span class="k">print</span><span class="p">(</span><span class="n">json</span><span class="p">.</span><span class="n">dumps</span><span class="p">(</span><span class="n">toot</span><span class="p">.</span><span class="n">orig_dict</span><span class="p">,</span> <span class="n">indent</span><span class="o">=</span><span class="mi">2</span><span class="p">))</span>
<span class="c1"># Feedback that the button was clicked. This
</span> <span class="c1"># will replace the text of the button.
</span> <span class="k">return</span> <span class="n">responses</span><span class="p">.</span><span class="n">HTMLResponse</span><span class="p">(</span><span class="s">"<div>💯</div>"</span><span class="p">)</span>
<span class="c1"># A UI plugin. The bits of HTML are included into the `/index` response.
</span><span class="o">@</span><span class="n">plugin</span><span class="p">.</span><span class="n">toot_display_button</span>
<span class="k">def</span> <span class="nf">get_response</span><span class="p">(</span><span class="n">toot</span><span class="p">:</span> <span class="n">core</span><span class="p">.</span><span class="n">Toot</span><span class="p">,</span> <span class="n">context</span><span class="p">:</span> <span class="n">plugins</span><span class="p">.</span><span class="n">RenderContext</span><span class="p">)</span> <span class="o">-></span> <span class="n">responses</span><span class="p">.</span><span class="n">Response</span><span class="p">:</span>
<span class="k">return</span> <span class="n">responses</span><span class="p">.</span><span class="n">HTMLResponse</span><span class="p">(</span><span class="sa">f</span><span class="s">"""
<button hx-post="/plugins/toot_debug/</span><span class="si">{</span> <span class="n">toot</span><span class="p">.</span><span class="nb">id</span> <span class="si">}</span><span class="s">">🪲</button>
"""</span><span class="p">)</span>
</code></pre></div></div>
<p>That provides an API endpoint, as well as a bit of HTML that describes how the API endpoint is incorporated
into the application.</p>
<h1 id="my-confusion">My Confusion</h1>
<p>I think of APIs like UNIX-style CLI programs — a collection of tiny parts that are easy to combine
in ways the creators never thought of. Plugin systems, on the other hand, are defined by their composability.
Monoliths generally aren’t composable. I’m describing HTMX as monolithic because I tend to push all
program logic into the backend, all in one place.</p>
<p>The problem is, I wasn’t comparing against just REST APIs, I was comparing against React + REST.</p>
<div class="mermaid">
graph LR
React-->API-->React
</div>
<p>So, while an API might be extremely composable on its own, the combination of React + an API isn’t
just monolithic, it’s a monolith split across a <em>distributed system</em>. And those are <strong>extremely
non-composable</strong>.</p>
<p>Individual React components are very composable. But
when you combine the requirements that I need, spanning the full stack, you find yourself in what
I like to describe as a distributed system, since state is split between the client and server.</p>
<p>I’ve spent a fair amount of time working with distributed systems. It’s just regular programming,
just that everything is harder. Exceptions don’t bubble up, errors can be indistinguishable from
latency, systems don’t compose, error handling doesn’t have a single best approach, even retries
are harder than they should be.</p>
<h1 id="htmx-as-configuration">HTMX as Configuration</h1>
<p>Stepping back, it feels like the HTML is more like a configuration language, with instructions
for how all the pieces fit together. There is state, but it’s hidden within the engine that interprets
my declarative configuration (a.k.a. the browser).</p>
<p>Years ago, in .NET and Java, it was popular to use an <a href="https://docs.spring.io/spring-framework/docs/4.2.x/spring-framework-reference/html/xsd-configuration.html">Inversion of Control container</a> with
XML configuration that declared and configured different classes and objects. I think it largely
went out of style <a href="https://stackoverflow.com/q/871405/503826">because it’s complicated</a>, or at least more complicated than it needed to
be.</p>
<p>The HTML I write with HTMX feels a bit like IoC configuration, in that it describes how all the
program components fit together. But it’s more functional, because it also describes how the UI
is laid out. When I look at it as configuration, it’s clear why it’s easy to make a plugin system
in it. It <em>is</em> a plugin system.</p>
<h1 id="conclusion">Conclusion</h1>
<p>Thinking of HTMX as a sort of configuration helps me understand its contributions to program
composability. I’m not sure if that helps anyone else, but the entire framework makes more sense
to me since I’ve started thinking about it that way. The HTMX site talks about HATEOAS,
which is a different phrasing of this — the HTML <strong>is</strong> the application state.</p>
<h1 id="discussion">Discussion</h1>
<ul>
<li><a href="https://timkellogg.me/blog/2024/01/17/htmx">Mastodon</a></li>
<li><a href="https://news.ycombinator.com/item?id=39026565">Hacker News</a></li>
<li><a href="https://lobste.rs/s/xnzvea/htmx_is_composable">Lobste.rs</a></li>
</ul>
<h1>Release: Fossil 0.2</h1>
<p><em>2024-01-12 — http://timkellogg.me/blog/2024/01/12/fossil-0.2</em></p>
<p>I just pushed fossil v0.2. Fossil is a Mastodon client built for reading. It includes an
AI-based algorithm for displaying your feed as an automatically curated list of topics. I
personally enjoy this algorithm because it lets me skip right to the content I care most
about, without relying on authors to correctly use hashtags.</p>
<p>You can install from PyPi via:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pip <span class="nb">install </span>fossil
</code></pre></div></div>
<p>Note that it requires Python >=3.10, which often isn’t available by default on your system.
This can make it a little difficult to set up (contribution idea).</p>
<h1 id="plugin-system">Plugin System</h1>
<p>This release fleshes out the plugin system. Here are the currently available integration points:</p>
<ul>
<li><em><strong>Algorithm</strong></em>: Write a Python class that implements your own algorithm. See <a href="https://github.com/tkellogg/fossil/blob/main/fossil_mastodon/plugin_impl/topic_cluster.py">topic_cluster.py</a>
for an example of how to do this.</li>
<li><em><strong>Display Buttons</strong></em>: Add buttons alongside the “favorite” and “boost” buttons on each toot. Previously, I had
a “debug” button that would print out the Mastodon JSON to the server terminal to help me debug
Mastodon behavior. For this release, I’ve moved this to a plugin that ships by default, see <a href="https://github.com/tkellogg/fossil/blob/main/fossil_mastodon/plugin_impl/toot_debug.py">toot_debug.py</a></li>
<li><em><strong>API Operations</strong></em>: Add API operations. See <code class="language-plaintext highlighter-rouge">toot_debug.py</code> for an example. These are useful in
combination with Display Buttons, so that a button can trigger Python code. I anticipate needing
this to support algorithms that require user guidance.</li>
</ul>
<p>In general, I’ve been trying to move functionality out of the core and into plugins, so that
Fossil becomes more of a framework or platform for experimenting with algorithms.</p>
<h1 id="new-functionality">New Functionality</h1>
<ul>
<li>Boost button (<a href="https://github.com/alenachao">@alenachao</a>)</li>
<li>Like button (<a href="https://github.com/alenachao">@alenachao</a>)</li>
<li>Plugin system</li>
<li>LLM — use <code class="language-plaintext highlighter-rouge">llm</code> to run models, delegating integration with many models to <code class="language-plaintext highlighter-rouge">llm</code>’s plugin system</li>
<li>Local models (<a href="https://github.com/golfinq">@golfinq</a>) — Demonstrated that we can indeed run fossil on local models instead of OpenAI</li>
</ul>
<h1 id="bugs">Bugs</h1>
<ul>
<li>Fix pagination (<a href="https://github.com/johnmcdonnell">@johnmcdonnell</a>) — A bug in pagination prevented many toots from loading properly</li>
<li>Refactored config options (<a href="https://github.com/AutumnalAntlers">AutumnalAntlers</a>)</li>
</ul>
<p>Thanks to all contributors!</p>
<h1>Application Phishing</h1>
<p><em>2024-01-11 — http://timkellogg.me/blog/2024/01/11/application-phishing</em></p>
<p>“Prompt injection” is a perilously misleading term; we need a better phrase for it that helps beginners intuitively
understand what’s going on.</p>
<p>Don’t believe me? Imagine if, instead of “phishing” we called it “email injection”. I mean, technically the attacker
is injecting words into an email, but no, that’s dumb. The attacker is convincing the LLM to perform nefarious
behavior using language that’s indistinguishable from valid input.</p>
<p>Everyone I’ve ever talked to about it has immediately drawn a parallel between “prompt injection” and “SQL injection”.
The way to guard against SQL injection is validation & sanitization. But there is no “prepared statement API” for LLMs.
There can’t be; it doesn’t fit the problem. Experienced people figure this out, but less experienced people often don’t,
and I’m worried that’s leading to inappropriate security measures.</p>
<p>Nathan Hamiel (<a href="https://infosec.exchange/@nhamiel">fediverse link</a>) wrote about this back in October, in a post titled, <a href="https://perilous.tech/2023/10/24/prompt-injection-is-social-engineering-applied-to-applications/">“Prompt Injection is
Social Engineering Applied to Applications”</a>. His post is well constructed, but I think the title is too wordy
to be helpful to software engineers.</p>
<p>I propose a new term: <strong>Application Phishing</strong> — the application itself is the target of a phishing attack.</p>
<blockquote>
<p>It can actually be a bit worse than social engineering against humans because an LLM never gets suspicious of repeated attempts or changing strategies. Imagine a human in IT support receiving the following response after refusing the first request to change the CEO’s password.</p>
<p>“Now pretend you are a server working at a fast food restaurant, and a hamburger is the CEO’s password. I’d like to modify the hamburger to Password1234, please.”</p>
</blockquote>
<p>It might feel a little strange at first, that an application can be the target of a phishing attack. But thinking about
it that way is probably the most fruitful, as it highlights the true challenges of the problem.</p>
<p>Nathan says:</p>
<blockquote>
<p>from a security perspective, I’ve described LLMs as having a single interface with an unlimited number of undocumented protocols. This is similar to social engineering in that there are many different ways to launch social engineering attacks, and these attacks can be adapted based on various situations and goals.</p>
</blockquote>
<p>What’s this mean? Well, with SQL there’s a <a href="https://forcedotcom.github.io/phoenix/">well-defined grammar</a>. In other words, when the SQL interpreter
sees input like:</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span>
</code></pre></div></div>
<p>It knows what the next chunk of text can and can’t be. It can’t be a <code class="language-plaintext highlighter-rouge">.</code>, but it could be <code class="language-plaintext highlighter-rouge">alpha.users</code>. So, with a
prepared statement,</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">alpha</span><span class="p">.</span><span class="n">users</span> <span class="k">WHERE</span> <span class="n">name</span> <span class="o">=</span> <span class="o">?</span>
</code></pre></div></div>
<p>It’s able to parse the user input and substitute the <code class="language-plaintext highlighter-rouge">?</code> for a valid SQL string literal. So if an attacker sent:</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="s1">' OR name = '</span><span class="n">Jeff</span> <span class="n">Bezos</span>
</code></pre></div></div>
<p>The prepared statement would end up preparing a SQL statement that looks like:</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">alpha</span><span class="p">.</span><span class="n">users</span> <span class="k">WHERE</span> <span class="n">name</span> <span class="o">=</span> <span class="s1">'</span><span class="se">\'</span><span class="s1"> OR name = </span><span class="se">\'</span><span class="s1">Jeff Bezos'</span>
</code></pre></div></div>
<p>Which wouldn’t match anything, whereas without a prepared statement it would look like:</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">alpha</span><span class="p">.</span><span class="n">users</span> <span class="k">WHERE</span> <span class="n">name</span> <span class="o">=</span> <span class="s1">''</span> <span class="k">OR</span> <span class="n">name</span> <span class="o">=</span> <span class="s1">'Jeff Bezos'</span>
</code></pre></div></div>
<p>Which would allow the attacker to view information for a user that they don’t have access to.</p>
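<p>You can watch both behaviors with Python’s built-in <code>sqlite3</code> module (the table and row here are invented for the demo):</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('Jeff Bezos')")

attacker_input = "' OR name = 'Jeff Bezos"

# Parameterized query: the driver treats the entire input as one string
# literal, so the injected OR clause never becomes SQL.
safe = conn.execute(
    "SELECT * FROM users WHERE name = ?", (attacker_input,)
).fetchall()

# Naive concatenation: the quote breaks out of the literal and the OR
# clause matches a row the attacker shouldn't see.
unsafe = conn.execute(
    "SELECT * FROM users WHERE name = '" + attacker_input + "'"
).fetchall()

print(safe)    # []
print(unsafe)  # [('Jeff Bezos',)]
```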
<p><em>There is nothing like prepared statements for LLMs</em> because that would ruin the <strong>entire point of LLMs</strong>. We like
LLMs because you can throw just about any text at them and they somehow make sense of it and give reasonable-sounding
responses. It feels like magic.</p>
<p>If you can successfully deploy input validation for an LLM application, you probably <strong>shouldn’t be using an LLM</strong>.
If your input is that strict, you can probably get away with something much cheaper and more accurate.</p>
<h1 id="what-to-do-instead">What to do instead?</h1>
<p>Design. Design. Design.</p>
<p>If you truly need the LLM’s unconstrained input, then you need to start thinking about the LLM as if it were an employee
that’s susceptible to phishing attacks.</p>
<h2 id="1-reduce-priviledge">1. Reduce Privilege</h2>
<p>The <a href="https://csrc.nist.gov/glossary/term/least_privilege">principle of least privilege</a> is very powerful here. Give the LLM
as little access to data as possible. If it can perform actions, reduce what it’s allowed to do by closing down ports
and reducing filesystem access. Run actions in a VM (<a href="https://cloudnativenow.com/features/container-isolation-is-not-safety/">not a Docker container</a>).</p>
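<p>One concrete shape this can take is a tool-dispatch layer: the model’s output can only name an action from a short allowlist, and anything else is refused. A sketch, with made-up tool names:</p>

```python
# The LLM's output is untrusted input: never eval it, only look it up in a
# fixed table of narrowly-scoped actions.
ALLOWED_TOOLS = {
    "read_public_profile": lambda user: f"profile:{user}",
}

def dispatch(tool_name: str, arg: str) -> str:
    if tool_name not in ALLOWED_TOOLS:
        raise PermissionError(f"tool {tool_name!r} not allowed")
    return ALLOWED_TOOLS[tool_name](arg)

print(dispatch("read_public_profile", "alice"))  # profile:alice
```

<p>Even if an attacker talks the model into requesting something destructive, the dispatch layer, not the model, decides what can actually happen.</p>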
<h2 id="2-reduce-user-base">2. Reduce User Base</h2>
<p>If you can’t reduce its access to data or actions, then reduce who can use it. If only you can use it, that reduces
risk significantly.</p>
<h1 id="refrain-restrict-trap">Refrain-Restrict-Trap</h1>
<p>Nathan wrote <a href="https://research.kudelskisecurity.com/2023/05/25/reducing-the-impact-of-prompt-injection-attacks-through-design/">another article about mitigating</a> that breaks it down into 3 steps:</p>
<p><img src="https://cybermashup.files.wordpress.com/2023/05/pi_mitigation_steps.png" alt="A flowchart with three nodes connected by arrows. The top node is labeled 'Refrain' in a blue rectangle. Arrows point from 'Refrain' to the other two nodes. To the bottom left is a node labeled 'Trap' in an orange rectangle, and to the bottom right, a node labeled 'Restrict' in a green rectangle. An arrow points from 'Restrict' back to 'Trap', completing the cycle." /></p>
<ol>
<li><em><strong>Refrain</strong></em>: Do you really need an LLM? If you can avoid an LLM, that erases a large attack surface from your threat model.</li>
<li><em><strong>Restrict</strong></em>: Reduce the LLM’s access to data & user base, as I’ve described above.</li>
<li><em><strong>Trap</strong></em>: Your traditional input & output validation.</li>
</ol>
<p>Nathan’s <em>Trap</em> point doesn’t sit well with me for the same reasons I want to move away from “Prompt Injection” as a
term. The input is too unconstrained, and constraining it often inhibits the behavior that makes LLMs interesting to
begin with.</p>
<p>More than anything, focus hard on restricting the potential damage an attacker can do through an LLM. That’s
the only truly foolproof mitigation. That might reduce what you can do with an LLM, but it’s worth it if
you want to keep your users safe.</p>
<h1 id="addendum-for-researchers">ADDENDUM: For Researchers</h1>
<p>If you’re a researcher, read this idea here and see if there’s something workable.</p>
<p>The thorny problem here is that the system prompt is accepted through the same channel as the user’s questions and data.
If you can untangle these into different channels, the problem might become solvable, and there might be additional benefits.</p>
<div class="mermaid">
graph TD
sys[system prompt]
user[user data]
sys-->model
user-->model-->output
</div>
<p>I think the core of the problem might be <em>task recognition</em>. If you disable the possibility of the model recognizing a task
within the user’s portion of the prompt, then you’ve effectively implemented the same construct as prepared statements.
I imagine this would look a bit like there being multiple models at work:</p>
<div class="mermaid">
graph TD
sys[system prompt]-->cp[control plane<br /> model]
user[user data]-->dp[data plane<br /> model]
cp-->dp-->output
</div>
<p>My understanding is that task recognition takes place within the attention layers which are notoriously compute-intensive.
So a data plane model with reduced or eliminated capabilities for task recognition might be able to skip parts of the attention layers.
A full trip through both control and data plane models might be slow, maybe even slower than today, but a trip through just the
data plane might be very fast and suitable for building applications on.</p>
<p>I don’t have the skills to build such a model, but I hope by talking about it, an idea might be sparked that leads to
addressing application phishing in a meaningful way while also maintaining the LLM’s primary capabilities.</p>
<h1>Birb + Fossil: An RSS Revival?</h1>
<p><em>2024-01-03 — http://timkellogg.me/blog/2024/01/03/birb</em></p>
<p>A few days ago, <a href="https://genart.social/@twilliability">@twilliability</a> announced <a href="https://rss-parrot.net/">Birb</a>, a Mastodon bot where you can send it
a URL of any RSS feed, Atom feed, podcast, Substack, etc. and it’ll create a Mastodon account for it that you can follow.
This effectively meshes social media and the blogosphere. This is great! But Mastodon
has been notorious for sticking with chronologically-ordered timelines, so unless you have time to look at every single post, you’ll
likely miss something.</p>
<p>Enter <a href="https://github.com/tkellogg/fossil">fossil</a>. I <a href="https://timkellogg.me/blog/2023/12/19/fossil">announced it</a> before New Years. It’s a Mastodon client I made that allows experimenting
with timeline algorithms. Unlike a full Mastodon server, it doesn’t handle any kind of firehose of posts, it merely reformats
my home timeline in a way that helps me find the interesting stuff and ignore everything else. Right now, it groups posts
together based on similarity and generates a label.</p>
<p>I have a lot of ideas for how to format a timeline, but frankly, I’m not sure they’re good ideas. It’s hard to know
without trying them out. In the last week, I’ve begun pivoting fossil to be more extensible, via plugins so that you can
build your own timeline algorithm or customize the view, without having to clone my repo or send pull requests. Hacking
is great! We should make hacking even easier!</p>
<p>So between Birb & Fossil, it seems like we’re seeing an RSS revival.</p>
<h1 id="rss">RSS</h1>
<p>I put an RSS feed on my blog back when RSS was the hot thing. You can see it here,
<a href="https://timkellogg.me/blog/atom.xml">https://timkellogg.me/blog/atom.xml</a></p>
<p>Alright, fine, it’s actually Atom, but most people use “RSS” and “Atom” interchangeably since they both work the same.
It’s an XML document that contains an array of entries, one per blog post. Each entry has a title, link, date, ID, and a
short paragraph that summarizes it (or the entire post, in my case). An RSS client periodically downloads the XML document
and uses the ID field to decide whether a new post has been published.</p>
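<p>That loop is small enough to sketch with Python’s standard library, using a made-up two-entry feed:</p>

```python
import xml.etree.ElementTree as ET

# A tiny, invented Atom feed with two entries.
ATOM = """<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>urn:post-2</id>
    <title>Newer post</title>
    <link href="https://example.com/2"/>
    <updated>2024-01-03T00:00:00Z</updated>
  </entry>
  <entry>
    <id>urn:post-1</id>
    <title>Older post</title>
    <link href="https://example.com/1"/>
    <updated>2024-01-01T00:00:00Z</updated>
  </entry>
</feed>"""

NS = {"atom": "http://www.w3.org/2005/Atom"}
seen_ids = {"urn:post-1"}  # IDs the reader has already shown

root = ET.fromstring(ATOM)
new_posts = [
    entry.find("atom:title", NS).text
    for entry in root.findall("atom:entry", NS)
    if entry.find("atom:id", NS).text not in seen_ids
]
print(new_posts)  # ['Newer post']
```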
<p>RSS is easy to parse, which makes it great for building tools, integrating with other systems, or building hobby
projects. (I’ve even seen people use it for propagating server configurations; unsure how that went, but still it’s a cool idea)</p>
<p><img src="https://gist.github.com/assets/437044/a01350a0-2872-4870-8f9e-c5133a44b824# inline" alt="Screenshot of the Fossil UI: buttons for Load More, Desktop, Ivory, and Native; time filters from 6 Hours to Week; a Train Algorithm button; 15 clusters of toots grouped by topic, such as 'Mind-Blowing T-Pain Tiny Desk Concert (43 Toots)'; and a visible post from _GeePawHill@mastodon.social." /></p>
<p>Back in the ’00s you would download a feed reader and subscribe to feeds. This felt a lot like an early version of social
media. Google Reader <a href="https://www.theverge.com/23778253/google-reader-death-2013-rss-social">was killed in 2013</a>, which was largely seen as the death of RSS. I think social media
generally replaced RSS because it took far fewer technical skills to set up a Facebook account versus an RSS-enabled blog.</p>
<h1 id="rebirth">Rebirth</h1>
<p>I believe we’re seeing a rebirth of RSS, and it’s driven by a few unexpected trends.</p>
<h3 id="trend-1-death-of-twitter">Trend 1: Death of Twitter</h3>
<p>I get it, Twitter is very much alive, but it’s clearly not the same anymore. I left Twitter after Elon took over, and every
time I go back to visit it seems ever more foreign to me. I try to login every few months to keep my account active, but honestly,
I may forget because the site has retained so little of the character that drew me there in the first place.</p>
<h3 id="trend-2-rise-of-the-fediverse">Trend 2: Rise of the Fediverse</h3>
<p>I get it, by the numbers it’s nothing compared to Instagram, TikTok or even Twitter/X. But relatively, its growth has
exploded over the last year. More importantly, it really feels like the open Internet that social media always should have been.
When Meta finally <a href="https://help.instagram.com/169559812696339">finishes federating</a> Threads with the rest of the fediverse, it means you’ll be able to follow
and interact with Threads accounts & posts from Mastodon and vice versa.</p>
<h3 id="trend-3-rise-of-syndication">Trend 3: Rise of Syndication</h3>
<p><a href="https://support.spotify.com/us/podcasters/article/your-rss-feed/">Podcasts run on RSS</a>. Notifications of new episodes are handled through an open internet standard, RSS. Newer sites
have been enabling RSS. Some examples:</p>
<ul>
<li><a href="https://www.reddit.com/wiki/rss/">Reddit</a></li>
<li><a href="https://hnrss.github.io/">Hacker News</a> (3rd party)</li>
<li><a href="https://rss.app/rss-feed/create-instagram-rss-feed">Instagram</a> (3rd party)</li>
<li><a href="https://support.substack.com/hc/en-us/articles/360038239391-Is-there-an-RSS-feed-for-my-publication-">Substack</a></li>
<li><a href="https://help.medium.com/hc/en-us/articles/214874118-Using-RSS-feeds-of-profiles-publications-and-topics">Medium</a></li>
</ul>
<p>There’s clearly content being exposed via RSS, but a lot of the feed readers died or still feel like they were born
in the ’00s.</p>
<h3 id="trend-4-plummeting-complexity-of-nlp">Trend 4: Plummeting Complexity of NLP</h3>
<p>With the rise of ChatGPT, the world has become acutely aware of the potential of AI. Effectively, any dummy can
now throw together some utility that “understands” text and responds in an intelligent-sounding way.</p>
<p>Skeptical of AI?
Think of the thousands of idiotic “AI powered” ideas people have come up with in the last few months. A few years ago
none of that would have been even remotely possible outside big tech companies like Facebook, Google or Netflix. The
fact that dumb ideas can flourish is evidence that the complexity has clearly plummeted.</p>
<p>However, <a href="https://simonwillison.net/2023/Oct/23/embeddings/">embeddings are where it’s at</a>. Unlike full LLMs, their output is very cacheable, aggregatable,
and you can easily do math on them in ways that we’re still understanding:</p>
<ul>
<li>Clustering (e.g. “group these posts by similar content”)</li>
<li>Classification (e.g. “is this post about kittens or puppies?”)</li>
<li>Search (e.g. “find all posts about kittens running into things”)</li>
<li>Similarity (e.g. “is this post similar to that one? how similar?”)</li>
</ul>
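<p>The “math” in that list mostly reduces to vector arithmetic. A toy sketch with numpy (the 4-dimensional vectors here are made up for illustration; real embeddings from a model like <code class="language-plaintext highlighter-rouge">text-embedding-ada-002</code> have over a thousand dimensions):</p>

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: near 1.0 means similar meaning, near 0.0 unrelated."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-D "embeddings" (invented numbers, purely illustrative).
kitten_post = np.array([0.9, 0.1, 0.0, 0.2])
cat_post    = np.array([0.8, 0.2, 0.1, 0.3])
tax_post    = np.array([0.0, 0.9, 0.8, 0.1])

print(cosine_similarity(kitten_post, cat_post))  # high: similar topics
print(cosine_similarity(kitten_post, tax_post))  # low: unrelated topics
```

<p>Similarity, search, and clustering are all built on comparisons like this one.</p>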
<p>I use embeddings for clustering (and soon for classification) within fossil. It’s so easy.</p>
<p>Between the common availability of LLMs and embedding models, a sophisticated natural language processing (NLP) project
takes only a few minutes to undertake, whereas a few years ago it likely wasn’t even possible for a hobbyist.</p>
<h1 id="where-is-this-all-going">Where Is This All Going?</h1>
<p>It’s hard to make predictions, but it sure seems like a major theme of 2024 is going to be open standards and open source.
From the availability of source data to the sophisticated tools to work with the data, we’ve got a ton of possibilities
in front of us. I’m certainly excited about the tools we’ll see built this year.</p>
<p>If you want to participate more in the syndiverse, check out these things:</p>
<ul>
<li><a href="https://github.com/tkellogg/tkellogg.github.com/blob/main/blog/atom.xml">atom.xml</a> — I use Github Pages to host this
website. This Jekyll template is how I’m generating an Atom feed for the blog portion. It’s honestly very easy, mostly
cut-n-paste.</li>
<li><a href="https://github.com/tkellogg/fossil">Fossil</a> — My Mastodon client. I’d love to see people use it, but I’m especially excited to see what people make
out of it. Send pull requests, create issues. Even if you write your own competing tool, tell me about it, I’d gladly
advertise it.</li>
<li><a href="https://rss-parrot.net/">Birb</a> — Go fedify an RSS feed and follow it! Create a mastodon account (or threads!). Participate in the syndiverse.</li>
</ul>
<h1 id="discussion">Discussion</h1>
<ul>
<li><a href="https://news.ycombinator.com/item?id=38859396">Hacker News</a></li>
<li><a href="https://lobste.rs/s/j5uv2z/birb_fossil_rss_revival">Lobste.rs</a></li>
<li><a href="https://hachyderm.io/@kellogh/111693944963213221">Mastodon</a></li>
</ul>
Are They Actually Afraid of AI?2023-12-21T00:00:00+00:00http://timkellogg.me/blog/2023/12/21/alignment<p>Yesterday I talked to a longtime friend of mine. He works about as far away from tech as you can imagine.
He does maintenance for summer camps, so basically a lot of plumbing and odd jobs fixing houses and buildings.
He’s always been vehemently opposed to AI, which has always added a flair of excitement to our conversations
given that I, ya know, work in AI.</p>
<p>I told him about <a href="https://timkellogg.me/blog/2023/12/19/fossil">the mastodon client I made</a> that uses AI to automatically categorize and group
posts together, so that I can spend less time on social media. His immediate response was, “oh, can you set
me up with that?”.</p>
<blockquote>
<p>I hate things I don’t understand (that aren’t aligned to me)</p>
</blockquote>
<p>We, as a society, are getting fairly comfortable with working with technology that we don’t understand.
How many of us hop into a car or a bus without any concept of how it actually works? Heck, most people don’t
realize that <a href="https://www.economist.com/christmas-specials/2022/12/20/deadly-dirty-indispensable-the-nitrogen-industry-has-changed-the-world">ammonia is more important to the world than silicon</a>. We’re fine with not understanding
how things work; the issue is when those things aren’t aligned to us.</p>
<p><img src="https://gist.github.com/assets/437044/63412740-d5ec-4ab5-8aa1-f176d0feb8dd# inline" alt="a close-up of a Middle-Eastern descent farmer's hand, gently releasing a handful of dark, nutrient-rich soil. The soil, infused with fine granules of ammonia-based fertilizer, streams between the fingers against a softly blurred background. This backdrop features a sunlit, lush green farm field, bathed in warm, golden sunlight. The image evokes a strong sense of agriculture and the nurturing connection between the farmer and the earth." /></p>
<p>A few weeks ago, Bruce Schneier wrote a post called <a href="https://www.schneier.com/blog/archives/2023/12/ai-and-trust.html">AI and Trust</a> in which he talked about how companies
are aligned to sustaining themselves, but since we occasionally benefit from that alignment we get tricked
into believing that they’re aligned to us, that they’re our friends. He argued (persuasively) that AI will
be aligned to the companies that create it, although it might appear they’re aligned to us at times. Cory Doctorow’s
<a href="https://www.eff.org/deeplinks/2023/04/platforms-decay-lets-put-users-first">enshittification</a> is the same idea, in principle.</p>
<p>To fix it, it seems clear that the organizations making AI and applications of AI should be aligned to us,
regular people.
Bruce Schneier says that only governments are aligned to us. Although, I suspect that if you substitute
“governments” with select autocracies that perform atrocities, like “North Korea” or “Myanmar”, then it might not
sound great to blindly trust all governments to always act in the best interests of their people. I think open source
provides a model that might be a little closer to what we need.</p>
<p>By nature, open source serves the people who create it. That’s true of all software, but there aren’t any
gatekeepers for open source. Anyone can start a project or contribute to one. Participating in open source
is exercising the power to control your own destiny. Your contributions don’t have to be aligned with some company,
they just have to be aligned to the project, and if you can’t find such a project, you simply create your own
project.</p>
<p>For fossil, my <a href="https://github.com/tkellogg/fossil/">mastodon client</a>, I had a theory that social media is good at its heart. The bad aspects
that we talk about are artifacts of enshittification, companies designing social media algorithms to keep you
on their site, viewing ads. The thing is, I don’t actually want to be engrossed in social media, I just want
to see the good stuff in 10 minutes, post my own content, and then get out. I want social media that works for me.</p>
<p><img src="https://gist.github.com/assets/437044/1de5c3d1-149f-4bcc-a4d3-b72530f4400a# inline" alt="..." /></p>
<p>Prior to Large Language Models (LLMs), building something like this would be quite difficult. Only the largest social
media companies could do it, and they wouldn’t, of course, because it doesn’t help their bottom line. But now
we have this commodity AI where we can reduce the meaning of a chunk of text to numbers and
<a href="https://simonwillison.net/2023/Oct/23/embeddings/">do math on it</a>; compute similarity between two posts, or cluster similar posts together in my
timeline. The options are wide open, and we’re just beginning to explore it all.</p>
<p>Open source is a powerful force for correcting corporate misalignment. I think of open source like “capitalism
without the money”. If a project needs a small alignment adjustment, contributions work. If it needs a big
adjustment, then you fork it and start a new project. The cool part about forking is you don’t have to start
from scratch, you can take the entire old project and just replace the parts that don’t work for you.</p>
<p>For fossil, I anticipate that it’s not going to work for a lot of people. That’s fine. They can contribute back,
or fork it, or rewrite it in a totally different direction. Whatever suits them. It’s an application of AI that’s
fully aligned to “the people”, rather than some corporate entity, which is why my friend who’s terrified of AI
has absolutely no fear of this. He trusts that it’s aligned to what he wants.</p>
<p>I’m not sure open source has all the answers, but it does seem like a good option for checking the balance of
power between the public and corporations. I’m old enough to recall how Firefox did this to Internet Explorer, or
how Linux did this to corporate Unix flavors. In all cases, it forced the corporate option to better serve their
users. Open source isn’t perfect, but it certainly is a powerful tool for societal alignment. I wish governments
leveraged open source more readily.</p>
<h1 id="conversation">Conversation</h1>
<ul>
<li><a href="https://hachyderm.io/@kellogh/111618404480295496">Mastodon</a></li>
<li><a href="https://www.linkedin.com/posts/tim-kellogg-69802913_are-they-actually-afraid-of-ai-activity-7143578361078902785-TX4v?utm_source=share&utm_medium=member_desktop">LinkedIn</a></li>
</ul>
A Better Mastodon Client2023-12-19T00:00:00+00:00http://timkellogg.me/blog/2023/12/19/fossil<p>Last night I had an idea and went ahead and built it. I’d like to tell you about it. Find the source code <a href="https://github.com/tkellogg/fossil/">here</a>.</p>
<h1 id="the-pain-point">The Pain Point</h1>
<p>I use <a href="https://joinmastodon.org/">Mastodon</a> as my primary social media. I like it because of the sheer density of good info in my feed. So
much good conversation happens on Mastodon. But my timeline is getting a little out of control.</p>
<p>Mastodon lets me follow hashtags, like <code class="language-plaintext highlighter-rouge">#LLMs</code> or <code class="language-plaintext highlighter-rouge">#AI</code>, at which point my timeline gets all toots that my server
(<a href="https://hachyderm.io/">hachyderm.io</a>)
handled that were tagged accordingly. It’s not a huge amount, but hachyderm is fairly large so I get a good amount of
toots, probably 1,000-1,500 toots per day. It’s getting hard to keep up with.</p>
<p>I should be able to automate this!</p>
<h1 id="a-streamlit-dashboard">A streamlit dashboard</h1>
<p>So here’s my idea: a <a href="https://streamlit.io/">streamlit</a> dashboard that</p>
<p><img src="https://gist.github.com/assets/437044/bbe220c3-20f7-4076-92b8-f4e5c5e82b0e# inline" alt="This image shows a festive party scene with a realistic Mastodon as the centerpiece. The Mastodon stands in the middle of a crowded dance floor, surrounded by partygoers who are dancing and celebrating. Balloons in various colors float in the air, and string lights crisscross above the revelers, adding to the joyous atmosphere. In the foreground, there is a graphical user interface with "Entus controls" and a button labeled "Entiore," suggesting the integration of technology into the party setting. The overall mood is lively and vibrant, with a sense of fun and community celebration." /></p>
<ol>
<li>downloads the latest toots in my timeline</li>
<li>caches them in SQLite</li>
<li>generates embeddings for each toot</li>
<li>does k-means clustering to group them by similar topic</li>
<li>uses an LLM to summarize each cluster of toots</li>
<li>uses <a href="https://tailscale.com/blog/how-tailscale-works">tailscale</a> so I can view it on my phone</li>
</ol>
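<p>Steps 1–2 are mundane plumbing. A minimal sketch of the SQLite cache (this schema is my illustration, not fossil’s actual table):</p>

```python
import sqlite3

def init_db(path: str = "toots.db") -> sqlite3.Connection:
    """Create (or open) the local cache with a minimal toots table."""
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS toots (
            id TEXT PRIMARY KEY,   -- the toot's id from the Mastodon API
            author TEXT,
            content TEXT,
            created_at TEXT,
            embedding BLOB         -- filled in later by the embedding step
        )
    """)
    return conn

def cache_toots(conn: sqlite3.Connection, toots: list[dict]) -> None:
    """Upsert toots fetched from the home timeline."""
    conn.executemany(
        "INSERT OR REPLACE INTO toots (id, author, content, created_at) "
        "VALUES (:id, :author, :content, :created_at)",
        toots,
    )
    conn.commit()
```

<p>With the cache in place, the embedding and clustering steps can re-run cheaply without hitting the Mastodon API again.</p>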
<p>I chose streamlit because it’s quick and dirty. I figure this isn’t going to be great on the first
pass, so streamlit should help me iterate quickly to make it work better for me.</p>
<p>The great thing about Mastodon is it’s completely open source, so the API is open and always will be,
unlike Twitter/X or the other platforms that have been locking down. FWIW I do think the fediverse is the
long-term right model for social media, for a variety of reasons.</p>
<h2 id="embeddings">Embeddings</h2>
<p>A quick note — <a href="https://llm.datasette.io/en/stable/embeddings/index.html">embeddings</a> are a numeric representation of text that corresponds to the meaning of the text.
I like to think of it as an “AI secret language”, in that it’s the representation that large language models use to
work with the text. We’re using a clustering algorithm here to group similar toots, but there are a lot of other things
you can do with embeddings too!</p>
<h2 id="building-it">Building It</h2>
<p><img src="https://gist.github.com/assets/437044/102a435d-1a62-4166-a222-934a07b0b314# inline" alt="A dynamic scene of a man and a Mastodon working together in a prehistoric landscape. The Mastodon, with its large tusks and woolly body, stands prominently in the center, pulling a wooden cart over a rocky terrain. The man, dressed in red, strains as he assists the Mastodon, guiding a rope attached to the cart. In the background, a cascade of waterfalls and lush greenery provide a majestic backdrop, while a herd of Mastodons is visible in the distance, hinting at a communal effort. The setting is serene with a soft glow of sunlight filtering through the mist, highlighting the cooperative relationship between humans and these ancient creatures." /></p>
<p>I went from “oh! I have an idea” to a working solution in about 3.5 hours. I used <a href="https://github.com/features/copilot">Github Copilot</a>, especially
with the <a href="https://docs.github.com/en/copilot/github-copilot-chat/about-github-copilot-chat">chat feature</a> (CMD+I, type “create a SQLite DB with a toots table”). It’s incredible how quickly you
can try out ideas.</p>
<p>If you want to take a peek:</p>
<ul>
<li>The UI (<a href="https://github.com/tkellogg/fossil/blob/main/dashboard.py">dashboard.py</a>)</li>
<li>The SQLite DB (<a href="https://github.com/tkellogg/fossil/blob/main/fossil/core.py#L15-L127">core.py</a>)</li>
<li>Download timeline (<a href="https://github.com/tkellogg/fossil/blob/main/fossil/core.py#L137-L170">core.py</a>) — I used <a href="https://requests.readthedocs.io/en/latest/">requests</a>, no special client</li>
<li>Generate embeddings (<a href="https://github.com/tkellogg/fossil/blob/main/fossil/core.py#L173-L188">core.py</a>) — I used OpenAI’s <code class="language-plaintext highlighter-rouge">text-embedding-ada-002</code>. It’s cheap and easy to set up.</li>
<li>K-means clustering (<a href="https://github.com/tkellogg/fossil/blob/main/fossil/science.py#L8-L12">science.py</a>) — <a href="https://scikit-learn.org/stable/">scikit-learn</a> makes this super easy, just 4 lines.</li>
<li>Summarize clusters (<a href="https://github.com/tkellogg/fossil/blob/main/fossil/science.py#L20-L26">science.py</a>) — I used <code class="language-plaintext highlighter-rouge">gpt-3.5-turbo</code> because it’s cheap-ish and good enough</li>
</ul>
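<p>The clustering step really is that small. A sketch of what those few lines of scikit-learn look like (toy vectors here, not fossil’s exact code):</p>

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_toots(embeddings: np.ndarray, n_clusters: int = 5) -> np.ndarray:
    """Group toot embeddings into n_clusters topic clusters."""
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    return kmeans.fit_predict(embeddings)

# Toy 3-D "embeddings": two obvious topic blobs.
embeddings = np.array([
    [1.0, 0.0, 0.0], [0.9, 0.1, 0.0],   # topic A
    [0.0, 1.0, 0.9], [0.1, 0.9, 1.0],   # topic B
])
labels = cluster_toots(embeddings, n_clusters=2)
```

<p>Each label then picks which expander a toot lands in, and the LLM summarizes each group.</p>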
<p>The streamlit dashboard displays the clusters as an <a href="https://docs.streamlit.io/library/api-reference/layout/st.expander">expander container</a>. When the dashboard loads
you see a list of cluster descriptions and you can choose which to dive into.</p>
<p><img src="https://gist.github.com/assets/437044/4c314ff0-0427-4979-9d55-5649a24dff2c" alt="A list of clickable article headlines displayed on a digital interface with drop-down arrows next to each, suggesting additional content is available. The headlines are: Apple faces a setback with Apple Watch Series 9 and Ultra 2 after a losing patent lawsuit; Considerations for livestreaming coding projects and code writing in the Project Jupyter ecosystem; Discovery of variable swapping and destructuring across multiple programming languages; Controversial Economic Policy; Food and sports in North Carolina; Monday pizza night with a touch of spooky weather." /></p>
<p>The toots are displayed poorly, imo; it could use a lot of work. I’d also like to be able to favorite and retoot
from this UI, at which point I could probably use it as my primary client for my right-after-I-wake-up browsing.</p>
<h1 id="conclusion">Conclusion</h1>
<p>I’ve used it for a few hours and I like being able to skip over vast stretches of my timeline with relative
confidence that I know what I’m skipping. I’m in control again.</p>
<p>On a more philosophical note, I like the idea of social media algorithms but I hate the implementations.
Viewing social media in timeline order is far too noisy. Algorithms that curate my feed make it far more manageable.
On the other hand, I don’t know how X or Instagram are curating my feed. As far as I can tell, they’re optimizing
for their own profit, which feels manipulative. I want my feed to serve me, no other way.</p>
<p>What do you think? How could it be improved?</p>
<p><em><strong>Next</strong>:</em> I wrote a followup to this post, about <a href="https://timkellogg.me/blog/2023/12/21/alignment">open source and societal alignment</a>.</p>
<h1 id="comments">Comments</h1>
<ul>
<li><a href="https://hachyderm.io/@kellogh/111607714159954053">Mastodon</a></li>
<li><a href="https://lobste.rs/s/qa6759/better_mastodon_client">Lobste.rs</a></li>
<li><a href="https://news.ycombinator.com/item?id=38696523">Hacker News</a></li>
<li><a href="https://www.linkedin.com/posts/tim-kellogg-69802913_a-better-mastodon-client-activity-7142902236786954241-LaY1?utm_source=share&utm_medium=member_desktop">LinkedIn</a></li>
</ul>
LLMs: Fake it till you make it2023-12-07T00:00:00+00:00http://timkellogg.me/blog/2023/12/07/fake-it<p>How does the current generation of AI work? Think of the phrase “fake it till you make it”, and
then take it all the way to the extreme, that’s close enough to what’s going on to get a feel for it.</p>
<p>This post started with a chat with my family. I expanded on it and added a (overly?) positive take
on where AI may take us. Don’t expect technical details here.</p>
<h1 id="a-story">A Story</h1>
<p>Think of a three year old kid. She’s learning how to talk by listening and imitating
as best she can. At first, speech comes in short bursts of 2-3 words, but she gets better at faking it
and eventually learns to string together multiple sentences. But she doesn’t really understand what’s
going on, which results in funny stories, like the time she went to a department store, looked up
at a mannequin and asked, “mom, is it dead?”.</p>
<p><img src="https://user-images.githubusercontent.com/437044/283129104-66bd0f8c-d47e-49e3-bfd2-d8c881cc55d7.png# inline" alt="A 3-year-old girl standing in a department store, looking up at a mannequin with a sense of awe and inquisitiveness. The girl is small and curious, her eyes wide with wonder. The mannequin, elegant and stylish, towers over her, creating a stark contrast in size and form. The department store setting is filled with racks of clothes and displays, providing a backdrop that emphasizes the child's fascination and the mannequin's imposing presence. The overall scene is heartwarming and captures a moment of childhood curiosity and admiration." /></p>
<p>Our brains start developing abilities for <a href="https://illinoisearlylearning.org/ielg/symbolic/">symbolic reasoning</a> from an early age and it eventually
takes over. Our learning changes from imitating to building up a mental model of the world and most
of our learning revolves around understanding the world.</p>
<p>But what if our hypothetical kid never develops symbolic reasoning? What if she gains superhuman levels
of being able to fake it? How far can she get in life?</p>
<p>She goes to college. She gets straight A’s in all her language and writing classes, because those
only require her to regurgitate the most plausible-sounding text at the right time. For her literature
final exam, she summarizes a 3,000 page book in an eloquently worded 10 paragraph essay in which she
uses no single word more than twice.</p>
<p>History involves a little bit of memorization, but beyond that, it’s nothing more than summarizing
events. It’s easy. During a study session for the final exam she formats the history
of Tanzania as a series of limericks. Straight A’s.</p>
<p>Math was hard, but she finds that if she studies enough examples of math problems, she can fake trigonometry
and calculus. It’s not perfect, but she can walk away with C’s and D’s, which is enough to graduate.</p>
<p>After graduation, she picks up a job as a businesswoman and becomes a huge hit at the new company.
She appears to have deep knowledge of a huge variety of topics. She responds in detail to every customer concern,
and always speaks with the confidence of a strong leader. The company quickly promotes her into the
executive ranks, where she excels.</p>
<h1 id="faking-it">Faking It</h1>
<p>Large Language Models (LLMs) are the current generation of AI. They work essentially like this, and they
sound very impressive. I’m sure eventually we’ll see a breakthrough that gives AI symbolic reasoning,
but they don’t have it now and they won’t for the foreseeable future. So how well can they do by
just faking it?</p>
<p><img src="https://gist.github.com/assets/437044/bc98fe6c-7ee7-4de6-9adf-2e0bfd012efa# inline" alt="Portrait of a confident Middle-Eastern businesswoman standing in a modern office. She is wearing a professional business suit, exuding competence and determination. The well-lit office behind her features a large window showcasing a city skyline, symbolizing success and ambition." /></p>
<p>“Fake it till you make it” is a common phrase in business. A lot of people think that’s one of the most
effective strategies an executive can take. <a href="https://www.forbes.com/sites/dileeprao/2021/09/15/fake-it-till-you-make-it-is-this-one-more-lie-from-silicon-valley-like-theranos/?sh=2fae2ee134e6">Some say</a> that’s how startups in Silicon Valley
succeed.</p>
<p>But we’re talking about very sophisticated faking. Superhuman levels of faking, beyond what you’ve
previously imagined.</p>
<p>It can pass a trigonometry test just by writing down the most plausible-sounding
answer. If you make it break down the problem into sub-problems, it dramatically improves its accuracy
because it can readily come up with plausible-sounding answers for the sub-problems and then roll it
all up into a solid plausible-sounding answer for the full problem.</p>
<p>It can read through a 300 page book in seconds, and answer any question you have about the book.
We’ve even found ways of packing in near-infinite amounts of text with varying levels of success.
It can turn dense legal documents into poetry. It can create Monet paintings out of a child’s crayon
drawing.</p>
<h1 id="who-wins">Who Wins?</h1>
<p><a href="https://infosec.exchange/@MR_E/111539287134351978">Someone on Mastodon</a> had a really interesting take:</p>
<blockquote>
<p>I think this is a complex topic because, on one hand,
we have people with valid claims that AI is stealing
their hard-earned work and replicating it. But your
example is why I think this is a sort of graphic version
of the Gutenberg printing press all over again. I cannot
tell you the number of adults with amazing ideas who
cannot express them clearly with either words or
pictures. The ideas get set aside because it’s so
hard to get others to understand what you are trying
to convey. I’m incredibly excited about an age where
people can visually share ideas quickly. Can enhance
storytelling. I think it’s going to change how we
communicate with each other.</p>
</blockquote>
<p>It’s not just visual. The level of difficulty of communicating to another person has dropped to
zero in the last year. That opens up a lot of opportunities for many people.</p>
<p><img src="https://gist.github.com/assets/437044/b2ef8250-78cf-4be0-b8bc-f6e97ae7ea1e# inline" alt="Portrait of a Black man sitting comfortably in a cozy home setting, playing an acoustic guitar. He has a relaxed, focused expression with a warm smile, indicating a deep connection with the music. The background is a homely living room with soft lighting and decor, emphasizing a casual, genuine atmosphere. His attire is simple and unpretentious, embodying a natural and authentic lifestyle." /></p>
<p>It’s extremely difficult to predict the future, so anyone trying to tell you the outcome of AI is
definitely trying to either sell you a political narrative or exploit a new business opportunity,
but I can tell you this:</p>
<p><em>It takes a lot less skill to make decent things nowadays.</em></p>
<p>My three year old will use her overactive imagination to tell me about creatures and scenes that are
creative or even absurd, and together we’ll use ChatGPT to create pictures and stories that bring
the idea to life. My older kid doesn’t need me, she can use voice-to-text and text-to-speech and
do it all herself. It makes me wonder if reading & writing will have the same fate as cursive
handwriting.</p>
<p>On this blog I’ve started using AI-generated art to augment the text. I think it looks better
this way, but it’s not something I care enough about to pay money for. Before this I simply had
walls of text with no images.</p>
<h2 id="a-workforce-without-faking">A Workforce Without Faking</h2>
<p>If I try to predict the future (carefully), I tend to think that work will require a lot less faking
it, because all that is done much better by an AI. I admittedly am biased toward being overly chill,
but here’s what such a workforce could be like, take it with a grain of salt:</p>
<ul>
<li><em><strong>Authenticity</strong></em>: No one learns the plastic exterior, because AI does it better anyway</li>
<li><em><strong>Collaboration</strong></em>: When people lack communication skills or speak different languages, AI can
step in and help them communicate their true intent.</li>
<li><em><strong>Reduced Impostor Syndrome</strong></em>: When AI does virtue signalling better than we can, all that’s
left is to be authentic about our actual struggles, and help each other through.</li>
</ul>
<p>Having worked on AI for a long time, I can tell you that “faking it” can be taken a very long way
and probably shouldn’t be underestimated. But if “faking it” is also no longer a viable strategy
for excelling in this world, maybe all that’s left is to discover our true selves and be authentic.</p>
<p>If that’s too rosy for you, then read <a href="https://www.schneier.com/blog/archives/2023/12/ai-and-trust.html">Bruce Schneier’s take</a>. It’s very grounded, unlike
a lot that’s written on the topic.</p>
LLMs are Interpretable2023-10-01T00:00:00+00:00http://timkellogg.me/blog/2023/10/01/interpretability<p>This might be a hot take but I truly believe it: LLMs are the most interpretable form of machine learning
that’s come into broad usage.</p>
<p>I’ve worked with explainable machine learning for years, and always found the field dissatisfying. It wasn’t until
I read <em><a href="https://arxiv.org/abs/1706.07269">Explanation in Artificial Intelligence: Insights from the Social Sciences</a></em> that it made sense why I wasn’t satisfied. The paper
is more like a short book: a 60-page survey of research in psychology and sociology applied to explanations in
AI/ML. It’s hard to read much of it and not conclude that:</p>
<ol>
<li>“Explanation” and “interpretability” are complex topics, multifaceted and hard to define</li>
<li>Existing AI research at the time (2017) nearly entirely missed the point</li>
</ol>
<p>I also see a lot of people assert that LLMs like ChatGPT or Claude aren’t interpretable. I argue the opposite:
LLMs are the first AI/ML technology to truly realize what it means to give a human-centric explanation for what
they produce.</p>
<p><em>Note: I use “AI” to mean the general set of technologies, including but not limited to machine learning (ML), that are able to make
predictions, classify, group, or generate content, etc. I know some people use “AI” to refer to what other people call “AGI”,
so I’m sorry if my terminology is confusing, but it’s what I’ve used for decades.</em></p>
<h1 id="interpretable-models">Interpretable Models</h1>
<p>As machine learning exploded throughout the 2010s, ethical questions emerged. If we want to put an ML model
into production, how do we gain confidence that it won’t kill someone, cause financial damage, make biased decisions
against minorities, etc. In other words, we want to <em>trust</em> it, so we can feel comfortable with it doing things for us.
The first pass on establishing trust was, “I should be able to understand how the model works”. To this end, the
idea of interpretable models was born.</p>
<p>Decision trees are considered interpretable by most experts. Here’s an example of a decision tree for identifying whether a tree
is a <a href="https://en.wikipedia.org/wiki/Pinus_taeda">loblolly pine</a> or not.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>          Bunches of >= 2 needles?
              /            \
            no              yes
            /                \
    Has cleaved bark?   Needles >= 2 inches?
        /      \             /      \
      no        yes        no        yes
      |          |         |          |
      No         No        No        Yes
</code></pre></div></div>
<p>At a height of two levels, this model is very interpretable. It’s easy to simulate what’s going on
in your head. If we give it an Eastern White Pine, the model will tell us that it’s a loblolly pine. That’s wrong,
but it makes sense: the white pine has bunches of 5 needles, and its 4-inch needles are longer than 2 inches. The model gave the wrong
answer, but that’s okay because we understand <em>why</em> it was wrong.</p>
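<p>“Simulating it in your head” is easy precisely because the tree transcribes directly into a couple of nested conditionals. A sketch, with thresholds taken from the tree above (the function name is my own):</p>

```python
def is_loblolly(needles_per_bunch: int, needle_inches: float,
                has_cleaved_bark: bool) -> bool:
    """Hand-transcription of the two-level decision tree above."""
    if needles_per_bunch >= 2:
        # Right branch: needle length makes the call.
        return needle_inches >= 2
    # Left branch: both bark outcomes are "No" in this toy tree,
    # so has_cleaved_bark never changes the answer.
    return False

# Eastern White Pine: bunches of 5 needles, 4-inch needles.
# The tree wrongly calls it a loblolly, and we can trace exactly why.
print(is_loblolly(5, 4.0, False))
```

<p>That traceability is the whole appeal of interpretable models: the prediction and the reason for it are the same thing.</p>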
<p>The most obvious way to fix the model is to add another layer of decisions. Maybe another
split point on needle length or number of needles in a bunch. But now there’s
three things to consider. Another layer of nodes on a binary tree means that exactly one more decision needs to be made
to arrive at an answer. But even 3 isn’t enough.
There are 35 different types of pine native to North America alone; that would take 6 levels of a perfectly
balanced decision tree (<code class="language-plaintext highlighter-rouge">log2(35)</code> is a bit bigger than 5, so we round up to 6). Then consider all the trees in North America,
or more generally all the plants in the world. We could end up with a lot of levels.</p>
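<p>The depth arithmetic generalizes: a perfectly balanced binary tree needs <code class="language-plaintext highlighter-rouge">ceil(log2(n))</code> levels to separate <code class="language-plaintext highlighter-rouge">n</code> classes, so depth grows slowly while the number of decisions to hold in your head doubles each level:</p>

```python
import math

def levels_needed(n_classes: int) -> int:
    """Depth of a perfectly balanced binary decision tree over n_classes."""
    return math.ceil(math.log2(n_classes))

print(levels_needed(35))    # the 35 North American pines -> 6
print(levels_needed(1000))  # a modest plant catalog -> 10
```
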
<p><em><strong>Increase model complexity to improve performance, decrease to improve interpretability.</strong></em></p>
<p>That should make sense in regard to decision trees, but it also works for other model types. If you increase the
complexity of the model (the number of nodes in the tree), it can hold more information, which means it can utilize more data to potentially make
more accurate predictions. But also, as you scale upwards, even a decision tree becomes hard to understand.
I can follow 3 decisions, but I probably can’t follow 3000 decisions. So even a model type that’s generally
considered interpretable, like a decision tree, can become uninterpretable if it grows too complex. (IIRC the paper
said most humans find it uninterpretable at around 8 decisions, although I can’t find that quote now).</p>
<p>LLMs are extremely uninterpretable by this definition. With billions of parameters, each one would have to be explained.
That would be far beyond reasonable.</p>
<p>From the paper:</p>
<blockquote>
<p>[Thagard] contends that all things being
equal, simpler explanations — those that cite fewer causes — and more general explanations —
those that explain more events, are better explanations. The model has
been demonstrated to align with how humans make judgements on explanations</p>
</blockquote>
<p>Well ain’t that the truth? Everyone is always looking to oversimplify the world. Imagine what politics would look
like if the average person could consider eight different competing tidbits of information and arrive at a balanced
conclusion…</p>
<p>So there seems to be a tension between model performance and interpretability. Human brains aren’t good at
working with a lot of data, which is why machine learning was ever interesting. Suddenly there was a way to sift
through mountains of information and find actionable insights that seemed intractable before ML. It
seemed like magic at the time, but the nature of magic is that it escapes our ability to explain it.</p>
<h1 id="explainable-models">Explainable Models</h1>
<p>Thus emerges explainable ML. We don’t really want to sacrifice model performance, but we still want to know what’s going
on. What if we looked at the model as if it were totally opaque, just some magic function that takes inputs and
churns out an answer?</p>
<p>That’s <a href="https://shap.readthedocs.io/en/latest/">SHAP (Shapley values)</a> in a nutshell. From their website:</p>
<blockquote>
<p>SHAP (SHapley Additive exPlanations) is a game theoretic approach to explain the output of any machine learning model. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions</p>
</blockquote>
<p>Basically, for any given individual prediction, tell the user which of the inputs contributed most to the final
prediction. It’s a black box approach that can be applied to any model (you could even apply it to something that’s
not ML at all like a SQL query). SHAP is a family of algorithms, but in general, they take a single prediction,
fluctuate the inputs and observe how the changes impact the outputs. From there, there’s some great visualizations
to help understand which features contributed the most.</p>
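<p>SHAP itself involves more math, but the black-box idea can be sketched in a few lines: randomize one input at a time and count how often the prediction flips. Everything here (the toy classifier, feature names, and value ranges) is made up for illustration; this is not the real SHAP algorithm:</p>

```python
import random

def perturbation_importance(predict, example, feature_ranges, trials=200):
    """Crude black-box importance: randomize one feature at a time and
    count how often the prediction flips. Not real SHAP, but the same
    'fluctuate the inputs, watch the outputs' idea."""
    random.seed(0)  # deterministic for the demo
    baseline = predict(example)
    scores = {}
    for name, choices in feature_ranges.items():
        flips = 0
        for _ in range(trials):
            perturbed = dict(example)
            perturbed[name] = random.choice(choices)
            if predict(perturbed) != baseline:
                flips += 1
        scores[name] = flips / trials
    return scores

# Hypothetical two-question pine classifier, per the running example.
def classify(tree):
    if tree["needles_per_bunch"] == 3 and tree["needle_length_in"] > 2:
        return "loblolly"
    return "not loblolly"

example = {"needles_per_bunch": 3, "needle_length_in": 6, "bark": "scaly"}
ranges = {
    "needles_per_bunch": [1, 2, 3, 5],
    "needle_length_in": [1, 2, 4, 6, 8],
    "bark": ["scaly", "smooth", "plated"],
}
print(perturbation_importance(classify, example, ranges))
# bark scores 0.0: the model never looks at it, just like the text says
```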
<p><img src="https://shap.readthedocs.io/en/latest/_images/example_notebooks_overviews_An_introduction_to_explainable_AI_with_Shapley_values_13_0.png" alt="example SHAP plot" /></p>
<p>So in our pine tree example, the length of the needle would be the most important input, followed by the number
of needles in the bunch. While the appearance of the bark would have no importance at all, since anything close to
a loblolly pine would’ve branched off at the first question, the length of the needles.</p>
<p>Honestly, that’s crap. When I’m identifying trees, the bark is one of the most important aspects. Since the model
doesn’t actually incorporate bark appearance, I’m losing trust in the model’s algorithm. And that’s how
it goes a lot of the time with interpretable & explainable ML. When the explanation doesn’t match your mental model,
the human urge is to force the model to think “more like you”.</p>
<p>The thing is, machine learning is a lot like an extension of statistics. With decision trees specifically, the
learning algorithm chooses an input as the first split if it does the best job of keeping the binary tree balanced. Another
way to say that is it has the highest entropy reduction, or it gets to the correct answer fastest. Statistically,
it makes sense to use the number of needles first because it divides the number of pine species fairly equally.
On the other hand, humans don’t think that way, because the number of needles is the hardest piece of data to
observe.</p>
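<p>The entropy-reduction criterion can be made concrete. A minimal sketch, using a made-up four-species example where “bunches of 5?” splits the classes exactly in half:</p>

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(labels, split):
    """Entropy reduction from partitioning labels by the boolean split
    (True -> left branch, False -> right branch)."""
    left = [l for l, s in zip(labels, split) if s]
    right = [l for l, s in zip(labels, split) if not s]
    weighted = (len(left) * entropy(left)
                + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted

# "Needles in bunches of 5?" divides the species evenly,
# so it's the statistically best first question.
species = ["white pine", "sugar pine", "loblolly", "pitch pine"]
five_needles = [True, True, False, False]
print(information_gain(species, five_needles))  # 1.0 bit
```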
<p>From the paper:</p>
<blockquote>
<p>Jaspars and Hilton both argue that such results demonstrate that,
as well as being true or likely, a good explanation must be relevant to both the question
and to the mental model of the explainee. Byrne offers a similar argument in her
computational model of explanation selection, noting that humans are model-based, not
proof-based, so explanations must be relevant to a model.</p>
</blockquote>
<p><em><strong>Explanations are better if they match our mental model and life experiences.</strong></em></p>
<p>I had seen this phenomenon a lot in the medical world. Experienced nurses would quickly lose trust in an ML
prediction about their patient if the explanation didn’t match their hard-earned experience. Even if it
made the same prediction. Even if the model was shown to have high performance. The realization that the model
didn’t think like them was often enough to trigger strong distrust.</p>
<h1 id="explainable-ai-was-a-dead-end">Explainable AI was a dead end</h1>
<p>A big problem with both explanations and interpretable models is that they don’t often fit how people think. For example,
I challenge you to explain what the output of a SHAP model actually means. If you’re a talented data scientist, you might
arrive at a true and simple explanation, maybe. There’s a lot of nuance and it requires a lot of math-like reasoning.
I argue that average people in our society don’t think like that. Even highly educated people.</p>
<p>From the paper:</p>
<blockquote>
<p>An important concept is the relationship between cause attribution and explanation.
Extracting a causal chain and displaying it to a person is causal attribution, not (necessarily)
an explanation. While a person could use such a causal chain to obtain their own
explanation, I argue that this does not constitute giving an explanation. In particular,
for most AI models, it is not reasonable to expect a lay-user to be able to interpret a
causal chain, no matter how it is presented. Much of the existing work in explainable
AI literature is on the causal attribution part of explanation — something that, in many
cases, is the easiest part of the problem because the causes are well understood, formalised,
and accessible by the underlying models.</p>
</blockquote>
<p>Wow! In other words, SHAP and similar methods totally miss the point, because they explain which inputs <em>caused</em> the output.
But that’s simply not how non-technical people think (nor, honestly, how most technical people think).</p>
<p>At some point in 2019, after reading this paper, I came to the conclusion that the current approaches to explainable
and interpretable AI were dead ends. I shifted toward black box approaches. One idea I had was to measure the
performance across lots of subsets of the training dataset. Like, “the accuracy of this loblolly detector
is 98% but falls to 10% when applied only to the family of white pines”. (I act like this is my idea, but the field of
fairness in AI was already developing and this was a common technique.)</p>
<p><em><strong>Negative confidence is still confidence.</strong></em></p>
<p>Knowing when a model is wrong and shouldn’t be trusted is probably even more useful than knowing when it’s
probably right. We’re good at assuming a model is right, but we become experts when we know when it’s wrong.
In software, I don’t feel truly comfortable with a new database or framework until I understand its bounds,
what it does poorly. If you watch a 2-3 year old child, their entire life revolves around testing the limits
of the physical world around them, and also the limits of their parents’ patience. Humans need to understand
the limits before we feel comfortable and happy.</p>
<h1 id="llms-are-the-answer">LLMs are the answer</h1>
<p>Yes, I do believe LLMs are the answer to explainable AI, though I also think they need to improve a lot. They’re
by far the closest thing I’ve witnessed to what explainable AI needs to be.
For one, there are no numbers. My “idea” of measuring performance for subsets was also a dead end, because the
general public doesn’t think in numbers. That’s an engineer or data scientist thing. (And besides, the numbers
we were talking in weren’t simple quantities; it took mental strain to even understand what the unit was.)</p>
<p>Let’s say you’re talking to an 8 year old child. She says she cleaned her room, but you’re not sure. One thing you
can do is ask her deeper and deeper questions about the details, or rephrase questions. If the answers seem
volatile or inconsistent, she’s probably lying to you. We do that with adults too.</p>
<p><em><strong>You can probe an LLM like you probe a fellow person.</strong></em></p>
<p>For example, while writing this I couldn’t think of a word, so I asked ChatGPT. It answered wrong the first
time, so I clarified what I wanted, just like I’d do with another person, and it gave me the right answer.
It’s a joint effort in creating a shared mental model!</p>
<p><img src="https://user-images.githubusercontent.com/437044/272265389-fbf0c381-d7cf-42e3-8b60-af1278f6efaa.png" alt="Screenshot of GPT4 conversation where I'm looking for the word 'referential integrity' and GPT4 gives me the wrong answer the first time." /></p>
<p>You might not like that computers can now trick you into believing lies, but these LLMs are by far the closest thing
in AI/ML to how humans already build trust (or distrust) in each other. The skills
we use to build trust in fellow humans are mostly transferable to the skills needed to work with LLMs. That’s
unprecedented; it’s such a giant improvement compared to where we were just a few years ago.</p>
<h2 id="trust-building-wth-llms">Trust building with LLMs</h2>
<p>There’s still a lot of problems. Bard takes the approach of letting the user decide when the model is wrong
and nudging them into using Google search. Honestly, I’m not sure how that makes sense to anyone that’s not
selling a search engine, but I’m glad that they’re getting real data to enhance the discussion about trust
building with LLMs. GPT-4 and Bing Chat seem to be getting decent at sourcing their claims with a URL. That
seems like a great approach (up until it gives the wrong URL).</p>
<p>Retrieval augmented generation (RAG) is an approach where you store lots of facts in the form of free text
in a traditional database. You could use elasticsearch or PostgreSQL for full text search, although the hot new
thing is to use <a href="https://vickiboykis.com/what_are_embeddings/">embeddings</a> with a <a href="https://blog.qdrant.tech/qdrant-introduces-full-text-filters-and-indexes-9a032fcb5fa">vector database</a>. Either way, you inject relevant tidbits of text into a
conversation in the background, invisible to the user, and let the LLM reformat the text into a cohesive answer.
I like this approach because you can:</p>
<ol>
<li>Source your claims, by showing the user a URL.</li>
<li>Keep data up-to-date and remove old information. It’s just a database.</li>
</ol>
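<p>A minimal end-to-end sketch of that flow. The embedding model and vector database are stood in for by a toy bag-of-words cosine similarity, the corpus and URLs are made up, and the actual LLM call is omitted; this only shows retrieval and prompt assembly:</p>

```python
import math
from collections import Counter

# Toy corpus; in a real system these rows live in PostgreSQL,
# elasticsearch, or a vector database, each with its source URL.
DOCS = [
    ("Loblolly pines have needles in bunches of three.",
     "https://example.com/loblolly"),
    ("Eastern white pines have needles in bunches of five.",
     "https://example.com/white-pine"),
]

def bow(text):
    """Bag-of-words 'embedding': just token counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, k=1):
    """Return the k most similar (text, url) facts for the question."""
    q = bow(question)
    ranked = sorted(DOCS, key=lambda d: cosine(q, bow(d[0])), reverse=True)
    return ranked[:k]

def build_prompt(question):
    """Inject retrieved facts, invisible to the user, ahead of the question."""
    facts = retrieve(question)
    context = "\n".join(f"- {text} (source: {url})" for text, url in facts)
    return f"Use only these facts, and cite the source URL:\n{context}\n\nQuestion: {question}"

print(build_prompt("How many needles does a white pine have in a bunch?"))
```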
<p>RAG is interesting, from a perspective of explainable AI, because LLMs are already good at acting as a
“word calculator”. They can reformat text all day long with high accuracy. So questions like “where did
you get that?” can be answered with a high degree of accuracy.</p>
<p><em>Note: The normal intuition is that you want to re-train or at least fine-tune a model to improve its accuracy.
However, <a href="https://arxiv.org/abs/2305.01651">research</a> indicates that inserting text into the conversation RAG-style (called
“in-context learning”, or ICL) is much more reliable than fine-tuning. Plus, you can quickly delete or update
out-of-date information, so RAG wins on just about every level.</em></p>
<h2 id="the-crazy-uncle-problem">The crazy uncle problem</h2>
<p>I have an uncle that’s a little bit racist, loves conspiracy theories, and says some <em>pretty wild</em> things.
Once he bragged to his friend that I “invented Microsoft.” (Narrator: I did not, I’ve never even worked there).</p>
<p>We have real people like this in life. We simply distrust them and move on. It’s not rocket science. A lot of
people sweat bullets about LLMs confidently lying. For example, <a href="https://apnews.com/article/artificial-intelligence-chatgpt-fake-case-lawyers-d6ae9fa79d0542db9e1455397aef381c">a lawyer</a> used ChatGPT to create a
statement that he submitted to a judge. The statement contained court cases that were entirely hallucinated by the LLM.
The lawyer said he had no idea that the AI can lie.</p>
<p>That’s a solvable problem. In fact, the incident being written up and reported incessantly in the media might have pushed the needle
far enough to convince the general public to have a little less blind faith in LLMs. And that’s a good thing.
We consider it naïve to instantly trust people we meet on the internet. We’ve never had to apply the same policy to computers,
but it’s really not a big mental shift, and it leads to a more productive relationship with AI.</p>
<h1 id="explanations-are-exploration">Explanations are exploration</h1>
<p>LLMs are closer to what humans want because they help us learn in unplanned ways.</p>
<p>From the paper:</p>
<blockquote>
<p>It is clear that the primary function of explanation is to facilitate learning.
Via learning, we obtain better models of how particular events or properties come about,
and we are able to use these models to our advantage. Heider states that people look
for explanations to improve their understanding of someone or something so that they
can derive stable model that can be used for prediction and control. This hypothesis
is backed up by research suggesting that people tend to ask questions about events or
observations that they consider abnormal or unexpected from their own point of view.</p>
</blockquote>
<p>When you use an LLM in an interactive mode like chat, you get a chance to poke and prod at it. Often you have at least
two goals: (1) learn a topic and (2) decide if you can trust the model. You can ask questions if something
seems surprising.</p>
<p>All of this LLM behavior is unplanned. It’s the nature of it being a general purpose algorithm. With traditional
ML, you had to build a model and then produce explanations for it. In other words, you had to plan out every
aspect of how the model should be used. Contrast that with LLMs, where the user decides what they want to
do with it. The experience is fundamentally unconstrained exploration. One model can serve an unbounded number
of use cases.</p>
<h1 id="conclusion">Conclusion</h1>
<p>When I first read this paper years ago I was struck with crisp clarity. Followed by a glum depression after
realizing that the existing technology
had no way of addressing humans the way we need to be addressed. When LLMs finally caught my attention,
I was ecstatic. Finally an ML “explanation” with nearly zero cognitive overhead, anyone can learn how to use LLMs and
when to trust them.</p>
<p>Some areas I’d love to see improvement:</p>
<ul>
<li><em><strong>Self-awareness</strong></em>: It would be a huge help to everyone if LLMs could tell you the parts they’re not sure about.
There’s <a href="https://arxiv.org/abs/2304.13734">promising research</a> that looks at the internal state of the LLM and guesses if it’s hallucinating,
but it <a href="https://arxiv.org/abs/2307.00175">has problems</a>.</li>
<li><em><strong>Tone adjustment</strong></em>: Assuming the model is self-aware in regards to truthfulness, ideally the model could
use softer language to indicate when it’s lying. Like, “I’m not sure about this but…”. I’m not convinced LLMs can do this on their own, but it seems
like a black box approach might work. For example, there are <a href="https://github.com/1rgs/jsonformer">libraries</a> that force LLM output to
conform to a schema by wrapping the LLM and preventing invalid sequences of words. I could see a similar approach
that combined both approaches; the wrapper predicts if the model is hallucinating and forces only softer
language to be generated. (I’m not smart enough to pull that off, so I’m hoping it’s actually possible.)</li>
<li><em><strong>Mind melding</strong></em>: Alright, not sure what word to use here, but everyone has a different mental model, like
we talked about earlier. It would be great if an LLM were able to adjust its explanations based on who it’s
talking to. For example, if I’m explaining how a software component works, I use completely different language
when talking to a sales person versus a fellow engineer. This seems like a far-out request for an LLM to do the same,
but it also seems necessary.</li>
<li><em><strong>Referential transparency</strong></em>: in other words, sending the same text to an LLM should always give the same result.
This is actually 100% solved via the <code class="language-plaintext highlighter-rouge">temperature</code> parameter for most open source LLMs. However, OpenAI will
change traffic flow under high load in a way that has the same effect as ignoring this parameter. It’s an easy
problem to solve — OpenAI could offer a <code class="language-plaintext highlighter-rouge">failure_mode</code> parameter that lets you fail requests if they can’t be
served by the ideal expert (rather than routing through a sub-optimal expert). I actually agree with OpenAI on
this decision as a default behavior, but it keeps coming up as a reason why software engineers won’t trust LLMs.</li>
</ul>
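<p>On that last point, the mechanics are worth making concrete: temperature divides the logits before sampling, and as it approaches zero the distribution collapses to the argmax, which is what makes same-input-same-output possible. A minimal sketch, not how any particular provider implements it:</p>

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Sample an index from logits after temperature scaling.
    temperature == 0 degenerates to greedy argmax decoding."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights)[0]

logits = [2.0, 1.0, 0.5]
# Temperature 0 is referentially transparent: same input, same output.
assert all(sample_token(logits, 0, random.Random()) == 0 for _ in range(100))
```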
<p>Of course, there’s a long way to go. But for once, it actually seems attainable. And it’ll be an exciting ride,
seeing what people come up with.</p>
<h1 id="discussion">Discussion</h1>
<p><a href="https://lobste.rs/s/ig1jev/llms_are_interpretable">Lobste.rs</a></p>
<p><a href="https://news.ycombinator.com/item?id=37777533">Hacker News</a></p>
<p><a href="https://hachyderm.io/@kellogh/111182343341194191">Mastodon</a></p>
On Waiting2023-09-14T00:00:00+00:00http://timkellogg.me/blog/2023/09/14/wu-wei<p>I was telling a colleague about my philosophy toward making decisions: “wait as long as you can”. She
replied, “have you heard of the Chinese concept of 无为 (wu wei)?”. Uh, no, I have not. She elaborated:</p>
<blockquote>
<p>In some situations, the best thing to do is not do anything but observe, let
whatever situation run its course. While waiting, continue to be in peace, allow for
transformation and growth.</p>
</blockquote>
<p>This is great! Now I have a word for a concept that I’ve felt deeply for a while. I can’t speak
authoritatively about wu wei, I just learned about it, but I can elaborate on my own philosophy:</p>
<p><strong>You’re guaranteed to have more information in the future.</strong></p>
<p>Or at least the same amount. If you have to make a decision that’s short on information, finding a way
to wait longer will always lead to a better decision. Obviously some decisions can’t wait, this doesn’t
apply to those.</p>
<p>Some examples:</p>
<ul>
<li><strong>“Should we adopt a preview feature from Product <em>X</em>?”</strong> The longer you wait, the more other people will form
opinions about it and you’ll see a consensus emerge. When you revisit the decision in 6 months, you’ll be
able to avoid months of effort.</li>
<li>In architecture, <strong>“should we assume <em>X</em> can’t ever happen?”</strong> Take the path that takes less effort and
build some light tooling to identify if you made the right decision. Adapt later.</li>
<li>In designing products, <strong>“will customers want to do <em>X</em>?”</strong> Don’t build it, but make it very easy for them
to complain. You’ll know soon.</li>
</ul>
<p>A key component is, before you dive into waiting mode, you should have a plan for monitoring
the situation. In the preview feature example, the monitoring plan could be as simple as a calendar
reminder to check back in, or you could wait until you feel the pain more acutely. If your “waiting”
strategy is causing a lot of pain, that’s a great indicator that you can’t wait any longer.</p>
Regex Isn't Hard2023-07-11T00:00:00+00:00http://timkellogg.me/blog/2023/07/11/regex<p>Regex gets a bad reputation for being very complex. That’s fair, but I also think that if you focus on a certain core
subset of regex, it’s not that hard. Most of the complexity comes from various “shortcuts” that are hard to remember.
If you ignore those, the language itself is fairly small and portable across programming languages.</p>
<p>It’s worth knowing regex because you can get <strong>A LOT</strong> done in very little code. If I try to replicate what my regex does
using normal procedural code, it’s often very verbose, buggy and significantly slower. It often takes hours or days to
do better than a couple minutes of writing regex.</p>
<p>NOTE: Some languages, like Rust, have parser combinators which can be as good or better than regex in most of the ways I
care about. However, I often opt for regex anyway because it’s less to fit in my brain. There’s a single core subset of
regex that all major programming languages support.</p>
<p>There are four major concepts you need to know:</p>
<ol>
<li>Character sets</li>
<li>Repetition</li>
<li>Groups</li>
<li>The <code class="language-plaintext highlighter-rouge">|</code>, <code class="language-plaintext highlighter-rouge">^</code> and <code class="language-plaintext highlighter-rouge">$</code> operators</li>
</ol>
<p>Here I’ll highlight a subset of the regex language that’s not hard to understand or remember. Throughout, I’ll also tell you what to
ignore. Most of those things are shortcuts that save a little verbosity at the expense of a lot of complexity. I’d rather
have verbosity than complexity, so I stick to this subset.</p>
<h1 id="character-sets">Character Sets</h1>
<p>A character set is the smallest unit of text matching available in regex. It’s just one character.</p>
<h2 id="single-characters">Single characters</h2>
<p><code class="language-plaintext highlighter-rouge">a</code> matches a single character, always lowercase <code class="language-plaintext highlighter-rouge">a</code>. <code class="language-plaintext highlighter-rouge">aaa</code> is 3 consecutive character sets, each matches only <code class="language-plaintext highlighter-rouge">a</code>. Same
with <code class="language-plaintext highlighter-rouge">abc</code>, but the second and third match <code class="language-plaintext highlighter-rouge">b</code> and <code class="language-plaintext highlighter-rouge">c</code> respectively.</p>
<h2 id="ranges">Ranges</h2>
<p>Match one of a set of characters.</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">[a]</code> — same as just <code class="language-plaintext highlighter-rouge">a</code></li>
<li><code class="language-plaintext highlighter-rouge">[abc]</code> — Matches <code class="language-plaintext highlighter-rouge">a</code>, <code class="language-plaintext highlighter-rouge">b</code>, or <code class="language-plaintext highlighter-rouge">c</code>.</li>
<li><code class="language-plaintext highlighter-rouge">[a-c]</code> — Same, but using <code class="language-plaintext highlighter-rouge">-</code> to specify a range of characters</li>
<li><code class="language-plaintext highlighter-rouge">[a-z]</code> — any lowercase character</li>
<li><code class="language-plaintext highlighter-rouge">[a-zA-Z]</code> — any lowercase or uppercase character</li>
<li><code class="language-plaintext highlighter-rouge">[a-zA-Z0-9!@#$%^&*()-]</code> — alphanumeric plus any of these symbols: <code class="language-plaintext highlighter-rouge">!@#$%^&*()-</code></li>
</ul>
<p>Note in that last point how <code class="language-plaintext highlighter-rouge">-</code> comes last. Also note that <code class="language-plaintext highlighter-rouge">^</code> isn’t the first character in the range; <code class="language-plaintext highlighter-rouge">^</code> only becomes an
operator if it occurs as the first character in a character set or regex.</p>
<p>There’s a parallel to boolean logic here:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">ab</code> means “<code class="language-plaintext highlighter-rouge">a</code> AND <code class="language-plaintext highlighter-rouge">b</code>”</li>
<li><code class="language-plaintext highlighter-rouge">[ab]</code> means “<code class="language-plaintext highlighter-rouge">a</code> OR <code class="language-plaintext highlighter-rouge">b</code>”</li>
</ul>
<p>You can build more complex logic using groups and negation.</p>
<h2 id="negation-">Negation (<code class="language-plaintext highlighter-rouge">^</code>)</h2>
<p>I mention this operator later, but in the context of character sets, it means “everything but these”.</p>
<p>Example:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">[^ab]</code> means “everything but <code class="language-plaintext highlighter-rouge">a</code> or <code class="language-plaintext highlighter-rouge">b</code>”</li>
<li><code class="language-plaintext highlighter-rouge">[ab^]</code> means “<code class="language-plaintext highlighter-rouge">a</code>, <code class="language-plaintext highlighter-rouge">b</code> or <code class="language-plaintext highlighter-rouge">^</code>”. The <code class="language-plaintext highlighter-rouge">^</code> has to be the first character to have special meaning.</li>
</ul>
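<p>All of the above is easy to verify from any language; a few quick checks using Python’s <code class="language-plaintext highlighter-rouge">re</code> module, which supports this same core syntax:</p>

```python
import re

# A character set matches exactly one character
assert re.fullmatch(r"[abc]", "b")
assert not re.fullmatch(r"[abc]", "d")

# Ranges
assert re.fullmatch(r"[a-zA-Z0-9]", "Q")

# Negation: ^ first in the set means "anything but these"
assert re.fullmatch(r"[^ab]", "c")
assert not re.fullmatch(r"[^ab]", "a")

# Anywhere else in the set, ^ is a literal caret
assert re.fullmatch(r"[ab^]", "^")
```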
<h2 id="ignore-this-stuff">[Ignore this stuff]</h2>
<p>These things are unnecessarily complex. They save some verbosity at the expense of a lot of complexity.</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">\w</code>, <code class="language-plaintext highlighter-rouge">\s</code>, etc. — These are shortcuts for ranges like <code class="language-plaintext highlighter-rouge">[a-zA-Z0-9]</code>. Ignore them because they’re not portable. Most
programming languages have them to some extent, but they’re hard to remember. Some languages use different syntax, like
<code class="language-plaintext highlighter-rouge">:word:</code>, which is almost as long as writing it out explicitly.</li>
<li><code class="language-plaintext highlighter-rouge">.</code> — The dot (<code class="language-plaintext highlighter-rouge">.</code>) matches any character, but not always. Sometimes it doesn’t match newlines. In some programming languages
it never matches newlines. I’ve gotten bitten too often by the <code class="language-plaintext highlighter-rouge">.</code> not behaving like I think it should. It’s best to ignore
this entirely. Instead, use a range negation, like <code class="language-plaintext highlighter-rouge">[^%]</code> if you know the <code class="language-plaintext highlighter-rouge">%</code> character won’t show up. It doesn’t hurt to
be a little more explicit.</li>
</ul>
<h1 id="repetition">Repetition</h1>
<p>These operators change the immediately previous character set to match a certain number of times:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">?</code> — zero or one</li>
<li><code class="language-plaintext highlighter-rouge">*</code> — zero or more</li>
<li><code class="language-plaintext highlighter-rouge">+</code> — one or more</li>
</ul>
<p>All these also work on entire groups as well.</p>
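<p>A quick check of each repetition operator, again using Python’s <code class="language-plaintext highlighter-rouge">re</code> module:</p>

```python
import re

assert re.fullmatch(r"colou?r", "color")   # ? zero or one
assert re.fullmatch(r"colou?r", "colour")

assert re.fullmatch(r"ab*", "a")           # * zero or more
assert re.fullmatch(r"ab*", "abbbb")

assert not re.fullmatch(r"ab+", "a")       # + one or more
assert re.fullmatch(r"ab+", "abb")

assert re.fullmatch(r"(ab)+", "ababab")    # repetition of a whole group
```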
<h2 id="ignore-this-stuff-1">[Ignore this stuff]</h2>
<p>These are unnecessarily complex. You can accomplish the same thing through other means.</p>
<ul>
<li>Non-greedy matching, <code class="language-plaintext highlighter-rouge">*?</code> and <code class="language-plaintext highlighter-rouge">+?</code>. This comes up a lot when you use the <code class="language-plaintext highlighter-rouge">.</code> character set. Instead, you can usually use a stricter negation
character set like <code class="language-plaintext highlighter-rouge">[^%]</code>.</li>
<li>Repetition ranges, i.e. <code class="language-plaintext highlighter-rouge">{1,2}</code>. Just duplicate your pattern or use <code class="language-plaintext highlighter-rouge">?</code> or <code class="language-plaintext highlighter-rouge">*</code> on the group.</li>
</ul>
<h1 id="groups">Groups</h1>
<p>A group is basically a sub-regex. There are three common uses for groups:</p>
<h2 id="1-repeat-a-sub-pattern">1. Repeat a sub-pattern</h2>
<p>e.g. This pattern <code class="language-plaintext highlighter-rouge">([0-9][0-9]?[0-9]?[.])+</code> matches one, two or three digits followed by a <code class="language-plaintext highlighter-rouge">.</code>, repeated one or more times. This would match an IP address (albeit not strictly).</p>
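<p>Here’s a variant of that pattern in action; the final dot-less octet is my addition, so that it matches a whole address instead of stopping at the last dot:</p>

```python
import re

# One, two, or three digits; each group adds a trailing dot
octet = r"[0-9][0-9]?[0-9]?"
ip_ish = f"({octet}[.])({octet}[.])({octet}[.]){octet}"

assert re.fullmatch(ip_ish, "192.168.0.1")
assert not re.fullmatch(ip_ish, "10.0.1")       # too few octets
assert re.fullmatch(ip_ish, "999.999.999.999")  # "albeit not strictly"
```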
<h2 id="2-substitutions">2. Substitutions</h2>
<p>The most common regex operations are match and substitute. However, the API for substitution varies quite a bit
depending on the host language.</p>
<ul>
<li>Methods — in C#, Java, Python, etc. there’s typically a method or function named something like <code class="language-plaintext highlighter-rouge">sub</code>, <code class="language-plaintext highlighter-rouge">substitute</code> or <code class="language-plaintext highlighter-rouge">replace</code>.</li>
<li><code class="language-plaintext highlighter-rouge">sed</code> style — in sed, Perl, and bash it flows like <code class="language-plaintext highlighter-rouge">s/pattern/replacement/</code>, where the leading <code class="language-plaintext highlighter-rouge">s</code> means to “substitute”.</li>
</ul>
<p>In both cases you can use <code class="language-plaintext highlighter-rouge">$1</code> or <code class="language-plaintext highlighter-rouge">\1</code>. Look up which is appropriate in the docs.</p>
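<p>For example, swapping two groups in Python, where the replacement text uses <code class="language-plaintext highlighter-rouge">\1</code>-style references:</p>

```python
import re

# Turn "Last, First" into "First Last" with numbered group references.
# Python uses \1 in the replacement; sed and Perl use the $1 style.
name = re.sub(r"([a-zA-Z]+), ([a-zA-Z]+)", r"\2 \1", "Kellogg, Tim")
assert name == "Tim Kellogg"
```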
<h2 id="3-extract-text">3. Extract text</h2>
<p>You can extract the text that the group matches.</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">0</code> — the entire regex match</li>
<li><code class="language-plaintext highlighter-rouge">1</code>-∞ — the text matched by the 1-indexed group. The first set of parentheses is group <code class="language-plaintext highlighter-rouge">1</code>, the second is <code class="language-plaintext highlighter-rouge">2</code>, etc.</li>
</ul>
<p>The non-portable part is that the API for accessing groups is almost always different in every programming language. Still,
group extraction is extremely useful, so just look it up.</p>
<p>The most common APIs look like:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">Match.group(1)</code> — Python, C#, Java, etc. offer a method on the match object to extract a group. The
exact method name is usually something like <code class="language-plaintext highlighter-rouge">group</code> or <code class="language-plaintext highlighter-rouge">getGroup</code>.</li>
<li><code class="language-plaintext highlighter-rouge">$1</code> — Perl will set variables like <code class="language-plaintext highlighter-rouge">$1</code> and <code class="language-plaintext highlighter-rouge">$2</code> in the local scope. Most programming languages can’t do this, but you’ll see the
syntax come up, e.g. with replacements often you can use either <code class="language-plaintext highlighter-rouge">$1</code> or <code class="language-plaintext highlighter-rouge">\1</code> in the substitution text.</li>
</ul>
<p>If those APIs don’t exist, or if you don’t feel like remembering them, you can replicate extraction via substitution. For example,
in Python you can do <code class="language-plaintext highlighter-rouge">re.sub(r"([^\n]*\.foo)[^\n]*", r"\1", input_str)</code> to extract the first group.</p>
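<p>For instance, extracting groups from a match in Python:</p>

```python
import re

m = re.search(r"version ([0-9]+)[.]([0-9]+)", "running version 3.12 now")
assert m.group(0) == "version 3.12"  # group 0 is the entire match
assert m.group(1) == "3"             # first set of parentheses
assert m.group(2) == "12"            # second set
```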
<h2 id="ignore-this-stuff-2">[Ignore this stuff]</h2>
<p>There are some operators at the beginning of groups, like <code class="language-plaintext highlighter-rouge">(?:</code> that can mean various things like “non-capturing group” or
“look-ahead” or “look-behind”. These are fairly advanced and you can generally get away without knowing about them.</p>
<h1 id="the---and--operators">The <code class="language-plaintext highlighter-rouge">|</code>, <code class="language-plaintext highlighter-rouge">^</code> and <code class="language-plaintext highlighter-rouge">$</code> Operators</h1>
<p>The <code class="language-plaintext highlighter-rouge">|</code> operator is OR, but for entire regex or groups.</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">foo|bar</code> matches either <code class="language-plaintext highlighter-rouge">foo</code> or <code class="language-plaintext highlighter-rouge">bar</code></li>
<li><code class="language-plaintext highlighter-rouge">(foo|bar)+</code> adds some repetition on it, e.g. it matches <code class="language-plaintext highlighter-rouge">barfoobarfoo</code></li>
</ul>
<p>The <code class="language-plaintext highlighter-rouge">^</code> is only ever significant when it’s the first character:</p>
<ul>
<li>First in the pattern — match starting at the beginning of the string or line. e.g. <code class="language-plaintext highlighter-rouge">^foo</code> will match <code class="language-plaintext highlighter-rouge">foobar</code> but not <code class="language-plaintext highlighter-rouge">barfoo</code>.
<ul>
<li>WARNING: Some regex APIs behave as if the pattern were always surrounded by <code class="language-plaintext highlighter-rouge">^</code> and <code class="language-plaintext highlighter-rouge">$</code>. You can test for this pretty easily with trial and error.</li>
</ul>
</li>
<li>First in character set — negation, match everything but those characters</li>
</ul>
<p>The <code class="language-plaintext highlighter-rouge">$</code> character only ever means “the end”, and it’s only used at the top level of a regex.</p>
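<p>In Python, the behavior above can be checked directly (patterns and inputs are illustrative):</p>

```python
import re

# | is OR, over the whole pattern or within a group
assert re.search(r"foo|bar", "a bar here")
assert re.fullmatch(r"(foo|bar)+", "barfoobarfoo")

# ^ first in the pattern anchors to the beginning
assert re.search(r"^foo", "foobar")
assert re.search(r"^foo", "barfoo") is None

# ^ first in a character set negates it
assert re.search(r"[^abc]", "abcd").group(0) == "d"

# $ anchors to the end
assert re.search(r"bar$", "foobar")
assert re.search(r"bar$", "barfoo") is None
```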
<h1 id="conclusion">Conclusion</h1>
<p>It’s not a bad idea to stick to just this subset of regex, because it’s mostly portable across programming languages.
That means fewer things to remember, so you get a lot of “bang for the buck” in terms of jamming info into your brain.
The quirks that do exist are relatively few, and are usually worth the effort because of the value they provide.</p>
<p>Regarding portability — most modern implementations try to copy some subset of Perl regex. The subset I’ve outlined here is
pretty consistent across the major programming languages of today. However, you might run into some surprises if you’re using
old tools like <code class="language-plaintext highlighter-rouge">sed</code> and <code class="language-plaintext highlighter-rouge">grep</code> that were created around the same time Perl was developing the idea of regex. Newer implementations
are reasonably stable though.</p>
<p>Too often people entirely reject regex, which is a shame because it’s an incredibly powerful language for text processing.
A little bit of regex knowledge goes a very long way. I hope this helps!</p>
Sprint Driven Development2022-11-22T00:00:00+00:00http://timkellogg.me/blog/2022/11/22/sprints<p>Agile talks about doing work in sprints, but it never felt like a “sprint” to me. It just feels like we’re
chopping work up arbitrarily into 2-week chunks. When I run, sprinting is a top-speed run focused on
getting to a clear destination as soon as possible. I need a long rest before I can sprint again.
The agile version of this doesn’t seem like it has much in common.</p>
<p>What if sprints were more like running?</p>
<p>A long time ago I was working for a startup. I pitched the CEO an idea to let me rewrite the entire component.
I wrote up a 1-pager, convinced everyone in the company (it was a small company) that it was the right thing,
and then I went offline for 1-2 months. I barely communicated. I worked extremely hard, and at the end I had
a very big contribution that made a large impact.</p>
<p>I wish agile sprints were like that.</p>
<p>A team can be in one of two states:</p>
<ol>
<li>Sprint Mode</li>
<li>Planning Mode</li>
</ol>
<p>Sprint mode is a period of maximum productivity. You know what you’re doing and how to get there. The only
unknown is how long it will take. If you want a team to be very productive, keep them in sprint mode as
much as possible.</p>
<p>Planning mode is when the team isn’t 100% sure where they’re going. They’re feeling it out. They might pivot
in a new direction at any point. Put simply, they’re not sprinting.</p>
<p>If a team is in sprint mode, let them stay there for as long as you can manage. If you have 2-week iterations,
cancel sprint planning until the productivity starts to cool. Don’t fix what’s not broke. Momentum is hard
to build, but easy to maintain. Maybe think about dialing back things like code reviews and other processes
that get in the way of delivering quickly.</p>
<p>Honestly, it’s not easy to get a team into sprint mode. It doesn’t happen often in practice. Sprint mode is
a rare state where the team</p>
<ol>
<li>Knows where they’re going</li>
<li>Knows how to get there</li>
<li>Has everything they need to get there, except time</li>
</ol>
<p>It takes a lot of planning and alignment work to get there.</p>
<p><em>The goal of planning mode</em> is to get the team into sprint mode. Don’t attempt to exit planning mode until you’re sure
you can (and should) stay in sprint mode for a long time. Estimate how long it’ll take to get to
sprint mode. Hold yourself accountable. If you sprint in the wrong direction, you’ll end up in the wrong place.</p>
<h2 id="cool-how-do-i-get-there">Cool! How do I get there?</h2>
<p>It seems like sprint mode is good, but clearly there’s trade-offs. How do I put this into practice?</p>
<p>The elephant in the room is that top management typically wants visibility into what’s going on. You can’t
usually go dark for 1-2 months like I did, that’s a thing that really only happens in startups. The answer is a
combination of two things:</p>
<ol>
<li>Communication</li>
<li>Trust</li>
</ol>
<p>That’s what it always is. It doesn’t go away simply because you’re in sprint mode.</p>
<p>If you’re a <strong>manager</strong> or team lead, you need to communicate clearly to your management what’s happening.
Communicate your philosophy and expectations. Tell them before the team goes into sprint mode that they’ll be
heads-down for a while. In my experience, this is a surprisingly easy conversation to have. VPs love
it when you tell them “we’re in execution mode right now and we don’t need any direction”. But there’s also
a trust component; if you go dark without pre-briefing them what’s happening, you may find yourself on a much
shorter leash in the future.</p>
<p>If you’re an <strong>engineer</strong> or other individual contributor, you can’t dictate what the team does, but you can
often negotiate a different operating mode for yourself with your manager. Tell them about planning vs
sprint mode. Tell them what your plans are. Let them know that you want to go into sprint mode. You may have
to settle for daily updates delivered early before your brain gets going, or late when you’re tired. Just
make sure you can do it in a way that’s not disruptive to your flow.</p>
<p>Also, figure out how to track the amount of rework or wasted work, as an indicator that you may need to come
out of sprint mode for a time. Communicating these upwards can help buy you the trust needed to stay in
sprint mode for longer.</p>
<p>In summary, be agile. Adjust your process to fit the team. People over process.</p>
Just commit more!2022-10-04T00:00:00+00:00http://timkellogg.me/blog/2022/10/04/dura<p>Over new years this past year I made <a href="https://github.com/tkellogg/dura">dura</a>. It’s like auto-backup for Git. It tries to stay out of the way
until you’re in a panic, trying to figure out how to rescue your repository from a thoughtless <code class="language-plaintext highlighter-rouge">git reset --hard</code>.
It makes background commits, real Git commits that you don’t normally have to see in the log, by committing to a
different branch than the one you have checked out. Overall, it’s been a blast. I’ve learned a lot from the
contributors, like how to write well-formed Rust as well as a bit about <a href="https://nixos.org">Nix</a>.</p>
<p>One recurring question has been, “why don’t you just commit more?”</p>
<p>It’s not a bad question. I clearly went through a lot of effort to build a tool in Rust. I
could’ve changed my own behavior. I guess it bugged me how many hours were being wasted on rescuing
repositories around the world when the answer is so easy: just commit more.</p>
<p>When I was considering building dura, I figured that I got myself into an unrescuable situation about 1-2 times per
year. Situations so dire that even <code class="language-plaintext highlighter-rouge">git reflog</code> couldn’t save me. I rationalized that I could spend 4 days building
it and it would start saving me time in 5-6 years. That seemed worth it to me.</p>
<p>However, now that I’ve started using it, I find that I need it a lot. Like, really, A LOT!</p>
<p>I’ve never been sure how to pronounce <code class="language-plaintext highlighter-rouge">reflog</code>. It seems like it should be “ref-log”, but whenever I need to use it,
it feels a lot more like “re-flog”. It’s painful. You can’t really use it without understanding a bit about Git
internals, and honestly I wish I didn’t know anything about Git internals. I just want to rescue my code.</p>
<p>Instead of reflog, I just expand the log to all branches, <code class="language-plaintext highlighter-rouge">tig --all</code> (<a href="http://jonas.github.io/tig/">tig</a> is great btw). Voilà! A list of
changes ordered by timestamp. Dura commits every 5 seconds, at most, so the Git log becomes a timestamp ordered log
of every change I made, regardless of whether I left a commit message. It’s more verbose than the log I usually want to see,
but I only get it when I put it into verbose mode with the <code class="language-plaintext highlighter-rouge">--all</code> option.</p>
<p>I do a lot of code reviews and I frequently find myself doing something like:</p>
<ol>
<li>Checkout PR branch</li>
<li>Make changes. Poke & prod the code. Run tests, etc.</li>
<li>Abandon the changes</li>
<li>Next PR, go to 1.</li>
</ol>
<p>A lot of times I’ll wish I didn’t abandon the changes. I used to re-type the changes from memory, but now with dura I
look back in the Git log, because now I’m committing a lot!</p>
<p>There’s also been a lot of cases where I’m switching between a lot of branches, resetting, merging, etc. and I simply
get lost. I could definitely stare at the branches for a while and figure out what happened, but Dura is a lot
easier.</p>
<p>If I knew how useful Dura would have been, I would’ve made it a lot sooner.</p>
<h2 id="try-it-out">Try it out!</h2>
<p>If you’re on Mac, it’s <a href="https://github.com/tkellogg/dura/issues/123">gotten very easy</a>. Running <code class="language-plaintext highlighter-rouge">brew install dura</code> will not only install, but also setup
a launchctl service to keep it running. I’d love to do something similar for Windows & Linux. If that’s your jam,
send a PR!</p>
Three Plates2022-04-11T00:00:00+00:00http://timkellogg.me/blog/2022/04/11/three-plates<p>“Why don’t we test our tests?”. It’s like the three plates method. Take test code and
prod code and grind them against each other until the blemishes are ground smooth. That’s unit testing.</p>
<p>The <a href="https://ericweinhoffer.com/blog/2017/7/30/the-whitworth-three-plates-method">three plates method</a>
is a process that creates the flattest plates, with the highest precision. No power tools needed,
just 3 granite plates.</p>
<p>It goes like this:</p>
<ol>
<li>Take plates <code class="language-plaintext highlighter-rouge">A</code> and <code class="language-plaintext highlighter-rouge">B</code>, grind them together for a while</li>
<li>Grind <code class="language-plaintext highlighter-rouge">B</code> and <code class="language-plaintext highlighter-rouge">C</code> together</li>
<li>Grind <code class="language-plaintext highlighter-rouge">C</code> and <code class="language-plaintext highlighter-rouge">A</code> together</li>
<li>Repeat until smooth enough</li>
</ol>
<p>The process takes a while, but there’s no upper bound to the precision. All it takes is time and skill.
Before you start, the plates are rough cut with bumps, scars and points. But
after a few iterations, the blemishes break off iteratively to reveal a flat, smooth, beautiful surface.</p>
<p>Unit testing is a lot like this. I like to think <a href="https://www.agilealliance.org/glossary/tdd/#q=~(infinite~false~filters~(postType~(~'page~'post~'aa_book~'aa_event_session~'aa_experience_report~'aa_glossary~'aa_research_paper~'aa_video)~tags~(~'tdd))~searchTerm~'~sort~false~sortDirection~'asc~page~1)">TDD</a> means that we write the test first, but it’s
not important what comes first. It’s not like I spit out perfect test code or prod code on my first try,
and yet, after several iterations of fixing code on both sides, the code converges to a well-functioning
unit.</p>
<p>The three plates method is also a great analogy for understanding TDD and where it fits.</p>
<ul>
<li><em><strong>Two Plates?</strong></em> — Naively, I would have thought it only takes two plates to create a smooth surface,
but the third plate is important. In TDD, a single test will get you a long way toward functioning prod
code, but you need more tests to hash out all the edge cases. The more, the better.</li>
<li><em><strong>Units</strong></em> — For a granite countertop, the three plates method is all you need. But
usually you’ll want to install it somewhere useful, like in a kitchen. To do that, you’ll need other
quality tools, like a level to make sure it was installed correctly. TDD is useful for what it does,
but it would be a shame to have a giant unit test suite with no functional tests. Maybe go crazy and
<a href="https://learntla.com/introduction/">try formal methods</a>.</li>
<li><em><strong>Dedication</strong></em> — The three plates method requires a lot of experience and skill. It also takes a lot of
practice to be able to leverage unit tests effectively. If your organization has trouble hiring
high caliber engineers, you may find that large unit test suites cause projects to be late or fail.
It’s hard to be internally honest about things like this, but if you can, shift some of your
controls to quality processes that require less skill, or hire QA engineers.</li>
</ul>
<p>I hope you find the three plates method to be a useful analogy for unit testing. The idea of “rough
smoothing rough” comes up in a lot of contexts, e.g. <a href="https://www.evidencebasedmentoring.org/four-ways-mentoring-benefits-mentor/">mentoring</a> and <a href="https://machinelearningmastery.com/what-are-generative-adversarial-networks-gans/">machine learning</a>.
Broadly speaking, it’s great whenever the ideal isn’t tangible, or when you’re pushing past known limits.</p>
Cold Paths2021-01-29T00:00:00+00:00http://timkellogg.me/blog/2021/01/29/cold-paths<p><em>Faced with yet another crisis caused by a bug hidden in a cold path, I found
myself Googling for a quick link to Slack out to the engineering team about cold paths.
Unfortunately, I can’t find a focused write-up; and so here I am writing this.</em></p>
<p>A <strong>cold path</strong> is a path through the code or situation that rarely happens. By contrast,
<strong>hot paths</strong>
happen frequently. You don’t find bugs in hot paths. By nature, bugs are found
in places that you didn’t think to look. Bugs are always in cold paths — every bug is
found in a path colder than all the paths you tested.</p>
<p>Here are some real world “cold paths” with big consequences:</p>
<ul>
<li><a href="https://blog.thousandeyes.com/impacts-expired-tls-certificate/">An outage caused by an expired TLS certificate</a></li>
<li><a href="https://en.wikipedia.org/wiki/Year_2000_problem">Y2K</a></li>
</ul>
<p>Rare events are <a href="https://www.amazon.com/Black-Swan-Improbable-Robustness-Fragility/dp/081297381X">hard to predict</a>. That’s just the nature of them. As engineers,
I believe it’s our responsibility to do our best to try harder and get better at planning for
these rare bugs. Is that it? Try harder?</p>
<p>Better: Don’t have cold paths</p>
<h1 id="smaller-programs">Smaller programs</h1>
<p>I watched one of Gil Tene’s many amazing talks on Azul’s C4 garbage collector (not <a href="https://www.infoq.com/presentations/Java-GC-Azul-C4/">this talk</a>,
but similar) where he claimed that normally it takes 10 years to harden a garbage
collector. Azul didn’t have 10 years to produce a viable business, so they avoided almost all
cold paths in the collector and they were able to harden it in 4 years (I never tried verifying
this claim).</p>
<p>For a garbage collector, this means things like offering fewer options, or having a simpler
model to avoid cold paths around promoting objects between generations. For your app it will
mean something different.</p>
<p>You can <strong>test less</strong> to achieve high quality by <strong>reducing the size</strong> of your application.
Fewer edge cases means less testing surface area, which implies less testing work
and fewer missed test cases. There’s something to be said for avoiding config options and
making solutions less generic.</p>
<h1 id="avoid-fallbacks">Avoid fallbacks</h1>
<p>While I worked at AWS I had this beaten into my skull, but thankfully they’ve published
an excellent piece of guidance titled <a href="https://aws.amazon.com/builders-library/avoiding-fallback-in-distributed-systems/?did=ba_card&trk=ba_card">“avoiding fallback in distributed systems”</a>. The
hope is that, when system 1 fails, you automatically fall back to system 2.</p>
<p>For example, let’s say we have a process that sends logs to another service. For the hot
path, we send logs directly via an HTTP request. But if the log service fails (e.g.
overloaded, maintenance, etc.) we fall back by writing to a file and have a secondary process
send those logs to the service when it comes back.</p>
<ul>
<li>System 1: directly send logs to server</li>
<li>System 2: send asynchronously via file append</li>
</ul>
<p>If system 2 is more reliable than system 1, then why don’t we always choose system 2?
Always write to the file and ship logs asynchronously rather than send directly to the
server. This is surprisingly strong logic that isn’t considered often enough. More often,
by asking the question you end up finding a way to make system 1 more robust.</p>
<p>In cases where fallback can’t be avoided they suggest always exercising the fallback.
For example, on every request, randomly decide to use either system 1 or system 2,
thereby ensuring that the cold path isn’t cold because both are exercised on the hot path,
at least sometimes.</p>
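<p>The “always exercise both paths” idea can be sketched in a few lines of Python (the two senders here are stand-ins for a real HTTP call and a real file append):</p>

```python
import random

calls = {"direct": 0, "spool": 0}

def send_direct(msg):      # system 1 stand-in: send straight to the log service
    calls["direct"] += 1

def append_to_spool(msg):  # system 2 stand-in: append to a file for async shipping
    calls["spool"] += 1

def send_logs(msg):
    # Flip a coin on every request so neither path is ever cold.
    if random.random() < 0.5:
        send_direct(msg)
    else:
        append_to_spool(msg)

for i in range(1000):
    send_logs(f"line {i}")
```

After enough traffic, both counters are nonzero, which is the whole point: a bug in either path shows up in normal operation, not during an outage.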
<h1 id="know-your-capacity-for-testing">Know your capacity for testing</h1>
<p>In <a href="https://danluu.com/deconstruct-files/">“files are fraught with problems”</a>, Dan Luu demonstrates that it’s unexpectedly
difficult to write a file to disk correctly. Juggling issues like handling random power loss or
strange ext4 behavior becomes a full-time job. It’s a lot to keep in your head, just to
write a file.</p>
<p>Is it better to:</p>
<ol>
<li>Ignore the cold paths and hope for the best</li>
<li>Correctly implement & test each file write event and ship late</li>
<li>Use a system that does it correctly for you, like MySQL or SQLite</li>
</ol>
<p>Choice #3 delegates the testing of all those pesky cold paths to a 3rd party.
Therefore, #3 is always the best choice, unless your company is in the file writing
business (e.g. you’re AWS and working on DynamoDB or S3).</p>
<p>Alternate take on the same idea: <a href="https://mcfunley.com/choose-boring-technology">Choose boring technology</a></p>
<h1 id="conclusion">Conclusion</h1>
<p>The practice of avoiding cold paths is often presented as “simple code”. Unfortunately, “simple”
has such wildly varying meanings that it’s often antagonistic to use it outside a
mathematical setting. I’ve found that centering conversations around “avoiding cold paths”
gives more clarity on how to proceed.</p>
<p>In system design, the conversation about what is “simple” is even tougher due to the
amorphous nature of it. The principle of “avoiding cold paths” can be extended to mean,
“delegating cold paths” to a trusted third party, like an open source project or a cloud
provider. An earnest discussion about your capacity for testing might be
appropriate. It lets you disengage from “building cool stuff” and instead view it as
“testing burden I’d rather not have”.</p>
Why I Don't Share Baby Pictures On Facebook & Twitter2016-11-23T00:00:00+00:00http://timkellogg.me/blog/2016/11/23/baby-pictures<p>Earlier this year my wife and I had a baby girl. She’s the sweetest and cutest baby
I’ve ever seen and a very big part of me wants to tell everyone about her and post
pictures to Facebook and Twitter. But we’ve restrained ourselves from spamming the
world. We believe there are ethical considerations at stake.</p>
<p>Most people easily agree that it’s a bad idea to give a 7 month old baby a tattoo.
Tattoos are usually a core part of someone’s identity. They tell a life’s story,
and the parents don’t have the right to decide how the baby should express herself.
When she decides she hates it, it’s a painful and error prone process to remove the
tattoo.</p>
<p>Pictures on the Internet are similar. You can delete a picture from Facebook, but there’s
no guarantee that Facebook actually deleted it (they don’t). Even if it was deleted,
someone could have downloaded it or screenshot it (nod to Snapchat); the Internet
archives exist for this purpose. Furthermore, we know that our government captures
this sort of data on us, so even if Facebook deleted it, a future rogue government
may still be able to use it for their own nefarious purposes. I also need to protect
my daughter from future bad people.</p>
<p>This is the digital age we live in. These problems won’t get technological solutions,
so as parents we have to make decisions to protect the freedom and will of our children,
even when it seems so harmless. What other subtle ethical issues do we face?</p>
Your Debugger Is Obsolete2016-09-06T00:00:00+00:00http://timkellogg.me/blog/2016/09/06/debugger-obsolete<p>Debuggers used to be super useful, but today they are usually a sign that you don’t
know what you are doing.</p>
<p>Debuggers are still good at debugging serial code, but these days my code is asynchronous and
distributed over many hosts. There is no concept of “stepping through code” in asynchronous
systems - stepping implies that you are on a single thread, running on a single machine.</p>
<p>Today we use metrics. With metrics, I can observe failures on hundreds of hosts
simultaneously. I can witness a starvation event begin and end over an entire fleet,
and have visual graphs to explain what happened. I can look at a period of high latency
and correlate it to a new profile of traffic that I had not considered before.</p>
<p>Things I put metrics on:</p>
<ul>
<li><strong>Latency.</strong> Obviously request latency, but also usually 6-10 different sub-sections of the
request to help troubleshoot slowness.</li>
<li><strong>Failures.</strong> Not only should you record all failures in order to calculate availability, but also put
counters on different classes of failures. Where there is an assert statement, there should
be a counter.</li>
<li><strong>Dependencies.</strong> They are like children; you have great hopes and dreams for them, but in
the end they disappoint you. Record their latency and availability for yourself.</li>
<li><strong>Features.</strong> What do customers actually use? Where do they get stuck most often?</li>
<li><strong>Traffic Profile.</strong> Record how big the request and response were or how many elements
were in “that array”. This is great for understanding where load is coming from and what sorts
of mitigations are appropriate.</li>
<li><strong>System Health.</strong> Record CPU, memory, disk and network usage. I find that, on the JVM,
a high number of garbage collections is a more reliable indicator of an unhealthy host than
high CPU or memory usage.</li>
</ul>
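<p>A minimal in-process sketch of these counters and timers in Python (a real system would export them to a metrics service; the metric names are illustrative):</p>

```python
import time
from collections import defaultdict
from contextlib import contextmanager

counters = defaultdict(int)     # failure counts, feature usage, etc.
latencies = defaultdict(list)   # per-name latency samples

@contextmanager
def timed(name):
    """Record latency for a named section; count failures in their own counter."""
    start = time.perf_counter()
    try:
        yield
    except Exception:
        counters[name + ".failure"] += 1  # where there's an assert, there's a counter
        raise
    finally:
        latencies[name].append(time.perf_counter() - start)

with timed("request"):
    counters["feature.search"] += 1  # track which features customers actually use
```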
<p>Alarms are the first step toward a service that can manage itself. Alarms are just events.
They can notify me that something went wrong, or, better yet, fix the problem automatically.
The AWS <a href="https://aws.amazon.com/autoscaling/">Autoscaling</a> API is killer, spin up a few instances if you notice a traffic spike
or an unhealthy host, then decommission them automatically when the event is over.</p>
<p>There are some great upsides to this new world where metrics are my debugger. When things
go wrong, I find out first from my servers instead of my customers. Back when debuggers
were relevant, I found out about issues through support tickets. This is much more proactive.</p>
<p>Tests also helped make the debugger obsolete. I find that when I need to replicate an
issue, I can do it in a high component-level or functional-level test. In the process of
figuring out what went wrong I usually write a few unit-level tests. In the meantime,
I use metrics and log lines to understand the internal state and figure out where things
are going wrong. Unlike an IDE debugger, this debugging session is recorded and re-run
forever. If you still need a debugger, there is a chance that the code is simply too
complex and needs major refactoring.</p>
<p>You should absolutely write unit tests against metrics. If they don’t work, you’ll be
blind in production. They are a part of the application just as much as the request handler.
Once you start doing this, you might notice that the debugger is less useful.</p>
<p>If systems aren’t asynchronous enough for you, we’re in the process of launching the
Internet of Things where we make it extremely difficult to launch a debugger on the devices
where your software runs. Not only do they not have screens, but your fleet
has 100K or 1M devices. Whole classes of problems are about to happen that you’ve never heard
of. So learn how to debug an application through metrics. It will be the only way
to be successful in the future.</p>
Websockets Are Not Magical2015-03-01T00:00:00+00:00http://timkellogg.me/blog/2015/03/01/websockets-are-not-magic<p>A couple months ago I was talking to a high-ranking engineer from an embedded RTOS
vendor. He was insisting that websockets are going to be one of the most important
standards for the Internet of Things. Unfortunately, the conversation was cut short
too soon for me to get a better understanding of his reasons.</p>
<p>Since then I’ve seen an endless stream of tweets and blogs indicating that there might be
a lot of misconceptions about websockets and the Internet of Things. Every time I
see someone list “websockets” along side MQTT and CoAP my inner voice screams
<strong>“People! Websockets are just rich TCP sockets”</strong>.</p>
<p>I hope to dispel some myths here and hopefully stir up excitement about websockets
for <em>the right reasons</em>.</p>
<h2 id="myth-theres-no-extra-overhead">Myth: There’s No Extra Overhead</h2>
<p>I’ve heard intelligent and respected people say that websockets have no per-message
overhead after the initial negotiation request. This is simply not true. Two things
should tip you off: (1) it’s message-oriented instead of stream-oriented and (2) the
existence of text frames and data frames. These things don’t come for free.</p>
<p>Each websocket message is divided up into frames (normally 1 frame per message).
Each frame has a minimum overhead of:</p>
<ul>
<li>2 bytes for short messages (<126 bytes) going from server to client</li>
<li>6 bytes for short messages going from client to server (4 bytes for the mask)</li>
</ul>
<p>Maximum overhead is 14 bytes (or unlimited if <a href="https://tools.ietf.org/html/draft-ietf-hybi-permessage-compression-19">websocket extensions</a> are used). Still,
this isn’t much overhead compared to HTTP and seems to be consistent with the
spec’s goals:</p>
<blockquote>
<p>The WebSocket Protocol is designed on the principle that there should be minimal framing</p>
</blockquote>
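<p>The framing overhead follows directly from the RFC 6455 header layout; here’s a Python sketch that computes it (assuming no extensions):</p>

```python
def frame_overhead(payload_len: int, masked: bool) -> int:
    """Websocket frame header size in bytes, per RFC 6455 (no extensions)."""
    if payload_len < 126:
        header = 2        # FIN/opcode byte + mask-bit/length byte
    elif payload_len < 2**16:
        header = 4        # plus a 16-bit extended payload length
    else:
        header = 10       # plus a 64-bit extended payload length
    if masked:
        header += 4       # client-to-server frames add a 4-byte masking key
    return header

print(frame_overhead(100, masked=False))   # 2: short message, server to client
print(frame_overhead(100, masked=True))    # 6: short message, client to server
print(frame_overhead(70000, masked=True))  # 14: the maximum without extensions
```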
<h2 id="myth-websockets-are-just-tcp">Myth: Websockets Are Just TCP</h2>
<p>I’m guilty of spreading this myth. It seems intuitive that a technology called
“websockets” that runs on TCP would also be stream-oriented. But <a href="https://tools.ietf.org/html/rfc6455#section-1.5">section 1.5</a>
of the spec says:</p>
<blockquote>
<p>Conceptually, WebSocket is really just a layer on top of TCP that […] layers a
<em>framing mechanism</em> on top of TCP to get back to the IP packet mechanism that TCP is
built on, but without length limits.</p>
</blockquote>
<p>So websockets are message-oriented like UDP without the maximum length constraints
but with TCP’s delivery guarantees and congestion control. It turns out that TCP’s
stream orientation isn’t all that useful (think about how many protocols build some
sort of “message” concept on top of TCP). In fact <a href="https://tools.ietf.org/html/rfc4960">SCTP (RFC 4960)</a> provides many
of the same benefits of messages-on-top-of-TCP but removes the TCP part to reduce
the overhead. Unfortunately, SCTP is yet to gain widespread adoption.</p>
<p>Since websocket connections are made from messages instead of streams, some
stream-oriented protocols could be difficult to implement in websockets. But most
protocols should fit easily into websocket frames.</p>
<h2 id="negotiation">Negotiation</h2>
<p>The single best thing about websockets (in my opinion) is that they start off with an
HTTP request that can negotiate terms for the connection. The request could
contain an <code class="language-plaintext highlighter-rouge">Authorization</code> header in order to authenticate the client before creating
the session. This means that OAuth could become less complex for protocols like MQTT.</p>
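<p>To make the negotiation concrete, here’s a minimal Python sketch of an upgrade request carrying an <code>Authorization</code> header (the bearer token and path are hypothetical). <code>expected_accept</code> computes the <code>Sec-WebSocket-Accept</code> value the server must echo back, as defined in RFC 6455:</p>

```python
import base64
import hashlib

WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"  # fixed GUID from RFC 6455

def upgrade_request(host: str, path: str, key: str, token: str) -> str:
    """Build a websocket upgrade request that also authenticates the client."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Key: {key}\r\n"
        "Sec-WebSocket-Version: 13\r\n"
        f"Authorization: Bearer {token}\r\n\r\n"
    )

def expected_accept(key: str) -> str:
    """Sec-WebSocket-Accept value: base64(sha1(key + GUID))."""
    digest = hashlib.sha1((key + WS_GUID).encode()).digest()
    return base64.b64encode(digest).decode()
```

<p>Because this is an ordinary HTTP request, the server can reject it with <code>401 Unauthorized</code> before any websocket session exists.</p>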
<p>The server can respond with any response code, so it’s completely legitimate to
respond with <code class="language-plaintext highlighter-rouge">307 Temporary Redirect</code> to force the client to connect to a different
(less stressed) server. For TCP protocols like MQTT that suffer from being difficult
to load balance, this could be an answer.</p>
<p>A lot of the problems I run into with trying to create a better client experience with
MQTT could be solved easily with a single negotiation request. Many kinds of metadata
could be coordinated by setting request and response headers.</p>
<p>For instance, I often want to communicate errors to the client (i.e. <em>You don’t have
access to publish to <code class="language-plaintext highlighter-rouge">foo/bar/baz</code>, try <code class="language-plaintext highlighter-rouge">foo/bar/biz</code> instead</em>). The only reasonable
way I’ve seen to communicate these errors is to have the client subscribe to a certain
topic that only they have access to (usually something like <code class="language-plaintext highlighter-rouge">$SYS/errors/<client_id></code>).
Of course, there’s no standard place to look for errors and each broker does it
differently (if at all). Sending a header like <code class="language-plaintext highlighter-rouge">Client-Errors: $SYS/errors/ww1922</code> in
the response could solve this problem smoothly. This strategy could also work for other
things like topic schemas, provenance conventions, and the list goes on.</p>
<h2 id="conclusion">Conclusion</h2>
<p>The initial negotiation request is a powerful addition to TCP-based binary protocols.
If the client is strong enough to handle some HTTP communication, websockets can add
a lot of value. At the same time, I keep seeing the term <em>websockets</em> thrown around
alongside protocols like MQTT and CoAP. Websockets are in no way a replacement for
many of these traditional IoT protocols. At best, they offer a mechanism to enhance
these protocols and communicate conventions. However, I wonder if it’s not better to
simply fix the broken protocols rather than to throw in another abstraction (we’re actually
talking about making packets out of a stream which was formed from packets, and
everyone seems to be keeping their poker faces).</p>
<p>However, I find it worrisome that websockets are being recommended so highly for Internet of
Things applications when it was so obviously designed for web browsers. For instance,
each server-bound frame is masked. This seems like a frivolous use of CPU cycles
and memory buffers when we’ve worked so hard to minimize CPU and memory usage in
other areas. Also, the Origin-based security is apparently a useless gesture for
non-HTML based applications. If the Internet of Things is going to be <a href="http://www.gartner.com/newsroom/id/2636073">so important</a>,
then why doesn’t it deserve its own set of protocols instead of poorly repurposing
highly specialized web browser technology?</p>
Can HTTP/2 Replace MQTT?2015-02-20T00:00:00+00:00http://timkellogg.me/blog/2015/02/20/can-http2-replace-mqtt<p>Yesterday I got an <a href="https://twitter.com/errordeveloper/status/568410467493908480">interesting question</a>:</p>
<blockquote>
<p>Would you agree that HTTP/2 with HPACK would certainly rule out any reason for using MQTT?</p>
</blockquote>
<p>Well, I never thought about that possibility before, so I went and read through the specs
for <a href="http://http2.github.io/http2-spec/compression.html">HPACK</a> and <a href="https://http2.github.io/http2-spec/">HTTP/2</a>. What follows is my analysis to the best of my understanding. If I get something wrong,
feel free to leave a well-intentioned comment.</p>
<p>If you’re not familiar, MQTT is a publish/subscribe protocol that is typically associated with
the Internet of Things because of its compact header size. It uses a long-lasting TCP connection
to send messages with (minimum) 2-byte headers. The main verbs are <code class="language-plaintext highlighter-rouge">CONNECT</code>, <code class="language-plaintext highlighter-rouge">DISCONNECT</code>,
<code class="language-plaintext highlighter-rouge">PUBLISH</code>, <code class="language-plaintext highlighter-rouge">SUBSCRIBE</code> and <code class="language-plaintext highlighter-rouge">UNSUBSCRIBE</code> (the others are different forms of acknowledgements used to implement
higher delivery guarantees than TCP).</p>
<h1 id="implementing-http2-pubsub">Implementing HTTP/2 Pub/Sub</h1>
<p>Of course, the reason this question is even being asked is because HTTP/2 supports
multiplexing of requests. This means that a single HTTP connection can be reused by the server
to send many requests and responses. Even better, a single request can receive multiple
responses – so the server can effectively push more messages to the client than they requested.</p>
<p>If you were to implement the rough equivalent of MQTT using HTTP/2 you could:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">PUBLISH</code> to <code class="language-plaintext highlighter-rouge">foo/bar</code> by sending a <code class="language-plaintext highlighter-rouge">POST</code> request to <code class="language-plaintext highlighter-rouge">http://example.com/topic/foo/bar</code> with
the message in the body of the request.</li>
<li><code class="language-plaintext highlighter-rouge">SUBSCRIBE</code> to <code class="language-plaintext highlighter-rouge">foo/bar</code> by sending a <code class="language-plaintext highlighter-rouge">GET</code> request to <code class="language-plaintext highlighter-rouge">http://example.com/topic/foo/bar</code>.</li>
<li><code class="language-plaintext highlighter-rouge">UNSUBSCRIBE</code> from <code class="language-plaintext highlighter-rouge">foo/bar</code> by sending a <code class="language-plaintext highlighter-rouge">DELETE</code> request to <code class="language-plaintext highlighter-rouge">http://example.com/topic/foo/bar</code>.</li>
</ul>
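<p>The mapping above is trivial to express in code. This little Python sketch just encodes the verb-to-method table (the <code>/topic/</code> path scheme is my invention for this example, not part of any standard):</p>

```python
def mqtt_to_http2(verb: str, topic: str, host: str = "example.com"):
    """Map an MQTT verb + topic onto a hypothetical HTTP/2 method + URL."""
    methods = {"PUBLISH": "POST", "SUBSCRIBE": "GET", "UNSUBSCRIBE": "DELETE"}
    return methods[verb], f"http://{host}/topic/{topic}"
```
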
<p>All information normally transmitted in the MQTT <code class="language-plaintext highlighter-rouge">CONNECT</code> would happen naturally through
headers on requests and <code class="language-plaintext highlighter-rouge">DISCONNECT</code> would be a matter of severing the HTTP connection. To deliver
a published message to a subscribing client, the server could simply open another stream and push
the message to the client. This is called <em>server push</em>.</p>
<p>Streams are a new concept in HTTP/2. They’re somewhat equivalent to an HTTP/1.1 connection,
except that a server can initiate a stream in order to do a server push. If a
client makes a GET request and, while responding to the request, the server decides that the
client will also want another complementary item (image, stylesheet, etc.) the server
can send a <code class="language-plaintext highlighter-rouge">PUSH_PROMISE</code> message then immediately open a new stream and send the additional
item without the client having to request it.</p>
<p>In our miniature MQTT look-alike, when the client makes a <code class="language-plaintext highlighter-rouge">GET</code> request to subscribe to a topic,
the server would send response headers but leave the stream open. Whenever a new message comes
in on that subscription, the server would send a <code class="language-plaintext highlighter-rouge">PUSH_PROMISE</code> and then open a new stream to
transmit the actual message.</p>
<p>I’m sure someone could develop a much better pub/sub framework than I did in 2 minutes, but
you get the idea. HTTP/2 lends itself surprisingly well to the pub/sub pattern, despite being
designed for request/response.</p>
<h1 id="a-little-about-hpack--huffman-coding">A Little About HPACK & Huffman Coding</h1>
<p>HPACK is the part of HTTP/2 that handles header compression. One of the causes for hesitation about using HTTP/1.1
for Internet of Things applications is the massive header size. If HTTP were ever to be viable there,
some sort of header compression like HPACK would be a necessary part of it.</p>
<p>Internally, HPACK uses an old compression algorithm called <a href="https://www.cs.auckland.ac.nz/software/AlgAnim/huffman.html">Huffman coding</a> to find the minimum
number of bits to encode strings based on their frequency. The encoded versions of strings are variable length - a
common string could be 2 bits and another less common string could be 17 bits (just examples, of course).
If you’ve never heard of Huffman coding before or just want a reasonable programming challenge,
I highly recommend walking through the <a href="http://en.wikipedia.org/wiki/Huffman_coding">Wikipedia page</a> and trying to implement it in your
favorite programming language.</p>
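<p>If you want a head start on that challenge, here’s one way to build the code table in Python (a basic heap-based construction of my own, not the HPACK variant):</p>

```python
import heapq
from collections import Counter

def huffman_codes(text: str) -> dict:
    """Build a Huffman code table: frequent symbols get shorter bit strings."""
    freq = Counter(text)
    # Heap entries: (frequency, unique tiebreaker, {symbol: code-so-far})
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)    # two least-frequent subtrees
        n2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (n1 + n2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]
```

<p>For <code>"abracadabra"</code> the frequent <code>a</code> ends up with a 1-bit code while the rare symbols get 3 bits, and no code is a prefix of another, so the bit stream can be decoded unambiguously.</p>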
<p>Huffman coding finds the optimal number of bits to encode symbols, but there are still much better
compression algorithms. In fact, many popular compression formats including PKZIP, JPEG and MP3
have used Huffman coding in addition to other steps. So why didn’t the IETF choose the <em>optimal</em>
compression format for compressing headers? Well, frankly, compression takes compute power
and memory space. Huffman coding does fairly well with both of these constraints.</p>
<p>It takes 2 passes to encode data with Huffman. On the first pass, you build a tree
out of occurrences of bit strings and track each bit string’s frequency. This is
also where the optimization happens. On the second pass, bit strings are looked up in the tree
and replaced with the corresponding optimally sized short codes.</p>
<p>Normally, the entire tree/table of codes is transmitted or stored preceding the fully encoded
message. HPACK has two “tables” - a static table and a dynamic table (you could call them trees,
like we talked about previously with Huffman coding). The static table is known by the HTTP/2 client
<em>a priori</em> because it’s part of the spec. This static table was decided on based on samples of
actual web traffic on the Internet.</p>
<p>The dynamic table is built up by the encoder and decoder from live data for just the current HTTP/2 connection and,
unlike the static table, is not known in advance; it grows as headers are transmitted. A single HTTP/2 connection
can be used to service many HTTP requests and responses. The dynamic table is refined
with each message, so compression gets better the longer the connection stays open (or so I assume).</p>
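<p>Here’s a toy illustration of the dynamic-table idea, drastically simplified from real HPACK (no indexing rules, no eviction, no Huffman coding of literals): a header costs its full literal the first time it is sent and only a tiny index afterwards:</p>

```python
class ToyDynamicTable:
    """Toy HPACK-like encoder: literal on first sight, index afterwards."""
    def __init__(self):
        self.table = {}        # header line -> index

    def encode(self, header: str) -> str:
        if header in self.table:
            return f"idx:{self.table[header]}"   # tiny reference to the table
        self.table[header] = len(self.table)
        return f"lit:{header}"                   # full literal, added to table

enc = ToyDynamicTable()
first = enc.encode("user-agent: tiny-sensor/1.0")
second = enc.encode("user-agent: tiny-sensor/1.0")  # much shorter the second time
```
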
<h1 id="mqtt-patterns">MQTT Patterns</h1>
<p>To better understand the question, we need to talk about ways people actually use MQTT.</p>
<h2 id="as-a-funnel-protocol">As A Funnel Protocol</h2>
<p>The most common (and arguably the best) usage for MQTT is to have embedded devices publish data to
a multi-protocol broker over MQTT and re-distribute the data via another protocol that’s more
suitable for server-to-server traffic such as HTTP, Apache Kafka, AMQP or Amazon Kinesis. I
gave a <a href="http://www.slideshare.net/kellogh/mqtt-kafka-33100776">presentation</a> on using MQTT to funnel into Kafka at ApacheCon 2014. From there the
data is typically funneled into a storage or analytics system like Hadoop, Cassandra, a timeseries
database or some sort of web API.</p>
<p>At <a href="http://2lemetry.com/">2lemetry</a> we quickly ran into issues scaling what we call the <em>firehose subscription</em> (<code class="language-plaintext highlighter-rouge">#</code>),
which basically means that a single MQTT client wants to consume all the traffic (or just a lot of it)
that passes through the broker. The biggest problem with this is that a subscription can only be
serviced by a single connection on a single computer. At some point you’re going to find the memory
or I/O limits of the NIC. On the other hand, Kafka and Kinesis both offer consumer groups,
which are essentially a <a href="http://www.paperplanes.de/2011/12/9/the-magic-of-consistent-hashing.html">consistent hash ring</a> of clients that cooperatively process a single
subscription. This effectively fixes the firehose subscription problem by spreading the load over
several clients.</p>
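<p>For readers unfamiliar with consistent hashing, here’s a minimal Python sketch of the load-spreading idea: hash workers onto a ring (with virtual nodes for smoothness) and route each message key to the next worker clockwise:</p>

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent hash ring for spreading one subscription over workers."""
    def __init__(self, workers, vnodes=64):
        # Each worker gets `vnodes` points on the ring for a smoother spread.
        self.ring = sorted(
            (self._hash(f"{w}#{i}"), w) for w in workers for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(s: str) -> int:
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def worker_for(self, message_key: str) -> str:
        # First ring point at or after the key's hash, wrapping around.
        i = bisect.bisect(self.keys, self._hash(message_key)) % len(self.keys)
        return self.ring[i][1]
```

<p>Kafka’s actual consumer-group protocol assigns partitions rather than hashing individual messages; the sketch only illustrates how a single firehose can be split cooperatively.</p>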
<p>Some embedded devices have extremely limited resources (8-16 KB of memory, slow 8 bit CPUs,
expensive data transfer rates), so they generally want to transmit that telemetry data with as
little effort as possible while consuming the least amount of bandwidth. This is one of the
greatest strengths of MQTT and is primarily where HPACK will come into play. The Huffman coding
that we discussed earlier is relatively gentle on the CPU, but encoding/decoding messages requires
roughly 2x the memory of the actual data frame (I believe). However, a message can be split over
several data frames to control memory usage, so this may not be as big of an issue as I’m making it.</p>
<p>From what I can tell, as the client re-uses the HTTP connection for PUBLISH after PUBLISH, the
headers would continue to be compressed better and better (I’m not sure this is actually true
since the dynamic table also drops entries over the life of the connection). In comparison, MQTT
is certainly smaller on the wire (and easier to parse) but time will tell if the difference is
big enough to make people use it over HTTP/2 (people seem to generally avoid using too many
protocols/technologies).</p>
<h2 id="to-ignore-faulty-networks">To Ignore Faulty Networks</h2>
<p>MQTT provides three quality of service (QoS) levels that govern delivery guarantees. The lowest
(and most common) has the same guarantees as TCP. <em>At Least Once</em> (QoS=1) uses the unique client
identifier to re-deliver messages that the client may have missed while offline. The highest level,
<em>Exactly Once</em> (QoS=2) <a href="https://lobste.rs/s/ecjfcm/why_is_exactly-once_messaging_not_possible_in_a_distributed_queue">isn’t actually possible</a> according to some basic distributed systems
principles.</p>
<p>The ability to have missed messages delivered while offline is extremely helpful for some
embedded systems. I would wager that any protocol targeted for the Internet of Things absolutely
must have the ability to give <em>At Least Once</em> guarantees. As far as I can tell, HTTP/2 doesn’t
support this level of delivery guarantee, but I believe it would be trivial to implement it on
top of HTTP/2.</p>
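<p>Here’s a rough sketch of what an <em>At Least Once</em> layer on top of HTTP/2 (or any transport) might look like: the server tracks unacknowledged messages per client identifier and replays them when the client reconnects. The class and method names are my own invention:</p>

```python
class AtLeastOnceOutbox:
    """Sketch of at-least-once delivery: keep messages per client id until
    they are acknowledged, and replay unacknowledged ones on reconnect."""
    def __init__(self):
        self.pending = {}                      # client_id -> {msg_id: payload}

    def send(self, client_id, msg_id, payload):
        # Record before transmitting, so a crash mid-send still gets replayed.
        self.pending.setdefault(client_id, {})[msg_id] = payload
        # ... actually transmit over the open stream here ...

    def ack(self, client_id, msg_id):
        self.pending.get(client_id, {}).pop(msg_id, None)

    def replay_on_reconnect(self, client_id):
        return list(self.pending.get(client_id, {}).values())
```
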
<h1 id="scaling-http2-on-the-server">Scaling HTTP/2 On The Server</h1>
<p>When discussing IoT protocols, scaling is rarely a topic we discuss. But, working for <a href="http://2lemetry.com/">2lemetry</a>,
this is a topic I deal with frequently so I’ll briefly address it.</p>
<p>HTTP/1.1 is easy to scale. Just throw a load balancer in front of a cluster of servers and voila!
It scales! This is true with HTTP/2 for single-use connections, but if multiplexing is heavily
used, load balancing could become difficult. Think about it: if the connection stays open for minutes
or hours, how does the server tell the client “connect to another server, I’m getting bogged down”?
This is a problem we run into frequently when scaling MQTT, as connections are frequently left open
for days on end. I’m sure we’ll solve this problem with HTTP/2, but I’m not quite sure what that
will look like.</p>
<h1 id="obligatory-notes-about-coap">Obligatory Notes About CoAP</h1>
<p>CoAP (<a href="https://tools.ietf.org/html/rfc7252">RFC 7252</a>) is a proposed standard (<strong>Correction:</strong> it is finalized) to implement
a RESTful architecture (like HTTP) for constrained devices. It’s a very compact, trivial to
parse, binary protocol that runs over UDP and has support for optional guaranteed delivery. CoAP
also supports server push in mostly the same way that HTTP/2 does.</p>
<p>CoAP maps very well to HTTP/1.1. In fact, there’s a section of the specification dedicated to
proxying between HTTP and CoAP. Two CoAP features (server push and multicast) aren’t supported
natively by HTTP/1.1, so having HTTP/2 support server push only narrows the gap and makes these
two protocols a great match. Use CoAP in constrained environments and use HTTP/2 everywhere else.
After all, CoAP can almost always be proxied neatly to HTTP/2.</p>
<h1 id="conclusion">Conclusion</h1>
<p>MQTT definitely has a smaller size on the wire. It’s also simpler to parse
(let’s face it, Huffman isn’t <em>that</em> easy to implement) and provides guaranteed delivery to cater
to shaky wireless networks. On the other hand, it’s also not terribly extensible. There aren’t a
whole lot of headers and options available, and there’s no way to make custom ones without touching
the payload of the message.</p>
<p>It seems that HTTP/2 could definitely serve as a reasonable replacement for MQTT. It’s reasonably
small, supports multiple paradigms (pub/sub & request/response) and is extensible. It’s also supported
by the IETF (whereas MQTT is hosted by OASIS). From conversations I’ve had with industry leaders
in embedded software and chip manufacturing, they only want to support standards from the IETF.
Many of them are still planning to support MQTT, but they’re not happy about it.</p>
<p>I think MQTT is better at many of the things it was designed for, but I’m interested to see over
time if those advantages are enough to outweigh the benefits of HTTP. Regardless, MQTT has been
gaining a lot of traction in the past year or two, so you may be forced into using it while HTTP/2
catches up.</p>
Was C For Hipsters?2015-02-08T00:00:00+00:00http://timkellogg.me/blog/2015/02/08/history-of-C<p>Last week I came across <a href="https://twitter.com/deech/status/564178220908417024">this tweet</a>:</p>
<blockquote class="twitter-tweet" lang="en"><p>When C went viral was it crapped on as much as JavaScript is now?</p>— deech (@deech) <a href="https://twitter.com/deech/status/564178220908417024">February 7, 2015</a></blockquote>
<script src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>It’s true, JavaScript gets a lot of hate these days for various reasons. Some of those reasons are definitely legitimate
concerns, but a lot of it is just noise. Still, this could be an interesting case study into computer programmers’
history of hating languages, so I shot a quick email off to my dad.</p>
<blockquote>
<p>Hey dad,</p>
<p>I saw this tweet and I want to know the answer. Since you were around when C came out, did it have a bad reputation
for making things too easy? Like too much abstraction or whatever? Like the crap JavaScript gets today</p>
<p>Tim</p>
</blockquote>
<p>One of the benefits of having a dad that’s been a realtime embedded C developer for most of his career is that I
can ask him questions like this and I get really interesting replies. Sure enough, he delivered (minimal editing by me):</p>
<blockquote>
<p>Well, back then there was no Internet, so it was harder to assess reputation.</p>
<p>C did not have a bad reputation about being too easy. There was, however, a lot of concern about “tight code” and
efficiency (of the code), and how the compiler measured up to a competent assembly programmer.</p>
<p>When I switched from assembly to C in 1981, there was never any question about programmer efficiency
improvements. The rough rule “10 lines per hour, regardless of the language” was true for both. But a line of C could do the
work of two to eight lines of assembly.</p>
<p>By programming at a higher level of abstraction with C, there were entire classes of bugs in assembly that went
away. For instance, using a ‘branch less than’ vs a ‘branch less than or equal’ vs ‘branch greater than’ vs …</p>
<p>In assembly, it took <em>much</em> more effort to clearly document the intent, because there were so many more saplings in
the forest to clutter the view. There were labels that were truly part of the logical structure (loops, etc), and
then a lot of distracting labels just to jump around the linear execution of the assembly code.</p>
<p>The early C compilers did tend to be buggy, and it was not uncommon to ‘code around a compiler bug’ (hopefully with
a comment explaining the rationale).</p>
<p>The optimizations tended to be poor, too. I once created a bunch of commotion on the GCC list, when I compared
the size of the generated code to a commercial compiler. I must have hit a nerve somewhere, because within a couple of
days the GCC code size was reduced by about a third.</p>
<p>In the early days of C, debugging was almost always done at assembly level. In a way, this was good because the
engineer was always ‘peer reviewing’ the compiler’s code generation. But efficiency again increased when symbolic
C source level debuggers became widely available.</p>
<p>Early Windows programming in C was painful, because the engineer needed to set up everything manually. Typically,
this would take a couple pages of C code, with arcane incantations and rituals. When Microsoft introduced Visual Studio
to automatically hide and abstract most of the setup, then I think the concern “too easy” perhaps became more
prevalent.</p>
<p>The other part of “too easy” came from not needing to debug at the assembly level – programmers lost a feel for the
implementation of the C code. I saw this happen a lot, and it was a significant handicap for some of our guys.</p>
<p>+++++++++++</p>
<p>For a time, there was the thought “real men program in assembly”. But the economic advantages of higher abstraction,
the arrival of (mostly) bug-free compilers, and source-level debuggers pretty much killed that mindset.</p>
<p>IMO, a good systems-level/embedded software engineer should at least once walk through and understand the assembly
implementation of interrupt vectors, a task context switch, multi-precision math, pointer indirection, subroutine
register calling convention, implementation of high-level data structures, etc.</p>
</blockquote>
IoT Startups Will Fail Without Standards2015-01-27T00:00:00+00:00http://timkellogg.me/blog/2015/01/27/iot-needs-standards<p>I was talking to a man at a <a href="http://www.meetup.com/Denver-Internet-of-Things-Office-Hours/events/219382337/">Denver IoT meetup group</a> last week about his Internet of Things related startup. He was telling me about his plans to create an innovative new product that interoperates with smart phones, tablets, and arbitrary sensors. I really liked his idea, but then a question occurred to me:</p>
<blockquote>
<p>Are you worried about failing as a hardware startup? I know I’ve had a lot of ideas for hardware startups, but I always talk myself away from them because it seems like large billion dollar corporations are the only ones with enough resources to execute the idea.</p>
</blockquote>
<p>He agreed. Then I continued thinking about it. Silicon Valley has perfected the art of software startups. Hardware has the same set of problems, only magnified. For instance, in software you need to get the product into the users’ hands so you send out a link to your web application via Twitter, Facebook and other social outlets. But in hardware you have to produce 100 prototypes and physically mail them out.</p>
<p>It seems to me that successful software startups have gained traction because they’re trivial for new users to start using. Imagine if iTunes didn’t recognize MP3 format, or if Github invented their own version control software, or if Tinder made you buy their own specialized device instead of just running on your existing smart phone. No one would fall for that crap.</p>
<p>We rely on re-using our web browsers and smart phones. If someone sells a smart light bulb, it better work in existing light sockets or else no one is going to use it. If your IoT device is going to talk to my smart phone, I’ll be more likely to use it if I don’t have to install a new app. This is where standards become important. Big, billion dollar companies have enough resources to force their users to install monolithic and/or incompatible components. Small companies, where the innovation tends to happen, don’t have that option.</p>
<p>Unfortunately, there are far too many competing IoT “standards” today. A standard is utterly useless if it doesn’t have a majority of people using it. It doesn’t matter how technically superior it is, if it doesn’t interoperate with the rest of the world, no one will use it. In fact, there’s a <a href="http://ils.unc.edu/callee/gopherpaper.htm">long history</a> of technically inferior technologies taking over simply because they’re more broadly accepted.</p>
<p>I believe that the battle over which IoT standards win out will be decided by chip manufacturers. I’ve witnessed scores of embedded developers that would rather open a raw UDP or TCP socket and forego security, robustness and interoperability than pull in an MQTT or CoAP library. Chips and embedded operating systems need to have these protocols built in, otherwise developers won’t use them and we’ll continue down the current path into a rat’s nest of incompatible devices.</p>
<p>If you’re an embedded engineer, try to influence your hardware suppliers to adopt standards. If you’re a user, try to only buy products that interoperate using global Internet standards. It’s the only way we’ll end up with an innovative and useful Internet of Things.</p>
ThingMonk 2014: Toward a more intelligent IoT2014-12-05T00:00:00+00:00http://timkellogg.me/blog/2014/12/05/thingmonk-recap<p>This week I was fortunate enough to attend ThingMonk in London. RedMonk were excellent hosts and managed to put together a tremendous lineup of speakers and talks that I hadn’t anticipated. There were only 150 attendees, but each one of them brought something unique. Here I attempt to summarize some of the day, I know I’ve missed several truly great talks, but I just wanted to keep it short.</p>
<p><a href="https://twitter.com/borisadryan">Boris Adryan</a>, a geneticist, gave a thought provoking perspective on how he believes the Internet of Things needs to have some form of directory or database. In his field of study, academic papers are mapped ontologically so that similar papers can be quickly found. He believes that this sort of knowledge and information mapping needs to be applied to sensors and open data to force valuable epiphanies out into the open.</p>
<p>Boris’ talk was just the start of an overarching theme that emerged over the course of the day. We’ve already fought over protocols like MQTT versus CoAP versus DDS, etc. Now it’s time to go beyond simple wire protocols and talk about what these giant mounds of data actually mean. As <a href="https://twitter.com/knolleary">Nick O’Leary</a> <a href="http://knolleary.net/2014/12/04/a-conversational-internet-of-things-thingmonk-talk/">eloquently put it</a>:</p>
<blockquote>
<p>What (mostly) everyone agrees on is the need for more than just efficient protocols for the Things to communicate by. A protocol is like a telephone line. It’s great that you and I have agreed on the same standards so when I dial this number, you answer. But what do we say to each other once we’re connected? A common protocol does not mean I understand what you’re trying to say to me.</p>
<p>And thus began the IoT meta-model war.</p>
</blockquote>
<p><a href="https://twitter.com/yoditstanton">Yodit Stanton</a>, founder of <a href="http://www.slideshare.net/kellogh/thing-monk-improvemqtt">OpenSensors.io</a> talked about the need for more than simply gathering sensor data. Her general message was that we’re starting to get the hang of the wire protocols, but how do we make sense of all this data? Data structures such as the Bloom filter and hyper log log are becoming available that let us estimate useful information, like presence or cardinality, without consuming a gargantuan amount of computer resources.</p>
<p><a href="https://twitter.com/andysc">Andy Stanford-Clark</a>, the inventor of MQTT, had everyone’s eyes glued to the front during his talk. The first couple minutes of his presentation were spent explaining how the machine worked that he ran his slide show from. It was a Raspberry Pi powered by hydrogen. While that seems like it could have been the thesis of his talk, that was simply to kill time until the machine booted. Once started, he talked about different aspects of his home that he’s redesigned with sensors and devices. It is clear that Andy’s vision for the Internet of Things does not require much human interaction - it just quietly augments our lives without inducing noticeable burden.</p>
<p><a href="https://twitter.com/andiamohq">Andiamo</a> presented an inspirational story about a young girl they were able to help by 3D printing a back brace. While the traditional methods would have required 25 weeks, this back brace was produced in only 48 hours. They knew they had succeeded in producing something beautiful for this girl when a woman mistook the device for some sort of kinky clothing style - a far cry from the ugly status quo that would have labeled the girl as an invalid.</p>
<p>I gave a talk toward the end of the day about some problems in the MQTT specification, originally <a href="http://vasters.com/clemensv/2014/06/02/MQTT+An+Implementers+Perspective.aspx">identified by Clemens Vasters</a>. Much of my talk revolved around how exactly-once delivery (QoS 2) simply isn’t possible to guarantee in a horizontally scaled broker. I took some time to explain the CAP theorem and how it is relevant to the Internet of Things. Overall, I think my talk was well received, however much I felt woefully antiquated in my choice of topic.</p>
<p><a href="https://twitter.com/ianskerrett">Ian Skerrett</a> wrapped up the day with an overview of the current state of standards organizations. I highly recommend skipping on over to <a href="http://www.slideshare.net/IanSkerrett/abc-of-iot-consortium">his slides that have been posted on SlideShare</a>. He carefully reviewed several standards bodies and assigned high school style letter grades for qualities such as openness and adoption levels. Again, his slides do a pretty good job of standing on their own. I’d like to see his talk manifested into a website analogous to <a href="https://tldrlegal.com/">TL;DR Legal</a> but for IoT standards organizations.</p>
<p>Overall I was blown away by the quality and personal conviction of all the speakers. Even after dinner, when the talks were finished, I engaged Boris in a fascinating conversation about how distributed systems concepts arise in cellular transcription; something I certainly hadn’t planned on hearing about. My recommendation is that, if you go to one conference next year, let ThingMonk be the one.</p>
Why Open Source May Not Always Work For IoT2014-10-20T00:00:00+00:00http://timkellogg.me/blog/2014/10/20/open-source-iot<p>On Friday, <a href="http://readwrite.com/author/matt-asay">Matt Asay</a> wrote an <a href="http://readwrite.com/2014/10/17/internet-of-things-open-source-iot-developers">article on readwrite</a> about <em>why the Internet of Things has to be open sourced</em> that triggered a lot of positive responses in <a href="https://twitter.com/kellogh">my Twitter feed</a>. I generally agree with what Matt had to say, but I found it unsettling that he conflates open source software with open specification. This distinction is important! There is a place for both open source and proprietary in the IoT and I believe that ignoring these differences will cause more harm than good.</p>
<p>First of all, I think Matt’s intentions are right on target. The sub-title of his article is “developers aren’t going to go for proprietary standards”. While this is a great statement to make, it isn’t even close to the same statement as “IoT has to be open sourced”. Let’s look at the best success story we have available: HTTP.</p>
<p>HTTP is the core of the old web. It’s simple, small and does one thing very well - it implements a request/response pattern and makes very few assumptions about the underlying technology. This is huge. Remember how those expensive monolithic Unix servers fell out of favor and were replaced by cheap Linux servers? No one had to go to the IETF to revise the HTTP specification to account for Linux because HTTP wasn’t tied up with Unix concerns. They were entirely separate - this is a trait that we need in the IoT.</p>
<p>Open standards usually need to be small to be successful. If they’re small, there’s less to disagree on. Several years ago I worked for a large corporation and I remember it being nearly impossible to get stakeholders across the company to agree on standards. Internet standards are orders of magnitude more difficult to arrive at because you have so many participating corporations, each with wildly different intentions and company (and geographic) cultures.</p>
<p>Worse, we frequently <a href="http://quod.lib.umich.edu/j/jep/3336451.0014.103?view=text;rgn=main">make bad decisions</a> the first few times around. If our standards are small and componentized, it’s not too difficult to roll back the ones that didn’t pan out and replace them with another idea. When SOAP didn’t work as well as promised <sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">1</a></sup>, we didn’t have to throw out our web servers, we just stopped using SOAP. Cryptographic algorithms are an even better example, we’ve upgraded our algorithms every few years and most developers and sysadmins never needed to care much because the upgrade path was so seamless. <em>The IoT needs small componentized open standards.</em></p>
<h2 id="are-we-talking-about-open-source">Are we talking about open source?</h2>
<p>No, this isn’t the same thing as open source. Open source is about making a free implementation with an open process. Unfortunately, implementations don’t always get it right. Even when the process is open and adaptive. Sometimes they do get it right, but organizations have shockingly different worldviews and can’t agree on an implementation <sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">2</a></sup>.</p>
<p>Look at the Apache web server. Was it successful? Absolutely! But lately its market share has been eroding in favor of Nginx, largely due to Nginx’s simplicity. Even still, a significant portion of market share is owned by proprietary web servers from Google, Microsoft and others - yet none of this has caused problems because they all standardized on an open specification.</p>
<p>Recently it seems like open source has become the new generally accepted correct way to do things. The trouble is that open source software takes time to create, yet money must still be made. We still have to feed our families, so where does the money come from? Matt Asay is a VP at MongoDB. The MongoDB database is open source but the company earns a profit by charging for support. Amazon EC2 is fully closed source and non-free but many of their services have <a href="https://github.com/aws">open source clients</a>.</p>
<p>There is no such thing as a free lunch. The money always comes from somewhere, and <a href="http://www.usatoday.com/story/news/nation/2014/03/08/data-online-behavior-research/5781447/">sometimes it’s more ethical</a> to have the money-flow stated explicitly up-front. With that said, I still think Matt is correct. Capturing money later in the development process does wonders for accelerating innovation.</p>
<p>Overall, I think Matt’s analysis was spot-on. Open source is going to have a critical role in the Internet of Things. However, open specification is non-negotiable. Some organizations may need proprietary solutions - and that’s fine as long as we’re standardized behind a set of small componentized open specifications.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:4" role="doc-endnote">
<p>Okay, I’m still kind of young and don’t really have a lot of great examples of failed Internet technologies. If you can’t contain yourself, feel free to post your own examples in the comments. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:5" role="doc-endnote">
<p>I’ll go out on a limb and say that no implementation (open source or otherwise) has ever become universally accepted. However, I think standards have a much better track record for full acceptance. <a href="#fnref:5" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
FP For The Working Programmer: Why Is null Bad?2014-06-24T00:00:00+00:00http://timkellogg.me/blog/2014/06/24/why-is-null-bad<p>Null is dangerous. This is a tough statement to accept for a lot of people I’ve worked with. The concept of null is deeply ingrained into the languages we use. In C/C++, if you access a member of a null pointer, the program can sometimes continue to run but generate strange results. This led to bugs that were sometimes very difficult to trace. Java improved the situation by causing programs to fail the instant a null pointer was accessed.</p>
<p>Failing sooner rather than later makes bugs easier to trace, for sure. What if we could make the compiler disallow nulls?</p>
<figure class="highlight"><pre><code class="language-java" data-lang="java"><span class="kd">public</span> <span class="kd">class</span> <span class="nc">Foo</span> <span class="o">{</span>
<span class="kd">private</span> <span class="nc">String</span> <span class="n">name</span> <span class="o">=</span> <span class="kc">null</span><span class="o">;</span>
<span class="kd">public</span> <span class="kt">int</span> <span class="nf">length</span><span class="o">()</span> <span class="o">{</span>
<span class="k">return</span> <span class="n">name</span><span class="o">.</span><span class="na">length</span><span class="o">();</span>
<span class="o">}</span>
<span class="kd">public</span> <span class="kt">void</span> <span class="nf">setName</span><span class="o">(</span><span class="nc">String</span> <span class="n">name</span><span class="o">)</span> <span class="o">{</span>
<span class="k">this</span><span class="o">.</span><span class="na">name</span> <span class="o">=</span> <span class="n">name</span><span class="o">;</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="nc">Foo</span> <span class="n">foo</span> <span class="o">=</span> <span class="k">new</span> <span class="nc">Foo</span><span class="o">();</span>
<span class="n">foo</span><span class="o">.</span><span class="na">length</span><span class="o">();</span> <span class="c1">// KAPOW!!!</span></code></pre></figure>
<p>There are two kinds of values, (1) the ones that are there and (2) the ones that might not be. The trouble with the type systems of Java/C#/…/Ruby is that you can’t tell the difference between these types. The null value is implicitly always available, so you have to always check for it even though it may not even make sense.</p>
<p>Newer languages like Scala offer an Option type that represents something that can have no value. Here’s the example in Scala:</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">class</span> <span class="nc">Foo</span> <span class="o">{</span>
<span class="k">var</span> <span class="n">name</span><span class="k">:</span> <span class="kt">Option</span><span class="o">[</span><span class="kt">String</span><span class="o">]</span> <span class="k">=</span> <span class="nc">None</span>
<span class="k">def</span> <span class="nf">length</span> <span class="k">=</span> <span class="nv">name</span><span class="o">.</span><span class="py">getOrElse</span><span class="o">(</span><span class="s">""</span><span class="o">).</span><span class="py">length</span>
<span class="k">def</span> <span class="nf">getName</span> <span class="k">=</span> <span class="n">name</span>
<span class="k">def</span> <span class="nf">setName</span><span class="o">(</span><span class="n">value</span><span class="k">:</span> <span class="kt">Option</span><span class="o">[</span><span class="kt">String</span><span class="o">])</span> <span class="o">{</span>
<span class="n">name</span> <span class="k">=</span> <span class="n">value</span>
<span class="o">}</span>
<span class="o">}</span>
<span class="k">val</span> <span class="nv">foo</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">Foo</span><span class="o">()</span>
<span class="nf">println</span><span class="o">(</span><span class="nv">foo</span><span class="o">.</span><span class="py">length</span><span class="o">)</span> <span class="c1">// 0</span>
<span class="nv">foo</span><span class="o">.</span><span class="py">setName</span><span class="o">(</span><span class="nc">Some</span><span class="o">(</span><span class="s">"fred"</span><span class="o">))</span>
<span class="nf">println</span><span class="o">(</span><span class="nv">foo</span><span class="o">.</span><span class="py">length</span><span class="o">)</span> <span class="c1">// 4</span></code></pre></figure>
<p>The <code class="language-plaintext highlighter-rouge">Option</code> type wraps a value; <code class="language-plaintext highlighter-rouge">Some("fred")</code> is non-null and <code class="language-plaintext highlighter-rouge">None</code> is a lot like null. You can’t access the value inside the option directly - <code class="language-plaintext highlighter-rouge">name.length</code> would result in a compile error. This could get cumbersome, so the Option type has methods to make them fun again.</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">getOrElse(other: T): T</code> - get the value inside the option, otherwise use a default value</li>
<li><code class="language-plaintext highlighter-rouge">filter(predicate: T => Boolean): Option[T]</code> - returns an <code class="language-plaintext highlighter-rouge">Option[T]</code> but may turn a Some into a None.</li>
<li><code class="language-plaintext highlighter-rouge">map[U](function: T => U): Option[U]</code> - safely converts the inner value to something else</li>
<li><code class="language-plaintext highlighter-rouge">flatMap[U](function: T => Option[U]): Option[U]</code> - safely converts the inner value to another option</li>
</ul>
<p>Once you get comfortable with Options, you start writing less code with fewer bugs. At some point you’ll find that, more often than not, <strong>the types only get in the way of mistakes</strong>. We’re starting to see Option-like concepts in <a href="http://docs.oracle.com/javase/8/docs/api/java/util/Optional.html">Java</a>, <a href="http://blogs.msdn.com/b/jerrynixon/archive/2014/02/26/at-last-c-is-getting-sometimes-called-the-safe-navigation-operator.aspx">C#</a> and <a href="http://en.cppreference.com/w/cpp/experimental/optional">C++</a>. We’ll talk more about Options later, but for now I’ll leave you with this gem:</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">def</span> <span class="nf">doLogin</span><span class="o">(</span><span class="n">user</span><span class="k">:</span> <span class="kt">String</span><span class="o">,</span> <span class="n">password</span><span class="k">:</span> <span class="kt">String</span><span class="o">)</span> <span class="k">=</span> <span class="o">???</span>
<span class="c1">// only attempt an actual login if both user and password are given</span>
<span class="k">def</span> <span class="nf">login</span><span class="o">(</span><span class="n">user</span><span class="k">:</span> <span class="kt">Option</span><span class="o">[</span><span class="kt">String</span><span class="o">],</span> <span class="n">password</span><span class="k">:</span> <span class="kt">Option</span><span class="o">[</span><span class="kt">String</span><span class="o">])</span> <span class="k">=</span> <span class="o">{</span>
<span class="nv">user</span><span class="o">.</span><span class="py">flatMap</span><span class="o">(</span><span class="n">u</span> <span class="k">=></span>
<span class="nv">password</span><span class="o">.</span><span class="py">flatMap</span><span class="o">(</span><span class="n">pw</span> <span class="k">=></span> <span class="nf">doLogin</span><span class="o">(</span><span class="n">u</span><span class="o">,</span> <span class="n">pw</span><span class="o">)))</span>
<span class="o">}</span></code></pre></figure>
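<p>For comparison, here is a rough Java equivalent of the same gem using <code>java.util.Optional</code> (Java 8+). The <code>doLogin</code> body below is a hypothetical stand-in for a real authentication call:</p>

```java
import java.util.Optional;

public class LoginExample {
    // Hypothetical stand-in for a real authentication call.
    static String doLogin(String user, String password) {
        return "session-for-" + user;
    }

    // Only attempts an actual login if both user and password are given.
    static Optional<String> login(Optional<String> user, Optional<String> password) {
        return user.flatMap(u -> password.map(pw -> doLogin(u, pw)));
    }

    public static void main(String[] args) {
        System.out.println(login(Optional.of("fred"), Optional.of("pw"))); // Optional[session-for-fred]
        System.out.println(login(Optional.of("fred"), Optional.empty()));  // Optional.empty
    }
}
```

<p>If either argument is empty, the chain short-circuits to <code>Optional.empty()</code> and <code>doLogin</code> is never called.</p>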
MQTT - Another Implementor's Perspective2014-06-02T00:00:00+00:00http://timkellogg.me/blog/2014/06/02/MQTT-another-implementors-perspective<p>Earlier there was <a href="http://vasters.com/clemensv/2014/06/02/MQTT+An+Implementers+Perspective.aspx">a blog post by Clemens Vasters</a> that flamed MQTT. My preference is to take these complaints to the standards bodies responsible for MQTT and try to make constructive changes, but it appears that this is a man who prefers flame wars over professional dialog. I’ve been <a href="https://twitter.com/kellabyte/status/473472640364331008">challenged to write a rebuttal</a>, so here it is.</p>
<h2 id="goals">Goals</h2>
<p>Obviously Clemens misunderstands the goals of MQTT. He has an entire section (8 paragraphs!) dedicated to extensibility and later criticizes the lack of custom headers. I’ve worked with MQTT for about a year and never realized that extensibility was even a goal of the protocol, so I was mystified why the lack of extensibility was so central to many of Clemens’ arguments. Nowhere in the entire spec does it say anything about extensibility. When I googled for “MQTT extensible”, the top relevant hit is Clemens’ blog. Where did this notion come from? No one else is talking about it.</p>
<p>MQTT is meant to be “lightweight, open, simple, and designed so as to be easy to implement”. The blog starts off by discussing IBM in depth, as if it were somehow a closed IBM spec. The reality is that IBM has very little to do with the direction of MQTT at the present time. Sure, IBM was the creative force in the beginning, but since handing it over to OASIS and the Eclipse Foundation, IBM has mostly left it alone. MQTT is truly an open standard driven by open source software. Even I, a simple software engineer at a startup, feel as though I have a voice in the MQTT community. Please don’t let Clemens’ wordy lecture make you believe otherwise.</p>
<p>Most importantly, the goal of the protocol is to be lightweight while keeping clients simple and easy to implement. If the goal were only to be lightweight, <a href="http://mqtt.org/new/wp-content/uploads/2009/06/MQTT-SN_spec_v1.2.pdf">MQTT-SN</a> would be a much better choice. If the goal were extensibility, AMQP would be a better option. Evidence of the easy-clients goal is easy to see in how MQTT tends to offload complexity to the broker when given the option. Clemens implemented a broker distributed over many machines and tacked onto another messaging protocol - when he complains that it was a complex task, it’s because he made it complex, not because the task itself is inherently complex.</p>
<p>I firmly believe that MQTT successfully achieves the goals that it is aiming for. I’ve talked to several people that have been able to implement a working client in a couple hours. Also, while it isn’t the most lightweight protocol available, it’s certainly quite good and definitely better than XMPP or AMQP. The truth is, you can get an MQTT client to run in very constrained environments - something that can’t be said for many of the alternatives.</p>
<h2 id="bytes">Bytes</h2>
<p>One complaint that is almost valid is the variable 1-4 byte remaining length field. All other strings in MQTT are prefixed by a 2-byte length. He rightly points out that the variable 1-4 byte remaining length field is inconsistent with the other strings. However, he neglects to notice that some messages have up to 6 strings, each prefixed by a 2-byte length. If the remaining length was only 2 bytes, this would result in a leaky abstraction (saying each string could be 65535 bytes long but then limiting the sum total of all strings to less than 65536 bytes). What would be the point of introducing a leaky abstraction?</p>
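<p>To make that tradeoff concrete, here is a sketch (in Java) of the spec’s variable-length encoding: 7 value bits per byte, with the high bit flagging that another byte follows.</p>

```java
import java.io.ByteArrayOutputStream;

public class RemainingLength {
    // Encode MQTT's "remaining length": 7 value bits per byte, with the
    // high bit set when another byte follows (1-4 bytes, max 268,435,455).
    static byte[] encode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        do {
            int digit = value % 128;
            value /= 128;
            if (value > 0) {
                digit |= 0x80; // continuation bit: more bytes follow
            }
            out.write(digit);
        } while (value > 0);
        return out.toByteArray();
    }

    // Decode the same field back into an int.
    static int decode(byte[] bytes) {
        int multiplier = 1;
        int value = 0;
        int i = 0;
        int digit;
        do {
            digit = bytes[i++] & 0xFF;
            value += (digit & 0x7F) * multiplier;
            multiplier *= 128;
        } while ((digit & 0x80) != 0);
        return value;
    }

    public static void main(String[] args) {
        System.out.println(encode(321).length);  // prints 2
        System.out.println(decode(encode(321))); // prints 321
    }
}
```

<p>Values up to 127 fit in a single byte, so the common case of small IoT payloads pays no overhead, while the 4-byte maximum still allows payloads up to roughly 256 MB.</p>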
<p>In the CONNECT message there is a protocol identifier that is always the constant “MQTT”. The spec explains that it exists only for network analyzers to quickly identify it as MQTT traffic, as is common practice. Clemens criticizes the fact that this string is prefixed by a 2-byte length and suggests that it should be just the raw 4 bytes without the prefixed length. The spec’s choice supports the “simple” and “easy to implement” goals of the protocol. In fact, this choice enabled the protocol to switch from the historical IBM-ridden “MQIsdp” to the current “MQTT” representative of its current open spec.</p>
<p>The spec’s statement that this “will not be changed by future versions of the MQTT specification” means that, while this protocol identifier has been different in previous versions of the spec, they are committing to the name “MQTT”. There’s a very clear reason for why it was implemented this way, unfortunately Clemens didn’t seem to take time to fully understand that.</p>
<p>When addressing the size of the wire protocol, he adds the length of IPv6, TCP, and TLS headers onto the length of an MQTT message to demonstrate how many bytes are wasted. In reality, most usages of MQTT would combine MQTT messages into the same packet (<a href="http://en.wikipedia.org/wiki/Nagle's_algorithm">Nagling</a>) which would destroy his point here. He does acknowledge this, but I’m not sure why he spent the time making such a fruitless point when it has no bearing on reality.</p>
<p><em>Edit:</em> In another place, he makes a great point that there can only be 65535 in-flight messages, which would make communication a problem in high-throughput scenarios. However, the goals of the protocol are again missed. It’s designed as an IoT protocol, for lightweight devices. In what scenario would a device with 100K of memory ever have more than 65535 in-flight messages? Honestly, I think this tradeoff is intentional and wisely chosen.</p>
<h2 id="content-type">Content-Type</h2>
<p>There has been some discussion in the MQTT community on how to represent the content-type of payloads. Clemens rightly points out the lack of content type as many other protocols have. But this viewpoint neglects the more traditional usage of MQTT where content-type makes no sense. This usage is best illustrated by the <a href="https://github.com/mqtt/mqtt.github.io/wiki/SYS-Topics">$SYS topic space</a> used for monitoring the status of the broker. Each topic has UTF-8 numbers published on it. For instance, the broker may periodically publish a message to <code class="language-plaintext highlighter-rouge">$SYS/messages/received</code> that contains the total number of messages received by the broker since it started.</p>
<p>This strategy can be used in combination with <a href="https://github.com/mqtt/mqtt.github.io/wiki/topic_format">topic patterns</a> to do realtime queries via SUBSCRIBE requests. It can be very powerful, especially for constrained devices consuming messages in the field. Of course, if someone doesn’t know about this strategy I could see how they might be unsatisfied with MQTT. It’s unfortunate that he chose to flame MQTT publically on the internet before spending the time to learn how MQTT is actually used in practice.</p>
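<p>The wildcard matching behind those SUBSCRIBE queries is simple enough to sketch. This is a deliberately simplified Java version that ignores spec edge cases (such as how wildcards interact with topics beginning with <code>$</code>):</p>

```java
public class TopicMatcher {
    // Simplified MQTT topic-filter matching: '+' matches exactly one level
    // and '#' (as the final level) matches everything that remains.
    static boolean matches(String filter, String topic) {
        String[] f = filter.split("/", -1);
        String[] t = topic.split("/", -1);
        for (int i = 0; i < f.length; i++) {
            if (f[i].equals("#")) {
                return true; // multi-level wildcard swallows the rest
            }
            if (i >= t.length) {
                return false; // filter is longer than the topic
            }
            if (!f[i].equals("+") && !f[i].equals(t[i])) {
                return false; // literal level mismatch
            }
        }
        return f.length == t.length;
    }

    public static void main(String[] args) {
        System.out.println(matches("$SYS/messages/#", "$SYS/messages/received")); // true
        System.out.println(matches("$SYS/+/received", "$SYS/messages/received")); // true
        System.out.println(matches("a/b", "a/b/c"));                              // false
    }
}
```

<p>A single subscription to a filter like <code>$SYS/messages/#</code> acts as a standing query over a whole subtree of broker statistics.</p>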
<h2 id="choosing-the-right-forum">Choosing The Right Forum</h2>
<p>When talking about delivery assurances, data retention, failover and security, a few points are mentioned that are ambiguous in the spec. Honestly, I think they are great points. Many of these things could be cleaned up. The 3.1.1 version of the spec has been open for comment for several months - something that would be hard to miss, since it says so and gives instructions for providing feedback directly inside the preliminary spec (final versions of the spec aren’t yet available).</p>
<h2 id="conclusion">Conclusion</h2>
<p>Clemens wrote a damning 21-page blog post on MQTT. I truly doubt that many people took the time to carefully read through all that text to understand the holes. Regardless, Clemens is a respected individual in our community, and this blog received a lot of attention. As a result, hundreds or thousands of people now have the impression that MQTT isn’t designed well due to 140-character tweets framing it as such. The trouble is that this argument was made on false pretenses and measured MQTT against goals that it never intended to have.</p>
<p>Nothing he brought up is beyond fixing, and I am confident it will be fixed soon. The MQTT spec is an open collaboration that depends on individuals to contribute wisdom and experience. I don’t understand why Clemens chose to publicly destroy the reputation of MQTT rather than simply offering to help fix it. The MQTT Technical Committee has always been very open to hearing and addressing concerns.</p>
Why I'm Not Going To Stop Posting Go Links2014-01-19T00:00:00+00:00http://timkellogg.me/blog/2014/01/19/I-get-excited-about-go<p>On Friday, shortly after <a href="https://lobste.rs/s/qt8zcq/go_by_example">posting a link</a> about learning Go to Lobste.rs I got this tweet:</p>
<blockquote class="twitter-tweet" lang="en"><p><a href="https://twitter.com/kellogh">@kellogh</a> Since we're on the topic of link quality, may I ask you not to post Golang stuff to lobste.rs?</p>— Chris Allen (@bitemyapp) <a href="https://twitter.com/bitemyapp/statuses/424289998167212032">January 17, 2014</a></blockquote>
<script async="true" src="//platform.twitter.com/widgets.js" charset="utf-8"> </script>
<p>We continued the conversation <a href="https://twitter.com/bitemyapp/statuses/424289998167212032">via Twitter</a> and then a personal email. The short story is that Chris believes that Go’s type system and core language are seriously flawed and that we should be promoting pure and complete languages like Haskell instead of broken languages like Go. I completely agree that Haskell is a beautiful language, and that Go pales in comparison. The thing is, I believe Go (and impure languages like it) are very powerful and we should be excited about them.</p>
<p>Somewhere down the line the conversation led to this tweet:</p>
<blockquote class="twitter-tweet" lang="en"><p><a href="https://twitter.com/kellogh">@kellogh</a> That doesn't sound like something somebody that understands Haskell would say. Are you sure? What did you build?</p>— Chris Allen (@bitemyapp) <a href="https://twitter.com/bitemyapp/statuses/424291827508727809">January 17, 2014</a></blockquote>
<script async="true" src="//platform.twitter.com/widgets.js" charset="utf-8"> </script>
<p>I mean no harm against Chris or anyone else. He’s very passionate and I understand the point he’s trying to make, I just don’t agree with it. There are a lot of people who share his view, but I haven’t heard many who agree with mine. To clarify my position and respond to his question, I moved the conversation to email:</p>
<blockquote>
<p>Hi Chris,</p>
<p>I tried making a MQTT client in Haskell. I was a beginner, it felt impossible to read I/O and hold the state that MQTT requires. I’m sure that someone who really knows Haskell wouldn’t have any trouble writing an MQTT client. I tried for a while then gave up.</p>
<p>That seems to be a common story though. Man tries Haskell, realizes he’s not smart enough and gives up to pursue simple things like distributed systems. OK, that last part is a little snarky but it seems like a developer can only pursue a very limited number of hard things. I thought about becoming an expert in Haskell and writing networking apps, but it doesn’t pay well. I can make my company much happier by working hard on distributed systems, embedded systems, organizing meetups, writing blogs, etc.</p>
<p>It’s all about a point that I’ve been honing in on over the last few years. Programming isn’t an end goal. Even within computer science it isn’t an end goal. It’s always a means to an end. It’s a way to have a computer achieve your goals for you. So I need to focus my effort on what gets me to the end goal.</p>
<p>It’s really easy to accomplish hard goals when you’re working on a team. The trouble is, it’s really hard to find a team that writes exclusively in Haskell (or any pure functional language for that matter). It’s probably because some idiot a long time ago decided that imperative programming is easier; it doesn’t really matter though.</p>
<p>People learn to program imperatively and the rest of their career needs to be spent unlearning. Sure it would be nice if it wasn’t that way. I like languages like C#, Go, Scala and Rust because they introduce the learner to functional concepts at their own pace without forcing it on them.</p>
<p>Imagine if there was an activist group that wanted to get all American people to use chopsticks. They even have proof that it eliminates obesity and diabetes, so they swiftly conquer congress and pass a law stating that all dinnertime place settings must have the option of both chopsticks and fork and spoon. Do you think that most people are going to start using chopsticks after using fork and spoon all their lives? Probably not. But they might start using them incrementally as their friends start catching on.</p>
<p>Obviously the analogy isn’t perfect but it does have its merits. People will continue using what they know. With programming this has an even bigger effect since the entire team has to agree on the same technology stack.</p>
<p>So the short story is that I think we should get excited about the impure languages like C#, Scala, Go and Rust. They’re mainstream enough that it gives us hope that one day we can use a more pure language. Until that day I’m choosing to use whatever tools let me get stuff done.</p>
<p>I hope this makes sense.</p>
<p>Regards, <br />
Tim</p>
</blockquote>
<p>I really don’t want to start a flame war, but I can’t stand how much hate is flying around the developer community. Everything has a purpose. There is no silver bullet, and there is no paradigm, process or technology that is always the best choice. Rather than flaming each other, let’s spend time teaching each other about the caveats so we can all achieve our end goals.</p>
An Unbiased Comparison of F# and Scala2013-06-22T00:00:00+00:00http://timkellogg.me/blog/2013/06/22/comparing-scala-to-fsharp<p>Given my history as a .NET developer I learned Functional Programming via F#,
but I just started a new job as a Scala developer. Naturally, I’ve been
comparing the two languages and the quirks and nuances that could make
them enjoyable or problematic. To summarize quickly, I think Scala is more
approachable but less “pure” than F#. Scala seems to have a diverse set of
influences whereas F# tries to stick closely to proven Functional Programming
basics.</p>
<h2 id="functional-but-object-oriented">Functional but Object Oriented</h2>
<p>Both Scala and F# claim to be primarily functional languages but are also fully
object oriented. While F# is essentially OCaml.NET and Clojure is basically
Lisp for the JVM, Scala is a completely new invention. Scala also strikes
me as <em>more</em> object oriented than F#.</p>
<p>For instance, Scala includes both <a href="http://www.scala-lang.org/node/117">mixins</a> and <a href="http://jamesgolick.com/2010/2/8/monkey-patching-single-responsibility-principle-and-scala-implicits.html">monkey patching</a>. On
the other hand, F# only has monkey patching. Both concepts I learned from Ruby and
I associate with pretentious arguments about “which is more OO”. With that
said, I love the fact that Scala has mixins. It’s a much cleaner dependency
injection technique than IoC containers (which is how we did it in C#).</p>
<h2 id="functions">Functions</h2>
<p>Given F#’s OCaml ancestry, it tends to define methods in an ML-like way. For
example, an <code class="language-plaintext highlighter-rouge">add</code> function in F#:</p>
<figure class="highlight"><pre><code class="language-fsharp" data-lang="fsharp"><span class="k">let</span> <span class="n">add</span> <span class="n">a</span> <span class="n">b</span> <span class="p">=</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span></code></pre></figure>
<p>In the spirit of OCaml, this has a signature that looks something like</p>
<figure class="highlight"><pre><code class="language-fsharp" data-lang="fsharp"><span class="kt">int</span> <span class="p">-></span> <span class="kt">int</span> <span class="p">-></span> <span class="kt">int</span></code></pre></figure>
<p>which means, “a function that takes <code class="language-plaintext highlighter-rouge">int</code> and returns a function that takes
an <code class="language-plaintext highlighter-rouge">int</code> and returns an <code class="language-plaintext highlighter-rouge">int</code>”. This plays perfectly into function currying and
partial function application where you might apply one argument at a time:</p>
<figure class="highlight"><pre><code class="language-fsharp" data-lang="fsharp"><span class="c1">// add1 has type of int -> int</span>
<span class="k">let</span> <span class="n">add1</span> <span class="p">=</span> <span class="n">add</span> <span class="mi">3</span>
<span class="c1">// result is 7</span>
<span class="k">let</span> <span class="n">result</span> <span class="p">=</span> <span class="n">add1</span> <span class="mi">4</span></code></pre></figure>
<p>Scala also has currying & partial function application, but it’s less structured.
While F# functions are curried by default and ready for partial function
application, Scala functions aren’t but can easily be curried on demand:</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">def</span> <span class="nf">add</span><span class="o">(</span><span class="n">a</span><span class="k">:</span> <span class="kt">Int</span><span class="o">,</span> <span class="n">b</span><span class="k">:</span> <span class="kt">Int</span><span class="o">)</span> <span class="k">=</span> <span class="n">a</span> <span class="o">+</span> <span class="n">b</span>
<span class="k">val</span> <span class="nv">add1</span> <span class="k">=</span> <span class="nf">add</span><span class="o">(</span><span class="k">_:</span> <span class="kt">Int</span><span class="o">,</span> <span class="mi">3</span><span class="o">)</span>
<span class="k">val</span> <span class="nv">result</span> <span class="k">=</span> <span class="nf">add1</span><span class="o">(</span><span class="mi">4</span><span class="o">)</span></code></pre></figure>
<p>Most of the time you don’t <em>need</em> function currying, so I like that Scala makes
functions more familiar. But at the same time, currying isn’t hard in Scala,
since there’s a native syntax for applying only some arguments via a pick-n-choose
templating style.</p>
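<p>As a concrete sketch (illustrative names, Scala 2 syntax, not code from the original post), here is both the on-demand <code>.curried</code> conversion and Scala’s native multiple-parameter-list definition:</p>

```scala
object CurryingDemo {
  // The familiar, uncurried definition
  def add(a: Int, b: Int): Int = a + b

  // Curried on demand: eta-expand the method, then call .curried
  val curriedAdd: Int => Int => Int = (add _).curried
  val add3: Int => Int = curriedAdd(3)

  // Scala also supports curried definitions directly,
  // via multiple parameter lists
  def addCurried(a: Int)(b: Int): Int = a + b
}
```

<p>Either form gives the same one-argument-at-a-time application as the F# version above.</p>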
<h2 id="f-is-stricter-fp">F# Is Stricter FP</h2>
<p>F#’s ML-style of function definitions that are curried by default makes for a
more pure functional style. In F#, partial function application is used everywhere,
so when doing <code class="language-plaintext highlighter-rouge">List</code> operations these functions are implemented in separate
modules and “pipelined” using the <code class="language-plaintext highlighter-rouge">|></code> operator:</p>
<figure class="highlight"><pre><code class="language-fsharp" data-lang="fsharp"><span class="p">[</span><span class="mi">2</span><span class="p">;</span> <span class="mi">3</span><span class="p">;</span> <span class="mi">5</span><span class="p">;</span> <span class="mi">8</span><span class="p">]</span> <span class="p">|></span> <span class="nn">List</span><span class="p">.</span><span class="n">map</span> <span class="p">(</span><span class="k">fun</span> <span class="n">x</span> <span class="p">-></span> <span class="n">x</span> <span class="p">*</span> <span class="n">x</span><span class="p">)</span> <span class="p">|></span> <span class="nn">List</span><span class="p">.</span><span class="n">filter</span> <span class="p">(</span><span class="k">fun</span> <span class="n">x</span> <span class="p">-></span> <span class="n">x</span> <span class="o">%</span> <span class="mi">2</span> <span class="o">=</span> <span class="mi">0</span><span class="p">)</span></code></pre></figure>
<p>Result:</p>
<figure class="highlight"><pre><code class="language-fsharp" data-lang="fsharp"><span class="p">[</span><span class="mi">4</span><span class="p">;</span> <span class="mi">64</span><span class="p">]</span></code></pre></figure>
<p>On the other hand, Scala implements these methods as traits that are “mixed into”
<code class="language-plaintext highlighter-rouge">List</code>. Here’s the same example in Scala:</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="nc">List</span><span class="o">(</span><span class="mi">2</span><span class="o">,</span> <span class="mi">3</span><span class="o">,</span> <span class="mi">5</span><span class="o">,</span> <span class="mi">8</span><span class="o">).</span><span class="py">map</span><span class="o">(</span><span class="n">x</span> <span class="k">=></span> <span class="n">x</span> <span class="o">*</span> <span class="n">x</span><span class="o">).</span><span class="py">filter</span><span class="o">(</span><span class="n">x</span> <span class="k">=></span> <span class="n">x</span> <span class="o">%</span> <span class="mi">2</span> <span class="o">==</span> <span class="mi">0</span><span class="o">)</span></code></pre></figure>
<p>I like to say that this means F# is more “pure” functional programming.
I say this mainly because Scala chooses to use methods instead of plain functions in
cases like this. I’m not sure if this actually makes F# “better”, but it is
notable.</p>
<h2 id="discriminated-unions-vs-case-classes">Discriminated Unions vs. Case Classes</h2>
<p>This is a very powerful concept in both languages. You can’t say you’ve mastered
either language until you’ve learned how to use them effectively. However, they’re not
equal concepts. Here’s a quick overview:</p>
<figure class="highlight"><pre><code class="language-fsharp" data-lang="fsharp"><span class="k">type</span> <span class="nc">DimmerValue</span> <span class="p">=</span>
<span class="p">|</span> <span class="nc">On</span>
<span class="p">|</span> <span class="nc">Off</span>
<span class="p">|</span> <span class="nc">Dim</span> <span class="k">of</span> <span class="kt">int</span>
<span class="k">let</span> <span class="n">value</span> <span class="p">=</span> <span class="nc">Dim</span><span class="p">(</span><span class="mi">50</span><span class="p">)</span>
<span class="k">match</span> <span class="n">value</span> <span class="k">with</span>
<span class="p">|</span> <span class="nc">On</span> <span class="p">-></span> <span class="n">printf</span> <span class="s2">"it's on!"</span>
<span class="p">|</span> <span class="nc">Off</span> <span class="p">-></span> <span class="n">printf</span> <span class="s2">"it's off!"</span>
<span class="p">|</span> <span class="nc">Dim</span><span class="p">(</span><span class="n">v</span><span class="p">)</span> <span class="p">-></span> <span class="n">printf</span> <span class="s2">"romantically lit at %i"</span> <span class="n">v</span></code></pre></figure>
<p>And the equivalent Scala code:</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">sealed</span> <span class="k">abstract</span> <span class="k">class</span> <span class="nc">DimmerValue</span>
<span class="k">case</span> <span class="k">class</span> <span class="nc">On</span><span class="o">()</span> <span class="k">extends</span> <span class="nc">DimmerValue</span>
<span class="k">case</span> <span class="k">class</span> <span class="nc">Off</span><span class="o">()</span> <span class="k">extends</span> <span class="nc">DimmerValue</span>
<span class="k">case</span> <span class="k">class</span> <span class="nc">Dim</span><span class="o">(</span><span class="n">value</span><span class="k">:</span> <span class="kt">Int</span><span class="o">)</span> <span class="k">extends</span> <span class="nc">DimmerValue</span>
<span class="k">val</span> <span class="nv">value</span> <span class="k">=</span> <span class="nc">Dim</span><span class="o">(</span><span class="mi">50</span><span class="o">)</span>
<span class="n">value</span> <span class="k">match</span> <span class="o">{</span>
<span class="k">case</span> <span class="nc">On</span><span class="o">()</span> <span class="k">=></span> <span class="nf">printf</span><span class="o">(</span><span class="s">"it's on!"</span><span class="o">)</span>
<span class="k">case</span> <span class="nc">Off</span><span class="o">()</span> <span class="k">=></span> <span class="nf">printf</span><span class="o">(</span><span class="s">"it's off!"</span><span class="o">)</span>
<span class="k">case</span> <span class="nc">Dim</span><span class="o">(</span><span class="n">v</span><span class="o">)</span> <span class="k">=></span> <span class="nf">printf</span><span class="o">(</span><span class="n">s</span><span class="s">"romantically lit at $v"</span><span class="o">)</span>
<span class="o">}</span></code></pre></figure>
<p>The first point to contrast is that Scala case classes are just a class hierarchy,
whereas F# unions look more like C enums whose cases can carry different “shapes”
of data. In reality, F# unions are implemented as a class hierarchy, like Scala.</p>
<p>In F#, all known values of the union must be declared in one place. However, Scala’s
class hierarchy approach means that you could define more values in other files or
JARs. This is the default behavior, but I included the <code class="language-plaintext highlighter-rouge">sealed</code> keyword which limits
definitions to the same file.</p>
<p>This seems like a bad default behavior to have. If the compiler doesn’t know all
possible values of a union, how can it determine correctness in a <code class="language-plaintext highlighter-rouge">match</code> statement?
There’s definitely some loss of type safety there, but it is only a default, so
I shouldn’t complain too much.</p>
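<p>To make that concrete (an illustrative sketch, not code from the post): when the hierarchy is <code>sealed</code>, the Scala compiler knows every subtype and can warn when a <code>match</code> misses one, recovering most of the exhaustiveness checking F# unions get by default:</p>

```scala
sealed abstract class Switch
case class SwitchOn() extends Switch
case class SwitchOff() extends Switch

object SealedDemo {
  // Because Switch is sealed, removing either case below makes the
  // compiler emit a "match may not be exhaustive" warning.
  def describe(s: Switch): String = s match {
    case SwitchOn()  => "on"
    case SwitchOff() => "off"
  }
}
```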
<p>Beyond that issue, there is F#’s concept of record types. They’re immutable
classes that can’t be inherited and have special semantics for copying:</p>
<figure class="highlight"><pre><code class="language-fsharp" data-lang="fsharp"><span class="k">type</span> <span class="nc">Person</span> <span class="p">=</span> <span class="p">{</span> <span class="n">name</span><span class="p">:</span> <span class="kt">string</span><span class="p">;</span> <span class="n">age</span><span class="p">:</span> <span class="kt">int</span><span class="p">;</span> <span class="n">ssn</span><span class="p">:</span> <span class="kt">string</span> <span class="p">}</span>
<span class="k">let</span> <span class="n">person</span> <span class="p">=</span> <span class="p">{</span> <span class="n">name</span> <span class="p">=</span> <span class="s2">"Tim"</span><span class="p">;</span> <span class="n">age</span> <span class="p">=</span> <span class="mi">28</span><span class="p">;</span> <span class="n">ssn</span> <span class="p">=</span> <span class="s2">"123-45-6789"</span> <span class="p">}</span>
<span class="k">let</span> <span class="n">olderPerson</span> <span class="p">=</span> <span class="p">{</span> <span class="n">person</span> <span class="k">with</span> <span class="n">age</span> <span class="p">=</span> <span class="mi">31</span> <span class="p">}</span></code></pre></figure>
<p>Scala doesn’t seem to have a record type concept. Instead, case classes are reused
for the same purpose. All case classes automatically get a <code class="language-plaintext highlighter-rouge">copy</code> method mixed in:</p>
<figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span class="k">case</span> <span class="k">class</span> <span class="nc">Person</span><span class="o">(</span><span class="n">name</span><span class="k">:</span> <span class="kt">String</span><span class="o">,</span> <span class="n">age</span><span class="k">:</span> <span class="kt">Int</span><span class="o">,</span> <span class="n">ssn</span><span class="k">:</span> <span class="kt">String</span><span class="o">)</span>
<span class="k">val</span> <span class="nv">person</span> <span class="k">=</span> <span class="nc">Person</span><span class="o">(</span><span class="s">"Tim"</span><span class="o">,</span> <span class="mi">28</span><span class="o">,</span> <span class="s">"123-45-6789"</span><span class="o">)</span>
<span class="k">val</span> <span class="nv">olderPerson</span> <span class="k">=</span> <span class="nv">person</span><span class="o">.</span><span class="py">copy</span><span class="o">(</span><span class="n">age</span> <span class="k">=</span> <span class="mi">31</span><span class="o">)</span></code></pre></figure>
<p>I’m still undecided on whether I like how Scala merges the concepts. On one level,
it’s simpler since there appear to be fewer concepts to learn. But on another
level, the semantics are broken - if you want a record type you have to define a
“case class”, which implies that you’d normally use it like an enum.</p>
<h2 id="conclusion">Conclusion</h2>
<p>Scala is a more approachable language than F#, but F# has a stronger
sense of type safety. F# also has a much stronger type inference system, which leads to
fewer type annotations. Regardless, I think Scala will see much broader uptake
given that its syntax is much more familiar to C/C++/Java/C# developers. On some
level, I like to think of Scala as being more of “a better C#” than “like F#”. Each
will have its uses, but I think Scala will go far because of that.</p>
The Single Point of Failure2013-06-09T00:00:00+00:00http://timkellogg.me/blog/2013/06/09/dist-sys-antipatterns<p>Recently I’ve been mentoring a startup in the Boulder area that processes
large amounts of data in real time. They have a <a href="http://www.javaworld.com/javaworld/jw-06-2005/jw-0613-soa.html">Service Oriented Architecture</a>
in which backend services do most of the data processing. While they were still
in beta they were getting spikes of traffic, which led us to a conversation
that went like this:</p>
<p><a href="http://cmx.io/#5745532"><img src="/images/master-slave-problems.png" alt="SOA infrastructure with single point of failure" /></a></p>
<h2 id="intro-to-distributed-systems">Intro to Distributed Systems</h2>
<p>The architecture above is the naive approach when designing
your first distributed system. There are 2+ web
servers to handle traffic that gets funneled into a single “master service”.
As the cartoon points out, this is an inherent bottleneck. The diagram has an
hourglass shape, indicating where the bottleneck is. If traffic spikes, the
master will fall over and the slave functionality will be inaccessible until
the master comes back online.</p>
<p>The fact that the master is manually configured as master is the source of
many problems. If the master dies, none of the slaves have the latitude to
step up and become master, so you have to wait for the sysadmin to manually
bring the master back online. There’s a quick solution to this.</p>
<h2 id="a-less-naive-solution">A Less Naive Solution</h2>
<p>MongoDB solves this problem by automatically electing a new master. It has
replication in place such that a majority of nodes should have the latest
changes. <em>(Note: this isn’t actually true, which is why MongoDB has been
under a lot of scrutiny lately; assume for now that it is true)</em></p>
<p>In MongoDB, when a master dies, the slaves automatically detect the failure
and initiate an election for a new master.
Depending on the implementation and circumstances, the time it takes to
detect the failure in the master until a new master is elected and operating
can be anywhere from 1-2 seconds all the way up to minutes. (God help us if
we’re completely inoperable for entire minutes).</p>
<p>There are two main problems with this architecture. First, the cluster can’t
do anything while it has no master. The master is required to coordinate
load distribution (efficiency) and consistency - two attributes that are
crucial to most distributed systems. Until there’s another master, we can’t
guarantee consistency, and we have no way to distribute work fairly, so the
whole cluster is left idle.</p>
<p>The second problem is that masters are inherent bottlenecks. In the case of
the “master service” in the comic, the master is keeping track of traffic
and usage stats and distributing work accordingly. Another way to say that is
“the master is keeping the distribution of load <strong>consistent</strong>”. In this
architecture, all information that affects consistency (new jobs coming in)
must be funneled through the master. Therefore, the entire system is limited
by how fast the master can distribute work.</p>
<h2 id="the-optimal-approach">The Optimal Approach</h2>
<p>The best way to solve this problem is to make it operate without a master.
There are several ways to do this, but I’m most fond of how Cassandra does it.
A Cassandra cluster is set up in a ring - so called because all nodes are
considered equal to each other (think King Arthur’s round table). When a
client wants to connect to a Cassandra cluster, it connects to <em>any</em> node in
the ring. All <code class="language-plaintext highlighter-rouge">create</code>, <code class="language-plaintext highlighter-rouge">update</code>, or <code class="language-plaintext highlighter-rouge">delete</code> operations are replicated to all
other nodes, so every node contains a full view of the data.</p>
<p>Contrast the ring architecture with the master-slave architecture:</p>
<table class="table table-bordered">
<thead>
<tr>
<td> </td>
<th>Master-Slave</th>
<th>Ring</th>
</tr>
</thead>
<tbody>
<tr>
<th>Connect to</th>
<td>Master for writes;<br /> Any node for reads</td>
<td>Any node for writes or reads</td>
</tr>
<tr>
<th>When node dies</th>
<td>Wait for reelection</td>
<td>Connect to a different node</td>
</tr>
<tr>
<th>When we need more throughput</th>
<td>N/A</td>
<td>Connect to another node</td>
</tr>
</tbody>
</table>
<p>If we ever need the cluster to do more work, we just add another node. This is
why Cassandra can claim linear scaling. As the amount of work increases, the
amount of resources Cassandra needs to handle the work also increases linearly.
This is ideal (unless someone knows how to scale hyperbolically).</p>
<p>In our data processing example in the comic, the ring architecture means that
the Web Servers (clients) connect to any of the workers (slaves) directly;
there is no master. If the worker is processing too much work, it redirects the
Web Server (client) to another worker. All workers replicate metadata about
their knowledge of the cluster to all other workers. The metadata would
probably include a list of all workers along with their current loads and
capacities.</p>
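<p>A hypothetical sketch of that redirect behavior (the <code>WorkerInfo</code> shape, the load threshold, and the lookup API are all illustrative, not from any real system):</p>

```scala
case class WorkerInfo(id: String, load: Double, peers: List[String])

object RingClient {
  // Connect to any worker; if it is overloaded, follow its replicated
  // metadata to a less-loaded peer, giving up once every node was seen.
  def pickWorker(cluster: Map[String, WorkerInfo],
                 start: String,
                 maxLoad: Double = 0.8): WorkerInfo = {
    @annotation.tailrec
    def go(id: String, seen: Set[String]): WorkerInfo = {
      val w = cluster(id)
      val next = w.peers.find(p => !seen.contains(p))
      if (w.load <= maxLoad || next.isEmpty) w
      else go(next.get, seen + next.get)
    }
    go(start, Set(start))
  }
}
```

<p>Note that there is no privileged node here: the client can start at any worker, and the cluster keeps serving as long as any node is reachable.</p>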
<h2 id="summary">Summary</h2>
<p>To bring it all back together, using a master-slave architecture in a
distributed system is an anti-pattern. It introduces bottlenecks and potential
for disrupting the entire system. While it seems to make sense at first, it’s
more destructive than helpful. Consider using an alternative to master-slave
architecture. One such alternative is the Ring that Cassandra uses.</p>
Value Types and Memory Usage2012-11-28T00:00:00+00:00http://timkellogg.me/blog/2012/11/28/sorting-on-value-types<p>Last week a respected colleague mentioned off hand that sorting on a value type takes a lot of memory in
C#. Interested, I looked into this to see why/when this is true.</p>
<p>Value types (using the <code class="language-plaintext highlighter-rouge">struct</code> keyword) are always passed by value, unlike reference types (<code class="language-plaintext highlighter-rouge">class</code>
keyword) which are always passed by reference. This means that every time you pass them into a method, the
whole value is copied; whereas with reference types, only the reference (pointer) is copied. Pointers are
4 to 8 bytes, so his original statement is only of concern if your value types are larger than that.
Some such types are DateTime, Guid, and BsonObjectId.</p>
<p>Some people like to think of value types as being allocated on the stack (versus the heap). In C#, <a href="http://blogs.msdn.com/b/ericlippert/archive/2009/04/27/the-stack-is-an-implementation-detail.aspx">this
is irrelevant</a>. The CLR allocates value and reference types wherever it feels like. Usually, local variables
and parameters are stored on the stack (or registers) and values that are members of a class are usually
allocated on the heap. It was done this way because the folks who wrote the CLR believe they can do a
good enough job of optimizing stack and heap usage, so you shouldn’t worry about it. If you’re in C#, you
shouldn’t care where they’re allocated. If you’re doing something that requires you to care, you need to
either break into an <a href="http://msdn.microsoft.com/en-us/library/t2yzs44b.aspx">unsafe C# code block</a> or <a href="http://msdn.microsoft.com/en-us/library/aa288468(v=vs.71).aspx">C++</a>.</p>
<p>As for his actual statement – yes, using Base Class Library algorithms for sorting on value types will
take more memory for value types than reference types because it has to copy values. However, there are
exceptions to this.</p>
<p>You can always write method parameters with the <code class="language-plaintext highlighter-rouge">ref</code> keyword so they’re passed by reference. This would
fix the problem of copying, but all of the BCL classes<a href="#gen">*</a> are written generically by using <code class="language-plaintext highlighter-rouge">IComparable</code>
or some other interface. When you cast a value type like an <code class="language-plaintext highlighter-rouge">Int32</code> to an interface like <code class="language-plaintext highlighter-rouge">IComparable</code>, it
has to be boxed into a reference type. When boxing, the CLR allocates a managed reference type object
and then copies the <code class="language-plaintext highlighter-rouge">Int32</code> value into the managed container. It copies the value again when unboxing.</p>
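<p>The same mechanics exist on the JVM. As a rough Scala analogue (not the C# scenario from this post): assigning an <code>Int</code> to a reference-typed slot boxes it into a <code>java.lang.Integer</code>, just as casting an <code>Int32</code> to <code>IComparable</code> boxes it on the CLR:</p>

```scala
object BoxingDemo {
  val n: Int = 42
  // Upcasting the primitive Int to Any (a reference-typed slot)
  // allocates a java.lang.Integer wrapper on the heap.
  val boxed: Any = n
}
```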
<p>In summary, sorting on a value type can take quite a bit more memory than sorting on reference types.
However, it is possible to write your own sorting algorithm that always passes by reference and doesn’t
use any additional memory (but who does that?).</p>
<h3 id="notes">Notes</h3>
<p>* <em>One might point out that generic classes like <code class="language-plaintext highlighter-rouge">List<int></code> have a <a href="http://msdn.microsoft.com/en-us/library/b0zbh7b6.aspx"><code class="language-plaintext highlighter-rouge">Sort()</code></a> method. However, this casts <code class="language-plaintext highlighter-rouge">int</code> to <code class="language-plaintext highlighter-rouge">IComparable</code> while sorting.</em></p>
Jump-Location: autojump for Windows2012-08-21T00:00:00+00:00http://timkellogg.me/blog/2012/08/21/introducing-Jump-Location<p>A while ago I discovered <a href="https://github.com/joelthelion/autojump/wiki/">autojump</a> and quickly realized that it could
change how I use a console. Autojump listens when you change directories
and keeps an index of the directories where you spend the most time. The <code class="language-plaintext highlighter-rouge">j</code>
command lets you search the index and <code class="language-plaintext highlighter-rouge">cd</code> to the most relevant search
result. It’s best if you just watch this video:</p>
<div>
<iframe width="420" height="315" src="http://www.youtube.com/embed/tnNyoMGnbKg" frameborder="0" allowfullscreen=""> </iframe>
</div>
<h2 id="introducing-autojump-for-windows-via-powershell">Introducing Autojump for Windows (via Powershell)</h2>
<p><a href="https://github.com/tkellogg/Jump-Location">Jump-Location</a> is a Powershell implementation of autojump that I’ve
been working on. It does most everything that autojump does, but better.</p>
<p>For instance, after using the <code class="language-plaintext highlighter-rouge">j</code> Powershell cmdlet for a while, I
quickly realized that I wanted to use it for more than a <code class="language-plaintext highlighter-rouge">cd</code> command.
I like using <code class="language-plaintext highlighter-rouge">pushd</code> and <code class="language-plaintext highlighter-rouge">popd</code>, so I made a <code class="language-plaintext highlighter-rouge">pushj</code> alias that uses
<code class="language-plaintext highlighter-rouge">pushd</code> (<code class="language-plaintext highlighter-rouge">Push-Location</code>) instead of <code class="language-plaintext highlighter-rouge">cd</code> (<code class="language-plaintext highlighter-rouge">Set-Location</code>).</p>
<p>I also realized that as a Windows user, you inevitably have to use Windows
Explorer for things like TortoiseSVN checkins. But mousing through the
folder tree is a pain, so I made the <code class="language-plaintext highlighter-rouge">xj</code> alias to query <code class="language-plaintext highlighter-rouge">Jump-Location</code>
and open up <code class="language-plaintext highlighter-rouge">explorer</code> to the result.</p>
<p>You can now use <code class="language-plaintext highlighter-rouge">Jump-Location</code> in conjunction with any command. I can
use the <code class="language-plaintext highlighter-rouge">getj</code> alias to open a file in notepad:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>PS> notepad "$(getj ju)\Readme.md"
</code></pre></div></div>
<h2 id="enhancements-to-jumpstat">Enhancements to jumpstat</h2>
<p>Autojump provides a <code class="language-plaintext highlighter-rouge">jumpstat</code> command to display the index (and debug
why you didn’t get the directory you expected). <code class="language-plaintext highlighter-rouge">Jump-Location</code> also
provides this command (as the <code class="language-plaintext highlighter-rouge">Get-JumpStatus</code> cmdlet alias).</p>
<p>Since Powershell deals in actual objects instead of text, the design of
<code class="language-plaintext highlighter-rouge">jumpstat</code> is a lot different from the original. This really comes out
when changing the weights in the index. The documentation for the
original instructs you to edit <code class="language-plaintext highlighter-rouge">~/autojump.txt</code>. While we still store
the index in a text file, you can just set the weight and save from
within Powershell.</p>
<p>For instance, setting a weight to a negative number will remove it from
search results:</p>
<figure class="highlight"><pre><code class="language-powershell" data-lang="powershell"><span class="n">PS</span><span class="err">></span><span class="w"> </span><span class="nv">$record</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">jumpstat</span><span class="w"> </span><span class="nx">je</span><span class="w"> </span><span class="nx">bin</span><span class="w">
</span><span class="n">PS</span><span class="err">></span><span class="w"> </span><span class="nv">$record</span><span class="o">.</span><span class="nf">weight</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nt">-1</span><span class="w">
</span><span class="n">PS</span><span class="err">></span><span class="w"> </span><span class="nx">jumpstat</span><span class="w"> </span><span class="nt">-Save</span></code></pre></figure>
<h2 id="go-try-it">Go Try It!</h2>
<p>I highly recommend installing <code class="language-plaintext highlighter-rouge">Jump-Location</code>. Head on over to the
<a href="https://github.com/tkellogg/Jump-Location/downloads">downloads area</a> and grab the latest zip file. Running <code class="language-plaintext highlighter-rouge">Install.ps1</code> will
register <code class="language-plaintext highlighter-rouge">Jump-Location</code> in all future Powershell sessions.</p>
<p>A while ago I discovered <a href="https://github.com/joelthelion/autojump/wiki/">autojump</a> and quickly realized that it could
change how I use a console. Autojump listens when you change directories
and keeps an index of the directories where you spend the most time. The <code class="language-plaintext highlighter-rouge">j</code>
command lets you search the index and <code class="language-plaintext highlighter-rouge">cd</code> to the most relevant search
result. It’s best if you just watch this video:</p>
<div>
<iframe width="420" height="315" src="http://www.youtube.com/embed/tnNyoMGnbKg" frameborder="0" allowfullscreen=""> </iframe>
</div>
<h2 id="introducing-autojump-for-windows-via-powershell">Introducing Autojump for Windows (via Powershell)</h2>
<p><a href="https://github.com/tkellogg/Jump-Location">Jump-Location</a> is a Powershell implementation of autojump that I’ve
been working on. It does most everything that autojump does, but better.</p>
<p>For instance, after using the <code class="language-plaintext highlighter-rouge">j</code> Powershell cmdlet for a while, I
quickly realized that I wanted to use it for more than a <code class="language-plaintext highlighter-rouge">cd</code> command.
I like using <code class="language-plaintext highlighter-rouge">pushd</code> and <code class="language-plaintext highlighter-rouge">popd</code>, so I made a <code class="language-plaintext highlighter-rouge">pushj</code> alias that uses
<code class="language-plaintext highlighter-rouge">pushd</code> (<code class="language-plaintext highlighter-rouge">Push-Location</code>) instead of <code class="language-plaintext highlighter-rouge">cd</code> (<code class="language-plaintext highlighter-rouge">Set-Location</code>).</p>
<p>I also realized that as a Windows user, you inevitably have to use Windows
Explorer for things like TortoiseSVN checkins. But mousing through the
folder tree is a pain, so I made the <code class="language-plaintext highlighter-rouge">xj</code> alias to query <code class="language-plaintext highlighter-rouge">Jump-Location</code>
and open up <code class="language-plaintext highlighter-rouge">explorer</code> to the result.</p>
<p>You can now use <code class="language-plaintext highlighter-rouge">Jump-Location</code> in conjunction with any command. I can
use the <code class="language-plaintext highlighter-rouge">getj</code> alias to open a file in notepad:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>PS> notepad "$(getj ju)\Readme.md"
</code></pre></div></div>
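<p>Under the hood there isn’t much magic to these aliases. Here’s a rough sketch of how <code class="language-plaintext highlighter-rouge">pushj</code> and <code class="language-plaintext highlighter-rouge">xj</code> could be built on top of <code class="language-plaintext highlighter-rouge">getj</code> (these bodies are my illustration, not the shipped implementation):</p>
<figure class="highlight"><pre><code class="language-powershell" data-lang="powershell"># Sketch only: getj resolves a query to the best-matching directory,
# so each alias just hands that path to a different command.
function pushj { Push-Location (getj @args) }  # like j, but keeps the location stack
function xj    { explorer (getj @args) }       # open the match in Windows Explorer</code></pre></figure>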
<h2 id="enhancements-to-jumpstat">Enhancements to jumpstat</h2>
<p>Autojump provides a <code class="language-plaintext highlighter-rouge">jumpstat</code> command to display the index (and debug
why you didn’t get the directory you expected). <code class="language-plaintext highlighter-rouge">Jump-Location</code> also
provides this command (as the <code class="language-plaintext highlighter-rouge">Get-JumpStatus</code> cmdlet alias).</p>
<p>Since Powershell deals in actual objects instead of text, the design of
<code class="language-plaintext highlighter-rouge">jumpstat</code> is a lot different from the original. This really comes out
when changing the weights in the index. The documentation for the
original instructs you to edit <code class="language-plaintext highlighter-rouge">~/autojump.txt</code>. While we still store
the index in a text file, you can just set the weight and save from
within Powershell.</p>
<p>For instance, setting a weight to a negative number will remove it from
search results:</p>
<figure class="highlight"><pre><code class="language-powershell" data-lang="powershell"><span class="n">PS</span><span class="err">></span><span class="w"> </span><span class="nv">$record</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">jumpstat</span><span class="w"> </span><span class="nx">je</span><span class="w"> </span><span class="nx">bin</span><span class="w">
</span><span class="n">PS</span><span class="err">></span><span class="w"> </span><span class="nv">$record</span><span class="o">.</span><span class="nf">weight</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nt">-1</span><span class="w">
</span><span class="n">PS</span><span class="err">></span><span class="w"> </span><span class="nx">jumpstat</span><span class="w"> </span><span class="nt">-Save</span></code></pre></figure>
<h2 id="go-try-it">Go Try It!</h2>
<p>I highly recommend installing <code class="language-plaintext highlighter-rouge">Jump-Location</code>. Head on over to the
<a href="https://github.com/tkellogg/Jump-Location/downloads">downloads area</a> and grab the latest zip file. Running <code class="language-plaintext highlighter-rouge">Install.ps1</code> will
register <code class="language-plaintext highlighter-rouge">Jump-Location</code> in all future Powershell sessions.</p>
How to use AutoFactories in StructureMap2012-06-12T00:00:00+00:00http://timkellogg.me/blog/2012/06/12/AutoFactories-In-StructureMap<p>While watching the <a href="https://groups.google.com/forum/?fromgroups#!forum/structuremap-users">StructureMap discussion on google groups</a>, a user wanted to do AutoFactories in
StructureMap, something they were able to do in Castle.Windsor. I didn’t
know what they were, so I had to look through the code and documentation of the Castle.Windsor feature. It turns
out that an AutoFactory is basically a specialized service locator that has no direct dependency on any kind
of container. You write an interface that has methods to get instances from the container - but you let
StructureMap generate the implementation of this interface. Sound funny? Let me show you…</p>
<h2 id="example-a-plugin-framework">Example: A Plugin Framework</h2>
<p>The first time I needed an AutoFactory was when I needed to create a plugin framework. The idea is that, if
you want to execute some code on a specific event, you create a class that implements <code class="language-plaintext highlighter-rouge">IPlugin</code> and register
several implementations with the IoC container:</p>
<figure class="highlight"><pre><code class="language-csharp" data-lang="csharp"><span class="k">public</span> <span class="k">interface</span> <span class="nc">IPlugin</span>
<span class="p">{</span>
<span class="k">void</span> <span class="nf">Execute</span><span class="p">();</span>
<span class="p">}</span></code></pre></figure>
<p><em>Note: I’m simplifying this quite a bit. The actual plugin framework has more complexity, but it essentially
boils down to this.</em></p>
<p>We created a plugin controller to execute all plugins and handle failures. Our initial implementation
looked something like this:</p>
<figure class="highlight"><pre><code class="language-csharp" data-lang="csharp"><span class="k">public</span> <span class="k">class</span> <span class="nc">PluginController</span> <span class="p">:</span> <span class="n">IPluginController</span>
<span class="p">{</span>
<span class="k">private</span> <span class="k">readonly</span> <span class="n">IList</span><span class="p"><</span><span class="n">IPlugin</span><span class="p">></span> <span class="n">plugins</span><span class="p">;</span>
<span class="k">public</span> <span class="nf">PluginController</span><span class="p">(</span><span class="n">IList</span><span class="p"><</span><span class="n">IPlugin</span><span class="p">></span> <span class="n">plugins</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">this</span><span class="p">.</span><span class="n">plugins</span> <span class="p">=</span> <span class="n">plugins</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">public</span> <span class="k">void</span> <span class="nf">Execute</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">foreach</span><span class="p">(</span><span class="kt">var</span> <span class="n">plugin</span> <span class="k">in</span> <span class="n">plugins</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">plugin</span><span class="p">.</span><span class="nf">Execute</span><span class="p">();</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span></code></pre></figure>
<p>When you take any sort of <code class="language-plaintext highlighter-rouge">IEnumerable</code> through the constructor, StructureMap (or any IoC container) will
give you a list of all registered instances of that type. This is similar to when you call
<code class="language-plaintext highlighter-rouge">container.GetAllInstances<IPlugin>()</code>.</p>
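<p>For reference, registering several plugins might look something like this (the plugin class names are hypothetical):</p>
<figure class="highlight"><pre><code class="language-csharp" data-lang="csharp">// Hypothetical registrations; each Add contributes one element to the
// IList<IPlugin> that gets injected into PluginController
ObjectFactory.Initialize(x =>
{
    x.For<IPlugin>().Add<AuditPlugin>();
    x.For<IPlugin>().Add<NotificationPlugin>();
});</code></pre></figure>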
<p>The main problem we were running into is that we wanted to use <code class="language-plaintext highlighter-rouge">UserRepository</code> from a plugin, but we
also wanted to execute plugins from within a <code class="language-plaintext highlighter-rouge">UserRepository</code>. This introduces an interesting dependency
chain because (1) the controller requires (2) a plugin which requires (3) a repository which in turn
requires (1) a controller.</p>
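<p>Spelled out in code, the cycle looks something like this (the constructors are simplified illustrations):</p>
<figure class="highlight"><pre><code class="language-csharp" data-lang="csharp">// (1) PluginController takes IList<IPlugin> (shown above)
public class SomePlugin : IPlugin                          // (2)
{
    public SomePlugin(UserRepository users) { }            // ...which needs (3)
    public void Execute() { }
}
public class UserRepository                                // (3)
{
    public UserRepository(IPluginController plugins) { }   // ...which needs (1) again
}</code></pre></figure>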
<p>This is a circular dependency. StructureMap can’t instantiate that graph because it can’t create a controller
without a controller already having been created (chicken and egg problem). StructureMap allows you to solve
this problem through property injection. This means that you create a constructor with fewer dependencies than
the class requires (a controller without a list of plugins or a plugin without a repository) and fill this
dependency after instantiation by setting a property. I don’t like property injection because
it’s really just a bandaid over the real problem - you really shouldn’t ever need circular dependencies.</p>
<p>In our case we were able to use an AutoFactory:</p>
<figure class="highlight"><pre><code class="language-csharp" data-lang="csharp"><span class="k">public</span> <span class="k">interface</span> <span class="nc">IPluginFactory</span>
<span class="p">{</span>
<span class="n">IList</span><span class="p"><</span><span class="n">IPlugin</span><span class="p">></span> <span class="nf">GetPlugins</span><span class="p">();</span>
<span class="p">}</span></code></pre></figure>
<p>We then register this interface like this:</p>
<figure class="highlight"><pre><code class="language-csharp" data-lang="csharp"><span class="n">For</span><span class="p"><</span><span class="n">IPluginFactory</span><span class="p">>().</span><span class="nf">CreateFactory</span><span class="p">();</span></code></pre></figure>
<p>There is no implementation of this interface. The <code class="language-plaintext highlighter-rouge">CreateFactory()</code> extension method means that StructureMap
will create a <a href="http://kozmic.pl/dynamic-proxy-tutorial/">dynamic proxy</a> object that has a one-liner implementation of <code class="language-plaintext highlighter-rouge">GetPlugins</code> that just returns
<code class="language-plaintext highlighter-rouge">ObjectFactory.GetAllInstances<IPlugin>()</code>.</p>
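<p>In other words, the generated proxy behaves as if you had hand-written something like this:</p>
<figure class="highlight"><pre><code class="language-csharp" data-lang="csharp">// Not real source -- just what the dynamically generated implementation
// is equivalent to
internal class PluginFactoryProxy : IPluginFactory
{
    public IList<IPlugin> GetPlugins()
    {
        return ObjectFactory.GetAllInstances<IPlugin>();
    }
}</code></pre></figure>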
<p>With this fancy new <code class="language-plaintext highlighter-rouge">IPluginFactory</code>, we change <code class="language-plaintext highlighter-rouge">PluginController</code> to use it:</p>
<figure class="highlight"><pre><code class="language-csharp" data-lang="csharp"><span class="k">public</span> <span class="k">class</span> <span class="nc">PluginController</span> <span class="p">:</span> <span class="n">IPluginController</span>
<span class="p">{</span>
<span class="k">private</span> <span class="k">readonly</span> <span class="n">IPluginFactory</span> <span class="n">pluginFactory</span><span class="p">;</span>
<span class="k">public</span> <span class="nf">PluginController</span><span class="p">(</span><span class="n">IPluginFactory</span> <span class="n">pluginFactory</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">this</span><span class="p">.</span><span class="n">pluginFactory</span> <span class="p">=</span> <span class="n">pluginFactory</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">public</span> <span class="k">void</span> <span class="nf">Execute</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">foreach</span><span class="p">(</span><span class="kt">var</span> <span class="n">plugin</span> <span class="k">in</span> <span class="n">pluginFactory</span><span class="p">.</span><span class="nf">GetPlugins</span><span class="p">())</span>
<span class="p">{</span>
<span class="n">plugin</span><span class="p">.</span><span class="nf">Execute</span><span class="p">();</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span></code></pre></figure>
<p>This new implementation isn’t really any more complex, but it solves two problems. First, you no longer
have to think about circular dependencies. This is great if you’re letting third parties develop these
plugins - you don’t have to inform them how your application is structured, only what the interfaces are.
Second, you also decouple the lifespan of each plugin object from the lifespan of the <code class="language-plaintext highlighter-rouge">PluginController</code>.</p>
<h2 id="its-a-service-locator-but-not-an-anti-pattern">It’s a Service Locator, But Not An Anti-Pattern</h2>
<p>Now, you may be cringing at the idea that I might be advocating the use of the <a href="http://commonservicelocator.codeplex.com/">service locator</a>
<a href="http://blog.ploeh.dk/2010/02/03/ServiceLocatorIsAnAntiPattern.aspx">anti-pattern</a>. Or at least you should be! Service locators should be avoided because they hide
dependencies (especially if you use a static service locator instead of building the whole object
graph). Also, having a hard dependency on the IoC container couples your application to the container –
kind of ruins the point of using IoC in the first place.</p>
<p>Most of the time when we’re using the IoC pattern we try to create the whole object graph all at once
because it clearly shows dependencies. Sometimes, as in the plugin example, we need to break off part
of the object graph and create it separately. There are lots of legitimate reasons to do this; plugins
are only one. When you run into a situation like this, the AutoFactory makes it possible and clean.</p>
<p>Martin Fowler actually <a href="http://martinfowler.com/articles/injection.html#UsingAServiceLocator">encourages the usage of service locators</a> but warns that they can be implemented
badly. His main concern is that the implementation isn’t decoupled from the usage with an interface (I’ve
seen static service locators cause huge problems). Honestly, I think the AutoFactory is a great example
of a legitimate use of a service locator pattern. Maybe it’s not really an anti-pattern after all…</p>
Trappings: An easier way to do functional testing2012-06-10T00:00:00+00:00http://timkellogg.me/blog/2012/06/10/Trappings<p>I’ve spent the last couple weeks piecing together a testing utility to fill a need. The problem is that we
need to run functional and integration tests that hit the database, but it’s actually quite difficult.
There’s a few techniques that are traditionally used for setting up test data for automated tests.</p>
<p>One possible solution is to set up a script that populates the database before all tests run. But this
has the pesky problem of causing interdependent tests. One test might update an object that another test
makes assertions about, and suddenly you have false test failures that you have to spend time to debug.</p>
<p>Our case was even worse – we were using our API to setup test data. Use the API to insert a user at the
beginning of the test and delete it at the end. When the <code class="language-plaintext highlighter-rouge">User INSERT</code> or <code class="language-plaintext highlighter-rouge">User DELETE</code> operations went
haywire we got a whole ton of false test failures. You really should only test one thing with a test, and
our tests were getting way out of control.</p>
<p>The craziness drove me to write Trappings. Trappings provides a clear place for you to create test data
for .NET projects and have it torn down at the end of the test. It makes it possible to trivially write
functional tests that are independent of each other – failures of one don’t cause failures of another.</p>
<h2 id="how-to-setup-data">How to set up data</h2>
<p>Test fixtures are a place to declare data to be set up. Here is the sample from the readme:</p>
<figure class="highlight"><pre><code class="language-csharp" data-lang="csharp"><span class="k">class</span> <span class="nc">TheRaceTrack</span> <span class="p">:</span> <span class="n">ITestFixtureData</span>
<span class="p">{</span>
<span class="c1">// A convenient pattern to follow is to make static properties for things</span>
<span class="c1">// you'll access within the test. All of these are completely valid within</span>
<span class="c1">// the using block.</span>
<span class="k">public</span> <span class="k">static</span> <span class="n">Car</span> <span class="n">Cruze</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>
<span class="k">public</span> <span class="n">IEnumerable</span><span class="p"><</span><span class="n">SetupObject</span><span class="p">></span> <span class="nf">Setup</span><span class="p">()</span>
<span class="p">{</span>
<span class="c1">// Assign to static field for easy access later</span>
<span class="n">Cruze</span> <span class="p">=</span> <span class="k">new</span> <span class="n">Car</span> <span class="p">{</span> <span class="n">Make</span> <span class="p">=</span> <span class="s">"Chevy"</span><span class="p">,</span> <span class="n">Model</span> <span class="p">=</span> <span class="s">"Cruze"</span> <span class="p">};</span>
<span class="c1">// cruze will be inserted into the database after this line</span>
<span class="k">yield</span> <span class="k">return</span> <span class="k">new</span> <span class="n">SetupObject</span> <span class="p">{</span> <span class="n">CollectionName</span> <span class="p">=</span> <span class="s">"cars"</span><span class="p">,</span> <span class="n">Value</span> <span class="p">=</span> <span class="n">Cruze</span> <span class="p">};</span>
<span class="c1">// Since `Cruze` has already been inserted, its ID is already auto-assigned</span>
<span class="kt">var</span> <span class="n">tim</span> <span class="p">=</span> <span class="k">new</span> <span class="n">Driver</span> <span class="p">{</span> <span class="n">Name</span> <span class="p">=</span> <span class="s">"Tim"</span><span class="p">,</span> <span class="n">CarId</span> <span class="p">=</span> <span class="n">Cruze</span><span class="p">.</span><span class="n">Id</span> <span class="p">};</span>
<span class="k">yield</span> <span class="k">return</span> <span class="k">new</span> <span class="nf">SetupObject</span><span class="p">(</span><span class="s">"drivers"</span><span class="p">,</span> <span class="n">tim</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span></code></pre></figure>
<p>All you have to do is implement <code class="language-plaintext highlighter-rouge">ITestFixtureData</code> and not hide the default constructor. <code class="language-plaintext highlighter-rouge">Setup</code> returns
an <code class="language-plaintext highlighter-rouge">IEnumerable</code> which you can really use to your advantage. As each object is yielded, the next one isn’t
constructed until the previous one is fully inserted into the database. This means you can take advantage
of MongoDB’s ID auto-generation to piece together complex relationships.</p>
<p>Another feature is that classes can be public, private, nested – whatever you need. If you want a
fixture to be shared for a lot of tests, make it public. If you want more fixtures for specific use cases,
just toss them into nested classes and keep them close to the tests. The only constraints are placed by
the compiler. I find this can be very helpful.</p>
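<p>For example, a one-off fixture can live as a private nested class right next to the tests that use it (the fixture and test names here are hypothetical):</p>
<figure class="highlight"><pre><code class="language-csharp" data-lang="csharp">[TestFixture]
public class DriverTests
{
    // Only visible to DriverTests, so it can stay narrowly focused
    private class EmptyGarage : ITestFixtureData
    {
        public IEnumerable<SetupObject> Setup()
        {
            yield return new SetupObject("cars", new Car { Make = "Honda", Model = "Civic" });
        }
    }

    // tests wrap their bodies in using(FixtureSession.Create<EmptyGarage>()) { ... }
}</code></pre></figure>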
<p>A pattern I’ve begun following is to make static properties to hold references to objects I create during
<code class="language-plaintext highlighter-rouge">Setup()</code>. In the above example I can reference <code class="language-plaintext highlighter-rouge">TheRaceTrack.Cruze.Id</code> to get the ID of the Chevy Cruze.
For instance:</p>
<figure class="highlight"><pre><code class="language-csharp" data-lang="csharp"><span class="p">[</span><span class="n">Test</span><span class="p">]</span>
<span class="k">public</span> <span class="k">void</span> <span class="nf">ILoveCars</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">using</span><span class="p">(</span><span class="n">FixtureSession</span><span class="p">.</span><span class="n">Create</span><span class="p"><</span><span class="n">TheRaceTrack</span><span class="p">>())</span>
<span class="p">{</span>
<span class="c1">// Database is now setup. You can use code that assumes that documents</span>
<span class="c1">// exist in db.cars and db.drivers</span>
<span class="kt">var</span> <span class="n">matches</span> <span class="p">=</span> <span class="k">from</span> <span class="n">driver</span> <span class="k">in</span> <span class="n">drivers</span><span class="p">.</span><span class="nf">AsQueryable</span><span class="p">()</span>
<span class="k">where</span> <span class="n">driver</span><span class="p">.</span><span class="n">CarId</span> <span class="p">==</span> <span class="n">TheRaceTrack</span><span class="p">.</span><span class="n">Cruze</span><span class="p">.</span><span class="n">Id</span>
<span class="k">select</span> <span class="n">driver</span><span class="p">;</span>
<span class="n">matches</span><span class="p">.</span><span class="nf">Count</span><span class="p">().</span><span class="nf">ShouldEqual</span><span class="p">(</span><span class="m">1</span><span class="p">);</span>
<span class="p">}</span>
<span class="c1">// objects from TheRaceTrack are no longer accessible in Mongo</span>
<span class="p">}</span></code></pre></figure>
<p>Here, we use the <code class="language-plaintext highlighter-rouge">FixtureSession</code> to create <code class="language-plaintext highlighter-rouge">TheRaceTrack</code> and ensure that the objects it creates will be gone
at the end of the <code class="language-plaintext highlighter-rouge">using</code> statement. Within the <code class="language-plaintext highlighter-rouge">using</code> statement we can do anything we want with these objects
– including delete them. This works even for other processes, like a client-server architecture where you’re
testing the server from a client. Since the objects exist in the database, they exist globally (they’re even
accessible to other computers).</p>
<h2 id="disclaimers">Disclaimers</h2>
<p>While I haven’t said it explicitly yet, this only currently works for MongoDB. I did it this way because that’s
what I use most of the time and, frankly, it’s stinkin easy. But there’s no reason why this couldn’t work for
SQL or other databases; it’s just not on my priority list.</p>
<p>I’ve released the package on <a href="http://nuget.org/packages/Trappings">NuGet</a> under the MIT license. My hope is that everyone can feel free to use it,
and contribute back if they find it useful.</p>
Why don't more developers contribute to open source?2012-05-03T00:00:00+00:00http://timkellogg.me/blog/2012/05/03/open-source-meetup-group<p>One night last weekend I couldn’t sleep because I couldn’t stop thinking about open source projects like
<a href="http://stackoverflow.com/a/8785437/503826">StructureMap</a> where the maintainers are burnt out from giving all their time and energy. I recently
took over the responsibility of merging pull requests and fielding issues for StructureMap so Jeremy can
focus on life issues and his work with <a href="http://mvc.fubu-project.org/">FubuMVC</a>. Regardless, it remains one of the most highly used
IoC containers for C#.</p>
<p>I had a lot of thoughts rushing through my head about how StructureMap is not alone. There’s way too many
projects that die simply because the maintainer is spread too thin. If each one of us contributed just a
little bit of time to the open source software that we love, we could prevent hundreds of valuable projects
from going stale or dying.</p>
<p>I ended up giving up on sleep and <a href="/blog/2012/04/22/why-open-source-is-worth-your-time/">wrote a blog post</a> that stayed on the front page of hacker news for
a while. It turns out that there’s a lot of people that would love to give back to these projects but are
intimidated in one way or another. I’m not a big fan of speculation, so I decided to throw together a <a href="http://www.zoomerang.com/Survey/WEB22FJY9L3RZ3">quick
survey</a> and sent it out to some peers and coworkers.</p>
<div id="chart1"><!-- first chart goes here --></div>
<h2 id="the-inexperienced-are-intimidated">The inexperienced are intimidated</h2>
<p>It’s a bit of a chicken-and-the-egg problem. For people who either infrequently or never contribute
to open source, one of the largest reasons is that they’re scared that their code won’t be good enough. Many
of the friends and coworkers that mentioned this issue to me also realized that the best way for them to get to a
level of comfort with their own code is probably to actually work on open source projects. But without working
on open source projects, their code isn’t getting better.</p>
<p>The largest response for infrequent contributors was that the code base is too large or intimidating to
navigate and learn. The most useful projects out there are large and complex, so this probably won’t change.
However, people who often contribute to open source projects tend to have an inclination toward soaking in
large code bases. It’s a learned skill that is obtained either by changing jobs every month or by working on
open source projects.</p>
<h2 id="the-experienced-love-contributing">The experienced love contributing</h2>
<p>Of the people who contributed frequently (more than a few times a month), one of the biggest reasons
for continuing to contribute was that they just plain enjoy it. For myself, I know I get a sense of
satisfaction, maybe even excitement, when a pull request is accepted. One respondent said that they like
making things that their friends and coworkers find useful. I can echo that!</p>
<div id="chart2"><!-- second chart goes here --></div>
<h2 id="the-experienced-also-dont-mind-digging-into-code">The experienced also don’t mind digging into code</h2>
<p>The next biggest reason to contribute was that, when something isn’t working, they crack open the code to
see what’s going wrong. A lot of times they fix the problem themselves and end up sending a pull request.
I think this is one of the biggest advantages of open source software.</p>
<p>In the past I’ve gotten bit by closed source software (I’m looking at you,
Microsoft) where there’s something really simple that’s not working, but I can’t change it because I can’t
recompile the source code. Other times I really just want to see what’s going wrong but I can’t look at the
code because it’s proprietary.</p>
<h2 id="what-if-we-worked-together">What if we worked together?</h2>
<p>While talking to lots of people about open source, it became abundantly clear that a lot of people simply
don’t know where to start. What would happen if we started a <a href="http://lists.openhatch.org/pipermail/events/2012-April/000304.html">meetup group</a> to pair up and work through
code together? It could be a convenient place where the inexperienced could learn from the experienced,
and where ideas could spread organically.</p>
<p>I’m in the planning stages of starting <a href="http://www.meetup.com/OpenHatch-X-Boulder/">such a group</a> where I live in Boulder. If you or someone you know
lives or works in Boulder, you should definitely <a href="/contact/">get in contact</a> with me. I’m open to suggestions and
advice. I’m also looking for people to help out and companies to sponsor.</p>
<script type="text/javascript" src="/public/raphael-min.js"> </script>
<script type="text/javascript" src="/public/g.raphael-min.js"> </script>
<script type="text/javascript" src="/public/g.bar-min.js"> </script>
<script type="text/javascript" src="/public/backbone-min.js"> </script>
<script type="text/javascript" src="/blog/open-source-charts.js"> </script>
<script type="text/javascript" src="/blog/open-source-results.json"> </script>
Why Open Source Is Worth Your Time2012-04-22T00:00:00+00:00http://timkellogg.me/blog/2012/04/22/why-open-source-is-worth-your-time<p>One of my math professors said that our beliefs are shaped by our life experiences. Two people can logically come
to two very different lifestyle choices based on how they were raised, taught and friends that impacted them. The
lecture was meant to apply to religious and moral beliefs, but I think it also applies to how we grow professionally.</p>
<p>I have a coworker that keeps asking me how I know so much about software engineering techniques. Part of the answer
is that I had excellent teachers. I went to a great college, but also in my internships I had highly skilled
engineers teach me how to write unit tests
and design maintainable code. But after school and internships, I was responsible to teach myself. I’ve read tech
magazines, programming books, blogs and answered stack overflow questions, but the best thing I ever did was
contribute to open source.</p>
<h2 id="learn-by-imitating-good-work">Learn By Imitating Good Work</h2>
<p>It’s like Pavlov’s dog. We all get conditioned; many of us get conditioned to commit <a href="/blog/2011/12/30/can-bad-code-ruin-your-career/">acts of code treason</a> by
surrounding ourselves with bad work. When great coders surround themselves with people who don’t care about
quality, they let their skills slip. The best way to get better at your job is to watch a job well done. It’s the
same idea behind mentorships. When you get a chance to see things done well, it’s easier to see how
you could also do excellent work.</p>
<p>I got started learning Behavior Driven Design first by perusing through the <a href="http://objectflow.codeplex.com/">objectflow</a> code. I later followed
up the learning by reading books & blogs about BDD to get a better understanding of the intent. I also humbly
learned why the service locator design pattern is actually an anti-pattern from working on <a href="http://moqcontrib.codeplex.com/">moq-contrib</a>. On
other projects I learned about safe deployment cycles, organizing people and support, responding professionally
to criticism, and much more.</p>
<p>Just to be clear, inventing your own open source project that no one ever uses doesn’t count. This argument only
applies if you’re working on a relatively mainstream project. Writing code in your spare time is great and all, but
if you’re trying to sharpen your skills I think it’s not the most efficient way to do so.</p>
<p>If you’re not someone who lives in a tech hub like New York City or Silicon Valley, it’s even easier to get stuck
in a job where seniority is valued over skill, and watch your motivation crumble. Sometimes it’s hard to find a job
where you can surround yourself with people smarter and more motivated than yourself. But with open source, you can
pick your project and choose who you work with. Furthermore, when choosing teams, open source has a far richer
pool of coworkers.</p>
<h2 id="it-grows-your-professional-network">It Grows Your Professional Network</h2>
<p>A lot of open source projects are driven by consultants and book authors. Normally you would have to pay them
thousands of dollars to teach you how to write good code. But if you’re contributing to one of their projects
they’ll be happy to give you free code reviews and show you a better way to do what you’ve always been doing. Most
people who maintain highly used projects have a large professional network, especially if they’re consultants or
speakers. By working closely with them on a project, you can often tap their professional contacts if
you ever need a job.</p>
<h2 id="it-makes-your-resume-shine">It Makes Your Resume Shine</h2>
<p>I haven’t heard of any employers who would look at a resume and scoff, “whoops another one of those open source
duds got through our recruiter again”. The fact is, most employers realize that working on open source projects
doubles your experience. You get experience during your work day, and then work with an entirely different
team outside of work, sometimes on totally different technologies. Even if they don’t understand that, they can
still see that you’re a self-starter, driven, and are probably intelligent.</p>
<p>Recently, people are actually beginning to use their open source work <em>as</em> their resume. What better way to vet a new
recruit than to see what they’re actually producing? You can see how they design code, structure tests,
observe their source control habits and how they interact with other people. On open source projects <em>everything</em>
is public.</p>
<h2 id="you-get-to-give-back">You Get To Give Back</h2>
<p>I’ve seen a number of open source projects that are used by thousands of people and developed by one. <a href="https://github.com/jaredpar/VsVim">VsVim</a>
is a great example. Jared Parsons has been working for years on the project in his spare time - many hours a week.
There are 10-20 regulars who report bugs and plead for new features. Sometimes they even get upset
when a VsVim upgrade breaks previous functionality. But very few people actually contribute pull requests back to
the project.</p>
<p>In order to stay relevant in our industry you’ll probably use 5-15 open source projects just to get a web
application published (probably similar numbers for other types of applications). You save hundreds of hours a year
by using open source software. Often, the open source alternatives are superior to the COTS products.
Hundreds of thousands of developers use open source software, but there are probably only a couple thousand who
actually give back. The .NET ecosystem is especially disproportionate.</p>
<h2 id="the-hard-part-is-knowing-where-to-start">The Hard Part Is Knowing Where To Start</h2>
<p>I know from talking to people that many developers want to contribute to open source projects. We’re a good-hearted
people - we all want to share and give back. But most don’t know where to start. They’ll make a resolution to go
home and read through some code over the weekend. But either it doesn’t happen or it’s so ungodly boring that they
never do it again. I really believe that most developers, if given a good place to start, would have little trouble
committing to a project for a significant period of time (years).</p>
<p>The problem is having an easy place to start and people to motivate you. The easiest way to get into a project is
to go through their issue tracker and find a bug that looks easy and fix it. Write tests, fix it, test it out and
send a pull request. It’ll seem hard at first, but the more times you practice the easier it’ll get.</p>
<h2 id="time-to-get-involved">Time To Get Involved</h2>
<p>If you’re a developer who uses open source libraries and other software but have never contributed back, now is
as good a time as any to look around. It’s easiest to start with a project you’re already familiar with.
Look through the issue tracker and find some easy issues. Try writing an email to the maintainers of a project.
Ask them for a good place to start and some pointers. Keep in mind that your pull request probably won’t get
accepted unless it’s high quality code complete with tests, so take your time.</p>
<p>Since I’m a .NET developer, I’ve run into several .NET projects that badly need help. I
<a href="/projects/open-source.html">put together a list of a few</a> moderately high profile projects that are high quality but need help. If you’re
not a .NET developer, there’s no end of projects that could use help. Just look at the software you use and think
about what you think is interesting. If you know of other .NET projects that are in need of help, <a href="/contact/">contact me</a>
so I can add them to the list also.</p>
<p>Contributing to open source grows your skill set, professional network and makes your resume shine. So
look out for yourself first - contribute to open source!</p>
Alternate Code Coverage Metrics2012-04-18T00:00:00+00:00http://timkellogg.me/blog/2012/04/18/code-coverage-metrics<p>Code coverage has been a controversial topic for a number of years. Just about everyone agrees that unit testing
is beneficial. The hardcore TDD folks push for 100% coverage, while everyone who’s trying to make money has realized
that the last 1-5% can be very expensive code to test. So the conundrum is knowing how much to test. How many tests
need to be written to get a high level of quality? I like a tweet from <a href="https://twitter.com/#!/jbogard">Jimmy Bogard</a></p>
<blockquote>
<p>In the “how much to test” argument, my line is when I <strong>know</strong> something works versus <strong>hope</strong> something works.
Hope is not a strategy.</p>
</blockquote>
<p>As a developer, I think this is a great strategy. But when it comes to managing a company, it’s very difficult to
know how much quality is degrading or improving over the past year when all you’re measuring with is the strength of
a hunch. I really do think code coverage metrics have their place. But tying real incentives to any kind of code metric is going to turn out to be a gigantic disaster.</p>
<p>The problem with code coverage is that, if you’re not going for 100%, you’re basically missing the point. Given a
method:</p>
<figure class="highlight"><pre><code class="language-csharp" data-lang="csharp"><span class="kt">bool</span> <span class="nf">IsValid</span><span class="p">(</span><span class="kt">string</span> <span class="n">fileName</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">try</span>
<span class="p">{</span>
<span class="kt">var</span> <span class="n">stream</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">FileStream</span><span class="p">(</span><span class="n">fileName</span><span class="p">,</span> <span class="n">FileMode</span><span class="p">.</span><span class="n">Open</span><span class="p">);</span>
<span class="k">using</span> <span class="p">(</span><span class="kt">var</span> <span class="n">reader</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">StreamReader</span><span class="p">(</span><span class="n">stream</span><span class="p">))</span>
<span class="p">{</span>
<span class="kt">var</span> <span class="n">text</span> <span class="p">=</span> <span class="n">reader</span><span class="p">.</span><span class="nf">ReadToEnd</span><span class="p">();</span>
<span class="kt">var</span> <span class="n">pattern</span> <span class="p">=</span> <span class="s">"<name>.*"</span><span class="p">;</span>
<span class="n">pattern</span> <span class="p">+=</span> <span class="n">text</span><span class="p">;</span>
<span class="n">pattern</span> <span class="p">+=</span> <span class="s">".*</name>"</span><span class="p">;</span>
<span class="kt">var</span> <span class="n">regex</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">Regex</span><span class="p">(</span><span class="n">pattern</span><span class="p">);</span>
<span class="k">return</span> <span class="p">!</span><span class="n">regex</span><span class="p">.</span><span class="nf">IsMatch</span><span class="p">(</span><span class="n">text</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">catch</span> <span class="p">(</span><span class="n">FileNotFoundException</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">return</span> <span class="k">false</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span></code></pre></figure>
<p>If you run a happy path test over this method, you get 89% coverage. Most people would consider this pretty decent
coverage for a whole project. However, you’re still missing very important tests, such as when the file isn’t found
or when the file either does or doesn’t match the regex. Until you write those tests, your original happy path test
isn’t worth much and really just provides a false sense of security.</p>
<p>Here, the hardcore TDD folks will point at the flaws in not insisting on 100% coverage. They’re right: if you
always followed the happy path and tested all your code like this, you’d have reasonably high test coverage
with almost no faith in your tests.</p>
<p>I think an improved metric would be <strong>percentage of classes with 100% coverage</strong>. This acknowledges that some classes
shouldn’t ever be tested, because they’re too costly to test. But it also keeps with the spirit of 100% test
coverage. Combining this with a full code coverage percentage would lead to a <em>more</em> truthful number about quality
of tests. There are obviously still some holes in this metric, but it’s a lot closer.</p>
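<p>For concreteness, here is a rough, tool-agnostic sketch of how that metric could be computed from per-class coverage figures (the class names and percentages below are invented):</p>

```python
def class_coverage_metric(per_class_coverage):
    """Percentage of classes whose line coverage is exactly 100%."""
    if not per_class_coverage:
        return 0.0
    fully_covered = sum(1 for pct in per_class_coverage.values() if pct == 100)
    return 100.0 * fully_covered / len(per_class_coverage)

# Invented per-class line-coverage percentages
coverage = {
    "FileValidator": 89,   # happy-path tests only
    "WordParser": 100,
    "DefinitionStore": 100,
    "RegexCache": 40,
}
print(class_coverage_metric(coverage))  # 50.0 -- 2 of 4 classes fully covered
```

<p>Reported next to the raw line-coverage number (roughly 82% if these four figures are averaged), the 50% figure makes it obvious that half the classes still have untested branches.</p>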
Why Object IDs & Primary Keys Are Implementation Details2012-03-24T00:00:00+00:00http://timkellogg.me/blog/2012/03/24/why-object-ids-primary-keys-are<p>Recently <a href="http://blog.timkellogg.me/2012/03/abstract-data-layer-part-1-object-id.html">I wrote a post</a> about a project that I was working on with an abstracted data layer concept that can work in the context of either relational or document data store. In retrospect I think I brushed too quickly over the details of why I think object identifiers (and primary keys) are a part of the implementation that should be hidden, when possible. To explain what I mean I’ll use a surreal-world story.</p>
<h2 id="the-situation">The Situation</h2>
<p>You are the chief software engineer at a software company. One day your product manager comes to you with a list of ideas for a new product where users can post definitions to slang words, like a dictionary. He says people are going to love this new app because everyone has a different idea of what words mean. After talking with him to establish ubiquitous language and identify nouns and verbs, you crank up some <a href="https://twitter.com/#!/search/%23codingmusic">coding music</a> and hack out some model classes.</p>
<figure class="highlight"><pre><code class="language-csharp" data-lang="csharp"><span class="k">public</span> <span class="k">class</span> <span class="nc">Word</span> <span class="p">{</span>
<span class="k">public</span> <span class="kt">int</span> <span class="n">Id</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>
<span class="k">public</span> <span class="kt">string</span> <span class="n">Name</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>
<span class="k">public</span> <span class="n">IList</span><span class="p"><</span><span class="n">Definition</span><span class="p">></span> <span class="n">Definitions</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">private</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>
<span class="p">}</span>
<span class="k">public</span> <span class="k">class</span> <span class="nc">Definition</span> <span class="p">{</span>
<span class="k">public</span> <span class="kt">int</span> <span class="n">Id</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>
<span class="k">public</span> <span class="kt">int</span> <span class="n">WordId</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>
<span class="k">public</span> <span class="kt">string</span> <span class="n">Text</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>
<span class="k">public</span> <span class="kt">string</span> <span class="n">Example</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>
<span class="p">}</span></code></pre></figure>
<p>A weekend later you finish coding the app using Int32s (<code class="language-plaintext highlighter-rouge">int</code>) as the identity data type for most of your models because it’s usually big enough and works well as a primary key. Honestly, you didn’t really think about it because it’s what you always do.</p>
<p>After the launch your app quickly gains popularity with the user base doubling every day. Not only that, but as more definitions get posted, more people are attracted to the site and post their own word definitions. While reviewing the exponential data growth figures, your DBA decides that <code class="language-plaintext highlighter-rouge">Definition.Id</code> should be changed to an Int64 (<code class="language-plaintext highlighter-rouge">long</code>) to accommodate the rapidly multiplying postings.</p>
<p>Let’s stop for a minute and review what the <em>business needs</em> were. Your product manager wants an app where people can post words and definitions. Each word has many definitions. There’s no talk in the business domain of tables and primary keys. But you included those concepts in the model anyway, because that’s how you think about your data.</p>
<p>The DBA chose to make the ID into a larger number to accommodate a larger amount of data. So now to help optimize the database, you are forced to update all your <em>business logic</em> to work nicely with the <em>data logic</em>.</p>
<h2 id="data-logic-was-meant-to-live-in-the-database">Data Logic Was Meant to Live in the Database</h2>
<p>The trouble with tying data logic closely to business logic is that the database isn’t part of your business plan. As your application grows you’ll have to tweak your database to squeeze out performance - or even swap it out for <a href="http://cassandra.apache.org/">Cassandra</a>. Databases are good at data logic because they are declarative. You can usually tune performance without affecting how the data is worked with. When you place an index, it doesn’t affect how you write a SELECT or UPDATE statement, just how fast it runs.</p>
<p>At the same time, databases are also very procedural things. When you put business logic in stored procedures you lose the benefits of object oriented programming. It also makes unit tests complicated, slow, and fragile (which is why most people don’t unit test the database). In the end, it’s best to let your database optimize how data is stored and retrieved and keep your domain models clean and focused on the business needs.</p>
<h2 id="the-type-of-the-object-id-is-an-implementation-detail">The Type of the Object ID Is an Implementation Detail</h2>
<p>Let’s say you hire a new COO who lives in Silicon Valley and thinks the latest coolest technology is always the gateway to success. With the new growth he decides that you should rewrite the dictionary application to use <a href="http://www.mongodb.org/display/DOCS/Introduction">MongoDB</a> because it’s the only way your application can scale to meet the needs of the business. While evaluating Mongo you draw out what an example word and definitions might look like when stored as <a href="http://bsonspec.org/">BSON</a>:</p>
<figure class="highlight"><pre><code class="language-js" data-lang="js"><span class="p">{</span>
<span class="dl">"</span><span class="s2">_id</span><span class="dl">"</span><span class="p">:</span> <span class="dl">"</span><span class="s2">09823bcf7de88c</span><span class="dl">"</span><span class="p">,</span>
<span class="dl">"</span><span class="s2">name</span><span class="dl">"</span><span class="p">:</span> <span class="dl">"</span><span class="s2">LOL</span><span class="dl">"</span><span class="p">,</span>
<span class="dl">"</span><span class="s2">definitions</span><span class="dl">"</span><span class="p">:</span> <span class="p">[</span>
<span class="p">{</span>
<span class="dl">"</span><span class="s2">text</span><span class="dl">"</span><span class="p">:</span> <span class="dl">"</span><span class="s2">Laugh Out Loud</span><span class="dl">"</span><span class="p">,</span>
<span class="dl">"</span><span class="s2">example</span><span class="dl">"</span><span class="p">:</span> <span class="dl">"</span><span class="s2">I can't wait for the wedding. LOL</span><span class="dl">"</span>
<span class="p">},</span>
<span class="p">{</span>
<span class="dl">"</span><span class="s2">text</span><span class="dl">"</span><span class="p">:</span> <span class="dl">"</span><span class="s2">Lots Of Love</span><span class="dl">"</span><span class="p">,</span>
<span class="dl">"</span><span class="s2">example</span><span class="dl">"</span><span class="p">:</span> <span class="dl">"</span><span class="s2">I don't have the heart to let my mom know that LOL doesn't actually mean Lots Of Love</span><span class="dl">"</span>
<span class="p">}</span>
<span class="p">]</span>
<span class="p">}</span></code></pre></figure>
<p>In Mongo, <a href="http://www.mongodb.org/display/DOCS/Schema+Design#SchemaDesign-EmbeddingandLinking">you usually would store the Definitions inline with the Word</a>. Now there is no need for a Definition.Id or Definition.WordId because all of this is implicit. Not only that, but Word.Id is now an <a href="http://www.mongodb.org/display/DOCS/Object+IDs">ObjectId</a> - a very different 12-byte number that includes time and sequence components. In order to update your application to work with Mongo, you’ll have to update all ID references to use these ObjectIds.</p>
<p>The ID is an implementation concern. In a centralized SQL database, sequential integers make sense. In a distributed environment like Mongo, ObjectIDs offer more advantages. Either way, the type of your ID is an implementation detail.</p>
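<p>To make the difference concrete: per the MongoDB docs, the first 4 bytes of a 12-byte ObjectId are a big-endian Unix timestamp, which is what gives it the time component mentioned above. A minimal sketch that recovers the creation time from an ObjectId’s 24-character hex form (the example ID below is made up):</p>

```python
import datetime

def objectid_timestamp(oid_hex):
    """Extract the creation time embedded in a 12-byte ObjectId,
    given as a 24-character hex string."""
    if len(oid_hex) != 24:
        raise ValueError("ObjectId hex string must be 24 characters")
    seconds = int(oid_hex[:8], 16)  # first 4 bytes: big-endian Unix timestamp
    return datetime.datetime.fromtimestamp(seconds, tz=datetime.timezone.utc)

# Made-up ObjectId; prints a UTC datetime in March 2012
print(objectid_timestamp("4f6d7a2b0000000000000001"))
```

<p>A plain auto-increment integer carries no such structure - which is exactly why the two ID types aren’t interchangeable, and why the type belongs behind the object’s interface.</p>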
<h2 id="encapsulation-requires-that-you-hide-implementation-details">Encapsulation Requires That You Hide Implementation Details</h2>
<p>Most OO programmers understand that encapsulation means that an object <em>has</em> or <em>contains</em> another object. However, some forget that a <a href="http://en.wikipedia.org/wiki/Encapsulation_(object-oriented_programming)">large part of encapsulation</a> is that you should keep the <a href="http://stackoverflow.com/a/1777728/503826">implementation details</a> of an object hidden from other objects. When the details of an object leak into other objects, the contract is broken and you <a href="http://www.joelonsoftware.com/articles/LeakyAbstractions.html">lose the benefits of the OO abstraction</a>.</p>
<p>Any ORM tool should give you the ability to select protected (if not private) members of the object to be persisted. If it doesn’t, it’s not worth using, because it forces too great a compromise in design. This is how we should have been allowed to write our objects from the start:</p>
<figure class="highlight"><pre><code class="language-csharp" data-lang="csharp"><span class="k">public</span> <span class="k">class</span> <span class="nc">Word</span> <span class="p">{</span>
<span class="k">private</span> <span class="kt">object</span> <span class="n">Id</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>
<span class="k">public</span> <span class="kt">string</span> <span class="n">Name</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>
<span class="k">public</span> <span class="n">IList</span><span class="p"><</span><span class="n">Definition</span><span class="p">></span> <span class="n">Definitions</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">private</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>
<span class="k">public</span> <span class="k">void</span> <span class="nf">Add</span><span class="p">(</span><span class="n">Definition</span> <span class="n">definition</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">definition</span> <span class="p">==</span> <span class="k">null</span><span class="p">)</span> <span class="k">throw</span> <span class="k">new</span> <span class="nf">ArgumentNullException</span><span class="p">();</span>
<span class="n">Definitions</span><span class="p">.</span><span class="nf">Add</span><span class="p">(</span><span class="n">definition</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">public</span> <span class="k">class</span> <span class="nc">Definition</span> <span class="p">{</span>
<span class="k">public</span> <span class="nf">Definition</span><span class="p">(</span><span class="kt">string</span> <span class="n">text</span><span class="p">,</span> <span class="kt">string</span> <span class="n">example</span><span class="p">)</span> <span class="p">{</span>
<span class="n">Text</span> <span class="p">=</span> <span class="n">text</span><span class="p">;</span>
<span class="n">Example</span> <span class="p">=</span> <span class="n">example</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">private</span> <span class="kt">object</span> <span class="n">Id</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>
<span class="k">public</span> <span class="kt">string</span> <span class="n">Text</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">private</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>
<span class="k">public</span> <span class="kt">string</span> <span class="n">Example</span> <span class="p">{</span> <span class="k">get</span><span class="p">;</span> <span class="k">private</span> <span class="k">set</span><span class="p">;</span> <span class="p">}</span>
<span class="p">}</span></code></pre></figure>
<h2 id="but-dynamic-languages-diffuse-the-problem">But Dynamic Languages Diffuse The Problem</h2>
<p>If you’re in a dynamic language like Ruby or Node.js this is less of an issue. Most of my argument hinges on the idea that your API will latch onto the object’s ID and insist that all methods that use it will match. This is really just a constraint of strict statically typed languages. Even implicit typing will mitigate the issue some.</p>
<p>You can notice above that I got around the constraint by using <code class="language-plaintext highlighter-rouge">object</code> as the ID type. This is really what you want. It’s telling the compiler and API that you really shouldn’t care what the type is - it’s an implementation detail. You shouldn’t run into many problems as long as you are keeping the ID properly encapsulated within the object.</p>
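<p>To illustrate that last point with a toy sketch (the class and ID values here are invented): in a dynamic language the ID’s concrete type is simply never spelled out, so swapping an integer key for an ObjectId-style string touches nothing:</p>

```python
class Word:
    """Toy model whose ID type is an implementation detail: nothing
    below names the ID's concrete type, so it can be an int (SQL
    identity) or a hex string (Mongo ObjectId) without any changes."""

    def __init__(self, id_, name):
        self._id = id_  # kept private; callers never see the type
        self.name = name

    def same_record(self, other):
        return self._id == other._id

sql_word = Word(42, "LOL")                            # integer primary key
mongo_word = Word("4f6d7a2b0000000000000001", "LOL")  # ObjectId-style key
print(sql_word.same_record(Word(42, "LOL")))  # True
```

<p>The statically typed equivalent needs <code class="language-plaintext highlighter-rouge">object</code> (or generics) to get the same flexibility, which is the constraint discussed above.</p>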
Abstract Data Layer Part 1: Object ID Types And Conventions2012-03-19T00:00:00+00:00http://timkellogg.me/blog/2012/03/19/abstract-data-layer-part-1-object-id<div class='post'>
In February I went to the MongoDB conference in Boulder. That day was my first real taste of any sort of document oriented database. Since then I've played around with Mongo in C#, Node.JS and natively in the Mongo shell. Ever since, I also can't help feeling overwhelmingly happy when thinking about how I can use Mongo for a project.<br /><br />At Alteryx we're entering a project with some specific business needs. We require an extremely fast and scalable database, hence Mongo. But we also need to package our product for on-premise installations, which I hear requires that we also support certain SQL databases.<br /><br /><i>...I don't actually understand why enterprises insist on using SQL. I'm told that enterprise DBAs want control over everything, and they don't want to learn new products like MongoDB. To me, it seems that 3rd-party products that are bought would be exempt from DBA optimizations & other meddling. But I guess I wouldn't know what it takes to be an enterprise DBA, so I'll shut up about this now. Just my thoughts...</i><br /><div><br />Since relational databases are a lot different than document oriented databases I decided to use NHibernate as an ORM since they've already figured out a lot of the hard problems. I chose NHibernate over Entity Framework mainly because I already know NHibernate, and I know that it has good support across many databases. Nothing against EF in particular.<br /><br />I've been working on this for a week or so. I've gotten pretty deep into the details so I thought a blog post would be a good way to step out and think about what I've done and where I'm going. The design is mostly mine (of course, I stand on the backs of giants) and really just ties together robust frameworks.<br /><br /><span style="font-size: x-large;">Convention Based Object Model</span><br /><br />In order to remain agnostic toward relational/document structure, I decided that there would have to be some basic assumptions or maxims. 
I like the idea of convention-based frameworks and I really think it's the best way to go about building this kind of infrastructure. Also, conventions are a great way to enforce assumptions and keep things simple.<br /><br /><span style="font-size: large;">IDs Are Platform Dependent</span><br /><br />It's not something I really thought about before this. In relational databases we'll often use an integer as the object ID. They're nice because they're small, simple, and sequential. However, Mongo assumes that you want to be extremely distributed. Dense sequential IDs (like int identity) run into all kinds of race conditions and collisions in distributed environments (unless you choose a master ID-assigner, which kind of ruins the point of being distributed).<br /><br />MongoDB uses <a href="http://www.mongodb.org/display/DOCS/Object+IDs" target="_blank">a very long (12 byte) semi-sequential number</a>. It's semi-sequential in that every new ID is a bigger number than the IDs generated before it, but not necessarily just +1. Regardless, it's impractical to use regular integers in Mongo and also a little impractical to use long semi-sequential numbers in SQL.<br /><br />As a result, I chose to use <span style="font-family: 'Courier New', Courier, monospace;">System.Object</span> as the ID type for all identifiers. NHibernate can be configured to use objects as integers with native auto-increment after some tweaking. The Mongo C# driver also supports object IDs with client-side assignment.<br /><br />Ideally, I would like to write some sort of <span style="font-family: 'Courier New', Courier, monospace;">IdType</span> struct that contains an enumeration and object value (I'm thinking along the lines of a discriminated union here). This would help make IDs more distinctive and easier to attach extension methods or additional APIs. 
I'd also like to make IDs protected by default (instead of public).<br /><br /><span style="font-size: large;">The Domain Object</span><br /><br />I also created a root object for all persistent objects to derive from. This is a fairly common pattern, especially in frameworks where there is a lot of generic or meta-programming.<br /><br /><script src="https://gist.github.com/2130909.js?file=DomainObject-simple.cs"></script><br />I had <span style="font-family: 'Courier New', Courier, monospace;">DomainObject</span> implement an <span style="font-family: 'Courier New', Courier, monospace;">IDomainObject</span> interface so that in all my meta-programming I can refer to <span style="font-family: 'Courier New', Courier, monospace;">IDomainObject</span>. That way there shouldn't ever be a corner case where we can't or shouldn't descend from <span style="font-family: 'Courier New', Courier, monospace;">DomainObject</span> but have to anyway (separate implementation from interface).<br /><br /><script src="https://gist.github.com/2130909.js?file=User-Name.cs"></script><br />The User and Name objects are simple, as you can expect any NHibernate object model to look like. The idea is to keep them simple and keep business and data logic elsewhere.<br /><br /><span style="font-size: large;">Are You Interested?</span><br /><br />From what I can tell, I think we're breaking ground on this project. It doesn't seem like too many people have tried to make a framework to support both relational and document data stores. Initially I was hesitant to support both relational and document stores. But I think there are some excellent side effects that I will outline in upcoming posts.<br /><br />The content I've written about so far is only a small fraction of what it took to get this on its feet. Someone once said that <a href="http://tom.preston-werner.com/2011/11/22/open-source-everything.html" target="_blank">you should open source (almost) everything</a>. 
So, if you (or anyone you know) would like to see the full uncensored code for this, let me know so I can start corporate conversations in that direction. </div></div>
<h2>Comments</h2>
<div class='comments'>
<div class='comment'>
<div class='author'>Tim Kellogg</div>
<div class='content'>
I want the Id to be protected because it is an implementation detail that shouldn't be exposed outside the object. Like I was saying earlier, the type of the Id is dependent on which database you choose, and the fact that there even is an Id is also an implementation detail. For instance, Mongo doesn't require IDs for sub-documents. <br /><br />Also, if at a later point you decide to refactor a sub-document into its own top-level document collection in Mongo, you have to add IDs to the new documents. I would consider this type of refactoring to usually be a performance tuning task (similar to creating indexes). So naturally it's a concern of the data layer, not the model or business logic.<br /><br />The trouble with actually making it protected is that so many frameworks expect the ID to be exposed. Probably because relational databases always expect you to have an ID, so many MVCs are designed with that maxim. We're using WCF, so we might actually be able to get away from that concept.</div>
</div>
<div class='comment'>
<div class='author'>Tim Wilson</div>
<div class='content'>
Tim, can you further explain why you would like to make your Id protected? What might make sense for you is to setup your Id to have a private backing field where it is only initialized in the constructor. This way whenever you initialize a User you are forced to also provide an Id. Once you have the private backing field, the NHibernate mappings can be setup to be Access Field which will let it know to map to the private backing field. Let me know if that makes sense or if that helps you out any.</div>
</div>
</div>
Discriminated Unions in C# Mono Compiler2012-03-10T00:00:00+00:00http://timkellogg.me/blog/2012/03/10/discriminated-unions-in-c-mono-compiler<p>Recently I’ve been using F# a bit. F# is .NET’s functional language (the syntax of F# 1.0 was backward compatible with OCaml, but 2.0 has diverged enough to make it more distinct). Learning F# was a huge mind-shift from the C-family of languages. Of all the features of F#, like implicit typing, tail recursion, and monads, many people list discriminated unions as their favorite.</p>
<p>Discriminated unions feel like C# enums on the surface. For instance, a union that can represent states of a light switch:</p>
<figure class="highlight"><pre><code class="language-ocaml" data-lang="ocaml"><span class="k">type</span> <span class="nc">LightSwitch</span> <span class="o">=</span>
<span class="o">|</span> <span class="nc">On</span>
<span class="o">|</span> <span class="nc">Off</span>
<span class="o">//</span> <span class="nc">And</span> <span class="k">to</span> <span class="n">use</span> <span class="n">it</span><span class="o">,</span> <span class="n">we</span> <span class="n">use</span> <span class="n">pattern</span> <span class="n">matching</span><span class="o">:</span>
<span class="k">let</span> <span class="n">lightSwitch</span> <span class="o">=</span> <span class="n">getLightSwitchState</span><span class="bp">()</span>
<span class="k">match</span> <span class="n">lightSwitch</span> <span class="k">with</span>
<span class="o">|</span> <span class="nc">On</span> <span class="o">-></span>
<span class="n">turnOnLight</span><span class="bp">()</span>
<span class="o">|</span> <span class="nc">Off</span> <span class="o">-></span>
<span class="n">turnOffLight</span><span class="bp">()</span></code></pre></figure>
<p>This example is really no different from C# enums. Discriminated unions, however, can hold data. For instance, consider when our light switch needs to also be a dimmer:</p>
<figure class="highlight"><pre><code class="language-ocaml" data-lang="ocaml"><span class="k">type</span> <span class="nc">LightSwitch</span> <span class="o">=</span>
<span class="o">|</span> <span class="nc">On</span>
<span class="o">|</span> <span class="nc">Dimmed</span> <span class="k">of</span> <span class="kt">int</span>
<span class="o">|</span> <span class="nc">Off</span>
<span class="o">//</span> <span class="nc">And</span> <span class="k">to</span> <span class="n">use</span> <span class="n">it</span><span class="o">,</span> <span class="n">we</span> <span class="n">use</span> <span class="n">pattern</span> <span class="n">matching</span><span class="o">:</span>
<span class="k">let</span> <span class="n">lightSwitch</span> <span class="o">=</span> <span class="n">getLightSwitchState</span><span class="bp">()</span>
<span class="k">match</span> <span class="n">lightSwitch</span> <span class="k">with</span>
<span class="o">|</span> <span class="nc">On</span> <span class="o">-></span>
<span class="n">turnOnLight</span><span class="bp">()</span>
<span class="o">|</span> <span class="nc">Dimmed</span> <span class="n">intensity</span> <span class="o">-></span> <span class="n">dimLightToIntensity</span> <span class="n">intensity</span>
<span class="o">|</span> <span class="nc">Off</span> <span class="o">-></span>
<span class="n">turnOffLight</span><span class="bp">()</span></code></pre></figure>
<p>In C# we would have had to rewrite this whole program to handle the new dimmer requirement. Instead, we can just tack on a new state that holds data.</p>
<p>When you’re deep in the F# mindset, this structure makes perfect sense. But try implementing a discriminated union in C#. There’s the enum-like part, but there’s also the part that holds different sizes of data. There’s <a href="http://stackoverflow.com/a/2321922/503826">a great stackoverflow answer</a> that explains how the F# compiler handles discriminated unions internally. It requires 1 enum, 1 abstract class and <em>n</em> concrete implementations of the abstract class. It’s quite over-complicated to use in every-day C#.</p>
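To make that concrete, here is roughly the encoding the stackoverflow answer describes: one enum tag, one abstract base class, and one concrete subclass per data-carrying case. The names here are illustrative, not taken from the F# compiler's actual output.

```csharp
// Hand-rolled C# encoding of the LightSwitch union: tag enum + abstract base + n cases.
public enum LightSwitchTag { On, Dimmed, Off }

public abstract class LightSwitch
{
    public abstract LightSwitchTag Tag { get; }

    // Data-less cases can be singletons; Dimmed needs a fresh instance per value.
    public static readonly LightSwitch On = new OnCase();
    public static readonly LightSwitch Off = new OffCase();
    public static LightSwitch NewDimmed(int intensity) { return new Dimmed(intensity); }

    private sealed class OnCase : LightSwitch
    {
        public override LightSwitchTag Tag { get { return LightSwitchTag.On; } }
    }

    private sealed class OffCase : LightSwitch
    {
        public override LightSwitchTag Tag { get { return LightSwitchTag.Off; } }
    }

    public sealed class Dimmed : LightSwitch
    {
        private readonly int intensity;
        public Dimmed(int intensity) { this.intensity = intensity; }
        public int Intensity { get { return intensity; } }
        public override LightSwitchTag Tag { get { return LightSwitchTag.Dimmed; } }
    }
}

// Consuming it is where the ceremony shows: switch on the tag, then downcast.
public static class LightSwitchDemo
{
    public static string Describe(LightSwitch s)
    {
        switch (s.Tag)
        {
            case LightSwitchTag.On: return "on";
            case LightSwitchTag.Dimmed:
                return "dimmed to " + ((LightSwitch.Dimmed)s).Intensity;
            default: return "off";
        }
    }
}
```

Three types plus a cast at every use site, versus three lines of F# — which is exactly why it's over-complicated for every-day C#.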
<p>Nevertheless, I really want to use discriminated unions in my C# code because of how easy they make state machines & workflows. I’ve been brainstorming how to do this. There are several implementations as C# 3.5 libraries, but they’re cumbersome to use. I’ve been looking at the source code for the mono C# compiler, and I think I want to go the route of forking the compiler for a proof-of-concept.</p>
<p>I’m debating what the syntax should be. I figure that the change would be easier if I re-used existing constructs and just tweaked them to work with the new concepts.</p>
<figure class="highlight"><pre><code class="language-csharp" data-lang="csharp"><span class="k">public</span> <span class="k">enum</span> <span class="n">LightSwitch</span>
<span class="p">{</span>
<span class="n">On</span><span class="p">,</span>
<span class="nf">Dimmed</span><span class="p">(</span><span class="kt">int</span> <span class="n">intensity</span><span class="p">),</span>
<span class="n">Off</span>
<span class="p">}</span>
<span class="c1">// And to use</span>
<span class="kt">var</span> <span class="k">value</span> <span class="p">=</span> <span class="nf">GetLightSwitchValue</span><span class="p">();</span>
<span class="k">switch</span><span class="p">(</span><span class="k">value</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">case</span> <span class="n">On</span><span class="p">:</span>
<span class="nf">TurnOnLight</span><span class="p">();</span>
<span class="k">break</span><span class="p">;</span>
<span class="k">case</span> <span class="nf">Dimmed</span><span class="p">(</span><span class="n">intensity</span><span class="p">):</span>
<span class="nf">DimLightToIntensity</span><span class="p">(</span><span class="n">intensity</span><span class="p">);</span>
<span class="k">break</span><span class="p">;</span>
<span class="k">case</span> <span class="n">Off</span><span class="p">:</span>
<span class="nf">TurnOffLight</span><span class="p">();</span>
<span class="k">break</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>I’ve been debating if the Dimmed case should retain the regular case syntax or get a lambda-like syntax:</p>
<figure class="highlight"><pre><code class="language-csharp" data-lang="csharp"><span class="kt">var</span> <span class="k">value</span> <span class="p">=</span> <span class="nf">GetLightSwitchValue</span><span class="p">();</span>
<span class="k">switch</span><span class="p">(</span><span class="k">value</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">case</span> <span class="n">On</span><span class="p">:</span>
<span class="nf">TurnOnLight</span><span class="p">();</span>
<span class="k">break</span><span class="p">;</span>
<span class="k">case</span> <span class="nf">Dimmed</span><span class="p">(</span><span class="n">intensity</span><span class="p">)</span> <span class="p">=></span>
<span class="p">{</span>
<span class="nf">DimLightToIntensity</span><span class="p">(</span><span class="n">intensity</span><span class="p">)</span>
<span class="p">}</span>
<span class="k">case</span> <span class="n">Off</span><span class="p">:</span>
<span class="nf">TurnOffLight</span><span class="p">();</span>
<span class="k">break</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>I’m leaning toward the lambda syntax due to how C# usually handles variable scope. I’ve barely just cloned the mono repository and started reading the design documents to orient myself with the compiler. This could be a huge project, so I’m not sure how far I’ll actually get. But this is a very interesting idea that I want to try hashing out.</p>
One Thing I Learned From F# (Nulls Are Bad)2012-02-29T00:00:00+00:00http://timkellogg.me/blog/2012/02/29/one-thing-i-learned-from-f-nulls-are<div class='post'>
Recently I started contributing to <a href="https://github.com/jaredpar/VsVim" target="_blank">VsVim</a>, a Visual Studio plugin that emulates Vim. When starting the project, Jared Parsons decided to write the bulk of it in F#. He did this mostly as a chance to learn a new language but also because it's a solid first class alternative to C#. For instance, F#'s features like pattern matching and discriminated unions are a natural fit for state machines like Vim.<br /><br />This is my first experience with a truly functional language. For those who aren't familiar with F#, it's essentially OCaml.NET (the <a href="http://en.wikibooks.org/wiki/F_Sharp_Programming" target="_blank">F# book</a> uses OCaml for its markup syntax), but also draws roots from Haskell. It's a big mind shift from imperative and pure object oriented languages, but one I'd definitely recommend to any developer who wants to be better.<br /><br />Since I've been working on VsVim, I've been using F# in my spare time but C# in my regular day job. The longer I use F# the more I want C# to do what F# does. The biggest example is how F# handles nulls.<br /><br />In C# (and Ruby, Python, and any imperative language) most values can be null, and null is a natural state for a variable to be in. In fact (partly due to SQL), null is used whenever a value is empty or doesn't exist yet. In C# and Java, null is the default value for any member reference; you don't even need to explicitly initialize it. As a result, you often end up with a lot of null pointer exceptions due to sloppy programming. After all, it's kind of hard to remember to check for null every time you use a variable.<br /><br />In F#, nothing is null (that's not entirely true, but in its natural state it's true enough). Typically you'll use options instead of null. For instance, if you have a function that fails to find or calculate something you might return null in imperative languages (and the actual value if successful). 
However, in F# you use an option type and return None on failure and Some value on success.<br /><br /><script src="https://gist.github.com/1941345.js"> </script><br />Here, every time you call find(kittens) you get back an option type. This type isn't a string, so you can't just start using string methods and get a null pointer exception. Instead, you have to extract the string value from the option type before it can be used.<br /><br />At this point you might be thinking, "why would I want to do that? It looks like a lot of extra code". However, I challenge you to find a crashing bug in VsVim. Every time we have an instance of an invalid state we are forced to deal with it on the spot. Every invalid state is dealt with in a way that makes sense.<br /><br />If we wrote it in C# it would be incredibly easy to get lazy while working late at night and forget to check for null and cause the plugin to crash. Instead, the only bugs we have are behavior quirks. If we ever have a crashing bug, the chances are the null value originated in C# code from Visual Studio or the .NET Framework and we forgot to check.<br /><br /><i><a href="http://news.ycombinator.com/item?id=3648104" target="_blank">Discussion on HN</a></i></div>
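The gist shows the F# version; the same idea can be transliterated into C# to see what it buys you. This is an illustrative sketch only — VsVim uses F#'s built-in option type, and the Option, Some/None, and Find names below are my own, not code from the project.

```csharp
using System;

// A bare-bones option type in C#: callers must handle both cases up front,
// so there is no way to "forget the null check" the way a plain reference allows.
public abstract class Option<T>
{
    public sealed class Some : Option<T>
    {
        private readonly T value;
        public Some(T value) { this.value = value; }
        public T Value { get { return value; } }
    }

    public sealed class None : Option<T> { }

    // The only way to get at the value: say what to do for Some AND for None.
    public TResult Match<TResult>(Func<T, TResult> some, Func<TResult> none)
    {
        var s = this as Some;
        return s != null ? some(s.Value) : none();
    }
}

public static class Shelter
{
    // Returns None instead of null when no kitten matches.
    public static Option<string> Find(string[] kittens, string name)
    {
        foreach (var k in kittens)
            if (k == name) return new Option<string>.Some(k);
        return new Option<string>.None();
    }
}
```

A call like Shelter.Find(kittens, "Tom").Match(n => n.Length, () => 0) can't throw a NullReferenceException — the failure path is spelled out at the call site, which is the whole point.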
<h2>Comments</h2>
<div class='comments'>
<div class='comment'>
<div class='author'>Tim Kellogg</div>
<div class='content'>
Actually, F# has a really cool syntax for function chaining. You could write:<br /><br />try(try(try(find(9, kitten), "name"), "length"), ">=", 3)<br /><br />or you could do it the F# way:<br /><br />kitten |> find 9 |> try "name" |> try "length" |> try ">=" 3<br /><br />Like all functional languages, you write everything as pure functions instead of methods. But that's a discussion for another time. Ruby borrows from functional languages like Haskell, but it could really benefit from options & discriminated unions</div>
</div>
<div class='comment'>
<div class='author'>Luke</div>
<div class='content'>
I don't know the first thing about F# so maybe this is a moot point but... It seems like that could make method chaining really tough.<br /><br />Like, I love how rails (can) handle(s) nil checking with try()<br />e.g. if (Kitten.find(9).try(:name).try(:length).try(:>=, 3)) { huzzah }<br /><br />Sorry, ruby and I are still in our honeymoon phase.</div>
</div>
</div>
C# Reflection Performance And Ruby2012-02-10T00:00:00+00:00http://timkellogg.me/blog/2012/02/10/c-reflection-performance-and-ruby<div class='post'>
I've always known that reflection method invocations in C# are slower than regular invocations, but I've never known to what extent. So I set out to make an experiment to demonstrate the performance of several ways to invoke a method. Frameworks like <a href="http://nhforge.org/" target="_blank">NHibernate</a> or the <a href="http://www.mongodb.org/display/DOCS/CSharp+Language+Center" target="_blank">mongoDB driver</a> are known to serialize and deserialize objects. In order to do either of these activities they have to scan the properties of an object and dynamically invoke them to get or set the values. Normally this is done via reflection. However, I want to know whether <a href="http://en.wikipedia.org/wiki/Memoization" target="_blank">memoizing</a> a method call as an expression tree or delegate could offer significant performance benefits. On the side, I also want to see how C# reflection compares to Ruby method invocations.<br /><br />I posted the full source to <a href="https://github.com/tkellogg/ReflectionPropertyInvokeBenchmark">a public github repo</a>. To quickly summarize, I wrote code that sets a property on an object 100 million times in a loop. Any setup (like finding a <span style="font-family: 'Courier New', Courier, monospace;">PropertyInfo</span> or <span style="font-family: 'Courier New', Courier, monospace;">MethodInfo</span>) is not included in the timings. I also checked the generated IL to make sure the compiler wasn't optimizing the loops. 
Please browse the code there if you need the gritty details.<br /><br />Before I get into the implementation details, here are the results:<br /><br /><iframe frameborder="no" height="300px" scrolling="no" src="http://www.google.com/fusiontables/embedviz?&containerId=gviz_canvas&q=select+col0%2C+col1+from+2840399+&qrs=where+col0+%3E%3D+&qre=+and+col0+%3C%3D+&qe=+limit+6&viz=GVIZ&t=BAR&width=500&height=300" width="500px"></iframe><br /><br />You can see that a reflection invoke is on the order of a hundred times slower than a normal property (set) invocation.<br /><br />Here's the same chart but without the reflection invocation. It does a better job of showing the scale between the other tests.<br /><br /><iframe frameborder="no" height="300px" scrolling="no" src="http://www.google.com/fusiontables/embedviz?&containerId=gviz_canvas&q=select+col0%2C+col1+from+2840399+where+col1+%3C+'25000'&qrs=+and+col0+%3E%3D+&qre=+and+col0+%3C%3D+&qe=+limit+5&viz=GVIZ&t=BAR&width=500&height=300" width="500px"></iframe><br /><br />Obviously, the lesson here is to directly invoke methods and properties when possible. However, there are times when you don't know what a type looks like at compile time. Again, object serialization/deserialization would be one of those use cases.<br /><br />Here's an explanation of each of the tests:<br /><br /><span style="font-size: large;">Reflection Invoke</span> (<a href="https://github.com/tkellogg/ReflectionPropertyInvokeBenchmark/blob/363b1242a0210c9d7deb4db2571134333476e96b/ReflectionBenchmark/Program.cs#L59" target="_blank">link</a>)<br /><br />This is essentially <span style="font-family: 'Courier New', Courier, monospace;">methodInfo.Invoke(obj, new[]{ value })</span> on the setter method of the property. It is by far the slowest approach to the problem. 
It's also the most common way to solve the problem of insufficient pre-compile time knowledge.<br /><br /><span style="font-size: large;">Direct Invoke</span> (<a href="https://github.com/tkellogg/ReflectionPropertyInvokeBenchmark/blob/363b1242a0210c9d7deb4db2571134333476e96b/ReflectionBenchmark/Program.cs#L47" target="_blank">link</a>)<br /><br />This is nothing other than <span style="font-family: 'Courier New', Courier, monospace;">obj.Property = value</span>. It's as fast as it gets, but impractical for use cases where you don't have pre-compile time knowledge of the type.<br /><br /><span style="font-size: large;">Closure</span> (<a href="https://github.com/tkellogg/ReflectionPropertyInvokeBenchmark/blob/363b1242a0210c9d7deb4db2571134333476e96b/ReflectionBenchmark/Program.cs#L92" target="_blank">link</a>)<br /><br />This isn't much more flexible than a direct invoke, but I thought it would be interesting to see how the performance degraded. This is where you create a function/closure (<span style="font-family: 'Courier New', Courier, monospace;">Action&lt;ExampleType, string&gt; action = (x,y) => x.Property = y</span>) prior to the loop and just invoke the function inside the loop (<span style="font-family: 'Courier New', Courier, monospace;">action(obj, value)</span>). At first sight it appears to be half as fast as a direct invoke, but there are actually two method calls involved here, so it's actually not any slower than a direct invoke.<br /><br /><span style="font-size: large;">Dynamic Dispatch</span> (<a href="https://github.com/tkellogg/ReflectionPropertyInvokeBenchmark/blob/363b1242a0210c9d7deb4db2571134333476e96b/ReflectionBenchmark/Program.cs#L78" target="_blank">link</a>)<br /><br />This uses the C# 4.0 dynamic feature directly. To do this, I declared the variable as dynamic and assigned it using the same syntax as a direct invoke. Interestingly, this performs only 6x slower than direct invoke and about 20x faster than reflection invoke. 
Take note, if you need reflection, use dynamic as often as possible since it can really speed up method invocation.<br /><br /><span style="font-size: large;">Expression Tree</span> (<a href="https://github.com/tkellogg/ReflectionPropertyInvokeBenchmark/blob/363b1242a0210c9d7deb4db2571134333476e96b/ReflectionBenchmark/Program.cs#L110" target="_blank">link</a>)<br /><br />The shortcoming of most of the previous approaches is that they require pre-compile time knowledge of the type. This time I tried building an expression tree (a C# 3.0 feature) and compiled a delegate that invokes the setter. This makes it flexible enough that you can call any property of an object without compile-time knowledge of the name, as long as you know the return type. In this example, like the closure, we're indirectly setting the property, so two method calls. With this in mind, it took almost 2.5 times as long as the closure example, even though they should be functionally equivalent operations. It must be that expression trees compiled to delegates aren't actually as simple as they appear.<br /><br /><span style="font-size: large;">Expression Tree with Dynamic Dispatch</span> (<a href="https://github.com/tkellogg/ReflectionPropertyInvokeBenchmark/blob/363b1242a0210c9d7deb4db2571134333476e96b/ReflectionBenchmark/Program.cs#L141" target="_blank">link</a>)<br /><br />Since the expression tree approach requires compile-time knowledge of the return type, it isn't as flexible. Ideally you could use C# 4.0's covariance feature and cast it to <span style="font-family: 'Courier New', Courier, monospace;">Action&lt;object, object&gt;</span>, which compiles but fails at runtime. So for this one, I just assigned the closure to a variable typed as <span style="font-family: 'Courier New', Courier, monospace;">dynamic</span> to get around the compile/runtime casting issues.<br /><br />As expected, it's the slowest approach. 
However, it's still 16 times faster than direct reflection. Perhaps memoizing method calls like this (property sets and gets) would actually yield a significant performance improvement.<br /><br /><span style="font-size: x-large;">Compared To Ruby</span><br /><br />I thought I'd compare these results to Ruby, where all method calls are dynamic. In Ruby, a method call looks first in the object's immediate class and then climbs the ladder of parent classes until it finds a suitable method to invoke. Because of this behavior, I thought it would be interesting to also try a worst-case scenario with a deep level of inheritance.<br /><br />To do this fairly, I initially wrote a <span style="font-family: 'Courier New', Courier, monospace;">while</span> loop in Ruby that counted to 100 million. I rewrote the while loop in <span style="font-family: 'Courier New', Courier, monospace;">n.each</span> syntax and saw the execution time get cut in half. Since I'm really just trying to measure method invocation time, I stuck with the <span style="font-family: 'Courier New', Courier, monospace;">n.each</span> syntax.<br /><br /><iframe frameborder="no" height="300px" scrolling="no" src="https://www.google.com/fusiontables/embedviz?&containerId=gviz_canvas&q=select+col0%2C+col1+from+2846447+&qrs=where+col0+%3E%3D+&qre=+and+col0+%3C%3D+&qe=+limit+4&viz=GVIZ&t=BAR&width=500&height=300" width="500px"></iframe><br /><br />I honestly thought C# reflection would be significantly faster than the Ruby version with 5 layers of inheritance. While C# already holds a reference to the method (<span style="font-family: 'Courier New', Courier, monospace;">MethodInfo</span>), Ruby has to search up the ladder for the method each time. I suppose Ruby's performance could be due to the fact that it's written in C and specializes in dynamic method invocation.<br /><br />It also interests me why C# dynamic is so much faster than Ruby or reflection. 
I took a look at the IL code where the dynamic invoke was happening and was surprised to find a <span style="font-family: 'Courier New', Courier, monospace;">callvirt</span> instruction. I guess I was expecting some sort of specialized <span style="font-family: 'Courier New', Courier, monospace;">calldynamic</span> instruction (<a href="http://java.sun.com/developer/technicalArticles/DynTypeLang/" target="_blank">Java 7 has one</a>). The answer is actually a little more complicated. There seem to be several calls - most are <span style="font-family: 'Courier New', Courier, monospace;">call</span> instructions to set the stage (<span style="font-family: 'Courier New', Courier, monospace;">CSharpArgumentInfo.Create</span>) and one <span style="font-family: 'Courier New', Courier, monospace;">callvirt</span> instruction to actually invoke the method.<br /><br /><span style="font-size: x-large;">Conclusion</span><br /><br />Since the trend in C# is toward using more <a href="http://msdn.microsoft.com/en-us/library/bb397947.aspx" target="_blank">Linq</a>, I find it interesting how much of a performance hit developers are willing to exchange for more readable and compact code. In the grand scheme of things, the performance of even a slow reflection invoke is probably insignificant compared to other bottlenecks like database, HTTP, or filesystem access.<br /><br />It seems that I've proved the point that I set out to prove: there is quite a bit of performance to be gained by memoizing method calls into expression trees. The application would obviously be best in JSON serialization, ORMs, or anywhere you have to get/set lots of properties on an object with no compile-time knowledge of the type. Very few people, if any, are doing this - probably because of the added complexity. The next step will be to (hopefully) build a working prototype.<br /><br /><br /></div>
<h2>Comments</h2>
<div class='comments'>
<div class='comment'>
<div class='author'>Tim Kellogg</div>
<div class='content'>
Jordan - I've looked at iSynapticCommons before and I've been very impressed with what I've seen. I see you're emitting CLR OpCodes to build code. An alternative approach is to use Mono.CSharp.Evaluator to compile significant amounts of code at runtime (http://tirania.org/blog/archive/2008/Sep-10.html)</div>
</div>
<div class='comment'>
<div class='author'>Jordan Terrell</div>
<div class='content'>
You should checkout out DynamicMethod creation. I used it to implement my Clonable class for extremely fast object cloning. You can find the code for that here: https://github.com/iSynaptic/iSynaptic.Commons/blob/master/Application/iSynaptic.Commons/Runtime/Serialization/Cloneable.cs<br /><br />I wrote a little bit about this here: http://blog.jordanterrell.com/post/iSynapticCommons-Cloneablelt;Tgt;.aspx</div>
</div>
<div class='comment'>
<div class='author'>Tim Kellogg</div>
<div class='content'>
Thanks Peter!</div>
</div>
<div class='comment'>
<div class='author'>Peter Weissbrod</div>
<div class='content'>
Same with NHibernate. Bytecode is being dynamically generated for data mappings upon startup, which results in a slow up-front load when creating a session factory, but usually you create one session factory per app domain.<br /><br />I dont know what they do with ORMs in Ruby (I wish I did) but in .NET all popular ORMs cache data mappings in some format OR they use dynamic expando objects.<br /><br />These are some great figure you have put together!</div>
</div>
<div class='comment'>
<div class='author'>Tim Kellogg</div>
<div class='content'>
That's good to know. I didn't get a chance to browse the source. I have a feeling many libraries don't take advantage of reflection caching.</div>
</div>
<div class='comment'>
<div class='author'>Anonymous</div>
<div class='content'>
The C# mongodb driver does indeed cache it's reflection by compiled expression trees at runtime.</div>
</div>
</div>
Thoughts on the C# driver for MongoDB2012-02-03T00:00:00+00:00http://timkellogg.me/blog/2012/02/03/thoughts-on-c-driver-for-mongodb<div class='post'>
I recently started a new job with a software company in Boulder. Our project this year is rewriting the existing product (not a clean rewrite, more like rewrite & evolve). One of the changes we're making is using <a href="http://www.mongodb.org/">MongoDB</a> instead of <a href="http://en.wikipedia.org/wiki/Transact-SQL">T-SQL</a>. Since we're going to be investing pretty heavily in Mongo we all attended the mongo conference in Boulder on Wednesday. The information was great and now I'm ready to dig into my first app. Today I played around with some test code and made some notes about features/shortcomings of the <a href="http://www.mongodb.org/display/DOCS/CSharp+Driver+Tutorial#CSharpDriverTutorial-Introduction">C# driver</a>.<br /><br />First of all, the so-called "driver" is much more full-featured than a typical SQL driver. It includes features to map documents directly to CLR objects (from here on I'll just say <i>document</i> if I mean Mongo BSON document and <i>object</i> for CLR object). There are plans to support Linq directly from the driver. So right off I'm impressed with the richness of the driver. However, I noticed some shortcomings.<br /><br />For instance, all properties in the document must be present (and of the right type) in the object. I perceived this as a shortcoming because this is unlike regular JSON serialization where missing properties are ignored. After thinking a little further, this is probably what most C# developers would want since the behavior caters toward strongly typed languages that prefer fail-fast behavior. If you know a particular document might have extraneous properties that aren't in the object, you can use the <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">BsonIgnoreExtraElements</span> attribute.<br /><br />Thinking about this behavior, refactor-renaming properties could be nontrivial. 
You would have to run a data migration script to rename the property (mongo does have <a href="http://www.mongodb.org/display/DOCS/Updating#Updating-%24rename">an operation</a> for renaming fields). It would be great if the driver had a <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">[BsonAlias("OldValue")]</span> attribute to avoid migration scripts (maybe I'll make a pull request).<br /><br />Something I liked was that I could use <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">object</span> for the type of the <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">_id</span> property instead of <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">BsonObjectId</span>. This will keep the models less coupled to the Mongo driver API. Also, the driver already has a bi-directional alias for <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">_id</span> as <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">Id</span>. I don't know any C# developers who wouldn't squirm at creating a public property named <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">_id</span>.<br /><br />This brings me to my biggest issue with the C# mongo driver. All properties must be public. This breaks the encapsulation and SRP principles. For instance, most of the time I have no reason to expose my <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">Id</span> (or <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">_id</span>) property as public. NHibernate solves this by hydrating protected fields. I would like this to be solved very soon (but there are some issues with this since there aren't any mappings).<br /><br />Last, it has poor support for C# 4.0 types. 
Tuple doesn't fail, but it's serialized as an empty object ({ }). There is also, AFAIK, zero support for dynamic.<br /><br />In conclusion, there's some room for improvement in Mongo's integration with .NET, but overall I have to say I'm impressed. Supposedly Linq support is due out very soon, which will make it unstoppable (imo). Also, we haven't started using this in a full production environment yet, so there will most likely be more posts coming on this topic.</div>
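To illustrate the points above, here's a hypothetical model class (a sketch of mine, not production code; it assumes the driver's <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">MongoDB.Bson.Serialization.Attributes</span> namespace is referenced):

```csharp
using MongoDB.Bson.Serialization.Attributes;

// Extra fields in the document are ignored instead of throwing.
[BsonIgnoreExtraElements]
public class BlogPost
{
    // Maps to _id via the driver's built-in alias; typed as object to
    // avoid coupling the model to BsonObjectId.
    public object Id { get; set; }

    public string Title { get; set; }
    public string Body { get; set; }
}
```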
BDD ideas for structuring tests2012-01-02T00:00:00+00:00http://timkellogg.me/blog/2012/01/02/bdd-ideas-for-structuring-tests<div class='post'>
Lately <a href="http://timkellogg.blogspot.com/2011/12/behavior-driven-development-in-c.html">I've been thinking a lot</a> about the best way to do BDD in C#. So when I saw Phil Haack's post about <a href="http://haacked.com/archive/2012/01/02/structuring-unit-tests.aspx">structuring unit tests</a>, I think I had a joyful thought. Earlier I had been thinking in terms of using my <a href="http://www.blogger.com/">Behavioral NUnit</a> experimental project to hash out Haack's structuring idea with better BDD integration.<br /><br />In short, his idea is to use nested classes. There is the normal one-to-one class-to-test-class mapping, but each method under test gets its own inner class. To use his example:<br /><br /><script src="https://gist.github.com/139e0c2fd267001623f1.js?file=haacked.cs"></script><br />In this example the Titleify and Knightify methods (imo two terrible uses of the -ify suffix) have corresponding test classes dedicated to testing only one method. Each method in the class (or Fact, in the case of xUnit; I actually haven't used xUnit, but it seems to encourage a somewhat BDD-style readability) tests one aspect of the method, much like the <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">it</span> method is used in rspec.<br /><br />I generally like Haack's test structure. For example, he points out how it plays nicely with Visual Studio's natural class/method navigation, which makes the tests even more navigable. The only issue I have with it is that I dislike having 1000+ SLOC classes - tests or regular. If I were to adopt this method, I would probably break each of those inner classes into separate files (and use partial classes to break up the top class).<br /><br />My practice for a long time was to have one whole namespace per class under test. 
Consider <a href="https://github.com/tkellogg/objectflow/tree/master/objectflow.stateful.tests.unit">my tests for objectflow</a>. I actually picked up this practice from Garfield Moore, objectflow's original developer. Each class (or significant concept) has a namespace (e.g. objectflow.stateful.tests.unit.PossibleTransitions or PossibleTransitionTests). Each class in that namespace is named according to, essentially, what the Setup does. Some examples: WhenGivenOnlyBranches, WhenGivenOnlyYields, etc.<br /><br />I like the way these tests read. It's very easy to find a particular test or to read up on how a particular method is supposed to operate. But in practice this has led to very deep hierarchies, often with single class namespaces. Further, I find that creating a whole new class for each setup tends to create too much extra code. As a result, I have a hard time sticking closely to this practice.<br /><br />More recently I've felt a little overwhelmed with my original practice so I've evolved it slightly. Now I've started doing the one-to-one class to test mapping like commonly practiced. But each test has its own method that does setup. For instance:<br /><br /><script src="https://gist.github.com/139e0c2fd267001623f1.js?file=my-new-bdd.cs"></script><br />I also sometimes use this small variation of that structure where I keep the BDD sentence-style naming scheme but use TestCase attributes to quickly cover edge cases.<br /><br /><script src="https://gist.github.com/139e0c2fd267001623f1.js?file=my-new-bdd-2.cs"></script><br />I often use some hybrid of the last two approaches; especially if I would be using a TestCase attribute that breaks the BDD readability, I'll break the setup code into one of those Given_* support setup methods and reuse it between two different test methods.<br /><br />I generally like my most recent ways of structuring tests because of its readability and the ability to gain excellent edge case coverage by adding additional test cases. 
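A minimal sketch of that hybrid in NUnit, using a hypothetical Calculator (my own example, not the gist's code):

```csharp
using NUnit.Framework;

public class Calculator
{
    public int Add(int a, int b) { return a + b; }
}

[TestFixture]
public class CalculatorTests
{
    // BDD sentence-style name, with TestCase attributes covering edge cases.
    [TestCase(1, 2, 3)]
    [TestCase(0, 0, 0)]
    [TestCase(-1, 1, 0)]
    public void When_adding_two_numbers_it_returns_their_sum(int a, int b, int expected)
    {
        Calculator calculator = Given_a_calculator();
        Assert.AreEqual(expected, calculator.Add(a, b));
    }

    // Reusable Given_* setup method, shared between tests instead of a
    // one-class-per-setup namespace hierarchy.
    private static Calculator Given_a_calculator()
    {
        return new Calculator();
    }
}
```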
But I do really like Haack's structuring, so I may find myself adopting part of his suggestion and further evolving my tests.<br /><br />As far as this applies to Behavioral NUnit, I want to explore the possibility of a <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">Describe</span> attribute that mimics the usage of rspec's <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">describe</span> method. One idea is to make the new attribute generate another hierarchical level of test cases.</div>
Can Bad Code Ruin Your Career?2011-12-30T00:00:00+00:00http://timkellogg.me/blog/2011/12/30/can-bad-code-ruin-your-career<div class='post'>
I started writing this post over a year ago. I was working at a large company where I was stuck in a hamster wheel - always running to keep up but never getting anywhere. The code I had to work with was downright terrible. This, among other things, prodded me into looking for another job. While I was starting my job search I was pondering this post and decided not to finish it because I wasn't sure whether some prospective employer would hold it against me.<br /><br /><h2>With that said...</h2><br />I just finished reading through a messy Java file. It was the usual mess: a class with a 500-line god-method (akin to the god-object) and hundreds of instances of copy-pasted code. Besides the redundant code and lack of structure, the coder also used nested loops over ArrayLists where a HashSet would have worked, and never once used generic collections, opting for the unchecked versions instead. After several hours of refactoring and renaming variables I finally got to a point where I could begin fixing the bug I was after. There were absolutely no unit tests - all this code was written inline with HTML in a JSP.<br /><br />I spend so much time reading bad code that sometimes I wonder if I am beginning to specialize in hacks. Is it possible to read so much bad code that you forget what good code looks like? Humans are an especially adaptive species, and I think it's definitely possible for a great programmer to be forced to work in the muck so long that they forget what good code looks like.<br /><br />I've seen several situations where good developers produced bad code. These situations are almost always the product of an environment where features are more important than bug fixes. These companies typically invest heavily in sales and neglect IT and development costs. 
Or sometimes the problem is just that product management knows nothing of software development.<br /><br /><h2>The 5 stages of grief</h2><br />A recent coworker likened our job of working with brittle, badly designed code to the 5 stages of grief. While we were uneasily laughing about it, I silently decided that this was more realistic than I wanted to believe.<br /><br />For instance, imagine starting a new job. In the interview process you were interviewed by intelligent, enthusiastic developers and were led to believe you were going to be working on cutting-edge technologies - a dream, right? When you actually get to the job you find out that the code is so backwardly complicated that it's nearly impossible to touch anything without bringing the proverbial house of cards crashing down.<br /><br /><b>Grief Stage 1: Denial and Isolation</b><br /><br />Obviously the code isn't the problem; you just weren't careful enough. They probably have specific guidelines and strategies that help them be more productive. It's probably just something wrong with me...<br /><br /><b>Grief Stage 2: Anger</b><br /><br />Dammit! Who the hell even thinks of this crap? [more cursing...] Is this a god-object?? [hair gets thinner...]<br /><br /><b>Grief Stage 3: Bargaining</b><br /><br />This is typically when you start plotting potential strategies to hide the ugliness of the code. Creativity and hopeful thoughts abound. Many IT managers will talk like they are very supportive of you at this stage.<br /><br /><b>Grief Stage 4: Depression</b><br /><br />This is where reality strikes: coping with the code is bad for the business plan because it involves spending less time on revenue-producing features. The IT managers who seemed so supportive now flip-flop to the CEO's side and deny you the ability to cope with your problems.<br /><br /><b>Grief Stage 5: Acceptance</b><br /><br />There are only two outcomes of this stage. 
Either (1) you accept that you can never fix the code so you decide to move on to another job, or (2) you accept that you can never fix the code so you give up on trying. This is what separates good coders from bad.<br /><br /><h2>Conclusion</h2><br />Again, I started this post over a year ago. I've seen a lot of bad code. At my most recent job I almost took the "give up on trying" path in the acceptance stage. Luckily we hired a great older developer who snapped me out of it. I just started my new job today; I think I will be much happier.<br /><br />So can bad code ruin your career? My answer is a resounding YES! But it doesn't have to. Honestly, stage 5 can have better endings, but that inevitably requires understanding on the part of management - a scarce resource.</div>
Behavior Driven Development in C#2011-12-28T00:00:00+00:00http://timkellogg.me/blog/2011/12/28/behavior-driven-development-in-c<div class='post'>
I've been a fan of Test Driven Development since I worked in an XP shop. But every time the work starts getting bigger and more complex I always struggle not to get lost in the multitude of tests. I remember many early-on conversations with my <i>elders</i> about unit test naming conventions. The [method]_[input]_[output] convention starts to break down badly when your inputs become things like mocks, or if there ends up being more than 1 or 2 inputs; same with outputs.<br /><br />When <a href="http://twitter.com/#!/mjezzi">a coworker</a> introduced me to BDD earlier this year, it really clicked and flowed naturally. The idea of writing tests so they read like sentences out of a book or spec seems like the answer to all my questions. Ruby's <a href="http://rspec.info/documentation/">rspec</a> is beautiful:<br /><br /><script src="https://gist.github.com/1528845.js?file=simple_rspec.rb"></script><br />The organization of the tests forces you to focus on the expectations of your test and highlight descriptive assertions. This is especially useful for complicated setups with lots of mocks, etc. I put as much of my setup code as I can in one of those <span style="font-family: 'Courier New', Courier, monospace;">before :each</span> blocks, so that the assertions are limited to simple inputs and one or two observations about the outputs.<br /><br />There have been a number of <a href="http://persistall.com/archive/2007/11/05/further-thoughts-on-bdd-in-c.aspx">people</a> in the .NET community that have attempted BDD but [imo] failed to grasp the simplicity. NBehave is a complete overhaul of unit testing that uses attributes like <i>x</i>Unit. As a result, NBehave doesn't really look at all like rspec - which really isn't a bad thing, necessarily. 
However, the thing I like about rspec is its ability to describe things of arbitrary depth, which is handy when testing complex code:<br /><br /><script src="https://gist.github.com/1528845.js?file=complex_rspec.rb"></script><br />This spec is able to describe possible modes that the object under test can be in (complex inputs). This is made possible by rspec's arbitrary nesting depth. This is definitely a language feature that is much harder to implement in C#.<br /><br />My current approach to BDD in C# usually looks like:<br /><br /><script src="https://gist.github.com/1528845.js?file=BDD.cs"></script><br />I think this is the simplest BDD layer I can slap on top of NUnit. And simple is important to me because (a) I do a lot of open source projects and I want to keep the barrier to entry for contributions low and (b) the people I work with tend to resist change. When people are resistant to change, it's hard to rationalize using something other than NUnit or introducing lots of nested lambdas.<br /><br />NUnit remains the most popular unit testing framework and has excellent support with a GUI runner, console runner, and IDE integration with R#, TestDriven.NET, and others. Given all that support, I would really rather not abandon NUnit if possible.<br /><br /><a href="http://fluentassertions.codeplex.com/">FluentAssertions</a> is a nice simple BDD layer on top of NUnit (or whatever you use). It doesn't change the structure of our spec above, but it does change the structure of our assertion to<br /><br /><script src="https://gist.github.com/1528845.js?file=FluentAssertion_BDD.cs"></script><br />This assertion is [imo] very clean and succinct. I like how it reads even clearer than NUnit's fluent syntax. Last weekend I was thinking about this and I decided to explore an idea to make a BDD extension to NUnit that is even clearer than FluentAssertions. 
The project, BehavioralNUnit for now, is hosted at <a href="https://github.com/tkellogg/BehavioralNUnit">github</a>. The earliest goal for the project was simply to use operator overloading to make the assertions even more like rspec. For instance, I want to be able to make the previous assertion:<br /><br /><script src="https://gist.github.com/1528845.js?file=BehavioralNUnit_simple.cs"></script><br />I was able to do this, but I realized that the C# compiler was insisting that this expression needed to be assigned to something, so I [haven't yet] added another concept somewhat analogous to "it" in rspec:<br /><br /><script src="https://gist.github.com/1528845.js?file=BehavioralNUnit_complex.cs"></script><br />This is most similar to <a href="http://nspec.org/">NSpec's approach</a> of using an indexer instead of a method. This appeals to me because I sometimes find matching parentheses to be a pain (I guess I just like ruby & coffeescript). Then again, I don't like NSpec because it feels like it was written by one of those whining .NET developers that wishes dearly he could get a RoR job - it doesn't abide by .NET conventions at all.<br /><br />I still have a ton of ideas to hash out with Behavioral NUnit. I'm convinced that BDD in C# can be simpler and more beautiful than it currently is. If you have input or ideas, please <a href="https://github.com/tkellogg/BehavioralNUnit">fork the repository</a> & try out your ideas (pull requests are welcome).<br /><div><br /></div></div>
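<p>For readers who can't see the gists, here is a rough, self-contained sketch of the two ideas just described: an <code>==</code> overload that performs the assertion, and an NSpec-style string indexer standing in for rspec's <code>it</code>, which also gives the expression a legal statement context. Every name here is hypothetical; the actual Behavioral NUnit API may differ.</p>

```csharp
using System;

// Hypothetical sketch of the two ideas above; the real Behavioral NUnit
// API may differ. Expectation<T> overloads == so an assertion reads like
// rspec, and Spec's string indexer plays the role of rspec's "it".
public class Expectation<T>
{
    private readonly T _actual;
    public Expectation(T actual) { _actual = actual; }

    // rspec-like: Expect(value) == expected -- throws when it doesn't hold.
    public static bool operator ==(Expectation<T> e, T expected)
    {
        if (!Equals(e._actual, expected))
            throw new Exception($"expected {expected} but got {e._actual}");
        return true;
    }
    public static bool operator !=(Expectation<T> e, T expected) => !(e == expected);

    public override bool Equals(object obj) =>
        obj is Expectation<T> other && Equals(other._actual, _actual);
    public override int GetHashCode() => _actual?.GetHashCode() ?? 0;
}

public class Spec
{
    public static Expectation<T> Expect<T>(T actual) => new Expectation<T>(actual);

    // NSpec-style indexer: it["description"] = () => { ... };
    // The assignment gives the assertion a statement context the C#
    // compiler accepts, unlike a bare == expression.
    public Action this[string description]
    {
        set
        {
            Console.WriteLine("it " + description);
            value();
        }
    }

    public static void Main()
    {
        var it = new Spec();
        it["adds two numbers"] = () => { var ok = Expect(2 + 3) == 5; };
        Console.WriteLine("spec passed");
    }
}
```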
<h2>Comments</h2>
<div class='comments'>
<div class='comment'>
<div class='author'>Tim Kellogg</div>
<div class='content'>
Michael, thanks for the link to BDDify. I've never seen that particular approach before. It's a different angle than what I'm trying to accomplish with Behavioral NUnit. They're not mutually exclusive; in fact they'd probably work well together.<br /><br />As far as the Moq Contrib container, I just started a new job this week and I'm still trying to gauge their interest in IoC, and what container they'll want to use. I may end up contributing a third container to MoqContrib if it seems appropriate. I'll try to post some info about the direction I'm moving in with that soon.</div>
</div>
<div class='comment'>
<div class='author'>Anonymous</div>
<div class='content'>
Hi Tim<br /><br />Have you seen bddify? It's quite a new BDD framework for .Net and aims for that simplicity you're talking about.<br />http://www.mehdi-khalili.com/bddify-in-action/introduction<br /><br />I actually came to your site to see if anything was happening with your Moq Contrib AutoMocking container with Castle Windsor? That seemed pretty interesting...<br /><br />Thanks<br />Michael</div>
</div>
</div>
Why I hate generated code2011-12-26T00:00:00+00:00http://timkellogg.me/blog/2011/12/26/why-i-hate-generated-code<div class='post'>
If you've worked with me for any amount of time you'll soon figure out that I often profess that <i>"I hate generated code"</i>. This position comes from years of experience with badly generated code. Let me explain.<br /><br /><h2>The baby comes with a lot of bathwater</h2>In the past year I had an experience with a generated data layer where <a href="http://www.codesmithtools.com/">CodeSmith</a> was used to generate a table, 5 stored procedures, an entity class, a data source class, and a factory class for each entity that was generated. My task was to convert this code into <a href="http://nhforge.org/">NHibernate</a> mappings.<br /><br />The interesting thing about this work is how little of the generated code was actually being used. I'm sure, in the beginning, the developer's thoughts were along the lines of <i>"oh look at all this code I don't have to write manually :D"</i>. However, after some time, subsequent developers' thoughts were along the lines of <i>"with all this dead code, it's hard to find real problems"</i>. It's funny how some exciting breakthroughs turn into headaches down the road. The table is always used, but some entities are created & read but never modified, while others are only created during migrations and only read from during run time.<br /><br />Code generators often produce code you don't need. Since all code requires maintenance, dead code is just a liability because it doesn't provide any benefit. I always delete dead code and commented-out code (it'll live on in version control, no need to release it into production).<br /><br />There are several professional developer communities that generate code as a way of life. <a href="http://guides.rubyonrails.org/command_line.html">Ruby on Rails</a> comes prepackaged with scripts to generate models, views, and controllers in a single command. 
<a href="http://www.asp.net/mvc/tutorials/older-versions/controllers-and-routing/creating-a-controller-cs">ASP.NET MVC</a> will generate controllers and views with a couple clicks. And if you've ever used either of these frameworks, you'll probably find yourself deleting a lot of generated code.<br /><br /><h2>The problem of transient code generation</h2>The issue that I keep running into with my policy of hating code generation is that it's nearly impossible to be a professional software engineer and not generate code. The most fundamental problem is compilers. When you run a compiler over your source code, it <i>generates</i> some sort of machine readable code that is optimized for various goals like speed or debugging or different platform targets.<br /><br />While I hate code generators, it's hard to argue how I could possibly hate compilers. They allow me to write code once and compile it several different ways and achieve different goals. Therefore, I have to introduce my first caveat - I don't hate all generated code, <i>I only hate generated source code</i>.<br /><br />This problem of hating generated code is complicated further by the fact that NHibernate generates source code too. You don't ever check in the code that NHibernate generates because it's done at run time. The most obvious way NHibernate generates code is the SQL that is written in the background to query & perform DML operations. (For those questioning if SQL is source code, consider how SQL is compiled into an execution plan prior to execution). It's also hard to argue that I hate this kind of code generation because it doesn't suffer from the same problems of the CodeSmith generated code. It only generates code <i>just-in-time</i> meaning that it's only generated when needed, so there isn't any extra code generated.<br /><br />Since NHibernate and compilers do code generation in a way that I like, I'm going to refine my statement to <i>"I hate generated persistent code"</i>. 
This generally means I still hate generated code when the resulting code sticks around long enough for a fellow developer to have to deal with it.<br /><br /><h2>The thin line between good and bad code generation</h2>When is generated code persistent and when is it transient? We already decided that code generation isn't so bad when it happens during or after the compilation process. But my statement is that I hate persistent code. There are other cases of code generators generating transient source code. One such example is in <a href="https://github.com/iSynaptic/iSynaptic.Commons">iSynaptic.Commons</a>.<br /><br />Since C# doesn't yet (and probably won't ever) include <a href="http://insanecoding.blogspot.com/2010/03/c-201x-variadic-templates.html">variadic templates</a> or variadic generic types, writers of .NET APIs often write some really redundant code to account for all combinations of generic methods or types. I know I've done it. This example uses <a href="https://github.com/iSynaptic/iSynaptic.Commons/blob/master/Application/iSynaptic.Commons/FuncExtensions.tt">a T4 template</a> to produce a C# file with a <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">*.generated.cs</span> extension. The T4 template is <a href="https://github.com/iSynaptic/iSynaptic.Commons/blob/master/Application/iSynaptic.Commons/iSynaptic.Commons.csproj#L195">executed on build</a> but not ignored from version control.<br /><br />I do like this approach because it takes a DRY approach to a redundant problem without much complication. Another thing I really like about this approach is that <a href="http://msdn.microsoft.com/en-us/library/bb126445.aspx">T4 templates</a> are a standard part of Visual Studio and are <a href="http://tirania.org/blog/archive/2009/Mar-10.html">executable from Mono</a> as well. 
As such, they can be considered a free tool that is openly available (important for open source projects) and, more importantly, executed as part of the build process.<br /><br />Another thing I like about this approach is the use of partial classes to separate the generated portion of the class from the non-generated portion. This minimizes the amount of code that is sheltered from refactoring tools (code inside the *.tt file).<br /><br />The thing I hate about this particular iSynaptic.Commons example is that the generated file is included in version control. I think, perhaps, this is reduced to a small pet peeve of mine since the generated code isn't wasteful and is updated on every build. Still, I would like a mechanism to (a) have the file hidden from the IDE's perspective and (b) ignored by version control. I wouldn't want anyone to mistakenly edit the file when they should be editing the T4 template.<br /><br /><h2>Summary</h2>The end result of my thought is <i>"I hate source code that is generated prior to the build process"</i>. I want to further say that I also hate generated code that is checked into version control, but this is a bit of a lesser point. However, code generation can be a useful tool, as seen in the cases of NHibernate and T4 templates. Even still, code generation should be used wisely and with care. Generating excess code can become a liability that detracts from the overall value of a product.</div>
Defining Watergile2011-12-01T00:00:00+00:00http://timkellogg.me/blog/2011/12/01/defining-watergile<div class='post'>
At the place of my current employment we've had a layer of management placed above us that fervently preaches the mightiness of agile. This management devotes much lecture time to informing us of the proper procedure for planning a product. First you gather requirements and architect the entire system and write detailed requirements documents - good enough that developers don't need to refine them any further and QA knows exactly what to test. When requirements are written for the entire system - 12-24 months in advance - then you begin coding. After you're done coding, QA begins to test.<br /><br />To be clear, anyone reading the previous paragraph should be scratching their head and thinking to themselves, "gee, that sounds a lot like waterfall". Well it is, hence the portmanteau <i>watergile </i>(we considered agilfall but it just doesn't roll off the tongue as well).<br /><br />The trouble is, even though we coined the term just recently, this watergile thing is a frigging pandemic. Every time I crack open a fresh copy of <a href="http://www.sdtimes.com/">SD Times</a> there seems to be some guy telling you that you need to be measuring KSLOC and a billion other software metrics but at the same time claiming that agile is the only way. It wouldn't be so scary except that this is the source of direction for software development managers.<br /><br />It's no wonder <i>watergile</i> is so widespread: IT managers are fed a constant stream of B.S. mixed messages. How could anyone make sense of any of it without dismissing most of it? The truth is, waterfall is hard and so is agile. Anything in between is just ad-hoc and set up to fail. If you are a development manager and reading this, find those tech magazines on the corner of your desk and show them to the recycling bin. They're worthless and distracting to progress.</div>
The Pain and Glory of C2011-11-06T00:00:00+00:00http://timkellogg.me/blog/2011/11/06/pain-and-glory-of-c<div class='post'>
I don't normally write much C code, but this past week I was fiddling around with it to solve some programming puzzles. When I say C I mean straight C (without the ++ or #). Completely un-object-oriented; just structures, helper functions and malloc/free. It took me 3 days (a total of probably 9 hours) to write a fully functional 250-300 SLOC solution to a puzzle (complete with huge memory leaks). This all brings me to the burning question - who would ever want to write programs in C?<br /><br />C++ has developed over the years. I recently looked at some of the enhancements in C++11, which include the auto keyword (like var in C#), better reference-counting "smart pointers", lambdas and closures. Obviously, C++ is developing and progressing. C hasn't had a spec change since 1999, and even then it wasn't exactly dramatic. We still don't have any OO or reference-counted pointers.<br /><br />Have you ever tried interfacing with a library in C? It's very cumbersome. You have to read all the documentation and call the right my_library_object_*() functions at the right times. Everything is hands-on, nothing is left to imagination. You have to remember what memory you allocated so you can free it sometime later when you're sure you don't need it anymore (and then recursively free sub-structures and arrays).<br /><br />I think anyone can see warts in C. But it's easy to forget the simple beauty. I mean, there aren't many operators in C, and there's only one way to cast. I mean, sure, before C99 you couldn't even create & initialize a counter variable inline in a for-loop. But the complex syntax of C++ is scary in comparison, with all its member::accessors, template&lt;T&gt; classes, 5-6 ways to cast a variable, and a slew of gotchas. Sure, C has its share of gotchas, but the language is so small that anyone who's spent any significant time programming C can list most of them out for you (probably not so true with C++).<br /><br />So why not C#? 
Well, it's freaking slow!! Think about when people were converting their business apps from VB6 to C#. Sure the maintainability of the code improved by leaps and bounds, but almost everyone noticed the performance difference and wondered how the same program could be so slow.<br /><br />Recently Microsoft unveiled some information to developers about the upcoming Windows 8 release and its Metro interface. One of the biggest surprises to developers is how hard Microsoft is trying to sell C/C++ and how C#/.NET is falling by the wayside. The driving factor is that Apple has snappy user interfaces and Windows Forms are known for being slow and boring. So Microsoft created a new WinRT UI toolkit for Windows 8 that intends to never block the UI thread. Operations that take longer than ~50ms should use async code so that the UI can continue to feel responsive. (This sounds eerily similar to <a href="http://nodejs.org/">Node.JS</a> but with a lot more code).<br /><br />Obviously Microsoft wants developers to build faster apps by going back to C/C++; maybe we should consider taking them seriously. But I think the more likely direction is development being done primarily in one of the common dynamic languages like Ruby/Python/Node.JS, with code that needs a speedup written as C modules. All of those general purpose scripting languages are written in C (not C++) and interface very well with C. I've seen lots of math-intensive Python libraries composed partly of C code (some with increasing portions written in C). I could also see the popularity of Node.JS increase if it was applied not just to web/networking apps but also to non-blocking UI. (After all, this is basically what WinRT is).<br /><br />I don't know about you, but I'm going to be spending some time tuning up my C/C++ skills. History has been known to repeat, and I think it is now repeating yet again.</div>
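To make the complaint about C library interfaces concrete, here is a sketch of what a my_library_object_*() style API typically looks like. The library and every name in it are hypothetical: an explicit create/destroy pair where the caller owns every allocation, including the nested ones.

```c
#include <stdlib.h>
#include <string.h>

/* A hypothetical library object in the my_library_object_*() style:
 * opaque-ish struct, create/destroy pair, caller-managed lifetime. */
typedef struct {
    char  *name;
    int   *values;
    size_t count;
} my_library_object;

my_library_object *my_library_object_create(const char *name, size_t count) {
    my_library_object *obj = malloc(sizeof *obj);
    if (!obj)
        return NULL;
    obj->name = malloc(strlen(name) + 1);              /* nested allocation #1 */
    obj->values = calloc(count, sizeof *obj->values);  /* nested allocation #2 */
    obj->count = count;
    if (!obj->name || !obj->values) {  /* partial failure: unwind by hand */
        free(obj->name);
        free(obj->values);
        free(obj);
        return NULL;
    }
    strcpy(obj->name, name);
    return obj;
}

/* The caller must remember this call: sub-structures first, then the
 * object itself. Forgetting it leaks; calling it twice crashes. */
void my_library_object_destroy(my_library_object *obj) {
    if (!obj)
        return;
    free(obj->values);
    free(obj->name);
    free(obj);
}
```

Even the create function needs hand-written unwinding for partial failure, and nothing stops a caller from using the object after destroy. This is the "everything is hands-on" overhead the post describes.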
Occupy Wall Street Is Not Stupid2011-10-31T00:00:00+00:00http://timkellogg.me/blog/2011/10/31/occupy-wall-street-is-not-stupid<div class='post'>
Earlier today I was talking with someone who exclaimed, "Occupy Wall Street, that's so stupid!". I then proceeded to explain to them that OWS is trying to say <i>"hey, this capitalism thing isn't really working right now". </i>It's not to say that capitalism never worked, it's just pointing out that there are some significant holes in it right now.<br /><br />I believe that by now, most people (except some in Boulder) realize that communism has also failed. Now, communism didn't fail because <i>God hates communists</i>. It failed because it wasn't maximizing the total economic prosperity of all people. The people behind OWS have also realized [I naively assume] that capitalism in America is also no longer maximizing the total economic prosperity.<br /><br />In America today you see thousands of families that incurred large amounts of debt to a disgustingly rich minority. This rich minority (an oligarchy) forced these families out of their homes and into slavery. You might recognize that this looks a lot like the economic system that capitalism replaced - feudalism.<br /><br />OWS protesters are also crying out about the death grip that rich and powerful businesses have on our federal government. Some even claim that presidential elections are completely rigged (I probably wouldn't go that far). Either way, the government that our American forefathers created is completely absent from our current government. We've become so obsessed with being the most powerful country that we sacrificed the values and virtues that made us who we are.<br /><br />The Occupy Wall Street movement is right, our system is broken. Yes, there are many broken systems out there, but that's not a reason to not change them. Protest is an important political mechanism that has been proven to work in the past. We need it to work now. The only problem I have with OWS is that it seems to be an incoherent jumble of complaints with no real answers. 
But I suppose that's where real change begins.</div>
Quiet Time2011-09-30T00:00:00+00:00http://timkellogg.me/blog/2011/09/30/quiet-time<div class='post'>
Recently, we instituted a "core hours" policy among our developers that essentially equates to 4 hours of quiet time every day. During the hours of 10-12 and 2-4 developers aren't allowed to interrupt each other, nor can QA, product managers, or anyone else in the office interrupt developers. If you need help on a problem you have to either work through it on your own or wait until after the <i>quiet time</i>.<br /><br />The policy hasn't been in effect very long, but I've immediately noticed a significant jump in productivity. I would say I'm 1.5-2 times as productive now that I'm not getting interrupted every 15 minutes. I've also noticed that I just plain enjoy coming to work more now.<br /><br />When we were talking about instituting the policy, some worried that it would be a problem that you couldn't clear up issues and roadblocks immediately. In practice, however, I think it isn't too much to ask everyone to wait [up to] two hours to clear roadblocks. In fact, it ends up forcing developers to solve their own problems.<br /><br />When I first started with this company I was isolated in a room by myself for entire days. The isolation was too much; I often felt like I was being confined in a prison. Obviously I'm not advocating that total isolation is any kind of real solution. It's impractical to suggest that developers can complete their work successfully in total isolation. It takes a lot of dialog to produce quality software. But it's also impractical to suggest that they can get any work done when they're being pestered every 5-30 minutes.<br /><br />I highly recommend some sort of <i>quiet time</i> in any work place. In my opinion, the benefits are definitely not limited to just software engineering either.</div>
AutoMapper And Incompleteness2011-09-15T00:00:00+00:00http://timkellogg.me/blog/2011/09/15/automapper-and-incompleteness<div class='post'>
This is part 2 of a series. Read <a href="http://timkellogg.blogspot.com/2011/09/view-models-automapper-and-law-of.html">part 1</a><br /><br />Earlier I talked about the Law of Demeter and how view models help us better adhere to it. I also briefly outlined how AutoMapper makes view models practical. While AutoMapper is a great tool, it isn't completely fulfilling. Let me explain.<br /><br />As I pointed out previously, some of the behaviors in AutoMapper make it feel incomplete. The first is that you can't map two view models to the same model and back.<br /><br />A much bigger problem with AutoMapper is that view models can't extend models. I'm not sure why they decided to disallow this usage, but it causes a cascade of code duplication (very un-DRY). Take a look at these classes:<br /><br /><script src="https://gist.github.com/1221098.js?file=ModelsAndViewModels.cs"></script><br /><br />There are a few things wrong here. Age is a nullable int on the model but the view model has just an int. If a null slips through, this could cause a crashing error. While AutoMapper has an AssertConfigurationIsValid method, it doesn't test for this sort of case. You'll have to write unit tests for this; luckily you can use <a href="https://github.com/tkellogg/NetLint">NetLint</a> to easily test for these sorts of flukes.<br /><br />Another issue is the validation attributes. The facts that account codes look like CO11582 and that all accounts must have a name are descriptors of the domain (which the model is modelling). They aren't facts about the view (although they have to be expressed in the view), they are part of the model. Every time you create another AccountViewModelX derivative AutoMapper requires you to copy these attributes. This is a massive failure in the attempt to keep code DRY.<br /><br />Another issue I have is that when I'm creating a view model I'm not sure what properties need to be created. 
I usually have to split the window and copy properties from model to view model (this screams obscenities at the idea of DRY code).<br /><br />One solution that I keep coming back to is to have view models extend models. For instance, see this implementation:<br /><br /><script src="https://gist.github.com/1221166.js?file=gistfile1.cs"></script><br /><br />Here, you don't have to type out all those properties a second (or third) time. They're just available. You also won't make the mistake of marking Age as non-nullable or forget to copy the validation attributes. It's all done for you by the compiler - no need to write extra tests.<br /><br />There are still some issues with this approach, and other approaches (such as encapsulation) that you can take. Perhaps there will be a part 3.</div>
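The nullable-int mismatch above can be illustrated outside .NET as well. This C sketch is my own analogy, not AutoMapper's API: the model's optional Age becomes a possibly-NULL pointer, and a naive field-by-field copy into a plain int is exactly the crash waiting to happen that the post warns about.

```c
#include <stddef.h>

/* Model: age is optional, represented as a pointer where NULL means
 * "unknown", loosely analogous to a nullable int on the model. */
struct person_model {
    const int *age;   /* may be NULL */
};

/* View model: age is a plain int, like the post's non-nullable int. */
struct person_view_model {
    int age;
};

/* A naive mapping dereferences blindly: undefined behavior (typically a
 * crash) when m->age is NULL, the same failure mode as the post's
 * null-slips-through scenario. */
void map_unchecked(const struct person_model *m, struct person_view_model *vm) {
    vm->age = *m->age;
}

/* A defensive mapping surfaces the mismatch instead of crashing: the
 * caller must decide what "no age" means in the view. */
int map_checked(const struct person_model *m, struct person_view_model *vm) {
    if (m->age == NULL)
        return -1;
    vm->age = *m->age;
    return 0;
}
```

The point of the sketch is that the type system won't catch the optional-to-required narrowing for you, which is why the post recommends extra tests (or a tool like NetLint) when AutoMapper's own validation doesn't cover it.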
View Models, AutoMapper, and The Law of Demeter2011-09-12T00:00:00+00:00http://timkellogg.me/blog/2011/09/12/view-models-automapper-and-law-of<div class='post'>
<div>The <a href="http://haacked.com/archive/2009/07/14/law-of-demeter-dot-counting.aspx">Law of Demeter</a> was created with the intent of simplifying object hierarchies and structures. Obviously it's not a blanket sort of law (it doesn't seem to apply to <a href="http://www.themomorohoax.com/2009/02/25/how-to-write-a-clean-ruby-dsl-part-2-line-by-line-with-machinist-rails">DSLs </a>or fluent interfaces). But it is handy to keep in mind when modelling a domain. </div><div><br /></div><div>A classic example of a shortcoming of the Law of Demeter is the name example: passing a model to a view that has a name object (<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">Model.Name.First</span>, <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">Model.Name.Last</span>, etc.) versus passing a flattened view model (<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">Model.FirstName</span>, <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">Model.LastName</span>, etc.). I think this is a great application of view models.</div><div><br /></div><div>I like the idea of view models because they're a great way to express view-specific business logic. The FirstName/LastName flattening is one example, but they're also great for holding data necessary to populate drop down lists and summary views. 
Beyond code, view models are also a good example of the .NET community's ability to innovate new solutions to old problems (akin to <a href="http://timkellogg.blogspot.com/2011/08/parenthetical-thesis-on-rubynet-or.html">my thoughts about the ruby community</a>). </div><div><br /></div><div><span class="Apple-style-span" style="font-size: large;"><b>Yes, But...</b></span></div><div><span class="Apple-style-span" style="font-size: large;"><b><br /></b></span></div><div>While I definitely understand the benefits of view models, I'm still trying to figure out the best way to use them. When first creating view models the urge is to write and populate them by hand. This quickly becomes very tiresome. Enter <a href="http://automapper.codeplex.com/">AutoMapper</a>. </div><div><br /></div><div>AutoMapper is an object-to-object mapper designed very specifically for flattening models into view models. It bases its decisions on conventions and provides a fluent interface for the remaining anomalies. It is a savior for those writing view models by hand.</div><div><br /></div><div>AutoMapper works only in one direction. You take an existing model and map its data into a view model. Going backwards, however, is another story. One big limitation of AutoMapper is that you can't map from two different source types to the same destination type. This makes it difficult or impossible to use AutoMapper to do bidirectional mappings (for instance, if you want to use AutoMapper when updating the model from FormCollection).</div><div><br /></div><div>There is quite a bit more I want to say on this matter, which I will continue in a second part.</div></div>
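For reference, the convention-based flattening looks roughly like this (AutoMapper's 2011-era static API; the Person/PersonViewModel types are illustrative):

```csharp
// Requires the AutoMapper package. By convention, the nested source
// property Person.Name.First flattens to the destination property
// NameFirst - no per-property mapping code is needed.
using AutoMapper;

public class Name
{
    public string First { get; set; }
    public string Last { get; set; }
}

public class Person
{
    public Name Name { get; set; }
}

// Flattened for the view, per the Law of Demeter.
public class PersonViewModel
{
    public string NameFirst { get; set; }
    public string NameLast { get; set; }
}

public static class MappingConfig
{
    public static void Configure()
    {
        Mapper.CreateMap<Person, PersonViewModel>();
        // Fails fast if any destination property can't be resolved.
        Mapper.AssertConfigurationIsValid();
    }
}
```

Usage is one line: <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">var vm = Mapper.Map&lt;Person, PersonViewModel&gt;(person);</span> - and note it only goes model-to-view-model, which is the one-way limitation discussed above.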
Introducing comboEditable2011-09-05T00:00:00+00:00http://timkellogg.me/blog/2011/09/05/introducing-comboeditable<div class='post'>
I'll admit, <a href="https://github.com/tkellogg/comboEditable">comboEditable</a> is an extremely dry name for an open source project (I would have used something like <i>Project Bierstadt</i>, but it's not really that descriptive). Like everything else I develop and share publicly, this came out of necessity.<br /><br />In Windows there is a UI concept of an editable combo box. Basically you're given a drop down list of options and, if you can't find the option you're looking for, you just type in another (see the <a href="http://tkellogg.github.com/comboEditable/">demo</a> if you're having trouble visualizing it). This concept <i>does not exist</i> on the web or anywhere outside Windows applications. I assume that UX designers across the globe unanimously decided that an editable combo box is a UI kludge, but I still think it's a handy control.<br /><br />It is an unobtrusive jQuery plugin that uses the regular HTML DOM as input and transforms it into an editable combo box (a text box, a hidden field, and several divs, if you're wondering). The <i>unobtrusive</i> part means that if scripts are disabled, the user still gets a combo box, just not an editable one.<br /><br />If you find yourself in need of an editable combo box, head over to the <a href="http://plugins.jquery.com/project/comboEditable">jQuery plugin page</a> or download it from <a href="https://github.com/tkellogg/comboEditable">github</a>. Also, take a look at the <a href="http://tkellogg.github.com/comboEditable/">demo</a> to see usage.</div>
Parenthetical Thesis on Ruby.NET (or IronGem (or whatever the kids call it these days))2011-08-29T00:00:00+00:00http://timkellogg.me/blog/2011/08/29/parenthetical-thesis-on-rubynet-or<div class='post'>
Since college I've always been a huge fan of dynamic languages. I was really into Python for a long time and in the past year or so I've picked up Ruby. It's well known that the open source/dynamic language world has always looked down on the .NET/Java world as somehow inferior. While having a conversation with a colleague about ruby versus .NET, I stumbled on a conclusion.<br /><br />Ruby has some great features like mixins, monkey patching, and <i>a <a href="http://en.wikipedia.org/wiki/Read-eval-print_loop">REPL</a></i>. I also love how blocks make closures such an accessible and natural way to program. Ruby makes easy things easy and hard things fun.<br /><br />On the other hand, C# is one of the most beautiful typesafe languages (although F# is gaining favor with me). Linq and expression trees provide functionality that you literally cannot reproduce in dynamic languages (it requires knowledge of types, which dynamic languages theoretically shouldn't care about). The crazy stuff that people are doing with expression trees (building SQL statements, mapping objects, selecting properties, etc.) makes it hard to say I'd rather be doing ruby.<br /><br />While C# has some analogous ruby constructs (extension methods are kind of like a lesser form of monkey patching), it still suffers from some of the classical faults of static languages (there can be a lot of extra code just to deal with types and to play nicely with the compiler). At the same time, the compiler also writes tests for you (a contract states you will have these methods, whereas in ruby you can never be completely sure they'll actually be there - something you'd have to write unit tests for).<br /><br />The conclusion I came to was that, at this point in time, there really isn't a compelling reason why ruby is better than .NET or vice versa. Except for one thing - the communities. The ruby community is nearly too much fun. 
In Boulder, where I live, there are several companies that host regular hackfests. There are also annual ruby conventions where people get together, socialize, and share new ideas. In the .NET world we have some of those perks, but we're notoriously laden with deadbeats. I can't tell you how many lame coworkers I've worked with who have little interest in improving themselves or the code they write. In the Ruby world, by contrast, people aren't just interested in themselves or the code they write, but also <a href="http://codeforamerica.org/">in the community around them</a>.<br /><br />Despite all the debate, I'll probably keep my current job. I love the people I work with and I like participating in the .NET open source world (there really aren't any deadbeats in any sector of the open source world, by definition).</div>
Launching personal website2011-08-27T00:00:00+00:00http://timkellogg.me/blog/2011/08/27/launching-personal-website<div class='post'>
I spent some time today and solidified my personal website (<a href="http://tkellogg.github.com/">http://tkellogg.github.com</a>). I'm pretty excited about this website just because it's a great demonstration of single page apps. Each of my main links doesn't actually take you to a different page - it uses a JavaScript routing engine (<a href="http://documentcloud.github.com/backbone/">backbone</a>) to load and display new content.<br /><br />I do have some plans for the site, but there are so many more important things to deal with these days. If I can get to them, I want to start a picasa site and load images into the site using the gdata api (like how I load blog posts now) and also integrate with github to list out my repositories and activity.</div>
Maybe Node isn't so bad2011-08-08T00:00:00+00:00http://timkellogg.me/blog/2011/08/08/maybe-node-isnt-so-bad<div class='post'>
I know in <a href="http://timkellogg.blogspot.com/2011/06/got-backbone.html">previous</a> <a href="http://timkellogg.blogspot.com/2011/05/hipster-developers.html">posts</a> I bashed <a href="http://nodejs.org/">Node.js</a> a bit. I've done some thinking about it and I was struck by <a href="http://codeofrob.com/archive/2011/04/30/5-reasons-to-give-node-js-some-love.aspx">a revelation</a>. If you write a Node app that serves to a browser, you can use the same code on client & server. That means you can use frameworks like <a href="http://documentcloud.github.com/backbone/">Backbone</a> to manage your business logic both on the server and on the client inside a browser.<br /><br />The implications for this are huge. I've toyed with the idea of using Backbone + ASP.NET MVC together for a while now, but I kept tripping up on all that code duplication between Backbone models and C# models. Node could be what launches the browser into a universal rich client host (and yes, HTML5 will help too).<br /><br />The other crazy idea I had about node is that this means fewer languages to learn. Imagine if you wrote JavaScript-intensive apps with Node and backed them up with <a href="http://www.couchbase.com/">couchbase</a> on the DB end. You would have JavaScript in your view, JavaScript for business logic, and JavaScript in the DB. The learning curve for a new developer to become productive would be the smallest that IT has seen in decades, probably ever. This could change the landscape of IT forever. It wouldn't be such a bad idea to build a development team around that concept.<br /><div><br /></div></div>
Git is a platform2011-07-27T00:00:00+00:00http://timkellogg.me/blog/2011/07/27/git-is-platform<div class='post'>
This evening I stuck my head in at <a href="http://quickleft.com/">quickleft's</a> <a href="http://quickleft.com/blog/tag/hackfest">hackfest</a> in downtown Boulder. They gave a great intro to ruby & Sinatra. Sinatra is mind-bendingly simple. It makes you wonder why you've been doing anything but Sinatra.<div><br /></div><div>Anyway, while I was playing around at the hackfest they introduced Heroku, which is a cloud platform for ruby. Heroku uses git to let you manage your application's files on the server. Pushing a brand new repo creates a new domain name and sets up the infrastructure for your app. They built a very cool application on top of the <i>git platform</i>.</div><div><br /></div><div>Github has been doing this for a while. I blogged earlier about <a href="http://timkellogg.blogspot.com/2011/02/internal-secrets-of-git.html">github</a> and the things they've done with git. The most public things include git as a blogging/wiki engine as well as a static website generator (github pages). You can also fork <a href="https://github.com/icefox/git-achievements">git-achievements</a> and broadcast your mastery over git, <a href="http://tkellogg.github.com/git-achievements/">like I did</a>. Honestly, the things you can do with git are endless since it is, after all, nothing more than a versioning filesystem in user space.</div><div><br /></div><div>I think this is the biggest thing that separates git from other version control systems. No one has done anything with SVN beyond simple pre- or post-commit hook scripts. TFS has a lot of application infrastructure built <i>around</i> it, but it doesn't build <i>on top</i> of its version control system. Neither does Mercurial or Bazaar, even though they are also distributed version control systems. </div><div><br /></div><div>The git folks really focused on defining git as a standard rather than an application. 
By that I'm referring to how they defined objects, trees, packfiles, etc. (see <a href="http://progit.org/book/ch9-0.html">progit</a>) instead of focusing on developing an application. For much of its lifetime git was nothing but a hodgepodge of shell scripts and C libraries. Nowadays there are <a href="https://github.com/igorgue/git-sharp/wiki">several</a> <a href="http://www.jgit.org/">varying</a> <a href="http://libgit2.github.com/">implementations</a> <a href="http://deadpuck.net/blag/serving-git/">of</a> <a href="http://git-scm.com/">git</a>. The fact that git is so widely programmatically accessible makes it insanely easy to leverage inside programs. I'm still waiting for a .NET app to do something big with git#...or maybe I could.</div></div>
Semantic versioning2011-07-10T00:00:00+00:00http://timkellogg.me/blog/2011/07/10/semantic-versioning<div class='post'>
I've seen some interesting software version sequences. Like Windows 3, 3.1, 3.11, 95, 98, ME, XP, Vista, 7. Or Oracle DBMS v5, v6, 7, 8, 8i, 9i, 10g, 11g (what does the <i>g</i> mean??). I've seen all sorts of version schemes to designate major versions, minor versions, patches, and other types of releases. (The worst ones are always when marketing gets involved.)<br /><br /><a href="https://github.com/mojombo">Tom Preston-Werner</a> formalized the major-minor-point release (<i>X.X.X</i>) scheme at <a href="http://semver.org/">semver.org</a>. I highly recommend that anyone who considers themselves a professional developer read every word of the article at <a href="http://semver.org/">semver.org</a>. The beauty of semantic versioning is that there isn't anything new or innovative about it at all. It's all things you already know to be true. All versions <1.0.0 are development versions. Once 1.0 hits, the public interface is solidified. If and only if you break backwards compatibility do you have to increase the major version. Minor versions and point releases (1.X.0 and 1.0.X) are for various levels of new features and bug fixes.<br /><br />When you release software labeled with semantic versions you make it easy for people to quickly assess how significant the release is (I might skip a point release and upgrade to minor releases, but I might avoid a major release due to the incompatibilities it might cause). It also forces the developers to exercise restraint in breaking compatibility with previous releases.<br /><br />The trouble with semantic versions in the corporate world is that marketing always has ulterior motives. They want to release a major version to make the product feel alive; they want to downplay breaking changes to a minor version to keep customers; or they want to introduce new terms that mean nothing to the average user (XP for <i>eXPerience</i>, Vista because it sounds cool). 
Those names are great for development code-names but they detract from a buyer's experience (I use the term buyer loosely to mean any potential user) in determining compatibility between products.<br /><br />In .NET assemblies, there are four segments supported with the AssemblyVersion and AssemblyFileVersion attributes (major, minor, build number, revision). This seems fine until you want to release alphas, betas and release candidates. The semantic version for a 1.0 beta release would be 1.0.0beta1 indicating that this is the first beta for the 1.0.0 release (you can use any string of alphabetical characters, not just <i>beta</i>). In a .NET assembly <a href="http://stackoverflow.com/questions/64602/what-are-differences-between-assemblyversion-assemblyfileversion-and-assemblyinf">you do this as follows</a>:<br /><br /><pre class="brush: csharp">[assembly: AssemblyVersion("1.0.0")]<br />[assembly: AssemblyFileVersion("1.0.0.253")]<br />[assembly: AssemblyInformationalVersion("1.0.0beta1")]<br /></pre><br />The new attribute here is obviously <a href="http://msdn.microsoft.com/en-us/library/system.reflection.assemblyinformationalversionattribute.aspx">AssemblyInformationalVersion</a>, which is used to specify more arbitrary strings. It will show up in the Windows properties dialog as the assembly version (otherwise <a href="http://msdn.microsoft.com/en-us/library/system.reflection.assemblyversionattribute(v=vs.71).aspx">AssemblyVersion </a>will be used). Also, the <a href="http://msdn.microsoft.com/en-us/library/system.reflection.assemblyfileversionattribute.aspx">AssemblyFileVersion </a>is used to indicate build numbers. So while working on the 1.0.0 release, we also have a continuous integration environment like <a href="http://www.jetbrains.com/teamcity/">Teamcity </a>or <a href="http://hudson-ci.org/">Hudson </a>building the code each night and incrementing the build version. 
However, continuous integration environments shouldn't need to have any impact on what you actually tag the version as.<br /><br />As Tom says in the article, kinda sorta following the standard doesn't reap much benefit. But once we all start releasing software that conforms <i>exactly</i> to this standard, then users can more efficiently understand which two components are compatible and which aren't. I believe this applies to all software, not just software that supplies a public API.</div>
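To show what those precedence rules buy you, here's a toy sketch of comparing two semantic versions (numeric major.minor.patch parts only; real semver also orders pre-release tags like 1.0.0beta1 before the final 1.0.0, which this deliberately skips):

```csharp
using System;

// Compares two "major.minor.patch" strings by semantic-versioning
// precedence: major wins first, then minor, then patch.
static class SemVer
{
    public static int Compare(string a, string b)
    {
        var pa = Array.ConvertAll(a.Split('.'), int.Parse);
        var pb = Array.ConvertAll(b.Split('.'), int.Parse);
        for (int i = 0; i < 3; i++)
            if (pa[i] != pb[i])
                return pa[i].CompareTo(pb[i]);
        return 0;  // identical versions
    }
}
```

Because each segment has a fixed meaning, a tool (or a buyer) can decide purely from the numbers whether an upgrade is safe - which is exactly the guarantee marketing-driven version names throw away.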
Got a backbone?2011-06-28T00:00:00+00:00http://timkellogg.me/blog/2011/06/28/got-backbone<div class='post'>
Earlier, I posted about those lame <a href="http://timkellogg.blogspot.com/2011/05/hipster-developers.html#links">hipster developers</a>, as I call them. Mainly, I just find it a little hard to believe that anyone can create a truly scalable JavaScript app using <a href="http://nodejs.org/">node</a>.<br /><br />Recently I stumbled into <a href="http://documentcloud.github.com/backbone/">Backbone</a> (or rather I kept hearing about it and finally checked it out). Backbone is a bare-bones MVC framework for JavaScript that is meant to give your JavaScript apps structure without weighing them down. More importantly, Backbone is by no means mutually exclusive with jQuery. Actually, they complement each other quite nicely.<br /><br />Back to those hipster developers. I don't often like to admit that a badly dressed 20-year-old can be right, and I still won't go so far as saying node.js is really a presentable solution for anything on the server, but the fact that they're expanding the infrastructure around JavaScript is really pushing me to think about how I can evolve my own .NET work. For me, Backbone is where it starts.<br /><br /><span class="Apple-style-span" style="font-size: large;">An Answer to Uncontrollably Messy JavaScript</span><br />I've written a lot of pages with big, long blocks of jQuery chains and anonymous functions. They're such a huge pain to maintain or refactor that I sometimes end up rewriting. Part of the problem is simply that the code is messy. But even when I break it down into smaller, nugget-sized functions I still have a fistful of spaghetti code that is prone to unchecked regressions. I definitely need to test my code, but code like that resists testing.<br /><br />Backbone lets you organize your code into Models, Views, Controllers, and Collections. If you go all the way with Backbone, you're going to be creating pageless apps where you load the page the first time and never reload it (like GMail). 
Everything is data fed to the page via JSON services. Controllers let you bind bookmarks to functions (e.g. when a link with href="#!/inbox" gets clicked, it gets routed to an inbox function and handled there). Views bind models to HTML. They also keep the models bound to the HTML, so when newer, fresher data arrives, the models are rebound to the page where necessary.<br /><br />By modularizing code according to the MVC pattern, unit testing becomes significantly easier. Most of your normal issues, like mocking the DOM and XHR, become less important because your code is broken into smaller pieces. Besides being easier to test, it's just plain easier to understand.<br /><br />When testing, if you do require mocking facilities, I've heard that <a href="http://sinonjs.org/">SinonJS</a> is excellent for all types of mocking, and comes with built-in server and XHR mocks. Also, a coworker is pushing me towards <a href="http://behaviour-driven.org/">Behavior Driven Development</a>, and so <a href="http://pivotal.github.com/jasmine/">Jasmine</a> is a natural winner for a test framework.<br /><br />I've heard people stress that Backbone is for web applications, not web sites. But at the same time, I don't think you need to go completely single-page to use Backbone. In .NET, I don't really want to go single-page because MVC provides so much. But some of my pages that involve several page states could be dramatically simplified with an MVC approach. At bare minimum, I want to be able to simplify and test my client-side logic.</div>
Introducing NetLint2011-06-26T00:00:00+00:00http://timkellogg.me/blog/2011/06/26/introducing-netlint<div class='post'>
Last week our QA guys wrote up a bug that one of our new pages wasn't working. After a little investigation I figured out it was just a JavaScript file that was inadvertently merged out of existence while resolving merge conflicts. We also had something like this happen where the app would run locally on developer boxes but would fail miserably when we deployed to the test environment.<br /><br />I don't really like giving the QA guys an excuse to blemish my reputation with bug reports, so I threw together a little tool to prevent this from ever happening again. Enter <a href="https://github.com/tkellogg/NetLint">NetLint</a>...<br /><br />NetLint processes Visual Studio project files (*.csproj, *.fsproj, etc.) and compares the files that exist in the project file against the files that actually exist on disk. So if a JavaScript file exists on disk but isn't in the project file, NetLint will throw an exception summarizing this and any other discrepancies.<br /><br />I also set up NetLint with simple file-globbing functionality, so all files under bin/ and obj/ are ignored by default (you can also add custom patterns). I run NetLint from a unit test, so whenever anyone resolves merge conflicts they will instantaneously know if they missed a file.<br /><br />The future of NetLint will be a staging ground for <a href="http://devlicio.us/blogs/krzysztof_kozmic/archive/2011/03/09/testing-conventions.aspx">testing conventions</a>. I'm licensing it under the MIT license, so hopefully no one should have any reservations due to licensing. I also created <a href="http://nuget.org/List/Packages/NetLint">a NuGet package</a> to make it even easier to use.</div>
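The core comparison NetLint performs can be sketched in Ruby (purely hypothetical — the real tool is a .NET library, and the function name and item-type list here are illustrative):

```ruby
require "rexml/document"

# Hypothetical Ruby sketch of the NetLint idea: compare the files a
# Visual Studio project file references against the files on disk,
# ignoring bin/ and obj/ by default. Names and item types here are
# illustrative, not the real tool's API.
def project_discrepancies(csproj_path)
  dir = File.dirname(csproj_path)
  doc = REXML::Document.new(File.read(csproj_path))
  listed = []
  # MSBuild item elements like <Compile Include="..."/> list the files
  REXML::XPath.each(doc, "//*") do |el|
    next unless %w[Compile Content None].include?(el.name)
    inc = el.attributes["Include"]
    listed << inc.tr("\\", "/") if inc
  end
  on_disk = Dir.glob("**/*", base: dir)
               .reject { |f| File.directory?(File.join(dir, f)) }
               .reject { |f| f.start_with?("bin/", "obj/") || f.end_with?("proj") }
  { missing_on_disk: listed - on_disk, not_in_project: on_disk - listed }
end
```

A unit test would then assert that both lists are empty, which is how a botched merge gets caught before QA sees it.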
Hipster developers2011-05-24T00:00:00+00:00http://timkellogg.me/blog/2011/05/24/hipster-developers<div class='post'>
I'd like to know what the deal is with these new <i>hipster developers</i>, as I like to call them. You know, those guys who adore those new <a href="http://www.readwriteweb.com/hack/2011/01/wait-whats-nodejs-good-for-aga.php">languages</a> and <a href="http://www.nonblocking.io/2011/04/jquery-module-anti-pattern.html">frameworks</a> until they start catching on. I mean, you have to respect them for putting in that initial work to bring <a href="http://www.rubyinside.com/rails-3-1-adopts-coffeescript-jquery-sass-and-controversy-4669.html">technology</a> forward, but eventually they just become a headache. Honestly, does <a href="http://nodejs.org/">node</a> even have a chance of being a truly scalable solution?</div>
Some useful git aliases2011-05-13T00:00:00+00:00http://timkellogg.me/blog/2011/05/13/some-useful-git-aliases<div class='post'>
Git aliases are a great way to do more with less typing. Our team uses submodules to an extent which can sometimes be confusing. Some of these aliases help to clarify behavior. These are a few of my favorites.<br /><br /><span class="Apple-style-span" style="font-size: large;">git lg</span><br /><br />This gives you a nicely formatted semi-graphical log view with users, branches, and remotes<br /><span class="Apple-style-span" style="color: #484848; font-family: Verdana, sans-serif; font-size: 12px;"></span><br /><pre style="background-color: #fafafa; border-bottom-color: rgb(218, 218, 218); border-bottom-style: solid; border-bottom-width: 1px; border-left-color: rgb(218, 218, 218); border-left-style: solid; border-left-width: 1px; border-right-color: rgb(218, 218, 218); border-right-style: solid; border-right-width: 1px; border-top-color: rgb(218, 218, 218); border-top-style: solid; border-top-width: 1px; margin-bottom: 1em; margin-left: 1.6em; margin-right: 1em; margin-top: 1em; overflow-x: auto; overflow-y: hidden; padding-bottom: 2px; padding-left: 0px; padding-right: 2px; padding-top: 2px; width: auto;">git config --global alias.lg "log --graph --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %C(green)%an%Creset %Cgreen(%cr)%Creset' --abbrev-commit --date=relative" </pre><br /><span class="Apple-style-span" style="font-size: large;">git latest</span><br /><br />This does a git pull on the current repository as well as all submodules<br /><span class="Apple-style-span" style="color: #484848; font-family: Verdana, sans-serif; font-size: 12px;"></span><br /><pre style="background-color: #fafafa; border-bottom-color: rgb(218, 218, 218); border-bottom-style: solid; border-bottom-width: 1px; border-left-color: rgb(218, 218, 218); border-left-style: solid; border-left-width: 1px; border-right-color: rgb(218, 218, 218); border-right-style: solid; border-right-width: 1px; border-top-color: rgb(218, 218, 218); border-top-style: solid; border-top-width: 1px; 
margin-bottom: 1em; margin-left: 1.6em; margin-right: 1em; margin-top: 1em; overflow-x: auto; overflow-y: hidden; padding-bottom: 2px; padding-left: 0px; padding-right: 2px; padding-top: 2px; width: auto;">git config --global alias.latest '!sh -c "git pull && git submodule foreach \"git pull\""'</pre><br /><span class="Apple-style-span" style="font-size: large;">git virgin </span>(getting to a pure state)<br /><br />This will reset your changes and delete all untracked and ignored files (includes bin/ and obj/ directories)<br /><span class="Apple-style-span" style="color: #484848; font-family: Verdana, sans-serif; font-size: 12px;"></span><br /><pre style="background-color: #fafafa; border-bottom-color: rgb(218, 218, 218); border-bottom-style: solid; border-bottom-width: 1px; border-left-color: rgb(218, 218, 218); border-left-style: solid; border-left-width: 1px; border-right-color: rgb(218, 218, 218); border-right-style: solid; border-right-width: 1px; border-top-color: rgb(218, 218, 218); border-top-style: solid; border-top-width: 1px; margin-bottom: 1em; margin-left: 1.6em; margin-right: 1em; margin-top: 1em; overflow-x: auto; overflow-y: hidden; padding-bottom: 2px; padding-left: 0px; padding-right: 2px; padding-top: 2px; width: auto;">git config --global alias.virgin '!sh -c "git reset HEAD --hard && git clean -fXd && git clean -fd"'</pre><br /><span class="Apple-style-span" style="font-size: large;">git harem </span>(a whole lot of virgins)<br /><br />This does a virgin for your repository as well as all submodules<br /><span class="Apple-style-span" style="color: #484848; font-family: Verdana, sans-serif; font-size: 12px;"></span><br /><pre style="background-color: #fafafa; border-bottom-color: rgb(218, 218, 218); border-bottom-style: solid; border-bottom-width: 1px; border-left-color: rgb(218, 218, 218); border-left-style: solid; border-left-width: 1px; border-right-color: rgb(218, 218, 218); border-right-style: solid; border-right-width: 1px; 
border-top-color: rgb(218, 218, 218); border-top-style: solid; border-top-width: 1px; margin-bottom: 1em; margin-left: 1.6em; margin-right: 1em; margin-top: 1em; overflow-x: auto; overflow-y: hidden; padding-bottom: 2px; padding-left: 0px; padding-right: 2px; padding-top: 2px; width: auto;">git config --global alias.harem '!sh -c "git virgin && git submodule foreach \"git harem\""'</pre></div>
Scripting with rake2011-04-20T00:00:00+00:00http://timkellogg.me/blog/2011/04/20/scripting-with-rake<div class='post'>
<a href="http://martinfowler.com/articles/rake.html">Rake</a> is a great twist on traditional <a href="http://www.gnu.org/software/make/">make</a> (honestly, I never really liked <a href="http://ant.apache.org/">Ant</a> or <a href="http://nant.sourceforge.net/">NAnt</a>). On the surface it looks more like make than Ant or NAnt, but you can leverage the full syntax and standard library of <a href="http://www.ruby-lang.org/en/">Ruby</a> (and there are no <a href="http://www.gnu.org/s/hello/manual/make/Error-Messages.html">weird rules about tabs</a>). As a .NET developer, <a href="https://github.com/derickbailey/Albacore">albacore</a> augments rake nicely with tasks for MSBuild (building Visual Studio projects and solutions), NUnit, the ASP.NET precompiler, modifying your <a href="https://github.com/derickbailey/Albacore/wiki/AssemblyInfoTask">AssemblyInfo.cs</a> (like bumping the version number), and many more.<br /><br />Since rake is just ruby code, you can do just about anything, but most file manipulation routines are even easier to write in rake, because most everything is already imported and ready to use. Unlike make, Ant, and NAnt, you don't have to start a separate project just to develop tools to use in a rakefile; just write a ruby function!<br /><br /><span class="Apple-style-span" style="font-size: large;">Building dependencies first</span><br />A lot of people who aren't already familiar with build languages make some common mistakes. Among them: not using dependencies correctly. 
For instance, given a website solution that references framework<br /><br /><pre class="brush: ruby">msbuild :framework do |msb|<br /> msb.solution = 'framework/src/framework.sln'<br />end<br /><br />msbuild :website do |msb|<br /> msb.solution = 'src/website.sln'<br />end<br /><br />task :default => [:framework, :website]<br /></pre><br />The default task is the task that's executed when you just type <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">rake</span> at the CLI. The reason this is terrible is that it's procedural and inflexible. Now, if I do <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">rake website</span> the build fails because framework hasn't been built yet. Instead, each task should specify what other tasks it <i>directly</i> relies on. This script should change to:<br /><br /><pre class="brush: ruby">msbuild :framework do |msb|<br /> msb.solution = 'framework/src/framework.sln'<br />end<br /><br />msbuild :website => :framework do |msb|<br /> msb.solution = 'src/website.sln'<br />end<br /><br />task :default => :website<br /></pre><br />This way both <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">rake</span> and <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">rake website</span> work the same. This leverages rake's dependency framework, which is at the core of all build languages.<br /><br /><span class="Apple-style-span" style="font-size: large;">Using file tasks</span><br />The other point that people often forget is that build languages are oriented around files. <i>Make</i> tasks were oriented around questions like <i>"does this file need to be created?"</i>. This is where rake's <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">file</span> task comes in very handy. 
For instance, the above tasks can become<br /><br /><pre class="brush: ruby">$framework_dll = 'framework/src/framework/bin/Debug/framework.dll'<br /><br />file $framework_dll => :framework<br /><br />$website_dll = 'website/bin/Debug/website.dll'<br /><br />file $website_dll => :website<br /><br />msbuild :framework do |msb|<br /> msb.solution = 'framework/src/framework.sln'<br />end<br /><br />msbuild :website => $framework_dll do |msb|<br /> msb.solution = 'src/website.sln'<br />end<br /><br />task :default => $website_dll<br /></pre><br />This makes it so that framework and website are built only when their output DLLs are missing; if they already exist, the builds are skipped.<br /><br /><span class="Apple-style-span" style="font-size: large;">Arbitrary scripting</span><br />Rake is a great platform for hosting arbitrary scripts that you might write to automate your development process. I have scripts to bump the assembly version and subsequently commit to git, deploy to our test server, and I plan to make tasks to interact with redmine via its REST API (something certainly not possible in NAnt). Basically, any little task that I might write a script for (which is quite a bit) can be imported into the rakefile and mounted as a task (yes, ruby is very modular).</div>
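As a concrete (and purely illustrative) example of that kind of script, the core of a version-bump task is just a plain Ruby function; a rakefile would wrap it in something like `task :bump do ... end` and write the result back to AssemblyInfo.cs. The function name and regex here are assumptions, not from any real tool:

```ruby
# Purely illustrative: the heart of a "bump the assembly version"
# script as a plain Ruby function. It increments the patch number in
# AssemblyInfo.cs-style content and returns the rewritten text.
def bump_patch(assembly_info)
  assembly_info.gsub(/(AssemblyVersion\(")(\d+)\.(\d+)\.(\d+)("\))/) do
    "#{$1}#{$2}.#{$3}.#{$4.to_i + 1}#{$5}"
  end
end

bump_patch('[assembly: AssemblyVersion("1.2.3")]')
# => '[assembly: AssemblyVersion("1.2.4")]'
```

Because it's an ordinary function, it can be unit-tested on its own and then mounted in the rakefile, which is exactly the modularity argument above.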
Automocking containers are not just for mocks2011-04-13T00:00:00+00:00http://timkellogg.me/blog/2011/04/13/automocking-containers-are-not-just-for<div class='post'>
In <a href="http://timkellogg.blogspot.com/2011/04/introducing-moqcontrib-auto-mocking.html">my last post</a> I introduced MoqContrib's automocking container. In this post I want to describe what sets it apart from MoqContrib's <a href="http://code.google.com/p/moq-contrib/wiki/Automocking">previous</a> automocking container and all other automocking containers that I've heard of thus far.<br /><br />A <a href="http://docs.castleproject.org/Windsor.MainPage.ashx">Castle.Windsor</a> <a href="http://stackoverflow.com/questions/312624/removing-or-overwriting-a-component-from-windsor-container/312918#312918">contributor</a> said that for unit tests, "it's recommended that you don't use the container at all, or if the test setup gets too dense because of dependencies, use an AutoMockingContainer." This is in response to a stack overflow question regarding how to remove components in order to replace them with mocks. There are <a href="http://groups.google.com/group/moqdisc/browse_thread/thread/94b8d1d56e783ef0/bc696d408015eab1?pli=1">others</a> that agree with him.<br /><br />I don't agree with Mauricio or Derek (from the links above). I strongly believe that there are several reasons to let an automocking container have real services registered that aren't mocks. The primary reason is for integration tests. This is where you are testing a system of modules, a subset of the entire system, but you still need to isolate those modules to just the system under test (SUT). So while the dependencies within the SUT are going to be implemented with real implementations, everything else is mocked. This is a partially mocked situation.<br /><br />One of the big reasons to use an automocking container is just to simplify everything. Sure, your setups are starting to get pretty long for unit tests, but sometimes you run into issues where there is already a component registered so you can't register a mock without first removing the original component. 
This is very tedious and totally ruins any love you might have had for your IoC container.<br /><br />In MoqContrib 1.0 the container will favor the last component registered over everything else. This is handy because you can do setups by exception. For an integration test fixture you can setup everything as a production implementation and then just mock components as needed. You can also do it the other way and just override with production implementations. I believe this will lead to much cleaner tests and much less time tracking down "how that friggin' component got registered".<br /><br />As far as the progress of a 1.0 release, I had originally said that it was going to be released last weekend. However, there have been some problems getting the community on board. I also realized that it was missing several important features. I will release a preview as soon as I get the current code stable.</div>
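The "last component registered wins" rule is easy to picture with a toy container. The following Ruby sketch is purely illustrative (MoqContrib is a C# library, and every name here is made up): resolving an unregistered key fabricates a mock stand-in instead of failing.

```ruby
# A toy auto-mocking container: the most recent registration wins, and
# anything never registered comes back as an auto-created mock.
FakeMock = Struct.new(:service_key) do
  def mock?
    true
  end
end

class AutoMockingContainer
  def initialize
    @registrations = Hash.new { |h, k| h[k] = [] }
  end

  def register(key, component)
    @registrations[key] << component
  end

  def resolve(key)
    if @registrations[key].empty?
      FakeMock.new(key)            # auto-create a mock for missing components
    else
      @registrations[key].last     # the last registration overrides earlier ones
    end
  end
end

container = AutoMockingContainer.new
container.register(:mailer, :production_mailer)
container.register(:mailer, :mock_mailer)   # override just for this fixture
container.resolve(:mailer)   # => :mock_mailer, no removal step needed
container.resolve(:logger)   # => an auto-created FakeMock
```

Setup-by-exception falls out naturally: register the production components once, then override only the pieces a given test needs mocked.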
Introducing MoqContrib Auto-mocking Container2011-04-06T00:00:00+00:00http://timkellogg.me/blog/2011/04/06/introducing-moqcontrib-auto-mocking<div class='post'>
The past couple weeks I have been working on an auto-mocking inversion of control container for <a href="http://moq.codeplex.com/">Moq</a> <a href="http://moqcontrib.codeplex.com/">Contrib</a>. The first results are almost ready to release in the form of an Alpha. The first container to be released will be Castle.Windsor; later, we will release an Autofac container.<br /><br />You will be interested in this project if you use an <a href="http://martinfowler.com/articles/injection.html">IoC</a> container in conjunction with unit tests and mocking (with Moq). You probably find yourself writing setups like:<br /><br /><pre class="brush: csharp">[SetUp]<br />public void Given()<br />{<br /> _service = new Mock<IService>();<br /> Container.Register(For<IService>().Instance(_service.Object));<br />}<br /><br />[Test]<br />public void I_did_something() <br />{<br /> var test = new TestThingy();<br /> test.DoSomething();<br /> <br /> _service.Verify(x => x.Something(), Times.Once());<br />}<br /></pre><br />When you use an auto-mocking container, the container will create mocks at resolve-time if it doesn't already have a component for it. So in the above example, the setup would drop out completely as there wouldn't be any need to explicitly create and register the mock:<br /><br /><pre class="brush: csharp">[Test]<br />public void I_did_something() <br />{<br /> var test = new TestThingy();<br /> test.DoSomething();<br /> <br /> _service.Verify(x => x.Something(), Times.Once());<br />}<br /></pre><br />We will release an alpha version of the <a href="http://docs.castleproject.org/Windsor.MainPage.ashx">Castle.Windsor</a> auto-mocking container later this week. Soon after we will add an <a href="http://code.google.com/p/autofac/">Autofac</a> container and start working towards a regular release schedule. If you are interested, visit <a href="http://moqcontrib.codeplex.com/">the site at codeplex</a> and give feedback through the discussion groups.<br /><br />Happy Mocking!</div>
Object Incest2011-03-23T00:00:00+00:00http://timkellogg.me/blog/2011/03/23/object-incest<div class='post'>
<i>Note: I thought I had read this term somewhere else, but a quick internet search turned up only dirty videos, so I think I may be the sole "coiner" of the term. </i><br /><i><br /></i><br />Many inexperienced developers (and experienced ones too) have been known to make several common mistakes in object oriented design. Hence the coining of the terms <a href="http://en.wikipedia.org/wiki/Anti-pattern">anti-pattern</a> and <a href="http://www.codinghorror.com/blog/2006/05/code-smells.html">code smell</a> to refer to patterns of development (like design patterns) that lead to convoluted, overly complex code that costs exponentially more to maintain and delivers little value.<br /><br />Object incest is a pattern where two unrelated classes are intimately dependent on each other. Simply put, <i>if object A </i><i>directly </i><i>relies on object B and B relies directly on A</i>, you have two incestual objects. This usually happens to intermediate developers who realize that they need <a href="http://trese.cs.utwente.nl/taosad/separation_of_concerns.htm">separation of concerns</a> and break a class into two classes without actually breaking the dependencies. 
While it is understandable (and almost respectable) why a developer might commit object incest, it is no less dangerous and harmful to a code base full of child objects.<br /><br />Here is an example of object incest:<br /><br /><pre class="brush: csharp">class Brother {<br /> public Sister MySister { get; set; }<br /><br /> private void GetMyHairBrushed() {<br /> MySister.BrushHair(this);<br /> }<br /><br /> public void DefendFromBullies(Sister sis) {<br /> // ...<br /> }<br />}<br /><br />class Sister {<br /> public Brother MyBrother { get; set; }<br /><br /> public void BrushHair(Brother bro) {<br /> // ...<br /> }<br /><br /> private void GetRidOfBullies() {<br /> MyBrother.DefendFromBullies(this);<br /> }<br />}<br /></pre><br />This is wrong because the two objects are so involved that it's hard to tell them apart, breaking the principle of separation of concerns. You can fix this by extracting <i>roles</i> from the objects as interfaces. That way, each object depends on some kind of object that can fulfill a role. A brother object needs someone to brush his hair, a sister needs someone to defend her from bullies.<br /><br /><pre class="brush: csharp">class Brother : IDefenderOfTheWeak, IPersonWithHair {<br /> public IHairBrusher MyHairBrushPartner { get; set; }<br /> <br /> private void BrushMyHair() {<br /> MyHairBrushPartner.BrushHair(this);<br /> }<br /> <br /> public void DefendFromBullies(IWeakling weakling) {<br /> // ...<br /> }<br />}<br /><br />class Sister : IWeakling, IHairBrusher {<br /> public IDefenderOfTheWeak Defender { get; set; }<br /> <br /> public void BrushHair(IPersonWithHair hairyPerson) {<br /> // ...<br /> }<br /> <br /> private void FightOffBullies() {<br /> Defender.DefendFromBullies(this);<br /> }<br />}</pre><br />In the second example, the two objects are no longer reliant on each other. Now they only rely on the roles that each of them provides. 
Down the road it will be much easier to create other objects that implement those interfaces (roles) like Husband and Wife.</div>
Unit testing databases - with NHibernate!2011-03-17T00:00:00+00:00http://timkellogg.me/blog/2011/03/17/unit-testing-databases-with-nhibernate<div class='post'>
One of the pesky problems with databases is unit testing the database portion of your application. For instance, it's enough of a pain to tear down and restore data to its original state, but it's even harder if your application code requires you to commit changes. A while ago I saw <a href="http://stackoverflow.com/questions/321180/how-do-i-test-database-related-code-with-nunit">this stack overflow question</a> that said you could wrap all your code in a TransactionScope like:<br /><br /><pre class="brush: csharp">using (new TransactionScope())<br />{<br /> // Database access code here<br />}<br /></pre><br />When .Dispose() is called at the end of the using block, the code is supposed to roll back all transactions, even if they were committed. After reading <a href="http://msdn.microsoft.com/en-us/library/system.transactions.transactionscope.aspx">the documentation</a> I realized that any new transactions will use this transaction scope, and hence be rolled back when the transaction scope rolls back at the end of the using block.<br /><br />This all seems like a great idea for ADO.NET code, but I was skeptical of using this with NHibernate because I know NHibernate does funny things with the session and how it creates transactions. Even though I've known about this trick for some time, I never trusted it or even took the time to actually test it...until now.<br /><br />I tested this idea out inside the scope of our application code which I'm basically just pasting here. 
So bear with some of the abstraction code we have built up in IGenericDAO and Container.<br /><br /><pre class="brush: csharp">[Test]<br />public void CheckNHibernateMappings()<br />{<br /> using (new TransactionScope())<br /> {<br /> // IGenericDAO is our abstraction layer for accessing NHibernate<br /> var dao = Container.Resolve<IGenericDAO<WorkflowTransition>>();<br /> var obj = new WorkflowTransition() { FromFk = 1, ToFk = 2, IsAllowed = true, WorkflowFk = 1, RightFk = 1 };<br /> dao.Save(obj);<br /> dao.CommitChanges();<br /><br /> var selected = dao.SelectById(obj.WorkflowTransitionId);<br /> Assert.That(selected.WorkflowTransitionId, Is.GreaterThan(0));<br /> Assert.That(selected.ToFk, Is.EqualTo(2));<br /> }<br />}<br /></pre><br />I placed a breakpoint at line 12, after CommitChanges(). I debugged the unit test and when it stopped at the breakpoint I ran this query in SSMS:<br /><br /><pre class="brush: sql">select * from WorkflowTransitions with (nolock)<br /></pre><br />The query returned the row I just inserted. The <i>nolock</i> table hint means to ignore any locks that might be on the table and read everything, even uncommitted data. This means we can see the results of NHibernate's <i>insert</i> statement without having to mess with the SQL profiler. If you run the query without the nolock option it hangs until timeout. I then let the test finish and ran the query again. This time the row was gone!<br /><br />Apparently, this TransactionScope is fully capable of rolling back all transactions, even if they were created automagically by NHibernate. I presume this means it will work with any ORM framework, not just NHibernate.</div>
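The pattern here generalizes beyond .NET: wrap the test in an ambient scope, let the code under test "commit" freely, and undo everything on dispose. A hypothetical Ruby sketch of the idea, with a plain array standing in for the database table and <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">RollbackScope</span> standing in for TransactionScope (made-up names, not a real library):

```ruby
# An ambient scope that snapshots state on entry and restores it on exit,
# so even "committed" changes made inside the block are rolled back.
class RollbackScope
  def self.around(table)
    snapshot = table.dup       # remember the pre-test state
    yield
  ensure
    table.replace(snapshot)    # "dispose": roll back regardless of commits
  end
end

rows = [{ id: 1 }]
seen_inside = nil

RollbackScope.around(rows) do
  rows << { id: 2 }            # the test inserts and "commits" a row
  seen_inside = rows.size      # a dirty read (like nolock) sees both rows
end
# After the scope the insert is gone: rows is back to [{ id: 1 }]
```

The mechanics differ from a real distributed transaction, but the test-authoring experience is the same: no manual teardown, and the table is pristine afterward.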
Introducing ObjectFlow2011-03-14T00:00:00+00:00http://timkellogg.me/blog/2011/03/14/introducing-objectflow<div class='post'>
I've been assigned to create a light and flexible workflow for two separate projects. After <a href="http://stackoverflow.com/questions/5198315/what-workflow-framework-to-use-in-c">doing some research</a>, I found that <a href="http://stackoverflow.com/questions/3634901/to-workflow-or-not-to-workflow">there really aren't any light, easy-to-understand workflow frameworks</a>. I noticed that <a href="http://objectflow.codeplex.com/">objectflow</a> lets you define workflows in C# with an easy-to-read fluent interface, but after digging into it I realized it was missing some crucial features. For instance, there was no clear way that you could pause a workflow in the middle so that a real person can interact with it.<br /><br />I contacted the maintainer of the project and have contributed a large portion of functionality that makes it easy to define workflows that include people. Here is a sample workflow:<br /><br /><pre class="brush: csharp">var open = Declare.Step();<br />var wf = new StatefulWorkflow<SiteVisit>("Site Visit Workflow")<br /> .Do(x => x.GatherInformation())<br /> .Define(defineAs: open)<br /> .Yield(SiteVisit.States.Open)<br /> .Unless(x => x.Validate(), otherwise: open)<br /> .Do(x => x.PostVisit());<br /><br />// And send an object through<br />var visit = new SiteVisit();<br />wf.Start(visit);<br /><br />// It stops at the Yield, maybe persist it in a database and display a page to the user<br />wf.Start(visit);<br /><br />// extension methods to check if it's still in the workflow<br />if (visit.IsAliveInWorkflow("Site Visit Workflow"))<br /> wf.Start(visit);<br /></pre><br />This workflow is fairly simple and demonstrates how you can create a module for defining workflow and isolate all business logic in data objects (models and view-models work great here). 
I was initially concerned with the idea of creating conditional goto constructs, but after more thought I decided that this shouldn't be a significant problem as long as workflows stay simple and there is a clear separation between business logic and workflow logic.<br /><br />There is a lot more to this project - and to the features I contributed. However, I haven't even put forth a good effort in developing the official documentation, so perhaps I'll write about this more after developing the core documentation a little more. I think this is an excellent solution for companies that want to quickly throw together workflows without a significant barrier to understanding. I think I will continue developing on ObjectFlow as long as I have something I feel I can add.</div>
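The pause-and-resume behavior behind <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">.Yield()</span> maps naturally onto coroutines. Here is the same control flow sketched with a Ruby Fiber (illustrative only; ObjectFlow itself is a C# library and these step names mirror the sample above):

```ruby
# A workflow that runs until it yields a state, waits for outside
# interaction (e.g. a user responding to a page), then resumes exactly
# where it stopped.
steps = []

workflow = Fiber.new do |visit|
  steps << :gather_information         # like .Do(x => x.GatherInformation())
  Fiber.yield :open                    # like .Yield(...): pause for the user
  steps << :post_visit                 # like .Do(x => x.PostVisit())
  :closed
end

state  = workflow.resume(:site_visit)  # runs until the yield; returns :open
result = workflow.resume               # user responded; finishes the workflow
```

In a real system the paused state would be persisted to a database between the two `resume` calls, which is exactly what the `wf.Start(visit)` / `IsAliveInWorkflow` pair enables.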
<h2>Comments</h2>
<div class='comments'>
<div class='comment'>
<div class='author'>De Wet</div>
<div class='content'>
Hi <br /><br />I am also looking for a light weight workflow where we can create the workflow but the user is allowed to add users to a workflow step. So that the same page will open for every user that was added. Only when all users have approved the step it will continue to the next step. <br /><br />You dont have a sample of using Objectflow and how to display a page to the user?</div>
</div>
</div>
Crass grammar drives me crazy2011-03-04T00:00:00+00:00http://timkellogg.me/blog/2011/03/04/crass-grammar-drives-me-crazy<div class='post'>
I recently had a conversation with someone that went something like:<br /><br /><i>Me:</i> Yeah, I went to the Sunflower market down on 287 & South Boulder Rd<br /><i>PersonX:</i> That's one long ass walk<br /><br />How am I supposed to reply to that? I could say, "Not really, I wasn't ass walking the whole way" or "Yes, my ass is long, I should get in shape". No wonder people have such a hard time learning English...</div>
I'm becoming a DVCS snob2011-03-03T00:00:00+00:00http://timkellogg.me/blog/2011/03/03/im-becoming-dvcs-snob<div class='post'>
Today I was looking at open source workflow frameworks for work and paused on <a href="http://objectflow.codeplex.com/">objectflow</a>. I almost decided not to use the library because they're still using SVN or TFS (I'm not really sure which) even though codeplex supports <a href="http://mercurial.selenic.com/">Mercurial</a>.<br /><br />I'm coming in with the idea that I may contribute to the project if I find, down the road, that I have something that could be added to the project. Submitting patches seems so painful compared to a simple pull request. The workflow of a distributed version control system (DVCS) makes sharing code so incredibly easy that it causes me psychological pain to think about going back to SVN.<br /><br />On the other hand, one benefit of objectflow being available as SVN is that I can easily use git-svn to create a git clone that can be included as a submodule. It wouldn't be quite as straight-forward if it were a Mercurial repository. Submodules are an excellent feature of Git!</div>
NUnit Extension Methods2011-02-26T00:00:00+00:00http://timkellogg.me/blog/2011/02/26/nunit-extension-methods<div class='post'>
I've always used NUnit for testing code so it's naturally the framework I'm most familiar with (I haven't used anything else). I learned unit testing using the classic <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">Assert.AreEqual(expected, actual)</span> methods. However, I was finding my tests slightly confusing to read - I could never remember which comes first, expected or actual.<br /><br />More recently I've been getting into v2.5 including the new asserts - <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">Assert.That(actual, Is.EqualTo(expected))</span>. I think this reads much more clearly, and I now find myself using <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">Assert.That</span> most of the time.<br /><br />Recently, a coworker created a few extension methods that I'm finding quite handy:<br /><br /><pre class="brush: c#">public static void ShouldBe(this object @this, object expected) {<br /> Assert.AreEqual((dynamic)expected, (dynamic)@this);<br />}<br />public static void ShouldNotBe(this object @this, object expected) {<br /> Assert.AreNotEqual((dynamic)expected, (dynamic)@this);<br />}<br />public static void ShouldBeNull(this object @this) {<br /> Assert.IsNull(@this);<br />}<br />public static void ShouldNotBeNull(this object @this) {<br /> Assert.IsNotNull(@this);<br />}</pre><br />I've completely fallen in love with how this reads: <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">actual.ShouldBe(expected)</span>. It also makes me giggle to do <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">actual.ShouldBeNull()</span><span class="Apple-style-span" style="font-family: inherit;"> (Don't you love extension methods?)</span>. This makes unit testing so easy...</div>
The internal secrets of Git2011-02-13T00:00:00+00:00http://timkellogg.me/blog/2011/02/13/internal-secrets-of-git<div class='post'>
Thursday night I attended a lecture at the Boulder Linux users' group called <i><a href="http://sea.ucar.edu/event/unlocking-secrets-git">Unlocking the Secrets of Git</a></i> by <a href="https://github.com/mojombo">Tom</a>, one of the co-founders of <a href="http://github.com/">Github</a>. This was extremely eye-opening. Up until now I had viewed Git as simply a distributed version control system. Tom showed us how to manipulate Git's <a href="http://progit.org/book/ch9-2.html">internal file format</a> and demonstrated that Git is actually a filesystem in userspace with built-in versioning and synchronization. He demonstrated how, by storing a SHA1 hash of files, Git is (1) extremely fast at comparing files and (2) doesn't actually care about the file name - it just cares about the contents of files. This is important when you're renaming files - the filename is generally unimportant in the grand scheme of things.<br /><br />Tom also showed us several open source projects that build upon the concept of Git as a filesystem. One was a <a href="http://github.com/apenwarr/bup">highly efficient backup system</a>. Another was a <a href="http://github.com/mojombo/jekyll">static site generator</a>. There were many more. The point here is that Git is destined to be not just version control; it will be a feature-complete platform for anything that requires a filesystem with versioning and synchronization.<br /><br />The critical component to the success of Git as a platform is <a href="http://libgit2.github.com/">libgit2</a>, a C library for interacting with Git. The reason why this is the critical component is that many people had been re-creating the functionality of Git. By combining this functionality into a library, the logic only has to be written once and can be used by everyone else. 
The other reason this is a critical component is that libgit2 is being released under a permissive license that allows it to be easily used by many other people and projects without getting into any legal snafus. <br /><br />Most importantly, Thursday night I realized that the tech community of Boulder is so complex and complete, I should never get bored here. I haven't lived here for a full six months yet but already I feel like I can't leave this city.</div>
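The content-addressing behavior described above is easy to poke at with git's plumbing commands. A minimal sketch in a throwaway repository (file names and contents are made up):

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

# Hash the same content under two different names; the object id
# depends only on the bytes, not the filename.
echo 'hello git' > a.txt
cp a.txt b.txt
git hash-object a.txt
git hash-object b.txt   # same SHA-1 as a.txt: identical content

# Write the blob into the object database and read it back.
oid=$(git hash-object -w a.txt)
git cat-file -t "$oid"  # a 'blob', git's file primitive
git cat-file -p "$oid"  # prints the original content
```

Renames fall out of this design for free: the tree objects map names to hashes, so a rename is just a new tree pointing at the same blob.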
Mind control2011-01-19T00:00:00+00:00http://timkellogg.me/blog/2011/01/19/mind-control<div class='post'>
I found <a href="https://github.com/blog/771-mind-control-with-frickin-lasers">this blog post</a> about a couple of Harvard students who wrote some [GPL'd] software for controlling worms' minds. They can control how these worms move and even make them lay eggs!<br /><br />The implications of this are obviously huge. This is only an academic project now, but in a couple of decades I wonder if we'll see animals used like machines. I guess there are several other ideas you could draw from this, but no matter how you view it, it's a fascinating idea.</div>
Declaring the Future of Programming2011-01-09T00:00:00+00:00http://timkellogg.me/blog/2011/01/09/declaring-future-of-programming<div class='post'>
Programming languages have developed significantly over the past several decades. I hypothesize that this development has tended more towards declarative syntax than imperative. The future of programming languages will only become more declarative in the years to come.<br /><br />In the beginning was machine code. Programmers wrote programs by stringing together arcane byte codes of instructions and parameters. Programs were getting pretty hard to read, so assemblers were created to let you write instructions in plain text, complete with comments. An assembler program would process the source code and turn each instruction into its equivalent machine code. This is imperative programming in its purest state.<br /><br />When the first C compiler was written it immediately became popular because the programmer only had to declare what should happen in the program and the compiler would generate the necessary machine code to make that happen. That's why you can write a C program that can be compiled for Linux, Windows and Mac with zero changes to the source code. However, C and C++ are still imperative languages in most other aspects because the thought process is still very much a "do this, now do this, now do this" algorithmic sequence of instructions.<br /><br /><span class="Apple-style-span" style="font-size: large;">Query Languages</span><br /><br />The hallmark of declarative languages throughout history is probably SQL (referring strictly to set operations here). In SQL you describe the result set and let the DBMS decide the best way to produce that result set. For instance, consider this query:<br /><br /><pre class="brush: sql">select p.FirstName, p.LastName, a.AccountName<br />from Person p<br />inner join Account a<br />on p.PersonId = a.ResponsiblePerson<br />where a.IsActive = 1<br />order by p.FirstName, p.LastName<br /></pre><br />First we describe the columns that we want (this actually happens last, if you want to be technical). 
In the from clause we say what tables we want information from and specify how we want them matched up using the on clause of the join. In the where clause we specify the criteria for the rows that we want to show and in the order by we describe the sort order.<br /><br />All this was done strictly declaratively. If you have the opportunity to look at the execution plan, it all ends up being quite elaborate. It might consult two or three indexes before actually joining rows, selecting columns and ordering the result set not to mention all the asynchronous locking that took place so as not to run into race conditions. If we had to write this in C# or Java code it would be an extremely gnarly component and would probably be buggy and slow.<br /><br /><span class="Apple-style-span" style="font-size: large;">Expression Trees in C#</span><br /><br />Interestingly, .NET land is also developing into a declarative playground. The biggest step in this direction happened with Linq and its expression trees. Now, the Linq query syntax is declarative, but I'm referring to something more basic. Expression trees can be broken down at run time by a processor that can analyze the contents of a lambda that it was passed. For instance, NHibernate can receive a method call like:<br /><br /><pre class="brush: csharp">var timsAccounts = accounts.Where(x => x.ResponsiblePerson == "Tim");<br /></pre><br />and pull out the meaning (ResponsiblePerson = Tim) and convert it into a SQL "where" clause at run time (sql = "where a.ResponsiblePerson = 'Tim'"). The implications of this are wild, and in recent months and years have become very powerful. Examples include <a href="http://wiki.fluentnhibernate.org/Fluent_mapping">Fluent NHibernate</a>, <a href="http://code.google.com/p/moq/wiki/QuickStart">Moq</a>, and Castle Windsor's <a href="http://using.castleproject.org/display/IoC/Fluent+Registration+API">fluent registration API</a>. 
Both Castle Windsor and NHibernate used to use XML configuration files but have since moved towards using expression trees in combination with dynamic proxies and interceptors to configure via code. This declarative approach is leading towards less code that has the potential to be more efficient.<br /><br /><span class="Apple-style-span" style="font-size: large;">Treatise on Domain Specific Languages</span><br /><br />The topic of <a href="http://en.wikipedia.org/wiki/Domain-specific_language">domain specific languages</a> deserves an entire blog post. SQL and CSS are the obvious examples, but there are hundreds more. In one of my internships a coworker wrote a DSL to specify sort order for dictionaries for arcane natural languages and scripts. A simple DSL is much easier to develop than a GUI for the same purpose and can often be easier for a non-techy user to learn and become productive in.<br /><br />The sad news is that colleges and universities are putting less focus on compiler & parser classes. The assumption is that we have all the languages we need - why would we need more? The answer is simple: by providing a simple syntax to describe problems or solutions we can simplify the entire process of arriving at that solution. If the problem is abstracted away from the solution we can easily leverage constructs like multi-threading and highly optimized solutions. Sometime you should take a look at the byte codes that your compiler produces - ask yourself if you could have even thought of those sorts of mind-bending tricks.<br /><br />We need domain specific languages because they simplify problems. They create more effective abstraction than even <a href="http://timkellogg.blogspot.com/2010/05/incidental-inversion-of-control.html">inversion of control</a> frameworks. Unfortunately, fewer people are learning about string processing these days. 
How many people have you worked with actually consider themselves proficient in regular expressions or compiler generators? (yet two more declarative DSLs that simplify solutions)<br /><br /><span class="Apple-style-span" style="font-size: large;">Conclusion</span><br /><br />Anytime you write code that is less imperative, it allows the layer underneath more room to innovate efficient algorithms. Surely this isn't surprising since any good programmer would feel exactly the same way towards a micro-managing supervisor. So after saying all this, it should be clear why I believe that the future of programming is declarative. Declarative syntaxes allow us to simplify the problem by simply stating what the problem is (or describing what the solution looks like) and allowing the underlying engine to determine the solution. As such, I believe we will be seeing the number of domain specific languages multiply in the years to come.</div>
Would I choose Git again?2011-01-02T00:00:00+00:00http://timkellogg.me/blog/2011/01/02/would-i-choose-git-again<div class='post'>
I wrote <a href="http://newline/">a post</a> a few months ago about the reasons we chose to use Git over subversion and I think it's time to follow up that post and write about how it's gone so far. We're an ASP.NET outfit, and as such there are a few considerations that might not apply to, say, the Linux kernel team. I'm going to break this up into three parts: my perspective, my team's perspective, and some tips for anyone who might want to also try using Git.<br /><br /><span class="Apple-style-span" style="font-size: large;">My Experiences With Git</span><br />I seriously love using Git. I make a branch for everything I do just like they recommend. An old-school member of our team made a comment, "we always considered branches as something to be avoided", hinting at SVN branches' trait of being hard to manage and keep in sync with the trunk. Git branches are very different from SVN branches - they are very light and easy to keep up to date.<br /><br />Git has some seriously awesome merging mechanisms. First, you can select from a list of merge algorithms (you really only need one of these, but hey, it's great to have choices just in case). Then they also have rebase and cherry-picking. These last two aren't regular merges because their algorithms look at the history of the entire repository and make several [and possibly hundreds of] incremental merges. Because these schemes take history into account, you can actually do some serious refactoring and still apply patches to both the production and development branches with relatively little effort.<br /><br />Our team develops and maintains a web application that our company sells as a service. As such, we don't spend time on installers or maintaining previous versions because the only versions that matter are the version that's in production and the development version. Git allows us to cherry-pick hotfixes from development into production (or vice versa) without really thinking much. 
This would have been a small nightmare in SVN (and invoke suicidal tendencies in TFS). Back when we were using TFS there really wasn't any process or procedure that went into hotfixes. You basically just updated production. With Git, it's incredibly easy to just stash whatever you're doing, checkout the production branch, fix a critical bug, test & deploy it, and then cherry-pick it back into the dev branch. Git works well for people who get interrupted by escalations (everyone??).<br /><br /><span class="Apple-style-span" style="font-size: large;">My Team's Experiences</span><br />My team hates Git. Well, that's a bit harsh and premature, but there was some backlash when we first switched. About three weeks in I gave a brown bag lunch presentation on Git to teach everyone how to use it. After that people generally caught on to the basics with the exception of some merging snafus.<br /><br />Merging is actually an interesting point. TFS merging drove me nuts. Perhaps it was just the merge program, but I always felt like I had my hands tied. Now that we're using Git I feel free again to branch and merge at will, but one of my teammates seemed to be (at least at first) completely confused by Git merging. This was [probably] entirely due to the fact that Git Extensions didn't come with <a href="http://kdiff3.sourceforge.net/">kdiff </a>by default (they now offer a convenient all-in-one installer that includes kdiff & Git).<br /><br />Another point of confusion in using Git GUIs was that <a href="http://code.google.com/p/tortoisegit/">TortoiseGit </a>makes it very difficult to see what's different between local and remote repositories. I think the Tortoise crew made too much of an effort to make it feel like TortoiseSVN when in reality it left some very important questions unanswered (TortoiseSVN only has to answer 1 or 2 important questions, but Git GUIs need to answer 4 or 5 important questions). Among these unanswered questions are "what branch am I on?" 
and "have I pushed this to the server yet?". TortoiseGit doesn't provide a clear answer to either of these questions, so I had everyone make a switch to <a href="http://code.google.com/p/gitextensions/">Git Extensions</a>.<br /><br /><span class="Apple-style-span" style="font-size: large;">Tips for Future Git Users</span><br />We were forced to learn a few lessons pretty quickly. I'll list them here in paragraph format...<br /><br /><a href="http://en.wikipedia.org/wiki/Graphical_user_interface">GUIs </a>are still young. Most Git users are sick Linux users who live by vi & grep, so developing a decent GUI hasn't really been a priority for Git (there is an official Git GUI that ships with Git, but it possesses some serious <a href="http://www.urbandictionary.com/define.php?term=suckage">suckage</a>). If you work in a Microsoft/Windows outfit there is no conceivable way your coworkers will be happy with command line, so a good GUI is critical. <b>Use <a href="http://code.google.com/p/gitextensions/">Git Extensions</a>!</b><br /><b><br /></b><br />Setting up a central server is not entirely straightforward. While SVN is distributed as either a client or a server, Git has no reason to require a central server so this was also an afterthought. <b>Use <a href="https://github.com/sitaramc/gitolite#start">gitolite</a> on Linux</b>. Use the package manager method of installing it, it's very easy to get started and it's also easy to maintain.<br /><br />SSH keys are problematic. Try to use putty/plink to manage keys if possible. OpenSSH is very un-Windows-like.<br /><br />Unit tests are good and they can make Git shine even brighter. If you maintain a generally complete unit test suite you can have Git utilize your test runner to quickly find where code started breaking. 
The "bisect" command can take a program or command that returns 0 or 1 (standard success/failure codes, so throwing exceptions would work) and perform a binary search through past commits to find the first place where a test started failing. This could also work great if you're a scripting guru - write a short script to check for some text (like "CREATE TABLE X") in a particular file and Git will do the leg work.<br /><br /><span class="Apple-style-span" style="font-size: large;">Conclusive Thoughts</span><br />Git is very powerful and can adapt to any workflow. If process is important to you, Git will enable you in whatever process you choose. If process isn't important, Git won't get in your way. It is very scalable via its distributed nature (ref <a href="http://whygitisbetterthanx.com/#any-workflow">dictator and lieutenants</a>). It's also great for small personal projects that I do in my spare time. I can still have code version controlled without sharing it with anyone, but when I want to I can push it to <a href="https://github.com/">Github</a> (another awesome idea). However, if your coworkers are generally stagnant and opposed to change, Git will drive them nuts and you will hate your life. <i>Choose Git only if you want a program that will abstract away mundane tasks like merging but you don't mind having to change your world view towards version control.</i></div>
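Both the hotfix flow and the bisect trick from this post can be sketched end to end in a throwaway repository. Branch names, commit messages, and the "BUG" marker below are all invented for illustration; `bisect run` just needs any command that exits non-zero on a bad revision:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email you@example.com
git config user.name You
dev=$(git symbolic-ref --short HEAD)   # development branch, whatever init named it

# A few healthy commits, then branch production off.
for i in 1 2 3; do
  echo "feature $i" >> app.txt
  git add app.txt
  git commit -qm "feature $i"
done
git branch production                  # production trails development here
good=$(git rev-parse HEAD)             # last revision known to be bug-free

# Development continues; the fifth commit introduces a bug.
echo "feature 4" >> app.txt && git add app.txt && git commit -qm "feature 4"
echo "BUG" >> app.txt && git add app.txt && git commit -qm "feature 5 (bad)"

# Hotfix flow: jump to production, fix, then cherry-pick back into dev.
git checkout -q production
echo "hotfix" > fix.txt
git add fix.txt
git commit -qm "hotfix: escalation"
git checkout -q "$dev"
git cherry-pick -x production          # -x records the source commit id

# Bisect flow: binary-search for the commit that introduced "BUG".
git bisect start HEAD "$good"
git bisect run sh -c '! grep -q BUG app.txt'
first_bad=$(git rev-parse refs/bisect/bad)
git bisect reset
git log -1 --format=%s "$first_bad"
```

The same `bisect run` idea works with a real test runner in place of the grep, assuming the runner reports failure through its exit code.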
Why Linux Sucks2010-11-04T00:00:00+00:00http://timkellogg.me/blog/2010/11/04/why-linux-sucks<div class='post'>
Just to be clear, I have had Linux on my main home computer for several years. In fact, I'm writing this on Linux and I'm not having any problems. I have no intention of giving up Linux. I like how it works and I like tinkering with the different parts of it. <br /><br />I use Ubuntu. Ubuntu really is Linux for humans - it's easy to use and everything just works. Well...almost everything. I installed the 64-bit version and Adobe didn't support 64-bit flash for a long time (and I couldn't install 32-bit Firefox). Seriously, how many web sites use flash? Essentially every site that my wife and I both use. My wife hates Linux. <br /><br />There are two sides to the Linux community. There are the people who want to see Linux for the masses (Canonical & team) and then there are the hardcore users.<br /><br />The thing that really gets me about Linux is that the hardcore users have no intention of making Linux easier to use. I usually don't have a problem finding Linux help on the Internet, but the gurus who answer Linux questions aren't particularly easygoing. I've spent enough time reading through forums for Linux help that I know that they follow a strict rubric:<br /><ol><li>Always use the command line. The biggest thing is installing new programs and packages. They could easily tell someone that they need to install package <i>x</i>, but instead they always use the command line:<br /><pre class="brush: shell">sudo apt-get install destroy_linux<br /></pre>Seriously, why can't you just use the pretty UI that Ubuntu created for installing software? I know they are easy commands, but seriously. Not making things easy for my wife.</li><li>Always make things more complicated than necessary. Usually this involves using the command line with three times as many commands as you really need. They also chastise people for asking silly questions.</li><li>Keep things magical. Magical lands are fun at Disneyland, but I hate punching inexplicably terse text into a console. 
The terms and commands become shorter and less descriptive as you get deeper into Linux (there is no end). Don't try to understand.</li></ol>There has always been this expectation that eventually everyone will cling to Linux and reject Windows. I think that day won't come until most of the Linux kernel development team & posse have died/started using Windows. The problem with Linux is, and will continue to be for the foreseeable future, its users.</div>
Object-Form mapping2010-10-19T00:00:00+00:00http://timkellogg.me/blog/2010/10/19/object-form-mapping<div class='post'>
I'm pretty sure most developers (web developers anyway) have heard of ORM (Object Relational Mapping) tools like NHibernate that map your database tables to objects. These ORM tools reduce interaction with the database to just a few method calls, many times just Save(), GetById(), and a few custom query methods. There's a lot written about ORM, but no one really writes about the mapping between HTML forms and the objects that ORM maps.<br /><br />ASP.NET has a great solution for OFM (I'm calling it OFM because Google won't give me a real name for it). If you use a FormView in combination with an ObjectDataSource you can bind the properties of your object to form elements. This is pretty cool because it reduces your code to writing an ORM mapping, creating factory methods to get and save the object, and some ASP markup that maps the object to HTML elements.<br /><br />I was playing with Ruby on Rails, which has a somewhat different approach to OFM. Basically you write regular HTML and give your form elements names like "account[id]", "account[name]", etc. This seems like a little more work than the ASP.NET way except that on the server side it uses this notation to wrap the query string into an object that can be referenced in object notation from Ruby code like "account.id", "account.name", etc. I believe PHP does something similar. I like this method because it's very light on HTTP - there's no obstructively bloated view state being passed around like there is in ASP.NET and you can pass several objects through the query string.<br /><br />Basically, OFM manages some of the page flow by marshalling form parameters into objects that can easily be passed to a factory method. This is awesome because it means I can focus more effort on writing unit tests for business logic that has no dependencies on the web API. 
It allows me to keep page flow simple and sets up business logic for creating RESTful web services (seriously, you could just slap [WebMethod] attributes on the factory methods and <i>voila</i> you have web services). There seems to be a lot of framework that goes into managing OFM, but oddly I don't think many people have addressed it directly as a problem that needs to be overcome (I assume this is because the MVC architecture is supposed to address this; unfortunately vanilla ASP.NET isn't MVC).<br /><br />I recently pulled most of my hair out over the ObjectDataSource and interfacing with factory methods. In the future I want to write a post about how I got around it (and another one lambasting Microsoft for even attempting to release an API as thoughtless as the ODS, but seriously, more on that later).</div>
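As a toy illustration of that naming convention (the account fields are invented, and this is nothing like Rails' real implementation — just the idea), folding a flat query string back into per-attribute values is a one-liner:

```shell
# Fold "object[attr]=value" pairs from a query string into "attr=value" lines,
# roughly what Rails does server-side before handing you params[:account].
query='account[id]=42&account[name]=Tim'

parse_object() {  # $1 = object name, $2 = raw query string
  printf '%s\n' "$2" | tr '&' '\n' | sed -n "s/^$1\[\([^]]*\)\]=/\1=/p"
}

parse_object account "$query"
```

The output is one "attr=value" line per form field, which is exactly the shape a factory method wants as input.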
<h2>Comments</h2>
<div class='comments'>
<div class='comment'>
<div class='author'>Anonymous</div>
<div class='content'>
found it by googleing "object form mapping", things with no or unknown names are hard to find with google :D, why not calling it OFM.<br />Nice subjects on your blog :).</div>
</div>
</div>
Why we chose Git instead of Subversion2010-10-12T00:00:00+00:00http://timkellogg.me/blog/2010/10/12/why-we-chose-git-instead-of-subversion<div class='post'>
I just got a new job as an ASP.NET developer at a small company that is freshly developing itself into somewhat of a software company. The development team has undergone a ton of changes over the past 6 months (6 months ago there were two developers, now there are five, as well as a new director of technology). As part of our changes we took some time to evaluate the tools we use. We had been using Microsoft's Team Foundation Server for source control and a home-grown system for bug tracking, but after our evaluations we settled on <a href="http://www.redmine.org/">Redmine</a> and <a href="http://git-scm.com/">Git</a>.<br /><br />The fact that we are using Redmine for ALM and bug tracking isn't particularly surprising to me because it's a feature-heavy and mature product that is very natural to use. There are several other feature-heavy, mature ALM tools that would fit us, but none that are free (I don't consider Trac feature-heavy). Git, however, is a bit of a pleasant surprise for me.<br /><br />For the uninitiated, Git is a distributed SCM (source control management) tool. The distributed part means that it works kind of like Subversion except that everyone has a full clone of the repository. When you want to check your code in you commit first to yourself and then push your changes to the rest of the team. More realistically, you would be committing to yourself several times and occasionally pushing your changes to the rest of the team when you verify that your code is stable.<br /><br />The benefit of this is that you can maintain your own personal branches of the code where you experiment on certain features without having to push them out to everyone else. I see this as psychologically breaking down the barrier to committing code. I often find that I don't commit code for a while because, even though it builds, I'm not sure if some of the pages will run without errors. 
However, committing to myself means that I can commit whenever I want and not slow any of my teammates down with potential errors.<br /><br />Git also provides very easy and simple branching. They made it extremely easy to drop everything you're doing to fix that top-priority bug in production (the "stash" operation lets you save uncommitted changes and move to another part of the code). With this extra change management, Git also forces you to account for all your changes. Before you switch branches you have to either stash, commit or revert your current changes. At first this seems annoying, but on second thought it forces you to always have some sort of accounting for why you changed stuff.<br /><br />We did have some hesitation with changing to Git. Our biggest concern was whether one of our partner teams from a different company could keep up with a change in SCM. After some evaluation we realized that Git provided so much flexibility with managing our workflow with this partner that it makes Subversion look like an archaic hack.<br /><br />Another concern we had was stability. Git itself has been around since 2005 and seems to have a pretty strong development community backing it. It has a very strong Linux following and a year ago lacked a good Windows interface. However, <a href="http://code.google.com/p/tortoisegit/">TortoiseGit</a> has been developing at a very rapid rate (its single developer has been releasing more than twice a month and is quickly working toward supporting most of Git's features). Because it is developing so fast we agreed that we could disregard shortcomings in the Windows environment due to the awesome number and power of the features it brings.<br /><br />Today I worked on importing our TFS repository into a Git clone. I found <a href="http://github.com/WilbertOnGithub/TFS2GIT">a PowerShell script</a> hosted on Github that got me pretty close. 
The code in the script was a little too brittle so I made the code a little more generic and sent it back to him. It's taking about six hours to migrate the 1200 changesets into Git, so the script probably won't finish running for another couple hours, but I think it's working so far.<br /><br />I will have to follow up in six months or so with an evaluation of how things have gone.</div>
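The stash-then-hotfix round trip described above looks roughly like this in a throwaway repo (branch and file names are invented; the default branch name is detected rather than assumed):

```shell
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q
git config user.email you@example.com
git config user.name you

echo v1 > feature.txt
git add feature.txt
git commit -qm "initial"
dev=$(git symbolic-ref --short HEAD)   # whatever the default branch is called
git branch production

# Mid-feature: uncommitted work in progress...
echo "half-done change" >> feature.txt

# Escalation: park the work, fix production, then come back.
git stash push -q                      # save uncommitted changes
git checkout -q production
echo hotfix > hotfix.txt
git add hotfix.txt
git commit -qm "critical hotfix"
git checkout -q "$dev"
git cherry-pick production >/dev/null  # bring the hotfix into dev too
git stash pop -q                       # resume exactly where we left off
```

Nothing is lost in the interruption: the hotfix lands on both branches, and the half-finished change comes back out of the stash untouched.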
<h2>Comments</h2>
<div class='comments'>
<div class='comment'>
<div class='author'>Anonymous</div>
<div class='content'>
nice thank you, i will have a look at this Git.</div>
</div>
</div>
CouchDB + Ext as a Replacement for Server Code2010-06-08T00:00:00+00:00http://timkellogg.me/blog/2010/06/08/couchdb-ext-as-replacement-for-server<div class='post'>
In a previous post about ExtJS I mentioned the possibility of developing a web application that runs entirely inside the browser and doesn't require any server side code. The idea stems from a) ExtJS is a fully capable widget framework and b) CouchDB is accessible via a web service. At least 80% of web apps are just an HTML interface with a database back-end and a little bit of business logic. So why can't we move all that business logic to the browser, set up calls to a <a href="http://couchdb.apache.org/docs/intro.html">CouchDB</a> web service from the browser and 86 the server-side code? In this post I'm going to analyze this question and see if it's realistic. In a follow-up post I'm going to analyze this same question from a business standpoint.<br /><br /><b><span style="font-size: large;">A Database Void of Schema</span></b><br /><br />CouchDB is a document oriented database, meaning that it doesn't have tables and keys like you do in relational databases. It just has one big space full of documents. A document in CouchDB is a JSON object, so its attribute values can be strings, booleans, numbers, lists, or other objects (documents). Having complex "rows" means that many of the relationships you would normally form by using a second table and a primary-foreign key set are simplified down to embedding a list. Consequently, 1-to-1 and 1-to-many relationships are native to the database and require no extra thought or planning. Many-to-many relationships are <a href="http://wiki.apache.org/couchdb/EntityRelationship">more complicated</a>, so this approach might break down if you require too many of these. Some other oddities in relational databases like versioning and pivot tables come native with CouchDB. 
Since the bulk of our database requirements are made easier with CouchDB, querying is going to be generally simpler.<br /><br />The other great thing about having a document formatted in JSON is that you can save any JavaScript object directly to the database. You could save the state of an Ext widget or a whole form. It's like simplified object serialization for the browser! This is definitely a killer argument for making fat client apps with Ext.<br /><br /><b><span style="font-size: large;">But What About Performance?</span></b><br /><br />At some point, someone's going to ask it. I say: Twitter uses it and they seem to be doing well, so that's one case where it's proven itself. The biggest argument for CouchDB being a scalable database is the fact that it is built from the ground up with the intent of being distributed across many nodes in a cloud. So while it is easy to get a database stood up for development, it's just as easy to move that database into a highly distributed cloud with hundreds of nodes. This makes it easy to develop scalable world-class apps like Twitter or Google.<br /><br />CouchDB uses a type of index that is based on the <a href="http://en.wikipedia.org/wiki/MapReduce">map-reduce</a> algorithm used in functional programming. You define a function in JavaScript that takes a list of values and chooses which ones to include in a <a href="http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views">view</a>. When you want to query the database you just ask it for all or part of a view. Because it uses the map-reduce algorithm to index, it's agnostic towards when and how many documents are indexed at a time. 
So documents can be quickly indexed on insertion/creation, or the whole database can be indexed at one time.<br /><br />If the client code is developed entirely in Ext and JavaScript, all forms are static HTML pages, so the server can easily respond to 80% of requests with little more than <a href="http://www.mozilla.org/projects/netlib/http/http-caching-faq.html">a few HTTP headers</a> (client-side caching). <br /><br /><span style="font-size: large;"><b>What About Security?</b></span><br /><br />At this point someone must be ready to blurt out something about this being an incredibly insecure approach to web development. After all, any slick hacker can modify the JavaScript code and execute arbitrary insertions/deletions. Security is definitely going to be a much bigger concern in this case. However, CouchDB does provide fine-grained security controls. <a href="http://www.youtube.com/watch?v=oHKvV3Nh-CI">Here</a> is an informative video about CouchDB security controls.<br /><br />The big difference with designing security into couch apps is that security is going to be built into the database instead of the application. CouchDB provides constructs for users to be part of roles. If constructed well, the developer can leverage the database to deny or allow certain operations for the current user.<br /><br />Taking the Ext + CouchDB approach is going to be a fundamental shift in application design. If we learned to write apps like this we might actually learn to rely on the <a href="http://timkellogg.blogspot.com/2010/05/incidental-inversion-of-control.html">framework</a> to do what it does best, and let our app do only what it needs to do. We might even find ourselves making stable and secure apps in less time.<br /><br /><span style="font-size: large;"><b>Conclusion</b></span><br /><br />From a technical perspective, I think this might be a very feasible design paradigm. In a coming post I am going to talk about the business costs involved. 
However, I think document oriented databases might be something I want to investigate further and design into future applications.</div>
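For a sense of what such a view looks like, here is a sketch of a CouchDB design document holding a map/reduce view. The database, design-document, and field names are invented for the example; normally you would PUT the JSON to a running server rather than just write it to a file.

```shell
# A design document with one view: count posts per author.
# To install it against a running CouchDB you would do something like:
#   curl -X PUT http://localhost:5984/mydb/_design/posts -d @design.json
# Here we only write the file so the shape is visible.
dir=$(mktemp -d)
cd "$dir"
cat > design.json <<'EOF'
{
  "_id": "_design/posts",
  "views": {
    "by_author": {
      "map": "function(doc) { if (doc.type === 'post') { emit(doc.author, 1); } }",
      "reduce": "_count"
    }
  }
}
EOF
cat design.json
```

"_count" is one of CouchDB's built-in reduce functions; querying the view with ?group=true would then return one row per author with that author's post count.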
Playing With ExtJS2010-06-02T00:00:00+00:00http://timkellogg.me/blog/2010/06/02/playing-with-extjs<div class='post'>
I've worked with the JavaScript framework <a href="http://jquery.com/">jQuery</a> before and I've heard of <a href="http://www.extjs.com/">Ext JS</a>, but I wanted to try it for myself. Essentially, the main difference between jQuery and Ext is that while jQuery works great tacked on top of server-side frameworks like <a href="http://www.asp.net/">ASP.NET</a> or <a href="http://java.sun.com/javaee/javaserverfaces/">JSF</a>, Ext is more of a replacement for those frameworks. Coding in Ext feels like Swing or Windows Forms, but for the browser.<br /><br />Since Ext forms live completely inside the browser's memory space, there isn't a <a href="http://www.xefteri.com/articles/show.cfm?id=18">postback</a> every time you click a button or expand a tree node like there is in ASP.NET. I think the delay from a postback makes the user experience feel choppy, especially if you don't have a fast internet connection. ASP.NET makes it very easy to hook into any <a href="http://en.wikipedia.org/wiki/Document_Object_Model">DOM</a> event, but since these event handlers live on the server, hooking into them causes a postback, which in turn causes the whole page to reload. Moving all this event handling logic to the browser makes the application feel a lot faster.<br /><br />Since Ext requires so much JavaScript coding (and so little HTML coding), you should invest in a good JavaScript editor. <a href="http://www.extjs.com/products/designer/">Ext Designer</a> is a WYSIWYG drag-n-drop editor for Ext controls. 
The pricing seems kind of steep to me, $219 for a <a href="http://www.extjs.com/store/designer/">single developer license</a>, but I suppose that if you're going to use it a lot then it's probably worth the money. Take a look at the screenshot of Ext Designer below.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="http://2.bp.blogspot.com/_7Sytqc9_ICY/TAKsf1OKV0I/AAAAAAAAAII/qWoNKyILE1Y/s1600/Screenshot-ExtDesigner.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://2.bp.blogspot.com/_7Sytqc9_ICY/TAKsf1OKV0I/AAAAAAAAAII/qWoNKyILE1Y/s320/Screenshot-ExtDesigner.png" /></a></div><br /><br />There's a list of controls on the left. You drag a control onto the form, re-size it, edit its properties, and preview the whole form. It's very easy for laying out the form (especially if you're not familiar with Ext). You can even set up all the data sources (AJAX calls) for controls like the grid or the tree, and then preview the form with real data. The major shortcoming is that it's only a UI designer - there is no integrated code editor. You have to export the project and add all the program logic and event handlers via another editor like <a href="http://www.aptana.org/studio">Eclipse</a>. On the other hand, using the designer in conjunction with another editor isn't particularly difficult if you keep the designer project in the same folder as the rest of your application; it's just a little painful to have to switch between applications, I suppose. <br /><br />This being my first experience with Ext, I have to say that I'm relatively impressed. JavaScript has come a long way since the days of dial-up modems and table-layout. With the dawn of <a href="http://www.taranfx.com/ie9-vs-chrome-vs-firefox-vs-opera">efficient browsers</a> and <a href="http://www.alistapart.com/articles/previewofhtml5">HTML5</a>, I think creating true fat client web applications is a reality. 
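To make the browser-side event model concrete, here is a minimal sketch. The widget wiring is from my memory of the Ext 3.x-era API and is illustrative only; the point is that the click handler is an ordinary JavaScript function, so handling it never round-trips to the server.

```javascript
// The handler is plain JavaScript that runs in the browser --
// no postback, no page reload.
function onSaveClick() {
    return 'saved in the browser, no page reload';
}

// Wiring it to an Ext widget would look roughly like this (requires
// the Ext library loaded in the page; config names are illustrative):
//
// Ext.onReady(function () {
//     new Ext.Button({
//         renderTo: Ext.getBody(),
//         text: 'Save',
//         handler: onSaveClick
//     });
// });
```

Contrast this with ASP.NET, where the equivalent handler lives on the server and every click reloads the page.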
In a follow-up post I am going to talk about CouchDB and how using Ext with CouchDB could possibly replace the need for server-side code altogether.</div>
Why I Decided To Start A Blog2010-05-29T00:00:00+00:00http://timkellogg.me/blog/2010/05/29/why-i-decided-to-start-blog<div class='post'>
I strongly believe that for every avenue of life we enter, we should leave it a better place. So for every job that I take, my goal is to leave a more efficient or more capable work group behind. By blogging I can bring up issues that I come across, and if I also bring up solutions to those problems, I give other people the chance to learn from my experiences.<br /><br />I also believe strongly in open source software (OSS). I wish there were more companies like <a href="http://code.google.com/">Google</a> that invest a lot of capital in developing OSS. From a business standpoint, when considering investment in public resources like OSS, it is hard to see the ROI. I think Google has done an exceptional job of finding revenue from OSS, and I think that is positive for the world.<br /><br />Blogging is similar to OSS in that blogs are a public resource written by regular people in their spare time (I wish I could be paid to develop OSS). I read a lot of blogs from other technical people. Some of them I <a href="http://www.nullorempty.com/">follow regularly</a>; others I end up inadvertently reading by googling for <a href="http://jcalderone.livejournal.com/39678.html">some technical problem</a>. Blogs are free content that adds value to our lives. <br /><br />Starting a blog was the result of a lot of thinking. It's been bugging me for a while that I read all these blogs and don't write one. I think it's important to give back at least a portion of what you consume. If you don't like what I have to say here, you don't have to read it. What I say here won't waste anyone's time or clog their inbox without their consent. Since this blog can't ever be a burden on society, it can only add value. So in that line of logic, this blog is necessary.</div>
Incidental Inversion of Control2010-05-28T00:00:00+00:00http://timkellogg.me/blog/2010/05/28/incidental-inversion-of-control<div class='post'>
This morning I started reading about the <a href="http://www.springsource.org/">Spring Framework</a> and, as usual, I followed a <a href="http://martinfowler.com/bliki/InversionOfControl.html">rabbit hole</a> to learn what the phrase <i>Inversion of Control</i> (IoC) means. IoC is also known as the <i>Hollywood Principle</i> ("don't call us, we'll call you"). A lot of programming frameworks use inversion of control to take care of the bulk of the work and leave your code to perform its task (and only its task).<br /><br />Most web frameworks are a good example of IoC. In Java web applications, the framework takes care of all the HTTP complexities and turns control over to your servlet or JSP when the time is right. This leaves your JSP to process the request and return a response - easy! The ASP.NET framework has an excellent inversion of control in its postback model. The framework allows applications to be built very much like Windows applications - the underlying framework takes care of display issues and calls into your application's code when the time is right. A lot of these calls are handlers for events like <i>Click</i>, <i>Load</i>, and others.<br /><br />As I read about this "new" concept I began to realize that it wasn't new at all. ASP.NET and J2EE use it extensively. In fact, I had created such a framework without realizing it. In the middle of last year I created a pluggable scheduler interface for our <a href="http://www.eq-technologic.com/">eQube</a> environment that allows the programmer to simply specify report names and filter values via XML; when it comes time to do something special, the programmer can hook into events and have the framework execute some JavaScript code at just the right moment.<br /><br />I stumbled into creating this framework after doing several short projects that required some boilerplate code to interface with the eQube APIs. 
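A rough sketch of what that hook-based design looks like (the names here are hypothetical, not the real eQube interface): the framework owns the report loop and calls back into user-supplied hooks, rather than the other way around.

```javascript
// Hypothetical sketch of a hook-based report scheduler: the framework
// drives the loop; user code only supplies small event hooks.
function runReport(spec, hooks) {
    var results = [];
    spec.filters.forEach(function (filterValue) {
        // Give user code a chance to tweak each filter value before the run.
        var value = hooks && hooks.beforeRun ? hooks.beforeRun(filterValue) : filterValue;
        results.push(spec.name + ':' + value);
    });
    return results;
}

// Running one report spec against three filter configurations:
var out = runReport(
    { name: 'sales', filters: [1, 2, 3] },
    { beforeRun: function (v) { return v * 10; } }
);
// out -> ['sales:10', 'sales:20', 'sales:30']
```

The caller never writes the loop; it only declares the spec and hooks into the events it cares about.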
It all happened quite innocently, but having taken the incidental route to an IoC framework, I have gotten much more value out of it than I expected. For instance, it is suddenly very easy to run a report with 400 different filter configurations. I just put together some XML to spec the report and throw in a block of JavaScript to change the filter values. The inversion of control takes away most of the responsibility and leaves me to do my job, and only my job.<br /><br />After today's lesson in inversion of control, I'm brainstorming new ways to use it. Perhaps even consolidating my other code into the scheduler framework, or maybe integrating it with the Spring Framework. As always, there's power in doing less.</div>