Selecting a method for traversing a graph involves two decisions. First, choose between depth-first and breadth-first. A depth-first approach fully explores each branch before continuing on to the next one. A breadth-first method visits each node on one level of the graph before moving on to the next. The choice between the two depends on the organization and expectations of the graph data. A depth-first search is more appropriate for graphs with very deep branches containing many nodes. In this application, the graph contains ten nodes with multiple potential paths between any two nodes. I selected a depth-first algorithm, but the graph is small enough that a breadth-first search would perform just as well.
The second question when selecting a graph traversal algorithm is iteration versus recursion. The trade-off here is a question of complexity: many people find recursion harder to reason about. The order in which the nodes are traversed can also vary slightly between the two methods. For this use case, I selected an iterative approach, mostly because I can’t remember previously writing an iterative, depth-first search algorithm.
This implementation uses two data structures: A stack to keep track of the nodes to search, and an array recording the nodes that have already been visited. The search itself reduces to a short loop, roughly:

1. Pop the next node off the stack (which initially holds only the origin).
2. If that node is the destination, stop; the search is complete.
3. Otherwise, add the node to the visited array.
4. Push each adjacent node that is not already in the visited array onto the stack.
5. Go back to the first step and repeat until the stack is empty.
In plain English, that’s all it takes to traverse the graph for a desired destination. This is an iterative approach, but it’s worth noting that the only difference between this and a recursive implementation is the stack. In a recursive method, the stack is implied because it’s the call stack. Steps one and five then change slightly. Instead of looping and going back to the first step, call the same function again on the next adjacent node.
The algorithm returns the destination node, but that’s not particularly helpful for this use case. We already knew there was a path between the origin and destination because we designed the graph! The real value of this feature lies in showing a highlighted route to the player and storing it for future reference. As the player flies to subsequent systems, update the pointer to the head of the path to keep track of the current route. The full algorithm has one new step and one new data structure. The new data structure is a dictionary with keys and values that are both graph nodes.
Again, roughly:

1. Pop the next node off the stack (which initially holds only the origin).
2. If that node is the destination, stop; the search is complete.
3. Otherwise, add the node to the visited array.
4. Record the current node in the path dictionary as the parent of each unvisited adjacent node.
5. Push those adjacent nodes onto the stack.
6. Go back to the first step and repeat until the stack is empty.
Unwind the path dictionary by starting from the destination and retracing the path back to its origin. Record each node along the way in a new array, and the result will be a list of nodes in the traveled path.
In outline:

1. Start at the destination node and add it to a new array.
2. Look up that node’s parent in the path dictionary.
3. Add the parent to the array.
4. Repeat from the second step until the origin is reached.
5. Reverse the array to produce the nodes in travel order.
With an array of nodes representing the navigational path in hand, a route can be painted for the player on the screen and stored somewhere for future reference.
The final implementation uses C# (this game is built in Unity). The definition of the GalaxyMapNode class is not shown – nor is the full listing for the class containing this method – but the few parts that are relevant can be discerned from context.
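What follows is a sketch in that spirit rather than the original listing, assuming a GalaxyMapNode that exposes a list of its adjacent nodes:

```csharp
using System.Collections.Generic;

public class GalaxyMapNode
{
    // Assumed shape: each node knows its neighbors.
    public string Name;
    public List<GalaxyMapNode> AdjacentNodes = new List<GalaxyMapNode>();
}

public static class GalaxyNavigator
{
    // Iterative depth-first search that records each node's parent so the
    // traveled path can be unwound afterwards.
    public static List<GalaxyMapNode> FindPath(GalaxyMapNode origin, GalaxyMapNode destination)
    {
        var stack = new Stack<GalaxyMapNode>();
        var visited = new List<GalaxyMapNode>();
        var parents = new Dictionary<GalaxyMapNode, GalaxyMapNode>();

        stack.Push(origin);

        while (stack.Count > 0)
        {
            GalaxyMapNode current = stack.Pop();

            if (current == destination)
                return UnwindPath(origin, destination, parents);

            if (visited.Contains(current))
                continue;

            visited.Add(current);

            foreach (GalaxyMapNode adjacent in current.AdjacentNodes)
            {
                if (!visited.Contains(adjacent))
                {
                    parents[adjacent] = current;
                    stack.Push(adjacent);
                }
            }
        }

        return null; // no path exists
    }

    // Retrace the parent dictionary from destination back to origin.
    static List<GalaxyMapNode> UnwindPath(GalaxyMapNode origin, GalaxyMapNode destination,
                                          Dictionary<GalaxyMapNode, GalaxyMapNode> parents)
    {
        var path = new List<GalaxyMapNode> { destination };
        GalaxyMapNode current = destination;

        while (current != origin)
        {
            current = parents[current];
            path.Add(current);
        }

        path.Reverse();
        return path;
    }
}
```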
This implementation of an iterative depth-first search is not optimal or perfect by any means. However, the size and structure of the graph are known, and perfect optimization isn’t necessary in order to meet the requirements.
Every model in a typical Rails application talks to a single database through ActiveRecord::Base. As the application grows, it may be useful to connect to different databases for a variety of reasons. One database might be dedicated to reports. Another may be the result of an entirely different process, and now the Rails application wants to read from it. Using multiple databases helps a Rails application scale, and may be a more manageable first step toward an architecture based on microservices.
Rails needs two things in order to back specific ActiveRecord models with different databases: A connection configuration and an establish_connection directive. First, the configuration.
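In config/database.yml, add an entry for the second database alongside the defaults (the names, adapter, and credentials here are illustrative):

```yaml
development:
  adapter: mysql2
  database: myapp_development
  username: root
  password:

reporting_development:
  adapter: mysql2
  database: reporting_development
  username: root
  password:
```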
If the new database has different connection or authentication options, make those additions.
Next, instruct Rails to use a different database for a particular model.
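A sketch of the model definition:

```ruby
class ReportUser < ActiveRecord::Base
  establish_connection "reporting_#{Rails.env}".to_sym
end
```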
When the ReportUser class is loaded, Rails creates an additional connection pool for the new database. All reads and writes involving this model now use the new database.
Those are the basics, but there are a few more things to think about when working with multiple databases in the same Rails app.
The ReportUser model works great if a report_users table already exists in the new database, but what about creating one from scratch? Generated migrations need a little tweaking because the default database is the assumed target.
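One way to do the tweaking (a sketch, with illustrative columns) is to run the migration’s statements over the model’s own connection instead of the default one:

```ruby
class CreateReportUsers < ActiveRecord::Migration
  def up
    ReportUser.connection.create_table :report_users do |t|
      t.integer :user_id
      t.string  :name

      t.timestamps null: false
    end
  end

  def down
    ReportUser.connection.drop_table :report_users
  end
end
```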
This works, but there should be an easy way to create the database before running migrations.
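A small rake task can handle it; this sketch leans on ActiveRecord::Tasks::DatabaseTasks from Rails 4:

```ruby
# lib/tasks/reporting.rake
namespace :reporting do
  namespace :db do
    desc "Create the reporting database"
    task create: :environment do
      config = ActiveRecord::Base.configurations["reporting_#{Rails.env}"]
      ActiveRecord::Tasks::DatabaseTasks.create(config)
    end
  end
end
```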
Now we’re getting somewhere, but what about using this database for several additional models?
Imagine creating two more reporting models, ReportOrder and ReportProduct. They look identical to ReportUser, each with a call to establish_connection. The problem here is that each class creates its own independent connection pool, and each pool has some number of individual TCP connections to the database server. Maybe this doesn’t matter for three models, but what about ten? I previously wrote about the dangers of failing to care about TCP connections. Let’s refactor before this has an opportunity to become a problem.
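First, an abstract base class owns the connection (a sketch):

```ruby
module Reporting
  class Base < ActiveRecord::Base
    self.abstract_class = true

    establish_connection "reporting_#{Rails.env}".to_sym
  end
end
```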
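Each reporting model then inherits from it:

```ruby
module Reporting
  class User < Base
  end
end
```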
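And likewise for the rest:

```ruby
module Reporting
  class Order < Base
  end
end
```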
All subclasses of Reporting::Base now share a single connection pool. This is the same way that ActiveRecord::Base creates a connection pool used by its other subclasses. The abstract_class assignment in the Reporting::Base model means child classes look for database tables using expected Rails-isms (e.g. reporting_users, reporting_orders) instead of following single table inheritance rules.
We’ve nicely namespaced all of the reporting models. This convention can extend to include namespacing of related controllers and views. Good separation of concerns suggests that it makes sense to isolate the reporting concept. In a world where microservices are trendy, this might be the moment when someone suggests making a reporting service. That’s a heavy investment, but there is a reasonable compromise that still accomplishes many of the same design goals: A Rails engine.
An isolated Rails engine with its own database is basically a lightweight service. Generate an engine inside lib/reporting and relocate everything in the existing Reporting namespace into the engine. Make sure the engine is isolated.
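The engine class declares that isolation (the path assumes the engine lives in lib/reporting):

```ruby
# lib/reporting/lib/reporting/engine.rb
module Reporting
  class Engine < ::Rails::Engine
    isolate_namespace Reporting
  end
end
```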
It’s normal when using a Rails engine to copy the engine migrations into the enclosing application using rake reporting:install:migrations. This step is unnecessary when the engine has its own database, and is actually detrimental to the separation of concerns. Instead, add a few helper tasks alongside the earlier one for creating the database.
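For example, migrate and rollback tasks that point the migrator at the engine’s own db/migrate path (a sketch against Rails 4-era APIs):

```ruby
# lib/tasks/reporting.rake
namespace :reporting do
  namespace :db do
    desc "Migrate the reporting database"
    task migrate: :environment do
      ActiveRecord::Base.establish_connection("reporting_#{Rails.env}".to_sym)
      ActiveRecord::Migrator.migrate(Reporting::Engine.paths["db/migrate"].existent)
    end

    desc "Roll back the reporting database"
    task rollback: :environment do
      ActiveRecord::Base.establish_connection("reporting_#{Rails.env}".to_sym)
      ActiveRecord::Migrator.rollback(Reporting::Engine.paths["db/migrate"].existent)
    end
  end
end
```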
Treat the reporting engine as a different project. Develop it separately. Consider moving the code into its own repository and pulling it in as a gem. Strictly adhere to the engine’s isolation by keeping constants from unnecessarily bleeding across module boundaries.
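As a hypothetical illustration of the kind of coupling to watch for, consider models on either side reaching across that boundary:

```ruby
# In the application: a model reaching into the engine.
class User < ActiveRecord::Base
  def reports
    Reporting::User.where(user_id: id)
  end
end

# In the engine: a model reaching back out to the application.
module Reporting
  class User < Base
    def application_user
      ::User.find(user_id)
    end
  end
end
```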
Adding the above dependencies couples the engine to the application and vice versa. This is not always bad, but each additional dependency should be an explicit and careful choice.
If and when you decide to take the plunge on a reporting service, the engine is ready to convert into a standalone Rails application. In the meantime, repeat this pattern to grow an existing Rails app using multiple databases in a modularized, scalable manner.
At a high level, the WebGL rendering process breaks down into three phases: compiling and linking the shader programs, buffering vertex data into shader attributes, and drawing.
Setting up this drawing process comes with a lot of initial ceremony. It might seem overwhelming without prior OpenGL programming experience, but this is a one-time cost. An early investment in a few different concepts becomes the foundation for creating a custom rendering pipeline tailored to the individual needs of a system.
A basic program that exercises every piece of that pipeline can be assembled from the bottom up. But first, some initial setup.
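A host page with a canvas element is all the HTML required (a minimal sketch):

```html
<!DOCTYPE html>
<html>
  <head>
    <title>WebGL pipeline example</title>
  </head>
  <body>
    <canvas id="canvas" width="500" height="500"></canvas>
    <script src="main.js"></script>
  </body>
</html>
```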
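Then, in JavaScript, ask the canvas for a rendering context:

```javascript
var canvas = document.getElementById('canvas');
var gl = canvas.getContext('webgl') ||
         canvas.getContext('experimental-webgl');
```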
The gl variable contains a reference to a WebGL rendering context. This context is the main interface for the WebGL API.
Shaders are pre-compiled drawing programs that run inside the GPU. They are written in a C-like language called GLSL and provide rendering instructions to the GPU. Two types of shaders are used in this pipeline example: Vertex and fragment.
Vertex shaders describe how to draw the vertices making up one or more polygons. For the purposes of this example, that means a list of two-dimensional coordinates. However, the vertex shader does not know the actual positions of these coordinates. It knows only that they exist, and that they will be available by way of some attribute provided when the program runs.
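A vertex shader along those lines:

```glsl
attribute vec2 a_position;

void main() {
  gl_Position = vec4(a_position, 0.0, 1.0);
}
```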
At runtime, an attribute named a_position of the type vec2 (a 2-dimensional vector) contains positional data about a vertex. Convert that vector into a vec4 (4-dimensional vector) and assign it to the special WebGL global variable gl_Position. This program runs once for every pair of vertex coordinates.
Fragment shaders describe the space between vertices. While the vertex shader was called once for each vertex, the fragment shader program is called once for each pixel in the space between those vertices. In this example, the fragment shader program describes the color of each pixel.
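A fragment shader that colors every pixel white:

```glsl
precision mediump float;

void main() {
  gl_FragColor = vec4(1.0, 1.0, 1.0, 1.0);
}
```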
At runtime, each time the fragment shader program executes (for each pixel), assign a new 4-dimensional vector describing a color (in RGBA form) to the special WebGL global variable gl_FragColor. In this case, the color is always white.
Hooking up shaders makes up a large chunk of the WebGL setup ceremony. The source code for the shaders must be compiled and linked together in an instance of a WebGL program.
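Something like the following, assuming the two shader sources are available as the strings vertexSource and fragmentSource:

```javascript
function createShader(gl, type, source) {
  var shader = gl.createShader(type);
  gl.shaderSource(shader, source);
  gl.compileShader(shader);

  if (!gl.getShaderParameter(shader, gl.COMPILE_STATUS)) {
    throw new Error(gl.getShaderInfoLog(shader));
  }

  return shader;
}

var vertexShader = createShader(gl, gl.VERTEX_SHADER, vertexSource);
var fragmentShader = createShader(gl, gl.FRAGMENT_SHADER, fragmentSource);

var program = gl.createProgram();
gl.attachShader(program, vertexShader);
gl.attachShader(program, fragmentShader);
gl.linkProgram(program);
gl.useProgram(program);
```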
Once the shaders are compiled, the process is not repeated unless the shader source code changes.
Attributes serve as containers for the data that travels from JavaScript into the shader programs.
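Getting a handle on the attribute takes one call:

```javascript
var positionLocation = gl.getAttribLocation(program, 'a_position');
```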
Expose the a_position attribute from the vertex shader and provide a reference to it in JavaScript. Think of it as a pointer to the place in memory where the attribute data resides.
If attributes are the containers for data, then buffers are the pipes that connect JavaScript to those containers.
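A sketch, where the coordinates describe the four corners of a square that will later be drawn as two triangles:

```javascript
var positionBuffer = gl.createBuffer();

var positions = [
  -0.5, -0.5,
   0.5, -0.5,
  -0.5,  0.5,
   0.5,  0.5
];

gl.bindBuffer(gl.ARRAY_BUFFER, positionBuffer);
gl.bufferData(gl.ARRAY_BUFFER, new Float32Array(positions), gl.STATIC_DRAW);
```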
Create a buffer and an array containing positional data. Then, activate the buffer by “binding” it. Finally, declare that the data for the activated buffer is the array of positional data in the form of 32-bit floats.
The setup is finally complete. It’s time to draw.
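A sketch of the drawing calls:

```javascript
gl.clearColor(0.0, 0.0, 0.0, 1.0);
gl.clear(gl.COLOR_BUFFER_BIT);

gl.enableVertexAttribArray(positionLocation);
gl.vertexAttribPointer(positionLocation, 2, gl.FLOAT, false, 0, 0);

gl.drawArrays(gl.TRIANGLE_STRIP, 0, 4);
```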
First, declare a clear color of black (in RGBA form). Next, inform WebGL to clear the color buffer using the declared clear color. Then, declare a pointer to the WebGL attribute a_position containing the vertex data. Finally, draw the buffered vertices as a pair of triangles.
The results may be slightly underwhelming for the amount of effort, but this lays the groundwork for more advanced applications.
In a more advanced WebGL application, the drawing section might be called repeatedly in a loop as the buffered data changes. Drawing 60 times each second results in a target frame rate of 60fps. Everything else is initial setup that may expand in size (e.g. additional buffers, more complex shaders), but otherwise looks very similar to this example. The complete example is available on Github and as a JSBin.
For more in-depth learning, there are several excellent WebGL tutorials and references available online.
A middleware stack is not the traditional LIFO (last in, first out) data structure that comes to mind for many programmers when they hear the word “stack”. It’s a layered series of code modules, each of which modifies the state of an incoming data structure. After each layer has a turn, the resulting (new) structure is returned.
Here’s the situation which made me want to adapt this pattern: I had a series of transformations to run against a data structure, none of it involving HTTP, and I wanted an architecture that would be easy to communicate to my teammates.
There are only two types of pieces in this puzzle: The middleware and the builder.
A middleware is nothing more than a class that takes an “application” (more on that later) as its constructor argument, and which implements a single method, call. This method takes one argument: A hash of the current “request” environment. In Rack parlance, the request is an incoming HTTP request. There is no HTTP here, so the “request” is really just whatever is making use of the middleware stack. The only requirement is that call must return by passing the new environment (including whatever changes are made) to the next layer of the stack.
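A hypothetical middleware that stamps the environment with a timestamp:

```ruby
class TimestampMiddleware
  def initialize(app)
    @app = app
  end

  def call(env)
    env[:timestamp] = Time.now

    @app.call(env)
  end
end
```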
The builder is a class that manages a middleware stack and an associated application. There may be many builder instances depending on the number of desired middleware arrangements. The application is simply an object that responds to call. In its simplest form, it is a lambda.
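A builder in the spirit of Rack::Builder might look like this (a sketch):

```ruby
class Builder
  def initialize(&block)
    @middlewares = []
    instance_eval(&block) if block
  end

  # Register a middleware class (outermost first, as in Rack).
  def use(middleware_class, *args)
    @middlewares << [middleware_class, args]
  end

  # Register the innermost application.
  def run(app)
    @app = app
  end

  def call(env = {})
    to_app.call(env)
  end

  private

  # Wrap the application in each middleware, innermost last, so that
  # the first registered middleware runs first.
  def to_app
    @middlewares.reverse.inject(@app) do |next_app, (klass, args)|
      klass.new(next_app, *args)
    end
  end
end
```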
Using the builder means defining a desired stack configuration and then calling it.
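For example (TimestampMiddleware is the hypothetical middleware from above):

```ruby
builder = Builder.new do
  use TimestampMiddleware
  run ->(env) { env }
end

result = builder.call({})
# => a hash with a :timestamp key added by the middleware
```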
The results are hashes that have been manipulated in any number of ways by the various middleware layers.
A shortcoming I’ve identified is the difficulty of parallelizing stack layers that don’t absolutely have to run in a specified order. That’s a solvable problem, but worth noting when considering this pattern. One large benefit is the ease of communicating this architecture to my teammates, which I mentioned above as a primary goal. I can start a knowledge transfer conversation with “it works like Rack middleware”, and immediately establish a shared understanding. Would I use this pattern again? Maybe. It gets the job done and it’s fairly easy to understand. At the very least, I’ve emerged with a deeper understanding of Rack internals.
I learned I would be leading a team about a month early, and I attempted to make good use of that lead time by contacting my organization’s representative early and often via email and phone. After a few conversations, I wrote up a series of user stories for her to review, comment on, and approve. Then, I turned those user stories into Github issues. I did all of this in an attempt to understand the domain as deeply as possible. At Ruby for Good, the team meets on Friday morning and delivers a project on Sunday afternoon. The faster I could transfer my knowledge of the problem space to my teammates, the more productivity we could squeeze out of those 2.5 days.
I set a goal of having a code repository ready and waiting for teammates to clone on the morning of the event. It’s easy to lose half of that first day to environment and build issues. So, after initializing a bare Rails app, I wrote a README with a list of succinct instructions for getting up and running as quickly as possible.
I prepared as best I could prior to the event, and I think it paid off. Teams were selected at approximately 10AM on Friday morning. The first commit from one of my team members occurred at 11:41AM. The first pull request was merged at 3:30PM, and that was after we took a break for lunch.
Time is the most valuable asset at any hackathon. Ruby for Good amplified that feeling because I wasn’t simply building something for myself. I made a commitment to deliver a working application, and I wanted to follow through on that promise. There were two things in particular that I didn’t want to waste time on: Technology choices and stakeholder feedback.
I normally take the time to evaluate technology options carefully by comparing the requirements to the available solutions. I consider if it’s worthwhile to experiment with pre-1.0 products. I poll my teammates to see what they’re interested in and if we can find an opportunity to learn something new. None of that applied in this case. The choices I made were meant to be traditional and obvious to Rails programmers, regardless of experience level: The latest stable versions of Ruby, Rails, and Postgres. Vanilla JavaScript without any frameworks. I set up deployment using Capistrano to an Ubuntu 14.04 machine on Digital Ocean. The team later decided to use the CSS portions of Materialize, but that was a deferred group decision that was easy to drop in during development.
I did my best to eliminate anything that resulted in waiting on stakeholder feedback. Normal, non-programmer people who aren’t participating in hackathons typically don’t respond to email very promptly on weekends. It’s a bonus if a particular stakeholder is able to be more involved, but it’s not the expectation. A sufficient amount of preparation meant that I wasn’t waiting for responses to questions and blocking development as a result. In those rare situations where something questionable came up, I took the initiative, made a decision, and acted on it. If it turns out to be the wrong choice and the resulting feature isn’t delivered just right, that’s fine. Tweaking things later in response to feedback is just iterative development. The most valuable (and limited) resource available to me in this situation was the time of my team members, and I didn’t want to waste it on indecision.
The team had a healthy variety of experience that was evenly split between senior and junior developers. My goal as a lead was to find divisions of work appropriate for those different experience levels. I wanted everyone to feel that they were making important contributions. To that end, I tried to set aside tasks for the junior developers that were more straightforward or easier to reference in documentation: Using generators to build out application scaffolding, or integrating third-party login using the OmniAuth gem. Meanwhile, I leaned on the more experienced developers for things like researching integration with various Google APIs and making higher-level architectural choices for the entire application.
One thing I didn’t expect was the sheer amount of new stuff I learned from reading pull requests from junior developers. I spend my days working on an older Rails application with a fairly stable set of gems and libraries. The people on my team at work are mostly senior developers. It was great having exposure to gems I’ve never heard of and language or framework features that I don’t normally have an opportunity to use. I pride myself on continual learning and professional growth, but this was a stark reminder of just how quickly technology moves. More importantly, it was a reminder that experience diversity is a good thing.
Our team had people comfortable working at most levels of the stack: From design, styling, and JavaScript on the front-end to Ruby and Rails on the server. One thing we were missing was someone to handle the infrastructure, operations, and deployment strategies. I initially filled that role myself, but soon started taking on a variety of increasingly diverse tasks in support of other team members. I was suddenly very thankful for the excuses I’ve had to work on all sorts of crazy projects at every level of the stack. I often worry that I’m not focusing on specific skills enough; that I’ll end up with knowledge that is wide but shallow. That’s still a concern, but this showed me the importance of being able to jump into any role that a team might need.
Ruby for Good is an opportunity to participate in a full (albeit accelerated) project lifecycle from concept to delivery. That kind of experience can be hard to come by. It’s valuable to have insight into how a project evolves beyond simply writing code, and that goes double for anyone considering freelancing or contracting. The icing on the cake is the knowledge that you’re giving a little something back to organizations who wouldn’t otherwise have the means to hire developers. We’re all immersed in technology and surrounded by brilliant people on a daily basis, and that environment breeds impostor syndrome. This conference is a great reminder that everyone’s skills are valuable and the demand for them is high, regardless of our perceived self-worth when compared to peers. Ruby for Good might not be able to save the world in a single weekend, but I think it does a pretty good job at making things a little better.
Every web developer who spends a significant amount of time with Ruby inevitably reaches a point when they want to learn more about Rack. Rack is at the heart of the most popular Ruby web frameworks, including Rails and Sinatra. There are tons of resources available for getting started with Rack applications from the ground up, but I found myself curious about the other side of the fence. How do I write a web server that knows how to talk to Rack applications, and can I get Sinatra to serve a minimal app using that server?
I started with the simplest Sinatra application possible.
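Something like this, with the server setting pointing at a server Rack doesn’t know about yet:

```ruby
require 'sinatra'

set :server, :my_server

get '/' do
  'Hello World'
end
```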
Trying to run the above application will result in an error because Sinatra is asking Rack to use a server called my_server, and Rack doesn’t know about it. So, let’s tell Rack about the new server.
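A sketch of that registration against Rack 1.x’s Rack::Handler API:

```ruby
require 'rack/handler'
require_relative 'my_server'

module Rack
  module Handler
    class MyServer
      # Rack hands us the app plus any server-specific options.
      def self.run(app, options = {})
        server = ::MyServer.new(app, options)
        server.start
      end
    end
  end
end

Rack::Handler.register('my_server', 'Rack::Handler::MyServer')
```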
Telling Rack about a server is as simple as defining a new handler that lets Rack know how to start the server. The handler has a single method, run, which receives the Rack-compliant application to be served, along with an optional hash of server-specific settings. All that’s left to do is actually implement the server, which is the most significant portion of the entire exercise.
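What follows is a minimal sketch of such a server rather than the original listing: accept TCP connections in a loop, build the Rack environment hash, call the application, and write the response back.

```ruby
require 'socket'
require 'stringio'

class MyServer
  def initialize(app, options = {})
    @app  = app
    @host = options.fetch(:Host, 'localhost')
    @port = options.fetch(:Port, 4567)
  end

  def start
    server = TCPServer.new(@host, @port)

    loop do
      socket = server.accept
      request_line = socket.gets

      if request_line.nil?
        socket.close
        next
      end

      method, full_path, _version = request_line.split
      path, query = full_path.split('?', 2)

      # Read and discard the request headers.
      while (line = socket.gets) && line != "\r\n"; end

      status, headers, body = @app.call(new_env(method, path, query))

      socket.print "HTTP/1.1 #{status}\r\n"
      headers.each { |key, value| socket.print "#{key}: #{value}\r\n" }
      socket.print "\r\n"
      body.each { |chunk| socket.print chunk }
      body.close if body.respond_to?(:close)

      socket.close
    end
  end

  private

  # The only Rack-specific piece: the environment hash handed to the app.
  def new_env(method, path, query)
    {
      'REQUEST_METHOD'    => method,
      'SCRIPT_NAME'       => '',
      'PATH_INFO'         => path,
      'QUERY_STRING'      => query || '',
      'SERVER_NAME'       => @host,
      'SERVER_PORT'       => @port.to_s,
      'rack.version'      => [1, 3],
      'rack.url_scheme'   => 'http',
      'rack.input'        => StringIO.new,
      'rack.errors'       => $stderr,
      'rack.multithread'  => false,
      'rack.multiprocess' => false,
      'rack.run_once'     => false
    }
  end
end
```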
If you’ve ever experimented with writing a basic HTTP server, most of this is boilerplate. Loop continually, waiting for TCP connections. When one is received, pass the request through to the Rack application along with all of the necessary environment settings. When the application is done, send the response back to the client along with any headers, and then close the connection. Obviously, this server has some pretty severe limitations and isn’t intended for actual real-world use.
The only Rack-specific code is the hash created in the new_env method. A Rack application is simply an object that responds to one method, call. That method takes a single argument which is a hash describing the current environment. I took some liberties here because I was only interested in getting the most basic application to work, but the Rack specification describes all of the expected environment values in detail. The takeaway is that Rack applications expect an environment hash, and it’s the job of the server to provide that hash in its initial state.
That’s literally all there is to standing up a web server that can speak the Rack language. The small Sinatra app from the beginning of this post should now serve up its Hello World page without a problem. The functionality of this web server is obviously quite limited, but it’s enough to get started on the path toward something more robust. The interesting part to me was how easy this was to put together after a little digging through the Rack source. From the perspective of a server, Rack really is designed to get out of your way while providing a very simple interface to the world of Ruby web apps.
At the same time, in a seemingly unrelated universe, we experienced strange problems on the server that runs our monitoring application. We use Dashing to keep tabs on various metrics. Over time, Dashing ate up all the memory on the box and kept dying a horrible death. We implemented a “fix” by restarting the application via cron a few times each day, but that didn’t always keep the machine from dying.
Considering these two problems led me to think about the internals of Dashing. Dashing uses Thin, which is an evented application server built on top of EventMachine. Dashing also uses Rufus-scheduler in order to schedule its various monitoring jobs. The Rufus-scheduler gem hooks into that same EventMachine loop, and it all runs in a single Ruby process. We had a few Dashing jobs that looked like this…
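Something like this, where the metric names are illustrative:

```ruby
SCHEDULER.every '1m' do
  redis = Redis.new(host: 'redis.example.com')
  send_event('queue_depth', value: redis.llen('some_queue'))
end
```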
I did some digging in the Redis Ruby client and discovered that there is no automatic connection pooling implemented. That’s interesting. Then, I did this…
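I checked how many clients were connected to Redis, let the scheduled jobs run for a while, and checked again (the numbers here are illustrative):

```
$ redis-cli info clients | grep connected_clients
connected_clients:3

# ...several job runs later...

$ redis-cli info clients | grep connected_clients
connected_clients:47
```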
Well, that sucks. Each subsequent run of our monitoring jobs created a new TCP connection to Redis that wasn’t closed. Sure, the connections eventually timed out, but multiple jobs running once a minute created connections faster than they could time out. No wonder All The Things were breaking after an indeterminate amount of time. The fix was pretty simple…
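Create a fixed pool of connections up front and check one out for each job run; a sketch using the connection_pool gem:

```ruby
require 'connection_pool'

REDIS_POOL = ConnectionPool.new(size: 5, timeout: 5) do
  Redis.new(host: 'redis.example.com')
end

SCHEDULER.every '1m' do
  REDIS_POOL.with do |redis|
    send_event('queue_depth', value: redis.llen('some_queue'))
  end
end
```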
I used the connection_pool gem out of convenience, but writing your own connection pool manager is pretty trivial. Here’s the result of the change…
In the server’s memory graph, the yellow blob is memory in use. Prior to deploying fixes, the only way to keep the machine from choking up all the time was a series of application restarts courtesy of cron. The resulting improvement is significant. I should note that I deployed other fixes at the same time which cut back on memory being leaked throughout the application, but the connection pool implementation was the most significant change.
The moral of the story is that we as Ruby/Rails programmers tend to take things like memory management and connection pooling for granted. Ruby is garbage collected, but it’s still very easy to leak memory through poor code. Additionally, it’s important to keep track of the network connections our applications are arbitrarily establishing. ActiveRecord manages a connection pool for us. Certain other gems (like the Mongo Ruby driver) do as well. But that doesn’t mean that every third-party client library will keep us safe. I’ll certainly consider this next time I’m opening a connection in my code.
An operation that can run in the background is identified. The Rails application needs to enqueue a request. The mechanism for doing this depends on the particular versions of Rails and Resque. On Rails 4.x using ActiveJob, the interface is MyJob.perform_later. On earlier versions of Rails using Resque 1.x, this is done using Resque.enqueue. Resque uses Redis behind the scenes, and adding a job to the queue actually means serializing some details about the job and inserting that information into Redis. The Rails application finishes dealing with Resque at this point, and returns to the web request-response cycle.
A piece of information now sits in a Redis data structure representing the job to execute. Something needs to consume that queue, and it comes in the form of a completely separate Ruby process. This process is typically started by running rake resque:work from the application directory. The new process waits to consume information from the Redis queue, and then uses that information to identify and execute the background job.
When the worker process starts, a completely new copy of the entire Rails application is loaded into memory. One of the most common examples of a background job is sending an email using ActionMailer. That code lives inside the Rails application, and so the entire app must be loaded. However, an important thing to consider is that you don’t actually need Rails to run the job code. Rails is potentially a huge amount of overhead. If a job is enqueued using the MyJob class from Rails, then the only requirement for running that job is a MyJob class in the consuming process that listens on the same queue.
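A sketch of such a class, following Resque 1.x conventions:

```ruby
# No Rails here: just a class Resque can find by name.
class MyJob
  # Listen on the same queue the Rails app enqueues to.
  @queue = :default

  def self.perform(*args)
    # The actual work goes here.
  end
end
```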
The code above knows nothing about Rails, but will happily consume a background job that was enqueued from a Rails application. Add a minimal Rakefile that pulls in resque/tasks, and you can run this worker with rake resque:work from a completely different application. This example uses Resque directly without the ActiveJob interface. Using ActiveJob means adding it as a dependency. In general, the more domain knowledge needed to run the background job, the more dependencies the consumer process will have in common with the Rails application. Use this as inspiration to write several small applications – only one of which uses Rails – instead of one huge monolithic Rails app. This could be an intermediary step toward some sort of microservices based solution.
The Ruby process that sits and listens for jobs in Redis is not the process that ultimately runs the job code written in the perform method. It is the “master” process, and its only responsibility is to listen for jobs. When it receives a job, it forks yet another process to run the code. This other “child” process is managed entirely by its master. The user is not responsible for starting or interacting with it using rake tasks. When the child process finishes running the job code, it exits and returns control to its master. The master now continues listening to Redis for its next job.
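That master/child relationship can be pictured as a simple fork loop; this is a simplification, not Resque’s actual source:

```ruby
loop do
  job = reserve_job_from_redis   # hypothetical helper: master blocks, listening for work

  if (child_pid = fork)
    Process.wait(child_pid)      # master waits for the child to finish
  else
    job.perform                  # child runs the job code...
    exit!                        # ...then exits, releasing all of its memory
  end
end
```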
The advantage of this master-child process organization – and the advantage of Resque processes over threads – is the isolation of job code. Resque assumes that your code is flawed, and that it contains memory leaks or other errors that will cause abnormal behavior. Any memory claimed by the child process will be released when it exits. This eliminates the possibility of unmanaged memory growth over time. It also provides the master process with the ability to recover from any error in the child, no matter how severe. For example, if the child process needs to be terminated using kill -9, it will not affect the master’s ability to continue processing jobs from the Redis queue.
In earlier versions of Ruby, Resque’s main criticism was its potential to consume a lot of memory. Creating new processes means creating a separate memory space for each one. Some of this overhead was mitigated with the release of Ruby 2.0 thanks to copy-on-write. However, Resque will always require more memory than a solution that uses threads because the master process is not forked. It’s created manually using a rake task, and therefore must load whatever it needs into memory from the start. Of course, manually managing each worker process in a production application with a potentially large number of jobs quickly becomes untenable. Thankfully, we have pool managers for that.
One of the most ubiquitous pool managers for Resque (and the one we use at Optoro) is resque-pool. This plugin provides a rake task that manages all of the workers normally started using rake resque:work. Earlier, I pointed out that each worker process requires its own copy of the application in memory. A pool can potentially alleviate these memory concerns. When the pool starts, it loads the entire application into memory. Then, it forks a process for each (master) worker. Once again, copy-on-write significantly reduces the amount of memory used by each forked process. The memory benefits combined with the convenience of process management make resque-pool (or some other pool solution) an easy win.
The other tool worthy of consideration for part of your Resque infrastructure is a scheduler. One of the most popular solutions is resque-scheduler. The scheduler is a very simple, cron-based application that inserts jobs into the Redis queue based on a configuration file. It has very few dependencies in general, and doesn’t need the Rails app or the job code in memory. As a matter of fact, it doesn’t need any constant definitions at all if the job class names are passed as string arguments.
It’s valuable to understand the tools you’re using, especially when it comes to rogue processes outside the normal scope of the Rails application. Understanding leads to better architectural decisions. The concepts that apply to Resque will certainly be applicable to other background job solutions. Implementation is details. The most important skill is learning how to think about the application in a different way. Go forth and enqueue.
It’s not just you. Delivering faulty software on a regular basis is a problem that plagues the industry. Sometimes, it feels like we’re trying to hide our own failures behind the Captain America shield of “agile” development. Bugs are part of the process, but don’t worry because we’re iterating quickly! I challenge myself not to think like that. Just because we’re able to deploy 20 times a day doesn’t excuse us from the responsibility of getting it right the first time.
Playing fast and loose like this doesn’t fly in other industries. Surgeons can’t make mistakes until they get it right. Architects can’t implement a flawed building blueprint and then correct it later. Before I started writing code for a living, I worked in an industry that has a very low tolerance for mistakes: Law enforcement. Getting it wrong the first time as a police officer can have some pretty serious repercussions on someone’s life or liberty.
That’s not to say that surgeons, architects, and cops always get it right. Watch the news on any given day and you’re sure to see the ramifications of screwing up something serious. However, those other industries still have an error rate (and a fault tolerance) that is far lower than your average software development shop. So, why shouldn’t we as programmers hold ourselves to an equally high standard? Our work may not mean the difference between life and death on a daily basis, but our mistakes could result in tens of thousands of dollars (or more) worth of economic damage to our employers. Depending on your business, that might indirectly affect more lives than you think. Let’s do it right the first time.
When I worked in law enforcement, I was a criminal investigator primarily tasked with pursuing fire and arson-based crimes. I’ve spent quite a bit of time recently thinking about techniques and practices that I used as an investigator to minimize my risk of making mistakes in all aspects of my work. Minimizing risk is a way of life for a police officer. I want to apply that mindset to my work as a developer, and I also want to encourage it among my team members.
So, without further ado, here are eight techniques for raising the bar on software deliverables, from a criminal investigation perspective.
In law enforcement, they say that your head is “on a swivel”. Be aware of your environment. Always know what’s around you. When entering a room, your first inclination is to note the locations of all the exits. When sitting down, put your back against a corner and face the entrance so that you have line of sight on everyone who comes in. Always watch the hands of people you approach on the street. Take note of identifying details in case you need to describe someone later.
In software development terms, maintain awareness of your surroundings by reading code. Read the code that other members of your team write. Read the third party library code in your application. Before charging ahead to implement a new feature, read the code that might be affected and understand the implications of the needed changes. In many cases, you may spend more time reading code than writing it. That’s a good thing.
Witnesses are always one of the most valuable sources of information when conducting a criminal investigation. The problem is that most people typically aren’t very observant. Getting good information from a witness is something of a painful extraction process that requires asking very specific questions in order to exercise their memories. Part of this process involves honing your ability to “read” people. Be observant of the subtle physiological reactions that your questions elicit, and practice associating those reactions with the emotions they represent.
Effectively communicating with your stakeholders is one of the most important parts of taking a software project from start to finish. All of those same communication skills are directly applicable. Asking specific, pointed questions reassures the stakeholder that you’re both talking about the exact same thing. Being cognizant of physiological responses will help you recognize when the other person doesn’t really understand, even though they might say otherwise. That’s your signal to re-frame the explanation, probably using less technical jargon. Not everyone speaks Tech, and developers are notorious for finding the most complicated way to explain simple concepts. It’s a natural reaction for people to “fake it until they make it” even if they don’t truly understand what you’re saying. That can spell disaster when the subject at hand is project requirements.
Talking to witnesses yields subjective observations, but physical evidence doesn’t forget and can’t lie. A thorough scene examination is the only way to get the objective information that you need as an investigator to draw conclusions based on fact. Conduct your examination by applying methodologies that are widely accepted, procedural, and repeatable, because you will be called upon to justify them in court, under oath. Courts will not qualify expert witnesses whose methods can’t stand up under rigorous vetting.
The “procedural” part refers to the development process, not the programming language. Some people swear by Test or Behavior Driven Development. I generally practice TDD, but that doesn’t mean it’s my exclusive mantra. Maybe you’re one of those folks who believe TDD is dead. The particular school of thought doesn’t matter as long as you have some sort of process that involves testing. Most like-minded developers will probably listen as long as you can justify your methods. The non-negotiable part is that there should be tests, regardless of when they were written. Those tests will be the record of truth for future development. They are proof that the proper specification was implemented, regardless of methodology.
Most investigators in agencies with enough personnel work in pairs. The reasoning is simple: Two pairs of eyes are better than one. A partner is an investigatory assistant, a sounding board for crazy theories, and a friend to watch your back all rolled into one. Working in pairs is safer and more productive than going it alone. The very presence of another person means any potential mistakes have to make it through an additional layer of protection.
Quality assurance can mean several different things depending on the work environment. Larger organizations may have a dedicated QA team. In small shops, it may be just another developer. If you’re a freelancer, QA might be you taking a fresh look after a coffee break. It’s preferable to have someone who was not involved in development QA your work, but that’s not always realistic. Even when you can hand your work to someone else, you should still be manually testing it beforehand. The existence of a QA department is not your excuse to pawn basic functionality testing off on someone else. That means not only testing the features you worked on directly, but any related systems that may have been affected as a result of your changes. There is never a reason for delivering code that hasn’t been run from a user’s perspective.
The act of sitting down and writing the report is both mandatory and exceedingly useful for the investigator. It forces the organization and presentation of thoughts. This often has the beneficial side-effect of raising new questions and revealing previously unconsidered connecting details. Furthermore, it’s an opportunity to tell the detailed story about how your conclusions were reached using sound investigatory methods. It will be the record on file that will represent you and your work in front of a judge and jury. It could be the deciding factor in someone’s guilt or innocence. Details matter. Professionalism matters. Judges don’t like spelling mistakes.
Documentation is more than just a collection of README files. It’s any written attempt to communicate the intent of your code to an audience. That could mean Github issue responses, JIRA comments, commit messages, or any number of things. Clarity of detail and the presentation of professionalism are just as important as they are in the investigator’s written report. Documentation for a new feature should explain how the deliverable meets the original requirements. If it’s a bug, describe the process used to diagnose and repair the problem in such a way that a reader could duplicate your actions. Consider your audience and write at a technical level that is appropriate for the reader. The goal is not to impress everyone with the depths of your knowledge, but rather to communicate well enough that the reader doesn’t need to ask any follow up questions.
Writing an arrest warrant is a detailed, often frustrating process. It’s a request to take away someone’s freedom. Not only must you carefully, laboriously lay out the facts of your case and the conclusions that you drew as a result, but you must do so using a very specific presentation style and format. Getting a warrant signed means bringing it in person to a judge at the courthouse. The judge reviews your application and, if he or she approves, has you swear an oath in their presence. This means that every mistake or forgotten detail in the warrant results in one more round trip to the office and back. That’s serious motivation to get it right the first time.
Code review is asking for a warrant that, once signed, will allow you to deploy. Developers don’t have to raise their right hands and swear an oath before receiving approval, but the vetting process should still be equally rigorous depending on the scope of change. The review may be a semi-formalized process depending on the organization, or it could be as simple as pinging a friend and asking them to review a pull request. The mechanics are not important as long as it means getting your code in front of someone else. The best reviewer is another developer who wasn’t involved in writing the code. A fresh perspective will often lead to architectural and functional improvements.
Making an arrest requires careful planning and coordination with an overall goal of controlling the environment where the arrest will be made. The best way to accomplish this is to maintain the element of surprise. Learn the target’s routine and choose a time and place where you will have a tactical (and preferably numerical) advantage. It is difficult and dangerous to make an arrest in a place that you are unfamiliar with, such as the suspect’s house. Regardless of location, maintain a heightened level of awareness and anticipation until the suspect is in jail and you are back at home or in the office. A good arrest is a well executed plan where conflict is kept to a minimum. When successful, it represents the culmination of days, weeks, or perhaps months of work.
A successful deployment is the reward at the end of the development cycle, but it too requires careful planning and coordination. Maintaining a tactical advantage means picking the right deployment strategy. End users should remain blissfully unaware of any update roll outs or restarting services. If there must be some kind of interruption, keep it as minimal as possible and choose a time that is convenient for the majority of users. Additionally, releasing code into the wild does not mean the deploy is done. There is a Danger Zone immediately following a release which may last anywhere from a few minutes to a few hours depending on the application and scope of change. Maintain a heightened sense of awareness during this time by using all available monitoring tools. Indications that something is wrong with new code may be buried in the middle of a long stack trace for something seemingly unrelated. If you work in an organization where someone else is deploying your code, the responsibility for knowing when it’s happening and subsequently monitoring the roll out still rests with you.
An investigator’s learning is never complete. There are minimum levels of government-mandated training that cover a wide array of concepts from firearms to law, but good investigators go above and beyond by seeking out sources of knowledge for staying at the forefront of their field. This includes looking critically at past cases for areas of possible improvement. The most useful resource in a group of investigators is their collective past experience.
Developers don’t have mandated continuing education, so it’s each person’s individual responsibility to continue honing their craft. Reading blog posts and tech news, listening to podcasts, learning new languages and frameworks, and experimenting with side projects are all forms of continuing education. When it comes to specific projects, retrospectives are a great way to examine both the positives and negatives of the development cycle after delivering a particular feature or product. Some organizations have a formal retrospective process, but as an individual there’s nothing wrong with taking a moment after finishing a deliverable to reflect on the experience. Recognize what went well and what didn’t. Come up with ideas for how to perpetuate the former while correcting the latter.
This ended up being a bit long winded, but it turned out to be a useful exercise in organizing my thoughts around a topic that was floating in my head for a while. I’ve had varying degrees of success in applying these principles to my own work, but I will continue trying. We all want to deliver bug-free code, and the very act of recognizing that we can improve is a step in the right direction.
I’ve read tons of great blog posts and watched dozens of awesome talks on this subject, but one thing that usually seems to be missing from the SOA discussion is an approach to handling the database. Carving your monolithic application into a series of lightweight services is probably going to be an incremental affair. You need to balance ongoing feature development and business needs against your desire to take a refactoring axe to your codebase.
If you’re taking the smallest possible first step, you’re trying to find the seams in your code in order to identify and break off that first service. What about the data? The SOA dream means that most services will eventually have their own databases, and other services that need that data will have to talk to the service that owns it. Getting there is going to be hard. If you’re a traditional Rails shop, you’re likely dealing with a single, massive SQL instance (MySQL in my case). Up until now, “scaling” the database has meant throwing more hardware at the problem. So, do we shard? Great idea! Only now you’ve introduced a significant amount of complexity and operations overhead into what was supposed to be a small, incremental step in your scaling journey. Scope creep, ahoy! Maybe there’s another way.
One thing you can easily do for your database if you haven’t already is set up replication. This is great for backups and durability, but it would be nice if we could take advantage of replication by distributing database reads from Rails across all of the replication nodes. This is harder than it seems at first glance, and a myriad of problems must be addressed: What happens if you attempt to read data from a slave that hasn’t been replicated yet? What happens if one or more slave nodes go down? How do you deal with reading data immediately after it’s been written?
Fortunately, the folks over at TaskRabbit have been working on this. They’re developing a library called Makara which makes it easy to distribute SQL reads across multiple slave servers while simultaneously addressing all of the previously mentioned problems. Makara is designed for any Ruby application, but comes with a very handy set of ActiveRecord adapters for plugging in to Rails.
If you’re anything like those of us over at Optoro, we tend to be picky about introducing new dependencies into our already bloated Gemfile. Additionally, the idea of blindly embracing something as critical as an ActiveRecord adapter from a pre-1.0 library across the entire application is very scary. So, we wanted a chance to evaluate Makara by choosing the parts of our application where we specifically wanted to distribute reads to our replication slave servers. Unfortunately, Makara doesn’t make this easy by default. It’s designed to either be on or off with very little flexibility in between. Typically, if you force Makara to read or write to your master SQL instance any time in the context of a request, it’s going to “stick” to that master for any subsequent reads for the remainder of the request. This is a good thing, but it also means you can’t choose the parts of the request where you want to distribute your reads.
We wanted to be able to do something like this…
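Something along these lines, where User and Order stand in for any ActiveRecord models:

```ruby
# Ordinary code: every query sticks to the master node.
user = User.find(1)

ActiveRecord::Base.execute_distributed do
  # Inside the block, reads are free to hit any slave node.
  active_users  = User.where(active: true).to_a
  recent_orders = Order.where('created_at > ?', 1.day.ago).to_a
end

# Back outside the block, reads stick to master again.
```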
Furthermore, we didn’t want to introduce the overhead of establishing, disconnecting, and reestablishing ActiveRecord connections within our requests. Just use the one already defined connection (Makara maintains separate pools for the various nodes), and within a block distribute the reads if it’s appropriate to do so. I was able to accomplish this by subclassing and extending the MySQL ActiveRecord adapter that comes with Makara.
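A sketch of that subclass against the Makara 0.3-era API (the class name and the distributed-mode plumbing are my own, and registering the adapter with ActiveRecord is omitted):

```ruby
require 'active_record/connection_adapters/makara_mysql2_adapter'

module ActiveRecord
  module ConnectionAdapters
    class DistributedMysql2Adapter < MakaraMysql2Adapter
      def distributed_mode?
        !!@distributed_mode
      end

      # Run the given block with reads allowed to hit slave nodes.
      def with_distributed_mode
        @distributed_mode = true
        yield
      ensure
        @distributed_mode = false
      end

      # Swap in a fresh Makara context for the duration of the block,
      # then restore the previous (master-stuck) context.
      def with_new_context
        previous_context = Makara::Context.get_current
        Makara::Context.set_current(Makara::Context.generate)
        yield
      ensure
        Makara::Context.set_current(previous_context)
      end

      # Makara's adapter returns false for reads; force master unless
      # we're explicitly inside a distributed block.
      def needs_master?(method_name, args)
        distributed_mode? ? super : true
      end
    end
  end
end
```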
By default, the Makara adapter returns false for #needs_master? if the pending SQL statement is interpreted as a read operation. I wanted it to return true all the time unless we’re operating in “distributed” mode. Unfortunately, there was still one problem. The recommended Makara configuration is to set your adapter to “sticky” mode, which means that once any SQL operation hits a particular node within a specific context, it will continue to use that node until the context changes. This is a good thing because replication across different nodes may happen at different times. You don’t want to read data from one node, only to find on a subsequent read (from a different node) that the data doesn’t exist. For our purposes, the downside is that every request starts out by reading (and writing) to the master node, so by the time we enter a distributed_mode block, the request is always “stuck” to the master node. Therefore, I made a #with_new_context method that swaps in a new Makara context for the duration of the given block and restores the previous one afterwards. This gives the request a chance to hit a slave node, and subsequently become “stuck” to whatever node it ends up with. When the block ends, the context is reset to what it was before the operation: the one that was originally stuck to the master node. It’s important to note that the context handling for Makara uses class methods and singletons, which essentially means the entire library is not threadsafe. This isn’t a problem for us at Optoro because we use a forking server model (Unicorn).
Finally, I needed a method that takes a block and uses the adapter’s #with_new_context method…
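Something like this sketch, where the guard clause provides the fallback behavior described below:

```ruby
module DistributedReads
  def execute_distributed
    adapter = connection

    # Fall back to normal behavior when the custom adapter isn't in use
    # (development mode, A/B testing in production, etc.).
    return yield unless adapter.respond_to?(:with_distributed_mode)

    adapter.with_new_context do
      adapter.with_distributed_mode do
        yield
      end
    end
  end
end

ActiveRecord::Base.extend(DistributedReads)
```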
I wanted application-wide access to the method, so sticking it on ActiveRecord::Base seemed like the way to go. I’m not in love with the idea of monkey patching ActiveRecord::Base, but it gets the job done for the time being until I can come up with a better implementation. The upside of this implementation is that it falls back to normal behavior even if we’re not using our custom ActiveRecord adapter. This means I can use ActiveRecord::Base.execute_distributed wherever I want, and if it just so happens that we’re not using the Makara adapter (development mode, A/B testing production, etc.), nothing will break. If you use this approach, you’ll always need to require the adapter even if you’re not using it.
That’s all there is to it! 60-something lines of code and one new gem, and now we have the ability to distribute reads across any number of MySQL slave nodes. Depending on how widely you use distributed mode, this has the potential to greatly reduce the load on your MySQL master node. Additionally, it will buy you some breathing room for a strained database. Most importantly, it’s a small, incremental step toward scalability that doesn’t require you to make huge, sweeping ops changes.
Make sure you check out the Makara documentation for details on configuring the gem for your particular needs.
A couple weekends ago I had the opportunity to attend Ruby for Good in Fairfax, VA at George Mason University. Ruby for Good is a conference/hackathon where participants split into teams and spend the weekend hacking on projects that benefit someone or something. The “someone or something” is typically a charitable organization of some sort, and the projects span the gamut. They could be (and were) anything from improving documentation on existing open-source projects, to building a greenfield app from scratch for a nonprofit.
The team I was on did the latter. We built a fresh web application from the ground up for a nonprofit. I had a great time, met some awesome people, and learned way more than I thought I would going in.
Our team built an issue tracker application for Pathway Homes, a nonprofit dedicated to providing housing to adults with mental illnesses. We started the weekend in the advantageous position of knowing who our stakeholders were, and having a fairly good understanding of what they needed. Up until now, tracking maintenance issues across their properties meant one employee listening to voicemails and taking notes in Microsoft Word. This was great from a developer perspective: Anything we produced would be better than the current system.
Team members started off by identifying our individual strengths. Few things make me happier than a well-designed data model and a great backend API, but somehow I ended up leading the frontend team. This was probably because I suggested using Angular. It just so happened that not only was I the only person with any experience using Angular, but several other people were immediately excited about the opportunity to learn something new throughout the weekend. I had my reservations initially, but those were soon replaced with excitement at the prospect of being way outside of my comfort zone. These kinds of conferences are all about doing something new. On top of that, I was in a position to lead and teach: Two areas in which I definitely want to improve.
I was initially concerned about the architecture of our application. I thought we were making hasty decisions that would come back to bite us later on. That fear soon faded away, and was replaced by the driving need to Get Shit Done. It was great. I’m typically the sort of person that will spend hours or days agonizing over the perfect system design. That doesn’t fly in a hackathon situation. Inspired by team members who were pushing changes left and right, I soon fell in line. The Github stats by Monday morning speak for themselves…
That is an absolutely insane amount of work accomplished by 10 people in less than 3 days. It’s even more insane when you consider that we typically called it quits each night by 9pm in order to go be social and do conference stuff. On Monday afternoon, we had a working, (mostly) production-ready application to demo. As far as I’m concerned, any other shortcoming pales in comparison with that accomplishment.
And there are certainly shortcomings. There are design problems. There is bad code all over the place (much of it written by me). Our git practices were terrible. The separation between frontend and backend is not really identifiable despite the fact that we were using a client-side framework. But you know what? That’s okay. I don’t think I would have been okay with it before Ruby for Good, but it took this experience to bring me to that realization. Perfect code is worthless if it doesn’t ship. Our goal was to write an app that would make life easier for a nonprofit, and that’s what we did.
We learned something. We shipped something. We had fun. Who can ask for more than that in one weekend?
If you’d like to contribute or you’re just curious, the code is available on GitHub.
If you enjoyed this post, please consider subscribing.
When I started at Optoro (which coincidentally was also my first professional programming experience), one of my first projects was an ETL task that had been sitting on the back burner for a while. The goal was to migrate a series of Mongo collections to equivalent MySQL database tables so that the company’s analysts could easily access the data from their Windows-based GUI SQL clients. In the case of embedded documents, the structure essentially had to be “flattened” into a series of SQL columns. An additional requirement was that the schema should be determined dynamically during each run: if we started adding arbitrary new fields to future Mongo documents, the program should recognize that and adjust the destination SQL schema appropriately during the next run.
My first attempt at a solution was a very crude Ruby implementation. It “worked”, but it used a ludicrous amount of memory and was horrendously slow. When I say slow, I mean it took over a day to process the entire events collection. The collection was admittedly over a terabyte in size, but that was still unacceptable. If the code had been capable of determining the most recently translated record’s timestamp and only pulling events that were created after that point in time, maybe the other limitations would have been acceptable. But that wasn’t the case.
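For illustration, that incremental approach boils down to tracking a high-water mark and filtering on it. A rough sketch (the collection, field names, and connection details are all hypothetical):

```ruby
require 'mongo'

# Hypothetical names throughout; the idea is just "only pull new records."
client = Mongo::Client.new('mongodb://localhost:27017/source_db')

# Stand-in value; in practice, read the newest translated timestamp
# from the destination MySQL database.
last_translated_at = Time.now - 86_400

client[:events].find(created_at: { '$gt' => last_translated_at }).each do |event|
  # Flatten the document and write it to the destination SQL table here.
end
```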
Over the next few months, I spent a great deal of time reading about ETL and data warehouse solutions. I came into this project without any knowledge of these things, and with programming experience largely limited to typical Ruby on Rails applications and other small-scale school projects. After evaluating other ETL solutions in the wild, I settled on a second iteration that used Pentaho’s Kettle. Kettle is an open-source Java ETL engine provided by a company that’s been doing ETL for over a decade.
My thought process behind using Kettle was simple: why reinvent the wheel when people who are smarter than I am have already figured out these problems? I spent the next few weeks implementing version 2 of my ETL solution using Kettle. Working with Kettle involves a kind of visual programming in a drag-and-drop GUI. Transformation components (read from a database, match against a string, etc.) are arranged on a canvas, connected together, and then configured using step-specific menus. In some cases, you can script custom solutions using JavaScript, Java, and, with help from a third-party plugin, even Ruby. There are literally hundreds of pre-defined steps for everything from reading a CSV file to interacting with a Salesforce module.
By the end of this second iteration, I had a working solution that was reliable and performant. On each run, only new data was processed from the source collection and subsequently written to the destination database. From the end users’ perspective (our analysts), it was a success.
Unfortunately, it was not a success from a development perspective. Kettle was designed for non-programmers. The development environment is mouse-click heavy with tons of windows and menus for every minute detail. It’s possible to dive into the Java source in order to manage, create, or extend any component of the system, but that’s not the Kettle Way. They’ve gone above and beyond in order to make sure that anything and everything you could possibly want to do to your data is available in a pre-baked configurable step from inside the GUI. This is great for ensuring that Kettle can handle any conceivable ETL problem, but it also results in a lot of complexity. My Kettle project was massive and completely unmaintainable. Making a change required at minimum a full day of refreshing myself on how things worked. Debugging anything was a nightmare. It was completely untestable, and to make matters worse I was the only developer on our team who had any idea how to work with Kettle.
A few weeks ago, we made some changes to our Mongo infrastructure that required refactoring how my ETL project works. I dreaded even opening the project files. Then I had an epiphany: why not do it in Ruby? It’s been a year since my first failed attempt. I’ve spent that year reading about, working with, and generally absorbing a great deal of knowledge about ETL solutions. As the only ETL developer on the team, I could say with certainty that re-implementing this entire thing in Ruby would make developer happiness increase by 1000%.
Enter Rodimus.
One major thing I’ve learned about ETL in the past year or so is that solutions tend to be very targeted to their specific domain. When you try to generalize too much, you end up with a conglomerate of options and a barrier to entry that is far too high. Kettle is, by design, the one-stop shop for ETL solutions. Entire books have been written on it because it takes an entire book to even begin to understand everything it has to offer. That’s all well and good if you’re a non-programmer working with ETL, but I wanted something simple, clean, and easily maintainable. I wanted something that would fit nicely into our Ruby ecosystem.
I approached Rodimus with the goal of simplicity. I wanted something very lightweight with minimal dependencies upon which targeted ETL solutions could be implemented. Its forking process approach to concurrency is actually inspired by Kettle’s design. Check out the README for more details.
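To give a flavor of the forking idea in general terms (this is a generic sketch, not Rodimus’s actual API):

```ruby
# Generic fork-per-stage pipeline: each stage runs in its own process,
# connected to the next by a pipe. Not Rodimus's actual API.
reader, writer = IO.pipe

extractor = fork do
  reader.close
  %w[row1 row2 row3].each { |row| writer.puts(row) } # stand-in extraction
  writer.close
end

loader = fork do
  writer.close
  reader.each_line { |row| puts "loading #{row.chomp}" } # stand-in load step
  reader.close
end

# The parent closes both ends so the loader sees EOF when extraction finishes.
[reader, writer].each(&:close)
Process.wait(extractor)
Process.wait(loader)
```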
It only took me a couple of nights to produce the first version of Rodimus. A few days later, I had rewritten our entire ETL stack on top of it. When I look at the code now, its simplicity still surprises me when compared to the monstrosity that I had previously implemented in Kettle. Approaching the ETL project is no longer an exercise in frustration. I am more confident in my (now testable) solution, and I can easily share an understanding of the project with my coworkers. In all, I am a much happier developer.
I have future plans for Rodimus, but I think I will continue to strive for simplicity at its core. ETL can be complex, but that complexity should live solely in the specific application. It shouldn’t be the concern of the ETL engine.
If you enjoyed this post, please consider subscribing.
After this happened a few times, I realized that I needed to examine the running process for more information. My limited C/C++ exposure told me that gdb would be a great tool for this, but I wasn’t sure how I could get useful Ruby information after attaching to the running process. A little googling led me to an excellent blog post by Thoughtbot that included some helpful gdbinit definitions.
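They look roughly like this (reconstructed from memory, so treat the exact strings as approximate):

```
define redirect_stdout
  call rb_eval_string("$_old_stdout = $stdout.dup; $stdout.reopen('/tmp/ruby-debug.' + Process.pid.to_s)")
end

define ruby_eval
  call (rb_p(rb_eval_string_protect($arg0, (int*)0)))
end
```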
The first function redirects the process’s standard out to a temporary file, which you can then tail -f in order to see what’s going on. The second function allows you to execute arbitrary Ruby code inside the process. Assume you have the above definitions in your ~/.gdbinit file and a running irb process with pid 12345.
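The session goes roughly like this (the Ruby path is illustrative):

```
$ gdb /usr/local/bin/ruby 12345
(gdb) redirect_stdout
(gdb) ruby_eval "puts caller"
(gdb) detach
```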
Obviously, substitute your own path to the appropriate Ruby binary. In a separate window, you can tail -f /tmp/ruby-debug.12345 after the redirect_stdout command. The ruby_eval command will then output the current execution stack for the running process. Your ruby_eval calls run in the context of the currently executing code, so, for instance, you could see a list of currently defined local variables with Kernel#local_variables. You could then examine each of those variables in turn in order to get an idea of what was going on at that particular point in your code.
If you’re curious, as I was, about the call to rb_eval_string_protect() (an internal C function in the Ruby source): the first argument is the string of Ruby code to execute, and the second is a pointer to an int that receives the error status. A status of 0 means the code executed successfully; a non-zero status indicates an error, in which case the function returns nil.
These little gdb tricks have changed my world when it comes to debugging. I use this technique all the time.
If you enjoyed this post, please consider subscribing.
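First, a quick refresher. Ruby class methods are defined on the class itself rather than on instances (names here are illustrative):

```ruby
class Widget
  # A class method: called on Widget itself, not on Widget instances.
  def self.table_name
    'widgets'
  end
end

Widget.table_name # => "widgets"
```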
Fairly straightforward. Plenty of libraries use class methods like this. You can easily group related methods together in your own modules and then have them defined on the class when the module is later included. This is accomplished with the Module#included hook.
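A sketch of the pattern (module and method names are illustrative):

```ruby
module Greetable
  module ClassMethods
    def greeting
      "Hello from #{name}"
    end
  end

  # Runs when the module is included; extends the host class
  # so everything in ClassMethods becomes a class-level method.
  def self.included(base)
    base.extend(ClassMethods)
  end
end
```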
Then, include the module in one of your other classes, and the class methods come along for free.
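Continuing the sketch:

```ruby
class Gadget
  include Greetable
end

Gadget.greeting # => "Hello from Gadget"
```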
A friend of mine has been working on a Ruby MUD server that I help with when time permits. One thing we want to do is persist game objects to disk using any number of backends (Mongo, SQLite, etc.). We want an API that’s storage-agnostic and talks to the appropriate mechanism by way of an adapter. The adapter should be hidden from the main application. We want to be able to write object classes along these lines.
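For example (a sketch; the field names are invented):

```ruby
class Player
  include Persistable

  field :name
  field :location
end
```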
This is very similar to the Mongoid syntax (a gem my friend and I are both fans of). Using the above pattern in our Persistable module lets us accomplish exactly what we’re going for. The shape of the code after our initial refactor is below.
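Here is a simplified sketch of that refactor; the in-memory adapter stands in for the real Mongo/SQLite backends:

```ruby
# Trivial adapter standing in for Mongo, SQLite, etc.
class MemoryAdapter
  def initialize
    @store = Hash.new { |hash, key| hash[key] = [] }
  end

  def write(collection, attrs)
    @store[collection] << attrs
  end
end

module Persistable
  module ClassMethods
    # Mongoid-style macro: declare a persisted attribute.
    def field(name)
      fields << name
      attr_accessor name
    end

    def fields
      @fields ||= []
    end

    # Whatever backend adapter is assigned here handles actual storage.
    attr_accessor :adapter
  end

  def self.included(base)
    base.extend(ClassMethods)
  end

  # Collect the declared fields into a hash the adapter can store.
  def attributes
    self.class.fields.each_with_object({}) do |name, attrs|
      attrs[name] = public_send(name)
    end
  end

  def save
    self.class.adapter.write(self.class.name, attributes)
  end
end

# Hypothetical usage with the Player class above:
Player.adapter = MemoryAdapter.new
hero = Player.new
hero.name = 'Zed'
hero.save # writes { name: "Zed", location: nil } via the adapter
```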
If you enjoyed this post, please consider subscribing.
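Rails lets you pull validation logic into a dedicated class with validates_with. Wired up, it looks roughly like this (class names are illustrative):

```ruby
class PersonValidator < ActiveModel::Validator
  def validate(record)
    record.errors.add(:base, 'This person is not valid') if record.name.blank?
  end
end

class Person < ActiveRecord::Base
  validates_with PersonValidator
end
```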
This allows you to define your validation methods in a custom class. It can be useful for extracting validation behavior out of your model. However, what you might not know is how this class is instantiated by Rails. I assumed that a new instance of my validator would be created each time validation was performed on an instance of the model during a web request. I also assumed that the validator instance would stick around if multiple validation attempts were made. Based on those assumptions, I attempted to memoize the model instance inside my validator.
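The broken version looked more or less like this:

```ruby
class PersonValidator < ActiveModel::Validator
  def validate(record)
    @record ||= record # the bug: cached across every future validation run

    validate_name
    validate_email
  end

  private

  def validate_name
    @record.errors.add(:name, 'is required') if @record.name.blank?
  end

  def validate_email
    @record.errors.add(:email, 'is required') if @record.email.blank?
  end
end
```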
This won’t work, and leads to some unexpected behavior. Everything seemed fine while submitting my form via the browser, but my tests were failing. Things that should have been valid weren’t, and vice versa. This eventually led me to do a little research on how Rails uses validators.
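Condensed from the Rails source of the time, validates_with boils down to this:

```ruby
def validates_with(*args, &block)
  options = args.extract_options!
  options[:class] = self

  args.each do |klass|
    validator = klass.new(options, &block) # one instance, created right here
    validate(validator, options)           # registered for every future run
  end
end
```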
The moral of the story is that validates_with is a class method that creates a single instance of the validator when the model class is first loaded. If you memoize an instance variable inside the validator, it will not be replaced on successive calls to the validate method. In other words, the validator might be trying to validate the wrong object.
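The fix was simply to stop caching and pass the record around explicitly. For example:

```ruby
class PersonValidator < ActiveModel::Validator
  def validate(record)
    validate_name(record)
    validate_email(record)
  end

  private

  # No instance state: each run operates on the record it was given.
  def validate_name(record)
    record.errors.add(:name, 'is required') if record.name.blank?
  end

  def validate_email(record)
    record.errors.add(:email, 'is required') if record.email.blank?
  end
end
```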
If you enjoyed this post, please consider subscribing.
After surveying the landscape, I decided that Clojure would be my language of choice. Why Clojure? For one, it runs on the JVM and is able to take advantage of all the default Java/JVM tools and libraries. Other than that, it was mostly a random pick. Functional programming seems to be the new (old?) hotness these days, and there are a ton of options, many of which bring similar things to the table. In the end, I chose Clojure because it’s an evolution of Lisp, which is itself very similar to the MUSH/MUX code I spent much of my spare time writing in younger days.
I took a stab at Clojure by writing a simple telnet chat server, which is usually my go-to app for learning a new language.
Documentation for Clojure isn’t bad. I mainly used two sites: the Clojure Documentation Site and Mark Volkmann’s tutorial. I found a healthy mix of guides and examples, ranging from easy to advanced, to help me on my way.
Overall, programming in Clojure was a pretty pleasant experience. I occasionally struggled to find alternatives to defining variables and maintaining application state, but I think that has less to do with Clojure itself and more to do with my inexperience with functional programming. Clojure syntax is fairly intuitive and easy to use when it comes to native Clojure concepts, but I found it a little less so when dealing with Java libraries.
I ended up relying on those Java libraries for a lot of input and output in my application, mainly because I couldn’t find a native Clojure socket library. I also found myself falling back on Java threads for concurrency. Clojure offers some very nifty concurrency tools in the form of delays, futures, and promises, but none of them solved my need for long-running loops in a separate thread. I’m not sure if the use of native Java threads is intended when using Clojure, or if that was a result of my inexperience with the language. One thing I really enjoyed about concurrency in Clojure was the ready-to-use thread-safe reference types, such as atoms and agents.
At the end of the day, I can definitely see the advantage of using Clojure (or any functional language) for certain purposes, specifically math-heavy computations and tasks that require lots of concurrency. I don’t know if Clojure will be a go-to for me outside of those two applications. I’ll have to take a stab at other functional languages in order to form a better opinion.