For the last few years I’ve been taking photos of my daily life and posting them on instagram. And for the last year and a half-ish I’ve been using https://github.com/tryghost/ghost as a backend for my blog.

One of the major things that I thought was missing and really wanted to set up was cross-posting from instagram to my personal blog. This involves grabbing the source images via the instagram API, storing them in S3, and creating a post.

I had written a script that would grab images and upload them to s3, and I was manually posting things every month or two. But with some downtime over the recent holiday I made a poorly written end-to-end sync tool that backs up the instagram images and posts them to the blog. The script is below:

#!/usr/bin/env python

import json
import os
import requests

from boto.s3.connection import S3Connection
from boto.s3.key import Key

import urllib2
import StringIO
import datetime

from slugify import slugify

global CONN

auth_token = 'instagram-api-token'

try:
    AWS_ACCESS_KEY = os.environ['AWS_ACCESS_KEY']
    AWS_SECRET_KEY = os.environ['AWS_SECRET_KEY']
    AWS_BUCKET_NAME = os.environ.get('instascraper_bucket', 'bucketname')
    CONN = S3Connection(AWS_ACCESS_KEY, AWS_SECRET_KEY)
except Exception, e:
    print 'AWS Credentials were not properly set'


def _key_name(id):
    return 'instagram-photos/%s.png' % id


def get_ghost_token():
    res = requests.post('http://jake.ai/ghost/api/v0.1/authentication/token', data={
        'username': '[email protected]',
        'password': 'password-goes-here',
        'grant_type': 'password',
        'client_id': 'ghost-admin',
        'client_secret': 'q0f8hqf0hq'
    })

    return json.loads(res.content)['access_token']


def create_post(title, created_time, html):
    token = get_ghost_token()

    try:
        slug = slugify(title)
    except Exception:
        slug = '(untitled)'

    pd = dict(author="1",
              featured=False,
              image=None,
              language="en_US",
              markdown=html,
              meta_description=None,
              meta_title=title,
              page=False,
              published_by=None,
              slug=slug,
              status="published",
              tags=[{
                  "id": 7,
                  "uuid": "041d5867-9bcf-4f9e-a5a5-51cf7ab541d0",
                  "name": "insta",
                  "slug": "insta",
              }],
              title=title,
              published_at=created_time)

    h = {'Authorization': 'Bearer %s' % token, 'Content-Type': 'application/json'}
    res = requests.post('http://jake.ai/ghost/api/v0.1/posts',
                        json=dict(posts=[pd]), headers=h)

class InstagramPhoto(object):
    def __init__(self, image_dict):
        super(InstagramPhoto, self).__init__()
        self.id = image_dict.get('id')
        self.caption = None
        self.created_time = None
        if image_dict.get('caption'):
            self.caption = image_dict['caption'].get('text')
        self.instagram_image_url = image_dict['images']['standard_resolution']['url']
        self.instagram_url = image_dict.get('link')
        self.created_time = datetime.datetime.fromtimestamp(float(image_dict.get('created_time'))).strftime('%Y-%m-%d %H:%M:%S')

        self.s3_url = None

    def __repr__(self):
        return "InstagramPhoto(id=%s)" % (self.id)

    def upload_to_s3(self):
        bucket = CONN.get_bucket(AWS_BUCKET_NAME)
        if bucket.get_key(_key_name(self.id)):
            print 'This image already exists in s3: %s' % self.id
            k = Key(bucket)
            k.key = _key_name(self.id)
            self.s3_url = k.generate_url(expires_in=0, query_auth=False)
            return False
        try:
            k = Key(bucket)
            k.key = _key_name(self.id)

            file_handle = urllib2.urlopen(self.instagram_image_url)
            file_content = StringIO.StringIO(file_handle.read())
            k.set_contents_from_file(file_content)
            k.set_acl('public-read')
            self.s3_url = k.generate_url(expires_in=0, query_auth=False)
            return True
        except Exception, e:
            print 'An error occurred trying to upload %s: %s' % (self.id, e)

    def post_to_blog(self):
        raw_body = '''<a href="%(instagram_url)s"><img src="%(image_url)s" class="instagram" alt="%(caption)s"></a>'''
        body = raw_body % {
            'instagram_url': self.instagram_url,
            'image_url': self.s3_url,
            'caption': self.caption
        }
        post = {'title': self.caption, 'html': body, 'status': 'draft',
                'created_time': self.created_time}

        create_post(post.get('title'), post.get('created_time'),
                    post.get('html'))


raw_images = requests.get('https://api.instagram.com/v1/users/self/media/recent', params={
    'access_token': auth_token,
    'count': 1000
}).json()['data']

photos = [InstagramPhoto(p) for p in raw_images]

for p in photos:
    if p.upload_to_s3():
        p.post_to_blog()

I’ve been playing with making a service that runs containers via Amazon’s Elastic Container Service. This evening I finally got to the point where it was time to start experimenting with schedulers and other such fun-ness.

After a few quick tests, it became clear that the scheduler I’m working on might need more information to intelligently schedule tasks in the cluster. Let me try to illustrate what I mean:

  • Task definition: 100MB ram, less than 1 core, and takes 5 minutes to complete
  • Container Instance: 512MB of ram / 1 core

I queue 100 of these tasks in SQS. The scheduler quickly pulls the messages out of the queue and tries to schedule the tasks to run in the ECS cluster. After 5 tasks are active (5 × 100MB = 500MB of the instance’s 512MB), all ram on the container instance is consumed and no new tasks can be run until the existing tasks finish.

Upon ECS task rejection, the scheduler performs no further action on the SQS message, so the message eventually gets requeued (after the default visibility timeout of 30 seconds it becomes visible again). This sounded like an easy way to do things, but if the cluster is full and high priority tasks get rejected, it is possible for lower priority tasks to run first while the higher priority messages are invisible (waiting to be requeued) in SQS.

So ideally the scheduler will know the present state of the cluster before actually calling RunTask or StartTask. Unfortunately it’s kind of a pain to query all of the cluster metadata every time we want to run a task: gathering all of the pertinent information about a given cluster requires 4 HTTP queries and a fair amount of JSON mangling.

I started thinking about this a little and searching for how other people handle this sort of thing, and I stumbled upon http://williamthurston.com/2015/08/20/create-custom-aws-ecs-schedulers-with-ecs-state.html + https://github.com/jhspaybar/ecs_state

Basically it is a proto-scheduler that keeps cluster state in an in-memory sqlite database. They provide a nice example of how this sort of thing could be used to build a completely customized StartTask API scheduler. For example: https://github.com/jhspaybar/ecs_state/blob/6686cdfc418385e8db76d6bc719c7c278a10b471/ecs_state.go#L373

But to get started I really just want to use the RunTask API, so all I really need to know is “is there space in the cluster right now?” So I wrote a simple tool that caches cluster metadata in redis, which is super easy/fast to query and make simple decisions from.

import boto3
import redis


def summarize_resources(resource_list):
    response = {}
    for r in resource_list:
        response[r['name']] = r['integerValue']
    return response


def cluster_remaining_resources(cluster_name, instance_arns):
    remaining_cpu = 0
    remaining_memory = 0

    for arn in instance_arns:
        instance_key = '%s:instance:%s' % (cluster_name, arn)
        remaining_cpu += int(redis_client.hget(instance_key, 'remaining_cpu'))
        remaining_memory += int(redis_client.hget(instance_key, 'remaining_memory'))
    return dict(cpu=remaining_cpu, memory=remaining_memory)


pool = redis.ConnectionPool(host='localhost', port=6379, db=0)
redis_client = redis.Redis(connection_pool=pool)

ecs = boto3.client('ecs')

cluster_arns = ecs.list_clusters()['clusterArns']
clusters = ecs.describe_clusters(clusters=cluster_arns)['clusters']

for cluster in clusters:
    cluster_arn = cluster['clusterArn']
    container_instance_list = ecs.list_container_instances(cluster=cluster_arn)
    container_instance_arns = container_instance_list['containerInstanceArns']
    instances = ecs.describe_container_instances(cluster=cluster_arn,
                                                 containerInstances=container_instance_arns)

    for i in instances['containerInstances']:
        registered_resources = summarize_resources(i['registeredResources'])
        remaining_resources = summarize_resources(i['remainingResources'])

        instance_state = {
            'status': i['status'],
            'active_tasks': i['runningTasksCount'],
            'registered_cpu': registered_resources['CPU'],
            'registered_memory': registered_resources['MEMORY'],
            'remaining_cpu': remaining_resources['CPU'],
            'remaining_memory': remaining_resources['MEMORY']
        }

        instance_key = '{cluster_arn}:instance:{instance_arn}'
        key = instance_key.format(cluster_arn=cluster_arn,
                                  instance_arn=i['containerInstanceArn'])
        redis_client.hmset(key, instance_state)

    cluster_resources = cluster_remaining_resources(cluster_arn,
                                                    container_instance_arns)

    cluster_rm_key = '%s:remaining_memory' % cluster_arn
    cluster_rcpu_key = '%s:remaining_cpu' % cluster_arn

    redis_client.set(cluster_rcpu_key, cluster_resources['cpu'])
    redis_client.set(cluster_rm_key, cluster_resources['memory'])

Once that is done you can query whether a cluster has enough resources by running:

remaining_memory = int(redis_client.get('%s:remaining_memory' % cluster_arn))
if remaining_memory < task_definition_memory_requirement:
    # not enough memory in the cluster right now, don't call RunTask
    pass
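
For completeness, here’s a rough sketch of the kind of loop the scheduler can run on top of this cache. The queue URL, cluster ARN, and task definition name below are made up, and it assumes the redis keys populated by the script above:

import boto3
import redis

redis_client = redis.Redis()
sqs = boto3.client('sqs')
ecs = boto3.client('ecs')

QUEUE_URL = 'https://sqs.us-west-2.amazonaws.com/123456789012/tasks'  # hypothetical
CLUSTER_ARN = 'arn:aws:ecs:us-west-2:123456789012:cluster/default'    # hypothetical


def has_capacity(memory_needed, cpu_needed):
    # Compare the task's requirements against the cached cluster-wide totals.
    remaining_memory = int(redis_client.get('%s:remaining_memory' % CLUSTER_ARN) or 0)
    remaining_cpu = int(redis_client.get('%s:remaining_cpu' % CLUSTER_ARN) or 0)
    return remaining_memory >= memory_needed and remaining_cpu >= cpu_needed


while True:
    messages = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1,
                                   WaitTimeSeconds=20).get('Messages', [])
    for message in messages:
        # 100MB of ram and less than 1 core (128 CPU units), matching the example task above.
        if not has_capacity(memory_needed=100, cpu_needed=128):
            # Leave the message alone; it becomes visible again after the
            # visibility timeout and gets retried later.
            continue
        ecs.run_task(cluster=CLUSTER_ARN, taskDefinition='worker', count=1)
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=message['ReceiptHandle'])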

A few weeks ago I started thinking about how to deploy my new side project. The project consists of a pretty small API written with rails, and a completely separate static React application.

For deploying the API I wrote an ansible role that basically replicates the capistrano approach to deploying code: it checks out the latest code from master into a new folder, symlinks in a bunch of persistent files/configuration, and then points the current application symlink at the newly checked out code. This approach allows for fast rollbacks and easy troubleshooting. This part was pretty straightforward, and not very interesting.
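
Roughly, the flow the role automates looks something like this (a Python sketch rather than the actual ansible tasks; the paths and repo URL are made up):

import os
import subprocess
import time

# Hypothetical locations; the real values live in the role's variables.
APP_ROOT = '/srv/api'
REPO_URL = 'git@github.com:example/api.git'
SHARED_FILES = ['config/database.yml', '.env']

# 1. Check out the latest master into a fresh, timestamped release directory.
release = os.path.join(APP_ROOT, 'releases', time.strftime('%Y%m%d%H%M%S'))
subprocess.check_call(['git', 'clone', '--branch', 'master', '--depth', '1',
                       REPO_URL, release])

# 2. Symlink persistent files/configuration into the new release.
for relpath in SHARED_FILES:
    target = os.path.join(APP_ROOT, 'shared', relpath)
    link = os.path.join(release, relpath)
    if os.path.lexists(link):
        os.remove(link)
    os.symlink(target, link)

# 3. Flip the `current` symlink to the new release; rolling back is just
#    pointing it at the previous release directory.
current = os.path.join(APP_ROOT, 'current')
tmp_link = current + '.tmp'
os.symlink(release, tmp_link)
os.replace(tmp_link, current)  # atomically swap the symlink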

For the static react application I wound up doing something kind of awesome. Instead of running a static server with nginx, I decided to try a serverless approach that utilizes S3 and CloudFront. The serverless approach is extremely scalable, extremely cheap, and requires zero maintenance.

Initially I wanted to just host things out of s3, which seemed like an easy/practical thing to do. But I quickly learned that I wouldn’t be able to use TLS for my domain in the way that I wanted, so I started researching how to make it work. It turns out that AWS has CloudFront (a CDN), which can act as a proxy in front of S3 and allows you to do TLS via Server Name Indication.

So I quickly tested things out by manually uploading files to an s3 bucket and hooking up CloudFront, and I was immediately pretty satisfied. But then I tried to update some files and realized that caching with CloudFront was going to be an issue. One way I had heard of dealing with this in the past is to add an md5/sha2 suffix to each file; every time anything is updated a new filename is created, which allows users with old/cached versions of things to continue working while new requests get an all-new set of assets.

So I started trying to figure out how I was going to achieve that behavior. After some googling I found some awesome gulp packages that let me achieve exactly what I wanted!

How it works

  1. gulp dist — This compiles/builds all of the static assets (less, jsx, js, etc) into a single main.css and main.js, and the output is a directory that is deployable.
  2. gulp rev-all — This goes through the directory output by step 1, adds a sha/md5 suffix to every file, and updates all references to the old paths so they point to the new paths with the sha/md5 in the filenames.
  3. Publish to the S3 Bucket — all new files will be uploaded, all existing files that have not changed will be left alone, and all changed files (like index.html) will be updated in s3.
  4. Invalidate the cache — Once the new files are in s3, I invalidate a single file in the CloudFront distribution: /index.html (a sketch of this call is after this list).
  5. Wait about 10 minutes, and your changes will be globally deployed across all continents, providing a nice/fast experience to the user.
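
The actual invalidation in my setup happens through the gulp pipeline shown below, but for reference the underlying CloudFront call looks roughly like this boto3 sketch (the distribution id is made up):

import time

import boto3

cloudfront = boto3.client('cloudfront')

# Invalidate only /index.html; every other asset has a content hash in its
# filename, so old versions can stay cached forever.
cloudfront.create_invalidation(
    DistributionId='E30R8H0RF8HF',  # hypothetical distribution id
    InvalidationBatch={
        'Paths': {'Quantity': 1, 'Items': ['/index.html']},
        'CallerReference': str(int(time.time()))  # must be unique per invalidation request
    }
)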

I’ve taken this flow and turned it into a single command, so whenever I’m ready to make changes I run npm run deploy, and the entire process kicks off.

The code

Most of the heavy lifting is done by gulp-rev-all, which provides an example in the readme. You can read more about things here: https://www.npmjs.com/package/gulp-rev-all

Below is a snippet of code that I set up to get things working for me; it follows the gulp-rev-all example pretty closely.

// requires assumed by this snippet (DIST_DIR and CDN_DIR are defined
// elsewhere in my gulpfile):
var gulp = require('gulp');
var rename = require('gulp-rename');
var awspublish = require('gulp-awspublish');
var cloudfront = require('gulp-cloudfront');
var parallelize = require('concurrent-transform');
var RevAll = require('gulp-rev-all');

gulp.task('dist-revision', function() {
  var refs = [/(.*icons.*)/g, 'index.html'];
  var revAll = new RevAll({
    dontRenameFile: refs,
    dontUpdateReference: refs
  });

  return gulp.src(DIST_DIR + '/**/**')
    .pipe(revAll.revision())
    .pipe(gulp.dest('./build/cdn'));
})

var aws = {
  'params': {
    'Bucket': 'bucketname'
  },
  'accessKeyId': process.env.AWS_ACCESS_KEY,
  'secretAccessKey': process.env.AWS_SECRET_KEY,
  'distributionId': 'E30r8h0rf8hf',
  'region': 'us-west-2'
};

gulp.task('dist-deploy-static', ['dist', 'dist-revision'], function() {
  var publisher = awspublish.create(aws);
  var headers = {'Cache-Control': 'max-age=315360000, no-transform, public'};

  return gulp.src(CDN_DIR + '/**/**')
    .pipe(rename(function(path) {
      path.dirname = '/production/' + path.dirname;
    }))
    .pipe(awspublish.gzip())
    .pipe(parallelize(publisher.publish(headers), 50))
    .pipe(publisher.cache())
    .pipe(awspublish.reporter())
    .pipe(cloudfront(aws))
})

Over the last few years I’ve attempted to make iphone apps on a few occasions. The first few times were when Objective-C was the only option. I really struggled to fully grok the syntax and concepts of how the language worked, and never really got the hang of things. So while I was attracted to the idea of making an iphone app, I couldn’t muster the patience to learn a language that was so different from what I was used to writing.

About a year ago I attended a few-hour crash course on Swift and Xcode (like 3 weeks after swift was announced) with Jessica at Thoughtbot in San Francisco. The class was a high-level approach to building iphone apps from the perspective of a designer. It was extremely useful to see how design assets were pulled into the iphone app and used throughout xcode. This was really the turning point for me wanting to make an application for realz.

After that class I messed around with a few things for a few days before getting caught up in real work and dropping the focus on iphone apps altogether.

About a month ago I started messing around with swift and iphone app development again for a few hours a week. Swift had stabilized quite a bit, and there were now more resources available for learning/googling.

There is something really attractive about building iphone apps. It is a fully encapsulated product — I find that when I am designing/building an iphone app there are more constraints, and I am forced to consider the entire user experience and flow much more rigorously than I typically do with web applications. When I’m working on web applications I frequently focus on things like api usability and backend ergonomics, and often lose sight of the end-user experience. With iphone apps, the only thing that matters is the end-user experience.

Anyway, I’m writing this because I completed my first iphone app, and it has successfully landed in the App Store. It is a coin tossing application that performs 1,000 coin toss trials every time you hit the flip button.

It is called Cointossr, and you can download it here: https://itunes.apple.com/us/app/cointossr/id912158129

About a month ago I stumbled onto a kickstarter called PlantEnvy: https://www.kickstarter.com/projects/cassidytuttle/plant-envy-indoor-planter/ – Initially I was super excited, but then I realized that the campaign was over and it had failed.

My company just moved to a new office location and I now have a large desk that is in need of some plants. I immediately thought of PlantEnvy, but was again saddened by the fact that the campaign had failed and that I couldn’t buy it anywhere.

So I decided to try and make my own! Luckily I have access to a 3d printer. I spent most of today learning how to do some really basic 3d modelling tasks. I learned about http://tinkercad.com – which is by far the easiest way to get started on creating your own 3d models.

At first my plan was to make a simple prototype, and once I had the prototype I was going to send the order off to http://shapeways.com – but when I went to check out the prices I was blown away by how expensive things were. Apparently printing 1 planter costs ~$350, which is way too much.

So I guess I’ll have to be satisfied with the one print that I’ll probably make over the next week or two. I plan to post photos here that I take during the printing process.

Below you can check out the model that I created.

After a fair bit more experimentation I have come up with an almost-final version of the logo for the project that I’m now calling “everyday”. You can read about the first pass at this logo here: http://jake.ai/logo-for-a-new-project/

I have prototyped it using http://codepen.io which I’ve really been loving:

http://codepen.io/jakedahn/pen/KwaXrG

See the Pen Full demo with mask by Jake Dahn (@jakedahn) on CodePen.

Several years ago I learned about oblique strategies from listening to a talk by Joel Gethin Lewis. His talk was super inspiring for me, and it led me to build a website for oblique strategies that displays one at random. This project was one of the first things I built using ruby when I was first learning how to code, so even though it is extremely trivial to build, it holds a special place in my heart.

Unfortunately the site has long been dead. Until today! I performed some code archeology and revived the project – You can visit it by going to http://oblique.jakedahn.com.

The site was shut down a long time ago because it was running on Heroku’s deprecated aspen stack (the first heroku stack). When I was building it, ruby 1.8.7 was the new hotness and sinatra had just come out. So a lot of changes had to be made to revive the site: I spent some time today modernizing the sinatra code, making it work against a new ruby (2.1.3), and deploying it outside of heroku on my own server.

I also took some time to update the style a little bit. The old version was very dark and hard to read. So I took about 10 minutes to wrap it in bootstrap — which makes it actually readable. Below is what it used to look like vs. what it looks like now.

Old vs. New

old is left, new is right.

Go is all the rage, and all my friends are doing it, so I feel like I have to do it too… So I’ve been playing with Go and Docker a bit recently to try to learn more about go and how I can use it for work and side projects.

I’m enjoying things so far, and I’m finding interesting tidbits as I go. This is one of them.

With go it is easy to build statically linked binaries which can be distributed. This is awesome by itself, but with docker it is even more awesome.

You can build your go binaries like so:

CGO_ENABLED=0 go build -a ../src/worker.go

Once you’ve built your statically linked binary you can build a docker image on top of the scratch base image. You can read about the Scratch Image on the docker site, but basically it’s an empty image that you can drop your binary into to run as a very minimal container.

FROM scratch
MAINTAINER Jake Dahn <[email protected]>
ADD bin/worker /worker
ENTRYPOINT ["/worker"]

Below is some shell output from building and running a docker container that contains a go binary. Notice the image size is 5MB. If I use ubuntu or other distros as a base image, my docker image is significantly larger:

[email protected]:/vagrant# docker pull scratch
Pulling repository scratch
511136ea3c5a: Download complete
Status: Image is up to date for scratch:latest

[email protected]:/vagrant$ docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
scratch             latest              511136ea3c5a        16 months ago       0 B

[email protected]:/vagrant$ docker build -t jake .
Sending build context to Docker daemon 53.73 MB
Sending build context to Docker daemon
Step 0 : FROM scratch
 ---> 511136ea3c5a
Step 1 : MAINTAINER Jake Dahn <[email protected]>
 ---> Running in 27f068468755
 ---> a063241634df
Removing intermediate container 27f068468755
Step 2 : ADD bin/worker /worker
 ---> 5053eed5787c
Removing intermediate container 3a36a6c5e19b
Step 3 : ENTRYPOINT /worker
 ---> Running in 1f8014d919b5
 ---> ae0bd28d3e2e
Removing intermediate container 1f8014d919b5
Successfully built ae0bd28d3e2e

[email protected]:/vagrant$ docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
jake                latest              ae0bd28d3e2e        4 seconds ago       5.191 MB
scratch             latest              511136ea3c5a        16 months ago       0 B

[email protected]:/vagrant$ docker run -t -i jake
2014/10/24 05:40:02  [*] Waiting for messages. To exit press CTRL+C # this is a running go program that is a rabbitmq consumer

This is an interesting way to ship and deploy software.

A long time ago when I was working on NASA Nebula I was introduced to the idea of a CloudTop. CloudTop is the simple idea of using a VM hosted in a private or public cloud as your development environment, along with tools that make bootstrapping that environment easy in ephemeral scenarios. The benefits of using this type of environment for development are pretty clear:

  • Crazy fast 10Gbit internet
  • Regardless of the internet connection you’re currently using, as long as it’s fast enough to smoothly run an SSH session you will be able to download things to your development environment at very high speeds.
  • Access to 16core+ machines
  • Lots of ram, more ram than will fit in a macbook air

Over time this pattern has become much more common as the price of public cloud services has dropped and better tools have emerged for keeping your development environment in sync and working as you expect.

Recently, while trying to get back into the habit of using a CloudTop, I worked through how to use Amazon EC2’s spot instances as a CloudTop. This type of environment is a little quirky, and it requires a few tools to make it actually work well.

Getting Started

I’m going to go over the basics of how I set up a CloudTop environment on Amazon EC2 using a persistent spot instance request.

The Instance Type

For the last month I’ve been using the hi1.4xlarge (16 cores, 60GB ram, and 2x1TB ephemeral SSDs which I stripe together as 1 disk using RAID0). At the time of writing the average bid for this type of machine on us-west-2 is ~$0.14/hr. In an attempt to ensure that the spot instance doesn’t get reaped regularly I bid $0.20/hr for this instance type. This means that in a worst case scenario I would pay ~$145/month ($0.20/hr × 24 hours × 30 days ≈ $144) for my development environment.

This may sound like a lot of money, but having pre-provisioned resources with all the tools I need to be effective is worth it.

If you’re interested in something a little more standard but still want the benefits of fast internet, I can recommend the c3.xlarge instance type (4 cores, 7.5GB ram, 2x40GB ephemeral SSDs). At the time of writing the c3.xlarge runs at ~$0.03/hr on the spot market, which means you end up paying a little over $20/month for it.

The intricacies of spot instances

Spot instances are a little strange in that they can be terminated at any moment, and all of your data will be wiped. This presents two problems:

  1. If my development environment gets wiped every time spot prices increase, it’s a pain in the ass to reprovision everything when prices come back down. I address this with a few things. First: my spot requests are “persistent”, meaning if the instance gets reaped because of a price increase, it will come back when prices return to normal. That makes sure my CloudTop is up whenever it is financially possible. The other thing that I do is create an AMI image using Packer.io that contains my standard development environment on the root disk. This includes services like docker, and general tools like vim and tmux. I may publish a tutorial on how to use Packer for this, but the summary is that I just run my dotfiles repository to create the base environment I want. Using persistent spot requests means that whenever my development environment is terminated because spot prices increased, it will bounce back once prices are low enough, and all of my tools and services will be present.
  2. If I write code and do not push it back upstream to github, and my VM gets reaped, I can potentially lose code. I address this by attaching a persistent EBS volume to the spot instance at /dev/xvdj, which I mount at /home/ubuntu/persistent. When I’m working on code that I do not want to lose between cloudtop reboots I place it in this persistent directory. In the spot request you can map the device to a specific EBS volume so it will always come back with your data after a new spot instance has spawned. A rough sketch of what this looks like is after this list.
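
Here’s a rough sketch of the persistent spot request with boto3 (the AMI id, volume id, and key name are made up; in this sketch I simply attach the persistent volume once the request is fulfilled, rather than mapping it in the request itself):

import boto3

ec2 = boto3.client('ec2', region_name='us-west-2')

# Hypothetical ids; the AMI is the Packer-built image with my tools baked in.
AMI_ID = 'ami-12345678'
VOLUME_ID = 'vol-12345678'

# A persistent request re-launches the instance whenever the spot price
# comes back under the bid.
response = ec2.request_spot_instances(
    SpotPrice='0.20',
    Type='persistent',
    LaunchSpecification={
        'ImageId': AMI_ID,
        'InstanceType': 'hi1.4xlarge',
        'KeyName': 'cloudtop',
    },
)
request_id = response['SpotInstanceRequests'][0]['SpotInstanceRequestId']

# Wait for the request to be fulfilled, then attach the persistent EBS volume
# at /dev/xvdj (it gets mounted at /home/ubuntu/persistent on the instance).
# In practice you'd also wait for the instance itself to be running first.
ec2.get_waiter('spot_instance_request_fulfilled').wait(
    SpotInstanceRequestIds=[request_id])
fulfilled = ec2.describe_spot_instance_requests(
    SpotInstanceRequestIds=[request_id])['SpotInstanceRequests'][0]
ec2.attach_volume(VolumeId=VOLUME_ID,
                  InstanceId=fulfilled['InstanceId'],
                  Device='/dev/xvdj')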

The next time I get into the weeds rebuilding an ami for cloudtops I might publish some of the scripts that I use to provision mine, along with tutorials on how to use packer with dotfiles and such. This post is meant to get the point across; the next post will be a technical how-to.