Bigdinosaur Blog

Tales of hacking and stomping on things.

Working on a New Comment System

In the previous post, I walked through setting up Discourse, a Ruby-based web forum. I’m in the process of shifting this blog’s comment system from Disqus, which I’ve never been wholly comfortable with, over to Discourse instead. This is being done via a plug-in for Octopress that is currently under development by one of the posters over at the main Discourse development forum.

Why ditch Disqus? There are several reasons, but the biggest is privacy. Disqus tracks users’ movement across Disqus-enabled web sites, and I don’t like that. It provides a free, well-constructed, low-friction commenting system—that part’s nice, of course!—but I don’t like being followed, and I don’t think readers of this site should have to concede to being watched by Disqus in order to comment. Whether or not Disqus is harmless, and regardless of what they do with the data, I object on principle.

So, we switch to Discourse. I could easily use Vanilla instead, since I already have a functioning Vanilla install and it supports being framed inside a blog for comments, but where’s the fun in easy?

The switch to Discourse is still a work in progress, and things aren’t quite working right yet, but I’m hammering away on it in my spare time. Currently, none of the blog posts have functioning comments, but we’re getting there. In the meantime, if you’d like to leave a comment on any of the blog entries, head over to my Discourse forum and comment there.

Setting Up Discourse With Passenger and Nginx

I like fiddling with new software and seeing if I can make it work—that’s what most of this blog is about, in fact. Most of the web-based apps I’ve walked through deploying have been written with PHP, but there’s a fancy new bit of Ruby-based forum software that I’ve sort of fallen in love with: Discourse.

Discourse is shiny and new, and its developers include Jeff Atwood, one of the folks behind Stack Exchange. It’s made out of Ruby instead of PHP, and it uses PostgreSQL and Redis for its back end. The project is still very, very beta; there are multiple methods of deploying it, and it has a robust development environment that you can set up and start hacking away on.

This doesn’t matter much to me, though—I don’t code. I just wanted to set it up and play with it. So, this is a walkthrough on how to deploy Discourse on Ubuntu, using Nginx and Phusion Passenger. If you want to see the end result, check out my Discourse test forum—when we’re done, you’ll have something similar up and running.
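
Just to sketch where this ends up: with Passenger compiled into Nginx, the vhost for Discourse stays pretty small. The block below is a rough illustration, not the post’s final configuration; the hostname, web root, and Ruby path are assumptions that will differ on your system.

```nginx
# Rough sketch of an Nginx vhost for Discourse under Phusion Passenger.
# The server name, web root, and Ruby path are assumptions; adjust them
# to match your own install.
server {
    listen 80;
    server_name discourse.example.com;

    # Passenger serves the Rails app out of its public/ directory
    root /var/www/discourse/public;

    passenger_enabled on;
    passenger_ruby /usr/local/bin/ruby;

    # Let Nginx hand precompiled assets out directly, with long expiry
    location ~ ^/assets/ {
        expires max;
        add_header Cache-Control public;
    }
}
```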

Node.js, Redis, and Etherpad Lite

Etherpad Lite is a real-time collaborative text editor which allows multiple users to simultaneously fiddle with a document. Everyone can see everyone else’s changes in real time, which is really cool. The “lite” tag after the name is there because the Etherpad Lite project was spawned from the ashes of the original Etherpad project, which itself was snapped up by Google and transformed into the ill-fated Google Wave. Wave was never really all that popular and has since been killed, but the core idea is still totally cool—to present a document to more than one user and have all users be able to make changes to it, and to have those changes shown to all other users as they’re made. It’s a surprisingly complex problem to solve. For one thing, it’s entirely possible that more than one user can change the same thing at the same time; there has to be a way of telling who “wins” and whose changes are tossed out. Even more complex is figuring out a way to track all the changes and organize them, and then display them for everyone.

The original Etherpad project solved this with a full mix of heavy web technologies, requiring you to install Java, Scala, and MySQL; it utilized no small amount of server resources and was difficult to scale. Etherpad Lite jettisons a lot of its predecessor’s bulk and does things in a much more web two-point-oh fashion. Specifically, Etherpad Lite runs on Node.js, a server-side Javascript engine which can be used for lots and lots of fancy things—Node is really deserving of its own blog entry, and we’re using only a tiny subset of its features here. Etherpad Lite also needs a database to hold the documents and store changes; out of the box it can use MySQL, but in this post we’re going to take things even further and configure it to run on Redis, an extremely fast memory-based key-value store. Finally, we’ll do a tiny bit of hacking on Etherpad’s HTML to force it to display a list of all the “pads” (documents) currently in its database.
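
Since everything around here eventually ends up behind Nginx, it’s worth noting that Etherpad Lite listens on its own port (9001 by default) and is usually fronted by a reverse proxy. Here’s a minimal sketch of what that proxy block might look like; the hostname is an assumption, and the Upgrade headers are there because Etherpad’s live updates can ride over WebSockets.

```nginx
# Minimal sketch of proxying Etherpad Lite through Nginx. The hostname
# and upstream port are assumptions (9001 is Etherpad Lite's default).
server {
    listen 80;
    server_name pads.example.com;

    location / {
        proxy_pass http://127.0.0.1:9001;

        # Etherpad's live updates may use WebSockets, so pass the
        # Upgrade handshake through to the Node process.
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";

        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```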

Platforms and Value Judgments

N.B. This is a personal post. I promise to keep this kind of thing extremely rare, and to do more technical posts soon.

My name is Lee, and I am a Mac user.

There, I said it. I’m a dirty, dirty Mac user, and I’m okay with that.

My intent with this blog was for it to remain purely technical, with no personal entries at all; I’ve been down that road before with my last blog and it didn’t end well. However, an article went up this past weekend on Ars where the staff posted pictures of their office desks, and the amount of herpderping in the article’s comments about the mostly-Mac setups was boggling. Maybe it struck me so hard because I consciously avoid platform flame war discussions, having taken part in more than my share in the 80s and 90s; whatever the reason, some of the shit bubbling up to the surface in that article’s discussion thread just blows my mind.

The computing platform you start with might say something about your economic status (can’t afford a Mac, gotta use a hand-me-down Packard Bell!) or your computing ability (“I’ve got a Dell!” “What model?” “….Dell!”), and the computing platform you choose might say something about your goals and preferences (“GONNA DRIVE THE FRAG TRAIN TO TEAMKILL TOWN WITH MY FIFTEEN GRAPHICS CARDS!”), but judging someone’s intelligence and worth as a human being based on whether they’re using a home-built Windows 7 PC or a Mac is ludicrous. I’ve built more PCs from parts in the past quarter-century than any hate-spewing MACS SUCK noob on any discussion board you’d care to name, and yet the computers I use most often throughout any given day have an Apple logo on them.

Things weren’t always thus.

Adventures in Varnish

In the previous entry, I touched briefly on how some experimentation with Blitz.io led to my installing Varnish Cache on the Bigdino web server. Varnish Cache is a fast and powerful web accelerator, which works by caching your web site’s content (HTML and CSS files, Javascript files, images, font files, and whatever else you’ve got) in RAM. It differs from other key-based web cache solutions (like memcache) by not attempting to reinvent the wheel with respect to storing and accessing its cache contents; rather than potentially arguing with its host server’s OS and virtual memory subsystem over what gets to live in RAM and what gets paged out to disk, Varnish Cache relies wholly on the host’s virtual memory subsystem to handle where its cache contents live.

Varnish is able to serve objects out of its cache much faster, and using far fewer host resources, than a web server application can. While deploying it to see how it might help or hinder my Blitz.io runs, I did some brief and extremely unscientific testing on the Bigdino Nginx web server, using Siege to simulate 1000 concurrent HTTP connections. When hitting Nginx directly, without Varnish in the middle, the server showed a considerable amount of CPU usage.

Running the same benchmark against the same host, but pointing Siege at Varnish’s TCP port instead of Nginx’s, yielded considerably lower CPU utilization.

In the second instance, Varnish is servicing the HTTP requests directly out of its cache, which is sitting in RAM. Varnish doesn’t have to bother Nginx to evaluate the requests and pull things up off of slow disk (even the web server’s fast SSD is glacially slow compared to RAM!), and that’s just for static content; the improvement would be even greater for dynamic pages, where PHP code would otherwise have to be executed in order to dig up and serve the content.
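
Getting Varnish in front of Nginx for a test like this is mostly a matter of port shuffling: Varnish takes over port 80 and Nginx retreats to a loopback-only port that Varnish uses as its backend. The snippet below is just a sketch of that arrangement; the port number and the realip directives are assumptions, not necessarily how the Bigdino server is actually wired up.

```nginx
# Sketch of an Nginx vhost sitting behind Varnish. Port 8080 and the
# realip settings are assumptions; Varnish would listen on port 80 and
# use 127.0.0.1:8080 as its backend.
server {
    listen 127.0.0.1:8080;
    server_name bigdinosaur.org;

    root /var/www/bigdinosaur.org;

    # With Varnish in the middle, every request arrives from 127.0.0.1;
    # recover the visitor's real address from the X-Forwarded-For header.
    set_real_ip_from 127.0.0.1;
    real_ip_header X-Forwarded-For;
}
```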

As might be gleaned from the plethora of “Make X work on Nginx!” blog posts here, Bigdino supports a number of different sites and web applications. There’s The Chronicles of George, which is all static content but which has a forum running Vanilla; there are a couple of Wordpress blogs; there’s a MediaWiki-based wiki for Minecraft stuff, and then there’s the Bigdino main site and the blog. This is a fair mix of static and dynamic content, but through some trial and error and a whole hell of a lot of Googling, I’ve put together a Varnish configuration which handles all of the sites very well, and I’d like to share it.

Blitz.io Makes Load Testing Fun

Web site performance has been on my mind a lot lately. An average day for this blog means serving only a few hundred visitors and maybe 400-500 page views, but bigdinosaur.org also hosts the Chronicles of George, which carries with it a much higher average load; on days when a link hits Reddit or a popular Facebook page, the CoG can clock 10,000-12,000 page views. This is still small potatoes compared to a truly popular site, but it pays to be prepared for anything and everything. Setting up a web server to be as fast as possible is good practice for the day it gets linked to from Daring Fireball or Slashdot, and even if that day never comes, there’s nothing wrong with having an efficient and powerful setup which can dish out a huge amount of both static and dynamic content.

So in the course of some casual hunting for Nginx optimizations, I happened across Blitz.io, a Heroku-based site which gives you the ability to load-test your web server. I was immediately intrigued. I’ve done load testing on my LAN before using Siege and ApacheBench, and LAN-based testing is useful up to a point, but it won’t help you understand the over-the-net performance of your web site. Blitz.io fills a gaping hole in web site testing, letting you observe how your site reacts under load in real-world conditions. I signed up and ended up killing several hours with testing, mitigation, and re-testing. The results were unexpected and incredibly valuable.

If This Then That Dot Com

Brandon Mathis, the creator and maintainer of Octopress, recently tweeted a method to programmatically create tweets from new Octopress blog posts. Moments after that, he retweeted a response from another Octopress user which outlined a simpler method using ifttt.com, a web site which lets you create automated actions based on conditions.

The site’s name is pronounced like “lift” but without the “l”, as they proclaim on their homepage. The awkward construction comes from “if this then that”, which describes the web site’s purpose: if a thing happens, then do something.

There’s a huge range of triggers you can pick from and actions you can accomplish; the site gives lots of ready-made examples (which they call “recipes”)—automatically send yourself a text message if you’re tagged in a photo on Facebook, automatically take anything you send to Instagram and archive it on Dropbox, and lots of other things. I was interested in using it to automatically tweet anything I post on the Bigdinosaur blog. There are recipes already created for Blogger and Wordpress blogs, but Octopress doesn’t have the external hooks those other blogging engines do (or really any hooks, since Octopress is just a flat-file blog!). What Octopress does do, though, is automatically update its RSS feed whenever you make a new post, and since you can create actions based on RSS feeds, we’re all set!

Vanilla Forums on Nginx

A few years ago I created a web site called the Chronicles of George, featuring some badly-written help desk tickets from the job I had at the time. It gained me some small amount of Internet fame (but no fortune), and developed a loyal community of sympathizers. For a long time we hung out on a self-hosted phpBB forum, but a change in web hosting brought the opportunity to also change the forum software away from something as hack-prone and complex as phpBB to something faster, simpler, and ostensibly more secure: Vanilla.

Out of the box, Vanilla operates a bit differently from a traditional thread-based forum like phpbb. It is a discussion-focused forum, deprioritizing standard categorical organization in favor of bringing the things being talked about to the forefront. This has advantages in some forum models, like a support forum for a specific product or service, where the first thing a reader wants to see is discussion, not a choice of categories, but it’s not necessarily what most folks are used to seeing out of a web forum. Fortunately, Vanilla also offers configuration options to make it behave more like a “standard” web forum.

Why choose it, then, if we’re just going to override its most distinguishing characteristic? Because, as mentioned in the opening paragraph, it’s light and fast and secure. Additionally, the 2.1 branch (currently under development and downloadable here) comes with an absolutely killer theme that we can easily customize and prettify with some quick CSS.
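
On the Nginx side, most of a Vanilla vhost comes down to one idea: serve real files when they exist, and hand everything else to index.php so Vanilla can route its pretty URLs. The block below is only a sketch; the rewrite form and the PHP-FPM socket path are assumptions and may need tweaking for your install.

```nginx
# Sketch of the core routing for a Vanilla forum on Nginx. The rewrite
# form and the PHP-FPM socket path are assumptions; adjust as needed.
location / {
    # Serve real files and directories directly; everything else is a
    # pretty URL that Vanilla's index.php should handle.
    try_files $uri $uri/ @vanilla;
}

location @vanilla {
    rewrite ^ /index.php?p=$uri&$args last;
}

location ~ \.php$ {
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_pass unix:/var/run/php5-fpm.sock;
}
```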

Wordpress on Nginx

Wordpress is the Microsoft Word of blogging platforms—it’s overkill for almost everyone, but everyone uses it anyway. It’s a popular, monstrous, ugly app that requires regular patching to keep evildoers from doing evil with it, but it’s still a top choice for self-hosted blogging because if you can fight your way through its ridiculously complex interface, you can use it to make a good-looking blog without having to know a lot about HTML or CSS.

Our blogging platform here at the Bigdino compound is obviously Octopress—which you’re reading right now—but I had occasion to stand up a Wordpress blog recently and wanted to share what I learned doing it. Wordpress’s ubiquity means that there are a million-billion-trillion guides out there for getting it working; however, the vast majority of them focus on how to make it work with Apache, not Nginx. What I hope differentiates this post is that I’m going to focus on taking common .htaccess-based security practices and turning them into Nginx-specific location directives and rules.
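
To give a flavor of what that translation looks like, here are a few of the usual .htaccess-style protections expressed as Nginx location blocks. These are illustrative sketches rather than the complete configuration, and the exact rules you want will depend on your install.

```nginx
# A few .htaccess-style protections rewritten as Nginx rules. These are
# illustrative examples, not a complete Wordpress configuration; the
# usual PHP fastcgi_pass block is omitted for brevity.

# Keep wp-config.php and hidden files (.htaccess, .git, etc.) off limits
location ~* /wp-config\.php { deny all; }
location ~ /\. { deny all; }

# Nothing uploaded by users should ever be executed as PHP
location ~* ^/wp-content/uploads/.*\.php$ { deny all; }

# Wordpress permalinks: fall back to index.php when no real file matches
location / {
    try_files $uri $uri/ /index.php?$args;
}
```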

MediaWiki on Nginx

I host a small Minecraft server with maybe a couple dozen players total, and for the past several months we’ve been using a wiki to catalog our achievements. I started with DokuWiki, a flat-file wiki, because I was reluctant to weigh down a web server with a database just to host a small wiki, but now that Bigdinosaur.org is hosting some more things and needs a database anyway, it seemed time to switch over to MediaWiki, the wiki engine that powers Wikipedia and the rest of the Wikimedia Foundation’s network of sites.

There are lots of MediaWiki-on-Nginx guides out there, but I didn’t find anything approaching the completeness of the much more common MediaWiki-on-Apache guides. The configuration I settled on was a mix of things from around the web, including the MediaWiki site and the Nginx Wiki, and my own ideas, with an eye toward closing off access to as much of the internals as possible and pulling the main configuration components out of the web root.
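
As a rough idea of the shape that configuration takes, the sketch below routes pretty /wiki/ article URLs through index.php and walls off a few of MediaWiki’s internal directories. The /wiki/ URL scheme, the directory list, and the PHP-FPM socket path are assumptions, not the exact rules from the final configuration.

```nginx
# Sketch of the general approach: pretty article URLs routed through
# index.php, with MediaWiki's internals closed off. The /wiki/ prefix,
# directory list, and PHP-FPM socket path are assumptions.
location /wiki/ {
    rewrite ^/wiki/(.*)$ /index.php?title=$1&$args last;
}

# Deny direct requests to directories nobody should be browsing
location ~ ^/(includes|maintenance|languages|serialized|cache)/ {
    deny all;
}

location ~ \.php$ {
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_pass unix:/var/run/php5-fpm.sock;
}
```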