Bigdinosaur Blog

Tales of hacking and stomping on things.

Node.js, Redis, and Etherpad Lite

Etherpad Lite is a real-time collaborative text editor which allows multiple users to simultaneously fiddle with a document. Everyone can see everyone else’s changes in real time, which is really cool. The “lite” tag after the name is there because the Etherpad Lite project was spawned from the ashes of the original Etherpad project, which itself was snapped up by Google, with its team folded into the ill-fated Google Wave. Wave was never really all that popular and has since been killed, but the core idea is still totally cool: to present a document to more than one user and have all users be able to make changes to it, and to have those changes shown to all other users as they’re made. It’s a surprisingly complex problem to solve. For one thing, it’s entirely possible that more than one user can change the same thing at the same time; there has to be a way of telling who “wins” and whose changes are tossed out. Even more complex is figuring out a way to track all the changes and organize them, and then display them for everyone.

The original Etherpad project solved this with a full mix of heavy web technologies, requiring you to install Java, Scala, and MySQL; it utilized no small amount of server resources and was difficult to scale. Etherpad Lite jettisons a lot of its predecessor’s bulk and does things in a much more web two-point-oh fashion. Specifically, Etherpad Lite runs on Node.js, a server-side JavaScript runtime which can be used for lots and lots of fancy things; Node is really deserving of its own blog entry, and we’re using only a tiny subset of its features here. Etherpad Lite also needs a database to hold the documents and store changes; out of the box it can use MySQL, but in this post we’re going to take things even further and configure it to run on Redis, an extremely fast in-memory key-value store. Finally, we’ll do a tiny bit of hacking on Etherpad’s HTML to force it to display a list of all the “pads” (documents) currently in its database.

Platforms and Value Judgments

N.B. This is a personal post. I promise to keep this kind of thing extremely rare, and to do more technical posts soon.

My name is Lee, and I am a Mac user.

There, I said it. I’m a dirty, dirty Mac user, and I’m okay with that.

My intent with this blog was for it to remain purely technical, with no personal entries at all; I’ve been down that road before with my last blog and it didn’t end well. However, an article went up this past weekend on Ars where the staff posted pictures of their office desks, and the amount of herpderping in the article’s comments about the mostly-Mac setups was boggling. Maybe it struck me so hard because I consciously avoid platform flame war discussions, having taken part in more than my share in the 80s and 90s; whatever the reason, some of the shit bubbling up to the surface in that article’s discussion thread just blows my mind.

The computing platform you start with might say something about your economic status (can’t afford a Mac, gotta use a hand-me-down Packard Bell!) or your computing ability (“I’ve got a Dell!” “What model?” “….Dell!”), and the computing platform you choose might say something about your goals and preferences (“GONNA DRIVE THE FRAG TRAIN TO TEAMKILL TOWN WITH MY FIFTEEN GRAPHICS CARDS!”), but judging someone’s intelligence and worth as a human being based on whether they’re using a home-built Windows 7 PC or a Mac is ludicrous. I’ve built more PCs from parts in the past quarter-century than any hate-spewing MACS SUCK noob on any discussion board you’d care to name, and yet the computers I use most often throughout any given day have an Apple logo on them.

Things weren’t always thus.

Adventures in Varnish

In the previous entry, I touched briefly on how some experimentation with Blitz.io led to me installing Varnish Cache on the Bigdino web server. Varnish Cache is a fast and powerful web accelerator, which works by caching your web site’s content (HTML and CSS files, JavaScript files, images, font files, and whatever else you got) in RAM. It differs from other key-based cache solutions (like memcached) by not attempting to reinvent the wheel with respect to storing and accessing its cache contents; rather than potentially arguing with its host server’s OS and virtual memory subsystem on what gets to live in RAM and what gets paged out to disk, Varnish Cache relies wholly on the host’s virtual memory subsystem for handling where its cache contents live.

Varnish is able to serve objects out of its cache much faster and using far fewer host resources than a web server application. While deploying it to see how it might help or hinder my Blitz.io runs, I did some brief and extremely unscientific testing on the Bigdino Nginx web server using Siege to simulate 1000 concurrent HTTP connections. When hitting Nginx directly, without Varnish in the middle, the server showed a considerable amount of CPU usage:

Running the same benchmark against the same host but pointing Siege at Varnish’s TCP port instead of Nginx’s yields considerably lower CPU utilization:

In the second instance, Varnish is servicing the HTTP requests directly out of its cache, which is sitting in RAM. Varnish doesn’t have to bother Nginx to have it evaluate the requests and pull things up off of slow disk (even the web server’s fast SSD is glacially slow compared to RAM!); the improvements would be even greater if there were any PHP code to be executed in order to dig up and serve static content.
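To make the difference concrete, here is a toy Python model of a cache sitting in front of a slower backend. It is only a sketch of the general idea and says nothing about how Varnish is actually implemented; the `CachingFrontend` class and the `slow_backend` function are hypothetical names invented for this illustration.

```python
from typing import Callable, Dict


class CachingFrontend:
    """Toy in-memory cache in front of a backend (not Varnish's real design)."""

    def __init__(self, backend: Callable[[str], bytes]) -> None:
        self.backend = backend
        self.cache: Dict[str, bytes] = {}
        self.backend_hits = 0  # how often we had to bother the backend

    def get(self, url: str) -> bytes:
        # Serve straight from memory when we already hold the object.
        if url in self.cache:
            return self.cache[url]
        # Otherwise ask the backend once and remember the answer.
        self.backend_hits += 1
        body = self.backend(url)
        self.cache[url] = body
        return body


def slow_backend(url: str) -> bytes:
    # Hypothetical stand-in for a web server reading a file off disk.
    return f"content of {url}".encode()


frontend = CachingFrontend(slow_backend)
for _ in range(1000):
    frontend.get("/index.html")
print(frontend.backend_hits)
```

However many times the same URL is requested, the backend only does the work once; everything after that first request is a dictionary lookup. A real cache also has to deal with expiry, cookies, and dynamic content, which this sketch ignores.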

As might be gleaned from the plethora of “Make X work on Nginx!” blog posts here, Bigdino supports a number of different sites and web applications. There’s The Chronicles of George, which is all static content but which has a forum running Vanilla; there’s a couple of Wordpress blogs; there’s a MediaWiki-based wiki for Minecraft stuff, and then there’s the Bigdino main site and the blog. This is a fair mix of static and dynamic content, but through some trial and error and a whole hell of a lot of Googling, I’ve put together a Varnish configuration which handles all of the sites very well, and I’d like to share.

Blitz.io Makes Load Testing Fun

Web site performance has been on my mind a lot lately. An average day for this blog means serving only a few hundred visitors and maybe 400-500 page views, but bigdinosaur.org also hosts the Chronicles of George, which carries with it a much higher average load; on days when a link hits Reddit or a popular Facebook page, the CoG can clock 10,000-12,000 page views. This is still small potatoes compared to a truly popular site, but it pays to be prepared for anything and everything. Setting up a web server to be as fast as possible is good practice for the day it gets linked to from Daring Fireball or Slashdot, and even if that day never comes, there’s nothing wrong with having an efficient and powerful setup which can dish out a huge amount of both static and dynamic content.

So in the course of some casual perusing for Nginx optimizations, I happened across Blitz.io, a Heroku-based site which gives you the ability to load-test your web server. I was immediately intrigued. I’ve done load testing on my LAN before using Siege and ApacheBench, and LAN-based testing is useful to a point, but it won’t help you to understand the over-the-net performance of your web site. Blitz.io fills a gaping hole in web site testing, letting you observe how your site reacts under load in real-world conditions. I signed up, and I ended up killing several hours with testing, mitigation, and re-testing. The results were unexpected and incredibly valuable.

If This Then That Dot Com

Brandon Mathis, the creator and maintainer of Octopress, recently tweeted a method to programmatically create tweets from new Octopress blog posts. Moments after that, he retweeted a response from another Octopress user which outlined a simpler method using ifttt.com, a web site which lets you create automated actions based on conditions.

The site’s name is pronounced like “lift” but without the “l”, as they proclaim on their homepage. The awkward construction comes from “if this then that”, which describes the web site’s purpose: if a thing happens, then do something.

There is a huge range of triggers you can pick from and actions you can accomplish; the site gives lots of ready-made examples (which they call “recipes”): automatically send yourself a text message if you’re tagged in a photo on Facebook, or automatically take anything you send to Instagram and archive it on Dropbox, and lots of other things. I was interested in using it to automatically tweet anything I post on the Bigdinosaur blog. There are recipes already created for Blogger and Wordpress blogs, but Octopress doesn’t have the external hooks those other blogging engines do (or really any hooks, since Octopress is just a flat-file blog!). However, Octopress does automatically update its RSS feed whenever you make a new post. Since you can create actions based on RSS feeds, we’re all set!
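For the curious, the “if a new item shows up in the feed, then do something” logic can be sketched in a few lines of Python. This is only an illustration of the idea, not how ifttt.com works internally; the `new_items` function and the `SAMPLE_FEED` contents are made up for the example, and a real feed may be in Atom format with different element names.

```python
import xml.etree.ElementTree as ET
from typing import List, Set, Tuple

# Hypothetical RSS 2.0 feed, inlined so the example needs no network access.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Example blog</title>
  <item><title>First post</title><link>http://example.com/1</link></item>
  <item><title>Second post</title><link>http://example.com/2</link></item>
</channel></rss>"""


def new_items(feed_xml: str, seen_links: Set[str]) -> List[Tuple[str, str]]:
    """Return (title, link) for feed items whose link has not been seen yet."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        if link and link not in seen_links:
            seen_links.add(link)  # remember it so we only act once per post
            fresh.append((title, link))
    return fresh


# "This": a post we have not seen before. "That": print what we would tweet.
seen = {"http://example.com/1"}
for title, link in new_items(SAMPLE_FEED, seen):
    print(f"would tweet: {title} {link}")
```

The set of already-seen links is what keeps the action from firing again for old posts each time the feed is checked.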

Vanilla Forums on Nginx

A few years ago I created a web site called the Chronicles of George, featuring some badly-written help desk tickets from the job I had at the time. It gained me some small amount of Internet fame (but no fortune), and developed a loyal community of sympathizers. For a long time we hung out on a self-hosted phpbb forum, but a change in web hosting led to the opportunity to also change the forum software away from something as hack-prone and complex as phpbb to something faster, simpler, and ostensibly more secure: Vanilla.

Out of the box, Vanilla operates a bit differently from a traditional thread-based forum like phpbb. It is a discussion-focused forum, deprioritizing standard categorical organization in favor of bringing the things being talked about to the forefront. This has advantages in some forum models, like a support forum for a specific product or service, where the first thing a reader wants to see is discussion, not a choice of categories, but it’s not necessarily what most folks are used to seeing out of a web forum. Fortunately, Vanilla also offers configuration options to make it behave more like a “standard” web forum.

Why choose it, then, if we’re just going to override its most distinguishing characteristic? Because, as mentioned in the opening paragraph, it’s light and fast and secure. Additionally, the 2.1 branch (currently under development and downloadable here) comes with an absolutely killer theme that we can easily customize and prettify with some quick CSS.

Wordpress on Nginx

Wordpress is the Microsoft Word of blogging platforms—it’s overkill for almost everyone, but everyone uses it anyway. It’s a popular, monstrous, ugly app that requires regular patching to keep evildoers from doing evil with it, but it’s still a top choice for self-hosted blogging because if you can fight your way through its ridiculously complex interface, you can use it to make a good-looking blog without having to know a lot about HTML or CSS.

Our blogging platform here at the Bigdino compound is obviously Octopress (which you’re reading right now), but I had occasion to stand up a Wordpress blog recently and wanted to share what I learned doing it. Wordpress’s ubiquity means that there are a million-billion-trillion guides out there for getting it working; however, the vast majority of them focus on how to make it work with Apache, not Nginx. What I hope differentiates this post is that I’m going to focus on taking common .htaccess-based security practices and turning them into Nginx-specific location directives and rules.

MediaWiki on Nginx

I host a small Minecraft server with maybe a couple dozen players total, and for the past several months we’ve been using a wiki to catalog our achievements. I started with DokuWiki, a flat-file wiki, because I was reluctant to weigh down a webserver with a database just to host a small wiki, but now that Bigdinosaur.org is hosting some more things and needs a database, it seemed time to switch over to MediaWiki, the wiki engine that powers Wikipedia and the whole Wikimedia Foundation network of sites.

There are lots of MediaWiki-on-Nginx guides out there, but I didn’t find anything approaching the completeness of the much more common MediaWiki-on-Apache guides. The configuration I settled on was a mix of things from around the web, including the MediaWiki site and the Nginx Wiki, and my own ideas, with an eye toward closing off access to as much of the internals as possible and pulling the main configuration components out of the web root.

Securing SSH With Iptables

In the previous post, I discussed one possible method of keeping undesirables from connecting to your server via ssh: using DenyHosts, a tool that works with TCP wrappers, to watch authentication attempts and block remote hosts based on conditions you set. DenyHosts (and similar tools) are easy to set up and don’t require much maintenance, but the block list files they generate can grow to a not-insignificant size; further, your web server must spend resources matching incoming ssh connection attempts against the block lists. If you’re on a particularly resource-constrained shared host, this might have some impact on overall server performance. Plus, even in its most recent update, DenyHosts can lag a bit in its blocking: because it uses regexes run against your server’s auth.log file to figure out what it needs to do, a remote attacker blasting out a tremendous number of logon attempts per second could squeeze in far more connection attempts than your allowed threshold before DenyHosts drops the hammer.
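To show what “regexes run against auth.log” means in practice, here is a small Python sketch of the log-parsing approach. It is not DenyHosts’s actual code or its actual patterns; the `FAILED_LOGIN` regex, the `hosts_to_block` function, and the sample log lines are assumptions made up for illustration, modeled on the usual OpenSSH “Failed password” log message.

```python
import re
from collections import Counter
from typing import Iterable, Set

# Assumed pattern for OpenSSH failed-login lines; real logs vary by distro.
FAILED_LOGIN = re.compile(
    r"sshd\[\d+\]: Failed password for (?:invalid user )?\S+ from (\S+) port \d+"
)

# Hypothetical auth.log excerpt using documentation-only IP addresses.
SAMPLE_LOG = [
    "Mar  3 10:15:01 web sshd[1201]: Failed password for root from 203.0.113.5 port 4242 ssh2",
    "Mar  3 10:15:02 web sshd[1202]: Failed password for invalid user admin from 203.0.113.5 port 4243 ssh2",
    "Mar  3 10:15:03 web sshd[1203]: Failed password for root from 203.0.113.5 port 4244 ssh2",
    "Mar  3 10:16:40 web sshd[1210]: Failed password for lee from 198.51.100.7 port 5151 ssh2",
    "Mar  3 10:16:52 web sshd[1211]: Accepted password for lee from 198.51.100.7 port 5152 ssh2",
]


def hosts_to_block(log_lines: Iterable[str], threshold: int) -> Set[str]:
    """Return hosts with more than `threshold` failed logins in the log."""
    failures: Counter = Counter()
    for line in log_lines:
        match = FAILED_LOGIN.search(line)
        if match:
            failures[match.group(1)] += 1
    return {host for host, count in failures.items() if count > threshold}


print(hosts_to_block(SAMPLE_LOG, threshold=2))
```

The weakness described above is visible here: nothing gets blocked until the failures have already been written to the log and the log has been read, so a fast attacker gets extra attempts in during that gap.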

There are lots of other things you can do to help secure your web server’s ssh port, but one of the most powerful and flexible is to bring iptables into the mix. Iptables is an application which comes preinstalled on most modern GNU/Linux distros and which provides instructions to the Linux kernel firewall. It is not a firewall in and of itself; rather, it provides a (relatively) easy way to view and modify the way the system’s built-in firewall tracks, filters, and transforms the network packets it receives.

In this particular use case, we care about iptables’s ability to perform actions on incoming ssh packets, based on parameters we define. Specifically, we’re going to use it to track all incoming ssh requests, and then block any host that tries to connect too many times. This is a simpler and more robust approach than the one DenyHosts takes: it is self-maintaining and does not depend on log file parsing to work.
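The “track every attempt, block a host that tries too often” logic can be modeled in Python as a per-host sliding window. This is a sketch of the idea only, not iptables syntax and not the rules from the full post; the `ConnectionLimiter` class and the limit of three attempts per sixty seconds are hypothetical values chosen for the example.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class ConnectionLimiter:
    """Toy model: allow at most max_hits attempts per host per time window."""

    def __init__(self, max_hits: int, window_seconds: float) -> None:
        self.max_hits = max_hits
        self.window_seconds = window_seconds
        self.recent: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, host: str, now: float) -> bool:
        hits = self.recent[host]
        # Forget attempts that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        # Record this attempt even if it ends up being refused, so a host
        # that keeps hammering away stays blocked.
        hits.append(now)
        return len(hits) <= self.max_hits


limiter = ConnectionLimiter(max_hits=3, window_seconds=60)
for second in (0, 1, 2, 3):
    print(second, limiter.allow("203.0.113.5", now=second))
```

Because the decision is made at the moment each connection arrives, there is no log to parse and no block list file to maintain; old attempts simply age out of the window on their own.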

(Special thanks to my friend and mentor RB for passing along his feedback on the previous post and the instruction on how to get rolling with iptables!)

Securing Your Server With DenyHosts

Running any kind of server at all is a risk, because the internet is a bad place full of bad people who like to destroy things for fun (and if you don’t believe me, read this). It becomes a matter of risk management—you have to expose certain things, like TCP ports 80 and maybe 443, for your web server to be reachable; you also probably need to expose at least one management port somewhere so that your server can be poked and prodded should things go wrong with it.

This usually means exposing port 22 for ssh if you’re on some kind of Unix-ish operating system like we are here at the Bigdinosaur.org compound. Blindly exposing your ssh port is not without peril, but there are several things you can do to manage the risk: namely, moving the ssh daemon onto a different port; controlling which local accounts are allowed to log on via ssh; and most importantly, installing and configuring something like DenyHosts, a tool that works with TCP wrappers to help keep undesirables from being allowed to log on at all.