Node.js, Redis, and Etherpad

This is an old post. It may contain broken links and outdated information.

Etherpad Lite is a real-time collaborative text editor which allows multiple users to fiddle with a document simultaneously. Everyone can see everyone else’s changes in real time, which is really cool. The “lite” tag after the name is there because the Etherpad Lite project was spawned from the ashes of the original Etherpad project, which itself was snapped up by Google and transformed into the ill-fated Google Wave. Wave was never all that popular and has since been killed, but the core idea endures: present a document to more than one user, let every user make changes to it, and show those changes to all the other users as they’re made. It’s a surprisingly complex problem to solve. For one thing, it’s entirely possible for more than one user to change the same thing at the same time; there has to be a way of deciding who “wins” and whose changes are tossed out. Even more complex is figuring out a way to track all the changes, organize them, and then display them for everyone.

The original Etherpad project solved this with a heavy mix of web technologies, requiring you to install Java, Scala, and MySQL; it used no small amount of server resources and was difficult to scale. Etherpad Lite jettisons a lot of its predecessor’s bulk and does things in a much more web two-point-oh fashion. Specifically, Etherpad Lite runs on Node.js, a server-side JavaScript engine which can be used for lots and lots of fancy things; Node really deserves its own blog entry, and we’re using only a tiny subset of its features here. Etherpad Lite also needs a database to hold the documents and store changes; out of the box it can use MySQL, but in this post we’re going to take things even further and configure it to run on Redis, an extremely fast memory-based key-value store. Finally, we’ll do a tiny bit of hacking on Etherpad’s HTML to force it to display a list of all the “pads” (documents) currently in its database.
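As a preview of the Redis piece, here’s roughly what the database section of Etherpad Lite’s settings.json looks like when pointed at Redis instead of MySQL. This is a minimal sketch: the host, port, and database number are assumptions for a local Redis install, and it presumes a ueberDB driver that speaks Redis.

    {
      // "dbType" and "dbSettings" are the stock settings.json keys;
      // the values below assume Redis running on its default local port.
      "dbType": "redis",
      "dbSettings": {
        "host": "127.0.0.1",
        "port": 6379,
        "database": 0
      }
    }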

Read more

Platforms and value judgments

This is an old post. It may contain broken links and outdated information.

My name is Lee, and I am a Mac user.

There, I said it. I’m a dirty, dirty Mac user, and I’m okay with that.

My intent with this blog was for it to remain purely technical, with no personal entries at all; I’ve been down that road before with my last blog and it didn’t end well. However, an article went up this past weekend on Ars where the staff posted pictures of their office desks, and the amount of herpderping in the article’s comments about the mostly-Mac setups was staggering. Maybe it struck me so hard because I consciously avoid platform flame war discussions, having taken part in more than my share in the 80s and 90s; whatever the reason, some of the shit bubbling up to the surface in that article’s discussion thread just blows my mind.

The computing platform you start with might say something about your economic status (can’t afford a Mac, gotta use a hand-me-down Packard Bell!) or your computing ability (“I’ve got a Dell!” “What model?” “….Dell!”), and the computing platform you choose might say something about your goals and preferences (“GONNA DRIVE THE FRAG TRAIN TO TEAMKILL TOWN WITH MY FIFTEEN GRAPHICS CARDS!”), but judging someone’s intelligence and worth as a human being based on whether they’re using a home-built Windows 7 PC or a Mac is ludicrous. I’ve built more PCs from parts in the past quarter-century than any hate-spewing MACS SUCK noob on any discussion board you’d care to name, and yet the computers I use most often throughout any given day have an Apple logo on them.

Things weren’t always thus.

Read more

Adventures in Varnish

This is an old post. It may contain broken links and outdated information.

In the previous entry, I touched briefly on how some experimentation with Blitz.io led to me installing Varnish Cache on the Bigdino web server. Varnish Cache is a fast and powerful web accelerator which works by caching your web site’s content (HTML and CSS files, JavaScript files, images, font files, and whatever else you’ve got) in RAM. It differs from other key-based web cache solutions (like memcached) by not attempting to reinvent the wheel with respect to storing and accessing its cache contents; rather than potentially arguing with the host OS over what gets to live in RAM and what gets paged out to disk, Varnish relies wholly on the host’s virtual memory subsystem to decide where its cache contents live.
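To give a sense of how little ceremony is involved, here’s a hedged sketch of starting Varnish in front of an existing web server. The ports, path, and cache size are placeholders, and the file storage backend is the one that hands placement decisions to the VM subsystem:

    # Listen on port 80, fetch cache misses from a backend on port
    # 8080, and keep the cache in a 1 GB memory-mapped file that the
    # OS pages in and out as it sees fit.
    varnishd -a :80 -b 127.0.0.1:8080 -s file,/var/lib/varnish/cache.bin,1G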

Read more

Blitz.io makes load testing fun

This is an old post. It may contain broken links and outdated information.

Web site performance has been on my mind a lot lately. An average day for this blog means serving only a few hundred visitors and maybe 400-500 page views, but bigdinosaur.org also hosts the Chronicles of George, which carries with it a much higher average load; on days when a link hits Reddit or a popular Facebook page, the CoG can clock 10,000-12,000 page views. This is still small potatoes compared to a truly popular site, but it pays to be prepared for anything and everything. Setting up a web server to be as fast as possible is good practice for the day it gets linked from Daring Fireball or Slashdot, and even if that day never comes, there’s nothing wrong with having an efficient and powerful setup which can dish out a huge amount of both static and dynamic content.

So in the course of some casual perusing for Nginx optimizations, I happened across Blitz.io, a Heroku-based site which gives you the ability to load-test your web server. I was immediately intrigued. I’ve done load testing on my LAN before using Siege and ApacheBench; LAN-based testing is useful to a point, but it won’t help you understand the over-the-net performance of your web site. Blitz.io fills a gaping hole in web site testing, letting you observe how your site reacts under load in real-world conditions. I signed up and ended up killing several hours with testing, mitigation, and re-testing. The results were unexpected and incredibly valuable.
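For reference, the LAN-side testing I mention amounts to commands like this one with ApacheBench (the address and numbers are placeholders):

    # Throw 1,000 requests at the server, 50 at a time, and report
    # latency percentiles and requests per second.
    ab -n 1000 -c 50 http://192.168.1.10/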

Read more

If This Then That dot com

This is an old post. It may contain broken links and outdated information.

Brandon Mathis, the creator and maintainer of Octopress, recently tweeted a method to programmatically create tweets from new Octopress blog posts. Moments after that, he retweeted a response from another Octopress user which outlined a simpler method using ifttt.com, a web site which lets you create automated actions based on conditions.

The site’s name is pronounced like “lift” but without the “l”, as they proclaim on their homepage. The awkward construction comes from “if this then that”, which describes the web site’s purpose: if a thing happens, then do something.

Read more

Vanilla forums on Nginx

This is an old post. It may contain broken links and outdated information.

A few years ago I created a web site called the Chronicles of George, featuring some badly-written help desk tickets from the job I had at the time. It gained me some small amount of Internet fame (but no fortune) and developed a loyal community of sympathizers. For a long time we hung out on a self-hosted phpBB forum, but a change in web hosting led to the opportunity to also change the forum software, away from something as hack-prone and complex as phpBB to something faster, simpler, and ostensibly more secure: Vanilla.

Out of the box, Vanilla operates a bit differently from a traditional thread-based forum like phpBB. It’s discussion-focused, deprioritizing the standard category hierarchy in favor of bringing the things being talked about to the forefront. This has advantages in some forum models (like a support forum for a specific product or service, where the first thing a reader wants to see is discussion, not a choice of categories), but it’s not necessarily what most folks are used to seeing from a web forum. Fortunately, Vanilla also offers configuration options to make it behave more like a “standard” web forum.

Why choose it, then, if we’re just going to override its most distinguishing characteristic? Because, as mentioned in the opening paragraph, it’s light and fast and secure. Additionally, the 2.1 branch (currently under development and downloadable here) comes with an absolutely killer theme that we can easily customize and prettify with some quick CSS.
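To give a flavor of the configuration involved, Vanilla 2.x keeps its settings in conf/config.php as plain PHP assignments. From memory, changing the landing page from the all-discussions stream to a category list looks roughly like this; treat the exact key and value as assumptions and verify against your own config.php:

    <?php
    // conf/config.php (excerpt): a hypothetical sketch of making the
    // category list the forum's front page instead of the discussion
    // stream. Check the key against your Vanilla version's docs.
    $Configuration['Routes']['DefaultController'] = 'categories';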

Read more

WordPress on Nginx

This is an old post. It may contain broken links and outdated information.

WordPress is the Microsoft Word of blogging platforms: it’s overkill for almost everyone, but everyone uses it anyway. It’s a popular, monstrous, ugly app that requires regular patching to keep evildoers from doing evil with it, but it’s still a top choice for self-hosted blogging because if you can fight your way through its ridiculously complex interface, you can use it to make a good-looking blog without having to know a lot about HTML or CSS.

Our blogging platform here at the Bigdino compound is obviously Octopress (which you’re reading right now), but I had occasion to stand up a WordPress blog recently and wanted to share what I learned doing it. WordPress’s ubiquity means that there are a million-billion-trillion guides out there for getting it working; however, the vast majority of them focus on making it work with Apache, not Nginx. What I hope differentiates this post is that I’m going to focus on taking common .htaccess-based security practices and turning them into Nginx-specific location directives and rules.
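As a small taste of that translation, here’s one of the most common .htaccess security rules, which shields wp-config.php from the outside world, alongside a sketch of its Nginx equivalent (both assume WordPress sits at the web root):

    # Apache .htaccess: deny all access to wp-config.php.
    <Files wp-config.php>
        order allow,deny
        deny from all
    </Files>

    # The rough Nginx equivalent, as a location block in the server config:
    location = /wp-config.php {
        deny all;
    }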

Read more

MediaWiki on Nginx

This is an old post. It may contain broken links and outdated information.

I host a small Minecraft server with maybe a couple dozen players total, and for the past several months we’ve been using a wiki to catalog our achievements. I started with DokuWiki, a flat-file wiki, because I was reluctant to weigh down a web server with a database just to host a small wiki, but now that Bigdinosaur.org is hosting some more things and needs a database anyway, it seemed time to switch over to MediaWiki, the wiki engine that powers Wikipedia and the whole Wikimedia Foundation network of sites.

There are lots of MediaWiki-on-Nginx guides out there, but I didn’t find anything approaching the completeness of the much more common MediaWiki-on-Apache guides. The configuration I settled on was a mix of things from around the web, including the MediaWiki site and the Nginx Wiki, and my own ideas, with an eye toward closing off access to as much of the internals as possible and pulling the main configuration components out of the web root.
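As a taste of the “close off the internals” part, location blocks along these lines refuse to serve MediaWiki’s configuration and code directories directly (a sketch; adjust the paths to wherever your wiki actually lives):

    # Never serve the wiki's configuration file.
    location = /LocalSettings.php {
        deny all;
    }

    # Refuse direct requests into directories that hold code, not content.
    location ^~ /maintenance/ {
        deny all;
    }
    location ^~ /includes/ {
        deny all;
    }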

Read more

Securing ssh with iptables

This is an old post. It may contain broken links and outdated information.

In the previous post, I discussed one possible method of keeping undesirables from connecting to your server via ssh: using the DenyHosts TCP wrapper to watch authentication attempts and block remote hosts based on conditions you set. DenyHosts and other TCP wrappers are easy to set up and don’t require much maintenance, but the block list files they generate can grow to a not-insignificant size; further, your web server must spend resources matching incoming ssh connection attempts against those block lists. If you’re on a particularly resource-constrained shared host, this might have some impact on overall server performance. Plus, even in its most recent update, DenyHosts can lag a bit in its blocking: because it works by running regexes against your server’s auth.log file, a remote attacker blasting out a tremendous number of logon attempts per second can get in far more connection attempts than your allowed threshold before DenyHosts drops the hammer.

There are lots of other things you can do to help secure your web server’s ssh port, but one of the most powerful and flexible is to bring iptables into the mix. Iptables is an application which comes preinstalled on most modern GNU/Linux distros and which provides instructions to the Linux kernel firewall. It is not a firewall in and of itself; rather, it provides a (relatively) easy way to view and modify the way the system’s built-in firewall tracks, filters, and transforms the network packets it receives.
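Seeing what the kernel firewall is doing at any moment is a one-liner (run it as root):

    # List every rule in the default filter table, with packet and
    # byte counters, and without resolving hostnames.
    iptables -L -n -v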

In this particular use case, we care about iptables’s ability to perform actions on incoming ssh packets based on parameters we define. Specifically, we’re going to use it to track all incoming ssh requests and then block any host that tries to connect too many times. This approach is simpler and more robust than the one DenyHosts takes: it’s self-maintaining and doesn’t depend on log file parsing to work.
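The classic recipe for this uses iptables’s “recent” match module. Here’s a sketch; it assumes sshd is still on port 22, and the time window and hit count are numbers to tune to taste:

    # Add each new connection to port 22 to a tracked list named SSH.
    iptables -A INPUT -p tcp --dport 22 -m state --state NEW -m recent --set --name SSH

    # If the same source shows up 4 or more times within 60 seconds,
    # drop the packet; since --update refreshes the timestamp on every
    # attempt, a hammering host stays blocked until it backs off.
    iptables -A INPUT -p tcp --dport 22 -m state --state NEW -m recent --update --seconds 60 --hitcount 4 --name SSH -j DROP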

(Special thanks to my friend and mentor RB for passing along his feedback on the previous post and the instruction on how to get rolling with iptables!)

Read more

Securing your server with DenyHosts

This is an old post. It may contain broken links and outdated information.

Running any kind of server at all is a risk, because the internet is a bad place full of bad people who like to destroy things for fun (and if you don’t believe me, read this). It becomes a matter of risk management—you have to expose certain things, like TCP ports 80 and maybe 443, for your web server to be reachable; you also probably need to expose at least one management port somewhere so that your server can be poked and prodded should things go wrong with it.

This usually means exposing port 22 for ssh if you’re on some kind of Unix-ish operating system like we are here at the Bigdinosaur.org compound. Blindly exposing your ssh port is not without peril, but there are several things you can do to manage the risk—namely, moving the ssh daemon onto a different port; controlling which local accounts are allowed to log on via ssh; and most importantly, installing and configuring something like DenyHosts, a TCP wrapper designed to help keep undesirables from being allowed to log on at all.
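The first two of those mitigations live in sshd_config; here’s a minimal sketch, with a placeholder port number and account names (DenyHosts itself is covered in the full post):

    # /etc/ssh/sshd_config (excerpt)
    # Move sshd off the default port to dodge the laziest scanners.
    Port 2222

    # Never allow direct root logons.
    PermitRootLogin no

    # Only these local accounts may log on via ssh.
    AllowUsers lee admin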

Read more