I Built a Node Site
Interest Free (Technical) Debt Is Risky
Earlier today I read a post from Javier Salado that asked the question “If the interest rate is 0%, do you want to pay back your debt?” In this case Javier was referring to technical debt, but I felt the conclusion he reached rested on the same misunderstanding that people apply to regular debt. Let me back up a bit. In Javier’s post, he lays out the following scenario:
“Imagine you convince a bank (not likely) to grant you a loan with 0% interest rate until the end of time, would you pay back? I wouldn’t. It’s free money. Who doesn’t like free money?”
He then goes on to apply this thinking to technical debt.
“You have an application with, let’s say, $1,000,000 measured technical debt. It was developed 10 years ago when your organization didn’t have a fixed quality model nor coding standards for the particular technologies involved, hence the debt. Overtime, the application has been steadily provided useful functionality to users and what they have to say about it is mainly good. You have adapted to your organization’s new quality process, the maintenance cost is reasonable and any changes you have to make have an expected time-to-market that allows business growth. We could say the interest rate on your debt is close to 0%, why should I invest in reducing the debt?”
I think the answer to both questions is that yes, you should pay the debt down, and he makes the same mistake a lot of people make when it comes to taking on debt (technical or otherwise). The cost of debt cannot be calculated from the interest rate alone; you must also factor in risk. In financial transactions, even a debt with 0% interest likely has some form of payment terms and collateral. (One might argue that Javier really meant a loan from a bank that was 0% interest, required no collateral, and had no terms for repayment. I’d argue that’s a gift, not a loan.) It turns out 0% interest loans aren’t just make-believe. A simple real-world example is the 0% interest car loan. While this looks great from an interest point of view, it’s not so good from a risk assessment point of view; if you get into an accident, you still owe a bunch of money and no longer have the collateral to pay it off. It’s a double whammy once you figure in the fallout from the accident itself.
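To put rough numbers on that, here is a minimal sketch of the arithmetic. Every figure in it is made up purely for illustration, and the function name is my own; the only point is that the expected cost of a debt is the interest plus the chance of something going wrong multiplied by what it would cost you.

```python
def expected_annual_cost(principal, interest_rate, risk_probability, loss_if_hit):
    """Interest paid in a year plus the probability-weighted loss."""
    interest = principal * interest_rate
    expected_loss = risk_probability * loss_if_hit
    return interest + expected_loss


# Hypothetical 0% car loan: no interest at all, but an assumed 2% yearly
# chance of totaling the car while still owing the full balance.
zero_interest = expected_annual_cost(
    principal=20_000,
    interest_rate=0.0,
    risk_probability=0.02,
    loss_if_hit=20_000,
)
print(zero_interest)
```

Even with the interest term at zero, the result is not zero, which is the whole argument in one line.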
So the question is, does risk assessment carry over to the technical debt metaphor? I believe it does. In most cases technical debt comes from legacy code, which means the only people who can work on it are folks who have been around a long time. In most cases, rather than teach new people how to develop on the legacy system, you just have the “old timers” deal with it when needed. But of course this is risky, because as time goes by you probably have fewer and fewer people who can serve in this role. You also have to be aware that, while you carry a large amount of managed technical debt, it’s always possible that some new, unforeseen event could change the dynamic of things. Perhaps a large client or market opens up to you, or some similar opportunity. Perhaps a merger with another company is proposed. You now have to re-evaluate your technical situation, and in many cases that technical debt may come back to bite you.
In the end, I don’t think Javier was way off base with his recommendation, which was essentially to follow Elizabeth Naramore’s “D.E.B.T.” system (pdf/slides) to measure your debt and then decide how much and what needs to be paid off. But I think it’s important to remember that once you have identified your debt, even if the “interest” on that debt is still low, it does represent risk within your organization (or your personal finances), and you would be best to eliminate as much of it as you can.
Monitoring for “E-Tailers”
Cloudy With a Chance of Scale
Checkpoints, Buffers, and Graphs
 Yay graphs!
Update: Shortly after posting, Keith mentioned that he had updated the graph to speak in MB rather than buffers. So, here is an updated screenshot with friendlier output and more data. (Note that Phil, one of our other DBAs, also flipped the buffers allocated to a right axis.)
Understanding Postgres Durability Options
Most people tend to think of Postgres as a very conservative piece of software, one designed to “Not Lose Your Data”. This reputation is probably warranted, but the other side of that coin is that Postgres also suffers when it comes to performance because it chooses to be safe with your data out of the box. While a lot of systems lean towards being “fast by default” and leave durability as an exercise for the user, the Postgres community takes the opposite approach. I think I heard it put once as “We care more about your data than our benchmarks”.
That said, Postgres does have several options that can be used for performance gains in the face of durability tradeoffs. While only you can know the right mix for your particular data needs, it’s worth reviewing and understanding the options available.
“by default” - OK, this isn’t a real setting, but you should understand that, by default, Postgres will work to ensure full ACID guarantees, and more specifically that any data that is part of a COMMIT is immediately synced to disk. This is of course the slowest option you can choose, but given that it’s also a popular code path, the Postgres devs have worked hard to optimize this scenario.
“synchronous commits” - By default synchronous_commit is turned on, meaning all commits are fsynced to disk as they happen. The first trade-off of durability for performance should start here. Turning off synchronous commits introduces a window between when the client is notified of commit success and when the data is truly pushed to disk. In effect, it lets the database cheat a little. The key to this parameter is that, while you might introduce data loss, you would never introduce data corruption. Since it tends to produce significantly faster operations for write-heavy workloads, many people find it is a durability tradeoff they are willing to make. As an added bonus, if most of your code could take advantage of this but some part of your system can’t afford the tradeoff, this setting can be set per transaction, so you can ensure durability in the specific cases where you need it. That level of fine-grained control is pretty awesome.
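As a rough sketch of what per-transaction control can look like, the helper below just builds the SQL text for one transaction and adds SET LOCAL synchronous_commit TO OFF when you say durability can be relaxed. The helper name and the page_views and payments tables are hypothetical; SET LOCAL is standard Postgres and only lasts until the end of the current transaction.

```python
def wrap_transaction(statements, durable=True):
    """Return the SQL text for one transaction.

    When durable is False, synchronous_commit is switched off with
    SET LOCAL, which only affects the current transaction and leaves
    the session default untouched.
    """
    lines = ["BEGIN;"]
    if not durable:
        lines.append("SET LOCAL synchronous_commit TO OFF;")
    # Normalize so every statement ends with exactly one semicolon.
    lines.extend(s.rstrip(";") + ";" for s in statements)
    lines.append("COMMIT;")
    return "\n".join(lines)


# Hypothetical tables: losing a few page views is tolerable, losing a payment is not.
print(wrap_transaction(["INSERT INTO page_views (url) VALUES ('/home')"], durable=False))
print(wrap_transaction(["INSERT INTO payments (amount) VALUES (42.00)"]))
```

You would hand the resulting text to whatever client or driver you already use; the point is only that the relaxed setting travels with the transaction rather than with the whole server.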
“delayed commits” - Similar in theory to synchronous_commit, the settings for “commit_siblings” and “commit_delay” try to provide “grouped commits”, meaning multiple transactions are committed with a single fsync() call. While this certainly has the possibility of increasing performance in a heavily loaded system, when the system is not loaded these will actually slow down commits. That, plus the overall lack of granularity compared to synchronous_commit, usually means you should favor turning off synchronous_commit and bypassing these settings when trading off durability for performance.
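For intuition, here is a toy model of how the two settings interact, based on my reading of the Postgres documentation and not on the actual server code: a commit only pauses (hoping other commits will share its flush) when commit_delay is above zero and at least commit_siblings other transactions are open. The function name is my own; the default values shown (0 and 5) are the stock Postgres defaults.

```python
def should_delay_commit(other_open_transactions, commit_delay_us=0, commit_siblings=5):
    """Toy model: does this commit pause before flushing?

    commit_delay_us is the pause in microseconds (0 disables it);
    commit_siblings is the minimum number of other open transactions
    required before the pause is applied.
    """
    return commit_delay_us > 0 and other_open_transactions >= commit_siblings


# With the defaults nothing is ever delayed.
print(should_delay_commit(other_open_transactions=50))
# A configured delay only kicks in once enough siblings are open.
print(should_delay_commit(other_open_transactions=2, commit_delay_us=1000))
print(should_delay_commit(other_open_transactions=8, commit_delay_us=1000))
```

Note that every delayed commit pays the pause whether or not another transaction actually shows up to share the flush, which is where the slowdown on lightly loaded systems comes from.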
“non-synching” - Fsync was the original parameter for durability vs performance tradeoffs, and it can still be useful in some environments today. When turned off, Postgres throws out all logic for synchronizing write activity with command input. This means that when running in this mode, in the event of hardware or server failure, you can end up with data that is corrupt, not just missing. In many cases this might not happen, or might happen in an area that doesn’t matter (say a corrupt index that you can just REINDEX), but it could also happen within a system catalog, which can be disastrous. This leads many a Postgres DBA to tell you to never turn this off, but I’d say ignore that advice and evaluate things based on the tradeoffs of durability vs performance that are right for you. Consider this: if you have a standby set up (WAL based, Slony, Bucardo, etc…), and you are designing for a low MTTR, chances are that hardware failure on the primary will lead to a near immediate switch to the standby anyway, so a corrupt database that you have already moved beyond will be irrelevant to your operations. This assumes that you can afford to lose some data, but if you are using asynchronous replication, you’ve already come to that conclusion. Of course, you are giving up single node durability, which might not be worth the gain in performance, especially since you can get most of the performance improvement by turning off synchronous_commit. In some situations you might fly in the face of conventional wisdom and turn off fsync in production but leave it on in development; imagine an architecture where you’ve built redundancy on top of EC2 (so a server crash means a lost node), but you are developing on a desktop machine where you don’t want to have to rebuild in the case of a power failure, and don’t want to run multiple nodes.
Life is a series of tradeoffs between cost and efficiency, and Postgres tries to give you the flexibility you need to adjust to fit your particular situation. If you are setting up a new system, take a moment to think about the needs of your data. And before you replace Postgres with a new system, verify what durability guarantees that new system is giving you; it might be easier to set Postgres to something comparable. If you are trying to find the right balance for your own situation, please feel free to describe it in the comments, and I’ll be happy to try to address it.
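To summarize the options, here is a small sketch that maps three durability postures onto the corresponding postgresql.conf lines. The profile names and the helper are my own invention for illustration; fsync and synchronous_commit are the real setting names discussed above.

```python
# Hypothetical profile names, ordered from safest to fastest.
PROFILES = {
    "safe": {"fsync": "on", "synchronous_commit": "on"},
    "may_lose_recent_commits": {"fsync": "on", "synchronous_commit": "off"},
    "may_corrupt_on_crash": {"fsync": "off", "synchronous_commit": "off"},
}


def render_conf(profile):
    """Render one profile as lines suitable for postgresql.conf."""
    settings = PROFILES[profile]
    return "\n".join(f"{name} = {value}" for name, value in settings.items())


for name in PROFILES:
    print(f"# {name}")
    print(render_conf(name))
```

The middle profile is the one most people should try first, since it risks a short window of lost commits but never corruption; the last one only makes sense when something else, such as a standby, is providing your durability.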