Cookie banners

In the EU, there's a requirement that users are warned about the cookies a site uses. It's a bizarre quirk that resulted from well-meaning legislation that appeared before GDPR. It assumes that users know what a cookie is and what it might be used for, which they absolutely do not. And it's oddly specific: I can register a serviceworker or use localStorage to store data and run code on the device, but for cookies I need a warning banner (see https://www.cookiebot.com/en/cookie-law/).

Every website implements this banner differently, leading to a mix of different ways to pop up a banner over the content of whichever site you're currently viewing. There's no prescribed way to handle this, so each site owner must solve it for themselves. This leads to a lot of superstition about what the requirements actually are, especially for teams without formal legal advice, and especially for teams with formal legal teams (because legal teams tend to hedge against the possibility of legal action rather than acting in the interests of the user).

What would be a better solution here? I notice that some sites ask me if they can send me push notifications, or whether they're able to read my current location. These prompts don't need to be provided by the site owner - they're built into the permission model of the browser. _Why is it different for cookies?_

Indeed, at a basic level, it's the browser that stores cookies for websites. It's the browser attaching them to requests, reading them off the responses and managing cross-domain security. The browser is the only actor that can prevent a cookie in an http response header from being saved, based on approval given in a prompt.

As I browse the internet in the EU, the vast majority of sites I see give me cookie notices. However, some do not. I would be interested to know whether the browser vendor could be held legally responsible for not warning the users of those websites about the cookies being issued. That might prompt swift action from the browser vendors to implement a native prompt. A native prompt could have other benefits: you could set a preference to "always accept" or "always reject", or artificially limit the lifetime of cookies across all sites.

I imagine that we could agree a means to provide the browser with a link to our cookie policy, and that the browser could display it, without using cookies, before cookies are accepted by the user. Perhaps this would be at a [well-known](https://tools.ietf.org/html/rfc5785) url per site.

Cookies are essential for complex and useful websites. Let's ask the browsers to help us out.
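Nothing like this exists today, so the following is purely a thought experiment: a machine-readable policy at a well-known URL might look something like this (every field name here is invented).

```
GET https://example.com/.well-known/cookie-policy

{
  "policyUrl": "https://example.com/cookies",
  "cookies": [
    { "name": "session", "purpose": "authentication", "maxAge": "90 days", "essential": true },
    { "name": "ab_bucket", "purpose": "experiments", "maxAge": "30 days", "essential": false }
  ]
}
```

The browser could fetch something like this before storing anything, show it in a native prompt, and enforce the declared lifetimes on the user's behalf.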


Taking notes

There are different kinds of note-takers. Some people record everything: every word. They might or might not share them. Others will write down important points or action items. I tend to note down my own action items and let others fend for themselves. Others take no notes.

Let's say a waiter comes to your table to take orders. If they take no notes, you're happy, because they're confident. But you're also uncomfortable. Unless this is a very classy place, you _know_ there's a mistake in there. Lemonades will be forgotten; steaks will be overcooked. Waiters should be taking action items.

Let's say you're in a 1:1 at work with your direct manager. They're looking at you, but they're typing every word you say. Everything. It's not that they're not listening - they _must_ be listening to every word. But they're distracted - they're not _hearing_ you. And the chances of a Slack ping popping up while they're typing are 100%. They'll just answer it and "what were you saying can you just repeat that?" Infuriating. Managers should be present in the moment, taking down important points, preferably on paper so that you can see them.

Both of these recommendations involve paper, but they don't need to. A waiter taking orders on a tablet can work fine. A manager taking notes in a shared doc works really well, especially if you're remote.

When I was a manager, I tried a mix of these methods. I couldn't keep up with the conversation while transcribing. My touch-typing just isn't good enough, and the transcripts miss the context of the day anyway, so they don't even make good long-term records. I often took _no_ notes. I enjoyed this - I could give my reports my full attention. However, our 1:1s became very repetitive, and neither of us made progress. While I liked the idea of compiling a record of meetings at the end of the day, it simply never happened - I never found the time.

Sharing a single doc of basic bullet points seems to work. Each meeting, make a new section at the top with the date. Review your previous notes at the start of each new meeting. Make new bullets. This provides a useful shared track record, and gives you the opportunity to build forward.

Actually, this would work pretty well for restaurants too.


Identity and Endorsement

_Verification_ is a feature common to Facebook, Twitter and several other social media networks. The problem it was created to solve was to differentiate between Jan Smith the famous actor and Jan Smith the absolute nobody who wants to build social capital off the reputation of Jan Smith - or indeed, to sully the social reputation of Jan Smith. All the "real" Jan Smith needs to do is provide proof of identity and their username, and the network will add a reassuring "tick" to let everyone know who is real and who is not. Twitter is one of the few networks that still allows anonymous and parody accounts to be created, so I believe the risk of confusion and impersonation is greater there.

As with basically every other feature ever, the problem is more complex than it first appears. The "blue tick" is now seen as an endorsement. This mixes the reputation of the network with the reputation of the verified user. Additionally, since it appears to be an endorsement, and is of limited availability, the award of the blue tick is greatly prized. There is always someone in my DMs asking if I can help them get verified (I cannot).

Once a user has a blue tick confirming their identity, what does that mean? Can they then change their photo, their name, their username, their bio? And if they do, what does the tick mean? In a world of seven billion individuals, who gets to decide who the "real" Jan Smith is? How does Smith apply for verification? What if a Jan Smith has already been approved for verification? What if the individual lives in a place where they do not have a way to prove their identity (a surprisingly common issue)?

Verification is a combination of **identity** (proving that your name is what you say it is) and **credibility** (showing which person with that name you are). If I were to try to define such a system, I imagine I would try to build my qualification criteria off existing systems. Perhaps I'd leverage passports, government IDs and credit cards for identity. And for a measure of credibility I'd look to large reputable organisations like newspapers (for journalists), sports teams, religious institutions, elected government officials, and so forth. Beyond those, I'd try to establish a measure of credibility in their field, based on mentions in the press, or wikipedia. Without a measure of _credibility_, we might be asked to verify every single "Jan Smith", which would undermine our solution to the original problem: how do I find "my" Jan Smith?

The trouble with measuring credibility based on external factors is that we accumulate all the biases of the original sources. If men are more likely to get published in the press, and more likely to apply for verification, they're doubly likely to be approved.

Have a think about how we could qualify the credibility of someone in their field. Perhaps you would base it on the approvals of a given number of other network users? LinkedIn does that for "skills". But that factor could easily be abused; you'd need to account for that in your model. And remember the seven billion people problem: we can't employ a private investigator for each and every applicant. It's a difficult problem.

In my opinion the best solution (not my idea, widely discussed already) is to split the confusion of identity verification and endorsement of credibility. Identity verification should be widely available, and networks could even tell us which method they used, e.g. "Facebook has seen a copy of a US Driver's License".

Endorsement of credibility should still be left to others. Allow people to link to official websites, and crawl those websites for the usernames to be verified (there's a sketch of what that crawl might look like at the end of this post). When approved, add the website to the user's profile. This means the networks need only vet official websites for validity. Obviously this is still not trivial, but it might be more manageable. This takes time and resources, both of which are carefully managed at any social network, regardless of size. I don't expect to see it fixed soon.

One final problem to consider appeared last year. My friend contacted me on behalf of [@dog_rates](https://twitter.com/dog_rates), asking if I could help with verification (I couldn't). There were plenty of impersonating accounts, so I can see why verification makes sense. But how do you verify a user without an identity?
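As promised above, here's a rough sketch of what crawling an official website for account links might look like. It's purely illustrative - the function, the regex and the "just fetch the HTML" approach are simplistic assumptions, and a real system would need rendering, rel="me" links, redirects, rate limits and far more care:

```
// Purely illustrative: fetch an organisation's official site and pull out
// any twitter.com profile links it publishes. The handles found are only
// candidates to check against the applicant, not proof by themselves.
async function findClaimedHandles(officialUrl) {
  const response = await fetch(officialUrl);
  const html = await response.text();
  const matches = html.matchAll(/https?:\/\/(?:www\.)?twitter\.com\/([A-Za-z0-9_]{1,15})/g);
  const handles = new Set();
  for (const [, handle] of matches) {
    handles.add(handle.toLowerCase());
  }
  return [...handles];
}
```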


Losing control

Twitter added a new control to the home timeline recently: "latest tweets". It's a toggle that lets you switch between "Home" (the ranked, or algorithmic, timeline) and "Latest" (a reverse-chronological list).

The algorithmic timeline has always been controversial, both inside the company and out. To account for that, it was rigorously tested over months and months. The results are unequivocal: our users like more tweets, read more tweets, and use Twitter more when they have the ranked timeline. For new users who have subscribed to a few spammy accounts (for example, news), the ranked timeline reduces the effect of the overactive accounts, letting the quieter accounts show through. For power users, the ranked timeline helps cut through the many accounts you follow to surface the more valuable tweets, new accounts to follow, and conversations you're interested in.

If the ranked timeline is better for everyone, why is it controversial? Why can it be so unpopular?

It's a long-standing issue at Twitter that we don't really know what the magic is. We know why people use Uber: they need to get from A to B. Why do people use Twitter? A whole bunch of reasons. Why does it succeed where others did not? Unclear. One of the most successful initiatives in recent years has been defining the role of Twitter: to provide news. This doesn't mean that we're dismissing other uses, simply that we can focus on optimising for news.

One thing that distinguishes Twitter from news tv, from newspapers, from Facebook, from Google, is the ability to choose who you follow and control your experience. You know why you're seeing the tweets on the screen: you chose to follow those accounts. For the ranked timeline, this is no longer true. You've lost control over the experience. And with that, you've lost _ownership_ in the product. Previously you were free to make the Twitter experience your own. Now, someone else is changing that. It doesn't matter whether the tweets are better or not; the feeling of ownership is lost. Was that part of the magic?

Is that why Twitter added a toggle for "latest"? Actually no. We added it to recognise that sometimes you're following a live sporting event and need the tweets to be in chronological order. That's why the toggle resets after a few hours, when the event is over.

Twitter isn't alone in pushing a feature that claims to know what you want better than you do. Apple famously design their hardware and software without user research. They hire experts and want to solve the issues users haven't thought about yet, not the issues users are talking about (which tend to be top-of-mind). The removal of the headphone jack, the keyboard, the touchbar: classic examples. Facebook's News Feed has been ranked for a long time, which has led to accusations of intentional (or unintentional) political (and emotional) manipulation of its users.

Apple and Facebook are some of the most successful companies in the world, suggesting that taking control away from users does not hurt the bottom line. It'll be interesting to see if this continues to be a winning formula, or whether new competitors offering to return a sense of ownership in the product will win through.


Shipping

After months and maybe years of stress, meetings, late nights, bug reports, dogfooding, requirements changes, dependency changes, management changes, user testing and actual coding, you're ready to ship your significant rebuild. What happens next?

There's often some kind of anti-climax at the launch. If your site has existing traffic, you can't just flip a switch. You need a/b testing, holdbacks, gradual rollouts, comms. It can take a lot longer than expected.

With an established software development team, you've probably pivoted the whole team away from the day-to-day mission to focus on this rebuild. Managers will be familiar with [Tuckman's stages of team development](https://en.wikipedia.org/wiki/Tuckman%27s_stages_of_group_development): forming, storming, norming and performing. With any luck you'll be a fast, efficient team in the performing stage of the project. Indeed, the smaller tasks near the end, where the team are very familiar with the code (because they've just written it), can feel the most productive.

Your team has been focussed on a mission. The mission was the project, and the project is nearing completion. But what is a team? It's a group of people with a mission. Without a mission, there's no team. In 1977, Tuckman added a fifth phase, "adjourning", to the model, recognising the end of the team and allowing for a cyclic process. What might that phase look like?

**Folks will leave.** Either the company or the team. It's likely many have stayed to complete the project anyway, and you'll have seen unusually low attrition for the last third of the project. That will catch up.

**External stakeholders will come knocking.** It's likely that you had to freeze some features or push back on requirements for the duration of your project. Those external partners will be expecting more attention, likely before you've even shipped. A human aspect is that they felt left out of your project launch and expect compensation.

**Management will reorg.** Rather than leave the high-performing team alone, senior management often sees this as an opportunity for a reorg. Like the external partners, they've seen slow responses to requests and received pushback on their requirements because of the long project. Perhaps your team has been left out of previous reorgs to keep the project on track. Senior management will be looking to normalize the team with the rest of the company, possibly simply by disbanding it.

From the point of view of the team, all of this can be stressful. The team likely has a significant backlog of technical debt which was taken on to hit the deadline, there'll be an accumulation of key knowledge in the team which should be codified, there'll be bugs to address, and there'll be cleanup tasks for the previous systems, especially if they're still running in parallel. With the team's future in question, it can be hard for the team to focus, especially when they believe they should still be celebrating the launch.

To make this easier, I have some ideas:

**Get closure**

- Celebrate the launch.
- Look for signs of burnout and manage it. Let folks take extended vacations and reassure them that they'll be welcomed back.
- Be honest about the end of the project and how you plan to address the wrap-up work. You should have that work fully scoped before you launch. Don't make it a six-month documentation sprint.

**Prepare the next phase**

- Compensate the team. _You now have the most skilled team in the current codebase that there will likely ever be._ Increase salaries rather than giving spot bonuses. It'll be more clear that they're valued after the project, rather than being compensated for work done.
- Be honest with your manager if you're considering leaving the company. It gives them a chance to either offer you more opportunities, or at least to manage the transition early.

**Start the next phase**

- Consider a new mission, but be careful how you discuss it - allow the team the time to enjoy the launch, but let them know there will be meaningful work afterwards.
- Consider the team to be at the forming stage again. New processes need to be established. Stakeholders should be reviewed and reconnected.

The acknowledgement of the "adjourning" phase allows the group to respect the end of things the way they were, and to move on to the next project.


Removing cookies

Cookies are hard to manage. As you'll know, the cookie API is ... _[infelicitous](https://tools.ietf.org/search/rfc6265)_. You can set a cookie like so:

```
> document.cookie='name=value; Path=/; Domain=kenneth.kufluk.com; Max-Age=1';
```

When a cookie is set by the server, it uses a similar format, in the "set-cookie" header of the response. Reading the cookie back just gives you the serialized name/value pairs:

```
> document.cookie;
< "name=value;another=value"
```

When the cookies are sent to the server in the request, they're also just the name/value pairs. What this means is that the full metadata of a cookie is never available, to the client or the server, except when it is set.

If you want to delete all the cookies for your website, this is tricky. You can delete a cookie by providing a new cookie of the same name with an expiry date in the past. However, cookies are partitioned by domain and path. If those aren't set appropriately on the deletion, you won't clear the right cookie. (There is a new header, "[Clear-Site-Data](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Clear-Site-Data)", if you want a nuclear option, which will instruct supporting browsers to remove all cookies.)

Given that we need to set a cookie to delete a cookie, we need to know the name, domain, path and other metadata of the cookies to be able to delete them. We only know the names of cookies on the current page (ie, based on the current path and domain), and the path/domain/meta info for those cookies is not available. In other words, it's important to issue your cookies appropriately in the first place.

**Always set the Path to root (/)**

**Always set the Domain**

If our cookies can expire within a reasonable timeframe then there's little need for deletion. A common solution is to use session cookies. If you don't specify an expiry or a max-age when setting a cookie, it's scoped to the "session" and is deleted when the browser window is closed. In an old browser context where we had one site per window, this made sense. However, modern browsers are able to preserve and restore pages, windows and tabs even after a reboot. Session cookies are no longer short-lived; they're effectively indefinite. During a recent survey of cookies at Twitter, we observed session cookies in requests that we hadn't issued for _three years_.

> There are two different kinds of cookies often called "session cookies", which can be confusing. I'm using it here to mean a non-persistent cookie that should be removed at the end of a "session". It is often also used to describe a cookie that contains a serialized set of values.

**Always set the Expires or Max-Age header** (_max-age is not supported by older browsers, so expires is still preferred_)
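Pulling those three rules together, a minimal helper might look like the sketch below. The domain and the thirty-day default are placeholders for illustration:

```
// A minimal sketch: always set Path, Domain and an explicit Max-Age.
// The domain and the default lifetime here are placeholders.
const COOKIE_DOMAIN = 'example.com';

function setCookie(name, value, maxAgeSeconds = 60 * 60 * 24 * 30) {
  document.cookie = [
    `${name}=${encodeURIComponent(value)}`,
    'Path=/',
    `Domain=${COOKIE_DOMAIN}`,
    `Max-Age=${maxAgeSeconds}`,
  ].join('; ');
}

// Deleting is just setting the same cookie with no lifetime left.
// The Path and Domain must match the original, or nothing is removed.
function deleteCookie(name) {
  document.cookie = `${name}=; Path=/; Domain=${COOKIE_DOMAIN}; Max-Age=0`;
}
```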
Given that we now have cookies which are properly issued with expiry dates, we should question what a reasonable expiry time would be. Let's consider a couple of examples.

We show a tooltip pointing to a new feature of the site. When a user has dismissed that tooltip, we issue a cookie so that the user doesn't have to see it again. We don't ever want the user to lose that cookie, so we set the expiry time to "infinite", which our cookie library helpfully sets to the year 9999. In other words, eight thousand years from now.

> Every cookie you set will be attached to every request. While small, these bytes can add up over time, and can cause issues. We call this _request bloat_. An issue we see at Twitter is when the cookie size, combined with other headers, exceeds the limit of our http framework. These requests are immediately rejected with a status 431, which is bad for users, because they won't know why the request failed and won't be able to submit a similar request until they clear out some cookies.

Another example is your login. When you log in, we issue a cookie representing your credentials. The cookie allows you to make subsequent requests as that user. We set that cookie for a month.

Let's consider those expiries against the expectations of a user. If you take a vacation for a month, then reopen your laptop, would you expect either of those cookies to disappear? Probably not. Are you likely to be on vacation for 8000 years? Probably not. I think we could set realistic, grounded values here, based on common sense. If you leave your computer for more than three months, it wouldn't be too much of a hassle to log in again. If you saw an educational tooltip again 18 months after you first saw it, that mightn't be too annoying (assuming the code is still in the site).

However, cookie expiries don't work that way for the login cookie. It's not measuring the time since the cookie was last used; it's measuring the time since the cookie was issued. The best solution here is to keep a rotating value managed by the server. Store the login cookie value in a table on the server. If it's seen and it's more than 30 days old, issue a new one. If it's seen and it's more than 90 days old, consider it expired. By checking the expiry on the server, not the client, we can set the cookie expiry to anything reasonable over 90 days. (There's a sketch of this rotation check at the end of this post.)

**Consider 18 months a maximum lifetime for your cookie**

**Manage login expiries on the server side**

**Refresh/reset cookies that you want to keep longer**

If your site has lost its login cookie, it might find itself in a bad state. Maybe you have cached content in the serviceworker, in other cookies, in localStorage, in indexedDB. In this case, we have historically cleaned up the user storage as if they had logged out, but we found this caused problems for users whose storage was unexpectedly cleaned up. It turns out that some privacy-protection browser extensions can strip cookies from the first request. A common example is Privacy Badger, which strips the cookies from page requests [if the page is served by the serviceworker](https://bugs.chromium.org/p/chromium/issues/detail?id=946908). As these extensions are common, the developer should guard against them by checking login via XHR and either refreshing the page or popping a dialog asking for advice.

Since cookies are a distributed store of data that is hard to read from and manage, it's important to be careful about which cookies you issue and when, and to limit those cookies as much as possible. Storage such as localStorage should be used in preference, where possible.

Cookie spec: https://tools.ietf.org/search/rfc6265
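As promised above, here's a minimal sketch of the server-side rotation check. The store, the field names and the exact durations are hypothetical; it's the shape of the logic that matters:

```
// A sketch of server-side login-cookie rotation. The store and field names
// are hypothetical; the cookie itself can carry a generous Max-Age because
// the real expiry is enforced here.
const THIRTY_DAYS = 30 * 24 * 60 * 60 * 1000;
const NINETY_DAYS = 90 * 24 * 60 * 60 * 1000;

async function checkLoginToken(token, store) {
  const record = await store.findByToken(token);
  if (!record) return { status: 'logged-out' };

  const age = Date.now() - record.issuedAt;
  if (age > NINETY_DAYS) {
    await store.remove(token);                 // too old: treat as expired
    return { status: 'logged-out' };
  }
  if (age > THIRTY_DAYS) {
    const fresh = await store.reissue(token);  // rotate: new token, new issuedAt
    return { status: 'logged-in', setCookie: fresh };
  }
  return { status: 'logged-in' };
}
```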


CSRF stories

I think I first learned about [CSRF](https://www.owasp.org/index.php/Cross-Site_Request_Forgery_(CSRF)) at an "@media" conference talk in 2008 given by Simon Willison ([Twitter](https://twitter.com/simonw)). In the middle of a Django presentation, he explained how it's possible to make a form that will post across to a different website, and that when you do so, the appropriate login details are also attached. This lets one website do _anything it likes_ with the user's credentials on another website. "This," he explained, "should make you think 'oh shit'". "Oh shit," I thought.

The solution to this problem is a CSRF "token". The idea is that you add a new field to all the forms on your website. The field is prefilled with the token: something that only your own website would know. So Simon's website has an evil form that posts { text: "I like marmite" }. But all the forms on your website post { token: 'secret', text: "I like vegemite" }. And you check the token before you accept the data. This works perfectly, and it's left to the developers to figure out a way to generate, distribute and check the secret.

When I joined Twitter we were using Ruby-on-Rails. It had a common library providing the form token, which I think it called "form_authenticity_token". However, the project we were working on was the "Follow button": a button that we wanted people to place on a good percentage of the entire internet's web pages. The traffic would be immense, and our Ruby-on-Rails system couldn't handle the load of serving those pages with their tokens. Our solution should have been Scala, as is well known, but at the time our first Scala service wasn't ready for prime-time, so we only had Rails to work with.

So we generated the token on the client. None of us had tried that before, so we were a little dubious, but we ran it past our security team, who seemed satisfied by the idea. For clientside CSRF, we just generate a random string using JavaScript and append it to both the form AND the cookie. While Simon's evil form can submit any form fields it likes, it's unable to set or read cookies from another site. This worked well for many years.

More recently our security team moved our primary website off CSRF tokens and onto Origin headers. These are more recently supported in browsers, and tell you the originating domain of a request. If the origin is "twitter.com", then we can be pretty sure the request came from Twitter.

This worked well until recently, when we received a bug report that the CSRF protection was failing. This was hard to believe - nothing had changed for years in that code, and all of our focus was on the new website. Sure enough, the new website was the cause of the problem. The new website installs a serviceworker that intercepts requests, which means that it can cache pages for offline or performance reasons. It turned out to be intercepting Simon's evil form post and proxying it back to the server. Since it intercepted the request and issued a new one, it added the origin header for the serviceworker itself, completely defeating our old website's origin-header CSRF protection. The solution was simple: check the [mode](https://developer.mozilla.org/en-US/docs/Web/API/Request/mode) of the request before proxying it.
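Sketched from that description rather than from the real code, the fix looks something like this: only handle the requests you actually mean to handle, and let everything else fall through to the network untouched, so the browser sends it with its original headers.

```
// A sketch of the idea, not the actual code: only intercept plain GET
// navigations. Anything else (such as a cross-site form POST) is left
// alone, so it reaches the server with its original Origin header.
self.addEventListener('fetch', (event) => {
  const request = event.request;
  if (request.mode !== 'navigate' || request.method !== 'GET') {
    return; // no respondWith(): the request goes to the network as-is
  }
  event.respondWith(
    caches.match(request).then((cached) => cached || fetch(request))
  );
});
```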
I've recently been playing with CSRF again for the new website. The new website depends on a CSRF cookie which is provided by the server. If the CSRF cookie expires (it has a limited duration), the API requests will fail, but they issue a new cookie as they do so, and our code retries the request. Something I'd noticed is that the retried requests would often still fail, which should be impossible.

Cookies are inherently complicated to handle. While the concept is simple, the api in the browser is not. There are volumes I could write on the inadequacies of the model. Happily, there are some beginnings of an idea to improve them, but it will likely take years to roll out. A lot of the problems stem from legitimate attempts at privacy protection: a browser or extension might hide or block a cookie it considers unworthy. Sadly this can interfere with security protections like CSRF, and there's no way for developers to detect or override them.

After a few failed attempts to crack the problem, I turned back to clientside token generation, something I hadn't looked at for six or seven years. Is it still a valid option? I found that I could easily generate a random string that would be accepted by our serverside CSRF validation. In code review, my colleagues asked whether a Math.random-based solution would be sufficiently secure. My initial answer was that it shouldn't matter - this isn't a login credential, it's just a random string. Upon investigation I discovered this confidence was misplaced: in several browsers it's possible to reconstruct the internal state of the random number generator, and by asking for a few random numbers you can duplicate the generator and predict every number it will produce next. Mind blown.

I switched my random string generator over to [window.crypto](https://developer.mozilla.org/en-US/docs/Web/API/Window/crypto), a newer feature for generating cryptographically secure random numbers (there's a sketch of this at the end of this post). In production, this seems to work. When I find the cookie is missing or expired, I simply generate a new token and cookie and the requests succeed. While I struggled to reproduce the original retry-failure, I could mock up responses using [Charles proxy](https://www.charlesproxy.com/), and observed the drop in errors when my code was deployed (https://twitter.com/wongmjane/status/1151676723064295424). There was a spike in new errors, seemingly only from Xiaomi phones and tablets - they don't support window.crypto. I now let them fall back to the original retry logic. We'll still see some errors, but significantly fewer than before.

If you find a CSRF issue on the site, please do report it responsibly. The best place is [hackerone.com/twitter](https://hackerone.com/twitter), where there's also the chance of rewards and recognition.
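Here's a minimal sketch of the kind of clientside generator described above. The cookie name, header name and lifetimes are illustrative, not the real ones; where window.crypto isn't available it returns null so the caller can fall back to the retry path:

```
// A sketch only: names and lengths here are illustrative.
// Generates a random hex token with window.crypto and mirrors it into
// both a cookie and a request header (the double-submit pattern).
function generateCsrfToken(byteLength = 16) {
  if (!window.crypto || !window.crypto.getRandomValues) {
    return null; // no window.crypto: let the caller use the retry path
  }
  const bytes = new Uint8Array(byteLength);
  window.crypto.getRandomValues(bytes);
  return Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('');
}

function withCsrfToken(headers) {
  const token = generateCsrfToken();
  if (!token) return headers;
  document.cookie = `csrf_token=${token}; Path=/; Max-Age=21600`;
  return { ...headers, 'x-csrf-token': token };
}
```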


2018: What I did this year

My annual review of the past year.

One thing I said I needed to do last year was work on my shoulders. I get a lot of pain in my shoulders when I sleep. I can’t sleep on my right side because my right shoulder just starts hurting, I can’t sleep on my left side because my right shoulder starts hurting, so I’m sleeping on my back, which means I’m snoring (and sometimes my shoulder hurts). I looked into getting a new mattress. We’ve had this one since we got engaged about ten years ago, so we’re due. But we have a British bed, and sizes here are different from there, and there are quirky weird laws around exporting mattresses, so it’s problematic. Haven’t done it. So I opted for some physio instead. I got my shoulder MRI’ed, which showed nothing, and the doc recommended physio, but that’s too complicated to set up. So I’ve just been doing some pushups before bed each night. And sometimes lifting some small (12lb) weights. It has really helped! I can feel the effects when I lie down, and there’s no pain in the morning. I’m proud of myself for keeping that up all year.

Another thing I said I’d do is stop getting older. But, sure as clockwork, I turned 40, so that’s a fail. I’ve been trying to keep a little fitter, but I’m getting no younger. To celebrate the occasion, I invited the folks from work round to my house and we had an awkward social event in the sweltering 100F heat of Walnut Creek for my birthday. It was good.

Speaking of work, I’ve settled into being an IC. Coding is just much easier than working with people, and I need easy right now. I’ve got this feeling that I’d like to switch back into management at some point, but it’s not imminent. I’d rather improve my coding skills for now.

Last year I started off working on Settings. My thinking was that they’re the unglamorous part of the website, and as we rewrite and refactor the website they’re the most likely to be forgotten. I did a lot of work to assess what needed fixing, both from the API and the website, and to build out what was needed. The next big project was GDPR compliance, and the settings work turned out to be essential for that effort, so it turned out really well. The new API endpoints I’d made could be used by the native mobile clients as well, so now (finally) you can set all your settings on all the clients! I also worked on a privacy/security problem during this time, for which I also wrote a blog post on the twitter engineering blog. That was a fun project to get into and learn about.

For another GDPR project, I was asked to look at cookies across the company. As a long-time employee, I figured I could use my historical knowledge here. We had a huge log of 600k different cookies that were sent to Twitter, which needed classification and ownership. We managed to find owners and roll out the allowlist filter at the front of the service without any incident. It was surprisingly smooth.

A small thing I was proud to do last year was Text Size. As the team worked on various accessibility fixes, I realised that I was struggling to read the text on the site myself. My eyes are pretty good - I’m just getting old. So I added a text size option. I’m not sure how many other people are using it, but personally I find it invaluable. 🤓

Enough work. We took a month off and went to England!
We stayed with my parents in their new house in the sticks; we journeyed up to the cousins in Melton Mowbray; we took a short trip to Poland; we went to London to see the Harry Potter play; we went to Waldringfield to mess about in boats and camp in tents. It was a lot! Probably too much. It was only our third trip back since we moved here eight years ago, and the temptation to do everything is too strong. I’d like to do the Waldringfield part again, maybe just with Jack, because it’s a part of my childhood I really value.

We bought a new car. The old one was only five years old, but it was having battery issues and getting tetchy. The car had electrical issues from day one, and honestly I’m happy to see the back of it. We looked at different options and settled on something similar from a different manufacturer.

The biggest shock of last year was losing friends. I found out from tweets that Cindy had died. It was the kind of shock that just stops you dead. I couldn’t keep working - I just went home to tell my wife. We met Cindy through Matt when we first moved out to SF; they were incredibly friendly and helpful - we still use their leftover cutlery. We’ve not been in touch for years, so didn’t know she was sick. I still can’t quite come to terms with it.

And then we lost Dave Goody. He ran the coffee shop in Fort Mason, which we stopped at every day when Jack was a baby. The place was never too busy, so he always had time to give us a warm welcome and a chat. He marked Jack’s height on the wall as he grew. We’ll always remember him fondly.

And we also discovered that my favourite great aunt had terminal cancer too! What is happening? I thought this kind of thing wouldn’t hit for another 20 years or so. I thought we were still in the “everyone’s having babies” phase that follows the “everyone’s getting married” phase. I miss that phase.

Next year:

- take Jack back to London
- try some new things
- write some blog posts
- work out if nine years at Twitter is something I really want to do O_O


Blog Update

Ah, shit. It’s the end of May. I’d meant to write a blog post at around the end of last year, when I provide my typical end-of-year update. But I didn’t. I should do that.

I had this smart idea of trying one new thing every day for each month. I knew that one of those new things would be blogging, so I delayed 😱 my end-of-year post until then. But it hasn’t happened yet. The first month started well. I did Keto. The next month I needed something a little easier, so I got myself a watch and decided to wear a watch every day. The third month I was considering the blog posts, but I got really sick at the start - so I decided not to shave for the month, which required no effort. Negative effort, really. And then it was April, and I still wasn’t ready to blog, and I didn’t have the energy to do anything else either (sick kids and wife can really take it out of you), so all of a sudden it’s the end of May and I’ve got nowhere. Bah.

Topics I was considering:

- What I did this year (2018)
- Refactoring Twitter. Some things I’ve noticed over my 8+ years working there.
- Thoughts on the trend among internet companies to give you what they say is good for you, rather than what you want.
- A (probable) rant about how tests both set you free and hold you back.
- Some thoughts on customer service for small teams within large companies.
- Monorepos and source dependencies - a simple summary of what that means and why.
- Some thoughts on skills and expertise of leaders.

None of these are particularly well formed in my head, but they’ve been running around in there for a year or three. It’ll be good to set them down in type. I write this blog more to compose my own thoughts than to provide reading material for others, and I’ll continue to do that.

_Update:_ Ah shit, it’s August already. Time to publish this thing.


Microservices and websites

You build a startup quickly, scale the tech as far as it goes. You’ve got one codebase, and one scary-as-hell deploy. You’re known for failure at least as well as you’re known for your product. So you break up your service into microservices. Each can be deployed independently. Each can be owned by a team who understands the whole of the thing, and so the changes therein. Responsibility and accountability are delegated to those teams. It’s a proven pattern. And very successful.

At Twitter, we run dozens of microservices. We have macaw-users, serving the user information apis. We have macaw-tweets, serving tweets. We have macaw-timelines, serving timelines. It all makes a lot of sense. The tweets service can roll out a deploy, observe issues with their endpoints, roll back and fix - all without bothering the other services. This is an ideal way to scale an api. Your mobile apps can use this api. Maybe with JSON, or something more efficient. If an endpoint fails now, a single team is paged and the service recovers. Users might see a momentary error with some actions. The mobile app teams needn’t even know; they worry about their own code deploys, not the reliability of their apis.

A website is a different animal. One web page is the combined output of many of your apis, many of those services. And for speed of delivery, you are rendering those web pages on the server. For server-side rendered web pages, the output of your apis is combined inside your own datacenters, on your own servers. The microservice model starts to fail. If a service you depend upon is experiencing issues, your web pages see that issue and throw error pages themselves. Your service may fall below SLA before theirs does. Suddenly, you’re a single point of failure, and so primary oncall for all microservices. Farewell, sleep.

It gets worse. Since your service is inside the DC, you use a more efficient api layer with compile-time dependencies. When you compile your project, you need to build every other service. Of course your build system has caching, but what does that matter when every backend change forces a recompile on you?

It gets worse. The other services start to focus on mobile. They build business logic into the json formatter. It’s not as crazy as it sounds - of course they use shared objects for json, and checks at the edge are the simplest. But now you’re missing them on web. And, oh, resources are tight. Build it yourself or miss out.

It gets worse. Your integration test suite needs fixture data in code. But the mobile clients just depend on manual QA and json fixtures. Effective integration testing falls away, and deploys get even more scary. Development grinds to a halt.

This is where we were. Let me tell you where we went, and how we made things better.

We shipped a PWA. A progressive web app. The app is entirely rendered in JavaScript in the browser. We use the same apis as the mobile clients. We’re a mobile client. We can write just one language: JavaScript. We can write, deploy, test and be oncall for our own app, and stop worrying about others. We can use the new api formats and features as soon as they’re ready. We did it. Or did we?

Another route we could have taken is to rethink this from the perspective of the user. To the user it doesn’t matter whether an api call failed in the client or the server. A failure is catchable and can be well handled. We had an SLA on the server because we could easily measure it. Should that be different on the client, where we can’t?
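To make “catchable and well handled” concrete, here’s a minimal sketch of the idea: each api a page depends on is fetched and rendered independently, so one failing dependency degrades its own section instead of producing an error page. The endpoints and markup here are hypothetical, and the same pattern applies whether the combining happens in the browser or on the server.

```
// A sketch only: hypothetical endpoints and trivial rendering. The point is
// that each dependency can fail on its own without failing the whole page.
async function loadSection(url, render, fallback) {
  try {
    const response = await fetch(url);
    if (!response.ok) throw new Error(`status ${response.status}`);
    return render(await response.json());
  } catch (err) {
    return fallback(err); // a degraded section, not an error page
  }
}

async function renderHomePage() {
  const [user, timeline] = await Promise.all([
    loadSection('/api/user', (d) => `<h2>${d.name}</h2>`,
                () => ''),
    loadSection('/api/timeline', (d) => `<ul>${d.tweets.length} tweets</ul>`,
                () => '<p>Timeline is unavailable right now.</p>'),
  ]);
  return user + timeline;
}
```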
There’s no doubt in my mind that our service could have been coded more defensively, mitigating the vast majority of pages and alerts. So why did we not?

Partly because our service isn’t built on a robust web framework. It’s a migration of our ruby-on-rails system to Scala, which was the only practical approach given the complexity of migrating while operating at scale. The framework is optimised towards api development, which again makes sense, as most services are apis. It’s logical that our graphs, monitoring and alerts are also configured this way. APIs are not websites.

What’s more, it’s because websites are built with static assets (JavaScript, CSS) that don’t split well along service-based concerns. Our website could not be fragmented into micro web services without some mismatch in the asset versions delivered by the various services as they deploy.

And finally, it’s because web engineers have a tricky enough job writing good quality, modular, scalable, robust, crossbrowser user experiences. We should be allowed our focus, and not have to worry about the intricacies of server code in a foreign language. Try hiring a CSS developer with Scala skills. I’m sure Google has some. Good luck.

So we went PWA and we’re not going back. Was it necessary? Maybe, maybe not. Was it worth it? Hell yes. I shall sleep well tonight.
