Corwin’s Tenth Law

Any sufficiently complicated distributed system program contains an ad-hoc, informally-specified, bug-ridden, slow implementation of half of Raft.

With apologies to Philip Greenspun.

Posted in computer science, jokes, software | Leave a comment

Against Flowers

One of my deepest, oldest, most cherished and firmly-held feminist beliefs is that flowers are fucking bullshit.

Flowers are a tool of the patriarchy, to render men bumbling, useless fools and women placid receivers of meaningless tokens. Consider the transaction in its standard societal narrative: a man has need of giving a gift to his female romantic partner; usually either because he behaved badly or because some trite Hallmark holiday has arrived, though sometimes “just because.” What does this man get his partner? Flowers. This is a function that takes two people (a man and a woman) and ignores both of them! There is no thought. There is no generosity of spirit. There is no investment of time, or consideration, or even finding a fucking “this season’s hottest anodyne gifts” blog post. Flowers. He buys flowers.

Next what happens: The woman receives the flowers. She is very pleased that, after forgetting to pick the kids up from soccer for 6 hours, or whatever, her partner has decided to apologize via a boring script that requires her to, firstly, accept the apology, and, secondly, take these dead plants, put them in a vase, fill it with water, and find a nice decorative place to put them. Now, after giving this “gift” that cost the man $15 and the woman a bunch of labour she didn’t ask for, she gets to stare at the limp token of apology with no true remorse or culpability until it dies a few days or weeks later. Who got the better end of this deal? The man who is allowed to follow a socially-ordained script to get out of anything, or the woman who has to keep these things in her home, because throwing out these unasked-for tokens of a previous row before they rot in her living room would surely lead to yet another?

I admit, sometimes a man buys a woman flowers merely because it is Mother’s Day, or Valentine’s Day, or whatever. In those cases, you can elide the various narratives about remembered pain of the previous paragraph: and yet, the woman here has not received any affirmation of who she is, or what she wants, beyond “women like pretty things, and flowers are female-coded, so here’s some dead plants, make our house prettier for me, won’t you?” This idea that women — all women — should adopt the standard of “flowers are an acceptable gift for nearly all occasions” is an affront to the infinite varieties of women, and of womanhood, and of fucking people in general.

It is, in fact, also an affront to men! The idea that men are so stupid, so bumbling, as to be incapable of actual thoughtful gift-giving, is offensive! Men can be perfectly good at giving gifts and apologizing (and picking up the kids when they were supposed to!) as anyone else. Or, perhaps I should say, they could be, if we didn’t have this fanciful, delusional narrative permeating our culture, and our rom-coms, and — well, to say it’s “permeating” Hallmark would be generous. Presumably you know something more about your partner than “she is female.” Buy her a book! Buy her movie tickets! Buy her some sweet new shoes! Buy her some flowers if and only if she, herself, personally, specifically and actively likes flowers, goddammit!

And if you must stoop to giving flowers for all occasions, at least put them in the vase yourself. It’s the least you can do.

Posted in dating | Leave a comment

You Are Extremely Bad At Thinking About Tragedy

Today, some asshole killed a shitton of people in Las Vegas. Fifty eight deaths, at time of writing. This sucks. Like, 58 deaths is a lot, and this is a horrifying tragedy worthy of news coverage.

Let’s talk about some reaction-clusters to this that I’ve been seeing on Twitter. First off, we have the gun control cluster. This is where conservatives say “don’t talk to us about gun control today, give us time to grieve,” and liberals point out that we have a mass shooting literally every day in America and so no, we won’t, we need better gun control laws. You can rest assured that tomorrow’s mass shooting will have fewer casualties and thus receive less media attention. Something else bad, or funny, or weird will happen and everyone will yell about that and not talk or think about gun laws until there’s a media-worthy massacre a couple months from now, we’ll recite the same positions and come away with nothing but a reinforced loathing for the “other side.”

Next, we have the terrorism cluster. This is where liberals say “if this dude weren’t white, he’d be called a terrorist!” and draw comparisons to headlines about white shooters vs black victims. Conservatives seem to mostly say “this is a gauche reaction to a tragedy,” or point out that non-ideologically-motivated murder doesn’t quite count as terrorism, or whatever. In any case, tomorrow everyone will stop worrying so much about who is and isn’t a terrorist, because something else bad or funny will happen, and we all come away with nothing but a reinforced loathing for the “other side.”

We also have the thoughts-and-prayers cluster. This one is conservatives retweeting politicians who say “our thoughts and prayers go out to <latest victim>” or send out their warmest condolences. Liberals fire back about how thoughts and prayers don’t help anything, and politicians should be increasing gun control laws if they actually think and pray about the tragedy. I hope you know how this paragraph ends.

 

My point here, insofar as I have one, is that none of this is principled, meaningful discussion, nor is any of it meant to fix things. This is just a hysterical reaction to a flashy event. Yesterday everyone was having these same reactions to Puerto Rico, but now something bad happened and we have to shout about Las Vegas instead. Yesterday it was, “if you care about the tragedy in Puerto Rico, maybe work on climate change?” Tomorrow it will be “if you really care about the tragedy in <new area>, maybe work on <arguable source problem>?” And I’m sure everyone genuinely feels like they’re right about this, but today we care an awful lot more about gun control than we do about climate change, and that just doesn’t make sense if what’s happening is a genuine, principled desire to reduce death and suffering.


 

In 2015, there were 96 vehicle deaths per day. The CDC reports 121 suicide deaths per day in 2016, and that’s far from the worst killer on the list. About 1500 people die from malaria in Africa, each day. If you care about improving the world, about reducing suffering and unnecessary deaths, you have to think about the actual numbers, the actual causes of death. We should be willing to accept a Las Vegas shooting-level incident every single day if we can cut the national suicide rate in half. We should be just about willing to trade off one Pulse shooting a day in return for halving the vehicle deaths statistic. And if we can halve the impact of malaria in Africa, most victims of which are children, we should be willing to accept quite a large number of Sandy Hooks.

It’s important to have advocates for even smaller causes. Radically restricting gun access in America is probably low hanging fruit that could save a lot of lives with relatively light investment. Investments in self-driving car technology are likely to reduce vehicle deaths; this is kind of a long shot, but it’s good that someone’s working on it. Terrorism has a pretty low body count, at least in America, but it has other negative effects and deserves people working to curb it (I’m not going to speculate on methods here). Heroin overdoses kill around 40 people a day, which is plenty to be worthy of attention.

But unless you’re laser-focused on one of these smallish causes, or have extra leverage towards it, or can only motivate yourself to speak up/donate/work towards improving things you care about personally, they’re not what’s important. You should be advocating for more cancer research, for more malaria prevention, for less cigarette smoking, for the big causes. Stopping to worry about a few people murdered by an asshole with a gun is time not spent worrying about the things that cause extraordinary, large-scale death and suffering. Donate your time and money and advocacy for the important causes, or else accept that you’re experiencing an emotional reaction wrought in you by the media, and don’t act like your response to tragedy is principled.


 

Please don’t search for a way to read this as me being callous or heartless. I am furiously angered by people being unable to deal with the world’s actual, horrifying tragedy and instead myopically focusing on flashy tragedy with emotional pull to it. In absolute terms, malaria is more fucking important than gun control, because more people die because of it. I wish very much that there were not mass shootings, and I am very sorry for the people affected by the Las Vegas shooting. But I am very sorry for the many, many more people who will die of other causes today, as well, and ultimately we have to allocate our resources where they can do the most good.

Posted in morality, politics, things i will regret | Leave a comment

Creating a Sentient Mind Can’t Be That Hard

It just can’t. A conscious mind might be harder, but sentience is just a very low bar. It seems to me that there has to be some insight we’re missing, because, empirically, the problem cannot be particularly hard.

What’s the least-complicated (or least-intelligent, or least-conscious, or whatever) animal that is sentient? Certainly my cat is sentient. Rats and mice and so on seem obviously sentient as well. I’m not an animal expert, so let’s go with rats, and if you know more than I do, substitute a simpler animal if you can.

So rats are sentient. Think about that. Every single rat is sentient. Every single rat going back tens of thousands of years has been sentient. That’s a shitton of different sentient brains, none of them particularly complex! That’s billions of slightly different arrangements of neurons, some for the better and some for the worse. That’s thousands of rats that we poke in the brain for Science™ that remain sentient. Smart rats, stupid rats, diseased rats, rats with birth defects, rats suffering neurotoxicity, malnourished rats, overfed rats, and on and on. All those brains, assembled sloppily from wetware, every last one sentient.

I assume you accept that this at the very least establishes that sentience is a goddamn robust property. If you take a sentient brain and make any random smallish perturbation to it or to its design (in the case of genetic blips), it remains a sentient brain (or design therefor). Which implies that once we’ve got our hands on a brain design, designing new, different brains will be extremely easy (although directing that design towards useful ends may prove to be difficult).

I aver that this also establishes that sentience is a low bar. Sentience doesn’t require anything particularly complicated, and it is extraordinarily robust to errors. If we imagine “things [like neurons] interacting with each other using signals [like neurotransmitters, or like electrons]”-space, the sub-section of that space that results in a conscious mind-thing is large. Mice have something like 71,000,000 neurons and some elephants have like 257,000,000,000 for some goddamn reason, and it seems extraordinarily unlikely that adding neurons would eliminate consciousness (most added neurons will simply be no-ops, they won’t break anything), so we have a lower bound of roughly 70 million neurons for a brain. And way less than that if you accept that a frog (16,000,000), or a shrew (36,000,000), is sentient. And lower still if you accept that damaged individuals with fewer neurons are sentient. And lower than that if you think that many neurons are not critical to sentience (which also seems obvious; how hard do you have to poke a brain before it becomes non-sentient? Pretty fucking hard, it turns out.*)

So, okay, let’s say… 20 million neurons is enough to create a brain. Sure, that’s gotta be deeply inconvenient to simulate in software, but it’s not that much! We should have figured something out about this now, because there’s billions of mice and rats and so on out there, and they each have their 20-70million neurons in substantially different configurations and they’re all goddamn sentient. So there’s gotta be some kind of core property that’s extremely easy to reproduce and extremely hard to remove that is a subset of all these different animals. It has to be a fairly small core just because the number of neurons is not that big, in absolute terms, and you can fuck it up so much before sentience disappears.

 

WHAT ARE WE MISSING?!

 

* I’m aware that this is a deeply misleading comparison. If you want an actual justification, I would argue that you can probably start chopping out bits of brains responsible for things like ‘automatic breathing’ and ‘muscular control’ and retain sentience. If you can’t, then… what the fuck is even going on in brains?!

Posted in computer science | 1 Comment

Why The Forum Was Better In The Old Days

You may have noticed, if you’ve participated in online communities or forums*, that the old guard always thinks the forum used to be better, and simultaneously the newer members often think that it’s pretty great as it is now. I propose that this is a necessary consequence of having a community.

The Beginning

You want to start a forum or blog or whatever to talk about something that’s interesting to you and doesn’t have a lot of interesting discussion elsewhere that you can find. Let’s say, for the sake of argument, the topic you’re really interested in is Shakespeare’s poetry (and, again for the sake of argument, let’s pretend there aren’t a thousand other places that people like to discuss this). So you start up a blog and you post really interesting thoughts about the Sonnets, and you engage with the academic criticism, and so on. People start noticing that you’re insightful and knowledgeable, and you start getting regular readers commenting on your posts, saying things like, “hey, that was really interesting – it reminded me of this paper I read recently, what do you think?” and in general a conversation is started and a community begins to form.

Then one day, one of your community members links to a blog post they found interesting about Edmund Spenser. Spenser is also an important part of the canon of 16th century English poetry, so you let it slide, and people start talking about Spenser. They bring up interesting points and are friendly with each other, so all is well.

The Spiral

Unfortunately, these people were too interesting, and now you have a regular “Spenser thread” commented on each of your new posts where people congregate to discuss The Faerie Queene or whatever. This isn’t the end of the world — most of the people in your community are still mostly discussing Shakespeare, after all, and if they’re getting to know each others’ tastes a bit and finding more common interests to discuss, isn’t that a good thing?

Before you know it, people who are only interested in Spenser are popping by your comment threads, and they’ve only ever read Hamlet and none of the sonnets! You’re a little miffed by this, but your Shakespeare-focused fans who also like Spenser think they’re adding a lot to the conversations, so you let it slide yet again.

And so on. You get someone talking about John Donne, and you get threads derailed by conversations on T.S. Eliot’s criticism of Hamlet, and then someone starts talking about how great Eliot’s poetry was, and suddenly your community isn’t about Shakespeare anymore! It’s a bunch of relatively like-minded people just talking about poetry in general.

And There We Have It

Now the people who joined at the beginning, who are truly fanatical about Shakespeare, start bemoaning that there’s hardly any good discussion of Henry IV anymore, and who gives a shit about Rembrandt or whatever has come up the last two weeks, and why is this community such a shithole these days?

And the answer is: it’s become a community, and communities are tied together by friendship and conversation and humour and so on, not just shared devotion to a particular topic. The small number of people who really just want to discuss Shakespeare constantly are massively outnumbered by the large number of people who do genuinely love Shakespeare but want to talk about lots of other stuff, too, with other people who “get” them.

So? What’s wrong with communities?

Nothing! But I think this pattern is approximately inevitable unless you take a very strict stance from day one in terms of what people can discuss in your forum, and then you end up with a very small group of people who don’t feel especially connected to one another. I think your choices are basically either to have a small, focused group that is extremely insular and not socially connected, or to have a sprawling, loosely-connected group of people that just starts to feel like any regular community over time.

 

* I imagine this occurs in meatspace communities, too, but I’m going to stick to online ones for this post.

Posted in sociology, Uncategorized | Leave a comment

Rap Quines

A Quine is a form of poetry in which a poem, through formalism or sometimes more literally, states that it is going to state part of itself, and then does so. This is related to the modern programming art form of a quine, which is a computer program that outputs itself when run.

Rap in particular offers some notable quines and quine-forms, some of which I would like to present today. The simplest and perhaps most illuminating example is the following, a construction taken from a not-entirely-respectful caricature of rap to its obvious conclusion; that is:

My name is Alexander, and I’m here to say:

“My name is Alexander, and I’m here to say!”

In this short rap verse, line 1 is a promise to the audience that the narrator has something to say; in line 2 the tension is momentarily broken by revealing what it is Alexander set out to say. However, then we wonder: has Alexander indeed said, “My name is Alexander and I’m here to say,” or is this a promise of future effusion? But we think back to the first line, and realize the narrator has indeed said what he promised to, in addition to promising to say it. Thus the quine is complete, and the effect is one of whiplash between future promises and their already-past satisfaction.

We will scrutinize one more example, this one from popular culture: Taio Cruz’s spectacular self-referential and obviously heavily quine-influenced song, “Dynamite.” In it, Cruz masterfully intertwines promises to act or explain with both previous and past references to performing that act or giving the promised explanation. To start with, the song opens with the following:

I throw my hands up in the air sometimes,

Saying AYO! Gotta let go!

This is already a complex lyrical dance. We have what begins as a simple opening clause: “I throw my hands up in the air,” a relatively standard gesture of celebration. This immediately comes down with the addition of the word “sometimes” — is Cruz throwing his hands up now, and shouting the declaration “gotta let go,” or merely letting us know that he sometimes does? Is this describing an action, or foreshadowing later revelations?

To skip ahead somewhat (Cruz did need to get this masterful formalist poetry on the airwaves after all; some of it is straightforward), we find the following in the chorus:

‘Cause we gon’ rock this club, we gon’ go all night,
We gon’ light it up like it’s dynamite!
‘Cause I told you once, now I told you twice,
We gon’ light it up like it’s dynamite!

The chorus on its own is very nearly a quine! The first line is somewhat out of place without the context of the rest of the song serving as setting, but otherwise this holds its own amongst historical quines. It inverts the normal poetical structure of a quine somewhat: normally, as in the first example we saw, a quine explains what it’s going to say before saying it (or simultaneously). In this case, however, we have a symmetry around line 3, reminiscent of classic presentation advice: first Cruz says what he’s going to say, then he explains very clearly what’s up next: he’s going to repeat himself. And he faithfully does.

Interestingly, this refrain is repeated word for word later in the song. Cruz is trying to get us to carefully evaluate the number of times he’s told us we’re going to “light it up like it’s dynamite,” by saying it for the third time, then claiming he “told [us] once, now [he’ll tell us] twice”, but actually following up by telling us for the fourth time! In one sense, this is inaccurate, but since when is poesy supposed to be factual? He is both telling us another two times, and at the same time in a sense resetting us to the earlier appearance of the lines, reminding us of the cyclical, symmetric nature of his proclamations.

I don’t want to ruin the fun of analyzing all the depth in this poem for you, but let me leave you with a final observation. Harken back to that ambiguity from the introduction: is Cruz declaring that he is raising his hands, or suggesting his intention to, at a later point? While I cannot claim a final answer to this enigma, I will say that whatever his earlier status was, he surely raises his hands during the triumphant bridge:

I’m gonna put my hands in the air!
Hands, hands in the air!
Put your hands in the air!

This is, while not the end of the song, the moment where the potential energy set up at the beginning (and repeated before each chorus) is definitively resolved, allowing the listener to release her breath. Cruz insists repeatedly that he “sometimes” puts his hands in the air. Here, though, it is realized: in another masterful use of three-line symmetry, Cruz is first reaching tentatively, as if unsure of himself, to raise his own hands. In the middle line, his hands are clearly in the air, and in the final line he turns it around on us: now you, too, must put your hands on the air and join in the aura of celebration.

It is worth noting that, enhancing the reflection about the middle line here, the middle line itself can be read as a part with either of the others: “I’m gonna put my hands in the air/ [my] hands, [my] hands [are] in the air!” reads just as naturally as “Hands, hands in the air!” being in the imperative voice, the way a police officer might say it, which aligns with the unambiguous exultation of “put your hands in the air!”

The allusion to police language in a rap song by a black man could not possibly be accidental, but I will leave it to the interested reader to look further into it.

 

 

Posted in jokes | Leave a comment

On Trolley Problems

If you choose not to decide, you still have made a choice

There’s a philosophical question that people seem to struggle with, commonly called the trolley problem, that I never really got the difficulty of until fairly recently. The original form, the one I was first introduced to, is formulated like so:

You are on a train, and coming up on a junction. The tracks are currently set such that the course the train will follow will run over 5 people tied to the tracks. However, you are able to switch the tracks to a side channel. This alternative course will run over 1 person tied to the tracks. Do you pull the lever to change to the alternate route?

This is excruciatingly clear to me: of course you change routes. The problem reduces to “is it better to kill one person or five?” which has a pretty obvious answer. And so I never understood why there’s any struggle over it. (Incidentally, I am deeply suspicious of philosophers finding ways to argue about nonsense, which perhaps biased me to miss important aspects of the question).

There are extensions of the trolley problem that all retain the basic form: should you kill one person, or kill five? One of these I really struggled with, which is a formation like the following:

A train is barreling, undivertably, toward five people tied on the tracks. You are on an overpass that looks over the tracks. You are standing next to an impossibly fat person who, if you pushed them off the ledge in front of the train, will slow it down sufficiently to save the five trapped people, but at the cost of the fat person’s life.

The question — and, thus, the answer — are the same: five people are worth more than one person, all else being equal (and despite the weird contrivances to make this situation seem vaguely plausible, we assume all else is equal). But this one made me recoil and struggle with my own understanding of the problem: imagining pushing someone over the tracks to save five lives makes me viscerally uncomfortable.

This brewed at the back of my mind for some time: the well-understood belief that killing the one to save the five is the correct course of action, alongside the deeply held feeling that it was wrong. Feeling so strongly repulsed by an action you thoroughly believe is morally correct is an uncomfortable thing, and so I tried to understand the disconnect here: surely there is something I am missing (I notice I am confused).

I had a few ideas about this — perhaps the net decrease in happiness caused by the pushing outweighs the net increase of +4 lives, perhaps I am using as a crutch some pre-cached idea that pushing people off of bridges is never the right answer, etc — but they all felt somewhat hollow.

Recently I’ve come to what I think is a more insightful explanation for the discomfort. I think that, as a matter of human nature, a bug in our programming (be it social or neural; irrelevant here), humans categorize “do something” and “do nothing” separately, which is completely factually incorrect.

The idea of “no action” as being a thing a person can take is actually incoherent. You are at all times doing something; what is meant by “do nothing” has to be in reference to a particular situation. I can choose to do nothing about a scenario (not pushing a person off a bridge) but in that case I am merely doing something else (standing around feeling unhappy) instead.

Inaction towards a particular scenario is not a different category from action. It is merely a human perception of inertia, or of some “default.” If I am in the “fat person” trolley scenario, I have two choices: push a person off a bridge, or stand idly by. These choices have to be considered as fully equal in terms of agency. And it is only by incorrectly considering one choice as “nothing” and the other choice as “intervening,” or similar, that I can even start to absolve myself of the moral guilt of the murder of four people.

 

When choosing your actions, do not forget that inaction is a choice on the exact same level as any other choice. When you start to treat inaction as a separate category, you wind up in scenarios that lead to such famous quotes as “The only thing necessary for the triumph of evil is for good men to do nothing” and “then they came for the Jews, and I did not speak out— because I was not a Jew.”

Posted in morality, philosophy | Leave a comment

ORMs model databases, databases don’t model objects

I was discussing the merits of a couple different ways to abstract some data common to multiple models at work, and my coworkers asked some questions that forced me to articulate exactly why I have such strong opposition to things like “abstract models.” My conclusion, basically, is that I think ORMs serve as models of the database, not as arbitrary app-level objects, and should be designed as such.

Let’s make up some models here to demonstrate what was going on. We start with a model for cats, of course, called Cat, of course, that looks like this*:

class Cat(Model):
    # cat-specific
    fur = ChoiceField('short', 'medium', 'long')
    breed = CharField()
    # vitals
    heartrate = IntegerField()
    core_temp = FloatField()

but now we’ve decided we want to model frogs, too (god knows why!).

We have a couple different ways of modeling this. Coming at it from the code viewpoint, you might have something like this:

class AnimalMixin(object):
    heartrate = Integerfield()
    core_temp = FloatField()

class Cat(Model, AnimalMixin):
    fur = ...

class Frog(Model, AnimalMixin):
    jump_height = ...

This is the kind of thing that your average OO programmer would reach for (well, perhaps AnimalMixin would merely be a base class called Animal) in app code. And, in that context, it’s probably even defensible!** But this is an extremely code-centric view of things that basically ignores the database. What happens in this case? What tables and columns get created?

Well, it sort of depends. In Django, anyway, there’s a couple ways to do table inheritance: there’s a way where the base class just adds fields to all inherited classes; i.e. you would have a Cat table with heartrate and core_temp columns, and you would have a Frog table with heartrate and core_temp columns. The other way is to do a bunch of joins; i.e. you have a table with heartrate and core_temp columns, and your Cat class has a foreign key back to that table, but in the code you just have access to cat.heartrate and it doesn’t feel like a join.

These two options both have significant drawbacks: the first duplicates columns across multiple tables, and the second turns innocent-looking field access in your code into joins! It is highly non-obvious from reading the code whether an attribute access is going to reference a column of the object (fast, already in memory) or a join to a different table (slow, goes to the DB unless prefetched).

I always sort of flinch when I see these approaches used or recommended, but I’ve never really clarified my thinking around why until today. The issue is that in both cases you’re trying to hide the database from the programmer, and that is not a good thing! Don’t get me wrong, there’s lots of aspects of the database I want hidden: connection strings spring to mind, spelling out long JOIN statements, etc, but the fundamental structure of the tables is extremely relevant to thinking about your application.

So enter the third way of doing this, which gives you slightly-less-convenient app code in exchange for significantly-more-obvious database patterns:

class Vitals(Model):
    heartrate = IntegerField()
    core_temp = FloatField()

class Cat(Model):
    vitals = ForeignKey('Vitals')
    fur = ...

class Frog(Model):
    vitals = ForeignKey('Vitals')
    jump_height = ...

Now what’s a join vs what’s a field is obvious at first glance. If you call my_frog.vitals.heartrate you know you’re going through a foreign key, and if you just access my_frog.jump_height, it’s obvious that was an existing field. Obvious to someone who has some experience with Django, anyway.

I think the difference is which you think of first when you’re coming up with persistent models: your database and the data in it, or your app code and the operations of that data. I give the data primacy, and I think the code should model that. The alternative approach, where the code should be maximally convenient and the database should be molded to make it work, makes me frustrated. Give me nice, clean, well-organized data and I’ll swallow a lot of code duplication for it.

The ORM is in service of the database, not the other way around, and don’t you forget it.

* All examples made up and not checked for correctness.

** I find inheritance loathsome. Do not @ me.

Posted in computer science, software | Leave a comment

How To DDoS Your Own Website With Python

Have you ever thought, “huh, I wonder how easily I could DDoS myself despite all the hard work of library and framework authors?” In that case, have I got news for you!

In the vein of Uses and Abuses Of Python’s Excessive Dynamism, I have found a way to DDoS yourself via hash collision in a measly 3-ish lines of python!

Background

Since I am an unbearable pedant and this blog post would be about 10 lines long if I didn’t write a bunch of unnecessary exposition, I am going to start out with a brief explanation of hash tables/sets!

Alternatively, you could just read Wikipedia’s explanation.

A hash table is a data structure that is commonly used to implement maps, which you learned about last time. It’s also used to implement sets, which are like maps except, instead mapping keys to values, they just store a bunch of values.

The basic way a hash table works is that we start out with an array of some size, and we consider each index of the array a “bucket.” Then, each time we insert a key (or value, in the case of a set), we take a hash of the key and use that to determine what bucket to insert it in. How a hash is calculated is not important (especially not for this post, as you’ll see!), but it is a deterministic one-way function from any object to an integer. Deterministic means if I ask you for the hash of an object multiple times, it must return the same value, or everything will be ruined for ever.

Once we have the hash, we find the value of that hash modulo the length of the array. For example, if we were working with a backing array of length 10, and our hash was 1238, we would select bucket 8. Then we find what bucket 8 points at, which is generally a linked-list of values of whatever type we’re putting in.

Okay, back up, for concreteness, let’s assume we’re looking at a hash set (not table) of strings. So our backing array is a length-ten array of linked-lists of strings.

That’s better. So we’re at the linked-list that bucket 8 of our array points to. Now we iterate through this list, and make sure that the string we’re putting in isn’t already in there. This means that our string has to be compared with every other string we’ve already put into bucket 8. So every string for which our hash function ends in an 8 will go in this bucket, and we have to compare against each of those every time we add a new string with hash ending in 8.

Now this sounds somewhat scary, right? The whole point of hash sets is that they have O(1) (that is, constant time, also known as “screaming fast”) insertion time and membership test (i.e. “is this string already in my set). But if I have a gazillion strings that hash to something that is 8 mod 10, I have to do a gazillion string comparisons to verify whether or not this particular string that goes in this bucket is, in fact, new or not!

All that is to say that if your hash function has “bad” behaviour (where bad behaviour means hashing a lot of different objects to the same bucket) and/or your backing array is too small for the size of the set you’re working with, all this stuff you expect to be screaming fast will be super slow instead, which is just the worst if you’re a developer. There of course has been a lot of work done by serious computer scientists to come up with great hashing algorithms and heuristics for when it’s time to resize your array to accommodate more objects (strings, in our case), but nothing is stopping you from defining a hash function that is super terrible.

Okay, can we ruin everything now?

Here’s a great way to ruin everything:

class Foo:
  def __hash__(self):
    return 1

Now every time you try to add a new Foo object to a hash set, you’re gonna get them all in the same bucket (this is what we call a hash collision). So just throw a ton of those into a set every time your website loads and your customers will all hate you.

But wait… we can do better than that, right? If we can only make our own objects degenerate, we’ll have to go out of our way to slow our website to a crawl. Couldn’t we improve that somehow? Couldn’t we make our website terrible even in the face of all the hard work computer scientists and framework authors have done to keep things running smoothly?

Of course we can! This is Python! With adequate cleverness, we can ruin anything!

Just stick something like this into your codebase:

  for name, thing in globals().items():
    if isinstance(thing, type):
      thing.__hash__ = lambda self: 1

Now all user-defined types will have horrible hashing algorithms, guaranteeing that anywhere they’re used in sets or as keys in a set, every insertion and membership test will be O(n) instead of O(1)! Take that, helpful, thoughtful library authors!

Another terrible option

Try this one on for size:

class Foo:
  def __hash__(self):
    return 1
f = Foo()
s = {f}
print(f in s)
Foo.__hash__ = lambda self: 2
print(f in s)

You can turn every dict and set in your codebase into a liar in one swell foop!

Merry self-destruction, everyone.

Unfortunate addendum

You can run something like __builtins__.hash = lambda *args: 1, but it does not appear to affect the algorithm that’s used to hash strings. This is presumably some implementation detail of Python internals that we can’t mess with. str.__hash__ = lambda *args: 1 throws an error. Sadly, we cannot ruin the most common type of map, string -> x, so easily 😦

Posted in computer science, pedantry, software | Leave a comment

Common Data Structures You Already Secretly Understand

Let’s talk about data structures that have close analogies to things you already do or know! This post will not be educational or entertaining, I promise. Enjoy!

Linked Lists

What is it?

A linked list is a way of storing related information consecutively, such that you can only navigate forwards.

Where do I already use it?

Most people store the alphabet in their heads this way! You can easily recite the alphabet forwards (it’s even got a catchy tune), but few people can fluently recite the alphabet backwards. That’s exactly what a linked list is like: going forward is easy, going backwards is hard.

Variations

If you can cite the alphabet backwards, you probably remember it analogously to what we call a doubly-linked list. It’s easy to go forwards and backwards (quick, what letter comes before v?) but it’s much more difficult to tell me what the nth letter is without counting (quick, what is the 17th letter?)

There’s a fairly uncommon but cool data structure called a “skip list,” too. This is basically a linked list, but where you don’t always want to navigate through every element to move forward because it’s a lot of elements. So you have your linked list, but then you also have a linked list that skips bunches of the list. The only example that’s coming to mind for me here is learning to speedrun a video game: You have a linked list of what you’d call the “route,” e.g. the order in which you go from place to place doing stuff. Then, at each place you go, there’s a bunch of smaller steps to do said stuff. So here you have the base linked list which is very detailed and tedious (e.g. walk 3 squares left, slash your sword then move down, etc etc) but you have a list on top of that that’s much coarser, and you can sort of go up and down between the two. So you’d think of it like: go to dungeon 1 (skip list), collect item 2 (skip list), (now drop down to the more detailed linked list) then move down two squares before wiggling away from some enemy. If this explanation makes no sense, don’t worry! I told you this wouldn’t be educational or entertaining, didn’t I?

Array

What is it?

An array is a lot like a linked list in these abstract, analogy-driven terms: it’s easy to navigate forward. It’s also easy to navigate backwards (like a doubly-linked list). However, it’s also very easy to access the nth element.

Where do I already use it?

Probably in how you think of months! You can tell me the months in order, and you could go backwards (it might be a bit stilted but you wouldn’t have any real trouble), but you can also probably tell me offhand exactly what the 5th month is.

You probably also think of days of the week this way, with some possible complications (e.g. is the first day of the week sunday or monday?). Although that’s hard to really say; 7 elements is tiny.

Map/Dictionary/Hash

These are all terms that usually refer to the same thing. Sometimes they hint at underlying implementations, which are far too educational for this blog.

What is it?

It’s a lot like what it sounds like: a mapping from something to something else. A mapping allows very fast access to any value based on its key. It’s not very good at going backwards, though!

Where do I already use it?

Probably something similar is happening all over your brain! But I can’t convince myself any of that is sufficiently map-like, because cognition is basically a black box.

Some more concrete examples: if I ask you to tell me what the word “alopecoid” means, you can go look it up in a dictionary easily*. But if I ask you to tell me a word that means “like a fox,” that’s a lot harder! Maybe you come up with “vulpine,” and I say, oh, no, that’s not the one I was thinking of, can you give me another? There’s no way get to a word based on its definition besides reading the whole dictionary, so you’re out of luck.

Tree

What is it?

A tree is a way of storing hierarchical data; that is, there is some “root” to the tree that has some number of “children,” each of which can have its own children (or not), etc. It’s useful for lots of stuff: it can make various searching procedures faster, it’s (generally) quite fast for inserting new values while maintaining sortedness, it’s the only sensible way of representing certain kinds of data at all.

Where do I already use it?

Company hierarchies work like this! There’s some big boss CEO at the top, who has some number of direct reports, and they have direct reports, etc. Presumably you know at least a substantial portion of the tree that includes you: you should know your boss, and probably their boss, and on up. You should also know all your direct reports, should you have any. But who cares about the hierarchy in departments you’re not in? I sure don’t!

Another example is family trees. Given a sufficiently correct family tree, you can go up and down and left and right to define relationships! For example, instead of your “boss” being above you, it would be your parents. Your children are your… well, children (one of those references is to human children, of course; either will do). If you go up a level and over a level, you get an aunt or an uncle**; if you go up a level, right a level, up another level (but not to the same place you’d get to going up two from the start!) and then down some more, you’ll end up with a 9th step-cousin π times removed (I think).

Graph

What is it?

A graph is a lot like a tree, except the connections between elements don’t have to go straight up and down***. They can go anywhere! Anything can be connected to anything else!!

Where do I already use it?

This is how Facebook works! I’m friends with a bunch of people, and they’re friends with a bunch of people, and some of those people are friends with me, etc, etc. Just lots and lots of people and lots of connections between them.

For an example that might make it more clear why graphs are interesting, consider playing the “Six degrees of separation” game, where you try to figure out how you’re connected to <cool celebrity X>. You start at yourself, and then you figure out the next connection: maybe your mom knew someone who grew up in X’s hometown. Then from there, you find out that this person knew X’s childhood barber. Now you’ve gotten there in just four jumps: you -> your mom -> person from X’s hometown -> X’s childhood barber -> X! This is a useful (if somewhat silly) way to use a graph.

Variations

Some graphs are what we call “Directed Acyclic Graphs,” or DAGs. These are an important subset of all graphs, with some restrictions on the connections between elements: the connections have a direction (such as parent-child or boss-employee) and there are no cycles (that is, if we have people A, B, and C, and we have that A is B’s boss, and B is C’s boss, we know for sure that C can’t be A’s boss, because that would form the cycle A->B->C->A).

Do you play video games (or maybe pen and paper RPGs)? If so, the “skill tree” you may have encountered in that setting are named pretty appropriately! There’s a “base skill” (or more than one, in which case you can think of the true root of the tree being a skill you have from the beginning), and after you’ve obtained that maybe you get a couple options for new skills. But sometimes you need more than one prerequisite — these skill “trees” should really be called skill DAGs! In this example, the direction is a “prerequisite” relationship, and there can’t be any cycles. If there were a cycle in a skill DAG, all the skills in that cycle would be unlearnable, since they all depend on each other (directly or indirectly).

Maybe you went to college? College degree requirements can be thought of as a DAG. Taking topology requires completing calc 3 and also algebra, which don’t depend on each other!

Lots of things are graphs. You can think of roads as connections on a graph, and intersections as the elements. Graphs show up everywhere! Some graphs can be directed and cyclic (think of a city neighbourhood with a lot of one way streets). Some graphs can be undirected and acyclic (I presume, I can’t actually think of a real-life example at the moment).

Bloom Filter

What is it?

A way to check whether we’ve already seen a certain element, without using much storage space (memory) to keep track. But there’s a twist… sometimes it lies! It might tell us we’ve seen something that we have not (but never the opposite).

Where do I already use it?

Hahahahaha, just kidding, you don’t.


* “Easily” is a substantial overstatement here! For years it wasn’t defined in any online dictionary I could find, and missing from many text ones. Even now, check out the Merriam Webster page for it, namely:

Wait, there’s more! This word doesn’t usually appear in our free dictionary, but we’ve shared just a bit of the information that appears in our premium Unabridged Dictionary. There’s more definition detail there.

** Is there a gender-neutral term for “sibling of parent?”

*** I sort of pretended that “right” and “left” are givens in trees, since they’re often represented directly in family trees. In general, that’s not the case, and navigating to a sibling is a process of going up to the shared parent and then back down to the sibling; they generally aren’t stored, in a computer, with a direct sibling connection. This does differentiate them from graphs importantly.

Posted in computer science, pedantry | 1 Comment