ORMs model databases, databases don’t model objects

I was discussing the merits of a couple different ways to abstract some data common to multiple models at work, and my coworkers asked some questions that forced me to articulate exactly why I have such strong opposition to things like “abstract models.” My conclusion, basically, is that I think ORMs serve as models of the database, not as arbitrary app-level objects, and should be designed as such.

Let’s make up some models here to demonstrate what was going on. We start with a model for cats, of course, called Cat, of course, that looks like this*:

class Cat(Model):
    # cat-specific
    fur = ChoiceField('short', 'medium', 'long')
    breed = CharField()
    # vitals
    heartrate = IntegerField()
    core_temp = FloatField()

but now we’ve decided we want to model frogs, too (god knows why!).

We have a couple different ways of modeling this. Coming at it from the code viewpoint, you might have something like this:

class AnimalMixin(object):
    heartrate = IntegerField()
    core_temp = FloatField()

class Cat(Model, AnimalMixin):
    fur = ...

class Frog(Model, AnimalMixin):
    jump_height = ...

This is the kind of thing that your average OO programmer would reach for (well, perhaps AnimalMixin would merely be a base class called Animal) in app code. And, in that context, it’s probably even defensible!** But this is an extremely code-centric view of things that basically ignores the database. What happens in this case? What tables and columns get created?

Well, it sort of depends. In Django, anyway, there are a couple of ways to do model inheritance. With an abstract base class, the base just adds its fields to every inheriting class; i.e. you would have a Cat table with heartrate and core_temp columns, and you would have a Frog table with heartrate and core_temp columns. The other way, multi-table inheritance, does a bunch of joins; i.e. you have a table with heartrate and core_temp columns, and your Cat table has a foreign key back to that table, but in the code you just access cat.heartrate and it doesn’t feel like a join.

These two options both have significant drawbacks: the first duplicates columns across multiple tables, and the second turns innocent-looking field access in your code into joins! It is highly non-obvious from reading the code whether an attribute access is going to reference a column of the object (fast, already in memory) or a join to a different table (slow, goes to the DB unless prefetched).

I always sort of flinch when I see these approaches used or recommended, but I’d never really clarified my thinking around why until today. The issue is that in both cases you’re trying to hide the database from the programmer, and that is not a good thing! Don’t get me wrong, there are lots of aspects of the database I do want hidden (connection strings spring to mind, as does spelling out long JOIN statements), but the fundamental structure of the tables is extremely relevant to thinking about your application.

So enter the third way of doing this, which gives you slightly-less-convenient app code in exchange for significantly-more-obvious database patterns:

class Vitals(Model):
    heartrate = IntegerField()
    core_temp = FloatField()

class Cat(Model):
    vitals = ForeignKey('Vitals')
    fur = ...

class Frog(Model):
    vitals = ForeignKey('Vitals')
    jump_height = ...

Now what’s a join vs. what’s a field is obvious at first glance. If you access my_frog.vitals.heartrate you know you’re going through a foreign key, and if you just access my_frog.jump_height, it’s obvious that’s a local field. Obvious to someone who has some experience with Django, anyway.
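Stripped of Django entirely, the shape of this pattern is plain old composition. Here's a sketch in vanilla Python (the classes and values are made up, per the footnote):

```python
class Vitals:
    def __init__(self, heartrate, core_temp):
        self.heartrate = heartrate
        self.core_temp = core_temp

class Frog:
    def __init__(self, vitals, jump_height):
        self.vitals = vitals            # plainly a reference to another object (table)
        self.jump_height = jump_height  # plainly a local field (column)

frog = Frog(Vitals(heartrate=40, core_temp=22.5), jump_height=1.5)
print(frog.vitals.heartrate)  # the indirection is visible right at the call site
```

The extra `.vitals` hop is exactly the slight inconvenience being traded for obviousness.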

I think the difference is which you think of first when you’re coming up with persistent models: your database and the data in it, or your app code and the operations on that data. I give the data primacy, and I think the code should model that. The alternative approach, where the code should be maximally convenient and the database should be molded to make it work, makes me frustrated. Give me nice, clean, well-organized data and I’ll swallow a lot of code duplication for it.

The ORM is in service of the database, not the other way around, and don’t you forget it.

* All examples made up and not checked for correctness.

** I find inheritance loathsome. Do not @ me.

Posted in computer science, software | Leave a comment

How To DDoS Your Own Website With Python

Have you ever thought, “huh, I wonder how easily I could DDoS myself despite all the hard work of library and framework authors?” In that case, have I got news for you!

In the vein of Uses and Abuses Of Python’s Excessive Dynamism, I have found a way to DDoS yourself via hash collision in a measly 3-ish lines of python!

Background

Since I am an unbearable pedant and this blog post would be about 10 lines long if I didn’t write a bunch of unnecessary exposition, I am going to start out with a brief explanation of hash tables/sets!

Alternatively, you could just read Wikipedia’s explanation.

A hash table is a data structure that is commonly used to implement maps, which you learned about last time. It’s also used to implement sets, which are like maps except, instead of mapping keys to values, they just store a bunch of values.

The basic way a hash table works is that we start out with an array of some size, and we consider each index of the array a “bucket.” Then, each time we insert a key (or value, in the case of a set), we take a hash of the key and use that to determine what bucket to insert it in. How a hash is calculated is not important (especially not for this post, as you’ll see!), but it is a deterministic one-way function from any object to an integer. Deterministic means if I ask you for the hash of an object multiple times, it must return the same value, or everything will be ruined for ever.

Once we have the hash, we find the value of that hash modulo the length of the array. For example, if we were working with a backing array of length 10, and our hash was 1238, we would select bucket 8. Then we find what bucket 8 points at, which is generally a linked-list of values of whatever type we’re putting in.

Okay, back up, for concreteness, let’s assume we’re looking at a hash set (not table) of strings. So our backing array is a length-ten array of linked-lists of strings.

That’s better. So we’re at the linked-list that bucket 8 of our array points to. Now we iterate through this list, and make sure that the string we’re putting in isn’t already in there. This means that our string has to be compared with every other string we’ve already put into bucket 8. So every string for which our hash function ends in an 8 will go in this bucket, and we have to compare against each of those every time we add a new string with hash ending in 8.

Now this sounds somewhat scary, right? The whole point of hash sets is that they have O(1) (that is, constant time, also known as “screaming fast”) insertion time and membership testing (i.e. “is this string already in my set?”). But if I have a gazillion strings that hash to something that is 8 mod 10, I have to do a gazillion string comparisons to verify whether or not this particular string that goes in this bucket is, in fact, new!

All that is to say that if your hash function has “bad” behaviour (where bad behaviour means hashing a lot of different objects to the same bucket) and/or your backing array is too small for the size of the set you’re working with, all this stuff you expect to be screaming fast will be super slow instead, which is just the worst if you’re a developer. There of course has been a lot of work done by serious computer scientists to come up with great hashing algorithms and heuristics for when it’s time to resize your array to accommodate more objects (strings, in our case), but nothing is stopping you from defining a hash function that is super terrible.
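To make the mechanics concrete, here's a toy fixed-size hash set in a few lines of Python (real implementations resize their arrays and choose bucket counts much more carefully; this is an illustration only):

```python
class ToyHashSet:
    """An array of buckets; each bucket is a list of values."""

    def __init__(self, nbuckets=10):
        self.buckets = [[] for _ in range(nbuckets)]

    def add(self, value):
        # hash, then modulo the array length, picks the bucket
        bucket = self.buckets[hash(value) % len(self.buckets)]
        if value not in bucket:   # the linear scan that collisions make slow
            bucket.append(value)

    def __contains__(self, value):
        return value in self.buckets[hash(value) % len(self.buckets)]

s = ToyHashSet()
s.add("hello")
s.add("hello")       # duplicate: the scan finds it, nothing is added
print("hello" in s)  # True
```

If every value hashed to the same bucket, `add` and `in` would degrade to scanning one giant list, which is the whole trick below.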

Okay, can we ruin everything now?

Here’s a great way to ruin everything:

class Foo:
  def __hash__(self):
    return 1

Now every time you try to add a new Foo object to a hash set, you’re gonna get them all in the same bucket (this is what we call a hash collision). So just throw a ton of those into a set every time your website loads and your customers will all hate you.
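Counting equality checks makes the pileup visible. Here's a sketch that extends Foo with an `__eq__` counter (the exact count is a CPython implementation detail, but the growth is the point):

```python
class Foo:
    eq_calls = 0

    def __hash__(self):
        return 1  # every instance collides

    def __eq__(self, other):
        Foo.eq_calls += 1
        return self is other

s = set()
for _ in range(100):
    s.add(Foo())  # each insert scans everything already in the bucket

print(len(s), Foo.eq_calls)  # 100 elements, thousands of comparisons
```

With a well-behaved hash, those 100 inserts would cost roughly 100 comparisons total, not thousands.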

But wait… we can do better than that, right? If we can only make our own objects degenerate, we’ll have to go out of our way to slow our website to a crawl. Couldn’t we improve that somehow? Couldn’t we make our website terrible even in the face of all the hard work computer scientists and framework authors have done to keep things running smoothly?

Of course we can! This is Python! With adequate cleverness, we can ruin anything!

Just stick something like this into your codebase:

for name, thing in globals().items():
    if isinstance(thing, type):
        thing.__hash__ = lambda self: 1

Now all user-defined types will have horrible hashing algorithms, guaranteeing that anywhere they’re used in sets or as keys in a dict, every insertion and membership test will be O(n) instead of O(1)! Take that, helpful, thoughtful library authors!

Another terrible option

Try this one on for size:

class Foo:
  def __hash__(self):
    return 1

f = Foo()
s = {f}
print(f in s)  # True
Foo.__hash__ = lambda self: 2
print(f in s)  # False: f is still filed under its old hash

You can turn every dict and set in your codebase into a liar in one swell foop!

Merry self-destruction, everyone.

Unfortunate addendum

You can run something like __builtins__.hash = lambda *args: 1, but it does not appear to affect the algorithm that’s used to hash strings. This is presumably some implementation detail of Python internals that we can’t mess with. str.__hash__ = lambda *args: 1 throws an error. Sadly, we cannot ruin the most common type of map, string -> x, so easily 😦
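For the record, here's what that refusal looks like (CPython won't let you assign to slots on built-in types; the exact error message varies by version):

```python
try:
    str.__hash__ = lambda self: 1
except TypeError as exc:
    # e.g. "can't set attributes of built-in/extension type 'str'"
    print("denied:", exc)
```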

Posted in computer science, pedantry, software | Leave a comment

Common Data Structures You Already Secretly Understand

Let’s talk about data structures that have close analogies to things you already do or know! This post will not be educational or entertaining, I promise. Enjoy!

Linked Lists

What is it?

A linked list is a way of storing related information in sequence, such that you can only navigate forwards.

Where do I already use it?

Most people store the alphabet in their heads this way! You can easily recite the alphabet forwards (it’s even got a catchy tune), but few people can fluently recite the alphabet backwards. That’s exactly what a linked list is like: going forward is easy, going backwards is hard.
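As a sketch, the "alphabet in your head" looks something like this (each letter knows only its successor):

```python
import string

# A singly linked "alphabet": letter -> next letter, and nothing else.
next_letter = dict(zip(string.ascii_lowercase, string.ascii_lowercase[1:]))

def recite(start="a"):
    out = start
    while out[-1] in next_letter:
        out += next_letter[out[-1]]
    return out

print(recite())  # forwards is easy: "abcdefghijklmnopqrstuvwxyz"

# Backwards is not: finding the letter before "v" means scanning from the top.
before_v = next(k for k, v in next_letter.items() if v == "v")
print(before_v)  # "u", found the hard way
```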

Variations

If you can recite the alphabet backwards, you probably remember it analogously to what we call a doubly-linked list. It’s easy to go forwards and backwards (quick, what letter comes before v?) but it’s much more difficult to tell me what the nth letter is without counting (quick, what is the 17th letter?).

There’s a fairly uncommon but cool data structure called a “skip list,” too. This is basically a linked list, but where you don’t always want to navigate through every element to move forward because it’s a lot of elements. So you have your linked list, but then you also have a linked list that skips bunches of the list. The only example that’s coming to mind for me here is learning to speedrun a video game: You have a linked list of what you’d call the “route,” e.g. the order in which you go from place to place doing stuff. Then, at each place you go, there’s a bunch of smaller steps to do said stuff. So here you have the base linked list which is very detailed and tedious (e.g. walk 3 squares left, slash your sword then move down, etc etc) but you have a list on top of that that’s much coarser, and you can sort of go up and down between the two. So you’d think of it like: go to dungeon 1 (skip list), collect item 2 (skip list), (now drop down to the more detailed linked list) then move down two squares before wiggling away from some enemy. If this explanation makes no sense, don’t worry! I told you this wouldn’t be educational or entertaining, didn’t I?

Array

What is it?

An array is a lot like a linked list in these abstract, analogy-driven terms: it’s easy to navigate forward. It’s also easy to navigate backwards (like a doubly-linked list). However, it’s also very easy to access the nth element.

Where do I already use it?

Probably in how you think of months! You can tell me the months in order, and you could go backwards (it might be a bit stilted but you wouldn’t have any real trouble), but you can also probably tell me offhand exactly what the 5th month is.
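The months, as an array (a sketch):

```python
MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

print(MONTHS[4])   # "May": the 5th month, no counting required
print(MONTHS[-1])  # indexing from the end is just as cheap
```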

You probably also think of days of the week this way, with some possible complications (e.g. is the first day of the week sunday or monday?). Although that’s hard to really say; 7 elements is tiny.

Map/Dictionary/Hash

These are all terms that usually refer to the same thing. Sometimes they hint at underlying implementations, which are far too educational for this blog.

What is it?

It’s a lot like what it sounds like: a mapping from something to something else. A mapping allows very fast access to any value based on its key. It’s not very good at going backwards, though!

Where do I already use it?

Probably something similar is happening all over your brain! But I can’t convince myself any of that is sufficiently map-like, because cognition is basically a black box.

Some more concrete examples: if I ask you to tell me what the word “alopecoid” means, you can go look it up in a dictionary easily*. But if I ask you to tell me a word that means “like a fox,” that’s a lot harder! Maybe you come up with “vulpine,” and I say, oh, no, that’s not the one I was thinking of, can you give me another? There’s no way to get to a word based on its definition besides reading the whole dictionary, so you’re out of luck.
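The same asymmetry, in a Python dict (a made-up three-entry dictionary):

```python
definitions = {
    "alopecoid": "like a fox",
    "vulpine": "like a fox",
    "lupine": "like a wolf",
}

print(definitions["alopecoid"])  # forwards: one fast lookup

# Backwards: no choice but to read the whole dictionary.
foxy = sorted(w for w, d in definitions.items() if d == "like a fox")
print(foxy)  # ['alopecoid', 'vulpine']
```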

Tree

What is it?

A tree is a way of storing hierarchical data; that is, there is some “root” to the tree that has some number of “children,” each of which can have its own children (or not), etc. It’s useful for lots of stuff: it can make various searching procedures faster, it’s (generally) quite fast for inserting new values while maintaining sortedness, it’s the only sensible way of representing certain kinds of data at all.

Where do I already use it?

Company hierarchies work like this! There’s some big boss CEO at the top, who has some number of direct reports, and they have direct reports, etc. Presumably you know at least a substantial portion of the tree that includes you: you should know your boss, and probably their boss, and on up. You should also know all your direct reports, should you have any. But who cares about the hierarchy in departments you’re not in? I sure don’t!
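Here's a sketch of the slice of the tree you actually know, walking up from yourself to the root (all names hypothetical):

```python
# Each person points up at their boss; the CEO points at nothing.
boss = {"ceo": None, "vp_eng": "ceo", "you": "vp_eng", "intern": "you"}

def chain_of_command(person):
    """Follow parent links from a node up to the root of the tree."""
    chain = [person]
    while boss[chain[-1]] is not None:
        chain.append(boss[chain[-1]])
    return chain

print(chain_of_command("intern"))  # ['intern', 'you', 'vp_eng', 'ceo']
```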

Another example is family trees. Given a sufficiently correct family tree, you can go up and down and left and right to define relationships! For example, instead of your “boss” being above you, it would be your parents. Your children are your… well, children (one of those references is to human children, of course; either will do). If you go up a level and over a level, you get an aunt or an uncle**; if you go up a level, right a level, up another level (but not to the same place you’d get to going up two from the start!) and then down some more, you’ll end up with a 9th step-cousin π times removed (I think).

Graph

What is it?

A graph is a lot like a tree, except the connections between elements don’t have to go straight up and down***. They can go anywhere! Anything can be connected to anything else!!

Where do I already use it?

This is how Facebook works! I’m friends with a bunch of people, and they’re friends with a bunch of people, and some of those people are friends with me, etc, etc. Just lots and lots of people and lots of connections between them.

For an example that might make it more clear why graphs are interesting, consider playing the “Six degrees of separation” game, where you try to figure out how you’re connected to <cool celebrity X>. You start at yourself, and then you figure out the next connection: maybe your mom knew someone who grew up in X’s hometown. Then from there, you find out that this person knew X’s childhood barber. Now you’ve gotten there in just four jumps: you -> your mom -> person from X’s hometown -> X’s childhood barber -> X! This is a useful (if somewhat silly) way to use a graph.
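The game above is just a breadth-first search over the graph. Here's a sketch using a made-up friendship graph with the story's cast:

```python
from collections import deque

# Adjacency sets: who knows whom (all names hypothetical).
knows = {
    "you": {"mom"},
    "mom": {"you", "hometown_pal"},
    "hometown_pal": {"mom", "barber"},
    "barber": {"hometown_pal", "celebrity"},
    "celebrity": {"barber"},
}

def degrees(src, dst):
    """Length of the shortest chain of connections, found breadth-first."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        person, dist = queue.popleft()
        if person == dst:
            return dist
        for friend in knows[person] - seen:
            seen.add(friend)
            queue.append((friend, dist + 1))
    return None

print(degrees("you", "celebrity"))  # 4, the story's four jumps
```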

Variations

Some graphs are what we call “Directed Acyclic Graphs,” or DAGs. These are an important subset of all graphs, with some restrictions on the connections between elements: the connections have a direction (such as parent-child or boss-employee) and there are no cycles (that is, if we have people A, B, and C, and we have that A is B’s boss, and B is C’s boss, we know for sure that C can’t be A’s boss, because that would form the cycle A->B->C->A).

Do you play video games (or maybe pen and paper RPGs)? If so, the “skill trees” you may have encountered in that setting are named pretty appropriately! There’s a “base skill” (or more than one, in which case you can think of the true root of the tree being a skill you have from the beginning), and after you’ve obtained that maybe you get a couple options for new skills. But sometimes you need more than one prerequisite — these skill “trees” should really be called skill DAGs! In this example, the direction is a “prerequisite” relationship, and there can’t be any cycles. If there were a cycle in a skill DAG, all the skills in that cycle would be unlearnable, since they all depend on each other (directly or indirectly).
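A sketch of a skill "tree" that is really a DAG (the skills are invented): a skill is learnable once all of its prerequisites are learned, and a skill with two prerequisites is exactly the thing a tree can't represent.

```python
# Each skill maps to the set of skills it requires.
prereqs = {
    "fireball": set(),
    "ice_bolt": set(),
    "elemental_fusion": {"fireball", "ice_bolt"},  # two parents: not a tree!
}

learned = set()

def learnable():
    return {s for s, needs in prereqs.items()
            if needs <= learned and s not in learned}

print(learnable())  # {'fireball', 'ice_bolt'}
learned |= {"fireball", "ice_bolt"}
print(learnable())  # {'elemental_fusion'}
```

If "fireball" also required "elemental_fusion," that cycle would leave all three skills permanently outside `learnable()`.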

Maybe you went to college? College degree requirements can be thought of as a DAG. Taking topology requires completing calc 3 and also algebra, which don’t depend on each other!

Lots of things are graphs. You can think of roads as connections on a graph, and intersections as the elements. Graphs show up everywhere! Some graphs can be directed and cyclic (think of a city neighbourhood with a lot of one way streets). Some graphs can be undirected and acyclic (I presume, I can’t actually think of a real-life example at the moment).

Bloom Filter

What is it?

A way to check whether we’ve already seen a certain element, without using much storage space (memory) to keep track. But there’s a twist… sometimes it lies! It might tell us we’ve seen something that we have not (but never the opposite).
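For the curious, here's a toy Bloom filter. Real ones use several independent hash functions over a sized-to-purpose bit array; this fakes two positions out of Python's one `hash()`, purely as an illustration:

```python
NBITS = 256

def bit_positions(item):
    # Two cheap pseudo-hashes carved out of one (an illustration, not a design).
    h = hash(item)
    return [h % NBITS, (h >> 16) % NBITS]

class ToyBloom:
    def __init__(self):
        self.bits = 0  # the whole filter is one integer's worth of bits

    def add(self, item):
        for p in bit_positions(item):
            self.bits |= 1 << p

    def might_contain(self, item):
        # True can be a lie (false positive); False never is.
        return all(self.bits & (1 << p) for p in bit_positions(item))

seen = ToyBloom()
for word in ["cat", "frog", "alopecoid"]:
    seen.add(word)
print(all(seen.might_contain(w) for w in ["cat", "frog", "alopecoid"]))  # True
```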

Where do I already use it?

Hahahahaha, just kidding, you don’t.


* “Easily” is a substantial overstatement here! For years it wasn’t defined in any online dictionary I could find, and it was missing from many print ones. Even now, check out the Merriam-Webster page for it, namely:

Wait, there’s more! This word doesn’t usually appear in our free dictionary, but we’ve shared just a bit of the information that appears in our premium Unabridged Dictionary. There’s more definition detail there.

** Is there a gender-neutral term for “sibling of parent?”

*** I sort of pretended that “right” and “left” are givens in trees, since they’re often represented directly in family trees. In general, that’s not the case, and navigating to a sibling is a process of going up to the shared parent and then back down to the sibling; they generally aren’t stored, in a computer, with a direct sibling connection. This does differentiate them from graphs importantly.

Posted in computer science, pedantry | 1 Comment

Why Is No One Writing Language Runtimes?

A conversation with a coworker recently led me to wonder why none of the recent innovation in writing software has been in runtimes, or at least not pluggable ones. Before I get into that in any detail, I’ll go over a bit of what a “runtime” even is, and how it’s distinct from other components of software.

This article is probably appropriate reading for someone who has written code in at least one language, is vaguely aware of some of the differences in the ways other languages run, and is willing to take some things on faith. Any more knowledge than that just means more of it, up to the last two headings, will be review.

How Software Runs

There are two major, coarse-grained ways of defining how software is actually run: what we call “compiled” and “interpreted” code.

Compiled code is written, like all* code, in plain text. You open up a text file and write some human-readable characters (and some *s and ->s and so forth). Then when you want to actually run it, you use what you’ve written as input to a program called, unsurprisingly, a compiler, which takes the plain text you’ve written and turns it into “machine code,” which is just a whole lot of bits that your computer can actually run. The compiler outputs an executable, and then you just run that on its own. This looks something like gcc hello.c -o hello ; ./hello – first you compile (gcc stands for “GNU Compiler Collection,” originally “GNU C Compiler”) some source code (in the file you write, “hello.c”) into an executable named hello (specified by the “-o hello” part). Then you run it (“./hello” means execute the program in the current directory named hello).

Interpreted code is, of course, written in plain text as well. There is no compilation step separate from running the code, though; instead you feed your code as input to an interpreter directly. This looks something like python hello.py. What’s going on under the hood is that python is a program that’s already compiled and runs on your machine successfully, and it opens up your file and executes specific instructions for each line that it reads.

There’s also a third, sort of in-between way of executing code. Some programming languages (perhaps most famously Java) have a separate compilation step but also still run their own interpreter on the compiled output. What’s happening is that the Java compiler does not compile to machine code, it compiles to what we usually refer to as bytecode. Java then reads in the bytecode and executes machine code based on it. In keeping with the last two sections, this looks something like javac Hello.java ; java Hello – you’ll note it has aspects of both of the prior examples.

Runtimes

No program is an island. Every program has to hook into the operating system at some point, if nothing else. The environment in which an executing program operates is sometimes called its “runtime environment,” or just runtime. When you write a C program, your runtime environment is very bare – the C compiler turns your code into machine code and really adds very little beyond that. When you write a Python program, however, the runtime is rich: the interpreter is the thing that’s actually running, and does things like managing your memory and doing function dispatch for you.

Other languages are in between. Since Java compiles your source text to bytecode before it’s executed, the only thing running when you execute your code is the Java Virtual Machine (JVM); no source compilation happens at that point. The JVM does things like manage memory allocations and deletions for you, which C won’t, but you can’t so much add a new data type at runtime, or change the object hierarchy around, like you can in Python.**

Different runtimes make different guarantees, and different languages provide their guarantees in different ways. For example, Python won’t leak memory because the interpreter creates all the objects for you and can’t lose track of them. Rust programs, however, won’t leak memory because the compiler makes sure you didn’t write code that could leak memory, and then writes down the parts about freeing memory into the final executable; the safety guarantees in the Rust case are not provided by the program actually running in the end, but by the other program (the Rust compiler) that created it. Yes, this is very difficult to understand at first, and I am not the greatest teacher; please don’t feel bad if your head is hurting.

Why Is Bytecode Important?

Normally, when you’re creating a new programming language, you have a whole lot of work to do: you need to define how the language looks (syntax) and works (semantics). Once you have your ideas for that, you need to write some programs that will actually take source text that humans write and run it. There’s a lot of different ways to do this! More now than ever. The major ways are basically writing a compiler or an interpreter. But writing compilers is really fucking hard, and writing interpreters isn’t a ton easier (and it makes the resultant code much slower, in general). So what if there were a way to get around having to do all that work, but still end up with a fast, correct executable at the end?

The key insight here is that you can write a compiler from your language’s source code to someone else‘s bytecode. If you can just get that one pass done, from your language into, say, Java bytecode, you get the whole JVM for free. The JVM gives you garbage collection and very fast execution (along, of course, with great library support, among other things, which isn’t really relevant here but is to many language designers). Instead of writing a compiler that takes human-readable text and turns it into machine-executable code, you can do the much easier task of turning that text into JVM-executable code.

Beyond Bytecode: Intermediate Representations

Bytecode is similar in some ways to a thing called intermediate representation, or IR, that many compilers use. This basically breaks the source text -> machine executable step in two, but then, instead of leaving the intermediate form as code for a runtime to execute, you finish the job: source text -> IR -> executable.

Let’s summarize here, with examples of how some languages do things:

C: source text -> machine executable
Objective-C: source text -> LLVM IR -> machine executable
Java: source text -> JVM bytecode (bytecode executed by the runtime)
Python: source text is executed directly by the runtime

Every language starts with source text; some move through an IR or bytecode step, and some go all the way to an executable, while the others have an interpreter to run as the last step.

Plugging In

All the languages I’ve used as examples so far are self-contained within the parse-compile-execute zone. Many languages are in fact built as I alluded to a moment ago – they compile from their source down to an IR or bytecode format that is then compiled or executed by a compiler or interpreter the language authors did not write. The example on my mind today is Rust, which compiles down to LLVM IR, which is then compiled by LLVM into machine code. The Rust authors don’t have to write their own compilation to machine code, and can instead leverage the expertise of the LLVM team. Other examples include languages like Clojure and Scala, which compile to Java bytecode and are then run on the JVM. Notably, LLVM and JVM are by far the most prominent targets in this scenario: they’re both open-source and well-documented targets that provide a ton of firepower, and there frankly aren’t a ton of other options.***

Where Are We Going?

So far we’ve talked about a few approaches. You can own the entire source-code-to-running-program pipeline, or you can abandon it somewhere before the end. If you want machine code, you can compile it yourself or only compile to LLVM IR and then let the LLVM toolchain finish the job. If you want the entire weight (both in terms of heaviness and in terms of force, here) of the JVM, you can compile to Java bytecode.

What I don’t see is anyone writing a runtime that can be compiled to. The JVM is very powerful, but it has its own potential issues (type erasure and garbage collection come to mind). But what if you want a runtime with a different set of features? What if you want, say, very powerful coroutine support?

The answer is: you write Go.

The answer should be: you write a language that compiles to Go’s IR.

From some quick googling, it seems like there in fact is a separate IR for Go, but I was unable to dig up information about how to compile that IR. I also found no evidence that anyone has tried one way or the other (compiling it or compiling to it). The fact that the Go team seems dead set on naming everything as inconveniently as possible for googling**** (the language is Go, and they call their IR “assembly,” which means that googling for “go assembly” mostly finds questions about using assembly language inside Go, not using Go’s “assembly” language) makes it hard to be confident in my assertion that no one is doing it, but it is at best niche and rare.

So Let’s Write Runtimes!

From everything I’ve heard on the internet and my coworkers, the Go language is mostly pretty crappy, or, at least, has several very serious flaws, but the performance is good and it’s excellent for high-concurrency web stuff, mostly because you get coroutines for free. Unfortunately, the two are shackled together, and I will have to wait for someone who actually has the time, energy, and skill to write a runtime with first-class coroutines as a pluggable backend that I can compile to. I mean, realistically I’ll keep writing slow, non-concurrent, interpreted Python, but perhaps you, dear reader, have an actual need for such a thing.

More than that, I’m partly just curious why no one is working on this. The JVM was never written to be a generic runtime, but it’s been wildly successful in that role. LLVM was written on purpose to be a generic compiler backend, and it’s also wildly successful. Why is no one looking at reproducing the success of the JVM, without all the baggage? There is a project called Parrot VM that was an attempt to do just this, provide a generic IR format for executing dynamic languages, but it never took off, probably because it was originally tied to the tragic development of Perl 6, and thus didn’t gain traction outside of that community. Given the number of new languages that leverage the JVM or other virtual machines (Elixir is a notable example that runs on the Erlang VM), there’s clearly a demand for this kind of thing.

 


* Very, very nearly all

** Probably not strictly true. I don’t know the exact limits of the JVM runtime. Last time I looked at it, you could certainly look up methods by name, which would be very difficult in C, but I don’t think you could create new ones at runtime. Even if this stuff is strictly possible though, the point is that Java bytecode is somewhere in-between Python and C in terms of the runtime environment.

*** One important other option is using another programming language as your IR! If you can compile (or perhaps more accurately transpile) your language into C, then you can leverage gcc (or clang, etc) into compiling that all the way down into machine code. This isn’t relevant to the post, though, really, despite being interesting. There’s also CIL, which is basically Microsoft’s version of Java bytecode, interpreted by the CLR. This is not different enough from the JVM situation for me to include it much elsewhere.

**** Is this on purpose? They work at Google! Maybe they want it to take several queries instead of one, which then allows them to put ads in front of your face more times in a row? I can’t believe anyone is that nefarious; given all the dumb decisions behind Go, I’ll have to assume Hanlon’s Razor here.

Posted in software | 13 Comments

Programming Language Misfeatures With Little Comment

(An incomplete and unordered list)

  1. inheritance
  2. !=
  3. ===
  4. not in from Python
  5. .. and ... (flip-flop operators)
  6. implicit type coercion
    1. (except arguably short -> long style)
  7. null
  8. case fallthroughs
  9. non-exhaustive case statements
  10. the switch/case construct in general
  11. lambda’s restrictions in Python
  12. ruby having 3 (? maybe more?) types of function objects
  13. memory-location equality comparisons
    1. ok, these are sometimes fine, but should never be the default
  14. arguments that can be either positional or named when called
  15. anything that exposes the difference between co- and contravariance because of a type error (lookin’ at you, Java arrays)
  16. constructs whose behaviour is partly controlled by the parser (e.g. php’s instanceof)
  17. constructs that affect the runtime behaviour of the interpreter (e.g. php’s __halt_compiler or ruby’s __END__)
  18. operator overloading where you have to (can?) define your own precedence/associativity (no, seriously)
  19. compiler warnings
    1. ESPECIALLY warnings that cannot be turned into errors
  20. function scoping
  21. closing over variables
  22. closing over values
    1. why is it never the one i want!!
  23. hygienic macros
  24. unhygienic macros
    1. why is it never the one i want!!
  25. raw-text replacement as “macros” (e.g. the C preprocessor)
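As one concrete illustration of items 21 and 22: Python closes over variables, not values, and the standard workaround is to smuggle the value in through a default argument. A minimal sketch:

```python
# Python closures capture variables, not values: all three lambdas
# share the single loop variable i, which has ended up at 2 by the
# time any of them is called.
fns = [lambda: i for i in range(3)]
late = [f() for f in fns]  # [2, 2, 2], not [0, 1, 2]

# The usual workaround forces early binding via a default argument,
# capturing the *value* of i at definition time instead.
fns_bound = [lambda i=i: i for i in range(3)]
early = [f() for f in fns_bound]  # [0, 1, 2]
```

Whether you want the variable or the value depends entirely on the call site, hence “why is it never the one i want!!”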
Posted in software | Leave a comment

Probabilistic Morality, or: Ghosting On Spherical Cows

A Lighthearted Preamble

I have, of late, you’ll notice, been trying to find a date. In said pursuit, I have also begun using Tinder in an effort to hopefully delude some young woman into believing I am worth spending time with*.

I recently went on a nice date with a nice young woman. Unfortunately, she… either doesn’t like talking to me, or is the world’s worst texter. We set up a tentative date, and I said “I can be there at 6?” — the question mark indicates that I was opening it up for her to confirm, deny, or choose a different time. She didn’t respond! When it came time for me to leave work, and I had to choose whether to go to the train or home, I finally had to follow up** and ask if that worked — and she quickly responded yes, we met at the decided-upon time, and had, I think, a nice evening.

So the next day, I followed up, saying I had a good time and asking about something she had mentioned she was working on while we were out. She didn’t reply! For over a whole day, she has not replied. Normally, I would take this as a clear sign that she was not interested, and in fact was doing what those damn Millennials call “ghosting.” But, as I said, she is already a front-runner for the World’s Worst Texter trophy. So maybe she did have a good time — she actually said as much while we were out! — and just does not reply to texts in a timely manner. Or, sometimes, at all.

A Bitter Aside

Dating sucks. Texting sucks. Finding oneself strategizing about texts to send hours or days in advance sucks. Everything sucks. Eat Arby’s.

Getting Towards My Actual Point, Insofar As There Is One

I was talking about all this with a friend of mine who has also been dipping her feet intermittently into the online dating game. I also mentioned that one time, after a date that I had enjoyed, I sent a similar followup text to the one described above, along the lines of: Hey, I enjoyed meeting you last night, would you like to hang out again sometime? <insert reference to a thing we talked about here***>. And she replied with something like: Hey, it was nice meeting you, but I’m not really feeling it, sorry. And I was all, Okay cool thanks for letting me know; good luck going forward!

And my friend here — the one at the beginning of last paragraph — was taken aback that a woman actually said that! It’s like getting a rejection letter after applying to a company. Nobody does that! She even thought that, hey, maybe she should try doing that too, when she’s not interested in someone, rather than ghosting.

But for a woman to actually straight-up turn a man down is often unpleasant, and sometimes even dangerous. When should a woman give a date an “official rejection” versus just ghosting on them? It’s clear that the woman should protect her own safety and wellbeing first of all, so texting a man who she did not feel safe with to begin with is a bad idea compared to disappearing. And giving a gentle but clear let-down to a man who is nice enough but you’re not interested in seeing again is a pretty clear good: you get to feel benevolent for being up front, and he gets to know where he stands and not to waste his energy pining over you. The advice seems natural: Err on the side of caution, but give a straightforward rejection whenever it feels safe.

Do You Even Remember The Title By This Point?

I started thinking about exactly what to do here. I started wondering what the correct course of action would be based on my poorly understood, utilitarian philosophy of morality. There’s two variables here: the woman’s (henceforth Alice) and the man’s (henceforth Bob) suffering****. We assume some things:

  • Alice has already decided not to see Bob again.
  • Bob would like to see Alice again.
  • Bob’s suffering is reduced by getting an explicit rejection as compared to being ghosted.
  • Alice’s suffering is reduced by offering an explicit rejection.
  • Alice’s suffering is, after deciding to offer the rejection or not, exclusively controlled by Bob’s response
    • If Bob replies in a positive or neutral manner, her suffering does not change (after the bump from magnanimously being straightforward)
    • The more negatively Bob replies, the more Alice’s suffering increases

I don’t think any of those assumptions are controversial: The first two are just an exposition of our hypothesis, and the rest seem straightforward.

For the purpose of getting anywhere, I am going to posit a final constraint that moves us into the world of spherical cows:

  • Alice has no priors to help predict Bob’s reaction

I think the idea holds up fine even if we allow Alice to have priors, but let’s assume she doesn’t for now. Bob is a spherical cow.

Maybe I’ll Present A Model Under This Header

Our assumptions let us posit, fairly naturally, that Alice’s suffering will be increased by the value of some simple function*****.

[Figure: Alice’s suffering increase (y-axis) plotted against the negativity of Bob’s response (x-axis); zero for positive or neutral responses, non-decreasing as the response grows more hostile.]

The x-axis is the negativity of Bob’s response, and the y-axis is the effect on Alice’s suffering. So if he is positive or neutral, there’s no effect; then as he ramps up from “Okay, thanks” to “Wow, rude,” through “Fuck you, you’re ugly anyway” and on up to “You think you can just turn me down? I’ll fucking kill you,” her suffering increases. It’s not necessarily linear, and its shape is not well known in general except that it is non-decreasing, but we can assume it is “known” insofar as, for any given response from Bob, we know the exact amount of suffering it causes Alice.

But this function is not the end of the story. Because, see, what we actually have here is a probability distribution! Bob, for any given value of Bob, will only have one of these reactions. Over a large population of Bobs, however, we will find ourselves with some probability distribution over negativity of responses.

The Much-Awaited Conclusion

Once we have this probability distribution, we can wave our hands****** and come up with an expected value of Alice sending out a rejection text. The formula looks something like this:

$$\Delta S_{\text{total}} = \Delta S_{\text{send}} + \Delta S_{\text{receive}} + \Delta S_{\text{response}}$$

Where S stands for suffering. So the total change in suffering, which is the sum of Alice and Bob’s changes in suffering, is equal to the suffering decrease of giving a rejection, plus the suffering decrease in receiving a response (these two are constants for our purposes), plus the suffering change based on the response that Bob gives.

Our setup gets us to this:

$$\Delta S_{\text{response}} = \frac{1}{Z} \sum_{r} P(r)\, S(r)$$

Which is to say: the expected suffering from Bob’s response is equal to the sum, over all possible responses, of the suffering delta of each response times the probability of that response, multiplied by some normalization factor. So all we need to know is, given some suffering function for Alice and some probability distribution over possible responses from all Bobs, if

$$\Delta S_{\text{send}} + \Delta S_{\text{receive}} + \frac{1}{Z} \sum_{r} P(r)\, S(r) < 0$$

then Alice should send a rejection text. If the inequality is false, she should not.
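The decision rule is easy to play with numerically. Here’s a minimal sketch; the suffering constants, the response categories, and every number in it are made-up illustrative assumptions, not anything derived in the post:

```python
# Suffering deltas (negative = suffering reduced), per the model:
# Alice's relief from being straightforward, and Bob's relief from
# getting a clear answer. Both values are invented for illustration.
S_SEND = -2.0
S_RECEIVE = -3.0

# A toy distribution over the negativity of Bob's reply, paired with
# Alice's (non-decreasing) suffering function evaluated at each reply.
responses = [
    # (probability, suffering increase for Alice)
    (0.70, 0.0),   # positive or neutral reply
    (0.20, 1.0),   # mildly rude
    (0.08, 5.0),   # abusive
    (0.02, 50.0),  # threatening
]

# Expected suffering from Bob's response (probabilities already sum
# to 1, so the normalization factor Z is 1 here).
expected_response = sum(p * s for p, s in responses)

# Total expected change in suffering if Alice sends the rejection.
total = S_SEND + S_RECEIVE + expected_response

# Send the text iff net suffering falls.
should_reject_explicitly = total < 0
```

Under these made-up numbers the expected response term is 1.6, the total is −3.4, and Alice should send the text; crank up the probability or severity of the nasty replies and the inequality flips.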

To wrap up, I’d just like to say that I think it’s interesting that (for some model, anyway) a woman’s choice to do a rejected suitor the courtesy of actually telling him he’s rejected depends on two things: her own sensitivity to negative replies, and the probability distribution of how negative young men in the online dating world are when faced with rejection. That is to say, it is quite possible that today it is immoral to send a rejection text, but tomorrow the world will be a brighter place and men won’t be so fucking crazy, and it will become moral to reject them properly.

 


* It is not going great, so far. Thank you for your concern.

** Double-texting, despite being the Ultimate Sin of Thirst and Neediness, was my only option here besides giving up, so I reluctantly went with it.

*** I always worry that if I just say “hey let’s hang out again” I sound like I just wanna hook up, when, I mean, that’s something that’s totally cool, but not my primary goal, so I try to provide some sort of evidence that I was paying enough attention to be interested in my date as a person and not just a body, you know? I have no idea if this is effective; there are too many confounding reasons a woman might not want to go on a second date with me.

**** Suffering can be negative (what laymen refer to as “happiness”) but it’s easier to work with just one name for the quantity.

***** Simple here meaning: continuous, non-negative, real valued.

****** Do some calculus.☨

☨ I should really format my endnotes better.

 

Posted in math, morality | 2 Comments

On Speed Dating

I went Speed Dating this week. I utilized the services of this company.

The event’s location was moved the morning of.

I left my apartment half an hour before the event. I noticed I felt very nervous during the cab ride. I arrived at the event 20 minutes early. They told us to be fifteen minutes early to check in. No organizers were there when I arrived. No one was there to check anyone in until five minutes before the event started. The event started late.

There were more women than men. The men sat and the women moved between spots. I sat on a couch and women sat next to me. There were two groups of friends. One group was all medical students. One group was all mental health volunteers. Everyone was nice.

I was very tired by the end. I was less interesting in my last dates. Being nervous is very tiring. So is talking to strangers. The event was on a weekday. Speed Dating after work is even more tiring.

I turned in my match card after the dates. This matches people who match each other. The organizers never emailed me. People who match each other get introduced in an email. If you match no one they do not email you.

Posted in dating | Leave a comment

The Moral Calculus Of Suicide

From some sort of (poorly-understood) utilitarian standpoint, suicide (like anything) is immoral iff it increases the total suffering in the world. Presumably a person who is suicidal is experiencing significant suffering, and thus killing themselves is reducing total suffering for at least one person, from some high value to zero. Particularly, a young person with some very serious mental disorder (severe chronic depression, type II bipolar, etc) has many years of heavy suffering to weigh. However, that person’s life may well be reducing suffering (bringing happiness!) to others around them; many suicidally depressed people are superficially happy and exert a positive influence on the people who love them.

Committing suicide both reduces the happiness you bring to those around you and also adds some grief, sadness, guilt, etc to same. It might well also reduce suffering in smaller ways (e.g. people no longer have to worry about you or be stressed about caring for you, etc), but overall it’s going to be negative. Not only is any joy you could have brought anyone for the rest of your life now gone, you’ve also introduced a lot of sadness (that will, presumably, taper off as time goes on; there’s some weighing of futures here).

So, from a very shallow understanding of utilitarian morality, and as supported by quotes such as:

Even if a person has some duty to others, say, family members, the suicide can still be morally acceptable provided the distress to others caused by the suicide does not outweigh the distress to the person who refrains from committing suicide. No person is obligated to undergo extreme distress in order to save others from a smaller amount of distress. [source]

my understanding is that this is roughly where the analysis generally leaves off. Kill yourself if it reduces suffering; don’t if it doesn’t.

But, let’s consider a hypothetical scenario where someone I love very much and who has no impact on anyone else in the world is contemplating suicide. Wave your hands and assume the calculus shows that the person’s suicide is a small net negative: thus it is immoral for this person to commit suicide; the suffering she ends in herself (integrated over her expected life, etc, etc, blah blah blah) is slightly less than the increase in my own.

Now in this case I can choose, to some degree*, to be less sad about it. To believe that the end of my dearly beloved friend’s suffering is so important to me that my gladness that she is no longer in pain reduces my total suffering to the extent that the moral calculus changes: now my friend committing suicide slightly reduces the net suffering in the world, and is thus the more moral act.

 

What does this mean? I somehow have control over whether or not others’ acts are moral? If I tell my friend, “I will be glad to see your suffering ended, if that is what you choose, even though it will break my heart to lose you,” and she correctly infers that the moral calculus has changed to the point that it is now moral for her to commit suicide, and she is some wacky alien perfectly-rational-actor caricature, then she goes ahead and does commit suicide, have I effectively killed her? Was I right to? Can I choose, after the fact, to embrace others’ decisions to reduce their own suffering and thus retroactively justify a suicide as a moral act? Does any of this argument even make sense?

As to that last question: who cares; morality isn’t objective anyway, I get to weigh things for myself.


 

*Haha! Just kidding, free will is a lie. But acting as though it’s not (which of course we all must), one in fact does have some control over one’s reactions and emotions, or rather is at least able to if one tries.

Posted in math, morality, philosophy | Tagged | Leave a comment

Encryption & Pronoun Games

You should use encryption for everything you do online. All websites should be https only, even if there’s nothing to secure. All email should be sent over encrypted protocols. All chat clients should support encryption. Etc, etc. The reason for this is that if only sensitive data is encrypted, then attackers (read: the NSA) need only inspect (and decrypt) the encrypted transmissions, since those are guaranteed to be the interesting ones. Further, if most “normal people” don’t encrypt most “normal communications,” then there is the possibility of a sort of social pressure along the lines of “you wouldn’t be using encryption if you didn’t have something to hide,” which, despite popular opinion, is a powerful way for authorities to put pressure on people and break in where they don’t belong. I’m not going to go into detail on why the “nothing to hide” argument is toxic, but if you don’t already agree with me I recommend doing your reading.

But the point is that if it becomes standard for everyone to encrypt everything, then not only do we have all our communications secured, but we also all, as a group, benefit from not doing anything suspicious. Otherwise, when you find you suddenly do have something to hide, and switch to encrypting your communications, that’s a signal. If everyone does it, there’s no signal.

 

I think we should choose information-minimizing strategies in other situations where we can protect others or ourselves. The example I chose here, as you can tell from the title, is playing the pronoun game. If only people in — I don’t know what to call them that doesn’t sound condescending, maybe “nontraditional”? — relationships (basically anything besides monogamous heterosexual partnership between two cissexual people) play the pronoun game, then any time someone starts subtly dodging naming their partner’s sex, you have a very strong signal that person is in some sort of “nontraditional” — ugh, again, I don’t like that word, what should I use instead? — relationship. That’s exactly what they were trying to avoid signalling!

Since I think both that people should be able to be in whatever the fuck kinds of relationships bring them joy, who cares, and also that a person’s relationships are the business of absolutely no one except people that person decides to share them with*, I think it’s a shame that merely dodging pronoun usage might expose someone. So I play the pronoun game too! Every time someone plays the pronoun game and then it turns out their identity-unspecified partner is the sort of “norm” that you would expect, that makes the “pronoun game” signal a little weaker. Every time I dodge using terms like “girlfriend” or “she” and stick to “my partner” and phrase things to say “we,” etc, it makes someone in a nontraditional relationship less likely to be “caught” by people who might disapprove.

 

So, go forth! Encrypt all your data. Play the pronoun game. Hide any nonessential information that might cause someone pain if it were exposed. I have nothing to hide – I’m a boring ol’ cishet man who has, thus far, dated only boring ol’ cishet women – and it’s people like me who need to jump on the pronoun game bandwagon.

One assumes there are many other places where we should all be doing this as well. If you have to take a day off to go to the doctor, just say “I have a doctor’s appointment,” not “oh it’s just a cold,” because that means someone dodging the subject because of a protected medical status doesn’t get looked on suspiciously, for example. I can’t think of others at the moment, but I think the same idea applies across a wide range of fields of discourse.

 

*I mean, I don’t think people should be in like abusive or pedophilic relationships, and perhaps in some circumstances your relationship is people’s business even if you’d rather it not be (say, you have a child by a past partner or something). But come on, who am I even writing these disclaimers for on all these posts? It’s like I’m assuming the worst of myself and feel compelled to get down on paper that I don’t believe the literal worst interpretation of everything I write.

Posted in social justice | Leave a comment

A Poem (of sorts)

Once upon a midday sunny, while I pondered nothing funny,
Over many a quaint and curious man page of forgotten lore—
    While I angered, nearly snapping, suddenly there came a bad ping,
As if LVS was napping, napping though it knew the score.
“’Tis just upstart,” I muttered, “napping; I’ll give it what for—
            sudo service keepalived restart.”
Posted in jokes, software | 1 Comment