HN Discussion: https://news.ycombinator.com/item?id=18314628
Posted by fuzzythinker (karma: 1546)
Post stats: Points: 76 - Comments: 41 - 2018-10-27T06:40:52Z
Fear and the shape of data
I like to call this the “pretend it’s what you want” approach. In high-trust environments, it can work well enough.
But then the fear creeps in. The code grows in complexity. You work with code from developers who follow different conventions. You receive data in erratic formats from upstream systems that you cannot control. You start seeing null pointer errors. Trust in the code breaks down, and questions about the data start to provoke anxiety rather than confidence.
* What values does this data actually contain?
* Can I delete these values without breaking things?
* Can I pass this data to this function?
You can see the fear in the code base. It looks like this:
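As a minimal sketch, assuming a hypothetical user object in which every field may be missing (the names `MaybeUser` and `cityLabel` are illustrative only):

```typescript
// Hypothetical shape: every field might be absent, so nothing can be trusted.
interface MaybeUser {
  address?: { city?: string | null } | null;
}

// Each access is guarded because the function cannot trust what it receives.
function cityLabel(user: MaybeUser | null | undefined): string {
  if (!user) {
    return "unknown";
  }
  if (!user.address || typeof user.address !== "object") {
    return "unknown";
  }
  const city = user.address.city;
  if (typeof city !== "string" || city.length === 0) {
    return "unknown";
  }
  return city.toUpperCase();
}
```

Only the last line does real work; everything above it is there to manage distrust.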
This is defensive programming. It happens when you can no longer trust your own code to provide the data you expect at the appropriate times. Your beautiful code becomes cluttered with defensive checks, you lose readability, and the code becomes more brittle and harder to change. Fear grows, and it is harder and harder to trust that your code actually works.
Optional types: Pretend really hard
One way to stave off the fear is to introduce optional types via TypeScript or Flow. You receive a user and then proclaim joyously that it is of the User type, and henceforth shall be treated only as a User.
This is like pretending really hard. You’ve shifted your trust around. You still trust other systems to give you data in the correct shape. But within your code base, you trust the type that you’ve given to that data and that the compiler will complain if you use that data incorrectly. Instead of trusting developers to know the shape of data and use it appropriately, you’re trusting developers to write and maintain correct types, and you’re trusting the compiler to not lie about those types. More on that later.
Adding types to our example doesn’t solve the underlying problem. It improves trust within the code base by helping to ensure that data is used consistently, but it says nothing about data received from the outside world.
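A small sketch of that gap, assuming a hypothetical `User` shape and a payload that arrives as a JSON string. The type assertion is a promise made to the compiler, not a runtime check:

```typescript
interface User {
  id: number;
  email: string;
}

// `as User` tells the compiler to trust us; nothing is verified at runtime.
function parseUser(body: string): User {
  return JSON.parse(body) as User;
}

// The payload has no `email`, yet this compiles and the value is typed as User.
const pretendUser = parseUser('{"id": 1}');

// The type says `string`; the value actually present at runtime is undefined.
const emailAtRuntime: unknown = pretendUser.email;
```

Any later call such as `pretendUser.email.toLowerCase()` would type-check and then fail at runtime.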
Validation: Trust but validate
In a low-trust environment, you may need to introduce data validation at various points.
You could do this by hand, but the validation would be ad hoc, laborious, and error-prone. Or you could write JSON schema definitions and validate with ajv or the like to verify that the data matches your schema. This is less ad hoc and allows other uses like generating documentation, but is likely no less verbose or error-prone because you have to manually write out schemas like this:
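As a rough sketch, a hand-written JSON Schema for a hypothetical user payload could look like the object below. To keep the example dependency-free, a tiny checker stands in for a real validator such as ajv; it is not ajv's API, and it understands only the `required` and per-property `type` keywords used here.

```typescript
// Minimal subset of JSON Schema, typed for this example only.
interface ObjectSchema {
  type: "object";
  required: string[];
  properties: Record<string, { type: "number" | "string" | "boolean" }>;
}

// The schema itself: written by hand, field by field.
const userSchema: ObjectSchema = {
  type: "object",
  required: ["id", "email"],
  properties: {
    id: { type: "number" },
    email: { type: "string" },
  },
};

// Illustrative stand-in for a real validator; covers only the keywords above.
function matchesSchema(schema: ObjectSchema, data: unknown): boolean {
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    return false;
  }
  const record = data as Record<string, unknown>;
  const hasRequired = schema.required.every((key) => key in record);
  const typesMatch = Object.keys(schema.properties).every((key) => {
    const rule = schema.properties[key];
    return !(key in record) || (rule !== undefined && typeof record[key] === rule.type);
  });
  return hasRequired && typesMatch;
}
```

Even for two fields, the schema repeats information that a type definition would also have to state.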
Optional types + validation
Or you could introduce both types and validation. Types to stave off fear internally, and validation to be able to trust data from external sources.
To avoid writing essentially the same type definitions for both validation and optional types, you can use the TypeScript or Flow compilers directly as libraries, or use another library like runtypes (TS), runtime-types (Flow), or typescript-json-schema (TS). After jumping through a few hoops you start feeling more trust in your data. But there are deeper issues here, which I will get to later.
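Without any of those libraries, the combination can be sketched by hand with a TypeScript type guard. The `User` shape and function names are assumptions for illustration; note that the type and the check are still written separately, which is exactly the duplication the libraries above try to remove.

```typescript
interface User {
  id: number;
  email: string;
}

// Runtime validation and the static type meet in a user-defined type guard.
function isUser(value: unknown): value is User {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const record = value as Record<string, unknown>;
  return typeof record["id"] === "number" && typeof record["email"] === "string";
}

// Validate once at the boundary; past this point the compiler treats it as User.
function parseUserStrict(body: string): User {
  const parsed: unknown = JSON.parse(body);
  if (!isUser(parsed)) {
    throw new Error("Payload does not match the User shape");
  }
  return parsed;
}
```

Bad data now fails loudly at the edge of the system instead of surfacing later as a null pointer error.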
Fear and changing data
But in this style, the flow is hard to follow, and fear starts to creep in. What if our data is used elsewhere? What if it was already changed elsewhere? What values do I have in my data at this point? How can I trust that the data I have at this point is the data I want at this point and will stay that way? This is a trivial example, but the problem becomes much worse with a large code base or a highly concurrent system.
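The kind of in-place change described here might look like the following sketch, with hypothetical names throughout:

```typescript
interface Cart {
  items: string[];
}

// One cart object, referenced from several places in the program.
const sharedCart: Cart = { items: ["apple"] };

// Looks like it returns a new cart, but it changes its argument in place.
function addGift(cart: Cart): Cart {
  cart.items.push("gift");
  return cart;
}

const checkoutCart = addGift(sharedCart);

// Every holder of sharedCart now sees the gift too.
const seenElsewhere = sharedCart.items.length;
```

Nothing at the call site reveals that `sharedCart` was modified, so every reader has to open `addGift` to find out.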
You turn to optional types, but those types won’t save you. In TypeScript and Flow, both of these functions have the same type:
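A sketch of such a pair in TypeScript, assuming a hypothetical `Account` shape:

```typescript
interface Account {
  owner: string;
  balance: number;
}

// Trims stray whitespace from the owner's name, in place.
function normalizeOwner(account: Account): void {
  account.owner = account.owner.trim();
}

// Wipes the account, in place.
function burnItDown(account: Account): void {
  account.owner = "";
  account.balance = 0;
}

// Both fit the same function type; the signature records no effect at all.
type AccountEffect = (account: Account) => void;
const effects: AccountEffect[] = [normalizeOwner, burnItDown];
```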
One of these does what you want; the other burns the city down. As far as these type systems are concerned, these functions do nothing.
Convention: Pretend immutability
You favor const over var and duplicating values over mutation. You use let to indicate value references that change. You rediscover the ternary operator as a functional alternative to if statements, at least for short lines. You use functions to return new values instead of changing values. You use map, filter, reduce, and other functional constructs to create new data structures without changing the underlying data.
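A short sketch of these conventions, using made-up values:

```typescript
const prices = [5, 12, 30];

// map, filter and reduce each return a new value; `prices` is never touched.
const discounted = prices.map((price) => (price > 10 ? price - 2 : price));
const affordable = discounted.filter((price) => price < 20);
const total = affordable.reduce((sum, price) => sum + price, 0);

// `let` flags the one binding that is reassigned.
let attempts = 0;
attempts = attempts + 1;

// Nothing enforces the convention: prices.push(99) would still compile and run.
```

`const` only prevents rebinding the name; the array behind it stays mutable, so this remains an agreement between developers rather than a guarantee.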
Libraries: Pretend really hard
You can shift the trust partly from other developers to tools by adopting libraries for data transformation and immutable data structures. You might start using a library like Ramda pervasively as a functional utility belt, or adopt lenses à la partial.lenses, monocle-ts, or the like.
One fundamental idea in these types of libraries is that the underlying data is treated as though it were immutable. It’s not – even Ramda only does shallow clones – but if the convention of immutable data is strong enough, then everyone can pretend it is. You may take a slight performance hit from copying data, but you gain some level of trust in the code. This works best if the use of the library and this convention is pervasive.
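The shallow-copy caveat can be illustrated with plain object spread, used here as a stand-in for a library's shallow clone (this is not Ramda's API, and the `Profile` shape is hypothetical):

```typescript
interface Profile {
  nickname: string;
  address: { city: string };
}

const originalProfile: Profile = { nickname: "ada", address: { city: "Helsinki" } };

// Shallow copy: top-level fields are copied, nested objects are still shared.
const profileCopy: Profile = { ...originalProfile, nickname: "grace" };

// Breaking the convention: a nested write through the copy leaks into the original.
profileCopy.address.city = "Turku";

const leakedCity = originalProfile.address.city;
```

The top-level `nickname` of the original is safe, but `leakedCity` now holds the value written through the copy.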
To enforce actual immutability and avoid the performance hit for changing data, you might also introduce immutable data structures via something like Immutable.js, seamless-immutable or Mori.
Both of these approaches have limitations, but most importantly they clash hard with optional types.
Optional types give a false sense of security
To type these out in TypeScript or Flow, you sacrifice on one or more principles:
1. Sacrifice type safety, the whole reason you use types: Type them out with any types, which allow any values and essentially disable the type checker for all values in the “path” of any.
2. Sacrifice usefulness: Make the functions less general in order to provide more specific, accurate types.
3. Sacrifice other developers’ time: Make the user of the function provide the correct types, as in
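The third option can be sketched with a hypothetical path helper (`getPath` is invented for illustration and is not taken from any library). The caller picks the type parameter, and nothing checks that choice:

```typescript
// Hypothetical helper: the caller chooses T, and nothing verifies the choice.
function getPath<T>(path: string[], source: unknown): T | undefined {
  let current: unknown = source;
  for (const key of path) {
    if (typeof current !== "object" || current === null) {
      return undefined;
    }
    current = (current as Record<string, unknown>)[key];
  }
  return current as T;
}

const payload = { user: { age: 42 } };

// The caller claims the value is a string, and the compiler accepts the claim.
const claimedString = getPath<string>(["user", "age"], payload);

// What is actually there at runtime.
const actualValue: unknown = claimedString;
```

The function stays general, but its type safety is now only as good as each caller's annotation.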
Then you add libraries into the mix, with their own type definitions with mixed levels of accuracy. This transfers some trust not to the developers of libraries, but to the developers of type definitions for libraries. Many of these libraries will contain any annotations, and calling those functions will quietly render your trust in types invalid. In Flow, type-checking can also be quietly disabled when a file is missing a @flow annotation.
You can work around this trust issue by adopting type annotations pervasively, disallowing both implicit and explicit any types, setting the linter to complain when files are not type-checked, and otherwise tightening up configurations.
Ultimately the strength of your types depends on the team’s knowledge of types and belief in applying them. If the team has a high level of both, they can encode a high level of trust into the system. But this depends on the team’s attention and discipline to maintain that level of trust, and fear can creep in and destroy it in many subtle ways.
“Note: many of the functions in Ramda are still hard to properly type in Ramda, with issues mainly centered around partial application, currying, and composition, especially so in the presence of generics. And yes, those are probably why you’d be using Ramda in the first place, making these issues particularly problematic to type Ramda for TypeScript. A few links to issues at TS can be found below.”
Despite the impressive work of people like Giulio Canti, every time you choose even slightly more advanced functional programming concepts, like immutable data structures, function composition, or currying, you are essentially opting out of the type checker or going to extra lengths to make the types work. This discourages functional programming.
Types, immutability, and functional programming can all support each other, just like they do in many languages. Types can be used to enforce immutability, even when the underlying data structures are mutable or the types don’t exist at runtime. Types can help developers connect the piping correctly when using functional composition or transforming data using lenses. Functional transformations can be easier to understand and maintain when you see the types. Functional transformations can be more efficient when you know the underlying data is immutable.
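As a small sketch of types enforcing immutability over ordinary mutable objects, assuming a hypothetical `Settings` shape:

```typescript
interface Settings {
  readonly theme: string;
  readonly tags: ReadonlyArray<string>;
}

const defaults: Settings = { theme: "light", tags: ["a"] };

// Both of these would be rejected at compile time:
//   defaults.theme = "dark";
//   defaults.tags.push("b");

// Changes go through a function that returns a new value instead.
function withTheme(settings: Settings, theme: string): Settings {
  return { ...settings, theme };
}

const darkSettings = withTheme(defaults, "dark");
```

The objects are still plain JavaScript objects at runtime; the guarantee lives entirely in the type checker, and because `tags` cannot be changed through this type, sharing it between the old and new value is safe.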
Learning to code with fear
Adopting one of these languages is not going to solve all of your problems. It will introduce its own problems. But it might give you a higher level of basic trust in your code, and better tools to increase or decrease that trust as needed. In my next post, I discuss how these ideas play together in PureScript.