It is actually just a curiosity, but here's how I came up with it. I could have wondered how to write a sensible Enum instance for a type larger than Int, say (Int, Int). But actually I was wondering: why does Enum's minimal definition assume that the possible values of Int are enough to enumerate the type for which I want to write the Enum instance?
At fromEnum I do read that

Convert to an Int. It is implementation-dependent what fromEnum returns when applied to a value that is too large to fit in an Int.
which I interpret as saying that I can indeed define an Enum instance for a type that doesn't fit in an Int. I just have to accept that I'll have to return nonsense from fromEnum for some values of my type.
But how could succ and pred possibly work with such a broken type, given their default implementations use toEnum and fromEnum?
succ = toEnum . (+ 1) . fromEnum
pred = toEnum . (subtract 1) . fromEnum
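To see concretely why these defaults can't be trusted beyond Int, here is a minimal sketch (the Big newtype is hypothetical, not from the question): the default succ would round-trip through Int and mangle large values, so a correct instance has to override it.

```haskell
newtype Big = Big Integer deriving (Eq, Show)

instance Enum Big where
  -- Nonsense for values outside the Int range, as the documentation
  -- for fromEnum explicitly permits.
  fromEnum (Big n) = fromInteger n
  toEnum n         = Big (toInteger n)
  -- Overridden: the default `toEnum . (+ 1) . fromEnum` would first
  -- truncate n to an Int, losing information for large values.
  succ (Big n) = Big (n + 1)
  pred (Big n) = Big (n - 1)
```

With these overrides, succ (Big (2 ^ 70)) is Big (2 ^ 70 + 1), whereas the default definition would have truncated 2 ^ 70 to an Int first.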
Or maybe I should define succ and pred manually so that they don't make use of toEnum/fromEnum? But if that's the case, then why aren't succ and pred a minimal definition?
Yes, I have read the comment from dfeuer here, and the answer, but the question remains as to why I need to define fromEnum/toEnum.
I was using Word32, but then it took me quite a while to understand the reason for a bug in my program. Essentially, a good chunk of my logic was assuming that the value of a Word32 is not 0.

Therefore, I thought of using a more appropriate data type implementing appropriate classes. I came across Integer.Positive, but that's only BoundedBelow, not Bounded, whereas Word32 is Bounded, so I decided to write a data type myself and came up with a module like this:
{-# LANGUAGE GeneralizedNewtypeDeriving #-} -- needed to derive Enum for a newtype
module Positive32 (Positive32, positive32, getNum) where

import Data.Word (Word32)

newtype Positive32 = Positive32 { getNum :: Word32 } deriving (Enum, Eq, Ord, Show)

instance Bounded Positive32 where
  minBound = Positive32 1
  maxBound = Positive32 (maxBound :: Word32)

positive32 :: Word32 -> Positive32
positive32 0 = error "Attempt to create `Positive32` from `0`"
positive32 i = Positive32 i
so that
λ> minBound :: Positive32
Positive32 {getNum = 1}
λ> positive32 2
Positive32 {getNum = 2}
λ> positive32 0
Positive32 {getNum = *** Exception: Attempt to create `Positive32` from `0`
CallStack (from HasCallStack):
error, called at ......
However, I soon realized that, even though I have not exported Positive32's constructor, I can still create Positive32 0 via its derived Enum instance:
λ> pred $ minBound :: Positive32
Positive32 {getNum = 0}
Hence I tried to write the Enum instance myself, thinking it would have a minimal definition of succ :: a -> a and pred :: a -> a, but in reality it requires toEnum :: Int -> a and fromEnum :: a -> Int.
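For illustration, here is one way such a hand-written instance could look. This is a sketch of my own (not code from any library), and it overrides succ and pred directly so the defaults never route through toEnum/fromEnum at the boundaries:

```haskell
import Data.Word (Word32)

newtype Positive32 = Positive32 { getNum :: Word32 } deriving (Eq, Ord, Show)

instance Enum Positive32 where
  fromEnum (Positive32 w) = fromIntegral w
  toEnum n
    | n >= 1 && toInteger n <= toInteger (maxBound :: Word32) =
        Positive32 (fromIntegral n)
    | otherwise = error "toEnum: out of range for Positive32"
  -- Overridden so the invariant (never 0) is preserved at the edges:
  succ (Positive32 w)
    | w == maxBound = error "succ: maxBound of Positive32"
    | otherwise     = Positive32 (w + 1)
  pred (Positive32 1) = error "pred: minBound of Positive32"
  pred (Positive32 w) = Positive32 (w - 1)
```

With this, pred (Positive32 1) raises an error instead of fabricating Positive32 0, though (as discussed below) the other range methods would still need attention.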
Now, in my specific use case, that poses no problem, because Word32 fits in Int, so I can't possibly overflow Int with Word32, but I'm still left puzzled with the question of why I can't have an Enum for types that have more possible values than Int.
I will wax a little philosophical here, as you're asking a question about the motivations of people writing a spec about three decades ago, which, from what I know of human memory, may not even be accurately known to the people who were involved themselves.
But: I don't think, at the time Haskell was being defined, there was a really good picture of whether it would catch on at all or merely be a flash-in-the-pan academic curiosity. There are a number of things in the design that are, therefore, pragmatic decisions made for simplicity of the compiler writers and/or the folks writing essentially toy applications. Certainly nothing was being done at the scale of modern, industrial software engineering. I won't talk about all the decisions that I think differ between those two regimes, but certainly typeclass philosophy is a bit different.
At small scales, large type classes (i.e. classes that encompass multiple concepts and have many methods) are very convenient, in the sense of low boilerplate. Type signatures can be more compact; instance implementations have lighter overhead; more library functions in terms of those classes can be provided; functionality can be more easily shared between method implementations; and programmers don't need to keep track of as many concepts.
Of course, we now know there are also costs to this. When you have a wide variety of data types floating around in your application or in publicly available libraries, you find that the larger your typeclass is, the fewer types can instantiate it. At small scales, this matters less, because you can simply export the functionality your type does support and leave it at that. But at large scales, modularity demands that some behaviors be typeclass-polymorphic, and so types that cannot implement those classes lose out on a lot of already-written functionality.
We've seen this play out repeatedly over the history of Haskell. The introduction of Applicative as a stepping stone between Functor and Monad is one example; splitting Monoid out to have a Semigroup superclass is another; losing the Eq superclass on Num also goes on the list. More generally, I think the closer to the core libraries you are, the more important it is in today's designs that typeclasses be split into as many orthogonal pieces as you can. (Of course, you have to know what orthogonal pieces are useful! It's not at all clear that Applicative could reasonably have been identified as a useful building block back when making IO a monad was a fresh, exciting insight.)
And so we come to Enum. In practice, Enum serves a variety of roles, including as a de-facto part of the FFI, where toEnum and fromEnum are often used for marshalling to C; iterating over numeric ranges; listing the inhabitants of your favorite type; serving as the target of multiple forms of syntactic sugar; and likely more besides. With a modern design, these concerns would likely each be split out into its own class with its own concerns and requirements. But at a small scale, most types that support one of those support essentially all of them, and so it's convenient to lump them together and share implementation work and programmer mindspace.
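As a quick illustration of the FFI-marshalling role (a sketch with a made-up Level type, not a real binding): the derived fromEnum yields the constructor's tag, which lines up with a C enum that uses default numbering.

```haskell
import Foreign.C.Types (CInt)

-- Hypothetical enumeration mirroring `enum level { DEBUG, INFO, WARN, ERR };`
data Level = Debug | Info | Warn | Err
  deriving (Eq, Show, Enum)

levelToC :: Level -> CInt
levelToC = fromIntegral . fromEnum   -- Debug -> 0, Info -> 1, ...

levelFromC :: CInt -> Level
levelFromC = toEnum . fromIntegral   -- partial: errors on out-of-range tags
```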
And if you think Enum is bad, wait until you start thinking carefully about the numeric hierarchy...
A set of methods being listed as a minimal definition means that the other methods all have default definitions in terms of the minimal set. But the other methods are still part of the typeclass, not separate functions; you certainly can provide your own definitions. Indeed, it's reasonably common to see a redundant method provided as a typeclass method specifically on the grounds that a (non-minimal) instance may want to override it. For example, mconcat is documented as:
Fold a list using the monoid.

For most types, the default definition for mconcat will be used, but the function is included in the class definition so that an optimized version can be provided for specific types.
But another thing that can happen is that the default definitions are based on assumptions about the behaviour of the minimal set. If you implement the minimal methods in a way that violates those assumptions1, then the default definitions of the other methods will not work and you'll have to actually implement them yourself. A documented "minimal definition" for a class is not claiming that you will never need to implement the other methods.
(Ideally these assumptions would be clearly spelled out in the class's documentation; this is just one more way that Enum is not an ideal class.)
succ and pred can't be a minimal definition for Enum, because you can't implement toEnum and fromEnum in terms of just succ and pred (okay, you could try if you also had minBound and maxBound, but it would be awfully inefficient to figure out that fromEnum $ Positive32 4000000000 must be 4000000000 by counting how many times you had to apply succ to get there from minBound... and it would probably produce 3999999999, since your minBound is supposed to be Positive32 1). Whereas you can provide default definitions of succ and pred in terms of fromEnum and toEnum, if you make some assumptions.
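To make the inefficiency concrete, here is a sketch (my own hypothetical helper, not part of any library) of recovering fromEnum from succ alone by counting steps up from minBound. It is linear in the value's position, which is hopeless for something like Positive32 4000000000:

```haskell
-- Counts how many applications of the given successor function it takes
-- to reach x from minBound. Note this numbers minBound as 0, which is
-- exactly the off-by-one problem for a type whose minBound maps to 1.
fromEnumViaSucc :: (Bounded a, Eq a) => (a -> a) -> a -> Int
fromEnumViaSucc next x = go 0 minBound
  where
    go n y
      | y == x    = n
      | otherwise = go (n + 1) (next y)
```

For Char, whose minBound happens to map to 0, it agrees with fromEnum: fromEnumViaSucc succ 'a' is 97.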
1 You normally should not violate any typeclass laws in your instance, and for some classes what the default definitions assume about the minimal set is only that the typeclass laws will hold. In that case any lawful minimal instance will be fine. In other cases the default definitions assume more than the typeclass laws, so there are valid instances where the minimal set is not correct without also replacing the defaults for the other methods. Enum is a fairly lawless class (its docs don't specify any requirements at all unless the type is also Bounded, and even then they don't require a whole lot) and the default definitions make some quite strict assumptions, so it falls into the latter category.
In this case the assumption that Enum's default definitions rely on is essentially that if you convert to Int with fromEnum, do the equivalent operations using the Enum Int instance, and convert the results back with toEnum, you'll get the same thing you would have got if you implemented them more directly. It's not just succ and pred; for example, this is the default definition of enumFromThenTo (which is used for the [x1, x2 .. xn] syntax):
enumFromThenTo x1 x2 y = map toEnum [fromEnum x1, fromEnum x2 .. fromEnum y]
This assumption can seem convenient when it holds. It means that in [x1, x2 .. xn], the delta between x1 and x2 can be calculated using Int subtraction, which can then be repeatedly added to x1 with Int addition and compared to xn using Int ordering. Without those handy Int operations (if enumFromThenTo could only be implemented in terms of succ and pred), any delta range would essentially have to step through the entire type one step at a time instead of in jumps of the delta, which would be much less efficient.
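On Int itself the desugaring and the default trivially agree, and the delta is computed just once:

```haskell
-- [0, 5 .. 17] desugars to enumFromThenTo 0 5 17: the delta 5 - 0 is
-- computed once with Int subtraction, then added repeatedly until the
-- next element would pass the stop value 17.
demo :: [Int]
demo = [0, 5 .. 17]
```

which yields [0,5,10,15].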
But this assumption is really hard to strictly meet. Even something as simple as your Positive32 being a newtype where Positive32 0 is intended to be invalid breaks the default method assumptions. It even makes the defaults violate one of the few laws Enum has; pred minBound is required to be a runtime error, so you must override the default pred with one that checks that it isn't doing pred (Positive32 1). But then you'll also find that [Positive32 5, Positive32 4 ..] won't correctly stop at Positive32 1 (because it uses enumFromThenTo @Int, which has no idea about the minBound of your custom type). So you have to override that method too, and so on. (It also won't respect your maxBound on platforms where Int is larger than 32 bits, and when Int is only 32 bits it won't be able to properly enumerate ranges above the maxBound @Int.)
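A related failure is visible even with the question's newtype-derived instance (reproduced here as a sketch): a downward range sails right past Positive32 1 and produces the supposedly impossible Positive32 0, because the delegated Word32 enumeration only stops at Word32's own minBound.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import Data.Word (Word32)

newtype Positive32 = Positive32 { getNum :: Word32 }
  deriving (Enum, Eq, Ord, Show)

-- Ends at Positive32 0, not Positive32 1: the derived instance knows
-- nothing about our intended lower bound.
demo :: [Positive32]
demo = [Positive32 5, Positive32 4 ..]
```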
In fact these defaults are pretty useless. They look like they're intended to simplify derived instances for enumeration types (whose constructors have no fields), since the compiler assigns them a nice contiguous range of Int where succ and pred can just be +1 and subtract 1. But of course these types are almost always Bounded and (much) smaller than Int, so the default enumFrom and similar wouldn't even work for them! And indeed, if you dump the derived instances for something simple like:
data Foo = A | B | C | D | E | F
  deriving (Eq, Show, Bounded, Enum)
You'll see that it doesn't make use of all the default method implementations. It generates constants like:
$maxtag_Foo_a1Fu :: GHC.Types.Int
$maxtag_Foo_a1Fu = GHC.Types.I# 5#
so that it can make use of them in generated non-default enumFrom and enumFromThen. It does use the default enumFromTo and enumFromThenTo, since the user-supplied stop value must be something that's in range, so there's no danger of stepping outside the valid range provided the Int values corresponding to the type are contiguous. If you use non-contiguous or out-of-order Ints you can get very bogus results, for example:
data Foo = A | B
  deriving (Eq, Show)

instance Enum Foo where
  fromEnum x = case x of
    A -> 1
    B -> 10
  toEnum x = case x of
    1 -> A
    10 -> B
    n -> error $ show n <> " is out of range"

main :: IO ()
main = do
  -- prints 10, without error
  print $ length [A .. B]
Essentially, just about the only type for which you could actually write a minimal instance defining only fromEnum and toEnum is... Int. Anything larger or in a different order will have range syntax that sometimes errors out or produces incorrect results. For anything smaller and Bounded, the defaults will break the laws. (Technically, if your type doesn't implement Bounded despite being smaller than Int, then it's "okay" that [x..] eventually produces an error when it steps out of range, since it's only required to stop cleanly if the type is Bounded... and you could even argue that all the dodgy behaviour is allowed if your type is not Bounded, since there are no laws at all for Enum then, but ugh.) Even Enum Word can't use the minimal instance, despite Word being exactly the same size as Int (it's isomorphic to Int, but the isomorphism isn't fromEnum/toEnum), because if it enumerated via Int it would have problems with the upper half of Word that is past maxBound @Int.
If your type is smaller than Int but maps to a contiguous range, then you can at least use some of the defaults, but you still need to implement most of the methods. If your type's mapping to Int is out of order or has holes, then you're completely out of luck.
But there is nothing stopping you from implementing more than just fromEnum and toEnum to make your type support range syntax properly. As we've seen, the compiler even needs to do this itself just to properly generate instances for simple enumeration types. If your type is larger than Int, then fromEnum will necessarily be partial (or fail to round-trip with toEnum); but if your type is smaller than Int, then toEnum is necessarily partial, so if that's acceptable then a partial fromEnum that you simply never use should be just as acceptable. Either way, the few typeclass laws Enum has don't forbid this situation, so you certainly can have an Enum instance for a type larger than Int.
To be opinionated: Enum is simply not a class designed to the standards we would use today (for the reasons that Daniel Wagner's answer discusses; it wasn't necessarily a bad choice in the historical context it was actually made in, but the result is bad today). I tend to think it is best approached purely as a weird form you need to fill out in order to get range syntax working, rather than as actually representing the abstract concept of enumerability.2
With correctly defined Enum instances, all the range syntax works well and predictably (well, apart from some dubious behaviour with floating-point numbers). Any behaviour that isn't observable through the range syntax I usually treat as a meaningless implementation detail that is accidentally exposed, which I try to avoid depending on. Thus I don't care whether fromEnum is partial because Int isn't big enough; converting to Int is just an implementation detail of not-insanely-inefficient range syntax. If converting to Int doesn't actually help you do that, just write the least-bad implementation of toEnum and fromEnum you can think of, and implement the other methods in terms of something else. The usual way to achieve "correctly defined" Enum instances is deriving Enum where that is applicable; if it isn't, you will almost always have to override more than just the claimed minimal methods, unfortunately.
2 If we were designing from scratch a class to support the range syntax today, I think we probably wouldn't expose toEnum and fromEnum at all; we'd instead try to specify the minimal requirements for computing and applying deltas, probably requiring an associated type family (or multi-parameter class) allowing the type used for the delta to be different in each instance. Such a class simply couldn't have been written when Enum was designed, as the compiler features did not exist.
And I think we would use a separate class for the bare minimum of enumerability, which is succ and pred (or even succ on its own), and find a way of avoiding them being partial. I can imagine types where you can always generate the next element but can't efficiently jump through a range in fixed increments, so perhaps if you just provide succ then [x..] would work, or even [x..y] if you also have Eq, but not [x, y .. z].
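A rough sketch of what that bare-minimum class might look like (all names invented here; Maybe is used to avoid the partiality of succ):

```haskell
-- A hypothetical single-method class: "give me the next element, if
-- there is one". Total, unlike Enum's succ.
class Step a where
  step :: a -> Maybe a

-- The analogue of [x ..] needs only Step:
stepFrom :: Step a => a -> [a]
stepFrom x = x : maybe [] stepFrom (step x)

-- The analogue of [x .. y] additionally needs Eq. It walks one element
-- at a time, and never terminates early if y is not reachable from x.
stepFromTo :: (Step a, Eq a) => a -> a -> [a]
stepFromTo x y
  | x == y    = [x]
  | otherwise = x : maybe [] (`stepFromTo` y) (step x)

instance Step Int where
  step n
    | n == maxBound = Nothing
    | otherwise     = Just (n + 1)
```

There is no way to support [x, y .. z] from this interface without stepping through every element, which is precisely why a delta-based class would be a separate concern.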
I haven't really thought it through very hard, but I doubt it would end up looking very much like Enum.