Coding Standards
This file contains agreed-upon coding standards and best practices, as well as proposals for changes or new standards. Also, the concept and topics are shamelessly stolen from the Adrestia Wiki (thanks guys!).
Proposals are prefixed with [PROPOSAL] until discussed and voted on by the Hydra team. To be accepted, a practice should be voted with majority + 1, with neutral votes counting as positive votes.
Each proposal should start with a section justifying the standard with rational arguments. When it makes sense, we should also provide good and bad examples to make the point clearer.
Use a Red Bin to quickly identify, analyse, track and resolve defects in the process.
- When a defect is spotted, place a sticky in the Red Bin area of our project backlog. A "defect" is anything that gets in our way: inconsistencies in formatting, long build times, flaky tests, unclear names, poorly understood code or tooling...
- If there's already a similar issue, add a 👍 emoji to the sticky
- Work on the defect as soon as possible
- Optional: link to an entry in the Logbook explaining the issue and, if possible, analysing it: why it matters and what possible solutions there are
Why
Identifying and resolving defects quickly and relentlessly is a key practice to improve the quality of both the process and the product.
Here are seven rules for great git commit messages:
- Separate subject from body with a blank line
- Limit the subject line to 50 characters (soft limit)
- Capitalize the subject line
- Do not end the subject line with a period
- Use the imperative mood in the subject line and suffix with ticket number if applicable
- Wrap the body at 72 characters (hard limit)
- Use the body to explain what and why vs. how
Why
Git commit messages are our only source of why something was changed the way it was changed. So we had better make them readable, concise and detailed (when required).
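For illustration, a hypothetical commit message following these rules (the ticket number and content are made up):

Add reconnection handling to network client [#123]

Clients previously gave up after the first failed connection attempt,
which made long-running sessions flaky. Retry with a bounded backoff
instead and log every attempt so failures stay visible.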
When closing branches / PRs use merge commits, so we have a history of PRs also in the git history.
Do not merge master into side branches; instead, rebase them on top of master. Try to keep branches up-to-date with master (not a strict requirement though).
Why?
Keeping branches ahead of master not only makes the git history a lot nicer to process, it also makes conflict resolution easier. Merging master into a branch repeatedly is a good recipe for introducing invalid conflict resolutions and losing track of the actual changes brought by the branch.
Tip: Use GitHub's merge button in PRs to merge with a merge commit. If a branch is outdated, use the rebase button in PRs to rebase feature branches (NOT update via merge).
Merge PR button
Update/rebase branch button
Examples
GOOD
BAD (unclear it was a branch)
BAD (messier than necessary history because of merge into feature branch)
In general, we will use a code formatting tool and enforce its usage via CI or commit hooks. Refer to the appropriate configuration file in our repository. Any change to which tool and configuration is used shall be discussed and voted upon in the corresponding pull request. Things which are not covered by our tool of choice are also mentioned and voted on here. Right now, we are using fourmolu.
Why
- Focus on content and logic when writing and reviewing code
- Avoid discussions (and bikeshedding) about code format (e.g. indentation) on pull requests
ℹ️ Should be enforced by code formatting tool (e.g. Fourmolu).
A source code file should not contain multiple consecutive blank lines.
Use only a single blank line between the following top-level definitions:
- function definitions
- data type definitions
- class definitions
- instance definitions
Why
- Consistency with other Haskell code.
- Excessive vertical space increases the amount of unnecessary scrolling required to read a module.
Examples
-- BAD
newtype Foo = Foo Integer
  deriving (Eq, Show)



newtype Bar = Bar Integer
  deriving (Eq, Show)

-- GOOD
newtype Foo = Foo Integer
  deriving (Eq, Show)

newtype Bar = Bar Integer
  deriving (Eq, Show)
-- BAD
instance FromCBOR Block where
  fromCBOR = Block <$> decodeBlock



newtype BlockHeader = BlockHeader
  { getBlockHeader :: Primitive.BlockHeader
  } deriving Eq

-- GOOD
instance FromCBOR Block where
  fromCBOR = Block <$> decodeBlock

newtype BlockHeader = BlockHeader
  { getBlockHeader :: Primitive.BlockHeader
  } deriving Eq
- We do use short and concise names for fields in record data types.
- We use DuplicateRecordFields when needed to avoid conflicts.
- We use type annotations, getField or generic-lens to disambiguate record field access (see the sketch after the examples below).
Why
- Use the most suitable name for a record field without restrictions.
- This allows deriving instances for pretty-printing, serialization etc. with a minimal amount of boilerplate.
- Avoid redundancy and the errors it causes
Examples
-- GOOD
data Foo = Foo
  { name :: Integer
  , value :: Value
  }

-- BAD
data Foo = Foo
  { fooName :: Integer
  , fooValue :: Value
  }

-- BAD
data FooBarBaz = Foo
  { _fbbName :: Integer
  , _fbbValue :: Value
  }
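As a sketch of the disambiguation techniques mentioned above (the types and functions here are made up, not taken from the codebase):

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE DuplicateRecordFields #-}
{-# LANGUAGE NamedFieldPuns #-}
{-# LANGUAGE TypeApplications #-}

import GHC.Records (getField)

data Foo = Foo {name :: String, value :: Integer}

data Bar = Bar {name :: String}

-- Disambiguate the duplicated 'name' field with 'getField' and a type application
fooName :: Foo -> String
fooName = getField @"name"

-- Or pattern match with record puns instead of using the accessor function
barName :: Bar -> String
barName Bar{name} = name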
Do not pass Bool as arguments to functions if it can be avoided. Alternatives are (also see examples):
- Make it two functions
- Create a newtype around Bool (we do want to avoid that as well though)
- Create a dedicated data type
Why
- Side-steps the type system and misses the opportunity to catch errors due to mixing multiple Bool arguments
- Helps in documenting what's going on (we can still add documentation of course)
Examples
-- WORST
client :: Bool -- ^ Adding documentation is not enough
  -> IO ()
client pipelined =
  -- Naming ^ the argument is not enough
  pure ()
-- BAD
type IsPipelined = Bool
client :: IsPipelined -> IO ()
client pipelined =
  -- Naming ^ the argument is not enough
  pure ()
-- BAD
newtype IsPipelined = IsPipelined Bool
client :: IsPipelined -> IO ()
-- GOOD
clientSimple :: IO ()
clientPipelined :: IO ()
-- GOOD
data IsPipelined = Simple | Pipelined
client :: IsPipelined -> IO ()
For user-defined kinds, we want to use ticked constructors in type signatures.
Why
- Clearly identifies that a data type is being promoted as a kind.
Examples
data FireForget msg where
  StIdle :: FireForget msg
  StDone :: FireForget msg
-- BAD
fireForgetClientPeer :: Peer (FireForget msg) AsClient StIdle m a
fireForgetClientPeer = ...
-- GOOD
fireForgetClientPeer :: Peer (FireForget msg) 'AsClient 'StIdle m a
fireForgetClientPeer = ...
Use camel-case for all identifiers, whether functions or types, including acronyms.
Why
- Not-quite-camel-case identifiers introduce irregularity in names which might hamper interoperability (eg. generating snake-case identifiers from camel-case)
Examples
-- BAD
data UTXO = ...
data UTxO = ...
-- GOOD
data Utxo = ...
Distinguish between errors which are common and errors which are exceptional; e.g. validating something is about as likely to fail as to succeed, whereas sending a message over the network is usually expected to work.
Here are some condensed rules we try to follow:
- Try to avoid errors in the first place "by design", e.g. using smaller types like Natural or NonEmpty lists (Source)
- Avoid error and its derivatives at all cost! Add a huge comment if you still need to use it.
All (pure) functions must be total, so there's no use for error or panic except as a placeholder or Test Double to remind one there are more tests to write. The only reason why one would use panic in production code is to satisfy the completeness checker of the compiler even though we know some case is impossible, but then we should think very hard about a better solution.
- Use Either e a for expected errors (or even Maybe if there is only one error case)
- Use m (Either e a) for the same reason in monadic code
- We want to avoid ExceptT e IO, and thus also in MonadIO m situations; ExceptT is bad because:
  - It's inefficient
  - It makes code more complicated
  - It does not add any form of safety, as the underlying IOs can still throw exceptions anyhow
Also, make sure e is a proper error type with relevant information, possibly even an instance Exception MyError. And have ToJSON and FromJSON instances to be able to shove the error's information into a properly formatted log trace.
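A minimal sketch of such an error type (the constructors and fields are made up):

{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE DerivingStrategies #-}

import Control.Exception (Exception)
import Data.Aeson (FromJSON, ToJSON)
import GHC.Generics (Generic)

-- An error type carrying relevant information, usable as an 'Exception'
-- and serialisable into log traces
data MyError
  = InvalidSignature {partyIndex :: Int}
  | MissingUtxo {utxoReference :: String}
  deriving stock (Eq, Show, Generic)
  deriving anyclass (Exception, ToJSON, FromJSON)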
And quoting Edward Yang:
canonicalizing errors that the libraries you are interoperating [with are exposing] is a good thing: it makes you think about what information you care about and how you want to present it to the user. You can always create a MyParsecError constructor which takes the parsec error verbatim, but for a really good user experience you should be considering each case individually.
- Exceptional situations may be reported using MonadThrow m and Exception instances
- Add documentation about what additional exceptions may be thrown (and when) by a function.
- Specify the exception type on all usages of catch, and think twice before you write catch (_ :: SomeException)
- Use bracket or ResourceT for resource management in presence of exceptions (a sketch follows the examples below).
- Make sure you do use throwIO and not throw if you are in the IO monad, since the former guarantees ordering; the latter, not necessarily.
Examples
-- BAD
myHead :: [a] -> a
-- LESS BAD
myHead :: [a] -> Maybe a
-- GOOD
myHead :: NonEmpty a -> a
-- GOOD
complexAlgorithm :: Input -> Either ValidationError Output
-- GOOD
complexAlgorithm :: MonadError ValidationError m => Input -> m Output
-- BAD
complexAlgorithm :: (MonadError ValidationError m, MonadIO m) => Input -> m Output
-- GOOD
-- Accepted because the function itself is not necessarily in IO and the caller decides
complexAlgorithm :: MonadError ValidationError m => Input -> m Output
-- BAD
myOpenHandle :: FilePath -> ExceptT MyException IO Handle
-- BAD
myOpenHandle :: MonadIO m => FilePath -> ExceptT MyException m Handle
-- GOOD
-- | Does throw 'MyException', when ... (and any other type of 'Exception' really)
myIoishCode :: (MonadThrow m, MonadIO m) => String -> m Int
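The examples above focus on signalling errors in types; here is a rough sketch of the catch and bracket guidance using plain base functions (not taken from the codebase):

{-# LANGUAGE ScopedTypeVariables #-}

import Control.Exception (IOException, bracket, catch)
import System.IO (IOMode (ReadMode), hClose, hGetLine, openFile)

-- Acquire and release the handle with 'bracket' so it is closed even if
-- 'hGetLine' throws, and catch only the specific 'IOException' type.
readFirstLine :: FilePath -> IO String
readFirstLine fp =
  bracket (openFile fp ReadMode) hClose hGetLine
    `catch` \(e :: IOException) -> pure ("could not read: " <> show e)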
- Michael Snoyman's Exception Best Practices
- Matt Parsons' post Type Safety Back and Forth
- Mark Karpov's article on exceptions
- Edward Yang's 8 Ways to report errors revisited
  - which is based on Eric Kidd's 8 ways to report errors in Haskell
- Another post by Snoyman on The Use of Void to express "impossible" states
When a type declaration includes more than one constraint, use tuple notation instead of double-arrow notation.
Parentheses are not needed when there is a single constraint.
Rationale
- Easier to read the boundary between constraints and arguments
Examples
-- BAD
processEffect ::
  MonadAsync m =>
  MonadTimer m =>
  MonadThrow m =>
  HydraNode tx m ->
  Tracer m (HydraNodeLog tx) ->
  Effect tx ->
  m ()
-- GOOD
processEffect ::
  (MonadAsync m, MonadTimer m, MonadThrow m) =>
  HydraNode tx m ->
  Tracer m (HydraNodeLog tx) ->
  Effect tx ->
  m ()
-- GOOD
processEffect ::
  ( MonadAsync m
  , MonadTimer m
  , MonadThrow m
  ) =>
  HydraNode tx m ->
  Tracer m (HydraNodeLog tx) ->
  Effect tx ->
  m ()
processEffect HydraNode{hn, oc, sendResponse, eq} tracer e = do
-- GOOD
createEventQueue :: MonadSTM m => m (EventQueue m e)
For all data types, provide generic instances of the following typeclasses:
- Eq, Show from stock deriving,
- Generic for data type declarations,
- Arbitrary using the generic-random package (see the sketch below the rationale).
Rationale
- Eq and Show are needed for test assertions and generally useful for pattern matching, guards, predicates...
- Using any kind of automatic derivation (Generic, stock, anyclass...) prevents various kinds of runtime errors like missing cases
- Having an Arbitrary instance for all base types does not prevent using specialised ones, but makes it easy to use those for generating arbitrary test data, thus improving coverage of tests
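A minimal sketch of deriving an Arbitrary instance with generic-random (the Party type is made up):

{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE DerivingStrategies #-}

import GHC.Generics (Generic)
import Generic.Random (genericArbitrary, uniform)
import Test.QuickCheck (Arbitrary (..))

data Party = Alice | Bob | Other {nickname :: String}
  deriving stock (Eq, Show, Generic)

-- Constructors are picked uniformly; fields use their own 'Arbitrary' instances
instance Arbitrary Party where
  arbitrary = genericArbitrary uniform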
For types which are serialised, whether in JSON, CBOR or other formats:
- Favor the use of Generically derived instances instead of manually written instances whenever possible.
- Provide tests for (de)serialisation using hspec-aeson-golden, quickcheck-classes or other similar tools (a roundtrip sketch follows the rationale below).
Rationale
- Serialisation code is usually a lot of boilerplate which can be avoided by using generic derivation
- Golden tests ensure changes in serialisation format are caught by tests
- Roundtrip property tests ensure consistency of serialisation and deserialisation format
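As a sketch, a generic JSON roundtrip property could look like this (plain aeson and QuickCheck; the golden-test tooling mentioned above has its own API):

import Data.Aeson (FromJSON, ToJSON, decode, encode)
import Test.QuickCheck (Property, (===))

-- Serialising and then deserialising any value should give it back unchanged
prop_roundtripJSON ::
  (ToJSON a, FromJSON a, Eq a, Show a) =>
  a ->
  Property
prop_roundtripJSON a = decode (encode a) === Just a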
We do allow the use of record fields on sum types as they make the code clearer and more often allow the use of Generic instances for e.g. JSON serialization. Hence, we do not activate the -Wpartial-fields warning for our packages.
However, this requires some discipline in using these data types: we NEVER use field accessors as functions and rather pattern match on the field names or use techniques like record puns.
Also, the DuplicateRecordFields language pragma shall be used to allow re-use of field names.
Rationale
- Field names can be self-documenting
- Generic instances get better and are more often sufficient
- We have never been bitten by the problem
Examples
-- BAD
data Effect tx
  = ClientEffect (ServerOutput tx)
  | NetworkEffect (Message tx)
  | OnChainEffect (OnChainTx tx)
  | Delay DiffTime (Event tx)
  deriving stock (Generic)
  deriving anyclass (ToJSON, FromJSON)

-- GOOD
data Effect tx
  = ClientEffect {serverOutput :: ServerOutput tx}
  | NetworkEffect {message :: Message tx}
  | OnChainEffect {onChainTx :: OnChainTx tx}
  | Delay {delay :: DiffTime, event :: Event tx}
  deriving stock (Generic)
  deriving anyclass (ToJSON, FromJSON)
When pattern-matching on the last argument of a function, use Lambda-case expressions instead of multiple equations.
Rationale
- Especially when the number of cases is large, e.g. more than a handful, repeating a function's name and arguments obscures the intention of the code.
- When not all arguments are used in all branches of the pattern, the compiler requires wildcards for unused names, which decreases readability of the code.
Examples
-- BAD
monitor metricsMap (Node (ProcessingEvent _ (NetworkEvent (ReqTx _ tx)))) = do
  ...
monitor metricsMap (Node (ProcessedEffect _ (ClientEffect (SnapshotConfirmed snapshot)))) = do
  ...
monitor metricsMap (Node (ProcessedEvent _ _)) =
  ...
monitor _ _ = pure ()

-- GOOD
monitor transactionsMap metricsMap = \case
  (Node (ProcessingEvent _ (NetworkEvent (ReqTx _ tx)))) -> do
    ...
  (Node (ProcessedEffect _ (ClientEffect (SnapshotConfirmed snapshot)))) -> do
    ...
  (Node (ProcessedEvent _ _)) ->
    ...
  _ -> ..
When importing symbols from other modules, use explicit imports, except for:
- XXX.Prelude modules which can be imported in full,
- conflicting definitions which can then be resolved using qualified imports.
Rationale
- Being explicit about where a symbol comes from removes cognitive overload about symbol resolution when working on a module
- Using HLS, it's very simple to look up and add imports whenever they are needed
Examples
-- BAD
import Data.Text
-- GOOD
import Hydra.Prelude
import Test.Hydra.Prelude
import Data.Text (pack, unpack)
Use hspec's describe and it to:
- narrow down the system under test,
- provide additional context,
- state assumptions and expectations.
Also, do not repeat yourself by restating the module name: hspec already introduces a context for the module name (minus the Spec suffix) automatically (see the sketch after the examples below).
Rationale
- Easy to read test results
- Easy to read test failures
- Good source for matching test cases or groups thereof
Examples
-- BAD test results
Hydra.API.Server
API Server
sends sendOutput to all connected clients
-- BAD test results
Hydra.ModuleA
Paraphrasing the name of ModuleA
case 1
case 2
-- BAD test failure
1) Hydra.ModuleA, Paraphrasing the name of ModuleA , case 1
uncaught exception: ErrorCall
demo
-- GOOD test results
Hydra.Behavior
Single participant Head
accepts Init command
accepts Commit after successful Init
not accepts commits when the head is open
can close an open head
does finalize head after contestation period
Two participant Head
only opens the head after all nodes committed
can abort and re-open a head when one party has not committed
cannot abort head when commits have been collected
cannot commit twice
outputs committed utxo when client requests it
in an open head
sees the head closed by other nodes
-- GOOD test failure
1) Hydra.Behavior, Two participant Head, in an open head, sees the head closed by other nodes
uncaught exception: ErrorCall
demo
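For reference, a minimal sketch of a spec module producing output in that shape (the module name and test body are made up):

module Hydra.BehaviorSpec where

import Test.Hspec (Spec, describe, it, shouldBe)

spec :: Spec
spec =
  describe "Single participant Head" $
    it "accepts Init command" $
      -- placeholder assertion standing in for the real behaviour test
      (1 + 1) `shouldBe` (2 :: Int)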
Readability is of utmost importance and there is a balance to be found between using the lexical scope and the complexity of nesting. So these are not strict rules, but rather guidelines we try to follow:
- Prefer where over let when possible
- No where in where
- No let in let
- Don't mix where and let in a function
Note that the let here refers to the let .. in syntax, not the let in a do block.
Rationale
- Top-down approach via `where` favors the reader
- Nested `where` and `let` clauses are hard to read and modify
- Scoping symbols does make sense though
- Mixing `where` and `let` makes it harder to follow as a reader
Examples
-- BAD: nested where
-- | Documented top-level function.
foo :: ...
foo arg1 arg2 =
  computeResult bar arg1
 where
  bar = someConstructor part1 part2
   where
    part1 = anotherConstructor 123
    part2 = baz arg2 "something much more complex"

-- BAD: nested let
-- | Documented top-level function.
foo :: ...
foo arg1 arg2 =
  let bar =
        let part1 = anotherConstructor 123
            part2 = baz arg2 "something much more complex"
         in someConstructor part1 part2
   in computeResult bar arg1

-- BAD: mixing where and let
-- | Documented top-level function.
foo :: ...
foo arg1 arg2 =
  computeResult bar arg1
 where
  bar =
    let part1 = anotherConstructor 123
        part2 = baz arg2 "something much more complex"
     in someConstructor part1 part2

-- BAD: bottom-up with let (acceptable when more readable though)
-- | Documented top-level function.
foo :: ...
foo arg1 arg2 =
  let part1 = anotherConstructor 123
      part2 = baz arg2 "something much more complex"
      bar = someConstructor part1 part2
   in computeResult bar arg1

-- GOOD: top-down with where
-- | Documented top-level function.
foo :: ...
foo arg1 arg2 =
  computeResult bar arg1
 where
  bar = someConstructor part1 part2
  part1 = anotherConstructor 123
  part2 = baz arg2 "something much more complex"

-- GOOD: factored out functions
-- | Documented top-level function.
foo :: ...
foo arg1 arg2 =
  computeResult bar arg1
 where
  bar = someConstructor part1 (part2 arg2)

  -- No haddock, but still documented
  part1 :: ...
  part1 = anotherConstructor 123

  -- No haddock, but still documented
  part2 :: ...
  part2 arg2 = baz arg2 "something much more complex"
QuickCheck generators for a type Thing should be named genThingQualifier :: Gen Thing, where Thing is the type of the generated value and Qualifier is an optional qualifier for this particular generator.
Note that the single, blessed genThing :: Gen Thing is expected to also be used in Arbitrary Thing instances (see the sketch after the examples below).
Rationale
- Looking for generators of `Thing` allows one to search for `genThing`, or use code completion tools
- Allows use of multiple generators
Examples
-- BAD
genAdaOnlyValue :: Gen Value
-- GOOD
genValueAdaOnly :: Gen Value
-- BAD
genSequenceOfValidTransactions :: Utxo -> Gen [CardanoTx]
-- GOOD
genCardanoTxValidSequence :: Utxo -> Gen [CardanoTx]
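A minimal sketch of re-using the blessed generator in the Arbitrary instance (the Amount type is made up):

import Test.QuickCheck (Arbitrary (..), Gen, choose)

newtype Amount = Amount Integer
  deriving (Eq, Show)

-- The single, blessed generator for 'Amount', re-used by 'Arbitrary'
genAmount :: Gen Amount
genAmount = Amount <$> choose (0, 1000000)

-- An optional qualified variant for a specific scenario
genAmountZero :: Gen Amount
genAmountZero = pure (Amount 0)

instance Arbitrary Amount where
  arbitrary = genAmount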
Label all TVar, TQueue, Async, ... data structures using the appropriate labelXXX functions (see examples below).
Rationale
Concurrency in Hydra is expressed using the type-classes provided by io-sim-classes. This allows us to benefit from the speed-up and determinism one gets by running test code in the IOSim s monad.
io-sim can also provide a wealth of information on the actual behaviour of multithreaded code and potentially help detect race conditions, deadlocks, livelocks, and other typical issues one encounters when writing concurrent code. This information is typically available through the SimTrace result from runSimTrace, which records all scheduling events.
However, this information can be hard to understand, as the threads and variables are labelled using arbitrary numbers.
When pretty-printing a SimTrace using (fmap ppEvents) . traceEvents, one can easily identify which thread does what and which variables are read/written.
Examples
-- Threads
nodeThread <- async $ labelThisThread ("node-" <> shortLabel hsk) >> runHydraNode (contramap Node tr) node
-- TVar
labelTVarIO numThreads "num-threads"
-- TQueue
labelTQueueIO q "event-queue"
Do not use orphan instances and consequently do not disable the warning (error) with -Wno-orphans.
If an orphan instance cannot easily be avoided, we must add a haddock comment to each orphan instance we create, with a rationale for why it is needed and can't be avoided.
Why
This blog post by a former colleague has a good explanation of the problems of orphan instances, but also a balanced rationale on when orphan instances are even acceptable (feel free to use this when arguing for it in haddock), plus a few links on the general wisdom of avoiding orphan instances.
Note that as we use Arbitrary liberally in our public libraries, we are not okay with orphan instances for those as they can be easily avoided by just keeping them next to the data types they are instantiated on.