never executed always true always false
    1 {-# LANGUAGE FlexibleInstances #-}
    2 
    3 -- | Units are library components from Cabal packages compiled and installed in
    4 -- a database
    5 module GHC.Unit
    6    ( module GHC.Unit.Types
    7    , module GHC.Unit.Info
    8    , module GHC.Unit.Parser
    9    , module GHC.Unit.State
   10    , module GHC.Unit.Module
   11    , module GHC.Unit.Home
   12    )
   13 where
   14 
   15 import GHC.Unit.Types
   16 import GHC.Unit.Info
   17 import GHC.Unit.Parser
   18 import GHC.Unit.Module
   19 import GHC.Unit.Home
   20 import GHC.Unit.State
   21 
   22 {-
   23 
   24 Note [About Units]
   25 ~~~~~~~~~~~~~~~~~~
   26 
   27 Haskell users are used to manipulate Cabal packages. These packages are
   28 identified by:
   29    - a package name :: String
   30    - a package version :: Version
   31    - (a revision number, when they are registered on Hackage)
   32 
   33 Cabal packages may contain several components (libraries, programs,
   34 testsuites). In GHC we are mostly interested in libraries because those are
   35 the components that can be depended upon by other components. Components in a
   36 package are identified by their component name. Historically only one library
   37 component was allowed per package, hence it didn't need a name. For this
   38 reason, component name may be empty for one library component in each
   39 package:
   40    - a component name :: Maybe String
   41 
   42 UnitId
   43 ------
   44 
   45 Cabal libraries can be compiled in various ways (different compiler options
   46 or Cabal flags, different dependencies, etc.), hence using package name,
   47 package version and component name isn't enough to identify a built library.
   48 We use another identifier called UnitId:
   49 
   50   package name             \
   51   package version          |                       ________
   52   component name           | hash of all this ==> | UnitId |
   53   Cabal flags              |                       --------
   54   compiler options         |
   55   dependencies' UnitId     /
   56 
   57 Fortunately GHC doesn't have to generate these UnitId: they are provided by
   58 external build tools (e.g. Cabal) with `-this-unit-id` command-line parameter.
   59 
   60 UnitIds are important because they are used to generate internal names
   61 (symbols, etc.).
   62 
   63 Wired-in units
   64 --------------
   65 
   66 Certain libraries (ghc-prim, base, etc.) are known to the compiler and to the
   67 RTS as they provide some basic primitives.  Hence UnitIds of wired-in libraries
   68 are fixed. Instead of letting Cabal chose the UnitId for these libraries, their
   69 .cabal file uses the following stanza to force it to a specific value:
   70 
   71    ghc-options: -this-unit-id ghc-prim    -- taken from ghc-prim.cabal
   72 
   73 The RTS also uses entities of wired-in units by directly referring to symbols
   74 such as "base_GHCziIOziException_heapOverflow_closure" where the prefix is
   75 the UnitId of "base" unit.
   76 
   77 Unit databases
   78 --------------
   79 
   80 Units are stored in databases in order to be reused by other codes:
   81 
   82    UnitKey ---> UnitInfo { exposed modules, package name, package version
   83                            component name, various file paths,
   84                            dependencies :: [UnitKey], etc. }
   85 
   86 Because of the wired-in units described above, we can't exactly use UnitIds
   87 as UnitKeys in the database: if we did this, we could only have a single unit
   88 (compiled library) in the database for each wired-in library. As we want to
   89 support databases containing several different units for the same wired-in
   90 library, we do this:
   91 
   92    * for non wired-in units:
   93       * UnitId = UnitKey = Identifier (hash) computed by Cabal
   94 
   95    * for wired-in units:
   96       * UnitKey = Identifier computed by Cabal (just like for non wired-in units)
   97       * UnitId  = unit-id specified with -this-unit-id command-line flag
   98 
   99 We can expose several units to GHC via the `package-id <unit-key>` command-line
  100 parameter. We must use the UnitKeys of the units so that GHC can find them in
  101 the database.
  102 
  103 During unit loading, GHC replaces UnitKeys with UnitIds. It identifies wired
  104 units by their package name (stored in their UnitInfo) and uses wired-in UnitIds
  105 for them.
  106 
  107 For example, knowing that "base", "ghc-prim" and "rts" are wired-in units, the
  108 following dependency graph expressed with database UnitKeys will be transformed
  109 into a similar graph expressed with UnitIds:
  110 
  111    UnitKeys
  112    ~~~~~~~~                      ----------> rts-1.0-hashABC <--
  113                                  |                             |
  114                                  |                             |
  115    foo-2.0-hash123 --> base-4.1-hashXYZ ---> ghc-prim-0.5.3-hashUVW
  116 
  117    UnitIds
  118    ~~~~~~~               ---------------> rts <--
  119                          |                      |
  120                          |                      |
  121    foo-2.0-hash123 --> base ---------------> ghc-prim
  122 
  123 
  124 Note that "foo-2.0-hash123" isn't wired-in so its UnitId is the same as its UnitKey.
  125 
  126 
  127 Module signatures / indefinite units / instantiated units
  128 ---------------------------------------------------------
  129 
  130 GHC distinguishes two kinds of units:
  131 
  132    * definite units:
  133       * units without module holes and with definite dependencies
  134       * can be compiled into machine code (.o/.a/.so/.dll/...)
  135 
  136    * indefinite units:
  137       * units with some module holes or with some indefinite dependencies
  138       * can only be type-checked
  139 
  140 Module holes are constrained by module signatures (.hsig files). Module
  141 signatures are a kind of interface (similar to .hs-boot files). They are used in
  142 place of some real code. GHC allows modules from other units to be used to fill
  143 these module holes: the process is called "unit/module instantiation". The
  144 instantiating module may either be a concrete module or a module signature. In
  145 the latter case, the signatures are merged to form a new one.
  146 
  147 You can think of this as polymorphism at the module level: module signatures
  148 give constraints on the "type" of module that can be used to fill the hole
  149 (where "type" means types of the exported module entitites, etc.).
  150 
  151 Module signatures contain enough information (datatypes, abstract types, type
  152 synonyms, classes, etc.) to typecheck modules depending on them but not
  153 enough to compile them. As such, indefinite units found in databases only
  154 provide module interfaces (the .hi ones this time), not object code.
  155 
  156 Unit instantiation / on-the-fly instantiation
  157 ---------------------------------------------
  158 
  159 Indefinite units can be instantiated with modules from other units. The
  160 instantiating units can also be instantiated themselves (if there are
  161 indefinite) and so on.
  162 
  163 On-the-fly unit instantiation is a tricky optimization explained in
  164 http://blog.ezyang.com/2016/08/optimizing-incremental-compilation
  165 Here is a summary:
  166 
  167    1. Indefinite units can only be type-checked, not compiled into real code.
  168    Type-checking produces interface files (.hi) which are incomplete for code
  169    generation (they lack unfoldings, etc.) but enough to perform type-checking
  170    of units depending on them.
  171 
  172    2. Type-checking an instantiated unit is cheap as we only have to merge
  173    interface files (.hi) of the instantiated unit and of the instantiating
  174    units, hence it can be done on-the-fly. Interface files of the dependencies
  175    can be concrete or produced on-the-fly recursively.
  176 
  177    3. When we compile a unit, we mustn't use interfaces produced by the
  178    type-checker (on-the-fly or not) for the instantiated unit dependencies
  179    because they lack some information.
  180 
  181    4. When we type-check an indefinite unit, we must be consistent about the
  182    interfaces we use for each dependency: only those produced by the
  183    type-checker (on-the-fly or not) or only those produced after a full
  184    compilation, but not both at the same time.
  185 
  186    It can be tricky if we have the following kind of dependency graph:
  187 
  188       X (indefinite) ------> D (definite, compiled) -----> I (instantiated, definite, compiled)
  189       |----------------------------------------------------^
  190 
  191    Suppose we want to type-check unit X which depends on unit I and D:
  192       * I is definite and compiled: we have compiled .hi files for its modules on disk
  193       * I is instantiated: it is cheap to produce type-checker .hi files for its modules on-the-fly
  194 
  195    But we must not do:
  196 
  197       X (indefinite) ------> D (definite, compiled) -----> I (instantiated, definite, compiled)
  198       |--------------------------------------------------> I (instantiated on-the-fly)
  199 
  200       ==> inconsistent module interfaces for I
  201 
  202    Nor:
  203 
  204       X (indefinite) ------> D (definite, compiled) -------v
  205       |--------------------------------------------------> I (instantiated on-the-fly)
  206 
  207       ==> D's interfaces may refer to things that only exist in I's *compiled* interfaces
  208 
  209    An alternative would be to store both type-checked and compiled interfaces
  210    for every compiled non-instantiated unit (instantiated unit can be done
  211    on-the-fly) so that we could use type-checked interfaces of D in the
  212    example above. But it would increase compilation time and unit size.
  213 
  214 
  215 The 'Unit' datatype represents a unit which may have been instantiated
  216 on-the-fly:
  217 
  218    data Unit = RealUnit DefUnitId         -- use compiled interfaces on disk
  219              | VirtUnit InstantiatedUnit  -- use on-the-fly instantiation
  220 
  221 'InstantiatedUnit' has two interesting fields:
  222 
  223    * instUnitInstanceOf :: UnitId
  224       -- ^ the indefinite unit that is instantiated
  225 
  226    * instUnitInsts :: [(ModuleName,(Unit,ModuleName)]
  227       -- ^ a list of instantiations, where an instantiation is:
  228            (module hole name, (instantiating unit, instantiating module name))
  229 
  230 A 'VirtUnit' may be indefinite or definite, it depends on whether some holes
  231 remain in the instantiated unit OR in the instantiating units (recursively).
  232 Having a fully instantiated (i.e. definite) virtual unit can lead to some issues
  233 if there is a matching compiled unit in the preload closure.  See Note [VirtUnit
  234 to RealUnit improvement]
  235 
  236 Unit database and indefinite units
  237 ----------------------------------
  238 
  239 We don't store partially instantiated units in the unit database.  Units in the
  240 database are either:
  241 
  242    * definite (fully instantiated or without holes): in this case we have
  243      *compiled* module interfaces (.hi) and object codes (.o/.a/.so/.dll/...).
  244 
  245    * fully indefinite (not instantiated at all): in this case we only have
  246      *type-checked* module interfaces (.hi).
  247 
  248 Note that indefinite units are stored as an instantiation of themselves where
  249 each instantiating module is a module variable (see Note [Representation of
  250 module/name variables]). E.g.
  251 
  252    "xyz" (UnitKey) ---> UnitInfo { instanceOf       = "xyz"
  253                                  , instantiatedWith = [A=<A>,B=<B>...]
  254                                  , ...
  255                                  }
  256 
  257 Note that non-instantiated units are also stored as an instantiation of
  258 themselves.  It is a reminiscence of previous terminology (when "instanceOf" was
  259 "componentId"). E.g.
  260 
  261    "xyz" (UnitKey) ---> UnitInfo { instanceOf       = "xyz"
  262                                  , instantiatedWith = []
  263                                  , ...
  264                                  }
  265 
  266 TODO: We should probably have `instanceOf :: Maybe UnitId` instead.
  267 
  268 
  269 Note [Pretty-printing UnitId]
  270 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  271 
  272 When we pretty-print a UnitId for the user, we try to map it back to its origin
  273 package name, version and component to print "package-version:component" instead
  274 of some hash. How to retrieve these information from a UnitId?
  275 
  276 Solution 0: ask for a UnitState to be passed each time we want to pretty-print a
  277 SDoc so that the Outputable instance for UnitId could retrieve the information
  278 from it. That what we used in the past: a DynFlags was passed and the UnitState
  279 was retrieved from it. This is wrong for several reasons:
  280 
  281     1. The UnitState is accessed when the message is printed, not when it is
  282        generated. So we could imagine that the UnitState could have changed
  283        in-between. Especially if we want to allow unit unloading.
  284 
  285     2. We want GHC to support several independent sessions at once, hence
  286        several UnitState. This approach supposes there is a unique UnitState
  287        (the one given at printing-time), moreover a UnitId doesn't indicate
  288        which UnitState it comes from (think about statically defined UnitId for
  289        wired-in units).
  290 
  291 Solution 1: an obvious approach would be to store the required information in
  292 the UnitId itself. However it doesn't work because some UnitId are defined
  293 statically for wired-in units and the same UnitId can map to different units in
  294 different contexts. This solution would make wired-in units harder to deal with.
  295 
  296 Solution 2: another approach would be to thread the UnitState to all places
  297 where a UnitId is pretty-printed and to retrieve the information from the
  298 UnitState only when needed. It would mean that UnitId couldn't have an
  299 Outputable instance as it would need an additional UnitState parameter to be
  300 printed. It means that many other types couldn't have an Outputable instance
  301 either: Unit, Module, Name, InstEnv, etc. Too many to make this solution
  302 feasible.
  303 
  304 Solution 3: the approach we use is a compromise between solutions 0 and 2: the
  305 appropriate UnitState has to be threaded close enough to the function generating
  306 the SDoc so that the latter can use `pprWithUnitState` to set the UnitState to
  307 fetch information from. However the UnitState doesn't have to be threaded
  308 explicitly all the way down to the point where the UnitId itself is printed:
  309 instead the Outputable instance of UnitId fetches the "sdocUnitIdForUser"
  310 field in the SDocContext to pretty-print.
  311 
  312    1. We can still have Outputable instances for common types (Module, Unit,
  313       Name, etc.)
  314 
  315    2. End-users don't have to pass a UnitState (via a DynFlags) to print a SDoc.
  316 
  317    3. By default "sdocUnitIdForUser" prints the UnitId hash. In case of a bug
  318       (i.e. GHC doesn't correctly call `pprWithUnitState` before pretty-printing a
  319       UnitId), that's what will be shown to the user so it's no big deal.
  320 
  321 
  322 Note [VirtUnit to RealUnit improvement]
  323 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  324 
  325 Over the course of instantiating VirtUnits on the fly while typechecking an
  326 indefinite library, we may end up with a fully instantiated VirtUnit. I.e.
  327 one that could be compiled and installed in the database. During
  328 type-checking we generate a virtual UnitId for it, say "abc".
  329 
  330 Now the question is: do we have a matching installed unit in the database?
  331 Suppose we have one with UnitId "xyz" (provided by Cabal so we don't know how
  332 to generate it). The trouble is that if both units end up being used in the
  333 same type-checking session, their names won't match (e.g. "abc:M.X" vs
  334 "xyz:M.X").
  335 
  336 As we want them to match we just replace the virtual unit with the installed
  337 one: for some reason this is called "improvement".
  338 
  339 There is one last niggle: improvement based on the unit database means
  340 that we might end up developing on a unit that is not transitively
  341 depended upon by the units the user specified directly via command line
  342 flags.  This could lead to strange and difficult to understand bugs if those
  343 instantiations are out of date.  The solution is to only improve a
  344 unit id if the new unit id is part of the 'preloadClosure'; i.e., the
  345 closure of all the units which were explicitly specified.
  346 
  347 Note [Representation of module/name variables]
  348 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  349 In our ICFP'16, we use <A> to represent module holes, and {A.T} to represent
  350 name holes.  This could have been represented by adding some new cases
  351 to the core data types, but this would have made the existing 'moduleName'
  352 and 'moduleUnit' partial, which would have required a lot of modifications
  353 to existing code.
  354 
  355 Instead, we use a fake "hole" unit:
  356 
  357      <A>   ===> hole:A
  358      {A.T} ===> hole:A.T
  359 
  360 This encoding is quite convenient, but it is also a bit dangerous too,
  361 because if you have a 'hole:A' you need to know if it's actually a
  362 'Module' or just a module stored in a 'Name'; these two cases must be
  363 treated differently when doing substitutions.  'renameHoleModule'
  364 and 'renameHoleUnit' assume they are NOT operating on a
  365 'Name'; 'NameShape' handles name substitutions exclusively.
  366 
  367 -}