never executed always true always false
1 {-# LANGUAGE FlexibleInstances #-}
2
3 -- | Units are library components from Cabal packages compiled and installed in
4 -- a database
5 module GHC.Unit
6 ( module GHC.Unit.Types
7 , module GHC.Unit.Info
8 , module GHC.Unit.Parser
9 , module GHC.Unit.State
10 , module GHC.Unit.Module
11 , module GHC.Unit.Home
12 )
13 where
14
15 import GHC.Unit.Types
16 import GHC.Unit.Info
17 import GHC.Unit.Parser
18 import GHC.Unit.Module
19 import GHC.Unit.Home
20 import GHC.Unit.State
21
22 {-
23
24 Note [About Units]
25 ~~~~~~~~~~~~~~~~~~
26
27 Haskell users are used to manipulate Cabal packages. These packages are
28 identified by:
29 - a package name :: String
30 - a package version :: Version
31 - (a revision number, when they are registered on Hackage)
32
33 Cabal packages may contain several components (libraries, programs,
34 testsuites). In GHC we are mostly interested in libraries because those are
35 the components that can be depended upon by other components. Components in a
36 package are identified by their component name. Historically only one library
37 component was allowed per package, hence it didn't need a name. For this
38 reason, component name may be empty for one library component in each
39 package:
40 - a component name :: Maybe String
41
42 UnitId
43 ------
44
45 Cabal libraries can be compiled in various ways (different compiler options
46 or Cabal flags, different dependencies, etc.), hence using package name,
47 package version and component name isn't enough to identify a built library.
48 We use another identifier called UnitId:
49
50 package name \
51 package version | ________
52 component name | hash of all this ==> | UnitId |
53 Cabal flags | --------
54 compiler options |
55 dependencies' UnitId /
56
57 Fortunately GHC doesn't have to generate these UnitId: they are provided by
58 external build tools (e.g. Cabal) with `-this-unit-id` command-line parameter.
59
60 UnitIds are important because they are used to generate internal names
61 (symbols, etc.).
62
63 Wired-in units
64 --------------
65
66 Certain libraries (ghc-prim, base, etc.) are known to the compiler and to the
67 RTS as they provide some basic primitives. Hence UnitIds of wired-in libraries
68 are fixed. Instead of letting Cabal chose the UnitId for these libraries, their
69 .cabal file uses the following stanza to force it to a specific value:
70
71 ghc-options: -this-unit-id ghc-prim -- taken from ghc-prim.cabal
72
73 The RTS also uses entities of wired-in units by directly referring to symbols
74 such as "base_GHCziIOziException_heapOverflow_closure" where the prefix is
75 the UnitId of "base" unit.
76
77 Unit databases
78 --------------
79
80 Units are stored in databases in order to be reused by other codes:
81
82 UnitKey ---> UnitInfo { exposed modules, package name, package version
83 component name, various file paths,
84 dependencies :: [UnitKey], etc. }
85
86 Because of the wired-in units described above, we can't exactly use UnitIds
87 as UnitKeys in the database: if we did this, we could only have a single unit
88 (compiled library) in the database for each wired-in library. As we want to
89 support databases containing several different units for the same wired-in
90 library, we do this:
91
92 * for non wired-in units:
93 * UnitId = UnitKey = Identifier (hash) computed by Cabal
94
95 * for wired-in units:
96 * UnitKey = Identifier computed by Cabal (just like for non wired-in units)
97 * UnitId = unit-id specified with -this-unit-id command-line flag
98
99 We can expose several units to GHC via the `package-id <unit-key>` command-line
100 parameter. We must use the UnitKeys of the units so that GHC can find them in
101 the database.
102
103 During unit loading, GHC replaces UnitKeys with UnitIds. It identifies wired
104 units by their package name (stored in their UnitInfo) and uses wired-in UnitIds
105 for them.
106
107 For example, knowing that "base", "ghc-prim" and "rts" are wired-in units, the
108 following dependency graph expressed with database UnitKeys will be transformed
109 into a similar graph expressed with UnitIds:
110
111 UnitKeys
112 ~~~~~~~~ ----------> rts-1.0-hashABC <--
113 | |
114 | |
115 foo-2.0-hash123 --> base-4.1-hashXYZ ---> ghc-prim-0.5.3-hashUVW
116
117 UnitIds
118 ~~~~~~~ ---------------> rts <--
119 | |
120 | |
121 foo-2.0-hash123 --> base ---------------> ghc-prim
122
123
124 Note that "foo-2.0-hash123" isn't wired-in so its UnitId is the same as its UnitKey.
125
126
127 Module signatures / indefinite units / instantiated units
128 ---------------------------------------------------------
129
130 GHC distinguishes two kinds of units:
131
132 * definite units:
133 * units without module holes and with definite dependencies
134 * can be compiled into machine code (.o/.a/.so/.dll/...)
135
136 * indefinite units:
137 * units with some module holes or with some indefinite dependencies
138 * can only be type-checked
139
140 Module holes are constrained by module signatures (.hsig files). Module
141 signatures are a kind of interface (similar to .hs-boot files). They are used in
142 place of some real code. GHC allows modules from other units to be used to fill
143 these module holes: the process is called "unit/module instantiation". The
144 instantiating module may either be a concrete module or a module signature. In
145 the latter case, the signatures are merged to form a new one.
146
147 You can think of this as polymorphism at the module level: module signatures
148 give constraints on the "type" of module that can be used to fill the hole
149 (where "type" means types of the exported module entitites, etc.).
150
151 Module signatures contain enough information (datatypes, abstract types, type
152 synonyms, classes, etc.) to typecheck modules depending on them but not
153 enough to compile them. As such, indefinite units found in databases only
154 provide module interfaces (the .hi ones this time), not object code.
155
156 Unit instantiation / on-the-fly instantiation
157 ---------------------------------------------
158
159 Indefinite units can be instantiated with modules from other units. The
160 instantiating units can also be instantiated themselves (if there are
161 indefinite) and so on.
162
163 On-the-fly unit instantiation is a tricky optimization explained in
164 http://blog.ezyang.com/2016/08/optimizing-incremental-compilation
165 Here is a summary:
166
167 1. Indefinite units can only be type-checked, not compiled into real code.
168 Type-checking produces interface files (.hi) which are incomplete for code
169 generation (they lack unfoldings, etc.) but enough to perform type-checking
170 of units depending on them.
171
172 2. Type-checking an instantiated unit is cheap as we only have to merge
173 interface files (.hi) of the instantiated unit and of the instantiating
174 units, hence it can be done on-the-fly. Interface files of the dependencies
175 can be concrete or produced on-the-fly recursively.
176
177 3. When we compile a unit, we mustn't use interfaces produced by the
178 type-checker (on-the-fly or not) for the instantiated unit dependencies
179 because they lack some information.
180
181 4. When we type-check an indefinite unit, we must be consistent about the
182 interfaces we use for each dependency: only those produced by the
183 type-checker (on-the-fly or not) or only those produced after a full
184 compilation, but not both at the same time.
185
186 It can be tricky if we have the following kind of dependency graph:
187
188 X (indefinite) ------> D (definite, compiled) -----> I (instantiated, definite, compiled)
189 |----------------------------------------------------^
190
191 Suppose we want to type-check unit X which depends on unit I and D:
192 * I is definite and compiled: we have compiled .hi files for its modules on disk
193 * I is instantiated: it is cheap to produce type-checker .hi files for its modules on-the-fly
194
195 But we must not do:
196
197 X (indefinite) ------> D (definite, compiled) -----> I (instantiated, definite, compiled)
198 |--------------------------------------------------> I (instantiated on-the-fly)
199
200 ==> inconsistent module interfaces for I
201
202 Nor:
203
204 X (indefinite) ------> D (definite, compiled) -------v
205 |--------------------------------------------------> I (instantiated on-the-fly)
206
207 ==> D's interfaces may refer to things that only exist in I's *compiled* interfaces
208
209 An alternative would be to store both type-checked and compiled interfaces
210 for every compiled non-instantiated unit (instantiated unit can be done
211 on-the-fly) so that we could use type-checked interfaces of D in the
212 example above. But it would increase compilation time and unit size.
213
214
215 The 'Unit' datatype represents a unit which may have been instantiated
216 on-the-fly:
217
218 data Unit = RealUnit DefUnitId -- use compiled interfaces on disk
219 | VirtUnit InstantiatedUnit -- use on-the-fly instantiation
220
221 'InstantiatedUnit' has two interesting fields:
222
223 * instUnitInstanceOf :: UnitId
224 -- ^ the indefinite unit that is instantiated
225
226 * instUnitInsts :: [(ModuleName,(Unit,ModuleName)]
227 -- ^ a list of instantiations, where an instantiation is:
228 (module hole name, (instantiating unit, instantiating module name))
229
230 A 'VirtUnit' may be indefinite or definite, it depends on whether some holes
231 remain in the instantiated unit OR in the instantiating units (recursively).
232 Having a fully instantiated (i.e. definite) virtual unit can lead to some issues
233 if there is a matching compiled unit in the preload closure. See Note [VirtUnit
234 to RealUnit improvement]
235
236 Unit database and indefinite units
237 ----------------------------------
238
239 We don't store partially instantiated units in the unit database. Units in the
240 database are either:
241
242 * definite (fully instantiated or without holes): in this case we have
243 *compiled* module interfaces (.hi) and object codes (.o/.a/.so/.dll/...).
244
245 * fully indefinite (not instantiated at all): in this case we only have
246 *type-checked* module interfaces (.hi).
247
248 Note that indefinite units are stored as an instantiation of themselves where
249 each instantiating module is a module variable (see Note [Representation of
250 module/name variables]). E.g.
251
252 "xyz" (UnitKey) ---> UnitInfo { instanceOf = "xyz"
253 , instantiatedWith = [A=<A>,B=<B>...]
254 , ...
255 }
256
257 Note that non-instantiated units are also stored as an instantiation of
258 themselves. It is a reminiscence of previous terminology (when "instanceOf" was
259 "componentId"). E.g.
260
261 "xyz" (UnitKey) ---> UnitInfo { instanceOf = "xyz"
262 , instantiatedWith = []
263 , ...
264 }
265
266 TODO: We should probably have `instanceOf :: Maybe UnitId` instead.
267
268
269 Note [Pretty-printing UnitId]
270 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
271
272 When we pretty-print a UnitId for the user, we try to map it back to its origin
273 package name, version and component to print "package-version:component" instead
274 of some hash. How to retrieve these information from a UnitId?
275
276 Solution 0: ask for a UnitState to be passed each time we want to pretty-print a
277 SDoc so that the Outputable instance for UnitId could retrieve the information
278 from it. That what we used in the past: a DynFlags was passed and the UnitState
279 was retrieved from it. This is wrong for several reasons:
280
281 1. The UnitState is accessed when the message is printed, not when it is
282 generated. So we could imagine that the UnitState could have changed
283 in-between. Especially if we want to allow unit unloading.
284
285 2. We want GHC to support several independent sessions at once, hence
286 several UnitState. This approach supposes there is a unique UnitState
287 (the one given at printing-time), moreover a UnitId doesn't indicate
288 which UnitState it comes from (think about statically defined UnitId for
289 wired-in units).
290
291 Solution 1: an obvious approach would be to store the required information in
292 the UnitId itself. However it doesn't work because some UnitId are defined
293 statically for wired-in units and the same UnitId can map to different units in
294 different contexts. This solution would make wired-in units harder to deal with.
295
296 Solution 2: another approach would be to thread the UnitState to all places
297 where a UnitId is pretty-printed and to retrieve the information from the
298 UnitState only when needed. It would mean that UnitId couldn't have an
299 Outputable instance as it would need an additional UnitState parameter to be
300 printed. It means that many other types couldn't have an Outputable instance
301 either: Unit, Module, Name, InstEnv, etc. Too many to make this solution
302 feasible.
303
304 Solution 3: the approach we use is a compromise between solutions 0 and 2: the
305 appropriate UnitState has to be threaded close enough to the function generating
306 the SDoc so that the latter can use `pprWithUnitState` to set the UnitState to
307 fetch information from. However the UnitState doesn't have to be threaded
308 explicitly all the way down to the point where the UnitId itself is printed:
309 instead the Outputable instance of UnitId fetches the "sdocUnitIdForUser"
310 field in the SDocContext to pretty-print.
311
312 1. We can still have Outputable instances for common types (Module, Unit,
313 Name, etc.)
314
315 2. End-users don't have to pass a UnitState (via a DynFlags) to print a SDoc.
316
317 3. By default "sdocUnitIdForUser" prints the UnitId hash. In case of a bug
318 (i.e. GHC doesn't correctly call `pprWithUnitState` before pretty-printing a
319 UnitId), that's what will be shown to the user so it's no big deal.
320
321
322 Note [VirtUnit to RealUnit improvement]
323 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
324
325 Over the course of instantiating VirtUnits on the fly while typechecking an
326 indefinite library, we may end up with a fully instantiated VirtUnit. I.e.
327 one that could be compiled and installed in the database. During
328 type-checking we generate a virtual UnitId for it, say "abc".
329
330 Now the question is: do we have a matching installed unit in the database?
331 Suppose we have one with UnitId "xyz" (provided by Cabal so we don't know how
332 to generate it). The trouble is that if both units end up being used in the
333 same type-checking session, their names won't match (e.g. "abc:M.X" vs
334 "xyz:M.X").
335
336 As we want them to match we just replace the virtual unit with the installed
337 one: for some reason this is called "improvement".
338
339 There is one last niggle: improvement based on the unit database means
340 that we might end up developing on a unit that is not transitively
341 depended upon by the units the user specified directly via command line
342 flags. This could lead to strange and difficult to understand bugs if those
343 instantiations are out of date. The solution is to only improve a
344 unit id if the new unit id is part of the 'preloadClosure'; i.e., the
345 closure of all the units which were explicitly specified.
346
347 Note [Representation of module/name variables]
348 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
349 In our ICFP'16, we use <A> to represent module holes, and {A.T} to represent
350 name holes. This could have been represented by adding some new cases
351 to the core data types, but this would have made the existing 'moduleName'
352 and 'moduleUnit' partial, which would have required a lot of modifications
353 to existing code.
354
355 Instead, we use a fake "hole" unit:
356
357 <A> ===> hole:A
358 {A.T} ===> hole:A.T
359
360 This encoding is quite convenient, but it is also a bit dangerous too,
361 because if you have a 'hole:A' you need to know if it's actually a
362 'Module' or just a module stored in a 'Name'; these two cases must be
363 treated differently when doing substitutions. 'renameHoleModule'
364 and 'renameHoleUnit' assume they are NOT operating on a
365 'Name'; 'NameShape' handles name substitutions exclusively.
366
367 -}