never executed always true always false
1
2 -- | Constructed Product Result analysis. Identifies functions that surely
3 -- return heap-allocated records on every code path, so that we can eliminate
4 -- said heap allocation by performing a worker/wrapper split.
5 --
6 -- See https://www.microsoft.com/en-us/research/publication/constructed-product-result-analysis-haskell/.
7 -- CPR analysis should happen after strictness analysis.
8 -- See Note [Phase ordering].
9 module GHC.Core.Opt.CprAnal ( cprAnalProgram ) where
10
11 import GHC.Prelude
12
13 import GHC.Driver.Session
14
15 import GHC.Builtin.Names ( runRWKey )
16
17 import GHC.Types.Var.Env
18 import GHC.Types.Basic
19 import GHC.Types.Id
20 import GHC.Types.Id.Info
21 import GHC.Types.Demand
22 import GHC.Types.Cpr
23
24 import GHC.Core.FamInstEnv
25 import GHC.Core.DataCon
26 import GHC.Core.Type
27 import GHC.Core.Utils
28 import GHC.Core
29 import GHC.Core.Seq
30 import GHC.Core.Opt.WorkWrap.Utils
31
32 import GHC.Data.Graph.UnVar -- for UnVarSet
33
34 import GHC.Utils.Outputable
35 import GHC.Utils.Misc
36 import GHC.Utils.Panic
37 import GHC.Utils.Panic.Plain
38 import GHC.Utils.Logger ( Logger, putDumpFileMaybe, DumpFormat (..) )
39
40 import Data.List ( mapAccumL )
41
42 {- Note [Constructed Product Result]
43 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
44 The goal of Constructed Product Result analysis is to identify functions that
45 surely return heap-allocated records on every code path, so that we can
46 eliminate said heap allocation by performing a worker/wrapper split
47 (via 'GHC.Core.Opt.WorkWrap.Utils.mkWWcpr_entry').
48
49 `swap` below is such a function:
50 ```
51 swap (a, b) = (b, a)
52 ```
53 A `case` on an application of `swap`, like
54 `case swap (10, 42) of (a, b) -> a + b` could cancel away
55 (by case-of-known-constructor) if we \"inlined\" `swap` and simplified. We then
56 say that `swap` has the CPR property.
57
58 We can't inline recursive functions, but similar reasoning applies there:
59 ```
60 f x n = case n of
61 0 -> (x, 0)
62 _ -> f (x+1) (n-1)
63 ```
64 Inductively, `case f 1 2 of (a, b) -> a + b` could cancel away the constructed
65 product with the case. So `f`, too, has the CPR property. But we can't really
66 "inline" `f`, because it's recursive. Also, non-recursive functions like `swap`
67 might be too big to inline (or even marked NOINLINE). We still want to exploit
68 the CPR property, and that is exactly what the worker/wrapper transformation
69 can do for us:
70 ```
71 $wf x n = case n of
72 0 -> case (x, 0) of -> (a, b) -> (# a, b #)
73 _ -> case f (x+1) (n-1) of (a, b) -> (# a, b #)
74 f x n = case $wf x n of (# a, b #) -> (a, b)
75 ```
76 where $wf readily simplifies (by case-of-known-constructor and inlining `f`) to:
77 ```
78 $wf x n = case n of
79 0 -> (# x, 0 #)
80 _ -> $wf (x+1) (n-1)
81 ```
82 Now, a call site like `case f 1 2 of (a, b) -> a + b` can inline `f` and
83 eliminate the heap-allocated pair constructor.
84
85 Note [Nested CPR]
86 ~~~~~~~~~~~~~~~~~
87 We can apply Note [Constructed Product Result] deeper than just the top-level
88 result constructor of a function, e.g.,
89 ```
90 g x
91 | even x = (x+1,x+2) :: (Int, Int)
92 | odd x = (x+2,x+3)
93 ```
94 Not only does `g` return a constructed pair, the pair components /also/ have the
95 CPR property. We can split `g` for its /nested/ CPR property, as follows:
96 ```
97 $wg (x :: Int#)
98 | .. x .. = (# x +# 1#, x +# 2# #) :: (# Int#, Int# #)
99 | .. x .. = (# x +# 2#, x +# 3# #)
100 g (I# x) = case $wf x of (# y, z #) -> (I# y, I# z)
101 ```
102 Note however that in the following we will only unbox the second component,
103 even if `foo` has the CPR property:
104 ```
105 h x
106 | even x = (foo x, x+2) :: (Int, Int)
107 | odd x = (x+2, x+3)
108 -- where `foo` has the CPR property
109 ```
110 Why can't we also unbox `foo x`? Because in order to do so, we have to evaluate
111 it and that might diverge, so we cannot give `h` the nested CPR property in the
112 first component of the result.
113
114 The Right Thing is to do a termination analysis, to see if we can guarantee that
115 `foo` terminates quickly, in which case we can speculatively evaluate `foo x` and
116 hence give `h` a nested CPR property. That is done in !1866. But for now we
117 have an incredibly simple termination analysis; an expression terminates fast
118 iff it is in HNF: see `exprTerminates`. We call `exprTerminates` in
119 `cprTransformDataConWork`, which is the main function figuring out whether it's
120 OK to propagate nested CPR info (in `extract_nested_cpr`).
121
122 In addition to `exprTerminates`, `extract_nested_cpr` also looks at the
123 `StrictnessMark` of the corresponding constructor field. Example:
124 ```
125 data T a = MkT !a
126 h2 x
127 | even x = MkT (foo x) :: T Int
128 | odd x = MkT (x+2)
129 -- where `foo` has the CPR property
130 ```
131 Regardless of whether or not `foo` terminates, we may unbox the strict field,
132 because it has to be evaluated (the Core for `MkT (foo x)` will look more like
133 `case foo x of y { __DEFAULT -> MkT y }`).
134
135 Surprisingly, there are local binders with a strict demand that *do not*
136 terminate quickly in a sense that is useful to us! The following function
137 demonstrates that:
138 ```
139 j x = (let t = x+1 in t+t, 42)
140 ```
141 Here, `t` is used strictly, *but only within its scope in the first pair
142 component*. `t` satisfies Note [CPR for binders that will be unboxed], so it has
143 the CPR property, nevertheless we may not unbox `j` deeply lest evaluation of
144 `x` diverges. The termination analysis must say "Might diverge" for `t` and we
145 won't unbox the first pair component.
146 There are a couple of tests in T18174 that show case Nested CPR. Some of them
147 only work with the termination analysis from !1866.
148
149 Giving the (Nested) CPR property to deep data structures can lead to loss of
150 sharing; see Note [CPR for data structures can destroy sharing].
151
152 Note [Phase ordering]
153 ~~~~~~~~~~~~~~~~~~~~~
154 We need to perform strictness analysis before CPR analysis, because that might
155 unbox some arguments, in turn leading to more constructed products.
156 Ideally, we would want the following pipeline:
157
158 1. Strictness
159 2. worker/wrapper (for strictness)
160 3. CPR
161 4. worker/wrapper (for CPR)
162
163 Currently, we omit 2. and anticipate the results of worker/wrapper.
164 See Note [CPR for binders that will be unboxed].
165 An additional w/w pass would simplify things, but probably add slight overhead.
166 So currently we have
167
168 1. Strictness
169 2. CPR
170 3. worker/wrapper (for strictness and CPR)
171 -}
172
173 --
174 -- * Analysing programs
175 --
176
177 cprAnalProgram :: Logger -> FamInstEnvs -> CoreProgram -> IO CoreProgram
178 cprAnalProgram logger fam_envs binds = do
179 let env = emptyAnalEnv fam_envs
180 let binds_plus_cpr = snd $ mapAccumL cprAnalTopBind env binds
181 putDumpFileMaybe logger Opt_D_dump_cpr_signatures "Cpr signatures" FormatText $
182 dumpIdInfoOfProgram (ppr . cprSigInfo) binds_plus_cpr
183 -- See Note [Stamp out space leaks in demand analysis] in GHC.Core.Opt.DmdAnal
184 seqBinds binds_plus_cpr `seq` return binds_plus_cpr
185
186 -- Analyse a (group of) top-level binding(s)
187 cprAnalTopBind :: AnalEnv
188 -> CoreBind
189 -> (AnalEnv, CoreBind)
190 cprAnalTopBind env (NonRec id rhs)
191 = (env', NonRec id' rhs')
192 where
193 (id', rhs', env') = cprAnalBind env id rhs
194
195 cprAnalTopBind env (Rec pairs)
196 = (env', Rec pairs')
197 where
198 (env', pairs') = cprFix env pairs
199
200 --
201 -- * Analysing expressions
202 --
203
204 -- | The abstract semantic function ⟦_⟧ : Expr -> Env -> A from
205 -- "Constructed Product Result Analysis for Haskell"
206 cprAnal, cprAnal'
207 :: AnalEnv
208 -> CoreExpr -- ^ expression to be denoted by a 'CprType'
209 -> (CprType, CoreExpr) -- ^ the updated expression and its 'CprType'
210
211 cprAnal env e = -- pprTraceWith "cprAnal" (\res -> ppr (fst (res)) $$ ppr e) $
212 cprAnal' env e
213
214 cprAnal' _ (Lit lit) = (topCprType, Lit lit)
215 cprAnal' _ (Type ty) = (topCprType, Type ty) -- Doesn't happen, in fact
216 cprAnal' _ (Coercion co) = (topCprType, Coercion co)
217
218 cprAnal' env (Cast e co)
219 = (cpr_ty, Cast e' co)
220 where
221 (cpr_ty, e') = cprAnal env e
222
223 cprAnal' env (Tick t e)
224 = (cpr_ty, Tick t e')
225 where
226 (cpr_ty, e') = cprAnal env e
227
228 cprAnal' env e@(Var{})
229 = cprAnalApp env e []
230 cprAnal' env e@(App{})
231 = cprAnalApp env e []
232
233 cprAnal' env (Lam var body)
234 | isTyVar var
235 , (body_ty, body') <- cprAnal env body
236 = (body_ty, Lam var body')
237 | otherwise
238 = (lam_ty, Lam var body')
239 where
240 -- See Note [CPR for binders that will be unboxed]
241 env' = extendSigEnvForArg env var
242 (body_ty, body') = cprAnal env' body
243 lam_ty = abstractCprTy body_ty
244
245 cprAnal' env (Case scrut case_bndr ty alts)
246 = (res_ty, Case scrut' case_bndr ty alts')
247 where
248 (scrut_ty, scrut') = cprAnal env scrut
249 env' = extendSigEnv env case_bndr (CprSig scrut_ty)
250 (alt_tys, alts') = mapAndUnzip (cprAnalAlt env' scrut_ty) alts
251 res_ty = foldl' lubCprType botCprType alt_tys
252
253 cprAnal' env (Let (NonRec id rhs) body)
254 = (body_ty, Let (NonRec id' rhs') body')
255 where
256 (id', rhs', env') = cprAnalBind env id rhs
257 (body_ty, body') = cprAnal env' body
258
259 cprAnal' env (Let (Rec pairs) body)
260 = body_ty `seq` (body_ty, Let (Rec pairs') body')
261 where
262 (env', pairs') = cprFix env pairs
263 (body_ty, body') = cprAnal env' body
264
265 cprAnalAlt
266 :: AnalEnv
267 -> CprType -- ^ CPR type of the scrutinee
268 -> Alt Var -- ^ current alternative
269 -> (CprType, Alt Var)
270 cprAnalAlt env scrut_ty (Alt con bndrs rhs)
271 = (rhs_ty, Alt con bndrs rhs')
272 where
273 env_alt
274 | DataAlt dc <- con
275 , let ids = filter isId bndrs
276 , CprType arity cpr <- scrut_ty
277 , assert (arity == 0 ) True
278 = case unpackConFieldsCpr dc cpr of
279 AllFieldsSame field_cpr
280 | let sig = mkCprSig 0 field_cpr
281 -> extendSigEnvAllSame env ids sig
282 ForeachField field_cprs
283 | let sigs = zipWith (mkCprSig . idArity) ids field_cprs
284 -> extendSigEnvList env (zipEqual "cprAnalAlt" ids sigs)
285 | otherwise
286 = env
287 (rhs_ty, rhs') = cprAnal env_alt rhs
288
289 --
290 -- * CPR transformer
291 --
292
293 data TermFlag -- Better than using a Bool
294 = Terminates
295 | MightDiverge
296
297 -- See Note [Nested CPR]
298 exprTerminates :: CoreExpr -> TermFlag
299 exprTerminates e
300 | exprIsHNF e = Terminates -- A /very/ simple termination analysis.
301 | otherwise = MightDiverge
302
303 cprAnalApp :: AnalEnv -> CoreExpr -> [(CprType, CoreArg)] -> (CprType, CoreExpr)
304 -- Main function that takes care of /nested/ CPR. See Note [Nested CPR]
305 cprAnalApp env e arg_infos = go e arg_infos []
306 where
307 go e arg_infos args'
308 -- Collect CprTypes for (value) args (inlined collectArgs):
309 | App fn arg <- e, isTypeArg arg -- Don't analyse Type args
310 = go fn arg_infos (arg:args')
311 | App fn arg <- e
312 , arg_info@(_arg_ty, arg') <- cprAnal env arg
313 -- See Note [Nested CPR] on the need for termination analysis
314 = go fn (arg_info:arg_infos) (arg':args')
315
316 | Var fn <- e
317 = (cprTransform env fn arg_infos, mkApps e args')
318
319 | (e_ty, e') <- cprAnal env e -- e is not an App and not a Var
320 = (applyCprTy e_ty (length arg_infos), mkApps e' args')
321
322 cprTransform :: AnalEnv -- ^ The analysis environment
323 -> Id -- ^ The function
324 -> [(CprType, CoreArg)] -- ^ info about incoming /value/ arguments
325 -> CprType -- ^ The demand type of the application
326 cprTransform env id args
327 -- Any local binding, except for data structure bindings
328 -- See Note [Efficient Top sigs in SigEnv]
329 | Just sig <- lookupSigEnv env id
330 = applyCprTy (getCprSig sig) (length args)
331 -- See Note [CPR for data structures]
332 | Just rhs <- cprDataStructureUnfolding_maybe id
333 = fst $ cprAnal env rhs
334 -- Some (mostly global, known-key) Ids have bespoke CPR transformers
335 | Just cpr_ty <- cprTransformBespoke id args
336 = cpr_ty
337 -- Other local Ids that respond True to 'isDataStructure' but don't have an
338 -- expandable unfolding, such as NOINLINE bindings. They all get a top sig
339 | isLocalId id
340 = assertPpr (isDataStructure id) (ppr id) topCprType
341 -- See Note [CPR for DataCon wrappers]
342 | isDataConWrapId id, let rhs = uf_tmpl (realIdUnfolding id)
343 = fst $ cprAnalApp env rhs args
344 -- DataCon worker
345 | Just con <- isDataConWorkId_maybe id
346 = cprTransformDataConWork (ae_fam_envs env) con args
347 -- Imported function
348 | otherwise
349 = applyCprTy (getCprSig (idCprSig id)) (length args)
350
351 -- | Precise, hand-written CPR transformers for select Ids
352 cprTransformBespoke :: Id -> [(CprType, CoreArg)] -> Maybe CprType
353 cprTransformBespoke id args
354 -- See Note [Simplification of runRW#] in GHC.CoreToStg.Prep
355 | idUnique id == runRWKey -- `runRW (\s -> e)`
356 , [(arg_ty, _arg)] <- args -- `\s -> e` has CPR type `arg` (e.g. `. -> 2`)
357 = Just $ applyCprTy arg_ty 1 -- `e` has CPR type `2`
358 | otherwise
359 = Nothing
360
361 -- | Get a (possibly nested) 'CprType' for an application of a 'DataCon' worker,
362 -- given a saturated number of 'CprType's for its field expressions.
363 -- Implements the Nested part of Note [Nested CPR].
364 cprTransformDataConWork :: FamInstEnvs -> DataCon -> [(CprType, CoreArg)] -> CprType
365 cprTransformDataConWork fam_envs con args
366 | null (dataConExTyCoVars con) -- No existentials
367 , wkr_arity <= mAX_CPR_SIZE -- See Note [Trimming to mAX_CPR_SIZE]
368 , args `lengthIs` wkr_arity
369 , isRecDataCon fam_envs fuel con /= DefinitelyRecursive -- See Note [CPR for recursive data constructors]
370 -- , pprTrace "cprTransformDataConWork" (ppr con <+> ppr wkr_arity <+> ppr args) True
371 = CprType 0 (ConCpr (dataConTag con) (strictZipWith extract_nested_cpr args wkr_str_marks))
372 | otherwise
373 = topCprType
374 where
375 fuel = 3 -- If we can unbox more than 3 constructors to find a
376 -- recursive occurrence, then we can just as well unbox it
377 -- See Note [CPR for recursive data constructors], point (4)
378 wkr_arity = dataConRepArity con
379 wkr_str_marks = dataConRepStrictness con
380 -- See Note [Nested CPR]
381 extract_nested_cpr (CprType 0 cpr, arg) str
382 | MarkedStrict <- str = cpr
383 | Terminates <- exprTerminates arg = cpr
384 extract_nested_cpr _ _ = topCpr -- intervening lambda or doesn't terminate
385
386 -- | See Note [Trimming to mAX_CPR_SIZE].
387 mAX_CPR_SIZE :: Arity
388 mAX_CPR_SIZE = 10
389
390 --
391 -- * Bindings
392 --
393
394 -- Recursive bindings
395 cprFix :: AnalEnv -- Does not include bindings for this binding
396 -> [(Id,CoreExpr)]
397 -> (AnalEnv, [(Id,CoreExpr)]) -- Binders annotated with CPR info
398 cprFix orig_env orig_pairs
399 = loop 1 init_env init_pairs
400 where
401 init_sig id
402 -- See Note [CPR for data structures]
403 | isDataStructure id = topCprSig
404 | otherwise = mkCprSig 0 botCpr
405 -- See Note [Initialising strictness] in GHC.Core.Opt.DmdAnal
406 orig_virgin = ae_virgin orig_env
407 init_pairs | orig_virgin = [(setIdCprSig id (init_sig id), rhs) | (id, rhs) <- orig_pairs ]
408 | otherwise = orig_pairs
409 init_env = extendSigEnvFromIds orig_env (map fst init_pairs)
410
411 -- The fixed-point varies the idCprSig field of the binders and and their
412 -- entries in the AnalEnv, and terminates if that annotation does not change
413 -- any more.
414 loop :: Int -> AnalEnv -> [(Id,CoreExpr)] -> (AnalEnv, [(Id,CoreExpr)])
415 loop n env pairs
416 | found_fixpoint = (reset_env', pairs')
417 | otherwise = loop (n+1) env' pairs'
418 where
419 -- In all but the first iteration, delete the virgin flag
420 -- See Note [Initialising strictness] in GHC.Core.Opt.DmdAnal
421 (env', pairs') = step (applyWhen (n/=1) nonVirgin env) pairs
422 -- Make sure we reset the virgin flag to what it was when we are stable
423 reset_env' = env'{ ae_virgin = orig_virgin }
424 found_fixpoint = map (idCprSig . fst) pairs' == map (idCprSig . fst) pairs
425
426 step :: AnalEnv -> [(Id, CoreExpr)] -> (AnalEnv, [(Id, CoreExpr)])
427 step env pairs = mapAccumL go env pairs
428 where
429 go env (id, rhs) = (env', (id', rhs'))
430 where
431 (id', rhs', env') = cprAnalBind env id rhs
432
433 -- | Process the RHS of the binding for a sensible arity, add the CPR signature
434 -- to the Id, and augment the environment with the signature as well.
435 cprAnalBind
436 :: AnalEnv
437 -> Id
438 -> CoreExpr
439 -> (Id, CoreExpr, AnalEnv)
440 cprAnalBind env id rhs
441 | isDFunId id -- Never give DFuns the CPR property; we'll never save allocs.
442 = (id, rhs, extendSigEnv env id topCprSig)
443 -- See Note [CPR for data structures]
444 | isDataStructure id
445 = (id, rhs, env) -- Data structure => no code => no need to analyse rhs
446 | otherwise
447 = (id', rhs', env')
448 where
449 (rhs_ty, rhs') = cprAnal env rhs
450 -- possibly trim thunk CPR info
451 rhs_ty'
452 -- See Note [CPR for thunks]
453 | stays_thunk = trimCprTy rhs_ty
454 | otherwise = rhs_ty
455 -- See Note [Arity trimming for CPR signatures]
456 sig = mkCprSigForArity (idArity id) rhs_ty'
457 id' = setIdCprSig id sig
458 env' = extendSigEnv env id sig
459
460 -- See Note [CPR for thunks]
461 stays_thunk = is_thunk && not_strict
462 is_thunk = not (exprIsHNF rhs) && not (isJoinId id)
463 not_strict = not (isStrUsedDmd (idDemandInfo id))
464
465 isDataStructure :: Id -> Bool
466 -- See Note [CPR for data structures]
467 isDataStructure id =
468 not (isJoinId id) && idArity id == 0 && isEvaldUnfolding (idUnfolding id)
469
470 -- | Returns an expandable unfolding
471 -- (See Note [exprIsExpandable] in "GHC.Core.Utils") that has
472 -- So effectively is a constructor application.
473 cprDataStructureUnfolding_maybe :: Id -> Maybe CoreExpr
474 cprDataStructureUnfolding_maybe id
475 -- There are only FinalPhase Simplifier runs after CPR analysis
476 | activeInFinalPhase (idInlineActivation id)
477 , isDataStructure id
478 = expandUnfolding_maybe (idUnfolding id)
479 | otherwise
480 = Nothing
481
482 {- Note [Arity trimming for CPR signatures]
483 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
484 Although it doesn't affect correctness of the analysis per se, we have to trim
485 CPR signatures to idArity. Here's what might happen if we don't:
486
487 f x = if expensive
488 then \y. Box y
489 else \z. Box z
490 g a b = f a b
491
492 The two lambdas will have a CPR type of @1m@ (so construct a product after
493 applied to one argument). Thus, @f@ will have a CPR signature of @2m@
494 (constructs a product after applied to two arguments).
495 But WW will never eta-expand @f@! In this case that would amount to possibly
496 duplicating @expensive@ work.
497
498 (Side note: Even if @f@'s 'idArity' happened to be 2, it would not do so, see
499 Note [Don't eta expand in w/w].)
500
501 So @f@ will not be worker/wrappered. But @g@ also inherited its CPR signature
502 from @f@'s, so it *will* be WW'd:
503
504 f x = if expensive
505 then \y. Box y
506 else \z. Box z
507 $wg a b = case f a b of Box x -> x
508 g a b = Box ($wg a b)
509
510 And the case in @g@ can never cancel away, thus we introduced extra reboxing.
511 Hence we always trim the CPR signature of a binding to idArity.
512
513 Note [CPR for DataCon wrappers]
514 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
515 We used to give DataCon wrappers a (necessarily flat) CPR signature in
516 'GHC.Types.Id.Make.mkDataConRep'. Now we transform DataCon wrappers simply by
517 analysing their unfolding. A few reasons for the change:
518
519 1. DataCon wrappers are generally inlined in the Final phase (so before CPR),
520 all leftover occurrences are in a boring context like `f x y = $WMkT y x`.
521 It's simpler to analyse the unfolding anew at every such call site, and the
522 unfolding will be pretty cheap to analyse. Also they occur seldom enough
523 that performance-wise it doesn't matter.
524 2. 'GHC.Types.Id.Make' no longer precomputes CPR signatures for DataCon
525 *workers*, because their transformers need to adapt to CPR for their
526 arguments in 'cprTransformDataConWork' to enable Note [Nested CPR].
527 Better keep it all in this module! The alternative would be that
528 'GHC.Types.Id.Make' depends on DmdAnal.
529 3. In the future, Nested CPR could take a better account of incoming args
530 in cprAnalApp and do some beta-reduction on the fly, like !1866 did. If
531 any of those args had the CPR property, then we'd even get Nested CPR for
532 DataCon wrapper calls, for free. Not so if we simply give the wrapper a
533 single CPR sig in 'GHC.Types.Id.Make.mkDataConRep'!
534
535 Note [Trimming to mAX_CPR_SIZE]
536 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
537 We do not treat very big tuples as CPR-ish:
538
539 a) For a start, we get into trouble because there aren't
540 "enough" unboxed tuple types (a tiresome restriction,
541 but hard to fix),
542 b) More importantly, big unboxed tuples get returned mainly
543 on the stack, and are often then allocated in the heap
544 by the caller. So doing CPR for them may in fact make
545 things worse, especially if the wrapper doesn't cancel away
546 and we move to the stack in the worker and then to the heap
547 in the wrapper.
548
549 So we (nested) CPR for functions that would otherwise pass more than than
550 'mAX_CPR_SIZE' fields.
551 That effect is exacerbated for the unregisterised backend, where we
552 don't have any hardware registers to return the fields in. Returning
553 everything on the stack results in much churn and increases compiler
554 allocation by 15% for T15164 in a validate build.
555 -}
556
557 data AnalEnv
558 = AE
559 { ae_sigs :: SigEnv
560 -- ^ Current approximation of signatures for local ids
561 , ae_virgin :: Bool
562 -- ^ True only on every first iteration in a fixed-point
563 -- iteration. See Note [Initialising strictness] in "GHC.Core.Opt.DmdAnal"
564 , ae_fam_envs :: FamInstEnvs
565 -- ^ Needed when expanding type families and synonyms of product types.
566 }
567
568 instance Outputable AnalEnv where
569 ppr (AE { ae_sigs = env, ae_virgin = virgin })
570 = text "AE" <+> braces (vcat
571 [ text "ae_virgin =" <+> ppr virgin
572 , text "ae_sigs =" <+> ppr env ])
573
574 -- | An environment storing 'CprSig's for local Ids.
575 -- Puts binders with 'topCprSig' in a space-saving 'IntSet'.
576 -- See Note [Efficient Top sigs in SigEnv].
577 data SigEnv
578 = SE
579 { se_tops :: !UnVarSet
580 -- ^ All these Ids have 'topCprSig'. Like a 'VarSet', but more efficient.
581 , se_sigs :: !(VarEnv CprSig)
582 -- ^ Ids that have something other than 'topCprSig'.
583 }
584
585 instance Outputable SigEnv where
586 ppr (SE { se_tops = tops, se_sigs = sigs })
587 = text "SE" <+> braces (vcat
588 [ text "se_tops =" <+> ppr tops
589 , text "se_sigs =" <+> ppr sigs ])
590
591 emptyAnalEnv :: FamInstEnvs -> AnalEnv
592 emptyAnalEnv fam_envs
593 = AE
594 { ae_sigs = SE emptyUnVarSet emptyVarEnv
595 , ae_virgin = True
596 , ae_fam_envs = fam_envs
597 }
598
599 modifySigEnv :: (SigEnv -> SigEnv) -> AnalEnv -> AnalEnv
600 modifySigEnv f env = env { ae_sigs = f (ae_sigs env) }
601
602 lookupSigEnv :: AnalEnv -> Id -> Maybe CprSig
603 -- See Note [Efficient Top sigs in SigEnv]
604 lookupSigEnv AE{ae_sigs = SE tops sigs} id
605 | id `elemUnVarSet` tops = Just topCprSig
606 | otherwise = lookupVarEnv sigs id
607
608 extendSigEnv :: AnalEnv -> Id -> CprSig -> AnalEnv
609 -- See Note [Efficient Top sigs in SigEnv]
610 extendSigEnv env id sig
611 | isTopCprSig sig
612 = modifySigEnv (\se -> se{se_tops = extendUnVarSet id (se_tops se)}) env
613 | otherwise
614 = modifySigEnv (\se -> se{se_sigs = extendVarEnv (se_sigs se) id sig}) env
615
616 -- | Extend an environment with the (Id, CPR sig) pairs
617 extendSigEnvList :: AnalEnv -> [(Id, CprSig)] -> AnalEnv
618 extendSigEnvList env ids_cprs
619 = foldl' (\env (id, sig) -> extendSigEnv env id sig) env ids_cprs
620
621 -- | Extend an environment with the CPR sigs attached to the ids
622 extendSigEnvFromIds :: AnalEnv -> [Id] -> AnalEnv
623 extendSigEnvFromIds env ids
624 = foldl' (\env id -> extendSigEnv env id (idCprSig id)) env ids
625
626 -- | Extend an environment with the same CPR sig for all ids
627 extendSigEnvAllSame :: AnalEnv -> [Id] -> CprSig -> AnalEnv
628 extendSigEnvAllSame env ids sig
629 = foldl' (\env id -> extendSigEnv env id sig) env ids
630
631 nonVirgin :: AnalEnv -> AnalEnv
632 nonVirgin env = env { ae_virgin = False }
633
634 -- | A version of 'extendSigEnv' for a binder of which we don't see the RHS
635 -- needed to compute a 'CprSig' (e.g. lambdas and DataAlt field binders).
636 -- In this case, we can still look at their demand to attach CPR signatures
637 -- anticipating the unboxing done by worker/wrapper.
638 -- See Note [CPR for binders that will be unboxed].
639 extendSigEnvForArg :: AnalEnv -> Id -> AnalEnv
640 extendSigEnvForArg env id
641 = extendSigEnv env id (CprSig (argCprType (idDemandInfo id)))
642
643 -- | Produces a 'CprType' according to how a strict argument will be unboxed.
644 -- Examples:
645 --
646 -- * A head-strict demand @1!L@ would translate to @1@
647 -- * A product demand @1!P(1!L,L)@ would translate to @1(1,)@
648 -- * A product demand @1!P(1L,L)@ would translate to @1(,)@,
649 -- because the first field will not be unboxed.
650 argCprType :: Demand -> CprType
651 argCprType dmd = CprType 0 (go dmd)
652 where
653 go (n :* sd)
654 | isAbs n = topCpr
655 | Prod Unboxed ds <- sd = ConCpr fIRST_TAG (strictMap go ds)
656 | Poly Unboxed _ <- sd = ConCpr fIRST_TAG []
657 | otherwise = topCpr
658
659 {- Note [Safe abortion in the fixed-point iteration]
660 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
661 Fixed-point iteration may fail to terminate. But we cannot simply give up and
662 return the environment and code unchanged! We still need to do one additional
663 round, to ensure that all expressions have been traversed at least once, and any
664 unsound CPR annotations have been updated.
665
666 Note [Efficient Top sigs in SigEnv]
667 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
668 It's pretty common for binders in the SigEnv to have a 'topCprSig'.
669 Wide records with 100 fields like in T9675 even will generate code where the
670 majority of binders has Top signature. To save some allocations, we store
671 those binders with a Top signature in a separate UnVarSet (which is an IntSet
672 with a convenient Var-tailored API).
673
674 Why store top signatures at all in the SigEnv? After all, when 'cprTransform'
675 encounters a locally-bound Id without an entry in the SigEnv, it should behave
676 as if that binder has a Top signature!
677 Well, the problem is when case binders should have a Top signatures. They always
678 have an unfolding and thus look to 'cprTransform' as if they bind a data
679 structure, Note [CPR for data structures], and thus would always have the CPR
680 property. So we need some mechanism to separate data structures from case
681 binders with a Top signature, and the UnVarSet provides that in the least
682 convoluted way I can think of.
683
684 Note [CPR for binders that will be unboxed]
685 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
686 If a lambda-bound variable will be unboxed by worker/wrapper (so it must be
687 demanded strictly), then give it a CPR signature. Here's a concrete example
688 ('f1' in test T10482a), assuming h is strict:
689
690 f1 :: Int -> Int
691 f1 x = case h x of
692 A -> x
693 B -> f1 (x-1)
694 C -> x+1
695
696 If we notice that 'x' is used strictly, we can give it the CPR
697 property; and hence f1 gets the CPR property too. It's sound (doesn't
698 change strictness) to give it the CPR property because by the time 'x'
699 is returned (case A above), it'll have been evaluated (by the wrapper
700 of 'h' in the example).
701
702 Moreover, if f itself is strict in x, then we'll pass x unboxed to
703 f1, and so the boxed version *won't* be available; in that case it's
704 very helpful to give 'x' the CPR property.
705
706 This is all done in 'extendSigEnvForArg'.
707
708 Note that
709
710 * Whether or not something unboxes is decided by 'wantToUnboxArg', else we may
711 get over-optimistic CPR results (e.g., from \(x :: a) -> x!).
712
713 * If the demand unboxes deeply, we can give the binder a /nested/ CPR
714 property, e.g.
715
716 g :: (Int, Int) -> Int
717 g p = case p of
718 (x, y) | x < 0 -> 0
719 | otherwise -> x
720
721 `x` should have the CPR property because it will be unboxed. We do so
722 by giving `p` the Nested CPR property `1(1,)`, indicating that we not only
723 have `p` available unboxed, but also its field `x`. Analysis of the Case
724 will then transfer the CPR property to `x`.
725
726 Before we were able to express Nested CPR, we used to guess which field
727 binders should get the CPR property.
728 See Historic Note [Optimistic field binder CPR].
729
730 * See Note [CPR examples]
731
732 Note [CPR for thunks]
733 ~~~~~~~~~~~~~~~~~~~~~
734 If the rhs is a thunk, we usually forget the CPR info, because
735 it is presumably shared (else it would have been inlined, and
736 so we'd lose sharing if w/w'd it into a function). E.g.
737
738 let r = case expensive of
739 (a,b) -> (b,a)
740 in ...
741
742 If we marked r as having the CPR property, then we'd w/w into
743
744 let $wr = \() -> case expensive of
745 (a,b) -> (# b, a #)
746 r = case $wr () of
747 (# b,a #) -> (b,a)
748 in ...
749
750 But now r is a thunk, which won't be inlined, so we are no further ahead.
751 But consider
752
753 f x = let r = case expensive of (a,b) -> (b,a)
754 in if foo r then r else (x,x)
755
756 Does f have the CPR property? Well, no.
757
758 However, if the strictness analyser has figured out (in a previous
759 iteration) that it's strict, then we DON'T need to forget the CPR info.
760 Instead we can retain the CPR info and do the thunk-splitting transform
761 (see WorkWrap.splitThunk).
762
763 This made a big difference to PrelBase.modInt, which had something like
764 modInt = \ x -> let r = ... -> I# v in
765 ...body strict in r...
766 r's RHS isn't a value yet; but modInt returns r in various branches, so
767 if r doesn't have the CPR property then neither does modInt
768 Another case I found in practice (in Complex.magnitude), looks like this:
769 let k = if ... then I# a else I# b
770 in ... body strict in k ....
771 (For this example, it doesn't matter whether k is returned as part of
772 the overall result; but it does matter that k's RHS has the CPR property.)
773 Left to itself, the simplifier will make a join point thus:
774 let $j k = ...body strict in k...
775 if ... then $j (I# a) else $j (I# b)
776 With thunk-splitting, we get instead
777 let $j x = let k = I#x in ...body strict in k...
778 in if ... then $j a else $j b
779 This is much better; there's a good chance the I# won't get allocated.
780
781 But what about botCpr? Consider
782 lvl = error "boom"
783 fac -1 = lvl
784 fac 0 = 1
785 fac n = n * fac (n-1)
786 fac won't have the CPR property here when we trim every thunk! But the
787 assumption is that error cases are rarely entered and we are diverging anyway,
788 so WW doesn't hurt.
789
790 Should we also trim CPR on DataCon application bindings?
791 See Note [CPR for data structures]!
792
793 Note [CPR for data structures]
794 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
795 Long static data structures (whether top-level or not) like
796
797 xs = x1 : xs1
798 xs1 = x2 : xs2
799 xs2 = x3 : xs3
800
801 should not get (nested) CPR signatures (#18154), because they
802
803 * Never get WW'd, so their CPR signature should be irrelevant after analysis
804 (in fact the signature might even be harmful for that reason)
805 * Would need to be inlined/expanded to see their constructed product
806 * BUT MOST IMPORTANTLY, Problem P1:
807 Recording CPR on them blows up interface file sizes and is redundant with
808 their unfolding. In case of Nested CPR, this blow-up can be quadratic!
809 Reason: the CPR info for xs1 contains the CPR info for xs; the CPR info
810 for xs2 contains that for xs1. And so on.
811 By contrast, the size of unfoldings and types stays linear. That's why
812 quadratic blowup is problematic; it makes an asymptotic difference.
813
814 Hence (Solution S1) we don't give data structure bindings a CPR *signature* and
815 hence don't to analyse them in 'cprAnalBind'.
816 What do we mean by "data structure binding"? Answer:
817
818 (1) idArity id == 0 (otherwise it's a function)
819 (2) is eval'd (otherwise it's a thunk, Note [CPR for thunks] applies)
820 (3) not (isJoinId id) (otherwise it's a function and its more efficient to
821 analyse it just once rather than at each call site)
822
823 But (S1) leads to a new Problem P2: We can't just stop giving DataCon application
824 bindings the CPR *property*, for example the factorial function after FloatOut
825
826 lvl = I# 1#
827 fac 0 = lvl
828 fac n = n * fac (n-1)
829
830 lvl is a data structure, and hence (see above) will not have a CPR *signature*.
831 But if lvl doesn't have the CPR *property*, fac won't either and we allocate a
832 box for the result on every iteration of the loop.
833
834 So (Solution S2) when 'cprAnal' meets a variable lacking a CPR signature to
835 extrapolate into a CPR transformer, 'cprTransform' tries to get its unfolding
836 (via 'cprDataStructureUnfolding_maybe'), and analyses that instead.
837
838 The Result R1: Everything behaves as if there was a CPR signature, but without
839 the blowup in interface files.
840
841 There is one exception to (R1):
842
843 x = (y, z); {-# NOINLINE x #-}
844 f p = (y, z); {-# NOINLINE f #-}
845
846 While we still give the NOINLINE *function* 'f' the CPR property (and WW
847 accordingly, see Note [Worker/wrapper for NOINLINE functions]), we won't
848 give the NOINLINE *data structure* 'x' the CPR property, because it lacks an
849 unfolding. In particular, KindRep bindings are NOINLINE data structures (see
850 the noinline wrinkle in Note [Grand plan for Typeable]). We'll behave as if the
851 bindings had 'topCprSig', and that is fine, as a case on the binding would never
852 cancel away after WW!
853
854 It's also worth pointing out how ad-hoc (S1) is: If we instead had
855
856 f1 x = x:[]
857 f2 x = x : f1 x
858 f3 x = x : f2 x
859 ...
860
861 we still give every function an ever deepening CPR signature. But it's very
862 uncommon to find code like this, whereas the long static data structures from
863 the beginning of this Note are very common because of GHC's strategy of ANF'ing
864 data structure RHSs.
865
866 Note [CPR for data structures can destroy sharing]
867 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
868 In Note [CPR for data structures], we argued that giving data structure bindings
869 the CPR property is useful to give functions like fac the CPR property:
870
871 lvl = I# 1#
872 fac 0 = lvl
873 fac n = n * fac (n-1)
874
875 Worker/wrappering fac for its CPR property means we get a very fast worker
876 function with type Int# -> Int#, without any heap allocation at all.
877
878 But consider what happens if we call `map fac (replicate n 0)`, where the
879 wrapper doesn't cancel away: Then we rebox the result of $wfac *on each call*,
880 n times, instead of reusing the static thunk for 1, e.g. an asymptotic increase
881 in allocations. If you twist it just right, you can actually write programs that
882 that take O(n) space if you do CPR and O(1) if you don't:
883
884 fac :: Int -> Int
885 fac 0 = 1 -- this clause will trigger CPR and destroy sharing for O(n) space
886 -- fac 0 = lazy 1 -- this clause will prevent CPR and run in O(1) space
887 fac n = n * fac (n-1)
888
889 const0 :: Int -> Int
890 const0 n = signum n - 1 -- will return 0 for [1..n]
891 {-# NOINLINE const0 #-}
892
893 main = print $ foldl' (\acc n -> acc + lazy n) 0 $ map (fac . const0) [1..100000000]
894
895 Generally, this kind of asymptotic increase in allocation can happen whenever we
896 give a data structure the CPR property that is bound outside of a recursive
897 function. So far we don't have a convincing remedy; giving fac the CPR property
898 is just too attractive. #19309 documents a futile idea. #13331 tracks the
899 general issue of WW destroying sharing and also contains above reproducer.
900 #19326 is about CPR destroying sharing in particular.
901
902 With Nested CPR, sharing can also be lost within the same "lambda level", for
903 example:
904
905 f (I# x) = let y = I# (x*#x) in (y, y)
906
907 Nestedly unboxing would destroy the box shared through 'y'. (Perhaps we can call
908 this "internal sharing", in contrast to "external sharing" beyond lambda or even
909 loop levels above.) But duplicate occurrences like that are pretty rare and may
910 never lead to an asymptotic difference in allocations of 'f'.
911
912 Note [CPR for recursive data constructors]
913 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
914 Note [CPR for data structures can destroy sharing] gives good reasons not to
915 give shared data structure bindings the CPR property. But we shouldn't even
916 give *functions* that return *recursive* data constructor applications the CPR
917 property. Here's an example for why:
918
919 c = C# 'a'
920 replicateC :: Int -> [Int]
921 replicateC 1 = [c]
922 replicateC n = c : replicateC (n-1)
923
924 What happens if we give `replicateC` the (nested) CPR property? We get a WW
925 split for 'replicateC', the wrapper of which is certain to inline, like this:
926
927 replicateC (I# n) = case $wreplicateC n of (# x, xs #) -> C# x : xs
928 $wreplicateC 1# = (# 'a', [] #)
929 $wreplicateC n = (# 'a', replicateC (I# (n -# 1#)) #)
930
931 Eliminating the shared 'c' binding in the process. And then
932
933 * We *might* save allocation of the topmost (of most likely several) (:)
934 constructor if it cancels away at the call site. Similarly for the 'C#'
935 constructor.
936 * But we will now re-allocate the C# box on every iteration of the loop,
937 because we separated the character literal from the C# application.
938 That means n times as many C# allocations as before. Yikes!!
939 * We make all other call sites where the wrapper inlines a bit larger, most of
940 them for no gain. But this shouldn't matter much.
941 * The inlined wrapper may inhibit eta-expansion in some cases. Here's how:
942 If the wrapper is inlined in a strict arg position, the Simplifier will
943 transform as follows
944
945 f (replicateC n)
946 ==> { inline }
947 f (case $wreplicateC n of (# x, xs #) -> (C# x, xs))
948 ==> { strict arg }
949 case $wreplicateC n of (# x, xs #) -> f (C# x, xs)
950
951 Now we can't float out the case anymore. In fact, we can't even float out
952 `$wreplicateC n`, because it returns an unboxed tuple.
953 This can inhibit eta-expansion if we later find out that `f` has arity > 1
954 (such as when we define `foldl` in terms of `foldr`). #19970 shows how
955 abstaining from worker/wrappering made a difference of -20% in reptile. So
956 while WW'ing for CPR didn't make the program slower directly, the resulting
957 program got much harder to optimise because of the returned unboxed tuple
958 (which can't easily float because unlifted).
959
960 `replicateC` comes up in T5536, which regresses significantly if CPR'd nestedly.
961
962 What can we do about it?
963
964 A. Don't CPR functions that return a *recursive data type* (the list in this
965 case). This is the solution we adopt. Rationale: the benefit of CPR on
966 recursive data structures is slight, because it only affects the outer layer
967 of a potentially massive data structure.
968 B. Don't CPR any *recursive function*. That would be quite conservative, as it
969 would also affect e.g. the factorial function.
970 C. Flat CPR only for recursive functions. This prevents the asymptotic
971 worsening part arising through unsharing the C# box, but it's still quite
972 conservative.
973 D. No CPR at occurrences of shared data structure in hot paths (e.g. the use of
974 `c` in the second eqn of `replicateC`). But we'd need to know which paths
975 were hot. We want such static branch frequency estimates in #20378.
976
977 We adopt solution (A) It is ad-hoc, but appears to work reasonably well.
978 Deciding what a "recursive data constructor" is is quite tricky and ad-hoc, too:
979 See Note [Detecting recursive data constructors]. We don't have to be perfect
980 and can simply keep on unboxing if unsure.
981
982 Note [Detecting recursive data constructors]
983 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
984 What qualifies as a "recursive data constructor" as per
985 Note [CPR for recursive data constructors]? That is up to
986 'GHC.Core.Opt.WorkWrapW.Utils.isRecDataCon' to decide. It does a DFS search over
987 the field types of the DataCon and looks for term-level recursion into the data
988 constructor's type constructor. A few perhaps surprising points:
989
990 1. It deems any function type as non-recursive, because it's unlikely that
991 a recursion through a function type builds up a recursive data structure.
992 2. It doesn't look into kinds or coercion types because there's nothing to unbox.
993 Same for promoted data constructors.
994 3. We don't care whether a NewTyCon or DataTyCon App is fully saturated or not;
995 we simply look at its definition/DataCons and its field tys. Any recursive arg
996 occs will have been detected before (see the invariant of 'go_tc_app').
997 This is so that we expand the `ST` in `StateT Int (ST s) a`.
998 4. We don't recurse deeper than 3 (at the moment of this writing) TyCons and
999 assume the DataCon is non-recursive after that. One reason is guaranteed
1000 constant-time efficiency; the other is that it's fair to say that a recursion
1001 over 3 or more TyCons doesn't really count as a list-like data structure
1002 anymore and a bit of unboxing doesn't hurt much.
1003 5. It checks TyConApps like `T <huge> <type>` by eagerly checking the
1004 potentially huge argument types *before* it tries to expand the
1005 DataCons/NewTyCon/TyFams/etc. so that it doesn't need to re-check those
1006 argument types after having been substituted into every occurrence of
1007 the the respective TyCon parameter binders. It's like call-by-value vs.
1008 call-by-name: Eager checking of argument types means we only need to check
1009 them exactly once.
1010 There's one exception to that rule, namely when we are able to reduce a
1011 TyFam by considering argument types. Then we pay the price of potentially
1012 checking the same type arg twice (or more, if the TyFam is recursive).
1013 It should hardly matter.
1014 6. As a result of keeping the implementation simple, it says "recursive"
1015 for `data T = MkT [T]`, even though we could argue that the inner recursion
1016 (through the `[]` TyCon) by way of which `T` is recursive will already be
1017 "broken" and thus never unboxed. Consequently, it might be OK to CPR a
1018 function returning `T`. Lacking arguments for or against the current simple
1019 behavior, we stick to it.
1020 7. When the search hits an abstract TyCon (one without visible DataCons, e.g.,
1021 from an .hs-boot file), it returns 'Nothing' for "inconclusive", the same
1022 as when we run out of fuel. If there is ever a recursion through an
1023 abstract TyCon, then it's not part of the same function we are looking at,
1024 so we can treat it as if it wasn't recursive.
1025
1026 Here are a few examples of data constructors or data types with a single data
1027 con and the answers of our function:
1028
1029 data T = T (Int, (Bool, Char)) NonRec
1030 (:) Rec
1031 [] NonRec
1032 data U = U [Int] NonRec
1033 data U2 = U2 [U2] Rec (see point (6))
1034 data T1 = T1 T2; data T2 = T2 T1 Rec
1035 newtype Fix f = Fix (f (Fix f)) Rec
1036 data N = N (Fix (Either Int)) NonRec
1037 data M = M (Fix (Either M)) Rec
1038 data F = F (F -> Int) NonRec (see point (1))
1039 data G = G (Int -> G) NonRec (see point (1))
1040 newtype MyM s a = MyM (StateT Int (ST s) a NonRec
1041 type S = (Int, Bool) NonRec
1042
1043 { type family E a where
1044 E Int = Char
1045 E (a,b) = (E a, E b)
1046 E Char = Blub
1047 data Blah = Blah (E (Int, (Int, Int))) NonRec (see point (5))
1048 data Blub = Blub (E (Char, Int)) Rec
1049 data Blub2 = Blub2 (E (Bool, Int)) } Rec, because stuck
1050
1051 { data T1 = T1 T2; data T2 = T2 T3;
1052 ... data T5 = T5 T1 } Nothing (out of fuel) (see point (4))
1053
1054 { module A where -- A.hs-boot
1055 data T
1056 module B where
1057 import {-# SOURCE #-} A
1058 data U = MkU T
1059 f :: T -> U
1060 f t = MkU t Nothing (T is abstract) (see point (7))
1061 module A where -- A.hs
1062 import B
1063 data T = MkT U }
1064
1065 These examples are tested by the testcase RecDataConCPR.
1066
1067 I've played with the idea to make points (1) through (3) of 'isRecDataCon'
1068 configurable like (4) to enable more re-use throughout the compiler, but haven't
1069 found a killer app for that yet, so ultimately didn't do that.
1070
1071 Note [CPR examples]
1072 ~~~~~~~~~~~~~~~~~~~
1073 Here are some examples (stranal/should_compile/T10482a) of the
1074 usefulness of Note [Optimistic field binder CPR]. The main
1075 point: all of these functions can have the CPR property.
1076
1077 ------- f1 -----------
1078 -- x is used strictly by h, so it'll be available
1079 -- unboxed before it is returned in the True branch
1080
1081 f1 :: Int -> Int
1082 f1 x = case h x x of
1083 True -> x
1084 False -> f1 (x-1)
1085
1086 ------- f3 -----------
1087 -- h is strict in x, so x will be unboxed before it
1088 -- is rerturned in the otherwise case.
1089
1090 data T3 = MkT3 Int Int
1091
1092 f1 :: T3 -> Int
1093 f1 (MkT3 x y) | h x y = f3 (MkT3 x (y-1))
1094 | otherwise = x
1095
1096 Historic Note [Optimistic field binder CPR]
1097 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1098 This Note describes how we used to guess whether fields have the CPR property
1099 before we were able to express Nested CPR for arguments.
1100
1101 Consider
1102
1103 data T a = MkT a
1104 f :: T Int -> Int
1105 f x = ... (case x of
1106 MkT y -> y) ...
1107
1108 And assume we know from strictness analysis that `f` is strict in `x` and its
1109 field `y` and we unbox both. Then we give `x` the CPR property according
1110 to Note [CPR for binders that will be unboxed]. But `x`'s sole field `y`
1111 likewise will be unboxed and it should also get the CPR property. We'd
1112 need a *nested* CPR property here for `x` to express that and unwrap one level
1113 when we analyse the Case to give the CPR property to `y`.
1114
1115 Lacking Nested CPR (hence this Note is historic now that we have Nested CPR), we
1116 have to guess a bit, by looking for
1117
1118 (A) Flat CPR on the scrutinee
1119 (B) A variable scrutinee. Otherwise surely it can't be a parameter.
1120 (C) Strict demand on the field binder `y` (or it binds a strict field)
1121
1122 While (A) is a necessary condition to give a field the CPR property, there are
1123 ways in which (B) and (C) are too lax, leading to unsound analysis results and
1124 thus reboxing in the wrapper:
1125
1126 (b) We could scrutinise some other variable than a parameter, like in
1127
1128 g :: T Int -> Int
1129 g x = let z = foo x in -- assume `z` has CPR property
1130 case z of MkT y -> y
1131
1132 Lacking Nested CPR and multiple levels of unboxing, only the outer box
1133 of `z` will be available and a case on `y` won't actually cancel away.
1134 But it's simple, and nothing terrible happens if we get it wrong. e.g.
1135 #10694.
1136
1137 (c) A strictly used field binder doesn't mean the function is strict in it.
1138
1139 h :: T Int -> Int -> Int
1140 h !x 0 = 0
1141 h x 0 = case x of MkT y -> y
1142
1143 Here, `y` is used strictly, but the field of `x` certainly is not and
1144 consequently will not be available unboxed.
1145 Why not look at the demand of `x` instead to determine whether `y` is
1146 unboxed? Because the 'idDemandInfo' on `x` will not have been propagated
1147 to its occurrence in the scrutinee when CprAnal runs directly after
1148 DmdAnal.
1149
1150 We used to give the case binder the CPR property unconditionally instead of
1151 deriving it from the case scrutinee.
1152 See Historic Note [Optimistic case binder CPR].
1153
1154 Historic Note [Optimistic case binder CPR]
1155 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1156 We used to give the case binder the CPR property unconditionally, which is too
1157 optimistic (#19232). Here are the details:
1158
1159 Inside the alternative, the case binder always has the CPR property, meaning
1160 that a case on it will successfully cancel.
1161 Example:
1162 f True x = case x of y { I# x' -> if x' ==# 3
1163 then y
1164 else I# 8 }
1165 f False x = I# 3
1166 By giving 'y' the CPR property, we ensure that 'f' does too, so we get
1167 f b x = case fw b x of { r -> I# r }
1168 fw True x = case x of y { I# x' -> if x' ==# 3 then x' else 8 }
1169 fw False x = 3
1170 Of course there is the usual risk of re-boxing: we have 'x' available boxed
1171 and unboxed, but we return the unboxed version for the wrapper to box. If the
1172 wrapper doesn't cancel with its caller, we'll end up re-boxing something that
1173 we did have available in boxed form.
1174
1175 -}