never executed always true always false
    1 --
    2 -- Copyright (c) 2014 Joachim Breitner
    3 --
    4 
    5 {-# LANGUAGE BangPatterns #-}
    6 
    7 module GHC.Core.Opt.CallArity
    8     ( callArityAnalProgram
    9     , callArityRHS -- for testing
   10     ) where
   11 
   12 import GHC.Prelude
   13 
   14 import GHC.Types.Var.Set
   15 import GHC.Types.Var.Env
   16 
   17 import GHC.Types.Basic
   18 import GHC.Core
   19 import GHC.Types.Id
   20 import GHC.Core.Opt.Arity ( typeArity )
   21 import GHC.Core.Utils ( exprIsCheap, exprIsTrivial )
   22 import GHC.Data.Graph.UnVar
   23 import GHC.Types.Demand
   24 import GHC.Utils.Misc
   25 
   26 import Control.Arrow ( first, second )
   27 
   28 
   29 {-
   30 %************************************************************************
   31 %*                                                                      *
   32               Call Arity Analysis
   33 %*                                                                      *
   34 %************************************************************************
   35 
   36 Note [Call Arity: The goal]
   37 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
   38 
   39 The goal of this analysis is to find out if we can eta-expand a local function
   40 based on how it is being called. The motivating example is this code,
   41 which comes up when we implement foldl using foldr, and do list fusion:
   42 
   43     let go = \x -> let d = case ... of
   44                               False -> go (x+1)
   45                               True  -> id
   46                    in \z -> d (x + z)
   47     in go 1 0
   48 
   49 If we do not eta-expand `go` to have arity 2, we are going to allocate a lot of
   50 partial function applications, which would be bad.
   51 
   52 The function `go` has a type of arity two, but only one lambda is manifest.
   53 Furthermore, an analysis that only looks at the RHS of go cannot be sufficient
   54 to eta-expand go: If `go` is ever called with one argument (and the result used
   55 multiple times), we would be doing the work in `...` multiple times.
   56 
   57 So `callArityAnalProgram` looks at the whole let expression to figure out if
   58 all calls are nice, i.e. have a high enough arity. It then stores the result in
   59 the `calledArity` field of the `IdInfo` of `go`, which the next simplifier
   60 phase will eta-expand.
   61 
   62 The specification of the `calledArity` field is:
   63 
   64     No work will be lost if you eta-expand me to the arity in `calledArity`.
   65 
   66 What we want to know for a variable
   67 -----------------------------------
   68 
   69 For every let-bound variable we'd like to know:
   70   1. A lower bound on the arity of all calls to the variable, and
   71   2. whether the variable is being called at most once or possibly multiple
   72      times.
   73 
   74 It is always okay to lower the arity, or pretend that there are multiple calls.
   75 In particular, "Minimum arity 0 and possibly called multiple times" is always
   76 correct.
   77 
   78 
   79 What we want to know from an expression
   80 ---------------------------------------
   81 
   82 In order to obtain that information for variables, we analyze expression and
   83 obtain bits of information:
   84 
   85  I.  The arity analysis:
   86      For every variable, whether it is absent, or called,
   87      and if called, with what arity.
   88 
   89  II. The Co-Called analysis:
   90      For every two variables, whether there is a possibility that both are being
   91      called.
   92      We obtain as a special case: For every variable, whether there is a
   93      possibility that it is being called twice.
   94 
   95 For efficiency reasons, we gather this information only for a set of
   96 *interesting variables*, to avoid spending time on, e.g., variables from pattern matches.
   97 
   98 The two analysis are not completely independent, as a higher arity can improve
   99 the information about what variables are being called once or multiple times.
  100 
  101 Note [Analysis I: The arity analysis]
  102 ------------------------------------
  103 
  104 The arity analysis is quite straightforward: The information about an
  105 expression is an
  106     VarEnv Arity
  107 where absent variables are bound to Nothing and otherwise to a lower bound to
  108 their arity.
  109 
  110 When we analyze an expression, we analyze it with a given context arity.
  111 Lambdas decrease and applications increase the incoming arity. Analysing a
  112 variable will put that arity in the environment. In `let`s or `case`s all the
  113 results from the various subexpressions are lub'd, which takes the point-wise
  114 minimum (considering Nothing an infinity).
  115 
  116 
  117 Note [Analysis II: The Co-Called analysis]
  118 ------------------------------------------
  119 
  120 The second part is more sophisticated. For reasons explained below, it is not
  121 sufficient to simply know how often an expression evaluates a variable. Instead
  122 we need to know which variables are possibly called together.
  123 
  124 The data structure here is an undirected graph of variables, which is provided
  125 by the abstract
  126     UnVarGraph
  127 
  128 It is safe to return a larger graph, i.e. one with more edges. The worst case
  129 (i.e. the least useful and always correct result) is the complete graph on all
  130 free variables, which means that anything can be called together with anything
  131 (including itself).
  132 
  133 Notation for the following:
  134 C(e)  is the co-called result for e.
  135 G₁∪G₂ is the union of two graphs
  136 fv    is the set of free variables (conveniently the domain of the arity analysis result)
  137 S₁×S₂ is the complete bipartite graph { {a,b} | a ∈ S₁, b ∈ S₂ }
  138 S²    is the complete graph on the set of variables S, S² = S×S
  139 C'(e) is a variant for bound expression:
  140       If e is called at most once, or it is and stays a thunk (after the analysis),
  141       it is simply C(e). Otherwise, the expression can be called multiple times
  142       and we return (fv e)²
  143 
  144 The interesting cases of the analysis:
  145  * Var v:
  146    No other variables are being called.
  147    Return {} (the empty graph)
  148  * Lambda v e, under arity 0:
  149    This means that e can be evaluated many times and we cannot get
  150    any useful co-call information.
  151    Return (fv e)²
  152  * Case alternatives alt₁,alt₂,...:
  153    Only one can be execuded, so
  154    Return (alt₁ ∪ alt₂ ∪...)
  155  * App e₁ e₂ (and analogously Case scrut alts), with non-trivial e₂:
  156    We get the results from both sides, with the argument evaluated at most once.
  157    Additionally, anything called by e₁ can possibly be called with anything
  158    from e₂.
  159    Return: C(e₁) ∪ C(e₂) ∪ (fv e₁) × (fv e₂)
  160  * App e₁ x:
  161    As this is already in A-normal form, CorePrep will not separately lambda
  162    bind (and hence share) x. So we conservatively assume multiple calls to x here
  163    Return: C(e₁) ∪ (fv e₁) × {x} ∪ {(x,x)}
  164  * Let v = rhs in body:
  165    In addition to the results from the subexpressions, add all co-calls from
  166    everything that the body calls together with v to everything that is called
  167    by v.
  168    Return: C'(rhs) ∪ C(body) ∪ (fv rhs) × {v'| {v,v'} ∈ C(body)}
  169  * Letrec v₁ = rhs₁ ... vₙ = rhsₙ in body
  170    Tricky.
  171    We assume that it is really mutually recursive, i.e. that every variable
  172    calls one of the others, and that this is strongly connected (otherwise we
  173    return an over-approximation, so that's ok), see note [Recursion and fixpointing].
  174 
  175    Let V = {v₁,...vₙ}.
  176    Assume that the vs have been analysed with an incoming demand and
  177    cardinality consistent with the final result (this is the fixed-pointing).
  178    Again we can use the results from all subexpressions.
  179    In addition, for every variable vᵢ, we need to find out what it is called
  180    with (call this set Sᵢ). There are two cases:
  181     * If vᵢ is a function, we need to go through all right-hand-sides and bodies,
  182       and collect every variable that is called together with any variable from V:
  183       Sᵢ = {v' | j ∈ {1,...,n},      {v',vⱼ} ∈ C'(rhs₁) ∪ ... ∪ C'(rhsₙ) ∪ C(body) }
  184     * If vᵢ is a thunk, then its rhs is evaluated only once, so we need to
  185       exclude it from this set:
  186       Sᵢ = {v' | j ∈ {1,...,n}, j≠i, {v',vⱼ} ∈ C'(rhs₁) ∪ ... ∪ C'(rhsₙ) ∪ C(body) }
  187    Finally, combine all this:
  188    Return: C(body) ∪
  189            C'(rhs₁) ∪ ... ∪ C'(rhsₙ) ∪
  190            (fv rhs₁) × S₁) ∪ ... ∪ (fv rhsₙ) × Sₙ)
  191 
  192 Using the result: Eta-Expansion
  193 -------------------------------
  194 
  195 We use the result of these two analyses to decide whether we can eta-expand the
  196 rhs of a let-bound variable.
  197 
  198 If the variable is already a function (exprIsCheap), and all calls to the
  199 variables have a higher arity than the current manifest arity (i.e. the number
  200 of lambdas), expand.
  201 
  202 If the variable is a thunk we must be careful: Eta-Expansion will prevent
  203 sharing of work, so this is only safe if there is at most one call to the
  204 function. Therefore, we check whether {v,v} ∈ G.
  205 
  206     Example:
  207 
  208         let n = case .. of .. -- A thunk!
  209         in n 0 + n 1
  210 
  211     vs.
  212 
  213         let n = case .. of ..
  214         in case .. of T -> n 0
  215                       F -> n 1
  216 
  217     We are only allowed to eta-expand `n` if it is going to be called at most
  218     once in the body of the outer let. So we need to know, for each variable
  219     individually, that it is going to be called at most once.
  220 
  221 
  222 Why the co-call graph?
  223 ----------------------
  224 
  225 Why is it not sufficient to simply remember which variables are called once and
  226 which are called multiple times? It would be in the previous example, but consider
  227 
  228         let n = case .. of ..
  229         in case .. of
  230             True -> let go = \y -> case .. of
  231                                      True -> go (y + n 1)
  232                                      False > n
  233                     in go 1
  234             False -> n
  235 
  236 vs.
  237 
  238         let n = case .. of ..
  239         in case .. of
  240             True -> let go = \y -> case .. of
  241                                      True -> go (y+1)
  242                                      False > n
  243                     in go 1
  244             False -> n
  245 
  246 In both cases, the body and the rhs of the inner let call n at most once.
  247 But only in the second case that holds for the whole expression! The
  248 crucial difference is that in the first case, the rhs of `go` can call
  249 *both* `go` and `n`, and hence can call `n` multiple times as it recurses,
  250 while in the second case find out that `go` and `n` are not called together.
  251 
  252 
  253 Why co-call information for functions?
  254 --------------------------------------
  255 
  256 Although for eta-expansion we need the information only for thunks, we still
  257 need to know whether functions are being called once or multiple times, and
  258 together with what other functions.
  259 
  260     Example:
  261 
  262         let n = case .. of ..
  263             f x = n (x+1)
  264         in f 1 + f 2
  265 
  266     vs.
  267 
  268         let n = case .. of ..
  269             f x = n (x+1)
  270         in case .. of T -> f 0
  271                       F -> f 1
  272 
  273     Here, the body of f calls n exactly once, but f itself is being called
  274     multiple times, so eta-expansion is not allowed.
  275 
  276 
  277 Note [Analysis type signature]
  278 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  279 
  280 The work-hourse of the analysis is the function `callArityAnal`, with the
  281 following type:
  282 
  283     type CallArityRes = (UnVarGraph, VarEnv Arity)
  284     callArityAnal ::
  285         Arity ->  -- The arity this expression is called with
  286         VarSet -> -- The set of interesting variables
  287         CoreExpr ->  -- The expression to analyse
  288         (CallArityRes, CoreExpr)
  289 
  290 and the following specification:
  291 
  292   ((coCalls, callArityEnv), expr') = callArityEnv arity interestingIds expr
  293 
  294                             <=>
  295 
  296   Assume the expression `expr` is being passed `arity` arguments. Then it holds that
  297     * The domain of `callArityEnv` is a subset of `interestingIds`.
  298     * Any variable from `interestingIds` that is not mentioned in the `callArityEnv`
  299       is absent, i.e. not called at all.
  300     * Every call from `expr` to a variable bound to n in `callArityEnv` has at
  301       least n value arguments.
  302     * For two interesting variables `v1` and `v2`, they are not adjacent in `coCalls`,
  303       then in no execution of `expr` both are being called.
  304   Furthermore, expr' is expr with the callArity field of the `IdInfo` updated.
  305 
  306 
  307 Note [Which variables are interesting]
  308 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  309 
  310 The analysis would quickly become prohibitive expensive if we would analyse all
  311 variables; for most variables we simply do not care about how often they are
  312 called, i.e. variables bound in a pattern match. So interesting are variables that are
  313  * top-level or let bound
  314  * and possibly functions (typeArity > 0)
  315 
  316 Note [Taking boring variables into account]
  317 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  318 
  319 If we decide that the variable bound in `let x = e1 in e2` is not interesting,
  320 the analysis of `e2` will not report anything about `x`. To ensure that
  321 `callArityBind` does still do the right thing we have to take that into account
  322 every time we would be lookup up `x` in the analysis result of `e2`.
  323   * Instead of calling lookupCallArityRes, we return (0, True), indicating
  324     that this variable might be called many times with no arguments.
  325   * Instead of checking `calledWith x`, we assume that everything can be called
  326     with it.
  327   * In the recursive case, when calclulating the `cross_calls`, if there is
  328     any boring variable in the recursive group, we ignore all co-call-results
  329     and directly go to a very conservative assumption.
  330 
  331 The last point has the nice side effect that the relatively expensive
  332 integration of co-call results in a recursive groups is often skipped. This
  333 helped to avoid the compile time blowup in some real-world code with large
  334 recursive groups (#10293).
  335 
  336 Note [Recursion and fixpointing]
  337 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  338 
  339 For a mutually recursive let, we begin by
  340  1. analysing the body, using the same incoming arity as for the whole expression.
  341  2. Then we iterate, memoizing for each of the bound variables the last
  342     analysis call, i.e. incoming arity, whether it is called once, and the CallArityRes.
  343  3. We combine the analysis result from the body and the memoized results for
  344     the arguments (if already present).
  345  4. For each variable, we find out the incoming arity and whether it is called
  346     once, based on the current analysis result. If this differs from the
  347     memoized results, we re-analyse the rhs and update the memoized table.
  348  5. If nothing had to be reanalyzed, we are done.
  349     Otherwise, repeat from step 3.
  350 
  351 
  352 Note [Thunks in recursive groups]
  353 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  354 
  355 We never eta-expand a thunk in a recursive group, on the grounds that if it is
  356 part of a recursive group, then it will be called multiple times.
  357 
  358 This is not necessarily true, e.g.  it would be safe to eta-expand t2 (but not
  359 t1) in the following code:
  360 
  361   let go x = t1
  362       t1 = if ... then t2 else ...
  363       t2 = if ... then go 1 else ...
  364   in go 0
  365 
  366 Detecting this would require finding out what variables are only ever called
  367 from thunks. While this is certainly possible, we yet have to see this to be
  368 relevant in the wild.
  369 
  370 
  371 Note [Analysing top-level binds]
  372 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  373 
  374 We can eta-expand top-level-binds if they are not exported, as we see all calls
  375 to them. The plan is as follows: Treat the top-level binds as nested lets around
  376 a body representing “all external calls”, which returns a pessimistic
  377 CallArityRes (the co-call graph is the complete graph, all arityies 0).
  378 
  379 Note [Trimming arity]
  380 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  381 
  382 In the Call Arity papers, we are working on an untyped lambda calculus with no
  383 other id annotations, where eta-expansion is always possible. But this is not
  384 the case for Core!
  385  1. We need to ensure the invariant
  386       callArity e <= typeArity (exprType e)
  387     for the same reasons that exprArity needs this invariant (see Note
  388     [exprArity invariant] in GHC.Core.Opt.Arity).
  389 
  390     If we are not doing that, a too-high arity annotation will be stored with
  391     the id, confusing the simplifier later on.
  392 
  393  2. Eta-expanding a right hand side might invalidate existing annotations. In
  394     particular, if an id has a strictness annotation of <...><...>b, then
  395     passing two arguments to it will definitely bottom out, so the simplifier
  396     will throw away additional parameters. This conflicts with Call Arity! So
  397     we ensure that we never eta-expand such a value beyond the number of
  398     arguments mentioned in the strictness signature.
  399     See #10176 for a real-world-example.
  400 
  401 Note [What is a thunk]
  402 ~~~~~~~~~~~~~~~~~~~~~~
  403 
  404 Originally, everything that is not in WHNF (`exprIsWHNF`) is considered a
  405 thunk, not eta-expanded, to avoid losing any sharing. This is also how the
  406 published papers on Call Arity describe it.
  407 
  408 In practice, there are thunks that do a just little work, such as
  409 pattern-matching on a variable, and the benefits of eta-expansion likely
  410 outweigh the cost of doing that repeatedly. Therefore, this implementation of
  411 Call Arity considers everything that is not cheap (`exprIsCheap`) as a thunk.
  412 
  413 Note [Call Arity and Join Points]
  414 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  415 
  416 The Call Arity analysis does not care about join points, and treats them just
  417 like normal functions. This is ok.
  418 
  419 The analysis *could* make use of the fact that join points are always evaluated
  420 in the same context as the join-binding they are defined in and are always
  421 one-shot, and handle join points separately, as suggested in
  422 https://gitlab.haskell.org/ghc/ghc/issues/13479#note_134870.
  423 This *might* be more efficient (for example, join points would not have to be
  424 considered interesting variables), but it would also add redundant code. So for
  425 now we do not do that.
  426 
  427 The simplifier never eta-expands join points (it instead pushes extra arguments from
  428 an eta-expanded context into the join point’s RHS), so the call arity
  429 annotation on join points is not actually used. As it would be equally valid
  430 (though less efficient) to eta-expand join points, this is the simplifier's
  431 choice, and hence Call Arity sets the call arity for join points as well.
  432 -}
  433 
  434 -- Main entry point
  435 
  436 callArityAnalProgram :: CoreProgram -> CoreProgram
  437 callArityAnalProgram binds = binds'
  438   where
  439     (_, binds') = callArityTopLvl [] emptyVarSet binds
  440 
  441 -- See Note [Analysing top-level-binds]
  442 callArityTopLvl :: [Var] -> VarSet -> [CoreBind] -> (CallArityRes, [CoreBind])
  443 callArityTopLvl exported _ []
  444     = ( calledMultipleTimes $ (emptyUnVarGraph, mkVarEnv $ [(v, 0) | v <- exported])
  445       , [] )
  446 callArityTopLvl exported int1 (b:bs)
  447     = (ae2, b':bs')
  448   where
  449     int2 = bindersOf b
  450     exported' = filter isExportedId int2 ++ exported
  451     int' = int1 `addInterestingBinds` b
  452     (ae1, bs') = callArityTopLvl exported' int' bs
  453     (ae2, b')  = callArityBind (boringBinds b) ae1 int1 b
  454 
  455 
  456 callArityRHS :: CoreExpr -> CoreExpr
  457 callArityRHS = snd . callArityAnal 0 emptyVarSet
  458 
  459 -- The main analysis function. See Note [Analysis type signature]
  460 callArityAnal ::
  461     Arity ->  -- The arity this expression is called with
  462     VarSet -> -- The set of interesting variables
  463     CoreExpr ->  -- The expression to analyse
  464     (CallArityRes, CoreExpr)
  465         -- How this expression uses its interesting variables
  466         -- and the expression with IdInfo updated
  467 
  468 -- The trivial base cases
  469 callArityAnal _     _   e@(Lit _)
  470     = (emptyArityRes, e)
  471 callArityAnal _     _   e@(Type _)
  472     = (emptyArityRes, e)
  473 callArityAnal _     _   e@(Coercion _)
  474     = (emptyArityRes, e)
  475 -- The transparent cases
  476 callArityAnal arity int (Tick t e)
  477     = second (Tick t) $ callArityAnal arity int e
  478 callArityAnal arity int (Cast e co)
  479     = second (\e -> Cast e co) $ callArityAnal arity int e
  480 
  481 -- The interesting case: Variables, Lambdas, Lets, Applications, Cases
  482 callArityAnal arity int e@(Var v)
  483     | v `elemVarSet` int
  484     = (unitArityRes v arity, e)
  485     | otherwise
  486     = (emptyArityRes, e)
  487 
  488 -- Non-value lambdas are ignored
  489 callArityAnal arity int (Lam v e) | not (isId v)
  490     = second (Lam v) $ callArityAnal arity (int `delVarSet` v) e
  491 
  492 -- We have a lambda that may be called multiple times, so its free variables
  493 -- can all be co-called.
  494 callArityAnal 0     int (Lam v e)
  495     = (ae', Lam v e')
  496   where
  497     (ae, e') = callArityAnal 0 (int `delVarSet` v) e
  498     ae' = calledMultipleTimes ae
  499 -- We have a lambda that we are calling. decrease arity.
  500 callArityAnal arity int (Lam v e)
  501     = (ae, Lam v e')
  502   where
  503     (ae, e') = callArityAnal (arity - 1) (int `delVarSet` v) e
  504 
  505 -- Application. Increase arity for the called expression, nothing to know about
  506 -- the second
  507 callArityAnal arity int (App e (Type t))
  508     = second (\e -> App e (Type t)) $ callArityAnal arity int e
  509 callArityAnal arity int (App e1 e2)
  510     = (final_ae, App e1' e2')
  511   where
  512     (ae1, e1') = callArityAnal (arity + 1) int e1
  513     (ae2, e2') = callArityAnal 0           int e2
  514     -- If the argument is trivial (e.g. a variable), then it will _not_ be
  515     -- let-bound in the Core to STG transformation (CorePrep actually),
  516     -- so no sharing will happen here, and we have to assume many calls.
  517     ae2' | exprIsTrivial e2 = calledMultipleTimes ae2
  518          | otherwise        = ae2
  519     final_ae = ae1 `both` ae2'
  520 
  521 -- Case expression.
  522 callArityAnal arity int (Case scrut bndr ty alts)
  523     = -- pprTrace "callArityAnal:Case"
  524       --          (vcat [ppr scrut, ppr final_ae])
  525       (final_ae, Case scrut' bndr ty alts')
  526   where
  527     (alt_aes, alts') = unzip $ map go alts
  528     go (Alt dc bndrs e) = let (ae, e') = callArityAnal arity (int `delVarSetList` (bndr:bndrs)) e
  529                           in  (ae, Alt dc bndrs e')
  530     alt_ae = lubRess alt_aes
  531     (scrut_ae, scrut') = callArityAnal 0 int scrut
  532     final_ae = scrut_ae `both` alt_ae
  533 
  534 -- For lets, use callArityBind
  535 callArityAnal arity int (Let bind e)
  536   = -- pprTrace "callArityAnal:Let"
  537     --          (vcat [ppr v, ppr arity, ppr n, ppr final_ae ])
  538     (final_ae, Let bind' e')
  539   where
  540     int_body = int `addInterestingBinds` bind
  541     (ae_body, e') = callArityAnal arity int_body e
  542     (final_ae, bind') = callArityBind (boringBinds bind) ae_body int bind
  543 
  544 -- Which bindings should we look at?
  545 -- See Note [Which variables are interesting]
  546 isInteresting :: Var -> Bool
  547 isInteresting v = not $ null (typeArity (idType v))
  548 
  549 interestingBinds :: CoreBind -> [Var]
  550 interestingBinds = filter isInteresting . bindersOf
  551 
  552 boringBinds :: CoreBind -> VarSet
  553 boringBinds = mkVarSet . filter (not . isInteresting) . bindersOf
  554 
  555 addInterestingBinds :: VarSet -> CoreBind -> VarSet
  556 addInterestingBinds int bind
  557     = int `delVarSetList`    bindersOf bind -- Possible shadowing
  558           `extendVarSetList` interestingBinds bind
  559 
  560 -- Used for both local and top-level binds
  561 -- Second argument is the demand from the body
  562 callArityBind :: VarSet -> CallArityRes -> VarSet -> CoreBind -> (CallArityRes, CoreBind)
  563 -- Non-recursive let
  564 callArityBind boring_vars ae_body int (NonRec v rhs)
  565   | otherwise
  566   = -- pprTrace "callArityBind:NonRec"
  567     --          (vcat [ppr v, ppr ae_body, ppr int, ppr ae_rhs, ppr safe_arity])
  568     (final_ae, NonRec v' rhs')
  569   where
  570     is_thunk = not (exprIsCheap rhs) -- see note [What is a thunk]
  571     -- If v is boring, we will not find it in ae_body, but always assume (0, False)
  572     boring = v `elemVarSet` boring_vars
  573 
  574     (arity, called_once)
  575         | boring    = (0, False) -- See Note [Taking boring variables into account]
  576         | otherwise = lookupCallArityRes ae_body v
  577     safe_arity | called_once = arity
  578                | is_thunk    = 0      -- A thunk! Do not eta-expand
  579                | otherwise   = arity
  580 
  581     -- See Note [Trimming arity]
  582     trimmed_arity = trimArity v safe_arity
  583 
  584     (ae_rhs, rhs') = callArityAnal trimmed_arity int rhs
  585 
  586 
  587     ae_rhs'| called_once     = ae_rhs
  588            | safe_arity == 0 = ae_rhs -- If it is not a function, its body is evaluated only once
  589            | otherwise       = calledMultipleTimes ae_rhs
  590 
  591     called_by_v = domRes ae_rhs'
  592     called_with_v
  593         | boring    = domRes ae_body
  594         | otherwise = calledWith ae_body v `delUnVarSet` v
  595     final_ae = addCrossCoCalls called_by_v called_with_v $ ae_rhs' `lubRes` resDel v ae_body
  596 
  597     v' = v `setIdCallArity` trimmed_arity
  598 
  599 
  600 -- Recursive let. See Note [Recursion and fixpointing]
  601 callArityBind boring_vars ae_body int b@(Rec binds)
  602   = -- (if length binds > 300 then
  603     -- pprTrace "callArityBind:Rec"
  604     --           (vcat [ppr (Rec binds'), ppr ae_body, ppr int, ppr ae_rhs]) else id) $
  605     (final_ae, Rec binds')
  606   where
  607     -- See Note [Taking boring variables into account]
  608     any_boring = any (`elemVarSet` boring_vars) [ i | (i, _) <- binds]
  609 
  610     int_body = int `addInterestingBinds` b
  611     (ae_rhs, binds') = fix initial_binds
  612     final_ae = bindersOf b `resDelList` ae_rhs
  613 
  614     initial_binds = [(i,Nothing,e) | (i,e) <- binds]
  615 
  616     fix :: [(Id, Maybe (Bool, Arity, CallArityRes), CoreExpr)] -> (CallArityRes, [(Id, CoreExpr)])
  617     fix ann_binds
  618         | -- pprTrace "callArityBind:fix" (vcat [ppr ann_binds, ppr any_change, ppr ae]) $
  619           any_change
  620         = fix ann_binds'
  621         | otherwise
  622         = (ae, map (\(i, _, e) -> (i, e)) ann_binds')
  623       where
  624         aes_old = [ (i,ae) | (i, Just (_,_,ae), _) <- ann_binds ]
  625         ae = callArityRecEnv any_boring aes_old ae_body
  626 
  627         rerun (i, mbLastRun, rhs)
  628             | i `elemVarSet` int_body && not (i `elemUnVarSet` domRes ae)
  629             -- No call to this yet, so do nothing
  630             = (False, (i, Nothing, rhs))
  631 
  632             | Just (old_called_once, old_arity, _) <- mbLastRun
  633             , called_once == old_called_once
  634             , new_arity == old_arity
  635             -- No change, no need to re-analyze
  636             = (False, (i, mbLastRun, rhs))
  637 
  638             | otherwise
  639             -- We previously analyzed this with a different arity (or not at all)
  640             = let is_thunk = not (exprIsCheap rhs) -- see note [What is a thunk]
  641 
  642                   safe_arity | is_thunk    = 0  -- See Note [Thunks in recursive groups]
  643                              | otherwise   = new_arity
  644 
  645                   -- See Note [Trimming arity]
  646                   trimmed_arity = trimArity i safe_arity
  647 
  648                   (ae_rhs, rhs') = callArityAnal trimmed_arity int_body rhs
  649 
  650                   ae_rhs' | called_once     = ae_rhs
  651                           | safe_arity == 0 = ae_rhs -- If it is not a function, its body is evaluated only once
  652                           | otherwise       = calledMultipleTimes ae_rhs
  653 
  654                   i' = i `setIdCallArity` trimmed_arity
  655 
  656               in (True, (i', Just (called_once, new_arity, ae_rhs'), rhs'))
  657           where
  658             -- See Note [Taking boring variables into account]
  659             (new_arity, called_once) | i `elemVarSet` boring_vars = (0, False)
  660                                      | otherwise                  = lookupCallArityRes ae i
  661 
  662         (changes, ann_binds') = unzip $ map rerun ann_binds
  663         any_change = or changes
  664 
  665 -- Combining the results from body and rhs, (mutually) recursive case
  666 -- See Note [Analysis II: The Co-Called analysis]
  667 callArityRecEnv :: Bool -> [(Var, CallArityRes)] -> CallArityRes -> CallArityRes
  668 callArityRecEnv any_boring ae_rhss ae_body
  669     = -- (if length ae_rhss > 300 then pprTrace "callArityRecEnv" (vcat [ppr ae_rhss, ppr ae_body, ppr ae_new]) else id) $
  670       ae_new
  671   where
  672     vars = map fst ae_rhss
  673 
  674     ae_combined = lubRess (map snd ae_rhss) `lubRes` ae_body
  675 
  676     cross_calls
  677         -- See Note [Taking boring variables into account]
  678         | any_boring               = completeGraph (domRes ae_combined)
  679         -- Also, calculating cross_calls is expensive. Simply be conservative
  680         -- if the mutually recursive group becomes too large.
  681         | lengthExceeds ae_rhss 25 = completeGraph (domRes ae_combined)
  682         | otherwise                = unionUnVarGraphs $ map cross_call ae_rhss
  683     cross_call (v, ae_rhs) = completeBipartiteGraph called_by_v called_with_v
  684       where
  685         is_thunk = idCallArity v == 0
  686         -- What rhs are relevant as happening before (or after) calling v?
  687         --    If v is a thunk, everything from all the _other_ variables
  688         --    If v is not a thunk, everything can happen.
  689         ae_before_v | is_thunk  = lubRess (map snd $ filter ((/= v) . fst) ae_rhss) `lubRes` ae_body
  690                     | otherwise = ae_combined
  691         -- What do we want to know from these?
  692         -- Which calls can happen next to any recursive call.
  693         called_with_v
  694             = unionUnVarSets $ map (calledWith ae_before_v) vars
  695         called_by_v = domRes ae_rhs
  696 
  697     ae_new = first (cross_calls `unionUnVarGraph`) ae_combined
  698 
  699 -- See Note [Trimming arity]
  700 trimArity :: Id -> Arity -> Arity
  701 trimArity v a = minimum [a, max_arity_by_type, max_arity_by_strsig]
  702   where
  703     max_arity_by_type = length (typeArity (idType v))
  704     max_arity_by_strsig
  705         | isDeadEndDiv result_info = length demands
  706         | otherwise = a
  707 
  708     (demands, result_info) = splitDmdSig (idDmdSig v)
  709 
  710 ---------------------------------------
  711 -- Functions related to CallArityRes --
  712 ---------------------------------------
  713 
  714 -- Result type for the two analyses.
  715 -- See Note [Analysis I: The arity analysis]
  716 -- and Note [Analysis II: The Co-Called analysis]
  717 type CallArityRes = (UnVarGraph, VarEnv Arity)
  718 
  719 emptyArityRes :: CallArityRes
  720 emptyArityRes = (emptyUnVarGraph, emptyVarEnv)
  721 
  722 unitArityRes :: Var -> Arity -> CallArityRes
  723 unitArityRes v arity = (emptyUnVarGraph, unitVarEnv v arity)
  724 
  725 resDelList :: [Var] -> CallArityRes -> CallArityRes
  726 resDelList vs ae = foldl' (flip resDel) ae vs
  727 
  728 resDel :: Var -> CallArityRes -> CallArityRes
  729 resDel v (!g, !ae) = (g `delNode` v, ae `delVarEnv` v)
  730 
  731 domRes :: CallArityRes -> UnVarSet
  732 domRes (_, ae) = varEnvDom ae
  733 
  734 -- In the result, find out the minimum arity and whether the variable is called
  735 -- at most once.
  736 lookupCallArityRes :: CallArityRes -> Var -> (Arity, Bool)
  737 lookupCallArityRes (g, ae) v
  738     = case lookupVarEnv ae v of
  739         Just a -> (a, not (g `hasLoopAt` v))
  740         Nothing -> (0, False)
  741 
  742 calledWith :: CallArityRes -> Var -> UnVarSet
  743 calledWith (g, _) v = neighbors g v
  744 
  745 addCrossCoCalls :: UnVarSet -> UnVarSet -> CallArityRes -> CallArityRes
  746 addCrossCoCalls set1 set2 = first (completeBipartiteGraph set1 set2 `unionUnVarGraph`)
  747 
  748 -- Replaces the co-call graph by a complete graph (i.e. no information)
  749 calledMultipleTimes :: CallArityRes -> CallArityRes
  750 calledMultipleTimes res = first (const (completeGraph (domRes res))) res
  751 
  752 -- Used for application and cases
  753 both :: CallArityRes -> CallArityRes -> CallArityRes
  754 both r1 r2 = addCrossCoCalls (domRes r1) (domRes r2) $ r1 `lubRes` r2
  755 
  756 -- Used when combining results from alternative cases; take the minimum
  757 lubRes :: CallArityRes -> CallArityRes -> CallArityRes
  758 lubRes (g1, ae1) (g2, ae2) = (g1 `unionUnVarGraph` g2, ae1 `lubArityEnv` ae2)
  759 
  760 lubArityEnv :: VarEnv Arity -> VarEnv Arity -> VarEnv Arity
  761 lubArityEnv = plusVarEnv_C min
  762 
  763 lubRess :: [CallArityRes] -> CallArityRes
  764 lubRess = foldl' lubRes emptyArityRes