#| 

  Clan size in Racket.Gamble 

  https://brainstellar.com/puzzles/probability/1009
  """
  The people in a country are partitioned into clans. In order to estimate 
  the average size of a clan, a survey is conducted where 1000 randomly 
  selected people are asked to state the size of the clan to which they belong. 
  How does one compute an estimate average clan size from the data collected?

  Solution: This is more of a puzzle-to-ponder rather than a puzzle to learn. 
  In my opinion, best estimator for average is sum( n )/ sum( #n/n), where 
  #n is the number of people with clan size as 'n', and this sum is over all 
  the values of 'n' we receive.
  """

  If clan sizes follow a Poisson distribution, picking just one card is enough
  to get a fairly good estimate (at least on average). Here is an experiment
  with 10 people where we look at just one of the cards:

var : real-avg
122.3: 0.02099999999999999
120.3: 0.01999999999999999
123.0: 0.01899999999999999
121.1: 0.01699999999999999
122.1: 0.01699999999999999
...
114.5: 0.0009999999999999994
129.4: 0.0009999999999999994
128.9: 0.0009999999999999994
112.8: 0.0009999999999999994
112.3: 0.0009999999999999994
mean: 121.00989999999986
Credible interval (0.84): 116.2..125.5

var : est-avg
123.0: 0.040999999999999995
127.0: 0.04099999999999999
126.0: 0.039999999999999994
120.0: 0.039
122.0: 0.03899999999999999
...
151.0: 0.0009999999999999994
150.0: 0.0009999999999999994
92.0: 0.0009999999999999994
93.0: 0.0009999999999999994
156.0: 0.0009999999999999994
mean: 121.52299999999998
Credible interval (0.84): 104.0..135.0

var : est-diff
4.099999999999994: 0.014999999999999989
1.5: 0.014999999999999989
0.29999999999999716: 0.01299999999999999
4.299999999999997: 0.011999999999999992
2.0999999999999943: 0.011999999999999992
...
14.599999999999994: 0.0009999999999999994
15.0: 0.0009999999999999994
18.799999999999997: 0.0009999999999999994
15.5: 0.0009999999999999994
22.099999999999994: 0.0009999999999999994
mean: 7.974499999999997
Credible interval (0.84): 0.0..14.900000000000006

var : formula
123: 0.040999999999999995
127: 0.04099999999999999
126: 0.039999999999999994
120: 0.039
122: 0.03899999999999999
...
151: 0.0009999999999999994
89: 0.0009999999999999994
156: 0.0009999999999999994
92: 0.0009999999999999994
93: 0.0009999999999999994
mean: 121.52299999999998
Credible interval (0.84): 104..135


  * Using 1+(random-integer 100)

var : real-avg
51.71: 0.004999999999999997
53.66: 0.004999999999999997
50.01: 0.004999999999999997
49.52: 0.004999999999999997
51.63: 0.004999999999999997
...
51.76: 0.0009999999999999994
51.56: 0.0009999999999999994
51.61: 0.0009999999999999994
51.11: 0.0009999999999999994
44.79: 0.0009999999999999994
mean: 50.29121999999994
Credible interval (0.84): 45.85..54.23

var : est-avg
99.0: 0.020999999999999994
17.0: 0.01799999999999999
91.0: 0.01799999999999999
27.0: 0.01599999999999999
68.0: 0.014999999999999989
...
14.0: 0.004999999999999997
47.0: 0.004999999999999997
55.0: 0.0039999999999999975
51.0: 0.0029999999999999983
7.0: 0.0019999999999999987
mean: 50.33399999999997
Credible interval (0.84): 1.0..84.0

var : est-diff
42.96: 0.0029999999999999983
11.189999999999998: 0.0029999999999999983
25.369999999999997: 0.0019999999999999987
29.549999999999997: 0.0019999999999999987
9.420000000000002: 0.0019999999999999987
...
38.82: 0.0009999999999999994
44.29: 0.0009999999999999994
51.26: 0.0009999999999999994
38.17: 0.0009999999999999994
19.409999999999997: 0.0009999999999999994
mean: 25.346959999999974
Credible interval (0.84): 3.0799999999999983..44.81

var : formula
99: 0.020999999999999994
17: 0.01799999999999999
91: 0.01799999999999999
27: 0.01599999999999999
68: 0.014999999999999989
...
73: 0.004999999999999997
75: 0.004999999999999997
55: 0.0039999999999999975
51: 0.0029999999999999983
7: 0.0019999999999999987
mean: 50.33399999999997
Credible interval (0.84): 1..84


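  However, for "wilder" distributions such as the exponential or Cauchy, this
  method does not work well. (The Cauchy distribution has no finite mean, and
  here it also produces negative, non-integer "clan sizes".) For example, here
  is an experiment with 100 people, a sample size of 10, and (cauchy 1 3):

var : real-avg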
4.9945013861507235: 0.0009999999999999994
-1.9282693164066032: 0.0009999999999999994
-5.78385306662126: 0.0009999999999999994
5.834977904097586: 0.0009999999999999994
4.338080853485787: 0.0009999999999999994
...
-2.8382552508601298: 0.0009999999999999994
7.893463081046366: 0.0009999999999999994
-9.467293987574438: 0.0009999999999999994
2.690621430563473: 0.0009999999999999994
20.839359467014873: 0.0009999999999999994
mean: 5.504197112000554
Credible interval (0.84): -10.289982577554063..15.503944495128742

var : est-avg
-42.079793684950864: 0.0009999999999999994
4.862897359572078: 0.0009999999999999994
7.047966444002657: 0.0009999999999999994
20.278161939723113: 0.0009999999999999994
4.882008351763264: 0.0009999999999999994
...
-3.2573329715359343: 0.0009999999999999994
3.906248404409567: 0.0009999999999999994
-7.760236238256129: 0.0009999999999999994
6.2403040456566625: 0.0009999999999999994
0.4132067650224469: 0.0009999999999999994
mean: 7.505723777389244
Credible interval (0.84): -9.445547638340894..13.633543061517742

var : est-diff
10.162322947279558: 0.0009999999999999994
11.59943510872656: 0.0009999999999999994
36.109041678191325: 0.0009999999999999994
2.042931594378473: 0.0009999999999999994
3.6757305724692033: 0.0009999999999999994
...
4.710982185081698: 0.0009999999999999994
4.764455581609525: 0.0009999999999999994
2.243272917092363: 0.0009999999999999994
0.3511945960294707: 0.0009999999999999994
57.8534979456945: 0.0009999999999999994
mean: 29.97468506741003
Credible interval (0.84): 0.009552321198864178..21.150784614798418

var : formula
-42.079793684950864: 0.0009999999999999994
4.862897359572078: 0.0009999999999999994
7.047966444002657: 0.0009999999999999994
20.278161939723113: 0.0009999999999999994
4.882008351763264: 0.0009999999999999994
...
-3.2573329715359343: 0.0009999999999999994
3.906248404409567: 0.0009999999999999994
-7.760236238256129: 0.0009999999999999994
6.2403040456566625: 0.0009999999999999994
0.4132067650224469: 0.0009999999999999994
mean: 7.505723777389244
Credible interval (0.84): -9.445547638340894..13.633543061517742


  This program was created by Hakan Kjellerstrand, hakank@gmail.com
  See also my Racket page: http://www.hakank.org/racket/

|#

#lang gamble

; (require gamble/viz)
(require racket)
(require "gamble_utils.rkt")
; (require "gamble_distributions.rkt")
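
; A minimal sketch of the Solution's estimator, sum(#n)/sum(#n/n). It reads
; the numerator as the number of respondents (the sum of #n), and that
; reading is an assumption. The helper name clan-size-estimate is
; hypothetical and is not part of gamble_utils.rkt.
;
; Summing 1/n once per respondent gives sum(#n/n) without grouping by n, so
; the estimate is the harmonic mean of the reported sizes. This corrects for
; the size bias of the survey: a randomly chosen person belongs to a given
; clan of size n with probability proportional to n, so large clans are
; over-represented in the reports.
;
; Assumes `reported` is a non-empty list of positive sizes.
(define (clan-size-estimate reported)
  ;; sum over n of #n: the number of respondents
  (define num-respondents (length reported))
  ;; sum over n of #n/n: each respondent counts as 1/n of a clan
  (define clan-weight (for/sum ([n (in-list reported)]) (/ 1 n)))
  (/ num-respondents clan-weight))

; Example: asking everyone in three clans of sizes 1, 2 and 3 yields the
; reports (1 2 2 3 3 3). Their plain mean is 7/3, while
; (clan-size-estimate '(1 2 2 3 3 3)) returns 2, the true average clan size.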


(define (model)
  (; enumerate
   ; rejection-sampler
   importance-sampler
   ; mh-sampler

   (define num-people 100)
   (define num-to-pick 1) ; Number of samples to pick

   (define (clans i) (add1 (poisson 100)))
   ; (define (clans i) (add1 (random-integer 10)))
   ; (define (clans i) (add1 (normal 100 15)))
   ; (define (clans i) (add1 (exponential 100)))
   ; (define (clans i) (add1 (cauchy 1 3)))      

   (define all-clans (for/list ([i num-people]) (clans i)))
   (define real-avg (* 1.0 (avg all-clans)))

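   ; The sample: take the first num-to-pick clan cards. Since the entries of
   ; all-clans are i.i.d., this is a uniform sample of clans, not a
   ; size-biased sample of people as in the puzzle.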
   (define the-sample (take all-clans num-to-pick))
   (define est-avg (* 1.0 (avg the-sample)))
   
   (define est-diff (abs (- est-avg real-avg)))

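   ; Meant as the formula in the Solution part, but as written it computes
   ; sum(n)/sum(#n), i.e. the plain sample mean, which is why it gives the
   ; same result as est-avg. (The Solution divides each #n by n.)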
   (define formula (/ (sum the-sample) (sum (hash-values (collect the-sample)))))
   
   (list real-avg
         est-avg
         est-diff
         formula
         )
   
   )
)
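
; The model above samples clans: taking the first num-to-pick entries of the
; i.i.d. list all-clans amounts to picking clans uniformly. In the puzzle,
; people are sampled instead, so larger clans are over-represented.
;
; A minimal sketch of such a survey follows. The helper names are
; hypothetical (not part of the original program or gamble_utils.rkt), and
; clan sizes are assumed to be integers >= 1.
(require racket/list) ; take, shuffle (also provided by racket above)

(define (expand-to-people clan-sizes)
  ;; one entry per person, holding the size of that person's clan
  (for*/list ([n (in-list clan-sizes)]
              [_ (in-range n)])
    n))

(define (survey-people clan-sizes k)
  ;; Ask k distinct, uniformly chosen people for their clan size
  ;; (requires k <= total number of people).
  ;; shuffle is plain Racket randomness, not a Gamble random choice, so
  ;; this is presumably only suitable for forward simulation (as in the
  ;; importance-sampler run above), not for conditioning inside a sampler.
  (take (shuffle (expand-to-people clan-sizes)) k))

; Example: (expand-to-people '(1 2 3)) returns '(1 2 2 3 3 3), and
; (survey-people '(1 2 3) 2) returns two of those six reports.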

(show-marginals (model)
                (list  "real-avg"
                       "est-avg"
                       "est-diff"
                       "formula"
                       )
                #:num-samples 1000
                #:truncate-output 5
                ; #:skip-marginals? #t
                ; #:show-stats? #t
                #:credible-interval 0.84
                ; #:show-histogram? #t
                ; #:show-percentiles? #t
                )