#|
  Clan size in Racket.Gamble

  https://brainstellar.com/puzzles/probability/1009
  """
  The people in a country are partitioned into clans. In order to estimate the average size
  of a clan, a survey is conducted where 1000 randomly selected people are asked to state
  the size of the clan to which they belong. How does one compute an estimate average clan
  size from the data collected?

  Solution:
  This is more of a puzzle-to-ponder rather than a puzzle to learn. In my opinion, best
  estimator for average is sum( n )/ sum( #n/n), where #n is the number of people with clan
  size as 'n', and this sum is over all the values of 'n' we receive.
  """

  If the clan sizes are drawn from a Poisson distribution, picking just one card is enough
  to get a fairly good estimate (at least on average). Here is an experiment with 10 people
  where we look at just one of the cards:

  var : real-avg
  122.3: 0.02099999999999999
  120.3: 0.01999999999999999
  123.0: 0.01899999999999999
  121.1: 0.01699999999999999
  122.1: 0.01699999999999999
  ...
  114.5: 0.0009999999999999994
  129.4: 0.0009999999999999994
  128.9: 0.0009999999999999994
  112.8: 0.0009999999999999994
  112.3: 0.0009999999999999994
  mean: 121.00989999999986
  Credible interval (0.84): 116.2..125.5

  var : est-avg
  123.0: 0.040999999999999995
  127.0: 0.04099999999999999
  126.0: 0.039999999999999994
  120.0: 0.039
  122.0: 0.03899999999999999
  ...
  151.0: 0.0009999999999999994
  150.0: 0.0009999999999999994
  92.0: 0.0009999999999999994
  93.0: 0.0009999999999999994
  156.0: 0.0009999999999999994
  mean: 121.52299999999998
  Credible interval (0.84): 104.0..135.0

  var : est-diff
  4.099999999999994: 0.014999999999999989
  1.5: 0.014999999999999989
  0.29999999999999716: 0.01299999999999999
  4.299999999999997: 0.011999999999999992
  2.0999999999999943: 0.011999999999999992
  ...
  14.599999999999994: 0.0009999999999999994
  15.0: 0.0009999999999999994
  18.799999999999997: 0.0009999999999999994
  15.5: 0.0009999999999999994
  22.099999999999994: 0.0009999999999999994
  mean: 7.974499999999997
  Credible interval (0.84): 0.0..14.900000000000006

  var : formula
  123: 0.040999999999999995
  127: 0.04099999999999999
  126: 0.039999999999999994
  120: 0.039
  122: 0.03899999999999999
  ...
  151: 0.0009999999999999994
  89: 0.0009999999999999994
  156: 0.0009999999999999994
  92: 0.0009999999999999994
  93: 0.0009999999999999994
  mean: 121.52299999999998
  Credible interval (0.84): 104..135

  * Using 1+(random-integer 100)

  var : real-avg
  51.71: 0.004999999999999997
  53.66: 0.004999999999999997
  50.01: 0.004999999999999997
  49.52: 0.004999999999999997
  51.63: 0.004999999999999997
  ...
  51.76: 0.0009999999999999994
  51.56: 0.0009999999999999994
  51.61: 0.0009999999999999994
  51.11: 0.0009999999999999994
  44.79: 0.0009999999999999994
  mean: 50.29121999999994
  Credible interval (0.84): 45.85..54.23

  var : est-avg
  99.0: 0.020999999999999994
  17.0: 0.01799999999999999
  91.0: 0.01799999999999999
  27.0: 0.01599999999999999
  68.0: 0.014999999999999989
  ...
  14.0: 0.004999999999999997
  47.0: 0.004999999999999997
  55.0: 0.0039999999999999975
  51.0: 0.0029999999999999983
  7.0: 0.0019999999999999987
  mean: 50.33399999999997
  Credible interval (0.84): 1.0..84.0

  var : est-diff
  42.96: 0.0029999999999999983
  11.189999999999998: 0.0029999999999999983
  25.369999999999997: 0.0019999999999999987
  29.549999999999997: 0.0019999999999999987
  9.420000000000002: 0.0019999999999999987
  ...
  38.82: 0.0009999999999999994
  44.29: 0.0009999999999999994
  51.26: 0.0009999999999999994
  38.17: 0.0009999999999999994
  19.409999999999997: 0.0009999999999999994
  mean: 25.346959999999974
  Credible interval (0.84): 3.0799999999999983..44.81

  var : formula
  99: 0.020999999999999994
  17: 0.01799999999999999
  91: 0.01799999999999999
  27: 0.01599999999999999
  68: 0.014999999999999989
  ...
  73: 0.004999999999999997
  75: 0.004999999999999997
  55: 0.0039999999999999975
  51: 0.0029999999999999983
  7: 0.0019999999999999987
  mean: 50.33399999999997
  Credible interval (0.84): 1..84

  However, for "wilder" distributions such as the exponential or the Cauchy distribution,
  this method does not work well. For example, here is a run with 100 people, a sample
  size of 10 and (cauchy 1 3):

  var : real-avg
  4.9945013861507235: 0.0009999999999999994
  -1.9282693164066032: 0.0009999999999999994
  -5.78385306662126: 0.0009999999999999994
  5.834977904097586: 0.0009999999999999994
  4.338080853485787: 0.0009999999999999994
  ...
  -2.8382552508601298: 0.0009999999999999994
  7.893463081046366: 0.0009999999999999994
  -9.467293987574438: 0.0009999999999999994
  2.690621430563473: 0.0009999999999999994
  20.839359467014873: 0.0009999999999999994
  mean: 5.504197112000554
  Credible interval (0.84): -10.289982577554063..15.503944495128742

  var : est-avg
  -42.079793684950864: 0.0009999999999999994
  4.862897359572078: 0.0009999999999999994
  7.047966444002657: 0.0009999999999999994
  20.278161939723113: 0.0009999999999999994
  4.882008351763264: 0.0009999999999999994
  ...
  -3.2573329715359343: 0.0009999999999999994
  3.906248404409567: 0.0009999999999999994
  -7.760236238256129: 0.0009999999999999994
  6.2403040456566625: 0.0009999999999999994
  0.4132067650224469: 0.0009999999999999994
  mean: 7.505723777389244
  Credible interval (0.84): -9.445547638340894..13.633543061517742

  var : est-diff
  10.162322947279558: 0.0009999999999999994
  11.59943510872656: 0.0009999999999999994
  36.109041678191325: 0.0009999999999999994
  2.042931594378473: 0.0009999999999999994
  3.6757305724692033: 0.0009999999999999994
  ...
  4.710982185081698: 0.0009999999999999994
  4.764455581609525: 0.0009999999999999994
  2.243272917092363: 0.0009999999999999994
  0.3511945960294707: 0.0009999999999999994
  57.8534979456945: 0.0009999999999999994
  mean: 29.97468506741003
  Credible interval (0.84): 0.009552321198864178..21.150784614798418

  var : formula
  -42.079793684950864: 0.0009999999999999994
  4.862897359572078: 0.0009999999999999994
  7.047966444002657: 0.0009999999999999994
  20.278161939723113: 0.0009999999999999994
  4.882008351763264: 0.0009999999999999994
  ...
  -3.2573329715359343: 0.0009999999999999994
  3.906248404409567: 0.0009999999999999994
  -7.760236238256129: 0.0009999999999999994
  6.2403040456566625: 0.0009999999999999994
  0.4132067650224469: 0.0009999999999999994
  mean: 7.505723777389244
  Credible interval (0.84): -9.445547638340894..13.633543061517742

  This program was created by Hakan Kjellerstrand, hakank@gmail.com
  See also my Racket page: http://www.hakank.org/racket/

|#

#lang gamble

; (require gamble/viz)
(require racket)
(require "gamble_utils.rkt")
; (require "gamble_distributions.rkt")

(define (model)
  (; enumerate
   ; rejection-sampler
   importance-sampler
   ; mh-sampler

   (define num-people 100)
   (define num-to-pick 1) ; Number of samples to pick

   (define (clans i) (add1 (poisson 100)))
   ; (define (clans i) (add1 (random-integer 10)))
   ; (define (clans i) (add1 (normal 100 15)))
   ; (define (clans i) (add1 (exponential 100)))
   ; (define (clans i) (add1 (cauchy 1 3)))

   (define all-clans (for/list ([i num-people]) (clans i)))
   (define real-avg (* 1.0 (avg all-clans)))

   ; The sample: We take num-to-pick samples for the presented clan cards
   (define the-sample (take all-clans num-to-pick))
   (define est-avg (* 1.0 (avg the-sample)))
   (define est-diff (abs (- est-avg real-avg)))

   ; The formula in the Solution part.
   ; It gives the same result as est-avg: the counts in (collect the-sample)
   ; add up to the sample size, so this is just the arithmetic mean of the-sample.
   (define formula (/ (sum the-sample) (sum (hash-values (collect the-sample)))))

   (list real-avg
         est-avg
         est-diff
         formula
         )
   )
  )

(show-marginals (model)
                (list "real-avg"
                      "est-avg"
"est-diff" "formula" ) #:num-samples 1000 #:truncate-output 5 ; #:skip-marginals? #t ; #:show-stats? #t #:credible-interval 0.84 ; #:show-histogram? #t ; #:show-percentiles? #t )