Show Discussion: Benchmark

June 1, 2015

Weekdays, 4pm
Channel 4

A simple game with a simple host, Paddy McGuinness challenges ten people to answer a range of numerical questions, the average of which forms the notional Benchmark. The challenge for our plucky main player is to work out whether the actual answer is above or below the benchmark for big cash prizes.

There’s also a secondary game going on across the hour: the ten benchmarkers are trying to be as accurate as possible because the best one gets to come back the next episode as the main player, a fairly clever mechanic not seen since Judgemental with Sophie Raworth.

I’ve no idea how well it will fill an hour, but producer Stuart Shawcross comes across as one of the good guys, so we wish it best of luck. There are also five celebrity specials to go out in primetime, and here’s an interview with Paddy.

124 thoughts on “Show Discussion: Benchmark”

  1. David Howell

    This is an important show for C4. They need to replace DoND, surely. Is this going to do it?

    Given that even a contestant has popped up on here and said some people will hate it, I’m not confident…

    Reply
    1. Brig Bother Post author

      I suspect the show they’re planning on replacing DOND with is the 300,000 megahit 15 to 1, reading between some lines.

      Reply
  2. MrCT2U

    That will be me you are making reference to David! ;).

    I did say that, in my honest opinion as a contestant who appears later in the series, some will like it and some will absolutely hate it, and that could be due to some of the “characters” that will appear on Benchmark. Don’t get me wrong, it is certainly something very different to your average tea-time quiz/game show, and part of me hopes that the show is a success, mainly because the entire team at Victory Television who work on the show were brilliant and incredibly professional, but I just fear the general viewing public might not take to it. Obviously I can say more after my time on the show has been TX’ed.

    As regarding 15 to 1, I appeared on the show last year and had a whale of a time despite doing abysmally, but from my sources and things that I have heard it would not surprise me if the show gets 2 series per year like it did during the William G era!

    Reply
  3. Andrew Hain

    It’s that time again, can someone please give me a complete rundown of the format?

    Reply
    1. Andrew 'Kesh' Sullivan

      Yeah, I’m sure someone will give you a full run-down. It’s broadcasting right now so if no-one else does, then I probably will in about an hour or so

      Reply
    2. Andrew 'Kesh' Sullivan

      OK, here’s the show as I see it:

      For this first show, we’re introduced to 11 players and 2 of them have been randomly selected to compete against each other to play the game. They are given a question and they both lock in their answers. Whoever is closest to the correct answer is the player and the other goes back to the others and is a Benchmarker. For subsequent episodes, the 2 best Benchmarkers from the previous episode will compete to be the next game’s player.

      The player faces 7 questions taken from various polling sources. Questions shown today included ‘How much wine does the average French person drink in a year?’ and ‘In miles, what is the distance between London and Glasgow as the crow flies’. The Benchmarkers all lock in their answers and the average is shown on a scale behind them. The 3 Benchmarkers that were the furthest away are out, and Paddy chats to 1 or 2 of them and their answers are shown so that the player has a better chance of working out what they should do. The player then has to decide whether the correct answer is higher or lower than the benchmark that has been set. If they are correct, they move 1 space up a money scale, but if wrong, they stay where they are and the highest amount is removed from the game, which could be as high as £25,000. The money scale is as follows: £50, £100, £250, £500, £1,000, £2,000, £4,000, £6,000, £10,000, £15,000, £25,000. A red box covers a block of 3 amounts. Introduced at the 5th question is a Benchmark Bonus. If the player uses it and is correct, they move 2 spaces up the money scale instead of 1.

      After the 7 questions have been asked, the player then has a chance to win one of the 3 amounts they have landed on. They are asked 3 further questions, each with a percentage answer. Today’s were ‘What percentage of British households are thought to own a Scrabble set?’, ‘What percentage of all the words in the English language contain the letter ‘e’?’, and ‘In England and Wales, what percentage of people walk to work?’. For each question, the player can ask one of the Benchmarkers for advice before locking their answers. Then, the game goes The Price Is Right on us as this final round is essentially Cliffhanger. For every percent that the player is out, a red bar makes its way across the cash amounts, starting at the highest amount, with each cash amount having 33 spaces. Wherever the bar ends up after all 3 questions have been answered determines what the player wins, if anything.

      Reply
  4. Brig Bother Post author

    One interesting wrinkle is that although the Benchmarkers are trying to be helpful, in fact they’re being anything but as the closer they are to the answer the harder it becomes for the player to succeed.

    That’s mildly entertaining and devious, intrigued to see if this gets picked up on on the show.

    Reply
  5. Merman

    The banter from Take Me Out, the row of contestants from DoND, a hint of 8 Out Of 10 Cats… Not convinced this is going to be a long-term hit…

    Reply
  6. Oliver

    I went in with very low expectations, as the blurb sounded rubbish, but I was really impressed. Really entertaining, fun to play along with, and far funnier than I expected. Paddy McGuinness is a very good host and did a good job at bringing out the humour in the format.

    Felt like Pointless crossed with Two Tribes with a touch of the Deal or No Deal community/personality aspect thrown in for flavour.

    Best new gameshow I’ve watched in a long time.

    Reply
  7. Greg

    I feel I’ll be in a minority but I loved it. I laughed hard at some of the answers given. Great hosting, a well thought out game that is very enjoyable to watch and is something a bit different. I like the mechanics of the game and how that decides the prize. Great question writing; loved everything about it.

    By far my favourite new show of the year

    Reply
  8. Jon

    Ok… I wasn’t expecting much but found it funny. Format was really neat, leaderboard obviously makes the game work for benchmarkers, and questions were different and good.
    Was it just me or did the end game have a hint of yodelling man? Not a diss by any means – I like the device.
    Best new daytime show for some time…

    Reply
  9. Daniel Peake

    Benchmark – I quite like it. It’s not great, but it is good. It’s definitely play-along and I really like the endgame.

    There’s lots of polls in this sort of show, but I reckon that’s perfectly acceptable for the gameplay. The questions are fun and not boring, which is good.

    I thought the pacing was ok, if slightly slow, but this show makes excellent background watching. You don’t have to watch every single second of it to understand what is going on, and that is a good thing.

    The endgame is good – and based on having up to 100 percentage points of error, I reckon they’ll probably end up winning the lowest or middle amount of money on most days – so expect lots of wins in the low thousands. (And yes, yodelling man over the top would be appropriate)

    The revelation about how the other Benchmarkers answered needs some form of onscreen graphic – you can’t see in the tiny boxes. Also when answer reveals happen, an onscreen graphic of the benchmark overlaid on the contestant’s reaction would have been better, rather than cutting between the shots.

    There’s something about the show that niggles me that I can’t quite put my finger on that stops me from saying I really like this show. Does anyone else have that? This must have been the first show to record, so give time for it to bed in.

    Does anyone know, is it possible for someone to spend the entire series as a Benchmarker, or do they have a finite number of shows to become the One on?

    Overall, a good start to a comfortable show, I hope it beds in. It’s good, but it’s not great yet.

    And, for those interested, the set is *just* light enough to stop my set-is-too-dark-klaxon from going off, but it’s close.

    Reply
    1. MrCT2U

      It is possible for someone to be a Benchmarker throughout the entire series and still end up not playing for the money. You were told that you could leave the show at any time, as there was no set limit on how many shows you could do, and obviously if at any time you’re the lowest scoring Benchmarker then you are out.

      I do know there are at least a few contestants that have long runs on the show and I would love to say more, especially about two of them and how much they irritated the living daylights out of me, but I will do that once my first appearance on the show is TX’ed 😉

      Reply
      1. Daniel Peake

        It’s like DoND in that respect, you’ll like some of the long term Benchmarkers, and you won’t like others.

        Reply
        1. MrCT2U

          In the words of Ed Miliband “Hell yeah!”.

          Certainly in my opinion the casting producers picked a wide and varied bunch of contestants from all differing backgrounds which is good but I guarantee you will get some absolute herberts that come across as loud, obnoxious and not very bright.

          Put it this way you could be a very competent quizzer that knows their stuff but on a show like this skill and knowledge mean diddly squat, you need serious amounts of luck and I just could not believe how random and obscure some of the questions were during my time on the show.

          Reply
  10. Brekkie

    Will take a look at this at some point this week but it’s the first new daytime C4 show in ages which doesn’t look like cheap filler. Has a shot in the 4pm slot as opposed to 5pm where 5 Minutes to a Fortune was put to the slaughter.

    Reply
  11. David

    I’d think the obvious strategy in the endgame is “when in doubt, say 50%”- with 33% to play with in each question, if the question doesn’t scream a high or low number, go in the middle and you’ll most likely end up with something at the end….

    Reply
    1. David B

      Yes, all three questions would have to be 83 for that strategy not to work.

      Reply
      1. David B

        Oops, it thought I was using HTML tags. That meant to say “less than 17 or greater than 83”.
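The arithmetic behind the flat 50% idea can be sketched in a few lines of Python. This is only a sketch: the 33 spaces per amount come from the format rundown earlier in the thread, but the function name `endgame_result` and the rule that an amount is lost once the bar has covered all 33 of its spaces are assumptions, not anything the show has confirmed.

```python
SPACES_PER_AMOUNT = 33  # per the format rundown: each cash amount has 33 spaces
TIERS = ["highest", "middle", "lowest"]  # the bar starts at the highest amount


def endgame_result(guesses, answers):
    """Return the tier still standing after the three questions, or None.

    Assumption: the bar moves one space per percentage point of error, and
    an amount is lost once all 33 of its spaces have been covered.
    """
    total_error = sum(abs(g - a) for g, a in zip(guesses, answers))
    amounts_lost = total_error // SPACES_PER_AMOUNT
    return TIERS[amounts_lost] if amounts_lost < len(TIERS) else None


flat_fifty = [50, 50, 50]
print(endgame_result(flat_fifty, [45, 60, 30]))  # three middling answers
print(endgame_result(flat_fifty, [2, 98, 45]))   # two extreme answers
```

Under these assumptions a flat guess of 50 can cost up to 50 spaces on a single question, so two answers very close to 0 or 100 would be enough to cover all three amounts even if the third answer is middling.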

        Reply
        1. Brig Bother Post author

          Mmm, the reveal is like Cliffhanger, but the game is more Extreme Lucky 7 really.

          Will be intrigued to see how the answers go throughout the series, I expect one middling number and two fairly extreme ones throughout.

          Reply
    2. Daniel Peake

      Yep, spotted that too. Thus I reckon we’ll see a reasonable number of questions with answers reasonably near 0 and 100%.

      Reply
  12. John R

    What happens if the Benchmarkers actually hit the answer spot on?

    Reply
    1. Simon F

      I’d guess they would probably ask 1-2 back up questions per show in case the average answer turned out to be spot on and then use one of the back up questions in its place.

      Reply
  13. Brig Bother Post author

    It’s OK, it works as a format and it’s clever and I get it, but I can’t say I found it massively compelling if I’m being honest. I might like it better when I’ve not been working all day so will give it another chance at the weekend.

    McGuinness much happier and convincing chatting to the contestants than doing the procedure.

    I note the touch of the Hot Seat with the elimination of the top values after a wrong answer.

    Reply
  14. Tim

    I have to say that I turned off halfway through. It just didn’t do enough to keep me interested for the whole hour. Which is a shame really, because I really enjoyed my time on the show!

    Reply
    1. Paul B

      Perfection 840,000 (12.4%)
      The Box 1,078,000 (12.0%)
      Pointless 3,125,000 (22.4%)

      Eggheads 942,000 (5.5%)
      Beat the Brain 825,000 (4.4%)
      Antiques Road Trip 1,358,200 (6.6%)

      Judge Rinder 954,000 (13.7%)
      Dickinson’s Real Deal 1,032,300 (14.0%)
      Tipping Point 2,037,000 (21.0%)
      The Chase 2,551,700 (19.0%)

      Countdown 319,200 (4.7%)
      Deal or No Deal 391,900 (5.3%)
      Benchmark 381,200 (3.9%)
      Couples Come Dine With Me 990,600 (7.4%)

      Big Brother 1,111,800 (7.2%)
      Big Brother’s Bigger Bit on the Side 347,900 (5.3%)

      Reply
        1. Score

          Tipping Point doing very well. Low for The Chase, but it always starts low after a break so it should build. Still much better than Paul O’Grady though. Rinder is still on repeats so that looks like quite a decent number really.

          Reply
      1. Jon

        Be interesting to see how Benchmark does over a long run, I think all new shows need 3 or 4 weeks to bed in…

        Tipping point 2m! amazing…

        Reply
        1. Brig Bother Post author

          Time to bed in but the trend tends to be your friend, if it’s not rising within the next week or so it probably never will.

          Reply
  15. Sceptical

    I have been trying to discover whether the really low percentage of English words containing an e is right, because it seems way off to me. As there are only 5 vowels, I would have guessed a lower limit greater than 20%, and knowing that e is the most common letter in text I would have guessed in excess of 50%. They quoted the Oxford dictionary as the source – can anyone suggest someone I could trust? Susie Dent? I am really surprised no one else has queried it.

    Reply
    1. Jon

      I found a figure of 11.16% of all words use the letter E.
      I would expect Oxford Dictionaries to be a pretty reliable source.

      Reply
      1. sceptical

        Trouble is we are asked to take the information on trust and there does not seem to be any way to verify unless I download a comparable dictionary and run my own analysis for how many words contain E. They say their source is Oxford Dictionary but does that mean they asked the dictionary compilers or they looked it up in the dictionary? If they have an algorithm I would like to see some of the 88% of words which do not contain an E.

        Apparently the Oxford English Dictionary contains full entries for 171,476 words in current use, 47,156 obsolete words and around 9,500 derivative words included as subentries. Does anyone know a public domain source that is as comprehensive?

        Reply
        1. David B

          You don’t need one – just pick any dictionary at random, look at the first 200 words and you’ll probably get within 2% of the answer.

          Reply
          1. sceptical

            OK! I downloaded 109,582 English words in an arbitrary online word list and found that 75,470 contained letter e which is 68.87% which is in the range that my gut feel would have suggested – I would actually probably have gone slightly higher, say, 75%. Throws a lot of doubt on the answer benchmark gave and also undermines Oxford Dictionaries as a source (if they actually truly gave the original answer).

            I think their error was that they misunderstood the elementary code breaker’s rule for straight substitution ciphers: in any representative English text the letter e is the most common and accounts for about 12% of all letters.

            I can stop fretting now but I will have to stop watching Benchmark after only a couple of episodes because I won’t be able to trust their answers
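The two quantities being confused here (the share of words that contain an e, and the share of letters that are an e) can be told apart with a short Python sketch. The function names and the four-word sample are invented for illustration; the last line only re-checks the percentage quoted above from the 109,582-word list.

```python
def share_of_words_containing(words, letter="e"):
    """Percentage of words that contain the letter at least once."""
    cleaned = [w.strip().lower() for w in words if w.strip()]
    if not cleaned:
        raise ValueError("word list is empty")
    hits = sum(1 for w in cleaned if letter in w)
    return 100.0 * hits / len(cleaned)


def share_of_letters(text, letter="e"):
    """Percentage of all letters in the text that are the given letter."""
    letters = [c for c in text.lower() if c.isalpha()]
    if not letters:
        raise ValueError("text has no letters")
    return 100.0 * letters.count(letter) / len(letters)


sample = ["tree", "sky", "bee", "dog"]  # invented sample, not real data
print(share_of_words_containing(sample))
print(share_of_letters(" ".join(sample)))
print(round(100 * 75470 / 109582, 2))  # the word-list figure quoted above
```

Run against a real word list (one word per line), the first function gives the per-word figure; the second, run against ordinary prose, gives the much lower per-letter figure that code breakers rely on.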

  16. Chris M. Dickson

    Two episodes in, I reckon there’s a lot to like here and I hope this is a big, long-lasting (though inherently very gentle) sleeper hit. It has the feel of a weekend early primetime show, so I can see why Channel 4 are treating it in the same sort of way they treated DoND.

    I’m not a big Paddy fan on Take Me Out, but this is all feelgood stuff; Paddy surprised me by just how well he worked here and how happy he was to play second fiddle to the contestants. I also like the contestants (including the panel members) so far, though it’s quite possible that there will be one who goes on to grate considerably.

    It doesn’t feel like there’s a lot of content per show, but nor does it feel like the show goes slowly. Compared to most artificial drama inductions, I rather like the band with the correct answer wiggling up and down, not least because there isn’t a single pattern that it observes each time, though the monotonic jangle as the question is revealed gets very old very quickly.

    The question material is surprising and a lot of fun, but I love numbers and number-based questions in general. Bonus points for quoting the sources (interesting to see The Fact Controllers credited – at last, a rival for the lovely people at Beyond Doubt!) though I get a sinking feeling when I see they have used the Daily Fail as a source. (I’d give bonus points for the production of a web page per episode pointing to the sources and the polls that generated them.)

    Bonus points for the questions where the order of magnitude can surprise; percentage questions must be between 0% and 100%, but the open-ended questions can be as high or as low as you like. I don’t do very well at answering the questions when I would back myself to get the correct order of magnitude fairly frequently and I do rather like that – it’s just plain funny when the benchmark is out by a factor of ten or more.

    Is it possible to use the numbers revealed as being incorrect as a tell as to which direction the truth lies from the benchmark, at least in the early questions? Given that three answers are stripped out and only two of them, so far, have been revealed, then not necessarily, but if the show does have a trend then it would be worth spotting as a potential contestant.

    Weird question: are the benchmarkers’ accuracies per question calculated in absolute, percentage or some other fashion? It would be strange if there were a question where the answer were hundreds of thousands where the other questions were all two digits, and the benchmarkers’ overall performance effectively was decided by their performance on that one question, either by the calculation being based on absolute errors or on percentage errors (where the percentage errors might vary from, e.g. 99.98% out to a mere 90% out).

    Very good year so far, 2015.

    Reply
    1. MrCT2U

      Oh yes believe me expect to have Benchmarkers that literally will do your sodding nut in. There are two in question during my time on the show that in poker parlance put me on severe tilt as they were incredibly annoying and very cock sure of themselves!!

      Reply
    2. Brig Bother Post author

      I’m going to guess that for each question the Benchmarkers are ranked 1-9 depending on comparative order and points awarded.

      If Stuart or Ben are about as they sometimes are perhaps they will divulge.

      Reply
      1. Daniel Peake

        I think what Chris is asking is on a mathematical level, how are the errors calculated. Now, I can almost guarantee they’ll be absolute errors, if the answer is 1,000,000 and one person answers 1 and one person answers 2,000,000, the person answering 1 is closer (an error of 999,999 compared to an error of 1,000,000).

        HOWEVER, I propose that a not-tv-friendly logarithmic error should be used instead. In the above example, if the question is “population of a city” say and the answer is 1,000,000 and you answer 1, you are WAY out as you’re not even close in terms of orders of magnitude. 1 is a clearly insane answer, but 2,000,000 at least is in the ballpark of “millions”.

        I think that’s what Chris was asking, but whilst using logarithms is technically a better measure of error in this case, the actual difference between the two methods isn’t huge and using logarithms would only be deemed acceptable on BBC FOUR.

        There you have it. Bothers Bar, your source for game show mathsy geekiness!
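The two ways of measuring error in the example above can be compared directly. This is an illustrative Python sketch, not the show's method; `absolute_error` and `log_error` are names made up here, and the logarithmic version assumes both the guess and the answer are positive.

```python
import math


def absolute_error(guess, answer):
    """Plain distance between guess and answer."""
    return abs(guess - answer)


def log_error(guess, answer):
    """Distance in orders of magnitude (base 10); needs positive inputs."""
    return abs(math.log10(guess) - math.log10(answer))


answer = 1_000_000
for guess in (1, 2_000_000):
    print(guess, absolute_error(guess, answer), round(log_error(guess, answer), 2))
```

On absolute error the guess of 1 edges out the guess of 2,000,000, while on logarithmic error it is six orders of magnitude out and the 2,000,000 guess is well under one.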

        Reply
        1. Brig Bother Post author

          I would expect it to be absolute because that’s what a mean average person would consider when considering the mean average of the results.

          Reply
        2. Chris M. Dickson

          Totally agreed.

          The “rank everyone first-to-tenth per question and sum the ranks” route suggested by David below would also work perfectly adequately well in practice, even if it would reward slightly different things.

          Fingers crossed for second and further series; this is one that I might well apply to be a contestant for, which is not something I think at all frequently.

          Reply
        3. Tom F

          I was thinking about this, and concluded that when the question seems order-of-magnitude-ish, the distribution means the player is probably better going lower than higher, BUT THEN, the ‘7 from 10’ rule probably weakens the effect, to what extent I’m not sure. I may do a proper spiel about it when I don’t have exams.

          Reply
      2. David

        I think they show all the Benchmarkers’ answers after every question, so it should be easy to figure out how it’s calculated (I agree they probably rank them 1-10 on each question and total up the scores that way- it’s fairer than basing it on total percentage they were off by on each question)

        Reply
      3. RoarJustice

        On any given question the closest Benchmarker is awarded 10 points and the furthest out 1pt. These are totalled up to make up their scores on the leaderboard.
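The points scheme described here can be sketched as a ranking by distance from the correct answer. The function name `points_for_question` and the sample guesses are hypothetical, and how the show breaks ties is not stated anywhere in this thread; this sketch simply favours whoever appears earlier in the list.

```python
def points_for_question(guesses, answer):
    """Return points per guess: n for the closest down to 1 for the furthest.

    Assumption: ties go to the earlier guess in the list (the real
    tie-break rule is unknown).
    """
    n = len(guesses)
    # Indices ordered from closest to furthest; sorted() is stable.
    order = sorted(range(n), key=lambda i: abs(guesses[i] - answer))
    points = [0] * n
    for rank, i in enumerate(order):
        points[i] = n - rank
    return points


# Four invented guesses for a question whose answer is 40.
print(points_for_question([10, 50, 42, 100], 40))
```

With ten Benchmarkers the same function hands out 10 points down to 1, and summing the per-question lists gives a leaderboard total.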

        Reply
  17. Falklad

    I am currently on the bench, the scoring is calculated by points. 10 for closest all the way down to 1 for the furthest away.

    Contestants are also asked 9 questions before the show and as you see only 7 are used. One of the other two is normally used as the viewer question the next day.

    Reply
    1. Chris M. Dickson

      Extremely interesting. With this in mind, I was impressed by the accuracy of the benchmark offered for the viewers’ question today; when so many viewers’ questions must be answered incorrectly by only a handful of entrants, presumably almost all deliberately, this one will probably get a seriously non-trivial proportion of incorrect answers.

      Wonder how they decide which seven of the nine questions to use, particularly when it might have an impact on which people on the bench go forward to do what the next day? (Or are all nine questions scored, regardless of whether they’re used on-screen or not?)

      Reply
      1. Jon

        Thanks for clarification Ben. I love the detail and effort you went to ranking the contestants. A bit like World’s Strongest Man!

        It always surprises me how accurate the average of the benchmarkers can be given some of the odd contestants / answers…

        Reply
  18. David B

    Another mathematical quirk to consider – would taking the ‘mean’ average of the answers tend to skew the average higher or lower?

    Usually, large outliers skew the mean heavily upward when most of the answers are small.

    BUT, if three outliers are removed, does that mean that in fact the larger numbers tend to get discounted more, meaning that the benchmark is actually more often lower than the correct answer?

    It’s for this reason that the median average is often taken for this kind of thing.

    Reply
    1. Stuart

      David, the averages of the benchmarkers’ answers are a real mix of higher and lower than the correct answer… There wasn’t really one which was more prevalent than the other. Removing the really inaccurate answers (both too high and too low) is what makes the benchmark so close, and the format work.

      Likewise choosing which way to go based on the benchmarker who is out isn’t that helpful either, as they are sometimes higher, sometimes lower – again a real mix.

      Reply
      1. Daniel Peake

        That’s quite a subtle point that I hadn’t appreciated – that the outliers can make benchmarks unreliable, so you have to get rid of some of the chaff each time. Interesting.

        Reply
      2. David B

        Let me be clearer about what I mean.

        Suppose the question is “How many bones are in the adult human body?”. You’d likely get a range like this:

        47 89 105 140 200 241 270 390 805 1400

        Although most people have guessed around the right kind of answer (206), the mean average of these answers is 369 because the higher numbers affect the mean much more.

        Now, obviously the “three outlier” rule has a mollifying effect on this. But in fact, rather than preventing outliers from ruining the average, I have a theory that this rule actually REVERSES the effect I’ve outlined above.

        To continue the example, if we now chop out the three outliers then the average crashes down to 156 and on the low side of the correct answer.

        To put it in more basic terms, it’s easier to be wrong on the high side on a question like this, because you know it clearly can’t be small. That means that three large numbers are more likely to be the ones that get culled, so the average falls dramatically.

        As such, the benchmark is likely to be lower than the correct answer for questions with open number ranges (so, not the percentage ones). So you should go “Higher” if you don’t have an inkling.

        This effect is likely to be marginal, and it’ll be interesting to see if it works out like this in practice (8 higher, 5 lower so far). While the game is not broken, I do think the effect will be quite noticeable for those questions that involve larger numbers.
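The bones example can be re-run in a few lines of Python. The function name `benchmark` is made up for this sketch, and it assumes the three answers dropped are simply the three furthest from the correct answer, as in the format rundown.

```python
def benchmark(guesses, answer, drop=3):
    """Mean of the guesses left after dropping the `drop` furthest from the answer."""
    by_closeness = sorted(guesses, key=lambda g: abs(g - answer))
    kept = by_closeness[: len(guesses) - drop]
    return sum(kept) / len(kept)


guesses = [47, 89, 105, 140, 200, 241, 270, 390, 805, 1400]
correct = 206

print(round(sum(guesses) / len(guesses)))  # mean of all ten answers
print(round(benchmark(guesses, correct)))  # mean after the cull
```

Here all three culled answers (390, 805 and 1400) sit on the high side, which is what drags the benchmark from above the correct answer to below it.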

        Reply
        1. Jon

          I’m not sure your reasoning is quite right, surely if the open ended question provokes big number answers then more than three players could be very far out… so the Benchmark could still be skewed higher than the answer.

          But from what I’ve seen so far it seems to me like you can’t predict it.
          People’s answers and thus the benchmark are varied.

          Reply
          1. David B

            All seven benchmarks for the main game today were lower than the correct answer! My hypothesis is 15 vs. 5 at the moment.

          2. David B

            I also note that “off the scale” events have so far been exclusively to the right hand side…

  19. David

    I wonder if Paddy is told which of the outlier Benchmarkers to talk to for each question or he just picks them at random…

    Reply
    1. Paul

      He already knows, I knew when he was coming to me on my show (near the end of the show)

      Reply
  20. jaybs

    Well Channel 4 have reached the bottom, I don’t know who sits and thinks up such trash TV as this, everything is so false and host Paddy McGuinness has become boring and tiring!

    Just think that Channel 4 was supposed to be cutting edge, now anything to fill an hour TV slot in a peak tea time period?

    Reply
    1. Jon

      What would you put there?

      Must say I’m enjoying the show.

      Reply
  21. Paul B

    Not great news so far on the ratings front. Everything down on Wednesday (probably due to weather, and maybe tennis), but even so. As well as BBC One and ITV, Benchmark was behind Allo Allo on BBC Two and an old movie on Channel 5. It also trailed Jeremy Kyle on ITV2 and The French Open on ITV4.

    Perfection 692,000 (13.1%)
    The Box 699,100 (9.8%)
    Pointless 2,357,100 (20.4%)

    Eggheads 718,700 (5.0%)
    Beat the Brain 550,600 (3.6%)

    Judge Rinder 715,000 (13.2%)
    Dickinson’s Real Deal 846,600 (14.1%)
    Tipping Point 1,489,400 (19.6%)
    The Chase 2,151,300 (19.5%)
    The Cube 3,338,900 (18.4%)

    Countdown 243,600 (4.6%)
    Deal or No Deal 287,100 (4.8%)
    Benchmark 213,300 (2.8%)
    Couples Come Dine With Me 689,200 (6.3%)

    Big Brother 1,080,300 (6.8%)
    Big Brother’s Bit on the Side 378,900 (4.7%)

    Reply
  22. Stefan

    Anyone know what Big Brother got last night? And Deal or No Deal? I’m guessing both would be down even further due to the weather. Also if you have Tuesday’s Big Brother rating too, that would be appreciated greatly!

    Reply
  23. weeble wobble

    Having been a Benchmarker and on other shows I feel that it just needs time to bed in as it was excellent and very funny to do, I seem to remember Tipping point being slow and awful at first then look how that’s improved! I feel this is far more entertaining! And I can’t wait to find out who the other people MrCT2u disliked! Hopefully not me!

    Reply
    1. Brig Bother Post author

      I’m always a bit fascinated by people’s comparisons to Tipping Point, it hasn’t really changed all that much from S1, and it was getting around 2m right off the bat (despite Twitter reaction) although I’m glad people have come round to it obviously.

      Reply
    2. MrCT2U

      Well it all depends whether our paths crossed on the show. The way you worded your comment I have a feeling it might have perhaps?

      Reply
  24. Jon

    I saw in the Manchester Evening News that they filmed a celeb version…

    Reply
        1. RoarJustice

          Benchmark was always coming off for the racing, as it did last week.

          Reply
          1. Brig Bother Post author

            I get that, but Monday is normal afternoon programming with extra A Place In The Sun and minus Benchmark. Isn’t that a bit odd?

          2. Weaver

            Er, yes. Provisional schedules for the week of 22 June show Benchmark moving to 1pm, with a new run of inferior antiques show French Collection slotting in at 4.

          3. MrCT2U

            It looks like Benchmark is going exactly the same way as 1001 Things You Should Know, as that was on in a mid-afternoon slot and got “demoted” to a lunchtime slot due to poor ratings, and subsequently the show got the axe.

            Does not look promising if I’m being honest.

  25. David B

    Ok, we now have seven shows of data to look at. My hypothesis was that the Benchmark would be usually on the low side, so “Higher” should be the correct answer.

    From the 49 main game questions we’ve seen so far:
    “Higher” is correct 65.3% of the time
    “Higher” is correct 78.5% of the time when the benchmark is 50 or above
    “Higher” is correct 90.9% of the time when the benchmark is 1000 or above

    So, the game seems to be pretty broken. If you have no idea, you’ve got a 2:1 advantage if you go for “Higher”, and the trend gets worse as the numbers get higher, which is what I’d expect.

    Interestingly, the effect seems to happen even for the “best 3” questions (Q5-7), not just the early rounds.

    Obviously it’s fixable to make it more 50:50 by writing more questions than you need and throwing some out, but you’d have to write a heck of a lot of high number questions to reverse a 10:1 deficit.

    Maybe these seven shows are not representative, but I doubt it…
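One rough way to gauge whether seven shows could mislead is to ask how often a fair 50:50 split would look this lopsided. The Python sketch below assumes the 49 questions are independent and that 65.3% of 49 means 32 questions; `prob_at_least` is a name invented here, and the check says nothing about why the skew arises.

```python
from math import comb


def prob_at_least(k, n, p=0.5):
    """Chance of k or more successes in n independent trials with success rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_questions = 49
higher_correct = round(0.653 * n_questions)  # 65.3% of 49 questions

print(higher_correct, prob_at_least(higher_correct, n_questions))
```

If “Higher” and “Lower” were equally likely to be right, a split at least this one-sided should turn up only a few percent of the time, which fits the suspicion that the sample is not a fluke, though it is only one series' worth of questions.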

    Reply
    1. Daniel H

      Very interesting!

      Perhaps a little early based on just seven shows for this but I’ve had a look at the final round to see how strong a strategy of just going for 50% three times, irrespective of the question or your feelings for the answer, would be. We’d hypothesised you’d probably win at least something most days.

      In the 7 shows so far this would have resulted in:
      1 Loss
      2 Lowest Value Wins
      3 Middle Value Wins
      1 Highest Value Win

      This compares to
      1 Loss
      4 Lowest Value Wins
      2 Middle Value Wins
      0 Highest Value Wins
      for the contestants, who were playing “properly”

      Finally, as regards amounts won, compared to the contestants the “50% strategy” so far would have produced
      1 worse result
      2 same results
      4 better results

      Reply
      1. Brig Bother Post author

        That’s super interesting, I was expecting production to consider the effect of people going for the 50% strategy by ensuring it wouldn’t win – apparently not!

        That’s not especially a criticism, and I almost certainly would have been caught out by David’s maths left to my own devices as well. I’d certainly suggest 7 episodes is a reasonable enough sample from a 25 episode run for the numbers to work.

        Reply
        1. Daniel H

          If they made it so 50% always lost I think it would become noticeable that most of the answers were extreme. Surprised that they’ve allowed it to possibly win you the top money, though.

          Yes – 7 shows is plenty for David’s Maths – I just meant it was a bit low for mine – effectively 7 results to go on

          (today’s 8th incidentally was a 5th low win for the contestants, a 4th middle win for the 50% rule and thus a 5th day where 50% beats the contestant)

          Reply
      2. Jon

        The 50% rule seems to increase your chances of winning something… but if you wanted the big money you’d want to tweak the guesses a little based on your gut instincts… if you think it’s high go 60%, or low at 40%

        Reply
    2. Jon

      How many questions have answers of 1000 or more? It doesn’t seem like that many to me.

      From the ones I’ve seen so far none of the contestants have played this theory and gone higher – they have all gone with their gut.
      So it doesn’t feel like it’s particularly broken … it would if they did a second series and the contestants knew this – but I’m guessing that the benchmarkers would also get wise and increase their guesses. So for a series one show it’s actually no problem at all.

      Unless I’m missing something?

      Reply
      1. Brig Bother Post author

        For the format to work properly the expectation is that the answers are effectively 50/50 shots, but this suggests anyone can basically game the game pretty easily.

        I wouldn’t say it was broken but it is rather unfortunate.

        Reply
      2. David B

        11 of the 49 questions (22.4%) have had benchmarks of 1000 or more.

        I agree there are ways of trying to fix this for a theoretical series 2. Since the higher number questions tend to be more entertaining, it would be a shame to lose them. However, the skewing effect gets worse the larger the numbers get.

        But you should never underestimate contestants. It would have taken just one person to suss this out and then the message can get passed on from one day’s contestants to the next day’s.

        Reply
        1. Tom F

          It seems to me like the obvious fix is to use the median. (In fact I was amazed when in episode 1 they didn’t use the median)
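
To illustrate the difference between the two averages, here is a small sketch using Python's standard `statistics` module. The ten guesses are invented for illustration, not taken from any episode, and it assumes the show's benchmark is the arithmetic mean of the ten guesses.

```python
from statistics import mean, median

# Hypothetical guesses from ten benchmarkers, with one wild outlier.
guesses = [200, 250, 300, 300, 350, 400, 400, 500, 600, 5000]

mean_benchmark = mean(guesses)      # pulled towards the outlier
median_benchmark = median(guesses)  # midpoint of the two middle guesses

print(f"Mean benchmark:   {mean_benchmark}")
print(f"Median benchmark: {median_benchmark}")
```

The median ignores how extreme the outlying guess is, so one benchmarker cannot drag it far. Whether switching to it would remove the “Higher” bias in the real data is a separate question this sketch does not answer.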

          Reply
    3. Chris M. Dickson

      As you have the data to hand: for the open-ended, non-percentage questions, does Benford’s law appear to apply? (Very small data set, obv.)

      Reply
      1. David B

        Kinda… but probably still too early. For those who don’t know what CMD is talking about, the Law of Lord Benford is a funny effect where 30% of all the numbers in the world begin with a “1” but only 4.6% begin with a “9”.

        There’s definitely a bias to 1-2-3 but there’s some funny spikes. I guess another 50 data points wouldn’t harm the trend.

        Starting digits for correct answers:
        1: 18
        2: 8
        3: 10
        4: 1
        5: 2
        6: 2
        7: 6
        8: 1
        9: 1

        Starting digits for benchmarks:
        1: 15
        2: 8
        3: 9
        4: 5
        5: 0
        6: 4
        7: 0
        8: 4
        9: 4

        The ‘fit’ for the benchmarks is noticeably worse than the correct answers, which *might* give an indicator as to why the game doesn’t work properly. [It might just be a factor that 6 of the 49 correct answers started with a 7.] It’ll be interesting to see if that divergence continues with more figures – er, if we ever get to see them…
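
One possible way to put a number on the ‘fit’ is a chi-square statistic against the Benford proportions, sketched below with the two sets of counts listed above. This is an assumption about method, not necessarily the measure used in the comment, and with only 49 data points several expected counts fall below 5, so treat the result as rough. It may also rank the two lists differently from an eyeball reading.

```python
from math import log10

# Leading-digit counts for digits 1..9, copied from the lists above.
correct_answers = [18, 8, 10, 1, 2, 2, 6, 1, 1]
benchmarks = [15, 8, 9, 5, 0, 4, 0, 4, 4]


def benford_expected(total: int) -> list[float]:
    """Expected count for each leading digit 1..9 under Benford's law."""
    return [total * log10(1 + 1 / d) for d in range(1, 10)]


def chi_square(observed: list[int]) -> float:
    """Chi-square statistic of observed digit counts against Benford."""
    expected = benford_expected(sum(observed))
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


chi_correct = chi_square(correct_answers)
chi_benchmarks = chi_square(benchmarks)

print(f"Correct answers: chi-square = {chi_correct:.2f}")
print(f"Benchmarks:      chi-square = {chi_benchmarks:.2f}")
```

A lower statistic means the counts sit closer to the Benford proportions.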

        Reply
    1. Jon

      Including the +1 it’s 150k, but still that’s low.

      DOND was also down to 270k… and Countdown under 210k.

      C4 is down all afternoon…

      Reply
    2. David B

      Yikes, that’s a pity. I love “Wits and Wagers” and was hoping this would be a good TV adaptation of the same kind of thing.

      I do think that number-based questions start to outstay their appeal after a while, so to do it both as an hour-long show and stripped daily is maybe asking too much.

      Reply
    3. Daniel H

      Seemingly now not on the schedule at all after tomorrow, not even in the week commencing the 22nd

      Reply
      1. Brig Bother Post author

        This is kind of amazing, there’s no way it deserves to be done after nine episodes – it’s basically competent if not exactly thrilling (I doubt it will be Hall of Shame bound at the end of the year), but I guess you can’t argue with figures.

        Reply
        1. Jon

          I think this is probably my favourite new show this year… Shame it hasn’t found an audience.
          Wonder if the celeb show (out later in the yr apparently) will fare any better… I suspect it might?

          Reply
  26. Dan

    Anyone know when the celeb eps are due out?
    My friend watched the record of Jimmy Carr playing, and 3 or 4 comedians as benchmarkers. I can see how that could be funny.

    Reply
  27. Stefan

    Hi Paul.

    Any ratings for any of the following with +1:

    – Benchmark
    – Deal or No Deal
    – Big Brother
    – Love Island
    – Pointless

    Would be appreciated thank you!

    Reply
  28. Brig Bother Post author

    Well Benchmark doesn’t seem to be on the schedule week beginning 29th June either, Benchmark fans.

    Reply
    1. Delano

      C4-wise, the episodes could go out in the twilight zone of insomniacs/night shifters going to bed and early morning shifters waking up.

      Reply
  29. Brett Linforth

    According to UKgameshows, the remaining 21 episodes are going to be played out in the current Deal Or No Deal slot (weekdays at 1.10pm) as of Monday November 2nd

    Reply
  30. Thomas Sales

    Going entirely on today’s episode – I can’t remember any of the previous nine episodes from over three months ago, and my college restarts tomorrow – the problem really is its length. You could cut five minutes by slashing the drifting reveals, and asking the single benchmarker what they put and mocking it is just filler except in the final round. When I get around to catching up with tomorrow’s episode on All 4 I’m going to see how much shorter the program is from losing them!

    Reply
    1. jon

      That is probably down to the channel wanting a specific length. Generally I think most hour-long quiz shows would be helped by being 10 mins shorter… but I’m guessing the channels wouldn’t want that.

      Reply
      1. Thomas Sales

        Shipping Wars UK appears to be doing better in a half-hour slot at 5:30 than in an hour slot at 4. Perhaps they could do the same for this?

        Reply
  31. Brett Linforth

    Well, it appears that the show has been taken off air once again. They did show another 9 episodes by my count but, apparently, two of them went out between 4 and 5 in the morning. I hate to say this as I quite enjoyed the show but I reckon it’s a dead-cert to be featured in the 2015 UK Gameshows Hall Of Shame.

    Reply
    1. David B

      And here’s the stats for the 9 regular shows currently available on All4:

      HIGHER is correct 58.7% of the time
      HIGHER is correct 72.5% for benchmarks 50+
      HIGHER is correct 70% for benchmarks 1000+ (for 2500+, it was correct every time except once – 13 out of 14)

      So still rather biased, but not quite as badly as the first 7 shows.

      Reply
      1. Daniel H

        Whilst we’re here, the “current” stats for my 50% experiment are (for all broadcast civilian and celebrity episodes combined):

        50% Strategy:

        1 Loss
        9 Low
        12 Middle
        2 High

        Real:

        2 Losses
        10 Low
        11 Middle
        1 High (this was on a celebrity show meaning we haven’t seen a civilian high win “yet”)

        Comparison (50% to Real):

        8 Better (One of these was two levels better)
        11 Same
        5 Worse

        And for posterity here’s exactly how those 24 episodes add up:

        Eps 1-9: June 4pm
        10-17: November 1:10pm
        18-28: Unaired
        29-30: November Early Hours
        Celebrity 1-3: Saturdays 7pm
        Celebrity 4-5: Fridays 11:05pm

        Reply
        1. jon

          In the endgame I think the 50% rule wins more often, but only at the lower amounts; to win big you need to take the gamble…

          Oddly I like this! But we’ll never see this show again!

          Reply
        1. John R

          I liked when The Colour Of Money Episode 8 got shoved in a graveyard slot then pulled at the last minute for an episode of Midsomer Murders or something like that.

          I did find a website which claims it is going to be aired soon though!

          S01E08 – Episode 8
          Next Episode Air Date: 1/1/2099, 6:45:00 PM – 83 years from now

          Reply
          1. Mart With A Y Not An I

            From memory, it was hardly a graveyard slot.

            For some reason 3pm on the 28th December rings a bell – but at the time ITV were repeating Midsomer Murders in that slot, so to find TCOM put back on the shelf of eternal rest for MM wasn’t a massive surprise.

          2. Weaver

            (checks through old Weeks)

            Yeah, not a graveyard slot: Tuesday 29 December 2009, at 5.30. Something to burn against A Grand Day Out on BBC1.

            Replaced by an episode of Rosemary and Thyme, and (to the best of my knowledge) never shown.

          3. Mart With A Y Not An I

            Thanks for trawling the back issues of the RT for that Mr W.
            Close-ish on day and time.

            I wonder what (if any) the contestants won on that show?
            My guess is nothing – and with no sob story attached about what the players wanted to win the money for, it was a pretty expendable edition to forget about and ignore.

          4. Brig Bother Post author

            Actually, unless my memory is playing tricks on me here, there was meant to be quite a substantial win involved because it got leaked beforehand and I remember getting annoyed over the prospect they might not have got paid. A blind person may have been involved in the story as well, for added resonance.

            It’s a pity the comments are no longer available for the old site, I’m sure it would have been discussed.

        2. MrCT2U

          Going up against shows like ITV Nightscreen, BBC News 24 and the 900+ channels on Sky so plenty of stiff competition there haha lol 😉

          Reply
  32. Daniel H

    Time for the stats literally no one has been waiting for – how did the 50% strategy stack up across all 35 (civilian and celebrity) episodes of Benchmark?

    50% Strategy
    4 High – 16 Middle – 14 Low – 1 Zero

    Real
    3 High – 14 Middle – 15 Low – 3 Zero

    Comparison (50% to Real)
    12 Better – 16 Same – 7 Worse
    (One of the “betters” was two levels better)

    Air Dates: All on my previous post other than to note that within the episodes 18-28 shown across December, episodes 18 and 19 were shown in the wrong order after they had to add in an unscheduled episode when they realised that episode 18 hadn’t gone out.

    Main Game: Briefly the number of places moved up the money scale was as follows:
    0 Zero Places Up – 0 One – 1 Two – 10 Three – 11 Four
    7 Five – 4 Six – 2 Seven – 0 Eight
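
For anyone wanting to check the sums, a short sketch that confirms each tally in this post covers all 35 episodes and works out the average number of places moved up the money scale. The variable names are illustrative; the figures are copied from the lists above.

```python
# Number of episodes in which the player moved up this many places.
places_up = {0: 0, 1: 0, 2: 1, 3: 10, 4: 11, 5: 7, 6: 4, 7: 2, 8: 0}

# Final round outcomes for the 50% strategy and for the real contestants.
strategy = {"High": 4, "Middle": 16, "Low": 14, "Zero": 1}
real = {"High": 3, "Middle": 14, "Low": 15, "Zero": 3}
comparison = {"Better": 12, "Same": 16, "Worse": 7}

episodes = sum(places_up.values())
total_places = sum(places * count for places, count in places_up.items())
mean_places = total_places / episodes

print(f"Episodes counted: {episodes}")
print(f"Average places moved up: {mean_places:.2f}")
print("Tallies agree:", all(
    sum(t.values()) == episodes for t in (strategy, real, comparison)
))
```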

    Finally, as per David’s suggestion there were too many higher “off the scales” to count but interestingly there was finally one occasion where the answer was “off the scale” on the low side.

    Happy New Year to everyone at the Bar!

    Reply
