CCRL 40/15 Testing Conditions (previously 40/40)

Questions and comments related to CCRL testing study
bastiball
Posts: 1940
Joined: Thu Aug 05, 2021 2:35 pm
Sign-up code: 10159
Location: Cavite, Philippines

Re: CCRL 40/15 Testing Conditions (previously 40/40)

Post by bastiball »

jkominek wrote: Wed Dec 21, 2022 3:35 pm I have a question in regards to relating the benchmark calibration to running conditions. The answer is perhaps obvious, but better to ask since I have not seen it addressed in this forum thread.

i) How many simultaneous games do CCRL testers typically run?
ii) If it is more than one, is benchmark calibration performed while the machine is running a CPU load in line with what is expected during the tournament?

In Kirill's post he suggests rebooting to get a clean slate before benchmarking. But that was back in the years of single-CPU computers.

To be specific to my situation I have a 48-CPU test computer, and use cutechess-cli. I set "-concurrency 48" when running engines single-threaded, or "-concurrency 12" with engines configured to 4 threads. The benchmark result under heavy load is about double the time when run while the machine is idle. That makes a big difference to time control settings if I want a good match to CCRL testing conditions.
I have 8 cores and run with a concurrency of 7; I always leave 1 core for the system. It is safe to say that this works well.
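
As a rough sketch of the rule of thumb in this exchange (games times threads per engine should fit in the available CPUs, optionally with one CPU held back for the system), the arithmetic could look like this in Python. The function name `pick_concurrency` and its parameters are made up for illustration; the result is the number one would pass to cutechess-cli's `-concurrency` option.

```python
import os


def pick_concurrency(cpus, threads_per_engine=1, reserved=0):
    """Return how many simultaneous games fit on `cpus` CPUs.

    cpus               -- CPUs you are willing to count (physical or logical)
    threads_per_engine -- threads each engine is configured to use
    reserved           -- CPUs kept free for the operating system
    """
    if threads_per_engine < 1:
        raise ValueError("threads_per_engine must be at least 1")
    usable = max(cpus - reserved, 0)
    # Always allow at least one game, even on a tiny machine.
    return max(usable // threads_per_engine, 1)


if __name__ == "__main__":
    # os.cpu_count() reports logical CPUs, so on a hyper-threaded machine
    # it is roughly twice the physical core count.
    logical = os.cpu_count() or 1
    print(pick_concurrency(logical, threads_per_engine=1, reserved=1))
```

With the numbers mentioned here, 8 cores with 1 reserved gives 7, and 48 CPUs give 48 games single-threaded or 12 games at 4 threads each.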
CCRL Testing Group
Ray
Posts: 22604
Joined: Sun Dec 18, 2005 6:33 pm
Sign-up code: 10159
Location: NZ

Re: CCRL 40/15 Testing Conditions (previously 40/40)

Post by Ray »

jkominek wrote: Wed Dec 21, 2022 3:35 pm
i) How many simultaneous games do CCRL testers typically run?
ii) If it is more than one, is benchmark calibration performed while the machine is running a CPU load in line with what is expected during the tournament?

In Kirill's post he suggests rebooting to get a clean slate before benchmarking. But that was back in the years of single-CPU computers.

To be specific to my situation I have a 48-CPU test computer, and use cutechess-cli. I set "-concurrency 48" when running engines single-threaded, or "-concurrency 12" with engines configured to 4 threads. The benchmark result under heavy load is about double the time when run while the machine is idle. That makes a big difference to time control settings if I want a good match to CCRL testing conditions.
Very good point. Clock speeds these days vary depending on the load on the CPU. For example, on my i9 10900, if I run the benchmark on its own after a fresh boot, the clock will boost as high as 5.2 GHz, but usually 5.0 GHz. If I'm running 10 threads, it will be running at more like 4.0 GHz.

If I'm going to be running concurrency 10, what I do is start a process which takes 9 threads, and then I run the Stockfish bench and use the figure returned under those conditions to calculate the adjusted time control.
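
A minimal Python sketch of that procedure, under stated assumptions: the busy-loop load generator, the function names, and the inverse-proportional scaling rule in `adjusted_time` are illustrative and not the official CCRL calibration formula. It also assumes a `stockfish` binary on the PATH whose `bench` command prints a `Nodes/second` line.

```python
import multiprocessing as mp
import re
import subprocess
import time


def _spin(stop_at):
    """Busy-loop on one CPU until the wall clock reaches stop_at."""
    while time.time() < stop_at:
        pass


def start_background_load(n_procs, seconds):
    """Start n_procs busy processes that stop by themselves after `seconds`."""
    stop_at = time.time() + seconds
    procs = [mp.Process(target=_spin, args=(stop_at,), daemon=True)
             for _ in range(n_procs)]
    for proc in procs:
        proc.start()
    return procs


def parse_nps(bench_output):
    """Extract the 'Nodes/second' figure from the bench output text."""
    match = re.search(r"Nodes/second\s*:\s*(\d+)", bench_output)
    if match is None:
        raise ValueError("no 'Nodes/second' line found")
    return int(match.group(1))


def adjusted_time(base_seconds, reference_nps, measured_nps):
    """Assumed rule: thinking time scales inversely with engine speed."""
    return base_seconds * reference_nps / measured_nps


def bench_under_load(concurrency, engine_cmd=("stockfish", "bench"),
                     load_seconds=300):
    """Run the bench while concurrency - 1 other CPUs are kept busy."""
    load = start_background_load(concurrency - 1, load_seconds)
    try:
        # The summary may be written to stderr, so capture both streams.
        result = subprocess.run(list(engine_cmd), capture_output=True,
                                text=True)
    finally:
        for proc in load:
            proc.terminate()
    return parse_nps(result.stdout + result.stderr)
```

For concurrency 10 one would call `bench_under_load(10)` and feed the result into `adjusted_time(15 * 60, reference_nps, measured)`, where `reference_nps` is whatever figure the calibration instructions give for the reference machine.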

I also do not exceed the core count. The i9 10900 has 10 cores and 20 threads. I never run a concurrency of more than 10.
Gabor Szots
Posts: 12845
Joined: Sat Dec 09, 2006 6:30 am
Sign-up code: 10159
Location: Szentendre, Hungary

Re: CCRL 40/15 Testing Conditions (previously 40/40)

Post by Gabor Szots »

Ray wrote: Wed Dec 21, 2022 4:55 pm Clock speeds these days vary depending on the load on the CPU. For example, on my i9 10900, if I run the benchmark on its own after a fresh boot, the clock will boost as high as 5.2 GHz, but usually 5.0 GHz. If I'm running 10 threads, it will be running at more like 4.0 GHz.
That surprises me. Is there no way to force a given clock speed any more? I have an i5-4690K and in the BIOS I selected a scheme which forces all cores to run at 3.5 GHz whatever the load.
Ray
Posts: 22604
Joined: Sun Dec 18, 2005 6:33 pm
Sign-up code: 10159
Location: NZ

Re: CCRL 40/15 Testing Conditions (previously 40/40)

Post by Ray »

Could be, but when I'm not running chess that boost is worth having.
jkominek
Posts: 150
Joined: Mon Dec 04, 2006 9:02 am
Sign-up code: 0
Location: Pittsburgh, PA

Re: CCRL 40/15 Testing Conditions (previously 40/40)

Post by jkominek »

Gabor Szots wrote: Wed Dec 21, 2022 5:07 pm
Ray wrote: Wed Dec 21, 2022 4:55 pm Clock speeds these days vary depending on the load on the CPU. For example, on my i9 10900, if I run the benchmark on its own after a fresh boot, the clock will boost as high as 5.2 GHz, but usually 5.0 GHz. If I'm running 10 threads, it will be running at more like 4.0 GHz.
That surprises me. Is there no way to force a given clock speed any more? I have an i5-4690K and in the BIOS I selected a scheme which forces all cores to run at 3.5 GHz whatever the load.
I've spent multiple happy(?) days in a fiddle-with-BIOS-settings/reboot loop to optimize clock settings, using a program called y-cruncher (http://www.numberworld.org/y-cruncher/records.html) to apply compute load. I was surprised to find - on my computer anyway - that lowering the target clock speed resulted in faster execution times. I believe it is no longer possible to force a flat clock rate with modern CPUs, despite how the BIOS settings tool describes it. The controller reserves the option of lowering the clock as needed -- to save power, for one, but also to prevent overheating -- and there is nothing you can do to prevent that. I am sure the designers reason that backing off on speed is better than frying a circuit. In the opposite direction, "turbo boosting" the clock of one or two previously idle CPUs when under light load is a clever way of making the computer more responsive to human interaction.

I imagine most CCRL participants are Windows users. But for those on Linux I'll mention the program "i7z". It is handy for monitoring clock rates live. Below is a snapshot of its report with 46/48 CPUs running chess (to save space only showing CPU socket 0).


Cpu speed from cpuinfo 2999.00Mhz
True Frequency (without accounting Turbo) 2999 MHz

Socket [0] - [physical cores=24, logical cores=48, max online cores ever=24]
  CPU Multiplier 30x || Bus clock frequency (BCLK) 99.97 MHz
  TURBO ENABLED on 24 Cores, Hyper Threading ON
  Max Frequency without considering Turbo 3098.97 MHz (99.97 x [31])
  Max TURBO Multiplier (if Enabled) with 1/2/3/4/5/6 cores is  40x/38x/37x/37x/37x/37x
  Real Current Frequency 3272.63 MHz (Max of below)
        Core [core-id]  :Actual Freq (Mult.)      C0%   Halt(C1)%  C3 %   C6 %  Temp      VCore
        Core 1 [0]:       3097.92 (30.99x)      7.05    92.7       0       0    64      0.8853
        Core 2 [1]:       3098.30 (30.99x)      24.4    73.8       0       1    65      0.8931
        Core 3 [2]:       3101.40 (31.02x)       100       0       0       0    64      0.9009
        Core 4 [3]:       3120.47 (31.22x)         1     100       0       0    66      0.9001
        Core 5 [4]:       3091.99 (30.93x)         1    99.9       0       0    65      0.9152
        Core 6 [5]:       3098.15 (30.99x)      99.9       0       0       0    69      0.9148
        Core 7 [6]:       3096.52 (30.98x)         1    99.9       0       0    66      0.9301
        Core 8 [7]:       3099.79 (31.01x)      78.2       0       0      20    62      0.9005
        Core 9 [8]:       3097.93 (30.99x)       100       0       0       0    63      0.8931
        Core 10 [9]:      3098.11 (30.99x)      75.5       0       0      24    67      0.8999
        Core 11 [10]:     3100.48 (31.02x)       100       0       0       0    66      0.9146
        Core 12 [11]:     3272.63 (32.74x)         1    99.8       0       0    65      0.9154
        Core 13 [12]:     3121.15 (31.22x)      22.4    14.3       0    62.4    59      0.8795
        Core 14 [13]:     3098.17 (30.99x)       100       0       0       0    64      0.8860
        Core 15 [14]:     3175.22 (31.76x)         1     100       0       0    64      0.8934
        Core 16 [15]:     3099.10 (31.00x)         1    99.2       0       0    67      0.9001
        Core 17 [16]:     3097.93 (30.99x)       100       0       0       0    64      0.9154
        Core 18 [17]:     3098.28 (30.99x)       100       0       0       0    64      0.9154
        Core 19 [18]:     3097.99 (30.99x)       100       0       0       0    65      0.9154
        Core 20 [19]:     3161.58 (31.63x)         1    99.9       0       0    63      0.8931
        Core 21 [20]:     3167.64 (31.69x)         1     100       0       0    66      0.9005
        Core 22 [21]:     3097.93 (30.99x)       100       0       0       0    66      0.9146
        Core 23 [22]:     3105.50 (31.07x)         1    99.9       0       0    67      0.9146
        Core 24 [23]:     3097.94 (30.99x)       100       0       0       0    65      0.9301
C6 = Everything in C3 + core state saved to last level cache
  Above values in table are in percentage over the last 1 sec
[core-id] refers to core-id number in /proc/cpuinfo
When the load drops to, say, 32/48, the clock multiplier increases.


Socket [0] - [physical cores=24, logical cores=48, max online cores ever=24]
  CPU Multiplier 30x || Bus clock frequency (BCLK) 99.97 MHz
  TURBO ENABLED on 24 Cores, Hyper Threading ON
  Max Frequency without considering Turbo 3098.97 MHz (99.97 x [31])
  Max TURBO Multiplier (if Enabled) with 1/2/3/4/5/6 cores is  40x/38x/37x/37x/37x/37x
  Real Current Frequency 3659.91 MHz (Max of below)
        Core [core-id]  :Actual Freq (Mult.)      C0%   Halt(C1)%  C3 %   C6 %  Temp      VCore
        Core 1 [0]:       3593.56 (35.95x)      3.85    31.6       0    63.8    58      0.9832
        Core 2 [1]:       3369.81 (33.71x)         1     100       0       0    63      0.8940
        Core 3 [2]:       3544.41 (35.46x)         1    0.687      0    99.3    57      0.8726
        Core 4 [3]:       3607.31 (36.09x)         1    1.95       0    97.7    56      0.8726
        Core 5 [4]:       3287.66 (32.89x)         0    0.624      0    99.4    58      0.8799
        Core 6 [5]:       3486.87 (34.88x)      99.8       0       0       0    66      0.9154
        Core 7 [6]:       3524.83 (35.26x)         1    5.28       0    94.5    58      0.9021
        Core 8 [7]:       3359.14 (33.60x)         1     100       0       0    65      0.9005
        Core 9 [8]:       3489.47 (34.91x)      82.1       0       0    12.5    62      0.8934
        Core 10 [9]:      3475.40 (34.77x)      18.1    5.86       0    73.2    58      0.8722
        Core 11 [10]:     3492.63 (34.94x)      24.3       0       0    72.7    60      0.8792
        Core 12 [11]:     3659.91 (36.61x)         1    25.7       0    73.6    60      0.8789
        Core 13 [12]:     3522.23 (35.23x)      99.7       0       0       0    64      0.9160
        Core 14 [13]:     3486.85 (34.88x)      99.8       0       0       0    64      0.8860
        Core 15 [14]:     3575.79 (35.77x)         1     100       0       0    66      0.9978
        Core 16 [15]:     3437.66 (34.39x)         1     5.6       0    94.2    58      0.9979
        Core 17 [16]:     3486.86 (34.88x)      99.8       0       0       0    65      0.9156
        Core 18 [17]:     3618.64 (36.20x)         1    4.05       0    95.2    57      1.0055
        Core 19 [18]:     3488.67 (34.90x)      99.8       0       0       0    65      0.9156
        Core 20 [19]:     3472.98 (34.74x)         1    99.8       0       0    62      0.8940
        Core 21 [20]:     3213.99 (32.15x)         1    3.74       0    96.2    57      0.9983
        Core 22 [21]:     3486.89 (34.88x)      99.8       0       0       0    65      0.9156
        Core 23 [22]:     3486.88 (34.88x)      99.8       0       0       0    66      0.9152
        Core 24 [23]:     3534.81 (35.36x)      99.4       0       0       0    67      1.0420
I can't say I do much with the report, but it's fun to watch.
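
For anyone who does want to do something with the report, here is a small Python sketch that pulls the per-core lines out of text in the layout shown above and averages the clock of the cores that are actually busy. The function names and the 90% C0 threshold are arbitrary choices for illustration, and the regular expression only assumes the column layout seen in these two snapshots.

```python
import re

# Matches lines such as:
#   Core 3 [2]:       3101.40 (31.02x)       100       0  ...
# The table header ("Core [core-id] :Actual Freq ...") does not match
# because it has no core number before the bracket.
CORE_LINE = re.compile(
    r"Core\s+\d+\s+\[(\d+)\]:\s+([\d.]+)\s+\(([\d.]+)x\)\s+([\d.]+)"
)


def parse_cores(report):
    """Return a list of (core_id, freq_mhz, multiplier, c0_percent)."""
    rows = []
    for line in report.splitlines():
        m = CORE_LINE.search(line)
        if m:
            rows.append((int(m.group(1)), float(m.group(2)),
                         float(m.group(3)), float(m.group(4))))
    return rows


def mean_busy_freq(rows, min_c0=90.0):
    """Average frequency of cores whose C0 residency is at least min_c0."""
    busy = [freq for _, freq, _, c0 in rows if c0 >= min_c0]
    return sum(busy) / len(busy) if busy else None


# Three lines copied from the first snapshot above.
SAMPLE = """
        Core [core-id]  :Actual Freq (Mult.)      C0%   Halt(C1)%  C3 %   C6 %  Temp      VCore
        Core 3 [2]:       3101.40 (31.02x)       100       0       0       0    64      0.9009
        Core 4 [3]:       3120.47 (31.22x)         1     100       0       0    66      0.9001
        Core 6 [5]:       3098.15 (30.99x)      99.9       0       0       0    69      0.9148
"""

if __name__ == "__main__":
    print(mean_busy_freq(parse_cores(SAMPLE)))
```

Running the same function over both full snapshots gives a quick way to compare the loaded-core clock at 46/48 and 32/48 load.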

It's not just clock rate that affects benchmark results when running under full or near-full load. The flock of simultaneous engines increases memory contention too. Memory contention might be the largest contributor to the observed slowdown.
jkominek
Posts: 150
Joined: Mon Dec 04, 2006 9:02 am
Sign-up code: 0
Location: Pittsburgh, PA

Re: CCRL 40/15 Testing Conditions (previously 40/40)

Post by jkominek »

I have a couple extra questions that I have not been able to find answers for, and which I think wouldn't hurt to put on public record.

i) When running BayesElo, what values are used for anchoring the rating lists?
ii) Was anchoring to FIDE rating lists performed, and if so, what historical notes do you have on that procedure?

In the talkchess thread "a direct comparison of FIDE and CCRL rating systems" a person with the handle drj4759 asserts that "Shredder 12 x64 with ELO 2800 was used as the anchor in all the different rating list presentation." Is that the case? That's not an unreasonable anchor point, but I cannot find confirmation, and I don't get the impression that drj4759 is one of the CCRL principals. Graham Banks participated in the thread but only weighed in on the CCRL machine calibration procedure.
Ray
Posts: 22604
Joined: Sun Dec 18, 2005 6:33 pm
Sign-up code: 10159
Location: NZ

Re: CCRL 40/15 Testing Conditions (previously 40/40)

Post by Ray »

jkominek wrote: Thu Jan 05, 2023 9:11 pm I have a couple extra questions that I have not been able to find answers for, and which I think wouldn't hurt to put on public record.

i) When running BayesElo, what values are used for anchoring the rating lists?
ii) Was anchoring to FIDE rating lists performed, and if so, what historical notes do you have on that procedure?

In the talkchess thread "a direct comparison of FIDE and CCRL rating systems" a person with the handle drj4759 asserts that "Shredder 12 x64 with ELO 2800 was used as the anchor in all the different rating list presentation." Is that the case? That's not an unreasonable anchor point, but I cannot find confirmation, and I don't get the impression that drj4759 is one of the CCRL principals. Graham Banks participated in the thread but only weighed in on the CCRL machine calibration procedure.
Categorically no, our ratings lists are not anchored to Shredder 12. And categorically no, they are not directly anchored to any FIDE ratings.

Back in late 2006 we chose a basket of 14 engines from the SSDF ratings list dated 24th November 2006 and took the average of their ratings. We run bayeselo on our database, compare that SSDF average with the average of those 12 engines under the default (zero-based?) bayeselo calculation, and shift the ratings of all engines by that fixed difference. (This applies to the 40/15 and blitz lists.)

Some years later we took the view that the ratings looked high, and reduced them by 100 Elo.
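
To make the arithmetic concrete, here is a minimal Python sketch of that kind of basket anchoring. The engine names, ratings, and the target average of 2700 are invented for illustration only; they are not CCRL or SSDF figures, and the function names are hypothetical.

```python
from statistics import mean


def anchor_offset(raw_ratings, basket, target_average):
    """Offset that moves the basket's average rating onto target_average."""
    return target_average - mean(raw_ratings[name] for name in basket)


def apply_offset(raw_ratings, offset, correction=0.0):
    """Shift every engine by the same offset plus an optional correction."""
    return {name: rating + offset + correction
            for name, rating in raw_ratings.items()}


# Made-up zero-centred ratings, as a rating tool might report them.
raw = {"EngineA": -50.0, "EngineB": 50.0, "EngineC": 200.0}
basket = ["EngineA", "EngineB"]          # engines that also have reference ratings

offset = anchor_offset(raw, basket, target_average=2700.0)
anchored = apply_offset(raw, offset)                   # basket average is now 2700
lowered = apply_offset(raw, offset, correction=-100)   # later flat 100 Elo reduction

if __name__ == "__main__":
    print(offset, anchored, lowered)
```

Because every engine is shifted by the same constant, rating differences between engines are unchanged; only the absolute level of the list moves.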

It has been said that SSDF back then was supposedly reasonably representative of human ratings, so indirectly our lists *might* have some correlation to human ratings, but that is a big stretch and I definitely would not be making that statement.

The other complication is that bayeselo and Ordo both give very different ratings from the same database with the same anchor.