Commit bcea001

2 parents 64e94c9 + 250a60f commit bcea001

12 files changed

+1826
-86
lines changed

README.md

Lines changed: 41 additions & 20 deletions
Original file line number | Diff line number | Diff line change
@@ -2,6 +2,8 @@
22
<p>
33
<a href="https://discord.gg/c382VBk"><img src="https://img.shields.io/discord/623794270255579146?label=Delphi Community Discord" alt="Delphi Community" /></a>
44
<a href="https://discord.gg/3VdxbSFyJP"><img src="https://img.shields.io/discord/570025060312547359?label=Unofficial Free Pacal Discord" alt="Unofficial Free Pacal" /></a>
5+
<a href="https://forum.lazarus.freepascal.org/index.php/topic,66571.0.html"><img src="https://img.shields.io/badge/Lazarus_Forum-1BRC_Thread-blue" /></a>
6+
<a href="https://en.delphipraxis.net/topic/11209-offical-launch-of-the-1-billion-row-challenge-in-object-pascal/"><img src="https://img.shields.io/badge/Delphi_Praxis_Forum-1BRC_Thread-blue" /></a>
57
</p>
68

79
This is the repository that will coordinate the 1 Billion Row Challenge for Object Pascal.
@@ -66,33 +68,51 @@ Submit your implementation and become part of the leader board!
6668

6769
## Rounding
6870

69-
Székely Balázs has provided code for rounding towards positive infinity per the original challenge.
70-
This will be the official way to round the output values:
71+
While I recognize that Székely's rounding code was a good effort, it was not simple and made a lot of people doubt it was even correct for negative temperatures.\
72+
In a discussion with [Mr. Packman](https://pack.ac/) themselves, we came up with a simpler solution. They even added some _Unit Testing_ :D.
73+
74+
This will be the official way to round the output values, so pick your poison:
7175
```pas
72-
function TBaseline.RoundEx(x: Double): Double;
76+
function RoundEx(x: Double): Double; inline;
7377
begin
74-
Result := PascalRound(x*10.0)/10.0;
78+
Result := Ceil(x * 10) / 10;
7579
end;
7680
77-
function TBaseline.PascalRound(x: Double): Double;
81+
function RoundExInteger(x: Double): Integer; inline;
82+
begin
83+
Result := Ceil(x * 10);
84+
end;
85+
86+
function RoundExString(x: Double): String; inline;
7887
var
79-
t: Double;
88+
V, Q, R: Integer;
8089
begin
81-
//round towards positive infinity
82-
t := Trunc(x);
83-
if (x < 0.0) and (t - x = 0.5) then
90+
V := RoundExInteger(x);
91+
if V < 0 then
8492
begin
85-
// Do nothing
93+
Result := '-';
94+
V := -V;
8695
end
87-
else if Abs(x - t) >= 0.5 then
88-
begin
89-
t := t + Math.Sign(x);
90-
end;
91-
92-
if t = 0.0 then
93-
Result := 0.0
9496
else
95-
Result := t;
97+
Result := '';
98+
Q := V div 10;
99+
R := V - (Q * 10);
100+
Result := IntToStr(Q) + '.' + IntToStr(R);
101+
end;
102+
103+
procedure Test;
104+
var
105+
F: Double;
106+
begin
107+
for F in [10.01, 10.04, -10.01, -10.0, 0, -0, -0.01] do
108+
WriteLn(RoundExInteger(F), ' ', RoundExString(F), ' ', RoundEx(F));
109+
//101 10.1 1.0100000000000000E+001
110+
//101 10.1 1.0100000000000000E+001
111+
//-100 -10.0 -1.0000000000000000E+001
112+
//-100 -10.0 -1.0000000000000000E+001
113+
//0 0.0 0.0000000000000000E+000
114+
//0 0.0 0.0000000000000000E+000
115+
//0 0.0 0.0000000000000000E+000
96116
end;
97117
```
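
The ceiling-based rounding above can be checked in isolation. The following is a minimal standalone sketch, not part of the commit: `TenthsCeil` mirrors `RoundExInteger`, and `TenthsToStr` is a hypothetical helper name used here to show one way of formatting the result while keeping the minus sign for negative values.

```pas
program RoundingSketch;

{$mode objfpc}{$H+}

uses
  Math, SysUtils;

// Round towards positive infinity and return the value as a whole
// number of tenths (same idea as RoundExInteger above).
function TenthsCeil(x: Double): Integer;
begin
  Result := Ceil(x * 10);
end;

// Hypothetical helper: format a number of tenths with one decimal digit,
// keeping the sign for negative values.
function TenthsToStr(V: Integer): String;
begin
  if V < 0 then
    Result := '-' + IntToStr((-V) div 10) + '.' + IntToStr((-V) mod 10)
  else
    Result := IntToStr(V div 10) + '.' + IntToStr(V mod 10);
end;

begin
  WriteLn(TenthsToStr(TenthsCeil(10.01)));   // 10.1
  WriteLn(TenthsToStr(TenthsCeil(-10.01)));  // -10.0
  WriteLn(TenthsToStr(TenthsCeil(-0.01)));   // 0.0
end.
```

Working on an integer number of tenths avoids a second floating-point division before formatting, which is presumably why the commit offers the integer and string variants alongside `RoundEx`.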
98118

@@ -146,7 +166,7 @@ Expected `SHA256` hash:
146166
>
147167
> We are still waiting for the Delphi version to be completed in order for us to have an official `SHA256` hash for the output.
148168
>
149-
> Until then, this is the current one: `db3d79d31b50daa8c03a1e4f2025029cb137f9971aa04129d8bca004795ae524`
169+
> Until then, this is the current one: `4256d19d3e134d79cc6f160d428a1d859ce961167bd01ca528daca8705163910`
150170
> There's also an archived version of the [baseline output](./data/baseline.output.gz)
151171
152172
## Differences From Original
@@ -209,7 +229,8 @@ I'd like to thank [@paweld](https://github.com/paweld) for taking us from my mis
209229
I'd like to thank [@mobius](https://github.com/mobius1qwe) for taking the time to provide the Delphi version of the generator.\
210230
I'd like to thank [@dtpfl](https://github.com/dtpfl) for his invaluable work on maintaining the `README.md` file up to date with everything.\
211231
I'd like to thank Székely Balázs for providing many patches to make everything compliant with the original challenge.\
212-
I'd like to thank [@corneliusdavid](https://github.com/corneliusdavid) for giving some of the information files a once over and making things more legible and clear.
232+
I'd like to thank [@corneliusdavid](https://github.com/corneliusdavid) for giving some of the information files a once over and making things more legible and clear.\
233+
I'd like to thank Mr. **Pack**man, aka O, for clearing the fog around the rounding issues.
213234

214235
## Links
215236
The original repository: https://github.com/gunnarmorling/1brc \

baseline/Common/baseline.common.pas

Lines changed: 1 addition & 24 deletions
@@ -36,7 +36,6 @@ TBaseline = class(TObject)
3636
procedure AddToHashList(AStation: String; ATemp: Int64);
3737
procedure BuildHashList;
3838
function RoundEx(x: Double): Double;
39-
function PascalRound(x: Double): Double;
4039
protected
4140
public
4241
constructor Create(AInputFile: String);
@@ -185,28 +184,7 @@ procedure TBaseline.BuildHashList;
185184

186185
function TBaseline.RoundEx(x: Double): Double;
187186
begin
188-
Result := PascalRound(x*10.0)/10.0;
189-
end;
190-
191-
function TBaseline.PascalRound(x: Double): Double;
192-
var
193-
t: Double;
194-
begin
195-
//round towards positive infinity
196-
t := Trunc(x);
197-
if (x < 0.0) and (t - x = 0.5) then
198-
begin
199-
// Do nothing
200-
end
201-
else if Abs(x - t) >= 0.5 then
202-
begin
203-
t := t + Math.Sign(x);
204-
end;
205-
206-
if t = 0.0 then
207-
Result := 0.0
208-
else
209-
Result := t;
187+
Result := Ceil(x * 10) / 10;
210188
end;
211189

212190
procedure TBaseline.Generate;
@@ -221,7 +199,6 @@ procedure TBaseline.Generate;
221199

222200
BuildHashList;
223201

224-
//FStationNames.DefaultEncoding := TEncoding.UTF8;
225202
FStationNames.BeginUpdate;
226203
for index := 0 to FHashStationList.Count - 1 do
227204
begin

data/baseline.output.gz

-68 Bytes
Binary file not shown.

entries/abouchez/README.md

Lines changed: 6 additions & 4 deletions
@@ -20,7 +20,7 @@ I am very happy to share decades of server-side performance coding techniques us
2020

2121
Here are the main ideas behind this implementation proposal:
2222

23-
- **mORMot** makes cross-platform and cross-compiler support simple - e.g. `TMemMap`, `TDynArray.Sort`,`TTextWriter`, `SetThreadCpuAffinity`, `crc32c`, `ConsoleWrite` or command-line parsing;
23+
- **mORMot** makes cross-platform and cross-compiler support simple - e.g. `TMemMap`, `TDynArray`,`TTextWriter`, `SetThreadCpuAffinity`, `crc32c`, `ConsoleWrite` or command-line parsing;
2424
- The entire 16GB file is `memmap`ed at once into memory - it won't work on 32-bit OS, but avoid any `read` syscall or memory copy;
2525
- Process file in parallel using several threads - configurable via the `-t=` switch, default being the total number of CPUs reported by the OS;
2626
- Input is fed into each thread as 64MB chunks: because thread scheduling is unbalanced, it is inefficient to pre-divide the size of the whole input file into the number of threads;
@@ -32,20 +32,22 @@ Here are the main ideas behind this implementation proposal:
3232
- Parse temperatures with a dedicated code (expects single decimal input values);
3333
- The station names are stored as UTF-8 pointers to the memmap location where they appear first, in `StationName[]`, to be emitted eventually for the final output, not during temperature parsing;
3434
- No memory allocation (e.g. no transient `string` or `TBytes`) nor any syscall is done during the parsing process to reduce contention and ensure the process is only CPU-bound and RAM-bound (we checked this with `strace` on Linux);
35-
- Pascal code was tuned to generate the best possible asm output on FPC x86_64 (which is our target);
35+
- Pascal code was tuned to generate the best possible asm output on FPC x86_64 (which is our target) - perhaps making it less readable, because we used pointer arithmetic where it matters (I like to think of such low-level Pascal code as [portable assembly](https://sqlite.org/whyc.html#performance), similar to "unsafe" code in managed languages);
3636
- Can optionally output timing statistics and resultset hash value on the console to debug and refine settings (with the `-v` command line switch);
3737
- Can optionally set each thread affinity to a single core (with the `-a` command line switch).
3838

3939
If you are not convinced by the "perfect hash" trick, you can define the `NOPERFECTHASH` conditional, which forces full name comparison, but is noticeably slower. Our algorithm is safe with the official dataset, and gives the expected final result - which was the goal of this challenge: compute the right data reduction with as little time as possible, with all possible hacks and tricks. A "perfect hash" is a well known hacking pattern, when the dataset is validated in advance. And since our CPUs offers `crc32c` which is perfect for our dataset... let's use it! https://en.wikipedia.org/wiki/Perfect_hash_function ;)
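
To illustrate what "validated in advance" means here, this is a minimal sketch of a collision check over a known list of names. It is an assumption-laden stand-in, not the entry's code: FNV-1a replaces `crc32c` (which comes from mORMot), and `Fnv1a`, `HasCollision` and the short name list are hypothetical.

```pas
program PerfectHashSketch;

{$mode objfpc}{$H+}
{$Q-}{$R-} // the hash relies on 32-bit wrap-around

// Stand-in 32-bit hash (FNV-1a); the entry itself uses crc32c from mORMot.
function Fnv1a(const S: AnsiString): Cardinal;
var
  i: Integer;
begin
  Result := 2166136261;
  for i := 1 to Length(S) do
    Result := (Result xor Ord(S[i])) * 16777619;
end;

// True when at least two names in the list share the same hash value.
function HasCollision(const Names: array of AnsiString): Boolean;
var
  i, j: Integer;
begin
  Result := True;
  for i := 0 to High(Names) do
    for j := i + 1 to High(Names) do
      if Fnv1a(Names[i]) = Fnv1a(Names[j]) then
        exit;
  Result := False;
end;

begin
  if HasCollision(['Abha', 'Abidjan', 'Accra', 'Zagreb']) then
    WriteLn('collision found: fall back to full name comparison')
  else
    WriteLn('no collision: the hash alone identifies each name');
end.
```

A check like this only holds for the exact dataset it was run against, which is the trade-off the `NOPERFECTHASH` conditional exists to undo.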
4040

4141
## Why L1 Cache Matters
4242

43-
Take great care of the "64 bytes cache line" is quite unique among all implementations of the "1brc" I have seen in any language - and it does make a noticeable difference in performance.
43+
Taking special care of the "64 bytes cache line" is quite unique among all implementations of the "1brc" I have seen in any language - and it does make a noticeable difference in performance.
4444

4545
The L1 cache is well known in the performance hacking literature to be the main bottleneck for any efficient in-memory process. If you want things to go fast, you should flatter your CPU L1 cache.
4646

4747
Min/max values will be reduced as 16-bit smallint - resulting in temperature range of -3276.8..+3276.7 which seems fair on our planet according to the IPCC. ;)
4848

49+
As a result, each `Station[]` entry takes only 16 bytes, so we can fit exactly 4 entries in a single CPU L1 cache line. To be fair, if we put some more data into the record (e.g. use `Int64` instead of `smallint`/`integer`), the performance degrades by only a few percent. The main factor seems to be that the entry is likely to fit into a single cache line, even if filling two cache lines may sometimes be needed for misaligned data.
50+
4951
In our first attempt (see "Old Version" below), we stored the name into the `Station[]` array, so that each entry is 64 bytes long exactly. But since `crc32c` is a perfect hash function for our dataset, it is enough to just store the 32-bit hash instead, and not the actual name.
5052

5153
Note that if we reduce the number of stations from 41343 to 400, the performance is much higher, also with a 16GB file as input. The reason is that since 400x16 = 6400, each dataset could fit entirely in each core L1 cache. No slower L2/L3 cache is involved, therefore performance is better. The cache memory seems to be the bottleneck of our code. Which is a good sign.
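
The size arithmetic in the paragraphs above can be reproduced with a small sketch. The record layout below is an assumption made for illustration (a 32-bit hash, 32-bit sum and count, 16-bit min and max); it is not copied from `brcmormot.lpr`, and `TStationSketch` is a hypothetical name.

```pas
program CacheLineSketch;

{$mode objfpc}{$H+}

type
  // Hypothetical layout: 4 + 4 + 4 + 2 + 2 = 16 bytes when packed.
  TStationSketch = packed record
    NameHash: Cardinal; // 32-bit hash standing in for the station name
    Sum: Integer;       // sum of temperatures, in tenths of a degree
    Count: Integer;     // number of measurements
    Min: SmallInt;      // minimum, in tenths of a degree
    Max: SmallInt;      // maximum, in tenths of a degree
  end;

const
  CacheLineBytes = 64; // cache line size assumed in the text

begin
  WriteLn('entry size: ', SizeOf(TStationSketch));                                // 16
  WriteLn('entries per cache line: ', CacheLineBytes div SizeOf(TStationSketch)); // 4
  WriteLn('bytes for 400 stations: ', 400 * SizeOf(TStationSketch));              // 6400
  WriteLn('bytes for 41343 stations: ', 41343 * SizeOf(TStationSketch));          // 661488
end.
```

With 400 stations the whole table is a few kilobytes, while the full 41343 stations need roughly 650 KB, which is consistent with the text's point that the larger set spills out of L1 into the slower cache levels.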
@@ -236,6 +238,6 @@ Benchmark 1: abouchez
236238
```
237239
It is a known fact from experiment that forcing thread affinity is not a good idea, and it is always much better to let any modern Operating System do the threads scheduling to the CPU cores, because it has a much better knowledge of the actual system load and status. Even on a "fair" CPU architecture like AMD Zen. For a "pure CPU" process, affinity may help a very little. But for our "old" process working outside of the L1 cache limits, we better let the OS decide.
238240

239-
So with this "old" version, it was decided to use `-t=16`. The "old" version is using a whole cache line (16 bytes) for its `Station[]` record, so it may be the responsible of using too much CPU cache, so more than 16 threads does not make a difference with it. Whereas our "new" version, with its `Station[]` of only 16 bytes, could use `-t=32` with benefits. The cache memory access is likely to be the bottleneck from now on.
241+
So with this "old" version, it was decided to use `-t=16`. The "old" version is using a whole cache line (64 bytes) for its `Station[]` record, so it may be responsible for using too much CPU cache, which would explain why more than 16 threads make no difference with it. Whereas our "new" version, with its `Station[]` of only 16 bytes, could use `-t=32` with benefits. The cache memory access is likely to be the bottleneck from now on.
240242

241243
Arnaud :D

entries/abouchez/src/brcmormot.lpr

Lines changed: 39 additions & 34 deletions
@@ -327,29 +327,31 @@ function Average(sum, count: PtrInt): PtrInt;
327327
//ConsoleWrite([sum / (count * 10), ' ', result / 10]);
328328
end;
329329

330-
function ByStationName(const A, B): integer;
330+
function ByStationName(const A, B): integer; // = StrComp() but ending with ';'
331331
var
332332
pa, pb: PByte;
333+
c: byte;
333334
begin
334335
result := 0;
335336
pa := pointer(A);
336337
pb := pointer(B);
337-
if pa = pb then
338+
dec(pa, {%H-}PtrUInt(pb));
339+
if pa = nil then
338340
exit;
339341
repeat
340-
if pa^ <> pb^ then
342+
c := PByteArray(pa)[{%H-}PtrUInt(pb)];
343+
if c <> pb^ then
341344
break
342-
else if pa^ = ord(';') then
345+
else if c = ord(';') then
343346
exit; // Str1 = Str2
344-
inc(pa);
345347
inc(pb);
346348
until false;
347-
if pa^ = ord(';') then
349+
if (c = ord(';')) or
350+
((pb^ <> ord(';')) and
351+
(c < pb^)) then
348352
result := -1
349-
else if pb^ = ord(';') then
350-
result := 1
351353
else
352-
result := pa^ - pb^;
354+
result := 1;
353355
end;
354356

355357
function TBrcMain.SortedText: RawUtf8;
@@ -368,36 +370,39 @@ function TBrcMain.SortedText: RawUtf8;
368370
assert(c <> 0);
369371
DynArraySortIndexed(
370372
pointer(fList.StationName), SizeOf(PUtf8Char), c, ndx, ByStationName);
371-
// generate output
372-
FastSetString(result, nil, 1200000); // pre-allocate result
373-
st := TRawByteStringStream.Create(result);
374373
try
375-
w := TTextWriter.Create(st, @tmp, SizeOf(tmp));
374+
// generate output
375+
FastSetString(result, nil, 1200000); // pre-allocate result
376+
st := TRawByteStringStream.Create(result);
376377
try
377-
w.Add('{');
378-
n := ndx.buf;
379-
repeat
380-
s := @fList.Station[n^];
381-
assert(s^.Count <> 0);
382-
p := fList.StationName[n^];
383-
w.AddNoJsonEscape(p, NameLen(p));
384-
AddTemp(w, '=', s^.Min);
385-
AddTemp(w, '/', Average(s^.Sum, s^.Count));
386-
AddTemp(w, '/', s^.Max);
387-
dec(c);
388-
if c = 0 then
389-
break;
390-
w.Add(',', ' ');
391-
inc(n);
392-
until false;
393-
w.Add('}');
394-
w.FlushFinal;
395-
FakeLength(result, w.WrittenBytes);
378+
w := TTextWriter.Create(st, @tmp, SizeOf(tmp));
379+
try
380+
w.Add('{');
381+
n := ndx.buf;
382+
repeat
383+
s := @fList.Station[n^];
384+
assert(s^.Count <> 0);
385+
p := fList.StationName[n^];
386+
w.AddNoJsonEscape(p, NameLen(p));
387+
AddTemp(w, '=', s^.Min);
388+
AddTemp(w, '/', Average(s^.Sum, s^.Count));
389+
AddTemp(w, '/', s^.Max);
390+
dec(c);
391+
if c = 0 then
392+
break;
393+
w.Add(',', ' ');
394+
inc(n);
395+
until false;
396+
w.Add('}');
397+
w.FlushFinal;
398+
FakeLength(result, w.WrittenBytes);
399+
finally
400+
w.Free;
401+
end;
396402
finally
397-
w.Free;
403+
st.Free;
398404
end;
399405
finally
400-
st.Free;
401406
ndx.Done;
402407
end;
403408
end;

entries/bfire/README.md

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
1+
# Brian Fire
2+
3+
An Entry to the One Billion Row Challenge in Object Pascal using Delphi 12 by [EagleAglow](https://github.com/EagleAglow), Discord: briar.on.fire
4+
5+
## Compiler
6+
7+
**Delphi 12** Professional Edition
8+
9+
### Dependencies
10+
11+
Project uses Delphi units: `Classes`, `System.SysUtils`, `System.StrUtils` and `Math`.
12+
13+
### UTF8 vs. Windows Terminal
14+
15+
The text in the Windows Terminal console uses the system code page, which does not play well with `UTF8`.
16+
The only way to match the approved result is to write the output to a file, with resulting `SHA256` hash:\
17+
`db3d79d31b50daa8c03a1e4f2025029cb137f9971aa04129d8bca004795ae524`
18+
19+
If the Windows console output is redirected to a file, some characters are mangled, and the resulting `SHA256` hash is:\
20+
`82411ba76c59ae765e85b497f135a8f4e68d7a14cb7c0909ba96dea0d0635a28`
21+
22+
For the challenge, compiled for LINUX, the console result will (hopefully) be correct.
23+
24+
### Execution
25+
```
26+
Usage
27+
bfire -h | Write this help message and exit
28+
bfire -v | Write the version and exit
29+
bfire -i <file_1> | <file_1> contains Weather Data
30+
bfire -i <file_1> -o <file_2> | <file_1> contains Weather Data
31+
| <file_2> contains result
32+
If <file_2> is not defined, result goes to CONSOLE (STDOUT)
33+
```
34+
35+
#### Contest Mode
36+
37+
To run the challenge, read from the 'challenge.csv' file:
38+
39+
```console
40+
C:> bfire -i challenge.csv
41+
```
42+
43+
## Remarks
44+
45+
I haven't used Delphi very much recently, really needed to work on this for a refresher.
46+
I like TStringList self-sorting, but it is not as fast as other techniques.
47+
Now that this entry is set up, I can play with improvements. Maybe even get a time under 15 minutes! :)
48+
49+
## History
50+
51+
- Version 1.0: first working version.
