This is the repository that will coordinate the 1 Billion Row Challenge for Object Pascal.

Submit your implementation and become part of the leaderboard!

## Rounding
While I recognize that Székely's rounding code was a good effort, it was not simple, and it made a lot of people doubt whether it was even correct for negative temperatures.\
In a discussion with [Mr. Packman](https://pack.ac/) themselves, we came up with a simpler solution. They even added some _Unit Testing_ :D.

This will be the official way to round the output values, so pick your poison:

```pas
// Ceil lives in the Math unit; IntToStr in SysUtils.
function RoundEx(x: Double): Double; inline;
begin
  Result := Ceil(x * 10) / 10;
end;

function RoundExInteger(x: Double): Integer; inline;
begin
  Result := Ceil(x * 10);
end;

function RoundExString(x: Double): String; inline;
var
  V, Q, R: Integer;
begin
  V := RoundExInteger(x);
  if V < 0 then
  begin
    Result := '-';
    V := -V;
  end
  else
    Result := '';
  Q := V div 10;
  R := V - (Q * 10);
  // Result already holds the sign prefix, so append the digits to it
  Result := Result + IntToStr(Q) + '.' + IntToStr(R);
end;

procedure Test;
var
  F: Double;
begin
  for F in [10.01, 10.04, -10.01, -10.0, 0, -0, -0.01] do
    WriteLn(RoundExString(F)); // assumed test body: print each rounded sample
end;
```

> We are still waiting for the Delphi version to be completed in order for us to have an official `SHA256` hash for the output.
>
> Until then, this is the current one: `4256d19d3e134d79cc6f160d428a1d859ce961167bd01ca528daca8705163910`
> There's also an archived version of the [baseline output](./data/baseline.output.gz)

## Differences From Original
I'd like to thank [@paweld](https://github.com/paweld) for taking us from my mis\
I'd like to thank [@mobius](https://github.com/mobius1qwe) for taking the time to provide the Delphi version of the generator.\
I'd like to thank [@dtpfl](https://github.com/dtpfl) for his invaluable work on keeping the `README.md` file up to date with everything.\
I'd like to thank Székely Balázs for providing many patches to make everything compliant with the original challenge.\
I'd like to thank [@corneliusdavid](https://github.com/corneliusdavid) for giving some of the information files a once-over and making things more legible and clear.\
I'd like to thank Mr. **Pack**man, aka O, for clearing the fog around the rounding issues.

## Links
The original repository: https://github.com/gunnarmorling/1brc\

**entries/abouchez/README.md**

I am very happy to share decades of server-side performance coding techniques us

Here are the main ideas behind this implementation proposal:

- **mORMot** makes cross-platform and cross-compiler support simple - e.g. `TMemMap`, `TDynArray`, `TTextWriter`, `SetThreadCpuAffinity`, `crc32c`, `ConsoleWrite` or command-line parsing;
- The entire 16GB file is `memmap`ed at once into memory - this won't work on a 32-bit OS, but it avoids any `read` syscall or memory copy;
- Process the file in parallel using several threads - configurable via the `-t=` switch, the default being the total number of CPUs reported by the OS;
- Input is fed to each thread as 64MB chunks (see the sketch after this list): because thread scheduling is unbalanced, it is inefficient to pre-divide the whole input file by the number of threads;
- Parse temperatures with dedicated code (expecting single-decimal input values);
- The station names are stored as UTF-8 pointers to the memmap location where they first appear, in `StationName[]`, to be emitted eventually for the final output, not during temperature parsing;
- No memory allocation (e.g. no transient `string` or `TBytes`) nor any syscall is done during the parsing process, to reduce contention and ensure the process is only CPU-bound and RAM-bound (we checked this with `strace` on Linux);
- Pascal code was tuned to generate the best possible asm output on FPC x86_64 (which is our target) - perhaps making it less readable, because we used pointer arithmetic where it matters (I like to think of such low-level Pascal code as [portable assembly](https://sqlite.org/whyc.html#performance), similar to "unsafe" code in managed languages);
- Can optionally output timing statistics and the resultset hash value on the console, to debug and refine settings (with the `-v` command line switch);
- Can optionally set each thread's affinity to a single core (with the `-a` command line switch).
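
As mentioned in the chunk-feeding item above, each worker grabs the next ~64MB slice and aligns it on line boundaries, so no measurement line is ever split between two threads. The following is only a minimal sketch of that boundary adjustment - the names (`ChunkBounds`, `MapBase`, `Index`) are illustrative, not this entry's actual API; in the real code a shared counter would hand out increasing chunk indexes to the threads:

```pas
// Returns the byte range of chunk number Index (0-based), aligned so that
// every chunk starts and ends on a line boundary of the memory-mapped file.
procedure ChunkBounds(MapBase: PAnsiChar; MapSize: Int64; Index: Int64;
  out ChunkStart, ChunkEnd: PAnsiChar);
const
  CHUNK_SIZE = 64 shl 20; // 64MB
var
  StartOfs, EndOfs: Int64;
begin
  StartOfs := Index * CHUNK_SIZE;
  if StartOfs > MapSize then
    StartOfs := MapSize;
  EndOfs := StartOfs + CHUNK_SIZE;
  if EndOfs > MapSize then
    EndOfs := MapSize;
  // every chunk but the first skips the partial line at its start...
  if StartOfs <> 0 then
    while (StartOfs < MapSize) and (MapBase[StartOfs - 1] <> #10) do
      inc(StartOfs);
  // ...and owns the whole line that crosses its nominal end
  while (EndOfs < MapSize) and (MapBase[EndOfs - 1] <> #10) do
    inc(EndOfs);
  ChunkStart := MapBase + StartOfs;
  ChunkEnd := MapBase + EndOfs;
end;
```
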
If you are not convinced by the "perfect hash" trick, you can define the `NOPERFECTHASH` conditional, which forces full name comparison, but is noticeably slower. Our algorithm is safe with the official dataset, and gives the expected final result - which was the goal of this challenge: compute the right data reduction in as little time as possible, with all possible hacks and tricks. A "perfect hash" is a well-known hacking pattern, usable when the dataset is validated in advance. And since our CPUs offer `crc32c`, which happens to be perfect for our dataset... let's use it! https://en.wikipedia.org/wiki/Perfect_hash_function ;)
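
As a rough, self-contained sketch of the idea - not this entry's actual code, which relies on mORMot's hardware-accelerated `crc32c` - the 32-bit hash can be used as the station's identity in an open-addressed table, so the name bytes never need to be compared during parsing. The table size, slot layout and the slow software `Crc32cOf` helper below are assumptions made for illustration only:

```pas
const
  HASH_BITS = 17;                          // 2^17 = 131072 slots: ample for 41343 stations
  HASH_MASK = (1 shl HASH_BITS) - 1;

var
  // one 32-bit hash per slot; 0 means "empty" (good enough for a sketch)
  HashTable: array[0 .. HASH_MASK] of cardinal;

// Plain bitwise CRC32C, only here to keep the sketch self-contained;
// the real entry uses the SSE4.2-accelerated crc32c shipped with mORMot.
function Crc32cOf(buf: PAnsiChar; len: PtrInt): cardinal;
var
  i, b: PtrInt;
begin
  Result := $FFFFFFFF;
  for i := 0 to len - 1 do
  begin
    Result := Result xor ord(buf[i]);
    for b := 1 to 8 do
      if (Result and 1) <> 0 then
        Result := (Result shr 1) xor $82F63B78 // Castagnoli polynomial
      else
        Result := Result shr 1;
  end;
  Result := not Result;
end;

// Returns the slot index for a station name, claiming a slot on first sight.
// With a collision-free ("perfect") hash, comparing the 32-bit value is enough:
// the name bytes themselves are never compared while parsing.
function FindStation(name: PAnsiChar; len: PtrInt): PtrInt;
var
  h: cardinal;
begin
  h := Crc32cOf(name, len);
  Result := h and HASH_MASK;
  while (HashTable[Result] <> 0) and (HashTable[Result] <> h) do
    Result := (Result + 1) and HASH_MASK;  // linear probing on slot clashes
  HashTable[Result] := h;                  // no-op if the station was already seen
end;
```
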
## Why L1 Cache Matters
Taking special care of the "64-byte cache line" is quite unusual among all the "1brc" implementations I have seen in any language - and it does make a noticeable difference in performance.

The L1 cache is well known in the performance-hacking literature to be the main bottleneck for any efficient in-memory process. If you want things to go fast, you should flatter your CPU's L1 cache.

Min/max values are reduced to a 16-bit `smallint` - resulting in a temperature range of -3276.8..+3276.7, which seems fair on our planet according to the IPCC. ;)

As a result, each `Station[]` entry takes only 16 bytes, so we can fit exactly 4 entries in a single CPU L1 cache line. To be fair, if we put some more data into the record (e.g. use `Int64` instead of `smallint`/`integer`), the performance degrades by only a few percent. The main factor seems to be that each entry is likely to fit into a single cache line, even if filling two cache lines may sometimes be needed for misaligned data.
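
For illustration, such a 16-byte entry could look like the following - the field names and exact layout are assumptions for this sketch, not the actual record used by the entry:

```pas
type
  // Hypothetical 16-byte station entry: exactly 4 of them fill one 64-byte cache line.
  TStationEntry = packed record
    NameHash: cardinal;   // 4 bytes - crc32c "perfect hash" of the station name
    Count: cardinal;      // 4 bytes - number of measurements seen so far
    Sum: integer;         // 4 bytes - sum of temperatures, in tenths of a degree
    Min, Max: smallint;   // 2 + 2 bytes - extremes, in tenths of a degree
  end;
// SizeOf(TStationEntry) = 16, so a 64-byte L1 cache line holds 4 entries
```
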
In our first attempt (see "Old Version" below), we stored the name into the `Station[]` array, so that each entry is 64 bytes long exactly. But since `crc32c` is a perfect hash function for our dataset, it is enough to just store the 32-bit hash instead, and not the actual name.

Note that if we reduce the number of stations from 41343 to 400, the performance is much higher, even with a 16GB file as input. The reason is that 400 x 16 = 6400 bytes, so the whole dataset could fit entirely in each core's L1 cache. No slower L2/L3 cache is involved, therefore performance is better. The cache memory seems to be the bottleneck of our code - which is a good sign.

Experiments show that forcing thread affinity is not a good idea: it is always much better to let any modern Operating System schedule the threads onto the CPU cores, because it has much better knowledge of the actual system load and status - even on a "fair" CPU architecture like AMD Zen. For a "pure CPU" process, affinity may help a little, but for our "old" process, working outside of the L1 cache limits, we had better let the OS decide.

So with this "old" version, it was decided to use `-t=16`. The "old" version uses a whole cache line (64 bytes) for its `Station[]` record, so it may be responsible for consuming too much CPU cache, which is why more than 16 threads makes no difference with it. Our "new" version, with its `Station[]` entries of only 16 bytes, can use `-t=32` with benefit. Cache memory access is likely to be the bottleneck from now on.