entries/dcornelius/README.md
Lines changed: 17 additions & 9 deletions
@@ -2,11 +2,13 @@
2
2
3
3
An Entry to the One Billion Row Challenge in Object Pascal using Delphi 12 Athens by [David Cornelius](https://github.com/corneliusdavid).
4
4
5
-
I wanted to see how different methods compared for ease of writing the code and speed of execution, so I solved this in three different ways:
5
+
I wanted to see how different methods compared for ease of writing the code and speed of execution, so I approached this in three different ways:
6
6
7
-
- **TDictionary** - as each line is read, create an object and add it to a `TDictionary` collection; if an entry already exists for a city, update it instead of adding. This was simple to implement but since `TDictionary` doesn't have a sort method, after this list is built, another list must be created to sort them.
8
-
- **TStringList** - this is a really simple implementation but requires a lot of memory because the `LoadFromFile` method is used to read in all rows before processing them. Then a second TStringList is used to collate and sort the data. *NOTE: Using LoadFromFile resulted in an immediate Range Check Error when trying to read in the 1-billion line file! The default Stream created in `LoadFromFile` was the problem. When I switched to LoadFromStream and created my own Stream, it worked. However, it it's not near as fast as the `TDictionary` version.*
9
-
- **In-Memory Table** - another approach I thought I'd try was to load all the data into an in-memory table and use local SQL to query the data. While I did learn some cool things about FireDAC, I also learned that this is by far, the most *inefficient* approach for this: after running for 26 *HOURS*, I killed the process!
7
+
- **TDictionary** - as each line is read, create an object and add it to a `TDictionary` collection; if an entry already exists for a city, update it instead of adding. This was simple to implement but since `TDictionary` doesn't have a sort method, the city name keys were dumped to an array and then sorted.
8
+
- **TStringList** - This was a really simple implementation at first because I was using `LoadFromFile`; however, when loading one billion rows, it choked with an immediate Range Check Error, so I had to revert to loading it the same way as the other approaches. Loading all the strings individually turned out to be so slow and to take so much memory that I killed the process before ever checking the results.
9
+
- **In-Memory Table** - Another approach I thought I'd try was to load all the data into an in-memory table and use local SQL to query the data. This had the same speed and memory problems the `TStringList` technique had. Curious to see how long it would take, I left it running overnight but finally killed it after it ran for *26 HOURS*!
10
+
11
+
My conclusion is that it doesn't make sense to load all the rows first and then apply summation; it works far better to collate the data as you load it.
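To illustrate the "collate as you load" idea, here is a minimal sketch rather than the code from this entry. The `TCityStats` record and the `AddReading` procedure are hypothetical names made up for this example, and the three sample lines stand in for rows read from `measurements.txt`. It assumes the `city;temperature` line format of the challenge.

```pascal
program CollateSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Generics.Collections;

type
  // Hypothetical per-city accumulator, not taken from the entry's source.
  TCityStats = record
    Min, Max, Sum: Double;
    Count: Int64;
  end;

// Parse one "city;temperature" line and fold it into the running totals.
procedure AddReading(Stats: TDictionary<string, TCityStats>; const Line: string);
var
  SepPos: Integer;
  City: string;
  Temp: Double;
  Entry: TCityStats;
begin
  SepPos := Pos(';', Line);
  if SepPos = 0 then
    Exit; // skip lines without a separator
  City := Copy(Line, 1, SepPos - 1);
  Temp := StrToFloat(Copy(Line, SepPos + 1, MaxInt), TFormatSettings.Invariant);
  if Stats.TryGetValue(City, Entry) then
  begin
    // City already seen: update the existing totals.
    if Temp < Entry.Min then Entry.Min := Temp;
    if Temp > Entry.Max then Entry.Max := Temp;
    Entry.Sum := Entry.Sum + Temp;
    Inc(Entry.Count);
  end
  else
  begin
    // First reading for this city.
    Entry.Min := Temp;
    Entry.Max := Temp;
    Entry.Sum := Temp;
    Entry.Count := 1;
  end;
  Stats.AddOrSetValue(City, Entry);
end;

var
  Stats: TDictionary<string, TCityStats>;
  Cities: TArray<string>;
  City: string;
  Entry: TCityStats;
begin
  Stats := TDictionary<string, TCityStats>.Create;
  try
    // Sample rows standing in for lines read from the input file.
    AddReading(Stats, 'Hamburg;12.0');
    AddReading(Stats, 'Bulawayo;8.9');
    AddReading(Stats, 'Hamburg;34.2');

    // TDictionary has no sort method, so copy the keys to an array and sort that.
    Cities := Stats.Keys.ToArray;
    TArray.Sort<string>(Cities);
    for City in Cities do
    begin
      Entry := Stats[City];
      Writeln(Format('%s=%.1f/%.1f/%.1f',
        [City, Entry.Min, Entry.Sum / Entry.Count, Entry.Max],
        TFormatSettings.Invariant));
    end;
  finally
    Stats.Free;
  end;
end.
```

Memory use in a sketch like this grows with the number of distinct cities rather than the number of rows, which is the point of collating while loading.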
10
12
11
13
## Compiler
12
14
@@ -16,7 +18,7 @@ I wanted to see how different methods compared for ease of writing the code and
16
18
17
19
There are no dependencies if run under one of the most recent versions of Delphi. The code should be backwards-compatible to Delphi 10.3 Rio (it uses inline variables and type inference introduced in that version) and further back with a few simple modifications. It uses `System.StrUtils`, `Generics.Collections`, and a few other run-time libraries in Delphi.
18
20
19
-
### Conditional Compilation
21
+
### Debugging
20
22
21
23
There are compiler directives to add some convenience when debugging. If built with the default *Debug* configuration, then the DEBUG compiler symbol is defined which turns on a few lines of code that give a little feedback and wait for Enter to be pressed so you can run this from the IDE without missing the quickly disappearing DOS box where the output is displayed.
22
24
@@ -34,7 +36,7 @@ The program runs identical between Win32, Win64, and Linux64. If you run it with
34
36
35
37
#### Example
36
38
37
-
To run the challenge, read from the `measurements.txt` file, and use the TDictionary method, run it like this:
39
+
To run the challenge, read from the `measurements.txt` file, and use the `TDictionary` method, run it like this:
I entered this challenge as a learning experience. I did not expect to be the fastest as I don't have time to implement multiple threads (which is clearly the road to victory here) but I had fun and learned a lot!
46
48
47
-
I now know more about how buffer size can affect stream reads significantly. I have not used streams much in Delphi before but after using `AssignFile`/`Reset`/`Readln`/`CloseFile` in my first attempts, noticing how fast `TStringList.LoadFromFile` was on small files and studying its implementation, I switched to using a `TStreamReader` and realized how much simpler and faster it is for reading text files.
49
+
I now know more about how buffer size can affect stream reads significantly. I have not used streams much in Delphi before but after using `AssignFile`/`Reset`/`Readln`/`CloseFile` in my first attempts, noticing how fast `TStringList.LoadFromFile` was on small files and studying its implementation, I switched to using a `TStreamReader` and realized how much simpler and faster it is for reading text files, something most everyone else has certainly known for many years! (But hey, I work with databases and APIs far more often than text files, so I had reverted to some very old habits.)
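For readers who have not used it, reading a text file line by line with `TStreamReader` and an explicit buffer size can look roughly like the sketch below. This is not the code from this entry: `CountLines` is a hypothetical helper, and the 1 MB `ReadBufferSize` is an assumed value for illustration only, so measure on your own hardware before settling on a size.

```pascal
program StreamReadSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

const
  // Assumed buffer size for illustration; not a value taken from this entry.
  ReadBufferSize = 1024 * 1024;

// Hypothetical helper: counts the lines in a text file using TStreamReader.
function CountLines(const FileName: string): Int64;
var
  Reader: TStreamReader;
begin
  Result := 0;
  // Arguments: file name, encoding, DetectBOM, buffer size in bytes.
  Reader := TStreamReader.Create(FileName, TEncoding.UTF8, False, ReadBufferSize);
  try
    while not Reader.EndOfStream do
    begin
      Reader.ReadLine; // a real program would parse the returned line here
      Inc(Result);
    end;
  finally
    Reader.Free;
  end;
end;

begin
  if ParamCount < 1 then
    Writeln('Usage: StreamReadSketch <file>')
  else
    Writeln(CountLines(ParamStr(1)));
end.
```

The buffer size is the last constructor argument, so trying different values only requires changing the constant.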
48
50
49
51
I also learned some things about a `TDictionary` and why it works so well for this particular situation. And, *after* I implemented this method, I looked at other entries and noticed the one by IWAN KELAIAH was very similar to mine. While the "ikelaiah" entry was submitted before mine, I did not look at or copy anything from that implementation. Chalk it up to great minds thinking alike, I guess!
50
52
51
-
Finally, I learned that FireDAC has "LocalSQL" and uses the SQLite engine internally to query in-memory tables. It's not efficient for this long data set of two fields but could come in handy later, like when handling results from a REST service or something.
53
+
Finally, I learned that FireDAC has "LocalSQL" and uses the SQLite engine internally to query in-memory tables. It's not efficient for this long data set of two fields but could come in handy later, like when handling results from a REST service or something.
54
+
55
+
## Acknowledgements
56
+
57
+
I'd like to thank [Gustavo 'Gus' Carreno](https://github.com/gcarreno) for bringing this challenge to the Pascal programming community.
58
+
59
+
I'd also like to give a shout-out to my friends at the [Oregon Delphi User Group](https://odug.org), where I [presented the challenge](https://odug.org/events/2024-03/) and implemented several of their suggestions for optimization.
52
60
53
61
## History
54
62
55
-
- Version 1.0: working version with `TDictionary`, `TStringList`, and `TFDMemTable` methods implemented.
63
+
- Version 1.0 (April, 2024): successful entry using `TDictionary`. Also implemented the solution with `TStringList` and `TFDMemTable`, but they never produced a result in any reasonable timeframe.