Commit 68d70ce

Merge pull request #95 from corneliusdavid/main
doc: updated dcornelius README
2 parents 220546a + 3622d4e commit 68d70ce

1 file changed: +17 -9 lines changed

entries/dcornelius/README.md

Lines changed: 17 additions & 9 deletions
@@ -2,11 +2,13 @@
22

33
An Entry to the One Billion Row Challenge in Object Pascal using Delphi 12 Athens by [David Cornelius](https://github.com/corneliusdavid).
44

5-
I wanted to see how different methods compared for ease of writing the code and speed of execution, so I solved this in three different ways:
5+
I wanted to see how different methods compared for ease of writing the code and speed of execution, so I approached this in three different ways:
66

7-
- **TDictionary** - as each line is read, create an object and add it to a `TDictionary` collection; if an entry already exists for a city, update it instead of adding. This was simple to implement but since `TDictionary` doesn't have a sort method, after this list is built, another list must be created to sort them.
8-
- **TStringList** - this is a really simple implementation but requires a lot of memory because the `LoadFromFile` method is used to read in all rows before processing them. Then a second TStringList is used to collate and sort the data. *NOTE: Using LoadFromFile resulted in an immediate Range Check Error when trying to read in the 1-billion line file! The default Stream created in `LoadFromFile` was the problem. When I switched to LoadFromStream and created my own Stream, it worked. However, it it's not near as fast as the `TDictionary` version.*
9-
- **In-Memory Table** - another approach I thought I'd try was to load all the data into an in-memory table and use local SQL to query the data. While I did learn some cool things about FireDAC, I also learned that this is by far, the most *inefficient* approach for this: after running for 26 *HOURS*, I killed the process!
7+
- **TDictionary** - as each line is read, create an object and add it to a `TDictionary` collection; if an entry already exists for a city, update it instead of adding. This was simple to implement but since `TDictionary` doesn't have a sort method, the city name keys were dumped to an array and then sorted.
8+
- **TStringList** - This was a really simple implementation at first because I was using `LoadFromFile`; however, when loading one billion rows, it choked with an immediate Range Check Error, so I had to revert to loading it the same way as other approaches. Loading all the strings individually turned out to be so slow and to take so much memory that I killed the process before ever checking the results.
9+
- **In-Memory Table** - Another approach I thought I'd try was to load all the data into an in-memory table and use local SQL to query the data. This had the same speed and memory problems the TStringList technique had. Curious to see how long it would take, I left it running overnight but finally killed it after it ran for *26 HOURS*!
10+
11+
My conclusion is that it doesn't make sense to try and load all the rows then apply summation but to collate the data as you load it.
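
To illustrate that conclusion, here is a minimal sketch of the collate-as-you-load idea, written in Python rather than Delphi and not taken from this entry's source. It assumes each input line has the form `city;temperature` and that the report lists minimum/mean/maximum per city; the names `CityStats`, `collate`, and `report` are hypothetical. A plain dictionary stands in for `TDictionary`, and because it has no built-in ordering by key, the keys are sorted separately before output, much like dumping the city names to an array and sorting it.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class CityStats:
    """Running aggregate for one city (hypothetical helper type)."""
    minimum: float
    maximum: float
    total: float
    count: int

    def update(self, value: float) -> None:
        # Fold one more reading into the running aggregate.
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)
        self.total += value
        self.count += 1


def collate(lines: Iterable[str]) -> Dict[str, CityStats]:
    """Aggregate readings line by line; no row is kept after it is processed."""
    stats: Dict[str, CityStats] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumed line format: "city;temperature"
        city, _, raw = line.partition(";")
        value = float(raw)
        entry = stats.get(city)
        if entry is None:
            stats[city] = CityStats(value, value, value, 1)  # first reading
        else:
            entry.update(value)  # city already present: update in place
    return stats


def report(stats: Dict[str, CityStats]) -> List[str]:
    """Sort the city names separately, then format one row per city."""
    rows = []
    for city in sorted(stats):
        s = stats[city]
        mean = s.total / s.count
        rows.append(f"{city}={s.minimum:.1f}/{mean:.1f}/{s.maximum:.1f}")
    return rows
```

Memory use in this sketch grows with the number of distinct cities rather than with the number of rows, which is why collating during the read avoids the problems seen when every row is loaded first.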
1012

1113
## Compiler
1214

@@ -16,7 +18,7 @@ I wanted to see how different methods compared for ease of writing the code and
1618

1719
There are no dependencies if run under one of the most recent versions of Delphi. The code should be backwards-compatible to Delphi 10.3 Rio (it uses inline variables and type inference introduced in that version) and further back with a few simple modifications. It uses `System.StrUtils`, `Generics.Collections`, and a few other run-time libraries in Delphi.
1820

19-
### Conditional Compilation
21+
### Debugging
2022

2123
There are compiler directives to add some convenience when debugging. If built with the default *Debug* configuration, then the DEBUG compiler symbol is defined which turns on a few lines of code that give a little feedback and wait for Enter to be pressed so you can run this from the IDE without missing the quickly disappearing DOS box where the output is displayed.
2224

@@ -34,7 +36,7 @@ The program runs identical between Win32, Win64, and Linux64. If you run it with
3436

3537
#### Example
3638

37-
To run the challenge, read from the `measurements.txt` file, and use the TDictionary method, run it like this:
39+
To run the challenge, read from the `measurements.txt` file, and use the `TDictionary` method, run it like this:
3840

3941
```
4042
C:> docrnelius measurements.txt dic
@@ -44,12 +46,18 @@ C:> docrnelius measurements.txt dic
4446

4547
I entered this challenge as a learning experience. I did not expect to be the fastest as I don't have time to implement multiple threads (which is clearly the road to victory here) but I had fun and learned a lot!
4648

47-
I now know more about how buffer size can affect stream reads significantly. I have not used streams much in Delphi before but after using `AssignFile`/`Reset`/`Readln`/`CloseFile` in my first attempts, noticing how fast `TStringList.LoadFromFile` was on small files and studying its implementation, I switched to using a `TStreamReader` and realized how much simpler and faster it is for reading text files.
49+
I now know more about how buffer size can affect stream reads significantly. I have not used streams much in Delphi before but after using `AssignFile`/`Reset`/`Readln`/`CloseFile` in my first attempts, noticing how fast `TStringList.LoadFromFile` was on small files and studying its implementation, I switched to using a `TStreamReader` and realized how much simpler and faster it is for reading text files, something most everyone else has certainly known for many years! (But hey, I work with databases and APIs far more often than text files, so I reverted to some very old habits.)
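
The buffer-size point is not specific to Delphi. As a rough illustration only (Python, not the `TStreamReader` code used in this entry), the sketch below reads a text file line by line through a buffered reader whose buffer size is a parameter. The function name `read_lines` and the 1 MiB default are assumptions chosen for the example, not values from this project.

```python
from typing import Iterator


def read_lines(path: str, buffer_size: int = 1 << 20) -> Iterator[str]:
    """Yield decoded lines from a file read through a buffer of the given size.

    The file is opened in binary mode so that `buffering` directly sets the
    size of the underlying buffered reader; each line is decoded afterwards.
    """
    with open(path, "rb", buffering=buffer_size) as handle:
        for raw in handle:
            # Strip the line terminator (LF or CRLF) before decoding.
            yield raw.rstrip(b"\r\n").decode("utf-8")
```

The lines produced are the same whatever buffer size is chosen; only the number of underlying reads changes, so the effect of different sizes has to be measured on the actual file and machine.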
4850

4951
I also learned some things about a `TDictionary` and why it works so well for this particular situation. And, *after* I implemented this method, I looked at other entries and noticed the one by IWAN KELAIAH was very similar to mine. While the "ikelaiah" entry was submitted before mine, I did not look at or copy anything from that implementation. Chalk it up to great minds thinking alike, I guess!
5052

51-
Finally, I learned that FireDAC has "LocalSQL" and uses the SQLite engine internally to query in-memory tables. It's not efficient for this long data set of two fields but could come in handy later, like when handling results from a REST service or something.
53+
Finally, I learned that FireDAC has "LocalSQL" and uses the SQLite engine internally to query in-memory tables. It's not efficient for this long data set of two fields but could come in handy later, like when handling results from a REST service or something.
54+
55+
## Acknowledgements
56+
57+
I'd like to thank [Gustavo 'Gus' Carreno](https://github.com/gcarreno) for bringing this challenge to the Pascal programming community.
58+
59+
I'd also like to give a shout-out to my friends at the [Oregon Delphi User Group](https://odug.org), where I [presented the challenge](https://odug.org/events/2024-03/) and implemented several of their suggestions for optimization.
5260

5361
## History
5462

55-
- Version 1.0: working version with `TDictionary`, `TStringList`, and `TFDMemTable` methods implemented.
63+
- Version 1.0 (April, 2024): successful entry using `TDictionary`. Also, implemented the solution with `TStringList` and `TFDMemTable` but they never produced a result in any reasonable timeframe.
