feat(solana): wire with samples & tests #168
Open: vinhloc30796 wants to merge 4 commits into Blockchain-Technology-Lab:main from vinhloc30796:loc/feat/support-solana
Commits:
ccd1f29 feat(solana): wire with samples & tests (vinhloc30796)
7243601 docs(solana): add new ledger (vinhloc30796)
d4a82b4 docs(solana): correct query in markdown (vinhloc30796)
53885c8 feat(solana): add validators collector, clustering, and docs (vinhloc30796)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -11,4 +11,5 @@ site |
| 11 | 11 | .ipynb_checkpoints/ |
| 12 | 12 | *.ipynb |
| 13 | 13 | processed_data |
| 14 | | - results |
| | 14 | + results |
| | 15 | + *.env |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -26,6 +26,7 @@ ledgers: |
| 26 | 26 | - litecoin |
| 27 | 27 | - tezos |
| 28 | 28 | - zcash |
| | 29 | + - solana |
| 29 | 30 | |
| 30 | 31 | # Execution flags |
| 31 | 32 | execution_flags: |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| | | @@ -13,6 +13,7 @@ |
| 13 | 13 | 'litecoin': DefaultParser, |
| 14 | 14 | 'zcash': DefaultParser, |
| 15 | 15 | 'tezos': DummyParser, |
| | 16 | + 'solana': DummyParser, |
| 16 | 17 | } |
| 17 | 18 | |
| 18 | 19 | |
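
The hunk above registers Solana in a dictionary that maps ledger names to parser classes. As a rough illustration of how such a lookup table can be used, here is a minimal sketch. The class bodies, the dictionary name `ledger_parser`, and the `get_parser` helper are hypothetical stand-ins and are not taken from the project's code.

```python
class DefaultParser:
    """Hypothetical stand-in for a parser of coinbase-style block data."""

    def __init__(self, ledger):
        self.ledger = ledger


class DummyParser:
    """Hypothetical stand-in for a parser that does no ledger-specific work."""

    def __init__(self, ledger):
        self.ledger = ledger


# Mirrors only the entries visible in the diff; the name is an assumption.
ledger_parser = {
    'litecoin': DefaultParser,
    'zcash': DefaultParser,
    'tezos': DummyParser,
    'solana': DummyParser,
}


def get_parser(ledger):
    """Instantiate the parser registered for a ledger (assumed helper)."""
    try:
        parser_class = ledger_parser[ledger]
    except KeyError:
        raise ValueError(f'no parser registered for ledger {ledger!r}') from None
    return parser_class(ledger)
```

With this shape, supporting a new ledger only requires one new dictionary entry, which is what the one-line change in the diff does.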
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,127 +1,136 @@ | ||
| # Data collection | ||
|
|
||
| Currently, the data for the analysis of the different ledgers is collected through | ||
| [Google BigQuery](https://console.cloud.google.com/bigquery) . | ||
|
|
||
| Note that when saving results from BigQuery you should select the option "JSONL (newline delimited)". | ||
|
|
||
| ## Sample data & queries | ||
|
|
||
| Sample data for all blockchains can be found [here](https://uoe-my.sharepoint.com/:f:/g/personal/s2125265_ed_ac_uk/Eg0L2n9P-txOtibKu9CXfloBt6_D-3D1AEsS2evtXIatVA?e=qHhFp4). | ||
| Alternatively, one can retrieve the data directly from BigQuery using the queries below. | ||
|
|
||
| ### Bitcoin | ||
|
|
||
| ``` | ||
| SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_bitcoin.transactions`.outputs | ||
| FROM `bigquery-public-data.crypto_bitcoin.transactions` | ||
| JOIN `bigquery-public-data.crypto_bitcoin.blocks` ON `bigquery-public-data.crypto_bitcoin.transactions`.block_number = `bigquery-public-data.crypto_bitcoin.blocks`.number | ||
| WHERE is_coinbase is TRUE | ||
| AND timestamp > '2018-01-01' | ||
| ORDER BY timestamp | ||
| ``` | ||
|
|
||
| ### Bitcoin Cash | ||
|
|
||
| ``` | ||
| SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_bitcoin_cash.transactions`.outputs | ||
| FROM `bigquery-public-data.crypto_bitcoin_cash.transactions` | ||
| JOIN `bigquery-public-data.crypto_bitcoin_cash.blocks` ON `bigquery-public-data.crypto_bitcoin_cash.transactions`.block_number = `bigquery-public-data.crypto_bitcoin_cash.blocks`.number | ||
| WHERE is_coinbase is TRUE | ||
| AND timestamp > '2018-01-01' | ||
| ORDER BY timestamp | ||
| ``` | ||
|
|
||
| ### Cardano | ||
|
|
||
| ``` | ||
| SELECT `iog-data-analytics.cardano_mainnet.block`.slot_no as number, `iog-data-analytics.cardano_mainnet.pool_offline_data`.ticker_name as identifiers, `iog-data-analytics.cardano_mainnet.block`.block_time as timestamp,`iog-data-analytics.cardano_mainnet.block`.pool_hash as reward_addresses | ||
| FROM `iog-data-analytics.cardano_mainnet.block` | ||
| LEFT JOIN `iog-data-analytics.cardano_mainnet.pool_offline_data` ON `iog-data-analytics.cardano_mainnet.block`.pool_hash = `iog-data-analytics.cardano_mainnet.pool_offline_data`.pool_hash | ||
| WHERE `iog-data-analytics.cardano_mainnet.block`.block_time > '2018-01-01' | ||
| ORDER BY `iog-data-analytics.cardano_mainnet.block`.block_time | ||
| ``` | ||
|
|
||
| ### Dogecoin | ||
|
|
||
| ``` | ||
| SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_dogecoin.transactions`.outputs | ||
| FROM `bigquery-public-data.crypto_dogecoin.transactions` | ||
| JOIN `bigquery-public-data.crypto_dogecoin.blocks` ON `bigquery-public-data.crypto_dogecoin.transactions`.block_number = `bigquery-public-data.crypto_dogecoin.blocks`.number | ||
| WHERE is_coinbase is TRUE | ||
| AND timestamp > '2018-01-01' | ||
| ORDER BY timestamp | ||
| ``` | ||
|
|
||
| ### Ethereum | ||
|
|
||
| ``` | ||
| SELECT number, timestamp, miner as reward_addresses, extra_data as identifiers | ||
| FROM `bigquery-public-data.crypto_ethereum.blocks` | ||
| WHERE timestamp > '2018-01-01' | ||
| ORDER BY timestamp | ||
| ``` | ||
|
|
||
| ### Litecoin | ||
|
|
||
| ``` | ||
| SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_litecoin.transactions`.outputs | ||
| FROM `bigquery-public-data.crypto_litecoin.transactions` | ||
| JOIN `bigquery-public-data.crypto_litecoin.blocks` ON `bigquery-public-data.crypto_litecoin.transactions`.block_number = `bigquery-public-data.crypto_litecoin.blocks`.number | ||
| WHERE is_coinbase is TRUE | ||
| AND timestamp > '2018-01-01' | ||
| ORDER BY timestamp | ||
| ``` | ||
|
|
||
| ### Tezos | ||
|
|
||
| ``` | ||
| SELECT level as number, timestamp, baker as reward_addresses | ||
| FROM `public-data-finance.crypto_tezos.blocks` | ||
| WHERE timestamp > '2018-01-01' | ||
| ORDER BY timestamp | ||
| ``` | ||
|
|
||
| ### Zcash | ||
|
|
||
| ``` | ||
| SELECT block_number as number, block_timestamp as timestamp, coinbase_param as identifiers, `bigquery-public-data.crypto_zcash.transactions`.outputs | ||
| FROM `bigquery-public-data.crypto_zcash.transactions` | ||
| JOIN `bigquery-public-data.crypto_zcash.blocks` ON `bigquery-public-data.crypto_zcash.transactions`.block_number = `bigquery-public-data.crypto_zcash.blocks`.number | ||
| WHERE is_coinbase is TRUE | ||
| AND timestamp > '2018-01-01' | ||
| ORDER BY timestamp | ||
|
|
||
|
Member: Missing closing quotes here |
||
| ### Solana | ||
|
|
||
| ``` | ||
| SELECT height as number, block_timestamp as timestamp, leader as reward_addresses | ||
| FROM `bigquery-public-data.crypto_solana_mainnet_us.Blocks` | ||
| WHERE block_timestamp > '2020-03-15' | ||
| AND block_timestamp < '{{timestamp}}' | ||
| ORDER BY block_timestamp | ||
| ``` | ||
|
|
||
| ## Automating the data collection process | ||
|
|
||
| Instead of executing each of these queries separately on the BigQuery console and saving the results manually, it is | ||
| also possible to automate the process using a | ||
| [script](https://github.com/Blockchain-Technology-Lab/consensus-decentralization/blob/main/data_collection_scripts/collect_block_data.py) | ||
| and collect all relevant data in one go. Executing this script will run queries | ||
| from [this file](https://github.com/Blockchain-Technology-Lab/consensus-decentralization/blob/main/data_collection_scripts/queries.yaml). | ||
|
|
||
| IMPORTANT: the script uses service account credentials for authentication, therefore before running it, you need to | ||
| generate the relevant credentials from Google, as described | ||
| [here](https://developers.google.com/workspace/guides/create-credentials#service-account) and save your key in the | ||
| `data_collection_scripts` directory of the project under the name 'google-service-account-key.json'. There is a | ||
| [sample file](https://github.com/Blockchain-Technology-Lab/consensus-decentralization/blob/main/data_collection_scripts/google-service-account-key-SAMPLE.json) | ||
| that you can consult, which shows what your credentials are supposed to look like (but note that this is for | ||
| informational purposes only, this file is not used in the code). | ||
|
|
||
| Once you have set up the credentials, you can just run the following command from the root | ||
| directory to retrieve data for all supported blockchains: | ||
|
|
||
| `python -m data_collection_scripts.collect_block_data` | ||
|
|
||
| There are also two command line arguments that can be used to customize the data collection process: | ||
|
|
||
| - `ledgers` accepts any number of the supported ledgers (case-insensitive). For example, adding `--ledgers bitcoin` | ||
| results in collecting data only for Bitcoin, while `--ledgers Bitcoin Ethereum Cardano` would collect data for | ||
| Bitcoin, Ethereum and Cardano. If the `ledgers` argument is omitted, then the default value is used, which | ||
| is taken from the | ||
| [configuration file](https://github.com/Blockchain-Technology-Lab/consensus-decentralization/blob/main/config.yaml) | ||
| and typically corresponds to all supported blockchains. | ||
| - `--force-query` forces the collection of all raw data files, even if the corresponding files already | ||
| exist. By default, this flag is set to False and the script only fetches block data for some blockchain if the | ||
| corresponding file does not already exist. | ||
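
The two command line arguments described above could be declared roughly as follows. This is a sketch under stated assumptions, not the script's actual code: `DEFAULT_LEDGERS` is a hypothetical placeholder for the value the docs say is read from the configuration file, and `parse_args` is an illustrative helper name.

```python
import argparse

# Hypothetical default; per the docs, the real default comes from config.yaml.
DEFAULT_LEDGERS = ['bitcoin', 'ethereum', 'solana']


def parse_args(argv=None):
    """Parse the two documented options (illustrative helper)."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--ledgers',
        nargs='*',
        type=str.lower,  # makes the ledger names case-insensitive
        default=DEFAULT_LEDGERS,
        help='ledgers to collect data for',
    )
    parser.add_argument(
        '--force-query',
        action='store_true',  # False unless the flag is passed
        help='collect raw data even if the output file already exists',
    )
    return parser.parse_args(argv)


if __name__ == '__main__':
    args = parse_args()
    print(args.ledgers, args.force_query)
```

Lowercasing each value with `type=str.lower` is one simple way to get the case-insensitive behaviour the docs describe for `--ledgers Bitcoin Ethereum Cardano`.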
Now that the mapping information is in place, this should be changed to a non-dummy mapping. I'm assuming we'll need a new SolanaMapping that inherits from DefaultMapping and overrides some functions, like `map_from_known_clusters`.
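
To make the suggestion concrete, one possible shape for such a subclass is sketched below. Everything here is an assumption made for illustration: the `DefaultMapping` stub, its constructor, and the signature and return contract of `map_from_known_clusters` are hypothetical and may differ from the project's real classes.

```python
class DefaultMapping:
    """Hypothetical stub; the project's real DefaultMapping may look different."""

    def __init__(self, known_clusters):
        # Assumed shape: dict mapping a reward address to an entity name.
        self.known_clusters = known_clusters

    def map_from_known_clusters(self, block):
        """Return the entity that produced a block, or None (assumed contract)."""
        return None


class SolanaMapping(DefaultMapping):
    """Sketch of a Solana-specific mapping that overrides a single method."""

    def map_from_known_clusters(self, block):
        # Assumes `reward_addresses` holds the block leader's address,
        # matching the column alias used in the Solana query.
        leader = block.get('reward_addresses')
        return self.known_clusters.get(leader)
```

For example, `SolanaMapping({'addr1': 'Validator A'}).map_from_known_clusters({'number': 1, 'reward_addresses': 'addr1'})` looks up the leader address in the cluster table and returns the matching entity name.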