Conversation
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
Automatically Merged using SST Master Branch Merger
| class ExtendedRequest : public SST::Interfaces::SimpleNetwork::Request { | ||
| private: | ||
| // Map from plugin name/key to metadata | ||
| std::map<std::string, std::shared_ptr<PluginMetadata>> metadata; |
There was a problem hiding this comment.
Using shard_ptr in classes that end up in events is not supported since std::shared_ptr is not serializable for events used in synchronization. This is because pointers are not tracked during event serialization, so nothing will end up being shared outside of the rank it was created on. I know this will cause some memory increase, but, in general, elements in the merlin library are assumed to be able to run across multiple ranks. I'm going to continue to review the code, but I think this will need to be changed before we can merge it.
| { | ||
| public: | ||
| // Static shared data accessible by both SourceRoutingPlugin and topology classes | ||
| static std::map<int, int> endpoint_to_router_map; // Shared endpoint-to-router mapping |
There was a problem hiding this comment.
Generally, static variables shouldn't be used in SubComponents. They should use the SharedObjects provided by the core. Since both of the classes that need access are derived from SubComponent, they will have access to those APIs.
There was a problem hiding this comment.
Ok, a minor question:
these SharedObjects support array of common data types (int, bool, ...) right?
Can we also share arbitrary objects like mutex of iostream?
There was a problem hiding this comment.
The SharedObjects pretty much support any Object that is serializable, which means that a mutex would not be supported. If you need to have a shared Mutex, then that will still need to be a static variable. I/O generally doesn't fall under the same restrictions as variables used during the simulation.
|
Hi, no worries, we are not in any hurry. Recently I have been thinking about the design of this branch. |
|
Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
|
Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
Automatically Merged using SST Master Branch Merger
…ding ugal routing with the source routing mechanisms.
|
Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
| void serialize_order(SST::Core::Serialization::serializer &ser) override { | ||
| Request::serialize_order(ser); | ||
|
|
||
| // Manually serialize the metadata map since std::variant isn't directly supported |
There was a problem hiding this comment.
std::variant is supported by the serializer now, but it was developed primarily for checkpointing and may not work properly with pointer tracking turned off. I'd have to look through that code to know if pointer tracking matters. It may be worth trying to see if you can just serialize the vector directly.
Automatically Merged using SST Master Branch Merger
|
Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
Automatically Merged using SST Master Branch Merger
|
Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
|
Just an update. I've worked through about a third of the code changes, but have a deadline this week and then have a week of vacation. I will continue to review code when I get back. |
|
Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
|
One thing that would really improve the usability of these new components is filling out the ELI parameter documentation. Currently The ELI documentation is what users see through For reference, here's what the SST_ELI_DOCUMENT_PARAMS(
{"num_routers", "Number of routers in the network."},
{"num_R2R_ports", "Number of router-to-router ports on this router."},
{"num_R2N_ports", "Number of router-to-endpoint (NIC) ports on this router."},
{"routing_mode", "Routing mode per VN [source_routing | dest_tag_routing].", "source_routing"},
{"source_routing_algo", "Source routing algorithm per VN [weighted | UGAL | UGAL_THRESHOLD].", "weighted"},
{"vcs_per_vn", "Number of virtual channels per virtual network.", ""},
{"vn_ugal_num_valiant", "Number of valiant paths to consider for UGAL routing per VN.", "3"},
{"adaptive_threshold", "Threshold for UGAL_THRESHOLD adaptive routing decisions.", "2.0"},
{"connectivity", "Router connectivity string. Format: \"dest_id,port0,port1;dest_id,port0,...\""},
{"simple_routing_entry_string","Serialized routing table for untimed packets. Format: \"dest:weight,hop1,hop2;weight,hop3|dest:...\""},
{"endpoint_to_port_map", "Map of global endpoint IDs to local router port numbers."},
{"port_to_endpoint_map", "Map of local router port numbers to global endpoint IDs."},
{"verbose_level", "Verbosity level for debug output.", "0"},
)Similarly, |
feldergast
left a comment
There was a problem hiding this comment.
I've looked through all the code. Generally it looks good, with just a few minor code cleanup suggestions. However, there is one big issue that needs to have some thought put into it, which is the fact that the modified code structure has added a series of dynamic_casts() to the heavily used code paths. We have seen noticeable performance issues when dynamic_cast() is used often. We need to give some thought to limiting the use of dynamic casts, which may require some changes to the SimpleNetwork API in the core. I will need to give this some more thought after the SST 16 release goes in the next week or so. I had hoped to get this into the release, but I don't know that we can resolve this issue before the release.
| delete rng; | ||
| } | ||
|
|
||
| std::deque<int> RandomWeightedSelection::selectPath(int dest_router, const std::vector<path_with_weight>& paths) { |
There was a problem hiding this comment.
Do you have to return this by value, causing a copy of the deque? If the contents of the deque are only read after the function is called, can you return a const reference instead?
| WeightedRoundRobinSelection::~WeightedRoundRobinSelection() { | ||
| } | ||
|
|
||
| std::deque<int> WeightedRoundRobinSelection::selectPath(int dest_router, const std::vector<path_with_weight>& paths) { |
There was a problem hiding this comment.
const std::deque& instead??
| namespace SST { | ||
| namespace Merlin { | ||
|
|
||
| class endpointNIC : public SST::Interfaces::SimpleNetwork |
There was a problem hiding this comment.
Generally the class name would capitalized camel case, i.e. EndpointNIC
|
|
||
| loadPlugins(params); | ||
|
|
||
| std::string networkIF = params.find<std::string>("networkIF"); |
There was a problem hiding this comment.
Looks like this is not used other then in the commented out code below.
| out.init(getName() + ": ", 0, 0, Output::STDOUT); | ||
| // See if we need to set up a nid map | ||
| bool found = false; | ||
| EP_id = params.find<int>("EP_id",-1,found); |
There was a problem hiding this comment.
I think for the anytopo topology, getting the endpoint ID from params makes sense, but if you're using this with any of the other topologies as a way to do a NIC with extra plugins, then the EndpointNIC will need to get the ID from the topology using LinkControl::getEndpointID during init(). This may necessitate a change in some of the plug-in APIs so that the endpoint ID can be assigned later.
I don't think this needs to be done before an initial merge, but some thought should be given to how this could be done to make this new infrastructure more generally useful.
| if (!req) return nullptr; | ||
|
|
||
| // Convert to ExtendedRequest if not already | ||
| ExtendedRequest* ext_req = dynamic_cast<ExtendedRequest*>(req); |
There was a problem hiding this comment.
Dynamic casts can have a noticeable performance impact. I'm not sure there's a way to avoid this given the core APIs being used, but we should think of enhancements to the SimpleNetwork interface that could remove the need for a dynamic_cast().
| // Lookup path in routing table and set it as metadata | ||
| std::deque<int> path = selectPath(lookupRtrForEndpoint(ext_req->dest)); | ||
|
|
||
| // // use verbose to print out this routing decision: |
There was a problem hiding this comment.
If this code isn't needed, it should be removed. If you want to keep it around, it should use the Output object instead cout.
| } // namespace Merlin | ||
| } // namespace SST | ||
|
|
||
| #endif No newline at end of file |
|
|
||
| // Need to put in the sequence number | ||
| // Convert to ExtendedRequest if not already | ||
| Merlin::ExtendedRequest* ext_req = dynamic_cast<Merlin::ExtendedRequest*>(req); |
There was a problem hiding this comment.
I'm worried about potentially having to have multiple dynamic_casts() for every request that is sent. If you were to have en EndpointNIC that needed a ReorderLinkControl that also has a LinkControl, then you end up multiple calls to dynamic_cast(). When I was architecting the event hierarchy for merlin, there was a noticeable performance difference between dynamic casting to determine event type and putting an enum in the base class to query the type. This type of structure is little more difficult given the fact that the base class is a core class, but I'm not comfortable with the dynamic casts that will potentially slow down simulations that don't even use the plug-in capable EndpointNIC.
| // uint32_t seq; | ||
|
|
||
| ~ReorderRequest() {} | ||
| // ReorderRequest() : |
There was a problem hiding this comment.
Due to all the additional dynamic casts, I'm not convinced this is a good path to take.
Hi Devs,
This is a pull request, previously mentioned in issue #2584
There are documentations of this work in
"sst-elements-src/src/sst/elements/merlin/interfaces/endpointNIC/documentations/README.md", and"sst-elements-src/src/sst/elements/merlin/topology/anytopo_ultility/README.md"In a few words, this branch extends the Merlin module in SST-element, such that any network topology can be built with an input networkx graph. Another key feature is to support source routing (the endpoint NIC determines the routed paths). And a few popular HPC network topologies (dragonfly, slimfly, polarfly, jellyfish) are already defined in
"sst-elements-src/src/sst/elements/merlin/topology/anytopo_ultility/"dir, together with some tests in"sst-elements-src/src/sst/elements/merlin/topology/anytopo_ultility/tests".If you would like to accept this pull requests, maybe we can move these tests to the main merlin test dir?
It worth mention that the reorderedlinkcontrol has been modified to fit in the new 'ExtendedRequest' framework, this has been included in tests as well, see
EndpointNIC(use_reorderLinkControl=True ...in the tests.I believe that this pull request will have the following contribution to Merlin:
Please let me know if there is any feedback/comments/questions for the code.
Best regards,
Z.