[SOLR-18176] HttpShardHandler query throughput bottleneck from ZooKeeper#4237
[SOLR-18176] HttpShardHandler query throughput bottleneck from ZooKeeper#4237mlbiscoc wants to merge 2 commits intoapache:mainfrom
Conversation
dsmiley
left a comment
There was a problem hiding this comment.
Nice!!!
Remember ./gradlew writeChangelog
Thankfully, org.apache.solr.servlet.HttpSolrCall is also asking for the cached version (2 places).
Makes me wonder, what other callers of getCollection (thus allowCached) were making a deliberate decision vs accidental? IMO the cached version should be the default, if there should even be a default.
I agree. I thought about the same thing but I have no idea of the impact and potential consequences of making that change. |
dsmiley
left a comment
There was a problem hiding this comment.
I'm fine without a test but it'd be nice. Not some narrow unit test but somehow deeper. Like... SolrZkClient has metrics... so I could imagine a test that reads a metric before, does the query, then reads the metric again to ensure it hasn't changed. Such a test would comprehensively ensure what we want without depending on any internals other than of course metric names but that's fair.
|
Thinking of the bigger picture of Solr's internal APIs, caching or not, I'm reminded of a year ago working closer with ClusterState. ClusterState has a bit of javadocs claiming that it is immutable. But it actually isn't -- it can fetch newer state. It's live nodes can change later. Some parts of Solr implicitly expect ClusterState to be an immutable snapshot of the world -- replica placement plugins and I think some general cluster processing. I'd rather see a simplification of ClusterState to actually be immutable, and instead the ClusterStateProvider be the primary API to get / cache state.json & livenodes & aliases & cluster properties etc. Aligned with this would be moving some of CloudSolrClient's caching into an improved ClusterStateProvider base class. I wonder what you think @HoustonPutman ? |
Yeah, that sounds good to me. I don't really have an opinion on whether ClusterState should be immutable or not, but it should be one or the other (immutable or everything is updated) and not in some weird middle-ground. |
https://issues.apache.org/jira/browse/SOLR-18176
HttpShardHandler was bottlenecking in throughput due to
CloudReplicaSourcerecalling for ZooKeeper collection state with everydistribrequest due to the missingallowCache=trueparameter. This resulted in large CPU utilization in ZooKeeper and thesynchronizedcall blocking QTP threads waiting for zookeeper response.See JIRA above for more detailed information.