Integration Test for #2246: add IT for session eviction during DB failure #2321
Open
Hazel-Datastax wants to merge 50 commits into main from hazel/IT-for-evict-session
+397
−13
Commits
305debe test (Hazel-Datastax)
8f4a7c4 test (Hazel-Datastax)
7649661 test - isolate the test (Hazel-Datastax)
f5e4364 test - isolate the integration tests (Hazel-Datastax)
f71659a test - isolate the integration tests again (Hazel-Datastax)
814c29f Merge branch 'main' into hazel/IT-for-evict-session (Hazel-Datastax)
ce1cf5e test - isolate the integration tests again (Hazel-Datastax)
c394e56 test - isolate the integration tests again (Hazel-Datastax)
45e6a9d add java docs and make codes clear (Hazel-Datastax)
6dc24d5 test - delete maybe unused code (Hazel-Datastax)
ea49c4d rollback previous commit (Hazel-Datastax)
eee8cf6 some code refactor (Hazel-Datastax)
29cebeb rollback some changes (Hazel-Datastax)
3f38489 Merge branch 'main' into hazel/IT-for-evict-session (Hazel-Datastax)
7c41134 test db failure simulation (Hazel-Datastax)
69d7f65 test db failure simulation again (Hazel-Datastax)
ebe95cf Merge branch 'main' into hazel/IT-for-evict-session (Hazel-Datastax)
2e7dd5c test db failure simulation again (Hazel-Datastax)
0ffa68f Merge branch 'main' into hazel/IT-for-evict-session (Hazel-Datastax)
14940d1 fix test error (Hazel-Datastax)
49b6541 test again (Hazel-Datastax)
19342d7 Merge branch 'main' into hazel/IT-for-evict-session (Hazel-Datastax)
68d3165 add debug log (Hazel-Datastax)
666c1db change resource limit, test again (Hazel-Datastax)
6d793a9 rollback resource limit and add log, test again (Hazel-Datastax)
e92d044 add log, test again (Hazel-Datastax)
fe9d5d3 more logs, test again (Hazel-Datastax)
ff8bfe2 check if it's OOM, test again (Hazel-Datastax)
7df6042 not OOM, change initial_token, test again (Hazel-Datastax)
6c5f875 add java comments (Hazel-Datastax)
adc66d7 initial_token solved one problem, another problem surfaces, add more … (Hazel-Datastax)
e30ddf0 cannot find the problem, add more log... (Hazel-Datastax)
5617e2a Force port binding, test again (Hazel-Datastax)
0174c07 Add java doc (Hazel-Datastax)
ee555b2 Makes the core function clean (Hazel-Datastax)
517ebfb Clean up and test (Hazel-Datastax)
8e34626 Clean up again (Hazel-Datastax)
0f5a6dd tweak java doc (Hazel-Datastax)
8999c93 change log (Hazel-Datastax)
b701aad remove logs (Hazel-Datastax)
3051ea7 Merge branch 'main' into hazel/IT-for-evict-session (Hazel-Datastax)
ca3dd41 rename the method (Hazel-Datastax)
b0d99b1 Merge remote-tracking branch 'origin/hazel/IT-for-evict-session' into… (Hazel-Datastax)
308eb24 easy fixes from PR review (Hazel-Datastax)
8846a07 add logs and one more test verification (Hazel-Datastax)
0231ad0 tweak the logic in isCassandraUp (Hazel-Datastax)
bff0ee2 Merge remote-tracking branch 'origin/main' into hazel/IT-for-evict-se… (Hazel-Datastax)
3133d9c update error code (Hazel-Datastax)
ddfda5b update test assertion (Hazel-Datastax)
748923b add log to see what happened (Hazel-Datastax)
336 changes: 336 additions & 0 deletions
src/test/java/io/stargate/sgv2/jsonapi/api/v1/SessionEvictionIntegrationTest.java
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,336 @@ | ||
| package io.stargate.sgv2.jsonapi.api.v1; | ||
|
|
||
| import static io.restassured.RestAssured.given; | ||
| import static io.stargate.sgv2.jsonapi.api.v1.ResponseAssertions.responseIsFindSuccess; | ||
| import static org.hamcrest.Matchers.*; | ||
|
|
||
| import com.github.dockerjava.api.DockerClient; | ||
| import com.github.dockerjava.api.async.ResultCallback; | ||
| import com.github.dockerjava.api.model.Frame; | ||
| import io.quarkus.test.common.QuarkusTestResource; | ||
| import io.quarkus.test.junit.QuarkusIntegrationTest; | ||
| import io.restassured.http.ContentType; | ||
| import io.restassured.response.Response; | ||
| import io.stargate.sgv2.jsonapi.exception.DatabaseException; | ||
| import io.stargate.sgv2.jsonapi.testresource.DseTestResource; | ||
| import java.io.IOException; | ||
| import java.net.ServerSocket; | ||
| import java.util.Collections; | ||
| import java.util.Map; | ||
| import org.junit.jupiter.api.Test; | ||
| import org.slf4j.Logger; | ||
| import org.slf4j.LoggerFactory; | ||
| import org.testcontainers.containers.GenericContainer; | ||
|
|
||
| @QuarkusIntegrationTest | ||
| @QuarkusTestResource( | ||
| value = SessionEvictionIntegrationTest.SessionEvictionTestResource.class, | ||
| restrictToAnnotatedClass = true) | ||
| public class SessionEvictionIntegrationTest extends AbstractCollectionIntegrationTestBase { | ||
|
|
||
| private static final Logger LOGGER = | ||
| LoggerFactory.getLogger(SessionEvictionIntegrationTest.class); | ||
|
|
||
| /** | ||
| * A specialized TestResource that spins up a new HCD/DSE container exclusively for this test | ||
| * class. | ||
| * | ||
| * <p>Unlike the standard {@link DseTestResource} used by other tests, this resource ensures a | ||
| * dedicated database instance. This isolation is crucial because this test involves destructive | ||
| * operations that would negatively impact other tests sharing a common database. | ||
| */ | ||
| public static class SessionEvictionTestResource extends DseTestResource { | ||
|
Contributor
nit: inner classes at the bottom of the class. |
||
|
|
||
| /** | ||
| * Holds the reference to the container started by this resource. | ||
| * | ||
| * <p>This field is {@code static} to act as a bridge between the {@link QuarkusTestResource} | ||
| * lifecycle (which manages the resource instance) and the test instance (where we need to | ||
| * access the container to perform operations). | ||
| */ | ||
| private static GenericContainer<?> sessionEvictionCassandraContainer; | ||
|
|
||
| /** | ||
| * Overridden to enforce a fixed port binding for the Cassandra container native binary / CQL | ||
| * port (9042). | ||
| * | ||
| * <p>By default, Testcontainers maps exposed ports to random host ports. However, this test | ||
| * manually stops and restarts the container to simulate failure. A restarted container is not | ||
| * guaranteed to keep its original random port mapping, which would break the initial port | ||
| * forwarding. | ||
| * | ||
| * <p>By using a fixed port binding (finding an available local port and mapping it explicitly), | ||
| * we ensure the database is always accessible on the same port after a restart, allowing the | ||
| * Java driver to successfully reconnect. | ||
| */ | ||
| @Override | ||
| protected GenericContainer<?> baseCassandraContainer(boolean reuse) { | ||
| GenericContainer<?> container = super.baseCassandraContainer(reuse); | ||
| try (ServerSocket socket = new ServerSocket(0)) { | ||
| int port = socket.getLocalPort(); | ||
| // Map the randomly selected available host port to the container's native CQL port (9042) | ||
| container.setPortBindings(Collections.singletonList(port + ":9042")); | ||
| } catch (IOException e) { | ||
| throw new RuntimeException("Failed to find open port", e); | ||
| } | ||
| return container; | ||
| } | ||
|
|
||
| /** | ||
| * Starts the container and captures the reference. | ||
| * | ||
| * <p>We override this method to capture the container instance created by the superclass into | ||
| * our static {@link #sessionEvictionCassandraContainer} field, making it accessible to the test | ||
| * methods. | ||
| */ | ||
| @Override | ||
| public Map<String, String> start() { | ||
| var props = super.start(); | ||
| sessionEvictionCassandraContainer = super.getCassandraContainer(); | ||
| return props; | ||
| } | ||
|
|
||
| /** | ||
| * Overridden to strictly prevent system property pollution. | ||
| * | ||
| * <p>The standard {@link DseTestResource} publishes connection details (like CQL port) to | ||
| * global System Properties. Since this test runs in parallel with others, publishing our | ||
| * isolated container's details would overwrite the shared container's configuration, causing | ||
| * other tests to connect to this container (which we are about to kill), leading to random | ||
| * failures in the test suite. | ||
| */ | ||
| @Override | ||
| protected void setSystemProperties(Map<String, String> props) { | ||
| // No-op: Do not expose system properties to avoid interfering with other tests running in | ||
| // parallel | ||
| } | ||
|
|
||
| public static GenericContainer<?> getSessionEvictionCassandraContainer() { | ||
| return sessionEvictionCassandraContainer; | ||
| } | ||
| } | ||
|
|
||
| /** | ||
| * Overridden to ensure we connect to the isolated container created for this test. | ||
| * | ||
| * <p>The base class implementation relies on global system properties, which point to the shared | ||
| * database container. To ensure this test correctly interacts with its dedicated container, we | ||
| * must retrieve the port directly from the isolated container instance. | ||
| */ | ||
| @Override | ||
| protected int getCassandraCqlPort() { | ||
| GenericContainer<?> container = | ||
| SessionEvictionTestResource.getSessionEvictionCassandraContainer(); | ||
| if (container == null) { | ||
| throw new IllegalStateException("Session eviction IT Cassandra container not started!"); | ||
| } | ||
| return container.getMappedPort(9042); | ||
| } | ||
|
|
||
| /** | ||
| * @return The DockerClient for the isolated Cassandra container. | ||
| */ | ||
| private DockerClient getDockerClient() { | ||
| return SessionEvictionTestResource.getSessionEvictionCassandraContainer().getDockerClient(); | ||
| } | ||
|
|
||
| /** | ||
| * @return The container ID of the isolated Cassandra container. | ||
| */ | ||
| private String getContainerId() { | ||
| return SessionEvictionTestResource.getSessionEvictionCassandraContainer().getContainerId(); | ||
| } | ||
|
|
||
| @Test | ||
| public void testSessionEvictionOnAllNodesFailed() { | ||
| // 1. Insert and find initial data to ensure the database is healthy before the test | ||
| insertDoc( | ||
| """ | ||
| { | ||
| "_id": "before_crash" | ||
| } | ||
| """); | ||
|
|
||
| LOGGER.info( | ||
| "Pre-failure check. Container running: {}, Cassandra up: {}", | ||
| isContainerRunning(), | ||
| isCassandraUp(getDockerClient(), getContainerId())); | ||
|
|
||
| givenHeadersPostJsonThenOkNoErrors( | ||
| """ | ||
| { | ||
| "findOne": { | ||
| "filter" : {"_id" : "before_crash"} | ||
| } | ||
| } | ||
| """) | ||
| .body("$", responseIsFindSuccess()) | ||
| .body("data.document._id", is("before_crash")); | ||
|
|
||
| // 2. Stop the container to simulate DB failure | ||
| // Use low-level dockerClient to stop the container without triggering Testcontainers' | ||
| // cleanup/termination logic (which dbContainer.stop() would do). | ||
| // This effectively "pulls the plug" while keeping the container instance intact for restart. | ||
| getDockerClient().stopContainerCmd(getContainerId()).exec(); | ||
|
|
||
| // 3. Verify failure: the request should return HTTP 200 with a | ||
| // FAILED_TO_CONNECT_TO_DATABASE error code in the response body | ||
| givenHeadersPostJsonThen( | ||
| """ | ||
| { | ||
| "findOne": {} | ||
| } | ||
| """) | ||
| .statusCode(200) | ||
| .body( | ||
| "errors[0].errorCode", is(DatabaseException.Code.FAILED_TO_CONNECT_TO_DATABASE.name())); | ||
|
|
||
| // 4. Restart the container to simulate recovery | ||
| getDockerClient().startContainerCmd(getContainerId()).exec(); | ||
|
|
||
| // 5. Wait for the database to become responsive again | ||
| waitForDbRecovery(); | ||
|
|
||
| // 6. Verify session recovery: read back the document inserted before the crash. | ||
| // The goal is not to check that Cassandra works, but to confirm that we are running the same | ||
| // container as before: the data written before the crash must still be there afterwards. | ||
| givenHeadersPostJsonThenOkNoErrors( | ||
| """ | ||
| { | ||
| "findOne": { | ||
| "filter" : {"_id" : "before_crash"} | ||
| } | ||
| } | ||
| """) | ||
| .body("$", responseIsFindSuccess()) | ||
| .body("data.document._id", is("before_crash")); | ||
|
|
||
| // 7. Verify Session Recovery: insert and find data again, the request should succeed | ||
| insertDoc( | ||
| """ | ||
| { | ||
| "_id": "after_crash" | ||
| } | ||
| """); | ||
|
|
||
| givenHeadersPostJsonThenOkNoErrors( | ||
| """ | ||
| { | ||
| "findOne": { | ||
| "filter" : {"_id" : "after_crash"} | ||
| } | ||
| } | ||
| """) | ||
| .body("$", responseIsFindSuccess()) | ||
| .body("data.document._id", is("after_crash")); | ||
| } | ||
|
|
||
| /** | ||
| * Polls the DB container until the container is running, Cassandra is up, and the API request | ||
| * returns 200. | ||
| * | ||
| * @throws RuntimeException if recovery does not occur within the timeout period. | ||
| */ | ||
| private void waitForDbRecovery() { | ||
| long start = System.currentTimeMillis(); | ||
| long timeout = 120000; // 120 seconds timeout | ||
| boolean isContainerRunning = false; | ||
| boolean isCassandraUp = false; | ||
| Response response = null; | ||
|
|
||
| while (System.currentTimeMillis() - start < timeout) { | ||
| try { | ||
| // 1. Check container | ||
| isContainerRunning = isContainerRunning(); | ||
|
|
||
| // 2. Check Cassandra (only after the container is running) | ||
| isCassandraUp = isContainerRunning && isCassandraUp(getDockerClient(), getContainerId()); | ||
|
|
||
| // 3. Check API (only after Cassandra is up) | ||
| if (isCassandraUp) { | ||
| response = getAPIResponse(); | ||
| if (response.getStatusCode() == 200) { | ||
| LOGGER.info( | ||
| "Database and API have recovered after {} ms.", | ||
| (System.currentTimeMillis() - start)); | ||
| return; | ||
| } | ||
| } | ||
| } catch (Exception e) { | ||
| LOGGER.warn("Error checking DB status: {}", e.getMessage(), e); | ||
| } | ||
|
|
||
| // Poll every 1s | ||
| try { | ||
| Thread.sleep(1000); | ||
| } catch (InterruptedException ignored) { | ||
| Thread.currentThread().interrupt(); | ||
| break; | ||
| } | ||
| } | ||
| throw new RuntimeException( | ||
| "DB failed to recover within " | ||
| + timeout | ||
| + "ms. Container status: " | ||
| + isContainerRunning | ||
| + ", Cassandra status: " | ||
| + isCassandraUp | ||
| + ", API response body: " | ||
| + (response != null ? response.asString() : "null")); | ||
| } | ||
|
|
||
| /** Checks if the Cassandra container is currently in the "running" state. */ | ||
| private boolean isContainerRunning() { | ||
| return Boolean.TRUE.equals( | ||
| getDockerClient().inspectContainerCmd(getContainerId()).exec().getState().getRunning()); | ||
| } | ||
|
|
||
| /** | ||
| * Sends a simple findOne request to the Data API and returns the raw response. The caller is | ||
| * responsible for verifying the response. | ||
| */ | ||
| private Response getAPIResponse() { | ||
| return given() | ||
| .headers(getHeaders()) | ||
| .contentType(ContentType.JSON) | ||
| .body("{\"findOne\": {}}") | ||
| .post(CollectionResource.BASE_PATH, keyspaceName, collectionName); | ||
| } | ||
|
|
||
| /** Checks if Cassandra is up and normal by running "nodetool status" inside the container. */ | ||
| private boolean isCassandraUp(DockerClient dockerClient, String containerId) { | ||
| try { | ||
| // .exec() here merely registers the command with Docker and returns an execution ID. It does | ||
| // NOT start the command yet. | ||
| var execCreateCmdResponse = | ||
| dockerClient | ||
| .execCreateCmd(containerId) | ||
| .withAttachStdout(true) | ||
| .withAttachStderr(true) | ||
| .withCmd("nodetool", "status") | ||
| .exec(); | ||
|
|
||
| // Capture the output (stdout/stderr) from the container command | ||
| // The Docker Java API for 'execStart' is asynchronous and requires a ResultCallback to handle | ||
| // the stream of output (Frames) from the container. | ||
| StringBuilder output = new StringBuilder(); | ||
| var callback = | ||
|
Contributor
nit: comment to explain the mystery |
||
| new ResultCallback.Adapter<Frame>() { | ||
| @Override | ||
| public void onNext(Frame object) { | ||
| output.append(new String(object.getPayload())); | ||
| } | ||
| }; | ||
|
|
||
| // .exec(callback) triggers the actual execution in the container. | ||
| // .awaitCompletion() blocks until the command finishes or the callback is closed. | ||
| dockerClient.execStartCmd(execCreateCmdResponse.getId()).exec(callback).awaitCompletion(); | ||
| var inspectExecResponse = dockerClient.inspectExecCmd(execCreateCmdResponse.getId()).exec(); | ||
|
|
||
| // Check that the command succeeded (exit code 0) AND the output contains "UN" (Up/Normal) | ||
| return inspectExecResponse.getExitCodeLong() == 0 && output.toString().contains("UN"); | ||
| } catch (Exception e) { | ||
| LOGGER.warn("Error checking Cassandra status: {}", e.getMessage(), e); | ||
| return false; | ||
| } | ||
| } | ||
| } | ||
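The recovery wait in this diff checks three things in order (container running, Cassandra up, API returns 200) and only runs a later check once the earlier ones pass. As a rough sketch of that pattern outside the test class, the staged polling and the `nodetool status` check could be written as below. This is not code from the PR: `RecoveryPollingSketch`, `awaitAll`, and `hasUpNormalNode` are hypothetical names, and the parser assumes that node rows in `nodetool status` output begin with a two-letter status/state code such as `UN` or `DN`.

```java
import java.time.Duration;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Hypothetical helper (not part of the PR): staged polling with a deadline. */
final class RecoveryPollingSketch {

  /**
   * Evaluates the checks in order, once per round, until all pass or the timeout elapses.
   * A later check only runs if every earlier check passed in the same round.
   *
   * @return true if all checks passed before the deadline, false on timeout or interruption
   */
  static boolean awaitAll(Duration timeout, Duration interval, List<BooleanSupplier> checks) {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (true) {
      boolean allPassed = true;
      for (BooleanSupplier check : checks) {
        boolean passed;
        try {
          passed = check.getAsBoolean();
        } catch (RuntimeException e) {
          passed = false; // a probe that throws is treated as "not ready yet"
        }
        if (!passed) {
          allPassed = false;
          break; // skip the remaining checks for this round
        }
      }
      if (allPassed) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false; // deadline reached without a fully passing round
      }
      try {
        Thread.sleep(interval.toMillis());
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt flag for the caller
        return false;
      }
    }
  }

  /**
   * Returns true if any line of the given "nodetool status" output has "UN" (Up/Normal)
   * as its first whitespace-separated token. Assumption: node rows start with the status code.
   */
  static boolean hasUpNormalNode(String nodetoolOutput) {
    return nodetoolOutput
        .lines()
        .map(String::strip)
        .anyMatch(line -> line.split("\\s+")[0].equals("UN"));
  }
}
```

Matching on the first token of each row is slightly stricter than a plain substring search for `UN`, which could also match unrelated text in the output. Whether that extra strictness is worth it for this test is a judgment call.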