
Exception with ParameterString in PySparkProcessor.run() Method #3425

@dipanjank


Describe the bug
If I use a ParameterString (or any other PipelineVariable) in the list passed as the arguments parameter of the PySparkProcessor.run() method, I get a TypeError: "Object of type ParameterString is not JSON serializable".

According to the documentation, arguments can be a list of PipelineVariables, so I expected this to work. Is this not supported?

To reproduce

    from sagemaker.spark.processing import PySparkProcessor
    from sagemaker.workflow.parameters import ParameterString

    # role, sagemaker_session, bucket, input_prefix_abalone and
    # input_preprocessed_prefix_abalone are defined earlier in my script.
    spark_processor = PySparkProcessor(
        base_job_name="sagemaker-spark",
        framework_version="3.1",
        role=role,
        instance_count=2,
        instance_type="ml.m5.xlarge",
        sagemaker_session=sagemaker_session,
        max_runtime_in_seconds=1200,
    )

    # Passing a ParameterString inside `arguments` raises the TypeError shown below.
    spark_processor.run(
        submit_app="spark_processing/preprocess.py",
        arguments=[
            "--s3_input_bucket",
            ParameterString(name="s3-input-bucket", default_value=bucket),
            "--s3_input_key_prefix",
            input_prefix_abalone,
            "--s3_output_bucket",
            bucket,
            "--s3_output_key_prefix",
            input_preprocessed_prefix_abalone,
        ],
    )

Expected behavior

I expect a SageMaker ProcessingJob to be created.

Screenshots or logs

Traceback (most recent call last):
  File "/Users/dipanjan.kailthya@TMNL.nl/PycharmProjects/sagemaker-sdk-test/run_pyspark_processor.py", line 63, in <module>
    run_sagemaker_spark_job(
  File "/Users/dipanjan.kailthya@TMNL.nl/PycharmProjects/sagemaker-sdk-test/run_pyspark_processor.py", line 37, in run_sagemaker_spark_job
    spark_processor.run(
  File "/Users/dipanjan.kailthya@TMNL.nl/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/spark/processing.py", line 902, in run
    return super().run(
  File "/Users/dipanjan.kailthya@TMNL.nl/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/spark/processing.py", line 265, in run
    return super().run(
  File "/Users/dipanjan.kailthya@TMNL.nl/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/workflow/pipeline_context.py", line 248, in wrapper
    return run_func(*args, **kwargs)
  File "/Users/dipanjan.kailthya@TMNL.nl/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/processing.py", line 572, in run
    self.latest_job = ProcessingJob.start_new(
  File "/Users/dipanjan.kailthya@TMNL.nl/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/processing.py", line 796, in start_new
    processor.sagemaker_session.process(**process_args)
  File "/Users/dipanjan.kailthya@TMNL.nl/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/session.py", line 956, in process
    self._intercept_create_request(process_request, submit, self.process.__name__)
  File "/Users/dipanjan.kailthya@TMNL.nl/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/session.py", line 4317, in _intercept_create_request
    return create(request)
  File "/Users/dipanjan.kailthya@TMNL.nl/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/session.py", line 953, in submit
    LOGGER.debug("process request: %s", json.dumps(request, indent=4))
  File "/Users/dipanjan.kailthya@TMNL.nl/opt/anaconda3/lib/python3.9/json/__init__.py", line 234, in dumps
    return cls(
  File "/Users/dipanjan.kailthya@TMNL.nl/opt/anaconda3/lib/python3.9/json/encoder.py", line 201, in encode
    chunks = list(chunks)
  File "/Users/dipanjan.kailthya@TMNL.nl/opt/anaconda3/lib/python3.9/json/encoder.py", line 431, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/Users/dipanjan.kailthya@TMNL.nl/opt/anaconda3/lib/python3.9/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/Users/dipanjan.kailthya@TMNL.nl/opt/anaconda3/lib/python3.9/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/Users/dipanjan.kailthya@TMNL.nl/opt/anaconda3/lib/python3.9/json/encoder.py", line 325, in _iterencode_list
    yield from chunks
  File "/Users/dipanjan.kailthya@TMNL.nl/opt/anaconda3/lib/python3.9/json/encoder.py", line 438, in _iterencode
    o = _default(o)
  File "/Users/dipanjan.kailthya@TMNL.nl/opt/anaconda3/lib/python3.9/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type ParameterString is not JSON serializable
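
The last SDK frame in the traceback is the debug-logging call json.dumps(request, indent=4) in sagemaker/session.py, which suggests the process request still contains the raw ParameterString object at that point. Below is a minimal sketch that reproduces the same TypeError outside of the processor; the dict shape is only illustrative and not the exact request the SDK builds:

    import json

    from sagemaker.workflow.parameters import ParameterString

    # Illustrative only: a dict holding a raw ParameterString, similar in spirit to
    # the process request that the SDK tries to log.
    request = {
        "AppSpecification": {
            "ContainerArguments": [
                "--s3_input_bucket",
                ParameterString(name="s3-input-bucket", default_value="my-bucket"),
            ]
        }
    }

    json.dumps(request, indent=4)  # raises TypeError: Object of type ParameterString is not JSON serializable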

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: 2.112.2
  • Framework name (e.g. PyTorch) or algorithm (e.g. KMeans): PySpark
  • Framework version: 3.1
  • Python version: 3.9
  • CPU or GPU: CPU
  • Custom Docker image (Y/N): N

Additional context
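For what it is worth, the pattern I believe PipelineVariables are intended for is deferring the run() call with a PipelineSession and wrapping it in a ProcessingStep, so that the ParameterString is only resolved when the pipeline executes. The sketch below is just my understanding of that flow (it assumes PipelineSession, ProcessingStep with step_args, and Pipeline behave this way for PySparkProcessor in this SDK version); I have not confirmed that it avoids the error above:

    from sagemaker.spark.processing import PySparkProcessor
    from sagemaker.workflow.parameters import ParameterString
    from sagemaker.workflow.pipeline import Pipeline
    from sagemaker.workflow.pipeline_context import PipelineSession
    from sagemaker.workflow.steps import ProcessingStep

    pipeline_session = PipelineSession()
    s3_input_bucket = ParameterString(name="s3-input-bucket", default_value=bucket)

    # role and bucket are defined elsewhere, as in the reproduction above.
    spark_processor = PySparkProcessor(
        base_job_name="sagemaker-spark",
        framework_version="3.1",
        role=role,
        instance_count=2,
        instance_type="ml.m5.xlarge",
        sagemaker_session=pipeline_session,  # defer the job to pipeline definition time
        max_runtime_in_seconds=1200,
    )

    # With a PipelineSession, run() returns step arguments instead of starting a job.
    step_args = spark_processor.run(
        submit_app="spark_processing/preprocess.py",
        arguments=["--s3_input_bucket", s3_input_bucket],
    )

    step = ProcessingStep(name="spark-preprocess", step_args=step_args)
    pipeline = Pipeline(
        name="spark-preprocessing-pipeline",
        parameters=[s3_input_bucket],
        steps=[step],
    )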
