Source, map, collection, sink for bounded data streams in Flink #661
juripetersen merged 7 commits into apache:main
Yes, pls start in our dev mailing list.
Please LMK if I need to deprecate any parts of the old DataSet API. Also, I'm unsure whether it's a smart idea to implement
@juripetersen @zkaoudi @novatechflow Review when you have time pls
    private DataStream<?> dataStream;

    // TODO: this.size is currently always 0
    private long size;
Does this have any effects?
These are leftovers from the old DataSetChannel implementation. A user could theoretically extend the DataStreamChannel and provide their own, but it's such a niche situation that I'm fine with removing it.
    @@ -56,6 +56,12 @@ public class FlinkExecutor extends PushExecutorTemplate {
         */
        public ExecutionEnvironment fee;
I think ideally we just remove the old ExecutionEnvironment and DataSet with it, so that the end user of Wayang doesn't even notice this change (other than maybe a few missing operators).
Per discussion in the dev-list I've delegated this to the configuration.
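For readers following along, here is a minimal sketch of what "delegating to the configuration" could look like in plain Java. It is not the code from this PR: the class `FlinkApiSelector`, the key `example.flink.api`, and the choice of the DataSet API as the default are all assumptions made up for illustration.

```java
import java.util.Locale;
import java.util.Properties;

/** Hypothetical sketch: pick the Flink API from configuration. Not Wayang code. */
final class FlinkApiSelector {

    /** The two APIs discussed in this thread. */
    enum Api { DATASET, DATASTREAM }

    // Hypothetical key name, invented for this example.
    static final String KEY = "example.flink.api";

    /** Returns the configured API; falls back to DATASET (an assumed default). */
    static Api select(Properties config) {
        String value = config.getProperty(KEY, "dataset").trim().toLowerCase(Locale.ROOT);
        switch (value) {
            case "datastream":
                return Api.DATASTREAM;
            case "dataset":
                return Api.DATASET;
            default:
                throw new IllegalArgumentException("Unknown value for " + KEY + ": " + value);
        }
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty(KEY, "datastream");
        // The executor would create the matching environment based on this choice.
        System.out.println(select(config));
    }
}
```

Keeping the choice in configuration means existing plans keep running on the old path until a user opts in, which matches the goal of the end user not noticing the change.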
    import org.apache.wayang.flink.operators.FlinkBoundedTextFileSource;
    import org.apache.wayang.flink.platform.FlinkPlatform;

    public class BoundedTextFileSourceMapping implements Mapping {
As everything is bounded as of now in Wayang, I would just call this TextFileSource and replace TextFileSource from Flink with it.
We can later add a StreamedTextFileSource.
Nevertheless, since our discussion on continuous streams hasn't concluded either, I still propose we keep the bounded semantics as a way to leave the door open for continuous sources later.
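To illustrate the point about keeping the bounded semantics explicit, here is a small self-contained sketch. Every name in it (`BoundednessSketch`, `TextSource`, `BoundedTextSource`, `count`) is hypothetical and exists only for this example; none of it is taken from Wayang or Flink.

```java
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch of a boundedness marker on sources; not Wayang or Flink code. */
final class BoundednessSketch {

    /** Whether a source ends on its own or keeps producing records. */
    enum Boundedness { BOUNDED, CONTINUOUS }

    /** Minimal source abstraction, invented for this example. */
    interface TextSource {
        Boundedness boundedness();
        Iterator<String> lines();
    }

    /** A bounded source backed by an in-memory list of lines. */
    static final class BoundedTextSource implements TextSource {
        private final List<String> data;

        BoundedTextSource(List<String> data) {
            this.data = data;
        }

        @Override
        public Boundedness boundedness() {
            return Boundedness.BOUNDED;
        }

        @Override
        public Iterator<String> lines() {
            return data.iterator();
        }
    }

    /** Counting only terminates on bounded input, so anything else is rejected. */
    static long count(TextSource source) {
        if (source.boundedness() != Boundedness.BOUNDED) {
            throw new UnsupportedOperationException("count requires a bounded source");
        }
        long n = 0;
        for (Iterator<String> it = source.lines(); it.hasNext(); it.next()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        TextSource source = new BoundedTextSource(List.of("a", "b", "c"));
        System.out.println(count(source)); // prints 3
    }
}
```

With the marker in place, a continuous source could later be added as another `TextSource` implementation, and operators that need finite input would reject it instead of never returning.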
    /**
     * Mapping from {@link JoinOperator} to {@link FlinkDataStreamJoinOperator}.
     */
    public class StreamedJoinMapping implements Mapping {
Same goes for this operator. I think we should just remove the DataSet ones and then replace them with the DataStream.
juripetersen left a comment
I left some comments, but we should just discuss if that's the way to go.
I guess we just need more activity in the discussion on the mailing list.
Let's have a discussion about how we are going to handle continuous sources, e.g. in Flink, and how we are going to integrate this alongside the old DataSet implementation.
resolves #615