The assumption here is that the logical to physical translation will create this dummy operator with just the filename using which the input branch will be stored and used for loading Also the translation should make sure that appropriate filter operators are configured as outputs of this operator using the conditions specified in the LOSplit. So LOSplit will be converted into: | | | Filter1 Filter2 ... Filter3 | | ... | | | ... | ---- POSplit -... ---- This is different than the existing implementation where the POSplit writes to sidefiles after filtering and then loads the appropriate file.
The approach followed here is as good as the old approach if not better in many cases because of the availability of attachinInputs. An optimization that can ensue is if there are multiple loads that load the same file, they can be merged into one and then the operators that take input from the load can be stored. This can be used when the mapPlan executes to read the file only once and attach the resulting tuple as inputs to all the operators that take input from this load. In some cases where the conditions are exclusive and some outputs are ignored, this approach can be worse. But this leads to easier management of the Split and also allows to reuse this data stored from the split job whenever necessary.
|
|