Examples of CsvInputFormat
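
All snippets on this page use Flink's legacy Java record API (org.apache.flink.api.java.record.*). CsvInputFormat can be configured in two ways, and both appear below: passing the field delimiter plus one Value subclass per text column to the constructor, or creating an empty format and configuring it through the enclosing FileDataSource. A minimal sketch of the two styles (inputPath is a placeholder; imports are omitted here, as in the excerpts):

    // style 1: typed constructor, one Value subclass per text column
    @SuppressWarnings("unchecked")
    CsvInputFormat format = new CsvInputFormat('|', StringValue.class, IntValue.class);
    FileDataSource source1 = new FileDataSource(format, inputPath, "Constructor-configured");

    // style 2: configure an empty format via the enclosing data source
    FileDataSource source2 = new FileDataSource(new CsvInputFormat(), inputPath, "Builder-configured");
    CsvInputFormat.configureRecordFormat(source2)
      .recordDelimiter('\n')
      .fieldDelimiter('|')
      .field(StringValue.class, 0)
      .field(IntValue.class, 1);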


Examples of org.apache.flink.api.java.record.io.CsvInputFormat

   
    String lineitemsPath = (args.length > 5 ? args[5] : "");
    String output        = (args.length > 6 ? args[6] : "");

    // create DataSourceContract for Orders input
    FileDataSource orders1 = new FileDataSource(new CsvInputFormat(), orders1Path, "Orders 1");
    CsvInputFormat.configureRecordFormat(orders1)
      .recordDelimiter('\n')
      .fieldDelimiter('|')
      .field(LongValue.class, 0)    // order id
      .field(IntValue.class, 7)     // ship prio
      .field(StringValue.class, 2, 2)   // order status
      .field(StringValue.class, 4, 10)  // order date
      .field(StringValue.class, 5, 8);  // order prio
   
    FileDataSource orders2 = new FileDataSource(new CsvInputFormat(), orders2Path, "Orders 2");
    CsvInputFormat.configureRecordFormat(orders2)
      .recordDelimiter('\n')
      .fieldDelimiter('|')
      .field(LongValue.class, 0)    // order id
      .field(IntValue.class, 7)     // ship prio
      .field(StringValue.class, 2, 2)   // order status
      .field(StringValue.class, 4, 10)  // order date
      .field(StringValue.class, 5, 8);  // order prio
   
    // create DataSourceContract for LineItems input
    FileDataSource lineitems = new FileDataSource(new CsvInputFormat(), lineitemsPath, "LineItems");
    CsvInputFormat.configureRecordFormat(lineitems)
      .recordDelimiter('\n')
      .fieldDelimiter('|')
      .field(LongValue.class, 0)
      .field(DoubleValue.class, 5);

    // create MapOperator for filtering Orders tuples
    MapOperator filterO1 = MapOperator.builder(new FilterO())
      .name("FilterO")
      .input(orders1)
      .build();
    // filter configuration
    filterO1.setParameter(TPCHQuery3.YEAR_FILTER, 1993);
    filterO1.setParameter(TPCHQuery3.PRIO_FILTER, "5");
    filterO1.getCompilerHints().setFilterFactor(0.05f);
   
    // create MapOperator for filtering Orders tuples
    MapOperator filterO2 = MapOperator.builder(new FilterO())
      .name("FilterO")
      .input(orders2)
      .build();
    // filter configuration
    filterO2.setParameter(TPCHQuery3.YEAR_FILTER, 1993);
    filterO2.setParameter(TPCHQuery3.PRIO_FILTER, "5");

    // create JoinOperator for joining Orders and LineItems
    @SuppressWarnings("unchecked")
    JoinOperator joinLiO = JoinOperator.builder(new JoinLiO(), LongValue.class, 0, 0)
      .input1(filterO2, filterO1)
      .input2(lineitems)
      .name("JoinLiO")
      .build();
   
    FileDataSource partJoin1 = new FileDataSource(new CsvInputFormat(), partJoin1Path, "Part Join 1");
    CsvInputFormat.configureRecordFormat(partJoin1)
      .recordDelimiter('\n')
      .fieldDelimiter('|')
      .field(LongValue.class, 0)
      .field(IntValue.class, 1)
      .field(DoubleValue.class, 2);
   
    FileDataSource partJoin2 = new FileDataSource(new CsvInputFormat(), partJoin2Path, "Part Join 2");
    CsvInputFormat.configureRecordFormat(partJoin2)
      .recordDelimiter('\n')
      .fieldDelimiter('|')
      .field(LongValue.class, 0)
      .field(IntValue.class, 1)
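
The FilterO function referenced above is not part of this excerpt. A hypothetical sketch of such a record-API filter, assuming field() assigns record positions in call order (order date at field 3, order priority at field 4) and that the parameters set via setParameter() are read in open(); the real TPCHQuery3 class may differ:

    // hypothetical sketch of FilterO; not the original implementation
    public static class FilterO extends MapFunction {
      private int yearFilter;
      private String prioFilter;

      @Override
      public void open(Configuration parameters) {
        this.yearFilter = parameters.getInteger(TPCHQuery3.YEAR_FILTER, 1990);
        this.prioFilter = parameters.getString(TPCHQuery3.PRIO_FILTER, "0");
      }

      @Override
      public void map(Record record, Collector<Record> out) {
        // order date (yyyy-mm-dd) at record field 3, order prio at field 4
        String orderDate = record.getField(3, StringValue.class).getValue();
        String prio = record.getField(4, StringValue.class).getValue();
        if (Integer.parseInt(orderDate.substring(0, 4)) > yearFilter
            && prio.startsWith(prioFilter)) {
          out.collect(record);
        }
      }
    }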

Examples of org.apache.flink.api.java.record.io.CsvInputFormat

    // parse job parameters
    int numSubTasks = (args.length > 0 ? Integer.parseInt(args[0]) : 1);
    String dataInput = (args.length > 1 ? args[1] : "");
    String output = (args.length > 2 ? args[2] : "");

    @SuppressWarnings("unchecked")
    CsvInputFormat format = new CsvInputFormat(' ', IntValue.class, IntValue.class);
    FileDataSource input = new FileDataSource(format, dataInput, "Input");
   
    // create the reduce contract and set the key to the first field
    ReduceOperator sorter = ReduceOperator.builder(new IdentityReducer(), IntValue.class, 0)
      .input(input)
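
The reduce builder above is cut off mid-chain. A hypothetical completion of the plan, restating the truncated builder and adding the sink (operator names and output delimiters are assumptions):

    // hypothetical completion of the truncated sort plan
    ReduceOperator sorter = ReduceOperator.builder(new IdentityReducer(), IntValue.class, 0)
      .input(input)
      .name("Sorter")
      .build();

    FileDataSink sink = new FileDataSink(new CsvOutputFormat(), output, sorter, "Sorted Output");
    CsvOutputFormat.configureRecordFormat(sink)
      .recordDelimiter('\n')
      .fieldDelimiter(' ')
      .field(IntValue.class, 0)
      .field(IntValue.class, 1);

    Plan plan = new Plan(sink, "Sort");
    plan.setDefaultParallelism(numSubTasks);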

Examples of org.apache.flink.api.java.record.io.CsvInputFormat

  //                      /
  //    Sc3(id,y) --------
  @Override
  protected Plan getTestJob() {
    // Sc1 generates M parameters a,b,c for second degree polynomials P(x) = ax^2 + bx + c identified by id
    FileDataSource sc1 = new FileDataSource(new CsvInputFormat(), sc1Path);
    CsvInputFormat.configureRecordFormat(sc1).fieldDelimiter(' ').field(StringValue.class, 0).field(IntValue.class, 1)
        .field(IntValue.class, 2).field(IntValue.class, 3);

    // Sc2 generates N x values to be evaluated with the polynomial identified by id
    FileDataSource sc2 = new FileDataSource(new CsvInputFormat(), sc2Path);
    CsvInputFormat.configureRecordFormat(sc2).fieldDelimiter(' ').field(StringValue.class, 0).field(IntValue.class, 1);

    // Sc3 generates N y values to be evaluated with the polynomial identified by id
    FileDataSource sc3 = new FileDataSource(new CsvInputFormat(), sc3Path);
    CsvInputFormat.configureRecordFormat(sc3).fieldDelimiter(' ').field(StringValue.class, 0).field(IntValue.class, 1);

    // Jn1 matches x and y values on id and emits (id, x, y) triples
    JoinOperator jn1 = JoinOperator.builder(Jn1.class, StringValue.class, 0, 0).input1(sc2).input2(sc3).build();
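
Jn1 itself lies outside the excerpt. A hypothetical sketch of a record-API join function that matches x and y records on the id at field 0 and emits (id, x, y) triples, as the comment above describes; the real test class may differ:

    // hypothetical sketch of Jn1; emits (id, x, y) for matching ids
    public static class Jn1 extends JoinFunction {
      @Override
      public void join(Record xRecord, Record yRecord, Collector<Record> out) {
        Record result = new Record(3);
        result.setField(0, xRecord.getField(0, StringValue.class)); // id
        result.setField(1, xRecord.getField(1, IntValue.class));    // x value
        result.setField(2, yRecord.getField(1, IntValue.class));    // y value
        out.collect(result);
      }
    }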

Examples of org.apache.flink.api.java.record.io.CsvInputFormat

  @SuppressWarnings("unchecked")
  private static Plan getPlanForWorksetConnectedComponentsWithSolutionSetAsFirstInput(
      int numSubTasks, String verticesInput, String edgeInput, String output, int maxIterations)
  {
    // create DataSourceContract for the vertices
    FileDataSource initialVertices = new FileDataSource(new CsvInputFormat(' ', LongValue.class), verticesInput, "Vertices");
   
    MapOperator verticesWithId = MapOperator.builder(DuplicateLongMap.class).input(initialVertices).name("Assign Vertex Ids").build();
   
    DeltaIteration iteration = new DeltaIteration(0, "Connected Components Iteration");
    iteration.setInitialSolutionSet(verticesWithId);
    iteration.setInitialWorkset(verticesWithId);
    iteration.setMaximumNumberOfIterations(maxIterations);
   
    // create DataSourceContract for the edges
    FileDataSource edges = new FileDataSource(new CsvInputFormat(' ', LongValue.class, LongValue.class), edgeInput, "Edges");

    // join workset (changed vertices) with the edges to propagate changes to neighbors
    JoinOperator joinWithNeighbors = JoinOperator.builder(new NeighborWithComponentIDJoin(), LongValue.class, 0, 0)
        .input1(iteration.getWorkset())
        .input2(edges)
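
The iteration body is cut off above. A hypothetical completion following the shape of Flink's record-API connected-components example; MinimumComponentIDReduce and UpdateComponentIdMatch are assumed names for the elided functions:

    // hypothetical completion: finish the join, keep the minimum candidate
    // id per vertex, and close the delta iteration
    JoinOperator joinWithNeighbors = JoinOperator.builder(new NeighborWithComponentIDJoin(), LongValue.class, 0, 0)
        .input1(iteration.getWorkset())
        .input2(edges)
        .name("Join Candidate Id With Neighbor")
        .build();

    ReduceOperator minCandidateId = ReduceOperator.builder(new MinimumComponentIDReduce(), LongValue.class, 0)
        .input(joinWithNeighbors)
        .name("Find Minimum Candidate Id")
        .build();

    // per the enclosing method's name, the solution set is the first input
    JoinOperator updateComponentId = JoinOperator.builder(new UpdateComponentIdMatch(), LongValue.class, 0, 0)
        .input1(iteration.getSolutionSet())
        .input2(minCandidateId)
        .name("Update Component Id")
        .build();

    iteration.setNextWorkset(updateComponentId);
    iteration.setSolutionSetDelta(updateComponentId);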

Examples of org.apache.flink.api.java.record.io.CsvInputFormat

     * Output Format:
     * 0: URL
     * 1: DOCUMENT_TEXT
     */
    // Create DataSourceContract for documents relation
    @SuppressWarnings("unchecked")
    CsvInputFormat docsFormat = new CsvInputFormat('|', StringValue.class, StringValue.class);
    FileDataSource docs = new FileDataSource(docsFormat, docsInput, "Docs Input");
   
    /*
     * Output Format:
     * 0: URL
     * 1: RANK
     * 2: AVG_DURATION
     */
    // Create DataSourceContract for ranks relation
    FileDataSource ranks = new FileDataSource(new CsvInputFormat(), ranksInput, "Ranks input");
    CsvInputFormat.configureRecordFormat(ranks)
      .recordDelimiter('\n')
      .fieldDelimiter('|')
      .field(StringValue.class, 1)
      .field(IntValue.class, 0)
      .field(IntValue.class, 2);

    /*
     * Output Format:
     * 0: URL
     * 1: DATE
     */
    // Create DataSourceContract for visits relation
    @SuppressWarnings("unchecked")
    CsvInputFormat visitsFormat = new CsvInputFormat('|', null, StringValue.class, StringValue.class);
    FileDataSource visits = new FileDataSource(visitsFormat, visitsInput, "Visits input");

    // Create MapOperator for filtering the entries from the documents
    // relation
    MapOperator filterDocs = MapOperator.builder(new FilterDocs())
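
Note the null passed to the visits format: a null entry in the typed constructor skips that text column, which is how this source reduces the visits relation to just URL and DATE. A hypothetical sketch of a downstream record-API filter over that two-field record (the year cutoff is an assumption):

    // hypothetical filter over the visits records configured above;
    // record field 0 = URL, field 1 = visit date (yyyy-mm-dd)
    public static class FilterVisitsByYear extends MapFunction {
      @Override
      public void map(Record record, Collector<Record> out) {
        String date = record.getField(1, StringValue.class).getValue();
        if (Integer.parseInt(date.substring(0, 4)) == 2012) { // assumed year
          out.collect(record);
        }
      }
    }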

Examples of org.apache.flink.api.java.record.io.CsvInputFormat

    String clusterInput = (args.length > 2 ? args[2] : "");
    String output = (args.length > 3 ? args[3] : "");

    // create DataSourceContract for data point input
    @SuppressWarnings("unchecked")
    FileDataSource pointsSource = new FileDataSource(new CsvInputFormat('|', IntValue.class, DoubleValue.class, DoubleValue.class, DoubleValue.class), dataPointInput, "Data Points");

    // create DataSourceContract for cluster center input
    @SuppressWarnings("unchecked")
    FileDataSource clustersSource = new FileDataSource(new CsvInputFormat('|', IntValue.class, DoubleValue.class, DoubleValue.class, DoubleValue.class), clusterInput, "Centers");
   
    MapOperator dataPoints = MapOperator.builder(new PointBuilder()).name("Build data points").input(pointsSource).build();
   
    MapOperator clusterPoints = MapOperator.builder(new PointBuilder()).name("Build cluster points").input(clustersSource).build();
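
The two sources above duplicate the same four-column point format. A small factory method (not in the original) would keep the declarations in sync:

    // hypothetical helper so the point format is declared only once
    @SuppressWarnings("unchecked")
    private static CsvInputFormat pointFormat() {
      return new CsvInputFormat('|', IntValue.class,
          DoubleValue.class, DoubleValue.class, DoubleValue.class);
    }

    // usage inside the plan method
    FileDataSource pointsSource = new FileDataSource(pointFormat(), dataPointInput, "Data Points");
    FileDataSource clustersSource = new FileDataSource(pointFormat(), clusterInput, "Centers");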

Examples of org.apache.flink.api.java.record.io.CsvInputFormat

    final String edgeInput = (args.length > 2 ? args[2] : "");
    final String output = (args.length > 3 ? args[3] : "");
    final int maxIterations = (args.length > 4 ? Integer.parseInt(args[4]) : 1);

    // data source for initial vertices
    FileDataSource initialVertices = new FileDataSource(new CsvInputFormat(' ', LongValue.class), verticesInput, "Vertices");
   
    MapOperator verticesWithId = MapOperator.builder(DuplicateLongMap.class).input(initialVertices).name("Assign Vertex Ids").build();
   
    DeltaIteration iteration = new DeltaIteration(0, "Connected Components Iteration");
    iteration.setInitialSolutionSet(verticesWithId);
    iteration.setInitialWorkset(verticesWithId);
    iteration.setMaximumNumberOfIterations(maxIterations);
   
    // create DataSourceContract for the edges
    FileDataSource edges = new FileDataSource(new CsvInputFormat(' ', LongValue.class, LongValue.class), edgeInput, "Edges");

    // join workset (changed vertices) with the edges to propagate changes to neighbors
    JoinOperator joinWithNeighbors = JoinOperator.builder(new NeighborWithComponentIDJoin(), LongValue.class, 0, 0)
        .input1(iteration.getWorkset())
        .input2(edges)

Examples of org.apache.flink.api.java.record.io.CsvInputFormat

      final int numSubtasks     = (args.length > 0 ? Integer.parseInt(args[0]) : 1);
      final String recordsPath = (args.length > 1 ? args[1] : "");
      final String output      = (args.length > 2 ? args[2] : "");

      @SuppressWarnings("unchecked")
      FileDataSource source = new FileDataSource(new CsvInputFormat(',', IntValue.class, IntValue.class, IntValue.class), recordsPath);

      FileDataSink sink = new FileDataSink(CsvOutputFormat.class, output);
      CsvOutputFormat.configureRecordFormat(sink)
        .recordDelimiter('\n')
        .fieldDelimiter(',')

Examples of org.apache.flink.api.java.record.io.CsvInputFormat

    /*
     * Output Schema:
     * 0: CUSTOMER_ID
     */
    // create DataSourceContract for Orders input
    FileDataSource orders = new FileDataSource(new CsvInputFormat(), ordersPath, "Orders");
    orders.setDegreeOfParallelism(numSubtasks);
    CsvInputFormat.configureRecordFormat(orders)
      .recordDelimiter('\n')
      .fieldDelimiter('|')
      .field(IntValue.class, 1);
   
    /*
     * Output Schema:
     * 0: CUSTOMER_ID
     * 1: MKT_SEGMENT
     */
    // create DataSourceContract for Customer input
    FileDataSource customers = new FileDataSource(new CsvInputFormat(), customerPath, "Customers");
    customers.setDegreeOfParallelism(numSubtasks);
    CsvInputFormat.configureRecordFormat(customers)
      .recordDelimiter('\n')
      .fieldDelimiter('|')
      .field(IntValue.class, 0)
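
With both sources keyed on CUSTOMER_ID at record field 0, the plan would typically continue by joining them. A hypothetical continuation (JoinCO is an assumed name for the elided join function):

    // hypothetical continuation: join customers and orders on CUSTOMER_ID
    JoinOperator joinCO = JoinOperator.builder(new JoinCO(), IntValue.class, 0, 0)
        .input1(customers)
        .input2(orders)
        .name("Join Customers and Orders")
        .build();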

Examples of org.apache.flink.api.java.record.io.CsvInputFormat

 
  @SuppressWarnings("unchecked")
  public static Plan getPlan(int numSubTasks, String verticesInput, String edgeInput, String output, int maxIterations, boolean extraMap) {

    // data source for initial vertices
    FileDataSource initialVertices = new FileDataSource(new CsvInputFormat(' ', LongValue.class), verticesInput, "Vertices");
   
    MapOperator verticesWithId = MapOperator.builder(DuplicateLongMap.class).input(initialVertices).name("Assign Vertex Ids").build();
   
    // the loop takes the vertices as the solution set and changed vertices as the workset
    // initially, all vertices are changed
    DeltaIteration iteration = new DeltaIteration(0, "Connected Components Iteration");
    iteration.setInitialSolutionSet(verticesWithId);
    iteration.setInitialWorkset(verticesWithId);
    iteration.setMaximumNumberOfIterations(maxIterations);
   
    // data source for the edges
    FileDataSource edges = new FileDataSource(new CsvInputFormat(' ', LongValue.class, LongValue.class), edgeInput, "Edges");

    // join workset (changed vertices) with the edges to propagate changes to neighbors
    JoinOperator joinWithNeighbors = JoinOperator.builder(new NeighborWithComponentIDJoin(), LongValue.class, 0, 0)
        .input1(iteration.getWorkset())
        .input2(edges)