data-import's Introduction

Ddeboer Data Import library

This library has been renamed to PortPHP and will be deprecated. Please use PortPHP instead.

Introduction

This PHP library offers a way to read data from, and write data to, a range of file formats and media. Additionally, it includes tools to manipulate your data.

Features

  • Read from and write to CSV files, Excel files, databases, and more.
  • Convert between charsets, dates, strings and objects on the fly.
  • Build reusable and extensible import workflows.
  • Decoupled components that you can use on their own, such as the CSV and Excel readers and writers.
  • Well-tested code.

Documentation

Installation

This library is available on Packagist. The recommended way to install it is through Composer:

$ composer require ddeboer/data-import:@stable

Then include Composer’s autoloader:

require_once 'vendor/autoload.php';

For integration with Symfony2 projects, the DdeboerDataImportBundle is available.

Usage

Broadly speaking, you can use this library in two ways:

  1. Build an import workflow that couples a reader to one or more writers, filters and converters.
  2. Use the components (readers, writers, filters and converters) on their own.

The workflow

Each data import revolves around the workflow and takes place along the following lines:

  1. Construct a reader.
  2. Construct a workflow, pass the reader to it, and optionally pass a logger as the second argument. Add at least one writer to the workflow.
  3. Optionally, add filters, item converters and value converters to the workflow.
  4. Process the workflow. This will read the data from the reader, filter and convert it, and write the output to each of the writers. The process method also returns a Result object containing information about the import.

In other words, the workflow acts as a mediator between a reader and one or more writers, filters and converters.

Optionally, you can skip items on failure by calling $workflow->setSkipItemOnFailure(true). Errors will be logged if you passed a logger to the workflow constructor.

Schematically:

use Ddeboer\DataImport\Workflow;
use Ddeboer\DataImport\Reader;
use Ddeboer\DataImport\Writer;
use Ddeboer\DataImport\Filter;

$reader = new Reader\...;
$workflow = new Workflow($reader, $logger);
$result = $workflow
    ->addWriter(new Writer\...())
    ->addWriter(new Writer\...())
    ->addFilter(new Filter\CallbackFilter(...))
    ->setSkipItemOnFailure(true)
    ->process()
;

The workflow Result

The Workflow Result object exposes various methods which you can use to decide what to do after an import. The result is an instance of Ddeboer\DataImport\Result. It is automatically created and populated by the Workflow, and is returned to you after calling the process() method on the Workflow.

The Result provides the following methods:

//the name of the import - which is an optional 3rd parameter to
//the Workflow class. Returns null by default.
public function getName();

//DateTime instance created at the start of the import.
public function getStartTime();

//DateTime instance created at the end of the import.
public function getEndTime();

//DateInterval instance: the difference between the start and end times.
public function getElapsed();

//Count of exceptions caught by the Workflow.
public function getErrorCount();

//Count of processed items minus the count of exceptions caught.
public function getSuccessCount();

//Count of items processed
//This will not include any filtered items or items which fail conversion.
public function getTotalProcessedCount();

//bool to indicate whether any exceptions were caught.
public function hasErrors();

//An array of exceptions caught by the Workflow.
public function getExceptions();

Example use cases:

  • You want to send an e-mail with the results of the import
  • You want to send a text alert if a particular file failed
  • You want to move an import file to a failed directory if there were errors
  • You want to log how long imports are taking
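
For instance, a minimal sketch of the e-mail use case (the $mailer service here is hypothetical):

$result = $workflow->process();

if ($result->hasErrors()) {
    // $mailer stands in for whatever mail service your application provides
    $mailer->send(sprintf(
        'Import finished with %d errors out of %d processed items',
        $result->getErrorCount(),
        $result->getTotalProcessedCount()
    ));
}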

Readers

Readers read data that will be imported by iterating over it. This library includes a handful of readers. Additionally, you can easily implement your own.

You can use readers on their own, or construct a workflow from them:

$workflow = new Workflow($reader);

ArrayReader

Reads arrays. Most useful for testing your workflow.
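
A minimal sketch, assuming each element of the array is one item:

use Ddeboer\DataImport\Reader\ArrayReader;

$reader = new ArrayReader(array(
    array('first' => 'James', 'last' => 'Bond'),
    array('first' => 'Auric', 'last' => 'Goldfinger'),
));

foreach ($reader as $row) {
    // array('first' => 'James', 'last' => 'Bond'), etc.
}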

CsvReader

Reads CSV files, and is optimized to use as little memory as possible.

use Ddeboer\DataImport\Reader\CsvReader;

$file = new \SplFileObject('/path/to/csv_file.csv');
$reader = new CsvReader($file);

Optionally, construct it with a different delimiter, enclosure and/or escape character:

$reader = new CsvReader($file, ';');

Then iterate over the CSV file:

foreach ($reader as $row) {
    // $row will be an array containing the comma-separated elements of the line:
    // array(
    //   0 => 'James',
    //   1 => 'Bond'
    //   etc...
    // )
}

Column headers

If one of your rows contains column headers, you can read them to make the rows associative arrays:

$reader->setHeaderRowNumber(0);

foreach ($reader as $row) {
    // $row will now be an associative array:
    // array(
    //   'firstName' => 'James',
    //   'lastName'  => 'Bond'
    //   etc...
    // )
}

Strict mode

The CSV reader operates in strict mode by default. If the reader encounters a row where the number of values differs from the number of column headers, an error is logged and the row is skipped. Retrieve the errors with getErrors().

To disable strict mode, call $reader->setStrict(false) after you instantiate the reader.

Disabling strict mode means:

  1. Any rows that contain fewer values than the column headers are simply padded with null values.
  2. Any values in a row beyond the number of column headers are ignored.

Examples where this is useful:

  • Outlook 2010, which omits trailing blank values
  • Google Contacts, which exports more values than there are column headers
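
A sketch of inspecting the skipped rows afterwards; that getErrors() returns the offending rows keyed by line number is an assumption here:

foreach ($reader as $row) {
    // process the valid rows
}

foreach ($reader->getErrors() as $line => $row) {
    // e.g. log the line that was skipped
}
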
Duplicate headers

Sometimes a CSV file contains duplicate column headers, for instance:

id   details   details
1    bla       more bla

By default, a DuplicateHeadersException will be thrown if you call setHeaderRowNumber(0) on this file. You can handle duplicate columns in one of three ways:

  • call setColumnHeaders(['id', 'details', 'details_2']) to specify your own headers
  • call setHeaderRowNumber with the CsvReader::DUPLICATE_HEADERS_INCREMENT flag to generate incremented headers; in this case: id, details and details1
  • call setHeaderRowNumber with the CsvReader::DUPLICATE_HEADERS_MERGE flag to merge duplicate values into arrays; in this case, the first row’s values will become: [ 'id' => 1, 'details' => [ 'bla', 'more bla' ] ].

DbalReader

Reads data through Doctrine’s DBAL. Your project should include Doctrine’s DBAL package:

$ composer require doctrine/dbal

use Ddeboer\DataImport\Reader\DbalReader;

$reader = new DbalReader(
    $connection, // Instance of \Doctrine\DBAL\Connection
    'SELECT u.id, u.username, g.name FROM `user` u INNER JOIN groups g ON u.group_id = g.id'
);

DoctrineReader

Reads data through the Doctrine ORM:

use Ddeboer\DataImport\Reader\DoctrineReader;

$reader = new DoctrineReader($entityManager, 'Your\Namespace\Entity\User');
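
As with the other readers, you can iterate over the result; each item corresponds to one Your\Namespace\Entity\User record:

foreach ($reader as $row) {
    // one item per entity
}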

ExcelReader

Acts as an adapter for the PHPExcel library. Make sure to include that library in your project:

$ composer require phpoffice/phpexcel

Then use the reader to open an Excel file:

use Ddeboer\DataImport\Reader\ExcelReader;

$file = new \SplFileObject('path/to/excel_file.xls');
$reader = new ExcelReader($file);

To set the row number that headers will be read from, pass a number as the second argument:

$reader = new ExcelReader($file, 2);

To read a specific sheet:

$reader = new ExcelReader($file, null, 3);

OneToManyReader

Merges two data sources using existing readers; for example, one CSV file with orders and another with order items.

Imagine two CSV files like the following:

orders.csv:

OrderId,Price
1,30
2,15

order_items.csv:

OrderId,Name
1,"Super Cool Item 1"
1,"Super Cool Item 2"
2,"Super Cool Item 3"

You want to associate the items with the order. Using the OneToManyReader, we can nest the item rows in the order under a key which you specify.

The code would look something like:

$orderFile = new \SplFileObject("orders.csv");
$orderReader = new CsvReader($orderFile);
$orderReader->setHeaderRowNumber(0);

$orderItemFile = new \SplFileObject("order_items.csv");
$orderItemReader = new CsvReader($orderItemFile);
$orderItemReader->setHeaderRowNumber(0);

$oneToManyReader = new OneToManyReader($orderReader, $orderItemReader, 'items', 'OrderId', 'OrderId');

The third parameter is the key under which the order item data will be nested; it will contain an array of order items. The fourth and fifth parameters are the "primary" and "foreign" keys of the data; the OneToManyReader will try to match the rows using these keys. Taking the CSV files above, you would expect Order "1" to have the first two order items associated with it, because their OrderId is also "1".

Note: you can omit the last parameter if both files have the same field name. E.g. if parameter 4 is 'OrderId' and you don't specify parameter 5, the reader will look for the foreign key using 'OrderId'.

The resulting data will look like:

//Row 1
array(
    'OrderId' => 1,
    'Price' => 30,
    'items' => array(
        array(
            'OrderId' => 1,
            'Name' => 'Super Cool Item 1',
        ),
        array(
            'OrderId' => 1,
            'Name' => 'Super Cool Item 2',
        ),
    ),
);

//Row2
array(
    'OrderId' => 2,
    'Price' => 15,
    'items' => array(
        array(
            'OrderId' => 2,
            'Name' => 'Super Cool Item 3',
        ),
    )
);

Create a reader

You can create your own data reader by implementing ReaderInterface.
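
A minimal sketch wrapping an in-memory array; it assumes ReaderInterface extends \Iterator and \Countable and declares a getFields() method, as the readers elsewhere in this document do:

use Ddeboer\DataImport\Reader\ReaderInterface;

class InMemoryReader implements ReaderInterface
{
    private $iterator;
    private $fields;

    public function __construct(array $rows, array $fields)
    {
        $this->iterator = new \ArrayIterator($rows);
        $this->fields   = $fields;
    }

    // Names of the fields each row contains
    public function getFields()
    {
        return $this->fields;
    }

    // \Iterator implementation, delegated to the ArrayIterator
    public function current() { return $this->iterator->current(); }
    public function key()     { return $this->iterator->key(); }
    public function next()    { $this->iterator->next(); }
    public function rewind()  { $this->iterator->rewind(); }
    public function valid()   { return $this->iterator->valid(); }

    // \Countable implementation
    public function count()   { return $this->iterator->count(); }
}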

Writers

ArrayWriter

Resembles the ArrayReader. Probably most useful for testing your workflow.

CsvWriter

Writes CSV files:

use Ddeboer\DataImport\Writer\CsvWriter;

$writer = new CsvWriter();
$writer->setStream(fopen('output.csv', 'w'));

// Write column headers:
$writer->writeItem(array('first', 'last'));

$writer
    ->writeItem(array('James', 'Bond'))
    ->writeItem(array('Auric', 'Goldfinger'))
    ->finish();

DoctrineWriter

Writes data through Doctrine:

use Ddeboer\DataImport\Writer\DoctrineWriter;

$writer = new DoctrineWriter($entityManager, 'YourNamespace:Employee');
$writer
    ->prepare()
    ->writeItem(
        array(
            'first' => 'James',
            'last'  => 'Bond'
        )
    )
    ->finish();

By default, DoctrineWriter will truncate your data before running the workflow. Call disableTruncate() if you don't want this.

If you are not truncating data, DoctrineWriter will try to find an entity whose primary key equals the value of the first column of the item. If it finds one, the entity is updated; otherwise it is inserted. You can tell DoctrineWriter to look up the entity using different columns of your item by passing a third parameter to its constructor.

$writer = new DoctrineWriter($entityManager, 'YourNamespace:Employee', 'columnName');

or

$writer = new DoctrineWriter($entityManager, 'YourNamespace:Employee', array('column1', 'column2', 'column3'));

The DoctrineWriter will also detect associations automatically and link them by entity reference. For example, suppose you are importing a Product entity that must be associated with a Category. If the import file has a field named 'Category' containing an id, the writer will use metadata to find the association class and create a reference so the association is set properly. The DoctrineWriter skips any association fields that are already objects, for cases where a converter was used to retrieve the association.

PdoWriter

Use the PDO writer for importing data into a relational database (such as MySQL, SQLite or MS SQL) without using Doctrine.

use Ddeboer\DataImport\Writer\PdoWriter;

$pdo = new \PDO('sqlite::memory:');
$pdo->setAttribute(\PDO::ATTR_ERRMODE, \PDO::ERRMODE_EXCEPTION);

$writer = new PdoWriter($pdo, 'my_table');
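
A sketch of writing a single item; the columns of my_table are assumed to match the item keys:

$writer
    ->prepare()
    ->writeItem(array('name' => 'James Bond', 'code' => '007'))
    ->finish();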

ExcelWriter

Writes data to an Excel file. It requires the PHPExcel package:

$ composer require phpoffice/phpexcel

use Ddeboer\DataImport\Writer\ExcelWriter;

$file = new \SplFileObject('data.xlsx', 'w');
$writer = new ExcelWriter($file);

$writer
    ->prepare()
    ->writeItem(array('first', 'last'))
    ->writeItem(array('first' => 'James', 'last' => 'Bond'))
    ->finish();

You can specify the name of the sheet to write to:

$writer = new ExcelWriter($file, 'My sheet');

You can open an already existing file and add a sheet to it:

$file = new \SplFileObject('data.xlsx', 'a');   // Open file with append mode
$writer = new ExcelWriter($file, 'New sheet');

If you wish to overwrite an existing sheet instead, specify the name of the existing sheet:

$writer = new ExcelWriter($file, 'Old sheet');

ConsoleTableWriter

This writer displays items as a table on the console for debugging purposes when you start the workflow from the command line. It requires Symfony’s Console component 2.5 or higher:

$ composer require symfony/console ~2.5

use Ddeboer\DataImport\Reader;
use Ddeboer\DataImport\Writer\ConsoleTableWriter;
use Symfony\Component\Console\Output\ConsoleOutput;
use Symfony\Component\Console\Helper\Table;

$reader = new Reader\...;
$output = new ConsoleOutput(...);

$table = new Table($output);

// Make some manipulations, e.g. set table style
$table->setStyle('compact');

$workflow = new Workflow($reader);
$workflow->addWriter(new ConsoleTableWriter($output, $table));

ConsoleProgressWriter

This writer displays import progress when you start the workflow from the command-line. It requires Symfony’s Console component:

$ composer require symfony/console

use Ddeboer\DataImport\Writer\ConsoleProgressWriter;
use Symfony\Component\Console\Output\ConsoleOutput;

$output = new ConsoleOutput(...);
$progressWriter = new ConsoleProgressWriter($output, $reader);

// Most useful when added to a workflow
$workflow->addWriter($progressWriter);

There are various optional arguments you can pass to the ConsoleProgressWriter. These include the output format and the redraw frequency. You can read more about the options here.

You might want to set the redraw rate higher than the default, as updating the console text after every record processed by the Workflow can slow down the import/export process quite a bit.

$output = new ConsoleOutput(...);
$progressWriter = new ConsoleProgressWriter($output, $reader, 'debug', 100);

Above we set the output format to 'debug' and the redraw rate to 100, so the progress text is redrawn only after every 100 records.

The debug format is the default, as it displays the ETA and memory usage. You can use a simpler format if you wish:

$output = new ConsoleOutput(...);
$progressWriter = new ConsoleProgressWriter($output, $reader, 'normal', 100);

CallbackWriter

Instead of implementing your own writer, you can use the quick solution the CallbackWriter offers:

use Ddeboer\DataImport\Writer\CallbackWriter;

$workflow->addWriter(new CallbackWriter(function ($row) use ($storage) {
    $storage->store($row);
}));

AbstractStreamWriter

Instead of implementing your own writer from scratch, you can use AbstractStreamWriter as a basis: implement the writeItem() method and you're done:

use Ddeboer\DataImport\Writer\AbstractStreamWriter;

class MyStreamWriter extends AbstractStreamWriter
{
    public function writeItem(array $item)
    {
        fputs($this->getStream(), implode(',', $item));
    }
}

$writer = new MyStreamWriter(fopen('php://temp', 'r+'));
$writer->setCloseStreamOnFinish(false);

$workflow->addWriter($writer);
$workflow->process();

$stream = $writer->getStream();
rewind($stream);

echo stream_get_contents($stream);

StreamMergeWriter

Suppose you have two stream writers and want each item to be handled by one of them, according to the value of one of the item's fields. StreamMergeWriter calls the appropriate writer for you.

The default field name is discr, but it can be changed with the setDiscriminantField() method.

use Ddeboer\DataImport\Writer\StreamMergeWriter;

$writer = new StreamMergeWriter();

$writer->addWriter('first writer', new MyStreamWriter());
$writer->addWriter('second writer', new MyStreamWriter());
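
An item is then dispatched by the value of its discriminant field; with the default discr field, a sketch looks like this:

$writer->prepare();

// Routed to the writer registered under 'first writer'
$writer->writeItem(array(
    'discr' => 'first writer',
    'name'  => 'James Bond',
));

$writer->finish();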

Create a writer

Build your own writer by implementing WriterInterface.
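
A minimal sketch; the prepare()/writeItem()/finish() methods follow the WriterInterface quoted later in this document:

use Ddeboer\DataImport\Writer\WriterInterface;

class EchoWriter implements WriterInterface
{
    public function prepare()
    {
        // Called once before the first item is written
    }

    public function writeItem(array $item)
    {
        echo implode(', ', $item), PHP_EOL;
    }

    public function finish()
    {
        // Called once after all items have been written
    }
}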

Filters

A filter decides whether data input is accepted into the import process.

CallbackFilter

The CallbackFilter wraps your callback function that determines whether data should be accepted. The data input is accepted only if the function returns true.

use Ddeboer\DataImport\Filter\CallbackFilter;

// Don’t import The Beatles
$filter = new CallbackFilter(function ($data) {
    return ('The Beatles' != $data['name']);
});

$workflow->addFilter($filter);

OffsetFilter

OffsetFilter allows you to

  • skip a certain number of items from the beginning
  • process only a specified number of items (and skip the rest)

You can combine these two parameters to process a slice from the middle of the data, like rows 5-7 of a CSV file with ten rows.

OffsetFilter is configured by its constructor: new OffsetFilter($offset = 0, $limit = null). Note: $offset is a 0-based index.

use Ddeboer\DataImport\Filter\OffsetFilter;

// Default implementation is to start from the beginning without maximum count
$filter = new OffsetFilter(0, null);
$filter = new OffsetFilter(); // You can omit both parameters

// Start from the third item, process to the end
$filter = new OffsetFilter(2, null);
$filter = new OffsetFilter(2); // You can omit the second parameter

// Start from the first item, process max three items
$filter = new OffsetFilter(0, 3);

// Start from the third item, process max five items (items 3 - 7)
$filter = new OffsetFilter(2, 5);

DateTimeThresholdFilter

This filter is useful if you want to do incremental imports. Specify a threshold DateTime instance, a column name (defaults to updated_at), and a DateTimeValueConverter that will be used to convert the values read from the filtered items. Items strictly older than the threshold will be discarded.

use Ddeboer\DataImport\Filter\DateTimeThresholdFilter;
use Ddeboer\DataImport\ValueConverter\DateTimeValueConverter;

new DateTimeThresholdFilter(
    new DateTimeValueConverter(),
    new \DateTime('yesterday')
);
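
Attached to a workflow, a sketch looks like this, assuming the items carry the default updated_at column:

$workflow->addFilter(new DateTimeThresholdFilter(
    new DateTimeValueConverter(),
    new \DateTime('yesterday')
));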

ValidatorFilter

It’s a common use case to validate the data before you save it to the database. This is exactly what the ValidatorFilter does. To use it, include Symfony’s Validator component in your project:

$ composer require symfony/validator

The ValidatorFilter works as follows:

use Ddeboer\DataImport\Filter\ValidatorFilter;
use Symfony\Component\Validator\Constraints as Assert;

$filter = new ValidatorFilter($validator);
$filter->add('email', new Assert\Email());
$filter->add('sku', new Assert\NotBlank());

The default behaviour of the filter is to collect all violations and skip each invalid row. If you want to stop on the first failing row, call ValidatorFilter::throwExceptions(), which throws a ValidationException containing the line number and the violation list.
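
A sketch of the stop-on-first-failure mode; note that the exact exception class to catch is not shown here:

$filter->throwExceptions();
$workflow->addFilter($filter);

try {
    $workflow->process();
} catch (\Exception $e) {
    // The ValidationException carries the line number and the violation list
}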

Item converters

MappingItemConverter

Use the MappingItemConverter to add mappings to your workflow. Your keys from the input data will be renamed according to these mappings. Say you have input data:

$data = array(
    array(
        'foo' => 'bar',
        'baz' => array(
            'some' => 'value'
        )
    )
);

You can map the keys foo and baz in the following way:

use Ddeboer\DataImport\ItemConverter\MappingItemConverter;

$converter = new MappingItemConverter();
$converter
    ->addMapping('foo', 'fooloo')
    ->addMapping('baz', array('some' => 'else'));

$workflow->addItemConverter($converter)
    ->process();

Your output data will now be:

array(
    array(
        'fooloo' => 'bar',
        'baz'    => array(
            'else' => 'value'
        )
    )
);

NestedMappingItemConverter

Use the NestedMappingItemConverter to add mappings to your workflow if the input data contains nested arrays. Your keys from the input data will be renamed according to these mappings. Say you have input data:

$data = array(
    'foo'   => 'bar',
    'baz' => array(
        array(
            'another' => 'thing'
        ),
        array(
            'another' => 'thing2'
        ),
    )
);

You can map the keys named another in the following way:

use Ddeboer\DataImport\ItemConverter\NestedMappingItemConverter;

$mappings = array(
    'foo'   => 'foobar',
    'baz' => array(
        'another' => 'different_thing'
    )
);

$converter = new NestedMappingItemConverter('baz');
$converter->addMapping($mappings);

$workflow->addItemConverter($converter)
    ->process();

Your output data will now be:

array(
    'foobar' => 'bar',
    'baz' => array(
        array(
            'different_thing' => 'thing'
        ),
        array(
            'different_thing' => 'thing2'
        ),
    )
);

Create an item converter

Implement ItemConverterInterface to create your own item converter:

use Ddeboer\DataImport\ItemConverter\ItemConverterInterface;

class MyItemConverter implements ItemConverterInterface
{
    public function convert($item)
    {
        // Do your conversion and return updated $item
        return $changedItem;
    }
}

CallbackItemConverter

Instead of implementing your own item converter, you can use a callback:

use Ddeboer\DataImport\ItemConverter\CallbackItemConverter;

// Use a fictional $translator service to translate each value
$converter = new CallbackItemConverter(function ($item) use ($translator) {
    foreach ($item as $key => $value) {
        $item[$key] = $translator->translate($value);
    }

    return $item;
});

Value converters

Value converters are used to convert specific fields (e.g., columns in a database).

DateTimeValueConverter

There are two uses for the DateTimeValueConverter:

  1. Convert a date representation in a format you specify into a DateTime object.
  2. Convert a date representation in a format you specify into a different format.

Convert a date into a DateTime object

use Ddeboer\DataImport\ValueConverter\DateTimeValueConverter;

$converter = new DateTimeValueConverter('d/m/Y H:i:s');
$workflow->addValueConverter('my_date_field', $converter);

If your date string is in one of the formats listed at http://www.php.net/manual/en/datetime.formats.date.php, you can omit the format parameter.

use Ddeboer\DataImport\ValueConverter\DateTimeValueConverter;

$converter = new DateTimeValueConverter();
$workflow->addValueConverter('my_date_field', $converter);

Convert a date string into a differently formatted date string

use Ddeboer\DataImport\ValueConverter\DateTimeValueConverter;

$converter = new DateTimeValueConverter('d/m/Y H:i:s', 'd-M-Y');
$workflow->addValueConverter('my_date_field', $converter);

If your date is in one of the formats listed at http://www.php.net/manual/en/datetime.formats.date.php, you can pass null as the first argument.

use Ddeboer\DataImport\ValueConverter\DateTimeValueConverter;

$converter = new DateTimeValueConverter(null, 'd-M-Y');
$workflow->addValueConverter('my_date_field', $converter);

DateTimeToStringValueConverter

The main use of the DateTimeToStringValueConverter is to convert a DateTime object into its string representation, in the proper format. The default format is 'Y-m-d H:i:s'.

use Ddeboer\DataImport\ValueConverter\DateTimeToStringValueConverter;

$converter = new DateTimeToStringValueConverter();
$converter->convert(new \DateTime('2010-01-01 01:00:00'));  // will return the string '2010-01-01 01:00:00'

ObjectConverter

Converts an object into a scalar value. To use this converter, you must include Symfony’s PropertyAccess component in your project:

$ composer require symfony/property-access

Using __toString()

If your object has a __toString() method, that value will be used:

use Ddeboer\DataImport\ValueConverter\ObjectConverter;

class SecretAgent
{
    public function __toString()
    {
        return '007';
    }
}

$converter = new ObjectConverter();
$string = $converter->convert(new SecretAgent());   // $string will be '007'

Using object accessors

If your object has no __toString() method, its accessors will be called instead:

class Villain
{
    public function getName()
    {
        return 'Bad Guy';
    }
}

class Organization
{
    public function getVillain()
    {
        return new Villain();
    }
}

use Ddeboer\DataImport\ValueConverter\ObjectConverter;

$converter = new ObjectConverter('villain.name');
$string = $converter->convert(new Organization());   // $string will be 'Bad Guy'

StringToObjectConverter

Looks up an object in the database based on a string value:

use Ddeboer\DataImport\ValueConverter\StringToObjectConverter;

$converter = new StringToObjectConverter($repository, 'name');
$workflow->addValueConverter('input_name', $converter);

CallbackValueConverter

Use this if you want to save yourself the trouble of writing a dedicated class:

use Ddeboer\DataImport\ValueConverter\CallbackValueConverter;

$callable = function ($item) {
    return implode(',', $item);
};

$converter = new CallbackValueConverter($callable);
$output = $converter->convert(array('foo', 'bar')); // $output will be "foo,bar"

MappingValueConverter

Looks up a key in a hash that you provide to the constructor:

use Ddeboer\DataImport\ValueConverter\MappingValueConverter;

$converter = new MappingValueConverter(array(
    'source' => 'destination'
));

$converter->convert('source'); // destination
$converter->convert('unexpected value'); // throws an UnexpectedValueException

Examples

Import CSV file and write to database

This example shows how you can read data from a CSV file and write that to the database.

Assume we have the following CSV file:

event;beginDate;endDate
Christmas;20131225;20131226
New Year;20131231;20140101

And we want to write this data to a Doctrine entity:

namespace MyApp;

use Doctrine\ORM\Mapping as ORM;

/**
 * @ORM\Entity
 */
class Event
{
    /**
     * @ORM\Column()
     */
    protected $event;

    /**
     * @ORM\Column(type="datetime")
     */
    protected $beginDate;

    /**
     * @ORM\Column(type="datetime")
     */
    protected $endDate;

    public function setEvent($event)
    {
        $this->event = $event;
    }

    public function setBeginDate($date)
    {
        $this->beginDate = $date;
    }

    public function setEndDate($date)
    {
        $this->endDate = $date;
    }

    // And some getters
}

Then you can import the CSV and save it as your entity in the following way:

use Ddeboer\DataImport\Workflow;
use Ddeboer\DataImport\Reader\CsvReader;
use Ddeboer\DataImport\Writer\DoctrineWriter;
use Ddeboer\DataImport\ValueConverter\DateTimeValueConverter;

// Create and configure the reader
$file = new \SplFileObject('input.csv');
$csvReader = new CsvReader($file);

// Tell the reader that the first row in the CSV file contains column headers
$csvReader->setHeaderRowNumber(0);

// Create the workflow from the reader
$workflow = new Workflow($csvReader);

// Create a writer: you need Doctrine’s EntityManager.
$doctrineWriter = new DoctrineWriter($entityManager, 'MyApp:Event');
$workflow->addWriter($doctrineWriter);

// Add a converter to the workflow that will convert `beginDate` and `endDate`
// to \DateTime objects
$dateTimeConverter = new DateTimeValueConverter('Ymd');
$workflow
    ->addValueConverter('beginDate', $dateTimeConverter)
    ->addValueConverter('endDate', $dateTimeConverter);

// Process the workflow
$workflow->process();

Export to CSV file

This example shows how you can export data to a CSV file.

use Ddeboer\DataImport\Workflow;
use Ddeboer\DataImport\Reader\ArrayReader;
use Ddeboer\DataImport\Writer\CsvWriter;
use Ddeboer\DataImport\ValueConverter\CallbackValueConverter;

// Your input data
$reader = new ArrayReader(array(
    array(
        'first',        // This is for the CSV header
        'last'
    ),
    array(
        'first' => 'james',
        'last'  => 'Bond'
    ),
    array(
        'first' => 'hugo',
        'last'  => 'Drax'
    )
));

// Create the workflow from the reader
$workflow = new Workflow($reader);

// Add the writer to the workflow
$writer = new CsvWriter();
$writer->setStream(fopen('output.csv', 'w'));
$workflow->addWriter($writer);

// As you can see, the first names are not capitalized correctly. Let's fix
// that with a value converter:
$converter = new CallbackValueConverter(function ($input) {
    return ucfirst($input);
});
$workflow->addValueConverter('first', $converter);

// Process the workflow
$workflow->process();

This will write a CSV file output.csv where the first names are capitalized:

first;last
James;Bond
Hugo;Drax

ArrayValueConverterMap

The ArrayValueConverterMap is used to convert the values of a multi-level array.

The converters defined in the map are applied to every value of a data item whose key matches one of the defined array keys.

//...
$data = array(
    'products' => array(
        0 => array(
            'name' => 'some name',
            'price' => '€12,16',
        ),
        1 => array(
            'name' => 'some name',
            'price' => '€12,16',
        )
    )
);

// ...
// create the workflow and reader etc.
// ...

$workflow->addValueConverter('products', new ArrayValueConverterMap(array(
    'name' => array(new CharsetValueConverter('UTF-8', 'UTF-16')), // encode to UTF-8
    'price' => array(new CallbackValueConverter(function ($input) {
        return str_replace('€', '', $input); // remove € char
    })),
)));

// ..
// after conversion, the data looks as follows
$data = array(
    'products' => array(
        0 => array(
            'name' => 'some name', // in UTF-8
            'price' => '12,16',
        ),
        1 => array(
            'name' => 'some name',
            'price' => '12,16',
        )
    )
);

Running the tests

Clone this repository:

$ git clone https://github.com/ddeboer/data-import.git
$ cd data-import

Install dev dependencies:

$ composer install --dev

And run PHPUnit:

$ phpunit

License

DataImport is released under the MIT license. See the LICENSE file for details.

data-import's Issues

Doctrine writer - don't overwrite data

Hi,

Currently the Doctrine writer overwrites the table with the new data it's writing. Is there a way to make it append to the table instead?

Thank you

skipOnItemFailure fails?

Hi, am I correct to assume that the skip-on-failure check is not working with the currently implemented writers, and that callback writers need to throw their own implementation of ExceptionInterface?

I am currently using the DoctrineWriter, but on flush it fails on an item: only a Doctrine exception is thrown, which is caught nowhere and leads to immediate termination of the workflow.

DoctrineWriter array index

Hi,

First of all, I would really like to thank you for this great library. This is why I would like to propose extending the DoctrineWriter to also accept an array for the index, to be able to find entities by multiple clauses.

I've extended the DoctrineWriter's writeItem() method like this:

if (false === $this->truncate) {
    if ($this->index) {
        $fields = array();
        if (is_array($this->index)) {
            foreach ($this->index as $index) {
                $fields[$index] = $item[$index];
            }
        } else {
            $fields[$this->index] = $item[$this->index];
        }
        $entity = $this->entityRepository->findOneBy($fields);
    } else {
        $entity = $this->entityRepository->find(current($item));
    }
}

This way the index parameter can be an array

$writer = new DoctrineWriter($em, $entityClass, array('field1', 'field2'));

I could try to make a PR if you think this is a feature that should be in the library.

Processing files in chunks

Use-case example:
We have a large csv file, we can't process it in one call, because we could run into memory or time limits.

Use-case solution:
Process it in chunks: process X records, save the result, show the user a message that X records were processed, and refresh the page. On refresh, start from where we left off, process the next X records, and so on until we have processed them all.

The problem:
Currently there is no way to tell the Workflow to skip the first X records.
Using reader->seek(X) will not work, because in Workflow the foreach ($this->reader as $item) will first, automatically, call rewind and then start the cycle.

The proposed solution:
Replace the foreach with do ... while; this will allow the user to call reader->seek() to move to the correct position.

do {
    if( !$this->reader->valid() ) break;

    $item = $this->reader->current();
    ...
} while( false !== $this->reader->next() );

Merge filters and item converters

The current division between filters and converters creates some confusion:

  • item converters can return false, causing the record to be skipped, so they effectively act as filters
  • we have been forced to add before and after conversion filters to make them work together with converters.

We could merge filters and item converters. They are then combined into one chain, so we wouldn't need the before/after conversion distinction any longer. Possible downsides:

  • a blurring of the single responsibility principle: the new ‘filter–converters’ become responsible for both filtering items and converting them
  • having a single chain (ordered by priority) could be more confusing for users: which filters should use the unconverted, and which the converted, column names? We may overcome this by getting rid of the priorities and just following the order in which the components were added to the chain, so:
$workflow->addItemConverter($convC)
    ->addItemConverter($convA)   
    ->addItemConverter($convB)
    ->process();   // Will process filter-converters in order C, A, B

What do you think? By the way, let’s keep the separation between item and value converters, because of the latter’s ability for re-use between multiple values/columns.

OneToManyReader

Hey there, I've spent some hours lately fighting with the OneToMany reader (nested!) and unfortunately I hit a brick wall when I tried to fix it!

Firstly, my use-case involves two nested onetomany readers. The workflow looks like this.

$productsReader = new CsvReader($productsFile);
$productsReader->setColumnHeaders(array('reference','unit','shortDescription','name','description','tags','attributes','categories'));

$variantsReader = new CsvReader($variantsFile);
$variantsReader->setColumnHeaders(array('productReference', 'reference', 'name', 'minimalQuantity', 'boxQuantity'));

$pricesReader = new CsvReader($pricesFile);
$pricesReader->setColumnHeaders(array('productReference','variantReference','minQty','price','group'));

$variantPriceReader = new OneToManyReader($variantsReader, $pricesReader, 'prices', 'reference', 'variantReference');
$productVariantReader = new OneToManyReader($productsReader, $variantPriceReader, 'variants', 'reference', 'productReference');

$writer = new DoctrineWriter($entityManager, 'MadnessCoreBundle:Product', 'reference');
$writer->disableTruncate();

$workflow = new Workflow($productVariantReader);
$workflow->addItemConverter(new ProductConverter($entityManager));
$workflow->addWriter($writer);

There are several problems with onetomany reader:

  1. by default, the last line of the right reader is omitted
  2. if the sequence of "matched" CSV lines is interrupted by one that doesn't match, then the matching is over and lines that occur later in the file are not matched.

The root cause comes from OneToManyReader.php, method current().
https://github.com/ddeboer/data-import/blob/master/src/Ddeboer/DataImport/Reader/OneToManyReader.php#L75

The first issue comes from the fact that, inside the loop, the iterator moves the pointer to the position for the next loop cycle. When the iterator moves the pointer to the last item (in the penultimate cycle), valid() returns false, indicating EOF, and the valid() condition on the while returns false (despite the fact that $rightRow holds valid data).

The second issue is due to the $leftId == $rightId condition on the while loop. Once we get to a line that doesn't match, the loop is over and the following lines are not matched. The solution in this case appeared to be moving the condition inside the loop body, allowing the iterator to move on.

And now the scary part. I've spent a great number of hours fixing this method and I couldn't make it work. This is my take; please have a look and give me a hint as to why my fixed method doesn't work.

public function current()
{
    $leftRow = $this->leftReader->current();

    if (array_key_exists($this->nestKey, $leftRow)) {
        throw new ReaderException(
            sprintf(
                'Left Row: "%s" Reader already contains a field named "%s". Please choose a different nest key field',
                $this->key(),
                $this->nestKey
            )
        );
    }
    $leftRow[$this->nestKey] = array();

    $leftId     = $this->getRowId($leftRow, $this->leftJoinField);

    while ($this->rightReader->valid()) {

        $rightRow = $this->rightReader->current();
        $rightId = $this->getRowId($rightRow, $this->rightJoinField);

        if ($leftId == $rightId) {
            $leftRow[$this->nestKey][] = $rightRow;
        }

        $this->rightReader->next();
    }

    var_dump($leftRow);

    return $leftRow;
}

The problem now is that the upper OneToManyReader in the hierarchy only iterates once and then becomes invalid (eof == true). (It prints out that the iterator's key is 0 and the count is 4.)

Testing CSV:

products.csv

"roll-stand","1ks","vždy k dispozícii na našom sklade;kvalitná zaklápacia lišta;vysúvacie nožičky pre lepšiu stabilitu;dodávané so štandardnou prenosnou taškou","Roll-Up Standard","Náš najpredávanejší bannerový systém, ktorý je cenovo ekonomický a zároveň veľmi kvalitný a stabilný.","paravan rollup; roletka; mosquito","Dodacia lehota|1 – 3 pracovné dni;Použitie|interiér;Príprava grafiky|PDF, 85x200cm, 100 DPI, 1:1, spadávka 6 cm zo spodnej strany;Rozmer grafiky|85/100/120/150 x 200 cm (š x v);Záruka|2 roky;Minimálny odber|1","7;33"

variants.csv

"roll-stand","roll-stand-85","85x200cm s tlačou","1","1"
"roll-stand","roll-stand-100","100x200cm s tlačou","1","1"
"roll-bullshit","roll-bullshit-10","bullshit","1","1"
"roll-stand","roll-stand-120","120x200cm s tlačou","1","1"
"roll-stand","roll-stand-150","150x200cm s tlačou","1","1"

prices.csv

"roll-stand","roll-stand-85","1","63","1"
"roll-stand","roll-stand-85","3","61","1"
"roll-stand","roll-stand-85","6","59","1"
"roll-stand","roll-stand-85","10","57","1"
"roll-stand","roll-stand-100","1","83","1"
"roll-stand","roll-stand-100","3","81","1"
"roll-stand","roll-stand-100","6","79","1"
"roll-stand","roll-stand-100","10","77","1"
"roll-stand","roll-stand-120","1","99","1"
"roll-stand","roll-stand-120","3","97","1"
"roll-stand","roll-stand-120","6","95","1"
"roll-stand","roll-stand-120","10","93","1"
"roll-stand","roll-stand-150","1","109","1"
"roll-stand","roll-stand-150","3","107","1"
"roll-stand","roll-stand-150","5","105","1"
"roll-stand","roll-stand-150","10","103","1"

DBALReader runs out of memory

We are reading large dataset (100K rows) and using DBALReader passing a connection object and the raw SQL to be used for select.

When we run this workflow->process() command, though, the system runs out of memory. Any suggestions on how we can handle this situation? Is there a setting we can use to do the processing in batch mode?

CSV Example has an error

https://github.com/ddeboer/data-import#export-to-csv-file

Firstly, the array is broken; it needs to be:

        $reader = new ArrayReader(
            array(
                array(
                    'first',        // This is for the CSV header
                    'last',
                    array(
                        'first' => 'james',
                        'last'  => 'Bond'
                    ),
                    array(
                        'first' => 'hugo',
                        'last'  => 'Drax'
                    )
                )
            )
        );

After running the script, it then throws an 'Array to string conversion' exception.

I'm using Laravel 4.1 to achieve this. Any ideas on how to fix this?

Enhance the ExcelReader with the same features as the CsvReader

The CsvReader has a couple of features that would be useful to the ExcelReader as well.

  • Duplicate header handling (merging, incrementing, throwing an exception)
  • Non-strict mode. Detect trailing nulls in the headers and automatically ignore those columns.

Is there any interest in adding these features to the ExcelReader as well?

Let components store feedback

Thanks to @AydinHassan (#93) the workflow now returns a result object with exceptions caught during process(). I think it would be useful to make it possible for components to add non-exception feedback about their actions too. For instance, a converter could send out a notification that some empty fields were filled with default values.

How should we go about this?

  1. Construct the Result object at the beginning of process() and pass it to each component. The interfaces then change to convert($item, Result $result), filter($item, Result $result) etc. We then add an addFeedback() and a getFeedbackCollection() (or something similar) to Result. The Result class could automatically add the current row number to the feedback item, something which the filters and converters currently have no access to.
  2. A downside to the above solution is that the $result object may well be unused in many components. So perhaps adding a separate ReporterInterface is better. If components implement it, they must add a setResult(Result $result) method. In their convert($item)/filter($item) etc. methods they can then access $this->result to add feedback. In this case, the workflow must take care of calling setResult() at the beginning of process().

@Baachi @AydinHassan Which do you prefer? Do you see any alternatives?

Association support

Hey!

I miss association support for the Doctrine ORM.
Any ideas on how to implement this feature?

Add Gaufrette adapter source

The Source is meant as a filesystem abstraction: to get a file over HTTP (using Source/Http) or any other means. The Gaufrette library already offers filesystem abstraction, so let's create a GaufretteAdapter source.

Proper handling of incorrect CSV files

I have encountered an error that is related to how the CsvReader handles an incorrectly formatted file or line.

Data (incorrectly formatted, missing the last column):

col1;col2;col3
"37253253575";

Workflow:

use Ddeboer\DataImport\Workflow;
use Ddeboer\DataImport\Source\Stream;
use Ddeboer\DataImport\Reader\CsvReader;

$source = new Stream('test.csv');

$csvReader = new CsvReader($source->getFile());
$csvReader->setHeaderRowNumber(0);

$workflow = new Workflow($csvReader);

$workflow->addWriter(new \Ddeboer\DataImport\Writer\CallbackWriter(
    function($record) {
    // ...
    }
));

$workflow->process();

Error:

Argument 1 passed to Ddeboer\DataImport\Workflow::filterItem() must be an array, null given

This error is related to how CsvReader's current() method works.
If you have a look at the method, you will see this code:

    public function current()
    {
        $line = $this->file->current();

        // If the CSV has column headers, use them to construct an associative
        // array for the columns in this line
        if (!empty($this->columnHeaders)) {
            // Count the number of elements in both: they must be equal.
            // If not, ignore the row
            if (count($this->columnHeaders) == count($line)) {
                return array_combine(array_values($this->columnHeaders), $line);
            }

        } else {
            // Else just return the column values
            return $line;
        }
    }

The problem is that in case count($this->columnHeaders) is not equal to count($line), the method returns null, so the process method in the Workflow fails with the error above.

A proposed fix is to throw an exception in case the header and line column counts do not match. This will allow incorrect files and lines to be handled correctly. It would also be good if there was a way to continue processing the file in case a single line in it is incorrect.

Add count() to readers

Make the ReaderInterface extend \Countable, so each reader must be able to report its own data size. This makes it easier to implement progress reporting.

WriterInterface is not properly implemented

In WriterInterface, the phpdoc type hints are not respected; it would be better not to have return type hints, like:

interface WriterInterface
{
    /**
     * Prepare the writer before writing the items
     */
    public function prepare();

    /**
     * Write one data item
     *
     * @param array $item The data item with converted values
     */
    public function writeItem(array $item);

    /**
     * Wrap up the writer after all items have been written
     */
    public function finish();
}

No XML Reader

I'm migrating a website from symfony 1 to 2, changing from MySQL to PostgreSQL in the process, and I'd like to use this library to achieve this. mysqldump offers the possibility to dump to XML, but not to CSV.

I implemented an XML reader using @hakre's XMLReaderIterator library (which offers the possibility to read the XML chunk by chunk instead of loading the whole file into memory).

Here is my implementation:

<?php

namespace MyProject\MyBundle\DdeboerDataImport;

use Ddeboer\DataImport\Reader\ReaderInterface;

class XmlReader implements ReaderInterface
{
    private $filename;
    private $reader;
    private $iterator;
    private $fields;

    public function __construct($filename, $delimiter, $fields)
    {
        $this->filename = $filename;
        $this->fields   = $fields;
        $this->reader   = new \XmlReader();
        if (!is_file($this->filename)) {
            throw new \RuntimeException(sprintf(
                'File "%s" not found.',
                $this->filename
            ));
        }
        $this->reader->open($this->filename);
        $this->iterator = new \XmlElementIterator($this->reader, $delimiter);
    }

    // Iterator implementation

    public function current()
    {
        $row = array();

        if ($this->iterator->current()->getSimpleXmlElement() != null) {
            foreach ($this->iterator->current()->getSimpleXmlElement() as $element) {
                $row[(string) $element['name']] = (string) $element;
            }
        }

        return $row;
    }

    public function rewind()
    {
        $this->iterator->rewind();
    }

    public function next()
    {
        return $this->iterator->next();
    }

    public function valid()
    {
        return $this->iterator->valid();
    }

    public function key()
    {
        return $this->iterator->key();
    }


    // Countable implementation

    public function count()
    {
        return iterator_count($this->iterator);
    }

    public function getFields()
    {
        return $this->fields;
    }
}

Would you like me to contribute it?

CsvReader returns wrong number of columns in row if column headers have duplicate labels

Given 2 column headers with identical labels, the array_combine function in the CsvReader->current() method: https://github.com/ddeboer/data-import/blob/master/src/Ddeboer/DataImport/Reader/CsvReader.php#L106
returns an associative array with only the last column and value (normal php behaviour).

I would have fixed this and submitted a pull request, but how to fix it represents a design decision: either disallow duplicate headers (what I have done, but I don't like it) and generate an error when they are seen, or pad the duplicate headers with an occurrence value ("header", "header1"), or return the values under duplicate headers as an array of values attached to the header, which is then subsequently handled with an is_array check when the line is ultimately used.

Multiple Array level converter

Hi,

first off, thank you for this great library, it really helped me get some data exporting done very fast. But now I think I've reached some kind of limit and want to know how I can overcome it.

The thing is, I've got data that is formatted in an array with multiple levels, like:

$data = array(
  'name' => 'some name',
  'links' => array(
    'http://somelink',
    'http://foolink'
  )
);

Currently it is not possible to add a ValueConverter for every link in the links array. Right now I have to create my own ValueConverter that handles every link. This is fine as long as the data is as simple as in the example above. But what if the "links" array were more complex, with different properties, like:

$data = array(
  'name' => 'some name',
  'links' => array(
    0 => array(
      'url' => 'http://someurl',
      'name' => 'some_name'
    )
  )
);

My solution is far from optimal.

So what would be the best approach to do this?

Filter is only executed 99 times

I have a CSV file with about 700 rows that I wanted to filter through. I added a custom filter to the workflow; however, I realised that the filter() method was only executed 99 times and not for every row in the input.

When I looked into the issue I found that SplPriorityQueue is consumed as it is iterated, for example:

$q = new SplPriorityQueue();
$q->insert('foobar', 1);

printf("count()=%d\n", count($q)); // count()=1
foreach ($q as $v) { echo "$v\n"; }
printf("count()=%d\n", count($q)); // count()=0

I don't really know why in my Workflow the filter is executed exactly 99 times; however, when I replaced the queue with a simple array, everything worked fine and the filter was executed for every row in my input.

boolean type problem in doctrineWriter

It is not possible to change a boolean value to false in an entity with the DoctrineWriter.
Line 201 in DoctrineWriter.php says "if (!$value) continue;". Could it be changed to "if ($value === NULL) continue;"?

Problem managing files after reading

I think that, due to the use of SplFileObject and its copy in the readers, I can't move, delete... files after the execution of the workflow.

It would be great to clean up all the file descriptors at the end of the workflow.

Or am I missing something?

Regards

Drop php 5.3 support

PHP 5.3 has been EOL for more than a year. PHP 5.4 has some improvements, like the new array syntax, traits and other cool things.

What would be the benefit of dropping 5.3 support?

We can make the code clearer and easier to understand. We can also use the new features to make the library cooler 👍. And last but not least, it will be easier to help other people use this library.

I would be happy to hear some feedback.

CsvReader->count() broken

  • moves the file pointer to EOF without putting it back where it was
  • PHP crashes when calling count() on a CSV with an incorrect number of header columns

MappingItemConverter nested arrays not mapable

Hi,

I'm currently having issues while trying to map data with a structure like this:

$data = array(
    array(
        'foo' => 'bar',
        'Persons' => array(
            0 => array('name' => 'person a', 'age' => 15),
            1 => array('name' => 'person b', 'age' => 16)
        )
    )
);

Now I want to map every entry of a person according to a mapping schema.

$converter = new MappingItemConverter();
$converter->addMapping('Persons', array('name' => 'surename'));

This is currently not possible, as the MappingItemConverter will look for:

$item['Persons']['name']; 

Would it be possible to add such a feature? I'm currently looking at the code myself, but I have not found a solution yet.

Merging several columns to array

Hello!

I have a CSV with the following columns:

№, Question, Answer A, Answer B, Answer C, right answer

I want to merge the columns Answer A, Answer B and Answer C into an array. I couldn't find out how to do this. I've tried to set setColumnHeaders(['id', 'question', 'answer', 'answer', 'answer', 'rightAnswer']) and then setHeaderRowNumber(0, CsvReader::DUPLICATE_HEADERS_MERGE); but it didn't work.
Could you please advise?

Example to show a simple CSV export?

I have heard good things about the library, but the lack of documentation is a big hurdle. Can you kindly show an example of exporting some data as CSV?

Allow valueConverters to be called before itemConverters

I have an itemConverter that looks up entities in the database, but I also have valueConverters to transform my data. However, item converters are called before the data is converted, and so mine does not find my entity (because it uses the unconverted values).

DoctrineWriter use imported id for GeneratedValue(strategy="AUTO")

Hi, is it possible to use the imported id when I have a row with GeneratedValue(strategy="AUTO")? By default I get a new auto-generated id, so after the import I have different ids in the database.

If I add
$this->entityMetadata->setIdGenerator(new \Doctrine\ORM\Id\AssignedGenerator());
$this->entityMetadata->setIdGeneratorType(\Doctrine\ORM\Mapping\ClassMetadata::GENERATOR_TYPE_CUSTOM);

I must turn off foreign key checks (SET FOREIGN_KEY_CHECKS=0), because Doctrine first tries to insert the new connections before inserting the entities with ids.

(The conflict occurs when I have
entity1 = has an id
entity2 = new
and a connection entity1 -> entity2 via @Orm\ManyToMany with a join table.)

One CSV File to Object with Associations

Hello,

I'm trying to figure out if this situation is possible with this bundle. I have a CSV file that has one row for an entity + one of its associations' fields. So for example, suppose I have the following entities.

<?php 
class Entity1 {
/**
* @var string
*/
 private $name;

/**
* @var Lab
*/
private $lab;
}

class Lab {
  /**
  * @var name
  */
  private $name;
}

and a CSV file with the following rows/data:

name1,lab.name1
name2,lab.name2

Is the above possible?

As a second question: is it possible to configure a workflow to ignore a column?

Remove duplicates

Hi and happy new year!

How can we remove duplicates before importing into the database?

Thank you

Filter priorities

Currently, the filters can be added to the workflow in any order. The ValidatorFilter, however, should be the first filter to be processed. Would it make sense to add filters to the workflow by priority, where each filter class implements a getPriority method, and the workflow calls that to determine the order of filters?

Add ability to specify a value converter for all fields.

I came across a scenario where I'd like to convert all empty field values to NULL. I don't know if this is a good idea or not, but currently it looks like the only way to do this would be to assign a value converter to each field.

Optimize count in CsvReader

CsvReader::current() counts the column header array on each iteration. It may be faster to save the count once in a class property and check that in current().

Creating a composed key

Hi,

I have a CSV file with 10 columns. I must create a composed key from 2 columns of this CSV file when I insert into or update the database.

How can I do this? It seems that DoctrineWriter's constructor only accepts a single column, like this:
$writer = new DoctrineWriter($em, $curEntityClass, 'columnId');

Sorry for my english.

Thanks

MappingItemConverter skips null fields

In Ddeboer\DataImport\ItemConverter\MappingItemConverter

    /**
     * Applies a mapping to an item
     *
     * @param array  $item
     * @param string $from
     * @param string $to
     *
     * @return array
     */
    protected function applyMapping(array $item, $from, $to)
    {
        // skip fields that dont exist
        if (!isset($item[$from])) {
            return $item;
        }

isset also skips fields which do exist but have a null value; this can be fixed by replacing the isset code (line 79) with:

if (!array_key_exists($from, $item)) { 

If you explicitly do not want null fields mapped, you could use another ItemConverter which filters empty keys, or make it optional behaviour of the MappingItemConverter.

Great bundle!

How can I get inserted IDs after write?

I'm trying to import categories from another database. Categories have a parent_id to create a hierarchy.
I need to be able to store a mapping of old to new IDs so I can use these to store the correct parent_id for child categories. However, it doesn't seem like writers can return or store that info anywhere.
How would I get the inserted ID from the writer and store it along with the original ID?

Any suggestions?

Suggestion: Accept streams for file writers

What do you think to this?

I think it would be nice to allow passing streams to the file writers, e.g. the CSV and Excel writers. Currently I'm using some filesystem abstraction libraries and it would be cool to pass the streams directly to these writers, instead of hard-coding file paths.

If this is something you'd be interested in, I'll have a mess about and provide a PR, although I'm not 100% sure how I'd implement it!

filters and afterConversionFilters use same Queue

First, thanks for this great library! I'm actually writing a more high-level importer engine based on your library right now.

But - I think I found a bug :)
Is it intended that the registered filters and afterConversionFilters (in Workflow) both get invoked every time, before conversion and after?
I think the construction of the queues in the workflow's constructor should be more like

$this->filters = new \SplPriorityQueue();
$this->afterConversionFilters = new \SplPriorityQueue();

instead of

$this->filters = $this->afterConversionFilters = new \SplPriorityQueue();

Regards,
Markus

CsvReader's default delimiter should be a comma

Hello,

I just spent a while trying to figure out why my CSV file was not being read properly after creating a reader with new CsvReader($file);. It took a while for me to figure out that CsvReader's constructor has a default delimiter of ';'. I find it bizarre that a CSV (aka comma-separated values) reader uses a semicolon instead of a comma. Is this not a bug?

String to Object converter results in catchable fatal error: must be an instance ... string given

$em = $this->getDoctrine()->getManager();
$doctrineWriter = new DoctrineWriter($em, 'BillingBundle:CDR');
$workflow->addWriter($doctrineWriter);

$objAccount = new StringToObjectConverter($em->getRepository('AccountBundle:Account'), 'accountName');
$workflow->addValueConverter('Klant', $objAccount);

$workflow->addMapping('Klant', 'account');

Results in a catchable error: must be instance ... string given...

The goal is to map the particular CSV row, based on the account name given in one of its columns, to a ManyToOne "Account". Any idea how to solve this?

ExcelReader

When loading a .xlsx file, $reader->getColumnHeaders() returns an array with all the filled columns (with some text) and all the NULL ones.

The thing is, when reading a row, it reads one more column:value pair than actually exist, so the row has all its column:values right and the last one is "":null.

Make all Reader-classes 'serviceable'

The reader classes for Array, Excel and Csv all depend on runtime injection of working data in the __construct() call. This makes the classes unusable as services.

I have a current situation in which I need to override the second parameter of the CsvReader, so I'd like to make a specific service for this reader instead of modifying a number of new CsvReader() calls.

Maybe add another interface that requires a setData($mixed) method? What's your idea?

Fields rendering as NULL when they aren't NULL in CSV file

I've followed the example provided in the readme, but when I provide it with the CSV file I need to import, an exception is returned in Symfony2. The exception comes from some of the fields in the database needing a value passed to them. The CSV file has these fields filled in, though.

I've checked the CSV file in both LibreOffice Calc and Sublime Text, and the values are present in the CSV. But as soon as the CSV is passed through to data-import, they are suddenly NULL.

The funny thing is, it's not doing it for every single column. Some columns are INT and VARCHAR(), but not all VARCHAR fields are NULLed by the system.

Add Capability to insert new mappings/columns that aren't included in read data

Hello,

I'm writing an importer. I have a source CSV file that I can read from, but it does not include some requisite fields I would like to add to the data once read in, before writing to my database. In essence, I want to add columns, much like the MappingItemConverter works. For example, I would like to insert a new mapping "pid" which would have a value of "1" for each row, or whatever arbitrary value I might want to put in.

Thank you for considering this!
