php-avro-schema-generator's Introduction

Avro schema generator for PHP

Supported PHP versions: 7.4 .. 8.x

Installation

composer require php-kafka/php-avro-schema-generator "^3.0"

Description

This library enables you to:

  • Manage your embedded schemas as separate files
  • Merge those files into complete schemas
  • Generate avsc schema templates from PHP classes

Merging subschemas / schemas

Schema template directories: directories containing avsc template files (with subschema)
Output directory: output directory for the merged schema files

Console example

./vendor/bin/avro-cli avro:subschema:merge ./example/schemaTemplates ./example/schema

PHP example

<?php

use PhpKafka\PhpAvroSchemaGenerator\Registry\SchemaRegistry;
use PhpKafka\PhpAvroSchemaGenerator\Merger\SchemaMerger;

$registry = (new SchemaRegistry())
    ->addSchemaTemplateDirectory('./schemaTemplates')
    ->load();

$merger = new SchemaMerger('./schema');
$merger->setSchemaRegistry($registry);

$merger->merge();
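
Conceptually, merging means replacing a reference to a subschema with that subschema's definition. The sketch below illustrates the idea on decoded JSON arrays. It is not the library's implementation: `embedSubschemas`, the `com.example` names, and the rule of embedding only the first reference (Avro allows a named type to be defined once) are assumptions made for illustration.

```php
<?php

declare(strict_types=1);

/**
 * Toy illustration only: inline named subschemas into a schema.
 * $templates maps a full schema name to its decoded definition.
 */
function embedSubschemas($node, array $templates, array &$embedded = [])
{
    if (is_string($node) && isset($templates[$node]) && !isset($embedded[$node])) {
        // First reference: embed the definition. Later references stay as names.
        $embedded[$node] = true;
        return embedSubschemas($templates[$node], $templates, $embedded);
    }
    if (is_array($node)) {
        foreach ($node as $key => $value) {
            // These attributes never hold a type reference, so leave them alone.
            if (in_array($key, ['name', 'namespace', 'doc', 'default'], true)) {
                continue;
            }
            $node[$key] = embedSubschemas($value, $templates, $embedded);
        }
    }
    return $node;
}

// Hypothetical templates, as they might look after json_decode().
$templates = [
    'com.example.Book' => [
        'type' => 'record', 'name' => 'Book', 'namespace' => 'com.example',
        'fields' => [['name' => 'title', 'type' => 'string']],
    ],
];
$library = [
    'type' => 'record', 'name' => 'Library', 'namespace' => 'com.example',
    'fields' => [
        ['name' => 'books', 'type' => ['type' => 'array', 'items' => 'com.example.Book']],
    ],
];

$merged = embedSubschemas($library, $templates);
echo json_encode($merged, JSON_PRETTY_PRINT), PHP_EOL;
```

In the printed result, the `items` entry of `books` holds the full `Book` record instead of its name.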

Merge optimizers

There are optimizers that you can enable for merging schema:

  • FullNameOptimizer: removes unneeded namespaces
  • FieldOrderOptimizer: the first fields of a record schema will be: type, name, namespace (if present)
  • PrimitiveSchemaOptimizer: Optimizes primitive schema e.g. {"type": "string"} to "string"
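
To make the primitive schema optimization concrete, here is a minimal sketch of the transformation described above. It is a standalone illustration, not the code of `PrimitiveSchemaOptimizer`; the function name `collapsePrimitiveSchemas` and the sample schema are assumptions.

```php
<?php

declare(strict_types=1);

/**
 * Toy illustration only: collapse {"type": "<primitive>"} into "<primitive>".
 */
function collapsePrimitiveSchemas($node)
{
    $primitives = ['null', 'boolean', 'int', 'long', 'float', 'double', 'bytes', 'string'];

    if (!is_array($node)) {
        return $node;
    }
    // An object whose only key is "type" with a primitive value carries no extra information.
    if (array_keys($node) === ['type'] && in_array($node['type'], $primitives, true)) {
        return $node['type'];
    }
    foreach ($node as $key => $value) {
        $node[$key] = collapsePrimitiveSchemas($value);
    }
    return $node;
}

$schema = json_decode(
    '{"type":"record","name":"Book","fields":[{"name":"title","type":{"type":"string"}}]}',
    true
);
$optimized = collapsePrimitiveSchemas($schema);
echo json_encode($optimized), PHP_EOL;
```

After the call, the `title` field's type is the plain string `"string"` rather than an object.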

How to enable optimizers:

Console example

./vendor/bin/avro-cli --optimizeFullNames --optimizeFieldOrder --optimizePrimitiveSchemas avro:subschema:merge ./example/schemaTemplates ./example/schema

PHP Example

<?php

use PhpKafka\PhpAvroSchemaGenerator\Registry\SchemaRegistry;
use PhpKafka\PhpAvroSchemaGenerator\Merger\SchemaMerger;
use PhpKafka\PhpAvroSchemaGenerator\Optimizer\FieldOrderOptimizer;
use PhpKafka\PhpAvroSchemaGenerator\Optimizer\FullNameOptimizer;
use PhpKafka\PhpAvroSchemaGenerator\Optimizer\PrimitiveSchemaOptimizer;

$registry = (new SchemaRegistry())
    ->addSchemaTemplateDirectory('./schemaTemplates')
    ->load();

$merger = new SchemaMerger('./schema');
$merger->setSchemaRegistry($registry);
$merger->addOptimizer(new FieldOrderOptimizer());
$merger->addOptimizer(new FullNameOptimizer());
$merger->addOptimizer(new PrimitiveSchemaOptimizer());

$merger->merge();

Generating schemas from classes

The generated templates give you a good starting point, but you will need to review and adjust them.
Class directories: directories containing the classes you want to generate schemas from
Output directory: output directory for your generated schema templates
After you have reviewed and adjusted your templates, you will need to merge them (see above).

Console example

./vendor/bin/avro-cli avro:schema:generate ./example/classes ./example/schemaTemplates

PHP Example

<?php

use PhpKafka\PhpAvroSchemaGenerator\Converter\PhpClassConverter;
use PhpKafka\PhpAvroSchemaGenerator\Parser\ClassParser;
use PhpKafka\PhpAvroSchemaGenerator\Parser\DocCommentParser;
use PhpKafka\PhpAvroSchemaGenerator\Registry\ClassRegistry;
use PhpKafka\PhpAvroSchemaGenerator\Parser\ClassPropertyParser;
use PhpKafka\PhpAvroSchemaGenerator\Generator\SchemaGenerator;
use PhpParser\ParserFactory;

$parser = (new ParserFactory())->create(ParserFactory::PREFER_PHP7);
$classPropertyParser = new ClassPropertyParser(new DocCommentParser());
$classParser = new ClassParser($parser, $classPropertyParser);

$converter = new PhpClassConverter($classParser);
$registry = (new ClassRegistry($converter))->addClassDirectory('./classes')->load();

$generator = new SchemaGenerator('./schema');
$generator->setClassRegistry($registry);
$schemas = $generator->generate();
$generator->exportSchemas($schemas);

The generator is able to detect types from:

  • doc comments
  • property types
  • doc annotations
    • @avro-type to set a fixed type instead of calculating one
    • @avro-default to set a default for this property in your schema
    • @avro-doc to set the schema doc comment
    • @avro-logical-type to set a logical type for your property (decimal is not yet supported, since it has additional parameters)
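
As an illustration of the sources listed above, a class might be annotated as follows. The class `Order`, its properties, and the exact value syntax after each annotation are assumptions; only the annotation names come from the list above.

```php
<?php

declare(strict_types=1);

// Hypothetical class showing where the generator could pick up type information.
class Order
{
    /**
     * Type comes from the property type declaration; uuid annotates a string.
     * @avro-logical-type uuid
     */
    private string $id;

    /**
     * Type comes from the doc comment.
     * @var int|null
     */
    private $quantity;

    /**
     * Annotations override the detected type and add a default and a doc.
     * @avro-type string
     * @avro-default unknown
     * @avro-doc Human readable order status
     * @var mixed
     */
    private $status;
}
```

Review the generated template afterwards to confirm that each annotation was interpreted as intended.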

Disclaimer

In v1.3.0 the option --optimizeSubSchemaNamespaces was added. It was not working fully
in the 1.x version and we had some discussions (#13) about it.
Ultimately the decision was to adopt this behaviour fully in v2.0.0 so you might want to
upgrade if you rely on that behaviour.

php-avro-schema-generator's People

Contributors

andrei-arobs, bafs, bajdzun, hubbitus, misterxan, nick-zh

php-avro-schema-generator's Issues

Add more examples

As pointed out in #52 some examples are missing.
Add examples for

  • generate & merge
  • cli generate
  • cli merge
  • cli generate & merge

Add a small readme to example folder as well with a short introduction about these examples

Generator: nested record types generated incorrectly

Running your generation example, we get the output schema PhpKafka.PhpAvroSchemaGenerator.Example.SomeTestClass.avsc (part):

    {
      "name": "someOtherTestClass",
      "type": "PhpKafka.PhpAvroSchemaGenerator.Example.SomeOtherTestClass"
    },
    {
      "name": "someOtherTestClasses",
      "type": {
        "type": "array",
        "items": "PhpKafka.PhpAvroSchemaGenerator.Example.SomeOtherTestClass"
      }
    },
    {
      "name": "blaaaaaaaa",
      "type": [
        "int",
        "string"
      ]
    },
    {
      "name": "unicorn",
      "type": "PhpKafka.PhpAvroSchemaGenerator.Example.Wonderland"
    }

Which is incorrect!
Avro does not have types like PhpKafka.PhpAvroSchemaGenerator.Example.Wonderland!

Your example is not very representative, because none of the inner classes have fields. But say Wonderland had a single field, like:

/**
 * Country of miracles
 */
class Wonderland
{
    private string $land;
}

Then that generated part, instead of

    {
      "name": "unicorn",
      "type": "PhpKafka.PhpAvroSchemaGenerator.Example.Wonderland"
    }

should look like:

{
  "name": "unicorn",
  "type": "record",
  "doc": "Country of miracles",
  "fields": [
    {
      "name": "land",
      "type": "string"
    }
  ]
}
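
For comparison, the Avro specification expects an inline record to be the value of the field's `type` attribute and to carry its own `name`. A reference by full name is also valid once that named type has been defined earlier in the same schema. The sketch below builds such a field as a PHP array and prints it as JSON; the namespace is taken from the class names above, the rest is an illustration rather than the generator's actual output.

```php
<?php

declare(strict_types=1);

// Field whose type is an inline record definition (a named type needs "name").
$unicornField = [
    'name' => 'unicorn',
    'type' => [
        'type' => 'record',
        'name' => 'Wonderland',
        'namespace' => 'PhpKafka.PhpAvroSchemaGenerator.Example',
        'doc' => 'Country of miracles',
        'fields' => [
            ['name' => 'land', 'type' => 'string'],
        ],
    ],
];

echo json_encode($unicornField, JSON_PRETTY_PRINT), PHP_EOL;
```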

Explicit type declarations are not handled!

Typed properties were introduced in PHP 7.4.

But php-avro-schema-generator does not take them into account.

Steps to reproduce:

// in ../DAO/DemoNotification.php
class DemoNotification {
    private string $content;

    public function __construct(string $content) {
        $this->content = $content;
    }

    public function getContent(): string {
        return $this->content;
    }
}

$data = new DemoNotification("Hello");
$registry = (new ClassRegistry())
    ->addClassDirectory(__DIR__ . '/../DAO')
    ->load();
$generator = new SchemaGenerator($registry, '');
$schemas = $generator->generate();
echo(current($schemas)); // Out: {"type":"record","name":"DemoNotification","namespace":"App.DAO","fields":[{"name":"content","type":""}]}

Actual behavior

Despite the explicit type declaration:

private string $content;

Script outputs:

{"type":"record","name":"DemoNotification","namespace":"App.DAO","fields":[{"name":"content","type":""}]}

Expected behavior

{"type":"record","name":"DemoNotification","namespace":"App.DAO","fields":[{"name":"content","type":"string"}]}
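
One way to read declared property types is PHP's reflection API (`ReflectionProperty::getType()`, available since PHP 7.4). The sketch below is only an illustration of that approach, and it requires the class to be loaded, which a parser-based generator may prefer to avoid. The helper `avroTypesFromDeclarations`, the extra `$priority` property, and the PHP-to-Avro mapping table are assumptions.

```php
<?php

declare(strict_types=1);

class DemoNotification
{
    private string $content;
    private ?int $priority = null; // hypothetical extra property
}

/**
 * Map declared property types to Avro types (hypothetical helper).
 * Only a few scalar types are covered; anything else is reported as null.
 */
function avroTypesFromDeclarations(string $class): array
{
    // Assumed mapping; a real generator may choose "long" for PHP int, for example.
    $map = ['string' => 'string', 'int' => 'int', 'float' => 'double', 'bool' => 'boolean'];
    $result = [];
    foreach ((new ReflectionClass($class))->getProperties() as $property) {
        $type = $property->getType();
        if (!$type instanceof ReflectionNamedType || !isset($map[$type->getName()])) {
            $result[$property->getName()] = null;
            continue;
        }
        $avro = $map[$type->getName()];
        // A nullable declaration becomes a union with "null".
        $result[$property->getName()] = $type->allowsNull() ? ['null', $avro] : $avro;
    }
    return $result;
}

$types = avroTypesFromDeclarations(DemoNotification::class);
echo json_encode($types), PHP_EOL;
```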

The @var annotation contains a non existent class "self"

$registry = (new ClassRegistry())
    ->addClassDirectory($this->projectDir . '/var/classes/DataObject')
    ->load();

The directory contains files from Pimcore. Among others, the scanned class Pimcore\Model\DataObject\AbstractObject has a property with this annotation:

    /**
     * @internal
     *
     * @var self|null
     */
    protected $o_parent;

Should the self keyword be resolved automatically to the Pimcore\Model\DataObject\AbstractObject class?
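
A possible resolution step could look like the sketch below: each member of the `@var` union is checked, and `self` is replaced by the declaring class. The helper name `resolveSelfInDocType` is hypothetical, and this is not how the library currently behaves.

```php
<?php

declare(strict_types=1);

/**
 * Replace "self" in a doc-comment type with the declaring class
 * (hypothetical helper; union members are separated by "|").
 */
function resolveSelfInDocType(string $docType, string $declaringClass): string
{
    $parts = array_map(
        static function (string $part) use ($declaringClass): string {
            $part = trim($part);
            // Keywords are case-insensitive in PHP, so compare in lower case.
            return strtolower($part) === 'self'
                ? '\\' . ltrim($declaringClass, '\\')
                : $part;
        },
        explode('|', $docType)
    );

    return implode('|', $parts);
}

echo resolveSelfInDocType('self|null', 'Pimcore\Model\DataObject\AbstractObject'), PHP_EOL;
```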

ClassParser does not resolve parent classes without use statements

$registry = (new ClassRegistry($converter))
    ->addClassDirectory($this->projectDir . '/vendor/pimcore/pimcore/lib/Loader/')
    ->load();

And I got this error:

In ClassParser.php line 197:                                                  
  Class "AbstractClassNameLoader" does not exist  

The problem occurs on a class declaration starting with:

<?php

declare(strict_types = 1);

namespace Pimcore\Loader\ImplementationLoader;

class ClassMapLoader extends AbstractClassNameLoader
{
...

The class AbstractClassNameLoader is indeed not listed in the use statements, but it is present in the same namespace and directory.
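
PHP itself resolves an unqualified class name without a matching use statement relative to the current namespace. A parser could apply the same rule, roughly as in this sketch; `qualifyParentClass` is a hypothetical helper, not part of the library.

```php
<?php

declare(strict_types=1);

/**
 * Qualify a parent class name that has no matching use statement
 * (hypothetical helper mirroring PHP's name resolution for this case).
 */
function qualifyParentClass(string $parentName, string $currentNamespace): string
{
    // A leading backslash means the name is already fully qualified.
    if ($parentName !== '' && $parentName[0] === '\\') {
        return ltrim($parentName, '\\');
    }
    // Otherwise the name is relative to the current namespace.
    return $currentNamespace === '' ? $parentName : $currentNamespace . '\\' . $parentName;
}

echo qualifyParentClass('AbstractClassNameLoader', 'Pimcore\Loader\ImplementationLoader'), PHP_EOL;
```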

Improve optimizer interface

To leverage all the information we have about a template, we should probably change the interface from

OptimizerInterface::optimize(string $definition): string

to

OptimizerInterface::optimize(SchemaTemplateInterface $schemaTemplate): SchemaTemplateInterface

This way the optimizers can leverage all the information from a SchemaTemplateInterface to optimize the schema.
This will be a breaking change.
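
A rough sketch of how the proposed signature could be used follows. The interfaces are declared locally as stand-ins so the example is self-contained. The methods `getSchemaDefinition` and `withSchemaDefinition`, the `SchemaTemplate` class, and the `TrimOptimizer` are assumptions for illustration and may differ from the library's actual types.

```php
<?php

declare(strict_types=1);

// Stand-in with only what this sketch needs; the method names are assumptions.
interface SchemaTemplateInterface
{
    public function getSchemaDefinition(): string;
    public function withSchemaDefinition(string $definition): SchemaTemplateInterface;
}

// The proposed signature: a template goes in, a template comes out.
interface OptimizerInterface
{
    public function optimize(SchemaTemplateInterface $schemaTemplate): SchemaTemplateInterface;
}

final class SchemaTemplate implements SchemaTemplateInterface
{
    private string $definition;

    public function __construct(string $definition)
    {
        $this->definition = $definition;
    }

    public function getSchemaDefinition(): string
    {
        return $this->definition;
    }

    public function withSchemaDefinition(string $definition): SchemaTemplateInterface
    {
        // Return a modified copy so the original template stays untouched.
        $copy = clone $this;
        $copy->definition = $definition;
        return $copy;
    }
}

// Hypothetical optimizer: strips surrounding whitespace from the definition.
final class TrimOptimizer implements OptimizerInterface
{
    public function optimize(SchemaTemplateInterface $schemaTemplate): SchemaTemplateInterface
    {
        return $schemaTemplate->withSchemaDefinition(trim($schemaTemplate->getSchemaDefinition()));
    }
}

$template = new SchemaTemplate("  {\"type\": \"string\"}\n");
$optimized = (new TrimOptimizer())->optimize($template);
echo $optimized->getSchemaDefinition(), PHP_EOL;
```

Because the optimizer receives the whole template, it could also consult other template data before deciding how to rewrite the definition.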

Parser does not account for use aliases of parent classes

Say I have this class:

<?php
...
namespace Pimcore\Model\Document\Editable\Loader;

use Pimcore\Loader\ImplementationLoader\PrefixLoader as BasePrefixLoader;

/**
 * @internal
 */
final class PrefixLoader extends BasePrefixLoader
{

The parser fails on it with:

In ClassParser.php line 205:
                                                                                                        
  Parent class [BasePrefixLoader] for [Pimcore\Model\Document\Editable\Loader\PrefixLoader] not found! 

(The error message includes the fix from #38)
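
Alias-aware resolution could be sketched as below: the first segment of the name is looked up in the file's use statements before falling back to the current namespace. The helper `resolveWithUseStatements` and the shape of `$uses` are hypothetical. nikic/php-parser also ships a `NameResolver` node visitor that performs this kind of resolution on a parsed AST.

```php
<?php

declare(strict_types=1);

/**
 * Resolve a class name using the file's use statements first
 * (hypothetical helper; $uses maps alias => fully qualified name).
 * Note: PHP treats aliases case-insensitively; this sketch compares exactly.
 */
function resolveWithUseStatements(string $name, string $namespace, array $uses): string
{
    if ($name !== '' && $name[0] === '\\') {
        return ltrim($name, '\\');
    }
    // Only the first segment of a name can be an alias.
    $segments = explode('\\', $name);
    $first = array_shift($segments);
    if (isset($uses[$first])) {
        return implode('\\', array_merge([$uses[$first]], $segments));
    }
    return $namespace === '' ? $name : $namespace . '\\' . $name;
}

$uses = ['BasePrefixLoader' => 'Pimcore\Loader\ImplementationLoader\PrefixLoader'];
echo resolveWithUseStatements('BasePrefixLoader', 'Pimcore\Model\Document\Editable\Loader', $uses), PHP_EOL;
```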

Add Wiki

The readme is a bit short and leaves out some of the details that are needed to understand the purpose of this project better.
Create a few wiki pages with more elaborate explanations.

Please allow types re-definition (custom mapping)

For now, the most interesting case for me is defining optional (nullable) fields, so instead of just:

"type": "string",

allow setting something like:

"type" : [ "null", "string" ],
"default": null

But there are also more interesting cases, e.g. when instead of just int I would like to use the logicalType date.

Please provide some functional interface to override the fixed mechanism, e.g. by accepting an array of mappers or just a callable.
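
The requested extension point might be sketched as a chain of callables that each receive a generated field and return an adjusted copy. Everything here is hypothetical: `applyTypeMappers`, the two mappers, and the field names are illustrations, not an existing API of the library.

```php
<?php

declare(strict_types=1);

/**
 * Apply user-supplied mappers to a generated field (hypothetical extension point).
 * Each mapper receives the field array and returns a possibly modified copy.
 */
function applyTypeMappers(array $field, callable ...$mappers): array
{
    foreach ($mappers as $mapper) {
        $field = $mapper($field);
    }
    return $field;
}

// Make every plain string field optional.
$nullableStrings = static function (array $field): array {
    if ($field['type'] === 'string') {
        $field['type'] = ['null', 'string'];
        $field['default'] = null;
    }
    return $field;
};

// Treat int fields whose name ends in "Date" as Avro dates.
$dates = static function (array $field): array {
    if ($field['type'] === 'int' && substr($field['name'], -4) === 'Date') {
        $field['type'] = ['type' => 'int', 'logicalType' => 'date'];
    }
    return $field;
};

$title = applyTypeMappers(['name' => 'title', 'type' => 'string'], $nullableStrings, $dates);
$birthDate = applyTypeMappers(['name' => 'birthDate', 'type' => 'int'], $nullableStrings, $dates);
echo json_encode([$title, $birthDate]), PHP_EOL;
```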

Handling namespaces of embedded types correctly

It was discovered (@bajdzun) that the SR does not include namespaces of subtypes when they are in the same namespace.
To cover this a new option --optimizeSubSchemaNamespaces was added (#5, #10) to make this possible without breaking the current behaviour.
Upon further investigation it seems this is part of the Avro specification and should be default behaviour, see discussion in (#12).
This will be properly addressed and will result in a major release.

Thanks @bajdzun, @healerz & co for the help and uncovering this issue 🙏
