python-documentai's Introduction

NOTE: This GitHub repository is archived. The repository contents and history have moved to google-cloud-python.

Python Client for Document AI API

Document AI API: Service to parse structured information from unstructured or semi-structured documents using state-of-the-art Google AI such as natural language, computer vision, translation, and AutoML.

Quick Start

In order to use this library, you first need to go through the following steps:

  1. Select or create a Cloud Platform project.
  2. Enable billing for your project.
  3. Enable the Document AI API.
  4. Set up authentication.

Installation

Install this library in a virtualenv using pip. virtualenv is a tool to create isolated Python environments. The basic problem it addresses is one of dependencies and versions, and indirectly permissions.

With virtualenv, it's possible to install this library without needing system install permissions, and without clashing with the installed system dependencies.

Code samples and snippets

Code samples and snippets live in the samples/ folder.

Supported Python Versions

Our client libraries are compatible with all current active and maintenance versions of Python.

Python >= 3.7

Unsupported Python Versions

Python <= 3.6

If you are using an end-of-life version of Python, we recommend that you update as soon as possible to an actively supported version.

Mac/Linux
pip install virtualenv
virtualenv <your-env>
source <your-env>/bin/activate
<your-env>/bin/pip install google-cloud-documentai
Windows
pip install virtualenv
virtualenv <your-env>
<your-env>\Scripts\activate
<your-env>\Scripts\pip.exe install google-cloud-documentai

python-documentai's People

Contributors

aribray, busunkim96, dandhlee, dependabot[bot], dgallegos, galz10, gcf-owl-bot[bot], google-cloud-policy-bot[bot], holtskinner, munkhuushmgl, nicain, parthea, release-please[bot], renovate-bot, surferjeffatgoogle, vam-google, yoshi-automation

python-documentai's Issues

Clause in "if" statement is incorrect

In process_document snippets.

Current

start_index = (
    int(segment.start_index)
    if segment.start_index in doc_element.text_anchor.text_segments
    else 0
)

Fix

start_index = (
    int(segment.start_index)
    if segment in doc_element.text_anchor.text_segments
    else 0
)

The fix changes segment.start_index to segment in the condition. I am opening this bug before requesting a review for my PR.
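For context, the snippet in question slices the document text using the offsets carried by each text segment. The following is a minimal, self-contained sketch of that idea, not the repository's actual sample: TextSegment and TextAnchor are hypothetical stand-ins for the library's message types, and the sketch assumes an unset start_index reads as 0, in which case no membership test is needed at all.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TextSegment:
    # Hypothetical stand-in for a Document AI text segment.
    start_index: int = 0
    end_index: int = 0


@dataclass
class TextAnchor:
    # Hypothetical stand-in for a text anchor holding several segments.
    text_segments: List[TextSegment] = field(default_factory=list)


def get_text(text_anchor: TextAnchor, full_text: str) -> str:
    """Join the slices of full_text that the anchor's segments point at."""
    parts = []
    for segment in text_anchor.text_segments:
        # An unset start_index is 0 in this sketch, so it can be used directly.
        start = int(segment.start_index)
        end = int(segment.end_index)
        parts.append(full_text[start:end])
    return "".join(parts)


anchor = TextAnchor([TextSegment(end_index=5), TextSegment(start_index=6, end_index=11)])
print(get_text(anchor, "Hello world"))
```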

samples.snippets.batch_parse_table_v1beta2_test: test_batch_parse_table failed

Note: #139 was also for this test, but it was closed more than 10 days ago. So, I didn't mark it flaky.


commit: eab533b
buildURL: Build Status, Sponge
status: failed

Test output
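target = functools.partial(<bound method PollingFuture._done_or_raise of <google.api_core.operation.Operation object at 0x7fe3c66fe350>>)
predicate = <function if_exception_type.<locals>.if_exception_type_predicate at 0x7fe3c8a2add0>
sleep_generator = <generator object exponential_sleep_generator at 0x7fe3c87e00d0>
deadline = 120, on_error = None
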
def retry_target(target, predicate, sleep_generator, deadline, on_error=None):
    """Call a function and retry if it fails.

    This is the lowest-level retry helper. Generally, you'll use the
    higher-level retry helper :class:`Retry`.

    Args:
        target(Callable): The function to call and retry. This must be a
            nullary function - apply arguments with `functools.partial`.
        predicate (Callable[Exception]): A callable used to determine if an
            exception raised by the target should be considered retryable.
            It should return True to retry or False otherwise.
        sleep_generator (Iterable[float]): An infinite iterator that determines
            how long to sleep between retries.
        deadline (float): How long to keep retrying the target. The last sleep
            period is shortened as necessary, so that the last retry runs at
            ``deadline`` (and not considerably beyond it).
        on_error (Callable[Exception]): A function to call while processing a
            retryable exception.  Any error raised by this function will *not*
            be caught.

    Returns:
        Any: the return value of the target function.

    Raises:
        google.api_core.RetryError: If the deadline is exceeded while retrying.
        ValueError: If the sleep generator stops yielding values.
        Exception: If the target raises a method that isn't retryable.
    """
    if deadline is not None:
        deadline_datetime = datetime_helpers.utcnow() + datetime.timedelta(
            seconds=deadline
        )
    else:
        deadline_datetime = None

    last_exc = None

    for sleep in sleep_generator:
        try:
            return target()

.nox/py-3-7/lib/python3.7/site-packages/google/api_core/retry.py:189:


self = <google.api_core.operation.Operation object at 0x7fe3c66fe350>
retry = <google.api_core.retry.Retry object at 0x7fe3c8a362d0>

def _done_or_raise(self, retry=DEFAULT_RETRY):
    """Check if the future is done and raise if it's not."""
    kwargs = {} if retry is DEFAULT_RETRY else {"retry": retry}

    if not self.done(**kwargs):
        raise _OperationNotComplete()

E google.api_core.future.polling._OperationNotComplete

.nox/py-3-7/lib/python3.7/site-packages/google/api_core/future/polling.py:87: _OperationNotComplete

The above exception was the direct cause of the following exception:

self = <google.api_core.operation.Operation object at 0x7fe3c66fe350>
timeout = 120, retry = <google.api_core.retry.Retry object at 0x7fe3c8a362d0>

def _blocking_poll(self, timeout=None, retry=DEFAULT_RETRY):
    """Poll and wait for the Future to be resolved.

    Args:
        timeout (int):
            How long (in seconds) to wait for the operation to complete.
            If None, wait indefinitely.
    """
    if self._result_set:
        return

    retry_ = self._retry.with_deadline(timeout)

    try:
        kwargs = {} if retry is DEFAULT_RETRY else {"retry": retry}
        retry_(self._done_or_raise)(**kwargs)

.nox/py-3-7/lib/python3.7/site-packages/google/api_core/future/polling.py:108:


args = (), kwargs = {}
target = functools.partial(<bound method PollingFuture._done_or_raise of <google.api_core.operation.Operation object at 0x7fe3c66fe350>>)
sleep_generator = <generator object exponential_sleep_generator at 0x7fe3c87e00d0>

@general_helpers.wraps(func)
def retry_wrapped_func(*args, **kwargs):
    """A wrapper that calls target function with retry."""
    target = functools.partial(func, *args, **kwargs)
    sleep_generator = exponential_sleep_generator(
        self._initial, self._maximum, multiplier=self._multiplier
    )
    return retry_target(
        target,
        self._predicate,
        sleep_generator,
        self._deadline,
        on_error=on_error,
    )

.nox/py-3-7/lib/python3.7/site-packages/google/api_core/retry.py:291:


target = functools.partial(<bound method PollingFuture._done_or_raise of <google.api_core.operation.Operation object at 0x7fe3c66fe350>>)
predicate = <function if_exception_type.<locals>.if_exception_type_predicate at 0x7fe3c8a2add0>
sleep_generator = <generator object exponential_sleep_generator at 0x7fe3c87e00d0>
deadline = 120, on_error = None

def retry_target(target, predicate, sleep_generator, deadline, on_error=None):
    """Call a function and retry if it fails.

    This is the lowest-level retry helper. Generally, you'll use the
    higher-level retry helper :class:`Retry`.

    Args:
        target(Callable): The function to call and retry. This must be a
            nullary function - apply arguments with `functools.partial`.
        predicate (Callable[Exception]): A callable used to determine if an
            exception raised by the target should be considered retryable.
            It should return True to retry or False otherwise.
        sleep_generator (Iterable[float]): An infinite iterator that determines
            how long to sleep between retries.
        deadline (float): How long to keep retrying the target. The last sleep
            period is shortened as necessary, so that the last retry runs at
            ``deadline`` (and not considerably beyond it).
        on_error (Callable[Exception]): A function to call while processing a
            retryable exception.  Any error raised by this function will *not*
            be caught.

    Returns:
        Any: the return value of the target function.

    Raises:
        google.api_core.RetryError: If the deadline is exceeded while retrying.
        ValueError: If the sleep generator stops yielding values.
        Exception: If the target raises a method that isn't retryable.
    """
    if deadline is not None:
        deadline_datetime = datetime_helpers.utcnow() + datetime.timedelta(
            seconds=deadline
        )
    else:
        deadline_datetime = None

    last_exc = None

    for sleep in sleep_generator:
        try:
            return target()

        # pylint: disable=broad-except
        # This function explicitly must deal with broad exceptions.
        except Exception as exc:
            if not predicate(exc):
                raise
            last_exc = exc
            if on_error is not None:
                on_error(exc)

        now = datetime_helpers.utcnow()

        if deadline_datetime is not None:
            if deadline_datetime <= now:
                six.raise_from(
                    exceptions.RetryError(
                        "Deadline of {:.1f}s exceeded while calling {}".format(
                            deadline, target
                        ),
                        last_exc,
                    ),
                    last_exc,
                )

.nox/py-3-7/lib/python3.7/site-packages/google/api_core/retry.py:211:


value = None, from_value = _OperationNotComplete()

???
E google.api_core.exceptions.RetryError: Deadline of 120.0s exceeded while calling functools.partial(<bound method PollingFuture._done_or_raise of <google.api_core.operation.Operation object at 0x7fe3c66fe350>>), last exception:

:3: RetryError

During handling of the above exception, another exception occurred:

capsys = <_pytest.capture.CaptureFixture object at 0x7fe3c87e6650>

def test_batch_parse_table(capsys):
    batch_parse_table_v1beta2.batch_parse_table(
        PROJECT_ID, INPUT_URI, BATCH_OUTPUT_URI, 120
    )

batch_parse_table_v1beta2_test.py:45:


batch_parse_table_v1beta2.py:92: in batch_parse_table
operation.result(timeout)
.nox/py-3-7/lib/python3.7/site-packages/google/api_core/future/polling.py:130: in result
self._blocking_poll(timeout=timeout, **kwargs)


self = <google.api_core.operation.Operation object at 0x7fe3c66fe350>
timeout = 120, retry = <google.api_core.retry.Retry object at 0x7fe3c8a362d0>

def _blocking_poll(self, timeout=None, retry=DEFAULT_RETRY):
    """Poll and wait for the Future to be resolved.

    Args:
        timeout (int):
            How long (in seconds) to wait for the operation to complete.
            If None, wait indefinitely.
    """
    if self._result_set:
        return

    retry_ = self._retry.with_deadline(timeout)

    try:
        kwargs = {} if retry is DEFAULT_RETRY else {"retry": retry}
        retry_(self._done_or_raise)(**kwargs)
    except exceptions.RetryError:
        raise concurrent.futures.TimeoutError(
            "Operation did not complete within the designated " "timeout."
        )

E concurrent.futures._base.TimeoutError: Operation did not complete within the designated timeout.

.nox/py-3-7/lib/python3.7/site-packages/google/api_core/future/polling.py:111: TimeoutError
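The trace above shows the test waiting 120 seconds on a long-running operation and the polling loop giving up once that deadline passes. As an illustration of the general pattern only, here is a small sketch of polling with exponential backoff and a deadline. It is not google.api_core's implementation; wait_until_done and its parameters are hypothetical names chosen for this example.

```python
import time
from typing import Callable


def wait_until_done(
    is_done: Callable[[], bool],
    timeout: float,
    initial_delay: float = 0.01,
    multiplier: float = 2.0,
    max_delay: float = 1.0,
) -> int:
    """Poll is_done() with exponential backoff; return the number of polls made.

    Raises TimeoutError if is_done() is still False once timeout seconds pass.
    """
    deadline = time.monotonic() + timeout
    delay = initial_delay
    polls = 0
    while True:
        polls += 1
        if is_done():
            return polls
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(
                "Operation did not complete within the designated timeout."
            )
        # Shorten the last sleep so the final poll runs near the deadline.
        time.sleep(min(delay, remaining))
        delay = min(delay * multiplier, max_delay)
```

With a pattern like this, a batch job that regularly needs longer than the deadline will always fail the wait, so the usual options are a larger timeout or a smaller input.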

How do documents get sent to HITL for manual inspection?

I recently started using Document AI with the Python SDK, and the general parsers are giving quite decent results.

I would also like to try the HITL feature, but I have not yet found a way to do it.
I've been looking through the Python SDK documentation, and a config is mentioned there, but I did not find a way to generate or use it.
https://cloud.google.com/document-ai/docs/reference/rest/v1/projects.locations.processors.humanReviewConfig/reviewDocument#path-parameters

How do I generate this humanReviewConfig, and how do I later use it with my original batch requests?

My requests look like this:
request = documentai.types.document_processor_service.BatchProcessRequest(name=name, input_documents=input_config, document_output_config=output_config, skip_human_review=False)

Then I thought this might work, but it throws an error that this is an invalid constructor input for ReviewDocumentRequest.
request = documentai.types.document_processor_service.ReviewDocumentRequest(documentai.GcsDocument(gcs_uri=f"gs://{gcs_folder}/my-document.pdf", mime_type="application/pdf"))

What would be the correct approach to use this API?

Is it possible to save the whole response document as json object?

Is your feature request related to a problem? Please describe.
I would love to save all the content of the response as a JSON object.

Describe the solution you'd like
Being able to easily save the response document object.

Describe alternatives you've considered
As an alternative, I have to iterate over all the objects in the document and get each value.

Cannot initialize documentai_v1beta2.types.Document

Hi, first of all thank you for this library. It has been very useful. However I'm having the following problem:

Is there any way to read a response generated by documentai_v1beta2.DocumentUnderstandingServiceClient().batch_process_documents(batch_request) as a documentai_v1beta2.types.Document object?

I followed the steps in https://cloud.google.com/document-ai/docs/process-tables#documentai_batch_parse_table-python for large files, and the json response is generated correctly in my bucket. However I am unable to read the response as a documentai_v1beta2.types.Document to manipulate it more easily.

Environment details

  • OS type and version:
  • Python version: 3.7.8
  • pip version: 20.1.1
  • google-cloud-documentai version: 0.2.0

Steps to reproduce

  1. Create a response for a large image as shown in https://cloud.google.com/document-ai/docs/process-tables#documentai_batch_parse_table-python . This generates a json response in my bucket. The file has all the fields corresponding to a Document json representation.

  2. Read response as json with blob and json.loads.

  3. Initialize documentai_v1beta2.types.Document object using the json object as mapping parameter.

Code example

def process_document_with_ai(ai_request, destination_uri):
    if not destination_uri:
        return DOCUMENTAI_CLIENT.process_document(
            request=ai_request
        )
    else:
        operation = DOCUMENTAI_CLIENT.batch_process_documents(
            ai_request
        )

        operation.result()

        match = re.match(r'gs://([^/]+)/(.+)', destination_uri)
        prefix = match.group(2)

        blob_list = list(bucket.list_blobs(prefix=prefix))

        return blob_list[0].download_as_string()

processed_document = image_table_extraction.process_document_with_ai(
    ai_request, destination
)

json_file = json.loads(processed_document)

doc_test = documentai.types.Document(
    mapping=json_file
)

Stack trace

KeyError                                  Traceback (most recent call last)
<ipython-input-72-56f0ad5dfc37> in <module>
      1 doc_test = documentai.types.Document(
----> 2     mapping=json_file
      3 )

/opt/conda/lib/python3.7/site-packages/proto/message.py in __init__(self, mapping, **kwargs)
    415         marshal = self._meta.marshal
    416         for key, value in mapping.items():
--> 417             pb_type = self._meta.fields[key].pb_type
    418             pb_value = marshal.to_proto(pb_type, value)
    419             if pb_value is not None:

KeyError: 'mimeType'
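The KeyError suggests that the JSON written to the bucket uses camelCase keys (mimeType), while the mapping constructor looks fields up by their snake_case names (mime_type). One possible workaround, sketched below under that assumption, is to rename the keys recursively before building the object. The helper names are hypothetical, and the sketch renames every dictionary key, so it would also alter keys that are data rather than field names. Recent proto-plus releases also expose a from_json class method on message classes, which may be the more direct route if your installed version has it.

```python
import re
from typing import Any

# Zero-width boundary between a lowercase letter or digit and an uppercase letter.
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")


def camel_to_snake(name: str) -> str:
    """Convert a camelCase key such as 'mimeType' to 'mime_type'."""
    return _CAMEL_BOUNDARY.sub("_", name).lower()


def snake_case_keys(value: Any) -> Any:
    """Recursively rename dictionary keys from camelCase to snake_case."""
    if isinstance(value, dict):
        return {camel_to_snake(k): snake_case_keys(v) for k, v in value.items()}
    if isinstance(value, list):
        return [snake_case_keys(item) for item in value]
    return value


raw = {"mimeType": "application/pdf", "pages": [{"pageNumber": 1}]}
print(snake_case_keys(raw))
```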

Normalized values

Values matching the pattern r"\d\d.0\d" are not normalized correctly: the normalized values of amount and unit price entities are parsed incorrectly when the tenths digit is 0 and the hundredths digit is nonzero. For instance, the text "18.02" is parsed with a normalized_value of "18.2" instead of "18.02".

Environment details

OS type and version:
Python version: 3.6.12
pip version: 20.2.4
google-cloud-documentai version: 0.4.0

Example

properties {
  text_anchor {
    text_segments {
      start_index: 2079
      end_index: 2084
    }
    content: "39,05"
  }
  type_: "line_item/amount"
  mention_text: "39,05"
  confidence: 0.9064523577690125
  page_anchor {
    page_refs {
      page: 1
      bounding_poly {
        normalized_vertices {
          x: 0.8506841063499451
          y: 0.7494745850563049
        }
        normalized_vertices {
          x: 0.8792385458946228
          y: 0.7494745850563049
        }
        normalized_vertices {
          x: 0.8792385458946228
          y: 0.7591425180435181
        }
        normalized_vertices {
          x: 0.8506841063499451
          y: 0.7591425180435181
        }
      }
    }
  }
  id: "12"
  normalized_value {
    text: "39.5"
    money_value {
      units: 39
      nanos: 50000000
    }
  }
}
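In the example above, money_value carries units 39 and nanos 50000000, which together represent 39.05, while the normalized text reads "39.5". Until the text field is fixed, one possible workaround is to rebuild the amount from money_value instead. The sketch below assumes money_value is populated correctly, as it is in this example; money_to_decimal is a hypothetical helper name.

```python
from decimal import Decimal


def money_to_decimal(units: int, nanos: int) -> Decimal:
    """Combine whole units and nano units (10^-9) into an exact decimal amount."""
    return Decimal(units) + Decimal(nanos) / Decimal(1_000_000_000)


amount = money_to_decimal(39, 50_000_000)
print(f"{amount:.2f}")  # prints 39.05
```

Using Decimal rather than float keeps the hundredths digit exact, which is the digit being lost in the reported output.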

Synthesis failed for python-documentai

Hello! Autosynth couldn't regenerate python-documentai. 💔

Here's the output from running synth.py:

Cloning into 'working_repo'...
Switched to branch 'autosynth'
Running synthtool
['/tmpfs/src/git/autosynth/env/bin/python3', '-m', 'synthtool', 'synth.py', '--']
synthtool > Executing /tmpfs/src/git/autosynth/working_repo/synth.py.
On branch autosynth
nothing to commit, working tree clean
HEAD detached at FETCH_HEAD
nothing to commit, working tree clean
synthtool > Ensuring dependencies.
synthtool > Pulling artman image.
latest: Pulling from googleapis/artman
Digest: sha256:6aec9c34db0e4be221cdaf6faba27bdc07cfea846808b3d3b964dfce3a9a0f9b
Status: Image is up to date for googleapis/artman:latest
synthtool > Cloning googleapis.
synthtool > Running generator for google/cloud/documentai/artman_documentai_v1beta1.yaml.
synthtool > Generated code into /home/kbuilder/.cache/synthtool/googleapis/artman-genfiles/python/documentai-v1beta1.
synthtool > Copy: /home/kbuilder/.cache/synthtool/googleapis/google/cloud/documentai/v1beta1/document_understanding.proto to /home/kbuilder/.cache/synthtool/googleapis/artman-genfiles/python/documentai-v1beta1/google/cloud/documentai_v1beta1/proto/document_understanding.proto
synthtool > Copy: /home/kbuilder/.cache/synthtool/googleapis/google/cloud/documentai/v1beta1/document.proto to /home/kbuilder/.cache/synthtool/googleapis/artman-genfiles/python/documentai-v1beta1/google/cloud/documentai_v1beta1/proto/document.proto
synthtool > Copy: /home/kbuilder/.cache/synthtool/googleapis/google/cloud/documentai/v1beta1/geometry.proto to /home/kbuilder/.cache/synthtool/googleapis/artman-genfiles/python/documentai-v1beta1/google/cloud/documentai_v1beta1/proto/geometry.proto
synthtool > Placed proto files into /home/kbuilder/.cache/synthtool/googleapis/artman-genfiles/python/documentai-v1beta1/google/cloud/documentai_v1beta1/proto.
synthtool > No replacements made in google/cloud/**/document_understanding_pb2.py for pattern \| Specifies a known document type for deeper structure
          detection\. Valid   values are currently "general" and
          "invoice"\. If not provided,   "general" \| is used as default.
          If any other value is given, the request is   rejected\., maybe replacement is not longer needed?
.coveragerc
.flake8
.github/CONTRIBUTING.md
.github/ISSUE_TEMPLATE/bug_report.md
.github/ISSUE_TEMPLATE/feature_request.md
.github/ISSUE_TEMPLATE/support_request.md
.github/PULL_REQUEST_TEMPLATE.md
.github/release-please.yml
.gitignore
.kokoro/build.sh
.kokoro/continuous/common.cfg
.kokoro/continuous/continuous.cfg
.kokoro/docs/common.cfg
.kokoro/docs/docs.cfg
.kokoro/presubmit/common.cfg
.kokoro/presubmit/presubmit.cfg
.kokoro/publish-docs.sh
.kokoro/release.sh
.kokoro/release/common.cfg
.kokoro/release/release.cfg
.kokoro/trampoline.sh
CODE_OF_CONDUCT.md
CONTRIBUTING.rst
LICENSE
MANIFEST.in
docs/_static/custom.css
docs/_templates/layout.html
docs/conf.py.j2
noxfile.py.j2
renovate.json
setup.cfg
Running session blacken
Creating virtual environment (virtualenv) using python3.6 in .nox/blacken
pip install black==19.3b0
Error: pip is not installed into the virtualenv, it is located at /tmpfs/src/git/autosynth/env/bin/pip. Pass external=True into run() to explicitly allow this.
Session blacken failed.
synthtool > Failed executing nox -s blacken:

None
synthtool > Wrote metadata to synth.metadata.
Traceback (most recent call last):
  File "/home/kbuilder/.pyenv/versions/3.6.1/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/kbuilder/.pyenv/versions/3.6.1/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/tmpfs/src/git/autosynth/env/lib/python3.6/site-packages/synthtool/__main__.py", line 102, in <module>
    main()
  File "/tmpfs/src/git/autosynth/env/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/tmpfs/src/git/autosynth/env/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/tmpfs/src/git/autosynth/env/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/tmpfs/src/git/autosynth/env/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/tmpfs/src/git/autosynth/env/lib/python3.6/site-packages/synthtool/__main__.py", line 94, in main
    spec.loader.exec_module(synth_module)  # type: ignore
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 205, in _call_with_frames_removed
  File "/tmpfs/src/git/autosynth/working_repo/synth.py", line 53, in <module>
    s.shell.run(["nox", "-s", "blacken"], hide_output=False)
  File "/tmpfs/src/git/autosynth/env/lib/python3.6/site-packages/synthtool/shell.py", line 39, in run
    raise exc
  File "/tmpfs/src/git/autosynth/env/lib/python3.6/site-packages/synthtool/shell.py", line 33, in run
    encoding="utf-8",
  File "/home/kbuilder/.pyenv/versions/3.6.1/lib/python3.6/subprocess.py", line 418, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['nox', '-s', 'blacken']' returned non-zero exit status 1.

Synthesis failed

Google internal developers can see the full log here.

Is it possible to output the results similar to an input document

I am trying to produce output in a format similar to the input document. I assume this is possible using bounding_poly and normalized_vertices in the response, but I am not yet sure how to turn the vertices into actual pixel locations. If someone has a code snippet, that would be awesome.
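A minimal sketch of one way to do the conversion, assuming normalized vertices are fractions of the page size in the range 0 to 1 and that the page width and height in pixels are known (for example from the rendered page image). The function name and the tuple representation of vertices are hypothetical choices for this example.

```python
from typing import List, Tuple


def to_pixels(
    normalized_vertices: List[Tuple[float, float]],
    image_width: int,
    image_height: int,
) -> List[Tuple[int, int]]:
    """Scale (x, y) fractions of the page size to integer pixel coordinates."""
    return [
        (round(x * image_width), round(y * image_height))
        for x, y in normalized_vertices
    ]


corners = [(0.5, 0.25), (1.0, 0.25), (1.0, 1.0), (0.5, 1.0)]
print(to_pixels(corners, image_width=200, image_height=100))
```

With the library's objects, the (x, y) pairs would come from each entry in bounding_poly.normalized_vertices.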

Synthesis failed for python-documentai

Hello! Autosynth couldn't regenerate python-documentai. 💔

Here's the output from running synth.py:

Cloning into 'working_repo'...
Switched to branch 'autosynth'
Running synthtool
['/tmpfs/src/git/autosynth/env/bin/python3', '-m', 'synthtool', 'synth.py', '--']
synthtool > Executing /tmpfs/src/git/autosynth/working_repo/synth.py.
synthtool > Ensuring dependencies.
synthtool > Pulling artman image.
latest: Pulling from googleapis/artman
Digest: sha256:6aec9c34db0e4be221cdaf6faba27bdc07cfea846808b3d3b964dfce3a9a0f9b
Status: Image is up to date for googleapis/artman:latest
synthtool > Cloning googleapis.
synthtool > Running generator for google/cloud/documentai/artman_documentai_v1beta1.yaml.
synthtool > Generated code into /home/kbuilder/.cache/synthtool/googleapis/artman-genfiles/python/documentai-v1beta1.
synthtool > Copy: /home/kbuilder/.cache/synthtool/googleapis/google/cloud/documentai/v1beta1/document_understanding.proto to /home/kbuilder/.cache/synthtool/googleapis/artman-genfiles/python/documentai-v1beta1/google/cloud/documentai_v1beta1/proto/document_understanding.proto
synthtool > Copy: /home/kbuilder/.cache/synthtool/googleapis/google/cloud/documentai/v1beta1/document.proto to /home/kbuilder/.cache/synthtool/googleapis/artman-genfiles/python/documentai-v1beta1/google/cloud/documentai_v1beta1/proto/document.proto
synthtool > Copy: /home/kbuilder/.cache/synthtool/googleapis/google/cloud/documentai/v1beta1/geometry.proto to /home/kbuilder/.cache/synthtool/googleapis/artman-genfiles/python/documentai-v1beta1/google/cloud/documentai_v1beta1/proto/geometry.proto
synthtool > Placed proto files into /home/kbuilder/.cache/synthtool/googleapis/artman-genfiles/python/documentai-v1beta1/google/cloud/documentai_v1beta1/proto.
synthtool > No replacements made in google/cloud/**/document_understanding_pb2.py for pattern \| Specifies a known document type for deeper structure
          detection\. Valid   values are currently "general" and
          "invoice"\. If not provided,   "general" \| is used as default.
          If any other value is given, the request is   rejected\., maybe replacement is not longer needed?
.coveragerc
.flake8
.github/CONTRIBUTING.md
.github/ISSUE_TEMPLATE/bug_report.md
.github/ISSUE_TEMPLATE/feature_request.md
.github/ISSUE_TEMPLATE/support_request.md
.github/PULL_REQUEST_TEMPLATE.md
.github/release-please.yml
.gitignore
.kokoro/build.sh
.kokoro/continuous/common.cfg
.kokoro/continuous/continuous.cfg
.kokoro/docs/common.cfg
.kokoro/docs/docs.cfg
.kokoro/presubmit/common.cfg
.kokoro/presubmit/presubmit.cfg
.kokoro/publish-docs.sh
.kokoro/release.sh
.kokoro/release/common.cfg
.kokoro/release/release.cfg
.kokoro/trampoline.sh
CODE_OF_CONDUCT.md
CONTRIBUTING.rst
LICENSE
MANIFEST.in
docs/_static/custom.css
docs/_templates/layout.html
docs/conf.py.j2
noxfile.py.j2
renovate.json
setup.cfg
Running session blacken
Creating virtual environment (virtualenv) using python3.6 in .nox/blacken
pip install black==19.3b0
Error: pip is not installed into the virtualenv, it is located at /tmpfs/src/git/autosynth/env/bin/pip. Pass external=True into run() to explicitly allow this.
Session blacken failed.
synthtool > Failed executing nox -s blacken:

None
synthtool > Wrote metadata to synth.metadata.
Traceback (most recent call last):
  File "/home/kbuilder/.pyenv/versions/3.6.1/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/kbuilder/.pyenv/versions/3.6.1/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/tmpfs/src/git/autosynth/env/lib/python3.6/site-packages/synthtool/__main__.py", line 99, in <module>
    main()
  File "/tmpfs/src/git/autosynth/env/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/tmpfs/src/git/autosynth/env/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/tmpfs/src/git/autosynth/env/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/tmpfs/src/git/autosynth/env/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/tmpfs/src/git/autosynth/env/lib/python3.6/site-packages/synthtool/__main__.py", line 91, in main
    spec.loader.exec_module(synth_module)  # type: ignore
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 205, in _call_with_frames_removed
  File "/tmpfs/src/git/autosynth/working_repo/synth.py", line 53, in <module>
    s.shell.run(["nox", "-s", "blacken"], hide_output=False)
  File "/tmpfs/src/git/autosynth/env/lib/python3.6/site-packages/synthtool/shell.py", line 39, in run
    raise exc
  File "/tmpfs/src/git/autosynth/env/lib/python3.6/site-packages/synthtool/shell.py", line 33, in run
    encoding="utf-8",
  File "/home/kbuilder/.pyenv/versions/3.6.1/lib/python3.6/subprocess.py", line 418, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['nox', '-s', 'blacken']' returned non-zero exit status 1.

Synthesis failed

Google internal developers can see the full log here.

Example code is parsing pages, but intends to parse paragraphs

The following code in

            # Read the text recognition output from the processor
            for page in document.pages:
                for form_field in page.form_fields:
                    field_name = get_text(form_field.field_name, document)
                    field_value = get_text(form_field.field_value, document)
                    print("Extracted key value pair:")
                    print(f"\t{field_name}, {field_value}")
                for paragraph in document.pages:
                    paragraph_text = get_text(paragraph.layout, document)
                    print(f"Paragraph text:\n{paragraph_text}") 

The intended code, which matches the "Upload Test Document" functionality, should be:

            for page in document.pages:
                for form_field in page.form_fields:
                    field_name = get_text(form_field.field_name, document)
                    field_value = get_text(form_field.field_value, document)
                    print("Extracted key value pair:")
                    print(f"\t{field_name}, {field_value}")
                for paragraph in page.paragraphs:
                    paragraph_text = get_text(paragraph.layout, document)
                    print(f"Paragraph text:\n{paragraph_text}") 

Reference: #147

cannot import name 'documentai_v1beta3' from 'google.cloud' (unknown location)

Environment details

  • OS type and version: NAME="Ubuntu" VERSION="18.04.5
  • Python version: Python 3.7.8
  • pip version: pip 20.2.4 from /opt/conda/lib/python3.7/site-packages/pip (python 3.7)
  • google-cloud-documentai version: Version: 0.3.0

Steps to reproduce

using jupyter notebook hosted in Google Cloud Notebook Instance
from google.cloud import documentai_v1beta3 as documentai
Error:
ImportError: cannot import name 'documentai_v1beta3' from 'google.cloud' (unknown location)
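One possible first check, not a confirmed diagnosis, is whether the notebook kernel actually sees the package version that pip reports, since a kernel can run in a different environment from the shell where pip was invoked. The sketch below uses importlib.metadata, which requires Python 3.8 or later (newer than the 3.7.8 listed above); installed_version is a hypothetical helper name.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Run this inside the notebook kernel that raises the ImportError.
print(installed_version("google-cloud-documentai"))
```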

Synthesis failed for python-documentai

Hello! Autosynth couldn't regenerate python-documentai. 💔

Here's the output from running synth.py:

1a2c5659828bb9b41ea3a6efa20a20fd92b121/Jinja2-2.11.2-py2.py3-none-any.whl
  Saved ./Jinja2-2.11.2-py2.py3-none-any.whl
Collecting MarkupSafe==1.1.1 (from -r /home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/external/gapic_generator_python/requirements.txt (line 5))
  Using cached https://files.pythonhosted.org/packages/b2/5f/23e0023be6bb885d00ffbefad2942bc51a620328ee910f64abe5a8d18dd1/MarkupSafe-1.1.1-cp36-cp36m-manylinux1_x86_64.whl
  Saved ./MarkupSafe-1.1.1-cp36-cp36m-manylinux1_x86_64.whl
Collecting protobuf==3.13.0 (from -r /home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/external/gapic_generator_python/requirements.txt (line 6))
  Using cached https://files.pythonhosted.org/packages/30/79/510974552cebff2ba04038544799450defe75e96ea5f1675dbf72cc8744f/protobuf-3.13.0-cp36-cp36m-manylinux1_x86_64.whl
  Saved ./protobuf-3.13.0-cp36-cp36m-manylinux1_x86_64.whl
Collecting pypandoc==1.5 (from -r /home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/external/gapic_generator_python/requirements.txt (line 7))
  Using cached https://files.pythonhosted.org/packages/d6/b7/5050dc1769c8a93d3ec7c4bd55be161991c94b8b235f88bf7c764449e708/pypandoc-1.5.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmpfs/tmp/tmppzt0oz6m/setuptools-tmp/setuptools/__init__.py", line 6, in <module>
        import distutils.core
      File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/_distutils_hack/__init__.py", line 82, in create_module
        return importlib.import_module('._distutils', 'setuptools')
      File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/importlib/__init__.py", line 126, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
    ModuleNotFoundError: No module named 'setuptools._distutils'
    
    ----------------------------------------
 (  Cache entry deserialization failed, entry ignored
Command "python setup.py egg_info" failed with error code 1 in /tmpfs/tmp/pip-build-8m_sdyys/pypandoc/
)
ERROR: no such package '@gapic_generator_python_pip_deps//': pip_import failed: Collecting click==7.1.2 (from -r /home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/external/gapic_generator_python/requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/d2/3d/fa76db83bf75c4f8d338c2fd15c8d33fdd7ad23a9b5e57eb6c5de26b430e/click-7.1.2-py2.py3-none-any.whl
  Saved ./click-7.1.2-py2.py3-none-any.whl
Collecting google-api-core==1.22.1 (from -r /home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/external/gapic_generator_python/requirements.txt (line 2))
  Using cached https://files.pythonhosted.org/packages/e0/2d/7c6c75013105e1d2b6eaa1bf18a56995be1dbc673c38885aea31136e9918/google_api_core-1.22.1-py2.py3-none-any.whl
  Saved ./google_api_core-1.22.1-py2.py3-none-any.whl
Collecting googleapis-common-protos==1.52.0 (from -r /home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/external/gapic_generator_python/requirements.txt (line 3))
  Using cached https://files.pythonhosted.org/packages/03/74/3956721ea1eb4bcf7502a311fdaa60b85bd751de4e57d1943afe9b334141/googleapis_common_protos-1.52.0-py2.py3-none-any.whl
  Saved ./googleapis_common_protos-1.52.0-py2.py3-none-any.whl
Collecting jinja2==2.11.2 (from -r /home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/external/gapic_generator_python/requirements.txt (line 4))
  Using cached https://files.pythonhosted.org/packages/30/9e/f663a2aa66a09d838042ae1a2c5659828bb9b41ea3a6efa20a20fd92b121/Jinja2-2.11.2-py2.py3-none-any.whl
  Saved ./Jinja2-2.11.2-py2.py3-none-any.whl
Collecting MarkupSafe==1.1.1 (from -r /home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/external/gapic_generator_python/requirements.txt (line 5))
  Using cached https://files.pythonhosted.org/packages/b2/5f/23e0023be6bb885d00ffbefad2942bc51a620328ee910f64abe5a8d18dd1/MarkupSafe-1.1.1-cp36-cp36m-manylinux1_x86_64.whl
  Saved ./MarkupSafe-1.1.1-cp36-cp36m-manylinux1_x86_64.whl
Collecting protobuf==3.13.0 (from -r /home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/external/gapic_generator_python/requirements.txt (line 6))
  Using cached https://files.pythonhosted.org/packages/30/79/510974552cebff2ba04038544799450defe75e96ea5f1675dbf72cc8744f/protobuf-3.13.0-cp36-cp36m-manylinux1_x86_64.whl
  Saved ./protobuf-3.13.0-cp36-cp36m-manylinux1_x86_64.whl
Collecting pypandoc==1.5 (from -r /home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/external/gapic_generator_python/requirements.txt (line 7))
  Using cached https://files.pythonhosted.org/packages/d6/b7/5050dc1769c8a93d3ec7c4bd55be161991c94b8b235f88bf7c764449e708/pypandoc-1.5.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmpfs/tmp/tmppzt0oz6m/setuptools-tmp/setuptools/__init__.py", line 6, in <module>
        import distutils.core
      File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/_distutils_hack/__init__.py", line 82, in create_module
        return importlib.import_module('._distutils', 'setuptools')
      File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/importlib/__init__.py", line 126, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
    ModuleNotFoundError: No module named 'setuptools._distutils'
    
    ----------------------------------------
 (  Cache entry deserialization failed, entry ignored
Command "python setup.py egg_info" failed with error code 1 in /tmpfs/tmp/pip-build-8m_sdyys/pypandoc/
)
INFO: Elapsed time: 2.235s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)
FAILED: Build did NOT complete successfully (0 packages loaded)

Traceback (most recent call last):
  File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/tmpfs/src/github/synthtool/synthtool/__main__.py", line 102, in <module>
    main()
  File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/tmpfs/src/github/synthtool/synthtool/__main__.py", line 94, in main
    spec.loader.exec_module(synth_module)  # type: ignore
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/kbuilder/.cache/synthtool/python-documentai/synth.py", line 32, in <module>
    bazel_target="//google/cloud/documentai/v1beta2:documentai-v1beta2-py",
  File "/tmpfs/src/github/synthtool/synthtool/gcp/gapic_bazel.py", line 46, in py_library
    return self._generate_code(service, version, "python", **kwargs)
  File "/tmpfs/src/github/synthtool/synthtool/gcp/gapic_bazel.py", line 183, in _generate_code
    shell.run(bazel_run_args)
  File "/tmpfs/src/github/synthtool/synthtool/shell.py", line 39, in run
    raise exc
  File "/tmpfs/src/github/synthtool/synthtool/shell.py", line 33, in run
    encoding="utf-8",
  File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['bazel', '--max_idle_secs=240', 'build', '//google/cloud/documentai/v1beta2:documentai-v1beta2-py']' returned non-zero exit status 1.
2020-08-31 05:18:13,457 autosynth [ERROR] > Synthesis failed
2020-08-31 05:18:13,457 autosynth [DEBUG] > Running: git reset --hard HEAD
HEAD is now at ea83083 feat: add async client (#26)
2020-08-31 05:18:13,462 autosynth [DEBUG] > Running: git checkout autosynth
Switched to branch 'autosynth'
2020-08-31 05:18:13,467 autosynth [DEBUG] > Running: git clean -fdx
Removing __pycache__/
Traceback (most recent call last):
  File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 690, in <module>
    main()
  File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 539, in main
    return _inner_main(temp_dir)
  File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 670, in _inner_main
    commit_count = synthesize_loop(x, multiple_prs, change_pusher, synthesizer)
  File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 375, in synthesize_loop
    has_changes = toolbox.synthesize_version_in_new_branch(synthesizer, youngest)
  File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 273, in synthesize_version_in_new_branch
    synthesizer.synthesize(synth_log_path, self.environ)
  File "/tmpfs/src/github/synthtool/autosynth/synthesizer.py", line 120, in synthesize
    synth_proc.check_returncode()  # Raise an exception.
  File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/subprocess.py", line 389, in check_returncode
    self.stderr)
subprocess.CalledProcessError: Command '['/tmpfs/src/github/synthtool/env/bin/python3', '-m', 'synthtool', '--metadata', 'synth.metadata', 'synth.py', '--']' returned non-zero exit status 1.

Google internal developers can see the full log here.

docs(samples): add processing samples for each processor category

This is an issue to add samples to cover each category of Document AI's API processing responses:

  • OCR
  • form parsing
  • quality
  • splitter
  • specialized

Currently, only the form processor response is covered, and that sample includes only generic information on how to parse paragraphs. Each of these samples should cover the aspects that are unique to its category,
along with the general information that most or all developers will need in order to parse and use the response.

This will give developers a starting point for each processor type that can be quickly and easily adapted to their use case.

Running Google Cloud DocumentAI sample code on Python returned the error 503

Environment details

  • OS type and version: Ubuntu 20.04 (WSL on MS Windows 10)
  • Python version: 3.8.10
  • pip version: 21.1.3
  • google-cloud-documentai version: 1.0.0

Steps to reproduce

  1. export GOOGLE_APPLICATION_CREDENTIALS="/mnt/c/workspace/document_AI/google_ai.json"
  2. Put the following Py file and document.jpg into the same folder.
  3. Run python test_document_ai.py

Code example

from google.cloud import documentai_v1 as documentai
import os

# TODO(developer): Uncomment these variables before running the sample.

project_id= '123456789'
location = 'us' # Format is 'us' or 'eu'
processor_id = '1a23345gh823892' #  Create processor in Cloud Console
file_path = 'document.jpg'

os.environ['GRPC_DNS_RESOLVER'] = 'native'


def quickstart(project_id: str, location: str, processor_id: str, file_path: str):

    # You must set the api_endpoint if you use a location other than 'us', e.g.:
    opts = {}
    if location == "eu":
        opts = {"api_endpoint": "eu-documentai.googleapis.com"}

    client = documentai.DocumentProcessorServiceClient(client_options=opts)

    # The full resource name of the processor, e.g.:
    # projects/project-id/locations/location/processor/processor-id
    # You must create new processors in the Cloud Console first
    name = f"projects/{project_id}/locations/{location}/processors/{processor_id}:process"

    # Read the file into memory
    with open(file_path, "rb") as image:
        image_content = image.read()

    document = {"content": image_content, "mime_type": "image/jpeg"}

    # Configure the process request
    request = {"name": name, "raw_document": document}

    result = client.process_document(request=request)
    document = result.document

    document_pages = document.pages

    # For a full list of Document object attributes, please reference this page: https://googleapis.dev/python/documentai/latest/_modules/google/cloud/documentai_v1beta3/types/document.html#Document

    # Read the text recognition output from the processor
    print("The document contains the following paragraphs:")
    for page in document_pages:
        paragraphs = page.paragraphs
        for paragraph in paragraphs:
            print(paragraph)
            paragraph_text = get_text(paragraph.layout, document)
            print(f"Paragraph text: {paragraph_text}")


def get_text(doc_element: dict, document: dict):
    """
    Document AI identifies form fields by their offsets
    in document text. This function converts offsets
    to text snippets.
    """
    response = ""
    # If a text segment spans several lines, it will
    # be stored in different text segments.
    for segment in doc_element.text_anchor.text_segments:
        start_index = (
            int(segment.start_index)
            if segment in doc_element.text_anchor.text_segments
            else 0
        )
        end_index = int(segment.end_index)
        response += document.text[start_index:end_index]
    return response


def main():
    quickstart(project_id=project_id, location=location, processor_id=processor_id, file_path=file_path)


if __name__ == '__main__':
    main()

Stack trace

metadata=[('x-goog-request-params', 'name=projects/my_proj_id/locations/us/processors/my_processor_id'), ('x-goog-api-client', 'gl-python/3.8.10 grpc/1.38.1 gax/1.30.0 gapic/1.0.0')]), last exception: 503 DNS resolution failed for service: https://us-documentai.googleapis.com/v1/

I can use the DocumentAI service using the web interface, so I assume that there is something wrong with the local Python code?
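
The `get_text` helper in the code example relies on the idea that each layout element points into the document's full text through start and end offsets. The sketch below illustrates that idea in isolation, without the client library. `TextSegment` and `TextAnchor` here are hypothetical stand-in dataclasses that mimic only the fields used above, not the library's own types, and the sample text is made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TextSegment:
    """Stand-in for one segment: a half-open [start_index, end_index) range."""
    start_index: int = 0
    end_index: int = 0


@dataclass
class TextAnchor:
    """Stand-in for a text anchor: an ordered list of segments."""
    text_segments: List[TextSegment] = field(default_factory=list)


def text_from_anchor(anchor: TextAnchor, full_text: str) -> str:
    """Join the slices of `full_text` that the anchor's segments point at."""
    return "".join(
        full_text[int(seg.start_index):int(seg.end_index)]
        for seg in anchor.text_segments
    )


if __name__ == "__main__":
    full_text = "Name: Ada Lovelace\nCity: London\n"
    # Two segments: one per value, skipping the labels and newlines.
    anchor = TextAnchor([TextSegment(6, 18), TextSegment(25, 31)])
    print(repr(text_from_anchor(anchor, full_text)))
```

A segment with no explicit start defaults to offset 0 in this sketch, which is the case the conditional in `get_text` appears to be guarding against.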

Can't specify latest model for form extraction

Hello,

Environment details

  • Python version: Python 3.8.2
  • pip version: pip 20.0.2
  • google-cloud-documentai version: 0.2.0

Code example

form_extraction_params = documentai.types.FormExtractionParams(
        enabled=True, key_value_pair_hints=key_value_pair_hints, model_version='builtin/latest')

Stack trace

google.api_core.exceptions.InvalidArgument: 400 Invalid form model version. Must either start with custom/ or gs://

Thanks
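
Judging only from the error message above, the service accepts model versions that start with `custom/` or `gs://`. A client-side pre-check can surface the problem before a request is sent. This is a minimal sketch: the helper name `has_accepted_prefix` is hypothetical, the prefix list is taken from the error text rather than from any API documentation, and `custom/my-model` is a made-up value.

```python
# Prefixes copied from the 400 error message; treat them as an assumption.
ACCEPTED_PREFIXES = ("custom/", "gs://")


def has_accepted_prefix(model_version: str) -> bool:
    """Return True if `model_version` starts with one of the accepted prefixes."""
    return model_version.startswith(ACCEPTED_PREFIXES)


if __name__ == "__main__":
    for candidate in ("builtin/latest", "custom/my-model"):
        print(candidate, "->", has_accepted_prefix(candidate))
```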

samples.snippets.batch_process_documents_sample_v1beta3_test: test_batch_process_documents failed

This test failed!

To configure my behavior, see the Flaky Bot documentation.

If I'm commenting on this issue too often, add the flakybot: quiet label and
I will stop commenting.


commit: 30008b4
buildURL: Build Status, Sponge
status: failed

Test output
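target = functools.partial(<bound method PollingFuture._done_or_raise of <google.api_core.operation.Operation object at 0x7f45e5dec0a0>>)
predicate = <function if_exception_type.<locals>.if_exception_type_predicate at 0x7f45e88e1e50>
sleep_generator = <generator object exponential_sleep_generator at 0x7f45e5d8f740>
deadline = 300, on_error = None
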
def retry_target(target, predicate, sleep_generator, deadline, on_error=None):
    """Call a function and retry if it fails.

    This is the lowest-level retry helper. Generally, you'll use the
    higher-level retry helper :class:`Retry`.

    Args:
        target(Callable): The function to call and retry. This must be a
            nullary function - apply arguments with `functools.partial`.
        predicate (Callable[Exception]): A callable used to determine if an
            exception raised by the target should be considered retryable.
            It should return True to retry or False otherwise.
        sleep_generator (Iterable[float]): An infinite iterator that determines
            how long to sleep between retries.
        deadline (float): How long to keep retrying the target. The last sleep
            period is shortened as necessary, so that the last retry runs at
            ``deadline`` (and not considerably beyond it).
        on_error (Callable[Exception]): A function to call while processing a
            retryable exception.  Any error raised by this function will *not*
            be caught.

    Returns:
        Any: the return value of the target function.

    Raises:
        google.api_core.RetryError: If the deadline is exceeded while retrying.
        ValueError: If the sleep generator stops yielding values.
        Exception: If the target raises a method that isn't retryable.
    """
    if deadline is not None:
        deadline_datetime = datetime_helpers.utcnow() + datetime.timedelta(
            seconds=deadline
        )
    else:
        deadline_datetime = None

    last_exc = None

    for sleep in sleep_generator:
        try:
            return target()

.nox/py-3-8/lib/python3.8/site-packages/google/api_core/retry.py:184:


self = <google.api_core.operation.Operation object at 0x7f45e5dec0a0>
retry = <google.api_core.retry.Retry object at 0x7f45e88e7940>

def _done_or_raise(self, retry=DEFAULT_RETRY):
    """Check if the future is done and raise if it's not."""
    kwargs = {} if retry is DEFAULT_RETRY else {"retry": retry}

    if not self.done(**kwargs):
        raise _OperationNotComplete()

E google.api_core.future.polling._OperationNotComplete

.nox/py-3-8/lib/python3.8/site-packages/google/api_core/future/polling.py:86: _OperationNotComplete

The above exception was the direct cause of the following exception:

self = <google.api_core.operation.Operation object at 0x7f45e5dec0a0>
timeout = 300, retry = <google.api_core.retry.Retry object at 0x7f45e88e7940>

def _blocking_poll(self, timeout=None, retry=DEFAULT_RETRY):
    """Poll and wait for the Future to be resolved.

    Args:
        timeout (int):
            How long (in seconds) to wait for the operation to complete.
            If None, wait indefinitely.
    """
    if self._result_set:
        return

    retry_ = self._retry.with_deadline(timeout)

    try:
        kwargs = {} if retry is DEFAULT_RETRY else {"retry": retry}
        retry_(self._done_or_raise)(**kwargs)

.nox/py-3-8/lib/python3.8/site-packages/google/api_core/future/polling.py:107:


args = (), kwargs = {}
target = functools.partial(<bound method PollingFuture._done_or_raise of <google.api_core.operation.Operation object at 0x7f45e5dec0a0>>)
sleep_generator = <generator object exponential_sleep_generator at 0x7f45e5d8f740>

@general_helpers.wraps(func)
def retry_wrapped_func(*args, **kwargs):
    """A wrapper that calls target function with retry."""
    target = functools.partial(func, *args, **kwargs)
    sleep_generator = exponential_sleep_generator(
        self._initial, self._maximum, multiplier=self._multiplier
    )
    return retry_target(
        target,
        self._predicate,
        sleep_generator,
        self._deadline,
        on_error=on_error,
    )

.nox/py-3-8/lib/python3.8/site-packages/google/api_core/retry.py:281:


target = functools.partial(<bound method PollingFuture._done_or_raise of <google.api_core.operation.Operation object at 0x7f45e5dec0a0>>)
predicate = <function if_exception_type.<locals>.if_exception_type_predicate at 0x7f45e88e1e50>
sleep_generator = <generator object exponential_sleep_generator at 0x7f45e5d8f740>
deadline = 300, on_error = None

def retry_target(target, predicate, sleep_generator, deadline, on_error=None):
    """Call a function and retry if it fails.

    This is the lowest-level retry helper. Generally, you'll use the
    higher-level retry helper :class:`Retry`.

    Args:
        target(Callable): The function to call and retry. This must be a
            nullary function - apply arguments with `functools.partial`.
        predicate (Callable[Exception]): A callable used to determine if an
            exception raised by the target should be considered retryable.
            It should return True to retry or False otherwise.
        sleep_generator (Iterable[float]): An infinite iterator that determines
            how long to sleep between retries.
        deadline (float): How long to keep retrying the target. The last sleep
            period is shortened as necessary, so that the last retry runs at
            ``deadline`` (and not considerably beyond it).
        on_error (Callable[Exception]): A function to call while processing a
            retryable exception.  Any error raised by this function will *not*
            be caught.

    Returns:
        Any: the return value of the target function.

    Raises:
        google.api_core.RetryError: If the deadline is exceeded while retrying.
        ValueError: If the sleep generator stops yielding values.
        Exception: If the target raises a method that isn't retryable.
    """
    if deadline is not None:
        deadline_datetime = datetime_helpers.utcnow() + datetime.timedelta(
            seconds=deadline
        )
    else:
        deadline_datetime = None

    last_exc = None

    for sleep in sleep_generator:
        try:
            return target()

        # pylint: disable=broad-except
        # This function explicitly must deal with broad exceptions.
        except Exception as exc:
            if not predicate(exc):
                raise
            last_exc = exc
            if on_error is not None:
                on_error(exc)

        now = datetime_helpers.utcnow()

        if deadline_datetime is not None:
            if deadline_datetime <= now:
                six.raise_from(
                    exceptions.RetryError(
                        "Deadline of {:.1f}s exceeded while calling {}".format(
                            deadline, target
                        ),
                        last_exc,
                    ),
                    last_exc,
                )

.nox/py-3-8/lib/python3.8/site-packages/google/api_core/retry.py:199:


value = None, from_value = _OperationNotComplete()

???
E google.api_core.exceptions.RetryError: Deadline of 300.0s exceeded while calling functools.partial(<bound method PollingFuture._done_or_raise of <google.api_core.operation.Operation object at 0x7f45e5dec0a0>>), last exception:

<string>:3: RetryError

During handling of the above exception, another exception occurred:

capsys = <_pytest.capture.CaptureFixture object at 0x7f45e5d89a90>
test_bucket = 'document-ai-python-b9edc129-1f9c-4575-94fa-1d80729c9f79'

def test_batch_process_documents(capsys, test_bucket):
    batch_process_documents_sample_v1beta3.batch_process_documents(
        project_id=project_id,
        location=location,
        processor_id=processor_id,
        gcs_input_uri=gcs_input_uri,
        gcs_output_uri=f"gs://{test_bucket}",
        gcs_output_uri_prefix=gcs_output_uri_prefix,
    )

batch_process_documents_sample_v1beta3_test.py:50:


batch_process_documents_sample_v1beta3.py:72: in batch_process_documents
operation.result(timeout=timeout)
.nox/py-3-8/lib/python3.8/site-packages/google/api_core/future/polling.py:129: in result
self._blocking_poll(timeout=timeout, **kwargs)


self = <google.api_core.operation.Operation object at 0x7f45e5dec0a0>
timeout = 300, retry = <google.api_core.retry.Retry object at 0x7f45e88e7940>

def _blocking_poll(self, timeout=None, retry=DEFAULT_RETRY):
    """Poll and wait for the Future to be resolved.

    Args:
        timeout (int):
            How long (in seconds) to wait for the operation to complete.
            If None, wait indefinitely.
    """
    if self._result_set:
        return

    retry_ = self._retry.with_deadline(timeout)

    try:
        kwargs = {} if retry is DEFAULT_RETRY else {"retry": retry}
        retry_(self._done_or_raise)(**kwargs)
    except exceptions.RetryError:
        raise concurrent.futures.TimeoutError(
            "Operation did not complete within the designated " "timeout."
        )

E concurrent.futures._base.TimeoutError: Operation did not complete within the designated timeout.

.nox/py-3-8/lib/python3.8/site-packages/google/api_core/future/polling.py:109: TimeoutError

DocAI: transient error not retried

Using v1beta3 with client library v0.4.0, process_document() (the synchronous version) does not retry when it hits the transient error RESOURCE_EXHAUSTED.

We have a Cloud Function that processes PDF files as they are copied to GCS. Depending on the load, we sometimes (unsurprisingly) hit the quota limit: Quota exceeded for quota metric 'Number of online process document requests using document processor' and limit 'Number of online process document requests using document processor per minute' of service 'documentai.googleapis.com' for consumer 'project_number:xxxx'.

Such an "error" is retriable (gRPC status 8) and should be retried by default. However, it is not retried, and the function stops with:
[image]

I don't have a snippet that reproduces this without GCF, but a simple loop (async?) that calls the API at a high rate should trigger the quota error.
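
Until a transient quota error is retried by default, a caller can wrap the request in its own retry loop. The sketch below shows generic exponential backoff with jitter using only the standard library. It is not the client library's retry mechanism: `QuotaExceeded` is a hypothetical stand-in for the RESOURCE_EXHAUSTED error, and `call_with_backoff` and its parameters are illustrative names.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class QuotaExceeded(Exception):
    """Hypothetical stand-in for a transient RESOURCE_EXHAUSTED error."""


def call_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 5,
    initial_delay: float = 1.0,
    multiplier: float = 2.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `func`, retrying on QuotaExceeded with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except QuotaExceeded:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error.
            # Sleep a random amount up to the current delay ("full jitter"),
            # then grow the delay for the next attempt, capped at max_delay.
            sleep(random.uniform(0, delay))
            delay = min(delay * multiplier, max_delay)
    raise AssertionError("unreachable")


if __name__ == "__main__":
    attempts = {"count": 0}

    def flaky_request() -> str:
        # Fails twice with the stand-in quota error, then succeeds.
        attempts["count"] += 1
        if attempts["count"] < 3:
            raise QuotaExceeded("quota exceeded")
        return "processed"

    print(call_with_backoff(flaky_request, initial_delay=0.01))
```

In a real function, `func` would be a closure around the `process_document` call, and the `except` clause would catch whatever exception type the client raises for the quota error.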

Release as GA

GA release template

Required

  • 28 days elapsed since last beta release with new API surface
  • Server API is GA
  • Package API is stable, and we can commit to backward compatibility
  • All dependencies are GA

Synthesis failed for python-documentai

Hello! Autosynth couldn't regenerate python-documentai. 💔

Here's the output from running synth.py:

_core.tp_print = 0;
                                                                                    ^~~~~~~~
                                                                                    tp_dict
bazel-out/host/bin/external/com_github_grpc_grpc/src/python/grpcio/grpc/_cython/cygrpc.cpp:132284:72: error: 'PyTypeObject {aka struct _typeobject}' has no member named 'tp_print'; did you mean 'tp_dict'?
   __pyx_type_7_cython_6cygrpc___pyx_scope_struct_55__schedule_rpc_coro.tp_print = 0;
                                                                        ^~~~~~~~
                                                                        tp_dict
bazel-out/host/bin/external/com_github_grpc_grpc/src/python/grpcio/grpc/_cython/cygrpc.cpp:132290:65: error: 'PyTypeObject {aka struct _typeobject}' has no member named 'tp_print'; did you mean 'tp_dict'?
   __pyx_type_7_cython_6cygrpc___pyx_scope_struct_56__handle_rpc.tp_print = 0;
                                                                 ^~~~~~~~
                                                                 tp_dict
bazel-out/host/bin/external/com_github_grpc_grpc/src/python/grpcio/grpc/_cython/cygrpc.cpp:132296:67: error: 'PyTypeObject {aka struct _typeobject}' has no member named 'tp_print'; did you mean 'tp_dict'?
   __pyx_type_7_cython_6cygrpc___pyx_scope_struct_57__request_call.tp_print = 0;
                                                                   ^~~~~~~~
                                                                   tp_dict
bazel-out/host/bin/external/com_github_grpc_grpc/src/python/grpcio/grpc/_cython/cygrpc.cpp:132302:71: error: 'PyTypeObject {aka struct _typeobject}' has no member named 'tp_print'; did you mean 'tp_dict'?
   __pyx_type_7_cython_6cygrpc___pyx_scope_struct_58__server_main_loop.tp_print = 0;
                                                                       ^~~~~~~~
                                                                       tp_dict
bazel-out/host/bin/external/com_github_grpc_grpc/src/python/grpcio/grpc/_cython/cygrpc.cpp:132308:59: error: 'PyTypeObject {aka struct _typeobject}' has no member named 'tp_print'; did you mean 'tp_dict'?
   __pyx_type_7_cython_6cygrpc___pyx_scope_struct_59_start.tp_print = 0;
                                                           ^~~~~~~~
                                                           tp_dict
bazel-out/host/bin/external/com_github_grpc_grpc/src/python/grpcio/grpc/_cython/cygrpc.cpp:132314:74: error: 'PyTypeObject {aka struct _typeobject}' has no member named 'tp_print'; did you mean 'tp_dict'?
   __pyx_type_7_cython_6cygrpc___pyx_scope_struct_60__start_shutting_down.tp_print = 0;
                                                                          ^~~~~~~~
                                                                          tp_dict
bazel-out/host/bin/external/com_github_grpc_grpc/src/python/grpcio/grpc/_cython/cygrpc.cpp:132320:62: error: 'PyTypeObject {aka struct _typeobject}' has no member named 'tp_print'; did you mean 'tp_dict'?
   __pyx_type_7_cython_6cygrpc___pyx_scope_struct_61_shutdown.tp_print = 0;
                                                              ^~~~~~~~
                                                              tp_dict
bazel-out/host/bin/external/com_github_grpc_grpc/src/python/grpcio/grpc/_cython/cygrpc.cpp:132326:74: error: 'PyTypeObject {aka struct _typeobject}' has no member named 'tp_print'; did you mean 'tp_dict'?
   __pyx_type_7_cython_6cygrpc___pyx_scope_struct_62_wait_for_termination.tp_print = 0;
                                                                          ^~~~~~~~
                                                                          tp_dict
bazel-out/host/bin/external/com_github_grpc_grpc/src/python/grpcio/grpc/_cython/cygrpc.cpp: In function 'PyObject* __Pyx_decode_c_bytes(const char*, Py_ssize_t, Py_ssize_t, Py_ssize_t, const char*, const char*, PyObject* (*)(const char*, Py_ssize_t, const char*))':
bazel-out/host/bin/external/com_github_grpc_grpc/src/python/grpcio/grpc/_cython/cygrpc.cpp:136866:45: warning: 'PyObject* PyUnicode_FromUnicode(const Py_UNICODE*, Py_ssize_t)' is deprecated [-Wdeprecated-declarations]
         return PyUnicode_FromUnicode(NULL, 0);
                                             ^
In file included from bazel-out/host/bin/external/local_config_python/_python3/_python3_include/unicodeobject.h:1026:0,
                 from bazel-out/host/bin/external/local_config_python/_python3/_python3_include/Python.h:97,
                 from bazel-out/host/bin/external/com_github_grpc_grpc/src/python/grpcio/grpc/_cython/cygrpc.cpp:4:
bazel-out/host/bin/external/local_config_python/_python3/_python3_include/cpython/unicodeobject.h:551:42: note: declared here
 Py_DEPRECATED(3.3) PyAPI_FUNC(PyObject*) PyUnicode_FromUnicode(
                                          ^~~~~~~~~~~~~~~~~~~~~
bazel-out/host/bin/external/com_github_grpc_grpc/src/python/grpcio/grpc/_cython/cygrpc.cpp: In function 'void __pyx_f_7_cython_6cygrpc__unified_socket_write(int)':
bazel-out/host/bin/external/com_github_grpc_grpc/src/python/grpcio/grpc/_cython/cygrpc.cpp:72692:3: warning: ignoring return value of 'ssize_t write(int, const void*, size_t)', declared with attribute warn_unused_result [-Wunused-result]
   (void)(write(__pyx_v_fd, ((char *)"1"), 1));
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
bazel-out/host/bin/external/com_github_grpc_grpc/src/python/grpcio/grpc/_cython/cygrpc.cpp: At global scope:
bazel-out/host/bin/external/com_github_grpc_grpc/src/python/grpcio/grpc/_cython/cygrpc.cpp:144607:1: warning: 'void __Pyx_PyAsyncGen_Fini()' defined but not used [-Wunused-function]
 __Pyx_PyAsyncGen_Fini(void)
 ^~~~~~~~~~~~~~~~~~~~~
Target //google/cloud/documentai/v1beta2:documentai-v1beta2-py failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 4.113s, Critical Path: 3.84s
INFO: 9 processes: 9 linux-sandbox.
FAILED: Build did NOT complete successfully
FAILED: Build did NOT complete successfully

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/tmpfs/src/github/synthtool/synthtool/__main__.py", line 102, in <module>
    main()
  File "/tmpfs/src/github/synthtool/env/lib/python3.9/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/tmpfs/src/github/synthtool/env/lib/python3.9/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/tmpfs/src/github/synthtool/env/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/tmpfs/src/github/synthtool/env/lib/python3.9/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/tmpfs/src/github/synthtool/synthtool/__main__.py", line 94, in main
    spec.loader.exec_module(synth_module)  # type: ignore
  File "<frozen importlib._bootstrap_external>", line 790, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/root/.cache/synthtool/python-documentai/synth.py", line 35, in <module>
    library = gapic.py_library(
  File "/tmpfs/src/github/synthtool/synthtool/gcp/gapic_bazel.py", line 45, in py_library
    return self._generate_code(service, version, "python", **kwargs)
  File "/tmpfs/src/github/synthtool/synthtool/gcp/gapic_bazel.py", line 182, in _generate_code
    shell.run(bazel_run_args)
  File "/tmpfs/src/github/synthtool/synthtool/shell.py", line 39, in run
    raise exc
  File "/tmpfs/src/github/synthtool/synthtool/shell.py", line 27, in run
    return subprocess.run(
  File "/usr/local/lib/python3.9/subprocess.py", line 524, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['bazel', '--max_idle_secs=240', 'build', '//google/cloud/documentai/v1beta2:documentai-v1beta2-py']' returned non-zero exit status 1.
2020-12-05 03:06:32,936 autosynth [ERROR] > Synthesis failed
2020-12-05 03:06:32,937 autosynth [DEBUG] > Running: git reset --hard HEAD
HEAD is now at bf3aba3 samples(fix): change comments to match function signature (#68)
2020-12-05 03:06:32,942 autosynth [DEBUG] > Running: git checkout autosynth
Switched to branch 'autosynth'
2020-12-05 03:06:32,946 autosynth [DEBUG] > Running: git clean -fdx
Removing __pycache__/
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 354, in <module>
    main()
  File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 189, in main
    return _inner_main(temp_dir)
  File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 334, in _inner_main
    commit_count = synthesize_loop(x, multiple_prs, change_pusher, synthesizer)
  File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 65, in synthesize_loop
    has_changes = toolbox.synthesize_version_in_new_branch(synthesizer, youngest)
  File "/tmpfs/src/github/synthtool/autosynth/synth_toolbox.py", line 259, in synthesize_version_in_new_branch
    synthesizer.synthesize(synth_log_path, self.environ)
  File "/tmpfs/src/github/synthtool/autosynth/synthesizer.py", line 120, in synthesize
    synth_proc.check_returncode()  # Raise an exception.
  File "/usr/local/lib/python3.9/subprocess.py", line 456, in check_returncode
    raise CalledProcessError(self.returncode, self.args, self.stdout,
subprocess.CalledProcessError: Command '['/tmpfs/src/github/synthtool/env/bin/python3', '-m', 'synthtool', '--metadata', 'synth.metadata', 'synth.py', '--']' returned non-zero exit status 1.

Google internal developers can see the full log here.

ResourceExhausted

I get a ResourceExhausted error for some PDF files. This is surprising because, for instance, one of them is only 177.9 KB on disk (237.14 KB as a base64 string). The error says "ResourceExhausted: 429 Received message larger than max (5218782 vs. 4194304)".

Environment details

  • OS type and version:
  • Python version: 3.6.12
  • pip version: 20.2.4
  • google-cloud-documentai version: 0.3.0

Trace

_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.RESOURCE_EXHAUSTED
	details = "Received message larger than max (5218782 vs. 4194304)"
	debug_error_string = "{"created":"@1612458152.165208175","description":"Received message larger than max (5218782 vs. 4194304)","file":"src/core/ext/filters/message_size/message_size_filter.cc","file_line":204,"grpc_status":8}"
>
  File "/usr/local/lib/python3.6/site-packages/google/api_core/grpc_helpers.py", line 57, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "grpc/_channel.py", line 923, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "grpc/_channel.py", line 826, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
ResourceExhausted: 429 Received message larger than max (5218782 vs. 4194304)
    result = client.process_document(request=request, timeout=DOCUMENTAI_TIMEOUT)
  File "/usr/local/lib/python3.6/site-packages/google/cloud/documentai_v1beta3/services/document_processor_service/client.py", line 327, in process_document
    response = rpc(request, retry=retry, timeout=timeout, metadata=metadata,)
  File "/usr/local/lib/python3.6/site-packages/google/api_core/gapic_v1/method.py", line 145, in __call__
    return wrapped_func(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/google/api_core/retry.py", line 286, in retry_wrapped_func
    on_error=on_error,
  File "/usr/local/lib/python3.6/site-packages/google/api_core/retry.py", line 184, in retry_target
    return target()
  File "/usr/local/lib/python3.6/site-packages/google/api_core/timeout.py", line 102, in func_with_timeout
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/google/api_core/grpc_helpers.py", line 59, in error_remapped_callable
    six.raise_from(exceptions.from_grpc_error(exc), exc)
  File "<string>", line 3, in raise_from
    # Permission is hereby granted, free of charge, to any person obtaining a copy

Code example

# Import path assumed from the traceback above (documentai_v1beta3).
from google.cloud import documentai_v1beta3 as documentai

client = documentai.DocumentProcessorServiceClient()
# Size of the base64-encoded content
print(len(content.encode("utf-8")))  # 237140 bytes
request = documentai.types.ProcessRequest(
    name=name,
    document=documentai.types.Document(content=content, mime_type=mime_type),
)
result = client.process_document(request=request, timeout=DOCUMENTAI_TIMEOUT)
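
The numbers in this report can be checked with a little arithmetic. The limit 4194304 is exactly 4 MiB, which matches gRPC's default maximum receive message size, and the wording "Received message" suggests that the oversized payload may be the response rather than the uploaded PDF. The sketch below is not part of the original report; the helper names `base64_size`, `approx_raw_size`, and `parse_size_error` are hypothetical and only illustrate the arithmetic.

```python
import base64
import re

# 4 MiB, the limit quoted in the error message.
GRPC_DEFAULT_MAX_RECEIVE_BYTES = 4 * 1024 * 1024


def base64_size(raw_size: int) -> int:
    """Length of the padded base64 encoding of `raw_size` bytes."""
    return 4 * ((raw_size + 2) // 3)


def approx_raw_size(base64_length: int) -> int:
    """Approximate number of raw bytes behind a base64 string (ignores padding)."""
    return base64_length * 3 // 4


def parse_size_error(message: str):
    """Extract (actual, limit) from a 'larger than max (A vs. B)' message."""
    match = re.search(r"larger than max \((\d+) vs\. (\d+)\)", message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


message = "Received message larger than max (5218782 vs. 4194304)"
actual, limit = parse_size_error(message)
overshoot = actual - limit

# The 237140-byte base64 string reported above corresponds to roughly
# 177855 raw bytes, which agrees with the 177.9 KB file size on disk.
raw_estimate = approx_raw_size(237140)

print(limit == GRPC_DEFAULT_MAX_RECEIVE_BYTES, overshoot, raw_estimate)
```

If the response is indeed what exceeds the limit, the size of the input file would not be the deciding factor, which would be consistent with a 177.9 KB file triggering the error.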

JSON Response difference between Node Client Library and try out feature of document AI

I am calling the Document AI v1beta2 client library to parse the tables inside a document.
I get a JSON result object back, but it does not contain table headers for all of the tables in the document. It shows the header for the first table only, and the data for all the other tables comes back under body rows.

I also tried the try-out feature in the official Document AI documentation, and it returns the correct results, with the data structured as required.

Can anyone please assist with this?
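
One way to compare the two responses is to count header and body rows per table in each JSON result. The sketch below assumes the response has been saved as JSON with the `pages`, `tables`, `headerRows`, and `bodyRows` fields of a Document AI `Document`; the sample dictionary and the helper names are hypothetical and stand in for a real response.

```python
def summarize_tables(document: dict):
    """Return (page_index, table_index, header_row_count, body_row_count) tuples."""
    summary = []
    for page_index, page in enumerate(document.get("pages", [])):
        for table_index, table in enumerate(page.get("tables", [])):
            summary.append(
                (
                    page_index,
                    table_index,
                    len(table.get("headerRows", [])),
                    len(table.get("bodyRows", [])),
                )
            )
    return summary


def tables_missing_headers(document: dict):
    """List (page_index, table_index) for every table without header rows."""
    return [
        (page, table)
        for page, table, headers, _ in summarize_tables(document)
        if headers == 0
    ]


# Hypothetical response: the second table has lost its header row.
sample_response = {
    "pages": [
        {
            "tables": [
                {"headerRows": [{}], "bodyRows": [{}, {}]},
                {"bodyRows": [{}, {}, {}]},
            ]
        }
    ]
}

summary = summarize_tables(sample_response)
missing = tables_missing_headers(sample_response)
print(summary, missing)
```

Running the same summary over the client library output and the try-out output would show exactly which tables differ.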

samples.snippets.batch_parse_form_v1beta2_test: test_batch_parse_form failed

Note: #138 was also for this test, but it was closed more than 10 days ago. So, I didn't mark it flaky.


commit: 35e3b74
buildURL: Build Status, Sponge
status: failed

Test output
capsys = <_pytest.capture.CaptureFixture object at 0x7f7c25e3ad30>
def test_batch_parse_form(capsys):
    batch_parse_form_v1beta2.batch_parse_form(
        PROJECT_ID, INPUT_URI, BATCH_OUTPUT_URI, 120
    )

batch_parse_form_v1beta2_test.py:44:


batch_parse_form_v1beta2.py:93: in batch_parse_form
bucket = storage_client.get_bucket(output_bucket)
.nox/py-3-8/lib/python3.8/site-packages/google/cloud/storage/client.py:402: in get_bucket
bucket.reload(
.nox/py-3-8/lib/python3.8/site-packages/google/cloud/storage/bucket.py:1001: in reload
super(Bucket, self).reload(
.nox/py-3-8/lib/python3.8/site-packages/google/cloud/storage/_helpers.py:218: in reload
api_response = client._connection.api_request(
.nox/py-3-8/lib/python3.8/site-packages/google/cloud/storage/_http.py:78: in api_request
return call()
.nox/py-3-8/lib/python3.8/site-packages/google/api_core/retry.py:285: in retry_wrapped_func
return retry_target(
.nox/py-3-8/lib/python3.8/site-packages/google/api_core/retry.py:188: in retry_target
return target()
.nox/py-3-8/lib/python3.8/site-packages/google/cloud/_http.py:473: in api_request
response = self._make_request(
.nox/py-3-8/lib/python3.8/site-packages/google/cloud/_http.py:337: in _make_request
return self._do_request(
.nox/py-3-8/lib/python3.8/site-packages/google/cloud/_http.py:375: in _do_request
return self.http.request(
.nox/py-3-8/lib/python3.8/site-packages/google/auth/transport/requests.py:476: in request
self.credentials.before_request(auth_request, method, url, request_headers)
.nox/py-3-8/lib/python3.8/site-packages/google/auth/credentials.py:133: in before_request
self.refresh(request)
.nox/py-3-8/lib/python3.8/site-packages/google/oauth2/service_account.py:407: in refresh
access_token, expiry, _ = _client.jwt_grant(
.nox/py-3-8/lib/python3.8/site-packages/google/oauth2/_client.py:193: in jwt_grant
response_data = _token_endpoint_request(request, token_uri, body)
.nox/py-3-8/lib/python3.8/site-packages/google/oauth2/_client.py:165: in _token_endpoint_request
_handle_error_response(response_data)


response_data = {'error': 'invalid_grant', 'error_description': 'Invalid JWT Signature.'}

def _handle_error_response(response_data):
    """Translates an error response into an exception.

    Args:
        response_data (Mapping): The decoded response data.

    Raises:
        google.auth.exceptions.RefreshError: The errors contained in response_data.
    """
    try:
        error_details = "{}: {}".format(
            response_data["error"], response_data.get("error_description")
        )
    # If no details could be extracted, use the response data.
    except (KeyError, ValueError):
        error_details = json.dumps(response_data)
    raise exceptions.RefreshError(error_details, response_data)

E google.auth.exceptions.RefreshError: ('invalid_grant: Invalid JWT Signature.', {'error': 'invalid_grant', 'error_description': 'Invalid JWT Signature.'})

.nox/py-3-8/lib/python3.8/site-packages/google/oauth2/_client.py:60: RefreshError

samples.snippets.batch_parse_table_v1beta2_test: test_batch_parse_table failed

This test failed!

To configure my behavior, see the Flaky Bot documentation.

If I'm commenting on this issue too often, add the flakybot: quiet label and
I will stop commenting.


commit: 30008b4
buildURL: Build Status, Sponge
status: failed

Test output
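target = functools.partial(<bound method PollingFuture._done_or_raise of <google.api_core.operation.Operation object at 0x7fb2fde10cd0>>)
predicate = <function if_exception_type.<locals>.if_exception_type_predicate at 0x7fb300972950>
sleep_generator = <generator object exponential_sleep_generator at 0x7fb2fde74e50>
deadline = 120, on_error = None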
def retry_target(target, predicate, sleep_generator, deadline, on_error=None):
    """Call a function and retry if it fails.

    This is the lowest-level retry helper. Generally, you'll use the
    higher-level retry helper :class:`Retry`.

    Args:
        target(Callable): The function to call and retry. This must be a
            nullary function - apply arguments with `functools.partial`.
        predicate (Callable[Exception]): A callable used to determine if an
            exception raised by the target should be considered retryable.
            It should return True to retry or False otherwise.
        sleep_generator (Iterable[float]): An infinite iterator that determines
            how long to sleep between retries.
        deadline (float): How long to keep retrying the target. The last sleep
            period is shortened as necessary, so that the last retry runs at
            ``deadline`` (and not considerably beyond it).
        on_error (Callable[Exception]): A function to call while processing a
            retryable exception.  Any error raised by this function will *not*
            be caught.

    Returns:
        Any: the return value of the target function.

    Raises:
        google.api_core.RetryError: If the deadline is exceeded while retrying.
        ValueError: If the sleep generator stops yielding values.
        Exception: If the target raises a method that isn't retryable.
    """
    if deadline is not None:
        deadline_datetime = datetime_helpers.utcnow() + datetime.timedelta(
            seconds=deadline
        )
    else:
        deadline_datetime = None

    last_exc = None

    for sleep in sleep_generator:
        try:
            return target()

.nox/py-3-7/lib/python3.7/site-packages/google/api_core/retry.py:184:


self = <google.api_core.operation.Operation object at 0x7fb2fde10cd0>
retry = <google.api_core.retry.Retry object at 0x7fb30097c610>

def _done_or_raise(self, retry=DEFAULT_RETRY):
    """Check if the future is done and raise if it's not."""
    kwargs = {} if retry is DEFAULT_RETRY else {"retry": retry}

    if not self.done(**kwargs):
        raise _OperationNotComplete()

E google.api_core.future.polling._OperationNotComplete

.nox/py-3-7/lib/python3.7/site-packages/google/api_core/future/polling.py:86: _OperationNotComplete

The above exception was the direct cause of the following exception:

self = <google.api_core.operation.Operation object at 0x7fb2fde10cd0>
timeout = 120, retry = <google.api_core.retry.Retry object at 0x7fb30097c610>

def _blocking_poll(self, timeout=None, retry=DEFAULT_RETRY):
    """Poll and wait for the Future to be resolved.

    Args:
        timeout (int):
            How long (in seconds) to wait for the operation to complete.
            If None, wait indefinitely.
    """
    if self._result_set:
        return

    retry_ = self._retry.with_deadline(timeout)

    try:
        kwargs = {} if retry is DEFAULT_RETRY else {"retry": retry}
        retry_(self._done_or_raise)(**kwargs)

.nox/py-3-7/lib/python3.7/site-packages/google/api_core/future/polling.py:107:


args = (), kwargs = {}
target = functools.partial(<bound method PollingFuture._done_or_raise of <google.api_core.operation.Operation object at 0x7fb2fde10cd0>>)
sleep_generator = <generator object exponential_sleep_generator at 0x7fb2fde74e50>

@general_helpers.wraps(func)
def retry_wrapped_func(*args, **kwargs):
    """A wrapper that calls target function with retry."""
    target = functools.partial(func, *args, **kwargs)
    sleep_generator = exponential_sleep_generator(
        self._initial, self._maximum, multiplier=self._multiplier
    )
    return retry_target(
        target,
        self._predicate,
        sleep_generator,
        self._deadline,
        on_error=on_error,
    )

.nox/py-3-7/lib/python3.7/site-packages/google/api_core/retry.py:286:


target = functools.partial(<bound method PollingFuture._done_or_raise of <google.api_core.operation.Operation object at 0x7fb2fde10cd0>>)
predicate = <function if_exception_type.<locals>.if_exception_type_predicate at 0x7fb300972950>
sleep_generator = <generator object exponential_sleep_generator at 0x7fb2fde74e50>
deadline = 120, on_error = None

def retry_target(target, predicate, sleep_generator, deadline, on_error=None):
    """Call a function and retry if it fails.

    This is the lowest-level retry helper. Generally, you'll use the
    higher-level retry helper :class:`Retry`.

    Args:
        target(Callable): The function to call and retry. This must be a
            nullary function - apply arguments with `functools.partial`.
        predicate (Callable[Exception]): A callable used to determine if an
            exception raised by the target should be considered retryable.
            It should return True to retry or False otherwise.
        sleep_generator (Iterable[float]): An infinite iterator that determines
            how long to sleep between retries.
        deadline (float): How long to keep retrying the target. The last sleep
            period is shortened as necessary, so that the last retry runs at
            ``deadline`` (and not considerably beyond it).
        on_error (Callable[Exception]): A function to call while processing a
            retryable exception.  Any error raised by this function will *not*
            be caught.

    Returns:
        Any: the return value of the target function.

    Raises:
        google.api_core.RetryError: If the deadline is exceeded while retrying.
        ValueError: If the sleep generator stops yielding values.
        Exception: If the target raises a method that isn't retryable.
    """
    if deadline is not None:
        deadline_datetime = datetime_helpers.utcnow() + datetime.timedelta(
            seconds=deadline
        )
    else:
        deadline_datetime = None

    last_exc = None

    for sleep in sleep_generator:
        try:
            return target()

        # pylint: disable=broad-except
        # This function explicitly must deal with broad exceptions.
        except Exception as exc:
            if not predicate(exc):
                raise
            last_exc = exc
            if on_error is not None:
                on_error(exc)

        now = datetime_helpers.utcnow()

        if deadline_datetime is not None:
            if deadline_datetime <= now:
                six.raise_from(
                    exceptions.RetryError(
                        "Deadline of {:.1f}s exceeded while calling {}".format(
                            deadline, target
                        ),
                        last_exc,
                    ),
                    last_exc,
                )

.nox/py-3-7/lib/python3.7/site-packages/google/api_core/retry.py:206:


value = None, from_value = _OperationNotComplete()

???
E google.api_core.exceptions.RetryError: Deadline of 120.0s exceeded while calling functools.partial(<bound method PollingFuture._done_or_raise of <google.api_core.operation.Operation object at 0x7fb2fde10cd0>>), last exception:

<string>:3: RetryError

During handling of the above exception, another exception occurred:

capsys = <_pytest.capture.CaptureFixture object at 0x7fb2fde76dd0>

def test_batch_parse_table(capsys):
    batch_parse_table_v1beta2.batch_parse_table(PROJECT_ID, INPUT_URI, BATCH_OUTPUT_URI, 120)

batch_parse_table_v1beta2_test.py:44:


batch_parse_table_v1beta2.py:92: in batch_parse_table
operation.result(timeout)
.nox/py-3-7/lib/python3.7/site-packages/google/api_core/future/polling.py:129: in result
self._blocking_poll(timeout=timeout, **kwargs)


self = <google.api_core.operation.Operation object at 0x7fb2fde10cd0>
timeout = 120, retry = <google.api_core.retry.Retry object at 0x7fb30097c610>

def _blocking_poll(self, timeout=None, retry=DEFAULT_RETRY):
    """Poll and wait for the Future to be resolved.

    Args:
        timeout (int):
            How long (in seconds) to wait for the operation to complete.
            If None, wait indefinitely.
    """
    if self._result_set:
        return

    retry_ = self._retry.with_deadline(timeout)

    try:
        kwargs = {} if retry is DEFAULT_RETRY else {"retry": retry}
        retry_(self._done_or_raise)(**kwargs)
    except exceptions.RetryError:
        raise concurrent.futures.TimeoutError(
            "Operation did not complete within the designated " "timeout."
        )

E concurrent.futures._base.TimeoutError: Operation did not complete within the designated timeout.

.nox/py-3-7/lib/python3.7/site-packages/google/api_core/future/polling.py:110: TimeoutError
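
The chain of exceptions in this log (a poll that reports "not complete", a retry loop that runs out of time, and a final timeout error) can be sketched in a few lines. This is a simplified illustration, not the google-api-core implementation: the exception classes, `poll_until_done`, `blocking_result`, and `FakeClock` are hypothetical stand-ins, the backoff schedule is an assumption, and the built-in `TimeoutError` is used in place of `concurrent.futures.TimeoutError`.

```python
import itertools


class RetryDeadlineExceeded(Exception):
    """Stand-in for google.api_core.exceptions.RetryError."""


def poll_until_done(is_done, deadline, delays, clock, sleep):
    """Poll `is_done` until it returns True or `deadline` seconds have elapsed."""
    start = clock()
    for delay in delays:
        if is_done():
            return True
        if clock() - start >= deadline:
            raise RetryDeadlineExceeded(f"Deadline of {deadline:.1f}s exceeded")
        sleep(delay)
    raise ValueError("delay generator stopped yielding values")


def blocking_result(is_done, timeout, clock, sleep):
    """Wait for completion and convert a retry failure into a timeout error."""
    # Assumed exponential backoff: 1s, 2s, 4s, ... capped at 60s.
    delays = (min(2.0 ** n, 60.0) for n in itertools.count())
    try:
        return poll_until_done(is_done, timeout, delays, clock, sleep)
    except RetryDeadlineExceeded as exc:
        raise TimeoutError(
            "Operation did not complete within the designated timeout."
        ) from exc


class FakeClock:
    """Deterministic clock so the example runs without real waiting."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


# An operation that never finishes hits the 120 second deadline.
clock = FakeClock()
try:
    blocking_result(lambda: False, 120, clock.time, clock.sleep)
    timed_out = False
except TimeoutError:
    timed_out = True

# An operation that finishes on the third poll returns normally.
polls = iter([False, False, True])
ok_clock = FakeClock()
finished = blocking_result(lambda: next(polls), 120, ok_clock.time, ok_clock.sleep)
print(timed_out, finished)
```

In this reading, the test failure means only that the batch operation took longer than the 120 seconds passed to `operation.result(timeout)`; it does not by itself show that the operation failed.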

Synthesis failed for python-documentai

Hello! Autosynth couldn't regenerate python-documentai. 💔

Here's the output from running synth.py:

 experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

HEAD is now at 5162674 docs: fix pypi link (#46)
2020-10-20 05:43:00,857 autosynth [DEBUG] > Running: git checkout 5a506ec8765cc04f7e29f888b8e9b257d9a7ae11
Note: checking out '5a506ec8765cc04f7e29f888b8e9b257d9a7ae11'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

  git checkout -b <new-branch-name>

HEAD is now at 5a506ec build(java): enable snippet-bot (#818)
2020-10-20 05:43:00,867 autosynth [DEBUG] > Running: git branch -f autosynth-27
2020-10-20 05:43:00,870 autosynth [DEBUG] > Running: git checkout autosynth-27
Switched to branch 'autosynth-27'
2020-10-20 05:43:00,877 autosynth [INFO] > Running synthtool
2020-10-20 05:43:00,877 autosynth [INFO] > ['/tmpfs/src/github/synthtool/env/bin/python3', '-m', 'synthtool', '--metadata', 'synth.metadata', 'synth.py', '--']
2020-10-20 05:43:00,877 autosynth [DEBUG] > log_file_path: /tmpfs/src/logs/python-documentai/27/sponge_log.log
2020-10-20 05:43:00,879 autosynth [DEBUG] > Running: /tmpfs/src/github/synthtool/env/bin/python3 -m synthtool --metadata synth.metadata synth.py --
2020-10-20 05:43:01,124 synthtool [DEBUG] > Executing /home/kbuilder/.cache/synthtool/python-documentai/synth.py.
On branch autosynth-27
nothing to commit, working tree clean
2020-10-20 05:43:01,253 synthtool [DEBUG] > Using precloned repo /home/kbuilder/.cache/synthtool/synthtool
2020-10-20 05:43:01,258 synthtool [DEBUG] > Ensuring dependencies.
DEBUG:synthtool:Ensuring dependencies.
2020-10-20 05:43:01,268 synthtool [DEBUG] > Using precloned repo /home/kbuilder/.cache/synthtool/synthtool
DEBUG:synthtool:Using precloned repo /home/kbuilder/.cache/synthtool/synthtool
2020-10-20 05:43:01,271 synthtool [DEBUG] > Cloning googleapis.
DEBUG:synthtool:Cloning googleapis.
2020-10-20 05:43:01,877 synthtool [DEBUG] > Generating code for: //google/cloud/documentai/v1beta2:documentai-v1beta2-py.
DEBUG:synthtool:Generating code for: //google/cloud/documentai/v1beta2:documentai-v1beta2-py.
2020-10-20 05:43:05,201 synthtool [SUCCESS] > Generated code into /tmpfs/tmp/tmpiomnly0x.
SUCCESS:synthtool:Generated code into /tmpfs/tmp/tmpiomnly0x.
2020-10-20 05:43:05,236 synthtool [DEBUG] > Generating code for: //google/cloud/documentai/v1beta3:documentai-v1beta3-py.
DEBUG:synthtool:Generating code for: //google/cloud/documentai/v1beta3:documentai-v1beta3-py.
2020-10-20 05:43:08,478 synthtool [SUCCESS] > Generated code into /tmpfs/tmp/tmpm6dytbd0.
SUCCESS:synthtool:Generated code into /tmpfs/tmp/tmpm6dytbd0.
.coveragerc
.flake8
.github/CONTRIBUTING.md
.github/ISSUE_TEMPLATE/bug_report.md
.github/ISSUE_TEMPLATE/feature_request.md
.github/ISSUE_TEMPLATE/support_request.md
.github/PULL_REQUEST_TEMPLATE.md
.github/release-please.yml
.github/snippet-bot.yml
.gitignore
.kokoro/build.sh
.kokoro/continuous/common.cfg
.kokoro/continuous/continuous.cfg
.kokoro/docker/docs/Dockerfile
.kokoro/docker/docs/fetch_gpg_keys.sh
.kokoro/docs/common.cfg
.kokoro/docs/docs-presubmit.cfg
.kokoro/docs/docs.cfg
.kokoro/populate-secrets.sh
.kokoro/presubmit/common.cfg
.kokoro/presubmit/presubmit.cfg
.kokoro/publish-docs.sh
.kokoro/release.sh
.kokoro/release/common.cfg
.kokoro/release/release.cfg
.kokoro/samples/lint/common.cfg
.kokoro/samples/lint/continuous.cfg
.kokoro/samples/lint/periodic.cfg
.kokoro/samples/lint/presubmit.cfg
.kokoro/samples/python3.6/common.cfg
.kokoro/samples/python3.6/continuous.cfg
.kokoro/samples/python3.6/periodic.cfg
.kokoro/samples/python3.6/presubmit.cfg
.kokoro/samples/python3.7/common.cfg
.kokoro/samples/python3.7/continuous.cfg
.kokoro/samples/python3.7/periodic.cfg
.kokoro/samples/python3.7/presubmit.cfg
.kokoro/samples/python3.8/common.cfg
.kokoro/samples/python3.8/continuous.cfg
.kokoro/samples/python3.8/periodic.cfg
.kokoro/samples/python3.8/presubmit.cfg
.kokoro/test-samples.sh
.kokoro/trampoline.sh
.kokoro/trampoline_v2.sh
.trampolinerc
CODE_OF_CONDUCT.md
CONTRIBUTING.rst
LICENSE
MANIFEST.in
docs/_static/custom.css
docs/_templates/layout.html
docs/conf.py.j2
docs/multiprocessing.rst
noxfile.py.j2
renovate.json
samples/AUTHORING_GUIDE.md
samples/CONTRIBUTING.md
scripts/decrypt-secrets.sh
scripts/readme-gen/readme_gen.py.j2
scripts/readme-gen/templates/README.tmpl.rst
scripts/readme-gen/templates/auth.tmpl.rst
scripts/readme-gen/templates/auth_api_key.tmpl.rst
scripts/readme-gen/templates/install_deps.tmpl.rst
scripts/readme-gen/templates/install_portaudio.tmpl.rst
setup.cfg
testing/.gitignore
2020-10-20 05:43:08,703 synthtool [INFO] > Generating templates for samples project 'samples/snippets'
INFO:synthtool:Generating templates for samples project 'samples/snippets'
Skipping: README.md
README.rst
Traceback (most recent call last):
  File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/tmpfs/src/github/synthtool/synthtool/__main__.py", line 102, in <module>
    main()
  File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/tmpfs/src/github/synthtool/synthtool/__main__.py", line 94, in main
    spec.loader.exec_module(synth_module)  # type: ignore
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/kbuilder/.cache/synthtool/python-documentai/synth.py", line 57, in <module>
    python.py_samples()
  File "/tmpfs/src/github/synthtool/synthtool/languages/python.py", line 141, in py_samples
    result = t.render(subdir=sample_project_dir, **sample_readme_metadata)
  File "/tmpfs/src/github/synthtool/synthtool/sources/templates.py", line 83, in render
    _render_to_path(self.env, template_name, self.dir / subdir, kwargs)
  File "/tmpfs/src/github/synthtool/synthtool/sources/templates.py", line 53, in _render_to_path
    output.dump(fh)
  File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/jinja2/environment.py", line 1313, in dump
    fp.writelines(iterable)
  File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/jinja2/environment.py", line 1357, in __next__
    return self._next()
  File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/jinja2/environment.py", line 1125, in generate
    yield self.environment.handle_exception()
  File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/jinja2/environment.py", line 832, in handle_exception
    reraise(*rewrite_traceback_stack(source=source))
  File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/jinja2/_compat.py", line 28, in reraise
    raise value.with_traceback(tb)
  File "/home/kbuilder/.cache/synthtool/synthtool/synthtool/gcp/templates/python_samples/README.rst", line 5, in top-level template code
    {{product.name}} Python Samples
  File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/jinja2/environment.py", line 471, in getattr
    return getattr(obj, attribute)
jinja2.exceptions.UndefinedError: 'product' is undefined
2020-10-20 05:43:08,769 autosynth [ERROR] > Synthesis failed
2020-10-20 05:43:08,770 autosynth [DEBUG] > Running: git reset --hard HEAD
HEAD is now at 5162674 docs: fix pypi link (#46)
2020-10-20 05:43:08,781 autosynth [DEBUG] > Running: git checkout autosynth
Switched to branch 'autosynth'
2020-10-20 05:43:08,788 autosynth [DEBUG] > Running: git clean -fdx
Removing __pycache__/
Traceback (most recent call last):
  File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 354, in <module>
    main()
  File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 189, in main
    return _inner_main(temp_dir)
  File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 334, in _inner_main
    commit_count = synthesize_loop(x, multiple_prs, change_pusher, synthesizer)
  File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 65, in synthesize_loop
    has_changes = toolbox.synthesize_version_in_new_branch(synthesizer, youngest)
  File "/tmpfs/src/github/synthtool/autosynth/synth_toolbox.py", line 259, in synthesize_version_in_new_branch
    synthesizer.synthesize(synth_log_path, self.environ)
  File "/tmpfs/src/github/synthtool/autosynth/synthesizer.py", line 120, in synthesize
    synth_proc.check_returncode()  # Raise an exception.
  File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/subprocess.py", line 389, in check_returncode
    self.stderr)
subprocess.CalledProcessError: Command '['/tmpfs/src/github/synthtool/env/bin/python3', '-m', 'synthtool', '--metadata', 'synth.metadata', 'synth.py', '--']' returned non-zero exit status 1.

Google internal developers can see the full log here.

samples.snippets.batch_process_documents_sample_test: test_batch_process_documents failed

This test failed!

To configure my behavior, see the Flaky Bot documentation.

If I'm commenting on this issue too often, add the flakybot: quiet label and
I will stop commenting.


commit: 35e3b74
buildURL: Build Status, Sponge
status: failed

Test output
capsys = <_pytest.capture.CaptureFixture object at 0x7efd3c46b9e8>
test_bucket = 'document-ai-python-5ff81f69-5b71-47f2-b889-571d4cfa8c55'
def test_batch_process_documents(capsys, test_bucket):
    batch_process_documents_sample.batch_process_documents(
        project_id=project_id,
        location=location,
        processor_id=processor_id,
        gcs_input_uri=gcs_input_uri,
        gcs_output_uri=f"gs://{test_bucket}",
        gcs_output_uri_prefix=gcs_output_uri_prefix,
    )

batch_process_documents_sample_test.py:56:


batch_process_documents_sample.py:83: in batch_process_documents
bucket = storage_client.get_bucket(output_bucket)
.nox/py-3-6/lib/python3.6/site-packages/google/cloud/storage/client.py:407: in get_bucket
retry=retry,
.nox/py-3-6/lib/python3.6/site-packages/google/cloud/storage/bucket.py:1007: in reload
retry=retry,
.nox/py-3-6/lib/python3.6/site-packages/google/cloud/storage/_helpers.py:225: in reload
retry=retry,
.nox/py-3-6/lib/python3.6/site-packages/google/cloud/storage/_http.py:78: in api_request
return call()
.nox/py-3-6/lib/python3.6/site-packages/google/api_core/retry.py:290: in retry_wrapped_func
on_error=on_error,
.nox/py-3-6/lib/python3.6/site-packages/google/api_core/retry.py:188: in retry_target
return target()
.nox/py-3-6/lib/python3.6/site-packages/google/cloud/_http.py:480: in api_request
timeout=timeout,
.nox/py-3-6/lib/python3.6/site-packages/google/cloud/_http.py:338: in _make_request
method, url, headers, data, target_object, timeout=timeout
.nox/py-3-6/lib/python3.6/site-packages/google/cloud/_http.py:376: in _do_request
url=url, method=method, headers=headers, data=data, timeout=timeout
.nox/py-3-6/lib/python3.6/site-packages/google/auth/transport/requests.py:476: in request
self.credentials.before_request(auth_request, method, url, request_headers)
.nox/py-3-6/lib/python3.6/site-packages/google/auth/credentials.py:133: in before_request
self.refresh(request)
.nox/py-3-6/lib/python3.6/site-packages/google/oauth2/service_account.py:408: in refresh
request, self._token_uri, assertion
.nox/py-3-6/lib/python3.6/site-packages/google/oauth2/_client.py:193: in jwt_grant
response_data = _token_endpoint_request(request, token_uri, body)
.nox/py-3-6/lib/python3.6/site-packages/google/oauth2/_client.py:165: in _token_endpoint_request
_handle_error_response(response_data)


response_data = {'error': 'invalid_grant', 'error_description': 'Invalid JWT Signature.'}

def _handle_error_response(response_data):
    """Translates an error response into an exception.

    Args:
        response_data (Mapping): The decoded response data.

    Raises:
        google.auth.exceptions.RefreshError: The errors contained in response_data.
    """
    try:
        error_details = "{}: {}".format(
            response_data["error"], response_data.get("error_description")
        )
    # If no details could be extracted, use the response data.
    except (KeyError, ValueError):
        error_details = json.dumps(response_data)
  raise exceptions.RefreshError(error_details, response_data)

E google.auth.exceptions.RefreshError: ('invalid_grant: Invalid JWT Signature.', {'error': 'invalid_grant', 'error_description': 'Invalid JWT Signature.'})

.nox/py-3-6/lib/python3.6/site-packages/google/oauth2/_client.py:60: RefreshError

Clause in "IF IN" statement is incorrect..gives wrong start index for the text segment

Environment details

  • OS type and version: AI Notebooks
  • Python version: python --version: 3.7
  • pip version: pip --version: pip 20.1.1 from /opt/conda/lib/python3.7/site-packages/pip (python 3.7)
  • google-cloud-documentai version: pip show google-cloud-documentai

jupyter@python-20200630-170900:~$ pip show google-cloud-documentai
Name: google-cloud-documentai
Version: 0.3.0
Summary: UNKNOWN
Home-page: https://github.com/googleapis/python-documentai
Author: Google LLC
Author-email: [email protected]
License: Apache 2.0
Location: /opt/conda/lib/python3.7/site-packages
Requires: google-api-core, proto-plus
Required-by:

Steps to reproduce

  1. Run the code listed at https://cloud.google.com/document-ai/docs/invoice-parser --> Small File Processing --> Python
  2. The get_text method always uses 0 as the start index, so the extracted text is longer than required.

Code example

# example

start_index = (
            int(segment.start_index)
            if segment.start_index in doc_element.text_anchor.text_segments # this is the problem code
            else 0
        )

Suggested update to the above:

start_index = (
            int(segment.start_index)
            if segment in doc_element.text_anchor.text_segments
            else 0
        )
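
To show why the original condition always falls through to 0, here is a minimal sketch. It uses a stand-in `TextSegment` dataclass rather than the real Document AI message types, and the names `get_text_buggy` and `get_text_fixed` are hypothetical. The membership test compares an integer against segment objects, so it never matches. The sketch also assumes that an unset `start_index` reads as 0 (the usual proto3 default), in which case no conditional is needed at all.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextSegment:
    # Stand-in for the real text segment message (assumption for this sketch).
    start_index: int = 0
    end_index: int = 0


def get_text_buggy(segments: List[TextSegment], text: str) -> str:
    out = ""
    for segment in segments:
        # An int is never equal to a TextSegment, so this is always False
        # and the start index silently becomes 0.
        start = int(segment.start_index) if segment.start_index in segments else 0
        out += text[start:int(segment.end_index)]
    return out


def get_text_fixed(segments: List[TextSegment], text: str) -> str:
    out = ""
    for segment in segments:
        # Use the segment's own offsets directly; an unset start_index is 0.
        out += text[int(segment.start_index):int(segment.end_index)]
    return out


text = "Invoice 42 Total 100"
segments = [TextSegment(start_index=8, end_index=10)]
buggy = get_text_buggy(segments, text)
fixed = get_text_fixed(segments, text)
print(repr(buggy), repr(fixed))
```

With these inputs the buggy version returns everything from the start of the document up to the end index, while the fixed version returns only the anchored span.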

Stack trace

# example

Thanks!

Name of repo

Is the name of the repo supposed to be 'python-document' or 'python-documentation'? Because currently it's 'python-documentai'.

Document how to run tests locally in README.rst

Running tests for this library is currently not documented in README.rst. CONTRIBUTING.rst mentions how to run tests, but there is no documentation on how to set up your local environment to fully test the library.

This is a request for documentation on how to run all tests for this library locally.

text_styles empty

Environment details

  • OS type and version:
  • Python version: 3.6.12
  • pip version: 20.2.4
  • google-cloud-documentai version: 0.3.0

I found that some fields are always empty. I want to read the field text_styles but it always provides an empty list.

Code example

client = documentai.DocumentProcessorServiceClient()
request = documentai.types.ProcessRequest(
    name=name,
    document=documentai.types.Document(content=content, mime_type=mime_type),
)
result = client.process_document(request=request, timeout=DOCUMENTAI_TIMEOUT)
document = result.document
print(document.text_styles)  # [] (empty list)

Synthesis failed for python-documentai

Hello! Autosynth couldn't regenerate python-documentai. 💔

Here's the output from running synth.py:

ERROR: /home/kbuilder/.cache/synthtool/googleapis/google/cloud/documentai/v1beta2/BUILD.bazel:165:1: //google/cloud/documentai/v1beta2:documentai_py_gapic: `bazel-out/host/bin/external/com_google_protobuf/protoc --experimental_allow_proto3_optional --plugin=protoc-gen-python_gapic=bazel-out/host/bin/external/gapic_generator_python/gapic_plugin --python_gapic_out=retry-config=google/cloud/documentai/v1beta2/documentai_v1beta2_grpc_service_config.json:bazel-out/k8-fastbuild/bin/google/cloud/documentai/v1beta2/documentai_py_gapic.srcjar.zip -Igoogle/cloud/documentai/v1beta2/document.proto=google/cloud/documentai/v1beta2/document.proto -Igoogle/cloud/documentai/v1beta2/document_understanding.proto=google/cloud/documentai/v1beta2/document_understanding.proto -Igoogle/cloud/documentai/v1beta2/geometry.proto=google/cloud/documentai/v1beta2/geometry.proto -Igoogle/api/annotations.proto=google/api/annotations.proto -Igoogle/api/http.proto=google/api/http.proto -Igoogle/protobuf/descriptor.proto=bazel-out/k8-fastbuild/bin/external/com_google_protobuf/_virtual_imports/descriptor_proto/google/protobuf/descriptor.proto -Igoogle/api/client.proto=google/api/client.proto -Igoogle/api/field_behavior.proto=google/api/field_behavior.proto -Igoogle/longrunning/operations.proto=google/longrunning/operations.proto -Igoogle/rpc/status.proto=google/rpc/status.proto -Igoogle/protobuf/any.proto=bazel-out/k8-fastbuild/bin/external/com_google_protobuf/_virtual_imports/any_proto/google/protobuf/any.proto -Igoogle/protobuf/duration.proto=bazel-out/k8-fastbuild/bin/external/com_google_protobuf/_virtual_imports/duration_proto/google/protobuf/duration.proto -Igoogle/protobuf/empty.proto=bazel-out/k8-fastbuild/bin/external/com_google_protobuf/_virtual_imports/empty_proto/google/protobuf/empty.proto -Igoogle/type/color.proto=google/type/color.proto -Igoogle/protobuf/wrappers.proto=bazel-out/k8-fastbuild/bin/external/com_google_protobuf/_virtual_imports/wrappers_proto/google/protobuf/wrappers.proto 
-Igoogle/protobuf/timestamp.proto=bazel-out/k8-fastbuild/bin/external/com_google_protobuf/_virtual_imports/timestamp_proto/google/protobuf/timestamp.proto google/cloud/documentai/v1beta2/document.proto google/cloud/documentai/v1beta2/document_understanding.proto google/cloud/documentai/v1beta2/geometry.proto` failed (Exit 1) protoc failed: error executing command bazel-out/host/bin/external/com_google_protobuf/protoc --experimental_allow_proto3_optional '--plugin=protoc-gen-python_gapic=bazel-out/host/bin/external/gapic_generator_python/gapic_plugin' ... (remaining 20 argument(s) skipped)

Use --sandbox_debug to see verbose messages from the sandbox
google/cloud/documentai/v1beta2/geometry.proto:19:1: warning: Import google/api/annotations.proto is unused.
google/cloud/documentai/v1beta2/document.proto:23:1: warning: Import google/api/annotations.proto is unused.
Traceback (most recent call last):
  File "/home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/sandbox/linux-sandbox/11/execroot/com_google_googleapis/bazel-out/host/bin/external/gapic_generator_python/gapic_plugin.runfiles/gapic_generator_python/gapic/cli/generate_with_pandoc.py", line 3, in <module>
    from gapic.cli import generate
  File "/home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/sandbox/linux-sandbox/11/execroot/com_google_googleapis/bazel-out/host/bin/external/gapic_generator_python/gapic_plugin.runfiles/gapic_generator_python/gapic/cli/generate.py", line 23, in <module>
    from gapic import generator
  File "/home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/sandbox/linux-sandbox/11/execroot/com_google_googleapis/bazel-out/host/bin/external/gapic_generator_python/gapic_plugin.runfiles/gapic_generator_python/gapic/generator/__init__.py", line 21, in <module>
    from .generator import Generator
  File "/home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/sandbox/linux-sandbox/11/execroot/com_google_googleapis/bazel-out/host/bin/external/gapic_generator_python/gapic_plugin.runfiles/gapic_generator_python/gapic/generator/generator.py", line 24, in <module>
    from gapic.samplegen import manifest, samplegen
  File "/home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/sandbox/linux-sandbox/11/execroot/com_google_googleapis/bazel-out/host/bin/external/gapic_generator_python/gapic_plugin.runfiles/gapic_generator_python/gapic/samplegen/__init__.py", line 15, in <module>
    from gapic.samplegen import samplegen
  File "/home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/sandbox/linux-sandbox/11/execroot/com_google_googleapis/bazel-out/host/bin/external/gapic_generator_python/gapic_plugin.runfiles/gapic_generator_python/gapic/samplegen/samplegen.py", line 27, in <module>
    from gapic.schema import wrappers
  File "/home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/sandbox/linux-sandbox/11/execroot/com_google_googleapis/bazel-out/host/bin/external/gapic_generator_python/gapic_plugin.runfiles/gapic_generator_python/gapic/schema/__init__.py", line 23, in <module>
    from gapic.schema.api import API
  File "/home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/sandbox/linux-sandbox/11/execroot/com_google_googleapis/bazel-out/host/bin/external/gapic_generator_python/gapic_plugin.runfiles/gapic_generator_python/gapic/schema/api.py", line 29, in <module>
    from google.api_core import exceptions  # type: ignore
ModuleNotFoundError: No module named 'google.api_core'
--python_gapic_out: protoc-gen-python_gapic: Plugin failed with status code 1.
Target //google/cloud/documentai/v1beta2:documentai-v1beta2-py failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 1.178s, Critical Path: 0.87s
INFO: 0 processes.
FAILED: Build did NOT complete successfully
FAILED: Build did NOT complete successfully

Traceback (most recent call last):
  File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/tmpfs/src/github/synthtool/synthtool/__main__.py", line 102, in <module>
    main()
  File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/tmpfs/src/github/synthtool/synthtool/__main__.py", line 94, in main
    spec.loader.exec_module(synth_module)  # type: ignore
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/kbuilder/.cache/synthtool/python-documentai/synth.py", line 38, in <module>
    bazel_target=f"//google/cloud/documentai/{version}:documentai-{version}-py",
  File "/tmpfs/src/github/synthtool/synthtool/gcp/gapic_bazel.py", line 52, in py_library
    return self._generate_code(service, version, "python", **kwargs)
  File "/tmpfs/src/github/synthtool/synthtool/gcp/gapic_bazel.py", line 197, in _generate_code
    shell.run(bazel_run_args)
  File "/tmpfs/src/github/synthtool/synthtool/shell.py", line 39, in run
    raise exc
  File "/tmpfs/src/github/synthtool/synthtool/shell.py", line 33, in run
    encoding="utf-8",
  File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['bazel', '--max_idle_secs=240', 'build', '//google/cloud/documentai/v1beta2:documentai-v1beta2-py']' returned non-zero exit status 1.
2021-01-28 05:42:43,600 autosynth [ERROR] > Synthesis failed
2021-01-28 05:42:43,600 autosynth [DEBUG] > Running: git reset --hard HEAD
HEAD is now at 745bb99 chore: added increased timeout on flaky batch request (#84)
2021-01-28 05:42:43,606 autosynth [DEBUG] > Running: git checkout autosynth
Switched to branch 'autosynth'
2021-01-28 05:42:43,611 autosynth [DEBUG] > Running: git clean -fdx
Removing __pycache__/
Traceback (most recent call last):
  File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 354, in <module>
    main()
  File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 189, in main
    return _inner_main(temp_dir)
  File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 334, in _inner_main
    commit_count = synthesize_loop(x, multiple_prs, change_pusher, synthesizer)
  File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 65, in synthesize_loop
    has_changes = toolbox.synthesize_version_in_new_branch(synthesizer, youngest)
  File "/tmpfs/src/github/synthtool/autosynth/synth_toolbox.py", line 259, in synthesize_version_in_new_branch
    synthesizer.synthesize(synth_log_path, self.environ)
  File "/tmpfs/src/github/synthtool/autosynth/synthesizer.py", line 120, in synthesize
    synth_proc.check_returncode()  # Raise an exception.
  File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/subprocess.py", line 389, in check_returncode
    self.stderr)
subprocess.CalledProcessError: Command '['/tmpfs/src/github/synthtool/env/bin/python3', '-m', 'synthtool', '--metadata', 'synth.metadata', 'synth.py', '--']' returned non-zero exit status 1.

Google internal developers can see the full log here.

EU location not supported

Hi,
The location 'eu' doesn't seem to be supported by the client.

Environment details

  • OS type and version: Chromebook 86.0
  • Python version: Python 3.7.3
  • pip version: pip 18.1
  • google-cloud-documentai version: 0.3.0

Steps to reproduce

https://github.com/googleapis/python-documentai/blob/master/samples/snippets/process_document_sample_v1beta3_test.py
with the location set to 'eu'

The base URL "us-documentai.googleapis.com" seems to be hardcoded in the code:
https://github.com/googleapis/python-documentai/search?q=us-documentai&type=code

The correct URL looks like:
https://eu-documentai.googleapis.com/v1beta3/projects/123456789/locations/eu/processors/azerty:process
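
A possible workaround, sketched under assumptions: generated Google Cloud Python clients generally accept a `client_options` argument whose `api_endpoint` overrides the default host. The helper names `regional_endpoint`, `processor_name` and `make_client` below are hypothetical, and whether version 0.3.0 of this library honors the override for 'eu' is not confirmed by this report.

```python
def regional_endpoint(location: str) -> str:
    # Host pattern taken from the URL quoted above, e.g. "eu-documentai.googleapis.com".
    return f"{location}-documentai.googleapis.com"


def processor_name(project_id: str, location: str, processor_id: str) -> str:
    # Resource name as it appears in the request URL and the error message.
    return f"projects/{project_id}/locations/{location}/processors/{processor_id}"


def make_client(location: str):
    # Imports are kept inside the function so the helpers above can be used
    # without the client library installed (assumption: v1beta3 module path).
    from google.api_core.client_options import ClientOptions
    from google.cloud import documentai_v1beta3 as documentai

    options = ClientOptions(api_endpoint=regional_endpoint(location))
    return documentai.DocumentProcessorServiceClient(client_options=options)


endpoint = regional_endpoint("eu")
name = processor_name("123456789", "eu", "azerty")
print(endpoint)
print(name)
```

The client would then be created with `make_client("eu")` and the request sent with `name` set to the processor resource name.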

Stack trace

google.api_core.exceptions.PermissionDenied: 403 Permission 'documentai.processors.processOnline' denied on resource '//documentai.googleapis.com/projects/1234567890/locations/eu/processors/azerty' (or it may not exist).

Thanks!

google.api_core.exceptions.InvalidArgument: 400 Request contains an invalid argument.

I get an error when running the example code (both online and offline processing). The error is the same in both cases.

Environment details

  • OS type and version: OSX 11.1
  • Python version: 3.8.2
  • pip version: 19.2.3
  • google-cloud-documentai version: 0.3.0

Steps to reproduce

python3 parse_from_gs_beta3.py

Code example

the example in https://github.com/googleapis/python-documentai/blob/master/samples/snippets/batch_process_documents_sample_v1beta3.py

Stack trace

Traceback (most recent call last):
  File "/Users/postak/Library/Python/3.8/lib/python/site-packages/google/api_core/grpc_helpers.py", line 57, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "/Users/postak/Library/Python/3.8/lib/python/site-packages/grpc/_channel.py", line 923, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/Users/postak/Library/Python/3.8/lib/python/site-packages/grpc/_channel.py", line 826, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.INVALID_ARGUMENT
	details = "Request contains an invalid argument."
	debug_error_string = "{"created":"@1609844424.988576000","description":"Error received from peer ipv4:216.58.208.170:443","file":"src/core/lib/surface/call.cc","file_line":1062,"grpc_message":"Request contains an invalid argument.","grpc_status":3}"
>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "parse_from_gs_beta3.py", line 103, in <module>
    batch_process_documents(project_id='my-prj', location='eu', processor_id='cd6.....', gcs_input_uri='gs://my-bucket/file.pdf', gcs_output_uri='gs://my-bucket/', gcs_output_uri_prefix='doc_ai_out')
  File "parse_from_gs_beta3.py", line 41, in batch_process_documents
    operation = client.batch_process_documents(request)
  File "/Users/postak/Library/Python/3.8/lib/python/site-packages/google/cloud/documentai_v1beta3/services/document_processor_service/client.py", line 411, in batch_process_documents
    response = rpc(request, retry=retry, timeout=timeout, metadata=metadata,)
  File "/Users/postak/Library/Python/3.8/lib/python/site-packages/google/api_core/gapic_v1/method.py", line 145, in __call__
    return wrapped_func(*args, **kwargs)
  File "/Users/postak/Library/Python/3.8/lib/python/site-packages/google/api_core/retry.py", line 281, in retry_wrapped_func
    return retry_target(
  File "/Users/postak/Library/Python/3.8/lib/python/site-packages/google/api_core/retry.py", line 184, in retry_target
    return target()
  File "/Users/postak/Library/Python/3.8/lib/python/site-packages/google/api_core/grpc_helpers.py", line 59, in error_remapped_callable
    six.raise_from(exceptions.from_grpc_error(exc), exc)
  File "<string>", line 3, in raise_from
google.api_core.exceptions.InvalidArgument: 400 Request contains an invalid argument.

process_document_sample_v1beta3 failed with "Request contains an invalid argument."

Environment details

  • OS type and version: Mac 10.15.6
  • Python version: 3.7.8
  • pip version: 20.2.4
  • google-cloud-documentai version: 0.3.0

Steps to reproduce

  1. Running process_document_sample_v1beta3.py with /resources/invoice.pdf
  2. Processor in EU

Stack trace

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/google/api_core/grpc_helpers.py", line 57, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 826, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 729, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.INVALID_ARGUMENT
	details = "Request contains an invalid argument."
	debug_error_string = "{"created":"@1603381610.907626000","description":"Error received from peer ipv6:[x:x:x:x::x]:443","file":"src/core/lib/surface/call.cc","file_line":1062,"grpc_message":"Request contains an invalid argument.","grpc_status":3}"
>

Thanks!

Synthesis failed for python-documentai

Hello! Autosynth couldn't regenerate python-documentai. 💔

Please investigate and fix this issue within 5 business days. While it remains broken,
this library cannot be updated with changes to the python-documentai API, and the library grows
stale.

See https://github.com/googleapis/synthtool/blob/master/autosynth/TroubleShooting.md
for trouble shooting tips.

Here's the output from running synth.py:

ome/kbuilder/.cache/synthtool/googleapis/WORKSPACE:77:1
DEBUG: Rule 'com_google_protoc_java_resource_names_plugin' indicated that a canonical reproducible form can be obtained by modifying arguments sha256 = "4b714b35ee04ba90f560ee60e64c7357428efcb6b0f3a298f343f8ec2c6d4a5d"
DEBUG: Call stack for the definition of repository 'com_google_protoc_java_resource_names_plugin' which is a http_archive (rule definition at /home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/external/bazel_tools/tools/build_defs/repo/http.bzl:296:16):
 - <builtin>
 - /home/kbuilder/.cache/synthtool/googleapis/WORKSPACE:234:1
DEBUG: Rule 'protoc_docs_plugin' indicated that a canonical reproducible form can be obtained by modifying arguments sha256 = "33b387245455775e0de45869c7355cc5a9e98b396a6fc43b02812a63b75fee20"
DEBUG: Call stack for the definition of repository 'protoc_docs_plugin' which is a http_archive (rule definition at /home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/external/bazel_tools/tools/build_defs/repo/http.bzl:296:16):
 - <builtin>
 - /home/kbuilder/.cache/synthtool/googleapis/WORKSPACE:258:1
DEBUG: Rule 'rules_python' indicated that a canonical reproducible form can be obtained by modifying arguments sha256 = "48f7e716f4098b85296ad93f5a133baf712968c13fbc2fdf3a6136158fe86eac"
DEBUG: Call stack for the definition of repository 'rules_python' which is a http_archive (rule definition at /home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/external/bazel_tools/tools/build_defs/repo/http.bzl:296:16):
 - <builtin>
 - /home/kbuilder/.cache/synthtool/googleapis/WORKSPACE:42:1
DEBUG: Rule 'gapic_generator_python' indicated that a canonical reproducible form can be obtained by modifying arguments sha256 = "fe995def6873fcbdc2a8764ef4bce96eb971a9d1950fe9db9be442f3c64fb3b6"
DEBUG: Call stack for the definition of repository 'gapic_generator_python' which is a http_archive (rule definition at /home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/external/bazel_tools/tools/build_defs/repo/http.bzl:296:16):
 - <builtin>
 - /home/kbuilder/.cache/synthtool/googleapis/WORKSPACE:278:1
DEBUG: Rule 'com_googleapis_gapic_generator_go' indicated that a canonical reproducible form can be obtained by modifying arguments sha256 = "c0d0efba86429cee5e52baf838165b0ed7cafae1748d025abec109d25e006628"
DEBUG: Call stack for the definition of repository 'com_googleapis_gapic_generator_go' which is a http_archive (rule definition at /home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/external/bazel_tools/tools/build_defs/repo/http.bzl:296:16):
 - <builtin>
 - /home/kbuilder/.cache/synthtool/googleapis/WORKSPACE:300:1
DEBUG: Rule 'gapic_generator_php' indicated that a canonical reproducible form can be obtained by modifying arguments sha256 = "3dffc5c34a5f35666843df04b42d6ce1c545b992f9c093a777ec40833b548d86"
DEBUG: Call stack for the definition of repository 'gapic_generator_php' which is a http_archive (rule definition at /home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/external/bazel_tools/tools/build_defs/repo/http.bzl:296:16):
 - <builtin>
 - /home/kbuilder/.cache/synthtool/googleapis/WORKSPACE:364:1
DEBUG: Rule 'gapic_generator_csharp' indicated that a canonical reproducible form can be obtained by modifying arguments sha256 = "4db430cfb9293e4521ec8e8138f8095faf035d8e752cf332d227710d749939eb"
DEBUG: Call stack for the definition of repository 'gapic_generator_csharp' which is a http_archive (rule definition at /home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/external/bazel_tools/tools/build_defs/repo/http.bzl:296:16):
 - <builtin>
 - /home/kbuilder/.cache/synthtool/googleapis/WORKSPACE:386:1
DEBUG: Rule 'gapic_generator_ruby' indicated that a canonical reproducible form can be obtained by modifying arguments sha256 = "a14ec475388542f2ea70d16d75579065758acc4b99fdd6d59463d54e1a9e4499"
DEBUG: Call stack for the definition of repository 'gapic_generator_ruby' which is a http_archive (rule definition at /home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/external/bazel_tools/tools/build_defs/repo/http.bzl:296:16):
 - <builtin>
 - /home/kbuilder/.cache/synthtool/googleapis/WORKSPACE:400:1
DEBUG: /home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/external/rules_python/python/pip.bzl:61:5: DEPRECATED: the pip_repositories rule has been replaced with pip_install, please see rules_python 0.1 release notes
DEBUG: Rule 'bazel_skylib' indicated that a canonical reproducible form can be obtained by modifying arguments sha256 = "1dde365491125a3db70731e25658dfdd3bc5dbdfd11b840b3e987ecf043c7ca0"
DEBUG: Call stack for the definition of repository 'bazel_skylib' which is a http_archive (rule definition at /home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/external/bazel_tools/tools/build_defs/repo/http.bzl:296:16):
 - <builtin>
 - /home/kbuilder/.cache/synthtool/googleapis/WORKSPACE:35:1
Analyzing: target //google/cloud/documentai/v1beta2:documentai-v1beta2-py (1 packages loaded, 0 targets configured)
ERROR: /home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/external/upb/bazel/upb_proto_library.bzl:257:29: aspect() got unexpected keyword argument 'incompatible_use_toolchain_transition'
ERROR: Analysis of target '//google/cloud/documentai/v1beta2:documentai-v1beta2-py' failed; build aborted: error loading package '@com_github_grpc_grpc//': in /home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/external/com_github_grpc_grpc/bazel/grpc_build_system.bzl: Extension file 'bazel/upb_proto_library.bzl' has errors
INFO: Elapsed time: 0.276s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (2 packages loaded, 4 targets configured)
FAILED: Build did NOT complete successfully (2 packages loaded, 4 targets configured)

Traceback (most recent call last):
  File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/tmpfs/src/github/synthtool/synthtool/__main__.py", line 102, in <module>
    main()
  File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/tmpfs/src/github/synthtool/synthtool/__main__.py", line 94, in main
    spec.loader.exec_module(synth_module)  # type: ignore
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/kbuilder/.cache/synthtool/python-documentai/synth.py", line 37, in <module>
    bazel_target=f"//google/cloud/documentai/{version}:documentai-{version}-py",
  File "/tmpfs/src/github/synthtool/synthtool/gcp/gapic_bazel.py", line 52, in py_library
    return self._generate_code(service, version, "python", False, **kwargs)
  File "/tmpfs/src/github/synthtool/synthtool/gcp/gapic_bazel.py", line 204, in _generate_code
    shell.run(bazel_run_args)
  File "/tmpfs/src/github/synthtool/synthtool/shell.py", line 39, in run
    raise exc
  File "/tmpfs/src/github/synthtool/synthtool/shell.py", line 33, in run
    encoding="utf-8",
  File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['bazel', '--max_idle_secs=240', 'build', '//google/cloud/documentai/v1beta2:documentai-v1beta2-py']' returned non-zero exit status 1.
2021-04-27 02:14:32,448 autosynth [ERROR] > Synthesis failed
2021-04-27 02:14:32,448 autosynth [DEBUG] > Running: git reset --hard HEAD
HEAD is now at 30008b4 chore(revert): revert preventing normalization (#129)
2021-04-27 02:14:32,453 autosynth [DEBUG] > Running: git checkout autosynth
Switched to branch 'autosynth'
2021-04-27 02:14:32,460 autosynth [DEBUG] > Running: git clean -fdx
Removing __pycache__/
Traceback (most recent call last):
  File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 356, in <module>
    main()
  File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 191, in main
    return _inner_main(temp_dir)
  File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 336, in _inner_main
    commit_count = synthesize_loop(x, multiple_prs, change_pusher, synthesizer)
  File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 68, in synthesize_loop
    has_changes = toolbox.synthesize_version_in_new_branch(synthesizer, youngest)
  File "/tmpfs/src/github/synthtool/autosynth/synth_toolbox.py", line 259, in synthesize_version_in_new_branch
    synthesizer.synthesize(synth_log_path, self.environ)
  File "/tmpfs/src/github/synthtool/autosynth/synthesizer.py", line 120, in synthesize
    synth_proc.check_returncode()  # Raise an exception.
  File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/subprocess.py", line 389, in check_returncode
    self.stderr)
subprocess.CalledProcessError: Command '['/tmpfs/src/github/synthtool/env/bin/python3', '-m', 'synthtool', '--metadata', 'synth.metadata', 'synth.py', '--']' returned non-zero exit status 1.

Google internal developers can see the full log here.

OCR symbols

We need the OCR symbols (characters) for some extra processing. Currently, the documentai response includes: blocks, paragraphs, lines and tokens (equivalent to words). Do you think you can add the OCR symbols to the response?

Issue with running the v3 version of the API..getting an error

Environment details

  • OS type and version: AI Notebook, Python 3.7
  • Python version: python --version : Python 3.7
  • pip version: pip --version: pip 20.1.1 from /opt/conda/lib/python3.7/site-packages/pip (python 3.7)
  • google-cloud-documentai version: pip show google-cloud-documentai: jupyter@python-20200630-170900:~$ pip show google-cloud-documentai
    Name: google-cloud-documentai
    Version: 0.3.0
    Summary: UNKNOWN
    Home-page: https://github.com/googleapis/python-documentai
    Author: Google LLC
    Author-email: [email protected]
    License: Apache 2.0
    Location: /opt/conda/lib/python3.7/site-packages
    Requires: google-api-core, proto-plus
    Required-by:

Steps to reproduce

  1. I am trying to run the OOTB code shared for batch processing of the invoices. Here is the sample call that invokes the code: batch_process_documents(project_id='', location='us', processor_id='', gcs_input_uri='gs:///Invoice.pdf', gcs_output_uri='gs://*', gcs_output_uri_prefix='inv')
  2. Run the boilerplate code.

Code example

# example

Stack trace

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-30-42be8c6d0dd8> in <module>
----> 1 batch_process_documents(project_id='***', location='us', processor_id='***', gcs_input_uri='gs://***/Invoice.pdf', gcs_output_uri='gs://***', gcs_output_uri_prefix='inv')

<ipython-input-28-f0fe586dfadb> in batch_process_documents(project_id, location, processor_id, gcs_input_uri, gcs_output_uri, gcs_output_uri_prefix)
     66     for i, blob in enumerate(blob_list):
     67         # Download the contents of this blob as a bytes object.
---> 68         blob_as_bytes = blob.download_as_bytes()
     69         document = documentai.types.Document.from_json(blob_as_bytes)
     70 

AttributeError: 'Blob' object has no attribute 'download_as_bytes'

Making sure to follow these steps will guarantee the quickest resolution possible.

Thanks!
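
The AttributeError in the stack trace above is what you would typically see when the installed google-cloud-storage release predates `Blob.download_as_bytes` (to the best of my knowledge it arrived in 1.31.0, alongside the deprecation of `download_as_string`). Upgrading google-cloud-storage is the usual remedy. If upgrading is not possible, a minimal sketch of a fallback could look like the following; `read_blob_bytes` is a hypothetical helper name, not part of any library.

```python
def read_blob_bytes(blob):
    """Return a blob's contents as bytes on older and newer storage clients.

    Assumption: ``blob`` is a google.cloud.storage Blob (or anything with the
    same download methods). Newer releases offer ``download_as_bytes``; older
    ones only offer ``download_as_string``, which also returns bytes.
    """
    download = getattr(blob, "download_as_bytes", None)
    if download is None:
        # Older client: fall back to the legacy method name.
        download = blob.download_as_string
    return download()
```

In the sample from the report, `blob.download_as_bytes()` on line 68 would become `read_blob_bytes(blob)`.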

samples.snippets.batch_parse_form_v1beta2_test: test_batch_parse_form failed

This test failed!

To configure my behavior, see the Flaky Bot documentation.

If I'm commenting on this issue too often, add the flakybot: quiet label and
I will stop commenting.


commit: 30008b4
buildURL: Build Status, Sponge
status: failed

Test output
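target = functools.partial(<bound method PollingFuture._done_or_raise of <google.api_core.operation.Operation object at 0x7f8f8ec5c198>>)
predicate = <function if_exception_type.<locals>.if_exception_type_predicate at 0x7f8f8eed6158>
sleep_generator = <generator object exponential_sleep_generator at 0x7f8f8ed82f68>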
deadline = 120, on_error = None
def retry_target(target, predicate, sleep_generator, deadline, on_error=None):
    """Call a function and retry if it fails.

    This is the lowest-level retry helper. Generally, you'll use the
    higher-level retry helper :class:`Retry`.

    Args:
        target(Callable): The function to call and retry. This must be a
            nullary function - apply arguments with `functools.partial`.
        predicate (Callable[Exception]): A callable used to determine if an
            exception raised by the target should be considered retryable.
            It should return True to retry or False otherwise.
        sleep_generator (Iterable[float]): An infinite iterator that determines
            how long to sleep between retries.
        deadline (float): How long to keep retrying the target. The last sleep
            period is shortened as necessary, so that the last retry runs at
            ``deadline`` (and not considerably beyond it).
        on_error (Callable[Exception]): A function to call while processing a
            retryable exception.  Any error raised by this function will *not*
            be caught.

    Returns:
        Any: the return value of the target function.

    Raises:
        google.api_core.RetryError: If the deadline is exceeded while retrying.
        ValueError: If the sleep generator stops yielding values.
        Exception: If the target raises a method that isn't retryable.
    """
    if deadline is not None:
        deadline_datetime = datetime_helpers.utcnow() + datetime.timedelta(
            seconds=deadline
        )
    else:
        deadline_datetime = None

    last_exc = None

    for sleep in sleep_generator:
        try:
          return target()

.nox/py-3-6/lib/python3.6/site-packages/google/api_core/retry.py:184:


self = <google.api_core.operation.Operation object at 0x7f8f8ec5c198>
retry = <google.api_core.retry.Retry object at 0x7f8f8eed77f0>

def _done_or_raise(self, retry=DEFAULT_RETRY):
    """Check if the future is done and raise if it's not."""
    kwargs = {} if retry is DEFAULT_RETRY else {"retry": retry}

    if not self.done(**kwargs):
      raise _OperationNotComplete()

E google.api_core.future.polling._OperationNotComplete

.nox/py-3-6/lib/python3.6/site-packages/google/api_core/future/polling.py:86: _OperationNotComplete

The above exception was the direct cause of the following exception:

self = <google.api_core.operation.Operation object at 0x7f8f8ec5c198>
timeout = 120, retry = <google.api_core.retry.Retry object at 0x7f8f8eed77f0>

def _blocking_poll(self, timeout=None, retry=DEFAULT_RETRY):
    """Poll and wait for the Future to be resolved.

    Args:
        timeout (int):
            How long (in seconds) to wait for the operation to complete.
            If None, wait indefinitely.
    """
    if self._result_set:
        return

    retry_ = self._retry.with_deadline(timeout)

    try:
        kwargs = {} if retry is DEFAULT_RETRY else {"retry": retry}
      retry_(self._done_or_raise)(**kwargs)

.nox/py-3-6/lib/python3.6/site-packages/google/api_core/future/polling.py:107:


args = (), kwargs = {}
target = functools.partial(<bound method PollingFuture._done_or_raise of <google.api_core.operation.Operation object at 0x7f8f8ec5c198>>)
sleep_generator = <generator object exponential_sleep_generator at 0x7f8f8ed82f68>

@general_helpers.wraps(func)
def retry_wrapped_func(*args, **kwargs):
    """A wrapper that calls target function with retry."""
    target = functools.partial(func, *args, **kwargs)
    sleep_generator = exponential_sleep_generator(
        self._initial, self._maximum, multiplier=self._multiplier
    )
    return retry_target(
        target,
        self._predicate,
        sleep_generator,
        self._deadline,
      on_error=on_error,
    )

.nox/py-3-6/lib/python3.6/site-packages/google/api_core/retry.py:286:


target = functools.partial(<bound method PollingFuture._done_or_raise of <google.api_core.operation.Operation object at 0x7f8f8ec5c198>>)
predicate = <function if_exception_type.<locals>.if_exception_type_predicate at 0x7f8f8eed6158>
sleep_generator = <generator object exponential_sleep_generator at 0x7f8f8ed82f68>
deadline = 120, on_error = None

def retry_target(target, predicate, sleep_generator, deadline, on_error=None):
    """Call a function and retry if it fails.

    This is the lowest-level retry helper. Generally, you'll use the
    higher-level retry helper :class:`Retry`.

    Args:
        target(Callable): The function to call and retry. This must be a
            nullary function - apply arguments with `functools.partial`.
        predicate (Callable[Exception]): A callable used to determine if an
            exception raised by the target should be considered retryable.
            It should return True to retry or False otherwise.
        sleep_generator (Iterable[float]): An infinite iterator that determines
            how long to sleep between retries.
        deadline (float): How long to keep retrying the target. The last sleep
            period is shortened as necessary, so that the last retry runs at
            ``deadline`` (and not considerably beyond it).
        on_error (Callable[Exception]): A function to call while processing a
            retryable exception.  Any error raised by this function will *not*
            be caught.

    Returns:
        Any: the return value of the target function.

    Raises:
        google.api_core.RetryError: If the deadline is exceeded while retrying.
        ValueError: If the sleep generator stops yielding values.
        Exception: If the target raises a method that isn't retryable.
    """
    if deadline is not None:
        deadline_datetime = datetime_helpers.utcnow() + datetime.timedelta(
            seconds=deadline
        )
    else:
        deadline_datetime = None

    last_exc = None

    for sleep in sleep_generator:
        try:
            return target()

        # pylint: disable=broad-except
        # This function explicitly must deal with broad exceptions.
        except Exception as exc:
            if not predicate(exc):
                raise
            last_exc = exc
            if on_error is not None:
                on_error(exc)

        now = datetime_helpers.utcnow()

        if deadline_datetime is not None:
            if deadline_datetime <= now:
                six.raise_from(
                    exceptions.RetryError(
                        "Deadline of {:.1f}s exceeded while calling {}".format(
                            deadline, target
                        ),
                        last_exc,
                    ),
                  last_exc,
                )

.nox/py-3-6/lib/python3.6/site-packages/google/api_core/retry.py:206:


value = None, from_value = _OperationNotComplete()

???
E google.api_core.exceptions.RetryError: Deadline of 120.0s exceeded while calling functools.partial(<bound method PollingFuture._done_or_raise of <google.api_core.operation.Operation object at 0x7f8f8ec5c198>>), last exception:

<string>:3: RetryError

During handling of the above exception, another exception occurred:

capsys = <_pytest.capture.CaptureFixture object at 0x7f8f8ed727b8>

def test_batch_parse_form(capsys):
  batch_parse_form_v1beta2.batch_parse_form(PROJECT_ID, INPUT_URI, BATCH_OUTPUT_URI, 120)

batch_parse_form_v1beta2_test.py:44:


batch_parse_form_v1beta2.py:84: in batch_parse_form
operation.result(timeout)
.nox/py-3-6/lib/python3.6/site-packages/google/api_core/future/polling.py:129: in result
self._blocking_poll(timeout=timeout, **kwargs)


self = <google.api_core.operation.Operation object at 0x7f8f8ec5c198>
timeout = 120, retry = <google.api_core.retry.Retry object at 0x7f8f8eed77f0>

def _blocking_poll(self, timeout=None, retry=DEFAULT_RETRY):
    """Poll and wait for the Future to be resolved.

    Args:
        timeout (int):
            How long (in seconds) to wait for the operation to complete.
            If None, wait indefinitely.
    """
    if self._result_set:
        return

    retry_ = self._retry.with_deadline(timeout)

    try:
        kwargs = {} if retry is DEFAULT_RETRY else {"retry": retry}
        retry_(self._done_or_raise)(**kwargs)
    except exceptions.RetryError:
        raise concurrent.futures.TimeoutError(
          "Operation did not complete within the designated " "timeout."
        )

E concurrent.futures._base.TimeoutError: Operation did not complete within the designated timeout.

.nox/py-3-6/lib/python3.6/site-packages/google/api_core/future/polling.py:110: TimeoutError

Synthesis failed for python-documentai

Hello! Autosynth couldn't regenerate python-documentai. 💔

Here's the output from running synth.py:

ome/kbuilder/.cache/synthtool/googleapis/google/cloud/documentai/v1beta2/BUILD.bazel:165:1: //google/cloud/documentai/v1beta2:documentai_py_gapic: `bazel-out/host/bin/external/com_google_protobuf/protoc --experimental_allow_proto3_optional --plugin=protoc-gen-python_gapic=bazel-out/host/bin/external/gapic_generator_python/gapic_plugin --python_gapic_out=retry-config=google/cloud/documentai/v1beta2/documentai_v1beta2_grpc_service_config.json:bazel-out/k8-fastbuild/bin/google/cloud/documentai/v1beta2/documentai_py_gapic.srcjar.zip -Igoogle/cloud/documentai/v1beta2/document.proto=google/cloud/documentai/v1beta2/document.proto -Igoogle/cloud/documentai/v1beta2/document_understanding.proto=google/cloud/documentai/v1beta2/document_understanding.proto -Igoogle/cloud/documentai/v1beta2/geometry.proto=google/cloud/documentai/v1beta2/geometry.proto -Igoogle/api/annotations.proto=google/api/annotations.proto -Igoogle/api/http.proto=google/api/http.proto -Igoogle/protobuf/descriptor.proto=bazel-out/k8-fastbuild/bin/external/com_google_protobuf/_virtual_imports/descriptor_proto/google/protobuf/descriptor.proto -Igoogle/api/client.proto=google/api/client.proto -Igoogle/api/field_behavior.proto=google/api/field_behavior.proto -Igoogle/longrunning/operations.proto=google/longrunning/operations.proto -Igoogle/rpc/status.proto=google/rpc/status.proto -Igoogle/protobuf/any.proto=bazel-out/k8-fastbuild/bin/external/com_google_protobuf/_virtual_imports/any_proto/google/protobuf/any.proto -Igoogle/protobuf/duration.proto=bazel-out/k8-fastbuild/bin/external/com_google_protobuf/_virtual_imports/duration_proto/google/protobuf/duration.proto -Igoogle/protobuf/empty.proto=bazel-out/k8-fastbuild/bin/external/com_google_protobuf/_virtual_imports/empty_proto/google/protobuf/empty.proto -Igoogle/type/color.proto=google/type/color.proto -Igoogle/protobuf/wrappers.proto=bazel-out/k8-fastbuild/bin/external/com_google_protobuf/_virtual_imports/wrappers_proto/google/protobuf/wrappers.proto 
-Igoogle/protobuf/timestamp.proto=bazel-out/k8-fastbuild/bin/external/com_google_protobuf/_virtual_imports/timestamp_proto/google/protobuf/timestamp.proto google/cloud/documentai/v1beta2/document.proto google/cloud/documentai/v1beta2/document_understanding.proto google/cloud/documentai/v1beta2/geometry.proto` failed (Exit 1) protoc failed: error executing command bazel-out/host/bin/external/com_google_protobuf/protoc --experimental_allow_proto3_optional '--plugin=protoc-gen-python_gapic=bazel-out/host/bin/external/gapic_generator_python/gapic_plugin' ... (remaining 20 argument(s) skipped)

Use --sandbox_debug to see verbose messages from the sandbox
google/cloud/documentai/v1beta2/geometry.proto:19:1: warning: Import google/api/annotations.proto is unused.
google/cloud/documentai/v1beta2/document.proto:23:1: warning: Import google/api/annotations.proto is unused.
Traceback (most recent call last):
  File "/home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/sandbox/linux-sandbox/11/execroot/com_google_googleapis/bazel-out/host/bin/external/gapic_generator_python/gapic_plugin.runfiles/gapic_generator_python/gapic/cli/generate_with_pandoc.py", line 3, in <module>
    from gapic.cli import generate
  File "/home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/sandbox/linux-sandbox/11/execroot/com_google_googleapis/bazel-out/host/bin/external/gapic_generator_python/gapic_plugin.runfiles/gapic_generator_python/gapic/cli/generate.py", line 23, in <module>
    from gapic import generator
  File "/home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/sandbox/linux-sandbox/11/execroot/com_google_googleapis/bazel-out/host/bin/external/gapic_generator_python/gapic_plugin.runfiles/gapic_generator_python/gapic/generator/__init__.py", line 21, in <module>
    from .generator import Generator
  File "/home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/sandbox/linux-sandbox/11/execroot/com_google_googleapis/bazel-out/host/bin/external/gapic_generator_python/gapic_plugin.runfiles/gapic_generator_python/gapic/generator/generator.py", line 24, in <module>
    from gapic.samplegen import manifest, samplegen
  File "/home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/sandbox/linux-sandbox/11/execroot/com_google_googleapis/bazel-out/host/bin/external/gapic_generator_python/gapic_plugin.runfiles/gapic_generator_python/gapic/samplegen/__init__.py", line 15, in <module>
    from gapic.samplegen import samplegen
  File "/home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/sandbox/linux-sandbox/11/execroot/com_google_googleapis/bazel-out/host/bin/external/gapic_generator_python/gapic_plugin.runfiles/gapic_generator_python/gapic/samplegen/samplegen.py", line 27, in <module>
    from gapic.schema import wrappers
  File "/home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/sandbox/linux-sandbox/11/execroot/com_google_googleapis/bazel-out/host/bin/external/gapic_generator_python/gapic_plugin.runfiles/gapic_generator_python/gapic/schema/__init__.py", line 23, in <module>
    from gapic.schema.api import API
  File "/home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/sandbox/linux-sandbox/11/execroot/com_google_googleapis/bazel-out/host/bin/external/gapic_generator_python/gapic_plugin.runfiles/gapic_generator_python/gapic/schema/api.py", line 29, in <module>
    from google.api_core import exceptions  # type: ignore
ModuleNotFoundError: No module named 'google.api_core'
--python_gapic_out: protoc-gen-python_gapic: Plugin failed with status code 1.
Target //google/cloud/documentai/v1beta2:documentai-v1beta2-py failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 1.138s, Critical Path: 0.87s
INFO: 0 processes.
FAILED: Build did NOT complete successfully
FAILED: Build did NOT complete successfully

Traceback (most recent call last):
  File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/tmpfs/src/github/synthtool/synthtool/__main__.py", line 102, in <module>
    main()
  File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/tmpfs/src/github/synthtool/synthtool/__main__.py", line 94, in main
    spec.loader.exec_module(synth_module)  # type: ignore
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/kbuilder/.cache/synthtool/python-documentai/synth.py", line 38, in <module>
    bazel_target=f"//google/cloud/documentai/{version}:documentai-{version}-py",
  File "/tmpfs/src/github/synthtool/synthtool/gcp/gapic_bazel.py", line 52, in py_library
    return self._generate_code(service, version, "python", **kwargs)
  File "/tmpfs/src/github/synthtool/synthtool/gcp/gapic_bazel.py", line 193, in _generate_code
    shell.run(bazel_run_args)
  File "/tmpfs/src/github/synthtool/synthtool/shell.py", line 39, in run
    raise exc
  File "/tmpfs/src/github/synthtool/synthtool/shell.py", line 33, in run
    encoding="utf-8",
  File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['bazel', '--max_idle_secs=240', 'build', '//google/cloud/documentai/v1beta2:documentai-v1beta2-py']' returned non-zero exit status 1.
2021-01-21 05:42:36,028 autosynth [ERROR] > Synthesis failed
2021-01-21 05:42:36,028 autosynth [DEBUG] > Running: git reset --hard HEAD
HEAD is now at f2cdc15 chore(deps): update dependency google-cloud-storage to v1.35.0 (#78)
2021-01-21 05:42:36,033 autosynth [DEBUG] > Running: git checkout autosynth
Switched to branch 'autosynth'
2021-01-21 05:42:36,038 autosynth [DEBUG] > Running: git clean -fdx
Removing __pycache__/
Traceback (most recent call last):
  File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 354, in <module>
    main()
  File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 189, in main
    return _inner_main(temp_dir)
  File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 334, in _inner_main
    commit_count = synthesize_loop(x, multiple_prs, change_pusher, synthesizer)
  File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 65, in synthesize_loop
    has_changes = toolbox.synthesize_version_in_new_branch(synthesizer, youngest)
  File "/tmpfs/src/github/synthtool/autosynth/synth_toolbox.py", line 259, in synthesize_version_in_new_branch
    synthesizer.synthesize(synth_log_path, self.environ)
  File "/tmpfs/src/github/synthtool/autosynth/synthesizer.py", line 120, in synthesize
    synth_proc.check_returncode()  # Raise an exception.
  File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/subprocess.py", line 389, in check_returncode
    self.stderr)
subprocess.CalledProcessError: Command '['/tmpfs/src/github/synthtool/env/bin/python3', '-m', 'synthtool', '--metadata', 'synth.metadata', 'synth.py', '--']' returned non-zero exit status 1.

Google internal developers can see the full log here.

Enable self-signed JWT flow for v1beta2

Document AI uses a regional host (us-documentai.googleapis.com) as the default for v1beta2, so the self-signed JWT flow cannot be used for v1beta2.

Re-enable the self-signed JWT flow once the default host is defined as documentai.googleapis.com.

The 'type' field of an entity is always empty

I'm trying to use Google Document AI to extract data from a PDF file, but the 'type' field of every entity is always empty. According to the documentation, it must be non-empty.

Environment details

  • Mac OS X - 10.15.6
  • Python version: 3.7.6
  • pip version: 19.3.1
  • google-cloud-documentai version: 0.2.0

Code example

My code is based on the Codelabs tutorial

from google.cloud import documentai_v1beta2 as documentai

def parse_invoice(project_id='myprojectid',
         input_uri='gs://cloud-samples-data/documentai/invoice.pdf'):
    """Process a single document with the Document AI API, including
    text extraction and entity extraction."""

    client = documentai.DocumentUnderstandingServiceClient()

    gcs_source = documentai.types.GcsSource(uri=input_uri)
    # mime_type can be application/pdf, image/tiff,
    # and image/gif, or application/json
    input_config = documentai.types.InputConfig(
        gcs_source=gcs_source, mime_type='application/pdf')

    entity_p = documentai.types.EntityExtractionParams(enabled=True)
    parent = 'projects/{}/locations/us'.format(project_id)
    
    request = documentai.types.ProcessDocumentRequest(
        parent=parent,
        input_config=input_config,
        entity_extraction_params=entity_p)
    document = client.process_document(request=request)
    print(document.entities)

Here is part of the output:

, text_anchor {
  text_segments {
    start_index: 273
    end_index: 303
  }
}
mention_text: "John Doe\[email protected]"
mention_id: "6"
confidence: 0.8736821413040161

Does anybody know what the problem might be?
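
Printing the whole entity omits fields that are unset, so it can help to read the type field explicitly for each entity. The sketch below does that. It rests on an assumption worth verifying against the installed release: proto-plus based versions of the client expose the field as `type_` (to avoid clashing with the Python builtin), while older protobuf-based versions expose it as `type`. `entity_type` and `summarize_entities` are hypothetical helper names, and `SimpleNamespace` objects stand in for real entities so the snippet runs offline.

```python
from types import SimpleNamespace


def entity_type(entity):
    """Return the entity's type string, or '' when it is unset."""
    # Assumption: the field is named ``type_`` or ``type`` depending on release.
    for name in ("type_", "type"):
        value = getattr(entity, name, None)
        if value:
            return value
    return ""


def summarize_entities(entities):
    """Build (type, mention_text, confidence) tuples for quick inspection."""
    return [(entity_type(e), e.mention_text, e.confidence) for e in entities]


# Stand-in entities; the values are illustrative only.
sample_entities = [
    SimpleNamespace(type_="", mention_text="John Doe", confidence=0.87),
    SimpleNamespace(type="supplier_name", mention_text="ACME", confidence=0.95),
]
summary = summarize_entities(sample_entities)
```

With a real response, `summarize_entities(document.entities)` would show at a glance whether every entity comes back without a type or only some of them.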

Dependency Dashboard

This issue provides visibility into Renovate updates and their statuses. Learn more

This repository currently has no open or pending branches.


  • Check this box to trigger a request for Renovate to run again on this repository

How to train a model and provide "model_version" for FormExtractionParams (google.cloud.documentai_v1beta2.types.FormExtractionParams)

Hi,

According to the API documentation for v1beta2, form extraction accepts a custom model ID / annotation dataset:

model_version
Model version of the form extraction system. Default is “builtin/stable”. Specify “builtin/latest” for the latest model. For custom form models, specify: “custom/{model_name}”. Model name format is “bucket_name/path/to/modeldir” corresponding to “gs://bucket_name/path/to/modeldir” where annotated examples are stored.

Is there any documentation or guideline on how to train the model, or on how to prepare the annotated dataset that can be placed in a GCS bucket and referenced in the model_version parameter?
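
For reference, the naming rule quoted above ("custom/{model_name}", where the model name is the gs:// path without the scheme) can be sketched as a small helper. This only illustrates the string format from the documentation excerpt; it does not cover how the model is trained or how the annotated examples are prepared. `custom_model_version` is a hypothetical name, not a library function.

```python
def custom_model_version(gcs_uri):
    """Turn 'gs://bucket/path/to/modeldir' into 'custom/bucket/path/to/modeldir'."""
    prefix = "gs://"
    if not gcs_uri.startswith(prefix):
        raise ValueError("expected a URI starting with gs://")
    # Drop the scheme and any trailing slash to get the documented model name.
    model_name = gcs_uri[len(prefix):].rstrip("/")
    if not model_name:
        raise ValueError("the URI does not name a bucket or path")
    return "custom/" + model_name


model_version = custom_model_version("gs://bucket_name/path/to/modeldir")
```

The resulting string is what would be passed as `model_version` when constructing `FormExtractionParams`.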
