This library has moved to https://github.com/googleapis/google-cloud-python/tree/main/packages/google-cloud-documentai
License: Apache License 2.0
python-documentai's Introduction
NOTE:
This GitHub repository is archived. The repository contents and history have moved to google-cloud-python.
Python Client for Document AI API
Document AI API: Service to parse structured information from unstructured or semi-structured documents using state-of-the-art Google AI such as natural language, computer vision, translation, and AutoML.
Install this library in a virtualenv using pip. virtualenv is a tool to
create isolated Python environments. The basic problem it addresses is one of
dependencies and versions, and indirectly permissions.
With virtualenv, it's possible to install this library without needing system
install permissions, and without clashing with the installed system
dependencies.
Code samples and snippets
Code samples and snippets live in the samples/ folder.
Supported Python Versions
Our client libraries are compatible with all current active and maintenance versions of
Python.
Python >= 3.7
Unsupported Python Versions
Python <= 3.6
If you are using an end-of-life
version of Python, we recommend that you update as soon as possible to an actively supported version.
def retry_target(target, predicate, sleep_generator, deadline, on_error=None):
"""Call a function and retry if it fails.
This is the lowest-level retry helper. Generally, you'll use the
higher-level retry helper :class:`Retry`.
Args:
target(Callable): The function to call and retry. This must be a
nullary function - apply arguments with `functools.partial`.
predicate (Callable[Exception]): A callable used to determine if an
exception raised by the target should be considered retryable.
It should return True to retry or False otherwise.
sleep_generator (Iterable[float]): An infinite iterator that determines
how long to sleep between retries.
deadline (float): How long to keep retrying the target. The last sleep
period is shortened as necessary, so that the last retry runs at
``deadline`` (and not considerably beyond it).
on_error (Callable[Exception]): A function to call while processing a
retryable exception. Any error raised by this function will *not*
be caught.
Returns:
Any: the return value of the target function.
Raises:
google.api_core.RetryError: If the deadline is exceeded while retrying.
ValueError: If the sleep generator stops yielding values.
Exception: If the target raises a method that isn't retryable.
"""
if deadline is not None:
deadline_datetime = datetime_helpers.utcnow() + datetime.timedelta(
seconds=deadline
)
else:
deadline_datetime = None
last_exc = None
for sleep in sleep_generator:
try:
self = <google.api_core.operation.Operation object at 0x7fe3c66fe350>
retry = <google.api_core.retry.Retry object at 0x7fe3c8a362d0>
def _done_or_raise(self, retry=DEFAULT_RETRY):
"""Check if the future is done and raise if it's not."""
kwargs = {} if retry is DEFAULT_RETRY else {"retry": retry}
if not self.done(**kwargs):
raise _OperationNotComplete()
E google.api_core.future.polling._OperationNotComplete
The above exception was the direct cause of the following exception:
self = <google.api_core.operation.Operation object at 0x7fe3c66fe350>
timeout = 120, retry = <google.api_core.retry.Retry object at 0x7fe3c8a362d0>
def _blocking_poll(self, timeout=None, retry=DEFAULT_RETRY):
"""Poll and wait for the Future to be resolved.
Args:
timeout (int):
How long (in seconds) to wait for the operation to complete.
If None, wait indefinitely.
"""
if self._result_set:
return
retry_ = self._retry.with_deadline(timeout)
try:
kwargs = {} if retry is DEFAULT_RETRY else {"retry": retry}
target = functools.partial(<bound method PollingFuture._done_or_raise of <google.api_core.operation.Operation object at 0x7fe3c66fe350>>)
predicate = <function if_exception_type..if_exception_type_predicate at 0x7fe3c8a2add0>
sleep_generator = <generator object exponential_sleep_generator at 0x7fe3c87e00d0>
deadline = 120, on_error = None
def retry_target(target, predicate, sleep_generator, deadline, on_error=None):
"""Call a function and retry if it fails.
This is the lowest-level retry helper. Generally, you'll use the
higher-level retry helper :class:`Retry`.
Args:
target(Callable): The function to call and retry. This must be a
nullary function - apply arguments with `functools.partial`.
predicate (Callable[Exception]): A callable used to determine if an
exception raised by the target should be considered retryable.
It should return True to retry or False otherwise.
sleep_generator (Iterable[float]): An infinite iterator that determines
how long to sleep between retries.
deadline (float): How long to keep retrying the target. The last sleep
period is shortened as necessary, so that the last retry runs at
``deadline`` (and not considerably beyond it).
on_error (Callable[Exception]): A function to call while processing a
retryable exception. Any error raised by this function will *not*
be caught.
Returns:
Any: the return value of the target function.
Raises:
google.api_core.RetryError: If the deadline is exceeded while retrying.
ValueError: If the sleep generator stops yielding values.
Exception: If the target raises a method that isn't retryable.
"""
if deadline is not None:
deadline_datetime = datetime_helpers.utcnow() + datetime.timedelta(
seconds=deadline
)
else:
deadline_datetime = None
last_exc = None
for sleep in sleep_generator:
try:
return target()
# pylint: disable=broad-except
# This function explicitly must deal with broad exceptions.
except Exception as exc:
if not predicate(exc):
raise
last_exc = exc
if on_error is not None:
on_error(exc)
now = datetime_helpers.utcnow()
if deadline_datetime is not None:
if deadline_datetime <= now:
six.raise_from(
exceptions.RetryError(
"Deadline of {:.1f}s exceeded while calling {}".format(
deadline, target
),
last_exc,
),
value = None, from_value = _OperationNotComplete()
???
E google.api_core.exceptions.RetryError: Deadline of 120.0s exceeded while calling functools.partial(<bound method PollingFuture._done_or_raise of <google.api_core.operation.Operation object at 0x7fe3c66fe350>>), last exception:
:3: RetryError
During handling of the above exception, another exception occurred:
capsys = <_pytest.capture.CaptureFixture object at 0x7fe3c87e6650>
batch_parse_table_v1beta2.py:92: in batch_parse_table
operation.result(timeout)
.nox/py-3-7/lib/python3.7/site-packages/google/api_core/future/polling.py:130: in result
self._blocking_poll(timeout=timeout, **kwargs)
self = <google.api_core.operation.Operation object at 0x7fe3c66fe350>
timeout = 120, retry = <google.api_core.retry.Retry object at 0x7fe3c8a362d0>
def _blocking_poll(self, timeout=None, retry=DEFAULT_RETRY):
"""Poll and wait for the Future to be resolved.
Args:
timeout (int):
How long (in seconds) to wait for the operation to complete.
If None, wait indefinitely.
"""
if self._result_set:
return
retry_ = self._retry.with_deadline(timeout)
try:
kwargs = {} if retry is DEFAULT_RETRY else {"retry": retry}
retry_(self._done_or_raise)(**kwargs)
except exceptions.RetryError:
raise concurrent.futures.TimeoutError(
"Operation did not complete within the designated " "timeout."
)
E concurrent.futures._base.TimeoutError: Operation did not complete within the designated timeout.
I've recently started using Document AI with the python SDK and the general parsers are giving quite decent results.
The HITL feature is something I would also like to try, but I did not find a way yet to do it.
I've been looking through the Python SDK documentation and there is a config mentioned, however I did not find a way to generate it/use it. https://cloud.google.com/document-ai/docs/reference/rest/v1/projects.locations.processors.humanReviewConfig/reviewDocument#path-parameters
How do I generate this humanReviewConfig and how do I later use it with my original batch requests?
My requests look like this:
request = documentai.types.document_processor_service.BatchProcessRequest(name=name, input_documents=input_config, document_output_config=output_config, skip_human_review=False)
Then I thought this might work, but it throws an error saying that this is an invalid constructor input for ReviewDocumentRequest:
request = documentai.types.document_processor_service.ReviewDocumentRequest(documentai.GcsDocument(gcs_uri=f"gs://{gcs_folder}/my-document.pdf", mime_type="application/pdf"))
What would be the correct approach to use this API?
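One possible reading of the error above is that the generated request classes expect keyword arguments, so a positional GcsDocument is rejected. The following is a minimal sketch, not a confirmed answer. It assumes the v1 ReviewDocumentRequest accepts a `human_review_config` resource name and an `inline_document` (an already processed Document rather than a Cloud Storage reference). The helper names `human_review_config_name` and `request_review` are hypothetical.

```python
def human_review_config_name(project_id: str, location: str, processor_id: str) -> str:
    """Build the resource name of a processor's human review config."""
    return (
        f"projects/{project_id}/locations/{location}"
        f"/processors/{processor_id}/humanReviewConfig"
    )


def request_review(project_id, location, processor_id, document):
    """Send an already processed Document for human review (assumed v1 API)."""
    # Imported lazily so the name helper above works without the library.
    from google.cloud import documentai_v1 as documentai

    client = documentai.DocumentProcessorServiceClient()
    # Assumption: fields are passed by keyword, never positionally.
    request = documentai.ReviewDocumentRequest(
        human_review_config=human_review_config_name(
            project_id, location, processor_id
        ),
        inline_document=document,
    )
    # Returns a long-running operation; call .result() on it to wait.
    return client.review_document(request=request)
```

Under these assumptions, the `document` argument would be the Document returned by an earlier process call, and human review must already be enabled for the processor.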
Thanks for stopping by to let us know something could be better!
PLEASE READ: If you have a support contract with Google, please create an issue in the support console instead of filing on GitHub. This will ensure a timely response.
Is your feature request related to a problem? Please describe.
I would love to save all the content of the response as a JSON object.
Describe the solution you'd like
Being able to easily save the response document object.
Describe alternatives you've considered
As an alternative I have to iterate over all the objects in the document and get each value.
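A minimal sketch of one way to do this, assuming the response document is a proto-plus message whose class exposes a `to_json` classmethod (true for proto-plus based releases; on raw-protobuf releases `google.protobuf.json_format.MessageToJson` is the usual equivalent). The function name `save_document_json` is hypothetical.

```python
from pathlib import Path


def save_document_json(document, path):
    """Serialize a proto-plus message to a JSON file and return the path."""
    # Assumption: the message class provides to_json(instance) -> str.
    text = type(document).to_json(document)
    out = Path(path)
    out.write_text(text, encoding="utf-8")
    return out
```

Calling `save_document_json(result.document, "document.json")` would then write the whole response document without iterating over its fields by hand.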
Hi, first of all thank you for this library. It has been very useful. However I'm having the following problem:
Is there any way to read a response generated by documentai_v1beta2.DocumentUnderstandingServiceClient().batch_process_documents(batch_request) as a documentai_v1beta2.types.Document object?
The pattern r"\d\d.0\d" is not properly normalized. I mean, the normalized values of amount and unit price entities are not correctly parsed when the decimal digit is 0 and the centesimal digit is different from 0. For instance, the text "18.02" is parsed with the normalized_value of "18.2" instead of "18.02".
Environment details
OS type and version:
Python version: 3.6.12
pip version: 20.2.4
google-cloud-documentai version: 0.4.0
Cloning into 'working_repo'...
Switched to branch 'autosynth'
Running synthtool
['/tmpfs/src/git/autosynth/env/bin/python3', '-m', 'synthtool', 'synth.py', '--']
synthtool > Executing /tmpfs/src/git/autosynth/working_repo/synth.py.
On branch autosynth
nothing to commit, working tree clean
HEAD detached at FETCH_HEAD
nothing to commit, working tree clean
synthtool > Ensuring dependencies.
synthtool > Pulling artman image.
latest: Pulling from googleapis/artman
Digest: sha256:6aec9c34db0e4be221cdaf6faba27bdc07cfea846808b3d3b964dfce3a9a0f9b
Status: Image is up to date for googleapis/artman:latest
synthtool > Cloning googleapis.
synthtool > Running generator for google/cloud/documentai/artman_documentai_v1beta1.yaml.
synthtool > Generated code into /home/kbuilder/.cache/synthtool/googleapis/artman-genfiles/python/documentai-v1beta1.
synthtool > Copy: /home/kbuilder/.cache/synthtool/googleapis/google/cloud/documentai/v1beta1/document_understanding.proto to /home/kbuilder/.cache/synthtool/googleapis/artman-genfiles/python/documentai-v1beta1/google/cloud/documentai_v1beta1/proto/document_understanding.proto
synthtool > Copy: /home/kbuilder/.cache/synthtool/googleapis/google/cloud/documentai/v1beta1/document.proto to /home/kbuilder/.cache/synthtool/googleapis/artman-genfiles/python/documentai-v1beta1/google/cloud/documentai_v1beta1/proto/document.proto
synthtool > Copy: /home/kbuilder/.cache/synthtool/googleapis/google/cloud/documentai/v1beta1/geometry.proto to /home/kbuilder/.cache/synthtool/googleapis/artman-genfiles/python/documentai-v1beta1/google/cloud/documentai_v1beta1/proto/geometry.proto
synthtool > Placed proto files into /home/kbuilder/.cache/synthtool/googleapis/artman-genfiles/python/documentai-v1beta1/google/cloud/documentai_v1beta1/proto.
synthtool > No replacements made in google/cloud/**/document_understanding_pb2.py for pattern \| Specifies a known document type for deeper structure
detection\. Valid values are currently "general" and
"invoice"\. If not provided, "general" \| is used as default.
If any other value is given, the request is rejected\., maybe replacement is not longer needed?
.coveragerc
.flake8
.github/CONTRIBUTING.md
.github/ISSUE_TEMPLATE/bug_report.md
.github/ISSUE_TEMPLATE/feature_request.md
.github/ISSUE_TEMPLATE/support_request.md
.github/PULL_REQUEST_TEMPLATE.md
.github/release-please.yml
.gitignore
.kokoro/build.sh
.kokoro/continuous/common.cfg
.kokoro/continuous/continuous.cfg
.kokoro/docs/common.cfg
.kokoro/docs/docs.cfg
.kokoro/presubmit/common.cfg
.kokoro/presubmit/presubmit.cfg
.kokoro/publish-docs.sh
.kokoro/release.sh
.kokoro/release/common.cfg
.kokoro/release/release.cfg
.kokoro/trampoline.sh
CODE_OF_CONDUCT.md
CONTRIBUTING.rst
LICENSE
MANIFEST.in
docs/_static/custom.css
docs/_templates/layout.html
docs/conf.py.j2
noxfile.py.j2
renovate.json
setup.cfg
Running session blacken
Creating virtual environment (virtualenv) using python3.6 in .nox/blacken
pip install black==19.3b0
Error: pip is not installed into the virtualenv, it is located at /tmpfs/src/git/autosynth/env/bin/pip. Pass external=True into run() to explicitly allow this.
Session blacken failed.
synthtool > Failed executing nox -s blacken:
None
synthtool > Wrote metadata to synth.metadata.
Traceback (most recent call last):
File "/home/kbuilder/.pyenv/versions/3.6.1/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/kbuilder/.pyenv/versions/3.6.1/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/tmpfs/src/git/autosynth/env/lib/python3.6/site-packages/synthtool/__main__.py", line 102, in <module>
main()
File "/tmpfs/src/git/autosynth/env/lib/python3.6/site-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/tmpfs/src/git/autosynth/env/lib/python3.6/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/tmpfs/src/git/autosynth/env/lib/python3.6/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/tmpfs/src/git/autosynth/env/lib/python3.6/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/tmpfs/src/git/autosynth/env/lib/python3.6/site-packages/synthtool/__main__.py", line 94, in main
spec.loader.exec_module(synth_module) # type: ignore
File "<frozen importlib._bootstrap_external>", line 678, in exec_module
File "<frozen importlib._bootstrap>", line 205, in _call_with_frames_removed
File "/tmpfs/src/git/autosynth/working_repo/synth.py", line 53, in <module>
s.shell.run(["nox", "-s", "blacken"], hide_output=False)
File "/tmpfs/src/git/autosynth/env/lib/python3.6/site-packages/synthtool/shell.py", line 39, in run
raise exc
File "/tmpfs/src/git/autosynth/env/lib/python3.6/site-packages/synthtool/shell.py", line 33, in run
encoding="utf-8",
File "/home/kbuilder/.pyenv/versions/3.6.1/lib/python3.6/subprocess.py", line 418, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['nox', '-s', 'blacken']' returned non-zero exit status 1.
Synthesis failed
Google internal developers can see the full log here.
I am trying to get a similar format as an input document. I assume this would be possible to use bounding_poly and normalized_vertices in the response. I am just not sure how to turn vertices into actual pixel locations yet. If someone has a code snippet, that would be awesome.
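A minimal sketch of the conversion, assuming each normalized vertex carries `x` and `y` values in the range 0 to 1 relative to the page, and that the page width and height in pixels are known (for example from the rendered image). The function name `to_pixel_coords` is hypothetical.

```python
def to_pixel_coords(normalized_vertices, width, height):
    """Scale normalized (0..1) vertices to integer pixel coordinates."""
    return [
        (round(vertex.x * width), round(vertex.y * height))
        for vertex in normalized_vertices
    ]
```

With a 1000 by 2000 pixel page, a vertex at (0.5, 0.5) maps to pixel (500, 1000). The resulting corner points can be passed to an image library to draw or crop the bounding box.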
Cloning into 'working_repo'...
Switched to branch 'autosynth'
Running synthtool
['/tmpfs/src/git/autosynth/env/bin/python3', '-m', 'synthtool', 'synth.py', '--']
synthtool > Executing /tmpfs/src/git/autosynth/working_repo/synth.py.
synthtool > Ensuring dependencies.
synthtool > Pulling artman image.
latest: Pulling from googleapis/artman
Digest: sha256:6aec9c34db0e4be221cdaf6faba27bdc07cfea846808b3d3b964dfce3a9a0f9b
Status: Image is up to date for googleapis/artman:latest
synthtool > Cloning googleapis.
synthtool > Running generator for google/cloud/documentai/artman_documentai_v1beta1.yaml.
synthtool > Generated code into /home/kbuilder/.cache/synthtool/googleapis/artman-genfiles/python/documentai-v1beta1.
synthtool > Copy: /home/kbuilder/.cache/synthtool/googleapis/google/cloud/documentai/v1beta1/document_understanding.proto to /home/kbuilder/.cache/synthtool/googleapis/artman-genfiles/python/documentai-v1beta1/google/cloud/documentai_v1beta1/proto/document_understanding.proto
synthtool > Copy: /home/kbuilder/.cache/synthtool/googleapis/google/cloud/documentai/v1beta1/document.proto to /home/kbuilder/.cache/synthtool/googleapis/artman-genfiles/python/documentai-v1beta1/google/cloud/documentai_v1beta1/proto/document.proto
synthtool > Copy: /home/kbuilder/.cache/synthtool/googleapis/google/cloud/documentai/v1beta1/geometry.proto to /home/kbuilder/.cache/synthtool/googleapis/artman-genfiles/python/documentai-v1beta1/google/cloud/documentai_v1beta1/proto/geometry.proto
synthtool > Placed proto files into /home/kbuilder/.cache/synthtool/googleapis/artman-genfiles/python/documentai-v1beta1/google/cloud/documentai_v1beta1/proto.
synthtool > No replacements made in google/cloud/**/document_understanding_pb2.py for pattern \| Specifies a known document type for deeper structure
detection\. Valid values are currently "general" and
"invoice"\. If not provided, "general" \| is used as default.
If any other value is given, the request is rejected\., maybe replacement is not longer needed?
.coveragerc
.flake8
.github/CONTRIBUTING.md
.github/ISSUE_TEMPLATE/bug_report.md
.github/ISSUE_TEMPLATE/feature_request.md
.github/ISSUE_TEMPLATE/support_request.md
.github/PULL_REQUEST_TEMPLATE.md
.github/release-please.yml
.gitignore
.kokoro/build.sh
.kokoro/continuous/common.cfg
.kokoro/continuous/continuous.cfg
.kokoro/docs/common.cfg
.kokoro/docs/docs.cfg
.kokoro/presubmit/common.cfg
.kokoro/presubmit/presubmit.cfg
.kokoro/publish-docs.sh
.kokoro/release.sh
.kokoro/release/common.cfg
.kokoro/release/release.cfg
.kokoro/trampoline.sh
CODE_OF_CONDUCT.md
CONTRIBUTING.rst
LICENSE
MANIFEST.in
docs/_static/custom.css
docs/_templates/layout.html
docs/conf.py.j2
noxfile.py.j2
renovate.json
setup.cfg
Running session blacken
Creating virtual environment (virtualenv) using python3.6 in .nox/blacken
pip install black==19.3b0
Error: pip is not installed into the virtualenv, it is located at /tmpfs/src/git/autosynth/env/bin/pip. Pass external=True into run() to explicitly allow this.
Session blacken failed.
synthtool > Failed executing nox -s blacken:
None
synthtool > Wrote metadata to synth.metadata.
Traceback (most recent call last):
File "/home/kbuilder/.pyenv/versions/3.6.1/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/kbuilder/.pyenv/versions/3.6.1/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/tmpfs/src/git/autosynth/env/lib/python3.6/site-packages/synthtool/__main__.py", line 99, in <module>
main()
File "/tmpfs/src/git/autosynth/env/lib/python3.6/site-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/tmpfs/src/git/autosynth/env/lib/python3.6/site-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/tmpfs/src/git/autosynth/env/lib/python3.6/site-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/tmpfs/src/git/autosynth/env/lib/python3.6/site-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/tmpfs/src/git/autosynth/env/lib/python3.6/site-packages/synthtool/__main__.py", line 91, in main
spec.loader.exec_module(synth_module) # type: ignore
File "<frozen importlib._bootstrap_external>", line 678, in exec_module
File "<frozen importlib._bootstrap>", line 205, in _call_with_frames_removed
File "/tmpfs/src/git/autosynth/working_repo/synth.py", line 53, in <module>
s.shell.run(["nox", "-s", "blacken"], hide_output=False)
File "/tmpfs/src/git/autosynth/env/lib/python3.6/site-packages/synthtool/shell.py", line 39, in run
raise exc
File "/tmpfs/src/git/autosynth/env/lib/python3.6/site-packages/synthtool/shell.py", line 33, in run
encoding="utf-8",
File "/home/kbuilder/.pyenv/versions/3.6.1/lib/python3.6/subprocess.py", line 418, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['nox', '-s', 'blacken']' returned non-zero exit status 1.
Synthesis failed
Google internal developers can see the full log here.
# Read the text recognition output from the processor
for page in document.pages:
    for form_field in page.form_fields:
        field_name = get_text(form_field.field_name, document)
        field_value = get_text(form_field.field_value, document)
        print("Extracted key value pair:")
        print(f"\t{field_name}, {field_value}")
    for paragraph in document.pages:
        paragraph_text = get_text(paragraph.layout, document)
        print(f"Paragraph text:\n{paragraph_text}")
The intended code, which matches the "Upload Test Document" functionality, should be:
for page in document.pages:
    for form_field in page.form_fields:
        field_name = get_text(form_field.field_name, document)
        field_value = get_text(form_field.field_value, document)
        print("Extracted key value pair:")
        print(f"\t{field_name}, {field_value}")
    for paragraph in page.paragraphs:
        paragraph_text = get_text(paragraph.layout, document)
        print(f"Paragraph text:\n{paragraph_text}")
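The loops above rely on a `get_text` helper that is not shown here. A minimal sketch of what such a helper might look like, assuming each layout carries a `text_anchor` whose `text_segments` hold start and end offsets into `document.text`. The stand-in objects are hypothetical and only mimic the shape of an API response.

```python
from types import SimpleNamespace as NS


def get_text(doc_element, document):
    """Join the slices of document.text referenced by a text anchor."""
    text = ""
    # Text that spans several lines is stored as several segments.
    for segment in doc_element.text_anchor.text_segments:
        start = int(segment.start_index)
        end = int(segment.end_index)
        text += document.text[start:end]
    return text


# Hypothetical stand-ins shaped like the response objects.
document = NS(text="Name: Ada\nCity: London\n")
layout = NS(text_anchor=NS(text_segments=[NS(start_index=6, end_index=9)]))
print(get_text(layout, document))  # prints "Ada"
```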
OS type and version: NAME="Ubuntu" VERSION="18.04.5
Python version: Python 3.7.8
pip version: pip 20.2.4 from /opt/conda/lib/python3.7/site-packages/pip (python 3.7)
google-cloud-documentai version: Version: 0.3.0
Steps to reproduce
using jupyter notebook hosted in Google Cloud Notebook Instance
from google.cloud import documentai_v1beta3 as documentai
Error:
ImportError: cannot import name 'documentai_v1beta3' from 'google.cloud' (unknown location)
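A small diagnostic sketch for this kind of ImportError, assuming the cause may be that the installed release does not ship the requested API version. Another report in this list uses v1beta3 with client library 0.4.0, so upgrading the package may help, though that is not confirmed here. The function name `module_available` is hypothetical.

```python
import importlib.util


def module_available(dotted_name: str) -> bool:
    """Return True if the module can be found without fully importing it."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package of dotted_name is missing.
        return False


for name in ("google.cloud.documentai_v1beta2", "google.cloud.documentai_v1beta3"):
    print(name, module_available(name))
```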
This is an issue to add samples to cover each category of Document AI's API processing responses:
OCR
form leaving
quality
splitter
specialized
Currently only the form processor response is covered, and it only includes generic information on how to parse paragraphs. Each of these samples should cover aspects that are unique to the category and general information that most or all developers will need to know how to parse and use.
This will give developers a starting point for each processor type that can be quickly and easily adapted to their use case.
Put the following Py file and document.jpg into the same folder.
Run python test_document_ai.py
Code example
from google.cloud import documentai_v1 as documentai
import os

# TODO(developer): Uncomment these variables before running the sample.
project_id = '123456789'
location = 'us'  # Format is 'us' or 'eu'
processor_id = '1a23345gh823892'  # Create processor in Cloud Console
file_path = 'document.jpg'
os.environ['GRPC_DNS_RESOLVER'] = 'native'


def quickstart(project_id: str, location: str, processor_id: str, file_path: str):
    # You must set the api_endpoint if you use a location other than 'us', e.g.:
    opts = {}
    if location == "eu":
        opts = {"api_endpoint": "eu-documentai.googleapis.com"}
    client = documentai.DocumentProcessorServiceClient(client_options=opts)
    # The full resource name of the processor, e.g.:
    # projects/project-id/locations/location/processor/processor-id
    # You must create new processors in the Cloud Console first
    name = f"projects/{project_id}/locations/{location}/processors/{processor_id}:process"
    # Read the file into memory
    with open(file_path, "rb") as image:
        image_content = image.read()
    document = {"content": image_content, "mime_type": "image/jpeg"}
    # Configure the process request
    request = {"name": name, "raw_document": document}
    result = client.process_document(request=request)
    document = result.document
    document_pages = document.pages
    # For a full list of Document object attributes, please reference this page: https://googleapis.dev/python/documentai/latest/_modules/google/cloud/documentai_v1beta3/types/document.html#Document
    # Read the text recognition output from the processor
    print("The document contains the following paragraphs:")
    for page in document_pages:
        paragraphs = page.paragraphs
        for paragraph in paragraphs:
            print(paragraph)
            paragraph_text = get_text(paragraph.layout, document)
            print(f"Paragraph text: {paragraph_text}")


def get_text(doc_element: dict, document: dict):
    """
    Document AI identifies form fields by their offsets
    in document text. This function converts offsets
    to text snippets.
    """
    response = ""
    # If a text segment spans several lines, it will
    # be stored in different text segments.
    for segment in doc_element.text_anchor.text_segments:
        start_index = (
            int(segment.start_index)
            if segment in doc_element.text_anchor.text_segments
            else 0
        )
        end_index = int(segment.end_index)
        response += document.text[start_index:end_index]
    return response


def main():
    quickstart(project_id=project_id, location=location, processor_id=processor_id, file_path=file_path)


if __name__ == '__main__':
    main()
Stack trace
metadata=[('x-goog-request-params', 'name=projects/my_proj_id/locations/us/processors/my_processor_id'), ('x-goog-api-client', 'gl-python/3.8.10 grpc/1.38.1 gax/1.30.0 gapic/1.0.0')]), last exception: 503 DNS resolution failed for service: https://us-documentai.googleapis.com/v1/
I can use the DocumentAI service using the web interface, so I assume that there is something wrong with the local Python code?
def retry_target(target, predicate, sleep_generator, deadline, on_error=None):
"""Call a function and retry if it fails.
This is the lowest-level retry helper. Generally, you'll use the
higher-level retry helper :class:`Retry`.
Args:
target(Callable): The function to call and retry. This must be a
nullary function - apply arguments with `functools.partial`.
predicate (Callable[Exception]): A callable used to determine if an
exception raised by the target should be considered retryable.
It should return True to retry or False otherwise.
sleep_generator (Iterable[float]): An infinite iterator that determines
how long to sleep between retries.
deadline (float): How long to keep retrying the target. The last sleep
period is shortened as necessary, so that the last retry runs at
``deadline`` (and not considerably beyond it).
on_error (Callable[Exception]): A function to call while processing a
retryable exception. Any error raised by this function will *not*
be caught.
Returns:
Any: the return value of the target function.
Raises:
google.api_core.RetryError: If the deadline is exceeded while retrying.
ValueError: If the sleep generator stops yielding values.
Exception: If the target raises a method that isn't retryable.
"""
if deadline is not None:
deadline_datetime = datetime_helpers.utcnow() + datetime.timedelta(
seconds=deadline
)
else:
deadline_datetime = None
last_exc = None
for sleep in sleep_generator:
try:
self = <google.api_core.operation.Operation object at 0x7f45e5dec0a0>
retry = <google.api_core.retry.Retry object at 0x7f45e88e7940>
def _done_or_raise(self, retry=DEFAULT_RETRY):
"""Check if the future is done and raise if it's not."""
kwargs = {} if retry is DEFAULT_RETRY else {"retry": retry}
if not self.done(**kwargs):
raise _OperationNotComplete()
E google.api_core.future.polling._OperationNotComplete
The above exception was the direct cause of the following exception:
self = <google.api_core.operation.Operation object at 0x7f45e5dec0a0>
timeout = 300, retry = <google.api_core.retry.Retry object at 0x7f45e88e7940>
def _blocking_poll(self, timeout=None, retry=DEFAULT_RETRY):
"""Poll and wait for the Future to be resolved.
Args:
timeout (int):
How long (in seconds) to wait for the operation to complete.
If None, wait indefinitely.
"""
if self._result_set:
return
retry_ = self._retry.with_deadline(timeout)
try:
kwargs = {} if retry is DEFAULT_RETRY else {"retry": retry}
target = functools.partial(<bound method PollingFuture._done_or_raise of <google.api_core.operation.Operation object at 0x7f45e5dec0a0>>)
predicate = <function if_exception_type..if_exception_type_predicate at 0x7f45e88e1e50>
sleep_generator = <generator object exponential_sleep_generator at 0x7f45e5d8f740>
deadline = 300, on_error = None
def retry_target(target, predicate, sleep_generator, deadline, on_error=None):
"""Call a function and retry if it fails.
This is the lowest-level retry helper. Generally, you'll use the
higher-level retry helper :class:`Retry`.
Args:
target(Callable): The function to call and retry. This must be a
nullary function - apply arguments with `functools.partial`.
predicate (Callable[Exception]): A callable used to determine if an
exception raised by the target should be considered retryable.
It should return True to retry or False otherwise.
sleep_generator (Iterable[float]): An infinite iterator that determines
how long to sleep between retries.
deadline (float): How long to keep retrying the target. The last sleep
period is shortened as necessary, so that the last retry runs at
``deadline`` (and not considerably beyond it).
on_error (Callable[Exception]): A function to call while processing a
retryable exception. Any error raised by this function will *not*
be caught.
Returns:
Any: the return value of the target function.
Raises:
google.api_core.RetryError: If the deadline is exceeded while retrying.
ValueError: If the sleep generator stops yielding values.
Exception: If the target raises a method that isn't retryable.
"""
if deadline is not None:
deadline_datetime = datetime_helpers.utcnow() + datetime.timedelta(
seconds=deadline
)
else:
deadline_datetime = None
last_exc = None
for sleep in sleep_generator:
try:
return target()
# pylint: disable=broad-except
# This function explicitly must deal with broad exceptions.
except Exception as exc:
if not predicate(exc):
raise
last_exc = exc
if on_error is not None:
on_error(exc)
now = datetime_helpers.utcnow()
if deadline_datetime is not None:
if deadline_datetime <= now:
six.raise_from(
exceptions.RetryError(
"Deadline of {:.1f}s exceeded while calling {}".format(
deadline, target
),
last_exc,
),
last_exc,
)
value = None, from_value = _OperationNotComplete()
???
E google.api_core.exceptions.RetryError: Deadline of 300.0s exceeded while calling functools.partial(<bound method PollingFuture._done_or_raise of <google.api_core.operation.Operation object at 0x7f45e5dec0a0>>), last exception:
:3: RetryError
During handling of the above exception, another exception occurred:
capsys = <_pytest.capture.CaptureFixture object at 0x7f45e5d89a90>
test_bucket = 'document-ai-python-b9edc129-1f9c-4575-94fa-1d80729c9f79'
batch_process_documents_sample_v1beta3.py:72: in batch_process_documents
operation.result(timeout=timeout)
.nox/py-3-8/lib/python3.8/site-packages/google/api_core/future/polling.py:129: in result
self._blocking_poll(timeout=timeout, **kwargs)
self = <google.api_core.operation.Operation object at 0x7f45e5dec0a0>
timeout = 300, retry = <google.api_core.retry.Retry object at 0x7f45e88e7940>
def _blocking_poll(self, timeout=None, retry=DEFAULT_RETRY):
"""Poll and wait for the Future to be resolved.
Args:
timeout (int):
How long (in seconds) to wait for the operation to complete.
If None, wait indefinitely.
"""
if self._result_set:
return
retry_ = self._retry.with_deadline(timeout)
try:
kwargs = {} if retry is DEFAULT_RETRY else {"retry": retry}
retry_(self._done_or_raise)(**kwargs)
except exceptions.RetryError:
raise concurrent.futures.TimeoutError(
"Operation did not complete within the designated " "timeout."
)
E concurrent.futures._base.TimeoutError: Operation did not complete within the designated timeout.
Using v1beta3 with client library v0.4.0, process_document() (the sync version) does not retry when it hits the transient error RESOURCE_EXHAUSTED
We have a Cloud Function that processes PDF files as they are copied to GCS. Depending on the load, we sometimes (unsurprisingly) hit the quota limit: Quota exceeded for quota metric 'Number of online process document requests using document processor' and limit 'Number of online process document requests using document processor per minute' of service 'documentai.googleapis.com' for consumer 'project_number:xxxx'.
This error is retriable (gRPC status 8) and should be retried by default, but it is not, and the function stops with:
I don't have a snippet to reproduce this without GCF, but a simple loop (async?) that calls the API at a high rate should trigger the quota error.
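One possible caller-side workaround is to retry explicitly. The sketch below is stdlib-only and uses hypothetical names (QuotaExceeded, call_with_backoff, make_flaky_process_document). With the real client, the analogous approach would presumably be to pass a google.api_core.retry.Retry whose predicate matches ResourceExhausted through the retry argument of process_document(). That is an assumption about a workaround, not a fix confirmed by this report.

```python
import random
import time


class QuotaExceeded(Exception):
    """Hypothetical stand-in for a RESOURCE_EXHAUSTED (gRPC status 8) error."""


def call_with_backoff(func, retry_on, max_attempts=5, initial=0.01, multiplier=2.0, maximum=0.1):
    """Call func(), retrying with jittered exponential backoff on retry_on."""
    delay = initial
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error.
            # Full jitter keeps concurrent callers from retrying in lockstep.
            time.sleep(random.uniform(0, delay))
            delay = min(delay * multiplier, maximum)


def make_flaky_process_document(failures):
    """Return a fake call that raises QuotaExceeded `failures` times, then succeeds."""
    state = {"calls": 0}

    def flaky_process_document():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise QuotaExceeded("quota exceeded (simulated)")
        return "document"

    return flaky_process_document, state


if __name__ == "__main__":
    fake_call, state = make_flaky_process_document(failures=2)
    result = call_with_backoff(fake_call, retry_on=QuotaExceeded)
    print(result, "after", state["calls"], "calls")
```

The delays here are tiny so the example runs quickly. A per-minute quota would call for delays on the order of seconds.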
_core.tp_print = 0;
^~~~~~~~
tp_dict
bazel-out/host/bin/external/com_github_grpc_grpc/src/python/grpcio/grpc/_cython/cygrpc.cpp:132284:72: error: 'PyTypeObject {aka struct _typeobject}' has no member named 'tp_print'; did you mean 'tp_dict'?
__pyx_type_7_cython_6cygrpc___pyx_scope_struct_55__schedule_rpc_coro.tp_print = 0;
^~~~~~~~
tp_dict
bazel-out/host/bin/external/com_github_grpc_grpc/src/python/grpcio/grpc/_cython/cygrpc.cpp:132290:65: error: 'PyTypeObject {aka struct _typeobject}' has no member named 'tp_print'; did you mean 'tp_dict'?
__pyx_type_7_cython_6cygrpc___pyx_scope_struct_56__handle_rpc.tp_print = 0;
^~~~~~~~
tp_dict
bazel-out/host/bin/external/com_github_grpc_grpc/src/python/grpcio/grpc/_cython/cygrpc.cpp:132296:67: error: 'PyTypeObject {aka struct _typeobject}' has no member named 'tp_print'; did you mean 'tp_dict'?
__pyx_type_7_cython_6cygrpc___pyx_scope_struct_57__request_call.tp_print = 0;
^~~~~~~~
tp_dict
bazel-out/host/bin/external/com_github_grpc_grpc/src/python/grpcio/grpc/_cython/cygrpc.cpp:132302:71: error: 'PyTypeObject {aka struct _typeobject}' has no member named 'tp_print'; did you mean 'tp_dict'?
__pyx_type_7_cython_6cygrpc___pyx_scope_struct_58__server_main_loop.tp_print = 0;
^~~~~~~~
tp_dict
bazel-out/host/bin/external/com_github_grpc_grpc/src/python/grpcio/grpc/_cython/cygrpc.cpp:132308:59: error: 'PyTypeObject {aka struct _typeobject}' has no member named 'tp_print'; did you mean 'tp_dict'?
__pyx_type_7_cython_6cygrpc___pyx_scope_struct_59_start.tp_print = 0;
^~~~~~~~
tp_dict
bazel-out/host/bin/external/com_github_grpc_grpc/src/python/grpcio/grpc/_cython/cygrpc.cpp:132314:74: error: 'PyTypeObject {aka struct _typeobject}' has no member named 'tp_print'; did you mean 'tp_dict'?
__pyx_type_7_cython_6cygrpc___pyx_scope_struct_60__start_shutting_down.tp_print = 0;
^~~~~~~~
tp_dict
bazel-out/host/bin/external/com_github_grpc_grpc/src/python/grpcio/grpc/_cython/cygrpc.cpp:132320:62: error: 'PyTypeObject {aka struct _typeobject}' has no member named 'tp_print'; did you mean 'tp_dict'?
__pyx_type_7_cython_6cygrpc___pyx_scope_struct_61_shutdown.tp_print = 0;
^~~~~~~~
tp_dict
bazel-out/host/bin/external/com_github_grpc_grpc/src/python/grpcio/grpc/_cython/cygrpc.cpp:132326:74: error: 'PyTypeObject {aka struct _typeobject}' has no member named 'tp_print'; did you mean 'tp_dict'?
__pyx_type_7_cython_6cygrpc___pyx_scope_struct_62_wait_for_termination.tp_print = 0;
^~~~~~~~
tp_dict
bazel-out/host/bin/external/com_github_grpc_grpc/src/python/grpcio/grpc/_cython/cygrpc.cpp: In function 'PyObject* __Pyx_decode_c_bytes(const char*, Py_ssize_t, Py_ssize_t, Py_ssize_t, const char*, const char*, PyObject* (*)(const char*, Py_ssize_t, const char*))':
bazel-out/host/bin/external/com_github_grpc_grpc/src/python/grpcio/grpc/_cython/cygrpc.cpp:136866:45: warning: 'PyObject* PyUnicode_FromUnicode(const Py_UNICODE*, Py_ssize_t)' is deprecated [-Wdeprecated-declarations]
return PyUnicode_FromUnicode(NULL, 0);
^
In file included from bazel-out/host/bin/external/local_config_python/_python3/_python3_include/unicodeobject.h:1026:0,
from bazel-out/host/bin/external/local_config_python/_python3/_python3_include/Python.h:97,
from bazel-out/host/bin/external/com_github_grpc_grpc/src/python/grpcio/grpc/_cython/cygrpc.cpp:4:
bazel-out/host/bin/external/local_config_python/_python3/_python3_include/cpython/unicodeobject.h:551:42: note: declared here
Py_DEPRECATED(3.3) PyAPI_FUNC(PyObject*) PyUnicode_FromUnicode(
^~~~~~~~~~~~~~~~~~~~~
bazel-out/host/bin/external/com_github_grpc_grpc/src/python/grpcio/grpc/_cython/cygrpc.cpp: In function 'void __pyx_f_7_cython_6cygrpc__unified_socket_write(int)':
bazel-out/host/bin/external/com_github_grpc_grpc/src/python/grpcio/grpc/_cython/cygrpc.cpp:72692:3: warning: ignoring return value of 'ssize_t write(int, const void*, size_t)', declared with attribute warn_unused_result [-Wunused-result]
(void)(write(__pyx_v_fd, ((char *)"1"), 1));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
bazel-out/host/bin/external/com_github_grpc_grpc/src/python/grpcio/grpc/_cython/cygrpc.cpp: At global scope:
bazel-out/host/bin/external/com_github_grpc_grpc/src/python/grpcio/grpc/_cython/cygrpc.cpp:144607:1: warning: 'void __Pyx_PyAsyncGen_Fini()' defined but not used [-Wunused-function]
__Pyx_PyAsyncGen_Fini(void)
^~~~~~~~~~~~~~~~~~~~~
Target //google/cloud/documentai/v1beta2:documentai-v1beta2-py failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 4.113s, Critical Path: 3.84s
INFO: 9 processes: 9 linux-sandbox.
FAILED: Build did NOT complete successfully
FAILED: Build did NOT complete successfully
Traceback (most recent call last):
File "/usr/local/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/local/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/tmpfs/src/github/synthtool/synthtool/__main__.py", line 102, in <module>
main()
File "/tmpfs/src/github/synthtool/env/lib/python3.9/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/tmpfs/src/github/synthtool/env/lib/python3.9/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/tmpfs/src/github/synthtool/env/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/tmpfs/src/github/synthtool/env/lib/python3.9/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/tmpfs/src/github/synthtool/synthtool/__main__.py", line 94, in main
spec.loader.exec_module(synth_module) # type: ignore
File "<frozen importlib._bootstrap_external>", line 790, in exec_module
File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
File "/root/.cache/synthtool/python-documentai/synth.py", line 35, in <module>
library = gapic.py_library(
File "/tmpfs/src/github/synthtool/synthtool/gcp/gapic_bazel.py", line 45, in py_library
return self._generate_code(service, version, "python", **kwargs)
File "/tmpfs/src/github/synthtool/synthtool/gcp/gapic_bazel.py", line 182, in _generate_code
shell.run(bazel_run_args)
File "/tmpfs/src/github/synthtool/synthtool/shell.py", line 39, in run
raise exc
File "/tmpfs/src/github/synthtool/synthtool/shell.py", line 27, in run
return subprocess.run(
File "/usr/local/lib/python3.9/subprocess.py", line 524, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['bazel', '--max_idle_secs=240', 'build', '//google/cloud/documentai/v1beta2:documentai-v1beta2-py']' returned non-zero exit status 1.
2020-12-05 03:06:32,936 autosynth [ERROR] > Synthesis failed
2020-12-05 03:06:32,937 autosynth [DEBUG] > Running: git reset --hard HEAD
HEAD is now at bf3aba3 samples(fix): change comments to match function signature (#68)
2020-12-05 03:06:32,942 autosynth [DEBUG] > Running: git checkout autosynth
Switched to branch 'autosynth'
2020-12-05 03:06:32,946 autosynth [DEBUG] > Running: git clean -fdx
Removing __pycache__/
Traceback (most recent call last):
File "/usr/local/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/local/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 354, in <module>
main()
File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 189, in main
return _inner_main(temp_dir)
File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 334, in _inner_main
commit_count = synthesize_loop(x, multiple_prs, change_pusher, synthesizer)
File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 65, in synthesize_loop
has_changes = toolbox.synthesize_version_in_new_branch(synthesizer, youngest)
File "/tmpfs/src/github/synthtool/autosynth/synth_toolbox.py", line 259, in synthesize_version_in_new_branch
synthesizer.synthesize(synth_log_path, self.environ)
File "/tmpfs/src/github/synthtool/autosynth/synthesizer.py", line 120, in synthesize
synth_proc.check_returncode() # Raise an exception.
File "/usr/local/lib/python3.9/subprocess.py", line 456, in check_returncode
raise CalledProcessError(self.returncode, self.args, self.stdout,
subprocess.CalledProcessError: Command '['/tmpfs/src/github/synthtool/env/bin/python3', '-m', 'synthtool', '--metadata', 'synth.metadata', 'synth.py', '--']' returned non-zero exit status 1.
Google internal developers can see the full log here.
I get a ResourceExhausted error for some PDF files. This is odd: for instance, one of them is only 177.9 KB on disk (237.14 KB as a base64 string). The error says "ResourceExhausted: 429 Received message larger than max (5218782 vs. 4194304)".
Environment details
OS type and version:
Python version: 3.6.12
pip version: 20.2.4
google-cloud-documentai version: 0.3.0
Trace
_InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.RESOURCE_EXHAUSTED
details = "Received message larger than max (5218782 vs. 4194304)"
debug_error_string = "{"created":"@1612458152.165208175","description":"Received message larger than max (5218782 vs. 4194304)","file":"src/core/ext/filters/message_size/message_size_filter.cc","file_line":204,"grpc_status":8}"
>
File "/usr/local/lib/python3.6/site-packages/google/api_core/grpc_helpers.py", line 57, in error_remapped_callable
return callable_(*args, **kwargs)
File "grpc/_channel.py", line 923, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "grpc/_channel.py", line 826, in _end_unary_response_blocking
raise _InactiveRpcError(state)
ResourceExhausted: 429 Received message larger than max (5218782 vs. 4194304)
result = client.process_document(request=request, timeout=DOCUMENTAI_TIMEOUT)
File "/usr/local/lib/python3.6/site-packages/google/cloud/documentai_v1beta3/services/document_processor_service/client.py", line 327, in process_document
response = rpc(request, retry=retry, timeout=timeout, metadata=metadata,)
File "/usr/local/lib/python3.6/site-packages/google/api_core/gapic_v1/method.py", line 145, in __call__
return wrapped_func(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/google/api_core/retry.py", line 286, in retry_wrapped_func
on_error=on_error,
File "/usr/local/lib/python3.6/site-packages/google/api_core/retry.py", line 184, in retry_target
return target()
File "/usr/local/lib/python3.6/site-packages/google/api_core/timeout.py", line 102, in func_with_timeout
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/google/api_core/grpc_helpers.py", line 59, in error_remapped_callable
six.raise_from(exceptions.from_grpc_error(exc), exc)
File "<string>", line 3, in raise_from
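The two numbers in the error are byte counts, and 4194304 is exactly 4 MiB. Because the message is described as "received", the oversized payload is presumably the response rather than the uploaded PDF. That is an interpretation the trace does not confirm. The sketch below, with hypothetical helper names, parses such a message and shows how much base64 encoding inflates a payload.

```python
import base64
import re

# Pattern for the message text quoted in the trace above.
_SIZE_ERROR = re.compile(r"Received message larger than max \((\d+) vs\. (\d+)\)")


def parse_size_error(message):
    """Return (received_bytes, limit_bytes) from the error text, or None."""
    match = _SIZE_ERROR.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def base64_length(raw_length):
    """Length in bytes of the padded base64 encoding of raw_length bytes."""
    return 4 * ((raw_length + 2) // 3)


if __name__ == "__main__":
    received, limit = parse_size_error(
        "429 Received message larger than max (5218782 vs. 4194304)"
    )
    print(f"limit is {limit / 2**20:.0f} MiB, exceeded by {received - limit} bytes")

    # Cross-check the formula against the real encoder.
    payload = b"x" * 1000
    assert len(base64.b64encode(payload)) == base64_length(len(payload))
    print("1000 raw bytes ->", base64_length(1000), "base64 bytes")
```

Base64 grows a payload by about one third, so a 177.9 KB file encodes to roughly 237 KB, far below the 4 MiB limit. That is consistent with the request not being the oversized message.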
I am trying to use the built-in Document AI v1beta2 client library to parse tables inside a document.
The call returns a JSON result object, but it does not contain table headers for every table in the document. It shows the header for the first table only, and the data of all the other tables appears under body rows.
I also tried the try-out function from the official Document AI documentation, and it returns the correct results, with the data structured as required.
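To see which tables lack headers, the JSON result can be walked table by table. The sketch below assumes the JSON follows the Document message layout with pages, tables, headerRows, and bodyRows keys. Those key names and the sample data are assumptions for illustration and may differ from the actual v1beta2 output.

```python
def summarize_tables(document):
    """Return one (page_index, table_index, header_count, body_count) per table."""
    summary = []
    for page_index, page in enumerate(document.get("pages", [])):
        for table_index, table in enumerate(page.get("tables", [])):
            summary.append(
                (
                    page_index,
                    table_index,
                    len(table.get("headerRows", [])),
                    len(table.get("bodyRows", [])),
                )
            )
    return summary


def tables_without_headers(document):
    """Return (page_index, table_index) for every table with no header rows."""
    return [(p, t) for p, t, headers, _ in summarize_tables(document) if headers == 0]


if __name__ == "__main__":
    # Made-up result mirroring the report: only the first table has a header.
    sample = {
        "pages": [
            {
                "tables": [
                    {"headerRows": [{"cells": []}], "bodyRows": [{"cells": []}]},
                    {"bodyRows": [{"cells": []}, {"cells": []}]},
                ]
            }
        ]
    }
    print(summarize_tables(sample))
    print(tables_without_headers(sample))
```

Running the same summary over the client library output and the try-out output would show whether the two responses really differ in structure.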
def _handle_error_response(response_data):
    """Translates an error response into an exception.

    Args:
        response_data (Mapping): The decoded response data.

    Raises:
        google.auth.exceptions.RefreshError: The errors contained in response_data.
    """
    try:
        error_details = "{}: {}".format(
            response_data["error"], response_data.get("error_description")
        )
    # If no details could be extracted, use the response data.
    except (KeyError, ValueError):
        error_details = json.dumps(response_data)
def retry_target(target, predicate, sleep_generator, deadline, on_error=None):
"""Call a function and retry if it fails.
This is the lowest-level retry helper. Generally, you'll use the
higher-level retry helper :class:`Retry`.
Args:
target(Callable): The function to call and retry. This must be a
nullary function - apply arguments with `functools.partial`.
predicate (Callable[Exception]): A callable used to determine if an
exception raised by the target should be considered retryable.
It should return True to retry or False otherwise.
sleep_generator (Iterable[float]): An infinite iterator that determines
how long to sleep between retries.
deadline (float): How long to keep retrying the target. The last sleep
period is shortened as necessary, so that the last retry runs at
``deadline`` (and not considerably beyond it).
on_error (Callable[Exception]): A function to call while processing a
retryable exception. Any error raised by this function will *not*
be caught.
Returns:
Any: the return value of the target function.
Raises:
google.api_core.RetryError: If the deadline is exceeded while retrying.
ValueError: If the sleep generator stops yielding values.
Exception: If the target raises a method that isn't retryable.
"""
if deadline is not None:
deadline_datetime = datetime_helpers.utcnow() + datetime.timedelta(
seconds=deadline
)
else:
deadline_datetime = None
last_exc = None
for sleep in sleep_generator:
try:
self = <google.api_core.operation.Operation object at 0x7fb2fde10cd0>
retry = <google.api_core.retry.Retry object at 0x7fb30097c610>
def _done_or_raise(self, retry=DEFAULT_RETRY):
"""Check if the future is done and raise if it's not."""
kwargs = {} if retry is DEFAULT_RETRY else {"retry": retry}
if not self.done(**kwargs):
raise _OperationNotComplete()
E google.api_core.future.polling._OperationNotComplete
The above exception was the direct cause of the following exception:
self = <google.api_core.operation.Operation object at 0x7fb2fde10cd0>
timeout = 120, retry = <google.api_core.retry.Retry object at 0x7fb30097c610>
def _blocking_poll(self, timeout=None, retry=DEFAULT_RETRY):
"""Poll and wait for the Future to be resolved.
Args:
timeout (int):
How long (in seconds) to wait for the operation to complete.
If None, wait indefinitely.
"""
if self._result_set:
return
retry_ = self._retry.with_deadline(timeout)
try:
kwargs = {} if retry is DEFAULT_RETRY else {"retry": retry}
target = functools.partial(<bound method PollingFuture._done_or_raise of <google.api_core.operation.Operation object at 0x7fb2fde10cd0>>)
predicate = <function if_exception_type..if_exception_type_predicate at 0x7fb300972950>
sleep_generator = <generator object exponential_sleep_generator at 0x7fb2fde74e50>
deadline = 120, on_error = None
def retry_target(target, predicate, sleep_generator, deadline, on_error=None):
"""Call a function and retry if it fails.
This is the lowest-level retry helper. Generally, you'll use the
higher-level retry helper :class:`Retry`.
Args:
target(Callable): The function to call and retry. This must be a
nullary function - apply arguments with `functools.partial`.
predicate (Callable[Exception]): A callable used to determine if an
exception raised by the target should be considered retryable.
It should return True to retry or False otherwise.
sleep_generator (Iterable[float]): An infinite iterator that determines
how long to sleep between retries.
deadline (float): How long to keep retrying the target. The last sleep
period is shortened as necessary, so that the last retry runs at
``deadline`` (and not considerably beyond it).
on_error (Callable[Exception]): A function to call while processing a
retryable exception. Any error raised by this function will *not*
be caught.
Returns:
Any: the return value of the target function.
Raises:
google.api_core.RetryError: If the deadline is exceeded while retrying.
ValueError: If the sleep generator stops yielding values.
Exception: If the target raises a method that isn't retryable.
"""
if deadline is not None:
deadline_datetime = datetime_helpers.utcnow() + datetime.timedelta(
seconds=deadline
)
else:
deadline_datetime = None
last_exc = None
for sleep in sleep_generator:
try:
return target()
# pylint: disable=broad-except
# This function explicitly must deal with broad exceptions.
except Exception as exc:
if not predicate(exc):
raise
last_exc = exc
if on_error is not None:
on_error(exc)
now = datetime_helpers.utcnow()
if deadline_datetime is not None:
if deadline_datetime <= now:
six.raise_from(
exceptions.RetryError(
"Deadline of {:.1f}s exceeded while calling {}".format(
deadline, target
),
last_exc,
),
value = None, from_value = _OperationNotComplete()
???
E google.api_core.exceptions.RetryError: Deadline of 120.0s exceeded while calling functools.partial(<bound method PollingFuture._done_or_raise of <google.api_core.operation.Operation object at 0x7fb2fde10cd0>>), last exception:
:3: RetryError
During handling of the above exception, another exception occurred:
capsys = <_pytest.capture.CaptureFixture object at 0x7fb2fde76dd0>
batch_parse_table_v1beta2.py:92: in batch_parse_table
operation.result(timeout)
.nox/py-3-7/lib/python3.7/site-packages/google/api_core/future/polling.py:129: in result
self._blocking_poll(timeout=timeout, **kwargs)
self = <google.api_core.operation.Operation object at 0x7fb2fde10cd0>
timeout = 120, retry = <google.api_core.retry.Retry object at 0x7fb30097c610>
def _blocking_poll(self, timeout=None, retry=DEFAULT_RETRY):
"""Poll and wait for the Future to be resolved.
Args:
timeout (int):
How long (in seconds) to wait for the operation to complete.
If None, wait indefinitely.
"""
if self._result_set:
return
retry_ = self._retry.with_deadline(timeout)
try:
kwargs = {} if retry is DEFAULT_RETRY else {"retry": retry}
retry_(self._done_or_raise)(**kwargs)
except exceptions.RetryError:
raise concurrent.futures.TimeoutError(
"Operation did not complete within the designated " "timeout."
)
E concurrent.futures._base.TimeoutError: Operation did not complete within the designated timeout.
experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:
git checkout -b <new-branch-name>
HEAD is now at 5162674 docs: fix pypi link (#46)
2020-10-20 05:43:00,857 autosynth [DEBUG] > Running: git checkout 5a506ec8765cc04f7e29f888b8e9b257d9a7ae11
Note: checking out '5a506ec8765cc04f7e29f888b8e9b257d9a7ae11'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:
git checkout -b <new-branch-name>
HEAD is now at 5a506ec build(java): enable snippet-bot (#818)
2020-10-20 05:43:00,867 autosynth [DEBUG] > Running: git branch -f autosynth-27
2020-10-20 05:43:00,870 autosynth [DEBUG] > Running: git checkout autosynth-27
Switched to branch 'autosynth-27'
2020-10-20 05:43:00,877 autosynth [INFO] > Running synthtool
2020-10-20 05:43:00,877 autosynth [INFO] > ['/tmpfs/src/github/synthtool/env/bin/python3', '-m', 'synthtool', '--metadata', 'synth.metadata', 'synth.py', '--']
2020-10-20 05:43:00,877 autosynth [DEBUG] > log_file_path: /tmpfs/src/logs/python-documentai/27/sponge_log.log
2020-10-20 05:43:00,879 autosynth [DEBUG] > Running: /tmpfs/src/github/synthtool/env/bin/python3 -m synthtool --metadata synth.metadata synth.py --
2020-10-20 05:43:01,124 synthtool [DEBUG] > Executing /home/kbuilder/.cache/synthtool/python-documentai/synth.py.
On branch autosynth-27
nothing to commit, working tree clean
2020-10-20 05:43:01,253 synthtool [DEBUG] > Using precloned repo /home/kbuilder/.cache/synthtool/synthtool
2020-10-20 05:43:01,258 synthtool [DEBUG] > Ensuring dependencies.
DEBUG:synthtool:Ensuring dependencies.
2020-10-20 05:43:01,268 synthtool [DEBUG] > Using precloned repo /home/kbuilder/.cache/synthtool/synthtool
DEBUG:synthtool:Using precloned repo /home/kbuilder/.cache/synthtool/synthtool
2020-10-20 05:43:01,271 synthtool [DEBUG] > Cloning googleapis.
DEBUG:synthtool:Cloning googleapis.
2020-10-20 05:43:01,877 synthtool [DEBUG] > Generating code for: //google/cloud/documentai/v1beta2:documentai-v1beta2-py.
DEBUG:synthtool:Generating code for: //google/cloud/documentai/v1beta2:documentai-v1beta2-py.
2020-10-20 05:43:05,201 synthtool [SUCCESS] > Generated code into /tmpfs/tmp/tmpiomnly0x.
SUCCESS:synthtool:Generated code into /tmpfs/tmp/tmpiomnly0x.
2020-10-20 05:43:05,236 synthtool [DEBUG] > Generating code for: //google/cloud/documentai/v1beta3:documentai-v1beta3-py.
DEBUG:synthtool:Generating code for: //google/cloud/documentai/v1beta3:documentai-v1beta3-py.
2020-10-20 05:43:08,478 synthtool [SUCCESS] > Generated code into /tmpfs/tmp/tmpm6dytbd0.
SUCCESS:synthtool:Generated code into /tmpfs/tmp/tmpm6dytbd0.
.coveragerc
.flake8
.github/CONTRIBUTING.md
.github/ISSUE_TEMPLATE/bug_report.md
.github/ISSUE_TEMPLATE/feature_request.md
.github/ISSUE_TEMPLATE/support_request.md
.github/PULL_REQUEST_TEMPLATE.md
.github/release-please.yml
.github/snippet-bot.yml
.gitignore
.kokoro/build.sh
.kokoro/continuous/common.cfg
.kokoro/continuous/continuous.cfg
.kokoro/docker/docs/Dockerfile
.kokoro/docker/docs/fetch_gpg_keys.sh
.kokoro/docs/common.cfg
.kokoro/docs/docs-presubmit.cfg
.kokoro/docs/docs.cfg
.kokoro/populate-secrets.sh
.kokoro/presubmit/common.cfg
.kokoro/presubmit/presubmit.cfg
.kokoro/publish-docs.sh
.kokoro/release.sh
.kokoro/release/common.cfg
.kokoro/release/release.cfg
.kokoro/samples/lint/common.cfg
.kokoro/samples/lint/continuous.cfg
.kokoro/samples/lint/periodic.cfg
.kokoro/samples/lint/presubmit.cfg
.kokoro/samples/python3.6/common.cfg
.kokoro/samples/python3.6/continuous.cfg
.kokoro/samples/python3.6/periodic.cfg
.kokoro/samples/python3.6/presubmit.cfg
.kokoro/samples/python3.7/common.cfg
.kokoro/samples/python3.7/continuous.cfg
.kokoro/samples/python3.7/periodic.cfg
.kokoro/samples/python3.7/presubmit.cfg
.kokoro/samples/python3.8/common.cfg
.kokoro/samples/python3.8/continuous.cfg
.kokoro/samples/python3.8/periodic.cfg
.kokoro/samples/python3.8/presubmit.cfg
.kokoro/test-samples.sh
.kokoro/trampoline.sh
.kokoro/trampoline_v2.sh
.trampolinerc
CODE_OF_CONDUCT.md
CONTRIBUTING.rst
LICENSE
MANIFEST.in
docs/_static/custom.css
docs/_templates/layout.html
docs/conf.py.j2
docs/multiprocessing.rst
noxfile.py.j2
renovate.json
samples/AUTHORING_GUIDE.md
samples/CONTRIBUTING.md
scripts/decrypt-secrets.sh
scripts/readme-gen/readme_gen.py.j2
scripts/readme-gen/templates/README.tmpl.rst
scripts/readme-gen/templates/auth.tmpl.rst
scripts/readme-gen/templates/auth_api_key.tmpl.rst
scripts/readme-gen/templates/install_deps.tmpl.rst
scripts/readme-gen/templates/install_portaudio.tmpl.rst
setup.cfg
testing/.gitignore
2020-10-20 05:43:08,703 synthtool [INFO] > Generating templates for samples project 'samples/snippets'
INFO:synthtool:Generating templates for samples project 'samples/snippets'
Skipping: README.md
README.rst
Traceback (most recent call last):
File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/tmpfs/src/github/synthtool/synthtool/__main__.py", line 102, in <module>
main()
File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/tmpfs/src/github/synthtool/synthtool/__main__.py", line 94, in main
spec.loader.exec_module(synth_module) # type: ignore
File "<frozen importlib._bootstrap_external>", line 678, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/home/kbuilder/.cache/synthtool/python-documentai/synth.py", line 57, in <module>
python.py_samples()
File "/tmpfs/src/github/synthtool/synthtool/languages/python.py", line 141, in py_samples
result = t.render(subdir=sample_project_dir, **sample_readme_metadata)
File "/tmpfs/src/github/synthtool/synthtool/sources/templates.py", line 83, in render
_render_to_path(self.env, template_name, self.dir / subdir, kwargs)
File "/tmpfs/src/github/synthtool/synthtool/sources/templates.py", line 53, in _render_to_path
output.dump(fh)
File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/jinja2/environment.py", line 1313, in dump
fp.writelines(iterable)
File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/jinja2/environment.py", line 1357, in __next__
return self._next()
File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/jinja2/environment.py", line 1125, in generate
yield self.environment.handle_exception()
File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/jinja2/environment.py", line 832, in handle_exception
reraise(*rewrite_traceback_stack(source=source))
File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/jinja2/_compat.py", line 28, in reraise
raise value.with_traceback(tb)
File "/home/kbuilder/.cache/synthtool/synthtool/synthtool/gcp/templates/python_samples/README.rst", line 5, in top-level template code
{{product.name}} Python Samples
File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/jinja2/environment.py", line 471, in getattr
return getattr(obj, attribute)
jinja2.exceptions.UndefinedError: 'product' is undefined
2020-10-20 05:43:08,769 autosynth [ERROR] > Synthesis failed
2020-10-20 05:43:08,770 autosynth [DEBUG] > Running: git reset --hard HEAD
HEAD is now at 5162674 docs: fix pypi link (#46)
2020-10-20 05:43:08,781 autosynth [DEBUG] > Running: git checkout autosynth
Switched to branch 'autosynth'
2020-10-20 05:43:08,788 autosynth [DEBUG] > Running: git clean -fdx
Removing __pycache__/
Traceback (most recent call last):
File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 354, in <module>
main()
File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 189, in main
return _inner_main(temp_dir)
File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 334, in _inner_main
commit_count = synthesize_loop(x, multiple_prs, change_pusher, synthesizer)
File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 65, in synthesize_loop
has_changes = toolbox.synthesize_version_in_new_branch(synthesizer, youngest)
File "/tmpfs/src/github/synthtool/autosynth/synth_toolbox.py", line 259, in synthesize_version_in_new_branch
synthesizer.synthesize(synth_log_path, self.environ)
File "/tmpfs/src/github/synthtool/autosynth/synthesizer.py", line 120, in synthesize
synth_proc.check_returncode() # Raise an exception.
File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/subprocess.py", line 389, in check_returncode
self.stderr)
subprocess.CalledProcessError: Command '['/tmpfs/src/github/synthtool/env/bin/python3', '-m', 'synthtool', '--metadata', 'synth.metadata', 'synth.py', '--']' returned non-zero exit status 1.
Google internal developers can see the full log here.
v1beta3 includes a ListProcessorsPager, which is missing in v1. The ListProcessorsPager is very important for querying information about processors. Please implement it for v1 as well so that v1 can be used instead of v1beta3.
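For context, a pager wraps a list call and follows page tokens so that callers can iterate over all results without handling tokens themselves. A minimal stdlib-only sketch of that idea follows. The names fetch_page, iterate_all, and make_fake_backend and the processor names are hypothetical and do not reflect the generated ListProcessorsPager code.

```python
def iterate_all(fetch_page):
    """Yield every item across pages, following next-page tokens.

    fetch_page(token) is a hypothetical callable returning
    (items, next_page_token); an empty token means there are no more pages.
    """
    token = ""
    while True:
        items, token = fetch_page(token)
        yield from items
        if not token:
            return


def make_fake_backend(pages):
    """Build a fetch_page function over a fixed, non-empty list of pages."""

    def fetch_page(token):
        index = int(token) if token else 0
        next_token = str(index + 1) if index + 1 < len(pages) else ""
        return pages[index], next_token

    return fetch_page


if __name__ == "__main__":
    backend = make_fake_backend([["processor-a", "processor-b"], ["processor-c"]])
    for name in iterate_all(backend):
        print(name)
```

Without a pager, each caller has to write the token loop from iterate_all by hand, which is the inconvenience this request describes.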
Running tests for this library is currently not documented in README.rst. Although CONTRIBUTING.rst mentions how to run tests, there is no documentation on how to set up your local environment to fully test the library.
This is a request for documentation on how to run all tests for this library locally.
ERROR: /home/kbuilder/.cache/synthtool/googleapis/google/cloud/documentai/v1beta2/BUILD.bazel:165:1: //google/cloud/documentai/v1beta2:documentai_py_gapic: `bazel-out/host/bin/external/com_google_protobuf/protoc --experimental_allow_proto3_optional --plugin=protoc-gen-python_gapic=bazel-out/host/bin/external/gapic_generator_python/gapic_plugin --python_gapic_out=retry-config=google/cloud/documentai/v1beta2/documentai_v1beta2_grpc_service_config.json:bazel-out/k8-fastbuild/bin/google/cloud/documentai/v1beta2/documentai_py_gapic.srcjar.zip -Igoogle/cloud/documentai/v1beta2/document.proto=google/cloud/documentai/v1beta2/document.proto -Igoogle/cloud/documentai/v1beta2/document_understanding.proto=google/cloud/documentai/v1beta2/document_understanding.proto -Igoogle/cloud/documentai/v1beta2/geometry.proto=google/cloud/documentai/v1beta2/geometry.proto -Igoogle/api/annotations.proto=google/api/annotations.proto -Igoogle/api/http.proto=google/api/http.proto -Igoogle/protobuf/descriptor.proto=bazel-out/k8-fastbuild/bin/external/com_google_protobuf/_virtual_imports/descriptor_proto/google/protobuf/descriptor.proto -Igoogle/api/client.proto=google/api/client.proto -Igoogle/api/field_behavior.proto=google/api/field_behavior.proto -Igoogle/longrunning/operations.proto=google/longrunning/operations.proto -Igoogle/rpc/status.proto=google/rpc/status.proto -Igoogle/protobuf/any.proto=bazel-out/k8-fastbuild/bin/external/com_google_protobuf/_virtual_imports/any_proto/google/protobuf/any.proto -Igoogle/protobuf/duration.proto=bazel-out/k8-fastbuild/bin/external/com_google_protobuf/_virtual_imports/duration_proto/google/protobuf/duration.proto -Igoogle/protobuf/empty.proto=bazel-out/k8-fastbuild/bin/external/com_google_protobuf/_virtual_imports/empty_proto/google/protobuf/empty.proto -Igoogle/type/color.proto=google/type/color.proto -Igoogle/protobuf/wrappers.proto=bazel-out/k8-fastbuild/bin/external/com_google_protobuf/_virtual_imports/wrappers_proto/google/protobuf/wrappers.proto 
-Igoogle/protobuf/timestamp.proto=bazel-out/k8-fastbuild/bin/external/com_google_protobuf/_virtual_imports/timestamp_proto/google/protobuf/timestamp.proto google/cloud/documentai/v1beta2/document.proto google/cloud/documentai/v1beta2/document_understanding.proto google/cloud/documentai/v1beta2/geometry.proto` failed (Exit 1) protoc failed: error executing command bazel-out/host/bin/external/com_google_protobuf/protoc --experimental_allow_proto3_optional '--plugin=protoc-gen-python_gapic=bazel-out/host/bin/external/gapic_generator_python/gapic_plugin' ... (remaining 20 argument(s) skipped)
Use --sandbox_debug to see verbose messages from the sandbox
google/cloud/documentai/v1beta2/geometry.proto:19:1: warning: Import google/api/annotations.proto is unused.
google/cloud/documentai/v1beta2/document.proto:23:1: warning: Import google/api/annotations.proto is unused.
Traceback (most recent call last):
File "/home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/sandbox/linux-sandbox/11/execroot/com_google_googleapis/bazel-out/host/bin/external/gapic_generator_python/gapic_plugin.runfiles/gapic_generator_python/gapic/cli/generate_with_pandoc.py", line 3, in <module>
from gapic.cli import generate
File "/home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/sandbox/linux-sandbox/11/execroot/com_google_googleapis/bazel-out/host/bin/external/gapic_generator_python/gapic_plugin.runfiles/gapic_generator_python/gapic/cli/generate.py", line 23, in <module>
from gapic import generator
File "/home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/sandbox/linux-sandbox/11/execroot/com_google_googleapis/bazel-out/host/bin/external/gapic_generator_python/gapic_plugin.runfiles/gapic_generator_python/gapic/generator/__init__.py", line 21, in <module>
from .generator import Generator
File "/home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/sandbox/linux-sandbox/11/execroot/com_google_googleapis/bazel-out/host/bin/external/gapic_generator_python/gapic_plugin.runfiles/gapic_generator_python/gapic/generator/generator.py", line 24, in <module>
from gapic.samplegen import manifest, samplegen
File "/home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/sandbox/linux-sandbox/11/execroot/com_google_googleapis/bazel-out/host/bin/external/gapic_generator_python/gapic_plugin.runfiles/gapic_generator_python/gapic/samplegen/__init__.py", line 15, in <module>
from gapic.samplegen import samplegen
File "/home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/sandbox/linux-sandbox/11/execroot/com_google_googleapis/bazel-out/host/bin/external/gapic_generator_python/gapic_plugin.runfiles/gapic_generator_python/gapic/samplegen/samplegen.py", line 27, in <module>
from gapic.schema import wrappers
File "/home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/sandbox/linux-sandbox/11/execroot/com_google_googleapis/bazel-out/host/bin/external/gapic_generator_python/gapic_plugin.runfiles/gapic_generator_python/gapic/schema/__init__.py", line 23, in <module>
from gapic.schema.api import API
File "/home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/sandbox/linux-sandbox/11/execroot/com_google_googleapis/bazel-out/host/bin/external/gapic_generator_python/gapic_plugin.runfiles/gapic_generator_python/gapic/schema/api.py", line 29, in <module>
from google.api_core import exceptions # type: ignore
ModuleNotFoundError: No module named 'google.api_core'
--python_gapic_out: protoc-gen-python_gapic: Plugin failed with status code 1.
Target //google/cloud/documentai/v1beta2:documentai-v1beta2-py failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 1.178s, Critical Path: 0.87s
INFO: 0 processes.
FAILED: Build did NOT complete successfully
FAILED: Build did NOT complete successfully
Traceback (most recent call last):
File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/tmpfs/src/github/synthtool/synthtool/__main__.py", line 102, in <module>
main()
File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/tmpfs/src/github/synthtool/synthtool/__main__.py", line 94, in main
spec.loader.exec_module(synth_module) # type: ignore
File "<frozen importlib._bootstrap_external>", line 678, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/home/kbuilder/.cache/synthtool/python-documentai/synth.py", line 38, in <module>
bazel_target=f"//google/cloud/documentai/{version}:documentai-{version}-py",
File "/tmpfs/src/github/synthtool/synthtool/gcp/gapic_bazel.py", line 52, in py_library
return self._generate_code(service, version, "python", **kwargs)
File "/tmpfs/src/github/synthtool/synthtool/gcp/gapic_bazel.py", line 197, in _generate_code
shell.run(bazel_run_args)
File "/tmpfs/src/github/synthtool/synthtool/shell.py", line 39, in run
raise exc
File "/tmpfs/src/github/synthtool/synthtool/shell.py", line 33, in run
encoding="utf-8",
File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/subprocess.py", line 438, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['bazel', '--max_idle_secs=240', 'build', '//google/cloud/documentai/v1beta2:documentai-v1beta2-py']' returned non-zero exit status 1.
2021-01-28 05:42:43,600 autosynth [ERROR] > Synthesis failed
2021-01-28 05:42:43,600 autosynth [DEBUG] > Running: git reset --hard HEAD
HEAD is now at 745bb99 chore: added increased timeout on flaky batch request (#84)
2021-01-28 05:42:43,606 autosynth [DEBUG] > Running: git checkout autosynth
Switched to branch 'autosynth'
2021-01-28 05:42:43,611 autosynth [DEBUG] > Running: git clean -fdx
Removing __pycache__/
Traceback (most recent call last):
File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 354, in <module>
main()
File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 189, in main
return _inner_main(temp_dir)
File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 334, in _inner_main
commit_count = synthesize_loop(x, multiple_prs, change_pusher, synthesizer)
File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 65, in synthesize_loop
has_changes = toolbox.synthesize_version_in_new_branch(synthesizer, youngest)
File "/tmpfs/src/github/synthtool/autosynth/synth_toolbox.py", line 259, in synthesize_version_in_new_branch
synthesizer.synthesize(synth_log_path, self.environ)
File "/tmpfs/src/github/synthtool/autosynth/synthesizer.py", line 120, in synthesize
synth_proc.check_returncode() # Raise an exception.
File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/subprocess.py", line 389, in check_returncode
self.stderr)
subprocess.CalledProcessError: Command '['/tmpfs/src/github/synthtool/env/bin/python3', '-m', 'synthtool', '--metadata', 'synth.metadata', 'synth.py', '--']' returned non-zero exit status 1.
Google internal developers can see the full log here.
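The first traceback in this log ends in `ModuleNotFoundError: No module named 'google.api_core'`, meaning the Python environment that ran the generator plugin could not import that package. As a rough diagnostic sketch (the helper name `module_available` is hypothetical and not part of synthtool or the generator), one can check whether a dotted module name is importable from a given interpreter:

```python
import importlib.util


def module_available(dotted_name):
    """Return True if `dotted_name` can be located by this interpreter.

    find_spec() imports parent packages first, so a missing parent
    (for example `google`) surfaces as ModuleNotFoundError.
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        return False


if __name__ == "__main__":
    # Report on the module named in the traceback above.
    print("google.api_core importable:", module_available("google.api_core"))
```

Running this with the same interpreter the build uses would show whether the package is visible there; it does not explain why the sandboxed build environment lacks it.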
The correct URL looks like:
https://eu-documentai.googleapis.com/v1beta3/projects/123456789/locations/eu/processors/azerty:process
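To make the shape of that URL explicit, here is a minimal sketch that rebuilds it from its parts. The function names are hypothetical, and the host pattern `{location}-documentai.googleapis.com` is an assumption inferred only from the example URL above:

```python
def regional_host(location):
    """Host name for a regional endpoint (pattern assumed from the URL above)."""
    return f"{location}-documentai.googleapis.com"


def process_url(project, location, processor, api_version="v1beta3"):
    """Build the REST URL for a processor's `process` method."""
    name = f"projects/{project}/locations/{location}/processors/{processor}"
    return f"https://{regional_host(location)}/{api_version}/{name}:process"


if __name__ == "__main__":
    print(process_url("123456789", "eu", "azerty"))
```

With the Python client, the regional host can usually be supplied at construction time through `client_options={"api_endpoint": regional_host("eu")}`, so that requests for an `eu` processor are not sent to the default endpoint.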
Stack trace
google.api_core.exceptions.PermissionDenied: 403 Permission 'documentai.processors.processOnline' denied on resource '//documentai.googleapis.com/projects/1234567890/locations/eu/processors/azerty' (or it may not exist).
Traceback (most recent call last):
File "/Users/postak/Library/Python/3.8/lib/python/site-packages/google/api_core/grpc_helpers.py", line 57, in error_remapped_callable
return callable_(*args, **kwargs)
File "/Users/postak/Library/Python/3.8/lib/python/site-packages/grpc/_channel.py", line 923, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "/Users/postak/Library/Python/3.8/lib/python/site-packages/grpc/_channel.py", line 826, in _end_unary_response_blocking
raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.INVALID_ARGUMENT
details = "Request contains an invalid argument."
debug_error_string = "{"created":"@1609844424.988576000","description":"Error received from peer ipv4:216.58.208.170:443","file":"src/core/lib/surface/call.cc","file_line":1062,"grpc_message":"Request contains an invalid argument.","grpc_status":3}"
>
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "parse_from_gs_beta3.py", line 103, in <module>
batch_process_documents(project_id='my-prj', location='eu', processor_id='cd6.....', gcs_input_uri='gs://my-bucket/file.pdf', gcs_output_uri='gs://my-bucket/', gcs_output_uri_prefix='doc_ai_out')
File "parse_from_gs_beta3.py", line 41, in batch_process_documents
operation = client.batch_process_documents(request)
File "/Users/postak/Library/Python/3.8/lib/python/site-packages/google/cloud/documentai_v1beta3/services/document_processor_service/client.py", line 411, in batch_process_documents
response = rpc(request, retry=retry, timeout=timeout, metadata=metadata,)
File "/Users/postak/Library/Python/3.8/lib/python/site-packages/google/api_core/gapic_v1/method.py", line 145, in __call__
return wrapped_func(*args, **kwargs)
File "/Users/postak/Library/Python/3.8/lib/python/site-packages/google/api_core/retry.py", line 281, in retry_wrapped_func
return retry_target(
File "/Users/postak/Library/Python/3.8/lib/python/site-packages/google/api_core/retry.py", line 184, in retry_target
return target()
File "/Users/postak/Library/Python/3.8/lib/python/site-packages/google/api_core/grpc_helpers.py", line 59, in error_remapped_callable
six.raise_from(exceptions.from_grpc_error(exc), exc)
File "<string>", line 3, in raise_from
google.api_core.exceptions.InvalidArgument: 400 Request contains an invalid argument.
Running process_document_sample_v1beta3.py with /resources/invoice.pdf
Processor in EU
Stack trace
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/google/api_core/grpc_helpers.py", line 57, in error_remapped_callable
return callable_(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 826, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 729, in _end_unary_response_blocking
raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.INVALID_ARGUMENT
details = "Request contains an invalid argument."
debug_error_string = "{"created":"@1603381610.907626000","description":"Error received from peer ipv6:[x:x:x:x::x]:443","file":"src/core/lib/surface/call.cc","file_line":1062,"grpc_message":"Request contains an invalid argument.","grpc_status":3}"
>
Please investigate and fix this issue within 5 business days. While it remains broken,
this library cannot be updated with changes to the python-documentai API, and the library grows
stale.
ome/kbuilder/.cache/synthtool/googleapis/WORKSPACE:77:1
DEBUG: Rule 'com_google_protoc_java_resource_names_plugin' indicated that a canonical reproducible form can be obtained by modifying arguments sha256 = "4b714b35ee04ba90f560ee60e64c7357428efcb6b0f3a298f343f8ec2c6d4a5d"
DEBUG: Call stack for the definition of repository 'com_google_protoc_java_resource_names_plugin' which is a http_archive (rule definition at /home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/external/bazel_tools/tools/build_defs/repo/http.bzl:296:16):
- <builtin>
- /home/kbuilder/.cache/synthtool/googleapis/WORKSPACE:234:1
DEBUG: Rule 'protoc_docs_plugin' indicated that a canonical reproducible form can be obtained by modifying arguments sha256 = "33b387245455775e0de45869c7355cc5a9e98b396a6fc43b02812a63b75fee20"
DEBUG: Call stack for the definition of repository 'protoc_docs_plugin' which is a http_archive (rule definition at /home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/external/bazel_tools/tools/build_defs/repo/http.bzl:296:16):
- <builtin>
- /home/kbuilder/.cache/synthtool/googleapis/WORKSPACE:258:1
DEBUG: Rule 'rules_python' indicated that a canonical reproducible form can be obtained by modifying arguments sha256 = "48f7e716f4098b85296ad93f5a133baf712968c13fbc2fdf3a6136158fe86eac"
DEBUG: Call stack for the definition of repository 'rules_python' which is a http_archive (rule definition at /home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/external/bazel_tools/tools/build_defs/repo/http.bzl:296:16):
- <builtin>
- /home/kbuilder/.cache/synthtool/googleapis/WORKSPACE:42:1
DEBUG: Rule 'gapic_generator_python' indicated that a canonical reproducible form can be obtained by modifying arguments sha256 = "fe995def6873fcbdc2a8764ef4bce96eb971a9d1950fe9db9be442f3c64fb3b6"
DEBUG: Call stack for the definition of repository 'gapic_generator_python' which is a http_archive (rule definition at /home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/external/bazel_tools/tools/build_defs/repo/http.bzl:296:16):
- <builtin>
- /home/kbuilder/.cache/synthtool/googleapis/WORKSPACE:278:1
DEBUG: Rule 'com_googleapis_gapic_generator_go' indicated that a canonical reproducible form can be obtained by modifying arguments sha256 = "c0d0efba86429cee5e52baf838165b0ed7cafae1748d025abec109d25e006628"
DEBUG: Call stack for the definition of repository 'com_googleapis_gapic_generator_go' which is a http_archive (rule definition at /home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/external/bazel_tools/tools/build_defs/repo/http.bzl:296:16):
- <builtin>
- /home/kbuilder/.cache/synthtool/googleapis/WORKSPACE:300:1
DEBUG: Rule 'gapic_generator_php' indicated that a canonical reproducible form can be obtained by modifying arguments sha256 = "3dffc5c34a5f35666843df04b42d6ce1c545b992f9c093a777ec40833b548d86"
DEBUG: Call stack for the definition of repository 'gapic_generator_php' which is a http_archive (rule definition at /home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/external/bazel_tools/tools/build_defs/repo/http.bzl:296:16):
- <builtin>
- /home/kbuilder/.cache/synthtool/googleapis/WORKSPACE:364:1
DEBUG: Rule 'gapic_generator_csharp' indicated that a canonical reproducible form can be obtained by modifying arguments sha256 = "4db430cfb9293e4521ec8e8138f8095faf035d8e752cf332d227710d749939eb"
DEBUG: Call stack for the definition of repository 'gapic_generator_csharp' which is a http_archive (rule definition at /home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/external/bazel_tools/tools/build_defs/repo/http.bzl:296:16):
- <builtin>
- /home/kbuilder/.cache/synthtool/googleapis/WORKSPACE:386:1
DEBUG: Rule 'gapic_generator_ruby' indicated that a canonical reproducible form can be obtained by modifying arguments sha256 = "a14ec475388542f2ea70d16d75579065758acc4b99fdd6d59463d54e1a9e4499"
DEBUG: Call stack for the definition of repository 'gapic_generator_ruby' which is a http_archive (rule definition at /home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/external/bazel_tools/tools/build_defs/repo/http.bzl:296:16):
- <builtin>
- /home/kbuilder/.cache/synthtool/googleapis/WORKSPACE:400:1
DEBUG: /home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/external/rules_python/python/pip.bzl:61:5: DEPRECATED: the pip_repositories rule has been replaced with pip_install, please see rules_python 0.1 release notes
DEBUG: Rule 'bazel_skylib' indicated that a canonical reproducible form can be obtained by modifying arguments sha256 = "1dde365491125a3db70731e25658dfdd3bc5dbdfd11b840b3e987ecf043c7ca0"
DEBUG: Call stack for the definition of repository 'bazel_skylib' which is a http_archive (rule definition at /home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/external/bazel_tools/tools/build_defs/repo/http.bzl:296:16):
- <builtin>
- /home/kbuilder/.cache/synthtool/googleapis/WORKSPACE:35:1
Analyzing: target //google/cloud/documentai/v1beta2:documentai-v1beta2-py (1 packages loaded, 0 targets configured)
ERROR: /home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/external/upb/bazel/upb_proto_library.bzl:257:29: aspect() got unexpected keyword argument 'incompatible_use_toolchain_transition'
ERROR: Analysis of target '//google/cloud/documentai/v1beta2:documentai-v1beta2-py' failed; build aborted: error loading package '@com_github_grpc_grpc//': in /home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/external/com_github_grpc_grpc/bazel/grpc_build_system.bzl: Extension file 'bazel/upb_proto_library.bzl' has errors
INFO: Elapsed time: 0.276s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (2 packages loaded, 4 targets configured)
FAILED: Build did NOT complete successfully (2 packages loaded, 4 targets configured)
Traceback (most recent call last):
File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/tmpfs/src/github/synthtool/synthtool/__main__.py", line 102, in <module>
main()
File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/tmpfs/src/github/synthtool/synthtool/__main__.py", line 94, in main
spec.loader.exec_module(synth_module) # type: ignore
File "<frozen importlib._bootstrap_external>", line 678, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/home/kbuilder/.cache/synthtool/python-documentai/synth.py", line 37, in <module>
bazel_target=f"//google/cloud/documentai/{version}:documentai-{version}-py",
File "/tmpfs/src/github/synthtool/synthtool/gcp/gapic_bazel.py", line 52, in py_library
return self._generate_code(service, version, "python", False, **kwargs)
File "/tmpfs/src/github/synthtool/synthtool/gcp/gapic_bazel.py", line 204, in _generate_code
shell.run(bazel_run_args)
File "/tmpfs/src/github/synthtool/synthtool/shell.py", line 39, in run
raise exc
File "/tmpfs/src/github/synthtool/synthtool/shell.py", line 33, in run
encoding="utf-8",
File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/subprocess.py", line 438, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['bazel', '--max_idle_secs=240', 'build', '//google/cloud/documentai/v1beta2:documentai-v1beta2-py']' returned non-zero exit status 1.
2021-04-27 02:14:32,448 autosynth [ERROR] > Synthesis failed
2021-04-27 02:14:32,448 autosynth [DEBUG] > Running: git reset --hard HEAD
HEAD is now at 30008b4 chore(revert): revert preventing normalization (#129)
2021-04-27 02:14:32,453 autosynth [DEBUG] > Running: git checkout autosynth
Switched to branch 'autosynth'
2021-04-27 02:14:32,460 autosynth [DEBUG] > Running: git clean -fdx
Removing __pycache__/
Traceback (most recent call last):
File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 356, in <module>
main()
File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 191, in main
return _inner_main(temp_dir)
File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 336, in _inner_main
commit_count = synthesize_loop(x, multiple_prs, change_pusher, synthesizer)
File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 68, in synthesize_loop
has_changes = toolbox.synthesize_version_in_new_branch(synthesizer, youngest)
File "/tmpfs/src/github/synthtool/autosynth/synth_toolbox.py", line 259, in synthesize_version_in_new_branch
synthesizer.synthesize(synth_log_path, self.environ)
File "/tmpfs/src/github/synthtool/autosynth/synthesizer.py", line 120, in synthesize
synth_proc.check_returncode() # Raise an exception.
File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/subprocess.py", line 389, in check_returncode
self.stderr)
subprocess.CalledProcessError: Command '['/tmpfs/src/github/synthtool/env/bin/python3', '-m', 'synthtool', '--metadata', 'synth.metadata', 'synth.py', '--']' returned non-zero exit status 1.
Google internal developers can see the full log here.
We need the OCR symbols (characters) for some extra processing. Currently, the Document AI response includes blocks, paragraphs, lines, and tokens (equivalent to words). Could you add the OCR symbols to the response?
I am trying to run the out-of-the-box (OOTB) sample code for batch processing invoices. Here is the call that invokes it: batch_process_documents(project_id='', location='us', processor_id='', gcs_input_uri='gs:///Invoice.pdf', gcs_output_uri='gs://*', gcs_output_uri_prefix='inv')
Run the boilerplate code.
Stack trace
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-30-42be8c6d0dd8> in <module>
----> 1 batch_process_documents(project_id='***', location='us', processor_id='***', gcs_input_uri='gs://***/Invoice.pdf', gcs_output_uri='gs://***', gcs_output_uri_prefix='inv')
<ipython-input-28-f0fe586dfadb> in batch_process_documents(project_id, location, processor_id, gcs_input_uri, gcs_output_uri, gcs_output_uri_prefix)
66 for i, blob in enumerate(blob_list):
67 # Download the contents of this blob as a bytes object.
---> 68 blob_as_bytes = blob.download_as_bytes()
69 document = documentai.types.Document.from_json(blob_as_bytes)
70
AttributeError: 'Blob' object has no attribute 'download_as_bytes'
def retry_target(target, predicate, sleep_generator, deadline, on_error=None):
"""Call a function and retry if it fails.
This is the lowest-level retry helper. Generally, you'll use the
higher-level retry helper :class:`Retry`.
Args:
target(Callable): The function to call and retry. This must be a
nullary function - apply arguments with `functools.partial`.
predicate (Callable[Exception]): A callable used to determine if an
exception raised by the target should be considered retryable.
It should return True to retry or False otherwise.
sleep_generator (Iterable[float]): An infinite iterator that determines
how long to sleep between retries.
deadline (float): How long to keep retrying the target. The last sleep
period is shortened as necessary, so that the last retry runs at
``deadline`` (and not considerably beyond it).
on_error (Callable[Exception]): A function to call while processing a
retryable exception. Any error raised by this function will *not*
be caught.
Returns:
Any: the return value of the target function.
Raises:
google.api_core.RetryError: If the deadline is exceeded while retrying.
ValueError: If the sleep generator stops yielding values.
Exception: If the target raises a method that isn't retryable.
"""
if deadline is not None:
deadline_datetime = datetime_helpers.utcnow() + datetime.timedelta(
seconds=deadline
)
else:
deadline_datetime = None
last_exc = None
for sleep in sleep_generator:
try:
self = <google.api_core.operation.Operation object at 0x7f8f8ec5c198>
retry = <google.api_core.retry.Retry object at 0x7f8f8eed77f0>
def _done_or_raise(self, retry=DEFAULT_RETRY):
"""Check if the future is done and raise if it's not."""
kwargs = {} if retry is DEFAULT_RETRY else {"retry": retry}
if not self.done(**kwargs):
raise _OperationNotComplete()
E google.api_core.future.polling._OperationNotComplete
The above exception was the direct cause of the following exception:
self = <google.api_core.operation.Operation object at 0x7f8f8ec5c198>
timeout = 120, retry = <google.api_core.retry.Retry object at 0x7f8f8eed77f0>
def _blocking_poll(self, timeout=None, retry=DEFAULT_RETRY):
"""Poll and wait for the Future to be resolved.
Args:
timeout (int):
How long (in seconds) to wait for the operation to complete.
If None, wait indefinitely.
"""
if self._result_set:
return
retry_ = self._retry.with_deadline(timeout)
try:
kwargs = {} if retry is DEFAULT_RETRY else {"retry": retry}
target = functools.partial(<bound method PollingFuture._done_or_raise of <google.api_core.operation.Operation object at 0x7f8f8ec5c198>>)
predicate = <function if_exception_type.<locals>.if_exception_type_predicate at 0x7f8f8eed6158>
sleep_generator = <generator object exponential_sleep_generator at 0x7f8f8ed82f68>
deadline = 120, on_error = None
def retry_target(target, predicate, sleep_generator, deadline, on_error=None):
"""Call a function and retry if it fails.
This is the lowest-level retry helper. Generally, you'll use the
higher-level retry helper :class:`Retry`.
Args:
target(Callable): The function to call and retry. This must be a
nullary function - apply arguments with `functools.partial`.
predicate (Callable[Exception]): A callable used to determine if an
exception raised by the target should be considered retryable.
It should return True to retry or False otherwise.
sleep_generator (Iterable[float]): An infinite iterator that determines
how long to sleep between retries.
deadline (float): How long to keep retrying the target. The last sleep
period is shortened as necessary, so that the last retry runs at
``deadline`` (and not considerably beyond it).
on_error (Callable[Exception]): A function to call while processing a
retryable exception. Any error raised by this function will *not*
be caught.
Returns:
Any: the return value of the target function.
Raises:
google.api_core.RetryError: If the deadline is exceeded while retrying.
ValueError: If the sleep generator stops yielding values.
Exception: If the target raises a method that isn't retryable.
"""
if deadline is not None:
deadline_datetime = datetime_helpers.utcnow() + datetime.timedelta(
seconds=deadline
)
else:
deadline_datetime = None
last_exc = None
for sleep in sleep_generator:
try:
return target()
# pylint: disable=broad-except
# This function explicitly must deal with broad exceptions.
except Exception as exc:
if not predicate(exc):
raise
last_exc = exc
if on_error is not None:
on_error(exc)
now = datetime_helpers.utcnow()
if deadline_datetime is not None:
if deadline_datetime <= now:
six.raise_from(
exceptions.RetryError(
"Deadline of {:.1f}s exceeded while calling {}".format(
deadline, target
),
last_exc,
),
value = None, from_value = _OperationNotComplete()
???
E google.api_core.exceptions.RetryError: Deadline of 120.0s exceeded while calling functools.partial(<bound method PollingFuture._done_or_raise of <google.api_core.operation.Operation object at 0x7f8f8ec5c198>>), last exception:
<string>:3: RetryError
During handling of the above exception, another exception occurred:
capsys = <_pytest.capture.CaptureFixture object at 0x7f8f8ed727b8>
batch_parse_form_v1beta2.py:84: in batch_parse_form
operation.result(timeout)
.nox/py-3-6/lib/python3.6/site-packages/google/api_core/future/polling.py:129: in result
self._blocking_poll(timeout=timeout, **kwargs)
self = <google.api_core.operation.Operation object at 0x7f8f8ec5c198>
timeout = 120, retry = <google.api_core.retry.Retry object at 0x7f8f8eed77f0>
def _blocking_poll(self, timeout=None, retry=DEFAULT_RETRY):
"""Poll and wait for the Future to be resolved.
Args:
timeout (int):
How long (in seconds) to wait for the operation to complete.
If None, wait indefinitely.
"""
if self._result_set:
return
retry_ = self._retry.with_deadline(timeout)
try:
kwargs = {} if retry is DEFAULT_RETRY else {"retry": retry}
retry_(self._done_or_raise)(**kwargs)
except exceptions.RetryError:
raise concurrent.futures.TimeoutError(
"Operation did not complete within the designated " "timeout."
)
E concurrent.futures._base.TimeoutError: Operation did not complete within the designated timeout.
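The chain of exceptions above comes from a poll-until-deadline loop: the operation is checked repeatedly, with a sleep between checks, until it reports done or the 120-second deadline passes. The following is a simplified, standalone sketch of that pattern, not the `google.api_core` implementation. The names `exponential_sleeps` and `poll_until_done` are hypothetical, the delays carry no jitter, and it raises the built-in `TimeoutError` rather than `concurrent.futures.TimeoutError`:

```python
import time


def exponential_sleeps(initial=0.01, maximum=0.1, multiplier=2.0):
    """Yield an endless series of delays that grow up to `maximum`."""
    delay = initial
    while True:
        yield min(delay, maximum)
        delay *= multiplier


def poll_until_done(is_done, deadline, sleeps=None):
    """Call is_done() until it returns True or `deadline` seconds elapse."""
    sleeps = exponential_sleeps() if sleeps is None else sleeps
    give_up_at = time.monotonic() + deadline
    for delay in sleeps:
        if is_done():
            return
        remaining = give_up_at - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(
                "Operation did not complete within the designated timeout."
            )
        # Shorten the last sleep so the final check lands near the deadline.
        time.sleep(min(delay, remaining))
    raise ValueError("Sleep generator stopped yielding sleep values.")


if __name__ == "__main__":
    start = time.monotonic()
    poll_until_done(lambda: time.monotonic() - start > 0.05, deadline=1.0)
    print("operation finished before the deadline")
```

Seen this way, the failure in the test is simply that the batch operation took longer than the 120 seconds passed to `operation.result(timeout)`; a larger timeout gives the loop more time before it gives up.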
ome/kbuilder/.cache/synthtool/googleapis/google/cloud/documentai/v1beta2/BUILD.bazel:165:1: //google/cloud/documentai/v1beta2:documentai_py_gapic: `bazel-out/host/bin/external/com_google_protobuf/protoc --experimental_allow_proto3_optional --plugin=protoc-gen-python_gapic=bazel-out/host/bin/external/gapic_generator_python/gapic_plugin --python_gapic_out=retry-config=google/cloud/documentai/v1beta2/documentai_v1beta2_grpc_service_config.json:bazel-out/k8-fastbuild/bin/google/cloud/documentai/v1beta2/documentai_py_gapic.srcjar.zip -Igoogle/cloud/documentai/v1beta2/document.proto=google/cloud/documentai/v1beta2/document.proto -Igoogle/cloud/documentai/v1beta2/document_understanding.proto=google/cloud/documentai/v1beta2/document_understanding.proto -Igoogle/cloud/documentai/v1beta2/geometry.proto=google/cloud/documentai/v1beta2/geometry.proto -Igoogle/api/annotations.proto=google/api/annotations.proto -Igoogle/api/http.proto=google/api/http.proto -Igoogle/protobuf/descriptor.proto=bazel-out/k8-fastbuild/bin/external/com_google_protobuf/_virtual_imports/descriptor_proto/google/protobuf/descriptor.proto -Igoogle/api/client.proto=google/api/client.proto -Igoogle/api/field_behavior.proto=google/api/field_behavior.proto -Igoogle/longrunning/operations.proto=google/longrunning/operations.proto -Igoogle/rpc/status.proto=google/rpc/status.proto -Igoogle/protobuf/any.proto=bazel-out/k8-fastbuild/bin/external/com_google_protobuf/_virtual_imports/any_proto/google/protobuf/any.proto -Igoogle/protobuf/duration.proto=bazel-out/k8-fastbuild/bin/external/com_google_protobuf/_virtual_imports/duration_proto/google/protobuf/duration.proto -Igoogle/protobuf/empty.proto=bazel-out/k8-fastbuild/bin/external/com_google_protobuf/_virtual_imports/empty_proto/google/protobuf/empty.proto -Igoogle/type/color.proto=google/type/color.proto -Igoogle/protobuf/wrappers.proto=bazel-out/k8-fastbuild/bin/external/com_google_protobuf/_virtual_imports/wrappers_proto/google/protobuf/wrappers.proto 
-Igoogle/protobuf/timestamp.proto=bazel-out/k8-fastbuild/bin/external/com_google_protobuf/_virtual_imports/timestamp_proto/google/protobuf/timestamp.proto google/cloud/documentai/v1beta2/document.proto google/cloud/documentai/v1beta2/document_understanding.proto google/cloud/documentai/v1beta2/geometry.proto` failed (Exit 1) protoc failed: error executing command bazel-out/host/bin/external/com_google_protobuf/protoc --experimental_allow_proto3_optional '--plugin=protoc-gen-python_gapic=bazel-out/host/bin/external/gapic_generator_python/gapic_plugin' ... (remaining 20 argument(s) skipped)
Use --sandbox_debug to see verbose messages from the sandbox
google/cloud/documentai/v1beta2/geometry.proto:19:1: warning: Import google/api/annotations.proto is unused.
google/cloud/documentai/v1beta2/document.proto:23:1: warning: Import google/api/annotations.proto is unused.
Traceback (most recent call last):
File "/home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/sandbox/linux-sandbox/11/execroot/com_google_googleapis/bazel-out/host/bin/external/gapic_generator_python/gapic_plugin.runfiles/gapic_generator_python/gapic/cli/generate_with_pandoc.py", line 3, in <module>
from gapic.cli import generate
File "/home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/sandbox/linux-sandbox/11/execroot/com_google_googleapis/bazel-out/host/bin/external/gapic_generator_python/gapic_plugin.runfiles/gapic_generator_python/gapic/cli/generate.py", line 23, in <module>
from gapic import generator
File "/home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/sandbox/linux-sandbox/11/execroot/com_google_googleapis/bazel-out/host/bin/external/gapic_generator_python/gapic_plugin.runfiles/gapic_generator_python/gapic/generator/__init__.py", line 21, in <module>
from .generator import Generator
File "/home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/sandbox/linux-sandbox/11/execroot/com_google_googleapis/bazel-out/host/bin/external/gapic_generator_python/gapic_plugin.runfiles/gapic_generator_python/gapic/generator/generator.py", line 24, in <module>
from gapic.samplegen import manifest, samplegen
File "/home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/sandbox/linux-sandbox/11/execroot/com_google_googleapis/bazel-out/host/bin/external/gapic_generator_python/gapic_plugin.runfiles/gapic_generator_python/gapic/samplegen/__init__.py", line 15, in <module>
from gapic.samplegen import samplegen
File "/home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/sandbox/linux-sandbox/11/execroot/com_google_googleapis/bazel-out/host/bin/external/gapic_generator_python/gapic_plugin.runfiles/gapic_generator_python/gapic/samplegen/samplegen.py", line 27, in <module>
from gapic.schema import wrappers
File "/home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/sandbox/linux-sandbox/11/execroot/com_google_googleapis/bazel-out/host/bin/external/gapic_generator_python/gapic_plugin.runfiles/gapic_generator_python/gapic/schema/__init__.py", line 23, in <module>
from gapic.schema.api import API
File "/home/kbuilder/.cache/bazel/_bazel_kbuilder/a732f932c2cbeb7e37e1543f189a2a73/sandbox/linux-sandbox/11/execroot/com_google_googleapis/bazel-out/host/bin/external/gapic_generator_python/gapic_plugin.runfiles/gapic_generator_python/gapic/schema/api.py", line 29, in <module>
from google.api_core import exceptions # type: ignore
ModuleNotFoundError: No module named 'google.api_core'
--python_gapic_out: protoc-gen-python_gapic: Plugin failed with status code 1.
Target //google/cloud/documentai/v1beta2:documentai-v1beta2-py failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 1.138s, Critical Path: 0.87s
INFO: 0 processes.
FAILED: Build did NOT complete successfully
FAILED: Build did NOT complete successfully
Traceback (most recent call last):
File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/tmpfs/src/github/synthtool/synthtool/__main__.py", line 102, in <module>
main()
File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/tmpfs/src/github/synthtool/env/lib/python3.6/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/tmpfs/src/github/synthtool/synthtool/__main__.py", line 94, in main
spec.loader.exec_module(synth_module) # type: ignore
File "<frozen importlib._bootstrap_external>", line 678, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/home/kbuilder/.cache/synthtool/python-documentai/synth.py", line 38, in <module>
bazel_target=f"//google/cloud/documentai/{version}:documentai-{version}-py",
File "/tmpfs/src/github/synthtool/synthtool/gcp/gapic_bazel.py", line 52, in py_library
return self._generate_code(service, version, "python", **kwargs)
File "/tmpfs/src/github/synthtool/synthtool/gcp/gapic_bazel.py", line 193, in _generate_code
shell.run(bazel_run_args)
File "/tmpfs/src/github/synthtool/synthtool/shell.py", line 39, in run
raise exc
File "/tmpfs/src/github/synthtool/synthtool/shell.py", line 33, in run
encoding="utf-8",
File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/subprocess.py", line 438, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['bazel', '--max_idle_secs=240', 'build', '//google/cloud/documentai/v1beta2:documentai-v1beta2-py']' returned non-zero exit status 1.
2021-01-21 05:42:36,028 autosynth [ERROR] > Synthesis failed
2021-01-21 05:42:36,028 autosynth [DEBUG] > Running: git reset --hard HEAD
HEAD is now at f2cdc15 chore(deps): update dependency google-cloud-storage to v1.35.0 (#78)
2021-01-21 05:42:36,033 autosynth [DEBUG] > Running: git checkout autosynth
Switched to branch 'autosynth'
2021-01-21 05:42:36,038 autosynth [DEBUG] > Running: git clean -fdx
Removing __pycache__/
Traceback (most recent call last):
File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 354, in <module>
main()
File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 189, in main
return _inner_main(temp_dir)
File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 334, in _inner_main
commit_count = synthesize_loop(x, multiple_prs, change_pusher, synthesizer)
File "/tmpfs/src/github/synthtool/autosynth/synth.py", line 65, in synthesize_loop
has_changes = toolbox.synthesize_version_in_new_branch(synthesizer, youngest)
File "/tmpfs/src/github/synthtool/autosynth/synth_toolbox.py", line 259, in synthesize_version_in_new_branch
synthesizer.synthesize(synth_log_path, self.environ)
File "/tmpfs/src/github/synthtool/autosynth/synthesizer.py", line 120, in synthesize
synth_proc.check_returncode() # Raise an exception.
File "/home/kbuilder/.pyenv/versions/3.6.9/lib/python3.6/subprocess.py", line 389, in check_returncode
self.stderr)
subprocess.CalledProcessError: Command '['/tmpfs/src/github/synthtool/env/bin/python3', '-m', 'synthtool', '--metadata', 'synth.metadata', 'synth.py', '--']' returned non-zero exit status 1.
Google internal developers can see the full log here.
I'm trying to use Google Document AI to extract data from a PDF file, but the 'type' field of every entity is always empty. According to the documentation, it must be non-empty.
from google.cloud import documentai_v1beta2 as documentai


def parse_invoice(project_id='myprojectid',
                  input_uri='gs://cloud-samples-data/documentai/invoice.pdf'):
    """Process a single document with the Document AI API, including text extraction and entity extraction."""
    client = documentai.DocumentUnderstandingServiceClient()
    gcs_source = documentai.types.GcsSource(uri=input_uri)
    # mime_type can be application/pdf, image/tiff,
    # and image/gif, or application/json
    input_config = documentai.types.InputConfig(
        gcs_source=gcs_source, mime_type='application/pdf')
    entity_p = documentai.types.EntityExtractionParams(enabled=True)
    parent = 'projects/{}/locations/us'.format(project_id)
    request = documentai.types.ProcessDocumentRequest(
        parent=parent,
        input_config=input_config,
        entity_extraction_params=entity_p)
    document = client.process_document(request=request)
    print(document.entities)
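To check the entities field by field instead of printing the whole repeated field, a small helper along the following lines may be useful. This is only a sketch under assumptions: it assumes that some client releases expose the proto field `type` under the Python attribute name `type_`, and that entities carry a `mention_text` attribute. `entity_type` and `summarize_entities` are hypothetical helper names, and the stand-in objects and their values are invented so the sketch runs without calling the API.

```python
from types import SimpleNamespace


def entity_type(entity):
    """Return the entity's type, trying `type_` first and then `type`."""
    # Assumption: depending on the client release, the proto field `type`
    # may be exposed as `type_` rather than `type`.
    for attr in ("type_", "type"):
        value = getattr(entity, attr, None)
        if value:
            return value
    return ""


def summarize_entities(entities):
    """Build (type, mention_text) pairs for a sequence of entities."""
    return [(entity_type(e), getattr(e, "mention_text", "")) for e in entities]


# Stand-in objects with invented values; real entities would come from
# `document.entities` in the function above.
sample = [
    SimpleNamespace(type_="example_type_a", mention_text="first"),
    SimpleNamespace(type="example_type_b", mention_text="second"),
    SimpleNamespace(mention_text="third"),
]
summary = summarize_entities(sample)
```

If `summarize_entities(document.entities)` still yields empty strings in the first position, the type is genuinely missing from the response rather than hidden behind a renamed attribute.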
As per the API documentation for v1beta2, there is a provision to supply a custom model ID or annotation dataset for FormExtraction:
model_version
Model version of the form extraction system. Default is “builtin/stable”. Specify “builtin/latest” for the latest model. For custom form models, specify: “custom/{model_name}”. Model name format is “bucket_name/path/to/modeldir” corresponding to “gs://bucket_name/path/to/modeldir” where annotated examples are stored.
Is there any documentation or guideline available on how to train the model or prepare the annotated dataset, which can then be placed in a GCS bucket and referenced in the model_version parameter?
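For reference, the string format quoted above ("custom/{model_name}", where the model name is the "gs://" path without its scheme) can be sketched as a small conversion. `custom_model_version` is a hypothetical helper name, not part of the client library, and the sketch only builds the string; it says nothing about how the annotated examples themselves must be prepared.

```python
def custom_model_version(gcs_uri):
    """Turn 'gs://bucket_name/path/to/modeldir' into 'custom/bucket_name/path/to/modeldir'."""
    prefix = "gs://"
    if not gcs_uri.startswith(prefix):
        raise ValueError("expected a URI starting with 'gs://', got %r" % gcs_uri)
    # Drop the scheme and any trailing slash to get the model name.
    model_name = gcs_uri[len(prefix):].rstrip("/")
    if not model_name:
        raise ValueError("the URI must name a bucket and a model directory")
    return "custom/" + model_name


model_version = custom_model_version("gs://bucket_name/path/to/modeldir")
```

The resulting value is what would be passed as the model_version parameter, assuming the quoted documentation describes the expected format accurately.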