Comments (3)

chezou commented on July 26, 2024

Thanks for your report, but could you show me the specific error message? Otherwise I can't say anything.

from tabula-py.

amalroshan commented on July 26, 2024

#Code 1:

from tabula import read_pdf
dfs=read_pdf('test.pdf', encoding='cp1254', output_format='csv')
print(dfs)
#output 1:
C:\Users\amal\AppData\Local\Programs\Python\Python36-32\python.exe D:/pycharm/cdsco/tabula_csv.py
Oct 03, 2017 7:12:21 AM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 <init>
INFO: OpenType Layout tables used in font Times New Roman are not implemented in PDFBox and will be ignored
Oct 03, 2017 7:12:26 AM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 <init>
INFO: OpenType Layout tables used in font Times New Roman are not implemented in PDFBox and will be ignored
Oct 03, 2017 7:12:27 AM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 <init>
INFO: OpenType Layout tables used in font Times New Roman are not implemented in PDFBox and will be ignored
    S.No.                Drug Name  \
0     NaN                      NaN   
1     1.0   Sofosbuvir 400 mg film   
2     NaN            coated Tablet   
3     NaN                      NaN   
4     2.0   Hydralazine tablets BP   
5     NaN        25 mg (Additional   
6     NaN                Strength)   
7     3.0   Hydralazine tablets BP   
8     NaN                    50 mg   
9     NaN                      NaN   
10    NaN    (Additional Strength)   
11    4.0             Bendamustine   
12    NaN  hydrochloride injection   
13    NaN    90 mg/mL (Fill volume   
14    NaN                      NaN   
15    NaN  0.5 mL in 2 ml capacity   
16    NaN                      NaN   
17    NaN  vial & 2 mL filled in 2   
18    NaN                      NaN   
19    NaN        mL capacity vial)   
20    5.0       Eltrombopagolamine   
21    NaN      25 mg/50 mg tablets   
22    NaN                      NaN   
23    NaN  (Additional indication)   
24    NaN                      NaN   
25    NaN                      NaN   
26    NaN                      NaN   
27    NaN                      NaN   
28    NaN                      NaN   
29    NaN                      NaN   
30    NaN                      NaN   
31    6.0          Azacitidine for   
32    NaN         injection 100 mg   
33    NaN                      NaN   
34    NaN                      NaN   
35    7.0     Methylcobalamin 1500   
36    NaN               mcg orally   
37    NaN    disintegrating strips   
38    8.0   Cefditoren Pivoxil dry   
39    NaN    powder for suspension   
40    NaN              100 mg/5 mL   
41    NaN                      NaN   
42    NaN                      NaN   
43    NaN                      NaN   
44    NaN                      NaN   
45    NaN                      NaN   
46    NaN                      NaN   

                                           Indication     Date of  
0                                                 NaN    Approval  
1        In combination with other medicinal products  16.02.2017  
2      for the treatment of Chronic Hepatitis C (CHC)         NaN  
3                                           in adults         NaN  
4             For moderate to severe hypertension (in  23.02.2017  
5          conjunction with a ?-adrenoceptor blocking         NaN  
6         agent or diuretic) and hypertensive crisis.         NaN  
7             For moderate to severe hypertension (in  23.02.2017  
8          conjunction with a ?-adrenoceptor blocking         NaN  
9         agent or diuretic) and hypertensive crisis.         NaN  
10                                                NaN         NaN  
11              1. For the treatment of patients with  02.03.2017  
12                       chronic lymphocytic leukemia         NaN  
13             2. For the use in Indolent B-cell Non-         NaN  
14                  Hodgkins Lymphoma (NHL) that has         NaN  
15                                                NaN         NaN  
16             Progressed During or Within six months         NaN  
17                                                NaN         NaN  
18                   of treatment with Rituximab or a         NaN  
19                      Rituximab containing Regimen.         NaN  
20           For the treatment of thrombocytopenia in  02.03.2017  
21  paediatric patients 1 year and older with chronic         NaN  
22          immune(idiopathic) thrombocytopenia (ITP)         NaN  
23                                                NaN         NaN  
24           who have had an insufficient response to         NaN  
25                corticosteroids, immunoglobulins or         NaN  
26   splenectomy. (It should be used only in patients         NaN  
27          with ITP whose degree of Thrombocytopenia         NaN  
28       and clinical condition increase the risk for         NaN  
29   bleeding. It should not be used in an attempt to         NaN  
30                        normalize platelet counts).         NaN  
31  For the treatment of adult patients with all sub-  07.03.2017  
32                  types of Myelodysplastic Syndrome         NaN  
33    With the condition: to be sold by retail on the         NaN  
34                    prescription of Oncologist only         NaN  
35       For the treatment of Diabetic Neuropathy and  10.03.2017  
36                              peripheral Neuropathy         NaN  
37                                                NaN         NaN  
38   For the treatment of mild to moderate infections  30.03.2017  
39      in adults and adolescents (12 years of age or         NaN  
40  older) which are caused by susceptible strains of         NaN  
41     the designated microorganisms in the condition         NaN  
42                                       listed below         NaN  
43                  ? Acute Bacterial Exacerbation of         NaN  
44                                 Chronic Bronchitis         NaN  
45                     ? Community-Acquired pneumonia         NaN  
46                          ? Pharyngitis/Tonsillitis         NaN  

Process finished with exit code 0

#description:
It only prints the table from the first page.
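
In Output 1, read_pdf only looked at page 1 (the default for its pages argument), and because stream mode was used, every wrapped line of a cell came back as its own row, with only the first line carrying an S.No. value. A minimal pandas post-processing sketch, assuming each record starts at a row whose S.No. is not NaN; merge_continuation_rows is a hypothetical helper, not part of tabula-py:

```python
import pandas as pd


def merge_continuation_rows(df, key="S.No."):
    """Fold rows whose `key` is NaN into the nearest numbered row above them.

    Text fragments are joined with single spaces; the key column keeps the
    record's number. Rows above the first numbered row form their own group.
    """
    # Each non-NaN key starts a new group; NaN rows inherit the running count.
    group_id = df[key].notna().cumsum().rename("_group")

    def join_text(values):
        # Skip NaN fragments so blank continuation rows add nothing.
        parts = [str(v) for v in values if pd.notna(v)]
        return " ".join(parts) if parts else float("nan")

    rules = {col: join_text for col in df.columns if col != key}
    rules[key] = "first"  # first non-NaN value in each group
    merged = df.groupby(group_id).agg(rules)
    # Restore the original column order and a plain 0..n-1 index.
    return merged[list(df.columns)].reset_index(drop=True)
```

Because NaN fragments are skipped, rows that are entirely NaN simply disappear into the record above them.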

#Code 2:

from tabula import read_pdf
dfs=read_pdf('test.pdf', encoding='cp1254', output_format='csv', pages='all')
print(dfs)
#output 2:
Oct 03, 2017 7:17:20 AM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 <init>
INFO: OpenType Layout tables used in font Times New Roman are not implemented in PDFBox and will be ignored
Oct 03, 2017 7:17:21 AM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 <init>
INFO: OpenType Layout tables used in font Times New Roman are not implemented in PDFBox and will be ignored
Oct 03, 2017 7:17:22 AM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 <init>
INFO: OpenType Layout tables used in font Times New Roman are not implemented in PDFBox and will be ignored
Oct 03, 2017 7:17:24 AM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 <init>
INFO: OpenType Layout tables used in font Times New Roman are not implemented in PDFBox and will be ignored
Oct 03, 2017 7:17:24 AM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 <init>
INFO: OpenType Layout tables used in font Times New Roman are not implemented in PDFBox and will be ignored
Traceback (most recent call last):
  File "D:/pycharm/cdsco/tabula_csv.py", line 2, in <module>
    dfs=read_pdf('test.pdf', encoding='cp1254', output_format='csv', pages='all')
  File "C:\Users\amal\AppData\Local\Programs\Python\Python36-32\lib\site-packages\tabula\wrapper.py", line 97, in read_pdf
    return pd.read_csv(io.BytesIO(output), **pandas_options)
  File "C:\Users\amal\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\io\parsers.py", line 655, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "C:\Users\amal\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\io\parsers.py", line 411, in _read
    data = parser.read(nrows)
  File "C:\Users\amal\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\io\parsers.py", line 1005, in read
    ret = self._engine.read(nrows)
  File "C:\Users\amal\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\io\parsers.py", line 1748, in read
    data = self._reader.read(nrows)
  File "pandas\_libs\parsers.pyx", line 890, in pandas._libs.parsers.TextReader.read (pandas\_libs\parsers.c:10862)
  File "pandas\_libs\parsers.pyx", line 912, in pandas._libs.parsers.TextReader._read_low_memory (pandas\_libs\parsers.c:11138)
  File "pandas\_libs\parsers.pyx", line 966, in pandas._libs.parsers.TextReader._read_rows (pandas\_libs\parsers.c:11884)
  File "pandas\_libs\parsers.pyx", line 953, in pandas._libs.parsers.TextReader._tokenize_rows (pandas\_libs\parsers.c:11755)
  File "pandas\_libs\parsers.pyx", line 2184, in pandas._libs.parsers.raise_parser_error (pandas\_libs\parsers.c:28765)
pandas.errors.ParserError: Error tokenizing data. C error: Expected 4 fields in line 49, saw 5

Process finished with exit code 1
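
The traceback shows that this version of tabula-py parses tabula-java's output with a single pd.read_csv call (wrapper.py, line 97), so the tables from every page end up in one CSV stream whose width is fixed by the first header. Output 1 has 47 rows plus a header, i.e. 48 lines, so line 49 is probably the first row from page 2, where the table seems to come out one column wider. The sketch below uses only the standard library to list the rows whose field count differs from the header's; find_ragged_rows is a hypothetical helper, and it assumes you have the raw CSV text (for example, tabula-java's own CSV output). Passing multiple_tables=True to read_pdf, which returns one DataFrame per table, is another way to avoid the single-CSV parse.

```python
import csv
import io


def find_ragged_rows(csv_text):
    """Return the header width and (line_num, width) for every row whose
    field count differs from the header's."""
    reader = csv.reader(io.StringIO(csv_text))
    expected = len(next(reader))  # the header fixes the expected width
    ragged = []
    for row in reader:
        if not row:
            continue  # pandas skips blank lines by default, so do the same
        if len(row) != expected:
            # reader.line_num is the number of source lines read so far,
            # i.e. the line on which this (possibly multi-line) row ends.
            ragged.append((reader.line_num, len(row)))
    return expected, ragged


# Synthetic example: a 4-column header followed by a 5-field row.
sample = "h1,h2,h3,h4\n1,2,3,4\n5,6,7,8,9\n"
print(find_ragged_rows(sample))  # (4, [(3, 5)])
```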

chezou commented on July 26, 2024

Use the guess=False and lattice=True options. The result seems to include some empty, wasted columns, but that is a tabula-java problem.

In [33]: df = read_pdf('test.pdf', pages='all', encoding='shift-jis', guess=False, lattice=True)
Oct 03, 2017 11:04:42 AM org.apache.pdfbox.pdmodel.font.PDCIDFontType2 <init>
INFO: OpenType Layout tables used in font Times New Roman are not implemented in PDFBox and will be ignored

In [34]: df
Out[34]:
    Unnamed: 0                                              S.No.  \
0          NaN                                                NaN
1          1.0              Sofosbuvir 400 mg film\rcoated Tablet
2          2.0  Hydralazine tablets BP\r25 mg (Additional\rStr...
3          3.0  Hydralazine tablets BP\r50 mg\r(Additional Str...
4          4.0  Bendamustine\rhydrochloride injection\r90 mg/m...
5          5.0  Eltrombopagolamine\r25 mg/50 mg tablets\r(Addi...
6          6.0                  Azacitidine for\rinjection 100 mg
7          7.0  Methylcobalamin 1500\rmcg orally\rdisintegrati...
8          8.0  Cefditoren Pivoxil dry\rpowder for suspension\...
9          NaN                                                NaN
10         9.0  Thiotepa Powder for\rconcentrate for solution\...
11        10.0  Thiotepa Powder for\rconcentrate for solution\...
12        11.0  Bortezomib for\rInjection (i.v) 1 mg/vial\r(Ad...
13        12.0  Bortezomib 3.5 mg/vial\rPowder for solution fo...
14        13.0  Mesalamine Delayed\rRelease Tablets 800 mg\r(A...
15        14.0            Teneligliptin Film\rcoated Tablet 20 mg
16        15.0  Ticagrelor 60 mg\rtablets (Additional\rstrengt...
17         NaN                                                NaN
18        16.0                          Daclatasvir Tablet 30\rmg
19        17.0                          Daclatasvir Tablet 60\rmg
20        18.0                        Rosuvastatin Tablets 15\rmg
21        19.0            Rosuvastatin film\rcoated Tablets 30 mg
22         NaN                                                NaN
23        20.0                 Abiraterone Acetate\rTablet 500 mg
24        21.0          Enzalutamide hard\rGelatin Capsules 40 mg

                                           Unnamed: 2  Unnamed: 3  Drug Name  \
0                                                 NaN    Approval        NaN
1   In combination with other medicinal products\r...  16.02.2017        NaN
2   For moderate to severe hypertension (in\rconju...  23.02.2017        NaN
3   For moderate to severe hypertension (in\rconju...  23.02.2017        NaN
4   1.\rFor the treatment of patients with\rchroni...  02.03.2017        NaN
5   Forthetreatmentofthrombocytopeniain\rpaediatri...  02.03.2017        NaN
6   For the treatment of adult patients with all s...  07.03.2017        NaN
7   For the treatment of Diabetic Neuropathy and\r...  10.03.2017        NaN
8   For the treatment of mild to moderate infectio...  30.03.2017        NaN
9   ?\rAcute sinusitis\r?\rUncomplicated skin and ...         NaN        NaN
10  1.\rWithorwithouttotalbody\rirradiation(TBI),a...  06.04.2017        NaN
11  1.\rWithorwithouttotalbody\rirradiation(TBI),a...  06.04.2017        NaN
12  For the treatment of patients with mantle cell...  11.04.2017        NaN
13  For the treatment of patients with mantle cell...  11.04.2017        NaN
14  Forthetreatmentofmildtomoderateacute\rexacerba...  11.04.2017        NaN
15  For the treatment of Type 2 Diabetes Mellitus ...  28.04.2017        NaN
16  Indicatedforthepreventionofthrombotic\revents(...  02.05.2017        NaN
17  prescriptionofCardiologist/Internal\rMedicine ...         NaN        NaN
18For use with Sofosbuvir for the treatment of\...  24.05.2017        NaN
19For use with Sofosbuvir for the treatment of\...  24.05.2017        NaN
20  a.\rTreatmentofpatientswithprimary\rhyperchole...  30.05.2017        NaN
21  a.\rTreatmentofpatientswithprimary\rhyperchole...  30.05.2017        NaN
22  treatmentofadultpatientswith\rhypertriglycerde...         NaN        NaN
23  1.\rIn combination with prednisone for\rthetre...  12.07.2017        NaN
24  Forthetreatmentofadultswithmetastatic\rcastrat...  12.07.2017        NaN

    Unnamed: 5  Unnamed: 6  Indication  Unnamed: 8  Unnamed: 9  Date of  \
0          NaN         NaN         NaN         NaN         NaN      NaN
1          NaN         NaN         NaN         NaN         NaN      NaN
2          NaN         NaN         NaN         NaN         NaN      NaN
3          NaN         NaN         NaN         NaN         NaN      NaN
4          NaN         NaN         NaN         NaN         NaN      NaN
5          NaN         NaN         NaN         NaN         NaN      NaN
6          NaN         NaN         NaN         NaN         NaN      NaN
7          NaN         NaN         NaN         NaN         NaN      NaN
8          NaN         NaN         NaN         NaN         NaN      NaN
9          NaN         NaN         NaN         NaN         NaN      NaN
10         NaN         NaN         NaN         NaN         NaN      NaN
11         NaN         NaN         NaN         NaN         NaN      NaN
12         NaN         NaN         NaN         NaN         NaN      NaN
13         NaN         NaN         NaN         NaN         NaN      NaN
14         NaN         NaN         NaN         NaN         NaN      NaN
15         NaN         NaN         NaN         NaN         NaN      NaN
16         NaN         NaN         NaN         NaN         NaN      NaN
17         NaN         NaN         NaN         NaN         NaN      NaN
18         NaN         NaN         NaN         NaN         NaN      NaN
19         NaN         NaN         NaN         NaN         NaN      NaN
20         NaN         NaN         NaN         NaN         NaN      NaN
21         NaN         NaN         NaN         NaN         NaN      NaN
22         NaN         NaN         NaN         NaN         NaN      NaN
23         NaN         NaN         NaN         NaN         NaN      NaN
24         NaN         NaN         NaN         NaN         NaN      NaN

    Unnamed: 11
0           NaN
1           NaN
2           NaN
3           NaN
4           NaN
5           NaN
6           NaN
7           NaN
8           NaN
9           NaN
10          NaN
11          NaN
12          NaN
13          NaN
14          NaN
15          NaN
16          NaN
17          NaN
18          NaN
19          NaN
20          NaN
21          NaN
22          NaN
23          NaN
24          NaN
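
The lattice result keeps each record on one row, but it carries the all-NaN columns mentioned above, and multi-line cells come back with embedded carriage returns (shown as \r). A minimal pandas cleanup sketch, assuming df is the single DataFrame returned in In [33]; tidy_lattice_table is a hypothetical name, not a tabula-py function:

```python
import pandas as pd


def tidy_lattice_table(df):
    """Drop all-NaN columns and turn embedded carriage returns into spaces."""
    # Columns such as the "Unnamed: N" ones above hold nothing but NaN.
    cleaned = df.dropna(axis=1, how="all")
    # Multi-line cells have '\r' between their lines; with regex=True the
    # pattern is replaced inside string cells, while numbers and NaN are
    # left untouched.
    return cleaned.replace(r"\r", " ", regex=True)


# Hypothetical usage with the call from this thread (requires tabula-py,
# Java and test.pdf):
# from tabula import read_pdf
# df = read_pdf('test.pdf', pages='all', guess=False, lattice=True)
# print(tidy_lattice_table(df).head())
```

Dropping empty columns also removes Drug Name, Indication and Date of, which are entirely NaN here, and the remaining headers are shifted (the S.No. column holds the drug names), so renaming the columns by hand is probably still needed.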