
casparser's People

Contributors

abhishekjain-qb, codereverser, deepsourcebot, dependabot[bot], isaac-philip, kaushiksk

casparser's Issues

Long mutual fund folio scheme name is not fully read

For long mutual fund scheme names that span more than one row, only the first row is read.
Example name:
"""
My long mutual fund scheme name ELSS -
Direct growth plan
"""
Only first row will be read: "My long mutual fund scheme name ELSS -"

decimal.InvalidOperation: [<class 'decimal.ConversionSyntax'>] exception when parsing IDCW payout transaction

Thanks for your script. It is very useful. I am trying to build some automation on top of this to analyze my MF investments.
Got this exception when parsing the amount of an IDCW payout transaction. Let me know if I can collect any more debug info to help. I will also see if I can debug further.

Versions used

casparser==0.4.6
python 3.8.3

Traceback

File "C:\Sathish\python\mf_statement_parser\venv2\lib\site-packages\casparser\parsers\__init__.py", line 35, in read_cas_pdf
processed_data = process_cas_text("\u2029".join(partial_cas_data.lines))
File "C:\Sathish\python\mf_statement_parser\venv2\lib\site-packages\casparser\process\__init__.py", line 28, in process_cas_text
return process_detailed_text(text)
File "C:\Sathish\python\mf_statement_parser\venv2\lib\site-packages\casparser\process\cas_detailed.py", line 167, in process_detailed_text
amt = Decimal(m.group(3).replace(",", "_").replace("(", "-"))
decimal.InvalidOperation: [<class 'decimal.ConversionSyntax'>]

Screenshot of the transaction causing the issue

I ran in debug mode and noticed that it fails when attempting to parse the following transaction in pdf

image

pyCharm debug screenshot

image

Some more debug info

In the debugger console I found that m.group(3) holds a "." rather than, presumably, the amount:

m.group(3)
'.'

m.group(3).replace(",", "_").replace("(", "-")
'.'

amt = Decimal(m.group(3).replace(",", "_").replace("(", "-"))

Traceback (most recent call last):
File "C:\Program Files\JetBrains\PyCharm Community Edition 2020.3.5\plugins\python-ce\helpers\pydev_pydevd_bundle\pydevd_exec2.py", line 1, in Exec
def Exec(exp, global_vars, local_vars=None):
File "", line 1, in
decimal.InvalidOperation: [<class 'decimal.ConversionSyntax'>]
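
The failure can be reproduced in isolation, since `Decimal(".")` raises `decimal.InvalidOperation`. A minimal defensive sketch follows, assuming a hypothetical helper `parse_amount` (not part of casparser) that returns `None` when the captured text is not a number. Whether skipping such a value is the right behaviour for IDCW payouts is an assumption; the real fix may belong in the regex.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def parse_amount(text: str) -> Optional[Decimal]:
    """Hypothetical helper: convert a captured amount string to Decimal.

    Returns None instead of raising when the text is not numeric (e.g. ".").
    """
    # Drop thousands separators and turn a leading "(" into a minus sign.
    cleaned = text.strip().replace(",", "").replace("(", "-").rstrip(")")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


print(parse_amount("."))         # not a number, so no exception is raised
print(parse_amount("1,234.50"))  # a normal amount
```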

Unable to fetch amfi values of recently renamed funds.

Adding the schemes json snippet for reference :
"schemes": [
{
"scheme": "HSBC Medium Duration Fund - Regular Growth (Formerly",
"advisor": "N/A",
"rta_code": "OLRCBG",
"rta": "CAMS",
"isin": null,
"amfi": null,
"type": "N/A",
"open": "xxx",
"close": "xxx",
"valuation": {
"date": "2023-01-12",
"nav": "16.9027",
"value": "2637.70"
},
"transactions": []
}

Print the Valuation of the Fund as well

Thank you for a great parser.

It took me a few tries to understand which report the parser picks up, as CAMS offers many (so confusing! :-)

Finally, it worked for me.

The table printed is great.
But what I, as a user, would also like to see is the total valuation of my fund.

The table only gives the open/close balances, but the CAMS file also has:
Valuation on 06-Nov-2020 INR XX,XXX.XX

Can we pick that up too ?

Thanks.

Issue when there is space character inside RTA code #14

Screenshot 2023-04-03 at 8 32 14 AM

The SUMMARY_ROW_RE regex fails when there are RTA codes like those in the second and third rows. It ends up reading the folio number along with the first three characters of the RTA code, resulting in a null response when the ISIN check is done.

Plans to support parsing of CAS generated by CDSL?

Hi, checking whether there are plans to support parsing of CAS generated by CDSL, as it is much richer in info (it contains all stock holdings too, along with mutual funds).

If there are no near-term plans, has there been any effort in this direction? I could pitch in, or start from somewhere if some work has already been done.

[CAMS CAS]Issue in folio parsing when PAN data unavailable

Hi,

The folio is not parsed in the case below. Transactions get mapped to the previously parsed folio.
image

Below are details of pdf elements and lines for debug

[28.93000030517578, 93.44519805908203, 553.7244873046875, 103.9654769897461, 'Date\t\tTransaction\t\tAmount\t\tUnits\t\tPrice\t\tUnit']
[358.6300048828125, 102.31519317626953, 566.5147705078125, 124.0643310546875, '(INR)\t\t(INR)\t\tBalance\nKYC: OK']
[28.93000030517578, 113.60517120361328, 99.20275115966797, 124.12545013427734, 'Folio No: 99999999']

'Date\t\tTransaction\t\tAmount\t\tUnits\t\tPrice\t\tUnit'
'Folio No: 99999999\t\t(INR)\t\t(INR)\t\tBalance\nKYC: OK'

Issue in parsing Schemes names without "Advisor"

This parser is really helpful.

In my CAMS CAS, there are a few Scheme entries without "Advisor" like below

CAMS_CAS

The parser skips those particular schemes due to "SCHEME_RE" not matching.
Can something be done regarding this?

Issue with Amount alone record

In this case, there was just an extra amount credited with zero quantity, and the parser put the amount in the quantity field instead of the amount field. 320 went into qty field and not amount.

I have attached the CAMS statement of last month. And the full statement from the fund house.
Folio
Folio-full

Group capital gains by PAN

CAS pdf files are generated primarily based on the email address and may occasionally contain multiple PAN numbers depending upon the filters used during the generation. To handle such cases, the capital gains report should have an extra column for the PAN number and preferably group the entries based on it.
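
A minimal sketch of the grouping described above, assuming hypothetical capital-gains rows that already carry a `pan` field. The row layout and field names are illustrative only and are not casparser's actual report structure.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical capital-gains rows; the field names are assumptions.
rows = [
    {"pan": "AAAAA0000A", "scheme": "Fund A", "gain": Decimal("120.50")},
    {"pan": "BBBBB1111B", "scheme": "Fund B", "gain": Decimal("-30.00")},
    {"pan": "AAAAA0000A", "scheme": "Fund C", "gain": Decimal("10.25")},
]

# Group the rows by PAN so each investor's entries stay together.
gains_by_pan = defaultdict(list)
for row in rows:
    gains_by_pan[row["pan"]].append(row)

# Per-PAN totals, summed with a Decimal start value to avoid mixing in ints.
totals = {
    pan: sum((entry["gain"] for entry in entries), Decimal("0"))
    for pan, entries in gains_by_pan.items()
}
```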

Error while generating capital gains report with Dividend payout scheme

Getting the following error

File "\lib\site-packages\casparser\analysis\gains.py", line 192, in merge_transactions
merged_transactions[dt].units += txn["units"]
TypeError: unsupported operand type(s) for +=: 'decimal.Decimal' and 'NoneType'

Dividend payout transactions have nothing in the "Units" column as shown in screenshot below (Only "Amount" column)
image
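
The `TypeError` arises because `Decimal += None` is undefined. One possible guard, sketched with a hypothetical helper `add_units` (not casparser code), is to count a missing units value as zero. Treating a dividend payout as contributing zero units is an assumption here.

```python
from decimal import Decimal
from typing import Optional


def add_units(total: Decimal, units: Optional[Decimal]) -> Decimal:
    """Add a transaction's units to a running total, counting None as zero."""
    return total + (units if units is not None else Decimal("0"))


running = Decimal("10.000")
running = add_units(running, None)              # dividend payout: no units
running = add_units(running, Decimal("2.500"))  # ordinary purchase
```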

Franklin Schemes not Returned

Hi Team

The Franklin mutual fund house changed its registrar to CAMS, so the data in the PDF statement may also have changed.
The latest CAS statement shows the Franklin schemes, but the casparser package does not return their data.

Support for Multiple pdf files

Hi,

Do you have support for multiple PDF files on your roadmap?

For example, I have two different reports: one from Karvy and the other from CAMS.

I can run the parser twice and see the results.

But, in the end I would like to see my complete portfolio in one place.

So, running two scripts with output on the command line and then combining the results could be done away with if the parser supported multiple PDFs.

I know each pdf can have a different password so that needs to be handled as well.

Or you want the parser to be agnostic to this and the person running the code should handle it at their end ?

Let me know.

Thanks!

Issue in Franklin Templeton Segregated Units

Hello team,

Franklin Templeton created a few segregated portfolios for some stressed mutual funds. The data is read incorrectly in some cases. In the example given below, there are two segregation records, one for qty 215931.176 and a second for qty 0.008, but the parser reads the second one as qty 215931.184:

{"scheme": "Franklin India Credit Risk Fund- Segregated Portfolio 1 (8.25% Vodafone Idea Ltd-10JUL20-Growth Plan)",
"advisor": "ICICIRON",
"rta_code": "FTI880", "type": "DEBT", "rta": "CAMS", "isin": "INF090I01TJ6", "amfi": "147954", "open": "0.000", "close": "0.000", "close_calculated": "215931.176", "valuation": {"date": "2020-07-17", "value": "0.00", "nav": "0.0818"},
"transactions": [
{"date": "2020-01-24", "description": "Creation of units - Segregated Portfolio\t\t215,931.176", "amount": "0", "units": "215931.176", "nav": "0", "balance": "215931.176", "type": "SEGREGATION", "dividend_rate": null},
{"date": "2020-01-24", "description": "Creation of units - Segregated Portfolio\t\t0.008", "amount": "0", "units": "215931.184", "nav": "0", "balance": "215931.184", "type": "SEGREGATION", "dividend_rate": null},
{"date": "2020-06-15", "description": "Payment - Units Extinguished", "amount": "-1338.33", "units": "-16360.996", "nav": "0.0818", "balance": "199570.188", "type": "REDEMPTION", "dividend_rate": null},
{"date": "2020-07-10", "description": "Payment - Units Extinguished", "amount": "-16324.84", "units": "-199570.188", "nav": "0.0818", "balance": "0.000", "type": "REDEMPTION", "dividend_rate": null}]}

Duplicated transaction

This is a bug in pdfminer/mupdf, but I thought it would be useful to document (since the implications are somewhat critical if you rely on the output of casparser).

If you have pages that look like this across page boundaries, it seems to count the transaction at the start of the second page in the previous page as well. For me, it counts the ***Stamp Duty*** transaction at the start of the second page twice (once as part of the previous page, 4, and again where it is actually first encountered, on page 5).

parsingbug

My guess is the mediabox (used by pdfminer to determine page boundaries) of the page is larger than necessary and extends into the second one.

CAMS CAS Parsing support for Dividend payout transaction

Hi,

CAMS CAS has Dividend Payout transactions like below.

CAMS_CAS1

TRANSACTION_RE doesn't match since "units", "nav" and "balance" columns are missing in these entries.

Can the parser be updated to handle the Dividend Payout transaction? Not sure if Karvy CAS has similar format.
Also, the Dividend amount may need to be negated for XIRR calculations.

Different exception for when the password is incorrect

In the current code CASParseError("Incorrect PDF password!") is raised when the password is wrong.

raise CASParseError("Incorrect PDF password!")

So you have to do ugly things like:

try:
    read_cas_pdf("pdf", "password")
except CASParseError as err:
    if err.args:
        if 'incorrect pdf password' in err.args[0].lower():
            raise InvalidPasswordError
    
    raise

One possible solution would be a separate exception for a wrong password, inheriting from CASParseError. Alternatively, a code attribute could be set on the CASParseError class, with a value like incorrect_password (or something else, depending on the context where it is raised), which you can check when handling the exception.
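
The first option could look roughly like the sketch below. `CASParseError` is redefined here only so that the example is self-contained; `IncorrectPasswordError` and `open_pdf` are hypothetical names, not part of casparser's confirmed API.

```python
class CASParseError(Exception):
    """Stand-in for casparser's exception, so the sketch runs on its own."""


class IncorrectPasswordError(CASParseError):
    """Hypothetical subclass raised only when the PDF password is wrong."""


def open_pdf(password: str) -> str:
    """Hypothetical stand-in for the step that opens the encrypted PDF."""
    if password != "correct-password":
        raise IncorrectPasswordError("Incorrect PDF password!")
    return "opened"


try:
    open_pdf("wrong")
except IncorrectPasswordError:
    result = "bad password"       # handled without inspecting err.args
except CASParseError:
    result = "other parse error"  # existing handlers keep working
```

Because the new exception inherits from `CASParseError`, existing `except CASParseError` handlers would continue to catch it.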

If you don't have the bandwidth, I can make a PR for the same this weekend.

Unable to parse negative unit balance

There seems to be an assumption that unit balance is never negative. While this assumption seems reasonable, I have a statement in which unit balance is shown as negative (some slight rounding error by AMC). This causes parsing to fail. I believe the fix is simply applying the same logic to unit balance as is applied to units.

See screenshot below for example where it fails.
image
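
A sketch of the idea, assuming that CAS statements show negative numbers in parentheses and that every numeric column, including the balance, is converted the same way. The pattern, the helper `to_decimal`, and the sample line are illustrative; they are not casparser's actual TRANSACTION_RE or data.

```python
import re
from decimal import Decimal

# Illustrative pattern only: a decimal number, optionally wrapped in parentheses.
NUMBER_RE = re.compile(r"\(?[\d,]+\.\d+\)?")


def to_decimal(token: str) -> Decimal:
    """Convert "1,000.00" or "(0.001)" to Decimal; parentheses mean negative."""
    negative = token.startswith("(")
    value = Decimal(token.strip("()").replace(",", ""))
    return -value if negative else value


# Made-up transaction line whose final column (unit balance) is negative.
line = "10-Jan-2020\t\tRedemption\t\t(1,000.00)\t\t(10.001)\t\t100.0000\t\t(0.001)"
amount, units, nav, balance = (to_decimal(t) for t in NUMBER_RE.findall(line))
```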

Advisor Details are missing

In the advisor field, only "ARN" is returned. The ARN number is required to identify the advisor associated with it.

Make TransactionType a str enum instead of int

Currently, when the detailed summary is exported, the transaction type with key "type" consists of the string version of the TransactionType Enum.

"type": txn_type.name,

While this is the right design, if someone wants to reuse the TransactionType Enum elsewhere on the exported data (as I do), this becomes a slight nuisance, as JSON parsers like pydantic will not automatically parse the string into the Enum (because TransactionType is an Enum of ints).
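
A minimal sketch of what a string-valued enum could look like, using only the standard library. The member list is an illustrative subset and an assumption; the real TransactionType has its own members and values.

```python
import json
from enum import Enum


class TransactionType(str, Enum):
    """Illustrative subset; each value equals the exported name."""

    PURCHASE = "PURCHASE"
    REDEMPTION = "REDEMPTION"
    SEGREGATION = "SEGREGATION"


# Export the name, as the current code does with txn_type.name ...
exported = json.dumps({"type": TransactionType.REDEMPTION.name})

# ... and parse it straight back into the Enum by value.
parsed = TransactionType(json.loads(exported)["type"])
```

With values equal to names, a lookup by value succeeds on the exported string, which is the behaviour that value-based parsers rely on.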

Parsing issue in CAS in case of bonus

We received a CAS with bonus transactions, some of which included a NAV and some of which did not, so we are facing the issue shown in this image.

image

CLI works fine but I can't call read_cas_pdf in code

I am trying to use your library and followed all the steps listed on your PyPI page, but it always shows the error "module casparser has no attribute":

data = casparser.read_cas_pdf('/home/path.pdf', 'pwd')
AttributeError: module 'casparser' has no attribute 'read_cas_pdf'

Code:

import casparser
data = casparser.read_cas_pdf('/home/path.pdf', 'pwd')

Great work regardless thank you.

Unable to parse investor data

This is the code I am using to get the parsed data

import casparser


def main():
    data = casparser.read_cas_pdf('./demo2/JUL2020_AA03773313_TXN.pdf', 'FVXPK2945F', output="json")
    # data = casparser.read_cas_pdf('./demo2/MAR2021_AA06997817_TXN.pdf', password='BCDPJ0121K', force_pdfminer=True)
    print()


if __name__ == '__main__':
    main()

and this is what the error is


Traceback (most recent call last):
  File "/home/usharab/.local/lib/python3.8/site-packages/casparser/parser.py", line 163, in read_cas_pdf
    investor_info = parse_investor_info(layout, *page.mediabox[2:])
  File "/home/usharab/.local/lib/python3.8/site-packages/casparser/parser.py", line 55, in parse_investor_info
    raise CASParseError("Unable to parse investor data")
casparser.exceptions.CASParseError: Unable to parse investor data

The version of casparser I am using is '0.2.1'; before this I was using version '0.5.3', which gave the same error. Can anyone suggest what the issue could be?

I have also tried force_pdfminer, and that returned the same error.

Feature Request: MF category and sub-category

Hi Team

First of all, many thanks for the great package your team has created.

I am the author of a repo and use your package to parse the CAS PDF for my project.

I have a requirement to classify funds by type (debt/equity) and subtype (large cap, small cap, etc.).

Would it be possible to integrate this feature into your package?

Print NAV also in the table

Hi,

Currently table is printing something like below
https://raw.githubusercontent.com/codereverser/casparser/main/assets/demo.jpg

However, the PDF also gives the NAV as on a given date.

If the NAV is printed, we can easily multiply it by 'close calculated' to get the final value of the fund.
I know you recently added the fund value...

This is important as the only thing changing here daily is NAV and if the SIPs are still going out, then even close calculated changes but that change is less frequent.
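
The arithmetic is a single multiplication. A small sketch, using a NAV of 30.3779 and a closing balance of 7.350 units as sample figures and assuming the result is rounded half-up to two decimal places (the rounding mode is an assumption):

```python
from decimal import Decimal, ROUND_HALF_UP

nav = Decimal("30.3779")          # NAV per unit on the valuation date
close_calculated = Decimal("7.350")  # closing unit balance

# Fund value = NAV x units, rounded to paise.
value = (nav * close_calculated).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```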

Thoughts ?

Kfintech report parsing always results in error

Hi,

While this is parsing the CAMS report correctly, I have had no luck getting it to parse the Kfintech report.

I generated it from the URL you mention in your README.

I get the report and am able to open it fine.

When I run it,

➜ casparser (main) ✗ casparser karvy2.pdf
Enter PDF password:
Error parsing pdf file :: Error parsing CAS header

Thanks.

Issue in parsing

Hello,
I have been following the repository and working with it. In the current CAMS version, v3.4 live-10014, the annotation of redemption transactions has changed.
image

Parsed closing units not the same as calc_close

So, I have a unique problem. My CAS has two discrete entries for the same folio number, due to a switch from the regular plan to the direct plan. The switch in is listed before the switch out. What ends up happening is that the program parses the closing unit balance of 7.350 first, and the closing unit balance of 0.00 from the next entry overwrites it. So the parsed closing unit balance is 0.00 and the calc_close is 7.350. This ends up as an error in the CLI version, but raises no error in the normal call. I saw your TODO comment about adding this validation as well, so here's something you can test it against.

If you don't mind me suggesting options, you could maybe add them up instead of replacing, or include calc_close in the dict too.
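
The difference between replacing and adding up can be sketched as follows, assuming closing balances are keyed by folio number alone. That keying is an assumption made for illustration, not a statement about casparser's internals.

```python
from collections import defaultdict
from decimal import Decimal

# (folio, closing unit balance) pairs in the order they appear in the statement.
entries = [
    ("0000000 / 00", Decimal("7.350")),  # direct plan, listed first
    ("0000000 / 00", Decimal("0.000")),  # regular plan, listed second
]

overwritten = {}
summed = defaultdict(Decimal)  # Decimal() is Decimal("0")
for folio, close in entries:
    overwritten[folio] = close  # the later entry replaces the earlier one
    summed[folio] += close      # every entry contributes to the total
```

Here `overwritten` ends with 0.000 for the folio while `summed` ends with 7.350, matching the calc_close value reported above.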

Also, here is the parsed data of the pdf for your convenience:

'Folio No: 0000000 / 00\t\tPAN: XXXXX0000X\t\tKYC: OK PAN: OK', 'GD65-IDFC Low Duration Fund-Growth-(Direct Plan) (Advisor: INA000000000)\t\tRegistrar : CAMS', 'Opening Unit Balance: 0.000', '14-Aug-2019\t\tNORMAL SWITCH - From IDFC Low Duration Fund-Gr-(Reg Pln)-BSE -\t\t202.98\t\t7.350\t\t27.6156\t\t7.350', 'Closing Unit Balance: 7.350\t\tNAV on 22-Dec-2020: INR 30.3779\t\tValuation on 22-Dec-2020: INR 223.28', '"Entry Load: Nil - Exit Load : Nil W.E.F 29/June/2012 . Please refer the Offer Document / Addendum issued from time to time"', 'Folio No: 0000000 / 00\t\tPAN: XXXX0000X\t\tKYC: OK PAN: OK', 'G65-IDFC Low Duration Fund-Growth-(Regular Plan) (Advisor: ARN-000000)\t\tRegistrar : CAMS', 'Opening Unit Balance: 0.000', '18-Jun-2019\t\tPurchase\t\t200.00\t\t7.424\t\t26.9379\t\t7.424', '19-Jun-2019 ***Address Updated from KRA Data***', '19-Jun-2019 ***Registration of Nominee***', '14-Aug-2019\t\tSwitch Out - To IDFC Low Duration Fund-Gr-(Dir Pln)-BSE -\t\t(202.98)\t\t(7.424)\t\t27.3406\t\t0.000', '30-Sep-2020 ***Address Updated from KRA Data***', 'Closing Unit Balance: 0.000\t\tNAV on 22-Dec-2020: INR 29.9852\t\tValuation on 22-Dec-2020: INR 0.00', '"Entry Load: Nil - Exit Load : Nil W.E.F 29/June/2012 . Please refer the Offer Document / Addendum issued from time to time"

I have attached relevant screen shots of all, below:
CAS-SCAN
CLI-ERROR-SS
CLI-SS
RESULT-DICT

Parser not working, resulting in exclamation mark / error rather than check mark

I generated a consolidated report from CAS - CAMS + KFintech at https://www.camsonline.com/Investors/Statements/Consolidated-Account-Statement

Executing the casparser cli utility does not return successfully.

Is this expected ?

Please note the error and the exclamation marks in the image below.

casparser_snippet_error

Command executed,

$ casparser <filename>.pdf -p '<$password$>'

File Type details,

File Type : FileType.CAMS
CAS Type : CASFileType.DETAILED

Also, the ISIN database is up to date:

(.venv_py310) iceman@pop-os ~/D/M/Statements> casparser-isin --update
2023-08-31 00:26:23,325 - INFO - Fetching remote isin db metadata
2023-08-31 00:26:24,283 - INFO - Local db version  : 2023.8.18
2023-08-31 00:26:24,283 - INFO - Remote db version : 2023.8.18
2023-08-31 00:26:24,283 - INFO - casparser-isin database is already upto date

Use quotes for delimiters / use semicolons for separators when generating CSV

Some scheme descriptions have commas in them:

***IDCW @ Rs.2.95000000 per unit  (TDS :138.70, TDS Rate: 7.50%)***
Redemption less TDS, STT
Lateral Shift Out less TDS, STT
Redemption Less STT -BSE - - UTR # CITIN24422132375 , less STT

These cause a problem when reading the CSV file.

Possible solutions:

  1. Use double quotes to delimit the fields
  2. Use semicolons as the separator instead of commas.
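
A minimal sketch of the first option using Python's standard `csv` module with `quoting=csv.QUOTE_ALL`. The column names, date, and amount are made up for illustration; how casparser actually writes its CSV output is not shown here.

```python
import csv
import io

# Made-up rows; the description contains a comma, as in the examples above.
rows = [
    ["date", "description", "amount"],
    ["2021-01-01", "Redemption less TDS, STT", "-100.00"],
]

buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL)  # quote every field
writer.writerows(rows)

# Reading the text back yields the original fields, commas included.
buffer.seek(0)
round_tripped = list(csv.reader(buffer))
```

Note that `csv.writer` with its default `QUOTE_MINIMAL` setting already quotes any field that contains the delimiter, so either setting keeps such descriptions intact for a standards-aware reader.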

HeaderParseError: Error parsing CAS header


HeaderParseError Traceback (most recent call last)
in ()
----> 1 json_str = data = casparser.read_cas_pdf("33220217220210621ZFBF290265631DC70CPIMBCP130542292.pdf", "abcd1234")

2 frames
/usr/local/lib/python3.7/dist-packages/casparser/process.py in parse_header(text)
17 if m:
18 return m.groupdict()
---> 19 raise HeaderParseError("Error parsing CAS header")
20
21

HeaderParseError: Error parsing CAS header

Code:
json_str = data = casparser.read_cas_pdf("33220217220210621ZFBF290265631DC70CPIMBCP130542292.pdf", "xyz")

casparser.exceptions.CASParseError: Unable to parse investor data

While running casparser, I get the following error:

data = casparser.read_cas_pdf("CAMS_pranshu766.pdf", "pranshu766")
Deprecation: 'getTextPage' removed from class 'Page' after v1.19.0 - use 'get_textpage'.
Traceback (most recent call last):
File "", line 1, in
File "/home2/ajitup/anaconda3/lib/python3.8/site-packages/casparser/parsers/__init__.py", line 33, in read_cas_pdf
partial_cas_data = cas_pdf_to_text(filename, password)
File "/home2/ajitup/anaconda3/lib/python3.8/site-packages/casparser/parsers/mupdf.py", line 213, in cas_pdf_to_text
investor_info = parse_investor_info(page_dict)
File "/home2/ajitup/anaconda3/lib/python3.8/site-packages/casparser/parsers/mupdf.py", line 145, in parse_investor_info
raise CASParseError("Unable to parse investor data")
casparser.exceptions.CASParseError: Unable to parse investor data

Please help!
