Pseudocode Data and Conversion
Note that this project is still in development. Please do not share any of the data or code.
PseudocodeExtractor is a Java project that extracts and reformats pseudocode from PDFs downloaded from the arXiv data stored in an Amazon S3 bucket. Incoder_pseudocode_convertion.ipynb converts the extracted pseudocode to Python functions using the pretrained InCoder 6B model. A sample pseudocode input file is provided in the data folder so that the pseudocode-to-code conversion in Incoder_pseudocode_convertion.ipynb can be tested quickly.
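For orientation, the conversion step could look roughly like the sketch below. It is not the notebook's code. The helper names `build_prompt` and `generate_python`, the prompt layout (pseudocode as comments followed by an open `def`), and the generation settings are all assumptions made for illustration. The checkpoint id `facebook/incoder-6B` is the public Hugging Face name for InCoder 6B; whether the notebook loads it this way is also an assumption.

```python
import textwrap


def build_prompt(pseudocode: str, function_name: str = "algorithm") -> str:
    """Turn pseudocode into a prompt for a left-to-right code model.

    Hypothetical prompt layout: each pseudocode line becomes a Python
    comment, followed by an unfinished `def` for the model to complete.
    """
    lines = textwrap.dedent(pseudocode).strip().splitlines()
    comment = "\n".join(f"# {line}".rstrip() for line in lines)
    return f"{comment}\ndef {function_name}("


def generate_python(prompt: str,
                    model_name: str = "facebook/incoder-6B",
                    max_new_tokens: int = 128) -> str:
    """Complete the prompt with a pretrained causal language model.

    Assumption: the model is loaded through Hugging Face transformers.
    The import is local so build_prompt works without transformers installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    output = model.generate(input_ids=input_ids, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)


if __name__ == "__main__":
    sample = """
    for i in 1..n:
      sum += i
    """
    # Only the prompt is printed here; calling generate_python downloads
    # the full checkpoint, which is several gigabytes.
    print(build_prompt(sample, "sum_to_n"))
```

Building the prompt separately from the model call makes it possible to inspect what is sent to the model before loading the checkpoint.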
The extracted list of 90,000 pseudocode snippets, along with their reformatted versions, can be found at the link below. The linked folder also contains a reference text file that lists meta-information for all downloaded PDF files and indicates which of them contain pseudocode.
Confidential Data link: https://drive.google.com/drive/folders/1_es1_bmp4yU2z03GhQJfCBLlaiFbLSaQ?usp=share_link