I am trying to convert a JSON file stored on HDFS to CSV. Below are the two main classes I have tried for this:
1-
public class ClassMain {
    public static void main(String[] args) throws IOException {
        String uri = args[1];
        String uri1 = args[2];
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs:///ip-10-16-37-124:9000/");
        String str = new String(Files.readAllBytes(Paths.get(uri)));
        //FileSystem fs = FileSystem.get(URI.create("hdfs:///"+uri), conf);
        FileSystem fs = FileSystem.get(conf);
        FSDataInputStream in = null;
        FSDataOutputStream out = fs.create(new Path(uri1));
        try {
            in = fs.open(new Path(uri));
            JsonToCSV toCSV = new JsonToCSV(str);
            toCSV.json2Sheet().write2csv(uri1);
            IOUtils.copyBytes(in, out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
            IOUtils.closeStream(out);
        }
    }
}
I am running the jar as:
hadoop jar json-csv-hdfs.jar com.nishant.ClassMain /nishant/small.json /nishant/small.csv
But somehow it resolves the URI of my input file as hdfs:/ip-10-16-37-124:9000/nishant/small.json, which is incorrect. I have tried every combination of the URI I could think of, but the resulting URI does not change. I thought Files.readAllBytes might be the problem when reading the JSON file, so I tried a buffered read of the input file instead. Below is the second main class I wrote for this.
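One thing I noticed while debugging (I may be wrong about its relevance): Files.readAllBytes(Paths.get(uri)) resolves the path through the JVM's default NIO filesystem provider, i.e. the local disk, not HDFS. A quick JDK-only check, reusing my path purely for illustration:

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PathCheck {
    public static void main(String[] args) {
        // Paths.get uses the JVM's default filesystem provider (scheme "file"),
        // so this Path points at the local disk, not at anything inside HDFS.
        Path p = Paths.get("/nishant/small.json");
        System.out.println(p.getFileSystem() == FileSystems.getDefault()); // true
        System.out.println(p.getFileSystem().provider().getScheme());      // file
    }
}
```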
2-
public class ClassMain {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path inFile = new Path(args[1]);
        Path outFile = new Path(args[2]);
        if (!fs.exists(inFile))
            System.out.println("Input file not found");
        if (!fs.isFile(inFile))
            System.out.println("Input should be a file");
        if (fs.exists(outFile))
            System.out.println("Output already exists");
        FSDataInputStream in = fs.open(inFile);
        FSDataOutputStream out = fs.create(outFile);
        byte[] buffer = new byte[12000];
        try {
            int bytesRead = 0;
            while ((bytesRead = in.read(buffer)) > 0) {
                String str = new String(buffer);
                JsonToCSV toCSV = new JsonToCSV(str);
                toCSV.json2Sheet().write2csv(outFile.toString());
                out.write(buffer, 0, bytesRead);
            }
        } catch (IOException e) {
            System.out.println("Error while copying file");
        } finally {
            in.close();
            out.close();
        }
    }
}
Here, it reads the input file fine. I printed the value of this.jsonString from JsonFlat.java and it is valid JSON. But the method call
ele = new JsonParser().parse(this.jsonString);
fails and gives the stack trace below:
Exception in thread "main" com.google.gson.JsonSyntaxException: com.google.gson.stream.MalformedJsonException: Use JsonReader.setLenient(true) to accept malformed JSON at line 4 column 2
at com.google.gson.JsonParser.parse(JsonParser.java:65)
at com.google.gson.JsonParser.parse(JsonParser.java:45)
at com.nishant.JsonToCSV.json2Sheet(JsonToCSV.java:105)
at com.nishant.ClassMain.main(ClassMain.java:45)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: com.google.gson.stream.MalformedJsonException: Use JsonReader.setLenient(true) to accept malformed JSON at line 4 column 2
at com.google.gson.stream.JsonReader.syntaxError(JsonReader.java:1505)
at com.google.gson.stream.JsonReader.checkLenient(JsonReader.java:1386)
at com.google.gson.stream.JsonReader.doPeek(JsonReader.java:531)
at com.google.gson.stream.JsonReader.peek(JsonReader.java:414)
at com.google.gson.JsonParser.parse(JsonParser.java:60)
... 9 more
The parse method receives this.jsonString as its parameter, which is valid JSON as far as I can tell from printing it, so why does it throw a malformed-JSON exception?
Is it because of the Reader parameter in public JsonElement parse(Reader json) throws JsonIOException, JsonSyntaxException?
How can I run this code against HDFS so that it works?
Value of this.jsonString, which is valid JSON:
[
{"uploadTimeStamp":"1488793033624","PDID":"123","data":[{"Data":{"unit":"rpm","value":"100"},"EventID":"E1","PDID":"123","Timestamp":1488793033624,"Timezone":330,"Version":"1.0","pii":{}},{"Data":{"heading":"N","loc1":"false","loc2":"00.001","loc3":"00.004","loc4":"false","speed":"10"},"EventID":"E2","PDID":"123","Timestamp":1488793033624,"Timezone":330,"Version":"1.1","pii":{}},{"Data":{"xvalue":"1.1","yvalue":"1.2","zvalue":"2.2"},"EventID":"E3","PDID":"123","Timestamp":1488793033624,"Timezone":330,"Version":"1.0","pii":{}},{"EventID":"E4","Data":{"value":"50","unit":"percentage"},"Version":"1.0","Timestamp":1488793033624,"PDID":"123","Timezone":330},{"Data":{"unit":"kmph","value":"70"},"EventID":"E5","PDID":"123","Timestamp":1488793033624,"Timezone":330,"Version":"1.0","pii":{}}]},
{"uploadTimeStamp":"1488793167598","PDID":"124","data":[{"Data":{"unit":"rpm","value":"100"},"EventID":"E1","PDID":"124","Timestamp":1488793167598,"Timezone":330,"Version":"1.0","pii":{}},{"Data":{"heading":"N","loc1":"false","loc2":"00.001","loc3":"00.004","loc4":"false","speed":"10"},"EventID":"E2","PDID":"124","Timestamp":1488793167598,"Timezone":330,"Version":"1.1","pii":{}},{"Data":{"xvalue":"1.1","yvalue":"1.2","zvalue":"2.2"},"EventID":"E3","PDID":"124","Timestamp":1488793167598,"Timezone":330,"Version":"1.0","pii":{}},{"EventID":"E4","Data":{"value":"50","unit":"percentage"},"Version":"1.0","Timestamp":1488793167598,"PDID":"124","Timezone":330},{"Data":{"unit":"kmph","value":"70"},"EventID":"E5","PDID":"124","Timestamp":1488793167598,"Timezone":330,"Version":"1.0","pii":{}}]}]
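While writing this up, I also started to wonder about the 12000-byte chunking in the second class: each in.read(buffer) can stop in the middle of the JSON, so an individual chunk need not be valid JSON on its own, and new String(buffer) converts the entire buffer rather than just the bytesRead bytes actually filled. A small JDK-only sketch (made-up tiny payload and buffer, just to show the idea) of accumulating all chunks before converting to a String:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ChunkDemo {
    public static void main(String[] args) throws IOException {
        byte[] json = "[{\"PDID\":\"123\"}]".getBytes(StandardCharsets.UTF_8);
        InputStream in = new ByteArrayInputStream(json);

        byte[] buffer = new byte[8]; // deliberately small to force several reads
        ByteArrayOutputStream all = new ByteArrayOutputStream();
        int bytesRead;
        while ((bytesRead = in.read(buffer)) > 0) {
            // A single chunk like [{"PDID" is not parseable JSON by itself;
            // only the concatenation of all chunks is, so collect first.
            all.write(buffer, 0, bytesRead);
        }
        String whole = all.toString("UTF-8");
        System.out.println(whole); // [{"PDID":"123"}]
    }
}
```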