darija-open-dataset / dataset
darija <-> english dataset
License: Other
Translating and fixing typos in 12.csv.
Translating and fixing typos in 16.csv.
Translating and fixing typos in 08.csv.
Translating and fixing typos in 11.csv.
Translating and fixing typos in 14.csv.
Translating and fixing typos in 02.csv.
Translating and fixing typos in 07.csv.
Translating and fixing typos in 25.csv.
Translating and fixing typos in 32.csv.
Translating and fixing typos in 40.csv.
Translating and fixing typos in 39.csv.
Translating and fixing typos in 22.csv.
Translating and fixing typos in 36.csv.
Translating and fixing typos in 38.csv.
Translating and fixing typos in 05.csv.
Translating and fixing typos in 27.csv.
Translating and fixing typos in 04.csv.
Fixing typos in the translated.csv file from the beginning of the file to line 6374.
Translating and fixing typos in 18.csv.
Fixing typos in the translated.csv file from line 6375 to the end of the file.
Translating and fixing typos in 29.csv.
Translating and fixing typos in 33.csv.
Translating and fixing typos in 43.csv.
Translating and fixing typos in 13.csv.
Translating and fixing typos in 15.csv.
Translating and fixing typos in 26.csv.
Thanks so much for this initiative.
I will try to contribute during my spare time.
Translating and fixing typos in 24.csv.
Translating and fixing typos in 19.csv.
Translating and fixing typos in 10.csv.
I've been looking through sentence.csv and found that there are typos in some sentences, such as using "8" for "h" or "7". There are also sentences that can have several variant spellings, for example "kaychrab kass dyal lma" written as "kaychrb kass dlma".
In addition, "x" is sometimes used instead of "ch".
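One way to find such sentences for manual review is sketched below. It only flags entries that contain "8" or "x" and does not rewrite them, since the issue does not state which spelling the project prefers. The function name and the second and third sample strings are made up for illustration.

```python
import re

# Assumption: "8" and "x" are the characters worth flagging, as described above.
SUSPICIOUS = re.compile(r"[8x]", re.IGNORECASE)


def flag_suspicious(sentences):
    """Return (index, sentence) pairs for sentences containing "8" or "x"."""
    return [(i, s) for i, s in enumerate(sentences) if SUSPICIOUS.search(s)]


# Illustrative strings only; the last two are not taken from the dataset.
samples = ["kaychrab kass dyal lma", "kayxrb kass dlma", "8ada mzyan"]
for index, sentence in flag_suspicious(samples):
    print(index, sentence)
```

A reviewer could then decide case by case whether the flagged spelling should change.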
Translating and fixing typos in 17.csv.
Translating and fixing typos in 37.csv.
Translating and fixing typos in 21.csv.
I want to start contributing to DODA and have been exploring the repo. I am wondering how you identify whether a sentence or a word has already been added to the dataset.
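The thread does not say how the maintainers check for existing entries. As a rough sketch of one possible approach, a contributor could compare a candidate against the existing entries after light normalization. The helper names here are hypothetical, and the comparison only catches differences in case and spacing, not spelling variants.

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def is_already_present(candidate, existing_entries):
    """Return True if the candidate matches an existing entry after normalization."""
    known = {normalize(entry) for entry in existing_entries}
    return normalize(candidate) in known


existing = ["kaychrab kass dyal lma"]
print(is_already_present("  Kaychrab  kass dyal lma ", existing))
print(is_already_present("kaychrb kass dlma", existing))
```

Spelling variants such as the second example would still need a manual check or a fuzzier comparison.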
Translating and fixing typos in 30.csv.
Translating and fixing typos in 23.csv.
Translating and fixing typos in 01.csv.
ParserError Traceback (most recent call last)
in
1 import pandas as pd
2
----> 3 print(pd.read_csv("dataset/sentences2.csv"))
~\anaconda3\lib\site-packages\pandas\io\parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)
674 )
675
--> 676 return _read(filepath_or_buffer, kwds)
677
678 parser_f.__name__ = name
~\anaconda3\lib\site-packages\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
452
453 try:
--> 454 data = parser.read(nrows)
455 finally:
456 parser.close()
~\anaconda3\lib\site-packages\pandas\io\parsers.py in read(self, nrows)
1131 def read(self, nrows=None):
1132 nrows = _validate_integer("nrows", nrows)
-> 1133 ret = self._engine.read(nrows)
1134
1135 # May alter columns / col_dict
~\anaconda3\lib\site-packages\pandas\io\parsers.py in read(self, nrows)
2035 def read(self, nrows=None):
2036 try:
-> 2037 data = self._reader.read(nrows)
2038 except StopIteration:
2039 if self._first_chunk:
pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader.read()
pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()
pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._read_rows()
pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._tokenize_rows()
pandas\_libs\parsers.pyx in pandas._libs.parsers.raise_parser_error()
And when I use error_bad_lines=False, this is what I get.
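The error message itself is not shown above, so the cause is not certain. A ParserError raised while tokenizing often comes from a row with a different number of fields than the header, and assuming that is the case here, the sketch below uses Python's standard csv module to list such rows so they can be fixed in the file rather than skipped. The function name and the sample column names are hypothetical.

```python
import csv
import io


def find_bad_rows(handle, expected_fields=None):
    """Return (line_number, row) pairs whose field count differs from the header.

    If expected_fields is None, the first row is treated as the header
    and defines the expected field count.
    """
    reader = csv.reader(handle)
    bad = []
    for row in reader:
        if expected_fields is None:
            expected_fields = len(row)
            continue
        if len(row) != expected_fields:
            bad.append((reader.line_num, row))
    return bad


# Small in-memory sample; the third line has an unquoted extra comma.
sample = io.StringIO("darija,eng\nsalam,hello\nlabas, 3lik,how are you\n")
print(find_bad_rows(sample))
```

For the real file, the same function could be called on a handle opened with open("dataset/sentences2.csv", newline="", encoding="utf-8"), assuming the file is UTF-8 encoded.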
Translating and fixing typos in 31.csv.
Translating and fixing typos in 03.csv.
Translating and fixing typos in 20.csv.
Translating and fixing typos in 41.csv.
Translating and fixing typos in 34.csv.
Translating and fixing typos in 28.csv.
Translating and fixing typos in 42.csv.
Translating and fixing typos in 09.csv.
Translating and fixing typos in 06.csv.
Translating and fixing typos in 35.csv.
Hello!
I enjoy seeing community data projects like this. As a former student of Cantonese (a dialect of Chinese spoken in Hong Kong), I know what it's like when the language you're interested in has no dictionary. At one point, I tried to create one myself, but gave up after 500 words. It's too much effort to do it alone.
The issue today is that working together on these things is very technical. I'm sure your contributors here are familiar with GitHub and have no problem helping out, but it would be great if anyone with just a knowledge of English and Darija could help out, without technical skills.
For this reason I am building DataStack. It's a collaboration platform for table data that works similarly to GitHub but is much easier to use for data. No tools or technical knowledge are needed.
To show you what it looks like, I've uploaded two files from your dataset there:
If you're interested, please have a look. Does this solution suit your needs? What is still missing in your opinion?