openrefine / openrefine

OpenRefine is a free, open source power tool for working with messy data and improving it

Home Page: https://openrefine.org/

License: BSD 3-Clause "New" or "Revised" License

Java 67.31% JavaScript 21.81% HTML 8.41% CSS 1.91% Shell 0.31% Batchfile 0.12% mIRC Script 0.01% Python 0.06% Inno Setup 0.06%
datacleansing data-analysis java opendata wikidata journalism data-science datajournalism datacleaning datamining

openrefine's Introduction

OpenRefine

OpenRefine is a Java-based power tool that allows you to load data, understand it, clean it up, reconcile it, and augment it with data coming from the web. All from a web browser and the comfort and privacy of your own computer.

Official website: https://openrefine.org

Community forum: https://forum.openrefine.org

Download

Snapshot releases

You can download snapshots of the development version of OpenRefine. To do so, you need to be logged in to GitHub. Then, click on the first item with a green tick / check mark on this page and scroll down to the Artifacts section to find the version that matches your operating system.

Run from source

If you have cloned this repository to your computer, you can run OpenRefine with:

  • ./refine on Mac OS and Linux
  • refine.bat on Windows

This requires JDK 11 or newer, Apache Maven and NPM 16 or newer.

Licensing and legal issues

OpenRefine is open source software and is licensed under the BSD license located in the LICENSE.txt. See the folder licenses for information on open source libraries that OpenRefine depends on.

Credits

This software was created by Metaweb Technologies, Inc. and originally written and conceived by David Huynh. Metaweb Technologies, Inc. was acquired by Google, Inc. in July 2010 and the product was renamed Google Refine. In October 2012, it was renamed OpenRefine as it transitioned to a community-driven project.

Since 2020, OpenRefine has been fiscally sponsored by Code for Science and Society (CS&S).

See CONTRIBUTING.md for instructions on how to contribute yourself.

openrefine's People

Contributors

abbe98, afkbrb, allanaaa, antoine2711, blakko, comradekingu, darecoder, dependabot-preview[bot], dependabot[bot], dfhuynh, elebitzero, elroykanye, fgiroud, iainsproat, isaomatsunami, jackyq2015, joanneong, kushthedude, lisa761, magdmartin, msaby, ostephens, sannitassj, santossi, stefanom, tfmorris, thadguidry, trnstlntk, weblate, wetneb

openrefine's Issues

Make Java 6 dependency explicit

Original author: tfmorris (May 13, 2010 19:50:54)

The source code uses things which are only available in Java 6 and later.
This dependency should be made explicit in the Eclipse setup and the
developer docs. The attached patch will take care of the Eclipse piece of
it. It fixes the case where Eclipse has multiple JVMs available, but Java 6
isn't the default (a lot of projects still build for Java 5 for
compatibility, so that's my workspace default).

Someone with privileges for the wiki should mention Java 6 there too.

Original issue: http://code.google.com/p/google-refine/issues/detail?id=25

Don't coopt NotImplemented exceptions, particularly Sun internal ones

Original author: tfmorris (May 13, 2010 19:55:35)

There are several places in the code where an Apache or Sun NotImplemented
exception is thrown as some type of generic error. If these things are never
going to be implemented, java.lang.UnsupportedOperationException would be a
better choice. Using Sun internal classes is particularly problematic because a)
they don't exist in all JVMs and b) they can go away at any time. If some of
the uses are stubs which will get implemented later, defining a Gridworks
specific NotYetImplemented will allow you to easily identify this case.

The attached patch changes them all to java.lang.UnsupportedOperationException.

Original issue: http://code.google.com/p/google-refine/issues/detail?id=26

Limiting Freebase load to starred records

Original author: [email protected] (May 10, 2010 21:34:31)

On behalf of Raymond Yee:

I'm trying to debug my triple-loading process by starring only one record, performing "facet on star",
and selecting true on that facet to display only my selected column. When I went to load into
Freebase, GW created a job that consisted of triples from all the records, not just the single record
that I starred. Is this expected behavior or a bug? (It was not how I expected GW to behave.)

(As a work around, I will export the filtered row into Excel and try out a Freebase load on the
spreadsheet.)

Original issue: http://code.google.com/p/google-refine/issues/detail?id=14

Facet by date

Original author: EmilStenstrom (May 16, 2010 17:16:54)

I have a large dataset of posts with a published date field, and I would
love to be able to facet on that field, and get an overview on the data
that way.

(This can be done today by splitting the date with datePart and a numeric
facet, but I'm looking for something simpler)
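
The workaround described above (extract one part of the date, then facet on it) can be sketched outside Gridworks as a simple count per date part. This is only an illustration in plain Python, not Gridworks code; the sample dates and the `facet_by` helper are hypothetical.

```python
from collections import Counter
from datetime import date

# Hypothetical sample data; the issue does not include any.
published = [
    date(2009, 12, 31),
    date(2010, 1, 15),
    date(2010, 5, 16),
    date(2010, 5, 17),
]

def facet_by(dates, part):
    """Count dates by one part ('year', 'month' or 'day'),
    similar in spirit to a numeric facet on that part."""
    return Counter(getattr(d, part) for d in dates)

by_year = facet_by(published, "year")
print(sorted(by_year.items()))
```

A dedicated date facet would presumably do this grouping directly on the date values, without the intermediate extraction step.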

Original issue: http://code.google.com/p/google-refine/issues/detail?id=40

[Wishlist] match() function to turn results of regexp groups into an array

Original author: [email protected] (May 14, 2010 15:08:33)

It would be nice if I could do the following:

// value = July 2010 Something Something Something AVZCX

value.match(/([a-zA-Z]+) ([0-9]{4}) (.*) ([A-Z]+)/)

and get an array

['July','2010','Something Something Something','AVZCX']

Obviously this is a trivial example, but I think it will have its uses.
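
For comparison, the requested behaviour resembles what Python's `re` module offers through `Match.groups()`. The sketch below uses the value and pattern from this issue; the `match_groups` helper is a hypothetical name, and nothing here reflects how Gridworks would implement `match()`.

```python
import re

# Sample value and pattern taken from the issue text.
value = "July 2010 Something Something Something AVZCX"
pattern = re.compile(r"([a-zA-Z]+) ([0-9]{4}) (.*) ([A-Z]+)")

def match_groups(text, compiled):
    """Return the capture groups as a list, or None if the pattern does not match."""
    m = compiled.match(text)
    return list(m.groups()) if m else None

print(match_groups(value, pattern))
```

Returning `None` on a failed match is one possible design; an expression language might prefer an empty array or an error value instead.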

Original issue: http://code.google.com/p/google-refine/issues/detail?id=37

Column name collision when adding data from Freebase

Original author: [email protected] (May 10, 2010 21:37:09)

On behalf of Raymond Yee:

This morning, when doing an "Add columns from Freebase" on a column containing a county and
asking for the containing /location/us_territory, the operation was not able to complete. I think the
problem was that I already had another column by the name of Contained by that had been created
by a previous "Add columns from Freebase" operation that had asked for the containing
/location/us_state. I've done multiple Freebase column requests before, but I had always renamed the
column in between operations. This time, might I have caused a conflicting column name problem,
causing my Freebase query to stop prematurely?

I'll work more on reproducing the error rigorously if there is a need...but does this situation ring a
bell as one that might cause a problem?

Original issue: http://code.google.com/p/google-refine/issues/detail?id=16

Auto-saving during an import of large files can result in java heap errors

Original author: [email protected] (May 09, 2010 04:42:40)

C:\Users\tguidry\Documents\Downloads\gridworks-1.0a4-r93878>gridworks \m 1024m
13:57:27.463 [ gridworks_server] Initializing context: '/' from 'C:\User
s\tguidry\Documents\Downloads\gridworks-1.0a4-r93878\webapp' (0ms)
13:57:27.885 [ project_manager] Using workspace directory: C:\Users\tgu
idry\AppData\Local\Gridworks (422ms)
13:57:27.887 [ project_manager] Loading workspace: C:\Users\tguidry\App
Data\Local\Gridworks\workspace.json (2ms)
13:58:59.236 [ create-project_command] Importing 'dw_sales_invoice query.xml'
(91349ms)
14:02:28.431 [ project_manager] Saved workspace (209195ms)
14:12:20.078 [ project_manager] Saved workspace (591647ms)
14:12:30.689 [ project_manager] Saved workspace (10611ms)
14:12:41.147 [ org.mortbay.log] Error for /command/create-project-from-
upload (10458ms)
java.lang.OutOfMemoryError: Java heap space
14:17:28.035 [ project_manager] Saved workspace (286888ms)

Original issue: http://code.google.com/p/google-refine/issues/detail?id=3

Localized Windows cause save problems for Gridworks

Original author: [email protected] (May 09, 2010 09:08:49)

I have a Russian Windows 7 installation and all usernames are in Russian.

When I use Gridworks and run any action that should save something, I see the
following error on the console and nothing is saved.

java.io.FileNotFoundException: C:\Users????
\AppData\Local\Gridworks\2206489941407.project\history\1273396606971.change
.zip (Системе не удается найти указанный путь)
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.<init>(FileOutputStream.java:179)
at java.io.FileOutputStream.<init>(FileOutputStream.java:131)
at
com.metaweb.gridworks.history.HistoryEntry.saveChange(HistoryEntry.java:177
)
at
com.metaweb.gridworks.history.HistoryEntry.saveChange(HistoryEntry.java:172
)
at
com.metaweb.gridworks.history.HistoryEntry.apply(HistoryEntry.java:90)
at com.metaweb.gridworks.history.History.addEntry(History.java:89)
at
com.metaweb.gridworks.process.QuickHistoryEntryProcess.performImmediate(Qui
ckHistoryEntryProcess.java:38)
at
com.metaweb.gridworks.process.ProcessManager.queueProcess(ProcessManager.ja
va:37)
at
com.metaweb.gridworks.commands.Command.performProcessAndRespond(Command.jav
a:130)
at
com.metaweb.gridworks.commands.EngineDependentCommand.doPost(EngineDependen
tCommand.java:48)
at
com.metaweb.gridworks.GridworksServlet.doPost(GridworksServlet.java:223)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at
org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:390)
at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
at
org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
at
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at
org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java
:938)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:755)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:2
28)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.j
ava:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:
908)
at java.lang.Thread.run(Thread.java:619)

Original issue: http://code.google.com/p/google-refine/issues/detail?id=5

[Wishlist] Fix the table header so that it's always visible when scrolling a long page

Original author: [email protected] (May 14, 2010 16:03:17)

I like to work with large page sizes (eg 50, though more would be nice too) and it's a little awkward
having to scroll to the top in order to perform operations on cells beyond a simple edit.

So I propose fixing the header to the top of the scrolling viewport and only scrolling the content
cells. This can be achieved relatively easily using <thead> for the heading cells and overflow-y:
scroll on <tbody> (if I recall correctly, I haven't looked into this in great detail)

Original issue: http://code.google.com/p/google-refine/issues/detail?id=38

gridworks_server should acknowledge arguments passed in.

Original author: thadguidry (May 13, 2010 00:25:43)

Gridworks server doesn't acknowledge the args or options passed in.
I suggest, at a minimum, printing them to the console and/or adding a logging
option that writes this output to a file.

Would expect and like to see:

[ gridworks_server] Initializing context: '/' from 'C:...'
[ gridworks_server] Using 4096m, Host 127.0.0.1:3333
[ project_manager] Using workspace directory: C:\Users...

Original issue: http://code.google.com/p/google-refine/issues/detail?id=23

SeparatorRowParser handles blanks differently from TsvCsvRowParser

Original author: iainsproat (May 17, 2010 06:57:24)

Using r793 from SVN.
Import a CSV file with a blank data point e.g. "value1","value2",,"value4"

SeparatorRowParser adds the empty cell to the model. (row.cells.getSize() ==
4)
TsvCsvRowParser omits the empty cell from the model. (row.cells.getSize() ==
3)

What's the expected behaviour? Should it be added or omitted?
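
As a point of comparison only, Python's standard `csv` module keeps the empty cell for the sample line in this report, so the row has four cells. This says nothing about which behaviour Gridworks intends; it just shows one widely used convention.

```python
import csv
import io

# The sample line from the report above.
line = '"value1","value2",,"value4"\n'

rows = list(csv.reader(io.StringIO(line, newline="")))
cells = rows[0]

# The blank data point is kept as an empty string.
print(len(cells), cells)
```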

Original issue: http://code.google.com/p/google-refine/issues/detail?id=43

Undo History bug

Original author: [email protected] (May 09, 2010 04:42:03)

On behalf of Thad Guidry:

In testing Undo history with the attached TAR project file, I came across an error.

Not sure if it is directly related to STARRED ROWS or not. My
\history\1270136848043.change.zip file was there in Windows folder and still readable by
Windows just fine, so not sure what problem Gridworks had with it in particular. I've attached
both the project TAR file, and zipped the Windows folder itself for your review.

ERROR OUTPUT :

12:45:21.459 [ com.metaweb.gridworks] Using workspace directory: C:\Users\tgu
idry\AppData\Local\Gridworks (0ms)
12:45:21.461 [ com.metaweb.gridworks] Loading workspace from C:\Users\tguidry
\AppData\Local\Gridworks\workspace.json (2ms)
java.lang.RuntimeException: Failed to load change file C:\Users\tguidry\AppData\
Local\Gridworks\1809478922429.project\history\1270136848043.change.zip
at com.metaweb.gridworks.history.HistoryEntry.loadChange(HistoryEntry.ja
va:144)
at com.metaweb.gridworks.history.HistoryEntry.revert(HistoryEntry.java:9
9)
at com.metaweb.gridworks.history.History.undo(History.java:163)
at com.metaweb.gridworks.history.History.undoRedo(History.java:99)
at com.metaweb.gridworks.history.HistoryProcess.performImmediate(History
Process.java:44)
at com.metaweb.gridworks.process.ProcessManager.queueProcess(ProcessMana
ger.java:53)
at com.metaweb.gridworks.commands.edit.UndoRedoCommand.doPost(UndoRedoCo
mmand.java:34)
at com.metaweb.gridworks.GridworksServlet.doPost(GridworksServlet.java:1
58)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511
)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:3
90)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.jav
a:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:1
82)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:7
65)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:1
52)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:54
2)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnectio
n.java:938)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:755)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.
java:228)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source
)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at com.metaweb.gridworks.history.History.readOneChange(History.java:36)
at com.metaweb.gridworks.history.HistoryEntry.loadChange(HistoryEntry.ja
va:157)
at com.metaweb.gridworks.history.HistoryEntry.loadChange(HistoryEntry.ja
va:142)
... 26 more
Caused by: java.lang.ClassNotFoundException: newStarred=true
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Unknown Source)
at com.metaweb.gridworks.history.History.getChangeClass(History.java:51)
at com.metaweb.gridworks.history.History.readOneChange(History.java:32)
at com.metaweb.gridworks.model.changes.MassChange.load(MassChange.java:7
... 33 more

Original issue: http://code.google.com/p/google-refine/issues/detail?id=2

MySQL Support Would be Amazing

Original author: mjlissner (May 10, 2010 17:39:53)

What steps will reproduce the problem?

  1. Try to connect to a MySQL database

As a DB administrator, all of my data is locked up in a MySQL database. I
COULD export it to CSV, then import it into Gridworks, then do some
cleanup, then export it to CSV, then import it to MySQL again, overwriting
the old data, but that seems very complicated.

Since this works in the browser anyway, it would be amazing if it could be
connected to MySQL databases, and if data manipulation could happen from there.

This would unlock a TON of new data, though I'm unsure how the program
would scale to the quantities of information that would be pulled in.

Original issue: http://code.google.com/p/google-refine/issues/detail?id=12

[Wishlist] Seamless conversion of arrays into multiple columns

Original author: [email protected] (May 14, 2010 14:53:52)

Unless I'm very much mistaken, there's no direct way to turn an array value into multiple columns.
Using join() requires finding a suitable separator character to split on later which requires knowledge
of the full content of the column.

Perhaps returning an array from a transform should create new columns representing the array
values? I'm not fussy about the specifics, but at the moment arrays seem to be 2nd class data types.

Original issue: http://code.google.com/p/google-refine/issues/detail?id=36

Envelope quotation marks are removed by CSV importer

Original author: iainsproat (May 16, 2010 17:47:01)

Import a CSV or TSV file with a data value of: ""value1""
On import the value becomes "value1"

For most cases this won't be a problem, except where it's a
description which starts or ends with a quotation mark e.g.: ""To be
or not to be" is a quote from Hamlet."

This gets garbled into "To be
or not to be" is a quote from Hamlet."
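
For reference, the usual CSV convention is to wrap a field containing quotation marks in quotes and double each embedded quote. The sketch below uses Python's `csv` module to show that convention round-tripping the Hamlet example; it is not Gridworks code, and the second field `value2` is just a placeholder.

```python
import csv
import io

original = '"To be or not to be" is a quote from Hamlet.'

# Write the value: the writer wraps it in quotes and doubles the embedded ones.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerow([original, "value2"])
encoded = buffer.getvalue()

# Read it back: the leading and embedded quotation marks survive.
decoded = next(csv.reader(io.StringIO(encoded, newline="")))

print(encoded)
print(decoded[0] == original)
```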

Original issue: http://code.google.com/p/google-refine/issues/detail?id=41

gridworks ui_test fails with paths with spaces

Original author: iainsproat (May 10, 2010 12:31:54)

When the source directory path has spaces in it, running ./gridworks ui_test fails at lines 319 and 503.

The attached patch should fix this (at least in Windows - not checked other
OS's) by adding quotation marks around the path variables. (There's probably
more of these unquoted paths in this file, but this fixes the immediate bug).

Original issue: http://code.google.com/p/google-refine/issues/detail?id=9

OAuth fails on sign in

Original author: iainsproat (May 10, 2010 16:14:55)

Using Windows XP, Java Runtime 1.6.0 rev 20, Gridworks r667

Steps to reproduce:

  1. Using a project with columns mapped to Freebase schema.
  2. Clicked through to Load into Freebase.
  3. Clicked on 'Sign into Freebase'.

The below error is produced (gridworks logging set to verbose), and json
dumped as a file in the browser.

19:11:19.031 [ servlet] > GET check-authorization (16ms)
19:11:19.078 [ servlet] < GET check-authorization (47ms)
19:11:30.781 [ servlet] > GET authorize (11703ms)
19:11:32.859 [ org.apache.http.wire] >> "POST
/api/oauth/request_token HTTP/1.1[EOL]" (2078ms)
19:11:32.859 [ org.apache.http.wire] >> "Authorization: OAuth
realm="freebase.com",
oauth_callback="http%3A%2F%2F127.0.0.1%3A3333%2Fcommand%2Fauthorize%2Fwww.f
reebase.com", oauth_consumer_key="%239202a8c04000641f80000000keykey",
oauth_version="1.0", oauth_signature_method="HMAC-SHA1",
oauth_timestamp="1273504291", oauth_nonce="-6157124724385079801",
oauth_signature="S872Vk0YA7IaMnz31IA88signature"[EOL]" (0ms)
19:11:32.859 [ org.apache.http.wire] >> "Content-Length: 0[EOL]"(0ms)
19:11:32.859 [ org.apache.http.wire] >> "Host:
www.freebase.com[EOL]"(0ms)
19:11:32.859 [ org.apache.http.wire] >> "Connection: Keep-
Alive[EOL]"(0ms)
19:11:32.859 [ org.apache.http.wire] >> "User-Agent: Apache-
HttpClient/4.0.1 (java 1.5)[EOL]" (0ms)
19:11:32.859 [ org.apache.http.wire] >> "[EOL]" (0ms)
19:11:33.171 [ org.apache.http.wire] << "HTTP/1.0 400 Bad
Request[EOL]" (312ms)
19:11:33.187 [ org.apache.http.wire] << "Date: Mon, 10 May 2010
15:17:25 GMT[EOL]" (16ms)
19:11:33.187 [ org.apache.http.wire] << "Server: Apache[EOL]" (0ms)
19:11:33.187 [ org.apache.http.wire] << "WWW-Authenticate: OAuth
realm="http://freebase.com/"[EOL]" (0ms)
19:11:33.187 [ org.apache.http.wire] << "X-Metaweb-Cost: cc=0.004,
dt=0.009, mcs=0.0, mcu=0.0, minflt=4, nvcsw=10, oublock=8, tm=0.0,
utime=0.004[EOL]" (0ms)
19:11:33.187 [ org.apache.http.wire] << "Expires: Mon, 10 May 2010
15:17:26 GMT[EOL]" (0ms)
19:11:33.187 [ org.apache.http.wire] << "Vary: Accept-Encoding[EOL]"
(0ms)
19:11:33.187 [ org.apache.http.wire] << "Content-Length: 368[EOL]"
(0ms)
19:11:33.187 [ org.apache.http.wire] << "Content-Type: text/html;
charset=UTF-8[EOL]" (0ms)
19:11:33.187 [ org.apache.http.wire] << "X-Cache: MISS from
cache01.p01.sjc1.metaweb.com[EOL]" (0ms)
19:11:33.187 [ org.apache.http.wire] << "Connection: keep-alive[EOL]"
(0ms)
19:11:33.187 [ org.apache.http.wire] << "X-Metaweb-TID:
cache;cache01.p01.sjc1:8101;2010-05-10T15:17:25Z;0121[EOL]" (0ms)
19:11:33.187 [ org.apache.http.wire] << "Set-Cookie:
metaweb_tid=cache%3Bcache01.p01.sjc1%3A8101%3B2010-05-
10T15%3A17%3A25Z%3B0121; path=/[EOL]" (0ms)
19:11:33.187 [ org.apache.http.wire] << "Cache-Control: public, no-
cache="Set-Cookie", max-age=1, s-maxage=1, stale-while-revalidate=1[EOL]"
(0ms)
19:11:33.187 [ org.apache.http.wire] << "[EOL]" (0ms)
19:11:33.234 [ org.apache.http.wire] << "400 Bad Request[\n]" (47ms)
19:11:33.234 [ org.apache.http.wire] << "[\n]" (0ms)
19:11:33.234 [ org.apache.http.wire] << "The server could not comply
with the request since it is either malformed or otherwise incorrect. [\n]"
(0ms)
19:11:33.234 [ org.apache.http.wire] << "[\n]" (0ms)
19:11:33.234 [ org.apache.http.wire] << " {'status': '400 Bad
Request', 'code': '/api/status/error/oauth', 'messages': [{'info': {},
'message': 'Expired timestamp: given 1273504291 and now 1273504645 has a
greater difference than threshold 300', 'code':
'/api/status/error/request/error'}]} " (0ms)
19:11:33.234 [ command] Exception caught (0ms)
oauth.signpost.exception.OAuthCommunicationException: Communication with
the service provider failed: Service provider responded in error: 400 (Bad
Request)
at
oauth.signpost.AbstractOAuthProvider.retrieveToken(AbstractOAuthProvider.ja
va:214)
at
oauth.signpost.AbstractOAuthProvider.retrieveRequestToken(AbstractOAuthProv
ider.java:69)
at
com.metaweb.gridworks.commands.auth.AuthorizeCommand.doGet(AuthorizeCommand
.java:52)
at
com.metaweb.gridworks.GridworksServlet.doGet(GridworksServlet.java:211)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at
org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:390)
at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
at
org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
at
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnecti
on.java:923)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:547)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:2
28)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(UnknownSource)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(UnknownSource)
at java.lang.Thread.run(Unknown Source)
Caused by: oauth.signpost.exception.OAuthCommunicationException: Service
provider responded in error: 400 (Bad Request)
at
oauth.signpost.AbstractOAuthProvider.handleUnexpectedResponse(Abstrac
tOAuthProvider.java:241)
at
oauth.signpost.AbstractOAuthProvider.retrieveToken(AbstractOAuthProvider.ja
va:189)
... 22 more
19:11:33.234 [ servlet] < GET authorize (0ms)
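
The 'Expired timestamp' message in the log above already contains the numbers needed to see why the request was rejected. The short calculation below only restates that arithmetic; reading it as a client clock running several minutes behind the server is an inference from the log, not something the report confirms.

```python
# Values copied from the 'Expired timestamp' message in the log above.
request_timestamp = 1273504291  # oauth_timestamp sent by the client
server_now = 1273504645         # "now" as reported by the server
threshold_seconds = 300         # threshold quoted in the error message

skew = server_now - request_timestamp

# The difference exceeds the threshold, which matches the 400 response.
print(skew, skew > threshold_seconds)
```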

Original issue: http://code.google.com/p/google-refine/issues/detail?id=10

missing "lang" attribute in MQL generated in schema alignment

Original author: [email protected] (May 10, 2010 17:13:45)

In the attached project, the MQL-like preview shows:

[
{
"/base/jsbach/bach_composition/bwv": [
{
"connect": "insert",
"value": "52",
"type": "/type/text"
}
],
"name": "Falsche Welt, dir trau ich nicht",
"/music/composition/composer": [
{
"id": "/en/johann_sebastian_bach"
}
],
"type": "/base/jsbach/bach_composition",
"/music/composition/form": [
{
"id": "/en/cantata"
}
],
"create": "unless_exists"
}
]

If you feed the MQL to the query editor (http://tinyurl.com/3858oww), you get

"Must specify 'lang' when using /type/text in a write"

error.

Original issue: http://code.google.com/p/google-refine/issues/detail?id=11

Delivered "Collapse whitespace" transformation does not work

Original author: staringmonkey (May 14, 2010 13:34:39)

What steps will reproduce the problem?

  1. Load a grid which includes whitespace in a column.
  2. From the column header, select Edit Cells -> Common Transforms ->
    Collapse whitespace.

What is the expected output? What do you see instead?

Contiguous blocks of whitespace should be collapsed to a single space.
This does not occur.

What version of the product are you using? On what operating system?

SVN HEAD on OSX.

Please provide any additional information below.

The following Jython expression collapsed whitespace correctly:
import re
return re.sub(r'\s+',' ',value)

Original issue: http://code.google.com/p/google-refine/issues/detail?id=29

Maximum number of facet values should be configurable.

Original author: staringmonkey (May 14, 2010 13:40:30)

What steps will reproduce the problem?

  1. For any column that contains more than 2000 unique values select Facet
    -> Text Facet from the header.

What is the expected output? What do you see instead?

The facet sidebar displays "Too many choices" rather than all discrete values.

What version of the product are you using? On what operating system?

SVN HEAD on OSX.

Please provide any additional information below.

This value should be configurable either in flat-text or in the UI.
Although there is a definite performance boundary, I found with my very
first dataset that I needed to support upwards of 3000 unique values in a
facet, but could not do so without modifying the source.

Original issue: http://code.google.com/p/google-refine/issues/detail?id=31

float rejected from sandbox upload as Json object

Original author: iainsproat (May 10, 2010 20:42:29)

Using a schema that is ready to load into Freebase, and containing a column
mapped to a property of type 'float'. Load the triples into Freebase
sandbox. I find that all the float values are being rejected as Json objects.
See http://gridworks-loads.freebaseapps.com/job/23

In the schema alignment skeleton, the language value is set to /lang/en. (not
sure if that is of interest)

Let me know if you need the project file.

Original issue: http://code.google.com/p/google-refine/issues/detail?id=13

Browse this cluster does not work

Original author: EmilStenstrom (May 18, 2010 22:12:08)

What steps will reproduce the problem?

  1. Select a column, click Cluster & Edit
  2. On a cluster, click "Browse this cluster"

What is the expected output? What do you see instead?
To see just the values in that cluster. Instead I see an empty page with no
values.

What version of the product are you using? On what operating system?
Gridworks 1.0, on Windows 7.

Original issue: http://code.google.com/p/google-refine/issues/detail?id=47

Gridworks should allow programmatic removal of row

Original author: [email protected] (May 09, 2010 04:40:31)

Gridworks should allow programmatic removal of rows, or deleting rows using a facet, flag, or
starred filter.

Something for this effect:

if(startsWith(value,"RMK"),row.starred=true,value)

Then under Column 1 in interface allow Edit dropdown to have "Remove/Delete Starred Rows"

or perhaps under the rightside facet panel for Starred as a hyperlink click to "Remove Starred
Rows".

Thanks to Thad Guidry for the suggestion.

Original issue: http://code.google.com/p/google-refine/issues/detail?id=1

Move test temp files out of source hierarchy

Original author: tfmorris (May 13, 2010 17:04:57)

SVN is currently offering to commit some files in
./tests/temp/153nnnn.project. These should either be created in the temp
directory (preferably) or added to svn:ignore (as a last resort). All
operating systems have the concept of a temp directory and Java has an O/S
independent way of creating temp files, so that would be the best solution
unless something militates against it.

Original issue: http://code.google.com/p/google-refine/issues/detail?id=24

CSV import is too basic

Original author: eferonline (May 11, 2010 13:09:48)

Consider this toy-csv-file:

name,description,yearOfBirth
Mary II,"Mary II was Queen regnant of England, Scotland, and Ireland from
1689 until her death.", 1662
Napoleon Bonaparte,"Napoleon I.
He was a military and political leader of France and Emperor of the French
as Napoleon I, whose actions shaped European politics in the early 19th
century.",1769

The commas in Mary's description are 'escaped' by using the " mode. The
same is done for the comma and the line break in Napoleon's description.
Pretty common for real-life data.

So two data rows should be detected (one including a line break). Instead,
three rows are created on import. Not too smart, considering that such
'extended' escaping is very common, e.g. in exporters of spreadsheet
software and as respective clipboard formats.

No way to import the file "correctly" (or to choose parsing mode) in
Version 1.0-r667, running on Windows XP.
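
For comparison, a minimal sketch of the expected parsing using Python's standard csv module, which treats quoted commas and quoted line breaks as part of the field. The sample is adapted from the file above (Mary's description is joined onto one line and Napoleon's is shortened); it illustrates the expected row count and is not Gridworks' importer.

```python
import csv
import io

SAMPLE = (
    "name,description,yearOfBirth\n"
    'Mary II,"Mary II was Queen regnant of England, Scotland, and Ireland '
    'from 1689 until her death.", 1662\n'
    'Napoleon Bonaparte,"Napoleon I.\n'
    'He was a military and political leader of France.",1769\n'
)


def parse_rows(text):
    """Parse CSV text, keeping quoted commas and newlines inside fields."""
    # newline="" leaves line endings untouched so the csv module handles them.
    return list(csv.reader(io.StringIO(text, newline="")))


rows = parse_rows(SAMPLE)
header, data = rows[0], rows[1:]
print(len(data))
```

Note that the space before 1662 in the sample is kept as part of the field by default; csv.reader accepts skipinitialspace=True to drop it.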

Original issue: http://code.google.com/p/google-refine/issues/detail?id=19

Manually selecting a reconciliation match returns the grid to page 1.

Original author: staringmonkey (May 14, 2010 14:09:53)

What steps will reproduce the problem?

  1. From the header of a data column select Reconcile -> Start Reconciling.
  2. Select an appropriate topic and click Start Reconciling.
  3. Wait for the reconcile to complete.
  4. Click the "next page" link.
  5. Find a cell that has several possible reconciliation matches and
    click the check-mark or double-check-mark image.

What is the expected output? What do you see instead?

Reconciliation completes correctly; however, the grid returns to page
one. This makes it nearly impossible to complete manual reconciliation for
large datasets.

What version of the product are you using? On what operating system?

SVN HEAD on OSX.

Please provide any additional information below.

Original issue: http://code.google.com/p/google-refine/issues/detail?id=33

jython re expression doesn't work

Original author: [email protected] (May 16, 2010 21:16:27)

I'm trying to transform BWV 1 — Wie schön leuchtet der Morgenstern, BWV 1 in a cell to Wie schön
leuchtet der Morgenstern using the following Jython function:

import re
v = cell["value"]
g = re.search(r"""— (.*),\s*BWV""",v)
return g.group(1)

which, alas, returns null

However, in Jython 2.5.1, the following code works

# -*- coding: utf-8 -*-

import re

v = cell["value"]

v = "BWV 1 — Wie schön leuchtet der Morgenstern, BWV 1"
g = re.search(r"""— (.*),\s*BWV""",v)
print g.group(1)

I'm using GW Version 1.0.1-r732
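
A minimal sketch of the intended extraction, written for plain Python 3 rather than the Jython embedded in Gridworks, so it does not reproduce the null result reported above. The function name is hypothetical. The dash is written as a Unicode escape so the pattern does not depend on the source file's encoding, and a failed match returns None instead of raising.

```python
import re

# "\u2014" is the long dash that separates the catalogue number from the title.
TITLE_PATTERN = re.compile("\u2014 (.*),\\s*BWV")


def extract_title(value):
    """Return the title between the dash and the trailing ', BWV', or None."""
    match = TITLE_PATTERN.search(value)
    return match.group(1) if match else None


title = extract_title("BWV 1 \u2014 Wie schön leuchtet der Morgenstern, BWV 1")
print(title)
```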

Original issue: http://code.google.com/p/google-refine/issues/detail?id=42

./gridworks not compiling server tests (but eclipse does)

Original author: iainsproat (May 12, 2010 08:01:59)

Please find attached a patch (based on r719) which is a small example of unit
testing Gridworks with JUnit and Mockito.

This runs fine in the JUnit runner in Eclipse, but I get a "package
com.metaweb.gridworks.commands does not exist" compile time error when
running ./gridworks server_test from the command line.

File and changes description:
--tests/java/lib/mockito-all-1.8.4.jar
adds the Mockito library

--.classpath
change appends the Mockito library to the Java build path

--tests/java/src/com/metaweb/gridworks/tests/commands/util/CancelProcessesCommand.java
Sample test script with two methods. The first tests behaviour when null
parameters are passed (it should fail when run, as it expects an
IllegalArgumentException, which the method under test doesn't currently
throw).
The second test, doPost, uses mock objects to verify the fully working use
case of the system under test; this should pass. As Mockito seems unable
to deal with mocking variables, I had to wrap the field
Project.projectManager in a method Project.getProjectManager().

--src/main/java/com/metaweb/gridworks/model/Project.java
Additional method getProjectManager() to wrap field projectManager. I
think the variable should be made protected, and all calls to it made
through the method. (In .Net libraries, it's best practice to avoid
calling variables directly, and they are private by default - not sure if
it's the same practice in Java?).

--src/main/java/com/metaweb/gridworks/commands/util/CancelAllProcesses.java
updated to call Project.getProjectManager() so the call can be verified by
Mockito.

Original issue: http://code.google.com/p/google-refine/issues/detail?id=21

Add Edit Cells / Set Null

Original author: thadguidry (May 19, 2010 16:33:49)

Often, after faceting and finding outliers, such as mistyped phone numbers
that are completely useless because they are missing several digits, I have
to repeatedly open Transform and enter "value=null" to remove those
worthless phone numbers.

It would be useful to have an easy EDIT CELLS / SET NULL feature.

Original issue: http://code.google.com/p/google-refine/issues/detail?id=49

org.slf4j.impl.StaticLoggerBinder

Original author: iainsproat (May 11, 2010 20:40:48)

When using the project in Eclipse, the error 'SLF4J: Failed to load class
"org.slf4j.impl.StaticLoggerBinder"' occurred.

More info on a solution to the message can be found here:
http://www.slf4j.org/codes.html#StaticLoggerBinder

slf4j-log4jnnnn.jar is in the source tree, but not on the
classpath. Adding it to the build path resolves this
error.

Patch for classpath attached.

Original issue: http://code.google.com/p/google-refine/issues/detail?id=20

Transform dialog should remember preferred language.

Original author: staringmonkey (May 14, 2010 13:36:52)

What steps will reproduce the problem?

  1. From any column header select Edit Cells -> Transform.
  2. Enter and execute a Jython expression.
  3. Return to the Transform dialog.

What is the expected output? What do you see instead?

Language drop-down should default to Jython, but instead returns to GEL.

What version of the product are you using? On what operating system?

SVN HEAD on OSX.

Please provide any additional information below.

Original issue: http://code.google.com/p/google-refine/issues/detail?id=30

gzip: stdin: not in gzip format

Original author: iainsproat (May 10, 2010 09:05:54)

1. Download trunk (r667) from source code
2. Run ./gridworks from Cygwin bash

The following is returned:

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error exit delayed from previous errors
Error while expanding apache-ant-1.8.0-bin.tar.gz

I believe this fails as Ant 1.8.0 is now superseded by 1.8.1 and the 1.8.0
files moved to the archive directory.

For the 1.8.0 version the URL in /trunk/gridworks should be:
http://www.apache.org/dist/ant/binaries/apache-ant-1.8.0-bin.tar.gz
But this is likely to break at every Ant release cycle.

The stable URL for the 1.8.0 version will be:
http://archive.apache.org/dist/ant/binaries/apache-ant-1.8.0-bin.tar.gz

Patch attached for 1.8.1, using the archive url.
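
A minimal sketch of the two ideas above: building the download URL from the stable archive host, and checking that a download really is gzip data before handing it to tar. The helper names are hypothetical. The assumption that the original failure came from a non-gzip payload (for example an HTML error page saved under the .tar.gz name) is an inference from the error message and is not stated in the report.

```python
ARCHIVE_BASE = "http://archive.apache.org/dist/ant/binaries"


def ant_archive_url(version):
    """Build the archive URL for a given Ant binary release."""
    return "{}/apache-ant-{}-bin.tar.gz".format(ARCHIVE_BASE, version)


def looks_like_gzip(first_bytes):
    """Return True if the data starts with the gzip magic number 1f 8b."""
    return first_bytes[:2] == b"\x1f\x8b"


print(ant_archive_url("1.8.1"))
print(looks_like_gzip(b"<html>404 Not Found</html>"))
```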

Original issue: http://code.google.com/p/google-refine/issues/detail?id=6

Renaming Cells with Ctrl-Enter produced ERROR

Original author: thadguidry (May 17, 2010 18:18:54)

What steps will reproduce the problem?

  1. running in netbeans
  2. using Ctrl-Enter with Edit cell view to rename 'Drug' cells with same
    string.
  3. Error resulted after renaming about 4 or 5 rows

What do you see instead?

12:57:23.194 [ gridworks_server] Starting Server bound to '127.0.0.1:3333' (0ms)
12:57:23.212 [ gridworks_server] Initializing context: '/' from 'C:\Users\tguidry\Documents\NetBeansProjects\trunk798\trunk\src\main\webapp' (18ms)
12:57:23.261 [ gridworks_server] Starting autoreloading scanner... (49ms)
12:57:23.854 [ project_manager] Failed to use jdatapath to detect user data path: resorting to environment variables (593ms)
12:57:23.857 [ project_manager] Using workspace directory: C:\Users\tguidry\AppData\Roaming\Gridworks (3ms)
12:57:23.858 [ project_manager] Loading workspace: C:\Users\tguidry\AppData\Roaming\Gridworks\workspace.json (1ms)
13:02:00.891 [ project] Loaded project 1455288364717 from disk in 0 sec(s) (277033ms)
13:02:23.934 [ project_manager] Saved workspace (23043ms)
13:03:24.211 [ compute-clusters_command] computed clusters [binning,fingerprint] in 75ms (60277ms)
13:03:29.420 [ compute-clusters_command] computed clusters [binning,ngram-fingerprint] in 35ms (5209ms)
13:03:40.082 [ compute-clusters_command] computed clusters [binning,ngram-fingerprint] in 18ms (10662ms)
13:07:23.939 [ project_manager] Saving some modified projects ... (223857ms)
13:07:23.977 [ project] Saved project '1455288364717' (38ms)
13:07:23.994 [ project_manager] Saved workspace (17ms)
13:11:08.617 [ org.mortbay.log] /command/get-history (224623ms)
java.util.ConcurrentModificationException
    at java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
    at java.util.AbstractList$Itr.next(AbstractList.java:343)
    at com.metaweb.gridworks.history.History.write(History.java:223)
    at com.metaweb.gridworks.commands.Command.respondJSON(Command.java:204)
    at com.metaweb.gridworks.commands.Command.respondJSON(Command.java:191)
    at com.metaweb.gridworks.commands.history.GetHistoryCommand.doGet(GetHistoryCommand.java:21)
    at com.metaweb.gridworks.GridworksServlet.doGet(GridworksServlet.java:211)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:390)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:923)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:547)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
13:12:23.994 [ project_manager] Saving some modified projects ... (75377ms)
13:12:24.099 [ project] Saved project '1455288364717' (105ms)
13:12:24.131 [ project_manager] Saved workspace (32ms)
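
The trace shows History.write iterating a list while something else modifies it. One common way to avoid that class of problem is to serialize a snapshot taken under a lock. The sketch below illustrates that pattern in Python with a hypothetical History class; it is not the Gridworks code or its actual fix. Unlike Java, Python lists do not raise on concurrent modification, so the snapshot here only guarantees a consistent view.

```python
import threading


class History:
    """Toy history log: writers append, readers serialize a snapshot."""

    def __init__(self):
        self._entries = []
        self._lock = threading.Lock()

    def add(self, entry):
        with self._lock:
            self._entries.append(entry)

    def snapshot(self):
        # Copy under the lock so later appends cannot affect the iteration.
        with self._lock:
            return list(self._entries)

    def write(self):
        # Iterate the copy, never the live list.
        return [str(entry) for entry in self.snapshot()]


history = History()
history.add("rename cell 1")
frozen = history.snapshot()
history.add("rename cell 2")
print(frozen, history.write())
```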

Original issue: http://code.google.com/p/google-refine/issues/detail?id=45

Conflated triples - all rows are producing triple with "s" :" $Name_0"

Original author: iainsproat (May 11, 2010 08:34:26)

Using r708.
Example of triples are in http://gridworks-loads.freebaseapps.com/job/28

The MQL produced by the schema alignment preview seems sensible:
[{
  "/location/location/geolocation": [{
    "/location/geocode/latitude": [{ "connect": "insert", "value": "57.14839008681991", "type": "/type/float" }],
    "/location/geocode/longitude": [{ "connect": "insert", "value": "-2.095024795601918", "type": "/type/float" }],
    "type": "/location/geocode",
    "create": "unconditional"
  }],
  "name": "AB10 1AA",
  "/location/postal_code/postal_code": [{ "connect": "insert", "value": "AB10 1AA", "type": "/type/text", "lang": "/lang/en" }],
  "type": "/location/postal_code",
  "create": "unless_exists"
}, ...

but the triples all seem to be conflated to the $Name_0 object. (It's the
only one with a type added to it.)

{ "s" : "$Name_0", "p" : "type", "o" : "/location/postal_code" }
{ "s" : "$Name_0", "p" : "name", "o" : "AB10 1AA" }
{ "s" : "$Name_0", "p" : "/location/postal_code/postal_code", "o" : "AB10 1AA" }
{ "s" : "$Name_0", "p" : "/location/location/geolocation", "o" : { "/location/geocode/latitude": 57.14839008681991, "/location/geocode/longitude": -2.095024795601918 } }
{ "s" : "$Name_0", "p" : "/location/postal_code/postal_code", "o" : "AB10 1AF" }
{ "s" : "$Name_0", "p" : "/location/location/geolocation", "o" : { "/location/geocode/latitude": 57.14886535988805, "/location/geocode/longitude": -2.0961830540030317 } }
{ "s" : "$Name_0", "p" : "/location/postal_code/postal_code", "o" : "AB10 1AG" }

Original issue: http://code.google.com/p/google-refine/issues/detail?id=17
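
A small diagnostic sketch for the conflation reported in this issue: group triples by subject and flag any subject that carries more than one postal code. The "s"/"p"/"o" keys follow the triples quoted in the report; the function name and the idea of using the postal code predicate as the check are assumptions made for illustration.

```python
from collections import defaultdict

POSTAL_CODE = "/location/postal_code/postal_code"


def conflated_subjects(triples, predicate=POSTAL_CODE):
    """Map each subject that has several values for `predicate` to those values."""
    values = defaultdict(set)
    for triple in triples:
        if triple["p"] == predicate:
            values[triple["s"]].add(triple["o"])
    return {s: sorted(v) for s, v in values.items() if len(v) > 1}


sample = [
    {"s": "$Name_0", "p": "type", "o": "/location/postal_code"},
    {"s": "$Name_0", "p": POSTAL_CODE, "o": "AB10 1AA"},
    {"s": "$Name_0", "p": POSTAL_CODE, "o": "AB10 1AF"},
    {"s": "$Name_0", "p": POSTAL_CODE, "o": "AB10 1AG"},
]
print(conflated_subjects(sample))
```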

./gridworks test fails with "Cannot change ownership" error

Original author: iainsproat (May 10, 2010 12:06:35)

./gridworks test

r683 on Windows XP, Cygwin 1.7, Java JDK 1.6u20 returns the below error. I
think it is to do with permissions when writing to a folder (running
without admin rights):

tar: virtualenv-1.4.6/virtualenv.egg-info/not-zip-safe: Cannot change ownership to uid 1000, gid 1000: Invalid argument
tar: virtualenv-1.4.6/virtualenv.egg-info/top_level.txt: Cannot change ownership to uid 1000, gid 1000: Invalid argument
tar: virtualenv-1.4.6/virtualenv.egg-info/entry_points.txt: Cannot change ownership to uid 1000, gid 1000: Invalid argument
tar: virtualenv-1.4.6/virtualenv.egg-info/PKG-INFO: Cannot change ownership to uid 1000, gid 1000: Invalid argument
tar: virtualenv-1.4.6/virtualenv.egg-info/dependency_links.txt: Cannot change ownership to uid 1000, gid 1000: Invalid argument
tar: virtualenv-1.4.6/virtualenv.egg-info/SOURCES.txt: Cannot change ownership to uid 1000, gid 1000: Invalid argument
tar: virtualenv-1.4.6/virtualenv.egg-info: Cannot change ownership to uid 1000, gid 1000: Invalid argument
tar: virtualenv-1.4.6/docs/index.txt: Cannot change ownership to uid 1000, gid 1000: Invalid argument
tar: virtualenv-1.4.6/docs/news.txt: Cannot change ownership to uid 1000, gid 1000: Invalid argument
tar: virtualenv-1.4.6/docs/license.txt: Cannot change ownership to uid 1000, gid 1000: Invalid argument
tar: virtualenv-1.4.6/docs/_build/_sources/index.txt: Cannot change ownership to uid 1000, gid 1000: Invalid argument
tar: virtualenv-1.4.6/docs/_build/_sources/news.txt: Cannot change ownership to uid 1000, gid 1000: Invalid argument
tar: virtualenv-1.4.6/docs/_build/_sources/license.txt: Cannot change ownership to uid 1000, gid 1000: Invalid argument
tar: virtualenv-1.4.6/docs/_build/_sources: Cannot change ownership to uid 1000, gid 1000: Invalid argument
tar: virtualenv-1.4.6/docs/_build: Cannot change ownership to uid 1000, gid 1000: Invalid argument
tar: virtualenv-1.4.6/docs: Cannot change ownership to uid 1000, gid 1000: Invalid argument
tar: virtualenv-1.4.6/scripts/virtualenv: Cannot change ownership to uid 1000, gid 1000: Invalid argument
tar: virtualenv-1.4.6/scripts: Cannot change ownership to uid 1000, gid 1000: Invalid argument
tar: virtualenv-1.4.6/PKG-INFO: Cannot change ownership to uid 1000, gid 1000: Invalid argument
tar: virtualenv-1.4.6/MANIFEST.in: Cannot change ownership to uid 1000, gid 1000: Invalid argument
tar: virtualenv-1.4.6/setup.cfg: Cannot change ownership to uid 1000, gid 1000: Invalid argument
tar: virtualenv-1.4.6/setup.py: Cannot change ownership to uid 1000, gid 1000: Invalid argument
tar: virtualenv-1.4.6/virtualenv.py: Cannot change ownership to uid 1000, gid 1000: Invalid argument
tar: virtualenv-1.4.6/virtualenv_support/setuptools-0.6c11-py2.6.egg: Cannot change ownership to uid 1000, gid 1000: Invalid argument
tar: virtualenv-1.4.6/virtualenv_support/setuptools-0.6c11-py2.5.egg: Cannot change ownership to uid 1000, gid 1000: Invalid argument
tar: virtualenv-1.4.6/virtualenv_support/setuptools-0.6c11-py2.4.egg: Cannot change ownership to uid 1000, gid 1000: Invalid argument
tar: virtualenv-1.4.6/virtualenv_support/pip-0.6.3.tar.gz: Cannot change ownership to uid 1000, gid 1000: Invalid argument
tar: virtualenv-1.4.6/virtualenv_support/distribute-0.6.8.tar.gz: Cannot change ownership to uid 1000, gid 1000: Invalid argument
tar: virtualenv-1.4.6/virtualenv_support/__init__.py: Cannot change ownership to uid 1000, gid 1000: Invalid argument
tar: virtualenv-1.4.6/virtualenv_support: Cannot change ownership to uid 1000, gid 1000: Invalid argument
tar: virtualenv-1.4.6: Cannot change ownership to uid 1000, gid 1000: Invalid argument
tar: Exiting with failure status due to previous errors
Error while expanding virtualenv-1.4.6.tar.gz

Original issue: http://code.google.com/p/google-refine/issues/detail?id=8

Error 500 Problem accessing /command/create-project-from-upload

Original author: thadguidry (May 17, 2010 14:35:27)

What steps will reproduce the problem?

  1. Checkout r797
  2. run with netbeans
  3. try to create project from THAD_SEARCH.xlsx

It should:

Load correctly showing columns and rows of .xlsx file in browser.

Instead produces Error Output:

Problem accessing /command/create-project-from-upload. Reason:

org/apache/xmlbeans/XmlException

Caused by:

java.lang.NoClassDefFoundError: org/apache/xmlbeans/XmlException
    at com.metaweb.gridworks.importers.ExcelImporter.read(ExcelImporter.java:51)

Original issue: http://code.google.com/p/google-refine/issues/detail?id=44

ClassNotFoundException - could not find 'main' class on initial build

Original author: iainsproat (May 10, 2010 09:14:09)

1. Download/update source (r683)
2. Use Cygwin 1.7 and JDK 6 update 20
3. # ./gridworks

Starting Gridworks at 'http://127.0.0.1:3333/'

Exception in thread "main" java.lang.NoClassDefFoundError: com/metaweb/gridworks/Gridworks
Caused by: java.lang.ClassNotFoundException: com.metaweb.gridworks.Gridworks
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
Could not find the main class: com.metaweb.gridworks.Gridworks. Program will exit.

Original issue: http://code.google.com/p/google-refine/issues/detail?id=7

Primary grid should be sortable by columns

Original author: staringmonkey (May 14, 2010 13:49:17)

What steps will reproduce the problem?

  1. Click header on any column, or select View from the menu.

What is the expected output? What do you see instead?

I expect the columns to be sortable, but they aren't.

What version of the product are you using? On what operating system?

SVN HEAD on OSX.

Please provide any additional information below.

If this is by design (and I believe I have some understanding of why it would
be), then it's not a huge loss, but if it's a reasonably straightforward
feature to add, it would be very useful.

Original issue: http://code.google.com/p/google-refine/issues/detail?id=32

Array literals in GEL

Original author: [email protected] (May 18, 2010 22:11:19)

Array literals should be supported in GEL so a user could, say, loop through a bunch of columns to concatenate their
contents into a single column, for example:

forEach(["narr1","narr2","narr3","narr4"], v, if(isNonBlank(cells[v].value), cells[v].value, "")).join(" ")

Would join the columns "narr1", "narr2", "narr3", and "narr4" into a single column. Right now, you just get an error if you try.

You can accomplish this today with something like:

forEach("narr1,narr2,narr3,narr4".split(","), v, if(isNonBlank(cells[v].value), cells[v].value, "")).join(" ")

But it'd be nice to be able to work with arrays as well as strings.
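
For readers less familiar with GEL, here is a rough Python equivalent of what the expression is meant to compute. The helper names are hypothetical, and is_non_blank is only an approximation of GEL's isNonBlank (it treats None and the empty string as blank).

```python
def is_non_blank(value):
    """Approximation of isNonBlank: anything other than None or ''."""
    return value is not None and value != ""


def join_columns(cells, column_names, separator=" "):
    """Join the named cells, substituting '' for blank ones, as forEach/join would."""
    parts = [
        cells[name] if is_non_blank(cells.get(name)) else ""
        for name in column_names
    ]
    return separator.join(parts)


row = {"narr1": "fell", "narr2": "", "narr3": None, "narr4": "ladder"}
print(repr(join_columns(row, ["narr1", "narr2", "narr3", "narr4"])))
```

Because blanks become empty strings rather than being skipped, consecutive blank columns leave repeated separators in the result, just as the expression above would.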

Original issue: http://code.google.com/p/google-refine/issues/detail?id=46

Implement new "stemmed" option now available in Relevance search

Original author: thadguidry (May 20, 2010 00:50:25)

Test Case:
http://search.labs.freebase.com/api/service/search?query=aramids&indent=1&limit=1&stemmed=1

Details and limits on this new feature of Relevance service:

https://bugs.freebase.com/browse/REL-306

I propose that we eventually support the new stemmed option in a later
Gridworks milestone. Of course, anyone encouraged by this news is welcome
to code away!

com.metaweb.gridworks.model.recon.HeuristicReconConfig
method batchReconUsingRelevance

Note: there are around 10 lines of code that build the relevance query
currently according to David H.
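
As a rough illustration of what adding the option amounts to on the query side, this Python sketch builds the test-case URL from its parameters. The endpoint and parameter names come from the test case above; the function name is hypothetical and no request is sent.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "http://search.labs.freebase.com/api/service/search"


def relevance_query_url(query, limit=1, stemmed=False, indent=True):
    """Build a relevance search URL, adding stemmed=1 only when requested."""
    params = {"query": query, "indent": int(indent), "limit": limit}
    if stemmed:
        params["stemmed"] = 1
    return SEARCH_ENDPOINT + "?" + urlencode(params)


print(relevance_query_url("aramids", stemmed=True))
```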

Original issue: http://code.google.com/p/google-refine/issues/detail?id=50

Behavior of Text Filter is unpredictable when "regular expression" mode is enabled.

Original author: staringmonkey (May 14, 2010 14:22:00)

What steps will reproduce the problem?

  1. Load the U.S. Overseas Loans and Grants dataset identified on the
    "Sample Datasets" wiki page.
  2. Select "Text Filter" from the header of the "country_name" column.
  3. Type "aus" in the filter. Two rows are returned, "Australia" and "Austria".
  4. Enable "regular expression" mode. All results disappear.
  5. Replace the filter value with "^Aus". The results remain empty.

What is the expected output? What do you see instead?

I tried various other regular expression formats (including /'s, matching
groups, etc.) and could not make the filter return the expected results.
On another dataset I did see matching values, but not enough to have
covered the entire dataset, which leads me to believe the regex filter may
not be searching the entire dataset.
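
The behaviour I expected can be sketched in Python as follows. This is only an illustration of the expectation, assuming the filter is case-insensitive in both modes (as the plain "aus" search appears to be); the function name and sample values are made up and this is not how Gridworks implements the filter.

```python
import re


def text_filter(values, query, regex=False, case_sensitive=False):
    """Keep values matching `query` as a substring or, if regex=True, a pattern."""
    flags = 0 if case_sensitive else re.IGNORECASE
    # In plain mode the query is escaped so it is matched literally.
    pattern = re.compile(query if regex else re.escape(query), flags)
    return [value for value in values if pattern.search(value)]


countries = ["Australia", "Austria", "Brazil", "Nicaragua"]
print(text_filter(countries, "aus"))
print(text_filter(countries, "^Aus", regex=True))
```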

What version of the product are you using? On what operating system?

SVN HEAD on OSX.

Please provide any additional information below.

Original issue: http://code.google.com/p/google-refine/issues/detail?id=34
