tilemill-project / millstone
middleware for map styles and datasources
License: BSD 3-Clause "New" or "Revised" License
Frequently, URLs to GeoJSON don't end with an extension of .json or .geojson. CSV files hosted remotely at Google Docs present the same issue:
https://docs.google.com/spreadsheet/pub?hl=en_US&hl=en_US&key=0AqV4OJpywingdFNYLXpKMmxqMG1lWTJzNE45ZUVnNlE&single=true&gid=0&output=csv
Without the ability to detect the type of the file, TileMill is unable to tell Mapnik which datasource plugin to use to read it, e.g. ogr for .geojson files and csv for .csv files.
So, we need to find more robust ways to support this. One approach in TileMill may be to detect URLs without an obvious extension and then require the user to supply the type of file, but good APIs will set the content-disposition header, from which a filename and extension can be pulled.
It should be easy to get the content-disposition during download():
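// Fall back to the Content-Disposition header when the path has no extension.
if (!path.extname(filepath)
    && response.headers
    && response.headers['content-disposition']) {
    var disp = response.headers['content-disposition'];
    var match = disp.match(/^.*filename="(.*)"/i);
    // match is null when the header carries no quoted filename
    if (match && match[1]) {
        filepath = path.join(filepath, match[1]);
    }
}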
But our caching framework will still likely break because we call path.extname(uri.pathname) in several places.
Using TileMill for Windows alpha 0.9, type detection fails to recognize the type of a csv published via Google Docs. This occurs when following the workflow detailed here.
The problem is currently avoided by adding type=csv to the "Advanced" dialog.
@springmeyer, I was going through the millstone tests when I found that the WGS84 proj string had changed. I updated the tests, but wanted to confirm that this change was intentional.
In several places in millstone different behavior is triggered depending on whether paths are relative or absolute on the filesystem. Like at https://github.com/mapbox/millstone/blob/master/lib/millstone.js#L400.
We need to make sure that this logic works cross platform. It obviously will not work if paths start with drive letters, e.g. C:\foo\bar.txt will look like a relative path because it does not start with a path separator.
So, the task here is to either write an is_relative function that detects drive letters, or see if someone in the node world has already done this, or look at whether it is cleaner to normalize paths across millstone into uri components, whereby the drive is kept separate from the absolute path (starting with the path separator).
Then we need good, simple tests for this that can easily be run as mocha tests.
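For discussion, here is a minimal sketch of what such a check could look like. The name isRelative and the decision to throw on non-string input are assumptions for illustration, not what millstone currently does.

```javascript
// Hypothetical helper: treats POSIX roots, backslash/UNC roots and Windows
// drive letters as absolute; everything else is considered relative.
function isRelative(loc) {
    if (typeof loc !== 'string' || loc.length === 0) {
        // Fail loudly instead of crashing on loc[0] when loc is undefined
        throw new TypeError('isRelative expects a non-empty string path');
    }
    if (loc[0] === '/' || loc[0] === '\\') return false; // "/foo" or "\\server\share"
    if (/^[a-zA-Z]:[\\/]/.test(loc)) return false;        // "C:\foo" or "C:/foo"
    return true;
}
```

Because the check is pure string matching, it behaves the same on every platform, which should make it easy to cover with mocha tests.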
I think there is a small typo in util.js (line 4)
Step = require('Step');
that should be:
Step = require('step');
You can easily get blank maps for layer types in tilemill other than geojson and shapefiles, because projections that are unknown are simply passed through as blanks. So, then mapnik aborts rendering and dumps: proj_init_error:failed to initialize projection with: '' in the logs.
Millstone should throw and let the api user (in this case tilemill) know about the need to manually specify the projection.
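A rough sketch of the kind of guard this implies. The function name assertLayerSrs and the exact error wording are hypothetical; layer.name and layer.srs are assumed to mirror the MML layer fields.

```javascript
// Hypothetical guard: refuse to pass an empty projection through to mapnik.
function assertLayerSrs(layer) {
    if (typeof layer.srs !== 'string' || layer.srs.trim() === '') {
        throw new Error('Unable to determine SRS for layer "' + layer.name +
            '"; please specify the projection manually');
    }
    return layer;
}
```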
Pretty sure we properly recursed into directories inside archives from the beginning, but because some shapefiles are failing, my hunch is that this is broken now in tilemill 0.5.0.
error is:
"Error: No projection found for layer "al" at /Users/dane/Documents/MapBox/project/aaa".
This appears to be spurious as the shapefile does have a valid .prj, but perhaps is not finding it because of the directory?
Add tests for these commits v0.0.8...master
Recently we moved the parsing fallback required for ESRI variant shapefile WKT into node-srs. But node-srs only falls back if there is an error, and should not fall back merely because the .proj4 value is undefined (node-srs will still return the other srs representations that did work). TileMill, however, requires that the proj4 be known, so millstone still needs this fallback. Ideally the node-srs api would give a single .proj4() call, but until then we need to handle this correctly in millstone for what tilemill expects.
Currently we do not use a timeout, so for some urls that are wrong the application will hang for upwards of minutes. I was able to prompt a very long hang with marker-file:url("https://example.com/.svg");
from mapbox/carto#56
eg ../data/file.shp
First an error is thrown about SRS detection (which I think is a red herring), and then an error that mapnik cannot find the download in the layers folder.
Tried a zipped shapefile like: https://tileio.cartodb.com/tables/places_by_lon_lat?format=shp
From mapbox/carto#61
when trying to run 'tilebatch' to render an mml file that references a remote geojson file with no extension, carto will download the file, but then render nothing.
if you set a breakpoint in lib/carto/renderer.js:95, you discover that it throws the 'could not be downloaded' error. Also, err is set to the local filename that it was downloaded to.
Relatedly, it would be nice if this error was somehow displayed to the user, instead of silently producing an empty mbtiles file.
The flow of millstone is currently something like this:
The task is to add a step between 2 & 3 that gives us a chance to add indexes to datasources. At the moment we're interested in implementing sqlite index addition but in the future there are other indexing operations (e.g. shapefiles) that we may want to do.
We'll want to add a test for this step as well once it's implemented.
In order to get csv loading out of google docs we need https support (along with 302 support). I found that switching out node-get with request immediately allowed this to work, and avoids the 'socket hung up' error that I get with node-get.
Potential patch is:
diff --git a/lib/millstone.js b/lib/millstone.js
index dcf1807..18e0bbe 100644
--- a/lib/millstone.js
+++ b/lib/millstone.js
@@ -8,6 +8,7 @@ var EventEmitter = require('events').EventEmitter;
var _ = require('underscore');
var srs = require('srs');
var get = require('get');
+var request = require('request');
var zipfile = require('zipfile');
var Step = require('step');
var sqlite3 = require('sqlite3');
@@ -31,7 +32,7 @@ function download(url, filepath, callback) {
downloads[url] = new EventEmitter();
pool.acquire(function(obj) {
- (new get(url)).toDisk(dl, function(err, file) {
+ var req_dl = request(url, function (err, response, body) {
pool.release(obj);
if (err) {
downloads[url].emit('done', err);
@@ -44,6 +45,7 @@ function download(url, filepath, callback) {
return callback(err, filepath);
});
});
+ req_dl.pipe(fs.createWriteStream(dl));
});
}
diff --git a/package.json b/package.json
index 8764133..ae22f2f 100644
--- a/package.json
+++ b/package.json
@@ -18,6 +18,7 @@
"underscore" : "1.1.x",
"step": "0.0.x",
"generic-pool": "1.0.x",
+ "request": "1.9.x",
"get": "~0.4.2",
"srs": "~0.2.7",
"zipfile": "~0.2.2",
I'm trying to localize uris passed to marker-file, point-file and the like but I can't seem to see them localized.
Before going any deeper, I went looking in the existing test cases but I found no such case tested there.
Do you have an example ?
What I'm trying is:
{ point-file: url('http://upload.wikimedia.org/wikipedia/commons/7/72/Cup_of_coffee.svg'); }
Millstone says:
[millstone] processing style 'style.mss'
path.exists is now called `fs.exists`.
[millstone] finished processing '/tmp/millstone/base'
And the millstone.resolve callback gets passed a resolved mml which is the same
as the input one.
Indeed I don't see the localizeCartoURIs function in lib/millstone.js use the "file" parameter passed to the callback, but just the error, if any. What's the rationale for that?
This is millstone master branch:
https://github.com/mapbox/millstone/blob/master/lib/millstone.js#L401
This appears to only be happening for millstone, not other repos. I've filed a github support request, and am waiting for a response.
URI localization was added in #41 but it appears that no errors are thrown if the url is invalid. This is not good because the lack of a throw ends up leading to an invalid path being constructed in the mapnik xml and then for every feature a console line appears like:
[tilemill] "Mapnik LOG> 2012-08-23 13:12:27:" could not intialize reader for: '/Users/dane/Documents/MapBox/cache/a9b9f043-/a9b9f043-'
This will be mitigated by mapnik/mapnik#1439 slightly but really should be handled earlier in millstone, no?
@willwhite - any sense if this is supposed to pass? Last I checked we don't support sqlite srs detection.
> [email protected] test /Users/dane/projects/millstone
> which expresso | sh
uncaught: AssertionError: "Unable to determine SRS for layer \"sqlite-attach\" at /Users/dane/projects/millstone/test/cache/layers/countries.sqlite" == "Server returned HTTP 404"
at /Users/dane/projects/millstone/test/test.js:43:16
at Function.end (/Users/dane/projects/millstone/lib/millstone.js:485:9)
at next (/Users/dane/projects/millstone/node_modules/step/lib/step.js:51:23)
at next (/Users/dane/projects/millstone/node_modules/step/lib/step.js:54:7)
at next (/Users/dane/projects/millstone/node_modules/step/lib/step.js:54:7)
at Function.<anonymous> (/Users/dane/projects/millstone/lib/millstone.js:221:29)
at native
What it says.
I had to revert this change because autodetection isn't working for all GeoJSON files. We might want to change node-srs to return WGS84 by default if autodetection fails.
In addition to parsing content-type we should also at least:
content-type
?foo=bar&type=fun&format=csv&id=4
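As a sketch of the query-string idea, something like the following could run when the path itself has no extension. The function name guessTypeFromUrl, the list of keys checked and the list of known types are all assumptions for illustration.

```javascript
const { URL } = require('url');
const path = require('path');

// Hypothetical list of types we would be willing to trust from a query string
const KNOWN_TYPES = ['csv', 'json', 'geojson', 'kml', 'shp', 'zip'];

function guessTypeFromUrl(uri) {
    const parsed = new URL(uri);
    // Prefer a real extension on the path when there is one
    const ext = path.extname(parsed.pathname).slice(1).toLowerCase();
    if (ext) return ext;
    // Otherwise look for a recognizable value in common query parameters
    for (const key of ['format', 'output', 'type']) {
        const value = (parsed.searchParams.get(key) || '').toLowerCase();
        if (KNOWN_TYPES.includes(value)) return value;
    }
    return null;
}
```

With the example above, type=fun is ignored because it is not a known type, while format=csv is picked up.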
Not sure if this ever worked, but currently when fetching from a url that does not have a .json or .geojson ext and we succeed in detecting as json using the content-disposition, srs detection still fails here: https://github.com/mapbox/millstone/blob/master/lib/millstone.js#L485-490
The reason is that node-srs has no idea the file is geojson, so its custom handling of geojson is not triggered.
If a user pushes an escaped url into TileMill's Layer UI like:
http://examples.cartodb.com/api/v1/sql?format=geojson&q=SELECT%20*%20FROM%20costa_rica_pa%20limit%201
Then it appears by the time it gets to the point of being sent to node-get the url is double escaped:
http://examples.cartodb.com/api/v1/sqlformat=geojson&q=SELECT%2520*%2520FROM%2520costa_rica_pa%2520limit%25201
This breaks of course since that url is no longer valid.
It appears harmless to call unescape on both a raw url (basically unescaping twice) and an escaped url, so this fix can be as easy as calling unescape right here: https://github.com/mapbox/millstone/blob/master/lib/millstone.js#L38
Hello,
I'm packaging millstone for Debian.
The only licensing information is a "BSD" in package.json: please expand this, either by placing a comment in the sourcecode (millstone.js), or by putting a LICENSE/COPYING file in the root directory of the source.
It would be great if you could provide some licensing info in this bugreport, so that I can continue my work, and also so that I don't have to wait for a new release to happen :)
Many thanks,
David
a979c06 is invalid.
node-srs will fallback to trying to parse the prj as if it is an ESRI variant if there is an exception thrown internally by ogr. But the design of node-srs is that it will return as much about the projection as it can if ogr internally does not throw. The original idea behind this design is that you might want to know various things about the projection even if you can't know all (like proj4 string or epsg #).
What this means for millstone/tilemill is that millstone needs to fallback to parsing files as ESRI variants if no proj4 value is known for a given prj (which is possible to be the case for ESRI variant files even if they never caused an exception in ogr originally).
So, the case where the projection is valid enough to be parsed by ogr but not valid enough to have its proj4 representation detected is the most critical case and will fail for all tilemill users until millstone falls back to trying a parse with ESRI:: prepended. This is the same regression I fixed in tilemill 0.6.x but had regressed in 0.5.x after working in 0.4.x.
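A sketch of the fallback described here. To keep it self-contained the parser is passed in as an argument; in millstone it would presumably be the node-srs parse function, which is assumed here to return an object with an optional proj4 property. The name detectProj4 is hypothetical.

```javascript
// Hypothetical helper: try a plain parse first, then retry as an ESRI variant
// whenever no proj4 value came back (not only when the parser threw).
function detectProj4(prjText, parse) {
    let result;
    try {
        result = parse(prjText);
    } catch (err) {
        result = undefined; // fall through to the ESRI retry
    }
    if (result && result.proj4) return result.proj4;

    const esri = parse('ESRI::' + prjText);
    if (esri && esri.proj4) return esri.proj4;

    throw new Error('No proj4 representation found for projection');
}
```

The important part is that the retry is keyed on the missing proj4 value rather than on an exception.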
Overall I think the design and api of node-srs is terrible. Ideally soon we can properly wrap libgdal so we can do projection detection on all datasource types and ditch node-srs - calling it a temporary shim that got us this far.
Something like this is not currently supported, but common in the wild:
"attachment; filename=foo.csv"
Downstream issue: https://github.com/mapbox/tilemill/issues/870
Driven by https://github.com/mapbox/tilemill/issues/922 we should think about how to make the caching (and cache flushing) functionality more modular.
So far, for the experimentation in the 'live-cache' branch of tilemill, this is all that is needed to trigger re-caching upon save, but we can do better:
diff --git a/lib/millstone.js b/lib/millstone.js
index ef4beb0..fdd70b5 100644
--- a/lib/millstone.js
+++ b/lib/millstone.js
@@ -312,7 +312,7 @@ function resolve(options, callback) {
if (uri.protocol) {
var filepath = path.join(cache, cachepath(l.Datasource.file));
path.exists(filepath, function(exists) {
- if (exists) {
+ if (exists && !(l.cache_method && l.cache_method === 'live')) {
symlink(filepath);
} else {
utils.mkdirP(path.dirname(filepath), 0755, function(err) {
I was surprised to see that .flush() does not undo the effects of .resolve(), as the README file seems to suggest just that.
It would be useful to be able to clean up the cache by passing the same MML passed to .resolve.
It appears localizeCartoURIs calls uniq to get all unique matches of urls from a style. This makes sense to avoid hitting the download queue harder than needed. But the problem with this is that some urls are then not switched out with their localized path, which breaks things.
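One way to keep the uniq step while still rewriting every occurrence is sketched below. The function name replaceUris and the shape of the localized map (remote url to local path) are assumptions, not the current millstone internals.

```javascript
// Hypothetical helper: `localized` maps each unique remote url to its local path.
function replaceUris(style, localized) {
    // Replace longer urls first so a url that is a prefix of another
    // does not clobber the longer one.
    const remotes = Object.keys(localized).sort((a, b) => b.length - a.length);
    let out = style;
    for (const remote of remotes) {
        // split/join replaces every occurrence without regex escaping issues
        out = out.split(remote).join(localized[remote]);
    }
    return out;
}
```

Each url is still downloaded only once, but every reference in the style ends up pointing at the local copy.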
This module is exceedingly non-self-explanatory.
After a successful npm install, npm test fails with:
] npm test
> [email protected] test /home/src/cartodb/millstone
> mocha -R Spec --timeout 10000
module.js:340
throw err;
^
Error: Cannot find module 'Spec'
at Function.Module._resolveFilename (module.js:338:15)
at Function.Module._load (module.js:280:25)
at Module.require (module.js:362:17)
at require (module.js:378:17)
at Mocha.reporter (/home/src/cartodb/millstone/node_modules/mocha/lib/mocha.js:100:24)
at Object.<anonymous> (/home/src/cartodb/millstone/node_modules/mocha/bin/_mocha:178:7)
at Module._compile (module.js:449:26)
at Object.Module._extensions..js (module.js:467:10)
at Module.load (module.js:356:32)
at Function.Module._load (module.js:312:12)
at Module.runMain (module.js:492:10)
at process.startup.processNextTick.process._tickCallback (node.js:244:9)
npm ERR! Test failed. See above for more details.
npm ERR! not ok code 0
Original support ticket:
http://support.mapbox.com/discussions/tilemill/1429-use-relative-paths-in-symlinks-to-cached-mapbox-layer-shapedata-in-os-x
This seems like a good idea to me.
Millstone does some intense introspection of sqlite databases to be able to autodetect primary keys and spatialite geometry type.
When a user in tilemill misspells a database name, the logic fails but leaves behind a blank database (because the default behavior of node-sqlite3 is to create one if the filename passed does not exist).
This then creates very hard to understand errors when this db name is passed to node-mapnik/mapnik in TileMill which indicate the table is missing when really the database filename is wrong.
Because Mapnik latest upstream now can handle auto-detection of primary keys and spatialite geometry types I propose simply removing all this logic (rather than fixing the create if not found behavior). So, assigning to myself to get this cleaned up.
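For comparison, the alternative of fixing the create-if-not-found behavior could be as small as checking for the file before ever opening it. This is only a sketch, and the name assertDatabaseExists is hypothetical.

```javascript
const fs = require('fs');

// Hypothetical pre-check: fail with a clear message instead of letting the
// sqlite driver create an empty database at a misspelled path.
function assertDatabaseExists(filename) {
    if (!fs.existsSync(filename) || !fs.statSync(filename).isFile()) {
        throw new Error('SQLite database not found: ' + filename);
    }
    return filename;
}
```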
Flagging this issue as requiring more thought.
Consider these various issues:
With the exception of zipped resources, the generic answer to these is using gdal/ogr to introspect the data (and therefore wrapping gdal/ogr as a node c++ addon). But, currently the approach of millstone is to handle each case in a custom way - with the benefit of avoiding the extra dependency and offering advanced functionality.
On XP in copy mode:
[tilemill] [millstone] processing style 'layer'
[tilemill]
[tilemill] C:\Program Files\TileMill-v0.10.0-pre\tilemill\node_modules\millstone\lib\millstone.js:184
[tilemill] return loc[0] !== '\\' && loc.match(/^[a-zA-Z]:\\/) === null;
[tilemill] ^
[tilemill] TypeError: Cannot read property '0' of undefined
[tilemill] at isRelative (C:\Program Files\TileMill-v0.10.0-pre\tilemill\node_modules\millstone\lib\millstone.js:184:19)
[tilemill] at resolved.Layer.forEach.name (C:\Program Files\TileMill-v0.10.0-pre\tilemill\node_modules\millstone\lib\millstone.js:598:29)
[tilemill] at Object.oncomplete (fs.js:297:15)
[tilemill] Error: child process: "tile" failed with code "1"
Likely should/can be fixed here in millstone: mapbox/carto#87 and https://github.com/mapbox/tilemill/issues/616
It might be a good idea to switch to request (more used and developed than node-get).
putting in:
/this/does/not/exist/business.sqlite
for an sqlite layer in tilemill crashes the app with:
/Users/dane/projects/tilemill_master/node_modules/millstone/lib/millstone.js:360
if (err) throw err;
^
Error: SQLITE_CANTOPEN: unable to open database file
I'm going to move forward with a first stab at making it possible to use remote attached dbs. As advertised, I'm not going to be looking too far into the generalization of this functionality.
We use readlink in the extension checking code. Not sure how this will behave when millstone is in copy mode. Needs a look.
this appears to happen when the main db path is absolute
moved from mapbox/carto#67
Errors for failed layer requests like this
{ message: 'Server returned HTTP 403'
, statusCode: 403
}
would be more helpful if they included the url that was being requested. I had to manually check each layer source to see which one was failing (it was a typo on my part). Example 403 error: http://gis-data.s3.amazonaws.com/foobar.zip
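A sketch of an error shaped like the one above but carrying the url. The helper name httpError and the extra url property are assumptions.

```javascript
// Hypothetical helper: same message and statusCode as today, plus the url.
function httpError(statusCode, url) {
    const err = new Error('Server returned HTTP ' + statusCode + ' for ' + url);
    err.statusCode = statusCode;
    err.url = url; // lets callers report which layer source failed
    return err;
}
```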
Curious one @ajashton, I've not seen that before. Just want to add that, as you likely know, missing data usually looks like:
{ message: 'File not found: /Users/dane/projects/arc.js/square.json', stack: [Getter/Setter] }
{ message: 'File not found: /Users/dane/Desktop/route.shp', stack: [Getter/Setter] }
{ message: [Getter/Setter], stack: [Getter/Setter], type: 'non_object_property_load', arguments: [ 'length', undefined ] }
This was the cause of the failing to load KML from https://github.com/mapbox/tilemill/issues/1242.
Given a filename like: New York City\'s Solidarity Economy.kml then guessExtension in millstone tries to deduce the path extension from just 'New York City', leading to no ext being found and no knowledge of the file type.
> var s = 'attachment; filename="New York City\'s Solidarity Economy.kml"'
> s.match(/filename=['"]?([^'";]+)['"]?/)
[ 'filename="New York City\'',
'New York City',
index: 12,
input: 'attachment; filename="New York City\'s Solidarity Economy.kml"' ]
A problem with the test (fixed in dcea549) was preventing the failure from showing up.
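For reference, a sketch of a parser that does not stop at an apostrophe inside a double-quoted filename and also accepts the unquoted form. The name filenameFromDisposition is hypothetical and this is not the regex millstone ships; it also ignores the RFC 6266 filename* form.

```javascript
// Hypothetical helper: extract the filename from a Content-Disposition value.
function filenameFromDisposition(header) {
    if (typeof header !== 'string') return null;
    // Double-quoted form: take everything up to the closing double quote,
    // so apostrophes inside the name are preserved.
    const quoted = header.match(/filename="([^"]*)"/i);
    if (quoted) return quoted[1] || null;
    // Bare form (filename=foo.csv): read up to the next parameter separator,
    // then drop surrounding single quotes if present.
    const bare = header.match(/filename=([^;]+)/i);
    if (!bare) return null;
    return bare[1].trim().replace(/^'(.*)'$/, '$1') || null;
}
```

The extension can then be taken from the returned name with path.extname as before.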