tristanjuricek / knockoff
A Markdown parser + object model in scala
Home Page: http://tristanjuricek.github.com/knockoff
License: BSD 3-Clause "New" or "Revised" License
Another very handy tool would be to go 'backwards' from HTML to a "best" guess markdown document. This would allow my literate programming system to edit the HTML documents on the fly.
I consider it a pretty low priority, since it mainly enables writing with fancy HTML editors, and writing is exactly what the Markdown source already excels at. We'll see. (If it's easy, maybe; otherwise, no way.)
e3 Development Environment Checklist
====================================
1. Initial Requirement
1. Initial Setup
1. Java Setup
1. PostgreSQL 8.1 - build it from source
1. Dropbox
1. Db configuration
1. e3db configuration
1. e3local configuration
1. e3mail configuration
1. JBoss 4.0.4.GA
1. Jboss MultiInstance Configuration
1. Apache Configuration Debian
1. Apache Configuration MacOs
1. CHECKDATA.PL SCRIPT
1. SUMMARIZER
1. EXPORTER CONSIDERATION
1. CHIME EXPORTER
1. BASIC EXPORTER
## Requirements ##
1. We have to install:
$ sudo apt-get install openssh-server
$ sudo apt-get install build-essential
$ sudo apt-get install gcc zlib1g-dev libreadline-dev
## Initial Setup ##
1. Directory Structure
* ~/Applications
* ~/Dropbox (automatically created by Dropbox Installation)
* ~/bin
* sudo mkdir /home/emarsys; sudo chown <user>:<user> /home/emarsys
* mkdir /home/emarsys/tools
* mkdir -p /home/emarsys/IO/import
* mkdir -p /home/emarsys/IO/export/chime
* mkdir /home/emarsys/IO/export/basic
* mkdir /home/emarsys/IO/remote_transfer
For MacOs before creating the directory structure you should:
* In the file **/etc/auto_master** comment the line:
#/home auto_home -nobrowse
* Restart automount daemon
$ sudo automount -vc
2. Environment variables
export POSTGRESQL_HOME=$HOME/Applications/postgresql/current
export JAVA_HOME=$HOME/Applications/java/current
export JBOSS_HOME=$HOME/Applications/JBoss/current
export JRUBY_HOME=$HOME/Applications/jruby
export SCALA_HOME=$HOME/Applications/scala
export EMMA_HOME=$HOME/Applications/emma
export MVN_HOME=$HOME/Applications/maven (where maven is a symbolic link into Dropbox; only for MacOs)
export DEV_HOME=<path to your dev directory>
1. Symbolic Links
$ ln -sf ~/Dropbox/scala-<LASTVERSION> ~/Applications/scala
$ ln -sf ~/Dropbox/jruby-<LASTVERSION> ~/Applications/jruby
$ ln -sf ~/Dropbox/maven-<LASTVERSION> ~/Applications/maven
3. PATH
export PATH=$POSTGRESQL_HOME/bin:$JAVA_HOME/bin:$JRUBY_HOME/bin:$SCALA_HOME/bin:$MVN_HOME/bin:$JBOSS_HOME/bin:~/bin:$PATH;
1. Setting executable permission (if not already set)
$ chmod +x $SCALA_HOME/bin/*
$ chmod +x $JRUBY_HOME/bin/*
$ chmod +x $MVN_HOME/bin/*
## Java ##
### Install Java 6
For Debian:
$ sudo apt-get install sun-java6-jdk
$ ln -sf /usr/lib/jvm/java-6-sun $HOME/Applications/java/current
### Keytool
Used to sign Jar files for use in applets. Maybe we use certificates, but, meh.
$ keytool -genkey -alias e3 -keypass 123456 -validity 365
Enter keystore password: 123456
What is your first and last name?
[Unknown]: emarsys Developer
What is the name of your organizational unit?
[Unknown]: Development
What is the name of your organization?
[Unknown]: emarsys eMarketing Systems AG
What is the name of your City or Locality?
[Unknown]: Vienna
What is the name of your State or Province?
[Unknown]:
What is the two-letter country code for this unit?
[Unknown]: AT
Is CN=emarsys Developer, OU=Development, O=emarsys eMarketing Systems AG, L=Vienna, ST=Unknown, C=AT correct?
[no]: yes
PostgreSQL 8.1
--------------
1. Download the source of PostgreSQL 8.1 and extract in a temp directory
1. You need a full Perl installation, including the libperl library and the header files.
For Debian:
$ sudo apt-get install perl libperl-dev
For Leopard
$ sudo port install perl5.8 +shared
$ sudo port activate perl5.8 @5.8.9_3+shared
* Check if the module URI::Escape is installed
$ perl -MURI::Escape -e 1
if not
$ sudo port install p5-uri-fetch (or you can find the same module in /System/Library/Perl/Extras/5.8.8/URI/Escape.pm)
1. Compile and install to ~/Applications/postgresql/postgresql-8.1.`<NUM>`
./configure --prefix=$HOME/Applications/postgresql/postgresql-8.1.<NUM>/ --enable-depend --with-perl
make
make install
1. Create a symlink to ~/Applications/postgresql/current
$ ln -s ~/Applications/postgresql/postgresql-8.1.<NUM> ~/Applications/postgresql/current
1. Create data directory ~/Applications/postgresql/data
1. Run __initdb__ on ~/Applications/postgresql/data
$ initdb -E UTF8 -D ~/Applications/postgresql/data
1. Create the script ~/bin/start_postgresql.sh
postmaster -D $HOME/Applications/postgresql/data&
1. Allow TCP/IP connections
$ vim ~/Applications/postgresql/data/postgresql.conf
Find the configuration line that reads:
#listen_addresses='localhost'
Change it to:
listen_addresses='*'
$ vim ~/Applications/postgresql/data/pg_hba.conf
Insert the following line at the end of the file:
host all all 192.168.0.0/24 trust
(TODO check if the first option is necessary)
1. Enable languages globally
$ createlang plperl template1
$ createlang plpgsql template1
$ createlang plperlu template1
$ createuser emarsys (superuser)
####Testing our Installation
1. Create a user for your account
2. Create a DB with the same name
3. Create the file $HOME/.emarsys/DatabaseConfig.xml
<config>
<dbAdminURI>jdbc:postgresql://localhost:5432/pinco?user=pinco&amp;password=pinco</dbAdminURI>
<dbUnitTestURI>jdbc:postgresql://localhost:5432/test?user=pinco&amp;password=pinco</dbUnitTestURI>
</config>
4. Run the test EDREI/common/deebee_test/test.sh (if everything works, you should find a DB named test with a relation insert_query_update)
Dropbox
-------
For Mac OS X, setting up Dropbox was as simple as logging in to the account. For
Windows, it shouldn't require much more.
####Dropbox on Debian Lenny for Gnome
Unfortunately it cannot be installed on Etch because of a library version problem.
1. Add to __/etc/apt/sources.list__ the line
deb http://www.getdropbox.com/static/ubuntu gutsy main
2. In __/etc/apt/preferences__ set some basic package pinning so that these packages don't collide with the existing Debian repository (not likely, but you never know)
Package: *
Pin: release a=gutsy
Pin-Priority: 400
3. apt-get update
4. apt-get install nautilus-dropbox
####Dropbox without Gnome
1. Download the closed source Dropbox Linux client from http://www.getdropbox.com/download?plat=lnx.x86 (x86_64 for 64 bit)
1. Extract the contents; you should get a .dropbox-dist folder out of the archive.
1. Move the folder to $HOME
1. Run ~/.dropbox-dist/dropboxd
1. Ensure that the daemon runs whenever you use your computer
Db configuration
------------------
1. Create the file $HOME/.emarsys/Upgrader.xml (For more information have a look at EDREI/admin/upgrader/readme.txt)
<upgrader>
<jdbc>
<id>basetta_e3db</id>
<url>jdbc:postgresql://localhost:5432/e3db?user=basetta&amp;password=basetta</url>
</jdbc>
<jdbc>
<id>basetta_e3local</id>
<url>jdbc:postgresql://localhost:5432/e3local?user=basetta&amp;password=basetta</url>
</jdbc>
<jdbc>
<id>basetta_e3mail</id>
<url>jdbc:postgresql://localhost:5432/e3mail?user=basetta&amp;password=basetta</url>
</jdbc>
<aliases>
<alias>
<name>e3db</name>
<id>basetta_e3db</id>
</alias>
<alias>
<name>e3local</name>
<id>basetta_e3local</id>
</alias>
<alias>
<name>e3mail</name>
<id>basetta_e3mail</id>
</alias>
</aliases>
</upgrader>
1. Create the ROLE emarsys
$ psql template1
template1=# CREATE ROLE emarsys WITH LOGIN PASSWORD '<password>';
template1=# SELECT rolname FROM pg_roles;
e3db Configuration
---------
1. Create the database e3db
$ createdb e3db
2. Launch the Upgrader
$ cd $DEV_HOME/EDREI/admin/upgrader
$ java -cp target/upgrader.jar com.emarsys.e3.upgrader.Main e3db
## e3local Configuration ##
JBoss 4.0.4.GA
--------------
1. Create JBoss directory
$ mkdir ~/Applications/JBoss
1. Extract ~/Dropbox/JBoss-4.0.4.GA.tar.bz2 into ~/Applications/JBoss
2. Set up symlinks to the current JBoss version
$ ln -sf $HOME/Applications/JBoss/JBoss-4.0.4.GA $HOME/Applications/JBoss/current
$ ln -sf $HOME/Applications/JBoss/current /opt/JBoss
1. Create directory for configuration files
$ mkdir ~/Applications/JBoss/app
$ mkdir ~/Applications/JBoss/app/<VERSION_APP>
$ mkdir ~/Applications/JBoss/app/<VERSION_APP>/app
$ mkdir ~/Applications/JBoss/app/<VERSION_APP>/broadcasting
$ mkdir ~/Applications/JBoss/app/<VERSION_APP>/mail
$ mkdir ~/Applications/JBoss/app/<VERSION_APP>/common
$ ln -sf ~/Applications/JBoss/app/<VERSION_APP> ~/Applications/JBoss/current_app
3. Create symlinks for imports and exports
$ ln -sf /home/emarsys/import /opt/JBoss/import
$ ln -sf /home/emarsys/export /opt/JBoss/export
## Jboss MultiInstance Configuration ##
* Create three dummy interfaces
* Debian
$ sudo /sbin/ifconfig eth0:1 <IP_1> netmask 255.255.255.0
$ sudo /sbin/ifconfig eth0:2 <IP_2> netmask 255.255.255.0
$ sudo /sbin/ifconfig eth0:3 <IP_3> netmask 255.255.255.0
* You can configure the additional IP addresses automatically at boot with another iface statement in /etc/network/interfaces:
$ sudo vi /etc/network/interfaces
auto eth0:1
iface eth0:1 inet static
address <IP_1>
netmask 255.255.255.0
broadcast 192.168.0.255
(Because of bug https://bugs.launchpad.net/debian/+source/ifupdown/+bug/114457, run ifup --all.)
Virtual interfaces cannot be brought up automatically, so let's use the script.
* MacOs
$ sudo /sbin/ifconfig en0 alias <IP_1> netmask 255.255.255.255
$ sudo /sbin/ifconfig en0 alias <IP_2> netmask 255.255.255.255
$ sudo /sbin/ifconfig en0 alias <IP_3> netmask 255.255.255.255
* Edit the file **/etc/hosts**
<IP_1> app.<HOST_NAME>.emarsys.int
<IP_2> broadcasting.<HOST_NAME>.emarsys.int
<IP_3> mail.<HOST_NAME>.emarsys.int
* Create a different server configuration directory for each instance of JBoss AS (remember log4j.xml, e3.properties, e3send.properties, emarsys3-ds.xml, e3mail-ds.xml)
$ $JBOSS_HOME
- server
- default
- app
- broadcasting
- mail
* Create symbolic links to the EAR, datasource and properties files
$ ln -sf $HOME/Applications/JBoss/current_app/common/log4j.xml $JBOSS_HOME/server/app/conf/log4j.xml
$ ln -sf $HOME/Applications/JBoss/current_app/app/e3.properties $JBOSS_HOME/server/app/conf/e3.properties
$ ln -sf $HOME/Applications/JBoss/current_app/app/E3-web.ear $JBOSS_HOME/server/app/deploy/E3-web.ear
$ ln -sf $HOME/Applications/JBoss/current_app/app/emarsys3-ds.xml $JBOSS_HOME/server/app/deploy/emarsys3-ds.xml
$ ln -sf $HOME/Applications/JBoss/current_app/common/log4j.xml $JBOSS_HOME/server/broadcasting/conf/log4j.xml
$ ln -sf $HOME/Applications/JBoss/current_app/broadcasting/e3.properties $JBOSS_HOME/server/broadcasting/conf/e3.properties
$ ln -sf $HOME/Applications/JBoss/current_app/broadcasting/E3-web.ear $JBOSS_HOME/server/broadcasting/deploy/E3-web.ear
$ ln -sf $HOME/Applications/JBoss/current_app/broadcasting/emarsys3-ds.xml $JBOSS_HOME/server/broadcasting/deploy/emarsys3-ds.xml
$ ln -sf $HOME/Applications/JBoss/current_app/common/log4j.xml $JBOSS_HOME/server/mail/conf/log4j.xml
$ ln -sf $HOME/Applications/JBoss/current_app/mail/e3.properties $JBOSS_HOME/server/mail/conf/e3.properties
$ ln -sf $HOME/Applications/JBoss/current_app/mail/e3send.properties $JBOSS_HOME/server/mail/conf/e3send.properties
$ ln -sf $HOME/Applications/JBoss/current_app/mail/e3send.ear $JBOSS_HOME/server/mail/deploy/e3send.ear
$ ln -sf $HOME/Applications/JBoss/current_app/mail/e3mail-ds.xml $JBOSS_HOME/server/mail/deploy/e3mail-ds.xml
* Set the correct connection-url, user-name and password in the datasource file **emarsys3-ds.xml** (or **e3mail-ds.xml**)
* Configure the **e3.properties** and **e3send.properties** files
* General Configuration
* remote.jndi.server
* jndi.db
* common.dbpwd
* Import file
* import.uploadHost
* import.testUser
* import.testPwd
* web.url
* Launch all the instances (not at the same time: launch one and go for a coffee before launching the next)
$ sh $JBOSS_HOME/bin/run.sh -c app -b <IP_1> -Djboss.messagingServerPeerID=1
$ sh $JBOSS_HOME/bin/run.sh -c broadcasting -b <IP_2> -Djboss.messagingServerPeerID=2
$ sh $JBOSS_HOME/bin/run.sh -c mail -b <IP_3> -Djboss.messagingServerPeerID=3
* Check this script http://www.jboss.org/community/docs/DOC-12305#comment-1106
## Apache Configuration Debian ##
### Activate modules: proxy, rewrite
$ cd /etc/apache2/mods-enabled
$ sudo ln -snf ../mods-available/proxy*
$ sudo ln -snf ../mods-available/rewrite*
### Create a VirtualHost for each JBoss instance (example for the app)
This goes in some place like `/etc/apache2/sites-available/e3.conf`, which is then symlinked to the
`/etc/apache2/sites-enabled` directory.
<VirtualHost *:80>
ServerAdmin webmaster@localhost
ServerName app.basiglio.emarsys.int
<Directory />
Options FollowSymLinks
AllowOverride None
</Directory>
<Proxy *>
Order allow,deny
Allow from all
</Proxy>
# Possible values include: debug, info, notice, warn, error, crit,
# alert, emerg.
ErrorLog /var/log/apache2/app_error.log
LogLevel warn
CustomLog /var/log/apache2/access.log combined
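ProxyPass / http://app.basiglio.emarsys.int:8080/
ProxyPassReverse / http://app.basiglio.emarsys.int:8080/
# Alternative: proxy over AJP (needs mod_proxy_ajp). The first matching ProxyPass wins, so enable only one of the two.
#ProxyPass / ajp://app.basiglio.emarsys.int:8009/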
</VirtualHost>
### Extra Proxies And Rewrite Rule For The Web Machine (app)
Insert this into the virtual host for the `app` instance:
# Batch Mailing API
ProxyPass /bmapi http://localhost:9090
ProxyPassReverse /bmapi http://localhost:9090
# MCAPI
ProxyPass /mcapi http://app.caparezza.emarsys.int:9091
ProxyPassReverse /mcapi http://app.caparezza.emarsys.int:9091
# MailHoney
ProxyPass /mh http://caparezza.emarsys.int:9092
ProxyPassReverse /mh http://caparezza.emarsys.int:9092
RewriteEngine On
# mail open
RewriteRule ^/img/([0-9a-fI]+)\.gif$ /op/t.do?event=open&i=$1 [R,L]
1. Activate the e3.conf
$ cd /etc/apache2/sites-enabled
$ sudo ln -sf ../sites-available/e3.conf
1. Restart Apache
$ sudo /etc/init.d/apache2 restart
Apache Configuration MacOs
----------------
1. Uncomment the line in /etc/apache2/httpd.conf (line 465)
Include /private/etc/apache2/extra/httpd-vhosts.conf
1. Edit the file **/etc/apache2/extra/httpd-vhosts.conf**
Use the same VirtualHost configuration as in the Debian section
1. Restart apache
$ sudo apachectl restart
CHECKDATA.PL SCRIPT
-----------------------------------
1. In order to use the checkdata.pl script
$ mkdir /home/emarsys/tools/checkdata
$ ln -sf <DEV_HOME>/EDREI/broadcasting/checkdata/checkdata.pl /home/emarsys/tools/checkdata/checkdata.pl
$ ln -sf /home/emarsys/tools/checkdata /home/emarsys/cleaner
(path hardcoded in FileCopyTask.java and CsvRecipientSource.java)
1. Install the necessary perl library (Text::CSV_XS)(verify for macos)
Debian:
$ sudo apt-get install libtext-csv-xs-perl
MacOs
$ sudo rm /usr/bin/perl
$ sudo ln -s /opt/local/bin/perl /usr/bin/perl
$ sudo -H cpan -i Text::Iconv
$ sudo -H cpan -i Text::CSV_XS
SUMMARIZER
-----------------------------------
1. Create a directory for the summarizer
$ mkdir $HOME/tools/summarizer
1. Tell the dynamic linker where to find the PostgreSQL library (libpq.so.4)
$ echo "$HOME/Applications/postgresql/current/lib" | sudo tee -a /etc/ld.so.conf.d/libc.conf
$ sudo ldconfig
1. Copy the source into a temp directory and compile it (remember to change the variable PGHOME in the Makefile)
1. Copy the target file to $HOME/tools/summarizer
1. Set the property **nbr_of_default_summarizer** of the broadcasting instance (ask Guy; the default should be 2)
1. Create n scripts
$ vim $HOME/tools/summarizer/start_[1..nbr_of_default_summarizer].sh
#!/bin/sh
# $1: database connect string
# $2: id of subtable
# $3: sync interval in seconds
# $4: loglevel
./summarizer 'dbname=e3db user=emarsys password=emarsys host=localhost' [1..nbr_of_default_summarizer] 25 2
$ vim $HOME/tools/summarizer/start_queued.sh (same script but with table_id 160301 and loglevel 3)
$ vim $HOME/tools/summarizer/start_top_queue.sh (same script but with table_id 0)
1. e3db Configuration (to include in the upgrader)
CREATE AGGREGATE sum_uniq (
BASETYPE = text,
SFUNC = sum_uniq,
STYPE = int8,
INITCOND = 0);
## EXPORTER CONSIDERATION ##
Remember that the id field should be set properly with the id in t_field_definition
(think about it) (sync_data.scala ?? )
## CHIME EXPORTER ##
1. Create a bunch of directories (have fun)
$ mkdir -p /home/emarsys/tools/exporters/chime_exporter
$ mkdir -p /home/emarsys/tools/exporters/chime_exporter/conf
$ mkdir -p /home/emarsys/tools/exporters/chime_exporter/log
$ mkdir -p /home/emarsys/tools/exporters/chime_exporter/META-INF
$ mkdir -p /home/emarsys/tools/exporters/chime_exporter/lib
$ cd /home/emarsys/tools/exporters/chime_exporter
1. create the following symbolic links
$ ln -sf $DEV_HOME/export/chime_exporter/target/e3-export-chime_exporter-<VERSION>-jar-with-dependencies.jar e3-export-chime_exporter.jar
$ ln -sf $HOME/Dropbox/Library/Java/bcpg-139-jdk15.jar lib/bcpg-139-jdk15.jar
$ ln -sf $HOME/Dropbox/Library/Java/bcprov-139-jdk15.jar lib/bcprov-139-jdk15.jar
1. Copy the script file chime_export.sh
$ cp ~/Dropbox/ConfigurationFiles/exporters/chime_exporter/chime_export.sh .
1. Copy the log4j.xml
$ cp $HOME/Dropbox/ConfigurationFiles/exporters/common/log4j.xml .
1. Copy persistence.xml and set accordingly
$ cp $HOME/Dropbox/ConfigurationFiles/exporters/common/persistence.xml ./META-INF
1. Copy emarsys3-ds.xml
$ cp /opt/JBoss/server/broadcasting/deploy/emarsys3-ds.xml .
1. Copy e3.properties and adjust it for your system
$ cp $HOME/Dropbox/ConfigurationFiles/exporters/chime_exporter/e3.properties ./conf
1. Create a symlink for the export keys
$ ln -sf $HOME/Dropbox/ConfigurationFiles/exporters/chime_exporter/export_keys export_keys
1. Launch the chime_export
$ ./chime_export.sh <ID_ACCOUNT>
1. (To discuss: how do we avoid the password prompt, ssh-agent or known_hosts? What do you prefer? :)
## BASIC EXPORTER ##
For the basic exporter it is not necessary to set any transfer properties, since it is merely an rsync from one /mnt to another /mnt.
1. Create the directories
$ mkdir -p /home/emarsys/tools/exporters/basic_exporter
$ mkdir -p /home/emarsys/tools/exporters/basic_exporter/conf
$ mkdir -p /home/emarsys/tools/exporters/basic_exporter/META-INF
$ mkdir -p /home/emarsys/tools/exporters/basic_exporter/log
$ cd /home/emarsys/tools/exporters/basic_exporter
1. Create a symbolic link
$ ln -sf $DEV_HOME/export/basic_exporter/target/e3-export-basic_exporter-<VERSION>-jar-with-dependencies.jar e3-export-basic_exporter.jar
1. Copy the script file basic_export.sh
$ cp ~/Dropbox/ConfigurationFiles/exporters/basic_exporter/basic_export.sh .
1. Copy the log4j.xml
$ cp $HOME/Dropbox/ConfigurationFiles/exporters/common/log4j.xml .
1. Copy persistence.xml and set accordingly
$ cp $HOME/Dropbox/ConfigurationFiles/exporters/common/persistence.xml ./META-INF
1. Copy e3.properties and adjust it for your system
$ cp $HOME/Dropbox/ConfigurationFiles/exporters/basic_exporter/e3.properties ./conf
1. Launch the basic_export (the export type is a bit mask: 3 = 0011 EXPORT_UNSUBSCRIBE, 15 = 1111 ALL)
$ ./basic_export.sh 1000 3 <YYYY-MM-DD>
$ ./basic_export.sh 1000 15 <YYYY-MM-DD>
There are several classes I do not intend for clients to make use of. Encapsulate them!
for instance:
import com.tristanhunt.knockoff.DefaultDiscounter._

val tabsp = """<table><tr><td>
1

1
</td><td></td></tr></table>"""
println(knockoff(tabsp))
results in
ListBuffer(Paragraph(List(HTMLSpan(<table>), HTMLSpan(<tr>), HTMLSpan(<td>), Text(
1
)),1.1), Paragraph(List(Text(1
</td>), HTMLSpan(<td></td>), Text(</tr></table>)),4.1))
Note that the closing /tr and /table are rendered as text.
This does not happen if the 1s are not separated by an empty line...
It looks like a regular expression might be off.
When I used the following source, the links were off:
# Test #
[Link][] leads to
[another][] link.
[link]: http://example.com/link
[another]: http://another.com/another
But if I alter it slightly so that the first reference link names its ID explicitly, both links work.
# Test #
[Link][link] leads to
[another][] link.
[link]: http://example.com/link
[another]: http://another.com/another
Huh? Does case-insensitivity break all links?
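Markdown's syntax documentation says link definition names are not case sensitive, and an empty second bracket pair (`[Link][]`) means the link text itself is the ID. A minimal sketch of a lookup that honors both rules, assuming the `[id]: url` definitions have already been collected as pairs; `RefLinks`, `normalizeId`, `table` and `resolve` are hypothetical names, not knockoff's API.

```scala
// Hypothetical helper, not knockoff's API.
object RefLinks {
  // IDs compare case-insensitively; runs of whitespace are collapsed so that
  // "[Some  Link]" and "[some link]" map to the same key.
  def normalizeId(id: String): String =
    id.trim.replaceAll("\\s+", " ").toLowerCase

  // Build the lookup table from (id, url) pairs taken from "[id]: url" lines.
  def table(defs: Seq[(String, String)]): Map[String, String] =
    defs.map { case (id, url) => normalizeId(id) -> url }.toMap

  // "[text][]" uses the link text as the ID; "[text][id]" uses id.
  def resolve(text: String, id: String, refs: Map[String, String]): Option[String] = {
    val key = if (id.trim.isEmpty) text else id
    refs.get(normalizeId(key))
  }
}
```

With this lookup, `[Link][]` resolves to http://example.com/link in both versions of the test source.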
The following test case causes a stack overflow.
knockoff(List.fill(5000)("a").mkString(""))
could you do this?
When I convert a Markdown document such as the following to HTML:
- item1
- item2
I expected the output shown in (A), but the actual output is shown in (B).
(A)
<li>item1</li><li>item2</li>
(B)
<li>item1
</li><li>item2
</li>
Is this the intended behavior? I'd like to send a pull request that removes these line breaks. Is that OK?
Kind of annoying that
# This #
captures the title as ' This ' and not 'This'.
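A sketch of ATX header parsing that trims the title, modeled on the header pattern in John Gruber's Markdown.pl: leading `#`s give the level, while optional closing `#`s and surrounding blanks are dropped. `AtxHeader.parse` is a hypothetical helper, not knockoff's code.

```scala
// Hypothetical helper, not knockoff's parser.
object AtxHeader {
  // 1-6 leading '#', blanks, the title (non-greedy), blanks, optional
  // closing '#'s, blanks. Scala regex extractors must match the whole line.
  private val Atx = """(#{1,6})[ \t]*(.+?)[ \t]*#*[ \t]*""".r

  // Some((level, title)) with the title trimmed, or None for non-headers.
  def parse(line: String): Option[(Int, String)] = line match {
    case Atx(hashes, title) => Some((hashes.length, title))
    case _                  => None
  }
}
```

For `# This #` this returns level 1 with the title `This`.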
OK I had an embedded list, and this seemed to cause a parsing error:
* This is a long line that wrapped
**bold**
And then I noticed that if you want a code block that trails a list, things are not so happy:
1. List item
code block
That should actually be a code block within a complex list.
I likely need some special blockquote handling, along the lines of what I've done with the lists. But hopefully without all the crapola.
I'm using Scala 2.10.2, hence version 0.8.1.
Calling the knockoff method keeps throwing this:
java.lang.StackOverflowError: null
at java.util.regex.Pattern.group0(Pattern.java:2513) ~[na:1.6.0_26]
at java.util.regex.Pattern.sequence(Pattern.java:1806) ~[na:1.6.0_26]
at java.util.regex.Pattern.expr(Pattern.java:1752) ~[na:1.6.0_26]
at java.util.regex.Pattern.compile(Pattern.java:1460) ~[na:1.6.0_26]
at java.util.regex.Pattern.<init>(Pattern.java:1133) ~[na:1.6.0_26]
at java.util.regex.Pattern.compile(Pattern.java:823) ~[na:1.6.0_26]
at scala.util.matching.Regex.<init>(Regex.scala:153) ~[scala-library-2.10.2.jar:na]
at scala.collection.immutable.StringLike$class.r(StringLike.scala:224) ~[scala-library-2.10.2.jar:na]
at scala.collection.immutable.StringOps.r(StringOps.scala:31) ~[scala-library-2.10.2.jar:na]
at scala.collection.immutable.StringLike$class.r(StringLike.scala:213) ~[scala-library-2.10.2.jar:na]
at scala.collection.immutable.StringOps.r(StringOps.scala:31) ~[scala-library-2.10.2.jar:na]
at com.tristanhunt.knockoff.ChunkParser$$anon$1.findEnd(MarkdownParsing.scala:261) ~[knockoff_2.10-0.8.1.jar:0.8.1]
at com.tristanhunt.knockoff.ChunkParser$$anon$1.findEnd(MarkdownParsing.scala:270) ~[knockoff_2.10-0.8.1.jar:0.8.1]
at com.tristanhunt.knockoff.ChunkParser$$anon$1.findEnd(MarkdownParsing.scala:270) ~[knockoff_2.10-0.8.1.jar:0.8.1]
at com.tristanhunt.knockoff.ChunkParser$$anon$1.findEnd(MarkdownParsing.scala:270) ~[knockoff_2.10-0.8.1.jar:0.8.1]
at com.tristanhunt.knockoff.ChunkParser$$anon$1.findEnd(MarkdownParsing.scala:270) ~[knockoff_2.10-0.8.1.jar:0.8.1]
at com.tristanhunt.knockoff.ChunkParser$$anon$1.findEnd(MarkdownParsing.scala:270) ~[knockoff_2.10-0.8.1.jar:0.8.1]
at com.tristanhunt.knockoff.ChunkParser$$anon$1.findEnd(MarkdownParsing.scala:270) ~[knockoff_2.10-0.8.1.jar:0.8.1]
at com.tristanhunt.knockoff.ChunkParser$$anon$1.findEnd(MarkdownParsing.scala:270) ~[knockoff_2.10-0.8.1.jar:0.8.1]
at com.tristanhunt.knockoff.ChunkParser$$anon$1.findEnd(MarkdownParsing.scala:270) ~[knockoff_2.10-0.8.1.jar:0.8.1]
at com.tristanhunt.knockoff.ChunkParser$$anon$1.findEnd(MarkdownParsing.scala:270) ~[knockoff_2.10-0.8.1.jar:0.8.1]
Any idea?
Oh and just to add more info, using the version for Scala 2.9 works just fine.
UPDATE 08/29/13 10:06 : will try increasing stack size and cross my fingers
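The repeated `findEnd` frames at MarkdownParsing.scala:270 suggest a recursion whose depth grows with the input, and each frame also compiles a regex through `.r` (line 261). Raising the thread stack size (the JVM's `-Xss` option) only postpones the failure for bigger documents. A sketch of the usual fix, assuming `findEnd` scans forward line by line until a blank line: compile the pattern once and make the recursion tail-recursive. `ChunkScan` and this `findEnd` signature are hypothetical, not knockoff's code.

```scala
import scala.annotation.tailrec

// Hypothetical sketch, not knockoff's ChunkParser.
object ChunkScan {
  // Compiled once, instead of calling `.r` on every step.
  private val Blank = """[ \t]*""".r.pattern

  // Index of the first blank line at or after `from`, or lines.length if
  // there is none. @tailrec makes the compiler reject the method unless the
  // recursive call is in tail position, so it runs as a loop and long inputs
  // cannot exhaust the stack.
  def findEnd(lines: IndexedSeq[String], from: Int): Int = {
    @tailrec
    def loop(i: Int): Int =
      if (i >= lines.length) lines.length
      else if (Blank.matcher(lines(i)).matches()) i
      else loop(i + 1)
    loop(from)
  }
}
```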
Need to investigate how this happened, but having a header at the end of the document errored out:
Blah blah blah
## Foo ##
Not sure why
Knockoff doesn't handle line endings well: it only expects the \n delimiter and will fail if it sees a \r\n line ending (or a bare \r).
Here is a script that reveals the problem:
scala> import com.tristanhunt.knockoff.DefaultDiscounter._
import com.tristanhunt.knockoff.DefaultDiscounter._
scala> knockoff("\n") // normal case
res21: Seq[com.tristanhunt.knockoff.Block] = ListBuffer()
scala> knockoff("\r\n") // abnormal case
' foundmatching regex `[\t ]*\n' expected but `
next == reader : false
' foundmatching regex `[\t ]*\n' expected but `
next == reader : false
' foundmatching regex `[\t ]*\n' expected but `
next == reader : false
' foundmatching regex `[\t ]*\n' expected but `
next == reader : false
' foundmatching regex `[\t ]*\n' expected but `
next == reader : false
' foundmatching regex `[\t ]*\n' expected but `
next == reader : false
' foundmatching regex `[\t ]*\n' expected but `
[...]
' foundmatching regex `[\t ]*\n' expected but `
next == reader : false
java.lang.StackOverflowError
at java.util.regex.Pattern.atom(Pattern.java:1952)
at java.util.regex.Pattern.sequence(Pattern.java:1834)
at java.util.regex.Pattern.expr(Pattern.java:1752)
at java.util.regex.Pattern.group0(Pattern.java:2530)
at java.util.regex.Pattern.sequence(Pattern.java:1806)
at java.util.regex.Pattern.expr(Pattern.java:1752)
at java.util.regex.Pattern.compile(Pattern.java:1460)
at java.util.regex.Pattern.<init>(Pattern.java:1133)
at java.util.regex.Pattern.compile(Pattern.java:823)
at scala.util.matching.Regex.<init>(Regex.scala:41)
at scala.collection.immutable.StringLike$class.r(StringLike.scala:202)
at scala.collection.immutable.StringOps.r(StringOps.scala:31)
at com.tristanhunt.knockoff.ChunkParser.bulletLead(MarkdownParsing.scala:69)
at com.tristanhu...
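Until the parser accepts other line endings itself, one workaround is to normalize the input before calling `knockoff`. A minimal sketch; `Newlines.normalize` is a hypothetical helper, not part of knockoff.

```scala
// Hypothetical helper, not part of knockoff.
object Newlines {
  // Turn Windows ("\r\n") and old Mac ("\r") line endings into "\n".
  // "\r\n" is replaced first so it does not turn into "\n\n".
  def normalize(s: String): String =
    s.replace("\r\n", "\n").replace("\r", "\n")
}
```

`knockoff(Newlines.normalize("\r\n"))` then receives exactly the `"\n"` input of the normal case above.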
I'm seeing things like this causing problems.
A code block
(this is an empty line, but spaced like a code block)
A normal block
Hi,
thanks for a great library!
I stumbled upon an issue in blockquote parsing, which looks like a parser error.
The following example works just fine:
> # Hi
> * One
> * Two
> * Three
But this won't work as expected:
> Hi
> # Hi
> * One
> * Two
> * Three
and produces the following HTML:
<blockquote><p>Hi
# Hi
<em> One
</em> Two
* Three</p></blockquote>
Here is the snippet to reproduce this:
import com.tristanhunt.knockoff.DefaultDiscounter._
println(toXHTML(knockoff("> # Hi\n> * One\n> * Two\n> * Three")))
println(toXHTML(knockoff("> Hi\n> # Hi\n> * One\n> * Two\n> * Three")))
Best regards,
Matthias
If you don't have a blank line after a header, you'll get a matching error.
For example:
Header
----------
Body
This breaks. What I want is a warning in these cases, and for the Body to be treated as the start of the next paragraph.
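A sketch of the requested behavior, assuming a line-based pass: a setext header always closes its block, so a line right under the underline starts a new paragraph and adds a warning instead of causing a matching error. The underline pattern also tolerates trailing spaces. `Setext`, `Header`, `Paragraph` and `parse` are hypothetical names; knockoff's own block types differ.

```scala
import scala.annotation.tailrec

// Hypothetical sketch, not knockoff's parser.
object Setext {
  sealed trait Block
  final case class Header(level: Int, text: String) extends Block
  final case class Paragraph(text: String) extends Block

  // A run of '=' (level 1) or '-' (level 2), optionally followed by blanks.
  private val Underline = """(=+|-+)[ \t]*""".r

  // Groups lines into blocks and collects warnings.
  def parse(lines: List[String]): (List[Block], List[String]) = {
    @tailrec
    def loop(rest: List[String], para: Vector[String],
             blocks: Vector[Block], warns: Vector[String]): (List[Block], List[String]) = {
      // Blocks so far, plus the pending paragraph if there is one.
      def flushed: Vector[Block] =
        if (para.isEmpty) blocks else blocks :+ Paragraph(para.mkString("\n"))

      rest match {
        case Nil =>
          (flushed.toList, warns.toList)
        case text :: Underline(marks) :: tail if text.trim.nonEmpty =>
          val level = if (marks.head == '=') 1 else 2
          val newWarns =
            if (tail.headOption.exists(_.trim.nonEmpty))
              warns :+ s"no blank line after header '${text.trim}'"
            else warns
          loop(tail, Vector.empty, flushed :+ Header(level, text.trim), newWarns)
        case blank :: tail if blank.trim.isEmpty =>
          loop(tail, Vector.empty, flushed, warns)
        case line :: tail =>
          loop(tail, para :+ line, blocks, warns)
      }
    }
    loop(lines, Vector.empty, Vector.empty, Vector.empty)
  }
}
```

For the example above (`Header`, `----------`, `Body`) this yields a level-2 header, a paragraph `Body`, and one warning.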
This won't work: the code block is not recognized as a code block
1. some item
code line 1
code line 2
2. some item
If you use characters significant to Markdown in your LaTeX, you'll probably not get what you expect in the output. Considering that characters like _ or * are very useful in math declarations, well, you're probably not going to have much TeX actually pass through the system.
This should be fixed by the next version, where I'm also using SnuggleTeX to render MathML sequences.
In a source like `Text ! text [linktext](linkurl)`, the `text` part is skipped/removed and the whole thing is interpreted as `Text ![linktext](linkurl)`, i.e. as an image definition. Escaping the exclamation mark does not help.
Reproduction:
import com.tristanhunt.knockoff.DefaultDiscounter._
import com.tristanhunt.knockoff._
val source = """Text ! text [linktext](linkurl)"""
val parsed = knockoff(source)
println(source)
println(toXHTML(parsed))
// try with an escaped exclamation mark
val source2 = """Text \! text [linktext](linkurl)"""
val parsed2 = knockoff(source2)
println(source2)
println(toXHTML(parsed2))
gives:
Text ! text [linktext](linkurl)
<p>Text <img src="linkurl" alt="linktext"></img></p>
Text \! text [linktext](linkurl)
<p>Text <img src="linkurl" alt="linktext"></img></p>
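In Markdown an image starts with `![`, so the `!` has to sit immediately before the opening bracket; with other text in between, `[linktext](linkurl)` is an ordinary link and the `!` is literal. Markdown's syntax documentation also lists `!` among the characters a backslash can escape. A sketch of that check, assuming the parser knows the index of the `[` it is looking at; `ImageCheck.isImage` is a hypothetical helper.

```scala
// Hypothetical helper, not part of knockoff.
object ImageCheck {
  // True if the '[' at index `open` starts image syntax: the character right
  // before it must be '!', and that '!' must not be escaped as "\!".
  def isImage(s: String, open: Int): Boolean =
    open >= 1 && s.charAt(open - 1) == '!' &&
      !(open >= 2 && s.charAt(open - 2) == '\\')
}
```

For the source in this report the `[` is preceded by a space, so the check fails and the brackets should be treated as a plain link.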
While it probably doesn't help too much, the email addresses are not completely entitized by the Converter.
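Markdown.pl hides autolinked e-mail addresses from naive scrapers by randomly mixing decimal and hexadecimal character entities. A simpler, deterministic sketch that encodes every character as a decimal entity; `EmailEntities` is a hypothetical helper, not knockoff's Converter.

```scala
// Hypothetical helper, not knockoff's Converter.
object EmailEntities {
  // Encode every code point as a decimal entity, e.g. 'a' -> "&#97;".
  def encode(text: String): String =
    text.codePoints().toArray().map(cp => s"&#$cp;").mkString

  // A mailto link whose href and visible text are both fully encoded.
  def mailtoLink(address: String): String = {
    val href = encode("mailto:" + address)
    val label = encode(address)
    s"""<a href="$href">$label</a>"""
  }
}
```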
Welcome to Scala version 2.10.2 (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_51).
Type in expressions to have them evaluated.
Type :help for more information.
scala> import com.tristanhunt.knockoff.DefaultDiscounter._
import com.tristanhunt.knockoff.DefaultDiscounter._
scala> knockoff("""val turtlePosition = Lens.lensu[Turtle, Point] (
| (a, value) => a.copy(position = value),
| _.position)
| val pointX = Lens.lensu[Point, Double] (
| (a, value) => a.copy(x = value),
| _.x)""")
java.util.NoSuchElementException: None.get
at scala.None$.get(Option.scala:313)
at scala.None$.get(Option.scala:311)
at com.tristanhunt.knockoff.SpanConverter.findNormalMatch(MarkdownParsing.scala:810)
at com.tristanhunt.knockoff.SpanConverter$$anonfun$matchers$3.apply(MarkdownParsing.scala:660)
at com.tristanhunt.knockoff.SpanConverter$$anonfun$matchers$3.apply(MarkdownParsing.scala:660)
at com.tristanhunt.knockoff.SpanConverter$$anonfun$2.apply(MarkdownParsing.scala:642)
at com.tristanhunt.knockoff.SpanConverter$$anonfun$2.apply(MarkdownParsing.scala:641)
at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111)
at scala.collection.immutable.List.foldLeft(List.scala:84)
at scala.collection.TraversableOnce$class.$div$colon(TraversableOnce.scala:138)
at scala.collection.AbstractTraversable.$div$colon(Traversable.scala:105)
at com.tristanhunt.knockoff.SpanConverter.convert(MarkdownParsing.scala:641)
at com.tristanhunt.knockoff.SpanConverter$DelimMatcher.apply(MarkdownParsing.scala:616)
at com.tristanhunt.knockoff.SpanConverter$DelimMatcher.apply(MarkdownParsing.scala:606)
at com.tristanhunt.knockoff.SpanConverter$$anonfun$2.apply(MarkdownParsing.scala:642)
at com.tristanhunt.knockoff.SpanConverter$$anonfun$2.apply(MarkdownParsing.scala:641)
at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111)
at scala.collection.immutable.List.foldLeft(List.scala:84)
at scala.collection.TraversableOnce$class.$div$colon(TraversableOnce.scala:138)
at scala.collection.AbstractTraversable.$div$colon(Traversable.scala:105)
at com.tristanhunt.knockoff.SpanConverter.convert(MarkdownParsing.scala:641)
at com.tristanhunt.knockoff.SpanConverter.apply(MarkdownParsing.scala:630)
at com.tristanhunt.knockoff.Discounter$$anonfun$2.apply(Discounter.scala:38)
at com.tristanhunt.knockoff.Discounter$$anonfun$2.apply(Discounter.scala:37)
at scala.collection.immutable.Stream.map(Stream.scala:376)
at com.tristanhunt.knockoff.Discounter$class.knockoff(Discounter.scala:37)
at com.tristanhunt.knockoff.DefaultDiscounter$.knockoff(Discounter.scala:79)
As the comment says, it shouldn't fail:
/** Parses and returns our best guess at the sequence of blocks. It will
never fail, just log all suspicious things. */
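The trace ends in `None.get` inside `SpanConverter.findNormalMatch`, which is exactly what breaks the "never fail" promise in that comment. A sketch of the alternative, assuming a span matcher that looks for a closing delimiter: return an `Option` and fall back to plain text when no closer exists. `SafeSpan`, `Text`, `Emphasis` and `convert` are hypothetical; knockoff's actual span converter is structured differently.

```scala
// Hypothetical sketch, not knockoff's SpanConverter.
object SafeSpan {
  sealed trait Span
  final case class Text(s: String) extends Span
  final case class Emphasis(s: String) extends Span

  // Index of the closing delimiter, or None when the opener is unmatched.
  private def findClose(s: String, from: Int, delim: Char): Option[Int] =
    Some(s.indexOf(delim.toInt, from)).filter(_ >= 0)

  // Splits "_text_" emphasis out of `s`. An unmatched "_" stays literal text
  // instead of reaching a None.get.
  def convert(s: String): List[Span] = {
    val open = s.indexOf('_')
    val spans =
      if (open < 0) List(Text(s))
      else findClose(s, open + 1, '_') match {
        case Some(close) =>
          Text(s.substring(0, open)) :: Emphasis(s.substring(open + 1, close)) ::
            convert(s.substring(close + 1))
        case None => List(Text(s))
      }
    // Drop empty text pieces left over from splitting.
    spans.filter {
      case Text(t) => t.nonEmpty
      case _       => true
    }
  }
}
```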
FYI
This project was a very early stint of me learning how to use Scala. Thus, the parser itself is a series of bad ideas strung together with waaaaay too much complexity.
At some point, I'll totally rewrite the thing as just a combinator parser, but that will take some time. I can live with my hackfest -> just a few more bugs to quash.
It dawned on me that it would be far easier to build serious apps against this thing if you could take, say, a recognized block and get the line numbers and general positions for that block.
This is totally doable (I think) with the positioned decoration. This should give us the input start for any of our recognized tokens.
I'm not sure if this requires a full-rewrite or not.
Input:
1. This
2. That; and
3. the other
* not
* the
* end
and
- another
- style
and
+ finally
+ this
+ style
Converts to Block:
ListBuffer(OrderedList(List(OrderedItem(List(Paragraph(List(Text(This2. That; and3. the other), Emphasis(List(Text( not))), Text( the* endand- another- styleand+ finally+ this+ style)),1.1)),1.1))))
Converts to XHTML:
<ol>
<li>
This2. That; and3. the other
<em> not</em>
the* endand- another- styleand+ finally+ this+ style
</li>
</ol>
expected rendering:
and
and
Consider this Markdown text:
Here is a [link][] (cool!)
[link]: http://localhost/
With that input, John Gruber's Perl markdown script produces:
<p>Here is a <a href="http://localhost/">link</a> (Cool!).</p>
Knockoff produces:
<p>Here is a [link]<a href="Cool!" ></a>.
</p>
Remove the trailing parenthetical expression, and Knockoff produces a valid link:
Input
Here is a [link][].
[link]: http://localhost/
Output
<p>Here is a <a href="http://localhost/" >link</a>.
</p>
Error discovered in: knockoff_2.8.0.RC2-0.7.1-11.jar
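In Markdown.pl's link patterns, a reference link allows an optional space (or line break) between its two bracket pairs, but an inline link allows no space between `]` and `(`. So `[link][] (cool!)` can only be a reference link followed by plain text. A sketch that tries the reference form first and refuses a gap before `(`; `LinkForms`, `Reference`, `Inline` and `parseAt` are hypothetical names, and the patterns ignore titles and angle-bracketed URLs.

```scala
// Hypothetical sketch, not knockoff's span parser.
object LinkForms {
  sealed trait Link
  final case class Inline(text: String, url: String) extends Link
  final case class Reference(text: String, id: String) extends Link

  // "[text][id]" or "[text][]", with at most one space between the brackets.
  private val RefPat = """\[([^\]]+)\][ ]?\[([^\]]*)\]""".r
  // "[text](url)" with no space allowed between "]" and "(".
  private val InlinePat = """\[([^\]]+)\]\(([^)\s]+)\)""".r

  // Parses a link at the start of `s`, trying the reference form first.
  def parseAt(s: String): Option[Link] = {
    val ref: Option[Link] =
      RefPat.findPrefixMatchOf(s).map(m => Reference(m.group(1), m.group(2)))
    ref.orElse(InlinePat.findPrefixMatchOf(s).map(m => Inline(m.group(1), m.group(2))))
  }
}
```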
One option for inline HTML would be to attempt to convert it to XHTML, which might be favorable. This should be possible by adjusting the XHTMLWriter.spanToXHTML method, but I note that this is currently wrapped in a Group() as well as in the paragraphToXHTML method. Maybe return an Option[Node] and use a flatMap method?
It's really hard to figure out where you might have messed up the file, because the parsing error spits out some gibberish, but not where the error was. Dang.
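A parse error is far more useful when it reports a line and column. The combinator library's `scala.util.parsing.input.Position` already offers `line`, `column` and `longString`. The sketch below shows the same idea with the standard library only, assuming the parser knows the 0-based character offset where it failed. `ErrorLocation` is a hypothetical helper.

```scala
object ErrorLocation {
  /** Converts a 0-based character offset into a 1-based (line, column). */
  def lineAndColumn(input: String, offset: Int): (Int, Int) = {
    val clamped = math.min(math.max(offset, 0), input.length)
    val before = input.substring(0, clamped)
    val line = before.count(_ == '\n') + 1
    val column = clamped - before.lastIndexOf('\n') // lastIndexOf is -1 on line 1
    (line, column)
  }

  /** Builds a message showing the offending line with a caret under the column. */
  def describe(input: String, offset: Int, message: String): String = {
    val (line, column) = lineAndColumn(input, offset)
    val text = input.split("\n", -1)(line - 1)
    val caret = " " * (column - 1) + "^"
    s"$message at line $line, column $column\n$text\n$caret"
  }
}
```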
Currently, if you start a list item with asterisk-delimited emphasis, the processing gets mangled:
scala> toXHTML( knockoff("""* *What* is this""") )
res0: scala.xml.Node = <p><em> </em>What* is this</p>
There is a simple workaround:
scala> toXHTML( knockoff("""* _What_ is this""") )
res1: scala.xml.Node = <ul><li><em>What</em> is this</li></ul>
Standard Markdown:
$ echo "*Test*" | markdown
<p><em>Test</em></p>
Knockoff (built against GitHub source with Scala 2.8):
scala> import com.tristanhunt.knockoff.DefaultDiscounter._
import com.tristanhunt.knockoff.DefaultDiscounter._
scala> toXHTML(knockoff("*Test*")).toString
res0: String = <ul><li>Test*</li></ul>
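Both reports share a root cause: an asterisk is being taken as a list marker even when no whitespace follows it. The Daring Fireball syntax says list markers may be indented up to three spaces and must be followed by one or more spaces or a tab. Here is a minimal sketch of that check; `BulletCheck` is a hypothetical name. The extracted item text would then go through the span (emphasis) parser separately.

```scala
object BulletCheck {
  // A bullet marker is "*", "+" or "-", indented at most 3 spaces,
  // and must be followed by at least one space or tab.
  private val Bullet = """^ {0,3}[*+-][ \t]+(.*)$""".r

  /** Returns the item text if the line starts a bullet item, else None. */
  def bulletText(line: String): Option[String] = line match {
    case Bullet(rest) => Some(rest)
    case _            => None
  }
}
```

With this rule, "*Test*" is not a list item at all, and "* *What* is this" yields the item text "*What* is this", whose emphasis is handled later.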
So we parse a text file into a list of Block objects. This means client code does things like this to get all code blocks:
blocks.filter( _.isInstanceOf[ CodeBlock ] )
This kind of, well, sucks. If we have to test types and cast to do anything useful, something is probably wrong with the type hierarchy.
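Part of the pain is that `filter(_.isInstanceOf[CodeBlock])` still returns a sequence of `Block`, so a cast follows. With a sealed hierarchy, `collect` filters and narrows the type in one step, and pattern matches get exhaustiveness checking. The classes below are minimal hypothetical stand-ins, not knockoff's real `Block` types.

```scala
// Hypothetical, minimal stand-in for a block hierarchy (not knockoff's real classes).
sealed trait Block
final case class Paragraph(text: String) extends Block
final case class CodeBlock(code: String) extends Block
final case class Header(level: Int, text: String) extends Block

object BlockQueries {
  // `collect` filters and narrows the type at once: the result is Seq[CodeBlock].
  def codeBlocks(blocks: Seq[Block]): Seq[CodeBlock] =
    blocks.collect { case c: CodeBlock => c }

  // Matching on a sealed trait lets the compiler warn about missing cases.
  def describe(b: Block): String = b match {
    case Paragraph(t) => s"paragraph: $t"
    case CodeBlock(c) => s"code: $c"
    case Header(l, t) => s"h$l: $t"
  }
}
```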
Knockoff eats up the last closing parenthesis when the link URL itself contains parentheses:
object Test extends App {
import com.tristanhunt.knockoff.DefaultDiscounter._
println(toXHTML(knockoff("[wiki link](http://en.wikipedia.org/wiki/Bracket_(disambiguation))")).toString)
}
Output (note the missing closing parenthesis in the href):
<p><a href="http://en.wikipedia.org/wiki/Bracket_(disambiguation">wiki link</a>)</p>
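One fix is to count nesting while scanning the link destination, so that a ")" only ends the link when no "(" inside the URL is still open. This is the CommonMark rule, which allows balanced parentheses in a destination. The sketch assumes it receives the text right after "](". `LinkTarget` is a hypothetical name.

```scala
object LinkTarget {
  /** Given the text right after "](", returns the URL up to the ")" that
    * closes the link, counting nested parentheses inside the URL. */
  def destination(afterParen: String): Option[String] = {
    var depth = 0
    var i = 0
    while (i < afterParen.length) {
      afterParen.charAt(i) match {
        case '(' => depth += 1
        case ')' if depth == 0 => return Some(afterParen.substring(0, i))
        case ')' => depth -= 1
        case _ =>
      }
      i += 1
    }
    None // no closing parenthesis: not a complete inline link
  }
}
```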
Here is an example:
Title
-----
This is not recognized as a header because of a space in the underline.
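Assuming the space in question is trailing, Gruber's Markdown.pl appears to tolerate it: its setext pattern allows spaces or tabs after the run of "=" or "-". The sketch below applies that rule to a title line and the line under it. `SetextCheck` is a hypothetical name.

```scala
object SetextCheck {
  // Underline: one or more "=" or "-", optionally followed by spaces/tabs.
  private val Underline = """^(=+|-+)[ \t]*$""".r

  /** Returns Some(level) if `next` underlines `title` as a setext header. */
  def headerLevel(title: String, next: String): Option[Int] =
    if (title.trim.isEmpty) None
    else next match {
      case Underline(marks) => Some(if (marks.startsWith("=")) 1 else 2)
      case _                => None
    }
}
```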
The usage of buildr is pretty, well, bad. I went with buildr and testng because I knew it and wanted to get something done.
I should probably try out Simple Build Tool (sbt), and then convert the testing framework to ScalaTest (with specs!).
Take the following input:
This is a line ending with two blanks.
That should produce a hard "br" in Markdown, per Daring Fireball.
It doesn't.
Each of the three lines ends with two space characters. Per Daring Fireball, that's supposed to cause a hard break. Here's what the Perl Markdown script produces, given that input:
<p>This is a line ending with two blanks.<br />
That should produce a hard "br" in Markdown, per Daring Fireball.<br />
It doesn't.</p>
Here's what Knockoff produces:
<p>This is a line ending with two blanks.
</p><p>That should produce a hard "br" in Markdown, per Daring Fireball.
</p><p>It doesn't.</p>
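Per Daring Fireball, a `<br />` comes from ending a line with two or more spaces. The lines still belong to the same paragraph; only blank lines split paragraphs. The sketch below renders one already-split paragraph that way, dropping trailing spaces and adding a break after every line except the last. `HardBreaks` is a hypothetical name.

```scala
object HardBreaks {
  /** Renders one paragraph. A line (other than the last) that ends with two
    * or more spaces is followed by <br />; other line breaks stay plain newlines. */
  def paragraphHtml(lines: Seq[String]): String = {
    val rendered = lines.zipWithIndex.map { case (line, i) =>
      val trimmed = line.replaceAll(" +$", "")
      val hardBreak = i < lines.length - 1 && line.endsWith("  ")
      if (hardBreak) trimmed + "<br />" else trimmed
    }
    rendered.mkString("<p>", "\n", "</p>")
  }
}
```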
On the trunk version (0.7.3-SNAPSHOT), extending the Wholesaler trait produced the following compile errors in sbt:
illegal combination of modifiers: private and override for: method toMathML
When I created my own trait that combined the LatexXHTMLWriter with the Discounter, but removed the SCAML stuff, it worked.
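The error message means some member is declared both `private` and `override`, which Scala rejects. An overriding member cannot be private. The sketch below shows the legal form and the workaround of mixing in only the needed traits. All trait and object names here are hypothetical stand-ins, not knockoff's actual `LatexXHTMLWriter`, `Discounter` or SCAML code.

```scala
// Hypothetical stand-ins; not knockoff's real traits.
trait MathWriter {
  def toMathML(latex: String): String = s"<math>$latex</math>"
}

trait LatexWriter extends MathWriter {
  // Writing `private override def toMathML` here is a compile error:
  // "illegal combination of modifiers: private and override".
  // An overriding member cannot be private, so it is left public.
  override def toMathML(latex: String): String =
    s"""<math display="block">$latex</math>"""
}

trait SimpleDiscounter {
  def normalize(source: String): String = source.trim
}

// Mix in only the pieces that are needed, leaving any SCAML support out.
object MyWholesaler extends SimpleDiscounter with LatexWriter
```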
I had a basic document started and got a stack overflow error. It looked like this:
# Title #
Note that the second line has no spaces and the third line has four spaces (start a code block). This made things go kaboom.
Need to verify this is fixed in the new version:
Some paragraph text...
`code thingy` ...
// A code block
This wouldn't be seen as a paragraph followed by a code block.
OK, currently, if you do this
Code line one
Code line two
And the line between those two lines does not have four spaces, they will be parsed as two separate code blocks. This is lame.
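Both this report and the earlier stack overflow involve lines that contain only whitespace. As I understand Gruber's Markdown.pl, it first strips lines consisting only of spaces and tabs, and its code-block pattern lets blank lines sit between indented lines. The sketch below follows that reading: whitespace-only lines are blank, so they never start a code block and never split one. It assumes paragraphs are handled elsewhere. `CodeBlocks` is a hypothetical name.

```scala
object CodeBlocks {
  private def isBlank(line: String): Boolean = line.trim.isEmpty
  private def isIndented(line: String): Boolean =
    line.startsWith("    ") || line.startsWith("\t")
  private def unindent(line: String): String =
    if (line.startsWith("\t")) line.drop(1) else line.drop(4)

  /** Collects runs of indented lines into code blocks. Blank lines (including
    * whitespace-only ones) between indented lines stay inside the block. */
  def codeBlocks(lines: List[String]): List[List[String]] = {
    val blocks = scala.collection.mutable.ListBuffer.empty[List[String]]
    var current = Vector.empty[String]
    var pendingBlanks = 0 // blank lines seen since the last indented line

    def close(): Unit = {
      if (current.nonEmpty) blocks += current.toList
      current = Vector.empty
      pendingBlanks = 0
    }

    for (line <- lines) {
      if (isBlank(line)) {
        if (current.nonEmpty) pendingBlanks += 1
      } else if (isIndented(line)) {
        current = current ++ Vector.fill(pendingBlanks)("") :+ unindent(line)
        pendingBlanks = 0
      } else close()
    }
    close()
    blocks.toList
  }
}
```

In this sketch, a whitespace-only line right after a header produces no code block at all, and two indented lines separated by a blank line form a single block.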