
phantom-of-the-capitol's Introduction

Phantom of the Capitol

Phantom DC for Short

A RESTful API for retrieving the required fields for and filling out the contact forms of members of the US Congress.

Phantom DC has three major functions:

  • Looking up form fields provided by all members of congress
  • Using PhantomJS to fill in a congress member's form by proxy, so that users need not navigate directly to the congress member's web page
  • Returning any captcha images and forwarding the user-submitted solution to the .gov website


How to Use This API

Documentation is located here.

How to Contribute to This Project

Dev/Production Setup with Docker (Recommended)

Docker makes it easy to set up Phantom DC for development, production, and testing.

Here's an example which will get you a quick development instance:

$  cp docker-compose.yml.example docker-compose.yml
$  cp .env.example .env
$  sudo docker-compose up --build

Take a look at config/phantom-dc_config.rb.example to get an idea of what configuration options you can pass on to the phantom-dc docker instance using environment variables in .env. In most instances, you'll want to change the AWS config options.
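As a rough illustration of how values from .env might end up as configuration settings, here is a minimal sketch. The helper env_setting, the AWS_REGION name, and the space-separated format for CORS_ALLOWED_DOMAINS are assumptions made for the example; check config/phantom-dc_config.rb.example for the real option names and formats.

```ruby
# Hypothetical helper: read a setting from the environment, falling back to a
# default when the variable is unset or empty.
def env_setting(name, default = nil, env: ENV)
  value = env[name]
  value.nil? || value.empty? ? default : value
end

# A plain hash stands in for the process environment in this example.
sample_env = {
  "DEBUG_KEY" => "s3cret",
  "CORS_ALLOWED_DOMAINS" => "localhost:8000 example.org"
}

debug_key    = env_setting("DEBUG_KEY", "changeme", env: sample_env)
cors_domains = env_setting("CORS_ALLOWED_DOMAINS", "", env: sample_env).split
aws_region   = env_setting("AWS_REGION", "us-east-1", env: sample_env) # assumed name
```

Falling back to explicit defaults keeps a development instance bootable even when .env only sets a few values.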

If you're actively developing, you'll probably also want to share your host directories with the container by adding volumes to the app service in docker-compose.yml:

  app:
    ...
    volumes:
      - ./cwc:/opt/phantomdc/cwc
      - ./app:/opt/phantomdc/app
      - ./public:/opt/phantomdc/public
      - ./spec:/opt/phantomdc/spec
      - ./tasks:/opt/phantomdc/tasks
      - ./db/migrate:/opt/phantomdc/db/migrate
      - ./docker/app/entrypoint.sh:/opt/phantomdc/entrypoint.sh

To run the test suite using docker, run:

$    sudo docker-compose -f docker-compose.test.yml run test_app rspec spec

You may also want to run a cron daemon for your production setup which pulls the latest YAML files from contact-congress or your other data sources every so often. Only run this after giving time (~5min should do it) for the phantom-dc container to initially populate its members of congress upon the first run:

$  docker run -it --rm --name=phantom-dc-cron \
      -v /var/run/docker.sock:/var/run/docker.sock \
      -v $(pwd)/docker/cron/crontab:/etc/crontabs/root \
      docker crond -f

Development Environment Installation and Setup with Vagrant

Requirements

Using Debian or Ubuntu? Here's a one liner to save you time.

$  apt-get install vagrant virtualbox
  • An AWS account for storing captchas and debug screen shots.

  • A SmartyStreets account. An API key for SmartyStreets allows rake tasks to run.

Installation

On Host
$  # First, fork this repo on github.com so you can clone directly
   # from your own fork
   git clone git@github.com:<YOUR_ACCOUNT>/phantom-of-the-capitol.git &&
   cd phantom-of-the-capitol &&
   vagrant up

$  # Edit config (at minimum change DEBUG_KEY and AWS credentials)
   cp config/phantom-dc_config.rb.example config/phantom-dc_config.rb &&
   vi config/phantom-dc_config.rb
Within Vagrant VM
$  vagrant ssh

$  cd /vagrant;
   bundle exec rake ar:create;
   bundle exec rake ar:schema:load;
   rackup --host 0.0.0.0

Production Environment Installation and Setup

Requirements

On a Debian based system (we're testing against Ubuntu) download and install the latest phantomjs and then run the below apt-get command.

$  apt-get install imagemagick libmysql++-dev libpq-dev git libqt4-dev xvfb

Install ruby with rvm, then

$  gem install bundler;
   bundle install;

Create the mysql database:

$   cp config/database.rb.example config/database.rb;

    # fill in db info as with any rails app
    vi config/database.rb;

    # configure the app datafile
    cp config/phantom-dc_config.rb.example  config/phantom-dc_config.rb;
    bundle exec rake ar:create;
    bundle exec rake ar:schema:load

Populating the Database

Once you have Phantom DC running, you have to add DataSources. DataSources are git repositories containing a subdirectory filled with yml files which tell Phantom DC how to fill out forms. In most cases, you want the US Congress data source, which should be added via the below command:

$  ./phantom-dc datasource add --git-clone \
      https://github.com/unitedstates/contact-congress.git us_congress ./us_congress members/
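To make the DataSource layout concrete, here is a minimal sketch of walking a checked-out repository's subdirectory and parsing each YAML file. The helper load_member_files and the one-line sample file are hypothetical and only illustrate the directory shape; they are not Phantom DC's actual loader.

```ruby
require "yaml"
require "tmpdir"

# Hypothetical helper: map each YAML file in a data source's subdirectory to
# its parsed contents, keyed by the file's basename (e.g. a bioguide id).
def load_member_files(repo_path, subdir)
  pattern = File.join(repo_path, subdir, "*.{yml,yaml}")
  Dir.glob(pattern).sort.each_with_object({}) do |path, members|
    id = File.basename(path, File.extname(path))
    members[id] = YAML.safe_load(File.read(path))
  end
end

# Demo against a throwaway directory laid out like a data source checkout.
members = Dir.mktmpdir do |repo|
  Dir.mkdir(File.join(repo, "members"))
  File.write(File.join(repo, "members", "A000000.yaml"), "bioguide: A000000\n")
  load_member_files(repo, "members")
end
```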

To update the DataSource repos, run...

$  bundle exec rake phantom-dc:update_git

Run this rake task any time you want to update the DataSource repos to the latest commit of each repository. To add and remove DataSources, see the help dialogue for the CLI:

$  ./phantom-dc datasource --help

Running

Just run rackup

Testing

If you haven't set up the test db, create it, using config/database.rb

Then you'll need to create and prepare the test database:

$  RACK_ENV=test bundle exec rake ar:create;
   RACK_ENV=test bundle exec rake ar:schema:load

And run

$  bundle exec rspec spec

Debugging Phantom of the Capitol

The Congress Forms Debugger is a useful tool for debugging Phantom DC. To run it locally, in config/phantom-dc_config.rb first make sure to set DEBUG_KEY to a shared secret and CORS_ALLOWED_DOMAINS to add localhost:8000 if the debugger is going to be run on port 8000. Then:

$  git clone https://github.com/EFForg/congress-forms-test &&
   cd congress-forms-test &&
   vim js/config.js # edit this file so that `PHANTOM_DC_SERVER` points to your own `phantom-of-the-capitol` API root.

$  python3 -m http.server 8000 # on Python 2: python -m SimpleHTTPServer; or configure apache for this endpoint

Now, you should be able to point your browser to http://localhost:8000/congress-forms-test/?debug_key=DEBUG_KEY (replacing, of course, DEBUG_KEY) and see a list of members of congress with a column for their Recent Success Rate. From here, you can click on the bioguide identifier for a member of congress and be brought to a page where you can then:

  1. send a test form fill
  2. see details about their recent form fills, including (if the attempt resulted in failure or error):
       • the Delayed::Job id #
       • a debugging message
       • a screenshot at the point of failure
  3. view the actions for this member of congress, as the database sees them (e.g. if you want to make sure the actions match the latest YAML from contact-congress)

Re-Running Jobs That Resulted in error or failure

Any jobs that result in an error or failure are added to the Delayed::Job job queue, unless the SKIP_DELAY environment variable is set. This job queue should be checked periodically and the jobs themselves debugged and re-run to ensure delivery. A number of convenience rake tasks have been provided for this purpose.

rake phantom-dc:delayed_job:jobs_per_member

Displays the number of jobs per member of congress in descending order, indicating which members have captchas on their forms and giving a summation at the end.

rake phantom-dc:delayed_job:perform_fills[regex,job_id,overrides]

Perform the form fills in the queue, optionally providing:

  • regex which will only perform the fills for members with matching bioguide identifiers
  • job_id which will only perform the fill for a given Delayed::Job id
  • overrides, a Ruby hash which will override the field values when the fill is performed

Examples:

$  rake phantom-dc:delayed_job:perform_fills
$  rake phantom-dc:delayed_job:perform_fills[A000000]
$  rake phantom-dc:delayed_job:perform_fills[A000000,,'{"$PHONE" => "555-555-5555"}']
$  rake phantom-dc:delayed_job:perform_fills[,12345,'{"$EMAIL" => "[email protected]"}']
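The way the three optional arguments narrow and modify the queue can be sketched roughly as follows. The Job struct and both helpers are hypothetical stand-ins written for this example; the real task operates on Delayed::Job records and may differ in detail.

```ruby
# Stand-in for a queued job; the real queue holds Delayed::Job records.
Job = Struct.new(:id, :bioguide_id, :fields)

# Hypothetical selection logic mirroring the regex and job_id arguments.
def select_jobs(jobs, regex: nil, job_id: nil)
  jobs.select do |job|
    (regex.nil? || job.bioguide_id =~ Regexp.new(regex)) &&
      (job_id.nil? || job.id == job_id)
  end
end

# Overrides replace matching field values without mutating the stored job.
def apply_overrides(job, overrides)
  job.fields.merge(overrides || {})
end

jobs = [
  Job.new(12345, "A000000", { "$PHONE" => "111-111-1111" }),
  Job.new(12346, "B000001", { "$PHONE" => "222-222-2222" })
]

matched = select_jobs(jobs, regex: "A000000")
fields  = apply_overrides(matched.first, { "$PHONE" => "555-555-5555" })
```

Leaving an argument empty, as in the examples above, corresponds to passing nil here, so that filter simply does not apply.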

rake phantom-dc:delayed_job:override_field[regex,job_id,overrides]

Override values for jobs in the queue, optionally providing:

  • regex which will only override the values for members with matching bioguide identifiers
  • job_id which will only override the value for a given Delayed::Job id
  • overrides, a Ruby hash which will override the field values for the criteria specified

Examples:

$  rake phantom-dc:delayed_job:override_field
$  rake phantom-dc:delayed_job:override_field[A000000]
$  rake phantom-dc:delayed_job:override_field[A000000,,'{"$PHONE" => "555-555-5555"}']
$  rake phantom-dc:delayed_job:override_field[,12345,'{"$EMAIL" => "[email protected]"}']

rake phantom-dc:delayed_job:zip4_retry[regex]

Pick out the jobs that have no $ADDRESS_ZIP4 defined, figure out the zip+4 based on the address and 5-digit zip in the job (requires an account with SmartyStreets with credentials in config/phantom-dc_config.rb), and try the job again. Optionally provide:

  • regex which will only perform the fills for members with matching bioguide identifiers

Examples:

$  rake phantom-dc:delayed_job:zip4_retry
$  rake phantom-dc:delayed_job:zip4_retry[A000000]
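The selection step behind this task can be pictured with a short sketch. The helpers below are hypothetical, and the lookup lambda merely stands in for the SmartyStreets call, whose real interface is not shown here.

```ruby
# Hypothetical check: does this set of fill arguments lack a usable $ADDRESS_ZIP4?
def missing_zip4?(fields)
  fields["$ADDRESS_ZIP4"].to_s.strip.empty?
end

# `lookup` stands in for the SmartyStreets request; it receives the street
# address and 5-digit zip and returns the +4 part, or nil if none was found.
def with_zip4(fields, lookup)
  return fields unless missing_zip4?(fields)

  zip4 = lookup.call(fields["$ADDRESS_STREET"], fields["$ADDRESS_ZIP5"])
  zip4 ? fields.merge("$ADDRESS_ZIP4" => zip4) : fields
end

fake_lookup = ->(_street, _zip5) { "1234" } # canned response for illustration
fields  = { "$ADDRESS_STREET" => "123 Fake Street", "$ADDRESS_ZIP5" => "55369" }
retried = with_zip4(fields, fake_lookup)
```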

Padrino Console

If you prefer to dive deep, you can fire up the padrino console with padrino c and debug jobs:

> Delayed::Job.where(queue: "error_or_failure").count # count of all jobs
 => 78
> job = Delayed::Job.where(queue: "error_or_failure").first # get the first job
 => #<Delayed::Backend::ActiveRecord::Job id: 318, priority: 0, attempts: 1, handler: "--- !ruby/object:Delayed::PerformableMethod\nobject:...", last_error: "Unable to find css \"p\" with text /Thank you!/\n[\"/ho...", run_at: "2014-07-03 12:14:10", locked_at: nil, failed_at: nil, locked_by: nil, queue: "error_or_failure", created_at: "2014-07-03 12:14:10", updated_at: "2014-08-26 18:50:27">
> handler = YAML.load job.handler # get the "handler" which contains the object to be acted upon and the arguments
 => #<Delayed::PerformableMethod:0x0000000544ae30 @object=#<CongressMember id: 60, bioguide_id: "F000457", success_criteria: "---\nheaders:\n  status: 200\nbody:\n  contains: Your m...", created_at: "2014-04-30 19:08:05", updated_at: "2014-07-03 18:54:34">, @method_name=:fill_out_form, @args=[{"$NAME_FIRST"=>"John", "$NAME_LAST"=>"Doe", "$ADDRESS_STREET"=>"123 Fake Street", "$ADDRESS_CITY"=>"Hennepin", "$ADDRESS_ZIP5"=>"55369", "$EMAIL"=>"[email protected]", "subscribe"=>"1", "$SUBJECT"=>"Example subject", "$MESSAGE"=>"Example Message", "$NAME_PREFIX"=>"Mr.", "$ADDRESS_STATE_POSTAL_ABBREV"=>"MN", "$TOPIC"=>"Example Topic", "$PHONE"=>"555-555-5555", "$ADDRESS_ZIP4"=>"1234"}, nil]>
handler.args[0]['$PHONE'] = '123-456-7890' # set the phone number

Then, when you're ready to retry the fill:

handler.perform # try filling out the form
handler.object.fill_out_form(handler.args[0]) do |c|
  puts c
  STDIN.gets.strip
end # fills out a form with a captcha

phantom-of-the-capitol's People

Contributors

anselmoalves, chrisantaki, crdunwel, cspeisman, eff-test, gcosta, hainish, irjudson, j-ro, k-stewart, konklone, mike-zorn, pollyp, rshorey, scrozier, seanknox, tcyrus, thenotary, tone81, vbrown608, wioux


phantom-of-the-capitol's Issues

Concurrent Connection Limits

We have installed this API and are successfully submitting letters at a rate of about 5K per day. However, ideally we are looking for a solution that is capable of sending significantly more than this. I believe that the limitation we're encountering is related to PhantomJS and its ability to process concurrent connections.

Can you tell us if the approximately 5K per day limit is typical? If so, are there suggestions as to how we can architect a solution that can exceed this? I assume multiple virtual servers running separate instances is one (costly) option. Is it possible to install multiple instances of Phantom of the Capitol on the same server, each listening on a different port, so that a single server would be capable of exceeding this limit?

Thanks

bundle exec issue during installation

Running into an error during the installation step listed below:

bundle exec rake congress-forms:update_git[contact_congress_directory]

I'm receiving this printout:

root@REDACTED:/var/www/html/congress-forms-master# bundle exec rake congress-forms:update_git[/var/www/html/congress-forms-master/contact-congress] --trace
** Invoke congress-forms:update_git (first_time)
** Execute congress-forms:update_git
  DEBUG -  Application Load (1.8ms)  SELECT `application_settings`.* FROM `application_settings` WHERE `application_settings`.`key` = 'contact_congress_commit' LIMIT 1
No previous commit found, reloading all congress members into db
rake aborted!
NoMethodError: undefined method `[]' for false:FalseClass
tasks/congress-forms.rake:320:in `block (2 levels) in update_db_with_git_object'
tasks/congress-forms.rake:335:in `create_congress_member_exception_wrapper'
tasks/congress-forms.rake:317:in `block in update_db_with_git_object'
tasks/congress-forms.rake:315:in `each'
tasks/congress-forms.rake:315:in `update_db_with_git_object'
tasks/congress-forms.rake:183:in `block (2 levels) in <top (required)>'
/var/lib/gems/1.9.1/gems/rake-10.3.2/lib/rake/task.rb:240:in `call'
/var/lib/gems/1.9.1/gems/rake-10.3.2/lib/rake/task.rb:240:in `block in execute'
/var/lib/gems/1.9.1/gems/rake-10.3.2/lib/rake/task.rb:235:in `each'
/var/lib/gems/1.9.1/gems/rake-10.3.2/lib/rake/task.rb:235:in `execute'
/var/lib/gems/1.9.1/gems/rake-10.3.2/lib/rake/task.rb:179:in `block in invoke_with_call_chain'
/usr/lib/ruby/1.9.1/monitor.rb:211:in `mon_synchronize'
/var/lib/gems/1.9.1/gems/rake-10.3.2/lib/rake/task.rb:172:in `invoke_with_call_chain'
/var/lib/gems/1.9.1/gems/rake-10.3.2/lib/rake/task.rb:165:in `invoke'
/var/lib/gems/1.9.1/gems/rake-10.3.2/lib/rake/application.rb:150:in `invoke_task'
/var/lib/gems/1.9.1/gems/rake-10.3.2/lib/rake/application.rb:106:in `block (2 levels) in top_level'
/var/lib/gems/1.9.1/gems/rake-10.3.2/lib/rake/application.rb:106:in `each'
/var/lib/gems/1.9.1/gems/rake-10.3.2/lib/rake/application.rb:106:in `block in top_level'
/var/lib/gems/1.9.1/gems/rake-10.3.2/lib/rake/application.rb:115:in `run_with_threads'
/var/lib/gems/1.9.1/gems/rake-10.3.2/lib/rake/application.rb:100:in `top_level'
/var/lib/gems/1.9.1/gems/rake-10.3.2/lib/rake/application.rb:78:in `block in run'
/var/lib/gems/1.9.1/gems/rake-10.3.2/lib/rake/application.rb:176:in `standard_exception_handling'
/var/lib/gems/1.9.1/gems/rake-10.3.2/lib/rake/application.rb:75:in `run'
/var/lib/gems/1.9.1/gems/rake-10.3.2/bin/rake:33:in `<top (required)>'
/usr/local/bin/rake:23:in `load'
/usr/local/bin/rake:23:in `<main>'
Tasks: TOP => congress-forms:update_git

Click not working?

https://github.com/EFForg/phantom-of-the-capitol/blob/master/app/models/congress_member.rb#L245

May consider doing something like

ele = session.find(a.selector)
if ele.tag_name == 'input' or ele.tag_name == 'button'
  ele.trigger('click')
else
  ele.click
end

This is suggested in the poltergeist README as a workaround: https://github.com/teampoltergeist/poltergeist#mouseeventfailed-errors

It's late, so I'll try to test this more tomorrow. Messages seem to be going through, and I'm getting two screenshots, one right before submission and one of the success page, yet I'm still getting a failure message. So maybe my phantom-of-the-capitol is messed up somehow. If somebody else can test this out, that'd be great. The problem cases that this is trying to address are the new-style forms such as https://github.com/unitedstates/contact-congress/blob/master/members/B001293.yaml .

Memory leak for PhantomJS

After a while of testing, a bunch of PhantomJS instances linger and eat up memory until killed. This must be solved before go-live!

Make data sources flexible

It would be nice if we were able to add and delete data sources via a CLI. Currently, this project has one data source, hard coded to https://github.com/unitedstates/contact-congress. Because of this tight coupling, @j-ro has forked the repo to add other data sources such as state congresses. This is not ideal; we should maintain one codebase and add / remove data sources on a per-instance basis.

All data sources should maintain the formatting of the contact-congress repo:

  • Have a git url
  • Have a subfolder with YAML files
  • YAML instruction set should mirror the contact-congress instruction set

This is in progress currently, and after this feature is built in anyone can choose any arbitrary data sources they like. When pulled in to the local database, bioguide ids can be prefixed by data source. For instance, if it's the California state senate, prefixing with CA_ seems prudent.
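A minimal sketch of the prefixing idea, assuming a hypothetical helper named prefixed_bioguide and an underscore separator as in the CA_ example:

```ruby
# Hypothetical helper: namespace a member id by its data source prefix.
# With no prefix (e.g. the US Congress source), the id is returned unchanged.
def prefixed_bioguide(bioguide_id, prefix = nil)
  prefix.nil? || prefix.empty? ? bioguide_id : "#{prefix}_#{bioguide_id}"
end

california_id = prefixed_bioguide("S000123", "CA")
federal_id    = prefixed_bioguide("A000000")
```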

Comments, @j-ro?

Test queueing system

Currently, if the constant DELAY_ALL_NONCAPTCHA_FILLS is set to true, the FillHandler object created by the controller will automatically throw all fill requests onto the queue.

Additionally, any failures that occur while filling out a form are also thrown onto the queue, with last_job.attempts = 1.

We should test if these two situations work well on the same queue, and if not separate out the queues.
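If the queues do need to be separated, one way to picture it is a small routing function that picks a queue name per situation. The "error_or_failure" name matches the queue shown in the README's console example; "delayed_fill" and the helper itself are hypothetical.

```ruby
# Hypothetical routing: intentionally delayed fills and failed fills go to
# different named queues so they can be inspected and retried independently.
def queue_for(reason)
  case reason
  when :delayed_noncaptcha then "delayed_fill"      # assumed queue name
  when :error, :failure    then "error_or_failure"
  else raise ArgumentError, "unknown reason: #{reason.inspect}"
  end
end

delayed_queue = queue_for(:delayed_noncaptcha)
failure_queue = queue_for(:failure)
```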

Fix Tests: ThreadError

Getting this after fixing memory leak:

Failures:

  1) Main controller route /fill-out-form with a captcha should destroy the fiber after a time interval
     Failure/Error: post_json :'fill-out-captcha', {
     ThreadError:
       killed thread
     # ./app/helpers/form_fill_handler.rb:47:in `run'
     # ./app/helpers/form_fill_handler.rb:47:in `fill_captcha'
     # ./app/controllers/main.rb:52:in `block (2 levels) in '
     # (eval):208:in `block in call'
     # (eval):198:in `catch'
     # (eval):198:in `call'
     # ./spec/controllers/main_controller_spec.rb:250:in `block (4 levels) in '

Finished in 1 minute 6.18 seconds
42 examples, 1 failure

Lots of "Request failed to reach server, check DNS and/or server status" errors

I've been getting a lot of these:

############################################
Message:
Request failed to reach server, check DNS and/or server status
e:
#
backtrace:
/home/legind/workspace/congress-forms/app/models/congress_member.rb:282:in `rescue in fill_out_form_with_capybara'
/home/legind/workspace/congress-forms/app/models/congress_member.rb:301:in `fill_out_form_with_capybara'
/home/legind/workspace/congress-forms/app/models/congress_member.rb:175:in `fill_out_form_with_poltergeist'
/home/legind/workspace/congress-forms/app/models/congress_member.rb:45:in `fill_out_form'
tasks/congress-forms.rake:48:in `block (4 levels) in '
tasks/congress-forms.rake:43:in `each'
tasks/congress-forms.rake:43:in `block (3 levels) in '
/home/legind/.rvm/gems/ruby-2.1.0@congress-forms/gems/rake-10.3.2/lib/rake/task.rb:240:in `call'
/home/legind/.rvm/gems/ruby-2.1.0@congress-forms/gems/rake-10.3.2/lib/rake/task.rb:240:in `block in execute'
/home/legind/.rvm/gems/ruby-2.1.0@congress-forms/gems/rake-10.3.2/lib/rake/task.rb:235:in `each'
/home/legind/.rvm/gems/ruby-2.1.0@congress-forms/gems/rake-10.3.2/lib/rake/task.rb:235:in `execute'
/home/legind/.rvm/gems/ruby-2.1.0@congress-forms/gems/rake-10.3.2/lib/rake/task.rb:179:in `block in invoke_with_call_chain'
/home/legind/.rvm/rubies/ruby-2.1.0/lib/ruby/2.1.0/monitor.rb:211:in `mon_synchronize'
/home/legind/.rvm/gems/ruby-2.1.0@congress-forms/gems/rake-10.3.2/lib/rake/task.rb:172:in `invoke_with_call_chain'
/home/legind/.rvm/gems/ruby-2.1.0@congress-forms/gems/rake-10.3.2/lib/rake/task.rb:165:in `invoke'
/home/legind/.rvm/gems/ruby-2.1.0@congress-forms/gems/rake-10.3.2/lib/rake/application.rb:150:in `invoke_task'
/home/legind/.rvm/gems/ruby-2.1.0@congress-forms/gems/rake-10.3.2/lib/rake/application.rb:106:in `block (2 levels) in top_level'
/home/legind/.rvm/gems/ruby-2.1.0@congress-forms/gems/rake-10.3.2/lib/rake/application.rb:106:in `each'
/home/legind/.rvm/gems/ruby-2.1.0@congress-forms/gems/rake-10.3.2/lib/rake/application.rb:106:in `block in top_level'
/home/legind/.rvm/gems/ruby-2.1.0@congress-forms/gems/rake-10.3.2/lib/rake/application.rb:115:in `run_with_threads'
/home/legind/.rvm/gems/ruby-2.1.0@congress-forms/gems/rake-10.3.2/lib/rake/application.rb:100:in `top_level'
/home/legind/.rvm/gems/ruby-2.1.0@congress-forms/gems/rake-10.3.2/lib/rake/application.rb:78:in `block in run'
/home/legind/.rvm/gems/ruby-2.1.0@congress-forms/gems/rake-10.3.2/lib/rake/application.rb:176:in `standard_exception_handling'
/home/legind/.rvm/gems/ruby-2.1.0@congress-forms/gems/rake-10.3.2/lib/rake/application.rb:75:in `run'
/home/legind/.rvm/gems/ruby-2.1.0@congress-forms/gems/rake-10.3.2/bin/rake:33:in `'
/home/legind/.rvm/gems/ruby-2.1.0@congress-forms/bin/rake:23:in `load'
/home/legind/.rvm/gems/ruby-2.1.0@congress-forms/bin/rake:23:in `'
/home/legind/.rvm/gems/ruby-2.1.0@congress-forms/bin/ruby_executable_hooks:15:in `eval'
/home/legind/.rvm/gems/ruby-2.1.0@congress-forms/bin/ruby_executable_hooks:15:in `'
############################################

This has to do with ariya/phantomjs#12234 and teampoltergeist/poltergeist#375

Moving the webkit bioid array to each yaml?

Thoughts on moving the requires_webkit constant from https://github.com/EFForg/congress-forms/blob/69eff91202d10b40de6fb3fdc358a2f5b20481f2/config/constants.rb and putting it in each YAML file instead?

Makes more sense in some ways, and allows us to incorporate this constant with our git pulls from the contact congress repo, instead of having to restart everything when this changes.

Also, helpful for testing, so you understand why a yaml may not be working in phantomjs if you're testing that way.

NoMethodError during congress-forms:update_git rake task

When running the following command:
bundle exec rake congress-forms:update_git[../contact-congress]

I get the following error:

NoMethodError: undefined method `number_of_candidates=' for #<SmartyStreets::Configuration:0x00000002b40170>
/home/ubuntu/congress-forms/config/boot.rb:25:in `block in <top (required)>'

Simply commenting out boot.rb line 25 solves the problem for me and the tests still pass without this line. That's not to say it won't break anything else, but I haven't encountered issues yet, as I'm still in the setup phase.

The stack trace is below:

NoMethodError: undefined method `number_of_candidates=' for #<SmartyStreets::Configuration:0x00000002b40170>
/home/ubuntu/congress-forms/config/boot.rb:25:in `block in <top (required)>'
/home/ubuntu/.rvm/gems/ruby-2.1.0@congress-forms/gems/smarty_streets-0.0.4/lib/smarty_streets.rb:22:in `configure'
/home/ubuntu/congress-forms/config/boot.rb:22:in `<top (required)>'
tasks/congress-forms.rake:1:in `require'
tasks/congress-forms.rake:1:in `<top (required)>'
/home/ubuntu/.rvm/gems/ruby-2.1.0@congress-forms/gems/padrino-core-0.11.4/lib/padrino-core/cli/rake_tasks.rb:3:in `load'
/home/ubuntu/.rvm/gems/ruby-2.1.0@congress-forms/gems/padrino-core-0.11.4/lib/padrino-core/cli/rake_tasks.rb:3:in `block in <top (required)>'
/home/ubuntu/.rvm/gems/ruby-2.1.0@congress-forms/gems/padrino-core-0.11.4/lib/padrino-core/cli/rake_tasks.rb:1:in `each'
/home/ubuntu/.rvm/gems/ruby-2.1.0@congress-forms/gems/padrino-core-0.11.4/lib/padrino-core/cli/rake_tasks.rb:1:in `<top (required)>'
/home/ubuntu/.rvm/gems/ruby-2.1.0@congress-forms/gems/padrino-core-0.11.4/lib/padrino-core/cli/rake.rb:12:in `load'
/home/ubuntu/.rvm/gems/ruby-2.1.0@congress-forms/gems/padrino-core-0.11.4/lib/padrino-core/cli/rake.rb:12:in `init'
/home/ubuntu/congress-forms/Rakefile:6:in `<top (required)>'
/home/ubuntu/.rvm/gems/ruby-2.1.0@congress-forms/bin/ruby_executable_hooks:15:in `eval'
/home/ubuntu/.rvm/gems/ruby-2.1.0@congress-forms/bin/ruby_executable_hooks:15:in `<main>'
(See full trace by running task with --trace)

Overlapping elements?

Got this for the first time:

Sounds like the suggestion in the error message (node.trigger('click')) might be a good way to go?

Firing a click at co-ordinates [112, 755] failed. Poltergeist detected another element with CSS selector 'html.js body.html.not-front.not-logged-in.page-node.page-node-.page-node-2.node-type-congress-page.context-contact.omega-mediaqueries-processed.responsive-layout-normal div#page.page.clearfix footer#section-footer.section.section-footer div#zone-footer-wrapper.zone-wrapper.zone-footer-wrapper.clearfix div#zone-footer.zone.zone-footer.clearfix.container-24 div#region-footer-first.grid-24.region.region-footer-first div.region-inner.region-footer-first-inner div#block-boxes-offices-open-and-close-button.block.block-boxes.block-boxes-simple.block-offices-open-and-close-button.block-boxes-offices-open-and-close-button.odd.block-without-title div.block-inner.clearfix div.content.clearfix div#boxes-box-offices_open_and_close_button.boxes-box div.boxes-box-content img#switchImgTag.toggler' at this position. It may be overlapping the element you are trying to interact with. If you don't care about overlapping elements, try using node.trigger('click').
["/home/congressforms/congress-forms/app/models/congress_member.rb:52:in `rescue in fill_out_form'", "/home/congressforms/congress-forms/app/models/congress_member.rb:39:in `fill_out_form'", "/home/congressforms/congress-forms/app/helpers/form_fill_handler.rb:15:in `block in create_thread'"]

Test CAPTCHA fills with >1 production server

Because of the issue enumerated in #23, we are relying on the sticky session feature of EC2 to route requests. When a request comes in and a captcha_needed status is returned, the subsequent captcha_fill should be automatically routed to the same server which is able to access the same instance of phantomjs.

We should do some testing if this is actually the case, which involves spinning up one or more additional servers for this purpose.

Javascript yaml instruction?

We're hitting another tough form here that I think could be solved with a javascript yaml instruction. Something like:

- javascript: "do_some_javascript"

This would allow me to do things like fill hidden inputs.

Any thoughts on how best to implement that?
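One possible shape for this, sketched under assumptions: a step dispatcher that hands the value of a javascript step to the session's execute_script method (Capybara sessions provide a method of that name). The RecordingSession class and run_steps helper are hypothetical and exist only so the example runs without a browser.

```ruby
require "yaml"

# Minimal stand-in for a Capybara session; it only records the scripts it is given.
class RecordingSession
  attr_reader :scripts

  def initialize
    @scripts = []
  end

  def execute_script(source)
    @scripts << source
  end
end

# Hypothetical dispatcher: handles only the proposed `javascript` step.
def run_steps(session, steps)
  steps.each do |step|
    step.each do |action, value|
      case action
      when "javascript" then session.execute_script(value)
      else raise ArgumentError, "unsupported step: #{action}"
      end
    end
  end
end

steps = YAML.safe_load(<<~STEPS)
  - javascript: "document.getElementById('token').value = 'abc'"
STEPS

session = RecordingSession.new
run_steps(session, steps)
```

Since the script would come straight from a YAML file, any real implementation would have to trust the data source it pulls from.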

create new ec2 image

We need a new ec2 image from the latest congress-forms. It needs to be the image we bootstrap from when scaling up

Modify auto-scaling criteria

Deployment (not project) task:

Currently, our auto-scaling is set up to be quick to scale up after seeing a memory spike, and quick to scale down after seeing that memory spike go below a threshold. I think it would be better to be easy to scale up, but stay up for some number of hours after we pass under the threshold again. This way, when the threshold is constantly being crossed on big days of action, we won't be constantly spinning up and then down instances.
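The proposed behavior amounts to a cooldown on scaling down. A minimal sketch, with a hypothetical ScalingPolicy class and made-up threshold and cooldown values:

```ruby
# Hypothetical policy: scale up as soon as memory crosses the threshold, but
# scale down only after it has stayed below the threshold for `cooldown` seconds.
class ScalingPolicy
  def initialize(threshold:, cooldown:)
    @threshold = threshold
    @cooldown = cooldown
    @last_above = nil
  end

  # Returns :scale_up, :hold, or :scale_down for a memory sample taken at
  # time `now` (in seconds).
  def decide(memory, now)
    if memory >= @threshold
      @last_above = now
      :scale_up
    elsif @last_above && now - @last_above < @cooldown
      :hold
    else
      :scale_down
    end
  end
end

policy = ScalingPolicy.new(threshold: 0.8, cooldown: 3 * 3600)
decisions = [
  policy.decide(0.9, 0),        # memory spike
  policy.decide(0.5, 3600),     # one hour after the spike
  policy.decide(0.5, 4 * 3600)  # four hours after the spike
]
```

With a three-hour cooldown, the sample taken one hour after the spike holds the extra instances, and only the sample at four hours allows scaling down.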

@j-ro @ApeChimp I'd love your input here. I think it would be interesting to see the deployment strategies for congress-forms across the board, and share things like metrics for scaling. It seems it would be in the interest of everyone who runs CF to be involved in discussions of this type, so that we can optimize the durability of our instances. I wonder what the best forum for this is: a mailing list?

Better solution for captchas that don't work with Poltergeist

Currently, we have a file config/constants.rb that defines which members of congress have forms that cannot be completed successfully with phantomjs/poltergeist. The two constants we use are REQUIRES_WEBKIT and REQUIRES_WATIR. I've been adding MoC to these manually based on what I've seen works for a given member. As a fallback for phantomjs, we use capybara-webkit. In most cases, if it doesn't work with capybara-webkit, it's a problem with this webdriver incorrectly getting the position of the captcha. In this case, we fall back to watir-webdriver.
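The fallback order described above can be sketched as a simple lookup. The constant contents here are placeholder ids and the helper driver_for is hypothetical; the real lists live in config/constants.rb and are assumed to be arrays of bioguide ids.

```ruby
# Placeholder lists for illustration only.
REQUIRES_WEBKIT = %w[A000000].freeze
REQUIRES_WATIR  = %w[B000001].freeze

# Hypothetical helper: pick the webdriver for a member, preferring the
# lightest option (poltergeist) unless the member is listed as needing more.
def driver_for(bioguide_id)
  return :watir  if REQUIRES_WATIR.include?(bioguide_id)
  return :webkit if REQUIRES_WEBKIT.include?(bioguide_id)

  :poltergeist
end

drivers = %w[A000000 B000001 C000002].map { |id| driver_for(id) }
```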

Watir is not ideal for congress-forms because of the large memory footprint it uses. We should find a better solution for this. Ideas I've had are:

  1. Manually providing the coordinates we expect to see the captcha at (probably brittle)
  2. Using yet another low-memory webdriver, which hopefully has capybara integration so we don't need to create yet another fill_out_form_with_ method.

Need timeout setting on find directive

It appears that some of the member YAML files are failing due to timeout issues. (W000815 and possibly L000480, which is the subject of Issue #9.) It has been suggested that a timeout setting on the find command could remedy these.
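Capybara's find accepts a wait option giving the maximum number of seconds to keep retrying, so one possible approach is to pass a per-step value through to it. In the sketch below, the "timeout" key on the step, the default value, the helper, and the stub session are all hypothetical.

```ruby
# Stand-in for a Capybara session so the sketch runs without a browser.
class StubSession
  attr_reader :calls

  def initialize
    @calls = []
  end

  def find(selector, **options)
    @calls << [selector, options]
    :element
  end
end

DEFAULT_FIND_WAIT = 2 # seconds; an assumed default

# Hypothetical: honor an optional per-step timeout taken from the YAML step.
def find_with_timeout(session, step)
  wait = step.fetch("timeout", DEFAULT_FIND_WAIT)
  session.find(step["selector"], wait: wait)
end

session = StubSession.new
find_with_timeout(session, { "selector" => "#captcha", "timeout" => 30 })
find_with_timeout(session, { "selector" => "form" })
```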

Issues loading schema

I try to load the schema via bundle exec rake ar:create ar:schema:load > /dev/null, but I get the below error message.

vagrant@precise64:/vagrant$ bundle exec rake ar:create ar:schema:load > /dev/null
rake aborted!
Xvfb not found on your system
/home/vagrant/.rvm/gems/ruby-2.1.0/gems/headless-1.0.1/lib/headless/cli_util.rb:9:in `ensure_application_exists!'
/home/vagrant/.rvm/gems/ruby-2.1.0/gems/headless-1.0.1/lib/headless.rb:68:in `initialize'
/vagrant/config/boot.rb:14:in `new'
/vagrant/config/boot.rb:14:in `<top (required)>'
tasks/congress-forms.rake:1:in `require'
tasks/congress-forms.rake:1:in `<top (required)>'
/home/vagrant/.rvm/gems/ruby-2.1.0/gems/padrino-core-0.11.4/lib/padrino-core/cli/rake_tasks.rb:3:in `load'
/home/vagrant/.rvm/gems/ruby-2.1.0/gems/padrino-core-0.11.4/lib/padrino-core/cli/rake_tasks.rb:3:in `block in <top (required)>'
/home/vagrant/.rvm/gems/ruby-2.1.0/gems/padrino-core-0.11.4/lib/padrino-core/cli/rake_tasks.rb:1:in `each'
/home/vagrant/.rvm/gems/ruby-2.1.0/gems/padrino-core-0.11.4/lib/padrino-core/cli/rake_tasks.rb:1:in `<top (required)>'
/home/vagrant/.rvm/gems/ruby-2.1.0/gems/padrino-core-0.11.4/lib/padrino-core/cli/rake.rb:12:in `load'
/home/vagrant/.rvm/gems/ruby-2.1.0/gems/padrino-core-0.11.4/lib/padrino-core/cli/rake.rb:12:in `init'
/vagrant/Rakefile:6:in `<top (required)>'
(See full trace by running task with --trace)

Note that the setup_dev.sh failed when provisioning the vagrant because of an issue during the bundle install step where it couldn't install the json gem. I was able to move past that by installing the json gem manually.

This Rocks

I've wanted this for a long time. You are awesome.

CORS on last error page

I've added support for a modal to pop up, but it can't be shown unless the page is CORS enabled.

Just a nice to have, no rush.

Integrating the House CWC API?

Hey everyone,

@joelcollinsdc let me know a few weeks back that the House has released its application for the Communicating with Congress (CWC) API. I just checked in and Joel reports that half the House are signed up to use it.

I'm wondering whether we should consider integrating CWC into phantom-dc. I have the full API spec and am waiting to hear back on whether I'm allowed to post it publicly. If I am, I'll update this post. But basically it's an XML-based API, and while it's relatively complicated, a lot of the fields are optional.

Some thoughts on my end:

  • CWC is only intended for "mass communications" not authentic constituent messages. We should perhaps only have it as an optional delivery method, otherwise authentic constituent messages won't make it through.
  • That means that if we do implement the API, we'll still need to maintain YAML files for the most part.
  • The main advantages would be quicker delivery (API is faster than using a headless browser), and fewer CAPTCHAs for end-users submitting bulk messages.

Curious as to others' thoughts.

YAML Badges and testing system

When a contact-congress YAML file changes, we need a way to test to make sure that forms are submitting correctly. In some cases this test might be able to happen automatically, but it definitely won't be possible for forms with CAPTCHAs.

When a user runs a test on a particular form, the badge should be updated in this repo showing whether the form is submitting correctly.

The badges should also be updated if a particular congress form stops working. A message with more information on when the form last worked and how the failure happened might also be useful.

Failed captcha

When I fail a captcha, do you restart the session? Looks like my UID is unsaved on the second attempt.

Memory leak

A memory leak is occurring after form fills that require Watir. We need to fix this ASAP.

Add some access-control for debug endpoints

Currently, there is a boolean constant in config/congress-forms_config.rb, DEBUG ENDPOINTS. Setting this to true for any given instance will let any third party access the list-actions/<bio_id>, recent-statuses-detailed/<bio_id>, and list-congress-members endpoints, which is useful for instances of https://github.com/EFForg/congress-forms-test. However, leaving this as true allows any third party to see recent fill failures and errors, which may include private details. In addition, the list-congress-members endpoint is particularly CPU-intensive, and could lead to a Denial Of Service if abused. CORS is no protection, since it is only enforced in the browser and should not be used as a substitute for good access control.

I originally intended to just set this boolean to false before going live, but it occurs to me that having these errors available to see is very useful in debugging problems when they occur.

Rather than keeping this constant as a boolean, we should instead add some kind of authentication for these endpoints, so that we can debug problems while protecting user data.
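
One possible shape for this, as a minimal sketch: gate the debug endpoints behind a shared secret and compare it in constant time. The `DebugAccess` module, its method names, and the idea of a single configured key are all assumptions for illustration; none of them exist in the project today.

```ruby
require 'digest'

# Hypothetical helper: module name, method names and the single shared
# key are assumptions for this sketch, not existing project code.
module DebugAccess
  # Compare two strings without leaking, through timing, where they differ.
  # Hashing first gives both operands the same length.
  def self.secure_equal?(a, b)
    x = Digest::SHA256.digest(a.to_s).bytes
    y = Digest::SHA256.digest(b.to_s).bytes
    diff = 0
    x.each_with_index { |byte, i| diff |= byte ^ y[i] }
    diff.zero?
  end

  # Debug endpoints stay closed when no key has been configured at all.
  def self.authorized?(provided_key, configured_key)
    return false if configured_key.nil? || configured_key.empty?
    secure_equal?(provided_key, configured_key)
  end
end
```

A route handler could then refuse the request unless `DebugAccess.authorized?` returns true for the key sent by the client, which keeps the error details available for debugging without exposing them to every third party.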

$MESSAGE not always in required_actions

Some Congress forms, such as Ayotte's, can be submitted without actually entering a $MESSAGE (it is not a required field).

So the retrieve-form-elements endpoint doesn't always list this field in the required_actions hash. If the client only fills fields based on the required_actions, and the Congress form doesn't require the field, it's possible to submit forms with no message.

A workaround is to always force the client to submit a $MESSAGE, even when it's not in required_actions. But is it considered a problem with phantom-dc that $MESSAGE isn't always assumed to be required? Maybe not, but thought I'd raise the question.
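
The client-side workaround could look roughly like the sketch below. It assumes that each entry in required_actions is a hash with a "value" key holding a placeholder such as "$NAME_FIRST"; that shape, the `fields_to_collect` name, and the `ALWAYS_REQUIRED` constant are assumptions for illustration and should be adjusted to the real response.

```ruby
# Placeholders the client insists on even when the form does not.
# ALWAYS_REQUIRED is a made-up name for this sketch.
ALWAYS_REQUIRED = ['$MESSAGE'].freeze

# Assumed input shape: an array of hashes, each with a "value" key that
# holds a placeholder string. Returns the placeholders to collect,
# without duplicates, with $MESSAGE always included.
def fields_to_collect(required_actions)
  values = required_actions.map { |action| action['value'] }
  (values + ALWAYS_REQUIRED).uniq
end
```

With this in place, a form that omits $MESSAGE from required_actions still gets a message field on the client, and a form that already lists it is unaffected.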

Javascript Alerts

Is there a way to read the contents of javascript alert boxes?

My thought is that it either requires using a different driver which allows window switching, or I'll have to inject javascript into the page to override the javascript alert function altogether. Is there something already in phantom-of-the-capitol which allows me to read the text from alert windows?

Specifically, I am trying to tackle this form: http://www.senate.leg.state.mn.us/members/member_emailform.php?mem_id=1010 ... Until I can read the javascript alert text, I'll keep getting false negatives.
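
For the "override the alert function" option, a minimal sketch might look like this. It only relies on the session responding to `execute_script` and `evaluate_script`, as Capybara::Session does; the `window.__capturedAlerts` variable and both helper names are made up for illustration, and whether arrays come back intact depends on the driver.

```ruby
# JavaScript that replaces window.alert so messages are recorded instead
# of shown. window.__capturedAlerts is a made-up name for this sketch.
CAPTURE_ALERTS_JS = <<-JS
  window.__capturedAlerts = [];
  window.alert = function (message) {
    window.__capturedAlerts.push(String(message));
  };
JS

# Install the override. Call this after the page has loaded and before
# the action that triggers the alert (for example, the submit click).
def capture_alerts(session)
  session.execute_script(CAPTURE_ALERTS_JS)
end

# Read back whatever alert messages were recorded on the current page.
def captured_alerts(session)
  session.evaluate_script('window.__capturedAlerts || []')
end
```

The override lives in the page, so it is lost on navigation and will miss alerts fired while the page is still loading. It would have to be reinstalled after every visit.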

Lots of "Too many open files - /usr/bin/phantomjs" errors

   Too many open files - /usr/bin/phantomjs
["/home/congressforms/congress-forms/app/models/congress_member.rb:52:in `rescue in fill_out_form'", "/home/congressforms/congress-forms/app/models/congress_member.rb:39:in `fill_out_form'", "/home/congressforms/congress-forms/app/helpers/form_fill_handler.rb:15:in `block in create_thread'"]

Getting a lot of these errors.
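
"Too many open files" is the operating system reporting that the process has hit its file descriptor limit, so a first diagnostic step could be to log that limit next to the number of descriptors in use. The sketch below is only a diagnostic aid with made-up helper names; it does not identify what is holding the descriptors open.

```ruby
# Report the file descriptor limits for the current process.
def fd_limits
  soft, hard = Process.getrlimit(Process::RLIMIT_NOFILE)
  { :soft => soft, :hard => hard }
end

# Approximate count of descriptors currently open. Relies on /proc, so it
# returns nil on systems without it. Listing the directory briefly opens
# a descriptor of its own, so treat the number as a rough figure.
def open_fd_count
  return nil unless Dir.exist?('/proc/self/fd')
  Dir.entries('/proc/self/fd').size - 2 # drop "." and ".."
end
```

Logging both values around each form fill would show whether the count climbs steadily toward the soft limit, which would point at descriptors or child processes not being released.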

FillHash stateful in controller?

Does the controller keep state in the FillHash for the entire runtime of the server? If it does, it will be difficult to scale the application horizontally, because we cannot share memory across multiple nodes.

Returning success for forms without CAPTCHAs immediately

In our action center UX, forms tend to take up to 15 seconds to be sent, which is a bit frustrating.

A suggested workaround for us would be to basically assume success for any representatives without CAPTCHAs, without waiting for the server to actually respond.

But in order to do that, we need some way of knowing which representatives have CAPTCHAs. Would it be possible to create a congress-forms endpoint for that? Or do you have any other suggestions on how we should handle that on the frontend?

WEBKIT errors

Getting a lot of these:

/home/congressforms/.rvm/gems/ruby-2.1.0@congress-forms/gems/capybara-webkit-1.3.1/bin/webkit_server failed to start
["/home/congressforms/congress-forms/app/models/congress_member.rb:52:in `rescue in fill_out_form'", "/home/congressforms/congress-forms/app/models/congress_member.rb:39:in `fill_out_form'", "/home/congressforms/congress-forms/app/helpers/form_fill_handler.rb:15:in `block in create_thread'"]

Test Fills in Rate-Limited Environment

When testing bulk fills on my localhost, I'm getting a lot more failures than we got during the volunteer phase. Most of them are due to the bandwidth I have at home being much more limited as compared to what we have available on AWS. The fix, most of the time, is to add an additional find instruction after the final submit click instruction, to tell the webdriver that it has to wait for the success message to appear. We should find a way to artificially limit the bandwidth on the server to simulate high-load conditions, which we may encounter moving forward.
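
The "wait for the success message" idea boils down to polling with a deadline. A minimal generic sketch is below. `wait_until` is a hypothetical helper, not something in the codebase, and Capybara's own `find` already waits up to its configured wait time, so this is mainly useful for checks made outside Capybara.

```ruby
# Poll until the block returns a truthy value or the timeout (in seconds)
# elapses. Returns the block's value on success and false on timeout.
def wait_until(timeout, interval = 0.1)
  deadline = Time.now + timeout
  loop do
    result = yield
    return result if result
    return false if Time.now >= deadline
    sleep interval
  end
end
```

A fill could call something like `wait_until(30) { success_message_visible? }` after the final click, where `success_message_visible?` stands in for whatever check the form needs, so that slow connections get more time before the fill is recorded as a failure.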

SSL Handshake errors on house.gov servers recently?

We just fixed an issue that may be affecting others...

We had this all working well, successfully filling out forms on house.gov. Today, however, we're getting things like:

2.1.2 :027 >   session.visit "https://amodei.house.gov/"
 => {"status"=>"fail"}

It seems to be correlated with SSL -- http sites work fine, and we see errors like this looking further:

39.592689 23.218.140.161 -> 10.226.91.215 SSLv3 73 Alert (Level: Fatal, Description: Handshake Failure)

We didn't update any code between last week and this, but maybe something changed on house.gov to cause failure.

Anyway, we've updated some settings and it works again:

if driver == :poltergeist
  # Re-register the Poltergeist driver so PhantomJS negotiates TLSv1
  # instead of its default SSL protocol.
  Capybara.register_driver :poltergeist do |app|
    Capybara::Poltergeist::Driver.new(app, :phantomjs_options => ['--ssl-protocol=TLSv1'])
  end
end
session = Capybara::Session.new(driver, :ignore_ssl_errors => true)

Maybe this will help others.
