move-coop / parsons
A Python library of connectors for the progressive community.
Home Page: https://www.parsonsproject.org/
License: Other
https://api.hustle.com/docs/#operation/createLead
createLead is the priority, but all of the lead endpoints are needed.
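Until the connector lands, here is a rough sketch of what the createLead call might look like, assuming bearer-token auth and the path/payload from the linked docs (both worth verifying there):

```python
import requests

HUSTLE_URI = 'https://api.hustle.com/v1'  # assumed base path for the beta API


def create_lead(token, group_id, first_name, phone_number, **kwargs):
    """Hypothetical sketch of createLead; check the exact path and payload
    fields against https://api.hustle.com/docs/#operation/createLead."""
    payload = {'firstName': first_name, 'phoneNumber': phone_number, **kwargs}
    resp = requests.post(
        f'{HUSTLE_URI}/groups/{group_id}/leads',
        json=payload,
        headers={'Authorization': f'Bearer {token}'},
    )
    resp.raise_for_status()
    return resp.json()
```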
Right now, rs.upsert() creates a table with a timestamp in its name. If the upsert fails, the timestamp table remains. Can we drop it?
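A minimal sketch of the cleanup idea, with a hypothetical wrapper standing in for the real upsert internals:

```python
import datetime


def upsert_with_cleanup(rs, tbl, target_table, primary_key):
    """Hypothetical wrapper, not Parsons' API: stage, upsert, then always
    drop the timestamped staging table."""
    stage = f"{target_table}_stage_{datetime.datetime.now():%Y%m%d%H%M%S}"
    try:
        rs.copy(tbl, stage)  # load the incoming Parsons table into the stage table
        rs.query(f"""
            begin;
            delete from {target_table}
                using {stage}
                where {target_table}.{primary_key} = {stage}.{primary_key};
            insert into {target_table} select * from {stage};
            commit;
        """)
    finally:
        # runs whether or not the upsert succeeded, so failed runs don't
        # leave timestamp tables lying around
        rs.query(f"drop table if exists {stage};")
```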
Oftentimes, when we run a query, we don't need a Parsons table; we just need a single value. This would be a convenience method that allows for that.
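Something like this, say (the method name and return convention are just a suggestion):

```python
def query_single_value(rs, sql):
    """Sketch of the proposed convenience method: run the query and return
    only the first cell of the result, or None for empty results."""
    result = rs.query(sql)
    if result is None or result.num_rows == 0:
        return None
    # assumes row/column indexing on the Parsons Table; adjust to however
    # the Table API actually exposes a single cell
    return result[0][result.columns[0]]


# e.g. row_count = query_single_value(rs, 'select count(*) from schema.my_table')
```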
The docs for the upsert_person method say it returns a Parsons Table object, but when I used it, it returned a dictionary of the vanid and other info about the person (if matched), or the vanid and a status (if unmatched/new). The documentation should be made clear on this point.
The current pattern in parsons is to load every submodule into parsons/__init__.py. This is neither sustainable nor feasible for systems that require smaller upload sizes/memory footprints (e.g. AWS Lambda). The top of that file makes the goal clear: "Eg. This allows for: from parsons import VAN". However, it means every install and deploy of parsons requires an ever-increasing number of dependencies.
Proposal:
- Create a parsons_core package, move the sub-directories there, and then import them in parsons/, e.g. from parsons_core.ngpvan import VAN.
- Support pip install --no-deps parsons.
- Maintain a requirements-core.txt for cross-source dependencies that can/should stay in core.
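Alternatively (or as a complement), lazy loading via PEP 562's module-level __getattr__ would keep from parsons import VAN working while only importing, and therefore only pulling the dependencies of, the connectors actually used. A sketch, assuming Python 3.7+ and the current module layout:

```python
# parsons/__init__.py (sketch)
import importlib

_CONNECTORS = {
    'Table': 'parsons.etl.table',
    'VAN': 'parsons.ngpvan.van',
    # ... one entry per connector, replacing the eager imports
}


def __getattr__(name):  # PEP 562, Python 3.7+
    if name in _CONNECTORS:
        module = importlib.import_module(_CONNECTORS[name])
        return getattr(module, name)
    raise AttributeError(f"module 'parsons' has no attribute {name!r}")
```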
When the credentials are incorrect, the script raises xml.parsers.expat.ExpatError: mismatched tag. Instead, it should check for a <Response [401]> and raise a more descriptive error, like Invalid credentials.

There are a few bugs I discovered in the existing MobileCommons class:
Please additionally test the endpoints that add and delete. Here is some helpful code to get started with!
To reproduce:
from parsons import Table, Airtable
# Assuming all credentials and other data in env vars
at = Airtable()
# Successful call
all_rows = at.get_records()
print(all_rows.num_rows)
# Call with filters that would return zero rows will appear to succeed, but...
no_rows = at.get_records(formula="FIND('SOMETHING UNFINDABLE', {Some Column}) > 0")
# Error kicks in when you try to do anything with the results
print(no_rows.num_rows)
This returns the error ValueError: 'fields' is not in list, which suggests that there may be a problem with the use of unpack_dict() inside of get_records().
Cleaning signup sheets and other human-entered email addresses.
Here's the code I use that may be helpful to y'all. (It's a bit messy.) It uses email-validator and pydash (because I'm sad I don't get to write in node.js).
from email_validator import validate_email, EmailNotValidError
import re
from typing import Any, List
from pydash import predicates
from pydash.strings import trim, reg_exp_replace, clean, deburr
from pydash.collections import every, filter_
from pydash.arrays import flatten_deep


def empty_if_null(value: str) -> str:
    return value if value else ""


def trim_non_printing(value: str) -> str:
    value = trim(value)
    value = reg_exp_replace(value, r'[\u202a\u25a0\u00a0\s]+$', '')
    value = reg_exp_replace(value, r'^[\u202a\u25a0\u00a0\s]+', '')
    return value


def clean_email_string(value: str) -> str:
    if not predicates.is_string(value):
        return ""
    # deburr accents, collapse whitespace, and trim non-printing characters
    value = trim_non_printing(clean(deburr(value)))
    # strip spaces in the middle of the address
    value = reg_exp_replace(value, r'\s+', '')
    return value


email_display_name_re = re.compile(r".+\<(?P<email>[^@]+@[^\>]+)\>")


def fix_common_email_problems(value: str) -> str:
    # pull the address out of display-name forms like 'Jane Doe <jane@example.com>'
    if email_display_name_re.match(value):
        components = email_display_name_re.search(value)
        value = components.group('email')
    value = clean_email_string(value)
    # trim off the start or end: , . : " > < '
    # then trim whitespace again
    value = trim(trim(value, ',.:"><\''))
    # fix common suffix issues (could do a better job with this though...)
    value = reg_exp_replace(value, r',com$', '.com')
    return value


def clean_emails(email: str) -> List[str]:
    def _clean_emails(email: str, already_fixed: bool) -> List[Any]:
        try:
            return [validate_email(email)['email'].lower()]
        except EmailNotValidError as e:
            msg = str(e)
            if 'It must have exactly one @-sign' in msg:
                print(f'trying to split {email}, which has {email.count("@")} @ signs')
                for delim in [';', '/', ',', '|']:
                    # if email is split by this delimiter, do we end up with one @ in each set?
                    # if so, split on that delimiter and treat each as their own address in need
                    # of cleaning.
                    if every(email.split(delim), lambda x: x.count('@') == 1):
                        print(f'the delimiter is {delim}')
                        return list(map(lambda x: _clean_emails(x, False), email.split(delim)))
                print("Can't figure out which delimiter it is, so let's just try cleaning in other ways")
            if not already_fixed:
                return _clean_emails(fix_common_email_problems(email), True)
            print(f'Giving up, {email} is probably just a really bad address due to {msg}')
            return []

    results = _clean_emails(email, False)
    return filter_(flatten_deep(results))
We need to add this sooner rather than later.
The methods largely overlap -- given that Redshift is an offshoot of Postgres -- so we should merge the code bases where possible to reduce lines of code.
This is a common need, so it would be good to have.
VAN charges to add custom fields into Pipeline. Meanwhile, custom fields can be crucial to EveryAction use-cases, and loading a value to a custom field on a person record requires not only the field's numeric ID, but also its group's. Rather than requiring a sync's end users to have to find and submit both IDs as parameters, a single call to get all custom field data would allow comparison to a submitted custom field ID (or even custom field name).
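A rough sketch of that single call, with the base URL and auth simplified from VAN's docs and the helper name made up:

```python
import requests


def get_custom_fields(app_name, api_key, base_url='https://api.securevan.com/v4'):
    """Hypothetical helper: fetch all custom field data in one call. Each
    record should include both customFieldId and customFieldGroupId, so a
    sync can resolve the group from a submitted field ID or name."""
    resp = requests.get(f'{base_url}/customFields', auth=(app_name, f'{api_key}|0'))
    resp.raise_for_status()
    return resp.json()
```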
We'll want to add functions for the following endpoints, listed in priority order:
It would be great if you could write tables to Postgres servers as well as Redshift. Being able to run queries against them and transfer data out of them would be a bonus.
It appears as though the VAN.get_events() method is not paginating correctly.
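For reference, a sketch of the loop the fix presumably needs, assuming the VAN API's documented items/nextPageLink response shape:

```python
import requests


def get_all_events(session, url, params=None):
    """Collect every page of results by following nextPageLink."""
    items = []
    while url:
        resp = session.get(url, params=params)
        resp.raise_for_status()
        data = resp.json()
        items.extend(data.get('items', []))
        # keep following nextPageLink until the API stops returning one
        url, params = data.get('nextPageLink'), None
    return items
```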
The Parsons GCS get_blob and download_blob methods don't need to get/list the bucket first. Skipping that step would allow them to run with a more limited set of permissions.
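For example, with google-cloud-storage, client.bucket() and bucket.blob() build references without any API calls, so only storage.objects.get is needed (a sketch):

```python
from google.cloud import storage


def download_blob(bucket_name, blob_name, destination_path):
    client = storage.Client()
    bucket = client.bucket(bucket_name)  # no API call, unlike get_bucket()
    blob = bucket.blob(blob_name)        # no API call, unlike get_blob()
    blob.download_to_filename(destination_path)
```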
Hustle just released a beta API, and we're eager to get it incorporated into Parsons.
Docs here: https://api.hustle.com/docs/
Categories of endpoints in priority order are:
Hey! I mentioned this on the previous week's webinar but am only now making an issue about it.
Parsons has a very important goal of lowering the barrier for people who wouldn't typically contribute to a project or consider themselves technical enough to use Parsons. If we are trying to dramatically increase the number of Parsons users and contributors, then we need to focus more on the problems people encounter before they even get to using Parsons in their code. Things like (speaking from very specific personal experience): how do I set up a development environment?
I need at least one other collaborator to make a short guide on how to go from knowing very little code (but having a lot of motivation to do something cool) to having an environment set up and ready to use Parsons (as in, pip installed and ready to REPL some .py), and then building a small "hello world" type script using a feature. Maybe something like splitting a large file into specific chunks, or turning JSON into CSV: ideally something practical that real-world people are always trying to do and that is actually really straightforward in Parsons (see the sketch below).
I'm more than happy to take the lead on this, but I would need help splitting up the work for it to happen on a realistic timeline and be good enough to be useful.
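For a sense of scale, the kind of "hello world" script described above could be as small as this (a sketch, assuming a local contacts.json):

```python
from parsons import Table

tbl = Table.from_json('contacts.json')
print(f'Loaded {tbl.num_rows} rows with columns: {tbl.columns}')
tbl.to_csv('contacts.csv')
```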
See this, to start - https://secure.actblue.com/docs/webhooks
Users are seeing a "Permission denied" error on Windows machines when running the Redshift.query method.
It looks like there is a bug in how Parsons manages temporary files on Windows.
Parsons uses the Python standard library's tempfile.NamedTemporaryFile to create and track temporary files. The documentation says:
Whether the name can be used to open the file a second time, while the named temporary file is still open, varies across platforms (it can be so used on Unix; it cannot on Windows NT or later).
So, the problem comes down to Parsons opening a temporary file, and then attempting to open that same file again later on, which doesn't work on Windows.
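The usual workaround is to create the file with delete=False, close it before reopening it by name, and clean up explicitly; a sketch:

```python
import os
import tempfile

tmp = tempfile.NamedTemporaryFile(suffix='.csv', delete=False)
try:
    tmp.write(b'col_a,col_b\n1,2\n')
    tmp.close()                      # close before any second open
    with open(tmp.name, 'rb') as f:  # now safe on Windows
        data = f.read()
finally:
    os.unlink(tmp.name)              # delete=False means we remove it ourselves
```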
Some helpful existing documentation: https://github.com/4ndygu/civicrm_osdi
My suspicion is that this is going to be a pretty common use case. The idea is to be able to pass in a phone number and add it to the global DNC list.
The VAN connector works, but it was the first one that we built and needs some love.
- GET methods that paginate
- GET methods
- POST methods
- DELETE methods
- PATCH methods

The Redshift integration code creates a SQL string with the copy function. However, one of the SQL data conversion parameters is incorrect on line 57 of the code:
if nullas: sql += f"nullas {nullas}"
According to the Redshift data conversion parameters documentation, the correct version of this SQL should be "NULL AS", not "NULLAS". Therefore, the Python that is using an f-string to create the SQL query should read as follows:
if nullas: sql += f"null as {nullas}"
When running an upsert, if the destination table does not exist, the call currently fails. Instead, it should just run the Redshift.copy() method.
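A sketch of the proposed behavior, assuming Redshift.table_exists() (the wrapper itself is hypothetical):

```python
def upsert_or_copy(rs, tbl, target_table, primary_key):
    if rs.table_exists(target_table):
        rs.upsert(tbl, target_table, primary_key)
    else:
        # no destination table yet, so a plain copy is all that's needed
        rs.copy(tbl, target_table)
```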
This could probably just be a wrapper around https://github.com/simple-salesforce/simple-salesforce. I will probably have to use the above on a member project, and may be able to contribute to this from those lessons learned.
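A sketch of how thin the wrapper could be (the class name and surface area are just placeholders):

```python
from parsons import Table
from simple_salesforce import Salesforce


class SalesforceConnector:
    def __init__(self, username, password, security_token):
        self.client = Salesforce(username=username,
                                 password=password,
                                 security_token=security_token)

    def query(self, soql):
        # simple-salesforce returns {'records': [...]}; strip the
        # 'attributes' metadata and hand back a Parsons Table
        records = self.client.query_all(soql)['records']
        for record in records:
            record.pop('attributes', None)
        return Table(records)
```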
We have code in the TMC private repo, and can probably easily create a class from it.
h/t to Gerard at ACRONYM, from whom I originally stole said code.
It would be great to be able to compare 2 redshift tables, in addition to a parsons table and a redshift table.
Hi there,
I found an error in the People class in the people.py file. I got the following error when trying to use the find_people method: 'VAN' object has no attribute 'post_request'. It looks like this is because it is calling the get_request and post_request methods from the VANConnector class on the VAN class.
I think the simplest fix may be to change the People constructor to:
class People(object):
    def __init__(self, van_object):
        self.connection = van_object.connection
So that self.connection in the People class refers to the VAN connection object rather than the VAN object.
As the incidence of VAN usage alongside other tools increases, so will the need to create syncs that move data into VAN. Using the Bulk Import endpoint rather than individual upserts could streamline syncs into VAN and make them more feasible at scale.
We should add in phone number validation/parsing tools.
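One option is to wrap the phonenumbers library (a port of Google's libphonenumber); the function below is just an illustration:

```python
import phonenumbers


def parse_phone(raw, default_region='US'):
    """Return the E.164 form of a phone number, or None if it isn't valid."""
    try:
        number = phonenumbers.parse(raw, default_region)
    except phonenumbers.NumberParseException:
        return None
    if not phonenumbers.is_valid_number(number):
        return None
    return phonenumbers.format_number(number, phonenumbers.PhoneNumberFormat.E164)
```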
The error message TypeError: string indices must be integers comes up when using toggle_activist_code(), with the following traceback:
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/parsons/ngpvan/people.py in apply_response(self, id, response, id_type, contact_type_id, input_type_id, date_canvassed, result_code_id)
435 logger.info(f'{id_type.upper()} {id} updated.')
436 else:
--> 437 logger.info(f"{r[1]['errors'][0]['code']}: {r[1]['errors'][0]['text']}")
438 raise ValueError(f"{r[1]['errors'][0]['text']}")
439
It looks like an API error from VAN is coming in as a string, but the logger in Parsons assumes it is a dictionary.
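A hedged sketch of a defensive fix for lines 437-438 above, using the same names as the traceback (untested):

```python
# VAN sometimes returns the error payload as a plain string rather than a
# dict with an 'errors' list, so check the shape before indexing into it
error = r[1]
if isinstance(error, dict):
    error = f"{error['errors'][0]['code']}: {error['errors'][0]['text']}"
logger.info(error)
raise ValueError(error)
```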
When you pass a record to upsert_person, it validates that "the minimum combination of fields were passed." The VAN API doesn't actually reject records without this info; it will just always create new records if there's insufficient info to match on.
I understand that some users may want this validation to avoid adding duplicates, but others may still want to add new records for people they have sparser information on.
I personally have a number of syncs that use to_redshift(if_exists='append'), and the call often fails because of varchar length. I am proposing a few changes:
I can haz OSDI support? ;-)
@elyse-weiss - This would be a great contribution for you to work on.
van.approve_scores(score_ids, raise_on_error=False)
Pass in a list of score IDs. If they can be approved, then approve them. Allow the user to specify whether it should fail when a score cannot be approved, or just log and return nothing.
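A sketch of what the method could look like; the per-score approval call is a hypothetical stand-in for whatever wraps VAN's score-update endpoint:

```python
import logging

logger = logging.getLogger(__name__)


def approve_scores(self, score_ids, raise_on_error=False):
    for score_id in score_ids:
        try:
            self._approve_score(score_id)  # hypothetical internal helper
            logger.info(f'Score {score_id} approved.')
        except Exception as exc:
            if raise_on_error:
                raise
            logger.info(f'Score {score_id} could not be approved: {exc}')
```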
Other classes have clear connection info at the top of their documentation. We should add this for MobileCommons.
I think we are most interested in the following endpoints:
s3.put_file allows folks to set permissions on a file, but Table.to_s3_csv does not, and defaults to full permissions.
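A sketch of the fix, with the parameter name assumed: accept the same ACL that s3.put_file supports and forward it instead of defaulting.

```python
from parsons import S3


def to_s3_csv(self, bucket, key, acl='bucket-owner-full-control', **kwargs):
    csv_path = self.to_csv()  # Table.to_csv() writes a temp file and returns its path
    S3().put_file(bucket, key, csv_path, acl=acl, **kwargs)
```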
@eliotst feel free to edit!
On line 83 of utilities/api_connector.py:
r = self.request(url, 'GET', params=None)
But I think it should be:
r = self.request(url, 'GET', params=params)
Unless it intentionally isn't?
Similar to the Redshift pattern, create the Table.to_postgres() and Table.from_postgres() convenience methods.
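A sketch of the proposed methods in use, with signatures assumed to mirror the Redshift equivalents:

```python
from parsons import Table

tbl = Table.from_csv('people.csv')
tbl.to_postgres('staging.people', if_exists='drop')                     # write
sample = Table.from_postgres('select * from staging.people limit 10')  # read
print(sample.num_rows)
```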
For appending to existing target tables when the source tables don't have all the columns and/or we want to use default values for one or more columns (e.g. timestamps).