move-coop / parsons
A Python library of connectors for the progressive community.
Home Page: https://www.parsonsproject.org/
License: Other
https://api.hustle.com/docs/#operation/createLead
createLead is the priority, but all of the lead endpoints are needed.
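Until the connector lands, here is a rough sketch of what the createLead call might look like, assuming bearer-token auth and the path/payload from the linked docs (both worth verifying there):

```python
import requests

HUSTLE_URI = 'https://api.hustle.com/v1'  # assumed base path for the beta API


def create_lead(token, group_id, first_name, phone_number, **kwargs):
    """Hypothetical sketch of createLead; check the exact path and payload
    fields against https://api.hustle.com/docs/#operation/createLead."""
    payload = {'firstName': first_name, 'phoneNumber': phone_number, **kwargs}
    resp = requests.post(
        f'{HUSTLE_URI}/groups/{group_id}/leads',
        json=payload,
        headers={'Authorization': f'Bearer {token}'},
    )
    resp.raise_for_status()
    return resp.json()
```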
Right now, rs.upsert() creates a table with a timestamp in its name. If the upsert fails, the timestamp table remains. Can we drop it?
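A minimal sketch of the cleanup idea, with a hypothetical wrapper standing in for the real upsert internals:

```python
import datetime


def upsert_with_cleanup(rs, tbl, target_table, primary_key):
    """Hypothetical wrapper, not Parsons' API: stage, upsert, then always
    drop the timestamped staging table."""
    stage = f"{target_table}_stage_{datetime.datetime.now():%Y%m%d%H%M%S}"
    try:
        rs.copy(tbl, stage)  # load the incoming Parsons table into the stage table
        rs.query(f"""
            begin;
            delete from {target_table}
                using {stage}
                where {target_table}.{primary_key} = {stage}.{primary_key};
            insert into {target_table} select * from {stage};
            commit;
        """)
    finally:
        # runs whether or not the upsert succeeded, so failed runs don't
        # leave timestamp tables lying around
        rs.query(f"drop table if exists {stage};")
```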
Oftentimes, when we run a query, we don't need a Parsons table; we just need a single value. This would be a convenience method that allows for that.
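Something like this, say (the method name and return convention are just a suggestion):

```python
def query_single_value(rs, sql):
    """Sketch of the proposed convenience method: run the query and return
    only the first cell of the result, or None for empty results."""
    result = rs.query(sql)
    if result is None or result.num_rows == 0:
        return None
    # assumes row/column indexing on the Parsons Table; adjust to however
    # the Table API actually exposes a single cell
    return result[0][result.columns[0]]


# e.g. row_count = query_single_value(rs, 'select count(*) from schema.my_table')
```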
The docs for the upsert_person method say it returns a Parsons Table object, but when I used it, it returned a dictionary of the vanid and other info about the person (if matched), or the vanid and a status (if unmatched/new). The documentation should be made clear on this point.
The current pattern in parsons is to load every submodule into parsons/__init__.py. This is neither sustainable nor feasible for systems that require smaller upload sizes/memory footprints (e.g. AWS Lambda). The top of that file makes the goal clear: "Eg. This allows for: from parsons import VAN". However, it means every install and deploy of parsons requires an ever-increasing number of dependencies.
Proposal:
- Create a parsons_core package, move the sub-directories there, and then import them in parsons/, e.g. from parsons_core.ngpvan import VAN.
- Support pip install --no-deps parsons.
- Maintain a requirements-core.txt for cross-source dependencies that can/should stay in core.
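Alternatively (or as a complement), lazy loading via PEP 562's module-level __getattr__ would keep from parsons import VAN working while only importing, and therefore only pulling the dependencies of, the connectors actually used. A sketch, assuming Python 3.7+ and the current module layout:

```python
# parsons/__init__.py (sketch)
import importlib

_CONNECTORS = {
    'Table': 'parsons.etl.table',
    'VAN': 'parsons.ngpvan.van',
    # ... one entry per connector, replacing the eager imports
}


def __getattr__(name):  # PEP 562, Python 3.7+
    if name in _CONNECTORS:
        module = importlib.import_module(_CONNECTORS[name])
        return getattr(module, name)
    raise AttributeError(f"module 'parsons' has no attribute {name!r}")
```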
When the credentials are incorrect, the script raises xml.parsers.expat.ExpatError: mismatched tag. Instead, it should check for a <Response [401]> and raise a more descriptive error, like Invalid credentials.

There are a few bugs I discovered in the existing MobileCommons class:
Please additionally test the endpoints that add and delete. Here is some helpful code to get started with!
To reproduce:
from parsons import Table, Airtable
# Assuming all credentials and other data in env vars
at = Airtable()
# Successful call
all_rows = at.get_records()
print(all_rows.num_rows)
# Call with filters that would return zero rows will appear to succeed, but...
no_rows = at.get_records(formula="FIND('SOMETHING UNFINDABLE', {Some Column}) > 0")
# Error kicks in when you try to do anything with the results
print(no_rows.num_rows)
This returns the error ValueError: 'fields' is not in list, which suggests that there may be a problem with the use of unpack_dict() inside of get_records().
Cleaning signup sheets and other human-entered email addresses.
Here's the code I use that may be helpful to y'all. (It's a bit messy.) It uses email-validator and pydash (because I'm sad I don't get to write in node.js).
from email_validator import validate_email, EmailNotValidError
import re
from typing import Any, List
from pydash import predicates
from pydash.strings import trim, reg_exp_replace, clean, deburr
from pydash.collections import every, filter_
from pydash.arrays import flatten_deep


def empty_if_null(value: str) -> str:
    return value if value else ""


def trim_non_printing(value: str) -> str:
    value = trim(value)
    value = reg_exp_replace(value, r'[\u202a\u25a0\u00a0\s]+$', '')
    value = reg_exp_replace(value, r'^[\u202a\u25a0\u00a0\s]+', '')
    return value


def clean_email_string(value: str) -> str:
    if not predicates.is_string(value):
        return ""
    # deburr accents, collapse whitespace, and trim non-printing characters
    value = trim_non_printing(clean(deburr(value)))
    # strip spaces in the middle of the address
    value = reg_exp_replace(value, r'\s+', '')
    return value


email_display_name_re = re.compile(r".+\<(?P<email>[^@]+@[^\>]+)\>")


def fix_common_email_problems(value: str) -> str:
    # pull the address out of display-name forms like 'Jane Doe <jane@example.com>'
    if email_display_name_re.match(value):
        components = email_display_name_re.search(value)
        value = components.group('email')
    value = clean_email_string(value)
    # trim off the start or end: , . : " > < '
    # then trim whitespace again
    value = trim(trim(value, ',.:"><\''))
    # fix common suffix issues (could do a better job with this though...)
    value = reg_exp_replace(value, r',com$', '.com')
    return value


def clean_emails(email: str) -> List[str]:
    def _clean_emails(email: str, already_fixed: bool) -> List[Any]:
        try:
            return [validate_email(email)['email'].lower()]
        except EmailNotValidError as e:
            msg = str(e)
            if 'It must have exactly one @-sign' in msg:
                print(f'trying to split {email}, which has {email.count("@")} @ signs')
                for delim in [';', '/', ',', '|']:
                    # if email is split by this delimiter, do we end up with one @ in each set?
                    # if so, split on that delimiter and treat each as their own address in need
                    # of cleaning.
                    if every(email.split(delim), lambda x: x.count('@') == 1):
                        print(f'the delimiter is {delim}')
                        return list(map(lambda x: _clean_emails(x, False), email.split(delim)))
                print("Can't figure out which delimiter it is, so let's just try cleaning in other ways")
            if not already_fixed:
                return _clean_emails(fix_common_email_problems(email), True)
            print(f'Giving up, {email} is probably just a really bad address due to {msg}')
            return []

    results = _clean_emails(email, False)
    return filter_(flatten_deep(results))
We need to add this sooner rather than later.
The methods largely overlap -- given that Redshift is an offshoot of Postgres -- so we should merge the code bases where possible to reduce lines of code.
This is a common need, so it would be good to have.
VAN charges to add custom fields into Pipeline. Meanwhile, custom fields can be crucial to EveryAction use-cases, and loading a value to a custom field on a person record requires not only the field's numeric ID, but also its group's. Rather than requiring a sync's end users to have to find and submit both IDs as parameters, a single call to get all custom field data would allow comparison to a submitted custom field ID (or even custom field name).
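A rough sketch of that single call, with the base URL and auth simplified from VAN's docs and the helper name made up:

```python
import requests


def get_custom_fields(app_name, api_key, base_url='https://api.securevan.com/v4'):
    """Hypothetical helper: fetch all custom field data in one call. Each
    record should include both customFieldId and customFieldGroupId, so a
    sync can resolve the group from a submitted field ID or name."""
    resp = requests.get(f'{base_url}/customFields', auth=(app_name, f'{api_key}|0'))
    resp.raise_for_status()
    return resp.json()
```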
We'll want to add functions for the following endpoints, listed in priority order:
It would be great if you could write tables to Postgres servers as well as Redshift. Being able to run queries against them and transfer data out of them would be a bonus.
It appears as though the VAN.get_events() method is not paginating correctly.
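For reference, a sketch of the loop the fix presumably needs, assuming the VAN API's documented items/nextPageLink response shape:

```python
import requests


def get_all_events(session, url, params=None):
    """Collect every page of results by following nextPageLink."""
    items = []
    while url:
        resp = session.get(url, params=params)
        resp.raise_for_status()
        data = resp.json()
        items.extend(data.get('items', []))
        # keep following nextPageLink until the API stops returning one
        url, params = data.get('nextPageLink'), None
    return items
```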
The Parsons GCS get_blob and download_blob methods don't need to get/list the bucket first. Skipping that step would allow them to run with a more limited set of permissions.
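For example, with google-cloud-storage, client.bucket() and bucket.blob() build references without any API calls, so only storage.objects.get is needed (a sketch):

```python
from google.cloud import storage


def download_blob(bucket_name, blob_name, destination_path):
    client = storage.Client()
    bucket = client.bucket(bucket_name)  # no API call, unlike get_bucket()
    blob = bucket.blob(blob_name)        # no API call, unlike get_blob()
    blob.download_to_filename(destination_path)
```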
Hustle just released a beta API, and we're eager to get it incorporated into Parsons.
Docs here: https://api.hustle.com/docs/
Categories of endpoints in priority order are:
Hey! I mentioned this on the previous week's webinar but am only now making an issue about it.
Parsons has a very important goal of lowering the barrier for people who wouldn't typically contribute to a project or consider themselves technical enough to use Parsons. If we are trying to dramatically increase the number of Parsons users and contributors, then we need to focus more on the problems people encounter before they even get to using Parsons in their code. Things like (speaking from very specific personal experience): how do I set up a development environment?
I need at least one other collaborator to make a short guide on how to go from knowing very little code (but having a lot of motivation to do something cool) to having an environment set up and ready to use Parsons (as in, pip installed and ready to REPL some .py), and then building a small "hello world" type script using a feature. Maybe something like splitting a large file into specific chunks, or turning JSON into CSV: ideally something practical that real-world people are always trying to do and that is actually really straightforward in Parsons (see the sketch below).
I'm more than happy to take the lead on this, but I would need help splitting up the work for it to happen on a realistic timeline and be good enough to be useful.
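For a sense of scale, the kind of "hello world" script described above could be as small as this (a sketch, assuming a local contacts.json):

```python
from parsons import Table

tbl = Table.from_json('contacts.json')
print(f'Loaded {tbl.num_rows} rows with columns: {tbl.columns}')
tbl.to_csv('contacts.csv')
```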
See this, to start - https://secure.actblue.com/docs/webhooks
Users are seeing a "Permission denied" error on Windows machines when running the Redshift.query method.
It looks like there is a bug in how Parsons manages temporary files on Windows.
Parsons uses the Python standard library's tempfile.NamedTemporaryFile to create and track temporary files. The documentation says:
Whether the name can be used to open the file a second time, while the named temporary file is still open, varies across platforms (it can be so used on Unix; it cannot on Windows NT or later).
So, the problem comes down to Parsons opening a temporary file, and then attempting to open that same file again later on, which doesn't work on Windows.
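The usual workaround is to create the file with delete=False, close it before reopening it by name, and clean up explicitly; a sketch:

```python
import os
import tempfile

tmp = tempfile.NamedTemporaryFile(suffix='.csv', delete=False)
try:
    tmp.write(b'col_a,col_b\n1,2\n')
    tmp.close()                      # close before any second open
    with open(tmp.name, 'rb') as f:  # now safe on Windows
        data = f.read()
finally:
    os.unlink(tmp.name)              # delete=False means we remove it ourselves
```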
Some helpful existing documentation: https://github.com/4ndygu/civicrm_osdi
My suspicion is that this is going to be a pretty common use case. The idea is to be able to pass in a phone number and add it to the global DNC list.
The VAN connector works, but it was the first one that we built and needs some love.
- GET methods that paginate
- GET methods
- POST methods
- DELETE methods
- PATCH methods

The Redshift integration code creates a SQL string with the copy function. However, one of the SQL data conversion parameters is incorrect on line 57 of the code:
if nullas: sql += f"nullas {nullas}"
According to the Redshift data conversion parameters documentation, the correct version of this SQL should be "NULL AS", not "NULLAS". Therefore, the Python that is using an f-string to create the SQL query should read as follows:
if nullas: sql += f"null as {nullas}"
When running an upsert, if the destination table does not exist, the call currently fails. Instead, it should just run the Redshift.copy() method.
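A sketch of the proposed behavior, assuming Redshift.table_exists() (the wrapper itself is hypothetical):

```python
def upsert_or_copy(rs, tbl, target_table, primary_key):
    if rs.table_exists(target_table):
        rs.upsert(tbl, target_table, primary_key)
    else:
        # no destination table yet, so a plain copy is all that's needed
        rs.copy(tbl, target_table)
```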
This could probably just be a wrapper around https://github.com/simple-salesforce/simple-salesforce. I will probably have to use the above on a member project, and may be able to contribute to this from those lessons learned.
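A sketch of how thin the wrapper could be (the class name and surface area are just placeholders):

```python
from parsons import Table
from simple_salesforce import Salesforce


class SalesforceConnector:
    def __init__(self, username, password, security_token):
        self.client = Salesforce(username=username,
                                 password=password,
                                 security_token=security_token)

    def query(self, soql):
        # simple-salesforce returns {'records': [...]}; strip the
        # 'attributes' metadata and hand back a Parsons Table
        records = self.client.query_all(soql)['records']
        for record in records:
            record.pop('attributes', None)
        return Table(records)
```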
We have code in the TMC private repo, and can probably easily create a class from it.
h/t to Gerard at ACRONYM, from whom I originally stole said code.
It would be great to be able to compare 2 redshift tables, in addition to a parsons table and a redshift table.
Hi there,
I found an error in the People class in the people.py file. I got the following error when trying to use the find_people method: 'VAN' object has no attribute 'post_request'. It looks like this is because it is calling the get_request and post_request methods from the VANConnector class on the VAN class.
I think the simplest fix may be to change the People constructor to:
class People(object):
    def __init__(self, van_object):
        self.connection = van_object.connection
So that self.connection in the People class refers to the VAN connection object rather than the VAN object.
As the incidence of VAN usage alongside other tools increases, so will the need to create syncs that move data into VAN. Using the Bulk Import endpoint rather than individual upserts could streamline syncs into VAN and make them more feasible at scale.
We should add in phone number validation/parsing tools.
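One option is to wrap the phonenumbers library (a port of Google's libphonenumber); the function below is just an illustration:

```python
import phonenumbers


def parse_phone(raw, default_region='US'):
    """Return the E.164 form of a phone number, or None if it isn't valid."""
    try:
        number = phonenumbers.parse(raw, default_region)
    except phonenumbers.NumberParseException:
        return None
    if not phonenumbers.is_valid_number(number):
        return None
    return phonenumbers.format_number(number, phonenumbers.PhoneNumberFormat.E164)
```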
The error message TypeError: string indices must be integers comes up when using toggle_activist_code(), with the following traceback:
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/parsons/ngpvan/people.py in apply_response(self, id, response, id_type, contact_type_id, input_type_id, date_canvassed, result_code_id)
435 logger.info(f'{id_type.upper()} {id} updated.')
436 else:
--> 437 logger.info(f"{r[1]['errors'][0]['code']}: {r[1]['errors'][0]['text']}")
438 raise ValueError(f"{r[1]['errors'][0]['text']}")
439
It looks like an API error from VAN is coming in as a string, but the logger in Parsons assumes it is a dictionary.
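A hedged sketch of a defensive fix for lines 437-438 above, using the same names as the traceback (untested):

```python
# VAN sometimes returns the error payload as a plain string rather than a
# dict with an 'errors' list, so check the shape before indexing into it
error = r[1]
if isinstance(error, dict):
    error = f"{error['errors'][0]['code']}: {error['errors'][0]['text']}"
logger.info(error)
raise ValueError(error)
```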
When you pass a record to upsert_person, it validates that "the minimum combination of fields were passed." The VAN API doesn't actually reject records without this info; it will just always create new records if there's insufficient info to match on.
I understand that some users may want this validation to avoid adding duplicates, but others may still want to add new records for people they have sparser information on.
I personally have a number of syncs that use to_redshift(if_exists='append'), and the call often fails because of varchar length. I am proposing a few changes:
I can haz OSDI support? ;-)
@elyse-weiss - This would be a great contribution for you to work on.
van.approve_scores(score_ids, raise_on_error=False)
Pass in a list of score IDs. If they can be approved, then approve them. Allow the user to specify whether it should fail when a score cannot be approved, or just log and return nothing.
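A sketch of what the method could look like; the per-score approval call is a hypothetical stand-in for whatever wraps VAN's score-update endpoint:

```python
import logging

logger = logging.getLogger(__name__)


def approve_scores(self, score_ids, raise_on_error=False):
    for score_id in score_ids:
        try:
            self._approve_score(score_id)  # hypothetical internal helper
            logger.info(f'Score {score_id} approved.')
        except Exception as exc:
            if raise_on_error:
                raise
            logger.info(f'Score {score_id} could not be approved: {exc}')
```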
Other classes have clear connection info at the top of their documentation. We should add this for MobileCommons.
I think we are most interested in the following endpoints:
s3.put_file allows folks to set permissions on a file, but Table.to_s3_csv does not, and defaults to full permissions.
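A sketch of the fix, with the parameter name assumed: accept the same ACL that s3.put_file supports and forward it instead of defaulting.

```python
from parsons import S3


def to_s3_csv(self, bucket, key, acl='bucket-owner-full-control', **kwargs):
    csv_path = self.to_csv()  # Table.to_csv() writes a temp file and returns its path
    S3().put_file(bucket, key, csv_path, acl=acl, **kwargs)
```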
@eliotst feel free to edit!
On line 83 of utilities/api_connector.py:
r = self.request(url, 'GET', params=None)
But I think it should be:
r = self.request(url, 'GET', params=params)
Unless it intentionally isn't?
Similar to the Redshift pattern, create the Table.to_postgres() and Table.from_postgres() convenience methods.
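A sketch of the proposed methods in use, with signatures assumed to mirror the Redshift equivalents:

```python
from parsons import Table

tbl = Table.from_csv('people.csv')
tbl.to_postgres('staging.people', if_exists='drop')                     # write
sample = Table.from_postgres('select * from staging.people limit 10')  # read
print(sample.num_rows)
```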
For appending to existing target tables when the source tables don't have all the columns and/or we want to use default values for one or more columns (e.g. timestamps).