jasonrig / address-net
License: MIT License
A package to structure Australian addresses
Hello,
I have used your package and it is very useful. However, my data is Vietnamese text in Unicode, and the package does not work well on it. Can I use your code to retrain a new model on my own Vietnamese data? If so, could you please help me? Thank you very much.
For example, in the Unicode address "Số nhà 25, ngõ 294 Kim Mã, Phường Kim Mã, Quận Ba Đình, Thành phố Hà Nội", "street" becomes "ngõ", "state" becomes "Quận", and so on.
Sorry for my bad English.
Looking forward to hearing from you soon.
Python 3.5.2
import addressnet.predict as address_lib
print(address_lib.predict_one("Jubilee Street Newport,VIC,3015,AU")["locality_name"])
# prints NEWPORTU; should be NEWPORT
print(address_lib.predict_one("Jubilee Street Newport,VIC,3015")["locality_name"])
# prints NEWPORT; correct
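One possible workaround, until the model itself handles it, is to strip a trailing country token before calling predict_one. The sketch below is only an assumption about what would help: strip_country_suffix is a hypothetical helper, not part of addressnet, and the list of country spellings is illustrative.

```python
import re

# Hypothetical helper, not part of addressnet.
# Matches a trailing country token such as ",AU" or ", Australia".
_COUNTRY_SUFFIX = re.compile(r"[\s,]+(AU|AUS|AUSTRALIA)\s*$", re.IGNORECASE)


def strip_country_suffix(address: str) -> str:
    """Remove a trailing country token from an address string."""
    return _COUNTRY_SUFFIX.sub("", address)


cleaned = strip_country_suffix("Jubilee Street Newport,VIC,3015,AU")
print(cleaned)  # this string would then be passed to predict_one
```

The separator is required before the country token, so a locality that merely ends in the letters "AU" is left alone.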
Hi,
When I apply addressnet to an address like "677 Timpany BLVD", predict_one returns street_type as "BLVD" instead of "BOULEVARD".
Apart from this, I want to apply it to USA addresses. Could you please guide me on that?
Any help would be appreciated.
Thanks
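If the abbreviation survives in the output, it can be expanded in a post-processing step. This is a minimal sketch under assumptions: the mapping below is illustrative and is not addressnet's own lookup table, and expand_street_type is a hypothetical helper applied to the dictionary that predict_one returns.

```python
# Illustrative mapping only; these entries are assumptions,
# not the lookup table shipped with addressnet.
STREET_TYPE_EXPANSIONS = {
    "BLVD": "BOULEVARD",
    "ST": "STREET",
    "RD": "ROAD",
    "AVE": "AVENUE",
}


def expand_street_type(result: dict) -> dict:
    """Return a copy of a prediction dict with street_type expanded if known."""
    expanded = dict(result)
    street_type = expanded.get("street_type")
    if street_type is not None:
        key = street_type.strip().upper().rstrip(".")
        # Unknown values are passed through unchanged.
        expanded["street_type"] = STREET_TYPE_EXPANSIONS.get(key, street_type)
    return expanded


print(expand_street_type({"street_name": "TIMPANY", "street_type": "BLVD"}))
```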
For addresses like the following, the output is garbled.
Example 1:
'rathbone mirrison bakery kenmore road'
gives:
{"building_name": "RATHBONE MIRRISONB", "street_name": " AKERY KENMORE", "street_type": "ROAD"}
Example 2:
'pontefract general infirmary southgate pontefract'
gives:
{"street_name": "PONTEFRAT", "building_name": "CGENERALINFIRMARYSOUTHATE", "locality_name": "GPONTEFR", "state": "AUSTRALIAN CAPITAL TERRITORY", "street_type": "COURT"}
Can you tell me how this can be fixed? It seems like a problem in the script rather than in the model.
I'm trying to re-implement this in Keras. What is the output shape of this model? Does it output the indices of the text that falls in each category, or something completely different?
The inference batch size is currently fixed to 1. This should be configurable.
See: https://github.com/jasonrig/address-net/blob/master/addressnet/dataset.py#L499
Hi,
An (unfortunately) common abbreviation for level that I have come across is a simple L. For example: UNIT 900, L 9, 50 THINGO ST, HOOHAAVILLE, VIC 3000. I even tried adding L to lookups.py and deleting the cache, but to no avail. The kind of result I get is:
"flat_number": "9009",
"flat_number_prefix": "L",
"flat_type": "UNIT",
"locality_name": "HOOHAAVILLE",
"number_first": "50",
"original": "UNIT 900, L 9, 50 THINGO ST, HOOHAAVILLE, VIC 3000",
"postcode": "3000",
"state": "VICTORIA",
"street_name": "THINGO",
"street_type": "STREET"
Writing L9 with no space instead drags the 9 into the 50.
Is there a way to get L added and recognised?
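Since editing lookups.py alone did not change the result, one option is to rewrite the abbreviation before the address reaches the model. The sketch below assumes that the pretrained model handles the spelled-out word LEVEL better than a bare L, which is untested here; expand_level is a hypothetical helper, not part of addressnet.

```python
import re

# Matches a standalone "L" followed by a level number, e.g. "L 9", "L9", "l 12".
# The leading \b keeps it from firing inside words such as HOOHAAVILLE or HILL.
_LEVEL_ABBREV = re.compile(r"\bL\s*(\d+)\b", re.IGNORECASE)


def expand_level(address: str) -> str:
    """Rewrite 'L 9' or 'L9' as 'LEVEL 9' before prediction."""
    return _LEVEL_ABBREV.sub(r"LEVEL \1", address)


print(expand_level("UNIT 900, L 9, 50 THINGO ST, HOOHAAVILLE, VIC 3000"))
```

The rule is deliberately narrow. Addresses where a lone L followed by digits means something else (a lot or lane reference, say) would need an extra check.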
Hello, first of all, thank you for the opportunity to use the code you wrote.
I'm trying to train a new model, but the results it produces are badly wrong, for example:
{'street_name': '168A SEPARATION STREET NO', 'locality_name': 'COTE, VIC 3070'}
The code I use is below. Could you share your training code, or point out where I might have gone wrong?
Thank you so much.
import argparse
import datetime
import os

import tensorflow as tf

import addressnet.dataset as dataset
from addressnet.model import model_fn


def _get_estimator(model_fn, model_dir):
    # Checkpoint and log every 2000 steps, keeping the five most recent checkpoints.
    config = tf.estimator.RunConfig(tf_random_seed=17, keep_checkpoint_max=5, log_step_count_steps=2000,
                                    save_checkpoints_steps=2000)
    return tf.estimator.Estimator(model_fn=model_fn, model_dir=model_dir, config=config)


def train(tfrecord_input_file: str, model_output_file: str):
    # The model directory is named after the input file.
    input_file_only = os.path.basename(tfrecord_input_file)
    model_output_file_path = f'{model_output_file}/{input_file_only}'
    #print('Start training...')
    #print(f'tfrecord_input_file={tfrecord_input_file}')
    #print(f'model_output_file={model_output_file}')
    #print('Get estimator...')
    address_net_estimator = _get_estimator(model_fn, model_output_file_path)
    #print('Load dataset...')
    # Passed to train() and evaluate() below as the input function.
    tfdataset = dataset.dataset(tfrecord_input_file)
    #print('Training model...')
    start = datetime.datetime.now()
    model = address_net_estimator.train(tfdataset)
    end = datetime.datetime.now()
    print('Evaluate model...')
    evaluation = model.evaluate(tfdataset)
    print(f'evaluation={evaluation}')
    # end - start is a timedelta, so convert it to seconds before printing.
    print(f'Finished training in {(end - start).total_seconds():.0f} sec on file {input_file_only}. '
          f'Model saved to {model_output_file_path}')


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--tfrecord_input_file", help="Tfrecord input file from generate_tf_records.py")
    parser.add_argument("--model_output_file", help="Model output file")
    args = parser.parse_args()
    train(args.tfrecord_input_file, args.model_output_file)
I am trying to retrain a new model, but I am finding it difficult.
As you said, "you are free to train this model using the model_fn provided" https://github.com/jasonrig/address-net#pretrained-model
So I have a question:
Is the model_fn function in model.py meant for training a new model? If not, how do I train a new model? Could you please explain?
Hi Jason,
I used your default trainer to decompose about 2500 addresses that I am trying to match to GNAF (or more specifically VICMAP_ADDRESS). Thanks for posting it.
It worked pretty well, though slow. Up to 7 seconds each on my micro cloud shell.
Maybe this was due to the warning message.
WARNING:tensorflow:Estimator's model_fn (<function model_fn at 0x7eff508251e0>) includes params argument, but params are not passed to Estimator.
row 4 decomposing Address: 146/2 NOONE STREET CLIFTON HILL
predicting for 146/2 NOONE STREET CLIFTON HILL
These addresses were scraped from an old council document so don't follow modern standards. There is no postcode.
Here is a subset of the results.
NormalAddress,number_last_suffix,state,postcode,number_first,street_type,number_last,locality_name,building_name,street_name,flat_number
142 NOONE STREET CLIFTON HILL,,,,142,STREET,,CLIFTON HILL,,NOONE,
144 NOONE STREET CLIFTON HILL,,,,144,STREET,,CLIFTON HILL,,NOONE,
146 NOONE STREET CLIFTON HILL,,,,146,STREET,,CLIFTON HILL,,NOONE,
146/1 NOONE STREET CLIFTON HILL,,,,,STREET,1,CLIFTON HILL,,NOONE,146
146/2 NOONE STREET CLIFTON HILL,,,,,STREET,2,CLIFTON HILL,,NOONE,146
146/7 NOONE STREET CLIFTON HILL,,,,,STREET,7,CLIFTON HILL,,NOONE,146
146/8 NOONE STREET CLIFTON HILL,,,,48,STREET,,CLIFTON HILL,,NOONE,16
146/0 NOONE STREET CLIFTON HILL,,,,40,STREET,,CLIFTON HILL,,NOONE,16
160 NOONE STREET CLIFTON HILL,,,,160,STREET,,CLIFTON HILL,,NOONE,
162 NOONE STREET CLIFTON HILL,,,,162,STREET,,CLIFTON HILL,,NOONE,
Notice that most addresses were decomposed correctly, but 146/8 and 146/0 were converted incorrectly. It is interesting that the RNN produced numbers such as 16 and 48, which do not appear as such in the input data. It's repeatable. Adding a postcode does not change the behaviour.
To be sure, this is not the standard way to write a unit address. Note that 8/146 converts fine.
8/146 NOONE STREET CLIFTON HILL,,,,146,STREET,,CLIFTON HILL,,NOONE,8
0/146 NOONE STREET CLIFTON HILL,,,,146,STREET,,CLIFTON HILL,,NOONE,0
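Rows like 146/8 can at least be flagged automatically. The sketch below checks that each numeric field appears as a contiguous run of characters in the original address: "16" and "48" do not occur contiguously in "146/8 NOONE STREET CLIFTON HILL", so both are reported. The helper name and the choice of fields are assumptions for illustration, not part of addressnet.

```python
# Fields expected to be copied verbatim from the input (an assumption).
NUMERIC_FIELDS = ("flat_number", "number_first", "number_last")


def suspicious_fields(address: str, result: dict) -> list:
    """List numeric fields whose value is not a contiguous substring of the input."""
    source = address.upper()
    return [
        field
        for field in NUMERIC_FIELDS
        if result.get(field) and result[field].upper() not in source
    ]


bad = suspicious_fields(
    "146/8 NOONE STREET CLIFTON HILL",
    {"number_first": "48", "flat_number": "16", "street_name": "NOONE"},
)
print(bad)
```

This is only a sanity check. It can miss a bad value that happens to occur elsewhere in the address, for example inside a postcode.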
Also, when the number has a suffix, the suffix sometimes stays attached to number_first (see 176I and 176J below):
176B NOONE STREET CLIFTON HILL,,,,176,STREET,,CLIFTON HILL,,NOONE,,B,,,,
176C NOONE STREET CLIFTON HILL,,,,176,STREET,,CLIFTON HILL,,NOONE,,C,,,,
176D NOONE STREET CLIFTON HILL,,,,176,STREET,,CLIFTON HILL,,NOONE,,D,,,,
176E NOONE STREET CLIFTON HILL,,,,176,STREET,,CLIFTON HILL,,NOONE,,E,,,,
176G NOONE STREET CLIFTON HILL,,,,176,STREET,,CLIFTON HILL,,NOONE,,G,,,,
176G NOONE STREET CLIFTON HILL,,,,176,STREET,,CLIFTON HILL,,NOONE,,G,,,,
176H NOONE STREET CLIFTON HILL,,,,176,STREET,,CLIFTON HILL,,NOONE,,H,,,,
176I NOONE STREET CLIFTON HILL,,,,176I,STREET,,CLIFTON HILL,,NOONE,,,,,,
176J NOONE STREET CLIFTON HILL,,,,176J,STREET,,CLIFTON HILL,,NOONE,,,,,,
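For rows such as 176I and 176J, the suffix can be split off after prediction. This is a sketch under assumptions: the output field name number_first_suffix is assumed here, and split_number_suffix is a hypothetical helper, not part of addressnet.

```python
import re

# A street number followed by exactly one trailing letter, e.g. "176I".
_NUMBER_WITH_SUFFIX = re.compile(r"^(\d+)([A-Z])$")


def split_number_suffix(result: dict) -> dict:
    """Move a trailing letter from number_first into number_first_suffix."""
    fixed = dict(result)
    match = _NUMBER_WITH_SUFFIX.match(fixed.get("number_first", ""))
    # Only split when no suffix was predicted separately.
    if match and not fixed.get("number_first_suffix"):
        fixed["number_first"], fixed["number_first_suffix"] = match.groups()
    return fixed


print(split_number_suffix({"number_first": "176I", "street_name": "NOONE"}))
```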
Also, there is an issue with an address like this:
88 THE ESPLANADE CLIFTON HILL,,,,88,,,CLIFTON HILL,,THE ESPLANADE,,,
It does not split THE ESPLANADE into a street_name and a street_type.
Hello, this looks perfect for the project I'm working on right now, but I have run into an issue that seems to be caused by the TensorFlow version dependency. I understand the TF version needs to be higher than 1.12, and I'm using 2.2, but there seem to have been changes to the TF syntax.
E.g. get_variable is now Variable, and random_normal is now random.normal, in the model.py file.
I fixed these and got further, but now I'm getting an error in the TensorFlow file tensorflow\python\ops\variables.py, line 261:
TypeError: _variable_v2_call() got an unexpected keyword argument 'initializer'
Could you please let me know which specific TensorFlow version you used during development?
Hello @jasonrig @Stallon-niranjan,
I have been retraining addressnet and did so successfully. Now I want to find a confidence score or probability for the model's output, i.e. how confident the model is in a result (for example, 86% confident in the generated address).
I tried using tf.nn.softmax for this, but it throws an error:
ValueError: "The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()".
Is there a way to get a confidence score or probability, so that I can use addressnet over millions of addresses?
Any help would be appreciated.
Thanks & Regards
Aj.
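One way to turn per-character outputs into a single score is sketched below. It assumes that one logit vector per input character can be obtained from the model, which this thread does not confirm; the function names and the choice of a geometric mean are illustrative, not addressnet's API. As general background, that ValueError is what NumPy raises when a multi-element array is used where a single True/False is expected, such as in an if statement.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def address_confidence(per_char_logits):
    """Geometric mean of each character's top class probability."""
    if not per_char_logits:
        raise ValueError("need at least one character")
    top = [max(softmax(row)) for row in per_char_logits]
    return math.exp(sum(math.log(p) for p in top) / len(top))


# Hypothetical logits for a three-character input with two classes.
print(address_confidence([[4.0, 0.0], [3.0, 0.5], [0.2, 2.5]]))
```

A geometric mean penalises a single uncertain character more than an arithmetic mean would. Taking the minimum over characters is a stricter alternative.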
I have installed Python 3.6.8 and TensorFlow 1.15 and am still running into a few installation issues. When I try to run
from addressnet.predict import predict_one, I get SyntaxError: future feature annotations is not defined.
As far as I know, function annotations are a feature that was introduced in Python 3.0, so this should work given that I have Python 3.
Can you please help me fix this installation issue? It would be helpful if anyone could post the installation steps.
Any help on this would be really appreciated.
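For context, the error message refers to the statement from __future__ import annotations (PEP 563), which is only available from Python 3.7 onward. It is a separate feature from the function annotations added in Python 3.0, so Python 3.6.8 rejects it. A small check, with a hypothetical helper name, that could be run before importing the package:

```python
import sys


def supports_future_annotations(version_info=sys.version_info) -> bool:
    """`from __future__ import annotations` (PEP 563) needs Python 3.7 or newer."""
    return tuple(version_info[:2]) >= (3, 7)


if not supports_future_annotations():
    print("This interpreter is older than Python 3.7; the import will fail.")
```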
Hi Jason,
Great stuff and thanks for posting this up.
In the real world, a lot of people use postal addresses, like this:
GPO Box 500606 Canberra Act 2004
When it goes through your parser, it returns :
{'street_name': 'GPO', 'street_type': 'BROW', 'postcode': '5006062004', 'locality_name': 'CANBERRA', 'state': 'AUSTRALIAN CAPITAL TERRITORY'}
Not sure why it generated a street type BROW. Is there training data that could address this?
Or would that require some re-coding?
Cheers,
Nick
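Until postal addresses are covered by the training data, they could be detected up front and kept away from the street-address parser. The sketch below is an assumption about a reasonable pre-check: parse_postal_box and its output keys are hypothetical and not part of addressnet.

```python
import re

# Leading "GPO Box", "PO Box" or "P.O. Box" followed by a box number.
_PO_BOX = re.compile(r"^\s*(GPO|PO|P\.O\.)\s*BOX\s+(\d+)\b", re.IGNORECASE)


def parse_postal_box(address: str):
    """Return box details for a postal address, or None for anything else."""
    match = _PO_BOX.match(address)
    if match is None:
        return None
    return {
        "box_type": match.group(1).upper().replace(".", "") + " BOX",
        "box_number": match.group(2),
        # The locality, state and postcode are left unparsed in this sketch.
        "remainder": address[match.end():].strip(" ,"),
    }


print(parse_postal_box("GPO Box 500606 Canberra Act 2004"))
```

Addresses for which this returns None would go to predict_one as usual.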
https://github.com/jasonrig/address-net/blob/master/addressnet/predict.py#L111
Add a blank dictionary for params to suppress the warning and improve inference performance. See issue #1 for details.
Code: from addressnet.predict import predict_one
Output: AttributeError: module 'tensorflow' has no attribute 'FixedLenFeature'
Hi,
With the pretrained model if I have an address like this: Lot 442, 123 AAA RD, BBB, WA 6000 it will get parsed nicely like this:
"flat_number": "442",
"flat_type": "LOT",
"locality_name": "BBB",
"number_first": "123",
"postcode": "6000",
"state": "WESTERN AUSTRALIA",
"street_name": "AAA",
"street_type": "ROAD"
Nice!
However, if the Lot number increases to 4 digits, like this: Lot 4424, 123 AAA RD, BBB, WA 6000, then I get odd results like this:
"building_name": "O",
"flat_number": "4424",
"flat_type": "LOT",
"locality_name": "BBB",
"number_first": "123",
"postcode": "6000",
"state": "WESTERN AUSTRALIA",
"street_name": "AAA",
"street_type": "ROAD"
Is there a way to fix this?
P.S. Really great program, by the way!
Hi,
I've downloaded addressnet from this repo, but unfortunately it does not work with the latest version of TensorFlow. I'm using Python 3.11.4 and TF 2.14. Do you have any plans to release a new version of addressnet that supports the latest versions of Python and TF?
Thanks in advance.
module 'tensorflow' has no attribute 'get_variable'