
ner_youtube's People

Contributors

wjbmattingly


ner_youtube's Issues

won't run with latest version of spaCy

I get the error below. It appears to be caused by running the tutorial code on a later version of spaCy; I am using spaCy 3.3.1. As I am a beginner, I don't know how to build a workaround or how to follow the instructions in the error message. Can you provide any suggestions?

ValueError Traceback (most recent call last)
Input In [11], in <cell line: 97>()
94 return (results)
96 patterns = create_training_data("../data/hp_characters.json", "PERSON")
---> 97 generate_rules(patterns)
98 # print (patterns)
100 nlp = spacy.load("hp_ner")

Input In [11], in generate_rules(patterns)
84 ruler = EntityRuler(nlp)
85 ruler.add_patterns(patterns)
---> 86 nlp.add_pipe(ruler)
87 nlp.to_disk("hp_ner")

File ~/Desktop/Current/NLP_Exp/mvenv/lib/python3.10/site-packages/spacy/language.py:773, in Language.add_pipe(self, factory_name, name, before, after, first, last, source, config, raw_config, validate)
771 bad_val = repr(factory_name)
772 err = Errors.E966.format(component=bad_val, name=name)
--> 773 raise ValueError(err)
774 name = name if name is not None else factory_name
775 if name in self.component_names:

ValueError: [E966] nlp.add_pipe now takes the string name of the registered component factory, not a callable component. Expected string, but got <spacy.pipeline.entityruler.EntityRuler object at 0x124583a40> (name: 'None').

  • If you created your component with nlp.create_pipe('name'): remove nlp.create_pipe and call nlp.add_pipe('name') instead.

  • If you passed in a component like TextCategorizer(): call nlp.add_pipe with the string name instead, e.g. nlp.add_pipe('textcat').

  • If you're using a custom component: Add the decorator @Language.component (for function components) or @Language.factory (for class components / factories) to your custom component and assign it a name, e.g. @Language.component('your_name'). You can then run nlp.add_pipe('your_name') to add it to the pipeline.
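The second hint in the error message applies to this traceback: `EntityRuler` is a built-in component, so spaCy v3 expects its registered factory name rather than an instance. A minimal sketch of how `generate_rules` might look under spaCy v3 follows. The sample patterns, the temporary output directory, the `output_dir` and `lang` parameters, and the test sentence are assumptions for illustration and are not taken from the tutorial.

```python
import tempfile
from pathlib import Path

import spacy


def generate_rules(patterns, output_dir, lang="en"):
    """Build a blank pipeline with an entity ruler and save it to disk."""
    nlp = spacy.blank(lang)
    # spaCy v3: pass the registered factory name as a string.
    # add_pipe creates the component and returns it.
    ruler = nlp.add_pipe("entity_ruler")
    ruler.add_patterns(patterns)
    nlp.to_disk(output_dir)


# Hypothetical patterns standing in for the output of create_training_data.
patterns = [
    {"label": "PERSON", "pattern": "Harry Potter"},
    {"label": "PERSON", "pattern": "Hermione Granger"},
]

with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "hp_ner"
    generate_rules(patterns, model_dir)
    # Reload the saved pipeline, as the tutorial does with spacy.load("hp_ner").
    nlp = spacy.load(model_dir)
    doc = nlp("Harry Potter and Hermione Granger walked to class.")
    entities = [(ent.text, ent.label_) for ent in doc.ents]

print(entities)
```

The only change relative to the failing code is that the ruler is no longer constructed with `EntityRuler(nlp)` and passed in; it is created by `nlp.add_pipe("entity_ruler")` and the patterns are added to the returned object.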

spaCy, Romanian language: [W036] The component 'entity_ruler' does not have any patterns defined

I'm noobing my way along the tutorial, trying to build a pipeline containing an entity ruler and test it. The language I'm working with is Romanian; maybe this has something to do with it. I can create the pipeline and load it, but when I use it on a text it croaks on me.

Could you please help?

Here's my code:
import spacy
import mysql.connector
import pandas as pd
from spacy.lang.ro import Romanian
from spacy.pipeline import EntityRuler
import json
import os

def load_data(file):
	with open(file, "r", encoding="utf-8") as f:
		data =json.load(f)
	return(data)

def generate_better_characters(file):
	data = load_data(file)
	print(len(data))
	new_characters = []
	for item in data:
		new_characters.append(item)
	for item in data:
		item = item.replace("și", "").replace("si", "").replace("Și", "")
		names = item.split()
		for name in names:
			name = name.strip()
			#	Debug
			#print(name)
			new_characters.append(name)
		if "(" in item:
			names = item.split("(")
			for name in names:
				name = name.replace(")", "").strip()
				new_characters.append(name)
		if "," in item:
			names = item.split(",")
			for name in names:
				name = name.replace("și", "").replace("si", "").strip()
				#	Debug
				#print(name)
				if " " in name:
					new_names = name.split()
					for x in new_names:
						new_characters.append(x)
						#print(x)
				new_characters.append(name)
	
	
	final_characters = []
	
	titles = ["Dr.", "Profesorul", "prof.", "Prof.", "Domnul", "domnul", "dl.", "Dl.", "Doamna", "doamna", "dna.", "domnisoara", "Domnisoara", "Dl. si dna."]
	
	for character in new_characters:
		if "" != character:
			final_characters.append(character)
			for title in titles:
				titled_char = f"{title} {character}"
				final_characters.append(titled_char)
	
	print(len(final_characters))
	final_characters = list(set(final_characters))
	print(len(final_characters))
	final_characters.sort()
	return(final_characters)

def create_training_data(file, type):
	data = generate_better_characters(file)
	patterns = []
	for item in data:
		pattern = {
					"label": type,
					"pattern": item
					}
		patterns.append(pattern)
	return(patterns)

def generate_rules(patterns):
	nlp = Romanian()
	ruler = EntityRuler(nlp)
	ruler.add_patterns(patterns)
	nlp.add_pipe("entity_ruler")
	nlp.to_disk("trained_ner")

patterns = create_training_data("C:\\tutorial.json", "PERSON")
generate_rules(patterns)

with open(f"C:\\mltutorial.txt", "r", encoding='utf-8') as f:
	text = f.read()

nlp = spacy.load("trained_ner")
doc = nlp(text)

for ent in doc.ents:
	print(ent.text)
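A likely cause of the W036 warning, judging from the code above: `generate_rules` adds the patterns to a standalone `EntityRuler(nlp)` object, but `nlp.add_pipe("entity_ruler")` then creates a second, empty ruler, and that empty one is what gets saved and loaded. The sketch below contrasts the two cases. The Romanian sample pattern and sentence are made up for illustration, and `spacy.blank("ro")` is used as an assumed equivalent of `Romanian()`.

```python
import spacy

# Hypothetical pattern; the real ones come from create_training_data.
patterns = [{"label": "PERSON", "pattern": "Ion Creangă"}]

# Case 1: what the posted code effectively saves.
# add_pipe builds a fresh component that has never seen any patterns.
nlp_empty = spacy.blank("ro")
empty_ruler = nlp_empty.add_pipe("entity_ruler")
empty_count = len(empty_ruler.patterns)

# Case 2: add the patterns to the component that add_pipe returns.
nlp_fixed = spacy.blank("ro")
ruler = nlp_fixed.add_pipe("entity_ruler")
ruler.add_patterns(patterns)

doc = nlp_fixed("Ion Creangă a scris Amintiri din copilărie.")
found = [(ent.text, ent.label_) for ent in doc.ents]

print(empty_count, found)
```

If this is indeed the cause, the warning is unrelated to Romanian: the same thing would happen with any language class whenever the patterns go to a ruler that is not the one in the pipeline.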

I get this error in the generate_rules() function after installing the GPU version of spaCy on Colab


ValueError Traceback (most recent call last)
in ()
60
61 patterns = create_training_data("/content/hp_char.json","PERSON")
---> 62 generate_rules(patterns)

2 frames
/usr/local/lib/python3.7/dist-packages/spacy/language.py in create_pipe(self, factory_name, name, config, raw_config, validate)
637 lang_code=self.lang,
638 )
--> 639 raise ValueError(err)
640 pipe_meta = self.get_factory_meta(factory_name)
641 config = config or {}

ValueError: [E002] Can't find factory for 'ruler' for language English (en). This usually happens when spaCy calls nlp.create_pipe with a custom component name that's not registered on the current language class. If you're using a Transformer, make sure to install 'spacy-transformers'. If you're using a custom component, make sure you've added the decorator @Language.component (for function components) or @Language.factory (for class components).

Available factories: attribute_ruler, tok2vec, merge_noun_chunks, merge_entities, merge_subtokens, token_splitter, parser, beam_parser, entity_linker, ner, beam_ner, entity_ruler, lemmatizer, tagger, morphologizer, senter, sentencizer, textcat, textcat_multilabel, en.lemmatizer
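The list of available factories in the message contains `entity_ruler` but not `ruler`, which suggests the code called `nlp.add_pipe("ruler")` or `nlp.create_pipe("ruler")`. The sketch below, which assumes spaCy v3 and a blank English pipeline, shows the failing lookup, checks the registered name, and then uses the optional `name` argument to keep `ruler` as the instance name. The sample pattern is hypothetical.

```python
import spacy

nlp = spacy.blank("en")

# "ruler" is not a registered factory name, so the lookup fails with E002.
try:
    nlp.add_pipe("ruler")
    lookup_failed = False
except ValueError:
    lookup_failed = True

# The built-in entity ruler is registered under "entity_ruler".
has_factory = "entity_ruler" in nlp.factory_names

# The factory name is fixed, but the instance name in the pipeline can be chosen.
ruler = nlp.add_pipe("entity_ruler", name="ruler")
ruler.add_patterns([{"label": "PERSON", "pattern": "Harry Potter"}])

print(lookup_failed, has_factory, nlp.pipe_names)
```

Whether the GPU install on Colab played any role is not clear from the traceback; the error itself only depends on the string passed as the factory name.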
