Comments (2)

IsaacGreenMachine commented on September 3, 2024

Here's the output, by the way (please ignore the numbers used as cluster names). The scores are all infinity.

Here's the topic info after fully training with partial_fit on my dataset:

topic_model.get_topic(1, full=True)
{'Main': [('1054375', inf),
  ('982395', inf),
  ('1013164', inf),
  ('1031508', inf),
  ('1031576', inf),
  ('957591', inf),
  ('1043766', inf),
  ('1054256', inf),
  ('1054349', inf),
  ('1054355', inf)]}
topic_model.get_topic_info()
	Topic	Count	Name	Representation	Representative_Docs
0	0	109104	0_30470_32665_14_29	[30470, 32665, 14, 29, 31, 10, 40, 44, 53, 45]	NaN
1	1	52972	1_1054375_982395_1013164_1031508	[1054375, 982395, 1013164, 1031508, 1031576, 9...	NaN
2	2	129841	2_369045_369865_371828_371766	[369045, 369865, 371828, 371766, 365563, 36557...	NaN
3	3	66873	3_27_13_16_38	[27, 13, 16, 38, 44, 3350, 46, 3574, 3831, 4222]	NaN
4	4	41935	4_12_22_31_14	[12, 22, 31, 14, 10, 43, 39, 42, 37, 38]	NaN
5	5	67877	5_220939_220545_222224_220700	[220939, 220545, 222224, 220700, 220669, 21894...	NaN
6	6	48487	6_1767968_1593953_1683623_1593883	[1767968, 1593953, 1683623, 1593883, 1683534, ...	NaN
7	7	35557	7_552517_545624_543607_552708	[552517, 545624, 543607, 552708, 543675, 55253...	NaN
8	8	75309	8_14_15_13_16	[14, 15, 13, 16, 42, 38, 29, 53, 10, 3675]	NaN
9	9	77294	9_218852_220508_218868_218942	[218852, 220508, 218868, 218942, 220498, 15, 3...	NaN
10	10	60717	10_599079_574468_571526_588418	[599079, 574468, 571526, 588418, 598261, 59877...	NaN
11	11	46438	11_4756774_4756616_4747992_4756619	[4756774, 4756616, 4747992, 4756619, 4756573, ...	NaN
12	12	91285	12_31_42_3350_40	[31, 42, 3350, 40, 14, 3389, 3381, 48, 3727, 44]	NaN
13	13	67976	13_387158_391193_392310_384981	[387158, 391193, 392310, 384981, 392368, 39232...	NaN
14	14	60478	14_19_13_38_41	[19, 13, 38, 41, 10, 44, 48, 52, 3727, 4204]	NaN
15	15	60942	15_240524_241155_240871_241228	[240524, 241155, 240871, 241228, 241243, 24117...	NaN
16	16	87910	16_218382_10_14_28	[218382, 10, 14, 28, 27, 29, 31, 37, 43, 39]	NaN
17	17	22748	17_849849_815686_839704_826626	[849849, 815686, 839704, 826626, 795510, 84975...	NaN
18	18	58772	18_517017_498821_515931_476445	[517017, 498821, 515931, 476445, 516308, 51588...	NaN
19	19	56241	19_220610_220836_39_13	[220610, 220836, 39, 13, 44, 48, 38, 14, 10, 51]	NaN
20	20	110401	20_28305_16_10_12	[28305, 16, 10, 12, 27, 37, 29, 22, 41, 44]	NaN
21	21	121051	21_15_17_12_27	[15, 17, 12, 27, 10, 18, 14, 37, 43, 42]	NaN
22	22	53491	22_32963_27_37_3381	[32963, 27, 37, 3381, 38, 3532, 3574, 3404, 39...	NaN
23	23	58245	23_30470_10_37_38	[30470, 10, 37, 38, 44, 29, 48, 3727, 4254, 4331]	NaN
24	24	39084	24_444630_437352_436133_440889	[444630, 437352, 436133, 440889, 436174, 43615...	NaN
25	25	13123	25_2883254_2842667_2866698_2866738	[2883254, 2842667, 2866698, 2866738, 2866690, ...	NaN
26	26	60774	26_239877_240079_240123_240453	[239877, 240079, 240123, 240453, 240659, 24070...	NaN
27	27	27280	27_1160426_1115483_1147040_1110211	[1160426, 1115483, 1147040, 1110211, 1110213, ...	NaN
28	28	70162	28_368293_364473_365277_364744	[368293, 364473, 365277, 364744, 365791, 36457...	NaN
29	29	78156	29_224484_222741_222735_224365	[224484, 222741, 222735, 224365, 222872, 22404...	NaN
30	30	39348	30_29260_31_18_12	[29260, 31, 18, 12, 43, 10, 42, 44, 3389, 53]	NaN
31	31	59289	31_33739_212113_212106_12	[33739, 212113, 212106, 12, 28, 43, 31, 14, 33...	NaN
32	32	79654	32_27230_27_14_10	[27230, 27, 14, 10, 22, 29, 38, 37, 41, 3350]	NaN
33	33	76397	33_222872_223204_14_10	[222872, 223204, 14, 10, 16, 22, 37, 47, 44, 29]	NaN
34	34	70535	34_30267_17_10_43	[30267, 17, 10, 43, 27, 47, 31, 37, 3397, 29]	NaN
35	35	40708	35_33454_13_16_43	[33454, 13, 16, 43, 29, 22, 41, 46, 52, 3574]	NaN
36	36	33484	36_30470_28_16_13	[30470, 28, 16, 13, 3389, 44, 10, 48, 37, 4222]	NaN
37	37	72767	37_222875_222950_223145_223221	[222875, 222950, 223145, 223221, 224365, 22330...	NaN
38	38	36732	38_253412_251896_253272_253500	[253412, 251896, 253272, 253500, 253488, 25324...	NaN
39	39	42591	39_250962_250920_248635_251452	[250962, 250920, 248635, 251452, 248993, 24901...	NaN
40	40	17007	40_4714314_4699011_4714064_4714090	[4714314, 4699011, 4714064, 4714090, 4714056, ...	NaN
41	41	68058	41_27321_27363_27393_27407	[27321, 27363, 27393, 27407, 14, 17, 18, 29, 2...	NaN
42	42	94062	42_27321_27411_27552_27259	[27321, 27411, 27552, 27259, 16, 14, 31, 27, 2...	NaN
43	43	79166	43_15_10_29_28	[15, 10, 29, 28, 3350, 37, 27, 3397, 53, 3381]	NaN
44	44	79278	44_879543_839601_849938_850763	[879543, 839601, 849938, 850763, 849858, 87072...	NaN
45	45	54102	45_224506_223904_224478_224046	[224506, 223904, 224478, 224046, 222820, 23759...	NaN
46	46	65796	46_14_27_17_29	[14, 27, 17, 29, 37, 38, 48, 3555, 52, 3582]	NaN
47	47	75014	47_38_52_29_3515	[38, 52, 29, 3515, 3695, 3636, 10, 3831, 3989,...	NaN
48	48	50618	48_251880_249120_251382_251243	[251880, 249120, 251382, 251243, 250920, 25040...	NaN
49	49	42643	49_239463_239393_238194_239129	[239463, 239393, 238194, 239129, 239184, 23920...	NaN

MaartenGr commented on September 3, 2024

I'm not entirely sure why this happens, but it might be worth trying the decay or delete_min_df parameter to prevent certain counts from blowing up.
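To make the suggestion concrete: in BERTopic, decay and delete_min_df are options of OnlineCountVectorizer, which is passed to BERTopic through vectorizer_model when training with partial_fit. The sketch below is a toy, standard-library illustration of what the two options are meant to do to a running bag-of-words. It is not BERTopic's actual implementation, the function name update_counts and the tiny batches are hypothetical, and it does not claim to explain where the inf scores come from.

```python
from collections import Counter


def update_counts(running, batch_tokens, decay=0.01, delete_min_df=1.0):
    """Toy incremental bag-of-words update (hypothetical helper).

    decay: fraction by which the previous counts shrink before each new batch.
    delete_min_df: tokens whose running count does not exceed this are dropped.
    """
    # Shrink everything seen so far, so old counts cannot grow without bound.
    updated = {tok: count * (1.0 - decay) for tok, count in running.items()}

    # Add the raw counts from the new batch.
    for tok, count in Counter(batch_tokens).items():
        updated[tok] = updated.get(tok, 0.0) + count

    # Prune tokens that stay at or below the minimum frequency.
    return {tok: count for tok, count in updated.items() if count > delete_min_df}


counts = {}
batches = (["a", "a", "b"], ["a", "c"], ["a", "c"])
for batch in batches:
    counts = update_counts(counts, batch, decay=0.5, delete_min_df=0.75)

print(counts)
```

With these toy batches, the token "b" appears once, decays below the threshold, and is removed, while the count for "a" stays bounded instead of growing with every batch. The values chosen here (decay=0.5, delete_min_df=0.75) are only for illustration; suitable values for a real dataset would need experimenting.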
