Hi Josh, hope you can help. I have run into a weird problem I have t[…]

Despite space, it's not placing words? (about joshbduncan/word-search-generator)

Comments (14)

nicolesimon13 commented on September 26, 2024

And if somebody else runs into this while installing:
error message:
word-search-generator 3.5.0 requires fpdf2==2.4.2, but you have fpdf2 2.7.5 which is incompatible.

this worked:

pip uninstall fpdf2
pip install fpdf2==2.4.2
pip install --no-deps git+https://github.com/joshbduncan/word-search-generator

from word-search-generator.

joshbduncan commented on September 26, 2024

Let me address the mask question...

> If I am using mask (to create a rectangle instead of square), do I need to apply the mask after every generation?
> I would think that it is enough?

No, you don't need to reapply the mask after you call generate because the puzzle shape doesn't change and the puzzle generator takes the current mask(s) into account when placing words.

FYI, any calculated masks (e.g., the built-in shapes) are automatically recalculated whenever the puzzle size changes. For various reasons, a Rectangle mask isn't auto-calculated, so if you change the puzzle size the mask will stay however it was set. You can always change the rectangle size to fit the new puzzle size.
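
In plain Python terms, the difference looks roughly like this. It is a toy illustration with hypothetical helper names, not the package's mask classes. A calculated mask is a function of the puzzle size, so it can simply be rebuilt whenever the size changes. A rectangle is defined by its own width and height, so it stays as-is until you change it.

```python
def centered_square_mask(size: int) -> set:
    """Hypothetical 'calculated' mask: the middle half of the grid, derived from size."""
    lo, hi = size // 4, size - size // 4
    return {(r, c) for r in range(lo, hi) for c in range(lo, hi)}


def rectangle_mask(width: int, height: int) -> set:
    """Hypothetical fixed mask: a width x height block of active cells at the top-left."""
    return {(r, c) for r in range(height) for c in range(width)}


# A calculated mask is just recomputed from the new size...
small, large = centered_square_mask(8), centered_square_mask(16)

# ...while a fixed rectangle keeps its dimensions until you change them yourself.
rect = rectangle_mask(width=10, height=15)
print(len(small), len(large), len(rect))
```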


joshbduncan commented on September 26, 2024

Just a note about validators...

Validators are built from a simple abstract base class, so you can very easily create your own or use any of the pre-built validators included in the package. You can provide as few or as many as you want to the puzzle object as a list.

from abc import ABC

class Validator(ABC):
    """Base class for the validation of words.

    To implement your own `Validator`, subclass this class.

    Example:
        ```python
        class Palindrome(Validator):
            def validate(self, value: str) -> bool:
                return value == value[::-1]
        ```
    """

Pre-built Validators

NoSingleLetterWords
NoPunctuation
NoPalindromes
NoSubwords

Custom Validator Example

class NoMoreOs(Validator):
    """A validator to ensure no words with the letter 'O' are valid."""

    def validate(self, value: str, *args, **kwargs) -> bool:
        return "o" not in value.lower()
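
The pattern above can be exercised without the package installed. The sketch below re-declares a stand-in `Validator` base class, which is an assumption mirroring the excerpt rather than an import from the library. It then filters a word list through a list of validators, keeping a word only when every validator returns True. `MinLength` and `apply_validators` are hypothetical names. This thread does not show how the package itself applies the list internally.

```python
from abc import ABC, abstractmethod


class Validator(ABC):
    """Stand-in for the package's base class, re-declared so this runs on its own."""

    @abstractmethod
    def validate(self, value: str, *args, **kwargs) -> bool:
        """Return True if `value` should be kept."""


class NoMoreOs(Validator):
    """Reject any word containing the letter 'O'."""

    def validate(self, value: str, *args, **kwargs) -> bool:
        return "o" not in value.lower()


class MinLength(Validator):
    """Hypothetical extra validator: reject words shorter than `n` letters."""

    def __init__(self, n: int) -> None:
        self.n = n

    def validate(self, value: str, *args, **kwargs) -> bool:
        return len(value) >= self.n


def apply_validators(words: list, validators: list) -> list:
    # Keep a word only if every validator in the list accepts it.
    return [w for w in words if all(v.validate(w) for v in validators)]


words = ["NICKEL", "CHROMIUM", "TIN", "ZINC", "IRON"]
print(apply_validators(words, [NoMoreOs(), MinLength(4)]))
```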


joshbduncan commented on September 26, 2024

On your question about placement...

I did a simple test using your words and puzzle size from above.

from collections import defaultdict
from word_search_generator import WordSearch

# your long 118-word wordlist
words = "TELLURIUM,CHROMIUM,NICKEL..."

# custom word search puzzle to allow for larger sizes and wordlists
class ReallyBigWordSearch(WordSearch):
    MAX_PUZZLE_SIZE = 1_000_000
    MAX_PUZZLE_WORDS = 2_000

# create the puzzle
p = ReallyBigWordSearch(words, size=60, validators=None)

# ensure all words were placed
assert not p.unplaced_words

# see which directions were used
d = defaultdict(int)
for word in p.placed_words:
    d[word.direction] += 1

After running that twice, the results were as follows...

# run #1
defaultdict(int,
    {
        <Direction.E: (0, 1)>: 39,
        <Direction.SE: (1, 1)>: 27,
        <Direction.NE: (-1, 1)>: 22,
        <Direction.S: (1, 0)>: 30
    }
)

# run #2
defaultdict(int,
    {
        <Direction.E: (0, 1)>: 28,
        <Direction.S: (1, 0)>: 33,
        <Direction.NE: (-1, 1)>: 23,
        <Direction.SE: (1, 1)>: 34
    }
)
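
To see how even those runs are, the counts above can be turned into shares of the 118 words. This snippet only re-uses the numbers printed above. Both runs add up to 118.

```python
# Direction counts copied from the two runs above.
runs = {
    "run #1": {"E": 39, "SE": 27, "NE": 22, "S": 30},
    "run #2": {"E": 28, "S": 33, "NE": 23, "SE": 34},
}

for name, counts in runs.items():
    total = sum(counts.values())
    diagonal = counts["NE"] + counts["SE"]
    shares = {d: round(100 * n / total, 1) for d, n in counts.items()}
    print(name, total, shares, f"diagonal: {100 * diagonal / total:.1f}%")
```

By this arithmetic, roughly 42% (run #1) and 48% (run #2) of the words landed on a diagonal.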

This seems like a pretty good distribution considering the puzzle size and quantity of words. If no level is set for the puzzle, it defaults to 2, which allows words to go NE, E, SE, and S. If a level of 1 is specified, then words will only go E and S.
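
Expressed as data, the levels look roughly like the mapping below. Level 2 uses the directions that actually appear in the two runs above (E, S, NE, SE). The vectors are the (row, column) steps printed in that output. Level 3 (all eight directions) is inferred from the level-3 results later in this thread. This is a sketch, not the package's own `Direction` enum, and `allowed_vectors` is a hypothetical helper.

```python
# (row_step, col_step) for each compass direction; rows grow downward.
DIRECTIONS = {
    "N": (-1, 0), "NE": (-1, 1), "E": (0, 1), "SE": (1, 1),
    "S": (1, 0), "SW": (1, -1), "W": (0, -1), "NW": (-1, -1),
}

# Directions allowed per level, as described in this thread.
LEVELS = {
    1: {"E", "S"},
    2: {"NE", "E", "SE", "S"},
    3: set(DIRECTIONS),
}


def allowed_vectors(level: int = 2) -> dict:
    """Return the direction vectors a word may use at `level` (default 2)."""
    return {name: DIRECTIONS[name] for name in LEVELS[level]}


print(sorted(allowed_vectors()))
```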

When the generator runs, it first picks a random location on the board, checks whether it is available, and then determines in which of the available directions (set by the level) the word will fit. It then picks a random direction from all that were valid. Some other factors can force words to be mostly horizontal and vertical (N, E, S, or W). Suppose you have a word that is 11 characters long and you have masked your puzzle to be only 10 wide by 15 tall. Then the word will not fit at an angle (NE, SE, SW, or NW), because there is not enough room (which is the case in some of your example pics).
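
The placement step described above can be sketched in a few lines. This is a simplified stand-in written for illustration, not the package's generator. The grid is a list of lists with `None` for empty cells. A word "fits" in a direction if every cell it would cover is in bounds and is either empty or already holds the same letter. One of the fitting directions is then chosen at random. The `attempts` limit is an assumption.

```python
import random

# Level-2 direction vectors as (row_step, col_step); rows grow downward.
LEVEL_2 = {"NE": (-1, 1), "E": (0, 1), "SE": (1, 1), "S": (1, 0)}


def fits(grid, word, row, col, step):
    """True if `word` can start at (row, col) and run along `step`."""
    dr, dc = step
    for i, ch in enumerate(word):
        r, c = row + i * dr, col + i * dc
        if not (0 <= r < len(grid) and 0 <= c < len(grid[0])):
            return False  # would run off the board
        if grid[r][c] not in (None, ch):
            return False  # clashes with a letter already placed
    return True


def place_word(grid, word, directions, rng, attempts=100):
    """Try random start cells; at each, choose randomly among the directions that fit."""
    for _ in range(attempts):
        row, col = rng.randrange(len(grid)), rng.randrange(len(grid[0]))
        valid = [name for name, step in directions.items() if fits(grid, word, row, col, step)]
        if not valid:
            continue  # nothing fits from here; pick another random cell
        name = rng.choice(valid)
        dr, dc = directions[name]
        for i, ch in enumerate(word):
            grid[row + i * dr][col + i * dc] = ch
        return row, col, name
    return None  # give up: the word is left unplaced


# An 11-letter word on a 10-wide, 15-tall board can only go straight down.
board = [[None] * 10 for _ in range(15)]
print(place_word(board, "MENDELEVIUM", LEVEL_2, random.Random(0)))
```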

Using words almost as long as the puzzle size can also cause this issue. Suppose the first word the generator places is a 10-character word in the middle of a 6-wide by 10-high puzzle. Unless that word shares lots of characters with the other words, most of the remaining valid directions will only be horizontal and vertical.

In your second picture above, if you look carefully at the puzzle, after a few words have been placed it would be hard to fit any of the rest on an angle. This is one time when increasing the size would help.

The generator algorithm could be written to keep trying numerous directions for each word (and backtracking) to increase variability, but that would greatly slow down puzzle generation which I don't want to do. And who is to say what the correct variability is?

Hopefully, this all makes sense.


joshbduncan commented on September 26, 2024

> Yes, but I am keeping a running report on how many did fit. Let's say I am starting with a 15x15 and have 50 words: the ten tries I do in my loop will tell me 40-45, which tells me "needs to be bigger". I want to start as small as possible, but usually there is a size where you can say "this list with this grid will work well".

One way you could look at this is, a 15x15 puzzle has 225 available spaces. If every space was taken by word characters with an average word length of 6 (built-in wordlist average but not representative of your sample words), the puzzle would at most hold 37 words. Since that isn't really possible, I would take maybe 80% of that.
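
The estimate works out as below, using only the numbers given above. The bound ignores letters shared by crossing words. That is presumably how the measured average in the next paragraph (38.63) can come out above it.

```python
cells = 15 * 15                        # 225 squares in a 15x15 grid
avg_len = 6                            # average word length assumed above
upper_bound = cells // avg_len         # 37 words if no letters were shared
practical = round(upper_bound * 0.8)   # the "maybe 80%" rule of thumb
print(cells, upper_bound, practical)
```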

In testing this 100 times using random English words, the average placed word count for a 15x15 puzzle with a 50 word wordlist is 38.63.

from word_search_generator import WordSearch
from word_search_generator.utils import get_random_words

cts = []
for _ in range(100):
    words = ",".join(get_random_words(50))
    p = WordSearch(words, size=15)
    cts.append(len(p.placed_words))
print(sum(cts)/100)
# 38.63


joshbduncan commented on September 26, 2024

So, you are probably working with the latest PyPI release of the package, v3.5.0. That release has a slightly different placement algorithm than the latest code here on GitHub. Before I explain what is happening, I would suggest installing the latest version directly from Git using the command below...

$ pip install git+https://github.com/joshbduncan/word-search-generator

Word Validation: When sending words to a WordSearch puzzle, the words are first validated. So, single-letter words, words with punctuation, palindromes, and "sub-words" are all discarded. Single-letter words and words with punctuation are pretty self-explanatory... Palindromes just cause confusion (especially with the key starting position and directions)... And, "subwords", or words that are parts of other words, obviously aren't normally valid as the word could be found twice in the same puzzle.
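
The four rules can be approximated with plain string checks. This is a hedged sketch of the behaviour described, not the package's validator code. For instance, the exact sub-word comparison the package uses may differ, and `default_style_filter` is a hypothetical name.

```python
import string


def default_style_filter(words):
    """Drop single-letter words, words with punctuation, palindromes, and sub-words."""
    cleaned = [w.upper() for w in words]
    kept = []
    for w in cleaned:
        if len(w) < 2:
            continue  # single-letter word
        if any(ch in string.punctuation for ch in w):
            continue  # contains punctuation
        if w == w[::-1]:
            continue  # palindrome: ambiguous start position/direction in the key
        if any(w != other and w in other for other in cleaned):
            continue  # part of a longer word, so it could be found twice
        kept.append(w)
    return kept


print(default_style_filter(["a", "cat", "catalog", "level", "don't", "dog"]))
```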

Your word list has numerous words that fail the validation so there will always be a few that will never show up in the puzzle as they are discarded during initialization. Use the WordSearch.words property to see which words were validated, and the WordSearch.placed_words property to see which words were actually placed (WordSearch.unplaced_words shows which weren't).

You can also use len(WordSearch.placed_words) == len(WordSearch.words) to see if all words were placed, or set the require_all_words property to True, which will raise an exception whenever the puzzle is generated and not all words are placed.

In the latest version of the code, the word validation has been extracted, allowing you to specify which validators you want to use (or none).

from word_search_generator import WordSearch

words = "TELLURIUM,CHROMIUM,NICKEL..."
p = WordSearch(words, size=60, validators=None)

## or to require all words to be placed
p = WordSearch(words, size=60, require_all_words=True, validators=None)

There are still times when every word will not be placed. The generator is very good at placing words and can fill up the available space with all words but sometimes (for various reasons), it is not possible. If you want to keep generating different versions of the puzzle layout until all words can be placed you can use the code below.

words = "TELLURIUM,CHROMIUM,NICKEL..."
p = WordSearch(words, size=60, validators=None)
while p.unplaced_words:
    p.generate()
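
If the words can never all fit, that loop never ends, so it may be worth capping the retries. `ToyPuzzle` below is a hypothetical stand-in that only mimics the two names used above (`unplaced_words` and `generate()`). It randomly succeeds or leaves the last word unplaced. `regenerate_until_complete` is likewise a hypothetical helper. The tries counting follows the same pattern as the `max_tries` loop in the pandas script later in this thread.

```python
import random


class ToyPuzzle:
    """Hypothetical stand-in exposing `unplaced_words` and `generate()`."""

    def __init__(self, words, success_rate, seed=0):
        self.words = words
        self.success_rate = success_rate
        self.rng = random.Random(seed)
        self.generate()

    def generate(self):
        # Pretend each layout either places everything or misses the last word.
        ok = self.rng.random() < self.success_rate
        self.unplaced_words = [] if ok else self.words[-1:]


def regenerate_until_complete(puzzle, max_tries=15):
    """Call generate() until every word is placed or max_tries layouts were tried."""
    tries = 1
    while puzzle.unplaced_words and tries < max_tries:
        puzzle.generate()
        tries += 1
    return tries, not puzzle.unplaced_words


print(regenerate_until_complete(ToyPuzzle(["A", "B"], success_rate=0.3)))
print(regenerate_until_complete(ToyPuzzle(["A", "B"], success_rate=0.0)))
```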

Finally, you asked about the word and size limits... It comes down to puzzle generation time and PDF output. Puzzles larger than 50 in size don't fit well on the letter-size sheet of paper used in the PDF output; the text becomes so small that it is illegible. The same goes for the word limit: the word list and key can't fit on the page.

In the latest version of the code, the defaults are set on the WordSearch object so you can subclass the base object and set your own defaults.

class ReallyBigWordSearch(WordSearch):
    MAX_PUZZLE_SIZE = 1_000_000
    MAX_PUZZLE_WORDS = 2_000

p = ReallyBigWordSearch(words, size=60, validators=None)
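
The reason subclassing works is ordinary Python attribute lookup. When a method reads `self.MAX_PUZZLE_SIZE`, a subclass's class attribute shadows the parent's. The sketch below shows that mechanism with a hypothetical `Puzzle` class. Its limits (50 and 100) are placeholders, not the package's actual values. The sketch assumes the package reads its limits through the instance or class, as the subclassing advice above implies.

```python
class Puzzle:
    """Hypothetical base class; these limits are placeholders, not the package's values."""

    MAX_PUZZLE_SIZE = 50
    MAX_PUZZLE_WORDS = 100

    def __init__(self, words: str, size: int) -> None:
        word_list = [w for w in words.split(",") if w]
        # `self.` lookups resolve to the subclass's attributes when they are overridden.
        if size > self.MAX_PUZZLE_SIZE:
            raise ValueError(f"size {size} is over the limit of {self.MAX_PUZZLE_SIZE}")
        if len(word_list) > self.MAX_PUZZLE_WORDS:
            raise ValueError(f"{len(word_list)} words is over the limit of {self.MAX_PUZZLE_WORDS}")
        self.words, self.size = word_list, size


class ReallyBigPuzzle(Puzzle):
    MAX_PUZZLE_SIZE = 1_000_000
    MAX_PUZZLE_WORDS = 2_000


try:
    Puzzle("TELLURIUM,CHROMIUM", size=60)
except ValueError as err:
    print("base class:", err)

print("subclass:", ReallyBigPuzzle("TELLURIUM,CHROMIUM", size=60).size)
```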


nicolesimon13 commented on September 26, 2024

Thank you again for your answers. I really appreciate them.

> Word Validation

I had wondered about that but did not want to ask. Maybe take this paragraph and add it to the documentation/FAQ?

> And, "subwords", or words that are parts of other words, obviously aren't normally valid as the word could be found twice in the same puzzle.

Usually that is good, but in this case it is not what I want.
The person solving this puzzle would know about the duplication.

Maybe somebody else reads this - I am currently considering using a format where I regenerate with size+1 if not all words are placed after a few tries. That will not do for "I need 15x15", but in cases like this I can then start with 50 and go from there.
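
The size+1 idea can be sketched as a loop. `place_fn` is a hypothetical callback returning how many words one generated puzzle placed. With the real package it might build a `WordSearch` and return `len(p.placed_words)`, as in the counting example earlier in this thread. `toy_place_fn` is a crude capacity model invented only so the sketch runs on its own. `max_size` caps the search, because a bigger grid is not guaranteed to place every word.

```python
def smallest_workable_size(words, start_size, place_fn, tries=10, max_size=60):
    """Grow the grid one step at a time until some try places every word."""
    for size in range(start_size, max_size + 1):
        best = max(place_fn(words, size) for _ in range(tries))
        print(f"size {size}: best {best}/{len(words)} placed")
        if best == len(words):
            return size
    return None  # never fully placed within max_size


def toy_place_fn(words, size):
    # Hypothetical stand-in: ~80% of cells usable and no letter sharing.
    avg_len = sum(map(len, words)) / len(words)
    return min(len(words), int(0.8 * size * size / avg_len))


print(smallest_workable_size(["ABCDEF"] * 50, 15, toy_place_fn))
```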

"In the latest version of the code, the defaults are set on the WordSearch object so you can subclass the base object and set your own defaults."
This would take care of the problem of me having to edit the source code for that, right?

And I understand the limitation for the PDF, but in that case I think a big, big warning would be better than just limiting the user. Instead of being able to say "screw it, I know what I am doing", I now need to manually add a subclass. And also: this is perfectly well written, so just make an odds-and-ends doc on the wiki; there are several things you answered for me which could go in there as FAQs! :)

And lastly, a simple yes/no question, I hope:
If I am using mask (to create a rectangle instead of square), do I need to apply the mask after every generation?
I would think that it is enough?


words = "TELLURIUM,CHROMIUM,NICKEL..."
p = WordSearch(words, size=60, validators=None)
p.apply_mask(Rectangle(mask_x, mask_y))
# <other code>
while p.unplaced_words:
    p.generate()

Again thanks for the quick reply!


nicolesimon13 commented on September 26, 2024

This is more of an FYI observation about something I have noticed throughout, since it is also about placement. ;)

This grid uses only S+E and was made with the latest version.
The result only goes south. I needed to run it seven more times until I got a version that is not just one direction.
I have seen similar things on level 3 grids too: tons of just vertical or just horizontal words, often not even backwards, even when enough space is available.

It looks like there is not enough (for lack of a better word) 'random diversity'.
Yes, it is a valid placement, but it is a very boring puzzle. Even my biggest puzzles take maybe 30 seconds to generate with my full script, so I can spare the cycles for the grid to at least try to be better than this. As I mentioned, I am not a good coder, so I am not sure what you are using for placements, but it looks to me like it should try to cycle through the valid directions.

The approach I am going to use:
I determine how much diversity I want in my puzzle and will run the generation as long as it takes, e.g., if I am doing S+E, I want at least 45% to be S, etc.

1. Big grid: 'simple' fill, boring; ORANGE could easily have been placed across.
2. Small grid: here it needs to fill like this because I said S+E.
3. Another example of only south.
4. After seven more tries.


joshbduncan commented on September 26, 2024

> Maybe somebody else reads this - I am currently considering using a format where I regenerate with size+1 if not all words are placed after a few tries. That will not do for "I need 15x15", but in cases like this I can then start with 50 and go from there.

Increasing the size doesn't mean all words will be placed. On larger puzzles, size isn't usually the limiting factor. Typically it is word length, word count, and word placement. If you have a bunch of really long words in a smaller puzzle, once a few words are placed, the available positions for the other words are limited.


nicolesimon13 commented on September 26, 2024

>> Maybe somebody else reads this - I am currently considering using a format where I regenerate with size+1 if not all words are placed after a few tries. That will not do for "I need 15x15", but in cases like this I can then start with 50 and go from there.

> Increasing the size doesn't mean all words will be placed. On larger puzzles, size isn't usually the limiting factor. Typically it is word length, word count, and word placement. If you have a bunch of really long words in a smaller puzzle, once a few words are placed, the available positions for the other words are limited.

Yes, but I am keeping a running report on how many did fit. Let's say I am starting with a 15x15 and have 50 words: the ten tries I do in my loop will tell me 40-45, which tells me "needs to be bigger". I want to start as small as possible, but usually there is a size where you can say "this list with this grid will work well".


nicolesimon13 commented on September 26, 2024

> On your question about placement...
>
> I did a simple test using your words and puzzle size from above.

Thank you for the code. Getting to those numbers would have taken me another day with ChatGPT's help, and the result would not have been as elegant. And it is good to hear that it works as it should (random). It also makes more sense: your program is so nice it would have been weird if it hadn't. Clearly it is an "it's me, not you" situation, i.e., my word list, which is why I am going to use your tuple thingy to regenerate until it is what I need/want it to be. By now I have it set up to do "here is my list, run it. Run it again, each time with 10x tries. Damn, only x will fit. Increase, increase. Try again."

I have not yet redone my code to use your new "no validation" option; I assume a lot of my tries will now fit much better. ;)


nicolesimon13 commented on September 26, 2024

> Pre-built Validators
>
> NoSingleLetterWords
> NoPunctuation
> NoPalindromes
> NoSubwords

Yeah, I am taking care of that in my workflow before running the code. And all of them are pretty much things I want in my list, but again, that is my conscious decision, whereas for a normal puzzle this is good.


nicolesimon13 commented on September 26, 2024

I am sharing this purely as an FYI. I tried Josh's code, but it did not work for me, plus it did not have everything I needed.

This is not meant for Josh to put time into it.
I am not a good programmer. I can script and use ChatGPT, so my stuff runs, but that is about it.
I am attaching mine so you can run it if you want to.
It is built with the latest version installed, as Josh mentioned above.

Why not go with "Josh's random words show ..."? Because while that is true in general, it does not help for a specific word list.

tl;dr, my conclusion:
Everything depends on your word list and your grid size (all bets are off if you use masks).

Run stats first to see whether your chosen word list even fits (well) into your chosen grid size.
Figure out what your good zone is (i.e., how many diagonals you want).
Adapt the code to rerun the puzzle until you get what you want.

Also please do not forget what Josh said in the other comment "what determines a good puzzle is up to you"
Greetings from Berlin
Nicole

1) I built a script that creates puzzles and gives stats on the directions

This code takes your word list, runs it 50 times, and writes the output to the screen as well as to a tab-delimited file.
e_placement test2.py.txt

The output shows the distribution per puzzle creation across the directions plus a summary normal vs diagonal.
There is a loop to try for x times to have a puzzle first which has placed all the words.

2) I then built a script that makes a diagram out of it
e_diagram3.py.txt
reads the produced r_stats_placement.txt

Good and bad are defined as how many entries are inside the gold zone as defined by the parameter.
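
A minimal version of that good/bad split might look like this. `zone_report` is a hypothetical helper, and the 20-40% zone bounds are illustrative only. The sample values are the diagonal percentages of the first five runs in Josh's 5000-run table later in this thread.

```python
def zone_report(diagonal_pcts, low, high):
    """Split runs into 'good' (diagonal % within [low, high]) and 'bad'."""
    good = sum(low <= p <= high for p in diagonal_pcts)
    return {"good": good, "bad": len(diagonal_pcts) - good}


sample = [46.15, 15.38, 23.08, 19.23, 23.08]
print(zone_report(sample, low=20, high=40))
```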

3) I ran this with my word list (grid 15, level 3, 26 words) 5000 times
r_stats_placement_5000.txt
The stats file has two lines with 25 words placed; every other line has 26.
Yes, grid size 15 is a tight grid, but it is a default size, and as you can see it can get into the good zone, but you need to check for it.

I ran this with validation off because it interferes with what I want to do. I repeated it for the list without that:

And because I was curious:
grid size 50, the 118 elements, validators=None, 100 runs
r_stats_placement_element50.txt

grid size 40, the 118 elements, validators=None, 100 runs
r_stats_placement_40.txt

So more space means more even distribution, thus
running the initial list on a size 40 grid 500x
r_stats_placement_500ol.txt

from word-search-generator.

joshbduncan commented on September 26, 2024

So here are a few thoughts...

Yes, a larger size obviously allows more room for diagonal placement. What I meant in my earlier comment is that size isn't always the determining factor.
Possible diagonals will always be limited (compared to "regular" directions) due to a few factors, including the puzzle boundaries and word overlap. For example, in the grid below, no matter which random position the generator chooses, only a very small number of words could fit diagonally (overlap conflicts, boundary limitations). There are numerous positions, though, where another word could fit in a "normal" direction.

* * * * *
B A T * *
* T E S *
* * * E *
* * * T *
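
To put numbers on that grid, the sketch below counts every start cell and direction where a new three-letter word could go. It uses the same fit rule as before: every cell in bounds, and each cell either empty (`*`) or already holding the same letter. `XYZ` is a hypothetical word that shares no letters with the grid, so only empty cells count. `count_placements` is an illustrative helper, not part of the package.

```python
GRID = [
    "*****",
    "BAT**",
    "*TES*",
    "***E*",
    "***T*",
]
STEPS = {
    "N": (-1, 0), "E": (0, 1), "S": (1, 0), "W": (0, -1),
    "NE": (-1, 1), "SE": (1, 1), "SW": (1, -1), "NW": (-1, -1),
}
NORMAL = {"N", "E", "S", "W"}


def count_placements(grid, word):
    """Count (start, direction) pairs where `word` fits, split into normal vs diagonal."""
    rows, cols = len(grid), len(grid[0])
    normal = diagonal = 0
    for r in range(rows):
        for c in range(cols):
            for name, (dr, dc) in STEPS.items():
                ok = True
                for i, ch in enumerate(word):
                    rr, cc = r + i * dr, c + i * dc
                    if not (0 <= rr < rows and 0 <= cc < cols) or grid[rr][cc] not in ("*", ch):
                        ok = False
                        break
                if ok:
                    if name in NORMAL:
                        normal += 1
                    else:
                        diagonal += 1
    return normal, diagonal


print(count_placements(GRID, "XYZ"))  # (18, 4)
```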

If you plan to do any sort of statistics with tabular style data, I recommend using Pandas. I rewrote your script using pandas below. I have also included a sample of the output it generated. This uses your exact wordlist of 26 words from above ("ARTICHOKE,BACON,BEANS...")

from collections import defaultdict

import pandas as pd
from word_search_generator import WordSearch
from word_search_generator.core.word import Direction

POSSIBLE_DIRECTIONS = [d.name for d in Direction]


def run_tests(
    words, runs, puzzle_size, puzzle_level, max_tries
) -> list[tuple[dict, int]]:
    results = []
    for _ in range(runs):
        counts: dict[str, int] = defaultdict(int)
        p = WordSearch(words, size=puzzle_size, level=puzzle_level, validators=None)

        tries = 1
        while True:
            if not p.unplaced_words or tries >= max_tries:
                break
            p.generate()
            tries += 1

        for word in p.placed_words:
            if word.direction is None:
                continue
            counts[word.direction.name] += 1

        results.append((counts, tries))

    return results


if __name__ == "__main__":
    # set defaults
    with open("words.txt") as f:  # comma-separated wordlist
        words = f.read().strip()
    runs = 5000
    puzzle_size = 15
    puzzle_level = 3
    max_tries = 15

    # run tests
    test_results = run_tests(words, runs, puzzle_size, puzzle_level, max_tries)

    # format data for pandas
    data: dict[str, list[int]] = defaultdict(list)
    for row in test_results:
        counts, tries = row
        data["tries"].append(tries)
        for d in POSSIBLE_DIRECTIONS:
            data[d].append(counts[d])

    # load data
    normal_dirs = ["N", "E", "S", "W"]
    diagonal_dirs = ["NE", "SE", "SW", "NW"]
    df = pd.DataFrame(data)
    df["Placed"] = sum(df[d] for d in POSSIBLE_DIRECTIONS)
    df["Normal %"] = (sum(df[d] for d in normal_dirs) / df["Placed"]) * 100
    df["Diagonal %"] = (sum(df[d] for d in diagonal_dirs) / df["Placed"]) * 100

    # present data

    # print the entire table
    # print(df.round(2))

    # print only summary info
    print(df.describe().round(2))

    # save data
    df.to_string("test_results.txt")
    df.describe().round(2).to_string("tests_summary.txt")

# test results (truncated)
      tries   N  NE   E  SE   S  SW   W  NW  Placed    Normal %  Diagonal %
0         2   1   4   2   4   3   2   8   2      26   53.846154   46.153846
1         1   7   1   3   0   1   2  11   1      26   84.615385   15.384615
2         4   6   1   3   2   3   2   8   1      26   76.923077   23.076923
3         3   7   1   6   2   4   1   4   1      26   80.769231   19.230769
4         1   3   3  10   0   2   3   5   0      26   76.923077   23.076923
..      ...  ..  ..  ..  ..  ..  ..  ..  ..     ...         ...         ...
4995      2   9   1   3   0  11   0   2   0      26   96.153846    3.846154
4996      1  11   0   3   2   7   0   3   0      26   92.307692    7.692308
4997      1  11   7   1   0   5   1   1   0      26   69.230769   30.769231
4998      1   7   2   9   0   0   0   7   1      26   88.461538   11.538462
4999      1   0   1  13   1   2   0   7   2      26   84.615385   15.384615

# tests summary stats
         tries        N       NE        E       SE        S       SW        W       NW  Placed  Normal %  Diagonal %
count  5000.00  5000.00  5000.00  5000.00  5000.00  5000.00  5000.00  5000.00  5000.00  5000.0   5000.00     5000.00
mean      2.18     5.03     1.51     5.05     1.46     5.01     1.50     4.97     1.46    26.0     77.16       22.84
std       1.59     2.62     1.68     2.57     1.62     2.59     1.67     2.54     1.61     0.0     14.10       14.10
min       1.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00    26.0     23.08        0.00
25%       1.00     3.00     0.00     3.00     0.00     3.00     0.00     3.00     0.00    26.0     65.38       11.54
50%       2.00     5.00     1.00     5.00     1.00     5.00     1.00     5.00     1.00    26.0     76.92       23.08
75%       3.00     7.00     2.00     7.00     2.00     7.00     2.00     7.00     2.00    26.0     88.46       34.62
max      14.00    17.00    11.00    15.00    12.00    16.00    11.00    15.00    11.00    26.0    100.00       76.92

Really, no matter how many tests I ran with your defaults, the results ranged between 72-78% "normal" and 22-28% "diagonal".

With these results in mind, I ran another type of test. Basically, I hijacked the default generator to gain more info on the valid placements for each random position it chose.

For a test of 500 puzzles (without any retries), the generator tried 12,941 random positions (25.88 per puzzle). On average there were 1.76 valid directions per random position with 70% of those being "normal" directions and 30% being "diagonal" directions.

So, no matter the size of the puzzle, you will always have a limited number of "diagonal" direction words in comparison to "normal" direction words.

Total Word Placement Attempts: 12941
Average Valid Directions per Random Position: 1.76
Average Valid NORMAL Directions per Random Position: 1.22
Average Valid DIAGONAL Directions per Random Position: 0.53
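
The per-puzzle and percentage figures follow directly from these totals. The arithmetic below re-derives them. The 1.22 and 0.53 split adds up to about 1.76 (to rounding), which identifies 0.53 as the diagonal average.

```python
attempts, puzzles = 12_941, 500
valid, normal, diagonal = 1.76, 1.22, 0.53

print(round(attempts / puzzles, 2))                          # 25.88 positions per puzzle
print(round(normal / valid, 2), round(diagonal / valid, 2))  # 0.69 0.3
print(round(normal + diagonal, 2))                           # 1.75
```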

If you need a specific distribution, you could always write your own generator using the Generator abstract base class. Your custom generator could try "diagonals" first instead of picking a direction at random. Please note, forcing "diagonals" takes many more tries and much more processing. As you can see from the stats below, forcing "diagonals" uses all available retries and most often doesn't place all words.
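
The "diagonals first" rule could look like the selection function below. This is not the package's `Generator` interface, whose methods aren't shown in this thread. It only illustrates a rule that a custom generator could use in place of a uniform random pick among the valid directions. `pick_direction` is a hypothetical name.

```python
import random

DIAGONALS = {"NE", "SE", "SW", "NW"}


def pick_direction(valid, rng, prefer_diagonal=True):
    """Choose among `valid` direction names, trying diagonals first if requested."""
    if not valid:
        return None
    if prefer_diagonal:
        diagonal = [d for d in valid if d in DIAGONALS]
        if diagonal:
            return rng.choice(diagonal)
    return rng.choice(valid)  # fall back to a uniform random pick


rng = random.Random(1)
print(pick_direction(["E", "NE", "S"], rng))  # NE: the only diagonal option
print(pick_direction(["E", "S"], rng))        # no diagonal fits, so E or S
```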

# test results (truncated), using level=7 "diagonals" only
     tries  N  NE  E  SE  S  SW  W  NW  Placed  Normal %  Diagonal %
0       15  0   4  0   4  0   9  0   5      22       0.0       100.0
1       15  0   6  0  11  0   1  0   6      24       0.0       100.0
2       15  0   8  0   3  0   8  0   4      23       0.0       100.0
3       15  0   1  0   9  0   3  0   8      21       0.0       100.0
4       15  0   8  0   1  0  12  0   4      25       0.0       100.0
..     ... ..  .. ..  .. ..  .. ..  ..     ...       ...         ...
495     15  0   6  0   2  0   4  0  10      22       0.0       100.0
496     15  0  14  0   1  0   9  0   0      24       0.0       100.0
497     15  0   5  0   6  0   4  0   7      22       0.0       100.0
498     15  0   7  0   4  0  10  0   2      23       0.0       100.0
499     15  0   9  0   7  0   2  0   6      24       0.0       100.0

# tests summary stats
        tries      N      NE      E      SE      S      SW      W      NW  Placed  Normal %  Diagonal %
count  500.00  500.0  500.00  500.0  500.00  500.0  500.00  500.0  500.00  500.00     500.0       500.0
mean    14.31    0.0    5.96    0.0    5.74    0.0    5.94    0.0    5.89   23.54       0.0       100.0
std      2.44    0.0    3.34    0.0    3.18    0.0    3.36    0.0    3.33    1.15       0.0         0.0
min      1.00    0.0    0.00    0.0    0.00    0.0    0.00    0.0    0.00   21.00       0.0       100.0
25%     15.00    0.0    3.00    0.0    3.00    0.0    3.00    0.0    3.00   23.00       0.0       100.0
50%     15.00    0.0    6.00    0.0    6.00    0.0    6.00    0.0    6.00   23.00       0.0       100.0
75%     15.00    0.0    8.00    0.0    8.00    0.0    8.00    0.0    8.00   24.00       0.0       100.0
max     15.00    0.0   21.00    0.0   17.00    0.0   17.00    0.0   15.00   26.00       0.0       100.0

Yes, masks definitely limit the generator, since they always reduce the available area for words.
