
Comments (10)

Macmod avatar Macmod commented on May 30, 2024 1

Since this issue hasn't been discussed in a while, if anyone is interested I implemented that possible solution in this commit of my personal fork:

Macmod@0bbaea0

This implementation allows the use of big wordlists without the memory bottleneck mentioned above. I'm not sure it's the best solution, but it works :)

from o365spray.

0xZDH avatar 0xZDH commented on May 30, 2024 1

I like this approach and if you wanted to open a PR I would be happy to test/review and merge.

The only thing I noticed that would be blocking for a merge is that you are using loop.create_task to execute each enumerator call. In the original code, we are using loop.run_in_executor specifically to leverage the ThreadPoolExecutor to honor the specified threads the user defined via the --rate flag. To my understanding, the updated implementation would not honor the threads so we would need to either move back to the original call to run_in_executor or implement a Semaphore to execute the tasks through.
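The distinction above can be sketched as a minimal, self-contained example (names like check_user are hypothetical stand-ins for the enumerator call, not o365spray's actual API): run_in_executor dispatches the blocking call onto the ThreadPoolExecutor, so the pool's max_workers, the value --rate would supply, bounds concurrency no matter how many tasks are queued.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

def check_user(user: str) -> str:
    # Hypothetical stand-in for the blocking enumerator call
    return f"checked:{user}"

async def main() -> list:
    loop = asyncio.get_running_loop()
    rate = 4  # the value --rate would supply
    # run_in_executor schedules the blocking call on the pool, so at
    # most `rate` checks execute at once regardless of the task count
    with ThreadPoolExecutor(max_workers=rate) as executor:
        tasks = [
            loop.run_in_executor(executor, check_user, f"user{i}")
            for i in range(10)
        ]
        return await asyncio.gather(*tasks)

results = asyncio.run(main())
```

By contrast, bare loop.create_task calls place no ceiling on concurrent work, which is why a Semaphore (or a return to run_in_executor) would be needed to honor --rate.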


0xZDH avatar 0xZDH commented on May 30, 2024 1

@Macmod - Thanks for looking into this further and testing it out a bit. I ran some local tests with the original code base, your updated code base, and another local alternative I wrote that just limited the number of blocking tasks at any given time. While I was able to match your memory management, the overall performance was far superior using your method.

If you want to open up a PR for this change, I would be happy to run some final tests and merge it into the main branch.

Before merging though, I wanted to mention a few things regarding the overall code formatting and style to ensure consistency with the rest of the code base:

  • Please document the _consume_threads function and its parameters (see the run() function for docstring formatting).
  • The threads parameter in the _consume_threads function is typed as a list, but it is actually a dictionary. Is there a reason for using a dictionary, or is it just for easier access to the given Future object when deleting it on completion? Whichever is the cleaner solution, it would be great if you could update the type reference.
  • Lastly, while not a style or formatting suggestion: the newly added --conlimit parameter mentions "Concurrency limit", but the --rate parameter aligns more closely with the actual concurrency executed within the executor, whereas the new flag relates more to the pool size of the executor. I would suggest --poolsize to reflect this.
  • Per the Stack Overflow link you mentioned, there is a point that the threads object name is not necessarily accurate; a potential alternative would be to call the object futures.

The above are just suggestions to closer align to the existing code base, but please open a PR and I am happy to merge. Thanks again for your efforts on this - awesome work!
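Taken together, the review suggestions might shape the helper roughly like this. This is only a sketch, not o365spray's actual code: the name _consume_futures, the Dict[Future, str] typing, and the trivial str.upper demo worker are all assumptions made for illustration.

```python
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from typing import Dict

def _consume_futures(futures: Dict[Future, str], limit: int = 0) -> None:
    """Wait until the number of pending futures drops to `limit`.

    Arguments:
        futures: mapping of Future -> the username it is processing
        limit:   maximum number of futures left pending on return
    """
    while len(futures) > limit:
        done, _ = wait(futures, return_when=FIRST_COMPLETED)
        for future in done:
            future.result()  # surface any exception raised by the worker
            del futures[future]

# tiny demo with a trivial work function
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = {pool.submit(str.upper, u): u for u in ["alice", "bob", "eve"]}
    _consume_futures(futures, limit=0)
remaining = len(futures)
```

Keying the dictionary by Future makes the `del futures[future]` cleanup O(1), which is presumably why a dict was chosen over a list in the first place.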


Macmod avatar Macmod commented on May 30, 2024 1

Just a quick note - the same issue probably exists in the sprayer module, but I can't test it right now. In any case, it's more likely that people will only use large lists with the enumerator module and then spray against the valid emails found. If you're interested, I can check it another time and send a new PR so we can keep things consistent 😃


0xZDH avatar 0xZDH commented on May 30, 2024 1

This has now been merged into the main branch. Awesome work @Macmod!

Regarding spraying -- I think for now enumeration is enough as you mentioned that most scenarios won't require massive lists for spraying, only enumeration. If there is a need to update, we can revisit.


Macmod avatar Macmod commented on May 30, 2024

Hey @0xZDH, thanks for the reply =)

At the risk of being too naive, I can't help but wonder whether a ThreadPoolExecutor is really needed in this particular use case. Maybe we'd get the same performance by just using coroutines and replacing conlimit from my commit with your original rate?

I don't have a clear answer right now, but it's something I've been thinking about.
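The coroutine-only idea could be sketched as follows, with hypothetical names (check_user stands in for one enumeration request): the --rate value caps concurrency via a Semaphore alone, with no executor involved. Whether this actually performs as well depends on the request call being genuinely async rather than blocking.

```python
import asyncio

async def check_user(user: str) -> str:
    # Hypothetical async stand-in for one enumeration request
    await asyncio.sleep(0)
    return f"valid:{user}"

async def run_all(users, rate: int) -> list:
    # `rate` caps concurrency directly, the way --rate caps the pool
    semaphore = asyncio.Semaphore(rate)

    async def bounded(user: str) -> str:
        async with semaphore:
            return await check_user(user)

    # gather preserves input order in its results
    return await asyncio.gather(*(bounded(u) for u in users))

results = asyncio.run(run_all([f"user{i}" for i in range(8)], rate=3))
```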


0xZDH avatar 0xZDH commented on May 30, 2024

I wrote this with 0 validation or testing, but the idea is to leverage both your concurrency tasks limit and a Semaphore for async concurrency execution limits:

        semaphore = asyncio.Semaphore(args.rate)

        async def async_enumerate(domain: str, user: str, password: str = "Password1"):
            """Async call of enumerate"""
            return self._enumerate(domain=domain, user=user, password=password)

        async def safe_enumerate(domain: str, user: str, password: str = "Password1"):
            """Safe call to enumerate to keep within bounds of concurrency limits"""
            async with semaphore:
                return await async_enumerate(domain=domain, user=user, password=password)

        blocking_tasks = set()

        for user in userlist:
            # Check the concurrency task limit and wait if we reach the upper bound
            # Default: 1,000 concurrency task limit
            if len(blocking_tasks) >= 1000:
                _, blocking_tasks = await asyncio.wait(
                    blocking_tasks,
                    return_when=asyncio.FIRST_COMPLETED,
                )

            # Add new tasks as the task pool frees up
            blocking_tasks.add(
                self.loop.create_task(
                    safe_enumerate(
                        domain=domain,
                        user=user,
                        password=password,
                    )
                )
            )


Macmod avatar Macmod commented on May 30, 2024

After some testing, I no longer think coroutines alone would help: ditching ThreadPoolExecutor in favor of coroutines seems to hurt performance, in either my solution or yours. But take a look at this test using this idea:

import asyncio
import random
import matplotlib.pyplot as plt
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED
import time
import requests

loop = asyncio.get_event_loop()
N = 7000000
CONLIMIT = 1000
RATE = 10

def download(code, completion_times):
    requests.get('http://www.github.com/', headers={'Cache-Control': 'no-cache'})
    completion_times.append((code, loop.time()))
    print(f'Downloaded {code}', completion_times[-1])

def consume(threads: dict, max_n: int = CONLIMIT):
    while len(threads) > max_n:
        done, _ = wait(threads, return_when=FIRST_COMPLETED)
        for t in done:
            t.result()
            del threads[t]

async def main(executor):
    completion_times = []

    threads = {}
    for x in range(N):
        future = executor.submit(download, x, completion_times)
        threads[future] = x
        consume(threads)
    consume(threads, 0)

    codes, times = zip(*completion_times)
    plt.scatter(times, codes)
    plt.xlabel('Time (s)')
    plt.ylabel('Task Code')
    plt.title('Task Completion Times')
    plt.savefig('test.png')

executor = ThreadPoolExecutor(RATE)

try:
    loop.run_until_complete(main(executor))
finally:
    loop.close()

It runs really fast and seems to prevent the issue. I'm going to experiment a little bit more with that idea on o365spray's code and come back later with a pull request if it works.


Macmod avatar Macmod commented on May 30, 2024

I think it works, can you test it so we can maybe proceed to a pull request?

Proposed Fix


Macmod avatar Macmod commented on May 30, 2024

Hey @0xZDH, your suggestions were spot on; I have included them in the PR.
Thanks again for the collaboration! 👍

