scandir's Introduction

scandir, a better directory iterator and faster os.walk()

scandir() is a directory iteration function like os.listdir(), except that instead of returning a list of bare filenames, it yields DirEntry objects that include file type and stat information along with the name. Using scandir() increases the speed of os.walk() by 2-20 times (depending on the platform and file system) by avoiding unnecessary calls to os.stat() in most cases.

Now included in a Python near you!

scandir has been included in the Python 3.5 standard library as os.scandir(), and the related performance improvements to os.walk() have also been included. So if you're lucky enough to be using Python 3.5 (release date September 13, 2015), you get the benefit immediately; otherwise, just download this module from PyPI, install it with pip install scandir, and then do something like this in your code:

# Use the built-in version of scandir/walk if possible, otherwise
# use the scandir module version
try:
    from os import scandir, walk
except ImportError:
    from scandir import scandir, walk

PEP 471, which is the PEP that proposes including scandir in the Python standard library, was accepted in July 2014 by Victor Stinner, the BDFL-delegate for the PEP.

This scandir module is intended to work on Python 2.7+ and Python 3.4+ (and it has been tested on those versions).

Background

Python's built-in os.walk() is significantly slower than it needs to be, because -- in addition to calling listdir() on each directory -- it calls stat() on each file to determine whether the filename is a directory or not. But both FindFirstFile / FindNextFile on Windows and readdir on Linux/OS X already tell you whether the files returned are directories or not, so no further stat system calls are needed. In short, you can reduce the number of system calls from about 2N to N, where N is the total number of files and directories in the tree.
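
The difference can be sketched with the standard library, assuming Python 3.5+ where scandir lives in the os module (with the backport, substitute scandir.scandir). Both functions below return the same split of one directory, but the first makes one extra stat() call per entry through os.path.isdir(), while the second usually gets the file type from the directory listing itself. The function names are illustrative only.

```python
import os


def split_with_listdir(path):
    """Classic approach: listdir() plus one stat() per entry (via isdir)."""
    dirs, files = [], []
    for name in os.listdir(path):
        # os.path.isdir() calls os.stat() on every single entry.
        if os.path.isdir(os.path.join(path, name)):
            dirs.append(name)
        else:
            files.append(name)
    return sorted(dirs), sorted(files)


def split_with_scandir(path):
    """scandir approach: the listing usually already carries the file type."""
    dirs, files = [], []
    for entry in os.scandir(path):
        # is_dir() normally needs no extra system call (it may on some
        # filesystems, or when the entry is a symlink).
        if entry.is_dir():
            dirs.append(entry.name)
        else:
            files.append(entry.name)
    return sorted(dirs), sorted(files)
```

Both versions follow symlinks when classifying entries, so their results agree; only the number of system calls differs.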

In practice, removing all those extra system calls makes os.walk() about 7-50 times as fast on Windows, and about 3-10 times as fast on Linux and Mac OS X. So we're not talking about micro-optimizations. See more benchmarks in the "Benchmarks" section below.

Somewhat relatedly, many people have also asked for a version of os.listdir() that yields filenames as it iterates instead of returning them as one big list. This improves memory efficiency for iterating very large directories.

So as well as a faster walk(), scandir adds a new scandir() function. They're pretty easy to use, but see "The API" below for the full docs.

Benchmarks

Below are results showing how many times faster scandir.walk() is than os.walk() on various systems, found by running benchmark.py with no arguments:

System version          Python version   Times as fast
Windows 7 64-bit        2.7.7 64-bit     10.4
Windows 7 64-bit SSD    2.7.7 64-bit     10.3
Windows 7 64-bit NFS    2.7.6 64-bit     36.8
Windows 7 64-bit SSD    3.4.1 64-bit      9.9
Windows 7 64-bit SSD    3.5.0 64-bit      9.5
Ubuntu 14.04 64-bit     2.7.6 64-bit      5.8
Mac OS X 10.9.3         2.7.5 64-bit      3.8

All of the above tests were done using the fast C version of scandir (source code in _scandir.c).

Note that the gains are less than the above on smaller directories and greater on larger directories. This is why benchmark.py creates a test directory tree with a standardized size.
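
The idea behind a benchmark like this (build a tree of a fixed size, then time complete walks over it) can be sketched with the standard library. This is not benchmark.py itself: create_tree and best_time are hypothetical helpers, and their depth/num_dirs/num_files parameters only loosely mirror the ones benchmark.py reports.

```python
import os
import timeit


def create_tree(path, depth=2, num_dirs=3, num_files=5):
    """Hypothetical helper: create num_files empty files in path and, while
    depth > 0, num_dirs subdirectories built the same way one level down."""
    os.makedirs(path, exist_ok=True)
    for i in range(num_files):
        open(os.path.join(path, 'file%d.txt' % i), 'w').close()
    if depth > 0:
        for i in range(num_dirs):
            create_tree(os.path.join(path, 'dir%d' % i),
                        depth - 1, num_dirs, num_files)


def best_time(walk, top, repeat=3):
    """Hypothetical helper: fastest of `repeat` full walks over top, in seconds."""
    return min(timeit.repeat(lambda: list(walk(top)), number=1, repeat=repeat))
```

Dividing best_time() for a slower walk function by best_time() for a faster one over the same tree gives a "times as fast" figure; taking the minimum of several runs reduces noise from the OS cache and other processes.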

The API

walk()

The API for scandir.walk() is exactly the same as os.walk(), so just read the Python docs.

scandir()

The full docs for scandir() and the DirEntry objects it yields are available in the Python documentation, but a brief summary follows.

scandir(path='.') -> iterator of DirEntry objects for given path

Like listdir, scandir calls the operating system's directory iteration system calls to get the names of the files in the given path, but it's different from listdir in two ways:

  • Instead of returning bare filename strings, it returns lightweight DirEntry objects that hold the filename string and provide simple methods that allow access to the additional data the operating system may have returned.
  • It returns a generator instead of a list, so that scandir acts as a true iterator instead of returning the full list immediately.

scandir() yields a DirEntry object for each file and sub-directory in path. Just like listdir, the '.' and '..' pseudo-directories are skipped, and the entries are yielded in system-dependent order. Each DirEntry object has the following attributes and methods:

  • name: the entry's filename, relative to the scandir path argument (corresponds to the return values of os.listdir)
  • path: the entry's full path name (not necessarily an absolute path) -- the equivalent of os.path.join(scandir_path, entry.name)
  • is_dir(*, follow_symlinks=True): similar to pathlib.Path.is_dir(), but the return value is cached on the DirEntry object; doesn't require a system call in most cases; don't follow symbolic links if follow_symlinks is False
  • is_file(*, follow_symlinks=True): similar to pathlib.Path.is_file(), but the return value is cached on the DirEntry object; doesn't require a system call in most cases; don't follow symbolic links if follow_symlinks is False
  • is_symlink(): similar to pathlib.Path.is_symlink(), but the return value is cached on the DirEntry object; doesn't require a system call in most cases
  • stat(*, follow_symlinks=True): like os.stat(), but the return value is cached on the DirEntry object; does not require a system call on Windows (except for symlinks); don't follow symbolic links (like os.lstat()) if follow_symlinks is False
  • inode(): return the inode number of the entry; the return value is cached on the DirEntry object

Here's a very simple example of scandir() showing use of the DirEntry.name attribute and the DirEntry.is_dir() method:

import os

def subdirs(path):
    """Yield directory names not starting with '.' under given path."""
    for entry in os.scandir(path):
        if not entry.name.startswith('.') and entry.is_dir():
            yield entry.name

This subdirs() function will be significantly faster with scandir than with os.listdir() and os.path.isdir() on both Windows and POSIX systems, especially on medium-sized or large directories.
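
The stat() method can be used the same way. Below is a sketch (the function name is illustrative) that totals the sizes of regular files under a directory. Passing follow_symlinks=False keeps it from descending into, or counting, linked trees. Per the notes above, stat() is usually answered from the cached listing data on Windows, while on POSIX it costs one system call per file.

```python
import os


def get_tree_size(path):
    """Return the total size in bytes of regular files under path,
    without following symbolic links."""
    total = 0
    for entry in os.scandir(path):
        if entry.is_dir(follow_symlinks=False):
            # Recurse into real subdirectories only.
            total += get_tree_size(entry.path)
        elif entry.is_file(follow_symlinks=False):
            total += entry.stat(follow_symlinks=False).st_size
    return total
```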

Further reading

  • The Python docs for scandir
  • PEP 471, the (now-accepted) Python Enhancement Proposal that proposed adding scandir to the standard library -- a lot of details here, including rejected ideas and previous discussion

Flames, comments, bug reports

Please send flames, comments, and questions about scandir to Ben Hoyt:

http://benhoyt.com/

File bug reports for the version in the Python 3.5 standard library here, or file bug reports or feature requests for this module at the GitHub project page:

https://github.com/benhoyt/scandir

scandir's People

Contributors

ahvigil, arguile-, avylove, bashu, benhoyt, bfrisbie-tanium, cielavenir, gst, htoothrot, kianmeng, lowks, marianielias, poupas, prashanthpai, r4scal, ronnypfannschmidt, rp-tanium, segevfiner, thomaswaldmann, tjguk, vstinner


scandir's Issues

A method based on scandir that could help others

Thanks to the development team for improving this library so much for our community.
I'm working with it, and for my purposes I had to modify scandir.walk a little.
The method has to accomplish the following goals:

  • It should work like scandir.walk with topdown=True, followlinks=False.
  • It should accept a level option to limit the depth of the search.
  • It should work like the find command if I specify a pattern to match files (it could match folders later too).

And this is the method I'm using for the above:
https://gist.github.com/d555/c4cd064f19692b57ccdd

from __future__ import print_function

import os
import re
from os.path import islink

from scandir import scandir


def walk(top, level=None, regex=None):
    """A modification of scandir.walk that performs a top-down search
    with an optional depth limit and an optional filename pattern for files.
    """
    dirs = []
    nondirs = []

    if isinstance(regex, str):
        regex = re.compile(regex)

    try:
        scandir_it = scandir(top)
    except OSError:
        return

    while True:
        try:
            try:
                entry = next(scandir_it)
            except StopIteration:
                break
        except OSError:
            return

        try:
            is_dir = entry.is_dir()
        except OSError:
            is_dir = False

        if is_dir:
            dirs.append(entry.name)
        elif regex is None or regex.match(entry.name):
            # Keep every file when no pattern is given, else only matches.
            nondirs.append(entry.name)

    yield top, dirs, nondirs

    if level is not None:
        assert isinstance(level, int)
        if level <= 0:
            return
        # Every subdirectory gets one level less than its parent. Decrementing
        # inside the loop below would shrink the limit for each later sibling.
        level -= 1

    for name in dirs:
        new_path = os.path.join(top, name)
        if islink(new_path):
            continue
        for entry in walk(new_path, level, regex):
            yield entry


def main():
    path = os.path.expanduser('~')
    i = 1
    regex = re.compile(r'.*txt$')
    for _, _, files in walk(path, regex=regex):
        for name in files:
            print(i, name)
            i += 1


if __name__ == "__main__":
    main()

And the version that uses the DirEntry objects given by scandir:

https://gist.github.com/d555/6bd50464756412505a30

Any suggestions? I would appreciate them.
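
For comparison, the same goals can also be met without copying walk(), by pruning the dirs list in place, which os.walk() (and therefore scandir.walk()) honours when topdown=True. A sketch, assuming that a depth of 0 means "list top only" and that a directory at the limit may be reported with an empty dirs list; walk_limited is a hypothetical name.

```python
import os
import re


def walk_limited(top, max_depth=None, pattern=None):
    """Yield (root, dirs, files) like os.walk(), descending at most
    max_depth levels below top and keeping only files matching pattern.

    max_depth=0 lists top only; None means no limit. Directories at the
    depth limit are reported with an empty dirs list because it was pruned.
    """
    regex = re.compile(pattern) if isinstance(pattern, str) else pattern
    for root, dirs, files in os.walk(top, topdown=True, followlinks=False):
        rel = os.path.relpath(root, top)
        depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
        if max_depth is not None and depth >= max_depth:
            # Emptying dirs in place (topdown=True) stops further descent.
            dirs[:] = []
        if regex is not None:
            files = [name for name in files if regex.match(name)]
        yield root, dirs, files
```

Computing the depth from the relative path, rather than passing a counter down a recursion, avoids the sibling bookkeeping entirely.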

Question: commit: e21a4f781a881168041589cd2887903a6baffaf2 (line 118)

I'm not very familiar with Python C extensions - does FORMAT_EXCEPTION(PyExc_ValueError, "%s too long for Windows"); print out the actual string that caused the exception in the traceback? If so, it might be a bit of a formatting issue, as a 32767-length str is certainly larger than the (single-screen) buffer in a windows command prompt.

Try for _ in range(32767): print('a', end='') for example.

(Keep up the great work!)

misleading error message

Setup to reproduce: Windows, Python 2.7, and the current version of scandir.
Create a few directories, one with German umlauts, e.g. Aufträge.

mkdir c:\devel\playground\auftraege
mkdir c:\devel\playground\aufträge

Now let scandir walk the directories:

C:\>python -c "import scandir;list(scandir.walk('c:/devel/playground'))"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Python27\lib\site-packages\scandir.py", line 654, in walk
    for entry in walk(new_path, topdown, onerror, followlinks):
  File "C:\Python27\lib\site-packages\scandir.py", line 594, in walk
    scandir_it = scandir(top)
TypeError: os.scandir() doesn't support bytes path on Windows, use Unicode instead

Searching scandir.py for the error message shows that it's raised at:

  • line 147, inside the check for Python 3 on Win32:
if IS_PY3 and sys.platform == 'win32':
    def scandir_generic(path=u'.'):
        if isinstance(path, bytes):
            raise TypeError("os.scandir() doesn't support bytes path on Windows, use Unicode instead")
        return _scandir_generic(path)
    scandir_generic.__doc__ = _scandir_generic.__doc__
else:
    scandir_generic = _scandir_generic
  • line 381, inside the IS_PY3 check within the Windows-only (WIN32) branch:
        if IS_PY3:
            def scandir_python(path=u'.'):
                if isinstance(path, bytes):
                    raise TypeError("os.scandir() doesn't support bytes path on Windows, use Unicode instead")
                return _scandir_python(path)
            scandir_python.__doc__ = _scandir_python.__doc__
        else:
            scandir_python = _scandir_python
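
On Python 2, a literal such as 'c:/devel/playground' is a byte string, and the traceback shows the Windows code path rejecting a bytes path. A likely workaround, sketched under that assumption, is to hand walk() a text (unicode) path so that every name it produces is text as well. to_text_path is a hypothetical helper, not part of scandir.

```python
import sys


def to_text_path(path, encoding=None):
    """Return path as text, decoding a bytes path if necessary.

    encoding defaults to the filesystem encoding reported by Python.
    """
    if isinstance(path, bytes):
        return path.decode(encoding or sys.getfilesystemencoding())
    return path
```

For example, scandir.walk(to_text_path(b'c:/devel/playground')); on Python 2 a u'c:/devel/playground' literal achieves the same thing directly.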

Unexpected Error: undefined reference to 'GetFinalPathNameByHandleW'

MinGW gcc won't compile scandir on my Windows 7 64-bit machine with Python 2.7.3; it fails with "undefined reference to 'GetFinalPathNameByHandleW'". Any idea how to fix this?

Installing collected packages: scandir
  Running setup.py install for scandir ... error
    Complete output from command "d:\program files (x86)\python2.7.3\python.exe" -u -c "import setuptools, tokenize;__file__='c:\\users\\shane\\appdata\\local\\temp\\pip-build-5n6ehf\\scandir\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record c:\users\shane\appdata\local\temp\pip-klqudr-record\install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_py
    creating build
    creating build\lib.win32-2.7
    copying scandir.py -> build\lib.win32-2.7
    running build_ext
    building '_scandir' extension
    creating build\temp.win32-2.7
    creating build\temp.win32-2.7\Release
    D:\MinGW\mingw32\bin\gcc.exe -mdll -O -Wall "-Id:\program files (x86)\python2.7.3\include" "-Id:\program files (x86)\python2.7.3\PC" -c _scandir.c -o build\temp.win32-2.7\Release\_scandir.o
    In file included from _scandir.c:21:0:
    winreparse.h:45:0: warning: "MAXIMUM_REPARSE_DATA_BUFFER_SIZE" redefined
     #define MAXIMUM_REPARSE_DATA_BUFFER_SIZE  ( 16 * 1024 )
     ^
    In file included from D:/MinGW/mingw32/i686-w64-mingw32/include/minwindef.h:163:0,
                     from D:/MinGW/mingw32/i686-w64-mingw32/include/windef.h:8,
                     from D:/MinGW/mingw32/i686-w64-mingw32/include/windows.h:69,
                     from _scandir.c:20:
    D:/MinGW/mingw32/i686-w64-mingw32/include/winnt.h:4610:0: note: this is the location of the previous definition
     #define MAXIMUM_REPARSE_DATA_BUFFER_SIZE (16 *1024)
     ^
    _scandir.c: In function 'get_target_path':
    _scandir.c:194:16: warning: implicit declaration of function 'GetFinalPathNameByHandleW' [-Wimplicit-function-declaration]
         buf_size = GetFinalPathNameByHandleW(hdl, 0, 0,
                    ^
    _scandir.c: In function 'path_converter':
    _scandir.c:755:23: warning: too many arguments for format [-Wformat-extra-args]
         PyErr_Format(exc, "%s%s" fmt, \
                           ^
    _scandir.c:799:13: note: in expansion of macro 'FORMAT_EXCEPTION'
                 FORMAT_EXCEPTION(PyExc_ValueError, "embedded null character");
                 ^
    writing build\temp.win32-2.7\Release\_scandir.def
    D:\MinGW\mingw32\bin\gcc.exe -shared -s build\temp.win32-2.7\Release\_scandir.o build\temp.win32-2.7\Release\_scandir.def "-Ld:\program files (x86)\python2.7.3\libs" "-Ld:\program files (x86)\python2.7.3\PCbuild" -lpython27 -o build\lib.win32-2.7\_scandir.pyd
    build\temp.win32-2.7\Release\_scandir.o:_scandir.c:(.text+0xff0): undefined reference to `GetFinalPathNameByHandleW'
    build\temp.win32-2.7\Release\_scandir.o:_scandir.c:(.text+0x1042): undefined reference to `GetFinalPathNameByHandleW'
    collect2.exe: error: ld returned 1 exit status
    error: command 'gcc' failed with exit status 1

Crash under PyPy 2.3

Under PyPy 2.3, I'm seeing a crash whenever I attempt to use scandir:

RPython traceback:
  File "pypy_module_cpyext_object.c", line 1485, in _PyObject_New
  File "pypy_module_cpyext_object.c", line 5703, in _PyObject_NewVar
  File "pypy_module_cpyext_pyobject.c", line 112, in from_ref
Fatal RPython error: AssertionError
Aborted (core dumped)

This is with:

Python 2.7.6 (394146e9bb673514c61f0150ab2013ccf78e8de7, May 09 2014, 08:05:14)
[PyPy 2.3.0 with GCC 4.8.2]

And the trivial example:

import scandir

for cur in scandir.scandir('/'):
        print cur.name, cur.is_symlink()

Possible memory leak?

I was just testing scandir.walk on a big directory structure. The structure is a top-level directory with about 16K subdirectories directly underneath it, each of which just holds files and no further subdirectories. There are a total of about 150K files.

The speed improvement over os.walk is great - 3 seconds rather than 11 seconds on a local disk (I've not tried it on the NAS drive yet :-) This is just for a simple test:

start = time.perf_counter()
l = list(scandir.walk('C:\\Test\\TestCollection'))
end = time.perf_counter()
print(end-start)

However, I noticed when looking at the memory usage, that the working set of python.exe increases by about 100M each time I run the test. In contrast, a version using os.walk uses a constant 50M no matter how many times I run the test.

The higher memory usage is fine, easily explained by the fact that we're using DirEntry objects rather than simple strings. But the memory growth is worrying, as it implies something isn't being garbage collected. I tried gc.collect(), but that made no difference.

This is on Windows 7, 64-bit, using Python 3.4 and the latest version of scandir from PyPI built myself via pip install scandir.
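
One way to check whether memory really keeps growing across runs, rather than reading the working set, is the standard tracemalloc module (Python 3.4+). A sketch; measure_walk_growth is a hypothetical helper. Note that tracemalloc only sees allocations made through Python's allocators, so memory a C extension obtains with plain malloc() will not show up; in that case the process working set remains the better signal.

```python
import gc
import os
import tracemalloc


def measure_walk_growth(walk, top, runs=5):
    """Run list(walk(top)) several times and return the traced memory
    (in bytes) still allocated after each run."""
    tracemalloc.start()
    try:
        sizes = []
        for _ in range(runs):
            result = list(walk(top))
            del result      # drop the only reference to the walk output
            gc.collect()    # collect any reference cycles before measuring
            current, _peak = tracemalloc.get_traced_memory()
            sizes.append(current)
        return sizes
    finally:
        tracemalloc.stop()
```

A list that levels off after the first run suggests no leak at the Python level; one that climbs by a similar amount each run points at objects that are never freed.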

Installation on Windows

I've made scandir one of the essential libraries for my package aTXT (https://pypi.python.org/pypi/aTXT), available on PyPI. The issue is how this dependency affects cross-platform installation for other developers. In a Unix environment like Ubuntu, or even OS X, pip install aTXT (which in turn installs scandir) works well. But if I try to install on Windows with average installation settings, it fails with errors about missing libraries (vcbat) or others. Of course, after completing the Windows requirements, scandir works, and then aTXT does too. I have users on all of these operating systems. What is the best approach to dealing with scandir on Windows: something like a compiled version? Anything you'd recommend? Thanks in advance.

doesn't handle `dirent.d_type == DT_UNKNOWN` correctly

The Unixy implementation of scandir doesn't correctly handle dirent.d_type being DT_UNKNOWN, which happens when the particular filesystem being walked does not store file type information in its directories. (For instance, this is likely to happen on VFAT file systems, which are still commonly used for USB sticks and memory cards, and on CD-ROMs.)

The readdir(3) manpage reads in part:

If the file type could not be determined, the value DT_UNKNOWN is returned in d_type.

Currently, only some file systems (among them: Btrfs, ext2, ext3, and ext4) have full support for returning the file type in d_type. All applications must properly handle a return of DT_UNKNOWN.

What you need to do is check for d_type == DT_UNKNOWN for each and every directory entry, and fall back to lstat.

I don't know, but I would not be at all surprised if the Windows APIs you're using have a similar requirement.
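
The fallback the report asks for can be sketched in Python. This illustrates the logic only and is not scandir's code: the DT_* values below are assumptions taken from Linux/glibc <dirent.h> (Python's os module does not export them), and entry_is_dir is a hypothetical name.

```python
import os
import stat

# Assumed Linux/glibc values from <dirent.h>; not provided by Python itself.
DT_UNKNOWN = 0
DT_DIR = 4
DT_REG = 8


def entry_is_dir(d_type, path):
    """Decide whether a directory entry is a directory (symlinks not followed).

    Trust d_type when the filesystem filled it in; when it reports
    DT_UNKNOWN (e.g. on some VFAT or CD-ROM mounts), fall back to lstat().
    """
    if d_type != DT_UNKNOWN:
        return d_type == DT_DIR
    return stat.S_ISDIR(os.lstat(path).st_mode)
```

The extra lstat() only happens for entries whose type is unknown, so filesystems that do fill in d_type keep the full speedup.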

test_walk fails on Linux

======================================================================
FAIL: test_traversal (test_walk.WalkTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Work\scandir\tests\test_walk.py", line 66, in test_traversal
    self.assertEqual(all[3 - 2 * flipped], sub2_tree)
AssertionError: Tuples differ: ('C:\\Work\\scandir\\tests\\te... != ('C:\\Work\\scandir\\tests\\te...

First differing element 1:
[]
['link']

- ('C:\\Work\\scandir\\tests\\temp\\TEST1\\SUB2', [], ['link', 'tmp3'])
?                                                 ----

+ ('C:\\Work\\scandir\\tests\\temp\\TEST1\\SUB2', ['link'], ['tmp3'])
?                                                        +  +
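
The diff shows the two sides disagree about where a symlink to a directory belongs. For reference, os.walk() in Python 3.5+ classifies entries with DirEntry.is_dir(), which follows symlinks, so a link that points at a directory is listed in dirs even with followlinks=False (it is just not descended into). A small sketch for checking this on a given system; classify_top_level and demo are hypothetical names, and creating symlinks on Windows may require extra privileges.

```python
import os
import tempfile


def classify_top_level(top):
    """Return (sorted dirs, sorted files) for the first level os.walk() yields."""
    _root, dirs, files = next(os.walk(top))
    return sorted(dirs), sorted(files)


def demo():
    with tempfile.TemporaryDirectory() as top:
        sub = os.path.join(top, 'SUB')
        os.mkdir(sub)
        open(os.path.join(top, 'tmp3'), 'w').close()
        # target_is_directory matters on Windows and is ignored elsewhere.
        os.symlink(sub, os.path.join(top, 'link'), target_is_directory=True)
        return classify_top_level(top)


if __name__ == '__main__':
    print(demo())  # on Linux: (['SUB', 'link'], ['tmp3'])
```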

scandir breaks on unicode-named directories in Python 2

e.g.

list(scandir.scandir(u"\N{EURO SIGN}"))
>>> LookupError: unknown error handler name 'surrogateescape'

Root of the issue is here:

#if PY_MAJOR_VERSION >= 3
        if (!PyUnicode_FSConverter(unicode, &bytes))
            bytes = NULL;
#else
        bytes = PyUnicode_AsEncodedString(unicode, "iso-8859-1", "surrogateescape");
#endif

Python 2's codecs do not include a surrogateescape error handler by default.

Also, is there a good reason to use iso-8859-1 here?
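
Both points in this report can be checked under Python 3, where surrogateescape is built in: undecodable bytes survive a decode/encode round trip, while iso-8859-1 simply cannot represent characters such as the euro sign. The function names are illustrative; on POSIX, os.fsencode() and os.fsdecode() apply the same surrogateescape handling using the filesystem encoding.

```python
def roundtrip_bytes(raw, encoding='utf-8'):
    """Decode raw with surrogateescape, re-encode it, and return both results."""
    text = raw.decode(encoding, 'surrogateescape')
    return text, text.encode(encoding, 'surrogateescape')


def latin1_can_encode(text):
    """Return True if text is representable in iso-8859-1."""
    try:
        text.encode('iso-8859-1')
    except UnicodeEncodeError:
        return False
    return True
```

For instance, roundtrip_bytes(b'caf\xe9') maps the invalid UTF-8 byte to the lone surrogate '\udce9' and restores b'caf\xe9' exactly, whereas latin1_can_encode('\N{EURO SIGN}') is False, which is why an iso-8859-1 encode fails on the path in this report.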

python3: import _scandir raises ImportError

I tried to install for Python 3 and got the following during the build:

gcc -pthread -DDYNAMIC_ANNOTATIONS_ENABLED=1 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.3m -c _scandir.c -o build/temp.linux-x86_64-3.3/_scandir.o
In file included from /usr/include/python3.3m/Python.h:112:0,
                 from _scandir.c:7:
/usr/include/python3.3m/modsupport.h:29:1: warning: ‘PyArg_ParseTuple’ is an unrecognized format function type [-Wformat=]
 PyAPI_FUNC(int) PyArg_ParseTuple(PyObject *, const char *, ...)   Py_FORMAT_PARSETUPLE(PyArg_ParseTuple, 2, 3);
 ^
_scandir.c: In function ‘scandir_helper’:
_scandir.c:277:9: warning: implicit declaration of function ‘PyString_FromStringAndSize’ [-Wimplicit-function-declaration]
         v = PyString_FromStringAndSize(ep->d_name, NAMLEN(ep));
         ^
_scandir.c:277:11: warning: assignment makes pointer from integer without a cast [enabled by default]
         v = PyString_FromStringAndSize(ep->d_name, NAMLEN(ep));
           ^
gcc -pthread -shared -Wl,-O1,--sort-common,--as-needed,-z,relro -Wl,-O1,--sort-common,--as-needed,-z,relro -Wl,-O1,--sort-common,--as-needed,-z,relro -march=x86-64 -mtune=generic -O2 -pipe -fstack-protector --param=ssp-buffer-size=4 -D_FORTIFY_SOURCE=2 build/temp.linux-x86_64-3.3/_scandir.o -L/usr/lib -lpython3.3m -o build/lib.linux-x86_64-3.3/_scandir.cpython-33m.so

Although /usr/lib/python3.3/site-packages/_scandir.cpython-33m.so is created, when I try to import _scandir manually I get:

>>> import _scandir
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: /usr/lib/python3.3/site-packages/_scandir.cpython-33m.so: undefined symbol: PyString_FromStringAndSize

clarify python version compatibility

I couldn't find any docs on which python version this code is expected to work with. Looking at the code, it seems it might work with 2.6+ and 3.2+?

A quick note in the readme and/or some version-specific tags in setup.py would be helpful.

Thanks for the great project, looks really useful.

UnicodeDecodeError: 'ascii'

On my Mac OS X, when I call scandir(path) and the tree under path contains subdirectories with accented names, like ó á í, the scan raises an error for those names. I wrote the following:

for entry in scandir.scandir(top):
    if entry.is_dir():
         print entry.path

I get the following error:

Traceback (most recent call last):
  File "/Applications/PyCharm.app/helpers/pydev/pydevd.py", line 1733, in <module>
    debugger.run(setup['file'], None, None)
  File "/Applications/PyCharm.app/helpers/pydev/pydevd.py", line 1226, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/Users/usuario/PycharmProjects/p/revisarDir.py", line 65, in <module>
    for a, b, c in mywalk(dir, topdown = True, skipdirs = sindir):
  File "/Users/usuario/PycharmProjects/p/revisarDir.py", line 33, in mywalk
    print entry.path
  File "/usr/local/lib/python2.7/site-packages/scandir.py", line 445, in path
    self._path = join(self._directory, self.name)
  File "/usr/local/Cellar/python/2.7.8/Frameworks/Python.framework/Versions/2.7/lib/python2.7/posixpath.py", line 78, in join
    path +=  b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xcc in position 38: ordinal not in range(128)

But the following code works perfectly:

for entry in scandir.scandir(top):
    if entry.is_dir():
         print os.path.join(top, entry.name)

So my suggestion is to just use os.path.join in the definition of the path property:

    @property
    def path(self):
        if self._path is None:
            self._path = join(self._directory, self.name)
        return self._path

to

    @property
    def path(self):
        if self._path is None:
            self._path = os.path.join(self._directory, self.name)
        return self._path

Although, looking into it, I found that you actually did from os.path import join, so please review it. :)

Install results in 'bad reloc address 0x0 in section' error

When I try to install scandir via Pycharm or command line, I get the following mess. Any idea what I can do to fix? Thanks in advance.

System:

setup.py install output follows:

C:\Python34\scandir>setup.py install
running install
running build
running build_py
running build_ext
building '_scandir' extension
C:\MinGW\bin\gcc.exe -mdll -O -Wall -IC:\Python34\include -IC:\Python34\include -c _scandir.c -o build\temp.win-amd64-3.4\Release\_scandir.o
_scandir.c:212:1: warning: 'win32_error_unicode' defined but not used [-Wunused-function]
 win32_error_unicode(char* function, Py_UNICODE* filename)
 ^
writing build\temp.win-amd64-3.4\Release\_scandir.def
C:\MinGW\bin\gcc.exe -shared -s build\temp.win-amd64-3.4\Release\_scandir.o build\temp.win-amd64-3.4\Release\_scandir.def -LC:\Python34\libs -LC:\Python34\PCbuild\amd64 -lpython34 -lmsvcr100 -o build\lib.win-amd64-3.4\_scandir.pyd
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0x4a): undefined reference to `_imp__PyEval_SaveThread'
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0x60): undefined reference to `_imp__PyEval_RestoreThread'
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0x7c): undefined reference to `_imp__PyObject_Free'
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0xb5): undefined reference to `_imp___Py_NoneStruct'
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0x101): undefined reference to `_imp__PyExc_TypeError'
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0x10c): undefined reference to `_imp__PyErr_Format'
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0x12b): undefined reference to `_imp___Py_NoneStruct'
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0x148): undefined reference to `_imp__PyUnicode_FromObject'
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0x16a): undefined reference to `_imp__PyUnicode_AsUnicodeAndSize'
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0x1d6): undefined reference to `_imp__PyExc_ValueError'
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0x1e1): undefined reference to `_imp__PyErr_Format'
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0x22f): undefined reference to `_imp__PyErr_Clear'
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0x247): undefined reference to `_imp__PyBytes_FromObject'
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0x259): undefined reference to `_imp__PyErr_Clear'
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0x25f): undefined reference to `_imp__PyErr_Occurred'
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0x2ab): undefined reference to `_imp__PyExc_TypeError'
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0x2b6): undefined reference to `_imp__PyErr_Format'
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0x362): undefined reference to `_imp__PyExc_ValueError'
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0x36d): undefined reference to `_imp__PyErr_Format'
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0x3f7): undefined reference to `_imp__PyExc_ValueError'
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0x402): undefined reference to `_imp__PyErr_Format'
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0x477): undefined reference to `_imp__PyExc_DeprecationWarning'
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0x482): undefined reference to `_imp__PyErr_WarnEx'
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0x4f3): undefined reference to `_imp__PyArg_ParseTupleAndKeywords'
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0x51f): undefined reference to `_imp__PyErr_NoMemory'
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0x59b): undefined reference to `_imp__PyErr_NoMemory'
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0x653): undefined reference to `_imp___PyObject_New'
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0x736): undefined reference to `_imp__PyExc_TypeError'
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0x741): undefined reference to `_imp__PyErr_Format'
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0x779): undefined reference to `_imp__PyEval_SaveThread'
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0x7a8): undefined reference to `_imp__PyEval_RestoreThread'
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0x7ca): undefined reference to `_imp__PyErr_SetFromWindowsErr'
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0x7e1): undefined reference to `_imp__PyEval_SaveThread'
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0x810): undefined reference to `_imp__PyEval_RestoreThread'
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0x82c): undefined reference to `_imp__PyErr_SetFromWindowsErr'
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0x876): undefined reference to `_imp__PyExc_StopIteration'
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0x881): undefined reference to `_imp__PyErr_SetNone'
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0x898): undefined reference to `_imp__PyStructSequence_New'
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0x8fe): undefined reference to `_imp__PyLong_FromLong'
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0x952): undefined reference to `_imp__PyLong_FromUnsignedLongLong'
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0x96e): undefined reference to `_imp__PyFloat_FromDouble'
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0x9fc): undefined reference to `_imp__PyLong_FromUnsignedLong'
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0xa05): undefined reference to `_imp__PyErr_Occurred'
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0xa2b): undefined reference to `_imp__PyErr_SetFromWindowsErr'
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0xa55): undefined reference to `_imp__Py_BuildValue'
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0xa86): undefined reference to `_imp__PyModule_Create2'
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0xa99): undefined reference to `_imp__PyType_Ready'
build\temp.win-amd64-3.4\Release\_scandir.o:_scandir.c:(.text+0xabc): undefined reference to `_imp__PyStructSequence_InitType'
c:/mingw/bin/../lib/gcc/mingw32/4.8.1/../../../../mingw32/bin/ld.exe: build\temp.win-amd64-3.4\Release\_scandir.o: bad reloc address 0x0 in section `.data'
collect2.exe: error: ld returned 1 exit status
error: command 'C:\\MinGW\\bin\\gcc.exe' failed with exit status 1

The benchmark test doesn't work

C:\Users\username\Documents\Python Scripts\scandir>python benchmark.py
Creating tree at benchtree: depth=4, num_dirs=5, num_files=50
Using fast C version of scandir
Comparing against builtin version of os.walk()
Priming the system's cache...
Traceback (most recent call last):
  File "benchmark.py", line 255, in <module>
    benchmark(tree_dir, get_size=options.size)
  File "benchmark.py", line 160, in benchmark
    do_scandir_walk()
  File "benchmark.py", line 155, in do_scandir_walk
    for root, dirs, files in scandir.walk(path):
  File "C:\Users\username\Documents\Python Scripts\scandir\scandir.py", line 604, in walk
    for entry in scandir(top):
  File "C:\Users\username\Documents\Python Scripts\scandir\scandir.py", line 405, in scandir_c
    for name, stat in scandir_helper(path):
TypeError: iter() returned non-iterator of type 'tuple'

C:\Users\username\Documents\Python Scripts\scandir>

It's not really an issue with your code, but there's a UnicodeDecodeError

I was running this module against my entire Mac filesystem. It's blindingly fast!

However, it chokes on a file, which appears to be related to a bug in Python.

Here's the stack trace.

Traceback (most recent call last):
  File "runner.py", line 17, in <module>
    main()
  File "runner.py", line 13, in main
    finder.find()
  File "/Users/bkotch/sorted/lib/filefinder.py", line 33, in find
    for root, subfolders, files in scandir.walk(self.root):
  File "/Library/Python/2.7/site-packages/scandir.py", line 490, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "/Library/Python/2.7/site-packages/scandir.py", line 490, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "/Library/Python/2.7/site-packages/scandir.py", line 490, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "/Library/Python/2.7/site-packages/scandir.py", line 490, in walk
    for x in walk(new_path, topdown, onerror, followlinks):
  File "/Library/Python/2.7/site-packages/scandir.py", line 454, in walk
    for entry in scandir(top):
  File "/Library/Python/2.7/site-packages/scandir.py", line 431, in scandir
    for name, d_type in scandir_helper(unicode(path)):
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 62: ordinal not in range(128)

Here's the relevant code:
try:
    import _scandir

    scandir_helper = _scandir.scandir_helper

    def scandir(path='.'):
        for name, d_type in scandir_helper(unicode(path)):
            yield PosixDirEntry(path, name, d_type)

except ImportError:
    pass

Looks like the exception is bubbling up from here.
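For context: in Python 2, unicode(path) decodes a byte-string path with the default ASCII codec, so any non-ASCII byte (0xc2 here) raises. The usual Python 2 remedy is path.decode(sys.getfilesystemencoding()). The minimal sketch below shows the Python 3 equivalent, os.fsdecode(); the helper name to_text_path is hypothetical and not part of scandir.

```python
import os


def to_text_path(path):
    """Return path as str, decoding bytes with the filesystem encoding.

    Unlike an implicit ASCII decode, os.fsdecode() uses the filesystem
    encoding (with surrogateescape on POSIX), so bytes such as 0xc2 in
    a UTF-8 name do not raise UnicodeDecodeError.
    """
    if isinstance(path, bytes):
        return os.fsdecode(path)
    return path


raw = 'caf\u00e9'.encode('utf-8')  # b'caf\xc3\xa9'
try:
    raw.decode('ascii')  # what Python 2's unicode(path) effectively does
except UnicodeDecodeError as exc:
    print('ascii decode fails:', exc.reason)
print(to_text_path(raw))
```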

_scandir extension doesn't compile on Solaris 11

gcc -m64 -fno-strict-aliasing -g -O2 -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -fPIC -I/usr/local/include/python2.7 -c _scandir.c -o build/temp.solaris-2.11-i86pc.64bit-2.7/_scandir.o
_scandir.c: In function '_fi_next':
_scandir.c:484:60: error: 'struct dirent' has no member named 'd_type'
_scandir.c:486:60: error: 'struct dirent' has no member named 'd_type'
_scandir.c:488:1: warning: control reaches end of non-void function
error: command 'gcc' failed with exit status 1

See http://stackoverflow.com/questions/2197918/cross-platform-way-of-testing-whether-a-file-is-a-directory
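When struct dirent has no d_type member, the entry type has to come from a stat call instead, which is the portable approach discussed in the linked question. A minimal Python sketch of that fallback, assuming lstat() semantics (symlinks are not followed, matching what d_type reports); is_dir_no_follow is a hypothetical name. In the C extension, one option would be to compile the d_type branch only where the field exists and call lstat() otherwise.

```python
import os
import stat


def is_dir_no_follow(path):
    """Portable directory test for when d_type is unavailable.

    Costs one lstat() per entry; returns False if the entry disappeared
    between listing and the stat call.
    """
    try:
        st = os.lstat(path)
    except FileNotFoundError:
        return False
    return stat.S_ISDIR(st.st_mode)


for name in os.listdir('.'):
    kind = 'dir' if is_dir_no_follow(os.path.join('.', name)) else 'other'
    print(kind, name)
```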

winreparse.h missing from PyPI package

I attempted to install scandir from PyPI:

pip install scandir

But the build failed; it seemed to be missing winreparse.h:

[ijt_operation] M:\scratch\ijt_operation>pip install scandir
Collecting scandir
  Using cached scandir-1.0.tar.gz
Installing collected packages: scandir
  Running setup.py install for scandir
    Complete output from command C:\Miniconda\envs\ijt_operation\python.exe -c "import setuptools, tokenize;__file__='c:\\users\\pbranni\\appdata\\local\\temp\\pip-build-wlbgua\\scandir\\setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record c:\users\pbranni\appdata\local\temp\pip-20nrru-record\install-record.txt --single-version-externally-managed --compile:
    running install
    running build
    running build_py
    creating build
    creating build\lib.win32-2.7
    copying scandir.py -> build\lib.win32-2.7
    running build_ext
    building '_scandir' extension
    creating build\temp.win32-2.7
    creating build\temp.win32-2.7\Release
    c:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\BIN\cl.exe /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -IC:\Miniconda\envs\ijt_operation\include -IC:\Miniconda\envs\ijt_operation\PC /Tc_scandir.c /Fobuild\temp.win32-2.7\Release\_scandir.obj
    _scandir.c
    _scandir.c(21) : fatal error C1083: Cannot open include file: 'winreparse.h': No such file or directory
    error: command 'c:\\Program Files (x86)\\Microsoft Visual Studio 9.0\\VC\\BIN\\cl.exe' failed with exit status 2

    ----------------------------------------
    Command "C:\Miniconda\envs\ijt_operation\python.exe -c "import setuptools, tokenize;__file__='c:\\users\\pbranni\\appdata\\local\\temp\\pip-build-wlbgua\\scandir\\setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record c:\users\pbranni\appdata\local\temp\pip-20nrru-record\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in c:\users\pbranni\appdata\local\temp\pip-build-wlbgua\scandir

I found a copy of this file in the github repository, so I copied it to the include directory for the current Miniconda environment.

[ijt_operation] M:\scratch\ijt_operation>copy winreparse.h c:\Miniconda\envs\ijt_operation\include
        1 file(s) copied.

And the PyPI installation worked:

[ijt_operation] M:\scratch\ijt_operation>pip install scandir
Collecting scandir
  Using cached scandir-1.0.tar.gz
Installing collected packages: scandir
  Running setup.py install for scandir
Successfully installed scandir-1.0

I'm running Miniconda from Continuum, 2.7.9 on 64-bit Windows.

Enhancement suggestion - File / Directory count

Hey, Scandir looks real good. And I'm currently rewriting a directory caching system to use it.

The one issue I have is simply due to the generator nature of the system. Since I have been using listdir, I have been able to do something like:

listings = os.listdir("/examples")
file_count = len(listings)

etc.

It would be incredibly helpful if I had some way to easily get file and subdirectory counts from scandir, without having to iterate through each file.

One feature of the directory caching system is that when it detects a subfolder, it will grab the number of files, and subdirectories in that subdirectory. Currently I am rewriting the system to recursively generate that data, but if there was a quick method through scandir to get that information it would be helpful.
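scandir returns an iterator rather than a list, so there is no len(); counting means iterating once. What scandir saves is the per-entry stat() call, because the entry type usually comes from the directory listing itself. A sketch using the built-in os.scandir() (the with statement needs Python 3.6+); count_entries is a hypothetical helper:

```python
import os


def count_entries(path):
    """Return (file_count, dir_count) for the immediate children of path.

    Symlinks are not followed, so a symlink (to a file or a directory)
    is counted as neither.
    """
    files = dirs = 0
    with os.scandir(path) as it:
        for entry in it:
            if entry.is_dir(follow_symlinks=False):
                dirs += 1
            elif entry.is_file(follow_symlinks=False):
                files += 1
    return files, dirs


print(count_entries('.'))
```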

Cut a release/tag

Would make it easier to use this repo as a dependency in my project. Thanks!

Handle on scandir iterator on Windows

On Windows, scandir keeps a handle on the directory being scanned until the iterator is exhausted. This can cause various problems if you try to use filesystem calls such as chmod or remove on the directory while the handle is still open.

This should at least be documented. Alternatively, it might be useful to provide a way to end the scan prematurely and close the handle without having to exhaust the iterator.
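Since Python 3.6, the iterator returned by os.scandir() has a close() method and supports the with statement, which is one way to end a scan early and release the directory handle before calling chmod or remove on the directory. A minimal sketch; first_match is a hypothetical helper:

```python
import os


def first_match(path, suffix):
    """Return the first entry name ending with suffix, or None.

    The with block closes the iterator (and its directory handle) even
    when the loop is left early via return.
    """
    with os.scandir(path) as it:
        for entry in it:
            if entry.name.endswith(suffix):
                return entry.name
    return None


print(first_match('.', '.py'))
```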

has no attribute 'stat' on Ubuntu.

>>> a.name
'%2Fstore-e%2Fagentadmin%2F%2FpropertyImages%2F15274-7833364-302804021-d64768a64e3cf9bb9e7828579ebef676.jpg'
>>> a.stat()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'PosixDirEntry' object has no attribute 'stat'

Possible memory leak / ref count leaks in C extension

Create a simple script:

import scandir

while True:
    x = list(scandir.scandir('.'))

When you run it, memory usage continues to climb until the script is interrupted. I observed this on Windows and Linux. The pure Python version does not exhibit the same problem.

I have not yet discovered the cause but while investigating I did come across what I believe are ref counting mistakes (although I could be wrong as I'm not very experienced with the C API):

# Line 177
name_stat = PyTuple_Pack(2, v, find_data_to_statresult(&wFileData));

PyTuple_Pack increments the ref counts for any objects added, but it seems find_data_to_statresult returns a new reference (via PyStructSequence_New).

It could be replaced with:
name_stat = Py_BuildValue("ON", v, find_data_to_statresult(&wFileData));
("N" will steal a reference. I tested this on windows and the reference counts were not climbing as quickly but there still seems to be additional ref count problems.)

# Line 301
name_ino_type = PyTuple_Pack(3, v, FROM_LONG(ep->d_ino), FROM_LONG(ep->d_type));

There may be a similar issue in the Linux helper, since FROM_LONG also returns a new reference; Py_BuildValue("ONN", ...) would avoid it. (I did not test this one.)
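One way to confirm a leak like this from Python is tracemalloc, which also traces objects that C extensions allocate through the Python allocators (tuples, ints and so on). A rough sketch, assuming a small test directory; allocation_growth is a hypothetical helper, and scandir_func lets you pass scandir.scandir instead of the built-in. Usage: print(allocation_growth('some_dir', scandir_func=scandir.scandir)).

```python
import os
import tracemalloc


def allocation_growth(path, iterations=200, scandir_func=os.scandir):
    """Return net bytes of traced allocations left over after the loop.

    For leak-free code this stays small however large iterations is;
    growth that scales with iterations points at objects whose
    reference counts never drop to zero.
    """
    list(scandir_func(path))  # warm-up, so one-time caches are not counted
    tracemalloc.start()
    try:
        before, _peak = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            list(scandir_func(path))  # mirrors the reproducer loop
        after, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before
```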

UTF-8 filenames come back mangled from scandir/walk

Bytes of the real filename string

":".join("{:02x}".format(ord(c)) for c in u"C:\Temp\xx鳭僣yy.txt")
'43:3a:5c:54:65:6d:70:5c:78:78:9ced:50e3:79:79:2e:74:78:74'

Write something to file, and check size to confirm

with open(u"C:\Temp\xx鳭僣yy.txt","w") as f: f.write("abc")
op.getsize(u"C:\Temp\xx鳭僣yy.txt")
3L # success!

Let's look for filenames in the directory using scandir (it's the only file in the directory):

for f in scandir.scandir_c("C:\Temp"): ":".join("{:02x}".format(ord(c)) for c in f.name)
'78:78:3f:3f:79:79:2e:74:78:74'
for f in scandir.scandir_python("C:\Temp"): ":".join("{:02x}".format(ord(c)) for c in f.name)
'78:78:3f:3f:79:79:2e:74:78:74'
for f in scandir.scandir_generic("C:\Temp"): ":".join("{:02x}".format(ord(c)) for c in f.name)
'78:78:3f:3f:79:79:2e:74:78:74'

Note the "3f:3f" is "??", so the filename is being printed as 'xx??yy.txt'

Scandir seems unable to retrieve the UTF-8 encoded filename, even though I am able to write to this file and check its size using Python. The standard listdir/walk functions in the os module suffer from the same problem.

How can I get a directory listing with UTF8 filenames preserved?
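The "?" characters suggest the names went through a byte-string (ANSI code page) API: in Python 2 on Windows, passing a byte-string path such as "C:\Temp" returns byte-string names in the ANSI code page, while passing a unicode path (u"C:\Temp") returns unicode names with the characters intact. In Python 3 the same rule holds for os.scandir(): str in, str out; bytes in, bytes out (and since Python 3.6, bytes paths on Windows are UTF-8 rather than ANSI, per PEP 529). A minimal sketch; list_names is a hypothetical helper:

```python
import os


def list_names(path):
    """Return sorted entry names; the type of path decides str vs bytes."""
    with os.scandir(path) as it:
        return sorted(entry.name for entry in it)


print(list_names('.'))   # str path: names come back as str
print(list_names(b'.'))  # bytes path: names come back as bytes
```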

scandir on python 2.7.10 is causing a StopIteration on *one* directory only

Ben,

I am using this code successfully on a variety of folders (it's part of a web server). But for some reason, a few days ago, it started to pop a StopIteration error on (so far) only one directory. Please note, this directory has worked fine in the past with scandir.

The directory has currently 2804 items (files + subdirectories).

path, dirs, files = next(scandir.walk(scan_directory))

I just tried isolating it to a particular set of files or directories by moving them to a subdirectory and then slowly moving them back, and the issue has stopped.

This issue has occurred consistently for 3 days in this one directory, so I'm not sure why it's being bashful now.

I'm going to file the issue, in case it returns, and to let you know what was happening.
I'll close it, but if for some reason I can't, please feel free to close the issue.
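One possible cause (the report doesn't show the underlying error): next() on a walk generator raises StopIteration whenever the walk yields nothing, and os.walk() silently skips a top directory that can't be listed unless onerror is given; scandir.walk() takes the same onerror argument. A sketch that surfaces the real OSError instead; top_level and _reraise are hypothetical names:

```python
import os


def _reraise(error):
    # os.walk() passes the OSError from listing a directory to onerror.
    raise error


def top_level(path):
    """Return (dirpath, dirnames, filenames) for path itself.

    Raises the underlying OSError (missing directory, permission denied,
    and so on) instead of the bare StopIteration that
    next(os.walk(path)) produces when path cannot be listed.
    """
    return next(os.walk(path, onerror=_reraise))
```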

Interaction between mpi4py and scandir

I am encountering an issue when using scandir in an MPI environment and am not sure of the cause. The reproducer is the following test.py, under Python 2.7.9 on a RHEL6 box. Note: I can't reproduce this on Mac OS.

from mpi4py import MPI
#from scandir import scandir

def main():
    print "hello world"

if __name__ == "__main__": main()

Without importing scandir(), everything is fine:

mpirun -np 2 python test.py

The output is simply two lines of "hello world"

With the scandir import uncommented, the following MPI warning appears. It's not a critical error, but it's enough to alarm users.

--------------------------------------------------------------------------
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process.  Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption.  The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.

The MPI library and utils (mpirun) are from the OpenMPI 1.8.4 release. A casual look at the scandir implementation doesn't suggest any such fork; is this something to be concerned about?

Thanks

Pull: \\?\ prefix for long file names on Windows

On Windows, folders containing long file names (more than 255 characters) cannot be browsed. This can be bypassed by adding the \\?\ prefix to the path.
It would be interesting to implement this, no? :)

See: http://stackoverflow.com/questions/1880321/why-does-the-260-character-path-length-limit-exist-in-windows

error example:

for obj in scandir.scandir(bpath):
OSError: [Errno 3] The system cannot find the path specified: u'c:\\users\\****\\desktop\\data\\l\xe9b\xe04\\igore\\test\\BAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA.pdfffffffff'
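Windows file APIs accept paths longer than MAX_PATH (260 characters) when the path is absolute and carries the \\?\ prefix, or \\?\UNC\ for network shares. Because the prefix also turns off Windows path normalization (forward slashes and .. are no longer resolved), the path should be normalized first. A minimal sketch; to_extended_length_path is a hypothetical helper, and ntpath is used so the logic can be exercised on any OS:

```python
import ntpath

EXTENDED_PREFIX = '\\\\?\\'     # i.e. \\?\ (four characters)
UNC_PREFIX = '\\\\?\\UNC\\'     # prefix used for \\server\share paths


def to_extended_length_path(path):
    """Return an absolute Windows path with the extended-length prefix.

    The path is normalized first (slashes, '.' and '..'), because Windows
    does not normalize paths that carry the prefix.
    """
    if path.startswith(EXTENDED_PREFIX):
        return path  # already prefixed; leave untouched
    if not ntpath.isabs(path):
        raise ValueError('expected an absolute path, got %r' % (path,))
    path = ntpath.normpath(path)
    if path.startswith('\\\\'):
        # UNC path: \\server\share\dir becomes \\?\UNC\server\share\dir
        return UNC_PREFIX + path[2:]
    return EXTENDED_PREFIX + path


print(to_extended_length_path('C:/Users/me/Desktop/data'))
```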

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 16: ordinal not in range(128)

The following exception is raised while processing entries returned by scandir.walk():

Traceback (most recent call last):
  File "./reproducer.py", line 39, in <module>
    o = path.replace(dirpath, 'blah')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xeb in position 16: ordinal not in range(128)

Scandir version: Current master branch (03d8b14)

# python -V
Python 2.7.5
# uname -a
Linux saio 3.18.7-100.fc20.x86_64 #1 SMP Wed Feb 11 19:01:50 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

Here is a sample reproducer:

#!/usr/bin/env python
import os
import random
import scandir


# From openstack swift functional tests.
def create_utf8_name(length=None):
    if length is None:
        length = 15
    else:
        length = int(length)

    utf8_chars = u'\uF10F\uD20D\uB30B\u9409\u8508\u5605\u3703\u1801'\
                 u'\u0900\uF110\uD20E\uB30C\u940A\u8509\u5606\u3704'\
                 u'\u1802\u0901\uF111\uD20F\uB30D\u940B\u850A\u5607'\
                 u'\u3705\u1803\u0902\uF112\uD210\uB30E\u940C\u850B'\
                 u'\u5608\u3706\u1804\u0903\u03A9\u2603'
    return ''.join([random.choice(utf8_chars)
                    for x in xrange(length)]).encode('utf-8')


# Create a directory with UTF8 name
dirname = create_utf8_name()
dirpath = os.path.join('/tmp/reproducer', dirname)
os.makedirs(dirpath)

# This passes
for (path, dirs, files) in os.walk(dirpath):
    print path
    o = path.replace(dirpath, 'blah')
    print o

# This fails
for (path, dirs, files) in scandir.walk(dirpath):
    print path
    o = path.replace(dirpath, 'blah')
    print o

Add inodes to DirEntry objects

I recently needed to speed up code that was finding hardlinks. The first implementation using os.walk and os.stat was too slow, so I wound up writing my own ctypes wrapper around readdir_r. Given that the scandir project is destined for the standard library, it would be fabulous if I could throw away my code and just use scandir. Unfortunately, scandir doesn't expose the inode number from readdir_r. Would it be possible to add the inode number to scandir's DirEntry objects, or at least to PosixDirEntry objects on UNIX-like systems?
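For reference, the built-in os.scandir() (Python 3.5+) exposes DirEntry.inode(); on POSIX it comes straight from readdir() without a stat call, while on Windows the first call needs one. A sketch of hard-link detection that groups by inode first and stats only the candidates, to separate equal inode numbers on different devices; find_hardlinks is a hypothetical name. Usage: for group in find_hardlinks('/some/tree'): print(group).

```python
import os
from collections import defaultdict


def find_hardlinks(top):
    """Return sorted groups of paths under top that are the same file."""
    by_inode = defaultdict(list)
    pending = [top]
    while pending:
        with os.scandir(pending.pop()) as it:
            for entry in it:
                if entry.is_dir(follow_symlinks=False):
                    pending.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    # On POSIX, inode() comes from readdir(): no stat() yet.
                    by_inode[entry.inode()].append(entry)
    groups = defaultdict(list)
    for entries in by_inode.values():
        if len(entries) < 2:
            continue
        # Stat only the candidates, so that equal inode numbers on
        # different devices (mount points) are not mistaken for links.
        for entry in entries:
            dev = entry.stat(follow_symlinks=False).st_dev
            groups[(dev, entry.inode())].append(entry.path)
    return sorted(sorted(paths) for paths in groups.values() if len(paths) > 1)
```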

Additional windows attributes

I'd like to be able to tell if a file is hidden on Windows, but unless I'm mistaken, this information is lost when the generic stat object is created (with the C extension, at least). Is there a way I'm overlooking? Thoughts?

Thanks.
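With the built-in os.scandir() on Windows (Python 3.5+), the stat result carries st_file_attributes, which includes the FILE_ATTRIBUTE_HIDDEN flag, and DirEntry.stat(follow_symlinks=False) there is served from the directory listing. A sketch; is_hidden is hypothetical, and the dot-file fallback on other platforms is an assumption rather than a Windows rule:

```python
import os
import stat


def is_hidden(entry):
    """Return True if a DirEntry looks hidden.

    Windows: test FILE_ATTRIBUTE_HIDDEN in st_file_attributes.
    Elsewhere that attribute does not exist, and this sketch falls back
    to the Unix dot-file convention (an assumption, not a Windows rule).
    """
    st = entry.stat(follow_symlinks=False)
    attrs = getattr(st, 'st_file_attributes', None)
    if attrs is not None:
        return bool(attrs & stat.FILE_ATTRIBUTE_HIDDEN)
    return entry.name.startswith('.')


with os.scandir('.') as it:
    print([e.name for e in it if is_hidden(e)])
```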

Maximum depth?

Ben

What about adding a maximum depth? It would be nice to be able to say only go two levels deep, when dealing with recursion.
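os.walk() can already be limited from the outside, because in top-down mode it only descends into the names left in dirnames. A sketch of a depth-limited wrapper; walk_max_depth is hypothetical, and scandir.walk should behave the same way since it mirrors os.walk. Usage: for dirpath, dirnames, filenames in walk_max_depth('/data', 2): print(dirpath).

```python
import os


def walk_max_depth(top, max_depth):
    """Like os.walk(top) (top-down), but never deeper than max_depth.

    top itself is depth 0, its subdirectories depth 1, and so on.
    """
    for dirpath, dirnames, filenames in os.walk(top):
        rel = os.path.relpath(dirpath, top)
        depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
        if depth >= max_depth:
            # Report the subdirectory names, then empty the list in place
            # so os.walk does not descend any further from here.
            yield dirpath, list(dirnames), filenames
            del dirnames[:]
        else:
            yield dirpath, dirnames, filenames
```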

Enhancement: Make the path a public attribute of DirEntry

I am finding that I often need to pass the path around with a DirEntry. It would be nice to be able to just pass around DirEntry objects as they already contain the path as a private attribute. Making it read-only seems acceptable, in which case the lstat caching wouldn't get any more complicated.

mtime resolution on Windows

It seems like something is going wrong on Py 3.4, Win10 x64: the resolution doesn't match os.stat(), and everything below whole seconds is truncated:

Py 3.4:

>>> a = list(scandir.scandir('c:\\'))  
>>> os.stat('c:\\bootmgr').st_mtime
1446189514.1278422
>>> a[1].stat().st_mtime
1446189514

Py 3.5 (built-in os.scandir):

>>> a = list(os.scandir('c:\\'))
>>> a[1].stat().st_mtime
1446189514.1278422
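A small reproducer for this kind of report compares each entry's cached stat against a fresh os.lstat() call. A sketch; mtime_mismatches is hypothetical, and scandir_func lets you pass scandir.scandir instead of the built-in:

```python
import os


def mtime_mismatches(path, scandir_func=os.scandir):
    """Return sorted names whose DirEntry mtime differs from os.lstat()."""
    bad = []
    for entry in scandir_func(path):
        cached = entry.stat(follow_symlinks=False).st_mtime
        fresh = os.lstat(os.path.join(path, entry.name)).st_mtime
        if cached != fresh:
            bad.append(entry.name)
    return sorted(bad)


print(mtime_mismatches('.'))
```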

Can't upgrade scandir C extension in place on windows (without uninstalling first)

If you already have scandir installed, the C extension will be imported when setup.py imports scandir to get the version string. This locks the .pyd file, so it can't be overwritten by the install/upgrade process.

You will see an error similar to: error: could not delete 'D:\Python27\Lib\site-packages\_scandir.pyd': Access is denied
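A common way around this is to read the version string from the source file instead of importing the module. A minimal sketch, assuming scandir.py assigns a module-level __version__ string literal; read_version is hypothetical and would be used in setup.py as version=read_version('scandir.py'). io.open is used so the same code runs under Python 2.7 and 3.

```python
import io
import re

_VERSION_RE = re.compile(r"""^__version__\s*=\s*['"]([^'"]+)['"]""", re.M)


def read_version(module_path):
    """Return the __version__ string from module_path without importing it.

    Importing the installed package from setup.py loads _scandir.pyd,
    which is what keeps the file locked during an upgrade on Windows.
    """
    with io.open(module_path, encoding='utf-8') as f:
        match = _VERSION_RE.search(f.read())
    if match is None:
        raise RuntimeError('no __version__ assignment in %s' % module_path)
    return match.group(1)
```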

scandir fails to compile with PyPy and PyPy3 virtualenvs

I've been testing my package with various combinations of additional optional packages, and in my PyPy and PyPy3 virtual environments, compilation always fails. This is on OS X with PyPy, PyPy3 and CPython 2.7.10 (with pip) installed through Homebrew and with virtualenv installed through pip, and all packages on the pip installation fully updated.

PyPy:

(pypy)Carloss-MacBook-Pro:~ aarzee$ pip install scandir
Collecting scandir
  Using cached scandir-1.1.tar.gz
Building wheels for collected packages: scandir
  Running setup.py bdist_wheel for scandir
  Complete output from command /Users/aarzee/airship/envs/pypy/bin/pypy -c "import setuptools;__file__='/private/var/folders/nj/r2hr0zg5119br99qrpwcgtp80000gn/T/pip-build-WoU6on/scandir/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" bdist_wheel -d /var/folders/nj/r2hr0zg5119br99qrpwcgtp80000gn/T/tmpUXh6s1pip-wheel-:
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.macosx-10.10-x86_64-2.7
  copying scandir.py -> build/lib.macosx-10.10-x86_64-2.7
  running build_ext
  building '_scandir' extension
  creating build/temp.macosx-10.10-x86_64-2.7
  cc -arch x86_64 -O2 -fPIC -Wimplicit -I/Users/aarzee/airship/envs/pypy/include -c _scandir.c -o build/temp.macosx-10.10-x86_64-2.7/_scandir.o
  _scandir.c:17:10: fatal error: 'osdefs.h' file not found
  #include <osdefs.h>
           ^
  1 error generated.
  error: command 'cc' failed with exit status 1

  ----------------------------------------
  Failed building wheel for scandir
Failed to build scandir
Installing collected packages: scandir
  Running setup.py install for scandir
    Complete output from command /Users/aarzee/airship/envs/pypy/bin/pypy -c "import setuptools, tokenize;__file__='/private/var/folders/nj/r2hr0zg5119br99qrpwcgtp80000gn/T/pip-build-WoU6on/scandir/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /var/folders/nj/r2hr0zg5119br99qrpwcgtp80000gn/T/pip-Yy7_Eu-record/install-record.txt --single-version-externally-managed --compile --install-headers /Users/aarzee/airship/envs/pypy/include/site/python2.7/scandir:
    running install
    running build
    running build_py
    running build_ext
    building '_scandir' extension
    cc -arch x86_64 -O2 -fPIC -Wimplicit -I/Users/aarzee/airship/envs/pypy/include -c _scandir.c -o build/temp.macosx-10.10-x86_64-2.7/_scandir.o
    _scandir.c:17:10: fatal error: 'osdefs.h' file not found
    #include <osdefs.h>
             ^
    1 error generated.
    error: command 'cc' failed with exit status 1

    ----------------------------------------
Command "/Users/aarzee/airship/envs/pypy/bin/pypy -c "import setuptools, tokenize;__file__='/private/var/folders/nj/r2hr0zg5119br99qrpwcgtp80000gn/T/pip-build-WoU6on/scandir/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /var/folders/nj/r2hr0zg5119br99qrpwcgtp80000gn/T/pip-Yy7_Eu-record/install-record.txt --single-version-externally-managed --compile --install-headers /Users/aarzee/airship/envs/pypy/include/site/python2.7/scandir" failed with error code 1 in /private/var/folders/nj/r2hr0zg5119br99qrpwcgtp80000gn/T/pip-build-WoU6on/scandir

PyPy3:

Collecting scandir
  Using cached scandir-1.1.tar.gz
Building wheels for collected packages: scandir
  Running setup.py bdist_wheel for scandir
  Complete output from command /Users/aarzee/airship/envs/pypy3/bin/pypy3 -c "import setuptools;__file__='/private/var/folders/nj/r2hr0zg5119br99qrpwcgtp80000gn/T/pip-build-2tdqnn/scandir/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" bdist_wheel -d /var/folders/nj/r2hr0zg5119br99qrpwcgtp80000gn/T/tmp_9g_ogpip-wheel-:
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.macosx-10.10-x86_64-3.2
  copying scandir.py -> build/lib.macosx-10.10-x86_64-3.2
  running build_ext
  building '_scandir' extension
  creating build/temp.macosx-10.10-x86_64-3.2
  cc -arch x86_64 -O2 -fPIC -Wimplicit -I/Users/aarzee/airship/envs/pypy3/include -c _scandir.c -o build/temp.macosx-10.10-x86_64-3.2/_scandir.o
  _scandir.c:17:10: fatal error: 'osdefs.h' file not found
  #include <osdefs.h>
           ^
  1 error generated.
  error: command 'cc' failed with exit status 1

  ----------------------------------------
  Failed building wheel for scandir
Failed to build scandir
Installing collected packages: scandir
  Running setup.py install for scandir
    Complete output from command /Users/aarzee/airship/envs/pypy3/bin/pypy3 -c "import setuptools, tokenize;__file__='/private/var/folders/nj/r2hr0zg5119br99qrpwcgtp80000gn/T/pip-build-2tdqnn/scandir/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /var/folders/nj/r2hr0zg5119br99qrpwcgtp80000gn/T/pip-dh64dj-record/install-record.txt --single-version-externally-managed --compile --install-headers /Users/aarzee/airship/envs/pypy3/include/site/python3.2/scandir:
    running install
    running build
    running build_py
    running build_ext
    building '_scandir' extension
    cc -arch x86_64 -O2 -fPIC -Wimplicit -I/Users/aarzee/airship/envs/pypy3/include -c _scandir.c -o build/temp.macosx-10.10-x86_64-3.2/_scandir.o
    _scandir.c:17:10: fatal error: 'osdefs.h' file not found
    #include <osdefs.h>
             ^
    1 error generated.
    error: command 'cc' failed with exit status 1

    ----------------------------------------
Command "/Users/aarzee/airship/envs/pypy3/bin/pypy3 -c "import setuptools, tokenize;__file__='/private/var/folders/nj/r2hr0zg5119br99qrpwcgtp80000gn/T/pip-build-2tdqnn/scandir/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /var/folders/nj/r2hr0zg5119br99qrpwcgtp80000gn/T/pip-dh64dj-record/install-record.txt --single-version-externally-managed --compile --install-headers /Users/aarzee/airship/envs/pypy3/include/site/python3.2/scandir" failed with error code 1 in /private/var/folders/nj/r2hr0zg5119br99qrpwcgtp80000gn/T/pip-build-2tdqnn/scandir

Thanks, FYI - DirectoryCaching

Ben,

I wanted to let you know about this, since I am using Scandir as a building block for this code. Here's a good example of scandir making a radical performance improvement over os.listdir.

https://github.com/bschollnick/Odds-and-Ends (Look for Directory_Caching.py)

I ended up doing a complete rewrite compared to my original os.listdir solution, and scandir dramatically sped up the code, mainly (I suspect) because of the reduced number of stat and lstat calls.

I welcome any feedback, comments, suggestions, if you choose to take a look. But as I mentioned, I just wanted to acknowledge your code's contribution to this solution.

Breaks on broken UTF-8 filenames

With a directory name like
'verio-domains/files/dsid.com.au/htcodes/wp-content/cache/supercache/\xa9'

when using scandir.walk, the entire process chokes and dies with a Unicode decode error on Linux when it hits line 536 in scandir.py:

def scandir_c(path=u'.'):
    is_bytes = isinstance(path, bytes)
    for name, d_type in scandir_helper(path):
        if not is_bytes:
            name = name.decode(file_system_encoding)

Since Linux enforces no particular restrictions on file names, this is a heck of a problem when you need to iterate over a large directory tree of possibly broken filenames. On Unix-like systems, scandir should not enforce any particular encoding or decoding: the built-in Python functions can handle byte strings (if passed byte strings as input), and scandir should do the same.
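For comparison, the built-in os.scandir() in Python 3 follows this model: a bytes path yields bytes names with no decoding, and a str path decodes with the filesystem encoding using surrogateescape on POSIX, so undecodable bytes such as 0xa9 survive a round trip through os.fsencode(). A sketch that keeps everything as bytes; walk_bytes is hypothetical and assumes a POSIX filesystem that accepts arbitrary bytes in names. For display, os.fsdecode(path) turns each result back into a str.

```python
import os


def walk_bytes(top):
    """Yield every path under top as bytes, never decoding a name."""
    pending = [os.fsencode(top)]  # bytes in means bytes out from scandir
    while pending:
        with os.scandir(pending.pop()) as it:
            for entry in it:
                yield entry.path
                if entry.is_dir(follow_symlinks=False):
                    pending.append(entry.path)
```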

scandir.stat_result repr 'unnamed field's on Windows

>>> import os, scandir
>>> scandir
<module 'scandir' from 'C:\Python27\lib\site-packages\scandir.pyc'>
>>> d = next(scandir.scandir('C:/Python27'))
>>> os.stat(d.path)
nt.stat_result(st_mode=33206, st_ino=0L, st_dev=0, st_nlink=0, st_uid=0, st_gid=0, st_size=2169L, st_atime=1411551593L, st_mtime=1411551594L, st_ctime=1411551593L)
>>> d.stat()
scandir.stat_result(st_mode=33206L, st_ino=0L, st_dev=0L, st_nlink=0L, st_uid=0L, st_gid=0L, st_size=2169L, unnamed field=1411551593L, unnamed field=1411551594L, unnamed field=1411551593L)

The fields (e.g. st_mtime) still are accessible under their name, so I guess this only concerns the repr.

scandir.stat() call

hi -

I have a question about this particular call. According to the readme:
"like os.stat(), but the return value is cached on the DirEntry object; does not require a system call on Windows (except for symlinks); don't follow symbolic links (like os.lstat()) if follow_symlinks is False"

Does this mean it avoids a system call only on Windows, and that it will incur one on other systems such as Linux?

If so, the readme also says the "return value is cached on the DirEntry object"; is that also Windows-only? My primary interest here is getting the file size: is it read from the cache?

TIA
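For the built-in os.scandir() (Python 3.5+), the documented behaviour is: the result of DirEntry.stat() is cached on the entry on every platform (with separate caches for follow_symlinks True and False); on Unix the first call always needs a system call, while on Windows it usually doesn't. A small demonstration that the size is read from the cache (the file name is arbitrary):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'data.bin')
    with open(path, 'wb') as f:
        f.write(b'abc')

    with os.scandir(tmp) as it:
        entry = next(it)              # the only entry: data.bin

    first = entry.stat().st_size      # POSIX: stat() system call, result cached
    with open(path, 'ab') as f:
        f.write(b'defg')              # the file grows to 7 bytes
    cached = entry.stat().st_size     # answered from the DirEntry cache
    fresh = os.stat(path).st_size     # new system call sees the new size

print(first, cached, fresh)
```

Call os.stat() (or create a fresh DirEntry) whenever up-to-date information is needed.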

Support for context managing and scandir.close()

Hi,

First, thank you for this library, super useful.

I saw that Python 3.6 introduced context-manager support and a close() method to release resources. Would it be possible to add them to the standalone scandir module as well? I use scandir heavily, so that would be sweet :)
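os.scandir() in Python 3.6+ already works in a with statement. For code that must also run on older versions or with the standalone module, a contextlib wrapper that calls close() only when the iterator has one is a possible shim; scandir_closing is a hypothetical name. Usage: with scandir_closing('.') as it: then iterate over it as usual.

```python
import contextlib

try:
    from os import scandir as _scandir  # built in since Python 3.5
except ImportError:
    from scandir import scandir as _scandir  # standalone module from PyPI


@contextlib.contextmanager
def scandir_closing(path='.'):
    """Context manager around scandir() that closes the iterator if it can."""
    it = _scandir(path)
    try:
        yield it
    finally:
        close = getattr(it, 'close', None)
        if close is not None:
            close()  # releases the directory handle early
```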

Release on PyPI

It would be much easier to depend on scandir in my own code if it were registered on PyPI. Even better would be if you published wheels as well. Is there any reason not to release it?

can't find osdefs.h on pypy

On the other hand, if I remove the #include <osdefs.h> it seems to compile fine.

On the other other hand, it should probably use cffi on pypy.
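Short of a cffi port, one way to keep PyPy installs working is to build the C extension only on CPython and let other interpreters use the pure-Python fallback, since the PyPy include directories in the logs above don't provide osdefs.h. A sketch of the check; build_c_extension is hypothetical, and in setup.py it would be used as ext_modules=[Extension('_scandir', ['_scandir.c'])] if build_c_extension() else [].

```python
import platform


def build_c_extension(implementation=None):
    """Return True if the _scandir C extension should be compiled.

    Only CPython is assumed to provide the internal headers the
    extension includes; other interpreters (PyPy, PyPy3) skip it.
    """
    if implementation is None:
        implementation = platform.python_implementation()
    return implementation == 'CPython'


print(build_c_extension())
```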
