splunk / splunk-sdk-python


Splunk Software Development Kit for Python

Home Page: http://dev.splunk.com

License: Apache License 2.0

Languages: Python 99.68%, Makefile 0.28%, Shell 0.03%

splunk-sdk-python's Introduction


The Splunk Enterprise Software Development Kit for Python

Version 2.0.2

The Splunk Enterprise Software Development Kit (SDK) for Python contains library code designed to enable developers to build applications using the Splunk platform.

The Splunk platform is a search engine and analytic environment that uses a distributed map-reduce architecture to efficiently index, search, and process large time-varying data sets.

The Splunk platform is popular with system administrators for aggregation and monitoring of IT machine data, security, compliance, and a wide variety of other scenarios that share a requirement to efficiently index, search, analyze, and generate real-time notifications from large volumes of time-series data.

The Splunk developer platform enables developers to take advantage of the same technology used by the Splunk platform to build exciting new applications.

Getting started with the Splunk SDK for Python

The Splunk Enterprise SDK for Python contains library code, and its examples are located in the splunk-app-examples repository. They show how to programmatically interact with the Splunk platform for a variety of scenarios including searching, saved searches, data inputs, and many more, along with building complete applications.

Requirements

Here's what you need to get going with the Splunk Enterprise SDK for Python.

Python 3.7 or Python 3.9

The Splunk Enterprise SDK for Python is compatible with Python 3 and has been tested with Python v3.7 and v3.9.

Splunk Enterprise 9.2 or 8.2

The Splunk Enterprise SDK for Python has been tested with Splunk Enterprise 9.2, 8.2, and 8.1.

If you haven't already installed Splunk Enterprise, download it from the Splunk website. For more information, see the Splunk Enterprise Installation Manual.

Splunk Enterprise SDK for Python

Get the Splunk Enterprise SDK for Python from PyPI. If you want to contribute to the SDK, clone the repository from GitHub.

Install the SDK

Use the following commands to install the Splunk Enterprise SDK for Python libraries. You don't need to install the libraries to run the SDK's unit tests.

Use pip:

[sudo] pip install splunk-sdk

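Install the Python egg (the --egg option has been removed from current versions of pip, so this command works only with older pip releases):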

[sudo] pip install --egg splunk-sdk

Install the sources you cloned from GitHub:

[sudo] python setup.py install

Testing Quickstart

You'll need Docker and Docker Compose to get up and running using this method.

make up SPLUNK_VERSION=9.2
make wait_up
make test
make down

To run the examples and unit tests, you must put the root of the SDK on your PYTHONPATH. For example, if you downloaded the SDK to your home folder and are running OS X or Linux, add the following line to your .bash_profile file:

export PYTHONPATH=~/splunk-sdk-python

The following are the different ways to connect to Splunk Enterprise:

Using username/password

import splunklib.client as client
# Replace the placeholder strings with your own host name and credentials
service = client.connect(host="<host>", username="<username>", password="<password>", autologin=True)

Using bearer token

import splunklib.client as client
# Replace the placeholder strings with your own host name and bearer token
service = client.connect(host="<host>", splunkToken="<bearer_token>", autologin=True)

Using session key

import splunklib.client as client
# Replace the placeholder strings with your own host name and session key
service = client.connect(host="<host>", token="<session_key>", autologin=True)

Update a .env file

To connect to Splunk Enterprise, many of the SDK examples and unit tests take command-line arguments that specify values for the host, port, and login credentials for Splunk Enterprise. For convenience during development, you can store these arguments as key-value pairs in a .env file. Then, the SDK examples and unit tests use the values from the .env file when you don't specify them.

Note: Storing login credentials in the .env file is only for convenience during development. This file isn't part of the Splunk platform and shouldn't be used for storing user credentials for production. And, if you're at all concerned about the security of your credentials, enter them at the command line rather than saving them in this file.

Here is an example of a .env file:

# Splunk Enterprise host (default: localhost)
host=localhost
# Splunk Enterprise admin port (default: 8089)
port=8089
# Splunk Enterprise username
username=admin
# Splunk Enterprise password
password=changed!
# Access scheme (default: https)
scheme=https
# Your version of Splunk Enterprise
version=9.2
# Bearer token for authentication
#splunkToken=<Bearer-token>
# Session key for authentication
#token=<Session-Key>

SDK examples

Examples for the Splunk Enterprise SDK for Python are located in the splunk-app-examples repository. For details, see the Examples using the Splunk Enterprise SDK for Python on the Splunk Developer Portal.

Run the unit tests

The Splunk Enterprise SDK for Python contains a collection of unit tests. To run them, open a command prompt in the /splunk-sdk-python directory and enter:

make

You can also run individual test files, which are located in /splunk-sdk-python/tests. To run a specific test, enter:

make test_specific

The test suite uses Python's standard library, the built-in unittest library, pytest, and tox.

Notes:

The test run fails unless the SDK App Collection app is installed.

To exclude app-specific tests, use the make test_no_app command.

To learn about our testing framework, see Splunk Test Suite on GitHub. In addition, the test run requires you to build the searchcommands app. The make command runs the tasks to do this, but more complex testing may require you to rebuild using the make build_app command.

Repository

Directory	Description
/docs	Source for Sphinx-based docs and build
/splunklib	Source for the Splunk library modules
/tests	Source for unit tests
/utils	Source for utilities shared by the unit tests

Customization

When working with custom search commands such as custom streaming commands or custom generating commands, you may need to add new fields to the records based on certain conditions.
Structural changes like this may not be preserved.
To make sure the new field is retained, use the add_field(record, fieldname, value) method from SearchCommand to add a new field and value to the record.
Note: Using the add_field method is optional if you aren't having any issues with field retention.

Do

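from splunklib.searchcommands import Configuration, StreamingCommand

@Configuration()
class CustomStreamingCommand(StreamingCommand):
    def stream(self, records):
        for index, record in enumerate(records):
            # Flag records at odd zero-based positions
            if index % 2 == 1:
                self.add_field(record, "odd_record", "true")
            yield record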

Don't

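from splunklib.searchcommands import Configuration, StreamingCommand

@Configuration()
class CustomStreamingCommand(StreamingCommand):
    def stream(self, records):
        for index, record in enumerate(records):
            # Assigning the key directly may not be retained in the output
            if index % 2 == 1:
                record["odd_record"] = "true"
            yield record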

Customization for Generating Custom Search Command

A generating custom search command generates events in SDK code.
Use the gen_record() method from SearchCommand to create each new record, passing the event data as comma-separated key=value arguments, as in the following example.

import time

from splunklib.searchcommands import Configuration, GeneratingCommand

@Configuration()
class GeneratorTest(GeneratingCommand):
    def generate(self):
        yield self.gen_record(_time=time.time(), one=1)
        yield self.gen_record(_time=time.time(), two=2)

Don't

import time

from splunklib.searchcommands import Configuration, GeneratingCommand

@Configuration()
class GeneratorTest(GeneratingCommand):
    def generate(self):
        yield {'_time': time.time(), 'one': 1}
        yield {'_time': time.time(), 'two': 2}

Access metadata of modular inputs app

In the stream_events() method, you can access the modular input app's metadata from the InputDefinition object.
For reference, see the Modular input App example on GitHub.

    def stream_events(self, inputs, ew):
        # other code
        
        # Access metadata (such as server_host and server_uri) of the modular input app from the InputDefinition object
        # Here, inputs is an InputDefinition object
        server_host = inputs.metadata["server_host"]
        server_uri = inputs.metadata["server_uri"]
        
        # Get the checkpoint directory out of the modular input's metadata
        checkpoint_dir = inputs.metadata["checkpoint_dir"]
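The checkpoint directory is typically used to remember progress between runs of the input. The following is a minimal standard-library sketch of that idea; the load_checkpoint() and save_checkpoint() helpers, the one-JSON-file-per-input layout, and the my_input://example name are all hypothetical, not part of splunklib. Inside stream_events(), you would pass the checkpoint_dir value read above as the directory.

```python
import json
import os
import tempfile


def checkpoint_path(checkpoint_dir, input_name):
    # Hypothetical layout: one JSON file per input, named after the input
    safe_name = "".join(c if c.isalnum() else "_" for c in input_name)
    return os.path.join(checkpoint_dir, safe_name + ".json")


def load_checkpoint(checkpoint_dir, input_name, default=None):
    """Return the saved state for input_name, or default if none exists."""
    try:
        with open(checkpoint_path(checkpoint_dir, input_name), encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return default


def save_checkpoint(checkpoint_dir, input_name, state):
    """Persist state for input_name as JSON."""
    path = checkpoint_path(checkpoint_dir, input_name)
    tmp_path = path + ".tmp"
    # Write to a temporary file, then swap it in, so a crash mid-write
    # doesn't leave a truncated checkpoint behind
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)


# Example: round-trip a checkpoint in a throwaway directory
demo_dir = tempfile.mkdtemp()
save_checkpoint(demo_dir, "my_input://example", {"last_time": 1700000000})
print(load_checkpoint(demo_dir, "my_input://example"))  # {'last_time': 1700000000}
```

Writing through a temporary file and os.replace() is one way to keep the checkpoint consistent if the script is stopped mid-write; a simpler direct write would also work for low-stakes state.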

Access service object in Custom Search Command & Modular Input apps

Custom Search Commands

The service object is created from the Splunkd URI and session key passed to the command invocation in the search results info file.
The service object can be accessed using self.service in the generate, transform, stream, or reduce method, depending on the type of custom search command.

For Generating Custom Search Command

    def generate(self):
        # other code

        # Access the service object, which can be used to connect to the Splunk service
        service = self.service
        # Get Splunk server info
        info = service.info

Modular Inputs app:

The service object is created from the Splunkd URI and session key that are passed to the modular input on its input stream.
It is available as soon as the Script.stream_events method is called.

    def stream_events(self, inputs, ew):
        # other code
        
        # Access the service object, which can be used to connect to the Splunk service
        service = self.service
        # Get Splunk server info
        info = service.info

Optional: Set up logging for splunklib

The default level is WARNING, which means that only events of this level and above are visible.
To change the logging level, call the setup_logging() method and pass the logging level as an argument.
Optionally, you can also pass log format and date format strings as arguments to modify the default format.

import logging
from splunklib import setup_logging

# To see debug and above level logs
setup_logging(logging.DEBUG)

Changelog

The CHANGELOG contains a description of changes for each version of the SDK. For the latest version, see the CHANGELOG.md on GitHub.

Branches

The master branch represents a stable and released version of the SDK. To learn about our branching model, see Branching Model on GitHub.

Documentation and resources

Resource	Description
Splunk Developer Portal	General developer documentation, tools, and examples
Integrate the Splunk platform using development tools for Python	Documentation for Python development
Splunk Enterprise SDK for Python Reference	SDK API reference documentation
REST API Reference Manual	Splunk REST API reference documentation
Splunk>Docs	General documentation for the Splunk platform
GitHub Wiki	Documentation for this SDK's repository on GitHub
Splunk Enterprise SDK for Python Examples	Examples for this SDK's repository

Community

Stay connected with other developers building on the Splunk platform.

Contributions

If you would like to contribute to the SDK, see Contributing to Splunk. For additional guidelines, see CONTRIBUTING.

Support

You will be granted support if you or your company are already covered under an existing maintenance/support agreement. Submit a new case in the Support Portal and include "Splunk Enterprise SDK for Python" in the subject line.

If you are not covered under an existing maintenance/support agreement, you can find help through the broader community at Splunk Answers.
Splunk will NOT provide support for SDKs if the core library (the code in the /splunklib directory) has been modified. If you modify an SDK and want support, you can find help through the broader community and Splunk Answers.

We would also like to know why you modified the core library, so please send feedback to [email protected].
File any issues on GitHub.

Contact Us

You can reach the Splunk Developer Platform team at [email protected].

License

The Splunk Enterprise Software Development Kit for Python is licensed under the Apache License 2.0. See LICENSE for details.

splunk-sdk-python's People

Contributors


apanda rmak svasan holdensmagicalunicorn brunkle naghi cpennington ragsns dave-shawley chriskelvinlee archankr bigjava joskid rsommer ww9rivers getsantanupathak mangoicestar chiehwen amako11 glennblock huit zroger rm-hull asifiqbal stelles jaykul justinlmeyer zach-taylor martindurant markshao stilaye splnkit chrmorais delfick fourkidsco bumyongchoi rhaarm assios outcoldman kkirsche premchandtheertham brainfold sullivanmatt lowtalker kalpsfeb28 mishin cjw296 beelit94 pathcl slaterbyte filmor bandarusridhar niyue u-kyou yjy56346 linearregression mummu pythonlearner7 rvnthvrm durbha joshwertheim bizdev1 nandlalyadav timcordova malmoore anlim darlingtld doksu tkelleyireland croblee crypto-perry wjo1212 c0ns0le heyang930520 jason790 prats84 zhengyuli vegitron matutter wangjiaji liketic clifg wythel hzhzhang buysse joescii hr2013 laristote 1dinamani rgmendes zhengneng sisodev highfestiva datasearchninja vgulch sharadmalmanchi gaurav22gupta seunomosowon steadfasterx tltx

splunk-sdk-python's Issues

Can't set eventsource from command line in examples/submit.py

$ echo hola | ./submit.py --host my.splunk --port 8089 --username=admin --password=pwd --eventsource=foo my.index
Traceback (most recent call last):
File "./submit.py", line 83, in <module>
main(sys.argv[1:])
File "./submit.py", line 59, in main
{'eventhost':'host'}, 'source', 'sourcetype')
File "./../utils/__init__.py", line 93, in dslice
if value.has_key(arg): result[arg]
KeyError: 'source'

https://github.com/ntteurope/splunk-sdk-python/commit/86a58fc14876c1c1a0d952d9c0ce39a21e62444c seems to fix the problem, but I'm not sure it's very correct...
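The traceback shows dslice raising KeyError: 'source' while building the submit arguments. The following is a minimal Python 3 sketch of a dslice-style helper that skips missing keys instead of raising; it's modeled on the call in the traceback (plain key names plus a dict that renames eventhost to host), not copied from utils/__init__.py, so treat its name and behavior as assumptions.

```python
def dslice(value, *args):
    """Return a dict with only the requested keys of `value` (sketch).

    Each arg is either a key name, copied as-is, or a dict mapping
    source keys to result keys. Keys missing from `value` are skipped
    rather than raising KeyError.
    """
    result = {}
    for arg in args:
        if isinstance(arg, dict):
            for src, dst in arg.items():
                if src in value:
                    result[dst] = value[src]
        elif arg in value:  # `in` replaces the Python 2-only has_key()
            result[arg] = value[arg]
    return result


opts = {"eventhost": "web01", "sourcetype": "syslog"}
print(dslice(opts, {"eventhost": "host"}, "source", "sourcetype"))
# {'host': 'web01', 'sourcetype': 'syslog'}
```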

examples/submit.py packs multiple lines in a single event

Hi,

We have a tool that generates CSVs to stdout and we are trying to pipe its output to examples/submit.py in order to create an event in Splunk for each line in the CSV. Playing around, we've noticed that:

#
# The following code uses the Splunk streaming receiver in order
# to reduce the buffering of event data read from stdin, which makes
# this tool a little friendlier for submitting large event streams,
# however if the buffering is not a concern, you can achieve the
# submit somewhat more directly using Splunk's 'simple' receiver,
# as follows:
#
# event = sys.stdin.read()
# service.indexes[index].submit(event, **kwargs_submit)
#

cn = service.indexes[index].attach(**kwargs_submit)
try:
    while True:
        line = sys.stdin.readline().rstrip('\r\n')
        if len(line) == 0: break
        cn.write(line)
finally:
    cn.close()

Using the version shown, all lines get bunched up into the same event. However, if we use the commented-out version, it works perfectly.

I'm not sure what the right thing to do is, but the latter behavior seems much more useful.
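One likely cause is that rstrip('\r\n') removes each line's terminator before cn.write(line), so the lines reach the streaming receiver as one continuous run of text with nothing to break events on. The following is a minimal sketch of a fix under two assumptions: that the object returned by attach() accepts bytes through sendall(), as a socket does, and that line breaking for the sourcetype splits events on newlines. stream_lines() is a hypothetical helper, and FakeSocket only stands in for the real connection in this demo.

```python
import io


def stream_lines(lines, sock):
    """Send each non-empty line as its own newline-terminated event."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines; iteration itself stops at end of input
        # Re-append the terminator that rstrip() removed so the receiver
        # can tell where one event ends and the next begins
        sock.sendall((line + "\n").encode("utf-8"))


class FakeSocket:
    """Collects sent bytes; stands in for the attached socket in this demo."""

    def __init__(self):
        self.sent = []

    def sendall(self, data):
        self.sent.append(data)


fake = FakeSocket()
stream_lines(io.StringIO("a,1\nb,2\n"), fake)
print(b"".join(fake.sent))  # b'a,1\nb,2\n'
```

In submit.py itself, the call would be along the lines of stream_lines(sys.stdin, cn), still wrapped in the existing try/finally that closes the connection.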

Update Twitter Examples to Use HTTPS

Twitter has changed its auth to require HTTPS. Change the connection method call in all Twitter examples to use HTTPS.

The Conf class should be renamed

The Conf class represents a stanza in a conf file; it should be renamed to Stanza.

Get requests for particular monitor inputs fail because paths are not converted before sending the request

It's probably easiest to describe the issue with an example.

First, create a new monitor input

s = splunklib.connect(...)
s.inputs.create('/home/myuser/log', kind='monitor')

I see a POST with name='%2Fhome%2Fmyuser%2Flog' in the message body, and the splunk instance correctly handles it. However, the subsequent GET (which I presume is used to update the state locally) uses the path I provided, /home/myuser/log. Because of this, or maybe something else, I immediately get this:

splunklib.binding.HTTPError: HTTP 404 Not Found -- In handler 'monitor': Invalid custom action for this internal handler (handler: monitor, custom action: myuser, eai action: list).

Any subsequent attempts to lookup the Input object for this monitor input, using:

s.inputs['/home/myuser/log']

s.inputs[('/home/myuser/log', 'monitor')]

...Fail with a KeyError. However, the following does work:

s.inputs['%2Fhome%2Fmyuser%2Flog']

This is pretty confusing. It seems like /home/myuser/log should work every time from a custom script that uses the SDK, and that the URL-encoded paths (with the %2F sequences) should remain hidden from the user.

Is this intended behavior?

Thanks,
-Jason
[email protected] (perf)
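The POST body in this report carries the path URL-encoded, with each / escaped as %2F. The following standard-library sketch shows that encoding with urllib.parse.quote(..., safe=""); encode_input_name() is a hypothetical helper, not an SDK function, and whether the SDK should apply it automatically when building GET paths is the question the issue raises.

```python
from urllib.parse import quote, unquote


def encode_input_name(path):
    """Encode a monitor path as a single URL path segment (hypothetical helper)."""
    # safe="" makes quote() escape "/" too, which it leaves alone by default
    return quote(path, safe="")


encoded = encode_input_name("/home/myuser/log")
print(encoded)  # %2Fhome%2Fmyuser%2Flog
```

With the default safe="/", quote() would leave the slashes in place, which matches the unencoded /home/myuser/log that the report says the follow-up GET request used.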

Jobs#list should return a list of Job objects

The Jobs#list method currently returns a list of SIDs, I think it should return a list of Job objects instead.

Currently I have to wrap the Jobs object and override the method to do this:

[Job(self._jobs.service, sid) for sid in self._jobs.list()]

Would there be any downside to creating a Job object instead?

Support Python 3

Currently splunklib only supports Python 2. This is quite annoying for those of us who have made the jump to Python 3, especially as the Splunk API is not easily handled with standard Python XML tools.

Unused variables

I've found a lot of unused variables in the SDK and they should be looked at to find out if they should be used.

This applies to these methods:

Conf#submit
Index#submit
Index#upload
Inputs#create
Inputs#delete

UrlEncoded repr recurses infinitely

https://github.com/splunk/splunk-sdk-python/blob/master/splunklib/binding.py#L157

replace with this:

return "UrlEncoded(%s)" % repr(urllib.unquote(str(self)))
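The suggested line is Python 2 code (urllib.unquote); in Python 3 the function lives in urllib.parse. The following is a minimal sketch of a str subclass whose __repr__ converts itself to a plain str before unquoting, so the final repr() call never reaches the subclass's own __repr__. The class is a simplified stand-in, not splunklib's UrlEncoded.

```python
from urllib.parse import unquote


class UrlEncoded(str):
    """Simplified stand-in for a URL-encoded string type."""

    def __repr__(self):
        # str(self) yields a plain str; unquote() may return its argument
        # unchanged, so passing self directly could recurse into this method
        return "UrlEncoded(%s)" % repr(unquote(str(self)))


print(repr(UrlEncoded("a%20b%2Fc")))  # UrlEncoded('a b/c')
```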

splunklib.modularinput | <done/> written even when done=False

I promised myself I wouldn't gripe about how long it took us to figure out that there was a bug in the SDK and not in our code or in Splunk's handling of our input ....

Bottom line: in event.py line 103 you are checking if "done is not None" but you default it to "True" so we obviously set it to "False," (not "None") -- you should just be evaluating it for true.
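The point here is that a boolean flag should be tested for truth, not compared against None. The following is a minimal sketch with xml.etree.ElementTree; event_to_xml() and its output are simplified and hypothetical, not splunklib's own event writer.

```python
import xml.etree.ElementTree as ET


def event_to_xml(data, done=True):
    """Build a simplified <event> element (hypothetical, not the SDK's writer)."""
    event = ET.Element("event")
    ET.SubElement(event, "data").text = data
    # `if done is not None:` is also true for done=False;
    # testing truthiness emits <done/> only when done is actually true
    if done:
        ET.SubElement(event, "done")
    return ET.tostring(event, encoding="unicode")


print(event_to_xml("part 1", done=False))
# <event><data>part 1</data></event>
```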

Cannot pass a pre-existing session token to Service/Context

The comment for Context specifies that you can pass in an optional session token (using the token kwarg) to the Context __init__ method. However, the method never looks it up and always initializes it to None, which is a bug.

When our Splunk machine is heavily in use, a REST call to get the results of a search job will fail unexpectedly

When our Splunk machine is heavily in use, a REST call to get the results of a search job will fail unexpectedly, as below:

job.refresh()
  File "splunk-sdk-python/splunklib/client.py", line 955, in refresh
    self._state = self.read()
  File "/splunk-sdk-python/splunklib/client.py", line 1014, in read
    results = self._load_state(response)
  File "splunk-sdk-python/splunklib/client.py", line 878, in _load_state
    entry = self._load_atom_entry(response)
  File "splunk-sdk-python/splunklib/client.py", line 2379, in _load_atom_entry
    return _load_atom(response).entry
AttributeError: 'NoneType' object has no attribute 'entry'

Job#setttl should be renamed

Setttl just looks wrong, should be set_ttl

Adding parameters to Users.create

I am working on Helmut's Users manager and I noticed something. To create a user you need to specify a password as well as roles. This can be done in the kwargs, but shouldn't these be positional arguments? I think that if the method breaks when arguments are not provided in kwargs, they should be.

Example response is:

Traceback (most recent call last):
File "./usermanager.htest.py", line 20, in <module>
splunk.users.create_user('Khorde', 'testar')
File "/Users/phashemi/gitdepot/splunk-helmut/helmut/manager/users/sdk/__init__.py", line 29, in create_user
self.connector.service.users.create(username, **kwargs))
File "/Users/phashemi/gitdepot/splunk-helmut/helmut/contrib/splunk-sdk-python/splunklib/client.py", line 1004, in create
return Collection.create(self, name.lower(), **kwargs)
File "/Users/phashemi/gitdepot/splunk-helmut/helmut/contrib/splunk-sdk-python/splunklib/client.py", line 450, in create
self.post(name=name, **kwargs)
File "/Users/phashemi/gitdepot/splunk-helmut/helmut/contrib/splunk-sdk-python/splunklib/client.py", line 309, in post
return self.service.post("%s%s" % (self.path, relpath), **kwargs)
File "/Users/phashemi/gitdepot/splunk-helmut/helmut/contrib/splunk-sdk-python/splunklib/binding.py", line 185, in post
return self.http.post(self.url(path), self._headers(), **kwargs)
File "/Users/phashemi/gitdepot/splunk-helmut/helmut/contrib/splunk-sdk-python/splunklib/binding.py", line 372, in post
return self.request(url, message)
File "/Users/phashemi/gitdepot/splunk-helmut/helmut/contrib/splunk-sdk-python/splunklib/binding.py", line 378, in request
raise HTTPError(response)
splunklib.binding.HTTPError: HTTP 400 Bad Request -- In handler 'users': Users must be given at least one role

Empty text node raises 'NoneType' object has no attribute 'encode'

In splunklib/results.py at line 257 values.append(elem.text.encode('utf8')) can raise an AttributeError for encode.

If the xml looks like:

<field k='named_field'>
    <value><text></text></value>
</field>

elem.text will be the None object, not an empty string as one might expect. I've used this instead:

t = elem.text
if not t:
    t = ''
values.append(t.encode('utf8'))
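The standard library's ElementTree parser does report None rather than '' for an element with no text, which is easy to confirm. elem.text or "" is a compact form of the workaround above; in Python 3 the values are already str, so this sketch drops the .encode('utf8') call.

```python
import xml.etree.ElementTree as ET

field = ET.fromstring(
    "<field k='named_field'>"
    "<value><text></text></value>"
    "</field>"
)

values = []
for elem in field.iter("text"):
    # An empty <text></text> element has .text == None, not ''
    values.append(elem.text or "")

print(values)  # ['']
```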

Job#touch has trailing comma in argument list

Doesn't cause a syntax error but is quite confusing. The relevant code:

def touch(self,):
    self.post("control", action="touch")
    return self

Endpoint in client for changing port(s) (server/settings)

It is possible to change the port of Splunk via the REST API, if you hit the http://docs.splunk.com/Documentation/Splunk/4.3.3/RESTAPI/RESTsystem#POST_server.2Fsettings.2F.7Bname.7D
endpoint. It would be really nice to be able to do this with the SDK, since currently the only way of doing it in my framework is by executing the binary.

Example:
~> curl -k -u admin:changeme https://localhost:8089/services/server/settings/settings -d httpport=990

"git clone https://github.com/splunk/splunk-sdk-python.git" does not check out code

Hi,

I get the following issue while trying to check out code

sudo git clone https://github.com/splunk/splunk-sdk-python.git
Cloning into splunk-sdk-python...
warning: remote HEAD refers to nonexistent ref, unable to checkout.

The directory is empty except for a .git folder with some sub-directories. Any thoughts on this? From sites on the internet, seems like server info needs to be updated on the repo side.

Waiting for your input.

Thanks

login routine does not follow HTTP redirects?

I am using the 1.1.0 release currently available in pip. I attempt to connect to my local splunk 6.0 install using the following:

service = client.connect(
            host='localhost',
            scheme='https',
            port=8443,
            app='myapp',
            username="myuser",
            password="********")

But this fails like so:

Traceback (most recent call last):
  File "./trtool.py", line 188, in <module>
    session_report.run_report(subargs[0])
  File "/home/brian/tableau-reporting/sessions.py", line 668, in run_report
    self.connect_splunk()
  File "/home/brian/tableau-reporting/sessions.py", line 665, in connect_splunk
    password="MappyTabl3au")
  File "/usr/local/lib/python2.7/dist-packages/splunklib/client.py", line 289, in connect
    return Service(**kwargs).login()
  File "/usr/local/lib/python2.7/dist-packages/splunklib/binding.py", line 753, in login
    session = XML(body).findtext("./sessionKey")
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1302, in XML
    return parser.close()
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1655, in close
    self._raiseerror(v)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1507, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: no element found: line 1, column 0

I added some debug printing and found that the library is getting a 303 redirecting to https://localhost/en-US/services/auth/login. Manually following this redirect reveals a 404, however. And notably, the Location header in the response does not have the expected port. Am I missing something dumb?

unused import

import sys is never used in data.py

1.1 Release Checklist

Splunk Python SDK Release Instructions

These are instructions on the various steps necessary to cut a new
release of the Splunk Python SDK. Even though the instructions are public,
these steps are only meant to be taken by the SDK maintainers.

Prerequisites

Read through all of these release instructions.
- Update if necessary. (For example, the version numbers will need updating.)
- For updates that could apply to other SDKs as well, update the release instructions page for every other SDK.
Update changelog.
Run test suite on full test matrix.
Install random_numbers.spl and github_forks.spl (found in build/ after running python setup.py dist on the repository) on Linux (32-bit and 64-bit), MacOS X (64-bit), and Windows (32-bit and 64-bit). Add a new data input for both kinds, and check that they generate events by running the search "*" with time range "Real time (all time)".
Run all examples.
Run all dev.splunk.com code samples.
Remove old temporary branches. This includes feature branches, old release branches, and most branches that have been merged to develop.
- (Exception: The "promises-old" branch in the JavaScript SDK should be retained for the time being.)

Release Steps

Announce!

Hurrah, the new release is basically done! You can now announce it on the
following channels:

Twitter (@splunkdev, maybe @splunk)
Google Groups (splunkdev)
Dev Portal (http://dev.splunk.com)
Dev Blog (http://blogs.splunk.com/dev)

problem while installing splunk-sdk-python

As mentioned in the document, I cloned the splunk-sdk-python setup from GitHub. When I try to install the SDK using the command "python setup.py install", it throws the error "could not find coverage module. Please install it and try again".

Please tell what could be the possible reason for error. Are there any path settings required?

Cannot create a conf file

From the REST API it's easy to enter values into a conf file that doesn't exist yet.

All you need is to POST name=stanza_name to configs/conf-<name> and the file is created for you.

This doesn't work at all in the SDK.
When you try to fetch a conf file that doesn't exist you get a KeyError and the Collection doesn't support create (which is understandable seeing as you cannot create an empty conf file)

Confs cannot handle the cases when each stanza is not unique

I tried doing this:

service.confs['web']['settings']['httpport']

But I get a

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/ngiertz/helmut/helmut/contrib/splunk-sdk-python-2011-09-15.zip/splunk-sdk-python/splunk/client.py", line 288, in __getitem__
  File "/Users/ngiertz/helmut/helmut/contrib/splunk-sdk-python-2011-09-15.zip/splunk-sdk-python/splunk/client.py", line 298, in read
  File "/Users/ngiertz/helmut/helmut/contrib/splunk-sdk-python-2011-09-15.zip/splunk-sdk-python/splunk/client.py", line 267, in _filter_content
AttributeError: 'list' object has no attribute 'has_key'

The same goes for just calling read on the stanza

Using a reserved name

list is a reserved keyword and should not be used for a method name.

Maybe use to_list? or items?

The affected methods are:

Collection#list
Inputs#list
Jobs#list

Documentation Error with binding.Context.connect()

The documentation for splunklib.binding.Context.connect() says it returns "A Socket", but it actually returns a splunklib.binding.Context object.

ModularInput Error handling needs to write the traceback

After one attempt and seeing the error handling you're providing I had to put in a wrapper of my own to manually write to your event writer:

    except Exception as e:
        import traceback
        writer.log("ERROR", "{}: {}: {}".format(type(e).__name__, e, traceback.format_exc()))
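The workaround above can be packaged as a reusable helper. The following is a minimal sketch; run_with_error_logging() is a hypothetical name, and the only thing it assumes about the writer is a log(severity, message) method like the writer.log("ERROR", ...) call above. FakeWriter stands in for a real event writer.

```python
import traceback


def run_with_error_logging(func, writer, *args, **kwargs):
    """Call func; on failure, log the full traceback to writer, then re-raise."""
    try:
        return func(*args, **kwargs)
    except Exception as e:
        writer.log("ERROR", "{}: {}: {}".format(type(e).__name__, e, traceback.format_exc()))
        raise


class FakeWriter:
    """Records log calls; stands in for a real event writer in this demo."""

    def __init__(self):
        self.messages = []

    def log(self, severity, message):
        self.messages.append((severity, message))


writer = FakeWriter()
try:
    run_with_error_logging(lambda: 1 / 0, writer)
except ZeroDivisionError:
    pass  # the error was logged, then re-raised to the caller
```

Re-raising after logging keeps the script's failure visible to Splunk instead of silently swallowing the exception.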

Enhancement: Context#prefix should be a property

Currently the prefix is an attribute in Context but this causes some problems.

In our framework you can change the ports Splunk listens on, which causes our framework to update the ports of the Service.

However, writing to port does not update the prefix variable so it won't be reflected when making requests.

I suggest the following code:

class Context:
    # ... existing attributes and methods ...

    @property
    def prefix(self):
        return "%s://%s:%s" % (self.scheme, self.host, self.port)
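To see why a property helps, here is a self-contained sketch with a simplified stand-in class (not splunklib's Context): because prefix is recomputed on each access, changing port is reflected immediately, with no separate attribute to keep in sync.

```python
class Context:
    """Simplified stand-in for a connection context."""

    def __init__(self, scheme="https", host="localhost", port=8089):
        self.scheme = scheme
        self.host = host
        self.port = port

    @property
    def prefix(self):
        # Recomputed on every access, so it always matches the current fields
        return "%s://%s:%s" % (self.scheme, self.host, self.port)


ctx = Context()
print(ctx.prefix)  # https://localhost:8089
ctx.port = 8090
print(ctx.prefix)  # https://localhost:8090
```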

Top level methods should not return generic objects

Service#confs, for example, returns a Collection object, which is really confusing; just having an empty class called Confs that inherits from Collection would be so much simpler.

Hostnames are ambiguously resolved on Dual Stack IPv4/6 Networks

Steps to reproduce:
Network like Splunk's, with support for both IPv4 and IPv6.
Two machines configured for both protocols.
One running Splunk.
One running Splunk Python SDK.

When a hostname is provided, the Splunk Python SDK will attempt to connect using IPv6 if the machine being reached has an AAAA record. Unfortunately, Splunkd does not come configured automatically to listen on IPv6. So the service attempts to connect and times out. We should have a default of using IPv4 to connect, with a suite of configurable options akin to Splunk's own "connectUsingIpVersion" directive.

http://docs.splunk.com/Documentation/Splunk/latest/Admin/ConfigureSplunkforIPv6#Configure_Splunk_to_listen_on_an_IPv6_network

http://docs.python.org/library/socket.html

This is a common issue with any utility using php, perl, python... pretty much any language that utilizes the getHostByHostname family of underlying routines. Sadly, this is not elegantly configurable as IPv6 support for python's socket is a compile time option. Best work around is to use socket directly to resolve the hostname at connect time (do not cache this data) and use the appropriate returned IP address(es) for the connect option desired.
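The workaround described above (resolve the name at connect time and choose the address family yourself) can be sketched with socket.getaddrinfo(), restricting results to AF_INET. resolve_ipv4() is a hypothetical helper, not an SDK option; a caller could then connect to one of the returned addresses.

```python
import socket


def resolve_ipv4(host, port):
    """Return the IPv4 addresses for host, resolved fresh on every call."""
    infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET, sockaddr is an (address, port) tuple
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


print(resolve_ipv4("127.0.0.1", 8089))  # ['127.0.0.1']
```

Nothing is cached here, in line with the advice above; a name with no A record makes getaddrinfo() raise socket.gaierror, which the caller would need to handle.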

Will Index.clean() perform its task even though it is timed out?

A different issue, but given the way clean is implemented with shrinking bucket sizes and rolling, can this not cause a situation where the operation times out, but the cleaning has actually been performed? I.e., OperationTimeout is raised, but clean() was actually successful? What would a timeout mean in this situation?

The reason I am asking is that I think this might be the situation in my test: clean() throws the OperationTimeout, but there are no events left when I manually check eventcount.

Secondly, a new side note: isn't the 60-second timeout that has now been put in place also a breaking change? It broke things a little in helmut, since cleaning sometimes takes longer than 60 seconds, and before, Index.clean() ran indefinitely. I'm also curious whether 60 seconds is a good default value. Our test log file only has 10000 events, and cleaning out the index holding the data sometimes times out.

Modular input event writer does not support non-unicode encoding

Current splunklib/modularinput/event.py does not support text encoding other than unicode.
Refer to the attachment for a possible solution.

[default] stanza not showing up in props.conf

Running into a problem that is really boggling me. I am listing the stanzas in props.conf. Basically the one under system/default. (from the nobody/system namespace)

The problem is that the [default]-stanza doesn't show up. I have tried adding new stanzas after and before the [default] stanza and they show up when you list the stanzas in the props.conf SDK Conf object.

I tried listing stuff that gets parsed in the splunklib/client.py and I noticed a default-mode title string popping up here and there when listing. The default-mode didn't get listed as a stanza though.

Job#setttl should return self?

Seems like most non-result methods return self (I'm assuming for chaining purposes), but setttl does not:

def setttl(self, value):
    self.post("control", action="setttl", ttl=value)

Removing attributes from Stanza?

Using the new conf stuff in Helmut, great stuff guys! Got one question though: what is the preferred way of removing attributes from stanzas?

Right now I simply do this:

        self.raw_sdk_stanza.update(**{key: ''})
        self.raw_sdk_stanza.refresh()

EventWriter should be compatible with logging

That is, it should behave more like Python's logging classes.

At a minimum, the log method of EventWriter should take an integer logging level rather than a string as its first argument.

There's no good reason for this to be incompatible. It would be almost that easy: you'd just have to use stdout and stderr (or set the log file to write to Splunk's log folder), and you'd be 90% done.

It would make life so much easier (particularly when porting existing code) if a modular input could simply point its logger at the writer that gets passed to it, switching from Python logging to Splunk logging.
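Until EventWriter itself accepts integer levels, a thin `logging.Handler` can bridge the two. A minimal sketch, assuming (as in splunklib's EventWriter) a writer with a `log(severity, message)` method that takes a string severity; `EventWriterHandler` and the level-name mapping are assumptions, not part of the SDK.

```python
import logging


class EventWriterHandler(logging.Handler):
    """Hypothetical bridge from Python logging to a writer with log(severity, message)."""

    # Python levels mapped to Splunk-style severity strings (assumed names).
    LEVELS = {
        logging.DEBUG: "DEBUG",
        logging.INFO: "INFO",
        logging.WARNING: "WARN",
        logging.ERROR: "ERROR",
        logging.CRITICAL: "FATAL",
    }

    def __init__(self, writer, level=logging.NOTSET):
        super().__init__(level)
        self.writer = writer

    def emit(self, record):
        try:
            # Fall back to Python's own level name for custom levels.
            severity = self.LEVELS.get(record.levelno, record.levelname)
            self.writer.log(severity, self.format(record))
        except Exception:
            self.handleError(record)
```

In a modular input, something like `logging.getLogger(__name__).addHandler(EventWriterHandler(ew))`, with `ew` being the writer passed in, would route existing logging calls to Splunk.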

</stream> written without <stream>

If there's an error early in the process (which I handle and report with .log()), I get an ERROR on stderr and then a closing </stream> tag on stdout, with no opening <stream> tag.

The writer should write the opening tag during init; this would have the beneficial side effect of making the whole thing more thread-safe.
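A minimal sketch of the suggested fix, using a hypothetical `StreamWriter` class rather than splunklib's actual EventWriter: the opening tag is written in `__init__`, so `close()` can never emit a lone `</stream>`, and a lock serializes writes from multiple threads.

```python
import io
import threading


class StreamWriter:
    """Hypothetical writer that emits <stream> eagerly so output is always well-formed."""

    def __init__(self, out):
        self._out = out
        self._lock = threading.Lock()
        self._closed = False
        self._out.write("<stream>")  # opening tag written up front

    def write_event(self, xml_fragment):
        with self._lock:
            self._out.write(xml_fragment)

    def close(self):
        with self._lock:
            if not self._closed:  # closing twice writes the tag only once
                self._out.write("</stream>")
                self._out.flush()
                self._closed = True


if __name__ == "__main__":
    buf = io.StringIO()
    writer = StreamWriter(buf)
    writer.close()  # even with no events, the output is well-formed
    print(buf.getvalue())  # <stream></stream>
```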

__delitem__()

Hi guys!

I'm curious what your thoughts are on supporting Python's del keyword. It could, for instance, be used with Inputs and Input instances. Have you looked into implementing something like __delitem__(...)?

Cheers

Pizza
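A minimal sketch of the idea, assuming a collection that already has a `delete(name)` method. `Collection` here is a hypothetical stand-in that deletes from a local dict instead of calling the REST API; `__delitem__` simply forwards to `delete`, so `del inputs["name"]` works.

```python
class Collection:
    """Hypothetical, simplified stand-in for an SDK entity collection."""

    def __init__(self, names):
        self._entities = dict.fromkeys(names)

    def delete(self, name):
        # A real collection would send a DELETE request to the REST endpoint.
        if name not in self._entities:
            raise KeyError(name)
        del self._entities[name]
        return self

    def __contains__(self, name):
        return name in self._entities

    def __delitem__(self, name):
        # Sugar: `del coll[name]` behaves like coll.delete(name).
        self.delete(name)
```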

Modular Input Script should accept inputs if validate_input is not set

Here, we should check whether validate_input is defined and, if it is not, return 0; otherwise, proceed as usual:
https://github.com/splunk/splunk-sdk-python/blob/master/splunklib/modularinput/script.py#L92
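An alternative to checking whether `validate_input` exists is to give the base class a no-op default, so validation passes unless a subclass overrides it and raises. A sketch on a hypothetical, pared-down `Script` class (not the SDK's actual implementation):

```python
import sys


class Script:
    """Hypothetical, pared-down modular input base class."""

    def stream_events(self, inputs):
        raise NotImplementedError("subclasses must stream events")

    def validate_input(self, definition):
        # Default: accept every input. Subclasses override this and
        # raise an exception to reject an input.
        return None

    def run_script(self, args, definition=None):
        if args[1:2] == ["--validate-arguments"]:
            try:
                self.validate_input(definition)
            except Exception as exc:
                sys.stderr.write("validation failed: %s\n" % exc)
                return 1
            return 0  # nothing raised, so the input is accepted
        self.stream_events(definition)
        return 0
```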

Inputs does not support adding oneshot

Hi guys,

oneshot is not listed in INPUT_KINDMAP. The preferred way seems to be uploading through an index? I can see why you would upload to an index, but since you can also add a oneshot input through REST, shouldn't that functionality exist in the SDK too?

Thanks!

XML parse error with "Connection: keep-alive"

The Python Splunk SDK currently breaks when the server uses "Connection: keep-alive".

The issue is in the request method of splunklib.binding.handler: it gets the response object, closes the connection, and only then tries to read the data.

When the server uses keep-alive, httplib returns an empty response body once connection.close() has been called.

We run NginX in front of Splunk, and it was switching connections to keep-alive.

I'm using Python 2.7 on Ubuntu 13.04 and Ubuntu 12.04.
The error I get is:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/aaa/.virtualenvs/splunk/local/lib/python2.7/site-packages/splunklib/client.py", line 288, in connect
    return Service(**kwargs).login()
  File "/home/aaa/.virtualenvs/splunk/local/lib/python2.7/site-packages/splunklib/binding.py", line 753, in login
    session = XML(body).findtext("./sessionKey")
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1302, in XML
    return parser.close()
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1655, in close
    self._raiseerror(v)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1507, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: no element found: line 1, column 0
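A sketch of the ordering fix: read the body while the connection is still open, then close it in a `finally` block. `fetch` is a hypothetical helper, not the SDK's handler; `connection` is assumed to behave like an `http.client.HTTPConnection` (Python 3's httplib).

```python
def fetch(connection, method, path, body=None, headers=None):
    """Send one request and return (status, headers dict, body bytes).

    The body is read before close(), so a keep-alive response is not lost;
    the connection is still closed if sending or reading fails.
    """
    try:
        connection.request(method, path, body, headers or {})
        response = connection.getresponse()
        data = response.read()  # read while the socket is still open
        return response.status, dict(response.getheaders()), data
    finally:
        connection.close()
```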

Test files do not handle authentication arguments

Tested on Windows 7 x64.

When running the test cases:

test_binding.py
test_client.py
test_data.py
test_examples.py

Only test_binding.py handles the --username and --password arguments. The other files do not recognize them.

Example:

python test_binding.py --username=admin --password=changeme

..........

Ran 10 tests in 2.247s

python test_examples.py --username=admin --password=changeme
option --username not recognized
Usage: test_examples.py [options] [test] [...]

Options:
...
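One way for each test file to accept the connection options without confusing unittest is `argparse`'s `parse_known_args`, which collects unknown options instead of failing. A sketch; `parse_auth_args` is a hypothetical helper and the defaults (localhost, 8089) are assumptions.

```python
import argparse


def parse_auth_args(argv):
    """Split Splunk connection options off argv; return (opts, argv for unittest)."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--host", default="localhost")
    parser.add_argument("--port", type=int, default=8089)
    parser.add_argument("--username")
    parser.add_argument("--password")
    # Options not defined above (for example -v) end up in `remaining`.
    opts, remaining = parser.parse_known_args(argv[1:])
    return opts, argv[:1] + remaining
```

In a test file, `opts, argv = parse_auth_args(sys.argv)` followed by `unittest.main(argv=argv)` would let unittest see only its own options.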

Unable to pass the hostname as a unicode string?

Hi,

We're using splunk-sdk, and our code gets the hostname as a unicode string and passes it in as in the example below. Is this a bug, or should my code handle it?

(env26_test)16:19 hsebastian@jemli ~> cat unicode.py 
import splunklib.client

service = splunklib.client.connect(host=u'localhost')
(env26_test)16:19 hsebastian@jemli ~> python unicode.py 
Traceback (most recent call last):
  File "unicode.py", line 3, in <module>
    service = splunklib.client.connect(host=u'localhost')
  File "/Users/hsebastian/storm_repo/env26_test/lib/python2.6/site-packages/splunklib/client.py", line 162, in connect
    return Service(**kwargs).login()
  File "/Users/hsebastian/storm_repo/env26_test/lib/python2.6/site-packages/splunklib/binding.py", line 206, in login
    password=self.password)
  File "/Users/hsebastian/storm_repo/env26_test/lib/python2.6/site-packages/splunklib/binding.py", line 372, in post
    return self.request(url, message)
  File "/Users/hsebastian/storm_repo/env26_test/lib/python2.6/site-packages/splunklib/binding.py", line 375, in request
    response = self.handler(url, message, **kwargs)
  File "/Users/hsebastian/storm_repo/env26_test/lib/python2.6/site-packages/splunklib/binding.py", line 430, in request
    connection.request(method, path, body, head)
  File "/Users/hsebastian/python267/lib/python2.6/httplib.py", line 914, in request
    self._send_request(method, url, body, headers)
  File "/Users/hsebastian/python267/lib/python2.6/httplib.py", line 951, in _send_request
    self.endheaders()
  File "/Users/hsebastian/python267/lib/python2.6/httplib.py", line 908, in endheaders
    self._send_output()
  File "/Users/hsebastian/python267/lib/python2.6/httplib.py", line 780, in _send_output
    self.send(msg)
  File "/Users/hsebastian/python267/lib/python2.6/httplib.py", line 739, in send
    self.connect()
  File "/Users/hsebastian/python267/lib/python2.6/httplib.py", line 1112, in connect
    sock = socket.create_connection((self.host, self.port), self.timeout)
  File "/Users/hsebastian/python267/lib/python2.6/socket.py", line 547, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
socket.error: Int or String expected

When adding a oneshot, an object should be returned

Currently when you add a oneshot nothing is returned.

Since the oneshot carries data about the indexing progress, an object should be returned instead.

This object should also have a wait method that waits for the oneshot to disappear from the REST API.
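A sketch of such a handle with a polling `wait`. `OneshotHandle` and `lookup` are hypothetical: `lookup(name)` stands in for a GET against the oneshot REST endpoint and is assumed to return the entry's current state (a dict) or `None` once the oneshot has disappeared.

```python
import time


class OneshotHandle:
    """Hypothetical object that adding a oneshot could return."""

    def __init__(self, name, lookup):
        self.name = name
        self._lookup = lookup  # assumed: returns a state dict, or None when gone

    def state(self):
        return self._lookup(self.name)

    def wait(self, timeout=60.0, poll=0.5):
        """Block until the oneshot vanishes from the API; return False on timeout."""
        deadline = time.monotonic() + timeout
        while self.state() is not None:
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll)
        return True
```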

client port should be passed to httplib as an integer, not a string

When I pass an unresolved hostname to client.connect, I get an exception:

<snip>
  File "/usr/local/lib/python2.7/dist-packages/splunklib/client.py", line 288, in connect
    return Service(**kwargs).login()
  File "/usr/local/lib/python2.7/dist-packages/splunklib/binding.py", line 751, in login
    password=self.password)
  File "/usr/local/lib/python2.7/dist-packages/splunklib/binding.py", line 1079, in post
    return self.request(url, message)
  File "/usr/local/lib/python2.7/dist-packages/splunklib/binding.py", line 1096, in request
    response = self.handler(url, message, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/splunklib/binding.py", line 1195, in request
    connection.request(method, path, body, head)
  File "/usr/lib/python2.7/httplib.py", line 958, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python2.7/httplib.py", line 992, in _send_request
    self.endheaders(body)
  File "/usr/lib/python2.7/httplib.py", line 954, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 814, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 776, in send
    self.connect()
  File "/usr/lib/python2.7/httplib.py", line 1157, in connect
    self.timeout, self.source_address)
  File "/usr/local/lib/python2.7/dist-packages/gevent/socket.py", line 637, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
  File "/usr/local/lib/python2.7/dist-packages/gevent/socket.py", line 737, in getaddrinfo
    raise gaierror(EAI_SERVICE, 'Servname not supported for ai_socktype')
socket.gaierror: [Errno -8] Servname not supported for ai_socktype

binding.py:handler.request extracts the port from the url argument as a string. This eventually gets passed down the stack (on my system, anyway) to gevent/socket.py, where getaddrinfo sees the port as a string and tries to look it up in /etc/services instead of accepting it as an integer.
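A sketch of extracting the port as an integer with the standard library: `urllib.parse.urlsplit(...).port` already returns an `int` (or `None`), so nothing downstream receives a string port. `host_and_port` is a hypothetical helper, and the 8089 default (Splunk's usual management port) is an assumption.

```python
from urllib.parse import urlsplit


def host_and_port(url, default_port=8089):
    """Return (host, int port) for a URL, so sockets never get a string port."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError("no host in %r" % url)
    # .port is already an int, or None when the URL omits it.
    port = parts.port if parts.port is not None else default_port
    return parts.hostname, int(port)
```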

HttpLib#request has strange argument names

If you look at how Context#request calls HttpLib#request it looks like this:

    return self.http.request(
        self.url(path), {
            'method': message.get("method", "GET"),
            'headers': message.get("headers", []) + self._headers(),
            'body': message.get("body", "")})

But if you look at the signature for HttpLib#request:

def request(self, url, headers=None, **kwargs):

And in turn when that calls the Handler#request the signature is:

def request(url, message, **kwargs):

Why is the second argument of HttpLib#request named headers when it's clearly the message argument later?

Supplying multiple roles when creating a User

This is related to issue #32.

When creating a User, you must also provide a role, for instance admin or user. As specified in http://docs.splunk.com/Documentation/Splunk/latest/RESTAPI/RESTaccess#authentication.2Fusers under the heading "POST authentication/users", which describes creating a user, you can also specify a list of roles:

"... To assign multiple roles, send them in separate roles parameters."

That means writing something like -d roles=admin -d roles=user. This does not work in our Python implementation, because we pass the parameters as a dict, which cannot hold the same key twice. I tried supplying both roles in a single roles= option, separated by &, ;, {} and [], but couldn't get it to work. So I assume you must supply separate roles parameters.

Do you guys have any idea of how to fix/work around this?
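In Python 3, the standard library can produce repeated parameters from a dict whose value is a list, via `urlencode(..., doseq=True)`. A sketch with `encode_form` as a hypothetical helper name; recent splunklib versions appear to expand list-valued keyword arguments the same way (e.g. `roles=["admin", "user"]`), but verify that against the SDK version you use.

```python
from urllib.parse import urlencode


def encode_form(**params):
    """Form-encode parameters; a list value becomes repeated key=value pairs."""
    # doseq=True expands sequences, so roles=["admin", "user"] is sent as
    # roles=admin&roles=user, the form POST authentication/users describes.
    return urlencode(params, doseq=True)


body = encode_form(name="jdoe", password="changeme", roles=["admin", "user"])
print(body)  # name=jdoe&password=changeme&roles=admin&roles=user
```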

Test files do not allow empty lines in .splunkrc

Using Windows 7 x64

When .splunkrc is used for authentication, it cannot contain empty lines. If it does, the file is not read correctly, and the tests fail with HTTPError: HTTP 401 Unauthorized -- Login failed.

Verified for the files:

tests\test_binding.py
tests\test_client.py
tests\test_examples.py
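A tolerant reader for the key=value format, sketched with a hypothetical `read_splunkrc` helper (not the test suite's actual loader): blank lines and `#` comments are skipped instead of derailing the parse, and a line without `=` raises a clear error rather than silently producing bad credentials.

```python
def read_splunkrc(lines):
    """Parse key=value lines into a dict, skipping blank lines and # comments."""
    settings = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are allowed anywhere
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError("malformed .splunkrc line: %r" % raw)
        settings[key.strip()] = value.strip()
    return settings
```

Usage would be along the lines of `with open(os.path.expanduser("~/.splunkrc")) as f: opts = read_splunkrc(f)`.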

ReadMe doesn't cover modular input use

The README should cover how to use the SDK to write modular inputs. Specifically, how much of it do I need to ship with my modular input for it to work with Splunk's bundled Python?

output_mode=csv when fetching results breaks results parser?

With reference to JIRA issue SPL-53634, I am translating the Java SDK example described in http://dev.staging.splunk.com/view/SP-CAAAECN#search into a Helmut example. The code is shown below. It works when I omit output_mode='csv'; when I include it, I get the following error:

Traceback (most recent call last):
  File "./fetcheventstest.htest.py", line 37, in <module>
    for r in job.get_results(output_mode='csv',count=max_result_rows,offset=get_offset):
  File "/Users/phashemi/gitdepot/splunk-helmut/helmut/manager/jobs/sdk/job.py", line 176, in get_results
    return _build_results_from_sdk_response(response)
  File "/Users/phashemi/gitdepot/splunk-helmut/helmut/manager/jobs/sdk/job.py", line 206, in _build_results_from_sdk_response
    while reader.read():
  File "/Users/phashemi/gitdepot/splunk-helmut/helmut/contrib/splunk-sdk-python/splunklib/results.py", line 397, in read
    kind = self._scan()
  File "/Users/phashemi/gitdepot/splunk-helmut/helmut/contrib/splunk-sdk-python/splunklib/results.py", line 364, in _scan
    self._reader.read()
  File "/Users/phashemi/gitdepot/splunk-helmut/helmut/contrib/splunk-sdk-python/splunklib/results.py", line 197, in read
    self._item = self._scan()
  File "/Users/phashemi/gitdepot/splunk-helmut/helmut/contrib/splunk-sdk-python/splunklib/results.py", line 147, in _scan
    item = self._items.next()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/dom/pulldom.py", line 232, in next
    rc = self.getEvent()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/dom/pulldom.py", line 265, in getEvent
    self.parser.feed(buf)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/sax/expatreader.py", line 211, in feed
    self._err_handler.fatalError(exc)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/xml/sax/handler.py", line 38, in fatalError
    raise exception
xml.sax._exceptions.SAXParseException: <unknown>:1:5: not well-formed (invalid token)

The code is this:

SPLUNK_HOME = '/Users/phashemi/splh/'
splunk = LocalSplunk(SPLUNK_HOME)

job = splunk.jobs.create('search index=_internal')
job.wait()

max_result_rows = int(splunk.confs['limits']['restapi']['maxresultrows'])
eventcount = int(job.get_event_count())

get_offset = 0
num = 0
# Page through the results, maxresultrows rows at a time.
while get_offset < eventcount:

    for r in job.get_results(output_mode='csv', count=max_result_rows, offset=get_offset):
        print(num)
        num += 1

    get_offset += max_result_rows

print("Done parsing all events.")

and in helmut:

def get_results(self, **kwargs):
    response = self.raw_sdk_job.results(**kwargs)
    return _build_results_from_sdk_response(response)
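The traceback shows the XML parser choking on the first bytes of the body, which is consistent with a CSV response being fed to an XML results reader. A sketch of parsing CSV output with the standard library instead; `iter_csv_results` is a hypothetical helper, and `response` is assumed to be a binary file-like object such as the stream the SDK's `results()` call returns.

```python
import csv
import io


def iter_csv_results(response, encoding="utf-8"):
    """Return a csv.DictReader over a results body fetched with output_mode=csv."""
    body = response.read().decode(encoding)
    # newline="" leaves \r\n intact so the csv module can handle line endings.
    return csv.DictReader(io.StringIO(body, newline=""))
```

Reading the whole page into memory is acceptable here because each request is already capped at `maxresultrows` rows.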