Comments (6)
I run into this error frequently. My workarounds are:
1) Write a retry function and call it 2-3 times. Whenever it failed on the first attempt, it has ALWAYS worked on the second try.
2) Increase your driver memory.
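The retry workaround in 1) could be sketched roughly as follows. This is a minimal illustration, not code from spark-sklearn or from the commenter: the helper name `retry`, its parameters, and the `df.collect()` call in the usage comment are all assumptions made for the example.

```python
import time


def retry(action, attempts=3, delay_seconds=0.0, retry_on=(Exception,)):
    """Call action() until it succeeds or the attempts run out.

    `retry` is a hypothetical helper written for this illustration.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(delay_seconds)  # brief pause before the next try


# Hypothetical usage, assuming a PySpark DataFrame named df:
# rows = retry(lambda: df.collect(), attempts=3, delay_seconds=1.0)
```

Narrowing `retry_on` to the exception type you actually see is probably safer than retrying on every `Exception`, since a blanket retry can hide unrelated bugs.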
from spark-sklearn.
Thanks! This was helpful. The retry function works fine for some reason...
There is a hardcoded timeout value here:
def serveIterator(items: Iterator[_], threadName: String): Array[Any] = {
  val serverSocket = new ServerSocket(0, 1, InetAddress.getByName("localhost"))
  // Close the socket if no connection in 15 seconds
  serverSocket.setSoTimeout(15000)
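To illustrate what such a timeout does, here is a rough Python analogue of the pattern in the Scala snippet above. It is not Spark's code: the function name `accept_with_timeout` and its return strings are made up for this sketch. A server socket is opened on localhost with an accept timeout, and if no client connects in time, the accept call fails.

```python
import socket


def accept_with_timeout(timeout_seconds):
    """Open a localhost server socket and wait for a single connection.

    Hypothetical helper for illustration; returns "connected" or "timed out".
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        server.listen(1)
        server.settimeout(timeout_seconds)  # plays the role of setSoTimeout
        try:
            conn, _ = server.accept()
        except socket.timeout:
            return "timed out"  # nobody connected within the window
        conn.close()
        return "connected"
    finally:
        server.close()
```

With nothing connecting, `accept_with_timeout(0.2)` gives up after about 0.2 seconds. The Scala code above uses a fixed 15000 ms window instead.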
Enable Arrow-based columnar data transfers:
spark.conf.set("spark.sql.execution.arrow.enabled", "true")
I solved it by editing the rdd.py file. In this line:
for res in socket.getaddrinfo("localhost", port, socket.AF_UNSPEC, socket.SOCK_STREAM):
I changed localhost to my IP address, and it works.
I faced a similar issue. Nothing worked until I switched to a different database entirely. I don't understand why, but it seems the PySpark DataFrame created from SQL Server was throwing this error, while the one created from Redshift was not.