v6d-io / v6d
vineyard (v6d): an in-memory immutable data manager. (Project under CNCF, TAG-Storage)
Home Page: https://v6d.io
License: Apache License 2.0
Help,
I'm trying to build libvineyard from source.
The error message "there is no CMAKELists.txt in thirdparty/nlohmann" occurs while I'm using the CMake toolkit to build it.
I also notice that all the directories in thirdparty (like uri, pybind) seem to lack a CMakeLists.txt.
Any suggestions?
Hi, I'm trying out vineyard as an alternative to PyArrow Plasma, given that the latter is currently unmaintained.
https://v6d.io/notes/getting-started.html#starting-vineyard-server recommends running vineyardd, whereas https://v6d.io/notes/install.html#install-vineyard recommends installing vineyard from PyPI.
Unfortunately https://pypi.org/project/vineyard/ does not contain the vineyardd binary. Is this intended?
Vineyard's memory allocator is derived from plasma, and also suffers from issues in the plasma store server. When specifying a large --size value (one that exceeds the available size of /dev/shm), creating blobs won't fail, but any touch (i.e., read/write) on the memory will trigger a "SIGBUS", i.e., a bus error.
Such a signal is hard for users to catch, and usually leads to a crash of the client program, which is quite bad.
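One way to fail early instead of hitting the bus error is to compare the requested size against the free space of the shared-memory filesystem before starting. The following is only a minimal sketch of that idea, not vineyard's behavior: `fits_in_shm` and `check_size` are hypothetical helpers, and the check is a snapshot that other processes can invalidate afterwards.

```python
import shutil


def fits_in_shm(requested_bytes, shm_path="/dev/shm"):
    """Return True if requested_bytes does not exceed the free space of the
    filesystem that backs shm_path (a snapshot at the time of the call)."""
    free = shutil.disk_usage(shm_path).free
    return requested_bytes <= free


def check_size(requested_bytes, shm_path="/dev/shm"):
    """Hypothetical pre-flight check: raise a catchable error up front
    instead of risking a bus error on first touch of the memory."""
    if not fits_in_shm(requested_bytes, shm_path):
        raise MemoryError(
            f"requested {requested_bytes} bytes, "
            f"but {shm_path} has less free space than that"
        )
    return requested_bytes
```

A launcher script could call `check_size(...)` with the byte count it intends to pass as `--size` and refuse to start when the check raises.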
grapelite is listed as a required dependency, even when building with -DBUILD_VINEYARD_GRAPH=0
Current uri is bad for network::uri when delimiter=|
Describe your problem
We need to ensure compatibility with the following settings:
OS:
Boost:
apache-arrow:
Additional context
N/A
Describe your problem
Other JSON libraries, like nlohmann/json and rapidjson, have better performance, a smaller memory footprint, and, most importantly, better JSON conformance than boost/property_tree.
Make the dependency clean to avoid potential conflicts from ABI problems of Arrow
Describe your problem
Compared to the pandas DataFrame, the vineyard DataFrame does not support an index yet.
Store the index as a member of the vineyard object.
Describe your problem
The dataset-lifecycle-framework project abstracts data sources on S3 and NFS as the CRD Dataset. By mapping the dataset into PersistentVolumeClaims and ConfigMaps, the dataset can be referenced in Pods.
The dataset-lifecycle-framework project also has an operator, and uses "labels" to indicate the dataset requirements in a Pod. The operator is responsible for mounting the dataset properly, just like we do in vineyard.
Vineyard uses a similar syntax in the "labels" of Pods to let the controller know which objects the pod requires. Beyond that, the vineyard operator considers data locality and works as a scheduler plugin.
Additional context
N/A
Describe your problem
Beyond C++ and Python, many computation engines are written in other languages, e.g., Java/Scala, Rust and Go. To support such languages well, we need to provide SDKs.
Additional context
The Java SDK has been requested in discussions by end users.
Describe your problem
WSL2 should be trivial to support since it is Linux running inside a virtual machine. But natively supporting Windows looks challenging:
But this is worth a try.
Additional context
Such effort could be backported to apache-arrow as well.
Describe your problem
As described in the title.
Additional context
To make the IO functionality work for GraphScope.
The real link is https://github.com/boostorg/leaf
How to deploy a DaemonSet using helm.
Describe your problem
The current approach doesn't work when the client is not in a Python context.
Additional context
TBF
Describe your problem
Additional context
This task depends on #113 .
Describe your problem
One-producer, one-consumer control for streams
Description of the problem
Currently, one-producer, one-consumer control is left to the user's side, which is weak.
Additional context
We would like to enforce this in vineyardd by marking the stream when it is opened for the first time.
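The server-side marking could look roughly like the sketch below. It is only an illustration of the idea under stated assumptions: `StreamRegistry`, its `open` method, and the "read"/"write" mode names are hypothetical and are not taken from vineyardd.

```python
class StreamRegistry:
    """Tracks which streams already have a producer or a consumer attached."""

    def __init__(self):
        # stream id -> set of modes ("read", "write") that were already claimed
        self._opened = {}

    def open(self, stream_id, mode):
        """Claim stream_id for mode.

        Raises RuntimeError if that mode was already claimed, so a second
        producer or a second consumer is rejected by the server rather than
        relying on the user to behave.
        """
        if mode not in ("read", "write"):
            raise ValueError(f"unknown mode: {mode!r}")
        claimed = self._opened.setdefault(stream_id, set())
        if mode in claimed:
            raise RuntimeError(f"stream {stream_id} is already opened for {mode}")
        claimed.add(mode)
```

With this shape, the first `open(stream, "write")` and the first `open(stream, "read")` succeed, and any later attempt in the same mode fails with an error the client can report.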
Describe your problem
/Users/yecol/git-repo/graphscope-edu/analytical_engine/core/fragment/arrow_projected_fragment.h:891:7: error: use of undeclared identifier
'ErrorCode'; did you mean 'vineyard::ErrorCode'?
ARROW_OK_OR_RAISE(begins_builder.Append(ret.first));
^
/usr/local/include/vineyard/graph/utils/error.h:220:23: note: expanded from macro 'ARROW_OK_OR_RAISE'
RETURN_GS_ERROR(ErrorCode::kArrowError, (status_name).ToString()); \
^
/usr/local/include/vineyard/graph/utils/error.h:31:12: note: 'vineyard::ErrorCode' declared here
enum class ErrorCode {
^
In file included from /Users/yecol/git-repo/graphscope-edu/analytical_engine/core/grape_instance.cc:20:
In file included from /Users/yecol/git-repo/graphscope-edu/analytical_engine/core/context/tensor_context.h:37:
/Users/yecol/git-repo/graphscope-edu/analytical_engine/core/fragment/arrow_projected_fragment.h:892:7: error: use of undeclared identifier
'ErrorCode'; did you mean 'vineyard::ErrorCode'?
ARROW_OK_OR_RAISE(ends_builder.Append(ret.second));
^
/usr/local/include/vineyard/graph/utils/error.h:220:23: note: expanded from macro 'ARROW_OK_OR_RAISE'
RETURN_GS_ERROR(ErrorCode::kArrowError, (status_name).ToString()); \
I suggest the macro refer to the ErrorCode enum class together with its namespace (vineyard::ErrorCode), since libvineyard may be used by other projects that live outside namespace vineyard, such as GraphScope.
NOT_Found libvineyard
# or
Found libvineyard, (include: /usr/local/include, library: /usr/local/lib/libvineyard.dylib)
Description of the problem
If it is a bug report, to help us reproduce this bug, please provide the information below:
uname -a: macOS
vineyard.__version__: latest
If it is a feature request, please provide a clear and concise description of what you want to happen:
Additional context
Add any other context about the problem here.
Describe your problem
The fluid project manages a data cache layer (provided by alluxio or jindofs) for the big-data applications on top of it. Fluid abstracts the data caching services as a Dataset and mounts it to worker pods via a CSI driver. In addition, based on the affinity of volumes, fluid also considers data locality when scheduling.
Vineyard also considers data locality when scheduling. Currently vineyard abstracts objects as CRDs to make them visible to kubernetes, and volumes also fit vineyard's design well. Integrating with fluid would benefit vineyard by making the scheduler plugin and vineyard-operator more lightweight.
In short, vineyard could work as a fluid runtime. The things that need to be done to work together with fluid are as follows (if I understand it correctly):
Additional context
N/A
Description of the problem
vineyardd hangs when launching:
$ GLOG_v=100 ./bin/vineyardd --socket=/tmp/vineyard.sockxxx
I1102 12:16:08.303436 104502 vineyardd.cc:65] Hello vineyard!
I1102 12:16:08.309496 104502 meta_service.h:123] start!
After printing more detailed logs, we can see:
I1102 12:16:10.214583 104511 etcd_meta_service.cc:126] etcd ls use 978080 microseconds: 0
I1102 12:16:10.214630 104511 etcd_meta_service.cc:136] kvs.size() = 0
I1102 12:16:10.214675 104511 etcd_meta_service.cc:139] status = Etcd error: Received message larger than max (18368675 vs. 4194304), error code: 8
I1102 12:16:10.214730 104502 meta_service.h:501] request all: Etcd error: Received message larger than max (18368675 vs. 4194304), error code: 8
Additional context
The hang-up problem itself could be fixed by etcd-cpp-apiv3/etcd-cpp-apiv3#21, but the
Describe your problem
Etcd doesn't handle many keys in a single txn well: the default limit on ops in a txn is 128, and etcdserver raises many warnings about "execute too long to execute" for both updates and range queries.
Meanwhile, flattening metadata in etcd doesn't bring many benefits.
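One workaround for the ops limit is to split a large update into several smaller txns. The sketch below only shows the batching arithmetic; `split_into_txns` is a hypothetical helper, the ops are treated as opaque values, and note that splitting gives up atomicity across the batches.

```python
def split_into_txns(ops, max_ops_per_txn=128):
    """Split a list of operations into consecutive batches, each holding at
    most max_ops_per_txn operations (128 is the default limit cited above)."""
    if max_ops_per_txn <= 0:
        raise ValueError("max_ops_per_txn must be positive")
    return [
        ops[start:start + max_ops_per_txn]
        for start in range(0, len(ops), max_ops_per_txn)
    ]
```

For 300 ops this yields batches of 128, 128, and 44 ops, in the original order.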
Additional context
None
Both in submodules and CMakeLists, as it's no longer needed and its size is large.
Describe your problem
We have come across the question "does vineyard replicate data across nodes?" several times, from end users and from the CNCF community. I think we need a section in the documentation to clarify our design point.
Maybe we also need to share such information in the README.
Additional context
N/A
Describe your problem
To make vineyard work with kubernetes, we need:
n.b.: some of the work has been merged to the dev/kubernetes branch.
DaemonSet
KubeCluster
Describe your problem
To handle cases like "opening a numpy ndarray as a stream".
I must admit that it is controversial and counterintuitive to add such a feature to vineyardd. I would like to know what others think about that.
Additional context
N/A.
Currently, if a driver throws an exception, the error message cannot be seen from the client, which is inconvenient for debugging.
Some work needs to be done to expose the errors to users.
Describe your problem
The vineyard Python client crashes when getting an empty np.array from vineyard (#137).
In [1]: import vineyard
In [2]: cl = vineyard.connect("/tmp/vineyard.sock")
In [3]: cl.put(np.array(()))
Out[3]: o00019a42596292fc
In [4]: cl.get(cl.put(np.array(())))
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0108 12:57:40.512176 342670784 object_factory.cc:26] Failed to create an instance due to the unknown typename: vineyard::Tensor<float64>
[1] 14403 segmentation fault ipython
vary-sized tuple (#137)
numpy ndarray with OBJECT type (hard to fix).
sparse arrays are stored in a contiguous, flattened manner
tensor order (easy to fix) (#137)
string type.
Additional context
N/A
Describe your problem
When an error occurs after part of an object has been sealed into vineyard, we should let the user recover from the error and do proper cleanup in an easy fashion.
Additional context
Or at least, we should provide tools/utilities for users to achieve such goals.
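Such a utility could take the form of a context manager that remembers what has been sealed so far and removes it if the build fails. This is a minimal sketch under stated assumptions: `cleanup_on_error` is a hypothetical helper, and the only thing it assumes about the client is a `delete(object_id)` method.

```python
from contextlib import contextmanager


@contextmanager
def cleanup_on_error(client):
    """Yield a list for recording sealed object IDs.

    If the body raises, every recorded object is deleted through
    client.delete(object_id) (an assumed method), newest first, and the
    original exception is re-raised.
    """
    sealed = []
    try:
        yield sealed
    except BaseException:
        for object_id in reversed(sealed):
            try:
                client.delete(object_id)
            except Exception:
                # Best effort: keep cleaning up the remaining objects.
                pass
        raise
```

The caller appends each object ID to the yielded list right after sealing it; on success nothing is deleted.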
We have observed that the following code doesn't work:
import pyarrow as pa
import vineyard
client = vineyard.connect('/var/run/vineyard.sock')
But if import vineyard comes first, it works as expected.
Describe your problem
Vineyard addresses sharing immutable data in memory, but sometimes checkpointing or swapping data to storage devices does make sense.
Vineyard already supports object serialization/deserialization in #164, but that is not enough.
(WIP) More details of the design will be added later.
Additional context
N/A
Describe your problem
Additional context
We may need to add ossfs support in fsspec, like s3fs.
Hi,
I'm puzzled about how a user can launch a vineyard cluster. Should I configure and start the etcd cluster all by myself first?
Are there any steps or configurations I must pass to libvineyard?
I have a pyarrow table with columns composed of 21036 chunks (21032413 rows). Storing the table or the equivalent pandas dataframe in vineyard does not succeed in a reasonable amount of time (I waited a few minutes) but hangs in record_batch_builder (and the equivalent function for pandas, respectively).
That's unlike the pyarrow plasma implementation, which is implemented in C++ and just takes a few seconds:
https://github.com/apache/arrow/blob/995abdc02fed412bbd947fe41a0765036dbbe820/cpp/src/arrow/python/serialize.cc#L588-L599
Do you intend to match performance with pyarrow for large tables? Or is this out of scope for the project?
Describe your problem
We have received a bug report that an ImportError occurs on import vineyard on macOS:
ImportError: dlopen(/usr/local/lib/python3.7/site-packages/vineyard-0.1.3-py3.7-macosx-10.14-x86_64.egg/vineyard/_C.cpython-37m-darwin.so, 2): Symbol not found: _aligned_alloc
Additional context
That may be related to Homebrew/homebrew-core#46393 and Homebrew/homebrew-core#45585.
Hi,
As the documentation mentions in the architecture section, linked below,
https://v6d.io/notes/divein.html#architecture
the communicator seems to act as a data manager for bulk exchange between different vineyard nodes.
But I did not find any API or demo that uses this class to exchange data between different vineyard instances.
It also seems that I can't find this class in the source release of libvineyard.
I would appreciate your help.
Describe your problem
pandas 1.2.0 includes some incompatible changes that break our builder and resolver for DataFrame.
Additional context
Also affects mars: mars-project/mars#1845
We haven't benchmarked vineyard or compared it with similar systems, but that is actually necessary.
Conditions that we need to test:
Performance related issues:
Describe your problem
vineyardd works with the default socket at /var/run/vineyard.sock, which usually requires "root" permission. However, when the file exists, vineyardd cannot report permission errors properly.
Vineyard's IPC client also suffers from the same problem.
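A pre-flight check on the socket path is one way to turn this into a readable error on either side. The sketch below is an assumption about how such a check could look, not what vineyardd or the client does; `check_socket_path` is a hypothetical helper, and `os.access` results are advisory (for example, they are always permissive for root).

```python
import os


def check_socket_path(socket_path):
    """Return None if a socket can plausibly be created or used at
    socket_path, otherwise a human-readable reason."""
    parent = os.path.dirname(os.path.abspath(socket_path))
    if os.path.exists(socket_path):
        # An existing socket file must be readable and writable to connect to it.
        if not os.access(socket_path, os.R_OK | os.W_OK):
            return f"permission denied: cannot read/write existing {socket_path}"
        return None
    if not os.path.isdir(parent):
        return f"directory does not exist: {parent}"
    # Creating the socket file needs write and search permission on the directory.
    if not os.access(parent, os.W_OK | os.X_OK):
        return f"permission denied: cannot create files in {parent}"
    return None
```

Calling `check_socket_path("/var/run/vineyard.sock")` as a non-root user would typically return a "permission denied" message instead of failing silently.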
We need to test whether shared memory/IPC between pods is supported by other OCI container runtimes (e.g., containerd).
Describe your problem
The compatibility between client and server should follow semver, and we need to validate that when establishing the connection.
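The validation at connection time could be as small as the sketch below. The policy is an assumption for illustration, not a decision recorded in this issue: versions are compatible when the major versions match, and for 0.x releases the minor versions must match as well. `parse_version` and `is_compatible` are hypothetical helpers.

```python
def parse_version(version):
    """Parse "MAJOR.MINOR.PATCH" into a tuple of ints, ignoring any
    pre-release or build suffix such as "-rc1" or "+build5"."""
    core = version.split("+", 1)[0].split("-", 1)[0]
    parts = core.split(".")
    if len(parts) != 3:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(part) for part in parts)
    return major, minor, patch


def is_compatible(client_version, server_version):
    """Assumed policy: same major version; for 0.x, same minor version too."""
    client = parse_version(client_version)
    server = parse_version(server_version)
    if client[0] != server[0]:
        return False
    if client[0] == 0:
        # During 0.x development a minor bump is treated as breaking here.
        return client[1] == server[1]
    return True
```

The server (or client) would run `is_compatible(...)` on the versions exchanged during the handshake and refuse the connection with a clear message when it returns False.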
Additional context
None.