k2-fsa / k2
FSA/FST algorithms, differentiable, with PyTorch compatibility.
Home Page: https://k2-fsa.github.io/k2
License: Apache License 2.0
Can anyone help to design a logo for k2?
Guys, I am creating this issue just as a way to show this pseudocode; it demonstrates parts of where we are going with k2. One feature is that the fsa objects will have 'per_arc' fields which contain arbitrary tensors whose first dimension is the values.Dim() of the arcs. (They are accessed as if they were class members but are really members of a dict.) When we do operations on FSAs, these per_arc quantities are propagated. (This is easy given the arc_map objects.)
The Python-level fsa object is going to be a more complicated object than the C++ one. You can think of it as containing the C++ object as just one member (maybe we could make the C++ object a member called arcs or something).
loglikes = eval_nnet(minibatch.data)
dense_fsas = k2fsa.dense_fsas(loglikes)
lattices = k2fsa.pruned_compose(dense_fsas, decoding_graph)
if train:
    oracle = k2fsa.pruned_compose(dense_fsas, minibatch.supervision_graphs)
    # `oracle` inherited per_arc.word_labels from `supervision_graphs`
    oracle.per_arc.frames = dense_fsas.get_idx01(oracle.src_arcs_a())
    oracle = k2fsa.best_path(oracle)
    ref_phone_labels = torch.zeros([dense_fsas.shape.TotSize(1)], dtype=torch.long)
    ref_word_labels = ref_phone_labels.clone()
    ref_phone_labels[oracle.per_arc.frames] = oracle.get_labels()
    # note: word_labels were propagated from `supervision_graphs.per_arc.word_labels`
    ref_word_labels[oracle.per_arc.frames] = oracle.per_arc.word_labels
arc_post = k2fsa.arc_post(lattices)
# per_arc.frames gives unique identifiers for frames of input, different for different utterances.
lattices.per_arc.frames = dense_fsas.get_idx01(lattices.src_arcs_a())
num_arcs = lattices.arcs.shape[0]
phones_one_hot = torch.zeros(num_arcs, num_phones)
phones_one_hot[range(num_arcs), lattices.get_labels()] = 1.0
# initial features are (acoustic, LM, posterior, LLR vs. best path, one-hot phone labels)
lattices.per_arc.feats = torch.cat([torch.stack([lattices.inputs.scores_a(),
                                                 lattices.inputs.scores_b(),
                                                 log(arc_post),
                                                 k2fsa.llr(lattices)], dim=1),
                                    phones_one_hot], dim=1)
# augment with the features above but averaged on each frame, over all paths.
lattices.per_arc.feats = torch.cat([lattices.per_arc.feats,
                                    pool_and_redistribute(lattices.per_arc.feats,
                                                          weights=arc_post,
                                                          buckets=lattices.per_arc.frames)],
                                   dim=1)
if train:
    lattices.per_arc.phone_correct = (lattices.get_labels() == ref_phone_labels[lattices.per_arc.frames])
    # note: word_labels were propagated from `decoding_graph.per_arc.word_labels`
    lattices.per_arc.word_correct = (lattices.per_arc.word_labels == ref_word_labels[lattices.per_arc.frames])
# Convert the lattices into n-best lists
# such that each arc appears in at least one linear sequence.
nbest = k2fsa.covering_nbest(lattices)
(feats, frames, phone_correct, word_correct) = \
k2.ragged_to_tensor(nbest.arcs.shape, nbest.per_arc.feats,
nbest.per_arc.frames, nbest.per_arc.phone_correct,
nbest.per_arc.word_correct)
loglikes = dense_fsas.loglikes[frames]
(word_confidence, phone_confidence) = confidence_model(loglikes, scores)
if train:
    weights = 1.0 / (1.0 + nbest.num_paths[nbest.per_fsa.src_indexes])
    objf += confidence_objf(word_confidence, word_correct, weights) + \
            confidence_objf(phone_confidence, phone_correct, weights)
(nbest.per_arc.word_confidence,
 nbest.per_arc.phone_confidence) = k2.ragged_to_tensor_inv(nbest.arcs.shape,
                                                           word_confidence, phone_confidence)
recombined = k2fsa.union(nbest, row_ids=nbest.per_fsa.src_indexes)
# Use phone and word confidences as the scores. Note, these include
# confidences for epsilons so we won't just delete everything.
recombined.arcs.scores[:] = (recombined.per_arc.phone_confidence +
3 * recombined.per_arc.word_confidence)
result = k2fsa.best_path(recombined)
# can access result.per_arc.{phone,word}_confidence and the like...
I found that the first line of each source file is its relative path. Is it there for any purpose?
Now I want to change it to be Doxygen-friendly and change the directory organization, and keeping the RELATIVE path seems tedious and non-Doxygen style to me.
I suggest a new one below. It may be a trivial thing, but I wanted to ask you all first.
/**
* @file context.cu
* @brief
* Implement ...
*
* @copyright
* Copyright (c) 2020 Name (email)
*
* @copyright
* See LICENSE for clarification regarding multiple authors
*/
The following code
Lines 78 to 82 in daf77cf
Line 875 in cc752e5
I've found two problems in the code:
(1) (Lines 888 to 893 in cc752e5) tot_sizes_out does not include the last axis, e.g., the number of arcs in an fsa (offsets have num_axes + 1 rows).
(2) (Lines 895 to 897 in cc752e5; Lines 813 to 817 in cc752e5) It is invalid to call ans.Populate(), since it does not contain valid data; only the shapes are allocated.
Guys,
This is an issue for the python interface...
there will be lots of times when someone wants to access a member of, say, a class, that is of type Array1.
In these instances we will want to make the Array1 appear as a PyTorch tensor.
I mean, I suppose we could do this when we wrap Array1 itself.
I believe this should be possible, assuming we are using a PyTorch context (and maybe even if not).
The same issue arises with Array2 and Tensor, but less often.
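As a rough illustration of what that could look like from Python, here is a minimal sketch assuming the wrapped Array1 can export a DLPack capsule via a hypothetical to_dlpack() method (the method name is an assumption, not the actual k2 API):
import torch
from torch.utils.dlpack import from_dlpack

def array1_as_tensor(arr) -> torch.Tensor:
    # zero-copy view: the returned tensor shares memory with the Array1
    return from_dlpack(arr.to_dlpack())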
OK, I have a proposal for how we can start moving things over to the new framework.
We can't do this by incremental changes to existing code; I think it will be easier to start from the code that's currently in the 'cuda/' subdirectory, move those files up one level, and move the existing files down one level into 'old/'. We can do all this in a separate branch, say, a branch called 'cuda'.
Then the initial task is to figure out how to compile it with nvcc, with a dependency on cub (cub is a header library). (Note: we'd plan to put all the source files through nvcc).
Of course it won't compile initially, but that's OK; at least being at the stage where we have compilation errors would be an improvement.
Note: the eventual goal would be to take all the old implementations and have them be the "CPU implementation" (i.e. the one we use by default on CPU) for those algorithms. Most algorithms won't have a GPU version, initially. But of course we'd have to change things to use the types defined in cuda/, e.g. Array1, Context, Region, and so on. I'll work on completing more of the code so it's more obvious how it's supposed to work.
Before designing the FSA object, we need to decide on a design for k2.Ragged, which will be the Python interface
for the RaggedTensor. I will be using this issue to write down some notes on that.
FYI, added as a TODO, just to make debugging easy by checking the error immediately after the kernel call.
@csukuangfj what is the recommended way to tell CMake how to pick up a virtual Python environment?
We are having a problem with a new environment at Xiaomi.
Can we please have Clone() functions for Array1 and Array2, that will give a new array on the same device?
BTW there is something I'm not very satisfied about the current design, that some functions are members but others are not, even though they might most naturally be members. I'm thinking for now we could put them as non-member functions like ToContiguous(), in array_ops.h.
Also my intention was that any functions involving Ragged would go in ragged_ops.h, not array_ops.h. Just FYI: not urgent or important.
The code compiles fine on my MacBook (Darwin - 17.7.0 - x86_64), but it fails on Linux (Linux - 4.9.0-11-amd64 - x86_64), with the following error:
$ cmake ..
CMake Error at cmake/googletest.cmake:40 (target_include_directories):
Cannot specify include directories for imported target "gtest".
Call Stack (most recent call first):
cmake/googletest.cmake:50 (download_googltest)
CMakeLists.txt:29 (include)
I just found we can implement RowSplitsToRowIds with Load-Balancing Search or IntervalExpand in moderngpu.
Not sure which approach would be faster, but I think it may be worth doing this and comparing the performance with the current implementation (or stealing some ideas from their implementations). No great hurry though (we may do this after the first release).
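For comparison purposes, here is a plain PyTorch reference of what RowSplitsToRowIds computes (a CPU sketch, not the k2 or moderngpu implementation):
import torch

def row_splits_to_row_ids(row_splits: torch.Tensor) -> torch.Tensor:
    # row_splits has num_rows + 1 entries, e.g. [0, 2, 3, 3, 6];
    # the answer has row_splits[-1] entries, e.g. [0, 0, 1, 3, 3, 3].
    sizes = row_splits[1:] - row_splits[:-1]
    rows = torch.arange(sizes.numel(), dtype=row_splits.dtype,
                        device=row_splits.device)
    return torch.repeat_interleave(rows, sizes)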
The negate_scores option when openfst is true.
I notice we import graphviz
in python/k2/fsa.py, which means the k2 python package now requires graphviz as a dependency. Is that necessary? Can we just put the print-to-dot code in a separate file and import graphviz there? Then users could import k2 without graphviz and only install graphviz when they really want to print an FSA (I guess most users will not do this?).
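A minimal sketch of the proposed split, assuming a separate module; the file name, function name, and Fsa fields below are assumptions, not existing k2 code:
# k2/python/k2/dot.py -- only this module imports graphviz,
# so "import k2" itself no longer needs it.
def to_dot(fsa):
    try:
        from graphviz import Digraph
    except ImportError as e:
        raise ImportError('graphviz is only needed for drawing FSAs; '
                          'install it with "pip install graphviz"') from e
    dot = Digraph(name='fsa')
    for arc in fsa.arcs:  # assumes an iterable of arcs with these fields (hypothetical)
        dot.edge(str(arc.src_state), str(arc.dest_state), label=str(arc.label))
    return dot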
24: File "/home/storage23/qiuhaowen/k2/k2/python/tests/arc_test.py", line 13, in <module>
24: import k2
24: File "/home/storage23/qiuhaowen/k2/k2/python/k2/__init__.py", line 2, in <module>
24: from .fsa import Fsa
24: File "/home/storage23/qiuhaowen/k2/k2/python/k2/fsa.py", line 16, in <module>
24: from graphviz import Digraph
24: ModuleNotFoundError: No module named 'graphviz'
1/1 Test #24: arc_test_py ......................***Failed 4.64 sec
What I did with Connect() was intended as a kind of template for how other algorithms can be wrapped.
I suggest that the next one to be wrapped could be Intersect(), if someone has time (no hurry!).
I would like to implement the following two functions:
Since the above functions do not affect training, I am going to implement them in Python.
Guys,
I don't know where we are with the I/O format right now, but I propose that we find a way to enable I/O between FSAs and strings.
We could have the following options, with defaults:
acceptor=True # if true, format is: src_state dest_state label cost. if false it's src_state dest_state label aux_label cost; the aux_label would be stored as a separate vector of int32.
negate_scores=False # if true, the string form has the weights as costs, not scores, so we negate as we read/write
The final-state is of course treated differently. The format will be like OpenFST, as just:
final_state
(with no cost, since we don't support a cost on the final state; there should be an arc with label -1 going to the final state, which carries the cost).
Eventually we'd have options related to symbol tables, i.e. allow the user to supply a symbol table.
This interface should probably exist at the C++ level in some form, since the aim is to basically have the same functionality
available from C++ as from Python.
We will avoid sorting. The bottom line: the k2 reading function should be simple and fast, while the OpenFST reading function may have to address corner cases, but we will only address those when we see them as necessary.
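To make the proposed text format concrete, here is a small Python reading sketch (names and behaviour are illustrative only, not the final C++/Python API):
def fsa_from_str(s: str, acceptor: bool = True, negate_scores: bool = False):
    # Acceptor lines:   src_state dest_state label cost
    # Transducer lines: src_state dest_state label aux_label cost
    # A line with a single number marks the final state.
    arcs, aux_labels, final_state = [], [], None
    for line in s.strip().splitlines():
        fields = line.split()
        if len(fields) == 1:
            final_state = int(fields[0])
            continue
        src, dest, label = int(fields[0]), int(fields[1]), int(fields[2])
        if acceptor:
            score = float(fields[3])
        else:
            aux_labels.append(int(fields[3]))
            score = float(fields[4])
        if negate_scores:  # the text stores costs, so negate to get scores
            score = -score
        arcs.append((src, dest, label, score))
    return arcs, aux_labels, final_state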
For the following fsa
Lines 26 to 38 in d291daf
It prints
"Valid|Nonempty|TopSorted|TopSortedAndAcyclic|ArcSortedAndDeterministic|EpsilonFree|MaybeAccessible|Serializable"
ArcSortedAndDeterministic should NOT be there.
At the Kaldi Community Roadmap meeting, Sanjeev suggested adding this as an issue so it is not forgotten.
''Answered orally: we need to think through this. Make this an “issue” on GitHub so we don’t forget?''
The question came from the context of Lhotse:
Does the ability to deal with long recordings for training carry over to decoding and (force) alignments?
In particular recordings that are >=1h.
Best,
-Thomas
Guys,
For assertions I'm just using assert(...), but I'm not sure if this is good practice.
I think we should decide how to do assertions and so on. It might make sense to have different levels of assert
for things that are inside loops vs. things that won't slow the code down. And maybe errors and warnings too?
Any suggestions?
It would be nice to have a macro for compile-time assertions. Meixu, maybe you could dig out that code from what you did and make a PR?
Need this (analogous to MaxPerSublist).
This and similar should be in ragged_ops.h, not array_ops.h.
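The excerpt doesn't show which reduction is being requested; for context, here is a CPU reference sketch of the per-sublist pattern, illustrated with the MaxPerSublist mentioned above (signature assumed, not the k2 API):
import torch

def max_per_sublist(values: torch.Tensor, row_splits: torch.Tensor,
                    initial_value: float) -> torch.Tensor:
    # For each sublist [row_splits[i], row_splits[i+1]) return the max of
    # initial_value and the sublist's elements; empty sublists get initial_value.
    out = []
    for i in range(row_splits.numel() - 1):
        begin, end = row_splits[i].item(), row_splits[i + 1].item()
        m = initial_value
        if end > begin:
            m = max(m, values[begin:end].max().item())
        out.append(m)
    return torch.tensor(out)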
Guys,
I have been naming files .h and .cc even though they do contain a little CUDA code.
This is partly because my emacs setup doesn't auto-format CUDA right, which I know is a bit lazy.
I notice Meixu in his new PR is naming the test files .cu.
I think we should be consistent here. Do you guys think it was a mistake to start naming
files containing CUDA .h and .cc? (they are mostly C++ though). And do you think we should make
the test files .cc as well? How does this interact with CMake?
As we will declare many classes (Fsa, AuxLabels, Cfsa, etc.) with Array2, I am wondering if there is any convenient way to create aliases of indexes and data so that we can use those aliases in functions to make the code clear (otherwise we may create those aliases in functions again and again).
For example (pseudo code):
using Fsa = Array2<Arc*, int32_t>
alias Fsa::arc_indexes = Fsa::indexes;
alias Fsa::arcs = Fsa::data;
One possible approach I thought of is wrapping Array2 in a struct:
struct Fsa {
  Array2<Arc *, int32_t> m;
  Arc *&arcs = m.data;
  int32_t *&arc_indexes = m.indexes;
};
(Two extra advantages of using a struct here are that it gives some type safety compared with a bare Array2, for example one cannot pass an Fsa to a function that accepts a Cfsa where Fsa = Cfsa = Array2<Arc*, int32_t>, and that we can define methods such as NumStates(), FinalStates(), NumArcs(). Otherwise it seems we could only define those methods in Python code (and could not do this in C++)?)
It would be nice to have utilities for randomized testing.
I'm thinking of a mechanism of generating random FSAs.
We have something in Kaldi's fstext/ directory. There could be an option for when it needs to be acyclic.
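As a starting point, a toy generator along these lines might look as follows (purely illustrative; not based on the Kaldi fstext/ code):
import random

def random_fsa_arcs(max_states: int = 10, max_arcs: int = 30,
                    max_label: int = 20, acyclic: bool = False):
    # Returns a list of (src, dest, label, score) arcs plus the final state;
    # arcs entering the final state get label -1, as in k2.
    num_states = random.randint(2, max_states)
    final_state = num_states - 1
    arcs = []
    for _ in range(random.randint(1, max_arcs)):
        src = random.randint(0, num_states - 2)
        # when acyclic, only allow arcs to strictly higher-numbered states
        dest = random.randint(src + 1, final_state) if acyclic \
            else random.randint(0, final_state)
        label = -1 if dest == final_state else random.randint(1, max_label)
        arcs.append((src, dest, label, round(random.uniform(-5.0, 0.0), 2)))
    return arcs, final_state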
It may be early to consider this, but I have a thought, and maybe others have thoughts too.
No hurry to make decision, just post candidates with pros and cons.
I'm documenting the things that we have to improve for the build process. We can have further discussion here, and make changes for the points that we think are reasonable:
Can someone please write a template that works for an arbitrary list of objects of possibly different types,
and does this:
ContextPtr GetContext(S &s, T &t, U &u, ..) {
  ContextPtr ans1 = s.Context(), ans2 = GetContext(t, u, ..);
  assert(*ans1 == *ans2 && "Contexts mismatch");
  return ans1;
}
You can make a PR to my cuda_draft branch, e.g. in context.h, although of course that branch doesn't compile.
Guys,
This issue is for discussion of the wrapping of k2.Array1<X> and k2.Array2<X>. I know we already have some code.
Specifically, what I wonder is what we'll be doing about k2.Array1<Arc>, as this is part of FSAs. Is there any way to make this so it appears to be a PyTorch array of int32_t, of N by 4? I don't even know if that's the best way. If it's possible to make PyTorch treat it as an opaque type of 16 bytes, that's fine with me too.
Dan
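For illustration, the N-by-4 int32 view could behave roughly like this; the Arc field layout and the bit-reinterpretation of the score column are assumptions here, not the settled design:
import torch

# each row is one Arc: (src_state, dest_state, label, score-bits)
arcs = torch.tensor([[0, 1, 5, 0],
                     [1, 2, -1, 0]], dtype=torch.int32)       # toy data
scores = arcs[:, 3].contiguous().view(torch.float32)          # reinterpret the bits as float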
See GetTransposeReordering(); it's here: #233 (already merged, but only declared).
Guys (and especially @csukuangfj) just to let you know that we are working under a certain amount of time pressure. We have Nov 1st as a hard deadline to release something, which should at least include an example with CTC and maybe one with LF-MMI. Ideally the key parts of k2 itself will be mostly finished by mid-October or so, which will give us time to test and tune recipes and integration with Lhotse.
Any progress on the issues will be appreciated.
We need a way to display FSA properties as text, mostly for debug/diagnostics. E.g.
// e.g. FsaPropertiesAsString(3) = "kFsaPropertiesValid|kFsaPropertiesNonempty"
std::string FsaPropertiesAsString(int32_t properties);
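A toy Python sketch of the same idea (the real function is C++, and the bit values below are illustrative only):
_PROPERTY_NAMES = {
    0x01: 'kFsaPropertiesValid',
    0x02: 'kFsaPropertiesNonempty',
}

def fsa_properties_as_string(properties: int) -> str:
    # join the names of all property bits that are set
    return '|'.join(name for bit, name in _PROPERTY_NAMES.items()
                    if properties & bit)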
I realized that it will be convenient for certain algorithms if we can construct an FSA from vector of arcs that isn't sorted. (So that version of the constructor would have to sort them, not necessarily using std::sort). In addition, if we allow that constructor to accept input where the final-state is not numbered last, and have it topologically sort the input itself, it would be helpful; then algorithms like composition and determinization won't have to worry about renumbering states; and they can output big lists of arcs all at once.
If it seems unnatural to implement this as a constructor, having it as a pointer output-arg to a function is fine.
Also, we should have a swap() or Swap() function in the Fsa type, which would call the vectors' swap functions.
Just an FYI, I moved master to old_master and merged the cuda_draft branch into master.
We will now work on master.
Can someone please check that the makefile setup avoids doing extra work whenever possible?
Scenarios like:
rerun cmake ..: should not re-download stuff or recompile stuff it has already compiled, if possible?
change one small piece of source: should not recompile everything.
change one test program: is it possible to only recompile that one program?
OS version is Linux 3.10.0-957.el7.x86_64
nvcc is 10.2
gcc is 6.3.1
Error is here:
[ 94%] Building CUDA object k2/csrc/cuda/CMakeFiles/ops_test.dir/ops_test.cu.o
cd /search/odin/wangjiawen/k2/build/k2/csrc/cuda && /usr/local/cuda-10.2/bin/nvcc -forward-unknown-to-host-compiler -DGOOGLE_GLOG_DLL_DECL="" -DGOOGLE_GLOG_DLL_DECL_FOR_UNITTESTS="" -I/search/odin/wangjiawen/k2 -I/search/odin/wangjiawen/k2/build/_deps/cub-src -I/search/odin/wangjiawen/k2/build/_deps/glog_glog-build -I/search/odin/wangjiawen/k2/build/_deps/glog_glog-src/src -I/search/odin/wangjiawen/k2/build/_deps/googletest-src/googlemock/include -isystem=/search/odin/wangjiawen/k2/build/_deps/googletest-src/googletest/include -isystem=/search/odin/wangjiawen/k2/build/_deps/googletest-src/googletest --expt-extended-lambda -gencode arch=compute_30,code=sm_30 --expt-extended-lambda -gencode arch=compute_32,code=sm_32 --expt-extended-lambda -gencode arch=compute_35,code=sm_35 --expt-extended-lambda -gencode arch=compute_50,code=sm_50 --expt-extended-lambda -gencode arch=compute_52,code=sm_52 --expt-extended-lambda -gencode arch=compute_53,code=sm_53 --expt-extended-lambda -gencode arch=compute_60,code=sm_60 --expt-extended-lambda -gencode arch=compute_61,code=sm_61 --expt-extended-lambda -gencode arch=compute_62,code=sm_62 --expt-extended-lambda -gencode arch=compute_70,code=sm_70 --expt-extended-lambda -gencode arch=compute_72,code=sm_72 -g -std=c++14 -x cu -c /search/odin/wangjiawen/k2/k2/csrc/cuda/ops_test.cu -o CMakeFiles/ops_test.dir/ops_test.cu.o
/search/odin/wangjiawen/k2/k2/csrc/cuda/ops.h(79): error: class "std::shared_ptr<k2::Context>" has no member "IsCompatible"
detected during instantiation of "void k2::GpuTransposeTest<T>(int32_t, int32_t, int32_t, __nv_bool) [with T=int32_t]"
/search/odin/wangjiawen/k2/k2/csrc/cuda/ops_test.cu(80): here/search/odin/wangjiawen/k2/k2/csrc/cuda/ops.h(80): error: class "std::shared_ptr<k2::Context>" has no member "IsCompatible"
detected during instantiation of "void k2::GpuTransposeTest<T>(int32_t, int32_t, int32_t, __nv_bool) [with T=int32_t]"
/search/odin/wangjiawen/k2/k2/csrc/cuda/ops_test.cu(80): here
which is caused by:
template <typename T>
void Transpose(ContextPtr &c, const Array2<T> &src, Array2<T> *dest) {
  assert(c.IsCompatible(src.Context()));
  assert(c.IsCompatible(dest->Context()));
  // ...
}
The c is obtained via std::make_shared. As far as I know, this syntax is supported by C++11 and above, and I passed -std=c++14 to nvcc, so it's weird...
I find that there are two projects, both from Facebook, that have some overlap with k2:
Ragged<T>
Perhaps we can spend some time to find whether we can learn something from them.
I built the project, but couldn't find the module _k2. Or is that an extension library?
Guys,
Here's something that it would be great to be able to finish within a couple of weeks. I'd like to be able to do the
following:
This will require writing quite a bit of code, but I think it's mostly straightforward. I have not had enough energy to be extremely pro-active with some of this stuff. If there's anything you guys need clarification on, let me know.
-- Found PythonInterp: /home/linuxbrew/.linuxbrew/bin/python3.8 (found version "3.8.5")
-- Found PythonLibs: /home/linuxbrew/.linuxbrew/opt/[email protected]/lib/libpython3.8.so
-- pybind11 v2.5.0
Traceback (most recent call last):
File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'torch'
CMake Error at cmake/torch.cmake:10 (find_package):
By not providing "FindTorch.cmake" in CMAKE_MODULE_PATH this project has
asked CMake to find a package configuration file provided by "Torch", but
CMake did not find one.
Could not find a package configuration file provided by "Torch" with any of
the following names:
TorchConfig.cmake
torch-config.cmake
Add the installation prefix of "Torch" to CMAKE_PREFIX_PATH or set
"Torch_DIR" to a directory containing one of the above files. If "Torch"
provides a separate development package or SDK, be sure it has been
installed.
Call Stack (most recent call first):
CMakeLists.txt:100 (include)
-- Configuring incomplete, errors occurred!
See also "/home/storage06/dpovey/k2/build/CMakeFiles/CMakeOutput.log".
See also "/home/storage06/dpovey/k2/build/CMakeFiles/CMakeError.log".
With the introduction of k2::Context, the current mechanism to communicate with PyTorch via DLPack is not sufficient, since we need to allocate/deallocate memory on different devices, while DLPack can only pass pre-allocated memory around.
I would like to update the build system to link against PyTorch with the following goals in mind:
(1) The build system should be simple. The PyTorch dependency will be installed with pip install torch, so that C++ shares the same PyTorch version with Python.
(2) Replace the current k2::CudaContext with the one from PyTorch, which is much faster because of memory caching.
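One possible way to let CMake find the pip-installed PyTorch (an assumption about the approach, not necessarily what k2 settled on) is to ask torch itself for its CMake prefix path:
import torch.utils

# prints the directory containing TorchConfig.cmake for the pip-installed torch
print(torch.utils.cmake_prefix_path)
The printed path can then be passed to CMake via -DCMAKE_PREFIX_PATH so that find_package(Torch) can succeed.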
I'm thinking it may be easiest to have a single class that supports both (C++) Fsa and FsaVec, which of course are actually the same type; we can call it k2.Fsa. The .shape attribute will tell us how many axes there are, e.g. f.shape = (256, None, None) for a vector of 256 Fsas, or f.shape = (13, None) for a single Fsa. Note, this .shape doesn't exist at the C++ level but can be obtained from .NumAxes() and .Dim0(). (Note: below, assume a variable f is of type k2.Fsa.)
f.arcs will be a RaggedArc, i.e. a pybind11-wrapped version of Ragged, and its members will be accessible as usual, e.g. f.arcs.row_ids(1), f.arcs.values and so on, although, we can declare that users should not use that interface to mutate anything.
Generally speaking, the only way to construct a k2.Fsa will be through one of its constructors or by an FSA operation such as composition, top-sorting, etc. (Those operations will likely be available as k2.fsa.Intersect(), and so on, which will accept k2.Fsa objects and invoke the _k2.xxxx functions exposed by pybind11.) Once constructed, the object's structural elements likely won't be mutable, except that we may support, say, swapping the input and output labels or setting the weights (for which, see below) as long as the structure is unchanged. Such operations would probably be enabled through special interfaces, not by having the users simply write to fields.
The Fsa will have arbitrary attributes which are PyTorch tensors and whose first dimension equals the total number of arcs (i.e. f.arcs.numel()). These will be stored in a dict and accessed via __getattr__ and __setattr__.
__setattr__ will just check that the first dimension equals arcs.numel() and set the attribute in the dict. However, if it is the .scores attribute it will also overwrite the weights in the FSA with it, i.e. the .score field of the arcs, in addition to setting the class member.
__getattr__ will just return the attribute. It will also support getting certain "special" attributes, particularly the .labels (or .symbols?).
Fsa operations will propagate the attributes as follows.
Unary operations where each output-arc corresponds to one input-arc, such as top-sort:
Just do output_attr = input_attr[arc_map]
Unary operations where each output-arc corresponds to zero or more input-arcs, such as
determinization:
Binary operations where each output-arc corresponds to either inputa_arc or inputb_arc or the pair (inputa_arc, inputb_arc); I'm thinking about composition; the first two cases relate to epsilons.
Unary operations where the output is a scalar, such as computing the total score:
Note: we will always create an attribute called '.scores' when we construct an FSA, e.g. from a tensor. This .scores attribute will be parallel to the .score field of the arcs, the idea is that they will always be the same (but separately stored). If the user overwrites the .scores field, we will also automatically overwrite the .score elements of the arcs. The reason for doing this is for backprop purposes: we won't have to worry about how PyTorch treats backprop when we're reinterpreting int32's as floats. We'll be using PyTorch indexing to propagate the .scores fields when we do operations, so backprop will automagically happen without our having to do anything special.
I changed my mind about putting extra fields in a separate sub-object called per_arc. We can just set them in the k2.Fsa object directly, it's easier to code.
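To make the attribute-dict idea concrete, here is a minimal sketch; the RaggedArc methods numel() and set_scores_() are assumed names, not the settled API:
import torch

class Fsa(object):
    def __init__(self, arcs):
        # arcs stands in for the pybind11-wrapped RaggedArc
        object.__setattr__(self, 'arcs', arcs)
        object.__setattr__(self, '_attrs', {})

    def __setattr__(self, name, value):
        assert isinstance(value, torch.Tensor)
        assert value.shape[0] == self.arcs.numel()   # first dim == number of arcs
        self._attrs[name] = value
        if name == 'scores':
            # keep the .score field of the arcs in sync with the tensor attribute
            self.arcs.set_scores_(value)

    def __getattr__(self, name):
        # only called when normal attribute lookup fails
        try:
            return object.__getattribute__(self, '_attrs')[name]
        except KeyError:
            raise AttributeError(name)
Propagation after an operation then amounts to setting output.attr = input.attr[arc_map] for each tensor attribute, which lets PyTorch autograd handle .scores without any special treatment.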
Hi guys,
I tried to update gcc on the Xiaomi cluster from 4.8.5 to 7.3.0 with conda (note: 4.8.5 is the default version installed by the administrators and it works well with nvcc). However, after the update, when I run
nvcc -o test test.cu
I get the error
$my-local-conda-env-path/x86_64-conda_cos6-linux-gnu/bin/ld: cannot find -lcudadevrt
Then I run
nvcc -L/usr/local/cuda-10.0/lib64 test test.cu
It succeeds with no error (note: libcudadevrt is in the folder /usr/local/cuda-10.0/lib64).
However, LD_LIBRARY_PATH on the cluster already includes the path /usr/local/cuda-10.0/lib64, so it's strange to me that nvcc -o test test.cu fails. Do you have any idea about this?
It would be good to provide a way to allocate memory that just goes to cudaMalloc, for purposes of running cuda-memcheck. Then we would have an easy way to automatically find out-of-bounds memory accesses. Not urgent.
Guys,
Not super important, but is it possible to make it so that k2 tensors can survive the round trip to PyTorch tensors and back?
I mean, so that they would have a pointer to the original Region, rather than an extra layer of wrapping each time we go back and forth?
Can someone please write a function to transpose a RaggedTensor, in ragged.h?
Will have similar interface to Transpose() for RaggedShape, but also take care of the tensor elements.
Will have to have a templated implementation, e.g. in ops_inl.h. Should be doable with a single kernel;
can call the Transpose() for RaggedShape to do the shape part.
Guys,
I spoke with some guys at NVidia to get advice on how we'd implement an interface like
template <typename T, typename Op>
void SortSublists(Ragged<T> &src, Array1<int32_t> *order);
They advised this
https://moderngpu.github.io/segsort.html
It will require adding another dependency (moderngpu), but I think it's header-only. Apparently it is likely to be added to thrust at some point (i.e. that sort thing). Sorry, I don't have much energy right now, so rather than doing it myself I am putting the info here.
Eventually we'll want to make it customizable with a sorting function-object; actually we should make it
template <typename T, typename Op = LessThan<T>>
void SortSublists(Ragged<T> &src, Array1<int32_t> *order);
with
template <typename T>
struct LessThan {
  __host__ __device__ __forceinline__ bool operator()(const T &a, const T &b) const { return a < b; }
};
(plus a device copy constructor as needed)
Guys, the error-checking stuff is still in quite a confused state.
There is a file error.h that probably shouldn't be there, and glog is included by both error.h and context.h.
My proposal is to limit ourselves to the macros in debug.h and not use glog, because it's too complex and
IMO not really necessary. Any feedback on this?