k2-fsa / k2
FSA/FST algorithms, differentiable, with PyTorch compatibility.
Home Page: https://k2-fsa.github.io/k2
License: Apache License 2.0
Can anyone help to design a logo for k2?
Guys, I am creating this issue just as a way to show this pseudocode; it demonstrates parts of where we are going with k2. One feature is that the fsa objects will have 'per_arc' fields which contain arbitrary tensors whose first dimension is the values.Dim() of the arcs. (They are accessed as if they were class members but are really members of a dict.) When we do operations on FSAs, these per_arc quantities are propagated. (This is easy given the arc_map objects.)
The Python-level fsa object is going to be a more complicated object than the C++ one. You can think of it as containing the C++ object as just one member (maybe we could make the C++ object a member called arcs or something).
loglikes = eval_nnet(minibatch.data)
dense_fsas = k2fsa.dense_fsas(loglikes)
lattices = k2fsa.pruned_compose(dense_fsas, decoding_graph)
if train:
    oracle = k2fsa.pruned_compose(dense_fsas, minibatch.supervision_graphs)
    # `oracle` inherited per_arc.word_labels from `supervision_graphs`
    oracle.per_arc.frames = dense_fsas.get_idx01(oracle.src_arcs_a())
    oracle = k2fsa.best_path(oracle)
    ref_phone_labels = torch.zeros([dense_fsas.shape.TotSize(1)], dtype=torch.long)
    ref_word_labels = ref_phone_labels.clone()
    ref_phone_labels[oracle.per_arc.frames] = oracle.get_labels()
    # note: word_labels were propagated from `supervision_graphs.per_arc.word_labels`
    ref_word_labels[oracle.per_arc.frames] = oracle.per_arc.word_labels
arc_post = k2fsa.arc_post(lattices)
# per_arc.frames gives unique identifiers for frames of input, different for different utterances.
lattices.per_arc.frames = dense_fsas.get_idx01(lattices.src_arcs_a())
num_arcs = lattices.arcs.shape[0]
phones_one_hot = torch.zeros(num_arcs, num_phones)
phones_one_hot[range(num_arcs), lattices.get_labels()] = 1.0
# initial features are (acoustic, LM, posterior, LLR vs. best path, one-hot phone labels)
lattices.per_arc.feats = torch.cat([torch.stack([lattices.inputs.scores_a(),
                                                 lattices.inputs.scores_b(),
                                                 log(arc_post),
                                                 k2fsa.llr(lattices)], dim=1),
                                    phones_one_hot], dim=1)
# augment with the features above but averaged on each frame, over all paths.
lattices.per_arc.feats = torch.cat([lattices.per_arc.feats,
                                    pool_and_redistribute(lattices.per_arc.feats,
                                                          weights=arc_post,
                                                          buckets=lattices.per_arc.frames)],
                                   dim=1)
if train:
    lattices.per_arc.phone_correct = (lattices.get_labels() == ref_phone_labels[lattices.per_arc.frames])
    # note: word_labels were propagated from `decoding_graph.per_arc.word_labels`
    lattices.per_arc.word_correct = (lattices.per_arc.word_labels == ref_word_labels[lattices.per_arc.frames])
# Convert the lattices into n-best lists
# such that each arc appears in at least one linear sequence.
nbest = k2fsa.covering_nbest(lattices)
(feats, frames, phone_correct, word_correct) = \
k2.ragged_to_tensor(nbest.arcs.shape, nbest.per_arc.feats,
nbest.per_arc.frames, nbest.per_arc.phone_correct,
nbest.per_arc.word_correct)
loglikes = dense_fsas.loglikes[frames]
(word_confidence, phone_confidence) = confidence_model(loglikes, scores)
if train:
    weights = 1.0 / (1.0 + nbest.num_paths[nbest.per_fsa.src_indexes])
    objf += confidence_objf(word_confidence, word_correct, weights) + \
            confidence_objf(phone_confidence, phone_correct, weights)
(nbest.per_arc.word_confidence,
 nbest.per_arc.phone_confidence) = k2.ragged_to_tensor_inv(nbest.arcs.shape,
                                                           word_confidence, phone_confidence)
recombined = k2fsa.union(nbest, row_ids=nbest.per_fsa.src_indexes)
# Use phone and word confidences as the scores. Note, these include
# confidences for epsilons so we won't just delete everything.
recombined.arcs.scores[:] = (recombined.per_arc.phone_confidence +
3 * recombined.per_arc.word_confidence)
result = k2fsa.best_path(recombined)
# can access result.per_arc.{phone,word}_confidence and the like...
I found that the first line of each source file is its relative path. Is it there for any purpose?
Now I want to change it to be Doxygen-friendly and change the directory organization, and keeping the RELATIVE path seems tedious and non-Doxygen style to me.
I suggest a new one below. It may be a trivial thing, but I wanted to ask you all first.
/**
* @file context.cu
* @brief
* Implement ...
*
* @copyright
* Copyright (c) 2020 Name (email)
*
* @copyright
* See LICENSE for clarification regarding multiple authors
*/
The following code
Lines 78 to 82 in daf77cf
Line 875 in cc752e5
I've found two problems in the code:
(1) (Lines 888 to 893 in cc752e5) tot_sizes_out does not include the last axis, e.g., the number of arcs in an fsa (offsets have num_axes + 1 rows).
(2) (Lines 895 to 897 in cc752e5; Lines 813 to 817 in cc752e5) It is invalid to call ans.Populate(), since it does not contain valid data; only the shapes are allocated.
Guys,
This is an issue for the python interface...
there will be lots of times when someone wants to access a member of, say, a class, that is of type Array1.
In these instances we will want to make the Array1 appear as a PyTorch tensor.
I mean, I suppose we could do this when we wrap Array1 itself.
I believe this should be possible, assuming we are using a PyTorch context (and maybe even if not).
The same issue arises with Array2 and Tensor, but less often.
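As a rough illustration of what that could look like from Python, here is a minimal sketch assuming the wrapped Array1 can export a DLPack capsule via a hypothetical to_dlpack() method (the method name is an assumption, not the actual k2 API):
import torch
from torch.utils.dlpack import from_dlpack

def array1_as_tensor(arr) -> torch.Tensor:
    # zero-copy view: the returned tensor shares memory with the Array1
    return from_dlpack(arr.to_dlpack())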
OK, I have a proposal for how we can start moving things over to the new framework.
We can't do this by incremental changes to existing code; I think it will be easier to start from the code that's currently in the 'cuda/' subdirectory, move those files up one level, and move the existing files down one level into 'old/'. We can do all this in a separate branch, say, a branch called 'cuda'.
Then the initial task is to figure out how to compile it with nvcc, with a dependency on cub (cub is a header library). (Note: we'd plan to put all the source files through nvcc).
Of course it won't compile initially, but that's OK; at least being at the stage where we have compilation errors would be an improvement.
Note: the eventual goal would be to take all the old implementations and have them be the "CPU implementation" (i.e. the one we use by default on CPU) for those algorithms. Most algorithms won't have a GPU version, initially. But of course we'd have to change things to use the types defined in cuda/, e.g. Array1, Context, Region, and so on. I'll work on completing more of the code so it's more obvious how it's supposed to work.
Before designing the FSA object, we need to decide on a design for k2.Ragged, which will be the Python interface
for the RaggedTensor. I will be using this issue to write down some notes on that.
FYI, added as a TODO, just to make debugging easy by checking the error immediately after the kernel call.
@csukuangfj what is the recommended way to tell CMake how to pick up a virtual Python environment?
We are having a problem with a new environment at Xiaomi.
Can we please have Clone() functions for Array1 and Array2, that will give a new array on the same device?
BTW there is something I'm not very satisfied about the current design, that some functions are members but others are not, even though they might most naturally be members. I'm thinking for now we could put them as non-member functions like ToContiguous(), in array_ops.h.
Also my intention was that any functions involving Ragged would go in ragged_ops.h, not array_ops.h. Just FYI: not urgent or important.
The code compiles fine on my MacBook (Darwin - 17.7.0 - x86_64), but it fails on Linux (Linux - 4.9.0-11-amd64 - x86_64), with the following error:
$ cmake ..
CMake Error at cmake/googletest.cmake:40 (target_include_directories):
Cannot specify include directories for imported target "gtest".
Call Stack (most recent call first):
cmake/googletest.cmake:50 (download_googltest)
CMakeLists.txt:29 (include)
I just found we can implement RowSplitsToRowIds with Load-Balancing Search or IntervalExpand in moderngpu.
Not sure which approach would be faster, but I think it may be worth doing this and comparing the performance with the current implementation (or stealing some ideas from their implementations). No great hurry though (we may do this after the first release).
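For comparison purposes, here is a plain PyTorch reference of what RowSplitsToRowIds computes (a CPU sketch, not the k2 or moderngpu implementation):
import torch

def row_splits_to_row_ids(row_splits: torch.Tensor) -> torch.Tensor:
    # row_splits has num_rows + 1 entries, e.g. [0, 2, 3, 3, 6];
    # the answer has row_splits[-1] entries, e.g. [0, 0, 1, 3, 3, 3].
    sizes = row_splits[1:] - row_splits[:-1]
    rows = torch.arange(sizes.numel(), dtype=row_splits.dtype,
                        device=row_splits.device)
    return torch.repeat_interleave(rows, sizes)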
The negate_scores option when openfst is true.
I notice we import graphviz
in python/k2/fsa.py, which means the k2 python package now requires graphviz as a dependency. Is that necessary? Can we just put the print-to-dot code in a separate file and import graphviz there? Then users could import k2 without graphviz and only install graphviz when they really want to print an FSA (I guess most users will not do this?).
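A minimal sketch of the proposed split, assuming a separate module; the file name, function name, and Fsa fields below are assumptions, not existing k2 code:
# k2/python/k2/dot.py -- only this module imports graphviz,
# so "import k2" itself no longer needs it.
def to_dot(fsa):
    try:
        from graphviz import Digraph
    except ImportError as e:
        raise ImportError('graphviz is only needed for drawing FSAs; '
                          'install it with "pip install graphviz"') from e
    dot = Digraph(name='fsa')
    for arc in fsa.arcs:  # assumes an iterable of arcs with these fields (hypothetical)
        dot.edge(str(arc.src_state), str(arc.dest_state), label=str(arc.label))
    return dot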
24: File "/home/storage23/qiuhaowen/k2/k2/python/tests/arc_test.py", line 13, in <module>
24: import k2
24: File "/home/storage23/qiuhaowen/k2/k2/python/k2/__init__.py", line 2, in <module>
24: from .fsa import Fsa
24: File "/home/storage23/qiuhaowen/k2/k2/python/k2/fsa.py", line 16, in <module>
24: from graphviz import Digraph
24: ModuleNotFoundError: No module named 'graphviz'
1/1 Test #24: arc_test_py ......................***Failed 4.64 sec
What I did with Connect() was intended as a kind of template for how other algorithms can be wrapped.
I suggest that the next one to be wrapped could be Intersect(), if someone has time (no hurry!).
I would like to implement the following two functions:
Since the above functions do not affect training, I am going to implement them in Python.
Guys,
I don't know where we are with the I/O format right now, but I propose that we find a way to enable I/O between FSAs and strings.
We could have the following options, with defaults:
acceptor=True # if true, format is: src_state dest_state label cost. if false it's src_state dest_state label aux_label cost; the aux_label would be stored as a separate vector of int32.
negate_scores=False # if true, the string form has the weights as costs, not scores, so we negate as we read/write
The final-state is of course treated differently. The format will be like OpenFST, as just:
final_state
(with no cost, since we don't support a cost on the final state; there should be an arc with label -1 going to the final state, which carries the cost).
Eventually we'd have options related to symbol tables, i.e. allow the user to supply a symbol table.
This interface should probably exist at the C++ level in some form, since the aim is to basically have the same functionality
available from C++ as from Python.
We will avoid sorting. The bottom line: the k2 reading function should be simple and fast, while the OpenFST reading function may have to address corner cases, but we will only address those when we see them as necessary.
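To make the proposed text format concrete, here is a small Python reading sketch (names and behaviour are illustrative only, not the final C++/Python API):
def fsa_from_str(s: str, acceptor: bool = True, negate_scores: bool = False):
    # Acceptor lines:   src_state dest_state label cost
    # Transducer lines: src_state dest_state label aux_label cost
    # A line with a single number marks the final state.
    arcs, aux_labels, final_state = [], [], None
    for line in s.strip().splitlines():
        fields = line.split()
        if len(fields) == 1:
            final_state = int(fields[0])
            continue
        src, dest, label = int(fields[0]), int(fields[1]), int(fields[2])
        if acceptor:
            score = float(fields[3])
        else:
            aux_labels.append(int(fields[3]))
            score = float(fields[4])
        if negate_scores:  # the text stores costs, so negate to get scores
            score = -score
        arcs.append((src, dest, label, score))
    return arcs, aux_labels, final_state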
For the following fsa
Lines 26 to 38 in d291daf
It prints
"Valid|Nonempty|TopSorted|TopSortedAndAcyclic|ArcSortedAndDeterministic|EpsilonFree|MaybeAccessible|Serializable"
ArcSortedAndDeterministic should NOT be there.
At the Kaldi Community Roadmap meeting, Sanjeev suggested adding this as an issue so it is not forgotten.
''Answered orally: we need to think through this. Make this an “issue” on GitHub so we don’t forget?''
The question came from the context of Lhotse:
Does the ability to deal with long recordings for training carry over to decoding and (force) alignments?
In particular recordings that are >=1h.
Best,
-Thomas
Guys,
For assertions I'm just using assert(...), but I'm not sure if this is good practice.
I think we should decide how to do assertions and so on. It might make sense to have different levels of assert
for things that are inside loops vs. things that won't slow the code down. And maybe errors and warnings too?
Any suggestions?
It would be nice to have a macro for compile-time assertions. Meixu, maybe you could dig out that code from what you did and make a PR?
Need this (analogous to MaxPerSublist).
This and similar should be in ragged_ops.h, not array_ops.h.
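The excerpt doesn't show which reduction is being requested; for context, here is a CPU reference sketch of the per-sublist pattern, illustrated with the MaxPerSublist mentioned above (signature assumed, not the k2 API):
import torch

def max_per_sublist(values: torch.Tensor, row_splits: torch.Tensor,
                    initial_value: float) -> torch.Tensor:
    # For each sublist [row_splits[i], row_splits[i+1]) return the max of
    # initial_value and the sublist's elements; empty sublists get initial_value.
    out = []
    for i in range(row_splits.numel() - 1):
        begin, end = row_splits[i].item(), row_splits[i + 1].item()
        m = initial_value
        if end > begin:
            m = max(m, values[begin:end].max().item())
        out.append(m)
    return torch.tensor(out)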
Guys,
I have been naming files .h and .cc even though they do contain a little CUDA code.
This is partly because my emacs setup doesn't auto-format CUDA right, which I know is a bit lazy.
I notice Meixu in his new PR is naming the test files .cu.
I think we should be consistent here. Do you guys think it was a mistake to start naming
files containing CUDA .h and .cc? (they are mostly C++ though). And do you think we should make
the test files .cc as well? How does this interact with CMake?
As we will declare many classes (Fsa, AuxLabels, Cfsa, etc.) with Array2, I am wondering if there is any convenient way to create aliases of indexes and data so that we can use those aliases in functions to make the code clear (otherwise we may create those aliases in functions again and again).
For example (pseudo code):
using Fsa = Array2<Arc*, int32_t>
alias Fsa::arc_indexes = Fsa::indexes;
alias Fsa::arcs = Fsa::data;
One possible approach I thought of is wrapping Array2 in a struct:
struct Fsa {
  Array2<Arc *, int32_t> m;
  Arc *&arcs = m.data;
  int32_t *&arc_indexes = m.indexes;
};
(Two extra advantages of using a struct here are that it gives some type safety compared with a bare Array2, for example one cannot pass an Fsa to a function that accepts a Cfsa where Fsa = Cfsa = Array2<Arc*, int32_t>, and that we can define methods such as NumStates(), FinalStates(), NumArcs(). Otherwise it seems we could only define those methods in Python code (and could not do this in C++)?)
It would be nice to have utilities for randomized testing.
I'm thinking of a mechanism of generating random FSAs.
We have something in Kaldi's fstext/ directory. There could be an option for when it needs to be acyclic.
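As a starting point, a toy generator along these lines might look as follows (purely illustrative; not based on the Kaldi fstext/ code):
import random

def random_fsa_arcs(max_states: int = 10, max_arcs: int = 30,
                    max_label: int = 20, acyclic: bool = False):
    # Returns a list of (src, dest, label, score) arcs plus the final state;
    # arcs entering the final state get label -1, as in k2.
    num_states = random.randint(2, max_states)
    final_state = num_states - 1
    arcs = []
    for _ in range(random.randint(1, max_arcs)):
        src = random.randint(0, num_states - 2)
        # when acyclic, only allow arcs to strictly higher-numbered states
        dest = random.randint(src + 1, final_state) if acyclic \
            else random.randint(0, final_state)
        label = -1 if dest == final_state else random.randint(1, max_label)
        arcs.append((src, dest, label, round(random.uniform(-5.0, 0.0), 2)))
    return arcs, final_state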
It may be early to consider this, but I have a thought, and maybe others have thoughts too.
No hurry to make decision, just post candidates with pros and cons.
I'm documenting the things that we have to improve for the build process. We can have further discussion here, and make changes for the points that we think are reasonable:
Can someone please write a template that works for an arbitrary list of objects of possibly different types,
and does this:
ContextPtr GetContext(S &s, T &t, U &u, ..) {
  ContextPtr ans1 = s.Context(), ans2 = GetContext(t, u, ..);
  assert(*ans1 == *ans2 && "Contexts mismatch");
  return ans1;
}
You can make a PR to my cuda_draft branch, e.g. in context.h, although of course that branch doesn't compile.
Guys,
This issue is for discussion of the wrapping of k2.Array1<X> and k2.Array2<X>. I know we already have some code.
Specifically, what I wonder is what we'll be doing about k2.Array1<Arc>, as this is part of FSAs. Is there any way to make this so it appears to be a PyTorch array of int32_t, of N by 4? I don't even know if that's the best way. If it's possible to make PyTorch treat it as an opaque type of 16 bytes, that's fine with me too.
Dan
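For illustration, the N-by-4 int32 view could behave roughly like this; the Arc field layout and the bit-reinterpretation of the score column are assumptions here, not the settled design:
import torch

# each row is one Arc: (src_state, dest_state, label, score-bits)
arcs = torch.tensor([[0, 1, 5, 0],
                     [1, 2, -1, 0]], dtype=torch.int32)       # toy data
scores = arcs[:, 3].contiguous().view(torch.float32)          # reinterpret the bits as float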
See GetTransposeReordering(); it's here: #233 (already merged, but only declared).
Guys (and especially @csukuangfj) just to let you know that we are working under a certain amount of time pressure. We have Nov 1st as a hard deadline to release something, which should at least include an example with CTC and maybe one with LF-MMI. Ideally the key parts of k2 itself will be mostly finished by mid-October or so, which will give us time to test and tune recipes and integration with Lhotse.
Any progress on the issues will be appreciated.
We need a way to display FSA properties as text, mostly for debug/diagnostics. E.g.
// e.g. FsaPropertiesAsString(3) = "kFsaPropertiesValid|kFsaPropertiesNonempty"
std::string FsaPropertiesAsString(int32_t properties);
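A toy Python sketch of the same idea (the real function is C++, and the bit values below are illustrative only):
_PROPERTY_NAMES = {
    0x01: 'kFsaPropertiesValid',
    0x02: 'kFsaPropertiesNonempty',
}

def fsa_properties_as_string(properties: int) -> str:
    # join the names of all property bits that are set
    return '|'.join(name for bit, name in _PROPERTY_NAMES.items()
                    if properties & bit)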
I realized that it will be convenient for certain algorithms if we can construct an FSA from vector of arcs that isn't sorted. (So that version of the constructor would have to sort them, not necessarily using std::sort). In addition, if we allow that constructor to accept input where the final-state is not numbered last, and have it topologically sort the input itself, it would be helpful; then algorithms like composition and determinization won't have to worry about renumbering states; and they can output big lists of arcs all at once.
If it seems unnatural to implement this as a constructor, having it as a pointer output-arg to a function is fine.
Also, we should have a swap() or Swap() function in the Fsa type, which would call the vectors' swap functions.
Just an FYI, I moved master to old_master and merged the cuda_draft branch into master.
We will now work on master.
Can someone please check that the makefile setup avoids doing extra work whenever possible?
Scenarios like:
rerun cmake ..: should not re-download stuff or recompile stuff it has already compiled, if possible?
change one small piece of source: should not recompile everything.
change one test program: is it possible to only recompile that one program?
OS version is Linux 3.10.0-957.el7.x86_64
nvcc is 10.2
gcc is 6.3.1
Error is here:
[ 94%] Building CUDA object k2/csrc/cuda/CMakeFiles/ops_test.dir/ops_test.cu.o
cd /search/odin/wangjiawen/k2/build/k2/csrc/cuda && /usr/local/cuda-10.2/bin/nvcc -forward-unknown-to-host-compiler -DGOOGLE_GLOG_DLL_DECL="" -DGOOGLE_GLOG_DLL_DECL_FOR_UNITTESTS="" -I/search/odin/wangjiawen/k2 -I/search/odin/wangjiawen/k2/build/_deps/cub-src -I/search/odin/wangjiawen/k2/build/_deps/glog_glog-build -I/search/odin/wangjiawen/k2/build/_deps/glog_glog-src/src -I/search/odin/wangjiawen/k2/build/_deps/googletest-src/googlemock/include -isystem=/search/odin/wangjiawen/k2/build/_deps/googletest-src/googletest/include -isystem=/search/odin/wangjiawen/k2/build/_deps/googletest-src/googletest --expt-extended-lambda -gencode arch=compute_30,code=sm_30 --expt-extended-lambda -gencode arch=compute_32,code=sm_32 --expt-extended-lambda -gencode arch=compute_35,code=sm_35 --expt-extended-lambda -gencode arch=compute_50,code=sm_50 --expt-extended-lambda -gencode arch=compute_52,code=sm_52 --expt-extended-lambda -gencode arch=compute_53,code=sm_53 --expt-extended-lambda -gencode arch=compute_60,code=sm_60 --expt-extended-lambda -gencode arch=compute_61,code=sm_61 --expt-extended-lambda -gencode arch=compute_62,code=sm_62 --expt-extended-lambda -gencode arch=compute_70,code=sm_70 --expt-extended-lambda -gencode arch=compute_72,code=sm_72 -g -std=c++14 -x cu -c /search/odin/wangjiawen/k2/k2/csrc/cuda/ops_test.cu -o CMakeFiles/ops_test.dir/ops_test.cu.o
/search/odin/wangjiawen/k2/k2/csrc/cuda/ops.h(79): error: class "std::shared_ptr<k2::Context>" has no member "IsCompatible"
detected during instantiation of "void k2::GpuTransposeTest<T>(int32_t, int32_t, int32_t, __nv_bool) [with T=int32_t]"
/search/odin/wangjiawen/k2/k2/csrc/cuda/ops_test.cu(80): here/search/odin/wangjiawen/k2/k2/csrc/cuda/ops.h(80): error: class "std::shared_ptr<k2::Context>" has no member "IsCompatible"
detected during instantiation of "void k2::GpuTransposeTest<T>(int32_t, int32_t, int32_t, __nv_bool) [with T=int32_t]"
/search/odin/wangjiawen/k2/k2/csrc/cuda/ops_test.cu(80): here
which is caused by:
template <typename T>
void Transpose(ContextPtr &c, const Array2<T> &src, Array2<T> *dest) {
  assert(c.IsCompatible(src.Context()));
  assert(c.IsCompatible(dest->Context()));
  // ...
}
The c is obtained via std::make_shared. As far as I know, this syntax is supported by C++11 and above, and I passed -std=c++14 to nvcc, so it's weird...
I find that there are two projects, both from Facebook, that have some overlap with k2:
Ragged<T>
Perhaps we can spend some time to find whether we can learn something from them.
I built the project, but couldn't find the module _k2. Or is that an extension library?
Guys,
Here's something that it would be great to be able to finish within a couple of weeks. I'd like to be able to do the
following:
This will require writing quite a bit of code, but I think it's mostly straightforward. I have not had enough energy to be extremely pro-active with some of this stuff. If there's anything you guys need clarification on, let me know.
-- Found PythonInterp: /home/linuxbrew/.linuxbrew/bin/python3.8 (found version "3.8.5")
-- Found PythonLibs: /home/linuxbrew/.linuxbrew/opt/[email protected]/lib/libpython3.8.so
-- pybind11 v2.5.0
Traceback (most recent call last):
File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'torch'
CMake Error at cmake/torch.cmake:10 (find_package):
By not providing "FindTorch.cmake" in CMAKE_MODULE_PATH this project has
asked CMake to find a package configuration file provided by "Torch", but
CMake did not find one.
Could not find a package configuration file provided by "Torch" with any of
the following names:
TorchConfig.cmake
torch-config.cmake
Add the installation prefix of "Torch" to CMAKE_PREFIX_PATH or set
"Torch_DIR" to a directory containing one of the above files. If "Torch"
provides a separate development package or SDK, be sure it has been
installed.
Call Stack (most recent call first):
CMakeLists.txt:100 (include)
-- Configuring incomplete, errors occurred!
See also "/home/storage06/dpovey/k2/build/CMakeFiles/CMakeOutput.log".
See also "/home/storage06/dpovey/k2/build/CMakeFiles/CMakeError.log".
With the introduction of k2::Context, the current mechanism to communicate with PyTorch via DLPack is not sufficient, since we need to allocate/deallocate memory on different devices, while DLPack can only pass pre-allocated memory around.
I would like to update the build system to link against PyTorch with the following goals in mind:
(1) The build system should be simple. The PyTorch dependency will be installed with pip install torch, so that C++ shares the same PyTorch version with Python.
(2) Replace the current k2::CudaContext with the one from PyTorch, which is much faster because of memory caching.
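One possible way to let CMake find the pip-installed PyTorch (an assumption about the approach, not necessarily what k2 settled on) is to ask torch itself for its CMake prefix path:
import torch.utils

# prints the directory containing TorchConfig.cmake for the pip-installed torch
print(torch.utils.cmake_prefix_path)
The printed path can then be passed to CMake via -DCMAKE_PREFIX_PATH so that find_package(Torch) can succeed.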
I'm thinking it may be easiest to have a single class that supports both (C++) Fsa and FsaVec, which of course are actually the same type; we can call it k2.Fsa. The .shape attribute will tell us how many axes there are, e.g. f.shape = (256, None, None) for a vector of 256 Fsas, or f.shape = (13, None) for a single Fsa. Note, this .shape doesn't exist at the C++ level but can be obtained from .NumAxes() and .Dim0(). (Note: below, assume a variable f is of type k2.Fsa.)
f.arcs will be a RaggedArc, i.e. a pybind11-wrapped version of Ragged, and its members will be accessible as usual, e.g. f.arcs.row_ids(1), f.arcs.values and so on, although, we can declare that users should not use that interface to mutate anything.
Generally speaking, the only way to construct a k2.Fsa will be through one of its constructors or by an FSA operation such as composition, top-sorting, etc. (Those operations will likely be available as k2.fsa.Intersect(), and so on, which will accept k2.Fsa objects and invoke the _k2.xxxx functions exposed by pybind11.) Once constructed, the object's structural elements likely won't be mutable, except that we may support, say, swapping the input and output labels or setting the weights (for which, see below) as long as the structure is unchanged. Such operations would probably be enabled through special interfaces, not by having the users simply write to fields.
The Fsa will have arbitrary attributes which are PyTorch tensors and whose first dimension equals the total number of arcs (i.e. f.arcs.numel()). These will be stored in a dict and accessed via __getattr__ and __setattr__.
__setattr__ will just check that the first dimension equals arcs.numel() and set the attribute in the dict. However, if it is the .scores attribute it will also overwrite the weights in the FSA with it, i.e. the .score field of the arcs, in addition to setting the class member.
__getattr__ will just return the attribute. It will also support getting certain "special" attributes, particularly the .labels (or .symbols?).
Fsa operations will propagate the attributes as follows.
Unary operations where each output-arc corresponds to one input-arc, such as top-sort:
Just do output_attr = input_attr[arc_map]
Unary operations where each output-arc corresponds to zero or more input-arcs, such as
determinization:
Binary operations where each output-arc corresponds to either inputa_arc or inputb_arc or the pair (inputa_arc, inputb_arc); I'm thinking about composition; the first two cases relate to epsilons.
Unary operations where the output is a scalar, such as computing the total score:
Note: we will always create an attribute called '.scores' when we construct an FSA, e.g. from a tensor. This .scores attribute will be parallel to the .score field of the arcs, the idea is that they will always be the same (but separately stored). If the user overwrites the .scores field, we will also automatically overwrite the .score elements of the arcs. The reason for doing this is for backprop purposes: we won't have to worry about how PyTorch treats backprop when we're reinterpreting int32's as floats. We'll be using PyTorch indexing to propagate the .scores fields when we do operations, so backprop will automagically happen without our having to do anything special.
I changed my mind about putting extra fields in a separate sub-object called per_arc. We can just set them in the k2.Fsa object directly, it's easier to code.
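To make the attribute-dict idea concrete, here is a minimal sketch; the RaggedArc methods numel() and set_scores_() are assumed names, not the settled API:
import torch

class Fsa(object):
    def __init__(self, arcs):
        # arcs stands in for the pybind11-wrapped RaggedArc
        object.__setattr__(self, 'arcs', arcs)
        object.__setattr__(self, '_attrs', {})

    def __setattr__(self, name, value):
        assert isinstance(value, torch.Tensor)
        assert value.shape[0] == self.arcs.numel()   # first dim == number of arcs
        self._attrs[name] = value
        if name == 'scores':
            # keep the .score field of the arcs in sync with the tensor attribute
            self.arcs.set_scores_(value)

    def __getattr__(self, name):
        # only called when normal attribute lookup fails
        try:
            return object.__getattribute__(self, '_attrs')[name]
        except KeyError:
            raise AttributeError(name)
Propagation after an operation then amounts to setting output.attr = input.attr[arc_map] for each tensor attribute, which lets PyTorch autograd handle .scores without any special treatment.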
Hi guys,
I tried to update gcc on the Xiaomi cluster from 4.8.5 to 7.3.0 with conda (note: 4.8.5 is the default version installed by the administrators and it works well with nvcc). However, after the update, when I run
nvcc -o test test.cu
I get the error
$my-local-conda-env-path/x86_64-conda_cos6-linux-gnu/bin/ld: cannot find -lcudadevrt
Then I run
nvcc -L/usr/local/cuda-10.0/lib64 test test.cu
It succeeds with no error (note: libcudadevrt is in the folder /usr/local/cuda-10.0/lib64).
However, LD_LIBRARY_PATH on the cluster already includes the path /usr/local/cuda-10.0/lib64, so it's strange to me that nvcc -o test test.cu fails. Do you have any idea about this?
It would be good to provide a way to allocate memory that just goes to cudaMalloc, for purposes of running cuda-memcheck. Then we would have an easy way to automatically find out-of-bounds memory accesses. Not urgent.
Guys,
Not super important, but is it possible to make it so that k2 tensors can survive the round trip to PyTorch tensors and back?
I mean, so that they would have a pointer to the original Region, rather than an extra layer of wrapping each time we go back and forth?
Can someone please write a function to transpose a RaggedTensor, in ragged.h?
Will have similar interface to Transpose() for RaggedShape, but also take care of the tensor elements.
Will have to have a templated implementation, e.g. in ops_inl.h. Should be doable with a single kernel;
can call the Transpose() for RaggedShape to do the shape part.
Guys,
I spoke with some guys at NVidia to get advice on how we'd implement an interface like
template <typename T, typename Op>
void SortSublists(Ragged<T> &src, Array1<int32_t> *order);
They advised this
https://moderngpu.github.io/segsort.html
It will require adding another dependency (moderngpu), but I think it's header-only. Apparently it is likely to be added to thrust at some point (i.e. that sort thing). Sorry, I don't have much energy right now, so rather than doing it myself I am putting the info here.
Eventually we'll want to make it customizable with a sorting function-object; actually we should make it
template <typename T, typename Op = LessThan<T>>
void SortSublists(Ragged<T> &src, Array1<int32_t> *order);
with
template <typename T>
struct LessThan {
  __host__ __device__ __forceinline__ bool operator()(const T &a, const T &b) const { return a < b; }
};
(plus a device copy constructor as needed)
Guys, the error-checking stuff is still in quite a confused state.
There is a file error.h that probably shouldn't be there, and glog is included by both error.h and context.h.
My proposal is to limit ourselves to the macros in debug.h and not use glog, because it's too complex and
IMO not really necessary. Any feedback on this?