
apisim's Introduction

APISIM

apisim's People

Contributors

rahulpandita

Stargazers

SQG

apisim's Issues

Technique explanation

The technique is also not clearly explained. The Indexer extracts different types of information and the query is constructed with different components (type name/description and method name/description), but the Searcher doesn't seem to use this information in any way. So the point of separating parts of the query as shown in Fig 5 is not clear. Would it matter if all the terms were merged together?
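
To make the question concrete, here is a minimal sketch (in Python, with illustrative field names and text assumed from Fig. 5, not taken from the tool) contrasting a query that keeps the components separate with one that merges all terms into a single bag of words:

    # Hypothetical query record mirroring the components of Fig. 5
    # (field names and text are assumptions made for illustration only).
    query = {
        "type_name": "Graphics",
        "type_description": "encapsulates state information needed for rendering",
        "method_name": "drawString",
        "method_description": "draws the text given by the specified string",
    }

    # Structured variant: each component kept separate, so a searcher could
    # weight or match type terms and method terms independently.
    structured_query = {field: text.lower().split() for field, text in query.items()}

    # Flat variant: all terms merged into one bag of words, which is what the
    # Searcher appears to consume; if so, the separation buys nothing.
    flat_query = [term for terms in structured_query.values() for term in terms]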

Description of VSM

Overall, the techniques used are not rigorously described. The description of the vector space model, cosine similarity, and tf-idf is rather loose, as is the description of the preprocessing steps. In contrast, quite a bit of detail is given to simple steps such as camelCase splitting and splitting components connected with dots. A similar problem extends throughout the entire paper: the explanation of the approach and the description of the analysis are extremely informal. Why separate the description of the underlying VSM representation of the query from that of the target, given that it is a shared representation? Moreover, the VSM is defined again later in the paper, so its definition is scattered across three places.

Table 2

Table 2 doesn't say how many methods can actually be mapped (the ground truth), which is needed so that recall can be computed.
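
For context, the recall in question would be computed against that ground truth, roughly:

    \mathrm{recall} = \frac{\lvert\text{correct mappings returned}\rvert}
                           {\lvert\text{methods that can actually be mapped (ground truth)}\rvert}

so without the ground-truth count the denominator is unknown.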

Separate out the presentation of the implementation

In addition to my earlier comment about presenting the technique more rigorously, I suggest that you separate the presentation of the implementation more clearly from that of the algorithms. For example, in Section 4.3 (which comes before the section on Implementation) you are already discussing the "Query Builder Component".

Technical novelty

The approach is fairly straightforward and uses standard NLP and information-retrieval techniques such as stemming, stop-word removal, term weighting using tf-idf, and ranking results using cosine similarity. In this regard, the paper is weak on technical novelty.
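
For illustration, a minimal sketch (plain Python, with a toy stop-word list and a crude suffix-stripping stand-in for a real stemmer such as Porter's) of the standard preprocessing pipeline these techniques imply; the tokens it produces would then feed the tf-idf/cosine ranking shown earlier:

    import re

    # Toy stop-word list; a real implementation would use a standard English
    # list plus language keywords such as "public" and "void".
    STOP_WORDS = {"the", "a", "an", "of", "to", "in", "is", "and", "public", "void"}

    def split_identifier(identifier):
        # Split on dots, then split camelCase: "java.awt.Graphics.drawString"
        # -> ["java", "awt", "Graphics", "draw", "String"].
        tokens = []
        for part in identifier.split("."):
            tokens.extend(re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", part))
        return tokens

    def stem(token):
        # Crude stand-in for a real stemmer: strips a few common suffixes.
        for suffix in ("ing", "ed", "es", "s"):
            if token.endswith(suffix) and len(token) > len(suffix) + 2:
                return token[: -len(suffix)]
        return token

    def preprocess(text):
        tokens = []
        for raw in text.split():
            tokens.extend(split_identifier(raw))
        return [stem(t.lower()) for t in tokens if t.lower() not in STOP_WORDS]

    # Example: preprocess("public void drawString draws the specified string")
    # -> ['draw', 'str', 'draw', 'specifi', 'str']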

Experiment Design Description

The experimental design, especially the comparison to the previous approaches, needs to be clarified. For example, it isn't clear to me whether the comparison against the earlier approaches is made using the same datasets or by rerunning the techniques using the existing tools. This is another example of the lack of rigor in the explanation of the experiment.

Table 1

Some of the examples shown in Table 1 are a bit contrived, and more realistic queries could be shown. Why would you search using keywords such as "public"? Instead, the more obvious choice is to search using "draw" and "string" as separate words, or some of their synonyms.

Approach suggestions

Is the confounding effect addressed through tf-idf as well? If so, it is the same problem as determining proper weights. It seems that a query is evaluated against the entire corpus of documents, in which the information about each method forms one document. An alternative approach would be to match types first and then look for method mappings only within a pair of matched types, which could be more efficient (see the sketch below). Is there any evidence that the presented approach is preferable (e.g., because it can handle cases where methods in one source type are split into two types in the target API)?
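
A minimal sketch of this two-stage alternative (type matching first, then method matching only within the retained type pairs); the similarity function here is a token-overlap placeholder standing in for the VSM/cosine score, and all names are illustrative:

    def jaccard(a, b):
        # Placeholder similarity: token-set overlap instead of tf-idf/cosine.
        a, b = set(a), set(b)
        return len(a & b) / len(a | b) if a | b else 0.0

    def two_stage_mapping(source_api, target_api, top_types=3):
        # source_api / target_api: {type_name: {method_name: [token, ...]}}
        mappings = {}
        for src_type, src_methods in source_api.items():
            src_tokens = [t for toks in src_methods.values() for t in toks]
            # Stage 1: rank target types against the source type as a whole
            # and keep only the top few candidates.
            ranked = sorted(
                target_api,
                key=lambda tgt: jaccard(
                    src_tokens,
                    [t for toks in target_api[tgt].values() for t in toks],
                ),
                reverse=True,
            )[:top_types]
            # Stage 2: match each source method only against methods of the
            # retained target types, not the entire target corpus.
            for src_method, method_tokens in src_methods.items():
                candidates = [
                    (tgt_type, tgt_method, jaccard(method_tokens, tgt_tokens))
                    for tgt_type in ranked
                    for tgt_method, tgt_tokens in target_api[tgt_type].items()
                ]
                if candidates:
                    mappings[(src_type, src_method)] = max(candidates, key=lambda c: c[2])
        return mappings

Keeping more than one candidate type per source type (top_types > 1) is one way such a scheme could still handle the case where a source type's methods are split across two target types.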

Section 4.2

Sec 4.2: what is total_mtd in the tf-idf formula? Should this be the total number of
documents?
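
In the standard tf-idf formulation this question presumes, the numerator of the idf term is indeed the total number of documents, so if each method is indexed as one document, total_mtd would presumably denote the total number of indexed methods:

    \mathrm{idf}(t) = \log\frac{N}{\mathrm{df}(t)}

where N is the total number of documents and df(t) is the number of documents containing term t; clarifying this in Sec 4.2 would resolve the question.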

RQ3 seems unnecessary

The reason for investigating RQ3 does not come out clearly. What is the goal of comparing TMAP with Sniff, especially using only 8 queries? I see that API mapping could be formulated as a code-search problem on the target API, where the search query comes from the source API. But it is still not clear what the point of the data shown in Table 3 is.

Description Issue

Given that the techniques behave differently (i.e., return different APIs), that the purpose of this paper is to present a new technique that improves over existing techniques, and that the proposed approach is quite simple (i.e., a standard VSM), I would expect a far more stringent evaluation. In particular, the evaluation should include a greater variety of classes (perhaps rerunning the existing methods and TMAP across the same datasets), a far more detailed analysis of the results (including more examples), and, most importantly, at least a very simple initial user study that explores whether the kinds of API results returned by Rosetta and the other existing techniques are more or less useful than those returned by TMAP.
