binyu-xidian-university/apisim's Issues
Technique explanation
The technique is also not clearly explained. The Indexer extracts different types of information and the query is constructed with different components (type name/description and method name/description), but the Searcher doesn't seem to use this information in any way. So the point of separating parts of the query as shown in Fig 5 is not clear. Would it matter if all the terms were merged together?
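To make the concern concrete, here is a minimal sketch (all names and terms are hypothetical, not taken from the paper) of the two query styles the comment contrasts: one that keeps the type/method components separate, and one that merges everything into a single bag of terms. If the Searcher only ever consumes the merged form, the separation in Fig 5 carries no information.

```python
# Hypothetical sketch contrasting a component-wise query (as in Fig 5)
# with a single merged bag of terms.

def build_component_query(type_name, type_desc, method_name, method_desc):
    # Keep each part of the query in its own field.
    return {
        "type_name": type_name.split(),
        "type_desc": type_desc.split(),
        "method_name": method_name.split(),
        "method_desc": method_desc.split(),
    }

def merge_query(component_query):
    # Collapse all fields into one bag of terms; field boundaries are lost.
    merged = []
    for terms in component_query.values():
        merged.extend(terms)
    return merged

q = build_component_query("graphics", "drawing surface",
                          "draw string", "renders text on screen")
print(merge_query(q))
```

If the Searcher scores only `merge_query(q)`, the answer to "would it matter if all the terms were merged together?" is no by construction, which is exactly what the paper should clarify.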
Description of VSM
Overall, the techniques used are not rigorously described. The description of the vector space model, cosine similarity, and tf-idf is rather loose, as is the description of the preprocessing steps. In contrast, quite a bit of detail is given to simple steps such as camelCase splitting and splitting components connected with dots. A similar problem extends throughout the entire paper: the explanation of the approach and the description of the analysis is extremely informal. Why separate out the description of the underlying VSM representation of the query and the target, given that this is a shared representation? Further, later in the paper the VSM is defined again, so the definition is scattered across three places.
Table 2
Table 2 doesn't report how many methods can actually be mapped (the ground truth), so recall cannot be computed.
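For clarity on what is being asked for: computing recall requires the size of the ground-truth set, which Table 2 omits. A sketch with illustrative numbers only (not values from the paper):

```python
# Recall over a ground-truth mapping (numbers are illustrative, not from the paper).
mapped_correctly = 30     # correct mappings the tool returned
ground_truth_total = 50   # methods that can actually be mapped (missing from Table 2)

recall = mapped_correctly / ground_truth_total
print(recall)  # 0.6
```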
Separate out the presentation of the implementation
In addition to my earlier comment about presenting the technique more rigorously, I suggest that you separate the presentation of the implementation more clearly from the algorithms. For example, in Section 4.3 (which precedes the section on Implementation) you are already discussing the "Query Builder Component".
Technical novelty
The approach is fairly straightforward and uses standard NLP and information retrieval techniques such as stemming, stop-word removal, term weighting with tf-idf, and result ranking with cosine similarity. In this regard, the paper is weak on technical novelty.
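To illustrate just how standard this pipeline is, here is a compact sketch (hypothetical identifiers, a toy stop list, and naive suffix stripping standing in for a real stemmer such as Porter):

```python
import re

STOP_WORDS = {"the", "a", "of", "to", "public", "void"}  # toy stop list

def split_camel_case(identifier):
    # "drawString" -> ["draw", "String"]
    return re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", identifier)

def stem(term):
    # Naive suffix stripping standing in for a real stemmer (e.g. Porter).
    for suffix in ("ing", "ed", "s"):
        if term.endswith(suffix) and len(term) > len(suffix) + 3:
            return term[: -len(suffix)]
    return term

def preprocess(identifier_phrase):
    # Split on dots, then on camelCase, then stop-filter and stem.
    terms = []
    for token in identifier_phrase.replace(".", " ").split():
        terms.extend(split_camel_case(token))
    return [stem(t.lower()) for t in terms if t.lower() not in STOP_WORDS]

print(preprocess("Graphics.drawString"))
```

Each step is an off-the-shelf IR building block, which is the basis of the novelty concern.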
Experiment Design Description
The experimental design, especially the comparison to previous approaches, needs to be clarified. For example, it isn't clear to me whether the comparison against the earlier approaches uses the same datasets or reruns the techniques using the existing tools. This is another example of the lack of rigor in the explanation of the experiment.
Table 1
Some of the examples shown in Table 1 are a bit contrived; more realistic queries could be shown. Why would you search using a keyword such as "public"? It would be more natural to search using "draw" and "string" as separate words, or some of their synonyms.
Format for SCAM style
IEEE style, 10 pages.
Approach suggestions
Is the confounding effect addressed through tf-idf as well? Then it is the same problem as determining proper weights. It seems that a query is evaluated against the entire corpus of documents, where the information about each method is one document. An alternative approach would be to match types first and then look for method mappings only within a pair of matched types, which could be more efficient. Is there any evidence that the presented approach is preferable (e.g., because it can handle cases where methods from one source type are split across two types in the target API)?
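The suggested alternative could look like the following sketch (hypothetical data layout and a stand-in term-overlap similarity in place of tf-idf cosine): match source types to target types first, then map methods only within each matched type pair, shrinking the search space per query.

```python
# Sketch of the suggested two-stage alternative (hypothetical APIs/data):
# first match source types to target types, then map methods only within
# each matched type pair.

def best_match(query_terms, candidates, similarity):
    # candidates: {name: terms}; returns the highest-scoring candidate name.
    return max(candidates, key=lambda name: similarity(query_terms, candidates[name]))

def two_stage_mapping(source_api, target_api, similarity):
    # source_api/target_api: {type_name: {"terms": [...], "methods": {m: terms}}}
    target_types = {t: info["terms"] for t, info in target_api.items()}
    mapping = {}
    for src_type, src_info in source_api.items():
        tgt_type = best_match(src_info["terms"], target_types, similarity)
        tgt_methods = target_api[tgt_type]["methods"]
        for src_method, terms in src_info["methods"].items():
            mapping[(src_type, src_method)] = (
                tgt_type, best_match(terms, tgt_methods, similarity))
    return mapping

def overlap(a, b):
    # Stand-in similarity: term overlap instead of tf-idf cosine.
    return len(set(a) & set(b))

source = {"Graphics": {"terms": ["graphics", "draw"],
                       "methods": {"drawString": ["draw", "string"]}}}
target = {"Canvas": {"terms": ["canvas", "draw", "graphics"],
                     "methods": {"drawText": ["draw", "text", "string"],
                                 "clear": ["clear"]}},
          "File": {"terms": ["file", "io"],
                   "methods": {"read": ["read"]}}}
print(two_stage_mapping(source, target, overlap))
```

Note that this sketch also exposes the weakness the question raises: if a source type's methods are split across two target types, the hard type-level pre-match would miss half of them, whereas a whole-corpus search would not.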
Section 4.2
Sec 4.2: what is total_mtd in the tf-idf formula? Should this be the total number of documents?
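If total_mtd is indeed meant to be the total number of documents (one per method), the standard definition would read as follows. This is a guess at the intended formula based on textbook tf-idf, not a correction sourced from the paper:

```python
import math

# Standard tf-idf, assuming one document per method and that total_mtd
# is the total number of documents N; df_t is the number of documents
# containing term t.
def idf(total_docs, docs_containing_term):
    return math.log(total_docs / docs_containing_term)

def tfidf(term_freq, total_docs, docs_containing_term):
    return term_freq * idf(total_docs, docs_containing_term)

print(tfidf(3, 100, 10))  # 3 * ln(10)
```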
RQ3 seems unnecessary
The reason for investigating RQ3 does not come out clearly. What is the goal of comparing TMAP with Sniff, and why use only 8 queries? I see that API mapping could be formulated as a code search problem on the target API, where the search query comes from the source API. But even so, the point of the data shown in Table 3 is not clear.
Description Issue
Given that the techniques behave differently (i.e., return different APIs), that the purpose of this paper is to present a new technique that improves over existing techniques, and that the proposed approach is quite simple (i.e., a standard VSM), I would expect a far more stringent evaluation. In particular, the evaluation should include a greater variety of classes, perhaps rerunning the existing methods and TMAP on the same datasets; a far more detailed analysis of the results, including more examples; and, most importantly, at least a very simple initial user study exploring whether the kinds of API results returned by Rosetta are more or less useful than those returned by TMAP (etc.).