
Comments (7)

nchild commented on June 24, 2024

Looking at newsdiff, I found the likely cause: the headline was substantially rewritten, which changed the URL as well.
http://newsdiff.g0v.ronny.tw/index/log/1903648

from newshelper-extension.

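winiah commented on June 24, 2024

This is probably not a bug in News Helper (新聞小幫手). Browsers support Unicode in URLs, so some sites put the headline directly into the URL, which has a positive effect on their search ranking.

Look at the URL you pasted above: it contains Chinese characters, doesn't it? The stable URL is actually this one:

http://www.appledaily.com.tw/realtimenews/article/3c/20140911/467685/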
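For readers who are unsure how to trim such a URL, here is a minimal Python sketch of the idea. It assumes the headline is the last path segment and is the only segment containing non-ASCII characters; the function name `strip_title_slug` and the placeholder headline `新聞標題` are hypothetical, and this is not how News Helper itself handles URLs.

```python
from urllib.parse import unquote, urlsplit


def strip_title_slug(url: str) -> str:
    """Drop a trailing path segment that contains non-ASCII characters."""
    parts = urlsplit(url)
    segments = parts.path.split("/")

    # Ignore empty segments produced by trailing slashes.
    while segments and segments[-1] == "":
        segments.pop()

    # Decode percent-escapes first so "%E6%96%B0..." is treated like raw Chinese text.
    if segments and any(ord(ch) > 127 for ch in unquote(segments[-1])):
        segments.pop()

    # Rebuild the path with a single trailing slash; query and fragment are kept.
    path = "/".join(segments) + "/"
    return parts._replace(path=path).geturl()


# Hypothetical headline appended to the stable URL quoted above.
long_url = "http://www.appledaily.com.tw/realtimenews/article/3c/20140911/467685/新聞標題"
print(strip_title_slug(long_url))
```

As noted later in the thread, some sites redirect back to the long form anyway, so trimming alone will not work everywhere.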


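nchild commented on June 24, 2024

Understood. But newsdiff originally captured the URL together with the trailing characters, which is why I submitted the whole string as the report URL.

This should probably be noted on the News Helper website. Otherwise someone may paste a similar URL, have it fail without knowing why, and lose the content they entered.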


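winiah commented on June 24, 2024

Honestly, there is not much that can be done about this. For some sites, simply removing the Chinese part of the URL works. For others, even after you remove it, the site still serves the URL with the Chinese part when you actually visit it. All we can do is try to get the headline right and let others help correct it.

That does mean the same news article may fail to match because its URL differs. Not everyone knows how to trim the extra characters from a URL, so a single report might need two or more URLs. Matching URLs by prefix might solve part of the problem, or the URL field could get an additional alternative URL.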
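To make the two suggestions concrete, here is a rough Python sketch combining prefix matching with alternative URLs. The `Report` class, its fields, and `find_report` are hypothetical names for illustration only; the thread does not describe how News Helper stores reports.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Report:
    """A hypothetical report record: one note, one or more URL prefixes."""
    note: str
    urls: List[str] = field(default_factory=list)  # primary URL plus alternatives


def find_report(url: str, reports: List[Report]) -> Optional[Report]:
    """Return the first report with a stored URL that is a prefix of `url`."""
    for report in reports:
        if any(url.startswith(prefix) for prefix in report.urls):
            return report
    return None


reports = [
    Report(
        note="example report",
        # Ending the prefix with "/" avoids matching a different article ID
        # that merely starts with the same digits.
        urls=["http://www.appledaily.com.tw/realtimenews/article/3c/20140911/467685/"],
    )
]

# The headline segment (a placeholder here) no longer prevents a match.
visited = "http://www.appledaily.com.tw/realtimenews/article/3c/20140911/467685/新聞標題"
match = find_report(visited, reports)
print(match.note if match else "no report")
```

Prefix matching only helps when the variable part comes last in the URL, which is why it would solve just part of the problem.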


timdream commented on June 24, 2024

You could look for the canonical URL in the HTML header ...
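As an illustration of this suggestion, the Python sketch below pulls the `<link rel="canonical">` value out of a page using only the standard library. The names `CanonicalFinder` and `canonical_url` and the sample HTML are assumptions made for the example; fetching the page is left out.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens, in any letter case.
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attr.get("href")


def canonical_url(html: str) -> Optional[str]:
    """Return the canonical URL declared in `html`, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical page markup; a real page would have to be downloaded first.
sample = (
    "<html><head><title>example</title>"
    '<link rel="canonical" '
    'href="http://www.appledaily.com.tw/realtimenews/article/3c/20140911/467685/">'
    "</head><body></body></html>"
)
print(canonical_url(sample))
```

The cost is that the page has to be downloaded before it can be parsed.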


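ronnywang commented on June 24, 2024

The reason I have not used the canonical URL so far is to avoid an extra request to fetch the page content.
This problem is probably caused by my use of PHP's filter_var($url, FILTER_VALIDATE_URL);, which does not support Chinese URLs. I'll just change that part.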
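One common workaround for an ASCII-only validator is to percent-encode the non-ASCII characters before validating. The Python sketch below illustrates the idea; it is not the PHP change made in the project (the thread does not show it). `ascii_only_valid` is a hypothetical stand-in for a strict validator, and `valid_with_unicode` is likewise an illustrative name.

```python
from urllib.parse import quote, urlsplit

# ASCII characters that carry URL syntax and must not be escaped again.
SAFE = "/:?#[]@!$&'()*+,;=%~"


def ascii_only_valid(url: str) -> bool:
    """Stand-in for a strict validator: rejects any non-ASCII character."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False
    return url.isascii() and parts.scheme in ("http", "https") and bool(parts.netloc)


def valid_with_unicode(url: str) -> bool:
    """Percent-encode non-ASCII characters (as UTF-8), then validate."""
    # Note: this does not handle non-ASCII host names, which need IDNA encoding.
    return ascii_only_valid(quote(url, safe=SAFE))


# Hypothetical headline appended to the URL from earlier in the thread.
url = "http://www.appledaily.com.tw/realtimenews/article/3c/20140911/467685/新聞標題"
print(ascii_only_valid(url))    # the strict check rejects the raw URL
print(valid_with_unicode(url))  # the encoded form passes
```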


ronnywang commented on June 24, 2024

OK, I've rewritten the filter_var part. Please try again.
