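If it is a paper you absolutely must understand, there is nothing for it but to grind through it: fill in the background, ask other people, and work at it until it makes sense. Otherwise, read for the gist first, set the details aside, and come back for a close reading when you need it (by then you may have the background anyway). As for the gist, I think at least the following points need to be clear (other readers, please add to this): 1. the target problem being solved (ideally with some sense of the background that motivated it); 2. the known conditions; 3. the assumptions (sometimes implicit ones the author never states, so you need to train a sharp eye, heh); 4. the general approach to the solution (many papers give a fairly intuitive explanation of the idea); 5. the main conclusions (is the problem fully or only partly solved, are there important intermediate results, and so on); 6. comparison with other methods; 7. shortcomings (that is, where future work could continue). Of these, 1 is the most important, followed by 2, 3, 4 and 5, and finally 6 and 7. This is all just personal experience, though, so take it as you see fit :)
88 Information extraction - Wikipedia, the free encyclopedia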
Information extraction (IE) is a type of information retrieval whose goal is to automatically extract structured or semistructured information from unstructured machine-readable documents. A typical application of IE is to scan a set of documents written in a natural language and populate a database with the information extracted. Current approaches to IE use natural language processing techniques that focus on very restricted domains. For example, the Message Understanding Conference (MUC) is a competition-based conference that focused on the following domains in the past:
87 MySQL Security Guide
As a MySQL system administrator, you are responsible for maintaining the data security and integrity of your MySQL database system. This article describes how to set up a secure MySQL system and offers guidance from two angles: inside the system and the external network.
86 MySQL Advanced Features----Comparison with Other Databases - MYSQL - Technology World - 賽迪網
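For a true comparison of speed, see the steadily maturing MySQL benchmark suite (10.8, Using Your Own Benchmarks). Because it has no thread creation overhead, a smaller parser, fewer features, and simpler security, mSQL should be faster in the following areas:
85 What Is a Massive Data Mining Engine--DoNews.com--IT Community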
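Traditional keyword search engine technology emerged at the end of the last century. By running full-text search over web page text, it gave users a fast way to query pages and greatly improved the usability of web information. But as the number of pages has ballooned and content is cited over and over, plain listings of search results have become harder and harder to use. Advances in multimedia and broadband have also diversified online resources; these resources differ in their quality criteria and characteristics, so ranking them together rarely gives satisfactory results. Web users are getting younger and their average level of knowledge is falling, which lowers their command of search techniques and their ability to filter results. And the rise of enthusiast communities in different fields places higher demands on the personalization and specialization of search results.
84 Block-Level Link Analysis - What Does It Mean To You?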
Microsoft's research lab has released a paper in which they discuss a new way to rank web sites. The new method is called block-level link analysis.
83 VIPS: a VIsion based Page Segmentation Algorithm
The VIsion-based Page Segmentation (VIPS) algorithm aims to extract the semantic structure of a web page based on its visual presentation. This semantic structure is a tree in which each node corresponds to a block. Each node is assigned a value (Degree of Coherence, DoC) that indicates how coherent the content in the block is based on visual perception: the bigger the DoC value, the more coherent the block. The VIPS algorithm makes full use of page layout structure. It first extracts all the suitable blocks from the HTML DOM tree, and then it finds the separators between these blocks. Here, separators denote the horizontal or vertical lines in a web page that visually cross no blocks. Based on these separators, the semantic tree of the web page is constructed. Thus, a web page can be represented as a set of blocks (the leaf nodes of the semantic tree). Compared with DOM-based methods, the segments obtained by VIPS are much more semantically aggregated. Noisy information, such as navigation, advertisements, and decoration, can be easily removed because it is often placed in certain positions of a page. Contents with different topics are distinguished as separate blocks.
82 Google phone interview process
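A note on the VIPS entry (83) above: the output it describes, a tree of blocks with a DoC value per node and the page represented by the tree's leaves, can be pictured with a small data structure. This is only a sketch of that output shape, not the VIPS algorithm itself (no block extraction or separator detection happens here). The `Block` class, the `leaf_blocks` helper, the labels, and the DoC numbers are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    """One node of a hypothetical semantic tree for a page."""
    label: str
    doc: float  # Degree of Coherence; values here are invented
    children: List["Block"] = field(default_factory=list)


def leaf_blocks(node: Block) -> List[Block]:
    """Collect the leaf nodes, i.e. the blocks that represent the page."""
    if not node.children:
        return [node]
    leaves: List[Block] = []
    for child in node.children:
        leaves.extend(leaf_blocks(child))
    return leaves


# A hand-built example tree; a real segmenter would derive this from layout.
page = Block("page", 1.0, [
    Block("navigation", 8.0),
    Block("main", 4.0, [
        Block("article", 9.0),
        Block("comments", 7.0),
    ]),
    Block("advertisement", 8.0),
])

leaf_labels = [b.label for b in leaf_blocks(page)]
```

With the page held as a list of leaf blocks, dropping noisy blocks such as navigation or advertisements becomes a simple filter over that list.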
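Because I had applied for the Wireless Developer position, he asked whether I had done any J2ME or mobile application development. I hadn't, so I could only admit as much, though I said I had done protocol stack development. He clearly wasn't interested in that and didn't ask further. For all the remaining time I was answering a single algorithm question he gave me. It took over 40 minutes, and in the end he basically told me the algorithm himself. Embarrassing. Thinking about it now, it should be a simple problem, and I don't know why I couldn't work it out at the time. Embarrassing again. I advise anyone applying for a developer position to get solid fundamentals in algorithms. I never studied this systematically and am weak at it. I'll post his problem here for anyone interested. Arrays record the positions at which each word occurs in a document. Given k input words, find the shortest span that contains all k words. For example, three of the arrays are:
hello -> 5 14 19 35 52
world -> 11 17 29 40
goodbye -> 1 25 63 72
The numbers are the positions at which the word occurs in the document. If the input is hello world goodbye, what is the shortest span?
81 :::Issues to consider when implementing a data mining project:::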
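A note on the interview question in entry 82 above: one common way to attack it is to keep one pointer per word, hold the current positions in a min-heap, and repeatedly advance the word whose current position is smallest while tracking the tightest window seen. The sketch below assumes each position list is sorted in ascending order, as in the example; the function name `shortest_span` is hypothetical, and this is not necessarily the algorithm the interviewer had in mind.

```python
import heapq
from typing import Dict, List, Optional, Tuple


def shortest_span(positions: Dict[str, List[int]]) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the shortest span containing every word.

    Assumes each list is sorted ascending. Returns None if there are no
    words or some word never occurs.
    """
    lists = list(positions.values())
    if not lists or any(not lst for lst in lists):
        return None

    # Heap entries: (position, index of the word's list, index within that list).
    heap = [(lst[0], i, 0) for i, lst in enumerate(lists)]
    heapq.heapify(heap)
    current_max = max(lst[0] for lst in lists)
    best = (heap[0][0], current_max)

    while True:
        pos, i, j = heap[0]  # smallest current position
        if current_max - pos < best[1] - best[0]:
            best = (pos, current_max)
        if j + 1 == len(lists[i]):
            # The word at the left edge has no later occurrence,
            # so no further window can improve on the best one.
            return best
        nxt = lists[i][j + 1]
        current_max = max(current_max, nxt)
        heapq.heapreplace(heap, (nxt, i, j + 1))


example = {
    "hello": [5, 14, 19, 35, 52],
    "world": [11, 17, 29, 40],
    "goodbye": [1, 25, 63, 72],
}
result = shortest_span(example)  # (17, 25): world at 17, hello at 19, goodbye at 25
```

Each position enters the heap at most once, so the running time is O(N log k) for N total positions across the k words.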
Data mining should be considered from three angles: first, what kind of business problem it is meant to solve; second, the data preparation needed to carry it out; third, the various analysis algorithms it uses.
80 :::Data mining applications:::
It should be stressed that data mining has been application-oriented from the very start. Today, data mining is a fashionable term in many fields, especially commercial ones such as banking, telecommunications, insurance, transportation, and retail (for example, supermarkets). Typical business problems that data mining can solve include: Database Marketing, Customer Segmentation |