Molecule mining

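Molecule mining is the application of data mining to molecules. Because molecules can be represented as molecular graphs, it is closely related to graph mining and structured data mining. The central problem is how to represent molecules so that data instances can be told apart. One approach is to use chemical similarity measures, which have a long tradition in cheminformatics.

The typical way to compute chemical similarity is to use chemical fingerprints, but this discards the underlying information about molecular topology. Mining the molecular graphs directly avoids this problem. The inverse QSAR problem is likewise relevant to vector mappings.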
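The loss of topology can be illustrated with a small sketch. It assumes a toy fingerprint made only of bond types between heavy atoms and compares fingerprints with the Tanimoto coefficient; the `Molecule` encoding and the helper names `bond_fingerprint` and `tanimoto` are hypothetical and are not taken from any cheminformatics library.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical encoding: (atom labels by index, list of bonds). Hydrogens are omitted.
Molecule = Tuple[Dict[int, str], List[Tuple[int, int]]]


def bond_fingerprint(mol: Molecule) -> FrozenSet[str]:
    """Toy fingerprint: the set of bond types such as 'C-C' or 'C-O'."""
    atoms, bonds = mol
    return frozenset("-".join(sorted((atoms[a], atoms[b]))) for a, b in bonds)


def tanimoto(fp_a: FrozenSet[str], fp_b: FrozenSet[str]) -> float:
    """Tanimoto coefficient: shared features divided by all features."""
    union = fp_a | fp_b
    if not union:
        return 1.0
    return len(fp_a & fp_b) / len(union)


# Two isomers with different topology: a chain C-C-C-O and a branched C-C(-O)-C.
propan_1_ol: Molecule = ({0: "C", 1: "C", 2: "C", 3: "O"}, [(0, 1), (1, 2), (2, 3)])
propan_2_ol: Molecule = ({0: "C", 1: "C", 2: "C", 3: "O"}, [(0, 1), (1, 2), (1, 3)])
ethane: Molecule = ({0: "C", 1: "C"}, [(0, 1)])

similarity_isomers = tanimoto(bond_fingerprint(propan_1_ol), bond_fingerprint(propan_2_ol))
similarity_ethane = tanimoto(bond_fingerprint(propan_1_ol), bond_fingerprint(ethane))
print(similarity_isomers)  # 1.0
print(similarity_ethane)   # 0.5
```

Both isomers contain only C-C and C-O bonds, so this toy fingerprint cannot separate them even though their graphs differ. Practical fingerprints encode richer features, but the same kind of collision can still occur.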

Coding(molecule i, molecule j ≠ i)

Kernel methods

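  • Marginalized graph kernel[1]
  • Optimal assignment kernel[2][3][4]
  • Pharmacophore kernel[5]
  • A C++ (and R) implementation combining:
    • the marginalized graph kernel between labeled graphs[1]
    • extensions of the marginalized kernel[6]
    • Tanimoto kernels[7]
    • graph kernels based on tree patterns[8]
    • kernels based on pharmacophores for the 3D structure of molecules[5]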
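As a rough illustration of how a kernel can compare two molecular graphs directly, the sketch below counts label sequences along walks of bounded length and takes the dot product of the two count vectors. This is a deliberately simplified stand-in: the marginalized kernel weights walks by random-walk probabilities, which is not done here. The Tanimoto-style normalization k(a,b) / (k(a,a) + k(b,b) - k(a,b)) is applied on top. All function names and the `Molecule` encoding are assumptions made for this example.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical encoding: (atom labels by index, list of bonds). Hydrogens are omitted.
Molecule = Tuple[Dict[int, str], List[Tuple[int, int]]]


def walk_label_counts(mol: Molecule, max_edges: int = 2) -> Counter:
    """Count the label sequences of all walks with 0..max_edges edges."""
    atoms, bonds = mol
    neighbours: Dict[int, List[int]] = {a: [] for a in atoms}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    counts: Counter = Counter()
    frontier = [(a,) for a in atoms]  # walks with zero edges
    for _ in range(max_edges + 1):
        for walk in frontier:
            counts["-".join(atoms[i] for i in walk)] += 1
        # Extend every walk by one step (walks may revisit atoms).
        frontier = [walk + (n,) for walk in frontier for n in neighbours[walk[-1]]]
    return counts


def walk_kernel(mol_a: Molecule, mol_b: Molecule, max_edges: int = 2) -> int:
    """Dot product of the two walk-count vectors."""
    counts_a = walk_label_counts(mol_a, max_edges)
    counts_b = walk_label_counts(mol_b, max_edges)
    return sum(counts_a[s] * counts_b[s] for s in counts_a.keys() & counts_b.keys())


def tanimoto_kernel(mol_a: Molecule, mol_b: Molecule, max_edges: int = 2) -> float:
    """Tanimoto-style normalization of the walk kernel."""
    k_ab = walk_kernel(mol_a, mol_b, max_edges)
    k_aa = walk_kernel(mol_a, mol_a, max_edges)
    k_bb = walk_kernel(mol_b, mol_b, max_edges)
    return k_ab / (k_aa + k_bb - k_ab)


propan_1_ol: Molecule = ({0: "C", 1: "C", 2: "C", 3: "O"}, [(0, 1), (1, 2), (2, 3)])
propan_2_ol: Molecule = ({0: "C", 1: "C", 2: "C", 3: "O"}, [(0, 1), (1, 2), (1, 3)])

print(tanimoto_kernel(propan_1_ol, propan_1_ol))  # 1.0
print(tanimoto_kernel(propan_1_ol, propan_2_ol))  # below 1.0: the isomers differ
```

Because walks of two edges pass through the branching atom differently in the two isomers, the score for the pair falls below 1, which a fingerprint built from bond types alone would not show.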

Maximum common graph methods

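  • MCS-HSCS[9] (highest scoring common substructure (HSCS) ranking strategy for a single MCS)
  • Small Molecule Subgraph Detector (SMSD)[10] is a Java-based software library for calculating the maximum common subgraph (MCS) between small molecules. The MCS can be used to measure the similarity or distance between two molecules. It is also used to screen drug-like compounds by finding (hitting) molecules that share a common subgraph (substructure).[11]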
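The idea of deriving a similarity from a maximum common subgraph can be sketched with a brute-force search. The code below is not the SMSD algorithm or its API; it is a minimal backtracking search, suitable only for very small molecules, that looks for the largest label-preserving atom mapping in which bonds and non-bonds agree (a common induced subgraph, not required to be connected). The similarity formula |MCS| / (|A| + |B| - |MCS|) is one common choice and is an assumption here, as are all names.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: (atom labels by index, list of bonds). Hydrogens are omitted.
Molecule = Tuple[Dict[int, str], List[Tuple[int, int]]]


def max_common_subgraph(mol_a: Molecule, mol_b: Molecule) -> Dict[int, int]:
    """Return the largest atom mapping a -> b that preserves labels and bonds."""
    atoms_a, bonds_a = mol_a
    atoms_b, bonds_b = mol_b
    edges_a = {frozenset(bond) for bond in bonds_a}
    edges_b = {frozenset(bond) for bond in bonds_b}
    order = sorted(atoms_a)
    best: Dict[int, int] = {}

    def compatible(a: int, b: int, mapping: Dict[int, int]) -> bool:
        if atoms_a[a] != atoms_b[b]:
            return False
        # Every already mapped pair must agree on bonded / not bonded.
        return all(
            (frozenset((a, a2)) in edges_a) == (frozenset((b, b2)) in edges_b)
            for a2, b2 in mapping.items()
        )

    def extend(index: int, mapping: Dict[int, int]) -> None:
        nonlocal best
        if len(mapping) > len(best):
            best = dict(mapping)
        remaining = len(order) - index
        # Stop when even mapping every remaining atom could not beat the best.
        if remaining == 0 or len(mapping) + remaining <= len(best):
            return
        a = order[index]
        used = set(mapping.values())
        for b in atoms_b:
            if b not in used and compatible(a, b, mapping):
                mapping[a] = b
                extend(index + 1, mapping)
                del mapping[a]
        extend(index + 1, mapping)  # also try leaving atom a unmatched

    extend(0, {})
    return best


def mcs_similarity(mol_a: Molecule, mol_b: Molecule) -> float:
    """Tanimoto-like similarity based on the number of atoms in the MCS."""
    common = len(max_common_subgraph(mol_a, mol_b))
    return common / (len(mol_a[0]) + len(mol_b[0]) - common)


propan_1_ol: Molecule = ({0: "C", 1: "C", 2: "C", 3: "O"}, [(0, 1), (1, 2), (2, 3)])
propan_2_ol: Molecule = ({0: "C", 1: "C", 2: "C", 3: "O"}, [(0, 1), (1, 2), (1, 3)])

mapping = max_common_subgraph(propan_1_ol, propan_2_ol)
print(len(mapping))                              # 3 (a shared C-C-O fragment)
print(mcs_similarity(propan_1_ol, propan_2_ol))  # 0.6
```

The search is exponential in the number of atoms, so production tools rely on far more refined algorithms and chemical constraints such as bond order and ring membership.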

Coding(molecule i)

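Molecular query methods

  • Warmr[12][13]
  • AGM[14][15]
  • PolyFARM[16]
  • FSG[17][18]
  • MolFea[19]
  • MoFa/MoSS[20][21][22]
  • Gaston[23]
  • LAZAR[24]
  • ParMol[25] (includes MoFa, FFSM, gSpan, and Gaston)
  • optimized gSpan[26][27]
  • SMIREP[28]
  • DMax[29]
  • SAm/AIm/RHC[30]
  • AFGen[31]
  • gRed[32]
  • G-Hash[33]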
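Several entries in the reference list concern mining frequent substructures from sets of compounds. The sketch below shows only the basic idea under strong simplifications: fragments are restricted to simple labeled paths of at most `max_atoms` atoms, and a fragment counts as frequent when it occurs in at least `min_support` molecules. It does not reproduce gSpan, FSG, or any other cited tool, which enumerate general subgraphs; all names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: (atom labels by index, list of bonds). Hydrogens are omitted.
Molecule = Tuple[Dict[int, str], List[Tuple[int, int]]]
Fragment = Tuple[str, ...]


def path_fragments(mol: Molecule, max_atoms: int = 3) -> Set[Fragment]:
    """Collect the label sequences of all simple paths with up to max_atoms atoms."""
    atoms, bonds = mol
    neighbours: Dict[int, List[int]] = {a: [] for a in atoms}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    found: Set[Fragment] = set()

    def grow(path: List[int]) -> None:
        labels = tuple(atoms[i] for i in path)
        # A path and its reverse are the same fragment: keep one canonical form.
        found.add(min(labels, labels[::-1]))
        if len(path) < max_atoms:
            for n in neighbours[path[-1]]:
                if n not in path:
                    grow(path + [n])

    for a in atoms:
        grow([a])
    return found


def frequent_fragments(
    mols: List[Molecule], min_support: int, max_atoms: int = 3
) -> Dict[Fragment, int]:
    """Return fragments contained in at least min_support molecules."""
    support: Counter = Counter()
    for mol in mols:
        support.update(path_fragments(mol, max_atoms))  # one count per molecule
    return {frag: n for frag, n in support.items() if n >= min_support}


ethanol: Molecule = ({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)])
propan_1_ol: Molecule = ({0: "C", 1: "C", 2: "C", 3: "O"}, [(0, 1), (1, 2), (2, 3)])
ethane: Molecule = ({0: "C", 1: "C"}, [(0, 1)])
dimethyl_ether: Molecule = ({0: "C", 1: "O", 2: "C"}, [(0, 1), (1, 2)])

frequent = frequent_fragments([ethanol, propan_1_ol, ethane, dimethyl_ether], min_support=3)
print(frequent)
```

With a support threshold of 3, only the single atoms and the C-C and C-O bonds survive in this toy data set; larger fragments such as C-C-O occur in too few molecules.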

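Methods based on special architectures of neural networks

  • BPZ[34][35]
  • ChemNet[36]
  • CCS[37][38]
  • MolNet[39]
  • Graph machines[40]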

See also

  • Molecular Query Language
  • Chemical graph theory

References

  1. H. Kashima, K. Tsuda, A. Inokuchi, Marginalized Kernels Between Labeled Graphs, The 20th International Conference on Machine Learning (ICML 2003), 2003.
  2. H. Fröhlich, J. K. Wegner, A. Zell, Optimal Assignment Kernels For Attributed Molecular Graphs, The 22nd International Conference on Machine Learning (ICML 2005), Omnipress, Madison, WI, USA, 2005, 225-232.
  3. H. Fröhlich, J. K. Wegner, A. Zell, Kernel Functions for Attributed Molecular Graphs - A New Similarity Based Approach To ADME Prediction in Classification and Regression, QSAR Comb. Sci., 2006, 25, 317-326. doi:10.1002/qsar.200510135
  4. H. Fröhlich, J. K. Wegner, A. Zell, Assignment Kernels For Chemical Compounds, International Joint Conference on Neural Networks 2005 (IJCNN'05), 2005, 913-918.
  5. P. Mahé, L. Ralaivola, V. Stoven, J.-P. Vert, The pharmacophore kernel for virtual screening with support vector machines, J. Chem. Inf. Model., 2006, 46, 2003-2014. doi:10.1021/ci060138m
  6. P. Mahé, N. Ueda, T. Akutsu, J.-L. Perret and J.-P. Vert, Extensions of marginalized graph kernels, Proceedings of the 21st ICML, 2004, 552–559.
  7. L. Ralaivola, S. J. Swamidass, H. Saigo and P. Baldi, Graph kernels for chemical informatics, Neural Networks, 2005, 18, 1093–1110. doi:10.1016/j.neunet.2005.07.009
  8. P. Mahé and J.-P. Vert, Graph kernels based on tree patterns for molecules, Machine Learning, 2009, 75 (1), 3–35. ISSN 0885-6125. doi:10.1007/s10994-008-5086-2
  9. J. K. Wegner, H. Fröhlich, H. Mielenz, A. Zell, Data and Graph Mining in Chemical Space for ADME and Activity Data Sets, QSAR Comb. Sci., 2006, 25, 205-220. doi:10.1002/qsar.200510009
  10. S. A. Rahman, M. Bashton, G. L. Holliday, R. Schrader and J. M. Thornton, Small Molecule Subgraph Detector (SMSD) toolkit, Journal of Cheminformatics, 2009, 1:12. doi:10.1186/1758-2946-1-12
  11. Archived copy. Retrieved 2017-07-02. (Archived from the original on 2020-01-28.)
  12. R. D. King, A. Srinivasan, L. Dehaspe, Warmr: a data mining tool for chemical data, J. Comput.-Aid. Mol. Des., 2001, 15, 173-181. doi:10.1023/A:1008171016861
  13. L. Dehaspe, H. Toivonen, R. D. King, Finding frequent substructures in chemical compounds, 4th International Conference on Knowledge Discovery and Data Mining, AAAI Press, 1998, 30-36.
  14. A. Inokuchi, T. Washio, T. Okada, H. Motoda, Applying the Apriori-based Graph Mining Method to Mutagenesis Data Analysis, Journal of Computer Aided Chemistry, 2001, 2, 87-92.
  15. A. Inokuchi, T. Washio, K. Nishimura, H. Motoda, A Fast Algorithm for Mining Frequent Connected Subgraphs, IBM Research, Tokyo Research Laboratory, 2002.
  16. A. Clare, R. D. King, Data mining the yeast genome in a lazy functional language, Practical Aspects of Declarative Languages (PADL 2003), 2003.
  17. M. Kuramochi, G. Karypis, An Efficient Algorithm for Discovering Frequent Subgraphs, IEEE Transactions on Knowledge and Data Engineering, 2004, 16(9), 1038-1051.
  18. M. Deshpande, M. Kuramochi, N. Wale, G. Karypis, Frequent Substructure-Based Approaches for Classifying Chemical Compounds, IEEE Transactions on Knowledge and Data Engineering, 2005, 17(8), 1036-1050.
  19. C. Helma, T. Cramer, S. Kramer, L. de Raedt, Data Mining and Machine Learning Techniques for the Identification of Mutagenicity Inducing Substructures and Structure Activity Relationships of Noncongeneric Compounds, J. Chem. Inf. Comput. Sci., 2004, 44, 1402-1411. doi:10.1021/ci034254q
  20. T. Meinl, C. Borgelt, M. R. Berthold, Discriminative Closed Fragment Mining and Perfect Extensions in MoFa, Proceedings of the Second Starting AI Researchers Symposium (STAIRS 2004), 2004.
  21. T. Meinl, C. Borgelt, M. R. Berthold, M. Philippsen, Mining Fragments with Fuzzy Chains in Molecular Databases, Second International Workshop on Mining Graphs, Trees and Sequences (MGTS 2004), 2004.
  22. T. Meinl, M. R. Berthold, Hybrid Fragment Mining with MoFa and FSG, Proceedings of the 2004 IEEE Conference on Systems, Man & Cybernetics (SMC 2004), 2004.
  23. S. Nijssen, J. N. Kok, Frequent Graph Mining and its Application to Molecular Databases, Proceedings of the 2004 IEEE Conference on Systems, Man & Cybernetics (SMC 2004), 2004.
  24. C. Helma, Predictive Toxicology, CRC Press, 2005.
  25. M. Wörlein, Extension and parallelization of a graph-mining-algorithm, Friedrich-Alexander-Universität, 2006.
  26. K. Jahn, S. Kramer, Optimizing gSpan for Molecular Datasets, Proceedings of the Third International Workshop on Mining Graphs, Trees and Sequences (MGTS 2005), 2005.
  27. X. Yan, J. Han, gSpan: Graph-Based Substructure Pattern Mining, Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM 2002), IEEE Computer Society, 2002, 721-724.
  28. A. Karwath, L. De Raedt, SMIREP: predicting chemical activity from SMILES, J. Chem. Inf. Model., 2006, 46, 2432-2444. doi:10.1021/ci060159g
  29. H. Ando, L. Dehaspe, W. Luyten, E. Craenenbroeck, H. Vandecasteele, L. Meervelt, Discovering H-Bonding Rules in Crystals with Inductive Logic Programming, Mol. Pharm., 2006, 3, 665-674. doi:10.1021/mp060034z
  30. P. Mazzatorta, L. Tran, B. Schilter, M. Grigorov, Integration of Structure-Activity Relationship and Artificial Intelligence Systems To Improve in Silico Prediction of Ames Test Mutagenicity, J. Chem. Inf. Model., 2006, ASAP alert. doi:10.1021/ci600411v
  31. N. Wale, G. Karypis, Comparison of Descriptor Spaces for Chemical Compound Retrieval and Classification, ICDM, 2006, 678-689.
  32. A. Gago Alonso, J. E. Medina Pagola, J. A. Carrasco-Ochoa and J. F. Martínez-Trinidad, Mining Frequent Connected Subgraphs Reducing the Number of Candidates, in Proc. of ECML PKDD, pp. 365–376, 2008.
  33. Xiaohong Wang, Jun Huan, Aaron Smalter, Gerald Lushington, Application of Kernel Functions for Accurate Similarity Search in Large Chemical Databases, BMC Bioinformatics, Vol. 11 (Suppl 3):S8, 2010.
  34. I. I. Baskin, V. A. Palyulin, N. S. Zefirov. [A methodology for searching direct correlations between structures and properties of organic compounds by using computational neural networks]. Doklady Akademii Nauk SSSR. 1993, 333 (2): 176–179.
  35. I. I. Baskin, V. A. Palyulin, N. S. Zefirov. A Neural Device for Searching Direct Correlations between Structures and Properties of Organic Compounds. J. Chem. Inf. Comput. Sci. 1997, 37 (4): 715–721. doi:10.1021/ci940128y
  36. D. B. Kireev. ChemNet: A Novel Neural Network Based Method for Graph/Property Mapping. J. Chem. Inf. Comput. Sci. 1995, 35 (2): 175–180. doi:10.1021/ci00024a001
  37. A. M. Bianucci, A. Micheli, A. Sperduti, A. Starita. Application of Cascade Correlation Networks for Structures to Chemistry. Applied Intelligence. 2000, 12 (1-2): 117–146. doi:10.1023/A:1008368105614
  38. A. Micheli, A. Sperduti, A. Starita, A. M. Bianucci. Analysis of the Internal Representations Developed by Neural Networks for Structures Applied to Quantitative Structure-Activity Relationship Studies of Benzodiazepines. J. Chem. Inf. Comput. Sci. 2001, 41 (1): 202–218. PMID 11206375. doi:10.1021/ci9903399
  39. O. Ivanciuc. Molecular Structure Encoding into Artificial Neural Networks Topology. Roumanian Chemical Quarterly Reviews. 2001, 8: 197–220.
  40. A. Goulon, T. Picot, A. Duprat, G. Dreyfus. Predicting activities without computing descriptors: Graph machines for QSAR. SAR and QSAR in Environmental Research. 2007, 18 (1-2): 141–153. PMID 17365965. doi:10.1080/10629360601054313

Further reading

  • Schölkopf, B., K. Tsuda and J. P. Vert: Kernel Methods in Computational Biology, MIT Press, Cambridge, MA, 2004.
  • R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, John Wiley & Sons, 2001. ISBN 0-471-05669-3
  • Gusfield, D., Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Cambridge University Press, 1997. ISBN 0-521-58519-8
  • R. Todeschini, V. Consonni, Handbook of Molecular Descriptors, Wiley-VCH, 2000. ISBN 3-527-29913-0

See also

  • Quantitative structure-activity relationship
  • ADME
  • Partition coefficient

External links

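  • Small Molecule Subgraph Detector (SMSD) - a Java-based software library for calculating the maximum common subgraph (MCS) between small molecules
  • 5th International Workshop on Mining and Learning with Graphs, 2007
  • Overview for 2006
  • Molecule mining (basic chemical expert systems)
  • ParMol and master thesis documentation - Java - open source - distributed mining - benchmark algorithm library
  • TU München - Kramer group
  • Molecule mining (advanced chemical expert systems)
  • DMax Chemistry Assistant - commercial software
  • AFGen - software for generating fragment-based descriptors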
