Topic Modeling for Textual Learning Materials on Informatics Using R Language
This paper presents the results of topic modeling for textual learning materials. The materials are electronic lecture notes that teachers use to prepare for computer science classes. Topic modeling methods systematize the content of text documents without additional manual work: they identify the main topics in the documents and show how those topics are distributed across them. In other words, the methods build a topic model, which assigns to a given collection of documents a set of topics that characterize its content. Latent Dirichlet allocation (LDA) is used for topic modeling, and the implementation is written in R. The developed interactive web application gives the user (a teacher) a set of visual tools for topic modeling. The visualization techniques improve the ergonomics of a teacher’s work with learning materials and reduce the time spent studying, analyzing, and selecting relevant study materials.
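As a rough illustration of the workflow summarized above, the following minimal sketch fits an LDA model in R and extracts both the main terms per topic and the topic distribution per document. It assumes the `tm` and `topicmodels` packages; the abstract does not say which packages the authors used, and the four toy documents, the choice of `k = 2`, and the seed are hypothetical values chosen only for demonstration.

```r
# Minimal LDA sketch (assumption: tm + topicmodels; not necessarily the authors' setup).
library(tm)
library(topicmodels)

# Hypothetical toy "lecture notes" standing in for the real collection.
docs <- c(
  "processor memory cache register instruction processor",
  "memory cache processor bus instruction register",
  "network protocol router packet internet network",
  "internet packet protocol network router browser"
)

# Build a document-term matrix: one row per document, one column per term.
corpus <- VCorpus(VectorSource(docs))
dtm <- DocumentTermMatrix(
  corpus,
  control = list(tolower = TRUE, removePunctuation = TRUE, stopwords = TRUE)
)

# Fit LDA with k topics; k and the seed are illustrative assumptions.
fit <- LDA(dtm, k = 2, control = list(seed = 42))

# Main topics: the three highest-probability terms for each topic.
top_terms <- terms(fit, 3)

# Distribution of topics in documents: one row per document, rows sum to 1.
doc_topics <- posterior(fit)$topics

print(top_terms)
print(round(doc_topics, 3))
```

Each row of `doc_topics` is the estimated mixture of topics for one document, which is the kind of quantity a visual tool could plot. On a real collection the number of topics would need to be tuned rather than fixed in advance.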
This work is licensed under a Creative Commons Attribution 4.0 International License.
Izvestiya of Altai State University is a golden publisher, as we allow self-archiving, but most importantly we are fully transparent about your rights.