stmaiteam

Turkish Keyphrase Extraction from Web Pages with BERT

Emre Tolga Ayan, Rabia Arslan, Muhammed Said Zengin, Hacı Ali Duru, Sedat Salman and Batuhan Bardak

Keyword extraction is a natural language processing task that identifies the essential, descriptive words in a text. Extracting keywords from well-structured texts has been studied extensively in the literature. Text gathered from websites, however, varies in structure and is harder to process, so studies in this setting are scarce. In this study, data is collected from the websites of 25 large Turkish companies operating in Turkey, and keywords related to these companies are extracted. The proposed deep learning-based model uses Sentence-BERT, a BERT-based method that has recently yielded successful results in natural language processing. To evaluate the proposed method, the data is annotated manually and the results are reported. In addition, the companies’ keywords are clustered to look for clues about their business domains.
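
The abstract does not spell out how Sentence-BERT is applied, so the following is only a minimal sketch of one common embedding-based approach, not the authors' pipeline: candidate phrases are embedded, compared with the embedding of the whole text by cosine similarity, and the closest candidates are kept. All function names here are hypothetical, and `toy_embed` is a hashed character-trigram stand-in used only so the sketch runs without a model download; it is not Sentence-BERT.

```python
import math
import re
import zlib


def toy_embed(text, dim=256, n=3):
    """Stand-in embedding (assumption, NOT Sentence-BERT):
    hashed counts of character n-grams in a fixed-size vector."""
    vec = [0.0] * dim
    padded = f" {text.lower()} "
    for i in range(len(padded) - n + 1):
        gram = padded[i:i + n].encode("utf-8")
        vec[zlib.crc32(gram) % dim] += 1.0  # crc32 is stable across runs
    return vec


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def candidates(text, max_len=2):
    """Unique word n-grams (1..max_len words) in order of first appearance."""
    words = re.findall(r"\w+", text.lower())
    seen = []
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            phrase = " ".join(words[i:i + n])
            if phrase not in seen:
                seen.append(phrase)
    return seen


def rank_keyphrases(text, embed=toy_embed, top_k=5):
    """Return the top_k candidates most similar to the whole text."""
    doc_vec = embed(text)
    scored = [(cosine(embed(p), doc_vec), p) for p in candidates(text)]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [phrase for _, phrase in scored[:top_k]]


if __name__ == "__main__":
    page_text = "Banka kredi kartı ve konut kredisi sunar"
    print(rank_keyphrases(page_text, top_k=3))
```

The `embed` parameter is kept pluggable on purpose: with the sentence-transformers package installed, the `encode` method of a loaded `SentenceTransformer` model could be passed in place of `toy_embed`, since it maps a string to a vector. Which model suits Turkish web text is left open here.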
