Author ORCID Identifier

https://orcid.org/0000-0001-9860-5072

Date of Award

8-1-2025

Document Type

Thesis

School

School of Computing

Programme

Ph.D.-Doctoral of Philosophy

First Advisor

Dr. R. Venkatesan

Keywords

Sentimental Analysis, Stock Market Prediction, Treebank Filtering, Broken-Stick Regression, Machine Learning

Abstract

Sentiment analysis has become one of the most important procedures for predicting stock market behaviour from customer reviews about a particular topic, such as news, movies, events, and remarks related to a product. Because customers generate a huge number of reviews, sentiment analysis is performed to analyze this information accurately and to detect the general view of a product. Most recent research approaches sentiment analysis by applying organization and ranking techniques, but these approaches classify customer reviews with limited accuracy.

Sentiment analysis is the procedure of identifying and classifying opinions in a piece of text to find out whether customer reviews of a particular product or service are positive, negative, or neutral. Stock market prediction is one of the most attractive topics in academia and real-life business. A Treebank Filtering Data Preprocessing based Ochiai-Barkman Relevance Vector Linear Programming Boost Classification (TFDP-ORVLPBC) technique is used for stock market prediction through sentiment analysis of product reviews, with the aim of higher prediction accuracy and shorter classification time. Initially, customer reviews and feedback on services or products are collected from a large database.

The collected customer reviews are then preprocessed through tokenization, stemming, and filtering. To perform sentiment analysis by classifying customer reviews as positive or negative, the Ochiai-Barkman Relevance Vector Linear Programming Boost Classification algorithm is used. The Linear Programming Boost Classification algorithm starts with an empty set of weak classifiers, with the Ochiai-Barkman Relevance Vector Machine serving as the weak classifier. The customer reviews are classified based on the Ochiai-Barkman similarity coefficient. The ensemble technique combines the weak classification results into a strong classifier by minimizing the error. In this way, classification performance improves and stock market prediction is carried out more accurately.
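The preprocessing and similarity steps described above can be illustrated with a minimal sketch. It assumes the Ochiai-Barkman coefficient takes its usual set form, |A ∩ B| / sqrt(|A| · |B|), computed over the token sets of two reviews. The stop-word list, the toy suffix-stripping stemmer, and the function names `preprocess` and `ochiai` are hypothetical stand-ins for illustration; they are not the thesis's Treebank filtering pipeline, and the relevance vector machine and boosting stages are not shown.

```python
import math
import re

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"the", "a", "an", "is", "are", "and", "of", "to", "this"}


def preprocess(review):
    """Tokenize, filter stop words, and apply a toy suffix-stripping stemmer."""
    tokens = re.findall(r"[a-z]+", review.lower())
    kept = [t for t in tokens if t not in STOP_WORDS]
    stems = []
    for token in kept:
        # Toy stemmer (an assumption, not a real stemming algorithm):
        # strip one common suffix if enough of the word remains.
        for suffix in ("ing", "ed", "s"):
            if token.endswith(suffix) and len(token) > len(suffix) + 2:
                token = token[: -len(suffix)]
                break
        stems.append(token)
    return stems


def ochiai(tokens_a, tokens_b):
    """Ochiai coefficient between two token collections, treated as sets."""
    a, b = set(tokens_a), set(tokens_b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


review_1 = preprocess("The stocks are rising")   # ['stock', 'ris']
review_2 = preprocess("This stock is falling")   # ['stock', 'fall']
score = ochiai(review_1, review_2)               # 1 shared stem of 2 and 2 -> 0.5
```

A score near 1 means the two reviews share most of their stems, and a score of 0 means they share none. How the thesis maps such scores to positive or negative labels is not specified in the abstract.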

Experimental evaluation is carried out on prediction accuracy, precision, recall, and prediction time versus the number of customer reviews. Conventional sentiment analysis techniques do not provide high accuracy, which reduces the reliability of stock market prediction. To improve prediction performance, a Gensim Lovins Truncative Morisita-Horn’s Broken-stick Regression-based Recursive Deep Neural Network (GLTMBR-RDNN) is introduced for predicting future stock market outcomes with a lower error rate and minimal time. The customer reviews are collected from a large database. The GLTMBR-RDNN comprises several layers for learning from the given input reviews.
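For reference, the evaluation metrics named in this abstract can be computed from a confusion matrix as in the sketch below. It uses the standard textbook definitions for a binary positive/negative task. The function name `binary_metrics` and the label strings are assumptions made for illustration, and the sample labels are made up rather than taken from the thesis.

```python
def binary_metrics(y_true, y_pred, positive="positive"):
    """Accuracy, precision, recall, and F-measure for a two-class task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure is the harmonic mean of precision and recall.
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Made-up labels, for illustration only.
truth = ["positive", "positive", "negative", "negative"]
guess = ["positive", "negative", "positive", "negative"]
metrics = binary_metrics(truth, guess)
```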

In the GLTMBR-RDNN technique, text preprocessing is carried out in the first hidden layer through stop-word removal, stemming, truncation, and so on. First, the Gensim tokenizer is applied to partition the text into words. The proposed GLTMBR-RDNN technique then uses a Sklearn (scikit-learn) model for stop-word removal. Finally, normalization transforms the words into a standard form. After preprocessing, Morisita-Horn’s Broken-stick Regression is performed in the second hidden layer to predict the future stock market value from the classification of customer reviews, with the breakpoint set to the similarity score between the reviews.
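The two ingredients named in the second hidden layer can be sketched separately. The first function computes the Morisita-Horn similarity in its standard form, C = 2 Σ a_i b_i / ((D_a + D_b) · N_a · N_b) with D_a = Σ a_i² / N_a², over word counts of two reviews. The second evaluates a continuous two-segment ("broken-stick") linear function. The function names, parameters, and sample values are hypothetical, and how the thesis couples the similarity score to the breakpoint inside the network is not specified in the abstract, so that coupling is not modelled here.

```python
from collections import Counter


def morisita_horn(tokens_a, tokens_b):
    """Morisita-Horn similarity between two token lists, using word counts."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    n_a, n_b = sum(counts_a.values()), sum(counts_b.values())
    if n_a == 0 or n_b == 0:
        return 0.0
    vocab = set(counts_a) | set(counts_b)
    cross = sum(counts_a[w] * counts_b[w] for w in vocab)
    d_a = sum(c * c for c in counts_a.values()) / (n_a * n_a)
    d_b = sum(c * c for c in counts_b.values()) / (n_b * n_b)
    return 2 * cross / ((d_a + d_b) * n_a * n_b)


def broken_stick(x, break_point, intercept, slope_left, slope_right):
    """Continuous piecewise-linear function with one breakpoint."""
    if x <= break_point:
        return intercept + slope_left * x
    # Continue from the value reached at the breakpoint with the new slope.
    return intercept + slope_left * break_point + slope_right * (x - break_point)


similarity = morisita_horn(["a", "a", "b"], ["a", "b", "b"])  # 0.8
# Illustrative coefficients only; in practice they would be fitted to data.
value = broken_stick(1.0, break_point=0.5, intercept=0.0,
                     slope_left=1.0, slope_right=2.0)          # 1.5
```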

In this way, future stock market values are identified efficiently, with enhanced classification accuracy, in the output layer. The results of the proposed GLTMBR-RDNN technique are analyzed using accuracy, precision, recall, F-measure, and prediction time for different numbers of input reviews. The results indicate that the proposed GLTMBR-RDNN technique improves accuracy and reduces prediction time compared with existing methods. In recent decades, sentiment analysis has been used in commodity markets to analyze text data related to commodities, such as news articles and social media posts, in order to understand the emotions expressed in the text.

To determine people's outlook and sentiments regarding a commodity, text sentiment analysis of the opinions expressed by users is essential. In this work, a novel Qualitative Index Multilayer Extreme Learning Machine (QIMELM) model is introduced for sentiment analysis in commodity markets. First, news texts are collected from the commodity markets dataset. Then, the collected news texts are preprocessed through three sub-processes: tokenization, stemming, and stop-word removal. With the preprocessed results, sentiment analysis is carried out to classify the opinions. The Tversky qualitative index is applied in the hidden layer to examine words and determine the sentiment or emotional tendency of the text, classifying it as positive, negative, or neutral. The extreme learning machine provides sentiment classification outcomes at the output layer with minimal error, ensuring a more accurate sentiment analysis of commodity markets. The proposed QIMELM is assessed experimentally using accuracy, precision, recall, F-measure, prediction time, and space complexity. The quantitative results indicate that the proposed QIMELM improves the accuracy, precision, recall, and F-measure of sentiment classification, with lower time and space complexity than conventional methods.
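As a rough illustration of the Tversky index mentioned above, the sketch below uses its standard set form, S(X, Y) = |X ∩ Y| / (|X ∩ Y| + α|X − Y| + β|Y − X|), to compare the words of a news text with a sentiment word list. The word lists, the function name `tversky`, and the default weights are hypothetical choices for illustration. The abstract does not say how QIMELM sets α and β or how the index feeds the extreme learning machine.

```python
def tversky(x, y, alpha=0.5, beta=0.5):
    """Tversky index between two collections, treated as sets."""
    x, y = set(x), set(y)
    common = len(x & y)
    denom = common + alpha * len(x - y) + beta * len(y - x)
    return common / denom if denom else 0.0


# Hypothetical sentiment lexicons and news tokens (illustration only).
POSITIVE_WORDS = {"gain", "surge", "strong"}
NEGATIVE_WORDS = {"loss", "drop", "weak"}

news_tokens = ["gold", "price", "surge", "strong"]
pos_score = tversky(news_tokens, POSITIVE_WORDS)
neg_score = tversky(news_tokens, NEGATIVE_WORDS)
```

With α = β = 1 the index reduces to the Jaccard coefficient, and with α = β = 0.5 it reduces to the Dice coefficient, so the weights control how strongly words unique to either side are penalized.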
