Advancing Machine Learning in E-Commerce and Predictive Analytics

Department of Business Administration and Graduate Institute of Business Administration

Written by ANOOP REMANAN SYAMALA, Graduate Institute of Business Administration

The second session of the Fall 2024 Operations Management seminar featured a presentation by the Chair of the Department of Information Management at National Taiwan University, who previously served as Deputy Director at TSMC. Drawing on his background in machine learning, financial text analysis, and medical informatics, he presented two recent research projects: “Product Representation Learning in Multi-store E-invoice Transaction Data for Product Relationship Understanding” and “Nonparametric Regression via Variance-Adjusted Gradient Boosting Gaussian Process Regression.” Together, the two talks examined current challenges and advances in data-driven decision-making and artificial intelligence.

In his first presentation, the speaker highlighted the importance of representation learning, especially in e-commerce. The study developed data-driven techniques for identifying two kinds of product relationships in multi-store settings: substitutes and complements. The research addressed two questions: how to build models that learn product representations across different stores, and how to evaluate the quality of those models. The dataset consisted of 2021 e-invoice data from five major retailers in Taiwan, from which a 5% random sample of all transactions was drawn. The evaluation comprised three tasks: identifying complementary products, identifying substitute products, and a manual review, with hit rates used to measure model accuracy.
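The summary does not define how the hit rate was computed. As a minimal sketch, assuming one common definition (the fraction of query products whose known related product appears among the query's top-k most similar products by cosine similarity), the evaluation could look like the code below. The function names, the toy product vectors, and the query/target pairs are all hypothetical and made up for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_k_neighbors(query, embeddings, k):
    """Return the k product IDs most similar to `query`, excluding the query itself."""
    scores = [(cosine(embeddings[query], vec), pid)
              for pid, vec in embeddings.items() if pid != query]
    scores.sort(key=lambda s: (-s[0], s[1]))  # highest similarity first, ties broken by ID
    return [pid for _, pid in scores[:k]]

def hit_rate_at_k(pairs, embeddings, k):
    """Fraction of (query, target) pairs whose target is in the query's top-k list."""
    hits = sum(1 for q, t in pairs if t in top_k_neighbors(q, embeddings, k))
    return hits / len(pairs)

# Hypothetical 2-D product vectors and ground-truth pairs (toy data only).
embeddings = {
    "milk": [1.0, 0.1],
    "cereal": [0.9, 0.2],
    "oat_milk": [0.2, 1.0],
    "soy_milk": [0.25, 0.95],
}
pairs = [("milk", "cereal"), ("oat_milk", "soy_milk"), ("milk", "soy_milk")]
print(hit_rate_at_k(pairs, embeddings, 1))  # prints 0.6666666666666666
print(hit_rate_at_k(pairs, embeddings, 2))  # prints 1.0
```

Separate pair lists for complements and for substitutes would yield the two task-specific hit rates; the manual review step has no code counterpart.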

Key findings showed that the ID-SG model outperformed the term-SG and BERT-SG models, particularly in identifying complementary products. The models had opposite weaknesses: the ID-SG model had more difficulty locating substitutes, whereas the term-SG and BERT-SG models struggled to identify complements. The speaker attributed this to the tendency of products with lexically similar names to belong to the same category, which makes substitutes easier to find for the text-based models. The seminar also discussed using transformer-based methods to refine the models further. The study demonstrated practical value, particularly for improving recommendation systems in both online and offline retail. However, the speaker emphasized that while practical applications are significant, academic contributions matter equally to researchers.
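The summary does not expand “ID-SG.” Assuming “SG” stands for a skip-gram model and “ID” means one vector is learned per product identifier (rather than from name terms or BERT encodings), the sketch below shows how such embeddings could be learned from transaction baskets with skip-gram and negative sampling, treating every other item in the same basket as context. The function name, hyperparameters, and toy baskets are hypothetical; this is not the speaker's implementation.

```python
import numpy as np

def train_item_skipgram(baskets, dim=8, epochs=100, lr=0.05, n_neg=2, seed=0):
    """Learn one embedding per product ID with skip-gram + negative sampling.

    Hypothetical sketch: every other item in the same basket is a positive
    context; negatives are drawn uniformly from all product IDs.
    """
    rng = np.random.default_rng(seed)
    items = sorted({item for basket in baskets for item in basket})
    index = {item: n for n, item in enumerate(items)}
    W_in = rng.normal(scale=0.1, size=(len(items), dim))   # target ("input") vectors
    W_out = rng.normal(scale=0.1, size=(len(items), dim))  # context ("output") vectors

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    for _ in range(epochs):
        for basket in baskets:
            ids = [index[item] for item in basket]
            for t in ids:
                for c in ids:
                    if c == t:
                        continue
                    negs = rng.integers(0, len(items), size=n_neg)
                    targets = np.concatenate(([c], negs))
                    labels = np.zeros(len(targets))
                    labels[0] = 1.0                  # true context = 1, negatives = 0
                    v = W_in[t].copy()
                    u = W_out[targets]               # fancy indexing returns a copy
                    grad = sigmoid(u @ v) - labels   # d(loss)/d(score) for each target
                    W_in[t] -= lr * (grad @ u)
                    # np.add.at accumulates correctly even if an index repeats
                    np.add.at(W_out, targets, -lr * np.outer(grad, v))
    return items, W_in, W_out

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy baskets, made up for illustration.
baskets = [["milk", "cereal", "bowl"], ["milk", "cereal"], ["coffee", "sugar"], ["tea", "sugar"]]
items, W_in, W_out = train_item_skipgram(baskets)
pos = {item: n for n, item in enumerate(items)}
print(cosine(W_in[pos["milk"]], W_in[pos["cereal"]]))
```

Negatives are sampled uniformly and may occasionally coincide with a true context item; word2vec-style implementations typically refine this with frequency-based sampling.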

The second presentation centered on a recently published work in nonparametric regression: Variance-Adjusted Gradient Boosting Gaussian Process Regression (VAGR). The method addresses a key limitation of traditional Gaussian Process Regression (GPR), which, while effective, is computationally expensive: exact GPR requires time cubic and memory quadratic in the number of training points. VAGR builds on the Gaussian process assumption that inputs close to each other in the input space have highly correlated outputs, and it integrates the Bayesian Committee Machine (BCM), which combines the predictions of multiple GP models trained on subsets of the data, to handle the difficulty of global approximation. By adjusting for variance within gradient boosting, the approach aims to improve both prediction accuracy and computational efficiency.
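To make the GPR and BCM ingredients concrete, the sketch below fits exact GP experts (zero prior mean, squared-exponential kernel) on disjoint random subsets of the data and combines them with the standard BCM rule: each expert's prediction is weighted by its precision, and the prior precision is subtracted M − 1 times so it is counted only once. This illustrates those assumptions only. The kernel, its hyperparameters, the number of experts, the toy sine data, and all function names are hypothetical, and VAGR's variance adjustment inside gradient boosting is not reproduced here.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel: nearby inputs get covariance close to signal_var."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return signal_var * np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def gp_predict(X, y, X_star, noise_var=1e-2, **kern):
    """Exact GP posterior mean and latent variance at X_star (zero prior mean)."""
    K = rbf_kernel(X, X, **kern) + noise_var * np.eye(len(X))
    L = np.linalg.cholesky(K)  # O(n^3) time, O(n^2) memory: the cost BCM-style splits reduce
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    K_s = rbf_kernel(X, X_star, **kern)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(rbf_kernel(X_star, X_star, **kern)) - np.sum(v**2, axis=0)
    return mean, np.maximum(var, 1e-12)

def bcm_predict(X, y, X_star, n_experts=4, noise_var=1e-2, seed=0, **kern):
    """Bayesian Committee Machine over GP experts trained on disjoint random subsets.

    precision = sum_k 1/var_k - (M - 1)/prior_var
    mean      = (sum_k mean_k/var_k) / precision
    """
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(len(X)), n_experts)
    prior_var = np.diag(rbf_kernel(X_star, X_star, **kern))
    precision = -(n_experts - 1) / prior_var  # remove the extra copies of the prior
    weighted_mean = np.zeros(len(X_star))
    for p in parts:
        m, v = gp_predict(X[p], y[p], X_star, noise_var=noise_var, **kern)
        precision += 1.0 / v
        weighted_mean += m / v
    var = 1.0 / precision
    return weighted_mean * var, var

# Toy 1-D data: noisy samples of sin(x), made up for illustration.
rng = np.random.default_rng(1)
X = np.sort(rng.uniform(0, 6, size=200))[:, None]
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)
X_star = np.linspace(0.5, 5.5, 5)[:, None]
mean, var = bcm_predict(X, y, X_star, n_experts=4, noise_var=0.01)
print(np.round(mean, 3), np.round(np.sin(X_star).ravel(), 3))
```

Each expert factorizes a 50 × 50 matrix instead of one 200 × 200 matrix, which is the kind of saving that motivates committee approximations of GPR.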

Results showed that VAGR performed comparably to or better than established baseline models such as Random Forest (RF), Support Vector Regression (SVR), and XGBoost. The speaker also discussed future research directions, including extending the method to classification and survival regression problems.

The presentations underscored the speaker’s dedication to solving real-world problems through innovative machine-learning techniques. The research on product representation learning offers potential improvements for recommendation systems, while advances in regression methods can significantly enhance predictive modeling. During the discussion, critical questions arose about balancing practical and academic contributions. While the real-world applications are promising, deeper theoretical exploration is needed to strengthen the work's academic impact. Additionally, the difficulty of distinguishing complements from substitutes presents an opportunity for further investigation, possibly through transformer-based enhancements.

Overall, the second session of the Fall 2024 Operations Management seminar provided valuable insights into the latest techniques in machine learning, demonstrating their potential applications in e-commerce and predictive analytics. The speaker’s contributions lay a solid foundation for future research and practical implementations in data-driven decision-making.