Protecting Digital Library Resources through ML-Based Detection of Automated Access Patterns
Automated bots pose a growing challenge for digital libraries: they consume bandwidth, skew usage statistics, and risk violating licensed content agreements. Unlike human users, bots can send thousands of rapid or patterned requests, overwhelming servers and obscuring genuine research activity. To address this, the tool presented here analyzes web traffic logs to detect abnormal per-IP behavior indicative of automation. It extracts behavioral features, such as request frequency, referrer consistency, and HTTP method patterns, and evaluates them using an ensemble of data-driven anomaly detection methods. Each IP address is assigned an overall anomaly score, and addresses whose score exceeds a threshold are flagged as potential bots. The system outputs ranked reports, visual summaries, and interpretable metrics to help library staff monitor and mitigate automated activity. By providing a scalable, unsupervised framework that requires no labeled data, the tool helps libraries preserve system integrity, ensure fair access to resources, and maintain the accuracy of the usage analytics essential for decision-making and licensing compliance.
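
As an illustration of the pipeline described above (per-IP feature extraction, unsupervised scoring, thresholding, ranked output), the following is a minimal sketch and not the tool's actual implementation. The abstract does not name the detectors in the ensemble, so the sketch substitutes a simple stand-in: a robust z-score (median and MAD) computed for each feature, averaged into one score per IP. The record layout `(ip, timestamp, method, referrer)`, the function names, the feature names, and the threshold of 3.5 are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from statistics import mean, median


def extract_features(records):
    """Aggregate per-IP features from (ip, unix_ts, method, referrer) tuples.

    The tuple layout is hypothetical; a real log parser would produce it.
    """
    by_ip = defaultdict(list)
    for ip, ts, method, referrer in records:
        by_ip[ip].append((ts, method, referrer))

    features = {}
    for ip, rows in by_ip.items():
        n = len(rows)
        times = [ts for ts, _, _ in rows]
        # Active time span in seconds, floored at 1 to avoid division by zero.
        span = max(max(times) - min(times), 1.0)
        top_referrer_hits = Counter(ref for _, _, ref in rows).most_common(1)[0][1]
        features[ip] = {
            "req_per_min": 60.0 * n / span,                 # request frequency
            "referrer_consistency": top_referrer_hits / n,  # share of most common referrer
            "non_get_share": sum(m != "GET" for _, m, _ in rows) / n,  # HTTP method pattern
        }
    return features


def robust_z(values):
    """Absolute distance from the median, scaled by 1.4826 * MAD."""
    med = median(values)
    deviations = [abs(v - med) for v in values]
    scale = 1.4826 * median(deviations)
    if scale == 0:
        # MAD is zero when most values are identical; fall back to the
        # mean absolute deviation, or 1.0 if every value is the same.
        scale = mean(deviations) or 1.0
    return [d / scale for d in deviations]


def score_ips(features):
    """Stand-in ensemble: average the per-feature robust z-scores for each IP."""
    if not features:
        return {}
    ips = sorted(features)
    names = sorted(next(iter(features.values())))
    per_feature = [robust_z([features[ip][name] for ip in ips]) for name in names]
    return {ip: mean(col[i] for col in per_feature) for i, ip in enumerate(ips)}


def flag_bots(scores, threshold=3.5):
    """Return (ip, score) pairs above the threshold, highest score first.

    The default threshold is illustrative and would need tuning on real logs.
    """
    flagged = [(ip, s) for ip, s in scores.items() if s > threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)
```

Calling `flag_bots(score_ips(extract_features(records)))` on parsed log records yields the ranked list of suspect IPs. Because the scores are computed relative to the median behavior in the same log window, no labeled examples of bot traffic are needed, which matches the unsupervised setting described above. Note that this stand-in scores deviation in either direction, so an unusually quiet IP can also receive a nonzero score.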