▍1. Weibo Data Mining
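Mines Sina Weibo data by calling the Sina API from Python: it retrieves Weibo posts for a specified latitude/longitude location and can also write them to a MySQL database.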
Uses the Apriori algorithm to mine associations in the KDD99 intrusion detection dataset in order to detect unknown attacks.
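As a rough illustration of the frequent-itemset step that Apriori relies on, here is a minimal pure-Python sketch. The `apriori` function, the `min_support` threshold, and the toy records (with KDD99-style attribute names such as `protocol` and `service`) are assumptions made for this example, not the entry's actual code, and the sketch stops at frequent itemsets rather than going on to rules or attack detection.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level-1 candidates: every individual item.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-itemsets.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

# Hypothetical connection records, each a set of "attribute=value" items.
records = [
    {"protocol=tcp", "service=http", "flag=SF"},
    {"protocol=tcp", "service=http", "flag=SF"},
    {"protocol=tcp", "service=smtp", "flag=SF"},
    {"protocol=udp", "service=domain_u", "flag=SF"},
]
freq = apriori(records, min_support=0.5)
for itemset, support in sorted(freq.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), support)
```

Association rules would then be derived from these itemsets by comparing the support of an itemset with that of its subsets.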
Classifier performance comparison and tuning: trains models on the data with the decision tree, naive Bayes, and kNN classifiers from the scikit-learn package, with the aim of understanding how they work and how to apply them. The performance of the three classifiers in the experiments is compared and their characteristics are analyzed. The datasets used in this experiment are house and segment.
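A comparison of this kind might look like the sketch below. It assumes scikit-learn is installed and uses synthetic data from `make_classification` as a stand-in, because the house and segment datasets are not included here. The hyperparameters are arbitrary starting points, not tuned values.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 300 samples, 10 features, 2 classes.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "naive Bayes": GaussianNB(),
    "kNN": KNeighborsClassifier(n_neighbors=5),
}

# Mean 5-fold cross-validated accuracy for each classifier.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in models.items()
}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Tuning could then proceed by wrapping each model in `GridSearchCV` over, for example, tree depth or the number of neighbors.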
Determines the number of clusters for Gaussian mixture model (GMM) clustering using the Bayesian information criterion (BIC).
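One common way to do this is to fit a GMM for each candidate cluster count and keep the one with the lowest BIC. The sketch below assumes NumPy and scikit-learn are available. The helper `select_k_by_bic` and the synthetic three-blob data are made up for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def select_k_by_bic(X, k_range):
    """Fit one GMM per candidate k and return (best_k, {k: BIC})."""
    bics = {}
    for k in k_range:
        gmm = GaussianMixture(n_components=k, random_state=0).fit(X)
        bics[k] = gmm.bic(X)  # lower BIC is better
    best_k = min(bics, key=bics.get)
    return best_k, bics

# Synthetic data (assumption): three well-separated 2-D Gaussian blobs.
rng = np.random.default_rng(0)
centers = [(0, 0), (5, 5), (0, 5)]
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 2)) for c in centers])

best_k, bics = select_k_by_bic(X, range(1, 7))
print("selected number of clusters:", best_k)
```

BIC penalizes the log-likelihood by the number of free parameters, so extra components are only kept when they improve the fit enough to pay for themselves.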
A current-affairs data visualization system that gives a clear view of the distribution and density of the global epidemic, so that appropriate countermeasures can be taken.
Machine Learning in Action: Chinese and English PDFs, datasets, and code.
Python data preprocessing examples, covering data cleaning, data integration, data transformation, and other operations.
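The three operations named above can be sketched with the standard library alone. The helpers `clean`, `integrate`, and `min_max`, along with the toy `people` and `incomes` tables, are hypothetical and only show one simple form each of cleaning (deduplication plus mean imputation), integration (an inner join), and transformation (min-max scaling).

```python
from statistics import mean

def clean(rows, field):
    """Drop rows with a repeated id and fill missing `field` values with the mean."""
    seen, unique = set(), []
    for row in rows:
        if row["id"] not in seen:
            seen.add(row["id"])
            unique.append(dict(row))
    fill = mean(r[field] for r in unique if r[field] is not None)
    for r in unique:
        if r[field] is None:
            r[field] = fill
    return unique

def integrate(left, right, key="id"):
    """Inner-join two lists of dicts on `key`."""
    index = {r[key]: r for r in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]

def min_max(rows, field):
    """Rescale `field` to [0, 1]; assumes the column is not constant."""
    lo = min(r[field] for r in rows)
    hi = max(r[field] for r in rows)
    for r in rows:
        r[field] = (r[field] - lo) / (hi - lo)
    return rows

people = [
    {"id": 1, "age": 20},
    {"id": 2, "age": None},
    {"id": 2, "age": None},  # duplicate row
    {"id": 3, "age": 40},
]
incomes = [
    {"id": 1, "income": 3000},
    {"id": 2, "income": 5000},
    {"id": 3, "income": 7000},
]

table = min_max(integrate(clean(people, "age"), incomes), "income")
for row in table:
    print(row)
```

In practice a library such as pandas would usually handle these steps; the plain-Python version is only meant to make each operation explicit.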
Prediction with a simple LSTM; a dataset is included for easy testing.
Cluster analysis of the economic situation of 31 provinces and municipalities, using the mean of each cluster to gauge the level of economic development of the provinces and municipalities in it.
An application of data mining to economic crime investigation projects, using a community detection algorithm in Python.
Retrieves from a database all driving records of a vehicle over a period of time; the data required are the GPS latitude/longitude coordinates, the driving duration, and so on. The QB model adopts the MDF approach, whose basic idea is to define the distance between two trajectories through the mean direct-flip distance function. The two trajectories must have the same number of latitude/longitude points. The main advantage of equal point counts is that trajectory distances can be computed pairwise, with higher resolution between similar trajectories, which improves the trajectory clustering results to some extent.
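The direct-flip distance described above can be sketched as follows: average the point-to-point distances once in the given order and once with one trajectory reversed, then take the smaller value. The function name `mdf_distance` is chosen for this example, and treating latitude/longitude pairs as planar coordinates with Euclidean distance is a simplifying assumption; a real GPS pipeline would likely project the coordinates or use a great-circle distance.

```python
from math import dist

def mdf_distance(a, b):
    """Minimum of the mean direct and mean flipped point-wise distances."""
    if len(a) != len(b):
        raise ValueError("trajectories must have the same number of points")
    k = len(a)
    # Direct: compare point i of a with point i of b.
    direct = sum(dist(p, q) for p, q in zip(a, b)) / k
    # Flipped: compare point i of a with point k-1-i of b.
    flipped = sum(dist(p, q) for p, q in zip(a, reversed(b))) / k
    return min(direct, flipped)

# Two parallel tracks recorded in opposite directions (toy coordinates).
track_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
track_b = [(2.0, 1.0), (1.0, 1.0), (0.0, 1.0)]

print(mdf_distance(track_a, track_b))  # 1.0
```

Taking the minimum over both orientations makes the distance independent of the direction in which a trajectory was recorded, which is why the two inputs need the same number of points.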
PCA dimensionality reduction in Python for data analysis and mining.
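For reference, PCA can be written in a few lines with NumPy by centering the data and taking its singular value decomposition. The `pca` helper and the random test matrix below are assumptions for illustration, not the entry's code.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components via SVD of the centered data."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained_variance = (S ** 2) / (len(X) - 1)
    return X_centered @ components.T, components, explained_variance[:n_components]

# Random 5-D data (assumption) with one strongly correlated pair of columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 1] = 2 * X[:, 0] + 0.1 * X[:, 1]

Z, components, variance = pca(X, n_components=2)
print(Z.shape, variance)
```

The returned `variance` values show how much of the spread each retained component captures, which is the usual basis for choosing how many components to keep.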