IEEE Spectrum, September 29, 10:49
Methods and Data Sources for Assessing Programming Language Popularity

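This article describes a method for assessing the popularity of programming languages. It combines multiple proxy signals to measure how actively each language is used and how much attention it receives. The study identifies 64 programming languages and collects data from Google Search, Stack Overflow, the IEEE Xplore Digital Library, the IEEE Job Site, CareerBuilder, GitHub, the Trinity College Dublin library, Discord, and other sources. Each metric is normalized and weighted to produce composite popularity indices. The indices cover three dimensions: active use among IEEE members and software engineers (the “Spectrum” ranking), employer demand (the “Jobs” ranking), and online buzz (the “Trending” ranking). The article explains how, when, and with what processing each source was collected, with the aim of producing a comprehensive and objective ranking.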

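📊 **Combining multiple data sources to assess popularity**: The article takes a composite approach to programming-language popularity. It combines data from Google Search, Stack Overflow, the IEEE Xplore Digital Library, the IEEE Job Site, CareerBuilder, GitHub, the Trinity College Dublin library, and Discord to capture both how actively languages are used and how much public attention they get. Merging several sources helps offset the limitations of any single one and yields more robust results.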

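📈 **Three dimensions of popularity**: Popularity is broken into three core dimensions. The “Spectrum” ranking measures actual use among IEEE members and software engineers. The “Jobs” ranking reflects employer demand. The “Trending” ranking captures current online attention. Separating them shows a language's influence in different settings, from academic research to the job market to public awareness.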

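🔍 **Careful data collection and processing**: The article stresses rigor in data collection. Data are gathered manually to cope with API changes, and result sets that are too large are sampled at a 95 percent confidence level. Normalization, weighting, and renormalization are intended to keep the final indices objective and comparable. The authors acknowledge that the weights are subjective but note that they rest on the team's understanding of the sources.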

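🛠️ **Source-by-source collection methods**: The article details the collection process for each source. Examples include the Google search query template, Stack Overflow tag counts, sampled review of IEEE Xplore and job-site results, filtering of GitHub's language list, book searches in the Trinity College Dublin library, and Discord tag counts. These details show the team's careful, practical approach to data gathering.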



Our Top Programming Languages interactive tries to tackle the problem of estimating a language’s popularity by looking for proxy signals. We do this by constructing measures of popularity from a variety of data sources that we believe are good proxies for active interest for each programming language. In total, we identify 64 programming languages. We then weight each data source to create an overall index of popularity, excluding some of the lowest scorers. Below, we describe the sources of data we use to get the measures, and the weighting scheme we use to produce the overall indices.

By popularity, we mean we are trying to rank languages that are in active use, including activity from maintaining legacy systems. We look at three different aspects of popularity: languages in active use among typical IEEE members and working software engineers (the “Spectrum” ranking), languages that are in demand by employers (the “Jobs” ranking), and languages that are in the zeitgeist (the “Trending” ranking).

We gauged the popularity of languages using the following sources for a total of seven metrics (see below). We gathered the information for all metrics in July and August 2025. In the past we relied heavily on APIs to gather data from sources, but we now gather the data manually for two reasons. First, it was difficult to keep up with API changes and terminations. Second, many programming languages’ names (C++, Scheme) collided with common terms found in research papers and job ads or were difficult for a search engine to parse. Sometimes a search returned too many results to resolve ambiguities by examining each one individually. In those cases we sampled the results from that data source, choosing the sample size needed to estimate the true mean with 95 percent confidence. Not all data sources contain information for every programming language. We interpret missing information as the language having “no hits” (that is, not being popular).
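The article does not spell out its sample-size formula. The sketch below assumes the standard Cochran formula for a proportion with a finite population correction, a 5 percent margin of error, and the worst-case proportion p = 0.5. All three are assumptions; the source states only the 95 percent confidence level. With a result set of a few thousand items, the calculation lands a little above 300, in line with the sample sizes reported for IEEE Xplore and the IEEE Job Site below.

```python
import math


def sample_size(population: int, z: float = 1.96,
                margin: float = 0.05, p: float = 0.5) -> int:
    """Sample size needed to estimate a proportion at the given confidence.

    Assumes Cochran's formula with a finite population correction.
    z = 1.96 corresponds to 95 percent confidence; the 5 percent margin
    and p = 0.5 (the most conservative choice) are illustrative assumptions.
    """
    n0 = z ** 2 * p * (1 - p) / margin ** 2   # about 384 for the defaults
    n = n0 / (1 + (n0 - 1) / population)      # shrink for a finite result set
    return math.ceil(n)


print(sample_size(2000))  # 323
print(sample_size(5000))
```

As the result set grows, the required sample approaches about 385, so even very large searches need only a few hundred manually checked items.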

The results from each metric are normalized to produce a relative popularity score between 0 and 1. The individual metrics are then multiplied by a weight factor and combined, and the result is renormalized to produce an aggregate popularity score. By aggregating metrics, we hope to compensate for statistical quirks that might distort a language’s popularity score in any single data source. Varying the weight factors allows us to emphasize different types of popularity and produce the separate Spectrum, Jobs, and Trending rankings. We acknowledge that these weights are subjective, but they are based on our understanding of the sources and our prior coverage of software topics.
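The normalize, weight, combine, and renormalize pipeline can be sketched as follows. The article does not say how each metric is scaled to the 0-to-1 range, so dividing by the metric's maximum is an assumption here. The metric names, counts, and weights in the example are also hypothetical. The real rankings are computed in R with the authors' own weights.

```python
def normalize(raw: dict[str, float], languages: list[str]) -> dict[str, float]:
    """Scale one metric to [0, 1] by dividing by its maximum (an assumed scheme).

    Languages absent from the source get 0, matching the "no hits" rule.
    """
    top = max(raw.values(), default=0.0)
    return {lang: raw.get(lang, 0.0) / top if top > 0 else 0.0 for lang in languages}


def aggregate(metrics: dict[str, dict[str, float]], weights: dict[str, float],
              languages: list[str]) -> dict[str, float]:
    """Weighted sum of normalized metrics, renormalized so the leader scores 1."""
    combined = {lang: 0.0 for lang in languages}
    for name, raw in metrics.items():
        scaled = normalize(raw, languages)
        for lang in languages:
            combined[lang] += weights.get(name, 0.0) * scaled[lang]
    top = max(combined.values(), default=0.0)
    return {lang: v / top if top > 0 else 0.0 for lang, v in combined.items()}


# Hypothetical counts and weights, for illustration only.
languages = ["Python", "Java", "Fortran"]
metrics = {
    "search_hits": {"Python": 1000, "Java": 800},  # Fortran has no hits here
    "job_ads": {"Python": 50, "Java": 100, "Fortran": 10},
}
weights = {"search_hits": 1.0, "job_ads": 2.0}
print(aggregate(metrics, weights, languages))
```

In this toy example, giving job ads twice the weight puts Java ahead of Python even though Python has more search hits. Shifting weights in this way is how a single data set can yield several differently ordered rankings.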

The Top Programming Languages was originally created by data journalist Nick Diakopoulos. Our statistical methodology advisor is Hilary Wething, although all the actual data gathering and calculation is performed by us. Rankings are computed using R.

Google

Google is the leading search engine in the world, making it an ideal fit for estimating language popularity. We measured the number of hits for each language by searching for the template “X programming language” (with quotation marks) and manually recording the number of results returned. We took the measurement in July 2025. We like this measure because it indicates the volume of online information resources about each programming language.

Stack Overflow

Stack Overflow is a popular site where programmers can ask questions about coding. We recorded the number of questions tagged with each language in the week before our search (August 2025). For the Mathematica/Wolfram language, we relied on Stack Overflow’s sister Stack Exchange site for Mathematica and tallied the number of programming-related questions asked in the past week. These data were gathered manually. This measure indicates which programming languages are currently trending.

IEEE Xplore Digital Library

IEEE maintains a digital library with millions of conference and journal articles covering a wide array of scientific and engineering disciplines. Using the template “X programming,” we searched for journal, magazine, and early access articles published so far in 2025 that mention each language. For searches that returned thousands of articles, we identified the correct sample size for a 95 percent confidence interval (usually a little over 300) and pulled that number of articles. For each language we sampled, we identified the share of articles that actually use the programming language. We then multiplied the total number of articles by this share to estimate how many articles reference that language. We conducted this search in August 2025. This metric captures how prevalent each programming language is in engineering scholarship, both as a tool that is used and as a subject that is referenced.
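The extrapolation step is simple arithmetic: multiply the total result count by the share of sampled items that genuinely use the language. A minimal sketch follows. The normal-approximation interval is an illustrative extra, not something the article reports; it shows how much uncertainty a roughly 300-item sample leaves. The counts in the example are made up.

```python
import math


def estimate_total(total_hits: int, sample_flags: list[bool]) -> tuple[float, float, float]:
    """Extrapolate how many of `total_hits` results truly reference a language.

    `sample_flags` holds one manual judgment per sampled result (True = relevant).
    Returns (estimate, low, high), where low and high come from a 95 percent
    normal-approximation interval on the sampled share.
    """
    n = len(sample_flags)
    share = sum(sample_flags) / n
    half_width = 1.96 * math.sqrt(share * (1 - share) / n)
    low = max(0.0, share - half_width)
    high = min(1.0, share + half_width)
    return total_hits * share, total_hits * low, total_hits * high


# Hypothetical: 4,000 hits, 320 sampled, 240 judged relevant.
flags = [True] * 240 + [False] * 80
print(estimate_total(4000, flags))  # estimate 3000.0, interval roughly 2810 to 3190
```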

IEEE Job Site

We measured the demand for different programming languages in job postings on the IEEE Job Site. For searches that returned thousands of listings, we identified the correct sample size for a 95 percent confidence interval (usually around 300 results) and pulled that number of job listings to examine manually. For each language we sampled, we identified the share of listings that actually call for the programming language. We then multiplied the total number of job listings by this share to estimate how many listings reference that language. Some of the languages we track, such as Go, J, Ada, and R, could be ambiguous in plain text. For these, we searched for job postings with those words in the job description and examined the results manually, again sampling entries if the number of results was large. The search was conducted in August 2025. We like the IEEE Job Site for its large number of non-U.S. listings, which makes it an ideal source for measuring global popularity.
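Why do names like Go, J, Ada, and R need a human pass? Even a case-sensitive, whole-word match turns up false positives. The sketch below is a hypothetical prefilter; the `candidates` helper and sample postings are invented for illustration. The article itself relied on the job site's own search followed by manual review. The prefilter narrows postings to those containing the bare word, and hits such as "R&D" and "Ada Lovelace" show what manual examination still has to weed out.

```python
import re


def candidates(postings: list[str], name: str) -> list[str]:
    """Return postings that contain `name` as a standalone, case-sensitive token.

    Hypothetical prefilter: a neighboring word character, '+', or '#' rules out
    a match, so "Go" will not match "Google" and "C" will not match "C++".
    Every hit still needs manual review.
    """
    pattern = re.compile(rf"(?<![\w+#]){re.escape(name)}(?![\w+#])")
    return [text for text in postings if pattern.search(text)]


postings = [
    "Experience with R and Python required",
    "Lead our R&D team",
    "Go developer wanted",
    "Ready to go?",
    "Ada Lovelace fellowship",
    "C++ engineer",
]
for name in ["Go", "R", "Ada"]:
    print(name, candidates(postings, name))
```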

CareerBuilder

We measured the demand for different programming languages on the CareerBuilder job site. We searched for “Developer” jobs offered within the United States, as this is the most common job title for programmers. We sampled 400 job ads and manually examined them to identify which languages employers mentioned in the postings. The search was conducted in August 2024. We like the CareerBuilder site for identifying the popularity of programmer jobs among U.S.-based companies.
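Once each of the 400 sampled ads has been read, the measure reduces to counting how many ads mention each language. A minimal sketch follows, assuming each ad's manual annotation is stored as a set of language names (a hypothetical representation). Because one ad can name several languages, the shares do not sum to 1.

```python
from collections import Counter


def mention_shares(annotated_ads: list[set[str]]) -> dict[str, float]:
    """Share of sampled ads that mention each language, most common first.

    Each element is the set of languages a reviewer found in one ad;
    an ad may list several languages or none.
    """
    counts = Counter(lang for ad in annotated_ads for lang in ad)
    total = len(annotated_ads)
    return {lang: count / total for lang, count in counts.most_common()}


# Hypothetical annotations for four ads.
ads = [{"Python", "SQL"}, {"Java"}, {"Python"}, set()]
print(mention_shares(ads))
```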

GitHub

GitHub is a public repository for many volunteer-driven open-source software projects. We used GitHub’s listing of its top 50 programming languages, filtering out entries for things like Docker configuration scripts. The data cover the first quarter of 2025. This measure provides a strong indication of which languages coders choose to work in when they have a personal choice.

Trinity College Dublin Library

The library of Trinity College Dublin is one of six legal deposit libraries in Ireland and the United Kingdom. A copy of any printed material published or distributed in Ireland must be deposited with the library, and on request any U.K. publisher or distributor must also deposit a book. We searched for all books published in the year to date that matched the names of programming languages and checked the results for false positives. The search was conducted in July 2025. We like this library collection because it represents a large and categorized sample of works, primarily in the English language.

Discord

Discord is a popular chat-room platform where many programmers exchange information. In August 2025 we searched Disboard, a site that lists many public Discord servers, and counted the number of tags that correspond to each language. Some language names could also refer to nonprogramming topics, and many such topics have dedicated Discord servers; for example, “Julia” could refer to the programming language or the Sesame Street puppet. We examined the results for these names manually. Many young coders use Disboard, so it contributes a different demographic of coders.
