This list includes a selection of data analysis tools that are broadly useful and accessible to beginners. Certain disciplines have preferred tools, platforms, metadata schemas, and more, so it is important to review which resources are standard in a given context.
Note: The Library does not currently provide access to these tools on reading room terminals. Researchers will need to install tools on their own computer in order to compute against datasets downloaded from the Library.
Programming Languages / Software Packages
Python – Python is a popular open source, object-oriented programming language. Several packages can be installed to simplify data analysis tasks, such as scikit-learn for machine learning and Statsmodels for statistical modeling.
R – R is a procedural programming language and freely available software tool that is used for computational statistics and data visualization. The R community has developed several open-source packages that are useful for both general and domain-specific data analysis procedures.
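As a rough illustration of the kind of task these languages handle, the sketch below summarizes one column of a small table using only Python's standard library. The sample data, the column names, and the `summarize` helper are invented for this example and do not come from any Library dataset; packages such as Statsmodels or scikit-learn would typically take over once modeling is needed.

```python
import csv
import io
import statistics

# Hypothetical sample data standing in for a downloaded CSV file.
SAMPLE_CSV = """year,items
2019,120
2020,95
2021,143
2022,110
"""

def summarize(csv_text, column):
    """Return the count, mean, and median of one numeric column."""
    rows = csv.DictReader(io.StringIO(csv_text))
    values = [float(row[column]) for row in rows]
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }

summary = summarize(SAMPLE_CSV, "items")
print(summary)
```

To work with a real file, the text could be read from disk and passed to the same function in place of the sample string.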
Data Processing Tools
OpenRefine – OpenRefine is a free, open source tool that assists with data cleaning and transformation (also known as data wrangling).
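To give a sense of what data cleaning involves, the Python sketch below trims stray whitespace, normalizes capitalization, and removes duplicate entries from a list of place names. It is not how OpenRefine works internally; the `clean_names` helper and the sample values are hypothetical and only illustrate the kind of inconsistency such tools help resolve.

```python
def clean_names(raw_names):
    """Trim whitespace, collapse repeated spaces, normalize case, drop duplicates."""
    seen = set()
    cleaned = []
    for name in raw_names:
        # split() with no arguments discards leading, trailing, and repeated spaces.
        normalized = " ".join(name.split()).title()
        # Skip empty entries and anything already recorded.
        if normalized and normalized not in seen:
            seen.add(normalized)
            cleaned.append(normalized)
    return cleaned

# Invented sample values with inconsistent spacing and capitalization.
raw = ["  new york ", "New  York", "BOSTON", "boston", ""]
result = clean_names(raw)
print(result)
```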
Data Management Planning / Project Management
DMPTool – The DMPTool (Data Management Plan Tool) assists with creating a data management plan, according to funder specifications or generalized guidelines. A data management plan documents all activity surrounding the collection and preservation of research data.