Data Tools

Guidelines and standards

Data tools: conversion, exploration, analysis

  • csvkit - performs search, query, filter and other operations on csv files; enables SQL-style joins on csv files. csvkit online docs.
  • Data Science Toolkit - collection of useful tools to extract and convert text, GIS and other data (my overview here)
  • Data Wrangler - helps transform messy data into a nicely formatted table for use in analysis and visualization software
  • Trifacta - Wrangle complex data with ease and simplicity
  • DocumentCloud - Turn documents into data.
  • Epi Info - free software package from CDC that provides data collection, advanced statistical analyses, and GIS mapping capability
  • Google Refine - clean, organize, refine (duh!) and explore new datasets
  • How to Extract Data to CSV file from PDF file with Tabula  | Flowing Data
  • PDF Tables - a simple web tool to extract data from a PDF document. Made by ScraperWiki
  • Kimono - a tool for non-programmers who want to scrape data off the web | Kimono Labs
  • Microsoft Excel - still the standard for many as the easy first stop to review data
  • Notepad++ - Notepad Plus Plus is a free source code editor and Notepad replacement that supports several languages.
  • Overview - clean, visualize and interactively explore large documents and data sets (started by AP)
  • ScraperWiki - provides software and instructions to extract data and information from web sites
  • Shapefiles to Tableau Extract - Free online converter
  • Stat/Transfer - converts data between the formats of statistical analysis packages
  • Tableau Add-In to Reshape Data in Excel - Knowledge Base, Tableau Software
  • Tabula - Software by Manuel Aristarán to extract data from a pdf file to csv file
  • The PANDA Project - the new newsroom data appliance
  • VistaMetrix - extract data from any graphic by selecting which points to capture in an overlay; works on pictures and video
  • Neo4j - The Graph Database [Operational Database Management System]
  • EasyMorph - Data preparation tool for non-technical users
  • RegressIt - Free Excel add-in for linear regression and multivariate data analysis
  • KML2CSV - An online KML to CSV converter
  • StatCrunch - StatCrunch is powerful web-based statistical software that allows users to collect data, perform complex analyses, and generate compelling results.

R Language

  • R Language - The R Project for Statistical Computing
  • RStudio - RStudio IDE is a powerful and productive user interface for R. It’s free and open source, and works great on Windows, Mac, and Linux.
  • Shiny by RStudio - A web application framework for R. Turn your analyses into interactive web applications. No HTML, CSS, or JavaScript knowledge required
  • R Packages - Advanced Data Analytics

Python

  • Python -  download | docs | community
  • Anaconda - downloads  | docs - Modern Open Source Analytics Platform. Enterprise-ready Python distribution with 330+ packages for large-scale data processing, predictive analytics, and scientific computing
  • PyCharm - Python IDE for Professional Developers

Regular Expressions

Alteryx (Data Extraction, Transformation, Integration & Loading)

  • Process for getting started with Alteryx
  1. Download Alteryx Project Edition – it’s Free (5-10 mins required). Alteryx Project Edition provides a free, fully working copy of the software. You can build and run as many processes as you like, as often as you like, viewing the results within the tool, and once you’ve got the data as you want it you can also export the results to any number of different formats. | The Information Lab
  2. Start with the Video (10 – 15 mins required) or Tutorials (30 mins needed) | Alteryx on YouTube
  3. Tableau users – download the Visual Analytics Toolkit and work through the samples (30 mins needed)
  4. Build out the first data process. Some support resources: i) On Demand Videos from Alteryx; ii) Alteryx Community (forums, knowledgebase, ideas); iii) Product Training; iv) Daily Demos (free daily live tutorials)

Blogs about Alteryx

Mobile data collection

  • Frontline SMS
  • Medic Mobile - enables data collection via regular phones via SIM apps; extends existing open-source platforms, including FrontlineSMS, OpenMRS, Ushahidi, Google Apps, and HealthMap
  • Open Data Kit (ODK) - University of Washington based research group developing an innovative, open source platform to enable mobile data collection

Data service providers

Health data and metadata standards

Data Confidentiality

Collaboration Tools

  • screenleap - Screen sharing and online meeting software

Graphics Editors

  • Inkscape - a professional vector graphics editor for Windows, Mac OS X and Linux. It's free and open source

SEO Tools

Online References and Tools

Visualization providers

  • GRAPHIQ - A semantic technology company that delivers pre-designed data visualizations. Useful for enriching editorial content, applications, and research sites.

NASA Software