Machine Learning NYC Neighborhoods

Check out the NYC Neighborhood Predictor! Prediction Matrix

It uses AWS Machine Learning to predict which neighborhood a string of text originates from.

Overview

Machine Learning What?

  • Starting from a dataset of ~1 GB of geo-tagged tweets, we create a CSV with two columns: text and neighborhood.
  • After training and evaluating a machine learning (ML) model on this data, this Elixir application exposes the model's real-time prediction endpoint.
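
The CSV-building step above could be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the tweet field names (`text`, `lat`, `lon`), the `NEIGHBORHOOD_BOXES` table, and both helper functions are hypothetical, and the bounding boxes are rough made-up rectangles standing in for real neighborhood polygons.

```python
import csv
import io

# Hypothetical bounding boxes: (min_lat, min_lon, max_lat, max_lon).
# Real neighborhood boundaries are polygons; rectangles are only a stand-in.
NEIGHBORHOOD_BOXES = {
    "Williamsburg": (40.700, -73.970, 40.725, -73.935),
    "Harlem": (40.800, -73.960, 40.830, -73.930),
}


def neighborhood_for(lat, lon):
    """Return the first neighborhood whose box contains the point, else None."""
    for name, (min_lat, min_lon, max_lat, max_lon) in NEIGHBORHOOD_BOXES.items():
        if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon:
            return name
    return None


def tweets_to_csv(tweets, out):
    """Write Text,Neighborhood rows for tweets that fall inside a known box.

    Returns the number of data rows written (the header is not counted).
    """
    writer = csv.writer(out)
    writer.writerow(["Text", "Neighborhood"])
    rows = 0
    for tweet in tweets:
        hood = neighborhood_for(tweet["lat"], tweet["lon"])
        if hood is None:
            continue  # outside every known neighborhood: skip
        # Collapse newlines and repeated whitespace so each tweet is one line.
        text = " ".join(tweet["text"].split())
        if not text:
            continue
        writer.writerow([text, hood])
        rows += 1
    return rows
```

Tweets that cannot be placed in a neighborhood are dropped rather than given a catch-all label, on the assumption that an "unknown" class would only add noise to the training data.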

Takeaways

  • Shaping the training data to produce a better model is the real challenge here.
  • The key question to ask: does my data even have statistical correlations, or is it just noise?
  • Improving results means iterating repeatedly on both the model and the evaluation data.
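
One quick way to approach the "signal or noise" question is to compare the model's evaluation accuracy with a majority-class baseline, that is, the score obtained by always guessing the most common neighborhood. The sketch below is an illustrative check rather than part of the original project; the function names and the sample labels in the usage note are made up.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    if not labels:
        raise ValueError("labels must not be empty")
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the actual one."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("actual must not be empty")
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(actual)
```

For example, with actual labels `["A", "A", "B", "C"]` the baseline is 0.5. A model that scores close to that on held-out data has probably learned little beyond the class frequencies, which suggests the training data needs more work before the next iteration.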

Input Schema

{
  "version": "1.0",
  "targetAttributeName": "Neighborhood",
  "dataFormat": "CSV",
  "dataFileContainsHeader": true,
  "attributes": [
    {
      "attributeName": "Text",
      "attributeType": "TEXT"
    },
    {
      "attributeName": "Neighborhood",
      "attributeType": "CATEGORICAL"
    }
  ],
  "excludedAttributeNames": []
}
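
Before uploading a data file, it can help to confirm that its header row lines up with the schema. The following is a small sketch of such a check, assuming the CSV columns should appear in the same order as the schema's `attributes` list; `check_header` is a hypothetical helper, not part of any AWS SDK.

```python
import csv
import io
import json

# The input schema from this page, embedded as a string for the example.
SCHEMA = json.loads("""
{
  "version": "1.0",
  "targetAttributeName": "Neighborhood",
  "dataFormat": "CSV",
  "dataFileContainsHeader": true,
  "attributes": [
    {"attributeName": "Text", "attributeType": "TEXT"},
    {"attributeName": "Neighborhood", "attributeType": "CATEGORICAL"}
  ],
  "excludedAttributeNames": []
}
""")


def check_header(schema, csv_file):
    """Return a list of human-readable problems (empty when none are found)."""
    problems = []
    names = [attr["attributeName"] for attr in schema["attributes"]]
    if schema["targetAttributeName"] not in names:
        problems.append("target attribute is not listed in attributes")
    if schema.get("dataFileContainsHeader"):
        # Read only the first row; an empty file yields an empty header.
        header = next(csv.reader(csv_file), [])
        if header != names:
            problems.append(f"header {header} does not match schema {names}")
    return problems
```

Calling `check_header(SCHEMA, open("tweets.csv", newline=""))` on a local file (the filename here is only an example) returns an empty list when the header matches.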

Prediction Matrix Neighborhood Categories
