Machine Learning NYC Neighborhoods
13 Jan 2016
Check out the NYC Neighborhood Predictor!
It uses AWS Machine Learning to predict which neighborhood a string of text originates from.
Overview
- Uses a simple Elixir and Phoenix Web app to expose an AWS Machine Learning Real-Time Endpoint.
- Source Code
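The Elixir app itself is not reproduced here. As a rough illustration of what a call to a real-time endpoint involves, here is a minimal Python sketch. It assumes a boto3 `machinelearning` client is passed in; the model ID and endpoint URL are placeholders you would supply, and the record key `Text` is taken from the input schema at the end of this post.

```python
def predict_neighborhood(client, model_id, endpoint_url, text):
    """Ask a real-time endpoint which neighborhood `text` most likely came from.

    `client` is assumed to behave like boto3.client("machinelearning").
    `model_id` and `endpoint_url` are placeholders, not values from this post.
    """
    response = client.predict(
        MLModelId=model_id,
        Record={"Text": text},          # key matches the schema's attribute name
        PredictEndpoint=endpoint_url,
    )
    prediction = response["Prediction"]
    # For a categorical target the response carries the winning label and,
    # when available, a score per candidate label.
    return prediction["predictedLabel"], prediction.get("predictedScores", {})
```

With boto3 installed and credentials configured, usage would look roughly like `predict_neighborhood(boto3.client("machinelearning"), "<model id>", "<endpoint url>", "best bagels in town")`.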
Machine Learning What?
- Using a dataset of ~1 GB of geo-tagged tweets, we create a CSV with two columns: text and neighborhood.
- After training and evaluating a machine learning (ML) model with this data, we expose the model's real-time endpoint via this Elixir application.
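The CSV-building step can be sketched as follows. This is not the original pipeline: the tweet field names (`text`, `lat`, `lon`), the bounding boxes, and the neighborhood names are all hypothetical, and real neighborhood boundaries would need proper polygons rather than rectangles.

```python
import csv
import io

# Hypothetical bounding boxes: (min_lat, min_lon, max_lat, max_lon).
# Illustrative only; these are not real neighborhood boundaries.
NEIGHBORHOOD_BOXES = {
    "Example North": (40.80, -74.00, 40.90, -73.90),
    "Example South": (40.70, -74.00, 40.80, -73.90),
}


def neighborhood_for(lat, lon):
    """Return the first neighborhood whose box contains the point, else None."""
    for name, (min_lat, min_lon, max_lat, max_lon) in NEIGHBORHOOD_BOXES.items():
        if min_lat <= lat < max_lat and min_lon <= lon < max_lon:
            return name
    return None


def write_training_csv(tweets, out):
    """Write Text,Neighborhood rows for tweets that fall inside a known box.

    Returns the number of data rows written (the header is not counted).
    """
    writer = csv.writer(out)
    writer.writerow(["Text", "Neighborhood"])
    rows = 0
    for tweet in tweets:
        label = neighborhood_for(tweet["lat"], tweet["lon"])
        text = " ".join(tweet["text"].split())  # collapse newlines and extra spaces
        if label and text:
            writer.writerow([text, label])
            rows += 1
    return rows
```

Tweets outside every box, or with empty text, are dropped rather than labeled, so the model never trains on an "unknown" class.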
Takeaways
- Molding the training data to create a better model is the real challenge here.
- Does my data even have statistical correlations or is it just noise?
- The work is to iterate, iterate, and iterate again on the model and the evaluation data.
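One cheap way to approach the "signal or noise" question is to compare the model against a baseline that always guesses the most common neighborhood. The sketch below is an assumption about how one might check this locally, not the evaluation AWS Machine Learning performs; plain accuracy is used for simplicity.

```python
from collections import Counter


def accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the actual one."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def majority_baseline_accuracy(train_labels, eval_labels):
    """Accuracy of always predicting the most frequent training label."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    return accuracy([most_common_label] * len(eval_labels), eval_labels)
```

If the trained model's accuracy on held-out data is not clearly above `majority_baseline_accuracy`, the text is probably contributing little beyond the class frequencies.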
Input Schema
{
  "version": "1.0",
  "targetAttributeName": "Neighborhood",
  "dataFormat": "CSV",
  "dataFileContainsHeader": true,
  "attributes": [
    {
      "attributeName": "Text",
      "attributeType": "TEXT"
    },
    {
      "attributeName": "Neighborhood",
      "attributeType": "CATEGORICAL"
    }
  ],
  "excludedAttributeNames": []
}
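Before uploading, it can help to confirm that the CSV and the schema agree. The following is a local sanity check only, written as an assumption about what is worth verifying; it does not claim to mirror the validation the service itself performs.

```python
import csv
import io
import json


def check_header(schema_json, csv_file):
    """Return a list of mismatches between a schema and a CSV's header row."""
    schema = json.loads(schema_json)
    names = [attr["attributeName"] for attr in schema["attributes"]]
    problems = []
    if schema["targetAttributeName"] not in names:
        problems.append("target attribute is not declared in attributes")
    if schema.get("dataFileContainsHeader"):
        # Read only the first row; an empty file yields an empty header.
        header = next(csv.reader(csv_file), [])
        if header != names:
            problems.append(f"header {header} does not match schema {names}")
    return problems
```

An empty list means the column names and their order line up with the schema; anything else is a human-readable description of what differs.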