Top Performers in Data Science

Forum for Top Performance course users in Data Science
 
 NLP Data Sets?

dain

Posts : 2
Join date : 2015-10-28

PostSubject: NLP Data Sets?   Wed Oct 28, 2015 11:40 pm

Hey Theo & All,

I had a thought today: would you consider the PDFs of all the WikiLeaks cables an interesting data set? Assuming you could extract the text from all of them, I'm wondering whether you could do some interesting natural language processing with it.

Do you think that could be worthwhile? I can't think of a practical use that I could state in a sentence, but it could be a good exercise for learning more about NLP. I guess it's a novelty thing: you could learn the same things from a data set like "text from every book in 2015", but the WikiLeaks cables just sound novel and fun, haha.
Admin


Posts : 14
Join date : 2015-10-27

PostSubject: Re: NLP Data Sets?   Thu Oct 29, 2015 12:18 am

There are a bunch of text data sets you can use; I'll try to dig up some alternatives for you this weekend. The difficulty with NLP is figuring out what you're trying to do. Build a better generative model? Get lower perplexity on a given corpus? Etc. I think with NLP the goal is actually pretty open. One interesting area being pushed forward is memory neural networks for question answering. You can also do sentiment analysis or community detection with Twitter data. The difficulty with the WikiLeaks data is that I can't think of much to do with it beyond running latent Dirichlet allocation. At the same time, if you could post a link or explain how to get the "text of all books", that would be great.
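To make the "lower perplexity for a given corpus" goal a bit more concrete, here is a rough sketch of how perplexity could be measured with the simplest possible language model, a smoothed unigram model. This is only an illustration, not a recommended setup: the function names, the add-alpha smoothing, and the single shared "unknown word" slot are all assumptions made for the example, and the toy sentences stand in for real extracted text.

```python
import math
from collections import Counter


def train_unigram(tokens):
    """Count how often each token occurs in the training text."""
    return Counter(tokens)


def perplexity(counts, test_tokens, alpha=1.0):
    """Perplexity of an add-alpha smoothed unigram model on test_tokens.

    Assumption for this sketch: every unseen word shares one extra
    "unknown" slot, so the model covers len(counts) + 1 outcomes.
    """
    if not test_tokens:
        raise ValueError("test_tokens must not be empty")
    total = sum(counts.values())
    denom = total + alpha * (len(counts) + 1)
    log_prob = 0.0
    for tok in test_tokens:
        # Counter returns 0 for unseen tokens, so they get alpha / denom.
        log_prob += math.log((counts[tok] + alpha) / denom)
    # Perplexity is exp of the average negative log-likelihood per token.
    return math.exp(-log_prob / len(test_tokens))


# Toy stand-in for a real corpus (hypothetical data).
train = "the cable was sent the cable was read".split()
counts = train_unigram(train)

seen = perplexity(counts, "the cable was sent".split())
unseen = perplexity(counts, "completely different words here".split())
print(seen, unseen)
```

Text that resembles the training data should score a lower perplexity than text full of unseen words, which is the sense in which a "better" model of a corpus gets a lower number. A real experiment would hold out part of the corpus for testing and use a stronger model than unigrams.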
http://tpdatascience.forumotion.com
 