Just an idea:
I think the README would be the best input for LDA (latent Dirichlet allocation), since it usually contains a fairly complete description of the project. Projects without a README should be penalised either way. The repository description is often too short to explain in detail what the repository is about.
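A minimal sketch of how the input selection could look, assuming each repository is available as a name, a description and an optional README string. The `Repo` class, the `lda_input` function and the `MISSING_README_PENALTY` value are all hypothetical placeholders. The actual LDA step is not shown; this only picks which text to feed it and how much to down-weight repos without a README.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical multiplier for repositories that have no README.
# The value is an illustrative assumption, not a tuned constant.
MISSING_README_PENALTY = 0.5


@dataclass
class Repo:
    name: str
    description: str
    readme: Optional[str] = None


def lda_input(repo: Repo) -> Tuple[str, float]:
    """Return (text to run LDA on, score multiplier) for a repository.

    Prefer the README because it usually describes the project in detail.
    Fall back to the short description and apply the penalty when the
    README is missing or contains only whitespace.
    """
    if repo.readme and repo.readme.strip():
        return repo.readme, 1.0
    return repo.description, MISSING_README_PENALTY


if __name__ == "__main__":
    repos = [
        Repo("with-readme", "Web framework", "A small web framework with routing and templates."),
        Repo("no-readme", "CLI tool"),
    ]
    for r in repos:
        text, weight = lda_input(r)
        print(r.name, weight, text)
```

Keeping the penalty as a separate multiplier, rather than dropping README-less repos outright, leaves them in the topic model while still ranking them lower.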