Given a textual query in traditional text-based image retrieval (TBIR), the relevant images are reranked using visual features after the initial text-based search. In this paper, we propose a new bag-based reranking framework for large-scale TBIR. Specifically, we first cluster the relevant images using both textual and visual features. By treating each cluster as a "bag" and the images within it as "instances", we formulate this problem as a multi-instance (MI) learning problem, so MI learning methods such as mi-SVM can be readily incorporated into our bag-based reranking framework. Observing that at least a certain portion of a positive bag consists of positive instances, while a negative bag may also contain positive instances, we further adopt a more suitable generalized MI (GMI) setting for this application. To address the ambiguity of instance labels in the positive and negative bags under this GMI setting, we develop a new method, referred to as GMI-SVM, which enhances retrieval performance by propagating labels from the bag level to the instance level. To acquire bag annotations for (G)MI learning, we propose a bag ranking method that ranks all the bags according to a defined bag ranking score. The top-ranked bags are used as pseudo-positive training bags, while pseudo-negative training bags are obtained by randomly sampling irrelevant images that are not associated with the textual query. Comprehensive experiments on the challenging real-world dataset NUS-WIDE demonstrate that our framework with automatic bag annotation achieves the best performance compared with existing image reranking methods.
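The pipeline above can be sketched end to end on toy data. Everything in this sketch is an illustrative assumption: the 2-D features, the grid-quantization "clustering", the compactness-based bag score, and the least-squares linear classifier are simple stand-ins for the paper's actual clustering, bag ranking score, and (G)MI-SVM formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Initial text-based search returns a mix of truly relevant images (a compact
# cluster in visual-feature space) and visually scattered false matches.
true_pos = rng.normal(1.5, 0.1, size=(30, 2))
noise = rng.uniform(-1.0, 1.0, size=(30, 2))
relevant = np.vstack([true_pos, noise])           # indices 0..29 are positives

# Images never associated with the query, used for pseudo-negative sampling.
irrelevant = rng.normal(-1.0, 0.3, size=(20, 2))

# Step 1: group relevant images into "bags" (a crude grid quantization
# standing in for the paper's textual+visual clustering).
cells, inv = np.unique(np.floor(relevant).astype(int), axis=0,
                       return_inverse=True)
bags = [np.flatnonzero(inv == k) for k in range(len(cells))]
bags = [b for b in bags if len(b) >= 3]           # ignore tiny bags

# Step 2: rank bags with a hypothetical bag score -- visual compactness
# (tight clusters of search results tend to be genuinely relevant).
def compactness(idx):
    feats = relevant[idx]
    return -np.mean(np.linalg.norm(feats - feats.mean(axis=0), axis=1))

ranked_bags = sorted(bags, key=compactness, reverse=True)

# Step 3: top-ranked bags supply pseudo-positive training instances;
# pseudo-negatives are randomly sampled irrelevant images.
pos_train = relevant[np.concatenate(ranked_bags[:1])]
neg_train = irrelevant[rng.choice(len(irrelevant), 10, replace=False)]

# Step 4: train an instance-level linear classifier (a least-squares
# stand-in for mi-SVM / GMI-SVM) and rerank every relevant image by its
# decision value.
X = np.vstack([pos_train, neg_train])
y = np.concatenate([np.ones(len(pos_train)), -np.ones(len(neg_train))])
Xb = np.hstack([X, np.ones((len(X), 1))])         # append a bias column
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)

decision = np.hstack([relevant, np.ones((len(relevant), 1))]) @ w
reranked = np.argsort(decision)[::-1]             # best-first image indices
```

On this toy data, the compact positive cluster forms the top-ranked bag, and the resulting classifier pushes the truly relevant images to the front of the reranked list.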