|
[1] C. Alkan, S. Sajjadian, and E. E. Eichler. Limitations of next-generation genome sequence assembly. Nature methods, 8(1):61–65, 2011. [2] E. Anderson and J. Tucek. Efficiency matters! ACM SIGOPS Operating Systems Review, 44(1):40–45, 2010. [3] W. J. Ansorge. Next-generation DNA sequencing techniques. New biotechnology, 25(4):195–203, 2009. [4] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, et al. A view of cloud computing. Communications of the ACM, 53(4):50–58, 2010. [5] G. Berg, C. Zachow, H. Müller, J. Philipps, and R. Tilcher. Next-generation bio-products sowing the seeds of success for sustainable agriculture. Agronomy, 3(4): 648–656, 2013. [6] Y.-J. Chang, C.-C. Chen, C.-L. Chen, and J.-M. Ho. A de novo next generation genomic sequence assembler based on string graph and MapReduce cloud computing framework. BMC genomics, 13(7):S28, 2012. [7] K. Chaturvedi and L. Sahijram. Plant molecular biology applications in horticulture: An overview. In Plant Biology and Biotechnology, pages 113–129. Springer, 2015. [8] J. Cohen, B. Dolan, M. Dunlap, J. M. Hellerstein, and C. Welton. MAD skills: new analysis practices for big data. Proceedings of the VLDB Endowment, 2(2):1481–1492, 2009. [9] F. F. Costa. Big data in biomedicine. Drug discovery today, 19(4):433–440, 2014. [10] J. Dean and S. Ghemawat. MapReduce: simplified data processing on large clusters. Communications of the ACM, 51(1):107–113, 2008. [11] J. Ekanayake, H. Li, B. Zhang, T. Gunarathne, S.-H. Bae, J. Qiu, and G. Fox. Twister: a runtime for iterative MapReduce. In Proceedings of the 19th ACM international symposium on high performance distributed computing, pages 810–818. ACM, 2010. [12] T. C. Glenn. Field guide to next-generation DNA sequencers. Molecular ecology resources, 11(5):759–769, 2011. [13] S. Gnerre, I. MacCallum, D. Przybylski, F. J. Ribeiro, J. N. Burton, B. J. Walker, T. Sharpe, G. Hall, T. P. Shea, S. Sykes, et al. High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proceedings of the National Academy of Sciences, 108(4):1513–1518, 2011. [14] H. Herodotou, H. Lim, G. Luo, N. Borisov, L. Dong, F. B. Cetin, and S. Babu. Starfish: A self-tuning system for big data analytics. In Cidr, volume 11, pages 261–272, 2011. [15] W. Huang, L. Li, J. R. Myers, and G. T. Marth. ART: a next-generation sequencing read simulator. Bioinformatics, 28(4):593–594, 2012. [16] L. Ilie and M. Molnar. RACER: rapid and accurate correction of errors in reads. Bioinformatics, 29(19):2490–2493, 2013. [17] L. Ilie, F. Fazayeli, and S. Ilie. HiTEC: accurate error correction in high-throughput sequencing data. Bioinformatics, 27(3):295–302, 2011. [18] E. Jahani, M. J. Cafarella, and C. Ré. Automatic optimization for MapReduce programs. Proceedings of the VLDB Endowment, 4(6):385–396, 2011. [19] L. Jourdren, M. Bernard, M.-A. Dillies, and S. Le Crom. Eoulsan: a cloud computing-based framework facilitating high throughput sequencing analyses. Bioinformatics, 28(11):1542–1543, 2012. [20] W.-C. Kao, A. H. Chan, and Y. S. Song. ECHO: a reference-free short-read error correction algorithm. Genome research, 21(7):1181–1192, 2011. [21] D. R. Kelley, M. C. Schatz, and S. L. Salzberg. Quake: quality-aware detection and correction of sequencing errors. Genome biology, 11(11):R116, 2010. [22] F. Krueger. Trim Galore!: A wrapper tool around Cutadapt and FastQC to consistently apply quality and adapter trimming to FastQ files, 2015. [23] B. Langmead, M. C. Schatz, J. Lin, M. Pop, and S. L. Salzberg. Searching for SNPs with cloud computing. Genome biology, 10(11):R134, 2009. [24] B. Langmead, K. D. Hansen, and J. T. Leek. Cloud-scale RNA-sequencing differential expression analysis with Myrna. Genome biology, 11(8):R83, 2010. [25] S. Lattanzi, B. Moseley, S. Suri, and S. Vassilvitskii. Filtering: a method for solving graph problems in MapReduce. In Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures, pages 85–94. ACM, 2011. [26] K.-H. Lee, Y.-J. Lee, H. Choi, Y. D. Chung, and B. Moon. Parallel data processing with MapReduce: a survey. AcM sIGMoD Record, 40(4):11–20, 2012. [27] H. Li and R. Durbin. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25(14):1754–1760, 2009. [28] H. Li, B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis, and R. Durbin. The sequence alignment/map format and SAMtools. Bioinformatics, 25(16):2078–2079, 2009. [29] L. Liu, Y. Li, S. Li, N. Hu, Y. He, R. Pong, D. Lin, L. Lu, and M. Law. Comparison of next-generation sequencing systems. BioMed Research International, 2012, 2012. [30] Y. Liu, J. Schröder, and B. Schmidt. Musket: a multistage k-mer spectrum-based error corrector for illumina sequence data. Bioinformatics, 29(3):308–315, 2013. [31] R. Luo, B. Liu, Y. Xie, Z. Li, W. Huang, J. Yuan, G. He, Y. Chen, Q. Pan, Y. Liu, et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience, 1(1):18, 2012. [32] P. Medvedev, E. Scott, B. Kakaradov, and P. Pevzner. Error correction of high-throughput sequencing datasets with non-uniform coverage. Bioinformatics, 27(13): i137–i141, 2011. [33] P. Mell, T. Grance, et al. The NIST definition of cloud computing. 2011. [34] M. L. Metzker. Sequencing technologies–the next generation. Nature reviews. Genetics, 11(1):31, 2010. [35] N. Nagarajan and M. Pop. Sequence assembly demystified. Nature reviews. Genetics, 14(3):157, 2013. [36] H. Nordberg, K. Bhatia, K. Wang, and Z. Wang. BioPig: a Hadoop-based analytic toolkit for large-scale sequence data. Bioinformatics, 29(23):3014–3019, 2013. [37] A. O’Driscoll, J. Daugelaite, and R. D. Sleator. ‘big data’, Hadoop and cloud computing in genomics. Journal of biomedical informatics, 46(5):774–781, 2013. [38] A. Pavlo, E. Paulson, A. Rasin, D. J. Abadi, D. J. DeWitt, S. Madden, and M. Stonebraker. A comparison of approaches to large-scale data analysis. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of data, pages 165–178. ACM, 2009. [39] M. Pop. Genome assembly reborn: recent computational challenges. Briefings in bioinformatics, 10(4):354–366, 2009. [40] C. Ranger, R. Raghuraman, A. Penmetsa, G. Bradski, and C. Kozyrakis. Evaluating MapReduce for multi-core and multiprocessor systems. In High Performance Computer Architecture, 2007. HPCA 2007. IEEE 13th International Symposium on, pages 13–24. IEEE, 2007. [41] K. Robasky, N. E. Lewis, and G. M. Church. The role of replicates for error mitigation in next-generation sequencing. Nature Reviews Genetics, 15(1):56–62, 2014. [42] D. H. Roukos and C.-S. Ku. Clinical cancer genome and precision medicine. Annals of surgical oncology, 19(12):3646–3650, 2012. [43] L. Salmela and J. Schröder. Correcting errors in short reads by multiple alignments. Bioinformatics, 27(11):1455–1461, 2011. [44] S. L. Salzberg, A. M. Phillippy, A. Zimin, D. Puiu, T. Magoc, S. Koren, T. J. Treangen, M. C. Schatz, A. L. Delcher, M. Roberts, et al. GAGE: A critical evaluation of genome assemblies and assembly algorithms. Genome research, 22(3):557–567, 2012. [45] M. C. Schatz. CloudBurst: highly sensitive read mapping with MapReduce. Bioinformatics, 25(11):1363–1369, 2009. [46] M. C. Schatz and B. Langmead. The DNA data deluge. IEEE Spectrum, 50(7): 28–33, 2013. [47] M. C. Schatz, B. Langmead, and S. L. Salzberg. Cloud computing and the DNA data race. Nature biotechnology, 28(7):691–693, 2010. [48] J. Schröder, H. Schröder, S. J. Puglisi, R. Sinha, and B. Schmidt. SHREC: a short-read error correction method. Bioinformatics, 25(17):2157–2163, 2009. [49] A. Schumacher, L. Pireddu, M. Niemenmaa, A. Kallio, E. Korpelainen, G. Zanetti, and K. Heljanko. SeqPig: simple and scalable scripting for large sequencing data sets in Hadoop. Bioinformatics, 30(1):119–120, 2014. [50] D. Sims, I. Sudbery, N. E. Ilott, A. Heger, and C. P. Ponting. Sequencing depth and coverage: key considerations in genomic analyses. Nature Reviews Genetics, 15(2): 121–132, 2014. [51] L. D. Stein. The case for cloud computing in genome informatics. Genome biology, 11(5):207, 2010. [52] R. C. Taylor. An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics. BMC bioinformatics, 11(12):S1, 2010. [53] T. J. Treangen and S. L. Salzberg. Repetitive DNA and next-generation sequencing: computational challenges and solutions. Nature reviews. Genetics, 13(1):36, 2011. [54] J. D. Ullman. Designing good MapReduce algorithms. XRDS: Crossroads, The ACM Magazine for Students, 19(1):30–34, 2012. [55] G. Wang, A. R. Butt, P. Pandey, and K. Gupta. A simulation approach to evaluating design decisions in MapReduce setups. In Modeling, Analysis & Simulation of Computer and Telecommunication Systems, 2009. MASCOTS’09. IEEE International Symposium on, pages 1–11. IEEE, 2009. [56] X. Yang, K. S. Dorman, and S. Aluru. Reptile: representative tiling for short read error correction. Bioinformatics, 26(20):2526–2533, 2010. [57] X. Yang, S. P. Chockalingam, and S. Aluru. A survey of error-correction methods for next-generation sequencing. Briefings in bioinformatics, 14(1):56–66, 2013. [58] D. R. Zerbino and E. Birney. Velvet: algorithms for de novo short read assembly using de bruijn graphs. Genome research, 18(5):821–829, 2008. [59] Q. Zhang, L. Cheng, and R. Boutaba. Cloud computing: state-of-the-art and research challenges. Journal of internet services and applications, 1(1):7–18, 2010. [60] Q. Zou, X.-B. Li, W.-R. Jiang, Z.-Y. Lin, G.-L. Li, and K. Chen. Survey of MapReduce frame operation in bioinformatics. Briefings in bioinformatics, 15(4):637–647, 2013.
|