The Ebulk tool makes it easy to exchange or archive very large data sets. It ingests data sets into the Wendelin-IA platform, or downloads them from it, using different protocols. It also lets you make local changes to a data set and upload the added and modified files. A key feature of Ebulk is its ability to resume interrupted transfers and recover from the errors they cause. <b><a href="erp5/web_site_module/fif_data_runner/#/?page=ebulk_doc" style="color:#116D82">See documentation</a></b>.
<h1>Ebulk + Wendelin = Big Data sharing platform</h1>
<p>The <a href="erp5/web_site_module/fif_data_runner/#/?page=ebulk_doc">Ebulk</a> tool and the <a target="_blank" href="https://wendelin.nexedi.com/">Wendelin</a> platform are combined to form an easy-to-use Data Lake for sharing petabytes of data grouped into data sets. Big Data sharing is essential for research and startups, because building new A.I. models requires access to large data sets. These are usually held by big platforms such as Google or Alibaba, which tend to keep them secret. This project offers a solution to the big data sharing problem by addressing the following key points:</p>
<p><a href="erp5/web_site_module/fif_data_runner/#/?page=contact">Contact us to get a full user account</a></p>
<p><span class="contact-link"></span></p>
</div>
</div>
<h1>Data lake</h1>
<p>Dozens of public and private big data sets are available on the platform: terabytes of data of any kind, including binaries such as medical images, ndarrays and more. Do you want to download data sets or share your own data? <a href="erp5/web_site_module/fif_data_runner/#/?page=download">Download</a> our Ebulk tool to transfer big data! Please <a href="erp5/web_site_module/fif_data_runner/#/?page=contact">contact us</a> to register and get a user account. See our full <a href="erp5/web_site_module/fif_data_runner/#/?page=fifdata">data set list</a>!</p>
<h1>Ebulk tool</h1>
<p>The Ebulk tool is a wrapper for <a target="_blank" href="http://www.embulk.org/docs/">Embulk</a>, an open-source bulk data loader that helps transfer data between various databases, storages, file formats, and cloud services. It supports any kind of input file format, parallel and distributed execution to deal with big data sets, transaction control to guarantee all-or-nothing file transfer, and resuming of operations. Ebulk is as easy to use as git: big data transfers can be done with very few commands. Please <a href="erp5/web_site_module/fif_data_runner/#/?page=download">download</a> Ebulk and check the <a href="erp5/web_site_module/fif_data_runner/#/?page=ebulk_doc">documentation</a>.</p>
<h1>Wendelin</h1>
<p><a target="_blank" href="https://wendelin.nexedi.com/">Wendelin</a> is a big data framework designed for industrial applications and based on Python, NumPy, SciPy and other NumPy-based libraries. At its core, it uses the NEO distributed transactional NoSQL database to store petabytes of binary data. Wendelin combines the performance of scikit-learn machine learning with NEO distributed storage to provide out-of-core processing of large data sets. Its goal is to provide the best open source big data engine based on NumPy and Python technologies, and to gather a wide community of contributors of new data analytics algorithms.</p>