A blockchain-based P2P distributed cloud storage network that lets users transfer and share data without relying on third-party storage providers.
Changing the Security Parad...
By John Gleeson, VP of Operations at Storj Labs

With the paint barely dry on our production release of the Tardigrade Platform, one of the areas where we’re seeing the strongest interest from customers and partners building apps is our security model and access control layer. The security and privacy capabilities of the platform are some of its most differentiating features, and they give our partners and customers some exciting new tools.

Distributed and decentralized cloud storage is a fantastic way to take advantage of underutilized storage and bandwidth, but in order to provide highly available and durable cloud storage, we needed to build in some fairly sophisticated security and privacy controls. Because we had to build with the assumption that any Node could be run by an untrusted person, we had to implement a zero-knowledge security architecture. This turns out not only to make our system far more resistant to attacks than traditional architectures, but also to bring significant benefits to developers building apps on the platform.

Decentralized Architecture Requires Strong Privacy and Security

From the network perspective, we need to make sure the data stored on our platform remains private and secure. At the most basic level, we need to ensure that pieces of files stored on untrusted Nodes can’t be compromised, either by someone accessing that data or by someone preventing access to it. We combine several different technologies to achieve data privacy, security, and availability.

From the client side, we use a combination of end-to-end encryption, erasure coding, and macaroon-based API keys. Erasure coding is primarily used to ensure data availability, although storing data across thousands of statistically uncorrelated Storage Nodes does add a layer of security by eliminating any centralized honeypot of data.

By way of example, when a file or segment is erasure coded, it is divided into 80 pieces, of which any 29 can be used to reconstitute the (encrypted) file.
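The segment and piece arithmetic behind these numbers is easy to sketch. The constants (64 MB segments, 80 pieces, any 29 to reconstruct) come from the text above; the helper function itself is just illustrative, not part of the Storj client library:

```python
import math

# Parameters as described in the text above.
SEGMENT_SIZE = 64 * 1024 * 1024  # files are split into 64 MB segments
TOTAL_PIECES = 80                # each segment is erasure coded into 80 pieces
MIN_PIECES = 29                  # any 29 pieces reconstitute the encrypted segment

def piece_count(file_size_bytes: int) -> int:
    """Total pieces stored on the network for a file of the given size."""
    segments = max(1, math.ceil(file_size_bytes / SEGMENT_SIZE))
    return segments * TOTAL_PIECES

# A 1 GB file: 16 segments of 64 MB, 80 pieces each.
print(piece_count(1024 ** 3))  # 1280
```

Any file at or below 64 MB is a single segment, so the minimum footprint is 80 pieces.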
With our zero-knowledge architecture, any Node Operator only gets one of the 80 pieces. There is nothing in the anonymized metadata to indicate what segment that piece belongs to, where the other 79 pieces are, etc. It’s worth noting that 80 pieces is the minimum number of pieces for a single file. Files larger than 64 MB are broken up into 64 MB segments, each of which is further divided into 80 pieces. A 1 GB file, for example, is broken up into 16 segments, each with a different randomized encryption key, and each broken up into 80 pieces, for a total of 1,280 pieces.

If a hacker wants to obtain a complete file, they need to find at least 29 Nodes that hold a piece of that file, then compromise the security of each one (with each Node run by a different person, on different networks, behind different firewalls, etc.). Even then, they would only have enough to reconstitute a file that is still encrypted. And they’d have to repeat that process for the next file, and, for files larger than 64 MB, for every segment of the file. Compare that to a situation (e.g., what was seen at Equifax a few years ago) where a simple misconfiguration gave access to hundreds of millions of individuals’ data, and you’ll see the power of this new model.

Just storing data on the Tardigrade Platform provides significant improvements over centralized data storage in terms of reducing threat surfaces and exposure to a variety of common attack vectors. But it’s when sharing access to data, especially highly sensitive data, that developers really experience the advantages of our platform. The combination of end-to-end encryption and the access management capabilities of our API keys is where we’re already seeing the most interest from partners.

Separating Access and Encryption

One of the great things about the Tardigrade Platform is that it separates the encryption function from the access management capabilities of the macaroon-based API keys, allowing both to be managed 100% client-side.
From a developer perspective, managing those two constructs is easy because all of the complexity is abstracted down to a few simple commands. This enables developers to move access management from a centralized server to the edge.

Hierarchically Deterministic End-to-End Encryption

All data stored on the Tardigrade Platform is end-to-end encrypted from the client side. That means users control the encryption keys, and the result is an extremely private and secure data store. Both the objects and the associated metadata are encrypted using randomized, salted, path-based encryption keys. The randomized keys are then encrypted with the user’s encryption passphrase. Neither Storj Labs nor any Storage Node has access to those keys, the data, or the metadata.

By using hierarchically derived encryption keys, it becomes easy to share the ability to decrypt a single object or set of objects without sharing the private encryption passphrase or having to re-encrypt objects. Unlike the HD API keys, where the hierarchy is derived from further restrictions of access, the path prefix structure of the object storage hierarchy is the foundation of the encryption structure.

A unique encryption key can be derived client-side for each object, whether it’s a path or a file. That unique key is generated automatically when sharing objects, allowing users to share single objects or paths, encrypting just the objects that are shared, without having to separately manage encryption access to objects that aren’t being shared.

Access Management with Macaroon-based API Keys

In addition to providing the tools to share the ability to decrypt objects, the Tardigrade Platform also provides sophisticated tools for managing access to objects. Tardigrade uses hierarchically derived API keys as an access management layer for objects.
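To make the path-based hierarchical derivation described above concrete, here is a minimal sketch (an assumed construction for illustration only, not Storj’s actual key-derivation scheme): a child key is derived per path component by chaining HMAC-SHA256, so handing someone the key for a path prefix grants decryption of everything beneath it and nothing else.

```python
import hmac
import hashlib

def derive_path_key(parent_key: bytes, path: str) -> bytes:
    """Derive a key for `path` by chaining HMAC-SHA256 per path component."""
    key = parent_key
    for component in path.split("/"):
        key = hmac.new(key, component.encode(), hashlib.sha256).digest()
    return key

# Root key derived from the user's passphrase (illustrative; a real scheme
# would also use a salt and a proper KDF).
root = hashlib.sha256(b"user encryption passphrase").digest()

photos_key = derive_path_key(root, "photos")
cat_key = derive_path_key(root, "photos/cats/cat.jpg")

# Holding photos_key is enough to derive keys for objects beneath "photos"...
assert derive_path_key(photos_key, "cats/cat.jpg") == cat_key
# ...but no child key lets you walk back up toward the root.
```

Because derivation only runs downward, sharing `photos_key` exposes nothing about sibling paths or the root passphrase.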
Similar to HD encryption keys, HD API keys are derived from a parent API key. Unlike the HD encryption keys, where the hierarchy is derived from the path prefix structure of the object storage hierarchy, the hierarchy of API keys is derived from the structure and relationship of access restrictions. An HD API key embeds the logic for the access it allows, and can be restricted simply by embedding the path restrictions and any additional restrictions within the string that represents the macaroon. Unlike a typical API key, a macaroon is not a random string of bytes, but rather an envelope with access logic encoded in it.

Bringing it Together with the Access

Access management on the Tardigrade Platform requires coordination of the two parallel constructs described above: encryption and authorization. Both of these constructs work together to provide an access management framework that is secure and private, as well as extremely flexible for application developers. Both encryption and delegation of authorization are managed client-side.

While both of these constructs are managed client-side, it’s important to point out that only the API keys are sent to the Satellite. The Satellite interprets the restrictions set by the client in the form of caveats, then controls which operations are allowed based on those restrictions. Encryption keys are never sent to the Satellite.

Sharing access to objects stored on the Tardigrade Platform requires sending encryption and authorization information about an object from one client to another. That information is sent in a construct called an Access.
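The idea of a macaroon as an envelope with access logic encoded in it can be illustrated with a toy HMAC-chained token. This is a sketch of the general macaroon construction, not Storj’s actual API key wire format: each caveat is appended to the token and folded into the signature, so a holder can always restrict the key further but can never remove a restriction.

```python
import hmac
import hashlib

def mint(root_secret: bytes, identifier: str) -> dict:
    """Mint a parent token whose signature is keyed on the root secret."""
    sig = hmac.new(root_secret, identifier.encode(), hashlib.sha256).digest()
    return {"id": identifier, "caveats": [], "sig": sig}

def attenuate(token: dict, caveat: str) -> dict:
    """Anyone holding a token can add a caveat client-side, no server needed."""
    sig = hmac.new(token["sig"], caveat.encode(), hashlib.sha256).digest()
    return {"id": token["id"], "caveats": token["caveats"] + [caveat], "sig": sig}

def verify(root_secret: bytes, token: dict, satisfied) -> bool:
    """Server-side check: recompute the HMAC chain and test every caveat."""
    sig = hmac.new(root_secret, token["id"].encode(), hashlib.sha256).digest()
    for caveat in token["caveats"]:
        if not satisfied(caveat):
            return False
        sig = hmac.new(sig, caveat.encode(), hashlib.sha256).digest()
    return hmac.compare_digest(sig, token["sig"])

secret = b"satellite root secret"
key = attenuate(mint(secret, "project-123"), "path = photos/")

print(verify(secret, key, lambda c: c == "path = photos/"))  # True
# Stripping a caveat breaks the signature chain:
forged = {"id": key["id"], "caveats": [], "sig": key["sig"]}
print(verify(secret, forged, lambda c: True))  # False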
An Access is a security envelope that contains a restricted HD API key and an HD encryption key: everything an application needs to locate an object on the network, access that object, and decrypt it.

To make the implementation of these constructs as easy as possible for developers, the Tardigrade developer tools abstract away the complexity of encoding objects for access management and encryption/decryption. A simple share command encapsulates both an encryption key and a macaroon into an Access, in the format of an encoded string that can be easily imported into an Uplink client. Imported Accesses are managed client-side and may be leveraged in applications via the Uplink client library.

Why Security at the Edge Matters

The evolution of cloud services and the transition of many services from on-premise to centralized cloud has brought massive increases in efficiency and economies of scale. That efficiency is in many ways driven by a concentration not only of technology but of expertise, especially security expertise. It has also come at the cost of tradeoffs between security and privacy. Moreover, many new business models have emerged based almost entirely on trading away the privacy of user data for convenience. In the cloud economy, users’ most private data is now more at risk than ever, and for the companies that store that data, new regulatory regimes have emerged, increasing the impact on those businesses if that data is compromised.

The Intersection of Cybersecurity Skill and Decentralized Data

While the transition from on-premise to cloud has brought a reduction in the number and types of hacks, much of the vulnerability of on-premise technology was due in part to a lack of cybersecurity experience and expertise. A big part of the push to Gmail is the fact that it’s much less likely to get hacked than a privately operated mail server.

The transition to the cloud has resulted in a much greater separation between security expertise and technology use.
The cost of the best-in-class security expertise of cloud providers is, like the cost of infrastructure, spread across all customers. One additional consequence of that separation (the loss of in-house cybersecurity expertise) is a lack of appreciation of the resulting tradeoff: that security does not come with transparency, and in fact, many times that security comes in exchange for a loss of privacy.

This is where a decentralized, edge-based security model provides a similar security advantage, but without the tradeoffs against transparency or privacy. With Storj, you get the benefit of the team’s distributed storage, encryption, security, and privacy expertise, but you also get the full transparency of the open-source software. This enables you not only to trust but to verify the security of the platform, but that’s not where the difference ends. Storj provides all the security benefits of a cloud platform, plus the tools to take back control over your privacy.

Edge-based Security + Decentralized Architecture = Privacy by Default

Classic authorization technologies are built for client-server architectures. Web-centric authorization schemes such as OAuth and JWT are built for largely synchronous transactions that separate the resource owner and the authorization service. Each of these approaches depends for its success on a central authority. To truly maximize privacy and security at massive scale, there is a need to efficiently delegate resource authorization away from centralized parties.

Moving token generation and verification closer to the edge of the architecture represents a fundamental shift in the way technologists can create verified trust systems. Having the ability in a distributed system to centrally initiate trust (via API keys) and derive specifically scoped keys from that trust allows systems to generate their own trust chains that can be easily managed for specific roles and responsibilities.
Authorization delegation is managed at the edge but derived from a common, transparent trust framework. This means that access tokens generated at the edge can be efficiently interpreted centrally, but without access to the underlying encrypted data.

Distributed and decentralized environments are designed to eliminate the need for trust. By moving security, privacy, and access management to the edge, users regain control over their data. With tools such as client-side encryption, cryptographic audits, and a completely open-source architecture, trust boundaries and risk are managed not by the service provider, but by the tools in the hands of the user.

A Different Approach Delivers Differentiated Value Out-of-the-box

The Tardigrade Platform’s distributed cloud storage and edge-based security model provide easy tools for building applications that are more private, more secure, and less susceptible to the range of common attacks. With this approach, no incompetent or malicious operator can undermine security. There is no careless administrator, no unethical data-mining business model, no misconfigured print server, and no social hack that can undermine your data. By embracing decentralization and security at the edge, the system is architected to be resilient. Unlike add-on offerings from other cloud providers, such as AWS Detective, Tardigrade’s security features are enabled by default. With the Tardigrade Platform, you don’t pay extra for security and privacy.

Reduced Risk — Common attacks (misconfigured access control lists, leaky buckets, insider threats, honeypots, man-in-the-middle attacks, etc.) depend for their success on breaching a central repository of access controls or gaining access to a treasure trove of data.
The Tardigrade Platform security model provides a way to architect out whole categories of typical application attack vectors.

Reduced Threat Surface — By separating trust boundaries and distributing access management and storage functions, a significant percentage of the typical application threat surface is either eliminated or made orders of magnitude more complex to attack.

Enhanced Privacy — With access managed peer-to-peer, the platform provides the tools to separate responsibility for creating bearer tokens for access management from encryption for use of the data. Separating these concerns decouples storage, access management, and use of data, ensuring greater privacy with greater transparency.

Purpose-Built for Distributed Data

Distributed data storage architecture combined with edge-based encryption and access management stores your data as if it were encrypted sand on an encrypted beach. The combination of client-side HD encryption keys and HD API keys in an easy-to-use platform enables application developers to leverage a capability-based security model to build applications with superior privacy and security.

Originally published at https://storj.io.
20. 05. 20
General Availability for Ta...
The internet was designed to be decentralized. When you send a message, stream media, or join a video conference, you don’t worry about which routers your data is passing through, who owns them, or whether some may be down. The decentralized model for internet communications has delivered multiple orders of magnitude of improvement in reliability, speed, and price. No one questions the appropriateness of leveraging TCP/IP for enterprise applications.

However, leveraging decentralization for enterprise-grade compute and storage has never been possible–at least until today. Being first is never easy, but it’s always notable. Today, we’re pleased to celebrate the launch of the world’s first SLA-backed decentralized cloud storage service.

Two years ago, we started rebuilding our decentralized cloud storage network from scratch. Our previous network reached more than 150 petabytes of data stored across more than 100,000 Nodes; however, our offering struggled to scale beyond those numbers, and we didn’t see how it could deliver enterprise-grade performance. It was a tough decision to make, but we can proudly say our rebuild was well worth the effort.

Through an extensive beta period with thousands of users and Storage Node Operators, we’ve been able to demonstrate enterprise-grade performance and availability; we’ve delivered S3 compatibility; and we’ve demonstrated 100% file durability for over 10 months, enhanced by the fact that we have cross-geography redundancy by default. Our economic model allows us to offer prices at a fraction of the large providers’, while still delivering attractive compensation rates to our Storage Node Operators.
We’re also able to offer great channel margins to our partners, which builds a solid foundation for a healthy business at Storj Labs.

Perhaps most importantly, our end-to-end encryption and zero-knowledge architecture enabled us to deliver a security model that’s significantly more resilient and prevents anyone (including us) from mining data or compromising user privacy.

Launch Gates to Measure Success

As we’ve talked about in the past, we put in place a rigorous set of launch gates — and we weren’t willing to proceed with a production launch until:

We had demonstrated multiple months of consistent progress in durability, availability, and usability.
We had sufficient capacity and a large enough number of vetted Node Operators to ensure we could meet demand.
We had stress tested the system with thousands of users and Node Operators.
We had the tools, documentation, and libraries available to support our major use cases.
We had battle tested the system with partners.
We had been thoroughly vetted by third-party security firms.
We could confidently back our service with enterprise-grade Service Level Agreements (SLAs).

Tardigrade Connectors Ready for Use

We’re proud to be launching Tardigrade along with connectors that allow users to integrate our service with some of the most popular and innovative open source and cloud applications in the world. In addition to our thousands of users, Kafkaesque, Fluree, Verif-y, and CNCTED are just a few of the partners with Tardigrade integrations. We have many more partners finalizing integrations each week. And we’re launching with not only built-in S3 compatibility, but an extensive library of bindings, including .NET, C, Go, Python, Android, Swift, and Node.js, enabling developers to build Storj-native applications that take advantage of our full range of advanced features, such as macaroons.

Perhaps my favorite part about Tardigrade is that the platform supports the open source community.
I’ve spent the past 15 years working for open source projects and have seen first-hand many of the challenges they face. If you’re an open source project whose users store data in the cloud, you can passively generate revenue every time your users upload data to Tardigrade. Through our Open Source Partner Program, any open source project with a connector will receive a portion of the revenue we earn when its users store data on Tardigrade. There are no limits. There is no catch. We want to support open source because we ourselves are open source software developers.

To our amazing community: thank you immensely for supporting us throughout this rebuild journey. We’re proud to say we met most of our deadlines, and we thank you for your patience and hope you love Tardigrade as much as the team here at Storj Labs does!

Storj Labs is among a myriad of companies building the infrastructure to power the decentralized web 3.0.

If you haven’t given Tardigrade a try, take a few minutes, sign up for an account to receive a free credit, and upload a file. See the power of decentralization for yourself.

By Ben Golub on Business

Originally posted at https://storj.io on March 19, 2020
20. 03. 19
What to Expect in Production
20. 03. 01
Announcing Early Access For...
Today our team is thrilled to announce our Tardigrade decentralized cloud storage service is finally ready for production workloads — we can’t wait for you to try it out. We’re welcoming the first paying customers to experience the advantages of decentralized cloud storage with an early access release (RC 1.0). We expect a complete production launch this quarter, at which time we’ll remove the waitlist and open user registration to all.

With this release, Tardigrade users can expect:

Full service level agreements: For both our early access release and our production launch, Tardigrade users can expect three nines of availability (99.9%) and nine nines of durability (99.9999999%).

1TB credits for storage and bandwidth: All users who signed up for our waitlist will receive this credit after adding a STORJ token balance or credit card to their account. All waitlist credits will expire one full billing cycle after our production launch, so claim them now so you don’t miss out! After users exhaust their credits, they’ll be charged for their usage.

1TB limits on storage and bandwidth: During this early access period, all accounts will have limits of 1TB for both static storage usage and monthly bandwidth. Submit a request through our support portal if you need to increase this limit.

Backward compatibility: Developers building on top of Tardigrade can expect their applications to be fully backward compatible with the general availability production launch.

If you haven’t yet, sign up now to get your credit before time runs out! If you’ve already joined the waitlist, check your inbox for an email with details on how to get started. If you’re already a Tardigrade user: first, thank you very much, and second, your account will be credited with 1TB of storage and bandwidth after you add a STORJ token balance or a credit card.
Users who have both a credit card and a STORJ token balance will see their STORJ token balance charged first, with their credit card as the secondary method of payment. Even after users exhaust their credits, they’ll still pay less than half the price of traditional cloud storage for their usage.

Over the past six months, our team has quietly been gathering feedback from customer pilots and POCs, as well as data from network stress tests. We’re confident our initial partners and customers, as well as users who are joining from the waitlist, will have a positive experience trying out Tardigrade. As an extra measure to ensure the quality of that experience, we’re being extremely vigilant in balancing the network in terms of supply and demand. Over the coming weeks, we’ll continue to gather data and feedback from our initial customers. Once we’re fully confident we can scale that quality of experience to meet the anticipated demand, we’ll announce general availability, remove the waitlist, and allow anyone to sign up to experience Tardigrade first-hand.

General Availability Release Timing

Between now and production, not much will change in the system, other than a significant increase in data uploaded by our first paying customers.

We’ve previously talked about the qualification gates we use to ensure the product is ready to support our customers’ cloud storage needs and deliver a solid user experience, both of which are critical to drive adoption of the platform. We established these gates to guarantee that we delivered not only an enterprise-grade technology solution but also a world-class customer experience. Since establishing these launch gates, we’ve continuously monitored and reported on our progress. At every major milestone, we’ve evaluated our progress toward the gates, and the applicability and validity of those gates.
As part of this new early access phase, we’ve added an additional qualification gate: two weeks of success with real-world users (and their data).

Tardigrade Adoption During Beta 1 and 2

To date, we’ve welcomed 10,000+ Tardigrade waitlist members to the platform. There is more than 4 PB (4,000 TB) of data stored, with 20 PB of available capacity. Thousands of developers have created accounts and projects and uploaded data to the network. In addition to these developers, we’ve been working with a number of large, notable partners and customers who have large-scale use cases that prove decentralized cloud storage is the right solution for their needs. We’ll be sharing more about these customers’ use cases in the coming months.

During this early access phase, we’ll provide an extra level of support to early customers and developers, gather further feedback, and continue to refine the onboarding process. We’re doing this so that when we actually move forward with mass adoption, we’ll be ready — and so will Tardigrade.

Early Access vs General Availability

Once we announce general availability, users will be able to sign up directly on Tardigrade.io, after which they’ll receive immediate confirmation of their account. During early access, we’ll be sending out invites once a week to continue to throttle new user registrations.

As mentioned before, users who do have access will have their limits raised to 1TB for both bandwidth and storage after they add a method of payment. During general availability, limits will start at 5GB once a credit card is added, and limits will increase from there. So sign up now to reserve your higher limit for our production launch.

Our Storage Node Operators won’t experience much of a difference between early access and general availability, other than a steady increase in the amount of customer data and bandwidth utilization on the network.
We have a lot of upcoming features planned for Storage Nodes, including SNO board enhancements, configuration tools, and improvements to graceful exit. We have some exciting news to share about our team building out features for Storage Node Operators at the Q1 2020 Town Hall, so make sure to tune in.

After our general availability release, we’ll share more information about our 2020 roadmap. We’ve been very impressed by the amount of feedback we’ve received through the ideas portal and on the forum — we’ve actually incorporated many of the suggestions into our plans. If you have additional suggestions, please share them. We review every single suggestion. You can also see other ideas that have been submitted and the status of those suggestions.

We want to give a HUGE thank you to our community of amazing Tardigrade users and Storage Node Operators for all your contributions to the network. We literally couldn’t build the decentralized future without you and your efforts.

By John Gleeson and JT Olio on Business

Originally published at https://storj.io on January 30, 2020
20. 03. 01
Use Cases for the Decentral...
Have you heard the news? Tardigrade is in early access production, which means developers can start using the decentralized cloud storage service for real workloads. This is a huge milestone for the company and the network, and we’re excited to have you along for the journey.

Tardigrade Benefits

Tardigrade is superior to centralized alternatives for a number of reasons. First off, we’re more durable. Decentralization means there is no single point of failure, and the risk of file loss is dispersed through statistical uncorrelation.

Data is streamed hyper-locally and in parallel, enabling us to be much faster than centralized competitors. Because our economics are similar to those of Airbnb (or Uber), we’re also able to sell storage at half the price of AWS.

While decentralized cloud storage is awesome and highly optimized for some use cases, it isn’t a perfect fit for everything. Tardigrade object storage is highly optimized for larger files, especially those that are written once and read many times.

Use Cases for Tardigrade

You may be wondering how you can get started. Here are a few specific use cases that are well suited for decentralized cloud storage:

Large File Transfer: Tardigrade is especially well suited for transiting large amounts of data point-to-point over the internet. High-throughput bandwidth takes advantage of parallelism for rapid transit, and client-side encryption ensures privacy in transit. Common examples include large files, academic/scientific datasets, binaries, media collections, and the like. Developers aren’t charged for upload bandwidth, so uploading files to the network doesn’t incur any cost, and there are no penalties if you decide to take your data with you and run. For a great example of large file transfer, see transfer.sh.
In an average month, transfer.sh is used more than 1,000,000 times to easily transfer files worldwide.

Database Backups: Storing backups and snapshots of databases is another use case especially well suited for decentralized storage. Regular snapshot backups of databases for data recovery or testing are an entrenched part of infrastructure management. They enable you to quickly capture the state of your database at a given point in time, and to capture the change in the database from the backup to the present. On the decentralized cloud, streaming backups eliminate the need to write large database snapshots to local disk before backup or for recovery.

Low-volume CDN: Out of the box, Tardigrade supports fluid delivery of multimedia files, with the ability to seek to specific file ranges and support for large numbers of concurrent downloads. On the decentralized cloud, native file streaming support and bandwidth load spread across highly distributed Nodes reduce bottlenecks.

Multimedia Storage: Another common use case is the storage of large numbers of big multimedia files, especially data produced at the edge from sources like security cameras that must be stored for long periods of time with infrequent access. Rapid transit leveraging parallelism makes distributed storage effective for integrating with video compression systems to reduce the volume of data stored.

Private Data: Tardigrade is highly optimized for data that is highly sensitive and an attractive target for ransomware attacks or other attempts to compromise or censor data. Client-side encryption, industry-leading access management controls, and a highly distributed network of Storage Nodes reduce the attack surface and risk.

Back-end for dApps: A dApp backed by centralized cloud storage means you’re missing out on the biggest benefits of decentralization.
Using Tardigrade as the back-end to your dApp increases its privacy, security, and resiliency compared to legacy, centralized cloud storage solutions.

Get Started

We expect there are many more ways to incorporate Tardigrade decentralized cloud storage into your applications and cloud environments. Ready to get started and see what Tardigrade can do for you? Follow our Tardigrade documentation to create your first project and upload your first file in just a few minutes.

By Kevin Leffew on Tutorials

Originally published at https://storj.io on February 11, 2020
20. 03. 01
How deletes affect performa...
Our team here at Storj Labs is currently in the middle of adding support for CockroachDB, a horizontally scalable, Postgres-compatible database. Each database technology behaves differently, and it’s important to understand the tradeoffs each makes in order to use it efficiently. Along the way, we are learning a lot about CockroachDB, and one of the things we have learned is how deletes impact database performance.

Performance

CockroachDB uses a technique called MVCC¹ to manage concurrent transactions. Deletes leave behind tombstones to mark records as deleted. Deleted records don’t get removed from the primary key/value store or indices until the gc_ttl grace period window expires, which defaults to 25 hours. This means any query using the table has to process more data than you might expect if you assumed those records were immediately removed from the system. I want to stress that this doesn’t violate any transactional guarantees and doesn’t return any undesired results. Deleted data correctly appears deleted. This only affects performance. If you’re doing sparse deletes, this probably won’t be noticeable. If you’re doing bulk deletes, you may notice performance doesn’t improve after you have issued the deletes until the 25-hour window has expired and the bulk-deleted records have been purged. Old values changed with updates are also purged when the gc_ttl grace period expires.

Another thing to consider with CockroachDB deletes is that if you issue too large a delete statement, you may get a "query too large" error. To work around this, you can delete records with a limit, or over a contiguous range of the primary key.

Some techniques to mitigate these side effects (if you’re experiencing this problem) could be lowering the gc_ttl 25-hour interval. If you’re using the enterprise version of CockroachDB, you can use partitions, or alternatively views if you’re not using the enterprise version. Truncating entire sections of a table is also an option.
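The range-based workaround mentioned above can be sketched as a loop that deletes one contiguous slice of the primary key per statement. The sketch below uses the stdlib sqlite3 module purely for illustration; against CockroachDB you would issue the same range-bounded DELETE statements through your SQL driver, with the batch size tuned to stay under the statement size cap. Table name and batch size are illustrative:

```python
import sqlite3

# Stand-in table; the pattern is what matters, not the engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO events (id, payload) VALUES (?, ?)",
    [(i, "x") for i in range(1, 10_001)],
)

BATCH = 2_000  # tune so each DELETE stays well under the statement size cap
low, high = 1, 10_000
for start in range(low, high + 1, BATCH):
    # Delete one contiguous range of the primary key per statement,
    # rather than the whole key space in a single oversized DELETE.
    conn.execute(
        "DELETE FROM events WHERE id >= ? AND id < ?",
        (start, start + BATCH),
    )
    conn.commit()

remaining = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(remaining)  # 0
```

Note that on CockroachDB each successive batch still scans the tombstones left by earlier batches until gc_ttl expires, which is exactly the slowdown described in the quoted FAQ below.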
This avoids tombstones, but requires you to define sufficiently coarse-grained segments when you do the inserts. Thanks to Namibj from the Cockroach Slack for this information.

You can read more details in the official CockroachDB documentation²:

"CockroachDB relies on multi-version concurrency control (MVCC) to process concurrent requests while guaranteeing strong consistency. As such, when you delete a row, it is not immediately removed from disk. The MVCC values for the row will remain until the garbage collection period defined by the gc.ttlseconds variable in the applicable zone configuration has passed. By default, this period is 25 hours.

This means that with the default settings, each iteration of your DELETE statement must scan over all of the rows previously marked for deletion within the last 25 hours. This means that if you try to delete 10,000 rows 10 times within the same 25-hour period, the 10th command will have to scan over the 90,000 rows previously marked for deletion.

If you need to iteratively delete rows in constant time, you can alter your zone configuration and change gc.ttlseconds to a low value like five minutes (i.e., 300), and run your DELETE statement once per GC interval. We strongly recommend returning gc.ttlseconds to the default value after your large deletion is completed."

Why gc_ttl exists

This 25-hour window exists to support long-running queries and the AS OF SYSTEM TIME³ clause, which enables querying the database as of a specified time in the past. It also supports restore: data is kept around for a while so that you can restore back to a point in time. For example, if you deleted more data than you intended with a bad WHERE clause, restore can put the table back to where it was before.

Backups

It's important to take an incremental backup at least once within the gc_ttl window.

References

¹ Multiversion concurrency control.
https://en.wikipedia.org/wiki/Multiversion_concurrency_control
² Why are my deletes getting slower over time? https://www.cockroachlabs.com/docs/stable/sql-faqs.html#why-are-my-deletes-getting-slower-over-time
³ AS OF SYSTEM TIME. https://www.cockroachlabs.com/docs/stable/as-of-system-time.html

By Simon Guindon

Originally published at https://storj.io.
19. 12. 20
Secure access control in th...
When the tech industry began the transition to cloud-based resource provisioning, the attack and security vectors that DevOps teams and CISOs focus on to protect their resources shifted along with it. Suddenly, protecting users' data required a fundamentally new approach to containing resources. Rather than simply "defending the perimeter" (through ownership of network infrastructure, firewalls, NICs, etc.), the model shifted to an identity-based approach to controlling access to systems and resources. This practice has become known as Identity and Access Management (IAM), and it defines the way users authenticate, access data, and authorize operations in a public cloud environment.

When it comes to authorization and authentication on the web, the standard public cloud approach is the Access Control List (ACL). However, the capability-based approach leveraged by decentralized networks is substantially more secure, and I will explain why in this blog post.

Core Problems with the Public Cloud's ACL Model

The ACL model, sometimes referred to as the Ambient Authority Model, is based on user identity privileges (for example, through Role-Based Access Control). The ACL keeps a list of which users are allowed to execute which commands on an object or file.
This list of abilities is kept logically separate from the actual identity of the users. The appeal of ACLs partially arises from the notion of a singular "SuperAdmin" who can list and fully control every user's account and privileges. This centralized approach to control creates a massive honeypot for hackers: when the SuperAdmin loses control, the entire system falls apart.

Because the ACL model defines access through the user-agent identity (or abstractions like roles, groups, and service accounts), each resource acquires its access control settings as the result of a superuser administrator making deliberate access configuration choices for it. This is a major weakness of the ACL approach, especially within today's massively parallel and distributed systems, where resources are accessed across disparate operating systems and multiple data stores. Essentially, the ACL model associates users with files and controls permissions around them.

The Access Control List approach fails for two reasons.

Failure 1: The ambient authority trap

An authority is "ambient" if it exists in a broadly visible environment where any subject can request it by name. For example, in Amazon S3, when a request is received against a resource, Amazon has to check the corresponding ACL (an ambient authority) to verify that the requester has the necessary access permissions. This is an unnecessary extra hop in the authentication process that leads to ambient authority.
In this scenario, the designation of the authority (the user) is separated from the authority itself (the access control list), violating the Principle of Least Authority (POLA). Furthermore, IAM systems based on the ACL model fall into the ambient authority trap, where user roles are granted an array of permissions in such a way that the user does not explicitly know which permissions are being exercised. Under this design flaw, inherent to many public cloud platforms, user-agents are unable to independently determine the source, or the number and types, of permissions they hold, because the list is kept separately from them in the ACL. Their only option is trial and error: making a series of de-escalated privilege calls until one succeeds.

To invoke an analogy, this is like using a personal, unmarked key to open a series of infinite doors. You don't know which door will open until you try it. Very inefficient! As a result, if agents cannot identify their own privilege set, they cannot safely delegate restricted authority on another party's behalf. It would be risky to lend a key to a neighbor without knowing which of your doors it might open.

In the world of operating systems and mission-critical distributed systems, avoiding ambient authority privilege escalation is crucial, especially when running untrusted code. Every application today is launched with grossly excessive authority over the user's operating system. This is why many systems sandbox software with mechanisms like FreeBSD's Capsicum capability framework and Linux containers such as Docker. Google is even working on a new capability-based operating system, Fuchsia, to supersede the Linux-based Android kernel.

Failure 2: The confused deputy problem

A deputy is a program that manages authorities coming from multiple sources. A confused deputy is a delegate that has been manipulated into wielding its authority inappropriately. Examples of the confused deputy problem can be found across the web.
These include injection attacks, cross-site request forgery, cross-site scripting, clickjacking, and more. These attacks take advantage of ambient authority to turn the victim's existing program logic to nefarious ends in web applications. To avoid the confused deputy problem, a subject must carefully maintain the association between each authority and its intended purpose. This is avoided entirely by the capability-based model described below.

Capability-based security is better

From a security-design standpoint, the capability model introduces a fundamentally better approach to identity and access management than the public cloud's ACL framework. By tying access to keys, rather than a centralized control system, capability-based models push security to the edge, decentralizing the large ACL attack vector and creating a more secure IAM system. The capability-based model solves both the ambient authority trap and the confused deputy problem by design.

What is a capability?

Often referred to as simply a "key," a capability is the single thing that both designates a resource and authorizes some kind of access to it. The capability is an unforgeable token of authority. Those coming from the blockchain world will be very familiar with the capability-based security model: it is the model implemented in Bitcoin, where "your key is your money," and in Ethereum, where "your key is gas for EVM computations." This gives the client full insight into their privilege set, illustrating the core tenet of the capability mindset: don't separate designation from authority.

Similar to how, in the blockchain world, "your keys are your money," with Tardigrade your keys are your data, and macaroons add capabilities that allow the owners of data to caveat it, or granularly delegate access for sharing, programmatically. Key-based ownership of object data enables users to intuitively control their data as a first principle, and then delegate it as they see fit.
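The contrast between the two models can be sketched in a few lines of Python. All names and resources here are invented for illustration; this models the concepts, not any real provider's IAM.

```python
# Contrast sketch: ambient-authority ACL lookup vs. capability possession.
# All identities, paths, and permissions below are hypothetical.

# ACL model: a central table maps identities to permissions.
# This table is the honeypot -- whoever controls it controls everything.
ACL = {("alice", "/photos/cat.jpg"): {"read", "write"}}

def acl_allows(user, resource, action):
    """Every request requires a lookup against the central list."""
    return action in ACL.get((user, resource), set())

# Capability model: holding the token IS the authority; no central lookup.
class Capability:
    def __init__(self, resource, actions):
        self.resource = resource
        self.actions = frozenset(actions)

    def attenuate(self, actions):
        """Delegate a strictly narrower capability (can only drop actions)."""
        return Capability(self.resource, self.actions & set(actions))

cap = Capability("/photos/cat.jpg", {"read", "write"})
read_only = cap.attenuate({"read"})  # safe to hand to a third party
print("write" in read_only.actions)  # False: delegation cannot escalate
```

The ACL dictionary is exactly the centralized honeypot described above, while the capability travels with its holder, can be inspected by its holder, and can only ever be narrowed when delegated.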
The decentralized cloud eliminates the increasingly apparent risk of data loss or extortion that comes from holding data with one single provider (like Amazon, Google, or Microsoft). Storj, with its Tardigrade service, presents a better model, where object data is encrypted, erasure-coded, and spread across thousands of Nodes stratified by reputation, so that any and every computer can be the cloud.

Macaroons are the key innovation

Macaroons enable granular, programmatic authorization for resources in a decentralized way. The construction of macaroons was first formulated by a group of Google engineers in 2014. These chained, nested constructions are a great example of the capability-based security model and are deeply integrated into the V3 Storj network. Macaroons are excellent for use in distributed systems because they allow applications to enforce complex authorization constraints without requiring server-side modification, making it easy to coordinate between decentralized resource servers and the applications that use them.

Their name, "MAC-aroons," derives from the HMAC process (hash-based message authentication code) by which they are constructed, while also implicitly alluding to a claim of superiority over the HTTP cookie. In practice, HMACs are used to verify both the data integrity and the authenticity of a message. Similar to the blocks in a blockchain, HMACs are chained within a macaroon (each caveat contains a hash referring to the previous caveats), such that caveats that restrict capabilities can only be appended, never removed. Macaroons solve the cookie-theft problem associated with OAuth2 and traditional cloud services by delegating access to a bearer token that can only be used in specific circumstances through HMAC-chained "caveats" (i.e., restrictions on IP, time-server parameters, and third-party auth discharges).
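The chained-HMAC construction described above can be sketched in a few lines of Python. The key and caveat strings are invented for illustration, and this follows the general scheme from the 2014 macaroons paper rather than Storj's actual wire format.

```python
import hmac
import hashlib

# Sketch of macaroon-style HMAC chaining. Keys, identifiers, and caveat
# strings below are hypothetical, not Storj's real encoding.

def hmac_sha256(key, msg):
    return hmac.new(key, msg.encode(), hashlib.sha256).digest()

def mint(root_key, identifier):
    """The issuer creates a macaroon signed with a secret root key."""
    return {"id": identifier, "caveats": [],
            "sig": hmac_sha256(root_key, identifier)}

def add_caveat(m, caveat):
    # The new signature is an HMAC over the previous signature, so
    # anyone holding the macaroon can append a restriction, but no one
    # can remove one without invalidating the chain.
    return {"id": m["id"],
            "caveats": m["caveats"] + [caveat],
            "sig": hmac_sha256(m["sig"], caveat)}

def verify(root_key, m):
    """Only the issuer (root key holder) can recompute the full chain."""
    sig = hmac_sha256(root_key, m["id"])
    for caveat in m["caveats"]:
        sig = hmac_sha256(sig, caveat)
    return hmac.compare_digest(sig, m["sig"])

root = b"issuer-root-key"  # held by the issuer only
m = add_caveat(mint(root, "api-key-1"), "op=read")
m = add_caveat(m, "path=/photos/")
print(verify(root, m))                       # True

forged = dict(m, caveats=m["caveats"][:-1])  # try to strip a restriction
print(verify(root, forged))                  # False: the chain breaks
```

Because each signature is an HMAC over the previous one, a holder can delegate a further-restricted macaroon offline, without contacting the issuer, yet can never widen their own access.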
These caveats can be extended and chained, but not overwritten.

Capability security in the Tardigrade network

In the Tardigrade network, macaroons are referred to as API Keys, and they enable users to granularly restrict and delegate access to object data in a way that is decentralized and more secure than existing cloud solutions. From a developer standpoint, capabilities make it very easy to write code that granularly defines security privileges. Once baked, the rules within the capability cannot be changed without reissuing the key itself.

Access management on the Tardigrade platform requires the coordination of two parallel constructs: authorization and encryption. With macaroons, both constructs work together to provide an access management framework that is secure and private, as well as extremely flexible for application developers. A macaroon embeds the logic for the access it allows and can be restricted simply by embedding path restrictions and any additional restrictions within the string that represents the macaroon.
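As a toy illustration of what "embedding the access logic" can look like, here is a sketch that checks a request against restrictions on operations, bucket, path prefix, and expiry. The caveat encoding is hypothetical, not Tardigrade's actual format.

```python
import time

# Toy evaluation of restriction-style caveats against a request.
# Caveat keys, bucket names, and paths here are all hypothetical.

def allowed(caveats, request, now=None):
    """A request is allowed only if EVERY caveat holds: caveats can
    only ever restrict access, never broaden it."""
    now = time.time() if now is None else now
    checks = {
        "ops":         lambda v: request["op"] in v,
        "bucket":      lambda v: request["bucket"] == v,
        "path_prefix": lambda v: request["path"].startswith(v),
        "not_after":   lambda v: now <= v,
    }
    return all(checks[k](v) for k, v in caveats.items())

caveats = {"ops": {"read", "list"}, "bucket": "photos",
           "path_prefix": "/vacation/", "not_after": 2_000_000_000}

req = {"op": "read", "bucket": "photos", "path": "/vacation/beach.jpg"}
print(allowed(caveats, req, now=1_700_000_000))  # True

req["op"] = "delete"
print(allowed(caveats, req, now=1_700_000_000))  # False: op not granted
```

Every caveat must hold for the request to proceed, which is why appending a caveat can only ever narrow what a key permits.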
Unlike a typical API key, a macaroon is not a random string of bytes, but rather an envelope with access logic encoded in it. To make the implementation of these constructs as easy as possible for developers, the Tardigrade developer tools abstract away the complexity of encoding objects for access management and encryption/decryption (https://godoc.org/storj.io/storj/lib/uplink#hdr-API_Keys).

Macaroons in action

While the possibilities for access controls that can be encoded in a caveat are virtually unlimited, the specific caveats supported on the Tardigrade platform are as follows:

Specific operations: Caveats can restrict whether an API Key can perform any of the following operations: Read, Write, Delete, List
Bucket: Caveats can restrict whether an API Key can perform operations on one or more Buckets
Path and path prefix: Caveats can restrict whether an API Key can perform operations on Objects within a specific path in the object hierarchy
Time window: Caveats can restrict when an API Key can perform operations on objects stored on the platform

For some sample Go code around access restriction, check out https://godoc.org/storj.io/storj/lib/uplink#example-package--RestrictAccess

Conclusion

Macaroons are a great example of capability-based security models in action, and Storj is a shining example of their implementation in decentralized cloud protocols. In Storj, we refer to our implementation of macaroons (HMACs) as simply API Keys.
Using macaroons as a construct for API keys is innovative and useful because of their:

Speed: HMACs are very fast and lightweight
Timeliness: They can require fresh credentials and revocation checks on every request
Flexibility: Contextual confinements, attenuation, delegation, and third-party caveats
Adoptability: HMACs can run everywhere

One of the best ways to learn about capability-based models is to try them in action. Sign up for the developer waitlist, join our community forum, and let us know what you think!

By Kevin Leffew

Thanks to Noam Hardy and JT Olio.

Sources

http://srl.cs.jhu.edu/pubs/SRL2003-02.pdf
http://zesty.ca/zest/out/msg00139.html
http://cap-lore.com/CapTheory/ConfusedDeputy.html
https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/41892.pdf

Originally published at https://storj.io.
19. 12. 05
How to Measure Whether Your...
By Ben Golub, Storj Labs

So, you're about to launch a product that will be used for business-critical use cases. Everyone speaks about being "production-ready" and "enterprise-grade." How can you make those vague concepts concrete? There is an old saying that you never get a second chance to make a first impression. In the enterprise storage world, the corollary is: you never get a second chance to be trusted with someone's data. With Tardigrade, we are asking users to consider storing production data on a decentralized service. So, what does it mean to be production-grade, and how can you measure whether you've achieved it?

Measuring whether your new service is ready for the world to start using is not as easy as it sounds. You have to answer a number of questions, including but not limited to:

Who is the intended user, and what are the intended use cases?
Is the product easy for your intended user to incorporate into their environment?
Is the product comparable to or better than existing alternatives?
Does the product deliver rigorously measurable performance and reliability?

What may be production-grade for one person might not be production-grade for another. But what's clear is that being production-grade goes far beyond delivering a specific amount of code or features. When we set out to build a new type of cloud storage, we knew it would be a challenge. We needed to ensure the highest levels of performance, reliability, and resiliency from the beginning. People aren't forgiving when it comes to data loss. (Surprise!) To measure whether our decentralized cloud storage solution has met the needs of our customers and has the performance and durability we can stand strongly behind, we created a series of measurable metrics that are important to our platform, our company, and our community of developers. We call these metrics "launch gates," and for us to enter a new milestone (from Alpha to production), each gate must be achieved.
Today, we announced our Beta 2 (Pioneer 2) release, which is our final milestone before our production launch in January 2020. Here are the details on where we currently are, how that compares to our Beta 2 launch gates, and what we need to do to reach production.

Availability

Our availability measure is a full-stack, end-to-end measurement of all of our systems being responsive and performing requested operations. We test it by uploading and downloading 10 MB files, checking a randomly selected segment for each Satellite every minute. We already had good availability over the last few months, but after some exciting changes to our protocol, architecture, and operational processes three weeks ago, we have seen a success rate of 100%. If a file fails an availability check, it doesn't mean the file is gone (we've never lost a file); it just means a second attempt is necessary to download it. Our production goal is 99.995% availability (i.e., the service should be unavailable for no more than about two minutes in any given month, or about four seconds per day), so once we have sufficient statistical history on our new architecture and processes, we feel confident we can achieve this.

File Durability

File durability measures the likelihood that one of your files could go missing. This is especially challenging to calculate when you've never lost a file. We are very proud of the fact that, since our cutover to version 0.1.0 seven months ago, we haven't lost a single file. Some might think we could claim 100% durability. However, our meticulous data scientists remind us that, statistically speaking, we would need billions of file-months of 100% durability in order to state, with 95% confidence, that we'll have 99.99999999% durability. So, applying our current level of 2 million files at 100% durability for seven months, the statistical model yields 99.9999% (6 9s) durability. (i.e.
the statistical likelihood of losing a file is one in a million, far less than the chance of being struck by lightning this year.) For production, we are aiming for 99.9999999% (9 9s) durability, while our long-term goal is to reach 11 9s of durability, but we will need to maintain the current level of 100% durability across significantly more files for at least one year to be able to officially make that guarantee.

Now, you may be wondering how we achieve this level of durability. Each file on the network is encrypted (using keys held only by the user uploading the data) and then divided into 64 MB chunks called segments. Each segment is then encoded into 80 pieces using Reed-Solomon error-correcting codes, which enables us to rebuild the segment from any 29 of its 80 pieces. Each of those 80 pieces is stored on a different Node, with an independent power supply, a different geographic location, a different operator, and a different operating system. The system continually monitors those Nodes for uptime and issues cryptographic audits to ensure that the Nodes are storing what they claim to be storing. The system knows, for every single segment, how many of its 80 pieces are currently on Nodes that are online and have passed all required audits. Whenever the number of available pieces for a segment falls to the repair threshold (currently 52, based on statistical models), the system rebuilds the missing pieces and sends them to new Nodes. As you can see from our segment health distribution chart below, we've never had a segment drop below 50 of its 80 pieces. To lose a segment, we'd have to drop below 29. As we add more Nodes to the network, this should continue to improve.

Upload Performance

To measure upload performance, we calculate the time it takes to upload a 10 MB file, repeating the test 250 times. We then compare the results to AWS, which is generally considered the gold standard among centralized cloud service providers.
We look not only at the median time to upload but also at long-tail performance. Across a broad range of file sizes and locations, we are comparable to AWS. For a Beta, that's pretty encouraging. Moreover, we have a really tight distribution: our 99th-percentile upload time is only 0.37 seconds slower than our 50th-percentile time (i.e., the slowest uploads completed almost as fast as the median), versus a 5.35-second differential between the 50th and 99th percentiles on AWS. This consistency and predictability are due to the inherent benefits of decentralization and should only get better as we add more Nodes, distributed ever closer to end users (ultimately, the speed of light becomes a factor). The above results are for uploads from a location in Toronto (eastern North America) to a Satellite in Iowa (central US).

Download Performance

We measure the time it takes to download and reconstitute a 10 MB file, repeating the test 250 times and comparing the results to AWS. Across a broad range of file sizes and locations, we are comparable to AWS. We're especially excited about our tight distribution: our 95th-percentile time is only 0.26 seconds slower than our median time (i.e., the slowest downloads completed almost as fast as the median). Again, this points to the power of decentralization and should only get better as we add more Nodes closer to end users. The above results are for downloads to a location in Toronto (eastern North America) from a Satellite in Iowa (central US).

Proven Capacity

Proven capacity measures the total available capacity shared by all of our Storage Node Operators, who have offered up over 15 PB of capacity.
Using very conservative statistical models, we can state with 95% confidence that we have at least 7 PB. Note that while we believe the capacity of the network is significantly higher, we hold ourselves to this more conservative, proven number. This number is more than an order of magnitude lower than that of the V2 network, which had a capacity of 150 PB at its peak. While we have several partners and Beta customers with several petabytes of capacity, we're aiming to grow the network more gradually, so that we generally have only about three months of excess capacity at a time. This helps ensure that all Nodes receive economically compelling payouts.

Number of Active Nodes

This is the number of Nodes currently connected to the network. The number of active Nodes in the table above (over 2,956 Nodes now) excludes any Nodes temporarily offline, any Nodes that have voluntarily quit the network, and any Nodes that have been disqualified (e.g., for missing uptime or audit requirements). Our production goal is 5,000 Nodes, still a fraction of the V2 number. Once a Node is vetted, if it ceases to participate in the network, it counts toward our vetted Node churn metric (see below).

Vetted Node Churn

Our current vetted Node churn (excluding probationary Nodes) is 1.18% for the last 30 days. Our Beta 2 gate is 3%, and our production gate is 2.0%, so we've already hit this production-level metric. Our system is very resilient to having any individual Node (or even a significant percentage of Nodes) churn. However, performance, economics, and most statistics all improve as we bring average Node churn down.

Other Gates

We have a wide variety of other gates. These include gates around code quality and test coverage, user and Storage Node Operator setup success rates, various capabilities, payment capabilities, peak network connections, and more.
We also have gates around the enablement of non-Storj Labs Tardigrade Satellites, and we aim to be "Chaos Monkey" and even "Chaos Gorilla" resilient before production. We hit all seven gates for Beta 2, as well as an additional two gates for production. For the past 30 days, we've had 99.96% availability on our service, which includes nine deployments, all with zero downtime.

Measure What Matters

When you're trying to measure something as nebulous as being "enterprise-grade," try distilling the goals down to their core parts and measure what matters most to you, your business, and your customers. If you can do that, you can measure your progress and continually improve your offerings. We'll continue to measure the performance of the network over the next several weeks, and if everything goes according to plan, we'll be in production in January 2020. Stay tuned for further updates. You can also sign up to try out the network for yourself! Everyone who signs up ahead of production will receive a credit worth 1 terabyte of cloud storage and 333 of download egress on Tardigrade.

Originally published at https://storj.io.
19. 11. 19
Architecting a Decentralize...
GitBackup is a tool that backs up and archives GitHub repositories. The tool is in the process of backing up the entirety of GitHub, which currently stands at 1–2 PB of data, onto the Storj network. As of today, October 18, 2019, the tool has snapshotted 815,200 repositories across more than 150,000 users. GitHub is the largest store of open source code in the world, with 20 million users and more than 28 million public repositories as of April 2017.

We believe that this reservoir of free and open source code acts as a digital public good, similar to a developers' library: a library that empowers software engineers to access the collective knowledge around open source code, development patterns, and free software. While GitHub is a wonderful service, it's owned by an agenda-driven global corporation and is thus prone to downtime, blockage, and censorship as a single point of failure. For example, Microsoft's acquisition of LinkedIn shows how user content can be gradually taken away (by means of paywalls and login walls). Furthermore, on July 25, 2019, a developer based in Iran wrote on Medium about how GitHub blocked his private repositories and prohibited access to GitHub Pages. Soon after, GitHub confirmed that it was blocking developers in Iran, Crimea, Cuba, North Korea, and Syria from accessing private repositories. If we want to guarantee the preservation of the work of hundreds of thousands of open source developers, we need to act now!

Let's download it all!

We're currently using gharchive.org to get a list of GitHub usernames that have had a public action since 2015. So far, the 815,200 repositories we've backed up constitute about 80 TB of data.
We anticipate that the entirety of public GitHub repos is about 1–2 PB, so we still have a way to go. If you want to back up your codebase's repository (or all of GitHub) to the decentralized cloud, check out the tool here: http://gitbackup.org/

GitBackup was built by Shawn Wilkinson in collaboration with a number of Storj Labs engineers and community members. The tool was demonstrated on October 11 at Devcon V (Osaka, Japan).

By Kevin Leffew on Community

Originally published at https://storj.io on October 18, 2019
19. 10. 18
IPFS Now on Storj Network
Developers have been clamoring for a decentralized storage solution for pinning data on IPFS, and the Storj community has answered the call with storjipfs.com.

The IPFS protocol is popular with decentralized app developers as a way to address content by its hash. While it's merely a way to address files across a DHT (a network of Kademlia nodes), it's usually deployed with Amazon S3 or local storage on the backend. This means that decentralized apps using IPFS without pinning to a decentralized storage backend aren't all that decentralized. Any time a file is uploaded to an IPFS node, there's no guarantee the file will persist longer than a few minutes (unless it's self-hosted on reliable hardware or backed by a centralized cloud provider). Users of services built on the IPFS network face situations where the data they're trying to access and share is no longer hosted by any node. The reality of IPFS is best illustrated by the IPFS subreddit: many of the links are dead because their hosts have gone offline.

We're excited to announce the availability of a reference architecture that backs an IPFS node with the Tardigrade decentralized cloud storage service. This guarantees the persistence, distribution, and security of content-addressed data. You can now upload files to the Storj network through the IPFS system by going to storjipfs.com. Our community members created this impressive project, and we can't thank them enough for their efforts.

Traditional IPFS architecture requires copies of a file to be spread among multiple hosts to achieve redundancy. While the theory is interesting, in practice the approach just doesn't produce the performance and availability required by modern applications. Instead of replicating files to multiple hosts and relying on a single host for file delivery, Storj uses erasure coding and peer-to-peer parallel delivery.
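A quick way to see the difference: using the 29-of-80 Reed-Solomon parameters described in these posts, compare the chance of losing access to a file under triple replication versus erasure coding. The per-node availability of 0.9 is an assumed illustrative figure, not a measured network statistic.

```python
from math import comb

# Compare the chance of *losing access* to a file under triple replication
# vs. a 29-of-80 erasure-coding scheme. Per-node availability p = 0.9 is
# an assumed illustrative value.

def erasure_unavailable(n, k, p):
    """P(fewer than k of n pieces sit on reachable nodes)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))

def replicas_unavailable(copies, p):
    """P(every one of `copies` full replicas is unreachable)."""
    return (1 - p)**copies

p = 0.9
print(f"3 replicas:       {replicas_unavailable(3, p):.2e}")
print(f"29-of-80 erasure: {erasure_unavailable(80, 29, p):.2e}")
```

Even with individually unreliable nodes, needing only any 29 of 80 pieces drives unavailability to a negligible level, and it does so at roughly 2.76x storage expansion (80/29) versus 3x for three full replicas.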
We don't just think our approach is better; the math proves it. When an IPFS node is backed by the Tardigrade network, we are able to solve many of the problems IPFS developers face, including data decentralization, data persistence, and default encryption at rest. The Storj network has a native repair system built in that ensures files remain alive even when nodes go offline. This reference implementation provides IPFS addressability with durability and reliability on par with the best centralized cloud providers in the industry.

The IPFS gateway isn't the only solution for developers on the Storj network. When it comes to reliable, performant, secure, and economical storage for decentralized apps, the native Storj platform is the best option. Storj offers a wide range of developer tools, including a CLI, an S3-compatible gateway, and a Go library with language bindings for C, Python, Node.js, .NET, Java, Android, and Swift. To gain access to the Storj network and Tardigrade service, sign up for the developer waitlist here: https://tardigrade.io/waitlist.

By Kevin Leffew on Business

Originally published at https://storj.io on October 6, 2019
19. 10. 07